Efficient Data Upload Method Via External Table

If you have ever tried to upload data to Netezza using conventional query tool methods that create individual inserts, you will know this can be EXTREMELY SLOW. The way to upload data efficiently is to create an on-the-fly external table on your local data file and use that in a CTAS (CREATE TABLE AS) or an INSERT into an existing table (an INSERT variant is sketched after the two examples below). This is similar to the data download method in the previous post. In my example I create a temp table, but it could also be a permanent table. This example is for a comma-delimited file with backslash as the escape character, but you can change the options to match your source file. If you cut and paste these examples, be careful that your quotes don't become curly open and close quotes.

Via ODBC (e.g., Aginity Workbench)

CREATE TEMP TABLE BUBBA5 AS
SELECT * FROM EXTERNAL 'C:\\Users\\ValuedCustomer\\customer dim dump.csv'
( CUSTOMER_FIRST_NAME VARCHAR(50) NOT NULL,
  CUSTOMER_LAST_NAME VARCHAR(50) NOT NULL,
  CUSTOMER_ADDRESS VARCHAR(200) NOT NULL,
  CUSTOMER_CITY VARCHAR(50) NOT NULL,
  CUSTOMER_STATE VARCHAR(20) NOT NULL )
USING ( QUOTEDVALUE 'DOUBLE' NULLVALUE '' CRINSTRING TRUE CTRLCHARS TRUE
LOGDIR 'C:\\Users\\ValuedCustomer\\AppData\\Local\\Temp\\mylog.tmp\\' DELIMITER ','
Y2BASE 2000 ENCODING 'internal' REMOTESOURCE 'ODBC' ESCAPECHAR '\' )

Via JDBC (e.g., SQuirreL SQL)

CREATE TEMP TABLE BUBBA5 AS
SELECT * FROM EXTERNAL 'C:\\Users\\ValuedCustomer\\customer dim dump.csv'
( CUSTOMER_FIRST_NAME VARCHAR(50) NOT NULL,
  CUSTOMER_LAST_NAME VARCHAR(50) NOT NULL,
  CUSTOMER_ADDRESS VARCHAR(200) NOT NULL,
  CUSTOMER_CITY VARCHAR(50) NOT NULL,
  CUSTOMER_STATE VARCHAR(20) NOT NULL )
USING ( QUOTEDVALUE 'DOUBLE' NULLVALUE '' CRINSTRING TRUE CTRLCHARS TRUE
LOGDIR 'C:\\Users\\ValuedCustomer\\AppData\\Local\\Temp\\mylog.tmp\\' DELIMITER ','
Y2BASE 2000 ENCODING 'internal' REMOTESOURCE 'JDBC' ESCAPECHAR '\' )
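The two examples above use CTAS, but as mentioned at the top, the same on-the-fly external table can also feed a plain INSERT when the target table already exists. Here is a minimal sketch of that variant; the target table name CUSTOMER_DIM is made up for illustration and is assumed to match the file layout, and REMOTESOURCE stays 'ODBC' or 'JDBC' to match your client connection:

-- Load the file straight into an existing table instead of creating a new one
INSERT INTO CUSTOMER_DIM
SELECT * FROM EXTERNAL 'C:\\Users\\ValuedCustomer\\customer dim dump.csv'
( CUSTOMER_FIRST_NAME VARCHAR(50) NOT NULL,
  CUSTOMER_LAST_NAME VARCHAR(50) NOT NULL,
  CUSTOMER_ADDRESS VARCHAR(200) NOT NULL,
  CUSTOMER_CITY VARCHAR(50) NOT NULL,
  CUSTOMER_STATE VARCHAR(20) NOT NULL )
USING ( QUOTEDVALUE 'DOUBLE' NULLVALUE '' CRINSTRING TRUE CTRLCHARS TRUE
LOGDIR 'C:\\Users\\ValuedCustomer\\AppData\\Local\\Temp\\mylog.tmp\\' DELIMITER ','
Y2BASE 2000 ENCODING 'internal' REMOTESOURCE 'ODBC' ESCAPECHAR '\' )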


10 Responses to Efficient Data Upload Method Via External Table

  1. Kerr says:

    Thanks for this tip! I was having trouble with a large Netezza upload, and your post pointed me in the right direction. I’m new to Netezza, and I didn’t even know you could load/unload via remote in this manner… super simple!

  2. Would this method be faster or slower than using nzload?

    • NZGuy says:

      My understanding is that nzload and external tables use a very similar method, if not the same method, under the covers. Local to the host, we've seen load performance be about the same with the two methods. I might speculate that remotely (client side), running a command-line nzload over a native connection could be a bit more efficient than an ODBC or JDBC connection with the method mentioned. However, this is a method we teach even our analysts using a sandbox TF3, and it has so far proven to be a more user-friendly method that they can use from their query tool.

      But the real benefit is for those who have temp table create on our production environment. In a tool like Squirrel you can create and upload to a temp table with this method and then use that table in queries within the same open session, for things like custom filter sets or custom grouping sets (a short sketch of this pattern follows this thread). Obviously this won't work with nzload because it executes within its own session; there is no way to create the temp table, and no way to use it at all, from within the nzload session.

      • Gary Colbran says:

        nzload is a wrapper for external tables and uses the same process to load and unload data. Look at the active queries whilst running nzload and you will see the translated command.
        Loading over the network will almost always be slower than loading from the system itself, so if you can get your files mounted on the server, use nzload from there.
        Don't be tempted to use /nzscratch as a landing area; it's not supported and can result in data loss.

        • NZGuy says:

          Thanks for confirming. I would add that, obviously, a faster backside network, like a 10 Gb/s link from the host to a NAS, will be faster than some slower frontside network. But often these backside storage devices are not accessible to people who are more of an end user.
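As a concrete illustration of the same-session temp-table pattern described above: once BUBBA5 has been created and loaded, it can be joined against warehouse tables from the same open session. This is only a sketch; the CUSTOMER_DIM table and the join columns are assumptions for illustration.

-- Must run in the same session that created BUBBA5;
-- the temp table disappears when that session closes
SELECT d.CUSTOMER_STATE, COUNT(*) AS CUSTOMER_COUNT
FROM CUSTOMER_DIM d
JOIN BUBBA5 b
  ON b.CUSTOMER_FIRST_NAME = d.CUSTOMER_FIRST_NAME
 AND b.CUSTOMER_LAST_NAME = d.CUSTOMER_LAST_NAME
GROUP BY d.CUSTOMER_STATE;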

  3. JeanV says:

    Hi, the SQL code for the upload process is great!!
    If my data has a negative value (for example: -34.12345) and I set the field as VARCHAR(12), I receive an error message: ERROR [HY000] ERROR: External Table : count of bad input rows reached maxerrors limit. I tried to use the Import tool from Aginity and got an error message too. Could you tell me how to handle negative values using the upload procedure?
    Thanks

    • NZGuy says:

      Something in your data is not fitting into one of your defined columns. If you look in your defined LOGDIR, you should find an nzlog file with the error details. You can also add the parameter MAXERRORS 100000 to allow the load to finish; rejected rows should end up in an nzbad file in the LOGDIR (a sketch of where the parameter goes follows this thread).

    • JeanV says:

      Hi,
      Thanks for your reply. I am using the ODBC upload method, not the Unix method. I can't use nzlog or nzbad. Is there a log file on my PC I can use? Can I add the parameter MAXERRORS 100000 in the ODBC connection to allow it to finish?

      Here is part of my ODBC load code:

      USING ( QUOTEDVALUE 'DOUBLE' NULLVALUE '' CRINSTRING TRUE CTRLCHARS TRUE
      LOGDIR 'C:\\Documents and Settings\\vyw\\My Documents\\file_JV\\mylog.tmp\\' DELIMITER ','
      Y2BASE 2000 ENCODING 'internal' REMOTESOURCE 'ODBC' ESCAPECHAR '\' )

      jean
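As a sketch of where MAXERRORS fits: it goes inside the USING clause alongside the other options, and the rest of the CTAS statement stays exactly as in the ODBC example at the top of the post. The LOGDIR in that example is a Windows path, so with REMOTESOURCE 'ODBC' the nzlog and nzbad files should be written there on the client PC, though that is worth verifying on your own setup.

-- Only the USING clause is shown; the CREATE TEMP TABLE ... SELECT * FROM EXTERNAL
-- portion is unchanged from the ODBC example at the top of the post
USING ( QUOTEDVALUE 'DOUBLE' NULLVALUE '' CRINSTRING TRUE CTRLCHARS TRUE
MAXERRORS 100000
LOGDIR 'C:\\Users\\ValuedCustomer\\AppData\\Local\\Temp\\mylog.tmp\\' DELIMITER ','
Y2BASE 2000 ENCODING 'internal' REMOTESOURCE 'ODBC' ESCAPECHAR '\' )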

  4. External Table 350 columns says:

    Is there any restriction on the number of columns which can be used for an external table?
    I have a flat file which has 367 columns in it.

    • NZGuy says:

      A conventional table has a limit of 1600 columns and a row maximum of 64k bytes.

      From the 7.0 docs:

      Note that a table cannot have the following:

      • More than 1600 columns. In practice, the effective limit is lower because of tuple-length constraints.

      I would tend to think that external tables could not have a lower limit than this, because nz_backup uses external tables under the covers, so they need to be as big as the tables they are backing up.
