ROWNUM in Netezza?

Those of you coming from the Oracle world may find you are missing a ROWNUM pseudocolumn in Netezza. Sure there’s the LIMIT clause, but sometimes you need to assign a sequential number to each row. The best solution I’ve found is the analytic function ROW_NUMBER.

Examples:

SELECT ROW_NUMBER() OVER (ORDER BY CUSTOMER_ID_PK), *
FROM CUSTOMER
ORDER BY 1;

It is important to realize that, unlike Oracle's ROWNUM, the numbering is determined by the ORDER BY defined in the OVER clause, not by the ordering at the end of the SELECT. I only added that final ORDER BY so you see the results in sequence, rather than in the somewhat random order the SPUs return them in.

Other examples:

SELECT ROW_NUMBER() OVER (ORDER BY NULL), *
FROM CUSTOMER
ORDER BY 1;

SELECT ROW_NUMBER() OVER (ORDER BY ROWID), *
FROM CUSTOMER
ORDER BY 1;

Those familiar with analytic functions may notice the lack of a PARTITION BY clause. When it is omitted, the partition is the entire result set.
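
If you do want the numbering to restart within groups, you can add one. Here is a minimal sketch, assuming a hypothetical REGION_ID column on CUSTOMER; the numbering restarts at 1 within each region:

SELECT ROW_NUMBER() OVER (PARTITION BY REGION_ID ORDER BY CUSTOMER_ID_PK), *
FROM CUSTOMER
ORDER BY REGION_ID, 1;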


Make Your Sequences 25 Times Faster

So where I work, we need to sequence a few hundred million rows a day. Surprisingly, using a Netezza sequence object took about 2 hours on a 10200 to sequence 70 million rows. This was acceptable for our initial release, but in the future we will have several hundred million more sequence values to generate every day, so this was a big issue. Having an Oracle DBA background, I knew the difference sequence caching could make to performance. But to my surprise, Netezza would not let me add a Postgres-style CACHE clause to the CREATE SEQUENCE statement. For reference, here is the syntax…

CREATE [ TEMPORARY | TEMP ] SEQUENCE name [ INCREMENT [ BY ] increment ]
    [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ]
    [ START [ WITH ] start ] [ CACHE cache ] [ [ NO ] CYCLE ]

But then, while investigating another sequence issue, I came across the configuration parameter

ENABLE_SEQUENCE_CACHESIZE

This can be set simply at the session level. In a SQL session, just run a SET command and set it to 1 or true:

SET ENABLE_SEQUENCE_CACHESIZE = 1;

Also, to look at your current setting, enter:

SHOW ENABLE_SEQUENCE_CACHESIZE;

And if you are using a SQL tool that connects over JDBC, the result comes back looking like an error message. For example:

SQLState:  null

ErrorCode: 0

Warning:   NOTICE:  ENABLE_SEQUENCE_CACHESIZE is off

So here’s the gold: once this is set to 1, you can create your sequence with a CACHE clause. You can also ALTER an existing sequence to add the CACHE clause. And once the sequence has been created with a cache, you no longer need ENABLE_SEQUENCE_CACHESIZE set to 1 to use it.
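
Putting the pieces together, here is a minimal sketch; the sequence name TXN_SEQ and the table names are hypothetical, and the CACHE of 50 is the sweet spot I describe below:

SET ENABLE_SEQUENCE_CACHESIZE = 1;

CREATE SEQUENCE TXN_SEQ START WITH 1 INCREMENT BY 1 CACHE 50;

-- Or add caching to an existing sequence:
ALTER SEQUENCE TXN_SEQ CACHE 50;

-- Usage: sequence the rows as they load (TARGET_TABLE and SOURCE_TABLE are hypothetical)
INSERT INTO TARGET_TABLE
SELECT NEXT VALUE FOR TXN_SEQ, S.*
FROM SOURCE_TABLE S;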

So what does this do for you? I set the cache to what I found to be a sweet spot of 50. I ran a test on our development 10050, and the sequencing of 34 million rows that had taken 6,387 seconds dropped to only 250 seconds. That’s over 25 times faster! This was a big breakthrough for us, and it is not really documented anywhere I could find. If you use sequences on high volumes, this may be the quickest-win performance tip you will find.


With Netezza Always Use Integer Join Keys For Good Compression, Zone Maps, And Joins

So I’ll start off with a real simple best practice: make sure your join keys are integers. Right away this means that, if you don’t already, you’ll want to use surrogate keys in your physical table model.

First, integers, and data types with underlying integers like TIMESTAMP and DATE, compress well with Netezza’s compression. We have the compression option on our 10X, and I hear all TwinFin models have it.

Second, zone maps. These are the clever min and max maps kept on every integer-based column for every 3 MB block. If your data is naturally ordered (like by date) or sorted, they behave much like Oracle’s partitions, and only the blocks needed will be read. Zone maps only work on integer-based columns (date and time included). I’ve done tests, and Netezza will often take advantage of zone maps even through joins, but ONLY if the keys are integers.

Third, per the Netezza docs, floating-point numerics hash poorly and will force you into slower sort-merge joins if you join on numerics rather than integers. I haven’t proven this one out, but I did see it in some training docs.

And finally, sticking to a standard of integer data types on keys will help you avoid problems like not achieving good join colocation because you have, say, customer id defined as a VARCHAR in one table and as an INTEGER in another. In that case, if both tables are distributed on customer id, you would think the data would colocate, but it won’t, because a VARCHAR and an INTEGER are not the same and do not hash the same way.
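
To make that last point concrete, here is a minimal sketch; the table and column names are hypothetical. Both tables are distributed on CUSTOMER_ID, but the mismatched types mean matching rows land on different SPUs, so the join cannot be colocated:

CREATE TABLE CUSTOMER
(
    CUSTOMER_ID    VARCHAR(20),   -- varchar key: hashes differently than the integer below
    CUSTOMER_NAME  VARCHAR(100)
)
DISTRIBUTE ON (CUSTOMER_ID);

CREATE TABLE ORDERS
(
    ORDER_ID     BIGINT,
    CUSTOMER_ID  INTEGER,         -- integer key
    ORDER_DATE   DATE
)
DISTRIBUTE ON (CUSTOMER_ID);

-- Fix: declare CUSTOMER_ID as INTEGER in both tables so rows for the
-- same customer colocate on the same SPU and the join stays local.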
