Make Your Sequences 25 Times Faster

So where I work we have the need to sequence a few hundred million rows a day.  Surprisingly using a Netezza sequence object took about 2 hours on a 10200 to sequence 70 million rows.  This was acceptable for our initial release,  but in the future we have several hundred million more sequences to generate every day,  so this was a big issue.  Having an Oracle DBA background I realized the difference in performance sequence caching could make.  But to my surprise it would not allow me to put a postgres style cache clause to the sequence create statement.   For reference here is the syntax…

CREATE [ TEMPORARY | TEMP ] SEQUENCE name [ INCREMENT [ BY ] increment ]
    [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ]
    [ START [ WITH ] start ] [ CACHE cache ] [ [ NO ] CYCLE ]

But then while investigating another sequence issue I came across the configuration parameter

ENABLE_SEQUENCE_CACHESIZE

So this can be simply used at the session level.  In a sql session just run a set command and set this to 1 or true

set ENABLE_SEQUENCE_CACHESIZE=1

Also to look at your current setting enter

show ENABLE_SEQUENCE_CACHESIZE

And if you are using a sql tool that uses jdbc, the result returns like an error message.  For example

SQLState:  null

ErrorCode: 0

Warning:   NOTICE:  ENABLE_SEQUENCE_CACHESIZE is off

So here’s the gold, once this is set to 1,  you can create your sequence with a cache clause.  You can also alter the sequence and add the cache clause.  Once the sequence is created with a cache,  you do not have to have ENABLE_SEQUENCE_CACHESIZE set to 1.

So what does this do for you?  I set the cache to what I found to be a sweet spot of  50.  I ran a test on our development 10050 and the sequencing of 34 million rows that had taken 6387 seconds,  reduced to only 250 seconds.  That’s over 25 times faster!  This was such a big breakthrough for us,  and it is not really documented anywhere I cound find.  If you use sequences on high volumes this may be the quickest win performace tip you are going to find.

This entry was posted in Performance Tuning for Netezza and tagged , , . Bookmark the permalink.

4 Responses to Make Your Sequences 25 Times Faster

  1. Ibrahim Alagöz says:

    Hello,
    I have a question about sequences. I read your article about “Make Your Sequences 25 Times Faster” and I practiced for myself. But I didn’t get the same performance impact.

    First I want to explain what I did:
    - I recognized that the ENABLE_SEQUENCE_CACHESIZE is of (show ENABLE_SEQUENCE_CACHESIZE);
    - I set the parameter: set ENABLE_SEQUENCE_CACHESIZE=1
    - then I created a sequence: CREATE SEQUENCE seq CACHE 50;
    -then I repeated the following sql-statement: “SELECT NEXT VALUE FOR seq” 1000 times. but I got the same delay (4 seconds) if I didn’t enable the cache parameter.

    - or should I use the following SQL-Statement “SELECT NEXT 1000 VALUE FOR seq”. But this statement only returns one value.

    Where is my mistake?

    Thank you for your attention!

    • NZGuy says:

      So first I should update and say I notice sequence creation now seems to be defaulted to a fairly big cache size. So perhaps this tip is no longer needed unless you want to change it. Second when ever you do a select from no table or you select from an external table the sequencing happens from the host and that will be a bottleneck. So really you want this to run from a select on a source table that is internal, and use the sequence in a select statement that will run down on the snippet processors

      so if your source table is

      Create table customer_source
      ( Name varchar(200)
      address varchar(200)
      ) distributed on random;

      and your target is

      create table customer
      ( cust_key bigint,
      name varchar(200),
      address varchar(200)
      ) distribute on random

      and let say you set the enable sequence cachesize

      and
      create sequence cust_seq cache 50;

      You then use the seq in an insert select

      Insert into customer
      ( cust_key,
      name,
      address)
      select
      next value for cust_seq as cust_key,
      name,
      address
      from customer_source ;

      This will run down in the snippet processors.

      Note that there is always an overhead in starting up any DML statement in Netezza, so the trick is to alway process large batches of data at one time, though 4 seconds seems high. But we alway say load 1 row or load 5 million, it will take 1 or 2 seconds.

  2. Mike Lapenna says:

    We have been using Netezza for about a year and are happy with sequences except for the fact that the gaps can be fairly large. Due to data volumes, data types have been chosen to gross avoid waste of space. When I have a table expecting to hold 50 rows, I could choose a surrogate key of type byteint (which allows -127 to +127). However, if the rows are inserted sporadically, with different processes, I may in fact run out if I use a sequence even though I have very fews rows populated. And, this could happen even if I left myself some room like choosing a smallint. With the variability around the next sequence number, should they be avoided? What’s wrong with rank()?

    • NZGuy says:

      Mike. Take a look at this post

      http://nztips.com/2010/11/netezza-alternate-method-for-sequences

      We use this for all of our smaller tables, and for us I would define that as probably < 10M rows. We only use the sequence object to populate BIGINT which is big enough to handle the gaps. However, another thing you might want to think about , Netezza compression does a very good job of compressing sequential integers, and remember that compression is at the spu or dataslice level, so it those gaps you see usually aren’t happening within a dataslice (unless with system restarts where cached sequences are lost). But it’s important to have your data sorted. to be clear and simple , what I’m saying is often after compression a series like 10000000001 10000000002 10000000003 10000000004 Stored in a bigint, will compress to about the same space as 1,2,3,4 in a smallint. Runs some tests and prove it out for yourself. I hope this helps.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>