With Netezza Always Use Integer Join Keys For Good Compression, Zone Maps, And Joins

So I’ll start off with a real simple best practice.  Make sure your join keys are integers.  First off this will mean, if you don’t already, you’ll want to use surrogate keys in your physical table model.  First, integers and data types with underlying integers like timestamp and date will compress with Netezza’s compression.  We have the compression option on our 10X and I hear all Twinfin models have it.  Second, zone maps, these are those clever min and max map on every integer based column for every 3 meg block.  If your data is naturally ordered (like by date) or sorted, these will behave much like Oracle’s partitions, and only the blocks needed will be read.  Zone maps only work on integer based (date and time too) columns.  I’ve done tests, and it seems Netezza will often take advantage of zone maps even through joins,  but ONLY if they are integer.  And third, per Netezza docs,  floating point numerics hash poorly and will force you to slower sort merge joins if joining on numerics not integers.  I haven’t proven this one out, but I did see this in some training docs.  And finally, sticking to a standard of integer data type on keys will help you avoid problems like a NOT achieving good join colocation because you have, say, customer id defined as a varchar in one table, and as an integer in another.  In this case, if both tables are distributed on customer id, you would think the data would colocate, but it won’t because a varchar and an integer are not the same.

This entry was posted in Best Practices and tagged , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>