Showing posts with label HASHDIFF. Show all posts
Showing posts with label HASHDIFF. Show all posts

Sunday, May 1, 2022

Anti-pattern: Including Business Keys in the SAT HASHDIFF

A Satellite contains the descriptive attributes of a Business Key. As such, a Satellite HASHDIFF should be constructed using the only the descriptive attributes of the Business Key. The Business Key, itself, should not be part of the Satellite HASHDIFF.

Note: While it is a common practice to include the Business Keys in the SAT to eliminate the need for JOINs to HUB to get the BKs, the SAT HASHDIFF should still not contain the Business Keys.

Anti-pattern: Adding column to the middle of the hashdiff

All new columns added to the SAT must be added to the end of the Satellite HASHDIFF. Dan Linstedt describes this as, "Columns that are NULL and at the end of the table are not added to the input of the hash function"[1]. Basically this pattern prevents reloading of the entire dataset when there is a new column added to the SAT and is NULL for the historical records. Loading data into SAT that has not changed is an Anti-pattern.

  1. Linstedt, D. and Olschimke, M., 2016. Building a Scalable Data Warehouse with Data Vault 2.0. Morgan Kaufmann, p.369.

Wednesday, February 23, 2022

Anti-pattern: Using varchar to store the HashKeys and HashDiffs

Why would you do that? HashKeys and HashDiffs are binary generated using a hashing algorithm like MD5 or SHA-1. Just store them as binary and effectively halve your storage and double your I/O! No need to convert them to the Char to store them as Varchar.