Showing posts with label SAT. Show all posts
Showing posts with label SAT. Show all posts

Sunday, May 1, 2022

Anti-pattern: Including Business Keys in the SAT HASHDIFF

A Satellite contains the descriptive attributes of a Business Key. As such, a Satellite HASHDIFF should be constructed using the only the descriptive attributes of the Business Key. The Business Key, itself, should not be part of the Satellite HASHDIFF.

Note: While it is a common practice to include the Business Keys in the SAT to eliminate the need for JOINs to HUB to get the BKs, the SAT HASHDIFF should still not contain the Business Keys.

Anti-pattern: Adding column to the middle of the hashdiff

All new columns added to the SAT must be added to the end of the Satellite HASHDIFF. Dan Linstedt describes this as, "Columns that are NULL and at the end of the table are not added to the input of the hash function"[1]. Basically this pattern prevents reloading of the entire dataset when there is a new column added to the SAT and is NULL for the historical records. Loading data into SAT that has not changed is an Anti-pattern.

  1. Linstedt, D. and Olschimke, M., 2016. Building a Scalable Data Warehouse with Data Vault 2.0. Morgan Kaufmann, p.369.

Thursday, April 7, 2022

Anti-pattern: Using Multi-Active SAT to model data with multiple records for the same Business Key that arrive one Micro-batch

When using a Change Data Capture (CDC) tool it is possible to get multiple records for the same Business Key in a single micro-batch. Loading this data as-is will will result in multiple records for the same Business Key for the same LOAD_DATE. This is the incorrect loading pattern. A regular Satellite should have only record per Business Key per Load Date i.e. the BK + LOAD_DATE combination should be the Primary Key of the Satellite. Multi-active SAT is an exception.

However having multiple records for a single Business Key in a Micro-batch is not Multi-active. This is just incremental data where multiple increments were captured in a single Micro-batch. This needs to be captured in a regular Satellite. The correct way to add this data to the SAT is to increment the nanoseconds on the LOAD_DATE for each of the record per Business Key that is present in the Micro-batch at the time of the SAT Load. This will ensure the uniqueness of the BK + LOAD_DATE. 

BK + LOAD_DATE will still be the Primary Key

Monday, March 28, 2022

Anti-pattern: Having two or more records in a SAT for a Single Business Key with the same LOAD_DATE

A Satellite, by definition, should have only one record per Business Key per Load Date. The Business Key (or the hash of the Business Key) + LOAD_DATE is the unique key for the record. BK + LOAD_DATE is the Primary of the Satellite

The exception is a Multi-Active Satellite where a Sequence Number is added to indicate the each of the active records. Sequence Number becomes part of the PK of the Multi-active Satellite.

See section 5.4 of the Data Vault Data Modeling Specification v 2.0.2

Update 05/07/2022:
An exception to this rule is when a stable descriptive attribute from the source can be used as part of the Private Key for the SAT. For e.g. the Modify Timestamp from the source can be combined with the HASHKEY and the LOAD_DATE to generate the Private Key, which will be unique for all records in the SAT