Showing posts with label LOAD_DATE. Show all posts
Showing posts with label LOAD_DATE. Show all posts

Thursday, April 7, 2022

Anti-pattern: Using Multi-Active SAT to model data with multiple records for the same Business Key that arrive one Micro-batch

When using a Change Data Capture (CDC) tool it is possible to get multiple records for the same Business Key in a single micro-batch. Loading this data as-is will will result in multiple records for the same Business Key for the same LOAD_DATE. This is the incorrect loading pattern. A regular Satellite should have only record per Business Key per Load Date i.e. the BK + LOAD_DATE combination should be the Primary Key of the Satellite. Multi-active SAT is an exception.

However having multiple records for a single Business Key in a Micro-batch is not Multi-active. This is just incremental data where multiple increments were captured in a single Micro-batch. This needs to be captured in a regular Satellite. The correct way to add this data to the SAT is to increment the nanoseconds on the LOAD_DATE for each of the record per Business Key that is present in the Micro-batch at the time of the SAT Load. This will ensure the uniqueness of the BK + LOAD_DATE. 

BK + LOAD_DATE will still be the Primary Key


Monday, March 28, 2022

Anti-pattern: Having two or more records in a SAT for a Single Business Key with the same LOAD_DATE

A Satellite, by definition, should have only one record per Business Key per Load Date. The Business Key (or the hash of the Business Key) + LOAD_DATE is the unique key for the record. BK + LOAD_DATE is the Primary of the Satellite

The exception is a Multi-Active Satellite where a Sequence Number is added to indicate the each of the active records. Sequence Number becomes part of the PK of the Multi-active Satellite.

See section 5.4 of the Data Vault Data Modeling Specification v 2.0.2

Update 05/07/2022:
An exception to this rule is when a stable descriptive attribute from the source can be used as part of the Private Key for the SAT. For e.g. the Modify Timestamp from the source can be combined with the HASHKEY and the LOAD_DATE to generate the Private Key, which will be unique for all records in the SAT

HASHKEY + LOAD_DATE + MODIFY_TS

Tuesday, February 22, 2022

Anti-pattern: Load Dates that are anything other than time of loading the Staging

By using the Load Date we should be able to identify all the Data that was loaded into DV in that particular batch. If the Load Date is some else, for e.g. the Load Date from the ETL tool, the entire batch in DV can not identified using the Load Date.