A way to ensure auditability in data processing
ref: Petrica Leuca
The trinity of data management: traceability, auditability and reproducibility.
Data auditability has two main components:
- Be able to provide an overview of how you work with data: data quality, data processing standards and data user management
- Be able to assess how performant or efficient your way of working with data is
Auditability can be supported by a set of standard metadata.
Data processing design
We start with data loaded into a raw (or stage) layer, and from there we transform and process it into the integration layer. On top of the integration layer, data marts or other layers might be created.
- The source can be any other application generating data
- Then the data is loaded into the raw/stage layer as an exact copy of what was received from the source
- The integration layer is the layer in which data modeling activities happen. These can be:
  - technical: creating surrogate keys, adding technical validity intervals, creating foreign keys, or
  - functional: adding attributes which make sense from a business perspective (type of user, number of log-ins, etc.)
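The flow above (source, raw/stage, integration) can be sketched in a few lines of Python. This is only an illustrative sketch, not the referenced author's implementation: the field names (`user_id`, `logins`), the `loaded_at` load timestamp, the `OPEN_END` marker and the rule that derives `user_type` are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count

# Assumed convention: an "open" validity interval ends far in the future.
OPEN_END = datetime(9999, 12, 31, tzinfo=timezone.utc)


@dataclass
class IntegrationUser:
    user_sk: int          # technical: surrogate key
    user_id: str          # business key as received from the source
    valid_from: datetime  # technical: start of the validity interval
    valid_to: datetime    # technical: end of the validity interval
    user_type: str        # functional: business attribute (hypothetical rule)
    login_count: int      # functional: number of log-ins


def load_raw(source_rows, loaded_at):
    """Raw/stage layer: keep an exact copy of each source row.

    The load timestamp is stored next to the payload, not inside it,
    so the payload stays identical to what the source delivered.
    """
    return [{"payload": dict(row), "loaded_at": loaded_at} for row in source_rows]


def integrate(raw_rows):
    """Integration layer: add technical and functional attributes."""
    next_sk = count(1)  # simple in-memory surrogate key generator
    result = []
    for raw in raw_rows:
        payload = raw["payload"]
        logins = payload["logins"]
        result.append(
            IntegrationUser(
                user_sk=next(next_sk),
                user_id=payload["user_id"],
                valid_from=raw["loaded_at"],
                valid_to=OPEN_END,
                # Hypothetical business rule, for illustration only.
                user_type="active" if logins else "inactive",
                login_count=len(logins),
            )
        )
    return result


# Usage: two rows from a hypothetical source application.
source = [
    {"user_id": "u1", "logins": ["2024-01-01", "2024-01-02"]},
    {"user_id": "u2", "logins": []},
]
raw_layer = load_raw(source, datetime(2024, 1, 3, tzinfo=timezone.utc))
integration_layer = integrate(raw_layer)
```

Keeping the raw payload untouched and adding derived attributes only in the integration layer means any integration row can be traced back to exactly what the source delivered.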