Data Lineage

Data lineage reveals the life cycle of data; it seeks to depict the entire data flow from beginning to end. It involves understanding, capturing and displaying how data moves across different databases.
Data lineage plays a major role in the data governance process. It is one of the best ways to manage risk and to ensure data is handled in line with organisational policies and regulatory standards.
Data Lineage Process (Learning Center. (n.d.). What is Data Lineage | Examples of Tools and Techniques | Imperva. [online] Available at: https://www.imperva.com/learn/data-security/data-lineage/.)

Data lineage in the Aristotle Metadata Registry:

The Aristotle Metadata Registry has a feature to graphically show different relationships between metadata. Data lineage helps users depict data flow from different data sources in the Registry.
Users can create the following data lineage relationships in the Registry:
  • Dataset Relationship: Users can create relationships between different datasets showing the source of the current dataset.
  • Distribution Relationship: Users can create relationships between different distributions showing the source of the current distribution.
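
As a rough illustration of these two relationship types, lineage can be thought of as a directed graph in which each item points to the items it was sourced from. The sketch below is not the Registry's data model or API; the class `LineageGraph`, its methods `add_source` and `upstream`, and the dataset names are all hypothetical and exist only to show how the sources of a dataset could be traced back.

```python
from collections import defaultdict


class LineageGraph:
    """Hypothetical in-memory model: each item maps to the items it was sourced from."""

    def __init__(self):
        self._sources = defaultdict(set)

    def add_source(self, item, source):
        """Record that `item` is derived from `source`."""
        self._sources[item].add(source)

    def upstream(self, item):
        """Return every direct and indirect source of `item`."""
        seen = set()
        stack = [item]
        while stack:
            current = stack.pop()
            for source in self._sources.get(current, ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen


# Illustrative names only; they do not come from the Registry.
datasets = LineageGraph()
datasets.add_source("Annual Report Dataset", "Cleaned Survey Dataset")
datasets.add_source("Cleaned Survey Dataset", "Raw Survey Dataset")

print(sorted(datasets.upstream("Annual Report Dataset")))
```

The same structure would apply to distributions: a second `LineageGraph` instance could hold distribution-to-distribution links.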

Steps to create Data lineage for a Dataset

  1. Data lineage is an in-depth description of where data comes from. To add a source to a Dataset, go to the Dataset to which the source Dataset will be added.
  2. From the Actions tab, click "Open item editor".
  3. Select the Components tab.
  4. Scroll down to the Source Datasets tab, where the user can link the source Dataset to the current Dataset.
  5. Click Save Changes to add the source Dataset to the current Dataset.

Steps to create Data lineage for a Data Element

  1. Users can also add a source to a Distribution and to its Data Elements. Go to the item page of the Distribution to which the source will be added.
  2. Click the Actions drop-down and select "Open item editor".
  3. Go to the Components tab and scroll down to "Data Element Path".
  4. Select a Data Element from the side panel to add the source.
  5. Scroll down to "Lineage Details", where additional lineage details can be added.
  6. Use the "Filter by distribution" drop-down to create a data lineage relationship at the Data Element level.
  7. Select the Data Element from the Distribution list under the "Select path name" drop-down.

Steps to create Data lineage for Distributions

  1. Go to the Distribution for which the data lineage relationship will be created. Click the Actions drop-down and select "Open item editor".
  2. Go to the Components tab and scroll down to Provenance.
  3. Under the Source Distribution tab, select the source Distribution(s) from the searchable list to create the data lineage relationship.
  4. Select Save Changes to create the relationship.

Graphical Presentation of the Data Lineage Relationship

  1. Go to the page of the metadata item whose data lineage relationship is to be viewed.
  2. Select the Graphs tab on the metadata item page.
  3. Click "Data Lineage" in the Graphs tab.
  4. The data lineage relationships are displayed under the "Data Lineage" section.
  • Blue containers: The blue containers in this visualisation represent the Datasets in the Registry. Users can click the Distributions (in green) to view the Data Elements in each Distribution.
  • Users can view the data lineage relationship at different levels under the Graphs tab:
    • Relationship at the Dataset level: The user can evaluate the data lineage relationship at the Dataset level (represented by the direction of the blue arrows in the Registry).
    • Relationship at the Distribution level: The user can evaluate the data lineage relationship at the Distribution level (represented by the direction of the green arrows in the Registry).
    • Relationship at the Data Element level: The user can evaluate the data lineage relationship at the Data Element level (represented by the direction of the grey arrows in the Registry).
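
To make the three levels concrete, the sketch below builds Graphviz DOT text with one arrow colour per level, mirroring the blue, green and grey arrows described above. It is an illustration only and does not reflect how the Registry renders its graphs; the function `to_dot`, the `LEVEL_COLOURS` mapping and the item names are hypothetical, and the choice to draw arrows from the source item to the derived item is an assumption.

```python
# Arrow colours per lineage level, following the description of the Data Lineage graph.
LEVEL_COLOURS = {
    "dataset": "blue",
    "distribution": "green",
    "data_element": "grey",
}


def to_dot(edges):
    """Build Graphviz DOT text from (level, source, target) tuples.

    Assumption: arrows are drawn from the source item to the derived item.
    """
    lines = ["digraph lineage {"]
    for level, source, target in edges:
        colour = LEVEL_COLOURS[level]
        lines.append(f'  "{source}" -> "{target}" [color={colour}];')
    lines.append("}")
    return "\n".join(lines)


# Illustrative item names only; they do not come from the Registry.
edges = [
    ("dataset", "Raw Survey Dataset", "Annual Report Dataset"),
    ("distribution", "raw.csv", "report.csv"),
    ("data_element", "raw.csv / age", "report.csv / age_group"),
]
print(to_dot(edges))
```

The printed text can be pasted into any Graphviz renderer to see the three arrow colours side by side.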