Why it's important to map data dependencies and how to get it done

Understanding how data flows and interacts with the systems in your stack has great benefits.
The benefits of dependency mapping
What to consider before dependency mapping
The three most common data dependency mapping techniques
Tools for mapping data dependencies
Why data dependency mapping might be right for you

At some point, you will be working with a messy, disorganized tech stack. Maybe your organization started using new products before considering how they interacted with others. Or you inherited someone else’s code. 🤯 Mapping data dependencies will show you and your team how data flows and interacts with the systems in your stack.

When data proliferates unchecked, companies lose money and become more exposed to security vulnerabilities and costly regulatory problems.

According to Hiscox, “The mean figure for losses associated with all cyber incidents among firms reporting attacks has risen from $229,000 last year [2018] to $369,000 [2019] — an increase of 61%.” 💰

Having a data dependency map will not only help you better understand your tech stack, but it will also allow you to make more informed decisions going forward.

Here’s what you can do to help clean things up. 🧹

The benefits of dependency mapping

At first, it might seem like a lot of extra work to set up—and it can be—but there are clear reasons why you should create a data dependency map.

Picture courtesy of Pathway Systems

A data dependency map gives you full awareness of your data and helps data teams design better tracking plans. It also helps ensure that updating or removing analytics code won’t break any of your tracking systems. This matters most when you change code at the source, because the change has implications for downstream systems. Seeing where dependent systems might break before you make a change will save you and your team time.
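
To make the idea of downstream implications concrete, here is a minimal sketch in Python. It assumes a made-up stack (the system names `web_app`, `event_tracker`, and so on are hypothetical, as is the `affected_by` helper) and simply walks the map to list everything that sits downstream of a change.

```python
from collections import deque

# Hypothetical stack: each key sends data to the systems in its list.
DOWNSTREAM = {
    "web_app": ["event_tracker"],
    "event_tracker": ["warehouse", "email_tool"],
    "warehouse": ["dashboard"],
    "email_tool": [],
    "dashboard": [],
}


def affected_by(change_at, graph):
    """Return every system downstream of `change_at`, nearest first."""
    seen = []
    queue = deque(graph.get(change_at, []))
    while queue:
        system = queue.popleft()
        if system in seen:
            continue  # already recorded via another path
        seen.append(system)
        queue.extend(graph.get(system, []))
    return seen


# A change to the tracker touches the warehouse, the email tool, and the dashboard.
print(affected_by("event_tracker", DOWNSTREAM))
```

Even a small dictionary like this one, kept up to date, can answer the question "what might break if I change this?" before the change ships.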

Sounds great, right? 🤩 There are a lot of benefits that come from making a data dependency map.

Better understanding of the technology environment

A well-designed map allows anyone to easily see how the systems interact, helping you track which systems interact with data and where the data goes, step-by-step. Something like this can show you exactly how essential each platform is in your stack.

Picture courtesy of IntelligenceBank

This helps in planning future products or components as well, as you can see where they can aid in data integration or migration.

Improved accuracy

Mapping out your data dependencies helps you maintain data accuracy as data moves from its source to its destination. It’s as simple as that: with thorough data mapping, you can feel confident in the quality of the data in your warehouse.

By giving your team a complete view of your infrastructure and dependencies, you can track how each component works with the others.

Picture courtesy of V3B.com

You can also use the map to identify the root causes of application disruptions. If you're having an issue with an application, you can start from where it originated and move back along the map to see if there's a specific root cause. Is it in the infrastructure? An application? An outside threat?
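
Moving back along the map can be sketched in a few lines of Python. The example below is only an illustration: the systems, the `HEALTHY` status table, and the `root_causes` helper are all hypothetical, and in practice the health information would come from your monitoring tools.

```python
# Hypothetical map: each key reads data from the systems in its list.
UPSTREAM = {
    "dashboard": ["warehouse"],
    "warehouse": ["etl_job"],
    "etl_job": ["orders_db", "payments_api"],
    "orders_db": [],
    "payments_api": [],
}

# Hypothetical health checks (True means the system looks fine).
HEALTHY = {
    "dashboard": False,
    "warehouse": False,
    "etl_job": False,
    "orders_db": True,
    "payments_api": False,
}


def root_causes(failing, upstream, healthy):
    """Walk back from a failing system to the broken systems with no broken upstream."""
    causes, stack, visited = [], [failing], set()
    while stack:
        system = stack.pop()
        if system in visited:
            continue
        visited.add(system)
        broken_parents = [p for p in upstream.get(system, []) if not healthy[p]]
        if broken_parents:
            stack.extend(broken_parents)  # the problem started further back
        else:
            causes.append(system)  # nothing broken upstream: likely origin
    return causes


print(root_causes("dashboard", UPSTREAM, HEALTHY))  # ['payments_api']
```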

Easier to identify risks

Mapping out your data dependencies gives users clear visibility into your tech stack, which can help determine possible failure points that put your business at risk. If done properly, data mapping can be an effective tool for your organization as it typically helps a company in the following areas:

  • Data quality: As the sheer volume of data sources increases, data mapping is more complex than ever. Mapping out data dependencies closes the gap between data models, ensuring that data stays accurate and accessible for decision makers to analyze as it moves throughout your stack.
  • Cyber attacks and data breaches: As companies derive insights from data, protecting users' information has become a must for organizations. A data map can help an organization identify where key data sets are stored, processed, and transmitted. Once organizations figure this out, they can take the necessary steps to protect sensitive information from ending up in the wrong hands.
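
As a rough illustration of the second point, the sketch below keeps a tiny inventory of data sets and reports which systems store or receive the sensitive ones. The data set names, the field names (`sensitive`, `stored_in`, `sent_to`), and the `sensitive_exposure` helper are assumptions made up for this example.

```python
# Hypothetical inventory of data sets and where they live and travel.
DATASETS = [
    {"name": "user_emails", "sensitive": True,
     "stored_in": "warehouse", "sent_to": ["email_tool"]},
    {"name": "page_views", "sensitive": False,
     "stored_in": "warehouse", "sent_to": ["dashboard"]},
    {"name": "payment_tokens", "sensitive": True,
     "stored_in": "payments_db", "sent_to": []},
]


def sensitive_exposure(datasets):
    """Map each system to the sensitive data sets it stores or receives."""
    exposure = {}
    for dataset in datasets:
        if not dataset["sensitive"]:
            continue
        for system in [dataset["stored_in"], *dataset["sent_to"]]:
            exposure.setdefault(system, []).append(dataset["name"])
    return exposure


for system, names in sensitive_exposure(DATASETS).items():
    print(f"{system}: {', '.join(names)}")
```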

What to consider before dependency mapping

Sure, you can make a physical map with sticky notes, but many tools can help you and your team create a digital version. Before getting started with data mapping, though, there are two things you should consider.

First, determine the directionality of dependency

When starting with dependency mapping, it's crucial to know which system depends on which, because that direction tells you how things will fail. By determining where things will fail, you identify vulnerabilities within your stack. When you can identify failures faster within your organization, you can find the quickest way to solve the problem at hand. This will not only save your workers time but will also save your organization money in the long run.
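
One way to picture directionality is as a set of one-way edges. The sketch below uses Python's standard `graphlib` module (available from Python 3.9) on a hypothetical four-system stack. The `failure_order` and `dependents_of` helpers are names invented for this example.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical edges, read as "consumer depends on provider".
DEPENDS_ON = {
    "dashboard": {"warehouse"},
    "warehouse": {"etl_job"},
    "etl_job": {"orders_db"},
    "orders_db": set(),
}


def failure_order(depends_on):
    """Order systems so that every provider comes before its consumers."""
    return list(TopologicalSorter(depends_on).static_order())


def dependents_of(depends_on):
    """Flip the direction: for each provider, list the consumers that break with it."""
    flipped = {system: set() for system in depends_on}
    for consumer, providers in depends_on.items():
        for provider in providers:
            flipped.setdefault(provider, set()).add(consumer)
    return flipped


print(failure_order(DEPENDS_ON))  # ['orders_db', 'etl_job', 'warehouse', 'dashboard']
print(dependents_of(DEPENDS_ON)["orders_db"])  # {'etl_job'}

# Two systems that depend on each other have no clear direction.
try:
    failure_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("circular dependency found")
```

The same edges answer two different questions depending on which way you read them: "what does this system need?" and "what breaks if this system goes down?"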

Keep it simple

While data maps should be comprehensive enough to account for many data sources, they shouldn’t be complicated to understand. Data maps should contain information relevant to your organization and be updated regularly, but there is no need to go overboard when mapping out your dependencies. A complicated data map can do more harm than good for your organization.

A data map should be simple enough for a layperson to understand, so next time there is a problem within your stack, workers can easily find the root of the problem and solve it in a reasonable amount of time.

The three most common data dependency mapping techniques

While data mapping varies by the complexity of your organization’s tech stack, these three data dependency mapping techniques are the most common among companies.

1. Manual mapping

Most data systems have grown to a point where they are now too complicated to track manually. However, manual mapping is a great place to start if your data system is small and you don't expect it to grow.

With manual mapping, developers hand-code the connections between data sources in languages such as SQL, C++, XSLT, and Java. This approach is doable, but it requires a lot of work upfront and will not be as effective as schema or automated mapping.
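
For a sense of what a hand-coded map looks like, here is a small sketch in Python (chosen only for illustration; the source names other languages). The field names, the `FIELD_MAP` table, and the `map_record` helper are all hypothetical.

```python
# Hypothetical hand-written map: source field -> (target field, converter).
FIELD_MAP = {
    "cust_id": ("customer_id", int),
    "email_addr": ("email", str.lower),
    "signup_dt": ("signed_up_on", str),
}


def map_record(source_row, field_map):
    """Apply the hand-written map to one source row and report unmapped fields."""
    target_row, unmapped = {}, []
    for field, value in source_row.items():
        if field not in field_map:
            unmapped.append(field)  # someone has to decide what to do with these
            continue
        target_field, convert = field_map[field]
        target_row[target_field] = convert(value)
    return target_row, unmapped


row = {"cust_id": "42", "email_addr": "Ada@Example.com", "plan": "pro"}
print(map_record(row, FIELD_MAP))
# ({'customer_id': 42, 'email': 'ada@example.com'}, ['plan'])
```

Every new source field means another line a person has to write and maintain, which is why this approach tends to stop scaling as the stack grows.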

2. Schema mapping

Schema mapping software compares data sources to the target schema, generating connections. After that is complete, a developer must manually go into the software and verify the information is correct and make changes where needed.

Once the data map is complete, the software generates code to load the data. This is often referred to as a semi-automated strategy as it relies on workers to double-check the work done by the software before moving forward.
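
The semi-automated flow can be sketched roughly as follows: software proposes connections, and anything it cannot match is left for a person to review. This is a toy illustration using name similarity from Python's standard `difflib` module, not how any particular product works. The `normalize` and `propose_mapping` helpers and the column names are assumptions.

```python
import difflib


def normalize(name):
    """Lowercase a column name and drop separators so 'Customer_ID' becomes 'customerid'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def propose_mapping(source_columns, target_columns, cutoff=0.6):
    """Suggest a target column for each source column; None means 'needs a human'."""
    normalized_targets = {normalize(t): t for t in target_columns}
    proposals = {}
    for column in source_columns:
        match = difflib.get_close_matches(
            normalize(column), list(normalized_targets), n=1, cutoff=cutoff
        )
        proposals[column] = normalized_targets[match[0]] if match else None
    return proposals


proposals = propose_mapping(["Customer_ID", "E-Mail", "zzz"], ["customer_id", "email"])
for source, target in proposals.items():
    status = target if target else "REVIEW MANUALLY"
    print(f"{source} -> {status}")
```

The `cutoff` value controls how eager the matcher is. A lower value produces more suggestions but also more wrong ones, so the manual verification step stays essential.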

3. Automated mapping

Companies are leaning toward automated solutions because no coding experience is needed to use them. Users of these tools drag and drop lines between databases, which makes it easier for workers to map out relationships in a reasonable amount of time.

It is important to check for data accuracy with this method as there is still a small chance of human error.
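
A basic accuracy check might compare what left the source with what arrived at the destination. The sketch below is one simple way to do that, assuming each row carries a unique `id` field. The `fingerprint` and `compare` helpers are hypothetical names.

```python
import hashlib


def fingerprint(row):
    """Hash a row's values in sorted-key order so field order does not matter."""
    canonical = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare(source_rows, destination_rows, key="id"):
    """Return ids that are missing from the destination or whose contents differ."""
    dest = {row[key]: fingerprint(row) for row in destination_rows}
    missing = [row[key] for row in source_rows if row[key] not in dest]
    changed = [
        row[key]
        for row in source_rows
        if row[key] in dest and dest[row[key]] != fingerprint(row)
    ]
    return missing, changed


source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
destination = [{"id": 1, "v": "a"}, {"id": 2, "v": "B"}]
print(compare(source, destination))  # ([3], [2])
```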

Tools for mapping data dependencies

Fortunately, there are many tools available that can aid you when mapping out your data dependencies. Here are a few we recommend, ranging from free and open-source options to paid solutions.

  • Datafold: An up-and-coming data lineage company that helps businesses visualize their data ecosystem. It assures companies that a change to the schema of one table will not affect functionality elsewhere. While they offer a free version for businesses, their paid solution offers various benefits, including Slack integration and live in-product chat support.
  • Monte Carlo: A data lineage company that alerts your organization when data breaks so you can fix it before it reaches the end user. It is a fully automated solution that covers your whole data stack. Monte Carlo is a paid solution that allows businesses to start with a free trial.
  • Datadog: Their APM tool enables organizations to understand service dependencies while monitoring them in real time and alerting users when a system is down. They offer a free trial for up to 14 days.
  • Prometheus: An open-source solution that enables you to monitor application performance. The solution is known for its high reliability and uptime. Prometheus will alert you to any major changes in behavior in your applications, so you can investigate the problem.

Why data dependency mapping might be right for you

Any company that is truly data-driven should be mapping out its data dependencies. Data that is poorly mapped or not mapped at all will eventually lead to issues downstream as data travels end to end within your organization. Mapping out your data dependencies can be a daunting task, especially when you rely on data to make informed business decisions. 😬

Think of mapping your data dependencies as a task your future self will thank you for. 🙏 We are not perfect. Data is bound to break at some point, regardless of how flawless we think our current solution is, and you know what? That is okay. The process of mapping out your data dependencies will help ensure that when data does break, it does not lead to a bigger problem down the line. Take the time to map out your data dependencies; it will save you lots of time hunting down which other systems are affected by a failure. When done correctly, data mapping helps keep your organization’s data both correct and reliable.

Has your organization started mapping your data dependencies? Do you have any lessons you wish to share with the rest of the community? Drop us an email at team@iterative.ly.