From Data Silos to Data Freedom

In our first blog on data governance, we explored how unorganized data fractures analytics capabilities. In our second, we described the components of a successful data infrastructure. Now, to wrap up this series, we’ll outline a modern, general-purpose architecture for realizing the value of data at scale.

The scope of data analytics projects varies widely—from automating dashboards to deploying production-grade machine learning models. We see three project levels.

Analytics Project Levels

1. Business Analytics: Small analytics initiatives include business intelligence efforts such as dashboarding, organizing Excel spreadsheets, basic automation, and collecting data via surveys, sensors, or outputs from products.

2. Data Science: Medium to large organizations can dedicate data science experts to advanced analytics and model building. Gains in efficiency, cost savings, and business insight make the investment worthwhile.

3. Production-Level Model Building: Large organizations are heavily investing in creating fully automated pipelines from the data sources to the output of machine learning models. This requires expertise in data science, data engineering, MLOps, and software engineering.

Data pipelines at every maturity level need tools to support three overarching layers:

1. Data Sources

In this layer, anything that produces data is considered a data source—from applications and hardware to client surveys and quarterly reports.

2. Data Management

This layer focuses on properly storing, validating, re-engineering, and cataloging data. Good management is critical for extracting valuable insights.

3. Consumption

In this layer, data is ready to be leveraged in predictive modeling, applications, and dashboards—driving insights and decisions.
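To make the three layers concrete, here is a minimal Python sketch of one record's path from source to consumption. It is an illustration under simple assumptions, not a reference implementation: the survey records, the `DataManagementLayer` class, and the single "score must be present" check are all hypothetical stand-ins for real sources, storage, and governance rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

# Hypothetical record type: each record is a plain dict of field -> value.
Record = dict


@dataclass
class DataManagementLayer:
    """Layer 2: keeps records that pass every check, sets the rest aside."""
    validators: list[Callable[[Record], bool]]
    store: list[Record] = field(default_factory=list)
    rejected: list[Record] = field(default_factory=list)

    def ingest(self, records: Iterable[Record]) -> None:
        for record in records:
            if all(check(record) for check in self.validators):
                self.store.append(record)
            else:
                self.rejected.append(record)


def survey_source() -> list[Record]:
    # Layer 1 (data source): a stand-in for a client survey export.
    return [
        {"region": "east", "score": 4},
        {"region": "west", "score": 5},
        {"region": "east", "score": None},  # incomplete response
    ]


def average_score_by_region(records: Iterable[Record]) -> dict[str, float]:
    # Layer 3 (consumption): a dashboard-style aggregate over managed data.
    scores: dict[str, list[int]] = {}
    for record in records:
        scores.setdefault(record["region"], []).append(record["score"])
    return {region: sum(vals) / len(vals) for region, vals in scores.items()}


management = DataManagementLayer(validators=[lambda r: r.get("score") is not None])
management.ingest(survey_source())
print(average_score_by_region(management.store))  # {'east': 4.0, 'west': 5.0}
```

The point of the separation is that the consumption code never sees the incomplete survey response: validation happens once, in the management layer, instead of being repeated by every dashboard or model that reads the data.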

Data Sources

Data comes from many directions: phones, websites, the cloud, customers, apps, sensors, sales, and more. Each source has its own requirements for ingestion, use, and retention, and keeping track of all of them is a real challenge for businesses. Leaders should work with data stewards across the organization to identify their most valuable data and support that data’s unique needs.
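One lightweight way to keep track of sources is a shared inventory that records each source's steward, ingestion mode, retention window, and business priority. The sketch below assumes such an inventory; the `DataSource` fields, the source names, and the retention and priority values are illustrative placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """One entry in a hypothetical inventory of data sources."""
    name: str
    steward: str         # team accountable for the source
    ingest_mode: str     # e.g. "batch" or "stream"
    retention_days: int  # how long raw data is kept (illustrative values)
    priority: int        # 1 = most valuable to the business


SOURCES = [
    DataSource("web_analytics", "marketing", "stream", 90, 2),
    DataSource("sales_orders", "finance", "batch", 2555, 1),
    DataSource("customer_surveys", "customer_experience", "batch", 365, 3),
]

# Index by name so any team can look up a source's requirements.
REGISTRY = {source.name: source for source in SOURCES}


def most_valuable(sources: list[DataSource], top_n: int) -> list[str]:
    """Return the names of the top_n sources, highest business priority first."""
    ranked = sorted(sources, key=lambda source: source.priority)
    return [source.name for source in ranked[:top_n]]


def expired(source: DataSource, age_days: int) -> bool:
    """True if data of this age has passed the source's retention window."""
    return age_days > source.retention_days


print(most_valuable(SOURCES, 2))                  # ['sales_orders', 'web_analytics']
print(expired(REGISTRY["web_analytics"], 120))    # True
```

In practice this inventory would more likely live in a data catalog tool than in code, but the idea is the same: the priority field captures the leaders' and stewards' decision about which data matters most.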

Data Management

Every team can define its own data needs based on its mission, but those needs must be holistically aligned with the rest of the organization. Ideally, key data from all teams feeds into a central location that can be treated as the source of truth.

The heavy lifting for data management and governance takes place early in the process. As your organization’s data governance takes shape, many data processes become standardized, and automation becomes easier to implement. Once the technical foundation is solid, business leaders can readily adjust rules and standards as needed.
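A common way to let rules change without re-engineering pipelines is to keep them as configuration that the validation code reads. The sketch below assumes that approach; the rule keys (`required`, `contains`, `min`, `max`) and the column names are hypothetical, chosen only to illustrate the pattern.

```python
from typing import Any

# Hypothetical governance rules, kept as data so they can be adjusted
# without touching the pipeline code that enforces them.
RULES: dict[str, dict[str, Any]] = {
    "customer_id": {"required": True},
    "email": {"required": True, "contains": "@"},
    "age": {"required": False, "min": 0, "max": 120},
}


def violations(record: dict[str, Any], rules: dict[str, dict[str, Any]]) -> list[str]:
    """Return a human-readable list of rule violations for one record."""
    problems = []
    for column, rule in rules.items():
        value = record.get(column)
        if value is None:
            if rule.get("required"):
                problems.append(f"{column}: missing")
            continue  # optional and absent: nothing else to check
        if "contains" in rule and rule["contains"] not in str(value):
            problems.append(f"{column}: must contain {rule['contains']!r}")
        if "min" in rule and value < rule["min"]:
            problems.append(f"{column}: below {rule['min']}")
        if "max" in rule and value > rule["max"]:
            problems.append(f"{column}: above {rule['max']}")
    return problems


record = {"customer_id": 17, "email": "pat.example.com", "age": 130}
print(violations(record, RULES))  # ["email: must contain '@'", 'age: above 120']
```

If the business later decides the age limit should be 150, only the `RULES` entry changes; the `violations` function and every pipeline that calls it stay as they are.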

Consumption

Once data is organized, it is ready to be used. Team members from all over the organization aim to use data in valuable ways—from gaining business insights to building complex machine learning models.

But data teams often disagree on how the data should be organized. Figure 1 shows how messy a data architecture can look even with only a few data sources.

[Figure 1: Data process with ungoverned data practices]

We see systems like the one in Figure 1 quite often. Teams work within their silos, organizing the data however they want, and data engineering efforts are duplicated in multiple places. It is difficult for the business as a whole to manage its data ecosystem, and even harder to extract value from the data.

For data consumers, the result is a complex network of permissions, data sources, and personnel to navigate before they can access the data they need. Many data science teams must repeat this process across departments, even for a single initiative.

A data science professional’s time is best spent analyzing the data, not reworking it to compensate for poor infrastructure. Building a solid data architecture empowers your teams to focus on their best work to support the enterprise.

Figure 2 illustrates the simplicity of having centralized data storage:

[Figure 2: Data process with governed and structured data practices]

Though the storage structure is centralized, individual data teams can still act independently. When a team wants to publish valuable data to the wider organization, it connects to the enterprise ecosystem. As teams align around the enterprise data governance standards, they can more easily work together to ensure those standards are met.

Consolidating data efforts increases efficiency across the organization. Centralizing security around the company’s most valuable data assets also ensures that those assets are consistently protected.

Once a team’s data products meet the required enterprise standards, they can be made discoverable in a marketplace or catalog, a one-stop shop for valuable datasets. This can spark internal innovation by combining data that would not normally be connected.
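The publish-then-discover flow can be sketched as a catalog that accepts a data product only when it carries the metadata the enterprise standard requires. This is a minimal in-memory illustration; the `REQUIRED_METADATA` fields, the `DataProduct` and `Catalog` classes, and the example products are assumptions, and a real catalog would also check data quality, access rights, and lineage.

```python
from dataclasses import dataclass, field

# Hypothetical enterprise standard: metadata every published dataset must carry.
REQUIRED_METADATA = ("owner", "description", "schema", "refresh_schedule")


@dataclass
class DataProduct:
    name: str
    metadata: dict[str, str]
    tags: set[str] = field(default_factory=set)


class Catalog:
    """In-memory catalog: publish only compliant products, then search by tag."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> list[str]:
        """Publish the product if compliant; return any missing metadata fields."""
        missing = [key for key in REQUIRED_METADATA if not product.metadata.get(key)]
        if not missing:
            self._products[product.name] = product
        return missing

    def search(self, tag: str) -> list[str]:
        """Return the names of published products carrying the given tag."""
        return sorted(p.name for p in self._products.values() if tag in p.tags)


catalog = Catalog()
churn = DataProduct(
    "customer_churn",
    {"owner": "cx-analytics", "description": "Monthly churn by segment",
     "schema": "segment:str, month:date, churn_rate:float",
     "refresh_schedule": "monthly"},
    {"customers", "retention"},
)
draft = DataProduct("web_sessions_raw", {"owner": "marketing"}, {"customers"})

print(catalog.publish(churn))       # []
print(catalog.publish(draft))       # ['description', 'schema', 'refresh_schedule']
print(catalog.search("customers"))  # ['customer_churn']
```

The draft dataset stays in its team's own ecosystem until it meets the standard, which mirrors the idea that teams work independently and connect to the enterprise ecosystem only when they are ready to publish.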

Even more importantly, with a structured flow like the one in Figure 2, your analytics teams are freed from hunting down the data they need. With your organization’s data at their fingertips, they can focus on what they do best: extracting valuable insights to make your business better.

Every organization needs to assess its data needs and build an architecture that complements them. Mature data teams build pipelines from their many data sources into one or more ecosystems. This allows teams to work with their data as they see fit. Then, when they are ready to publish their data to the broader enterprise, established ecosystems with governance rules are already in place.

As datasets meet the enterprise standards, they become available for wider use cases and forms of consumption, expanding what your teams can do with data while setting you up for the future.

Conclusion

Teams have their own data ecosystems for good reasons. Our goal isn’t to disrupt how a team operates but rather to streamline as much of the data processing and organization as possible at a larger scale.

Aligning teams around digital transformation requires good communication, documentation, and transparent discussions. As teams get more integrated with the primary data ecosystem, they can begin to offload some of the work (and costs) from their own ecosystems.

If you’re looking to enhance your organization’s data governance efforts, our team is here to help. We’d be glad to provide you with a thoughtful, proven approach tailored to your organization’s goals.

Author

Jacob Turney

Former Technical Business Analyst