With more organizations discovering the value of using data science to make better decisions, new opportunities are emerging for Data Engineers to provide support and integration for analytics teams. What’s valuable about Data Engineering skills?
The Data Cycle
Data in an organization is like water: people collect it, pipe it around, store it, use it, and sometimes flush it down the drain. Much like the water cycle scientists study, an organization has a data cycle.
A good Data Engineer excels at two related tasks: safely transporting data while also tailoring the data to its use case on arrival. Though a Data Engineer focuses on the middle of the data cycle, their work is meaningful only in light of the entire cycle.
Dynamic Duo: Data Engineering and Data Science
At Elder Research, we have been working with a technical software company to streamline how they use customer satisfaction survey data. The client wants to give consistent attention to certain topics customers mention in survey free text. Our Data Scientists created a two-fold solution: a text-classification model to automatically flag important topics and a dashboard where the Customer Satisfaction Team explores the classified data.
At a basic level, this customer survey dashboard requires:
- A flow of customer survey data to display
- “Clean enough” data, as judged by those who use it
- Output from the text-classification model for display with the surveys
- A consistent location from which the dashboard can source data
- Data organized into a consistent schema designed to feed the dashboard
A Data Engineer recognizes and systematizes requirements like these. The resulting work product is commonly called a data pipeline, which often looks like:
For this project our Data Engineers surfaced and answered questions such as:
- What is the source of the customer survey data? (It was being delivered by another team and the schema for the delivery needed to be specified.)
- What cleaning steps are required to prepare the data for modeling and visualization?
- Where will the survey data be stored to allow access by the model and dashboard? (We provided cloud analytics services to design a relational database within the client’s AWS infrastructure using Amazon Relational Database Service tools. Had the survey data been massive, it would have been the Data Engineer’s job to select more appropriate tooling.)
- How will the model’s output and the accompanying survey data be organized? (Our team designed an output schema tailored for the dashboard.)
- How will the model and its outputs integrate into the data pipeline?
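To make these questions concrete, here is a minimal sketch of how the answers might fit together in code. It is an illustration under stated assumptions, not the pipeline built for this client: the field names (`survey_id`, `customer`, `score`, `comment`), the 0 to 10 score range, the topics, and the keyword-based `classify` function are all hypothetical, and Python's built-in `sqlite3` stands in for the relational database hosted on Amazon RDS. A real project would call the trained text-classification model where `classify` appears.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical topics and keywords. This stands in for the trained
# text-classification model, which the sketch does not reproduce.
TOPIC_KEYWORDS = {
    "pricing": ("price", "cost", "expensive"),
    "support": ("support", "help desk", "ticket"),
}


@dataclass
class Survey:
    """Assumed delivery schema for one survey response."""
    survey_id: int
    customer: str
    score: int
    comment: str


def clean(raw: dict) -> Optional[Survey]:
    """Return a Survey, or None when the record is not clean enough."""
    try:
        survey_id = int(raw["survey_id"])
        score = int(raw["score"])
    except (KeyError, TypeError, ValueError):
        return None
    customer = str(raw.get("customer") or "").strip()
    # Collapse runs of whitespace in the free-text comment.
    comment = " ".join(str(raw.get("comment") or "").split())
    if not customer or not 0 <= score <= 10:
        return None
    return Survey(survey_id, customer, score, comment)


def classify(comment: str) -> List[str]:
    """Placeholder classifier: flag topics whose keywords appear."""
    text = comment.lower()
    return sorted(
        topic
        for topic, keywords in TOPIC_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    )


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the assumed output schema the dashboard would read from."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS survey (
            survey_id INTEGER PRIMARY KEY,
            customer  TEXT NOT NULL,
            score     INTEGER NOT NULL,
            comment   TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS survey_topic (
            survey_id INTEGER NOT NULL REFERENCES survey(survey_id),
            topic     TEXT NOT NULL,
            PRIMARY KEY (survey_id, topic)
        );
        """
    )


def run_pipeline(conn: sqlite3.Connection, raw_records: Iterable[dict]) -> int:
    """Clean, classify, and store records; return the number loaded."""
    create_schema(conn)
    loaded = 0
    for raw in raw_records:
        survey = clean(raw)
        if survey is None:
            continue  # skip records that fail the cleaning rules
        with conn:  # one transaction per survey
            conn.execute(
                "INSERT OR REPLACE INTO survey VALUES (?, ?, ?, ?)",
                (survey.survey_id, survey.customer, survey.score, survey.comment),
            )
            # Remove stale topics so re-running the pipeline stays consistent.
            conn.execute(
                "DELETE FROM survey_topic WHERE survey_id = ?", (survey.survey_id,)
            )
            conn.executemany(
                "INSERT INTO survey_topic VALUES (?, ?)",
                [(survey.survey_id, topic) for topic in classify(survey.comment)],
            )
        loaded += 1
    return loaded
```

Calling `run_pipeline(sqlite3.connect(":memory:"), records)` with a list of dictionaries loads the valid records and returns how many were stored. Keeping the topics in their own table is one possible design choice: it lets a dashboard filter or count by topic with a simple join, at the cost of a second table to maintain.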
The Data Science work for this project looked like this:
And the Data Engineering work looked like this:
Data Scientists surface insights and perspective to inform decision-making. Data Engineers reliably feed and integrate Data Science work products into an organization’s data infrastructure. Data Engineering naturally augments and serves the mission of Data Science, amplifying its power.