At Elder Research, we have been working with a technical software company to streamline how they use customer satisfaction survey data. The client wants to give consistent attention to certain topics customers mention in survey free text. Our Data Scientists created a two-fold solution: a text-classification model to automatically flag important topics and a dashboard where the Customer Satisfaction Team explores the classified data.
At a basic level, this customer survey dashboard requires:
- A flow of customer survey data to display
- “Clean enough” data, as judged by those who use it
- Output from the text-classification model for display with the surveys
- A consistent location where it can source data
- The data to be organized into a consistent schema designed to feed the dashboard
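To make the last requirement concrete, here is a minimal sketch of what a "consistent schema" for one dashboard row might look like. The class name `DashboardRecord` and every field name are hypothetical illustrations, not the schema used on the project.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record shape for one survey row feeding the dashboard.
# None of these field names come from the actual project.
@dataclass(frozen=True)
class DashboardRecord:
    survey_id: int
    submitted_on: date
    response_text: str
    flagged_topics: tuple = ()  # topics assigned by the text-classification model

    def __post_init__(self):
        # A simple "clean enough" rule: reject rows with no usable free text.
        if not self.response_text.strip():
            raise ValueError("response_text must not be empty")
```

Enforcing the shape at the point where records are built means the dashboard can rely on every row having the same fields.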
A Data Engineer recognizes and systematizes requirements like these. The resulting work product is commonly called a data pipeline, which often looks like:
For this project our Data Engineers surfaced and answered questions such as:
What is the source of the customer survey data?
(It was being delivered by another team and the schema for the delivery needed to be specified.)
What cleaning steps are required to prepare the data for modeling and visualization?
Where will the survey data be stored to allow access by the model and dashboard?
(We provided cloud analytics services to design a relational database within the client’s AWS infrastructure using Amazon Relational Database Service tools. Had the survey data been massive, it would have been the Data Engineer’s job to select more appropriate tooling.)
How will the model’s output and the accompanying survey data be organized?
(Our team designed an output schema tailored for the dashboard.)
How will the model and its outputs integrate into the data pipeline?
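As a rough illustration of how the questions above fit together, the sketch below chains a cleaning step, a model step, and a load step into one small pipeline. It rests on several assumptions: SQLite from Python's standard library stands in for the relational database, a keyword lookup stands in for the text-classification model, and the table, column, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical keywords; the real project used a trained text-classification model.
TOPIC_KEYWORDS = {
    "licensing": ("license", "activation"),
    "performance": ("slow", "crash", "lag"),
}


def clean_text(raw):
    """Collapse runs of whitespace and trim; return None for empty responses."""
    if raw is None:
        return None
    text = " ".join(raw.split())
    return text or None


def classify(text):
    """Stand-in for the model: return the sorted topics whose keywords appear."""
    lowered = text.lower()
    return sorted(
        topic
        for topic, words in TOPIC_KEYWORDS.items()
        if any(word in lowered for word in words)
    )


def run_pipeline(conn, raw_surveys):
    """Clean, classify, and store surveys; return the number of rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS survey_topics ("
        "survey_id INTEGER PRIMARY KEY, "
        "response_text TEXT NOT NULL, "
        "topics TEXT NOT NULL)"
    )
    rows = []
    for survey_id, raw in raw_surveys:
        text = clean_text(raw)
        if text is None:
            continue  # skip surveys with no usable free text
        rows.append((survey_id, text, ",".join(classify(text))))
    with conn:  # commit all rows in a single transaction
        conn.executemany(
            "INSERT OR REPLACE INTO survey_topics VALUES (?, ?, ?)", rows
        )
    return len(rows)
```

Calling `run_pipeline(sqlite3.connect(":memory:"), surveys)` with a list of `(survey_id, free_text)` pairs fills a table that a dashboard could query directly. Keeping each stage as its own function makes it straightforward to swap the keyword stand-in for a real model without touching the cleaning or storage code.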
The Data Science work for this project looked like this:
And the Data Engineering work looked like this:
Data Scientists surface insights and perspective to inform decision-making. Data Engineers reliably feed and integrate Data Science work products into an organization’s data infrastructure. Data Engineering naturally augments and serves the mission of Data Science, amplifying its power.