There is no doubt that a successful Data Scientist must be proficient in programming, modeling, and data munging (extracting, cleaning, and feature engineering data). However, there is another key skill that is often overlooked: the ability to communicate findings clearly and effectively. If you as a Data Scientist cannot win the business buy-in needed to effect change, your powerful model will collect dust on a shelf. Stakeholders will only trust your model if they understand the value it adds, what has been done to create it, and why it works. They should not be left to blindly trust you and your “black box.” The solution is data storytelling: using the power of narrative to communicate your findings in a way that resonates with your stakeholders. Done well, it combines your data science expertise with intuitive visualizations and, most importantly, a story to connect the dots.
Data storytelling frequently employs data visualizations, but it involves much more than presenting a graph. Data visualization is often static: a chart may represent a single facet of the data, or layers of features for a more complex concept. Or, it can be an interactive dashboard where the viewer is free to experiment with different scenarios and reach their own conclusions. Data storytelling takes these ideas a step further. It guides the viewer through the process of formulating a question and leads them towards the desired conclusion in a step-by-step fashion. In short, it takes the viewer on a journey through the data. This difference between data visualization and data storytelling is captured in Moritz Stefaner’s analogy comparing data visualization to portraits:
“[Data] can reveal stories, help us tell stories, but they are neither the story itself nor the storyteller. Portraits have no story to them either. Like a photo portrait of a person, a visualization portrait of a data set can allow you to capture many facets of a bigger whole, but there is not a single story there, either.” 
Data storytelling marries data visualization with a guided narrative. It pairs the data and the graphics with words that not only describe what can be seen in the image but also tell a story that leads you through the analysis process. A narrative “is the way we simplify and make sense of a complex world,” and data is certainly a complex world to understand.
So then, what does good storytelling entail?
First and foremost, it involves a good story, one focused on a very clear “data ask,” much like a thesis statement in a paper. Let this ask lead the direction of the story in the same way it leads your work as a Data Scientist. Strip out the extraneous tangents you encountered along the way, and summarize your ask rather than dwelling on the complex details of the analytics.
The arc of a good story includes an exposition, rising action, climax, falling action, and conclusion. The exposition sets the data stage: what universe of data is being examined? The rising action explores the data, building up to the questions and feeding the viewer the data ask. Your questions can include: What is happening? To whom? Where? When? Why or how? The climax is the pivotal discovery in the data that makes it possible to answer these questions, and the falling action is the ultimate answer. Lastly, and most importantly, comes the conclusion: what is the one thing you want the viewer to leave with? To keep the story cohesive, everything should build up to and support this conclusion; anything extraneous should be stripped away.
Additionally, if it is relevant to the data at hand, create a character to follow through this process and trace what their experience of the data would be. For instance, if dealing with churn, create a customer and follow their path: point out this character’s possible motivations for leaving (since their behavior is the problem to solve), present the action the company could take, and show how that action affects the character’s probability of staying with the company. A character makes the narrative easier to follow because viewers can imagine themselves in that role. William Proffitt’s blog post “Taking Action on Technical Success: A Fable of Data Science and Consequences” is a good example of using a character to illustrate a data story.
Because data storytelling builds on data visualization, excellent visualizations are essential. They are the foundation of data storytelling; without strong data visualizations to support it, your story will crumble. This foundation involves more than making sure parts are labeled correctly; you also need to choose the best visualization for the task, avoid cluttering the graph, and ensure that your figure tells a truthful story.
The first step is picking the right visualization form for the job, guided by the data ask and the conclusion. Are you showing differences over time, differences between categories, or differences based on location? Variations of line graphs are good for showing changes over time because they connect the dots and illustrate the peaks, falls, and growth rate (or lack thereof). Comparisons between distinct categories are often made with bar graphs or other graphs that show size and proportionality. If geographic location is important, it is a good idea to actually visualize these locations with a map (Figure 1). Additionally, if your story involves comparing entities, ensure that the structure of your graph allows them to be compared easily (Figure 2). These are some of the basic types of visualizations, but they can be combined or enhanced to show more detailed and complex points.
Figure 1 A map showing the dispersion of individuals impacted by Hurricane Katrina. This shows the relative distance that people have traveled, and also uses the size of points to illustrate the volume of individuals in a location. 
Figure 2 These graphs show a company’s sales across the different months in its different locations (shading of bars). The first graph focuses on answering the question of which locations have the better sales and when. The second graph focuses more on which months overall the company has better sales. 
Once you have chosen the visualization for the job, you need to be sure that it is presented in a non-deceptive manner. An important factor in this step is displaying your axes correctly. Bar graphs use the length of bars to show differences, so it is best to show the full length of the bars by starting the axis at zero. With line graphs, it is usually acceptable not to start at zero as long as this is clearly noted; line graphs are primarily meant to show increases and decreases over time rather than absolute size. However, when displaying multiple trends on the same line graph, keep your axes consistent to avoid implying misleading relationships. More generally, whenever size is used to encode a value, the full size and area of each shape must be proportional to the data. This is particularly important with bubble graphs and other shaped markers. Setting a bubble’s radius or diameter from the data exaggerates the differences between bubble areas; the data should instead determine each bubble’s area, since area is closer to what the viewer perceives as the measured dimension.
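The zero-baseline rule for bar graphs can be checked with simple arithmetic: a bar’s drawn length is its value minus the axis baseline, so raising the baseline inflates the ratio between bars. The sketch below is a minimal illustration in Python; the function name `drawn_ratio` and the example values are hypothetical and are not taken from any chart in this article.

```python
def drawn_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar lengths for values a and b.

    A bar is drawn from the axis baseline up to its value, so its
    visible length is (value - baseline). With baseline=0.0 this is
    simply the true ratio a / b.
    """
    if a <= baseline or b <= baseline:
        raise ValueError("both values must lie above the axis baseline")
    return (a - baseline) / (b - baseline)


# Hypothetical rates of 80% and 75% (made-up numbers for illustration).
true_ratio = drawn_ratio(80, 75)               # about 1.07
truncated = drawn_ratio(80, 75, baseline=70)   # 10 / 5 = 2.0
print(f"true ratio {true_ratio:.2f}, drawn ratio with axis at 70: {truncated:.1f}")
```

With the axis starting at 70, the taller bar is drawn twice as long as the shorter one, even though the underlying values differ by less than 7 percent.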
Figure 3 Bar graph from the White House showing how graduation rates have changed since President Obama took office
Figure 3 shows a bar graph tweeted by the White House. The story this graph intends to tell is that graduation rates have improved under President Obama’s leadership. The first flaw is that, as a bar graph, it should have its y-axis start at zero; by cutting off the bottom half of the bars, the graph loses proportionality. Figure 4 plots the same information with the y-axis starting at zero, which makes the relative differences between the years appear less drastic. Second, because this data is intended to show changes over time, a line graph would be more appropriate. Arguably, if this were a line graph, a y-axis that does not start at zero would be less of a concern, though it would still make the relative change look larger than a zero-based axis would.
Figure 4 Bar graph from Figure 3 adjusted to have the Y axis start at zero 
Figure 5 Line graph showing graduation rates from 1975-2012 
There are also problems with the story the bar graph (Figure 3) is trying to tell. The main flaw is that the graph implies that the increases in graduation rates are due to President Obama’s time in office, yet nothing in the graph supports this conclusion. Because the graph shows graduation rates only during his terms, the viewer cannot see what the rates were before President Obama took office. Did the rates change drastically once he took office, or is this part of a long trend that started beforehand? A line graph with the time frame extended back to 1975 (Figure 5) shows that graduation rates were already increasing before President Obama took office, but that rates during his terms reached record highs. It would also be useful to mark on the graph where any significant policy changes occurred, so that changes in graduation rates can be associated with relevant political or economic events.
Figure 6 Chart created by Americans United for Life illustrating changes in spending on abortion services and cancer screenings and prevention. 
Figure 6 is a line graph from Americans United for Life intended to show that Planned Parenthood’s spending on abortion services has increased while its spending on cancer screening and prevention has decreased. One flaw with this graph is that each line connects only two data points: spending in 2006 and spending in 2013. A line graph is meant to show continuous data, and the intermediate years would also reveal the rate and timing of the changes. While this graph successfully shows the increase and decrease in spending for each service, it visually implies that these changes are equivalent in value. This misperception is caused by putting each item’s spending on a different scale, which also makes it appear as if spending on cancer screenings has dropped below the total amount spent on abortions. The graph is essentially a dual-axis graph without labeled axes, and dual-axis graphs are generally advised against.
Figure 7 Alternative improvements to plotting the information from Figure 6
Finding the missing data and creating a line graph with a single y-axis (Figure 7) improves the presentation. Which graph is “better” depends on the story you wish to tell. Option A better reveals the relative expenditures; it shows that, although cancer screening spending was cut in half over the seven years, it is still much higher than abortion spending. On this absolute scale, the rise in abortion service spending (the story the original graph’s author intended) is not evident. Option B makes the spending changes within each service evident by having the y-axis represent the percent change since 2006. It reveals that abortion spending has increased, and it allows you to compare that percent increase with cancer screening’s percent decrease. What option B lacks is a comparison of the absolute spending on each service, which could be remedied by marking those values on the graph. A comprehensive story, then, might first show option A to give the viewer a sense of scale before presenting option B to show relative changes.
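Both options can be prepared with a few lines of arithmetic: option A puts every series on one zero-based axis, while option B indexes each series to its 2006 value. The sketch below assumes each series is a Python dict mapping year to spending; the functions `percent_change_since` and `shared_axis_limits` and all numbers are hypothetical stand-ins, not the actual spending figures behind Figures 6 and 7.

```python
def percent_change_since(series: dict[int, float], base_year: int) -> dict[int, float]:
    """Express each year's value as a percent change from base_year (option B)."""
    base = series[base_year]
    if base == 0:
        raise ValueError("cannot compute percent change from a zero baseline")
    return {year: 100.0 * (value - base) / base for year, value in sorted(series.items())}


def shared_axis_limits(*series: dict[int, float]) -> tuple[float, float]:
    """One zero-based y-range covering every series (option A: a single shared axis)."""
    return 0.0, max(value for s in series for value in s.values())


# Made-up spending figures, in arbitrary units, for illustration only.
screening = {2006: 100.0, 2013: 50.0}
abortion = {2006: 10.0, 2013: 15.0}

print(shared_axis_limits(screening, abortion))   # (0.0, 100.0)
print(percent_change_since(screening, 2006))     # {2006: 0.0, 2013: -50.0}
print(percent_change_since(abortion, 2006))      # {2006: 0.0, 2013: 50.0}
```

On the shared zero-based axis the smaller series is dwarfed, which is option A’s point about scale; the indexed view makes the two changes directly comparable, which is option B’s point.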
Figure 8 The image on the left is the original chart presented in a Vox Media article, while the one on the right adjusts the sizing of the circles
A less intuitive misstep in data visualization can be seen in Figure 8. The chart uses circles to represent the size of donations to medical causes. The error here is that the radius, rather than the area, of each circle was made proportional to the data. Because area grows with the square of the radius, the differences appear much more extreme than they actually are, as the corrected design shows. Shapes should have areas that are proportional to the values in the data. Additionally, there are arguments that most viewers struggle to perceive area comparisons accurately, which these graphs require. Though it may be less visually exciting, it may be safer to stick with traditional bar graphs to avoid these possible errors and viewer misconceptions.
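The fix for Figure 8 amounts to taking a square root: to make area proportional to a value, the radius must be proportional to the square root of that value. The minimal Python sketch below shows the difference; `radius_for_value`, `area_ratio`, and the example values are hypothetical, and the `scale` parameter simply stands in for whatever units a plotting tool uses.

```python
import math


def radius_for_value(value: float, scale: float = 1.0) -> float:
    """Radius whose circle area equals scale * value (area-proportional sizing)."""
    if value < 0:
        raise ValueError("values encoded as areas must be non-negative")
    return math.sqrt(scale * value / math.pi)


def area_ratio(r1: float, r2: float) -> float:
    """Ratio of two circle areas, which is what the viewer actually compares."""
    return (r1 / r2) ** 2


big, small = 4.0, 1.0  # hypothetical donation amounts; the true ratio is 4

# Mistake: radius proportional to the value, so the areas differ 16-fold.
print(area_ratio(big, small))  # 16.0

# Fix: area proportional to the value, so the areas differ 4-fold.
print(round(area_ratio(radius_for_value(big), radius_for_value(small)), 6))  # 4.0
```

Sizing by radius squares every ratio in the data, so a 4-to-1 difference is drawn as 16-to-1; sizing by area keeps the drawn ratio equal to the true one.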
Lastly, once you have chosen the right graph to represent your data, keep the visualization uncluttered. Add only what needs to be there to tell the story. In data storytelling, this often means that the visualization should take up the majority of the space. You can add words to the image to guide the story, but they should be succinct and focused on ensuring that the viewer takes away the important points. Let the visualization do most of the talking.
To allow the visualizations their deserved attention, the narrative should be easy to follow and not require complex explanations. If a single visualization has too many facets for it to be quickly and easily interpreted, break it down and build it up in layers. Start with the basic concept of the visualization, and then slowly add the layers to the graphic as you delve further into the question or problem at hand. The building of these layers makes the story easier to digest.
Some great examples of strong data storytelling are the Tampa Bay Times article “Why Pinellas County is the worst place in Florida to be black and go to public school,” the R2D3 piece “Visual Introduction to Machine Learning,” and Hans Rosling’s TED Talk “The Best Stats You’ve Ever Seen.”
The Tampa Bay Times example demonstrates the power of pairing minimal text with the graphics and building up the story in layers. Its images tell the story, and the words are there only to make sure the viewer is on the right track. Text never takes up more than two sentences’ worth of space at a time. The piece builds its layers by first establishing the focus (Pinellas County and its schools) and then adding the other schools for reference. These gradual steps make the story successful.
The R2D3 example focuses less on telling the story of its data and more on telling the story of how a machine learning process works, using the data as the “character” to follow. It presents the problem of classifying homes as being in New York or San Francisco, builds up the classification by adding multiple variables, and then shows how the method achieves its resulting accuracy. This approach can be exceptionally useful when you need to convince a stakeholder of how or why a given method works.
Another positive feature of the Tampa Bay Times article and the R2D3 story is that both rely on scrolling to lead the reader through the story instead of forcing the viewer to click on small elements, as a dashboard might. This makes them more mobile-friendly, which is especially important today. It also makes the stories more adaptable to print and easy for readers to move through.
The words that accompany great data storytelling don’t always have to be printed. The concept of data storytelling carries over to the words spoken in a presentation, as Hans Rosling’s TED Talk phenomenally demonstrates. Rosling uses his tone and pacing to build up the story, and he personifies the different countries as characters to follow and root for. He also actively talks the audience through the visualization’s progression, in the same way the other examples add text to advance their stories.
Most importantly, when attempting data storytelling, remember that “whatever data we work with, when we share our insights, our goal is to move people to see things they haven’t seen before.” Let me emphasize the word “move”: you want the building of the story to engage the viewer in asking what comes next. When you tell a clear, cohesive, and interesting story with your data, stakeholders have the opportunity to understand and trust your methods, and it becomes much easier to encourage action based on the data’s insights.
Previously published on Predictive Analytics Times.