Often when people think of data scientists, they imagine a mythical person who knows how to do everything required for success: write sophisticated Python libraries, derive cutting-edge machine learning algorithms using a deep understanding of statistics, shepherd successful models through deployment, administer the database, create elegant visual dashboards, deeply understand the business, and drive corporate strategy. The list of required skills on many job postings for Data Scientists is long, overwhelming, and in many cases, completely out of reach for a single person.
Fortunately, it doesn’t matter.
In Analyzing the Analyzers, Harlan D. Harris, Sean Patrick Murphy, and Marck Vaisman reported on the results of their 2012 survey of data science practitioners. One of their goals was to identify what is behind the title “Data Scientist”. Is there a common vocabulary or set of skills that defines this job title? They found four predominant “types” of data scientists, each with a different mix of skills, aptitudes, and desires:
- Data Businessperson
- Data Creative
- Data Developer
- Data Researcher
They also identified five different areas of expertise:
- Business
- Machine Learning / Big Data
- Math / Operations Research
- Programming
- Statistics
As indicated in Figure 3-2 from Analyzing the Analyzers (shown below in Figure 1), the list of skills in each of these areas is extensive; expecting a single person to excel in all of them is not only unrealistic but unhelpful.
As a data science consultancy, our teams at Elder Research have solved a great variety of challenging problems in a wide range of industries. Usually these problems are intractable for a single individual, and by bringing a team together with a diverse set of skills and backgrounds, we have greater success than we would with just one “type” of data scientist. Teaming a Data Researcher (a person with substantial statistical depth) with a Data Developer (who excels at the programming nuts and bolts that drive data wrangling) on a project often proves to be a powerful combination.
So how do you build a team unicorn? There is no single best way, as recruiting and hiring is an imprecise process, and excellent Data Scientists come from a wide variety of academic backgrounds and work histories. Nevertheless, we have found a few approaches that have helped us to build strong teams that have earned the trust of our clients and consistently delivered ongoing value.
Over the last five to ten years, many high-quality programs have emerged that offer a Master’s degree in Data Science. Many of our Data Scientists have graduated from these programs, and they are a huge asset. This degree provides exposure to a wide variety of key statistical techniques and technologies, and Data Scientists from these programs are often able to quickly zero in on the right solution for a given problem. Students from these programs often, though not always, fit into the Data Creative category, and are broadly successful in a variety of project roles.
One question I have been asked is whether a PhD is necessary for success as a Data Scientist. Though I’d answer with a resounding “No”, including technical PhDs on your team is something I would strongly recommend. The process of obtaining a PhD requires developing a depth of critical thinking and inquiry, particularly in the face of a new problem, when even the right questions to ask must be discovered. PhD earners have demonstrated tenacity for facing challenges and a unique approach to problem solving that can add richness to the Data Science process. They often fall into the Data Researcher group, and will bravely delve deeply into something that is technically challenging. On the downside, some doctoral earners have trouble adjusting to the industrial reality of deadlines, budgets, and deliverables. They may unconsciously prefer to solve a related, small problem perfectly instead of delivering a suitable solution to the actual, big, messy problem. It is essential to discover which tendency a candidate has.
In addition to team members with Data Science Master’s Degrees and technical PhDs, software engineers turned Data Scientists (whether through education, training, or a combination) are another critical part of successful data science teams. Data Scientists with this background often fall into the Data Developer category, and their software engineering mindset is essential for creating robust, repeatable code. This mindset is especially necessary because data integration and pipelining are critical to any Data Science project, and having someone familiar with software engineering best practices can help make deliverables in these areas really stand out.
In Building Data Science Teams, DJ Patil highlights curiosity, storytelling, and cleverness as essential skills that complement technical acumen. He mentions three questions that he has used as guidelines for hiring, the first of which, “Would we be willing to do a startup with you?”, specifically calls out the importance of trust and communication.
In short, all of the technical skills listed so far are meaningless if you can’t trust an individual or if they cannot communicate effectively, both internally and with external stakeholders. Moreover, because the best data science is done by teams rather than individuals, finding people who are a good cultural fit almost always trumps expertise in a specific skill. A technical mindset, willingness to learn, and the humility to ask questions can often fill gaps in your team.
Lastly, we emphasize humility when considering potential teammates, as it is essential to growth, learning, respecting others, and listening. (And who likes an arrogant consultant?) When we truly listen, we’re armed to figure out where the client’s pain is, and to solve it.