Data Science, Statistics, and the “Method of Moments”


Peter Bruce

Date Published:
August 31, 2018

I got my introduction to statistics via resampling, working with Julian Simon, an early resampling pioneer. Demonstrating this “brute force” computer method to my father, I saw that he was vaguely offended by its inelegance.

He launched into an explanation that involved the “method of moments” and a lot of equations. “Method of moments” stuck with me in a poetic way, though I had no idea what he was talking about; to this day the subject is at the bottom of my list of jolly conversational topics. (The first through fourth “moments” in statistics are the mean, variance, skewness and kurtosis.) It seems to epitomize the classical approach to statistics and probability (i.e., lots of math and theory). Pafnuty Chebyshev introduced the “method of moments” in 1887 (see photo at right).
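
For readers who prefer code to equations, here is a minimal sketch of the four moments named above, using only the Python standard library. The function names `sample_moments` and `gamma_method_of_moments` are hypothetical, and the gamma distribution is an assumption chosen purely for illustration. The idea behind the method of moments is to set sample moments equal to their theoretical counterparts and solve for the parameters; for a gamma distribution, the mean is shape × scale and the variance is shape × scale².

```python
import random


def sample_moments(data):
    """Return (mean, variance, skewness, kurtosis) of a sequence of numbers.

    The mean is the first raw moment, the variance is the second central
    moment, and skewness and kurtosis are the standardized third and
    fourth central moments. All use the simple divide-by-n form.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    return mean, m2, skewness, kurtosis


def gamma_method_of_moments(data):
    """Estimate gamma (shape, scale) by matching the first two moments.

    Solving mean = shape * scale and variance = shape * scale**2 gives
    shape = mean**2 / variance and scale = variance / mean.
    """
    mean, variance, _, _ = sample_moments(data)
    return mean ** 2 / variance, variance / mean


# Simulated data (an assumption for illustration): gamma with shape 2, scale 3.
rng = random.Random(0)
data = [rng.gammavariate(2.0, 3.0) for _ in range(20000)]

shape_hat, scale_hat = gamma_method_of_moments(data)
print("moments:", sample_moments(data))
print("estimated shape and scale:", shape_hat, scale_hat)
```

With a sample this large, the estimates should land close to the true values of 2 and 3; with a small sample they would be noticeably noisier.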

Statistics in Data Science Programs

As data science grows in importance in university programs, so does the prevalence of statistics – it’s almost always part of a data science program. And it is striking how often such programs include statistics courses that look like they could date from a century ago. One top-tier American university’s data science program has a course that features topics such as hypothesis testing, one-sample methods, two-sample methods, and ANOVA.

So, what is it about data science that calls for something different?

The science of statistics arose initially out of the need to measure things – especially physical and mental attributes of people. But what fills most standard statistics books is the machinery of statistical inference (hypothesis tests, confidence intervals) that arose a century ago from the need to quantify the uncertainty inherent in relatively small samples.

Data science, by contrast, is not faced with a shortage of data and consequent small samples. So all that inference machinery (one-sample tests, two-sample tests, t-tests, F-tests, chi-square tests, goodness-of-fit, etc.) is mostly unneeded.

But multivariate statistical modeling – regression, principal components, clustering – is very useful for making predictions, reducing dimensionality and segmenting data. It’s just that most software implementations of these modeling procedures, even in Python, come with unnecessary inference information in the output.

Figure 1. Python regression output – statistical inference metrics

Not a big deal, but you have to know to ignore it, or at least not get confused by it! Here, we retain some of the inferential machinery in our foundational statistics courses, for our students learning “pure statistics” in our programs. But in those introductory courses, we also provide the appropriate connections with, and distinctions from, data science. For example, we teach the R-sq metric in regression, but place it in the context of statistics for research, where we want to know how well the model fits the sample of data. For data science analytics, we point out that predictive accuracy with new data is a more appropriate metric for the regression model.
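
The distinction between fit to the sample and accuracy on new data can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a recommended workflow: the data are simulated, the 70/30 split is arbitrary, and the helper names `fit_line`, `r_squared` and `rmse` are hypothetical. R-sq is computed on the data used to fit the model, while the root mean squared error (RMSE) is computed on held-out records the model has never seen.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(ys, preds):
    """Share of the variation in ys that the predictions account for."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def rmse(ys, preds):
    """Root mean squared prediction error."""
    return (sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)) ** 0.5


# Simulated data (an assumption for illustration): y = 2 + 0.5 x + noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(1000)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# Fit on the first 700 records; hold out the remaining 300 as "new" data.
split = 700
train_x, test_x = xs[:split], xs[split:]
train_y, test_y = ys[:split], ys[split:]

intercept, slope = fit_line(train_x, train_y)
train_preds = [intercept + slope * x for x in train_x]
test_preds = [intercept + slope * x for x in test_x]

train_r2 = r_squared(train_y, train_preds)  # fit to the sample
test_rmse = rmse(test_y, test_preds)        # accuracy on held-out data

print("R-sq on training data:", train_r2)
print("RMSE on holdout data:", test_rmse)
```

In this simple simulation the two measures tell a consistent story. With many predictors or a flexible model, R-sq on the training data can look good while holdout error is poor, which is why the holdout figure is the one to watch for prediction.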
