Blog

Visualizing the Performance of COVID Models

Chris McLean & Peter Bruce

May 26, 2020

Never before have statistical models received the attention they are getting now in the midst of the Coronavirus pandemic.  It is hard to read a news feed today without encountering either:

  • New predictions from models such as the IHME model and others, or
  • Critiques of older predictions.

So – how have older predictions turned out? 

Lockdowns Knock Down the Spread of COVID-19, but Only to a Point (and only early on)

Mike Thurber and John Elder, Ph.D.

May 19, 2020

By tracking anonymized mobile phone location data and COVID-19 case reports for many countries with different policies, we studied the effect of restricting mobility on the spread of COVID-19. We found that lockdown policies did rapidly reduce the COVID-19 reproduction ratio, R, but only up until about 3 days before a country’s peak daily case rate; after that they had little or even negative impact. Also, people should be allowed to go to parks.

A Holistic Framework for Managing Data Analytics Projects

Mike Thurber

May 15, 2020

Data Science project management must be customized to work best with each organization, but we find that our projects are most successful when managed using an Agile + CRISP-DM process rather than a traditional Waterfall approach. Sprint planning in an Agile + CRISP-DM framework constantly encourages the team to consider emerging requirements and to pivot based on findings from the previous sprint.

COVID-19: Tuesday Blips in a “Downward Trajectory”

Carl D. Hoover

May 7, 2020

We observe a peculiar counter-fluctuation in a COVID-19 statistic -- daily percent changes in deaths; it has a downward trend, but Tuesdays tend to see small increases.

COVID-19 death counts continue to increase. Figure 1a shows total U.S. deaths over the last 7 weeks along with the daily increase in deaths (dashed line). According to The COVID Tracking Project, between March 22nd and May 3rd reported deaths attributed to COVID-19 rose from 436 to a staggering 61,868.
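As a rough illustration of the scale of that increase, the two endpoints quoted above imply an average compound daily growth rate. The sketch below uses only the figures and dates in the text (the year 2020 is taken from the post date); the variable names are illustrative, and the average says nothing about the day-to-day pattern, which the post describes as trending downward.

```python
from datetime import date

# Endpoints quoted in the text (The COVID Tracking Project figures).
start_day, start_deaths = date(2020, 3, 22), 436
end_day, end_deaths = date(2020, 5, 3), 61_868

# Number of daily steps between the two reports.
days = (end_day - start_day).days

# Average compound daily growth rate implied by the two endpoints:
# start_deaths * (1 + rate) ** days == end_deaths
avg_daily_growth = (end_deaths / start_deaths) ** (1 / days) - 1

print(f"Days between reports: {days}")
print(f"Average daily growth in cumulative deaths: {avg_daily_growth:.1%}")
```

This is an average over the whole window only; the daily percent changes discussed in the post would need the full daily series.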

Coronavirus: Age and Health

Peter Bruce

April 30, 2020

Although there are frequent reports in the news media of young people contracting serious cases of COVID-19 and even dying, the disease in its serious form is overwhelmingly a disease of older people. Data on US deaths from the U.S. Centers for Disease Control in Figure 1 portray this vividly. (We focus on deaths rather than cases of COVID-19 because deaths are less affected by differences in testing.)

COVID-19 Social Distancing Has Mitigated 2020 Flu Season

Mike Thurber

April 27, 2020

Three weeks ago, our Brief “Is the Spread of the COVID-19 Coronavirus Being Slowed?” looked at the impact of social distancing on the flu. Evidence showed that the unprecedented measures taken by the government were having the expected effect, as measured by seasonal flu cases. In this Brief, we update and amplify that information.

Roadmap to Becoming a Data-Driven Organization

Robert Pitney

April 24, 2020

Data analytics is the process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information to support the decision-making process. Every organization can benefit from an effective data analytics program that uses data insights to accomplish its mission more efficiently and effectively. Developing an enterprise-wide core analytics group -- one that consolidates analytics initiatives within the organization and facilitates communication between business units -- requires significant organizational and cultural change. Cultivating data-driven decision-making starts with the executive leadership.

COVID-19 Asymptomatic Rates and Implications

Mike Thurber

April 20, 2020

A key issue for COVID-19 response policies moving forward is the asymptomatic rate: the share of people who have the virus but do not show symptoms. If everyone who gets the virus shows symptoms, then despite the wide scope of the current outbreak, we have a long way to go. In New York state, the virus epicenter in the U.S., there are nearly 220,000 tested and confirmed infections (as of 4/16/20), and the vast majority are symptomatic (it’s very hard to get tested if you're not symptomatic). Still, this is only about 1% of the total New York state population. On the other hand, if there are 20 asymptomatic individuals for every confirmed symptomatic case, that would mean roughly 4.5 million cases in New York, probably concentrated in the New York City area. This would bring the city much closer to “herd immunity” and reduce fear of resurgence or second waves. So what is the asymptomatic rate?
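The arithmetic behind that scenario can be checked with a short back-of-the-envelope sketch. The confirmed-case count and the 20-to-1 ratio come from the text; the New York state population figure is an assumption (an approximate 2019 estimate) and is not stated in the post, and the function names are illustrative.

```python
# Figures from the text.
confirmed_cases = 220_000         # tested and confirmed infections as of 4/16/20
asymptomatic_per_confirmed = 20   # hypothetical ratio posed in the text

# ASSUMPTION: approximate New York state population, not given in the post.
ny_state_population = 19_450_000


def total_infections(confirmed: int, ratio: int) -> int:
    """Confirmed cases plus the asymptomatic cases implied by the ratio."""
    return confirmed + confirmed * ratio


def share_of_population(cases: int, population: int) -> float:
    """Fraction of the population represented by the given case count."""
    return cases / population


implied_total = total_infections(confirmed_cases, asymptomatic_per_confirmed)
confirmed_share = share_of_population(confirmed_cases, ny_state_population)
implied_share = share_of_population(implied_total, ny_state_population)

print(f"Confirmed share of population: {confirmed_share:.1%}")
print(f"Implied total infections: {implied_total:,}")
print(f"Implied share of population: {implied_share:.1%}")
```

Under these assumptions the implied total is about 4.6 million, in line with the roughly 4.5 million quoted, so the scenario would put close to a quarter of the state's population in the infected group.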

Covid-19: Epidemiological Models vs. Statistical Models

Peter Bruce

April 15, 2020

Nearly everyone is now familiar with the IHME Covid-19 forecasts (also called the “Murray” model after the lead project investigator at the Institute for Health Metrics and Evaluation), and perhaps its associated interactive visualizations. 

Is the Spread of the COVID-19 Coronavirus Being Slowed?

Mike Thurber

April 7, 2020

The COVID-19 pandemic has created a need for clear and actionable analytics like never before. The world can’t wait for controlled scientific studies to be completed, or dodge with, “We can’t say anything until we get more data.” To discover true relationships, we’d love to have detailed structured public data. But we usually must take the limited data at hand and quantify actionable insights in the face of uncertainty.  Here, we’ll look at one piece of the menacing puzzle:  is Social Distancing helping?

How and Why to Interpret Black Box Models

Grant Fleming

March 27, 2020

Demand for data science services continues to accelerate, which has fueled the rapid development of ever more complex models. That complexity has contributed to the poor application of models and thus to controversy surrounding the true value of data science. It is vital for us as data scientists to ensure that, while our models continue to improve in performance, we can also interpret how they function, and thereby diagnose any harms that they might cause through biased or unfair predictions.

What is the Value of Data Engineering?

William Proffitt

March 13, 2020

With more organizations discovering the value of using data science to make better decisions, new opportunities are emerging for Data Engineers to provide support and integration for analytics teams. What’s valuable about Data Engineering skills?

How to Pick a Winning March Madness Bracket

Robert Robison

February 28, 2020

In 2019, over 40 million Americans wagered money on March Madness brackets, according to the American Gaming Association. Most of this money was bet in “bracket pools,” which consist of a group of people each entering their predictions of the NCAA tournament games along with a buy-in. The bracket that comes closest to being right wins. If you also consider the bracket pools where only pride is at stake, the number of participants is much greater. Despite all this attention, most do not give themselves the best chance to win because they are focused on the wrong question.

Updating a Data Pipeline with AWS’s Latest Offerings

Todd Gerdy

February 14, 2020

In December I attended AWS re:Invent, Amazon Web Services' annual learning conference. It was five days filled with over 4,000 sessions, keynote announcements, a partner expo, and hands-on training and certification opportunities. I learned about a number of tools and services (some brand new) that will improve the data pipeline solutions we develop for clients. This article describes a production pipeline solution and several options for improving it using these tools and services.

Transform Your Business with Focused Analytics Training

Paul Derstine

February 3, 2020

Are you considering investing in sharpening the analytical skills of your staff? Do you wish that your IT, business, marketing, or operations group would more effectively employ predictive analytics in their work? Elder Research is excited to announce that it has acquired the Institute for Statistics Education at Statistics.com to provide focused data science, analytics, and statistics training for corporations and individuals. 

Improve Predictive Model Performance With Ensembles

Jordan Barr, Ph.D.

January 17, 2020

In my previous blog, Ensembles and Regularization – Analytics Superheroes, I reviewed the many advantages of model ensembles including removing “noise” variables, generalizing better than single component models, and reducing sensitivity to outliers.

In this article I take a deeper dive into the attributes and applications of model ensembles, and explore potential downsides to provide context for when to use them.

Trends in Natural Language Processing

Stuart Price, Ph.D.

December 27, 2019

Deep Neural Networks (DNNs) have radically changed the landscape of state-of-the-art performance in Natural Language Processing (NLP) in recent years. These versatile models are being used in many applications including text classification, language creation, question answering, image captioning, language translation, named entity recognition, and speech recognition. The state of the art is changing quickly, sometimes leading to large leaps in performance with the release of new architectures. In October of 2018 Google released “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” which performed best on 11 different NLP benchmarks upon release. Since then, there have been many more models adding new components or tweaking the approach. In this article we’ll review some of the traditional machine learning methods used in deep learning and new trends such as Transfer Learning and Transformers to provide a foundation no matter what model is currently leading.

Modeling Outcomes: Explain or Predict

Peter Bruce

December 27, 2019

A casual user of machine learning methods like CART or Naive Bayes is accustomed to evaluating a model by measuring how well it predicts new data. When examining the output of statistical models, they are often flummoxed by the profusion of assessment metrics. Typical multiple linear regression output will contain, in addition to a distribution of errors (residuals) and root mean squared error (RMSE), at a minimum values such as R-squared, adjusted R-squared, t-statistics, F-statistics, p-values, and degrees of freedom.

Making Good Use of Negative Space in Machine Learning

Will Goodrum, Ph.D.

December 13, 2019

Data Scientists frequently build Machine Learning models to discover interesting (rare) events in data. These events can be valuable (e.g., customer purchases), costly (e.g., fraud), or even dangerous (e.g., threat). Finding them is a “needle-in-a-haystack” challenge: the events are rare and hard to distinguish from the huge mass of overwhelmingly uninteresting cases recorded. To differentiate rare from normal events it helps to have a good understanding of normal behavior. But, how well do you actually know the haystack?

Ways Machine Learning Models Fail: Missing Causes

Mike Thurber

November 29, 2019

I have identified five primary reasons why analytical models fail:

  1. Poor Organizational Support
  2. Missing Causes
  3. Model Overfit
  4. Data Problems
  5. False Beliefs

In this post, we will consider how and why missing causes in the data for training a model may result in incorrect inferences or failures.

Leveraging Data Analytics to Increase ROI

John Elder, Ph.D.

November 15, 2019

Reluctance to trust and rely on machine-based decisions is widespread. That is understandable; how can one be sure the automated decision system takes into account all the factors it should? Employees struggle first to learn the new technology, and then after making great progress and producing a promising model, decision-makers can still prove extremely reluctant to risk a new approach, no matter how well tests reveal its effectiveness. Still, in today’s competitive work environment, having a positive relationship with machines is essential to increasing profits and building return on investment (ROI).

Big Data and Clinical Trials in Medicine

Peter Bruce

November 1, 2019

There was an interesting article in the New York Times magazine section on the role that Big Data can play in treating patients — discovering things that clinical trials are too slow, too expensive, and too blunt to find. The story was about a very particular set of lupus symptoms, and how a doctor, on a hunch, searched a large database and found that those symptoms were associated with an increased propensity for blood clots.

Detecting Hidden Fraud Risk from Public Data

Hudson Hollister

October 18, 2019

Detecting which of the federal government’s millions of contracts most likely involve fraud used to require insider access to agencies’ IT systems. Data analytics provides greater efficacy and a higher hit rate than traditional investigative methods – and now can even be performed using only public data.

Be Smarter Than Your Devices: Learn About Big Data

Peter Bruce

October 4, 2019

When Apple CEO Tim Cook finally unveiled his company’s new Apple Watch in a widely-publicized rollout, most of the press coverage centered on its cost ($349 to start) and whether it would be as popular among consumers as the iPod or iMac. Nitin Indurkhya saw things differently.

The Case for Government Investment in Analytics

Jane Wiseman

September 20, 2019

Government stands to gain $1 trillion globally from using data analytics. Few government data teams have the resources to document their value, but those that do can show as much as eight-to-one return on their cost. There is significant non-financial benefit as well, as public faith in government may improve when saving time and money is paired with increased transparency and accountability.
