Measuring Invisible Treatment Effects with Uplift Analysis: A Get-Out-The-Vote Example


Mike Thurber

Date Published:
October 23, 2020

Models make predictions by identifying consistent correlations in what has been observed, but we usually require more than predictions to know what action to take. For example, knowing that older people are more likely to have heart disease is a good first step, but knowing which behaviors or treatments will reduce the risk of heart disease as we age is actionable. Knowing Millennials are more likely to buy your product than Gen Z is nice, but knowing which marketing approach will persuade Gen Z to buy is valuable. In this election season, knowing who will vote is interesting, but identifying unlikely voters who can be persuaded to show up at the polls is everything to campaign managers. When data science goes further to estimate the impact of alternative actions we might take to achieve a better outcome, we call it uplift modeling or, more technically, treatment effect modeling. For this instructional blog we will use a deliberately limited example of how uplift modeling can apply to get-out-the-vote campaigns, without divulging which sample or geography was used.

Simply examining the predictors in a model is insufficient to know how to influence an outcome. Even though an action may be predictive of an outcome, it may not cause it. For example, being on a diet is a predictor of being overweight, but taking people off diets will not help them lose weight. While hospital admittance indicates a higher chance of death, denying hospital admission will not reduce deaths. A customer’s broad online searches for a product like the one you sell are a helpful predictor that they may buy it, but encouraging your customers to search “broadly” may lead them to a competitor’s product. Uplift modeling must do more than make a good prediction: it must distinguish quantitatively between correlation and causation. It is the difference between circumstantial and direct evidence of an outcome.

Randomized Experiments Are the Gold Standard

Scientific experiments measure whether and how an action (a “treatment”) causes a result (an “outcome”) on something (an “entity”). A randomized controlled experiment randomly divides entities into two groups, applies the treatment to the test group, withholds the treatment from the control group, and records the outcomes. When the overall outcome from the test group is significantly different from that of the control group, the experimenter concludes that the treatment has a measurable effect. This is the scientific method.
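To make this concrete, here is a minimal simulation of a randomized controlled experiment. All the numbers are hypothetical: entities get heterogeneous baseline vote propensities, the treatment (say, a reminder postcard) adds a fixed bump, and random assignment lets a simple difference in outcome rates recover that bump.

```python
import random

random.seed(0)

# Toy simulation (all parameters hypothetical): each entity has a
# baseline probability of voting; the treatment adds a fixed bump.
N, BUMP = 20000, 0.10
entities = [{"base": random.uniform(0.2, 0.6)} for _ in range(N)]

random.shuffle(entities)                      # random assignment
test, control = entities[:N // 2], entities[N // 2:]

for e in test:                                # treatment applied
    e["voted"] = random.random() < e["base"] + BUMP
for e in control:                             # treatment withheld
    e["voted"] = random.random() < e["base"]

rate = lambda group: sum(e["voted"] for e in group) / len(group)
effect = rate(test) - rate(control)           # recovers roughly BUMP
```

Because assignment is random, the two groups have the same mix of baseline propensities on average, so the rate difference isolates the treatment effect. The rest of this post is about what to do when such an assignment is impossible.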

But what if such a controlled randomized experiment is impossible, illegal, unreasonable, or unethical? There are many such examples, but I will focus on one having to do with US presidential elections: How did the voting behavior of fellow household members affect a person’s voting behavior in the 2016 general presidential election? Several possible voting history interactions between two partners in a household are shown in the influence diagram in Figure 1.

Figure 1. Voting History Influence Diagram

We wish to measure the direct influence of the woman on the man and of the man on the woman in the election, illustrated by the red vertical arrows in the diagram. The “treatment” in one case is the woman voting, with the outcome (the “effect”) being whether or not the man also voted. The second case is the reflection of the first. An experiment would have been impractical because we could never ethically or legally control whether the others in a household vote, i.e., randomly mandate that they do or do not. As discussed previously, we must know more than the correlated relationship; we want to estimate the causal relationship to inform get-out-the-vote campaigns. How can we do this without a randomized controlled experiment?

To overcome this challenge we will demonstrate a method to find “quasi” random experiments in historical data using data science techniques.

The Data

Political analysts in many states can obtain historical data from publicly available voter registration lists. These data identify most registered voters and the public elections in which they voted, and can also provide information such as sex, age, address, ethnicity, and party affiliation.

The Principle

Historical data provide information about treatments, circumstances, and events, but without deliberate random assignment of treatments into distinct test and control groups. A quasi-experiment is a set of observations among which the treatments happened to have been randomly assigned, meaning each observation was equally likely to have received the treatment. We will use a technique called Propensity Score Matching (PSM), which asserts that if a set of historical observations were equally likely to be treated, then whether or not they actually were treated is, effectively, a random assignment. To find these quasi-experimental groups we must first develop a proper propensity-to-treat model. To resolve any historical selection bias of the treatment as it relates to the outcome, this model must consider any information that may also impact the outcome we are trying to influence, except for the treatment itself.
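The stratification idea can be sketched in a few lines. This is a simplified stand-in for a fitted propensity model (the field names and records are invented for illustration): with purely nominal inputs, observations sharing identical covariates would receive an identical propensity score, so grouping on the covariates groups on the score.

```python
from collections import defaultdict

# Hypothetical voter records (field names invented for illustration):
# nominal covariates, a binary "treated" flag (the partner voted), and
# the observed outcome (whether the voter voted).
records = [
    {"age_band": "30-44", "party": "D", "voted_2012": 1, "treated": 1, "voted": 1},
    {"age_band": "30-44", "party": "D", "voted_2012": 1, "treated": 0, "voted": 0},
    {"age_band": "45-64", "party": "R", "voted_2012": 0, "treated": 1, "voted": 1},
    {"age_band": "45-64", "party": "R", "voted_2012": 0, "treated": 0, "voted": 1},
    {"age_band": "18-29", "party": "I", "voted_2012": 0, "treated": 1, "voted": 0},
]

def propensity_strata(records, covariates):
    """Group observations with identical covariates; with nominal inputs,
    identical covariates imply an identical modeled propensity score, so
    each stratum acts as one quasi-experiment."""
    strata = defaultdict(list)
    for r in records:
        strata[tuple(r[c] for c in covariates)].append(r)
    return strata

strata = propensity_strata(records, ["age_band", "party", "voted_2012"])

# Keep only balanced strata that contain both treated and untreated
# members; one-sided strata carry no counterfactual information.
usable = {k: v for k, v in strata.items()
          if {r["treated"] for r in v} == {0, 1}}
```

In practice a fitted propensity model (e.g., logistic regression) collapses many covariate combinations onto a common score, and strata are formed on the score itself rather than on exact covariate matches.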

The Model Development Process

Building an uplift model includes all the steps of a predictive model build but requires a propensity-to-treat model as well as many distinct likelihood-of-outcome models. Since the latter are built on small subsets of the data that would be used for a simple predictive model, the method works far better when the data is plentiful. The following steps detail the process to discover the ideal set of individuals, the “target opportunity”, for a get-out-the-vote campaign among the described households:

  1. Formulate the problem: For a household with one female and one male registered voter, how does the voting behavior of the woman affect the man’s likelihood to vote, and vice versa? We are interested in whether the bi-directional effect is symmetrical and how the effect varies across households. Ideally, we would like to identify pockets of the population where a get-out-the-vote campaign aimed at one household member will be most strongly leveraged to the other. Limiting the scope to man-woman households is arbitrary and meant to be illustrative. For this example we will limit our population to about 75,000 households (150,000 registered voters) in one area for the 2016 presidential election.
  2. Build a strictly predictive model of voting likelihood using the inputs available in the voter registration file to broadly identify the independent features that are predictive of a person voting.
  3. Leveraging the feature set from step 2, build two propensity-to-treat models, one for each direction of influence. The first estimates the woman’s propensity to vote (the treatment on the man) using the information that predicts the man’s likelihood of voting. The other is the mirror image: the man’s propensity to vote (the treatment on the woman) using the information from step 2 that predicts the woman’s likelihood of voting. Apply the finished models to the women and men respectively.
  4. Segment the data by propensity score values. Each segment represents one quasi-random experiment. Because the inputs in this example are all nominal in nature, the models will produce a finite number of distinct scores. Some segments will be highly skewed in one direction where all, or nearly all, observations voted, or virtually none of them voted. Segments without sufficient representation in both categories (an unbalanced quasi experiment) are not useful. Typically, only a few segments need to be discarded for this reason.
  5. For each segment of the data, build a likelihood-of-voting model that includes the treatment variable (whether or not the partner in the household voted).
  6. Score every observation twice with the respective model from step 5: once for the actual voting behavior of the partner, and again for the counterfactual, pretending the partner’s voting behavior was the opposite.
  7. Compute the uplift, which is the estimated likelihood of voting if the partner voted minus the estimated likelihood of voting if the partner did not vote.
  8. Quantify the size of the opportunity by identifying the persuadables, the households where neither person voted. Estimate the net persuadables, the number who would be persuaded to vote if the partner had voted, by multiplying the persuadables by the uplift from step 7.
  9. Find the segments with high net persuadables and high net lift. These represent the target opportunity.
  10. Describe the target opportunity in terms of its features and number of households. If modeled with discrete-valued features, as is the case with voting history behavior, the target can be described exactly.
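Steps 5 through 8 can be sketched in miniature for a single quasi-experiment. As a simplification, the within-stratum empirical voting rates stand in for the likelihood-of-voting models of step 5, and the counts are hypothetical:

```python
def stratum_uplift(stratum):
    """Steps 5-8 in miniature for one quasi-experiment: estimate the
    likelihood of voting with and without the treatment (partner voted),
    then compute the uplift and the net persuadables."""
    treated   = [r for r in stratum if r["treated"] == 1]
    untreated = [r for r in stratum if r["treated"] == 0]
    p_treated   = sum(r["voted"] for r in treated) / len(treated)
    p_untreated = sum(r["voted"] for r in untreated) / len(untreated)
    uplift = p_treated - p_untreated              # step 7
    # Step 8: persuadables are households where neither partner voted.
    persuadables = sum(1 for r in untreated if r["voted"] == 0)
    net_persuadables = persuadables * uplift
    return uplift, net_persuadables

# Hypothetical stratum: 4 treated voters (3 voted) and 5 untreated
# voters (1 voted).
stratum = ([{"treated": 1, "voted": 1}] * 3 + [{"treated": 1, "voted": 0}]
           + [{"treated": 0, "voted": 1}] + [{"treated": 0, "voted": 0}] * 4)
uplift, net = stratum_uplift(stratum)   # uplift 0.75 - 0.20 = 0.55
```

In the full process, a model built on each stratum (step 5) scores every observation under both the factual and counterfactual treatment (step 6), which allows individual-level uplift estimates rather than the single stratum-level rate difference shown here.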

Modeled Results for Campaign Targeting

Figure 2 shows the uplift model results for the 2016 General Election. As represented in this graph, we are looking for dots that are high on the vertical axis, large in size, and dark in color.

Figure 2. Uplift Modeling for Voting in 2016 General Election

The horizontal axes identify distinct quasi-experiments where the propensity for the partner to vote is a single value, from a model built on all the information used to estimate the voter’s likelihood of voting.

This is a bi-directional use case, where the man and woman in a household influence the other to vote.

The vertical axis is the “uplift,” the average increase in the likelihood of voting if their partner votes.

The size of a dot represents the number of net persuadables within the quasi-experiment group as shown in the graph legend. Net persuadables is the number of additional votes expected if the partner who did not vote had voted. For example, on the left graph for the male voters, it is the number of households where neither the man nor woman voted in the 2016 general election multiplied by the increase in voting propensity if the woman had voted in that election. The “Level of Evidence” is the z-score statistic of the confidence in the uplift in propensity-to-vote.
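One plausible form of the “Level of Evidence” statistic is a difference-of-proportions z-score computed within each quasi-experiment; the actual model details are not given in the post, and the counts below are hypothetical:

```python
import math

def uplift_z_score(votes_t, n_t, votes_c, n_c):
    """z statistic for the uplift within one quasi-experiment:
    the difference in voting rates between observations whose partner
    voted (treated) and those whose partner did not (control), divided
    by its standard error."""
    p_t, p_c = votes_t / n_t, votes_c / n_c
    uplift = p_t - p_c
    # Unpooled standard error of the difference in two proportions.
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return uplift / se

# Hypothetical segment: 400 of 800 voted when the partner voted,
# versus 240 of 800 when the partner did not.
z = uplift_z_score(400, 800, 240, 800)
```

A large z-score means the observed uplift in that segment is unlikely to be sampling noise, which is why both dot size (net persuadables) and color (evidence) matter when choosing targets.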

Interpreting the Results

The arrows drawn on the scatterplot identify the experimental groups with the largest net persuadables and the highest statistical certainty, the target opportunity. These were neither the most nor least likely voters. Instead, uplift modeling shows they are roughly 50% likely to vote.

The symmetry between the figures indicates that the influence of the woman on the man’s voting behavior is nearly the same as the influence of the man on the woman in the household. Examining the cases with that propensity score, we find that almost 13% of all man-woman households fall into this target opportunity group. Nearly 50% voted in the 2016 general election, but if a campaign had successfully persuaded one non-voting partner in a household to vote, about 50% of their partners would also have changed their behavior and voted. This is the “uplift” of a targeted campaign. Note that drifting slightly away from this target group to a different propensity score, for example slightly to the right, sharply reduces the uplift.


A simple predictive model can demonstrate that one partner in a household voting is predictive of the other voting, but causality remains a question. Uplift modeling, using propensity score matching, can reveal a true causal relationship. In addition, it can pinpoint the specific population where net persuadables are the highest, to optimize a campaign effort.

The illustrative application above optimizes get-out-the-vote campaigns, but the principles and processes can apply to other fields to optimize resource utilization and improve outcomes. Elder Research often employs uplift modeling to serve our clients, as illustrated in this telecom example where such models were key to improving customer retention and profitability.
