Another approach to comparing sample proportions is based on Bayes’ theorem and the principles of Bayesian inference. In contrast to the frequentist approach, the Bayesian approach allows for the incorporation of prior knowledge about the parameter of interest. We start with a prior probability distribution for the parameter (the proportion of Teslas on the road) and update it based on the data we collect.
To use Bayesian analysis, we estimate a distribution for the parameter which reflects our beliefs about the true proportion of Teslas in each area. Notice here that we have a distribution of possible proportions rather than a single estimate. This can help us quantify the uncertainty of our knowledge.
First, we need to figure out our “prior” information about the proportion of Teslas on the road. According to Car and Driver, roughly 1% of the cars on the road are electric. According to Electrek, Tesla owns roughly two-thirds of the EV market in the US. This would mean that roughly 0.67% of cars on the road in the US are Teslas. However, those estimates are uncertain, and they cover the entire United States; we would expect the rate to be higher in major cities. Because we have so much uncertainty, we can choose a relatively wide distribution as our prior.
We’ll assume the mean of the prior distribution is .05 (so, before seeing any data, we expect the proportion of Teslas in either city to be about 5% on average) and the standard deviation is .1 (which reflects high uncertainty). We can model this as a Beta distribution with that mean and standard deviation.
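As a minimal sketch of how a mean and standard deviation translate into Beta parameters, the snippet below uses the standard method-of-moments relationship: alpha + beta = mean × (1 − mean) / variance − 1. The function name `beta_params_from_moments` is hypothetical, and this is only one way to pick the parameters, not necessarily how the original analysis did it.

```python
def beta_params_from_moments(mean, sd):
    """Return (alpha, beta) of the Beta distribution with this mean and standard deviation."""
    variance = sd ** 2
    # A Beta distribution only exists when variance < mean * (1 - mean).
    if not 0 < mean < 1 or variance >= mean * (1 - mean):
        raise ValueError("no Beta distribution has this mean and standard deviation")
    concentration = mean * (1 - mean) / variance - 1  # this equals alpha + beta
    return mean * concentration, (1 - mean) * concentration


# Prior described in the text: mean 0.05, standard deviation 0.1.
prior_alpha, prior_beta = beta_params_from_moments(0.05, 0.1)
print(prior_alpha, prior_beta)  # approximately 0.1875 and 3.5625
```

With these values alpha is below 1, so the prior density is highest near zero even though its mean is 5%. That is consistent with a wide prior that still allows for a small nationwide rate.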
Next, we can create ‘posterior distributions’ of the Tesla proportions in both San Francisco and Cary. We will start with the same prior distribution, and then update using the observed data from my survey.
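The update step can be sketched as follows, assuming each observed car is treated as an independent Tesla / not-Tesla draw (a binomial likelihood, which the text does not state explicitly). Under that assumption a Beta prior stays a Beta after the update: add the number of Teslas to alpha and the number of other cars to beta. The function name `update_beta` and the counts are hypothetical and are not the survey results.

```python
def update_beta(alpha, beta, teslas, total_cars):
    """Posterior Beta parameters after observing `teslas` Teslas among `total_cars` cars."""
    if not 0 <= teslas <= total_cars:
        raise ValueError("teslas must be between 0 and total_cars")
    return alpha + teslas, beta + (total_cars - teslas)


# Beta prior with mean 0.05 and standard deviation 0.1 (method-of-moments values).
PRIOR = (0.1875, 3.5625)

# HYPOTHETICAL counts for illustration only; these are NOT the survey results.
example_teslas, example_total = 6, 100

post_alpha, post_beta = update_beta(*PRIOR, example_teslas, example_total)
posterior_mean = post_alpha / (post_alpha + post_beta)
print(post_alpha, post_beta, round(posterior_mean, 4))
```

The same function would be called once per area, each time starting from the same prior.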
Figure: Posterior Distribution of Tesla Rate
We can see that generally the rate is expected to be higher in San Francisco than in Cary, but there is still a lot of overlap in the distributions.
Since what we really care about is the difference in the rate of Teslas between San Francisco and Cary, we can repeatedly draw a value from each of the two posterior distributions and subtract the Cary value from the San Francisco value. This gives us a distribution of the differences between the San Francisco Tesla rate and the Cary Tesla rate. Zero indicates no difference, and a positive value indicates that San Francisco has a higher rate of Teslas.
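A rough sketch of that sampling step, using only Python's standard library, might look like the following. The posterior parameters are hypothetical placeholders, and `sample_rate_differences` is an illustrative name rather than anything from the original analysis.

```python
import random


def sample_rate_differences(post_a, post_b, n_samples=10_000, seed=0):
    """Draw from two Beta posteriors and return the list of (a - b) differences."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        rng.betavariate(*post_a) - rng.betavariate(*post_b)
        for _ in range(n_samples)
    ]


# HYPOTHETICAL posterior parameters (alpha, beta); NOT the fitted values.
sf_posterior = (8.1875, 95.5625)
cary_posterior = (4.1875, 99.5625)

diffs = sample_rate_differences(sf_posterior, cary_posterior)
mean_diff = sum(diffs) / len(diffs)
print(f"mean difference: {100 * mean_diff:.1f} percentage points")
```

Each entry in `diffs` is one plausible value of the San Francisco rate minus the Cary rate, so a histogram of the list approximates the distribution of the difference.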
Figure: Difference in Rate of Teslas (San Francisco, CA and Cary, NC). Horizontal axis: Percentage Point Difference
With the Bayesian approach we have some more flexibility in the way we think about the output. We don’t simply reject or fail to reject the null hypothesis; we have a distribution of values. This distribution is centered at 0.036, so our most likely estimate is that the rate of Teslas is about 3.6 percentage points higher in SF than in Cary. We can also gauge how confident we are that the rate is higher in SF at all. As shown in the graph, zero difference is at the 16th percentile of this distribution, so 84% of the distribution lies above zero, and we can say there is roughly an 84% probability that the rate of Teslas is higher in San Francisco.
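The link between the percentile of zero and the probability that the difference is positive can be made concrete with a short sketch. The helper name `summarize_differences` and the ten toy values are hypothetical, chosen only so the arithmetic is easy to check by hand; a real analysis would pass in thousands of sampled differences.

```python
import statistics


def summarize_differences(diffs):
    """Summarize sampled rate differences (first area minus second area)."""
    share_at_or_below_zero = sum(d <= 0 for d in diffs) / len(diffs)
    return {
        "median": statistics.median(diffs),
        "percentile_of_zero": 100 * share_at_or_below_zero,
        "prob_positive": 1 - share_at_or_below_zero,
    }


# HYPOTHETICAL differences for illustration only; NOT sampled from the survey posteriors.
toy_diffs = [-0.02, -0.01, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08]
summary = summarize_differences(toy_diffs)
print(summary)
```

In the toy list, two of the ten values are at or below zero, so zero sits at the 20th percentile and the probability of a positive difference is 0.8. The same arithmetic turns a 16th percentile into 84%.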