People are arguing a bit about who isn’t getting vaccinated in America. Are the unvaccinated mostly poor people worried about missing work, or are they Republicans who are too contrarian for their own good?

Matt Bruenig posted vaccination statistics from the Census Bureau’s Household Pulse Survey that provide some hints. The post was deleted, but here’s a screenshot:

Tweet text: An update on the vaccination situation from the last Census Household Pulse Survey covering the last half of August

The plots show that the unvaccinated are, on average, younger, poorer, less educated, and less white than vaccinated Americans.

Out of curiosity I decided to take a closer look. The Census Bureau provides the raw response data (here), so it’s not much trouble to run your own analysis.

The most useful feature of this data for the debate is a question asking unvaccinated respondents why they didn’t get a vaccine; respondents could select multiple reasons from a list. The results clearly refute one side of the debate: a measly 2% of respondents say that the vaccine is hard to get, and only 3% are (incorrectly) worried about the cost. The responses are shown below:

The responses to this question show that resistance to vaccination is overwhelmingly driven by doubts about the vaccine and mistrust of vaccine promoters, and that lack of access is an almost negligible issue. A small minority, 8% of respondents, say their doctor doesn’t recommend taking the vaccine.

Another question is what drives the mistrust of vaccines. The obvious answer is right-wing media, but the summary statistics appear to contradict that. Instead, the unvaccinated are more likely to be part of groups (the young, the poor, racial minorities) that tend to vote for Democrats. I don’t see strong reasons for doubting these results: the standard errors of the estimates are small due to the large sample size. Nonresponse bias could still be a problem, but it’s not clear why it would bias the results in these directions.
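To put a rough number on "small standard errors", here is a back-of-the-envelope sketch in Python. It assumes simple random sampling, which the Household Pulse Survey is not (it relies on survey weights, so the true standard errors will be somewhat larger). It borrows the unweighted training-set counts from the random forest output later in this post (5,780 unvaccinated out of 47,517 respondents) purely as an illustration. The function name `proportion_se` is my own.

```python
import math

def proportion_se(successes, n):
    """Return a sample proportion and its standard error,
    assuming simple random sampling (an assumption, not the survey's design)."""
    p = successes / n
    return p, math.sqrt(p * (1 - p) / n)

# Unweighted counts taken from the confusion matrix later in the post.
unvaccinated = 230 + 5550            # 5,780 respondents
total = unvaccinated + 100 + 41637   # 47,517 respondents

p, se = proportion_se(unvaccinated, total)
print(f"share unvaccinated: {p:.4f}, standard error: {se:.4f}")
print(f"approx. 95% interval: {p - 1.96 * se:.4f} to {p + 1.96 * se:.4f}")
```

With a sample this size the standard error comes out at a fraction of a percentage point, so sampling noise alone is unlikely to explain the demographic gaps. Nonresponse bias is a separate problem that this calculation says nothing about.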

To get a better handle on this, I trained a random forest model in R to predict vaccination status [1]. Using the model, we can look to see which variables best predict whether someone is vaccinated. Here’s what R tells us about the trained model:

## 
## Call:
##  randomForest(formula = vaccinated ~ ., data = train) 
##                Type of random forest: classification
##                      Number of trees: 500
## No. of variables tried at each split: 12
## 
##         OOB estimate of  error rate: 11.89%
## Confusion matrix:
##       FALSE  TRUE class.error
## FALSE   230  5550 0.960207612
## TRUE    100 41637 0.002395956

What stands out here is that the model isn’t good at predicting lack of vaccination. It correctly classifies vaccinated respondents nearly 100% of the time, but it correctly classifies unvaccinated respondents only 4% of the time. This suggests that the Household Pulse Survey is missing important variables. My hunch is that the missing variables are political.
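The figures in that output can be reproduced by hand from the confusion matrix. The short Python sketch below does only that arithmetic; it does not retrain anything. It assumes the usual randomForest layout, where rows are the actual class and columns are the predicted class. The "always predict vaccinated" baseline is my own addition for comparison.

```python
# Confusion matrix copied from the randomForest output above.
# Outer key: actual class. Inner key: predicted class.
confusion = {
    False: {False: 230, True: 5550},    # actually unvaccinated
    True:  {False: 100, True: 41637},   # actually vaccinated
}

def class_error(actual):
    """Share of respondents in one actual class that the model got wrong."""
    row = confusion[actual]
    wrong = sum(count for predicted, count in row.items() if predicted != actual)
    return wrong / sum(row.values())

total = sum(sum(row.values()) for row in confusion.values())
misclassified = confusion[False][True] + confusion[True][False]
overall_error = misclassified / total

# Naive baseline (not in the original post): call everyone vaccinated.
baseline_error = sum(confusion[False].values()) / total

print(f"class error, unvaccinated: {class_error(False):.4f}")
print(f"class error, vaccinated:   {class_error(True):.4f}")
print(f"overall error:             {overall_error:.4%}")
print(f"baseline error:            {baseline_error:.4%}")
```

With these counts the overall error rate (about 11.9%) is only slightly below the roughly 12.2% you would get by guessing "vaccinated" for everyone, which is another way of seeing how little the model learns about the unvaccinated.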

When training this kind of model, it’s a good idea to check for overfitting. The error rate on my training data (the out-of-bag estimate reported above) and the error rate on the held-out test data are nearly the same, so overfitting isn’t an issue here.

## Prediction error rates (%):
##    train     test 
## 11.89048 11.92283

Now that we’ve checked that the model isn’t overfitting, we can see what the most important predictors are. I plot them below, starting with the most important:

A few interesting results: State, probably a proxy for politics, has by far the most influence. Race doesn’t seem important after accounting for other demographic variables. (Racial differences in skepticism caused by the Tuskegee experiment, for example, don’t appear to be important.) Although the survey includes a question about health insurance, that question is not an important predictor, directly contradicting recent speculation by Zeynep Tufekci in a New York Times op-ed.

What jumps out to me about this list of predictors is that they’re nearly all associated with either political leanings or levels of political knowledge (or both). Income and education, in particular, are both associated with political knowledge (see here for example).
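For readers wondering what "important predictor" means in practice, here is a minimal Python sketch of one common measure, permutation importance: shuffle a single column and see how much accuracy drops. The post doesn’t say which importance measure is plotted, so treat this as an illustration of the general idea rather than the method used here. The data, the feature names `signal` and `noise`, and the stand-in model `toy_predict` are all made up.

```python
import random

def accuracy(predict, rows, labels):
    """Share of rows where the prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, feature, rng, repeats=20):
    """Average drop in accuracy after shuffling one feature's values."""
    base = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(base - accuracy(predict, permuted, labels))
    return sum(drops) / repeats

# Hypothetical toy data: the label depends on "signal" and never on "noise".
rng = random.Random(0)
rows = [{"signal": rng.random(), "noise": rng.random()} for _ in range(500)]
labels = [r["signal"] > 0.5 for r in rows]

def toy_predict(row):
    """Stand-in for a trained model; it only looks at "signal"."""
    return row["signal"] > 0.5

for feature in ("signal", "noise"):
    score = permutation_importance(toy_predict, rows, labels, feature, rng)
    print(feature, round(score, 3))
```

Shuffling a feature the model relies on wrecks its accuracy, while shuffling one it ignores changes nothing. A weak showing for a variable like health insurance would mean the model’s predictions barely depend on it once the other variables are available.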

We can get further information by looking at the partial effects of each variable. These effects represent the influence each variable has on the predicted outcome, when changed independently of the other variables. In this model, higher values imply a higher probability of getting vaccinated.
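The averaging behind a partial effect can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: `toy_predict_proba` is an invented stand-in for a trained model, the features `x` and `z` are placeholders, and the curve is an average of raw probabilities. The plots in this post may use a different scale.

```python
def partial_dependence(predict_proba, rows, feature, grid):
    """For each grid value, force `feature` to that value in every row
    and average the model's predictions over the whole dataset."""
    curve = {}
    for value in grid:
        forced = [{**row, feature: value} for row in rows]
        curve[value] = sum(predict_proba(r) for r in forced) / len(forced)
    return curve

def toy_predict_proba(row):
    """Invented stand-in for a trained model's predicted probability."""
    return 0.2 + 0.1 * row["x"] + 0.05 * row["z"]

# Hypothetical dataset: every combination of x in 0..3 and z in 0..2.
rows = [{"x": x, "z": z} for x in range(4) for z in range(3)]

curve = partial_dependence(toy_predict_proba, rows, "x", grid=range(4))
for value, avg in curve.items():
    print(value, round(avg, 2))
```

Because every other column keeps its observed values, the resulting curve isolates how the prediction moves as one variable changes, averaged over the mix of respondents in the data.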

The two graphs below show that state partial effects are closely correlated with Donald Trump’s vote share in the 2020 election, suggesting that state is working as a proxy for politics. (Though there are other explanations that could also contribute to this pattern.)

(I got this vote share data from the MIT Election Data and Science Lab.)
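A claim like "closely correlated" can be checked with a Pearson correlation coefficient. The Python sketch below shows the calculation on four made-up pairs of numbers. They are not real states, partial effects, or vote shares, and the helper name `pearson` is my own.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustration only: higher vote share paired with a lower partial effect.
toy_vote_share = [0.30, 0.40, 0.50, 0.60]
toy_partial_effect = [0.8, 0.5, 0.4, 0.1]

r = pearson(toy_vote_share, toy_partial_effect)
print(f"correlation: {r:.3f}")
```

A value near -1 or +1 means the points fall close to a straight line. It does not say why they do, which is why other explanations for the state pattern can’t be ruled out from the correlation alone.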

Given that lower incomes are associated with voting for Democrats, you might expect lower income to be associated with higher vaccination rates, but the opposite is true:

But let’s take a closer look at that. For reference, here are the vote shares for Trump by income, taken from the Cooperative Congressional Election Study (CCES):

Lower-income voters were less likely to vote for Trump in 2020, but the effect isn’t huge, especially for households with incomes between $20,000 and $150,000. It wouldn’t be surprising for this effect to be overshadowed by the association of lower incomes with lower levels of political knowledge.

One oddball predictor, a question about COVID diagnoses, doesn’t seem related to either politics or political knowledge. It turns out that those who have been diagnosed with COVID are less likely to get vaccinated. This suggests that practical concerns may also play a role, with people who think they have lower risks (young people, the previously infected) getting vaccinated at lower rates. That’s consistent with some of the popular reasons given above, “I don’t believe I need a COVID-19 vaccine” and “I don’t think COVID-19 is that big of a threat”.

So to me it doesn’t look like people fail to get vaccinated as a direct result of the hardships of being poor or lack of health insurance or anything like that. Instead it looks like the unvaccinated are poorly informed people making a bad decision. It looks like right-wing political leanings often play into those bad decisions. And it looks like being part of a lower risk group can also play a role.


  1. The random forest method generates a collection of decision trees, and the trees are used to make predictions. Decision trees are much more flexible than tools like linear regression, so I don’t need to make strong assumptions about how vaccine resistance works to get these predictions.

