Building up datasets

We will 1. Build a dataset for percentage african-american by state (including DC) 2. read in excel file from Charles Gaba with %total pop vaxxed and Trump % by states & DC 3. join these together

# first get data for pct aa
# downloaded from 2020 census at
# https://www.censusscope.org/us/rank_race_blackafricanamerican.html
# made into datafile with {datapasta}

dat <- tibble::tribble(
         ~Rank,                 ~State, ~`Number.Selecting.Race.of.Black/African.American.Alone`, ~Total.Population, ~`Percent.Selecting.Race.of.Black/African.American.Alone`,
             1, "District of Columbia",                                                   343213,            572059,                                                  "60.00%",
             2,          "Mississippi",                                                  1033437,           2844658,                                                  "36.33%",
             3,            "Louisiana",                                                  1444566,           4468976,                                                  "32.32%",
             4,       "South Carolina",                                                  1182727,           4012012,                                                  "29.48%",
             5,              "Georgia",                                                  2342110,           8186453,                                                  "28.61%",
             6,             "Maryland",                                                  1468243,           5296486,                                                  "27.72%",
             7,              "Alabama",                                                  1153044,           4447100,                                                  "25.93%",
             8,       "North Carolina",                                                  1734154,           8049313,                                                  "21.54%",
             9,             "Virginia",                                                  1384008,           7078515,                                                  "19.55%",
            10,             "Delaware",                                                   148823,            783600,                                                  "18.99%",
            11,            "Tennessee",                                                   929864,           5689283,                                                  "16.34%",
            12,             "New York",                                                  2986242,          18976457,                                                  "15.74%",
            13,             "Arkansas",                                                   417881,           2673400,                                                  "15.63%",
            14,             "Illinois",                                                  1864619,          12419293,                                                  "15.01%",
            15,              "Florida",                                                  2312105,          15982378,                                                  "14.47%",
            16,             "Michigan",                                                  1401723,           9938444,                                                  "14.10%",
            17,           "New Jersey",                                                  1127266,           8414350,                                                  "13.40%",
            18,                "Texas",                                                  2385554,          20851820,                                                  "11.44%",
            19,                 "Ohio",                                                  1288359,          11353140,                                                  "11.35%",
            20,             "Missouri",                                                   622087,           5595211,                                                  "11.12%",
            21,         "Pennsylvania",                                                  1211669,          12281054,                                                   "9.87%",
            22,          "Connecticut",                                                   305902,           3405565,                                                   "8.98%",
            23,              "Indiana",                                                   504449,           6080485,                                                   "8.30%",
            24,             "Oklahoma",                                                   258532,           3450654,                                                   "7.49%",
            25,             "Kentucky",                                                   293915,           4041769,                                                   "7.27%",
            26,               "Nevada",                                                   132490,           1998257,                                                   "6.63%",
            27,           "California",                                                  2219190,          33871648,                                                   "6.55%",
            28,               "Kansas",                                                   150584,           2688418,                                                   "5.60%",
            29,            "Wisconsin",                                                   300355,           5363675,                                                   "5.60%",
            30,        "Massachusetts",                                                   337157,           6349097,                                                   "5.31%",
            31,         "Rhode Island",                                                    45236,           1048319,                                                   "4.32%",
            32,             "Nebraska",                                                    67435,           1711263,                                                   "3.94%",
            33,             "Colorado",                                                   159279,           4301261,                                                   "3.70%",
            34,               "Alaska",                                                    21968,            626932,                                                   "3.50%",
            35,            "Minnesota",                                                   167857,           4919479,                                                   "3.41%",
            36,           "Washington",                                                   185052,           5894121,                                                   "3.14%",
            37,        "West Virginia",                                                    55999,           1808344,                                                   "3.10%",
            38,              "Arizona",                                                   154316,           5130632,                                                   "3.01%",
            39,                 "Iowa",                                                    59758,           2926324,                                                   "2.04%",
            40,           "New Mexico",                                                    33513,           1819046,                                                   "1.84%",
            41,               "Hawaii",                                                    20945,           1211537,                                                   "1.73%",
            42,               "Oregon",                                                    53032,           3421399,                                                   "1.55%",
            43,        "New Hampshire",                                                     8984,           1235786,                                                   "0.73%",
            44,                 "Utah",                                                    16150,           2233169,                                                   "0.72%",
            45,              "Wyoming",                                                     3126,            493782,                                                   "0.63%",
            46,         "South Dakota",                                                     4518,            754844,                                                   "0.60%",
            47,         "North Dakota",                                                     3673,            642200,                                                   "0.57%",
            48,              "Vermont",                                                     2981,            608827,                                                   "0.49%",
            49,                "Maine",                                                     6047,           1274923,                                                   "0.47%",
            50,                "Idaho",                                                     5244,           1293953,                                                   "0.41%",
            51,              "Montana",                                                     2359,            902195,                                                   "0.26%"
         )

# Now build a dataframe of state names and abbreviations
states <- data.frame(state.abb, state.name) 

# add DC
de <- data.frame("DC", "District of Columbia")
names(de) <- c("state.abb", "state.name")

# bind together
states <- bind_rows(states, de)

# clean up
dat2 <- left_join(dat, states, by = c("State" = "state.name"))  %>% select(-Rank) %>% 
  relocate(State, state.abb) %>% 
  set_names('state', 'state_abb', 'aa_num', 'tot_pop', 'pct_aa') %>% mutate(pct_aa = parse_number(pct_aa))

dat2
## # A tibble: 51 x 5
##    state                state_abb  aa_num tot_pop pct_aa
##    <chr>                <chr>       <dbl>   <dbl>  <dbl>
##  1 District of Columbia DC         343213  572059   60  
##  2 Mississippi          MS        1033437 2844658   36.3
##  3 Louisiana            LA        1444566 4468976   32.3
##  4 South Carolina       SC        1182727 4012012   29.5
##  5 Georgia              GA        2342110 8186453   28.6
##  6 Maryland             MD        1468243 5296486   27.7
##  7 Alabama              AL        1153044 4447100   25.9
##  8 North Carolina       NC        1734154 8049313   21.5
##  9 Virginia             VA        1384008 7078515   19.6
## 10 Delaware             DE         148823  783600   19.0
## # ... with 41 more rows
# read in vax data, clean names,
# convert from proportion to percent
vax <- readxl::read_excel("vaxxes_by_state_red_blue_other.xlsx") %>% janitor::clean_names() %>% 
  mutate(trump_percent = 100* trump_percent) %>% 
  mutate(percent_total_pop_vaxxed = 100* percent_total_pop_vaxxed)

Joining data

You can also embed plots, for example:

vax_demog <- left_join(dat2, vax) %>% 
  janitor::clean_names()
## Joining, by = "state"
vax_demog
## # A tibble: 51 x 17
##    state            state_abb aa_num tot_pop pct_aa trump_percent cumul_cases_1k
##    <chr>            <chr>      <dbl>   <dbl>  <dbl>         <dbl>          <dbl>
##  1 District of Col~ DC        3.43e5  572059   60             5.4           70.4
##  2 Mississippi      MS        1.03e6 2844658   36.3          57.6          106. 
##  3 Louisiana        LA        1.44e6 4468976   32.3          58.5           99.8
##  4 South Carolina   SC        1.18e6 4012012   29.5          55.1          115. 
##  5 Georgia          GA        2.34e6 8186453   28.6          49.2          104. 
##  6 Maryland         MD        1.47e6 5296486   27.7          32.2           73.8
##  7 Alabama          AL        1.15e6 4447100   25.9          62.0          107. 
##  8 North Carolina   NC        1.73e6 8049313   21.5          49.9           94.8
##  9 Virginia         VA        1.38e6 7078515   19.6          44             77.6
## 10 Delaware         DE        1.49e5  783600   19.0          39.8          108. 
## # ... with 41 more rows, and 10 more variables: cumul_deaths_10k <dbl>,
## #   bachelors <dbl>, med_hh_income <dbl>, total_pop <dbl>,
## #   dose_equiv_needed <dbl>, doses <dbl>, doses_per_100k <dbl>, j_j_est <dbl>,
## #   dose_equivs <dbl>, percent_total_pop_vaxxed <dbl>
vax_demog$percent_total_pop_vaxxed
##  [1] 54.22960 29.31156 32.46006 36.08423 34.05570 46.98692 29.61555 38.43342
##  [9] 45.95004 45.74028 34.15777 46.31530 33.73774 43.44102 42.06562 41.59108
## [17] 46.92433 37.92969 41.02814 37.27400 46.27757 53.76829 36.27240 37.05195
## [25] 39.59623 39.17783 46.26807 38.79716 45.57027 53.96797 50.78738 42.50621
## [33] 46.41418 41.24348 45.82595 46.22629 35.62278 40.47817 43.66605 48.99970
## [41] 51.72683 44.64318 46.10259 37.92714 33.93485 43.45425 39.60440 55.72396
## [49] 52.78062 33.10704 39.30872

Now model

Simple model

  • trump % is the only predictor of vax %.
vax_demog %>% 
 lm(percent_total_pop_vaxxed ~  trump_percent, data =  .) %>% broom::glance()
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.653         0.646  3.92      92.2 7.64e-13     1  -141.  288.  294.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

This model, with only one predictor, predicts 64.6% of the variance in vaccination. This is a LOT for a social outcomes model.

vax_demog %>% 
 lm(percent_total_pop_vaxxed ~  trump_percent, data =  .) %>% broom::tidy()
## # A tibble: 2 x 5
##   term          estimate std.error statistic  p.value
##   <chr>            <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)     64.1      2.34       27.4  2.33e-31
## 2 trump_percent   -0.444    0.0462     -9.60 7.64e-13

Model interpretation -

in a hypothetical US state with no Trump voters, 64.1% of the people would be vaccinated (Intercept). For every one percent increase in Trump voters in each state (on average), the percent vaccinated goes down by 0.44%. This is a highly significant effect (p value 7.64 x 10^-13).

Better model

vax_demog %>% 
 lm(percent_total_pop_vaxxed ~ pct_aa + trump_percent, data =  .) %>% broom::glance()
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.843         0.836  2.67      128. 5.35e-20     2  -121.  250.  257.
## # ... with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

This model, with two predictors, predicts 83.6% of the variance in vaccination. This is an ENORMOUS AMOUNT for a social outcomes model.

vax_demog %>% 
 lm(percent_total_pop_vaxxed ~ pct_aa + trump_percent, data =  .) %>% broom::tidy()
## # A tibble: 3 x 5
##   term          estimate std.error statistic  p.value
##   <chr>            <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)     70.5      1.80       39.1  4.54e-38
## 2 pct_aa          -0.255    0.0335     -7.61 8.73e-10
## 3 trump_percent   -0.518    0.0329    -15.7  1.41e-20

Model interpretation -

in a hypothetical US state with no Trump voters and no African-Americans, 70.5% of the people would be vaccinated (Intercept). For every one percent increase in African-Americans in each state (on average), the percent vaccinated goes down by 0.26%. This is a highly significant effect (p value 8.73 x 10^-10). For every one percent increase in Trump voters in each state (on average), the percent vaccinated goes down by 0.518%. This is also a highly significant effect (p value 1.41 x 10^-20). By controlling for the effect of African-American population, we have revealed a stronger (and more significant) effect of Trump voting. We also have a better overall prediction model.

Better Model Predictions

Let’s use this better model to make predictions about each state, and see how accurate these are. In states that are less accurate, other variables (that we have not included in the model) may be important, and could be causing the variance between our predictions and the actual percent_total_pop_vaxxed.

predicted <- vax_demog %>% 
 lm(percent_total_pop_vaxxed ~ pct_aa + trump_percent, data =  .) %>% 
  broom::augment() %>% 
  rename(predicted = .fitted) 

predictions_by_state <- vax_demog %>% 
  select(state, trump_percent) %>% 
  right_join(predicted)
## Joining, by = "trump_percent"
predictions_by_state %>% 
  select(state, trump_percent, pct_aa, percent_total_pop_vaxxed, predicted) %>% 
  flextable::flextable()

Most of these predictions are pretty accurate, between 1-3% of the actual values. There may be a west coast vaccine fear factor, as several states are lower than predicted by these 2 variables, and the Dakotas are higher than predicted.