What’s in this data?

The data analysis below examines why California counties opposed Prop 6, the 2024 ballot measure that would have banned involuntary servitude as punishment for crime, ending forced labor for incarcerated people. County-level Prop 6 results are compared with presidential election results, education levels, income levels, the share of non-white residents in each county and other factors.

Sources:
–Prop 6 final vote outcomes by county: The New York Times and California Secretary of State, Dec. 13 https://www.nytimes.com/interactive/2024/11/05/us/elections/results-california-proposition-6-end-involuntary-labor-by-the-incarcerated.html
–Presidential election outcomes by county: The New York Times and California Secretary of State, Dec. 13 https://www.sos.ca.gov/elections/prior-elections/statewide-election-results/general-election-nov-5-2024/statement-vote
–County imprisonment rates (total adult imprisonments per 100,000 residents ages 18-69, 2016): California Sentencing Institute http://casi.cjcj.org/about.html#download
–Other demographic variables by county: 2022 American Community Survey https://data.census.gov/
–County shapefiles: 2024 TIGER/Line shapefiles https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2024&layergroup=Counties+%28and+equivalent%29

library(readxl)    # read_excel()
library(tidyverse) # dplyr, ggplot2, stringr, readr and friends
library(janitor)   # clean_names()
library(sf)        # st_read() and spatial data frames

file_path <- "../data/Prop6_keyvalues.xlsx" # path to the county-level spreadsheet
data <- read_excel(file_path, sheet = "Prop6_keyvalues") # import the Prop6_keyvalues sheet

Let’s standardize the column names, then view the data and check it for problems.

# standardize header names
data <- data %>%
  clean_names()

head(data)
## # A tibble: 6 × 13
##   geography  cofips county prop6_result prop6_margin_ppts median_household_inc…¹
##   <chr>      <chr>  <chr>  <chr>                    <dbl>                  <dbl>
## 1 0500000US… 06001  Alame… YES                         18                 122488
## 2 0500000US… 06003  Alpine YES                          4                 101125
## 3 0500000US… 06005  Amador NO                         -44                  74853
## 4 0500000US… 06007  Butte  NO                         -22                  66085
## 5 0500000US… 06009  Calav… NO                         -42                  77526
## 6 0500000US… 06011  Colusa NO                         -48                  69619
## # ℹ abbreviated name: ¹​median_household_income
## # ℹ 7 more variables: percent_income_below_the_poverty_level <dbl>,
## #   pct_black <dbl>, pct_hispanic <dbl>, percent_non_white <dbl>,
## #   total_adult_imprisonments_per_100_000_population_age_18_69_2016 <dbl>,
## #   margin_harris_negative <dbl>, pct_collegegrads <dbl>
summary(data) # min, quartile cutoffs, mean and max for each numeric column
##   geography            cofips             county          prop6_result      
##  Length:58          Length:58          Length:58          Length:58         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  prop6_margin_ppts median_household_income
##  Min.   :-56.00    Min.   : 47317         
##  1st Qu.:-37.50    1st Qu.: 64143         
##  Median :-22.00    Median : 76148         
##  Mean   :-20.14    Mean   : 82967         
##  3rd Qu.: -6.00    3rd Qu.: 98694         
##  Max.   : 34.00    Max.   :153792         
##  percent_income_below_the_poverty_level   pct_black        pct_hispanic   
##  Min.   : 6.40                          Min.   :0.00000   Min.   :0.0754  
##  1st Qu.:10.43                          1st Qu.:0.01153   1st Qu.:0.1535  
##  Median :13.10                          Median :0.01747   Median :0.2682  
##  Mean   :13.18                          Mean   :0.02860   Mean   :0.3172  
##  3rd Qu.:16.20                          3rd Qu.:0.03223   3rd Qu.:0.4587  
##  Max.   :22.00                          Max.   :0.12944   Max.   :0.8541  
##  percent_non_white
##  Min.   :0.1337   
##  1st Qu.:0.3116   
##  Median :0.4998   
##  Mean   :0.4844   
##  3rd Qu.:0.6730   
##  Max.   :0.9043   
##  total_adult_imprisonments_per_100_000_population_age_18_69_2016
##  Min.   : 126.6                                                 
##  1st Qu.: 304.9                                                 
##  Median : 471.4                                                 
##  Mean   : 484.0                                                 
##  3rd Qu.: 620.1                                                 
##  Max.   :1338.1                                                 
##  margin_harris_negative pct_collegegrads
##  Min.   :-65.000        Min.   :0.2113  
##  1st Qu.:-27.000        1st Qu.:0.2985  
##  Median :  1.700        Median :0.3508  
##  Mean   : -2.958        Mean   :0.3840  
##  3rd Qu.: 21.000        3rd Qu.:0.4808  
##  Max.   : 54.000        Max.   :0.6745
# Identify missing values - there are none
data %>%
  summarise(across(everything(), ~ sum(is.na(.))))
## # A tibble: 1 × 13
##   geography cofips county prop6_result prop6_margin_ppts median_household_income
##       <int>  <int>  <int>        <int>             <int>                   <int>
## 1         0      0      0            0                 0                       0
## # ℹ 7 more variables: percent_income_below_the_poverty_level <int>,
## #   pct_black <int>, pct_hispanic <int>, percent_non_white <int>,
## #   total_adult_imprisonments_per_100_000_population_age_18_69_2016 <int>,
## #   margin_harris_negative <int>, pct_collegegrads <int>

Quartile Analysis

By splitting each numeric variable into quartiles, we can check whether counties in each bucket had different levels of support for Prop 6.

# List of columns to split into quartiles: every numeric predictor (the outcome, prop6_margin_ppts, is left out)
columns_to_quartile <- c(
  "median_household_income",
  "percent_income_below_the_poverty_level", 
  "pct_black", 
  "pct_hispanic", 
  "percent_non_white", 
  "total_adult_imprisonments_per_100_000_population_age_18_69_2016", 
  "margin_harris_negative", 
  "pct_collegegrads"
)

# Add new quartile columns for each variable
data <- data %>%
  mutate(across(all_of(columns_to_quartile), 
                ~ ntile(.x, 4), 
                .names = "{.col}_quartile"))
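`ntile()` assigns quartiles by rank, not by the cutoffs printed by `summary()`. It sorts the values and deals them into four groups as evenly as possible. Because 58 counties can't be split into four equal groups, the group sizes differ by at most one. The minimal sketch below demonstrates both properties on made-up values. The `fake_income` vector is hypothetical and stands in for the 58 counties.

```r
library(dplyr)

# 58 made-up values standing in for the 58 California counties (hypothetical data)
set.seed(1)
fake_income <- runif(58, min = 47000, max = 154000)

# ntile() ranks the values and cuts the ranks into 4 buckets
quartile <- ntile(fake_income, 4)

# 58 is not divisible by 4, so bucket sizes differ by at most one
table(quartile)

# every value in a higher bucket is larger than every value in a lower bucket
tapply(fake_income, quartile, range)
```

To check bucket sizes on the real table, run `count(data, median_household_income_quartile)` for any of the new `_quartile` columns.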

# make a summary table showing county support for prop 6 based on which quartile the county falls into for household income
# remember: negative values mean a stronger vote against prop 6
prop6_houseincome <- data %>%
  group_by(median_household_income_quartile) %>%
  summarize(average_prop6_support = mean(prop6_margin_ppts, na.rm = TRUE))

head(prop6_houseincome) # view data
## # A tibble: 4 × 2
##   median_household_income_quartile average_prop6_support
##                              <int>                 <dbl>
## 1                                1               -30    
## 2                                2               -34.9  
## 3                                3               -13.6  
## 4                                4                -0.286
write.csv(prop6_houseincome, "../data/prop6_houseincome.csv", row.names = FALSE) # export data
# counties in the top two household income quartiles --> much more prop 6 support (smaller margins of defeat) than the bottom two

# make another summary table and export it but for percent of black pop in county
prop6_black <- data %>%
  group_by(pct_black_quartile) %>%
  summarize(average_prop6_support = mean(prop6_margin_ppts, na.rm = TRUE))

head(prop6_black)
## # A tibble: 4 × 2
##   pct_black_quartile average_prop6_support
##                <int>                 <dbl>
## 1                  1                 -24.5
## 2                  2                 -21.7
## 3                  3                 -20.3
## 4                  4                 -13.6
write.csv(prop6_black, "../data/prop6_black.csv", row.names = FALSE)
# greater pct of black people in county --> more prop 6 support

# another summary table for percent of hispanic pop
prop6_hispanic <- data %>%
  group_by(pct_hispanic_quartile) %>%
  summarize(average_prop6_support = mean(prop6_margin_ppts, na.rm = TRUE))

head(prop6_hispanic)
## # A tibble: 4 × 2
##   pct_hispanic_quartile average_prop6_support
##                   <int>                 <dbl>
## 1                     1                 -29.7
## 2                     2                 -11.2
## 3                     3                 -13.4
## 4                     4                 -26.1
write.csv(prop6_hispanic, "../data/prop6_hispanic.csv", row.names = FALSE)
# the pattern is not monotonic here: counties with relatively few and relatively many hispanic residents showed less support for prop 6 than counties with a moderate share

# another summary table for percent of nonwhite pop
prop6_nonwhite <- data %>%
  group_by(percent_non_white_quartile) %>%
  summarize(average_prop6_support = mean(prop6_margin_ppts, na.rm = TRUE))

head(prop6_nonwhite)
## # A tibble: 4 × 2
##   percent_non_white_quartile average_prop6_support
##                        <int>                 <dbl>
## 1                          1                 -29.2
## 2                          2                 -20.3
## 3                          3                 -11.1
## 4                          4                 -19.3
write.csv(prop6_nonwhite, "../data/prop6_nonwhite.csv", row.names = FALSE)
# support rises from the least to the third-most non-white quartile, then dips for the most non-white counties, which still backed prop 6 more than the whitest counties

# another summary table for the adult imprisonment rate
prop6_incarcerated <- data %>%
  group_by(total_adult_imprisonments_per_100_000_population_age_18_69_2016_quartile) %>%
  summarize(average_prop6_support = mean(prop6_margin_ppts, na.rm = TRUE))

head(prop6_incarcerated)
## # A tibble: 4 × 2
##   total_adult_imprisonments_per_100_000_population_age_1…¹ average_prop6_support
##                                                      <int>                 <dbl>
## 1                                                        1                  -7.6
## 2                                                        2                 -14.4
## 3                                                        3                 -27.9
## 4                                                        4                 -32  
## # ℹ abbreviated name:
## #   ¹​total_adult_imprisonments_per_100_000_population_age_18_69_2016_quartile
write.csv(prop6_incarcerated, "../data/prop6_incarcerated.csv", row.names = FALSE)
# higher imprisonment rates track with less prop 6 support

# another summary table for presidential margin of victory
# positive values are Trump victories, negative values are Harris victories
prop6_trumpsupport <- data %>%
  group_by(margin_harris_negative_quartile) %>%
  summarize(average_prop6_support = mean(prop6_margin_ppts, na.rm = TRUE))

head(prop6_trumpsupport)
## # A tibble: 4 × 2
##   margin_harris_negative_quartile average_prop6_support
##                             <int>                 <dbl>
## 1                               1                  6.27
## 2                               2                -15.6 
## 3                               3                -30.7 
## 4                               4                -42.7
write.csv(prop6_trumpsupport, "../data/prop6_trumpsupport.csv", row.names = FALSE)
# more support for Harris tracks with more support for prop 6

# another summary table for percent of college grads
prop6_collegegrads <- data %>%
  group_by(pct_collegegrads_quartile) %>%
  summarize(average_prop6_support = mean(prop6_margin_ppts, na.rm = TRUE))

head(prop6_collegegrads)
## # A tibble: 4 × 2
##   pct_collegegrads_quartile average_prop6_support
##                       <int>                 <dbl>
## 1                         1                -36.1 
## 2                         2                -28.5 
## 3                         3                -12.6 
## 4                         4                 -1.57
write.csv(prop6_collegegrads, "../data/prop6_collegegrads.csv", row.names = FALSE)
# more college grads in a county --> more support for prop 6
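The seven summary blocks above repeat the same steps with a different grouping column. One quartile column, `percent_income_below_the_poverty_level_quartile`, is created but never summarized. Below is a minimal sketch of a helper that does the same work for any column and also counts the counties in each quartile. The toy tibble is hypothetical and only mimics the real column names. The function name `quartile_summary` is illustrative.

```r
library(dplyr)

# Hypothetical toy data that mimics the real column names (values are made up)
toy <- tibble(
  prop6_margin_ppts = c(-40, -30, -10, 5, -20, -35, 0, 10),
  pct_collegegrads  = c(0.22, 0.25, 0.40, 0.60, 0.33, 0.28, 0.45, 0.65),
  pct_black         = c(0.010, 0.020, 0.030, 0.100, 0.015, 0.005, 0.050, 0.120)
)

# Average Prop 6 margin within each quartile of one variable (col_name is a string)
quartile_summary <- function(df, col_name) {
  df %>%
    mutate(quartile = ntile(.data[[col_name]], 4)) %>%
    group_by(quartile) %>%
    summarize(
      average_prop6_support = mean(prop6_margin_ppts, na.rm = TRUE),
      n_counties = n(),
      .groups = "drop"
    ) %>%
    mutate(variable = col_name, .before = 1)
}

# One long table covering every requested variable
all_quartiles <- bind_rows(
  lapply(c("pct_collegegrads", "pct_black"), function(v) quartile_summary(toy, v))
)
all_quartiles
```

On the real table, `bind_rows(lapply(columns_to_quartile, function(v) quartile_summary(data, v)))` would produce one table covering all eight variables, including poverty. That table could be exported with a single `write.csv()` call.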

So what do we know about the California counties that showed more opposition to Prop 6? Based on this quartile analysis, they differ from the more supportive counties in several ways. They tend to have higher imprisonment rates, more white residents and fewer Black residents, lower household incomes and fewer college graduates. They also voted more strongly for Trump in the presidential election.

Map Prop 6 support

A map of county-level Prop 6 support will help our viewers and readers see exactly where the most opposed counties are.

# import shapefiles for US counties from TIGER 2024
uscounties_shp <- st_read("../data/tl_2024_us_county.shp")
## Reading layer `tl_2024_us_county' from data source 
##   `/Users/CASTJ650/Documents/R projects/propsix/data/tl_2024_us_county.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 3235 features and 18 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.43979
## Geodetic CRS:  NAD83
head(uscounties_shp)
## Simple feature collection with 6 features and 18 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -123.7283 ymin: 18.11774 xmax: -65.81565 ymax: 46.38562
## Geodetic CRS:  NAD83
##   STATEFP COUNTYFP COUNTYNS GEOID        GEOIDFQ        NAME
## 1      31      039 00835841 31039 0500000US31039      Cuming
## 2      53      069 01513275 53069 0500000US53069   Wahkiakum
## 3      35      011 00933054 35011 0500000US35011     De Baca
## 4      31      109 00835876 31109 0500000US31109   Lancaster
## 5      31      129 00835886 31129 0500000US31129    Nuckolls
## 6      72      085 01804523 72085 0500000US72085 Las Piedras
##                NAMELSAD LSAD CLASSFP MTFCC CSAFP CBSAFP METDIVFP FUNCSTAT
## 1         Cuming County   06      H1 G4020  <NA>   <NA>     <NA>        A
## 2      Wahkiakum County   06      H1 G4020  <NA>   <NA>     <NA>        A
## 3        De Baca County   06      H1 G4020  <NA>   <NA>     <NA>        A
## 4      Lancaster County   06      H1 G4020   339  30700     <NA>        A
## 5       Nuckolls County   06      H1 G4020  <NA>   <NA>     <NA>        A
## 6 Las Piedras Municipio   13      H1 G4020   490  41980     <NA>        A
##        ALAND   AWATER    INTPTLAT     INTPTLON                       geometry
## 1 1477563042 10772508 +41.9158651 -096.7885168 MULTIPOLYGON (((-96.55525 4...
## 2  680980773 61564428 +46.2946377 -123.4244583 MULTIPOLYGON (((-123.7276 4...
## 3 6016818941 29090018 +34.3592729 -104.3686961 MULTIPOLYGON (((-104.8934 3...
## 4 2169269508 22850511 +40.7835474 -096.6886584 MULTIPOLYGON (((-96.68493 4...
## 5 1489645201  1718484 +40.1764918 -098.0468422 MULTIPOLYGON (((-98.2737 40...
## 6   87748419    32509 +18.1871483 -065.8711890 MULTIPOLYGON (((-65.85703 1...
# filter for California counties (state FIPS code "06")
california_shp <- uscounties_shp %>%
  filter(STATEFP == "06")
nrow(california_shp) # check: should be 58, one row per California county

# join data with the shapefile using five-digit county FIPS codes
prop6_map_data <- california_shp %>%
  left_join(data, by = c("GEOID" = "cofips"))
sum(is.na(prop6_map_data$prop6_margin_ppts)) # check: 0 means every county found a match
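To see exactly which rows fail to match in a join, `anti_join()` returns the rows of one table that have no partner in the other. A common failure with spreadsheet imports is a FIPS code that lost its leading zero, for example "6005" instead of "06005". The `head()` output above suggests `cofips` is already a five-character string here. The sketch below therefore uses hypothetical toy tables to show both the check and a `str_pad()` repair.

```r
library(dplyr)
library(stringr)

# Hypothetical toy tables: shapefile attributes and vote data.
# The "6005" code simulates a FIPS code that lost its leading zero.
shapes <- tibble(GEOID = c("06001", "06003", "06005"),
                 NAME  = c("Alameda", "Alpine", "Amador"))
votes  <- tibble(cofips = c("06001", "06003", "6005"),
                 prop6_margin_ppts = c(18, 4, -44))

# Rows of votes with no matching GEOID in shapes
unmatched <- anti_join(votes, shapes, by = c("cofips" = "GEOID"))
unmatched

# Repair: left-pad every code to five characters with zeros, then re-check
votes_fixed <- votes %>%
  mutate(cofips = str_pad(cofips, width = 5, side = "left", pad = "0"))
unmatched_fixed <- anti_join(votes_fixed, shapes, by = c("cofips" = "GEOID"))
nrow(unmatched_fixed)
```

On the real data, `anti_join(data, st_drop_geometry(california_shp), by = c("cofips" = "GEOID"))` should return zero rows if every county matched.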

# plot the choropleth map
ggplot(prop6_map_data) +
  geom_sf(aes(fill = prop6_margin_ppts), color = "white") +
  scale_fill_gradient2(low = "red",       # Color for values below 0
                       mid = "white",     # Neutral color at 0
                       high = "green",    # Color for values above 0
                       midpoint = 0,      # Center of the gradient
                       na.value = "gray80", 
                       name = "Vote Margin") +
  theme_minimal() +
  labs(title = "Prop 6 Support by California County",
       caption = "Source: California Secretary of State") +
  theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank())

# export the map as PNG file
ggsave("../data/california_prop6.png", 
       plot = last_plot(),  # Use the last plotted map
       width = 10,          # Width in inches
       height = 8,          # Height in inches
       dpi = 300)           # Resolution: 300 DPI for print-quality output

Run regressions

Next, we want to test whether these relationships are statistically significant, so we put the variables into linear regressions. The outcome variable is the Prop 6 margin of victory in percentage points. It is positive when Yes won a county and negative when No won.

#run regression on prop 6 results and presidential results
regression_harris <- lm(prop6_margin_ppts ~ margin_harris_negative, data=data)
summary(regression_harris) # very strong relationship (R-squared 0.945), so we run this model separately so it doesn't swamp the other relationships in the full model
## 
## Call:
## lm(formula = prop6_margin_ppts ~ margin_harris_negative, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.8422 -3.2617  0.1281  3.5270 13.5149 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -22.07456    0.62015  -35.60   <2e-16 ***
## margin_harris_negative  -0.65476    0.02112  -31.01   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.699 on 56 degrees of freedom
## Multiple R-squared:  0.945,  Adjusted R-squared:  0.944 
## F-statistic: 961.4 on 1 and 56 DF,  p-value: < 2.2e-16
#run regression model on the remainder of the variables to see what else correlates
regression_full_model <- lm(
  prop6_margin_ppts ~ median_household_income + percent_income_below_the_poverty_level + percent_non_white + total_adult_imprisonments_per_100_000_population_age_18_69_2016 + pct_collegegrads, data=data
)
summary(regression_full_model)
## 
## Call:
## lm(formula = prop6_margin_ppts ~ median_household_income + percent_income_below_the_poverty_level + 
##     percent_non_white + total_adult_imprisonments_per_100_000_population_age_18_69_2016 + 
##     pct_collegegrads, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -22.2780  -7.0976  -0.2619   5.9943  29.0581 
## 
## Coefficients:
##                                                                   Estimate
## (Intercept)                                                     -1.008e+02
## median_household_income                                         -1.537e-04
## percent_income_below_the_poverty_level                           1.152e+00
## percent_non_white                                                3.762e+01
## total_adult_imprisonments_per_100_000_population_age_18_69_2016 -1.154e-02
## pct_collegegrads                                                 1.707e+02
##                                                                 Std. Error
## (Intercept)                                                      1.580e+01
## median_household_income                                          2.027e-04
## percent_income_below_the_poverty_level                           6.754e-01
## percent_non_white                                                1.133e+01
## total_adult_imprisonments_per_100_000_population_age_18_69_2016  7.982e-03
## pct_collegegrads                                                 3.238e+01
##                                                                 t value
## (Intercept)                                                      -6.379
## median_household_income                                          -0.759
## percent_income_below_the_poverty_level                            1.706
## percent_non_white                                                 3.320
## total_adult_imprisonments_per_100_000_population_age_18_69_2016  -1.446
## pct_collegegrads                                                  5.272
##                                                                 Pr(>|t|)    
## (Intercept)                                                     4.82e-08 ***
## median_household_income                                          0.45152    
## percent_income_below_the_poverty_level                           0.09389 .  
## percent_non_white                                                0.00165 ** 
## total_adult_imprisonments_per_100_000_population_age_18_69_2016  0.15409    
## pct_collegegrads                                                2.64e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.1 on 52 degrees of freedom
## Multiple R-squared:  0.7149, Adjusted R-squared:  0.6875 
## F-statistic: 26.08 on 5 and 52 DF,  p-value: 4.459e-13
# results: pct_collegegrads and percent_non_white are both significant at the 0.01 level
# pct_collegegrads has the largest t value (5.27) and the smallest p-value
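Median household income, the poverty rate and the share of college graduates are likely to move together across counties. Correlated predictors can inflate standard errors, which may be part of why income is not significant in the full model. One way to check is the variance inflation factor (VIF) for each predictor. The VIF is 1 / (1 - R²) from regressing that predictor on the other predictors. Below is a minimal sketch with a hypothetical helper `vif_manual()`, demonstrated on R's built-in `mtcars` data because the county spreadsheet isn't bundled here.

```r
# Variance inflation factor for each predictor: regress it on the other
# predictors and compute 1 / (1 - R^2)
vif_manual <- function(df, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    fit <- lm(reformulate(others, response = p), data = df)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Demonstration on mtcars, where engine size, horsepower and weight are correlated
vif_values <- vif_manual(mtcars, c("disp", "hp", "wt"))
round(vif_values, 2)
```

On the county table, you would pass `data` and the five predictor names from `regression_full_model`. The `car` package's `vif()` should give the same numbers for a model with only numeric predictors. Common rules of thumb flag values above about 5 or 10.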

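What do these regressions tell us? Of all the variables in our data, a county's support for Kamala Harris is by far the strongest predictor of its support for Prop 6. The presidential margin alone explains about 95% of the county-to-county variation (R-squared 0.945). Among the demographic variables, counties with more college graduates and a higher share of non-white residents also gave Prop 6 more support, holding the other variables constant. Median household income and the imprisonment rate were not statistically significant once the other variables were included, even though both showed clear patterns in the quartile analysis.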
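To make the first regression concrete, here is its fitted line with the coefficients rounded from the output above. P is a county's predicted Prop 6 margin and M is `margin_harris_negative` (Trump minus Harris, in points). The worked values below are arithmetic on the printed coefficients, not additional model output.

```latex
\begin{aligned}
\hat{P} &= -22.07 - 0.655\,M \\
M = -30 \text{ (Harris by 30)}:\quad \hat{P} &= -22.07 + 0.655 \times 30 = -22.07 + 19.65 \approx -2.4 \\
\hat{P} = 0:\quad M &= -\frac{22.07}{0.655} \approx -33.7
\end{aligned}
```

So a county where the presidential race was tied (M = 0) is predicted to reject Prop 6 by about 22 points. The model predicts Prop 6 passing only in counties Harris won by roughly 34 points or more.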

Prop 6 results vs. presidential results

Support for Prop 6 and support for Kamala Harris are highly positively correlated, but that is not the full story: Harris won a majority of California voters, while a majority of voters rejected Prop 6. Which counties had the greatest difference between their support for Harris and their support for Prop 6? Which demographic factors might play a role?
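The quick math in the next chunk turns margins into vote shares. Write M for `margin_harris_negative` (Trump minus Harris) and P for `prop6_margin_ppts` (Yes minus No). Assume each contest has only two options whose shares sum to 100. The shares and their difference then follow directly:

```latex
\begin{aligned}
H + T = 100,\; T - H = M &\;\Rightarrow\; H = \frac{100 - M}{2} \\
Y + N = 100,\; Y - N = P &\;\Rightarrow\; Y = \frac{100 + P}{2} \\
H - Y &= \frac{100 - M}{2} - \frac{100 + P}{2} = -\frac{M + P}{2}
\end{aligned}
```

Take a hypothetical county with M = -40 (Harris by 40) and P = -10. It gives H = 70, Y = 45 and H - Y = 25. The differential is therefore the negative average of the two margins: it grows when Harris does well and Prop 6 does poorly.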

# Add a column for Harris's vote share (0-100)
# Quick math assumes Trump and Harris are the only candidates
data <- data %>%
  mutate(harris_voteshare = (100 - margin_harris_negative) / 2)

# Add new column to make vote share for Prop 6 positive values
data <- data %>%
  mutate(prop6_voteshare = (100 + prop6_margin_ppts) / 2)

# Add a column for Harris vote share minus Prop 6 Yes share
# Higher values mean counties with relatively more Harris support compared to Prop 6 support
data <- data %>%
  mutate(harris_minus_prop6 = harris_voteshare - prop6_voteshare) 

# we also need to add this column to our map data before we map vote differentials
prop6_map_data <- prop6_map_data %>%
  mutate(harris_voteshare = (100 - margin_harris_negative) / 2) %>%
  mutate(prop6_voteshare = (100 + prop6_margin_ppts) / 2) %>% 
  mutate(harris_minus_prop6 = harris_voteshare - prop6_voteshare) 
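The conclusion below refers to the counties with the highest vote differentials, but the chunk above only computes the column. Below is a minimal sketch of how to rank them with the same formulas. It runs on a hypothetical three-county toy table whose names and margins are made up. On the real table, the same `arrange()` and `select()` steps would apply to `data`, perhaps followed by `head(10)`.

```r
library(dplyr)

# Hypothetical toy counties with made-up margins (not real results)
toy <- tibble(
  county = c("County B", "County A", "County C"),
  margin_harris_negative = c(-10, -50, 30),  # Trump minus Harris, pts
  prop6_margin_ppts      = c(-25, 10, -45)   # Yes minus No, pts
)

# Same quick math as above, then rank from largest to smallest differential
ranked <- toy %>%
  mutate(
    harris_voteshare   = (100 - margin_harris_negative) / 2,
    prop6_voteshare    = (100 + prop6_margin_ppts) / 2,
    harris_minus_prop6 = harris_voteshare - prop6_voteshare
  ) %>%
  arrange(desc(harris_minus_prop6)) %>%
  select(county, harris_voteshare, prop6_voteshare, harris_minus_prop6)

ranked
```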

What can we conclude? The counties with the highest vote differentials help explain why California voted for Harris while rejecting Prop 6. In other words, given how strongly these counties supported Harris, we would have expected them to show even MORE support for Prop 6 than they did.

Now let’s try plotting the vote differentials on a map.

# plot the choropleth map using the map data we already prepped
ggplot(prop6_map_data) +
  geom_sf(aes(fill = harris_minus_prop6), color = "white") + 
  scale_fill_viridis_c(option = "plasma", direction = -1, na.value = "gray80", 
                       name = "Vote Difference") +
  theme_minimal() +
  labs(title = "Vote Difference by California County",
       subtitle = "Harris Vote Share Minus Prop 6 Yes Share (points)",
       caption = "Source: California Secretary of State") +
  theme(axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank())

# export the map as PNG file
ggsave("../data/california_vote_difference_map.png", 
       plot = last_plot(),  # Use the last plotted map
       width = 10,          # Width in inches
       height = 8,          # Height in inches
       dpi = 300)           # Resolution: 300 DPI for print-quality output

Many of the counties with the largest differentials are in the Bay Area, so that region could be worth exploring further.

Let’s see which of our variables correlate with high vote differentials by running another regression.

# run regression on the difference between harris and prop6 support
regression_harris_minus_prop6 <- lm(
  harris_minus_prop6 ~ median_household_income + percent_income_below_the_poverty_level + percent_non_white + total_adult_imprisonments_per_100_000_population_age_18_69_2016 + pct_collegegrads, data=data
)
summary(regression_harris_minus_prop6) 
## 
## Call:
## lm(formula = harris_minus_prop6 ~ median_household_income + percent_income_below_the_poverty_level + 
##     percent_non_white + total_adult_imprisonments_per_100_000_population_age_18_69_2016 + 
##     pct_collegegrads, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.6669 -1.6254  0.1209  1.7809  6.5754 
## 
## Coefficients:
##                                                                   Estimate
## (Intercept)                                                     -2.560e+00
## median_household_income                                          5.010e-05
## percent_income_below_the_poverty_level                          -5.693e-02
## percent_non_white                                                7.377e+00
## total_adult_imprisonments_per_100_000_population_age_18_69_2016 -2.739e-03
## pct_collegegrads                                                 2.202e+01
##                                                                 Std. Error
## (Intercept)                                                      4.822e+00
## median_household_income                                          6.187e-05
## percent_income_below_the_poverty_level                           2.062e-01
## percent_non_white                                                3.459e+00
## total_adult_imprisonments_per_100_000_population_age_18_69_2016  2.437e-03
## pct_collegegrads                                                 9.885e+00
##                                                                 t value
## (Intercept)                                                      -0.531
## median_household_income                                           0.810
## percent_income_below_the_poverty_level                           -0.276
## percent_non_white                                                 2.133
## total_adult_imprisonments_per_100_000_population_age_18_69_2016  -1.124
## pct_collegegrads                                                  2.227
##                                                                 Pr(>|t|)  
## (Intercept)                                                       0.5978  
## median_household_income                                           0.4217  
## percent_income_below_the_poverty_level                            0.7835  
## percent_non_white                                                 0.0377 *
## total_adult_imprisonments_per_100_000_population_age_18_69_2016   0.2662  
## pct_collegegrads                                                  0.0303 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.388 on 52 degrees of freedom
## Multiple R-squared:  0.6656, Adjusted R-squared:  0.6334 
## F-statistic:  20.7 on 5 and 52 DF,  p-value: 2.559e-11
# results - significant correlations for counties with more college grads and greater pct nonwhite
# even so, explaining to the public why this regression is important will be challenging -- I think it's better to just point out the counties with the highest vote differentials

What can we conclude? Counties with relatively more Harris support than Prop 6 support tend to have more college graduates and a larger share of non-white residents (both significant at the 0.05 level). This complicates the picture from the earlier regression. Counties with more college graduates and more non-white residents supported Prop 6 more overall, but they also showed a larger gap between Harris support and Prop 6 support.

Next steps

-- Repeat the regression at the precinct level for the Bay Area, comparing precinct support for Harris and Prop 6. One possible explanation to test: larger Asian American populations in these areas may be driving the difference in vote outcomes.
-- Replace the quick two-candidate math for vote differentials with final certified vote counts.
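For the last step, the quick math effectively splits third-party votes evenly between Harris and Trump. That overstates each candidate's share by half the third-party share. The minimal sketch below uses hypothetical counts to show the size of the gap. The column names `harris_votes`, `trump_votes` and `other_votes` are assumptions, not columns in the current spreadsheet. The Prop 6 formula should already be exact if its margin is computed over Yes plus No votes only.

```r
library(dplyr)

# Hypothetical certified counts for one made-up county; the column names
# harris_votes, trump_votes and other_votes are assumptions
counts <- tibble(
  county = "County X",
  harris_votes = 600, trump_votes = 350, other_votes = 50
)

shares <- counts %>%
  mutate(
    total_votes = harris_votes + trump_votes + other_votes,
    # exact share of all presidential votes cast
    harris_share_exact = 100 * harris_votes / total_votes,
    # margin in the same form as margin_harris_negative (Trump minus Harris)
    margin_harris_negative = 100 * (trump_votes - harris_votes) / total_votes,
    # the notebook's two-candidate quick math
    harris_share_quick = (100 - margin_harris_negative) / 2
  )

shares %>% select(harris_share_exact, harris_share_quick)
```

Here the exact share is 60 and the quick estimate is 62.5. The gap is half of the 5% third-party share.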