Lab 4: Changing times

Stefania and Agathe

The goal of this week’s lab is to see in which counties Trump did relatively well compared to earlier years.

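For this, you retrieve the county-level election results, clean and sanity-check them, compute the Republican vote share per county, and combine the results with county-level demographics.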

Election data

First, we get the election results from the Harvard Dataverse (https://dataverse.harvard.edu/file.xhtml?fileId=4788675&version=9.0). To make this easy, we use the dataverse package:

library(tidyverse)
library(dataverse)

This package has a function called get_dataframe_by_name, to which you give the name of the dataset and of the data file:

# Notice cache=TRUE: this means the data is only downloaded once
# Here the data is read from a local copy of the CSV file
d_raw = read_csv("C:/Users/agath/OneDrive/Documents/PCDA assignments/countypres_2000-2020.csv")

Data cleaning

As a first step, clean the data: keep only the Republican party and the years 2000 - 2016, drop the different modes except for mode = "TOTAL", and keep and rename only the relevant columns (year, fips, party, votes, totalvotes).

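# Keep only the rows that hold the total vote count per candidate
d <- filter(d_raw, mode == "TOTAL")
# Keep the relevant columns, renaming county_fips and candidatevotes
d <- select(d, year, fips = county_fips, party, votes = candidatevotes, totalvotes)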
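
The task above also asks for only Republican results from 2000 - 2016. A minimal sketch of such a filter, on made-up rows rather than the lab data (the toy tibble and all of its values are assumptions for illustration):

```r
library(dplyr)

# Made-up rows, only to illustrate the filter (not real election results)
toy <- tibble(
  year       = c(2000, 2016, 2020, 2016),
  fips       = c("01001", "01001", "01001", "01003"),
  party      = c("REPUBLICAN", "REPUBLICAN", "REPUBLICAN", "DEMOCRAT"),
  votes      = c(10, 12, 13, 7),
  totalvotes = c(20, 20, 20, 20)
)

# Keep only Republican results from the 2000 - 2016 elections
toy_clean <- filter(toy, party == "REPUBLICAN", year >= 2000, year <= 2016)
toy_clean
```

Separating the conditions with commas inside filter() combines them with a logical AND, so a row has to satisfy all of them to be kept.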

Sanity checks

When doing data science or “big” data research, it is wise to check whether the data conforms to your expectations. For example, for this data set we would expect all year-county-party combinations to be unique: each county should have only one result per election, right?

Check whether this is the case by grouping on those variables, mutating to compute the number of rows (using n=n()), and then filtering on combinations with more than one case. Can you see what the problem(s) are?

Another good check is whether there are any missing values where you don’t expect them. Filter to see if year, fips, or party is missing.

Note that for both checks, you probably only want to show results on the screen, not actually save/remember the data containing the problems.

In this case, you probably want to remove the rows with missing values. Filter your dataset to remove those rows, and store/remember the result.

# Sanity checks
group_by(d,year,fips,party) |>
  mutate(n=n()) |>
  filter(n>1) 
## # A tibble: 56 × 6
## # Groups:   year, fips, party [20]
##     year fips  party      votes totalvotes     n
##    <dbl> <chr> <chr>      <dbl>      <dbl> <int>
##  1  2000 <NA>  DEMOCRAT       0          0     3
##  2  2000 <NA>  DEMOCRAT       0          0     3
##  3  2000 <NA>  DEMOCRAT       0          0     3
##  4  2000 <NA>  REPUBLICAN     0          0     3
##  5  2000 <NA>  REPUBLICAN     0          0     3
##  6  2000 <NA>  REPUBLICAN     0          0     3
##  7  2000 <NA>  GREEN          0          0     3
##  8  2000 <NA>  GREEN          0          0     3
##  9  2000 <NA>  GREEN          0          0     3
## 10  2000 <NA>  OTHER          0          0     3
## # ℹ 46 more rows
filter(d_raw,is.na(year)|is.na(county_fips)|is.na(party)) 
## year state                state_po county_name          county_fips office       candidate         party       candidatevotes totalvotes version  mode
## 2000 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT AL GORE           DEMOCRAT    0              0          20220315 TOTAL
## 2000 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT AL GORE           DEMOCRAT    0              0          20220315 TOTAL
## 2000 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT AL GORE           DEMOCRAT    0              0          20220315 TOTAL
## 2000 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT GEORGE W. BUSH    REPUBLICAN  0              0          20220315 TOTAL
## 2000 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT GEORGE W. BUSH    REPUBLICAN  0              0          20220315 TOTAL
## 2000 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT GEORGE W. BUSH    REPUBLICAN  0              0          20220315 TOTAL
## 2000 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT RALPH NADER       GREEN       0              0          20220315 TOTAL
## 2000 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT RALPH NADER       GREEN       0              0          20220315 TOTAL
## 2000 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT RALPH NADER       GREEN       0              0          20220315 TOTAL
## 2000 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT OTHER             OTHER       0              0          20220315 TOTAL
## 2000 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT OTHER             OTHER       0              0          20220315 TOTAL
## 2000 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT OTHER             OTHER       0              0          20220315 TOTAL
## 2004 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT JOHN KERRY        DEMOCRAT    0              0          20220315 TOTAL
## 2004 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT JOHN KERRY        DEMOCRAT    0              0          20220315 TOTAL
## 2004 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT JOHN KERRY        DEMOCRAT    0              0          20220315 TOTAL
## 2004 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT GEORGE W. BUSH    REPUBLICAN  0              0          20220315 TOTAL
## 2004 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT GEORGE W. BUSH    REPUBLICAN  0              0          20220315 TOTAL
## 2004 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT GEORGE W. BUSH    REPUBLICAN  0              0          20220315 TOTAL
## 2004 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT OTHER             OTHER       0              0          20220315 TOTAL
## 2004 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT OTHER             OTHER       0              0          20220315 TOTAL
## 2004 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT OTHER             OTHER       0              0          20220315 TOTAL
## 2008 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT BARACK OBAMA      DEMOCRAT    0              0          20220315 TOTAL
## 2008 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT BARACK OBAMA      DEMOCRAT    0              0          20220315 TOTAL
## 2008 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT BARACK OBAMA      DEMOCRAT    0              0          20220315 TOTAL
## 2008 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT JOHN MCCAIN       REPUBLICAN  0              0          20220315 TOTAL
## 2008 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT JOHN MCCAIN       REPUBLICAN  0              0          20220315 TOTAL
## 2008 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT JOHN MCCAIN       REPUBLICAN  0              0          20220315 TOTAL
## 2008 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT OTHER             OTHER       0              0          20220315 TOTAL
## 2008 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT OTHER             OTHER       0              0          20220315 TOTAL
## 2008 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT OTHER             OTHER       0              0          20220315 TOTAL
## 2012 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT BARACK OBAMA      DEMOCRAT    0              918        20220315 TOTAL
## 2012 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT BARACK OBAMA      DEMOCRAT    2071           3054       20220315 TOTAL
## 2012 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT BARACK OBAMA      DEMOCRAT    268            333        20220315 TOTAL
## 2012 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT MITT ROMNEY       REPUBLICAN  0              918        20220315 TOTAL
## 2012 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT MITT ROMNEY       REPUBLICAN  858            3054       20220315 TOTAL
## 2012 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT MITT ROMNEY       REPUBLICAN  53             333        20220315 TOTAL
## 2012 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT OTHER             OTHER       918            918        20220315 TOTAL
## 2012 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT OTHER             OTHER       125            3054       20220315 TOTAL
## 2012 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT OTHER             OTHER       12             333        20220315 TOTAL
## 2016 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT HILLARY CLINTON   DEMOCRAT    0              2616       20220315 TOTAL
## 2016 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT HILLARY CLINTON   DEMOCRAT    3017           3986       20220315 TOTAL
## 2016 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT HILLARY CLINTON   DEMOCRAT    637            728        20220315 TOTAL
## 2016 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT DONALD TRUMP      REPUBLICAN  0              2616       20220315 TOTAL
## 2016 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT DONALD TRUMP      REPUBLICAN  648            3986       20220315 TOTAL
## 2016 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT DONALD TRUMP      REPUBLICAN  53             728        20220315 TOTAL
## 2016 CONNECTICUT          CT       STATEWIDE WRITEIN    NA          US PRESIDENT OTHER             OTHER       2616           2616       20220315 TOTAL
## 2016 MAINE                ME       MAINE UOCAVA         NA          US PRESIDENT OTHER             OTHER       321            3986       20220315 TOTAL
## 2016 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT OTHER             OTHER       38             728        20220315 TOTAL
## 2020 DISTRICT OF COLUMBIA DC       DISTRICT OF COLUMBIA NA          US PRESIDENT JOSEPH R BIDEN JR DEMOCRAT    317323         344356     20220315 TOTAL
## 2020 DISTRICT OF COLUMBIA DC       DISTRICT OF COLUMBIA NA          US PRESIDENT OTHER             GREEN       1726           344356     20220315 TOTAL
## 2020 DISTRICT OF COLUMBIA DC       DISTRICT OF COLUMBIA NA          US PRESIDENT JO JORGENSEN      LIBERTARIAN 2036           344356     20220315 TOTAL
## 2020 DISTRICT OF COLUMBIA DC       DISTRICT OF COLUMBIA NA          US PRESIDENT OTHER             OTHER       4685           344356     20220315 TOTAL
## 2020 DISTRICT OF COLUMBIA DC       DISTRICT OF COLUMBIA NA          US PRESIDENT DONALD J TRUMP    REPUBLICAN  18586          344356     20220315 TOTAL
## 2020 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT JOSEPH R BIDEN JR DEMOCRAT    1276           1374       20220315 TOTAL
## 2020 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT JO JORGENSEN      LIBERTARIAN 6              1374       20220315 TOTAL
## 2020 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT OTHER             OTHER       7              1374       20220315 TOTAL
## 2020 RHODE ISLAND         RI       FEDERAL PRECINCT     NA          US PRESIDENT DONALD J TRUMP    REPUBLICAN  85             1374       20220315 TOTAL
# Remove rows with a missing year, fips, or party
d <- filter(d, !is.na(year) & !is.na(fips) & !is.na(party))
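
After removing the rows with missing values, the uniqueness check can be repeated to confirm that no duplicate year-county-party combinations remain. A minimal sketch on made-up rows (the toy data is an assumption; count() is used here as a shorthand for grouping and counting rows per combination):

```r
library(dplyr)

# Made-up rows: two regular counties and two rows without a county code
toy <- tibble(
  year  = c(2016, 2016, 2016, 2016),
  fips  = c("01001", "01003", NA, NA),
  party = c("REPUBLICAN", "REPUBLICAN", "REPUBLICAN", "REPUBLICAN"),
  votes = c(12, 30, 0, 5)
)

# Drop rows with a missing year, fips, or party
toy_clean <- filter(toy, !is.na(year), !is.na(fips), !is.na(party))

# count() gives one row per combination, with the number of rows in column n
dupes <- toy_clean |>
  count(year, fips, party) |>
  filter(n > 1)

nrow(dupes)
```

If the number of remaining duplicates is zero, every county has at most one result per party and election.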

Compute variables: compare vote share

Next, we need to compute the vote share of Republicans in each election. Compute this vote share from the votes and totalvotes columns:

d_rep <- filter(d,party == 'REPUBLICAN') 
d_rep <- mutate(d_rep,vote_share = votes/totalvotes)

Now, compute a new column containing the average vote share in all elections for each county. Do this using group_by and mutate, and show the head of your results:

d_rep <- d_rep|> 
  group_by(fips) |>
  mutate(average=mean(vote_share)) |>
  arrange(fips)
head(d_rep)
## # A tibble: 6 × 7
## # Groups:   fips [1]
##    year fips  party      votes totalvotes vote_share average
##   <dbl> <chr> <chr>      <dbl>      <dbl>      <dbl>   <dbl>
## 1  2000 01001 REPUBLICAN 11993      17208      0.697   0.726
## 2  2004 01001 REPUBLICAN 15196      20081      0.757   0.726
## 3  2008 01001 REPUBLICAN 17403      23641      0.736   0.726
## 4  2012 01001 REPUBLICAN 17379      23932      0.726   0.726
## 5  2016 01001 REPUBLICAN 18172      24973      0.728   0.726
## 6  2020 01001 REPUBLICAN 19838      27770      0.714   0.726
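
The average shown above is taken over all six elections in the data, including 2016 itself. If the baseline should only reflect earlier elections, one option is to average over the years before 2016. A minimal sketch on made-up shares (the county code and the values are hypothetical, and average_earlier is a name introduced here for illustration):

```r
library(dplyr)

# Made-up vote shares for one hypothetical county
toy <- tibble(
  year       = c(2008, 2012, 2016),
  fips       = "99999",
  vote_share = c(0.50, 0.60, 0.70)
)

toy <- toy |>
  group_by(fips) |>
  mutate(
    average_all     = mean(vote_share),              # all elections
    average_earlier = mean(vote_share[year < 2016])  # elections before 2016 only
  ) |>
  ungroup()

toy
```

With the earlier-only baseline, the 2016 result no longer pulls the average towards itself, so the computed change is somewhat larger in absolute terms.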

Finally, compute how well Trump did in 2016 compared to the average result per county. Arrange the results to show the counties where Trump did best compared to earlier Republicans.

# Code to show highest and/or lowest change
d_2016 <- filter(d_rep, year == 2016) |>
  mutate(change = vote_share - average) |>
  arrange(change)

head(d_2016)
## # A tibble: 6 × 8
## # Groups:   fips [6]
##    year fips  party       votes totalvotes vote_share average change
##   <dbl> <chr> <chr>       <dbl>      <dbl>      <dbl>   <dbl>  <dbl>
## 1  2016 02033 REPUBLICAN   2732       9934      0.275   0.534 -0.259
## 2  2016 16065 REPUBLICAN   8941      15688      0.570   0.825 -0.255
## 3  2016 49005 REPUBLICAN  21139      46157      0.458   0.707 -0.249
## 4  2016 49049 REPUBLICAN 102182     201551      0.507   0.754 -0.247
## 5  2016 49011 REPUBLICAN  62219     138411      0.450   0.680 -0.231
## 6  2016 49035 REPUBLICAN 138043     418868      0.330   0.511 -0.182
tail(d_2016)
## # A tibble: 6 × 8
## # Groups:   fips [6]
##    year fips  party      votes totalvotes vote_share average  change
##   <dbl> <chr> <chr>      <dbl>      <dbl>      <dbl>   <dbl>   <dbl>
## 1  2016 21119 REPUBLICAN  4357       5763      0.756   0.536   0.220
## 2  2016 51027 REPUBLICAN  7296       9247      0.789   0.565   0.224
## 3  2016 21063 REPUBLICAN  2000       2855      0.701   0.434   0.266
## 4  2016 02099 REPUBLICAN    40        342      0.117 NaN     NaN    
## 5  2016 36000 REPUBLICAN 24654     128601      0.192 NaN     NaN    
## 6  2016 51515 REPUBLICAN     0          0    NaN     NaN     NaN
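
The NaN values in the last rows follow from how R handles these computations: 0 / 0 gives NaN (as in the row with 0 votes out of 0), and mean() returns NaN as soon as one of the shares is NaN. A minimal sketch on made-up rows, assuming the goal is to average over the valid shares only (the county codes, the values and the column average_valid are hypothetical):

```r
library(dplyr)

# Made-up rows: the first hypothetical county has an election with 0 total votes
toy <- tibble(
  year       = c(2012, 2016, 2012, 2016),
  fips       = c("99991", "99991", "99992", "99992"),
  votes      = c(0, 40, 50, 60),
  totalvotes = c(0, 100, 100, 100)
)

shares <- toy |>
  mutate(vote_share = votes / totalvotes) |>        # 0 / 0 gives NaN
  group_by(fips) |>
  mutate(
    average       = mean(vote_share),               # NaN if any share is NaN
    average_valid = mean(vote_share, na.rm = TRUE)  # skips NaN shares
  ) |>
  ungroup()

shares
```

Whether skipping such elections is appropriate depends on why the total is zero or missing; filtering out rows with totalvotes == 0 before computing the share would be another option.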

Do those results make sense? What kind of counties or states are those?

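These results make sense: the three counties where Trump gained most (FIPS 21063, 21119 and 51027, in Kentucky and Virginia) each cast fewer than 10,000 votes, whereas the largest losses include much bigger counties such as 49035 and 49049 in Utah. This is consistent with Trump doing better in rural areas and worse in bigger cities.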

Combine with county-level demographics

You can download the county-level facts from 2016:

library(tidyverse)
url = "https://raw.githubusercontent.com/houstondatavis/data-jam-august-2016/master/csv/county_facts.csv"
facts = read_csv(url)

Combining the data

Now, combine the county level data with the relative results computed above:

facts$fips <- as.character(facts$fips)
d_complete <- inner_join(d_2016,facts, by="fips")
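
One thing worth checking in a join like this is whether the keys have the same format on both sides. If the fips column of the facts is read as a number, as.character() turns 1001 into "1001" rather than "01001", and counties in states whose FIPS code starts with 0 would silently drop out of an inner join. A minimal sketch on toy data, assuming that situation (facts_toy and its college_pct values are made up for illustration):

```r
library(dplyr)

# Election side: fips stored as text with a leading zero
results <- tibble(fips = c("01001", "21063"), change = c(0.002, 0.266))

# Facts side: fips stored as a number (assumption), with made-up percentages
facts_toy <- tibble(fips = c(1001, 21063), college_pct = c(20, 10))

# as.character() alone gives "1001", which does not match "01001"
naive <- inner_join(results, mutate(facts_toy, fips = as.character(fips)), by = "fips")

# Left-pad the code to five digits so that the keys match
padded <- mutate(facts_toy, fips = sprintf("%05d", as.integer(fips)))
fixed <- inner_join(results, padded, by = "fips")

# anti_join() lists the rows of results that found no match in the facts
unmatched <- anti_join(results, padded, by = "fips")

c(naive = nrow(naive), fixed = nrow(fixed), unmatched = nrow(unmatched))
```

Comparing the number of rows before and after the join, or inspecting the anti_join() result, shows whether any counties were lost.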

Analysis

Finally, select a variable from the demographics that you think could (partly) explain the results. Create a scatter plot and a regression or correlation to show whether this variable is related to the relative vote outcome.

cor.test(d_complete$change,d_complete$Pop_college_grad_pct)
## 
##  Pearson's product-moment correlation
## 
## data:  d_complete$change and d_complete$Pop_college_grad_pct
## t = -39.777, df = 2822, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.6224946 -0.5752032
## sample estimates:
##        cor 
## -0.5993717
d_complete |>
  ggplot(aes(x = change, y = Pop_college_grad_pct)) +
  geom_point() +
  ggtitle("Do college graduates vote for Trump?") +
  ylab("Percentage of college graduates") +
  xlab("Trump success compared to earlier Republicans") +
  theme_classic()
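
The task also mentions a regression. A minimal sketch of a simple linear regression with lm(), on made-up values rather than the lab data (the toy data frame and its numbers are assumptions, so the coefficients say nothing about the real counties):

```r
# Made-up values, only to illustrate the calls (not the lab data)
toy <- data.frame(
  college_pct = c(10, 15, 20, 25, 30, 35),
  change      = c(0.20, 0.12, 0.10, 0.02, -0.05, -0.08)
)

# Simple linear regression of the change on the share of college graduates
fit <- lm(change ~ college_pct, data = toy)
slope <- coef(fit)[["college_pct"]]

# With a single predictor, R squared equals the squared Pearson correlation
r <- cor(toy$change, toy$college_pct)
r_squared <- summary(fit)$r.squared

summary(fit)
```

The slope gives the expected difference in change for one additional percentage point of college graduates, which a correlation coefficient alone does not show.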

Interpretation

From these results, we can see a negative correlation of about -0.60 (95% confidence interval -0.62 to -0.58, p < 2.2e-16) between Trump's success relative to earlier Republicans and the percentage of college graduates in a county. Specifically, the scatter plot shows that the counties with the largest shift in favor of Trump had very low levels of college education. This suggests that Trump's radically different communication strategy and media presence could have had more impact on less highly educated voters, although a correlation alone cannot establish this.

Bonus

Re-run the analysis above, but at the state level. How should you summarize the voting results and demographics from the county level to the state level?
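
One possible approach, sketched on made-up data: sum the votes and total votes per state before computing the share (rather than averaging county shares, which would give small counties too much weight), and take a population-weighted mean of the demographic percentages. The population and college_pct columns and all values below are hypothetical; the state is taken from the first two digits of the county FIPS code.

```r
library(dplyr)

# Made-up counties in two states
toy <- tibble(
  fips        = c("21001", "21003", "49001"),
  votes       = c(80, 100, 30),
  totalvotes  = c(100, 400, 60),
  population  = c(1000, 4000, 600),
  college_pct = c(10, 30, 25)
)

state_level <- toy |>
  mutate(state_fips = substr(fips, 1, 2)) |>
  group_by(state_fips) |>
  summarize(
    vote_share  = sum(votes) / sum(totalvotes),             # share of summed votes
    college_pct = weighted.mean(college_pct, population),   # population-weighted
    .groups = "drop"
  )

state_level
```

The same state-level table could then be used for the average, change and correlation steps described above.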