Stefania and Agathe
The goal of this week’s lab is to see in which counties Trump did relatively well compared to earlier years.
For this, you:
First, we get the election results data from the Harvard Dataverse (https://dataverse.harvard.edu/file.xhtml?fileId=4788675&version=9.0). To make this easy, we use the dataverse package:
This package has a function called get_dataframe_by_name, which you give the name of the dataset and data file:
# Notice cache=TRUE - this means the data is only downloaded once
d_raw = read_csv("C:/Users/agath/OneDrive/Documents/PCDA assignments/countypres_2000-2020.csv")
As a first step, clean the data to keep only the Republican party, keep only the years 2000 - 2016, drop the different modes except for mode=“TOTAL”, and keep and rename only the relevant columns (year, fips, party, votes, totalvotes).
d <- filter(d_raw, mode == 'TOTAL')
d <- select(d, year, fips = 'county_fips', party, votes = 'candidatevotes', totalvotes)
When doing data science or “big” data research, it is wise to check whether the data conforms to your expectations. For example, for this data set we would expect all year-county-party combinations to be unique: each county should have only one result per election, right?
Check whether this is the case by grouping on those variables, mutating to compute the number of rows (using n=n()), and then filtering on combinations with more than one case. Can you see what the problem(s) are?
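The uniqueness check described above can be sketched as follows. This is a minimal illustration on invented toy data (the `toy` and `dupes` names are hypothetical), not the code that produced the output in this report:

```r
library(dplyr)

# Toy data, invented for illustration: the first two rows share
# the same year-fips-party combination
toy <- tibble(
  year  = c(2000, 2000, 2000, 2004),
  fips  = c("01001", "01001", "01003", "01001"),
  party = c("REPUBLICAN", "REPUBLICAN", "REPUBLICAN", "REPUBLICAN"),
  votes = c(100, 50, 200, 120)
)

# Count the rows in each combination and keep only the
# combinations that occur more than once
dupes <- toy |>
  group_by(year, fips, party) |>
  mutate(n = n()) |>
  filter(n > 1)

# Print the result without overwriting the original data
dupes
```

Because the result is only printed, the working data set stays unchanged while you inspect the problem rows.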
Another good check is whether there are any missing values where you don’t expect them. Filter to see if year, fips, or party is missing.
Note that for both checks, you probably only want to show results on the screen, not actually save/remember the data containing the problems.
In this case, you probably want to remove the rows with missing values. Filter your dataset to remove those rows, and store/remember the result.
## # A tibble: 56 × 6
## # Groups: year, fips, party [20]
## year fips party votes totalvotes n
## <dbl> <chr> <chr> <dbl> <dbl> <int>
## 1 2000 <NA> DEMOCRAT 0 0 3
## 2 2000 <NA> DEMOCRAT 0 0 3
## 3 2000 <NA> DEMOCRAT 0 0 3
## 4 2000 <NA> REPUBLICAN 0 0 3
## 5 2000 <NA> REPUBLICAN 0 0 3
## 6 2000 <NA> REPUBLICAN 0 0 3
## 7 2000 <NA> GREEN 0 0 3
## 8 2000 <NA> GREEN 0 0 3
## 9 2000 <NA> GREEN 0 0 3
## 10 2000 <NA> OTHER 0 0 3
## # ℹ 46 more rows
| year | state | state_po | county_name | county_fips | office | candidate | party | candidatevotes | totalvotes | version | mode |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | AL GORE | DEMOCRAT | 0 | 0 | 20220315 | TOTAL |
| 2000 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | AL GORE | DEMOCRAT | 0 | 0 | 20220315 | TOTAL |
| 2000 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | AL GORE | DEMOCRAT | 0 | 0 | 20220315 | TOTAL |
| 2000 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | GEORGE W. BUSH | REPUBLICAN | 0 | 0 | 20220315 | TOTAL |
| 2000 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | GEORGE W. BUSH | REPUBLICAN | 0 | 0 | 20220315 | TOTAL |
| 2000 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | GEORGE W. BUSH | REPUBLICAN | 0 | 0 | 20220315 | TOTAL |
| 2000 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | RALPH NADER | GREEN | 0 | 0 | 20220315 | TOTAL |
| 2000 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | RALPH NADER | GREEN | 0 | 0 | 20220315 | TOTAL |
| 2000 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | RALPH NADER | GREEN | 0 | 0 | 20220315 | TOTAL |
| 2000 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | OTHER | OTHER | 0 | 0 | 20220315 | TOTAL |
| 2000 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | OTHER | OTHER | 0 | 0 | 20220315 | TOTAL |
| 2000 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | OTHER | OTHER | 0 | 0 | 20220315 | TOTAL |
| 2004 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | JOHN KERRY | DEMOCRAT | 0 | 0 | 20220315 | TOTAL |
| 2004 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | JOHN KERRY | DEMOCRAT | 0 | 0 | 20220315 | TOTAL |
| 2004 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | JOHN KERRY | DEMOCRAT | 0 | 0 | 20220315 | TOTAL |
| 2004 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | GEORGE W. BUSH | REPUBLICAN | 0 | 0 | 20220315 | TOTAL |
| 2004 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | GEORGE W. BUSH | REPUBLICAN | 0 | 0 | 20220315 | TOTAL |
| 2004 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | GEORGE W. BUSH | REPUBLICAN | 0 | 0 | 20220315 | TOTAL |
| 2004 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | OTHER | OTHER | 0 | 0 | 20220315 | TOTAL |
| 2004 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | OTHER | OTHER | 0 | 0 | 20220315 | TOTAL |
| 2004 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | OTHER | OTHER | 0 | 0 | 20220315 | TOTAL |
| 2008 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | BARACK OBAMA | DEMOCRAT | 0 | 0 | 20220315 | TOTAL |
| 2008 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | BARACK OBAMA | DEMOCRAT | 0 | 0 | 20220315 | TOTAL |
| 2008 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | BARACK OBAMA | DEMOCRAT | 0 | 0 | 20220315 | TOTAL |
| 2008 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | JOHN MCCAIN | REPUBLICAN | 0 | 0 | 20220315 | TOTAL |
| 2008 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | JOHN MCCAIN | REPUBLICAN | 0 | 0 | 20220315 | TOTAL |
| 2008 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | JOHN MCCAIN | REPUBLICAN | 0 | 0 | 20220315 | TOTAL |
| 2008 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | OTHER | OTHER | 0 | 0 | 20220315 | TOTAL |
| 2008 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | OTHER | OTHER | 0 | 0 | 20220315 | TOTAL |
| 2008 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | OTHER | OTHER | 0 | 0 | 20220315 | TOTAL |
| 2012 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | BARACK OBAMA | DEMOCRAT | 0 | 918 | 20220315 | TOTAL |
| 2012 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | BARACK OBAMA | DEMOCRAT | 2071 | 3054 | 20220315 | TOTAL |
| 2012 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | BARACK OBAMA | DEMOCRAT | 268 | 333 | 20220315 | TOTAL |
| 2012 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | MITT ROMNEY | REPUBLICAN | 0 | 918 | 20220315 | TOTAL |
| 2012 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | MITT ROMNEY | REPUBLICAN | 858 | 3054 | 20220315 | TOTAL |
| 2012 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | MITT ROMNEY | REPUBLICAN | 53 | 333 | 20220315 | TOTAL |
| 2012 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | OTHER | OTHER | 918 | 918 | 20220315 | TOTAL |
| 2012 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | OTHER | OTHER | 125 | 3054 | 20220315 | TOTAL |
| 2012 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | OTHER | OTHER | 12 | 333 | 20220315 | TOTAL |
| 2016 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | HILLARY CLINTON | DEMOCRAT | 0 | 2616 | 20220315 | TOTAL |
| 2016 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | HILLARY CLINTON | DEMOCRAT | 3017 | 3986 | 20220315 | TOTAL |
| 2016 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | HILLARY CLINTON | DEMOCRAT | 637 | 728 | 20220315 | TOTAL |
| 2016 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | DONALD TRUMP | REPUBLICAN | 0 | 2616 | 20220315 | TOTAL |
| 2016 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | DONALD TRUMP | REPUBLICAN | 648 | 3986 | 20220315 | TOTAL |
| 2016 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | DONALD TRUMP | REPUBLICAN | 53 | 728 | 20220315 | TOTAL |
| 2016 | CONNECTICUT | CT | STATEWIDE WRITEIN | NA | US PRESIDENT | OTHER | OTHER | 2616 | 2616 | 20220315 | TOTAL |
| 2016 | MAINE | ME | MAINE UOCAVA | NA | US PRESIDENT | OTHER | OTHER | 321 | 3986 | 20220315 | TOTAL |
| 2016 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | OTHER | OTHER | 38 | 728 | 20220315 | TOTAL |
| 2020 | DISTRICT OF COLUMBIA | DC | DISTRICT OF COLUMBIA | NA | US PRESIDENT | JOSEPH R BIDEN JR | DEMOCRAT | 317323 | 344356 | 20220315 | TOTAL |
| 2020 | DISTRICT OF COLUMBIA | DC | DISTRICT OF COLUMBIA | NA | US PRESIDENT | OTHER | GREEN | 1726 | 344356 | 20220315 | TOTAL |
| 2020 | DISTRICT OF COLUMBIA | DC | DISTRICT OF COLUMBIA | NA | US PRESIDENT | JO JORGENSEN | LIBERTARIAN | 2036 | 344356 | 20220315 | TOTAL |
| 2020 | DISTRICT OF COLUMBIA | DC | DISTRICT OF COLUMBIA | NA | US PRESIDENT | OTHER | OTHER | 4685 | 344356 | 20220315 | TOTAL |
| 2020 | DISTRICT OF COLUMBIA | DC | DISTRICT OF COLUMBIA | NA | US PRESIDENT | DONALD J TRUMP | REPUBLICAN | 18586 | 344356 | 20220315 | TOTAL |
| 2020 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | JOSEPH R BIDEN JR | DEMOCRAT | 1276 | 1374 | 20220315 | TOTAL |
| 2020 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | JO JORGENSEN | LIBERTARIAN | 6 | 1374 | 20220315 | TOTAL |
| 2020 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | OTHER | OTHER | 7 | 1374 | 20220315 | TOTAL |
| 2020 | RHODE ISLAND | RI | FEDERAL PRECINCT | NA | US PRESIDENT | DONALD J TRUMP | REPUBLICAN | 85 | 1374 | 20220315 | TOTAL |
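All rows in the table above have a missing county FIPS code. A minimal sketch of how such rows could be inspected and then dropped, using invented toy data (`toy` and `toy_clean` are hypothetical names):

```r
library(dplyr)

# Toy data, invented for illustration: three rows have a missing key
toy <- tibble(
  year  = c(2000, 2000, NA, 2004),
  fips  = c("01001", NA, "01003", "01005"),
  party = c("REPUBLICAN", "REPUBLICAN", "DEMOCRAT", NA),
  votes = c(100, 0, 200, 50)
)

# Inspect only: show the rows where any key column is missing
toy |> filter(is.na(year) | is.na(fips) | is.na(party))

# Store: keep only the rows where all key columns are present
toy_clean <- toy |>
  filter(!is.na(year), !is.na(fips), !is.na(party))
```

The first call only prints the problem rows; the second assigns the cleaned data to a new name so that later steps can use it.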
Next, we need to compute the vote share of Republicans in each election. Compute this vote share from the votes and totalvotes columns:
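As a sketch, the vote share is simply votes divided by totalvotes. The first two rows below reuse figures that appear in this report's output for county 01001; the third row has zero total votes to show what happens in that case (`toy` and `toy_share` are hypothetical names):

```r
library(dplyr)

toy <- tibble(
  year       = c(2000, 2004, 2016),
  fips       = c("01001", "01001", "51515"),
  party      = "REPUBLICAN",
  votes      = c(11993, 15196, 0),
  totalvotes = c(17208, 20081, 0)
)

# Share of all votes in the county that went to the Republican candidate
toy_share <- toy |>
  mutate(vote_share = votes / totalvotes)

toy_share
```

Note that a county with zero total votes yields 0/0, which R reports as NaN rather than as an error.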
Now, compute a new column containing the average vote share in all elections for each county. Do this using group_by and mutate, and show the head of your results:
## # A tibble: 6 × 7
## # Groups: fips [1]
## year fips party votes totalvotes vote_share average
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2000 01001 REPUBLICAN 11993 17208 0.697 0.726
## 2 2004 01001 REPUBLICAN 15196 20081 0.757 0.726
## 3 2008 01001 REPUBLICAN 17403 23641 0.736 0.726
## 4 2012 01001 REPUBLICAN 17379 23932 0.726 0.726
## 5 2016 01001 REPUBLICAN 18172 24973 0.728 0.726
## 6 2020 01001 REPUBLICAN 19838 27770 0.714 0.726
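A per-county average like the one shown above could be computed along these lines. This is a sketch on toy data; the `average_narm` column and the county code 99999 are invented for illustration and are not part of the original analysis:

```r
library(dplyr)

toy <- tibble(
  fips       = c("01001", "01001", "99999", "99999"),
  year       = c(2000, 2004, 2000, 2004),
  vote_share = c(0.697, 0.757, 0.40, NaN)
)

toy_avg <- toy |>
  group_by(fips) |>
  mutate(
    # Plain mean: a single NaN share makes the whole county average NaN
    average = mean(vote_share),
    # Alternative: ignore missing or undefined shares when averaging
    average_narm = mean(vote_share, na.rm = TRUE)
  ) |>
  ungroup()

head(toy_avg)
```

Using mutate rather than summarize keeps one row per election, so each row carries its county's average next to its own vote share. Whether to use na.rm = TRUE is a judgment call: it avoids NaN averages but silently bases them on fewer elections.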
Finally, compute how well Trump did in 2016 compared to the average result per county. Arrange the data to show the counties where Trump did best compared to earlier Republicans.
# Code to show highest and/or lowest change
d_2016 <- filter(d_rep, year == 2016) |>
  mutate(change = vote_share - average) |>
  arrange(change)
head(d_2016)
## # A tibble: 6 × 8
## # Groups: fips [6]
## year fips party votes totalvotes vote_share average change
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2016 02033 REPUBLICAN 2732 9934 0.275 0.534 -0.259
## 2 2016 16065 REPUBLICAN 8941 15688 0.570 0.825 -0.255
## 3 2016 49005 REPUBLICAN 21139 46157 0.458 0.707 -0.249
## 4 2016 49049 REPUBLICAN 102182 201551 0.507 0.754 -0.247
## 5 2016 49011 REPUBLICAN 62219 138411 0.450 0.680 -0.231
## 6 2016 49035 REPUBLICAN 138043 418868 0.330 0.511 -0.182
## # A tibble: 6 × 8
## # Groups: fips [6]
## year fips party votes totalvotes vote_share average change
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2016 21119 REPUBLICAN 4357 5763 0.756 0.536 0.220
## 2 2016 51027 REPUBLICAN 7296 9247 0.789 0.565 0.224
## 3 2016 21063 REPUBLICAN 2000 2855 0.701 0.434 0.266
## 4 2016 02099 REPUBLICAN 40 342 0.117 NaN NaN
## 5 2016 36000 REPUBLICAN 24654 128601 0.192 NaN NaN
## 6 2016 51515 REPUBLICAN 0 0 NaN NaN NaN
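The NaN rows in the output above come from counties with an undefined vote share or average. One possible way to list the strongest gains first while leaving those rows out is sketched below; the toy values are rounded figures taken from the outputs in this report, and `ranked` is a hypothetical name:

```r
library(dplyr)

toy <- tibble(
  fips       = c("21119", "49049", "51515", "21063"),
  vote_share = c(0.756, 0.507, NaN, 0.701),
  average    = c(0.536, 0.754, NaN, 0.434)
)

ranked <- toy |>
  mutate(change = vote_share - average) |>
  # is.na() is TRUE for NaN as well, so undefined changes are dropped
  filter(!is.na(change)) |>
  # Largest positive change (Trump's best relative results) first
  arrange(desc(change))

ranked
```

Sorting with desc(change) puts the counties where Trump gained most at the top, which is the ordering the exercise asks for.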
Do those results make sense? What kind of counties or states are those?
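These results make sense: Trump was more successful in counties with lower populations, which is consistent with the fact that he did better in rural areas and worse in bigger cities. The three largest gains shown above are all in counties with fewer than 10,000 total votes, while several of the largest losses are in Utah counties (FIPS codes starting with 49) with more than 100,000 total votes.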
You can download the county-level facts from 2016:
library(tidyverse)
url = "https://raw.githubusercontent.com/houstondatavis/data-jam-august-2016/master/csv/county_facts.csv"
facts = read_csv(url)
Now, combine the county-level data with the relative results computed above:
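A minimal sketch of such a join on toy data follows. It assumes, without having verified this for the real files, that the county code is a zero-padded string in the election results and a plain number in the facts table; `facts_toy`, `college_pct`, and the demographic values are invented for illustration:

```r
library(dplyr)

# Election results: fips as zero-padded character strings
results <- tibble(
  fips   = c("01001", "21119"),
  change = c(0.002, 0.220)
)

# County facts: fips as numbers (assumption), values invented
facts_toy <- tibble(
  fips        = c(1001, 21119, 56045),
  college_pct = c(20.9, 12.1, 18.0)
)

# Make the key types match, then keep the counties present in both tables
d_complete_toy <- results |>
  mutate(fips = as.numeric(fips)) |>
  inner_join(facts_toy, by = "fips")

d_complete_toy
```

If the key types differ, dplyr refuses to join, so converting one side first is the important step. An inner join drops counties that appear in only one table; comparing row counts before and after shows how many were lost.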
Finally, select a variable from the demographics that you think could (partly) explain the results. Create a scatter plot and a regression/correlation to show whether this variable is related to the relative vote outcome.
##
## Pearson's product-moment correlation
##
## data: d_complete$change and d_complete$Pop_college_grad_pct
## t = -39.777, df = 2822, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.6224946 -0.5752032
## sample estimates:
## cor
## -0.5993717
d_complete |>
  ggplot(aes(x = change, y = Pop_college_grad_pct)) + geom_point() +
  ggtitle("Do college graduates vote for Trump?") +
  ylab("percentage of college graduates") +
  xlab("Trump success compared to earlier republicans") +
  theme_classic()
From these results, we can see there is a negative correlation of -0.5993717 between Trump's success relative to earlier Republicans and the percentage of college graduates in a given county. Specifically, the scatter plot shows that the counties with the largest shifts in favor of Trump had very low shares of college graduates. This suggests that Trump's radically different communication strategy and media presence could have had more of an impact on less highly educated people.
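For reference, a correlation test and a simple linear regression of this kind can be sketched in base R as follows. The data are invented and only mimic the direction of the relationship reported above; `college_pct` is a hypothetical stand-in for the demographic variable:

```r
# Toy data, invented for illustration
toy <- data.frame(
  change      = c(-0.20, -0.10, 0.00, 0.10, 0.20),
  college_pct = c(40, 31, 24, 18, 10)
)

# Pearson correlation with a significance test
ct <- cor.test(toy$change, toy$college_pct)
ct$estimate

# Linear regression with the demographic variable as the predictor
fit <- lm(change ~ college_pct, data = toy)
coef(fit)
```

The correlation is symmetric in its two variables, but the regression is not: treating the share of college graduates as the predictor matches the idea that demographics might explain the vote shift.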
Re-run the analysis above, but at the state level. How should you summarize the voting results and demographics from the county level to the state level?
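One possible approach, sketched on invented toy data, is to sum the vote counts per state before computing shares, and to weight percentage-type demographics by county population. The `population` and `college_pct` columns are hypothetical; the state code is taken from the first two digits of the county FIPS code:

```r
library(dplyr)

# Toy county-level data, invented for illustration
toy <- tibble(
  fips        = c("01001", "01003", "02013", "02016"),
  year        = 2016,
  votes       = c(100, 300, 50, 10),
  totalvotes  = c(200, 400, 100, 100),
  population  = c(1000, 3000, 500, 500),
  college_pct = c(20, 30, 10, 40)
)

state_level <- toy |>
  # The first two digits of a county FIPS code identify the state
  mutate(state_fips = substr(fips, 1, 2)) |>
  group_by(state_fips, year) |>
  summarize(
    votes       = sum(votes),
    totalvotes  = sum(totalvotes),
    # Percentages are averaged with population weights, not summed
    college_pct = weighted.mean(college_pct, w = population),
    .groups     = "drop"
  ) |>
  # Compute the share from the summed counts, not from county shares
  mutate(vote_share = votes / totalvotes)

state_level
```

Averaging county vote shares directly would give a small county the same weight as a large one, which is why the counts are summed first.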