Data wrangling & visualisation

The Democratic candidate, Joe Biden, won the 2020 election by flipping 5 states (Michigan, Wisconsin, Arizona, Pennsylvania, and Georgia) that the incumbent, Donald Trump, had won in 2016. It is therefore interesting to see how changes in county-level results contributed to the former vice president’s victory.

Data from the 2020 election were obtained through web scraping: https://github.com/charlottetse33/portfolio/blob/main/NBC%20US%20election/web%20scrapping.R

The following code creates a character vector with the names of the 5 flipped states:

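library(tidyverse)  # dplyr, tidyr, stringr, readr, tibble, ggplot2
library(maps)       # county.fips, plus the map data behind map_data()

flipped_states <- c('arizona', 'georgia', 'michigan', 'pennsylvania', 'wisconsin')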

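We’re going to zoom in on the counties of these 5 states to inspect changes in the vote between the 2016 and 2020 elections. map_data("county") supplies the county outlines, and maps::county.fips links each county name to its FIPS code.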

(county <- as_tibble(map_data("county")))

## # A tibble: 87,949 x 6
##     long   lat group order region  subregion
##    <dbl> <dbl> <dbl> <int> <chr>   <chr>    
##  1 -86.5  32.3     1     1 alabama autauga  
##  2 -86.5  32.4     1     2 alabama autauga  
##  3 -86.5  32.4     1     3 alabama autauga  
##  4 -86.6  32.4     1     4 alabama autauga  
##  5 -86.6  32.4     1     5 alabama autauga  
##  6 -86.6  32.4     1     6 alabama autauga  
##  7 -86.6  32.4     1     7 alabama autauga  
##  8 -86.6  32.4     1     8 alabama autauga  
##  9 -86.6  32.4     1     9 alabama autauga  
## 10 -86.6  32.4     1    10 alabama autauga  
## # ... with 87,939 more rows

(county_fips <- as_tibble(county.fips))

## # A tibble: 3,085 x 2
##     fips polyname        
##    <int> <chr>           
##  1  1001 alabama,autauga 
##  2  1003 alabama,baldwin 
##  3  1005 alabama,barbour 
##  4  1007 alabama,bibb    
##  5  1009 alabama,blount  
##  6  1011 alabama,bullock 
##  7  1013 alabama,butler  
##  8  1015 alabama,calhoun 
##  9  1017 alabama,chambers
## 10  1019 alabama,cherokee
## # ... with 3,075 more rows

# Rename polyname to state_county so it can serve as the join key later on
county_fips <- county_fips %>% rename(state_county = polyname)
election_result_temp <- read_csv("https://raw.githubusercontent.com/charlottetse33/portfolio/main/NBC%20US%20election/USPresidential08-16.csv") %>%
  select(fips = fips_code, total_2016, dem_2016, gop_2016) %>%
  mutate(fips = as.integer(fips))

## Rows: 3112 Columns: 14

## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr  (2): fips_code, county
## dbl (12): total_2008, dem_2008, gop_2008, oth_2008, total_2012, dem_2012, go...

## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Attach county names by FIPS code, then keep only counties in the 5 flipped states
election_res_2016 <- election_result_temp %>%
  inner_join(county_fips, by = "fips") %>%
  filter(str_extract(state_county, "[a-z]+") %in% flipped_states) %>%
  select(total_2016, dem_2016, gop_2016, state_county)

election_res_2016

## # A tibble: 396 x 4
##    total_2016 dem_2016 gop_2016 state_county           
##         <dbl>    <dbl>    <dbl> <chr>                  
##  1      18467     6431    11112 michigan,delta         
##  2       6743      904     5676 pennsylvania,fulton    
##  3     284832   169169   106559 pennsylvania,delaware  
##  4       3572     2695      841 georgia,hancock        
##  5       3577     1186     2343 georgia,seminole       
##  6      33848    16050    15871 wisconsin,sauk         
##  7      96945    34436    58941 pennsylvania,washington
##  8      14757     6774     7239 michigan,leelanau      
##  9       6285      619     5561 georgia,brantley       
## 10       3486     1156     2158 michigan,baraga        
## # ... with 386 more rows
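The state filter above takes the leading run of lowercase letters in each key as the state name. That is enough for the five single-word states here, but it would truncate a multi-word state. A small sketch on illustrative keys (the `keys` vector is made up for this example and includes a state outside the analysis), alongside a comma-based alternative:

```r
library(stringr)

# Illustrative keys in the same "state,county" form; not taken from the data
keys <- c("georgia,de kalb", "michigan,st clair", "new york,albany")

# Pattern used above: the first run of lowercase letters
state_by_letters <- str_extract(keys, "[a-z]+")

# Alternative: everything before the first comma
state_by_comma <- str_extract(keys, "^[^,]+")

state_by_letters  # "georgia" "michigan" "new"
state_by_comma    # "georgia" "michigan" "new york"
```

Either pattern gives the same result for the five flipped states; the second would only matter if the analysis were extended to states such as New York.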

In the 5 flipped states, 4 counties are named differently in election_res_2020 and election_res_2016:

In election_res_2020, their names are "georgia,dekalb", "michigan,st. clair", "michigan,st. joseph", and "wisconsin,st. croix" (they come from NBC News webpages).

In election_res_2016, they are "georgia,de kalb", "michigan,st clair", "michigan,st joseph", and "wisconsin,st croix" (they come from maps::county.fips). These names must be harmonised before the two data frames are joined; otherwise an inner join would silently drop the 4 counties.
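One way to spot such mismatches before joining is to compare the two sets of keys directly. The sketch below uses base R's `setdiff()` on two short hand-typed vectors (the four pairs quoted above plus one county that matches); with the real data the vectors would be the `state_county` columns of the two data frames.

```r
# Hand-typed keys for illustration: the 4 mismatched pairs plus one matching county
keys_2016 <- c("georgia,de kalb", "michigan,st clair", "michigan,st joseph",
               "wisconsin,st croix", "michigan,delta")
keys_2020 <- c("georgia,dekalb", "michigan,st. clair", "michigan,st. joseph",
               "wisconsin,st. croix", "michigan,delta")

# Keys present in one source but not the other
only_2016 <- setdiff(keys_2016, keys_2020)
only_2020 <- setdiff(keys_2020, keys_2016)

only_2016
only_2020
```

Both results being empty would indicate that every county has a partner on the other side of the join.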

The following code builds election_res_1620, a data frame that combines the 2016 and 2020 county-level results for the 5 states:

# Load the scraped 2020 results, harmonise the county names, and join them to the 2016 results
# (no for/while/repeat loops are used in this chunk)

election_res_2020 <- read_csv("https://raw.githubusercontent.com/charlottetse33/portfolio/main/NBC%20US%20election/election_res_2020.csv")

## New names:
## * `` -> ...1

## Rows: 4588 Columns: 5

## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): state_county
## dbl (4): ...1, trump, biden, others

## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

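# NOTE: the trump/biden mapping below looks reversed, but the results printed further down
# (Democrats ahead in all 5 states in 2020) are consistent with it, which suggests the
# columns in the scraped CSV are labelled the other way round. Verify against the CSV.
temp_2020 <- election_res_2020 %>%
  filter(str_extract(state_county, "[a-z]+") %in% flipped_states) %>%
  mutate(state_county = state_county, total_2020 = trump + biden + others,
         dem_2020 = trump, gop_2020 = biden, .keep = "none")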

# Rename the 4 counties so that they match the spelling used in maps::county.fips
temp_2020 <- temp_2020 %>%
  mutate(state_county = recode(state_county,
                               "georgia,dekalb" = "georgia,de kalb",
                               "michigan,st. clair" = "michigan,st clair",
                               "michigan,st. joseph" = "michigan,st joseph",
                               "wisconsin,st. croix" = "wisconsin,st croix"))

election_res_1620 <- inner_join(election_res_2016, temp_2020, by = "state_county") %>% select(state_county, total_2016, dem_2016, gop_2016, total_2020, dem_2020, gop_2020) %>% arrange(state_county)

election_res_1620

## # A tibble: 396 x 7
##    state_county     total_2016 dem_2016 gop_2016 total_2020 dem_2020 gop_2020
##    <chr>                 <dbl>    <dbl>    <dbl>      <dbl>    <dbl>    <dbl>
##  1 arizona,apache        18659    12196     5315      35183    23293    11442
##  2 arizona,cochise       43147    15291    25036      60473    23732    35557
##  3 arizona,coconino      44929    25308    16573      73346    44698    27052
##  4 arizona,gila          21398     6746    13672      27678     8943    18377
##  5 arizona,graham        11939     3301     8025      14996     4034    10749
##  6 arizona,greenlee       3243     1092     1892       3688     1182     2433
##  7 arizona,la paz         4931     1318     3381       7460     2236     5129
##  8 arizona,maricopa    1201934   549040   590465    2069475  1040774   995665
##  9 arizona,mohave        74189    16485    54656     104705    24831    78535
## 10 arizona,navajo        35409    15362    18165      51783    23383    27657
## # ... with 386 more rows

Based on election_res_1620, create a data frame that summarizes the total numbers of votes received by both parties at the state level.

# Sum county votes by state, then reshape to one row per state, party, and year
# (no for/while/repeat loops are used in this chunk)
election_res_1620_state <- election_res_1620 %>%
  mutate(state = str_to_title(str_extract(state_county, "[a-z]+"))) %>%
  group_by(state) %>%
  summarise(dem_2016 = sum(dem_2016), dem_2020 = sum(dem_2020),
            gop_2016 = sum(gop_2016), gop_2020 = sum(gop_2020)) %>%
  pivot_longer(cols = dem_2016:gop_2020, names_to = "party_year", values_to = "vote") %>%
  mutate(party = str_extract(party_year, "[a-z]+"), year = str_extract(party_year, "[0-9]+"),
         .keep = "unused") %>%
  select(state, party, year, vote)

election_res_1620_state

## # A tibble: 20 x 4
##    state        party year     vote
##    <chr>        <chr> <chr>   <dbl>
##  1 Arizona      dem   2016   936250
##  2 Arizona      dem   2020  1672143
##  3 Arizona      gop   2016  1021154
##  4 Arizona      gop   2020  1661686
##  5 Georgia      dem   2016  1837300
##  6 Georgia      dem   2020  2473633
##  7 Georgia      gop   2016  2068623
##  8 Georgia      gop   2020  2461854
##  9 Michigan     dem   2016  2267373
## 10 Michigan     dem   2020  2804040
## 11 Michigan     gop   2016  2279210
## 12 Michigan     gop   2020  2649852
## 13 Pennsylvania dem   2016  2844705
## 14 Pennsylvania dem   2020  3459923
## 15 Pennsylvania gop   2016  2912941
## 16 Pennsylvania gop   2020  3378263
## 17 Wisconsin    dem   2016  1382210
## 18 Wisconsin    dem   2020  1630673
## 19 Wisconsin    gop   2016  1409467
## 20 Wisconsin    gop   2020  1610065
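The reshaping step splits column names such as dem_2016 into a party and a year. As an alternative sketch, `pivot_longer()` can do that split itself through its `names_sep` argument. The `wide` table below holds made-up numbers purely to show the shape of the result.

```r
library(tidyr)
library(tibble)

# Made-up state totals, only to illustrate the reshaping
wide <- tibble(
  state = c("A", "B"),
  dem_2016 = c(10, 20), dem_2020 = c(15, 25),
  gop_2016 = c(12, 18), gop_2020 = c(11, 30)
)

# Split each column name at "_" into party and year while pivoting
long <- pivot_longer(
  wide,
  cols = dem_2016:gop_2020,
  names_to = c("party", "year"),
  names_sep = "_",
  values_to = "vote"
)

long
```

The result has the columns state, party, year, and vote, in that order, so no separate string extraction or column reordering is needed.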

With all the necessary data ready, create 2 choropleth maps, one for the 2016 election results and one for 2020:

# Compute each county's margin (Democratic minus Republican votes, divided by total votes)
# and attach it to the county polygons with inner_join (no loops are used in this chunk)

state <- as_tibble(map_data("state"))
election_res_1620 <- election_res_1620 %>% mutate(result_2016 = (dem_2016 - gop_2016) / total_2016, result_2020 = (dem_2020 - gop_2020) / total_2020)
county_temp <- county %>% filter(region %in% flipped_states) %>% unite(state_county, region, subregion, sep = ",")
election_result_agg <- inner_join(election_res_1620, county_temp, by = "state_county") %>%
  select(state_county, result_2016, result_2020, long, lat, group, order) %>%
  pivot_longer(cols = result_2016:result_2020, names_to = "year", names_prefix = "result_", values_to = "result")



# Draw the county fills first, then overlay state borders (fill = NA keeps the states transparent)
p <- ggplot(election_result_agg, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = result)) +
  facet_grid(~ year) +
  geom_polygon(data = state, aes(long, lat, group = group), colour = "black", fill = NA, size = 0.6)


p + ggtitle("Flipped States: 2016 VS. 2020 Presidential Election \n") + theme_bw() +
  scale_fill_gradient2(name=NULL, limits = c(-1, 1), 
                       low = "#e41a1c", high = "#377eb8", 
                       breaks = c(-1, 1), labels = c("Republican Won        ", "Democrat Won")) +
  labs(x = NULL, y = NULL) +
  theme(legend.position = "bottom",
          strip.background = element_rect(fill="lightgray", size= 0.8),
          plot.title = element_text(size = 20, face = "bold"),
          strip.text.x = element_text(size = 16, face = "bold.italic"),
          legend.text = element_text(size = 16, face = "bold"),
          legend.spacing.x = unit(0.5, "line"),
          legend.key.size = unit(0.9, "cm")) +  
    guides(fill = guide_legend(title.position = "top", title.hjust = 0.5))

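As you can see, only the 5 flipped states are colored. The color of each county represents the difference between the two parties' shares of the vote (dem minus gop, divided by the total), so blue counties leaned Democratic and red counties leaned Republican.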
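To make the color scale concrete, here is the margin worked out for a single county. The figures are copied from the arizona,maricopa row of the election_res_1620 output shown earlier; the variable names are chosen for this example only.

```r
# Vote counts for arizona,maricopa, copied from the election_res_1620 output above
total <- c(`2016` = 1201934, `2020` = 2069475)
dem   <- c(`2016` = 549040,  `2020` = 1040774)
gop   <- c(`2016` = 590465,  `2020` = 995665)

# Same formula as result_2016 / result_2020: (dem - gop) / total
margin <- (dem - gop) / total

round(margin, 3)  # 2016: -0.034, 2020: 0.022
```

The sign change, from slightly negative to slightly positive, is what shows up on the maps as a shift from pale red to pale blue.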

# geom_col() draws bars from precomputed values, so no stat override (and no warning) is needed
h <- ggplot(election_res_1620_state, aes(x = state, y = vote, fill = party)) + geom_col(position = "dodge", width = 0.5) + facet_grid(~ year)

h + ggtitle("Flipped States: 2016 VS. 2020 Presidential Election \n") + theme_bw() +
  scale_y_continuous(breaks = c(0, 1000000, 2000000, 3000000), 
                     labels = c("0", "1,000", "2,000", "3,000")) + 
  scale_fill_manual(name=NULL, values = c("dem" = "#377eb8", "gop" = "#e41a1c"),
                    labels = c("gop"= "Republican Party", "dem"="Democratic Party        " )) +
  labs(x = NULL, y = "No. of votes\n(in thousands)") + 
  theme(legend.position = "bottom",
        strip.background = element_rect(fill="lightgray", size= 0.8),
        plot.title = element_text(size = 20, face = "bold"),
        strip.text.x = element_text(size = 16, face = "bold.italic"),
        legend.text = element_text(size = 16, face = "bold"),
        legend.spacing.x = unit(0.5, "line"),
        legend.key.size = unit(0.9, "cm"),
        axis.title.y = element_text(face="bold.italic", size=18),
        axis.text.x = element_text(size = 14, face="italic"),
        axis.text.y = element_text(size = 14, face="italic")
  ) + guides(fill = guide_legend(title.position = "top", title.hjust = 0.5))
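The y-axis labels in the bar chart are typed by hand to match the breaks. As a small sketch of an alternative, the labels can be derived from the breaks with base R's `format()`, so the two cannot drift apart if the breaks change. The names `breaks` and `labels_thousands` are introduced here for illustration.

```r
# Same breaks as in scale_y_continuous() above
breaks <- c(0, 1000000, 2000000, 3000000)

# Express the breaks in thousands, with a comma as the thousands separator
labels_thousands <- format(breaks / 1000, big.mark = ",", trim = TRUE)

labels_thousands  # "0" "1,000" "2,000" "3,000"
```

These two vectors could then be passed as `breaks = breaks, labels = labels_thousands` in `scale_y_continuous()`.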


Charlotte Tse