These data are from FiveThirtyEight's 2020 General Election Forecast (fivethirtyeight.com). They have been tracking polls since June and sharing engaging visualizations of poll results and election predictions.
First, read the raw CSV file from my GitHub repository into a data frame and check the first few rows.
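# read.csv() accepts a URL directly and already returns a data frame,
# so the url() wrapper and the as.data.frame() conversion are not needed
election <- read.csv("https://raw.githubusercontent.com/HildaRamirez/DATA-607/master/presidential_national_toplines_2020.csv")
# Show the first six rows
head(election)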
## cycle branch model modeldate candidate_inc candidate_chal
## 1 2020 President polls-plus 8/30/2020 Trump Biden
## 2 2020 President polls-plus 8/29/2020 Trump Biden
## 3 2020 President polls-plus 8/28/2020 Trump Biden
## 4 2020 President polls-plus 8/27/2020 Trump Biden
## 5 2020 President polls-plus 8/26/2020 Trump Biden
## 6 2020 President polls-plus 8/25/2020 Trump Biden
## candidate_3rd ecwin_inc ecwin_chal ecwin_3rd ec_nomajority popwin_inc
## 1 NA 0.309650 0.685925 NA 0.004425 0.187975
## 2 NA 0.307525 0.688200 NA 0.004275 0.188325
## 3 NA 0.303500 0.692525 NA 0.003975 0.186225
## 4 NA 0.299400 0.696400 NA 0.004200 0.182875
## 5 NA 0.300550 0.695550 NA 0.003900 0.185375
## 6 NA 0.270450 0.725875 NA 0.003675 0.162850
## popwin_chal popwin_3rd ev_inc ev_chal ev_3rd ev_inc_hi ev_chal_hi
## 1 0.812025 NA 221.4668 316.5332 NA 337 429
## 2 0.811675 NA 220.7868 317.2131 NA 337 429
## 3 0.813775 NA 220.4639 317.5361 NA 337 429
## 4 0.817125 NA 220.3914 317.6086 NA 336 428
## 5 0.814625 NA 220.8551 317.1448 NA 337 428
## 6 0.837150 NA 213.9388 324.0612 NA 331 432
## ev_3rd_hi ev_inc_lo ev_chal_lo ev_3rd_lo national_voteshare_inc
## 1 NA 109 201 NA 46.29174
## 2 NA 109 201 NA 46.26345
## 3 NA 109 201 NA 46.25870
## 4 NA 110 202 NA 46.28278
## 5 NA 110 201 NA 46.30215
## 6 NA 106 207 NA 46.02112
## national_voteshare_chal national_voteshare_3rd nat_voteshare_other
## 1 52.36622 NA 1.342047
## 2 52.40189 NA 1.334657
## 3 52.40611 NA 1.335190
## 4 52.39348 NA 1.323738
## 5 52.37099 NA 1.326856
## 6 52.67421 NA 1.304669
## national_voteshare_inc_hi national_voteshare_chal_hi
## 1 50.76257 56.84084
## 2 50.76155 56.91839
## 3 50.75557 56.87645
## 4 50.70728 56.78292
## 5 50.74532 56.80884
## 6 50.43880 57.08401
## national_voteshare_3rd_hi nat_voteshare_other_hi national_voteshare_inc_lo
## 1 NA 2.048664 41.80970
## 2 NA 2.051201 41.77486
## 3 NA 2.055656 41.81157
## 4 NA 2.031385 41.89773
## 5 NA 2.043283 41.88205
## 6 NA 2.018928 41.62480
## national_voteshare_chal_lo national_voteshare_3rd_lo nat_voteshare_other_lo
## 1 47.88091 NA 0.7092417
## 2 47.88276 NA 0.7007713
## 3 47.90101 NA 0.7022665
## 4 47.98074 NA 0.6946608
## 5 47.91171 NA 0.6934588
## 6 48.26421 NA 0.6764926
## timestamp simulations
## 1 05:54:02 30 Aug 2020 40000
## 2 19:00:02 29 Aug 2020 40000
## 3 19:00:03 28 Aug 2020 40000
## 4 19:00:04 27 Aug 2020 40000
## 5 19:01:03 26 Aug 2020 40000
## 6 23:50:03 25 Aug 2020 40000
Several columns are either blank or trivial (e.g. the third-party columns are NA, the incumbent is always Trump, and the challenger is always Biden). I'd like to create a simplified dataset in order to display some simple graphs. Let's create a subset that includes the model date, the chance that the incumbent wins the Electoral College, the chance that the challenger wins the Electoral College, the chance that the incumbent wins the popular vote, and the chance that the challenger wins the popular vote.
# Keep the model date plus the four win-probability columns
election2 <- subset(election, select = c(modeldate, ecwin_inc, ecwin_chal, popwin_inc, popwin_chal))
head(election2)
## modeldate ecwin_inc ecwin_chal popwin_inc popwin_chal
## 1 8/30/2020 0.309650 0.685925 0.187975 0.812025
## 2 8/29/2020 0.307525 0.688200 0.188325 0.811675
## 3 8/28/2020 0.303500 0.692525 0.186225 0.813775
## 4 8/27/2020 0.299400 0.696400 0.182875 0.817125
## 5 8/26/2020 0.300550 0.695550 0.185375 0.814625
## 6 8/25/2020 0.270450 0.725875 0.162850 0.837150
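Instead of spotting blank or constant columns by eye in the head() output, they can be flagged programmatically. This is a sketch under assumptions: `find_uninformative()` is a hypothetical helper, and the small data frame copies a few columns from the first three rows printed by head(election) so the block runs on its own. A handful of rows can make a column look constant when it is not, so on the real data you would call `find_uninformative(election)` on the full data frame.

```r
# Hypothetical helper: return the names of columns that are entirely NA
# or hold a single repeated value (they add nothing to a plot)
find_uninformative <- function(df) {
  is_flat <- vapply(df, function(col) {
    all(is.na(col)) || length(unique(col)) == 1
  }, logical(1))
  names(df)[is_flat]
}

# Stand-in data: a few columns from the first three rows of head(election)
sample_rows <- data.frame(
  cycle          = c(2020, 2020, 2020),
  branch         = c("President", "President", "President"),
  modeldate      = c("8/30/2020", "8/29/2020", "8/28/2020"),
  candidate_inc  = c("Trump", "Trump", "Trump"),
  candidate_chal = c("Biden", "Biden", "Biden"),
  candidate_3rd  = c(NA, NA, NA),
  ecwin_inc      = c(0.309650, 0.307525, 0.303500),
  ecwin_3rd      = c(NA, NA, NA)
)

find_uninformative(sample_rows)
```

The complement, `setdiff(names(sample_rows), find_uninformative(sample_rows))`, gives the columns worth keeping.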
Now let’s rename the columns:
# New names follow the same order as the columns selected above
colnames(election2) <- c("Date","ElectoralTrump","ElectoralBiden","PopularTrump","PopularBiden")
head(election2)
## Date ElectoralTrump ElectoralBiden PopularTrump PopularBiden
## 1 8/30/2020 0.309650 0.685925 0.187975 0.812025
## 2 8/29/2020 0.307525 0.688200 0.188325 0.811675
## 3 8/28/2020 0.303500 0.692525 0.186225 0.813775
## 4 8/27/2020 0.299400 0.696400 0.182875 0.817125
## 5 8/26/2020 0.300550 0.695550 0.185375 0.814625
## 6 8/25/2020 0.270450 0.725875 0.162850 0.837150
My original intention was to use the election2 data frame to create some simple graphs, but I struggled to represent the data by date. Date was still stored as text rather than as a date, so every date became its own label and they piled up into a clump of characters along the bottom of my graph. I then intended to do some row subsets, but the data didn't lend itself well to that either: the probabilities changed little over time, so it was unclear what I would gain from comparing subsets.
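Converting the Date column from text to R's Date class gives ggplot2 a continuous time axis instead of one label per date. The sketch below does that with base R's `as.Date()` and the `"%m/%d/%Y"` format, which matches values such as 8/30/2020, and then puts a number on how much each probability moved. It assumes the six rows printed by head(election2) as stand-in data; with the full data you would skip the `data.frame()` step and apply the remaining lines to election2.

```r
# Stand-in for election2: the six rows printed by head(election2) above
election2 <- data.frame(
  Date           = c("8/30/2020", "8/29/2020", "8/28/2020",
                     "8/27/2020", "8/26/2020", "8/25/2020"),
  ElectoralTrump = c(0.309650, 0.307525, 0.303500, 0.299400, 0.300550, 0.270450),
  ElectoralBiden = c(0.685925, 0.688200, 0.692525, 0.696400, 0.695550, 0.725875),
  PopularTrump   = c(0.187975, 0.188325, 0.186225, 0.182875, 0.185375, 0.162850),
  PopularBiden   = c(0.812025, 0.811675, 0.813775, 0.817125, 0.814625, 0.837150)
)

# Parse month/day/year text such as "8/30/2020" into the Date class
election2$Date <- as.Date(election2$Date, format = "%m/%d/%Y")

# Sort oldest to newest so the rows follow time order
election2 <- election2[order(election2$Date), ]

# Spread (max minus min) of each probability column over the period
movement <- sapply(election2[, -1], function(col) diff(range(col)))
round(movement, 4)
```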
The biggest lesson I learned is to pick my data more carefully next time so that I have more to dig into.
A sample of the failed plot is below. I will keep practicing with the ggplot2 package.
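library(ggplot2)
# Date is still text here, so ggplot2 treats x as discrete and makes each
# date its own group; with one point per group, geom_line() has nothing to join
ggplot(data = election2, aes(x = Date, y = ElectoralTrump)) +
  geom_line(color = "Red", size = 2)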
## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
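One possible fix for the failed plot, sketched under assumptions: Date is converted with `as.Date()` so the x-axis is continuous, and the two Electoral College columns are stacked into long format so each candidate becomes a single line grouped by colour. The `long` data frame and its `Candidate`/`Probability` names are illustrative choices, and the six rows printed by head(election2) stand in for the full data. `linewidth` replaced `size` for lines in ggplot2 3.4.0; on older versions use `size` instead.

```r
library(ggplot2)

# Stand-in for election2 (six rows from head(election2)), Date already parsed
election2 <- data.frame(
  Date = as.Date(c("8/30/2020", "8/29/2020", "8/28/2020",
                   "8/27/2020", "8/26/2020", "8/25/2020"), format = "%m/%d/%Y"),
  ElectoralTrump = c(0.309650, 0.307525, 0.303500, 0.299400, 0.300550, 0.270450),
  ElectoralBiden = c(0.685925, 0.688200, 0.692525, 0.696400, 0.695550, 0.725875)
)

# Stack the two candidates into long format: one row per date per candidate
long <- data.frame(
  Date        = rep(election2$Date, times = 2),
  Candidate   = rep(c("Trump", "Biden"), each = nrow(election2)),
  Probability = c(election2$ElectoralTrump, election2$ElectoralBiden)
)

# A Date x-axis plus colour = Candidate gives one group per candidate,
# so geom_line() connects that candidate's points across dates
p <- ggplot(long, aes(x = Date, y = Probability, colour = Candidate)) +
  geom_line(linewidth = 1) +
  scale_colour_manual(values = c(Trump = "red", Biden = "blue")) +
  labs(x = NULL, y = "Chance of winning the Electoral College")

p
```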