Week 1 Assignment

Data Acquisition & Transformation

First, read the raw CSV file from my GitHub repository and load it into a data frame. Then verify the first few rows.

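# read.csv() accepts a URL directly and already returns a data frame,
# so no separate url() or as.data.frame() call is needed.
election <- read.csv("https://raw.githubusercontent.com/HildaRamirez/DATA-607/master/presidential_national_toplines_2020.csv")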

head(election)

##   cycle    branch      model modeldate candidate_inc candidate_chal
## 1  2020 President polls-plus 8/30/2020         Trump          Biden
## 2  2020 President polls-plus 8/29/2020         Trump          Biden
## 3  2020 President polls-plus 8/28/2020         Trump          Biden
## 4  2020 President polls-plus 8/27/2020         Trump          Biden
## 5  2020 President polls-plus 8/26/2020         Trump          Biden
## 6  2020 President polls-plus 8/25/2020         Trump          Biden
##   candidate_3rd ecwin_inc ecwin_chal ecwin_3rd ec_nomajority popwin_inc
## 1            NA  0.309650   0.685925        NA      0.004425   0.187975
## 2            NA  0.307525   0.688200        NA      0.004275   0.188325
## 3            NA  0.303500   0.692525        NA      0.003975   0.186225
## 4            NA  0.299400   0.696400        NA      0.004200   0.182875
## 5            NA  0.300550   0.695550        NA      0.003900   0.185375
## 6            NA  0.270450   0.725875        NA      0.003675   0.162850
##   popwin_chal popwin_3rd   ev_inc  ev_chal ev_3rd ev_inc_hi ev_chal_hi
## 1    0.812025         NA 221.4668 316.5332     NA       337        429
## 2    0.811675         NA 220.7868 317.2131     NA       337        429
## 3    0.813775         NA 220.4639 317.5361     NA       337        429
## 4    0.817125         NA 220.3914 317.6086     NA       336        428
## 5    0.814625         NA 220.8551 317.1448     NA       337        428
## 6    0.837150         NA 213.9388 324.0612     NA       331        432
##   ev_3rd_hi ev_inc_lo ev_chal_lo ev_3rd_lo national_voteshare_inc
## 1        NA       109        201        NA               46.29174
## 2        NA       109        201        NA               46.26345
## 3        NA       109        201        NA               46.25870
## 4        NA       110        202        NA               46.28278
## 5        NA       110        201        NA               46.30215
## 6        NA       106        207        NA               46.02112
##   national_voteshare_chal national_voteshare_3rd nat_voteshare_other
## 1                52.36622                     NA            1.342047
## 2                52.40189                     NA            1.334657
## 3                52.40611                     NA            1.335190
## 4                52.39348                     NA            1.323738
## 5                52.37099                     NA            1.326856
## 6                52.67421                     NA            1.304669
##   national_voteshare_inc_hi national_voteshare_chal_hi
## 1                  50.76257                   56.84084
## 2                  50.76155                   56.91839
## 3                  50.75557                   56.87645
## 4                  50.70728                   56.78292
## 5                  50.74532                   56.80884
## 6                  50.43880                   57.08401
##   national_voteshare_3rd_hi nat_voteshare_other_hi national_voteshare_inc_lo
## 1                        NA               2.048664                  41.80970
## 2                        NA               2.051201                  41.77486
## 3                        NA               2.055656                  41.81157
## 4                        NA               2.031385                  41.89773
## 5                        NA               2.043283                  41.88205
## 6                        NA               2.018928                  41.62480
##   national_voteshare_chal_lo national_voteshare_3rd_lo nat_voteshare_other_lo
## 1                   47.88091                        NA              0.7092417
## 2                   47.88276                        NA              0.7007713
## 3                   47.90101                        NA              0.7022665
## 4                   47.98074                        NA              0.6946608
## 5                   47.91171                        NA              0.6934588
## 6                   48.26421                        NA              0.6764926
##              timestamp simulations
## 1 05:54:02 30 Aug 2020       40000
## 2 19:00:02 29 Aug 2020       40000
## 3 19:00:03 28 Aug 2020       40000
## 4 19:00:04 27 Aug 2020       40000
## 5 19:01:03 26 Aug 2020       40000
## 6 23:50:03 25 Aug 2020       40000

Several columns are either empty or trivial (for example, the incumbent is always Trump and the challenger is always Biden). I’d like a simplified dataset for displaying some simple graphs. Let’s create a subset that keeps the model date plus four probabilities: the chance that the incumbent wins the electoral college, the chance that the challenger wins the electoral college, the chance that the incumbent wins the popular vote, and the chance that the challenger wins the popular vote.

election2 <- subset(election, select = c(modeldate, ecwin_inc, ecwin_chal, popwin_inc, popwin_chal))

head(election2)

##   modeldate ecwin_inc ecwin_chal popwin_inc popwin_chal
## 1 8/30/2020  0.309650   0.685925   0.187975    0.812025
## 2 8/29/2020  0.307525   0.688200   0.188325    0.811675
## 3 8/28/2020  0.303500   0.692525   0.186225    0.813775
## 4 8/27/2020  0.299400   0.696400   0.182875    0.817125
## 5 8/26/2020  0.300550   0.695550   0.185375    0.814625
## 6 8/25/2020  0.270450   0.725875   0.162850    0.837150
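The observation that some columns are empty or trivial can also be checked programmatically. The following is a minimal sketch, not part of the original assignment: `sample_rows` is a hypothetical stand-in holding four columns copied from the six rows of the `head(election)` output above, so it runs without downloading the file. The same two `sapply()` calls could be applied to the full `election` data frame.

```r
# Hypothetical sample: four columns copied from the head(election) output above.
sample_rows <- data.frame(
  candidate_inc = rep("Trump", 6),
  candidate_3rd = rep(NA, 6),
  ecwin_inc     = c(0.309650, 0.307525, 0.303500, 0.299400, 0.300550, 0.270450),
  ecwin_3rd     = rep(NA, 6)
)

# TRUE for columns in which every value is NA (the "empty" columns).
all_na <- sapply(sample_rows, function(col) all(is.na(col)))

# TRUE for columns holding exactly one distinct non-missing value
# (the "trivial" columns).
constant <- sapply(sample_rows, function(col) length(unique(na.omit(col))) == 1)

names(sample_rows)[all_na]    # columns that are entirely NA
names(sample_rows)[constant]  # columns that never vary
```

On this sample, the third-party columns are flagged as entirely NA and `candidate_inc` as constant. Whether that holds for every row of the full file would need to be confirmed on `election` itself.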

Now let’s rename the columns:

colnames(election2) <- c("Date","ElectoralTrump","ElectoralBiden","PopularTrump","PopularBiden")

head(election2)

##        Date ElectoralTrump ElectoralBiden PopularTrump PopularBiden
## 1 8/30/2020       0.309650       0.685925     0.187975     0.812025
## 2 8/29/2020       0.307525       0.688200     0.188325     0.811675
## 3 8/28/2020       0.303500       0.692525     0.186225     0.813775
## 4 8/27/2020       0.299400       0.696400     0.182875     0.817125
## 5 8/26/2020       0.300550       0.695550     0.185375     0.814625
## 6 8/25/2020       0.270450       0.725875     0.162850     0.837150

FINDINGS

My original intention was to use the election2 data frame to create some simple graphs, but I struggled to represent the data by date: the date labels ended up as a clump of characters along the bottom of the graph. I then intended to take some row subsets, but the data didn’t lend itself well to that either. The numbers changed so little over time that it was unclear what value the subsets would add.
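Both points in the paragraph above (how little the numbers move, and what a row subset might look like) can be sketched in a few lines. This is an illustrative sketch only: `election2_sample` is a hypothetical stand-in built from the six rows printed earlier, and the `cutoff` date is an arbitrary assumption chosen for demonstration. It relies on converting the text dates with `as.Date()` so that they can be compared.

```r
# Hypothetical sample: the six rows shown in the head(election2) output.
election2_sample <- data.frame(
  Date = c("8/30/2020", "8/29/2020", "8/28/2020",
           "8/27/2020", "8/26/2020", "8/25/2020"),
  ElectoralTrump = c(0.309650, 0.307525, 0.303500, 0.299400, 0.300550, 0.270450),
  ElectoralBiden = c(0.685925, 0.688200, 0.692525, 0.696400, 0.695550, 0.725875)
)

# Convert the month/day/year text into real Date values.
election2_sample$Date <- as.Date(election2_sample$Date, format = "%m/%d/%Y")

# How much does the incumbent's electoral college chance vary?
# diff(range(x)) is the gap between the largest and smallest value.
spread <- diff(range(election2_sample$ElectoralTrump))

# Row subset: keep only the forecasts made on or after an arbitrary cutoff.
cutoff <- as.Date("2020-08-28")
recent <- subset(election2_sample, Date >= cutoff)

spread
recent
```

On these six rows the spread is about four percentage points, which is consistent with the impression that the forecast changed little over this window.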

The biggest lesson I learned is to pick my data more carefully next time so that I have more to dig into.

A sample failed plot is shown below. I will keep practicing with the ggplot2 package.

library(ggplot2)

ggplot(data = election2, aes(x = Date, y = ElectoralTrump))+
  geom_line(color = "Red", size = 2)

## geom_path: Each group consists of only one observation. Do you need to adjust
## the group aesthetic?
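A likely cause of this message is that the Date column holds text such as "8/30/2020" rather than real dates. ggplot2 then treats every distinct string as its own category, so each group has a single point and there is nothing for the line to connect. The sketch below shows one possible fix under that assumption: convert the column with `as.Date()` before plotting. `plot_data` is a hypothetical stand-in built from the six rows printed earlier, and the `linewidth` argument assumes ggplot2 3.4.0 or newer (older versions use `size`, as in the original call).

```r
library(ggplot2)

# Hypothetical sample: the six rows shown in the head(election2) output.
plot_data <- data.frame(
  Date = c("8/30/2020", "8/29/2020", "8/28/2020",
           "8/27/2020", "8/26/2020", "8/25/2020"),
  ElectoralTrump = c(0.309650, 0.307525, 0.303500, 0.299400, 0.300550, 0.270450)
)

# Convert the text into Date values so the x axis becomes continuous.
plot_data$Date <- as.Date(plot_data$Date, format = "%m/%d/%Y")

# With a Date x axis, all points fall in one group and geom_line()
# can connect them in date order.
p <- ggplot(data = plot_data, aes(x = Date, y = ElectoralTrump)) +
  geom_line(color = "red", linewidth = 2) +
  labs(x = "Model date", y = "Chance incumbent wins electoral college")

print(p)
```

Applying the same `as.Date()` conversion to `election2$Date` should let the original plot draw a line across the full date range and give the axis readable date labels instead of a clump of text.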

Hilda Ramirez

8/29/2020
