Pulling in the data from my GitHub
library(RCurl)
# Download the raw text of the vote-level predictions file and parse it
my_git_url <- getURL("https://raw.githubusercontent.com/aelsaeyed/Data607/main/vote_predictions.csv")
trump_vote_predictions <- read.csv(text = my_git_url, quote = "")
# Download the averages file and parse that download (my_git_url2)
my_git_url2 <- getURL("https://raw.githubusercontent.com/aelsaeyed/Data607/main/averages.csv")
vote_averages <- read.csv(text = my_git_url2, quote = "")
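Because the two downloads are parsed with nearly identical calls, a quick check that the resulting data frames really differ can catch a mix-up between them. Here is a minimal sketch on toy inline CSV text rather than the real files; the object names, column names, and values are made up for illustration.

```r
# Toy CSV text standing in for the two downloaded files (hypothetical contents)
predictions_csv <- "bioguide,agree,predicted_probability\nA000001,1,0.90\nB000002,0,0.20"
averages_csv <- "bioguide,agree_pct,predicted_agree\nA000001,0.85,0.80\nB000002,0.10,0.15"

predictions <- read.csv(text = predictions_csv)
averages <- read.csv(text = averages_csv)

# If both objects had been parsed from the same text, this would be TRUE
same_data <- identical(predictions, averages)
same_data

# Columns present in one file but not the other are another cheap check
extra_cols <- setdiff(names(averages), names(predictions))
extra_cols
```

Running the same two checks on trump_vote_predictions and vote_averages right after loading would show whether the two objects hold different data.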
This dataset covers members of both chambers of Congress and how they voted, specifically how often they voted in line with Trump’s position. Alongside this is a score for how they were predicted to vote. The website shows a small graphic, derived from both sets as a column, that compares the number of times each member was predicted to vote with Trump against the number of times they actually did.
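The predicted-versus-actual comparison can be sketched per member as below. I am assuming that a member's actual score is the mean of agree and the predicted score is the mean of predicted_probability; how the website actually derives its graphic is not stated here, so treat the difference column as an illustration only. The rows are made-up toy values.

```r
library(dplyr)

# Toy vote-level rows (made-up values); column names follow the predictions file
votes <- data.frame(
  bioguide = c("A000001", "A000001", "A000001", "B000002", "B000002"),
  agree = c(1, 1, 0, 0, 1),
  predicted_probability = c(0.9, 0.8, 0.7, 0.2, 0.4)
)

member_scores <- votes %>%
  group_by(bioguide) %>%
  summarise(
    actual_agree = mean(agree),                    # share of votes in line with Trump
    predicted_agree = mean(predicted_probability), # average predicted chance of agreeing
    difference = actual_agree - predicted_agree    # positive: agreed more often than predicted
  )
member_scores
```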
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
##
## Attaching package: 'tidyr'
## The following object is masked from 'package:RCurl':
##
## complete
voted_summ = summary(trump_vote_predictions)
voted_summ
## congress bill_id roll_id chamber
## Min. :115.0 Length:93983 Length:93983 Length:93983
## 1st Qu.:115.0 Class :character Class :character Class :character
## Median :115.0 Mode :character Mode :character Mode :character
## Mean :115.5
## 3rd Qu.:116.0
## Max. :117.0
##
## voted_at bioguide vote trump_position
## Length:93983 Length:93983 Length:93983 Length:93983
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## last_name state district party
## Length:93983 Length:93983 Min. : 0.00 Length:93983
## Class :character Class :character 1st Qu.: 3.00 Class :character
## Mode :character Mode :character Median : 6.00 Mode :character
## Mean :10.13
## 3rd Qu.:13.00
## Max. :53.00
## NA's :12897
## agree yesno predicted_probability
## Min. :0.0000 Min. :-1.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.: 0.0000 1st Qu.:0.04628
## Median :1.0000 Median : 1.0000 Median :0.65049
## Mean :0.5254 Mean : 0.5629 Mean :0.54093
## 3rd Qu.:1.0000 3rd Qu.: 1.0000 3rd Qu.:0.96509
## Max. :1.0000 Max. : 1.0000 Max. :1.00000
##
avg_summ = summary(vote_averages)
avg_summ
## congress bill_id roll_id chamber
## Min. :115.0 Length:93983 Length:93983 Length:93983
## 1st Qu.:115.0 Class :character Class :character Class :character
## Median :115.0 Mode :character Mode :character Mode :character
## Mean :115.5
## 3rd Qu.:116.0
## Max. :117.0
##
## voted_at bioguide vote trump_position
## Length:93983 Length:93983 Length:93983 Length:93983
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## last_name state district party
## Length:93983 Length:93983 Min. : 0.00 Length:93983
## Class :character Class :character 1st Qu.: 3.00 Class :character
## Mode :character Mode :character Median : 6.00 Mode :character
## Mean :10.13
## 3rd Qu.:13.00
## Max. :53.00
## NA's :12897
## agree yesno predicted_probability
## Min. :0.0000 Min. :-1.0000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.: 0.0000 1st Qu.:0.04628
## Median :1.0000 Median : 1.0000 Median :0.65049
## Mean :0.5254 Mean : 0.5629 Mean :0.54093
## 3rd Qu.:1.0000 3rd Qu.: 1.0000 3rd Qu.:0.96509
## Max. :1.0000 Max. : 1.0000 Max. :1.00000
##
I’ll be looking only at the averages file, which contains the data describing how the votes actually played out alongside the prediction. I will clean up the “party” column so that it is readable and filter for only the latest two congresses (the 116th and 117th), keeping only data from the House of Representatives.
# Keep House votes from the 116th and 117th Congresses; %in% tests membership in a set
avg_subset = subset(vote_averages, congress %in% c(116, 117) & chamber == 'house')
# Spell out the party abbreviations
avg_subset$party[avg_subset$party == 'R'] <- "Republican"
avg_subset$party[avg_subset$party == 'D'] <- "Democrat"
glimpse(avg_subset)
## Rows: 0
## Columns: 15
## $ congress <int>
## $ bill_id <chr>
## $ roll_id <chr>
## $ chamber <chr>
## $ voted_at <chr>
## $ bioguide <chr>
## $ vote <chr>
## $ trump_position <chr>
## $ last_name <chr>
## $ state <chr>
## $ district <int>
## $ party <chr>
## $ agree <int>
## $ yesno <int>
## $ predicted_probability <dbl>
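Filtering on more than one congress number is a membership test, and an empty result from such a filter is often a sign that the test was written with a logical operator instead. Here is a small illustration on toy values (not the real data) of how the two forms behave.

```r
# Toy values standing in for the congress column
congress <- c(115L, 116L, 117L)

# `||` reduces its two operands to a single TRUE, and `==` then coerces
# TRUE to 1, so this asks "is congress equal to 1?" for every element
or_result <- congress == (117 || 116)
or_result

# %in% checks each element against the set of wanted values
in_result <- congress %in% c(116, 117)
in_result
```

The first form matches nothing in this vector, while the second keeps the 116 and 117 entries.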
I then narrowed the data down further to the rows for New York State’s districts.
NY_subset = subset(avg_subset, avg_subset$state == 'NY')
glimpse(NY_subset)
## Rows: 0
## Columns: 15
## $ congress <int>
## $ bill_id <chr>
## $ roll_id <chr>
## $ chamber <chr>
## $ voted_at <chr>
## $ bioguide <chr>
## $ vote <chr>
## $ trump_position <chr>
## $ last_name <chr>
## $ state <chr>
## $ district <int>
## $ party <chr>
## $ agree <int>
## $ yesno <int>
## $ predicted_probability <dbl>
This data could be explored further to see how much Trump polarized voting from one congress to the next. To do that, I would calculate the percentage of members who voted against the prediction that they would side with Trump on policy, and track how that percentage changed for the same members between congresses.
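One way that exploration might start is sketched below on made-up toy rows. Two assumptions are mine rather than the dataset's: a vote counts as "expected to side with Trump" when predicted_probability is at least 0.5, and opposition is measured per member as the share of those expected votes where agree is 0.

```r
library(dplyr)
library(tidyr)

# Toy vote-level rows (made-up values); column names follow the vote-level data
votes <- data.frame(
  bioguide = rep(c("A000001", "B000002"), each = 4),
  congress = rep(c(116, 116, 117, 117), times = 2),
  agree = c(1, 0, 0, 0, 1, 1, 1, 1),
  predicted_probability = c(0.9, 0.8, 0.7, 0.6, 0.9, 0.2, 0.8, 0.9)
)

defiance <- votes %>%
  # Assumption: keep only votes where the member was expected to side with Trump
  filter(predicted_probability >= 0.5) %>%
  group_by(bioguide, congress) %>%
  # Percentage of those expected votes where the member disagreed instead
  summarise(pct_opposed = 100 * mean(agree == 0), .groups = "drop") %>%
  # One column per congress so the same member can be compared across them
  pivot_wider(names_from = congress, names_prefix = "congress_",
              values_from = pct_opposed) %>%
  mutate(change = congress_117 - congress_116)
defiance
```

A positive change means the member went against the expectation more often in the later congress. The 0.5 cutoff is arbitrary and worth varying before drawing any conclusions.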