Pulling in the data from my GitHub

library(RCurl)
my_git_url <- getURL("https://raw.githubusercontent.com/aelsaeyed/Data607/main/vote_predictions.csv")
trump_vote_predictions <- read.csv(text = my_git_url, quote = "")

my_git_url2 <- getURL("https://raw.githubusercontent.com/aelsaeyed/Data607/main/averages.csv")
# Parse the second download (my_git_url2), not the first one
vote_averages <- read.csv(text = my_git_url2, quote = "")
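
As a small illustration of what the `text =` and `quote = ""` arguments do in the calls above, here is a sketch on an inline string. The two-column CSV is made up for the example and is not taken from either file.

```r
# Hypothetical inline CSV; the real files are downloaded with getURL()
csv_text <- 'name,party\n"Smith",R'

# quote = "" turns off quote handling, so the quote characters stay in the value
raw_quotes <- read.csv(text = csv_text, quote = "", stringsAsFactors = FALSE)

# The default quote setting strips the surrounding double quotes
parsed <- read.csv(text = csv_text, stringsAsFactors = FALSE)

raw_quotes$name
parsed$name
```

If the downloaded files contain quoted fields, `quote = ""` would leave the quote characters inside the values, so it may be worth checking a few rows after loading.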

Data Exploration

This dataset describes how members of each chamber of Congress voted, specifically how often they voted in line with Trump’s position. Alongside each vote is a score for how they were predicted to vote. The website shows a small graphic, derived from both datasets, that compares the number of times a member of Congress was predicted to vote with Trump against the number of times they actually did.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:RCurl':
## 
##     complete
voted_summ = summary(trump_vote_predictions)
voted_summ
##     congress       bill_id            roll_id            chamber         
##  Min.   :115.0   Length:93983       Length:93983       Length:93983      
##  1st Qu.:115.0   Class :character   Class :character   Class :character  
##  Median :115.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :115.5                                                           
##  3rd Qu.:116.0                                                           
##  Max.   :117.0                                                           
##                                                                          
##    voted_at           bioguide             vote           trump_position    
##  Length:93983       Length:93983       Length:93983       Length:93983      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   last_name            state              district        party          
##  Length:93983       Length:93983       Min.   : 0.00   Length:93983      
##  Class :character   Class :character   1st Qu.: 3.00   Class :character  
##  Mode  :character   Mode  :character   Median : 6.00   Mode  :character  
##                                        Mean   :10.13                     
##                                        3rd Qu.:13.00                     
##                                        Max.   :53.00                     
##                                        NA's   :12897                     
##      agree            yesno         predicted_probability
##  Min.   :0.0000   Min.   :-1.0000   Min.   :0.00000      
##  1st Qu.:0.0000   1st Qu.: 0.0000   1st Qu.:0.04628      
##  Median :1.0000   Median : 1.0000   Median :0.65049      
##  Mean   :0.5254   Mean   : 0.5629   Mean   :0.54093      
##  3rd Qu.:1.0000   3rd Qu.: 1.0000   3rd Qu.:0.96509      
##  Max.   :1.0000   Max.   : 1.0000   Max.   :1.00000      
## 
avg_summ = summary(vote_averages)
avg_summ
##     congress       bill_id            roll_id            chamber         
##  Min.   :115.0   Length:93983       Length:93983       Length:93983      
##  1st Qu.:115.0   Class :character   Class :character   Class :character  
##  Median :115.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :115.5                                                           
##  3rd Qu.:116.0                                                           
##  Max.   :117.0                                                           
##                                                                          
##    voted_at           bioguide             vote           trump_position    
##  Length:93983       Length:93983       Length:93983       Length:93983      
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   last_name            state              district        party          
##  Length:93983       Length:93983       Min.   : 0.00   Length:93983      
##  Class :character   Class :character   1st Qu.: 3.00   Class :character  
##  Mode  :character   Mode  :character   Median : 6.00   Mode  :character  
##                                        Mean   :10.13                     
##                                        3rd Qu.:13.00                     
##                                        Max.   :53.00                     
##                                        NA's   :12897                     
##      agree            yesno         predicted_probability
##  Min.   :0.0000   Min.   :-1.0000   Min.   :0.00000      
##  1st Qu.:0.0000   1st Qu.: 0.0000   1st Qu.:0.04628      
##  Median :1.0000   Median : 1.0000   Median :0.65049      
##  Mean   :0.5254   Mean   : 0.5629   Mean   :0.54093      
##  3rd Qu.:1.0000   3rd Qu.: 1.0000   3rd Qu.:0.96509      
##  Max.   :1.0000   Max.   : 1.0000   Max.   :1.00000      
## 

Subsetting

I’ll be looking only at the averages file, which contains data describing how the votes actually played out alongside the prediction. I will clean up the “party” column so that it is readable and filter for only the latest two congresses (116 and 117), keeping only data from the House of Representatives.

# Use %in% for membership; congress == (117 || 116) compares against TRUE and matches no rows
avg_subset = subset(vote_averages, congress %in% c(116, 117) & chamber == 'house')

avg_subset$party[avg_subset$party == 'R'] <- "Republican"
avg_subset$party[avg_subset$party == 'D'] <- "Democrat"

glimpse(avg_subset)
## Rows: 0
## Columns: 15
## $ congress              <int> 
## $ bill_id               <chr> 
## $ roll_id               <chr> 
## $ chamber               <chr> 
## $ voted_at              <chr> 
## $ bioguide              <chr> 
## $ vote                  <chr> 
## $ trump_position        <chr> 
## $ last_name             <chr> 
## $ state                 <chr> 
## $ district              <int> 
## $ party                 <chr> 
## $ agree                 <int> 
## $ yesno                 <int> 
## $ predicted_probability <dbl>
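
Filtering for several congresses at once is a membership test, and in R the difference between `||` and `%in%` decides whether any rows survive. The sketch below shows this on a made-up four-row data frame; only the column names `congress` and `chamber` come from the dataset.

```r
# Hypothetical toy data; only the column names come from the dataset
toy <- data.frame(
  congress = c(115, 116, 117, 117),
  chamber  = c("house", "house", "senate", "house")
)

# `||` collapses its operands into a single logical value, so (117 || 116) is TRUE.
# Comparing congress with TRUE is the same as comparing it with 1.
wrong <- subset(toy, congress == (117 || 116) & chamber == "house")

# `%in%` checks each row's congress against the set of allowed values
right <- subset(toy, congress %in% c(116, 117) & chamber == "house")

nrow(wrong)  # 0
nrow(right)  # 2
```

A zero-row result from `glimpse()` is a good prompt to check the filter condition before moving on to later steps.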

Subsetting Pt. 2 - Narrowing Down Districts

I narrowed the data down further to the rows for New York State districts.

NY_subset = subset(avg_subset, avg_subset$state == 'NY')
glimpse(NY_subset)
## Rows: 0
## Columns: 15
## $ congress              <int> 
## $ bill_id               <chr> 
## $ roll_id               <chr> 
## $ chamber               <chr> 
## $ voted_at              <chr> 
## $ bioguide              <chr> 
## $ vote                  <chr> 
## $ trump_position        <chr> 
## $ last_name             <chr> 
## $ state                 <chr> 
## $ district              <int> 
## $ party                 <chr> 
## $ agree                 <int> 
## $ yesno                 <int> 
## $ predicted_probability <dbl>

Further Exploration

This data could be explored further to see how much Trump polarized voting from one congress to the next. To do that, I would calculate the percentage of members who voted against Trump’s position when they were expected to vote with it, and track how that percentage changed for the same members between congresses.
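
A minimal sketch of that calculation, under some assumptions: it uses made-up rows with the per-vote columns `bioguide`, `congress`, `agree` and `predicted_probability`, and it treats a vote as "expected to agree" when `predicted_probability` is above 0.5. Both the threshold and the toy values are illustrative choices, not part of the original analysis.

```r
# Hypothetical rows; column names match the per-vote predictions data
votes <- data.frame(
  bioguide = c("A1", "A1", "A1", "A1", "B2", "B2", "B2", "B2"),
  congress = c(115, 115, 116, 116, 115, 115, 116, 116),
  agree = c(1, 0, 0, 0, 1, 1, 1, 0),
  predicted_probability = c(0.9, 0.8, 0.7, 0.9, 0.95, 0.6, 0.2, 0.85)
)

# Assumption: a member was "expected" to side with Trump when the
# predicted probability exceeds 0.5
expected <- votes[votes$predicted_probability > 0.5, ]

# Percentage of expected-agree votes where the member did not agree,
# laid out as one row per member and one column per congress
pct_against <- 100 * with(
  expected,
  tapply(agree == 0, list(bioguide, congress), mean)
)

# Change in that percentage for the same member between congresses
change <- pct_against[, "116"] - pct_against[, "115"]
change
```

A positive `change` would mean the member broke with the expectation more often in the later congress. Members who appear in only one congress would get `NA` here and would need to be handled separately.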