Introduction

Is the net vote share won by Trump per district in 2016 predictive of a Senator’s voting record with regards to the Trump agenda? Is the predicted agreement (with Trump) value generated by 538.com more predictive of a Senator’s actual voting record?

538 updates a daily tally of who in congress supports President Trump by voting in alingment with his agenda. The also have built a prediction as to how frequently the Congressperson ‘should’ vote in alignment with President Trump based on various historical factors.

I like this idea becuase it takes the general voting trend of a given district or state as a proxy for how aligned that location is with President Trump’s agend. It makes for an interesting view into how well a member of Congress aligns with the voting record of their constituency vs. their alignment with Trump.

For my analysis, I will be focusing on the Senate and how members vote regarding Trump’s agenda.

Data Overview

A link to the data can be found here.

These data come from an observational study of the voting patterns of Senators serving in Congress during the Trump administration. Agreement with the Trump agend is defined by which bills the President openly supports. There is no experiment conducted here, it is simple a record of how each Senator has voted. Each case is the current (as of 5/5/2020) total votes for Senator, their home state, party affiliation, the percent they vote in agreement with the Trump administration, their predicted agreement by 538, and the net share of the vote won by Trump in their district in 2016. The variables of interest here are numeric; percent agreement, predicted agreement, net Trump vote share. Analysis of predictors by party are categorical; Democrat and Republican. Independents are excluded form this analysis for two reasons; 1) there are only two of them, and 2) one independent has been running for president as a democrat since 2016.

library(RCurl)
library(ggplot2)
library(dplyr)

d<-getURL(
"https://raw.githubusercontent.com/Shampjeff/cuny_msds/master/DATA_607/data/averages.csv")
df<-read.csv(text=d, stringsAsFactors = FALSE)
senate_df<-subset(df, chamber=='senate' & congress==0)
senate_df<- subset(senate_df, select=-c(district,bioguide, chamber, congress))
senate_df$party[senate_df$party == "I"]<- "Independent"
senate_df$party[senate_df$party == "D"]<- "Democrat"
senate_df$party[senate_df$party == "R"]<-"Republican"
senate_df <- senate_df %>% filter(party != "Independent")
DT::datatable(senate_df,
              extensions = c('FixedColumns',"FixedHeader"),
              options = list(scrollX = TRUE, 
                         paging=TRUE,
                         fixedHeader=TRUE,
                         pageLength=7))

Exploratory Data Analysis

As one might expect now the distribution is bimodal, there is a clear divide on agreement with Trump base on party affiliation.

senate_df %>%
  ggplot() +
  geom_histogram(aes(x=agree_pct, fill=party),
                 alpha=0.5,
                 binwidth = 0.03)+
  scale_fill_manual(values = c("blue", "red")) + 
  labs(x="Percent Agreement with Trump Agenda", 
       y="Count", 
       title="Histogram of Agreement Percent by Party")

Predicted Agreement Score - Democrats

We will start with the predicted agreement score generated by 538 as a function of actual agreement via voting record. To do this, we will fit a linear model and assess it’s fit. We will pay close attention to the conditions of inference; linearity, constant variance, nearly normal residuals.

Let’s start with a linear model and plot of agreement percent as a function of predicted score for each party separately. We will also show the model diagnostics below.

senate_dem<- senate_df %>%
  filter(party == "Democrat")
dem_lin_reg<- lm(data=senate_dem, agree_pct ~ predicted_agree)

senate_df %>%
  filter(party =="Democrat") %>%
  select(predicted_agree, agree_pct) %>%
  ggplot(aes(x=predicted_agree, y=agree_pct)) +
  geom_abline(slope= dem_lin_reg$coefficients[2], 
              intercept = dem_lin_reg$coefficients[1]) +
  geom_point(color="blue") +
  labs(x="Predicted Agreement Score",
       y="Agreement Percent",
       title="Predicted Agreement Linear Fit")

summary(dem_lin_reg)

## 
## Call:
## lm(formula = agree_pct ~ predicted_agree, data = senate_dem)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.119727 -0.030393  0.001735  0.031922  0.251249 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.12701    0.02172   5.848 4.28e-07 ***
## predicted_agree  0.37152    0.04676   7.945 2.67e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06703 on 48 degrees of freedom
## Multiple R-squared:  0.568,  Adjusted R-squared:  0.559 
## F-statistic: 63.12 on 1 and 48 DF,  p-value: 2.667e-10

Now we assess the residuals for linearity and normal distribution.

senate_dem %>%
  ggplot() +
  geom_point(aes(x=predicted_agree,
                 y=dem_lin_reg$residuals),
             color="blue") +
  geom_hline(yintercept = 0, 
             linetype="dashed") +
  labs(x="Predicted Agreement",
       y="Residuals",
       title="Residual Plot of Predicted Agreement")

We see there is an outlier at x=0.4, but these residual seem to be fairly well distributed around zero. There is also good indication of constant variance in the data as well. Let’s also check the distribution.

ggplot(data = dem_lin_reg, aes(x=dem_lin_reg$residuals)) +
  geom_histogram(fill = "blue", color="black") +
  labs(x="Residuals", y="Count", 
       title="Residual Histogram")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Again, outside of one outlier, this appears to be nearly normal.

Summary of Predicted Agreement Score - Democrats

With the conditions for infernce met we see that the slope of the line for agreement percent as a function of predicted agreement by 538 is non-zero with p-value nearly zero (2.67e-10). The model can explain 56% of the variance in the data for democrats.

Predicted Agreement Score - Republicans

We will take a similar route with Republicans.

senate_rep<- senate_df %>%
  filter(party == "Republican")
rep_lin_reg<- lm(data=senate_rep, agree_pct ~ predicted_agree)

senate_df %>%
  filter(party =="Republican") %>%
  select(predicted_agree, agree_pct) %>%
  ggplot(aes(x=predicted_agree, y=agree_pct)) +
  geom_abline(slope= rep_lin_reg$coefficients[2], 
              intercept = rep_lin_reg$coefficients[1]) +
  geom_point(color="red") +
  labs(x="Predicted Agreement Score",
       y="Agreement Percent",
       title="Predicted Agreement Linear Fit")

summary(rep_lin_reg)

## 
## Call:
## lm(formula = agree_pct ~ predicted_agree, data = senate_rep)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.21383 -0.02629  0.02028  0.04048  0.10830 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.82458    0.04636  17.785   <2e-16 ***
## predicted_agree  0.09401    0.05993   1.569    0.122    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06774 on 61 degrees of freedom
## Multiple R-squared:  0.03878,    Adjusted R-squared:  0.02302 
## F-statistic: 2.461 on 1 and 61 DF,  p-value: 0.1219

Now we assess the residuals for linearity and normal distribution.

senate_rep %>%
  ggplot() +
  geom_point(aes(x=predicted_agree,
                 y=rep_lin_reg$residuals),
             color="red") +
  geom_hline(yintercept = 0, 
             linetype="dashed") +
  labs(x="Predicted Agreement",
       y="Residuals",
       title="Residual Plot of Predicted Agreement")

This do not appear to be too well distributed around zero with many points far below the y=0 line. There does appear to be constant variance.

ggplot(data = rep_lin_reg, aes(x=rep_lin_reg$residuals)) +
  geom_histogram(fill = "red", color="black") +
  labs(x="Residuals", y="Count", 
       title="Residual Histogram")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Again, several points below x=0 on the residual histogram indicate a left-skewed distribution.

Summary of Predicted Agreement Score - Republicans

With conditions reasonably met for a reliable model we see that predicted agreement as generated by 538 for Republicans fails to reject the null hypothesis for a non-zero slope with p-value= 0.122. The predicted agreement score is not predictive for how a Republican will vote in alignment with the Trump Agenda.

Net Trump Vote - Democrats

We now turn out focus to if the net share of the vote won by Trump in 2016 is predictive of that senators voting record on the Trump agenda.

senate_dem<- senate_df %>%
  filter(party == "Democrat")
dem_lin_reg<- lm(data=senate_dem, agree_pct ~ net_trump_vote)

senate_df %>%
  filter(party =="Democrat") %>%
  select(net_trump_vote, agree_pct) %>%
  ggplot(aes(x=net_trump_vote, y=agree_pct)) +
  geom_abline(slope= dem_lin_reg$coefficients[2], 
              intercept = dem_lin_reg$coefficients[1]) +
  geom_point(color="blue") +
  labs(x="Net Trump Vote Share",
       y="Agreement Percent",
       title="Net Trump Vote Share Linear Fit")

summary(dem_lin_reg)

## 
## Call:
## lm(formula = agree_pct ~ net_trump_vote, data = senate_dem)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.124683 -0.038430 -0.005219  0.041798  0.195612 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.3156469  0.0100813  31.310  < 2e-16 ***
## net_trump_vote 0.0045973  0.0005541   8.297 7.85e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06536 on 48 degrees of freedom
## Multiple R-squared:  0.5892, Adjusted R-squared:  0.5806 
## F-statistic: 68.85 on 1 and 48 DF,  p-value: 7.854e-11

Now we assess the residuals for linearity and normal distribution.

senate_dem %>%
  ggplot() +
  geom_point(aes(x=net_trump_vote,
                 y=dem_lin_reg$residuals),
             color="blue") +
  geom_hline(yintercept = 0, 
             linetype="dashed") +
  labs(x="Net Trump Vote Share",
       y="Residuals",
       title="Residual Plot of Net Trump Vote Share")

We see there is an outlier near 0% vote share, but these residual seem to be fairly well distributed around zero. There is also good indication of constant variance in the data as well. Let’s also check the distribution.

ggplot(data = dem_lin_reg, aes(x=dem_lin_reg$residuals)) +
  geom_histogram(fill = "blue", color="black") +
  labs(x="Residuals", y="Count", 
       title="Residual Histogram")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

These appear to be nearly normal with some reservations about the distribution of the residuals. Again, a few outliers skewing the data to the right.

Net Trump Vote - Republicans

We now turn out focus to if the net share of the vote won by Trump in 2016 is predictive of that senators voting record on the Trump agenda.

senate_rep<- senate_df %>%
  filter(party == "Republican")
rep_lin_reg<- lm(data=senate_rep, agree_pct ~ net_trump_vote)

senate_df %>%
  filter(party =="Republican") %>%
  select(net_trump_vote, agree_pct) %>%
  ggplot(aes(x=net_trump_vote, y=agree_pct)) +
  geom_abline(slope= rep_lin_reg$coefficients[2], 
              intercept = rep_lin_reg$coefficients[1]) +
  geom_point(color="red") +
  labs(x="Net Trump Vote Share",
       y="Agreement Percent",
       title="Net Trump Vote Share Linear Fit")

summary(rep_lin_reg)

## 
## Call:
## lm(formula = agree_pct ~ net_trump_vote, data = senate_rep)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.21430 -0.02370  0.02049  0.03805  0.11723 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.877498   0.015040  58.346   <2e-16 ***
## net_trump_vote 0.001022   0.000681   1.501    0.139    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06786 on 61 degrees of freedom
## Multiple R-squared:  0.03562,    Adjusted R-squared:  0.01981 
## F-statistic: 2.253 on 1 and 61 DF,  p-value: 0.1385

Now we assess the residuals for linearity and normal distribution.

senate_rep %>%
  ggplot() +
  geom_point(aes(x=net_trump_vote,
                 y=rep_lin_reg$residuals),
             color="red") +
  geom_hline(yintercept = 0, 
             linetype="dashed") +
  labs(x="Net Trump Vote Share",
       y="Residuals",
       title="Residual Plot of Net Trump Vote Share")

We see a similar story with the previous analysis of Republicans a left-skewed distribution that is seems nearly normal.

Conclusion

I was surprised by these results, but it seems to make sense to some extent. Net vote share and 538 predicted score are not predictors of how a Reoublican will vote alongside Trump. This suggests that for Senators in states that Trump lost or only marginally won, Republicans still will vote largely in alignment with Trump. This seems counter to the standard rules of political self-interest but clearly there must a some other factor that out-weighs voting margins or historical data. Democrats appear to have more freedom in how they vote, though they all seem to cap their agreement with the Trump agenda to under 60%.

Regression Analysis for US Sentors Voting Patterns

Jeff shamp

2020-09-03

Introduction

Data Overview

Exploratory Data Analysis

Predicted Agreement Score - Democrats

Summary of Predicted Agreement Score - Democrats

Predicted Agreement Score - Republicans

Summary of Predicted Agreement Score - Republicans

Net Trump Vote - Democrats

Net Trump Vote - Republicans

Conclusion

Introduction

Data Overview

Exploratory Data Analysis

Predicted Agreement Score - Democrats

Summary of Predicted Agreement Score - Democrats

Predicted Agreement Score - Republicans

Summary of Predicted Agreement Score - Republicans

Net Trump Vote - Democrats

Summary of Net Trump Vote Share - Democrats

Net Trump Vote - Republicans

Summary of Net Trump Vote Share - Republicans

Conclusion