Will Bruton

Thesis and Objectives

For this project, I wanted to explore a bit of everything: swing states, pollster data, Trump’s polling in 2020 versus 2024 as he reappears as a candidate, and how these factors will influence an election many are calling “The Most Important Election Ever.” Based on my initial research, this race is close. My prediction, however, leans toward Kamala Harris, largely due to the incumbency advantage.

Models

Model 1: Pollster Consistency

  • Summary: This model, model_pollster_consistency, examines whether pollster quality metrics (numeric_grade and rank) affect the reported polling percentages for each candidate. The goal is to identify whether higher-rated pollsters report different results, possibly with lower bias or greater consistency, than lower-rated pollsters.
  • Insight: The results show that numeric_grade (p = 0.077) and rank (p = 0.058) have only marginally significant effects on polling percentages, suggesting that higher-rated pollsters might differ slightly in their reporting but without a strong overall trend. The model does show a statistically significant candidate effect: holding pollster grade and rank constant, Harris polls about 0.84 percentage points higher than the reference candidate (presumably Trump). The low R-squared (about 0.03) means these predictors explain very little of the variation in polling percentages.
## 
## Call:
## lm(formula = pct ~ numeric_grade + rank + candidate_name, data = t_PresidentPolls2024_with_ratings)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.0383 -2.0383 -0.0225  1.9486  8.8009 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 52.62559    3.56768  14.751  < 2e-16 ***
## numeric_grade               -2.13483    1.20454  -1.772   0.0767 .  
## rank                        -0.02203    0.01160  -1.900   0.0578 .  
## candidate_nameKamala Harris  0.83924    0.19085   4.397 1.23e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.803 on 864 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared:  0.02686,    Adjusted R-squared:  0.02348 
## F-statistic: 7.948 on 3 and 864 DF,  p-value: 3.124e-05
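
To make the coefficients above concrete, here is a minimal sketch that plugs them into the fitted equation by hand. The pollster profile (a numeric_grade of 2.5 and a rank of 50) is a hypothetical example chosen only for illustration, and the helper predict_pct is not part of the original analysis.

```r
# Coefficients copied from the summary output above.
b_intercept <- 52.62559
b_grade     <- -2.13483
b_rank      <- -0.02203
b_harris    <- 0.83924

# Hypothetical pollster profile (an assumption, not taken from the data).
grade <- 2.5
rank  <- 50

# Fitted value from the linear model: intercept plus each coefficient
# times its predictor; the candidate term is a 0/1 indicator for Harris.
predict_pct <- function(grade, rank, is_harris) {
  b_intercept + b_grade * grade + b_rank * rank + b_harris * as.numeric(is_harris)
}

pred_reference <- predict_pct(grade, rank, FALSE)
pred_harris    <- predict_pct(grade, rank, TRUE)

# The gap between the two candidates is the Harris coefficient itself,
# whatever grade and rank are plugged in.
gap <- pred_harris - pred_reference
print(c(reference = pred_reference, harris = pred_harris, gap = gap))
```

For this hypothetical pollster the model predicts roughly 46.19 for the reference candidate and 47.03 for Harris. Because the model has no interaction terms, the gap stays at about 0.84 points for any grade and rank.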

Model 2: Trump Trend Analysis

  • Summary: This model, model_trump_trend, assesses whether Trump’s polling percentages in 2020 are predictive of his 2024 polling percentages. By regressing pct_2024 on pct_2020, this analysis aims to detect any trends or consistency in Trump’s support across the two election cycles.
  • Insight: The relationship between Trump’s 2020 and 2024 polling percentages is positive and statistically significant, but weak: each additional point in 2020 is associated with about 0.26 points in 2024, and the model explains only about 6% of the variance (R-squared = 0.056). This is consistent with a stable base, in which regions or demographic groups where Trump polled well in 2020 continue to support him, though the low R-squared means 2020 numbers alone are a poor predictor of any individual 2024 poll.
## 
## Call:
## lm(formula = pct_2024 ~ pct_2020, data = t_PresidentPolls_Trump)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.6965 -1.6658  0.0765  1.5612  9.5918 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 34.84407    0.88115   39.54   <2e-16 ***
## pct_2020     0.25766    0.02031   12.69   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.075 on 2700 degrees of freedom
## Multiple R-squared:  0.05625,    Adjusted R-squared:  0.0559 
## F-statistic: 160.9 on 1 and 2700 DF,  p-value: < 2.2e-16
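
As a rough illustration of how small the slope above is in practice, the sketch below evaluates the fitted line at two 2020 polling values and converts the reported R-squared into a correlation coefficient. The inputs 40 and 50 are hypothetical values picked for the example, and predict_2024 is a helper introduced here, not part of the original code.

```r
# Coefficients and R-squared copied from the summary output above.
b0        <- 34.84407
b1        <- 0.25766
r_squared <- 0.05625

# Fitted 2024 percentage for a given 2020 percentage.
predict_2024 <- function(pct_2020) b0 + b1 * pct_2020

pred_40 <- predict_2024(40)  # hypothetical 2020 value
pred_50 <- predict_2024(50)  # hypothetical 2020 value

# A 10-point difference in 2020 maps to 10 * b1 points in 2024.
spread <- pred_50 - pred_40

# With a single predictor, the correlation is the square root of
# R-squared, taking the sign of the slope (positive here).
r <- sqrt(r_squared)
print(c(pred_40 = pred_40, pred_50 = pred_50, spread = spread, r = r))
```

A 10-point gap in 2020 polling corresponds to a gap of only about 2.6 points in 2024 (roughly 45.2 versus 47.7), and the implied correlation is about 0.24.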

Model 3: Methodology Impact

  • Summary: This model, model_methodology, explores the impact of polling methodology on polling percentages. It also includes candidate_name to examine whether specific candidates have systematically different polling results once methodology is accounted for.
  • Insight: The results indicate that methodology does not significantly affect polling percentages, as shown by the non-significant coefficient for “Online Panel” (p = 0.266). This implies that the “Live Phone” and “Online Panel” methodologies yield comparable results for each candidate. The candidate effect is significant: Harris polls about 0.80 points higher than the reference candidate regardless of methodology.
## 
## Call:
## lm(formula = pct ~ candidate_name + methodology, data = t_PresidentPolls2024_with_ratings)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.1065 -1.8956  0.1044  1.8935  8.9024 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  46.0976     0.1765 261.169  < 2e-16 ***
## candidate_nameKamala Harris   0.7981     0.1796   4.444 9.86e-06 ***
## methodologyOnline Panel       0.2109     0.1895   1.113    0.266    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.791 on 968 degrees of freedom
## Multiple R-squared:  0.02105,    Adjusted R-squared:  0.01903 
## F-statistic: 10.41 on 2 and 968 DF,  p-value: 3.373e-05
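
One way to read the table above is through approximate 95% confidence intervals built from each estimate and its standard error. The sketch below is a hand calculation under the usual linear-model assumptions; conf_int is a hypothetical helper introduced here, and on the fitted object itself the built-in confint() would give the same kind of interval.

```r
# 95% confidence interval from an estimate, its standard error, and the
# residual degrees of freedom, using the t distribution.
conf_int <- function(estimate, std_error, df, level = 0.95) {
  t_crit <- qt(1 - (1 - level) / 2, df)
  c(lower = estimate - t_crit * std_error,
    upper = estimate + t_crit * std_error)
}

# Estimates and standard errors copied from the summary output above;
# 968 is the residual degrees of freedom reported there.
ci_online <- conf_int(0.2109, 0.1895, 968)
ci_harris <- conf_int(0.7981, 0.1796, 968)

print(ci_online)
print(ci_harris)
```

The interval for “Online Panel” spans zero, which matches its non-significant p-value, while the interval for the Harris coefficient lies entirely above zero.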

Plots

Plot 1: Swing State-Level Weighted Polling Average by Candidate (2024)

  • Summary: This bar plot displays the weighted polling averages for candidates in key swing states for 2024. It highlights each candidate’s performance within individual swing states.
  • Insight: The plot offers a comparative view of candidate support across the swing states that frequently determine election outcomes. Trump currently leads in four of the eight swing states, while Harris leads in three. Nevada remains close and could go either way, giving Trump a fifth state or giving Harris a fourth for an even four-four split.
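
The weighted averages behind a plot like this can be sketched in base R as follows. The toy polls and the weight column are hypothetical: the report does not state how the weights were defined, so this shows only the general technique, not the project's actual calculation.

```r
# Hypothetical toy polls (not the project's data). The weight column is
# an assumption; it could stand for pollster quality or sample size.
polls <- data.frame(
  state          = c("Nevada", "Nevada", "Nevada", "Nevada",
                     "Arizona", "Arizona"),
  candidate_name = c("Donald Trump", "Kamala Harris",
                     "Donald Trump", "Kamala Harris",
                     "Donald Trump", "Kamala Harris"),
  pct            = c(48, 47, 46, 50, 49, 46),
  weight         = c(3, 3, 1, 1, 2, 2)
)

# Split the polls by state and candidate, then take the weighted mean
# of pct within each group.
groups <- split(polls, list(polls$state, polls$candidate_name), drop = TRUE)
weighted_avg <- sapply(groups, function(g) weighted.mean(g$pct, g$weight))

print(weighted_avg)
```

In this toy data the heavier-weighted Nevada polls pull each candidate's average toward their values: Trump's Nevada average is 47.5 rather than the unweighted 47, and Harris's is 47.75 rather than 48.5.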

Plot 2: Pollster Rank vs. Polling Percentage (2024)

  • Summary: This scatter plot examines the relationship between pollster rank and polling percentage in 2024. Each point represents a poll, with rank on the x-axis and pct on the y-axis.
  • Insight: The plot shows a wide spread of polling percentages across pollster ranks, with most polls concentrated at lower and mid-range ranks. Polling percentages remain consistent between 45% and 50% regardless of rank, suggesting that pollster rank has minimal impact on the polling outcome. Higher-ranked pollsters do not appear to systematically report higher or lower percentages, indicating that polling consistency is relatively unaffected by pollster rank.

Plot 3: Trump’s 2020 vs. 2024 Polling Percentages

  • Summary: This scatter plot with a regression line shows the relationship between Trump’s 2020 and 2024 polling percentages. Each point represents a poll, with pct_2020 on the x-axis and pct_2024 on the y-axis.
  • Insight: The positive slope of the regression line indicates a slight but consistent correlation between Trump’s polling percentages in 2020 and 2024. This suggests that Trump’s support base has remained stable, with regions or demographics that supported him in 2020 likely to continue their support in 2024. The clustering around 40–50% shows a concentration of steady support, while outliers above 50% are minimal, further emphasizing stability across the two election cycles.
## `geom_smooth()` using formula = 'y ~ x'

Discussion:

After looking at swing state polling, I had Harris and Trump splitting the swing states roughly evenly: Trump led in four and Harris in three, with Nevada possibly going either way. In the end, I was shocked by the results. While Nevada remained up in the air well after the election was called, Trump secured more than enough swing states early on to win, and he ultimately won every single swing state. This was not at all what I predicted. I assumed it would be a battle down to the last swing state to secure either candidate’s victory.

I think this result happened for a couple of different reasons. My analysis found that Trump’s support had either held steady or grown since the previous election, which could help explain his jump in numbers, but I still thought the race was very close given the other numbers that came back. What I think got overlooked is that Republican support often polls lower than it turns out to be, because people in the party can be very reluctant to answer polls. I also think the silent majority played a part. This is another group that is reluctant to answer polls or talk about their politics. It is a mix of both parties, and it is often hard to guess which way its members are leaning. At the same time, my numbers showed Trump’s polling holding steady, if not growing, since 2020. Who’s to say the same growth in support didn’t happen within the wider population, among people who simply didn’t answer polls?

Overall, I was definitely surprised by what occurred; I had certainly expected a closer race.

Resources:

  • Sites used for Data: Pollster Data | Polling Data

  • Swing State Info: NPR

  • ChatGPT: Major help with cleaning up my code and finding errors in it. It helped streamline my pipes using %in% and other functions, and it was helpful in formatting my text in markdown and rewording bits and pieces.