Thesis: The Project 538 prediction models overestimate the popular
vote support for third-party candidates, resulting in an underestimation
of Republican popular vote support and creating the false impression
that the race is less competitive than it truly is.
This analysis uses popular vote data from 2016 and 2020 to predict
the popular vote per candidate for the 2024 election. The data include
the election year, the candidate, the grade of the pollster, and the
popular vote prediction per pollster. If the Project 538 pollster data
overpredict the third-party candidate and underpredict the Republican and
Democratic candidates, I expect the actual popular vote to have fewer
votes for the third-party candidate and more votes for the Republican and
Democratic candidates. Proportionally, though, the Republican candidate
should gain more of the overpredicted third-party votes than the
Democratic candidate. However, if the predictions for the Republican,
Democratic, and third-party candidates are accurate, the Project 538
pollster data need no correction.
Data Description
Project 538 provided over thirty thousand rows of data from the 2016,
2020, and 2024 presidential elections. Twenty thousand rows related to
specific states were excluded.
Key variables included:
- Candidate: The presidential candidate, labeled as
Republican, Democrat, or other.
- Average Error: The actual percentage minus the
predicted percentage, averaged per candidate.
- Predicted Percentage: The percentage of the popular
vote per candidate that was predicted by the 538 pollster data.
- Expected Percentage: The percentage of the popular
vote per candidate that is expected based on the 538 pollster data
predictions.
- Actual Percentage: The percentage of the popular
vote per candidate that actually occurred for previous elections.
- Average Percentage: The average percentage of the
popular vote per candidate predicted with and without a model.
- Frequency: The number of times a value occurs.
- Residuals: The actual percentage result per
candidate minus the predicted percentage result per candidate.
Methods
Average Error in Popular Vote Predictions Per Candidate
In both 2016 and 2020, the popular vote was overpredicted for the
third-party, or “other”, candidate, while both the Republican and
Democratic candidates were underpredicted. Most of the overpredicted
third-party share came at the expense of the Republican candidate.
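The average error described above can be sketched as a mean of (actual minus predicted) per candidate group. This is a minimal illustration only: the function name `average_error` and the rows in `example_rows` are hypothetical placeholders, not the 538 data or the code used in this analysis.

```python
from collections import defaultdict

def average_error(rows):
    """Mean of (actual - predicted) per candidate group.

    rows: iterable of (candidate, predicted_pct, actual_pct) tuples.
    A negative mean indicates overprediction; a positive mean, underprediction.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for candidate, predicted, actual in rows:
        totals[candidate] += actual - predicted
        counts[candidate] += 1
    return {c: totals[c] / counts[c] for c in totals}

# Placeholder poll rows (NOT real 538 data), for illustration only.
example_rows = [
    ("republican", 44.0, 46.0),
    ("republican", 45.0, 46.0),
    ("democrat", 47.0, 48.0),
    ("democrat", 48.0, 48.0),
    ("other", 8.0, 6.0),
    ("other", 7.0, 6.0),
]
avg_err = average_error(example_rows)
```

With these placeholder rows, the "other" group has a negative average error (overpredicted) and the two major-party groups have positive average errors (underpredicted), which is the pattern this section describes.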

Expected Popular Vote Results Without Using a Regression Model
A simple model can be built from the average underprediction of the
Republican and Democratic candidates and the average overprediction of
the other candidates: adjusting the predicted 2024 popular vote per
candidate by these average errors gives an estimate of the actual 2024
popular vote per candidate.
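One way to read this adjustment is: expected percentage = predicted percentage + average error, where average error is actual minus predicted from the earlier elections. The sketch below assumes that reading; `expected_share` is a hypothetical helper, and every number in it is a placeholder rather than a value from the 538 data.

```python
def expected_share(predicted_2024, avg_error):
    """Shift each candidate's predicted share by that candidate's average error.

    avg_error is (actual - predicted) averaged over past elections, so a
    negative value lowers the expected share and a positive value raises it.
    """
    return {c: predicted_2024[c] + avg_error.get(c, 0.0) for c in predicted_2024}

# Placeholder values (NOT real 538 data), for illustration only.
avg_error = {"republican": 1.5, "democrat": 0.5, "other": -1.5}
predicted_2024 = {"republican": 45.0, "democrat": 47.0, "other": 8.0}

expected = expected_share(predicted_2024, avg_error)
```

Because each candidate is shifted independently, the adjusted shares are not guaranteed to sum to 100; whether to renormalize them is a separate modeling choice.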

Creating a Regression Model Based on 2016 and 2020 Data
The 2016 and 2020 popular vote data can be split into training and
testing datasets to fit a model that predicts the actual popular vote by
candidate from the predicted percentage of votes, the grade of the
pollster, and the candidate in the election.
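The split-and-fit step can be sketched as an ordinary least squares regression on three inputs: predicted percentage, pollster grade, and candidate. This is not the code used in this analysis. The numeric grade coding in `GRADE_SCORE`, the 80/20 split, the coefficients in `TRUE_BETA`, and the synthetic noiseless rows are all assumptions made only so the example runs on its own.

```python
import random

GRADE_SCORE = {"A": 3.0, "B": 2.0, "C": 1.0}  # hypothetical numeric coding of grades

def encode(candidate, grade, predicted):
    """Feature vector: intercept, predicted %, grade score, candidate dummies.

    "democrat" is the baseline level, so it has no dummy column.
    """
    return [1.0, predicted, GRADE_SCORE[grade],
            1.0 if candidate == "republican" else 0.0,
            1.0 if candidate == "other" else 0.0]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(features, targets):
    """Least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(features[0])
    xtx = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * y for f, y in zip(features, targets)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, feature):
    return sum(b * x for b, x in zip(beta, feature))

# Synthetic, noiseless rows (NOT real 538 data) generated from arbitrary coefficients.
rng = random.Random(0)
TRUE_BETA = [0.5, 0.9, 0.1, 1.0, -1.5]
rows = []
for _ in range(60):
    x = encode(rng.choice(["democrat", "republican", "other"]),
               rng.choice("ABC"), rng.uniform(0.0, 60.0))
    rows.append((x, predict(TRUE_BETA, x)))

# 80/20 train/test split after shuffling.
rng.shuffle(rows)
split = len(rows) * 4 // 5
train, test = rows[:split], rows[split:]

beta = fit_ols([x for x, _ in train], [y for _, y in train])
max_test_error = max(abs(predict(beta, x) - y) for x, y in test)
```

Because the synthetic targets contain no noise, the fit recovers the generating coefficients almost exactly; real polling data would not behave this way, and the sketch only checks that the mechanics of encoding, splitting, and fitting hold together.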

Results of the 2016/2020 Model on the Testing Data
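The values and histogram below show the results of applying the model
fitted on the training data to the testing data. The extremely high
R^2 value and the small residuals suggest that the model is overfitted
to the 2016 and 2020 data. Because the testing data come from the same
two elections as the training data, these values do not show how well
the model generalizes to a new election.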
[1] "Testing RMSE: 0.0075"
[1] "R-squared for testing: 0.998"
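For reference, the two reported metrics can be computed as in the sketch below. It uses the standard definitions of RMSE and R^2 and is not taken from the analysis code; the `actual` and `predicted` lists are placeholders, not the testing data.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """1 - SS_res / SS_tot, where SS_tot is measured around the mean of actual."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Placeholder values (NOT the testing data), for illustration only.
actual = [46.0, 48.0, 6.0]
predicted = [45.0, 48.5, 6.5]

test_rmse = rmse(actual, predicted)
test_r2 = r_squared(actual, predicted)
```

In this placeholder example R^2 is above 0.99 even though every prediction is off by half a point or more, because most of the variance in `actual` is the gap between major-party and third-party shares. A similar effect may contribute to the high R^2 reported above, which is worth keeping in mind when interpreting it.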

Applying the Model to 2024 Predicted Data
The output below does not match the hypothesized result. It shows
that the other candidate will receive more votes than expected and
that, proportionally, the Democratic candidate will gain more votes
relative to expectation than the Republican candidate.

Limitations
The original model exhibited overfitting, leading to unrealistic
predictions. A primary concern is that the model assumes the Democratic
candidate will win the popular vote in 2024, based on that party's
popular vote wins in 2016 and 2020. As a result, the model unjustly
deducts points from the Republican candidate’s expected outcomes. To
reduce this overfitting, it would help to include data from more
elections rather than relying solely on these two recent ones.
