Data Description

Project 538 provided over thirty thousand rows of polling data from the 2016, 2020, and 2024 presidential elections. Twenty thousand rows tied to individual states were excluded, leaving national-level data for modeling the popular vote.
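The exclusion step can be sketched with Python's standard `csv` module. This is a minimal sketch, not the report's actual code: the column names (`cycle`, `state`, `party`, `pct`), the assumption that national polls leave the `state` field blank, and the inline rows are all hypothetical stand-ins for the real 538 file.

```python
import csv
import io

# Hypothetical excerpt of a 538-style polls file; column names and values
# are illustrative assumptions, not the real dataset.
RAW = """cycle,state,party,pct
2016,,DEM,45.0
2016,Ohio,DEM,44.0
2020,,REP,46.0
2020,Texas,REP,50.0
2024,,DEM,48.0
2012,,DEM,49.0
"""

def national_rows(text, cycles=("2016", "2020", "2024")):
    """Keep rows from the chosen cycles whose state field is blank.

    Assumes (hypothetically) that a blank state marks a national poll.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader
            if row["cycle"] in cycles and not row["state"].strip()]

rows = national_rows(RAW)
for row in rows:
    print(row["cycle"], row["party"], row["pct"])
```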

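Key variables included each poll's predicted vote percentage for a candidate, the pollster's grade, the candidate, the election year, and the candidate's actual share of the popular vote.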

Methods

Creating a Regression Model Based on 2016 and 2020 Data

The 2016 and 2020 popular vote data were split into training and testing sets and used to fit a regression model that predicts each candidate's share of the popular vote from three predictors: the poll's predicted vote percentage, the pollster's grade, and the candidate.
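A model of this shape can be sketched as ordinary least squares on a random train/test split. This is a hypothetical illustration, not the report's code: it assumes a numeric pollster grade, encodes the candidate as a party indicator (`DEM`, `REP`, with `OTHER` as the baseline), and fits synthetic, noise-free rows so the recovered coefficients can be checked. The helper names (`design_row`, `fit_ols`, `train_test_split`) are invented for the sketch.

```python
import random

def design_row(pct, grade, candidate):
    """Intercept, poll share, pollster grade, and candidate indicators.

    "OTHER" is the baseline category, so it gets no indicator column.
    """
    return [1.0, pct, grade,
            1.0 if candidate == "DEM" else 0.0,
            1.0 if candidate == "REP" else 0.0]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, x):
    return sum(b * v for b, v in zip(beta, x))

def train_test_split(records, test_frac=0.25, seed=1):
    """Shuffle rows at random, then hold out the last test_frac of them."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]

# Synthetic rows: (poll share, pollster grade, candidate, actual share).
rng = random.Random(0)
TRUE_BETA = [0.05, 0.9, 0.004, 0.01, 0.0]
records = []
for i in range(60):
    cand = ("DEM", "REP", "OTHER")[i % 3]
    pct, grade = rng.uniform(0.02, 0.55), rng.uniform(0.5, 3.0)
    records.append((pct, grade, cand, predict(TRUE_BETA, design_row(pct, grade, cand))))

train, test = train_test_split(records)
beta = fit_ols([design_row(p, g, c) for p, g, c, _ in train], [y for *_, y in train])
test_preds = [predict(beta, design_row(p, g, c)) for p, g, c, _ in test]
print([round(b, 4) for b in beta])
```

Note that a random row split like this one puts rows from the same elections in both the training and testing sets.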

Results of the 2016/2020 Model on the Testing Data

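The values and histogram below show the results of applying the trained model to the testing data. The extremely high R² and small residuals suggest that the model is overfitted. Because the testing rows come from the same two elections as the training rows, and the actual popular vote takes only one value per candidate in each election, the model can largely reproduce known results instead of learning a relationship that generalizes to a new election.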

[1] "Testing RMSE: 0.0075"
[1] "R-squared for testing: 0.998"
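The two metrics reported above follow standard definitions: RMSE is the square root of the mean squared residual, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes both on a handful of hypothetical vote shares (not the report's data).

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error: sqrt(mean((y - y_hat)^2))."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical vote shares for two major candidates and one minor candidate
# across two polls; illustrative only.
actual = [0.48, 0.46, 0.06, 0.51, 0.47, 0.02]
predicted = [0.47, 0.47, 0.06, 0.50, 0.48, 0.02]
print(f"Testing RMSE: {rmse(actual, predicted):.4f}")
print(f"R-squared for testing: {r_squared(actual, predicted):.3f}")
```

In this toy example most of the total sum of squares comes from the gap between major and minor candidates, which a candidate indicator captures easily, so R² is high even though every major-candidate prediction is off by a full percentage point.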

Applying the Model to 2024 Predicted Data

The output below does not produce a plausible result. It predicts that the "other" candidate will receive more votes than expected and that, proportionally, the Democratic candidate will exceed expectations by a larger margin than the Republican candidate.
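How candidate-level effects learned from 2016 and 2020 carry into 2024 can be sketched by averaging the gap between the model's prediction and each poll's share, per candidate. The coefficient vector `BETA` and the 2024 poll rows are hypothetical values chosen for illustration, not the report's fitted model or data; the feature layout (intercept, poll share, grade, `DEM` and `REP` indicators) is likewise an assumption.

```python
from collections import defaultdict

# Hypothetical coefficients: intercept, poll share, pollster grade, DEM, REP.
BETA = [0.02, 0.95, 0.001, 0.015, -0.01]

def design_row(pct, grade, candidate):
    """Same hypothetical feature layout; "OTHER" is the baseline."""
    return [1.0, pct, grade,
            1.0 if candidate == "DEM" else 0.0,
            1.0 if candidate == "REP" else 0.0]

def predict(beta, x):
    return sum(b * v for b, v in zip(beta, x))

def mean_shift_by_candidate(beta, polls):
    """Average (model prediction - poll share) for each candidate."""
    diffs = defaultdict(list)
    for pct, grade, cand in polls:
        diffs[cand].append(predict(beta, design_row(pct, grade, cand)) - pct)
    return {c: sum(d) / len(d) for c, d in diffs.items()}

# Hypothetical 2024 national poll rows: (poll share, pollster grade, candidate).
polls_2024 = [(0.47, 2.5, "DEM"), (0.46, 2.5, "REP"), (0.03, 2.5, "OTHER"),
              (0.48, 1.8, "DEM"), (0.47, 1.8, "REP"), (0.02, 1.8, "OTHER")]
shifts = mean_shift_by_candidate(BETA, polls_2024)
for cand in sorted(shifts):
    print(f"{cand}: {shifts[cand]:+.4f}")
```

With these made-up coefficients, the positive intercept lifts the small "other" share, and the `DEM` and `REP` indicators push the two major candidates in opposite directions regardless of what the 2024 polls say.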

Limitations

The original model exhibited overfitting, which led to unrealistic predictions. A primary concern is that, because the Democratic candidate won the popular vote in both 2016 and 2020, the model effectively learned that the Democratic candidate wins the popular vote and carried that assumption into 2024. As a result, it systematically lowers the Republican candidate’s predicted share. Reducing this overfitting would likely require training on data from more elections rather than relying solely on these two recent ones.
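One way to check whether additional elections help, offered here as an assumption rather than the report's method, is to evaluate the model on an entire election it never saw during training. The sketch below generates leave-one-election-out splits; the record schema (a dict with a `cycle` key) and the 2008 and 2012 cycles are hypothetical placeholders.

```python
def leave_one_cycle_out(records):
    """Yield (held_out_cycle, train, test), holding out each election in turn.

    Each record is assumed to be a dict with a "cycle" key (hypothetical schema).
    """
    cycles = sorted({r["cycle"] for r in records})
    for held_out in cycles:
        train = [r for r in records if r["cycle"] != held_out]
        test = [r for r in records if r["cycle"] == held_out]
        yield held_out, train, test

# Hypothetical records: one row per party per election.
records = [{"cycle": c, "party": p}
           for c in (2008, 2012, 2016, 2020) for p in ("DEM", "REP")]
folds = list(leave_one_cycle_out(records))
for cycle, train, test in folds:
    print(cycle, len(train), len(test))
```

Unlike a random row split, each test set here contains only an election absent from training, so the reported error reflects how the model would fare on a new cycle such as 2024.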

References

Sources included:

