Overview
FiveThirtyEight’s 2018 House forecast estimates each party’s chance of winning control of the U.S. House of Representatives in 2018. The national forecast file has one row per forecast date, party, and model version, with variables such as win probability, mean and median seats, and popular vote margin.
Data Dictionary
forecastdate - date on which the forecast was made
state - geography of the forecast; every row in this national file is ‘US’
party - party of the forecast row, Democratic (D) or Republican (R)
model - model version used: classic, deluxe, or lite
win_probability(TARGET) - forecast probability that the party wins a majority of House seats
mean_seats - mean forecast number of seats for the party
median_seats - median forecast number of seats for the party
p10_seats - 10th percentile of the forecast number of seats for the party
p90_seats - 90th percentile of the forecast number of seats for the party
margin - forecast national popular vote margin, in percentage points
p10_margin - 10th percentile of the forecast popular vote margin
p90_margin - 90th percentile of the forecast popular vote margin
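To make the seat summary columns concrete, the sketch below computes a mean, median, 10th percentile, and 90th percentile from a set of simulated seat counts. It assumes that the p10 and p90 columns are percentiles of the forecast's seat distribution. The simulated values (`sim_seats`, drawn from a normal distribution with an arbitrary mean and spread) are hypothetical and are not FiveThirtyEight's simulations.

```r
# Hypothetical seat counts for one party; NOT FiveThirtyEight's simulations.
set.seed(1)
sim_seats <- round(rnorm(10000, mean = 231, sd = 17))

# Summaries analogous to mean_seats, median_seats, p10_seats, and p90_seats
summary_row <- c(
  mean_seats   = mean(sim_seats),
  median_seats = median(sim_seats),
  p10_seats    = unname(quantile(sim_seats, 0.10)),
  p90_seats    = unname(quantile(sim_seats, 0.90))
)
print(summary_row)
```

Under this reading, the party is forecast to win between p10_seats and p90_seats in roughly 80% of simulated outcomes.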
Load Data
The CSV file includes a header row, so ‘header’ is set to TRUE when reading it. The target for this dataset is the ‘win_probability’ column.
path = 'https://raw.githubusercontent.com/AlphaCurse/DATA607/main/house_national_forecast.csv'
house = read.table(file=path, header=TRUE, sep=',')
df = data.frame(house)
head(df)
## forecastdate state party model win_probability mean_seats median_seats
## 1 2018-08-01 US D classic 0.7719 231.37 230
## 2 2018-08-01 US R classic 0.2281 203.63 205
## 3 2018-08-02 US D classic 0.7431 229.86 228
## 4 2018-08-02 US R classic 0.2569 205.14 207
## 5 2018-08-03 US D classic 0.7440 229.83 228
## 6 2018-08-03 US R classic 0.2560 205.17 207
## p10_seats p90_seats margin p10_margin p90_margin
## 1 210 255 7.84 3.53 12.26
## 2 180 225 -7.84 -3.53 -12.26
## 3 209 254 7.51 3.24 12.01
## 4 181 226 -7.51 -3.24 -12.01
## 5 209 253 7.52 3.27 11.95
## 6 182 226 -7.52 -3.27 -11.95
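As a quick sanity check after loading, one might confirm the column types and that the two parties' win probabilities sum to 1 on each date. The sketch below does this on a four-row sample copied from the output above, read from an inline string (`csv_text`, a hypothetical name) with `read.csv(text = ...)` so it runs without network access. It keeps only five of the twelve columns for brevity.

```r
# Four rows copied from the head(df) output; only five columns are kept.
csv_text <- "forecastdate,state,party,model,win_probability
2018-08-01,US,D,classic,0.7719
2018-08-01,US,R,classic,0.2281
2018-08-02,US,D,classic,0.7431
2018-08-02,US,R,classic,0.2569"

# read.csv is read.table with header = TRUE and sep = ',' as defaults
sample_df <- read.csv(text = csv_text, header = TRUE)
str(sample_df)

# For each forecast date, the D and R probabilities should sum to 1
totals <- tapply(sample_df$win_probability, sample_df$forecastdate, sum)
print(totals)
```

On the full dataset the same check would need to group by both ‘forecastdate’ and ‘model’, since each model version has its own pair of rows.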
Subset Data
Drop Column
Every row has the same ‘state’ value (US), so that column carries no information, and ‘forecastdate’ is not needed for the party-level summary in the conclusions. Both columns are dropped from the dataset.
edit_df = subset(df, select = -c(state, forecastdate))
head(edit_df)
## party model win_probability mean_seats median_seats p10_seats p90_seats
## 1 D classic 0.7719 231.37 230 210 255
## 2 R classic 0.2281 203.63 205 180 225
## 3 D classic 0.7431 229.86 228 209 254
## 4 R classic 0.2569 205.14 207 181 226
## 5 D classic 0.7440 229.83 228 209 253
## 6 R classic 0.2560 205.17 207 182 226
## margin p10_margin p90_margin
## 1 7.84 3.53 12.26
## 2 -7.84 -3.53 -12.26
## 3 7.51 3.24 12.01
## 4 -7.51 -3.24 -12.01
## 5 7.52 3.27 11.95
## 6 -7.52 -3.27 -11.95
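The claim that a column holds a single value can be checked before dropping it. The following is a minimal sketch on a small sample built from rows shown above. The names `sample_df`, `n_unique`, and `constant_cols` are illustrative and not part of the original analysis.

```r
# Small sample built from rows shown in the output above
sample_df <- data.frame(
  forecastdate    = c("2018-08-01", "2018-08-01", "2018-08-02", "2018-08-02"),
  state           = c("US", "US", "US", "US"),
  party           = c("D", "R", "D", "R"),
  win_probability = c(0.7719, 0.2281, 0.7431, 0.2569)
)

# Count distinct values per column; a count of 1 means the column is constant
n_unique <- sapply(sample_df, function(col) length(unique(col)))
constant_cols <- names(n_unique)[n_unique == 1]
print(constant_cols)

# Drop only the constant columns
trimmed <- sample_df[, !(names(sample_df) %in% constant_cols), drop = FALSE]
print(names(trimmed))
```

This check only finds ‘state’. Dropping ‘forecastdate’ remains a judgment call, since the date does vary from row to row.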
Sort by Column
To make the strongest forecasts easy to inspect, the dataframe is sorted in descending order of the ‘win_probability’ column, so the rows with the highest win probability come first.
sort_df = edit_df[order(-edit_df$win_probability),]
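The sort above negates the column inside `order()` to get descending order. A sketch of an equivalent form, using the `decreasing = TRUE` argument of `order()`, is shown below on a small sample taken from the rows printed earlier. The sample and the variable names are illustrative only.

```r
# Small sample taken from the rows printed earlier
sample_df <- data.frame(
  party           = c("D", "R", "D", "R"),
  win_probability = c(0.7719, 0.2281, 0.7431, 0.2569)
)

# Descending sort by negating the numeric column (the approach used above)
by_negation <- sample_df[order(-sample_df$win_probability), ]

# Descending sort with the decreasing argument; also works for character columns
by_flag <- sample_df[order(sample_df$win_probability, decreasing = TRUE), ]

print(by_negation)
print(identical(by_negation, by_flag))
```

Negation only works for numeric columns, so the `decreasing` form may be the safer habit when the sort key could be a character column.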
Conclusions
To compare the two parties, ‘win_probability’ is averaged separately over the Democratic and Republican rows, across all forecast dates and model versions. On average, the Democratic party has the higher probability of winning a majority of seats in the House of Representatives: 74.80%, versus 25.20% for the Republican party. The 10 highest forecasts in the sorted data all belong to the Democratic party, with win probabilities ranging from 85.23% to 87.92%.
aggregate(sort_df$win_probability, list(sort_df$party), FUN=mean)
## Group.1 x
## 1 D 0.7479548
## 2 R 0.2520456
head(sort_df, 10)
## party model win_probability mean_seats median_seats p10_seats p90_seats
## 195 D classic 0.8792 234.35 233 216 254
## 193 D classic 0.8759 234.06 233 216 254
## 165 D classic 0.8647 235.22 234 215 257
## 179 D classic 0.8637 234.69 233 215 256
## 181 D classic 0.8613 234.25 233 215 255
## 189 D classic 0.8593 233.65 232 215 254
## 391 D deluxe 0.8578 231.40 231 215 249
## 191 D classic 0.8575 233.25 232 215 253
## 389 D deluxe 0.8551 231.16 230 215 248
## 167 D classic 0.8523 234.48 233 214 257
## margin p10_margin p90_margin
## 195 9.15 5.75 12.53
## 193 9.08 5.69 12.45
## 165 9.01 5.25 12.82
## 179 8.98 5.35 12.61
## 181 8.98 5.39 12.57
## 189 8.89 5.68 12.62
## 391 8.81 5.41 12.18
## 191 8.95 5.48 12.37
## 389 8.73 5.35 12.09
## 167 8.95 5.19 12.65
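The per-party average can also be written with the formula interface of `aggregate()`, which keeps the original column names instead of ‘Group.1’ and ‘x’. The sketch below uses only the six rows shown in the first output of this report, so its averages differ from the full-data figures above. The `win_pct` column is an illustrative addition.

```r
# Six rows from the first head(df) output; the full dataset has many more
sample_df <- data.frame(
  party           = c("D", "R", "D", "R", "D", "R"),
  win_probability = c(0.7719, 0.2281, 0.7431, 0.2569, 0.7440, 0.2560)
)

# Formula interface: average win_probability within each party
party_means <- aggregate(win_probability ~ party, data = sample_df, FUN = mean)

# Express the averages as percentages rounded to two decimals
party_means$win_pct <- round(100 * party_means$win_probability, 2)
print(party_means)
```

Adding ‘model’ to the right-hand side of the formula (`win_probability ~ party + model`) would give one average per party and model version on the full dataset.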