Overview

FiveThirtyEight’s House Forecast was developed to predict each party’s chance of winning control of the U.S. House of Representatives in 2018. The data is separated into multiple variables, such as party, win probability, mean and median seats, and popular vote margin.

Data Dictionary

forecastdate - date on which the forecast was made

state - geographic unit of the forecast (every row in this file is the national value, US)

party - party the forecast applies to, Democrat (D) or Republican (R)

model - model version used: classic, deluxe, or lite

win_probability(TARGET) - forecast probability that the party wins a majority of House seats

mean_seats - mean number of seats forecast for the party

median_seats - median number of seats forecast for the party

p10_seats - 10th percentile of the party's forecast seat count

p90_seats - 90th percentile of the party's forecast seat count

margin - forecast popular vote margin, in percentage points

p10_margin - 10th percentile of the forecast popular vote margin

p90_margin - 90th percentile of the forecast popular vote margin
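
The p10 and p90 columns are easiest to read as percentiles of a distribution of simulated outcomes. The sketch below illustrates that reading with made-up seat counts: the normal distribution, its mean of 231, and its standard deviation of 17 are hypothetical values chosen only for illustration, and this is not FiveThirtyEight's model.

```r
# Hypothetical simulated seat counts; the mean and sd are illustrative assumptions.
set.seed(1)
sim_seats <- round(rnorm(10000, mean = 231, sd = 17))

# Summaries analogous to mean_seats, median_seats, p10_seats, and p90_seats.
seat_summary <- c(
  mean   = mean(sim_seats),
  median = median(sim_seats),
  p10    = unname(quantile(sim_seats, 0.10)),
  p90    = unname(quantile(sim_seats, 0.90))
)
print(seat_summary)
```

Read this way, p10_seats and p90_seats bracket the middle 80% of outcomes rather than describing a share of the seats in the House.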

Load Data

The file includes a header row, so header is set to TRUE when reading it. The target for this dataset is the ‘win_probability’ column.

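# Location of the forecast CSV
path = 'https://raw.githubusercontent.com/AlphaCurse/DATA607/main/house_national_forecast.csv'
# Read the comma-separated file, using the first row as column names
house = read.table(file=path, header=TRUE, sep=',')
df = data.frame(house)
# Preview the first six rows
head(df)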
##   forecastdate state party   model win_probability mean_seats median_seats
## 1   2018-08-01    US     D classic          0.7719     231.37          230
## 2   2018-08-01    US     R classic          0.2281     203.63          205
## 3   2018-08-02    US     D classic          0.7431     229.86          228
## 4   2018-08-02    US     R classic          0.2569     205.14          207
## 5   2018-08-03    US     D classic          0.7440     229.83          228
## 6   2018-08-03    US     R classic          0.2560     205.17          207
##   p10_seats p90_seats margin p10_margin p90_margin
## 1       210       255   7.84       3.53      12.26
## 2       180       225  -7.84      -3.53     -12.26
## 3       209       254   7.51       3.24      12.01
## 4       181       226  -7.51      -3.24     -12.01
## 5       209       253   7.52       3.27      11.95
## 6       182       226  -7.52      -3.27     -11.95
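
As a minimal offline sketch of the same loading step, the example below reads a few of the rows shown above from an inline string instead of the URL. The names `csv_text` and `sample_df` are hypothetical, and only five of the twelve columns are included to keep it short.

```r
# Inline copy of a few rows from the output above, so the example runs offline.
csv_text <- "forecastdate,state,party,model,win_probability
2018-08-01,US,D,classic,0.7719
2018-08-01,US,R,classic,0.2281
2018-08-02,US,D,classic,0.7431
2018-08-02,US,R,classic,0.2569"

# header = TRUE tells R to use the first line as column names.
sample_df <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Parse the date column so it can be sorted or plotted as a date.
sample_df$forecastdate <- as.Date(sample_df$forecastdate)
str(sample_df)
```

Converting forecastdate to a Date is optional here, but it helps if the forecasts are later examined over time.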

Subset Data

Drop Column

Since every value in the ‘state’ column is the same (US) and the ‘forecastdate’ column does not contribute much toward the probability, I see no value in keeping either column in the dataset.

edit_df = subset(df, select = -c(state, forecastdate))
head(edit_df)
##   party   model win_probability mean_seats median_seats p10_seats p90_seats
## 1     D classic          0.7719     231.37          230       210       255
## 2     R classic          0.2281     203.63          205       180       225
## 3     D classic          0.7431     229.86          228       209       254
## 4     R classic          0.2569     205.14          207       181       226
## 5     D classic          0.7440     229.83          228       209       253
## 6     R classic          0.2560     205.17          207       182       226
##   margin p10_margin p90_margin
## 1   7.84       3.53      12.26
## 2  -7.84      -3.53     -12.26
## 3   7.51       3.24      12.01
## 4  -7.51      -3.24     -12.01
## 5   7.52       3.27      11.95
## 6  -7.52      -3.27     -11.95
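
The claim that a column holds a single value can be checked before dropping it. The sketch below does this on a small stand-in data frame built from rows shown above; `sample_df`, `n_distinct`, `constant_cols`, and `trimmed_df` are hypothetical names used only for this example.

```r
# Small stand-in for df, built from rows shown above.
sample_df <- data.frame(
  forecastdate = c("2018-08-01", "2018-08-01", "2018-08-02", "2018-08-02"),
  state = "US",
  party = c("D", "R", "D", "R"),
  win_probability = c(0.7719, 0.2281, 0.7431, 0.2569)
)

# Count distinct values per column; a count of 1 means the column is constant.
n_distinct <- sapply(sample_df, function(col) length(unique(col)))
constant_cols <- names(n_distinct)[n_distinct == 1]
print(constant_cols)

# Drop the constant columns by name.
trimmed_df <- sample_df[, !(names(sample_df) %in% constant_cols), drop = FALSE]
```

A constant column carries no information about the target, which is the reason given above for removing ‘state’.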

Sort by Column

To make the highest forecasts easier to inspect, the data frame is sorted by the ‘win_probability’ column in descending order.

sort_df = edit_df[order(-edit_df$win_probability),]
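
The descending sort above negates the column inside order(). As a small sketch on a hypothetical four-row data frame, the same ordering can also be requested with the `decreasing` argument, which reads more explicitly and also works for non-numeric columns.

```r
# Hypothetical four-row sample taken from the rows shown earlier.
sample_df <- data.frame(
  party = c("D", "R", "D", "R"),
  win_probability = c(0.7719, 0.2281, 0.7431, 0.2569)
)

# order() returns row indices; negating the column sorts from high to low.
by_negation <- sample_df[order(-sample_df$win_probability), ]

# Equivalent ordering using the decreasing argument.
by_decreasing <- sample_df[order(sample_df$win_probability, decreasing = TRUE), ]

print(by_decreasing)
```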

Conclusions

To judge which party is more likely to win the House, the win probability is averaged separately over the Democratic and Republican rows. From the data, the Democratic party has the higher probability of winning a majority of seats in the House of Representatives. The average win probability is 74.80% for Democrats and 25.20% for Republicans. The 10 highest win probabilities in the dataset all belong to Democratic forecasts, ranging from 85.23% to 87.92%.

aggregate(sort_df$win_probability, list(sort_df$party), FUN=mean)
##   Group.1         x
## 1       D 0.7479548
## 2       R 0.2520456
head(sort_df, 10)
##     party   model win_probability mean_seats median_seats p10_seats p90_seats
## 195     D classic          0.8792     234.35          233       216       254
## 193     D classic          0.8759     234.06          233       216       254
## 165     D classic          0.8647     235.22          234       215       257
## 179     D classic          0.8637     234.69          233       215       256
## 181     D classic          0.8613     234.25          233       215       255
## 189     D classic          0.8593     233.65          232       215       254
## 391     D  deluxe          0.8578     231.40          231       215       249
## 191     D classic          0.8575     233.25          232       215       253
## 389     D  deluxe          0.8551     231.16          230       215       248
## 167     D classic          0.8523     234.48          233       214       257
##     margin p10_margin p90_margin
## 195   9.15       5.75      12.53
## 193   9.08       5.69      12.45
## 165   9.01       5.25      12.82
## 179   8.98       5.35      12.61
## 181   8.98       5.39      12.57
## 189   8.89       5.68      12.62
## 391   8.81       5.41      12.18
## 191   8.95       5.48      12.37
## 389   8.73       5.35      12.09
## 167   8.95       5.19      12.65
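
The two party averages above add up to roughly 1 because the Democratic and Republican probabilities for each forecast date are complements. The sketch below checks this on the six rows shown in the first preview, and uses the formula interface of aggregate() so the result has named columns instead of Group.1 and x. The names `sample_df`, `party_means`, and `date_totals` are hypothetical, and the averages from this small sample differ from the full-data averages.

```r
# Six rows from the first preview (classic model only).
sample_df <- data.frame(
  forecastdate = rep(c("2018-08-01", "2018-08-02", "2018-08-03"), each = 2),
  party = rep(c("D", "R"), times = 3),
  win_probability = c(0.7719, 0.2281, 0.7431, 0.2569, 0.7440, 0.2560)
)

# Formula interface: mean win probability per party, with named columns.
party_means <- aggregate(win_probability ~ party, data = sample_df, FUN = mean)
print(party_means)

# Per-date totals: the D and R probabilities should sum to 1.
date_totals <- tapply(sample_df$win_probability, sample_df$forecastdate, sum)
print(date_totals)
```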