The following libraries are required:
library(ggplot2)
library(mosaic)
library(dplyr)
library(mosaicData)
library(tidyr)
library(tidyverse)
library(ggiraphExtra)
library(sf)
library(knitr)
library(patchwork)
library(scales)
The 2022 midterm elections play a pivotal role in the success of the sitting president's term. Midterms have been held since the formation of the United States, and when control of the House shifts away from the party that currently holds it, this is often read as a key indicator that the current administration is not fulfilling the promises it made during the presidential election. Other events between election cycles can also swing voters from one direction to the other, as many past midterm elections have shown. The data I have gathered come from the FiveThirtyEight data set at the following link: https://projects.fivethirtyeight.com/2022-election-forecast/.
The Democratic Party has controlled both the House and the Senate since the 2020 elections, but the forecasts indicated that the Democrats were at risk of losing control of the House and at slight risk of losing control of the Senate. Less widely followed are the elections for governor held in certain states, where the Republicans have controlled more states than the Democrats since 2010.
For this assignment I will focus mostly on the House of Representatives, since we know the Republicans won the majority in the 2022 midterm elections, and I will use the Senate races to show how the forecasts changed.
The following files are loaded:
house1 <- read.csv('election-forecasts-2022/house_district_toplines_2022.csv')
house2 <- read.csv('election-forecasts-2022/house_national_toplines_2022.csv')
house3 <- read.csv('election-forecasts-2022/house_seat_distribution_2022.csv')
house4 <- read.csv('election-forecasts-2022/house_steps_2022.csv')
senate1 <- read.csv('election-forecasts-2022/senate_state_toplines_2022.csv')
senate2 <- read.csv('election-forecasts-2022/senate_national_toplines_2022.csv')
senate3 <- read.csv('election-forecasts-2022/senate_seat_distribution_2022.csv')
senate4 <- read.csv('election-forecasts-2022/senate_steps_2022.csv')
The House projections for the 2022 elections come from the following link: https://projects.fivethirtyeight.com/2022-election-forecast/house/
Of the House datasets loaded above, we will use the forecasts in the file “house_national_toplines_2022.csv”, which covers June 1st, 2022 through November 8th, 2022.
I will also filter the data to keep only the rows whose expression contains “_deluxe”, and reduce the number of columns to make the data ‘cleaner’.
filtered_data <- house2 %>%
filter(grepl("_deluxe", expression))
head(filtered_data[, c("forecastdate" , "chamber_Dparty", "chamber_Rparty", "mean_seats_Dparty", "mean_seats_Rparty", "median_seats_Dparty", "median_seats_Rparty", "total_national_turnout")])
## forecastdate chamber_Dparty chamber_Rparty mean_seats_Dparty
## 1 11/8/22 0.160200 0.839800 205.2352
## 2 11/7/22 0.159875 0.840125 205.2016
## 3 11/6/22 0.175025 0.824975 205.6574
## 4 11/5/22 0.158225 0.841775 204.6030
## 5 11/4/22 0.155600 0.844400 204.4902
## 6 11/3/22 0.149025 0.850975 204.1487
## mean_seats_Rparty median_seats_Dparty median_seats_Rparty
## 1 229.7647 206 229
## 2 229.7984 206 229
## 3 229.3426 207 228
## 4 230.3969 205 230
## 5 230.5098 205 230
## 6 230.8513 205 230
## total_national_turnout
## 1 1.03e+08
## 2 1.03e+08
## 3 1.02e+08
## 4 1.02e+08
## 5 1.02e+08
## 6 1.02e+08
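As a minimal sketch of the filtering step above, the snippet below applies the same grepl() test to a small made-up data frame. The toy values, and the expression labels other than “_deluxe”, are assumptions for illustration and are not rows from the FiveThirtyEight files. Base R subsetting is used in place of dplyr's filter() so the example runs without extra packages.

```r
# Toy stand-in for house2; every value here is made up for illustration.
toy <- data.frame(
  forecastdate      = c("11/8/22", "11/8/22", "11/8/22", "11/7/22"),
  expression        = c("_lite", "_classic", "_deluxe", "_deluxe"),
  mean_seats_Rparty = c(228, 229, 230, 231),
  stringsAsFactors  = FALSE
)

# grepl() returns TRUE for rows whose expression contains "_deluxe".
keep <- grepl("_deluxe", toy$expression, fixed = TRUE)

# Keep the matching rows and only the columns of interest.
toy_deluxe <- toy[keep, c("forecastdate", "mean_seats_Rparty")]

print(table(toy$expression))  # rows contributed by each model version
print(toy_deluxe)
```

Checking table() on the expression column before filtering is a quick way to confirm which model versions are present and that the filter is not silently dropping everything.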
After filtering the data, we will generate a plot of the projected number of seats each party will ‘control’ from June to November. I will use the mean_seats_‘x’party variables for better visibility.
Note: ‘x’ is either D (Democratic) or R (Republican).
filtered_data_subset <- filtered_data[, c("forecastdate" , "chamber_Dparty", "chamber_Rparty", "mean_seats_Dparty", "mean_seats_Rparty", "median_seats_Dparty", "median_seats_Rparty", "total_national_turnout")]
ggplot(filtered_data_subset, aes(x = as.Date(forecastdate, format="%m/%d/%y"))) +
geom_line(aes(y = mean_seats_Rparty, color = "Republican Party"), linewidth = 1) +
geom_line(aes(y = mean_seats_Dparty, color = "Democratic Party"), linewidth = 1) +
scale_color_manual(values = c("Republican Party" = "red", "Democratic Party" = "blue")) +
labs(title = "Projected Number of Seats Each Party Will Control Over Time",
x = "Date",
y = "Number of Seats",
color = "Party") +
theme_minimal()+
theme(plot.title = element_text(hjust = 0.5))
As the plot above shows, the number of seats projected to be controlled by the Republican Party decreased from June to October, then increased from October through November. The House forecast page linked above cites the Supreme Court decision as a key factor in this decrease, and shifts of this kind are common in the majority of midterm elections. Even so, the Republican Party held a large lead in the race for control of the House when the forecasts began around June 2022. The projected voter turnout also increased after the Supreme Court decision, as shown below:
filtered_data_subset$forecastdate <- as.Date(filtered_data_subset$forecastdate, format = "%m/%d/%y")
filtered_data_subset <- filtered_data_subset %>% arrange(forecastdate)
ggplot(filtered_data_subset, aes(x = forecastdate, y = total_national_turnout)) +
geom_line(color = "blue", linewidth = 1) +
geom_smooth(method = "loess", color = "green", se = FALSE, linewidth = 1.5) +
labs(title = "National Turnout Over Time",
x = "Date",
y = "Total National Turnout") +
scale_x_date(date_labels = "%b", date_breaks = "1 month") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
axis.title.x = element_text(size = 14),
axis.title.y = element_text(size = 14),
axis.text.x = element_text(angle = 45, hjust = 1))
## `geom_smooth()` using formula = 'y ~ x'
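The turnout code above converts forecastdate to a Date before calling arrange(). A short sketch of why that order matters is given below, using three hypothetical date strings written in the same month/day/two-digit-year style as the files.

```r
# Hypothetical date strings in the same "%m/%d/%y" style as forecastdate.
raw_dates <- c("11/8/22", "6/1/22", "10/15/22")

# Parse the strings: %m = month, %d = day, %y = two-digit year.
parsed <- as.Date(raw_dates, format = "%m/%d/%y")

# Sorting Date values gives chronological order.
sorted_dates <- sort(parsed)
print(sorted_dates)

# Sorting the raw strings compares them character by character,
# which is not chronological order.
print(sort(raw_dates))
```

If the column were sorted while still stored as text, geom_line() would still plot correctly once the x aesthetic is converted, but any row-order logic (for example, taking the first or last forecast) would be unreliable.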
As with the House projections, the Senate projections for the 2022 elections come from the following link: https://projects.fivethirtyeight.com/2022-election-forecast/senate/
Of the Senate datasets loaded above, we will use the forecasts in the file “senate_national_toplines_2022.csv”, which covers June 1st, 2022 through November 8th, 2022.
As before, I will filter the data to keep only the rows whose expression contains “_deluxe”, and reduce the number of columns to make the data ‘cleaner’.
filtered_data1 <- senate2 %>%
filter(grepl("_deluxe", expression))
head(filtered_data1[, c("forecastdate" , "chamber_Dparty", "chamber_Rparty", "mean_seats_Dparty", "mean_seats_Rparty", "median_seats_Dparty", "median_seats_Rparty", "total_national_turnout")])
## forecastdate chamber_Dparty chamber_Rparty mean_seats_Dparty
## 1 11/8/22 0.414825 0.585175 49.11835
## 2 11/7/22 0.414000 0.586000 49.11675
## 3 11/6/22 0.457700 0.542300 49.32322
## 4 11/5/22 0.450350 0.549650 49.26793
## 5 11/4/22 0.448300 0.551700 49.26505
## 6 11/3/22 0.453675 0.546325 49.30110
## mean_seats_Rparty median_seats_Dparty median_seats_Rparty
## 1 50.88165 49 51
## 2 50.88325 49 51
## 3 50.67678 49 51
## 4 50.73207 49 51
## 5 50.73495 49 51
## 6 50.69890 49 51
## total_national_turnout
## 1 78300000
## 2 78300000
## 3 78000000
## 4 78000000
## 5 78000000
## 6 78000000
As with the House plot, we will use the Senate data to show the changes over time. I will again use the mean_seats_‘x’party variables for better visibility.
Note: ‘x’ is either D (Democratic) or R (Republican).
filtered_data_subset1 <- filtered_data1[, c("forecastdate" , "chamber_Dparty", "chamber_Rparty", "mean_seats_Dparty", "mean_seats_Rparty", "median_seats_Dparty", "median_seats_Rparty", "total_national_turnout")]
ggplot(filtered_data_subset1, aes(x = as.Date(forecastdate, format="%m/%d/%y"))) +
geom_line(aes(y = mean_seats_Rparty, color = "Republican Party"), linewidth = 1) +
geom_line(aes(y = mean_seats_Dparty, color = "Democratic Party"), linewidth = 1) +
scale_color_manual(values = c("Republican Party" = "red", "Democratic Party" = "blue")) +
labs(title = "Projected Number of Seats Each Party Will Control Over Time",
x = "Date",
y = "Number of Seats",
color = "Party") +
theme_minimal()+
theme(plot.title = element_text(hjust = 0.5))
In the Senate projections over time, the Democrats held the projected majority from August to mid-October but lost their momentum by election day, which was attributed to events between August 2022 and November 2022. As with the House trends, the Republicans' projected seats decreased from June to August because of the Supreme Court decision mentioned at the link. This led to the Democrats being forecast to keep control of the Senate in the 2022 elections, but around October the forecasts started to shift toward the Republicans once again. The projected national turnout for the Senate follows a trend similar to the one seen for the House of Representatives.
filtered_data_subset1$forecastdate <- as.Date(filtered_data_subset1$forecastdate, format = "%m/%d/%y")
filtered_data_subset1 <- filtered_data_subset1 %>% arrange(forecastdate)
ggplot(filtered_data_subset1, aes(x = forecastdate, y = total_national_turnout)) +
geom_line(color = "red", linewidth = 1) +
geom_smooth(method = "loess", color = "purple", se = FALSE, linewidth = 1.5) +
labs(title = "National Turnout Over Time",
x = "Date",
y = "Total National Turnout") +
scale_x_date(date_labels = "%b", date_breaks = "1 month") +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
axis.title.x = element_text(size = 14),
axis.title.y = element_text(size = 14),
axis.text.x = element_text(angle = 45, hjust = 1))
## `geom_smooth()` using formula = 'y ~ x'
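The lead changes read off the Senate seat plot earlier can also be located directly in the data. The sketch below flags the dates on which the projected leader flips, using a made-up monthly series. The numbers are illustrative assumptions and not values from senate_national_toplines_2022.csv; only the column names follow the file.

```r
# Made-up monthly series; column names mirror the FiveThirtyEight file.
toy <- data.frame(
  forecastdate = as.Date(c("2022-06-01", "2022-07-01", "2022-08-01",
                           "2022-09-01", "2022-10-01", "2022-11-01")),
  mean_seats_Dparty = c(49.2, 49.6, 50.3, 50.8, 50.4, 49.3),
  mean_seats_Rparty = c(50.8, 50.4, 49.7, 49.2, 49.6, 50.7)
)

# TRUE on dates where the Democrats have the higher projected mean.
toy$d_leads <- toy$mean_seats_Dparty > toy$mean_seats_Rparty

# A crossover occurs wherever the leader differs from the previous row.
crossed <- c(FALSE, diff(toy$d_leads) != 0)
crossover_dates <- toy$forecastdate[crossed]

print(crossover_dates)
```

Applied to filtered_data_subset1 after it has been sorted by date, the same two lines would give the dates on which the projected Senate majority changed hands.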
Working with the data I obtained from FiveThirtyEight, the main issue I noticed was that the final Senate forecast expected the Republicans to gain control of the Senate. That did not occur: the Democrats gained a seat in Pennsylvania in 2022. Given this result, further analysis is needed to determine whether any anomalies or biases contributed to the forecast being incorrect. Because the Senate race was close, there was considerable room for error, which helps explain why the forecast missed. Another improvement would be for the forecasts to account for the Vice President's tie-breaking vote in case of a 50-50 tie, which could increase their accuracy.
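The tie-breaking rule can be written down as a small function. This is a sketch under stated assumptions: senate_control() is a hypothetical helper and not part of the FiveThirtyEight data or any package, it assumes all 100 seats are assigned to one of the two parties (independents counted with the party they caucus with), and it defaults the Vice President's party to "D" as in 2022.

```r
# Hypothetical helper: which party controls a 100-seat Senate?
# vp_party breaks an even split; "D" matches the 2022 Vice President.
senate_control <- function(d_seats, r_seats, vp_party = "D") {
  stopifnot(d_seats + r_seats == 100)
  if (d_seats > r_seats) return("D")
  if (r_seats > d_seats) return("R")
  vp_party  # 50-50 split: the Vice President's party holds control
}

print(senate_control(49, 51))  # Republicans hold the majority
print(senate_control(50, 50))  # tie goes to the Vice President's party
print(senate_control(51, 49))  # Democrats hold the majority
```

Applying a rule like this to each simulated seat count, rather than to the mean, is what turns a seat distribution into a probability of control.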
The House of Representatives results showed a gain of only 9 seats for the Republicans, for a total of 222 seats, whereas the Democrats won 213 seats. The forecast did accurately predict that the Republicans would regain control of the House, but its final mean of about 230 Republican seats was off by roughly 8 seats, which may indicate some margin of error or external variables that led to this discrepancy.
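The size of the House miss can be checked with a few lines of arithmetic. The forecast values below are the 11/8/22 deluxe means printed in the House table earlier, and the actual totals are the 222 and 213 seats quoted above.

```r
# Final deluxe forecast means (11/8/22 row of the House table above).
forecast <- c(R = 229.7647, D = 205.2352)

# Actual seat totals quoted in the text.
actual <- c(R = 222, D = 213)

# Positive values mean the forecast overestimated that party's seats.
seat_error <- forecast - actual
print(round(seat_error, 2))
```

The Republican mean was about 7.76 seats too high and the Democratic mean about 7.76 seats too low, which is the "off by 8 seats" figure after rounding.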
I also included the projected voter turnout for both chambers, which increased over the forecast period; the FiveThirtyEight forecast pages linked above attribute this to the Supreme Court decision. Further investigation is needed, along with improvements to the simulations so that they include factors that may lead to further changes in the forecasts for both the House and Senate races.