Final Data 110

Introduction

The unemployment rate is a widely used measure for judging a President’s ability to lead the country. It reflects the health of the American economy, which is strongly influenced by fiscal decisions. The unemployment rate is defined as the percentage of the labor force that does not have a job and is actively seeking one. This excludes people such as children or stay-at-home parents, who would not be working even if a job were open to them.

Presidents run to represent their party, and parties have values and policies that influence how they run the country. Democrats and Republicans are the two dominant parties, and they typically take different economic approaches. With differing policies, you would expect them to get different results. My question is: Is there a relationship between a President’s party and the unemployment rate?

The data set I will use to answer this is collected from Wikipedia sources and is accessible through OpenIntro. The three variables I will focus on are unemp, the unemployment rate; party, a categorical variable specifying Republican or Democrat; and potus, the president in office. The data is simple, with no NAs and no capitalization issues to clean up.

I will start by looking at summary statistics of unemployment for each party without the Great Depression years, because unemployment spiked extremely high due to major world events and skewed the data. I will not keep these years filtered out for the whole analysis, since I am also looking at trends over time. I will then fit a linear regression to look for any significant relationship between unemployment rate and the party of the president. Next, I plan to plot a regression line over a plot of the unemployment rate over the years regardless of party, as well as a line graph comparing Democratic and Republican unemployment over the years. I would also like to look at a plot of unemployment rate by president, ordered by year and showing party and term. To do this I will need to mutate the data and assign the minimum year, or first-term year, to each president so I can order them by when they first took office. Finally, I would like to look at a simple box plot to visually compare the distributions between parties.

This topic is important to me because I will have to vote soon, and I will need to take into account every method of judging a president’s efficacy. The best way to predict the future is to study the past.

Loading Libraries

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

setwd("C:/Users/MCuser/Downloads")
midterm <- read_csv("midterms_house.csv")

## Rows: 31 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): potus, party
## dbl (3): year, unemp, house_change
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Removing Outliers and Summarizing

dem <- midterm |>
  filter(party == "Democrat",
         !year %in% c(1935, 1939))  # removing the Great Depression years (year is numeric)
rep <- midterm |>                   # note: this variable name shadows base::rep()
  filter(party == "Republican")
midterm1 <- midterm |>
  filter(!year %in% c(1935, 1939))  # same filter, applied to both parties
summary(dem$unemp)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.380   4.350   5.400   5.451   6.100   9.700

summary(rep$unemp)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.290   4.600   5.600   6.127   6.930  11.620

An important note to keep in mind: with the Great Depression years removed, Democrats have a slightly lower mean (5.45 vs. 6.13) and median (5.40 vs. 5.60) than Republicans.
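To show how much a single extreme year can move a group mean, here is a minimal sketch in base R. The `toy` data frame and its values are made up for illustration and are not the midterms_house data; the only assumption is that the same kind of filter is applied before summarizing.

```r
# Toy data (made up for illustration; not the midterms_house values)
toy <- data.frame(
  party = c("Democrat", "Democrat", "Democrat",
            "Republican", "Republican", "Republican"),
  year  = c(1935, 1950, 1962, 1954, 1970, 1982),
  unemp = c(20.0, 5.0, 6.0, 5.5, 6.0, 9.5)
)

# Drop the flagged year, keeping every other row
toy_trimmed <- toy[!toy$year %in% c(1935), ]

# Mean unemployment by party, before and after trimming
means_all     <- sapply(split(toy$unemp, toy$party), mean)
means_trimmed <- sapply(split(toy_trimmed$unemp, toy_trimmed$party), mean)

print(round(means_all, 2))
print(round(means_trimmed, 2))
```

In this toy example the Democratic mean is above the Republican mean with the extreme year included and below it once that year is removed, which is the same kind of reversal seen in the real summaries.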

Linear Regression

lm_model <- lm(unemp ~ party, data = midterm) # linear model of unemployment as a function of party
summary(lm_model)

## 
## Call:
## lm(formula = unemp ~ party, data = midterm)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.0850 -2.1360 -1.2271  0.7379 13.8350 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        7.465      1.062   7.031 9.85e-08 ***
## partyRepublican   -1.338      1.434  -0.933    0.358    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.972 on 29 degrees of freedom
## Multiple R-squared:  0.02916,    Adjusted R-squared:  -0.004321 
## F-statistic: 0.8709 on 1 and 29 DF,  p-value: 0.3584

Equation

Unemployment = 7.465 - 1.338 × Rep, where Rep is 1 for a Republican president and 0 for a Democratic president

The intercept estimate (7.465) is the mean unemployment rate under Democratic presidents. The partyRepublican estimate is the difference between the Republican mean and the Democratic mean, so the Republican mean is 1.338 percentage points lower than the Democratic mean (7.465 - 1.338 = 6.127).
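The reading of the two coefficients can be checked on a small example. This is a sketch on made-up numbers (the `toy` data frame is hypothetical), assuming R's default treatment coding where the first factor level, here Democrat, is the reference group.

```r
# Toy data (made up for illustration)
toy <- data.frame(
  party = factor(c("Democrat", "Democrat", "Democrat",
                   "Republican", "Republican", "Republican")),
  unemp = c(8, 6, 7, 5, 6, 4)
)

fit <- lm(unemp ~ party, data = toy)

# Group means computed directly
group_means <- sapply(split(toy$unemp, toy$party), mean)

# Intercept should equal the Democrat mean;
# partyRepublican should equal Republican mean minus Democrat mean
print(coef(fit))
print(group_means)
print(group_means[["Republican"]] - group_means[["Democrat"]])
```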

The R-squared is 0.02916, meaning only about 2.9% of the variation in unemployment is explained by party, so this model is a poor fit.
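As a consistency check, the R-squared values can be recovered from the F-statistic and degrees of freedom printed in the regression output. The sketch below only uses numbers copied from that output; the variable names are my own, and small differences are expected because the printed values are rounded.

```r
# Values copied from the summary(lm_model) output
f_stat   <- 0.8709
df_model <- 1
df_resid <- 29
t_party  <- -0.933

# R-squared from the F-statistic: R^2 = F * df1 / (F * df1 + df2)
r2 <- (f_stat * df_model) / (f_stat * df_model + df_resid)

# Adjusted R-squared penalizes for the number of predictors
n      <- df_model + df_resid + 1   # 31 observations
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

print(round(r2, 5))
print(adj_r2)

# With a single predictor, the squared t value equals the F-statistic
print(t_party^2)
```

The negative adjusted R-squared simply reflects that party explains less variation than would be expected from one extra parameter by chance.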

The p-value is 0.358, so there is not enough evidence to say there is a relationship between party and unemployment. I fail to reject the null hypothesis.

A linear model with a single two-level categorical predictor is equivalent to a two-sample t-test that assumes equal variances in both groups, so it adds little beyond a comparison of two means. The instructions for this assignment did not give the option of an alternative statistical test, but I recognize that a Welch t-test, which does not assume equal variances, is a better fit for this question, so I provided it below.

\(H_0\): \(\mu_1 = \mu_2\)

\(H_a\): \(\mu_1 \neq \mu_2\)

tresult <- t.test(unemp ~ party, data = midterm)
tresult

## 
##  Welch Two Sample t-test
## 
## data:  unemp by party
## t = 0.86908, df = 16.642, p-value = 0.3972
## alternative hypothesis: true difference in means between group Democrat and group Republican is not equal to 0
## 95 percent confidence interval:
##  -1.915444  4.591326
## sample estimates:
##   mean in group Democrat mean in group Republican 
##                 7.465000                 6.127059

The p-value here (0.397) is close to, but not the same as, the linear model’s (0.358), because the Welch test does not assume equal variances. Either way it shows no significant difference. I am curious how removing the abnormal years of the Great Depression will affect the p-value.

tresult2 <- t.test(unemp ~ party, data = midterm1)
tresult2

## 
##  Welch Two Sample t-test
## 
## data:  unemp by party
## t = -0.92959, df = 26.813, p-value = 0.3609
## alternative hypothesis: true difference in means between group Democrat and group Republican is not equal to 0
## 95 percent confidence interval:
##  -2.1693035  0.8168525
## sample estimates:
##   mean in group Democrat mean in group Republican 
##                 5.450833                 6.127059

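The p-value dropped only slightly (from 0.397 to 0.361), so there is still not enough evidence to reject my null hypothesis. The direction of the difference did flip, though: with the Great Depression years removed, the Democratic mean (5.45) is below the Republican mean (6.13).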
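To see how the linear model and the two versions of the t-test relate, here is a sketch on made-up data (the `toy` values are hypothetical). It assumes the standard result that a regression on one two-level factor gives the same p-value as a pooled, equal-variance t-test, while R's default `t.test()` is the Welch version.

```r
# Toy data with unequal group sizes and unequal spread (made up)
toy <- data.frame(
  party = factor(c("Democrat", "Democrat", "Democrat", "Democrat",
                   "Republican", "Republican", "Republican",
                   "Republican", "Republican", "Republican")),
  unemp = c(4.0, 5.5, 6.0, 12.0, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5)
)

# p-value for the party coefficient in the linear model
p_lm <- summary(lm(unemp ~ party, data = toy))$coefficients["partyRepublican", "Pr(>|t|)"]

# Pooled (equal-variance) t-test: should match the linear model
p_pooled <- t.test(unemp ~ party, data = toy, var.equal = TRUE)$p.value

# Welch t-test (R's default): generally differs when variances differ
p_welch <- t.test(unemp ~ party, data = toy)$p.value

print(c(lm = p_lm, pooled = p_pooled, welch = p_welch))
```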

Visualization

midterm |>
  ggplot(aes(x = year, y = unemp)) + # plotting unemployment by year
  geom_line(color = "#680b2d") +
  geom_smooth(method = "lm", color = "black") + # adding the linear regression line and its confidence band
  labs(x = "Year",
       y = "Unemployment Rate",
       title = "Unemployment Over the Years",
       caption = "Source: Wikipedia") +
  theme_bw()

## `geom_smooth()` using formula = 'y ~ x'

There are a lot of ups and downs throughout the graph with no apparent trend. The regression line has a very small slope and a very wide confidence band, showing very little linear relationship between year and unemployment. I would like to see whether the Great Depression years are pulling on the fitted line.

midterm1 |>
  ggplot(aes(x = year, y = unemp)) + # plotting unemployment by year, Great Depression years removed
  geom_line(color = "#680b2d") +
  geom_smooth(method = "lm", color = "black") + # adding the linear regression line and its confidence band
  labs(x = "Year",
       y = "Unemployment Rate",
       title = "Unemployment Over the Years",
       caption = "Source: Wikipedia") +
  theme_bw()

## `geom_smooth()` using formula = 'y ~ x'

The fitted line is now almost parallel to the x-axis, showing no real trend in unemployment over the years, regardless of party.
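The effect of a few extreme early values on a least-squares slope can be illustrated with a short sketch. The years and rates below are made up (they are not the midterms_house values); the point is only that dropping two high early observations flattens the fitted line.

```r
# Toy series (made up): two very high early values, then a flat stretch
toy <- data.frame(
  year  = seq(1934, 1974, by = 4),
  unemp = c(21, 19, 5, 4, 5, 6, 5, 4, 5, 6, 5)
)

# Slope of unemployment on year using every observation
slope_all <- coef(lm(unemp ~ year, data = toy))[["year"]]

# Slope after removing the two extreme early years
toy_trimmed   <- toy[!toy$year %in% c(1934, 1938), ]
slope_trimmed <- coef(lm(unemp ~ year, data = toy_trimmed))[["year"]]

print(c(all = slope_all, trimmed = slope_trimmed))
```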

colors <- c("Democrat" = "blue", "Republican" = "red")
colors2 <- c("Democrat" = "darkblue", "Republican" = "darkred")

p <- midterm |>
  ggplot(aes(x = year, y = unemp, color = party)) + # separating the previous graph by party
  geom_line() +
  geom_point(aes(fill = party, text = paste("President:", potus))) +  # line of code from nyadavxenc posted on https://www.geeksforgeeks.org/how-to-choose-variable-to-display-in-tooltip-when-using-ggplotly-in-r/
  scale_fill_manual(values = colors2) +
  scale_color_manual(values = colors) +
  labs(x = "Year",  # the x-axis is year, not party
       y = "Unemployment Rate",
       title = "Unemployment Rate Over the Years by Party",
       caption = "Source: Wikipedia") +
  theme_bw()

## Warning in geom_point(aes(fill = party, text = paste("President:", potus))):
## Ignoring unknown aesthetics: text

ggplotly(p)

#guidance on code structure from https://plotly.com/ggplot2/time-series/#continuous-scale

Once again there are no clear trends, even when distinguishing by party. The Republican line appears to typically lie just above the Democratic line.

#cleaning and mutating for the next plot
midterm <- midterm |>
  group_by(potus) |>
  mutate(first_term_year = min(year)) |> #min for the first term 
  ungroup() |>
  mutate(potus = fct_reorder(potus, first_term_year)) #reorders factor levels based on that year

midterm <- midterm |>
  group_by(potus) |>
  mutate(term = row_number()) |>
  ungroup()


# Code above written by Professor Alraee

midterm |>
  ggplot(aes(x = potus, y = unemp, color = party, size = term)) +
  geom_point() +
  scale_color_manual(values = colors) + # setting the colors to the chosen values
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  # angling the x-axis labels so they don't overlap
  labs(x = "President",
       y = "Unemployment Rate",
       title = "Unemployment rate based on President",
       caption = "Source: Wikipedia")

There are no clear trends. It is interesting, however, that most presidents had a lower unemployment rate in their second term.
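For readers who want to see what the ordering and term counter behind this plot are doing, here is a base R sketch of the same idea on made-up rows. The president labels and years are hypothetical, and `ave()` and `reorder()` stand in for the `group_by()`, `mutate()`, and `fct_reorder()` calls; this is an illustration, not the code used for the plot.

```r
# Toy rows (made up); labels chosen so alphabetical order differs from time order
toy <- data.frame(
  potus = c("Pres M", "Pres Z", "Pres Z", "Pres M", "Pres A"),
  year  = c(1950, 1934, 1938, 1954, 1962)
)

# First year observed for each president, repeated on every row of that president
toy$first_term_year <- ave(toy$year, toy$potus, FUN = min)

# Sort by year so the within-president counter follows time order
toy <- toy[order(toy$year), ]

# Running count of rows within each president (1 = first, 2 = second, ...)
toy$term <- ave(toy$year, toy$potus, FUN = seq_along)

# Reorder factor levels by first year so the x-axis runs chronologically
toy$potus <- reorder(factor(toy$potus), toy$first_term_year, FUN = min)

print(levels(toy$potus))
print(toy)
```

Note that the counter depends on row order, so sorting by year first matters whenever the data are not already in chronological order.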

midterm1 |>
  ggplot(aes(x = party, y = unemp, fill = party)) +
  geom_boxplot(color = "#680b2d") +
  scale_fill_manual(values = colors) +
  labs(x = "Party",
       y = "Unemployment Rate",
       title = "Distribution of Unemployment Rate by Party",
       caption = "Source: Wikipedia") +
  theme_bw()

Both medians fall at about the same value (the line inside a box plot is the median, not the mean); however, the Democrats’ upper quartile is noticeably lower than the Republicans’. Without removing the outliers from the Great Depression, both boxplots would look about the same.
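Because a box plot is built from the median and quartiles rather than the mean, it helps to see the numbers it draws. The sketch below uses a made-up vector (not the real unemployment values) and base R's `boxplot.stats()`, which returns the whisker ends, hinges, and median, plus any points flagged as outliers.

```r
# Toy values (made up), with one extreme observation
x <- c(4, 5, 5, 6, 6, 12)

# The extreme value pulls the mean up, but barely affects the median
print(mean(x))
print(median(x))

# Five numbers drawn by the box plot: lower whisker, lower hinge,
# median, upper hinge, upper whisker
bs <- boxplot.stats(x)
print(bs$stats)

# Points beyond 1.5 times the box height from a hinge are drawn separately
print(bs$out)
```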

Conclusion

In conclusion, both parties are roughly equivalent in their ability to handle the unemployment rate. While they take different approaches, they both arrive at a similar endpoint. There were no significant trends; all my graphs showed was natural fluctuation as major events occurred. The regression model showed no significant relationship between party and unemployment, and party explained almost none of the variance. The main takeaway is that if you want to judge a president, you can’t just look at the numbers; you also have to take into account how they handled major situations. For example, on the graphs FDR had the highest unemployment rate, yet his policies were able to bring down the unemployment rate he inherited from Herbert Hoover by 16.6 percentage points, to a rate lower than many other presidents have been able to achieve. At first glance of the data with no context you wouldn’t be able to see that; you would just see a high number attached to FDR’s name. In the future I would like to compare the difference from each year to the one after to look at how effective each president was in enacting meaningful change.
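The year-to-year comparison mentioned as future work could start from something like the sketch below. The values are made up for illustration, and the column name `change` is my own; it assumes the rows are sorted by year so that `diff()` subtracts each observation from the next one.

```r
# Toy series (made up), already sorted by year
toy <- data.frame(
  year  = c(1934, 1938, 1942, 1946),
  unemp = c(21, 17, 5, 6)
)

# Change from the previous observation; the first row has nothing to compare to
toy$change <- c(NA, diff(toy$unemp))

print(toy)
```

Negative values in `change` mark periods where unemployment fell since the previous observation, which is closer to what a president changed than the raw level.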

Works cited

Data Sets. www.openintro.org/book/statdata/?data=midterms.

GeeksforGeeks. “How to Choose Variable to Display in Tooltip When Using Ggplotly in R.” GeeksforGeeks, 29 Aug. 2024, www.geeksforgeeks.org/how-to-choose-variable-to-display-in-tooltip-when-using-ggplotly-in-r.

Time. plotly.com/ggplot2/time-series/#continuous-scale.

Team, Investopedia. “What Is the Unemployment Rate?” Investopedia, 27 Jan. 2025, www.investopedia.com/terms/u/unemploymentrate.asp.