Project 1

Research Question

How does the unemployment rate affect the employment-population ratio in the United States?

This project analyzes labor market data from the Bureau of Labor Statistics (BLS) to examine the relationship between the unemployment rate and the employment-population ratio in the United States. The dataset includes quantitative variables such as unemployment rate and employment-population ratio, as well as categorical variables such as gender and age group.
The goal of this analysis is to estimate how the employment-population ratio changes with the unemployment rate and to test whether this relationship is statistically significant.
Source: Bureau of Labor Statistics (BLS)

library(readr)
library(ggplot2)

data <- read_csv("/Users/mohameddione/Desktop/bls_unemployment.csv")
Rows: 10 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Gender, Age_Group
dbl (3): Year, Unemployment_Rate, Employment_Population_Ratio

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
# A tibble: 6 × 5
   Year Gender Age_Group Unemployment_Rate Employment_Population_Ratio
  <dbl> <chr>  <chr>                 <dbl>                       <dbl>
1  2018 Male   16-24                   8.5                        55.2
2  2018 Female 16-24                   7.9                        54.8
3  2019 Male   16-24                   7.8                        56.1
4  2019 Female 16-24                   7.3                        55.7
5  2020 Male   16-24                  14.5                        50.2
6  2020 Female 16-24                  13.9                        49.8
model <- lm(Employment_Population_Ratio ~ Unemployment_Rate, data = data)

summary(model)

Call:
lm(formula = Employment_Population_Ratio ~ Unemployment_Rate, 
    data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.30005 -0.40577 -0.03662  0.56999  1.04820 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       61.49362    0.95084  64.673 3.63e-12 ***
Unemployment_Rate -0.82587    0.09618  -8.586 2.62e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7536 on 8 degrees of freedom
Multiple R-squared:  0.9021,    Adjusted R-squared:  0.8899 
F-statistic: 73.72 on 1 and 8 DF,  p-value: 2.616e-05
par(mfrow = c(1,2))
plot(model, which = 1)
plot(model, which = 2)

The regression equation is:

Employment-Population Ratio = 61.49 − 0.83 × (Unemployment Rate)

This equation shows that for every 1 percentage point increase in the unemployment rate, the predicted employment-population ratio decreases by approximately 0.83 percentage points.
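
As a rough illustration of how the fitted line is used, the sketch below plugs the intercept and slope from the summary(model) output into a small helper. The name predict_ratio is hypothetical and not part of the original analysis; the coefficients are copied by hand rather than read from the model object.

```r
# Coefficients copied from the summary(model) output above.
intercept <- 61.49362
slope     <- -0.82587

# Hypothetical helper: fitted employment-population ratio (%) for a
# given unemployment rate (%).
predict_ratio <- function(unemployment_rate) {
  intercept + slope * unemployment_rate
}

# Fitted value at an unemployment rate of 8.5%.
fitted_at_8.5 <- predict_ratio(8.5)
print(round(fitted_at_8.5, 2))  # about 54.47

# Raising unemployment by one percentage point changes the fitted
# ratio by exactly the slope.
one_point_change <- predict_ratio(9.5) - predict_ratio(8.5)
print(round(one_point_change, 2))  # about -0.83
```

With the fitted model object available, coef(model) and predict(model, newdata = data.frame(Unemployment_Rate = 8.5)) should give the same numbers without copying coefficients by hand.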

Data Cleaning

The dataset was checked to ensure that all variables were in the correct format. The column specification printed by read_csv confirms that the unemployment rate and the employment-population ratio were read as numeric (dbl) columns. The dataset was also checked for missing values, and none were found.
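
The checks described above could look something like the following sketch. It is not the code used in the original analysis. To stay self-contained it rebuilds only the six rows printed by head(data), under the assumed name sample_rows; the full file has 10 rows.

```r
# Only the six rows shown by head(data) above, not the full dataset.
sample_rows <- data.frame(
  Year = c(2018, 2018, 2019, 2019, 2020, 2020),
  Gender = c("Male", "Female", "Male", "Female", "Male", "Female"),
  Age_Group = rep("16-24", 6),
  Unemployment_Rate = c(8.5, 7.9, 7.8, 7.3, 14.5, 13.9),
  Employment_Population_Ratio = c(55.2, 54.8, 56.1, 55.7, 50.2, 49.8),
  stringsAsFactors = FALSE
)

# Format check: the class of each column.
column_classes <- sapply(sample_rows, class)
print(column_classes)

# Missing-value check: number of NA entries per column.
missing_counts <- colSums(is.na(sample_rows))
print(missing_counts)

# Stop with an error if either analysis variable is not numeric
# or if any value is missing.
stopifnot(
  is.numeric(sample_rows$Unemployment_Rate),
  is.numeric(sample_rows$Employment_Population_Ratio),
  all(missing_counts == 0)
)
```

Running the same three checks on the full data object would cover all 10 rows.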

ggplot(data, aes(x = Unemployment_Rate,
                 y = Employment_Population_Ratio,
                 color = factor(Year))) +
  geom_point(size = 4) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(
    title = "Unemployment Rate and Employment-Population Ratio in the United States",
    x = "Unemployment Rate (%)",
    y = "Employment-Population Ratio (%)",
    color = "Year",
    caption = "Source: Bureau of Labor Statistics"
  ) +
  scale_color_manual(values = c("2018" = "green",
                                "2019" = "red",
                                "2020" = "grey",
                                "2021" = "blue",
                                "2022" = "pink")) +
  theme_classic()
`geom_smooth()` using formula = 'y ~ x'

Final Analysis

The dataset was checked before analysis to confirm that all variables were properly formatted and that no values were missing. The unemployment rate and employment-population ratio were already stored as numeric columns, so no conversion was needed.

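The visualization shows a clear negative relationship between the unemployment rate and the employment-population ratio: observations with higher unemployment have lower employment-population ratios. The regression confirms this, with an estimated slope of about −0.83 (p = 2.62e-05) and an R-squared of 0.90, so the relationship is statistically significant at the 0.05 level. The points from the different years all lie close to the single fitted line.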

One limitation of this analysis is the small size of the dataset (10 observations), which limits the precision of the estimates and makes the results sensitive to individual data points. A larger dataset covering more years would make the results more reliable. Additionally, including other variables such as education or region could provide deeper insights.
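
To make the effect of the small sample concrete, the sketch below recomputes the t statistic, p-value, and a 95% confidence interval for the slope from the estimate, standard error, and residual degrees of freedom reported in the summary(model) output. The variable names are illustrative, and the values are copied by hand from that output.

```r
# Values copied from the Unemployment_Rate row of summary(model).
estimate  <- -0.82587
std_error <- 0.09618
df_resid  <- 8        # residual degrees of freedom (10 rows - 2 coefficients)

# t statistic and two-sided p-value for H0: slope = 0.
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# 95% confidence interval for the slope, using the t distribution.
margin <- qt(0.975, df = df_resid) * std_error
ci_95  <- c(lower = estimate - margin, upper = estimate + margin)

print(round(t_value, 2))   # about -8.59
print(signif(p_value, 2))  # about 2.6e-05
print(round(ci_95, 2))     # roughly -1.05 to -0.60
```

The interval excludes zero, which agrees with the significance test, but it is fairly wide for a slope of about −0.83. With the model object available, confint(model) should report the same interval directly.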