Project 2

Author

Balemlay

Google:#1

INTRODUCTION

The topic is the relationship between the employment rate and key labor market indicators. The dataset I use for this project is Employment_Unemployment_and_Labor_Force_Data_20250414.csv. The variables year, month, and date_label are treated as categorical data, while civilian_non_institutional_population, civilian_labor_force, labor_force_participation_rate, employed, employment_rate, unemployed, and unemployment_rate are continuous data. The date column is read in as text. The data comes from Maryland's open data portal (opendata.maryland.gov). I cleaned the data by using !is.na(data$column) to check for missing values. I also converted the column names from capital letters to lowercase and replaced the spaces between words with underscores. Finally, I renamed the column civilian_non-institutional_population to civilian_non_institutional_population so that every name follows the same pattern.

I chose this dataset because I am very curious to know why it is difficult to get a job. I also chose the topic to see the relationship between the employment rate and key labor market indicators. It means a lot to me because I am looking for a job and finding one is very difficult, so I want to understand why. I hope that this project will give me more detailed information.

These are the packages that I use for this project:

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(RColorBrewer)
library(highcharter)
Warning: package 'highcharter' was built under R version 4.4.3
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
Highcharts (www.highcharts.com) is a Highsoft software product which is
not free for commercial and Governmental use
library(ggfortify)
Warning: package 'ggfortify' was built under R version 4.4.3
library(GGally)
Warning: package 'GGally' was built under R version 4.4.3
Registered S3 method overwritten by 'GGally':
  method from   
  +.gg   ggplot2
library(tidyr)
library(psych)
Warning: package 'psych' was built under R version 4.4.3

Attaching package: 'psych'

The following objects are masked from 'package:ggplot2':

    %+%, alpha
library(leaflet)
Warning: package 'leaflet' was built under R version 4.4.3
library(scales)
Warning: package 'scales' was built under R version 4.4.3

Attaching package: 'scales'

The following objects are masked from 'package:psych':

    alpha, rescale

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor
library(knitr)
library(webshot2)
Warning: package 'webshot2' was built under R version 4.4.3
setwd("C:/Users/ebale/OneDrive/Desktop/DATA110")
labor1 <- read_csv("Employment__Unemployment__and_Labor_Force_Data_20250414.csv")
Rows: 152 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Date, Date Label
dbl (9): Year, Month, Civilian Non-institutional Population, Civilian Labor ...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

This code converts the column names to lowercase and replaces the spaces in them with underscores.

# Convert the column names from capital letters to lowercase
colnames(labor1) <- tolower(colnames(labor1))

# Replace spaces with underscores
colnames(labor1) <- gsub(" ", "_", colnames(labor1))   # ChatGPT #2: I looked up how to replace spaces with underscores.

Rename the column whose name still contains a hyphen

labor2 <- labor1 |>
  rename(civilian_non_institutional_population = `civilian_non-institutional_population`)
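
The three renaming steps above can also be collapsed into one pattern. The following is a minimal sketch in base R, not the code used in this project; the vector raw_names is a made-up stand-in for the CSV header, and the third name in it is an assumption about how the raw column is spelled.

```r
# Hypothetical column names modelled on the raw CSV header
raw_names <- c("Date Label",
               "Civilian Non-institutional Population",
               "Unemployment Rate")

# Lowercase first, then turn each run of spaces or hyphens into one underscore
clean_names <- gsub("[ -]+", "_", tolower(raw_names))
print(clean_names)
```

Handling the hyphen inside the same pattern would make the separate rename() step unnecessary, at the cost of a slightly less readable regular expression.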

This code checks each row for missing values in the columns used in the models

# clean1 is TRUE for every row that has no missing value in any of these columns
clean1 <- !is.na(labor2$labor_force_participation_rate) & !is.na(labor2$employment_rate) & !is.na(labor2$unemployment_rate) & !is.na(labor2$year) & !is.na(labor2$employed) & !is.na(labor2$unemployed) & !is.na(labor2$civilian_labor_force) & !is.na(labor2$civilian_non_institutional_population)
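
A logical vector like clean1 only flags rows; it does not report how many values are missing or drop anything. As a hedged sketch of one way to do both, the example below uses a small made-up data frame (toy, not the project data) with colSums() and complete.cases() from base R.

```r
# Small made-up data frame standing in for labor2
toy <- data.frame(
  employment_rate   = c(65.1, NA, 66.3),
  unemployment_rate = c(4.2, 4.0, NA),
  employed          = c(3.0e6, 3.1e6, 3.2e6)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only the rows that are complete in every column
toy_complete <- toy[complete.cases(toy), ]
print(nrow(toy_complete))
```

If every count is zero, as the checks in this project suggest, no rows need to be removed.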

To explore correlations

pairs.panels(labor2[5:11],   # plot distributions and correlations for all the data
             gap = 0,
             pch = 21,
             lm = TRUE)

Model 1

# Fit a multiple linear regression model
fit1 <- lm(employment_rate ~ unemployment_rate + labor_force_participation_rate +
  civilian_labor_force + civilian_non_institutional_population +
  employed + unemployed, data = labor2 )
# Display a detailed summary of the model
# Includes coefficients, significance levels (p-values), R-squared, and F-statistic
summary(fit1)

Call:
lm(formula = employment_rate ~ unemployment_rate + labor_force_participation_rate + 
    civilian_labor_force + civilian_non_institutional_population + 
    employed + unemployed, data = labor2)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.063733 -0.022076 -0.002707  0.021005  0.073419 

Coefficients: (1 not defined because of singularities)
                                        Estimate Std. Error t value Pr(>|t|)
(Intercept)                            7.721e+01  5.800e+00  13.311  < 2e-16
unemployment_rate                     -6.595e-02  7.045e-02  -0.936   0.3508
labor_force_participation_rate        -1.619e-01  8.526e-02  -1.899   0.0595
civilian_labor_force                   4.329e-06  2.576e-06   1.680   0.0950
civilian_non_institutional_population -1.621e-05  1.257e-06 -12.892  < 2e-16
employed                               2.011e-05  2.291e-06   8.781 3.99e-15
unemployed                                    NA         NA      NA       NA
                                         
(Intercept)                           ***
unemployment_rate                        
labor_force_participation_rate        .  
civilian_labor_force                  .  
civilian_non_institutional_population ***
employed                              ***
unemployed                               
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0298 on 146 degrees of freedom
Multiple R-squared:  0.9993,    Adjusted R-squared:  0.9993 
F-statistic: 4.095e+04 on 5 and 146 DF,  p-value: < 2.2e-16
# Create diagnostic plots for checking regression assumptions
autoplot(fit1, 1:4, nrow=2, ncol=2)
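
The summary of Model 1 reports "1 not defined because of singularities" and shows NA for unemployed. A likely explanation, stated here as an assumption about this dataset, is that the civilian labor force equals employed plus unemployed, so the last of the three variables carries no new information. The sketch below reproduces that behaviour on simulated data; every variable in it is invented for illustration.

```r
set.seed(1)
n <- 50

# Simulated counts; labor_force is an exact sum of the other two
employed    <- round(runif(n, 2.8e6, 3.2e6))
unemployed  <- round(runif(n, 1.0e5, 3.0e5))
labor_force <- employed + unemployed

# Simulated response that depends on employed plus noise
y <- 60 + 2e-6 * employed + rnorm(n, sd = 0.05)

# lm() cannot estimate all three predictors and sets the last one to NA
toy_fit <- lm(y ~ labor_force + employed + unemployed)
print(coef(toy_fit))

# alias() shows which linear combination caused the problem
print(alias(toy_fit)$Complete)
```

This is why dropping unemployed in Model 2 leaves every coefficient and fit statistic unchanged.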

Model 2

# Fit a multiple linear regression model
fit2 <- lm(employment_rate ~ unemployment_rate + labor_force_participation_rate +
  civilian_labor_force + civilian_non_institutional_population + employed, data = labor2)
# Display a detailed summary of the model
# Includes coefficients, significance levels (p-values), R-squared, and F-statistic
summary(fit2)

Call:
lm(formula = employment_rate ~ unemployment_rate + labor_force_participation_rate + 
    civilian_labor_force + civilian_non_institutional_population + 
    employed, data = labor2)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.063733 -0.022076 -0.002707  0.021005  0.073419 

Coefficients:
                                        Estimate Std. Error t value Pr(>|t|)
(Intercept)                            7.721e+01  5.800e+00  13.311  < 2e-16
unemployment_rate                     -6.595e-02  7.045e-02  -0.936   0.3508
labor_force_participation_rate        -1.619e-01  8.526e-02  -1.899   0.0595
civilian_labor_force                   4.329e-06  2.576e-06   1.680   0.0950
civilian_non_institutional_population -1.621e-05  1.257e-06 -12.892  < 2e-16
employed                               2.011e-05  2.291e-06   8.781 3.99e-15
                                         
(Intercept)                           ***
unemployment_rate                        
labor_force_participation_rate        .  
civilian_labor_force                  .  
civilian_non_institutional_population ***
employed                              ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0298 on 146 degrees of freedom
Multiple R-squared:  0.9993,    Adjusted R-squared:  0.9993 
F-statistic: 4.095e+04 on 5 and 146 DF,  p-value: < 2.2e-16
# Create diagnostic plots for checking regression assumptions
autoplot(fit2, 1:4, nrow=2, ncol=2)

Model 3

# Fit a multiple linear regression model
fit3 <- lm(employment_rate ~ labor_force_participation_rate +
  civilian_labor_force + civilian_non_institutional_population + employed, data = labor2)
# Display a detailed summary of the model
# Includes coefficients, significance levels (p-values), R-squared, and F-statistic
summary(fit3)

Call:
lm(formula = employment_rate ~ labor_force_participation_rate + 
    civilian_labor_force + civilian_non_institutional_population + 
    employed, data = labor2)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.067430 -0.022984 -0.002292  0.021441  0.075064 

Coefficients:
                                        Estimate Std. Error t value Pr(>|t|)
(Intercept)                            7.824e+01  5.693e+00  13.743   <2e-16
labor_force_participation_rate        -1.801e-01  8.299e-02  -2.170   0.0316
civilian_labor_force                   2.591e-06  1.785e-06   1.452   0.1487
civilian_non_institutional_population -1.644e-05  1.233e-06 -13.338   <2e-16
employed                               2.226e-05  5.785e-08 384.780   <2e-16
                                         
(Intercept)                           ***
labor_force_participation_rate        *  
civilian_labor_force                     
civilian_non_institutional_population ***
employed                              ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.02979 on 147 degrees of freedom
Multiple R-squared:  0.9993,    Adjusted R-squared:  0.9993 
F-statistic: 5.123e+04 on 4 and 147 DF,  p-value: < 2.2e-16
# Create diagnostic plots for checking regression assumptions
autoplot(fit3, 1:4, nrow=2, ncol=2)

Model 4

# Fit a multiple linear regression model
fit4 <- lm(employment_rate ~ labor_force_participation_rate +
   civilian_non_institutional_population + employed, data = labor2)
# Display a detailed summary of the model
# Includes coefficients, significance levels (p-values), R-squared, and F-statistic
summary(fit4)

Call:
lm(formula = employment_rate ~ labor_force_participation_rate + 
    civilian_non_institutional_population + employed, data = labor2)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.069797 -0.023975 -0.002044  0.023377  0.072937 

Coefficients:
                                        Estimate Std. Error  t value Pr(>|t|)
(Intercept)                            7.003e+01  6.780e-01  103.293  < 2e-16
labor_force_participation_rate        -6.015e-02  7.650e-03   -7.863 7.19e-13
civilian_non_institutional_population -1.465e-05  5.138e-08 -285.145  < 2e-16
employed                               2.223e-05  5.540e-08  401.286  < 2e-16
                                         
(Intercept)                           ***
labor_force_participation_rate        ***
civilian_non_institutional_population ***
employed                              ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0299 on 148 degrees of freedom
Multiple R-squared:  0.9993,    Adjusted R-squared:  0.9993 
F-statistic: 6.78e+04 on 3 and 148 DF,  p-value: < 2.2e-16
# Create diagnostic plots for checking regression assumptions
autoplot(fit4, 1:4, nrow=2, ncol=2)

This is the analysis of the Model 4 summary

The model equation is Employment Rate = 70.03 − 0.06015 × (labor force participation rate) − 0.00001465 × (civilian non-institutional population) + 0.00002223 × (employed). All p-values are less than 0.05, which means each variable is statistically significant. This suggests a strong relationship between the employment rate and the labor force participation rate, the civilian non-institutional population, and the number of people employed. The adjusted R² is 0.9993, which means that 99.93% of the variation in the employment rate is explained by the model. It is an extremely strong fit.
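
To make the equation concrete, here is a short worked example. The coefficients are copied from the Model 4 summary; the three input values are hypothetical and are not taken from the dataset.

```r
# Coefficients copied from the Model 4 summary
b0     <- 70.03
b_lfpr <- -0.06015
b_pop  <- -0.00001465
b_emp  <- 0.00002223

# Hypothetical month (illustrative values only)
lfpr       <- 68        # labor force participation rate, in percent
population <- 4700000   # civilian non-institutional population
employed   <- 3100000   # number of people employed

# Plug the values into the fitted equation
predicted_rate <- b0 + b_lfpr * lfpr + b_pop * population + b_emp * employed
print(round(predicted_rate, 2))
```

For these inputs the equation gives an employment rate of roughly 66. The population and employed terms are each close to 69 in size and nearly cancel, so small changes in either one move the prediction noticeably.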

Diagnostic analysis of Model 4

  1. Residuals vs Fitted. What it shows: The residuals should be randomly scattered around the horizontal line (zero), which indicates linearity and constant variance. Interpretation: The plot shows a fairly random scatter, with some points standing out at the left and right edges.

  2. Normal Q-Q Plot. What it shows: This checks whether the residuals follow a normal distribution. Points should fall along the diagonal line. Interpretation: Most points are near the line, but the tails (the two ends) deviate slightly.

  3. Scale-Location (Spread-Location). What it shows: Checks for equal variance (homoscedasticity) across fitted values. Interpretation: The points are spread fairly evenly, though there are a few clusters.

  4. Cook’s Distance. What it shows: Identifies influential observations, that is, points that have a strong impact on the model coefficients. Interpretation: A few points (such as observations 3 and 40) have a higher Cook’s distance, especially observation 3.
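
The Cook's distance plot can be backed up with numbers. The sketch below is an illustration only: it fits a model to R's built-in mtcars data rather than to labor2, and the 4/n cut-off is a common rule of thumb that I am assuming here, not a threshold used elsewhere in this project.

```r
# Example model on a built-in dataset (stand-in for fit4)
example_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Cook's distance for every observation
cooks <- cooks.distance(example_fit)

# Rule-of-thumb cut-off: 4 divided by the number of observations
threshold <- 4 / nrow(mtcars)

# Observations above the cut-off, largest first
influential <- cooks[cooks > threshold]
print(sort(influential, decreasing = TRUE))
```

Running cooks.distance(fit4) in the same way would list the influential months by row number, which makes it easier to look them up in the data.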

To show an example of the relationship between two of the variables

plot1 <- ggplot(data = labor2, aes(x = civilian_labor_force, y = employment_rate)) + # assign the x- and y-axis variables
  geom_point(size = 1, color = "red") +   # draw each observation as a small red point
  geom_smooth() +                         # add a smoothed trend line; the + keeps the labels and theme attached
  labs(title = "Correlation Between Employment Rate and Civilian Labor Force",
       x = "Civilian labor force",   # label for the x-axis
       y = "Employment Rate",        # label for the y-axis
       caption = "Source: U.S. Bureau of Labor Statistics") +  # label for the caption
  theme_bw(base_size = 13)  # theme with the base font size
plot1
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Here I am grouping, mutating, and summarizing

labor3 <- labor2 |>
  group_by(year) |> # group the data by year
  # Summarise the data by calculating the yearly average of each column
  summarise(
    avg_unemployment_rate = mean(unemployment_rate),
    avg_employment_rate = mean(employment_rate),
    avg_labor_force_participation_rate = mean(labor_force_participation_rate),
    avg_civilian_labor_force = mean(civilian_labor_force),
    avg_civilian_non_institutional_population = mean(civilian_non_institutional_population),
    avg_employed = mean(employed),
    avg_unemployed = mean(unemployed)
  )|>
  
  # Add a category based on the average employment rate
  mutate(                      # AI #3: I used ChatGPT to find out how to cut and label the average employment rate.
    rate_category = cut(
      avg_employment_rate,
      breaks = c(50,63, 65, 67, 68),
      labels = c("Very Low", "Low", "Medium", "High"),
      right = FALSE
    ),
    # Assign color based on category
    color = case_when(
      rate_category == "Very Low" ~ "#d73027",
      rate_category == "Low" ~ "#fc8d59",
      rate_category == "Medium" ~ "#fee08b",
      rate_category == "High" ~ "#1a9850"
    )
  )
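
Because right = FALSE is set, each interval includes its lower break and excludes its upper one. The sketch below applies the same breaks and labels to a few made-up rates to show where boundary values land; the vector rates is hypothetical.

```r
# Made-up average employment rates, chosen to sit on or near the breaks
rates <- c(62.9, 63, 66.9, 67.5, 68)

# Same breaks and labels as in the project code
categories <- cut(
  rates,
  breaks = c(50, 63, 65, 67, 68),
  labels = c("Very Low", "Low", "Medium", "High"),
  right = FALSE
)

print(data.frame(rates, categories))
```

A rate of exactly 63 falls in "Low" rather than "Very Low", and a rate of 68 or more falls outside every interval and becomes NA. Adding include.lowest = TRUE would close the last interval at 68.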

First preliminary graph

# Create a highchart object
highchart() |>
  hc_add_series(data = labor3,    # The dataset I am using
                type = "line",    # Specifies the chart type as a line plot
                hcaes(x = year,          # Maps the x-axis to the year
                      y = avg_employed,  # Maps the y-axis to avg_employed
                      name = "year")) |>
  hc_title(text = "Average Employed Over Years") |>   # Title displayed at the top of the chart
  hc_xAxis(title = list(text = "Year")) |>            # X-axis label
  hc_yAxis(title = list(text = "Avg Employed"))       # Y-axis label

Second preliminary graph

# Create a highchart object
highchart() |>
  hc_add_series(data = labor3,    # The dataset I am using
                type = "area",    # Specifies the chart type as area
                hcaes(x = year,   # Maps the x-axis to the year (unquoted, as in the first chart)
                      y = avg_civilian_non_institutional_population)) |>  # Maps the y-axis to the average population
  hc_title(text = "Average Civilian Non-Institutional Population Over the Years") |> # Title displayed at the top of the chart
  hc_yAxis(title = list(text = "Avg civilian non-institutional population")) |> # y-axis label
  hc_xAxis(title = list(text = "Year"))  # x-axis label

Final visualization

# Plotting with ggplot

plot3 <- ggplot(labor3, aes(
  x = year,                 # assign year to the x-axis
  y = avg_employment_rate,  # assign the average employment rate to the y-axis
  text = paste("year:", year, "<br>avg_employment_rate:", round(avg_employment_rate, 2)) # tooltip text for an interactive version
)) +
  geom_bar(aes(fill = rate_category), stat = "identity", position = "dodge") +  # draw one bar per year, filled by category
  labs(
    title = "Average Employment Rate Over Time", # label for the title
    x = "Year",                                  # label for the x-axis
    y = "Average Employment Rate",               # label for the y-axis
    caption = "Data Source: U.S. Bureau of Labor Statistics"   # label for the caption
  ) +
  scale_fill_manual(values = c(          # assign the colors manually
    "Very Low" = "#d73027",
    "Low" = "#fc8d59",
    "Medium" = "#fee08b",
    "High" = "#1a9850"
  )) +
  theme_minimal(base_size = 14) +     # set the base font size
  theme(
    legend.position = "bottom",       # place the legend below the plot
    text = element_text(family = "AvantGarde", face = "bold")  # assign the text style
  )

#plot3 <- ggplotly(plot3, tooltip = "text")   # help to see texts on the graph


# Display the plot
plot3
Warning in grid.Call(C_stringMetric, as.graphicsAnnot(x$label)): font family
not found in Windows font database
Warning in grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y, : font
family not found in Windows font database
Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
font family not found in Windows font database

Background Research

Unemployment refers to the share of the labor force that is without work but available for and seeking employment. Annual employment growth is measured as the annual change in permanent full-time workers between the last complete fiscal year and a previous period (2 or 3 years ago). According to Aaronson et al., “The labor force participation rate has declined significantly since 2007, with both cyclical and structural factors playing a role” (215). The decline in the U.S. employment-to-population ratio has been influenced by both cyclical downturns and long-term structural changes (Abraham and Kearney 590). Population aging has had a large effect on the overall employment rate over this period, but within-age-group declines in employment among young- and prime-age adults also have played a central role. Among the factors with effects that we can quantify based on existing evidence, labor demand factors, in particular increased import competition from China and the penetration of robots into the labor market, are the most important drivers of observed within-group declines in employment. The United States’ unemployment rate increased from 4.62% in 2007 to 9.61% in 2010 (Statista). At the same time, the employment rate decreased. From mid-2010 to mid-2020, the unemployment rate decreased to 3.68%, and employment increased. In mid-2020, the unemployment rate rose again, and in 2022 it declined. From mid-2022 to now, the unemployment rate has increased a little.

Essay

Plot 1 shows that the employment rate was high when the civilian labor force was small, and that the employment rate also rises as the civilian labor force increases. The first preliminary graph shows that the average employed population rose from 2008 to 2019. The second preliminary graph shows that the average civilian non-institutional population increased from 2007 until 2019. The final visualization shows that the average employment rate fluctuated. In 2007 and 2008 it was medium, from 2009 to 2013 it was low, and in 2014 it was very low. After 2015, the employment rate went back from a very low rate to a low rate. Generally, the employment rate is highly correlated with the labor force participation rate, the civilian non-institutional population, and the number of people employed.

Bibliography

Aaronson, Stephanie, et al. “Labor Force Participation: Recent Developments and Future Prospects.” Brookings Papers on Economic Activity, vol. 2014, no. 2, 2014, pp. 197–275. Project MUSE, https://dx.doi.org/10.1353/eca.2014.0015.

Abraham, Katharine G., and Melissa S. Kearney. “Explaining the Decline in the US Employment-to-Population Ratio: A Review of the Evidence.” Journal of Economic Literature, vol. 58, no. 3, 2020, pp. 585–643. DOI: 10.1257/jel.20191480.

“Total Employment and Unemployment Rate in the U.S. from 1980 to 2023.” Statista, 4 July 2024, https://www.statista.com/statistics/193290/unemployment-rate-in-the-usa-since-1990/. Accessed 20 Apr. 2025.

#1, #2, #3