Assignment

Requirements

\(MSDA\ 607\ Final\ Project\)

Deliverables Schedule

Deliverable Date Points
One Paragraph Proposal Sunday April 26 20
Final Project Wednesday May 19 120
Final Project Presentation Before or during Final Meetup on Wednesday May 19 30

Policy on Collaboration

You may work in a team of up to three people. Each project team member is responsible for understanding and being able to explain all of the submitted project code. Remember that you can take work that you find elsewhere as a base to build on, but you need to acknowledge the source, so that I base your grade on what you contributed, not on what you started with!

Approval Meeting

Once you’ve turned in your one paragraph proposal, I want to schedule a 15 minute phone meeting with each person or team (starting when we return from Thanksgiving break), where you’ll describe the reason (benefit) for doing this work and/or question you’re seeking to answer, where you’ll source the data, and the overall flow. For team projects, I also want you to articulate the roles and responsibilities of each team member.

Final Project Checklist

To receive full credit, you’ll need to deliver on all of the items in the checklist below. Please read carefully through this checklist before you make your project proposal. You are (within these checklist constraints) strongly urged to limit scope and make the necessary simplifying assumptions so that you can deliver your work on time!

  • Proposal describes your motivation for performing this analysis
  • Proposal describes likely data sources.
  • Your project has a recognizable “data science workflow,” such as the OSEMN work flow or Hadley Wickham’s Grammar of Data Science. [Example: First the data is acquired, then necessary transformations and clean-up are performed, then the analysis and presentation work is performed]
  • Project includes data from at least two different types of data sources (e.g., two or more of these: relational or CSV,Neo4J, web page [scraped or API], MongoDB, etc.)
  • Project includes at least one data transformation operation. [Examples: transforming from wide to long; converting columns to date format]
  • Project includes at least one statistical analysis and at least one graphics that describes or validates your data.
  • Project includes at least one graphic that supports your conclusion(s).
  • Project includes at least one statistical analysis that supports your conclusion(s).
  • Project includes at least one feature that we did not cover in class! There are many examples: “I used ggmap; I created a decision tree; I ranked the results; I created my presentation slides directly from R; I figured out to use OAuth 2.0…”
  • Presentation. Was the presentation delivered in the allotted time (3 to 5 minutes)?
  • Presentation. Did you show (at least) one challenge you encountered in code and/or data, and what you did when you encountered that challenge? If you didn’t encounter any challenges, your assignment was clearly too easy for you!
  • Presentation. Did the audience come away with a clear understanding of your motivation for undertaking the project?
  • Presentation. Did the audience come away with a clear understanding of at least one insight you gained or conclusion you reached or hypothesis you “confirmed” (rejected or failed to reject…)?
  • Code and data. Have you delivered the submitted code and data where it is self-contained—preferably in rpubs.com and github? Am I able to fully reproduce your results with what you’ve delivered? You won’t receive full credit if your code references data on your local machine!
  • Code and data. Does all of the delivered code run without errors?
  • Code and data. Have you delivered your code and conclusions using a “reproducible research” tool such as RMarkdown?
  • Deadline management. Were your draft project proposal, project, and presentation delivered on time? Any part of the project that is turned in late will receive a maximum grade of 80%. Please turn in your work on time! You are of course welcome to deliver ahead of schedule!

Scope

With COVID-19 impacting employment worldwide, many articles have been published highlighting the toll it has taken on specific demographics. Our motivation to dive deeper into the impact by demographic, has been influenced by articles such as COVID-19 has spurred America’s first female recession. This article talk about the negative affect the pandemic has had on women in the United States. Using data from the U.S. Bureau of Labor Statistics, we hope to identify if the impact was disproportionate or great to women vs. men.

Data Preparation

Load Data

In order to conduct our studies we used data from U.S. Bureau of Labor Statistics | Labor Force Statistics from the Current Population Survey specifically data regarding:

The data was web scraped then encapsulated using the functions as.dataframe(), html_table() and read_html, then subsetted.

url_9  <- "https://www.bls.gov/web/empsit/cpseea09.htm"
url_10 <- "https://www.bls.gov/web/empsit/cpseea10.htm"
url_30 <- "https://www.bls.gov/web/empsit/cpseea30.htm"

Table 9 Data

head(df_tb9  <- as.data.frame(html_table(read_html(url_9))[[2]]),5)%>%
  reactable(highlight = TRUE, pagination = FALSE, height = 450)


________________________________________________________________________________

Table 10 Data

head(df_tb10 <- as.data.frame(html_table(read_html(url_10))[[2]]),5)%>%
  reactable(highlight = TRUE, pagination = FALSE, height = 450)

Table 30 Data

head(df_tb30 <- as.data.frame(html_table(read_html(url_30))[[2]]),10)%>%
  reactable(highlight = TRUE, pagination = TRUE, height = 850)

Transform Data

Data manipulation and tidying for Table 9 included:

  • relabeling column values to reflect a (4 digit: year) - (2 digit: month) (%y%m) format.

  • subsetting data frames to remove unnecessary data or blank rows with the help of piping %>%

  • categorizing data with a Gender column, that included a Male,Female and Total value respectively.

    Note: Total value was used to specify data that reflects total unemployment counts.

colnames(df_tb10) <- c("Label","2020-03", "2020-04",
                         "2020-05", "2020-06", "2020-07",
                         "2020-08", "2020-09", "2020-10",
                         "2020-11", "2020-12","2021-01",
                         "2021-02", "2021-03")
                              

colnames(df_tb9) <- c("Label", "2020-03", "2020-04",
                         "2020-05", "2020-06", "2020-07",
                         "2020-08", "2020-09", "2020-10",
                         "2020-11", "2020-12","2021-01",
                         "2021-02", "2021-03")
df_tb9_tdy<-
    as.data.frame(df_tb9[1,]%>%
      assign_in(list(1,1),"Month")%>%
        bind_rows(df_tb9[3:14,],
          df_tb9[16:27,],
          df_tb9[29:40,])
    )
# Adding male and female category
df_tb9_tdy$Gender<-"Total"
df_tb9_tdy$Gender[14:25] <- "Male"
df_tb9_tdy$Gender[26:37] <- "Female"

And repeated for Table 10.

df_tb10_tdy<-
    as.data.frame(df_tb10[1,]%>%
      assign_in(list(1,1),"Month")%>%
        bind_rows(df_tb10[3:14,],
          df_tb10[16:27,],
          df_tb10[29:40,])
    )

# Adding male and female category
df_tb10_tdy$Gender<-"Total"
df_tb10_tdy$Gender[14:25] <- "Male"
df_tb10_tdy$Gender[26:37] <- "Female"

This tidy data (df_tb9_tdy,df_tb10_tdy) is the baseline used for our evaluation of the comparison between unemployment rates of men and women, during the beginning month until present day, of the pandemic.

Using ZevRoss: Know Your Data | Easy multi-panel plots in R using facet_wrap() and facet_grid() from ggplot2 as a point of reference, we continue to transform the data.

Specifically:

  • regex is used within the gsub() function, to remove , from our Rate column.
  • Month and Rate is then transformed to data types that will allow for further analysis.
#Dependency on df_tb10_tdy    Rate, Gender, Age Group
df_tb10_ln <- as.data.frame(df_tb10_tdy %>% 
  slice(c(16,17,19,22,23,24,25,28,29,31,34,35,36,37)) %>% 
    gather("Month_Year", "Rate", 2:14)%>%
      lapply(function(x){ 
        gsub(",", "", x)}))
df_tb10_ln$Month_Year<- as.yearmon(df_tb10_ln$Month_Year) 
df_tb10_ln$Rate <- as.numeric(df_tb10_ln$Rate)

The data is stored to df_tb10_ln. When sorted, it is easy to see that the highest unemployment rates overall, did occur among women between the ages of 16 to 19, during the months of March and April of 2020, by a small margin over men 18 to 19 in March & April 2020.

Label Gender Month_Year Rate
18 to 19 years Female Mar 2020 38.7
16 to 17 years Female Mar 2020 33.3
16 to 17 years Female Apr 2020 31.0
18 to 19 years Male Mar 2020 30.7
18 to 19 years Female Apr 2020 30.3
18 to 19 years Male Apr 2020 28.8

From here we calculate rates and store the values to df_tb9_rt. The data is transformed in a month over month format, so that it can be assessed as a trend throughout the year. The functions used to create this data frame included:

  • piping %>%
  • slice()
  • gather()
  • mutate()

This data frame is grouped by Gender specifically.

Dates and counts are converted with as.yearmon() & as.numeric().

The calculations for our month-over-month (MoM) values and month-over-month rates (MoM_Rates) were as follows:

\(MoM= \frac{Count - lag(Count)}{lag(Count)}\) \(MoM\_Rate =MoM\times100 :rounded\ to\ the\ tenth\)

where lag() is a time series shift from the stats package.

# Month over Month employment rate change by gender 
#Dependency on df_tb9_tdy:     Rate and Gender
df_tb9_rt <- df_tb9_tdy %>% 
  slice(c(14,26)) %>% 
    gather("Month_Year", "Count", 2:14)

df_tb9_rt$Month_Year<- as.yearmon(df_tb9_rt$Month_Year)
df_tb9_rt$Count <- as.numeric(gsub(",","",df_tb9_rt$Count))

df_tb9_rt <- 
    df_tb9_rt %>% 
      group_by(Gender) %>% 
        mutate(MoM = (Count - lag(Count)) / lag(Count)) %>% 
          mutate(MoM_Rate =  round(MoM * 100, 2))

The result is a data frame with rates for ages 16 and over (or all age groups).

Label Gender Month_Year Count MoM MoM_Rate
Women, 16 years and over Female Nov 2020 5042 0.0430285 4.30
Men, 16 years and over Male Mar 2021 5394 0.0301757 3.02
Men, 16 years and over Male Jan 2021 5395 -0.0071770 -0.72
Men, 16 years and over Male Oct 2020 5894 -0.0107419 -1.07
Women, 16 years and over Female Mar 2021 4418 -0.0125168 -1.25
Women, 16 years and over Female Feb 2021 4474 -0.0225038 -2.25

Following a similar approach, but grouping by Gender & Label we create data frame df_tb9_MoM and can see the impact for each age bracket.

# Month over Month employment rate change by age and gender 
#Dependency on df_tb9_tdy:     Count, age, and Gender

df_tb9_MoM <- df_tb9_tdy %>% 
  slice(c(16,17,19,22,23,24,25,28,29,31,34,35,36,37)) %>% 
  gather("Month_Year", "Count", 2:14)

df_tb9_MoM$Month_Year<- as.yearmon(df_tb9_MoM$Month_Year) 
df_tb9_MoM$Count <- as.numeric(gsub(",","",df_tb9_MoM$Count))

df_tb9_MoM <- df_tb9_MoM %>% 
  group_by(Gender, Label) %>% 
  mutate(MoM = (Count - lag(Count)) / lag(Count)) %>% 
  mutate(MoM_Rate =  round(MoM * 100, 1))
Label Gender Month_Year Count MoM MoM_Rate
16 to 17 years Male Oct 2020 146 -0.1360947 -13.6
16 to 17 years Female Oct 2020 168 0.2923077 29.2
16 to 17 years Male Nov 2020 159 0.0890411 8.9
16 to 17 years Female Nov 2020 143 -0.1488095 -14.9
16 to 17 years Male Dec 2020 157 -0.0125786 -1.3
16 to 17 years Female Dec 2020 180 0.2587413 25.9
16 to 17 years Male Jan 2021 175 0.1146497 11.5
16 to 17 years Female Jan 2021 149 -0.1722222 -17.2
16 to 17 years Male Feb 2021 143 -0.1828571 -18.3
16 to 17 years Female Feb 2021 135 -0.0939597 -9.4
16 to 17 years Male Mar 2021 126 -0.1188811 -11.9

For our linear trend analysis df_tb10_trend is created for an additional perspective of unemployment by demographic by year.

#Dependency on df_tb10_tdy:     Rate and Gender
#data transformation operation: from wide to long function: gather, package: tidyr  
df_tb10_trend <- df_tb10_tdy %>% 
  slice(c(14,26)) %>% 
  gather("Month_Year", "Rate", 2:14)

#data transformation operation: date format function: as.yearmon, package: zoo  
df_tb10_trend$Month_Year<- as.yearmon(df_tb10_trend$Month_Year) 
df_tb10_trend$Rate <- as.numeric(df_tb10_trend$Rate)
Label Gender Month_Year Rate
Men, 16 years and over Male Mar 2020 13.6
Women, 16 years and over Female Mar 2020 16.1
Men, 16 years and over Male Apr 2020 12.2
Women, 16 years and over Female Apr 2020 14.5
Men, 16 years and over Male May 2020 10.5

Because the format of table 30 was slightly different from the latter, information it was tidied using the same overview but accounting for the specific rows and columns we required.

df_tb30 <- 
  as.data.frame(
    bind_rows(df_tb30[3,]
              ,df_tb30[5:33,])%>%
      select(-2,-3)%>%
                lapply(function(x){
                   gsub(" occupations", "", x)}
                        ))%>%
  `colnames<-`(c("Occupation_Label",
                 "Total_Unemployed_Rates_April_2020",
    "Total_Unemployed_Rates_April_2021", "Men_Unemployment_Rate_2020",
    "Men_Unemployment_Rate_2021", "Women_Unemployment_Rate_2020",
    "Women_Unemployment_Rate_2021"))

Industries are categorized as primary industry categories and sub categories in the original HTML file shown below


therefore 2 data frames were made with the gather() function from the tidyr package.

  • df_tb30_full : noting all main and sub industry names
  • df_tb30_sub : noting only the primary categories
  • df_tb30_rc : including rate change

Gender_Year column is created in the transformation with rate being transformed into a single column.

# All categories
df_tb30_full<-as.data.frame(df_tb30 %>% 
  select(-2,-3)%>%
    slice(c(2:30)) %>%
      gather("Gender_Year", "Rate", 2:5))%>%
        mutate(Rate = as.numeric(Rate))
# Only Primary categories
df_tb30_sub<-as.data.frame(df_tb30 %>% 
  select(-2,-3)%>%
    slice(c(3,6,15,21,24,28)) %>%
      gather("Gender_Year", "Rate", 2:5))%>%
        mutate(Rate = as.numeric(Rate))

An if else statement is used, referencing Gender_Year to categorize gender in our results.

df_tb30_full$Gender<-
  ifelse(grepl("Men", df_tb30_full$Gender_Year), "Male",
               ifelse(grepl("Women", df_tb30_full$Gender_Year), "Female", ""))

df_tb30_sub$Gender<-
  ifelse(grepl("Men", df_tb30_sub$Gender_Year), "Male",
         ifelse(grepl("Women", df_tb30_sub$Gender_Year), "Female", ""))

The Year column is added to df_tb30_sub using case_when() and endsWith()

df_tb30_sub <- 
  df_tb30_sub %>%
  mutate(Year = case_when(
    endsWith(Gender_Year, "2020") ~ "2020",
    endsWith(Gender_Year, "2021") ~ "2021"))

In order to create Rate_Change in df_tb30_rc we used the following calculation in mutate():

\(\frac{2021-2020}{2020} \times 100\)

df_tb30_rc<-
  df_tb30_sub %>%
    select(- 2) %>% 
      spread(Year,Rate)

df_tb30_rc<-df_tb30_rc%>%
  mutate(Rate_Change =
    round((((df_tb30_rc$'2021' - df_tb30_rc$'2020')/df_tb30_rc$'2020')*100),2))

The resulting data frames are as follows:

df_tb30_full

Occupation_Label Gender_Year Rate Gender
Management, professional, and related Men_Unemployment_Rate_2020 6.6 Male
Management, business, and financial operations Men_Unemployment_Rate_2020 5.8 Male
Management Men_Unemployment_Rate_2020 5.9 Male
Business and financial operations Men_Unemployment_Rate_2020 5.3 Male
Professional and related Men_Unemployment_Rate_2020 7.3 Male
Computer and mathematical Men_Unemployment_Rate_2020 4.2 Male
Architecture and engineering Men_Unemployment_Rate_2020 5.2 Male
Life, physical, and social science Men_Unemployment_Rate_2020 7.1 Male
Community and social service Men_Unemployment_Rate_2020 5.1 Male
Legal Men_Unemployment_Rate_2020 4.8 Male

df_tb30_sub

Occupation_Label Gender_Year Rate Gender Year
Management, business, and financial operations Men_Unemployment_Rate_2020 5.8 Male 2020
Professional and related Men_Unemployment_Rate_2020 7.3 Male 2020
Service Men_Unemployment_Rate_2020 23.6 Male 2020
Sales and office Men_Unemployment_Rate_2020 12.6 Male 2020
Natural resources, construction, and maintenance Men_Unemployment_Rate_2020 16.2 Male 2020
Production, transportation, and material moving Men_Unemployment_Rate_2020 17.4 Male 2020
Management, business, and financial operations Men_Unemployment_Rate_2021 2.9 Male 2021
Professional and related Men_Unemployment_Rate_2021 3.1 Male 2021
Service Men_Unemployment_Rate_2021 8.2 Male 2021
Sales and office Men_Unemployment_Rate_2021 5.6 Male 2021

df_tb30_rc

Occupation_Label Gender 2020 2021 Rate_Change
Management, business, and financial operations Female 6.7 3.4 -49.25
Management, business, and financial operations Male 5.8 2.9 -50.00
Natural resources, construction, and maintenance Female 17.0 10.2 -40.00
Natural resources, construction, and maintenance Male 16.2 7.8 -51.85
Production, transportation, and material moving Female 21.1 9.0 -57.35
Production, transportation, and material moving Male 17.4 7.8 -55.17
Professional and related Female 9.9 2.9 -70.71
Professional and related Male 7.3 3.1 -57.53
Sales and office Female 16.1 5.6 -65.22
Sales and office Male 12.6 5.6 -55.56
Service Female 29.8 7.9 -73.49
Service Male 23.6 8.2 -65.25

Finally for our analysis of married women vs. married men rates we utilize the rows 43 and 44 from df_tb10 and transform in a similar fashion as the above data.

df_tb10marital<-df_tb10[43:44,]
names(df_tb10marital)[1] <- "Marital Label"
df_tb10marital$`Marital Label`<- c("Male with Spouse Present","Female with Spouse Present")
df_tb10marital$Gender<-c("Male","Female")

df_tb10marital <- 
  as.data.frame(df_tb10marital %>% 
    gather("Month_Year", "Rate", 2:14))%>%
      mutate(Rate = as.numeric(Rate))%>%
        mutate(Month_Year = as.yearmon(Month_Year))

df_tb10marital

Marital Label Gender Month_Year Rate
Male with Spouse Present Male Mar 2020 9.6
Female with Spouse Present Female Mar 2020 13.1
Male with Spouse Present Male Apr 2020 8.2
Female with Spouse Present Female Apr 2020 11.5
Male with Spouse Present Male May 2020 6.9

Exploratory Data Analysis

The data from Unemployment by Occupation and Genderseems to support that women were more significantly impacted by unemployment in 2020, than men in virtually all industries.

ggplot(df_tb30_full, aes(x=fct_reorder(factor(Occupation_Label),Rate), y = (Rate),color = Gender,
                         group = Gender, fill=Gender)) +
  geom_col(position="dodge") +
  theme_minimal()+
  theme(axis.text.x = element_text( hjust=1))+
  labs(title = 'Unemployment by Occupation and Gender'
       ,subtitle = "4/2020-4/2021",
       x = "Occupation", y = "Rate")+
  coord_flip()

However when running summary() on our data, we see an low R-squared value of 0.0118. This indicates that our data does not fit well in our regression model.

#testing linear model. issue: gender is categorical var- need to look into this
# only accounts for 1.18% of data, Multiple R-squared:  0.0118
summary(lm(Count~Gender, df_tb9_rt))
## 
## Call:
## lm(formula = Count ~ Gender, data = df_tb9_rt)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -2123  -1627  -1095   1431   5293 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   6540.8      634.3  10.312 2.69e-10 ***
## GenderMale     480.2      897.1   0.535    0.597    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2287 on 24 degrees of freedom
## Multiple R-squared:  0.0118, Adjusted R-squared:  -0.02938 
## F-statistic: 0.2865 on 1 and 24 DF,  p-value: 0.5974

This fact is supported when running plot(), which results in various charts that we cannot used to support whether a relationship between unemployment and gender exists.

plot(lm(Count~Gender, df_tb9_rt))

## hat values (leverages) are all = 0.07692308
##  and there are no factor predictors; no plot no. 5

Analysis

Based on the the data seen in the below, women experienced a substantial loss in employment from the months of March to August 2020. This coincides with the first day of stay at home order in the United States.

Gender & Marital Status

Marital Status and Unemployment Rate

At the beginning of the pandemic the unemployment rates were higher in the married women than married men. As time progresses the unemployment rate decreases for both the married men and women. However, married women are still experiencing higher unemployment rate than the married men.

ggplot(df_tb10marital, aes(x=factor(Month_Year), y = (Rate),color = `Gender`,
   group = `Gender`, fill=`Gender`)) +
   geom_col(position="dodge") +
   theme_minimal()+
   theme(axis.text.x = element_text(hjust=1))+
   labs(title = 'Unemployment Rates by Gender and Marital Status'
        ,subtitle = "Married with Spouse Present",
        x = "Month-Year", y = "Rate")+
   coord_flip()

Gender & Age

Overall

We see that both male and females saw their largest unemployment spike in early March, women consistently having a higher rate monthly until about September 2020.

df_tb10_trend

ggplot(df_tb10_trend, aes(x=factor(Month_Year), y = (Rate),color = Gender,
                          group = Gender, fill=Gender)) +
  geom_col(position="dodge") +
  theme_minimal()+
  theme(axis.text.x = element_text(angle = 45, hjust=1))+
  labs(title = 'Unemployment Rates by Gender'
       ,subtitle = "03/2020-03/2021",
       x = "Month", y = "Rate")

Additionally, when broken down into age groups, we can see 18 to 19 years, 16 to 17 years and 35 to 44 years of age where the highest for women and far exceeded the rates for men in the early months.

df_tb10_ln

(ggplot(data = df_tb10_ln, aes(Month_Year,Rate,color = Gender,group = Gender))+
  geom_line(size = 1) +
  geom_point() + 
  theme_minimal()+
  theme(axis.text.x = element_text(angle = 45, hjust=1))+
  labs(title = 'Unemployment Rates by Age Group and Gender'
       ,subtitle = "03/2020-03/2021",
       x = "Month", y = "Rate")+
  facet_wrap( ~ Label, scales = "free_y"))

Month-over-month

Grouping and plotting our data by month, allows us to see the trend from March 2020 to March 2021. Doing so paints a clearer picture of the impact by demographic of COVID.

df_tb9_rt

ggplot(df_tb9_rt, aes(x=factor(Month_Year), y=MoM, fill=Gender)) + 
  geom_col(position="dodge") +
  scale_y_continuous(labels = scales::percent)+  
  theme_minimal()+
  theme(axis.text.x = element_text(angle = 45, hjust=1))+
  labs(title = 'Month over Month Unemployment Rate Change by Gender'
       ,subtitle = "03/2020-03/2021, in thousands",
       x = "Month", y = "Rate Chnage")

df_tb9_MoM

##bar plot month over month age grp and geneder
(ggplot(df_tb9_MoM, aes(x=factor(Month_Year), y = (MoM), color = Gender,
                        group = Gender, fill=Gender)) +
  geom_col(position="dodge") +
  theme_minimal()+
  theme(axis.text.x = element_text(angle=90, hjust=1))+
  labs(title= 'Month over Month Unemployment Rate Change by Age Group and Gender',
       subtitle="03/2020-03/2021",
       x="Month", y="Rate Change")+
  facet_wrap( ~ Label, scales = "free_y"))

Gender & Occupation

When the data is stratified by industry, we see the trend holds true regardless of profession.

(ggplot(data = df_tb30_sub, aes(Year, Rate, color = Gender,
                                group = Gender, fill = Gender)) +
  geom_col(position="dodge") +
  theme_minimal()+
  theme(axis.text.x = element_text(angle = 45, hjust=1))+
  labs(title = 'Unemployment Rates by Occupation and Gender'
       ,subtitle = "2020-2021",
       x = "Year", y = "Rate")+
  facet_wrap( ~ Occupation_Label, scales = "free_y",
              ncol = 3,labeller = labeller(Service = "Gender_Year")))

##bar plot month over month age group and gender
ggplot(df_tb30_rc, aes(x=factor(Occupation_Label), y = (Rate_Change), color = Gender,
                        group = Gender, fill=Gender)) +
  geom_col(position="dodge") +
  theme_minimal()+
  theme(axis.text.x = element_text(hjust=1))+
  labs(title= 'Rate Change Per Industry',
       subtitle="04/2020-04/2021",
       x="Industry", y="Rate Change")+coord_flip()

Conclusion

Our results support the title for the the article which inspired us, since it was published August 5,2020. At this point in the pandemic women were losing or leaving their jobs at a alarming rate. The concerns raised at the time were valid, and was at least 1 month away from producing data to supported the contrary. However, with the pandemic coming to a close (hopefully), we believe the trend should lean more favorably in women returning to the workforce.