Project 2

How has the proportion of men and women in the workforce differed over time?

Introduction

The World Development Indicators (WDI) Dataset is provided by the United Nations Educational, Scientific and Cultural Organization (UNESCO). It contains national, regional, and global estimates of development and population data.

Source: https://data.unesco.org/explore/dataset/wdi001/information/

Key Variables:
- Year
- Regional Group
- Labor Force (Male, Female, Total)
- Population (Male, Female)

This dataset includes 6326 observations and 25 variables.

Load libraries and dataset

library(tidyverse)
library(RColorBrewer)
countries <- read_csv("wdi001.csv")

Data Analysis

I will filter the dataset to countries in Asia, group by year, and summarize to find the mean male and female populations and the mean percentage of men and women in the workforce. I will then create a line plot to show how these percentages differ over time.

Inspect data

# str(countries)
head(countries)

## # A tibble: 6 × 25
##    Year `Regional Group`  Country `GDP Total` `GDP Growth Rate` `GDP Per Capita`
##   <dbl> <chr>             <chr>         <dbl>             <dbl>            <dbl>
## 1  1964 Europe and North… Andorra         NA              NA                 NA 
## 2  1968 Europe and North… Andorra         NA              NA                 NA 
## 3  1970 Europe and North… Andorra   78617711.             NA               3935.
## 4  1972 Europe and North… Andorra  113414397.              8.15            4940.
## 5  1974 Europe and North… Andorra  186557082.              5.62            7140.
## 6  1977 Europe and North… Andorra  253997897.              2.84            8168.
## # ℹ 19 more variables: `Youth Literacy Rate` <dbl>,
## #   `Adult Literacy Rate` <dbl>, `Primary School Enrollment` <dbl>,
## #   `Secondary School Enrollment` <dbl>, `Tertiary School Enrollment` <dbl>,
## #   `Labor Force Female` <dbl>, `Labor Force Male` <dbl>,
## #   `Labor Force Total` <dbl>, `Unemployment Rate` <dbl>,
## #   `Life Expectancy` <dbl>, `Population Aged 0-14` <dbl>,
## #   `Population Aged 15-64` <dbl>, `Population Aged 65-up` <dbl>, …

Clean

names(countries) <- gsub(" ", "_", names(countries))  # replace spaces with underscores
names(countries) <- tolower(names(countries))         # lowercase variable names

countries <- countries |>
  filter(!is.na(labor_force_total))
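
The cleaning steps above can be checked on a small example. This is a minimal sketch on a made-up two-row tibble (`toy` and its values are hypothetical, not taken from the dataset), and it uses `rename_with()` as one possible way to do both renaming steps in a single pipeline.

```r
library(dplyr)

# Toy tibble standing in for the real data (values are made up)
toy <- tibble(
  `Regional Group`    = c("Arab States", "Asia and the Pacific"),
  `Labor Force Total` = c(61.4, NA)
)

toy_clean <- toy |>
  # replace spaces with underscores, then lowercase every column name
  rename_with(\(x) tolower(gsub(" ", "_", x))) |>
  # drop rows with no total labor force value
  filter(!is.na(labor_force_total))

names(toy_clean)
nrow(toy_clean)
```

The row with a missing `labor_force_total` is dropped, which mirrors the filter applied to `countries`.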

Filter to the regional groups used to represent Asia (this includes "Arab States").

countriesasia <- countries |>
  select(c(year, 
           country,
           regional_group,
           labor_force_female, 
           labor_force_male, 
           labor_force_total,
           female_population,
           male_population,
           population_total)) |>
  filter(regional_group %in% c("Asia and the Pacific", 
                               "Arab States", 
                               "Arab States,Asia and the Pacific"))
   
# Convert male/female population shares (%) to population counts
countriesasia <- countriesasia |>
  mutate(population_female = (female_population/100) * population_total,
         population_male = (male_population/100) * population_total)
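
As a quick illustration of the share-to-count conversion, here is a sketch on made-up numbers. It assumes, as the division by 100 above suggests, that `female_population` and `male_population` are percentages of `population_total`; the `toy` values are hypothetical.

```r
library(dplyr)

# Toy rows (made-up values): shares are percentages of the total population
toy <- tibble(
  population_total  = c(1000, 2500),
  female_population = c(49, 52),   # % female (assumed to be a percentage)
  male_population   = c(51, 48)    # % male (assumed to be a percentage)
)

toy_counts <- toy |>
  mutate(population_female = (female_population / 100) * population_total,
         population_male   = (male_population / 100) * population_total)

toy_counts
```

When the two shares sum to 100, the two counts sum back to `population_total`, which is a useful sanity check on the real data.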

Prepare dataset for statistical analysis and plotting

#Find mean of labor percentages across Asia. 
countriesasia2 <- countriesasia |>
  group_by(year) |>
  summarise(mean_total_labor = mean(labor_force_total),  
            mean_female_labor = mean(labor_force_female),
            mean_male_labor = mean(labor_force_male),
            mean_female_pop = mean(population_female),
            mean_male_pop = mean(population_male)) |>
  pivot_longer(cols = c(mean_total_labor,   #Separate percentages for graphing
                        mean_female_labor, 
                        mean_male_labor),
               names_to = "Category",
               values_to = "Percentage")
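
To show what the summarise and `pivot_longer()` steps produce, here is a minimal sketch on a made-up tibble with two countries per year (the `toy` values are hypothetical). Each year ends up with one row per category, which is the shape `ggplot()` needs to draw one line per category.

```r
library(dplyr)
library(tidyr)

# Toy data: two countries per year (made-up percentages)
toy <- tibble(
  year               = c(1990, 1990, 2024, 2024),
  labor_force_female = c(40, 46, 44, 48),
  labor_force_male   = c(76, 80, 72, 74)
)

toy_long <- toy |>
  group_by(year) |>
  summarise(mean_female_labor = mean(labor_force_female),
            mean_male_labor   = mean(labor_force_male)) |>
  # one row per year and category instead of one column per category
  pivot_longer(cols = c(mean_female_labor, mean_male_labor),
               names_to = "Category",
               values_to = "Percentage")

toy_long
```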

Plot

ggplot(countriesasia2, aes(x = year, y = Percentage, color = Category)) + 
  geom_line(linewidth = 0.9) +  # `size` is deprecated for lines since ggplot2 3.4.0
  geom_point() + 
  labs(title = "Avg. % Population in Asia's Labor Force over Time (1990-2024)",
       x = "Year",
       y = "Avg. % Population in Labor Force",
       caption = "Source: UNESCO") +
  scale_color_brewer(palette = "Dark2", name = "",
                     labels = c("Female", "Male", "Total")) + 
  ylim(40,80) +
  theme_bw()

Statistical Analysis

Two Proportions Z-Test

Is the proportion of males in the workforce higher than the proportion of females in the workforce in Asia in 1990 and 2024?

\(H_0: p_1 = p_2\)
\(H_a: p_1 > p_2\)

Where:
\(p_1\) = proportion of male population in the workforce
\(p_2\) = proportion of female population in the workforce
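
For reference, the usual pooled test statistic for these hypotheses can be written as follows. The symbols \(x_1, x_2\) (number of males and females in the workforce) and \(n_1, n_2\) (male and female population sizes) are notation introduced here for illustration, not taken from the dataset.

\[
\hat{p}_1 = \frac{x_1}{n_1}, \qquad
\hat{p}_2 = \frac{x_2}{n_2}, \qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad
z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
\]

Under \(H_0\), \(z\) is approximately standard normal, and large positive values support \(H_a\). R's `prop.test()` reports the chi-squared form of this statistic, which equals \(z^2\) when no continuity correction is applied.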

head(countriesasia2, 3)

## # A tibble: 3 × 5
##    year mean_female_pop mean_male_pop Category          Percentage
##   <dbl>           <dbl>         <dbl> <chr>                  <dbl>
## 1  1990       30306549.     31488399. mean_total_labor        61.4
## 2  1990       30306549.     31488399. mean_female_labor       43.1
## 3  1990       30306549.     31488399. mean_male_labor         77.7

tail(countriesasia2, 3)

## # A tibble: 3 × 5
##    year mean_female_pop mean_male_pop Category          Percentage
##   <dbl>           <dbl>         <dbl> <chr>                  <dbl>
## 1  2024       47608549.     49355122. mean_total_labor        60.5
## 2  2024       47608549.     49355122. mean_female_labor       46.0
## 3  2024       47608549.     49355122. mean_male_labor         73.2

1990

# Approximate male workers: mean male population * male labor force proportion
31488399 * 0.7768706

## [1] 24462411

# Approximate female workers: mean female population * female labor force proportion
30306549 * 0.4306110

## [1] 13050333

prop.test(c(24462411, 13050333), c(31488399, 30306549), alternative = "greater")

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(24462411, 13050333) out of c(31488399, 30306549)
## X-squared = 7762054, df = 1, p-value < 2.2e-16
## alternative hypothesis: greater
## 95 percent confidence interval:
##  0.3460678 1.0000000
## sample estimates:
##    prop 1    prop 2 
## 0.7768706 0.4306110
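
To connect this output to a z-test, the sketch below recomputes the pooled z statistic by hand from the 1990 counts used above and compares its square with the chi-squared statistic from `prop.test()`. Note the assumption: the comparison uses `correct = FALSE`, so the chi-squared value will differ slightly from the continuity-corrected value printed above.

```r
# 1990 counts used above: workers (x) and mean populations (n), male then female
x <- c(24462411, 13050333)
n <- c(31488399, 30306549)

p_hat  <- x / n             # sample proportions
p_pool <- sum(x) / sum(n)   # pooled proportion under H0

# Pooled two-proportion z statistic
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))

# Chi-squared statistic from prop.test without the continuity correction
chisq <- prop.test(x, n, alternative = "greater", correct = FALSE)$statistic

# z^2 should match the chi-squared statistic
all.equal(z^2, unname(chisq))

# One-sided p-value from the normal distribution
p_value <- pnorm(z, lower.tail = FALSE)
```

With populations in the tens of millions, the standard error is tiny, so even a small gap would produce a very large statistic; the size of the gap itself is the more informative number here.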

2024

# Approximate male workers: mean male population * male labor force proportion
49355122 * 0.7316760

## [1] 36111958

# Approximate female workers: mean female population * female labor force proportion
47608549 * 0.4599562

## [1] 21897847

prop.test(c(36111958, 21897847), c(49355122, 47608549), alternative = "greater")

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(36111958, 21897847) out of c(49355122, 47608549)
## X-squared = 7444178, df = 1, p-value < 2.2e-16
## alternative hypothesis: greater
## 95 percent confidence interval:
##  0.2715621 1.0000000
## sample estimates:
##    prop 1    prop 2 
## 0.7316760 0.4599562
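
Both years repeat the same steps, so they could be wrapped in a small function. This is only a sketch: `gap_test()` is a hypothetical helper name, and the inputs are the mean populations and proportions copied from the output above.

```r
# Hypothetical helper: turn mean populations and participation proportions
# into approximate worker counts, then run the same one-sided test as above.
gap_test <- function(pop_male, pop_female, share_male, share_female) {
  workers <- round(c(pop_male * share_male, pop_female * share_female))
  prop.test(workers, c(pop_male, pop_female), alternative = "greater")
}

res_1990 <- gap_test(31488399, 30306549, 0.7768706, 0.4306110)
res_2024 <- gap_test(49355122, 47608549, 0.7316760, 0.4599562)

# Estimated gap (male proportion minus female proportion) in each year
gap_1990 <- unname(res_1990$estimate[1] - res_1990$estimate[2])
gap_2024 <- unname(res_2024$estimate[1] - res_2024$estimate[2])
c(gap_1990 = gap_1990, gap_2024 = gap_2024)
```

Comparing `gap_1990` with `gap_2024` gives a direct view of how much the difference between the two proportions changed between the two years.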

P-value: < 2.2e-16 (both 1990 and 2024)
α = 0.05

Because the p-value is below α in both years, I reject \(H_0\). There is strong evidence to support the alternative hypothesis that the proportion of males in the workforce is higher than the proportion of females in the workforce in Asia.

One-sided 95% CI for the difference (\(p_1 - p_2\)): (0.346, 1.000) in 1990 and (0.272, 1.000) in 2024
Both intervals are entirely above 0, showing there is a higher proportion of males in the workforce than females in Asia.
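
The lower bound reported by `prop.test()` for 2024 can be approximated by hand. This sketch uses the unpooled standard error and ignores the continuity correction, which is an assumption that only holds because the populations are so large that the correction is negligible.

```r
# 2024 counts used above: workers (x) and mean populations (n), male then female
x <- c(36111958, 21897847)
n <- c(49355122, 47608549)

p_hat <- x / n
gap   <- p_hat[1] - p_hat[2]                  # difference in proportions
se    <- sqrt(sum(p_hat * (1 - p_hat) / n))   # unpooled standard error

# One-sided 95% lower bound; the upper bound is fixed at 1
lower <- gap - qnorm(0.95) * se
lower
```

The result should agree closely with the lower limit in the 2024 `prop.test()` output, and because it is above 0 the data are consistent with a higher male proportion.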

Conclusion

In my analysis, I found that the average proportion of males in the workforce is higher than the average proportion of females in the workforce in Asian countries. I also found that while the percentage of males in the workforce has decreased since 1990 (77.7% to 73.2%), the percentage of females in the workforce has increased (43.1% to 46.0%). This answers my question of how the proportion differs between men and women and how it has changed over time. The results show a statistically significant gender gap in workforce participation across countries in Asia. This may be due to traditional cultural norms; however, the data also show that more women are entering the workforce over time. In the future, I would like to further explore the rise in women entering the workforce by noting which jobs are most popular among the population, and I would also like to examine this growth in individual countries rather than the continent as a whole.

References

Dataset: https://data.unesco.org/explore/dataset/wdi001/information/
RColorBrewer library from DATA110.