Overview of the Data

The data comes from data.lacity.org and covers job applicants from the 2013-2014 and 2014-2015 fiscal years. It includes applicants’ gender and ethnicity. The data was last updated on December 1, 2016, and the metadata on November 30, 2020.

For more details see https://data.lacity.org/Administration-Finance/Job-Applicants-by-Gender-and-Ethnicity/mkf9-fagf/about_data.

Initial look at the data + Pre-processing

There’s a lot of interesting data to explore here, but for this project I want to focus on gender differences between applicants.

Upon opening the data in a spreadsheet, I noticed that it covers many types of jobs for LA County. To keep the categories consistent, I added a column for occupation type based on the 2018 Standard Occupational Classification (SOC) system from the U.S. Bureau of Labor Statistics.

Let’s take a look at the number of applications for the 2013-2014 fiscal year versus the 2014-2015 fiscal year.

Fewer Applicants in 2014-2015 than in 2013-2014

The total number of applicants decreased significantly from 2013-2014 to 2014-2015. Let’s see if the ratio between women and men applying changed in any way.

Fewer Female Applicants in the 2014-2015 Fiscal Year

Interesting. While the 2013-2014 fiscal year had a more balanced distribution between male and female applicants, the 2014-2015 fiscal year shows a large disparity between the two.

Let’s open up this data in R to find out why.

Set up our environment

# Load relevant libraries
library(tidyverse)  # Includes ggplot2, dplyr, stringr, etc.

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(janitor)    # For cleaning column names and data

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(ggrepel)    # For better label placement in ggplot2

# Load up the data
df <- read_csv("../data/Los Angeles County Job Applicants Dataset.csv") %>%
  clean_names()

## Rows: 187 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Fiscal Year, HR Designations, Occupation Type
## dbl (1): Unknown Gender
## num (3): Apps Received, Female, Male
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Let’s look at the column types.

glimpse(df)

## Rows: 187
## Columns: 7
## $ fiscal_year     <chr> "2013-2014", "2013-2014", "2013-2014", "2013-2014", "2…
## $ hr_designations <chr> "OP", "P", "OP", "P", "O", "P", "OP", "O", "P", "O", "…
## $ occupation_type <chr> "Management", "Office and Administrative Support", "Ma…
## $ apps_received   <dbl> 54, 648, 51, 48, 40, 161, 102, 702, 105, 897, 329, 104…
## $ female          <dbl> 20, 488, 13, 9, 15, 89, 53, 430, 3, 467, 27, 27, 2, 7,…
## $ male            <dbl> 31, 152, 37, 38, 24, 66, 48, 240, 101, 411, 294, 75, 1…
## $ unknown_gender  <dbl> 3, 8, 1, 1, 1, 6, 1, 32, 1, 19, 8, 2, 2, 1, 1, 0, 0, 1…

# Check for rows with missing values; an empty result means every column loaded fully
df %>% filter(if_any(everything(), is.na))

Looks about right.
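
One more check worth considering: the first rows of the glimpse() output suggest that female, male, and unknown_gender add up to apps_received (for example, 20 + 31 + 3 = 54). Below is a minimal sketch of that check. It assumes the relationship is meant to hold for every row, and it runs on a small stand-in tibble built from the first three rows of the glimpse() output rather than on the full file. The names toy, gender_total, and mismatches are illustrative and not part of the original analysis.

```r
library(dplyr)
library(tibble)

# Stand-in data: the first three rows shown in the glimpse() output
toy <- tibble(
  fiscal_year    = c("2013-2014", "2013-2014", "2013-2014"),
  apps_received  = c(54, 648, 51),
  female         = c(20, 488, 13),
  male           = c(31, 152, 37),
  unknown_gender = c(3, 8, 1)
)

# Keep only rows where the gender columns do not add up to the total
mismatches <- toy %>%
  mutate(gender_total = female + male + unknown_gender) %>%
  filter(gender_total != apps_received)

nrow(mismatches)  # 0 for these three rows
```

Running the same pipeline on df instead of toy would list any rows worth a second look before summarizing.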

Let’s create a helper function to summarize data by occupation type.

# Helper Function: Summarizing by occupation
summarize_by_occupation <- function(df) {
  df %>% 
    group_by(occupation_type) %>% 
    summarize(
      total_male = sum(male, na.rm = TRUE),
      total_female = sum(female, na.rm = TRUE),
      total_apps = sum(apps_received, na.rm = TRUE),
      .groups = "drop")
}
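
The same group_by() and summarize() pattern could also reproduce the fiscal-year comparison from the spreadsheet charts. The sketch below is an assumption about how that might look in R, not the code behind those charts. It runs on made-up numbers (the tibble toy is not the real data), and summarize_by_year and female_share are hypothetical names.

```r
library(dplyr)
library(tibble)

# Made-up numbers for illustration only; not taken from the dataset
toy <- tribble(
  ~fiscal_year, ~apps_received, ~female, ~male,
  "2013-2014",  100,            45,      50,
  "2013-2014",  200,            110,     80,
  "2014-2015",  80,             20,      55,
  "2014-2015",  120,            40,      75
)

# Hypothetical helper: totals and female share per fiscal year
summarize_by_year <- function(df) {
  df %>%
    group_by(fiscal_year) %>%
    summarize(
      total_apps   = sum(apps_received, na.rm = TRUE),
      total_female = sum(female, na.rm = TRUE),
      total_male   = sum(male, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    # Share of women among applicants with a recorded male or female gender
    mutate(female_share = total_female / (total_female + total_male))
}

by_year <- summarize_by_year(toy)
by_year
```

Dividing by total_female + total_male leaves applicants of unknown gender out of the share, which is one possible choice; dividing by total_apps would count them in the denominator instead.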

Looking into the distribution of applicants by job category

# Helper: Truncate and wrap text
truncate_and_wrap <- function(text, truncate_at = 25, wrap_width = 15) {
  short <- ifelse(nchar(text) > truncate_at, paste0(substr(text, 1, truncate_at), "..."), text)
  str_wrap(short, width = wrap_width)
}

# Prepare summarized and processed data
df_gender_diff <- df %>% 
  summarize_by_occupation() %>% 
  mutate(
    gender_dominance = ifelse(total_male > total_female, "male", "female"),
    gender_diff = abs(total_male - total_female)
  ) %>% 
  select(occupation_type, gender_diff, gender_dominance) %>% 
  arrange(desc(gender_dominance), desc(gender_diff)) %>%
  mutate(
    id = row_number(),
    wrapped_short_label = truncate_and_wrap(occupation_type),
    log_gender_diff = log1p(gender_diff),
    angle = 90 - 360 * (id - 0.5) / n(),
    hjust = ifelse(angle < -90, 1, 0),
    angle = ifelse(angle < -90, angle + 180, angle)
  )

# Create the plot
ggplot(df_gender_diff, aes(x = factor(id), y = log_gender_diff, fill = gender_dominance)) +
  geom_bar(stat = "identity", alpha = 0.7) +

  # Inside bar value labels
  geom_text(aes(label = gender_diff, y = log_gender_diff / 2), 
            color = "black", size = 2.5) +

  # Outside wrapped labels
  geom_text(
    aes(
      label = wrapped_short_label,
      y = log_gender_diff + 0.5,
      angle = angle,
      hjust = hjust
    ),
    size = 1.8
  ) +

  coord_polar(start = 0) +
  ylim(-6, 12) +
  theme_minimal() +
  theme(
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank(),
    plot.title = element_text(face = "bold")
  ) +
  scale_fill_manual(
    name = "Gender",
    values = c("male" = "#1F77B4", "female" = "#FF69B4")
  ) +
  labs(
    title = "Gender Dominance in Applicants by Occupation",
    subtitle = "Only 5 out of 19 Categories are Dominated by Women"
  )
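
The label rotation in the chart code above is easier to follow on a tiny example. The sketch below applies the same angle and hjust formulas to four evenly spaced bars instead of 19; the tibble labels and the column raw_angle are illustrative names only.

```r
library(dplyr)
library(tibble)

# Four evenly spaced bars standing in for the occupation categories
labels <- tibble(id = 1:4) %>%
  mutate(
    # Rotation that points a label outward from the centre of bar `id`
    raw_angle = 90 - 360 * (id - 0.5) / n(),
    # Labels rotated past -90 degrees would read upside down,
    # so they are right-aligned and flipped by 180 degrees
    hjust = ifelse(raw_angle < -90, 1, 0),
    angle = ifelse(raw_angle < -90, raw_angle + 180, raw_angle)
  )

labels
```

Here the raw angles are 45, -45, -135, and -225, and the last two are flipped to 45 and -45 with hjust = 1. In the chart code, hjust has to be computed before angle is overwritten, because mutate() evaluates its arguments in order.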

It appears that women dominate applications in only five job groups:

Office and Administrative Support
Business and Financial
Community and Social Service
Educational Instruction
Healthcare Practitioners

What if we look into the top 5 job categories with greatest gender differences?

Top 5 job categories with greatest gender difference

df %>%
  summarize_by_occupation() %>%  # Summarize data by occupation
  # Determine which gender dominates by comparing totals
  # Calculate absolute difference between male and female totals
  mutate(
    gender_dominance = ifelse(total_male > total_female, "male", "female"),
    gender_diff = abs(total_male - total_female)
  ) %>%
  # Keep only relevant columns for plotting, sort, and slice top 5
  select(occupation_type, gender_diff, gender_dominance) %>%
  arrange(desc(gender_diff)) %>%
  slice_head(n = 5) %>%
  # Mutate occupation_type for graphing by wrapping names, and add a factor level for charting
  mutate(
    occupation_type = str_wrap(occupation_type, width = 15),
    occupation_type = fct_reorder(occupation_type, gender_diff, .desc = TRUE)
  ) %>%
  # CREATING THE CHART
  ggplot(aes(x = occupation_type, y = gender_diff, fill = gender_dominance)) +
  geom_col() +
  # Add numeric labels above bars for exact difference values
  geom_text(aes(label = gender_diff), vjust = -0.3, size = 3.5) +
  theme_minimal() +
  labs(
    title = "Top 5 Occupations with Greatest Difference in Gender",
    subtitle = "The categories Protective Service and Office Administrative stand out",
    x = "Occupation Type",
    y = "Difference in # of Applicants"
  ) +
  # Bold plot title and custom colors
  theme(plot.title = element_text(face = "bold")) +
  scale_fill_manual(name = "Gender", values = c("male" = "#1F77B4", "female" = "#FF69B4"))

Looking at the top 5 occupations with the greatest gender differences, Protective Service and Office and Administrative Support stand out the most.

Let’s take a closer look at how these two categories changed between the fiscal years.

Prepare data for visualization

# Split original df by year
df_2013_2014_raw <- filter(df, fiscal_year == "2013-2014")
df_2014_2015_raw <- filter(df, fiscal_year == "2014-2015")

# Helper function: summarize and label
summarize_and_tag <- function(df, label) {
  summarize_by_occupation(df) %>%
    mutate(fiscal_year = label)
}

# Summarize and label each year
df_2013_2014 <- summarize_and_tag(df_2013_2014_raw, "2013_2014")
df_2014_2015 <- summarize_and_tag(df_2014_2015_raw, "2014_2015")

# --- Create yearly gender difference table ---
df_yearly_gender_diff <- full_join(
  df_2013_2014 %>% select(-fiscal_year),
  df_2014_2015 %>% select(-fiscal_year),
  by = "occupation_type"
) %>%
  # Replace missing counts with 0 (numeric columns only, so occupation_type is left untouched)
  mutate(across(where(is.numeric), ~ replace_na(.x, 0))) %>%
  # Calculate differences
  mutate(
    male_diff = total_male.y - total_male.x,
    female_diff = total_female.y - total_female.x,
    total_diff = total_apps.y - total_apps.x
  ) %>%
  select(occupation_type, male_diff, female_diff, total_diff) %>%
  arrange(female_diff)

# --- Create long-format data for plotting ---
df_yearly_apps <- bind_rows(df_2013_2014, df_2014_2015) %>%
  pivot_longer(
    cols = c(total_male, total_female),
    names_to = "gender",
    values_to = "num_of_apps"
  ) %>%
  mutate(
    gender = str_remove(gender, "total_"),
    log_num_of_apps = log1p(num_of_apps),
    year_numeric = if_else(fiscal_year == "2013_2014", 2013, 2015)
  )
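
The df_yearly_gender_diff table is built above but never printed. One way to use a table like it is to list the categories where the male count grew from one year to the next. The sketch below does that with pivot_wider() instead of full_join(), which is a swapped-in technique and not the approach used above. It runs on made-up numbers, and toy, yearly_change, grew_male, and the category names are all hypothetical.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Made-up per-occupation totals for illustration only
toy <- tribble(
  ~occupation_type, ~fiscal_year, ~total_male, ~total_female,
  "Category A",     "2013_2014",  500,         900,
  "Category A",     "2014_2015",  300,         400,
  "Category B",     "2013_2014",  700,         200,
  "Category B",     "2014_2015",  950,         100,
  "Category C",     "2013_2014",  50,          60
)

yearly_change <- toy %>%
  pivot_wider(
    names_from = fiscal_year,
    values_from = c(total_male, total_female),
    values_fill = 0  # a category missing in one year counts as 0 applicants
  ) %>%
  mutate(
    male_diff   = total_male_2014_2015 - total_male_2013_2014,
    female_diff = total_female_2014_2015 - total_female_2013_2014
  ) %>%
  select(occupation_type, male_diff, female_diff)

# Categories where male applicants increased year over year
grew_male <- yearly_change %>% filter(male_diff > 0)
grew_male
```

With these made-up numbers only Category B is returned. Applying the same filter to df_yearly_gender_diff would show which real categories, if any, gained male applicants.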

Office and Administrative Support

Let’s first look into Office and Administrative Support by creating a slope graph between the 2013-2014 and 2014-2015 fiscal years.

## CHART: Office and Administrative Support

df_male <- df_yearly_apps %>% filter(gender == "male")
df_female <- df_yearly_apps %>% filter(gender == "female")
selected_occupation <- c("Office and Administrative Support")


# 1. Start with a single ggplot() call.
ggplot() + 
  # 2. Add the light grey lines for male and female
  geom_line(
    data = df_male, 
    aes(x = year_numeric, y = num_of_apps, group = occupation_type),
    color = "grey",
    alpha = 0.5
  ) +
  geom_line(
    data = df_female, 
    aes(x = year_numeric, y = num_of_apps, group = occupation_type), 
    color = "grey",
    alpha = 0.5
  ) +
  # 3. Add darker blue line for relevant occupation
  geom_line(
    data = df_male %>% 
      filter(occupation_type %in% selected_occupation), 
    aes(x = year_numeric, y = num_of_apps, group = occupation_type), 
    color = "#1F77B4",
    linetype = "solid",
    linewidth = .75, 
    lineend = "round"
  ) +
  # 4. Add darker pink line for relevant occupation
  geom_line(
    data = df_female %>% 
      filter(occupation_type %in% selected_occupation), 
    aes(x = year_numeric, y = num_of_apps, group = occupation_type), 
    color = "#FF69B4",
    linetype = "solid",
    linewidth = .75, 
    lineend = "round"
  ) +
  # 5. Add labels for num_of_apps LEFT SIDE
  geom_text_repel(
    data = df_male %>% filter(year_numeric == 2013 & 
                                occupation_type %in% selected_occupation),
    aes(x = year_numeric, y = num_of_apps, label = num_of_apps),
    box.padding = unit(0.5, "lines"),
    point.padding = unit(0.5, "lines"),
    segment.color = "#1F77B4",
    color = "#1F77B4",
    nudge_x = -0.25, # Assign a numeric value to nudge_x
    direction = "y", 
    hjust = "right"
  ) +
  geom_text_repel(
    data = df_female %>% filter(year_numeric == 2013 &
                                  occupation_type %in% selected_occupation),
    aes(x = year_numeric, y = num_of_apps, label = num_of_apps),
    box.padding = unit(0.5, "lines"),
    point.padding = unit(0.5, "lines"),
    segment.color = "#FF69B4",
    color = "#FF69B4",
    nudge_x = -0.25,
    direction = "y",
    hjust = "right"
  ) +
  # 6. Add labels for num_of_apps RIGHT SIDE
  geom_text_repel(
    data = df_male %>% filter(year_numeric == 2015 & 
                                occupation_type %in% selected_occupation),
    aes(x = year_numeric, y = num_of_apps, label = num_of_apps),
    box.padding = unit(0.5, "lines"),
    point.padding = unit(0.5, "lines"),
    segment.color = "#1F77B4",
    color = "#1F77B4",
    nudge_x = 0.25, # Assign a numeric value to nudge_x
    direction = "y", 
    hjust = "right"
  ) +
  geom_text_repel(
    data = df_female %>% filter(year_numeric == 2015 &
                                  occupation_type %in% selected_occupation),
    aes(x = year_numeric, y = num_of_apps, label = num_of_apps),
    box.padding = unit(0.5, "lines"),
    point.padding = unit(0.5, "lines"),
    segment.color = "#FF69B4",
    color = "#FF69B4",
    nudge_x = 0.25,
    nudge_y = 1000,
    direction = "y",
    hjust = "right"
  ) +
  # 7. Set optional theme and axis labels.
  labs(
    title = "Decrease in Applicants for Office and Administrative Support",
    subtitle = "Both male and female applicants decreased for the 2014-2015 fiscal year",
    x = "Year",
    y = "Number of Applications"
  )+
  theme_minimal() +
  theme(
    plot.title = element_text(face="bold"),
    panel.grid = element_blank()) +
  geom_line(aes(x = NA, y = NA, color = "Male"), show.legend = TRUE) +
  geom_line(aes(x = NA, y = NA, color = "Female"), show.legend = TRUE) +
  scale_color_manual(
    name = "Gender",  # Legend title
    values = c("Male" = "#1F77B4", "Female" = "#FF69B4")
  )

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

Applications for Office and Administrative Support decreased in the 2014-2015 fiscal year, even though this category is the largest contributor of female applicants.

## CHART: Protective Service

df_male <- df_yearly_apps %>% filter(gender == "male")
df_female <- df_yearly_apps %>% filter(gender == "female")
selected_occupation <- c("Protective Service")

# 1. Start with a single ggplot() call.
ggplot() + 
  # 2. Add the light grey lines for male and female
  geom_line(
    data = df_male, 
    aes(x = year_numeric, y = num_of_apps, group = occupation_type),
    color = "grey",
    alpha = 0.5
  ) +
  geom_line(
    data = df_female, 
    aes(x = year_numeric, y = num_of_apps, group = occupation_type), 
    color = "grey",
    alpha = 0.5
  ) +
  # 3. Add darker blue line for relevant occupation
  geom_line(
    data = df_male %>% 
      filter(occupation_type %in% selected_occupation), 
    aes(x = year_numeric, y = num_of_apps, group = occupation_type), 
    color = "#1F77B4",
    linetype = "solid",
    linewidth = .75, 
    lineend = "round"
  ) +
  # 4. Add darker pink line for relevant occupation
  geom_line(
    data = df_female %>% 
      filter(occupation_type %in% selected_occupation), 
    aes(x = year_numeric, y = num_of_apps, group = occupation_type), 
    color = "#FF69B4",
    linetype = "solid",
    linewidth = .75, 
    lineend = "round"
  ) +
  # 5. Add labels for num_of_apps LEFT SIDE
  geom_text_repel(
    data = df_male %>% filter(year_numeric == 2013 & 
                                occupation_type %in% selected_occupation),
    aes(x = year_numeric, y = num_of_apps, label = num_of_apps),
    box.padding = unit(0.5, "lines"),
    point.padding = unit(0.5, "lines"),
    segment.color = "#1F77B4",
    color = "#1F77B4",
    nudge_x = -0.25, # Assign a numeric value to nudge_x
    direction = "y", 
    hjust = "right"
  ) +
  geom_text_repel(
    data = df_female %>% filter(year_numeric == 2013 &
                                  occupation_type %in% selected_occupation),
    aes(x = year_numeric, y = num_of_apps, label = num_of_apps),
    box.padding = unit(0.5, "lines"),
    point.padding = unit(0.5, "lines"),
    segment.color = "#FF69B4",
    color = "#FF69B4",
    nudge_x = -0.25,
    direction = "y",
    hjust = "right"
  ) +
  # 6. Add labels for num_of_apps RIGHT SIDE
  geom_text_repel(
    data = df_male %>% filter(year_numeric == 2015 & 
                                occupation_type %in% selected_occupation),
    aes(x = year_numeric, y = num_of_apps, label = num_of_apps),
    box.padding = unit(0.5, "lines"),
    point.padding = unit(0.5, "lines"),
    segment.color = "#1F77B4",
    color = "#1F77B4",
    nudge_x = 0.25, # Assign a numeric value to nudge_x
    direction = "y", 
    hjust = "right"
  ) +
  geom_text_repel(
    data = df_female %>% filter(year_numeric == 2015 &
                                  occupation_type %in% selected_occupation),
    aes(x = year_numeric, y = num_of_apps, label = num_of_apps),
    box.padding = unit(0.5, "lines"),
    point.padding = unit(0.5, "lines"),
    segment.color = "#FF69B4",
    color = "#FF69B4",
    nudge_x = 0.25,
    nudge_y = 1000,
    direction = "y",
    hjust = "right"
  ) +
  # 7. Set optional theme and axis labels.
  labs(
    title = "Increase in Male Applicants for Protective Service",
    subtitle = "Protective Service is the only job category that showed growth in male applicants\ndespite a sharp decline in female applicants for the 2014-2015 fiscal year",
    x = "Year",
    y = "Number of Applications"
  )+
  theme_minimal() +
  theme(
    plot.title = element_text(face="bold"),
    panel.grid = element_blank()) +
  geom_line(aes(x = NA, y = NA, color = "Male"), show.legend = TRUE) +
  geom_line(aes(x = NA, y = NA, color = "Female"), show.legend = TRUE) +
  scale_color_manual(
    name = "Gender",  # Legend title
    values = c("Male" = "#1F77B4", "Female" = "#FF69B4")
  )

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

Male applicants for Protective Service increased in the 2014-2015 fiscal year, while all other job categories showed a decrease in applicants. This makes Protective Service an outlier, especially since fewer women applied to it in the 2014-2015 fiscal year.
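
The two chart blocks above differ only in the selected occupation and the titles, so they could be folded into one function. The following is a rough sketch of such a refactor, not the code that produced the charts. It runs on made-up data with the same columns as df_yearly_apps, the name plot_slope is hypothetical, and it swaps geom_text_repel() for plain geom_text() to keep the example short.

```r
library(dplyr)
library(ggplot2)
library(tibble)

# Made-up long-format data with the same columns as df_yearly_apps
toy <- tribble(
  ~occupation_type, ~gender,  ~year_numeric, ~num_of_apps,
  "Category A",     "male",   2013,          500,
  "Category A",     "male",   2015,          300,
  "Category A",     "female", 2013,          900,
  "Category A",     "female", 2015,          400,
  "Category B",     "male",   2013,          700,
  "Category B",     "male",   2015,          950,
  "Category B",     "female", 2013,          200,
  "Category B",     "female", 2015,          100
)

# Hypothetical helper: grey context lines plus highlighted lines for one occupation
plot_slope <- function(data, occupation, title, subtitle = NULL) {
  highlighted <- data %>% filter(occupation_type %in% occupation)

  ggplot(data, aes(x = year_numeric, y = num_of_apps,
                   group = interaction(occupation_type, gender))) +
    # All occupations in grey for context
    geom_line(color = "grey", alpha = 0.5) +
    # Mapping color inside aes() creates the legend without dummy layers
    geom_line(data = highlighted, aes(color = gender),
              linewidth = 0.75, lineend = "round") +
    # Value labels placed just above each end point
    geom_text(data = highlighted,
              aes(label = num_of_apps, color = gender),
              vjust = -0.6, show.legend = FALSE) +
    scale_color_manual(
      name = "Gender",
      values = c("male" = "#1F77B4", "female" = "#FF69B4")
    ) +
    labs(title = title, subtitle = subtitle,
         x = "Year", y = "Number of Applications") +
    theme_minimal() +
    theme(plot.title = element_text(face = "bold"),
          panel.grid = element_blank())
}

p <- plot_slope(toy, "Category B", "Applicants for Category B")
p
```

Because color is mapped to gender in the highlighted layers, the legend appears without the extra geom_line(aes(x = NA, y = NA, ...)) layers, which are likely the source of the "Removed 1 row" warnings shown above.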

Conclusion

The gender disparity in the 2014-2015 fiscal year is explained by two changes:

A drastic decrease in applicants for Office and Administrative Support roles, which are typically dominated by women
An increase in male applicants for Protective Service while all other categories showed a decrease

These two categories contribute the largest gender disparities among applicants.

Recommendations?

Promote applications from women in Protective Service, as workplace culture might contribute to a confidence gap for women in this category.
Develop targeted recruitment campaigns to attract applicants from underrepresented genders and ethnicities in the fields with the largest demographic gaps. This would support equitable opportunities and a more balanced applicant pool across all departments.

Exploring Gender Gap in Job Applications for LA County 2013-2015

Aaron Liauw

2025-10-07