1. Introduction

Gender inequality continues to challenge global economic development. This report looks at gender-based inequality in the workforce in India and South Asia in comparison to the UK, US, and Europe.

For this analysis, we are using World Bank Gender Portal data across key indicators: female labor force participation (women aged 15+), labor force with basic education, female-to-male participation ratios, and women’s share in senior management roles. These indicators reveal stark global disparities and volatility over time, reflecting changes in social and economic conditions. While some regions have made measurable progress, others continue to face persistent gaps in women’s participation and representation. Clear insights from this data offer valuable guidance for addressing inequality and promoting inclusive growth. This data report helps policymakers understand which regions need more attention and robust policies to bridge the gender inequality gap.

2. Retrieving primary data through APIs

This analysis draws on primary data obtained through the World Bank’s Gender Portal API. The API calls were made with the httr package in R, using GET requests to retrieve JSON-formatted data. The base URL specified multiple indicators of interest across all countries, covering total labor force participation, gender-specific participation, and employment in senior management roles. Parameters were appended to control the response format and paging so that every page of results was retrieved. The jsonlite package’s fromJSON() function parsed each JSON response into structured data frames. A custom fetch function was applied to each page number with lapply(), which returns the results as a list; this list was then combined into a single data frame using bind_rows() from the dplyr package. To enhance usability, column names were standardized with the janitor package, which removes special characters and spaces. This systematic approach supports reliable and reproducible data collection, enabling meaningful insights into global gender-based labor disparities and trends over time.

library(httr)
library(jsonlite)
library(dplyr)
library(janitor)
library(rnaturalearth)
library(sf)
library(classInt)
library(ggplot2)
library(viridis)
library(tidyverse)
library(kableExtra)
library(ggthemes)
library(patchwork)
library(writexl)

# Female labor force participation rate (% of female population aged 15+) data

# Base URL and parameters for World Bank API
base_url_flfpr <- "https://api.worldbank.org/v2/country/all/indicator/SL.TLF.CACT.FE.ZS;SL.TLF.CACT.FE.NE.ZS;SL.TLF.CACT.MA.ZS;SL.TLF.CACT.MA.NE.ZS;SL.TLF.CACT.ZS;SL.TLF.CACT.NE.ZS;SL.TLF.ACTI.1524.FE.ZS;SL.TLF.ACTI.1524.FE.NE.ZS;SL.TLF.ACTI.1524.MA.ZS;SL.TLF.ACTI.1524.MA.NE.ZS;SL.TLF.ACTI.1524.ZS;SL.TLF.ACTI.1524.NE.ZS;SL.TLF.ACTI.FE.ZS;SL.TLF.ACTI.MA.ZS;SL.TLF.ACTI.ZS?source=14"
params_flfpr <- "&format=json&per_page=10000"

# Fetch the first page to determine total pages
first_page_flfpr  <- fromJSON(content(GET(paste0(base_url_flfpr, params_flfpr, "&page=1")), "text"))
total_pages_flfpr  <- first_page_flfpr[[1]]$pages

# Function to fetch one page of results (cleaning happens afterwards)
fetch_data_flfpr  <- function(page_flfpr ) {
  url_flfpr<- paste0(base_url_flfpr, params_flfpr, "&page=", page_flfpr)
  data_flfpr <- fromJSON(content(GET(url_flfpr ), "text"))[[2]]
  return(data_flfpr )
}

# Fetch all data and remove NA values from 'value' column
all_data_flfpr <- bind_rows(lapply(1:total_pages_flfpr , fetch_data_flfpr )) %>%
  filter(!is.na(value))  # Remove rows with NA in value column

# Clean column names by removing special characters and standardizing
cleaned_data_flfpr  <- all_data_flfpr  %>% 
  clean_names()  # janitor removes spaces, special characters, and ensures lowercase

flatten_data_flfpr  <- all_data_flfpr  %>%
  mutate(indicator_id = indicator$id,
         indicator_value = indicator$value,
         country_id = country$id,
         country_name = country$value) %>%
  select(-indicator, -country) %>%
  select(country_id, country_name, countryiso3code, date, value, unit, obs_status, decimal, indicator_id, indicator_value) %>%
  clean_names()  # Standardize column names using janitor 
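
The fetch, combine, and flatten steps above are repeated for each indicator group. One possible way to factor them into reusable helpers is sketched below. The function names wb_url(), wb_get_page(), and wb_fetch() are hypothetical and not part of the original analysis; the sketch assumes the same base URL, source=14, and paging parameters used above, and it uses jsonlite::flatten() in place of the manual mutate() step, so the resulting columns are named indicator.id, country.value, and so on.

```r
# Hypothetical helpers (not part of the original analysis) that wrap the
# fetch-and-flatten pattern used for each indicator group.

# Build the request URL for one page of one or more indicator codes.
wb_url <- function(indicators, page = 1, per_page = 10000) {
  paste0(
    "https://api.worldbank.org/v2/country/all/indicator/",
    paste(indicators, collapse = ";"),
    "?source=14&format=json&per_page=", format(per_page, scientific = FALSE),
    "&page=", page
  )
}

# Download and parse one page; stop early on an HTTP error.
wb_get_page <- function(indicators, page = 1) {
  resp <- httr::GET(wb_url(indicators, page))
  httr::stop_for_status(resp)
  txt <- httr::content(resp, as = "text", encoding = "UTF-8")
  jsonlite::fromJSON(txt)
}

# Fetch every page, flatten the nested indicator/country columns,
# and stack the rows into one data frame.
wb_fetch <- function(indicators) {
  n_pages <- wb_get_page(indicators, 1)[[1]]$pages
  pages <- lapply(seq_len(n_pages), function(p) {
    jsonlite::flatten(wb_get_page(indicators, p)[[2]])
  })
  dplyr::bind_rows(pages)
}
```

Under these assumptions, a call such as wb_fetch("SL.TLF.BASC.FE.ZS") would stand in for one of the retrieval blocks in this section; it needs network access and has not been checked against the live API here.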

## Female labor force with basic education (%) data

# Base URL and parameters for World Bank API
base_url <- "https://api.worldbank.org/v2/country/all/indicator/SL.TLF.BASC.FE.ZS?source=14"
params <- "&format=json&per_page=10000"

# Get the number of pages that contain the data
first_page <- fromJSON(content(GET(paste0(base_url, params, "&page=1")), "text"))

total_pages <- first_page[[1]]$pages

# Function for fetching each page data
fetch_data <- function(page) {
  url <- paste0(base_url, params, "&page=", page)
  fromJSON(content(GET(url), "text"))[[2]]
}

# Using lapply function to combine data from all the pages
all_data <- bind_rows(lapply(1:total_pages, fetch_data))



# Unnesting the columns from the original data
flatten_data <- all_data %>%
  mutate(indicator_id = indicator$id,
    indicator_value = indicator$value,
    country_id = country$id,
    country_name = country$value) %>%
  select(-indicator, -country) %>%
  select(country_id, country_name, countryiso3code, date, value, unit, obs_status, decimal, indicator_id, indicator_value) %>%
  clean_names()  # Standardize column names using janitor 

## Ratio of female to male labor force participation rate (%) data
# Base URL and parameters for World Bank API
base_url_lfpr_ratio <- "https://api.worldbank.org/v2/country/all/indicator/SL.TLF.CACT.FM.ZS;SL.TLF.CACT.FM.NE.ZS?source=14"
params_lfpr_ratio <- "&format=json&per_page=10000"

# Fetch the first page to determine total pages
first_page_lfpr_ratio  <- fromJSON(content(GET(paste0(base_url_lfpr_ratio, params_lfpr_ratio, "&page=1")), "text"))
total_pages_lfpr_ratio  <- first_page_lfpr_ratio[[1]]$pages

# Function to fetch one page of results (cleaning happens afterwards)
fetch_data_lfpr_ratio  <- function(page_lfpr_ratio ) {
  url_lfpr_ratio<- paste0(base_url_lfpr_ratio, params_lfpr_ratio, "&page=", page_lfpr_ratio)
  data_lfpr_ratio <- fromJSON(content(GET(url_lfpr_ratio ), "text"))[[2]]
  return(data_lfpr_ratio )
}

# Fetch all data and remove NA values from 'value' column
all_data_lfpr_ratio <- bind_rows(lapply(1:total_pages_lfpr_ratio , fetch_data_lfpr_ratio )) %>%
  filter(!is.na(value))  # Remove rows with NA in value column

# Clean column names by removing special characters and standardizing
cleaned_data_lfpr_ratio  <- all_data_lfpr_ratio  %>% 
  clean_names()  # janitor removes spaces, special characters, and ensures lowercase

flatten_data_lfpr_ratio  <- all_data_lfpr_ratio  %>%
  mutate(indicator_id = indicator$id,
         indicator_value = indicator$value,
         country_id = country$id,
         country_name = country$value) %>%
  select(-indicator, -country) %>%
  select(country_id, country_name, countryiso3code, date, value, unit, obs_status, decimal, indicator_id, indicator_value) %>%
  clean_names()  # Standardize column names using janitor

3. Retrieving the secondary data

This section retrieves two secondary indicators: the percentage of households with a female head and the female share of employment in senior and middle management. The first highlights women’s economic independence, while the second reveals gender disparities in leadership roles. Together, these indicators underscore the dual challenges women face: greater responsibility in households alongside limited representation in top professional positions. This contrast emphasizes the ongoing barriers to gender equality in both the domestic and corporate spheres.

# Households with a female head (%) data

# Base URL and parameters for World Bank API
base_url_female_head <- "https://api.worldbank.org/v2/country/all/indicator/SP.HOU.FEMA.ZS?source=14"
params_female_head <- "&format=json&per_page=10000"

# Fetch the first page to determine total pages
first_page_female_head  <- fromJSON(content(GET(paste0(base_url_female_head, params_female_head, "&page=1")), "text"))
total_pages_female_head  <- first_page_female_head[[1]]$pages

# Function to fetch one page of results (cleaning happens afterwards)
fetch_data_female_head  <- function(page_female_head ) {
  url_female_head<- paste0(base_url_female_head, params_female_head, "&page=", page_female_head)
  data_female_head <- fromJSON(content(GET(url_female_head ), "text"))[[2]]
  return(data_female_head )
}

# Fetch all data and remove NA values from 'value' column
all_data_female_head <- bind_rows(lapply(1:total_pages_female_head , fetch_data_female_head )) %>%
  filter(!is.na(value))  # Remove rows with NA in value column

# Clean column names by removing special characters and standardizing
cleaned_data_female_head  <- all_data_female_head  %>% 
  clean_names()  # janitor removes spaces, special characters, and ensures lowercase

flatten_data_female_head  <- all_data_female_head  %>%
  mutate(indicator_id = indicator$id,
         indicator_value = indicator$value,
         country_id = country$id,
         country_name = country$value) %>%
  select(-indicator, -country) %>%
  select(country_id, country_name, countryiso3code, date, value, unit, obs_status, decimal, indicator_id, indicator_value) %>%
  clean_names()  # Standardize column names using janitor 

# Female share of employment in senior and middle management (%) data
# Base URL and parameters for World Bank API
base_url_position_tmd <- "https://api.worldbank.org/v2/country/all/indicator/SL.EMP.SMGT.FE.ZS?source=14"
params_position_tmd <- "&format=json&per_page=10000"

# Fetch the first page to determine total pages
first_page_position_tmd  <- fromJSON(content(GET(paste0(base_url_position_tmd, params_position_tmd, "&page=1")), "text"))
total_pages_position_tmd  <- first_page_position_tmd[[1]]$pages

# Function to fetch one page of results (cleaning happens afterwards)
fetch_data_position_tmd  <- function(page_position_tmd ) {
  url_position_tmd<- paste0(base_url_position_tmd, params_position_tmd, "&page=", page_position_tmd)
  data_position_tmd <- fromJSON(content(GET(url_position_tmd ), "text"))[[2]]
  return(data_position_tmd )
}

# Fetch all data and remove NA values from 'value' column
all_data_position_tmd <- bind_rows(lapply(1:total_pages_position_tmd , fetch_data_position_tmd )) %>%
  filter(!is.na(value))  # Remove rows with NA in value column

# Clean column names by removing special characters and standardizing
cleaned_data_position_tmd  <- all_data_position_tmd  %>% 
  clean_names()  # janitor removes spaces, special characters, and ensures lowercase

4. Tabular data and transformations

Rank Assignment for Female Labor Force Participation (dense_rank function)
Top and Bottom 20 Female Labor Force Participation
Calculation of Difference Between Female and Male Participation
Pivot Wider for Country-Based Gender Participation
Quintile Classification of Data (Calculation of quantiles using cut and breaks method)
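
Three of the transformations listed above (the wide pivot, the female-minus-male difference, and the dense rank) can be illustrated on a small toy table. This is only a sketch: the country labels and values are made up for illustration and are not World Bank figures.

```r
library(dplyr)
library(tidyr)

# Made-up illustrative values, not World Bank figures
toy <- tibble(
  country_name = c("A", "A", "B", "B", "C", "C"),
  indicator_id = c("Female", "Male", "Female", "Male", "Female", "Male"),
  value        = c(55, 70, 30, 75, 60, 75)
)

toy_ranked <- toy %>%
  # One row per country, with Female and Male as separate columns
  pivot_wider(names_from = indicator_id, values_from = value) %>%
  mutate(
    Difference = Female - Male,            # negative when women participate less
    Rank = dense_rank(desc(Difference))    # rank 1 = smallest gap; ties share a rank
  )

toy_ranked
```

Countries A and C have the same difference and therefore share rank 1, and B receives rank 2 with no gap in the numbering, which is the behavior that distinguishes dense_rank() from min_rank().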

## Calculations for the first primary indicator group
# flatten_data_flfpr was already built in Section 2 and is reused here



# Keep each country's most recent year, label the female and male series,
# and pivot to one row per country
map_flfpr <- flatten_data_flfpr %>%
  mutate(date = as.numeric(date)) %>%
  group_by(country_name) %>%
  filter(date == max(date)) %>%
  ungroup() %>%
  mutate(indicator_id = case_when(
    indicator_id == "SL.TLF.CACT.FE.ZS" ~ "Female",
    indicator_id == "SL.TLF.CACT.MA.ZS" ~ "Male",
    TRUE ~ indicator_id
  )) %>%
  filter(indicator_id %in% c("Male", "Female")) %>%
  select(country_name, value, indicator_id) %>%
  pivot_wider(names_from = indicator_id, values_from = value) %>%
  mutate(across(where(is.numeric), ~ round(.x, 2)))


# Female and male participation for the three focus countries, 1990-2020
time_series <- flatten_data_flfpr %>%
  filter(country_name %in% c("United Kingdom", "United States", "India")) %>%
  filter(date %in% c(1990:2020)) %>%
  mutate(indicator_id = case_when(
    indicator_id == "SL.TLF.CACT.FE.ZS" ~ "Female",
    indicator_id == "SL.TLF.CACT.MA.ZS" ~ "Male",
    TRUE ~ indicator_id
  )) %>%
  filter(indicator_id %in% c("Male", "Female")) %>%
  mutate(date = as.numeric(date))

time_Series_male <- time_series %>% filter(indicator_id == "Male")

time_Series_female <- time_series %>% filter(indicator_id == "Female")




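# Female minus male participation (percentage points), ranked from smallest gap
difference <- map_flfpr %>%
  mutate(Difference = Female - Male) %>%
  select(country_name, Difference) %>%
  mutate(Rank = dense_rank(desc(Difference)))

# na.rm = TRUE guards against countries that are missing either series
mean_value <- mean(difference$Difference, na.rm = TRUE)
median_value <- median(difference$Difference, na.rm = TRUE)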

## Basic education data
# flatten_data was already built in Section 2 and is reused here


# Keep the three focus countries and convert the year to numeric.
# No grouping is needed here, and dplyr's mutate() cannot modify a grouping column.
country <- flatten_data %>%
  drop_na(value) %>%
  filter(country_name %in% c("India", "United Kingdom", "United States")) %>%
  select(country_name, date, value) %>%
  mutate(date = as.numeric(date))

## Female-to-male ratio data
# flatten_data_lfpr_ratio was already built in Section 2 and is reused here


# Keep the three focus countries and convert the year to numeric.
# No grouping is needed here, and dplyr's mutate() cannot modify a grouping column.
country_lfpr_ratio <- flatten_data_lfpr_ratio %>%
  drop_na(value) %>%
  filter(country_name %in% c("India", "United Kingdom", "United States")) %>%
  select(country_name, date, value) %>%
  mutate(date = as.numeric(date))

#Female share of employment in senior and middle management
flatten_data_position_tmd  <- all_data_position_tmd  %>%
  mutate(indicator_id = indicator$id,
         indicator_value = indicator$value,
         country_id = country$id,
         country_name = country$value) %>%
  select(-indicator, -country) %>%
  select(country_id, country_name, countryiso3code, date, value, unit, obs_status, decimal, indicator_id, indicator_value) %>%
  clean_names()  # Standardize column names using janitor 


# Keep the three focus countries and convert the year to numeric.
# No grouping is needed here, and dplyr's mutate() cannot modify a grouping column.
country_position_tmd <- flatten_data_position_tmd %>%
  drop_na(value) %>%
  filter(country_name %in% c("United Kingdom", "United States", "India")) %>%
  select(country_name, date, value) %>%
  mutate(date = as.numeric(date))



quintile_tmd<-flatten_data_position_tmd%>%ungroup()%>%group_by(country_name)%>%
  mutate(date=as.numeric(date))%>%filter(date==max(date))

quintile_tmd <- quintile_tmd %>%
  select(country_name, value) %>%ungroup()%>%
  mutate(Quantile = cut(value,breaks = unique(quantile(value, probs = seq(0, 1, by = 0.2))),labels = c("Bottom 20", "20-40", "40-60", "60-80", "Top 20"),
      include.lowest = TRUE))

quintile_tmd<-quintile_tmd%>%group_by(Quantile)%>%summarise(`Median share`=median(value))
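
The quintile classification above relies on cut() receiving exactly one label per interval. Because unique() can drop repeated break points when the data contain many ties, a length check before labelling may be worth adding. The base R sketch below shows the idea on made-up shares (not World Bank figures).

```r
# Made-up illustrative shares (%), not World Bank figures
share <- c(12, 18, 22, 25, 29, 31, 34, 38, 41, 47)

breaks <- unique(quantile(share, probs = seq(0, 1, by = 0.2)))
labels <- c("Bottom 20", "20-40", "40-60", "60-80", "Top 20")

# cut() needs one label per interval; heavy ties can collapse the breaks,
# so check the lengths before labelling
stopifnot(length(breaks) == length(labels) + 1)

bin <- cut(share, breaks = breaks, labels = labels, include.lowest = TRUE)

# Median share within each quintile, and the top-minus-bottom gap
median_by_bin <- tapply(share, bin, median)
top_bottom_gap <- median_by_bin[["Top 20"]] - median_by_bin[["Bottom 20"]]
```

With ten distinct values, each quintile holds two observations, so the check passes; on real data it would stop with an error instead of producing a confusing message from cut().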

5. Findings: Graphs and Tables

Female labor force participation varies widely across countries, and the distribution of the difference between female and male participation rates is highly skewed. The mean share of the female labor force with basic education is just 37.9%, with large differences across countries. The share of women in senior and middle management remains low and, in some countries, has yet to recover after the pandemic. Over time, female labor force participation rates have shown minimal fluctuation within individual countries, highlighting a largely stagnant trend. However, a marked disparity persists between developed economies such as the UK and the US on one side and India on the other. The participation gap between these groups consistently hovers around 20 percentage points, with the UK and the US maintaining rates in the 50-60% range, while India lags at 25-30%.

p1<-ggplot(time_Series_female) +
  geom_line(aes(x =date, y = value,color = country_name), linewidth=1) +
  theme_wsj() +
  labs(title = "Female labor force participation rate (above age of 15 years)", 
       subtitle = "comparison over a period of time",
       x = "Date", 
       y = "Value", 
       caption = "Data source: World Bank",
       color="Country/Regions:")+
   scale_y_continuous(limits = c(0, 90), breaks = seq(0, 80, by = 10)) +  
  scale_x_continuous(breaks = seq(0, max(time_Series_female$date), by = 2))


p1

The kernel density plot of the difference between the female and male labor force participation rates is highly skewed to the left, indicating a large gender disparity across the globe. It uses the latest data available for each country. Both the mean and the median difference are negative, at -20.2 and -15.75 percentage points, respectively.
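
A mean that lies below the median is the usual sign of a left-skewed distribution, because a long tail of large negative gaps pulls the mean down more than the median. The toy vector below (made-up values, not the World Bank data) illustrates this check.

```r
# Made-up gaps in percentage points with a long tail of large negative values
gap <- c(-60, -45, -30, -18, -16, -15, -12, -10, -8, -5)

mean_gap <- mean(gap)       # pulled down by the tail
median_gap <- median(gap)   # less affected by the tail

# TRUE suggests a left-skewed distribution
left_skewed <- mean_gap < median_gap
```

The same comparison applied to mean_value and median_value gives a quick numerical check of what the density plot shows.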

p2<-ggplot(difference, aes(x = Difference)) +
  geom_density(fill = "lightblue", alpha = 0.5) +  # Density plot with fill
  geom_vline(aes(xintercept = mean_value), color = "red", linetype = "dashed", linewidth = 1) +  # Mean line
  geom_vline(aes(xintercept = median_value), color = "blue", linetype = "dotted", linewidth = 1) +  # Median line
  annotate("text", x = mean_value, y = 0.02, label = paste("Mean =", round(mean_value, 2)), color = "red", angle = 90, vjust = -0.5) +  # Mean annotation
  annotate("text", x = median_value, y = 0.02, label = paste("Median =", round(median_value, 2)), color = "blue", angle = 90, vjust = 1.5) +  # Median annotation
  labs(title = "Density distribution of the difference of Female and Male LFPR",
       subtitle = "Difference is in percentage points of the countrywise most recent data",
       x = "Value",
       y = "Density",
       caption = "Data Source: World Bank") +
  theme_stata()

p2

A more effective way to analyze the gender disparity in labor force participation is by examining the percentage ratio of female to male labor force participation over time. For both the UK and the US, the trend reveals a consistent upward trajectory, with the UK narrowing the gap with the US in recent years. In contrast, India’s trend remains volatile, showing less stability and slower progress in gender parity.

The difference between the US and the UK in this ratio has remained relatively stable, fluctuating between 4 and 7 percentage points from 1991 to 2023. This suggests that while both countries have made strides toward gender equality in labor force participation, the gap between them has not significantly changed over the past three decades. This stability contrasts with the more erratic pattern observed in India, where efforts to close the gender gap face more challenges.
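
The year-by-year gap between two countries' ratios, as described above, can be computed by pivoting the series to one column per country. The sketch below uses two placeholder countries and made-up ratios, so the numbers are illustrative only.

```r
library(dplyr)
library(tidyr)

# Made-up ratios (%) for two placeholder countries, not World Bank figures
toy_ratio <- tibble(
  country_name = rep(c("Country X", "Country Y"), each = 3),
  date = rep(c(1991, 2005, 2023), times = 2),
  value = c(72, 78, 85, 77, 82, 89)
)

gap_by_year <- toy_ratio %>%
  # One row per year, one column per country
  pivot_wider(names_from = country_name, values_from = value) %>%
  mutate(gap = `Country Y` - `Country X`)

# Smallest and largest gap over the period, in percentage points
range(gap_by_year$gap)
```

Applying the same steps to country_lfpr_ratio, after keeping only the United Kingdom and United States rows, should reproduce the range quoted in the text.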

p3<-ggplot(country_lfpr_ratio) +
  geom_line(aes(x =date, y = value, color = country_name), linewidth=1) +
  theme_wsj() +
  labs(title = "Ratio of female to male labor force participation rate (%)", 
       subtitle = "Ratio remains volatile for India ",
       x = "Date", 
       y = "Value", 
       caption = "Data source: World Bank",
       color="Country/Regions:")+
    scale_y_continuous(limits = c(0, 90), breaks = seq(0, 90, by = 10)) + 
  scale_x_continuous(breaks = seq(0, max(country_lfpr_ratio$date), by = 4))

p3

Looking at the trend in basic education among women in the labor force provides a clearer view of progress over time. Between 1993 and 2023, the UK saw a decline from 64% to 49%, indicating a drop in the proportion of women with basic education in the workforce. The US, on the other hand, experienced a steady increase of 7 percentage points, signaling gradual improvement in educational attainment among women in the labor force.

India shows a larger shift, with the percentage of women with basic education rising by 18 percentage points, from 32% to 50%. This trend points to substantial improvements in female education in India, though the baseline remains lower than in the UK and the US. The differing trends highlight how educational access for women in the labor force has evolved differently across these countries.
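
The changes quoted above are in percentage points, which differ from relative (percent) changes. The short base R sketch below uses only the start and end values stated in the text for the UK and India to show both measures; the column names are illustrative.

```r
# Start (1993) and end (2023) values as quoted in the text
edu <- data.frame(
  country_name = c("United Kingdom", "India"),
  start = c(64, 32),
  end = c(49, 50)
)

# Absolute change in percentage points
edu$change_pp <- edu$end - edu$start

# Relative change as a percent of the starting value, rounded to whole numbers
edu$change_pct <- round(100 * edu$change_pp / edu$start)

edu
```

India's 18-point rise is a gain of roughly 56% relative to its 1993 level, while the UK's 15-point fall is a decline of roughly 23%.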

p4<-ggplot(country) +
  geom_line(aes(x =date, y = value,color = country_name), linewidth=1) +
  theme_wsj() +
  labs(title = "Labor Force Education by Country", 
       subtitle = "Comparison of labor force with basic education over time (%)",
       x = "Date", 
       y = "Value", 
       caption = "Data source: World Bank",
       color="Country/Regions:")+
   scale_y_continuous(limits = c(0, 90), breaks = seq(0, 80, by = 10)) +  
  scale_x_continuous(breaks = seq(0, max(country$date), by = 4))


p4

An increase in basic education and labor force participation does not necessarily translate into economic equity or fair representation of women in senior and middle management roles. This contrast becomes clear when comparing the share of women in these positions across countries. In the US and UK, women make up 44% and 40% of senior and middle management, respectively, reflecting a relatively higher level of gender representation in leadership roles. In contrast, India lags significantly, with women holding only 13% of these positions. This stark difference highlights the ongoing challenges women face in achieving equity, with the gap between India and the US/UK underscoring the varying degrees of progress across these countries. The quintile-wise breakdown for all countries with recent data shows the same contrast, with a difference of 26 percentage points between the Top 20 and Bottom 20 quintiles.

p5<-ggplot(country_position_tmd) +
  geom_line(aes(x =date, y = value, color = country_name), linewidth=1) +
  theme_economist() +
  labs(title = "Female share of employment in senior and middle management (%)", 
       subtitle = "Time series of data for China is not available",
       x = "Date", 
       y = "Value", 
       caption = "Data source: World Bank",
       color="Country/Regions:")+
    scale_y_continuous(limits = c(0, 60), breaks = seq(0, 60, by = 10)) + 
  scale_x_continuous(breaks = seq(0, max(country_position_tmd$date), by = 4))

p5

p6<-ggplot(quintile_tmd, aes(x = Quantile, y = `Median share`)) + 
  geom_bar(stat = "identity", fill = "steelblue") +
  geom_text(aes(label = round(`Median share`, 1)), 
            position = position_dodge(width = 0.9), 
            vjust = -0.5, size = 3) + 
  coord_flip() +
  theme_economist_white() +
  labs(title = "Quintile-wise female share of employment in senior and middle management", 
       subtitle = paste(round(diff(range(quintile_tmd$`Median share`))), "percentage points difference between top 20 and bottom 20 quintiles"),  # computed from the data so the label matches the bars
       x = "Quantile", 
       y = "Median Share (%)", 
       caption = "Data source: World Bank") +
  scale_y_continuous(limits = c(0, 90), breaks = seq(0, 90, by = 10))

p6

6. Storage of the data

## Six graphs (p1, p2, p3, p4, p5, and p6) have been created and stored in Google Drive

## A master dataset has been created by merging the primary and secondary data

master_data<-bind_rows(flatten_data,flatten_data_flfpr,flatten_data_lfpr_ratio,flatten_data_position_tmd)

write_xlsx(master_data,"C:/Users/210185/Downloads/master_data.xlsx")



cat('<a href="https://drive.google.com/drive/folders/1nkilnuOTjS3PMpky1VA01IwH8nj4oChX?usp=sharing" target="_blank">Access Google Drive Folder</a>')

Access Google Drive Folder