For better readability of the Rmd file, the default code chunk options have been set to hide the code. If you wish to view the entire code for each statistic or graph, please click the “Show” button located on the right side of each section.
This report investigates the correlation between women’s tertiary education enrollment and their participation in STEM fields. Despite the global rise in women’s educational attainment, our analysis reveals that this quantitative expansion has not translated into a proportional increase in STEM participation.
Using data from the World Bank and OECD, we identified a persistent “structural gap” across countries regardless of their education levels. Specifically, Korea exemplifies this paradox: possessing top-tier female enrollment rates yet exhibiting one of the largest gender gaps in STEM. Our findings suggest that deeply rooted occupational segregation—where fields like Education and Humanities remain female-dominated while Engineering is male-dominated—is the primary driver of this inequality.
We conclude that policy interventions must go beyond merely increasing educational access and instead address socio-cultural structures and labor market conditions to dismantle the gender division of labor.
The starting point of our investigation, as well as the research question we seek to clarify, is as follows:
“Does an increase in women’s tertiary education enrollment lead to a corresponding increase in women’s participation in STEM fields?”
STEM fields are core industries that determine a nation’s
technological competitiveness and economic growth. Nevertheless, in many
countries, women’s participation in STEM remains notably low. This issue
has been consistently highlighted, as it not only affects gender-based
wage structures but also undermines industrial diversity and long-term
development potential. Meanwhile, as women’s rights have steadily
improved, women’s tertiary education enrollment rates have continued to
rise worldwide. Within this trend, it is natural to expect that the
expansion of educational opportunities would directly lead to an
increased proportion of women in STEM fields.
However, in the case of Korea, despite women’s tertiary education
enrollment rates being nearly equal to those of men, a substantial
gender gap persists in STEM fields. Therefore, it is important to
empirically verify whether expanded educational opportunities actually
translate into participation in STEM. Furthermore, through such
analysis, we must distinguish whether the gender gap in STEM is
primarily a matter of educational attainment, or whether it stems from
more fundamental factors such as major and industry structures,
sociocultural expectations, and labor-market conditions. Only by
identifying this distinction can we develop concrete recommendations for
future society. We expect that our investigation and analysis will
provide an important foundation for setting policy directions aimed at
mitigating gender inequality in STEM fields. Accordingly, to answer the
main question stated above, we set the following three specific
sub-questions and tasks:
1. How does higher education enrollment generally relate to entry into STEM fields by gender?
2. For women, does a higher rate of tertiary education also lead to a higher rate of graduation in STEM fields?
3. What factors influence women’s participation in STEM fields more than education rates?
To address these research tasks, we utilized four distinct datasets sourced from the World Bank and OECD, covering tertiary enrollment rates and STEM graduation statistics. The specific contents of each are detailed below.
# Load Libraries
library(tidyverse)
library(ggrepel)
library(scales)
# Load Datasets
df_enroll_all <- read_csv("school_enrollment_tertiary_all.csv")
df_enroll_female <- read_csv("school_enrollment_tertiary_female.csv")
df_stem <- read_csv("oecd_stem_gender_2013-2023.csv")
df_kpg <- read_csv("kor_pol_grc_grad_2023.csv")
# Define list of aggregated regions to exclude
# Since the World Bank dataset contains a vast number of non-country aggregate groups, we defined a separate list during the common preprocessing stage to ensure code efficiency
remove_list <- c("World", "OECD members", "High income", "Low income",
"Middle income", "Upper middle income", "Lower middle income",
"East Asia & Pacific", "Euro area", "European Union",
"North America", "Sub-Saharan Africa", "Arab World",
"Latin America & Caribbean", "Europe & Central Asia",
"East Asia & Pacific (excluding high income)",
"Europe & Central Asia (excluding high income)",
"Latin America & Caribbean (excluding high income)",
"Middle East & North Africa",
"South Asia",
"Early-demographic dividend", "Late-demographic dividend",
"Post-demographic dividend", "Pre-demographic dividend",
"IDA & IBRD total", "IDA total", "IDA blend", "IDA only", "IBRD only",
"Least developed countries: UN classification",
"Heavily indebted poor countries (HIPC)",
"Fragile and conflict affected situations",
"Small states", "Other small states", "Pacific island small states",
"Caribbean small states",
"Low & middle income",
"Middle East, North Africa, Afghanistan & Pakistan",
"East Asia & Pacific (IDA & IBRD countries)",
"Europe & Central Asia (IDA & IBRD countries)",
"Latin America & the Caribbean (IDA & IBRD countries)",
"Middle East & North Africa (IDA & IBRD countries)",
"South Asia (IDA & IBRD)",
"Sub-Saharan Africa (IDA & IBRD countries)",
"Africa Eastern and Southern", "Africa Western and Central")
All data preprocessing and visualization were conducted using the
tidyverse package in R.
To ensure data integrity and conduct a strict country-level analysis,
we first addressed the issue of aggregated regions present in the raw
datasets. Since the data included non-country entities such as “World,”
“OECD members,” and “High income,” we defined a comprehensive
remove_list to filter these out. Notably, this
comprehensive removal list was essential for the World Bank enrollment
datasets, which contained a vast number of non-country aggregates. In
contrast, the OECD STEM dataset included significantly fewer such
entities; therefore, we handled the filtering for the STEM data directly
within its individual code chunk for efficiency. Additionally, we
refined the dataset by focusing specifically on the year 2023 and
removing observations with missing values to ensure analytical
consistency.
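The filtering step described above can be sketched on a toy vector of names. This is only an illustration in base R: `names_raw`, `remove_list_demo`, and the keyword pattern are hypothetical and are not part of the report's pipeline. The keyword check is a rough heuristic for spotting aggregates that a removal list may have missed, not a definitive test.

```r
# Hypothetical country/aggregate names (illustrative only)
names_raw <- c("Korea, Rep.", "World", "Poland", "High income", "Greece", "Euro area")

# A deliberately incomplete removal list, to show how a leftover is caught
remove_list_demo <- c("World", "High income")

# Same logic as filter(!Name %in% remove_list), written in base R
kept <- names_raw[!names_raw %in% remove_list_demo]

# Heuristic check: flag remaining names that look like aggregates
# (the keyword pattern is an assumption, not an official list)
suspect <- kept[grepl("income|area|World|members|total|dividend|states", kept)]

print(kept)
print(suspect)
```

Any name that appears in `suspect` would be a candidate for adding to the removal list after manual inspection.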
For the specific requirements of our analysis, further structural
transformations were applied. To enable the correlation analysis in
Figures 1 and 2, we merged the World Bank enrollment data with the OECD
STEM datasets using ISO country codes via an inner_join.
Furthermore, to facilitate the group comparison in Figure 1, we
categorized countries into “Low,” “Medium,” and “High” tiers based on
the quantiles of their total tertiary enrollment rates using the
cut() function. Finally, for Figure 3, we reshaped the data
using pivot_wider and derived a new Gap metric
(calculated as the female percentage minus 50%) to intuitively visualize
both the direction and magnitude of gender segregation across different
fields of study.
Additionally, more detailed data cleaning steps were noted as small comments within each code block.
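As a minimal sketch of the tier assignment described above, the following base R snippet bins a toy vector into tertiles with `quantile()` and `cut()`. The enrollment values are hypothetical and chosen only so that the bins are easy to follow; they are not taken from the World Bank data.

```r
# Hypothetical enrollment rates (illustrative values, not from the datasets)
enroll <- c(A = 25, B = 40, C = 55, D = 70, E = 85, F = 100)

# Tertile cut points: the 0, 1/3, 2/3, and 1 quantiles
breaks <- quantile(enroll, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE)

# include.lowest = TRUE keeps the minimum value inside the first bin;
# without it, the smallest observation would become NA
tier <- cut(enroll, breaks = breaks,
            labels = c("Low", "Medium", "High"),
            include.lowest = TRUE)

tier_table <- table(tier)
print(data.frame(country = names(enroll), enroll, tier))
```

Because the cut points are quantiles of the data being binned, the tier boundaries depend on which countries are in the sample, so they shift whenever the merged country set changes.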
The following sections provide a brief description and basic statistics for each dataset.
school_enrollment_tertiary_all.csv
Sourced from the World Bank, the Tertiary Enrollment dataset provides
the gross tertiary enrollment ratio for the total population, regardless
of gender. For our analysis, we specifically focused on the
Country Name and the enrollment figures for the year
2023. This dataset serves as a crucial baseline to
categorize countries into distinct education levels (“Low,” “Medium,”
and “High”). By establishing these groups, we aim to empirically verify
whether the gender gap in STEM is a universal phenomenon that persists
independently of a country’s general educational development.
# Data Processing for dataset_1
# Data Cleaning (Using distinct variable names to avoid conflicts)
clean_enroll_all_eda <- df_enroll_all %>%
select(Name = `Country Name`, Enroll_Total = `2023`) %>%
filter(!is.na(Enroll_Total)) %>%
filter(!Name %in% remove_list)
# Calculate Descriptive Statistics
stats_summary <- clean_enroll_all_eda %>%
summarise(
Count = n(),
Mean = round(mean(Enroll_Total), 2),
SD = round(sd(Enroll_Total), 2),
Median = round(median(Enroll_Total), 2),
Min = round(min(Enroll_Total), 2),
Max = round(max(Enroll_Total), 2)
)
# Output Table
knitr::kable(stats_summary, caption = "Descriptive Statistics: Total Tertiary Enrollment (2023)")
| Count | Mean | SD | Median | Min | Max |
|---|---|---|---|---|---|
| 126 | 52.87 | 30.92 | 53.98 | 4.94 | 165.11 |
# Histogram Visualization
p0 <- ggplot(clean_enroll_all_eda, aes(x = Enroll_Total)) +
geom_histogram(binwidth = 10, fill = "#69b3a2", color = "white", alpha = 0.8) +
geom_vline(aes(xintercept = mean(Enroll_Total)), color = "red", linetype = "dashed", linewidth = 1) + # linewidth replaces size for lines (ggplot2 >= 3.4.0)
labs(title = "Distribution of Total Tertiary Enrollment Rates (2023)",
subtitle = "Red dashed line indicates the global mean",
x = "Tertiary Enrollment Rate (% Gross)",
y = "Count of Countries") +
theme_minimal()
p0
school_enrollment_tertiary_female.csv
This dataset, also sourced from the World Bank, specifically focuses
on the female tertiary enrollment ratio using data from
2023. It functions as the key independent variable in our
analysis, serving as the foundation to test our primary hypothesis:
“Does higher educational access for women lead to higher STEM
participation?” By isolating female enrollment rates, we aim to
quantitatively measure the correlation between the expansion of general
educational opportunities for women and their specific entry into STEM
fields.
# Data Processing for dataset_2
# Use _eda suffix to avoid variable name conflicts
# Data Cleaning (Female Dataset)
clean_enroll_female_eda <- df_enroll_female %>%
select(Name = `Country Name`, Enroll_Rate = `2023`) %>%
filter(!is.na(Enroll_Rate)) %>%
filter(!Name %in% remove_list) # Use the remove_list defined earlier
# Calculate Descriptive Statistics
stats_female_summary <- clean_enroll_female_eda %>%
summarise(
Count = n(),
Mean = round(mean(Enroll_Rate), 2),
SD = round(sd(Enroll_Rate), 2),
Median = round(median(Enroll_Rate), 2),
Min = round(min(Enroll_Rate), 2),
Max = round(max(Enroll_Rate), 2)
)
# Output Table
knitr::kable(stats_female_summary, caption = "Descriptive Statistics: Female Tertiary Enrollment (2023)")
| Count | Mean | SD | Median | Min | Max |
|---|---|---|---|---|---|
| 125 | 60.92 | 35.49 | 63.11 | 4.66 | 171.11 |
# Histogram Visualization
p_female_hist <- ggplot(clean_enroll_female_eda, aes(x = Enroll_Rate)) +
geom_histogram(binwidth = 10, fill = "#FF9999", color = "white", alpha = 0.8) +
geom_vline(aes(xintercept = mean(Enroll_Rate)), color = "darkred", linetype = "dashed", linewidth = 1) + # linewidth replaces size for lines (ggplot2 >= 3.4.0)
labs(title = "Distribution of Female Tertiary Enrollment Rates (2023)",
subtitle = "Red dashed line indicates the global mean for females",
x = "Female Tertiary Enrollment Rate (% Gross)",
y = "Count of Countries") +
theme_minimal()
p_female_hist
After data cleaning, the total-enrollment dataset contained 126 countries. However, Lebanon’s female-specific enrollment data were missing, so it drops out of the female subset. Consequently, the analysis of total enrollment covers 126 countries, while the analysis restricted to female enrollment covers 125 countries.
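A difference in row counts like this can be traced with a set difference on the country names. The sketch below uses two short hypothetical vectors in place of the cleaned datasets.

```r
# Hypothetical country lists standing in for the two cleaned datasets
countries_total  <- c("Korea, Rep.", "Poland", "Greece", "Lebanon")
countries_female <- c("Korea, Rep.", "Poland", "Greece")

# Countries that have a total figure but no female figure
missing_in_female <- setdiff(countries_total, countries_female)
print(missing_in_female)
```

Applied to the report's objects, the equivalent call would be `setdiff(clean_enroll_all_eda$Name, clean_enroll_female_eda$Name)`.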
oecd_stem_gender_2013-2023.csv
Sourced from the OECD, the STEM Graduates dataset measures the share
of graduates in Science, Technology, Engineering, and Mathematics fields
by gender. In our research framework, this dataset serves as the primary
dependent variable. By merging these figures with the World Bank
enrollment data, we are able to calculate the gender gap and identify
significant outliers, such as Japan and Korea. The analysis primarily
utilizes the Reference area (Country), Sex
(Male/Female), and Observation value to quantify and
compare STEM participation rates across different nations.
# Data Processing for dataset_3
# Use '_stem_eda' suffix to avoid variable name conflicts
# Data Cleaning
clean_stem_female_eda <- df_stem %>%
filter(TIME_PERIOD == 2023, Sex == "Female") %>%
# Select only total mobility data (excluding international students)
filter(Mobility == "_T" | Mobility == "Total") %>%
# Remove aggregate groups (e.g., OECD average, EU, G20)
filter(!`Reference area` %in% c("OECD", "European Union (25 countries)", "G20", "EU27 (from 2020)")) %>%
select(Country = `Reference area`, Value = `OBS_VALUE`) %>%
mutate(Value = as.numeric(Value)) %>%
filter(!is.na(Value))
# Calculate Descriptive Statistics
stats_stem_summary <- clean_stem_female_eda %>%
summarise(
Count = n(),
Mean = round(mean(Value), 2),
SD = round(sd(Value), 2),
Median = round(median(Value), 2),
Min = round(min(Value), 2),
Max = round(max(Value), 2)
)
# Output Table
knitr::kable(stats_stem_summary, caption = "Descriptive Statistics: Share of Female Graduates in STEM (2023, OECD)")
| Count | Mean | SD | Median | Min | Max |
|---|---|---|---|---|---|
| 41 | 33.7 | 5.51 | 34.42 | 18.08 | 43.29 |
# Boxplot Visualization
p_stem_box <- ggplot(clean_stem_female_eda, aes(x = "", y = Value)) +
# Median, Quartiles
geom_boxplot(fill = "lavender", color = "purple", width = 0.5, outlier.shape = NA) +
# Jitter
geom_jitter(width = 0.1, size = 2, alpha = 0.6, color = "darkslateblue") +
# Indicate Mean value (Red Diamond)
stat_summary(fun = mean, geom = "point", shape = 18, size = 4, color = "red") +
labs(title = "Distribution of Female STEM Graduates in OECD Countries (2023)",
subtitle = "Boxplot with individual country points (Red diamond = Mean)",
x = "",
y = "Share of Female Graduates in STEM (%)") +
theme_minimal()
p_stem_box
kor_pol_grc_grad_2023.csv
Finally, the Graduate Statistics dataset provides high-granularity
data specifically for our three focus countries: Korea, Poland, and
Greece. Unlike the previous broad indicators, this dataset breaks down
graduate statistics by detailed Field of education, such as
Education, ICT, and Engineering. While the earlier datasets established
the existence of a gender gap, this specific data helps us explain why
it exists by revealing the structural segregation of majors within these
nations. By analyzing variables such as Sex and
Observation value across these specific fields, we can
pinpoint the exact academic disciplines that drive the observed gender
disparity.
# Data Processing for dataset_4
# Descriptive Statistics (Disaggregated by Country), Dumbbell Chart (Aggregated)
# Use '_eda' and '_combined' suffixes to avoid variable name conflicts
# Data Cleaning
clean_kpg_eda <- df_kpg %>%
filter(Mobility == "Total") %>%
filter(Sex %in% c("Female", "Male")) %>%
select(Country = `Reference area`,
Field = `Field of education`,
Sex,
Percentage = `OBS_VALUE`) %>%
mutate(Percentage = as.numeric(Percentage)) %>%
filter(!is.na(Percentage))
# Descriptive Statistics (Disaggregated by Country)
stats_kpg_individual <- clean_kpg_eda %>%
group_by(Country, Sex) %>%
summarise(
Mean = round(mean(Percentage), 2),
SD = round(sd(Percentage), 2),
Min = round(min(Percentage), 2),
Max = round(max(Percentage), 2),
.groups = 'drop'
)
# Output Table
knitr::kable(stats_kpg_individual, caption = "Descriptive Statistics: Graduates Distribution by Country & Gender (2023)")
| Country | Sex | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Greece | Female | 57.40 | 15.95 | 35.34 | 85.10 |
| Greece | Male | 42.60 | 15.95 | 14.90 | 64.66 |
| Korea | Female | 50.51 | 18.00 | 22.50 | 76.17 |
| Korea | Male | 49.49 | 18.00 | 23.83 | 77.50 |
| Poland | Female | 60.47 | 19.31 | 22.65 | 85.76 |
| Poland | Male | 39.53 | 19.31 | 14.24 | 77.35 |
# Aggregated Dumbbell Chart (Average of 3 Countries)
# Data Aggregation
kpg_combined_data <- clean_kpg_eda %>%
group_by(Field, Sex) %>%
summarise(
Avg_Percentage = mean(Percentage, na.rm = TRUE), # Average across the three countries
.groups = 'drop'
)
# Reshape Data for Dumbbell Chart (Wide Format)
kpg_wide_combined <- kpg_combined_data %>%
pivot_wider(names_from = Sex, values_from = Avg_Percentage)
# Visualize Aggregated Dumbbell Chart
p_kpg_combined_dumbbell <- ggplot(kpg_wide_combined) +
# Dumbbell Segment (Gap)
geom_segment(aes(y = reorder(Field, Female), yend = reorder(Field, Female),
x = Male, xend = Female),
color = "gray60", linewidth = 1.2) + # linewidth replaces size for lines (ggplot2 >= 3.4.0)
# Male Points (Blue)
geom_point(aes(y = reorder(Field, Female), x = Male),
color = "#6699CC", size = 4) +
# Female Points (Red)
geom_point(aes(y = reorder(Field, Female), x = Female),
color = "#FF9999", size = 4) +
# Labels
scale_y_discrete(labels = function(x) str_wrap(x, width = 35)) +
scale_x_continuous(labels = scales::percent_format(scale = 1)) +
labs(title = "Average Gender Gap by Field of Study",
subtitle = "Aggregated View (Avg of Korea, Poland, Greece): Blue=Male, Red=Female",
x = "Average Share of Graduates (%)",
y = "",
caption = "Data Source: OECD (2023)") +
theme_minimal() +
theme(
axis.text.y = element_text(size = 10, face = "bold"),
panel.grid.major.y = element_blank(),
legend.position = "top"
)
p_kpg_combined_dumbbell
To effectively identify overall trends and improve visual readability, we moved away from simple scatter plots, which proved to be too cluttered and failed to show a meaningful regression pattern. Instead, we classified countries into three distinct tiers based on their total tertiary enrollment rates: “Low” (21.5%–76.0%), “Medium” (76.0%–81.4%), and “High” (81.4%–165.1%). Notably, Korea falls into the “High” group with an enrollment rate of 106.7%. We selected boxplots as the primary visualization structure because they offer the most direct method to examine whether countries with similar levels of tertiary enrollment exhibit similar gender distributions in STEM graduation rates. This approach provides statistical clarity by allowing for a quick comparison of the median and overall distribution across the different educational tiers.
# Data Processing for Figure 1
# Data Cleaning
clean_enroll_all <- df_enroll_all %>%
select(Code = `Country Code`, Name = `Country Name`, Enroll_Total = `2023`) %>%
filter(!is.na(Enroll_Total)) %>%
filter(!Name %in% remove_list)
# Data Cleaning
clean_stem_both <- df_stem %>%
filter(TIME_PERIOD == 2023) %>%
filter(Sex %in% c("Female", "Male")) %>%
filter(Mobility == "_T" | Mobility == "Total") %>%
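# Remove aggregate groups (same list as in the descriptive-statistics chunk)
filter(!`Reference area` %in% c("OECD", "European Union (25 countries)", "G20", "EU27 (from 2020)")) %>%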
select(Code = REF_AREA, Sex, STEM_Rate = OBS_VALUE) %>%
filter(!is.na(STEM_Rate))
# Data Merging & Grouping
merged_boxplot <- inner_join(clean_enroll_all, clean_stem_both, by = "Code") %>%
mutate(
Edu_Level = cut(Enroll_Total,
breaks = quantile(Enroll_Total, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE),
labels = c("Low Enrollment", "Medium Enrollment", "High Enrollment"),
include.lowest = TRUE)
)
# Visualization Graph 1
p1 <- ggplot(merged_boxplot, aes(x = Edu_Level, y = STEM_Rate, fill = Sex)) +
geom_boxplot(alpha = 0.7, outlier.shape = 19, outlier.size = 2) +
# Highlight Label for Japan
geom_text_repel(data = subset(merged_boxplot, Name == "Japan"),
aes(label = Name),
size = 5, fontface = "bold", color = "black",
min.segment.length = 0, # Always draw the connecting segment
nudge_x = 0.4, # Nudge the label sideways
show.legend = FALSE) +
scale_fill_manual(values = c("Female" = "#FF9999", "Male" = "#6699CC")) +
scale_y_continuous(breaks = seq(0, 100, 10)) +
labs(title = "Global STEM Gender Gap by Education Level (2023)",
subtitle = "Gender segregation in STEM persists regardless of national education levels.",
x = "Tertiary Education Enrollment Level (Quantiles)",
y = "Share of Graduates in STEM (%)",
fill = "Gender",
caption = "Data Source: World Bank & OECD\n(Groups defined by enrollment rate tertiles)") +
theme_minimal() +
theme(legend.position = "bottom")
p1
The visual analysis reveals a persistent structural gap that exists regardless of a country’s educational attainment. Whether tertiary enrollment rates were high or low, gender disparities remained consistent, with women accounting for a median of approximately 30% of STEM graduates compared to 70% for men. We also identified an extreme outlier within the “Low” enrollment group: Japan. With men making up 81.92% of STEM graduates and women only 18.08%, Japan illustrates that having an education level similar to that of its peers does not guarantee gender parity. While this specific case warrants deeper sociological investigation, it serves here as a stark example that extreme gender disparities can persist even within comparable educational contexts.
To further investigate our research question, we narrowed the scope of our analysis specifically to the female population. We utilized a scatterplot to visually examine the correlation between female tertiary enrollment (the independent variable) and the share of female graduates in STEM (the dependent variable). Additionally, a regression line was superimposed on the plot to identify whether a global trend exists connecting educational access to STEM participation. Within this global context, we highlighted three specific countries to better situate Korea’s position: Korea, our primary subject of interest; Poland, which shares a similar enrollment rate to Korea yet exhibits higher STEM participation; and Greece, which outperforms Korea in both enrollment metrics and STEM graduation rates.
# Data Processing for Figure 2
# Data Cleaning
clean_enroll_female <- df_enroll_female %>%
select(Code = `Country Code`, Name = `Country Name`, Enroll_Female = `2023`) %>%
filter(!is.na(Enroll_Female)) %>%
filter(!Name %in% remove_list)
# Filter STEM Data for Female Graduates
clean_stem_female <- clean_stem_both %>%
filter(Sex == "Female")
# Data Merging & Defining Highlight Groups
merged_female <- inner_join(clean_enroll_female, clean_stem_female, by = "Code") %>%
mutate(
Highlight = case_when(
Code %in% c("KOR", "POL", "GRC") ~ "Focus Country",
TRUE ~ "Others"
)
)
# Visualization Figure 2
p2 <- ggplot(merged_female, aes(x = Enroll_Female, y = STEM_Rate)) +
geom_smooth(method = "lm", color = "darkgray", fill = "lightgray", alpha = 0.5) +
# Configure Point Color and Size
geom_point(aes(color = Highlight, size = Highlight), alpha = 0.8) +
scale_color_manual(values = c("Focus Country" = "#7B1FA2", "Others" = "#FF9999")) +
scale_size_manual(values = c("Focus Country" = 4, "Others" = 2.5)) +
# Labeling for Focus Countries
geom_text_repel(data = subset(merged_female, Highlight == "Focus Country"),
aes(label = Name),
size = 5, fontface = "bold", box.padding = 0.5, color = "black") +
labs(title = "Female Tertiary Enrollment vs. Female STEM Share (2023)",
subtitle = "Highlight: Korea, Poland, Greece vs. Global Trend",
x = "Female Tertiary Enrollment Rate (% Gross)",
y = "Share of Female Graduates in STEM (%)",
caption = "Data Source: World Bank & OECD",
color = "Group", size = "Group") +
theme_minimal() +
theme(legend.position = "bottom")
p2
The resulting plot displays a slight positive trend, indicated by the upward slope of the regression line, which suggests that higher female education levels are generally associated with a higher female share of STEM graduates. However, the correlation is relatively weak, as evidenced by the wide scatter of data points around the line; this variance implies that educational access alone does not guarantee high STEM participation. Most notably, Korea lies well below the regression line compared to its peers with similar educational attainment. This marked discrepancy highlights a specific underperformance in STEM integration despite high enrollment rates, which motivates the subsequent focused comparison with Poland and Greece to uncover the underlying structural causes.
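The visual reading of a "weak" trend and a country lying "below the line" can be backed by two numbers: the Pearson correlation and each country's residual from the fitted line. The snippet below is a sketch on hypothetical data; the data frame `toy` and its values are invented for illustration and do not reproduce the report's results.

```r
# Hypothetical data: female enrollment (%) and female share of STEM graduates (%)
toy <- data.frame(
  country = c("A", "B", "C", "D", "E", "F"),
  enroll  = c(40, 60, 80, 100, 120, 140),
  stem    = c(30, 36, 31, 38, 27, 39)
)

# Pearson correlation: strength of the linear association
r <- cor(toy$enroll, toy$stem)

# Simple linear regression, the same model geom_smooth(method = "lm") fits
fit <- lm(stem ~ enroll, data = toy)

# Residual = observed minus fitted; negative values lie below the line
toy$residual <- resid(fit)
below_line <- toy$country[toy$residual < 0]

print(round(r, 2))
print(toy[order(toy$residual), ])
```

In the report's own data, the same two steps could be run on `merged_female` with `Enroll_Female` and `STEM_Rate`, which would give Korea's distance from the fitted line as a number rather than a visual impression.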
To visualize the data effectively, we prioritized clarity and intuition over complexity. Although we initially experimented with dumbbell charts, we found them too cluttered for comparing multiple fields simultaneously. Consequently, we selected a diverging bar chart as the most appropriate visualization method. Furthermore, to reduce redundancy—since the sum of female and male percentages naturally equals 100%—we chose not to display both raw figures. Instead, we defined and visualized a “Skewed Gap” metric (calculated as the female percentage minus 50%). This approach explicitly highlights both the magnitude and direction of gender segregation, allowing for an immediate visual understanding of which gender dominates a specific field of study.
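To make the metric concrete, here is a small worked example. It uses the lowest and highest female shares reported for Korea in the descriptive statistics table (22.50% and 76.17%) together with the matching male shares; the labels `lowest_field` and `highest_field` are placeholders, since that table does not name the fields.

```r
# Female and male shares (%) for Korea's two extreme fields,
# taken from the Min/Max columns of the descriptive statistics table
female <- c(lowest_field = 22.50, highest_field = 76.17)
male   <- c(lowest_field = 77.50, highest_field = 23.83)

# Gap: distance of the female share from parity (50%)
gap <- female - 50

# Sign of the gap gives the direction of the imbalance
dominance <- ifelse(gap > 0, "Female-Dominated", "Male-Dominated")

print(data.frame(female, male, gap, dominance))
```

Because the two shares sum to 100%, the male-side gap is simply the negative of the female-side gap, which is why plotting one signed value per field loses no information.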
# Data Processing for Figure 3
# Data Cleaning
clean_kpg <- df_kpg %>%
filter(Mobility == "Total") %>%
filter(Sex %in% c("Female", "Male")) %>%
select(Country = `Reference area`,
Field = `Field of education`,
Sex,
Percentage = `OBS_VALUE`) %>%
mutate(Percentage = as.numeric(Percentage)) %>%
filter(!is.na(Percentage))
# Reshape to Wide Format & Calculate Gender Gap
kpg_gap <- clean_kpg %>%
pivot_wider(names_from = Sex, values_from = Percentage) %>%
mutate(Gap = Female - 50) %>%
mutate(Dominance = ifelse(Gap > 0, "Female-Dominated", "Male-Dominated"))
# Visualization Figure 3
p3 <- ggplot(kpg_gap, aes(x = reorder(Field, Gap), y = Gap, fill = Dominance)) +
geom_col(width = 0.7) +
facet_wrap(~ Country, ncol = 3) +
coord_flip() +
scale_fill_manual(values = c("Female-Dominated" = "#FF9999",
"Male-Dominated" = "#6699CC")) +
scale_x_discrete(labels = function(x) str_wrap(x, width = 30)) +
geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
labs(title = "Gender Gap by Field of Study (2023)",
subtitle = "Right (Red) = Female Majority / Left (Blue) = Male Majority",
x = "",
y = "Gender Gap (Female Share - 50%)",
fill = "Field Dominance",
caption = "Data Source: OECD") +
theme_bw() +
theme(
strip.text = element_text(size = 12, face = "bold"),
legend.position = "bottom",
axis.text.y = element_text(size = 8)
)
p3
The analysis reveals a consistent pattern of “gender segregation” across fields of study, which is observable regardless of the specific country in question. Specifically, fields such as Education, Healthcare, and Humanities are heavily female-dominated, whereas Engineering, ICT, and Manufacturing remain overwhelmingly male-dominated. This persistent structural distinction implies that simply increasing overall educational attainment does not automatically guarantee higher female participation in STEM industries. Instead, our findings suggest that deeply rooted gender-based structures, such as socio-cultural norms or labor market conditions, may play a more critical role in shaping these occupational outcomes than educational access alone.
In conclusion, our analysis demonstrates that simply raising women’s educational attainment is insufficient to achieve gender equality in STEM fields. While tertiary enrollment rates for women have increased globally, a distinct “gendered trend” persists across different fields of study.
Specifically, our focus-country analysis (Figure 3) reveals a deeply rooted occupational segregation: fields such as Education, Healthcare, and Humanities remain heavily female-dominated, whereas STEM and engineering-related fields are overwhelmingly male-dominated. This structural barrier explains the paradox observed in Korea, where top-tier female enrollment rates do not translate into STEM participation.
Therefore, we suggest that closing the gender gap requires more than just educational access. Future interventions should address broader structural factors, such as socio-cultural norms or labor market conditions, that perpetuate the gender division of labor.