3.1 Main Variables
From the original KOSIS table, we construct a tidy dataset with the following variables:
- time_code: Original KOSIS time code (e.g., “2024.10”).
- time: Converted date variable (first day of each month).
- age_group: Age category; for our main analysis we use “Age 15–29”.
- sex: Total, Male, Female (we focus on “Total”).
- indicator: One of three labor market indicators:
  - Employment rate (%)
  - Unemployment rate (%)
  - Labor force participation rate (%)
- value: Numeric value of the indicator, in percent.
In the code below, we reshape the raw table into a tidy data frame and create English labels for the key indicators and categories.
# 1. extract indicator labels from first row and reshape
header_long <- youth_raw %>%
slice(1) %>%
pivot_longer(
cols = -c(성별, 연령계층별),
names_to = "time_code",
values_to = "indicator_label"
)
# 2. reshape numeric data rows
data_long <- youth_raw %>%
slice(-1) %>%
pivot_longer(
cols = -c(성별, 연령계층별),
names_to = "time_code",
values_to = "value"
)
# 3. create tidy dataset with English labels
youth_tidy <- data_long %>%
fill(성별, .direction = "down") %>%
left_join(header_long %>% select(time_code, indicator_label),
by = "time_code") %>%
mutate(
value = as.numeric(gsub(",", "", value)),
indicator = case_when(
indicator_label == "고용률 (%)" ~ "Employment rate",
indicator_label == "실업률 (%)" ~ "Unemployment rate",
indicator_label == "경제활동참가율 (%)" ~ "Labor force participation rate",
TRUE ~ NA_character_
),
sex = case_when(
성별 == "계" ~ "Total",
성별 == "남자" ~ "Male",
성별 == "여자" ~ "Female",
TRUE ~ as.character(성별)
),
age_group = case_when(
연령계층별 == "15 - 29세" ~ "Age 15–29",
TRUE ~ as.character(연령계층별)
)
) %>%
filter(!is.na(indicator))
# 4. keep youth age 15–29 and total sex
youth_15_29_total <- youth_tidy %>%
filter(age_group == "Age 15–29", sex == "Total")
# 5. create date variable
youth_15_29_total_time <- youth_15_29_total %>%
mutate(
year = substr(time_code, 1, 4),
month = substr(time_code, 6, 7),
time = as.Date(paste(year, month, "01", sep = "-"))
) %>%
arrange(time)
# 6. summary statistics for key indicators
youth_summary <- youth_15_29_total_time %>%
group_by(indicator) %>%
summarise(
mean = mean(value, na.rm = TRUE),
sd = sd(value, na.rm = TRUE),
min = min(value, na.rm = TRUE),
max = max(value, na.rm = TRUE),
.groups = "drop"
)
youth_summary
## # A tibble: 3 × 5
## indicator mean sd min max
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Employment rate 45.2 0.566 44.3 46.2
## 2 Labor force participation rate 48.0 0.687 47.1 49.5
## 3 Unemployment rate 5.99 0.876 4.8 7.5
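As a rough plausibility check on the summary table, note that the three indicators are linked by definition: the employment rate equals the labor force participation rate multiplied by one minus the unemployment rate (expressed as a share). The sketch below applies this identity to the rounded means reported above. The helper name `implied_employment_rate` is our own and is not part of the original analysis. The identity holds exactly only month by month, so for averages we expect approximate agreement.

```r
# Hypothetical helper (not in the original analysis): employment rate implied
# by the participation rate and the unemployment rate, all in percent.
implied_employment_rate <- function(lfpr, unemp) {
  lfpr * (1 - unemp / 100)
}

# Rounded means taken from the summary table above
mean_lfpr  <- 48.0
mean_unemp <- 5.99
mean_emp   <- 45.2

implied <- implied_employment_rate(mean_lfpr, mean_unemp)
gap     <- mean_emp - implied

round(implied, 2)  # implied employment rate, in percent
round(gap, 2)      # difference from the reported mean, in percentage points
```

The implied value is about 45.1%, which is within 0.1 percentage points of the reported mean of 45.2%, so the three summary rows are mutually consistent.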
3.2 Distribution of Youth Labor Market Indicators (Age 15–29, Total)
To obtain a first impression of the recent labor market conditions for youths aged 15–29, we look at the distributions of the employment rate, unemployment rate, and labor force participation rate across the one-year period.
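ggplot(youth_15_29_total_time, aes(x = value)) +
# Draw the histogram on the density scale so that it matches the density
# curve and the "Density" y-axis label (the default y would be counts)
geom_histogram(aes(y = after_stat(density)), bins = 10) +
geom_density(linewidth = 1) +
facet_wrap(~ indicator, scales = "free") +
labs(
title = "Distribution of youth labor market indicators (Age 15–29, Total)",
x = "Value (%)",
y = "Density"
)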
These descriptive results provide a baseline overview of how high or low
the three indicators tend to be for youths aged 15–29 and how much they
fluctuate from month to month during the period from October 2024 to
October 2025.
4. Results and Visualizations
In this section, we present visualizations to address our research questions. Unless noted otherwise, the figures focus on Age 15–29, Total; Sections 4.2 and 4.3 also compare other youth age groups and the two sexes.
4.1 Monthly Trends for Youths Aged 15–29
We first examine how the three indicators change over time for youths aged 15–29 (both sexes combined) during the one-year period.
ggplot(youth_15_29_total_time,
aes(x = time, y = value, color = indicator)) +
geom_line(linewidth = 1) +
scale_x_date(date_breaks = "1 month", date_labels = "%y-%m") +
labs(
title = "Monthly trends in youth labor indicators (Age 15–29, Total)",
x = "Month",
y = "Value (%)",
color = "Indicator"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
This figure shows how the employment rate, unemployment rate, and labor
force participation rate for youths aged 15–29 evolve from October 2024
to October 2025. By visually comparing the three series, we can see
whether there are months with particularly high or low unemployment and
how stable the overall youth labor market appears in the short term.
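The months with the highest and lowest values can also be read off programmatically rather than by eye. The following is a minimal sketch on a toy data frame that has the same columns as `youth_15_29_total_time`; the values are invented for illustration and are not the KOSIS figures, and the object names `toy` and `extremes` are hypothetical.

```r
library(dplyr)

# Toy data with the same columns as youth_15_29_total_time.
# The values are invented for illustration and are NOT the KOSIS figures.
toy <- tibble(
  time      = as.Date(c("2024-10-01", "2024-11-01", "2024-12-01", "2025-01-01")),
  indicator = "Unemployment rate",
  value     = c(5.5, 5.1, 5.9, 6.0)
)

# For each indicator, keep the month with the highest and the lowest value
extremes <- toy %>%
  group_by(indicator) %>%
  summarise(
    month_max = time[which.max(value)],
    value_max = max(value),
    month_min = time[which.min(value)],
    value_min = min(value),
    .groups = "drop"
  )

extremes
```

Replacing `toy` with `youth_15_29_total_time` should give one row per indicator, which makes it easy to check whether the peaks and troughs of the three series fall in the same months.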
4.2 Comparison of Youth Age Groups and Sex (Bar Chart)
Although our main focus is youths aged 15–29 in total, it is still
useful to briefly compare different youth-related age groups and sex in
the most recent month of the dataset.
The bar chart below shows employment, unemployment, and labor force
participation rates for several youth age bands (15–19, 15–24, 15–29,
20–29) by sex.
# 1. Select youth-related age groups available in the raw data
target_age_raw <- c("15 - 19세", "15 - 24세", "15 - 29세", "20 - 29세")
age_sex_data <- youth_tidy %>%
# Keep only the age groups we want to compare
filter(연령계층별 %in% target_age_raw,
sex %in% c("Male", "Female")) %>% # Compare males and females
mutate(
# Create readable English labels for age groups
age_group_en = case_when(
연령계층별 == "15 - 19세" ~ "Age 15–19",
연령계층별 == "15 - 24세" ~ "Age 15–24",
연령계층별 == "15 - 29세" ~ "Age 15–29",
연령계층별 == "20 - 29세" ~ "Age 20–29",
TRUE ~ as.character(연령계층별)
),
# Convert time_code (e.g., "2024.10") to a Date object (e.g., 2024-10-01)
year = substr(time_code, 1, 4),
month = substr(time_code, 6, 7),
time = as.Date(paste(year, month, "01", sep = "-"))
)
# 2. Find the most recent month in the dataset
latest_time <- max(age_sex_data$time, na.rm = TRUE)
# 3. Compute one value per indicator × age group × sex for the latest month
latest_summary <- age_sex_data %>%
filter(time == latest_time) %>%
group_by(indicator, age_group_en, sex) %>%
summarise(
value = mean(value, na.rm = TRUE),
.groups = "drop"
)
# 4. Draw a bar chart showing youth indicators by age group and sex
ggplot(latest_summary,
aes(x = age_group_en, y = value, fill = sex)) +
geom_col(position = "dodge") +
facet_wrap(~ indicator, scales = "free_y") +
labs(
title = "Youth labor market indicators by age group and sex (latest month)",
x = "Age group",
y = "Value (%)",
fill = "Sex"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1)
)
In the latest month, older youths (20–29) generally show higher employment and labor force participation rates than teenagers (15–19), while the unemployment rate differences between males and females are relatively small for most age groups.
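To put a number on "relatively small", the male-female gap can be computed directly from a table shaped like `latest_summary`. The sketch below uses a toy stand-in with invented values (not the KOSIS figures); `toy_latest`, `sex_gap`, and `gap_pp` are hypothetical names introduced only for this illustration.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for latest_summary; values are invented, NOT the KOSIS figures.
toy_latest <- tibble(
  indicator    = "Unemployment rate",
  age_group_en = rep(c("Age 15–19", "Age 20–29"), each = 2),
  sex          = rep(c("Male", "Female"), times = 2),
  value        = c(6.0, 5.5, 5.8, 5.6)
)

# One row per indicator × age group, with the male-minus-female gap
# expressed in percentage points
sex_gap <- toy_latest %>%
  pivot_wider(names_from = sex, values_from = value) %>%
  mutate(gap_pp = Male - Female)

sex_gap
```

Applied to the real `latest_summary`, a table like this would let the reader judge the size of the sex differences without estimating bar heights from the chart.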
4.3 Relationships Among Indicators (Age 15–29, Total)
To explore how the three indicators relate to each other over time, we construct scatterplots where each point represents a specific month. Points are colored by age group (15–19, 15–29, and 20–29) and panels are split by sex, so the 15–29 Total series can be read alongside the other groups. We also label each point with a simple time code (e.g., “24-10”) to show the temporal order.
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggrepel)
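# 1. Choose age groups to compare (use the labels that actually exist in `age_group`)
# Only "15 - 29세" was recoded to English when building youth_tidy, so the
# other groups still carry their raw Korean labels
target_age_groups <- c("15 - 19세", "Age 15–29", "20 - 29세")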
# 2. Build wide-format data: one row per age_group × sex × month
scatter_data <- youth_tidy %>%
mutate(
# Create a simple label for each month (e.g., "24-10")
year = substr(time_code, 1, 4),
month = substr(time_code, 6, 7),
time_label = paste0(substr(year, 3, 4), "-", month)
) %>%
filter(age_group %in% target_age_groups) %>%
select(age_group, sex, time_label, indicator, value) %>%
pivot_wider(
names_from = indicator,
values_from = value
)
# 3-1. Employment rate vs Unemployment rate
p_emp_unemp <- ggplot(
scatter_data,
aes(
x = `Employment rate`,
y = `Unemployment rate`,
color = age_group,
label = time_label
)
) +
geom_point(na.rm = TRUE) +
geom_text_repel(size = 3, max.overlaps = 20, na.rm = TRUE) +
facet_wrap(~ sex) +
labs(
title = "Employment vs Unemployment rates by age group and sex",
x = "Employment rate (%)",
y = "Unemployment rate (%)",
color = "Age group"
) +
theme_minimal()
# 3-2. Labor force participation vs Unemployment rate
p_act_unemp <- ggplot(
scatter_data,
aes(
x = `Labor force participation rate`,
y = `Unemployment rate`,
color = age_group,
label = time_label
)
) +
geom_point(na.rm = TRUE) +
geom_text_repel(size = 3, max.overlaps = 20, na.rm = TRUE) +
facet_wrap(~ sex) +
labs(
title = "Labor force participation vs Unemployment rates by age group and sex",
x = "Labor force participation rate (%)",
y = "Unemployment rate (%)",
color = "Age group"
) +
theme_minimal()
# 3-3. Labor force participation vs Employment rate
p_act_emp <- ggplot(
scatter_data,
aes(
x = `Labor force participation rate`,
y = `Employment rate`,
color = age_group,
label = time_label
)
) +
geom_point(na.rm = TRUE) +
geom_text_repel(size = 3, max.overlaps = 20, na.rm = TRUE) +
facet_wrap(~ sex) +
labs(
title = "Labor force participation vs Employment rates by age group and sex",
x = "Labor force participation rate (%)",
y = "Employment rate (%)",
color = "Age group"
) +
theme_minimal()
# 4. Print all three plots
p_emp_unemp
p_act_unemp
p_act_emp
These scatterplots help us see whether higher labor force participation
is associated with higher employment for youths aged 15–29, and how
unemployment behaves relative to employment over time.
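A simple numerical companion to the scatterplots is a matrix of pairwise correlations between the three monthly series. The sketch below is illustrative only: it uses a toy long-format table with invented values (not the KOSIS figures), and the names `toy_long`, `toy_wide`, and `cor_matrix` are hypothetical. With only about a dozen monthly observations, such correlations are descriptive and should not be over-interpreted.

```r
library(dplyr)
library(tidyr)

# Toy long-format data; values are invented, NOT the KOSIS figures.
toy_long <- tibble(
  time_label = rep(c("24-10", "24-11", "24-12", "25-01"), times = 3),
  indicator  = rep(c("Employment rate", "Unemployment rate",
                     "Labor force participation rate"), each = 4),
  value      = c(45.6, 45.5, 44.7, 44.8,
                 5.5, 5.1, 5.9, 6.0,
                 48.3, 47.9, 47.5, 47.7)
)

# One row per month, one column per indicator
toy_wide <- toy_long %>%
  pivot_wider(names_from = indicator, values_from = value)

# Pairwise Pearson correlations between the three monthly series
cor_matrix <- toy_wide %>%
  select(-time_label) %>%
  cor(use = "pairwise.complete.obs")

round(cor_matrix, 2)
```

For the real data, the same steps could be applied to `scatter_data` after filtering to a single age group and sex, so that each row corresponds to one month.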
4.4 Turning Points in Youth Employment and Unemployment
Finally, we look for months with particularly large changes in the employment and unemployment rates for youths aged 15–29. These months can be considered short-term “turning points” in the recent youth labor market.
# keep recent period explicitly (already just 2024-10 to 2025-10, but filter for clarity)
youth_15_29_recent <- youth_15_29_total_time %>%
filter(time >= as.Date("2024-10-01"),
time <= as.Date("2025-10-01"))
# Helper: compute month-to-month changes and select turning points
get_turning_points <- function(df, indicator_name, n_points = 1) {
series <- df %>%
filter(indicator == indicator_name) %>%
arrange(time) %>%
mutate(
change = value - dplyr::lag(value)
)
top_inc <- series %>%
slice_max(change, n = n_points, with_ties = FALSE)
top_dec <- series %>%
slice_min(change, n = n_points, with_ties = FALSE)
list(series = series,
turning = bind_rows(top_inc, top_dec))
}
# Employment rate turning points
emp_list <- get_turning_points(youth_15_29_recent, "Employment rate", n_points = 1)
emp_series <- emp_list$series
emp_turning <- emp_list$turning
p_emp_turn <- ggplot(emp_series,
aes(x = time, y = value)) +
geom_line(linewidth = 1) +
geom_point() +
geom_point(data = emp_turning,
aes(x = time, y = value),
color = "blue",
size = 3) +
geom_text(
data = emp_turning,
aes(label = paste0(format(time, "%Y-%m"),
"\nΔ=", round(change, 1))),
vjust = -1,
size = 3
) +
scale_x_date(date_breaks = "1 month", date_labels = "%y-%m") +
labs(
title = "Employment rate (Age 15–29, Total) with largest monthly changes",
x = "Month",
y = "Employment rate (%)"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Unemployment rate turning points
unemp_list <- get_turning_points(youth_15_29_recent, "Unemployment rate", n_points = 1)
unemp_series <- unemp_list$series
unemp_turning <- unemp_list$turning
p_unemp_turn <- ggplot(unemp_series,
aes(x = time, y = value)) +
geom_line(linewidth = 1) +
geom_point() +
geom_point(data = unemp_turning,
aes(x = time, y = value),
color = "red",
size = 3) +
geom_text(
data = unemp_turning,
aes(label = paste0(format(time, "%Y-%m"),
"\nΔ=", round(change, 1))),
vjust = -1,
size = 3
) +
scale_x_date(date_breaks = "1 month", date_labels = "%y-%m") +
labs(
title = "Unemployment rate (Age 15–29, Total) with largest monthly changes",
x = "Month",
y = "Unemployment rate (%)"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
p_emp_turn
p_unemp_turn
These annotated line plots highlight the months with the largest month-to-month increases and decreases in the youth employment and unemployment rates. They help identify specific points in time where the youth labor market changed more sharply than usual within this one-year window.
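To make the month-to-month change calculation concrete, the sketch below walks through it on four invented values (not the KOSIS figures). It mirrors the lag-based difference used in `get_turning_points()`; the names `toy_series`, `largest_rise`, and `largest_fall` are hypothetical. Note that the first month has no predecessor, so its change is `NA` and it can never be selected as a turning point.

```r
library(dplyr)

# Invented employment-rate values for four consecutive months (NOT KOSIS data)
toy_series <- tibble(
  time  = as.Date(c("2024-10-01", "2024-11-01", "2024-12-01", "2025-01-01")),
  value = c(45.0, 44.6, 45.3, 45.1)
) %>%
  arrange(time) %>%
  # Change relative to the previous month; NA for the first month
  mutate(change = value - dplyr::lag(value))

# Month with the largest increase and month with the largest decrease
largest_rise <- toy_series %>% slice_max(change, n = 1, with_ties = FALSE)
largest_fall <- toy_series %>% slice_min(change, n = 1, with_ties = FALSE)

toy_series
largest_rise
largest_fall
```

In this toy series the largest rise is dated December 2024 and the largest fall November 2024, because each change is attributed to the month in which the new value is observed. The same convention applies to the labels in the annotated plots.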
Possible explanations for the February and May turning points.
Our dataset alone does not allow us to identify causal mechanisms, but
we can cautiously interpret the largest month-to-month changes by
referring to official employment statistics and typical seasonal
patterns in the Korean labor market.
According to Statistics Korea’s “Employment Trend, February 2025,” the unemployment rate for youths aged 15–29 rose to around 7%, while the overall unemployment rate for the total population stayed roughly flat at about 3%. In the official commentary, this spike in youth unemployment was linked to continued weakness in some sectors such as manufacturing, construction, wholesale/retail trade, and accommodation and food services, as well as an increase in young people who temporarily left the labor force or reported that they were “resting.” Combined with the winter off-season—when many part-time and service jobs disappear—this helps explain why our graph shows a sharp jump in the 15–29 unemployment rate between January and February 2025.
By contrast, government summaries of the “Employment Trend, May 2025” describe an overall improvement in labor market conditions. The aggregate unemployment rate fell to the high-2% range, youth unemployment for ages 15–29 moved back down to the mid-6% range, and the employment rate for ages 15–64 increased compared with the previous year. Employment growth was concentrated in service and office-type industries such as health and social welfare, professional and scientific services, and finance and insurance. This pattern is consistent with our data, which show a noticeable drop in the youth unemployment rate and a peak in the employment rate around May 2025. In other words, the February spike appears to capture a period of accumulated weakness in youth jobs, while the May turning point reflects a short-term recovery as more young people were absorbed into expanding sectors.
Overall, these interpretations are plausible given the one-year aggregate data that we analyze. However, month-to-month fluctuations in youth unemployment can also be influenced by additional factors that are not visible in our dataset, such as policy changes (for example, new youth employment support programs), global economic conditions, or the ongoing restructuring of the labor market in the post-COVID period. A more in-depth explanation of the February and May turning points would therefore require combining our descriptive results with external reports or academic research on these broader contextual changes.