Wage Gender Gap Analysis

Author

Şilan Kılıçarslan

library(tidyverse)
library(readxl)
library(knitr)
library(scales)

# Confirm the data file is present, then read it once
list.files("Data")
wage_data <- read_excel("Data/Wage_GenderDS.xlsx")

# Add a readable gender label based on the Female indicator
wage_data <- wage_data |>
  mutate(Gender = if_else(Female == 1, "Women", "Men"))

glimpse(wage_data)
summary(wage_data)

Wage by Gender

Histogram of Wage

ggplot(wage_data, aes(x = Wage)) +
  geom_histogram(bins = 30, color = "black", fill = "lightblue") +
  labs(
    title = "Histogram of Wage",
    x = "Wage",
    y = "Frequency"
  ) +
  theme_minimal()

Explanation of the histogram:
The distribution of wages is right-skewed. Most individuals earn relatively low or moderate wages, while a smaller number of individuals earn very high wages, creating a long right tail in the distribution.

Boxplot of Wage by Gender

ggplot(wage_data, aes(x = Gender, y = Wage)) + geom_boxplot()

Explanation of the boxplot:
Men have a higher median wage than women. The interquartile range is also wider for men, indicating greater variability in wages. Men also have more outliers, and more extreme ones, than women.

Summary Statistics

wage_data |>
  group_by(Gender) |>
  summarise(
    mean = mean(Wage),
    median = median(Wage),
    sd = sd(Wage),
    min = min(Wage),
    max = max(Wage)
  )

mean_men <- mean(wage_data$Wage[wage_data$Gender == "Men"])
mean_women <- mean(wage_data$Wage[wage_data$Gender == "Women"])

mean_men - mean_women

Explanation of the statistics:
On average, men earn about 27.81 dollars more per hour than women; this difference in means is the raw wage gap.

Log Wage

wage_data <- wage_data |>
  mutate(l_wage = log(Wage))

ggplot(wage_data, aes(x = l_wage)) +
  geom_histogram(bins = 30)

Explanation of the histogram:
Compared to the raw wage distribution, the log-transformed wages are more symmetric and closer to a normal distribution. The log transformation reduces the impact of very high wage values and makes the distribution more balanced.

Log Wage by Gender

ggplot(wage_data, aes(x = Gender, y = l_wage, fill = Gender)) + geom_boxplot()

Explanation of the boxplot:
The wage gap between men and women is still visible after the log transformation, as men have a higher median log wage. However, the distribution is more compressed and less affected by extreme values compared to the raw wage distribution.

mean_log_men <- mean(wage_data$l_wage[wage_data$Gender == "Men"])
mean_log_women <- mean(wage_data$l_wage[wage_data$Gender == "Women"])

100 * (mean_log_men - mean_log_women)

Approximate percentage gap:
The approximate percentage wage gap is about 25.06%, meaning that men earn roughly 25% more than women on average.
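The 25.06% figure relies on the rule that 100 times a difference in logs approximates a percentage difference. As a rough check, the exact conversion can be sketched as follows. The name `log_gap` is introduced here for illustration, and its value is simply the 0.2506 log difference implied by the 25.06% reported above.

```r
# Difference in mean log wages implied by the 25.06% figure in the text.
log_gap <- 0.2506

# Approximation used in the text: 100 * difference in logs.
approx_pct <- 100 * log_gap

# Exact conversion of a log difference into a percentage difference.
exact_pct <- 100 * (exp(log_gap) - 1)

round(c(approximate = approx_pct, exact = exact_pct), 2)
```

The approximation is close for small gaps and drifts as the gap grows, so for a gap of this size the exact figure comes out a few points higher. Strictly speaking, a difference in mean logs compares geometric rather than arithmetic mean wages.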

Education Levels by Gender

table(wage_data$Gender, wage_data$Educ)

Most common education level among men and women:
The most common education level among both women and men is level 1. However, men are better represented at the higher education levels (3 and 4), while women are concentrated at the lower levels (1 and 2).
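Raw counts can be hard to compare when the two groups differ in size, so one option is to look at row shares instead. The sketch below uses a small made-up data frame named `toy` (hypothetical values, not the dataset analysed here) with the same column names, and base R's `prop.table()`.

```r
# Hypothetical toy data (not the author's dataset), using the same column names.
toy <- data.frame(
  Gender = c("Men", "Men", "Men", "Men", "Women", "Women", "Women"),
  Educ   = c(1, 1, 3, 4, 1, 1, 2)
)

# Raw counts by gender and education level.
counts <- table(toy$Gender, toy$Educ)

# Row shares: each row sums to 1, so groups of different sizes are comparable.
shares <- prop.table(counts, margin = 1)
round(shares, 2)
```

With the real data, the same call would presumably be `prop.table(table(wage_data$Gender, wage_data$Educ), margin = 1)`.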

Part-time Work by Gender

wage_data |> group_by(Gender) |> summarise(parttime_rate = mean(Parttime == 1))

How might differences in part-time work affect the wage gap?

Women are substantially more likely to work part-time than men (56% vs. 22.5%). Since part-time jobs typically pay less, this difference may contribute to the observed wage gap between men and women.

Age Distribution

wage_data |>
  group_by(Gender) |>
  summarise(
    mean_age = mean(Age),
    median_age = median(Age)
  )

Age Comparison by Gender

The mean and median ages of men and women are very similar. Therefore, age is unlikely to explain a significant part of the wage gap.

Why Use Log(Wage)?

Economists often use log(wage) instead of raw wages for two main reasons. First, the log transformation makes the distribution more symmetric by reducing the influence of very high wage values. Second, differences in log wages can be interpreted as approximate percentage differences, which makes wage gaps easier to analyze and compare.
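Both reasons can be illustrated with a short simulation. This is only a sketch: the wages are simulated from a log-normal distribution (an assumption made for illustration, not a claim about the real data), and `skewness()` is a small helper defined here rather than a function from a package.

```r
set.seed(1)  # reproducible simulated data, not the author's dataset

# Simulated right-skewed "wages" (log-normal is an illustrative assumption).
sim_wage <- rlnorm(5000, meanlog = 3, sdlog = 0.6)

# Simple moment-based sample skewness (hypothetical helper).
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

# First reason: skewness is clearly positive for raw values, near zero after logs.
c(raw = skewness(sim_wage), log = skewness(log(sim_wage)))

# Second reason: a 10% wage difference is about 0.095 in logs.
log(110) - log(100)
```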

Is the Raw Wage Gap the Same as Discrimination?

No, the raw wage gap is not necessarily the same as discrimination. Differences in education levels and part-time work between men and women may explain part of the wage gap. For example, women are more likely to work part-time and are less represented in higher education levels, which can lead to lower average wages. Therefore, the observed wage gap likely reflects both differences in characteristics and potential discrimination.
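One common way to separate these two sources is to compare the raw gap with a gap adjusted for observed characteristics in a regression. The sketch below is not part of the original analysis: it uses simulated data with made-up coefficients, and only the variable names (`Female`, `Parttime`, `Educ`, `Age`, `l_wage`) and the part-time shares are borrowed from this report.

```r
set.seed(42)  # simulated data for illustration only, not the author's dataset
n <- 2000

Female   <- rbinom(n, 1, 0.5)
# Part-time rates loosely follow the shares reported above (56% vs 22.5%).
Parttime <- rbinom(n, 1, ifelse(Female == 1, 0.56, 0.225))
Educ     <- sample(1:4, n, replace = TRUE)
Age      <- round(runif(n, 20, 60))

# Assumed data-generating process: every coefficient here is made up.
l_wage <- 3 + 0.10 * Educ + 0.005 * Age - 0.15 * Parttime - 0.10 * Female +
  rnorm(n, sd = 0.3)

sim <- data.frame(l_wage, Female, Parttime, Educ, Age)

# Raw gap: log wage on gender only.
raw_fit <- lm(l_wage ~ Female, data = sim)

# Adjusted gap: same comparison, holding education, part-time status, and age fixed.
adjusted_fit <- lm(l_wage ~ Female + factor(Educ) + Parttime + Age, data = sim)

c(raw = unname(coef(raw_fit)["Female"]),
  adjusted = unname(coef(adjusted_fit)["Female"]))
```

In this simulation the adjusted gap is smaller than the raw gap because part of the raw gap comes from part-time work. A gap that remains after controls is still not proof of discrimination, since unobserved characteristics may also differ between the groups.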

AI Use

| Tool Used | Prompt Given | How You Verified or Modified the Output |
| --- | --- | --- |
| ChatGPT | “How do I create a log(wage) variable in R?” | I applied the transformation and verified the results in my dataset. |
| ChatGPT | “How can I calculate mean wages and the wage gap between men and women in R?” | I ran the calculations myself and confirmed the outputs. |
| ChatGPT | “How should I interpret the histogram and boxplot results for wage distribution?” | I reviewed the graphs and wrote the explanations based on my own understanding. |