Wage Gender Gap Analysis
library(tidyverse)
library(readxl)
library(knitr)
library(scales)

# Confirm the data file is present, then load it once
list.files("Data")
wage_data <- read_excel("Data/Wage_GenderDS.xlsx")

# Label gender from the Female indicator (1 = Women, otherwise Men)
wage_data <- wage_data |>
  mutate(Gender = if_else(Female == 1, "Women", "Men"))

glimpse(wage_data)
summary(wage_data)
Wage by Gender
Histogram of Wage
ggplot(wage_data, aes(x = Wage)) +
  geom_histogram(bins = 30, color = "black", fill = "lightblue") +
  labs(
    title = "Histogram of Wage",
    x = "Wage",
    y = "Frequency"
  ) +
  theme_minimal()
Explanation of the histogram:
The distribution of wages is right-skewed. Most individuals earn relatively low or moderate wages, while a smaller number of individuals earn very high wages, creating a long right tail in the distribution.
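A quick numeric check of right skew is to compare the mean with the median: a long upper tail pulls the mean above the median. The sketch below illustrates this on a small made-up vector (`toy_wage` is hypothetical and is not the assignment data), using base R only.

```r
# Hypothetical wages: mostly moderate values plus a few very high ones
toy_wage <- c(12, 14, 15, 15, 16, 18, 20, 22, 25, 60, 110)

toy_mean   <- mean(toy_wage)
toy_median <- median(toy_wage)

# TRUE when the upper tail pulls the mean above the median
toy_mean > toy_median
```

On the real data, the same comparison would be `mean(wage_data$Wage) > median(wage_data$Wage)`.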
## Boxplot of Wage by Gender
ggplot(wage_data, aes(x = Gender, y = Wage)) + geom_boxplot()
Explanation of the boxplot:
Men have a higher median wage than women. The interquartile range is also wider for men, indicating greater variability in wages. Additionally, men have more and higher outliers compared to women.
Summary Statistics
wage_data |>
  group_by(Gender) |>
  summarise(
    mean = mean(Wage),
    median = median(Wage),
    sd = sd(Wage),
    min = min(Wage),
    max = max(Wage)
  )
mean_men <- mean(wage_data$Wage[wage_data$Gender == "Men"])
mean_women <- mean(wage_data$Wage[wage_data$Gender == "Women"])
mean_men - mean_women
Explanation of the statistics:
The raw wage gap is approximately 27.81 dollars per hour: on average, men earn that much more than women.
Log Wage
wage_data <- wage_data |> mutate(l_wage = log(Wage))
ggplot(wage_data, aes(x = l_wage)) + geom_histogram(bins = 30)
Explanation of the histogram:
Compared to the raw wage distribution, the log-transformed wages are more symmetric and closer to a normal distribution. The log transformation reduces the impact of very high wage values and makes the distribution more balanced.
Log Wage by Gender
ggplot(wage_data, aes(x = Gender, y = l_wage, fill = Gender)) + geom_boxplot()
Explanation of the boxplot:
The wage gap between men and women is still visible after the log transformation, as men have a higher median log wage. However, the distribution is more compressed and less affected by extreme values compared to the raw wage distribution.
mean_log_men <- mean(wage_data$l_wage[wage_data$Gender == "Men"])
mean_log_women <- mean(wage_data$l_wage[wage_data$Gender == "Women"])
100 * (mean_log_men - mean_log_women)
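Approximate percentage gap:
The approximate percentage wage gap is about 25.06% (100 times the difference in mean log wages), meaning that men earn roughly 25% more than women on average. Because a log difference only approximates a percentage change, this figure slightly understates the exact gap.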
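To see how close the approximation is, the log gap can be converted to an exact percentage with `exp()`. The sketch below hard-codes the 25.06 figure reported above rather than reading the data, and `log_gap` and `exact_pct` are names introduced here for illustration. The result compares geometric mean wages, since it is based on a difference of mean logs.

```r
# Gap in log points, taken from the result reported above (25.06 / 100)
log_gap <- 0.2506

# Exact percentage difference implied by that log gap
exact_pct <- 100 * (exp(log_gap) - 1)
round(exact_pct, 1)
```

This gives roughly 28.5%, a few points above the 25% approximation. The two figures drift further apart as the log gap grows.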
Education Levels by Gender
table(wage_data$Gender, wage_data$Educ)
Most common education level among men and women:
The most common education level among both women and men is level 1. However, men are better represented in higher education levels (3 and 4), while women are concentrated in lower levels (1 and 2).
## Part-time Work by Gender
wage_data |> group_by(Gender) |> summarise(parttime_rate = mean(Parttime == 1))
How might differences in part-time work affect the wage gap?
Women are far more likely to work part-time than men (56% vs. 22.5%). Since part-time jobs typically pay less, this difference may contribute to the observed wage gap between men and women.
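One way to probe this composition effect is to compare mean wages by gender within each part-time status, instead of only overall. The sketch below uses a tiny made-up data frame (`toy` is hypothetical, with values chosen for illustration, and is not the assignment data) and base R only.

```r
# Hypothetical toy data, NOT the assignment dataset
toy <- data.frame(
  Gender   = c("Men", "Men", "Men", "Men", "Women", "Women", "Women", "Women"),
  Parttime = c(0, 0, 0, 1, 0, 1, 1, 1),
  Wage     = c(30, 32, 34, 20, 30, 19, 20, 21)
)

# Overall mean wage by gender
overall <- tapply(toy$Wage, toy$Gender, mean)

# Mean wage by gender within each part-time status
within <- aggregate(Wage ~ Parttime + Gender, data = toy, FUN = mean)

overall
within
```

In this made-up example the overall gap is much larger than the gap inside either group, because more of the women work part-time. On the real data, a tidyverse equivalent would be `wage_data |> group_by(Gender, Parttime) |> summarise(mean_wage = mean(Wage))`.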
Age Distribution
wage_data |>
  group_by(Gender) |>
  summarise(
    mean_age = mean(Age),
    median_age = median(Age)
  )
Age Comparison by Gender
The mean and median ages of men and women are very similar. Therefore, age is unlikely to explain a significant part of the wage gap.
Why Use Log(Wage)?
Economists often use log(wage) instead of raw wages for two main reasons. First, the log transformation makes the distribution more symmetric by reducing the impact of very high wage values. Second, differences in log wages can be interpreted as approximate percentage differences, which makes it easier to analyze and compare wage gaps.
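The percentage interpretation follows from a standard approximation. As a short sketch, suppose one wage exceeds another by a proportion $g$, so that $W_1 = (1 + g)\,W_0$ (the symbols $W_0$, $W_1$, and $g$ are introduced here for illustration). Then:

$$
\ln W_1 - \ln W_0 = \ln(1 + g) \approx g \quad \text{for small } g, \qquad g = e^{\,\ln W_1 - \ln W_0} - 1 \text{ exactly.}
$$

The approximation is good for gaps of a few percent and becomes less accurate as the gap grows.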
Is the Raw Wage Gap the Same as Discrimination?
No, the raw wage gap is not necessarily the same as discrimination. Differences in education levels and part-time work between men and women may explain part of the wage gap. For example, women are more likely to work part-time and are less represented in higher education levels, which can lead to lower average wages. Therefore, the observed wage gap likely reflects both differences in characteristics and potential discrimination.
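A common way to separate these two sources is to compare the raw gap with a gap that holds characteristics fixed, for example with a regression of log wage on gender and a control. The sketch below is an illustration on made-up data, not part of the original analysis: `toy` is hypothetical, and only `Parttime` is included as a control to keep the example short.

```r
# Hypothetical toy data, NOT the assignment dataset
toy <- data.frame(
  Female   = c(0, 0, 0, 0, 1, 1, 1, 1),
  Parttime = c(0, 0, 0, 1, 0, 1, 1, 1),
  Wage     = c(30, 32, 34, 20, 30, 19, 20, 21)
)
toy$l_wage <- log(toy$Wage)

# Raw gap: difference in mean log wage between women and men
raw_fit <- lm(l_wage ~ Female, data = toy)

# Adjusted gap: the same difference, holding part-time status fixed
adj_fit <- lm(l_wage ~ Female + Parttime, data = toy)

coef(raw_fit)["Female"]
coef(adj_fit)["Female"]
```

In this made-up data the `Female` coefficient shrinks toward zero once part-time status is controlled for. Whatever gap remains after adding controls is unexplained by those controls, which is still not proof of discrimination, since unmeasured characteristics may also differ.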
## AI Use
| Tool Used | Prompt Given | How You Verified or Modified the Output |
|---|---|---|
| ChatGPT | “How do I create a log(wage) variable in R?” | I applied the transformation and verified the results in my dataset |
| ChatGPT | “How can I calculate mean wages and the wage gap between men and women in R?” | I ran the calculations myself and confirmed the outputs |
| ChatGPT | “How should I interpret the histogram and boxplot results for wage distribution?” | I reviewed the graphs and wrote the explanations based on my own understanding |