## AI Use Log
I used ChatGPT to help me structure the Quarto document, generate R code for the requested graphs and summary tables, and improve the wording of my explanations. I checked the outputs myself and interpreted the results based on the dataset.
## Setup
library(readxl)
library(dplyr)
library(ggplot2)
library(knitr)

# Read the dataset
df <- read_excel("Wage Gender.xlsx")

# Check variable names
names(df)
ggplot(df, aes(x = Wage)) +
  geom_histogram(bins = 30, fill = "skyblue", color = "black") +
  labs(
    title = "Histogram of Raw Hourly Wage",
    x = "Wage",
    y = "Frequency"
  ) +
  theme_minimal()
Explanation:
The histogram of raw hourly wage is expected to be right-skewed. This means that most people earn wages around the lower-to-middle range, while a smaller number of people earn much higher wages, creating a long right tail.
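One rough way to check the right-skew claim numerically is to compare the mean and the median: in a right-skewed distribution the long upper tail pulls the mean above the median. The sketch below assumes a hypothetical helper name, `skew_check`, and uses made-up toy wages rather than values from the dataset.

```r
# Hypothetical helper (not part of the original analysis):
# compares mean and median as a rough indicator of skewness.
skew_check <- function(x) {
  x <- x[!is.na(x)]
  c(
    mean = mean(x),
    median = median(x),
    mean_minus_median = mean(x) - median(x)
  )
}

# Toy wages, made up for illustration only
toy_wage <- c(10, 12, 13, 15, 18, 22, 60)
skew_check(toy_wage)
```

On the real data the call would be `skew_check(df$Wage)`. A clearly positive `mean_minus_median` is consistent with a right-skewed histogram, although it is not a formal test.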
2. Boxplot of Wage by Gender
ggplot(df, aes(x = Gender, y = Wage)) +
  geom_boxplot(fill = "lightgreen") +
  labs(
    title = "Boxplot of Wage by Gender",
    x = "Gender",
    y = "Wage"
  ) +
  theme_minimal()
Explanation:
This boxplot compares the wage distributions of men and women. I will compare the median, interquartile range, and outliers based on the graph. If men’s median wage is higher, this suggests a raw wage gap in favor of men. A larger IQR would indicate more variation in wages within that group.
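The quantities read off the boxplot can also be computed directly. The following is a minimal sketch, assuming the columns are named `Gender` and `Wage` as in the plotting code. The helper name `box_stats` and the toy data frame are hypothetical and only serve to make the example runnable on its own.

```r
library(dplyr)

# Hypothetical helper: the per-group statistics that a boxplot displays
box_stats <- function(data) {
  data %>%
    group_by(Gender) %>%
    summarise(
      median_wage = median(Wage, na.rm = TRUE),
      q1 = quantile(Wage, 0.25, na.rm = TRUE, names = FALSE),
      q3 = quantile(Wage, 0.75, na.rm = TRUE, names = FALSE),
      iqr_wage = IQR(Wage, na.rm = TRUE),
      .groups = "drop"
    )
}

# Toy data, made up for illustration only
toy <- data.frame(
  Gender = c("Men", "Men", "Men", "Women", "Women", "Women"),
  Wage = c(20, 30, 40, 15, 20, 25)
)
box_stats(toy)
```

With the real data, `box_stats(df)` would give the medians and IQRs to quote alongside the graph.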
The raw wage gap in dollars is calculated as the mean wage of men minus the mean wage of women. A positive number indicates that men earn more on average.
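The calculation described above can be sketched as follows. The helper name `raw_wage_gap` is hypothetical, and the code assumes that `Gender` takes the values "Men" and "Women", as in the tables later in this document. The toy data are made up.

```r
# Hypothetical helper: mean wage of men minus mean wage of women
raw_wage_gap <- function(data) {
  means <- tapply(data$Wage, data$Gender, mean, na.rm = TRUE)
  means[["Men"]] - means[["Women"]]
}

# Toy data, made up for illustration only
toy <- data.frame(
  Gender = c("Men", "Men", "Men", "Women", "Women", "Women"),
  Wage = c(20, 30, 40, 15, 20, 25)
)
raw_wage_gap(toy)  # men average 30, women average 20, so the gap is 10
```

On the real data the call would be `raw_wage_gap(df)`.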
## Part 2: Log Transformation
1. Histogram of log(Wage)
ggplot(df, aes(x = l_wage)) +
  geom_histogram(bins = 30, fill = "orange", color = "black") +
  labs(
    title = "Histogram of Log Wage",
    x = "log(Wage)",
    y = "Frequency"
  ) +
  theme_minimal()
Explanation:
Compared to the raw wage histogram, the distribution of log(Wage) is usually more symmetric and less strongly right-skewed. The log transformation compresses very large wage values and makes the distribution easier to analyze.
2. Boxplot of log(Wage) by Gender
ggplot(df, aes(x = Gender, y = l_wage)) +
  geom_boxplot(fill = "pink") +
  labs(
    title = "Boxplot of Log Wage by Gender",
    x = "Gender",
    y = "log(Wage)"
  ) +
  theme_minimal()
Explanation:
The log transformation may reduce the visual influence of extreme high wages. The gender gap may still remain visible, but the distributions are often easier to compare after taking logs. Economists often prefer log(wage) because differences in logs can be interpreted approximately as percentage differences.
The approximate percentage wage gap is calculated as 100 × (mean log wage of men – mean log wage of women). This gives an approximate percentage difference in average wages.
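A minimal sketch of this calculation is shown below. It assumes that `l_wage` is the natural log of `Wage`. The helper name `log_wage_gap` is hypothetical and the toy data are made up. Alongside the approximation, the sketch also reports the exact percentage implied by the same log difference, 100 × (exp(difference) - 1), because the approximation becomes less accurate as the gap grows.

```r
# Hypothetical helper: approximate and exact percentage gap from log wages
log_wage_gap <- function(data) {
  mean_logs <- tapply(data$l_wage, data$Gender, mean, na.rm = TRUE)
  diff_logs <- mean_logs[["Men"]] - mean_logs[["Women"]]
  c(
    approx_pct = 100 * diff_logs,            # difference in mean logs x 100
    exact_pct = 100 * (exp(diff_logs) - 1)   # exact conversion of the same difference
  )
}

# Toy data, made up for illustration: each man's wage is 1.1 times a woman's wage
toy <- data.frame(
  Gender = c("Men", "Men", "Women", "Women"),
  Wage = c(22, 33, 20, 30)
)
toy$l_wage <- log(toy$Wage)  # natural log, assumed to match l_wage in the dataset
log_wage_gap(toy)
```

In this toy example the log difference is log(1.1), so the approximate gap is about 9.5% while the exact gap is 10%. On the real data the call would be `log_wage_gap(df)`.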
## Part 3: Exploring Confounders
1. Education levels by gender
educ_table <- table(df$Gender, df$Educ)
educ_table
|       | 1   | 2  | 3  | 4  |
|-------|-----|----|----|----|
| Men   | 108 | 77 | 72 | 59 |
| Women | 88  | 57 | 33 | 6  |
kable(educ_table, caption = "Education Levels by Gender")

# Row proportions: share of each education level within each gender
prop_educ <- prop.table(educ_table, margin = 1)

kable(round(prop_educ, 3), caption = "Proportion of Education Levels within Each Gender")
Proportion of Education Levels within Each Gender

|       | 1     | 2     | 3     | 4     |
|-------|-------|-------|-------|-------|
| Men   | 0.342 | 0.244 | 0.228 | 0.187 |
| Women | 0.478 | 0.310 | 0.179 | 0.033 |
Explanation:
This table shows the distribution of education levels separately for men and women. Level 1 is the most common category in both groups (34.2% of men and 47.8% of women), but the groups differ sharply at level 4, which accounts for 18.7% of men and only 3.3% of women.
2. Part-time work by gender
parttime_stats <- df %>%
  group_by(Gender) %>%
  summarise(
    prop_parttime = mean(as.numeric(as.character(Parttime)) == 1, na.rm = TRUE)
  )

kable(parttime_stats, digits = 3, caption = "Proportion of Part-Time Workers by Gender")
Proportion of Part-Time Workers by Gender

| Gender | prop_parttime |
|--------|---------------|
| Men    | 0.225         |
| Women  | 0.560         |
Explanation:
Women in this sample are more than twice as likely as men to work part-time (56.0% versus 22.5%). This may help explain part of the observed wage gap, because part-time jobs may pay lower hourly wages on average or may be concentrated in lower-paying sectors.
If men and women have similar mean and median ages, age is less likely to explain much of the wage gap. If there is a noticeable difference, age may explain part of the gap because earnings often increase with work experience.
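The age comparison described above can be sketched as follows. This assumes the dataset has an age column named `Age`. That name is a guess, since the output of `names(df)` is not shown here. The helper name `age_by_gender` is hypothetical and the toy data are made up.

```r
library(dplyr)

# Hypothetical helper: mean and median age by gender
# (assumes an age column called Age; adjust to the actual column name)
age_by_gender <- function(data) {
  data %>%
    group_by(Gender) %>%
    summarise(
      mean_age = mean(Age, na.rm = TRUE),
      median_age = median(Age, na.rm = TRUE),
      .groups = "drop"
    )
}

# Toy data, made up for illustration only
toy <- data.frame(
  Gender = c("Men", "Men", "Men", "Women", "Women", "Women"),
  Age = c(30, 40, 50, 25, 35, 60)
)
age_by_gender(toy)
```

On the real data, `kable(age_by_gender(df), digits = 1)` would produce a table in the same style as the part-time table.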
## Part 4: Interpretation
Why use log(wage) instead of wage?
Economists use log(wage) because wages are usually right-skewed, and the log transformation makes the distribution more symmetric. In addition, differences in log wages can be interpreted approximately as percentage differences, which are easier to understand when analyzing wage gaps.
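The percentage interpretation follows from a standard approximation for the natural log. The symbols $w_M$ and $w_F$ below are notation introduced here for a man's wage and a woman's wage. They are not taken from the dataset.

$$
\ln w_M - \ln w_F = \ln\frac{w_M}{w_F} = \ln\left(1 + \frac{w_M - w_F}{w_F}\right) \approx \frac{w_M - w_F}{w_F}
$$

The last step uses $\ln(1 + x) \approx x$ for small $x$, so the approximation is good for modest gaps and gets worse as the gap grows. For example, a 10% gap corresponds to a log difference of $\ln(1.1) \approx 0.095$.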
Is the raw wage gap the same as discrimination?
No, the raw wage gap is not the same as discrimination. It may also reflect differences in education, age, and part-time work. Therefore, the observed gap includes both possible discrimination and differences in worker characteristics.
## Conclusion
In this analysis, I compared wages using both raw and log values. Log wages provide a more balanced distribution. Factors such as education, age, and part-time work may explain part of the wage gap, so it should not be interpreted only as discrimination.