QUIZ2

AI Use Log

I used an AI tool to check some R code and understand a few steps. I reviewed everything and wrote the final answers myself.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)

data <- read_excel("Wage_GenderDS.xlsx")
glimpse(data)
Rows: 500
Columns: 6
$ Observation <dbl> 119, 2, 41, 65, 246, 254, 74, 12, 9, 237, 79, 294, 182, 25…
$ Wage        <dbl> 32, 34, 37, 38, 38, 38, 39, 40, 42, 43, 44, 45, 46, 46, 47…
$ Female      <dbl> 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1…
$ Age         <dbl> 31, 42, 31, 33, 21, 28, 31, 28, 25, 25, 44, 25, 31, 42, 38…
$ Educ        <dbl> 1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1…
$ Parttime    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1…

PART 1: DISTRIBUTION OF WAGE

Histogram of Wage

ggplot(data, aes(x = Wage)) +
  geom_histogram(bins = 30) +
  theme_minimal()

The distribution of Wage is right-skewed: most individuals earn relatively low wages, while a few earn very high wages.

Boxplot by Gender

ggplot(data, aes(x = factor(Female), y = Wage)) +
  geom_boxplot() +
  labs(x = "Gender (0=Male, 1=Female)")

Men have a higher median wage than women. The wage distribution for men is more spread out. There are outliers in both groups, but men have higher top wages.

Summary Statistics

data |> group_by(Female) |>
  summarise(
    mean = mean(Wage),
    median = median(Wage),
    sd = sd(Wage),
    min = min(Wage),
    max = max(Wage)
  )
# A tibble: 2 × 6
  Female  mean median    sd   min   max
   <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1      0 125.   111    57.3    38   384
2      1  97.3   83.5  46.3    32   364

Men have higher mean (about 125 vs. 97.3) and median (111 vs. 83.5) wages than women, indicating a clear wage gap in the data.

Raw Wage Gap

mean_men <- mean(data$Wage[data$Female == 0])
mean_women <- mean(data$Wage[data$Female == 1])

mean_men - mean_women
[1] 27.80682

The raw wage gap is the difference between male and female mean wages. On average, men earn about 27.8 more than women, measured in the units of Wage.

PART 2: LOG TRANSFORMATION

Create log(Wage)

data <- data |>
  mutate(l_wage = log(Wage))

Histogram of log(Wage)

ggplot(data, aes(x = l_wage)) +
  geom_histogram(bins = 30) +
  theme_minimal()

The distribution of log(Wage) is more symmetric and less skewed than that of raw Wage, so a few very high wages have less influence on group comparisons.

Boxplot of log(Wage)

ggplot(data, aes(x = factor(Female), y = l_wage)) +
  geom_boxplot()

The wage gap is still visible after the log transformation, but the extreme values are compressed.

Percentage Gap

mean_log_men <- mean(data$l_wage[data$Female == 0])
mean_log_women <- mean(data$l_wage[data$Female == 1])

100 * (mean_log_men - mean_log_women)
[1] 25.06425

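The difference in mean log wages is about 0.25 log points, which corresponds to an approximate wage gap of 25% between men and women. Because it is based on means of logs, it compares geometric rather than arithmetic mean wages.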
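The log-point figure is only an approximation of a percentage. As a rough check, the sketch below converts the reported difference (25.06425) into exact percentages. The variable names are illustrative, and the only input is the number printed above.

```r
# Difference in mean log wages reported above, converted to log points
log_gap <- 25.06425 / 100

# Log-point approximation of the percentage gap
approx_pct <- 100 * log_gap

# Exact percentage by which men's geometric mean wage exceeds women's
exact_pct_vs_women <- 100 * (exp(log_gap) - 1)

# The same gap expressed as a share of men's geometric mean wage
exact_pct_vs_men <- 100 * (1 - exp(-log_gap))

round(c(approx = approx_pct,
        vs_women = exact_pct_vs_women,
        vs_men = exact_pct_vs_men), 2)
```

The exact gap is about 28.5% when measured relative to women and about 22.2% when measured relative to men. The log-point value of 25.1 lies between the two, which is why it is a convenient symmetric summary.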

PART 3: CONFOUNDERS

Education

table(data$Educ, data$Female)
   
      0   1
  1 108  88
  2  77  57
  3  72  33
  4  59   6

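Level 1 is the most common education level for both groups, but the distributions differ. Assuming higher codes mean more education, men are much more concentrated at the top: 59 of 316 men (about 19%) are at level 4, compared with 6 of 184 women (about 3%).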
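Raw counts are hard to compare because the sample has more men (316) than women (184). A minimal sketch of the same table as within-gender shares, with the counts copied from the output above (the object names are illustrative):

```r
# Counts copied from the table above (rows: Educ level, columns: Female)
educ_counts <- matrix(
  c(108, 77, 72, 59,   # Female = 0
     88, 57, 33,  6),  # Female = 1
  nrow = 4,
  dimnames = list(Educ = c("1", "2", "3", "4"), Female = c("0", "1"))
)

# Share of each education level within each gender (each column sums to 1)
educ_shares <- prop.table(educ_counts, margin = 2)

round(100 * educ_shares, 1)
```

With the original data loaded, the same shares should come from prop.table(table(data$Educ, data$Female), margin = 2).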

Part-time Work

data |>
  group_by(Female) |>
  summarise(parttime_rate = mean(Parttime))
# A tibble: 2 × 2
  Female parttime_rate
   <dbl>         <dbl>
1      0         0.225
2      1         0.560

Women have a much higher part-time work rate than men (56.0% vs. 22.5%). This may lower their average wages.

Age

data |>
  group_by(Female) |>
  summarise(
    mean_age = mean(Age),
    median_age = median(Age)
  )
# A tibble: 2 × 3
  Female mean_age median_age
   <dbl>    <dbl>      <dbl>
1      0     40.1         39
2      1     39.9         39

Age is very similar between men and women, so it does not explain much of the wage gap.

PART 4: INTERPRETATION

Why use log(wage)?

  1. It reduces skewness.
  2. Differences in log(Wage) can be read approximately as percentage differences.
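The percentage reading in point 2 rests on a standard approximation. As a short sketch, for two positive wages a and b (symbols introduced here for illustration only):

```latex
\log a - \log b
  = \log\!\left(\frac{a}{b}\right)
  = \log\!\left(1 + \frac{a - b}{b}\right)
  \approx \frac{a - b}{b}
  \qquad \text{when } \left|\frac{a - b}{b}\right| \text{ is small}
```

The approximation works best for small gaps. For larger gaps, the log difference understates the percentage difference measured relative to b.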

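Is the wage gap discrimination?

Not necessarily. The raw wage gap cannot be attributed to discrimination alone. Women in this sample work part-time far more often than men (56.0% vs. 22.5%) and are less represented at the highest education level, and both differences can affect wages. The comparisons above are descriptive and do not hold these factors constant, so the observed wage gap cannot be interpreted as pure discrimination.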
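One simple way to see how a confounder can drive a raw gap is to compare men and women within each part-time status. The sketch below uses a small hypothetical toy data frame, NOT the quiz data, so its numbers say nothing about the real sample; only the column names mirror the data set.

```r
# Hypothetical toy data for illustration only (not the quiz data set)
toy <- data.frame(
  Female   = c(0, 0, 0, 0, 1, 1, 1, 1),
  Parttime = c(0, 0, 0, 1, 0, 1, 1, 1),
  Wage     = c(120, 130, 110, 80, 115, 78, 82, 80)
)
toy$l_wage <- log(toy$Wage)

# Raw gap in log points, ignoring part-time status (men minus women)
gap_raw <- mean(toy$l_wage[toy$Female == 0]) -
  mean(toy$l_wage[toy$Female == 1])

# Mean log wage in each Parttime x Female cell
cell_means <- tapply(
  toy$l_wage,
  list(Parttime = toy$Parttime, Female = toy$Female),
  mean
)

# Gap in log points within each part-time status (men minus women)
gap_within <- cell_means[, "0"] - cell_means[, "1"]

round(c(raw = gap_raw, gap_within), 3)
```

In this toy example the within-group gaps are much smaller than the raw gap, because most of the raw difference comes from women being more often part-time. Whether the same holds in the quiz data would have to be checked by running the comparison on data itself.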