library(dplyr)
library(ggplot2)
library(readxl)
data <- read_excel("Wage_GenderDS.xlsx")
Part 1: Distribution of Wage
Histogram of Wage
wage_histogram <- data |>
  ggplot(aes(x = Wage)) +
  geom_histogram(binwidth = 1, fill = "blue") +
  labs(title = "Histogram of Wage", x = "Wage", y = "Count") +
  theme_minimal()
wage_histogram
The distribution is right-skewed: most observations are concentrated at lower wage levels, with a long tail of high wages.
Boxplot of Wage by Gender
wagee_data <- data |>
  mutate(Gender = if_else(Female == 0, "Men", "Women"))
wage_boxplot <- wagee_data |>
  ggplot(aes(x = Gender, y = Wage, fill = Gender)) +
  geom_boxplot() +
  labs(title = "Boxplot of Wage by Gender", x = "gender", y = "wage") +
  theme_minimal()
wage_boxplot
Men's median wage is higher than women's: the men's median is above 100, while the women's is below 100. Both groups have outliers, and the men's outliers reach higher values than the women's.
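The medians and outliers read off the boxplot can also be computed directly. The sketch below is a minimal illustration on simulated data, not the actual dataset: the `toy` data frame and its wage parameters are assumptions made up for the example. It uses base R's `boxplot.stats()`, which applies the same 1.5 × IQR rule that `geom_boxplot()` uses by default, although its hinges can differ slightly from ggplot2's quartiles.

```r
# Hypothetical stand-in for the wage data (values are simulated, not real)
set.seed(1)
toy <- data.frame(
  Female = rep(c(0, 1), each = 200),
  Wage   = c(rlnorm(200, meanlog = 4.7, sdlog = 0.4),
             rlnorm(200, meanlog = 4.5, sdlog = 0.4))
)
toy$Gender <- ifelse(toy$Female == 0, "Men", "Women")

# Median wage per group (the thick line inside each box)
medians <- tapply(toy$Wage, toy$Gender, median)

# Observations beyond 1.5 * IQR from the hinges, per group
outliers   <- lapply(split(toy$Wage, toy$Gender),
                     function(w) boxplot.stats(w)$out)
n_outliers <- lengths(outliers)

print(medians)
print(n_outliers)
```

Running the same three steps on `wagee_data` would give the exact medians and outlier counts behind the plot.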
data <- data |>
  mutate(l_wage = log(Wage))
lwage_histogram <- data |>
  ggplot(aes(x = l_wage)) +
  geom_histogram(binwidth = 0.5, fill = "pink") +
  labs(title = "Histogram of Log(Wage)", x = "log(Wage)", y = "Count") +
  theme_minimal()
lwage_histogram
Compared with the raw wage, log(Wage) is much closer to a normal distribution.
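One way to check this visual impression is to compare the sample skewness before and after taking logs. The sketch below uses simulated log-normal wages rather than the actual dataset, and `skewness()` is a small helper defined here (a simple moment estimator), not a base R function.

```r
# Moment-based sample skewness: third central moment over variance^(3/2)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical wages drawn from a log-normal distribution (assumption)
set.seed(42)
wage_sim <- rlnorm(5000, meanlog = 4.6, sdlog = 0.5)

skew_raw <- skewness(wage_sim)        # clearly positive: long right tail
skew_log <- skewness(log(wage_sim))   # close to zero: roughly symmetric

print(c(raw = skew_raw, log = skew_log))
```

Applied to the real data, the same helper could be called as `skewness(data$Wage)` and `skewness(data$l_wage)`.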
Boxplot of l_wage by Gender
data_gender <- data |>
  mutate(Gender = if_else(Female == 0, "Men", "Women"))
lwage_boxplot <- data_gender |>
  ggplot(aes(x = Gender, y = l_wage, fill = Gender)) +
  geom_boxplot() +
  labs(title = "Boxplot of Log(Wage) by Gender", x = "Gender", y = "log(Wage)") +
  theme_minimal()
lwage_boxplot
educ_table <- data |>
  group_by(Female, Educ) |>
  summarize(count = n(), .groups = "drop") |>
  arrange(Female, Educ)
Women are much more likely than men to work part-time. This may affect the observed wage gap, because part-time jobs are often associated with lower pay and fewer hours.
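The part-time comparison can be expressed as row shares of a gender by part-time table. The sketch below uses a tiny made-up data frame, and the column name `Parttime` (1 = part-time) is an assumption, since the name of the part-time variable in the dataset is not shown here.

```r
# Hypothetical toy data: 5 men and 5 women (not the real dataset)
toy <- data.frame(
  Female   = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
  Parttime = c(0, 0, 0, 0, 1, 0, 0, 1, 1, 1)   # assumed column name
)
toy$Gender <- ifelse(toy$Female == 0, "Men", "Women")

# Counts by gender and part-time status
counts <- table(Gender = toy$Gender, Parttime = toy$Parttime)

# Row shares: within each gender, the fraction working full-time / part-time
shares <- prop.table(counts, margin = 1)
print(shares)
```

In this toy example 20% of men and 60% of women work part-time; the real shares depend on the actual data.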
Economists often use log(wage) instead of raw wage for two reasons. First, wages are usually right-skewed, and the log transformation makes the distribution more symmetric. Second, differences in log wages can be interpreted approximately as percentage differences, which is very useful when analysing wage gaps.
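A short worked example shows how close the approximation is. The wage values below are made up purely for illustration.

```r
# Two hypothetical wages that differ by exactly 10%
wage_a <- 100
wage_b <- 110

log_gap   <- log(wage_b) - log(wage_a)   # about 0.0953
exact_gap <- wage_b / wage_a - 1         # 0.10

# A log gap converts back to an exact percentage gap via exp(gap) - 1
recovered <- exp(log_gap) - 1

# The approximation weakens for large gaps: 100 vs 150
big_log_gap   <- log(150) - log(100)     # about 0.405
big_exact_gap <- 150 / 100 - 1           # 0.50

print(round(c(log_gap, exact_gap, big_log_gap, big_exact_gap), 4))
```

For small differences the log gap and the percentage gap nearly coincide; for larger ones, `exp(gap) - 1` gives the exact percentage difference.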
Is the raw wage gap the same as discrimination?
No, the raw wage gap is not the same as discrimination. Part of the gap may be explained by other factors such as education and part-time work. In this dataset, women are much more likely to work part-time, so the entire raw wage gap cannot automatically be attributed to discrimination.
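The difference between a raw gap and a gap that accounts for other factors can be illustrated with two regressions. This is a sketch on simulated data, not an analysis of the actual dataset: the coefficients, the `Parttime` column name, and the coding of `Educ` as four levels are all assumptions. Even the adjusted gap should not be read as a measure of discrimination, since other unobserved factors may remain.

```r
# Simulated data (all parameters are made up for illustration)
set.seed(123)
n        <- 2000
Female   <- rbinom(n, 1, 0.5)
# Women are more likely to work part-time in this simulation
Parttime <- rbinom(n, 1, ifelse(Female == 1, 0.5, 0.1))
Educ     <- sample(1:4, n, replace = TRUE)
l_wage   <- 4 + 0.10 * Educ - 0.30 * Parttime - 0.05 * Female +
  rnorm(n, sd = 0.3)
sim <- data.frame(Female, Parttime, Educ, l_wage)

# Raw gap: difference in mean log wage between women and men
raw_gap <- coef(lm(l_wage ~ Female, data = sim))["Female"]

# Adjusted gap: the same comparison holding education and part-time fixed
adj_gap <- coef(lm(l_wage ~ Female + factor(Educ) + Parttime,
                   data = sim))["Female"]

print(round(c(raw = unname(raw_gap), adjusted = unname(adj_gap)), 3))
```

In this simulation the raw gap is noticeably larger in magnitude than the adjusted gap, because part of it reflects the higher part-time share among women rather than gender itself.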