Ruplal Lama
2023-12-10
Exploring Wage Disparities in the U.S.
This project analyzes hourly wages across the United States, with a special emphasis on how these wages vary across demographic groups. We'll explore key differences in median and average earnings by gender, racial background, and educational attainment.
How has income inequality between men and women changed over the past 50 years? Has educational attainment made a difference, and where does the gap stand today?
Does the U.S. exhibit differences in wages across racial groups, and how have these variations in median hourly earnings evolved over the past 50 years for diverse racial demographics?
The data is sourced from the Economic Policy Institute’s (EPI) State of Working America Data Library. It is a trusted and reliable source for economic data. EPI provides researchers, media, and the public with easily accessible, up-to-date, and comprehensive historical data on the American labor force.
Economic Policy Institute https://www.epi.org/data/
Data on the labor force in the “Employment” section are compiled from EPI analysis of basic monthly Current Population Survey microdata. Data reflect 12-month moving averages as of the latest month of data.
Demographics: Data represent people ages 16 and older unless otherwise noted.
Race/ethnicity: Race/ethnicity categories are mutually exclusive.
Black: Black non-Hispanic
White: White non-Hispanic
Hispanic: Hispanic any race
Education: Educational categories are mutually exclusive and represent the highest education level attained for all individuals ages 16 and older.
Additional documentation and the data dictionary can be accessed at the following URL: https://www.epi.org/data
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Rows: 50 Columns: 31
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (30): Less than HS, High school, Some college, Bachelor's degree, Advanc...
## dbl (1): Date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 50 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): Median, Average, White Median, White Average, Black Median, Black A...
## dbl (1): Date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
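The messages above are what readr prints when a CSV file is read without declared column types. The loading step itself is not shown in this report, so the following is only a sketch under stated assumptions: the inline CSV text stands in for the real EPI download and contains three rows copied from the printed output, and the name `wages_race_sample` is hypothetical. Declaring `col_types` makes the types explicit and also quiets the message.

```r
library(readr)

# Stand-in for the real EPI file: three rows copied from the printed output.
# In the actual analysis this would be the path to the downloaded CSV.
race_csv <- I("Date,Median,Average
2022,$22.88,$32.00
2021,$23.05,$32.08
2020,$23.64,$32.54
")

# Declare column types up front: Date is numeric, and the wage columns stay
# as text until the dollar signs are stripped in the cleaning step.
wages_race_sample <- read_csv(
  race_csv,
  col_types = cols(Date = col_double(), .default = col_character())
)

str(wages_race_sample$Median)
```

With the real files, `race_csv` would be replaced by a file path. Setting `show_col_types = FALSE` is the other option that readr's own message suggests.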
## spc_tbl_ [50 × 31] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Date : num [1:50] 1973 1974 1975 1976 1977 ...
## $ Less than HS : chr [1:50] "$18.06" "$17.68" "$17.30" "$17.52" ...
## $ High school : chr [1:50] "$22.22" "$21.60" "$21.55" "$21.76" ...
## $ Some college : chr [1:50] "$24.08" "$23.32" "$23.30" "$23.49" ...
## $ Bachelor's degree : chr [1:50] "$32.80" "$31.69" "$31.45" "$31.46" ...
## $ Advanced degree : chr [1:50] "$38.16" "$38.37" "$38.41" "$37.50" ...
## $ Less than HS Share : chr [1:50] "31.6%" "30.4%" "28.2%" "27.1%" ...
## $ High school Share : chr [1:50] "36.3%" "36.0%" "36.3%" "36.2%" ...
## $ Some college Share : chr [1:50] "17.6%" "18.5%" "19.3%" "20.0%" ...
## $ Bachelor's degree Share : chr [1:50] "10.3%" "10.6%" "11.5%" "11.6%" ...
## $ Advanced degree Share : chr [1:50] "4.2%" "4.6%" "4.7%" "5.1%" ...
## $ Men Less than HS : chr [1:50] "$21.18" "$20.63" "$20.00" "$20.36" ...
## $ Men High school : chr [1:50] "$26.90" "$26.15" "$26.02" "$26.14" ...
## $ Men Some college : chr [1:50] "$27.67" "$26.79" "$26.93" "$27.10" ...
## $ Men Bachelor's degree : chr [1:50] "$37.69" "$36.62" "$36.21" "$36.42" ...
## $ Men Advanced degree : chr [1:50] "$40.09" "$41.03" "$40.86" "$40.31" ...
## $ Men Less than HS Share : chr [1:50] "33.4%" "32.0%" "29.9%" "29.0%" ...
## $ Men High school Share : chr [1:50] "32.6%" "32.4%" "32.8%" "32.8%" ...
## $ Men Some college Share : chr [1:50] "18.3%" "19.3%" "19.7%" "20.4%" ...
## $ Men Bachelor's degree Share : chr [1:50] "10.5%" "10.6%" "11.7%" "11.7%" ...
## $ Men Advanced degree Share : chr [1:50] "5.2%" "5.8%" "5.8%" "6.1%" ...
## $ Women Less than HS : chr [1:50] "$12.89" "$12.87" "$12.91" "$12.96" ...
## $ Women High school : chr [1:50] "$16.97" "$16.49" "$16.54" "$17.01" ...
## $ Women Some college : chr [1:50] "$18.41" "$17.91" "$17.91" "$18.37" ...
## $ Women Bachelor's degree : chr [1:50] "$25.50" "$24.70" "$24.44" "$24.52" ...
## $ Women Advanced degree : chr [1:50] "$32.73" "$30.78" "$32.14" "$31.05" ...
## $ Women Less than HS Share : chr [1:50] "28.9%" "28.0%" "25.9%" "24.5%" ...
## $ Women High school Share : chr [1:50] "41.7%" "41.1%" "41.1%" "41.0%" ...
## $ Women Some college Share : chr [1:50] "16.6%" "17.5%" "18.7%" "19.5%" ...
## $ Women Bachelor's degree Share: chr [1:50] "10.1%" "10.6%" "11.1%" "11.4%" ...
## $ Women Advanced degree Share : chr [1:50] "2.7%" "2.9%" "3.2%" "3.6%" ...
## - attr(*, "spec")=
## .. cols(
## .. Date = col_double(),
## .. `Less than HS` = col_character(),
## .. `High school` = col_character(),
## .. `Some college` = col_character(),
## .. `Bachelor's degree` = col_character(),
## .. `Advanced degree` = col_character(),
## .. `Less than HS Share` = col_character(),
## .. `High school Share` = col_character(),
## .. `Some college Share` = col_character(),
## .. `Bachelor's degree Share` = col_character(),
## .. `Advanced degree Share` = col_character(),
## .. `Men Less than HS` = col_character(),
## .. `Men High school` = col_character(),
## .. `Men Some college` = col_character(),
## .. `Men Bachelor's degree` = col_character(),
## .. `Men Advanced degree` = col_character(),
## .. `Men Less than HS Share` = col_character(),
## .. `Men High school Share` = col_character(),
## .. `Men Some college Share` = col_character(),
## .. `Men Bachelor's degree Share` = col_character(),
## .. `Men Advanced degree Share` = col_character(),
## .. `Women Less than HS` = col_character(),
## .. `Women High school` = col_character(),
## .. `Women Some college` = col_character(),
## .. `Women Bachelor's degree` = col_character(),
## .. `Women Advanced degree` = col_character(),
## .. `Women Less than HS Share` = col_character(),
## .. `Women High school Share` = col_character(),
## .. `Women Some college Share` = col_character(),
## .. `Women Bachelor's degree Share` = col_character(),
## .. `Women Advanced degree Share` = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
## spc_tbl_ [50 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Date : num [1:50] 2022 2021 2020 2019 2018 ...
## $ Median : chr [1:50] "$22.88" "$23.05" "$23.64" "$22.12" ...
## $ Average : chr [1:50] "$32.00" "$32.08" "$32.54" "$30.36" ...
## $ White Median : chr [1:50] "$24.96" "$25.40" "$25.98" "$24.39" ...
## $ White Average : chr [1:50] "$34.49" "$34.50" "$34.86" "$32.79" ...
## $ Black Median : chr [1:50] "$19.60" "$19.45" "$19.85" "$18.45" ...
## $ Black Average : chr [1:50] "$25.61" "$25.40" "$26.03" "$24.09" ...
## $ Hispanic Median : chr [1:50] "$18.93" "$19.14" "$19.21" "$18.19" ...
## $ Hispanic Average: chr [1:50] "$24.84" "$24.90" "$25.29" "$23.49" ...
## - attr(*, "spec")=
## .. cols(
## .. Date = col_double(),
## .. Median = col_character(),
## .. Average = col_character(),
## .. `White Median` = col_character(),
## .. `White Average` = col_character(),
## .. `Black Median` = col_character(),
## .. `Black Average` = col_character(),
## .. `Hispanic Median` = col_character(),
## .. `Hispanic Average` = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
Deleting unnecessary columns from the data
# Remove columns whose names contain "share", "less than", or "some"
wages_edu_gender <- wages_edu_gender %>%
  select(
    -contains("share"),
    -contains("less than"),
    -contains("some"))

# Function to clean wage columns (remove $ and commas, convert to numeric)
clean_wage_column <- function(column) {
  as.numeric(gsub("\\$", "", gsub(",", "", column)))
}

# Apply this function to the remaining character columns, which all hold wage values
wages_edu_gender <- wages_edu_gender %>%
  mutate(across(where(is.character), clean_wage_column))

str(wages_edu_gender)
## tibble [50 × 10] (S3: tbl_df/tbl/data.frame)
## $ Date : num [1:50] 1973 1974 1975 1976 1977 ...
## $ High school : num [1:50] 22.2 21.6 21.6 21.8 21.5 ...
## $ Bachelor's degree : num [1:50] 32.8 31.7 31.4 31.5 31.1 ...
## $ Advanced degree : num [1:50] 38.2 38.4 38.4 37.5 37.4 ...
## $ Men High school : num [1:50] 26.9 26.1 26 26.1 26 ...
## $ Men Bachelor's degree : num [1:50] 37.7 36.6 36.2 36.4 36.1 ...
## $ Men Advanced degree : num [1:50] 40.1 41 40.9 40.3 40.6 ...
## $ Women High school : num [1:50] 17 16.5 16.5 17 16.7 ...
## $ Women Bachelor's degree: num [1:50] 25.5 24.7 24.4 24.5 23.9 ...
## $ Women Advanced degree : num [1:50] 32.7 30.8 32.1 31.1 30.3 ...
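As a quick check of the cleaning step, the sketch below applies the same `gsub()` logic to a few strings. The first three values are copied from the raw `High school` column. The last one, with a thousands separator, is a made-up value included only to show why commas are removed as well.

```r
# Same logic as clean_wage_column() in the report, repeated so this block runs on its own
clean_wage_column <- function(column) {
  as.numeric(gsub("\\$", "", gsub(",", "", column)))
}

raw <- c("$22.22", "$21.60", "$21.55", "$1,234.50")  # last value is hypothetical
cleaned <- clean_wage_column(raw)
print(cleaned)

# as.numeric() returns NA (with a warning) for anything it cannot parse,
# so counting NAs after cleaning is a cheap sanity check
sum(is.na(cleaned))
```

A non-zero NA count after cleaning would indicate a value in an unexpected format, such as a percentage or a footnote marker.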
Has there been income inequality between men and women in the past? Has their level of educational attainment made a difference?
# Reshaping the data for easier plotting.
# Column names are either "<education>" (all workers) or "<Men|Women> <education>".
# Splitting on every space would truncate names such as "Men High school",
# so the gender prefix is extracted explicitly instead.
wages_long <- wages_edu_gender %>%
  gather(key = "category", value = "wage", -Date) %>%
  mutate(
    Gender = if_else(str_detect(category, "^(Men|Women) "), word(category, 1), "All"),
    Education = str_remove(category, "^(Men|Women) ")
  ) %>%
  select(-category) %>%
  unite("Gender_Education", Gender, Education, sep = " - ")
ggplot(wages_long, aes(x = Date, y = wage, color = Gender_Education, group = Gender_Education)) +
  geom_line() +
  labs(title = "Gender Wage Gap Over the Years Across Different Education Levels",
       x = "Year",
       y = "Wage",
       color = "Category") +
  theme_minimal() +
  theme(legend.position = "right")
For all education levels and genders, there has been a general increase in wages over time. The gap between high school and advanced degree wages is notable, with advanced degrees earning significantly more. There is a noticeable difference in wages between men and women at each education level. Men generally earn more than women in corresponding education categories. This is clear from comparing ‘Men High school’ with ‘Women High school’, ‘Men Bachelor’s degree’ with ‘Women Bachelor’s degree’, and ‘Men Advanced degree’ with ‘Women Advanced degree’.
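One way to put a number on the gap described above is the ratio of women's to men's hourly wage at each education level. The sketch below is illustrative only: it uses the first four years (1973 to 1976) copied from the raw `str()` output, and the short column names (`men_hs`, `women_hs`, and so on) are hypothetical. The same `mutate()` step could be applied to the full 50-year table.

```r
library(dplyr)

# First four years of the gender-specific wage columns, copied from the str() output
gap_sample <- tibble(
  Date      = 1973:1976,
  men_hs    = c(26.90, 26.15, 26.02, 26.14),
  women_hs  = c(16.97, 16.49, 16.54, 17.01),
  men_ba    = c(37.69, 36.62, 36.21, 36.42),
  women_ba  = c(25.50, 24.70, 24.44, 24.52),
  men_adv   = c(40.09, 41.03, 40.86, 40.31),
  women_adv = c(32.73, 30.78, 32.14, 31.05)
) %>%
  # A ratio below 1 means women earned less than men at that education level
  mutate(
    ratio_hs  = women_hs / men_hs,
    ratio_ba  = women_ba / men_ba,
    ratio_adv = women_adv / men_adv
  )

gap_sample %>% select(Date, starts_with("ratio_"))
```

A ratio is easier to compare across education levels than a dollar difference, because wage levels differ a lot between a high school diploma and an advanced degree.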
# Cleaning the second dataset
str(wages_race)
## spc_tbl_ [50 × 9] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Date : num [1:50] 2022 2021 2020 2019 2018 ...
## $ Median : chr [1:50] "$22.88" "$23.05" "$23.64" "$22.12" ...
## $ Average : chr [1:50] "$32.00" "$32.08" "$32.54" "$30.36" ...
## $ White Median : chr [1:50] "$24.96" "$25.40" "$25.98" "$24.39" ...
## $ White Average : chr [1:50] "$34.49" "$34.50" "$34.86" "$32.79" ...
## $ Black Median : chr [1:50] "$19.60" "$19.45" "$19.85" "$18.45" ...
## $ Black Average : chr [1:50] "$25.61" "$25.40" "$26.03" "$24.09" ...
## $ Hispanic Median : chr [1:50] "$18.93" "$19.14" "$19.21" "$18.19" ...
## $ Hispanic Average: chr [1:50] "$24.84" "$24.90" "$25.29" "$23.49" ...
## - attr(*, "spec")=
## .. cols(
## .. Date = col_double(),
## .. Median = col_character(),
## .. Average = col_character(),
## .. `White Median` = col_character(),
## .. `White Average` = col_character(),
## .. `Black Median` = col_character(),
## .. `Black Average` = col_character(),
## .. `Hispanic Median` = col_character(),
## .. `Hispanic Average` = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
## Date Median Average White Median
## Min. :1973 Length:50 Length:50 Length:50
## 1st Qu.:1985 Class :character Class :character Class :character
## Median :1998 Mode :character Mode :character Mode :character
## Mean :1998
## 3rd Qu.:2010
## Max. :2022
## White Average Black Median Black Average Hispanic Median
## Length:50 Length:50 Length:50 Length:50
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## Hispanic Average
## Length:50
## Class :character
## Mode :character
##
##
##
# Function to remove dollar signs and convert columns to numeric
clean_wage <- function(column) {
  as.numeric(gsub("\\$", "", column))
}

# Applying the function to all wage columns
wage_columns <- c("Median", "Average", "White Median", "White Average",
                  "Black Median", "Black Average", "Hispanic Median", "Hispanic Average")
wages_race <- wages_race %>%
  mutate(across(all_of(wage_columns), clean_wage))
## tibble [50 × 9] (S3: tbl_df/tbl/data.frame)
## $ Date : num [1:50] 2022 2021 2020 2019 2018 ...
## $ Median : num [1:50] 22.9 23.1 23.6 22.1 21.9 ...
## $ Average : num [1:50] 32 32.1 32.5 30.4 29.8 ...
## $ White Median : num [1:50] 25 25.4 26 24.4 24 ...
## $ White Average : num [1:50] 34.5 34.5 34.9 32.8 32.4 ...
## $ Black Median : num [1:50] 19.6 19.4 19.9 18.4 17.6 ...
## $ Black Average : num [1:50] 25.6 25.4 26 24.1 23.5 ...
## $ Hispanic Median : num [1:50] 18.9 19.1 19.2 18.2 17.5 ...
## $ Hispanic Average: num [1:50] 24.8 24.9 25.3 23.5 22.8 ...
## Date Median Average White Median White Average
## Min. :1973 Min. :18.78 Min. :22.42 Min. :19.43 Min. :23.08
## 1st Qu.:1985 1st Qu.:19.28 1st Qu.:23.05 1st Qu.:20.13 1st Qu.:23.93
## Median :1998 Median :19.81 Median :24.55 Median :21.11 Median :26.00
## Mean :1998 Mean :20.26 Mean :25.56 Mean :21.69 Mean :27.03
## 3rd Qu.:2010 3rd Qu.:21.11 3rd Qu.:27.34 3rd Qu.:22.93 3rd Qu.:29.43
## Max. :2022 Max. :23.64 Max. :32.54 Max. :25.98 Max. :34.86
## Black Median Black Average Hispanic Median Hispanic Average
## Min. :15.87 Min. :18.36 Min. :14.27 Min. :18.01
## 1st Qu.:16.16 1st Qu.:19.22 1st Qu.:15.50 1st Qu.:18.76
## Median :16.90 Median :20.32 Median :15.89 Median :19.25
## Mean :17.10 Mean :20.98 Mean :16.06 Mean :19.98
## 3rd Qu.:17.80 3rd Qu.:22.34 3rd Qu.:16.35 3rd Qu.:20.68
## Max. :19.85 Max. :26.03 Max. :19.21 Max. :25.29
Do racial wage disparities exist in the U.S.? How have median and average hourly wages trended over the past 50 years for different racial groups?
ggplot(wages_race, aes(x = Date)) +
  geom_line(aes(y = `White Median`, color = "White")) +
  geom_line(aes(y = `Black Median`, color = "Black")) +
  geom_line(aes(y = `Hispanic Median`, color = "Hispanic")) +
  labs(
    title = "Racial Wage Disparities Over Time",
    x = "Year",
    y = "Median Wage",
    color = "Race"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
Observations from the graph: There has been a general increase in wages across all groups over the years. Significant wage disparities exist between racial groups. White individuals consistently have higher median wages than Black and Hispanic individuals, and the summary statistics above point the same way for average wages.
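The size of the disparity can be expressed relative to the White median in each year. The sketch below is a hedged illustration rather than part of the original analysis: it uses the four most recent years (2022 to 2019) copied from the raw output, and the names `race_sample`, `race_long`, and `ratio_to_white` are hypothetical. It reshapes the table to long format first, which would also allow a single `geom_line()` with `color = Race` in place of three separate calls.

```r
library(dplyr)
library(tidyr)

# Median hourly wages for 2022-2019, copied from the str() output
race_sample <- tibble(
  Date              = 2022:2019,
  `White Median`    = c(24.96, 25.40, 25.98, 24.39),
  `Black Median`    = c(19.60, 19.45, 19.85, 18.45),
  `Hispanic Median` = c(18.93, 19.14, 19.21, 18.19)
)

race_long <- race_sample %>%
  # One row per year and group
  pivot_longer(-Date, names_to = "Race", values_to = "median_wage") %>%
  mutate(Race = sub(" Median$", "", Race)) %>%
  # Within each year, divide every group's median by the White median
  group_by(Date) %>%
  mutate(ratio_to_white = median_wage / median_wage[Race == "White"]) %>%
  ungroup()

race_long
```

A ratio of 1 marks the reference group, and values below 1 indicate a lower median wage than the White median in the same year.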
Median wages have risen for Americans over the years, and higher levels of education are associated with substantially higher wages. However, gender and racial wage disparities existed in the past and continue to exist today.
Thank You