Final Project: Independent Data Intensive Research
ETR537
Author
Dr. Cansu Tatar
Published
December 14, 2023
Overview
As a final project, you’ll be applying the knowledge and skills gained throughout this semester to conduct an independent analysis. Similar to the case studies provided in this course, your analysis should demonstrate your ability to wrangle, analyze, and communicate findings in response to a research question of interest. You will also be expected to use Quarto to create a reproducible data product that contains all the code used during each step of your analysis so it can be reproduced by others.
The persistent wage gap between genders is a subject of significant socio-economic research and policy debate. This analysis seeks to contribute to this body of work by quantifying and visualizing the differences in median hourly wages between men and women across varying education levels in the United States over a 50-year span, from 1973 to 2022. By examining these trends over time, the aim is to uncover patterns and shifts that could inform the understanding of the progress made towards wage equality and the challenges that remain.
The primary research questions focus on the evolution of the wage gap:
How has the wage gap between men and women who have completed high school as their highest level of education changed over the years?
In comparison, how has the wage gap between men and women holding bachelor’s degrees developed in the same timeframe?
The dataset used for this analysis comes from the Economic Policy Institute’s State of Working America Data Library. This data, standardized to 2022 dollars, presents a clear view of the real wage disparities.
library(tidyverse)
Warning: package 'ggplot2' was built under R version 4.3.2
Warning: package 'dplyr' was built under R version 4.3.2
Warning: package 'lubridate' was built under R version 4.3.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.3 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Wrangle
# Load data and rename
wage_data <- read.csv("C:/Users/joeje/OneDrive/Desktop/college_wage_premium.csv")

# Check the data structure
str(wage_data)
'data.frame': 50 obs. of 7 variables:
$ year : int 2022 2021 2020 2019 2018 2017 2016 2015 2014 2013 ...
$ high_school : num 21.9 22.3 22.7 21.6 21.5 ...
$ bachelors_degree : num 41.6 41.3 41.6 39.6 38.9 ...
$ men_high_school : num 24.1 24.4 25.1 24 23.7 ...
$ men_bachelors_degree : num 49 47.8 48.1 45.7 45 ...
$ women_high_school : num 18.9 19.4 19.4 18.5 18.5 ...
$ women_bachelors_degree: num 34.4 35.1 35.4 33.8 33 ...
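The plots in this section use two columns, gap_high_school and gap_bachelors, that do not appear in the str() output above, so they have to be derived before plotting. The source does not show how this was done. The following is a minimal sketch, assuming the gap is defined as the difference between men's and women's median hourly wage expressed as a percentage of men's wage (consistent with the "Wage Gap (%)" axis label, but an assumption nonetheless). The helper name add_gap_columns() and the small wage_sample data frame are hypothetical; the sample values are the rounded 2020 to 2022 figures printed by str().

```r
library(dplyr)

# Rounded 2020-2022 values copied from the str() output above.
# In the real analysis, wage_data from read.csv() would be used instead.
wage_sample <- data.frame(
  year                   = c(2022, 2021, 2020),
  men_high_school        = c(24.1, 24.4, 25.1),
  women_high_school      = c(18.9, 19.4, 19.4),
  men_bachelors_degree   = c(49.0, 47.8, 48.1),
  women_bachelors_degree = c(34.4, 35.1, 35.4)
)

# Hypothetical helper: adds the two gap columns used by the plots.
# ASSUMPTION: gap = (men's wage - women's wage) / men's wage * 100.
add_gap_columns <- function(data) {
  data |>
    mutate(
      gap_high_school = (men_high_school - women_high_school) /
        men_high_school * 100,
      gap_bachelors = (men_bachelors_degree - women_bachelors_degree) /
        men_bachelors_degree * 100
    )
}

wage_gaps <- add_gap_columns(wage_sample)
wage_gaps
```

With the full dataset, the same call would be wage_data <- add_gap_columns(wage_data). If the gap was instead measured in dollars or relative to women's wages, only the two formulas inside mutate() would change.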
# Trend analysis: plot for high school education level
# Note: gap_high_school and gap_bachelors are not among the seven columns
# listed by str() above, so they must be added to wage_data before plotting.
hs_gap_plot <- ggplot(wage_data, aes(x = year, y = gap_high_school)) +
  geom_line() +
  labs(
    title = "Trend of Wage Gap Over Time for High School Education Level",
    x = "Year",
    y = "Wage Gap (%)"
  ) +
  theme_minimal()

# Plot for bachelor's degree
bach_gap_plot <- ggplot(wage_data, aes(x = year, y = gap_bachelors)) +
  geom_line() +
  labs(
    title = "Trend of Wage Gap Over Time for Bachelor's Degree Holders",
    x = "Year",
    y = "Wage Gap (%)"
  ) +
  theme_minimal()

# Display
hs_gap_plot
bach_gap_plot
In the first graph, Trend of Wage Gap Over Time for High School Education Level, there is a downward trend from the 1970s to the early 2000s, indicating that the wage gap between men and women with a high school education was decreasing during this time. This suggests that over these decades, women’s earnings relative to men’s improved. After the early 2000s, the wage gap seems to stabilize somewhat, with year-to-year fluctuations. This could imply that while the situation improved significantly from the 1970s through the 2000s, progress has slowed down, and the wage gap has remained relatively constant since then. In the most recent years on the graph, there is a slight uptick that could be influenced by economic or social factors, such as economic downturns, shifts in the labor market, or changes in social policies. While there has been progress in reducing the gender wage gap among high school graduates over the last 50 years, the rate of improvement has slowed in recent years.
In the second graph, Trend of Wage Gap Over Time for Bachelor’s Degree Holders, the early part of the series shows some fluctuations, followed by a sharp decline in the wage gap from the late 1970s into the early 2000s. This suggests significant progress in closing the wage gap among individuals with a bachelor’s degree during this period. There is then a period where the wage gap appears more stable but still fluctuates, suggesting year-to-year variability in the progress towards closing the wage gap. In the most recent years, there is a sharp increase in the wage gap. The possible causes of this rise should be investigated, such as changes in labor market dynamics, policy shifts, or socio-economic factors that could influence wage disparity.
Comparing both graphs, there were noticeable improvements in wage equality from the 1970s to the early 2000s, but the group with bachelor’s degrees had a more volatile path with sharper fluctuations. We also notice the wage gap for high school graduates appears to have stabilized in recent decades, while the gap for bachelor’s degree holders has shown a troubling increase in recent years. This increase in the wage gap for bachelor’s degree holders could indicate that higher education does not insulate against gender-based wage disparities and these disparities might be growing in certain sectors or roles that require higher education.
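The comparison above is easier to check when both series share one set of axes. The following is a minimal sketch of one way to do that, by reshaping the two gap columns into long format and mapping education level to colour. As before, the percentage-of-men's-wage definition of the gap is an assumption, and wage_sample, gap_long, and combined_gap_plot are hypothetical names; the sample rows are the rounded 2020 to 2022 values printed by str(), so the resulting figure only illustrates the technique, not the 50-year trend.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Rounded 2020-2022 values from the str() output; the full wage_data
# would be used in the real analysis.
wage_sample <- data.frame(
  year                   = c(2022, 2021, 2020),
  men_high_school        = c(24.1, 24.4, 25.1),
  women_high_school      = c(18.9, 19.4, 19.4),
  men_bachelors_degree   = c(49.0, 47.8, 48.1),
  women_bachelors_degree = c(34.4, 35.1, 35.4)
)

# ASSUMPTION: gap = (men's wage - women's wage) / men's wage * 100.
gap_long <- wage_sample |>
  mutate(
    gap_high_school = (men_high_school - women_high_school) /
      men_high_school * 100,
    gap_bachelors = (men_bachelors_degree - women_bachelors_degree) /
      men_bachelors_degree * 100
  ) |>
  select(year, gap_high_school, gap_bachelors) |>
  # One row per year and education level, so both lines share one plot.
  pivot_longer(
    cols      = c(gap_high_school, gap_bachelors),
    names_to  = "education",
    values_to = "gap_pct"
  )

combined_gap_plot <- ggplot(
  gap_long,
  aes(x = year, y = gap_pct, colour = education)
) +
  geom_line() +
  labs(
    title  = "Wage Gap Over Time by Education Level",
    x      = "Year",
    y      = "Wage Gap (%)",
    colour = "Education level"
  ) +
  theme_minimal()
```

Printing combined_gap_plot renders the figure. A shared y-axis makes differences in level and volatility between the two groups directly visible, which two separate plots with independent scales can obscure.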
Communication
Between the 1970s and 2022, the examination of median hourly wages reveals a narrative concerning the gender wage gap across different educational levels in the United States. For individuals with high school diplomas, there was a narrowing of the wage gap from the 1970s through the early 2000s. However, it is important to note that the rate of improvement has decelerated since the early 2000s. In contrast, the wage gap for those holding bachelor’s degrees presents a more complex trajectory. While there was a period of decline in the wage gap for bachelor’s degree holders, this trend also had significant fluctuations. The analysis here shows a recent uptick in the wage gap for this group, suggesting a regression that undermines the progress previously made towards wage equalization. The findings suggest that while gaps narrowed at both education levels, educational attainment does not insulate against systemic gender-wage disparities. The reversal in progress, especially pronounced among bachelor’s degree holders, underscores the persistent nature of the wage equality challenge.
This situation demands proactive measures to ensure that the pursuit of wage equality continues to be a focal point of social and economic policy. Policymakers and advocacy groups could push for wage equity legislation and hold corporations accountable for fostering equitable compensation. Employers and organizations could critically assess their wage practices through comprehensive internal audits. Additionally, educational institutions could institute and bolster programs designed to facilitate women’s entry into higher-paying fields where they have historically been underrepresented, helping to reshape the employment landscape. These strategies collectively represent a multifaceted approach to bridging the wage gap.
This wage gap analysis has certain limitations. It does not factor in occupational segregation, where women and men may be concentrated in different jobs or industries with different pay scales. The focus on median hourly wages means that broader compensation (bonuses, overtime pay, and other benefits) falls outside the scope of the analysis. Even though the wages have been adjusted for inflation to reflect 2022 dollars, cost-of-living adjustments were not included in the analysis, and these can bear significantly on real income disparities. When looking at the data, it is important to be careful about how it is interpreted and used. The information should be used to encourage more work towards making wages fair for everyone, especially since the data indicates progress has slowed. It is also important to handle the data responsibly, ensuring privacy and compliance with data protection laws, given the sensitivity of wage-related information.
Economic Policy Institute, & Asaniczka. (2023). USA Wage Comparison for College vs. High School [Data set]. Kaggle. https://doi.org/10.34740/KAGGLE/DS/3836693
Assessment
This assignment is worth 18 points total. You will receive 1.5 points (15 points total) for each of the project criteria adequately addressed by your data product, and 3 points for rendering and publishing your work in a GitHub/QuartoPubs/RPubs environment.