wages_data <- readr::read_csv("data/wages_by_education.csv")
## Rows: 50 Columns: 61
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (61): year, less_than_hs, high_school, some_college, bachelors_degree, a...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
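The message above suggests two ways to quiet the column-type report. Below is a small sketch of both. It uses a made-up two-row CSV written to a temporary file, so it runs without `data/wages_by_education.csv`. The column names mirror the real file, but the values are placeholders. The sketch also assumes every column is numeric, as the `dbl (61)` specification reports.

```r
library(readr)

# Made-up two-row extract with the same layout as the real file,
# written to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("year,less_than_hs,high_school",
             "2022,10.0,20.0",
             "2021,11.0,21.0"), tmp)

# Option 1: declare every column as double; no type guessing, no message
toy <- read_csv(tmp, col_types = cols(.default = col_double()))

# Option 2: keep type guessing but silence the specification message
toy_quiet <- read_csv(tmp, show_col_types = FALSE)
toy
```

Declaring the types explicitly also makes the script fail loudly if a future version of the file contains a non-numeric column.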

About Dataset

The Wages by Education dataset records the average hourly wages of American workers, broken down by the highest level of education attained. Average hourly wages: the average amount that workers earn per hour. Level of education: the highest level of education the individual has attained, grouped as “less than high school,” “high school graduate,” “some college,” “bachelor’s degree,” and “advanced degree.” Wages are also reported separately by gender and by race (White, Black, and Hispanic). The dataset covers the years 1973 to 2022 and is drawn from the Economic Policy Institute’s State of Working America Data Library.
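The 61 columns reported by `read_csv()` can be accounted for if the wage columns follow the `<group>_<education>` naming used in the code in this report. Under that assumption there is an overall series, then gender, race, and race-by-gender series, giving year plus 12 groups × 5 education levels. The sketch below rebuilds the expected names under that assumption and checks the count.

```r
# Five education levels, as they appear in the column names used below
education <- c("less_than_hs", "high_school", "some_college",
               "bachelors_degree", "advanced_degree")

# Assumed group prefixes: "" for the overall series, then gender,
# race, and every race-gender combination
races       <- c("white", "black", "hispanic")
genders     <- c("men", "women")
race_gender <- as.vector(outer(races, genders, paste, sep = "_"))
prefixes    <- c("", genders, races, race_gender)

# Build "<prefix>_<education>" names (bare "<education>" for the overall series)
expected_cols <- c(
  "year",
  unlist(lapply(prefixes, function(p) {
    if (p == "") education else paste(p, education, sep = "_")
  }))
)

length(expected_cols)  # 1 + 12 * 5 = 61
```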

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
wages_data_race_gender <- wages_data |>
  select(
    year,
    white_men_less_than_hs,
    white_men_high_school,
    white_men_some_college,
    white_men_bachelors_degree,
    white_men_advanced_degree,
    black_men_less_than_hs,
    black_men_high_school,
    black_men_some_college,
    black_men_bachelors_degree,
    black_men_advanced_degree,
    hispanic_men_less_than_hs,
    hispanic_men_high_school,
    hispanic_men_some_college,
    hispanic_men_bachelors_degree,
    hispanic_men_advanced_degree,
    white_women_less_than_hs,
    white_women_high_school,
    white_women_some_college,
    white_women_bachelors_degree,
    white_women_advanced_degree,
    black_women_less_than_hs,
    black_women_high_school,
    black_women_some_college,
    black_women_bachelors_degree,
    black_women_advanced_degree,
    hispanic_women_less_than_hs,
    hispanic_women_high_school,
    hispanic_women_some_college,
    hispanic_women_bachelors_degree,
    hispanic_women_advanced_degree
  ) |>
  pivot_longer(
    cols = -year,  
    names_to = c("race", "gender", "education"),
    names_pattern = "(white|black|hispanic)_(men|women)_(.*)",
    values_to = "wages"
  )
wages_data_race_gender
## # A tibble: 1,500 × 5
##     year race  gender education        wages
##    <dbl> <chr> <chr>  <chr>            <dbl>
##  1  2022 white men    less_than_hs      17.1
##  2  2022 white men    high_school       25.9
##  3  2022 white men    some_college      29.9
##  4  2022 white men    bachelors_degree  51.2
##  5  2022 white men    advanced_degree   63.9
##  6  2022 black men    less_than_hs      16.4
##  7  2022 black men    high_school       20.7
##  8  2022 black men    some_college      22.6
##  9  2022 black men    bachelors_degree  37.6
## 10  2022 black men    advanced_degree   52.9
## # ℹ 1,490 more rows
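The split into `race`, `gender`, and `education` comes from the capture groups in `names_pattern`. Each parenthesised group fills the matching entry of `names_to`. The small sketch below runs the same `pivot_longer()` call on a made-up two-column table so the mapping is easy to see. The values are placeholders.

```r
library(tibble)
library(tidyr)

# Made-up wide table with two race-gender-education columns
toy_wide <- tibble(
  year = c(2021, 2022),
  white_men_high_school       = c(10, 11),
  black_women_advanced_degree = c(20, 21)
)

# Capture groups (race)_(gender)_(education) fill the three names_to columns;
# (.*) takes the rest of the name, so multi-word levels stay intact
toy_long <- pivot_longer(
  toy_wide,
  cols          = -year,
  names_to      = c("race", "gender", "education"),
  names_pattern = "(white|black|hispanic)_(men|women)_(.*)",
  values_to     = "wages"
)
toy_long
```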
library(treemap)

# Sum hourly wages for each race, gender, and education group;
# treemap() then aggregates these sums by education level
data_summary <- wages_data_race_gender |>
  group_by(race, gender, education) |>
  summarise(total_wages = sum(wages, na.rm = TRUE), .groups = "drop")
# Creating the treemap
treemap(data_summary,
        index = "education", 
        vSize = "total_wages", 
        vColor = "education", 
        palette = "RdYlGn",
        title = "Wages by Education level")

This treemap is titled “Wages by Education level.” A treemap illustrates hierarchical data: the area of each nested rectangle reflects the quantity of the category it represents, here the summed hourly wages. The image has five divisions, one for each education level. Advanced Degree: shown in dark red, this is the largest rectangle, indicating that workers with advanced degrees earn the most of the groups shown. Bachelor’s Degree: directly below the advanced degree area, in a paler shade of red; its size indicates that bachelor’s degree holders earn the second-highest wages. Some College: in pale yellow, to the right; workers who attended some college but did not earn a bachelor’s degree are paid less than those with advanced or bachelor’s degrees. High School: the same shade of yellow as “Some College” but smaller, indicating that high school graduates earn substantially less than workers with some college. Less Than HS (High School): the smallest rectangle, at the bottom right, indicating that workers without a high school diploma earn the least of the categories shown. Overall, the treemap shows a positive association between education and earnings: wages rise with the level of education attained. Higher education is frequently associated with higher earning potential, a common pattern in many labor markets.
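Rectangle area in a treemap is proportional to `vSize`. `treemap()` also sums `vSize` over rows that share an index value, since its default `fun.aggregate` is `"sum"`. As a result, each education level’s share of the plot equals its share of the summed wages. The sketch below uses made-up totals. Because `total_wages` adds averages across years and groups, these shares compare sums. They track average wages only when each level has the same number of non-missing values.

```r
library(tibble)
library(dplyr)

# Made-up totals for three education levels
toy_summary <- tibble(
  education   = c("high_school", "bachelors_degree", "advanced_degree"),
  total_wages = c(100, 150, 250)
)

# Each level's share of the summed total is the fraction of the
# treemap area its rectangle covers
toy_shares <- toy_summary |>
  mutate(area_share = total_wages / sum(total_wages))
toy_shares
```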

wages_data |>
  select(year, less_than_hs, high_school, some_college, bachelors_degree, advanced_degree) |>
  pivot_longer(!year, names_to = "education", values_to = "wages") |>
  ggplot(aes(x = year, y = wages,  color = education)) +
  geom_point() +
  geom_line() +
  scale_color_brewer(palette = "Set1") +
  theme_bw() +
  theme(panel.border = element_blank(),
        legend.title = element_blank(),
        plot.title = element_text(hjust = 0.5)) +
  labs(x = "Year",  y="Average hourly wages",
       title="Average Wage Trends by Education Level", color="Education Level")

This line graph, titled “Average Wage Trends by Education Level,” shows how average hourly wages changed from 1973 to 2022 for each level of education. Advanced Degree (red line): over the whole period, workers with an advanced degree consistently earned the highest average hourly pay. Their wages show a clear upward trend, from under $40 in 1980 to just over $50 by 2020. Bachelor’s Degree (blue line): bachelor’s degree holders have the second-highest wages, starting at about $30 in 1980 and rising steadily to over $40 by 2020. Some College (orange line): pay for workers with some college but no bachelor’s degree increased moderately, from just over $20 in 1980 to close to $30 by 2020. High School (green line): high school graduates’ wages rose fairly steadily. They began the 1980s slightly below the “some college” line, followed a similar trend, and ended 2020 slightly below it. Less Than HS (purple line): workers who did not complete high school have the lowest average hourly wages throughout. This line is fairly flat, starting at around $20 and showing only a modest increase.
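The starting and ending values quoted above can also be read from the data rather than from the chart. The sketch below computes the first-year value, last-year value, and change for each education level on a made-up long-format series. With the real data, the same pipeline would follow the `pivot_longer()` step in the plotting code.

```r
library(tibble)
library(dplyr)

# Made-up long-format series: two education levels, three years each
toy_trend <- tibble(
  year      = rep(c(1973, 1998, 2022), times = 2),
  education = rep(c("high_school", "advanced_degree"), each = 3),
  wages     = c(20, 21, 22, 35, 42, 50)
)

# First- and last-year wage per education level, and the change between them
toy_change <- toy_trend |>
  arrange(year) |>
  group_by(education) |>
  summarise(
    start   = first(wages),
    end     = last(wages),
    change  = end - start,
    .groups = "drop"
  )
toy_change
```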

wages_data_gender <- wages_data |>
  select(
    year,
    women_less_than_hs,
    women_high_school,
    women_some_college,
    women_bachelors_degree,
    women_advanced_degree,
    men_less_than_hs,
    men_high_school,
    men_some_college,
    men_bachelors_degree,
    men_advanced_degree
  ) |>
  pivot_longer(
    cols = -year, 
    names_to = c("gender", "education"),
    names_pattern = "(women|men)_(.*)",
    values_to = "wages"
  )
wages_data_gender
## # A tibble: 500 × 4
##     year gender education        wages
##    <dbl> <chr>  <chr>            <dbl>
##  1  2022 women  less_than_hs      14.3
##  2  2022 women  high_school       18.9
##  3  2022 women  some_college      21.8
##  4  2022 women  bachelors_degree  34.4
##  5  2022 women  advanced_degree   44.3
##  6  2022 men    less_than_hs      18.0
##  7  2022 men    high_school       24.1
##  8  2022 men    some_college      28.0
##  9  2022 men    bachelors_degree  49.0
## 10  2022 men    advanced_degree   63.5
## # ℹ 490 more rows
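In the long gender table, each year-gender pair holds five rows, one per education level. Plotting one bar per year and gender therefore needs a summary step. With `position = "dodge"` alone, the five bars for a pair would be drawn on top of each other. The sketch below uses made-up values to count the rows per pair and take an unweighted mean across education levels. The unweighted mean is one possible summary, chosen here for illustration; it is not a workforce-weighted average.

```r
library(tibble)
library(dplyr)

# Made-up slice of the long gender table: one year, two genders,
# five education levels each
toy_gender <- tibble(
  year      = 2022,
  gender    = rep(c("women", "men"), each = 5),
  education = rep(c("less_than_hs", "high_school", "some_college",
                    "bachelors_degree", "advanced_degree"), times = 2),
  wages     = c(1, 2, 3, 4, 5, 2, 3, 4, 5, 6)
)

# Five rows per year-gender pair
toy_gender |> count(year, gender)

# One value per pair: unweighted mean across education levels
toy_gender_mean <- toy_gender |>
  group_by(year, gender) |>
  summarise(wages = mean(wages), .groups = "drop")
toy_gender_mean
```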
wages_data_gender |>
  # Each year-gender pair has one row per education level; average them
  # (unweighted) so that each bar shows a single value
  group_by(year, gender) |>
  summarise(wages = mean(wages, na.rm = TRUE), .groups = "drop") |>
  ggplot(aes(x = factor(year), y = wages, fill = gender)) +
  geom_col(position = "dodge", width = 0.7) +
  scale_fill_brewer(palette = "Set1") +
  theme_bw() +
  theme(panel.border = element_blank(),
        legend.title = element_blank(),
        plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Year", y = "Average hourly wages",
       title = "Average Wage Trends by Gender")

This bar graph, titled “Average Wage Trends by Gender,” shows mean hourly earnings for men and women in the United States from 1973 to 2022. Gender wage disparity: men and women earn markedly different amounts throughout the period. The red bars (men) are taller than the blue bars (women) in every year, indicating that men consistently receive higher average hourly wages than women. Progressive narrowing of the gap: the gap between men’s and women’s wages appears largest in the earliest years, in the 1970s. In more recent decades, particularly the 2000s and 2010s, the gap narrows as the red bars (men) come closer to the blue bars (women). Wage inequality still exists, but it has declined over time. Average hourly wages for both genders also trend upward over the period, which may reflect inflation, economic growth, and improvements in roles and employment opportunities. Even though the gap has narrowed, in no year does women’s average pay match or exceed men’s. This persistent difference suggests that the gender wage gap is a structural issue. In short, female wage parity has improved over the past fifty years, but the gap remains, and gender-based pay differences persist in the American workforce.
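“Narrowing” can be made concrete by tracking women’s wages as a share of men’s in each year: a rising ratio means a closing gap. The sketch below uses made-up yearly averages. With the real data, the input would be a table with one row per year and gender, such as a per-year summary of `wages_data_gender`.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Made-up yearly averages for each gender
toy_gap <- tibble(
  year   = rep(c(1973, 2022), each = 2),
  gender = rep(c("men", "women"), times = 2),
  wages  = c(20, 12, 30, 24)
)

# One row per year, then women's wage as a fraction of men's
toy_ratio <- toy_gap |>
  pivot_wider(names_from = gender, values_from = wages) |>
  mutate(ratio = women / men)
toy_ratio
```

A ratio of 1 would mean parity, so the distance from 1 gives the size of the remaining gap in relative terms.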

wages_data_race <- wages_data |>
  select(
    year,
    white_less_than_hs,
    white_high_school,
    white_some_college,
    white_bachelors_degree,
    white_advanced_degree,
    black_less_than_hs,
    black_high_school,
    black_some_college,
    black_bachelors_degree,
    black_advanced_degree,
    hispanic_less_than_hs,
    hispanic_high_school,
    hispanic_some_college,
    hispanic_bachelors_degree,
    hispanic_advanced_degree
  ) |>
  pivot_longer(
    cols = -year, 
    names_to = c("race", "education"),
    names_pattern = "(white|black|hispanic)_(.*)",
    values_to = "wages"
  )
wages_data_race
## # A tibble: 750 × 4
##     year race  education        wages
##    <dbl> <chr> <chr>            <dbl>
##  1  2022 white less_than_hs      15.7
##  2  2022 white high_school       23.3
##  3  2022 white some_college      26.3
##  4  2022 white bachelors_degree  43.3
##  5  2022 white advanced_degree   53.3
##  6  2022 black less_than_hs      15.2
##  7  2022 black high_school       19.4
##  8  2022 black some_college      21.3
##  9  2022 black bachelors_degree  33.4
## 10  2022 black advanced_degree   44.7
## # ℹ 740 more rows
wages_data_race |>
  ggplot(aes(x = year, y = wages, color = race)) +
  geom_point() +
  geom_line() +
  scale_color_brewer(palette = "Set1") +
  theme_bw() +
  theme(panel.border = element_blank(),
        legend.title = element_blank(),
        plot.title = element_text(hjust = 0.5)) +
  facet_grid(. ~ education) +
  labs(x = "Year", y = "Average hourly wages",
       title = "Average Hourly Wage Trends by Education Level and Race", color = "Race")

The set of line graphs above shows average hourly wage trends in the United States from 1973 to 2022, broken down by race (Black: red line; Hispanic: blue line; White: green line) and education level. Advanced Degree: average hourly wages for workers with advanced degrees have increased for all racial groups. White workers earn the most, followed by Black workers, with Hispanic workers earning the least. Wages grew in all three groups, but the gap between White workers and the other two groups did not shrink. Bachelor’s Degree: wages for bachelor’s degree holders rose for all racial groups, with White workers seeing the largest increases. The gap between White workers and the other two groups widened slightly over the years, with Hispanic workers earning the least of the three. High School: since the mid-2000s, average wages for high school graduates have grown relatively little. Again, White workers earn the most, followed by Black workers, with Hispanic workers last. Less Than High School: wages for workers without a high school diploma changed only slightly over time. The racial pay gap in this category is smaller than in the higher education categories, but White workers still earn slightly more than Black and Hispanic workers. Some College: wages for people who attended some college but did not earn a degree increased slightly over time. The pay gap here is smaller than in other categories, but White workers continue to earn more than Black and Hispanic workers, with Hispanic workers consistently earning the lowest wages. Across all education levels, White workers typically earn more than Black and Hispanic workers.
The income gap between racial groups is larger at higher education levels, particularly for advanced and bachelor’s degrees. The data highlight the importance of education in determining wages. They also show pay differences by race that persist even among workers with comparable education.
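Statements about a racial gap widening or narrowing within an education level can be checked by computing the gap directly for the first and last year. The sketch below computes the White-minus-Black gap per education level on made-up values. The same pattern applies to the Hispanic series, or to the real `wages_data_race` table.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Made-up values: two education levels, two races, two years
toy_race <- tibble(
  year      = rep(c(1973, 2022), each = 4),
  education = rep(rep(c("some_college", "advanced_degree"), each = 2), times = 2),
  race      = rep(c("white", "black"), times = 4),
  wages     = c(20, 16, 40, 34, 26, 23, 53, 44)
)

# White-minus-Black gap per education level and year,
# then one column per year for easy comparison
toy_race_gap <- toy_race |>
  pivot_wider(names_from = race, values_from = wages) |>
  mutate(gap = white - black) |>
  select(education, year, gap) |>
  pivot_wider(names_from = year, values_from = gap, names_prefix = "gap_")
toy_race_gap
```

In these made-up numbers, the gap narrows for some college (4 to 3) and widens for advanced degrees (6 to 9).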

a. How you cleaned the data set up. Column selection: the select() function was used to pick the columns relevant to each analysis. This is an essential step because it defines the variables to focus on and makes later steps more efficient. Transformation into long format: pivot_longer() was used to reshape the data into a “long” format. Long format is especially helpful for time series and grouped data because it avoids redundancy and simplifies visualization. Instead of a separate column for every group and education level, the long data have a “year” column, one column per grouping variable (race, gender, education), and a single “wages” value column. Missing values: no values were imputed. When wages are summed for the treemap, na.rm = TRUE ignores missing values. In the line graphs, ggplot2 drops missing values at the start or end of a series, with a warning. Where a value is missing mid-series, it leaves a gap in the line rather than silently joining across it, so missing data remain visible.
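An alternative to leaving missing values to the plotting step is to drop them explicitly while reshaping. The `values_drop_na` argument of `tidyr::pivot_longer()` removes rows whose value is `NA`. The sketch below applies it to a made-up table with one missing entry. It is a possible variation, not the approach used in the code above.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Made-up wide table with one missing value
toy_na <- tibble(
  year              = c(2021, 2022),
  men_high_school   = c(24, NA),
  women_high_school = c(19, 19.5)
)

# values_drop_na = TRUE removes the NA row during the pivot
toy_na_long <- toy_na |>
  pivot_longer(
    cols           = -year,
    names_to       = c("gender", "education"),
    names_pattern  = "(women|men)_(.*)",
    values_to      = "wages",
    values_drop_na = TRUE
  )
toy_na_long

# Rows per gender after dropping: men 1, women 2
toy_na_long |> count(gender)
```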

b. What the visualization represents, and any interesting patterns or surprises that arise within it. Interesting patterns: one consistent finding is the wage gap by race, gender, and education. Workers with more education typically earn more. At the same time, White workers typically earn more than Black and Hispanic workers with the same level of education. In most categories, men earn more than women, reflecting the gender wage gap.

Surprises: the narrowing pay gap in the “Some College” category was unexpected. It suggests that some college education may partially level the wage distribution.

c. Anything that you might have shown that you could not get to work, or that you wished you could have included. Correlation with economic indicators: at the time of analysis, it was not possible to obtain the additional data sets needed to investigate the relationship between wages and broader economic indicators such as GDP growth and unemployment rates.
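If an indicator series were available, a first look at the relationship could join it to the yearly wage averages by year and compute a correlation. In the sketch below, both series are made up, and the `unemployment` column is a hypothetical stand-in. Neither comes from the wages data set.

```r
library(tibble)
library(dplyr)

# Made-up yearly wage averages and a hypothetical indicator series
toy_wages     <- tibble(year = 2018:2022, wages = c(30, 31, 33, 32, 34))
toy_indicator <- tibble(year = 2018:2022, unemployment = c(5, 4, 8, 6, 3))

# Align the two series by year, then compute a Pearson correlation
joined <- inner_join(toy_wages, toy_indicator, by = "year")
wage_indicator_cor <- cor(joined$wages, joined$unemployment)
wage_indicator_cor
```

Because both wages and many indicators trend over time, a correlation of levels can be misleading. Correlating year-over-year changes, for example with `diff()`, is a common alternative.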