Project 1 - State vs Federal Minimum Wage

Author

Andrew Hart

Introduction to my Project

I chose a data set that I found on kaggle.com. The data set “is a cleaned data set of US state and federal minimum wages from 1968 to 2020 (including 2020 equivalency values). The data was scraped from the United States Department of Labor’s table of minimum wage by state.” From this data I chose to focus on one state for each visualization. I begin cleaning the data by changing all the column names to lowercase and replacing the periods in them with underscores. I then select only the columns that I’ll be using for my visualizations. After that I filter by state and create new variables for both MD and CA. Even though the data set is listed as “cleaned,” I still use is.na() to replace any NA values that might be present.

Load in the data

I set my working directory after loading the needed libraries, then load in my data.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)

setwd("C:/Users/andre/OneDrive/Documents/School/Data 110")
wage_data <- read_csv("Minimum Wage Data.csv")
Rows: 2862 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): State, Department.Of.Labor.Uncleaned.Data, Footnote
dbl (12): Year, State.Minimum.Wage, State.Minimum.Wage.2020.Dollars, Federal...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
names(wage_data) <- tolower(names(wage_data))
names(wage_data) <- gsub("\\.", "_", names(wage_data))

selected_columns <- c('year', 'state', 'state_minimum_wage_2020_dollars', 'federal_minimum_wage_2020_dollars')
wage_data <- wage_data[selected_columns]
head(wage_data)
# A tibble: 6 × 4
   year state      state_minimum_wage_2020_dollars federal_minimum_wage_2020_d…¹
  <dbl> <chr>                                <dbl>                         <dbl>
1  1968 Alabama                               0                             8.55
2  1968 Alaska                               15.6                           8.55
3  1968 Arizona                               3.48                          8.55
4  1968 Arkansas                              1.16                          8.55
5  1968 California                           12.3                           8.55
6  1968 Colorado                              7.43                          8.55
# ℹ abbreviated name: ¹​federal_minimum_wage_2020_dollars
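The same renaming and column selection can also be written as one dplyr pipeline. The sketch below is an alternative, not the code used above. The `raw` data frame is a small stand-in built from the first two rows of the output, and its `Footnote` values are placeholders, not the real file contents.

```r
library(dplyr)

# Stand-in for the raw file: two rows taken from the head() output above.
# The Footnote column is a placeholder (assumption), not real data.
raw <- data.frame(
  Year = c(1968, 1968),
  State = c("Alabama", "Alaska"),
  State.Minimum.Wage.2020.Dollars = c(0, 15.6),
  Federal.Minimum.Wage.2020.Dollars = c(8.55, 8.55),
  Footnote = c(NA_character_, NA_character_)
)

tidy <- raw %>%
  # Lowercase every column name and swap literal periods for underscores
  rename_with(~ gsub(".", "_", tolower(.x), fixed = TRUE)) %>%
  # Keep only the four columns used in the visualizations
  select(year, state,
         state_minimum_wage_2020_dollars,
         federal_minimum_wage_2020_dollars)

names(tidy)
```

Using `fixed = TRUE` treats the period as a literal character, so no regular-expression escaping is needed.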

Filter the data

I chose to focus on Maryland because it is where we are located, and on California because of its reputation for a high cost of living.

filtered_wage_data <- wage_data[wage_data$state == "Maryland", ]
filtered_wage_data2 <- wage_data[wage_data$state == "California", ]

head(filtered_wage_data)
# A tibble: 6 × 4
   year state    state_minimum_wage_2020_dollars federal_minimum_wage_2020_dol…¹
  <dbl> <chr>                              <dbl>                           <dbl>
1  1968 Maryland                            7.43                            8.55
2  1969 Maryland                            7.05                            8.11
3  1970 Maryland                            8.67                            8.67
4  1971 Maryland                            8.3                             8.3 
5  1972 Maryland                            9.9                             9.9 
6  1973 Maryland                            9.32                            9.32
# ℹ abbreviated name: ¹​federal_minimum_wage_2020_dollars

Continue to clean the data so that it can be better used

I used is.na() inside mutate() to replace any NA values in both data sets: a missing number becomes 0 and a missing state name becomes an empty string.

cleaned_wage_data <- filtered_wage_data %>%
  mutate(year = ifelse(is.na(year), 0, year),
         state = ifelse(is.na(state), "", state),
         federal_minimum_wage_2020_dollars = ifelse(is.na(federal_minimum_wage_2020_dollars), 0, federal_minimum_wage_2020_dollars),
         state_minimum_wage_2020_dollars = ifelse(is.na(state_minimum_wage_2020_dollars), 0, state_minimum_wage_2020_dollars))

cleaned_wage_data2 <- filtered_wage_data2 %>%
  mutate(year = ifelse(is.na(year), 0, year),
         state = ifelse(is.na(state), "", state),
         federal_minimum_wage_2020_dollars = ifelse(is.na(federal_minimum_wage_2020_dollars), 0, federal_minimum_wage_2020_dollars),
         state_minimum_wage_2020_dollars = ifelse(is.na(state_minimum_wage_2020_dollars), 0, state_minimum_wage_2020_dollars))



head(cleaned_wage_data)
# A tibble: 6 × 4
   year state    state_minimum_wage_2020_dollars federal_minimum_wage_2020_dol…¹
  <dbl> <chr>                              <dbl>                           <dbl>
1  1968 Maryland                            7.43                            8.55
2  1969 Maryland                            7.05                            8.11
3  1970 Maryland                            8.67                            8.67
4  1971 Maryland                            8.3                             8.3 
5  1972 Maryland                            9.9                             9.9 
6  1973 Maryland                            9.32                            9.32
# ℹ abbreviated name: ¹​federal_minimum_wage_2020_dollars
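Replacing NA with 0 makes a missing wage look like a wage of zero on the graph, so it can help to count the NA values first and see whether any replacement is needed at all. A minimal sketch in base R, assuming the same column names; the third row with a missing wage is hypothetical and added only so there is something to count.

```r
# First two rows come from the Maryland output above; the third row's
# missing state wage is a made-up example (assumption).
wages <- data.frame(
  year = c(1968, 1969, 1970),
  state = c("Maryland", "Maryland", "Maryland"),
  state_minimum_wage_2020_dollars = c(7.43, 7.05, NA),
  federal_minimum_wage_2020_dollars = c(8.55, 8.11, 8.67)
)

# Number of NA values in each column
na_counts <- colSums(is.na(wages))
na_counts

# Keep only the rows with no NA in any column
complete <- wages[complete.cases(wages), ]
nrow(complete)
```

If every count is 0, the mutate() step above leaves the data unchanged.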

Bar Graph for Maryland

plot1 <- ggplot(cleaned_wage_data, aes(x = year)) +
  geom_bar(aes(y = state_minimum_wage_2020_dollars, fill = "State Minimum Wage"), stat = "identity") +
  geom_bar(aes(y = federal_minimum_wage_2020_dollars, fill = "Federal Minimum Wage"), stat = "identity", alpha = 0.5) +
  scale_fill_manual(values = c("State Minimum Wage" = "darkred", "Federal Minimum Wage" = "yellow"), name = "Legend") +
  scale_y_continuous(breaks = seq(0, max(cleaned_wage_data$state_minimum_wage_2020_dollars, cleaned_wage_data$federal_minimum_wage_2020_dollars), by = 2)) +
  labs(title = "Maryland vs. Federal \nMinimum Wage Comparison",
       x = "Year",
       y = "Minimum Wage (in 2020 dollars)") +
  theme_minimal()
plot1

Bar Graph for California

plot2 <- ggplot(cleaned_wage_data2, aes(x = year)) +
  geom_bar(aes(y = state_minimum_wage_2020_dollars, fill = "State Minimum Wage"), stat = "identity") +
  geom_bar(aes(y = federal_minimum_wage_2020_dollars, fill = "Federal Minimum Wage"), stat = "identity", alpha = 0.5) +
  scale_fill_manual(values = c("State Minimum Wage" = "darkred", "Federal Minimum Wage" = "yellow"), name = "Legend") +
  scale_y_continuous(breaks = seq(0, max(cleaned_wage_data2$state_minimum_wage_2020_dollars, cleaned_wage_data2$federal_minimum_wage_2020_dollars), by = 2)) +
  labs(title = "California vs. Federal \nMinimum Wage Comparison",
       x = "Year",
       y = "Minimum Wage (in 2020 dollars)") +
  theme_minimal()
plot2
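Since the two plots differ only in the state, the shared code could be wrapped in a function. The sketch below is one possible way to do that; `plot_state_vs_federal` is a hypothetical helper name, `geom_col()` is used as the equivalent of `geom_bar(stat = "identity")`, and the `demo` data frame holds three rows copied from the outputs above.

```r
library(ggplot2)

# Hypothetical helper (not part of the original report): draws the same
# overlaid bar chart for whichever state is named.
plot_state_vs_federal <- function(data, state_name) {
  state_data <- data[data$state == state_name, ]
  # Highest wage in either series, used to set the y-axis breaks
  y_max <- max(state_data$state_minimum_wage_2020_dollars,
               state_data$federal_minimum_wage_2020_dollars,
               na.rm = TRUE)
  ggplot(state_data, aes(x = year)) +
    geom_col(aes(y = state_minimum_wage_2020_dollars,
                 fill = "State Minimum Wage")) +
    geom_col(aes(y = federal_minimum_wage_2020_dollars,
                 fill = "Federal Minimum Wage"), alpha = 0.5) +
    scale_fill_manual(values = c("State Minimum Wage" = "darkred",
                                 "Federal Minimum Wage" = "yellow"),
                      name = "Legend") +
    scale_y_continuous(breaks = seq(0, ceiling(y_max), by = 2)) +
    labs(title = paste(state_name, "vs. Federal \nMinimum Wage Comparison"),
         x = "Year",
         y = "Minimum Wage (in 2020 dollars)") +
    theme_minimal()
}

# Small demo data: rows copied from the outputs shown earlier
demo <- data.frame(
  year = c(1968, 1969, 1968),
  state = c("Maryland", "Maryland", "California"),
  state_minimum_wage_2020_dollars = c(7.43, 7.05, 12.3),
  federal_minimum_wage_2020_dollars = c(8.55, 8.11, 8.55)
)

p <- plot_state_vs_federal(demo, "Maryland")
p
```

With the full data set, a call such as `plot_state_vs_federal(wage_data, "California")` would be expected to produce the second graph.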

What my visualization represents

My visualizations show the relationship between state and federal minimum wages. I used the year for the x axis and the minimum wage in 2020 dollars for the y axis. I chose dark red (State) and yellow (Federal) because they make orange when they blend. Most of each graph is orange, so plain red or yellow only appears where the two wages clearly differ.

In Maryland, the state and federal minimum wages mostly stayed the same until around 2015, when the state decided to raise its minimum. In California, by contrast, the state minimum started higher than the federal, dipped below it, and then effectively matched it; around 2000 it began rising steadily and kept rising for the next 20 years. One interesting thing to note is that since 2010 the federal minimum, measured in 2020 dollars, has been steadily decreasing while both states I used saw increases. I am curious, if I were to repeat the graphs for all states, how many states stayed aligned with the federal minimum and how many raised theirs over the last ~10 years. The benefit of having this code written is that it can easily be modified to make a graph for any state.
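The question about how many states align with the federal minimum could be answered with a count instead of one graph per state. A minimal sketch, assuming the same column names as above; the three states and their wages are invented for illustration and are not results from the data set.

```r
library(dplyr)

# Invented example values (assumption), one row per state for one year
wages <- data.frame(
  year = c(2020, 2020, 2020),
  state = c("State A", "State B", "State C"),
  state_minimum_wage_2020_dollars = c(7.25, 12.00, 0),
  federal_minimum_wage_2020_dollars = c(7.25, 7.25, 7.25)
)

comparison <- wages %>%
  # Look only at the most recent year in the data
  filter(year == max(year)) %>%
  mutate(status = case_when(
    # near() compares with a small tolerance, safer for decimal values
    near(state_minimum_wage_2020_dollars,
         federal_minimum_wage_2020_dollars) ~ "matches federal",
    state_minimum_wage_2020_dollars >
      federal_minimum_wage_2020_dollars ~ "above federal",
    TRUE ~ "below federal or none"
  )) %>%
  # Number of states in each group
  count(status)

comparison
```

A state wage of 0 in this data appears to mean the state has no minimum of its own, which is why the last group is labeled "below federal or none".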

Anything I could not get to work

Originally I planned to make a heat map including all states, using the “heat,” or color, to represent how each state compares to the federal minimum. After numerous attempts, I could not get a heat map that I felt was effective and easy to read. For starters, including all states made the visualization cluttered. I also felt that a heat map couldn’t show the exact values for the states and the federal minimum wage, whereas the bar graphs could.
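For reference, one common way to build this kind of heat map in ggplot2 is `geom_tile()` with the fill mapped to the gap between the state and federal wage. The sketch below is not the attempt described above; the states and wages are invented, and whether the result stays readable with every state is untested here.

```r
library(ggplot2)

# Invented example values (assumption): three states over three years
wages <- expand.grid(year = 2018:2020,
                     state = c("State A", "State B", "State C"),
                     stringsAsFactors = FALSE)
wages$state_minimum_wage_2020_dollars <- c(7.25, 7.25, 7.25,
                                           10, 11, 12,
                                           0, 0, 0)
wages$federal_minimum_wage_2020_dollars <- 7.25

# Positive gap: state is above federal; negative gap: state is below
wages$gap <- wages$state_minimum_wage_2020_dollars -
  wages$federal_minimum_wage_2020_dollars

heat <- ggplot(wages, aes(x = year, y = state, fill = gap)) +
  geom_tile(color = "white") +
  # Diverging colors centered on 0, reusing the report's red and yellow
  scale_fill_gradient2(low = "yellow", mid = "white", high = "darkred",
                       midpoint = 0,
                       name = "State minus federal\n(2020 dollars)") +
  scale_x_continuous(breaks = 2018:2020) +
  labs(title = "State vs. Federal Minimum Wage Gap",
       x = "Year",
       y = "State") +
  theme_minimal()

heat
```

Plotting the gap rather than the raw wage gives up the exact values, which is the same trade-off noted above.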