Intro: - The final data sets we will be using are “US Minimum Wage by State from 1968 to 2017”, found on kaggle.com, and “Employment Status” from the United States Census Bureau. Our data cleaning process included trimming the first data set to the years 2012-2017 and removing columns three through nine, which held variables that did not fit our project plan. Similarly, for our second data set, the cleaning process included removing unwanted variable columns, adding a column for the year, and renaming the columns. To merge our data, we first had to combine the data from the US Census Bureau, since the data for each year was a separate data frame. Then we merged the minimum wage data with the employment status data to create our final data set. Finally, we had to remove Alabama, Louisiana, Tennessee, South Carolina, U.S. Virgin Islands, Mississippi, Puerto Rico, District of Columbia, Federal (FLSA), and Guam, since data was not available from both data sets for these areas.
This data is interesting because it allows us to look at trends experienced by different states. Specifically, we want to focus on trends during election years (2012, 2016) and the years following them in order to see whether those trends correlate with the elections. The data also allows us to group the states into regions in order to determine whether there are similarities within regions and differences between regions. The data would also allow us to group the states into “blue” and “red” states to see whether states with the same political affiliation share trends that states with a different affiliation do not.
Our goal is to determine whether there are trends in the data that can be linked to economic or political events that took place during those years. Although we will not be using time as an explanatory variable, since this does not meet the assumption of independence, we will use the year to link any relationships that we find among our variables.
### Cleaning our final data set
library(tidyverse)
# Read the minimum wage data
data <- read.csv("Minimum Wage Data.csv", header = TRUE)
view(data)  # opens the data viewer in an interactive session
# Create a new column with the average of the high and low values
data$Average.Value <- (data$High.Value + data$Low.Value) / 2
# Keep only the years 2012 through 2017
minwage <- filter(data, Year %in% 2012:2017)
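# Read the 2012 employment rates, drop the unused second column, add the year, and rename the columns
data2 <- read.csv("Emp. Rates 2012.csv", header = TRUE)
d2012 <- data2[, -c(2)]
d2012$Year <- "2012"
names(d2012) <- c("State", "Employed", "Unemployed", "Year")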
d2012
## State Employed Unemployed Year
## 1 Alabama 52.3 10.0 2012
## 2 Alaska 62.3 7.8 2012
## 3 Arizona 54.0 9.8 2012
## 4 Arkansas 54.2 8.4 2012
## 5 California 56.1 11.4 2012
## 6 Colorado 62.6 7.8 2012
## 7 Connecticut 60.8 9.7 2012
## 8 Delaware 58.2 8.8 2012
## 9 Florida 52.4 11.5 2012
## 10 Georgia 55.4 11.0 2012
## 11 Hawaii 57.8 7.1 2012
## 12 Idaho 58.4 8.0 2012
## 13 Illinois 59.3 10.2 2012
## 14 Indiana 58.6 8.8 2012
## 15 Iowa 64.1 5.3 2012
## 16 Kansas 62.2 6.5 2012
## 17 Kentucky 53.8 9.3 2012
## 18 Louisiana 55.5 8.8 2012
## 19 Maine 59.0 7.6 2012
## 20 Maryland 62.5 8.3 2012
## 21 Massachusetts 61.6 8.5 2012
## 22 Michigan 54.3 11.3 2012
## 23 Minnesota 65.7 6.3 2012
## 24 Mississippi 51.4 11.3 2012
## 25 Missouri 57.9 8.6 2012
## 26 Montana 59.6 7.2 2012
## 27 Nebraska 66.3 5.3 2012
## 28 Nevada 56.8 12.3 2012
## 29 New Hampshire 64.1 6.5 2012
## 30 New Jersey 59.4 10.1 2012
## 31 New Mexico 53.5 10.3 2012
## 32 New York 57.5 9.2 2012
## 33 North Carolina 55.3 10.8 2012
## 34 North Dakota 67.6 3.3 2012
## 35 Ohio 57.5 9.1 2012
## 36 Oklahoma 57.0 6.8 2012
## 37 Oregon 55.4 11.0 2012
## 38 Pennsylvania 57.2 8.9 2012
## 39 Rhode Island 59.6 9.4 2012
## 40 South Carolina 53.4 11.2 2012
## 41 South Dakota 65.3 4.7 2012
## 42 Tennessee 55.0 9.5 2012
## 43 Texas 59.2 8.0 2012
## 44 Utah 63.9 6.8 2012
## 45 Vermont 62.9 6.4 2012
## 46 Virginia 60.2 6.9 2012
## 47 Washington 58.4 8.7 2012
## 48 West Virginia 49.7 8.0 2012
## 49 Wisconsin 62.3 7.3 2012
## 50 Wyoming 63.6 5.6 2012
data3 <- read.csv("Emp. Rates 2013.csv", header = TRUE)
d2013 <- data3[,-c(2)]
d2013$Year <- "2013"
names(d2013) <- c("State", "Employed", "Unemployed", "Year")
d2013
## State Employed Unemployed Year
## 1 Alabama 52.3 9.6 2013
## 2 Alaska 62.4 8.7 2013
## 3 Arizona 53.8 8.9 2013
## 4 Arkansas 53.4 8.1 2013
## 5 California 56.7 10.0 2013
## 6 Colorado 62.4 7.0 2013
## 7 Connecticut 60.9 9.3 2013
## 8 Delaware 58.0 8.6 2013
## 9 Florida 52.9 9.7 2013
## 10 Georgia 55.9 10.3 2013
## 11 Hawaii 57.0 6.1 2013
## 12 Idaho 57.8 7.2 2013
## 13 Illinois 59.5 9.5 2013
## 14 Indiana 58.7 7.8 2013
## 15 Iowa 64.4 4.8 2013
## 16 Kansas 62.2 5.9 2013
## 17 Kentucky 54.2 8.4 2013
## 18 Louisiana 55.3 8.0 2013
## 19 Maine 58.5 6.9 2013
## 20 Maryland 62.9 7.4 2013
## 21 Massachusetts 62.1 7.8 2013
## 22 Michigan 55.2 9.8 2013
## 23 Minnesota 66.1 5.4 2013
## 24 Mississippi 51.3 10.6 2013
## 25 Missouri 58.5 7.5 2013
## 26 Montana 59.1 6.5 2013
## 27 Nebraska 66.4 4.6 2013
## 28 Nevada 57.2 11.0 2013
## 29 New Hampshire 63.9 6.0 2013
## 30 New Jersey 59.8 9.1 2013
## 31 New Mexico 53.5 9.2 2013
## 32 New York 57.9 8.7 2013
## 33 North Carolina 55.8 9.7 2013
## 34 North Dakota 67.6 2.6 2013
## 35 Ohio 58.0 8.1 2013
## 36 Oklahoma 57.2 6.2 2013
## 37 Oregon 56.3 9.2 2013
## 38 Pennsylvania 57.7 8.3 2013
## 39 Rhode Island 59.5 9.2 2013
## 40 South Carolina 54.6 9.5 2013
## 41 South Dakota 65.7 4.0 2013
## 42 Tennessee 55.8 8.7 2013
## 43 Texas 59.9 7.1 2013
## 44 Utah 63.3 5.5 2013
## 45 Vermont 62.2 5.8 2013
## 46 Virginia 60.4 6.6 2013
## 47 Washington 58.2 7.9 2013
## 48 West Virginia 49.7 8.5 2013
## 49 Wisconsin 62.7 6.5 2013
## 50 Wyoming 64.4 5.1 2013
data4 <- read.csv("Emp. Rates 2014.csv", header = TRUE)
d2014 <- data4[,-c(2)]
d2014$Year <- "2014"
names(d2014) <- c("State", "Employed", "Unemployed", "Year")
d2014
## State Employed Unemployed Year
## 1 Alabama 52.5 8.6 2014
## 2 Alaska 61.9 7.6 2014
## 3 Arizona 54.2 7.9 2014
## 4 Arkansas 54.2 6.8 2014
## 5 California 57.4 8.5 2014
## 6 Colorado 63.7 5.5 2014
## 7 Connecticut 61.6 7.9 2014
## 8 Delaware 58.5 6.7 2014
## 9 Florida 53.6 8.0 2014
## 10 Georgia 56.7 8.3 2014
## 11 Hawaii 58.1 5.4 2014
## 12 Idaho 58.1 5.5 2014
## 13 Illinois 60.0 8.1 2014
## 14 Indiana 59.5 7.1 2014
## 15 Iowa 64.7 4.4 2014
## 16 Kansas 61.9 5.2 2014
## 17 Kentucky 54.4 7.6 2014
## 18 Louisiana 55.2 7.5 2014
## 19 Maine 59.3 5.9 2014
## 20 Maryland 62.7 7.2 2014
## 21 Massachusetts 62.9 6.7 2014
## 22 Michigan 55.9 8.3 2014
## 23 Minnesota 66.3 4.7 2014
## 24 Mississippi 51.7 9.8 2014
## 25 Missouri 58.1 6.8 2014
## 26 Montana 60.1 4.9 2014
## 27 Nebraska 67.1 4.2 2014
## 28 Nevada 57.7 8.9 2014
## 29 New Hampshire 64.4 5.1 2014
## 30 New Jersey 61.0 7.5 2014
## 31 New Mexico 53.4 8.7 2014
## 32 New York 58.5 7.3 2014
## 33 North Carolina 56.3 8.3 2014
## 34 North Dakota 66.5 3.0 2014
## 35 Ohio 58.7 7.2 2014
## 36 Oklahoma 57.3 5.7 2014
## 37 Oregon 56.6 7.8 2014
## 38 Pennsylvania 58.1 7.0 2014
## 39 Rhode Island 59.7 7.8 2014
## 40 South Carolina 54.9 8.1 2014
## 41 South Dakota 66.5 3.9 2014
## 42 Tennessee 55.7 7.8 2014
## 43 Texas 60.2 6.1 2014
## 44 Utah 64.0 5.0 2014
## 45 Vermont 62.5 5.5 2014
## 46 Virginia 60.6 6.1 2014
## 47 Washington 59.1 6.5 2014
## 48 West Virginia 49.5 6.9 2014
## 49 Wisconsin 63.3 5.3 2014
## 50 Wyoming 64.8 4.3 2014
data5 <- read.csv("Emp. Rates 2015.csv", header = TRUE)
d2015 <- data5[,-c(2)]
d2015$Year <- "2015"
names(d2015) <- c("State", "Employed", "Unemployed", "Year")
d2015
## State Employed Unemployed Year
## 1 Alaska 62.1 7.9 2015
## 2 Arizona 54.5 6.9 2015
## 3 Arkansas 54.0 5.8 2015
## 4 California 58.1 7.3 2015
## 5 Colorado 63.8 5.2 2015
## 6 Connecticut 61.8 6.9 2015
## 7 Delaware 58.5 5.8 2015
## 8 Florida 54.0 7.0 2015
## 9 Georgia 57.6 7.1 2015
## 10 Hawaii 58.6 4.9 2015
## 11 Idaho 58.6 5.4 2015
## 12 Illinois 60.5 6.9 2015
## 13 Indiana 59.8 5.8 2015
## 14 Iowa 64.6 4.2 2015
## 15 Kansas 62.4 4.7 2015
## 16 Kentucky 54.6 6.5 2015
## 17 Maine 58.7 5.4 2015
## 18 Maryland 63.0 5.5 2015
## 19 Massachusetts 63.1 5.8 2015
## 20 Michigan 56.5 7.2 2015
## 21 Minnesota 67.0 4.2 2015
## 22 Missouri 59.4 5.3 2015
## 23 Montana 59.0 4.5 2015
## 24 Nebraska 66.8 3.2 2015
## 25 Nevada 58.1 7.9 2015
## 26 New Hampshire 64.7 4.2 2015
## 27 New Jersey 60.8 6.6 2015
## 28 New Mexico 53.6 7.4 2015
## 29 New York 59.0 6.5 2015
## 30 North Carolina 56.8 6.9 2015
## 31 North Dakota 67.1 2.6 2015
## 32 Ohio 59.1 6.4 2015
## 33 Oklahoma 57.4 5.5 2015
## 34 Oregon 57.6 6.8 2015
## 35 Pennsylvania 58.6 6.3 2015
## 36 Rhode Island 60.3 6.2 2015
## 37 South Dakota 65.1 4.0 2015
## 38 Texas 60.3 5.5 2015
## 39 Utah 64.9 4.0 2015
## 40 Vermont 62.9 3.8 2015
## 41 Virginia 60.7 5.5 2015
## 42 Washington 59.2 6.0 2015
## 43 West Virginia 49.0 7.3 2015
## 44 Wisconsin 64.0 4.3 2015
## 45 Wyoming 64.1 4.8 2015
data6 <- read.csv("Emp. Rates 2016.csv", header = TRUE)
d2016 <- data6[,-c(2)]
d2016$Year <- "2016"
names(d2016) <- c("State", "Employed", "Unemployed", "Year")
d2016
## State Employed Unemployed Year
## 1 Alabama 53.3 6.4 2016
## 2 Alaska 62.1 8.0 2016
## 3 Arizona 55.3 6.5 2016
## 4 Arkansas 54.4 5.1 2016
## 5 California 58.8 6.5 2016
## 6 Colorado 64.0 4.7 2016
## 7 Connecticut 62.1 6.4 2016
## 8 Delaware 57.6 5.7 2016
## 9 Florida 54.4 6.0 2016
## 10 Georgia 58.6 6.0 2016
## 11 Hawaii 59.6 4.4 2016
## 12 Idaho 58.8 4.7 2016
## 13 Illinois 61.0 6.3 2016
## 14 Indiana 60.4 5.0 2016
## 15 Iowa 64.8 3.9 2016
## 16 Kansas 62.7 4.5 2016
## 17 Kentucky 55.2 6.0 2016
## 18 Louisiana 54.8 7.0 2016
## 19 Maine 59.4 4.4 2016
## 20 Maryland 63.8 5.4 2016
## 21 Massachusetts 63.7 5.3 2016
## 22 Michigan 57.3 6.2 2016
## 23 Minnesota 66.8 3.8 2016
## 24 Mississippi 52.3 7.7 2016
## 25 Missouri 59.5 4.9 2016
## 26 Montana 60.1 4.8 2016
## 27 Nebraska 66.6 3.7 2016
## 28 Nevada 58.8 6.7 2016
## 29 New Hampshire 65.0 3.6 2016
## 30 New Jersey 61.3 6.0 2016
## 31 New Mexico 53.3 7.5 2016
## 32 New York 59.2 5.9 2016
## 33 North Carolina 57.2 6.2 2016
## 34 North Dakota 67.9 2.8 2016
## 35 Ohio 59.5 5.7 2016
## 36 Oklahoma 56.9 6.0 2016
## 37 Oregon 58.3 5.7 2016
## 38 Pennsylvania 58.5 5.8 2016
## 39 Rhode Island 59.7 5.9 2016
## 40 South Carolina 55.9 6.3 2016
## 41 South Dakota 65.0 3.9 2016
## 42 Tennessee 56.8 5.5 2016
## 43 Texas 60.5 5.6 2016
## 44 Utah 65.2 4.1 2016
## 45 Vermont 62.5 3.9 2016
## 46 Virginia 60.9 5.0 2016
## 47 Washington 59.8 5.4 2016
## 48 West Virginia 49.4 7.6 2016
## 49 Wisconsin 63.8 4.1 2016
## 50 Wyoming 62.6 5.6 2016
data7 <- read.csv("Emp. Rates 2017.csv", header = TRUE)
d2017 <- data7[,-c(2)]
d2017$Year <- "2017"
names(d2017) <- c("State", "Employed", "Unemployed", "Year")
d2017
## State Employed Unemployed Year
## 1 Alabama 52.9 5.8 2017
## 2 Alaska 60.2 7.6 2017
## 3 Arizona 56.1 5.8 2017
## 4 Arkansas 54.6 5.6 2017
## 5 California 59.5 5.9 2017
## 6 Colorado 64.6 4.2 2017
## 7 Connecticut 61.4 6.1 2017
## 8 Delaware 55.8 5.3 2017
## 9 District of Columbia 65.3 6.6 2017
## 10 Florida 54.9 5.5 2017
## 11 Georgia 59.1 5.8 2017
## 12 Hawaii 59.0 4.2 2017
## 13 Idaho 59.2 4.1 2017
## 14 Illinois 60.8 6.1 2017
## 15 Indiana 60.4 4.7 2017
## 16 Iowa 64.8 3.6 2017
## 17 Kansas 63.0 4.2 2017
## 18 Kentucky 55.6 5.5 2017
## 19 Louisiana 54.7 6.5 2017
## 20 Maine 60.3 4.2 2017
## 21 Maryland 63.8 5.2 2017
## 22 Massachusetts 63.6 4.6 2017
## 23 Michigan 57.8 5.9 2017
## 24 Minnesota 66.9 3.6 2017
## 25 Mississippi 52.2 7.0 2017
## 26 Missouri 59.8 4.6 2017
## 27 Montana 61.3 3.5 2017
## 28 Nebraska 67.4 3.3 2017
## 29 Nevada 59.6 5.9 2017
## 30 New Hampshire 65.1 3.8 2017
## 31 New Jersey 61.8 5.3 2017
## 32 New Mexico 52.6 6.6 2017
## 33 New York 59.6 5.5 2017
## 34 North Carolina 58.0 5.3 2017
## 35 North Dakota 67.9 2.9 2017
## 36 Ohio 59.6 5.2 2017
## 37 Oklahoma 57.2 5.4 2017
## 38 Oregon 59.1 5.1 2017
## 39 Pennsylvania 59.0 5.3 2017
## 40 Rhode Island 61.3 5.7 2017
## 41 South Carolina 56.1 5.8 2017
## 42 South Dakota 65.1 3.5 2017
## 43 Tennessee 58.1 4.9 2017
## 44 Texas 60.7 5.1 2017
## 45 Utah 66.0 3.6 2017
## 46 Vermont 62.8 3.8 2017
## 47 Virginia 61.1 4.6 2017
## 48 Washington 60.6 4.9 2017
## 49 West Virginia 48.8 6.7 2017
## 50 Wisconsin 63.9 3.5 2017
## 51 Wyoming 61.9 5.2 2017
## 52 Puerto Rico 36.2 16.4 2017
# Stack the six yearly data frames into one
join <- rbind(d2012, d2013, d2014, d2015, d2016, d2017)
view(join)
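The six per-year blocks above repeat the same steps: read a file, drop the second column, rename the columns, and add the year. As a minimal sketch, those steps could be wrapped in one function and applied to every year. The helper name `read_emp_year`, its `dir` argument, and the `Extra` column in the demo files are assumptions made for illustration. The demo writes two tiny files to a temporary folder so that it runs on its own, using values copied from the 2012 and 2013 tables.

```r
# Hypothetical helper: read one year's employment file and tidy it the same
# way as the per-year blocks above (drop column 2, rename, add Year).
read_emp_year <- function(year, dir = ".") {
  path <- file.path(dir, paste0("Emp. Rates ", year, ".csv"))
  raw <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
  out <- raw[, -2]                        # drop the unused second column
  names(out) <- c("State", "Employed", "Unemployed")
  out$Year <- as.integer(year)            # stored as integer from the start
  out
}

# Self-contained demo: write two small files to a temporary folder.
demo_dir <- tempfile("emp_rates_")
dir.create(demo_dir)
for (yr in 2012:2013) {
  toy <- data.frame(
    State = c("Alaska", "Iowa"),
    Extra = c(NA, NA),                    # assumed stand-in for the dropped column
    Employed = if (yr == 2012) c(62.3, 64.1) else c(62.4, 64.4),
    Unemployed = if (yr == 2012) c(7.8, 5.3) else c(8.7, 4.8)
  )
  write.csv(toy, file.path(demo_dir, paste0("Emp. Rates ", yr, ".csv")),
            row.names = FALSE)
}

# Read every year with the same function and stack the results.
join_demo <- do.call(rbind, lapply(2012:2013, read_emp_year, dir = demo_dir))
join_demo
```

With the real files in the working directory, `do.call(rbind, lapply(2012:2017, read_emp_year))` should give the same rows as the stacked yearly data frames, with `Year` already stored as an integer.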
Next, we remove the areas that lack data in either source and merge the two data sets. After that, we have various visuals.
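# Areas that lack data in at least one of the two data sets
excludeStates <- c("Alabama", "Louisiana", "Tennessee", "South Carolina", "U.S. Virgin Islands", "Mississippi", "Puerto Rico", "District of Columbia", "Federal (FLSA)", "Guam")
employmentdata <- join %>%
  filter(!State %in% excludeStates)
minwage <- minwage %>%
  filter(!State %in% excludeStates)
# Year was stored as text above; convert it so that it matches minwage$Year
employmentdata$Year <- as.integer(employmentdata$Year)
# Merge on the columns the two data frames share (Year and State)
joined <- inner_join(minwage, employmentdata)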
## Joining, by = c("Year", "State")
## Warning: Column `State` joining factors with different levels, coercing to
## character vector
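An inner join silently drops every row that has no partner in the other table, which is why the excluded areas had to be identified by hand. As a hedged sketch, `anti_join()` from dplyr can list the unmatched keys before merging. The two key tables below are toy stand-ins invented for illustration (they are not the real data), and the names `wage_keys` and `emp_keys` are hypothetical.

```r
library(dplyr)

# Toy key tables (illustrative only): just the State and Year columns.
wage_keys <- data.frame(
  State = c("Alaska", "Guam"),
  Year = c(2017L, 2017L),
  stringsAsFactors = FALSE
)
emp_keys <- data.frame(
  State = c("Alaska", "Puerto Rico"),
  Year = c(2017L, 2017L),
  stringsAsFactors = FALSE
)

# Rows of the first table that have no match in the second table.
only_in_wage <- anti_join(wage_keys, emp_keys, by = c("State", "Year"))
only_in_emp <- anti_join(emp_keys, wage_keys, by = c("State", "Year"))

only_in_wage$State
only_in_emp$State
```

Naming the key columns in `by` also makes the join explicit. Reading the CSV files with `stringsAsFactors = FALSE` keeps `State` as character in both tables, which should avoid the factor coercion warning shown above.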
### We decided to look at the average minimum wage value from 2012 to 2017 by region
# West
west <- filter(joined, State %in% c("Alaska", "California", "Hawaii", "Washington", "Oregon", "New Mexico", "Arizona", "Colorado", "Montana", "Idaho", "Nevada", "Utah", "Wyoming"))
west_mod_data <- lm(Average.Value ~ Year, data = west)
west_chart <- ggplot(west, aes(x = Year, y = Average.Value, color = as.factor(State))) +
  geom_point() +
  geom_abline(intercept = west_mod_data$coefficients[1], slope = west_mod_data$coefficients[2])
west_chart
# Midwest
midwest <- filter(joined, State %in% c("North Dakota", "South Dakota", "Minnesota", "Nebraska", "Michigan", "Iowa", "Kansas", "Illinois", "Indiana", "Ohio", "Missouri", "Wisconsin"))
midwest_mod_data <- lm(Average.Value ~ Year, data = midwest)
midwest_chart <- ggplot(midwest, aes(x = Year, y = Average.Value, color = as.factor(State))) +
  geom_point() +
  geom_abline(intercept = midwest_mod_data$coefficients[1], slope = midwest_mod_data$coefficients[2])
midwest_chart
# South
south <- filter(joined, State %in% c("Oklahoma", "Arkansas", "Texas", "Louisiana", "Mississippi", "Kentucky", "Tennessee", "Georgia", "Alabama", "Florida", "South Carolina", "North Carolina", "West Virginia", "Virginia", "Maryland", "Delaware"))
south_mod_data <- lm(Average.Value ~ Year, data = south)
south_chart <- ggplot(south, aes(x = Year, y = Average.Value, color = as.factor(State))) +
  geom_point() +
  geom_abline(intercept = south_mod_data$coefficients[1], slope = south_mod_data$coefficients[2])
south_chart
# Northeast
northeast <- filter(joined, State %in% c("New York", "Pennsylvania", "Maine", "Vermont", "New Hampshire", "Rhode Island", "Massachusetts", "Connecticut", "New Jersey"))
northeast_mod_data <- lm(Average.Value ~ Year, data = northeast)
northeast_chart <- ggplot(northeast, aes(x = Year, y = Average.Value, color = as.factor(State))) +
  geom_point() +
  geom_abline(intercept = northeast_mod_data$coefficients[1], slope = northeast_mod_data$coefficients[2])
northeast_chart
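Each regional chart above follows one pattern: fit a pooled line of the variable on `Year`, plot the points colored by state, and overlay the fitted line. A minimal sketch of that pattern as a reusable function follows. The name `region_chart` and its arguments are assumptions for illustration, and the demo data holds only the Iowa and Nebraska unemployment rates copied from the tables above.

```r
library(ggplot2)

# Hypothetical helper: scatter one variable against Year for a region and
# overlay the pooled least-squares line, as the regional blocks above do.
region_chart <- function(df, y_var) {
  df$value <- df[[y_var]]              # copy the chosen column to a fixed name
  fit <- lm(value ~ Year, data = df)   # one pooled fit across the region's states
  ggplot(df, aes(x = Year, y = value, color = as.factor(State))) +
    geom_point() +
    geom_abline(intercept = coef(fit)[[1]], slope = coef(fit)[[2]]) +
    labs(y = y_var, color = "State")
}

# Demo data: unemployment rates for two Midwest states, 2012 to 2017.
midwest_demo <- data.frame(
  State = rep(c("Iowa", "Nebraska"), each = 6),
  Year = rep(2012:2017, times = 2),
  Unemployed = c(5.3, 4.8, 4.4, 4.2, 3.9, 3.6,   # Iowa
                 5.3, 4.6, 4.2, 3.2, 3.7, 3.3)   # Nebraska
)

demo_chart <- region_chart(midwest_demo, "Unemployed")
demo_chart
```

Under these assumptions, a call such as `region_chart(west, "Average.Value")` would stand in for one of the blocks above, and the same function could be reused for `Employed` and `Unemployed`.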
### Then we switched to employment rates by region.
west_mod_data2 <- lm(Employed ~ Year, data = west)
west_emp_chart <- ggplot(west, aes(x = Year, y = Employed, color = as.factor(State))) +
  geom_point() +
  geom_abline(intercept = west_mod_data2$coefficients[1], slope = west_mod_data2$coefficients[2])
west_emp_chart
midwest_mod_data2 <- lm(Employed ~ Year, data = midwest)
midwest_emp_chart <- ggplot(midwest, aes(x = Year, y = Employed, color = as.factor(State))) +
  geom_point() +
  geom_abline(intercept = midwest_mod_data2$coefficients[1], slope = midwest_mod_data2$coefficients[2])
midwest_emp_chart
south_mod_data2 <- lm(Employed ~ Year, data = south)
south_emp_chart <- ggplot(south, aes(x = Year, y = Employed, color = as.factor(State))) +
  geom_point() +
  geom_abline(intercept = south_mod_data2$coefficients[1], slope = south_mod_data2$coefficients[2])
south_emp_chart
northeast_mod_data2 <- lm(Employed ~ Year, data = northeast)
northeast_emp_chart <- ggplot(northeast, aes(x = Year, y = Employed, color = as.factor(State))) +
  geom_point() +
  geom_abline(intercept = northeast_mod_data2$coefficients[1], slope = northeast_mod_data2$coefficients[2])
northeast_emp_chart
### Now unemployment rates by region
wmd3 <- lm(Unemployed ~ Year, data = west)
wuc <- ggplot(west, aes(x = Year, y = Unemployed, color = as.factor(State))) +
  geom_point() +
  geom_abline(intercept = wmd3$coefficients[1], slope = wmd3$coefficients[2])
wuc
mmd3 <- lm(Unemployed ~ Year, data = midwest)
muc <- ggplot(midwest, aes(x = Year, y = Unemployed, color = as.factor(State))) +
  geom_point() +
  geom_abline(intercept = mmd3$coefficients[1], slope = mmd3$coefficients[2])
muc
smd3 <- lm(Unemployed ~ Year, data = south)
suc <- ggplot(south, aes(x = Year, y = Unemployed, color = as.factor(State))) +
  geom_point() +
  geom_abline(intercept = smd3$coefficients[1], slope = smd3$coefficients[2])
suc
nmd3 <- lm(Unemployed ~ Year, data = northeast)
nuc <- ggplot(northeast, aes(x = Year, y = Unemployed, color = as.factor(State))) +
  geom_point() +
  geom_abline(intercept = nmd3$coefficients[1], slope = nmd3$coefficients[2])
nuc
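The lines drawn on the charts above come from one pooled fit per region, so the slope of each fit is the estimated change per year for that region. As a rough sketch, the slopes can be collected side by side to compare regions directly. The data frame `trend_demo` is a hypothetical four-state subset with unemployment rates copied from the tables above, not the full `joined` data.

```r
# Unemployment rates for four states, copied from the yearly tables above.
trend_demo <- data.frame(
  Region = rep(c("Midwest", "Northeast"), each = 12),
  State = rep(c("Iowa", "Nebraska", "Maine", "Vermont"), each = 6),
  Year = rep(2012:2017, times = 4),
  Unemployed = c(5.3, 4.8, 4.4, 4.2, 3.9, 3.6,   # Iowa
                 5.3, 4.6, 4.2, 3.2, 3.7, 3.3,   # Nebraska
                 7.6, 6.9, 5.9, 5.4, 4.4, 4.2,   # Maine
                 6.4, 5.8, 5.5, 3.8, 3.9, 3.8)   # Vermont
)

# Fit Unemployed ~ Year separately within each region and keep the slope,
# i.e. the estimated change in percentage points per year.
slopes <- sapply(split(trend_demo, trend_demo$Region), function(d) {
  coef(lm(Unemployed ~ Year, data = d))[["Year"]]
})
slopes
```

A negative slope here means the pooled unemployment rate fell over the period. Because the same states are observed every year, these fits describe the trend rather than support formal inference.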
EDA - Variables: - The variables used in the graphics are the employment and unemployment rates of the 50 states (minus the five states removed for lack of data), the year (2012 through 2017), the average of each state’s high and low minimum wage values in each of those years, and the state itself.
The graphics show how the employment rate, the unemployment rate, and the average minimum wage value changed over the 6-year interval from 2012 to 2017, with the states divided by region (Northeast, South, Midwest, and West).
One possible conclusion is that election years were not as influential on minimum wage values and employment rates as we originally thought. Employment rates and minimum wages throughout the United States appear to have risen steadily over this 6-year period, and unemployment rates appear to have declined despite growing populations.
We originally thought that the economy, as measured by minimum wages and employment rates throughout the country, would show some relationship with the elections of Presidents Obama (2012) and Trump (2016).
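One way to check whether election years stand out is to average a rate by year and look at the year-over-year change. The sketch below does this for the employment rates of Iowa and Nebraska only, copied from the tables above. The names `emp_demo`, `yearly`, `Change`, and `ElectionYear` are assumptions for illustration, and two states are far too few to draw conclusions from.

```r
# Employment rates for two states, copied from the yearly tables above.
emp_demo <- data.frame(
  State = rep(c("Iowa", "Nebraska"), each = 6),
  Year = rep(2012:2017, times = 2),
  Employed = c(64.1, 64.4, 64.7, 64.6, 64.8, 64.8,   # Iowa
               66.3, 66.4, 67.1, 66.8, 66.6, 67.4)   # Nebraska
)

# Mean employment rate per year across the states in the demo.
yearly <- aggregate(Employed ~ Year, data = emp_demo, FUN = mean)

# Change from the previous year (undefined for the first year).
yearly$Change <- c(NA, diff(yearly$Employed))

# Flag the election years named in the introduction.
yearly$ElectionYear <- yearly$Year %in% c(2012, 2016)
yearly
```

Running the same steps on the full `joined` data frame, for `Employed`, `Unemployed`, and `Average.Value` in turn, would show whether the changes in or after 2012 and 2016 differ from those in the other years.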