Intro: The final data sets we use are “US Minimum Wage by State from 1968 to 2017”, found on kaggle.com, and “Employment Status” from the United States Census Bureau. Our data cleaning process included trimming the first data set to the years 2012-2017 and removing columns three through nine, which held variables that did not fit our project plan. Similarly, for the second data set, cleaning involved removing unwanted variable columns, adding a column for the year, and renaming the columns. To merge our data, we first had to combine the US Census Bureau data, since each year came as a separate data frame. Then we merged the minimum wage data with the employment status data to create our final data set. Finally, we removed Alabama, Louisiana, Tennessee, South Carolina, U.S. Virgin Islands, Mississippi, Puerto Rico, District of Columbia, Federal (FLSA), and Guam, since data was not available in both data sets for these areas.

This data is interesting because it lets us look at trends experienced by different states. Specifically, we want to focus on election years (2012, 2016) and the years that follow to see whether a correlation exists. The data also lets us group the states into regions to look for similarities within regions and differences between regions, and into “blue” and “red” states to see whether trends differ by political affiliation.

Our goal is to determine whether there are trends in the data that can be linked to economic or political events during those years. Although we will not use time as an explanatory variable, since repeated yearly observations do not meet the independence assumption, we will use the year to tie any relationships we find among our variables to events of that time.

### Cleaning our final data set
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.6.2
## -- Attaching packages -------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## Warning: package 'tidyr' was built under R version 3.6.2
## Warning: package 'purrr' was built under R version 3.6.2
## -- Conflicts ----------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
data <- read.csv("Minimum Wage Data.csv", header=TRUE)
view(data)

#create new column with average value from high and low value
data$Average.Value <- (data$High.Value + data$Low.Value)/2
minwage <- filter(data, Year %in% c("2012", "2013", "2014", "2015", "2016", "2017"))

data2 <- read.csv("Emp. Rates 2012.csv", header = TRUE)
d2012 <- data2[,-c(2)]
d2012$Year <- "2012"
names(d2012) <- c("State", "Employed", "Unemployed", "Year")
d2012
##             State Employed Unemployed Year
## 1         Alabama     52.3       10.0 2012
## 2          Alaska     62.3        7.8 2012
## 3         Arizona     54.0        9.8 2012
## 4        Arkansas     54.2        8.4 2012
## 5      California     56.1       11.4 2012
## 6        Colorado     62.6        7.8 2012
## 7     Connecticut     60.8        9.7 2012
## 8        Delaware     58.2        8.8 2012
## 9         Florida     52.4       11.5 2012
## 10        Georgia     55.4       11.0 2012
## 11         Hawaii     57.8        7.1 2012
## 12          Idaho     58.4        8.0 2012
## 13       Illinois     59.3       10.2 2012
## 14        Indiana     58.6        8.8 2012
## 15           Iowa     64.1        5.3 2012
## 16         Kansas     62.2        6.5 2012
## 17       Kentucky     53.8        9.3 2012
## 18      Louisiana     55.5        8.8 2012
## 19          Maine     59.0        7.6 2012
## 20       Maryland     62.5        8.3 2012
## 21  Massachusetts     61.6        8.5 2012
## 22       Michigan     54.3       11.3 2012
## 23      Minnesota     65.7        6.3 2012
## 24    Mississippi     51.4       11.3 2012
## 25       Missouri     57.9        8.6 2012
## 26        Montana     59.6        7.2 2012
## 27       Nebraska     66.3        5.3 2012
## 28         Nevada     56.8       12.3 2012
## 29  New Hampshire     64.1        6.5 2012
## 30     New Jersey     59.4       10.1 2012
## 31     New Mexico     53.5       10.3 2012
## 32       New York     57.5        9.2 2012
## 33 North Carolina     55.3       10.8 2012
## 34   North Dakota     67.6        3.3 2012
## 35           Ohio     57.5        9.1 2012
## 36       Oklahoma     57.0        6.8 2012
## 37         Oregon     55.4       11.0 2012
## 38   Pennsylvania     57.2        8.9 2012
## 39   Rhode Island     59.6        9.4 2012
## 40 South Carolina     53.4       11.2 2012
## 41   South Dakota     65.3        4.7 2012
## 42      Tennessee     55.0        9.5 2012
## 43          Texas     59.2        8.0 2012
## 44           Utah     63.9        6.8 2012
## 45        Vermont     62.9        6.4 2012
## 46       Virginia     60.2        6.9 2012
## 47     Washington     58.4        8.7 2012
## 48  West Virginia     49.7        8.0 2012
## 49      Wisconsin     62.3        7.3 2012
## 50        Wyoming     63.6        5.6 2012
data3 <- read.csv("Emp. Rates 2013.csv", header = TRUE)
d2013 <- data3[,-c(2)]
d2013$Year <- "2013"
names(d2013) <- c("State", "Employed", "Unemployed", "Year")
d2013
##             State Employed Unemployed Year
## 1         Alabama     52.3        9.6 2013
## 2          Alaska     62.4        8.7 2013
## 3         Arizona     53.8        8.9 2013
## 4        Arkansas     53.4        8.1 2013
## 5      California     56.7       10.0 2013
## 6        Colorado     62.4        7.0 2013
## 7     Connecticut     60.9        9.3 2013
## 8        Delaware     58.0        8.6 2013
## 9         Florida     52.9        9.7 2013
## 10        Georgia     55.9       10.3 2013
## 11         Hawaii     57.0        6.1 2013
## 12          Idaho     57.8        7.2 2013
## 13       Illinois     59.5        9.5 2013
## 14        Indiana     58.7        7.8 2013
## 15           Iowa     64.4        4.8 2013
## 16         Kansas     62.2        5.9 2013
## 17       Kentucky     54.2        8.4 2013
## 18      Louisiana     55.3        8.0 2013
## 19          Maine     58.5        6.9 2013
## 20       Maryland     62.9        7.4 2013
## 21  Massachusetts     62.1        7.8 2013
## 22       Michigan     55.2        9.8 2013
## 23      Minnesota     66.1        5.4 2013
## 24    Mississippi     51.3       10.6 2013
## 25       Missouri     58.5        7.5 2013
## 26        Montana     59.1        6.5 2013
## 27       Nebraska     66.4        4.6 2013
## 28         Nevada     57.2       11.0 2013
## 29  New Hampshire     63.9        6.0 2013
## 30     New Jersey     59.8        9.1 2013
## 31     New Mexico     53.5        9.2 2013
## 32       New York     57.9        8.7 2013
## 33 North Carolina     55.8        9.7 2013
## 34   North Dakota     67.6        2.6 2013
## 35           Ohio     58.0        8.1 2013
## 36       Oklahoma     57.2        6.2 2013
## 37         Oregon     56.3        9.2 2013
## 38   Pennsylvania     57.7        8.3 2013
## 39   Rhode Island     59.5        9.2 2013
## 40 South Carolina     54.6        9.5 2013
## 41   South Dakota     65.7        4.0 2013
## 42      Tennessee     55.8        8.7 2013
## 43          Texas     59.9        7.1 2013
## 44           Utah     63.3        5.5 2013
## 45        Vermont     62.2        5.8 2013
## 46       Virginia     60.4        6.6 2013
## 47     Washington     58.2        7.9 2013
## 48  West Virginia     49.7        8.5 2013
## 49      Wisconsin     62.7        6.5 2013
## 50        Wyoming     64.4        5.1 2013
data4 <- read.csv("Emp. Rates 2014.csv", header = TRUE)
d2014 <- data4[,-c(2)]
d2014$Year <- "2014"
names(d2014) <- c("State", "Employed", "Unemployed", "Year")
d2014
##             State Employed Unemployed Year
## 1         Alabama     52.5        8.6 2014
## 2          Alaska     61.9        7.6 2014
## 3         Arizona     54.2        7.9 2014
## 4        Arkansas     54.2        6.8 2014
## 5      California     57.4        8.5 2014
## 6        Colorado     63.7        5.5 2014
## 7     Connecticut     61.6        7.9 2014
## 8        Delaware     58.5        6.7 2014
## 9         Florida     53.6        8.0 2014
## 10        Georgia     56.7        8.3 2014
## 11         Hawaii     58.1        5.4 2014
## 12          Idaho     58.1        5.5 2014
## 13       Illinois     60.0        8.1 2014
## 14        Indiana     59.5        7.1 2014
## 15           Iowa     64.7        4.4 2014
## 16         Kansas     61.9        5.2 2014
## 17       Kentucky     54.4        7.6 2014
## 18      Louisiana     55.2        7.5 2014
## 19          Maine     59.3        5.9 2014
## 20       Maryland     62.7        7.2 2014
## 21  Massachusetts     62.9        6.7 2014
## 22       Michigan     55.9        8.3 2014
## 23      Minnesota     66.3        4.7 2014
## 24    Mississippi     51.7        9.8 2014
## 25       Missouri     58.1        6.8 2014
## 26        Montana     60.1        4.9 2014
## 27       Nebraska     67.1        4.2 2014
## 28         Nevada     57.7        8.9 2014
## 29  New Hampshire     64.4        5.1 2014
## 30     New Jersey     61.0        7.5 2014
## 31     New Mexico     53.4        8.7 2014
## 32       New York     58.5        7.3 2014
## 33 North Carolina     56.3        8.3 2014
## 34   North Dakota     66.5        3.0 2014
## 35           Ohio     58.7        7.2 2014
## 36       Oklahoma     57.3        5.7 2014
## 37         Oregon     56.6        7.8 2014
## 38   Pennsylvania     58.1        7.0 2014
## 39   Rhode Island     59.7        7.8 2014
## 40 South Carolina     54.9        8.1 2014
## 41   South Dakota     66.5        3.9 2014
## 42      Tennessee     55.7        7.8 2014
## 43          Texas     60.2        6.1 2014
## 44           Utah     64.0        5.0 2014
## 45        Vermont     62.5        5.5 2014
## 46       Virginia     60.6        6.1 2014
## 47     Washington     59.1        6.5 2014
## 48  West Virginia     49.5        6.9 2014
## 49      Wisconsin     63.3        5.3 2014
## 50        Wyoming     64.8        4.3 2014
data5 <- read.csv("Emp. Rates 2015.csv", header = TRUE)
d2015 <- data5[,-c(2)]
d2015$Year <- "2015"
names(d2015) <- c("State", "Employed", "Unemployed", "Year")
d2015
##             State Employed Unemployed Year
## 1          Alaska     62.1        7.9 2015
## 2         Arizona     54.5        6.9 2015
## 3        Arkansas     54.0        5.8 2015
## 4      California     58.1        7.3 2015
## 5        Colorado     63.8        5.2 2015
## 6     Connecticut     61.8        6.9 2015
## 7        Delaware     58.5        5.8 2015
## 8         Florida     54.0        7.0 2015
## 9         Georgia     57.6        7.1 2015
## 10         Hawaii     58.6        4.9 2015
## 11          Idaho     58.6        5.4 2015
## 12       Illinois     60.5        6.9 2015
## 13        Indiana     59.8        5.8 2015
## 14           Iowa     64.6        4.2 2015
## 15         Kansas     62.4        4.7 2015
## 16       Kentucky     54.6        6.5 2015
## 17          Maine     58.7        5.4 2015
## 18       Maryland     63.0        5.5 2015
## 19  Massachusetts     63.1        5.8 2015
## 20       Michigan     56.5        7.2 2015
## 21      Minnesota     67.0        4.2 2015
## 22       Missouri     59.4        5.3 2015
## 23        Montana     59.0        4.5 2015
## 24       Nebraska     66.8        3.2 2015
## 25         Nevada     58.1        7.9 2015
## 26  New Hampshire     64.7        4.2 2015
## 27     New Jersey     60.8        6.6 2015
## 28     New Mexico     53.6        7.4 2015
## 29       New York     59.0        6.5 2015
## 30 North Carolina     56.8        6.9 2015
## 31   North Dakota     67.1        2.6 2015
## 32           Ohio     59.1        6.4 2015
## 33       Oklahoma     57.4        5.5 2015
## 34         Oregon     57.6        6.8 2015
## 35   Pennsylvania     58.6        6.3 2015
## 36   Rhode Island     60.3        6.2 2015
## 37   South Dakota     65.1        4.0 2015
## 38          Texas     60.3        5.5 2015
## 39           Utah     64.9        4.0 2015
## 40        Vermont     62.9        3.8 2015
## 41       Virginia     60.7        5.5 2015
## 42     Washington     59.2        6.0 2015
## 43  West Virginia     49.0        7.3 2015
## 44      Wisconsin     64.0        4.3 2015
## 45        Wyoming     64.1        4.8 2015
data6 <- read.csv("Emp. Rates 2016.csv", header = TRUE)
d2016 <- data6[,-c(2)]
d2016$Year <- "2016"
names(d2016) <- c("State", "Employed", "Unemployed", "Year")
d2016
##             State Employed Unemployed Year
## 1         Alabama     53.3        6.4 2016
## 2          Alaska     62.1        8.0 2016
## 3         Arizona     55.3        6.5 2016
## 4        Arkansas     54.4        5.1 2016
## 5      California     58.8        6.5 2016
## 6        Colorado     64.0        4.7 2016
## 7     Connecticut     62.1        6.4 2016
## 8        Delaware     57.6        5.7 2016
## 9         Florida     54.4        6.0 2016
## 10        Georgia     58.6        6.0 2016
## 11         Hawaii     59.6        4.4 2016
## 12          Idaho     58.8        4.7 2016
## 13       Illinois     61.0        6.3 2016
## 14        Indiana     60.4        5.0 2016
## 15           Iowa     64.8        3.9 2016
## 16         Kansas     62.7        4.5 2016
## 17       Kentucky     55.2        6.0 2016
## 18      Louisiana     54.8        7.0 2016
## 19          Maine     59.4        4.4 2016
## 20       Maryland     63.8        5.4 2016
## 21  Massachusetts     63.7        5.3 2016
## 22       Michigan     57.3        6.2 2016
## 23      Minnesota     66.8        3.8 2016
## 24    Mississippi     52.3        7.7 2016
## 25       Missouri     59.5        4.9 2016
## 26        Montana     60.1        4.8 2016
## 27       Nebraska     66.6        3.7 2016
## 28         Nevada     58.8        6.7 2016
## 29  New Hampshire     65.0        3.6 2016
## 30     New Jersey     61.3        6.0 2016
## 31     New Mexico     53.3        7.5 2016
## 32       New York     59.2        5.9 2016
## 33 North Carolina     57.2        6.2 2016
## 34   North Dakota     67.9        2.8 2016
## 35           Ohio     59.5        5.7 2016
## 36       Oklahoma     56.9        6.0 2016
## 37         Oregon     58.3        5.7 2016
## 38   Pennsylvania     58.5        5.8 2016
## 39   Rhode Island     59.7        5.9 2016
## 40 South Carolina     55.9        6.3 2016
## 41   South Dakota     65.0        3.9 2016
## 42      Tennessee     56.8        5.5 2016
## 43          Texas     60.5        5.6 2016
## 44           Utah     65.2        4.1 2016
## 45        Vermont     62.5        3.9 2016
## 46       Virginia     60.9        5.0 2016
## 47     Washington     59.8        5.4 2016
## 48  West Virginia     49.4        7.6 2016
## 49      Wisconsin     63.8        4.1 2016
## 50        Wyoming     62.6        5.6 2016
data7<-read.csv("Emp. Rates 2017.csv", header = TRUE)
d2017<-data7[,-c(2)]
d2017$Year<-"2017"
names(d2017)<-c("State", "Employed", "Unemployed", "Year")
d2017
##                   State Employed Unemployed Year
## 1               Alabama     52.9        5.8 2017
## 2                Alaska     60.2        7.6 2017
## 3               Arizona     56.1        5.8 2017
## 4              Arkansas     54.6        5.6 2017
## 5            California     59.5        5.9 2017
## 6              Colorado     64.6        4.2 2017
## 7           Connecticut     61.4        6.1 2017
## 8              Delaware     55.8        5.3 2017
## 9  District of Columbia     65.3        6.6 2017
## 10              Florida     54.9        5.5 2017
## 11              Georgia     59.1        5.8 2017
## 12               Hawaii     59.0        4.2 2017
## 13                Idaho     59.2        4.1 2017
## 14             Illinois     60.8        6.1 2017
## 15              Indiana     60.4        4.7 2017
## 16                 Iowa     64.8        3.6 2017
## 17               Kansas     63.0        4.2 2017
## 18             Kentucky     55.6        5.5 2017
## 19            Louisiana     54.7        6.5 2017
## 20                Maine     60.3        4.2 2017
## 21             Maryland     63.8        5.2 2017
## 22        Massachusetts     63.6        4.6 2017
## 23             Michigan     57.8        5.9 2017
## 24            Minnesota     66.9        3.6 2017
## 25          Mississippi     52.2        7.0 2017
## 26             Missouri     59.8        4.6 2017
## 27              Montana     61.3        3.5 2017
## 28             Nebraska     67.4        3.3 2017
## 29               Nevada     59.6        5.9 2017
## 30        New Hampshire     65.1        3.8 2017
## 31           New Jersey     61.8        5.3 2017
## 32           New Mexico     52.6        6.6 2017
## 33             New York     59.6        5.5 2017
## 34       North Carolina     58.0        5.3 2017
## 35         North Dakota     67.9        2.9 2017
## 36                 Ohio     59.6        5.2 2017
## 37             Oklahoma     57.2        5.4 2017
## 38               Oregon     59.1        5.1 2017
## 39         Pennsylvania     59.0        5.3 2017
## 40         Rhode Island     61.3        5.7 2017
## 41       South Carolina     56.1        5.8 2017
## 42         South Dakota     65.1        3.5 2017
## 43            Tennessee     58.1        4.9 2017
## 44                Texas     60.7        5.1 2017
## 45                 Utah     66.0        3.6 2017
## 46              Vermont     62.8        3.8 2017
## 47             Virginia     61.1        4.6 2017
## 48           Washington     60.6        4.9 2017
## 49        West Virginia     48.8        6.7 2017
## 50            Wisconsin     63.9        3.5 2017
## 51              Wyoming     61.9        5.2 2017
## 52          Puerto Rico     36.2       16.4 2017
join<-rbind(d2012, d2013, d2014, d2015, d2016, d2017)
view(join)
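
The six near-identical blocks above could be collapsed into one helper. The following is only a minimal sketch, assuming every yearly file shares the layout seen above (four columns, with the second one dropped). The names `clean_year` and `stack_years` are hypothetical and are not part of the original script.

```r
# Hypothetical helper: apply the cleaning steps used above to one year's raw data.
clean_year <- function(raw, year) {
  out <- raw[, -2]                 # drop the second column, as in the original blocks
  out$Year <- as.integer(year)     # integer year, so no later conversion is needed
  names(out) <- c("State", "Employed", "Unemployed", "Year")
  out
}

# Hypothetical helper: read each yearly file and stack the cleaned frames.
# `reader` defaults to read.csv; the file name pattern is the one used above.
stack_years <- function(years, reader = read.csv) {
  frames <- lapply(years, function(y) {
    raw <- reader(paste0("Emp. Rates ", y, ".csv"), header = TRUE)
    clean_year(raw, y)
  })
  do.call(rbind, frames)
}

# Usage (assumes the six CSV files are in the working directory):
# join <- stack_years(2012:2017)
```

Storing `Year` as an integer from the start would make the later `as.integer()` conversion unnecessary.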

Next we drop the areas with incomplete data, merge the two data sets, and build our visuals.

excludeStates<-c("Alabama", "Louisiana", "Tennessee", "South Carolina", "U.S. Virgin Islands", "Mississippi", "Puerto Rico", "District of Columbia", "Federal (FLSA)", "Guam")
employmentdata<-join%>%
  filter(!State %in% excludeStates)

minwage<-minwage%>%
  filter(!State %in% excludeStates)
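
The exclusion list above is typed by hand. One way to cross-check it is to list the areas that do not appear in every year of the stacked employment data. This is a sketch only, and `incomplete_states` is a hypothetical helper name. It covers the employment side alone; entries such as Federal (FLSA) and Guam come from the minimum wage file.

```r
# Hypothetical helper: names of areas missing from at least one of the given years.
incomplete_states <- function(df, years = 2012:2017) {
  df <- df[df$Year %in% years, ]
  # Number of distinct years in which each area appears
  n_years <- tapply(df$Year, as.character(df$State),
                    function(y) length(unique(y)))
  sort(names(n_years)[n_years < length(years)])
}

# Usage:
# incomplete_states(join)
```

Going by the tables printed above, this should flag the five states absent from the 2015 file along with District of Columbia and Puerto Rico, which appear only in 2017.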

employmentdata$Year<-as.integer(employmentdata$Year)

joined<-inner_join(minwage, employmentdata)
## Joining, by = c("Year", "State")
## Warning: Column `State` joining factors with different levels, coercing to
## character vector
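
The join message and the factor warning above both come from letting the join guess its keys and key types. A hedged alternative, shown here with base R `merge()` rather than the `inner_join()` call the script actually uses, is to normalize the key columns first and name them explicitly. `normalize_keys` and `join_on_state_year` are hypothetical names.

```r
# Hypothetical helper: make the join keys the same type in both tables.
normalize_keys <- function(df) {
  df$State <- as.character(df$State)
  # as.character() first, so a factor Year is not turned into its level codes
  df$Year <- as.integer(as.character(df$Year))
  df
}

# Hypothetical helper: inner join on State and Year with explicit keys.
join_on_state_year <- function(wage, emp) {
  merge(normalize_keys(wage), normalize_keys(emp), by = c("State", "Year"))
}

# Usage:
# joined <- join_on_state_year(minwage, employmentdata)
```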

### We decided to look at the average minimum wage value from 2012 to 2017 by region

west<-filter(joined, State %in% c("Alaska", "California", "Hawaii", "Washington", "Oregon", "New Mexico", "Arizona", "Colorado", "Montana", "Idaho", "Nevada", "Utah", "Wyoming"))

west_mod_data<-lm(Average.Value~Year, data = west)

west_chart<-ggplot(west, aes(x = Year, y = Average.Value, color = as.factor(State)))+
  geom_point()+
  geom_abline(intercept = west_mod_data$coefficients[1], slope = west_mod_data$coefficients[2])

west_chart

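# Wisconsin is included here so that every remaining state falls in exactly one region
midwest<-filter(joined, State %in% c("North Dakota", "South Dakota", "Minnesota", "Nebraska", "Michigan", "Iowa", "Kansas", "Illinois", "Indiana", "Ohio", "Missouri", "Wisconsin"))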

midwest_mod_data<-lm(Average.Value~Year, data = midwest)

midwest_chart<-ggplot(midwest, aes(x = Year, y = Average.Value, color = as.factor(State)))+
  geom_point()+
  geom_abline(intercept = midwest_mod_data$coefficients[1], slope = midwest_mod_data$coefficients[2])

midwest_chart

south<-filter(joined, State %in% c("Oklahoma", "Arkansas", "Texas", "Louisiana", "Mississippi", "Kentucky", "Tennessee", "Georgia", "Alabama", "Florida", "South Carolina", "North Carolina", "West Virginia", "Virginia", "Maryland", "Delaware"))

south_mod_data<-lm(Average.Value~Year, data = south)

south_chart<-ggplot(south, aes(x = Year, y = Average.Value, color = as.factor(State)))+
  geom_point()+
  geom_abline(intercept = south_mod_data$coefficients[1], slope = south_mod_data$coefficients[2])

south_chart

northeast<-filter(joined, State %in% c("New York", "Pennsylvania", "Maine", "Vermont", "New Hampshire", "Rhode Island", "Massachusetts", "Connecticut", "New Jersey"))

northeast_mod_data<-lm(Average.Value~Year, data = northeast)

northeast_chart<-ggplot(northeast, aes(x = Year, y = Average.Value, color = as.factor(State)))+
  geom_point()+
  geom_abline(intercept = northeast_mod_data$coefficients[1], slope = northeast_mod_data$coefficients[2])

northeast_chart
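
The four hard-coded state lists can also be written as a single lookup, which makes it easy to confirm that every state lands in exactly one region. This is a sketch; `region_lookup` and `add_region` are hypothetical names. The lists follow the filters above, and Wisconsin is listed under the Midwest here, in line with the Census Bureau grouping.

```r
# Hypothetical lookup: one named list in place of four separate filter() calls.
region_lookup <- list(
  West = c("Alaska", "California", "Hawaii", "Washington", "Oregon",
           "New Mexico", "Arizona", "Colorado", "Montana", "Idaho",
           "Nevada", "Utah", "Wyoming"),
  Midwest = c("North Dakota", "South Dakota", "Minnesota", "Nebraska",
              "Michigan", "Iowa", "Kansas", "Illinois", "Indiana", "Ohio",
              "Missouri", "Wisconsin"),
  South = c("Oklahoma", "Arkansas", "Texas", "Louisiana", "Mississippi",
            "Kentucky", "Tennessee", "Georgia", "Alabama", "Florida",
            "South Carolina", "North Carolina", "West Virginia", "Virginia",
            "Maryland", "Delaware"),
  Northeast = c("New York", "Pennsylvania", "Maine", "Vermont",
                "New Hampshire", "Rhode Island", "Massachusetts",
                "Connecticut", "New Jersey")
)

# Hypothetical helper: add a Region column; states not in the lookup get NA.
add_region <- function(df, lookup = region_lookup) {
  state_to_region <- rep(names(lookup), lengths(lookup))
  names(state_to_region) <- unlist(lookup, use.names = FALSE)
  df$Region <- unname(state_to_region[as.character(df$State)])
  df
}

# Usage:
# joined <- add_region(joined)
# unique(joined$State[is.na(joined$Region)])   # states left without a region
```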

### Then we switched to employment rates by region.
west_mod_data2<-lm(Employed~Year, data = west)

west_emp_chart<-ggplot(west, aes(x = Year, y = Employed, color = as.factor(State)))+
  geom_point()+
  geom_abline(intercept = west_mod_data2$coefficients[1], slope = west_mod_data2$coefficients[2])

west_emp_chart

midwest_mod_data2<-lm(Employed~Year, data = midwest)

midwest_emp_chart<-ggplot(midwest, aes(x = Year, y = Employed, color = as.factor(State)))+
  geom_point()+
  geom_abline(intercept = midwest_mod_data2$coefficients[1], slope = midwest_mod_data2$coefficients[2])

midwest_emp_chart

south_mod_data2<-lm(Employed~Year, data = south)

south_emp_chart<-ggplot(south, aes(x = Year, y = Employed, color = as.factor(State)))+
  geom_point()+
  geom_abline(intercept = south_mod_data2$coefficients[1], slope = south_mod_data2$coefficients[2])

south_emp_chart

northeast_mod_data2<-lm(Employed~Year, data = northeast)

northeast_emp_chart<-ggplot(northeast, aes(x = Year, y = Employed, color = as.factor(State)))+
  geom_point()+
  geom_abline(intercept = northeast_mod_data2$coefficients[1], slope = northeast_mod_data2$coefficients[2])

northeast_emp_chart

### Now unemployment rates by region.
wmd3<-lm(Unemployed~Year, data = west)

wuc<-ggplot(west, aes(x = Year, y = Unemployed, color = as.factor(State)))+
  geom_point()+
  geom_abline(intercept = wmd3$coefficients[1], slope = wmd3$coefficients[2])

wuc

mmd3<-lm(Unemployed~Year, data = midwest)

muc<-ggplot(midwest, aes(x = Year, y = Unemployed, color = as.factor(State)))+
  geom_point()+
  geom_abline(intercept = mmd3$coefficients[1], slope = mmd3$coefficients[2])

muc

smd3<-lm(Unemployed~Year, data = south)

suc<-ggplot(south, aes(x = Year, y = Unemployed, color = as.factor(State)))+
  geom_point()+
  geom_abline(intercept = smd3$coefficients[1], slope = smd3$coefficients[2])

suc

nmd3<-lm(Unemployed~Year, data = northeast)

nuc<-ggplot(northeast, aes(x = Year, y = Unemployed, color = as.factor(State)))+
  geom_point()+
  geom_abline(intercept = nmd3$coefficients[1], slope = nmd3$coefficients[2])

nuc
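
To compare the regional trend lines numerically rather than by eye, the fitted slopes can be collected in one vector. This is a minimal sketch, and `trend_slopes` is a hypothetical helper. It fits the same straight line of a response on `Year` that the `geom_abline()` calls above draw. The slopes are descriptive only, since yearly observations of the same state are not independent.

```r
# Hypothetical helper: slope of lm(response ~ Year) for each region's data frame.
trend_slopes <- function(regions, response) {
  vapply(regions, function(piece) {
    fit <- lm(reformulate("Year", response = response), data = piece)
    unname(coef(fit)["Year"])   # change in the response per year
  }, numeric(1))
}

# Usage with the region data frames built above:
# regions <- list(West = west, Midwest = midwest, South = south, Northeast = northeast)
# trend_slopes(regions, "Unemployed")
# trend_slopes(regions, "Average.Value")
```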

EDA - Variables: The variables used in the graphics are the employment and unemployment rates for 45 of the 50 states (five were dropped for lack of data). The other variables used to explore relationships were the year (2012 through 2017), each state’s average minimum wage value (the mean of its high and low values) in each of those years, and the states themselves.

The relationships we examined were how employment rates, unemployment rates, and each state’s average minimum wage value changed over the six-year interval from 2012 to 2017, with the states grouped by region (northeast, south, midwest, and west).

One possible takeaway is that election years were not as influential on minimum wage values and employment rates as we originally thought. Employment rates and minimum wages across the United States generally rose over this six-year period, and unemployment rates appeared to decline despite growing populations.
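
One way to look for an election-year effect is to compare the average year-over-year change across states for each year. This is a sketch under stated assumptions: `mean_yearly_change` is a hypothetical helper, and it assumes every state has one row per consecutive year, which holds once the incomplete areas are removed.

```r
# Hypothetical helper: average year-over-year change in `column` across states.
mean_yearly_change <- function(df, column) {
  df <- df[order(df$State, df$Year), ]
  # Change from the previous year within each state; NA for a state's first year
  change <- ave(df[[column]], df$State, FUN = function(x) c(NA, diff(x)))
  keep <- !is.na(change)
  tapply(change[keep], df$Year[keep], mean)
}

# Usage:
# mean_yearly_change(joined, "Unemployed")
# mean_yearly_change(joined, "Average.Value")
```

If the changes in 2013 and 2017, the years after each election, look much like those in the other years, that would be consistent with elections having little visible effect.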

We originally expected that the economy, as reflected in minimum wages and employment rates throughout the country, would show some relationship with the elections of Presidents Obama (2012) and Trump (2016).