The goal for this section is to Compare wages in retail to their real wage value and purchasing power. In doing so, we can reveal trends in wages and identify if wages are part of a larger issue or if wages have kept up with inflation enough to deduce that working in retail has paid the same over time. Data was provided via FRED: Wage data from https://fred.stlouisfed.org/series/CES4200000003 and CPI data from (https://fred.stlouisfed.org/series/CPIAUCSL
First, is data setup and cleanup. After loading the libraries, we look for null values and omit as necessary then print a summary of the updated data for observation
#load libraries
library(fredr)
library(tidyverse)
library(ggplot2)
library(dplyr)
#Get Data
setwd("C:/Users/hrall/Documents/Data") #set directory
RealWageData<-read.csv("RealWage.csv", header=TRUE)
#check for null values
sum(is.na(RealWageData))
## [1] 710
#Data clean up
#create data to omit nulls
RealWageDataOmit<-na.omit(RealWageData)
#Change the header names
colnames(RealWageDataOmit)[2]<- "CPI"
colnames(RealWageDataOmit)[3]<- "NominalWage"
#View data
summary(RealWageDataOmit)
## DATE CPI NominalWage
## Length:216 Min. :199.7 Min. :15.08
## Class :character 1st Qu.:219.0 1st Qu.:15.59
## Mode :character Median :237.4 Median :17.32
## Mean :242.0 Mean :18.06
## 3rd Qu.:255.9 3rd Qu.:19.79
## Max. :311.1 Max. :24.33
Next, the real wage and purchasing power for each wage data point must be calculated. A new column is added for each.
Real Wage in a year = (Nominal Wage in a Year/CPI in a Year)x100
Purchasing Power in a chosen year = (Nominal wage in the base year/ CPI in the base year)(CPI in the selected year)
For purchasing power, the first year with complete data was 3/1/2006, so that was choosen as the base year and month.
#Calculate purchasing power
#since the first row of data is 3/1/2006, this will be our base
#Determine the values for the base
NomWageBase<- RealWageDataOmit[1,3]
CPIBase<- RealWageDataOmit[1,2]
#Add in columns for purchasing power and real Wage
WageData<-RealWageDataOmit %>%
mutate(
PP = (NomWageBase/CPIBase)*CPI,
RealWage = (NominalWage/CPI)*100)
summary(WageData)
## DATE CPI NominalWage PP
## Length:216 Min. :199.7 Min. :15.08 Min. :15.24
## Class :character 1st Qu.:219.0 1st Qu.:15.59 1st Qu.:16.71
## Mode :character Median :237.4 Median :17.32 Median :18.12
## Mean :242.0 Mean :18.06 Mean :18.47
## 3rd Qu.:255.9 3rd Qu.:19.79 3rd Qu.:19.53
## Max. :311.1 Max. :24.33 Max. :23.74
## RealWage
## Min. :6.949
## 1st Qu.:7.139
## Median :7.380
## Mean :7.436
## 3rd Qu.:7.724
## Max. :8.226
Lastly,a plot of the data the data reveals trends which show purchasing power and the real wage appears to show the same pattern over time, indicating wages may not be the issue.
*Note, only real wage is plotted for now, the purchasing power graph still needs to be built
#Plot real wage and purchasing power
ggplot(WageData, aes(x=DATE, y=RealWage, group = 1))+
geom_line()+
labs(title = "Real Wage", x = "Date", y = "Real Wage")
Comparing health insurance coverage by type of coverage
First, the data is reviewed and the insurance type “Total” is eliminated as it is the total of all insurance types and always equals 1
*Note: my code wasn’t reading when I triend to publish to RPubs
setwd("C:/Users/hrall/Documents/Data")
HealthInsurance<-read.csv("HealthIns.csv", header=TRUE)
#view data
summary(HealthInsurance)
## Year Typle Percentage
## Min. :2008 Length:98 Min. :0.01302
## 1st Qu.:2011 Class :character 1st Qu.:0.06359
## Median :2014 Mode :character Median :0.14257
## Mean :2015 Mean :0.28571
## 3rd Qu.:2018 3rd Qu.:0.48679
## Max. :2022 Max. :1.00000
#check for null values
sum(is.na(HealthInsurance))
## [1] 0
#Eliminate 'Total' as it will always be 1
HealthInsCoverageNew<- filter(HealthInsurance, Typle != "Total")
#view data
summary(HealthInsCoverageNew)
## Year Typle Percentage
## Min. :2008 Length:84 Min. :0.01302
## 1st Qu.:2011 Class :character 1st Qu.:0.06069
## Median :2014 Mode :character Median :0.12514
## Mean :2015 Mean :0.16667
## 3rd Qu.:2018 3rd Qu.:0.19287
## Max. :2022 Max. :0.53460
Plot comparison graph of health insurance coverage by type
*Note:MOre analysis is needed to complete
#plot
ggplot(HealthInsCoverageNew, aes(x=Year, y=Percentage, group = 1))+
geom_line()+facet_wrap(~ Typle)
Data was provided (insert link)
library(ggplot2)
library(reshape2)
data <- read.csv("JOLTS.csv", header = TRUE)
colnames(data)[1] <- "Sector"
data$Sector <- as.factor(data$Sector)
data$Type <- ifelse(grepl("Opening", data$Sector), "Opening Rate", "Quit Rate")
data$Sector <- gsub("_Rate", "", data$Sector)
data$Sector <- gsub("Opening_Rate", "", data$Sector)
data$Sector <- gsub("Quit_Rate", "", data$Sector)
data_melted <- melt(data, id.vars = c("Sector", "Type"), variable.name = "Month", value.name = "Rate")
ggplot(data_melted, aes(x = Month, y = Rate, color = Type)) + geom_line() + facet_wrap(~ Sector, scales = "free_y") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + labs(title = "Trend of Opening and Quit Rates by Sector", y = "Rate (%)", x = "Month")
ggplot(data_melted, aes(x = Sector, y = Rate, fill = Type)) + geom_bar(stat = "identity", position = position_dodge()) + facet_wrap(~ Month, scales = "free_y") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + labs(title = "Opening and Quit Rates by Sector and Month", y = "Rate (%)", x = "Sector")
opening_rates <- subset(data_melted, Type == "Opening Rate")
ggplot(opening_rates, aes(x = Month, y = Sector, fill = Rate)) +
geom_tile() + scale_fill_gradient(low = "blue", high = "red") +
theme_minimal() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title = "Heat Map of Opening Rates", y = "Sector", x = "Month")