DATA 698 : Capstone Research Project
Capstone Project on CPI and Employment
DATA 698 : Capstone Research Project
1 Overview
We are in totally different era of twenty first century, and it gives us very rare situation where any positive news would help the humankind. We want to use the historical CPI data and find the relation of it with the employment, in hope that we would have some positive news on employment by following the trend of the data in past.
We feel that lower CPI would result in more job opportunity, as the it gives space for more competition in small business across sectors.
2 Capstone Project on CPI and Employment
- The estimates of employment for 1998-2006 are based on the 2002 North American Industry Classification System (NAICS). The estimates for 2007-2010 are based on the 2007 NAICS. The estimates for 2011-2016 are based on the 2012 NAICS. The estimates for 2017 forward are based on the 2017 NAICS.
- Excludes limited partners.
- Under the 2007 NAICS, internet publishing and broadcasting was reclassified to other information services.
- (NA) Not available.
- (NM) Not meaningful.
- Not shown to avoid disclosure of confidential information; estimates are included in higher-level totals.
- Estimate for employment suppressed to cover corresponding estimate for earnings. Estimates for this item are included in the total.
Last updated: September 24, 2019- new statistics for 2018; revised statistics for 2014-2017.
3 Data Preparation
Load the required libraries
#packages <- c("R.rsp","pandocfilters","knitr","plyr","tidyr","dplyr","ggplot2","plotly","sqldf","MASS","reshape2","Amelia","mice","googleVis","stringi","ROCR","scatterplot3d","visdat","gridExtra","sjmisc")
#install.packages (packages, repos="http://cran.us.r-project.org", dependencies=TRUE, lib="C:/Program Files/R/R-3.6.3/library")
#invisible (lapply (packages, library, character.only = T))
#install.packages("devtools", lib="C:/Program Files/R/R-3.6.3/library")
#devtools::install_github("hadley/ggplot2")
library(tidyverse)
library(kableExtra)
library(lubridate)
library(forecast)
library(stringi)
library(dplyr)3.1 Load Employment Datasets
file_path <- "../data/cpi/SAINC/"
EMP_DATA_FILE <- paste0(file_path,"EMP9818.csv")
EMP_DATA <- read_delim(EMP_DATA_FILE, delim = ",", col_names = TRUE, trim_ws= TRUE)
#,skip = 6,col_types = list(col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character()))
# names(SAP_INV_DAT) = c("DROP","I_BILLDATE","I_ORG","I_DC","I_DOCCA","I_BILLTYPE","I_ORDER_REA","I_DIV","I_MATYPE","I_BILL_QTY","BLANK")
# SAP_INV_DAT<- SAP_INV_DAT[-which(is.na(SAP_INV_DAT$I_ORG)),] %>% .[-str_which(trimws(.$I_ORG),'SOrg.|-----'),c(-1,-11)]
# 3
# SAP_INV_DAT$I_BILLDATE <- date(parse_datetime(SAP_INV_DAT$I_BILLDATE, "%m/%d/%Y"))
#
# head(SAP_INV_DAT)
summary(EMP_DATA)## GeoFIPS GeoName Region TableName
## Length:7084 Length:7084 Min. :1.000 Length:7084
## Class :character Class :character 1st Qu.:3.000 Class :character
## Mode :character Mode :character Median :5.000 Mode :character
## Mean :4.475
## 3rd Qu.:6.000
## Max. :8.000
## NA's :122
## LineCode IndustryClassification Description Unit
## Min. : 10.0 Length:7084 Length:7084 Length:7084
## 1st Qu.: 517.0 Class :character Class :character Class :character
## Median : 712.5 Mode :character Mode :character Mode :character
## Mean : 872.1
## 3rd Qu.:1103.0
## Max. :2012.0
## NA's :4
## 1998 1999 2000 2001
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2002 2003 2004 2005
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2006 2007 2008 2009
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2010 2011 2012 2013
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2014 2015 2016 2017
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2018
## Length:7084
## Class :character
## Mode :character
##
##
##
##
# Reading the Category and detail Desc for ref.
EMP_DEF_FILE <- paste0(file_path,"SAEMP25N__definition.xml")
IND_DESC <- XML::xmlToDataFrame(EMP_DEF_FILE)
names(IND_DESC) <- c("LineCode","Description")
glimpse(IND_DESC)## Rows: 118
## Columns: 3
## $ LineCode <fct> 10, 20, 40, 50, 60, 70, 80, 90, 100, 101, 102, 103, 200...
## $ Description <fct> "Total employment (number of jobs)", "Wage and salary e...
## $ NA <fct> "A count of jobs, both full-time and part-time. It incl...
3.2 Show Relevant Employment Data
# MN-MAIN | NN - RANGE | NL - LAST |
EMAP_CAT_LAST <- EMP_DATA[,c(2,5,6,7)] %>%
filter(GeoName=="United States") %>%
mutate(ISM = ifelse(IndustryClassification=="...","M","N")) %>%
separate(IndustryClassification, c("From", "To")) %>%
mutate(ISML = ifelse(is.na(To)==T,"L","N")) %>%
filter(ISML %in% c("L","N")) %>%
select(.,LineCode,Description)
#DT::datatable(EMAP_CAT_LAST)
EMP_DAT_CAT <- EMP_DATA[,c(2,5,9:29)] %>%
left_join(EMAP_CAT_LAST,.,by="LineCode")
glimpse(EMP_DAT_CAT)## Rows: 7,080
## Columns: 24
## $ LineCode <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,...
## $ Description <chr> "Total employment (number of jobs)", "Total employment ...
## $ GeoName <chr> "United States", "Alabama", "Alaska", "Arizona", "Arkan...
## $ `1998` <chr> "158481200", "2361892", "382166", "2616288", "1445536",...
## $ `1999` <chr> "161531300", "2378217", "381307", "2695892", "1460374",...
## $ `2000` <chr> "165370800", "2392225", "389734", "2801510", "1482449",...
## $ `2001` <chr> "165522200", "2376053", "394565", "2829002", "1482587",...
## $ `2002` <chr> "165095100", "2364828", "402187", "2847095", "1478929",...
## $ `2003` <chr> "165921500", "2371430", "405621", "2917121", "1482035",...
## $ `2004` <chr> "168839700", "2425649", "413864", "3041476", "1505095",...
## $ `2005` <chr> "172338400", "2486833", "421419", "3219820", "1537680",...
## $ `2006` <chr> "175868600", "2545556", "431320", "3375218", "1567682",...
## $ `2007` <chr> "179543700", "2604078", "439825", "3465075", "1582858",...
## $ `2008` <chr> "179213900", "2582591", "443538", "3402808", "1579283",...
## $ `2009` <chr> "173636700", "2479507", "442447", "3228493", "1542944",...
## $ `2010` <chr> "172901700", "2460298", "443904", "3181571", "1541272",...
## $ `2011` <chr> "176091700", "2497933", "450364", "3239045", "1561948",...
## $ `2012` <chr> "178979700", "2503678", "459222", "3295537", "1565142",...
## $ `2013` <chr> "182325100", "2523338", "461110", "3371219", "1569249",...
## $ `2014` <chr> "186233800", "2551872", "461327", "3448173", "1587414",...
## $ `2015` <chr> "190315800", "2586885", "461767", "3548174", "1610779",...
## $ `2016` <chr> "193371900", "2619154", "457371", "3646604", "1629237",...
## $ `2017` <chr> "196825300", "2653968", "456799", "3751283", "1644432",...
## $ `2018` <chr> "200746000", "2691517", "459178", "3859137", "1663188",...
#DT::datatable(EMP_DAT_CAT)
temp_name <- names(EMP_DAT_CAT [,c(4:24)])
EMP_DAT_CAT_SAN <- data.frame(lapply(EMP_DAT_CAT [,c(4:24)],function(x) {
gsub("(D)|(T)",NA,x)
}))
names(EMP_DAT_CAT_SAN) = temp_name
#DT::datatable(EMP_DAT_CAT_SAN)
EMP_DAT_CAT_SAN <- cbind(EMP_DAT_CAT[,c(1:3)],EMP_DAT_CAT_SAN)
#DT::datatable(EMP_DAT_CAT_SAN)3.2.1 Employment in New York from Select Industry
EMP_DAT_CAT_SAN_NY <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="New York") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_NY %>%
#sample_frac(0.33) %>%
DT::datatable()3.3 Load CPI Data
file_path <- "../data/cpi/archive/"
## Read files named *.csv
filenames <- list.files(path=file_path,
pattern="*.csv",
full.names = TRUE)
#print(filenames)
##Create list of data frame names without the ".csv" part
names <- gsub(pattern='\\.',
replacement='_', # remove the last 4 characters from filename (_clean)
x=(sub(pattern = "(.*)\\..*$", replacement = "\\1", # remove extenson from filename (.csv)
basename(filenames) # get filename without the directory file path
)
)
)
#print(names)
###Load all files
for(i in filenames){
## Create list of data frame names without the ".csv" part
name <- gsub(pattern='\\.',
replacement='_', # remove the last 4 characters from filename (_clean)
x=(sub(pattern = "(.*)\\..*$", replacement = "\\1", # remove extenson from filename (.csv)
basename(i) # get filename without the directory file path
)
)
)
print(name)
assign(x=name, value=data.frame(read.csv(i)))
#DT::datatable(name)
}## [1] "cu_area"
## [1] "cu_base"
## [1] "cu_data_0_Current"
## [1] "cu_data_1_AllItems"
## [1] "cu_data_10_OtherWest"
## [1] "cu_data_11_USFoodBeverage"
## [1] "cu_data_12_USHousing"
## [1] "cu_data_13_USApparel"
## [1] "cu_data_14_USTransportation"
## [1] "cu_data_15_USMedical"
## [1] "cu_data_16_USRecreation"
## [1] "cu_data_17_USEducationAndCommunication"
## [1] "cu_data_18_USOtherGoodsAndServices"
## [1] "cu_data_19_PopulationSize"
## [1] "cu_data_2_Summaries"
## [1] "cu_data_20_USCommoditiesServicesSpecial"
## [1] "cu_data_3_AsizeNorthEast"
## [1] "cu_data_4_AsizeNorthCentral"
## [1] "cu_data_5_AsizeSouth"
## [1] "cu_data_6_AsizeWest"
## [1] "cu_data_7_OtherNorthEast"
## [1] "cu_data_8_OtherNorthCentral"
## [1] "cu_data_9_OtherSouth"
## [1] "cu_footnote"
## [1] "cu_item"
## [1] "cu_period"
## [1] "cu_periodicity"
## [1] "cu_series"
3.4 Show CPI Reference Data
3.4.1 Area
3.4.2 Base
3.4.3 Item
3.4.4 Periodicity
3.4.5 Series
3.4.6 Period
3.5 Combined Reference Data
3.6 Show CPI Data from Select Industry
3.6.1 CPI - US Food Beverages
FoodItemCode <- cu_reference_data %>%
filter(grepl('SAF|SEF|SS0|SS1|SS20', item_code))
#dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((FoodItemCode))
cu_data_USFoodBeverage = cu_data_0_Current %>%
inner_join(FoodItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USFoodBeverage))3.6.2 CPI - US Housing
HousingItemCode <- cu_reference_data %>%
#filter(grepl('SAH|SEH', item_code, fixed = TRUE))
filter(grepl('SAH|SEH', item_code))
#dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((HousingItemCode))
cu_data_USHousing = cu_data_0_Current %>%
inner_join(HousingItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
#filter(item_code %like% c('SAH','SEH')) %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USHousing))3.6.3 CPI - US Transportation
TransportItemCode <- cu_reference_data %>%
filter(grepl('SAT|SET|SS4|SS52|SS53|SAS24|SAS4', item_code))
#dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((TransportItemCode))
cu_data_USTransportation = cu_data_0_Current %>%
inner_join(TransportItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USTransportation))3.6.4 CPI - US Medical
MedicalItemCode <- cu_reference_data %>%
filter(grepl('SAM|SEM|SS57', item_code))
#dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((MedicalItemCode))
cu_data_USMedical = cu_data_0_Current %>%
inner_join(MedicalItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USMedical))3.6.5 CPI - US Education And Communication
EducationItemCode <- cu_reference_data %>%
filter(grepl('SAE|SEE|SSE|SS27', item_code))
#dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((EducationItemCode))
cu_data_USEducationAndCommunication = cu_data_0_Current %>%
inner_join(EducationItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USEducationAndCommunication))4 Employment vs CPI in NY - by Industry
4.1 Food & Beverages
cu_emp_data_USFoodBeverage_NY = cu_data_USFoodBeverage %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USFoodBeverage_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Food and beverage stores
#EMP_DAT_CAT_SAN_USFoodBeverage_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode==705)%>%
# mutate(parent_line_code = 705)
cpi_emp_data_USFoodBeverage_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USFoodBeverage_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_NY %>%
# sample_frac(0.33) %>%
DT::datatable()4.2 Housing
cu_emp_data_USHousing_NY = cu_data_USHousing %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USHousing_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Real estate and rental and leasing
#EMP_DAT_CAT_SAN_USHousing_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(1100,1101,1102,1103)) %>%
# mutate(parent_line_code = 1100)
cpi_emp_data_USHousing_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USHousing_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_NY %>%
# sample_frac(0.33) %>%
DT::datatable()4.3 Transportation
cu_emp_data_USTransportation_NY = cu_data_USTransportation %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USTransportation_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Transportation and warehousing
#EMP_DAT_CAT_SAN_USTransportation_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(800,801,802,803,804,805,806,807,808,809,811)) %>%
# mutate(parent_line_code = 800)
cpi_emp_data_USTransportation_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USTransportation_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_NY %>%
# sample_frac(0.33) %>%
DT::datatable()4.4 Medical
cu_emp_data_USMedical_NY = cu_data_USMedical %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_emp_data_USMedical_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Health care and social assistance
#EMP_DAT_CAT_SAN_USTransportation_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(1600,1601,1602,1603,1604)) %>%
# mutate(parent_line_code = 1600)
cpi_emp_data_USMedical_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USMedical_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_NY %>%
# sample_frac(0.33) %>%
DT::datatable()4.5 Education & Communication
cu_data_USEducationAndCommunication_NY = cu_data_USEducationAndCommunication %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAE', emp_line_code = 1500) %>%
select(c("parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = sum(value))
#cu_data_USEducationAndCommunication_NY %>%
# sample_frac(0.33) %>%
# DT::datatable()
#Educational services
#EMP_DAT_CAT_SAN_USEducationAndCommunication_NY = EMP_DAT_CAT_SAN_NY %>%
# filter(LineCode %in% c(1500)) %>%
# mutate(parent_line_code = 1500)
cpi_emp_data_USEducationAndCommunication_NY = merge(EMP_DAT_CAT_SAN_NY, cu_data_USEducationAndCommunication_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_NY %>%
# sample_frac(0.33) %>%
DT::datatable()cpi_emp_data_USFoodBeverage_NY %>% select(c("Year","Sector"= "Description","No_Jobs"="Number of jobs","cpi_value")) %>% gather(key="Type", value ="Value", "cpi_value","No_Jobs") %>% ggplot(mapping = aes(x= Year, y=(as.numeric(Value)),fill = Type)) +
geom_col(position = position_dodge()) +
theme(axis.text.x = element_text(angle = 60, colour="gray",hjust = 1,size=rel(0.86)))+
labs(title = "USSFoodBeverage Data for No. Of Jobs and CPI Value in NY",
y = "Sector",x= "Years") cpi_emp_data_USFoodBeverage_NY %>% select(c("Year","Sector"= "Description","No_Jobs"="Number of jobs")) %>% ggplot(mapping = aes(x= Year, y=(as.numeric(No_Jobs)))) + geom_col()+ theme(axis.text.x = element_text(angle = 60, colour="gray",hjust = 1,size=rel(0.86)))+
labs(title = "USSFoodBeverage Data for No. Of Jobs in NY",
y = "Sector",x= "Years") cpi_emp_data_USFoodBeverage_NY %>% select(c("Year","Sector"= "Description","cpi_value")) %>% ggplot(mapping = aes(x= Year, y=(as.numeric(cpi_value)))) + geom_col()+ theme(axis.text.x = element_text(angle = 60, colour="gray",hjust = 1,size=rel(0.86)))+
labs(title = "USSFoodBeverage Data for CPI Value in NY",
y = "Sector",x= "Years")