DATA 698 : Capstone Research Project
Capstone Project on CPI and Employment
DATA 698 : Capstone Research Project
- 1 Overview
- 2 Capstone Project on CPI and Employment
- 3 Data Preparation
- 3.1 Load Employment Datasets
- 3.2 Show Relevant Employment Data
- 3.2.1 Employment from Select Industry by State
- 3.2.1.1 AK - Alaska
- 3.2.1.2 AZ - Arizona
- 3.2.1.3 CA - California
- 3.2.1.4 CO - Colorado
- 3.2.1.5 FL - Florida
- 3.2.1.6 GA - Georgia
- 3.2.1.7 HI - Hawaii
- 3.2.1.8 IL - Illinois
- 3.2.1.9 KS - Kansas
- 3.2.1.10 MA - Massachusetts
- 3.2.1.11 MI - Michigan
- 3.2.1.12 MN - Minnesota
- 3.2.1.13 MO - Missouri
- 3.2.1.14 NY - New York
- 3.2.1.15 OH - Ohio
- 3.2.1.16 OR - Oregon
- 3.2.1.17 PA - Pennsylvania
- 3.2.1.18 TX - Texas
- 3.2.1.19 WA - Washington
- 3.2.1.20 WI - Wisconsin
- 3.2.1 Employment from Select Industry by State
- 3.3 Load CPI Data
- 3.4 Show CPI Reference Data
- 3.5 Show CPI Data from Select Industry
- 4 Employment vs CPI - by Industry and Area
- 4.1 Food & Beverages
- 4.1.1 AK - Alaska : Anchorage
- 4.1.2 AZ - Arizona : Phoenix-Mesa
- 4.1.3 CA - California
- 4.1.4 CO - Colorado : Denver-Boulder-Greeley
- 4.1.5 FL - Florida
- 4.1.6 GA - Georgia : Atlanta
- 4.1.7 HI - Hawaii : Honolulu
- 4.1.8 IL - Illinois : Chicago-Gary-Kenosha
- 4.1.9 KS - Kansas : Kansas City
- 4.1.10 MA - Massachusetts : Boston-Brockton-Nashua
- 4.1.11 MI - Michigan : Detroit-Ann Arbor-Flint
- 4.1.12 MN - Minnesota : Minneapolis-St. Paul
- 4.1.13 MO - Missouri : St. Louis
- 4.1.14 NY - New York : New York-Northern New Jersey-Long Island
- 4.1.15 OH - Ohio
- 4.1.16 OR - Oregon : Portland-Salem
- 4.1.17 PA - Pennsylvania
- 4.1.18 TX - Texas
- 4.1.19 WA - Washington : Seattle-Tacoma-Bremerton
- 4.1.20 WI - Wisconsin : Milwaukee-Racine
- 4.2 Housing
- 4.2.1 AK - Alaska : Anchorage
- 4.2.2 AZ - Arizona : Phoenix-Mesa
- 4.2.3 CA - California
- 4.2.4 CO - Colorado : Denver-Boulder-Greeley
- 4.2.5 FL - Florida
- 4.2.6 GA - Georgia : Atlanta
- 4.2.7 HI - Hawaii : Honolulu
- 4.2.8 IL - Illinois : Chicago-Gary-Kenosha
- 4.2.9 KS - Kansas : Kansas City
- 4.2.10 MA - Massachusetts : Boston-Brockton-Nashua
- 4.2.11 MI - Michigan
- 4.2.12 MN - Minnesota : Minneapolis-St. Paul
- 4.2.13 MO - Missouri : St. Louis
- 4.2.14 NY - New York : New York-Northern New Jersey-Long Island
- 4.2.15 OH - Ohio
- 4.2.16 OR - Oregon : Portland-Salem
- 4.2.17 PA - Pennsylvania
- 4.2.18 TX - Texas
- 4.2.19 WA - Washington : Seattle-Tacoma-Bremerton
- 4.2.20 WI - Wisconsin : Milwaukee-Racine
- 4.3 Transportation
- 4.3.1 AK - Alaska : Anchorage
- 4.3.2 AZ - Arizona : Phoenix-Mesa
- 4.3.3 CA - California
- 4.3.4 CO - Colorado : Denver-Boulder-Greeley
- 4.3.5 FL - Florida
- 4.3.6 GA - Georgia : Atlanta
- 4.3.7 HI - Hawaii : Honolulu
- 4.3.8 IL - Illinois : Chicago-Gary-Kenosha
- 4.3.9 KS - Kansas : Kansas City
- 4.3.10 MA - Massachusetts : Boston-Brockton-Nashua
- 4.3.11 MI - Michigan : Detroit-Ann Arbor-Flint
- 4.3.12 MN - Minnesota : Minneapolis-St. Paul
- 4.3.13 MO - Missouri : St. Louis
- 4.3.14 NY - New York : New York-Northern New Jersey-Long Island
- 4.3.15 OH - Ohio
- 4.3.16 OR - Oregon : Portland-Salem
- 4.3.17 PA - Pennsylvania
- 4.3.18 TX - Texas
- 4.3.19 WA - Washington : Seattle-Tacoma-Bremerton
- 4.3.20 WI - Wisconsin : Milwaukee-Racine
- 4.4 Medical
- 4.4.1 AK - Alaska : Anchorage
- 4.4.2 AZ - Arizona : Phoenix-Mesa
- 4.4.3 CA - California
- 4.4.4 CO - Colorado : Denver-Boulder-Greeley
- 4.4.5 FL - Florida
- 4.4.6 GA - Georgia : Atlanta
- 4.4.7 HI - Hawaii : Honolulu
- 4.4.8 IL - Illinois : Chicago-Gary-Kenosha
- 4.4.9 KS - Kansas : Kansas City
- 4.4.10 MA - Massachusetts : Boston-Brockton-Nashua
- 4.4.11 MI - Michigan : Detroit-Ann Arbor-Flint
- 4.4.12 MN - Minnesota : Minneapolis-St. Paul
- 4.4.13 MO - Missouri : St. Louis
- 4.4.14 NY - New York
- 4.4.15 OH - Ohio
- 4.4.16 OR - Oregon : Portland-Salem
- 4.4.17 PA - Pennsylvania
- 4.4.18 TX - Texas
- 4.4.19 WA - Washington : Seattle-Tacoma-Bremerton
- 4.4.20 WI - Wisconsin : Milwaukee-Racine
- 4.5 Education & Communication
- 4.5.1 AK - Alaska : Anchorage
- 4.5.2 AZ - Arizona : Phoenix-Mesa
- 4.5.3 CA - California
- 4.5.4 CO - Colorado : Denver-Boulder-Greeley
- 4.5.5 FL - Florida
- 4.5.6 GA - Georgia : Atlanta
- 4.5.7 HI - Hawaii : Honolulu
- 4.5.8 IL - Illinois : Chicago-Gary-Kenosha
- 4.5.9 KS - Kansas : Kansas City
- 4.5.10 MA - Massachusetts : Boston-Brockton-Nashua
- 4.5.11 MI - Michigan
- 4.5.12 MN - Minnesota : Minneapolis-St. Paul
- 4.5.13 MO - Missouri : St. Louis
- 4.5.14 NY - New York
- 4.5.15 OH - Ohio
- 4.5.16 OR - Oregon : Portland-Salem
- 4.5.17 PA - Pennsylvania
- 4.5.18 TX - Texas
- 4.5.19 WA - Washington : Seattle-Tacoma-Bremerton
- 4.5.20 WI - Wisconsin : Milwaukee-Racine
- 4.6 Combined Employment and CPI Data
- 4.1 Food & Beverages
- 5 RAJWANT WORK
- 6 References
1 Overview
We are in totally different era of twenty first century, and it gives us very rare situation where any positive news would help the humankind. We want to use the historical CPI data and find the relation of it with the employment, in hope that we would have some positive news on employment by following the trend of the data in past.
We feel that lower CPI would result in more job opportunity, as the it gives space for more competition in small business across sectors.
2 Capstone Project on CPI and Employment
- The estimates of employment for 1998-2006 are based on the 2002 North American Industry Classification System (NAICS). The estimates for 2007-2010 are based on the 2007 NAICS. The estimates for 2011-2016 are based on the 2012 NAICS. The estimates for 2017 forward are based on the 2017 NAICS.
- Excludes limited partners.
- Under the 2007 NAICS, internet publishing and broadcasting was reclassified to other information services.
- (NA) Not available.
- (NM) Not meaningful.
- Not shown to avoid disclosure of confidential information; estimates are included in higher-level totals.
- Estimate for employment suppressed to cover corresponding estimate for earnings. Estimates for this item are included in the total.
Last updated: September 24, 2019- new statistics for 2018; revised statistics for 2014-2017.
3 Data Preparation
Load the required libraries
#packages <- c("R.rsp","pandocfilters","knitr","plyr","tidyr","dplyr","ggplot2","plotly","sqldf","MASS","reshape2","Amelia","mice","googleVis","stringi","ROCR","scatterplot3d","visdat","gridExtra","sjmisc", "devtools")
#install.packages (packages, repos="http://cran.us.r-project.org", dependencies=TRUE, lib="C:/Program Files/R/R-3.6.3/library")
#invisible (lapply (packages, library, character.only = T))
#devtools::install_github("hadley/ggplot2")
#if (!require(devtools)) install.packages("devtools", repos="http://cran.us.r-project.org", dependencies=TRUE, lib="C:/Program Files/R/R-3.6.3/library")
#devtools::install_github("boxuancui/DataExplorer")
#library("DataExplorer", lib.loc="C:/Program Files/R/R-3.6.3/library")
library(tidyverse)
library(kableExtra)
library(lubridate)
library(forecast)
library(stringi)
library(dplyr)3.1 Load Employment Datasets
file_path <- "../data/cpi/SAINC/"
EMP_DATA_FILE <- paste0(file_path,"EMP9818.csv")
EMP_DATA <- read_delim(EMP_DATA_FILE, delim = ",", col_names = TRUE, trim_ws= TRUE)
#,skip = 6,col_types = list(col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character()))
# names(SAP_INV_DAT) = c("DROP","I_BILLDATE","I_ORG","I_DC","I_DOCCA","I_BILLTYPE","I_ORDER_REA","I_DIV","I_MATYPE","I_BILL_QTY","BLANK")
# SAP_INV_DAT<- SAP_INV_DAT[-which(is.na(SAP_INV_DAT$I_ORG)),] %>% .[-str_which(trimws(.$I_ORG),'SOrg.|-----'),c(-1,-11)]
# 3
# SAP_INV_DAT$I_BILLDATE <- date(parse_datetime(SAP_INV_DAT$I_BILLDATE, "%m/%d/%Y"))
#
# head(SAP_INV_DAT)
summary(EMP_DATA)## GeoFIPS GeoName Region TableName
## Length:7084 Length:7084 Min. :1.000 Length:7084
## Class :character Class :character 1st Qu.:3.000 Class :character
## Mode :character Mode :character Median :5.000 Mode :character
## Mean :4.475
## 3rd Qu.:6.000
## Max. :8.000
## NA's :122
## LineCode IndustryClassification Description Unit
## Min. : 10.0 Length:7084 Length:7084 Length:7084
## 1st Qu.: 517.0 Class :character Class :character Class :character
## Median : 712.5 Mode :character Mode :character Mode :character
## Mean : 872.1
## 3rd Qu.:1103.0
## Max. :2012.0
## NA's :4
## 1998 1999 2000 2001
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2002 2003 2004 2005
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2006 2007 2008 2009
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2010 2011 2012 2013
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2014 2015 2016 2017
## Length:7084 Length:7084 Length:7084 Length:7084
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## 2018
## Length:7084
## Class :character
## Mode :character
##
##
##
##
# Reading the Category and detail Desc for ref.
EMP_DEF_FILE <- paste0(file_path,"SAEMP25N__definition.xml")
IND_DESC <- XML::xmlToDataFrame(EMP_DEF_FILE)
names(IND_DESC) <- c("LineCode","Description")
glimpse(IND_DESC)## Rows: 118
## Columns: 3
## $ LineCode <fct> 10, 20, 40, 50, 60, 70, 80, 90, 100, 101, 102, 103, 200...
## $ Description <fct> "Total employment (number of jobs)", "Wage and salary e...
## $ NA <fct> "A count of jobs, both full-time and part-time. It incl...
3.2 Show Relevant Employment Data
# MN-MAIN | NN - RANGE | NL - LAST |
EMAP_CAT_LAST <- EMP_DATA[,c(2,5,6,7)] %>%
filter(GeoName=="United States") %>%
mutate(ISM = ifelse(IndustryClassification=="...","M","N")) %>%
separate(IndustryClassification, c("From", "To")) %>%
mutate(ISML = ifelse(is.na(To)==T,"L","N")) %>%
filter(ISML %in% c("L","N")) %>%
select(.,LineCode,Description)
#DT::datatable(EMAP_CAT_LAST)
EMP_DAT_CAT <- EMP_DATA[,c(2,5,9:29)] %>%
left_join(EMAP_CAT_LAST,.,by="LineCode")
glimpse(EMP_DAT_CAT)## Rows: 7,080
## Columns: 24
## $ LineCode <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,...
## $ Description <chr> "Total employment (number of jobs)", "Total employment ...
## $ GeoName <chr> "United States", "Alabama", "Alaska", "Arizona", "Arkan...
## $ `1998` <chr> "158481200", "2361892", "382166", "2616288", "1445536",...
## $ `1999` <chr> "161531300", "2378217", "381307", "2695892", "1460374",...
## $ `2000` <chr> "165370800", "2392225", "389734", "2801510", "1482449",...
## $ `2001` <chr> "165522200", "2376053", "394565", "2829002", "1482587",...
## $ `2002` <chr> "165095100", "2364828", "402187", "2847095", "1478929",...
## $ `2003` <chr> "165921500", "2371430", "405621", "2917121", "1482035",...
## $ `2004` <chr> "168839700", "2425649", "413864", "3041476", "1505095",...
## $ `2005` <chr> "172338400", "2486833", "421419", "3219820", "1537680",...
## $ `2006` <chr> "175868600", "2545556", "431320", "3375218", "1567682",...
## $ `2007` <chr> "179543700", "2604078", "439825", "3465075", "1582858",...
## $ `2008` <chr> "179213900", "2582591", "443538", "3402808", "1579283",...
## $ `2009` <chr> "173636700", "2479507", "442447", "3228493", "1542944",...
## $ `2010` <chr> "172901700", "2460298", "443904", "3181571", "1541272",...
## $ `2011` <chr> "176091700", "2497933", "450364", "3239045", "1561948",...
## $ `2012` <chr> "178979700", "2503678", "459222", "3295537", "1565142",...
## $ `2013` <chr> "182325100", "2523338", "461110", "3371219", "1569249",...
## $ `2014` <chr> "186233800", "2551872", "461327", "3448173", "1587414",...
## $ `2015` <chr> "190315800", "2586885", "461767", "3548174", "1610779",...
## $ `2016` <chr> "193371900", "2619154", "457371", "3646604", "1629237",...
## $ `2017` <chr> "196825300", "2653968", "456799", "3751283", "1644432",...
## $ `2018` <chr> "200746000", "2691517", "459178", "3859137", "1663188",...
#DT::datatable(EMP_DAT_CAT)
temp_name <- names(EMP_DAT_CAT [,c(4:24)])
EMP_DAT_CAT_SAN <- data.frame(lapply(EMP_DAT_CAT [,c(4:24)],function(x) {
gsub("(D)|(T)",NA,x)
}))
names(EMP_DAT_CAT_SAN) = temp_name
#DT::datatable(EMP_DAT_CAT_SAN)
EMP_DAT_CAT_SAN <- cbind(EMP_DAT_CAT[,c(1:3)],EMP_DAT_CAT_SAN)
#DT::datatable(EMP_DAT_CAT_SAN)3.2.1 Employment from Select Industry by State
3.2.1.1 AK - Alaska
EMP_DAT_CAT_SAN_AK <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Alaska") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_AK %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.2 AZ - Arizona
EMP_DAT_CAT_SAN_AZ <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Arizona") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_AZ %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.3 CA - California
EMP_DAT_CAT_SAN_CA <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="California") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_CA %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.4 CO - Colorado
EMP_DAT_CAT_SAN_CO <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Colorado") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_CO %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.5 FL - Florida
EMP_DAT_CAT_SAN_FL <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Florida") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_FL %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.6 GA - Georgia
EMP_DAT_CAT_SAN_GA <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Georgia") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_GA %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.7 HI - Hawaii
EMP_DAT_CAT_SAN_HI <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Hawaii") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_HI %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.8 IL - Illinois
EMP_DAT_CAT_SAN_IL <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Illinois") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_IL %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.9 KS - Kansas
EMP_DAT_CAT_SAN_KS <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Kansas") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_KS %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.10 MA - Massachusetts
EMP_DAT_CAT_SAN_MA <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Massachusetts") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_MA %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.11 MI - Michigan
EMP_DAT_CAT_SAN_MI <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Michigan") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_MI %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.12 MN - Minnesota
EMP_DAT_CAT_SAN_MN <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Michigan") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_MN %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.13 MO - Missouri
EMP_DAT_CAT_SAN_MO <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Missouri") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_MO %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.14 NY - New York
EMP_DAT_CAT_SAN_NY <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="New York") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_NY %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.15 OH - Ohio
EMP_DAT_CAT_SAN_OH <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Ohio") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_OH %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.16 OR - Oregon
EMP_DAT_CAT_SAN_OR <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Oregon") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_OR %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.17 PA - Pennsylvania
EMP_DAT_CAT_SAN_PA <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Pennsylvania") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_PA %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.18 TX - Texas
EMP_DAT_CAT_SAN_TX <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Texas") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_TX %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.19 WA - Washington
EMP_DAT_CAT_SAN_WA <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Washington") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_WA %>%
#sample_frac(0.33) %>%
DT::datatable()3.2.1.20 WI - Wisconsin
EMP_DAT_CAT_SAN_WI <- EMP_DAT_CAT_SAN %>%
filter(LineCode %in% c(705,1100,800,1600,1500)) %>%
filter(GeoName=="Wisconsin") %>%
reshape2::melt(
id.vars = c("LineCode","Description","GeoName"),
variable.name = "Year",
value.name = "Number of jobs")
EMP_DAT_CAT_SAN_WI %>%
#sample_frac(0.33) %>%
DT::datatable()3.3 Load CPI Data
file_path <- "../data/cpi/archive/"
### Read files named *.csv
filenames <- list.files(path=file_path,
pattern="*.csv",
full.names = TRUE)
###print(filenames)
###Create list of data frame names without the ".csv" part
names <- gsub(pattern='\\.',
replacement='_', # remove the last 4 characters from filename (_clean)
x=(sub(pattern = "(.*)\\..*$", replacement = "\\1", # remove extenson from filename (.csv)
basename(filenames) # get filename without the directory file path
)
)
)
###print(names)
###Load all files
for(i in filenames){
## Create list of data frame names without the ".csv" part
name <- gsub(pattern='\\.',
replacement='_', # remove the last 4 characters from filename (_clean)
x=(sub(pattern = "(.*)\\..*$", replacement = "\\1", # remove extenson from filename (.csv)
basename(i) # get filename without the directory file path
)
)
)
print(name)
assign(x=name, value=data.frame(read.csv(i)))
#DT::datatable(name)
}## [1] "cu_area"
## [1] "cu_base"
## [1] "cu_data_0_Current"
## [1] "cu_data_1_AllItems"
## [1] "cu_data_10_OtherWest"
## [1] "cu_data_11_USFoodBeverage"
## [1] "cu_data_12_USHousing"
## [1] "cu_data_13_USApparel"
## [1] "cu_data_14_USTransportation"
## [1] "cu_data_15_USMedical"
## [1] "cu_data_16_USRecreation"
## [1] "cu_data_17_USEducationAndCommunication"
## [1] "cu_data_18_USOtherGoodsAndServices"
## [1] "cu_data_19_PopulationSize"
## [1] "cu_data_2_Summaries"
## [1] "cu_data_20_USCommoditiesServicesSpecial"
## [1] "cu_data_3_AsizeNorthEast"
## [1] "cu_data_4_AsizeNorthCentral"
## [1] "cu_data_5_AsizeSouth"
## [1] "cu_data_6_AsizeWest"
## [1] "cu_data_7_OtherNorthEast"
## [1] "cu_data_8_OtherNorthCentral"
## [1] "cu_data_9_OtherSouth"
## [1] "cu_footnote"
## [1] "cu_item"
## [1] "cu_period"
## [1] "cu_periodicity"
## [1] "cu_series"
3.4 Show CPI Reference Data
3.4.1 Area
3.4.2 Base
3.4.3 Item
3.4.4 Periodicity
3.4.5 Series
3.4.6 Period
3.4.7 Combined Reference Data
3.5 Show CPI Data from Select Industry
3.5.1 CPI - US Food Beverages
FoodItemCode <- cu_reference_data %>%
filter(grepl('SAF|SEF|SS0|SS1|SS20', item_code))
#dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((FoodItemCode))
cu_data_USFoodBeverage = cu_data_0_Current %>%
inner_join(FoodItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USFoodBeverage))3.5.2 CPI - US Housing
HousingItemCode <- cu_reference_data %>%
#filter(grepl('SAH|SEH', item_code, fixed = TRUE))
filter(grepl('SAH|SEH', item_code))
#dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((HousingItemCode))
cu_data_USHousing = cu_data_0_Current %>%
inner_join(HousingItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
#filter(item_code %like% c('SAH','SEH')) %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USHousing))3.5.3 CPI - US Transportation
TransportItemCode <- cu_reference_data %>%
filter(grepl('SAT|SET|SS4|SS52|SS53|SAS24|SAS4', item_code))
cu_data_USTransportation = cu_data_0_Current %>%
inner_join(TransportItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USTransportation))3.5.4 CPI - US Medical
MedicalItemCode <- cu_reference_data %>%
filter(grepl('SAM|SEM|SS57', item_code))
cu_data_USMedical = cu_data_0_Current %>%
inner_join(MedicalItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USMedical))3.5.5 CPI - US Education And Communication
EducationItemCode <- cu_reference_data %>%
filter(grepl('SAE|SEE|SSE|SS27', item_code))
cu_data_USEducationAndCommunication = cu_data_0_Current %>%
inner_join(EducationItemCode, by="series_id") %>%
inner_join(cu_period, by="period") %>%
filter(year %in% (1998:2018)) %>%
select(c("series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period"))
DT::datatable(head(cu_data_USEducationAndCommunication))4 Employment vs CPI - by Industry and Area
4.1 Food & Beverages
4.1.1 AK - Alaska : Anchorage
cu_emp_data_USFoodBeverage_AK = cu_data_USFoodBeverage %>%
filter(area_name=="Anchorage",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'AK') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_AK = merge(EMP_DAT_CAT_SAN_AK, cu_emp_data_USFoodBeverage_AK,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_AK %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.2 AZ - Arizona : Phoenix-Mesa
cu_emp_data_USFoodBeverage_AZ = cu_data_USFoodBeverage %>%
filter(area_name=="Phoenix-Mesa",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'AZ') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_AZ = merge(EMP_DAT_CAT_SAN_AZ, cu_emp_data_USFoodBeverage_AZ,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_AZ %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.3 CA - California
- Los Angeles-Riverside-Orange County
- San Francisco-Oakland-San Jose
- San Diego
cu_emp_data_USFoodBeverage_CA = cu_data_USFoodBeverage %>%
filter( (area_name %in% c("Los Angeles-Riverside-Orange County", "San Francisco-Oakland-San Jose")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("San Diego")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
area_code = 'A421;A422;A424',
area_name = 'Los Angeles-Riverside-Orange County;San Francisco-Oakland-San Jose;San Diego',
industry = 'Food & Beverages',
state = 'CA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value/3),2))
cpi_emp_data_USFoodBeverage_CA = merge(EMP_DAT_CAT_SAN_CA, cu_emp_data_USFoodBeverage_CA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_CA %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.4 CO - Colorado : Denver-Boulder-Greeley
cu_emp_data_USFoodBeverage_CO = cu_data_USFoodBeverage %>%
filter(area_name %in% c("Denver-Boulder-Greeley"),
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'CO') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_CO = merge(EMP_DAT_CAT_SAN_CO, cu_emp_data_USFoodBeverage_CO,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_CO %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.5 FL - Florida
- Miami-Fort Lauderdale
- Tampa-St. Petersburg-Clearwater
cu_emp_data_USFoodBeverage_FL = cu_data_USFoodBeverage %>%
filter( (area_name %in% c("Miami-Fort Lauderdale")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Tampa-St. Petersburg-Clearwater")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
area_code = 'A320;A321',
area_name = 'Miami-Fort Lauderdale;Tampa-St. Petersburg-Clearwater',
industry = 'Food & Beverages',
state = 'FL') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USFoodBeverage_FL = merge(EMP_DAT_CAT_SAN_FL, cu_emp_data_USFoodBeverage_FL,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_FL %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.6 GA - Georgia : Atlanta
cu_emp_data_USFoodBeverage_GA = cu_data_USFoodBeverage %>%
filter(area_name=="Atlanta",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(industry = 'Food & Beverages',
state = 'GA',
parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'GA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_GA = merge(EMP_DAT_CAT_SAN_GA, cu_emp_data_USFoodBeverage_GA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_GA %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.7 HI - Hawaii : Honolulu
cu_emp_data_USFoodBeverage_HI = cu_data_USFoodBeverage %>%
filter(area_name=="Honolulu",
period %in% c("S01","S02")) %>%
mutate(industry = 'Food & Beverages',
state = 'HI',
parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'HI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_HI = merge(EMP_DAT_CAT_SAN_HI, cu_emp_data_USFoodBeverage_HI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_HI %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.8 IL - Illinois : Chicago-Gary-Kenosha
cu_emp_data_USFoodBeverage_IL = cu_data_USFoodBeverage %>%
filter(area_name=="Chicago-Gary-Kenosha",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(industry = 'Food & Beverages',
state = 'IL',
parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'IL') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_IL = merge(EMP_DAT_CAT_SAN_IL, cu_emp_data_USFoodBeverage_IL,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_IL %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.9 KS - Kansas : Kansas City
cu_emp_data_USFoodBeverage_KS = cu_data_USFoodBeverage %>%
filter(area_name=="Kansas City",
period %in% c("S01","S02")) %>%
mutate(industry = 'Food & Beverages',
state = 'KS',
parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'KS') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_KS = merge(EMP_DAT_CAT_SAN_KS, cu_emp_data_USFoodBeverage_KS,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_KS %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.10 MA - Massachusetts : Boston-Brockton-Nashua
cu_emp_data_USFoodBeverage_MA = cu_data_USFoodBeverage %>%
filter(area_name=="Boston-Brockton-Nashua",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'MA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_MA = merge(EMP_DAT_CAT_SAN_MA, cu_emp_data_USFoodBeverage_MA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_MA %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.11 MI - Michigan : Detroit-Ann Arbor-Flint
cu_emp_data_USFoodBeverage_MI = cu_data_USFoodBeverage %>%
filter(area_name=="Detroit-Ann Arbor-Flint",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'MI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_MI = merge(EMP_DAT_CAT_SAN_MI, cu_emp_data_USFoodBeverage_MI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_MI %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.12 MN - Minnesota : Minneapolis-St. Paul
cu_emp_data_USFoodBeverage_MN = cu_data_USFoodBeverage %>%
filter(area_name=="Minneapolis-St. Paul",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'MN') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_MN = merge(EMP_DAT_CAT_SAN_MN, cu_emp_data_USFoodBeverage_MN,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_MN %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.13 MO - Missouri : St. Louis
cu_emp_data_USFoodBeverage_MO = cu_data_USFoodBeverage %>%
filter(area_name=="St. Louis",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'MO') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_MO = merge(EMP_DAT_CAT_SAN_MO, cu_emp_data_USFoodBeverage_MO,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_MO %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.14 NY - New York : New York-Northern New Jersey-Long Island
cu_emp_data_USFoodBeverage_NY = cu_data_USFoodBeverage %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'NY') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USFoodBeverage_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_NY %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.15 OH - Ohio
- Cleveland-Akron
- Cincinnati-Hamilton
cu_emp_data_USFoodBeverage_OH = cu_data_USFoodBeverage %>%
filter( (area_name %in% c("Cleveland-Akron")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Cincinnati-Hamilton")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
area_code = 'A210;A213',
area_name = 'Cleveland-Akron;Cincinnati-Hamilton',
industry = 'Food & Beverages',
state = 'OH') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USFoodBeverage_OH = merge(EMP_DAT_CAT_SAN_OH, cu_emp_data_USFoodBeverage_OH,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_OH %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.16 OR - Oregon : Portland-Salem
cu_emp_data_USFoodBeverage_OR = cu_data_USFoodBeverage %>%
filter(area_name=="Portland-Salem",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'OR') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_OR = merge(EMP_DAT_CAT_SAN_OR, cu_emp_data_USFoodBeverage_OR,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_OR %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.17 PA - Pennsylvania
- Philadelphia-Wilmington-Atlantic City
- Pittsburgh
cu_emp_data_USFoodBeverage_PA = cu_data_USFoodBeverage %>%
filter( (area_name %in% c("Philadelphia-Wilmington-Atlantic City")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Pittsburgh")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
area_code = 'A102;A104',
area_name = 'Philadelphia-Wilmington-Atlantic City;Pittsburgh',
industry = 'Food & Beverages',
state = 'PA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USFoodBeverage_PA = merge(EMP_DAT_CAT_SAN_PA, cu_emp_data_USFoodBeverage_PA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_PA %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.18 TX - Texas
- Dallas-Fort Worth
- Houston-Galveston-Brazoria
cu_emp_data_USFoodBeverage_TX = cu_data_USFoodBeverage %>%
filter(area_name %in% c("Dallas-Fort Worth","Houston-Galveston-Brazoria"),
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
area_code = 'A316;A318',
area_name = 'Dallas-Fort Worth;Houston-Galveston-Brazoria',
industry = 'Food & Beverages',
state = 'TX') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USFoodBeverage_TX = merge(EMP_DAT_CAT_SAN_TX, cu_emp_data_USFoodBeverage_TX,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_TX %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.19 WA - Washington : Seattle-Tacoma-Bremerton
cu_emp_data_USFoodBeverage_WA = cu_data_USFoodBeverage %>%
filter(area_name=="Seattle-Tacoma-Bremerton",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'WA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_WA = merge(EMP_DAT_CAT_SAN_WA, cu_emp_data_USFoodBeverage_WA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_WA %>%
###sample_frac(0.33) %>%
DT::datatable()4.1.20 WI - Wisconsin : Milwaukee-Racine
cu_emp_data_USFoodBeverage_WI = cu_data_USFoodBeverage %>%
filter(area_name=="Milwaukee-Racine",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAF',
emp_line_code = 705,
industry = 'Food & Beverages',
state = 'WI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_WI = merge(EMP_DAT_CAT_SAN_WI, cu_emp_data_USFoodBeverage_WI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_WI %>%
###sample_frac(0.33) %>%
DT::datatable()4.2 Housing
4.2.1 AK - Alaska : Anchorage
cu_emp_data_USHousing_AK = cu_data_USHousing %>%
filter(area_name=="Anchorage",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'AK') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_AK = merge(EMP_DAT_CAT_SAN_AK, cu_emp_data_USHousing_AK,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_AK %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.2 AZ - Arizona : Phoenix-Mesa
cu_emp_data_USHousing_AZ = cu_data_USHousing %>%
filter(area_name=="Phoenix-Mesa",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'AZ') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_AZ = merge(EMP_DAT_CAT_SAN_AZ, cu_emp_data_USHousing_AZ,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_AZ %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.3 CA - California
- Los Angeles-Riverside-Orange County
- San Francisco-Oakland-San Jose
- San Diego
cu_emp_data_USHousing_CA = cu_data_USHousing %>%
filter( (area_name %in% c("Los Angeles-Riverside-Orange County", "San Francisco-Oakland-San Jose")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("San Diego")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
area_code = 'A421;A422;A424',
area_name = 'Los Angeles-Riverside-Orange County;San Francisco-Oakland-San Jose;San Diego',
industry = 'Housing',
state = 'CA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value/3),2))
cpi_emp_data_USHousing_CA = merge(EMP_DAT_CAT_SAN_CA, cu_emp_data_USHousing_CA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_CA %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.4 CO - Colorado : Denver-Boulder-Greeley
cu_emp_data_USHousing_CO = cu_data_USHousing %>%
filter(area_name=="Denver-Boulder-Greeley",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'CO') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_CO = merge(EMP_DAT_CAT_SAN_CO, cu_emp_data_USHousing_CO,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_CO %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.5 FL - Florida
- Miami-Fort Lauderdale
- Tampa-St. Petersburg-Clearwater
cu_emp_data_USHousing_FL = cu_data_USHousing %>%
filter( (area_name %in% c("Miami-Fort Lauderdale")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Tampa-St. Petersburg-Clearwater")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
area_code = 'A320;A321',
area_name = 'Miami-Fort Lauderdale;Tampa-St. Petersburg-Clearwater',
industry = 'Housing',
state = 'FL') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USHousing_FL = merge(EMP_DAT_CAT_SAN_FL, cu_emp_data_USHousing_FL,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_FL %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.6 GA - Georgia : Atlanta
cu_emp_data_USHousing_GA = cu_data_USHousing %>%
filter(area_name=="Atlanta",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'GA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_GA = merge(EMP_DAT_CAT_SAN_GA, cu_emp_data_USHousing_GA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_GA %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.7 HI - Hawaii : Honolulu
cu_emp_data_USHousing_HI = cu_data_USHousing %>%
filter(area_name=="Honolulu",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'HI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_HI = merge(EMP_DAT_CAT_SAN_HI, cu_emp_data_USHousing_HI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_HI %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.8 IL - Illinois : Chicago-Gary-Kenosha
cu_emp_data_USHousing_IL = cu_data_USHousing %>%
filter(area_name=="Chicago-Gary-Kenosha",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'IL') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_IL = merge(EMP_DAT_CAT_SAN_IL, cu_emp_data_USHousing_IL,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_IL %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.9 KS - Kansas : Kansas City
cu_emp_data_USHousing_KS = cu_data_USHousing %>%
filter(area_name=="Kansas City",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'KS') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_KS = merge(EMP_DAT_CAT_SAN_KS, cu_emp_data_USHousing_KS,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_KS %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.10 MA - Massachusetts : Boston-Brockton-Nashua
cu_emp_data_USHousing_MA = cu_data_USHousing %>%
filter(area_name=="Boston-Brockton-Nashua",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'MA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_MA = merge(EMP_DAT_CAT_SAN_MA, cu_emp_data_USHousing_MA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_MA %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.11 MI - Michigan
cu_emp_data_USHousing_MI = cu_data_USHousing %>%
filter(area_name=="Detroit-Ann Arbor-Flint",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'MI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_MI = merge(EMP_DAT_CAT_SAN_MI, cu_emp_data_USHousing_MI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_MI %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.12 MN - Minnesota : Minneapolis-St. Paul
cu_emp_data_USHousing_MN = cu_data_USHousing %>%
filter(area_name=="Minneapolis-St. Paul",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'MN') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_MN = merge(EMP_DAT_CAT_SAN_MN, cu_emp_data_USHousing_MN,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_MN %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.13 MO - Missouri : St. Louis
cu_emp_data_USHousing_MO = cu_data_USHousing %>%
filter(area_name=="St. Louis",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'MO') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_MO = merge(EMP_DAT_CAT_SAN_MO, cu_emp_data_USHousing_MO,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_MO %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.14 NY - New York : New York-Northern New Jersey-Long Island
cu_emp_data_USHousing_NY = cu_data_USHousing %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'NY') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USHousing_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_NY %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.15 OH - Ohio
- Cleveland-Akron
- Cincinnati-Hamilton
cu_emp_data_USHousing_OH = cu_data_USHousing %>%
filter( (area_name %in% c("Cleveland-Akron")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Cincinnati-Hamilton")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
area_code = 'A210;A213',
area_name = 'Cleveland-Akron;Cincinnati-Hamilton',
industry = 'Housing',
state = 'OH') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USHousing_OH = merge(EMP_DAT_CAT_SAN_OH, cu_emp_data_USHousing_OH,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_OH %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.16 OR - Oregon : Portland-Salem
cu_emp_data_USHousing_OR = cu_data_USHousing %>%
filter(area_name=="Portland-Salem",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'OR') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_OR = merge(EMP_DAT_CAT_SAN_OR, cu_emp_data_USHousing_OR,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_OR %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.17 PA - Pennsylvania
- Philadelphia-Wilmington-Atlantic City
- Pittsburgh
cu_emp_data_USHousing_PA = cu_data_USHousing %>%
filter( (area_name %in% c("Philadelphia-Wilmington-Atlantic City")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Pittsburgh")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
area_code = 'A102;A104',
area_name = 'Philadelphia-Wilmington-Atlantic City;Pittsburgh',
industry = 'Housing',
state = 'PA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USHousing_PA = merge(EMP_DAT_CAT_SAN_PA, cu_emp_data_USHousing_PA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_PA %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.18 TX - Texas
- Dallas-Fort Worth
- Houston-Galveston-Brazoria
cu_emp_data_USHousing_TX = cu_data_USHousing %>%
filter(area_name %in% c("Dallas-Fort Worth","Houston-Galveston-Brazoria"),
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
area_code = 'A316;A318',
area_name = 'Dallas-Fort Worth;Houston-Galveston-Brazoria',
industry = 'Housing',
state = 'TX') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USHousing_TX = merge(EMP_DAT_CAT_SAN_TX, cu_emp_data_USHousing_TX,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_TX %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.19 WA - Washington : Seattle-Tacoma-Bremerton
cu_emp_data_USHousing_WA = cu_data_USHousing %>%
filter(area_name=="Seattle-Tacoma-Bremerton",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'WA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_WA = merge(EMP_DAT_CAT_SAN_WA, cu_emp_data_USHousing_WA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_WA %>%
# sample_frac(0.33) %>%
DT::datatable()4.2.20 WI - Wisconsin : Milwaukee-Racine
cu_emp_data_USHousing_WI = cu_data_USHousing %>%
filter(area_name=="Milwaukee-Racine",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAH',
emp_line_code = 1100,
industry = 'Housing',
state = 'WI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_WI = merge(EMP_DAT_CAT_SAN_WI, cu_emp_data_USHousing_WI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_WI %>%
# sample_frac(0.33) %>%
DT::datatable()4.3 Transportation
4.3.1 AK - Alaska : Anchorage
cu_emp_data_USTransportation_AK = cu_data_USTransportation %>%
filter(area_name=="Anchorage",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'AK') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_AK = merge(EMP_DAT_CAT_SAN_AK, cu_emp_data_USTransportation_AK,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_AK %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.2 AZ - Arizona : Phoenix-Mesa
cu_emp_data_USTransportation_AZ = cu_data_USTransportation %>%
filter(area_name=="Phoenix-Mesa",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'AZ') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_AZ = merge(EMP_DAT_CAT_SAN_AZ, cu_emp_data_USTransportation_AZ,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_AZ %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.3 CA - California
- Los Angeles-Riverside-Orange County
- San Francisco-Oakland-San Jose
- San Diego
cu_emp_data_USTransportation_CA = cu_data_USTransportation %>%
filter( (area_name %in% c("Los Angeles-Riverside-Orange County", "San Francisco-Oakland-San Jose")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("San Diego")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
area_code = 'A421;A422;A424',
area_name = 'Los Angeles-Riverside-Orange County;San Francisco-Oakland-San Jose;San Diego',
industry = 'Transportation',
state = 'CA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value/3),2))
cpi_emp_data_USTransportation_CA = merge(EMP_DAT_CAT_SAN_CA, cu_emp_data_USTransportation_CA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_CA %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.4 CO - Colorado : Denver-Boulder-Greeley
cu_emp_data_USTransportation_CO = cu_data_USTransportation %>%
filter(area_name=="Denver-Boulder-Greeley",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'CO') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_CO = merge(EMP_DAT_CAT_SAN_CO, cu_emp_data_USTransportation_CO,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_CO %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.5 FL - Florida
- Miami-Fort Lauderdale
- Tampa-St. Petersburg-Clearwater
cu_emp_data_USTransportation_FL = cu_data_USTransportation %>%
filter( (area_name %in% c("Miami-Fort Lauderdale")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Tampa-St. Petersburg-Clearwater")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
area_code = 'A320;A321',
area_name = 'Miami-Fort Lauderdale;Tampa-St. Petersburg-Clearwater',
industry = 'Transportation',
state = 'FL') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USTransportation_FL = merge(EMP_DAT_CAT_SAN_FL, cu_emp_data_USTransportation_FL,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_FL %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.6 GA - Georgia : Atlanta
cu_emp_data_USTransportation_GA = cu_data_USTransportation %>%
filter(area_name=="Atlanta",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'GA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_GA = merge(EMP_DAT_CAT_SAN_GA, cu_emp_data_USTransportation_GA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_GA %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.7 HI - Hawaii : Honolulu
cu_emp_data_USTransportation_HI = cu_data_USTransportation %>%
filter(area_name=="Honolulu",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'HI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_HI = merge(EMP_DAT_CAT_SAN_HI, cu_emp_data_USTransportation_HI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_HI %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.8 IL - Illinois : Chicago-Gary-Kenosha
cu_emp_data_USTransportation_IL = cu_data_USTransportation %>%
filter(area_name=="Chicago-Gary-Kenosha",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'IL') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_IL = merge(EMP_DAT_CAT_SAN_IL, cu_emp_data_USTransportation_IL,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_IL %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.9 KS - Kansas : Kansas City
cu_emp_data_USTransportation_KS = cu_data_USTransportation %>%
filter(area_name=="Kansas City",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'KS') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_KS = merge(EMP_DAT_CAT_SAN_KS, cu_emp_data_USTransportation_KS,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_KS %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.10 MA - Massachusetts : Boston-Brockton-Nashua
cu_emp_data_USTransportation_MA = cu_data_USTransportation %>%
filter(area_name=="Boston-Brockton-Nashua",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'MA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_MA = merge(EMP_DAT_CAT_SAN_MA, cu_emp_data_USTransportation_MA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_MA %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.11 MI - Michigan : Detroit-Ann Arbor-Flint
cu_emp_data_USTransportation_MI = cu_data_USTransportation %>%
filter(area_name=="Detroit-Ann Arbor-Flint",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'MI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_MI = merge(EMP_DAT_CAT_SAN_MI, cu_emp_data_USTransportation_MI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_MI %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.12 MN - Minnesota : Minneapolis-St. Paul
cu_emp_data_USTransportation_MN = cu_data_USTransportation %>%
filter(area_name=="Minneapolis-St. Paul",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'MN') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_MN = merge(EMP_DAT_CAT_SAN_MN, cu_emp_data_USTransportation_MN,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_MN %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.13 MO - Missouri : St. Louis
cu_emp_data_USTransportation_MO = cu_data_USTransportation %>%
filter(area_name=="St. Louis",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'MO') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_MO = merge(EMP_DAT_CAT_SAN_MO, cu_emp_data_USTransportation_MO,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_MO %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.14 NY - New York : New York-Northern New Jersey-Long Island
cu_emp_data_USTransportation_NY = cu_data_USTransportation %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'NY') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USTransportation_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_NY %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.15 OH - Ohio
- Cleveland-Akron
- Cincinnati-Hamilton
cu_emp_data_USTransportation_OH = cu_data_USTransportation %>%
filter( (area_name %in% c("Cleveland-Akron")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Cincinnati-Hamilton")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
area_code = 'A210;A213',
area_name = 'Cleveland-Akron;Cincinnati-Hamilton',
industry = 'Transportation',
state = 'OH') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USTransportation_OH = merge(EMP_DAT_CAT_SAN_OH, cu_emp_data_USTransportation_OH,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_OH %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.16 OR - Oregon : Portland-Salem
cu_emp_data_USTransportation_OR = cu_data_USTransportation %>%
filter(area_name=="Portland-Salem",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'OR') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_OR = merge(EMP_DAT_CAT_SAN_OR, cu_emp_data_USTransportation_OR,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_OR %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.17 PA - Pennsylvania
- Philadelphia-Wilmington-Atlantic City
- Pittsburgh
cu_emp_data_USTransportation_PA = cu_data_USTransportation %>%
filter( (area_name %in% c("Philadelphia-Wilmington-Atlantic City")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Pittsburgh")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
area_code = 'A102;A104',
area_name = 'Philadelphia-Wilmington-Atlantic City;Pittsburgh',
industry = 'Transportation',
state = 'PA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USTransportation_PA = merge(EMP_DAT_CAT_SAN_PA, cu_emp_data_USTransportation_PA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_PA %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.18 TX - Texas
- Dallas-Fort Worth
- Houston-Galveston-Brazoria
cu_emp_data_USTransportation_TX = cu_data_USTransportation %>%
filter(area_name %in% c("Dallas-Fort Worth","Houston-Galveston-Brazoria"),
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
area_code = 'A316;A318',
area_name = 'Dallas-Fort Worth;Houston-Galveston-Brazoria',
industry = 'Transportation',
state = 'TX') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USTransportation_TX = merge(EMP_DAT_CAT_SAN_TX, cu_emp_data_USTransportation_TX,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_TX %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.19 WA - Washington : Seattle-Tacoma-Bremerton
cu_emp_data_USTransportation_WA = cu_data_USTransportation %>%
filter(area_name=="Seattle-Tacoma-Bremerton",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'WA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_WA = merge(EMP_DAT_CAT_SAN_WA, cu_emp_data_USTransportation_WA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_WA %>%
# sample_frac(0.33) %>%
DT::datatable()4.3.20 WI - Wisconsin : Milwaukee-Racine
cu_emp_data_USTransportation_WI = cu_data_USTransportation %>%
filter(area_name=="Milwaukee-Racine",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAT',
emp_line_code = 800,
industry = 'Transportation',
state = 'WI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_WI = merge(EMP_DAT_CAT_SAN_WI, cu_emp_data_USTransportation_WI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_WI %>%
# sample_frac(0.33) %>%
DT::datatable()4.4 Medical
4.4.1 AK - Alaska : Anchorage
cu_emp_data_USMedical_AK = cu_data_USMedical %>%
filter(area_name=="Anchorage",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'AK') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_AK = merge(EMP_DAT_CAT_SAN_AK, cu_emp_data_USMedical_AK,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_AK %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.2 AZ - Arizona : Phoenix-Mesa
cu_emp_data_USMedical_AZ = cu_data_USMedical %>%
filter(area_name=="Phoenix-Mesa",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'AZ') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_AZ = merge(EMP_DAT_CAT_SAN_AZ, cu_emp_data_USMedical_AZ,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_AZ %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.3 CA - California
- Los Angeles-Riverside-Orange County
- San Francisco-Oakland-San Jose
- San Diego
cu_emp_data_USMedical_CA = cu_data_USMedical %>%
filter( (area_name %in% c("Los Angeles-Riverside-Orange County", "San Francisco-Oakland-San Jose")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("San Diego")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
area_code = 'A421;A422;A424',
area_name = 'Los Angeles-Riverside-Orange County;San Francisco-Oakland-San Jose;San Diego',
industry = 'Medical',
state = 'CA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value/3),2))
cpi_emp_data_USMedical_CA = merge(EMP_DAT_CAT_SAN_CA, cu_emp_data_USMedical_CA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_CA %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.4 CO - Colorado : Denver-Boulder-Greeley
cu_emp_data_USMedical_CO = cu_data_USMedical %>%
filter(area_name=="Denver-Boulder-Greeley",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'CO') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_CO = merge(EMP_DAT_CAT_SAN_CO, cu_emp_data_USMedical_CO,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_CO %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.5 FL - Florida
- Miami-Fort Lauderdale
- Tampa-St. Petersburg-Clearwater
cu_emp_data_USMedical_FL = cu_data_USMedical %>%
filter( (area_name %in% c("Miami-Fort Lauderdale")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Tampa-St. Petersburg-Clearwater")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
area_code = 'A320;A321',
area_name = 'Miami-Fort Lauderdale;Tampa-St. Petersburg-Clearwater',
industry = 'Medical',
state = 'FL') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_FL = merge(EMP_DAT_CAT_SAN_FL, cu_emp_data_USMedical_FL,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_FL %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.6 GA - Georgia : Atlanta
cu_emp_data_USMedical_GA = cu_data_USMedical %>%
filter(area_name=="Atlanta",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'GA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_GA = merge(EMP_DAT_CAT_SAN_GA, cu_emp_data_USMedical_GA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_GA %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.7 HI - Hawaii : Honolulu
cu_emp_data_USMedical_HI = cu_data_USMedical %>%
filter(area_name=="Honolulu",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'HI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_HI = merge(EMP_DAT_CAT_SAN_HI, cu_emp_data_USMedical_HI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_HI %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.8 IL - Illinois : Chicago-Gary-Kenosha
cu_emp_data_USMedical_IL = cu_data_USMedical %>%
filter(area_name=="Chicago-Gary-Kenosha",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'IL') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_IL = merge(EMP_DAT_CAT_SAN_IL, cu_emp_data_USMedical_IL,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_IL %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.9 KS - Kansas : Kansas City
cu_emp_data_USMedical_KS = cu_data_USMedical %>%
filter(area_name=="Kansas City",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'KS') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_KS = merge(EMP_DAT_CAT_SAN_KS, cu_emp_data_USMedical_KS,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_KS %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.10 MA - Massachusetts : Boston-Brockton-Nashua
cu_emp_data_USMedical_MA = cu_data_USMedical %>%
filter(area_name=="Boston-Brockton-Nashua",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'MA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_MA = merge(EMP_DAT_CAT_SAN_MA, cu_emp_data_USMedical_MA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_MA %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.11 MI - Michigan : Detroit-Ann Arbor-Flint
cu_emp_data_USMedical_MI = cu_data_USMedical %>%
filter(area_name=="Detroit-Ann Arbor-Flint",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'MI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_MI = merge(EMP_DAT_CAT_SAN_MI, cu_emp_data_USMedical_MI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_MI %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.12 MN - Minnesota : Minneapolis-St. Paul
cu_emp_data_USMedical_MN = cu_data_USMedical %>%
filter(area_name=="Minneapolis-St. Paul",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'MN') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_MN = merge(EMP_DAT_CAT_SAN_MN, cu_emp_data_USMedical_MN,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_MN %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.13 MO - Missouri : St. Louis
cu_emp_data_USMedical_MO = cu_data_USMedical %>%
filter(area_name=="St. Louis",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'MO') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_MO = merge(EMP_DAT_CAT_SAN_MO, cu_emp_data_USMedical_MO,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_MO %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.14 NY - New York
cu_emp_data_USMedical_NY = cu_data_USMedical %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'NY') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USMedical_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_NY %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.15 OH - Ohio
- Cleveland-Akron
- Cincinnati-Hamilton
cu_emp_data_USMedical_OH = cu_data_USMedical %>%
filter( (area_name %in% c("Cleveland-Akron")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Cincinnati-Hamilton")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
area_code = 'A210;A213',
area_name = 'Cleveland-Akron;Cincinnati-Hamilton',
industry = 'Medical',
state = 'OH') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USMedical_OH = merge(EMP_DAT_CAT_SAN_OH, cu_emp_data_USMedical_OH,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_OH %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.16 OR - Oregon : Portland-Salem
cu_emp_data_USMedical_OR = cu_data_USMedical %>%
filter(area_name=="Portland-Salem",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'OR') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_OR = merge(EMP_DAT_CAT_SAN_OR, cu_emp_data_USMedical_OR,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_OR %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.17 PA - Pennsylvania
- Philadelphia-Wilmington-Atlantic City
- Pittsburgh
cu_emp_data_USMedical_PA = cu_data_USMedical %>%
filter( (area_name %in% c("Philadelphia-Wilmington-Atlantic City")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Pittsburgh")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
area_code = 'A102;A104',
area_name = 'Philadelphia-Wilmington-Atlantic City;Pittsburgh',
industry = 'Medical',
state = 'PA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USMedical_PA = merge(EMP_DAT_CAT_SAN_PA, cu_emp_data_USMedical_PA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_PA %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.18 TX - Texas
- Dallas-Fort Worth
- Houston-Galveston-Brazoria
cu_emp_data_USMedical_TX = cu_data_USMedical %>%
filter(area_name %in% c("Dallas-Fort Worth","Houston-Galveston-Brazoria"),
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
area_code = 'A316;A318',
area_name = 'Dallas-Fort Worth;Houston-Galveston-Brazoria',
industry = 'Medical',
state = 'TX') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USMedical_TX = merge(EMP_DAT_CAT_SAN_TX, cu_emp_data_USMedical_TX,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_TX %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.19 WA - Washington : Seattle-Tacoma-Bremerton
cu_emp_data_USMedical_WA = cu_data_USMedical %>%
filter(area_name=="Seattle-Tacoma-Bremerton",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'WA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_WA = merge(EMP_DAT_CAT_SAN_WA, cu_emp_data_USMedical_WA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_WA %>%
# sample_frac(0.33) %>%
DT::datatable()4.4.20 WI - Wisconsin : Milwaukee-Racine
cu_emp_data_USMedical_WI = cu_data_USMedical %>%
filter(area_name=="Milwaukee-Racine",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAM',
emp_line_code = 1600,
industry = 'Medical',
state = 'WI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_WI = merge(EMP_DAT_CAT_SAN_WI, cu_emp_data_USMedical_WI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_WI %>%
# sample_frac(0.33) %>%
DT::datatable()4.5 Education & Communication
4.5.1 AK - Alaska : Anchorage
cu_data_USEducationAndCommunication_AK = cu_data_USEducationAndCommunication %>%
filter(area_name=="Anchorage",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'AK') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_AK = merge(EMP_DAT_CAT_SAN_AK, cu_data_USEducationAndCommunication_AK,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_AK %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.2 AZ - Arizona : Phoenix-Mesa
cu_data_USEducationAndCommunication_AZ = cu_data_USEducationAndCommunication %>%
filter(area_name=="Phoenix-Mesa",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'AZ') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_AZ = merge(EMP_DAT_CAT_SAN_AZ, cu_data_USEducationAndCommunication_AZ,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_AZ %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.3 CA - California
- Los Angeles-Riverside-Orange County
- San Francisco-Oakland-San Jose
- San Diego
cu_data_USEducationAndCommunication_CA = cu_data_USEducationAndCommunication %>%
filter( (area_name %in% c("Los Angeles-Riverside-Orange County", "San Francisco-Oakland-San Jose")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("San Diego")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
area_code = 'A421;A422;A424',
area_name = 'Los Angeles-Riverside-Orange County;San Francisco-Oakland-San Jose;San Diego',
industry = 'Education & Communication',
state = 'CA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value/3),2))
cpi_emp_data_USEducationAndCommunication_CA = merge(EMP_DAT_CAT_SAN_CA, cu_data_USEducationAndCommunication_CA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_CA %>%
###sample_frac(0.33) %>%
DT::datatable()4.5.4 CO - Colorado : Denver-Boulder-Greeley
cu_data_USEducationAndCommunication_CO = cu_data_USEducationAndCommunication %>%
filter(area_name=="Denver-Boulder-Greeley",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'CO') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_CO = merge(EMP_DAT_CAT_SAN_CO, cu_data_USEducationAndCommunication_CO,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_CO %>%
###sample_frac(0.33) %>%
DT::datatable()4.5.5 FL - Florida
- Miami-Fort Lauderdale
- Tampa-St. Petersburg-Clearwater
cu_data_USEducationAndCommunication_FL = cu_data_USEducationAndCommunication %>%
filter( (area_name %in% c("Miami-Fort Lauderdale")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Tampa-St. Petersburg-Clearwater")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
area_code = 'A320;A321',
area_name = 'Miami-Fort Lauderdale;Tampa-St. Petersburg-Clearwater',
industry = 'Education & Communication',
state = 'FL') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_FL = merge(EMP_DAT_CAT_SAN_FL, cu_data_USEducationAndCommunication_FL,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_FL %>%
###sample_frac(0.33) %>%
DT::datatable()4.5.6 GA - Georgia : Atlanta
cu_data_USEducationAndCommunication_GA = cu_data_USEducationAndCommunication %>%
filter(area_name=="Atlanta",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'GA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_GA = merge(EMP_DAT_CAT_SAN_GA, cu_data_USEducationAndCommunication_GA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_GA %>%
###sample_frac(0.33) %>%
DT::datatable()4.5.7 HI - Hawaii : Honolulu
cu_data_USEducationAndCommunication_HI = cu_data_USEducationAndCommunication %>%
filter(area_name=="Honolulu",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'HI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_HI = merge(EMP_DAT_CAT_SAN_HI, cu_data_USEducationAndCommunication_HI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_HI %>%
###sample_frac(0.33) %>%
DT::datatable()4.5.8 IL - Illinois : Chicago-Gary-Kenosha
cu_data_USEducationAndCommunication_IL = cu_data_USEducationAndCommunication %>%
filter(area_name=="Chicago-Gary-Kenosha",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'IL') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_IL = merge(EMP_DAT_CAT_SAN_IL, cu_data_USEducationAndCommunication_IL,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_IL %>%
###sample_frac(0.33) %>%
DT::datatable()4.5.9 KS - Kansas : Kansas City
cu_data_USEducationAndCommunication_KS = cu_data_USEducationAndCommunication %>%
filter(area_name=="Kansas City",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'KS') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_KS = merge(EMP_DAT_CAT_SAN_KS, cu_data_USEducationAndCommunication_KS,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_KS %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.10 MA - Massachusetts : Boston-Brockton-Nashua
cu_data_USEducationAndCommunication_MA = cu_data_USEducationAndCommunication %>%
filter(area_name=="Boston-Brockton-Nashua",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'MA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_MA = merge(EMP_DAT_CAT_SAN_MA, cu_data_USEducationAndCommunication_MA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_MA %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.11 MI - Michigan
cu_data_USEducationAndCommunication_MI = cu_data_USEducationAndCommunication %>%
filter(area_name=="Detroit-Ann Arbor-Flint",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'MI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_MI = merge(EMP_DAT_CAT_SAN_MI, cu_data_USEducationAndCommunication_MI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_MI %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.12 MN - Minnesota : Minneapolis-St. Paul
cu_data_USEducationAndCommunication_MN = cu_data_USEducationAndCommunication %>%
filter(area_name=="Minneapolis-St. Paul",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'MN') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_MN = merge(EMP_DAT_CAT_SAN_MN, cu_data_USEducationAndCommunication_MN,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_MN %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.13 MO - Missouri : St. Louis
cu_data_USEducationAndCommunication_MO = cu_data_USEducationAndCommunication %>%
filter(area_name=="St. Louis",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'MO') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_MO = merge(EMP_DAT_CAT_SAN_MO, cu_data_USEducationAndCommunication_MO,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_MO %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.14 NY - New York
cu_data_USEducationAndCommunication_NY = cu_data_USEducationAndCommunication %>%
filter(area_name=="New York-Northern New Jersey-Long Island",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'NY') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_NY = merge(EMP_DAT_CAT_SAN_NY, cu_data_USEducationAndCommunication_NY,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_NY %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.15 OH - Ohio
- Cleveland-Akron
- Cincinnati-Hamilton
cu_data_USEducationAndCommunication_OH = cu_data_USEducationAndCommunication %>%
filter( (area_name %in% c("Cleveland-Akron")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Cincinnati-Hamilton")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
area_code = 'A210;A213',
area_name = 'Cleveland-Akron;Cincinnati-Hamilton',
industry = 'Education & Communication',
state = 'OH') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USEducationAndCommunication_OH = merge(EMP_DAT_CAT_SAN_OH, cu_data_USEducationAndCommunication_OH,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_OH %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.16 OR - Oregon : Portland-Salem
cu_data_USEducationAndCommunication_OR = cu_data_USEducationAndCommunication %>%
filter(area_name=="Portland-Salem",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'OR') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_OR = merge(EMP_DAT_CAT_SAN_OR, cu_data_USEducationAndCommunication_OR,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_OR %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.17 PA - Pennsylvania
- Philadelphia-Wilmington-Atlantic City
- Pittsburgh
cu_data_USEducationAndCommunication_PA = cu_data_USEducationAndCommunication %>%
filter( (area_name %in% c("Philadelphia-Wilmington-Atlantic City")
& period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
)
|
(area_name %in% c("Pittsburgh")
& period %in% c("S01","S02")
)
) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
area_code = 'A102;A104',
area_name = 'Philadelphia-Wilmington-Atlantic City;Pittsburgh',
industry = 'Education & Communication',
state = 'PA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USEducationAndCommunication_PA = merge(EMP_DAT_CAT_SAN_PA, cu_data_USEducationAndCommunication_PA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_PA %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.18 TX - Texas
- Dallas-Fort Worth
- Houston-Galveston-Brazoria
cu_data_USEducationAndCommunication_TX = cu_data_USEducationAndCommunication %>%
filter(area_name %in% c("Dallas-Fort Worth","Houston-Galveston-Brazoria"),
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
area_code = 'A316;A318',
area_name = 'Dallas-Fort Worth;Houston-Galveston-Brazoria',
industry = 'Education & Communication',
state = 'TX') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USEducationAndCommunication_TX = merge(EMP_DAT_CAT_SAN_TX, cu_data_USEducationAndCommunication_TX,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_TX %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.19 WA - Washington : Seattle-Tacoma-Bremerton
cu_data_USEducationAndCommunication_WA = cu_data_USEducationAndCommunication %>%
filter(area_name=="Seattle-Tacoma-Bremerton",
period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'WA') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_WA = merge(EMP_DAT_CAT_SAN_WA, cu_data_USEducationAndCommunication_WA,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_WA %>%
# sample_frac(0.33) %>%
DT::datatable()4.5.20 WI - Wisconsin : Milwaukee-Racine
cu_data_USEducationAndCommunication_WI = cu_data_USEducationAndCommunication %>%
filter(area_name=="Milwaukee-Racine",
period %in% c("S01","S02")) %>%
mutate(parent_item_code = 'SAE',
emp_line_code = 1500,
industry = 'Education & Communication',
state = 'WI') %>%
select(c("industry",
"state",
"parent_item_code",
"emp_line_code",
"series_id",
"series_title",
"year",
"period",
"period_abbr",
"period_name",
"value",
"area_code",
"area_name",
"area_display_level",
"area_selectable",
"area_sort_sequence",
"item_code",
"item_name",
"item_display_level",
"item_selectable",
"item_sort_sequence",
"seasonal",
"periodicity_code",
"periodicity_name",
"base_code",
"base_name",
"base_period")) %>%
group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>%
summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_WI = merge(EMP_DAT_CAT_SAN_WI, cu_data_USEducationAndCommunication_WI,
by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_WI %>%
# sample_frac(0.33) %>%
DT::datatable()4.6 Combined Employment and CPI Data
cpi_emp_data_Combined <- rbind(cpi_emp_data_USFoodBeverage_AK,
cpi_emp_data_USFoodBeverage_AZ,
cpi_emp_data_USFoodBeverage_CA,
cpi_emp_data_USFoodBeverage_CO,
cpi_emp_data_USFoodBeverage_FL,
cpi_emp_data_USFoodBeverage_GA,
cpi_emp_data_USFoodBeverage_HI,
cpi_emp_data_USFoodBeverage_IL,
cpi_emp_data_USFoodBeverage_KS,
cpi_emp_data_USFoodBeverage_MA,
cpi_emp_data_USFoodBeverage_MI,
cpi_emp_data_USFoodBeverage_MN,
cpi_emp_data_USFoodBeverage_MO,
cpi_emp_data_USFoodBeverage_NY,
cpi_emp_data_USFoodBeverage_OH,
cpi_emp_data_USFoodBeverage_OR,
cpi_emp_data_USFoodBeverage_PA,
cpi_emp_data_USFoodBeverage_TX,
cpi_emp_data_USFoodBeverage_WA,
cpi_emp_data_USFoodBeverage_WI,
cpi_emp_data_USHousing_AK,
cpi_emp_data_USHousing_AZ,
cpi_emp_data_USHousing_CA,
cpi_emp_data_USHousing_CO,
cpi_emp_data_USHousing_FL,
cpi_emp_data_USHousing_GA,
cpi_emp_data_USHousing_HI,
cpi_emp_data_USHousing_IL,
cpi_emp_data_USHousing_KS,
cpi_emp_data_USHousing_MA,
cpi_emp_data_USHousing_MI,
cpi_emp_data_USHousing_MN,
cpi_emp_data_USHousing_MO,
cpi_emp_data_USHousing_NY,
cpi_emp_data_USHousing_OH,
cpi_emp_data_USHousing_OR,
cpi_emp_data_USHousing_PA,
cpi_emp_data_USHousing_TX,
cpi_emp_data_USHousing_WA,
cpi_emp_data_USHousing_WI,
cpi_emp_data_USTransportation_AK,
cpi_emp_data_USTransportation_CA,
cpi_emp_data_USTransportation_CO,
cpi_emp_data_USTransportation_FL,
cpi_emp_data_USTransportation_GA,
cpi_emp_data_USTransportation_HI,
cpi_emp_data_USTransportation_IL,
cpi_emp_data_USTransportation_KS,
cpi_emp_data_USTransportation_MA,
cpi_emp_data_USTransportation_MI,
cpi_emp_data_USTransportation_MN,
cpi_emp_data_USTransportation_MO,
cpi_emp_data_USTransportation_NY,
cpi_emp_data_USTransportation_OH,
cpi_emp_data_USTransportation_OR,
cpi_emp_data_USTransportation_PA,
cpi_emp_data_USTransportation_TX,
cpi_emp_data_USTransportation_WA,
cpi_emp_data_USTransportation_WI,
cpi_emp_data_USMedical_AK,
cpi_emp_data_USMedical_AZ,
cpi_emp_data_USMedical_CA,
cpi_emp_data_USMedical_CO,
cpi_emp_data_USMedical_FL,
cpi_emp_data_USMedical_GA,
cpi_emp_data_USMedical_HI,
cpi_emp_data_USMedical_IL,
cpi_emp_data_USMedical_KS,
cpi_emp_data_USMedical_MA,
cpi_emp_data_USMedical_MI,
cpi_emp_data_USMedical_MN,
cpi_emp_data_USMedical_MO,
cpi_emp_data_USMedical_NY,
cpi_emp_data_USMedical_OH,
cpi_emp_data_USMedical_OR,
cpi_emp_data_USMedical_PA,
cpi_emp_data_USMedical_TX,
cpi_emp_data_USMedical_WA,
cpi_emp_data_USMedical_WI,
cpi_emp_data_USEducationAndCommunication_AK,
cpi_emp_data_USEducationAndCommunication_AZ,
cpi_emp_data_USEducationAndCommunication_CA,
cpi_emp_data_USEducationAndCommunication_CO,
cpi_emp_data_USEducationAndCommunication_FL,
cpi_emp_data_USEducationAndCommunication_GA,
cpi_emp_data_USEducationAndCommunication_HI,
cpi_emp_data_USEducationAndCommunication_IL,
cpi_emp_data_USEducationAndCommunication_KS,
cpi_emp_data_USEducationAndCommunication_MA,
cpi_emp_data_USEducationAndCommunication_MI,
cpi_emp_data_USEducationAndCommunication_MN,
cpi_emp_data_USEducationAndCommunication_MO,
cpi_emp_data_USEducationAndCommunication_NY,
cpi_emp_data_USEducationAndCommunication_OH,
cpi_emp_data_USEducationAndCommunication_OR,
cpi_emp_data_USEducationAndCommunication_PA,
cpi_emp_data_USEducationAndCommunication_TX,
cpi_emp_data_USEducationAndCommunication_WA,
cpi_emp_data_USEducationAndCommunication_WI)
cpi_emp_data_Combined %>%
sample_frac(0.33) %>%
DT::datatable()5 RAJWANT WORK
cpi_emp_data_USFoodBeverage_NY %>%
select(c("Year","Sector"= "Description","No_Jobs"="Number of jobs","cpi_value")) %>%
gather(key="Type", value ="Value", "cpi_value","No_Jobs") %>%
ggplot(mapping = aes(x= Year, y=(as.numeric(Value)),fill = Type,group = Type)) +
geom_col(position = position_dodge()) + geom_smooth(method = "lm")## `geom_smooth()` using formula 'y ~ x'
theme(axis.text.x = element_text(angle = 60, colour="gray",hjust = 1,size=rel(0.86))) +
labs(title = "USSFoodBeverage Data for No. Of Jobs and CPI Value in NY",
y="Sector", x="Years") ## List of 4
## $ axis.text.x:List of 11
## ..$ family : NULL
## ..$ face : NULL
## ..$ colour : chr "gray"
## ..$ size : 'rel' num 0.86
## ..$ hjust : num 1
## ..$ vjust : NULL
## ..$ angle : num 60
## ..$ lineheight : NULL
## ..$ margin : NULL
## ..$ debug : NULL
## ..$ inherit.blank: logi FALSE
## ..- attr(*, "class")= chr [1:2] "element_text" "element"
## $ y : chr "Sector"
## $ x : chr "Years"
## $ title : chr "USSFoodBeverage Data for No. Of Jobs and CPI Value in NY"
## - attr(*, "class")= chr [1:2] "theme" "gg"
## - attr(*, "complete")= logi FALSE
## - attr(*, "validate")= logi TRUE
cpi_emp_data_USFoodBeverage_NY %>%
select(c("Year","Sector"= "Description","No_Jobs"="Number of jobs")) %>%
ggplot(mapping = aes(x= Year, y=(as.numeric(No_Jobs)))) +
geom_col() +
theme(axis.text.x = element_text(angle = 60, colour="gray",hjust = 1,size=rel(0.86))) +
labs(title = "USSFoodBeverage Data for No. Of Jobs in NY",
y="Sector", x="Years") cpi_emp_data_USFoodBeverage_NY %>%
select(c("Year","Sector"= "Description","cpi_value")) %>%
ggplot(mapping = aes(x= Year, y=(as.numeric(cpi_value)))) +
geom_col() +
theme(axis.text.x = element_text(angle = 60, colour="gray",hjust = 1,size=rel(0.86))) +
labs(title = "USSFoodBeverage Data for CPI Value in NY",
y = "Sector",x= "Years") # summary(as.numeric(cpi_emp_data_USFoodBeverage_NY$`Number of jobs`))
# summary(as.numeric(cpi_emp_data_USFoodBeverage_NY$cpi_value))
#
# quantile(as.numeric(cpi_emp_data_USFoodBeverage_NY$`Number of jobs`))
# levels(quantile(as.numeric(cpi_emp_data_USFoodBeverage_NY$cpi_value)))5.1 Model
#library(DataExplorer)
#library(plot_histogram)
# library(lazytrade)
ALL_NY <- rbind(cpi_emp_data_USTransportation_NY,
cpi_emp_data_USMedical_NY,
cpi_emp_data_USHousing_NY,
cpi_emp_data_USFoodBeverage_NY,
cpi_emp_data_USEducationAndCommunication_NY)
ALL_NY <- cpi_emp_data_Combined %>% filter(GeoName=="New York")
#DataExplorer::plot_missing(ALL_NY, title = "Beverage Training Data: % Missing Values by Data Element")
ALL_NY$`Number of jobs` = as.numeric(ALL_NY$`Number of jobs`)
#plot_histogram(ALL_NY, ggtheme=theme_light())
#plot_correlation(ALL_NY[,c(2,5,9)])
ggplot(ALL_NY[,c("Year","Description","Number of jobs","cpi_value")],
mapping=aes(y= log(cpi_value),
x=Year,
color= Description,
group = Description)) + geom_smooth() + theme_classic()## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(ALL_NY[,c("Year","Description","Number of jobs","cpi_value")],
mapping=aes(y= log(`Number of jobs`),
x=Year,
color= Description,
group = Description)) + geom_smooth() + theme_classic()## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: package 'caret' was built under R version 3.5.3
## Loading required package: lattice
##
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
##
## lift
ALL_NY_M <- ALL_NY[,c("Year","Number of jobs","parent_item_code","cpi_value")]
names(ALL_NY_M) = c("Year", "Jobs", "C_Item","CPI")
# ALL_NY_M$Year = as.character(ALL_NY_M$Year)
# ALL_NY_M$E_Area = as.character(ALL_NY_M$E_Area)
set.seed(3456)
trainIndex <- createDataPartition(ALL_NY_M$Jobs, p = .8,
list = FALSE,
times = 1)
ALL_NY_train <- ALL_NY_M[ trainIndex,]
ALL_NY_test <- ALL_NY_M[-trainIndex,]
head(ALL_NY_M)#--NEW MODEL FOR ALL
# Change the col name
m_cpi_emp_data_Combined <- cpi_emp_data_Combined[,c("Year","Number of jobs","parent_item_code","cpi_value","GeoName")]
names(m_cpi_emp_data_Combined) = c("Year", "Jobs", "C_Item","CPI","GeoName")
m_cpi_emp_data_Combined$Jobs <- as.integer(m_cpi_emp_data_Combined$Jobs)
trainIndex <- createDataPartition(m_cpi_emp_data_Combined$Jobs, p = .8,
list = FALSE,
times = 1)
ALL_train <- m_cpi_emp_data_Combined[ trainIndex,]
ALL_test <- m_cpi_emp_data_Combined[-trainIndex,]
head(ALL_NY_M)5.2 For NY
# Train control
customTrainControl <- trainControl(method = "repeatedcv",
number = 10 ,
repeats = 5 ,
verboseIter = F)
# Linear Model
lm <- train(Jobs ~ .,
ALL_train,
method= 'lm',
trControl = customTrainControl)
lm$results##
## Call:
## lm(formula = .outcome ~ ., data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -407640 -66213 10575 69812 1267507
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.907e+05 2.746e+04 -6.945 5.63e-12 ***
## Year1999 -1.573e+03 2.636e+04 -0.060 0.952414
## Year2000 8.869e+03 2.611e+04 0.340 0.734092
## Year2001 1.996e+04 2.601e+04 0.767 0.442981
## Year2002 1.671e+04 2.623e+04 0.637 0.524278
## Year2003 3.784e+04 2.605e+04 1.453 0.146533
## Year2004 5.075e+04 2.614e+04 1.942 0.052371 .
## Year2005 5.782e+04 2.606e+04 2.219 0.026668 *
## Year2006 6.274e+04 2.620e+04 2.395 0.016762 *
## Year2007 7.301e+04 2.558e+04 2.854 0.004378 **
## Year2008 7.494e+04 2.599e+04 2.884 0.003987 **
## Year2009 7.479e+04 2.598e+04 2.879 0.004046 **
## Year2010 8.319e+04 2.625e+04 3.169 0.001563 **
## Year2011 9.647e+04 2.600e+04 3.711 0.000214 ***
## Year2012 8.986e+04 2.608e+04 3.446 0.000585 ***
## Year2013 1.179e+05 2.631e+04 4.481 8.00e-06 ***
## Year2014 1.113e+05 2.623e+04 4.243 2.34e-05 ***
## Year2015 1.283e+05 2.581e+04 4.969 7.50e-07 ***
## Year2016 1.322e+05 2.563e+04 5.157 2.84e-07 ***
## Year2017 1.794e+05 3.117e+04 5.756 1.04e-08 ***
## Year2018 NA NA NA NA
## C_ItemSAF -6.704e+03 1.390e+04 -0.482 0.629757
## C_ItemSAH 1.815e+05 1.900e+04 9.555 < 2e-16 ***
## C_ItemSAM 5.227e+05 1.322e+04 39.529 < 2e-16 ***
## C_ItemSAT 1.046e+05 1.603e+04 6.530 8.99e-11 ***
## CPI -4.621e+00 1.101e+00 -4.198 2.85e-05 ***
## GeoNameArizona 1.019e+05 3.000e+04 3.396 0.000703 ***
## GeoNameCalifornia 8.771e+05 2.741e+04 32.005 < 2e-16 ***
## GeoNameColorado 8.747e+04 2.688e+04 3.255 0.001160 **
## GeoNameFlorida 4.431e+05 2.657e+04 16.673 < 2e-16 ***
## GeoNameGeorgia 2.117e+05 2.795e+04 7.574 6.31e-14 ***
## GeoNameHawaii 1.434e+04 2.676e+04 0.536 0.592160
## GeoNameIllinois 3.554e+05 2.817e+04 12.613 < 2e-16 ***
## GeoNameKansas 4.313e+04 2.649e+04 1.628 0.103761
## GeoNameMassachusetts 2.281e+05 2.729e+04 8.358 < 2e-16 ***
## GeoNameMichigan 2.143e+05 2.319e+04 9.239 < 2e-16 ***
## GeoNameMissouri 1.203e+05 2.660e+04 4.521 6.64e-06 ***
## `GeoNameNew York` 6.158e+05 2.905e+04 21.198 < 2e-16 ***
## GeoNameOhio 2.812e+05 2.648e+04 10.619 < 2e-16 ***
## GeoNameOregon 7.990e+04 2.707e+04 2.952 0.003211 **
## GeoNamePennsylvania 3.555e+05 2.699e+04 13.172 < 2e-16 ***
## GeoNameTexas 5.624e+05 2.718e+04 20.690 < 2e-16 ***
## GeoNameWashington 1.721e+05 2.803e+04 6.139 1.06e-09 ***
## GeoNameWisconsin 1.116e+05 2.688e+04 4.153 3.47e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 163700 on 1492 degrees of freedom
## Multiple R-squared: 0.7709, Adjusted R-squared: 0.7644
## F-statistic: 119.5 on 42 and 1492 DF, p-value: < 2.2e-16
#--------------
# https://stackoverflow.com/questions/44200195/how-to-debug-contrasts-can-be-applied-only-to-factors-with-2-or-more-levels-er
ALL_train %>% dplyr::mutate_all(as.factor) %>% str## 'data.frame': 1535 obs. of 5 variables:
## $ Year : Factor w/ 21 levels "1998","1999",..: 1 2 5 6 7 8 9 10 11 13 ...
## $ Jobs : Factor w/ 1465 levels "3515","3779",..: 29 30 18 20 21 27 31 32 28 26 ...
## $ C_Item : Factor w/ 5 levels "SAE","SAF","SAH",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ CPI : Factor w/ 1529 levels "188.6","189.6",..: 423 426 443 459 476 487 494 508 528 523 ...
## $ GeoName: Factor w/ 19 levels "Alaska","Arizona",..: 1 1 1 1 1 1 1 1 1 1 ...
## [1] A427 A429 A421;A422;A424 A433 A320;A321
## [6] A319 A426 A207 A214 A103
## [11] A208 A211 A209 A101 A210;A213
## [16] A425 A102;A104 A316;A318 A423 A212
## 50 Levels: 0000 0100 0200 0300 0400 A000 A100 A101 A102 A103 A104 A200 ... A316;A318
##
## Call:
## lm(formula = Jobs ~ CPI + C_Item, data = ALL_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -616796 -91342 -25855 55837 1998242
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 119850.952 14762.062 8.119 9.59e-16 ***
## CPI 14.469 1.205 12.007 < 2e-16 ***
## C_ItemSAF -83340.521 21354.232 -3.903 9.92e-05 ***
## C_ItemSAH -51191.743 25599.996 -2.000 0.0457 *
## C_ItemSAM 516863.414 20820.848 24.824 < 2e-16 ***
## C_ItemSAT -48057.489 23294.045 -2.063 0.0393 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 258800 on 1529 degrees of freedom
## Multiple R-squared: 0.4133, Adjusted R-squared: 0.4114
## F-statistic: 215.4 on 5 and 1529 DF, p-value: < 2.2e-16
lm_cpi_item_name <- lm(Jobs ~ (CPI*GeoName)+CPI+GeoName+C_Item+Year,ALL_train)
summary(lm_cpi_item_name)##
## Call:
## lm(formula = Jobs ~ (CPI * GeoName) + CPI + GeoName + C_Item +
## Year, data = ALL_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -407490 -62836 3026 72716 1271991
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.910e+05 3.491e+04 -5.472 5.23e-08 ***
## CPI 1.433e+01 1.364e+01 1.051 0.293492
## GeoNameArizona 7.171e+04 4.517e+04 1.587 0.112648
## GeoNameCalifornia 8.679e+05 4.024e+04 21.567 < 2e-16 ***
## GeoNameColorado 3.870e+04 4.486e+04 0.863 0.388464
## GeoNameFlorida 4.131e+05 4.241e+04 9.740 < 2e-16 ***
## GeoNameGeorgia 1.698e+05 4.040e+04 4.204 2.78e-05 ***
## GeoNameHawaii -8.035e+03 4.237e+04 -0.190 0.849622
## GeoNameIllinois 3.576e+05 4.206e+04 8.501 < 2e-16 ***
## GeoNameKansas 2.774e+04 4.324e+04 0.642 0.521232
## GeoNameMassachusetts 2.594e+05 3.992e+04 6.497 1.12e-10 ***
## GeoNameMichigan 2.308e+05 3.515e+04 6.567 7.08e-11 ***
## GeoNameMissouri 1.218e+05 4.406e+04 2.765 0.005760 **
## GeoNameNew York 7.277e+05 4.235e+04 17.184 < 2e-16 ***
## GeoNameOhio 3.176e+05 4.062e+04 7.819 1.01e-14 ***
## GeoNameOregon 8.301e+04 4.367e+04 1.901 0.057543 .
## GeoNamePennsylvania 4.407e+05 4.042e+04 10.904 < 2e-16 ***
## GeoNameTexas 5.579e+05 3.992e+04 13.975 < 2e-16 ***
## GeoNameWashington 1.348e+05 4.012e+04 3.361 0.000796 ***
## GeoNameWisconsin 1.154e+05 4.457e+04 2.589 0.009723 **
## C_ItemSAF -3.107e+04 1.657e+04 -1.875 0.061044 .
## C_ItemSAH 1.141e+05 3.216e+04 3.550 0.000398 ***
## C_ItemSAM 5.183e+05 1.311e+04 39.533 < 2e-16 ***
## C_ItemSAT 5.946e+04 2.301e+04 2.585 0.009843 **
## Year1999 -2.255e+03 2.595e+04 -0.087 0.930781
## Year2000 8.081e+03 2.571e+04 0.314 0.753286
## Year2001 1.791e+04 2.565e+04 0.698 0.485132
## Year2002 1.513e+04 2.584e+04 0.586 0.558245
## Year2003 3.513e+04 2.569e+04 1.367 0.171750
## Year2004 4.723e+04 2.584e+04 1.828 0.067802 .
## Year2005 5.066e+04 2.587e+04 1.959 0.050350 .
## Year2006 5.429e+04 2.612e+04 2.078 0.037850 *
## Year2007 6.267e+04 2.560e+04 2.448 0.014487 *
## Year2008 6.103e+04 2.621e+04 2.329 0.019998 *
## Year2009 6.323e+04 2.601e+04 2.431 0.015194 *
## Year2010 6.870e+04 2.653e+04 2.589 0.009717 **
## Year2011 8.000e+04 2.661e+04 3.006 0.002691 **
## Year2012 6.975e+04 2.670e+04 2.613 0.009075 **
## Year2013 9.717e+04 2.711e+04 3.584 0.000349 ***
## Year2014 9.221e+04 2.702e+04 3.413 0.000661 ***
## Year2015 1.098e+05 2.644e+04 4.150 3.51e-05 ***
## Year2016 1.166e+05 2.602e+04 4.482 7.96e-06 ***
## Year2017 1.829e+05 3.080e+04 5.937 3.62e-09 ***
## CPI:GeoNameArizona 3.855e+01 2.531e+01 1.523 0.127874
## CPI:GeoNameCalifornia -1.353e+01 1.308e+01 -1.034 0.301371
## CPI:GeoNameColorado 2.594e+01 1.825e+01 1.421 0.155422
## CPI:GeoNameFlorida -7.427e+00 1.306e+01 -0.569 0.569519
## CPI:GeoNameGeorgia -1.048e+01 1.313e+01 -0.798 0.424985
## CPI:GeoNameHawaii 8.646e+00 1.559e+01 0.554 0.579321
## CPI:GeoNameIllinois -1.598e+01 1.314e+01 -1.216 0.224167
## CPI:GeoNameKansas 7.870e+00 1.687e+01 0.466 0.640991
## CPI:GeoNameMassachusetts -1.829e+01 1.314e+01 -1.392 0.163991
## CPI:GeoNameMichigan -1.497e+01 1.317e+01 -1.137 0.255789
## CPI:GeoNameMissouri -1.420e-01 1.745e+01 -0.008 0.993507
## CPI:GeoNameNew York -2.437e+01 1.316e+01 -1.852 0.064274 .
## CPI:GeoNameOhio -1.867e+01 1.304e+01 -1.432 0.152361
## CPI:GeoNameOregon -2.130e+00 1.711e+01 -0.124 0.900938
## CPI:GeoNamePennsylvania -2.741e+01 1.303e+01 -2.103 0.035627 *
## CPI:GeoNameTexas -1.397e+01 1.309e+01 -1.067 0.286269
## CPI:GeoNameWashington -1.182e+01 1.318e+01 -0.897 0.369977
## CPI:GeoNameWisconsin -1.710e+00 1.757e+01 -0.097 0.922514
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 161000 on 1474 degrees of freedom
## Multiple R-squared: 0.781, Adjusted R-squared: 0.772
## F-statistic: 87.59 on 60 and 1474 DF, p-value: < 2.2e-16
lm_cpi_item_name2 <- lm(Jobs ~ (CPI*GeoName*C_Item)+CPI+GeoName+C_Item+Year,ALL_train)
summary(lm_cpi_item_name2)##
## Call:
## lm(formula = Jobs ~ (CPI * GeoName * C_Item) + CPI + GeoName +
## C_Item + Year, data = ALL_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -416037 -9110 1027 10626 541396
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.285e+03 2.737e+04 0.266 0.790118
## CPI -1.483e+02 7.366e+01 -2.014 0.044241 *
## GeoNameArizona 1.942e+04 4.026e+04 0.482 0.629669
## GeoNameCalifornia 2.796e+05 3.701e+04 7.555 7.75e-14 ***
## GeoNameColorado 3.421e+04 3.965e+04 0.863 0.388391
## GeoNameFlorida 8.485e+04 3.709e+04 2.288 0.022301 *
## GeoNameGeorgia 4.737e+04 3.820e+04 1.240 0.215207
## GeoNameHawaii 1.631e+04 3.942e+04 0.414 0.679145
## GeoNameIllinois 1.256e+05 3.928e+04 3.199 0.001413 **
## GeoNameKansas 2.371e+04 3.842e+04 0.617 0.537309
## GeoNameMassachusetts 1.575e+05 3.853e+04 4.088 4.62e-05 ***
## GeoNameMichigan 4.890e+04 3.030e+04 1.614 0.106761
## GeoNameMissouri 7.024e+04 4.168e+04 1.685 0.092185 .
## GeoNameNew York 3.015e+05 4.093e+04 7.367 3.06e-13 ***
## GeoNameOhio 8.769e+04 3.661e+04 2.396 0.016733 *
## GeoNameOregon 4.000e+04 3.857e+04 1.037 0.299910
## GeoNamePennsylvania 2.072e+05 3.810e+04 5.437 6.43e-08 ***
## GeoNameTexas 1.161e+05 3.851e+04 3.013 0.002632 **
## GeoNameWashington 5.023e+04 3.836e+04 1.310 0.190573
## GeoNameWisconsin 3.952e+04 3.979e+04 0.993 0.320729
## C_ItemSAF 3.035e+05 1.340e+05 2.265 0.023669 *
## C_ItemSAH 1.179e+05 6.757e+04 1.745 0.081197 .
## C_ItemSAM 1.101e+05 6.695e+04 1.644 0.100449
## C_ItemSAT 6.349e+04 5.383e+04 1.179 0.238468
## Year1999 1.768e+03 8.438e+03 0.210 0.834089
## Year2000 3.938e+03 8.370e+03 0.470 0.638085
## Year2001 9.097e+03 8.370e+03 1.087 0.277273
## Year2002 1.525e+04 8.443e+03 1.807 0.071059 .
## Year2003 1.881e+04 8.468e+03 2.221 0.026546 *
## Year2004 2.551e+04 8.566e+03 2.978 0.002954 **
## Year2005 3.633e+04 8.691e+03 4.181 3.10e-05 ***
## Year2006 4.898e+04 8.894e+03 5.507 4.39e-08 ***
## Year2007 5.561e+04 8.892e+03 6.254 5.38e-10 ***
## Year2008 6.002e+04 9.368e+03 6.407 2.06e-10 ***
## Year2009 5.763e+04 9.216e+03 6.254 5.40e-10 ***
## Year2010 6.762e+04 1.000e+04 6.759 2.07e-11 ***
## Year2011 7.804e+04 1.028e+04 7.589 6.06e-14 ***
## Year2012 8.392e+04 1.055e+04 7.953 3.87e-15 ***
## Year2013 9.349e+04 1.078e+04 8.674 < 2e-16 ***
## Year2014 1.010e+05 1.081e+04 9.343 < 2e-16 ***
## Year2015 1.100e+05 1.057e+04 10.405 < 2e-16 ***
## Year2016 1.168e+05 1.042e+04 11.203 < 2e-16 ***
## Year2017 2.031e+05 1.082e+04 18.775 < 2e-16 ***
## CPI:GeoNameArizona 7.433e+01 9.798e+01 0.759 0.448244
## CPI:GeoNameCalifornia 2.010e+02 7.537e+01 2.667 0.007741 **
## CPI:GeoNameColorado 4.522e+01 1.017e+02 0.445 0.656544
## CPI:GeoNameFlorida 1.603e+02 7.515e+01 2.133 0.033086 *
## CPI:GeoNameGeorgia 1.314e+02 7.593e+01 1.731 0.083683 .
## CPI:GeoNameHawaii 8.794e+00 9.444e+01 0.093 0.925823
## CPI:GeoNameIllinois 1.432e+02 7.398e+01 1.935 0.053176 .
## CPI:GeoNameKansas -9.789e+00 9.854e+01 -0.099 0.920886
## CPI:GeoNameMassachusetts 1.318e+02 7.563e+01 1.742 0.081745 .
## CPI:GeoNameMichigan 1.270e+02 7.453e+01 1.704 0.088531 .
## CPI:GeoNameMissouri 5.274e+00 9.974e+01 0.053 0.957839
## CPI:GeoNameNew York 1.624e+02 7.412e+01 2.191 0.028638 *
## CPI:GeoNameOhio 1.193e+02 7.927e+01 1.506 0.132412
## CPI:GeoNameOregon 4.081e+00 1.007e+02 0.041 0.967675
## CPI:GeoNamePennsylvania 1.314e+02 7.862e+01 1.671 0.094878 .
## CPI:GeoNameTexas 1.599e+02 7.611e+01 2.102 0.035785 *
## CPI:GeoNameWashington 1.097e+02 7.566e+01 1.450 0.147423
## CPI:GeoNameWisconsin 4.538e+01 9.549e+01 0.475 0.634665
## CPI:C_ItemSAF -5.353e+01 1.021e+02 -0.524 0.600195
## CPI:C_ItemSAH 1.109e+02 7.445e+01 1.489 0.136606
## CPI:C_ItemSAM -2.205e+01 1.055e+02 -0.209 0.834406
## CPI:C_ItemSAT 1.142e+02 7.467e+01 1.529 0.126550
## GeoNameArizona:C_ItemSAF -3.622e+04 2.115e+05 -0.171 0.864081
## GeoNameCalifornia:C_ItemSAF -9.592e+04 1.655e+05 -0.580 0.562265
## GeoNameColorado:C_ItemSAF -4.518e+04 1.894e+05 -0.239 0.811528
## GeoNameFlorida:C_ItemSAF 2.306e+04 1.716e+05 0.134 0.893112
## GeoNameGeorgia:C_ItemSAF -3.506e+05 1.528e+05 -2.295 0.021898 *
## GeoNameHawaii:C_ItemSAF -1.552e+05 1.568e+05 -0.989 0.322635
## GeoNameIllinois:C_ItemSAF -4.041e+04 1.674e+05 -0.241 0.809339
## GeoNameKansas:C_ItemSAF -1.194e+05 1.578e+05 -0.757 0.449065
## GeoNameMassachusetts:C_ItemSAF -3.713e+05 1.521e+05 -2.441 0.014790 *
## GeoNameMichigan:C_ItemSAF -3.140e+05 1.362e+05 -2.305 0.021308 *
## GeoNameMissouri:C_ItemSAF -1.164e+05 1.701e+05 -0.684 0.494116
## GeoNameNew York:C_ItemSAF -2.552e+05 1.631e+05 -1.565 0.117903
## GeoNameOhio:C_ItemSAF -3.286e+05 1.505e+05 -2.183 0.029198 *
## GeoNameOregon:C_ItemSAF -1.069e+05 1.650e+05 -0.648 0.517102
## GeoNamePennsylvania:C_ItemSAF -4.450e+05 1.491e+05 -2.985 0.002885 **
## GeoNameTexas:C_ItemSAF -2.365e+05 1.520e+05 -1.555 0.120123
## GeoNameWashington:C_ItemSAF -3.610e+05 1.517e+05 -2.380 0.017455 *
## GeoNameWisconsin:C_ItemSAF -5.879e+04 1.651e+05 -0.356 0.721774
## GeoNameArizona:C_ItemSAH 1.376e+05 1.840e+05 0.747 0.454941
## GeoNameCalifornia:C_ItemSAH 3.193e+04 9.898e+04 0.323 0.747045
## GeoNameColorado:C_ItemSAH 4.021e+04 1.270e+05 0.317 0.751523
## GeoNameFlorida:C_ItemSAH 6.557e+04 9.444e+04 0.694 0.487647
## GeoNameGeorgia:C_ItemSAH -1.074e+05 9.968e+04 -1.077 0.281707
## GeoNameHawaii:C_ItemSAH -2.973e+04 8.913e+04 -0.334 0.738754
## GeoNameIllinois:C_ItemSAH -1.343e+05 1.080e+05 -1.243 0.213966
## GeoNameKansas:C_ItemSAH 7.005e+04 1.206e+05 0.581 0.561307
## GeoNameMassachusetts:C_ItemSAH -2.076e+05 9.736e+04 -2.132 0.033185 *
## GeoNameMichigan:C_ItemSAH -1.302e+04 7.072e+04 -0.184 0.853994
## GeoNameMissouri:C_ItemSAH 2.351e+04 1.138e+05 0.207 0.836335
## GeoNameNew York:C_ItemSAH -1.467e+05 9.735e+04 -1.506 0.132186
## GeoNameOhio:C_ItemSAH -1.276e+05 1.010e+05 -1.263 0.206671
## GeoNameOregon:C_ItemSAH 2.515e+04 1.208e+05 0.208 0.835074
## GeoNamePennsylvania:C_ItemSAH -2.352e+05 9.595e+04 -2.452 0.014340 *
## GeoNameTexas:C_ItemSAH 6.289e+04 1.010e+05 0.623 0.533598
## GeoNameWashington:C_ItemSAH 5.450e+03 1.044e+05 0.052 0.958393
## GeoNameWisconsin:C_ItemSAH 4.471e+04 1.243e+05 0.360 0.719062
## GeoNameArizona:C_ItemSAM 4.398e+02 1.238e+05 0.004 0.997165
## GeoNameCalifornia:C_ItemSAM 1.526e+06 9.691e+04 15.751 < 2e-16 ***
## GeoNameColorado:C_ItemSAM 4.276e+04 9.460e+04 0.452 0.651292
## GeoNameFlorida:C_ItemSAM 5.123e+05 8.851e+04 5.788 8.85e-09 ***
## GeoNameGeorgia:C_ItemSAM 9.187e+04 9.681e+04 0.949 0.342793
## GeoNameHawaii:C_ItemSAM -4.336e+04 8.778e+04 -0.494 0.621404
## GeoNameIllinois:C_ItemSAM 3.361e+05 8.951e+04 3.755 0.000181 ***
## GeoNameKansas:C_ItemSAM 1.510e+05 1.191e+05 1.268 0.205068
## GeoNameMassachusetts:C_ItemSAM 6.510e+04 9.395e+04 0.693 0.488500
## GeoNameMichigan:C_ItemSAM 3.716e+05 7.140e+04 5.204 2.25e-07 ***
## GeoNameMissouri:C_ItemSAM 1.297e+05 9.674e+04 1.341 0.180128
## GeoNameNew York:C_ItemSAM 6.133e+05 9.517e+04 6.445 1.61e-10 ***
## GeoNameOhio:C_ItemSAM 3.684e+05 9.054e+04 4.069 5.00e-05 ***
## GeoNameOregon:C_ItemSAM 4.647e+04 9.094e+04 0.511 0.609416
## GeoNamePennsylvania:C_ItemSAM 4.074e+05 8.897e+04 4.579 5.11e-06 ***
## GeoNameTexas:C_ItemSAM 3.713e+05 8.919e+04 4.163 3.35e-05 ***
## GeoNameWashington:C_ItemSAM 7.977e+04 9.302e+04 0.858 0.391271
## GeoNameWisconsin:C_ItemSAM 1.784e+05 9.065e+04 1.968 0.049236 *
## GeoNameArizona:C_ItemSAT NA NA NA NA
## GeoNameCalifornia:C_ItemSAT 2.267e+05 7.942e+04 2.855 0.004376 **
## GeoNameColorado:C_ItemSAT 4.114e+04 7.961e+04 0.517 0.605407
## GeoNameFlorida:C_ItemSAT 1.608e+05 7.035e+04 2.286 0.022403 *
## GeoNameGeorgia:C_ItemSAT 6.676e+04 7.027e+04 0.950 0.342214
## GeoNameHawaii:C_ItemSAT -4.056e+03 7.926e+04 -0.051 0.959190
## GeoNameIllinois:C_ItemSAT 9.727e+04 7.058e+04 1.378 0.168366
## GeoNameKansas:C_ItemSAT 1.367e+03 7.757e+04 0.018 0.985939
## GeoNameMassachusetts:C_ItemSAT -1.306e+05 7.210e+04 -1.811 0.070360 .
## GeoNameMichigan:C_ItemSAT -2.643e+04 5.749e+04 -0.460 0.645810
## GeoNameMissouri:C_ItemSAT 4.436e+04 7.604e+04 0.583 0.559736
## GeoNameNew York:C_ItemSAT -4.222e+04 7.414e+04 -0.569 0.569164
## GeoNameOhio:C_ItemSAT 4.198e+04 6.920e+04 0.607 0.544159
## GeoNameOregon:C_ItemSAT -2.353e+03 7.868e+04 -0.030 0.976145
## GeoNamePennsylvania:C_ItemSAT -7.067e+04 7.802e+04 -0.906 0.365181
## GeoNameTexas:C_ItemSAT 2.694e+05 7.013e+04 3.841 0.000128 ***
## GeoNameWashington:C_ItemSAT -2.090e+04 7.091e+04 -0.295 0.768256
## GeoNameWisconsin:C_ItemSAT 4.236e+04 8.219e+04 0.515 0.606337
## CPI:GeoNameArizona:C_ItemSAF -1.167e+02 1.783e+02 -0.655 0.512897
## CPI:GeoNameCalifornia:C_ItemSAF -2.401e+01 1.046e+02 -0.230 0.818480
## CPI:GeoNameColorado:C_ItemSAF -3.499e+00 1.416e+02 -0.025 0.980284
## CPI:GeoNameFlorida:C_ItemSAF -1.412e+01 1.058e+02 -0.133 0.893872
## CPI:GeoNameGeorgia:C_ItemSAF 7.318e+01 1.048e+02 0.699 0.484986
## CPI:GeoNameHawaii:C_ItemSAF 9.205e+01 1.246e+02 0.738 0.460364
## CPI:GeoNameIllinois:C_ItemSAF 3.229e+01 1.029e+02 0.314 0.753746
## CPI:GeoNameKansas:C_ItemSAF 9.394e+01 1.283e+02 0.732 0.464160
## CPI:GeoNameMassachusetts:C_ItemSAF 6.235e+01 1.044e+02 0.597 0.550448
## CPI:GeoNameMichigan:C_ItemSAF 7.058e+01 1.032e+02 0.684 0.494186
## CPI:GeoNameMissouri:C_ItemSAF 6.318e+01 1.334e+02 0.474 0.635728
## CPI:GeoNameNew York:C_ItemSAF 2.452e+01 1.029e+02 0.238 0.811745
## CPI:GeoNameOhio:C_ItemSAF 7.883e+01 1.079e+02 0.730 0.465269
## CPI:GeoNameOregon:C_ItemSAF 6.407e+01 1.332e+02 0.481 0.630570
## CPI:GeoNamePennsylvania:C_ItemSAF 7.651e+01 1.074e+02 0.712 0.476502
## CPI:GeoNameTexas:C_ItemSAF 3.844e+01 1.049e+02 0.366 0.714191
## CPI:GeoNameWashington:C_ItemSAF 9.269e+01 1.045e+02 0.887 0.375089
## CPI:GeoNameWisconsin:C_ItemSAF 1.698e+01 1.279e+02 0.133 0.894435
## CPI:GeoNameArizona:C_ItemSAH -9.152e+01 1.144e+02 -0.800 0.423928
## CPI:GeoNameCalifornia:C_ItemSAH -1.408e+02 7.642e+01 -1.843 0.065608 .
## CPI:GeoNameColorado:C_ItemSAH -2.947e+01 1.058e+02 -0.279 0.780639
## CPI:GeoNameFlorida:C_ItemSAH -1.071e+02 7.631e+01 -1.403 0.160883
## CPI:GeoNameGeorgia:C_ItemSAH -9.055e+01 7.697e+01 -1.176 0.239657
## CPI:GeoNameHawaii:C_ItemSAH 2.760e+00 9.596e+01 0.029 0.977059
## CPI:GeoNameIllinois:C_ItemSAH -1.022e+02 7.499e+01 -1.363 0.173069
## CPI:GeoNameKansas:C_ItemSAH -7.063e+00 1.023e+02 -0.069 0.944953
## CPI:GeoNameMassachusetts:C_ItemSAH -9.327e+01 7.664e+01 -1.217 0.223836
## CPI:GeoNameMichigan:C_ItemSAH -8.994e+01 7.543e+01 -1.192 0.233356
## CPI:GeoNameMissouri:C_ItemSAH -3.424e+00 1.030e+02 -0.033 0.973479
## CPI:GeoNameNew York:C_ItemSAH -1.185e+02 7.508e+01 -1.578 0.114704
## CPI:GeoNameOhio:C_ItemSAH -7.441e+01 8.051e+01 -0.924 0.355474
## CPI:GeoNameOregon:C_ItemSAH 9.579e-01 1.040e+02 0.009 0.992651
## CPI:GeoNamePennsylvania:C_ItemSAH -8.797e+01 7.978e+01 -1.103 0.270376
## CPI:GeoNameTexas:C_ItemSAH -1.126e+02 7.719e+01 -1.459 0.144813
## CPI:GeoNameWashington:C_ItemSAH -7.491e+01 7.669e+01 -0.977 0.328869
## CPI:GeoNameWisconsin:C_ItemSAH -4.547e+01 9.970e+01 -0.456 0.648453
## CPI:GeoNameArizona:C_ItemSAM 5.445e+02 3.876e+02 1.405 0.160328
## CPI:GeoNameCalifornia:C_ItemSAM -3.311e+01 1.113e+02 -0.297 0.766136
## CPI:GeoNameColorado:C_ItemSAM 1.434e+02 1.467e+02 0.977 0.328708
## CPI:GeoNameFlorida:C_ItemSAM 1.179e+02 1.088e+02 1.084 0.278632
## CPI:GeoNameGeorgia:C_ItemSAM 1.066e+02 1.128e+02 0.945 0.344581
## CPI:GeoNameHawaii:C_ItemSAM 2.990e+01 1.510e+02 0.198 0.843052
## CPI:GeoNameIllinois:C_ItemSAM 6.160e+01 1.066e+02 0.578 0.563470
## CPI:GeoNameKansas:C_ItemSAM -1.002e+02 2.022e+02 -0.496 0.620202
## CPI:GeoNameMassachusetts:C_ItemSAM 9.380e+01 1.093e+02 0.858 0.390835
## CPI:GeoNameMichigan:C_ItemSAM 5.383e+01 1.073e+02 0.502 0.615894
## CPI:GeoNameMissouri:C_ItemSAM 1.736e+02 1.567e+02 1.108 0.268081
## CPI:GeoNameNew York:C_ItemSAM 9.626e+01 1.070e+02 0.900 0.368372
## CPI:GeoNameOhio:C_ItemSAM 1.718e+02 1.185e+02 1.449 0.147491
## CPI:GeoNameOregon:C_ItemSAM 1.354e+02 1.430e+02 0.947 0.344056
## CPI:GeoNamePennsylvania:C_ItemSAM 1.409e+02 1.153e+02 1.222 0.221831
## CPI:GeoNameTexas:C_ItemSAM 3.251e+02 1.109e+02 2.930 0.003447 **
## CPI:GeoNameWashington:C_ItemSAM 9.958e+01 1.125e+02 0.885 0.376319
## CPI:GeoNameWisconsin:C_ItemSAM 1.057e+02 1.438e+02 0.735 0.462474
## CPI:GeoNameArizona:C_ItemSAT NA NA NA NA
## CPI:GeoNameCalifornia:C_ItemSAT -1.665e+02 7.671e+01 -2.170 0.030180 *
## CPI:GeoNameColorado:C_ItemSAT -5.181e+01 1.046e+02 -0.495 0.620414
## CPI:GeoNameFlorida:C_ItemSAT -1.317e+02 7.648e+01 -1.721 0.085410 .
## CPI:GeoNameGeorgia:C_ItemSAT -9.928e+01 7.716e+01 -1.287 0.198442
## CPI:GeoNameHawaii:C_ItemSAT -9.898e+00 9.689e+01 -0.102 0.918647
## CPI:GeoNameIllinois:C_ItemSAT -1.112e+02 7.516e+01 -1.480 0.139118
## CPI:GeoNameKansas:C_ItemSAT 1.249e+01 1.008e+02 0.124 0.901465
## CPI:GeoNameMassachusetts:C_ItemSAT -1.013e+02 7.687e+01 -1.318 0.187642
## CPI:GeoNameMichigan:C_ItemSAT -9.432e+01 7.567e+01 -1.246 0.212805
## CPI:GeoNameMissouri:C_ItemSAT -1.095e+01 1.020e+02 -0.107 0.914510
## CPI:GeoNameNew York:C_ItemSAT -1.310e+02 7.532e+01 -1.740 0.082109 .
## CPI:GeoNameOhio:C_ItemSAT -8.882e+01 8.058e+01 -1.102 0.270534
## CPI:GeoNameOregon:C_ItemSAT -2.637e+00 1.030e+02 -0.026 0.979570
## CPI:GeoNamePennsylvania:C_ItemSAT -9.817e+01 8.003e+01 -1.227 0.220171
## CPI:GeoNameTexas:C_ItemSAT -1.249e+02 7.736e+01 -1.615 0.106630
## CPI:GeoNameWashington:C_ItemSAT -7.795e+01 7.687e+01 -1.014 0.310751
## CPI:GeoNameWisconsin:C_ItemSAT -4.298e+01 9.830e+01 -0.437 0.662058
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 51690 on 1328 degrees of freedom
## Multiple R-squared: 0.9797, Adjusted R-squared: 0.9765
## F-statistic: 310.7 on 206 and 1328 DF, p-value: < 2.2e-16
lm_cpi_item_name3 <- lm(Jobs ~ (CPI*GeoName) + (GeoName*C_Item)+CPI+GeoName+C_Item+Year,ALL_train)
summary(lm_cpi_item_name3)##
## Call:
## lm(formula = Jobs ~ (CPI * GeoName) + (GeoName * C_Item) + CPI +
## GeoName + C_Item + Year, data = ALL_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -411031 -15334 343 17811 571733
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.959e+04 1.588e+04 -1.863 0.062699 .
## CPI -4.396e+01 1.111e+01 -3.955 8.03e-05 ***
## GeoNameArizona 5.196e+04 2.784e+04 1.867 0.062172 .
## GeoNameCalifornia 3.608e+05 2.042e+04 17.668 < 2e-16 ***
## GeoNameColorado 4.682e+04 2.276e+04 2.057 0.039846 *
## GeoNameFlorida 1.313e+05 2.095e+04 6.265 4.94e-10 ***
## GeoNameGeorgia 6.456e+04 2.061e+04 3.133 0.001766 **
## GeoNameHawaii 6.451e+03 2.149e+04 0.300 0.764092
## GeoNameIllinois 1.509e+05 2.094e+04 7.205 9.46e-13 ***
## GeoNameKansas 1.895e+04 2.132e+04 0.889 0.374296
## GeoNameMassachusetts 1.752e+05 2.064e+04 8.490 < 2e-16 ***
## GeoNameMichigan 6.712e+04 1.767e+04 3.798 0.000152 ***
## GeoNameMissouri 6.410e+04 2.225e+04 2.881 0.004029 **
## GeoNameNew York 3.666e+05 2.202e+04 16.651 < 2e-16 ***
## GeoNameOhio 1.049e+05 2.030e+04 5.167 2.73e-07 ***
## GeoNameOregon 3.811e+04 2.097e+04 1.818 0.069317 .
## GeoNamePennsylvania 2.257e+05 2.038e+04 11.074 < 2e-16 ***
## GeoNameTexas 1.598e+05 2.063e+04 7.748 1.78e-14 ***
## GeoNameWashington 4.028e+04 2.029e+04 1.985 0.047332 *
## GeoNameWisconsin 4.920e+04 2.241e+04 2.196 0.028282 *
## C_ItemSAF 5.458e+04 2.586e+04 2.111 0.034973 *
## C_ItemSAH 1.803e+05 4.987e+04 3.615 0.000311 ***
## C_ItemSAM 4.590e+04 2.144e+04 2.141 0.032447 *
## C_ItemSAT 1.261e+05 3.568e+04 3.534 0.000423 ***
## Year1999 2.520e+03 9.228e+03 0.273 0.784834
## Year2000 4.846e+03 9.145e+03 0.530 0.596281
## Year2001 1.208e+04 9.129e+03 1.323 0.186081
## Year2002 1.932e+04 9.184e+03 2.104 0.035542 *
## Year2003 2.242e+04 9.163e+03 2.447 0.014534 *
## Year2004 2.822e+04 9.228e+03 3.058 0.002268 **
## Year2005 4.095e+04 9.271e+03 4.417 1.08e-05 ***
## Year2006 5.384e+04 9.394e+03 5.731 1.22e-08 ***
## Year2007 6.092e+04 9.240e+03 6.593 6.08e-11 ***
## Year2008 6.393e+04 9.526e+03 6.711 2.80e-11 ***
## Year2009 6.340e+04 9.386e+03 6.755 2.09e-11 ***
## Year2010 7.009e+04 9.655e+03 7.259 6.41e-13 ***
## Year2011 7.844e+04 9.791e+03 8.011 2.37e-15 ***
## Year2012 8.623e+04 9.850e+03 8.754 < 2e-16 ***
## Year2013 9.791e+04 1.000e+04 9.792 < 2e-16 ***
## Year2014 1.033e+05 9.990e+03 10.343 < 2e-16 ***
## Year2015 1.149e+05 9.695e+03 11.856 < 2e-16 ***
## Year2016 1.226e+05 9.533e+03 12.862 < 2e-16 ***
## Year2017 1.773e+05 1.126e+04 15.748 < 2e-16 ***
## CPI:GeoNameArizona -2.885e+01 4.674e+01 -0.617 0.537259
## CPI:GeoNameCalifornia 5.796e+01 1.131e+01 5.124 3.41e-07 ***
## CPI:GeoNameColorado 2.159e+00 1.930e+01 0.112 0.910947
## CPI:GeoNameFlorida 4.743e+01 1.148e+01 4.131 3.83e-05 ***
## CPI:GeoNameGeorgia 4.316e+01 1.120e+01 3.855 0.000121 ***
## CPI:GeoNameHawaii 8.387e+00 1.413e+01 0.594 0.552841
## CPI:GeoNameIllinois 4.213e+01 1.116e+01 3.775 0.000167 ***
## CPI:GeoNameKansas -7.637e+00 1.702e+01 -0.449 0.653812
## CPI:GeoNameMassachusetts 4.174e+01 1.121e+01 3.725 0.000203 ***
## CPI:GeoNameMichigan 4.296e+01 1.112e+01 3.862 0.000118 ***
## CPI:GeoNameMissouri -2.867e+00 1.691e+01 -0.170 0.865397
## CPI:GeoNameNew York 4.594e+01 1.116e+01 4.118 4.05e-05 ***
## CPI:GeoNameOhio 4.177e+01 1.145e+01 3.647 0.000275 ***
## CPI:GeoNameOregon 1.307e+00 1.696e+01 0.077 0.938595
## CPI:GeoNamePennsylvania 4.563e+01 1.157e+01 3.945 8.37e-05 ***
## CPI:GeoNameTexas 4.824e+01 1.122e+01 4.298 1.84e-05 ***
## CPI:GeoNameWashington 4.089e+01 1.118e+01 3.659 0.000262 ***
## CPI:GeoNameWisconsin -1.978e+00 1.822e+01 -0.109 0.913544
## GeoNameArizona:C_ItemSAF 8.115e+02 5.065e+04 0.016 0.987221
## GeoNameCalifornia:C_ItemSAF -1.848e+05 3.645e+04 -5.072 4.47e-07 ***
## GeoNameColorado:C_ItemSAF -7.257e+03 4.251e+04 -0.171 0.864463
## GeoNameFlorida:C_ItemSAF -2.525e+04 3.500e+04 -0.721 0.470841
## GeoNameGeorgia:C_ItemSAF -5.869e+04 3.483e+04 -1.685 0.092263 .
## GeoNameHawaii:C_ItemSAF 1.966e+03 3.630e+04 0.054 0.956807
## GeoNameIllinois:C_ItemSAF -7.989e+04 3.682e+04 -2.170 0.030189 *
## GeoNameKansas:C_ItemSAF 3.309e+04 3.961e+04 0.835 0.403593
## GeoNameMassachusetts:C_ItemSAF -1.465e+05 3.468e+04 -4.225 2.54e-05 ***
## GeoNameMichigan:C_ItemSAF -6.246e+04 2.971e+04 -2.102 0.035704 *
## GeoNameMissouri:C_ItemSAF 1.592e+03 3.938e+04 0.040 0.967759
## GeoNameNew York:C_ItemSAF -2.565e+05 3.799e+04 -6.751 2.14e-11 ***
## GeoNameOhio:C_ItemSAF -6.818e+04 3.471e+04 -1.964 0.049690 *
## GeoNameOregon:C_ItemSAF 3.386e+03 3.848e+04 0.088 0.929912
## GeoNamePennsylvania:C_ItemSAF -1.601e+05 3.491e+04 -4.587 4.89e-06 ***
## GeoNameTexas:C_ItemSAF -4.920e+04 3.480e+04 -1.414 0.157693
## GeoNameWashington:C_ItemSAF -4.016e+04 3.434e+04 -1.169 0.242468
## GeoNameWisconsin:C_ItemSAF 2.217e+04 4.128e+04 0.537 0.591277
## GeoNameArizona:C_ItemSAH 1.291e+05 1.262e+05 1.023 0.306466
## GeoNameCalifornia:C_ItemSAH 1.110e+05 7.617e+04 1.457 0.145409
## GeoNameColorado:C_ItemSAH 7.857e+04 7.882e+04 0.997 0.318982
## GeoNameFlorida:C_ItemSAH 1.428e+05 6.473e+04 2.206 0.027528 *
## GeoNameGeorgia:C_ItemSAH -5.551e+04 6.783e+04 -0.818 0.413237
## GeoNameHawaii:C_ItemSAH -1.913e+03 6.639e+04 -0.029 0.977014
## GeoNameIllinois:C_ItemSAH -5.420e+04 6.629e+04 -0.818 0.413662
## GeoNameKansas:C_ItemSAH 3.521e+04 7.149e+04 0.492 0.622465
## GeoNameMassachusetts:C_ItemSAH -1.753e+05 7.031e+04 -2.493 0.012781 *
## GeoNameMichigan:C_ItemSAH -5.101e+04 5.239e+04 -0.974 0.330428
## GeoNameMissouri:C_ItemSAH 4.503e+04 7.003e+04 0.643 0.520360
## GeoNameNew York:C_ItemSAH -1.127e+05 6.966e+04 -1.617 0.106016
## GeoNameOhio:C_ItemSAH -5.748e+04 6.410e+04 -0.897 0.370007
## GeoNameOregon:C_ItemSAH 4.217e+04 7.419e+04 0.568 0.569860
## GeoNamePennsylvania:C_ItemSAH -2.216e+05 7.093e+04 -3.124 0.001819 **
## GeoNameTexas:C_ItemSAH 1.018e+05 6.666e+04 1.526 0.127120
## GeoNameWashington:C_ItemSAH -3.536e+03 6.800e+04 -0.052 0.958535
## GeoNameWisconsin:C_ItemSAH 4.001e+04 7.503e+04 0.533 0.593927
## GeoNameArizona:C_ItemSAM 2.064e+05 3.139e+04 6.575 6.85e-11 ***
## GeoNameCalifornia:C_ItemSAM 1.509e+06 2.933e+04 51.455 < 2e-16 ***
## GeoNameColorado:C_ItemSAM 1.767e+05 3.122e+04 5.661 1.82e-08 ***
## GeoNameFlorida:C_ItemSAM 8.243e+05 2.971e+04 27.741 < 2e-16 ***
## GeoNameGeorgia:C_ItemSAM 3.009e+05 2.968e+04 10.140 < 2e-16 ***
## GeoNameHawaii:C_ItemSAM 1.399e+04 3.040e+04 0.460 0.645487
## GeoNameIllinois:C_ItemSAM 5.630e+05 2.902e+04 19.400 < 2e-16 ***
## GeoNameKansas:C_ItemSAM 1.178e+05 3.003e+04 3.922 9.20e-05 ***
## GeoNameMassachusetts:C_ItemSAM 3.050e+05 2.938e+04 10.381 < 2e-16 ***
## GeoNameMichigan:C_ItemSAM 4.678e+05 2.576e+04 18.158 < 2e-16 ***
## GeoNameMissouri:C_ItemSAM 2.698e+05 3.037e+04 8.883 < 2e-16 ***
## GeoNameNew York:C_ItemSAM 1.003e+06 3.015e+04 33.278 < 2e-16 ***
## GeoNameOhio:C_ItemSAM 6.026e+05 2.950e+04 20.430 < 2e-16 ***
## GeoNameOregon:C_ItemSAM 1.554e+05 3.077e+04 5.049 5.02e-07 ***
## GeoNamePennsylvania:C_ItemSAM 6.298e+05 2.954e+04 21.318 < 2e-16 ***
## GeoNameTexas:C_ItemSAM 1.018e+06 2.892e+04 35.222 < 2e-16 ***
## GeoNameWashington:C_ItemSAM 2.604e+05 2.974e+04 8.756 < 2e-16 ***
## GeoNameWisconsin:C_ItemSAM 2.850e+05 3.050e+04 9.345 < 2e-16 ***
## GeoNameArizona:C_ItemSAT NA NA NA NA
## GeoNameCalifornia:C_ItemSAT -4.066e+04 4.977e+04 -0.817 0.413995
## GeoNameColorado:C_ItemSAT 2.262e+03 5.432e+04 0.042 0.966789
## GeoNameFlorida:C_ItemSAT 1.286e+04 4.701e+04 0.274 0.784476
## GeoNameGeorgia:C_ItemSAT 4.956e+03 4.861e+04 0.102 0.918816
## GeoNameHawaii:C_ItemSAT -2.215e+04 4.785e+04 -0.463 0.643455
## GeoNameIllinois:C_ItemSAT 3.752e+04 4.753e+04 0.789 0.430046
## GeoNameKansas:C_ItemSAT 3.919e+04 5.430e+04 0.722 0.470499
## GeoNameMassachusetts:C_ItemSAT -1.964e+05 4.755e+04 -4.130 3.84e-05 ***
## GeoNameMichigan:C_ItemSAT -7.678e+04 3.886e+04 -1.976 0.048387 *
## GeoNameMissouri:C_ItemSAT 4.066e+04 5.088e+04 0.799 0.424430
## GeoNameNew York:C_ItemSAT -2.092e+05 4.697e+04 -4.455 9.06e-06 ***
## GeoNameOhio:C_ItemSAT -1.614e+04 4.947e+04 -0.326 0.744249
## GeoNameOregon:C_ItemSAT 1.760e+03 5.482e+04 0.032 0.974389
## GeoNamePennsylvania:C_ItemSAT -1.393e+05 5.099e+04 -2.731 0.006386 **
## GeoNameTexas:C_ItemSAT 1.531e+05 4.857e+04 3.152 0.001657 **
## GeoNameWashington:C_ItemSAT -2.744e+04 4.957e+04 -0.554 0.579940
## GeoNameWisconsin:C_ItemSAT 4.489e+04 5.572e+04 0.806 0.420586
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 56940 on 1403 degrees of freedom
## Multiple R-squared: 0.9739, Adjusted R-squared: 0.9715
## F-statistic: 400.1 on 131 and 1403 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = Jobs ~ (CPI^2) + C_Item, data = ALL_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -616796 -91342 -25855 55837 1998242
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 119850.952 14762.062 8.119 9.59e-16 ***
## CPI 14.469 1.205 12.007 < 2e-16 ***
## C_ItemSAF -83340.521 21354.232 -3.903 9.92e-05 ***
## C_ItemSAH -51191.743 25599.996 -2.000 0.0457 *
## C_ItemSAM 516863.414 20820.848 24.824 < 2e-16 ***
## C_ItemSAT -48057.489 23294.045 -2.063 0.0393 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 258800 on 1529 degrees of freedom
## Multiple R-squared: 0.4133, Adjusted R-squared: 0.4114
## F-statistic: 215.4 on 5 and 1529 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = Jobs ~ CPI + Year, data = ALL_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -450389 -187647 -107860 50766 2267546
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 195201.635 37167.798 5.252 1.72e-07 ***
## CPI 3.702 1.199 3.088 0.00205 **
## Year1999 -2339.270 53706.025 -0.044 0.96526
## Year2000 28556.938 53147.060 0.537 0.59113
## Year2001 20401.869 52959.983 0.385 0.70012
## Year2002 -13163.374 53326.981 -0.247 0.80506
## Year2003 33074.846 52991.190 0.624 0.53262
## Year2004 37336.067 53146.995 0.703 0.48247
## Year2005 32933.063 52997.528 0.621 0.53443
## Year2006 45192.463 53202.290 0.849 0.39577
## Year2007 67772.427 51895.390 1.306 0.19177
## Year2008 68241.262 52554.604 1.298 0.19432
## Year2009 79495.401 52655.291 1.510 0.13132
## Year2010 50931.150 53065.458 0.960 0.33732
## Year2011 54786.919 52422.797 1.045 0.29614
## Year2012 53126.008 52604.456 1.010 0.31270
## Year2013 98888.194 53099.558 1.862 0.06275 .
## Year2014 67205.157 52954.860 1.269 0.20460
## Year2015 100441.500 52252.465 1.922 0.05476 .
## Year2016 104378.776 51823.842 2.014 0.04417 *
## Year2017 313382.749 62837.828 4.987 6.83e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 333700 on 1514 degrees of freedom
## Multiple R-squared: 0.03372, Adjusted R-squared: 0.02096
## F-statistic: 2.642 on 20 and 1514 DF, p-value: 0.0001048
xyplot(ALL_test$Jobs ~ predict(lm),
## plot the points (type = 'p') and a background grid ('g')
type = c("p", "g"),
xlab = "Predicted", ylab = "Observed")## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading
xyplot(ALL_test$Jobs ~ predict(lm_cpi_year),
## plot the points (type = 'p') and a background grid ('g')
type = c("p", "g"),
xlab = "Predicted", ylab = "Observed")xyplot(resid(lm_cpi_year) ~ predict(lm_cpi_year),
type = c("p", "g"),
xlab = "Predicted", ylab = "Residuals")#PLS
# Useing train perform to perfrom pre-process and tuning together. The function first preprocess the training set by centering it and scaling it. Then the function uses 10-fold cross validation to try the number of components, i.e. latent variables, of the PLS model from 1 to 20.
# # The train function generates a resampling estimate of performance. Because
# the training set size is not small, 10-fold cross-validation should produce
# reasonable estimates of model performance. The function trainControl specifies
# the type of resampling:
ctrl <- trainControl(method = "cv", number = 10)
model_pls<- train(Jobs ~ ., ALL_train,
method = "pls",
tuneLength = 20,
metric='Rsquared',
trControl = ctrl,
preProc = c("center", "scale"))## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Partial Least Squares
##
## 1535 samples
## 4 predictor
##
## Pre-processing: centered (43), scaled (43)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1381, 1381, 1382, 1382, 1380, 1382, ...
## Resampling results across tuning parameters:
##
## ncomp RMSE Rsquared MAE
## 1 183452.0 0.7068647 123594.6
## 2 179034.4 0.7198927 119609.3
## 3 173253.5 0.7388455 113591.6
## 4 171907.1 0.7427542 111394.8
## 5 168304.6 0.7527036 110675.7
## 6 166505.4 0.7587264 112089.3
## 7 165871.1 0.7608138 111006.2
## 8 165449.4 0.7619987 109881.7
## 9 165463.4 0.7618864 109977.4
## 10 165454.5 0.7619110 109963.2
## 11 165454.1 0.7619150 109956.8
## 12 165448.7 0.7619280 109951.7
## 13 165449.3 0.7619267 109952.2
## 14 165449.2 0.7619268 109952.2
## 15 165449.2 0.7619268 109952.2
## 16 165449.2 0.7619268 109952.2
## 17 165449.2 0.7619268 109952.2
## 18 165449.2 0.7619268 109952.2
## 19 165449.2 0.7619268 109952.2
## 20 165449.2 0.7619268 109952.2
##
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was ncomp = 8.
# M
model_glm<- train(Jobs ~ ., ALL_train,
method = "glm",
tuneLength = 20,
metric='Rsquared',
trControl = ctrl,
preProc = c("center", "scale"))## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Generalized Linear Model
##
## 1535 samples
## 4 predictor
##
## Pre-processing: centered (43), scaled (43)
## Resampling: Cross-Validated (10 fold)
## Summary of sample sizes: 1381, 1381, 1382, 1381, 1382, 1381, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 164153.9 0.7657161 109914.9
# test_model(test_dataset = model_pls,
# predictor_dataset = ALL_train,
# test_type = "regression")
# TEST MODEL with Train data
xyplot(ALL_train$Jobs ~ predict(model_pls),
## plot the points (type = 'p') and a background grid ('g')
type = c("p", "g"),
xlab = "Predicted", ylab = "Observed")xyplot(ALL_train$Jobs ~ predict(lm),
## plot the points (type = 'p') and a background grid ('g')
type = c("p", "g"),
xlab = "Predicted", ylab = "Observed")## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading
xyplot(ALL_train$Jobs ~ predict(lm_cpi2_item),
## plot the points (type = 'p') and a background grid ('g')
type = c("p", "g"),
xlab = "Predicted", ylab = "Observed")xyplot(ALL_train$Jobs ~ predict(lm_cpi_item),
## plot the points (type = 'p') and a background grid ('g')
type = c("p", "g"),
xlab = "Predicted", ylab = "Observed")xyplot(ALL_train$Jobs ~ predict(model_glm),
## plot the points (type = 'p') and a background grid ('g')
type = c("p", "g"),
xlab = "Predicted", ylab = "Observed")## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading
# TEST MODEL with TEst data
xyplot(ALL_test$Jobs ~ predict(model_pls,ALL_test),
## plot the points (type = 'p') and a background grid ('g')
type = c("p", "g"),
xlab = "Predicted", ylab = "Observed")xyplot(ALL_test$Jobs ~ predict(lm,ALL_test),
## plot the points (type = 'p') and a background grid ('g')
type = c("p", "g"),
xlab = "Predicted", ylab = "Observed")## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading
xyplot(ALL_test$Jobs ~ predict(lm_cpi_item,ALL_test),
## plot the points (type = 'p') and a background grid ('g')
type = c("p", "g"),
xlab = "Predicted", ylab = "Observed")# REssidual Plot with train data
xyplot(resid(model_pls) ~ predict(model_pls),
type = c("p", "g"),
xlab = "Predicted", ylab = "Residuals")## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading
xyplot(resid(lm_cpi_item) ~ predict(lm_cpi_item),
type = c("p", "g"),
xlab = "Predicted", ylab = "Residuals")xyplot(resid(lm_cpi2_item) ~ predict(lm_cpi2_item),
type = c("p", "g"),
xlab = "Predicted", ylab = "Residuals")xyplot(resid(lm_cpi2_item) ~ predict(lm_cpi2_item),
type = c("p", "g"),
xlab = "Predicted", ylab = "Residuals")# REssidual Plot with Test data
xyplot(resid(model_pls) ~ predict(model_pls,ALL_test),
type = c("p", "g"),
xlab = "Predicted", ylab = "Residuals")## Warning in is.na(x) | is.na(y): longer object length is not a multiple of
## shorter object length
xyplot(resid(model_glm) ~ predict(model_glm,ALL_test),
type = c("p", "g"),
xlab = "Predicted", ylab = "Residuals")## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading
## Warning in is.na(x) | is.na(y): longer object length is not a multiple of
## shorter object length
xyplot(resid(lm_cpi_item) ~ predict(lm_cpi_item,ALL_test),
type = c("p", "g"),
xlab = "Predicted", ylab = "Residuals")## Warning in is.na(x) | is.na(y): longer object length is not a multiple of
## shorter object length
#---------------------VALID
xyplot(ALL_train$Jobs ~ predict(lm_cpi_item_name2,ALL_train),
type = c("p", "g"),
xlab = "Predicted", ylab = "Actual Train")## Warning in predict.lm(lm_cpi_item_name2, ALL_train): prediction from a rank-
## deficient fit may be misleading
xyplot(ALL_test$Jobs ~ predict(lm_cpi_item_name2,ALL_test),
type = c("p", "g"),
xlab = "Predicted", ylab = "Actual Test")## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading
# xyplot(resid(lm_cpi_item_name3) ~ predict(lm_cpi_item_name3,ALL_test),
# type = c("p", "g"),
# xlab = "Predicted", ylab = "Residuals")
head(resid(lm_cpi_item_name2) )## 1 2 5 6 7 8
## -8030.268 -7749.304 -2670.667 3554.078 8762.014 5327.627
## 1 2 5 6 7 8
## 14954.268 14683.304 8991.667 2915.922 -2209.014 1511.373
## [1] 6924 6934 6321 6470 6553 6839
## 1 2 5 6 7 8
## -8030.268 -7749.304 -2670.667 3554.078 8762.014 5327.627
5.3 Graph for Predication
## PRedict
# Grapg suggest we we are able to explain the situation in most of the year with some outliers .
data.frame(pred=predict(lm_cpi_item_name2,ALL_test),actual = ALL_test$Jobs,year=ALL_test$Year,geo=ALL_test$GeoName) %>% ggplot(mapping = aes(x=pred,y=year )) +geom_point(mapping = aes(x=actual,color = "red")) + geom_line() ## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading
data.frame(pred=predict(lm_cpi_item_name2,ALL_test),actual = ALL_test$Jobs,year=ALL_test$Year,geo=ALL_test$GeoName, indus = ALL_test$C_Item) %>% filter(geo=="New York") %>%ggplot(mapping = aes(x=pred,y=year )) +geom_point(mapping = aes(x=actual,color = "Actual")) + geom_point(aes(color = "Pred") )+ facet_wrap(~indus) + labs(title = "New York Data for No. Of Jobs by Sector",
y = "Year",x= "Jobs") + theme_classic()## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading
data.frame(pred=predict(lm_cpi_item_name2,ALL_test),actual = ALL_test$Jobs,year=ALL_test$Year,geo=ALL_test$GeoName, indus = ALL_test$C_Item) %>% filter(geo=="Florida") %>% ggplot(mapping = aes(x=pred,y=year )) +geom_point(mapping = aes(x=actual,color = "Actual")) + geom_point(aes(color = "Pred") )+ facet_wrap(~indus) + labs(title = "Florida Data for No. Of Jobs by Sector",
y = "Year",x= "Jobs") + theme_classic()## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading
data.frame(pred=predict(lm_cpi_item_name2,ALL_test),actual = ALL_test$Jobs,year=ALL_test$Year,geo=ALL_test$GeoName, indus = ALL_test$C_Item) %>% filter(geo=="Texas") %>% ggplot(mapping = aes(x=pred,y=year )) +geom_point(mapping = aes(x=actual,color = "Actual")) + geom_point(aes(color = "Pred") )+ facet_wrap(~indus) + labs(title = "Texas Data for No. Of Jobs by Sector",
y = "Year",x= "Jobs") + theme_classic()## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading
5.4 RESULT OF MODEL
data.frame("LM Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm, ALL_test),ALL_test$Jobs))## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading
data.frame("lm_cpi2_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi2_item, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm_cpi2_item, ALL_test),ALL_test$Jobs))data.frame("lm_cpi_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm_cpi2_item, ALL_test),ALL_test$Jobs))data.frame("lm_cpi_year Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_year, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm_cpi_year, ALL_test),ALL_test$Jobs))data.frame("PLS Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(model_pls, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(model_pls, ALL_test),ALL_test$Jobs))cbind(data.frame("LM Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm, ALL_test),ALL_test$Jobs)),
data.frame("lm_cpi_item_name2 Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name2, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm_cpi_item_name2, ALL_test),ALL_test$Jobs)),
data.frame("lm_cpi_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm_cpi2_item, ALL_test),ALL_test$Jobs)),
data.frame("lm_cpi_year Model"= defaultSummary(data.frame(obs=ALL_NY_test$Jobs,pred=predict(lm_cpi_year, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm_cpi_year, ALL_test),ALL_test$Jobs)),
data.frame("PLS Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(model_pls, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(model_pls, ALL_test),ALL_test$Jobs))
)## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading
## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading
## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading
## NEW MODEl
cbind(
data.frame("MLM1"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm_cpi_item_name, ALL_test),ALL_test$Jobs)),
data.frame("lm_cpi_item_name2"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name2, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm_cpi_item_name2, ALL_test),ALL_test$Jobs)),
data.frame("lm_cpi_item_name3"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name3, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm_cpi_item_name3, ALL_test),ALL_test$Jobs)),
data.frame("LM Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm, ALL_test),ALL_test$Jobs)),
data.frame("lm_cpi2_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi2_item, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm_cpi2_item, ALL_test),ALL_test$Jobs)),
data.frame("lm_cpi_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm_cpi2_item, ALL_test),ALL_test$Jobs)),
data.frame("lm_cpi_year Model"= defaultSummary(data.frame(obs=ALL_NY_test$Jobs,pred=predict(lm_cpi_year, ALL_test))),
"MAPE" = MLmetrics::MAPE(predict(lm_cpi_year, ALL_test),ALL_test$Jobs)),
data.frame("PLS Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(model_pls, ALL_test))),"MAPE" = MLmetrics::MAPE(predict(model_pls, ALL_test),ALL_test$Jobs))
)## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading
## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading
## Warning in predict.lm(lm_cpi_item_name3, ALL_test): prediction from a rank-
## deficient fit may be misleading
## Warning in predict.lm(lm_cpi_item_name3, ALL_test): prediction from a rank-
## deficient fit may be misleading
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading
5.5 For WHOLE US
cbind(
data.frame("MLM1"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name, ALL_test)))),
data.frame("MLM2"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name2, ALL_test)))),
data.frame("MLM3"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name3, ALL_test)))),
data.frame("LM Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm, ALL_test)))),
data.frame("lm_cpi2_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi2_item, ALL_test)))),
data.frame("lm_cpi_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item, ALL_test)))),
data.frame("lm_cpi_year Model"= defaultSummary(data.frame(obs=ALL_NY_test$Jobs,pred=predict(lm_cpi_year, ALL_test)))),
data.frame("PLS Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(model_pls, ALL_test))))
)## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading
## Warning in predict.lm(lm_cpi_item_name3, ALL_test): prediction from a rank-
## deficient fit may be misleading
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading
# collect resamples
bbmle::AICctab(lm_cpi_item, lm_cpi_item_name, lm_cpi_item_name2, lm_cpi_item_name3,lm_cpi2_item,weights= T)## dAICc df weight
## lm_cpi_item_name2 0.0 208 1
## lm_cpi_item_name3 191.5 133 <0.001
## lm_cpi_item_name 3296.7 62 <0.001
## lm_cpi_item 4693.8 7 <0.001
## lm_cpi2_item 4693.8 7 <0.001
# https://www.youtube.com/watch?v=4SG2CB2Yu1c
anova(lm_cpi_item, lm_cpi_item_name, lm_cpi_item_name2, lm_cpi_item_name3,lm_cpi2_item)5.6 multicollinearity
WE do see out model is multicollinear and it makes it hard to interpret our coefficients, and it reduces the power of your model to identify independent variables that are statistically significant. These are definitely serious problems. However, sometime we can’t fix multicollinearity or finding ways to fix is not important if the objective is met.
## GVIF Df GVIF^(1/(2*Df))
## CPI 5.719766e+02 1 23.916032
## GeoName 1.693252e+07 18 1.587807
## C_Item 8.507845e+00 4 1.306855
## Year 1.866223e+00 19 1.016554
## CPI:GeoName 5.418332e+09 18 1.863739
## Model :
## Jobs ~ (CPI * GeoName) + CPI + GeoName + C_Item + Year
## Model :
## Jobs ~ (CPI * GeoName) + (GeoName * C_Item) + CPI + GeoName +
## C_Item + Year
##
## Complete :
## (Intercept) CPI GeoNameArizona GeoNameCalifornia
## GeoNameArizona:C_ItemSAT 0 0 0 0
## GeoNameColorado GeoNameFlorida GeoNameGeorgia
## GeoNameArizona:C_ItemSAT 0 0 0
## GeoNameHawaii GeoNameIllinois GeoNameKansas
## GeoNameArizona:C_ItemSAT 0 0 0
## GeoNameMassachusetts GeoNameMichigan GeoNameMissouri
## GeoNameArizona:C_ItemSAT 0 0 0
## GeoNameNew York GeoNameOhio GeoNameOregon
## GeoNameArizona:C_ItemSAT 0 0 0
## GeoNamePennsylvania GeoNameTexas GeoNameWashington
## GeoNameArizona:C_ItemSAT 0 0 0
## GeoNameWisconsin C_ItemSAF C_ItemSAH C_ItemSAM
## GeoNameArizona:C_ItemSAT 0 0 0 0
## C_ItemSAT Year1999 Year2000 Year2001 Year2002 Year2003
## GeoNameArizona:C_ItemSAT 0 0 0 0 0 0
## Year2004 Year2005 Year2006 Year2007 Year2008 Year2009
## GeoNameArizona:C_ItemSAT 0 0 0 0 0 0
## Year2010 Year2011 Year2012 Year2013 Year2014 Year2015
## GeoNameArizona:C_ItemSAT 0 0 0 0 0 0
## Year2016 Year2017 CPI:GeoNameArizona
## GeoNameArizona:C_ItemSAT 0 0 0
## CPI:GeoNameCalifornia CPI:GeoNameColorado
## GeoNameArizona:C_ItemSAT 0 0
## CPI:GeoNameFlorida CPI:GeoNameGeorgia
## GeoNameArizona:C_ItemSAT 0 0
## CPI:GeoNameHawaii CPI:GeoNameIllinois
## GeoNameArizona:C_ItemSAT 0 0
## CPI:GeoNameKansas CPI:GeoNameMassachusetts
## GeoNameArizona:C_ItemSAT 0 0
## CPI:GeoNameMichigan CPI:GeoNameMissouri
## GeoNameArizona:C_ItemSAT 0 0
## CPI:GeoNameNew York CPI:GeoNameOhio CPI:GeoNameOregon
## GeoNameArizona:C_ItemSAT 0 0 0
## CPI:GeoNamePennsylvania CPI:GeoNameTexas
## GeoNameArizona:C_ItemSAT 0 0
## CPI:GeoNameWashington CPI:GeoNameWisconsin
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameArizona:C_ItemSAF GeoNameCalifornia:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameColorado:C_ItemSAF GeoNameFlorida:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameGeorgia:C_ItemSAF GeoNameHawaii:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameIllinois:C_ItemSAF GeoNameKansas:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameMassachusetts:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0
## GeoNameMichigan:C_ItemSAF GeoNameMissouri:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameNew York:C_ItemSAF GeoNameOhio:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameOregon:C_ItemSAF GeoNamePennsylvania:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameTexas:C_ItemSAF GeoNameWashington:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameWisconsin:C_ItemSAF GeoNameArizona:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameCalifornia:C_ItemSAH GeoNameColorado:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameFlorida:C_ItemSAH GeoNameGeorgia:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameHawaii:C_ItemSAH GeoNameIllinois:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameKansas:C_ItemSAH GeoNameMassachusetts:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameMichigan:C_ItemSAH GeoNameMissouri:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameNew York:C_ItemSAH GeoNameOhio:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameOregon:C_ItemSAH GeoNamePennsylvania:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameTexas:C_ItemSAH GeoNameWashington:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameWisconsin:C_ItemSAH GeoNameArizona:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameCalifornia:C_ItemSAM GeoNameColorado:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameFlorida:C_ItemSAM GeoNameGeorgia:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameHawaii:C_ItemSAM GeoNameIllinois:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameKansas:C_ItemSAM GeoNameMassachusetts:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameMichigan:C_ItemSAM GeoNameMissouri:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameNew York:C_ItemSAM GeoNameOhio:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameOregon:C_ItemSAM GeoNamePennsylvania:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameTexas:C_ItemSAM GeoNameWashington:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameWisconsin:C_ItemSAM GeoNameCalifornia:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameColorado:C_ItemSAT GeoNameFlorida:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameGeorgia:C_ItemSAT GeoNameHawaii:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameIllinois:C_ItemSAT GeoNameKansas:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameMassachusetts:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0
## GeoNameMichigan:C_ItemSAT GeoNameMissouri:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameNew York:C_ItemSAT GeoNameOhio:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameOregon:C_ItemSAT GeoNamePennsylvania:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameTexas:C_ItemSAT GeoNameWashington:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0 0
## GeoNameWisconsin:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0