DATA 698 : Capstone Research Project

1 Overview
2 Capstone Project on CPI and Employment
3 Data Preparation
4 Employment vs CPI - by Industry and Area
5 RAJWANT WORK
6 References

1 Overview

We are in totally different era of twenty first century, and it gives us very rare situation where any positive news would help the humankind. We want to use the historical CPI data and find the relation of it with the employment, in hope that we would have some positive news on employment by following the trend of the data in past.

We feel that lower CPI would result in more job opportunity, as the it gives space for more competition in small business across sectors.

2 Capstone Project on CPI and Employment

The estimates of employment for 1998-2006 are based on the 2002 North American Industry Classification System (NAICS). The estimates for 2007-2010 are based on the 2007 NAICS. The estimates for 2011-2016 are based on the 2012 NAICS. The estimates for 2017 forward are based on the 2017 NAICS.
Excludes limited partners.
Under the 2007 NAICS, internet publishing and broadcasting was reclassified to other information services.

(NA) Not available.
(NM) Not meaningful.
1. Not shown to avoid disclosure of confidential information; estimates are included in higher-level totals.
1. Estimate for employment suppressed to cover corresponding estimate for earnings. Estimates for this item are included in the total.

Last updated: September 24, 2019- new statistics for 2018; revised statistics for 2014-2017.

3 Data Preparation

Load the required libraries

#packages <- c("R.rsp","pandocfilters","knitr","plyr","tidyr","dplyr","ggplot2","plotly","sqldf","MASS","reshape2","Amelia","mice","googleVis","stringi","ROCR","scatterplot3d","visdat","gridExtra","sjmisc", "devtools")
#install.packages (packages, repos="http://cran.us.r-project.org", dependencies=TRUE, lib="C:/Program Files/R/R-3.6.3/library")
#invisible (lapply (packages, library, character.only = T))
#devtools::install_github("hadley/ggplot2")
#if (!require(devtools)) install.packages("devtools", repos="http://cran.us.r-project.org", dependencies=TRUE, lib="C:/Program Files/R/R-3.6.3/library")
#devtools::install_github("boxuancui/DataExplorer")
#library("DataExplorer", lib.loc="C:/Program Files/R/R-3.6.3/library")
library(tidyverse)
library(kableExtra)
library(lubridate)
library(forecast)
library(stringi)
library(dplyr)

3.1 Load Employment Datasets

file_path <- "../data/cpi/SAINC/"
EMP_DATA_FILE <- paste0(file_path,"EMP9818.csv")
EMP_DATA <- read_delim(EMP_DATA_FILE, delim = ",", col_names = TRUE, trim_ws= TRUE) 
#,skip = 6,col_types = list(col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character(),col_character()))
# names(SAP_INV_DAT) = c("DROP","I_BILLDATE","I_ORG","I_DC","I_DOCCA","I_BILLTYPE","I_ORDER_REA","I_DIV","I_MATYPE","I_BILL_QTY","BLANK")
# SAP_INV_DAT<- SAP_INV_DAT[-which(is.na(SAP_INV_DAT$I_ORG)),] %>% .[-str_which(trimws(.$I_ORG),'SOrg.|-----'),c(-1,-11)]
# 3
# SAP_INV_DAT$I_BILLDATE <- date(parse_datetime(SAP_INV_DAT$I_BILLDATE, "%m/%d/%Y"))
# 
# head(SAP_INV_DAT)
summary(EMP_DATA)

##    GeoFIPS            GeoName              Region       TableName        
##  Length:7084        Length:7084        Min.   :1.000   Length:7084       
##  Class :character   Class :character   1st Qu.:3.000   Class :character  
##  Mode  :character   Mode  :character   Median :5.000   Mode  :character  
##                                        Mean   :4.475                     
##                                        3rd Qu.:6.000                     
##                                        Max.   :8.000                     
##                                        NA's   :122                       
##     LineCode      IndustryClassification Description            Unit          
##  Min.   :  10.0   Length:7084            Length:7084        Length:7084       
##  1st Qu.: 517.0   Class :character       Class :character   Class :character  
##  Median : 712.5   Mode  :character       Mode  :character   Mode  :character  
##  Mean   : 872.1                                                               
##  3rd Qu.:1103.0                                                               
##  Max.   :2012.0                                                               
##  NA's   :4                                                                    
##      1998               1999               2000               2001          
##  Length:7084        Length:7084        Length:7084        Length:7084       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      2002               2003               2004               2005          
##  Length:7084        Length:7084        Length:7084        Length:7084       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      2006               2007               2008               2009          
##  Length:7084        Length:7084        Length:7084        Length:7084       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      2010               2011               2012               2013          
##  Length:7084        Length:7084        Length:7084        Length:7084       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      2014               2015               2016               2017          
##  Length:7084        Length:7084        Length:7084        Length:7084       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      2018          
##  Length:7084       
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
##

DT::datatable(EMP_DATA)

# Reading the Category and detail Desc for ref.
EMP_DEF_FILE <- paste0(file_path,"SAEMP25N__definition.xml")
IND_DESC <- XML::xmlToDataFrame(EMP_DEF_FILE)
names(IND_DESC) <- c("LineCode","Description")
glimpse(IND_DESC)

## Rows: 118
## Columns: 3
## $ LineCode    <fct> 10, 20, 40, 50, 60, 70, 80, 90, 100, 101, 102, 103, 200...
## $ Description <fct> "Total employment (number of jobs)", "Wage and salary e...
## $ NA          <fct> "A count of jobs, both full-time and part-time. It incl...

head(IND_DESC,5)

3.2 Show Relevant Employment Data

# MN-MAIN  | NN - RANGE  | NL - LAST |
EMAP_CAT_LAST <- EMP_DATA[,c(2,5,6,7)] %>% 
                  filter(GeoName=="United States") %>% 
                  mutate(ISM = ifelse(IndustryClassification=="...","M","N")) %>% 
                  separate(IndustryClassification, c("From", "To"))  %>% 
                  mutate(ISML = ifelse(is.na(To)==T,"L","N")) %>% 
                  filter(ISML %in% c("L","N")) %>%  
                  select(.,LineCode,Description)
#DT::datatable(EMAP_CAT_LAST)
EMP_DAT_CAT <- EMP_DATA[,c(2,5,9:29)] %>% 
                left_join(EMAP_CAT_LAST,.,by="LineCode")
glimpse(EMP_DAT_CAT)

## Rows: 7,080
## Columns: 24
## $ LineCode    <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,...
## $ Description <chr> "Total employment (number of jobs)", "Total employment ...
## $ GeoName     <chr> "United States", "Alabama", "Alaska", "Arizona", "Arkan...
## $ `1998`      <chr> "158481200", "2361892", "382166", "2616288", "1445536",...
## $ `1999`      <chr> "161531300", "2378217", "381307", "2695892", "1460374",...
## $ `2000`      <chr> "165370800", "2392225", "389734", "2801510", "1482449",...
## $ `2001`      <chr> "165522200", "2376053", "394565", "2829002", "1482587",...
## $ `2002`      <chr> "165095100", "2364828", "402187", "2847095", "1478929",...
## $ `2003`      <chr> "165921500", "2371430", "405621", "2917121", "1482035",...
## $ `2004`      <chr> "168839700", "2425649", "413864", "3041476", "1505095",...
## $ `2005`      <chr> "172338400", "2486833", "421419", "3219820", "1537680",...
## $ `2006`      <chr> "175868600", "2545556", "431320", "3375218", "1567682",...
## $ `2007`      <chr> "179543700", "2604078", "439825", "3465075", "1582858",...
## $ `2008`      <chr> "179213900", "2582591", "443538", "3402808", "1579283",...
## $ `2009`      <chr> "173636700", "2479507", "442447", "3228493", "1542944",...
## $ `2010`      <chr> "172901700", "2460298", "443904", "3181571", "1541272",...
## $ `2011`      <chr> "176091700", "2497933", "450364", "3239045", "1561948",...
## $ `2012`      <chr> "178979700", "2503678", "459222", "3295537", "1565142",...
## $ `2013`      <chr> "182325100", "2523338", "461110", "3371219", "1569249",...
## $ `2014`      <chr> "186233800", "2551872", "461327", "3448173", "1587414",...
## $ `2015`      <chr> "190315800", "2586885", "461767", "3548174", "1610779",...
## $ `2016`      <chr> "193371900", "2619154", "457371", "3646604", "1629237",...
## $ `2017`      <chr> "196825300", "2653968", "456799", "3751283", "1644432",...
## $ `2018`      <chr> "200746000", "2691517", "459178", "3859137", "1663188",...

head(EMP_DAT_CAT,130)

#DT::datatable(EMP_DAT_CAT)
temp_name <- names(EMP_DAT_CAT [,c(4:24)])
EMP_DAT_CAT_SAN <-  data.frame(lapply(EMP_DAT_CAT [,c(4:24)],function(x) {
                      gsub("(D)|(T)",NA,x)
                    })) 
names(EMP_DAT_CAT_SAN) = temp_name
#DT::datatable(EMP_DAT_CAT_SAN)
EMP_DAT_CAT_SAN <- cbind(EMP_DAT_CAT[,c(1:3)],EMP_DAT_CAT_SAN)
#DT::datatable(EMP_DAT_CAT_SAN)

3.2.1 Employment from Select Industry by State

3.2.1.1 AK - Alaska

EMP_DAT_CAT_SAN_AK <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Alaska") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_AK %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.2 AZ - Arizona

EMP_DAT_CAT_SAN_AZ <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Arizona") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_AZ %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.3 CA - California

EMP_DAT_CAT_SAN_CA <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="California") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_CA %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.4 CO - Colorado

EMP_DAT_CAT_SAN_CO <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Colorado") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_CO %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.5 FL - Florida

EMP_DAT_CAT_SAN_FL <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Florida") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_FL %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.6 GA - Georgia

EMP_DAT_CAT_SAN_GA <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Georgia") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_GA %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.7 HI - Hawaii

EMP_DAT_CAT_SAN_HI <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Hawaii") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_HI %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.8 IL - Illinois

EMP_DAT_CAT_SAN_IL <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Illinois") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_IL %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.9 KS - Kansas

EMP_DAT_CAT_SAN_KS <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Kansas") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_KS %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.10 MA - Massachusetts

EMP_DAT_CAT_SAN_MA <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Massachusetts") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_MA %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.11 MI - Michigan

EMP_DAT_CAT_SAN_MI <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Michigan") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_MI %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.12 MN - Minnesota

EMP_DAT_CAT_SAN_MN <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Michigan") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_MN %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.13 MO - Missouri

EMP_DAT_CAT_SAN_MO <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Missouri") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_MO %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.14 NY - New York

EMP_DAT_CAT_SAN_NY <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="New York") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_NY %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.15 OH - Ohio

EMP_DAT_CAT_SAN_OH <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Ohio") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_OH %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.16 OR - Oregon

EMP_DAT_CAT_SAN_OR <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Oregon") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_OR %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.17 PA - Pennsylvania

EMP_DAT_CAT_SAN_PA <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Pennsylvania") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_PA %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.18 TX - Texas

EMP_DAT_CAT_SAN_TX <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Texas") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_TX %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.19 WA - Washington

EMP_DAT_CAT_SAN_WA <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Washington") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_WA %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.2.1.20 WI - Wisconsin

EMP_DAT_CAT_SAN_WI <- EMP_DAT_CAT_SAN %>% 
                        filter(LineCode %in% c(705,1100,800,1600,1500)) %>% 
                        filter(GeoName=="Wisconsin") %>% 
                        reshape2::melt(
                          id.vars = c("LineCode","Description","GeoName"), 
                          variable.name = "Year", 
                          value.name = "Number of jobs")
EMP_DAT_CAT_SAN_WI %>%
  #sample_frac(0.33) %>% 
  DT::datatable()

3.3 Load CPI Data

file_path <- "../data/cpi/archive/"
### Read files named *.csv
filenames <- list.files(path=file_path,
                        pattern="*.csv",
                        full.names = TRUE)
###print(filenames)
###Create list of data frame names without the ".csv" part 
names <- gsub(pattern='\\.', 
              replacement='_', # remove the last 4 characters from filename (_clean)
              x=(sub(pattern = "(.*)\\..*$", replacement = "\\1", # remove extenson from filename (.csv)
                    basename(filenames) # get filename without the directory file path
                   )
                 )
              )
###print(names)
###Load all files
for(i in filenames){
  ## Create list of data frame names without the ".csv" part 
  name <- gsub(pattern='\\.', 
               replacement='_', # remove the last 4 characters from filename (_clean)
               x=(sub(pattern = "(.*)\\..*$", replacement = "\\1", # remove extenson from filename (.csv)
                      basename(i) # get filename without the directory file path
                     )
                 )
              )
   print(name)
   assign(x=name, value=data.frame(read.csv(i)))
   #DT::datatable(name)
}

## [1] "cu_area"
## [1] "cu_base"
## [1] "cu_data_0_Current"
## [1] "cu_data_1_AllItems"
## [1] "cu_data_10_OtherWest"
## [1] "cu_data_11_USFoodBeverage"
## [1] "cu_data_12_USHousing"
## [1] "cu_data_13_USApparel"
## [1] "cu_data_14_USTransportation"
## [1] "cu_data_15_USMedical"
## [1] "cu_data_16_USRecreation"
## [1] "cu_data_17_USEducationAndCommunication"
## [1] "cu_data_18_USOtherGoodsAndServices"
## [1] "cu_data_19_PopulationSize"
## [1] "cu_data_2_Summaries"
## [1] "cu_data_20_USCommoditiesServicesSpecial"
## [1] "cu_data_3_AsizeNorthEast"
## [1] "cu_data_4_AsizeNorthCentral"
## [1] "cu_data_5_AsizeSouth"
## [1] "cu_data_6_AsizeWest"
## [1] "cu_data_7_OtherNorthEast"
## [1] "cu_data_8_OtherNorthCentral"
## [1] "cu_data_9_OtherSouth"
## [1] "cu_footnote"
## [1] "cu_item"
## [1] "cu_period"
## [1] "cu_periodicity"
## [1] "cu_series"

3.4 Show CPI Reference Data

3.4.1 Area

3.4.2 Base

3.4.3 Item

3.4.4 Periodicity

3.4.5 Series

3.4.6 Period

3.4.7 Combined Reference Data

3.5 Show CPI Data from Select Industry

3.5.1 CPI - US Food Beverages

FoodItemCode <- cu_reference_data %>%
                    filter(grepl('SAF|SEF|SS0|SS1|SS20', item_code))
                    #dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((FoodItemCode))
cu_data_USFoodBeverage = cu_data_0_Current %>% 
                              inner_join(FoodItemCode, by="series_id") %>% 
                              inner_join(cu_period, by="period") %>% 
                               filter(year %in% (1998:2018)) %>% 
                               select(c("series_id",
                                        "series_title",
                                        "year",
                                        "period",
                                        "period_abbr",
                                        "period_name",
                                        "value",
                                        "area_code",
                                        "area_name",
                                        "area_display_level",
                                        "area_selectable",
                                        "area_sort_sequence",
                                        "item_code",
                                        "item_name",
                                        "item_display_level",
                                        "item_selectable",
                                        "item_sort_sequence",
                                        "seasonal",
                                        "periodicity_code",
                                        "periodicity_name",
                                        "base_code",
                                        "base_name",
                                        "base_period"))
DT::datatable(head(cu_data_USFoodBeverage))

3.5.2 CPI - US Housing

HousingItemCode <- cu_reference_data %>%
                    #filter(grepl('SAH|SEH', item_code, fixed = TRUE))
                    filter(grepl('SAH|SEH', item_code))
                    #dplyr::filter(item_code %like% 'SAH' | item_code %like% 'SEH')
#DT::datatable((HousingItemCode))
cu_data_USHousing = cu_data_0_Current %>% 
                              inner_join(HousingItemCode, by="series_id") %>% 
                              inner_join(cu_period, by="period") %>% 
                               #filter(item_code %like% c('SAH','SEH')) %>%
                               filter(year %in% (1998:2018)) %>% 
                               select(c("series_id",
                                        "series_title",
                                        "year",
                                        "period",
                                        "period_abbr",
                                        "period_name",
                                        "value",
                                        "area_code",
                                        "area_name",
                                        "area_display_level",
                                        "area_selectable",
                                        "area_sort_sequence",
                                        "item_code",
                                        "item_name",
                                        "item_display_level",
                                        "item_selectable",
                                        "item_sort_sequence",
                                        "seasonal",
                                        "periodicity_code",
                                        "periodicity_name",
                                        "base_code",
                                        "base_name",
                                        "base_period"))
DT::datatable(head(cu_data_USHousing))

3.5.3 CPI - US Transportation

TransportItemCode <- cu_reference_data %>%
                    filter(grepl('SAT|SET|SS4|SS52|SS53|SAS24|SAS4', item_code))
cu_data_USTransportation = cu_data_0_Current %>% 
                              inner_join(TransportItemCode, by="series_id") %>% 
                              inner_join(cu_period, by="period") %>% 
                               filter(year %in% (1998:2018)) %>% 
                               select(c("series_id",
                                        "series_title",
                                        "year",
                                        "period",
                                        "period_abbr",
                                        "period_name",
                                        "value",
                                        "area_code",
                                        "area_name",
                                        "area_display_level",
                                        "area_selectable",
                                        "area_sort_sequence",
                                        "item_code",
                                        "item_name",
                                        "item_display_level",
                                        "item_selectable",
                                        "item_sort_sequence",
                                        "seasonal",
                                        "periodicity_code",
                                        "periodicity_name",
                                        "base_code",
                                        "base_name",
                                        "base_period"))
DT::datatable(head(cu_data_USTransportation))

3.5.4 CPI - US Medical

MedicalItemCode <- cu_reference_data %>%
                    filter(grepl('SAM|SEM|SS57', item_code))
cu_data_USMedical = cu_data_0_Current %>% 
                              inner_join(MedicalItemCode, by="series_id") %>% 
                              inner_join(cu_period, by="period") %>% 
                               filter(year %in% (1998:2018)) %>% 
                               select(c("series_id",
                                        "series_title",
                                        "year",
                                        "period",
                                        "period_abbr",
                                        "period_name",
                                        "value",
                                        "area_code",
                                        "area_name",
                                        "area_display_level",
                                        "area_selectable",
                                        "area_sort_sequence",
                                        "item_code",
                                        "item_name",
                                        "item_display_level",
                                        "item_selectable",
                                        "item_sort_sequence",
                                        "seasonal",
                                        "periodicity_code",
                                        "periodicity_name",
                                        "base_code",
                                        "base_name",
                                        "base_period"))
DT::datatable(head(cu_data_USMedical))

3.5.5 CPI - US Education And Communication

EducationItemCode <- cu_reference_data %>%
                    filter(grepl('SAE|SEE|SSE|SS27', item_code))
cu_data_USEducationAndCommunication = cu_data_0_Current %>% 
                                        inner_join(EducationItemCode, by="series_id") %>% 
                                        inner_join(cu_period, by="period") %>% 
                                         filter(year %in% (1998:2018)) %>% 
                                         select(c("series_id",
                                                  "series_title",
                                                  "year",
                                                  "period",
                                                  "period_abbr",
                                                  "period_name",
                                                  "value",
                                                  "area_code",
                                                  "area_name",
                                                  "area_display_level",
                                                  "area_selectable",
                                                  "area_sort_sequence",
                                                  "item_code",
                                                  "item_name",
                                                  "item_display_level",
                                                  "item_selectable",
                                                  "item_sort_sequence",
                                                  "seasonal",
                                                  "periodicity_code",
                                                  "periodicity_name",
                                                  "base_code",
                                                  "base_name",
                                                  "base_period"))
DT::datatable(head(cu_data_USEducationAndCommunication))

4 Employment vs CPI - by Industry and Area

4.1 Food & Beverages

4.1.1 AK - Alaska : Anchorage

cu_emp_data_USFoodBeverage_AK = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="Anchorage", 
                                         period %in% c("S01","S02")) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'AK') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_AK = merge(EMP_DAT_CAT_SAN_AK, cu_emp_data_USFoodBeverage_AK, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_AK %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.2 AZ - Arizona : Phoenix-Mesa

cu_emp_data_USFoodBeverage_AZ = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="Phoenix-Mesa", 
                                         period %in% c("S01","S02")) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'AZ') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_AZ = merge(EMP_DAT_CAT_SAN_AZ, cu_emp_data_USFoodBeverage_AZ, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_AZ %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.3 CA - California

Los Angeles-Riverside-Orange County
San Francisco-Oakland-San Jose
San Diego

cu_emp_data_USFoodBeverage_CA = cu_data_USFoodBeverage %>% 
                                   filter( (area_name %in% c("Los Angeles-Riverside-Orange County", "San Francisco-Oakland-San Jose")
                                            & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                            )
                                           |
                                            (area_name %in% c("San Diego")
                                             & period %in% c("S01","S02")
                                             )
                                         ) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         area_code = 'A421;A422;A424',
                                         area_name = 'Los Angeles-Riverside-Orange County;San Francisco-Oakland-San Jose;San Diego',
                                         industry = 'Food & Beverages',
                                         state = 'CA') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value/3),2))
cpi_emp_data_USFoodBeverage_CA = merge(EMP_DAT_CAT_SAN_CA, cu_emp_data_USFoodBeverage_CA, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_CA %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.4 CO - Colorado : Denver-Boulder-Greeley

cu_emp_data_USFoodBeverage_CO = cu_data_USFoodBeverage %>% 
                                  filter(area_name %in% c("Denver-Boulder-Greeley"), 
                                         period %in% c("S01","S02")) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'CO') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_CO = merge(EMP_DAT_CAT_SAN_CO, cu_emp_data_USFoodBeverage_CO, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_CO %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.5 FL - Florida

Miami-Fort Lauderdale
Tampa-St. Petersburg-Clearwater

cu_emp_data_USFoodBeverage_FL = cu_data_USFoodBeverage %>% 
                                  filter( (area_name %in% c("Miami-Fort Lauderdale")
                                          & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                          )
                                         |
                                          (area_name %in% c("Tampa-St. Petersburg-Clearwater")
                                           & period %in% c("S01","S02")
                                           )
                                       ) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         area_code = 'A320;A321',
                                         area_name = 'Miami-Fort Lauderdale;Tampa-St. Petersburg-Clearwater',
                                         industry = 'Food & Beverages',
                                         state = 'FL') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USFoodBeverage_FL = merge(EMP_DAT_CAT_SAN_FL, cu_emp_data_USFoodBeverage_FL, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_FL %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.6 GA - Georgia : Atlanta

cu_emp_data_USFoodBeverage_GA = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="Atlanta", 
                                         period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                  mutate(industry = 'Food & Beverages',
                                         state = 'GA',
                                         parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'GA') %>% 
                                   select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_GA = merge(EMP_DAT_CAT_SAN_GA, cu_emp_data_USFoodBeverage_GA, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_GA %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.7 HI - Hawaii : Honolulu

cu_emp_data_USFoodBeverage_HI = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="Honolulu", 
                                         period %in% c("S01","S02")) %>% 
                                  mutate(industry = 'Food & Beverages',
                                         state = 'HI',
                                         parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'HI') %>% 
                                   select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_HI = merge(EMP_DAT_CAT_SAN_HI, cu_emp_data_USFoodBeverage_HI, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_HI %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.8 IL - Illinois : Chicago-Gary-Kenosha

cu_emp_data_USFoodBeverage_IL = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="Chicago-Gary-Kenosha", 
                                         period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                  mutate(industry = 'Food & Beverages',
                                         state = 'IL',
                                         parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'IL') %>% 
                                   select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_IL = merge(EMP_DAT_CAT_SAN_IL, cu_emp_data_USFoodBeverage_IL, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_IL %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.9 KS - Kansas : Kansas City

cu_emp_data_USFoodBeverage_KS = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="Kansas City", 
                                         period %in% c("S01","S02")) %>% 
                                  mutate(industry = 'Food & Beverages',
                                         state = 'KS',
                                         parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'KS') %>% 
                                   select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_KS = merge(EMP_DAT_CAT_SAN_KS, cu_emp_data_USFoodBeverage_KS, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_KS %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.10 MA - Massachusetts : Boston-Brockton-Nashua

cu_emp_data_USFoodBeverage_MA = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="Boston-Brockton-Nashua", 
                                         period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'MA') %>% 
                                   select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_MA = merge(EMP_DAT_CAT_SAN_MA, cu_emp_data_USFoodBeverage_MA, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_MA %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.11 MI - Michigan : Detroit-Ann Arbor-Flint

cu_emp_data_USFoodBeverage_MI = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="Detroit-Ann Arbor-Flint", 
                                         period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'MI') %>% 
                                   select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_MI = merge(EMP_DAT_CAT_SAN_MI, cu_emp_data_USFoodBeverage_MI, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_MI %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.12 MN - Minnesota : Minneapolis-St. Paul

cu_emp_data_USFoodBeverage_MN = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="Minneapolis-St. Paul", 
                                         period %in% c("S01","S02")) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'MN') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_MN = merge(EMP_DAT_CAT_SAN_MN, cu_emp_data_USFoodBeverage_MN, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_MN %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.13 MO - Missouri : St. Louis

cu_emp_data_USFoodBeverage_MO = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="St. Louis", 
                                         period %in% c("S01","S02")) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'MO') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_MO = merge(EMP_DAT_CAT_SAN_MO, cu_emp_data_USFoodBeverage_MO, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_MO %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.14 NY - New York : New York-Northern New Jersey-Long Island

cu_emp_data_USFoodBeverage_NY = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="New York-Northern New Jersey-Long Island", 
                                         period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'NY') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USFoodBeverage_NY, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_NY %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.15 OH - Ohio

Cleveland-Akron
Cincinnati-Hamilton

cu_emp_data_USFoodBeverage_OH = cu_data_USFoodBeverage %>% 
                                  filter( (area_name %in% c("Cleveland-Akron")
                                          & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                          )
                                         |
                                          (area_name %in% c("Cincinnati-Hamilton")
                                           & period %in% c("S01","S02")
                                           )
                                       ) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         area_code = 'A210;A213', 
                                         area_name = 'Cleveland-Akron;Cincinnati-Hamilton',
                                         industry = 'Food & Beverages',
                                         state = 'OH') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USFoodBeverage_OH = merge(EMP_DAT_CAT_SAN_OH, cu_emp_data_USFoodBeverage_OH, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_OH %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.16 OR - Oregon : Portland-Salem

cu_emp_data_USFoodBeverage_OR = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="Portland-Salem", 
                                         period %in% c("S01","S02")) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'OR') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_OR = merge(EMP_DAT_CAT_SAN_OR, cu_emp_data_USFoodBeverage_OR, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_OR %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.17 PA - Pennsylvania

Philadelphia-Wilmington-Atlantic City
Pittsburgh

cu_emp_data_USFoodBeverage_PA = cu_data_USFoodBeverage %>% 
                                  filter( (area_name %in% c("Philadelphia-Wilmington-Atlantic City")
                                          & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                          )
                                         |
                                          (area_name %in% c("Pittsburgh")
                                           & period %in% c("S01","S02")
                                           )
                                       ) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         area_code = 'A102;A104', 
                                         area_name = 'Philadelphia-Wilmington-Atlantic City;Pittsburgh',
                                         industry = 'Food & Beverages',
                                         state = 'PA') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USFoodBeverage_PA = merge(EMP_DAT_CAT_SAN_PA, cu_emp_data_USFoodBeverage_PA, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_PA %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.18 TX - Texas

Dallas-Fort Worth
Houston-Galveston-Brazoria

cu_emp_data_USFoodBeverage_TX = cu_data_USFoodBeverage %>% 
                                  filter(area_name %in% c("Dallas-Fort Worth","Houston-Galveston-Brazoria"), 
                                         period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         area_code = 'A316;A318', 
                                         area_name = 'Dallas-Fort Worth;Houston-Galveston-Brazoria',
                                         industry = 'Food & Beverages',
                                         state = 'TX') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USFoodBeverage_TX = merge(EMP_DAT_CAT_SAN_TX, cu_emp_data_USFoodBeverage_TX, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_TX %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.19 WA - Washington : Seattle-Tacoma-Bremerton

cu_emp_data_USFoodBeverage_WA = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="Seattle-Tacoma-Bremerton", 
                                         period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'WA') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_WA = merge(EMP_DAT_CAT_SAN_WA, cu_emp_data_USFoodBeverage_WA, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_WA %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.1.20 WI - Wisconsin : Milwaukee-Racine

cu_emp_data_USFoodBeverage_WI = cu_data_USFoodBeverage %>% 
                                  filter(area_name=="Milwaukee-Racine", 
                                         period %in% c("S01","S02")) %>% 
                                  mutate(parent_item_code = 'SAF', 
                                         emp_line_code = 705,
                                         industry = 'Food & Beverages',
                                         state = 'WI') %>% 
                                  select(c("industry",
                                            "state",
                                            "parent_item_code",
                                            "emp_line_code",
                                            "series_id",
                                            "series_title",
                                            "year",
                                            "period",
                                            "period_abbr",
                                            "period_name",
                                            "value",
                                            "area_code",
                                            "area_name",
                                            "area_display_level",
                                            "area_selectable",
                                            "area_sort_sequence",
                                            "item_code",
                                            "item_name",
                                            "item_display_level",
                                            "item_selectable",
                                            "item_sort_sequence",
                                            "seasonal",
                                            "periodicity_code",
                                            "periodicity_name",
                                            "base_code",
                                            "base_name",
                                            "base_period")) %>% 
                                  group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                  summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USFoodBeverage_WI = merge(EMP_DAT_CAT_SAN_WI, cu_emp_data_USFoodBeverage_WI, 
                                       by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USFoodBeverage_WI %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.2 Housing

4.2.1 AK - Alaska : Anchorage

cu_emp_data_USHousing_AK = cu_data_USHousing  %>% 
                            filter(area_name=="Anchorage", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'AK') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_AK = merge(EMP_DAT_CAT_SAN_AK, cu_emp_data_USHousing_AK, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_AK %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.2 AZ - Arizona : Phoenix-Mesa

cu_emp_data_USHousing_AZ = cu_data_USHousing  %>% 
                            filter(area_name=="Phoenix-Mesa", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'AZ') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_AZ = merge(EMP_DAT_CAT_SAN_AZ, cu_emp_data_USHousing_AZ, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_AZ %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.3 CA - California

Los Angeles-Riverside-Orange County
San Francisco-Oakland-San Jose
San Diego

cu_emp_data_USHousing_CA = cu_data_USHousing  %>% 
                            filter( (area_name %in% c("Los Angeles-Riverside-Orange County", "San Francisco-Oakland-San Jose")
                                      & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                      )
                                     |
                                      (area_name %in% c("San Diego")
                                       & period %in% c("S01","S02")
                                       )
                                   ) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   area_code = 'A421;A422;A424',
                                   area_name = 'Los Angeles-Riverside-Orange County;San Francisco-Oakland-San Jose;San Diego',
                                   industry = 'Housing',
                                   state = 'CA') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value/3),2))
cpi_emp_data_USHousing_CA = merge(EMP_DAT_CAT_SAN_CA, cu_emp_data_USHousing_CA, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_CA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.4 CO - Colorado : Denver-Boulder-Greeley

cu_emp_data_USHousing_CO = cu_data_USHousing  %>% 
                            filter(area_name=="Denver-Boulder-Greeley", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'CO') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_CO = merge(EMP_DAT_CAT_SAN_CO, cu_emp_data_USHousing_CO, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_CO %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.5 FL - Florida

Miami-Fort Lauderdale
Tampa-St. Petersburg-Clearwater

cu_emp_data_USHousing_FL = cu_data_USHousing  %>% 
                            filter( (area_name %in% c("Miami-Fort Lauderdale")
                                    & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                    )
                                   |
                                    (area_name %in% c("Tampa-St. Petersburg-Clearwater")
                                     & period %in% c("S01","S02")
                                     )
                                 ) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   area_code = 'A320;A321',
                                   area_name = 'Miami-Fort Lauderdale;Tampa-St. Petersburg-Clearwater',
                                   industry = 'Housing',
                                   state = 'FL') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USHousing_FL = merge(EMP_DAT_CAT_SAN_FL, cu_emp_data_USHousing_FL, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_FL %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.6 GA - Georgia : Atlanta

cu_emp_data_USHousing_GA = cu_data_USHousing  %>% 
                            filter(area_name=="Atlanta", 
                                   period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'GA') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_GA = merge(EMP_DAT_CAT_SAN_GA, cu_emp_data_USHousing_GA, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_GA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.7 HI - Hawaii : Honolulu

cu_emp_data_USHousing_HI = cu_data_USHousing  %>% 
                            filter(area_name=="Honolulu", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'HI') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_HI = merge(EMP_DAT_CAT_SAN_HI, cu_emp_data_USHousing_HI, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_HI %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.8 IL - Illinois : Chicago-Gary-Kenosha

cu_emp_data_USHousing_IL = cu_data_USHousing  %>% 
                            filter(area_name=="Chicago-Gary-Kenosha", 
                                   period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'IL') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_IL = merge(EMP_DAT_CAT_SAN_IL, cu_emp_data_USHousing_IL, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_IL %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.9 KS - Kansas : Kansas City

cu_emp_data_USHousing_KS = cu_data_USHousing  %>% 
                            filter(area_name=="Kansas City", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'KS') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_KS = merge(EMP_DAT_CAT_SAN_KS, cu_emp_data_USHousing_KS, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_KS %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.10 MA - Massachusetts : Boston-Brockton-Nashua

cu_emp_data_USHousing_MA = cu_data_USHousing  %>% 
                            filter(area_name=="Boston-Brockton-Nashua", 
                                   period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'MA') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_MA = merge(EMP_DAT_CAT_SAN_MA, cu_emp_data_USHousing_MA, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_MA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.11 MI - Michigan

cu_emp_data_USHousing_MI = cu_data_USHousing  %>% 
                            filter(area_name=="Detroit-Ann Arbor-Flint", 
                                   period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'MI') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_MI = merge(EMP_DAT_CAT_SAN_MI, cu_emp_data_USHousing_MI, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_MI %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.12 MN - Minnesota : Minneapolis-St. Paul

cu_emp_data_USHousing_MN = cu_data_USHousing  %>% 
                            filter(area_name=="Minneapolis-St. Paul", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'MN') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_MN = merge(EMP_DAT_CAT_SAN_MN, cu_emp_data_USHousing_MN, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_MN %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.13 MO - Missouri : St. Louis

cu_emp_data_USHousing_MO = cu_data_USHousing  %>% 
                            filter(area_name=="St. Louis", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'MO') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_MO = merge(EMP_DAT_CAT_SAN_MO, cu_emp_data_USHousing_MO, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_MO %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.14 NY - New York : New York-Northern New Jersey-Long Island

cu_emp_data_USHousing_NY = cu_data_USHousing  %>% 
                            filter(area_name=="New York-Northern New Jersey-Long Island", 
                                   period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'NY') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USHousing_NY, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_NY %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.15 OH - Ohio

Cleveland-Akron
Cincinnati-Hamilton

cu_emp_data_USHousing_OH = cu_data_USHousing  %>% 
                                  filter( (area_name %in% c("Cleveland-Akron")
                                          & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                          )
                                         |
                                          (area_name %in% c("Cincinnati-Hamilton")
                                           & period %in% c("S01","S02")
                                           )
                                       ) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   area_code = 'A210;A213', 
                                   area_name = 'Cleveland-Akron;Cincinnati-Hamilton',
                                   industry = 'Housing',
                                   state = 'OH') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USHousing_OH = merge(EMP_DAT_CAT_SAN_OH, cu_emp_data_USHousing_OH, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_OH %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.16 OR - Oregon : Portland-Salem

cu_emp_data_USHousing_OR = cu_data_USHousing  %>% 
                            filter(area_name=="Portland-Salem", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'OR') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_OR = merge(EMP_DAT_CAT_SAN_OR, cu_emp_data_USHousing_OR, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_OR %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.17 PA - Pennsylvania

Philadelphia-Wilmington-Atlantic City
Pittsburgh

cu_emp_data_USHousing_PA = cu_data_USHousing  %>% 
                            filter( (area_name %in% c("Philadelphia-Wilmington-Atlantic City")
                                    & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                    )
                                   |
                                    (area_name %in% c("Pittsburgh")
                                     & period %in% c("S01","S02")
                                     )
                                 ) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   area_code = 'A102;A104', 
                                   area_name = 'Philadelphia-Wilmington-Atlantic City;Pittsburgh',
                                   industry = 'Housing',
                                   state = 'PA') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USHousing_PA = merge(EMP_DAT_CAT_SAN_PA, cu_emp_data_USHousing_PA, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_PA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.18 TX - Texas

Dallas-Fort Worth
Houston-Galveston-Brazoria

cu_emp_data_USHousing_TX = cu_data_USHousing  %>% 
                            filter(area_name %in% c("Dallas-Fort Worth","Houston-Galveston-Brazoria"), 
                                         period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   area_code = 'A316;A318', 
                                   area_name = 'Dallas-Fort Worth;Houston-Galveston-Brazoria',
                                   industry = 'Housing',
                                   state = 'TX') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USHousing_TX = merge(EMP_DAT_CAT_SAN_TX, cu_emp_data_USHousing_TX, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_TX %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.19 WA - Washington : Seattle-Tacoma-Bremerton

cu_emp_data_USHousing_WA = cu_data_USHousing  %>% 
                            filter(area_name=="Seattle-Tacoma-Bremerton", 
                                   period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'WA') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_WA = merge(EMP_DAT_CAT_SAN_WA, cu_emp_data_USHousing_WA, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_WA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.2.20 WI - Wisconsin : Milwaukee-Racine

cu_emp_data_USHousing_WI = cu_data_USHousing  %>% 
                            filter(area_name=="Milwaukee-Racine", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAH', 
                                   emp_line_code = 1100,
                                   industry = 'Housing',
                                   state = 'WI') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USHousing_WI = merge(EMP_DAT_CAT_SAN_WI, cu_emp_data_USHousing_WI, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USHousing_WI %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3 Transportation

4.3.1 AK - Alaska : Anchorage

cu_emp_data_USTransportation_AK = cu_data_USTransportation %>% 
                                    filter(area_name=="Anchorage", 
                                           period %in% c("S01","S02")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'AK') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_AK = merge(EMP_DAT_CAT_SAN_AK, cu_emp_data_USTransportation_AK, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_AK %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.2 AZ - Arizona : Phoenix-Mesa

cu_emp_data_USTransportation_AZ = cu_data_USTransportation %>% 
                                    filter(area_name=="Phoenix-Mesa", 
                                           period %in% c("S01","S02")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'AZ') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_AZ = merge(EMP_DAT_CAT_SAN_AZ, cu_emp_data_USTransportation_AZ, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_AZ %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.3 CA - California

Los Angeles-Riverside-Orange County
San Francisco-Oakland-San Jose
San Diego

cu_emp_data_USTransportation_CA = cu_data_USTransportation %>% 
                                   filter( (area_name %in% c("Los Angeles-Riverside-Orange County", "San Francisco-Oakland-San Jose")
                                            & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                            )
                                           |
                                            (area_name %in% c("San Diego")
                                             & period %in% c("S01","S02")
                                             )
                                         ) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           area_code = 'A421;A422;A424',
                                           area_name = 'Los Angeles-Riverside-Orange County;San Francisco-Oakland-San Jose;San Diego',
                                           industry = 'Transportation',
                                           state = 'CA') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value/3),2))
cpi_emp_data_USTransportation_CA = merge(EMP_DAT_CAT_SAN_CA, cu_emp_data_USTransportation_CA, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_CA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.4 CO - Colorado : Denver-Boulder-Greeley

cu_emp_data_USTransportation_CO = cu_data_USTransportation %>% 
                                    filter(area_name=="Denver-Boulder-Greeley", 
                                           period %in% c("S01","S02")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'CO') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_CO = merge(EMP_DAT_CAT_SAN_CO, cu_emp_data_USTransportation_CO, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_CO %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.5 FL - Florida

Miami-Fort Lauderdale
Tampa-St. Petersburg-Clearwater

cu_emp_data_USTransportation_FL = cu_data_USTransportation %>% 
                                    filter( (area_name %in% c("Miami-Fort Lauderdale")
                                            & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                            )
                                           |
                                            (area_name %in% c("Tampa-St. Petersburg-Clearwater")
                                             & period %in% c("S01","S02")
                                             )
                                         ) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           area_code = 'A320;A321',
                                           area_name = 'Miami-Fort Lauderdale;Tampa-St. Petersburg-Clearwater',
                                           industry = 'Transportation',
                                           state = 'FL') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USTransportation_FL = merge(EMP_DAT_CAT_SAN_FL, cu_emp_data_USTransportation_FL, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_FL %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.6 GA - Georgia : Atlanta

cu_emp_data_USTransportation_GA = cu_data_USTransportation %>% 
                                    filter(area_name=="Atlanta", 
                                           period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'GA') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_GA = merge(EMP_DAT_CAT_SAN_GA, cu_emp_data_USTransportation_GA, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_GA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.7 HI - Hawaii : Honolulu

cu_emp_data_USTransportation_HI = cu_data_USTransportation %>% 
                                    filter(area_name=="Honolulu", 
                                           period %in% c("S01","S02")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'HI') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_HI = merge(EMP_DAT_CAT_SAN_HI, cu_emp_data_USTransportation_HI, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_HI %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.8 IL - Illinois : Chicago-Gary-Kenosha

cu_emp_data_USTransportation_IL = cu_data_USTransportation %>% 
                                    filter(area_name=="Chicago-Gary-Kenosha", 
                                           period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'IL') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_IL = merge(EMP_DAT_CAT_SAN_IL, cu_emp_data_USTransportation_IL, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_IL %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.9 KS - Kansas : Kansas City

cu_emp_data_USTransportation_KS = cu_data_USTransportation %>% 
                                    filter(area_name=="Kansas City", 
                                           period %in% c("S01","S02")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'KS') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_KS = merge(EMP_DAT_CAT_SAN_KS, cu_emp_data_USTransportation_KS, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_KS %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.10 MA - Massachusetts : Boston-Brockton-Nashua

cu_emp_data_USTransportation_MA = cu_data_USTransportation %>% 
                                    filter(area_name=="Boston-Brockton-Nashua", 
                                           period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'MA') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_MA = merge(EMP_DAT_CAT_SAN_MA, cu_emp_data_USTransportation_MA, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_MA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.11 MI - Michigan : Detroit-Ann Arbor-Flint

cu_emp_data_USTransportation_MI = cu_data_USTransportation %>% 
                                    filter(area_name=="Detroit-Ann Arbor-Flint", 
                                           period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'MI') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_MI = merge(EMP_DAT_CAT_SAN_MI, cu_emp_data_USTransportation_MI, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_MI %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.12 MN - Minnesota : Minneapolis-St. Paul

cu_emp_data_USTransportation_MN = cu_data_USTransportation %>% 
                                    filter(area_name=="Minneapolis-St. Paul", 
                                           period %in% c("S01","S02")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'MN') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_MN = merge(EMP_DAT_CAT_SAN_MN, cu_emp_data_USTransportation_MN, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_MN %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.13 MO - Missouri : St. Louis

cu_emp_data_USTransportation_MO = cu_data_USTransportation %>% 
                                    filter(area_name=="St. Louis", 
                                           period %in% c("S01","S02")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'MO') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_MO = merge(EMP_DAT_CAT_SAN_MO, cu_emp_data_USTransportation_MO, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_MO %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.14 NY - New York : New York-Northern New Jersey-Long Island

cu_emp_data_USTransportation_NY = cu_data_USTransportation %>% 
                                    filter(area_name=="New York-Northern New Jersey-Long Island", 
                                           period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'NY') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USTransportation_NY, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_NY %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.15 OH - Ohio

Cleveland-Akron
Cincinnati-Hamilton

cu_emp_data_USTransportation_OH = cu_data_USTransportation %>% 
                                    filter( (area_name %in% c("Cleveland-Akron")
                                            & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                            )
                                           |
                                            (area_name %in% c("Cincinnati-Hamilton")
                                             & period %in% c("S01","S02")
                                             )
                                         ) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           area_code = 'A210;A213', 
                                           area_name = 'Cleveland-Akron;Cincinnati-Hamilton',
                                           industry = 'Transportation',
                                           state = 'OH') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USTransportation_OH = merge(EMP_DAT_CAT_SAN_OH, cu_emp_data_USTransportation_OH, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_OH %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.16 OR - Oregon : Portland-Salem

cu_emp_data_USTransportation_OR = cu_data_USTransportation %>% 
                                    filter(area_name=="Portland-Salem", 
                                           period %in% c("S01","S02")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'OR') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_OR = merge(EMP_DAT_CAT_SAN_OR, cu_emp_data_USTransportation_OR, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_OR %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.17 PA - Pennsylvania

Philadelphia-Wilmington-Atlantic City
Pittsburgh

cu_emp_data_USTransportation_PA = cu_data_USTransportation %>% 
                                    filter( (area_name %in% c("Philadelphia-Wilmington-Atlantic City")
                                            & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                            )
                                           |
                                            (area_name %in% c("Pittsburgh")
                                             & period %in% c("S01","S02")
                                             )
                                         ) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           area_code = 'A102;A104', 
                                           area_name = 'Philadelphia-Wilmington-Atlantic City;Pittsburgh',
                                           industry = 'Transportation',
                                           state = 'PA') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USTransportation_PA = merge(EMP_DAT_CAT_SAN_PA, cu_emp_data_USTransportation_PA, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_PA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.18 TX - Texas

Dallas-Fort Worth
Houston-Galveston-Brazoria

cu_emp_data_USTransportation_TX = cu_data_USTransportation %>% 
                                    filter(area_name %in% c("Dallas-Fort Worth","Houston-Galveston-Brazoria"), 
                                           period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           area_code = 'A316;A318', 
                                           area_name = 'Dallas-Fort Worth;Houston-Galveston-Brazoria',
                                           industry = 'Transportation',
                                           state = 'TX') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USTransportation_TX = merge(EMP_DAT_CAT_SAN_TX, cu_emp_data_USTransportation_TX, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_TX %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.19 WA - Washington : Seattle-Tacoma-Bremerton

cu_emp_data_USTransportation_WA = cu_data_USTransportation %>% 
                                    filter(area_name=="Seattle-Tacoma-Bremerton", 
                                           period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'WA') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_WA = merge(EMP_DAT_CAT_SAN_WA, cu_emp_data_USTransportation_WA, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_WA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.3.20 WI - Wisconsin : Milwaukee-Racine

cu_emp_data_USTransportation_WI = cu_data_USTransportation %>% 
                                    filter(area_name=="Milwaukee-Racine", 
                                           period %in% c("S01","S02")) %>% 
                                    mutate(parent_item_code = 'SAT', 
                                           emp_line_code = 800,
                                           industry = 'Transportation',
                                           state = 'WI') %>% 
                                    select(c("industry",
                                             "state",
                                             "parent_item_code",
                                             "emp_line_code",
                                             "series_id",
                                             "series_title",
                                             "year",
                                             "period",
                                             "period_abbr",
                                             "period_name",
                                             "value",
                                             "area_code",
                                             "area_name",
                                             "area_display_level",
                                             "area_selectable",
                                             "area_sort_sequence",
                                             "item_code",
                                             "item_name",
                                             "item_display_level",
                                             "item_selectable",
                                             "item_sort_sequence",
                                             "seasonal",
                                             "periodicity_code",
                                             "periodicity_name",
                                             "base_code",
                                             "base_name",
                                             "base_period")) %>% 
                                    group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                    summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USTransportation_WI = merge(EMP_DAT_CAT_SAN_WI, cu_emp_data_USTransportation_WI, 
                                         by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USTransportation_WI %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4 Medical

4.4.1 AK - Alaska : Anchorage

cu_emp_data_USMedical_AK = cu_data_USMedical %>% 
                            filter(area_name=="Anchorage", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'AK') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_AK = merge(EMP_DAT_CAT_SAN_AK, cu_emp_data_USMedical_AK, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_AK %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.2 AZ - Arizona : Phoenix-Mesa

cu_emp_data_USMedical_AZ = cu_data_USMedical %>% 
                            filter(area_name=="Phoenix-Mesa", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'AZ') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_AZ = merge(EMP_DAT_CAT_SAN_AZ, cu_emp_data_USMedical_AZ, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_AZ %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.3 CA - California

Los Angeles-Riverside-Orange County
San Francisco-Oakland-San Jose
San Diego

cu_emp_data_USMedical_CA = cu_data_USMedical %>% 
                             filter( (area_name %in% c("Los Angeles-Riverside-Orange County", "San Francisco-Oakland-San Jose")
                                      & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                      )
                                     |
                                      (area_name %in% c("San Diego")
                                       & period %in% c("S01","S02")
                                       )
                                   ) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   area_code = 'A421;A422;A424',
                                   area_name = 'Los Angeles-Riverside-Orange County;San Francisco-Oakland-San Jose;San Diego',
                                   industry = 'Medical',
                                   state = 'CA') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value/3),2))
cpi_emp_data_USMedical_CA = merge(EMP_DAT_CAT_SAN_CA, cu_emp_data_USMedical_CA, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_CA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.4 CO - Colorado : Denver-Boulder-Greeley

cu_emp_data_USMedical_CO = cu_data_USMedical %>% 
                            filter(area_name=="Denver-Boulder-Greeley", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'CO') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_CO = merge(EMP_DAT_CAT_SAN_CO, cu_emp_data_USMedical_CO, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_CO %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.5 FL - Florida

Miami-Fort Lauderdale
Tampa-St. Petersburg-Clearwater

cu_emp_data_USMedical_FL = cu_data_USMedical %>% 
                            filter( (area_name %in% c("Miami-Fort Lauderdale")
                                    & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                    )
                                   |
                                    (area_name %in% c("Tampa-St. Petersburg-Clearwater")
                                     & period %in% c("S01","S02")
                                     )
                                 ) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   area_code = 'A320;A321',
                                   area_name = 'Miami-Fort Lauderdale;Tampa-St. Petersburg-Clearwater',
                                   industry = 'Medical',
                                   state = 'FL') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_FL = merge(EMP_DAT_CAT_SAN_FL, cu_emp_data_USMedical_FL, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_FL %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.6 GA - Georgia : Atlanta

cu_emp_data_USMedical_GA = cu_data_USMedical %>% 
                            filter(area_name=="Atlanta", 
                                   period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'GA') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_GA = merge(EMP_DAT_CAT_SAN_GA, cu_emp_data_USMedical_GA, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_GA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.7 HI - Hawaii : Honolulu

cu_emp_data_USMedical_HI = cu_data_USMedical %>% 
                            filter(area_name=="Honolulu", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'HI') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_HI = merge(EMP_DAT_CAT_SAN_HI, cu_emp_data_USMedical_HI, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_HI %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.8 IL - Illinois : Chicago-Gary-Kenosha

cu_emp_data_USMedical_IL = cu_data_USMedical %>% 
                            filter(area_name=="Chicago-Gary-Kenosha", 
                                   period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'IL') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_IL = merge(EMP_DAT_CAT_SAN_IL, cu_emp_data_USMedical_IL, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_IL %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.9 KS - Kansas : Kansas City

cu_emp_data_USMedical_KS = cu_data_USMedical %>% 
                            filter(area_name=="Kansas City", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'KS') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_KS = merge(EMP_DAT_CAT_SAN_KS, cu_emp_data_USMedical_KS, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_KS %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.10 MA - Massachusetts : Boston-Brockton-Nashua

cu_emp_data_USMedical_MA = cu_data_USMedical %>% 
                            filter(area_name=="Boston-Brockton-Nashua", 
                                   period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'MA') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_MA = merge(EMP_DAT_CAT_SAN_MA, cu_emp_data_USMedical_MA, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_MA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.11 MI - Michigan : Detroit-Ann Arbor-Flint

cu_emp_data_USMedical_MI = cu_data_USMedical %>% 
                            filter(area_name=="Detroit-Ann Arbor-Flint", 
                                   period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'MI') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_MI = merge(EMP_DAT_CAT_SAN_MI, cu_emp_data_USMedical_MI, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_MI %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.12 MN - Minnesota : Minneapolis-St. Paul

cu_emp_data_USMedical_MN = cu_data_USMedical %>% 
                            filter(area_name=="Minneapolis-St. Paul", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'MN') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_MN = merge(EMP_DAT_CAT_SAN_MN, cu_emp_data_USMedical_MN, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_MN %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.13 MO - Missouri : St. Louis

cu_emp_data_USMedical_MO = cu_data_USMedical %>% 
                            filter(area_name=="St. Louis", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'MO') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_MO = merge(EMP_DAT_CAT_SAN_MO, cu_emp_data_USMedical_MO, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_MO %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.14 NY - New York

cu_emp_data_USMedical_NY = cu_data_USMedical %>% 
                            filter(area_name=="New York-Northern New Jersey-Long Island", 
                                   period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'NY') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_NY = merge(EMP_DAT_CAT_SAN_NY, cu_emp_data_USMedical_NY, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_NY %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.15 OH - Ohio

Cleveland-Akron
Cincinnati-Hamilton

cu_emp_data_USMedical_OH = cu_data_USMedical %>% 
                            filter( (area_name %in% c("Cleveland-Akron")
                                    & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                    )
                                   |
                                    (area_name %in% c("Cincinnati-Hamilton")
                                     & period %in% c("S01","S02")
                                     )
                                 ) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   area_code = 'A210;A213', 
                                   area_name = 'Cleveland-Akron;Cincinnati-Hamilton',
                                   industry = 'Medical',
                                   state = 'OH') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USMedical_OH = merge(EMP_DAT_CAT_SAN_OH, cu_emp_data_USMedical_OH, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_OH %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.16 OR - Oregon : Portland-Salem

cu_emp_data_USMedical_OR = cu_data_USMedical %>% 
                            filter(area_name=="Portland-Salem", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'OR') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_OR = merge(EMP_DAT_CAT_SAN_OR, cu_emp_data_USMedical_OR, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_OR %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.17 PA - Pennsylvania

Philadelphia-Wilmington-Atlantic City
Pittsburgh

cu_emp_data_USMedical_PA = cu_data_USMedical %>% 
                                    filter( (area_name %in% c("Philadelphia-Wilmington-Atlantic City")
                                            & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                            )
                                           |
                                            (area_name %in% c("Pittsburgh")
                                             & period %in% c("S01","S02")
                                             )
                                         ) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   area_code = 'A102;A104', 
                                   area_name = 'Philadelphia-Wilmington-Atlantic City;Pittsburgh',
                                   industry = 'Medical',
                                   state = 'PA') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USMedical_PA = merge(EMP_DAT_CAT_SAN_PA, cu_emp_data_USMedical_PA, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_PA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.18 TX - Texas

Dallas-Fort Worth
Houston-Galveston-Brazoria

cu_emp_data_USMedical_TX = cu_data_USMedical %>% 
                            filter(area_name %in% c("Dallas-Fort Worth","Houston-Galveston-Brazoria"), 
                                           period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   area_code = 'A316;A318', 
                                   area_name = 'Dallas-Fort Worth;Houston-Galveston-Brazoria',
                                   industry = 'Medical',
                                   state = 'TX') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USMedical_TX = merge(EMP_DAT_CAT_SAN_TX, cu_emp_data_USMedical_TX, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_TX %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.19 WA - Washington : Seattle-Tacoma-Bremerton

cu_emp_data_USMedical_WA = cu_data_USMedical %>% 
                            filter(area_name=="Seattle-Tacoma-Bremerton", 
                                   period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'WA') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_WA = merge(EMP_DAT_CAT_SAN_WA, cu_emp_data_USMedical_WA, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_WA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.4.20 WI - Wisconsin : Milwaukee-Racine

cu_emp_data_USMedical_WI = cu_data_USMedical %>% 
                            filter(area_name=="Milwaukee-Racine", 
                                   period %in% c("S01","S02")) %>% 
                            mutate(parent_item_code = 'SAM', 
                                   emp_line_code = 1600,
                                   industry = 'Medical',
                                   state = 'WI') %>% 
                            select(c("industry",
                                     "state",
                                     "parent_item_code",
                                     "emp_line_code",
                                     "series_id",
                                     "series_title",
                                     "year",
                                     "period",
                                     "period_abbr",
                                     "period_name",
                                     "value",
                                     "area_code",
                                     "area_name",
                                     "area_display_level",
                                     "area_selectable",
                                     "area_sort_sequence",
                                     "item_code",
                                     "item_name",
                                     "item_display_level",
                                     "item_selectable",
                                     "item_sort_sequence",
                                     "seasonal",
                                     "periodicity_code",
                                     "periodicity_name",
                                     "base_code",
                                     "base_name",
                                     "base_period")) %>% 
                            group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                            summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USMedical_WI = merge(EMP_DAT_CAT_SAN_WI, cu_emp_data_USMedical_WI, 
                                  by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USMedical_WI %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5 Education & Communication

4.5.1 AK - Alaska : Anchorage

cu_data_USEducationAndCommunication_AK = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="Anchorage", 
                                                 period %in% c("S01","S02")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'AK') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_AK = merge(EMP_DAT_CAT_SAN_AK, cu_data_USEducationAndCommunication_AK, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_AK %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5.2 AZ - Arizona : Phoenix-Mesa

cu_data_USEducationAndCommunication_AZ = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="Phoenix-Mesa", 
                                                 period %in% c("S01","S02")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'AZ') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_AZ = merge(EMP_DAT_CAT_SAN_AZ, cu_data_USEducationAndCommunication_AZ, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_AZ %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5.3 CA - California

Los Angeles-Riverside-Orange County
San Francisco-Oakland-San Jose
San Diego

cu_data_USEducationAndCommunication_CA = cu_data_USEducationAndCommunication %>% 
                                          filter( (area_name %in% c("Los Angeles-Riverside-Orange County", "San Francisco-Oakland-San Jose")
                                                    & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                                    )
                                                   |
                                                    (area_name %in% c("San Diego")
                                                     & period %in% c("S01","S02")
                                                     )
                                                 ) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 area_code = 'A421;A422;A424',
                                                 area_name = 'Los Angeles-Riverside-Orange County;San Francisco-Oakland-San Jose;San Diego',
                                                 industry = 'Education & Communication',
                                                 state = 'CA') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value/3),2))
cpi_emp_data_USEducationAndCommunication_CA = merge(EMP_DAT_CAT_SAN_CA, cu_data_USEducationAndCommunication_CA, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_CA %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.5.4 CO - Colorado : Denver-Boulder-Greeley

cu_data_USEducationAndCommunication_CO = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="Denver-Boulder-Greeley", 
                                                 period %in% c("S01","S02")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'CO') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_CO = merge(EMP_DAT_CAT_SAN_CO, cu_data_USEducationAndCommunication_CO, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_CO %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.5.5 FL - Florida

Miami-Fort Lauderdale
Tampa-St. Petersburg-Clearwater

cu_data_USEducationAndCommunication_FL = cu_data_USEducationAndCommunication %>% 
                                          filter( (area_name %in% c("Miami-Fort Lauderdale")
                                                  & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                                  )
                                                 |
                                                  (area_name %in% c("Tampa-St. Petersburg-Clearwater")
                                                   & period %in% c("S01","S02")
                                                   )
                                               ) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 area_code = 'A320;A321',
                                                 area_name = 'Miami-Fort Lauderdale;Tampa-St. Petersburg-Clearwater',
                                                 industry = 'Education & Communication',
                                                 state = 'FL') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_FL = merge(EMP_DAT_CAT_SAN_FL, cu_data_USEducationAndCommunication_FL, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_FL %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.5.6 GA - Georgia : Atlanta

cu_data_USEducationAndCommunication_GA = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="Atlanta", 
                                                 period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'GA') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_GA = merge(EMP_DAT_CAT_SAN_GA, cu_data_USEducationAndCommunication_GA, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_GA %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.5.7 HI - Hawaii : Honolulu

cu_data_USEducationAndCommunication_HI = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="Honolulu", 
                                                 period %in% c("S01","S02")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'HI') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_HI = merge(EMP_DAT_CAT_SAN_HI, cu_data_USEducationAndCommunication_HI, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_HI %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.5.8 IL - Illinois : Chicago-Gary-Kenosha

cu_data_USEducationAndCommunication_IL = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="Chicago-Gary-Kenosha", 
                                                 period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'IL') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_IL = merge(EMP_DAT_CAT_SAN_IL, cu_data_USEducationAndCommunication_IL, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_IL %>%
  ###sample_frac(0.33) %>% 
  DT::datatable()

4.5.9 KS - Kansas : Kansas City

cu_data_USEducationAndCommunication_KS = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="Kansas City", 
                                                 period %in% c("S01","S02")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'KS') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_KS = merge(EMP_DAT_CAT_SAN_KS, cu_data_USEducationAndCommunication_KS, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_KS %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5.10 MA - Massachusetts : Boston-Brockton-Nashua

cu_data_USEducationAndCommunication_MA = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="Boston-Brockton-Nashua", 
                                                 period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'MA') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_MA = merge(EMP_DAT_CAT_SAN_MA, cu_data_USEducationAndCommunication_MA, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_MA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5.11 MI - Michigan

cu_data_USEducationAndCommunication_MI = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="Detroit-Ann Arbor-Flint", 
                                                 period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'MI') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_MI = merge(EMP_DAT_CAT_SAN_MI, cu_data_USEducationAndCommunication_MI, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_MI %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5.12 MN - Minnesota : Minneapolis-St. Paul

cu_data_USEducationAndCommunication_MN = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="Minneapolis-St. Paul", 
                                                 period %in% c("S01","S02")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'MN') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_MN = merge(EMP_DAT_CAT_SAN_MN, cu_data_USEducationAndCommunication_MN, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_MN %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5.13 MO - Missouri : St. Louis

cu_data_USEducationAndCommunication_MO = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="St. Louis", 
                                                 period %in% c("S01","S02")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'MO') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_MO = merge(EMP_DAT_CAT_SAN_MO, cu_data_USEducationAndCommunication_MO, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_MO %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5.14 NY - New York

cu_data_USEducationAndCommunication_NY = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="New York-Northern New Jersey-Long Island", 
                                                 period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'NY') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_NY = merge(EMP_DAT_CAT_SAN_NY, cu_data_USEducationAndCommunication_NY, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_NY %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5.15 OH - Ohio

Cleveland-Akron
Cincinnati-Hamilton

cu_data_USEducationAndCommunication_OH = cu_data_USEducationAndCommunication %>% 
                                          filter( (area_name %in% c("Cleveland-Akron")
                                                  & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                                  )
                                                 |
                                                  (area_name %in% c("Cincinnati-Hamilton")
                                                   & period %in% c("S01","S02")
                                                   )
                                               ) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 area_code = 'A210;A213', 
                                                 area_name = 'Cleveland-Akron;Cincinnati-Hamilton',
                                                 industry = 'Education & Communication',
                                                 state = 'OH') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USEducationAndCommunication_OH = merge(EMP_DAT_CAT_SAN_OH, cu_data_USEducationAndCommunication_OH, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_OH %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5.16 OR - Oregon : Portland-Salem

cu_data_USEducationAndCommunication_OR = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="Portland-Salem", 
                                                 period %in% c("S01","S02")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'OR') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_OR = merge(EMP_DAT_CAT_SAN_OR, cu_data_USEducationAndCommunication_OR, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_OR %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5.17 PA - Pennsylvania

Philadelphia-Wilmington-Atlantic City
Pittsburgh

cu_data_USEducationAndCommunication_PA = cu_data_USEducationAndCommunication %>% 
                                          filter( (area_name %in% c("Philadelphia-Wilmington-Atlantic City")
                                                  & period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")
                                                  )
                                                 |
                                                  (area_name %in% c("Pittsburgh")
                                                   & period %in% c("S01","S02")
                                                   )
                                               ) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 area_code = 'A102;A104', 
                                                 area_name = 'Philadelphia-Wilmington-Atlantic City;Pittsburgh',
                                                 industry = 'Education & Communication',
                                                 state = 'PA') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USEducationAndCommunication_PA = merge(EMP_DAT_CAT_SAN_PA, cu_data_USEducationAndCommunication_PA, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_PA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5.18 TX - Texas

Dallas-Fort Worth
Houston-Galveston-Brazoria

cu_data_USEducationAndCommunication_TX = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name %in% c("Dallas-Fort Worth","Houston-Galveston-Brazoria"), 
                                                 period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>%
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 area_code = 'A316;A318', 
                                                 area_name = 'Dallas-Fort Worth;Houston-Galveston-Brazoria',
                                                 industry = 'Education & Communication',
                                                 state = 'TX') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value)/2,2))
cpi_emp_data_USEducationAndCommunication_TX = merge(EMP_DAT_CAT_SAN_TX, cu_data_USEducationAndCommunication_TX, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_TX %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5.19 WA - Washington : Seattle-Tacoma-Bremerton

cu_data_USEducationAndCommunication_WA = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="Seattle-Tacoma-Bremerton", 
                                                 period %in% c("M01","M02","M03","M04","M05","M06","M07","M08","M09","M10","M11","M12")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'WA') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_WA = merge(EMP_DAT_CAT_SAN_WA, cu_data_USEducationAndCommunication_WA, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_WA %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.5.20 WI - Wisconsin : Milwaukee-Racine

cu_data_USEducationAndCommunication_WI = cu_data_USEducationAndCommunication %>% 
                                          filter(area_name=="Milwaukee-Racine", 
                                                 period %in% c("S01","S02")) %>% 
                                          mutate(parent_item_code = 'SAE', 
                                                 emp_line_code = 1500,
                                                 industry = 'Education & Communication',
                                                 state = 'WI') %>% 
                                          select(c("industry",
                                                   "state",
                                                   "parent_item_code",
                                                   "emp_line_code",
                                                   "series_id",
                                                   "series_title",
                                                   "year",
                                                   "period",
                                                   "period_abbr",
                                                   "period_name",
                                                   "value",
                                                   "area_code",
                                                   "area_name",
                                                   "area_display_level",
                                                   "area_selectable",
                                                   "area_sort_sequence",
                                                   "item_code",
                                                   "item_name",
                                                   "item_display_level",
                                                   "item_selectable",
                                                   "item_sort_sequence",
                                                   "seasonal",
                                                   "periodicity_code",
                                                   "periodicity_name",
                                                   "base_code",
                                                   "base_name",
                                                   "base_period")) %>% 
                                          group_by(industry, state, parent_item_code, emp_line_code, year, area_code, area_name) %>% 
                                          summarise(cpi_value = round(sum(value),2))
cpi_emp_data_USEducationAndCommunication_WI = merge(EMP_DAT_CAT_SAN_WI, cu_data_USEducationAndCommunication_WI, 
                                                    by.x=c("LineCode","Year"), by.y=c("emp_line_code","year"))
cpi_emp_data_USEducationAndCommunication_WI %>%
#  sample_frac(0.33) %>% 
  DT::datatable()

4.6 Combined Employment and CPI Data

cpi_emp_data_Combined <- rbind(cpi_emp_data_USFoodBeverage_AK,
                                cpi_emp_data_USFoodBeverage_AZ,
                                cpi_emp_data_USFoodBeverage_CA,
                                cpi_emp_data_USFoodBeverage_CO,
                                cpi_emp_data_USFoodBeverage_FL,
                                cpi_emp_data_USFoodBeverage_GA,
                                cpi_emp_data_USFoodBeverage_HI,
                                cpi_emp_data_USFoodBeverage_IL,
                                cpi_emp_data_USFoodBeverage_KS,
                                cpi_emp_data_USFoodBeverage_MA,
                                cpi_emp_data_USFoodBeverage_MI,
                                cpi_emp_data_USFoodBeverage_MN,
                                cpi_emp_data_USFoodBeverage_MO,
                                cpi_emp_data_USFoodBeverage_NY,
                                cpi_emp_data_USFoodBeverage_OH,
                                cpi_emp_data_USFoodBeverage_OR,
                                cpi_emp_data_USFoodBeverage_PA,
                                cpi_emp_data_USFoodBeverage_TX,
                                cpi_emp_data_USFoodBeverage_WA,
                                cpi_emp_data_USFoodBeverage_WI,
                                cpi_emp_data_USHousing_AK,
                                cpi_emp_data_USHousing_AZ,
                                cpi_emp_data_USHousing_CA,
                                cpi_emp_data_USHousing_CO,
                                cpi_emp_data_USHousing_FL,
                                cpi_emp_data_USHousing_GA,
                                cpi_emp_data_USHousing_HI,
                                cpi_emp_data_USHousing_IL,
                                cpi_emp_data_USHousing_KS,
                                cpi_emp_data_USHousing_MA,
                                cpi_emp_data_USHousing_MI,
                                cpi_emp_data_USHousing_MN,
                                cpi_emp_data_USHousing_MO,
                                cpi_emp_data_USHousing_NY,
                                cpi_emp_data_USHousing_OH,
                                cpi_emp_data_USHousing_OR,
                                cpi_emp_data_USHousing_PA,
                                cpi_emp_data_USHousing_TX,
                                cpi_emp_data_USHousing_WA,
                                cpi_emp_data_USHousing_WI,
                                cpi_emp_data_USTransportation_AK,
                                cpi_emp_data_USTransportation_CA,
                                cpi_emp_data_USTransportation_CO,
                                cpi_emp_data_USTransportation_FL,
                                cpi_emp_data_USTransportation_GA,
                                cpi_emp_data_USTransportation_HI,
                                cpi_emp_data_USTransportation_IL,
                                cpi_emp_data_USTransportation_KS,
                                cpi_emp_data_USTransportation_MA,
                                cpi_emp_data_USTransportation_MI,
                                cpi_emp_data_USTransportation_MN,
                                cpi_emp_data_USTransportation_MO,
                                cpi_emp_data_USTransportation_NY,
                                cpi_emp_data_USTransportation_OH,
                                cpi_emp_data_USTransportation_OR,
                                cpi_emp_data_USTransportation_PA,
                                cpi_emp_data_USTransportation_TX,
                                cpi_emp_data_USTransportation_WA,
                                cpi_emp_data_USTransportation_WI,
                                cpi_emp_data_USMedical_AK,
                                cpi_emp_data_USMedical_AZ,
                                cpi_emp_data_USMedical_CA,
                                cpi_emp_data_USMedical_CO,
                                cpi_emp_data_USMedical_FL,
                                cpi_emp_data_USMedical_GA,
                                cpi_emp_data_USMedical_HI,
                                cpi_emp_data_USMedical_IL,
                                cpi_emp_data_USMedical_KS,
                                cpi_emp_data_USMedical_MA,
                                cpi_emp_data_USMedical_MI,
                                cpi_emp_data_USMedical_MN,
                                cpi_emp_data_USMedical_MO,
                                cpi_emp_data_USMedical_NY,
                                cpi_emp_data_USMedical_OH,
                                cpi_emp_data_USMedical_OR,
                                cpi_emp_data_USMedical_PA,
                                cpi_emp_data_USMedical_TX,
                                cpi_emp_data_USMedical_WA,
                                cpi_emp_data_USMedical_WI,
                                cpi_emp_data_USEducationAndCommunication_AK,
                                cpi_emp_data_USEducationAndCommunication_AZ,
                                cpi_emp_data_USEducationAndCommunication_CA,
                                cpi_emp_data_USEducationAndCommunication_CO,
                                cpi_emp_data_USEducationAndCommunication_FL,
                                cpi_emp_data_USEducationAndCommunication_GA,
                                cpi_emp_data_USEducationAndCommunication_HI,
                                cpi_emp_data_USEducationAndCommunication_IL,
                                cpi_emp_data_USEducationAndCommunication_KS,
                                cpi_emp_data_USEducationAndCommunication_MA,
                                cpi_emp_data_USEducationAndCommunication_MI,
                                cpi_emp_data_USEducationAndCommunication_MN,
                                cpi_emp_data_USEducationAndCommunication_MO,
                                cpi_emp_data_USEducationAndCommunication_NY,
                                cpi_emp_data_USEducationAndCommunication_OH,
                                cpi_emp_data_USEducationAndCommunication_OR,
                                cpi_emp_data_USEducationAndCommunication_PA,
                                cpi_emp_data_USEducationAndCommunication_TX,
                                cpi_emp_data_USEducationAndCommunication_WA,
                                cpi_emp_data_USEducationAndCommunication_WI)
                        
cpi_emp_data_Combined %>%
  sample_frac(0.33) %>% 
  DT::datatable()

5 RAJWANT WORK

cpi_emp_data_USFoodBeverage_NY %>% 
    select(c("Year","Sector"= "Description","No_Jobs"="Number of jobs","cpi_value")) %>%   
    gather(key="Type", value ="Value", "cpi_value","No_Jobs") %>% 
    ggplot(mapping = aes(x= Year, y=(as.numeric(Value)),fill = Type,group = Type)) + 
    geom_col(position = position_dodge()) + geom_smooth(method = "lm")

## `geom_smooth()` using formula 'y ~ x'

    theme(axis.text.x = element_text(angle = 60, colour="gray",hjust = 1,size=rel(0.86))) +
    labs(title = "USSFoodBeverage Data for No. Of Jobs and CPI Value in NY",
         y="Sector", x="Years")

## List of 4
##  $ axis.text.x:List of 11
##   ..$ family       : NULL
##   ..$ face         : NULL
##   ..$ colour       : chr "gray"
##   ..$ size         : 'rel' num 0.86
##   ..$ hjust        : num 1
##   ..$ vjust        : NULL
##   ..$ angle        : num 60
##   ..$ lineheight   : NULL
##   ..$ margin       : NULL
##   ..$ debug        : NULL
##   ..$ inherit.blank: logi FALSE
##   ..- attr(*, "class")= chr [1:2] "element_text" "element"
##  $ y          : chr "Sector"
##  $ x          : chr "Years"
##  $ title      : chr "USSFoodBeverage Data for No. Of Jobs and CPI Value in NY"
##  - attr(*, "class")= chr [1:2] "theme" "gg"
##  - attr(*, "complete")= logi FALSE
##  - attr(*, "validate")= logi TRUE

cpi_emp_data_USFoodBeverage_NY %>% 
    select(c("Year","Sector"= "Description","No_Jobs"="Number of jobs")) %>% 
    ggplot(mapping = aes(x= Year, y=(as.numeric(No_Jobs)))) + 
    geom_col() + 
    theme(axis.text.x = element_text(angle = 60, colour="gray",hjust = 1,size=rel(0.86))) +
    labs(title = "USSFoodBeverage Data for No. Of Jobs  in NY",
         y="Sector", x="Years")

cpi_emp_data_USFoodBeverage_NY %>% 
    select(c("Year","Sector"= "Description","cpi_value")) %>% 
    ggplot(mapping = aes(x= Year, y=(as.numeric(cpi_value)))) + 
    geom_col() + 
    theme(axis.text.x = element_text(angle = 60, colour="gray",hjust = 1,size=rel(0.86))) +
    labs(title = "USSFoodBeverage Data for CPI Value in NY",
         y = "Sector",x= "Years")

# summary(as.numeric(cpi_emp_data_USFoodBeverage_NY$`Number of jobs`))
# summary(as.numeric(cpi_emp_data_USFoodBeverage_NY$cpi_value))
# 
# quantile(as.numeric(cpi_emp_data_USFoodBeverage_NY$`Number of jobs`))
# levels(quantile(as.numeric(cpi_emp_data_USFoodBeverage_NY$cpi_value)))

5.1 Model

#library(DataExplorer)
#library(plot_histogram)
# library(lazytrade)
ALL_NY <- rbind(cpi_emp_data_USTransportation_NY,
                cpi_emp_data_USMedical_NY,
                cpi_emp_data_USHousing_NY,
                cpi_emp_data_USFoodBeverage_NY,
                cpi_emp_data_USEducationAndCommunication_NY)

ALL_NY <- cpi_emp_data_Combined %>% filter(GeoName=="New York")
#DataExplorer::plot_missing(ALL_NY, title = "Beverage Training Data: % Missing Values by Data Element")
ALL_NY$`Number of jobs` = as.numeric(ALL_NY$`Number of jobs`)
#plot_histogram(ALL_NY, ggtheme=theme_light())
#plot_correlation(ALL_NY[,c(2,5,9)])
ggplot(ALL_NY[,c("Year","Description","Number of jobs","cpi_value")], 
        mapping=aes(y= log(cpi_value), 
        x=Year,
        color= Description, 
        group = Description)) + geom_smooth() + theme_classic()

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(ALL_NY[,c("Year","Description","Number of jobs","cpi_value")], 
        mapping=aes(y= log(`Number of jobs`), 
        x=Year,
        color= Description, 
        group = Description)) + geom_smooth() + theme_classic()

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

library(caret)

## Warning: package 'caret' was built under R version 3.5.3

## Loading required package: lattice

## 
## Attaching package: 'caret'

## The following object is masked from 'package:purrr':
## 
##     lift

ALL_NY

ALL_NY_M <- ALL_NY[,c("Year","Number of jobs","parent_item_code","cpi_value")]
names(ALL_NY_M) = c("Year", "Jobs", "C_Item","CPI")
# ALL_NY_M$Year = as.character(ALL_NY_M$Year)
# ALL_NY_M$E_Area = as.character(ALL_NY_M$E_Area)

set.seed(3456)
trainIndex <- createDataPartition(ALL_NY_M$Jobs, p = .8, 
                                  list = FALSE, 
                                  times = 1)
ALL_NY_train <- ALL_NY_M[ trainIndex,]
ALL_NY_test  <- ALL_NY_M[-trainIndex,]
head(ALL_NY_M)

#--NEW MODEL FOR ALL
# Change the col name
m_cpi_emp_data_Combined <- cpi_emp_data_Combined[,c("Year","Number of jobs","parent_item_code","cpi_value","GeoName")]
names(m_cpi_emp_data_Combined) = c("Year", "Jobs", "C_Item","CPI","GeoName")
m_cpi_emp_data_Combined$Jobs <- as.integer(m_cpi_emp_data_Combined$Jobs)

trainIndex <- createDataPartition(m_cpi_emp_data_Combined$Jobs, p = .8, 
                                  list = FALSE, 
                                  times = 1)
ALL_train <- m_cpi_emp_data_Combined[ trainIndex,]
ALL_test  <- m_cpi_emp_data_Combined[-trainIndex,]
head(ALL_NY_M)

head(m_cpi_emp_data_Combined)

5.2 For NY

metric = 'RMSE'
head(ALL_NY_train )

# Train control
customTrainControl <- trainControl(method = "repeatedcv", 
                                   number = 10 , 
                                   repeats = 5 ,
                                   verboseIter = F)
# Linear Model
lm <- train(Jobs ~ .,
            ALL_train,
            method= 'lm',
            trControl = customTrainControl)
lm$results

summary(lm)

## 
## Call:
## lm(formula = .outcome ~ ., data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -407640  -66213   10575   69812 1267507 
## 
## Coefficients: (1 not defined because of singularities)
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -1.907e+05  2.746e+04  -6.945 5.63e-12 ***
## Year1999             -1.573e+03  2.636e+04  -0.060 0.952414    
## Year2000              8.869e+03  2.611e+04   0.340 0.734092    
## Year2001              1.996e+04  2.601e+04   0.767 0.442981    
## Year2002              1.671e+04  2.623e+04   0.637 0.524278    
## Year2003              3.784e+04  2.605e+04   1.453 0.146533    
## Year2004              5.075e+04  2.614e+04   1.942 0.052371 .  
## Year2005              5.782e+04  2.606e+04   2.219 0.026668 *  
## Year2006              6.274e+04  2.620e+04   2.395 0.016762 *  
## Year2007              7.301e+04  2.558e+04   2.854 0.004378 ** 
## Year2008              7.494e+04  2.599e+04   2.884 0.003987 ** 
## Year2009              7.479e+04  2.598e+04   2.879 0.004046 ** 
## Year2010              8.319e+04  2.625e+04   3.169 0.001563 ** 
## Year2011              9.647e+04  2.600e+04   3.711 0.000214 ***
## Year2012              8.986e+04  2.608e+04   3.446 0.000585 ***
## Year2013              1.179e+05  2.631e+04   4.481 8.00e-06 ***
## Year2014              1.113e+05  2.623e+04   4.243 2.34e-05 ***
## Year2015              1.283e+05  2.581e+04   4.969 7.50e-07 ***
## Year2016              1.322e+05  2.563e+04   5.157 2.84e-07 ***
## Year2017              1.794e+05  3.117e+04   5.756 1.04e-08 ***
## Year2018                     NA         NA      NA       NA    
## C_ItemSAF            -6.704e+03  1.390e+04  -0.482 0.629757    
## C_ItemSAH             1.815e+05  1.900e+04   9.555  < 2e-16 ***
## C_ItemSAM             5.227e+05  1.322e+04  39.529  < 2e-16 ***
## C_ItemSAT             1.046e+05  1.603e+04   6.530 8.99e-11 ***
## CPI                  -4.621e+00  1.101e+00  -4.198 2.85e-05 ***
## GeoNameArizona        1.019e+05  3.000e+04   3.396 0.000703 ***
## GeoNameCalifornia     8.771e+05  2.741e+04  32.005  < 2e-16 ***
## GeoNameColorado       8.747e+04  2.688e+04   3.255 0.001160 ** 
## GeoNameFlorida        4.431e+05  2.657e+04  16.673  < 2e-16 ***
## GeoNameGeorgia        2.117e+05  2.795e+04   7.574 6.31e-14 ***
## GeoNameHawaii         1.434e+04  2.676e+04   0.536 0.592160    
## GeoNameIllinois       3.554e+05  2.817e+04  12.613  < 2e-16 ***
## GeoNameKansas         4.313e+04  2.649e+04   1.628 0.103761    
## GeoNameMassachusetts  2.281e+05  2.729e+04   8.358  < 2e-16 ***
## GeoNameMichigan       2.143e+05  2.319e+04   9.239  < 2e-16 ***
## GeoNameMissouri       1.203e+05  2.660e+04   4.521 6.64e-06 ***
## `GeoNameNew York`     6.158e+05  2.905e+04  21.198  < 2e-16 ***
## GeoNameOhio           2.812e+05  2.648e+04  10.619  < 2e-16 ***
## GeoNameOregon         7.990e+04  2.707e+04   2.952 0.003211 ** 
## GeoNamePennsylvania   3.555e+05  2.699e+04  13.172  < 2e-16 ***
## GeoNameTexas          5.624e+05  2.718e+04  20.690  < 2e-16 ***
## GeoNameWashington     1.721e+05  2.803e+04   6.139 1.06e-09 ***
## GeoNameWisconsin      1.116e+05  2.688e+04   4.153 3.47e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 163700 on 1492 degrees of freedom
## Multiple R-squared:  0.7709, Adjusted R-squared:  0.7644 
## F-statistic: 119.5 on 42 and 1492 DF,  p-value: < 2.2e-16

par(mfrow=c(2,2))
plot(lm$finalModel)

#--------------
# https://stackoverflow.com/questions/44200195/how-to-debug-contrasts-can-be-applied-only-to-factors-with-2-or-more-levels-er
ALL_train %>% dplyr::mutate_all(as.factor) %>% str

## 'data.frame':    1535 obs. of  5 variables:
##  $ Year   : Factor w/ 21 levels "1998","1999",..: 1 2 5 6 7 8 9 10 11 13 ...
##  $ Jobs   : Factor w/ 1465 levels "3515","3779",..: 29 30 18 20 21 27 31 32 28 26 ...
##  $ C_Item : Factor w/ 5 levels "SAE","SAF","SAH",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ CPI    : Factor w/ 1529 levels "188.6","189.6",..: 423 426 443 459 476 487 494 508 528 523 ...
##  $ GeoName: Factor w/ 19 levels "Alaska","Arizona",..: 1 1 1 1 1 1 1 1 1 1 ...

unique(cpi_emp_data_Combined$area_code)

##  [1] A427           A429           A421;A422;A424 A433           A320;A321     
##  [6] A319           A426           A207           A214           A103          
## [11] A208           A211           A209           A101           A210;A213     
## [16] A425           A102;A104      A316;A318      A423           A212          
## 50 Levels: 0000 0100 0200 0300 0400 A000 A100 A101 A102 A103 A104 A200 ... A316;A318

# multiple LM
lm_cpi_item <- lm(Jobs ~ CPI+C_Item,ALL_train)
summary(lm_cpi_item)

## 
## Call:
## lm(formula = Jobs ~ CPI + C_Item, data = ALL_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -616796  -91342  -25855   55837 1998242 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 119850.952  14762.062   8.119 9.59e-16 ***
## CPI             14.469      1.205  12.007  < 2e-16 ***
## C_ItemSAF   -83340.521  21354.232  -3.903 9.92e-05 ***
## C_ItemSAH   -51191.743  25599.996  -2.000   0.0457 *  
## C_ItemSAM   516863.414  20820.848  24.824  < 2e-16 ***
## C_ItemSAT   -48057.489  23294.045  -2.063   0.0393 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 258800 on 1529 degrees of freedom
## Multiple R-squared:  0.4133, Adjusted R-squared:  0.4114 
## F-statistic: 215.4 on 5 and 1529 DF,  p-value: < 2.2e-16

lm_cpi_item_name <- lm(Jobs ~ (CPI*GeoName)+CPI+GeoName+C_Item+Year,ALL_train)
summary(lm_cpi_item_name)

## 
## Call:
## lm(formula = Jobs ~ (CPI * GeoName) + CPI + GeoName + C_Item + 
##     Year, data = ALL_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -407490  -62836    3026   72716 1271991 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              -1.910e+05  3.491e+04  -5.472 5.23e-08 ***
## CPI                       1.433e+01  1.364e+01   1.051 0.293492    
## GeoNameArizona            7.171e+04  4.517e+04   1.587 0.112648    
## GeoNameCalifornia         8.679e+05  4.024e+04  21.567  < 2e-16 ***
## GeoNameColorado           3.870e+04  4.486e+04   0.863 0.388464    
## GeoNameFlorida            4.131e+05  4.241e+04   9.740  < 2e-16 ***
## GeoNameGeorgia            1.698e+05  4.040e+04   4.204 2.78e-05 ***
## GeoNameHawaii            -8.035e+03  4.237e+04  -0.190 0.849622    
## GeoNameIllinois           3.576e+05  4.206e+04   8.501  < 2e-16 ***
## GeoNameKansas             2.774e+04  4.324e+04   0.642 0.521232    
## GeoNameMassachusetts      2.594e+05  3.992e+04   6.497 1.12e-10 ***
## GeoNameMichigan           2.308e+05  3.515e+04   6.567 7.08e-11 ***
## GeoNameMissouri           1.218e+05  4.406e+04   2.765 0.005760 ** 
## GeoNameNew York           7.277e+05  4.235e+04  17.184  < 2e-16 ***
## GeoNameOhio               3.176e+05  4.062e+04   7.819 1.01e-14 ***
## GeoNameOregon             8.301e+04  4.367e+04   1.901 0.057543 .  
## GeoNamePennsylvania       4.407e+05  4.042e+04  10.904  < 2e-16 ***
## GeoNameTexas              5.579e+05  3.992e+04  13.975  < 2e-16 ***
## GeoNameWashington         1.348e+05  4.012e+04   3.361 0.000796 ***
## GeoNameWisconsin          1.154e+05  4.457e+04   2.589 0.009723 ** 
## C_ItemSAF                -3.107e+04  1.657e+04  -1.875 0.061044 .  
## C_ItemSAH                 1.141e+05  3.216e+04   3.550 0.000398 ***
## C_ItemSAM                 5.183e+05  1.311e+04  39.533  < 2e-16 ***
## C_ItemSAT                 5.946e+04  2.301e+04   2.585 0.009843 ** 
## Year1999                 -2.255e+03  2.595e+04  -0.087 0.930781    
## Year2000                  8.081e+03  2.571e+04   0.314 0.753286    
## Year2001                  1.791e+04  2.565e+04   0.698 0.485132    
## Year2002                  1.513e+04  2.584e+04   0.586 0.558245    
## Year2003                  3.513e+04  2.569e+04   1.367 0.171750    
## Year2004                  4.723e+04  2.584e+04   1.828 0.067802 .  
## Year2005                  5.066e+04  2.587e+04   1.959 0.050350 .  
## Year2006                  5.429e+04  2.612e+04   2.078 0.037850 *  
## Year2007                  6.267e+04  2.560e+04   2.448 0.014487 *  
## Year2008                  6.103e+04  2.621e+04   2.329 0.019998 *  
## Year2009                  6.323e+04  2.601e+04   2.431 0.015194 *  
## Year2010                  6.870e+04  2.653e+04   2.589 0.009717 ** 
## Year2011                  8.000e+04  2.661e+04   3.006 0.002691 ** 
## Year2012                  6.975e+04  2.670e+04   2.613 0.009075 ** 
## Year2013                  9.717e+04  2.711e+04   3.584 0.000349 ***
## Year2014                  9.221e+04  2.702e+04   3.413 0.000661 ***
## Year2015                  1.098e+05  2.644e+04   4.150 3.51e-05 ***
## Year2016                  1.166e+05  2.602e+04   4.482 7.96e-06 ***
## Year2017                  1.829e+05  3.080e+04   5.937 3.62e-09 ***
## CPI:GeoNameArizona        3.855e+01  2.531e+01   1.523 0.127874    
## CPI:GeoNameCalifornia    -1.353e+01  1.308e+01  -1.034 0.301371    
## CPI:GeoNameColorado       2.594e+01  1.825e+01   1.421 0.155422    
## CPI:GeoNameFlorida       -7.427e+00  1.306e+01  -0.569 0.569519    
## CPI:GeoNameGeorgia       -1.048e+01  1.313e+01  -0.798 0.424985    
## CPI:GeoNameHawaii         8.646e+00  1.559e+01   0.554 0.579321    
## CPI:GeoNameIllinois      -1.598e+01  1.314e+01  -1.216 0.224167    
## CPI:GeoNameKansas         7.870e+00  1.687e+01   0.466 0.640991    
## CPI:GeoNameMassachusetts -1.829e+01  1.314e+01  -1.392 0.163991    
## CPI:GeoNameMichigan      -1.497e+01  1.317e+01  -1.137 0.255789    
## CPI:GeoNameMissouri      -1.420e-01  1.745e+01  -0.008 0.993507    
## CPI:GeoNameNew York      -2.437e+01  1.316e+01  -1.852 0.064274 .  
## CPI:GeoNameOhio          -1.867e+01  1.304e+01  -1.432 0.152361    
## CPI:GeoNameOregon        -2.130e+00  1.711e+01  -0.124 0.900938    
## CPI:GeoNamePennsylvania  -2.741e+01  1.303e+01  -2.103 0.035627 *  
## CPI:GeoNameTexas         -1.397e+01  1.309e+01  -1.067 0.286269    
## CPI:GeoNameWashington    -1.182e+01  1.318e+01  -0.897 0.369977    
## CPI:GeoNameWisconsin     -1.710e+00  1.757e+01  -0.097 0.922514    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 161000 on 1474 degrees of freedom
## Multiple R-squared:  0.781,  Adjusted R-squared:  0.772 
## F-statistic: 87.59 on 60 and 1474 DF,  p-value: < 2.2e-16

lm_cpi_item_name2 <- lm(Jobs ~ (CPI*GeoName*C_Item)+CPI+GeoName+C_Item+Year,ALL_train)
summary(lm_cpi_item_name2)

## 
## Call:
## lm(formula = Jobs ~ (CPI * GeoName * C_Item) + CPI + GeoName + 
##     C_Item + Year, data = ALL_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -416037   -9110    1027   10626  541396 
## 
## Coefficients: (2 not defined because of singularities)
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         7.285e+03  2.737e+04   0.266 0.790118    
## CPI                                -1.483e+02  7.366e+01  -2.014 0.044241 *  
## GeoNameArizona                      1.942e+04  4.026e+04   0.482 0.629669    
## GeoNameCalifornia                   2.796e+05  3.701e+04   7.555 7.75e-14 ***
## GeoNameColorado                     3.421e+04  3.965e+04   0.863 0.388391    
## GeoNameFlorida                      8.485e+04  3.709e+04   2.288 0.022301 *  
## GeoNameGeorgia                      4.737e+04  3.820e+04   1.240 0.215207    
## GeoNameHawaii                       1.631e+04  3.942e+04   0.414 0.679145    
## GeoNameIllinois                     1.256e+05  3.928e+04   3.199 0.001413 ** 
## GeoNameKansas                       2.371e+04  3.842e+04   0.617 0.537309    
## GeoNameMassachusetts                1.575e+05  3.853e+04   4.088 4.62e-05 ***
## GeoNameMichigan                     4.890e+04  3.030e+04   1.614 0.106761    
## GeoNameMissouri                     7.024e+04  4.168e+04   1.685 0.092185 .  
## GeoNameNew York                     3.015e+05  4.093e+04   7.367 3.06e-13 ***
## GeoNameOhio                         8.769e+04  3.661e+04   2.396 0.016733 *  
## GeoNameOregon                       4.000e+04  3.857e+04   1.037 0.299910    
## GeoNamePennsylvania                 2.072e+05  3.810e+04   5.437 6.43e-08 ***
## GeoNameTexas                        1.161e+05  3.851e+04   3.013 0.002632 ** 
## GeoNameWashington                   5.023e+04  3.836e+04   1.310 0.190573    
## GeoNameWisconsin                    3.952e+04  3.979e+04   0.993 0.320729    
## C_ItemSAF                           3.035e+05  1.340e+05   2.265 0.023669 *  
## C_ItemSAH                           1.179e+05  6.757e+04   1.745 0.081197 .  
## C_ItemSAM                           1.101e+05  6.695e+04   1.644 0.100449    
## C_ItemSAT                           6.349e+04  5.383e+04   1.179 0.238468    
## Year1999                            1.768e+03  8.438e+03   0.210 0.834089    
## Year2000                            3.938e+03  8.370e+03   0.470 0.638085    
## Year2001                            9.097e+03  8.370e+03   1.087 0.277273    
## Year2002                            1.525e+04  8.443e+03   1.807 0.071059 .  
## Year2003                            1.881e+04  8.468e+03   2.221 0.026546 *  
## Year2004                            2.551e+04  8.566e+03   2.978 0.002954 ** 
## Year2005                            3.633e+04  8.691e+03   4.181 3.10e-05 ***
## Year2006                            4.898e+04  8.894e+03   5.507 4.39e-08 ***
## Year2007                            5.561e+04  8.892e+03   6.254 5.38e-10 ***
## Year2008                            6.002e+04  9.368e+03   6.407 2.06e-10 ***
## Year2009                            5.763e+04  9.216e+03   6.254 5.40e-10 ***
## Year2010                            6.762e+04  1.000e+04   6.759 2.07e-11 ***
## Year2011                            7.804e+04  1.028e+04   7.589 6.06e-14 ***
## Year2012                            8.392e+04  1.055e+04   7.953 3.87e-15 ***
## Year2013                            9.349e+04  1.078e+04   8.674  < 2e-16 ***
## Year2014                            1.010e+05  1.081e+04   9.343  < 2e-16 ***
## Year2015                            1.100e+05  1.057e+04  10.405  < 2e-16 ***
## Year2016                            1.168e+05  1.042e+04  11.203  < 2e-16 ***
## Year2017                            2.031e+05  1.082e+04  18.775  < 2e-16 ***
## CPI:GeoNameArizona                  7.433e+01  9.798e+01   0.759 0.448244    
## CPI:GeoNameCalifornia               2.010e+02  7.537e+01   2.667 0.007741 ** 
## CPI:GeoNameColorado                 4.522e+01  1.017e+02   0.445 0.656544    
## CPI:GeoNameFlorida                  1.603e+02  7.515e+01   2.133 0.033086 *  
## CPI:GeoNameGeorgia                  1.314e+02  7.593e+01   1.731 0.083683 .  
## CPI:GeoNameHawaii                   8.794e+00  9.444e+01   0.093 0.925823    
## CPI:GeoNameIllinois                 1.432e+02  7.398e+01   1.935 0.053176 .  
## CPI:GeoNameKansas                  -9.789e+00  9.854e+01  -0.099 0.920886    
## CPI:GeoNameMassachusetts            1.318e+02  7.563e+01   1.742 0.081745 .  
## CPI:GeoNameMichigan                 1.270e+02  7.453e+01   1.704 0.088531 .  
## CPI:GeoNameMissouri                 5.274e+00  9.974e+01   0.053 0.957839    
## CPI:GeoNameNew York                 1.624e+02  7.412e+01   2.191 0.028638 *  
## CPI:GeoNameOhio                     1.193e+02  7.927e+01   1.506 0.132412    
## CPI:GeoNameOregon                   4.081e+00  1.007e+02   0.041 0.967675    
## CPI:GeoNamePennsylvania             1.314e+02  7.862e+01   1.671 0.094878 .  
## CPI:GeoNameTexas                    1.599e+02  7.611e+01   2.102 0.035785 *  
## CPI:GeoNameWashington               1.097e+02  7.566e+01   1.450 0.147423    
## CPI:GeoNameWisconsin                4.538e+01  9.549e+01   0.475 0.634665    
## CPI:C_ItemSAF                      -5.353e+01  1.021e+02  -0.524 0.600195    
## CPI:C_ItemSAH                       1.109e+02  7.445e+01   1.489 0.136606    
## CPI:C_ItemSAM                      -2.205e+01  1.055e+02  -0.209 0.834406    
## CPI:C_ItemSAT                       1.142e+02  7.467e+01   1.529 0.126550    
## GeoNameArizona:C_ItemSAF           -3.622e+04  2.115e+05  -0.171 0.864081    
## GeoNameCalifornia:C_ItemSAF        -9.592e+04  1.655e+05  -0.580 0.562265    
## GeoNameColorado:C_ItemSAF          -4.518e+04  1.894e+05  -0.239 0.811528    
## GeoNameFlorida:C_ItemSAF            2.306e+04  1.716e+05   0.134 0.893112    
## GeoNameGeorgia:C_ItemSAF           -3.506e+05  1.528e+05  -2.295 0.021898 *  
## GeoNameHawaii:C_ItemSAF            -1.552e+05  1.568e+05  -0.989 0.322635    
## GeoNameIllinois:C_ItemSAF          -4.041e+04  1.674e+05  -0.241 0.809339    
## GeoNameKansas:C_ItemSAF            -1.194e+05  1.578e+05  -0.757 0.449065    
## GeoNameMassachusetts:C_ItemSAF     -3.713e+05  1.521e+05  -2.441 0.014790 *  
## GeoNameMichigan:C_ItemSAF          -3.140e+05  1.362e+05  -2.305 0.021308 *  
## GeoNameMissouri:C_ItemSAF          -1.164e+05  1.701e+05  -0.684 0.494116    
## GeoNameNew York:C_ItemSAF          -2.552e+05  1.631e+05  -1.565 0.117903    
## GeoNameOhio:C_ItemSAF              -3.286e+05  1.505e+05  -2.183 0.029198 *  
## GeoNameOregon:C_ItemSAF            -1.069e+05  1.650e+05  -0.648 0.517102    
## GeoNamePennsylvania:C_ItemSAF      -4.450e+05  1.491e+05  -2.985 0.002885 ** 
## GeoNameTexas:C_ItemSAF             -2.365e+05  1.520e+05  -1.555 0.120123    
## GeoNameWashington:C_ItemSAF        -3.610e+05  1.517e+05  -2.380 0.017455 *  
## GeoNameWisconsin:C_ItemSAF         -5.879e+04  1.651e+05  -0.356 0.721774    
## GeoNameArizona:C_ItemSAH            1.376e+05  1.840e+05   0.747 0.454941    
## GeoNameCalifornia:C_ItemSAH         3.193e+04  9.898e+04   0.323 0.747045    
## GeoNameColorado:C_ItemSAH           4.021e+04  1.270e+05   0.317 0.751523    
## GeoNameFlorida:C_ItemSAH            6.557e+04  9.444e+04   0.694 0.487647    
## GeoNameGeorgia:C_ItemSAH           -1.074e+05  9.968e+04  -1.077 0.281707    
## GeoNameHawaii:C_ItemSAH            -2.973e+04  8.913e+04  -0.334 0.738754    
## GeoNameIllinois:C_ItemSAH          -1.343e+05  1.080e+05  -1.243 0.213966    
## GeoNameKansas:C_ItemSAH             7.005e+04  1.206e+05   0.581 0.561307    
## GeoNameMassachusetts:C_ItemSAH     -2.076e+05  9.736e+04  -2.132 0.033185 *  
## GeoNameMichigan:C_ItemSAH          -1.302e+04  7.072e+04  -0.184 0.853994    
## GeoNameMissouri:C_ItemSAH           2.351e+04  1.138e+05   0.207 0.836335    
## GeoNameNew York:C_ItemSAH          -1.467e+05  9.735e+04  -1.506 0.132186    
## GeoNameOhio:C_ItemSAH              -1.276e+05  1.010e+05  -1.263 0.206671    
## GeoNameOregon:C_ItemSAH             2.515e+04  1.208e+05   0.208 0.835074    
## GeoNamePennsylvania:C_ItemSAH      -2.352e+05  9.595e+04  -2.452 0.014340 *  
## GeoNameTexas:C_ItemSAH              6.289e+04  1.010e+05   0.623 0.533598    
## GeoNameWashington:C_ItemSAH         5.450e+03  1.044e+05   0.052 0.958393    
## GeoNameWisconsin:C_ItemSAH          4.471e+04  1.243e+05   0.360 0.719062    
## GeoNameArizona:C_ItemSAM            4.398e+02  1.238e+05   0.004 0.997165    
## GeoNameCalifornia:C_ItemSAM         1.526e+06  9.691e+04  15.751  < 2e-16 ***
## GeoNameColorado:C_ItemSAM           4.276e+04  9.460e+04   0.452 0.651292    
## GeoNameFlorida:C_ItemSAM            5.123e+05  8.851e+04   5.788 8.85e-09 ***
## GeoNameGeorgia:C_ItemSAM            9.187e+04  9.681e+04   0.949 0.342793    
## GeoNameHawaii:C_ItemSAM            -4.336e+04  8.778e+04  -0.494 0.621404    
## GeoNameIllinois:C_ItemSAM           3.361e+05  8.951e+04   3.755 0.000181 ***
## GeoNameKansas:C_ItemSAM             1.510e+05  1.191e+05   1.268 0.205068    
## GeoNameMassachusetts:C_ItemSAM      6.510e+04  9.395e+04   0.693 0.488500    
## GeoNameMichigan:C_ItemSAM           3.716e+05  7.140e+04   5.204 2.25e-07 ***
## GeoNameMissouri:C_ItemSAM           1.297e+05  9.674e+04   1.341 0.180128    
## GeoNameNew York:C_ItemSAM           6.133e+05  9.517e+04   6.445 1.61e-10 ***
## GeoNameOhio:C_ItemSAM               3.684e+05  9.054e+04   4.069 5.00e-05 ***
## GeoNameOregon:C_ItemSAM             4.647e+04  9.094e+04   0.511 0.609416    
## GeoNamePennsylvania:C_ItemSAM       4.074e+05  8.897e+04   4.579 5.11e-06 ***
## GeoNameTexas:C_ItemSAM              3.713e+05  8.919e+04   4.163 3.35e-05 ***
## GeoNameWashington:C_ItemSAM         7.977e+04  9.302e+04   0.858 0.391271    
## GeoNameWisconsin:C_ItemSAM          1.784e+05  9.065e+04   1.968 0.049236 *  
## GeoNameArizona:C_ItemSAT                   NA         NA      NA       NA    
## GeoNameCalifornia:C_ItemSAT         2.267e+05  7.942e+04   2.855 0.004376 ** 
## GeoNameColorado:C_ItemSAT           4.114e+04  7.961e+04   0.517 0.605407    
## GeoNameFlorida:C_ItemSAT            1.608e+05  7.035e+04   2.286 0.022403 *  
## GeoNameGeorgia:C_ItemSAT            6.676e+04  7.027e+04   0.950 0.342214    
## GeoNameHawaii:C_ItemSAT            -4.056e+03  7.926e+04  -0.051 0.959190    
## GeoNameIllinois:C_ItemSAT           9.727e+04  7.058e+04   1.378 0.168366    
## GeoNameKansas:C_ItemSAT             1.367e+03  7.757e+04   0.018 0.985939    
## GeoNameMassachusetts:C_ItemSAT     -1.306e+05  7.210e+04  -1.811 0.070360 .  
## GeoNameMichigan:C_ItemSAT          -2.643e+04  5.749e+04  -0.460 0.645810    
## GeoNameMissouri:C_ItemSAT           4.436e+04  7.604e+04   0.583 0.559736    
## GeoNameNew York:C_ItemSAT          -4.222e+04  7.414e+04  -0.569 0.569164    
## GeoNameOhio:C_ItemSAT               4.198e+04  6.920e+04   0.607 0.544159    
## GeoNameOregon:C_ItemSAT            -2.353e+03  7.868e+04  -0.030 0.976145    
## GeoNamePennsylvania:C_ItemSAT      -7.067e+04  7.802e+04  -0.906 0.365181    
## GeoNameTexas:C_ItemSAT              2.694e+05  7.013e+04   3.841 0.000128 ***
## GeoNameWashington:C_ItemSAT        -2.090e+04  7.091e+04  -0.295 0.768256    
## GeoNameWisconsin:C_ItemSAT          4.236e+04  8.219e+04   0.515 0.606337    
## CPI:GeoNameArizona:C_ItemSAF       -1.167e+02  1.783e+02  -0.655 0.512897    
## CPI:GeoNameCalifornia:C_ItemSAF    -2.401e+01  1.046e+02  -0.230 0.818480    
## CPI:GeoNameColorado:C_ItemSAF      -3.499e+00  1.416e+02  -0.025 0.980284    
## CPI:GeoNameFlorida:C_ItemSAF       -1.412e+01  1.058e+02  -0.133 0.893872    
## CPI:GeoNameGeorgia:C_ItemSAF        7.318e+01  1.048e+02   0.699 0.484986    
## CPI:GeoNameHawaii:C_ItemSAF         9.205e+01  1.246e+02   0.738 0.460364    
## CPI:GeoNameIllinois:C_ItemSAF       3.229e+01  1.029e+02   0.314 0.753746    
## CPI:GeoNameKansas:C_ItemSAF         9.394e+01  1.283e+02   0.732 0.464160    
## CPI:GeoNameMassachusetts:C_ItemSAF  6.235e+01  1.044e+02   0.597 0.550448    
## CPI:GeoNameMichigan:C_ItemSAF       7.058e+01  1.032e+02   0.684 0.494186    
## CPI:GeoNameMissouri:C_ItemSAF       6.318e+01  1.334e+02   0.474 0.635728    
## CPI:GeoNameNew York:C_ItemSAF       2.452e+01  1.029e+02   0.238 0.811745    
## CPI:GeoNameOhio:C_ItemSAF           7.883e+01  1.079e+02   0.730 0.465269    
## CPI:GeoNameOregon:C_ItemSAF         6.407e+01  1.332e+02   0.481 0.630570    
## CPI:GeoNamePennsylvania:C_ItemSAF   7.651e+01  1.074e+02   0.712 0.476502    
## CPI:GeoNameTexas:C_ItemSAF          3.844e+01  1.049e+02   0.366 0.714191    
## CPI:GeoNameWashington:C_ItemSAF     9.269e+01  1.045e+02   0.887 0.375089    
## CPI:GeoNameWisconsin:C_ItemSAF      1.698e+01  1.279e+02   0.133 0.894435    
## CPI:GeoNameArizona:C_ItemSAH       -9.152e+01  1.144e+02  -0.800 0.423928    
## CPI:GeoNameCalifornia:C_ItemSAH    -1.408e+02  7.642e+01  -1.843 0.065608 .  
## CPI:GeoNameColorado:C_ItemSAH      -2.947e+01  1.058e+02  -0.279 0.780639    
## CPI:GeoNameFlorida:C_ItemSAH       -1.071e+02  7.631e+01  -1.403 0.160883    
## CPI:GeoNameGeorgia:C_ItemSAH       -9.055e+01  7.697e+01  -1.176 0.239657    
## CPI:GeoNameHawaii:C_ItemSAH         2.760e+00  9.596e+01   0.029 0.977059    
## CPI:GeoNameIllinois:C_ItemSAH      -1.022e+02  7.499e+01  -1.363 0.173069    
## CPI:GeoNameKansas:C_ItemSAH        -7.063e+00  1.023e+02  -0.069 0.944953    
## CPI:GeoNameMassachusetts:C_ItemSAH -9.327e+01  7.664e+01  -1.217 0.223836    
## CPI:GeoNameMichigan:C_ItemSAH      -8.994e+01  7.543e+01  -1.192 0.233356    
## CPI:GeoNameMissouri:C_ItemSAH      -3.424e+00  1.030e+02  -0.033 0.973479    
## CPI:GeoNameNew York:C_ItemSAH      -1.185e+02  7.508e+01  -1.578 0.114704    
## CPI:GeoNameOhio:C_ItemSAH          -7.441e+01  8.051e+01  -0.924 0.355474    
## CPI:GeoNameOregon:C_ItemSAH         9.579e-01  1.040e+02   0.009 0.992651    
## CPI:GeoNamePennsylvania:C_ItemSAH  -8.797e+01  7.978e+01  -1.103 0.270376    
## CPI:GeoNameTexas:C_ItemSAH         -1.126e+02  7.719e+01  -1.459 0.144813    
## CPI:GeoNameWashington:C_ItemSAH    -7.491e+01  7.669e+01  -0.977 0.328869    
## CPI:GeoNameWisconsin:C_ItemSAH     -4.547e+01  9.970e+01  -0.456 0.648453    
## CPI:GeoNameArizona:C_ItemSAM        5.445e+02  3.876e+02   1.405 0.160328    
## CPI:GeoNameCalifornia:C_ItemSAM    -3.311e+01  1.113e+02  -0.297 0.766136    
## CPI:GeoNameColorado:C_ItemSAM       1.434e+02  1.467e+02   0.977 0.328708    
## CPI:GeoNameFlorida:C_ItemSAM        1.179e+02  1.088e+02   1.084 0.278632    
## CPI:GeoNameGeorgia:C_ItemSAM        1.066e+02  1.128e+02   0.945 0.344581    
## CPI:GeoNameHawaii:C_ItemSAM         2.990e+01  1.510e+02   0.198 0.843052    
## CPI:GeoNameIllinois:C_ItemSAM       6.160e+01  1.066e+02   0.578 0.563470    
## CPI:GeoNameKansas:C_ItemSAM        -1.002e+02  2.022e+02  -0.496 0.620202    
## CPI:GeoNameMassachusetts:C_ItemSAM  9.380e+01  1.093e+02   0.858 0.390835    
## CPI:GeoNameMichigan:C_ItemSAM       5.383e+01  1.073e+02   0.502 0.615894    
## CPI:GeoNameMissouri:C_ItemSAM       1.736e+02  1.567e+02   1.108 0.268081    
## CPI:GeoNameNew York:C_ItemSAM       9.626e+01  1.070e+02   0.900 0.368372    
## CPI:GeoNameOhio:C_ItemSAM           1.718e+02  1.185e+02   1.449 0.147491    
## CPI:GeoNameOregon:C_ItemSAM         1.354e+02  1.430e+02   0.947 0.344056    
## CPI:GeoNamePennsylvania:C_ItemSAM   1.409e+02  1.153e+02   1.222 0.221831    
## CPI:GeoNameTexas:C_ItemSAM          3.251e+02  1.109e+02   2.930 0.003447 ** 
## CPI:GeoNameWashington:C_ItemSAM     9.958e+01  1.125e+02   0.885 0.376319    
## CPI:GeoNameWisconsin:C_ItemSAM      1.057e+02  1.438e+02   0.735 0.462474    
## CPI:GeoNameArizona:C_ItemSAT               NA         NA      NA       NA    
## CPI:GeoNameCalifornia:C_ItemSAT    -1.665e+02  7.671e+01  -2.170 0.030180 *  
## CPI:GeoNameColorado:C_ItemSAT      -5.181e+01  1.046e+02  -0.495 0.620414    
## CPI:GeoNameFlorida:C_ItemSAT       -1.317e+02  7.648e+01  -1.721 0.085410 .  
## CPI:GeoNameGeorgia:C_ItemSAT       -9.928e+01  7.716e+01  -1.287 0.198442    
## CPI:GeoNameHawaii:C_ItemSAT        -9.898e+00  9.689e+01  -0.102 0.918647    
## CPI:GeoNameIllinois:C_ItemSAT      -1.112e+02  7.516e+01  -1.480 0.139118    
## CPI:GeoNameKansas:C_ItemSAT         1.249e+01  1.008e+02   0.124 0.901465    
## CPI:GeoNameMassachusetts:C_ItemSAT -1.013e+02  7.687e+01  -1.318 0.187642    
## CPI:GeoNameMichigan:C_ItemSAT      -9.432e+01  7.567e+01  -1.246 0.212805    
## CPI:GeoNameMissouri:C_ItemSAT      -1.095e+01  1.020e+02  -0.107 0.914510    
## CPI:GeoNameNew York:C_ItemSAT      -1.310e+02  7.532e+01  -1.740 0.082109 .  
## CPI:GeoNameOhio:C_ItemSAT          -8.882e+01  8.058e+01  -1.102 0.270534    
## CPI:GeoNameOregon:C_ItemSAT        -2.637e+00  1.030e+02  -0.026 0.979570    
## CPI:GeoNamePennsylvania:C_ItemSAT  -9.817e+01  8.003e+01  -1.227 0.220171    
## CPI:GeoNameTexas:C_ItemSAT         -1.249e+02  7.736e+01  -1.615 0.106630    
## CPI:GeoNameWashington:C_ItemSAT    -7.795e+01  7.687e+01  -1.014 0.310751    
## CPI:GeoNameWisconsin:C_ItemSAT     -4.298e+01  9.830e+01  -0.437 0.662058    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 51690 on 1328 degrees of freedom
## Multiple R-squared:  0.9797, Adjusted R-squared:  0.9765 
## F-statistic: 310.7 on 206 and 1328 DF,  p-value: < 2.2e-16

lm_cpi_item_name3 <- lm(Jobs ~ (CPI*GeoName) + (GeoName*C_Item)+CPI+GeoName+C_Item+Year,ALL_train)
summary(lm_cpi_item_name3)

## 
## Call:
## lm(formula = Jobs ~ (CPI * GeoName) + (GeoName * C_Item) + CPI + 
##     GeoName + C_Item + Year, data = ALL_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -411031  -15334     343   17811  571733 
## 
## Coefficients: (1 not defined because of singularities)
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    -2.959e+04  1.588e+04  -1.863 0.062699 .  
## CPI                            -4.396e+01  1.111e+01  -3.955 8.03e-05 ***
## GeoNameArizona                  5.196e+04  2.784e+04   1.867 0.062172 .  
## GeoNameCalifornia               3.608e+05  2.042e+04  17.668  < 2e-16 ***
## GeoNameColorado                 4.682e+04  2.276e+04   2.057 0.039846 *  
## GeoNameFlorida                  1.313e+05  2.095e+04   6.265 4.94e-10 ***
## GeoNameGeorgia                  6.456e+04  2.061e+04   3.133 0.001766 ** 
## GeoNameHawaii                   6.451e+03  2.149e+04   0.300 0.764092    
## GeoNameIllinois                 1.509e+05  2.094e+04   7.205 9.46e-13 ***
## GeoNameKansas                   1.895e+04  2.132e+04   0.889 0.374296    
## GeoNameMassachusetts            1.752e+05  2.064e+04   8.490  < 2e-16 ***
## GeoNameMichigan                 6.712e+04  1.767e+04   3.798 0.000152 ***
## GeoNameMissouri                 6.410e+04  2.225e+04   2.881 0.004029 ** 
## GeoNameNew York                 3.666e+05  2.202e+04  16.651  < 2e-16 ***
## GeoNameOhio                     1.049e+05  2.030e+04   5.167 2.73e-07 ***
## GeoNameOregon                   3.811e+04  2.097e+04   1.818 0.069317 .  
## GeoNamePennsylvania             2.257e+05  2.038e+04  11.074  < 2e-16 ***
## GeoNameTexas                    1.598e+05  2.063e+04   7.748 1.78e-14 ***
## GeoNameWashington               4.028e+04  2.029e+04   1.985 0.047332 *  
## GeoNameWisconsin                4.920e+04  2.241e+04   2.196 0.028282 *  
## C_ItemSAF                       5.458e+04  2.586e+04   2.111 0.034973 *  
## C_ItemSAH                       1.803e+05  4.987e+04   3.615 0.000311 ***
## C_ItemSAM                       4.590e+04  2.144e+04   2.141 0.032447 *  
## C_ItemSAT                       1.261e+05  3.568e+04   3.534 0.000423 ***
## Year1999                        2.520e+03  9.228e+03   0.273 0.784834    
## Year2000                        4.846e+03  9.145e+03   0.530 0.596281    
## Year2001                        1.208e+04  9.129e+03   1.323 0.186081    
## Year2002                        1.932e+04  9.184e+03   2.104 0.035542 *  
## Year2003                        2.242e+04  9.163e+03   2.447 0.014534 *  
## Year2004                        2.822e+04  9.228e+03   3.058 0.002268 ** 
## Year2005                        4.095e+04  9.271e+03   4.417 1.08e-05 ***
## Year2006                        5.384e+04  9.394e+03   5.731 1.22e-08 ***
## Year2007                        6.092e+04  9.240e+03   6.593 6.08e-11 ***
## Year2008                        6.393e+04  9.526e+03   6.711 2.80e-11 ***
## Year2009                        6.340e+04  9.386e+03   6.755 2.09e-11 ***
## Year2010                        7.009e+04  9.655e+03   7.259 6.41e-13 ***
## Year2011                        7.844e+04  9.791e+03   8.011 2.37e-15 ***
## Year2012                        8.623e+04  9.850e+03   8.754  < 2e-16 ***
## Year2013                        9.791e+04  1.000e+04   9.792  < 2e-16 ***
## Year2014                        1.033e+05  9.990e+03  10.343  < 2e-16 ***
## Year2015                        1.149e+05  9.695e+03  11.856  < 2e-16 ***
## Year2016                        1.226e+05  9.533e+03  12.862  < 2e-16 ***
## Year2017                        1.773e+05  1.126e+04  15.748  < 2e-16 ***
## CPI:GeoNameArizona             -2.885e+01  4.674e+01  -0.617 0.537259    
## CPI:GeoNameCalifornia           5.796e+01  1.131e+01   5.124 3.41e-07 ***
## CPI:GeoNameColorado             2.159e+00  1.930e+01   0.112 0.910947    
## CPI:GeoNameFlorida              4.743e+01  1.148e+01   4.131 3.83e-05 ***
## CPI:GeoNameGeorgia              4.316e+01  1.120e+01   3.855 0.000121 ***
## CPI:GeoNameHawaii               8.387e+00  1.413e+01   0.594 0.552841    
## CPI:GeoNameIllinois             4.213e+01  1.116e+01   3.775 0.000167 ***
## CPI:GeoNameKansas              -7.637e+00  1.702e+01  -0.449 0.653812    
## CPI:GeoNameMassachusetts        4.174e+01  1.121e+01   3.725 0.000203 ***
## CPI:GeoNameMichigan             4.296e+01  1.112e+01   3.862 0.000118 ***
## CPI:GeoNameMissouri            -2.867e+00  1.691e+01  -0.170 0.865397    
## CPI:GeoNameNew York             4.594e+01  1.116e+01   4.118 4.05e-05 ***
## CPI:GeoNameOhio                 4.177e+01  1.145e+01   3.647 0.000275 ***
## CPI:GeoNameOregon               1.307e+00  1.696e+01   0.077 0.938595    
## CPI:GeoNamePennsylvania         4.563e+01  1.157e+01   3.945 8.37e-05 ***
## CPI:GeoNameTexas                4.824e+01  1.122e+01   4.298 1.84e-05 ***
## CPI:GeoNameWashington           4.089e+01  1.118e+01   3.659 0.000262 ***
## CPI:GeoNameWisconsin           -1.978e+00  1.822e+01  -0.109 0.913544    
## GeoNameArizona:C_ItemSAF        8.115e+02  5.065e+04   0.016 0.987221    
## GeoNameCalifornia:C_ItemSAF    -1.848e+05  3.645e+04  -5.072 4.47e-07 ***
## GeoNameColorado:C_ItemSAF      -7.257e+03  4.251e+04  -0.171 0.864463    
## GeoNameFlorida:C_ItemSAF       -2.525e+04  3.500e+04  -0.721 0.470841    
## GeoNameGeorgia:C_ItemSAF       -5.869e+04  3.483e+04  -1.685 0.092263 .  
## GeoNameHawaii:C_ItemSAF         1.966e+03  3.630e+04   0.054 0.956807    
## GeoNameIllinois:C_ItemSAF      -7.989e+04  3.682e+04  -2.170 0.030189 *  
## GeoNameKansas:C_ItemSAF         3.309e+04  3.961e+04   0.835 0.403593    
## GeoNameMassachusetts:C_ItemSAF -1.465e+05  3.468e+04  -4.225 2.54e-05 ***
## GeoNameMichigan:C_ItemSAF      -6.246e+04  2.971e+04  -2.102 0.035704 *  
## GeoNameMissouri:C_ItemSAF       1.592e+03  3.938e+04   0.040 0.967759    
## GeoNameNew York:C_ItemSAF      -2.565e+05  3.799e+04  -6.751 2.14e-11 ***
## GeoNameOhio:C_ItemSAF          -6.818e+04  3.471e+04  -1.964 0.049690 *  
## GeoNameOregon:C_ItemSAF         3.386e+03  3.848e+04   0.088 0.929912    
## GeoNamePennsylvania:C_ItemSAF  -1.601e+05  3.491e+04  -4.587 4.89e-06 ***
## GeoNameTexas:C_ItemSAF         -4.920e+04  3.480e+04  -1.414 0.157693    
## GeoNameWashington:C_ItemSAF    -4.016e+04  3.434e+04  -1.169 0.242468    
## GeoNameWisconsin:C_ItemSAF      2.217e+04  4.128e+04   0.537 0.591277    
## GeoNameArizona:C_ItemSAH        1.291e+05  1.262e+05   1.023 0.306466    
## GeoNameCalifornia:C_ItemSAH     1.110e+05  7.617e+04   1.457 0.145409    
## GeoNameColorado:C_ItemSAH       7.857e+04  7.882e+04   0.997 0.318982    
## GeoNameFlorida:C_ItemSAH        1.428e+05  6.473e+04   2.206 0.027528 *  
## GeoNameGeorgia:C_ItemSAH       -5.551e+04  6.783e+04  -0.818 0.413237    
## GeoNameHawaii:C_ItemSAH        -1.913e+03  6.639e+04  -0.029 0.977014    
## GeoNameIllinois:C_ItemSAH      -5.420e+04  6.629e+04  -0.818 0.413662    
## GeoNameKansas:C_ItemSAH         3.521e+04  7.149e+04   0.492 0.622465    
## GeoNameMassachusetts:C_ItemSAH -1.753e+05  7.031e+04  -2.493 0.012781 *  
## GeoNameMichigan:C_ItemSAH      -5.101e+04  5.239e+04  -0.974 0.330428    
## GeoNameMissouri:C_ItemSAH       4.503e+04  7.003e+04   0.643 0.520360    
## GeoNameNew York:C_ItemSAH      -1.127e+05  6.966e+04  -1.617 0.106016    
## GeoNameOhio:C_ItemSAH          -5.748e+04  6.410e+04  -0.897 0.370007    
## GeoNameOregon:C_ItemSAH         4.217e+04  7.419e+04   0.568 0.569860    
## GeoNamePennsylvania:C_ItemSAH  -2.216e+05  7.093e+04  -3.124 0.001819 ** 
## GeoNameTexas:C_ItemSAH          1.018e+05  6.666e+04   1.526 0.127120    
## GeoNameWashington:C_ItemSAH    -3.536e+03  6.800e+04  -0.052 0.958535    
## GeoNameWisconsin:C_ItemSAH      4.001e+04  7.503e+04   0.533 0.593927    
## GeoNameArizona:C_ItemSAM        2.064e+05  3.139e+04   6.575 6.85e-11 ***
## GeoNameCalifornia:C_ItemSAM     1.509e+06  2.933e+04  51.455  < 2e-16 ***
## GeoNameColorado:C_ItemSAM       1.767e+05  3.122e+04   5.661 1.82e-08 ***
## GeoNameFlorida:C_ItemSAM        8.243e+05  2.971e+04  27.741  < 2e-16 ***
## GeoNameGeorgia:C_ItemSAM        3.009e+05  2.968e+04  10.140  < 2e-16 ***
## GeoNameHawaii:C_ItemSAM         1.399e+04  3.040e+04   0.460 0.645487    
## GeoNameIllinois:C_ItemSAM       5.630e+05  2.902e+04  19.400  < 2e-16 ***
## GeoNameKansas:C_ItemSAM         1.178e+05  3.003e+04   3.922 9.20e-05 ***
## GeoNameMassachusetts:C_ItemSAM  3.050e+05  2.938e+04  10.381  < 2e-16 ***
## GeoNameMichigan:C_ItemSAM       4.678e+05  2.576e+04  18.158  < 2e-16 ***
## GeoNameMissouri:C_ItemSAM       2.698e+05  3.037e+04   8.883  < 2e-16 ***
## GeoNameNew York:C_ItemSAM       1.003e+06  3.015e+04  33.278  < 2e-16 ***
## GeoNameOhio:C_ItemSAM           6.026e+05  2.950e+04  20.430  < 2e-16 ***
## GeoNameOregon:C_ItemSAM         1.554e+05  3.077e+04   5.049 5.02e-07 ***
## GeoNamePennsylvania:C_ItemSAM   6.298e+05  2.954e+04  21.318  < 2e-16 ***
## GeoNameTexas:C_ItemSAM          1.018e+06  2.892e+04  35.222  < 2e-16 ***
## GeoNameWashington:C_ItemSAM     2.604e+05  2.974e+04   8.756  < 2e-16 ***
## GeoNameWisconsin:C_ItemSAM      2.850e+05  3.050e+04   9.345  < 2e-16 ***
## GeoNameArizona:C_ItemSAT               NA         NA      NA       NA    
## GeoNameCalifornia:C_ItemSAT    -4.066e+04  4.977e+04  -0.817 0.413995    
## GeoNameColorado:C_ItemSAT       2.262e+03  5.432e+04   0.042 0.966789    
## GeoNameFlorida:C_ItemSAT        1.286e+04  4.701e+04   0.274 0.784476    
## GeoNameGeorgia:C_ItemSAT        4.956e+03  4.861e+04   0.102 0.918816    
## GeoNameHawaii:C_ItemSAT        -2.215e+04  4.785e+04  -0.463 0.643455    
## GeoNameIllinois:C_ItemSAT       3.752e+04  4.753e+04   0.789 0.430046    
## GeoNameKansas:C_ItemSAT         3.919e+04  5.430e+04   0.722 0.470499    
## GeoNameMassachusetts:C_ItemSAT -1.964e+05  4.755e+04  -4.130 3.84e-05 ***
## GeoNameMichigan:C_ItemSAT      -7.678e+04  3.886e+04  -1.976 0.048387 *  
## GeoNameMissouri:C_ItemSAT       4.066e+04  5.088e+04   0.799 0.424430    
## GeoNameNew York:C_ItemSAT      -2.092e+05  4.697e+04  -4.455 9.06e-06 ***
## GeoNameOhio:C_ItemSAT          -1.614e+04  4.947e+04  -0.326 0.744249    
## GeoNameOregon:C_ItemSAT         1.760e+03  5.482e+04   0.032 0.974389    
## GeoNamePennsylvania:C_ItemSAT  -1.393e+05  5.099e+04  -2.731 0.006386 ** 
## GeoNameTexas:C_ItemSAT          1.531e+05  4.857e+04   3.152 0.001657 ** 
## GeoNameWashington:C_ItemSAT    -2.744e+04  4.957e+04  -0.554 0.579940    
## GeoNameWisconsin:C_ItemSAT      4.489e+04  5.572e+04   0.806 0.420586    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 56940 on 1403 degrees of freedom
## Multiple R-squared:  0.9739, Adjusted R-squared:  0.9715 
## F-statistic: 400.1 on 131 and 1403 DF,  p-value: < 2.2e-16

lm_cpi2_item <- lm(Jobs ~ (CPI^2)+C_Item,ALL_train)
summary(lm_cpi2_item)

## 
## Call:
## lm(formula = Jobs ~ (CPI^2) + C_Item, data = ALL_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -616796  -91342  -25855   55837 1998242 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 119850.952  14762.062   8.119 9.59e-16 ***
## CPI             14.469      1.205  12.007  < 2e-16 ***
## C_ItemSAF   -83340.521  21354.232  -3.903 9.92e-05 ***
## C_ItemSAH   -51191.743  25599.996  -2.000   0.0457 *  
## C_ItemSAM   516863.414  20820.848  24.824  < 2e-16 ***
## C_ItemSAT   -48057.489  23294.045  -2.063   0.0393 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 258800 on 1529 degrees of freedom
## Multiple R-squared:  0.4133, Adjusted R-squared:  0.4114 
## F-statistic: 215.4 on 5 and 1529 DF,  p-value: < 2.2e-16

lm_cpi_year <- lm(Jobs ~ CPI+Year,ALL_train)
summary(lm_cpi_year)

## 
## Call:
## lm(formula = Jobs ~ CPI + Year, data = ALL_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -450389 -187647 -107860   50766 2267546 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 195201.635  37167.798   5.252 1.72e-07 ***
## CPI              3.702      1.199   3.088  0.00205 ** 
## Year1999     -2339.270  53706.025  -0.044  0.96526    
## Year2000     28556.938  53147.060   0.537  0.59113    
## Year2001     20401.869  52959.983   0.385  0.70012    
## Year2002    -13163.374  53326.981  -0.247  0.80506    
## Year2003     33074.846  52991.190   0.624  0.53262    
## Year2004     37336.067  53146.995   0.703  0.48247    
## Year2005     32933.063  52997.528   0.621  0.53443    
## Year2006     45192.463  53202.290   0.849  0.39577    
## Year2007     67772.427  51895.390   1.306  0.19177    
## Year2008     68241.262  52554.604   1.298  0.19432    
## Year2009     79495.401  52655.291   1.510  0.13132    
## Year2010     50931.150  53065.458   0.960  0.33732    
## Year2011     54786.919  52422.797   1.045  0.29614    
## Year2012     53126.008  52604.456   1.010  0.31270    
## Year2013     98888.194  53099.558   1.862  0.06275 .  
## Year2014     67205.157  52954.860   1.269  0.20460    
## Year2015    100441.500  52252.465   1.922  0.05476 .  
## Year2016    104378.776  51823.842   2.014  0.04417 *  
## Year2017    313382.749  62837.828   4.987 6.83e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 333700 on 1514 degrees of freedom
## Multiple R-squared:  0.03372,    Adjusted R-squared:  0.02096 
## F-statistic: 2.642 on 20 and 1514 DF,  p-value: 0.0001048

xyplot(ALL_test$Jobs ~ predict(lm),
 ## plot the points (type = 'p') and a background grid ('g')
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Observed")

## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

xyplot(ALL_test$Jobs ~ predict(lm_cpi_year),
 ## plot the points (type = 'p') and a background grid ('g')
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Observed")

xyplot(resid(lm_cpi_year) ~ predict(lm_cpi_year),
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Residuals")

#PLS
# Useing train perform  to perfrom  pre-process and tuning together. The function first preprocess the training set by centering it and scaling it. Then the function uses 10-fold cross validation to try the number of components, i.e. latent variables, of the PLS model from 1 to 20.
# # The train function generates a resampling estimate of performance. Because
# the training set size is not small, 10-fold cross-validation should produce
# reasonable estimates of model performance. The function trainControl specifies
# the type of resampling:
ctrl <- trainControl(method = "cv", number = 10)
model_pls<- train(Jobs ~ .,   ALL_train,
                     method = "pls",
                     tuneLength = 20,
                     metric='Rsquared',
                     trControl = ctrl,
                     preProc = c("center", "scale"))

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

model_pls

## Partial Least Squares 
## 
## 1535 samples
##    4 predictor
## 
## Pre-processing: centered (43), scaled (43) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1381, 1381, 1382, 1382, 1380, 1382, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE      Rsquared   MAE     
##    1     183452.0  0.7068647  123594.6
##    2     179034.4  0.7198927  119609.3
##    3     173253.5  0.7388455  113591.6
##    4     171907.1  0.7427542  111394.8
##    5     168304.6  0.7527036  110675.7
##    6     166505.4  0.7587264  112089.3
##    7     165871.1  0.7608138  111006.2
##    8     165449.4  0.7619987  109881.7
##    9     165463.4  0.7618864  109977.4
##   10     165454.5  0.7619110  109963.2
##   11     165454.1  0.7619150  109956.8
##   12     165448.7  0.7619280  109951.7
##   13     165449.3  0.7619267  109952.2
##   14     165449.2  0.7619268  109952.2
##   15     165449.2  0.7619268  109952.2
##   16     165449.2  0.7619268  109952.2
##   17     165449.2  0.7619268  109952.2
##   18     165449.2  0.7619268  109952.2
##   19     165449.2  0.7619268  109952.2
##   20     165449.2  0.7619268  109952.2
## 
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was ncomp = 8.

model_pls$results[3,2:4]

# M
model_glm<- train(Jobs ~ .,   ALL_train,
                     method = "glm",
                     tuneLength = 20,
                     metric='Rsquared',
                     trControl = ctrl,
                     preProc = c("center", "scale"))

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018

model_glm

## Generalized Linear Model 
## 
## 1535 samples
##    4 predictor
## 
## Pre-processing: centered (43), scaled (43) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1381, 1381, 1382, 1381, 1382, 1381, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   164153.9  0.7657161  109914.9

# test_model(test_dataset = model_pls,
#            predictor_dataset = ALL_train,
#            test_type = "regression")
# TEST MODEL with Train  data
xyplot(ALL_train$Jobs ~ predict(model_pls),
 ## plot the points (type = 'p') and a background grid ('g')
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Observed")

xyplot(ALL_train$Jobs ~ predict(lm),
 ## plot the points (type = 'p') and a background grid ('g')
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Observed")

## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

xyplot(ALL_train$Jobs ~ predict(lm_cpi2_item),
 ## plot the points (type = 'p') and a background grid ('g')
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Observed")

xyplot(ALL_train$Jobs ~ predict(lm_cpi_item),
 ## plot the points (type = 'p') and a background grid ('g')
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Observed")

xyplot(ALL_train$Jobs ~ predict(model_glm),
 ## plot the points (type = 'p') and a background grid ('g')
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Observed")

## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading

# TEST MODEL with TEst data
xyplot(ALL_test$Jobs ~ predict(model_pls,ALL_test),
 ## plot the points (type = 'p') and a background grid ('g')
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Observed")

xyplot(ALL_test$Jobs ~ predict(lm,ALL_test),
 ## plot the points (type = 'p') and a background grid ('g')
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Observed")

## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

xyplot(ALL_test$Jobs ~ predict(lm_cpi_item,ALL_test),
 ## plot the points (type = 'p') and a background grid ('g')
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Observed")

# REssidual Plot with train data
xyplot(resid(model_pls) ~ predict(model_pls),
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Residuals")

xyplot(resid(lm) ~ predict(lm),
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Residuals")

## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

xyplot(resid(lm_cpi_item) ~ predict(lm_cpi_item),
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Residuals")

xyplot(resid(lm_cpi2_item) ~ predict(lm_cpi2_item),
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Residuals")

xyplot(resid(lm_cpi2_item) ~ predict(lm_cpi2_item),
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Residuals")

# REssidual Plot with Test data
xyplot(resid(model_pls) ~ predict(model_pls,ALL_test),
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Residuals")

## Warning in is.na(x) | is.na(y): longer object length is not a multiple of
## shorter object length

xyplot(resid(model_glm) ~ predict(model_glm,ALL_test),
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Residuals")

## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type
## == : prediction from a rank-deficient fit may be misleading

## Warning in is.na(x) | is.na(y): longer object length is not a multiple of
## shorter object length

xyplot(resid(lm_cpi_item) ~ predict(lm_cpi_item,ALL_test),
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Residuals")

## Warning in is.na(x) | is.na(y): longer object length is not a multiple of
## shorter object length

#---------------------VALID
xyplot(ALL_train$Jobs ~ predict(lm_cpi_item_name2,ALL_train),
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Actual Train")

## Warning in predict.lm(lm_cpi_item_name2, ALL_train): prediction from a rank-
## deficient fit may be misleading

xyplot(ALL_test$Jobs ~ predict(lm_cpi_item_name2,ALL_test),
 type = c("p", "g"),
 xlab = "Predicted", ylab = "Actual Test")

## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading

plot(lm_cpi_item_name2)

# xyplot(resid(lm_cpi_item_name3) ~ predict(lm_cpi_item_name3,ALL_test),
#  type = c("p", "g"),
#  xlab = "Predicted", ylab = "Residuals")

head(resid(lm_cpi_item_name2) )

##         1         2         5         6         7         8 
## -8030.268 -7749.304 -2670.667  3554.078  8762.014  5327.627

head(predict(lm_cpi_item_name2))

##         1         2         5         6         7         8 
## 14954.268 14683.304  8991.667  2915.922 -2209.014  1511.373

head(ALL_train$Jobs)

## [1] 6924 6934 6321 6470 6553 6839

head(ALL_train$Jobs-predict(lm_cpi_item_name2))

##         1         2         5         6         7         8 
## -8030.268 -7749.304 -2670.667  3554.078  8762.014  5327.627

# PReduit for NY in 2019

5.3 Graph for Predication

## PRedict
# Grapg suggest we we are able to explain the situation in most of the year with some outliers . 
data.frame(pred=predict(lm_cpi_item_name2,ALL_test),actual = ALL_test$Jobs,year=ALL_test$Year,geo=ALL_test$GeoName) %>% ggplot(mapping = aes(x=pred,y=year )) +geom_point(mapping = aes(x=actual,color = "red")) +  geom_line()

## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading

data.frame(pred=predict(lm_cpi_item_name2,ALL_test),actual = ALL_test$Jobs,year=ALL_test$Year,geo=ALL_test$GeoName, indus = ALL_test$C_Item) %>% filter(geo=="New York") %>%ggplot(mapping = aes(x=pred,y=year )) +geom_point(mapping = aes(x=actual,color = "Actual")) +  geom_point(aes(color = "Pred") )+ facet_wrap(~indus) + labs(title = "New York Data for No. Of Jobs by Sector",
       y = "Year",x= "Jobs")  + theme_classic()

## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading

data.frame(pred=predict(lm_cpi_item_name2,ALL_test),actual = ALL_test$Jobs,year=ALL_test$Year,geo=ALL_test$GeoName, indus = ALL_test$C_Item) %>% filter(geo=="Florida") %>% ggplot(mapping = aes(x=pred,y=year )) +geom_point(mapping = aes(x=actual,color = "Actual")) +  geom_point(aes(color = "Pred") )+ facet_wrap(~indus) + labs(title = "Florida Data for No. Of Jobs by Sector",
       y = "Year",x= "Jobs")  + theme_classic()

## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading

data.frame(pred=predict(lm_cpi_item_name2,ALL_test),actual = ALL_test$Jobs,year=ALL_test$Year,geo=ALL_test$GeoName, indus = ALL_test$C_Item) %>% filter(geo=="Texas") %>% ggplot(mapping = aes(x=pred,y=year )) +geom_point(mapping = aes(x=actual,color = "Actual")) +  geom_point(aes(color = "Pred") )+ facet_wrap(~indus) + labs(title = "Texas Data for No. Of Jobs by Sector",
       y = "Year",x= "Jobs") + theme_classic()

## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading

5.4 RESULT OF MODEL

data.frame("LM Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm, ALL_test))),
           "MAPE" =  MLmetrics::MAPE(predict(lm, ALL_test),ALL_test$Jobs))

## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

data.frame("lm_cpi2_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi2_item, ALL_test))),
           "MAPE" =  MLmetrics::MAPE(predict(lm_cpi2_item, ALL_test),ALL_test$Jobs))

data.frame("lm_cpi_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item, ALL_test))),
           "MAPE" =  MLmetrics::MAPE(predict(lm_cpi2_item, ALL_test),ALL_test$Jobs))

data.frame("lm_cpi_year Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_year, ALL_test))),
           "MAPE" =  MLmetrics::MAPE(predict(lm_cpi_year, ALL_test),ALL_test$Jobs))

data.frame("PLS Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(model_pls, ALL_test))),
           "MAPE" =  MLmetrics::MAPE(predict(model_pls, ALL_test),ALL_test$Jobs))

cbind(data.frame("LM Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm, ALL_test))),
                 "MAPE" =  MLmetrics::MAPE(predict(lm, ALL_test),ALL_test$Jobs)),
      data.frame("lm_cpi_item_name2 Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name2, ALL_test))),
                 "MAPE" =  MLmetrics::MAPE(predict(lm_cpi_item_name2, ALL_test),ALL_test$Jobs)),
      data.frame("lm_cpi_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item, ALL_test))),
                 "MAPE" =  MLmetrics::MAPE(predict(lm_cpi2_item, ALL_test),ALL_test$Jobs)),
      data.frame("lm_cpi_year Model"= defaultSummary(data.frame(obs=ALL_NY_test$Jobs,pred=predict(lm_cpi_year, ALL_test))),
                 "MAPE" =  MLmetrics::MAPE(predict(lm_cpi_year, ALL_test),ALL_test$Jobs)),
      data.frame("PLS Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(model_pls, ALL_test))),
                 "MAPE" =  MLmetrics::MAPE(predict(model_pls, ALL_test),ALL_test$Jobs))
    )

## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading

## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading

## NEW MODEl

cbind(

data.frame("MLM1"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name, ALL_test))),
                 "MAPE" =  MLmetrics::MAPE(predict(lm_cpi_item_name, ALL_test),ALL_test$Jobs)),

data.frame("lm_cpi_item_name2"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name2, ALL_test))),
                 "MAPE" =  MLmetrics::MAPE(predict(lm_cpi_item_name2, ALL_test),ALL_test$Jobs)),

data.frame("lm_cpi_item_name3"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name3, ALL_test))),
                 "MAPE" =  MLmetrics::MAPE(predict(lm_cpi_item_name3, ALL_test),ALL_test$Jobs)),
data.frame("LM Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm, ALL_test))),
                 "MAPE" =  MLmetrics::MAPE(predict(lm, ALL_test),ALL_test$Jobs)),
      data.frame("lm_cpi2_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi2_item, ALL_test))),
                 "MAPE" =  MLmetrics::MAPE(predict(lm_cpi2_item, ALL_test),ALL_test$Jobs)),
      data.frame("lm_cpi_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item, ALL_test))),
                 "MAPE" =  MLmetrics::MAPE(predict(lm_cpi2_item, ALL_test),ALL_test$Jobs)),
      data.frame("lm_cpi_year Model"= defaultSummary(data.frame(obs=ALL_NY_test$Jobs,pred=predict(lm_cpi_year, ALL_test))),
                 "MAPE" =  MLmetrics::MAPE(predict(lm_cpi_year, ALL_test),ALL_test$Jobs)),
      data.frame("PLS Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(model_pls, ALL_test))),"MAPE" =  MLmetrics::MAPE(predict(model_pls, ALL_test),ALL_test$Jobs))
)

## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading

## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading

## Warning in predict.lm(lm_cpi_item_name3, ALL_test): prediction from a rank-
## deficient fit may be misleading

## Warning in predict.lm(lm_cpi_item_name3, ALL_test): prediction from a rank-
## deficient fit may be misleading

## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

5.5 For WHOLE US

cbind(

data.frame("MLM1"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name, ALL_test)))),

data.frame("MLM2"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name2, ALL_test)))),

data.frame("MLM3"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item_name3, ALL_test)))),
data.frame("LM Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm, ALL_test)))),
      data.frame("lm_cpi2_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi2_item, ALL_test)))),
      data.frame("lm_cpi_item Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(lm_cpi_item, ALL_test)))),
      data.frame("lm_cpi_year Model"= defaultSummary(data.frame(obs=ALL_NY_test$Jobs,pred=predict(lm_cpi_year, ALL_test)))),
      data.frame("PLS Model"= defaultSummary(data.frame(obs=ALL_test$Jobs,pred=predict(model_pls, ALL_test))))
)

## Warning in predict.lm(lm_cpi_item_name2, ALL_test): prediction from a rank-
## deficient fit may be misleading

## Warning in predict.lm(lm_cpi_item_name3, ALL_test): prediction from a rank-
## deficient fit may be misleading

## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

# collect resamples
bbmle::AICctab(lm_cpi_item, lm_cpi_item_name, lm_cpi_item_name2, lm_cpi_item_name3,lm_cpi2_item,weights= T)

##                   dAICc  df  weight
## lm_cpi_item_name2    0.0 208 1     
## lm_cpi_item_name3  191.5 133 <0.001
## lm_cpi_item_name  3296.7 62  <0.001
## lm_cpi_item       4693.8 7   <0.001
## lm_cpi2_item      4693.8 7   <0.001

# https://www.youtube.com/watch?v=4SG2CB2Yu1c
anova(lm_cpi_item, lm_cpi_item_name, lm_cpi_item_name2, lm_cpi_item_name3,lm_cpi2_item)

   # bbmle::AICctab(lm_cpi_item, model_glm, model_pls, lm_cpi_item_name, lm_cpi_item_name2, lm_cpi_item_name3,lm_cpi2_item,lm_cpi_year,model_pls)

5.6 multicollinearity

WE do see out model is multicollinear and it makes it hard to interpret our coefficients, and it reduces the power of your model to identify independent variables that are statistically significant. These are definitely serious problems. However, sometime we can’t fix multicollinearity or finding ways to fix is not important if the objective is met.

# lm_cpi_item_name2
car::vif(lm_cpi_item_name)

##                     GVIF Df GVIF^(1/(2*Df))
## CPI         5.719766e+02  1       23.916032
## GeoName     1.693252e+07 18        1.587807
## C_Item      8.507845e+00  4        1.306855
## Year        1.866223e+00 19        1.016554
## CPI:GeoName 5.418332e+09 18        1.863739

alias( lm_cpi_item_name)

## Model :
## Jobs ~ (CPI * GeoName) + CPI + GeoName + C_Item + Year

alias(lm_cpi_item_name3)

## Model :
## Jobs ~ (CPI * GeoName) + (GeoName * C_Item) + CPI + GeoName + 
##     C_Item + Year
## 
## Complete :
##                          (Intercept) CPI GeoNameArizona GeoNameCalifornia
## GeoNameArizona:C_ItemSAT 0           0   0              0                
##                          GeoNameColorado GeoNameFlorida GeoNameGeorgia
## GeoNameArizona:C_ItemSAT 0               0              0             
##                          GeoNameHawaii GeoNameIllinois GeoNameKansas
## GeoNameArizona:C_ItemSAT 0             0               0            
##                          GeoNameMassachusetts GeoNameMichigan GeoNameMissouri
## GeoNameArizona:C_ItemSAT 0                    0               0              
##                          GeoNameNew York GeoNameOhio GeoNameOregon
## GeoNameArizona:C_ItemSAT 0               0           0            
##                          GeoNamePennsylvania GeoNameTexas GeoNameWashington
## GeoNameArizona:C_ItemSAT 0                   0            0                
##                          GeoNameWisconsin C_ItemSAF C_ItemSAH C_ItemSAM
## GeoNameArizona:C_ItemSAT 0                0         0         0        
##                          C_ItemSAT Year1999 Year2000 Year2001 Year2002 Year2003
## GeoNameArizona:C_ItemSAT 0         0        0        0        0        0       
##                          Year2004 Year2005 Year2006 Year2007 Year2008 Year2009
## GeoNameArizona:C_ItemSAT 0        0        0        0        0        0       
##                          Year2010 Year2011 Year2012 Year2013 Year2014 Year2015
## GeoNameArizona:C_ItemSAT 0        0        0        0        0        0       
##                          Year2016 Year2017 CPI:GeoNameArizona
## GeoNameArizona:C_ItemSAT 0        0        0                 
##                          CPI:GeoNameCalifornia CPI:GeoNameColorado
## GeoNameArizona:C_ItemSAT 0                     0                  
##                          CPI:GeoNameFlorida CPI:GeoNameGeorgia
## GeoNameArizona:C_ItemSAT 0                  0                 
##                          CPI:GeoNameHawaii CPI:GeoNameIllinois
## GeoNameArizona:C_ItemSAT 0                 0                  
##                          CPI:GeoNameKansas CPI:GeoNameMassachusetts
## GeoNameArizona:C_ItemSAT 0                 0                       
##                          CPI:GeoNameMichigan CPI:GeoNameMissouri
## GeoNameArizona:C_ItemSAT 0                   0                  
##                          CPI:GeoNameNew York CPI:GeoNameOhio CPI:GeoNameOregon
## GeoNameArizona:C_ItemSAT 0                   0               0                
##                          CPI:GeoNamePennsylvania CPI:GeoNameTexas
## GeoNameArizona:C_ItemSAT 0                       0               
##                          CPI:GeoNameWashington CPI:GeoNameWisconsin
## GeoNameArizona:C_ItemSAT 0                     0                   
##                          GeoNameArizona:C_ItemSAF GeoNameCalifornia:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0                        0                          
##                          GeoNameColorado:C_ItemSAF GeoNameFlorida:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0                         0                       
##                          GeoNameGeorgia:C_ItemSAF GeoNameHawaii:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0                        0                      
##                          GeoNameIllinois:C_ItemSAF GeoNameKansas:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0                         0                      
##                          GeoNameMassachusetts:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0                             
##                          GeoNameMichigan:C_ItemSAF GeoNameMissouri:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0                         0                        
##                          GeoNameNew York:C_ItemSAF GeoNameOhio:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0                         0                    
##                          GeoNameOregon:C_ItemSAF GeoNamePennsylvania:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0                       0                            
##                          GeoNameTexas:C_ItemSAF GeoNameWashington:C_ItemSAF
## GeoNameArizona:C_ItemSAT 0                      0                          
##                          GeoNameWisconsin:C_ItemSAF GeoNameArizona:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0                          0                       
##                          GeoNameCalifornia:C_ItemSAH GeoNameColorado:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0                           0                        
##                          GeoNameFlorida:C_ItemSAH GeoNameGeorgia:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0                        0                       
##                          GeoNameHawaii:C_ItemSAH GeoNameIllinois:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0                       0                        
##                          GeoNameKansas:C_ItemSAH GeoNameMassachusetts:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0                       0                             
##                          GeoNameMichigan:C_ItemSAH GeoNameMissouri:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0                         0                        
##                          GeoNameNew York:C_ItemSAH GeoNameOhio:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0                         0                    
##                          GeoNameOregon:C_ItemSAH GeoNamePennsylvania:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0                       0                            
##                          GeoNameTexas:C_ItemSAH GeoNameWashington:C_ItemSAH
## GeoNameArizona:C_ItemSAT 0                      0                          
##                          GeoNameWisconsin:C_ItemSAH GeoNameArizona:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0                          0                       
##                          GeoNameCalifornia:C_ItemSAM GeoNameColorado:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0                           0                        
##                          GeoNameFlorida:C_ItemSAM GeoNameGeorgia:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0                        0                       
##                          GeoNameHawaii:C_ItemSAM GeoNameIllinois:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0                       0                        
##                          GeoNameKansas:C_ItemSAM GeoNameMassachusetts:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0                       0                             
##                          GeoNameMichigan:C_ItemSAM GeoNameMissouri:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0                         0                        
##                          GeoNameNew York:C_ItemSAM GeoNameOhio:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0                         0                    
##                          GeoNameOregon:C_ItemSAM GeoNamePennsylvania:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0                       0                            
##                          GeoNameTexas:C_ItemSAM GeoNameWashington:C_ItemSAM
## GeoNameArizona:C_ItemSAT 0                      0                          
##                          GeoNameWisconsin:C_ItemSAM GeoNameCalifornia:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0                          0                          
##                          GeoNameColorado:C_ItemSAT GeoNameFlorida:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0                         0                       
##                          GeoNameGeorgia:C_ItemSAT GeoNameHawaii:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0                        0                      
##                          GeoNameIllinois:C_ItemSAT GeoNameKansas:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0                         0                      
##                          GeoNameMassachusetts:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0                             
##                          GeoNameMichigan:C_ItemSAT GeoNameMissouri:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0                         0                        
##                          GeoNameNew York:C_ItemSAT GeoNameOhio:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0                         0                    
##                          GeoNameOregon:C_ItemSAT GeoNamePennsylvania:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0                       0                            
##                          GeoNameTexas:C_ItemSAT GeoNameWashington:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0                      0                          
##                          GeoNameWisconsin:C_ItemSAT
## GeoNameArizona:C_ItemSAT 0