DATA 698 : Capstone Research Project

1 Overview

We are living through an unusual period of the twenty-first century, one in which any piece of positive news can help. We use historical Consumer Price Index (CPI) data to study its relationship with employment, in the hope that past trends in the data point to some positive news about jobs.

Our working hypothesis is that a lower CPI is associated with more job opportunities, because lower prices leave room for more competition among small businesses across sectors.

2 Capstone Project on CPI and Employment

  1. The estimates of employment for 1998-2006 are based on the 2002 North American Industry Classification System (NAICS). The estimates for 2007-2010 are based on the 2007 NAICS. The estimates for 2011-2016 are based on the 2012 NAICS. The estimates for 2017 forward are based on the 2017 NAICS.
  2. Excludes limited partners.
  3. Under the 2007 NAICS, internet publishing and broadcasting was reclassified to other information services.
  • (NA) Not available.
  • (NM) Not meaningful.
  • Not shown to avoid disclosure of confidential information; estimates are included in higher-level totals.
  • Estimate for employment suppressed to cover corresponding estimate for earnings. Estimates for this item are included in the total.

Last updated: September 24, 2019 (new statistics for 2018; revised statistics for 2014-2017).

3 Data Preparation

Load the required libraries

3.1 Load Employment Datasets

##    GeoFIPS            GeoName              Region       TableName        
##  Length:7084        Length:7084        Min.   :1.000   Length:7084       
##  Class :character   Class :character   1st Qu.:3.000   Class :character  
##  Mode  :character   Mode  :character   Median :5.000   Mode  :character  
##                                        Mean   :4.475                     
##                                        3rd Qu.:6.000                     
##                                        Max.   :8.000                     
##                                        NA's   :122                       
##     LineCode      IndustryClassification Description            Unit          
##  Min.   :  10.0   Length:7084            Length:7084        Length:7084       
##  1st Qu.: 517.0   Class :character       Class :character   Class :character  
##  Median : 712.5   Mode  :character       Mode  :character   Mode  :character  
##  Mean   : 872.1                                                               
##  3rd Qu.:1103.0                                                               
##  Max.   :2012.0                                                               
##  NA's   :4                                                                    
##      1998               1999               2000               2001          
##  Length:7084        Length:7084        Length:7084        Length:7084       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      2002               2003               2004               2005          
##  Length:7084        Length:7084        Length:7084        Length:7084       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      2006               2007               2008               2009          
##  Length:7084        Length:7084        Length:7084        Length:7084       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      2010               2011               2012               2013          
##  Length:7084        Length:7084        Length:7084        Length:7084       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      2014               2015               2016               2017          
##  Length:7084        Length:7084        Length:7084        Length:7084       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##      2018          
##  Length:7084       
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 
## Observations: 118
## Variables: 3
## $ LineCode    <fct> 10, 20, 40, 50, 60, 70, 80, 90, 100, 101, 102, 103, 200...
## $ Description <fct> "Total employment (number of jobs)", "Wage and salary e...
## $ NA          <fct> "A count of jobs, both full-time and part-time. It incl...

3.2 Show Relevant Employment Data

## Observations: 7,080
## Variables: 24
## $ LineCode    <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,...
## $ Description <chr> "Total employment (number of jobs)", "Total employment ...
## $ GeoName     <chr> "United States", "Alabama", "Alaska", "Arizona", "Arkan...
## $ `1998`      <chr> "158481200", "2361892", "382166", "2616288", "1445536",...
## $ `1999`      <chr> "161531300", "2378217", "381307", "2695892", "1460374",...
## $ `2000`      <chr> "165370800", "2392225", "389734", "2801510", "1482449",...
## $ `2001`      <chr> "165522200", "2376053", "394565", "2829002", "1482587",...
## $ `2002`      <chr> "165095100", "2364828", "402187", "2847095", "1478929",...
## $ `2003`      <chr> "165921500", "2371430", "405621", "2917121", "1482035",...
## $ `2004`      <chr> "168839700", "2425649", "413864", "3041476", "1505095",...
## $ `2005`      <chr> "172338400", "2486833", "421419", "3219820", "1537680",...
## $ `2006`      <chr> "175868600", "2545556", "431320", "3375218", "1567682",...
## $ `2007`      <chr> "179543700", "2604078", "439825", "3465075", "1582858",...
## $ `2008`      <chr> "179213900", "2582591", "443538", "3402808", "1579283",...
## $ `2009`      <chr> "173636700", "2479507", "442447", "3228493", "1542944",...
## $ `2010`      <chr> "172901700", "2460298", "443904", "3181571", "1541272",...
## $ `2011`      <chr> "176091700", "2497933", "450364", "3239045", "1561948",...
## $ `2012`      <chr> "178979700", "2503678", "459222", "3295537", "1565142",...
## $ `2013`      <chr> "182325100", "2523338", "461110", "3371219", "1569249",...
## $ `2014`      <chr> "186233800", "2551872", "461327", "3448173", "1587414",...
## $ `2015`      <chr> "190315800", "2586885", "461767", "3548174", "1610779",...
## $ `2016`      <chr> "193371900", "2619154", "457371", "3646604", "1629237",...
## $ `2017`      <chr> "196825300", "2653968", "456799", "3751283", "1644432",...
## $ `2018`      <chr> "200746000", "2691517", "459178", "3859137", "1663188",...
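In the glimpse output above, every year column is character (`<chr>`) even though the values are job counts. A likely reason (our assumption) is that the source tables mix numbers with note codes such as (NA) and (NM). The sketch below converts the year columns to numbers, turning any non-numeric cell into `NA`; the helper `year_cols_to_numeric()` and the three-row table are hypothetical illustrations, not the project's code.

```r
# Hypothetical helper: convert the four-digit year columns of an employment
# table from character to numeric. Cells that are not plain numbers, such as
# the "(NA)" and "(NM)" note codes, become NA.
year_cols_to_numeric <- function(df) {
  year_cols <- grep("^[0-9]{4}$", names(df), value = TRUE)
  for (col in year_cols) {
    x <- trimws(df[[col]])
    x[!grepl("^-?[0-9]+(\\.[0-9]+)?$", x)] <- NA   # note codes -> NA
    df[[col]] <- as.numeric(x)
  }
  df
}

# Usage on a tiny made-up table shaped like the glimpse output
emp <- data.frame(
  GeoName = c("United States", "Alabama", "Alaska"),
  `1998`  = c("158481200", "2361892", "(NA)"),
  `1999`  = c("161531300", "(NM)", "381307"),
  check.names = FALSE, stringsAsFactors = FALSE
)
emp_num <- year_cols_to_numeric(emp)
str(emp_num)
```

Matching on a numeric pattern, rather than calling `as.numeric()` directly, avoids "NAs introduced by coercion" warnings and makes the treatment of note codes explicit.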

3.2.1 Employment from Select Industry by Area

3.2.1.1 AK - Alaska

3.2.1.2 AZ - Arizona

3.2.1.3 CA - California

3.2.1.4 CO - Colorado

3.2.1.5 FL - Florida

3.2.1.6 GA - Georgia

3.2.1.7 HI - Hawaii

3.2.1.8 IL - Illinois

3.2.1.9 KS - Kansas

3.2.1.10 MA - Massachusetts

3.2.1.11 MI - Michigan

3.2.1.12 MN - Minnesota

3.2.1.13 MO - Missouri

3.2.1.14 NY - New York

3.2.1.15 OH - Ohio

3.2.1.16 OR - Oregon

3.2.1.17 PA - Pennsylvania

3.2.1.18 TX - Texas

3.2.1.19 WA - Washington

3.2.1.20 WI - Wisconsin

3.3 Load CPI Data

## [1] "cu_area"
## [1] "cu_base"
## [1] "cu_data_0_Current"
## [1] "cu_data_1_AllItems"
## [1] "cu_data_10_OtherWest"
## [1] "cu_data_11_USFoodBeverage"
## [1] "cu_data_12_USHousing"
## [1] "cu_data_13_USApparel"
## [1] "cu_data_14_USTransportation"
## [1] "cu_data_15_USMedical"
## [1] "cu_data_16_USRecreation"
## [1] "cu_data_17_USEducationAndCommunication"
## [1] "cu_data_18_USOtherGoodsAndServices"
## [1] "cu_data_19_PopulationSize"
## [1] "cu_data_2_Summaries"
## [1] "cu_data_20_USCommoditiesServicesSpecial"
## [1] "cu_data_3_AsizeNorthEast"
## [1] "cu_data_4_AsizeNorthCentral"
## [1] "cu_data_5_AsizeSouth"
## [1] "cu_data_6_AsizeWest"
## [1] "cu_data_7_OtherNorthEast"
## [1] "cu_data_8_OtherNorthCentral"
## [1] "cu_data_9_OtherSouth"
## [1] "cu_footnote"
## [1] "cu_item"
## [1] "cu_period"
## [1] "cu_periodicity"
## [1] "cu_series"
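Each name printed above corresponds to one loaded CPI file. The sketch below loads such a folder into a named list and prints each name as it goes. The helper `load_cpi_files()`, the `cu_` file-name filter, and the tab-delimited layout with a header row are all assumptions, because the loading code is not shown.

```r
# Sketch: read every file whose name starts with "cu_" into a named list,
# printing each name as it is loaded.
# Assumption: files are tab-delimited text with a header row.
load_cpi_files <- function(dir) {
  files <- sort(list.files(dir, pattern = "^cu_", full.names = TRUE))
  out <- list()
  for (f in files) {
    name <- tools::file_path_sans_ext(basename(f))
    print(name)
    out[[name]] <- read.delim(f, stringsAsFactors = FALSE, strip.white = TRUE)
  }
  out
}

# Usage with two small hypothetical files written to a temporary folder
demo_dir <- file.path(tempdir(), "cpi_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("item_code\titem_name", "SAF\tFood and beverages"),
           file.path(demo_dir, "cu_item.txt"))
writeLines(c("area_code\tarea_name", "A101\tNew York-Northern New Jersey-Long Island"),
           file.path(demo_dir, "cu_area.txt"))
cpi <- load_cpi_files(demo_dir)
```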

3.4 Show CPI Reference Data

3.4.1 Area

3.4.2 Base

3.4.3 Item

3.4.4 Periodicity

3.4.5 Series

3.4.6 Period

3.5 Combined Reference Data
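Combining the reference data amounts to attaching readable area and item names to each CPI series. Below is a minimal sketch with base R `merge()`, assuming the series table carries `area_code` and `item_code` keys. The codes A101, SAF and SAH appear in the model output later in this report; the miniature tables themselves are illustrative.

```r
# Hypothetical miniature versions of the series, item and area reference tables.
# Assumption: each series row carries area_code and item_code keys.
series <- data.frame(series_id = c("S1", "S2"),
                     area_code = c("A101", "A101"),
                     item_code = c("SAF", "SAH"))
item <- data.frame(item_code = c("SAF", "SAH"),
                   item_name = c("Food and beverages", "Housing"))
area <- data.frame(area_code = "A101",
                   area_name = "New York-Northern New Jersey-Long Island")

# Attach item names, then area names, to every series (inner joins)
reference <- merge(merge(series, item, by = "item_code"), area, by = "area_code")
reference
```

Because `merge()` performs an inner join by default, a series whose code is missing from a lookup table would silently drop out; `all.x = TRUE` keeps it with `NA` names instead.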

3.6 Show CPI Data from Select Industry

3.6.1 CPI - US Food Beverages

3.6.2 CPI - US Housing

3.6.3 CPI - US Transportation

3.6.4 CPI - US Medical

3.6.5 CPI - US Education And Communication

4 Employment vs CPI - by Industry and Area
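To compare the two sources, the monthly CPI values have to be brought to the same annual grain as the employment estimates. One possible approach averages the months of each year and then joins on item and year. The column names mirror the `Year`, `Jobs`, `C_Item` and `CPI` fields of the modeling data, but the averaging rule and the M01 to M12 period filter are assumptions, not the project's documented method.

```r
# Sketch: average monthly CPI to one value per item and year, then join the
# result to annual job counts. All values are made up.
monthly <- data.frame(
  C_Item = "SAF",
  Year   = c(2017, 2017, 2018, 2018, 2018),
  period = c("M01", "M02", "M01", "M02", "S01"),  # "S01": a hypothetical non-monthly row
  CPI    = c(100, 102, 104, 106, 999)
)
monthly <- monthly[grepl("^M(0[1-9]|1[0-2])$", monthly$period), ]  # keep M01..M12 only
annual_cpi <- aggregate(CPI ~ C_Item + Year, data = monthly, FUN = mean)

jobs <- data.frame(C_Item = "SAF", Year = c(2017, 2018), Jobs = c(500, 520))
combined <- merge(jobs, annual_cpi, by = c("C_Item", "Year"))
combined
```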

4.1 Food & Beverages

4.1.1 AK - Alaska : Anchorage

4.1.2 AZ - Arizona : Phoenix-Mesa

4.1.3 CA - California

  • Los Angeles-Riverside-Orange County
  • San Francisco-Oakland-San Jose
  • San Diego

4.1.4 CO - Colorado : Denver-Boulder-Greeley

4.1.5 FL - Florida

  • Miami-Fort Lauderdale
  • Tampa-St. Petersburg-Clearwater

4.1.6 GA - Georgia

  • Atlanta

4.1.7 HI - Hawaii

  • Honolulu

4.1.8 IL - Illinois

  • Chicago-Gary-Kenosha

4.1.9 KS - Kansas

  • Kansas City

4.1.10 MA - Massachusetts

  • Boston-Brockton-Nashua

4.1.11 MI - Michigan

  • Detroit-Ann Arbor-Flint

4.1.12 MN - Minnesota

  • Minneapolis-St. Paul

4.1.13 MO - Missouri

  • St. Louis

4.1.14 NY - New York

  • New York-Northern New Jersey-Long Island

4.1.15 OH - Ohio

  • Cleveland-Akron
  • Cincinnati-Hamilton

4.1.16 OR - Oregon

  • Portland-Salem

4.1.17 PA - Pennsylvania

  • Philadelphia-Wilmington-Atlantic City
  • Pittsburgh

4.1.18 TX - Texas

  • Dallas-Fort Worth
  • Houston-Galveston-Brazoria

4.1.19 WA - Washington

  • Seattle-Tacoma-Bremerton

4.1.20 WI - Wisconsin

  • Milwaukee-Racine

4.2 Housing

4.2.1 NY - New York

4.2.2 MI - Michigan

4.2.3 OR - Oregon

4.3 Transportation

4.3.1 NY - New York

4.3.2 MI - Michigan

4.3.3 OR - Oregon

4.4 Medical

4.4.1 NY - New York

4.4.2 MI - Michigan

4.4.3 OR - Oregon

4.5 Education & Communication

4.5.1 NY - New York

4.5.2 MI - Michigan

4.5.3 OR - Oregon

5 Rajwant's Work: New York Models

## `geom_smooth()` using formula 'y ~ x'

Figure: USFoodBeverage data for number of jobs and CPI value in NY (x-axis: Years; y-axis: Sector)

5.1 Model

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

## 
## Call:
## lm(formula = .outcome ~ ., data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -141982  -33811    -302   30145  148496 
## 
## Coefficients: (1 not defined because of singularities)
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.918e+05  3.142e+04   9.287 7.38e-13 ***
## Year1999    -3.829e+03  4.189e+04  -0.091 0.927494    
## Year2000     2.578e+04  4.174e+04   0.618 0.539326    
## Year2001     3.835e+04  3.933e+04   0.975 0.333891    
## Year2002     4.543e+04  3.933e+04   1.155 0.253104    
## Year2003     5.607e+04  4.239e+04   1.322 0.191478    
## Year2004     7.446e+04  3.987e+04   1.868 0.067154 .  
## Year2005     8.853e+04  4.303e+04   2.057 0.044417 *  
## Year2006     1.064e+05  4.349e+04   2.446 0.017670 *  
## Year2007     1.190e+05  4.792e+04   2.483 0.016089 *  
## Year2008     1.397e+05  4.615e+04   3.028 0.003742 ** 
## Year2009     1.395e+05  4.692e+04   2.973 0.004370 ** 
## Year2010     1.621e+05  4.642e+04   3.493 0.000952 ***
## Year2011     1.709e+05  4.532e+04   3.771 0.000399 ***
## Year2012     1.633e+05  4.924e+04   3.316 0.001620 ** 
## Year2013     1.869e+05  4.719e+04   3.961 0.000216 ***
## Year2014     1.982e+05  5.909e+04   3.354 0.001448 ** 
## Year2015     2.303e+05  4.427e+04   5.202 3.00e-06 ***
## Year2016     2.593e+05  5.744e+04   4.514 3.40e-05 ***
## Year2017     2.408e+05  4.215e+04   5.714 4.65e-07 ***
## Year2018            NA         NA      NA       NA    
## C_ItemSAF   -1.453e+05  3.882e+04  -3.742 0.000437 ***
## C_ItemSAH    1.903e+05  8.301e+04   2.293 0.025699 *  
## C_ItemSAM    1.043e+06  2.330e+04  44.749  < 2e-16 ***
## C_ItemSAT   -2.298e+04  4.981e+04  -0.461 0.646392    
## CPI         -3.298e+00  3.231e+00  -1.021 0.311804    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 61960 on 55 degrees of freedom
## Multiple R-squared:  0.9866, Adjusted R-squared:  0.9807 
## F-statistic: 168.5 on 24 and 55 DF,  p-value: < 2.2e-16
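In the summary above, `Year2018` has an `NA` coefficient ("1 not defined because of singularities"), and caret's `preProcess` later reports zero variance for `Year2018`. The Call line shows the final model is fit as `lm(.outcome ~ ., data = dat)`, that is, on dummy columns that were already expanded. One plausible cause, which we cannot confirm because the train/test split code is not shown, is that the training set has no 2018 rows while `Year` keeps all 21 levels (1998 to 2018, per the `str()` output). The sketch below reproduces that situation on made-up data.

```r
# Sketch: an unused factor level, once expanded to dummy columns, gives an
# all-zero column and therefore an NA coefficient. Data are made up.
train <- data.frame(
  Year = factor(c("2016", "2016", "2017", "2017"),
                levels = c("2016", "2017", "2018")),  # 2018 never occurs
  Jobs = c(10, 12, 15, 17)
)

dummies <- as.data.frame(model.matrix(~ Year, data = train))[, -1]  # drop intercept
dummies$Jobs <- train$Jobs
fit <- lm(Jobs ~ ., data = dummies)
coef(fit)   # Year2018 is NA: its dummy column is all zeros

# Dropping unused levels first removes the empty dummy column
train2 <- droplevels(train)
dummies2 <- as.data.frame(model.matrix(~ Year, data = train2))[, -1, drop = FALSE]
names(dummies2)
```

Dropping the level only tidies the fit. A model with `Year` as a factor still has no estimated effect for a year it never saw, so it cannot meaningfully predict 2018.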

## 'data.frame':    80 obs. of  4 variables:
##  $ Year  : Factor w/ 21 levels "1998","1999",..: 1 3 4 5 7 11 12 13 14 15 ...
##  $ Jobs  : Factor w/ 79 levels "202970","205107",..: 19 23 25 18 20 28 24 21 26 29 ...
##  $ C_Item: Factor w/ 5 levels "SAE","SAF","SAH",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ CPI   : Factor w/ 80 levels "1196.3","1207.7",..: 34 42 40 36 48 60 56 59 66 68 ...
## [1] A101
## 45 Levels: 0000 0100 0200 0300 0400 A000 A100 A101 A102 A103 A104 A200 ... X400
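The `str()` output shows `Jobs` and `CPI` stored as factors. If such columns reach a model unconverted, the usual R pitfall applies: `as.numeric()` on a factor returns the internal level codes, not the numbers the labels represent. A minimal illustration using two CPI levels from that output (`to_number()` is a hypothetical helper name):

```r
# A factor stores its values as labels; as.numeric() returns level codes.
# Converting through character first recovers the actual numbers.
to_number <- function(f) as.numeric(as.character(f))

cpi_f <- factor(c("1207.7", "1196.3", "1207.7"))
as.numeric(cpi_f)   # level codes: 2 1 2
to_number(cpi_f)    # values: 1207.7 1196.3 1207.7
```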
## 
## Call:
## lm(formula = Jobs ~ CPI + C_Item, data = ALL_NY_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -238919  -41944   -5136   25157  299676 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.819e+05  2.274e+04  16.791  < 2e-16 ***
## CPI          7.314e+00  2.727e+00   2.682  0.00903 ** 
## C_ItemSAF   -2.541e+05  4.090e+04  -6.214 2.76e-08 ***
## C_ItemSAH   -7.161e+04  7.466e+04  -0.959  0.34063    
## C_ItemSAM    1.017e+06  3.135e+04  32.443  < 2e-16 ***
## C_ItemSAT   -1.626e+05  5.094e+04  -3.192  0.00207 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 87680 on 74 degrees of freedom
## Multiple R-squared:  0.9638, Adjusted R-squared:  0.9614 
## F-statistic: 394.5 on 5 and 74 DF,  p-value: < 2.2e-16
## 
## Call:
## lm(formula = Jobs ~ (CPI^2) + C_Item, data = ALL_NY_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -238919  -41944   -5136   25157  299676 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.819e+05  2.274e+04  16.791  < 2e-16 ***
## CPI          7.314e+00  2.727e+00   2.682  0.00903 ** 
## C_ItemSAF   -2.541e+05  4.090e+04  -6.214 2.76e-08 ***
## C_ItemSAH   -7.161e+04  7.466e+04  -0.959  0.34063    
## C_ItemSAM    1.017e+06  3.135e+04  32.443  < 2e-16 ***
## C_ItemSAT   -1.626e+05  5.094e+04  -3.192  0.00207 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 87680 on 74 degrees of freedom
## Multiple R-squared:  0.9638, Adjusted R-squared:  0.9614 
## F-statistic: 394.5 on 5 and 74 DF,  p-value: < 2.2e-16
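The `Jobs ~ (CPI^2) + C_Item` summary is identical to the `Jobs ~ CPI + C_Item` summary, coefficient for coefficient. This is expected: inside an R formula, `^` means crossing of terms, so `CPI^2` reduces to `CPI`. To add a squared CPI term, wrap it in `I()` (or use `poly(CPI, 2)`). A small demonstration on simulated data:

```r
set.seed(42)
d <- data.frame(CPI = runif(30, 100, 300))
d$Jobs <- 5 + 0.01 * d$CPI^2 + rnorm(30)          # made-up data with a curved trend

fit_hat  <- lm(Jobs ~ (CPI^2), data = d)          # formula ^ is crossing: same as Jobs ~ CPI
fit_lin  <- lm(Jobs ~ CPI, data = d)
fit_quad <- lm(Jobs ~ CPI + I(CPI^2), data = d)   # I() makes ^ arithmetic: adds CPI squared

all.equal(coef(fit_hat), coef(fit_lin))  # TRUE
names(coef(fit_quad))                    # "(Intercept)" "CPI" "I(CPI^2)"
```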
## 
## Call:
## lm(formula = Jobs ~ CPI + Year, data = ALL_NY_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -543365 -328882 -178870  239324  874565 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 608217.21  219290.01   2.774  0.00741 **
## CPI            -15.04       5.88  -2.558  0.01310 * 
## Year1999     57003.61  319638.10   0.178  0.85907   
## Year2000    113213.91  320094.66   0.354  0.72483   
## Year2001     50576.44  301418.64   0.168  0.86732   
## Year2002     57639.07  301418.44   0.191  0.84901   
## Year2003    131628.51  319744.44   0.412  0.68207   
## Year2004    101161.63  301652.70   0.335  0.73854   
## Year2005    175316.88  319939.52   0.548  0.58578   
## Year2006    199568.55  320094.78   0.623  0.53538   
## Year2007     -8440.53  349762.90  -0.024  0.98083   
## Year2008    294192.97  323627.32   0.909  0.36702   
## Year2009    249079.79  348085.16   0.716  0.47708   
## Year2010    326193.37  321806.45   1.014  0.31490   
## Year2011    253681.42  304191.90   0.834  0.40767   
## Year2012     74205.25  325084.20   0.228  0.82023   
## Year2013    229539.76  320171.91   0.717  0.47625   
## Year2014    191238.92  409103.60   0.467  0.64189   
## Year2015    305188.84  303679.30   1.005  0.31901   
## Year2016    320336.72  409891.32   0.782  0.43762   
## Year2017    269002.74  320008.39   0.841  0.40396   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 476500 on 59 degrees of freedom
## Multiple R-squared:  0.1487, Adjusted R-squared:  -0.1399 
## F-statistic: 0.5153 on 20 and 59 DF,  p-value: 0.949
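The `Jobs ~ CPI + Year` fit has a negative adjusted R-squared. This can happen because adjusted R-squared subtracts a penalty for the number of predictors p. Here p = 20 (CPI plus 19 year dummies, matching "20 and 59 DF"), and n = 80 (74 residual degrees of freedom + 5 predictors + 1 in the first model). The reported values can be reproduced from the formula below.

```r
# Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - p - 1)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

adj_r2(0.1487, n = 80, p = 20)  # Jobs ~ CPI + Year: about -0.1399
adj_r2(0.9638, n = 80, p = 5)   # Jobs ~ CPI + C_Item: about 0.9614
```

With only 80 observations, 20 predictors that add little explanatory power push the adjusted value below zero. Taken together with the F-test p-value of 0.949, this means the model does no better than predicting the mean.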
## Warning in predict.lm(modelFit, newdata): prediction from a rank-deficient fit
## may be misleading

## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2006, Year2018
## Partial Least Squares 
## 
## 80 samples
##  3 predictor
## 
## Pre-processing: centered (25), scaled (25) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 72, 72, 72, 72, 72, 72, ... 
## Resampling results across tuning parameters:
## 
##   ncomp  RMSE       Rsquared   MAE      
##    1     262509.32  0.6477474  217562.44
##    2     167818.17  0.8622355  140030.97
##    3      96337.03  0.9561960   81301.18
##    4      78353.72  0.9748132   63473.45
##    5      79356.89  0.9718313   64146.98
##    6      80180.07  0.9686154   66026.58
##    7      82402.21  0.9661310   68664.52
##    8      82283.96  0.9663358   68616.73
##    9      83158.41  0.9651061   69421.75
##   10      84395.59  0.9630032   71213.14
##   11      84342.06  0.9619838   71189.22
##   12      84362.30  0.9619403   71196.97
##   13      84362.44  0.9619400   71197.03
##   14      84362.44  0.9619400   71197.03
##   15      84362.44  0.9619400   71197.03
##   16      84362.44  0.9619400   71197.03
##   17      84362.44  0.9619400   71197.03
##   18      84362.44  0.9619400   71197.03
##   19      84362.44  0.9619400   71197.03
##   20      84362.44  0.9619400   71197.03
## 
## Rsquared was used to select the optimal model using the largest value.
## The final value used for the model was ncomp = 4.
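The PLS output reports 10-fold cross-validation with training folds of 72 of the 80 samples. Below is a base-R sketch of what that resampling does for a plain linear model. It illustrates the idea only and is not caret's implementation; the helper `kfold_rmse()` and the simulated data are hypothetical. As we understand caret's summaries, the reported RMSE is likewise the average across folds.

```r
# Sketch of k-fold cross-validation: each fold is held out once, the model is
# refit on the remaining rows, and the held-out RMSE is averaged over folds.
kfold_rmse <- function(formula, data, k = 10, seed = 1) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))  # balanced fold labels
  outcome <- all.vars(formula)[1]
  rmse <- numeric(k)
  train_sizes <- integer(k)
  for (i in seq_len(k)) {
    train <- data[folds != i, , drop = FALSE]
    test  <- data[folds == i, , drop = FALSE]
    fit   <- lm(formula, data = train)
    pred  <- predict(fit, newdata = test)
    rmse[i] <- sqrt(mean((test[[outcome]] - pred)^2))
    train_sizes[i] <- nrow(train)
  }
  list(rmse = mean(rmse), train_sizes = train_sizes)
}

# Usage on simulated data with 80 rows, like the NY training set
set.seed(7)
sim <- data.frame(CPI = runif(80, 100, 300))
sim$Jobs <- 1000 + 3 * sim$CPI + rnorm(80, sd = 50)
cv <- kfold_rmse(Jobs ~ CPI, sim)
cv$train_sizes   # 72 in every fold
```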
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Year2018
## Warning in predict.lm(object, newdata, se.fit, scale = 1, type = if (type == :
## prediction from a rank-deficient fit may be misleading
## Generalized Linear Model 
## 
## 80 samples
##  3 predictor
## 
## Pre-processing: centered (25), scaled (25) 
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 72, 72, 72, 72, 72, 72, ... 
## Resampling results:
## 
##   RMSE      Rsquared   MAE     
##   75224.05  0.9751087  60425.63
test_model(test_dataset = model_pls,
           predictor_dataset = ALL_NY_train,
           test_type = "regression")


# Observed vs. predicted plots (lattice). Note: `lm` below is a fitted model
# object defined earlier in the report, not the stats::lm() function.
library(lattice)

# Observed against predicted values: points ('p') and a background grid ('g')
obs_vs_pred <- function(obs, pred) {
  xyplot(obs ~ pred, type = c("p", "g"),
         xlab = "Predicted", ylab = "Observed")
}

# TEST MODEL with training data
obs_vs_pred(ALL_NY_train$Jobs, predict(model_pls))
obs_vs_pred(ALL_NY_train$Jobs, predict(lm))
obs_vs_pred(ALL_NY_train$Jobs, predict(lm_cpi2_item))
obs_vs_pred(ALL_NY_train$Jobs, predict(lm_cpi_item))
obs_vs_pred(ALL_NY_train$Jobs, predict(model_glm))

# TEST MODEL with test data
obs_vs_pred(ALL_NY_test$Jobs, predict(model_pls, ALL_NY_test))
obs_vs_pred(ALL_NY_test$Jobs, predict(lm, ALL_NY_test))
obs_vs_pred(ALL_NY_test$Jobs, predict(lm_cpi_item, ALL_NY_test))


# Residual plots with training data (residual = observed - fitted)
resid_vs_pred <- function(res, pred) {
  xyplot(res ~ pred, type = c("p", "g"),
         xlab = "Predicted", ylab = "Residuals")
}

resid_vs_pred(resid(model_pls), predict(model_pls))
resid_vs_pred(resid(lm), predict(lm))
resid_vs_pred(resid(lm_cpi_item), predict(lm_cpi_item))
resid_vs_pred(resid(lm_cpi2_item), predict(lm_cpi2_item))

# Residual plots with test data. resid() returns training residuals only,
# so test residuals are computed from the held-out observations instead.
test_resid_plot <- function(model) {
  pred <- predict(model, ALL_NY_test)
  resid_vs_pred(ALL_NY_test$Jobs - pred, pred)
}

test_resid_plot(model_pls)
test_resid_plot(model_glm)
test_resid_plot(lm_cpi_item)
    
    

    
# Test-set metrics for each model: RMSE, Rsquared and MAE from
# caret::defaultSummary(), plus MAPE from MLmetrics (a fraction, not a percent).
library(caret)

models <- list(
  "LM Model"           = lm,
  "lm_cpi2_item Model" = lm_cpi2_item,
  "lm_cpi_item Model"  = lm_cpi_item,
  "lm_cpi_year Model"  = lm_cpi_year,
  "PLS Model"          = model_pls
)

# One column per model, one row per metric
test_metrics <- sapply(models, function(model) {
  pred <- predict(model, ALL_NY_test)
  obs  <- ALL_NY_test$Jobs
  c(defaultSummary(data.frame(obs = obs, pred = pred)),
    MAPE = MLmetrics::MAPE(y_pred = pred, y_true = obs))
})
test_metrics
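The comparison relies on `caret::defaultSummary()` (RMSE, Rsquared, MAE) and `MLmetrics::MAPE()`. For reference, here is a base-R sketch of the four metrics; the helper `regression_metrics()` is hypothetical. The Rsquared line follows what we understand to be caret's default definition, the squared correlation between observed and predicted values, which can differ from the `1 - SSE/SST` R-squared reported by `summary(lm)`.

```r
# Sketch of the regression metrics used to compare models on the test set
regression_metrics <- function(obs, pred) {
  err <- obs - pred
  c(RMSE     = sqrt(mean(err^2)),
    Rsquared = cor(obs, pred)^2,       # squared correlation (assumed caret default)
    MAE      = mean(abs(err)),
    MAPE     = mean(abs(err / obs)))   # fraction; multiply by 100 for percent
}

m <- regression_metrics(obs = c(100, 200, 300), pred = c(110, 190, 330))
round(m, 4)
```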

Debabrata Kabiraj, Joseph Simone and Rajwant Mishra

Oct 1, 2020