Preparing Data for Modeling
##      Store        Date            Weekly_Sales      Holiday_Flag
##  Min.   : 1   Length:6435        Min.   : 209986   Min.   :0.00000
##  1st Qu.:12   Class :character   1st Qu.: 553350   1st Qu.:0.00000
##  Median :23   Mode  :character   Median : 960746   Median :0.00000
##  Mean   :23                      Mean   :1046965   Mean   :0.06993
##  3rd Qu.:34                      3rd Qu.:1420159   3rd Qu.:0.00000
##  Max.   :45                      Max.   :3818686   Max.   :1.00000
##   Temperature       Fuel_Price         CPI         Unemployment
##  Min.   : -2.06   Min.   :2.472   Min.   :126.1   Min.   : 3.879
##  1st Qu.: 47.46   1st Qu.:2.933   1st Qu.:131.7   1st Qu.: 6.891
##  Median : 62.67   Median :3.445   Median :182.6   Median : 7.874
##  Mean   : 60.66   Mean   :3.359   Mean   :171.6   Mean   : 7.999
##  3rd Qu.: 74.94   3rd Qu.:3.735   3rd Qu.:212.7   3rd Qu.: 8.622
##  Max.   :100.14   Max.   :4.468   Max.   :227.2   Max.   :14.313
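The summary above shows Store and Holiday_Flag stored as numbers and Date as text, while the coefficient names in the model output later in this report (Store2, Holiday_Flag1) indicate that both were treated as factors when the model was fit. A minimal sketch of that preparation, followed by a random training/validation split, might look like the following. The data frame `sales`, its simulated values, the 70/30 proportion, the seed, and the name `validlm` are all assumptions for illustration; the report does not state how the split was made.

```r
# Simulated stand-in for the sales data (all values are made up for illustration).
set.seed(1)
n <- 200
sales <- data.frame(
  Store        = sample(1:3, n, replace = TRUE),
  Date         = as.character(as.Date("2020-01-03") + 7 * sample(0:99, n, replace = TRUE)),
  Weekly_Sales = round(runif(n, 2e5, 3.8e6)),
  Holiday_Flag = rbinom(n, 1, 0.07),
  stringsAsFactors = FALSE
)

# Treat the store ID and the holiday indicator as categories, not numbers.
sales$Store        <- factor(sales$Store)
sales$Holiday_Flag <- factor(sales$Holiday_Flag)

# Random 70/30 split into training and validation sets (proportion assumed).
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
trainlm   <- sales[train_idx, ]
validlm   <- sales[-train_idx, ]

str(trainlm)
```

Converting Store to a factor is what makes `lm()` estimate one coefficient per store instead of a single slope for the store number.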
Linear Model to Predict Sales
- A linear model predicting Weekly_Sales was fit using every available
predictor except Date. This main-effects model shows the individual
effect of each predictor with the others held constant
- A second model was then fit with all two-way interactions among the
same predictors, again excluding Date
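The two fits described above can be sketched in R as follows. `trainlm` is the data frame name that appears in the model call in this report; the simulated values, the three-store factor, and the object names `fit_main` and `fit_int` are assumptions. The report does not show the call used for the interaction model, so `.^2` is only one plausible way to request all two-way interactions.

```r
# Simulated training data; column names follow the report, values are made up.
set.seed(42)
n <- 300
trainlm <- data.frame(
  Store        = factor(sample(1:3, n, replace = TRUE)),
  Date         = as.character(as.Date("2020-01-03") + 7 * sample(0:99, n, replace = TRUE)),
  Holiday_Flag = factor(rbinom(n, 1, 0.2)),
  Temperature  = runif(n, 0, 100),
  Fuel_Price   = runif(n, 2.5, 4.5)
)
trainlm$Weekly_Sales <- 1e6 + 2e5 * as.integer(trainlm$Store) +
  5e4 * (trainlm$Holiday_Flag == "1") - 1e3 * trainlm$Temperature +
  rnorm(n, sd = 5e4)

# Main-effects model: every column except Date predicts Weekly_Sales.
fit_main <- lm(Weekly_Sales ~ . - Date, data = trainlm)

# Interaction model: all main effects plus every two-way interaction.
# Date is dropped from the data first so `.^2` expands over the other predictors only.
fit_int <- lm(Weekly_Sales ~ .^2, data = subset(trainlm, select = -Date))

attr(terms(fit_main), "term.labels")
attr(terms(fit_int), "term.labels")
```

With k predictors, `.^2` produces k main effects and k(k-1)/2 pairwise interactions, which matches the 6 + 15 = 21 candidate terms listed in the interaction-model output of this report.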
##
## Call:
## lm(formula = Weekly_Sales ~ . - Date, data = trainlm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -513804 -69546 -11882 39468 1823766
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1357543.2 278781.6 4.870 1.16e-06 ***
## Store2 351630.2 22573.7 15.577 < 2e-16 ***
## Store3 -1180716.9 23170.7 -50.957 < 2e-16 ***
## Store4 716535.6 116599.9 6.145 8.68e-10 ***
## Store5 -1282576.3 24140.1 -53.131 < 2e-16 ***
## Store6 -42348.2 23255.9 -1.821 0.068679 .
## Store7 -946003.2 35215.5 -26.863 < 2e-16 ***
## Store8 -699172.6 23983.0 -29.153 < 2e-16 ***
## Store9 -1068955.1 23877.8 -44.768 < 2e-16 ***
## Store10 628887.4 112929.0 5.569 2.71e-08 ***
## Store11 -217449.3 23756.1 -9.153 < 2e-16 ***
## Store12 -160159.8 105614.7 -1.516 0.129475
## Store13 665390.8 114607.5 5.806 6.85e-09 ***
## Store14 541155.6 43009.0 12.582 < 2e-16 ***
## Store15 -711503.8 105846.8 -6.722 2.02e-11 ***
## Store16 -1033264.4 39281.3 -26.304 < 2e-16 ***
## Store17 -482891.3 115651.4 -4.175 3.03e-05 ***
## Store18 -236188.4 103207.7 -2.288 0.022156 *
## Store19 88916.7 105944.5 0.839 0.401359
## Store20 568352.9 25871.2 21.969 < 2e-16 ***
## Store21 -815160.0 22958.8 -35.505 < 2e-16 ***
## Store22 -323211.6 100196.3 -3.226 0.001265 **
## Store23 -29770.2 113150.0 -0.263 0.792482
## Store24 27496.8 104872.5 0.262 0.793185
## Store25 -847649.9 25399.5 -33.373 < 2e-16 ***
## Store26 -359445.5 105525.4 -3.406 0.000664 ***
## Store27 439229.1 101050.9 4.347 1.41e-05 ***
## Store28 139953.5 105779.4 1.323 0.185880
## Store29 -750276.3 101676.3 -7.379 1.89e-13 ***
## Store30 -1122743.3 22425.9 -50.065 < 2e-16 ***
## Store31 -158464.1 22737.4 -6.969 3.65e-12 ***
## Store32 -332244.3 35092.7 -9.468 < 2e-16 ***
## Store33 -1023329.2 112745.7 -9.076 < 2e-16 ***
## Store34 -312719.9 107897.4 -2.898 0.003770 **
## Store35 -415109.3 98516.6 -4.214 2.56e-05 ***
## Store36 -1180433.3 23588.0 -50.044 < 2e-16 ***
## Store37 -1035121.3 22767.4 -45.465 < 2e-16 ***
## Store38 -796167.5 105624.2 -7.538 5.75e-14 ***
## Store39 -86778.8 23088.3 -3.759 0.000173 ***
## Store40 -455869.6 113041.5 -4.033 5.60e-05 ***
## Store41 -279251.2 37950.0 -7.358 2.20e-13 ***
## Store42 -730734.7 113110.0 -6.460 1.16e-10 ***
## Store43 -857065.6 25926.2 -33.058 < 2e-16 ***
## Store44 -1056135.9 115347.2 -9.156 < 2e-16 ***
## Store45 -676430.4 43141.6 -15.679 < 2e-16 ***
## Holiday_Flag1 75352.5 9602.9 7.847 5.30e-15 ***
## Temperature -904.9 159.4 -5.677 1.46e-08 ***
## Fuel_Price -42007.4 8681.7 -4.839 1.35e-06 ***
## CPI 2692.5 1262.7 2.132 0.033030 *
## Unemployment -23933.5 5245.3 -4.563 5.18e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 162300 on 4475 degrees of freedom
## Multiple R-squared: 0.9183, Adjusted R-squared: 0.9174
## F-statistic: 1027 on 49 and 4475 DF, p-value: < 2.2e-16
## Backward Elimination Method
## ---------------------------
##
## Candidate Terms:
##
## 1. Store
## 2. Holiday_Flag
## 3. Temperature
## 4. Fuel_Price
## 5. CPI
## 6. Unemployment
##
##
## Step => 0
## Model => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Unemployment
## R2 => 0.918
##
## Initiating stepwise selection...
##
##
## No more variables to be removed.
##
## Call:
## lm(formula = paste(response, "~", paste(c(include, cterms), collapse = " + ")),
## data = l)
##
## Residuals:
## Min 1Q Median 3Q Max
## -513804 -69546 -11882 39468 1823766
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1357543.2 278781.6 4.870 1.16e-06 ***
## Store2 351630.2 22573.7 15.577 < 2e-16 ***
## Store3 -1180716.9 23170.7 -50.957 < 2e-16 ***
## Store4 716535.6 116599.9 6.145 8.68e-10 ***
## Store5 -1282576.3 24140.1 -53.131 < 2e-16 ***
## Store6 -42348.2 23255.9 -1.821 0.068679 .
## Store7 -946003.2 35215.5 -26.863 < 2e-16 ***
## Store8 -699172.6 23983.0 -29.153 < 2e-16 ***
## Store9 -1068955.1 23877.8 -44.768 < 2e-16 ***
## Store10 628887.4 112929.0 5.569 2.71e-08 ***
## Store11 -217449.3 23756.1 -9.153 < 2e-16 ***
## Store12 -160159.8 105614.7 -1.516 0.129475
## Store13 665390.8 114607.5 5.806 6.85e-09 ***
## Store14 541155.6 43009.0 12.582 < 2e-16 ***
## Store15 -711503.8 105846.8 -6.722 2.02e-11 ***
## Store16 -1033264.4 39281.3 -26.304 < 2e-16 ***
## Store17 -482891.3 115651.4 -4.175 3.03e-05 ***
## Store18 -236188.4 103207.7 -2.288 0.022156 *
## Store19 88916.7 105944.5 0.839 0.401359
## Store20 568352.9 25871.2 21.969 < 2e-16 ***
## Store21 -815160.0 22958.8 -35.505 < 2e-16 ***
## Store22 -323211.6 100196.3 -3.226 0.001265 **
## Store23 -29770.2 113150.0 -0.263 0.792482
## Store24 27496.8 104872.5 0.262 0.793185
## Store25 -847649.9 25399.5 -33.373 < 2e-16 ***
## Store26 -359445.5 105525.4 -3.406 0.000664 ***
## Store27 439229.1 101050.9 4.347 1.41e-05 ***
## Store28 139953.5 105779.4 1.323 0.185880
## Store29 -750276.3 101676.3 -7.379 1.89e-13 ***
## Store30 -1122743.3 22425.9 -50.065 < 2e-16 ***
## Store31 -158464.1 22737.4 -6.969 3.65e-12 ***
## Store32 -332244.3 35092.7 -9.468 < 2e-16 ***
## Store33 -1023329.2 112745.7 -9.076 < 2e-16 ***
## Store34 -312719.9 107897.4 -2.898 0.003770 **
## Store35 -415109.3 98516.6 -4.214 2.56e-05 ***
## Store36 -1180433.3 23588.0 -50.044 < 2e-16 ***
## Store37 -1035121.3 22767.4 -45.465 < 2e-16 ***
## Store38 -796167.5 105624.2 -7.538 5.75e-14 ***
## Store39 -86778.8 23088.3 -3.759 0.000173 ***
## Store40 -455869.6 113041.5 -4.033 5.60e-05 ***
## Store41 -279251.2 37950.0 -7.358 2.20e-13 ***
## Store42 -730734.7 113110.0 -6.460 1.16e-10 ***
## Store43 -857065.6 25926.2 -33.058 < 2e-16 ***
## Store44 -1056135.9 115347.2 -9.156 < 2e-16 ***
## Store45 -676430.4 43141.6 -15.679 < 2e-16 ***
## Holiday_Flag1 75352.5 9602.9 7.847 5.30e-15 ***
## Temperature -904.9 159.4 -5.677 1.46e-08 ***
## Fuel_Price -42007.4 8681.7 -4.839 1.35e-06 ***
## CPI 2692.5 1262.7 2.132 0.033030 *
## Unemployment -23933.5 5245.3 -4.563 5.18e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 162300 on 4475 degrees of freedom
## Multiple R-squared: 0.9183, Adjusted R-squared: 0.9174
## F-statistic: 1027 on 49 and 4475 DF, p-value: < 2.2e-16
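The output above reports that no term was removed from the main-effects model. The general idea of p-value-based backward elimination can be sketched in base R with `drop1()`: refit, find the least significant term, drop it, and repeat until every remaining term passes a threshold. This is an illustrative stand-in, not the routine that produced the output above; the helper `backward_p`, the 0.05 threshold, and the simulated data are assumptions.

```r
# Simulated data: only Holiday_Flag and Temperature truly affect Weekly_Sales.
set.seed(7)
n <- 400
toy <- data.frame(
  Holiday_Flag = factor(rbinom(n, 1, 0.2)),
  Temperature  = runif(n, 0, 100),
  Fuel_Price   = runif(n, 2.5, 4.5),
  CPI          = runif(n, 126, 227)
)
toy$Weekly_Sales <- 1e6 + 8e4 * (toy$Holiday_Flag == "1") -
  2e3 * toy$Temperature + rnorm(n, sd = 5e4)

# Hypothetical helper: drop the term with the largest partial-F p-value
# until all remaining terms have p <= p_remove.
backward_p <- function(fit, p_remove = 0.05) {
  repeat {
    if (length(attr(terms(fit), "term.labels")) == 0) break
    tab   <- drop1(fit, test = "F")
    pvals <- tab[["Pr(>F)"]][-1]   # first row is "<none>"
    cand  <- rownames(tab)[-1]
    if (max(pvals) <= p_remove) break
    worst <- cand[which.max(pvals)]
    fit   <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

full  <- lm(Weekly_Sales ~ ., data = toy)
final <- backward_p(full)
formula(final)
```

`drop1()` only offers terms that are not contained in a higher-order interaction, so this sketch never removes a main effect while an interaction involving it is still in the model.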
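Interpreting Three Predictors
- Holiday_Flag1 (75352.5): with the other predictors held constant,
predicted weekly sales are about 75,352.5 higher in a holiday week than
in a non-holiday week
- Fuel_Price (-42007.4): each one-unit increase in fuel price is
associated with a decrease of about 42,007.4 in predicted weekly sales,
with the other predictors held constant
- CPI (2692.5): each one-point increase in CPI is associated with an
increase of about 2,692.5 in predicted weekly sales, with the other
predictors held constant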
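Each slope is the change in predicted Weekly_Sales per one-unit change in that predictor, so the effect of a smaller or larger change is the coefficient multiplied by the size of the change. The sketch below applies that arithmetic to the three estimates and adds approximate 95% confidence intervals from the standard errors and residual degrees of freedom reported in the model summary. The change sizes (a 0.10 rise in fuel price, a 5-point rise in CPI) are hypothetical examples.

```r
# Estimates and standard errors copied from the model summary in this report.
est <- c(Holiday_Flag1 = 75352.5, Fuel_Price = -42007.4, CPI = 2692.5)
se  <- c(Holiday_Flag1 = 9602.9,  Fuel_Price = 8681.7,   CPI = 1262.7)

# Hypothetical changes: a holiday week, fuel price up 0.10, CPI up 5 points.
delta <- c(Holiday_Flag1 = 1, Fuel_Price = 0.10, CPI = 5)

# Predicted change in Weekly_Sales, other predictors held constant.
change <- est * delta
print(change)

# Approximate 95% confidence intervals for the coefficients (4475 residual df).
t_crit <- qt(0.975, df = 4475)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 1))
```

An interval that excludes zero corresponds to a coefficient that is significant at the 5% level, which is the case for all three predictors in the summary table.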
The R^2 of the model, calculated on the validation data, is 0.925259.
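A validation R^2 is typically obtained by predicting the held-out rows and comparing the prediction errors with the spread of the observed values. The report does not say which formula was used, so the sketch below shows one common definition, 1 - SSE/SST, on simulated data; `validlm`, `r2_valid`, and every value in the data are stand-ins.

```r
# Simulated data; column names follow the report, values are made up.
set.seed(123)
n <- 500
dat <- data.frame(
  Temperature = runif(n, 0, 100),
  Fuel_Price  = runif(n, 2.5, 4.5)
)
dat$Weekly_Sales <- 1e6 - 2e3 * dat$Temperature - 4e4 * dat$Fuel_Price +
  rnorm(n, sd = 3e4)

# Assumed 70/30 training/validation split.
idx     <- sample(seq_len(n), size = floor(0.7 * n))
trainlm <- dat[idx, ]
validlm <- dat[-idx, ]

# Fit on the training rows, predict the validation rows.
fit  <- lm(Weekly_Sales ~ ., data = trainlm)
pred <- predict(fit, newdata = validlm)

# R^2 on validation data: 1 - (sum of squared errors) / (total sum of squares).
sse <- sum((validlm$Weekly_Sales - pred)^2)
sst <- sum((validlm$Weekly_Sales - mean(validlm$Weekly_Sales))^2)
r2_valid <- 1 - sse / sst
r2_valid
```

Another common choice is the squared correlation between predictions and observed values, `cor(pred, validlm$Weekly_Sales)^2`; the two can differ when the predictions are biased.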
## Backward Elimination Method
## ---------------------------
##
## Candidate Terms:
##
## 1. Store
## 2. Holiday_Flag
## 3. Temperature
## 4. Fuel_Price
## 5. CPI
## 6. Unemployment
## 7. Store:Holiday_Flag
## 8. Store:Temperature
## 9. Store:Fuel_Price
## 10. Store:CPI
## 11. Store:Unemployment
## 12. Holiday_Flag:Temperature
## 13. Holiday_Flag:Fuel_Price
## 14. Holiday_Flag:CPI
## 15. Holiday_Flag:Unemployment
## 16. Temperature:Fuel_Price
## 17. Temperature:CPI
## 18. Temperature:Unemployment
## 19. Fuel_Price:CPI
## 20. Fuel_Price:Unemployment
## 21. CPI:Unemployment
##
##
## Step => 0
## Model => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Unemployment + Store:Holiday_Flag + Store:Temperature + Store:Fuel_Price + Store:CPI + Store:Unemployment + Holiday_Flag:Temperature + Holiday_Flag:Fuel_Price + Holiday_Flag:CPI + Holiday_Flag:Unemployment + Temperature:Fuel_Price + Temperature:CPI + Temperature:Unemployment + Fuel_Price:CPI + Fuel_Price:Unemployment + CPI:Unemployment
## R2 => 0.93
##
## Initiating stepwise selection...
##
## Step => 1
## Removed => Unemployment
## Model => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Store:Holiday_Flag + Store:Temperature + Store:Fuel_Price + Store:CPI + Store:Unemployment + Holiday_Flag:Temperature + Holiday_Flag:Fuel_Price + Holiday_Flag:CPI + Holiday_Flag:Unemployment + Temperature:Fuel_Price + Temperature:CPI + Temperature:Unemployment + Fuel_Price:CPI + Fuel_Price:Unemployment + CPI:Unemployment
## R2 => 0.93027
##
## Step => 2
## Removed => Holiday_Flag:CPI
## Model => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Store:Holiday_Flag + Store:Temperature + Store:Fuel_Price + Store:CPI + Store:Unemployment + Holiday_Flag:Temperature + Holiday_Flag:Fuel_Price + Holiday_Flag:Unemployment + Temperature:Fuel_Price + Temperature:CPI + Temperature:Unemployment + Fuel_Price:CPI + Fuel_Price:Unemployment + CPI:Unemployment
## R2 => 0.93027
##
## Step => 3
## Removed => Fuel_Price:Unemployment
## Model => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Store:Holiday_Flag + Store:Temperature + Store:Fuel_Price + Store:CPI + Store:Unemployment + Holiday_Flag:Temperature + Holiday_Flag:Fuel_Price + Holiday_Flag:Unemployment + Temperature:Fuel_Price + Temperature:CPI + Temperature:Unemployment + Fuel_Price:CPI + CPI:Unemployment
## R2 => 0.93027
##
## Step => 4
## Removed => CPI:Unemployment
## Model => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Store:Holiday_Flag + Store:Temperature + Store:Fuel_Price + Store:CPI + Store:Unemployment + Holiday_Flag:Temperature + Holiday_Flag:Fuel_Price + Holiday_Flag:Unemployment + Temperature:Fuel_Price + Temperature:CPI + Temperature:Unemployment + Fuel_Price:CPI
## R2 => 0.93026
##
## Step => 5
## Removed => Store:Fuel_Price
## Model => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Store:Holiday_Flag + Store:Temperature + Store:CPI + Store:Unemployment + Holiday_Flag:Temperature + Holiday_Flag:Fuel_Price + Holiday_Flag:Unemployment + Temperature:Fuel_Price + Temperature:CPI + Temperature:Unemployment + Fuel_Price:CPI
## R2 => 0.92952
##
##
## No more variables to be removed.
##
## Variables Removed:
##
## => Unemployment
## => Holiday_Flag:CPI
## => Fuel_Price:Unemployment
## => CPI:Unemployment
## => Store:Fuel_Price
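The trace shows R2 moving from 0.93 to 0.92952 as five terms are removed. One way to check whether a reduced model fits meaningfully worse than the full one is a partial F-test on the two nested fits, available in base R through `anova()`. The sketch below uses simulated data and a hypothetical pair of models; it is not a rerun of the elimination above.

```r
# Simulated data; column names follow the report, values are made up.
set.seed(99)
n <- 300
toy <- data.frame(
  Holiday_Flag = factor(rbinom(n, 1, 0.2)),
  Temperature  = runif(n, 0, 100),
  Fuel_Price   = runif(n, 2.5, 4.5)
)
toy$Weekly_Sales <- 1e6 + 8e4 * (toy$Holiday_Flag == "1") -
  2e3 * toy$Temperature + rnorm(n, sd = 5e4)

# Full model with all two-way interactions, and a reduced model without two of them.
full    <- lm(Weekly_Sales ~ (Holiday_Flag + Temperature + Fuel_Price)^2, data = toy)
reduced <- update(full, . ~ . - Temperature:Fuel_Price - Holiday_Flag:Fuel_Price)

# Partial F-test: do the dropped terms explain a significant share of variance?
cmp <- anova(reduced, full)
print(cmp)

# R^2 of the two fits, for comparison with the kind of trace shown above.
c(full = summary(full)$r.squared, reduced = summary(reduced)$r.squared)
```

A large p-value in the `anova()` table suggests the dropped terms add little, which is the situation a small change in R2 during elimination points to.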