DATA PREP FOR MODELING

Data Summary: Store and Holiday_Flag are read in as numeric, but both are labels rather than quantities and should be categorical

##      Store        Date            Weekly_Sales      Holiday_Flag    
##  Min.   : 1   Length:6435        Min.   : 209986   Min.   :0.00000  
##  1st Qu.:12   Class :character   1st Qu.: 553350   1st Qu.:0.00000  
##  Median :23   Mode  :character   Median : 960746   Median :0.00000  
##  Mean   :23                      Mean   :1046965   Mean   :0.06993  
##  3rd Qu.:34                      3rd Qu.:1420159   3rd Qu.:0.00000  
##  Max.   :45                      Max.   :3818686   Max.   :1.00000  
##   Temperature       Fuel_Price         CPI         Unemployment   
##  Min.   : -2.06   Min.   :2.472   Min.   :126.1   Min.   : 3.879  
##  1st Qu.: 47.46   1st Qu.:2.933   1st Qu.:131.7   1st Qu.: 6.891  
##  Median : 62.67   Median :3.445   Median :182.6   Median : 7.874  
##  Mean   : 60.66   Mean   :3.359   Mean   :171.6   Mean   : 7.999  
##  3rd Qu.: 74.94   3rd Qu.:3.735   3rd Qu.:212.7   3rd Qu.: 8.622  
##  Max.   :100.14   Max.   :4.468   Max.   :227.2   Max.   :14.313

Data Summary: categorical variables converted to factors, plus a random variable (values between 0 and 1) used for splitting the data

##      Store          Date            Weekly_Sales     Holiday_Flag
##  1      : 143   Length:6435        Min.   : 209986   0:5985      
##  2      : 143   Class :character   1st Qu.: 553350   1: 450      
##  3      : 143   Mode  :character   Median : 960746               
##  4      : 143                      Mean   :1046965               
##  5      : 143                      3rd Qu.:1420159               
##  6      : 143                      Max.   :3818686               
##  (Other):5577                                                    
##   Temperature       Fuel_Price         CPI         Unemployment   
##  Min.   : -2.06   Min.   :2.472   Min.   :126.1   Min.   : 3.879  
##  1st Qu.: 47.46   1st Qu.:2.933   1st Qu.:131.7   1st Qu.: 6.891  
##  Median : 62.67   Median :3.445   Median :182.6   Median : 7.874  
##  Mean   : 60.66   Mean   :3.359   Mean   :171.6   Mean   : 7.999  
##  3rd Qu.: 74.94   3rd Qu.:3.735   3rd Qu.:212.7   3rd Qu.: 8.622  
##  Max.   :100.14   Max.   :4.468   Max.   :227.2   Max.   :14.313  
##                                                                   
##      random         
##  Min.   :0.0000653  
##  1st Qu.:0.2527042  
##  Median :0.4945870  
##  Mean   :0.4980825  
##  3rd Qu.:0.7453083  
##  Max.   :0.9999414  
## 
## tibble [6,435 × 9] (S3: tbl_df/tbl/data.frame)
##  $ Store       : Factor w/ 45 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Date        : chr [1:6435] "05-02-2010" "12-02-2010" "19-02-2010" "26-02-2010" ...
##  $ Weekly_Sales: num [1:6435] 1643691 1641957 1611968 1409728 1554807 ...
##  $ Holiday_Flag: Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
##  $ Temperature : num [1:6435] 42.3 38.5 39.9 46.6 46.5 ...
##  $ Fuel_Price  : num [1:6435] 2.57 2.55 2.51 2.56 2.62 ...
##  $ CPI         : num [1:6435] 211 211 211 211 211 ...
##  $ Unemployment: num [1:6435] 8.11 8.11 8.11 8.11 8.11 ...
##  $ random      : num [1:6435] 0.288 0.788 0.409 0.883 0.94 ...
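
The conversion summarized above could be reproduced roughly as follows. This is a minimal sketch, not the report's actual code: the data frame name `sales` and the seed are assumptions, and a small made-up data frame stands in for the 6,435-row dataset.

```r
# Hypothetical stand-in for the real data (same column names, made-up values).
sales <- data.frame(
  Store        = rep(1:3, each = 4),
  Date         = rep(c("05-02-2010", "12-02-2010", "19-02-2010", "26-02-2010"), times = 3),
  Weekly_Sales = seq(500000, 1600000, by = 100000),
  Holiday_Flag = rep(c(0, 1, 0, 0), times = 3)
)

# Store and Holiday_Flag are labels, not quantities, so convert them to factors.
sales$Store        <- factor(sales$Store)
sales$Holiday_Flag <- factor(sales$Holiday_Flag)

# One uniform(0, 1) draw per row, used later to assign rows to training/validation.
set.seed(123)  # assumed seed, only so this sketch is reproducible
sales$random <- runif(nrow(sales))

str(sales)
summary(sales)
```

After the conversion, summary() reports counts per level for Store and Holiday_Flag instead of quartiles, which is the change visible between the two summaries above.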

The data were split 70/30 into training and validation datasets:

  • the training data has 4533 observations
  • the validation data has 1902 observations
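
One way to produce such a split from the random column is to send rows with a value below 0.7 to training and the rest to validation. The sketch below assumes that threshold rule; the names `sales` and `validlm` and the seed are also assumptions (only `trainlm` appears in the model call in this report), and the data are simulated.

```r
set.seed(42)  # assumed seed, only so this sketch is reproducible

# Stand-in data: an id plus the uniform(0, 1) splitting variable.
sales <- data.frame(id = 1:1000, random = runif(1000))

trainlm <- sales[sales$random <  0.7, ]  # roughly 70% of rows
validlm <- sales[sales$random >= 0.7, ]  # the remaining rows

nrow(trainlm)
nrow(validlm)
```

Because each row is assigned independently, the split is only approximately 70/30, which is in line with the 4533 and 1902 rows reported above.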

LINEAR MODEL TO PREDICT SALES

  1. A linear model predicting Weekly_Sales was fit using all available predictors except Date. This model shows the individual effect of each predictor with the others held fixed.
  2. A second model with all possible interactions was then fit, again excluding Date.
  3. Backward selection with alpha = 0.01 was applied to obtain a more parsimonious model.
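
The three steps above can be sketched in base R. This is an illustration under assumptions, not the routine that produced the output in this report: simulated data stand in for `trainlm`, the `Noise` predictor is invented, and `backward_p` is a hypothetical helper written here around drop1() F-tests.

```r
# Simulated stand-in data; column names mirror the report, values are made up.
set.seed(1)
n <- 300
sim <- data.frame(
  Store        = factor(sample(1:5, n, replace = TRUE)),
  Holiday_Flag = factor(rbinom(n, 1, 0.1), levels = c(0, 1)),
  Temperature  = rnorm(n, 60, 15),
  Noise        = rnorm(n)  # hypothetical predictor unrelated to sales
)
sim$Weekly_Sales <- 1e6 + 1e5 * as.integer(sim$Store) +
  7e4 * (sim$Holiday_Flag == "1") - 900 * sim$Temperature +
  rnorm(n, sd = 5e4)

# Step 1: main effects of every predictor.
full <- lm(Weekly_Sales ~ ., data = sim)

# Step 2: main effects plus all two-way interactions.
full_int <- lm(Weekly_Sales ~ .^2, data = sim)

# Step 3: backward elimination by p-value (hypothetical helper).
# Repeatedly drop the term with the largest F-test p-value until
# every remaining term has p <= alpha.
backward_p <- function(fit, alpha = 0.01) {
  repeat {
    if (length(attr(terms(fit), "term.labels")) == 0) break
    tab <- drop1(fit, test = "F")
    p   <- tab[["Pr(>F)"]][-1]          # first row is "<none>"
    if (max(p, na.rm = TRUE) <= alpha) break
    worst <- rownames(tab)[-1][which.max(p)]
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

reduced <- backward_p(full, alpha = 0.01)
formula(reduced)
```

The loop stops once every remaining term has a p-value at or below alpha. drop1() tests each factor as a whole term, so Store is kept or dropped as a unit rather than level by level.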
## 
## Call:
## lm(formula = Weekly_Sales ~ . - Date, data = trainlm)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -513804  -69546  -11882   39468 1823766 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1357543.2   278781.6   4.870 1.16e-06 ***
## Store2          351630.2    22573.7  15.577  < 2e-16 ***
## Store3        -1180716.9    23170.7 -50.957  < 2e-16 ***
## Store4          716535.6   116599.9   6.145 8.68e-10 ***
## Store5        -1282576.3    24140.1 -53.131  < 2e-16 ***
## Store6          -42348.2    23255.9  -1.821 0.068679 .  
## Store7         -946003.2    35215.5 -26.863  < 2e-16 ***
## Store8         -699172.6    23983.0 -29.153  < 2e-16 ***
## Store9        -1068955.1    23877.8 -44.768  < 2e-16 ***
## Store10         628887.4   112929.0   5.569 2.71e-08 ***
## Store11        -217449.3    23756.1  -9.153  < 2e-16 ***
## Store12        -160159.8   105614.7  -1.516 0.129475    
## Store13         665390.8   114607.5   5.806 6.85e-09 ***
## Store14         541155.6    43009.0  12.582  < 2e-16 ***
## Store15        -711503.8   105846.8  -6.722 2.02e-11 ***
## Store16       -1033264.4    39281.3 -26.304  < 2e-16 ***
## Store17        -482891.3   115651.4  -4.175 3.03e-05 ***
## Store18        -236188.4   103207.7  -2.288 0.022156 *  
## Store19          88916.7   105944.5   0.839 0.401359    
## Store20         568352.9    25871.2  21.969  < 2e-16 ***
## Store21        -815160.0    22958.8 -35.505  < 2e-16 ***
## Store22        -323211.6   100196.3  -3.226 0.001265 ** 
## Store23         -29770.2   113150.0  -0.263 0.792482    
## Store24          27496.8   104872.5   0.262 0.793185    
## Store25        -847649.9    25399.5 -33.373  < 2e-16 ***
## Store26        -359445.5   105525.4  -3.406 0.000664 ***
## Store27         439229.1   101050.9   4.347 1.41e-05 ***
## Store28         139953.5   105779.4   1.323 0.185880    
## Store29        -750276.3   101676.3  -7.379 1.89e-13 ***
## Store30       -1122743.3    22425.9 -50.065  < 2e-16 ***
## Store31        -158464.1    22737.4  -6.969 3.65e-12 ***
## Store32        -332244.3    35092.7  -9.468  < 2e-16 ***
## Store33       -1023329.2   112745.7  -9.076  < 2e-16 ***
## Store34        -312719.9   107897.4  -2.898 0.003770 ** 
## Store35        -415109.3    98516.6  -4.214 2.56e-05 ***
## Store36       -1180433.3    23588.0 -50.044  < 2e-16 ***
## Store37       -1035121.3    22767.4 -45.465  < 2e-16 ***
## Store38        -796167.5   105624.2  -7.538 5.75e-14 ***
## Store39         -86778.8    23088.3  -3.759 0.000173 ***
## Store40        -455869.6   113041.5  -4.033 5.60e-05 ***
## Store41        -279251.2    37950.0  -7.358 2.20e-13 ***
## Store42        -730734.7   113110.0  -6.460 1.16e-10 ***
## Store43        -857065.6    25926.2 -33.058  < 2e-16 ***
## Store44       -1056135.9   115347.2  -9.156  < 2e-16 ***
## Store45        -676430.4    43141.6 -15.679  < 2e-16 ***
## Holiday_Flag1    75352.5     9602.9   7.847 5.30e-15 ***
## Temperature       -904.9      159.4  -5.677 1.46e-08 ***
## Fuel_Price      -42007.4     8681.7  -4.839 1.35e-06 ***
## CPI               2692.5     1262.7   2.132 0.033030 *  
## Unemployment    -23933.5     5245.3  -4.563 5.18e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 162300 on 4475 degrees of freedom
## Multiple R-squared:  0.9183, Adjusted R-squared:  0.9174 
## F-statistic:  1027 on 49 and 4475 DF,  p-value: < 2.2e-16
## Backward Elimination Method 
## ---------------------------
## 
## Candidate Terms: 
## 
## 1. Store 
## 2. Holiday_Flag 
## 3. Temperature 
## 4. Fuel_Price 
## 5. CPI 
## 6. Unemployment 
## 
## 
## Step   => 0 
## Model  => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Unemployment 
## R2     => 0.918 
## 
## Initiating stepwise selection... 
## 
## 
## No more variables to be removed.
## 
## Call:
## lm(formula = paste(response, "~", paste(c(include, cterms), collapse = " + ")), 
##     data = l)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -513804  -69546  -11882   39468 1823766 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1357543.2   278781.6   4.870 1.16e-06 ***
## Store2          351630.2    22573.7  15.577  < 2e-16 ***
## Store3        -1180716.9    23170.7 -50.957  < 2e-16 ***
## Store4          716535.6   116599.9   6.145 8.68e-10 ***
## Store5        -1282576.3    24140.1 -53.131  < 2e-16 ***
## Store6          -42348.2    23255.9  -1.821 0.068679 .  
## Store7         -946003.2    35215.5 -26.863  < 2e-16 ***
## Store8         -699172.6    23983.0 -29.153  < 2e-16 ***
## Store9        -1068955.1    23877.8 -44.768  < 2e-16 ***
## Store10         628887.4   112929.0   5.569 2.71e-08 ***
## Store11        -217449.3    23756.1  -9.153  < 2e-16 ***
## Store12        -160159.8   105614.7  -1.516 0.129475    
## Store13         665390.8   114607.5   5.806 6.85e-09 ***
## Store14         541155.6    43009.0  12.582  < 2e-16 ***
## Store15        -711503.8   105846.8  -6.722 2.02e-11 ***
## Store16       -1033264.4    39281.3 -26.304  < 2e-16 ***
## Store17        -482891.3   115651.4  -4.175 3.03e-05 ***
## Store18        -236188.4   103207.7  -2.288 0.022156 *  
## Store19          88916.7   105944.5   0.839 0.401359    
## Store20         568352.9    25871.2  21.969  < 2e-16 ***
## Store21        -815160.0    22958.8 -35.505  < 2e-16 ***
## Store22        -323211.6   100196.3  -3.226 0.001265 ** 
## Store23         -29770.2   113150.0  -0.263 0.792482    
## Store24          27496.8   104872.5   0.262 0.793185    
## Store25        -847649.9    25399.5 -33.373  < 2e-16 ***
## Store26        -359445.5   105525.4  -3.406 0.000664 ***
## Store27         439229.1   101050.9   4.347 1.41e-05 ***
## Store28         139953.5   105779.4   1.323 0.185880    
## Store29        -750276.3   101676.3  -7.379 1.89e-13 ***
## Store30       -1122743.3    22425.9 -50.065  < 2e-16 ***
## Store31        -158464.1    22737.4  -6.969 3.65e-12 ***
## Store32        -332244.3    35092.7  -9.468  < 2e-16 ***
## Store33       -1023329.2   112745.7  -9.076  < 2e-16 ***
## Store34        -312719.9   107897.4  -2.898 0.003770 ** 
## Store35        -415109.3    98516.6  -4.214 2.56e-05 ***
## Store36       -1180433.3    23588.0 -50.044  < 2e-16 ***
## Store37       -1035121.3    22767.4 -45.465  < 2e-16 ***
## Store38        -796167.5   105624.2  -7.538 5.75e-14 ***
## Store39         -86778.8    23088.3  -3.759 0.000173 ***
## Store40        -455869.6   113041.5  -4.033 5.60e-05 ***
## Store41        -279251.2    37950.0  -7.358 2.20e-13 ***
## Store42        -730734.7   113110.0  -6.460 1.16e-10 ***
## Store43        -857065.6    25926.2 -33.058  < 2e-16 ***
## Store44       -1056135.9   115347.2  -9.156  < 2e-16 ***
## Store45        -676430.4    43141.6 -15.679  < 2e-16 ***
## Holiday_Flag1    75352.5     9602.9   7.847 5.30e-15 ***
## Temperature       -904.9      159.4  -5.677 1.46e-08 ***
## Fuel_Price      -42007.4     8681.7  -4.839 1.35e-06 ***
## CPI               2692.5     1262.7   2.132 0.033030 *  
## Unemployment    -23933.5     5245.3  -4.563 5.18e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 162300 on 4475 degrees of freedom
## Multiple R-squared:  0.9183, Adjusted R-squared:  0.9174 
## F-statistic:  1027 on 49 and 4475 DF,  p-value: < 2.2e-16

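INTERPRET 3 PREDICTORS

Each coefficient is the expected change in Weekly_Sales for a one-unit change in that predictor, with the store and the other predictors held fixed:

  • Holiday_Flag1 (75352.5): holiday weeks are associated with about 75,353 higher weekly sales than non-holiday weeks.
  • Temperature (-904.9): each one-unit increase in Temperature is associated with about 905 lower weekly sales.
  • Unemployment (-23933.5): each one-unit increase in Unemployment is associated with about 23,934 lower weekly sales.

All three are significant at the 0.01 level.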
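
One way to interpret three of the predictors (Holiday_Flag1, Temperature, Unemployment) together with their uncertainty is to rebuild approximate 95% confidence intervals from the estimates and standard errors printed in the coefficient table above. The sketch below copies those numbers by hand; with the fitted model object available, confint() would give the same kind of interval directly.

```r
# Estimates and standard errors copied from the coefficient table.
est <- c(Holiday_Flag1 = 75352.5, Temperature = -904.9, Unemployment = -23933.5)
se  <- c(Holiday_Flag1 = 9602.9,  Temperature = 159.4,  Unemployment = 5245.3)

# Residual degrees of freedom reported in the model summary.
df_resid <- 4475
tcrit <- qt(0.975, df_resid)  # two-sided 95% critical value

ci <- data.frame(
  estimate = est,
  lower    = est - tcrit * se,
  upper    = est + tcrit * se
)
round(ci, 1)
```

Each interval is a range for the change in Weekly_Sales per one-unit change in the predictor, with the other predictors held fixed. All three intervals exclude zero, which agrees with the small p-values in the table.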

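WHAT IS THE R^2 OF THE MODEL

The multiple R-squared is 0.9183 (adjusted R-squared 0.9174), so the model explains about 92% of the variance in Weekly_Sales in the training data. Backward selection removed no terms, so the selected model and the full main-effects model have the same R-squared.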
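
R-squared is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below recomputes it by hand for a small simulated fit and checks it against summary(); the data, the names, and the `r_squared` helper are made up for illustration. Applying the same helper to predictions on held-out rows gives a validation R-squared.

```r
set.seed(7)  # assumed seed, only so this sketch is reproducible

# Simulated data with a known linear relationship.
d <- data.frame(x = rnorm(200))
d$y <- 3 + 2 * d$x + rnorm(200)
train <- d[1:140, ]
valid <- d[141:200, ]

fit <- lm(y ~ x, data = train)

# Hypothetical helper: 1 - RSS / TSS.
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

# Training R-squared agrees with the value summary() reports.
r2_train <- r_squared(train$y, fitted(fit))
all.equal(r2_train, summary(fit)$r.squared)

# Validation R-squared, computed on rows the model has not seen.
r2_valid <- r_squared(valid$y, predict(fit, newdata = valid))
c(train = r2_train, valid = r2_valid)
```

The value reported in the model summary is a training R-squared; comparing it with the validation figure indicates how well the fit carries over to new data.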