Data prepared for modeling
Data summary: Store and Holiday_Flag are stored as numeric (continuous)
variables, but should be categorical
##      Store        Date            Weekly_Sales      Holiday_Flag
##  Min.   : 1   Length:6435        Min.   : 209986   Min.   :0.00000
##  1st Qu.:12   Class :character   1st Qu.: 553350   1st Qu.:0.00000
##  Median :23   Mode  :character   Median : 960746   Median :0.00000
##  Mean   :23                      Mean   :1046965   Mean   :0.06993
##  3rd Qu.:34                      3rd Qu.:1420159   3rd Qu.:0.00000
##  Max.   :45                      Max.   :3818686   Max.   :1.00000
##   Temperature       Fuel_Price         CPI         Unemployment
##  Min.   : -2.06   Min.   :2.472   Min.   :126.1   Min.   : 3.879
##  1st Qu.: 47.46   1st Qu.:2.933   1st Qu.:131.7   1st Qu.: 6.891
##  Median : 62.67   Median :3.445   Median :182.6   Median : 7.874
##  Mean   : 60.66   Mean   :3.359   Mean   :171.6   Mean   : 7.999
##  3rd Qu.: 74.94   3rd Qu.:3.735   3rd Qu.:212.7   3rd Qu.: 8.622
##  Max.   :100.14   Max.   :4.468   Max.   :227.2   Max.   :14.313
Data summary after converting the categorical variables (Store,
Holiday_Flag) to factors and adding a random variable (values between
0 and 1) used to split the data
##      Store          Date            Weekly_Sales     Holiday_Flag
##  1      : 143   Length:6435        Min.   : 209986   0:5985
##  2      : 143   Class :character   1st Qu.: 553350   1: 450
##  3      : 143   Mode  :character   Median : 960746
##  4      : 143                      Mean   :1046965
##  5      : 143                      3rd Qu.:1420159
##  6      : 143                      Max.   :3818686
##  (Other):5577
##   Temperature       Fuel_Price         CPI         Unemployment
##  Min.   : -2.06   Min.   :2.472   Min.   :126.1   Min.   : 3.879
##  1st Qu.: 47.46   1st Qu.:2.933   1st Qu.:131.7   1st Qu.: 6.891
##  Median : 62.67   Median :3.445   Median :182.6   Median : 7.874
##  Mean   : 60.66   Mean   :3.359   Mean   :171.6   Mean   : 7.999
##  3rd Qu.: 74.94   3rd Qu.:3.735   3rd Qu.:212.7   3rd Qu.: 8.622
##  Max.   :100.14   Max.   :4.468   Max.   :227.2   Max.   :14.313
##
##      random
##  Min.   :0.0000653
##  1st Qu.:0.2527042
##  Median :0.4945870
##  Mean   :0.4980825
##  3rd Qu.:0.7453083
##  Max.   :0.9999414
##
## tibble [6,435 × 9] (S3: tbl_df/tbl/data.frame)
##  $ Store       : Factor w/ 45 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Date        : chr [1:6435] "05-02-2010" "12-02-2010" "19-02-2010" "26-02-2010" ...
##  $ Weekly_Sales: num [1:6435] 1643691 1641957 1611968 1409728 1554807 ...
##  $ Holiday_Flag: Factor w/ 2 levels "0","1": 1 2 1 1 1 1 1 1 1 1 ...
##  $ Temperature : num [1:6435] 42.3 38.5 39.9 46.6 46.5 ...
##  $ Fuel_Price  : num [1:6435] 2.57 2.55 2.51 2.56 2.62 ...
##  $ CPI         : num [1:6435] 211 211 211 211 211 ...
##  $ Unemployment: num [1:6435] 8.11 8.11 8.11 8.11 8.11 ...
##  $ random      : num [1:6435] 0.288 0.788 0.409 0.883 0.94 ...
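The preparation summarized above could be reproduced along the following lines. This is only a minimal sketch in base R, not the report's own code: the small `sales` data frame is synthetic, and the seed, the 0.7 cutoff and the `train`/`valid` names are assumptions, since the report states neither a seed nor a split ratio.

```r
# Synthetic stand-in for the sales data (values are illustrative only)
set.seed(123)  # assumed seed; the report does not state one
sales <- data.frame(
  Store        = rep(1:3, each = 4),
  Holiday_Flag = rep(c(0, 1, 0, 0), times = 3),
  Weekly_Sales = round(runif(12, 2e5, 2e6))
)

# Store and Holiday_Flag are codes, not quantities, so treat them as factors
sales$Store        <- factor(sales$Store)
sales$Holiday_Flag <- factor(sales$Holiday_Flag)

# Uniform random column on (0, 1) used to assign rows to a split
sales$random <- runif(nrow(sales))

# Assumed cutoff: about 70% training, 30% validation
train <- sales[sales$random <  0.7, ]
valid <- sales[sales$random >= 0.7, ]

str(sales)
```

Splitting on a stored random column keeps the assignment of each row visible in the data, at the cost of carrying an extra column that must be left out of the model formula.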
Linear model to predict sales
## Backward Elimination Method
## ---------------------------
##
## Candidate Terms:
##
## 1. Store
## 2. Holiday_Flag
## 3. Temperature
## 4. Fuel_Price
## 5. CPI
## 6. Unemployment
## 7. Store:Holiday_Flag
## 8. Store:Temperature
## 9. Store:Fuel_Price
## 10. Store:CPI
## 11. Store:Unemployment
## 12. Holiday_Flag:Temperature
## 13. Holiday_Flag:Fuel_Price
## 14. Holiday_Flag:CPI
## 15. Holiday_Flag:Unemployment
## 16. Temperature:Fuel_Price
## 17. Temperature:CPI
## 18. Temperature:Unemployment
## 19. Fuel_Price:CPI
## 20. Fuel_Price:Unemployment
## 21. CPI:Unemployment
##
##
## Step => 0
## Model => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Unemployment + Store:Holiday_Flag + Store:Temperature + Store:Fuel_Price + Store:CPI + Store:Unemployment + Holiday_Flag:Temperature + Holiday_Flag:Fuel_Price + Holiday_Flag:CPI + Holiday_Flag:Unemployment + Temperature:Fuel_Price + Temperature:CPI + Temperature:Unemployment + Fuel_Price:CPI + Fuel_Price:Unemployment + CPI:Unemployment
## R2 => 0.93
##
## Initiating stepwise selection...
##
## Step => 1
## Removed => Unemployment
## Model => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Store:Holiday_Flag + Store:Temperature + Store:Fuel_Price + Store:CPI + Store:Unemployment + Holiday_Flag:Temperature + Holiday_Flag:Fuel_Price + Holiday_Flag:CPI + Holiday_Flag:Unemployment + Temperature:Fuel_Price + Temperature:CPI + Temperature:Unemployment + Fuel_Price:CPI + Fuel_Price:Unemployment + CPI:Unemployment
## R2 => 0.93027
##
## Step => 2
## Removed => Holiday_Flag:CPI
## Model => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Store:Holiday_Flag + Store:Temperature + Store:Fuel_Price + Store:CPI + Store:Unemployment + Holiday_Flag:Temperature + Holiday_Flag:Fuel_Price + Holiday_Flag:Unemployment + Temperature:Fuel_Price + Temperature:CPI + Temperature:Unemployment + Fuel_Price:CPI + Fuel_Price:Unemployment + CPI:Unemployment
## R2 => 0.93027
##
## Step => 3
## Removed => Fuel_Price:Unemployment
## Model => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Store:Holiday_Flag + Store:Temperature + Store:Fuel_Price + Store:CPI + Store:Unemployment + Holiday_Flag:Temperature + Holiday_Flag:Fuel_Price + Holiday_Flag:Unemployment + Temperature:Fuel_Price + Temperature:CPI + Temperature:Unemployment + Fuel_Price:CPI + CPI:Unemployment
## R2 => 0.93027
##
## Step => 4
## Removed => CPI:Unemployment
## Model => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Store:Holiday_Flag + Store:Temperature + Store:Fuel_Price + Store:CPI + Store:Unemployment + Holiday_Flag:Temperature + Holiday_Flag:Fuel_Price + Holiday_Flag:Unemployment + Temperature:Fuel_Price + Temperature:CPI + Temperature:Unemployment + Fuel_Price:CPI
## R2 => 0.93026
##
## Step => 5
## Removed => Store:Fuel_Price
## Model => Weekly_Sales ~ Store + Holiday_Flag + Temperature + Fuel_Price + CPI + Store:Holiday_Flag + Store:Temperature + Store:CPI + Store:Unemployment + Holiday_Flag:Temperature + Holiday_Flag:Fuel_Price + Holiday_Flag:Unemployment + Temperature:Fuel_Price + Temperature:CPI + Temperature:Unemployment + Fuel_Price:CPI
## R2 => 0.92952
##
##
## No more variables to be removed.
##
## Variables Removed:
##
## => Unemployment
## => Holiday_Flag:CPI
## => Fuel_Price:Unemployment
## => CPI:Unemployment
## => Store:Fuel_Price
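The idea behind the elimination log above can be sketched in base R. This is an illustration on synthetic data, not the routine that produced the log: `backward_p`, its `p_remove` threshold of 0.05 and the variables `x1`, `x2`, `x3` are all hypothetical. It also differs from the log in one respect: `drop1()` only offers terms that are not contained in a remaining interaction, so it would not remove a main effect such as Unemployment while Store:Unemployment is still in the model.

```r
# Synthetic data: y depends on x1 and x2; x3 is pure noise
set.seed(1)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 3 + 2 * d$x1 - 1.5 * d$x2 + rnorm(n)

# Hypothetical helper: repeatedly drop the droppable term with the
# largest F-test p-value until every remaining one is <= p_remove
backward_p <- function(fit, data, p_remove = 0.05) {
  repeat {
    if (length(attr(terms(fit), "term.labels")) == 0) break
    tab   <- drop1(fit, test = "F")   # first row is "<none>"
    pvals <- tab[["Pr(>F)"]][-1]
    cand  <- rownames(tab)[-1]
    if (max(pvals) <= p_remove) break
    worst <- cand[which.max(pvals)]
    new_formula <- update(formula(fit), as.formula(paste(". ~ . -", worst)))
    fit <- lm(new_formula, data = data)
  }
  fit
}

# Start from all main effects plus all two-way interactions
full  <- lm(y ~ (x1 + x2 + x3)^2, data = d)
final <- backward_p(full, data = d)
print(formula(final))
```

A p-value threshold is only one possible stopping rule; `step()` in base R performs the same kind of search using AIC instead.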
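Interpretation of 3 predictors
Holiday_Flag1 has a positive coefficient: with the other predictors
held fixed, the model predicts higher weekly sales in holiday weeks
than in non-holiday weeks
Fuel_Price has a negative coefficient: as fuel prices go up, predicted
weekly sales decrease
CPI has a positive coefficient: as CPI goes up, predicted weekly sales
increase
All three predictors also appear in interaction terms, so each main
effect applies at the reference level of the interacting factors and
with the interacting numeric predictors at zero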
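As a small illustration of how such signs are read from a fitted model, the sketch below fits a toy linear model and extracts the signs of two coefficients. The `toy` data frame and its effect sizes are synthetic and chosen only for illustration; they are not the report's data or estimates. Note that R names the coefficient of a two-level factor after the non-reference level, which is where the label `Holiday_Flag1` comes from.

```r
# Synthetic data with a known positive holiday effect and a known
# negative fuel-price effect (all values are illustrative)
set.seed(7)
n <- 400
toy <- data.frame(
  Holiday_Flag = factor(rbinom(n, 1, 0.3)),
  Fuel_Price   = runif(n, 2.5, 4.5)
)
toy$Weekly_Sales <- 1000 +
  200 * (toy$Holiday_Flag == "1") -
  150 * toy$Fuel_Price +
  rnorm(n, sd = 50)

fit <- lm(Weekly_Sales ~ Holiday_Flag + Fuel_Price, data = toy)

# +1: predicted sales rise with the predictor; -1: they fall
signs <- sign(coef(fit)[c("Holiday_Flag1", "Fuel_Price")])
print(signs)
```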
The R^2 of the model (calculated on the validation data) is
0.9252588
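One common way to obtain a validation R^2 is 1 - SSE/SST computed from predictions on the held-out rows. The sketch below assumes that definition (the report does not say which formula was used) and runs on synthetic data; `d`, `train`, `valid` and the 0.7 cutoff are hypothetical.

```r
# Synthetic data with one numeric and one categorical predictor
set.seed(42)
n <- 500
d <- data.frame(
  x = rnorm(n),
  g = factor(sample(c("a", "b"), n, replace = TRUE))
)
d$y <- 10 + 2 * d$x + 5 * (d$g == "b") + rnorm(n)

# Split on a uniform random column (assumed 70/30 cutoff)
d$random <- runif(n)
train <- d[d$random <  0.7, ]
valid <- d[d$random >= 0.7, ]

# Fit on the training rows only, then predict the validation rows
fit  <- lm(y ~ x + g, data = train)
pred <- predict(fit, newdata = valid)

# R^2 on validation data: 1 - residual SS / total SS
sse <- sum((valid$y - pred)^2)
sst <- sum((valid$y - mean(valid$y))^2)
r2_valid <- 1 - sse / sst
print(r2_valid)
```

Unlike the R^2 reported during fitting, this value can fall below zero when the model predicts the held-out rows worse than their mean would.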