## MarketID MarketSize LocationID AgeOfStore
## 0 0 0 0
## Promotion Week SalesInThousands
## 0 0 0
Fortunately, the data set has no missing values (NA) in any column.
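A per-column count of zeros like the one above is what a check such as `colSums(is.na(...))` typically prints. The sketch below is a minimal illustration on a toy data frame; the object name `sales` and its values are assumptions, not the original data.

```r
# Hypothetical toy data frame standing in for the real data set
sales <- data.frame(
  MarketID = c(1, 1, 2),
  MarketSize = c("Medium", "Medium", "Large"),
  SalesInThousands = c(33.7, 35.7, 29.0),
  stringsAsFactors = FALSE
)

# Count missing values per column; all zeros means nothing is missing
na_counts <- colSums(is.na(sales))
print(na_counts)

# Single logical answer for the whole data frame
any_missing <- anyNA(sales)
print(any_missing)
```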
## Classes 'tbl_df', 'tbl' and 'data.frame': 548 obs. of 7 variables:
## $ MarketID : num 1 1 1 1 1 1 1 1 1 1 ...
## $ MarketSize : chr "Medium" "Medium" "Medium" "Medium" ...
## $ LocationID : num 1 1 1 1 2 2 2 2 3 3 ...
## $ AgeOfStore : num 4 4 4 4 5 5 5 5 12 12 ...
## $ Promotion : num 3 3 3 3 2 2 2 2 1 1 ...
## $ Week : num 1 2 3 4 1 2 3 4 1 2 ...
## $ SalesInThousands: num 33.7 35.7 29 39.2 27.8 ...
The locations with the best sales should be within 40.
## Warning: attributes are not identical across measure variables;
## they will be dropped
## MarketSize LocationID AgeOfStore Promotion
## Large :125 Min. : 1.0 Min. : 1.000 Min. :1.000
## Medium:212 1st Qu.:216.0 1st Qu.: 3.000 1st Qu.:1.000
## Small : 43 Median :502.0 Median : 7.000 Median :2.000
## Mean :481.9 Mean : 8.234 Mean :1.982
## 3rd Qu.:709.2 3rd Qu.:12.000 3rd Qu.:3.000
## Max. :920.0 Max. :28.000 Max. :3.000
## Week SalesInThousands
## Min. :1.000 Min. :19.26
## 1st Qu.:2.000 1st Qu.:42.90
## Median :2.000 Median :51.16
## Mean :2.518 Mean :54.26
## 3rd Qu.:4.000 3rd Qu.:61.43
## Max. :4.000 Max. :99.65
## MarketSize LocationID AgeOfStore Promotion
## Large : 43 Min. : 1.0 Min. : 1.000 Min. :1.000
## Medium:108 1st Qu.:213.8 1st Qu.: 4.000 1st Qu.:2.000
## Small : 17 Median :507.0 Median : 8.000 Median :2.000
## Mean :474.5 Mean : 9.113 Mean :2.137
## 3rd Qu.:705.0 3rd Qu.:12.000 3rd Qu.:3.000
## Max. :920.0 Max. :28.000 Max. :3.000
## Week SalesInThousands
## Min. :1.000 Min. :17.34
## 1st Qu.:1.000 1st Qu.:42.05
## Median :3.000 Median :48.41
## Mean :2.458 Mean :51.67
## 3rd Qu.:3.000 3rd Qu.:56.80
## Max. :4.000 Max. :94.89
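The two summaries above describe a training set of 380 rows and a test set of 168 rows, which is close to a 70/30 split of the 548 observations. The code that produced the split is not shown, so the following is only a sketch of one common way to do it. The stand-in data, the seed, the 0.7 fraction, and the name `testglm` are all assumptions; only `trainglm` appears in the original.

```r
set.seed(42)  # arbitrary seed, used only to make this sketch reproducible

# Hypothetical stand-in with the same column names as the summaries above
n <- 548
sales <- data.frame(
  MarketSize = factor(sample(c("Large", "Medium", "Small"), n, replace = TRUE)),
  LocationID = sample(1:920, n, replace = TRUE),
  AgeOfStore = sample(1:28, n, replace = TRUE),
  Promotion = sample(1:3, n, replace = TRUE),
  Week = sample(1:4, n, replace = TRUE),
  SalesInThousands = runif(n, 17, 100)
)

# Draw roughly 70% of the row indices for training, keep the rest for testing
train_idx <- sample(seq_len(n), size = round(0.7 * n))
trainglm <- sales[train_idx, ]
testglm <- sales[-train_idx, ]  # "testglm" is an assumed name

summary(trainglm)
summary(testglm)
```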
fit1 <- lm(SalesInThousands ~ ., data = trainglm)
summary(fit1)
##
## Call:
## lm(formula = SalesInThousands ~ ., data = trainglm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.603 -8.328 1.447 8.178 24.468
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 80.116707 2.595067 30.873 < 2e-16 ***
## MarketSizeMedium -26.743810 1.329168 -20.121 < 2e-16 ***
## MarketSizeSmall -17.960401 2.165744 -8.293 2.03e-15 ***
## LocationID -0.015995 0.002163 -7.395 9.44e-13 ***
## AgeOfStore 0.138087 0.092235 1.497 0.135
## Promotion -1.135161 0.727534 -1.560 0.120
## Week -0.033217 0.534863 -0.062 0.951
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.59 on 373 degrees of freedom
## Multiple R-squared: 0.5456, Adjusted R-squared: 0.5383
## F-statistic: 74.65 on 6 and 373 DF, p-value: < 2.2e-16
# The significant independent variables are MarketSize (Large is the baseline level) and LocationID.
plot(fit1, which=c(1,1))
A perfectly fitted model would have its red line horizontal around zero, meaning that the residuals are randomly distributed over the fitted values and the model therefore captures the characteristics of the data. So let’s include the interaction effects between the different variables in a new model:
fit2 <- lm(SalesInThousands ~ (MarketSize + LocationID)^2, data = trainglm)
summary(fit2)
##
## Call:
## lm(formula = SalesInThousands ~ (MarketSize + LocationID)^2,
## data = trainglm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.3933 -4.7856 0.1191 5.3898 19.8020
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 95.354879 1.242925 76.718 <2e-16 ***
## MarketSizeMedium -60.527153 1.678064 -36.070 <2e-16 ***
## MarketSizeSmall -31.407677 3.246429 -9.675 <2e-16 ***
## LocationID -0.045810 0.001900 -24.105 <2e-16 ***
## MarketSizeMedium:LocationID 0.065229 0.002796 23.330 <2e-16 ***
## MarketSizeSmall:LocationID 0.017411 0.011982 1.453 0.147
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.426 on 374 degrees of freedom
## Multiple R-squared: 0.8129, Adjusted R-squared: 0.8104
## F-statistic: 325 on 5 and 374 DF, p-value: < 2.2e-16
plot(fit2, which = c(1,1))
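To make the interaction model easier to read, the sketch below first shows how R expands the `(MarketSize + LocationID)^2` formula, then combines the coefficients printed in the `summary(fit2)` output into one `LocationID` slope per market size. The coefficient values are copied by hand from that output; the variable names are illustrative only.

```r
# (A + B)^2 in an R formula means both main effects plus their interaction
f <- SalesInThousands ~ (MarketSize + LocationID)^2
term_labels <- attr(terms(f), "term.labels")
print(term_labels)

# Coefficients copied from the summary(fit2) output
b_location <- -0.045810           # LocationID slope for the baseline (Large)
b_medium_x_location <- 0.065229   # MarketSizeMedium:LocationID
b_small_x_location <- 0.017411    # MarketSizeSmall:LocationID

# Slope of LocationID within each market size
slopes <- c(
  Large = b_location,
  Medium = b_location + b_medium_x_location,
  Small = b_location + b_small_x_location
)
print(round(slopes, 6))
```

Under this reading, sales fall with increasing `LocationID` in large markets but rise slightly in medium markets. The small-market interaction term is not significant (p = 0.147), so the small-market slope cannot be clearly distinguished from the large-market one.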
In this case, we went from 53.8% of the variance explained by fit1 to 81.0% explained by fit2 (adjusted R-squared).
Our model fit2 is a good regression fit: it explains about 81% of the variance in the data. Higher sales are mainly driven by market size and store location. Sales are highest in large markets, and the average location is probably around 40.
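The explained-variance figures can be read directly from the fitted model objects instead of being copied from the printed summaries. The sketch below uses the built-in `mtcars` data as a stand-in, because the original data set is not reproduced here; the model names `fit_a` and `fit_b` are hypothetical.

```r
# Stand-in models on built-in data: main effects only vs. with interaction
fit_a <- lm(mpg ~ wt + hp, data = mtcars)
fit_b <- lm(mpg ~ (wt + hp)^2, data = mtcars)

# Pull R-squared and adjusted R-squared out of the summary objects
r2 <- c(
  main_effects = summary(fit_a)$r.squared,
  with_interaction = summary(fit_b)$r.squared
)
adj_r2 <- c(
  main_effects = summary(fit_a)$adj.r.squared,
  with_interaction = summary(fit_b)$adj.r.squared
)

# Report both as percentages, rounded to one decimal place
print(round(100 * r2, 1))
print(round(100 * adj_r2, 1))
```

Adjusted R-squared is the fairer number when the models have different numbers of terms, since plain R-squared can never decrease when a term is added to a nested model.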