- Drivers?
- What is under managerial/corporate control?
- Simple vs. complicated?
November 13, 2017
Enterprise Industries, owners of Fresh Detergent, want to predict demand for their product. In this case, the product is an extra large bottle of Fresh liquid detergent. Given a model for demand, Enterprise can:
Four indicators for 30 sales periods (each period is 4 weeks):
##    Fresh.Demand Fresh.Price Industry.Price Advertising.Spending
## 1          7.38        3.85           3.80                 5.50
## 2          8.51        3.75           4.00                 6.75
## 3          9.52        3.70           4.30                 7.25
## 4          7.50        3.70           3.70                 5.50
## 5          9.33        3.60           3.85                 7.00
## 6          8.28        3.60           3.80                 6.50
## 7          8.75        3.60           3.75                 6.75
## 8          7.87        3.80           3.85                 5.25
## 9          7.10        3.80           3.65                 5.25
## 10         8.00        3.85           4.00                 6.00
## 11         7.89        3.90           4.10                 6.50
## 12         8.15        3.90           4.00                 6.25
## 13         9.10        3.70           4.10                 7.00
## 14         8.86        3.75           4.20                 6.90
## 15         8.90        3.75           4.10                 6.80
## 16         8.87        3.80           4.10                 6.80
## 17         9.26        3.70           4.20                 7.10
## 18         9.00        3.80           4.30                 7.00
## 19         8.75        3.70           4.10                 6.80
## 20         7.95        3.80           3.75                 6.50
Variable | Mean | Std. Dev. | Minimum | Maximum | Atoms |
---|---|---|---|---|---|
Fresh.Demand | 8.38 | 0.68 | 7.10 | 9.52 | 26 |
Fresh.Price | 3.73 | 0.09 | 3.55 | 3.90 | 8 |
Industry.Price | 3.95 | 0.22 | 3.65 | 4.30 | 11 |
Advertising.Spending | 6.45 | 0.57 | 5.25 | 7.25 | 13 |
Variable | Fresh.Demand | Fresh.Price | Industry.Price | Advertising.Spending |
---|---|---|---|---|
Fresh.Demand | 1.00 | -0.47 | 0.74 | 0.88 |
Fresh.Price | -0.47 | 1.00 | 0.08 | -0.47 |
Industry.Price | 0.74 | 0.08 | 1.00 | 0.60 |
Advertising.Spending | 0.88 | -0.47 | 0.60 | 1.00 |
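Tables like the two above can be produced with a few lines of base R. The sketch below is only an illustration: it uses the 20 periods printed in the listing, not the full 30, so its numbers will not match the tables exactly. The name `fresh.sub` is hypothetical, and reading "Atoms" as the number of distinct values is an assumption.

```r
# Hypothetical subset: only the 20 sales periods printed above (the full data has 30).
fresh.sub <- data.frame(
  Fresh.Demand = c(7.38, 8.51, 9.52, 7.50, 9.33, 8.28, 8.75, 7.87, 7.10, 8.00,
                   7.89, 8.15, 9.10, 8.86, 8.90, 8.87, 9.26, 9.00, 8.75, 7.95),
  Fresh.Price = c(3.85, 3.75, 3.70, 3.70, 3.60, 3.60, 3.60, 3.80, 3.80, 3.85,
                  3.90, 3.90, 3.70, 3.75, 3.75, 3.80, 3.70, 3.80, 3.70, 3.80),
  Industry.Price = c(3.80, 4.00, 4.30, 3.70, 3.85, 3.80, 3.75, 3.85, 3.65, 4.00,
                     4.10, 4.00, 4.10, 4.20, 4.10, 4.10, 4.20, 4.30, 4.10, 3.75),
  Advertising.Spending = c(5.50, 6.75, 7.25, 5.50, 7.00, 6.50, 6.75, 5.25, 5.25, 6.00,
                           6.50, 6.25, 7.00, 6.90, 6.80, 6.80, 7.10, 7.00, 6.80, 6.50)
)

# One row per variable: mean, standard deviation, range, and the number of
# distinct values (assumed here to be what the "Atoms" column reports).
summary.table <- t(sapply(fresh.sub, function(x) {
  c(Mean = mean(x), Std.Dev = sd(x), Minimum = min(x), Maximum = max(x),
    Atoms = length(unique(x)))
}))

# Pairwise Pearson correlations between the four indicators.
cor.table <- cor(fresh.sub)

print(round(summary.table, 2))
print(round(cor.table, 2))
```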
Let's have a look at the data in 3-D.
library(car)  # provides scatter3d(), which relies on the rgl package
scatter3d(Fresh.Demand~Advertising.Spending+Price.Difference, data=fresh.data, fit=FALSE, residuals=TRUE, bg="white", axis.scales=TRUE, grid=TRUE, ellipsoid=FALSE)  # points only
scatter3d(Fresh.Demand~Advertising.Spending+Price.Difference, data=fresh.data, fit="linear", residuals=TRUE, bg="white", axis.scales=TRUE, grid=TRUE, ellipsoid=FALSE)  # linear surface
scatter3d(Fresh.Demand~Advertising.Spending+Price.Difference, data=fresh.data, fit="quadratic", residuals=TRUE, bg="white", axis.scales=TRUE, grid=TRUE, ellipsoid=FALSE)  # quadratic surface
scatter3d(Fresh.Demand~Advertising.Spending+Price.Difference, data=fresh.data, fit=c("linear","quadratic"), residuals=FALSE, bg="white", axis.scales=TRUE, grid=TRUE, ellipsoid=FALSE)  # both surfaces, no residual lines
Dependent variable: Fresh.Demand

Predictor | Estimate |
---|---|
Fresh.Price | -2.358*** (0.638) |
Industry.Price | 1.612*** (0.295) |
Advertising.Spending | 0.501*** (0.126) |
Constant | 7.589*** (2.445) |
Observations | 30 |
R² | 0.894 |
Adjusted R² | 0.881 |
Residual Std. Error | 0.235 (df = 26) |
F Statistic | 72.797*** (df = 3; 26) |
Note: | Standard errors in parentheses. *p<0.1; **p<0.05; ***p<0.01 |
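Conforms to intuition:
- Demand falls as Fresh's own price rises (-2.358).
- Demand rises as the industry price rises (1.612).
- Demand rises with advertising spending (0.501).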
Constructing naïve confidence intervals:
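As a minimal sketch of what such intervals could look like, the code below builds an estimate plus or minus t times standard error interval for each coefficient, using only the estimates and standard errors reported in the regression table. Treating "naïve" as "one coefficient at a time, ignoring the covariance between estimates" is an assumption here, and the object names are hypothetical.

```r
# Estimates and standard errors copied from the regression table above.
estimate <- c(Fresh.Price = -2.358, Industry.Price = 1.612,
              Advertising.Spending = 0.501, Constant = 7.589)
std.error <- c(0.638, 0.295, 0.126, 2.445)

# 95% two-sided critical value with 26 residual degrees of freedom.
t.crit <- qt(0.975, df = 26)

# One interval per coefficient, each built separately from the others.
naive.ci <- cbind(estimate,
                  lower = estimate - t.crit * std.error,
                  upper = estimate + t.crit * std.error)
print(round(naive.ci, 3))
```

The interval for Fresh.Price lies entirely below zero and the one for Industry.Price entirely above zero, and their magnitudes overlap, which is what makes a single price-difference term worth considering.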
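How could we test whether only the price difference matters, i.e., whether the coefficients on Fresh.Price and Industry.Price are equal in magnitude and opposite in sign?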
Dependent variable: Fresh.Demand

Predictor | (1) | (2) |
---|---|---|
Fresh.Price | -2.358*** (0.638) | |
Industry.Price | 1.612*** (0.295) | |
Price.Difference | | 1.588*** (0.299) |
Advertising.Spending | 0.501*** (0.126) | 0.563*** (0.119) |
Constant | 7.589*** (2.445) | 4.407*** (0.722) |
Observations | 30 | 30 |
R² | 0.894 | 0.886 |
Adjusted R² | 0.881 | 0.878 |
Residual Std. Error | 0.235 (df = 26) | 0.238 (df = 27) |
F Statistic | 72.797*** (df = 3; 26) | 104.967*** (df = 2; 27) |
Note: | Standard errors in parentheses. *p<0.1; **p<0.05; ***p<0.01 | |
Model | Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
---|---|---|---|---|---|---|
1 | 27 | 1.53 | | | | |
2 | 26 | 1.43 | 1 | 0.10 | 1.85 | 0.1855 |
Let's solve for F in terms of r-squared.
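One way to write this step out, as a sketch: the subscripts $R$ (restricted, the Price.Difference model) and $U$ (unrestricted, separate price terms), the number of restrictions $q$, and the residual degrees of freedom $n - k - 1$ are notation introduced here rather than taken from the slides. Because both models share the same total sum of squares, $RSS = (1 - R^2)\,TSS$, and the $TSS$ cancels:

$$
F \;=\; \frac{(RSS_R - RSS_U)/q}{RSS_U/(n-k-1)}
\;=\; \frac{(R^2_U - R^2_R)/q}{(1 - R^2_U)/(n-k-1)}
$$

For this comparison $q = 1$ and $n - k - 1 = 26$, so the numerator is the difference in R-squared and the denominator is the unrestricted model's unexplained share of variance per residual degree of freedom.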
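What is the difference in R-squared across the two models?
0.007569.
What is the average unexplained variance for the biggest model, i.e., one minus its R-squared divided by its 26 residual degrees of freedom? Approximately 0.00409.
Dividing the first by the second yields the following F: 1.8497987, which matches the F in the ANOVA table above.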
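The arithmetic can be checked in a few lines of R. This is a sketch under stated assumptions: the R-squared of the larger model is taken from the regression table, where it is rounded to three decimals, so the recomputed F agrees with the reported one only approximately. The variable names are hypothetical.

```r
r2.diff <- 0.007569   # difference in R-squared reported in the text
r2.full <- 0.894      # R-squared of the larger model, rounded in the table
q       <- 1          # one restriction: equal and opposite price coefficients
df.full <- 26         # residual degrees of freedom of the larger model

# F computed from R-squared; the rounding of r2.full makes this approximate.
f.from.r2 <- (r2.diff / q) / ((1 - r2.full) / df.full)

# Upper-tail p-value for the F reported in the text.
p.value <- pf(1.8497987, df1 = q, df2 = df.full, lower.tail = FALSE)

print(f.from.r2)
print(p.value)
```

A p-value of this size gives little evidence against the restriction, so the simpler Price.Difference model is a defensible choice.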
Dependent variable: Fresh.Demand

Predictor | (1) | (2) | (3) |
---|---|---|---|
Fresh.Price | -2.358*** (0.638) | | |
Industry.Price | 1.612*** (0.295) | | |
Price.Difference | | 1.588*** (0.299) | 1.307*** (0.304) |
Advertising.Spending | 0.501*** (0.126) | 0.563*** (0.119) | -3.696* (1.850) |
I(Advertising.Spending^2) | | | 0.349** (0.151) |
Constant | 7.589*** (2.445) | 4.407*** (0.722) | 17.324*** (5.641) |
Observations | 30 | 30 | 30 |
R² | 0.894 | 0.886 | 0.905 |
Adjusted R² | 0.881 | 0.878 | 0.894 |
Residual Std. Error | 0.235 (df = 26) | 0.238 (df = 27) | 0.221 (df = 26) |
F Statistic | 72.797*** (df = 3; 26) | 104.967*** (df = 2; 27) | 82.941*** (df = 3; 26) |
Note: | Standard errors in parentheses. *p<0.1; **p<0.05; ***p<0.01 | | |
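The negative linear coefficient in model (3) can look alarming on its own. With a squared term in the model, the effect of advertising depends on its level, so the two coefficients have to be read together. The sketch below uses only the reported estimates and the observed range from the summary table; the function and variable names are hypothetical.

```r
b.ad  <- -3.696   # coefficient on Advertising.Spending in model (3)
b.ad2 <-  0.349   # coefficient on I(Advertising.Spending^2) in model (3)

# Slope of fitted demand with respect to advertising at a given spending level.
marginal.effect <- function(ad) b.ad + 2 * b.ad2 * ad

# Spending level at which the fitted parabola bottoms out.
turning.point <- -b.ad / (2 * b.ad2)

# Observed minimum, mean and maximum of Advertising.Spending from the summary table.
ad.range <- c(minimum = 5.25, mean = 6.45, maximum = 7.25)

print(round(turning.point, 3))
print(round(marginal.effect(ad.range), 3))
```

The turning point sits at roughly 5.3, just above the smallest observed spending level of 5.25, so over nearly all of the observed range the fitted effect of advertising is positive and increasing.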
par(mfrow=c(2,2))  # 2 x 2 grid of diagnostic plots
qqnorm(fresh.model.diff$residuals, main="QQ-Normal: Linear Ad.Spending", datax=TRUE)
qqnorm(fresh.model.sq$residuals, main="QQ-Normal: Quadratic Ad.Spending", datax=TRUE)
plot(fresh.data$Price.Difference,fresh.model.sq$residuals, xlab="Price.Difference")
plot(fresh.data$Price.Difference,fresh.model.diff$residuals, xlab="Price.Difference")
Once we decide on a model, we can come up with at least two very valuable quantities.
Let's characterize the trade-offs in choosing among these models.
##   Fresh.Price Industry.Price Price.Difference Ad.Exp Fresh.Demand
## 1        3.85           3.80            -0.05   5.50         7.38
## 2        3.75           4.00             0.25   6.75         8.51
## 3        3.70           4.30             0.60   7.25         9.52
## 4        3.70           3.70             0.00   5.50         7.50
## 5        3.60           3.85             0.25   7.00         9.33
## 6        3.60           3.80             0.20   6.50         8.28
##   Ad.Campaign DA DB DC
## 1           B  0  1  0
## 2           B  0  1  0
## 3           B  0  1  0
## 4           A  1  0  0
## 5           C  0  0  1
## 6           A  1  0  0
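In the rows printed above, Price.Difference equals Industry.Price minus Fresh.Price, and DA, DB and DC look like 0/1 indicators for the advertising campaign. The sketch below shows one way such columns could be built, using only the six printed rows; `fresh.head` is a hypothetical name, and whether the original data were prepared this way is an assumption.

```r
# The six rows shown in the head() output above.
fresh.head <- data.frame(
  Fresh.Price    = c(3.85, 3.75, 3.70, 3.70, 3.60, 3.60),
  Industry.Price = c(3.80, 4.00, 4.30, 3.70, 3.85, 3.80),
  Ad.Campaign    = factor(c("B", "B", "B", "A", "C", "A"), levels = c("A", "B", "C"))
)

# Price advantage of Fresh relative to the industry (rounded to cents).
fresh.head$Price.Difference <- round(fresh.head$Industry.Price - fresh.head$Fresh.Price, 2)

# One indicator column per campaign; "~ 0 +" drops the intercept so that
# every level gets its own 0/1 column.
dummies <- model.matrix(~ 0 + Ad.Campaign, data = fresh.head)
colnames(dummies) <- c("DA", "DB", "DC")
fresh.head <- cbind(fresh.head, dummies)

print(fresh.head)
```

In a regression formula the factor `Ad.Campaign` can be used directly, since `lm()` expands it into indicators on its own; the explicit columns are mainly useful for inspection.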
What if the advertising campaign matters?
FD.lm <- lm(Fresh.Demand ~ Price.Difference+poly(Ad.Exp, 2), data=Fresh)
shapiro.test(FD.lm$residuals)
##
## Shapiro-Wilk normality test
##
## data: FD.lm$residuals
## W = 0.9841, p-value = 0.9209
FD.lmI <- lm(Fresh.Demand ~ Price.Difference+Ad.Exp:Ad.Campaign, data=Fresh)
shapiro.test(FD.lmI$residuals)
##
## Shapiro-Wilk normality test
##
## data: FD.lmI$residuals
## W = 0.99121, p-value = 0.9958
anova(FD.lm,FD.lmI)
## Analysis of Variance Table
##
## Model 1: Fresh.Demand ~ Price.Difference + poly(Ad.Exp, 2)
## Model 2: Fresh.Demand ~ Price.Difference + Ad.Exp:Ad.Campaign
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)
## 1     26 1.27327
## 2     25 0.75218  1   0.52109 17.319 0.0003268 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
library(effects)  # provides allEffects()
library(gplots)  # provides plotmeans()
plot(allEffects(FD.lmI))
plotmeans(Ad.Exp~Ad.Campaign, data=Fresh)
plotmeans(Price.Difference~Ad.Campaign, data=Fresh)