#OmniFoods
To ensure a successful test marketing of its OmniPower energy bars, the OmniFoods marketing department has contracted with In-Store Placements Group (ISPG), a merchandising consultancy. ISPG will work with the grocery store chain that is conducting the test-market study. Using the same 34-store sample used in the test-market study, ISPG claims that the choice of shelf location and the presence of in-store OmniPower coupon dispensers both increase sales of the energy bars.
From the Desk of Claire Deborahs, Vice President, Customer Relations To my friends at OmniFoods Marketing: As you know, in reviewing your test marketing of OmniPower energy bars, my colleagues and I at ISPG suggested how your sales could be enhanced. From our experience marketing similar products, we told you that the shelf locations of a product and the presence of in-store coupon dispensers can enhance supermarket sales. We are happy to report the positive correlation of these factors and sales of OmniPower bars. You can see the effects, if you look through the data. I think the most striking thing is the only store with over 4000 unit sales and only 200 in promotional dollars was a store with coupon dispensers! We can talk later about extending our marketing agreement. In the meantime, congratulations on a successful test marketing venture. Sincerely, Claire Deborahs
Review the ISPG claims and supporting data (ISPG.csv). Then answer the following questions:
1.Are the supporting data consistent with ISPG’s claims? Perform an appropriate statistical analysis to identify the factors contributing to the sales volume of the energy bars.
ISPG=read.csv("/Users/ruiqianli/Desktop/EBAC_SB/Day\ 5/Day\ 5\ Exercise/ISPG.csv")
#Check the first three rows of the data
head(ISPG,3)
## Bars Price Promotion ShelfLocation Dispensers
## 1 4141 59 200 EndAisle Yes
## 2 3842 59 200 EndAisle Yes
## 3 3056 59 200 Normal No
#from the table, we find that dispenser and ShelfLocation are two kinds of Categorical Variable
#ANOVA to find out association of Dispensers with Bar
summary(aov(ISPG$Bars ~ ISPG$Dispensers))
## Df Sum Sq Mean Sq F value Pr(>F)
## ISPG$Dispensers 1 624791 624791 0.388 0.538
## Residuals 32 51468887 1608403
#ANOVA to find out association of ShelfLocation with Bar
summary(aov(ISPG$Bars ~ ISPG$ShelfLocation))
## Df Sum Sq Mean Sq F value Pr(>F)
## ISPG$ShelfLocation 1 6961300 6961300 4.936 0.0335 *
## Residuals 32 45132377 1410387
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As can be seen from the results of ANOVA, the p value of Dispensers =0.538>0.05, which cannot be rejected. Therefore, the prediction effect of this variable on sales volume is not significant. If there are many variables, this variable can be deleted. (But there are few variables in this question, so we still keep them and do multiple regression analysis together).
#It is also observed that although promotion is numerical, it only has three kinds of data: 200, 400 and 600, so we now judge whether it needs to be classified as categorical.
cor.test(ISPG$Bars,ISPG$Promotion)
##
## Pearson's product-moment correlation
##
## data: ISPG$Bars and ISPG$Promotion
## t = 3.5831, df = 32, p-value = 0.001111
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2404366 0.7394557
## sample estimates:
## cor
## 0.5350951
library(ggplot2)
ggplot(ISPG, aes(x = Promotion, y = Bars)) + geom_point() +
geom_smooth(method = 'lm')
## `geom_smooth()` using formula 'y ~ x'
It can be seen from the results that bars and promotion are positively correlated, so it is not necessary to divide promotion into categorical data.
#model fit
ISPG_Fit=lm(formula = Bars ~ Dispensers + ShelfLocation +Promotion +Price, data = ISPG)
library(lm.beta)
OPBeta = lm.beta(ISPG_Fit)
summary(OPBeta)
##
## Call:
## lm(formula = Bars ~ Dispensers + ShelfLocation + Promotion +
## Price, data = ISPG)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1280.33 -251.63 -44.83 277.04 865.29
##
## Coefficients:
## Estimate Standardized Std. Error t value Pr(>|t|)
## (Intercept) 6248.24781 0.00000 529.03217 11.811 1.33e-12 ***
## DispensersYes 100.32590 0.04053 180.46501 0.556 0.583
## ShelfLocationNormal -815.37591 -0.32708 169.30970 -4.816 4.23e-05 ***
## Promotion 3.56496 0.46210 0.56522 6.307 6.88e-07 ***
## Price -53.04470 -0.68759 5.23575 -10.131 4.91e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 487.2 on 29 degrees of freedom
## Multiple R-squared: 0.8679, Adjusted R-squared: 0.8496
## F-statistic: 47.62 on 4 and 29 DF, p-value: 2.441e-12
From the results of multiple linear regression, it can be seen that when Dispensers = yes, the impact on sales is positive (the coefficient is positive), but the pvalue is very large, so it is likely to be the impact of random sampling, so it can not be concluded that it has a promoting effect on sales.【not consistent with ISPG’s claims】 Promotion also has a positive impact on sales, but comparatively speaking, its coefficient is very small, so it can have little impact. Shelflocation = normal has a significant negative impact on sales, so it is suggested that shelflocation of bars should be built at endaisle.【consistent with ISPG’s claims】 Price also has a certain negative impact on sales, which is also in line with common sense. Whether it is necessary to reduce the price to obtain greater sales needs further decision of supermarkets.
2.Do your results confirm (or discredit) the stated relationship between sales and product shelf location and the presence of in-store OmniPower coupon dispensers? If you were advising OmniFoods, would you recommend using a specific shelf location and in-store coupon dispensers to sell OmniPower bars?
From the analysis of the previous question, it can be concluded that whether there is in store omnipower couple dispenser has no significant impact on the sales volume, but the shelf location has a significant impact on the sales volume. So I suggest not to distribute coupons (coupons have their own costs) and to place the shelves at EndAisle.
3.What additional data would you advise collecting in order to determine the effectiveness of the sales promotion techniques used by ISPG?
I suggest collecting the data of other commodities with high correlation with bars sales. If it is confirmed that there is a significant positive impact, the shelves of these commodities can be put together to promote the sales of each other. I also suggest collecting the influence of the sales volume of bars on holidays and seasons. If it is obvious, we can adjust the purchase volume and publicity accordingly.
#Coca-Cola: Revenue Forecasting
You are a financial analyst for a large financial services company. You need to better evaluate investment opportunities for your clients. To assist in the forecasting, you have collected time-series data on revenues of two large well-known companies. One of them is The Coca-Cola Company. Founded in 1886 and headquartered in Atlanta, Georgia, Coca-Cola manufactures, distributes, and markets more than 3,300 beverages in over 200 countries worldwide. Some of its brands include Barq’s, Dasani, Full Throttle, Glacéau Vitaminwater, Minute Maid, Powerade, and Sprite in addition to Coca-Cola.
The data file ColaCola.csv stores The Coca-Cola Company’s gross revenues (in billions of dollars) from 1995 to 2009 extracted from Mergent’s Handbook of Common Stocks, 2006; and www.thecoca-colacompany.com.
Questions: 1. To simplify the interpretation of the regression coefficients, recode all values of the Year variable relative to Year 1995, i.e., assign consecutively numbered integers, starting with 0 and concluding with 14, as the coded values for the 15-year time periods. (hint: create a new variable codedYear)
CocaCola=read.csv("/Users/ruiqianli/Desktop/EBAC_SB/Day\ 5/Day\ 5\ Exercise/CocaCola.csv")
#Check the first three rows of the data
head(CocaCola,3)
## Year Revenues
## 1 1995 18.0
## 2 1996 18.5
## 3 1997 18.9
#Create a new categorical column: codedYear
CocaCola$codedYear = CocaCola$Year-1995
head(CocaCola,3)
## Year Revenues codedYear
## 1 1995 18.0 0
## 2 1996 18.5 1
## 3 1997 18.9 2
2.Compute a linear trend forecasting equation and plot the results.
#model fit
CocaCola_Fit_1=lm(formula = Revenues ~ codedYear, data = CocaCola)
summary(CocaCola_Fit_1)
##
## Call:
## lm(formula = Revenues ~ codedYear, data = CocaCola)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8067 -2.0092 0.0533 1.7508 4.0033
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.0017 1.0641 15.038 1.34e-09 ***
## codedYear 0.9150 0.1294 7.074 8.37e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.165 on 13 degrees of freedom
## Multiple R-squared: 0.7938, Adjusted R-squared: 0.7779
## F-statistic: 50.04 on 1 and 13 DF, p-value: 8.372e-06
library(ggplot2)
p = ggplot() +geom_point(data = CocaCola, mapping = aes(x = codedYear, y = Revenues)) +theme_bw()
p + geom_smooth(data = CocaCola, mapping = aes(x = codedYear, y = Revenues), method = "lm") +ggtitle("linear trend forecasting")
## `geom_smooth()` using formula 'y ~ x'
CocaCola2 = CocaCola
CocaCola2$sq.year = CocaCola2$codedYear^2
CocaCola_Fit_2 = lm(Revenues ~ codedYear + I(codedYear^2), CocaCola2)
summary(CocaCola_Fit_2)
##
## Call:
## lm(formula = Revenues ~ codedYear + I(codedYear^2), data = CocaCola2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.6953 -0.9027 -0.1802 0.7729 2.2397
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.08794 0.83954 22.736 3.11e-11 ***
## codedYear -0.50943 0.27835 -1.830 0.092154 .
## I(codedYear^2) 0.10175 0.01917 5.306 0.000186 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.232 on 12 degrees of freedom
## Multiple R-squared: 0.9384, Adjusted R-squared: 0.9281
## F-statistic: 91.36 on 2 and 12 DF, p-value: 5.478e-08
plot(CocaCola2$Revenues~CocaCola2$codedYear)
lines(CocaCola2$codedYear, predict(CocaCola_Fit_2), col = 'red')
4.Using the forecasting equations in (2) and (3), what are your annual forecasts of the revenues for 2011?
#If we use the linear regression formula in (2) to predict
newdata<-data.frame(codedYear=2011-1995)
predict(CocaCola_Fit_1,newdata,interval="prediction",level=0.95)
## fit lwr upr
## 1 30.64167 25.19652 36.08681
The point estimate of 2011 revenue is 30.64167 and the interval estimate is (25.19652,36.08681)
#If we use the nonlinear regression formula in (3) to predict
newdata<-data.frame(codedYear=2011-1995)
predict(CocaCola_Fit_2,newdata,interval="prediction",level=0.95)
## fit lwr upr
## 1 36.98379 32.9163 41.05128
The point estimate of 2011 revenue is 36.98379 and the interval estimate is (32.9163,41.05128)
5.What forecast do you think you should use? Why? I will use the nonlinear regression prediction in (3), because the R-squared of this model is larger, so its fitting effect is better and the prediction will be more accurate.