#OmniFoods

To ensure a successful test marketing of its OmniPower energy bars, the OmniFoods marketing department has contracted with In-Store Placements Group (ISPG), a merchandising consultancy. ISPG will work with the grocery store chain that is conducting the test-market study. Using the same 34-store sample used in the test-market study, ISPG claims that the choice of shelf location and the presence of in-store OmniPower coupon dispensers both increase sales of the energy bars.

From the Desk of Claire Deborahs, Vice President, Customer Relations To my friends at OmniFoods Marketing: As you know, in reviewing your test marketing of OmniPower energy bars, my colleagues and I at ISPG suggested how your sales could be enhanced. From our experience marketing similar products, we told you that the shelf locations of a product and the presence of in-store coupon dispensers can enhance supermarket sales. We are happy to report the positive correlation of these factors and sales of OmniPower bars. You can see the effects, if you look through the data. I think the most striking thing is the only store with over 4000 unit sales and only 200 in promotional dollars was a store with coupon dispensers! We can talk later about extending our marketing agreement. In the meantime, congratulations on a successful test marketing venture. Sincerely, Claire Deborahs

Review the ISPG claims and supporting data (ISPG.csv). Then answer the following questions:

1.Are the supporting data consistent with ISPG’s claims? Perform an appropriate statistical analysis to identify the factors contributing to the sales volume of the energy bars.

ISPG=read.csv("/Users/ruiqianli/Desktop/EBAC_SB/Day\ 5/Day\ 5\ Exercise/ISPG.csv")
#Check the first three rows of the data
head(ISPG,3)
##   Bars Price Promotion ShelfLocation Dispensers
## 1 4141    59       200      EndAisle        Yes
## 2 3842    59       200      EndAisle        Yes
## 3 3056    59       200        Normal         No
#from the table, we find that dispenser and ShelfLocation are two kinds of Categorical Variable
#ANOVA to find out association of Dispensers with Bar
summary(aov(ISPG$Bars ~ ISPG$Dispensers))
##                 Df   Sum Sq Mean Sq F value Pr(>F)
## ISPG$Dispensers  1   624791  624791   0.388  0.538
## Residuals       32 51468887 1608403
#ANOVA to find out association of ShelfLocation with Bar
summary(aov(ISPG$Bars ~ ISPG$ShelfLocation))
##                    Df   Sum Sq Mean Sq F value Pr(>F)  
## ISPG$ShelfLocation  1  6961300 6961300   4.936 0.0335 *
## Residuals          32 45132377 1410387                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As can be seen from the results of ANOVA, the p value of Dispensers =0.538>0.05, which cannot be rejected. Therefore, the prediction effect of this variable on sales volume is not significant. If there are many variables, this variable can be deleted. (But there are few variables in this question, so we still keep them and do multiple regression analysis together).

#It is also observed that although promotion is numerical, it only has three kinds of data: 200, 400 and 600, so we now judge whether it needs to be classified as categorical.
cor.test(ISPG$Bars,ISPG$Promotion)
## 
##  Pearson's product-moment correlation
## 
## data:  ISPG$Bars and ISPG$Promotion
## t = 3.5831, df = 32, p-value = 0.001111
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2404366 0.7394557
## sample estimates:
##       cor 
## 0.5350951
library(ggplot2)
ggplot(ISPG, aes(x = Promotion, y = Bars)) + geom_point() +
geom_smooth(method = 'lm')
## `geom_smooth()` using formula 'y ~ x'

It can be seen from the results that bars and promotion are positively correlated, so it is not necessary to divide promotion into categorical data.

#model fit
ISPG_Fit=lm(formula = Bars ~ Dispensers + ShelfLocation +Promotion +Price, data = ISPG)
library(lm.beta)
OPBeta = lm.beta(ISPG_Fit)
summary(OPBeta)
## 
## Call:
## lm(formula = Bars ~ Dispensers + ShelfLocation + Promotion + 
##     Price, data = ISPG)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1280.33  -251.63   -44.83   277.04   865.29 
## 
## Coefficients:
##                       Estimate Standardized Std. Error t value Pr(>|t|)    
## (Intercept)         6248.24781      0.00000  529.03217  11.811 1.33e-12 ***
## DispensersYes        100.32590      0.04053  180.46501   0.556    0.583    
## ShelfLocationNormal -815.37591     -0.32708  169.30970  -4.816 4.23e-05 ***
## Promotion              3.56496      0.46210    0.56522   6.307 6.88e-07 ***
## Price                -53.04470     -0.68759    5.23575 -10.131 4.91e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 487.2 on 29 degrees of freedom
## Multiple R-squared:  0.8679, Adjusted R-squared:  0.8496 
## F-statistic: 47.62 on 4 and 29 DF,  p-value: 2.441e-12

From the results of multiple linear regression, it can be seen that when Dispensers = yes, the impact on sales is positive (the coefficient is positive), but the pvalue is very large, so it is likely to be the impact of random sampling, so it can not be concluded that it has a promoting effect on sales.【not consistent with ISPG’s claims】 Promotion also has a positive impact on sales, but comparatively speaking, its coefficient is very small, so it can have little impact. Shelflocation = normal has a significant negative impact on sales, so it is suggested that shelflocation of bars should be built at endaisle.【consistent with ISPG’s claims】 Price also has a certain negative impact on sales, which is also in line with common sense. Whether it is necessary to reduce the price to obtain greater sales needs further decision of supermarkets.

2.Do your results confirm (or discredit) the stated relationship between sales and product shelf location and the presence of in-store OmniPower coupon dispensers? If you were advising OmniFoods, would you recommend using a specific shelf location and in-store coupon dispensers to sell OmniPower bars?

From the analysis of the previous question, it can be concluded that whether there is in store omnipower couple dispenser has no significant impact on the sales volume, but the shelf location has a significant impact on the sales volume. So I suggest not to distribute coupons (coupons have their own costs) and to place the shelves at EndAisle.

3.What additional data would you advise collecting in order to determine the effectiveness of the sales promotion techniques used by ISPG?

I suggest collecting the data of other commodities with high correlation with bars sales. If it is confirmed that there is a significant positive impact, the shelves of these commodities can be put together to promote the sales of each other. I also suggest collecting the influence of the sales volume of bars on holidays and seasons. If it is obvious, we can adjust the purchase volume and publicity accordingly.

#Coca-Cola: Revenue Forecasting

You are a financial analyst for a large financial services company. You need to better evaluate investment opportunities for your clients. To assist in the forecasting, you have collected time-series data on revenues of two large well-known companies. One of them is The Coca-Cola Company. Founded in 1886 and headquartered in Atlanta, Georgia, Coca-Cola manufactures, distributes, and markets more than 3,300 beverages in over 200 countries worldwide. Some of its brands include Barq’s, Dasani, Full Throttle, Glacéau Vitaminwater, Minute Maid, Powerade, and Sprite in addition to Coca-Cola.

The data file ColaCola.csv stores The Coca-Cola Company’s gross revenues (in billions of dollars) from 1995 to 2009 extracted from Mergent’s Handbook of Common Stocks, 2006; and www.thecoca-colacompany.com.

Questions: 1. To simplify the interpretation of the regression coefficients, recode all values of the Year variable relative to Year 1995, i.e., assign consecutively numbered integers, starting with 0 and concluding with 14, as the coded values for the 15-year time periods. (hint: create a new variable codedYear)

CocaCola=read.csv("/Users/ruiqianli/Desktop/EBAC_SB/Day\ 5/Day\ 5\ Exercise/CocaCola.csv")
#Check the first three rows of the data
head(CocaCola,3)
##   Year Revenues
## 1 1995     18.0
## 2 1996     18.5
## 3 1997     18.9
#Create a new categorical column: codedYear
CocaCola$codedYear = CocaCola$Year-1995
head(CocaCola,3)
##   Year Revenues codedYear
## 1 1995     18.0         0
## 2 1996     18.5         1
## 3 1997     18.9         2

2.Compute a linear trend forecasting equation and plot the results.

#model fit
CocaCola_Fit_1=lm(formula = Revenues ~ codedYear, data = CocaCola)
summary(CocaCola_Fit_1)
## 
## Call:
## lm(formula = Revenues ~ codedYear, data = CocaCola)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8067 -2.0092  0.0533  1.7508  4.0033 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  16.0017     1.0641  15.038 1.34e-09 ***
## codedYear     0.9150     0.1294   7.074 8.37e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.165 on 13 degrees of freedom
## Multiple R-squared:  0.7938, Adjusted R-squared:  0.7779 
## F-statistic: 50.04 on 1 and 13 DF,  p-value: 8.372e-06
library(ggplot2)
p = ggplot() +geom_point(data = CocaCola, mapping = aes(x = codedYear, y = Revenues)) +theme_bw()
p + geom_smooth(data = CocaCola, mapping = aes(x = codedYear, y = Revenues), method = "lm") +ggtitle("linear trend forecasting")
## `geom_smooth()` using formula 'y ~ x'

  1. Compute a quadratic trend forecasting equation and plot the results. (hint to plot the results: lines(coca$codedYear, predict(cocaFit2), col = “red”)
CocaCola2 = CocaCola
CocaCola2$sq.year = CocaCola2$codedYear^2

CocaCola_Fit_2 = lm(Revenues ~ codedYear + I(codedYear^2), CocaCola2)
summary(CocaCola_Fit_2)
## 
## Call:
## lm(formula = Revenues ~ codedYear + I(codedYear^2), data = CocaCola2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6953 -0.9027 -0.1802  0.7729  2.2397 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    19.08794    0.83954  22.736 3.11e-11 ***
## codedYear      -0.50943    0.27835  -1.830 0.092154 .  
## I(codedYear^2)  0.10175    0.01917   5.306 0.000186 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.232 on 12 degrees of freedom
## Multiple R-squared:  0.9384, Adjusted R-squared:  0.9281 
## F-statistic: 91.36 on 2 and 12 DF,  p-value: 5.478e-08
plot(CocaCola2$Revenues~CocaCola2$codedYear)
lines(CocaCola2$codedYear, predict(CocaCola_Fit_2), col = 'red')

4.Using the forecasting equations in (2) and (3), what are your annual forecasts of the revenues for 2011?

#If we use the linear regression formula in (2) to predict
newdata<-data.frame(codedYear=2011-1995)
predict(CocaCola_Fit_1,newdata,interval="prediction",level=0.95)
##        fit      lwr      upr
## 1 30.64167 25.19652 36.08681

The point estimate of 2011 revenue is 30.64167 and the interval estimate is (25.19652,36.08681)

#If we use the nonlinear regression formula in (3) to predict
newdata<-data.frame(codedYear=2011-1995)
predict(CocaCola_Fit_2,newdata,interval="prediction",level=0.95)
##        fit     lwr      upr
## 1 36.98379 32.9163 41.05128

The point estimate of 2011 revenue is 36.98379 and the interval estimate is (32.9163,41.05128)

5.What forecast do you think you should use? Why? I will use the nonlinear regression prediction in (3), because the R-squared of this model is larger, so its fitting effect is better and the prediction will be more accurate.