You were hired as a consultant to describe for Coca-Cola the trade areas of two stores. This is a pilot project that can potentially be expanded to other markets. More specifically, the main interest of the project is to gain a better understanding of:
These insights will have implications for Coca-Cola’s pricing and promotional strategies.
Choice data is available for cola purchases of households over 103 weeks in two grocery stores.
Using descriptive statistics (tables and graphs) to provide a rigorous description of the market. Make sure to tell a managerially relevant story.
How do these statistics vary by store, by day of the week?
In terms of unit share, Pepsi dominates both store’s sales. The client brand, Coke, does relatively well in Store 1542 (28.17% share), however, it suffers in Store 1521 where Pepsi and Private Label brands eat into its share. This could be indicative of a higher number of price sensitive consumers in Store 1521 (holding all else equal). In support of that hypothesis, the Private Label brand, which holds the lowest price point, does approximately 10% better in terms of unit share when comparing Store 1521 to Store 1542. Private Label products are primarily responsible for Coke’s loss in unit share in Store 1521. Coke holds the highest price point across all brands, on average, in Store 1521. In Store 1542, Coke holds the second highest price point, on average, only being surpassed by RC Cola by an extremely small margin. Similar price positions across all brands implies close competition, which will we discussed in the following sections.
Store 1542 engages far more heavily in both feature and display advertising than Store 1521. While both Pepsi and Coke utilize these marketing tactics with similar intensity in Store 1542, this is not the case in store 1521. Coke’s lack of promotional activity in Store 1521 likely contributes to its loss in unit share. Likewise, if consumers in this market are more price sensitive, then Store 1521’s heavier investment in Private Label feature advertising is likely to induce brand switching.
It’s important to note that both the intensity and effects of promotional activity vary by day of the week. Specifically, we see the largest variation in these statistics when looking at activity on the weekend versus weekdays. In referencing the graphs below, we can see that price points remain relatively constant across the week, as expected. However, we can expect small price discounts to coincide with feature and display promotions, which clearly peak on weekends. Coke and Pepsi most closely mirror each other in terms of their promotional strategies. Specifically, both brands slowly increase their display advertising activity throughout the week, and reach their respective peaks on Friday and Sunday. Both brands exhibit similar patterns with respect to feature advertising. However, Pepsi reaches its peak feature advertising activity on Friday versus Coke, which reaches its peak on Sunday. This is an interesting point to make note of, as sales appear to peak on Friday and Saturday for both brands. Thus, feature advertising may be more effective on Friday (moving into the weekend), versus Sunday (moving into the week - where Monday sales are among the lowest for Coke).
Do you expect there to be any multi-collinearity issues? If so, how could it affect your analysis?
Delving further into the relationship between price and promotions, we can reference the correlation matrix below to inspect the magnitude of the negative correlations between brand prices and promotional activity (feature and display advertising). Negative correlations imply that price decreases when brands utilize display or feature promotions. Likewise, we also observe strong positive correlations between feature and display activity for each brand, which implies that promotional activities tend to coincide with each other. While this is often the nature of marketing mix strategies, it is sure to cause multicollinearity issues in any subsequent analysis. The primary consequence of multicollinearity issues will be the increase in magnitude of the resulting standard errors of our estimates of individual predictors. Thus, our resulting coefficient estimates may appear to be insignificant in their relationship to the dependent variable, even when that is not the case (type 2 error - failure to reject a false null hypothesis). At a more general level, multicollinearity will not reduce the predictive power or reliability of the model as a whole, however, the model may not give valid results about any individual predictor, or about which predictors are redundant with respect to others. This would cause a major issue in our analysis, as we seek to determine the precise relationship between individual marketing mix variables and our outcome variable of interest. Understanding these relationship will inform Coke’s pricing and promotional strategies moving forward.
In referencing the correlation matrix below, we can see that the strongest relationships exist between Coke and RC Cola, as well as Coke and Pepsi.
Can you detect any seasonality in terms of sales?
Likewise, seasonality exists, however, it is not at the week level.
dir()## [1] "basic.png"
## [2] "brand_pricing.png"
## [3] "ColaData_Lagged.csv"
## [4] "corrplot1.png"
## [5] "corrplot2.png"
## [6] "daily_brand.png"
## [7] "daily_disp_ad.png"
## [8] "daily_feat_ad.png"
## [9] "daily_sales.png"
## [10] "display_ads.png"
## [11] "feat_ad.png"
## [12] "full.png"
## [13] "hw4_files"
## [14] "hw4.html"
## [15] "hw4.Rmd"
## [16] "MKT_HW4.Rproj"
## [17] "MKTHW4_haj372.pdf"
## [18] "MNL-Cola.pdf"
## [19] "pie1.png"
## [20] "pie2.png"
## [21] "Screen Shot 2016-05-09 at 2.02.16 PM.png"
## [22] "Screen Shot 2016-05-09 at 2.39.48 PM.png"
## [23] "seasonality.png"
cola = read.csv("ColaData_Lagged.csv", header=TRUE, sep=",")
head(cola)## hid week dayofwk storeid select_pl select_pepsi select_coke select_rc
## 1 1 636 7 1542 1 0 0 0
## 2 1 639 2 1542 0 0 1 0
## 3 1 645 5 1542 0 1 0 0
## 4 1 658 4 1542 0 0 1 0
## 5 1 663 7 1542 1 0 0 0
## 6 1 667 3 1542 1 0 0 0
## price_pl price_pepsi price_coke price_rc disp_pl disp_pepsi disp_coke
## 1 0.5017 0.5633 0.7238 0.9867 0.0917 0 0.0000
## 2 0.3586 0.6422 0.4213 0.9867 0.0000 0 0.0000
## 3 0.4213 0.4213 0.5122 0.8200 0.0000 1 0.8942
## 4 0.3124 0.6469 0.4686 0.9867 0.2500 0 1.0000
## 5 0.4989 0.6454 0.6296 0.8200 0.0723 0 0.0000
## 6 0.3361 0.6485 0.6702 0.9867 0.0000 0 0.0000
## disp_rc feat_pl feat_pepsi feat_coke feat_rc lag_pl lag_pepsi lag_coke
## 1 0 0.2367 1 0.0000 0 NA NA NA
## 2 0 0.0000 0 1.0000 0 1 0 0
## 3 0 0.0000 1 0.8942 0 0 0 1
## 4 0 1.0000 0 1.0000 0 0 1 0
## 5 0 0.2410 0 1.0000 0 0 0 1
## 6 0 0.0000 0 0.8942 0 1 0 0
## lag_rc prev
## 1 NA NA
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 1
attach(cola)
library(plotrix)
#calculating unit market share
unit_share = aggregate(x=cola[,5:8], by=list(cola$storeid), FUN=sum, simplify=TRUE)
unit_share_viz = as.matrix(unit_share)
unit_share_viz## Group.1 select_pl select_pepsi select_coke select_rc
## [1,] 1521 237 809 182 208
## [2,] 1542 222 1739 952 466
g1_total = sum(237, 809,182,208)
g2_total = sum(222, 1739, 952, 466)
#group1
g1_pl_share = 237/g1_total
g1_pepsi_share = 809/g1_total
g1_coke_share = 182/g1_total
g1_rc_share = 208/g1_total
#group 2
g2_pl_share = 222/g2_total
g2_pepsi_share = 1739/g2_total
g2_coke_share = 952/g2_total
g2_rc_share = 466/g2_total
#PIE-1
slices1 = c(g1_pl_share, g1_pepsi_share, g1_coke_share, g1_rc_share)
lb1 = c("Private Label", "Pepsi", "Coke", "RC")
pct = round(c(g1_pl_share, g1_pepsi_share, g1_coke_share, g1_rc_share)*100, 2)
lb1 = paste(lb1, "\n", pct) # add percents to labels
lb1 = paste(lb1,"%",sep="") # ad % to labels
colors = c("red", "blue", "green", "yellow")
pie(slices1, labels = lb1, main = "Unit Share in Store 1521", col=colors)#PIE-2
slices2 = c(g2_pl_share, g2_pepsi_share, g2_coke_share, g2_rc_share)
lb2 = c("Private Label", "Pepsi", "Coke", "RC")
pct2 = round(c(g2_pl_share, g2_pepsi_share, g2_coke_share, g2_rc_share)*100, 2)
lb2 = paste(lb2, "\n", pct2) # add percents to labels
lb2 = paste(lb2,"%",sep="") # ad % to labels
colors = c("red", "blue", "green", "yellow")
pie(slices2, labels = lb2, main = "Unit Share in Store 1542", col=colors)#calculating avg price across markets
avg_prices = aggregate(x=cola[,9:12], by=list(cola$storeid), FUN=mean, simplify=TRUE)
avg_price_viz = as.matrix(avg_prices)
barplot(avg_price_viz[,2:5], beside = TRUE, ylim = c(0,1), col=c("orange", "grey"), main="Brand Pricing Across Stores", legend = avg_price_viz[,1])#calculating avg display advertising across markets
avg_display = aggregate(x=cola[,13:16], by=list(cola$storeid), FUN=mean, simplify = TRUE)
avg_display_viz = as.matrix(avg_display)
barplot(avg_display_viz[,2:5], beside = TRUE, ylim = c(0,0.4), col=c("blue", "grey"), main="Brand Display Advertising Across Stores", legend = avg_display_viz[,1])#calculating avg feature promotion across markets
avg_feature = aggregate(x=cola[,17:20], by=list(cola$storeid), FUN=mean, simplify = TRUE)
avg_feature_viz = as.matrix(avg_feature)
barplot(avg_feature_viz[,2:5], beside = TRUE, ylim = c(0,0.6), col=c("purple", "grey"), main="Brand Feature Advertising Across Stores", legend = avg_feature_viz[,1])#looking at price and promotional activity by day of the week
week_prices = aggregate(x=cola[,9:12], by=list(cola$dayofwk), FUN=mean, simplify=TRUE)
week_price_viz = as.matrix(week_prices)
week_price_viz## Group.1 price_pl price_pepsi price_coke price_rc
## [1,] 1 0.5272077 0.6912616 0.7640565 0.7437000
## [2,] 2 0.5191172 0.6758732 0.7635662 0.7497000
## [3,] 3 0.5341720 0.6802265 0.7691919 0.7282940
## [4,] 4 0.5242941 0.6855473 0.7562066 0.7219143
## [5,] 5 0.5334374 0.6847356 0.7831916 0.7565268
## [6,] 6 0.5428655 0.6651139 0.7589285 0.7292095
## [7,] 7 0.5016307 0.6619861 0.7395771 0.7037818
barplot(week_price_viz[,2:5], beside = TRUE, ylim = c(0,1), col="orange", main="Daily Brand Pricing")week_display = aggregate(x=cola[,13:16], by=list(cola$dayofwk), FUN=mean, simplify = TRUE)
week_display_viz = as.matrix(week_display)
barplot(week_display_viz[,2:5], beside = TRUE, ylim = c(0,0.4), col="blue", main="Daily Brand Display Advertising")week_feature = aggregate(x=cola[,17:20], by=list(cola$dayofwk), FUN=mean, simplify = TRUE)
week_feature_viz = as.matrix(week_feature)
barplot(week_feature_viz[,2:5], beside = TRUE, ylim = c(0,0.6), col="purple", main="Daily Brand Feature Advertising")week_sales = aggregate(x=cola[,5:8], by=list(cola$dayofwk), FUN=sum, simplify = TRUE)
week_sales_viz = as.matrix(week_sales)
barplot(week_sales_viz[,2:5], beside = TRUE, col="green", main="Daily Brand Sales")#correlation between price and promotions
library(corrplot)
corcola = cor(cola[,9:20])
corrplot(corcola, method = "number")pricecola = cor(cola[,9:12])
corrplot(pricecola, method = "number")#seasonality
sales_over_time = aggregate(x=cola[,5:8], by=list(cola$week), FUN=sum, simplify = TRUE)
sales_over_time_viz = as.matrix(sales_over_time)
plot(sales_over_time_viz[,2], col = c("red"), type = "l", xlab="Weeks", ylab="Sales", main="Seasonality in Sales", lwd=2, ylim=c(0, 60))
lines(sales_over_time_viz[,3], col = "blue", lwd=2)
lines(sales_over_time_viz[,4], col = "green", lwd=2)
lines(sales_over_time_viz[,5], col = "black", lwd=2)