The below questions are related to the case described in “Chapter 9: SVEDKA Vodka“ from Cutting-Edge Marketing Analytics, and reference the following documents: http://tinyurl.com/y7epy6fb (HW #3 instructions) and http://tinyurl.com/y8c87vzr (gsheet of the data).
For easy reference, below is the code and commentary – which cleaned, prepped, developed and analyzed data and models for this analysis. The results of these decisions are noted in the response section for each question. While multiple visuals and techniques were applied, this RMD highlights the most straightforward results – which are the figures more easily digestible to the executive stakeholders.
When running this RMD, key elements to remember are: 1) set working directory, 2) load the below libraries (e.g. coefplot, ggplot2 & gsheets etc.) and 3) read the CSV into the sved data frame.
## Loading required package: coefplot
## Loading required package: ggplot2
## Loading required package: forecast
## Loading required package: gpairs
## Loading required package: gsheet
## Loading required package: knitr
## Warning: package 'knitr' was built under R version 3.4.1
QUESTION: Run a regression of the natural logarithm of change in sales on the natural logarithm of previous period prices, and the natural log of marketing expenditures on print, outdoor, and broadcasting.
## Warning: Ignoring unknown aesthetics: xmin, xmax
RESPONSE: As reflected in the below graphs, the marketing expenditure effort with largest positive impact is print (Coefficient = .018). It’s interesting to note that it is a bit higher than the intercept (.024) and all other expenindures (as well as price) have a negative coefficients, particularly outdoor advertising expenditures (-011). However, these variables are all relatively close to zero. These methods are –at the very least– not greatly impacting the SVEDKA brand in the negative way within the market and toward the consumer.
plot_coeffq6(q6)
pairs(LnDiff ~ LnLPrice + LnPrint + LnOut + LnBroad, data=sved)
plot(q6$fitted.values, q6$residuals, ylab="Residuals", xlab="Fitted Values", main="Q6 -- Residuals vs. Fitted")
abline(0, 0) # the horizon
QUESTION: To understand the influence of vodka quality, run a regression by adding the tier 1 and tier 2 dummy variables (that indicate whether a vodka brand belongs to first- or second-quality tiers) to the set of independent variables in question 6.
## Warning: Ignoring unknown aesthetics: xmin, xmax
RESPONSE: When adding in Tier 1 and Tier 2 variables, a stronger model is produced. This is indicated through several metrics, such as this second model having a lower AIC compared to the first (-177.2088569 vs. -182.3826859 or exp=0.0752519) and R^2 calculations (0.132899 vs. 0.1004895). The coefficients for Tier 1 & Tier 2 are comparatively high regarding the other variables and even the earlier model; these two tier attributes are also close to the intercept and close to each other. This indicates that the quality of vodka is an important factor within this alcohol category. However, one does not look to greatly outpace the other.
Additionally, through ANOVA, we can see that there is a statisticially significant difference betwen the two models (pvalue=.01158 presuming alpha=.5). Further calculations supporting this are noted in the code.
This model indicates the the quality of vodka is a strong impact around the change in sales compared to the other selected variables. However, the coefficients do sit fairly close to zero.
plot_coeffq7(q7)
pairs(LnDiff ~ LnLPrice + LnPrint + LnOut + LnBroad + Tier1 + Tier2, data=sved)
plot(q6$fitted.values, q6$residuals, ylab="Residuals", xlab="Fitted Values", main="Q6 -- Residuals vs. Fitted")
abline(0, 0) # the horizon
print(anova)
## Analysis of Variance Table
##
## Model 1: LnDiff ~ LnLPrice + LnPrint + LnOut + LnBroad
## Model 2: LnDiff ~ LnLPrice + LnPrint + LnOut + LnBroad + Tier1 + Tier2
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 245 6.8671
## 2 243 6.6197 2 0.24743 4.5413 0.01158 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
QUESTION: To understand the influence of competition and brand power, run a regression by adding the sum of sales of all the competing brands in the previous year (lagtotalminussales) to the independent variables in question 7.
## Warning: Ignoring unknown aesthetics: xmin, xmax
RESPONSE: As indicated in the below graphs regarding the coefficients, adding in the LagTotalMinusSales variable to the model continues this trend in building a stronger model. It’s worth noting that the intercept in this model shifts from positive to negative (-1.79). However, LagTotalMinusSales has a fairly small coefficient – as displayed in both coefficient graphs.
coefplot(q8, intercept=FALSE)
## Warning: Ignoring unknown aesthetics: xmin, xmax
plot_coeffq8(q8)
QUESTION: To measure the sales growth of new brands compared with the existent ones, include the variable Firstintro. to the independent variable set in question 8. Firstintro is equal to one in the first three years after a brand is introduced and is zero elsewhere.
## Warning: Ignoring unknown aesthetics: xmin, xmax
RESPONSE: Continuing the previous trend, adding in the First Intro variable to the model continues builds a stronger model. The intercept in this model shifts remains negative (-1.62), but the coefficient with the largest –and positive– impact is the First Intro variable (5.43). This variable –which represents the first three years of brand’s introduction– has a much more significant impact than LagTotaMinusSales, Tier 1, Tier 2 or any of the previous variables focused on in questions 6 through 8. It is strongly influential variable.
coefplot(q9, intercept=FALSE)
## Warning: Ignoring unknown aesthetics: xmin, xmax
plot_coeffq9(q9)
QUESTION: Why does the coefficient of price and advertising change in the above regressions?
RESPONSE: As the requested variables are added, the R squared values (and R squared adjusted values) continues to grow and the AIC calculations continue to diminish. This reflects that the added attributes are strengthening the models being built and it shows the impact of multiple factors impact the market. There is not just a 1:1 relationship between price and advertising year over year; that would not be a nuanced or accurate picture of what is driving the market.
QUESTION Based on your analysis of the Vodka data, what recommendations do you have for Cuvelier regarding the marketing mix for SVEDKA?
RESPONSE: Based on the above analysis, I would have Cuveilier continue to do competitive brand research on upcoming product launches in the SVEDKA category Tier 2 category (and not worry as much about the Tier 1 space or, for that matter, Tier 3). Additionally, I would dig more into which print efforts (e.g. publications, journalists, tactics) are working and look to entirely overhaul the outdoor expenditure efforts. I would also dig deeper into price sensitivity and consumer outlook around the category – combining both qualitative and quantitative methods.
However, it’s worth noting that the data provided does not explicitly showcase how new media platforms (or social media) is currently factored into the marketing mix and budget. It’d be also worthwhile to investigate the differences between marketing expenditures between paid, earned and owned efforts. If possible, it would be worthwhile to gather additional data to factor in these efforts for a more holistic view. From there, Cuvelier could have a more nuanced understanding of what efforts are working (e.g. a press release vs. a Snapchat filter vs. a thought leader review vs. even word of mouth).
NOTE: This HTML RMD appears to print 7 pages | http://rpubs.com/tatoonie/309048