An e-commerce platform’s marketing department has ran 200 campaigns in across its 200 different markets. Depending on the market size, a market could be towns, cities, or countries; the platform’s advertisements were all aimed at its target market persona. The campaigns were conducted across 3 main channels: Google Ads, Meta (Facebook and Instagram), and LinkedIn. The marketing expenditure and sales data for each market were then queried from the database. A 4-variables dataset is made available as follows:
Sales: amount of units sold
google, facebook_insta, and
linkedin: the advertising expenditure for each channel, in
euro.
The table below illustrates what the first few observations in the dataset look like.
## google linkedin facebook_insta Sales
## 1 230100 37800 69200 22100
## 2 44500 39300 45100 10400
## 3 17200 45900 69300 12000
## 4 151500 41300 58500 16500
## 5 180800 10800 58400 17900
## 6 8700 48900 75000 7200
The aim is to inform how the different channels perform, and advise on a future steps.
I’ll start with a summary statistics table.
| Name | mdf |
| Number of rows | 200 |
| Number of columns | 4 |
| _______________________ | |
| Column type frequency: | |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 147042.5 | 85854.24 | 700 | 74375 | 149750 | 218825 | 296400 | |
| 0 | 1 | 23264.0 | 14846.81 | 0 | 9975 | 22900 | 36525 | 49600 | |
| facebook_insta | 0 | 1 | 30554.0 | 21778.62 | 300 | 12750 | 25750 | 45100 | 114000 |
| Sales | 0 | 1 | 15130.5 | 5283.89 | 1600 | 11000 | 16000 | 19050 | 27000 |
15k items per market were sold, on average. The table also illustrates that the marketing expenditure varied greatly by channel. On average, a LinkedIn campaign cost 23k euro, while a Google Ads campaign cost 147k euro.
Let’s see how they’re related to sales:
The correlation between sales and Google Ads seems to be the highest (0.9), followed by LinkedIn and Meta. Interesting to note that there’s moderate correlation between Meta and LinkedIn ads expenditure also.
I kept the x-axis (ads expenditure) the same length throughout the charts for perspective-taking, but also to illustrate the differences in expenditure between the marketing channels. A few things to mention here:
The Google Ads expenditure is most closely related to sales.
The effect of LinkedIn ads on sales seems to be the highest, on average (a steeper blue line), but the data is dispersed a lot. This suggests that LinkedIn ads sometimes work really well, other times they don’t.
The Meta Ads is the least related to sales, and it’s characterised by high variation (the dots are spread up-and-down a lot).
The challenge is to create a model that helps estimate the effect each of the marketing channels has. A linear regression model fitted on all of the data produced the following estimates:
##
## Call:
## lm(formula = Sales ~ ., data = mdf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7303.4 -824.4 -0.8 897.6 3747.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4625.1240788 307.5011646 15.041 <2e-16 ***
## google 0.0544458 0.0013752 39.592 <2e-16 ***
## linkedin 0.1070012 0.0084896 12.604 <2e-16 ***
## facebook_insta 0.0003357 0.0057881 0.058 0.954
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1662 on 196 degrees of freedom
## Multiple R-squared: 0.9026, Adjusted R-squared: 0.9011
## F-statistic: 605.4 on 3 and 196 DF, p-value: < 2.2e-16
The model indicates that both Google Ads and LinkedIn impact sales, but the Meta ads don’t. So I’ll drop the Meta advertising from the model.
##
## Call:
## lm(formula = Sales ~ google + linkedin, data = mdf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7313.1 -826.9 9.5 902.2 3748.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4630.879464 290.308085 15.95 <2e-16 ***
## google 0.054449 0.001371 39.73 <2e-16 ***
## linkedin 0.107175 0.007926 13.52 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1657 on 197 degrees of freedom
## Multiple R-squared: 0.9026, Adjusted R-squared: 0.9016
## F-statistic: 912.7 on 2 and 197 DF, p-value: < 2.2e-16
Google ads effect can be interpreted as such: a 1 euro increase in advertising on Google Ads is associated with a 0.054 units increase in sales. Put differently, about 18,52 euro of advertising expenditure on Google ads is expected to sell one additional unit.
For LinkedIn ads, a 9,10 euro of advertising expenditure is expected to sell one additional unit.
Looking at the regression model’s summary below, the model seems to explain some 90% of variation in sales. The model also has a close-to-zero p-value.
get_regression_summaries(lm2)
## # A tibble: 1 x 9
## r_squared adj_r_squared mse rmse sigma statistic p_value df nobs
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.903 0.902 2706053. 1645. 1657. 913. 0 2 200
In marketing, advertising on multiple channels tends to create better effects than advertising on any single channel alone. In statistical inference terminology, this multiplicative effect is known as an interaction effect. In this dataset I know there’s an interaction effect between Google and LinkedIn ads. However, in practice it can be hard to tell whether such an effect exists - especially between numeric variables. The best visual analysis tool that I could come up with for this dataset is this bubblechart:
## `geom_smooth()` using formula 'y ~ x'
The bubble chart illustrates that when more money are spent on LinkedIn ads, the total sales increases, as you’d expect.
But it isn’t all that obvious that a multiplicative effect is present. What indicates this to me is that when the Google Ads expenditure is low but radio expenditure is high (bottom left on the chart), the dots are below or close to the line of best fit (the red line) - thus the model tends to overestimate the sales volume. On the other hand, as the Google Ads expenditure increases, the model tends to underestimate the sales volume when LinkedIn expenditure is also high.
So the bubble chart at least indicates that there’s a distinct possibility that there is one. Statistical model testing is the next step.
##
## Call:
## lm(formula = Sales ~ google * linkedin, data = mdf)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6309.3 -868.8 -31.2 908.9 3641.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.193e+03 4.101e+02 15.101 < 2e-16 ***
## google 4.358e-02 2.489e-03 17.512 < 2e-16 ***
## linkedin 4.227e-02 1.473e-02 2.869 0.00457 **
## google:linkedin 4.431e-07 8.673e-08 5.110 0.000000763 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1561 on 196 degrees of freedom
## Multiple R-squared: 0.914, Adjusted R-squared: 0.9127
## F-statistic: 694.7 on 3 and 196 DF, p-value: < 2.2e-16
The interaction term (\(google*linkedin\)), is highly significant with a p-value close to 0. The R-squared value also increases from 0.903 to 0.914. This indicates that a moderation effect is present. However, interaction models are more difficult to interpret, and the additional variation explained is small. As an initial model, and for explainability purposes, I would leave it out at this stage. Instead, I would mention that using both channels is likely to “benefit on sales additionally”.
Interaction modelling also comes with a range of caveats but I’ll leave that for another post. Meanwhile, this article that suggests when you should include interaction effects.
This model suggests that Google and LinkedIn Ads are associated with sales, while advertising on Meta isn’t. To estimate the effect advertising on the two channels would have on sales (in units), use the following equation:
\[ Sales\_units = 4631 + 0.054*Google\_amount + 0.11*LinkedIn\_amount \]
Plugging in the average campaign costs of 147k euro for Google Ads and 23k euro for LinkedIn ads results in an estimated sales of ~15k units sold - the same as the actual average units sold!
When using the model, take note of the following:
Using both LinkedIn and Google ads is likely to increase sales more than advertising on only one of the two channels.
The model is useful for advertising similar amounts as done until now. Spending a lot less or a lot more than what has been done so far could lead to very different results.
The model should be used with caution, and further experimentation is advised. For example, advertising on Meta seems to have no effect on sales. However, this should be validated by future experimentation.