Problem setting

An e-commerce platform’s marketing department has ran 200 campaigns in across its 200 different markets. Depending on the market size, a market could be towns, cities, or countries; the platform’s advertisements were all aimed at its target market persona. The campaigns were conducted across 3 main channels: Google Ads, Meta (Facebook and Instagram), and LinkedIn. The marketing expenditure and sales data for each market were then queried from the database. A 4-variables dataset is made available as follows:

Sales: amount of units sold
google, facebook_insta, and linkedin: the advertising expenditure for each channel, in euro.

The table below illustrates what the first few observations in the dataset look like.

##   google linkedin facebook_insta Sales
## 1 230100    37800          69200 22100
## 2  44500    39300          45100 10400
## 3  17200    45900          69300 12000
## 4 151500    41300          58500 16500
## 5 180800    10800          58400 17900
## 6   8700    48900          75000  7200

The aim is to inform how the different channels perform, and advise on a future steps.

Analysis

I’ll start with a summary statistics table.

Data summary
Name	mdf
Number of rows	200
Number of columns	4
_______________________
Column type frequency:
numeric	4
________________________
Group variables	None

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100
google	1	147042.5	85854.24	700	74375	149750	218825	296400
linkedin	1	23264.0	14846.81	0	9975	22900	36525	49600
facebook_insta	1	30554.0	21778.62	300	12750	25750	45100	114000
Sales	1	15130.5	5283.89	1600	11000	16000	19050	27000

15k items per market were sold, on average. The table also illustrates that the marketing expenditure varied greatly by channel. On average, a LinkedIn campaign cost 23k euro, while a Google Ads campaign cost 147k euro.

Let’s see how they’re related to sales:

The correlation between sales and Google Ads seems to be the highest (0.9), followed by LinkedIn and Meta. Interesting to note that there’s moderate correlation between Meta and LinkedIn ads expenditure also.

I kept the x-axis (ads expenditure) the same length throughout the charts for perspective-taking, but also to illustrate the differences in expenditure between the marketing channels. A few things to mention here:

The Google Ads expenditure is most closely related to sales.
The effect of LinkedIn ads on sales seems to be the highest, on average (a steeper blue line), but the data is dispersed a lot. This suggests that LinkedIn ads sometimes work really well, other times they don’t.
The Meta Ads is the least related to sales, and it’s characterised by high variation (the dots are spread up-and-down a lot).

The challenge is to create a model that helps estimate the effect each of the marketing channels has. A linear regression model fitted on all of the data produced the following estimates:

## 
## Call:
## lm(formula = Sales ~ ., data = mdf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7303.4  -824.4    -0.8   897.6  3747.3 
## 
## Coefficients:
##                    Estimate   Std. Error t value Pr(>|t|)    
## (Intercept)    4625.1240788  307.5011646  15.041   <2e-16 ***
## google            0.0544458    0.0013752  39.592   <2e-16 ***
## linkedin          0.1070012    0.0084896  12.604   <2e-16 ***
## facebook_insta    0.0003357    0.0057881   0.058    0.954    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1662 on 196 degrees of freedom
## Multiple R-squared:  0.9026, Adjusted R-squared:  0.9011 
## F-statistic: 605.4 on 3 and 196 DF,  p-value: < 2.2e-16

The model indicates that both Google Ads and LinkedIn impact sales, but the Meta ads don’t. So I’ll drop the Meta advertising from the model.

## 
## Call:
## lm(formula = Sales ~ google + linkedin, data = mdf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7313.1  -826.9     9.5   902.2  3748.4 
## 
## Coefficients:
##                Estimate  Std. Error t value Pr(>|t|)    
## (Intercept) 4630.879464  290.308085   15.95   <2e-16 ***
## google         0.054449    0.001371   39.73   <2e-16 ***
## linkedin       0.107175    0.007926   13.52   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1657 on 197 degrees of freedom
## Multiple R-squared:  0.9026, Adjusted R-squared:  0.9016 
## F-statistic: 912.7 on 2 and 197 DF,  p-value: < 2.2e-16

Google ads effect can be interpreted as such: a 1 euro increase in advertising on Google Ads is associated with a 0.054 units increase in sales. Put differently, about 18,52 euro of advertising expenditure on Google ads is expected to sell one additional unit.

For LinkedIn ads, a 9,10 euro of advertising expenditure is expected to sell one additional unit.

Looking at the regression model’s summary below, the model seems to explain some 90% of variation in sales. The model also has a close-to-zero p-value.

get_regression_summaries(lm2)

## # A tibble: 1 x 9
##   r_squared adj_r_squared      mse  rmse sigma statistic p_value    df  nobs
##       <dbl>         <dbl>    <dbl> <dbl> <dbl>     <dbl>   <dbl> <dbl> <dbl>
## 1     0.903         0.902 2706053. 1645. 1657.      913.       0     2   200

Note on interaction effects

In marketing, advertising on multiple channels tends to create better effects than advertising on any single channel alone. In statistical inference terminology, this multiplicative effect is known as an interaction effect. In this dataset I know there’s an interaction effect between Google and LinkedIn ads. However, in practice it can be hard to tell whether such an effect exists - especially between numeric variables. The best visual analysis tool that I could come up with for this dataset is this bubblechart:

## `geom_smooth()` using formula 'y ~ x'

The bubble chart illustrates that when more money are spent on LinkedIn ads, the total sales increases, as you’d expect.

But it isn’t all that obvious that a multiplicative effect is present. What indicates this to me is that when the Google Ads expenditure is low but radio expenditure is high (bottom left on the chart), the dots are below or close to the line of best fit (the red line) - thus the model tends to overestimate the sales volume. On the other hand, as the Google Ads expenditure increases, the model tends to underestimate the sales volume when LinkedIn expenditure is also high.

So the bubble chart at least indicates that there’s a distinct possibility that there is one. Statistical model testing is the next step.

## 
## Call:
## lm(formula = Sales ~ google * linkedin, data = mdf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6309.3  -868.8   -31.2   908.9  3641.5 
## 
## Coefficients:
##                  Estimate Std. Error t value    Pr(>|t|)    
## (Intercept)     6.193e+03  4.101e+02  15.101     < 2e-16 ***
## google          4.358e-02  2.489e-03  17.512     < 2e-16 ***
## linkedin        4.227e-02  1.473e-02   2.869     0.00457 ** 
## google:linkedin 4.431e-07  8.673e-08   5.110 0.000000763 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1561 on 196 degrees of freedom
## Multiple R-squared:  0.914,  Adjusted R-squared:  0.9127 
## F-statistic: 694.7 on 3 and 196 DF,  p-value: < 2.2e-16

The interaction term (\(google*linkedin\)), is highly significant with a p-value close to 0. The R-squared value also increases from 0.903 to 0.914. This indicates that a moderation effect is present. However, interaction models are more difficult to interpret, and the additional variation explained is small. As an initial model, and for explainability purposes, I would leave it out at this stage. Instead, I would mention that using both channels is likely to “benefit on sales additionally”.

Interaction modelling also comes with a range of caveats but I’ll leave that for another post. Meanwhile, this article that suggests when you should include interaction effects.

Recommended action

This model suggests that Google and LinkedIn Ads are associated with sales, while advertising on Meta isn’t. To estimate the effect advertising on the two channels would have on sales (in units), use the following equation:

\[ Sales\_units = 4631 + 0.054*Google\_amount + 0.11*LinkedIn\_amount \]

Plugging in the average campaign costs of 147k euro for Google Ads and 23k euro for LinkedIn ads results in an estimated sales of ~15k units sold - the same as the actual average units sold!

When using the model, take note of the following:

Using both LinkedIn and Google ads is likely to increase sales more than advertising on only one of the two channels.
The model is useful for advertising similar amounts as done until now. Spending a lot less or a lot more than what has been done so far could lead to very different results.
The model should be used with caution, and further experimentation is advised. For example, advertising on Meta seems to have no effect on sales. However, this should be validated by future experimentation.

Basic marketing expenditure analysis

Dragos

2022-08-11

Problem setting

Analysis

Note on interaction effects

Recommended action