How to calculate marketing impact when you can’t A/B test

1. Introduction

How to calculate marketing impact when you can’t A/B test.

One of the key activities for companies to become more data-driven is A/B testing. There are going to be times where running a control experiment for something like activating a new channel, may not be possible. Also, customer behavior can sometimes be difficult to untangle amongst all of the different factors that influence buying decisions.

So how can we evaluate the impact of a change that’s already taken place? We can use a before and after analysis. Let’s get started. Rather than using a pre-selected dataset, we will create a synthetic dataset to use for our analysis.

# Install pacman if needed
if (!require("pacman")) install.packages("pacman")

## Loading required package: pacman

# load packages
pacman::p_load(pacman,
  tidyverse, openxlsx, psych, ggcorrplot, corrplot, cowplot, ggthemes)

2. Create synthetic data

First, set the seed for reproducibility when simulating data especially if you want to re-create the same dataset later.

#Set seed for reproducibility
set.seed(32250)

The dataset we want to create is as follows:

Sales date (based on week level granularity)
Revenue
Paid social ads

sales_date <- seq(as.Date('2021-01-01'), by = 'week', length.out=52)

#Check results
print(sales_date)

##  [1] "2021-01-01" "2021-01-08" "2021-01-15" "2021-01-22" "2021-01-29"
##  [6] "2021-02-05" "2021-02-12" "2021-02-19" "2021-02-26" "2021-03-05"
## [11] "2021-03-12" "2021-03-19" "2021-03-26" "2021-04-02" "2021-04-09"
## [16] "2021-04-16" "2021-04-23" "2021-04-30" "2021-05-07" "2021-05-14"
## [21] "2021-05-21" "2021-05-28" "2021-06-04" "2021-06-11" "2021-06-18"
## [26] "2021-06-25" "2021-07-02" "2021-07-09" "2021-07-16" "2021-07-23"
## [31] "2021-07-30" "2021-08-06" "2021-08-13" "2021-08-20" "2021-08-27"
## [36] "2021-09-03" "2021-09-10" "2021-09-17" "2021-09-24" "2021-10-01"
## [41] "2021-10-08" "2021-10-15" "2021-10-22" "2021-10-29" "2021-11-05"
## [46] "2021-11-12" "2021-11-19" "2021-11-26" "2021-12-03" "2021-12-10"
## [51] "2021-12-17" "2021-12-24"

social_ad_spend_avg <- 220

social_1 <-rep(0, length(sales_date) - 20)

print(social_1)

##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

#Represents 10 weeks of social ad spending
social_2 <-rep(social_ad_spend_avg, length(sales_date) - length(social_1) - 10)

print(social_2)

##  [1] 220 220 220 220 220 220 220 220 220 220

social_3 <-rep(0, length(sales_date) - length(social_1) - length(social_2))

print(social_3)

##  [1] 0 0 0 0 0 0 0 0 0 0

#Combine into 1 vector
social <- c(social_1, social_2, social_3)

#Check if we have 52 weeks worth of data
length(social)

## [1] 52

Next is to generate revenue and place into a temporary revenue variable.

Revenue data is randomly generated based on a normal distribution.

#first argument is the number of draws
#2nd argument is the mean value per week of sales
temp_revenue <- rnorm(n=length(sales_date), mean = 3800, sd = 500)


#Check output
temp_revenue

##  [1] 3183.705 3526.286 4656.784 3956.365 3558.721 3882.231 4427.060 2797.395
##  [9] 3911.568 3324.583 3037.266 3443.502 3890.266 4149.344 4150.684 3767.462
## [17] 3504.365 4342.265 3769.305 3587.612 3812.881 4436.397 4901.101 4844.608
## [25] 4146.012 2979.749 3975.573 3591.830 3662.413 4121.466 4424.188 4664.163
## [33] 3961.647 3006.447 4063.691 3927.631 3936.741 3126.226 3394.774 3620.912
## [41] 3489.857 3224.486 3741.071 3144.567 4278.333 3484.772 4106.991 3850.383
## [49] 3711.587 3872.340 4542.720 4269.602

#Initialize data frame
df <- data.frame(sales_date,
                 temp_revenue, social)

In our last step, we want to scale revenue up for the periods during and after additional marketing ad spend.

df1 <- df %>% 
    mutate(revenue =
      case_when(
        sales_date >= "2021-08-13" ~ (temp_revenue + (temp_revenue * 0.35)), 
        TRUE ~ temp_revenue)
      )

head(df1)

The first thing that we want to do is define our (before and after) time periods, we need to add a column for the advertising periods.

df_final <- df1 %>% 
  select(-temp_revenue) %>% 
  mutate(advert_period = ifelse(sales_date < "2021-08-13", "before", "after" ))

df_final$advert_period <- factor(df_final$advert_period, levels=c('before','after'))

glimpse(df_final)

## Rows: 52
## Columns: 4
## $ sales_date    <date> 2021-01-01, 2021-01-08, 2021-01-15, 2021-01-22, 2021-01…
## $ social        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ revenue       <dbl> 3183.705, 3526.286, 4656.784, 3956.365, 3558.721, 3882.2…
## $ advert_period <fct> before, before, before, before, before, before, before, …

This dataset is 100% synthetic. It’s simulated data, but let’s visualize the data to make sure the dataset is good enough for performing our analysis.

3. Visualize dataset

p1 <- ggplot(df_final, aes(sales_date, revenue)) + geom_line() + theme_hc() + scale_y_continuous(expand = c(0, 0), limits = c(0, 7000)) + labs(x=" ", y=" ")
p2 <- ggplot(df_final, aes(revenue, advert_period)) + geom_boxplot() + theme_hc() + coord_flip() + labs(x = " ", y="Advertising Period")

plot_grid(p1, p2, nrow = 2, ncol = 1, labels = c("Revenue over time", "Revenue by social advertising period"))

4. Analysis

Now, let’s create a new dataframe. We want equal number of weeks of before and during our social ad spending time periods. We spent during a 10 week period so we want the 10 weeks before.

#Filter on dates from the start date of our spending minus 10 weeks plus the entire social spending period. To subtract 10 weeks from our spending date, we will multiply 7 days by 10 to get the right interval in weeks.
before_after_df <- df_final %>% 
        filter(between(sales_date, as.Date('2021-08-13') - 7*10, as.Date('2021-10-15')))


before_after_df

Then we will take the average revenue for both time periods.

ads_group <- before_after_df %>% 
  group_by(advert_period) %>% 
  summarize_at('revenue', .funs=mean) %>% 
  arrange(advert_period)
ads_group

ads_group

Let’s extract the average revenue for each period and calculate the percentage difference. With the average difference in revenue, we can calculate the uplift.

beforeMean<- as.numeric(ads_group %>% filter(advert_period == 'before') %>% select(revenue))


afterMean <- as.numeric(ads_group %>% filter(advert_period == 'after') %>% select(revenue))
 
uplift <- round(afterMean - beforeMean,2) 


percChange<- round(((afterMean-beforeMean)/beforeMean) *100,2)


print(paste("The difference in revenue from before and after is", uplift, "US Dollars", "with the percentage difference being", percChange, "%"))

## [1] "The difference in revenue from before and after is 695.47 US Dollars with the percentage difference being 16.83 %"

First, let’s grab the average spend.

#First calculate the average spend
avg_spend_df <- df_final %>% 
  filter(between(sales_date, as.Date('2021-08-13'), as.Date('2021-10-15'))) %>% 
  summarize(avg_spend = mean(social))

(avg_spend <- avg_spend_df[[1]])

## [1] 220

Now calculate the return on ad spend based on uplift and average spending.

#Take the uplift which is the difference in average revenue subtract the average spend. Then divide by the average spend.
roas <- round((uplift - avg_spend)/avg_spend,2)
roas

## [1] 2.16

If our revenue is in dollars, then for every $1 spent, we earned $2.16 dollars in revenue.

5. Final Summary

Before and after analysis can help provide some context to evaluate initiatives that could not be A/B tested, but we still want to understand the impact.

6. References

[1] Chapman, C. and McDonnell Feit, E., (2015). R for marketing research and analytics. Cham: Springer, pp.47-59.