CUNY Data 621 - Spring 2022
Rather than discuss a specific modeling technique or statistical concept, in this post, I'd like to talk about a concept specific to the media and advertising industry that I think is fairly important, but where I have not yet seen a reliable, consistent approach for measurement.
This concept is called Adstock. It's not a great name, true. However, the concept is fairly simple. Advertisers devote considerable time, creative energy, and resources to making their messages resonate with media consumers. If we're doing our job correctly, the consumer should remember that message long after they've been exposed to it.
Further, if consumers have been exposed to that message multiple times during a given period, there should be some sort of cumulative, lasting effect on their minds and behavior that does not simply disappear the moment we stop serving the message.
So, Adstock is the theoretical measure of the cumulative effect of all this advertising, at any given moment. And we would expect it to gradually diminish over time, and thus we should be able to model it according to some mathematical formula.
Now that I’ve explained the concept, I think you’d agree it’s pretty important to have some kind of understanding of how much Adstock is out there right now for your ad campaigns, and how much time is left before consumers forget your message entirely.
And yet in my experience at several agencies, little to no attention is paid to this important concept at the media measurement and reporting level.
I suspect the main challenge to addressing the Adstock question is the difficulty of direct measurement. How do we efficiently measure the amount of advertising that people remember?
Brand Awareness survey data is probably a close proxy for Adstock effects, but (apart from being expensive) those are usually the response variables in our equation - the active question for advertisers is usually "How much do we need to spend to earn a marginal point in Brand Awareness (and stay ahead of our competitors' scores)?"
If we're using straight media impressions as a predictor in that particular model, with no adjustment for Adstock effects, we're assuming our consumers are goldfish swimming around in a bowl, forgetting our message the moment their attention is drawn away. No offense, goldfish, but that doesn't sound reasonable.
There is a wonderful book on business analytics and estimation, How to Measure Anything, by Douglas W. Hubbard. The title on the first page of the first section reads, "The Measurement Solution Exists," and it encapsulates the message that Hubbard drives home over and over again: no matter how tricky the problem, some kind of estimation is always possible, and it always improves your level of understanding and accuracy.
For a recent project with a major social media brand, I was challenged to examine some weekly time-series data of Brand Awareness survey scores and to try and figure out the relationship of those scores to the amount of media impressions that were being served in the U.S. market at those times, among other variables.
This was a great opportunity to try and test my Adstock theories – would an adjustment to the raw media impressions help produce a better model? After a survey of the Internet to try and find examples of prior work or a package to help estimate media effects (and coming up short), I decided to write something simple, plug it in, and see what happened.
Please excuse some of the code... the numbers have been truncated & altered from their original values, but they're representative.
# get data: vector of numeric values representing media served in consecutive
# periods... could be impressions, or views, or spend, etc.
media <- read.csv('input/adstock_data.csv', header=FALSE)$V1

There are three manual variables we can adjust. The most significant is halflife_days, an estimate of the days required for any media to lose 50% of its “effect” on consumers in the aggregate (set to 3 days here strictly as a guess).
# manually set these three vars
halflife_days <- 3 # estimated days for media to lose 50% effect
period_days <- 1 # 7 == 1 week
periods_addl <- 7 # additional periods to calculate after final media placement
# more vars
periods <- length(media)
periods_count <- periods + periods_addl
df <- data.frame(matrix(ncol=periods_count, nrow=0))

And for the rate of decay of media effectiveness, we apply the exponential formula:
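
Written out, with symbols introduced here purely for illustration ($x_m$ for `media[m]`, $h$ for `halflife_days`, $d$ for `period_days`, and $p$ for the number of periods elapsed since placement $m$), the contribution that the loop below appears to compute for each placement is:

$$
e_m(p) = x_m \cdot \exp\!\left(\ln(0.5)\,\frac{p\,d}{h}\right) = x_m \cdot 0.5^{\,p\,d/h}
$$

The Adstock in period $t$ is then the sum of the contributions from every placement up to and including $t$ (the code additionally rounds each term to a whole number):

$$
A_t = \sum_{m \le t} x_m \cdot 0.5^{\,(t-m)\,d/h}
$$

With $h = 3$ and $d = 1$, each period retains $0.5^{1/3} \approx 0.794$ of the previous period's effect.
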
# for each element, create a vector and append values from exponential decay formula.
for (m in seq_along(media)) {
  decay_vec <- c(media[m])
  # seq_len() avoids the backwards 1:0 sequence if periods_addl is ever set to 0
  for (p in seq_len(periods_count - m)) {
    decay_vec <- append(decay_vec, round(exp(log(0.5) *
      p / halflife_days * period_days) * media[m]))
  }
  # shift each consecutive vector right by prepending m-1 zeros, append to df
  padded_vec <- append(integer(m - 1), decay_vec)
  df <- rbind(df, padded_vec)
}

# fix df column labels
colnames(df) <- c(1:periods_count)
# sum up
adstock <- colSums(df)
# pad the media vector and merge to final df
media <- append(media, integer(length(adstock)-length(media)))
adstock_table <- data.frame(media, adstock)

| media | adstock |
|---|---|
| 1527360 | 1527360 |
| 45408710 | 46620976 |
| 56480103 | 93483197 |
| 26802735 | 101000397 |
| 20425402 | 100589470 |
| 15618 | 79853534 |
| 158 | 63379950 |
| 75272107 | 125576807 |
| 81897782 | 181568160 |
| 67594427 | 211705171 |
| 71082748 | 239113253 |
| 66384687 | 256169001 |
| 62311329 | 265632801 |
| 61308437 | 272141330 |
| 60095799 | 276094517 |
| 32157 | 219168520 |
| 0 | 173954168 |
| 0 | 138067517 |
| 0 | 109584259 |
| 0 | 86977087 |
| 0 | 69033758 |
| 0 | 54792132 |
| 0 | 43488543 |
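
Because every placement decays at the same rate, the nested loop above can also be sketched as a single recursive filter, where each period's Adstock is that period's media plus a fixed share of the previous period's Adstock. This is only an alternative sketch, not the code that produced the table: the names `retention`, `media_padded`, `adstock_rec`, and `adstock_table_rec` are made up for illustration, the `media` values are copied from the first eight rows of the table, and the result skips the per-term rounding, so it should agree with the table only to within a few units.

```r
# Media values copied from the first eight rows of the table above
media <- c(1527360, 45408710, 56480103, 26802735,
           20425402, 15618, 158, 75272107)

halflife_days <- 3  # same guess as in the post
period_days   <- 1  # 7 == 1 week
periods_addl  <- 7  # additional periods to calculate after final placement

# Share of the effect retained from one period to the next (assumed name)
retention <- 0.5^(period_days / halflife_days)

# Pad with zeros so the decay continues after the final placement
media_padded <- c(media, numeric(periods_addl))

# Recursive filter: adstock[t] = media[t] + retention * adstock[t - 1]
adstock_rec <- as.numeric(
  stats::filter(media_padded, filter = retention, method = "recursive")
)

adstock_table_rec <- data.frame(media = media_padded,
                                adstock = round(adstock_rec))
print(adstock_table_rec)
```

One reason to prefer this form is that it avoids building a data frame with one row per placement, which may matter for long series. The trade-off is that it only works when a single decay rate applies to all media.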
# quick viz
library(ggplot2)
period_labels <- as.numeric(row.names(adstock_table))
ggplot(adstock_table) +
  geom_area(aes(x=period_labels, y=adstock), fill='green', alpha=.5, stat='identity') +
  geom_bar(aes(x=period_labels, y=media), fill='blue', stat='identity') +
  ggtitle('Media and Adstock Effect') + xlab('period') + ylab('effect') +
  scale_x_continuous(labels = period_labels, breaks=period_labels) +
  # scale = 1e-6 converts raw counts to millions, so the suffix is "M"
  scale_y_continuous(labels = scales::label_number(suffix = " M", scale = 1e-6))

In the plot above, the blue bars represent the actual media impressions served on a given day, and the green “wave” represents the cumulative Adstock effect.
As we can see from the graph, the Adstock effect can be quite substantial, even with a moderate “half-life” of three days! It seems to me that there’s potentially some value in building Adstock-adjusted features for any number of standard models and reporting techniques.
The concept seems relevant to any important questions with a time-series aspect, such as: