Regression is the most basic modeling technique and is widely used across industries. You have probably heard of linear regression, logistic regression, nonlinear regression, and so on. Here, however, I will not follow that statistical categorization; instead, I classify regression models by their application in marketing.
Regression is one way to express marketing relationships mathematically. For instance, you may want to understand the relationship between investment in marketing program A and overall sales, holding all other factors constant. Regression helps you express this relationship (investment in A to sales) in mathematical form.
When it comes to decision making, though, what we are really concerned with is how much in sales an incremental investment of X will drive. In other words, what we truly care about is the incremental growth we can drive.
Mathematics (calculus, more specifically) gives us an intuitive way to express incremental growth: the (partial) derivative of the dependent variable with respect to the independent variable(s).
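library(ggplot2)  # needed for the plots below
# Simulated linear relationship between investment and sales
x <- 0:100
y <- 1450.54 + 5.3*x
linear <- data.frame(Invest = x, Sales = y)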
To give a very simple example, suppose the relationship between investment in A and sales is:
\[Y = \alpha + \beta X\]
ggplot(linear,aes(x = Invest, y = Sales)) + geom_line() + ggtitle("Scenario 1: Constant Incremental Growth")
Then the incremental growth (the increase in sales from a one-unit increase in investment) happens to be beta!
D(expression(1476.107+5.298*x),"x")
## [1] 5.298
What does this tell us?
The incremental growth is a constant; it is not affected by the level of investment in A at all. Whether your investment level is 5 or 5000, the incremental growth will be the same. In other words, the marginal return is constant regardless of the cumulative investment.
If you think about it, this is an unrealistic assumption. The incremental growth is likely to differ between an investment level of 5 and one of 5000: it might take a certain threshold of investment for a program to scale, or the program might reach market saturation. Nevertheless, this regression approach is widely used for one simple reason: simplicity. You can fit a linear regression even in Excel.
Even though marketing relationships often do not satisfy the assumption of constant incremental growth, linear regression is still very useful in exploratory analysis for getting a general idea of the relationship in the data, such as whether it is positive or not.
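As a minimal sketch of what such an exploratory fit might look like in R, the snippet below simulates noisy data around the line used in the example and recovers the slope with `lm()`. The noise level, the seed, and the names `sim`, `fit`, and `beta_hat` are assumptions made for illustration; they are not part of the original analysis.

```r
# Simulated data: intercept and slope mirror the linear example;
# the noise (sd = 20) and the seed are illustrative assumptions.
set.seed(42)
invest <- 0:100
sales  <- 1450.54 + 5.3 * invest + rnorm(length(invest), mean = 0, sd = 20)
sim    <- data.frame(Invest = invest, Sales = sales)

# Ordinary least squares fit of Sales on Invest
fit <- lm(Sales ~ Invest, data = sim)

# The slope estimates the constant incremental growth per unit of investment
beta_hat <- unname(coef(fit)["Invest"])
print(coef(fit))
```

The estimated slope should land close to 5.3, and its sign alone already answers the exploratory question of whether the relationship is positive.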
In real-world scenarios, the relationship between investment and sales is often not linear; that is, the incremental growth is not independent of cumulative effects.
One widely adopted assumption is diminishing returns: as the independent variable increases, the incremental growth decreases. For example, when our investment level is 5, 1 additional unit of investment might fuel an increase of 100 in sales, but when our investment level grows to 5000, 1 additional unit might fuel an increase of only 50. If you observe a similar trend in your data, you should consider incorporating diminishing returns into your model.
There are a couple of different ways to model diminishing returns. Negative exponential regression is one of them.
There are two parameters in negative exponential regression: saturation and proportion.
Saturation is the asymptote that determines the upper bound of the dependent variable. For example, if market share is our subject of study, then we know that the saturation for market share is at most 100%.
Proportion determines the rate of growth: the bigger the proportion, the faster the dependent variable approaches saturation. In the negative exponential model, the dependent variable grows in proportion to the remaining gap between saturation \(S\) and its current level, with the proportion \(\theta\) as the rate:
\[\frac{\partial Y}{\partial X} = \theta (S - Y)\]
It is called negative exponential regression because solving this equation gives:
\[Y = S\cdot(1-\exp(-\theta X))\]
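For readers who want the intermediate steps, here is a sketch of the derivation by separation of variables. It assumes that sales are zero when investment is zero, \(Y(0) = 0\), which is the initial condition that reproduces the closed form above:

\[\frac{dY}{S - Y} = \theta\,dX \quad\Rightarrow\quad -\ln(S - Y) = \theta X + C\]
\[Y(0) = 0 \;\Rightarrow\; C = -\ln S \quad\Rightarrow\quad \ln\frac{S - Y}{S} = -\theta X \quad\Rightarrow\quad Y = S\cdot(1-\exp(-\theta X))\]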
proportion <- c(0.1,0.3,0.6)
saturation <- c(500,1000)
x <- NULL
y <- NULL
parameter <- NULL
for(s in saturation) {
  for(p in proportion) {
    var_name <- paste0("Saturation: ",s,",","Proportion: ",p)
    parameter <- c(parameter, rep(var_name,101))
    y <- c(y,s*(1-exp(-p*(0:100))))
    x <- c(x,0:100)
  }
}
negative.exp <- data.frame(x = x, y = y, parameter = parameter)
print(ggplot(negative.exp,aes(x = x, y= y, group = parameter,color = parameter))+geom_line(size = 1)+ggtitle("Negative Exponential Curve at different saturation and proportion level"))
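To make the diminishing return concrete, the following sketch differentiates the negative exponential curve with `D()` and evaluates the incremental growth at a few investment levels. The parameter values are taken from the grid plotted above; the names `invest`, `marginal`, and `growth` and the chosen investment levels are assumptions for illustration.

```r
# Illustrative parameters, taken from the grid plotted above
S     <- 1000   # saturation
theta <- 0.1    # proportion

# Symbolic derivative of Y = S * (1 - exp(-theta * x)) with respect to x
marginal <- D(expression(S * (1 - exp(-theta * x))), "x")

# Evaluate the incremental growth at a few (hypothetical) investment levels
invest <- c(0, 10, 50)
growth <- eval(marginal, list(x = invest, S = S, theta = theta))

print(data.frame(Invest = invest, Incremental = growth))
```

The incremental growth starts at `theta * S` when investment is zero and shrinks toward zero as the curve approaches saturation.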
I’d like to take the product life cycle (diffusion of technology) as an example. Suppose you are an entrepreneur working diligently on your start-up. At the very first stage, only a few people come into contact with your product. As you invest more, the program starts to scale, and you expect the incremental growth to increase as investment increases (think of Uber at its early stage). But after you pass a certain threshold, when your product has reached a relatively large market share, it becomes increasingly difficult to acquire new customers as you approach saturation (think of Facebook or Microsoft). At this point, the incremental growth starts to decrease even as the investment level increases.
We use S-shape (sigmoid) regression to model this relationship.
There are three parameters in S-shape regression: saturation, proportion, and turning point.
Saturation is the asymptote that determines the upper bound of the dependent variable. For example, if market share is our subject of study, then we know that the saturation for market share is at most 100%.
Proportion determines the rate of growth: the bigger the proportion, the faster the dependent variable approaches saturation.
The turning point divides the curve into two segments. Before the turning point, the curve is convex and the incremental growth increases as the independent variable increases; after the turning point, the curve is concave and the incremental growth decreases as the independent variable increases.
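\[\frac{\partial Y}{\partial X} = \frac{\alpha}{\theta}\cdot Y(\theta - Y)\]
Solving this equation, with \(\theta\) as the saturation, \(\alpha\) as the proportion, and \(\tau\) as the turning point, gives:
\[Y = \frac{\theta}{1+\exp(-\alpha\cdot(X-\tau))}\]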
proportion <- c(0.2,0.5)
saturation <- c(500,1000)
turning.point <- 30
x <- NULL
y <- NULL
parameter <- NULL
for(t in turning.point) {
  for(s in saturation) {
    for(p in proportion) {
      var_name <- paste0("Saturation: ",s,",","Proportion: ",p,",Turning.Point: ",t)
      parameter <- c(parameter, rep(var_name,51))
      ind_x <- 0:50
      y <- c(y,s/(1+exp(-p*(ind_x-t))))
      x <- c(x,ind_x)
    }
  }
}
s.shape <- data.frame(x = x, y = y, parameter = parameter)
print(ggplot(s.shape,aes(x = x, y= y, group = parameter,color = parameter))+geom_line(size = 1)+ggtitle("S-shape Curve at different saturation and proportion level"))
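The role of the turning point can be checked numerically. The sketch below differentiates the S-shape curve with `D()` and finds the investment level at which incremental growth is largest. The parameter values come from the grid plotted above; the names `invest`, `marginal`, `growth`, and `peak` are assumptions for illustration.

```r
# Illustrative parameters, taken from the grid plotted above
theta <- 1000   # saturation
alpha <- 0.2    # proportion
tau   <- 30     # turning point

# Symbolic derivative of the S-shape curve with respect to x
marginal <- D(expression(theta / (1 + exp(-alpha * (x - tau)))), "x")

# Incremental growth over the same investment range as the plot
invest <- 0:50
growth <- eval(marginal, list(x = invest, theta = theta, alpha = alpha, tau = tau))

# Investment level at which incremental growth is largest
peak <- invest[which.max(growth)]
print(peak)
print(max(growth))
```

With these values the incremental growth rises until the turning point (investment level 30), where it reaches `alpha * theta / 4 = 50`, and falls afterwards.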
There are other models that can be used to describe marketing relationships, but these three are the most commonly used. In the next session, I will show you how to apply these models to real-life marketing scenarios and how to use R packages to fit them.