Read.me
This notebook shows how to estimate the Bass model from the observed no. of adoptions, and how to make predictions with the estimated model. Some equations and mathematical details are included for illustration only. You do NOT have to memorize any of them.
To make things easier, I also include a general function for estimating the Bass model, which you can apply to future data. For details on this function, please check the last section.
Data Description
Please download the file “Diffusion_data.RData” from Canvas and load it. The file contains three datasets:
- “iphone”: quarterly iPhone sales (in units)
- “red_bull_ads”: daily no. of views of a Red Bull ad
- “sticky_notes”: daily no. of installations of the Sticky Notes Facebook app
Note that the three data frames share the same structure: each records the no. of “adoptions” in each time period. In Bass model terms, each element of no_of_adoptions is \(n(t)\).
In each data frame, the first variable, “time”, labels the time periods in chronological order. The length of a period depends on the product: quarters for the iphone data, and days for the Red Bull ad and the Sticky Notes app. The second variable, no_of_adoptions, is the no. of sales, views, or installations in each time period.
load("Diffusion_data.RData")
head(iphone)
## time no_of_adoptions
## 1 Q2/07 270000
## 2 Q3/07 1119000
## 3 Q4/07 2315000
## 4 Q1/08 1703000
## 5 Q2/08 717000
## 6 Q3/08 6892000
head(red_bull_ads)
## time no_of_adoptions
## 1 8/29/2007 328040
## 2 8/30/2007 342780
## 3 8/31/2007 322680
## 4 9/1/2007 341925
## 5 9/2/2007 279350
## 6 9/3/2007 332700
head(sticky_notes)
## time no_of_adoptions
## 1 8/29/2007 2340934
## 2 8/30/2007 2404545
## 3 8/31/2007 2438134
## 4 9/1/2007 2373025
## 5 9/2/2007 2642917
## 6 9/3/2007 2312700
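The time column holds labels (such as Q2/07) rather than numbers, while the Bass formulas used later refer to periods by their index \(t = 1,2,3,\cdots\). A minimal sketch of adding such an index, using the first six iphone rows printed above. The names iphone_head and period are illustrative only and are not part of the dataset:

```r
# Rebuild the first six rows shown by head(iphone); values copied from the printout
iphone_head <- data.frame(
  time = c("Q2/07", "Q3/07", "Q4/07", "Q1/08", "Q2/08", "Q3/08"),
  no_of_adoptions = c(270000, 1119000, 2315000, 1703000, 717000, 6892000)
)

# Number the periods 1, 2, 3, ... in row order
# (assumption: the rows are already sorted chronologically)
iphone_head$period <- seq_len(nrow(iphone_head))
iphone_head
```

The index relies on the rows already being in chronological order, so it is worth checking the time labels before using it.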
Analyzing New Product Diffusion with the Bass Model in 3 Steps
We now use the iphone data as an example of how to analyze the diffusion of a product with the Bass model in 3 steps. You may try the same procedure on the red_bull_ads and sticky_notes data.
First, a brief recap of how the Bass model is estimated. We derive the estimating equation from the Bass equation: \[
\frac{f\left( t \right)}{1-F\left( t \right)}=p+qF\left( t \right)
\] Note that the PDF \(f(t)\) and the CDF \(F(t)\) can be re-expressed in terms of the number of adoptions in each time period \(n(t)\), the cumulative number of adoptions \(N(t)\), and the total market potential \(M\), as follows: \[
\begin{cases}
f\left( t \right) =\frac{n\left( t \right)}{M}\\
F\left( t \right) =\frac{N\left( t \right)}{M}\\
\end{cases}
\] We then substitute these two expressions back into the Bass equation and get the main estimating equation: a regression of the no. of adoptions on the cumulative no. of adoptions and the squared cumulative no. of adoptions, as shown below:
\[
\frac{\frac{n\left( t \right)}{M}}{1-\frac{N\left( t \right)}{M}}=p+q\frac{N\left( t \right)}{M}
\\
\Rightarrow n\left( t \right) =\left[ p+\frac{qN\left( t \right)}{M} \right] \left[ M-N\left( t \right) \right]
\\
\Rightarrow n\left( t \right) =a+bN\left( t \right) +c\left[ N\left( t \right) \right] ^2
\] With a bit of algebra, we know that:
\[
\begin{cases}
a\,\,=pM\\
b=q-p\\
c=-\frac{q}{M}\\
\end{cases}\quad \text{and} \quad \begin{cases}
M=\frac{-b\pm \sqrt{b^2-4ac}}{2c}\\
p=\frac{a}{M}\\
q=-Mc\\
\end{cases}\,\,
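\] With the derivation in place, we can now start our analysis. Note that \(M\) is solved from the quadratic equation \(cM^2+bM+a=0\), so it has two roots. When \(a>0\) and \(c<0\), the product of the two roots, \(a/c\), is negative, so exactly one root is positive. We therefore use the larger root as the market potential \(M\).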
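To see the two mappings at work, the sketch below starts from made-up Bass parameters, converts them to \((a,b,c)\), and then recovers \((p,q,M)\) from the quadratic. The parameter values are hypothetical and chosen only for illustration; the variable is named c_coef rather than c to avoid masking the base function c().

```r
# Hypothetical Bass parameters, chosen only for illustration
p <- 0.03
q <- 0.38
M <- 1e6

# Forward mapping: (p, q, M) -> (a, b, c)
a <- p * M
b <- q - p
c_coef <- -q / M

# Backward mapping: solve c*M^2 + b*M + a = 0 and keep the larger root
disc <- sqrt(b^2 - 4 * a * c_coef)   # algebraically equal to p + q
roots <- c((-b - disc) / (2 * c_coef), (-b + disc) / (2 * c_coef))
M_hat <- max(roots)
p_hat <- a / M_hat
q_hat <- -c_coef * M_hat

# Recovered parameters, to compare with the p, q, M set at the top
round(c(p = p_hat, q = q_hat, M = M_hat), 6)
# The two roots: one positive (the market potential) and one negative
roots
```

The negative root equals \(-pM/q\), which is why only the larger root makes sense as a market size.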
Step 1 - Estimating the empirical Bass equation
The first step is to run a regression based on the Bass equation. We use \(n(t)\) as the dependent variable, and \(N(t)\) and \(N(t)^2\) as independent variables. However, the data only contain \(n(t)\), the no. of adoptions in each time period, so we first need to create a new variable of cumulative adoptions, \(N(t)\). For this, we use the cumsum() function from base R. Please use ?cumsum to see how this function works.
# creating a variable of cumulative adoptions
iphone$cum_adoptions <- cumsum(iphone$no_of_adoptions)
iphone$cum_adoptions
## [1] 270000 1389000 3704000 5407000 6124000 13016000 17379000
## [8] 21172000 26380000 33747000 42484000 51236000 59634000 73736000
## [15] 89971000 108618000 128956000 146029000 183073000 218137000 244165000
## [22] 271075000 318864000 356294000 387535000 421332000 472357000 516076000
## [29] 551279000 590551000 665019000 726189000 773723000 821769000 896548000
## [36] 947741000 988140000
With the new variable, we can run a regression to get the coefficients \((a,b,c)\), using the estimating equation described above. The model for the iphone data is specified as follows:
# run a regression with the no. of adoptions as DV and the cumulative adoptions and its squared term as IVs
# the I(cum_adoptions^2) means we are adding the squared term of cum_adoptions to the regression.
mdl_iphone <- lm(no_of_adoptions ~ 1 + cum_adoptions + I(cum_adoptions^2),iphone)
summary(mdl_iphone)
##
## Call:
## lm(formula = no_of_adoptions ~ 1 + cum_adoptions + I(cum_adoptions^2),
## data = iphone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14154003 -3426209 -1046772 2410199 21584443
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.820e+06 2.222e+06 1.269 0.213
## cum_adoptions 1.287e-01 1.504e-02 8.559 5.35e-10 ***
## I(cum_adoptions^2) -8.031e-11 1.679e-11 -4.785 3.26e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8080000 on 34 degrees of freedom
## Multiple R-squared: 0.864, Adjusted R-squared: 0.856
## F-statistic: 108 on 2 and 34 DF, p-value: 1.858e-15
From the regression, we can obtain the \((a,b,c)\) as in the empirical Bass equation, where \(a\) is the intercept, \(b\) is the coefficient of the cumulative adoptions, and \(c\) is the coefficient of the squared cumulative adoptions.
a_iphone <- mdl_iphone$coefficients[1]
b_iphone <- mdl_iphone$coefficients[2]
c_iphone <- mdl_iphone$coefficients[3]
a_iphone
## (Intercept)
## 2820363
b_iphone
## cum_adoptions
## 0.1286912
c_iphone
## I(cum_adoptions^2)
## -8.031406e-11
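The coefficients can also be pulled out with coef(), and unname() drops the labels such as (Intercept) so that later results print without them. A small sketch on the first six iphone values shown earlier. This toy fit is for illustration only, since six points are far too few for a meaningful estimate, and the names n_t, N_t, mdl_toy and abc are made up for this example:

```r
# First six quarterly iphone adoptions, copied from head(iphone)
n_t <- c(270000, 1119000, 2315000, 1703000, 717000, 6892000)
N_t <- cumsum(n_t)

# Same regression form as above, fitted on the toy vectors
mdl_toy <- lm(n_t ~ 1 + N_t + I(N_t^2))

# coef() returns the named coefficient vector; unname() strips the names
abc <- unname(coef(mdl_toy))
a_toy <- abc[1]   # intercept
b_toy <- abc[2]   # coefficient of cumulative adoptions
c_toy <- abc[3]   # coefficient of squared cumulative adoptions

names(coef(mdl_toy))
```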
Step 2 - Transforming coefficients to the Bass parameters
Given the \((a,b,c)\) from the empirical Bass equation, we can recover the Bass parameters \((p,q,M)\) with the equations shown above.
# obtaining M for iphones
# the quadratic equation has two roots; we take the larger one (the positive one) as the market size.
# For the max() function, see ?max for more details.
M_iphone <- max((-b_iphone-sqrt(b_iphone^2-4*a_iphone*c_iphone))/(2*c_iphone),
(-b_iphone+sqrt(b_iphone^2-4*a_iphone*c_iphone))/(2*c_iphone))
# given M_iphone, we calculate p and q
p_iphone <- a_iphone/M_iphone
q_iphone <- -c_iphone*M_iphone
Step 3 - Making predictions with the estimation results
With the estimated values of \((p,q,M)\), we can make predictions with the Bass model. The predictions need one more input, \(T\): the time periods you want to predict. Once \(T\) is specified, you can use the function that we discussed in the lecture to make predictions.
The equation is shown below, where \(t\) is the index of a time period, e.g., \(1,2,3,\cdots\): \[N(t) = M\left(1-\dfrac{p+q}{pe^{(p+q)t}+q}\right) \]
# assume we are predicting from quarter 38 to quarter 60 for iphones.
# note that the data end at quarter 37.
# a vector of quarter indices
T <- 38:60
# getting the cumulative no. of adoptions:
# N(t) = M*(1 - (p+q)/(p*exp((p+q)*t) + q)); note that "+ q" sits outside exp()
N_iphone <- M_iphone*(1 -
(p_iphone+q_iphone)/
(p_iphone*exp((p_iphone+q_iphone)*T) + q_iphone))
# prints 23 predicted cumulative adoptions, one for each quarter in T
N_iphone
These three steps show how to apply the Bass model in practice. You can follow the same procedure to replicate the analysis for the other two products: an online ad for Red Bull and the mobile app Sticky Notes.
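As a possible follow-up to Step 3, the formula for \(N(t)\) gives cumulative adoptions, so adoptions per period can be obtained by differencing consecutive values. The sketch below uses hypothetical parameter values rather than the iphone estimates, and bass_cumulative is an illustrative helper name, not a function from the lecture:

```r
# Cumulative adoptions N(t) = M * (1 - (p + q) / (p * exp((p + q) * t) + q))
bass_cumulative <- function(t, p, q, M) {
  M * (1 - (p + q) / (p * exp((p + q) * t) + q))
}

# Hypothetical parameters, for illustration only
p <- 0.03
q <- 0.38
M <- 1e6

periods <- 0:30
N_pred <- bass_cumulative(periods, p, q, M)

# Per-period adoptions n(t) = N(t) - N(t - 1), for t = 1, ..., 30
n_pred <- diff(N_pred)

N_pred[1]          # N(0): no adoptions before the launch
tail(N_pred, 1)    # cumulative adoptions at t = 30, which stay below M
which.max(n_pred)  # the period with the highest predicted adoptions
```

Two quick sanity checks follow from the formula itself: \(N(0)=0\), and \(N(t)\) approaches \(M\) as \(t\) grows.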
Two General Functions for Estimating the Bass Model and Making Predictions
The function for estimating the Bass Model estimate_bass
Next, I include a general function for estimating the Bass model, which may prove useful for your future analyses. The function takes one input: a vector of the no. of adoptions in each time period, sorted in chronological order. It outputs the estimated Bass parameters \((p,q,M)\).
Please run the lines below to create the function object. Detailed comments and explanations are inside the function. The function covers Step 1 and Step 2 of the analysis in the previous section.
estimate_bass <- function(x) {
# the function estimate_bass is a function we use for estimation
# it takes one input "x". "x" is a vector of no. of adoptions n(t) in a chronological order.
#Note that the values must be arranged in a chronological order!
# the function outputs a vector of c(p,q,M), the three parameters of Bass model
# create a new variable of cumulative adoptions
cum_x <- cumsum(x)
# run a regression with sales as DV and cum_x and cum_x^2 as IVs
mdl <- lm(x ~ 1 + cum_x + I(cum_x^2))
# get the coefficients
# a: the intercept
# b: the coefficient of cumulative adoptions
# c: the coefficient of squared cumulative adoptions
a <- mdl$coefficients[1]
b <- mdl$coefficients[2]
c <- mdl$coefficients[3]
# solving for p, q and M with a, b and c
M1 <- (-b-sqrt(b^2-4*a*c))/(2*c)
M2 <- (-b+sqrt(b^2-4*a*c))/(2*c)
M <- max(M1,M2) # M is set to the larger of M1 and M2
p <- a/M
q <- -c*M
# output a named vector
bass.par <- c(p,q,M)
names(bass.par) <- c("p","q","M")
return(bass.par)
}
Using this function, you can run the estimation in a single call. Next, let’s check the (p,q,M) of the two other products: red bull ads and sticky notes.
The (p,q,M) of red bull ads:
estimate_bass(red_bull_ads$no_of_adoptions)
## p q M
## 4.048058e-03 3.860639e-03 7.610345e+07
The (p,q,M) of sticky notes:
estimate_bass(sticky_notes$no_of_adoptions)
## p q M
## 1.682286e-03 1.215977e-02 1.545807e+09
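The recovery of \((p,q,M)\) assumes that the regression returns a negative \(c\) and a non-negative discriminant \(b^2-4ac\), which noisy data need not satisfy. A hedged sketch of a check that could be run on the coefficients before trusting the result. The function check_bass_coefs is a hypothetical helper, not part of estimate_bass, and the example coefficients are made up:

```r
# Hypothetical helper: reports whether regression coefficients (a, b, c)
# can be mapped to positive Bass parameters. Not part of estimate_bass.
check_bass_coefs <- function(a, b, c_coef) {
  if (c_coef >= 0) {
    return("c is not negative, so q = -c*M would not be positive")
  }
  disc <- b^2 - 4 * a * c_coef
  if (disc < 0) {
    return("negative discriminant: M has no real solution")
  }
  # same root selection as in estimate_bass: keep the larger root
  M <- max((-b - sqrt(disc)) / (2 * c_coef), (-b + sqrt(disc)) / (2 * c_coef))
  p <- a / M
  q <- -c_coef * M
  if (p <= 0 || q <= 0) {
    return("p or q is not positive")
  }
  "ok"
}

# coefficients implied by p = 0.03, q = 0.38, M = 1e6 (made-up values)
check_bass_coefs(30000, 0.35, -3.8e-7)
# same coefficients with the sign of c flipped
check_bass_coefs(30000, 0.35, 3.8e-7)
```

Returning a message rather than stopping keeps the helper easy to use inside a loop over several products.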
A function for making predictions with Bass Model predict_bass
The function predict_bass takes two inputs. The first input, T, is the time periods for which you would like to make predictions. It can be a single number or a numeric vector. For example, if you want to predict over the next year and use a week as your time period, you can set T = 1:52. The second input, bass.par, is the Bass parameters: a numeric vector with three members in the order p, q, M. Note that the order matters!
predict_bass <- function(T,bass.par){
# the function "predict_bass" is to predict using the Bass parameters
# it takes in two inputs:
#"T" - the time periods of your predictions
#"bass.par" - the bass parameters in the order of (p,q,M)
# it outputs the cumulative no. of adoptions during T
# unpack bass parameters
p <- bass.par[1]
q <- bass.par[2]
M <- bass.par[3]
# making predictions
# N(t) = M*(1 - (p+q)/(p*exp((p+q)*t) + q)); note that "+ q" sits outside exp()
N <- M*(1 - (p+q)/(p*exp((p+q)*T) + q))
# return values
return(N)
}
We can use this function to predict the cumulative iphone adoptions with the same T and the parameters estimated above. Because it evaluates the \(N(t)\) equation from Step 3, the result should agree with N_iphone.
predict_bass(T,c(p_iphone,q_iphone,M_iphone))