Read.me
This notebook shows you how to estimate the Bass model from the observed no. of adoptions, and how to make predictions with the estimated Bass model. Some equations and mathematical details are included in the notebook solely for show-and-tell purposes. You do NOT have to memorize any of these equations.
To make things easier, I also include general functions to estimate the Bass model and to make predictions with it. You can apply these functions to future data. For more details on these functions, see the last section.
Data Description
Please download and load the data file “Diffusion_data.Rdata” from Canvas. You will find three datasets in the file:
- “iphone”: the quarterly sales of iphone (in units)
- “red_bull_ads”: the daily no. of watches of red bull ads
- “sticky_notes”: the daily installations of the Sticky Notes app on Facebook
Note that the three data frames have the same structure. They all contain the no. of “adoptions” in different time periods. In Bass model terms, each element of the no_of_adoptions variable is \(n(t)\).
In all three data frames, the first variable, time, refers to the time periods in chronological order. The time periods depend on the product: for the iphone data, they are quarters of the year; for the red bull ads and the sticky notes app, they are days. The second variable, no_of_adoptions, refers to the no. of sales, downloads, or watches in each time period.
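The Bass functions in this notebook assume the rows are in chronological order. Because the daily time column is stored as month/day/year text, sorting it as text would misorder the dates (for example, "10/1/2007" sorts before "9/1/2007"). A minimal sketch, assuming the column is character in "%m/%d/%Y" format (wrap it in as.character() first if it was loaded as a factor); the toy data frame is hypothetical:

```r
# Hypothetical toy data with the same structure as red_bull_ads / sticky_notes,
# deliberately out of chronological order
toy <- data.frame(
  time = c("10/1/2007", "9/1/2007", "8/31/2007"),
  no_of_adoptions = c(300, 200, 100),
  stringsAsFactors = FALSE
)

# Parse the text into Date objects so that ordering is chronological, not alphabetical
toy_dates <- as.Date(toy$time, format = "%m/%d/%Y")
toy_sorted <- toy[order(toy_dates), ]
toy_sorted
```

After sorting, toy_sorted$no_of_adoptions is in the order expected by cumsum() and the estimation function.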
head(iphone)
##    time no_of_adoptions
## 1 Q2/07          270000
## 2 Q3/07         1119000
## 3 Q4/07         2315000
## 4 Q1/08         1703000
## 5 Q2/08          717000
## 6 Q3/08         6892000
head(red_bull_ads)
##        time no_of_adoptions
## 1 8/29/2007          328040
## 2 8/30/2007          342780
## 3 8/31/2007          322680
## 4  9/1/2007          341925
## 5  9/2/2007          279350
## 6  9/3/2007          332700
head(sticky_notes)
##        time no_of_adoptions
## 1 8/29/2007         2340934
## 2 8/30/2007         2404545
## 3 8/31/2007         2438134
## 4  9/1/2007         2373025
## 5  9/2/2007         2642917
## 6  9/3/2007         2312700
Analyzing New Product Diffusion with the Bass Model in 3 Steps
We now use the iphone data as an example to show how to analyze the diffusion of iPhones with the Bass model in three steps. You can try the same procedure on the red_bull_ads and sticky_notes data.
A brief recap of how the Bass model is estimated: we derive the estimating equation from the Bass equation
\[
\frac{f\left( t \right)}{1-F\left( t \right)}=p+qF\left( t \right)
\]
Note that the PDF \(f(t)\) and the CDF \(F(t)\) can be re-expressed with the number of adoptions in each time period \(n(t)\), the cumulative number of adoptions \(N(t)\), and the total market potential \(M\), as follows:
\[
\begin{cases}
f\left( t \right) =\frac{n\left( t \right)}{M}\\
F\left( t \right) =\frac{N\left( t \right)}{M}\\
\end{cases}
\]
We then substitute these two equations back into the Bass equation and obtain the main estimating equation: a regression of the no. of adoptions on the cumulative no. of adoptions and the squared cumulative no. of adoptions, as shown below:
\[
\frac{\frac{n\left( t \right)}{M}}{1-\frac{N\left( t \right)}{M}}=p+q\frac{N\left( t \right)}{M}
\\
\Rightarrow n\left( t \right) =\left[ p+\frac{qN\left( t \right)}{M} \right] \left[ M-N\left( t \right) \right]
\\
\Rightarrow n\left( t \right) =pM+\left( q-p \right) N\left( t \right) -\frac{q}{M}\left[ N\left( t \right) \right] ^2
\\
\Rightarrow n\left( t \right) =a+bN\left( t \right) +c\left[ N\left( t \right) \right] ^2
\]
Matching the coefficients of the last two lines, we know that:
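\[
\begin{cases}
a=pM\\
b=q-p\\
c=-\frac{q}{M}\\
\end{cases}\quad\text{and}\quad\begin{cases}
M=\frac{-b\pm\sqrt{b^2-4ac}}{2c}\\
p=\frac{a}{M}\\
q=-Mc\\
\end{cases}
\]
With the math derivation, we can now start our analysis. Note that M is obtained by substituting \(p=a/M\) and \(q=-Mc\) into \(b=q-p\), which gives the quadratic equation \(cM^2+bM+a=0\), so M has two roots. We use the larger root as the market potential M. When \(c<0\) and \(a>0\), as in the iphone data below, the two roots have opposite signs and the larger root is the positive one, \(M=\frac{-b-\sqrt{b^2-4ac}}{2c}\).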
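The algebra above can be checked numerically. A minimal sketch with hypothetical values of (p, q, M): it first confirms that the Bass form expands into a + bN + cN^2, then recovers (p, q, M) from (a, b, c) by taking the larger root. The coefficient c is stored as c_coef only to keep it distinct from base R's c() function:

```r
# Hypothetical Bass parameters, chosen only for illustration
p <- 0.002
q <- 0.13
M <- 1.6e9

# Expansion: [p + qN/M][M - N] equals a + bN + cN^2 with the mapping above
a <- p * M
b <- q - p
c_coef <- -q / M
N <- seq(0, M, length.out = 5)
lhs <- (p + q * N / M) * (M - N)
rhs <- a + b * N + c_coef * N^2
all.equal(lhs, rhs)

# Back-transformation: M solves c*M^2 + b*M + a = 0; keep the larger root
disc <- b^2 - 4 * a * c_coef
roots <- c((-b - sqrt(disc)) / (2 * c_coef), (-b + sqrt(disc)) / (2 * c_coef))
M_hat <- max(roots)
p_hat <- a / M_hat
q_hat <- -c_coef * M_hat
c(p = p_hat, q = q_hat, M = M_hat)
```

Here the other root equals -pM/q, which is negative, so max() picks the true market size.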
Step 1 - Estimating the empirical Bass equation
The first step is to run a regression based on the Bass equation, with \(n(t)\) as the dependent variable and \(N(t)\) and \(N(t)^2\) as independent variables. However, the data contain only \(n(t)\), the no. of adoptions in each time period, so we first need to create a new variable of cumulative adoptions, \(N(t)\). For this, we use the cumsum() function from base R. Run ?cumsum to see how this function works.
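A tiny illustration with the first three quarterly iphone sales shows what cumsum() produces, and that diff() undoes it:

```r
# The first three quarterly iphone sales, n(t), from the data above
n_t <- c(270000, 1119000, 2315000)

# N(t): running total of adoptions up to and including period t
N_t <- cumsum(n_t)
N_t
# [1]  270000 1389000 3704000

# Going back: the first value plus successive differences recovers n(t)
c(N_t[1], diff(N_t))
```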
# creating a variable of cumulative adoptions
iphone$cum_adoptions <- cumsum(iphone$no_of_adoptions)
iphone$cum_adoptions
## [1] 270000 1389000 3704000 5407000 6124000 13016000 17379000
## [8] 21172000 26380000 33747000 42484000 51236000 59634000 73736000
## [15] 89971000 108618000 128956000 146029000 183073000 218137000 244165000
## [22] 271075000 318864000 356294000 387535000 421332000 472357000 516076000
## [29] 551279000 590551000 665019000 726189000 773723000 821769000 896548000
## [36] 947741000 988140000
With the new variable, we can then run a regression to get the coefficients \((a,b,c)\) of the estimating equation above. For the iphone data, the model is specified as follows:
# run a regression with the no. of adoptions as the DV and the cumulative adoptions and its squared term as IVs
# I(cum_adoptions^2) adds the squared term of cum_adoptions to the regression
mdl_iphone <- lm(no_of_adoptions ~ 1 + cum_adoptions + I(cum_adoptions^2), data = iphone)
summary(mdl_iphone)
##
## Call:
## lm(formula = no_of_adoptions ~ 1 + cum_adoptions + I(cum_adoptions^2),
## data = iphone)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -14154003  -3426209  -1046772   2410199  21584443
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)         2.820e+06  2.222e+06   1.269    0.213
## cum_adoptions       1.287e-01  1.504e-02   8.559 5.35e-10 ***
## I(cum_adoptions^2) -8.031e-11  1.679e-11  -4.785 3.26e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8080000 on 34 degrees of freedom
## Multiple R-squared: 0.864, Adjusted R-squared: 0.856
## F-statistic: 108 on 2 and 34 DF, p-value: 1.858e-15
From the regression, we can obtain the \((a,b,c)\) as in the empirical Bass equation, where \(a\) is the intercept, \(b\) is the coefficient of the cumulative adoptions, and \(c\) is the coefficient of the squared cumulative adoptions.
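Indexing by position works because lm() lists the intercept first, followed by the terms in the order they appear in the formula. An alternative, sketched here on hypothetical noiseless data, is to look the coefficients up by name with coef() and [[ ]]; this also drops the "(Intercept)" label that otherwise carries over into p_iphone:

```r
# Hypothetical noiseless data that follow y = 3 + 2x - 0.5x^2 exactly
x <- 1:10
y <- 3 + 2 * x - 0.5 * x^2
mdl <- lm(y ~ 1 + x + I(x^2))

# coef() returns the same named vector as mdl$coefficients;
# [[ ]] with a name returns a plain, unnamed number
a_hat <- coef(mdl)[["(Intercept)"]]
b_hat <- coef(mdl)[["x"]]
c_hat <- coef(mdl)[["I(x^2)"]]
c(a_hat, b_hat, c_hat)
```

For the iphone model, the corresponding names are "(Intercept)", "cum_adoptions", and "I(cum_adoptions^2)", as shown in the summary output.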
a_iphone <- mdl_iphone$coefficients[1]
b_iphone <- mdl_iphone$coefficients[2]
c_iphone <- mdl_iphone$coefficients[3]
a_iphone
## (Intercept)
##     2820363
b_iphone
## cum_adoptions
##     0.1286912
c_iphone
## I(cum_adoptions^2)
##      -8.031406e-11
Step 2 - Transforming coefficients to the Bass parameters
Given \((a,b,c)\) from the empirical Bass equation, we can recover the Bass parameters \((p,q,M)\) with the equations shown above.
# obtaining M for iphones
# the quadratic equation has two roots; we take the larger (positive) one as the market size
# for the max() function, see ?max
M_iphone <- max((-b_iphone-sqrt(b_iphone^2-4*a_iphone*c_iphone))/(2*c_iphone),
                (-b_iphone+sqrt(b_iphone^2-4*a_iphone*c_iphone))/(2*c_iphone))
# given M_iphone, we calculate p and q
p_iphone <- a_iphone/M_iphone
q_iphone <- -c_iphone*M_iphone
## p_iphone q_iphone M_iphone
## 1.736705e-03 1.304279e-01 1.623974e+09
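As a quick sanity check, the rounded coefficients printed in Step 1 can be plugged into the same formulas by hand. A minimal sketch that uses only those printed values; it reproduces the printed \((p,q,M)\) up to rounding:

```r
# Coefficients as printed in Step 1 (rounded to 7 significant digits)
a <- 2820363
b <- 0.1286912
c_coef <- -8.031406e-11   # named c_coef to keep it distinct from base R's c()

# With c < 0 and a > 0, the larger root is the positive one
M <- max((-b - sqrt(b^2 - 4 * a * c_coef)) / (2 * c_coef),
         (-b + sqrt(b^2 - 4 * a * c_coef)) / (2 * c_coef))
p <- a / M
q <- -c_coef * M
signif(c(p = p, q = q, M = M), 4)
```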
Step 3 - Making predictions with the estimation results
With the estimated values of \((p,q,M)\), we can then make predictions with the Bass model. For the predictions, you need one more input, \(T\): the time periods you want to predict. Once \(T\) is specified, you can use the formula we discussed in the lecture to make predictions.
The formula is shown below, where \(t\) is the index of a time period (e.g., time period \(1,2,3,\cdots\)) and \(N(t)\) is the predicted cumulative no. of adoptions by period \(t\):
\[
N(t) = M\left(1-\dfrac{p+q}{pe^{(p+q)t}+q}\right)
\]
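To see that this closed form is consistent with the Bass equation used for estimation, one can check numerically that its derivative equals \([p+qN(t)/M][M-N(t)]\) and that it starts at zero. A minimal sketch with hypothetical parameter values close to the iphone estimates; N_fun and the step size h are illustrative choices:

```r
# Hypothetical parameter values, used only for illustration
p <- 0.0017
q <- 0.13
M <- 1.6e9

# Closed-form cumulative adoptions N(t)
N_fun <- function(t) M * (1 - (p + q) / (p * exp((p + q) * t) + q))

periods <- 1:60
h <- 1e-4
# Central finite-difference approximation of dN/dt
dN_dt <- (N_fun(periods + h) - N_fun(periods - h)) / (2 * h)
# Right-hand side of the Bass equation: [p + qN/M][M - N]
bass_rhs <- (p + q * N_fun(periods) / M) * (M - N_fun(periods))

N_fun(0)                          # the curve starts at zero adoptions
max(abs(dN_dt / bass_rhs - 1))    # very small: the closed form satisfies the Bass equation
```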
# assume we are predicting from quarter 38 to quarter 60 for iphones
# note that the data end at quarter 37
# a vector of quarter indices (note: assigning T masks R's shorthand T for TRUE)
T <- 38:60
# getting cumulative no. of adoptions
N_iphone <- M_iphone*(1 -
(p_iphone+q_iphone)/
(p_iphone*exp((p_iphone+q_iphone)*T)+q_iphone))
N_iphone
## [1] 1079193147 1126238861 1170894001 1212977303 1252369828 1289011098
## [7] 1322893267 1354054012 1382568778 1408542931 1432104245 1453396042
## [13] 1472571161 1489786845 1505200575 1518966777 1531234344 1542144862
## [19] 1551831422 1560417931 1568018803 1574738948 1580673993
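N_iphone holds cumulative adoptions. If you also want the predicted adoptions within each quarter, one option is to difference the cumulative curve, \(n(t)=N(t)-N(t-1)\). A minimal sketch using the printed iphone estimates (typed in by hand, so the results match N_iphone only up to rounding); N_fun, periods, and n_per_quarter are illustrative names:

```r
# Estimated iphone parameters as printed in Step 2
p <- 1.736705e-03
q <- 1.304279e-01
M <- 1.623974e+09

# Cumulative predictions, same formula as N_iphone
N_fun <- function(t) M * (1 - (p + q) / (p * exp((p + q) * t) + q))

periods <- 38:60
# Evaluate N(t) starting from quarter 37, the last observed quarter
N_cum <- N_fun(c(min(periods) - 1, periods))
# Predicted adoptions in each quarter: N(t) - N(t - 1)
n_per_quarter <- diff(N_cum)
names(n_per_quarter) <- periods
round(n_per_quarter)
```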
The above covers the three steps for applying the Bass model in practice. You can follow the same procedure to replicate the analysis for the other two products: an online ad for Red Bull and the mobile app Sticky Notes.
Two General Functions for Estimating the Bass Model and Making Predictions
The function for estimating the Bass Model
estimate_bass
Next, I include a general function to estimate the Bass model, which may prove useful for your future analysis. The function takes one input: a vector of the no. of adoptions in each time period, sorted in chronological order. It outputs the estimated Bass parameters \((p,q,M)\).
Run the lines below to create the function object. Detailed comments and explanations are included in the function. The function covers Step 1 and Step 2 of the analysis in the previous section.
estimate_bass <- function(x) {
# the function estimate_bass is a function we use for estimation
# it takes one input "x". "x" is a vector of no. of adoptions n(t) in a chronological order.
#Note that the values must be arranged in a chronological order!
# the function outputs a vector of c(p,q,M), the three parameters of Bass model
# create a new variable of cumulative adoptions
cum_x <- cumsum(x)
# run a regression with sales as DV and cum_x and cum_x^2 as IVs
mdl <- lm(x ~ 1 + cum_x + I(cum_x^2))
# get the coefficients
# a: the intercept
# b: the coefficient of cumulative adoptions
# c: the coefficient of squared cumulative adoptions
a <- mdl$coefficients[1]
b <- mdl$coefficients[2]
c <- mdl$coefficients[3]
# solving for p, q and M with a, b and c
M1 <- (-b-sqrt(b^2-4*a*c))/(2*c)
M2 <- (-b+sqrt(b^2-4*a*c))/(2*c)
M <- max(M1,M2) # M is set to the larger of M1 and M2
p <- a/M
q <- -c*M
# output a named vector
bass.par <- c(p,q,M)
names(bass.par) <- c("p","q","M")
return(bass.par)
}
Using this function, you can do the estimation. Next, let’s check the (p,q,M) of the two other products: red bull ads and sticky notes.
The (p,q,M) of red bull ads:
estimate_bass(red_bull_ads$no_of_adoptions)
##            p            q            M
## 4.048058e-03 3.860639e-03 7.610345e+07
The (p,q,M) of sticky notes:
estimate_bass(sticky_notes$no_of_adoptions)
##            p            q            M
## 1.682286e-03 1.215977e-02 1.545807e+09
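When you apply estimate_bass to future data, keep in mind that it assumes the fitted \(c\) is negative (adoptions eventually slow down) and that \(b^2-4ac\) is non-negative; otherwise sqrt() returns NaN with a warning, or the resulting \(M\), \(p\), or \(q\) is not positive. A hypothetical helper to check the coefficients before trusting the output might look like this (check_bass_coefs is an illustrative name, not part of the notebook):

```r
# Hypothetical helper: TRUE if (a, b, c) map to a usable Bass fit
check_bass_coefs <- function(a, b, c_coef) {
  disc <- b^2 - 4 * a * c_coef
  # need a downward-curving quadratic with real roots
  if (c_coef >= 0 || disc < 0) return(FALSE)
  M <- max((-b - sqrt(disc)) / (2 * c_coef), (-b + sqrt(disc)) / (2 * c_coef))
  # a usable fit needs positive M, p = a/M, and q = -c*M
  M > 0 && a / M > 0 && -c_coef * M > 0
}

check_bass_coefs(2820363, 0.1286912, -8.031406e-11)  # printed iphone coefficients
check_bass_coefs(2820363, 0.1286912, 8.031406e-11)   # c > 0: no saturation
check_bass_coefs(-1e5, 0.1286912, -8.031406e-11)     # negative intercept gives p < 0
```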
A function for making predictions with Bass Model
predict_bass
The function predict_bass takes two inputs. The first input, T, is the time periods for which you want predictions. It can be a single number or a numeric vector. For example, if you choose a week as your time period, T = 1:52 gives predictions for the first 52 weeks (one year) after launch; to predict the year after data that end at week 52, set T = 53:104. The second input is the Bass parameters, bass.par, a numeric vector with three elements in the order p, q, and M. Note that the order matters!
predict_bass <- function(T,bass.par) {
# the function "predict_bass" is to predict using the Bass parameters
# it takes in two inputs:
#"T" - the time periods of your predictions
#"bass.par" - the bass parameters in the order of (p,q,M)
# it outputs the cumulative no. of adoptions at each time period in T
# unpack bass parameters
p <- bass.par[1]
q <- bass.par[2]
M <- bass.par[3]
# making predictions
N <- M*(1 - (p+q)/(p*exp((p+q)*T)+q))
# return values
return(N)
}
We can use this function to predict the iphone sales with the same T and the parameters estimated above. We get the same predicted cumulative sales as N_iphone.
predict_bass(T, c(p_iphone, q_iphone, M_iphone))
## [1] 1079193147 1126238861 1170894001 1212977303 1252369828 1289011098
## [7] 1322893267 1354054012 1382568778 1408542931 1432104245 1453396042
## [13] 1472571161 1489786845 1505200575 1518966777 1531234344 1542144862
## [19] 1551831422 1560417931 1568018803 1574738948 1580673993
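Because predict_bass unpacks bass.par by position, passing the parameters in the wrong order silently gives wrong predictions. If you prefer not to rely on the order, one option, sketched below as a hypothetical variant (predict_bass_named is not part of the notebook), is to look the parameters up by name; estimate_bass already returns a vector named p, q, and M, so its output can be passed in directly:

```r
# Hypothetical variant of predict_bass that looks parameters up by name
predict_bass_named <- function(T, bass.par) {
  # fail loudly if any of the three names is missing
  stopifnot(all(c("p", "q", "M") %in% names(bass.par)))
  p <- bass.par[["p"]]
  q <- bass.par[["q"]]
  M <- bass.par[["M"]]
  # cumulative no. of adoptions at each period in T
  M * (1 - (p + q) / (p * exp((p + q) * T) + q))
}

# Printed iphone estimates, deliberately listed out of order
iphone_par <- c(M = 1.623974e+09, q = 1.304279e-01, p = 1.736705e-03)
predict_bass_named(38:40, iphone_par)
```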