Read.me

This notebook shows you how to estimate the Bass model from the observed no. of adoptions, and how to make predictions with the estimated Bass model. Some equations and mathematical details are included in the notebook for illustration only. You do NOT have to memorize any of these equations.

To make things easier, I also include a general function to estimate the Bass model. You can apply this function to future data. For more details of this function, please check the last section.

Data Description

Please download the file “Diffusion_data.RData” from Canvas and load it. You will find three datasets in the file: iphone, red_bull_ads, and sticky_notes.

Note that the three data frames have the same structure. Each contains the no. of “adoptions” in different time periods. In Bass model terms, each element of the no_of_adoptions variable is \(n(t)\).

In each data frame, the first variable “time” gives the time periods in chronological order. The length of a time period depends on the product. For the iphone data, the time periods are quarters of the year. For the Red Bull ads and the Sticky Notes app, the time periods are days. The second variable no_of_adoptions is the no. of sales, downloads, or views in each time period.

load("Diffusion_data.RData")
head(iphone)
##    time no_of_adoptions
## 1 Q2/07          270000
## 2 Q3/07         1119000
## 3 Q4/07         2315000
## 4 Q1/08         1703000
## 5 Q2/08          717000
## 6 Q3/08         6892000
head(red_bull_ads)
##        time no_of_adoptions
## 1 8/29/2007          328040
## 2 8/30/2007          342780
## 3 8/31/2007          322680
## 4  9/1/2007          341925
## 5  9/2/2007          279350
## 6  9/3/2007          332700
head(sticky_notes)
##        time no_of_adoptions
## 1 8/29/2007         2340934
## 2 8/30/2007         2404545
## 3 8/31/2007         2438134
## 4  9/1/2007         2373025
## 5  9/2/2007         2642917
## 6  9/3/2007         2312700

Analyzing New Product Diffusion with the Bass Model in 3 Steps

We will now use the iphone data as an example and analyze the diffusion of iPhones with the Bass model in 3 steps. You may try the same procedure on the red_bull_ads and sticky_notes data.

A brief recap of how the Bass model is estimated. We derive the estimating equation from the Bass equation: \[ \frac{f\left( t \right)}{1-F\left( t \right)}=p+qF\left( t \right) \] Note that the PDF \(f(t)\) and CDF \(F(t)\) can be expressed in terms of the number of adoptions in each time period \(n(t)\), the cumulative number of adoptions \(N(t)\), and the total market potential \(M\), as follows: \[ \begin{cases} f\left( t \right) =\frac{n\left( t \right)}{M}\\ F\left( t \right) =\frac{N\left( t \right)}{M}\\ \end{cases} \] We then substitute these two expressions back into the Bass equation and obtain the main estimating equation: a regression of the no. of adoptions on the cumulative no. of adoptions and the squared cumulative no. of adoptions, as shown below:
\[ \begin{aligned} \frac{\frac{n\left( t \right)}{M}}{1-\frac{N\left( t \right)}{M}} &= p+q\frac{N\left( t \right)}{M} \\ \Rightarrow n\left( t \right) &= \left[ p+\frac{qN\left( t \right)}{M} \right] \left[ M-N\left( t \right) \right] \\ &= pM+\left( q-p \right) N\left( t \right) -\frac{q}{M}\left[ N\left( t \right) \right] ^2 \\ \Rightarrow n\left( t \right) &= a+bN\left( t \right) +c\left[ N\left( t \right) \right] ^2 \end{aligned} \] Matching the coefficients term by term, we know that:
\[ \begin{cases} a=pM\\ b=q-p\\ c=-\frac{q}{M}\\ \end{cases}\,\,\text{and}\,\,\begin{cases} M=\frac{-b\pm \sqrt{b^2-4ac}}{2c}\\ p=\frac{a}{M}\\ q=-Mc\\ \end{cases} \] With this derivation, we can now start our analysis. Note that \(M\) solves the quadratic equation \(cM^2+bM+a=0\), so it has two roots. When \(a>0\) and \(c<0\), as expected for positive \((p,q,M)\), the product of the two roots \(a/c\) is negative, so exactly one root is positive. We will use the larger (positive) root as the market potential \(M\).
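
The mapping between \((p,q,M)\) and \((a,b,c)\) can be checked numerically with a small round trip. The sketch below uses made-up parameter values (p = 0.03, q = 0.38, M = 1000 are assumptions for illustration, not estimates from the course data) and the name c_coef is chosen here only to avoid masking R's c() function.

```r
# Hypothetical Bass parameters, chosen only for illustration
p <- 0.03
q <- 0.38
M <- 1000

# Forward mapping: (p, q, M) -> regression coefficients (a, b, c)
a <- p * M
b <- q - p
c_coef <- -q / M

# Backward mapping: solve c*M^2 + b*M + a = 0 for both roots
disc <- b^2 - 4 * a * c_coef
roots <- (-b + c(-1, 1) * sqrt(disc)) / (2 * c_coef)

# Keep the larger root as M, then recover p and q
M_hat <- max(roots)
p_hat <- a / M_hat
q_hat <- -c_coef * M_hat

print(roots)
print(c(p = p_hat, q = q_hat, M = M_hat))
```

One root returns the original M and the other equals \(-pM/q\), which is negative, so taking the larger root picks out the market potential.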

Step 1 - Estimating the empirical Bass equation

The first step is to run a regression based on the Bass equation. We use \(n(t)\) as the dependent variable, and \(N(t)\) and \(N(t)^2\) as independent variables. However, the data only contain \(n(t)\), the no. of adoptions in each time period. We first need to create a new variable of cumulative adoptions, \(N(t)\). For this, we use the cumsum() function from base R. Please use ?cumsum to see how this function works.

# creating a variable of cumulative adoptions
iphone$cum_adoptions <- cumsum(iphone$no_of_adoptions)
iphone$cum_adoptions
##  [1]    270000   1389000   3704000   5407000   6124000  13016000  17379000
##  [8]  21172000  26380000  33747000  42484000  51236000  59634000  73736000
## [15]  89971000 108618000 128956000 146029000 183073000 218137000 244165000
## [22] 271075000 318864000 356294000 387535000 421332000 472357000 516076000
## [29] 551279000 590551000 665019000 726189000 773723000 821769000 896548000
## [36] 947741000 988140000
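
To see what cumsum() does on a small scale, here is a toy example with a made-up vector of adoptions (the numbers are illustrative only). It also shows that diff() reverses the operation, which is useful when going from cumulative adoptions back to per-period adoptions.

```r
# Toy per-period adoptions n(t), made up for illustration
n_t <- c(2, 5, 3, 7)

# Cumulative adoptions N(t): running total of n(t)
N_t <- cumsum(n_t)
print(N_t)  # 2 7 10 17

# Going back: n(t) = N(t) - N(t - 1), with N(0) = 0
recovered <- diff(c(0, N_t))
print(recovered)  # 2 5 3 7
```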

With the new variable, we can run a regression to get the coefficients \((a,b,c)\), using the estimating equation derived above. For the iphone data, the model is specified as follows:

# run a regression with the no. of adoptions as DV and the cumulative adoptions and its squared term as IVs
# the I(cum_adoptions^2) means we are adding the squared term of cum_adoptions to the regression. 
mdl_iphone <- lm(no_of_adoptions ~ 1 + cum_adoptions + I(cum_adoptions^2), data = iphone)
summary(mdl_iphone)
## 
## Call:
## lm(formula = no_of_adoptions ~ 1 + cum_adoptions + I(cum_adoptions^2), 
##     data = iphone)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -14154003  -3426209  -1046772   2410199  21584443 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         2.820e+06  2.222e+06   1.269    0.213    
## cum_adoptions       1.287e-01  1.504e-02   8.559 5.35e-10 ***
## I(cum_adoptions^2) -8.031e-11  1.679e-11  -4.785 3.26e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8080000 on 34 degrees of freedom
## Multiple R-squared:  0.864,  Adjusted R-squared:  0.856 
## F-statistic:   108 on 2 and 34 DF,  p-value: 1.858e-15

From the regression, we can obtain the \((a,b,c)\) as in the empirical Bass equation, where \(a\) is the intercept, \(b\) is the coefficient of the cumulative adoptions, and \(c\) is the coefficient of the squared cumulative adoptions.

a_iphone <- mdl_iphone$coefficients[1]
b_iphone <- mdl_iphone$coefficients[2]
c_iphone <- mdl_iphone$coefficients[3]

a_iphone
## (Intercept) 
##     2820363
b_iphone
## cum_adoptions 
##     0.1286912
c_iphone
## I(cum_adoptions^2) 
##      -8.031406e-11

Step 2 - Transforming coefficients to the Bass parameters

Given the \((a,b,c)\) as in the empirical Bass equation, we can recover the Bass parameters \((p,q,M)\) with the equation shown above.

# obtaining M for iphones
# because the quadratic equation has two roots, we take the larger one (the positive one) as the market size. 
# For the max() function, see ?max for more details.
M_iphone <- max((-b_iphone-sqrt(b_iphone^2-4*a_iphone*c_iphone))/(2*c_iphone), 
                (-b_iphone+sqrt(b_iphone^2-4*a_iphone*c_iphone))/(2*c_iphone))

# given M_iphone, we calculate p and q
p_iphone <- a_iphone/M_iphone
q_iphone <- -c_iphone*M_iphone
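
As a cross-check that does not require loading the data, the sketch below plugs in the coefficient values printed in Step 1 (rounded as shown there) and verifies that the recovered \((p,q,M)\) reproduce \((a,b,c)\). The variable names here are chosen for this sketch and are not part of the notebook.

```r
# Coefficients copied from the printed regression output in Step 1
a <- 2820363
b <- 0.1286912
c_coef <- -8.031406e-11   # named c_coef to avoid masking c()

# Larger root of c*M^2 + b*M + a = 0
disc <- b^2 - 4 * a * c_coef
M <- max((-b - sqrt(disc)) / (2 * c_coef),
         (-b + sqrt(disc)) / (2 * c_coef))

# Bass parameters
p <- a / M
q <- -c_coef * M
print(c(p = p, q = q, M = M))

# Consistency checks: each should print TRUE
print(isTRUE(all.equal(a, p * M)))
print(isTRUE(all.equal(b, q - p)))
print(isTRUE(all.equal(c_coef, -q / M)))
```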

Step 3 - Making predictions with the estimation results

With the estimated values of \((p,q,M)\), we can then make predictions with the Bass model. For the predictions, you need another input \(T\) or the time periods you want to predict. When \(T\) is specified, you can use the functions that we discussed in the lecture to make predictions.

The equation for cumulative adoptions is shown below, where \(t\) stands for the index of a time period, e.g., time period \(1,2,3,\cdots\) \[N(t) = M\left(1-\dfrac{p+q}{pe^{(p+q)t}+q}\right) \]

# assume we are predicting from quarter 38 to quarter 60 for iphones.
# note that the data end at quarter 37. 
# a vector of the no. of quarters
T <- 38:60

# getting cumulative no. of adoptions 
# note that "+ q" is added to the denominator after exp(), as in the equation above
N_iphone <- M_iphone*(1 - 
                        (p_iphone+q_iphone)/
                        (p_iphone*exp((p_iphone+q_iphone)*T)+q_iphone))
N_iphone
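
A quick way to check an implementation of the closed-form \(N(t)\) is to test its basic properties: it should equal 0 at \(t=0\), increase with \(t\), and approach \(M\) without exceeding it. The sketch below does this with a hypothetical helper bass_cumulative() and made-up parameter values (not the iphone estimates).

```r
# Hypothetical helper implementing N(t) = M * (1 - (p + q) / (p * exp((p + q) * t) + q))
bass_cumulative <- function(t, p, q, M) {
  M * (1 - (p + q) / (p * exp((p + q) * t) + q))
}

# Illustrative parameters, not estimated from data
p <- 0.03
q <- 0.38
M <- 1000

t_grid <- 0:40
N_t <- bass_cumulative(t_grid, p, q, M)

print(N_t[1])                  # cumulative adoptions at t = 0
print(all(diff(N_t) > 0))      # TRUE if N(t) is strictly increasing
print(M - N_t[length(N_t)])    # remaining gap to the market potential
```

If the `+ q` term is accidentally placed inside exp(), the first check fails because \(N(0)\) is no longer 0.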

The above covers the three steps for applying the Bass model in practice. You can follow the same procedure to replicate the analysis for the other two products: an online ad for Red Bull and the mobile app Sticky Notes.

Two General Functions for Estimating the Bass Model and Making Predictions

The function for estimating the Bass model: estimate_bass

Next, I include a general function to estimate the Bass model. This function may prove useful for your future analysis. The function takes one input: a vector of the no. of adoptions in each time period, sorted in chronological order. The function outputs the estimated Bass parameters \((p,q,M)\).

Please run the lines and obtain the function object. The detailed comments and explanations are in the function. The function covers Step-1 and Step-2 of the analysis in the previous section.

estimate_bass <- function(x) {
  
  # the function estimate_bass estimates the Bass model
  # it takes one input "x": a vector of the no. of adoptions n(t)
  # note that the values must be arranged in chronological order!
  # the function outputs a named vector c(p,q,M), the three parameters of the Bass model

  # create a new variable of cumulative adoptions
  cum_x <- cumsum(x)
  
  # run a regression with sales as DV and cum_x and cum_x^2 as IVs
  mdl <- lm(x ~ 1 + cum_x + I(cum_x^2))
  
  # get the coefficients 
  # a: the intercept 
  # b: the coefficient of cumulative adoptions
  # c_coef: the coefficient of squared cumulative adoptions
  #         (named c_coef so that it does not mask the c() function)
  a <- mdl$coefficients[1]
  b <- mdl$coefficients[2]
  c_coef <- mdl$coefficients[3]
  
  # solving for p, q and M with a, b and c_coef
  M1 <- (-b-sqrt(b^2-4*a*c_coef))/(2*c_coef)
  M2 <- (-b+sqrt(b^2-4*a*c_coef))/(2*c_coef)
  M <- max(M1,M2) # M is set to the larger of M1 and M2
  
  p <- a/M
  q <- -c_coef*M
  
  # output a named vector 
  bass.par <- c(p,q,M)
  names(bass.par) <- c("p","q","M")
  
  return(bass.par)
}

Using this function, you can do the estimation. Next, let’s check the (p,q,M) of two other products: red bull ads and sticky notes.

The (p,q,M) of red bull ads:

estimate_bass(red_bull_ads$no_of_adoptions)
##            p            q            M 
## 4.048058e-03 3.860639e-03 7.610345e+07

The (p,q,M) of sticky notes:

estimate_bass(sticky_notes$no_of_adoptions)
##            p            q            M 
## 1.682286e-03 1.215977e-02 1.545807e+09
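
One way to build confidence in the regression approach is to simulate adoptions from known parameters and check that they are recovered. The sketch below is a variant, not the estimate_bass function above: it assumes a discrete-time Bass process in which adoptions in period t depend on cumulative adoptions up to the previous period, and it regresses on that lagged cumulative series (N_prev), whereas estimate_bass uses cumulative adoptions including the current period. All names and parameter values are assumptions for illustration.

```r
# True parameters used to simulate data (illustrative values)
p_true <- 0.03
q_true <- 0.38
M_true <- 1000
n_periods <- 15

n_t <- numeric(n_periods)      # adoptions in each period
N_prev <- numeric(n_periods)   # cumulative adoptions before each period
N_running <- 0

# Simulate: n(t) = (p + q * N(t-1) / M) * (M - N(t-1))
for (t in seq_len(n_periods)) {
  N_prev[t] <- N_running
  n_t[t] <- (p_true + q_true * N_running / M_true) * (M_true - N_running)
  N_running <- N_running + n_t[t]
}

# Regress adoptions on lagged cumulative adoptions and its square
fit <- lm(n_t ~ 1 + N_prev + I(N_prev^2))
coefs <- unname(coef(fit))
a <- coefs[1]
b <- coefs[2]
c_coef <- coefs[3]

# Transform (a, b, c) back to (p, q, M) using the larger root
roots <- (-b + c(-1, 1) * sqrt(b^2 - 4 * a * c_coef)) / (2 * c_coef)
M_hat <- max(roots)
p_hat <- a / M_hat
q_hat <- -c_coef * M_hat

# Should be close to the true values 0.03, 0.38, 1000
print(round(c(p = p_hat, q = q_hat, M = M_hat), 4))
```

Because the simulated data contain no noise, the quadratic fits exactly and the parameters are recovered up to numerical precision. With real data, the estimates carry sampling error.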

A function for making predictions with the Bass model: predict_bass

The function predict_bass takes two inputs. The first input T is the time periods for which you would like to make predictions. It can be a single number or a numeric vector. For example, if you want to predict into the next year and choose a week as your time period, you can set T = 1:52. The second input is the Bass parameters bass.par, a numeric vector with three members in the order of p, q, and M. Note that the order matters!

predict_bass <- function(T,bass.par){
  
  # the function "predict_bass" is to predict using the Bass parameters
  # it takes in two inputs: 
    #"T" - the time periods of your predictions
    #"bass.par" - the bass parameters in the order of (p,q,M)
  # it outputs the cumulative no. of adoptions during T
  
  # unpack bass parameters
  p <- bass.par[1]
  q <- bass.par[2]
  M <- bass.par[3]
  
  # making predictions 
  # note that "+ q" is added to the denominator after exp()
  N <- M*(1 - (p+q)/(p*exp((p+q)*T)+q))
  
  # return values
  return(N)
  
}

We can use this function to predict cumulative iphone adoptions with the same T and the parameters estimated above. It applies the same closed-form equation as Step 3, so it should reproduce N_iphone.

predict_bass(T,c(p_iphone,q_iphone,M_iphone))
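
The predictions above are cumulative. To get predicted adoptions per period, take differences of consecutive cumulative values, \(n(t) = N(t) - N(t-1)\). The sketch below does this with a hypothetical helper that looks the parameters up by name rather than by position, so the order of the vector no longer matters. The parameter values are made up for illustration, and the peak-time expression \(\ln(q/p)/(p+q)\) is the continuous-time result, which applies when \(q > p\).

```r
# Hypothetical helper: reads p, q, M by name from a named vector
bass_cumulative <- function(t, bass.par) {
  p <- bass.par[["p"]]
  q <- bass.par[["q"]]
  M <- bass.par[["M"]]
  M * (1 - (p + q) / (p * exp((p + q) * t) + q))
}

# Illustrative parameters, deliberately given in a different order
par_demo <- c(M = 1000, q = 0.38, p = 0.03)

# Cumulative adoptions at t = 0, 1, ..., 30
t_grid <- 0:30
N_t <- bass_cumulative(t_grid, par_demo)

# Per-period adoptions for periods 1..30
n_t <- diff(N_t)

# Continuous-time peak of adoptions
t_peak <- log(par_demo[["q"]] / par_demo[["p"]]) /
  (par_demo[["p"]] + par_demo[["q"]])

print(round(n_t, 1))
print(which.max(n_t))   # period with the most predicted adoptions
print(t_peak)
```

The period with the largest predicted adoptions should fall within one period of t_peak, since the per-period values are differences of the same curve.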