1 Read.me

This notebook shows how to estimate the Bass model from the observed no. of adoptions, and how to make predictions with the Bass parameters \(\{p,q,M\}\). Some equations and mathematical details are included for reference only; you can skip them if you want.

We first load the MSR package, which contains the data and functions used in this notebook.

library(MSR)

2 Data Description

There are three data frames we can use for the estimation of the Bass model.

  • “iphone”: the quarterly sales of the iPhone (in units)
  • “red_bull_ads”: the daily no. of views of Red Bull ads
  • “sticky_notes”: the daily installations of the Facebook app

Note that the three data frames share the same structure: each contains the no. of “adoptions” in each time period, or \(n(t)\) in Bass model terms. The first variable, time, gives the time periods in chronological order. The length of a period depends on the product: quarters for the iphone data, and days for the red_bull_ads and sticky_notes data. The second variable, no_of_adoptions, is the no. of sales, downloads or views in each time period.

# load data from the package
data(list = c("iphone","red_bull_ads","sticky_notes"))

# see the head of each data frame
head(iphone)
##    time no_of_adoptions
## 1 Q2/07          270000
## 2 Q3/07         1119000
## 3 Q4/07         2315000
## 4 Q1/08         1703000
## 5 Q2/08          717000
## 6 Q3/08         6892000
head(red_bull_ads)
##        time no_of_adoptions
## 1 8/29/2007          328040
## 2 8/30/2007          342780
## 3 8/31/2007          322680
## 4  9/1/2007          341925
## 5  9/2/2007          279350
## 6  9/3/2007          332700
head(sticky_notes)
##        time no_of_adoptions
## 1 8/29/2007         2340934
## 2 8/30/2007         2404545
## 3 8/31/2007         2438134
## 4  9/1/2007         2373025
## 5  9/2/2007         2642917
## 6  9/3/2007         2312700

3 Analyzing New Product Diffusion with the Bass Model in 3 Steps

We now use the iphone data as an example to analyze the diffusion of iPhones with the Bass model in 3 steps. You may try the same procedure on the red_bull_ads and sticky_notes data.

A brief recap of how the Bass model is estimated. Starting from the Bass equation, a bit of algebra gives a regression of the no. of adoptions \(n(t)\) on the cumulative no. of adoptions \(N(t)\) and its square. The derivation is shown below for reference only.
\[ \begin{aligned} \frac{n(t)}{M-N(t)} &= p+q\frac{N(t)}{M} \\ \Rightarrow\quad n(t) &= \left[ p+\frac{qN(t)}{M} \right] \left[ M-N(t) \right] \\ \Rightarrow\quad n(t) &= a+bN(t)+c\left[ N(t) \right]^2 \end{aligned} \]
Matching coefficients gives \((a,b,c)\) in terms of \((p,q,M)\), and solving the other way recovers \((p,q,M)\) from \((a,b,c)\):
\[ \begin{cases} a=pM\\ b=q-p\\ c=-\frac{q}{M} \end{cases} \quad\text{and}\quad \begin{cases} M=\frac{-b\pm \sqrt{b^2-4ac}}{2c}\\ p=\frac{a}{M}\\ q=-Mc \end{cases} \]
With the derivation in place, we can now start our analysis.
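As a quick numerical check of this mapping, the sketch below starts from illustrative parameter values (p = 0.01, q = 0.4, M = 1000, chosen only for this example and not taken from any data set), builds \((a,b,c)\), and then recovers \((p,q,M)\) with the formulas above.

```r
# illustrative Bass parameters (assumed values, not estimates from the data)
p <- 0.01
q <- 0.4
M <- 1000

# forward mapping: (p, q, M) -> (a, b, c)
a <- p * M
b <- q - p
c_coef <- -q / M   # named c_coef to avoid masking the base function c()

# backward mapping: the two roots of c*M^2 + b*M + a = 0
sqrt_disc <- sqrt(b^2 - 4 * a * c_coef)
roots <- c((-b - sqrt_disc) / (2 * c_coef),
           (-b + sqrt_disc) / (2 * c_coef))

# the positive root is the market size
M_back <- max(roots)
p_back <- a / M_back
q_back <- -c_coef * M_back

roots                                              # the two candidate values of M
all.equal(c(p_back, q_back, M_back), c(p, q, M))   # round trip check
```

Whenever \(p\), \(q\) and \(M\) are positive, \(a>0\) and \(c<0\), so the product of the two roots \(a/c\) is negative and exactly one root is positive. This is why taking the larger root as the market size is safe.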

3.1 Estimating the empirical Bass equation

3.1.1 Using the estimate_bass function in MSR

You can use the estimate_bass function in the MSR package to estimate the model. The function takes one input: the no. of adoptions in each time period, ordered chronologically. It outputs the estimated Bass parameters \(\{p,q,M\}\) as a vector. Please use ?estimate_bass or help(estimate_bass) to see more details about the function.

# estimate the Bass parameters of iphone
bass.iphone <- estimate_bass(iphone$no_of_adoptions)
bass.iphone
##            p            q            M 
## 1.736705e-03 1.304279e-01 1.623974e+09

Note: to interpret the results, you need to know:

  • The definitions of \(p\), \(q\) and \(M\).
  • Why a product's \(p\), \(q\) and \(M\) take the values they do, which requires some knowledge of the product's features.
    • For example, we would expect the imitation parameter \(q\) of a viral video (like Coincidance) to be higher than that of an iPad, as the video is more contagious.
    • If you are asked these questions in the exam, you will be given the related information.

3.1.2 (Optional) Coding the estimation in R

The first step is to run a regression with the Bass equation. We use \(n(t)\) as the dependent variable, and \(N(t)\) and \(N(t)^2\) as independent variables. However, the data only contain \(n(t)\), the no. of adoptions in each time period, so we first need to create a new variable of cumulative adoptions, \(N(t)\). For this, we use the cumsum() function from base R. Please use ?cumsum to see how this function works.

# creating a variable of cumulative adoptions
iphone$cum_adoptions <- cumsum(iphone$no_of_adoptions)
iphone$cum_adoptions
##  [1]    270000   1389000   3704000   5407000   6124000  13016000  17379000
##  [8]  21172000  26380000  33747000  42484000  51236000  59634000  73736000
## [15]  89971000 108618000 128956000 146029000 183073000 218137000 244165000
## [22] 271075000 318864000 356294000 387535000 421332000 472357000 516076000
## [29] 551279000 590551000 665019000 726189000 773723000 821769000 896548000
## [36] 947741000 988140000
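
To see what cumsum() does on a small scale, the sketch below applies it to the first six quarterly figures shown in the head(iphone) output. The vector name first_six is introduced here only for illustration.

```r
# first six quarterly iPhone sales, copied from the head(iphone) output
first_six <- c(270000, 1119000, 2315000, 1703000, 717000, 6892000)

# running total: element i is the sum of elements 1..i
cumsum(first_six)
```

The result matches the first six entries of iphone$cum_adoptions printed above.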

With the new variable, we can run a regression to get the coefficients \((a,b,c)\) of the estimating equation described above. For the iphone data, the model is specified as follows:

# run a regression with the no. of adoptions as DV and the cumulative adoptions and its squared term as IVs
# I(cum_adoptions^2) adds the squared term of cum_adoptions to the regression
mdl_iphone <- lm(no_of_adoptions ~ 1 + cum_adoptions + I(cum_adoptions^2), data = iphone)
summary(mdl_iphone)
## 
## Call:
## lm(formula = no_of_adoptions ~ 1 + cum_adoptions + I(cum_adoptions^2), 
##     data = iphone)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -14154003  -3426209  -1046772   2410199  21584443 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         2.820e+06  2.222e+06   1.269    0.213    
## cum_adoptions       1.287e-01  1.504e-02   8.559 5.35e-10 ***
## I(cum_adoptions^2) -8.031e-11  1.679e-11  -4.785 3.26e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8080000 on 34 degrees of freedom
## Multiple R-squared:  0.864,  Adjusted R-squared:  0.856 
## F-statistic:   108 on 2 and 34 DF,  p-value: 1.858e-15
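
The I() wrapper matters because ^ has a special meaning inside an R formula. The sketch below uses simulated data (the variables x and y are made up for this illustration) to show that, without I(), the squared term is silently dropped.

```r
# simulated data with a known quadratic relationship (illustrative only)
set.seed(1)
x <- 1:20
y <- 3 + 2 * x - 0.05 * x^2 + rnorm(20, sd = 0.1)

# with I(): x^2 is evaluated arithmetically and enters as its own regressor
fit_with_I <- lm(y ~ x + I(x^2))

# without I(): x^2 is read as a formula operator and reduces to x
fit_without_I <- lm(y ~ x + x^2)

names(coef(fit_with_I))      # intercept, x and the squared term
names(coef(fit_without_I))   # intercept and x only
```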

From the regression, we can obtain the \((a,b,c)\) as in the empirical Bass equation, where \(a\) is the intercept, \(b\) is the coefficient of the cumulative adoptions, and \(c\) is the coefficient of the squared cumulative adoptions.

a_iphone <- mdl_iphone$coefficients[1]
b_iphone <- mdl_iphone$coefficients[2]
c_iphone <- mdl_iphone$coefficients[3]

a_iphone
## (Intercept) 
##     2820363
b_iphone
## cum_adoptions 
##     0.1286912
c_iphone
## I(cum_adoptions^2) 
##      -8.031406e-11

Given the \((a,b,c)\) as in the empirical Bass equation, we can recover the Bass parameters \((p,q,M)\) with the equation shown above.

# obtaining M for iphones
# the quadratic equation has two roots; we take the larger (positive) one as the market size
# see ?max for more details on the max() function
M_iphone <- max((-b_iphone-sqrt(b_iphone^2-4*a_iphone*c_iphone))/(2*c_iphone), 
                (-b_iphone+sqrt(b_iphone^2-4*a_iphone*c_iphone))/(2*c_iphone))

# given M_iphone, we calculate p and q
p_iphone <- a_iphone/M_iphone
q_iphone <- -c_iphone*M_iphone
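
As a cross-check, the sketch below plugs the rounded coefficients printed above into the same formulas. The variable names (a_chk, b_chk, c_chk and so on) are introduced only for this check. Because the inputs are rounded, the results should agree with the estimate_bass output in 3.1.1 only up to rounding.

```r
# rounded regression coefficients copied from the printed output above
a_chk <- 2820363
b_chk <- 0.1286912
c_chk <- -8.031406e-11

# square root of the discriminant, computed once and reused for both roots
sqrt_disc <- sqrt(b_chk^2 - 4 * a_chk * c_chk)

# the larger (positive) root is the market size
M_chk <- max((-b_chk - sqrt_disc) / (2 * c_chk),
             (-b_chk + sqrt_disc) / (2 * c_chk))
p_chk <- a_chk / M_chk
q_chk <- -c_chk * M_chk

# compare with the p, q, M values returned by estimate_bass
signif(c(p = p_chk, q = q_chk, M = M_chk), 4)
```

Note that \(b^2-4ac=(q-p)^2+4pq=(p+q)^2\), so sqrt_disc is simply \(p+q\).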

Note: to interpret the results, you need to know:

  • The definitions of \(p\), \(q\) and \(M\).
  • Why a product's \(p\), \(q\) and \(M\) take the values they do, which requires some knowledge of the product's features.
    • For example, we would expect the imitation parameter \(q\) of a viral video (like Coincidance) to be higher than that of an iPad, as the video is more contagious.
    • If you are asked these questions in the exam, you will be given the related information.

3.2 Making predictions with the estimation results

3.2.1 Using the predict_bass function in the package MSR

With the estimated Bass parameters of iphone, bass.iphone, you can make predictions of the cumulative no. of adoptions \(N(t)\) of iphone. For this, you can use the function predict_bass in the package. The function takes two inputs: T is the vector of time periods you want to predict, and bass.par is the vector of Bass parameters in the order \((p,q,M)\). Please use ?predict_bass or help(predict_bass) for more details.

In the following example, we set T = 38:60 as the data ends in quarter 37, and we predict the cumulative sales from quarter 38 onwards (until quarter 60). The Bass parameters are from the previous estimation results bass.iphone.

# set the value of T
T <- 38:60

# do the prediction
N_iphone <- predict_bass(T,bass.iphone)
N_iphone
##  [1] 1079193147 1126238861 1170894001 1212977303 1252369828 1289011098
##  [7] 1322893267 1354054012 1382568778 1408542931 1432104245 1453396042
## [13] 1472571161 1489786845 1505200575 1518966777 1531234344 1542144862
## [19] 1551831422 1560417931 1568018803 1574738948 1580673993

Here, N_iphone gives you the cumulative no. of adoptions of iphones at quarter 38, 39, …, 60. You can then use the results in your report, e.g., creating a growth curve.
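
If per-quarter adoptions are needed rather than cumulative totals, one simple option (an assumption here, not a function of the MSR package) is to difference consecutive cumulative predictions. The sketch uses the first four values printed above; the vector name N_first is introduced only for illustration.

```r
# first four cumulative predictions (quarters 38 to 41), copied from the output above
N_first <- c(1079193147, 1126238861, 1170894001, 1212977303)

# implied adoptions within quarters 39, 40 and 41: N(t) - N(t - 1)
diff(N_first)
```

The differences decrease from one quarter to the next, consistent with a diffusion curve that is already past its peak.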

3.2.2 (Optional) Coding the prediction with Bass parameters in R

With the estimated values of \((p,q,M)\), we can then make predictions with the Bass model. For the predictions, you need one more input: \(T\), the time periods you want to predict. Once \(T\) is specified, you can use the functions that we discussed in the lecture to make predictions.

The equation is shown below, where \(t\) is the index of a time period, e.g., \(1,2,3,\cdots\). Using it, we can code the prediction of the cumulative no. of adoptions from quarter 38 to 60. The Bass parameters are from the above estimation, i.e., p_iphone, q_iphone and M_iphone.
\[ N(t) = M\left(1-\frac{p+q}{pe^{(p+q)t}+q}\right) \]
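
Two quick checks on this closed form, added here as a sanity check and assuming \(p,q>0\): at \(t=0\) nobody has adopted yet, and as \(t\) grows the exponential term dominates the denominator, so the cumulative adoptions approach the market size \(M\).
\[ N(0) = M\left(1-\frac{p+q}{p+q}\right) = 0, \qquad \lim_{t\to\infty} N(t) = M\left(1-0\right) = M \]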

# assume we are predicting from quarter 38 to quarter 60 for iphones
# note that the data end at quarter 37
# a vector of the no. of quarters
T <- 38:60

# getting cumulative no. of adoptions at each time period
N_iphone <- M_iphone*(1 - 
                        (p_iphone+q_iphone)/
                        (p_iphone*exp((p_iphone+q_iphone)*T)+q_iphone))
N_iphone
##  [1] 1079193147 1126238861 1170894001 1212977303 1252369828 1289011098
##  [7] 1322893267 1354054012 1382568778 1408542931 1432104245 1453396042
## [13] 1472571161 1489786845 1505200575 1518966777 1531234344 1542144862
## [19] 1551831422 1560417931 1568018803 1574738948 1580673993
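
The same formula can be wrapped in a small reusable function. The name bass_cumulative and its argument names are hypothetical (they are not part of MSR), and the parameter values are the rounded estimates printed in 3.1.1, so the output should match the values above only up to rounding.

```r
# cumulative adoptions N(t) under the Bass model
# (hypothetical helper, not part of the MSR package)
bass_cumulative <- function(t, p, q, M) {
  M * (1 - (p + q) / (p * exp((p + q) * t) + q))
}

# rounded estimates copied from the estimate_bass output
p_hat <- 1.736705e-03
q_hat <- 1.304279e-01
M_hat <- 1.623974e+09

# t is vectorised, so a whole range of quarters can be passed at once
N_check <- bass_cumulative(38:60, p_hat, q_hat, M_hat)
N_check
```

Using t rather than T as the argument name also avoids overwriting T, which R predefines as shorthand for TRUE.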

With the predicted values in N_iphone, you can report the results and present them to your clients as in real practice (e.g., by drawing a growth curve from quarter 38 to 60).

