#install.packages("mlogit") # requires R version 3.5 or newer
#install.packages("data.table")
library(mlogit)
library(data.table)
yogurtdata = fread("yogurt_3brands.csv")
# Rename the columns so that mlogit can tell which column belongs to which choice alternative
# The alternative names must match the values used in the Choice variable
setnames(yogurtdata, c("Feature_S", "Feature_D", "Feature_Y", "HH Size", "Pan ID"),
c("Feature.Stonyfield", "Feature.Dannon", "Feature.Yoplait",
"HHSize", "PanID"))
setnames(yogurtdata, c("Price_S", "Price_D", "Price_Y"),
c("Price.Stonyfield", "Price.Dannon", "Price.Yoplait"))
Prepare the data for estimating the following MNL model, specified through the latent utility function of each of the three brands.
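The model specification itself is not reproduced at this point. Judging from the formula estimated later (Choice ~ Feature + Price | Income + HHSize, with Dannon as the reference alternative), it is presumably of the following form. The symbols below are assumptions chosen for illustration, not notation from the assignment:

$$
\begin{aligned}
U_{i,\text{Dannon}} &= \beta_F\,\text{Feature}_{i,\text{Dannon}} + \beta_P\,\text{Price}_{i,\text{Dannon}} + \varepsilon_{i,\text{Dannon}} \\
U_{i,\text{Stonyfield}} &= \alpha_S + \beta_F\,\text{Feature}_{i,\text{Stonyfield}} + \beta_P\,\text{Price}_{i,\text{Stonyfield}} + \gamma_S\,\text{Income}_i + \delta_S\,\text{HHSize}_i + \varepsilon_{i,\text{Stonyfield}} \\
U_{i,\text{Yoplait}} &= \alpha_Y + \beta_F\,\text{Feature}_{i,\text{Yoplait}} + \beta_P\,\text{Price}_{i,\text{Yoplait}} + \gamma_Y\,\text{Income}_i + \delta_Y\,\text{HHSize}_i + \varepsilon_{i,\text{Yoplait}}
\end{aligned}
$$

Here $i$ indexes a choice occasion. The intercept and the Income and HHSize coefficients of the reference brand are normalized to zero, because only utility differences are identified.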
You need to create an additional column in yogurtdata, called “Choice”, indicating the choices made by each person, and set this column to be a factor.
# Create a Choice variable that lists the choice made
yogurtdata[Stonyfield==1, Choice := "Stonyfield"]
yogurtdata[Dannon==1, Choice := "Dannon"]
yogurtdata[Yoplait==1, Choice := "Yoplait"]
yogurtdata[, Choice := as.factor(Choice)]
yogurtdata[, c("Stonyfield","Dannon","Yoplait"):= NULL]#remove these three columns
head(yogurtdata)
## Index Feature.Stonyfield Feature.Yoplait Feature.Dannon Price.Stonyfield
## 1: 1 0 0 0 0.108
## 2: 2 0 0 0 0.108
## 3: 3 0 0 0 0.108
## 4: 4 0 0 0 0.108
## 5: 5 0 0 0 0.125
## 6: 6 0 0 0 0.108
## Price.Yoplait Price.Dannon Income HHSize PanID Choice
## 1: 0.081 0.061 9 2 1 Dannon
## 2: 0.098 0.064 9 2 1 Yoplait
## 3: 0.098 0.061 9 2 1 Yoplait
## 4: 0.098 0.061 9 2 1 Yoplait
## 5: 0.098 0.049 9 2 1 Yoplait
## 6: 0.092 0.050 9 2 1 Yoplait
Then you need to set up the data in a format the package understands, using mlogit.data()
yl = mlogit.data(yogurtdata[,-c("Index" )], shape="wide",
choice="Choice", id="PanID", varying=1:6)
head(yl)
## Income HHSize PanID Choice alt Feature Price chid
## 1319 9 2 1 TRUE Dannon 0 0.061 1
## 1 9 2 1 FALSE Stonyfield 0 0.108 1
## 660 9 2 1 FALSE Yoplait 0 0.081 1
## 1320 9 2 1 FALSE Dannon 0 0.064 2
## 2 9 2 1 FALSE Stonyfield 0 0.108 2
## 661 9 2 1 TRUE Yoplait 0 0.098 2
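The format for using mFormula() is the following, where "X different" means the variable varies across alternatives and "beta different" means each alternative gets its own coefficient
Choice ~ X different, beta same | X same, beta different | X different, beta different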
f <- mFormula(Choice ~ Feature+Price | Income + HHSize)
# Estimate the model
ml <- mlogit(f, yl, reflevel="Dannon")
summary(ml)
##
## Call:
## mlogit(formula = Choice ~ Feature + Price | Income + HHSize,
## data = yl, reflevel = "Dannon", method = "nr")
##
## Frequencies of alternatives:
## Dannon Stonyfield Yoplait
## 0.33687 0.33080 0.33232
##
## nr method
## 4 iterations, 0h:0m:0s
## g'(-H)^-1g = 8.68E-08
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## Stonyfield:(intercept) 1.572326 0.369253 4.2581 2.061e-05 ***
## Yoplait:(intercept) 2.848940 0.318431 8.9468 < 2.2e-16 ***
## Feature 0.371186 0.206549 1.7971 0.07232 .
## Price -23.480763 3.667916 -6.4017 1.537e-10 ***
## Stonyfield:Income -0.125584 0.030431 -4.1268 3.678e-05 ***
## Yoplait:Income -0.218509 0.030981 -7.0529 1.752e-12 ***
## Stonyfield:HHSize 0.265701 0.116981 2.2713 0.02313 *
## Yoplait:HHSize -0.096554 0.115666 -0.8348 0.40385
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -632.78
## McFadden R^2: 0.12595
## Likelihood ratio test : chisq = 182.36 (p.value = < 2.22e-16)
Q: Please interpret the model estimation results.
The intercepts for both Stonyfield and Yoplait are positive and statistically significant. With Dannon as the reference alternative, this means that when all other variables are zero, Dannon is the least preferred brand.
The Feature parameter is the same for all brands. It is positive (0.37) but only marginally significant (p = 0.072, significant at the 10% level but not at the 5% level). Featuring a brand would increase its utility by about 0.37.
The Price parameter is the same for all brands, and it is negative and statistically significant. Its magnitude is large: a one-unit price increase lowers a brand's utility by 23.48. The prices shown above are roughly 0.05 to 0.13, so a more realistic increase of 0.01 lowers utility by about 0.23, which still hurts the brand's competitiveness against the other two brands.
The Income parameters for both Stonyfield and Yoplait are negative, meaning that, holding everything else the same, households with higher income tend to prefer Dannon over either brand. The effect is weaker for Stonyfield (-0.126) than for Yoplait (-0.219), so between those two brands higher income favors Stonyfield.
The HHSize parameter for Stonyfield is positive and significant at the 5% level, meaning that, holding everything else constant, larger households tend to prefer Stonyfield over Dannon. The parameter for Yoplait is not statistically different from zero (p = 0.40), so household size does not appear to shift preference between Dannon and Yoplait.
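To make the interpretation concrete, the sketch below plugs the estimated coefficients into the MNL choice-probability formula for the first observation shown by head(yogurtdata), then repeats the calculation after raising Dannon's price by 0.01. It uses only base R. The helper choice_probs() and the vector names are hypothetical and not part of mlogit; the coefficient values are copied by hand from the summary output.

```r
# Coefficients copied from summary(ml); Dannon is the reference alternative.
b <- c(int_S = 1.572326, int_Y = 2.848940,
       feature = 0.371186, price = -23.480763,
       inc_S = -0.125584, inc_Y = -0.218509,
       hh_S = 0.265701, hh_Y = -0.096554)

# Hypothetical helper (not an mlogit function): MNL probabilities for one
# choice occasion, given named price and feature vectors for the three brands.
choice_probs <- function(price, feature, income, hhsize, b) {
  v <- c(
    Dannon = b[["feature"]] * feature[["Dannon"]] +
      b[["price"]] * price[["Dannon"]],
    Stonyfield = b[["int_S"]] +
      b[["feature"]] * feature[["Stonyfield"]] +
      b[["price"]] * price[["Stonyfield"]] +
      b[["inc_S"]] * income + b[["hh_S"]] * hhsize,
    Yoplait = b[["int_Y"]] +
      b[["feature"]] * feature[["Yoplait"]] +
      b[["price"]] * price[["Yoplait"]] +
      b[["inc_Y"]] * income + b[["hh_Y"]] * hhsize
  )
  exp(v) / sum(exp(v))  # logit (softmax) transformation of the utilities
}

# First row of the data: Income = 9, HHSize = 2, no brand featured.
price   <- c(Dannon = 0.061, Stonyfield = 0.108, Yoplait = 0.081)
feature <- c(Dannon = 0, Stonyfield = 0, Yoplait = 0)

p_before <- choice_probs(price, feature, income = 9, hhsize = 2, b)

# Raise Dannon's price by 0.01 and recompute.
price_up <- price
price_up[["Dannon"]] <- price[["Dannon"]] + 0.01
p_after <- choice_probs(price_up, feature, income = 9, hhsize = 2, b)

print(round(rbind(before = p_before, after = p_after), 3))
```

Comparing the two rows shows how a small price change moves choice probability from Dannon to the other two brands, which is easier to read than the raw coefficient of -23.48.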
In the above model, all brands are constrained to have the same price parameter. Re-estimate the model, but this time allow the price parameter to be brand specific, that is, different across brands.
ydatanew = yogurtdata
ylnew = mlogit.data(ydatanew[,-c("Index" )], shape="wide",
choice="Choice", id="PanID", varying=1:6)
fnew <- mFormula(Choice ~ Feature | Income + HHSize | Price)
# Estimate the model (yl and ylnew are built from the same data, so either one can be passed here)
mlnew <- mlogit(fnew, yl, reflevel="Dannon")
summary(mlnew)
##
## Call:
## mlogit(formula = Choice ~ Feature | Income + HHSize | Price,
## data = yl, reflevel = "Dannon", method = "nr")
##
## Frequencies of alternatives:
## Dannon Stonyfield Yoplait
## 0.33687 0.33080 0.33232
##
## nr method
## 4 iterations, 0h:0m:0s
## g'(-H)^-1g = 2.79E-07
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## Stonyfield:(intercept) 0.359542 0.765243 0.4698 0.6384689
## Yoplait:(intercept) 1.707557 1.018121 1.6772 0.0935103 .
## Feature 0.325126 0.211679 1.5359 0.1245541
## Stonyfield:Income -0.139728 0.031674 -4.4114 1.027e-05 ***
## Yoplait:Income -0.229798 0.031981 -7.1854 6.701e-13 ***
## Stonyfield:HHSize 0.293239 0.118332 2.4781 0.0132082 *
## Yoplait:HHSize -0.074036 0.117278 -0.6313 0.5278521
## Dannon:Price -42.594788 11.100031 -3.8374 0.0001244 ***
## Stonyfield:Price -21.209482 4.115792 -5.1532 2.561e-07 ***
## Yoplait:Price -21.622927 9.535203 -2.2677 0.0233478 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -631.07
## McFadden R^2: 0.12831
## Likelihood ratio test : chisq = 185.79 (p.value = < 2.22e-16)
prob = predict(ml,yl)
probnew=predict(mlnew,ylnew)
colMeans(prob)
## Dannon Stonyfield Yoplait
## 0.3368741 0.3308042 0.3323217
colMeans(probnew)
## Dannon Stonyfield Yoplait
## 0.3368741 0.3308042 0.3323217
Q: Compare the above two models on the following:
- First, based on the price parameters, do you think it makes sense to constrain them to be the same across the three brands?
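Setting model fit aside, it makes more sense not to constrain the price parameters to be the same across the three brands, because it is unlikely that consumers have the same marginal utility of price for every brand in a product category. The estimates point the same way: Dannon's price coefficient (-42.59) is roughly twice those of Stonyfield (-21.21) and Yoplait (-21.62), although the brand-specific estimates have large standard errors.
The first model has a larger share of statistically significant coefficients (its intercepts are significant, while those of the second model are not), but that advantage is mostly theoretical. In practice it is better to allow different price parameters across the brands.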
AIC(ml)
## [1] 1281.567
AIC(mlnew)
## [1] 1282.146
Looking at the AIC values, it is a bit difficult to compare model fit because they are so close to each other. A lower AIC indicates a better fit after penalizing the number of parameters, but the difference here is only about 0.6 (1281.567 vs. 1282.146), slightly in favor of the first model. A second, rougher way to choose between the two is to see which model has more statistically significant coefficients.
But there is no right or wrong answer here, since forcing the same price coefficient on three different brands is also hard to justify.
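Because the first model is the second one with the three price coefficients restricted to be equal, the two models are nested, so a likelihood-ratio test is another way to compare them. The sketch below is an addition to the original answer. It uses only base R, with the log-likelihoods and parameter counts copied by hand from the two summaries above, and the variable names are illustrative.

```r
# Log-likelihoods and parameter counts read off the two summary() outputs.
ll_restricted   <- -632.78  # common price coefficient, 8 parameters
ll_unrestricted <- -631.07  # brand-specific price coefficients, 10 parameters
k_restricted    <- 8
k_unrestricted  <- 10

# AIC = 2k - 2 * logLik; this matches AIC(ml) and AIC(mlnew) up to
# rounding of the printed log-likelihoods.
aic_restricted   <- 2 * k_restricted   - 2 * ll_restricted
aic_unrestricted <- 2 * k_unrestricted - 2 * ll_unrestricted

# Likelihood-ratio statistic, chi-squared under the null hypothesis of
# equal price coefficients, with df = number of restrictions.
lr_stat <- 2 * (ll_unrestricted - ll_restricted)
df      <- k_unrestricted - k_restricted
p_value <- pchisq(lr_stat, df = df, lower.tail = FALSE)

print(c(AIC_restricted = aic_restricted, AIC_unrestricted = aic_unrestricted,
        LR = lr_stat, df = df, p = p_value))
```

With these inputs the statistic is 3.42 on 2 degrees of freedom, below the 5% critical value of 5.99 (p is about 0.18). Under the stated assumptions, the data do not reject a common price coefficient, which is consistent with the nearly identical AIC values.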