An automotive company is trying to determine what type of engine the car should have. The questions include:
1) Do customers like or dislike hybrid engines?
2) If they like them, how much more are they willing to pay?
3) Are there segments of customers who like hybrid engines more than other customers?
cbc.df <- read.csv("http://goo.gl/5xQObB", colClasses = c(seat = "factor", price = "factor" ))
summary(cbc.df)
## resp.id ques alt carpool seat cargo
## Min. : 1.00 Min. : 1 Min. :1 no :6345 6:3024 2ft:4501
## 1st Qu.: 50.75 1st Qu.: 4 1st Qu.:1 yes:2655 7:2993 3ft:4499
## Median :100.50 Median : 8 Median :2 8:2983
## Mean :100.50 Mean : 8 Mean :2
## 3rd Qu.:150.25 3rd Qu.:12 3rd Qu.:3
## Max. :200.00 Max. :15 Max. :3
## eng price choice
## elec:3010 30:2998 Min. :0.0000
## gas :3005 35:2997 1st Qu.:0.0000
## hyb :2985 40:3005 Median :0.0000
## Mean :0.3333
## 3rd Qu.:1.0000
## Max. :1.0000
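Data summary: 200 respondents completed the survey. Each respondent answered 15 questions, and each question offered 3 alternatives. Each alternative is described by 4 product attributes (seat, cargo, eng, and price). The carpool variable describes the respondent, not the product: its counts (6345 and 2655) are whole multiples of the 45 rows per respondent.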
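As a quick sanity check, the design dimensions can be multiplied out and compared with the summary. This is only a sketch: the three numbers are typed in from the summary output above, not computed from cbc.df.

```r
# Design dimensions read off summary(cbc.df) (typed in, not computed from the data)
n_resp <- 200   # respondents (max of resp.id)
n_ques <- 15    # questions per respondent (max of ques)
n_alt  <- 3     # alternatives per question (max of alt)

n_rows    <- n_resp * n_ques * n_alt   # one row per alternative shown
n_choices <- n_resp * n_ques           # exactly one alternative chosen per question

n_rows               # expected number of rows in cbc.df
n_choices / n_rows   # expected mean of the choice column
```

The expected row count should match the factor level counts in the summary (for example 6345 + 2655 for carpool), and the expected mean of choice should match the reported 0.3333.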
xtabs(choice ~ carpool, data=cbc.df)
## carpool
## no yes
## 2115 885
xtabs(choice ~ seat, data=cbc.df)
## seat
## 6 7 8
## 1164 854 982
xtabs(choice ~ cargo, data=cbc.df)
## cargo
## 2ft 3ft
## 1312 1688
xtabs(choice ~ eng, data=cbc.df)
## eng
## elec gas hyb
## 608 1444 948
xtabs(choice ~ price, data=cbc.df)
## price
## 30 35 40
## 1486 956 558
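Raw choice counts are easier to compare as rates. The sketch below divides the number of times each engine type was chosen by the number of times it was shown. Both vectors are copied by hand from the xtabs and summary output above, so this illustrates the arithmetic only and does not replace working from cbc.df.

```r
# Times each engine type was chosen (from xtabs(choice ~ eng))
eng_chosen <- c(elec = 608, gas = 1444, hyb = 948)
# Times each engine type was shown (from summary(cbc.df))
eng_shown  <- c(elec = 3010, gas = 3005, hyb = 2985)

# Share of all choices going to each engine type
eng_share <- prop.table(eng_chosen)
# Probability that an alternative with this engine was chosen when shown
eng_rate  <- eng_chosen / eng_shown

round(rbind(share = eng_share, rate = eng_rate), 3)
```

Because the levels were shown almost equally often, the two measures are close. These are descriptive counts only: they do not control for the other attributes shown alongside each engine, which is what the model below does.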
library(mlogit)
## Loading required package: Formula
## Loading required package: maxLik
## Loading required package: miscTools
##
## Please cite the 'maxLik' package as:
## Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
##
## If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
## https://r-forge.r-project.org/projects/maxlik/
cbc.mlogit <- mlogit.data(data=cbc.df, choice="choice", shape="long", varying=3:6, alt.levels=paste("pos", 1:3), id.var="resp.id")
cbc.ml <- mlogit(choice ~ 0 + seat + cargo + eng + price, data = cbc.mlogit)
summary(cbc.ml)
##
## Call:
## mlogit(formula = choice ~ 0 + seat + cargo + eng + price, data = cbc.mlogit,
## method = "nr", print.level = 0)
##
## Frequencies of alternatives:
## pos 1 pos 2 pos 3
## 0.32700 0.33467 0.33833
##
## nr method
## 5 iterations, 0h:0m:0s
## g'(-H)^-1g = 7.84E-05
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error t-value Pr(>|t|)
## seat7 -0.535280 0.062360 -8.5837 < 2.2e-16 ***
## seat8 -0.305840 0.061129 -5.0032 5.638e-07 ***
## cargo3ft 0.477449 0.050888 9.3824 < 2.2e-16 ***
## enggas 1.530762 0.067456 22.6926 < 2.2e-16 ***
## enghyb 0.719479 0.065529 10.9796 < 2.2e-16 ***
## price35 -0.913656 0.060601 -15.0765 < 2.2e-16 ***
## price40 -1.725851 0.069631 -24.7856 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -2581.6
The coefficient estimates are part-worths: each one measures the preference for a level relative to the base level of its attribute. Take seat7 as an example: compared with seat6, seat7 is less popular (negative coefficient); similarly, cargo3ft is more popular than cargo2ft. The parameter estimates are on the logit scale and typically range between -2 and 2. The larger the absolute value, the stronger the liking (positive) or dislike (negative).
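One way to read logit-scale values is to exponentiate them. In a multinomial logit model, if two alternatives differ only in one attribute, the ratio of their choice probabilities is exp() of the difference in part-worths. The sketch below applies this to the estimates, which are copied by hand from the summary(cbc.ml) output above.

```r
# Part-worths copied from summary(cbc.ml); base levels are seat6, cargo2ft, elec, price30
beta <- c(seat7 = -0.535280, seat8 = -0.305840, cargo3ft = 0.477449,
          enggas = 1.530762, enghyb = 0.719479,
          price35 = -0.913656, price40 = -1.725851)

# Ratio of choice probabilities: named level versus the base level, all else equal
prob_ratio <- exp(beta)
round(prob_ratio, 2)
```

A ratio below 1 means the level is chosen less often than the base level, and a ratio above 1 means it is chosen more often.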
The 0 in the model formula means that no intercept is included. If we remove the 0, the model adds two intercept parameters that capture preference for the position of an alternative in the question (left, middle, or right). We usually do not care which position customers prefer, but if the intercepts are included and turn out to be significant, it may suggest that some respondents simply picked the answer in a certain position.
Rather than presenting the coefficients directly, analysts usually make choice share predictions or compute willingness-to-pay for each attribute.
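A choice share prediction can be sketched with the multinomial logit formula: the share of each alternative is exp(utility) divided by the sum of exp(utility) over the lineup. In the example below, the coefficients are copied by hand from summary(cbc.ml), and the three-car lineup is hypothetical, made up purely for illustration.

```r
# Part-worths copied from summary(cbc.ml) above
beta <- c(seat7 = -0.535280, seat8 = -0.305840, cargo3ft = 0.477449,
          enggas = 1.530762, enghyb = 0.719479,
          price35 = -0.913656, price40 = -1.725851)

# Hypothetical lineup of three designs (assumption, not from the source data)
lineup <- data.frame(
  seat  = factor(c("7", "6", "8"),       levels = c("6", "7", "8")),
  cargo = factor(c("2ft", "3ft", "3ft"), levels = c("2ft", "3ft")),
  eng   = factor(c("hyb", "gas", "elec"), levels = c("elec", "gas", "hyb")),
  price = factor(c("30", "35", "40"),    levels = c("30", "35", "40"))
)

# Dummy-code the lineup with treatment contrasts and drop the intercept column
X <- model.matrix(~ seat + cargo + eng + price, data = lineup)[, -1]

# Utility of each design, then logit shares across the lineup
utility <- as.vector(X %*% beta[colnames(X)])
share   <- exp(utility) / sum(exp(utility))

cbind(lineup, share = round(share, 3))
```

The shares always sum to 1 across the lineup, so they describe preference among the listed designs only, not market size.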
cbc2.ml <- mlogit(choice ~ 0 + seat + cargo + eng + as.numeric(as.character(price)), data = cbc.mlogit)
summary(cbc2.ml)
##
## Call:
## mlogit(formula = choice ~ 0 + seat + cargo + eng + as.numeric(as.character(price)),
## data = cbc.mlogit, method = "nr", print.level = 0)
##
## Frequencies of alternatives:
## pos 1 pos 2 pos 3
## 0.32700 0.33467 0.33833
##
## nr method
## 5 iterations, 0h:0m:0s
## g'(-H)^-1g = 8E-05
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error t-value Pr(>|t|)
## seat7 -0.5345392 0.0623518 -8.5730 < 2.2e-16
## seat8 -0.3061074 0.0611184 -5.0084 5.488e-07
## cargo3ft 0.4766936 0.0508632 9.3721 < 2.2e-16
## enggas 1.5291247 0.0673982 22.6879 < 2.2e-16
## enghyb 0.7183908 0.0654963 10.9684 < 2.2e-16
## as.numeric(as.character(price)) -0.1733053 0.0069398 -24.9726 < 2.2e-16
##
## seat7 ***
## seat8 ***
## cargo3ft ***
## enggas ***
## enghyb ***
## as.numeric(as.character(price)) ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -2582.1
This model converts price to a numeric variable, and the negative coefficient suggests that people prefer lower prices to higher prices.
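Because the numeric-price model is a restricted version of the factor-price model, the two can be compared with a likelihood ratio test. The sketch below uses only the rounded log-likelihoods and coefficient counts printed in the two summaries above, so the statistic is approximate. With the fitted objects available, the same comparison could be run on the models directly.

```r
# Values read off the two summaries above (rounded as printed)
ll_factor <- -2581.6; k_factor <- 7   # cbc.ml: price as a factor
ll_linear <- -2582.1; k_linear <- 6   # cbc2.ml: price as a number

lr_stat <- 2 * (ll_factor - ll_linear)   # likelihood ratio statistic
df_diff <- k_factor - k_linear           # number of restrictions
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

c(statistic = lr_stat, df = df_diff, p.value = p_value)
```

With these rounded inputs the statistic is about 1 on 1 degree of freedom, well below the 5% critical value of 3.84, so treating price as linear does not appear to fit noticeably worse.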
coef(cbc2.ml)["cargo3ft"]/(-coef(cbc2.ml)["as.numeric(as.character(price))"]/1000)
## cargo3ft
## 2750.601
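This result suggests that, on average, customers are indifferent between the two cargo capacities when the 3ft option costs $2750.60 more than the 2ft option. In other words, $2750.60 is the willingness-to-pay for the larger cargo space.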
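The same ratio can be computed for every attribute level, which bears on the second question at the top (how much more customers would pay for a hybrid). This is a sketch using coefficients copied by hand from summary(cbc2.ml), with the price coefficient renamed to price for readability. Price is coded in thousands of dollars, hence the factor of 1000.

```r
# Coefficients copied from summary(cbc2.ml); "price" stands for
# as.numeric(as.character(price)), measured in thousands of dollars
beta2 <- c(seat7 = -0.5345392, seat8 = -0.3061074, cargo3ft = 0.4766936,
           enggas = 1.5291247, enghyb = 0.7183908, price = -0.1733053)

# Willingness-to-pay in dollars for each level relative to its base level
wtp <- beta2[names(beta2) != "price"] / -beta2["price"] * 1000
round(wtp)

# Hybrid relative to gas instead of the electric base level
wtp_hyb_vs_gas <- unname((beta2["enghyb"] - beta2["enggas"]) / -beta2["price"] * 1000)
round(wtp_hyb_vs_gas)
```

A negative value means respondents would need a discount of that size to be indifferent between the level and its reference. The cargo3ft entry should match the figure computed above up to the rounding of the hand-copied coefficients.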
References: R for Marketing Research and Analytics by Chris Chapman & Elea McDonnell Feit.