Analysis Purpose

An automotive company is trying to determine what type of engine its car should have. The questions include:

1) Do customers like or dislike hybrid engines?

2) If they like them, how much more are they willing to pay?

3) Are there segments of customers who like hybrid engines more than other customers?

Load Data

cbc.df <- read.csv("http://goo.gl/5xQObB", colClasses = c(seat = "factor", price = "factor" ))
summary(cbc.df)
##     resp.id            ques         alt    carpool    seat     cargo     
##  Min.   :  1.00   Min.   : 1   Min.   :1   no :6345   6:3024   2ft:4501  
##  1st Qu.: 50.75   1st Qu.: 4   1st Qu.:1   yes:2655   7:2993   3ft:4499  
##  Median :100.50   Median : 8   Median :2              8:2983             
##  Mean   :100.50   Mean   : 8   Mean   :2                                 
##  3rd Qu.:150.25   3rd Qu.:12   3rd Qu.:3                                 
##  Max.   :200.00   Max.   :15   Max.   :3                                 
##    eng       price         choice      
##  elec:3010   30:2998   Min.   :0.0000  
##  gas :3005   35:2997   1st Qu.:0.0000  
##  hyb :2985   40:3005   Median :0.0000  
##                        Mean   :0.3333  
##                        3rd Qu.:1.0000  
##                        Max.   :1.0000

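Data summary: 200 respondents completed the survey. Each survey has 15 questions, and each question offers 3 alternatives. Each alternative is described by 4 product attributes (seat, cargo, eng, and price). The fifth variable, carpool, describes the respondent rather than the product: its choice counts below are exactly one third of its row counts, so it does not vary within a question.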
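The counts in this summary can be checked directly rather than read off the minimum and maximum values. The sketch below uses a hypothetical stand-in, toy.df, built with expand.grid to have the same long layout as cbc.df (one row per respondent, question, and alternative). The same three calls should work on cbc.df itself.

```r
# toy.df is a hypothetical stand-in with the same long layout as cbc.df:
# one row per respondent x question x alternative.
toy.df <- expand.grid(alt = 1:3, ques = 1:15, resp.id = 1:200)

n.resp <- length(unique(toy.df$resp.id))  # number of respondents
n.ques <- length(unique(toy.df$ques))     # questions per respondent
n.alt  <- length(unique(toy.df$alt))      # alternatives per question

c(respondents = n.resp, questions = n.ques, alternatives = n.alt,
  rows = nrow(toy.df))
```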

Check the Raw Choices

xtabs(choice ~ carpool, data=cbc.df)
## carpool
##   no  yes 
## 2115  885
xtabs(choice ~ seat, data=cbc.df)
## seat
##    6    7    8 
## 1164  854  982
xtabs(choice ~ cargo, data=cbc.df)
## cargo
##  2ft  3ft 
## 1312 1688
xtabs(choice ~ eng, data=cbc.df)
## eng
## elec  gas  hyb 
##  608 1444  948
xtabs(choice ~ price, data=cbc.df)
## price
##   30   35   40 
## 1486  956  558
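Raw counts are easier to compare as shares. A minimal sketch, using the engine and price counts printed above (typed in by hand here so the snippet runs on its own); calling prop.table on the xtabs results should give the same shares.

```r
# Counts copied by hand from the xtabs output above.
eng.counts   <- c(elec = 608, gas = 1444, hyb = 948)
price.counts <- c("30" = 1486, "35" = 956, "40" = 558)

# Share of all 3000 choices that went to each level.
eng.share   <- round(prop.table(eng.counts), 3)
price.share <- round(prop.table(price.counts), 3)

eng.share
price.share
```

On raw counts, gas was chosen about 48% of the time, hybrid about 32%, and electric about 20%. Because each engine level was shown roughly equally often (see the summary), these shares are a fair first look, but they do not control for the other attributes the way the model below does.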

Fitting Choice Model

library(mlogit)
## Loading required package: Formula
## Loading required package: maxLik
## Loading required package: miscTools
## 
## Please cite the 'maxLik' package as:
## Henningsen, Arne and Toomet, Ott (2011). maxLik: A package for maximum likelihood estimation in R. Computational Statistics 26(3), 443-458. DOI 10.1007/s00180-010-0217-1.
## 
## If you have questions, suggestions, or comments regarding the 'maxLik' package, please use a forum or 'tracker' at maxLik's R-Forge site:
## https://r-forge.r-project.org/projects/maxlik/
cbc.mlogit <- mlogit.data(data=cbc.df, choice="choice", shape="long", varying=3:6, alt.levels=paste("pos", 1:3), id.var="resp.id")
cbc.ml <- mlogit(choice ~ 0 + seat + cargo + eng + price, data = cbc.mlogit)
summary(cbc.ml)
## 
## Call:
## mlogit(formula = choice ~ 0 + seat + cargo + eng + price, data = cbc.mlogit, 
##     method = "nr", print.level = 0)
## 
## Frequencies of alternatives:
##   pos 1   pos 2   pos 3 
## 0.32700 0.33467 0.33833 
## 
## nr method
## 5 iterations, 0h:0m:0s 
## g'(-H)^-1g = 7.84E-05 
## successive function values within tolerance limits 
## 
## Coefficients :
##           Estimate Std. Error  t-value  Pr(>|t|)    
## seat7    -0.535280   0.062360  -8.5837 < 2.2e-16 ***
## seat8    -0.305840   0.061129  -5.0032 5.638e-07 ***
## cargo3ft  0.477449   0.050888   9.3824 < 2.2e-16 ***
## enggas    1.530762   0.067456  22.6926 < 2.2e-16 ***
## enghyb    0.719479   0.065529  10.9796 < 2.2e-16 ***
## price35  -0.913656   0.060601 -15.0765 < 2.2e-16 ***
## price40  -1.725851   0.069631 -24.7856 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -2581.6

The coefficient estimates give the average preference for each attribute level relative to that attribute's base level. Take seat7 as an example: its negative estimate means that 7 seats is less popular than the base level of 6 seats. Similarly, the positive estimate for cargo3ft means that 3 ft of cargo space is more popular than 2 ft. The estimates are on the logit scale and typically fall between -2 and 2. The larger the absolute value, the stronger the liking (positive) or dislike (negative).
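One way to make the logit scale concrete: for two alternatives that are identical except for one attribute level, the multinomial logit model implies that the share choosing the non-base level is exp(b) / (1 + exp(b)), where b is that level's coefficient. The sketch below applies this to the estimates printed above (copied by hand). The helper name pairwise.share is made up for illustration.

```r
# Estimates copied by hand from summary(cbc.ml) above.
coefs <- c(seat7 = -0.535280, seat8 = -0.305840, cargo3ft = 0.477449,
           enggas = 1.530762, enghyb = 0.719479,
           price35 = -0.913656, price40 = -1.725851)

# Hypothetical helper: share choosing the non-base level in a head-to-head
# comparison with the base level, everything else held equal.
# plogis(b) is base R's logistic function, exp(b) / (1 + exp(b)).
pairwise.share <- function(b) plogis(b)

round(pairwise.share(coefs), 2)
```

A coefficient of 0 maps to a 50/50 split, so a negative estimate such as seat7 gives a share below one half, and a positive estimate such as enghyb gives a share above one half (about two thirds against an otherwise identical electric option).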

The 0 in the model formula means that no intercept is included. Removing the 0 adds intercepts: two additional parameters that capture preference for the position of an alternative within the question (left, middle, or right). We usually do not expect position to matter, but if the intercepts are included and turn out to be significant, it may suggest that some respondents simply picked answers in a particular position.

Report Findings

Rather than presenting the coefficients directly, analysts usually report choice share predictions or compute willingness-to-pay for each attribute.
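A choice share prediction can be sketched with the multinomial logit formula, share_i = exp(u_i) / sum_j exp(u_j), where u_i is the sum of the coefficients for the levels of product i. The function name mnl.shares and the three-product market below are hypothetical, and the coefficients are copied by hand from summary(cbc.ml), so treat this as an illustration rather than the method used in the reference.

```r
# Estimates copied by hand from summary(cbc.ml) above.
coefs <- c(seat7 = -0.535280, seat8 = -0.305840, cargo3ft = 0.477449,
           enggas = 1.530762, enghyb = 0.719479,
           price35 = -0.913656, price40 = -1.725851)

# Attribute levels, base level first, matching the factor coding in the model.
lev <- list(seat = c("6", "7", "8"), cargo = c("2ft", "3ft"),
            eng = c("elec", "gas", "hyb"), price = c("30", "35", "40"))

# Hypothetical helper: predicted shares for a set of competing products.
mnl.shares <- function(coefs, products) {
  # Dummy-code the products as the model did and drop the intercept column.
  X <- model.matrix(~ seat + cargo + eng + price,
                    data = products)[, -1, drop = FALSE]
  utility <- as.vector(X %*% coefs[colnames(X)])
  share <- exp(utility) / sum(exp(utility))
  cbind(products, share = round(share, 3))
}

# Made-up market: three cars that differ only in engine type.
market <- data.frame(
  seat  = factor(c("7", "7", "7"), levels = lev$seat),
  cargo = factor(c("2ft", "2ft", "2ft"), levels = lev$cargo),
  eng   = factor(c("elec", "gas", "hyb"), levels = lev$eng),
  price = factor(c("30", "30", "30"), levels = lev$price)
)

shares <- mnl.shares(coefs, market)
shares
```

Because the three products share every attribute except the engine, the predicted shares depend only on the engine coefficients, which makes the example easy to check by hand.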

cbc2.ml <- mlogit(choice ~ 0 + seat + cargo + eng + as.numeric(as.character(price)), data = cbc.mlogit)
summary(cbc2.ml)
## 
## Call:
## mlogit(formula = choice ~ 0 + seat + cargo + eng + as.numeric(as.character(price)), 
##     data = cbc.mlogit, method = "nr", print.level = 0)
## 
## Frequencies of alternatives:
##   pos 1   pos 2   pos 3 
## 0.32700 0.33467 0.33833 
## 
## nr method
## 5 iterations, 0h:0m:0s 
## g'(-H)^-1g = 8E-05 
## successive function values within tolerance limits 
## 
## Coefficients :
##                                   Estimate Std. Error  t-value  Pr(>|t|)
## seat7                           -0.5345392  0.0623518  -8.5730 < 2.2e-16
## seat8                           -0.3061074  0.0611184  -5.0084 5.488e-07
## cargo3ft                         0.4766936  0.0508632   9.3721 < 2.2e-16
## enggas                           1.5291247  0.0673982  22.6879 < 2.2e-16
## enghyb                           0.7183908  0.0654963  10.9684 < 2.2e-16
## as.numeric(as.character(price)) -0.1733053  0.0069398 -24.9726 < 2.2e-16
##                                    
## seat7                           ***
## seat8                           ***
## cargo3ft                        ***
## enggas                          ***
## enghyb                          ***
## as.numeric(as.character(price)) ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -2582.1

This model converts price to a numeric variable. The negative coefficient indicates that people prefer lower prices to higher ones.
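The double conversion as.numeric(as.character(price)) is needed because price was read in as a factor: calling as.numeric directly on a factor returns the internal level codes, not the prices. A small illustration with a made-up vector, followed by a rough comparison of the linear price effect with the factor estimates from the first model (all values copied by hand from the two summaries above):

```r
# Made-up factor with the same levels as the price column.
price <- factor(c("30", "35", "40", "35"))

as.numeric(price)                # level codes: 1 2 3 2
as.numeric(as.character(price))  # actual prices: 30 35 40 35

# Effect of moving from price 30 to 35 or 40 under each model.
linear.implied <- -0.1733053 * c(price35 = 5, price40 = 10)
factor.est     <- c(price35 = -0.913656, price40 = -1.725851)
round(rbind(linear.implied, factor.est), 3)
```

The two rows are close, which is consistent with the nearly identical log-likelihoods of the two models (-2581.6 and -2582.1) and suggests that treating price as linear loses little here.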

coef(cbc2.ml)["cargo3ft"]/(-coef(cbc2.ml)["as.numeric(as.character(price))"]/1000)
## cargo3ft 
## 2750.601

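This result suggests that, on average, customers become indifferent between the two cargo options when the 3 ft option costs $2,750.60 more than the 2 ft option. In other words, $2,750.60 is the average willingness-to-pay for the extra foot of cargo space.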
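The same ratio can be applied to every attribute level, including the hybrid engine that motivated the analysis. The sketch below copies the cbc2.ml estimates by hand (the short name "price" stands in for the long coefficient label) and assumes, as the calculation above does, that price is recorded in thousands of dollars.

```r
# Estimates copied by hand from summary(cbc2.ml) above; "price" is a
# shortened stand-in for "as.numeric(as.character(price))".
coefs2 <- c(seat7 = -0.5345392, seat8 = -0.3061074, cargo3ft = 0.4766936,
            enggas = 1.5291247, enghyb = 0.7183908, price = -0.1733053)

# Willingness-to-pay in dollars for each level relative to its base level.
# Price is assumed to be in thousands, hence the division by 1000.
wtp <- coefs2[names(coefs2) != "price"] / (-coefs2[["price"]] / 1000)
round(wtp)

# Hybrid relative to gas rather than to the base level (electric).
hyb.vs.gas <- wtp[["enghyb"]] - wtp[["enggas"]]
round(hyb.vs.gas)
```

Under these assumptions, the average respondent values a hybrid engine at roughly $4,100 more than an electric one but roughly $4,700 less than a gas one. These are averages across all respondents and say nothing about differences between customer segments.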

References: R for Marketing Research and Analytics by Chris Chapman & Elea McDonnell Feit.