Conjoint analysis is a survey-based technique
for identifying how customers value the attributes that make up an
individual product. Products and services are bundles of features that
customers consider jointly, so they must make trade-offs when making a
purchase decision. Conjoint analysis helps companies and business
owners determine the importance of different features and their monetary value.
It also helps them identify the optimal price for their products and
services.
Basic Setup
In this example, we will see how to design a survey that collects the necessary information, how to analyze and quantify the importance of the various features of a product, and how to interpret the results. But first, let’s begin by setting up the R environment and loading the required libraries.
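library(tidyverse)                   # dplyr and ggplot2
library(DoE.base)                    # oa.design()
library(conjoint)                    # caFactorialDesign(), caImportance(), caPartUtilities()
library(knitr); library(kableExtra)  # kable(), kable_styling()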
Experiment Design
We want to know which car manufacturer and car type respondents find most important when choosing a car to buy.
### Experiment design for the conjoint questions
# Identify the number of questions required from the number of levels per factor
NumberOfQuestions = nrow(oa.design(nlevels=c(7,2,2)))
creating full factorial with 28 runs ...
# create dummy data
bettervehicle = expand.grid(
car_manufacturer = c('chevrolet','dodge','honda','lincoln','mercury','pontiac','toyota'),
car_type= c('compact','suv','twoseater','midsize','minivan','pickup','subcompact'))
# combinations to include in the survey
selectedComb = caFactorialDesign(
data = bettervehicle,
type = 'fractional',
cards = NumberOfQuestions)
# print select combinations
kable(selectedComb) %>%
kable_styling(bootstrap_options = c('striped','hover','condensed','responsive'))
row | car_manufacturer | car_type |
---|---|---|
4 | lincoln | compact |
5 | mercury | compact |
6 | pontiac | compact |
7 | toyota | compact |
8 | chevrolet | suv |
9 | dodge | suv |
12 | mercury | suv |
13 | pontiac | suv |
15 | chevrolet | twoseater |
17 | honda | twoseater |
18 | lincoln | twoseater |
21 | toyota | twoseater |
23 | dodge | midsize |
24 | honda | midsize |
27 | pontiac | midsize |
28 | toyota | midsize |
29 | chevrolet | minivan |
31 | honda | minivan |
32 | lincoln | minivan |
34 | pontiac | minivan |
37 | dodge | pickup |
38 | honda | pickup |
39 | lincoln | pickup |
40 | mercury | pickup |
43 | chevrolet | subcompact |
44 | dodge | subcompact |
47 | mercury | subcompact |
49 | toyota | subcompact |
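A quick sanity check on a fractional design is whether every level appears equally often. The sketch below rebuilds the 49-row grid in base R and picks out the 28 row numbers listed in the table above. The names `grid`, `selected_rows`, and `selected` are introduced here for illustration only and are not part of the original analysis.

```r
# Full factorial: 7 manufacturers x 7 car types = 49 profiles
grid <- expand.grid(
  car_manufacturer = c("chevrolet", "dodge", "honda", "lincoln",
                       "mercury", "pontiac", "toyota"),
  car_type = c("compact", "suv", "twoseater", "midsize",
               "minivan", "pickup", "subcompact")
)

# Row numbers copied from the table of selected combinations
selected_rows <- c(4, 5, 6, 7, 8, 9, 12, 13, 15, 17, 18, 21, 23, 24, 27, 28,
                   29, 31, 32, 34, 37, 38, 39, 40, 43, 44, 47, 49)
selected <- grid[selected_rows, ]

# Count how often each level appears among the selected cards
manufacturer_counts <- table(selected$car_manufacturer)
type_counts <- table(selected$car_type)
print(manufacturer_counts)
print(type_counts)
```

Each manufacturer and each car type appears on 4 of the 28 cards, so no level is over- or under-represented in the survey.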
Predicting Responses for Other Combinations
Using a fractional factorial design, we selected 28 of the 49 possible combinations to include in the customer survey, which saves time, money, and effort. Survey takers were asked whether or not they liked each of the 28 combinations, and their responses were recorded. We will use these 28 responses to train a logistic regression model and then use it to predict the responses for the remaining 21 combinations.
# logistic regression
# (selectedComb must already hold the 0/1 survey answers in a Response column)
logisticmodel=glm(Response ~ factor(car_manufacturer) + factor(car_type),
family=binomial(link='logit'), data=selectedComb)
logisticmodel
Call: glm(formula = Response ~ factor(car_manufacturer) + factor(car_type),
family = binomial(link = "logit"), data = selectedComb)
Coefficients:
(Intercept)                          37.17375406612627131
factor(car_manufacturer)dodge       -18.66970866954126862
factor(car_manufacturer)honda       -24.18860483890588853
factor(car_manufacturer)lincoln     -56.42542531667980654
factor(car_manufacturer)mercury     -37.17375406612629263
factor(car_manufacturer)pontiac     -37.17375406612627131
factor(car_manufacturer)toyota        0.00000000000000367
factor(car_type)suv                   0.00000000000000819
factor(car_type)twoseater           -37.17375406612627131
factor(car_type)midsize              19.79288025159280195
factor(car_type)minivan             -56.83058479194890822
factor(car_type)pickup               39.00537632897358975
factor(car_type)subcompact          -37.17375406612627842
Degrees of Freedom: 27 Total (i.e. Null); 15 Residual
Null Deviance: 38.7
Residual Deviance: 11.1 AIC: 37.1
# predict
bettervehicle$response = ifelse(predict(logisticmodel,bettervehicle,type="response") > 0.5,1,0)
bettervehicle$response
[1] 1 1 1 0 0 0 1 1 1 1 0 0 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0
[45] 0 0 0 0 0
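The train-on-surveyed, predict-the-rest workflow can be sketched end to end in base R. This is a minimal illustration under stated assumptions: the cyclic rule that picks 28 rows is a hypothetical stand-in for the caFactorialDesign selection, and the like/dislike answers are simulated from made-up preferences, not taken from the survey.

```r
# Full factorial of the two attributes (7 x 7 = 49 profiles)
grid <- expand.grid(
  car_manufacturer = c("chevrolet", "dodge", "honda", "lincoln",
                       "mercury", "pontiac", "toyota"),
  car_type = c("compact", "suv", "twoseater", "midsize",
               "minivan", "pickup", "subcompact")
)

# Hypothetical stand-in for the survey design: a balanced cyclic pick of 28 rows
# (each manufacturer and each car type appears exactly 4 times)
m <- as.integer(grid$car_manufacturer)
k <- as.integer(grid$car_type)
surveyed <- ((m + k) %% 7) < 4

# Simulated like (1) / dislike (0) answers; these are NOT the survey results
set.seed(42)
type_effect <- c(compact = 0, suv = 0.5, twoseater = -1, midsize = 1,
                 minivan = -1, pickup = 1, subcompact = -1)
p_like <- plogis(type_effect[as.character(grid$car_type)])
grid$Response <- NA_integer_
grid$Response[surveyed] <- rbinom(sum(surveyed), size = 1, prob = p_like[surveyed])

train <- grid[surveyed, ]     # the 28 surveyed profiles
holdout <- grid[!surveyed, ]  # the 21 profiles nobody was asked about

# With only 28 binary answers and 13 coefficients, glm() may warn about
# fitted probabilities of 0 or 1 (separation); the fit is still returned
fit <- glm(Response ~ car_manufacturer + car_type,
           family = binomial(link = "logit"), data = train)

# Classify each unsurveyed profile as liked (1) or not (0) at a 0.5 cutoff
holdout$predicted <- as.integer(
  predict(fit, newdata = holdout, type = "response") > 0.5
)
head(holdout[, c("car_manufacturer", "car_type", "predicted")])
```

Very large coefficients, such as the values near 37 in the model output above, are typical of this kind of separation, so the predicted classes are probably more trustworthy than the individual coefficient estimates.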
Conjoint Analysis
Now that we have all the data we need, let’s run a conjoint analysis and summarize the importance of the various features for customers.
Call:
lm(formula = frml)
Residuals:
Min 1Q Median 3Q Max
-0,6364 -0,2245 0,0612 0,2041 0,6327
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value
(Intercept) 0,50811688311688263 0,02847261231624853 17,85
factor(x$car_manufacturer)1 -3,44318181818182811 0,51981640713137645 -6,62
factor(x$car_manufacturer)2 0,57351576994434250 0,10435622599741000 5,50
factor(x$car_manufacturer)3 0,57351576994434050 0,10435622599740983 5,50
factor(x$car_manufacturer)4 0,57351576994434172 0,10435622599740989 5,50
factor(x$car_manufacturer)5 0,28780148423005619 0,10435622599741004 2,76
factor(x$car_manufacturer)6 0,28780148423005719 0,10435622599741004 2,76
factor(x$car_manufacturer)7 0,43065862708719999 0,10435622599741008 4,13
factor(x$car_type)1 3,57142857142857872 0,51323656942362439 6,96
factor(x$car_type)2 -0,42857142857143121 0,11199740137158185 -3,83
factor(x$car_type)3 -0,28571428571428786 0,11199740137158190 -2,55
factor(x$car_type)4 -0,85714285714285821 0,11199740137158173 -7,65
factor(x$car_type)5 -0,00000000000000116 0,11199740137158175 0,00
factor(x$car_type)6 -1,00000000000000111 0,11199740137158172 -8,93
factor(x$car_type)7 NA NA NA
Pr(>|t|)
(Intercept) < 0,0000000000000002 ***
factor(x$car_manufacturer)1 0,000000002089818 ***
factor(x$car_manufacturer)2 0,000000325636423 ***
factor(x$car_manufacturer)3 0,000000325636423 ***
factor(x$car_manufacturer)4 0,000000325636423 ***
factor(x$car_manufacturer)5 0,00698 **
factor(x$car_manufacturer)6 0,00698 **
factor(x$car_manufacturer)7 0,000078922487768 ***
factor(x$car_type)1 0,000000000438388 ***
factor(x$car_type)2 0,00023 ***
factor(x$car_type)3 0,01234 *
factor(x$car_type)4 0,000000000016081 ***
factor(x$car_type)5 1,00000
factor(x$car_type)6 0,000000000000032 ***
factor(x$car_type)7 NA
---
Signif. codes: 0 ‘***’ 0,001 ‘**’ 0,01 ‘*’ 0,05 ‘.’ 0,1 ‘ ’ 1
Residual standard error: 0,296 on 95 degrees of freedom
Multiple R-squared: 0,694, Adjusted R-squared: 0,652
F-statistic: 16,6 on 13 and 95 DF, p-value: <0,0000000000000002
[1] "Part worths (utilities) of levels (model parameters for whole sample):"
[1] "Average importance of factors (attributes):"
[1] 30 70
[1] Sum of average importance: 100
[1] "Chart of average factors importance"
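The summary above comes from a linear model, and part-worths are commonly estimated as effects-coded regression coefficients. The base R sketch below shows the idea on a tiny example. It assumes sum-to-zero coding, which is my reading of the output rather than something the source states, and the ratings are hypothetical numbers chosen to be exactly additive.

```r
# Tiny full factorial: 3 manufacturers x 2 car types
profiles <- expand.grid(
  car_manufacturer = c("chevrolet", "dodge", "honda"),
  car_type = c("compact", "suv")
)
# Hypothetical ratings, chosen to be exactly additive (not survey data)
profiles$rating <- c(5, 3, 4, 8, 6, 7)

# Sum-to-zero (effects) coding: each coefficient is a deviation from the grand mean
fit <- lm(rating ~ car_manufacturer + car_type, data = profiles,
          contrasts = list(car_manufacturer = "contr.sum", car_type = "contr.sum"))
co <- coef(fit)

# The last level of each attribute is minus the sum of the estimated ones
est_m <- co[c("car_manufacturer1", "car_manufacturer2")]
pw_manufacturer <- c(est_m, -sum(est_m))
names(pw_manufacturer) <- levels(profiles$car_manufacturer)

pw_type <- c(co["car_type1"], -co["car_type1"])
names(pw_type) <- levels(profiles$car_type)

print(round(pw_manufacturer, 3))
print(round(pw_type, 3))
```

In this toy data the intercept is the grand mean of 5.5, the manufacturer part-worths are 1, -1, and 0, and the car type part-worths are -1.5 and 1.5. Each set sums to zero, which is what makes part-worths comparable across attributes.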
Feature Importance
The feature importance plot
shows that vehicle buyers care more about the type of car (about 70%
importance) than about the manufacturer (about 30%). This makes sense,
as the car type tells them how well the vehicle might suit their needs.
# Feature Importance
Importance2 = data.frame(Feature = c('car_manufacturer','car_type'),
Importance=caImportance(y=Survey_b,
x=bettervehicle[,1:2]))
Importance2
ggplot(Importance2,aes(reorder(Feature,-Importance),Importance))+
geom_bar(stat='identity',width=.8,fill='orange3')+
ggtitle('Importance of different features')+
xlab('')+theme_classic()+coord_flip()
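A common way to turn part-worths into attribute importance is to take the range of part-worths within each attribute and express it as a share of the total range. The sketch below illustrates that formula with hypothetical part-worths. Whether caImportance computes exactly this (for example, per respondent and then averaged) is an assumption to check against the package documentation.

```r
# Hypothetical part-worths for two attributes (illustrative numbers only)
part_worths <- list(
  car_manufacturer = c(chevrolet = 0.15, dodge = -0.15, honda = 0.00),
  car_type = c(compact = -0.40, suv = 0.10, pickup = 0.30)
)

# Range of part-worths within each attribute
ranges <- sapply(part_worths, function(u) max(u) - min(u))

# Importance = share of the total range, in percent
importance <- round(100 * ranges / sum(ranges), 1)
print(importance)
```

With these made-up numbers the ranges are 0.3 and 0.7, giving importances of 30 and 70. An attribute is important when switching between its best and worst level moves utility a lot.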
Feature Utilities
For vehicle options, we now
understand that car type is the key factor, although the manufacturer
still carries some importance. But which car type do consumers value
most when deciding to buy? We can answer this by comparing the levels
of each feature one at a time.
# summarize utilities
util = data.frame(Utilities = t(data.frame(
caPartUtilities(y=Survey, x=bettervehicle[,1:2], z=levels))))
util$levels = row.names(util)
util$levels
[1] "intercept" "chevrolet" "dodge" "honda" "lincoln" "mercury"
[7] "pontiac" "toyota" "compact" "suv" "twoseater" "midsize"
[13] "minivan" "pickup" "subcompact"
util %>%
arrange(desc(Utilities))
# Type
ggplot(data = util[which(util$levels %in% c('compact','suv','twoseater','midsize','minivan', 'pickup','subcompact')),], aes(x = levels, y = Utilities,fill=Utilities)) +
geom_bar(stat='identity') +
ggtitle("Utilities of Car Classes") + xlab("") + coord_flip() +
geom_hline(yintercept = 0, lty=2, col='black', linewidth=.7) +
scale_fill_gradient(high='cyan', low='black') +
# theme_classic() goes first so it does not overwrite the tweaks; vjust belongs inside element_text()
theme_classic() + theme(plot.title = element_text(size=14, vjust=1), legend.position = 'bottom')
ggplot(data = util[which(util$levels %in% c('compact','suv','midsize','pickup')),], aes(x = levels, y = Utilities,fill=Utilities)) +
geom_bar(stat='identity') +
ggtitle("Utilities of Car Classes") + xlab("") + coord_flip() +
geom_hline(yintercept = 0, lty=2, col='black', linewidth=.7) +
scale_fill_gradient(high='cyan', low='black') +
# theme_classic() goes first so it does not overwrite the tweaks; vjust belongs inside element_text()
theme_classic() + theme(plot.title = element_text(size=14, vjust=1), legend.position = 'bottom')
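Comparing levels one feature at a time works because conjoint utilities are additive: a profile's total utility is the intercept plus one part-worth per attribute. The sketch below ranks every combination on that basis. The intercept and part-worths are hypothetical values chosen for illustration, not the estimates from this survey.

```r
# Hypothetical intercept and part-worths (each set sums to zero)
intercept <- 0.5
pw_manufacturer <- c(chevrolet = 0.10, dodge = -0.05, honda = 0.05, toyota = -0.10)
pw_type <- c(compact = 0.05, suv = 0.20, pickup = 0.50, minivan = -0.75)

# Every manufacturer / car type combination
profiles <- expand.grid(car_manufacturer = names(pw_manufacturer),
                        car_type = names(pw_type),
                        stringsAsFactors = FALSE)

# Total utility = intercept + manufacturer part-worth + car type part-worth
profiles$utility <- intercept +
  unname(pw_manufacturer[profiles$car_manufacturer]) +
  unname(pw_type[profiles$car_type])

# Rank profiles from most to least preferred
ranked <- profiles[order(-profiles$utility), ]
head(ranked, 3)
```

With these numbers the top profile is a chevrolet pickup with a utility of 1.1. The three best profiles are all pickups, which is what a dominant car type attribute looks like in a ranking.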
Findings
After comparing the levels of each feature in our utility summary,
it’s clear from the survey results that most
respondents choose a car based on its type or general
design rather than its manufacturer. In addition, we
found that 51% of respondents who bought a vehicle
did so because it was a pickup truck, while another 51%
did so because it was midsize. Meanwhile, 22.4%
purchased the vehicle because it was an SUV and fewer than 10% because
it was compact. Respondents gave no weight to two-seater, subcompact, and
minivan car types when choosing a car to buy.
Insights
What this tells us is that we should
focus most of our resources on the continued
production and advertising of pickup trucks and midsize cars, along with
further technological enhancements to them, while still allocating a
share of resources to the production, enhancement, and
advertisement of SUVs.