Conjoint analysis is a survey-based technique for identifying how customers value the various attributes that make up a product. Products and services are bundles of features that customers consider jointly, and when making a purchase decision they must make trade-offs between those features. Conjoint analysis helps companies and business owners determine the relative importance of different features and their monetary value, which in turn helps them identify the optimal price for their products and services.
In this example, we will see how to design a survey that collects the necessary information, how to analyze and quantify the importance of the various features of a product, and how to interpret the results. But first, let's set up the R environment and load the required libraries.
# set working directory (optional; change the path to your own project folder)
# setwd("C:/Users/awani/Documents/GitHub/50daysofAnalytics/Day 10 - Conjoint Analysis")
# load libraries
if (!require("pacman")) install.packages("pacman")
pacman::p_load(conjoint, DoE.base, knitr, dplyr, kableExtra, ggplot2)
options(scipen = 999)
options(digits = 3)
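A small business owner wants to use conjoint analysis to understand which features of his product are most attractive to his customers and how he should price them. He sells chocolates that vary along three attributes, each with three levels: chocolate type (Milk, Dark, Organic), center (Plain, Chewy, Soft) and nuts (Mixed, Almonds, None).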
Let's first figure out how the conjoint survey should be designed. In this example, 27 (3 × 3 × 3) varieties of chocolate can be manufactured, so it might be feasible to present all of them to a customer and ask for their preferences. In a real-world scenario, however, the number of possible combinations is usually enormous, and asking customers to rate every combination would be not only expensive but also inaccurate, as respondents grow fatigued. To avoid this, we can design a survey that asks about only a few combinations and then predict the responses for the rest. This process is often referred to as experimental design.
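As a minimal base-R sketch of what such a design looks like (not necessarily how `caFactorialDesign()` chooses its cards), we can build the full 3 × 3 × 3 factorial with `expand.grid()`, code each level as 0, 1 or 2, and keep only the profiles whose codes sum to a fixed remainder modulo 3. This classic construction yields a 9-run orthogonal array in which every pair of levels from any two attributes appears exactly once. The remainder 2 is an assumption chosen because it reproduces the nine row numbers (3, 5, 7, 11, 13, 18, 19, 24, 26) of the design table shown below; remainders 0 or 1 give equally valid designs.

```r
# Full factorial: all 3 x 3 x 3 = 27 chocolate profiles
full <- expand.grid(Chocolate = c("Milk", "Dark", "Organic"),
                    Center    = c("Plain", "Chewy", "Soft"),
                    Nuts      = c("Mixed", "Almonds", "None"))

# Code each attribute level as 0, 1 or 2 (expand.grid returns factors)
codes <- sapply(full, as.integer) - 1

# Keep profiles whose codes sum to 2 modulo 3 (assumed remainder, see text);
# each pair of levels from two attributes then appears exactly once
fraction <- full[rowSums(codes) %% 3 == 2, ]
print(fraction)

# Pairwise balance check: every cell of this table is 1
table(fraction$Chocolate, fraction$Center)
```

The same check can be repeated for Chocolate × Nuts and Center × Nuts; balance across all three pairs is what lets nine profiles stand in for all 27 when only main effects are of interest.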
### Experimental design for asking conjoint questions
# - identify the number of questions required for three factors with three levels each
NumberOfQuestions = nrow(oa.design(nlevels = c(3, 3, 3)))
# - create the full factorial of all attribute combinations
data = expand.grid(Chocolate = c("Milk", "Dark", "Organic"),
                   Center = c("Plain", "Chewy", "Soft"),
                   Nuts = c("Mixed", "Almonds", "None"))
# - select the combinations to ask customers about
selectedComb = caFactorialDesign(
  data = data,
  type = 'fractional',
  cards = NumberOfQuestions)
# - print the selected combinations
kable(selectedComb) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
|  | Chocolate | Center | Nuts |
|---|---|---|---|
| 3 | Organic | Plain | Mixed |
| 5 | Dark | Chewy | Mixed |
| 7 | Milk | Soft | Mixed |
| 11 | Dark | Plain | Almonds |
| 13 | Milk | Chewy | Almonds |
| 18 | Organic | Soft | Almonds |
| 19 | Milk | Plain | None |
| 24 | Organic | Chewy | None |
| 26 | Dark | Soft | None |
Using an orthogonal fractional factorial design, we identified 9 of the 27 possible combinations to include in the customer survey, which saves time, money and effort. Survey takers were asked whether they liked each combination or not, and their responses were recorded for all 9 combinations. We will use these 9 responses to train a logistic regression model and then use it to predict the responses for the remaining combinations.
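# add the survey responses (1 = liked the combination, 0 = did not)
selectedComb$Response = c(0,0,1,1,1,1,1,0,0)
# logistic regression on the 9 surveyed combinations
# (these responses are perfectly separable, so glm() may warn that
# fitted probabilities are numerically 0 or 1)
logit = glm(Response ~ factor(Chocolate) + factor(Center) + factor(Nuts),
            family = binomial(link = 'logit'), data = selectedComb)
# predict all 27 combinations and classify at a 0.5 probability threshold
data$response = ifelse(predict(logit, data, type = "response") > 0.5, 1, 0)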
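Why are nine answers enough? A main-effects model for three 3-level attributes has 1 + 3 × (3 - 1) = 7 parameters, so nine profiles can identify it as long as the design matrix has full column rank. The sketch below rebuilds the nine profiles of the table above in base R (selecting the profiles whose 0/1/2 level codes sum to 2 modulo 3 reproduces those rows; this rule is our observation, not something the conjoint package exposes) and checks the rank. With sum-to-zero coding (`contr.sum`), the cross-products between columns of different attributes are all zero, which is what "orthogonal" means here: each attribute's effects are estimated independently of the others.

```r
# Rebuild the 27 profiles and the 9-profile orthogonal fraction
full <- expand.grid(Chocolate = c("Milk", "Dark", "Organic"),
                    Center    = c("Plain", "Chewy", "Soft"),
                    Nuts      = c("Mixed", "Almonds", "None"))
fraction <- full[rowSums(sapply(full, as.integer) - 1) %% 3 == 2, ]

# Main-effects design matrix with sum-to-zero (effects) coding
X <- model.matrix(~ Chocolate + Center + Nuts, data = fraction,
                  contrasts.arg = list(Chocolate = "contr.sum",
                                       Center    = "contr.sum",
                                       Nuts      = "contr.sum"))

ncol(X)     # 7 parameters
qr(X)$rank  # 7: all main effects are estimable from 9 profiles

# Cross-products between columns of different attributes are zero
XtX <- crossprod(X)
XtX
```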
Now that we have all the data we need, let's run a conjoint analysis and summarize the importance of the various features for customers.
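# survey response: the 27 predicted ratings treated as one respondent (1 row, 27 columns)
Survey = data.frame(matrix(data$response,
                           ncol = 27, nrow = 1))
# labels of the attribute levels, in the same order as the factor levels in data
levels = c("Milk","Dark","Organic","Plain", "Chewy", "Soft","Mixed", "Almonds","None")
# conjoint analysis
Conjoint(Survey, data[,1:3], z = levels)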
##
## Call:
## lm.default(formula = frml)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0,444 -0,111 -0,111 0,222 0,222
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 0,55555555555555535818 0,04969039949999531219
## factor(x$Chocolate)1 0,44444444444444408671 0,07027283689263061350
## factor(x$Chocolate)2 -0,22222222222222209886 0,07027283689263061350
## factor(x$Center)1 -0,00000000000000003783 0,07027283689263061350
## factor(x$Center)2 0,00000000000000000622 0,07027283689263062738
## factor(x$Nuts)1 -0,22222222222222226540 0,07027283689263064126
## factor(x$Nuts)2 0,44444444444444441977 0,07027283689263062738
## t value Pr(>|t|)
## (Intercept) 11,18 0,00000000047 ***
## factor(x$Chocolate)1 6,32 0,00000357622 ***
## factor(x$Chocolate)2 -3,16 0,0049 **
## factor(x$Center)1 0,00 1,0000
## factor(x$Center)2 0,00 1,0000
## factor(x$Nuts)1 -3,16 0,0049 **
## factor(x$Nuts)2 6,32 0,00000357622 ***
## ---
## Signif. codes: 0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1
##
## Residual standard error: 0,258 on 20 degrees of freedom
## Multiple R-squared: 0,8, Adjusted R-squared: 0,74
## F-statistic: 13,3 on 6 and 20 DF, p-value: 0,00000453
## [1] "Part worths (utilities) of levels (model parameters for whole sample):"
## levnms utls
## 1 intercept 0,5556
## 2 Milk 0,4444
## 3 Dark -0,2222
## 4 Organic -0,2222
## 5 Plain 0
## 6 Chewy 0
## 7 Soft 0
## 8 Mixed -0,2222
## 9 Almonds 0,4444
## 10 None -0,2222
## [1] "Average importance of factors (attributes):"
## [1] 50 0 50
## [1] Sum of average importance: 100
## [1] "Chart of average factors importance"
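The part-worths above can be reproduced with base R's `lm()`, which makes their meaning concrete. This sketch assumes, as the printed utilities and residuals imply, that the 27 predicted responses equal 1 exactly when the chocolate is Milk or the nuts are Almonds; that reconstruction is ours, since in the workflow above the responses come from `data$response`. It fits an ordinary linear model with sum-to-zero coding (consistent with the `factor(x$Chocolate)1`/`2` coefficients in the output), so each part-worth is the level's mean response minus the grand mean, and the third level of each attribute is minus the sum of the other two.

```r
# All 27 profiles, in the same order as `data` in the post
full <- expand.grid(Chocolate = c("Milk", "Dark", "Organic"),
                    Center    = c("Plain", "Chewy", "Soft"),
                    Nuts      = c("Mixed", "Almonds", "None"))
# Assumed predicted responses: liked (1) if Milk chocolate or Almonds, else 0
full$response <- as.integer(full$Chocolate == "Milk" | full$Nuts == "Almonds")

# Linear model with sum-to-zero coding
fit <- lm(response ~ Chocolate + Center + Nuts, data = full,
          contrasts = list(Chocolate = "contr.sum",
                           Center    = "contr.sum",
                           Nuts      = "contr.sum"))
b <- coef(fit)

# Part-worths of one attribute; the third level is minus the sum of the first two
part_worth <- function(factor_name) {
  est <- b[paste0(factor_name, 1:2)]
  setNames(c(est, -sum(est)), levels(full[[factor_name]]))
}
pw <- c(intercept = unname(b[1]),
        part_worth("Chocolate"), part_worth("Center"), part_worth("Nuts"))
round(pw, 4)
```

Because the full factorial is balanced, the intercept is simply the share of liked profiles (15 of 27), and Milk's part-worth is its mean response (1) minus that share.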
The feature-importance chart shows that customers care equally about chocolate type and nuts (50% each) and not at all about the center (0%).
# feature importance
Importance = data.frame(Feature = c("Chocolate", "Center", "Nut"),
                        Importance = caImportance(y = Survey, x = data[,1:3]))
ggplot(data = Importance, aes(x = reorder(Feature, -Importance), y = Importance)) +
  geom_bar(stat = "identity", fill = "skyblue2", width = 0.7) +
  ggtitle("Importance of different features of the Product") + xlab("")
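Attribute importance is conventionally derived from the spread of the part-worths: the range (maximum minus minimum) of each attribute's utilities, divided by the sum of the ranges over all attributes, times 100. A minimal sketch, taking the part-worths printed by `Conjoint()` above as exact fractions (4/9 ≈ 0.4444 and -2/9 ≈ -0.2222), reproduces the 50 / 0 / 50 split. With several respondents, the printed label "Average importance" suggests per-respondent importances are averaged; treat that as an assumption about `caImportance()` rather than a documented guarantee.

```r
# Part-worths from the Conjoint() output above, written as exact fractions
part_worths <- list(
  Chocolate = c(Milk = 4/9, Dark = -2/9, Organic = -2/9),
  Center    = c(Plain = 0, Chewy = 0, Soft = 0),
  Nuts      = c(Mixed = -2/9, Almonds = 4/9, None = -2/9)
)

# Range of utilities within each attribute
ranges <- sapply(part_worths, function(u) max(u) - min(u))

# Importance = share of the total range, in percent
importance <- 100 * ranges / sum(ranges)
importance
```

Center gets zero importance because all of its part-worths are equal: changing the center never moves the predicted preference.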
So we know that chocolate type and nuts matter. But which type of chocolate is preferred? We can answer this by comparing the part-worth utilities of the levels within each feature, one feature at a time.
# summarize utilities
util = data.frame(Utilities = t(data.frame(caPartUtilities(y = Survey,
                                                            x = data[,1:3], z = levels))))
util$levels = row.names(util)
# chocolate type
ggplot(data = util[which(util$levels %in% c("Dark","Milk","Organic")),],
       aes(x = levels, y = Utilities)) +
  geom_bar(stat = "identity", fill = "skyblue2", width = 0.7) +
  ggtitle("Utilities of Chocolate Type") + xlab("")
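Reading the chart amounts to picking the level with the highest part-worth within an attribute. Because the model is additive, part-worths can also be summed to score any profile, including ones never shown to respondents. A minimal sketch using the utilities printed by `Conjoint()` above (rounded to four decimals, as in the output); the helper `profile_utility()` is hypothetical and introduced only for illustration.

```r
# Utilities printed by Conjoint() above (rounded to 4 decimals)
utilities <- c(intercept = 0.5556,
               Milk = 0.4444, Dark = -0.2222, Organic = -0.2222,
               Plain = 0, Chewy = 0, Soft = 0,
               Mixed = -0.2222, Almonds = 0.4444, None = -0.2222)

# Preferred chocolate type: the level with the largest part-worth
chocolate <- utilities[c("Milk", "Dark", "Organic")]
names(which.max(chocolate))  # Milk

# Hypothetical helper: total utility of a profile = intercept + its part-worths
profile_utility <- function(lvls) {
  unname(utilities["intercept"] + sum(utilities[lvls]))
}
profile_utility(c("Milk", "Plain", "Almonds"))  # 0.5556 + 0.4444 + 0 + 0.4444
profile_utility(c("Dark", "Soft", "None"))      # 0.5556 - 0.2222 + 0 - 0.2222
```

The same comparison applied to the nuts levels picks Almonds, matching the importance result that chocolate type and nuts drive preference.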