Conjoint analysis is a survey-based technique for identifying how customers value the various attributes that make up an individual product. Products and services are bundles of features that customers consider jointly, so they must make trade-offs when making a purchase decision. Conjoint analysis helps companies and business owners determine the importance of different features and their monetary value. It also helps them identify the optimal price for their products and services.

Basic Setup

In this example, we will see how to design a survey that collects the necessary information, how to quantify the importance of the various features of a product, and how to interpret the results. But first, let’s set up the R environment and load the required libraries.

#set working directory
setwd("C:/Users/awani/Documents/GitHub/50daysofAnalytics/Day 10 - Conjoint Analysis")

# load libraries
if (!require("pacman")) install.packages("pacman")
pacman::p_load(conjoint, DoE.base, knitr, dplyr, kableExtra, ggplot2)

options(scipen = 999)
options(digits = 3)

 

Experiment Design

A small business owner wants to use conjoint analysis to understand which features of his product are most attractive to his customers and how he should price them. He sells chocolates that are described by three attributes (chocolate type, center, and nuts), each with three levels.

Let’s first figure out how the conjoint survey should be designed. In this example, 27 different varieties of chocolate can be manufactured (3 × 3 × 3 attribute levels), and it might be feasible to present all of those options to a customer and ask for their preference. In a real-world scenario, however, the number of possible combinations is usually far larger. Asking a customer to rate every combination would be not only expensive but also inaccurate. To avoid this issue, we can design a survey that asks about only a few combinations and then predict the responses for the rest. This process is often referred to as experiment design.
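To make the size argument concrete, here is a minimal base R sketch of how the number of profiles in a full factorial design grows. The second study (six attributes with four levels each) is a hypothetical illustration and is not part of this example.

```r
# Number of profiles in a full factorial design = product of the level counts.
levels_per_attribute <- c(Chocolate = 3, Center = 3, Nuts = 3)
full_factorial_size <- prod(levels_per_attribute)
print(full_factorial_size)

# HYPOTHETICAL larger study (illustrative numbers, not from this example):
# six attributes with four levels each.
hypothetical_levels <- rep(4, 6)
hypothetical_size <- prod(hypothetical_levels)
print(hypothetical_size)
```

The chocolate example needs only 27 profiles, while the hypothetical study would already need 4096, which is far more than a respondent can reasonably rate.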

###Experiment Design for asking Conjoint Questions

#- identify the number of questions required from the attribute levels
NumberOfQuestions = nrow(oa.design(nlevels=c(3,3,3)))

#- create the full factorial of all attribute levels
data = expand.grid(Chocolate = c("Milk","Dark","Organic"),
                   Center = c("Plain", "Chewy", "Soft"),
                   Nuts = c("Mixed", "Almonds","None"))

#- select the combinations to ask about
selectedComb = caFactorialDesign(
  data = data,
  type='fractional',
  cards=NumberOfQuestions)

#- print Selected Combinations
kable(selectedComb) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Row  Chocolate  Center  Nuts
3    Organic    Plain   Mixed
5    Dark       Chewy   Mixed
7    Milk       Soft    Mixed
11   Dark       Plain   Almonds
13   Milk       Chewy   Almonds
18   Organic    Soft    Almonds
19   Milk       Plain   None
24   Organic    Chewy   None
26   Dark       Soft    None
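As a quick sanity check on the selected cards, the sketch below uses only base R to confirm that the printed design is balanced and orthogonal. The `cards` data frame is retyped by hand from the printed table, and the names `level_counts` and `pair_counts` are illustrative.

```r
# The nine cards, copied from the printed design.
cards <- data.frame(
  Chocolate = c("Organic", "Dark", "Milk", "Dark", "Milk",
                "Organic", "Milk", "Organic", "Dark"),
  Center    = c("Plain", "Chewy", "Soft", "Plain", "Chewy",
                "Soft", "Plain", "Chewy", "Soft"),
  Nuts      = c("Mixed", "Mixed", "Mixed", "Almonds", "Almonds",
                "Almonds", "None", "None", "None"),
  stringsAsFactors = FALSE
)

# Balance: each level of each attribute should appear equally often (3 times).
level_counts <- lapply(cards, table)
print(level_counts)
is_balanced <- all(sapply(level_counts, function(counts) all(counts == 3)))

# Orthogonality: every pair of levels from two attributes should appear once.
pair_counts <- list(
  Chocolate_Center = table(cards$Chocolate, cards$Center),
  Chocolate_Nuts   = table(cards$Chocolate, cards$Nuts),
  Center_Nuts      = table(cards$Center, cards$Nuts)
)
is_orthogonal <- all(sapply(pair_counts, function(counts) all(counts == 1)))

print(c(balanced = is_balanced, orthogonal = is_orthogonal))
```

Because every level and every pair of levels is covered equally, the main effect of each attribute can be estimated from 9 cards instead of 27.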

 

Predicting Responses for other combinations

Using an orthogonal fractional factorial design, we identified 9 combinations, out of 27 possible, to include in the customer survey. This saves time, money, and effort. Survey takers were asked whether they liked each combination, and their responses were recorded for all 9 combinations. We will use these 9 responses to train a logistic regression model and then use it to predict the responses for the remaining combinations.

# add response column
selectedComb$Response = c(0,0,1,1,1,1,1,0,0)

# logistic regression
logit=glm(Response ~ factor(Chocolate) + factor(Center) + factor(Nuts),
            family=binomial(link='logit'), data=selectedComb)

# predict
data$response = ifelse(predict(logit,data,type="response") > 0.5,1,0)

Conjoint Analysis

Now that we have all the data we need, let’s run a conjoint analysis and summarize the importance of the various features for customers.

# Survey response
Survey = data.frame(matrix(data$response,
                               ncol=27, nrow=1))
# names of the attribute levels, in the same order as the factors
levels = c("Milk","Dark","Organic","Plain", "Chewy", "Soft","Mixed", "Almonds","None")

#Conjoint
Conjoint(Survey,data[,1:3],z =levels)
## 
## Call:
## lm.default(formula = frml)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -0,444 -0,111 -0,111  0,222  0,222 
## 
## Coefficients:
##                                     Estimate              Std. Error
## (Intercept)           0,55555555555555535818  0,04969039949999531219
## factor(x$Chocolate)1  0,44444444444444408671  0,07027283689263061350
## factor(x$Chocolate)2 -0,22222222222222209886  0,07027283689263061350
## factor(x$Center)1    -0,00000000000000003783  0,07027283689263061350
## factor(x$Center)2     0,00000000000000000622  0,07027283689263062738
## factor(x$Nuts)1      -0,22222222222222226540  0,07027283689263064126
## factor(x$Nuts)2       0,44444444444444441977  0,07027283689263062738
##                      t value      Pr(>|t|)    
## (Intercept)            11,18 0,00000000047 ***
## factor(x$Chocolate)1    6,32 0,00000357622 ***
## factor(x$Chocolate)2   -3,16        0,0049 ** 
## factor(x$Center)1       0,00        1,0000    
## factor(x$Center)2       0,00        1,0000    
## factor(x$Nuts)1        -3,16        0,0049 ** 
## factor(x$Nuts)2         6,32 0,00000357622 ***
## ---
## Signif. codes:  0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1
## 
## Residual standard error: 0,258 on 20 degrees of freedom
## Multiple R-squared:   0,8,   Adjusted R-squared:  0,74 
## F-statistic: 13,3 on 6 and 20 DF,  p-value: 0,00000453
## [1] "Part worths (utilities) of levels (model parameters for whole sample):"
##       levnms    utls
## 1  intercept  0,5556
## 2       Milk  0,4444
## 3       Dark -0,2222
## 4    Organic -0,2222
## 5      Plain       0
## 6      Chewy       0
## 7       Soft       0
## 8      Mixed -0,2222
## 9    Almonds  0,4444
## 10      None -0,2222
## [1] "Average importance of factors (attributes):"
## [1] 50  0 50
## [1] Sum of average importance:  100
## [1] "Chart of average factors importance"
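The part-worths and importances in this output can be reproduced by hand, which helps when interpreting them. The sketch below uses only base R and rests on one assumption that is inferred rather than stated in the text: the predicted response is 1 exactly when the chocolate is Milk or the nuts are Almonds, a rule that is consistent with the reported utilities. For a balanced full factorial, the part-worth of a level is the mean response at that level minus the grand mean, and the importance of an attribute is the range of its part-worths as a share of the sum of all ranges.

```r
# Full factorial of the three attributes (same level order as in the article).
profiles <- expand.grid(
  Chocolate = c("Milk", "Dark", "Organic"),
  Center    = c("Plain", "Chewy", "Soft"),
  Nuts      = c("Mixed", "Almonds", "None")
)

# ASSUMPTION (inferred from the reported utilities, not stated in the text):
# the predicted response is 1 when chocolate is Milk or nuts are Almonds.
profiles$response <- as.integer(
  profiles$Chocolate == "Milk" | profiles$Nuts == "Almonds"
)

grand_mean <- mean(profiles$response)

# Part-worth of a level = mean response at that level minus the grand mean.
part_worths <- lapply(
  profiles[c("Chocolate", "Center", "Nuts")],
  function(attribute) tapply(profiles$response, attribute, mean) - grand_mean
)
print(lapply(part_worths, round, 4))

# Importance of an attribute = its part-worth range as a share of all ranges.
ranges <- sapply(part_worths, function(u) max(u) - min(u))
importance <- 100 * ranges / sum(ranges)
print(round(importance))
```

Under that assumption the grand mean is 15/27, which matches the intercept, and the Center attribute has a range of zero, which is why its importance is 0.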

Feature Importance

The feature importance plot shows that customers care equally about chocolate type and nuts (50% each), while the center does not affect their preference at all (0%).

#Feature Imp
Importance = data.frame(Feature = c("Chocolate", "Center", "Nut"), 
                        Importance = caImportance(y = Survey, x = data[,1:3]))

ggplot(data = Importance, aes(x = reorder(Feature,-Importance), y = Importance)) + 
  geom_bar(stat= "identity", fill = "skyblue2", width = 0.7) +
  ggtitle("Importance of different features of the Product") + xlab("")

Feature Utilities

So far, we know that the type of chocolate and the nuts are important. But which type of chocolate is preferred? We can answer that by comparing the utilities of the levels of each feature, one feature at a time.

#summarize Utilities
util = data.frame(Utilities = t(data.frame(caPartUtilities(y = Survey,
                                                           x = data[,1:3],z =levels))))
util$levels = row.names(util)

# Chocolate Type
ggplot(data = util[which(util$levels %in% c("Dark","Milk","Organic")),], aes(x = levels, y = Utilities)) +
  geom_bar(stat= "identity", fill = "skyblue2", width = 0.7) +
  ggtitle("Utilities of Chocolate Type") + xlab("")
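The same comparison can be done numerically for every feature at once. This is a minimal base R sketch that retypes the part-worth utilities from the conjoint output into a small data frame; the names `utilities` and `preferred` are illustrative and not part of the conjoint package.

```r
# Part-worth utilities as reported in the conjoint output.
utilities <- data.frame(
  attribute = rep(c("Chocolate", "Center", "Nuts"), each = 3),
  level     = c("Milk", "Dark", "Organic",
                "Plain", "Chewy", "Soft",
                "Mixed", "Almonds", "None"),
  utility   = c(0.4444, -0.2222, -0.2222,
                0, 0, 0,
                -0.2222, 0.4444, -0.2222),
  stringsAsFactors = FALSE
)

# For each attribute, list the level(s) with the highest utility.
# Ties are kept, since equal utilities mean no level is preferred.
preferred <- lapply(split(utilities, utilities$attribute), function(block) {
  block$level[block$utility == max(block$utility)]
})
print(preferred)
```

Milk is the preferred chocolate type and Almonds the preferred nut option, while all three center levels tie, which agrees with the zero importance of that feature.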