According to Green & Srinivasan, conjoint analysis is any decompositional method that estimates the structure of a consumer’s preferences, given his or her overall evaluations of a set of alternatives that are prespecified in terms of levels of different attributes. The approach is widely used in marketing research, especially for understanding consumer preferences and deciding which features to include in new product designs or advertisements.

There are several conjoint methods, such as the traditional method (CA), which uses stated preference ratings, and adaptive conjoint analysis (ACA), which is suited to handling a large number of attributes. One of the most popular is choice-based conjoint analysis, which uses stated choices.

Create Attribute Levels

First, let’s define the attributes, and the levels of each, that consumers are likely to weigh when evaluating a product or service.

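library(conjoint)

# Use `=` (not `<-`) inside expand.grid() so the columns are named directly
# and no stray global variables are created.
# The data frame is called `c`; calls to the function c() still work.
c <- expand.grid(
  price = c("low", "medium", "high"),
  color = c("black", "white"),
  size = c("small", "large"))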

Decide Design Types (e.g. Orthogonal, Full Factorial)

caFactorialDesign is a function that returns a design of the requested type, while caEncodedDesign converts that design into a matrix of encoded profiles.

A full factorial design includes all combinations of the attribute levels. In this case, 3 prices x 2 colors x 2 sizes = 12 combinations to be evaluated by each respondent. If the number of combinations is too large, one remedy is an orthogonal design, which systematically selects a fraction of the full set of combinations so that the attribute effects can still be estimated independently.
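
As a quick check of the arithmetic above, the size of a full factorial design is the product of the number of levels per attribute. The sketch below uses only base R; the names `attribute_levels`, `n_profiles` and `full_design` are made up for illustration.

```r
# Attribute levels used in this post (the list name is illustrative).
attribute_levels <- list(
  price = c("low", "medium", "high"),
  color = c("black", "white"),
  size  = c("small", "large"))

# Number of profiles = product of the level counts.
n_profiles <- prod(lengths(attribute_levels))

# expand.grid() enumerates the same profiles explicitly.
full_design <- expand.grid(attribute_levels)

print(n_profiles)
print(nrow(full_design))
```

Adding a fourth attribute with four levels would already give 48 profiles per respondent, which is why fractional designs become attractive quickly.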

design <- caFactorialDesign(data=c, type="orthogonal")
code <- caEncodedDesign(design)
encodedorthodesign <- data.frame(design, code)
print(encodedorthodesign)
##     price color  size price.1 color.1 size.1
## 2  medium black small       2       1      1
## 3    high black small       3       1      1
## 4     low white small       1       2      1
## 5  medium white small       2       2      1
## 7     low black large       1       1      2
## 8  medium black large       2       1      2
## 11 medium white large       2       2      2
## 12   high white large       3       2      2
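
One way to see why these eight cards are enough is to check that every pair of attributes is balanced: each combination of two levels appears in proportion to how often the levels appear on their own. The base R sketch below copies the encoded columns from the output above; the helper `is_proportional` is a hypothetical name, and proportional frequency is used here as a working definition of orthogonality.

```r
# Encoded profiles copied from the printed design (8 cards).
cards <- data.frame(
  price = c(2, 3, 1, 2, 1, 2, 2, 3),
  color = c(1, 1, 2, 2, 1, 1, 2, 2),
  size  = c(1, 1, 1, 1, 2, 2, 2, 2))

# TRUE when the joint counts of a and b equal (row total x column total) / n.
is_proportional <- function(a, b) {
  observed <- table(a, b)
  expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
  isTRUE(all.equal(as.vector(observed), as.vector(expected)))
}

checks <- c(
  price_color = is_proportional(cards$price, cards$color),
  price_size  = is_proportional(cards$price, cards$size),
  color_size  = is_proportional(cards$color, cards$size))
print(checks)
```

All three pairs pass for this design, even though the medium price appears on four cards and the other two prices on two cards each.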

Once you know the design type and the number of cards (profiles) it produces, create sample data for how respondents would rate each card. Here, five simulated respondents give each of the eight cards a random rating from 1 to 100.

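set.seed(123)
# 8 cards (rows) x 5 respondents (columns), random ratings from 1 to 100
df <- data.frame(X=replicate(5, sample(1:100, 8, replace=TRUE)))
# level labels in attribute order: price (3), color (2), size (2)
# note: the third label stands for the "high" price level; it is written
# "large" here, which is how it appears in the output further down
lev <- c("low", "medium", "large", "black", "white", "small", "large")
lev.df <- data.frame(lev)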

Average Importance of the Features

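From the output and plot, we can tell that price is the most important factor in choosing this product, with an average importance of about 61.8%. Because the ratings are randomly generated, the regression itself is not significant (p-value 0.5285), so the numbers are for illustration only.

Conjoint(y=df, x=encodedorthodesign[,4:6], z=lev.df)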

## 
## Call:
## lm(formula = frml)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57,000 -22,200  -1,875  27,175  48,850 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          50,767      4,926  10,305 3,84e-12 ***
## factor(x$price.1)1   12,533      7,307   1,715   0,0951 .  
## factor(x$price.1)2   -2,967      6,231  -0,476   0,6370    
## factor(x$color.1)1   -1,125      4,673  -0,241   0,8112    
## factor(x$size.1)1    -1,825      4,673  -0,391   0,6985    
## ---
## Signif. codes:  0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1
## 
## Residual standard error: 29,56 on 35 degrees of freedom
## Multiple R-squared:  0,08455,    Adjusted R-squared:  -0,02008 
## F-statistic: 0,8081 on 4 and 35 DF,  p-value: 0,5285
## [1] "Part worths (utilities) of levels (model parameters for whole sample):"
##      levnms    utls
## 1 intercept 50,7667
## 2       low 12,5333
## 3    medium -2,9667
## 4     large -9,5667
## 5     black  -1,125
## 6     white   1,125
## 7     small  -1,825
## 8     large   1,825
## [1] "Average importance of factors (attributes):"
## [1] 61,8 17,2 21,0
## [1] Sum of average importance:  100
## [1] "Chart of average factors importance"
caImportance(df, encodedorthodesign[,4:6])
## [1] 61.8 17.2 21.0
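
The part-worth table above lists one more level per attribute than the regression has coefficients. The printed numbers are consistent with effects (sum-to-zero) coding, where the omitted level's utility is the negative sum of the others. That reading is an assumption inferred from the output, not something taken from the package documentation. A minimal base R sketch using the rounded values printed above:

```r
# Coefficients copied (rounded) from the regression output above.
intercept <- 50.7667
price <- c(low = 12.5333, medium = -2.9667)
color <- c(black = -1.125)
size  <- c(small = -1.825)

# Assumption: sum-to-zero coding, so the last level is minus the sum of the others.
price["high"]  <- -sum(price)
color["white"] <- -sum(color)
size["large"]  <- -sum(size)

# Predicted rating of a profile = intercept + one part-worth per attribute.
best_profile <- intercept + max(price) + max(color) + max(size)

print(round(price, 3))
print(best_profile)
```

The value computed for the third price level, about -9.567, matches the row labelled large with utility -9,5667 in the table, and the best-rated profile under this model is the low-priced, white, large product.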

Utilities of Each Feature Level

This shows the matrix of individual part-worth utilities, one row per respondent.

u <- caPartUtilities(y=df, x=encodedorthodesign[,4:6], z=lev.df)
u
##      intercept     low  medium   large  black  white  small   large
## [1,]    38.667 -30.667  15.833  14.833 -9.875  9.875 -8.375   8.375
## [2,]    49.250  25.750  12.000 -37.750  3.250 -3.250 -5.750   5.750
## [3,]    65.500   8.500  -0.500  -8.000  9.375 -9.375 14.125 -14.125
## [4,]    53.250  30.250 -26.500  -3.750 -4.625  4.625 -5.875   5.875
## [5,]    47.167  28.833 -15.667 -13.167 -3.750  3.750 -3.250   3.250

For instance, the first respondent prefers the large size (utility 8.375) and disfavors the small size (utility -8.375).

barplot(u[1,7:8])
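
The average importance figures reported earlier can be reproduced from this utilities matrix. The sketch below uses the common range-based definition: for each respondent, an attribute's importance is the range of its part-worths divided by the sum of ranges over all attributes, and the result is averaged across respondents. It is a base R illustration with made-up variable names, not the package's own code.

```r
# Part-worths copied from the caPartUtilities() output above (intercept dropped).
pw <- matrix(c(
  -30.667,  15.833,  14.833, -9.875,  9.875, -8.375,   8.375,
   25.750,  12.000, -37.750,  3.250, -3.250, -5.750,   5.750,
    8.500,  -0.500,  -8.000,  9.375, -9.375, 14.125, -14.125,
   30.250, -26.500,  -3.750, -4.625,  4.625, -5.875,   5.875,
   28.833, -15.667, -13.167, -3.750,  3.750, -3.250,   3.250),
  nrow = 5, byrow = TRUE)

# Columns belonging to each attribute: price (3 levels), color (2), size (2).
attr_cols <- list(price = 1:3, color = 4:5, size = 6:7)

# Range of part-worths per attribute, one row per respondent.
ranges <- sapply(attr_cols, function(cols) {
  apply(pw[, cols, drop = FALSE], 1, function(v) max(v) - min(v))
})

# Each respondent's importance is the range share; then average over respondents.
importance <- 100 * ranges / rowSums(ranges)
avg_importance <- round(colMeans(importance), 1)
print(avg_importance)
```

The result agrees with the 61.8, 17.2 and 21.0 printed by caImportance. Note that the third respondent is an exception to the overall pattern: size, not price, has the widest range for that respondent.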

Find out the segmentation based on the utilities

k <- caSegmentation(df, encodedorthodesign[,4:6], c=3)
k
## $segm
## K-means clustering with 3 clusters of sizes 2, 1, 2
## 
## Cluster means:
##     [,1]  [,2]   [,3]  [,4]   [,5]  [,6]   [,7]  [,8]
## 1 73.625 45.00 72.375 61.00 76.625 65.25 52.625 24.00
## 2 36.250 35.25  9.500 56.00  6.500 53.00 72.750 71.75
## 3 20.375 33.00 79.375 28.75 80.125 29.50 37.875 50.50
## 
## Clustering vector:
## [1] 2 1 1 3 3
## 
## Within cluster sum of squares by cluster:
## [1] 3875.25    0.00  372.25
##  (between_SS / total_SS =  78.7 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"    
## [5] "tot.withinss" "betweenss"    "size"         "iter"        
## [9] "ifault"      
## 
## $util
##       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]
## [1,] 36.25 35.25  9.50 56.00  6.50 53.00 72.75 71.75
## [2,] 58.75  9.00 66.00 52.25 84.00 70.25 63.75 14.00
## [3,] 88.50 81.00 78.75 69.75 69.25 60.25 41.50 34.00
## [4,] 16.25 39.00 82.25 25.50 84.75 28.00 37.25 60.00
## [5,] 24.50 27.00 76.50 32.00 75.50 31.00 38.50 41.00
## 
## $sclu
## [1] 2 1 1 3 3
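
The printed result is a k-means fit on the $util matrix, so the grouping can be explored directly with base R's kmeans(). The sketch below copies the matrix from the output above; the seed and `nstart = 25` are choices made here for stability, not settings taken from caSegmentation.

```r
# Matrix copied from the $util component printed above (5 respondents x 8 profiles).
util <- matrix(c(
  36.25, 35.25,  9.50, 56.00,  6.50, 53.00, 72.75, 71.75,
  58.75,  9.00, 66.00, 52.25, 84.00, 70.25, 63.75, 14.00,
  88.50, 81.00, 78.75, 69.75, 69.25, 60.25, 41.50, 34.00,
  16.25, 39.00, 82.25, 25.50, 84.75, 28.00, 37.25, 60.00,
  24.50, 27.00, 76.50, 32.00, 75.50, 31.00, 38.50, 41.00),
  nrow = 5, byrow = TRUE)

set.seed(123)
# Several random starts make the solution less sensitive to initialization.
fit <- kmeans(util, centers = 3, nstart = 25)

print(fit$cluster)  # cluster label per respondent
print(fit$size)     # respondents per cluster
```

Cluster numbers are arbitrary labels, so they may not match the 2 1 1 3 3 shown above even when the grouping is the same. With only five respondents the segments are illustrative at best.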

Plot the clusters

library(ggplot2)

k.df <- data.frame(k$sclu, k$util)
# refer to columns by bare name inside aes() instead of k.df$...
ggplot(k.df, aes(X1, X2, color=as.factor(k.sclu)))+geom_jitter()+
  labs(title="3 Clusters", x="X1", y="X2", color="Clusters")