What is conjoint analysis?

Suppose your firm has introduced a new product to the market, e.g. a new yoghurt brand. Which features maximize customers’ utility? After all, companies invest time and money in product research, trying to pinpoint exactly which products or services consumers desire. Once you know what your customers desire, your product development strategy can be aimed at meeting or exceeding what they value.

Conjoint analysis is a statistical marketing research technique that helps businesses measure what their consumers value most about their products and services. It helps identify the combination of features and pricing that produces maximum utility. By comparing different combinations you can also predict future demand for the product.

As an example, let’s say that we are researching which attributes are most influential to a consumer when purchasing a TV. See the attributes below:

The example I will use from here on is from an internet service provider, e.g. Zuku, Safcom, etc. I want to understand which bundle options, defined by different features and prices, are most influential to customers. The bundles are built from combinations of different features. The data are simulated and not based on any market information; in other words, they are random. An online questionnaire I have scripted using XLSForm can be accessed here and filled in.

The first thing is to load the required packages and generate all possible combinations of product features, i.e. the full factorial design. The number of combinations is the product of the number of levels of each attribute, in our case \(3 \times 3 \times 2 \times 2 \times 3 = 108\). We will then choose the subset of combinations needed for the research.
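As a quick check of that arithmetic, here is a minimal base R sketch (no packages needed). The vector name n_levels is just my own label for the level counts above, and expand.grid is used only to confirm the count by enumeration.

```r
# Number of levels per attribute, in the order used in this post
n_levels <- c(Bundles = 3, Days_valid = 3, Free_Whatsapp = 2,
              Free_Youtube = 2, Price = 3)

# Size of the full factorial design = product of the level counts
n_profiles <- prod(n_levels)

# Cross-check by enumerating every combination of level indices
full_grid <- expand.grid(lapply(n_levels, seq_len))

print(n_profiles)
print(nrow(full_grid))
```

Both numbers should agree with the number of rows that gen.factorial returns further down.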

library(conjoint)
library(dplyr)
library(AlgDesign)
library(ggplot2)
library(knitr)
library(kableExtra)

## Which combinations maximize consumers' utility?

combined_attributes=gen.factorial(c(3,3,2,2,3),varNames=c("Bundles","Days_valid","Free_Whatsapp","Free_Youtube","Price"),factors = "all")

##Name all attributes

combined_attributes<-combined_attributes %>%
  mutate(Bundles=factor(Bundles,labels=c("10GBs","12GBs","15GBs"),levels=c(1,2,3)),
         Days_valid=factor(Days_valid,labels=c("7days","10days","12days"),levels=c(1,2,3)),
         Free_Whatsapp=factor(Free_Whatsapp,labels = c("No","Yes"),levels=c(1,2)),
         Free_Youtube=factor(Free_Youtube,labels=c("No","Yes"),levels=c(1,2)),
         Price=factor(Price,labels=c("1000","1500","2000"),levels=c(1,2,3)))
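#### Level names, listed in the same order as the factor levels defined above
#### (Free_Whatsapp and Free_Youtube are coded 1 = "No", 2 = "Yes")
levels=c("10GBs","12GBs","15GBs","7days","10days","12days","No","Yes","No","Yes","1000","1500","2000")
levels=data.frame(levels)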

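NB: It is impractical to ask respondents to rate all 108 combinations, so we need to reduce them to a manageable subset. I have reduced them to 9 combinations using optFederov from AlgDesign, which selects the rows for us (its search starts from a random design, hence the seed). They are displayed in the table below.

### Reduce the number of combinations (set a seed so the selection is reproducible)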

set.seed(7654321)
few_combinations=optFederov(~.,combined_attributes,9)
few_combinations=few_combinations$design
few_combinations %>%
  kable("html") %>%
  kable_styling(font_size=12) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
     Bundles  Days_valid  Free_Whatsapp  Free_Youtube  Price
  4  10GBs    10days      No             No            1000
 18  15GBs    12days      Yes            No            1000
 28  10GBs    7days       Yes            Yes           1000
 38  12GBs    7days       No             No            1500
 49  10GBs    10days      Yes            No            1500
 60  15GBs    10days      No             Yes           1500
 84  15GBs    7days       Yes            No            2000
 97  10GBs    12days      No             Yes           2000
104  12GBs    10days      Yes            Yes           2000
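Before collecting ratings it is worth checking the nine profiles in the table above. The sketch below is an assumption-laden sanity check in base R, not part of the AlgDesign workflow: it rebuilds the full factorial with expand.grid (which, like the row numbers in the table, varies the first attribute fastest), picks the same row numbers, counts how often each level appears, and checks that a main-effects model can be estimated from these profiles. The names full, picked, design and X are my own.

```r
# Rebuild the full factorial; expand.grid varies the first factor fastest
full <- expand.grid(
  Bundles       = c("10GBs", "12GBs", "15GBs"),
  Days_valid    = c("7days", "10days", "12days"),
  Free_Whatsapp = c("No", "Yes"),
  Free_Youtube  = c("No", "Yes"),
  Price         = c("1000", "1500", "2000"),
  stringsAsFactors = TRUE
)

# Row numbers copied from the first column of the table above
picked <- c(4, 18, 28, 38, 49, 60, 84, 97, 104)
design <- full[picked, ]

# How often does each level appear among the nine profiles?
level_counts <- lapply(design, table)
print(level_counts)

# Main-effects model matrix with effects (sum-to-zero) coding
X <- model.matrix(~ ., data = design,
                  contrasts.arg = lapply(design, function(f) "contr.sum"))

# If the rank equals the number of columns, every main effect is estimable
design_rank <- qr(X)$rank
print(c(rank = design_rank, columns = ncol(X)))
```

Note that the design is not perfectly balanced (some bundle sizes appear more often than others), which is one reason the standard errors in the model output differ between levels.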

We now simulate the ratings that respondents give to each combination. In this example the data is from 100 respondents. I will display data from only the first 10 respondents.

#### Simulate the ratings data

n=100 ## Number of respondents interviewed
profile_data <- data.frame(Respondent = 1:n)
profile_data$Respondent<-as.factor(profile_data$Respondent)
## One column of random ratings (1 to 9) for each of the 9 combinations
for (run in 1:9) {
  profile_data[,paste0("rating.combination",run)]<- sample(1:9, n, replace = TRUE)
}

head(profile_data,10) %>%
  kable("html") %>%
  kable_styling(font_size=10) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
Respondent rating.combination1 rating.combination2 rating.combination3 rating.combination4 rating.combination5 rating.combination6 rating.combination7 rating.combination8 rating.combination9
1 4 4 8 3 9 3 8 8 4
2 1 1 1 4 4 5 5 2 1
3 2 5 3 2 2 3 5 5 2
4 9 7 6 4 2 5 2 1 7
5 9 8 1 8 2 2 1 4 1
6 8 2 8 2 1 6 4 2 5
7 9 3 6 3 8 8 6 8 9
8 2 8 2 3 3 3 6 4 8
9 7 8 1 7 6 9 9 2 6
10 3 5 2 5 7 4 8 9 7

Run a conjoint analysis and summarize the importance of the various features. For ease of interpretation we will summarize the importances in a bar graph. From the bar graph below, ‘Importance of different features’, customers place the highest weight on the type of bundle (25.66%), with Price (24.60%) and Days_valid (24.45%) close behind. Free Youtube (11.33%) does not feature prominently in customers’ preferences when they are making trade-offs. Keep in mind that the ratings are random, so none of the feature coefficients in the output below is statistically significant.
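The summary that Conjoint prints below looks like an ordinary lm fit on the stacked ratings (900 rows, 891 residual degrees of freedom) with effects coding. The sketch below reproduces that idea in base R under my own assumptions: the seed is arbitrary, so the numbers will not match the output in this post, and the names ratings, long and bundle_pw are hypothetical.

```r
set.seed(1)  # arbitrary seed for this sketch, not the one used in the post

# The nine profiles, taken from the design table shown earlier
full <- expand.grid(
  Bundles       = c("10GBs", "12GBs", "15GBs"),
  Days_valid    = c("7days", "10days", "12days"),
  Free_Whatsapp = c("No", "Yes"),
  Free_Youtube  = c("No", "Yes"),
  Price         = c("1000", "1500", "2000"),
  stringsAsFactors = TRUE
)
design <- full[c(4, 18, 28, 38, 49, 60, 84, 97, 104), ]

# Random 1-9 ratings: 100 respondents (rows) x 9 profiles (columns)
n <- 100
ratings <- matrix(sample(1:9, n * 9, replace = TRUE), nrow = n)

# Stack to long format: one row per respondent-profile pair
long <- design[rep(seq_len(9), times = n), ]
long$rating <- as.vector(t(ratings))  # respondent 1's nine ratings first, etc.

# Effects (sum-to-zero) coding, so part-worths within an attribute sum to 0
fit <- lm(rating ~ ., data = long,
          contrasts = lapply(design, function(f) "contr.sum"))
b <- coef(fit)

# The last level's part-worth is minus the sum of the estimated ones
bundle_pw <- c(b["Bundles1"], b["Bundles2"], -(b["Bundles1"] + b["Bundles2"]))
names(bundle_pw) <- levels(design$Bundles)
print(round(bundle_pw, 4))
```

This is why the printed coefficient table lists only two bundle levels while the part-worth table lists three: the third is implied by the sum-to-zero constraint.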

fit=Conjoint(y=profile_data[,2:10],x=few_combinations,z=levels)
## 
## Call:
## lm(formula = frml)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -4,27  -2,07   0,06   2,10   4,26 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               4,981000   0,094921  52,475   <2e-16 ***
## factor(x$Bundles)1       -0,005333   0,126562  -0,042    0,966    
## factor(x$Bundles)2        0,152667   0,152682   1,000    0,318    
## factor(x$Days_valid)1     0,092667   0,129907   0,713    0,476    
## factor(x$Days_valid)2    -0,141333   0,126562  -1,117    0,264    
## factor(x$Free_Whatsapp)1 -0,096000   0,094921  -1,011    0,312    
## factor(x$Free_Youtube)1  -0,047000   0,093211  -0,504    0,614    
## factor(x$Price)1          0,048667   0,137398   0,354    0,723    
## factor(x$Price)2          0,186667   0,146467   1,274    0,203    
## ---
## Signif. codes:  0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1
## 
## Residual standard error: 2,537 on 891 degrees of freedom
## Multiple R-squared:  0,00629,    Adjusted R-squared:  -0,002632 
## F-statistic: 0,705 on 8 and 891 DF,  p-value: 0,6874
## [1] "Part worths (utilities) of levels (model parameters for whole sample):"
##       levnms    utls
## 1  intercept   4,981
## 2      10GBs -0,0053
## 3      12GBs  0,1527
## 4      15GBs -0,1473
## 5      7days  0,0927
## 6     10days -0,1413
## 7     12days  0,0487
## 8        Yes  -0,096
## 9         No   0,096
## 10       Yes  -0,047
## 11        No   0,047
## 12      1000  0,0487
## 13      1500  0,1867
## 14      2000 -0,2353
## [1] "Average importance of factors (attributes):"
## [1] 25,66 24,45 13,96 11,33 24,60
## [1] Sum of average importance:  100
## [1] "Chart of average factors importance"
#####GGplot of important features
Importance = data.frame(Feature = c("Bundles","Days_valid","Free_Whatsapp","Free_Youtube","Price"), 
                        Importance = caImportance(y=profile_data[,2:10],x=few_combinations))
ggplot(data = Importance, aes(x = reorder(Feature,-Importance), y = Importance)) + 
  geom_bar(stat= "identity", fill = "skyblue2", width = 0.7) +
  ggtitle("Importance of different features") + xlab("")
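To see where an importance score comes from, here is a rough sketch of the usual definition: the range of part-worths within an attribute, divided by the sum of ranges over all attributes. It uses the whole-sample part-worths copied from the output above, with Yes/No assigned in factor order (No first). As far as I can tell, caImportance instead averages importances computed per respondent (its output is labelled "Average importance"), so these figures are not expected to match the ones in the chart.

```r
# Whole-sample part-worths copied from the printed output above
part_worths <- list(
  Bundles       = c(-0.0053, 0.1527, -0.1473),
  Days_valid    = c(0.0927, -0.1413, 0.0487),
  Free_Whatsapp = c(-0.096, 0.096),
  Free_Youtube  = c(-0.047, 0.047),
  Price         = c(0.0487, 0.1867, -0.2353)
)

# Range of part-worths within each attribute
ranges <- sapply(part_worths, function(u) diff(range(u)))

# Importance = share of the total range, in percent
importance <- 100 * ranges / sum(ranges)
print(round(importance, 2))
```

The ranking can differ between the two approaches, which is worth remembering when the differences between attributes are as small as they are here.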

We now ask: what is the best combination, the one that will maximize utility? We turn to the utilities of each level. From the 5 bar plots below, the best level of each feature is the one with the highest positive bar.

util = caUtilities(y=profile_data[,2:10],x=few_combinations,z =levels)
## 
## Call:
## lm(formula = frml)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -4,27  -2,07   0,06   2,10   4,26 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               4,981000   0,094921  52,475   <2e-16 ***
## factor(x$Bundles)1       -0,005333   0,126562  -0,042    0,966    
## factor(x$Bundles)2        0,152667   0,152682   1,000    0,318    
## factor(x$Days_valid)1     0,092667   0,129907   0,713    0,476    
## factor(x$Days_valid)2    -0,141333   0,126562  -1,117    0,264    
## factor(x$Free_Whatsapp)1 -0,096000   0,094921  -1,011    0,312    
## factor(x$Free_Youtube)1  -0,047000   0,093211  -0,504    0,614    
## factor(x$Price)1          0,048667   0,137398   0,354    0,723    
## factor(x$Price)2          0,186667   0,146467   1,274    0,203    
## ---
## Signif. codes:  0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1
## 
## Residual standard error: 2,537 on 891 degrees of freedom
## Multiple R-squared:  0,00629,    Adjusted R-squared:  -0,002632 
## F-statistic: 0,705 on 8 and 891 DF,  p-value: 0,6874
bundle_utility=util[2:4]
valid_days=util[5:7]
Free.Whatsapp=util[8:9]
Free.Youtube=util[10:11]
price=util[12:14]

names(bundle_utility)=c("10GBs","12GBs","15GBs")
names(valid_days)=c("7days","10days","12days")
names(Free.Whatsapp)=c("No","Yes")
names(Free.Youtube)=c("No","Yes")
names(price)=c("1000","1500","2000")
barplot(bundle_utility,col="skyblue2",main="Bundle type")

barplot(valid_days,col="brown",main="Valid days")

barplot(Free.Whatsapp,col="grey",main="Free WhatsApp")

barplot(Free.Youtube,col="orange",main="Free Youtube")

barplot(price,col="yellow",main="Price tags")
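Reading the tallest bar off each plot can also be done numerically. The sketch below is one way to do it in base R, assuming an additive model in which the predicted rating of a bundle is the intercept plus the part-worth of each chosen level. The part-worths are copied from the output above, with Yes/No labelled in factor order (No, then Yes) as in the names() calls; the object names grid and best are my own.

```r
# Intercept and whole-sample part-worths copied from the output above
intercept <- 4.981
part_worths <- list(
  Bundles       = c("10GBs" = -0.0053, "12GBs" = 0.1527, "15GBs" = -0.1473),
  Days_valid    = c("7days" = 0.0927, "10days" = -0.1413, "12days" = 0.0487),
  Free_Whatsapp = c("No" = -0.096, "Yes" = 0.096),
  Free_Youtube  = c("No" = -0.047, "Yes" = 0.047),
  Price         = c("1000" = 0.0487, "1500" = 0.1867, "2000" = -0.2353)
)

# Every possible bundle, described by its level names
grid <- expand.grid(lapply(part_worths, names), stringsAsFactors = FALSE)

# Look up the part-worth of the chosen level for each attribute
pw_matrix <- sapply(names(part_worths),
                    function(a) part_worths[[a]][grid[[a]]])

# Predicted rating = intercept + sum of the chosen part-worths
grid$utility <- intercept + rowSums(pw_matrix)

# Bundle with the highest predicted rating
best <- grid[which.max(grid$utility), ]
print(best)
```

With random ratings the winning bundle has no real meaning, but the same lookup would apply unchanged to part-worths estimated from real survey data.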