Conjoint Analysis

Formulating a marketing strategy for a brand involves several interdependent decisions. These cover not only the product's characteristics but also its positioning, communication, distribution, and pricing for the chosen target customers. The decisions must be made in the face of uncertain competitive reactions and a changing (and often unpredictable) environment. For a business to succeed, the decision process must rest on a clear understanding of how customers choose among (and react to) competing alternatives.

As this article shows, one of the main objectives of conjoint analysis is to predict the choices that a sample of individuals would make for a new item, described in terms of the set of attributes used in the conjoint study. Choice-based conjoint (CBC) studies translate an individual's utility for each alternative into a predicted probability of choosing that alternative under various conditions. The prediction is based on preference data collected on a set of hypothetical choice alternatives. Once a choice model has been estimated from such data, its parameter estimates can be used to assess the relative importance of the product's attributes.

Packages

The analysis uses the following CRAN packages:

  1. tidyverse
  2. ggthemes
  3. mlogit
  4. cvms
  5. kableExtra
  6. stargazer

Data

The data set used in this article contains the choices made by 137 individuals. Each individual evaluated 15 choice sets of tablets. Each choice set had three alternatives, and the individual's task was to choose one alternative from each set. Each alternative is described by six attributes: brand name, screen size, storage capacity, RAM, battery life, and price. The file therefore contains data on 137⋅15 = 2055 choice sets.

Column 1 (ConsumerId) identifies each of the 137 subjects; column 2 (ChoiceSetId) identifies each of the 2055 choice sets; column 3 (AltIdInSet) identifies the three alternatives within a choice set; and column 4 (Choice) equals 1 if that alternative was chosen and 0 otherwise. The remaining columns contain the attributes of each alternative.

Each row corresponds to one alternative and the characteristics that describe it. In each choice set the individual selected exactly one of the three alternatives; for consumer #1 this happened to be alternative #1 in both of the sets shown. The ordering of alternatives within a choice set has no special meaning. The table below shows the first two choice sets for consumer #1:

ConsumerId ChoiceSetId AltIdInSet Choice Brand Size Storage Ram Battery Price
1 1 1 1 iPad 7inch 32gb 4gb 7h 499
1 1 2 0 Surface 10inch 64gb 2gb 9h 399
1 1 3 0 Kindle 9inch 16gb 2gb 8h 499
1 2 1 1 iPad 8inch 32gb 1gb 8h 399
1 2 2 0 Surface 10inch 128gb 4gb 7h 299
1 2 3 0 Nexus 7inch 64gb 1gb 9h 199
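As a quick sanity check of the long format just described, the base R sketch below rebuilds only the six rows shown above (a hypothetical stand-in for the full file, which in practice would be read from disk) and verifies that every choice set has three alternatives with exactly one chosen. Choice sets are keyed by consumer id and set id together, so the check works whether ChoiceSetId is numbered globally or per consumer.

```r
# Stand-in for the full data: only the six rows shown above (hypothetical;
# in practice the full file would be read from disk)
tablets <- data.frame(
  ConsumerId  = c(1, 1, 1, 1, 1, 1),
  ChoiceSetId = c(1, 1, 1, 2, 2, 2),
  AltIdInSet  = c(1, 2, 3, 1, 2, 3),
  Choice      = c(1, 0, 0, 1, 0, 0),
  Brand   = c("iPad", "Surface", "Kindle", "iPad", "Surface", "Nexus"),
  Size    = c("7inch", "10inch", "9inch", "8inch", "10inch", "7inch"),
  Storage = c("32gb", "64gb", "16gb", "32gb", "128gb", "64gb"),
  Ram     = c("4gb", "2gb", "2gb", "1gb", "4gb", "1gb"),
  Battery = c("7h", "9h", "8h", "8h", "7h", "9h"),
  Price   = c(499, 399, 499, 399, 299, 199),
  stringsAsFactors = FALSE
)

# Key each choice set by consumer and set id together
set_key <- paste(tablets$ConsumerId, tablets$ChoiceSetId, sep = "-")

alts_per_set   <- table(set_key)                        # alternatives per set
chosen_per_set <- tapply(tablets$Choice, set_key, sum)  # chosen alternatives per set

stopifnot(all(alts_per_set == 3), all(chosen_per_set == 1))

# Expected number of rows in the full long-format file
n_rows_full <- 137 * 15 * 3
n_rows_full  # 6165
```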

Multinomial conjoint model estimation

Multinomial logistic regression is useful when subjects must be classified into one of several categories based on the values of a set of predictor variables. It is similar to binary logistic regression but more general, because the dependent variable is not restricted to two categories. See Maddala (1983), Louviere et al. (2001), and Greene (2012) for details of the estimation methods. The mlogit package is used to estimate a multinomial conjoint choice model. First, the mlogit.data() function converts the data into the specially formatted object that the estimation requires:

library(mlogit)

mdata <- mlogit.data(data = data,
                     choice = "Choice", # variable that contains the choice
                     shape = "long", # tells mlogit how the data are structured (every row is an alternative)
                     varying = 5:10, # columns that contain variables that vary across alternatives
                     alt.levels = paste(" Alternative", 1:3), # labels of the three alternatives in each set
                     id.var = "ConsumerId") # consumer id
head(mdata) # first six rows of the formatted data
                 ConsumerId ChoiceSetId AltIdInSet Choice   Brand   Size Storage Ram Battery Price
1. Alternative 1          1           1          1   TRUE    iPad  7inch    32gb 4gb      7h   499
1. Alternative 2          1           1          2  FALSE Surface 10inch    64gb 2gb      9h   399
1. Alternative 3          1           1          3  FALSE  Kindle  9inch    16gb 2gb      8h   499
2. Alternative 1          1           2          1   TRUE    iPad  8inch    32gb 1gb      8h   399
2. Alternative 2          1           2          2  FALSE Surface 10inch   128gb 4gb      7h   299
2. Alternative 3          1           2          3  FALSE   Nexus  7inch    64gb 1gb      9h   199

When the model is estimated, one level of each discrete attribute serves as the reference level, and the utility of that level is normalized to zero. The reference levels were specified when the data were loaded: Nexus, 7" screen, 16GB storage, 1GB RAM, and 7-hour battery. Price is treated as a continuous variable, so it needs no reference level.
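In R, the reference level of a factor under the default treatment coding is its first level, so one way to fix the references listed above is to set the factor levels explicitly. In the base R sketch below, the small stand-in data frame and the level orderings are assumptions; base R's relevel(), e.g. relevel(factor(x), ref = "Nexus"), is an alternative when all levels occur in the data. model.matrix() is used only to show which dummy columns the coding produces; its intercept column plays no role in the mlogit fit, which excludes intercepts.

```r
# Small stand-in data frame (three of the rows shown earlier)
tablets <- data.frame(
  Brand   = c("iPad", "Surface", "Nexus"),
  Size    = c("8inch", "10inch", "7inch"),
  Storage = c("32gb", "128gb", "64gb"),
  Ram     = c("1gb", "4gb", "1gb"),
  Battery = c("8h", "7h", "9h"),
  Price   = c(399, 299, 199),
  stringsAsFactors = FALSE
)

# Assumed level orderings: the FIRST level of each factor is the reference
ref_levels <- list(
  Brand   = c("Nexus", "Galaxy", "iPad", "Kindle", "Surface"),
  Size    = c("7inch", "8inch", "9inch", "10inch"),
  Storage = c("16gb", "32gb", "64gb", "128gb"),
  Ram     = c("1gb", "2gb", "4gb"),
  Battery = c("7h", "8h", "9h")
)

for (col in names(ref_levels)) {
  tablets[[col]] <- factor(tablets[[col]], levels = ref_levels[[col]])
}

# Treatment coding drops each reference level: there is no BrandNexus,
# Size7inch, Storage16gb, Ram1gb or Battery7h column
mm <- model.matrix(~ Brand + Size + Storage + Ram + Battery + Price, data = tablets)
colnames(mm)
```

Leaving out the intercept, the remaining 15 columns correspond one-to-one to the 15 \(\beta\) parameters of the utility function.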

The model assumes that the deterministic part of the utility of alternative \(j\), that is, its utility without the error term, is

\[\begin{align*} V_j = & \beta_{11}\mathbb{1}\left[\text{Brand=Galaxy}\right] + \beta_{12}\mathbb{1}\left[\text{Brand=iPad}\right] + \beta_{13}\mathbb{1}\left[\text{Brand=Kindle}\right] + \beta_{14}\mathbb{1}\left[\text{Brand=Surface}\right] + \\ &\beta_{21}\mathbb{1}\left[\text{Screen=10inch}\right] + \beta_{22}\mathbb{1}\left[\text{Screen=9inch}\right] + \beta_{23}\mathbb{1}\left[\text{Screen=8inch}\right]+\\ &\beta_{31}\mathbb{1}\left[\text{Storage=128gb}\right] + \beta_{32}\mathbb{1}\left[\text{Storage=64gb}\right] + \beta_{33}\mathbb{1}\left[\text{Storage=32gb}\right]+\\ &\beta_{41}\mathbb{1}\left[\text{RAM=4gb}\right] + \beta_{42}\mathbb{1}\left[\text{RAM=2gb}\right] +\\ &\beta_{51}\mathbb{1}\left[\text{Battery=9h}\right] + \beta_{52}\mathbb{1}\left[\text{Battery=8h}\right] +\\ &\beta_{6}\text{Price} \end{align*}\]

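where the total utility is \(U_j = V_j + \varepsilon_j\) and \(\varepsilon_j\) is a random error term. The model therefore has \(4 + 3 + 3 + 2 + 2 + 1 = 15\) parameters \(\beta\) to estimate.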

Assuming that the errors are independent and identically distributed with a type I extreme value (Gumbel) distribution, a consumer chooses alternative \(j\) from a choice set of three alternatives with probability: \[ p_j = \frac{\exp(V_j)}{\exp(V_1)+\exp(V_2)+\exp(V_3)},\ \ j\in\{1,2,3\} \]

By construction, \(p_1+p_2+p_3=1\).
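The choice probability above is the softmax of the deterministic utilities. The minimal base R sketch below uses hypothetical utilities and also illustrates why normalizing the reference levels to zero is harmless: adding the same constant to every \(V_j\) leaves the probabilities unchanged, so only utility differences matter.

```r
# Multinomial logit choice probabilities from deterministic utilities V_j.
# Subtracting max(V) before exponentiating avoids overflow and leaves the
# ratios exp(V_j) / sum(exp(V_k)) unchanged.
choice_probs <- function(V) {
  e <- exp(V - max(V))
  e / sum(e)
}

V <- c(1.0, 0.5, -0.2)  # hypothetical utilities V_1, V_2, V_3
p <- choice_probs(V)
round(p, 3)
sum(p)                  # 1

# Shifting all utilities by the same constant changes nothing
all.equal(choice_probs(V + 10), p)  # TRUE
```

With equal utilities, each alternative gets probability 1/3, which corresponds to the random-guessing benchmark.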

The model is then estimated with the mlogit() function, and the estimates are reported in the table below:

set.seed(999) # not strictly needed: a standard logit fit involves no random draws
model <- mlogit(Choice ~ 0 + Brand + Size + Storage + Ram + Battery + Price, data = mdata) # "0 +" excludes the intercepts
summary(model) # estimates, standard errors, z-values and p-values
Coefficient Estimate Std. Error z-value Pr(>|z|)
BrandGalaxy 0.338 0.093 3.653 0.0003
BrandiPad 0.978 0.094 10.434 <0.0001
BrandKindle 0.263 0.100 2.640 0.008
BrandSurface 0.145 0.094 1.545 0.122
Size10inch 0.324 0.084 3.849 0.0001
Size8inch 0.189 0.083 2.280 0.023
Size9inch 0.436 0.081 5.388 <0.0001
Storage128gb 0.590 0.087 6.775 <0.0001
Storage32gb 0.217 0.083 2.615 0.009
Storage64gb 0.578 0.081 7.154 <0.0001
Ram2gb 0.319 0.067 4.742 <0.0001
Ram4gb 0.636 0.065 9.853 <0.0001
Battery8h 0.130 0.065 1.995 0.046
Battery9h 0.125 0.065 1.927 0.054
Price -0.005 0.0003 -18.489 <0.0001
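The z-value column is the estimate divided by its standard error, and Pr(>|z|) is the two-sided p-value from the standard normal distribution. The base R sketch below recomputes three rows from the rounded table entries, so its results match the printed values only approximately.

```r
# Rounded estimates and standard errors copied from the table above
est <- c(BrandSurface = 0.145, Battery8h = 0.130, Battery9h = 0.125)
se  <- c(BrandSurface = 0.094, Battery8h = 0.065, Battery9h = 0.065)

z <- est / se              # Wald z-statistic
p <- 2 * pnorm(-abs(z))    # two-sided p-value under N(0, 1)
round(cbind(z, p), 3)
```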

Meaning of parameters

Estimation yields a coefficient for every level of each discrete attribute except the reference level. Each coefficient captures the relative utility, or part-worth, of that level compared with the reference level. For example, the BrandiPad coefficient estimates the utility of the iPad brand relative to Nexus (the reference brand); here it is 0.978.

For price, a single coefficient is estimated. It captures how the utility of an alternative changes when its price rises by one unit (USD$1), holding all other characteristics of the alternative fixed (ceteris paribus).
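Because every part-worth is measured relative to the same reference level, the utility gap between any two levels of an attribute is simply the difference of their coefficients, and the price term scales linearly with the price change. A small base R sketch with the rounded brand estimates from the table (the price coefficient is printed as -0.005, so the price figure is approximate):

```r
# Rounded brand part-worths from the table (Nexus is the reference, so 0)
brand <- c(Nexus = 0, Galaxy = 0.338, iPad = 0.978, Kindle = 0.263, Surface = 0.145)
beta_price <- -0.005

# Utility gap between two non-reference levels = difference of part-worths
ipad_vs_galaxy <- brand[["iPad"]] - brand[["Galaxy"]]
ipad_vs_galaxy                 # 0.64

# Brands ranked by part-worth
sort(brand, decreasing = TRUE)

# Utility change from a USD$100 price increase, all else fixed
price_100 <- beta_price * 100
price_100                      # -0.5
```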

Prediction

The estimated parameters can be used to predict the choice probabilities of the alternatives in the data. The predicted probabilities for the first five choice sets are shown in the table below (rounded to one decimal place):

Alternative 1 Alternative 2 Alternative 3
0.4 0.4 0.2
0.2 0.5 0.3
0.5 0.3 0.2
0.4 0.4 0.2
0.4 0.2 0.4
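These probabilities can be reproduced by hand from the estimation table: compute \(V_j\) for each alternative by adding up its part-worths and the price term, then apply the logit formula. The base R sketch below does this for the two choice sets of consumer #1 shown earlier, using the rounded coefficients; the helper names pw, utility, and predict_set are hypothetical. At one decimal place the results agree with the first two rows of the table above; the fitted model itself uses full-precision estimates.

```r
# Rounded coefficients from the estimation table; omitted levels are references (0)
beta <- c(BrandGalaxy = 0.338, BrandiPad = 0.978, BrandKindle = 0.263,
          BrandSurface = 0.145, Size10inch = 0.324, Size8inch = 0.189,
          Size9inch = 0.436, Storage128gb = 0.590, Storage32gb = 0.217,
          Storage64gb = 0.578, Ram2gb = 0.319, Ram4gb = 0.636,
          Battery8h = 0.130, Battery9h = 0.125)
beta_price <- -0.005

# Part-worth of a level, or 0 if it is the reference level
pw <- function(name) if (name %in% names(beta)) beta[[name]] else 0

# Deterministic utility V_j of one alternative
utility <- function(brand, size, storage, ram, battery, price) {
  pw(paste0("Brand", brand)) + pw(paste0("Size", size)) +
    pw(paste0("Storage", storage)) + pw(paste0("Ram", ram)) +
    pw(paste0("Battery", battery)) + beta_price * price
}

# Logit choice probabilities for one choice set
predict_set <- function(brand, size, storage, ram, battery, price) {
  V <- mapply(utility, brand, size, storage, ram, battery, price, USE.NAMES = FALSE)
  exp(V) / sum(exp(V))
}

# Choice sets 1 and 2 of consumer #1
p1 <- predict_set(c("iPad", "Surface", "Kindle"), c("7inch", "10inch", "9inch"),
                  c("32gb", "64gb", "16gb"), c("4gb", "2gb", "2gb"),
                  c("7h", "9h", "8h"), c(499, 399, 499))
p2 <- predict_set(c("iPad", "Surface", "Nexus"), c("8inch", "10inch", "7inch"),
                  c("32gb", "128gb", "64gb"), c("1gb", "4gb", "1gb"),
                  c("8h", "7h", "9h"), c(399, 299, 199))
round(rbind(p1, p2), 2)  # p1: 0.37 0.44 0.19; p2: 0.24 0.47 0.29
```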

A confusion matrix, created with the cvms and ggplot2 packages, measures the accuracy of the predictions across all choice sets, with the alternative that has the highest predicted probability taken as the predicted choice.

If predictions were made at random, the expected accuracy would be 33.3% with three alternatives per choice set. This simple model does much better, with an accuracy of 56.35%, although it is far from perfect.
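The cvms plotting step is not reproduced here, but the accuracy it summarizes can be sketched in base R: take the alternative with the highest predicted probability as the predicted choice, cross-tabulate it against the observed choice, and divide the diagonal of that confusion matrix by the number of choice sets. The probabilities and observed choices below are hypothetical.

```r
# Hypothetical predicted probabilities (rows = choice sets, cols = alternatives)
probs <- matrix(c(0.6, 0.3, 0.1,
                  0.2, 0.5, 0.3,
                  0.1, 0.2, 0.7,
                  0.3, 0.4, 0.3), ncol = 3, byrow = TRUE)
actual <- c(1, 3, 3, 2)  # hypothetical observed choices

# Predicted choice = alternative with the highest probability
predicted <- max.col(probs, ties.method = "first")

# Confusion matrix, keeping all three alternatives as levels
conf_mat <- table(Predicted = factor(predicted, levels = 1:3),
                  Actual    = factor(actual, levels = 1:3))
conf_mat

accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy  # 0.75, versus 1/3 for random guessing
```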

Willingness to pay

Using the parameter estimates, we can calculate how much a consumer would be willing to pay for a given level of an attribute by dividing the coefficient for that level by the negative of the price coefficient, \(\text{WTP}_k = -\beta_k / \beta_6\). In other words, this is the change in price that would cause a change in utility equivalent to moving the attribute from its reference level to the level in question:

## BrandiPad 
##  125.7944
##   Ram4gb 
## 124.9299
## Size9inch 
##   85.5882

For example, it can be seen that an average consumer would be willing to pay up to USD$125.8 to get an iPad instead of a Nexus, holding all other characteristics fixed.
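The willingness-to-pay calculation is a one-line ratio. The base R sketch below applies it to the rounded Ram4gb and Size9inch estimates from the table; because the price coefficient is printed only as -0.005, the results (127.2 and 87.2) differ by a few dollars from the full-precision output above. With the fitted model, passing the corresponding elements of coef(model) would use the unrounded estimates.

```r
# Rounded estimates from the coefficient table
beta <- c(Ram4gb = 0.636, Size9inch = 0.436)
beta_price <- -0.005

# Willingness to pay for a level relative to its reference level:
# the price increase that exactly offsets the level's utility gain
wtp <- function(beta_level, beta_price) -beta_level / beta_price

w <- wtp(beta, beta_price)
round(w, 1)  # Ram4gb 127.2, Size9inch 87.2
```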

Conclusion

In this article we carried out a choice-based conjoint analysis with tablet alternatives defined by six attributes, in order to learn which attributes matter most to consumers in terms of utility. The main objective of choice-based conjoint analysis is to identify the attributes that consumers weigh when evaluating the proposed alternatives. For instance, the part-worths estimated by a model such as multinomial logit regression can be used to represent attribute importance. These techniques allow marketing analysts and firms to test a new product design in a hypothetical market simulation and predict its performance relative to the competition.