In microeconomics, the measurement of consumers’ preferences is one of the most important elements of marketing research. It helps to explain the reasons behind consumers’ decisions. With suitable statistical methods it is possible to quantify preferences and answer the question: which product will the consumer choose?
Some of the more important issues for which modern conjoint analysis is used are the following:
Define product attributes that can be quantified and corresponding levels. A level is the specific value or realization of the attribute.
Part-worths are estimated by ordinary least squares (OLS) regression, the method of calculation traditionally used in most conjoint studies. However, OLS is not appropriate for conjoint data consisting of rank orders.
For OLS to be appropriate, we must assume the data are “scaled at the interval level.” By this, we mean that the data are scaled so that real differences in the things being measured are communicated by the arithmetic differences in their values. Fahrenheit temperature, for instance, has an interval scale. The difference between 70 and 80 degrees is exactly as large as the difference between 80 and 90 degrees. In the social sciences and in marketing research we are usually willing to assume that rating scale values possess this kind of scaling.
Some important notes on experimental design:
The basic results of a conjoint analysis are the estimated attribute-level utilities. Keeping with the example in Table 1, conjoint output might look like the output shown below:
Within a given attribute, the estimated utilities are generally scaled in such a way that they add up to zero. So a negative number does not mean that a given level has “negative utility”; it just means that this level is on average less preferred than a level with an estimated utility that is positive.
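The zero-sum scaling described above can be reproduced in base R with effects (sum-to-zero) coding. The following is only a minimal sketch: the two attributes, their levels, and the six ratings are made up for illustration and are not the data behind Table 1.

```r
# Hypothetical ratings of six profiles built from two made-up attributes.
profiles <- data.frame(
  price = factor(c("low", "low", "medium", "medium", "high", "high"),
                 levels = c("low", "medium", "high")),
  aroma = factor(c("yes", "no", "yes", "no", "yes", "no"),
                 levels = c("yes", "no"))
)
rating <- c(8, 6, 6, 5, 3, 1)

# contr.sum requests effects (sum-to-zero) coding: the last level of each
# attribute is not estimated directly but is implied by the other levels.
fit <- lm(rating ~ price + aroma, data = profiles,
          contrasts = list(price = "contr.sum", aroma = "contr.sum"))
b <- coef(fit)

# Recover the omitted level as minus the sum of the estimated levels.
price_utils <- c(low = b[["price1"]], medium = b[["price2"]],
                 high = -(b[["price1"]] + b[["price2"]]))
aroma_utils <- c(yes = b[["aroma1"]], no = -b[["aroma1"]])

print(round(price_utils, 3))
print(round(aroma_utils, 3))
print(sum(price_utils))  # zero up to floating-point error
```

With this coding, each utility is a deviation from the overall mean rating, which is why a negative value only signals a below-average level within its attribute.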
T-Value Interpretation:
Because of the way conjoint utilities are scaled, the standard interpretation of t-values can yield misleading results. For example, the level “Saturn” of the attribute “Brand” has a t-value of 0.87. In general, a t-value of this magnitude would fail a test of statistical significance; however, this t-value arises because, within the attribute “Brand,” the level “Saturn” has neither a very high nor a very low relative preference. It sits roughly in the middle in terms of overall preference. Because of the scaling, levels with moderate preference within a given attribute are likely to have estimated utilities close to zero, which tends to produce very low t-values (recall that the t-test evaluates the null hypothesis that the true value of a parameter is zero, so an estimate near zero yields a small t-value regardless of how much the attribute as a whole matters).
Note:
At a practical level, it is rare for an entire attribute to be non-significant. If you do find one, it probably should not have been included in the experimental design in the first place, because respondents are not considering that attribute’s information when they make choices.
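One way to judge an attribute as a whole, rather than level by level, is a joint F-test that compares the model with and without that attribute. This is a common regression technique and is offered here only as a sketch; it is not necessarily the procedure behind the output discussed above. The design and the ratings are hypothetical.

```r
# Hypothetical data: a full 3 x 2 design rated twice (12 ratings).
d <- expand.grid(
  price = factor(c("low", "medium", "high"),
                 levels = c("low", "medium", "high")),
  aroma = factor(c("yes", "no"), levels = c("yes", "no")),
  rep   = 1:2
)
d$rating <- c(8, 6, 3, 7, 6, 2, 9, 5, 3, 8, 5, 2)

cs <- list(price = "contr.sum", aroma = "contr.sum")
full    <- lm(rating ~ price + aroma, data = d, contrasts = cs)
reduced <- lm(rating ~ aroma, data = d, contrasts = cs["aroma"])

# t-value of the middle price level ("medium" is coefficient price2).
t_medium <- summary(full)$coefficients["price2", "t value"]

# Joint F-test: does dropping the whole price attribute worsen the fit?
joint   <- anova(reduced, full)
p_price <- joint[["Pr(>F)"]][2]

print(t_medium)
print(p_price)
```

In this made-up data the middle level has a small t-value because its utility is close to zero, while the joint test for the attribute is clearly significant.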
library(conjoint)
data(tea)
#calculating the model for the first respondent
#caModel(y = vector of single profile of preferences, x = matrix of profiles)
caModel(y=tprefm[1,], x=tprof)
##
## Call:
## lm(formula = frml)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9
## 1.1345 -1.4897 0.3103 -0.2655 0.3103 0.1931 1.5931 -1.4310 -1.4310
## 10 11 12 13
## 1.1207 0.3690 1.1931 -1.6069
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.3937 0.5439 6.240 0.00155 **
## factor(x$price)1 -1.5172 0.7944 -1.910 0.11440
## factor(x$price)2 -1.1414 0.6889 -1.657 0.15844
## factor(x$variety)1 -0.4747 0.6889 -0.689 0.52141
## factor(x$variety)2 -0.6747 0.6889 -0.979 0.37234
## factor(x$kind)1 0.6586 0.6889 0.956 0.38293
## factor(x$kind)2 -1.5172 0.7944 -1.910 0.11440
## factor(x$aroma)1 0.6293 0.5093 1.236 0.27150
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.78 on 5 degrees of freedom
## Multiple R-squared: 0.8184, Adjusted R-squared: 0.5642
## F-statistic: 3.22 on 7 and 5 DF, p-value: 0.1082
#returning vector of 12 values: the intercept first, then utilities for the 11 attribute levels
caUtilities(y=tprefm[1,], x=tprof, z=tlevn)
##
## Call:
## lm(formula = frml)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9
## 1,1345 -1,4897 0,3103 -0,2655 0,3103 0,1931 1,5931 -1,4310 -1,4310
## 10 11 12 13
## 1,1207 0,3690 1,1931 -1,6069
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3,3937 0,5439 6,240 0,00155 **
## factor(x$price)1 -1,5172 0,7944 -1,910 0,11440
## factor(x$price)2 -1,1414 0,6889 -1,657 0,15844
## factor(x$variety)1 -0,4747 0,6889 -0,689 0,52141
## factor(x$variety)2 -0,6747 0,6889 -0,979 0,37234
## factor(x$kind)1 0,6586 0,6889 0,956 0,38293
## factor(x$kind)2 -1,5172 0,7944 -1,910 0,11440
## factor(x$aroma)1 0,6293 0,5093 1,236 0,27150
## ---
## Signif. codes: 0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1
##
## Residual standard error: 1,78 on 5 degrees of freedom
## Multiple R-squared: 0.8184, Adjusted R-squared: 0.5642
## F-statistic: 3.22 on 7 and 5 DF, p-value: 0,1082
## [1] 3.3936782 -1.5172414 -1.1413793 2.6586207 -0.4747126 -0.6747126
## [7] 1.1494253 0.6586207 -1.5172414 0.8586207 0.6293103 -0.6293103
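The regression output lists only seven slope coefficients, while the utilities vector holds eleven level utilities. The gap is filled by the zero-sum scaling: within each attribute, the omitted level equals minus the sum of the estimated ones. A small base R sketch, using coefficients copied (rounded) from the output for respondent 1 and shortened names chosen here for readability:

```r
# Coefficients for respondent 1, copied from the lm output above.
# The short names (price1, ...) are labels chosen for this sketch.
coefs <- c(price1 = -1.5172, price2 = -1.1414,
           variety1 = -0.4747, variety2 = -0.6747,
           kind1 = 0.6586, kind2 = -1.5172,
           aroma1 = 0.6293)

# Strip the trailing digit to find the attribute each coefficient belongs to.
attribute <- sub("[0-9]+$", "", names(coefs))

# Omitted level of each attribute = minus the sum of its estimated levels.
omitted <- -sapply(split(coefs, attribute), sum)
print(round(omitted, 4))
```

Up to rounding, the results agree with the 4th, 7th, 10th, and 12th entries of the utilities vector (the last level of price, variety, kind, and aroma).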
#calculating individual part-worth utilities for the first 6 respondents
head(caPartUtilities(y=tpref, x=tprof, z=tlevn))
## intercept low medium high black green red bags granulated
## [1,] 3.394 -1.517 -1.141 2.659 -0.475 -0.675 1.149 0.659 -1.517
## [2,] 5.049 3.391 -0.695 -2.695 -1.029 0.971 0.057 1.105 -0.609
## [3,] 4.029 2.563 -1.182 -1.382 -0.248 2.352 -2.103 -0.382 -2.437
## [4,] 5.856 -1.149 -0.025 1.175 -0.492 1.308 -0.816 -0.825 -0.149
## [5,] 6.250 -2.333 2.567 -0.233 -0.033 -0.633 0.667 -0.233 -0.333
## [6,] 1.578 -0.713 -0.144 0.856 1.456 -0.744 -0.713 0.656 -0.713
## leafy yes no
## [1,] 0.859 0.629 -0.629
## [2,] -0.495 -0.681 0.681
## [3,] 2.818 0.776 -0.776
## [4,] 0.975 0.121 -0.121
## [5,] 0.567 -1.250 1.250
## [6,] 0.056 1.595 -1.595
#estimate parameters for entire sample of 100 - (assume sample is homogeneous)
#z = matrix of level names
#this function gives us average importance of factors as well
Conjoint(y=tpref, x=tprof, z=tlevn)
##
## Call:
## lm(formula = frml)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5,1888 -2,3761 -0,7512 2,2128 7,5134
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3,55336 0,09068 39,184 < 2e-16 ***
## factor(x$price)1 0,24023 0,13245 1,814 0,070 .
## factor(x$price)2 -0,14311 0,11485 -1,246 0,213
## factor(x$variety)1 0,61489 0,11485 5,354 1,02e-07 ***
## factor(x$variety)2 0,03489 0,11485 0,304 0,761
## factor(x$kind)1 0,13689 0,11485 1,192 0,234
## factor(x$kind)2 -0,88977 0,13245 -6,718 2,76e-11 ***
## factor(x$aroma)1 0,41078 0,08492 4,837 1,48e-06 ***
## ---
## Signif. codes: 0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1
##
## Residual standard error: 2,967 on 1292 degrees of freedom
## Multiple R-squared: 0.09003, Adjusted R-squared: 0.0851
## F-statistic: 18.26 on 7 and 1292 DF, p-value: < 2,2e-16
## [1] "Part worths (utilities) of levels (model parameters for whole sample):"
## levnms utls
## 1 intercept 3,5534
## 2 low 0,2402
## 3 medium -0,1431
## 4 high -0,0971
## 5 black 0,6149
## 6 green 0,0349
## 7 red -0,6498
## 8 bags 0,1369
## 9 granulated -0,8898
## 10 leafy 0,7529
## 11 yes 0,4108
## 12 no -0,4108
## [1] "Average importance of factors (attributes):"
## [1] 24,76 32,22 27,15 15,88
## [1] Sum of average importance: 100,01
## [1] "Chart of average factors importance"
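A widely used definition of attribute importance is the range of an attribute's part-worths divided by the sum of the ranges over all attributes. The sketch below applies that definition to respondent 1, using the part-worths printed by caPartUtilities() above. Whether the package computes its averages in exactly this way is an assumption; the figures reported by Conjoint() are averages over all respondents, so they are not expected to match a single respondent.

```r
# Part-worths for respondent 1, copied from the caPartUtilities output above.
utils <- list(
  price   = c(low = -1.517, medium = -1.141, high = 2.659),
  variety = c(black = -0.475, green = -0.675, red = 1.149),
  kind    = c(bags = 0.659, granulated = -1.517, leafy = 0.859),
  aroma   = c(yes = 0.629, no = -0.629)
)

# Range of part-worths within each attribute (max minus min).
ranges <- sapply(utils, function(u) diff(range(u)))

# Importance: each attribute's share of the total range, in percent.
importance <- 100 * ranges / sum(ranges)
print(round(importance, 2))
```

For this respondent, price has the widest range and therefore the largest importance share.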
If the sample is not homogeneous, we can also segment respondents into three (or any other number of) clusters using the k-means method of clustering. The necessary function is caSegmentation():
library(cluster)
segments <- caSegmentation(y=tpref, x=tprof, c=3)
print(segments)
## K-means clustering with 3 clusters of sizes 29, 31, 40
##
## Cluster means:
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## 1 4.808000 5.070759 2.767310 7.132138 6.843172 2.649483 3.656379 1.539724
## 2 3.330226 5.582000 5.214258 4.207645 3.859419 4.740871 5.173129 5.334710
## 3 5.480275 2.938100 1.368100 4.540275 1.973100 3.782900 1.382900 0.965750
## [,9] [,10] [,11] [,12] [,13]
## 1 2.063862 1.030862 6.691448 5.980517 6.801207
## 2 3.366968 4.838194 4.612129 6.050548 5.108613
## 3 2.820750 0.111225 3.450750 0.442900 0.692900
##
## Clustering vector:
## [1] 1 2 1 2 2 3 1 2 1 1 1 1 3 3 3 3 2 3 2 3 3 1 3 2 2 1 2 2 2 2 3 1 2 1 1
## [36] 1 1 3 3 3 3 2 3 2 3 1 1 3 3 3 1 3 3 3 2 1 3 2 3 2 3 3 1 2 2 1 3 3 3 2
## [71] 1 3 1 2 1 2 2 3 1 1 2 2 2 1 3 3 3 3 2 3 2 3 2 3 3 1 3 2 1 1
##
## Within cluster sum of squares by cluster:
## [1] 1540.596 2316.512 1605.654
## (between_SS / total_SS = 41.0 %)
##
## Available components:
##
## [1] "cluster" "centers" "totss" "withinss"
## [5] "tot.withinss" "betweenss" "size" "iter"
## [9] "ifault"
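The clustering step itself can be illustrated with kmeans() from base R. The sketch below clusters the part-worths of the first six respondents (copied from the caPartUtilities output above) into two groups. This is an assumption-laden illustration: the cluster means printed above have 13 columns, one per profile, so caSegmentation() evidently clusters on profile-level values rather than on part-worths as done here.

```r
# Part-worths (without the intercept) for the first six respondents,
# copied from the caPartUtilities output above; one row per respondent.
pw <- matrix(c(
  -1.517, -1.141,  2.659, -0.475, -0.675,  1.149,  0.659, -1.517,  0.859,  0.629, -0.629,
   3.391, -0.695, -2.695, -1.029,  0.971,  0.057,  1.105, -0.609, -0.495, -0.681,  0.681,
   2.563, -1.182, -1.382, -0.248,  2.352, -2.103, -0.382, -2.437,  2.818,  0.776, -0.776,
  -1.149, -0.025,  1.175, -0.492,  1.308, -0.816, -0.825, -0.149,  0.975,  0.121, -0.121,
  -2.333,  2.567, -0.233, -0.033, -0.633,  0.667, -0.233, -0.333,  0.567, -1.250,  1.250,
  -0.713, -0.144,  0.856,  1.456, -0.744, -0.713,  0.656, -0.713,  0.056,  1.595, -1.595
), nrow = 6, byrow = TRUE)
colnames(pw) <- c("low", "medium", "high", "black", "green", "red",
                  "bags", "granulated", "leafy", "yes", "no")

# k-means starts from random centers, so fix the seed and use several starts.
set.seed(42)
km <- kmeans(pw, centers = 2, nstart = 10)

print(km$cluster)  # cluster label of each respondent
print(km$size)     # number of respondents per cluster
```

Cluster labels are arbitrary and can swap between runs, so segments should be interpreted through their centers (km$centers) rather than their numbers.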