If you don’t have enough budget or tech skilss to do those fancy, tech-savvy conjoint analyses, it doesn’t mean that conjoint analysis is completely out of reach to you. You just need to be a bit clever and do some extra works. At below, I offer an example of doing conjoint analysis using \(\textsf{R}\)’s \(\textsf{conjoint}\) package using data extracted from a customized SurveyCake survey. For more details on commercial application of conjoint analysis, I highly recommend that you reference Prof. Tony Chuo’s tutorial page.
During the televised vice presidential debate on Dec. 22nd, 2024, Taiwan People’s Party vice presidential candidate Mrs. Cynthia Hsin-ying Wu (吳欣盈) stated that “細漢欸時準打給攏愛啉Johnnie Walker,” implying “Taiwan moves forward,” sparking heated discussion from the public. Johnnie Walker is an imported blended whiskey, and whiskey has been a staple on Taiwanese tables during holidays and festivals for decades, witnessing Taiwan’s economic takeoff.
Unbeknownst to most people, Taiwan has long been a market that liquor merchants vie for, due to zero tariffs on whiskey (only a 5% sales tax is levied). According to 2021 data, Taiwan imported over NT$8 billion worth of Scotch whiskey annually, ranking third globally and continuing to grow; meanwhile, whiskies distilled by local Taiwanese distilleries have frequently won international awards in recent years and been recognized as the world’s sixth major whiskey-producing region. Whether imported or domestic, whiskey has become deeply embedded into Taiwanese local drinking and banquet culture. But how much do you really know about it?
Generally speaking, the main commercial products from liquor merchants are divided by aging years into [12 years, 15 years, 18 years]. In terms of the origin of the base spirit (distillery), they are often categorized into [single malt, blended], and the differences between these two categories are reflected in the price, generally falling into three ranges: [under NT$1,000, NT$1,000 to NT$2,500, over NT$2,500].
Why are there these differences? As the base spirit matures in wooden barrels, the flavor becomes richer and the alcohol proof gradually increases due to evaporation-this cost is reflected in the price (a cute term for this is the so-called “Angel’s share,” that is, the price exacted by the Mother Nature).
Differences in climate and humidity across producing regions have long been a contentious issue among whiskey-exporting countries within the World Trade Organization (WTO). For example, India is also a major whiskey-producing region where the warm and humid climate accelerates maturation. This has led temperate-zone producing countries like the U.K. to view it as unfair trade competition, demanding higher import tariffs on Indian whiskey products to protect the market competitiveness of their domestic whiskies-a move that India considers a violation of the WTO’s non-discrimination principle among member states. Business-wise, people appreciate variety. Market segmentation based on years of maturity also maximizes liquor producers’ profits (Page 2019).
Whether to tax spirits at full barrel volume or only the amount left in the cask after ageing also carries important fiscal implications (Dietz 1997). Sir William E. Gladstone was widely regarded as the chief proponent of the Inland Revenue Act of 1880 (commonly known as the Free Mash-Tun Act), which abolished taxes on malt and exempted Angel’s share from tax. What’s behind Sir Gladstone’s motivation to push through this act? Well, his family business was counting on it, but this unfounded allegation should not distract us here.
Back to our main story, Mrs. Cynthia Hsin-ying Wu also broached another thorny issue. She said (and I quoted) “Blended whiskey …
Is blended whiskey (such as our protagonist here, Johnnie Walker) really inferior in quality to single malt whiskey, as Wu claimed?
In fact, the liquid in each bottle of whiskey may not come from just one oak barrel, or even from base spirits of different distilleries, depending solely on the manufacturer’s marketing strategy. What is known as “single malt whiskey” refers to whiskey produced and bottled at the same distillery, believed to fully express the territorial characteristics of that location. “Blended whiskey,” on the other hand, means the base spirits may come from different distilleries. Whether single malt or blended, both are ultimately blended products.
Single malt involves selecting base spirits from different oak barrels within one distillery, blending them, and diluting with water according to product characteristics (so it still requires blending…); blended whiskey, in addition to selecting single malts from various distilleries, can also incorporate grain whiskies. How well it turns out depends heavily on the skill of each distillery’s master blender.
Before you get intoxicated by the obnoxious content presented above, let’s get some brainstorming!
We applied \(\textsf{R}\)’s \(\textsf{conjoint}\) to produce a 3-attribute design with the following “levels”:
### Load required packages (uncomment if you haven't downloaded these packages)
# install.packages("conjoint") # conjoint package
library(conjoint)
c <- expand.grid(
age <- c("12年", "15年", "18年"),
malt <- c("單一麥芽", "調和"),
price <- c("1000元內", "1000以上至2500元內", "2500元以上"))
# Change the column names to uppercase letters
names(c) <- c("Age", "Malt", "Price")
# Here's what we got
design <- caFactorialDesign(data=c, type="orthogonal")
design
## Age Malt Price
## 1 12年 單一麥芽 1000元內
## 3 18年 單一麥芽 1000元內
## 5 15年 調和 1000元內
## 7 12年 單一麥芽 1000以上至2500元內
## 12 18年 調和 1000以上至2500元內
## 14 15年 單一麥芽 2500元以上
## 15 18年 單一麥芽 2500元以上
## 16 12年 調和 2500元以上
So we need to ask respondents to compare at least 8 unique combinations (called “profiles”) of the 3 attributes in order to get somewhat reliable estimate of each attribute’s effect on the probability that a whiskey product with this level of attribute is likely to be preferred by an average consumer.
Since we do not have the luxury of randomized flashcard setting used in Qualtrics, a walk-around solution is to place these 8 profiles in SurveyCake survey’s rank order question and have them displayed in random order to mitigate ordering effect on our final statistical inference.
Finally, with a bit of data wrangling via \(\textsf{grepl}\) on Excel sheel, we re-organized > 2K responses (collected during the 2024 Lunar New Year) into \(\textsf{conjoint}\)-readable format. So there you have it!
### Load required packages (uncomment if you haven't downloaded these packages)
# install.packages("dplyr") # For data wrangling
# install.packages("ggplot2") # For visualization
library(dplyr)
library(ggplot2)
Sys.setlocale("LC_ALL", "C") # Language display setting: need to read Chinese text
## [1] "C"
wsk <- read.csv("C://Users/hktse/Dropbox/眾包平台/問卷資料備份/抽獎問卷/whiskey_data.csv")
names(wsk)[1] <- "gender"
wsk$ID <- 1:nrow(wsk) # Generate ID
# Remove underage respondents!
wsk <- wsk[which(wsk$age >= 20), ]
wsk$age <- ifelse(
(wsk$age >= 20 & wsk$age < 30), "20-29",
ifelse((wsk$age >= 30 & wsk$age < 40), "30-39",
ifelse((wsk$age >= 40 & wsk$age < 50), "40-49",
ifelse((wsk$age >= 50 & wsk$age < 60), "50-59",
ifelse((wsk$age >= 60 & wsk$age < 70), "60-69",
ifelse((wsk$age >= 70 & wsk$age < 80), "70-79", ">= 80") )))))
wsk$gender <- ifelse(wsk$gender == 1, "Male", "Female")
# Compile data
data <- data.frame(
Age = c(c("20-29", "30-39", "40-49", "50-59", "60-69", "70-79", ">= 80"), c("20-29", "30-39", "40-49", "50-59", "60-69", "70-79", ">= 80")),
Gender = c(rep("Male", 7), rep("Female", 7)),
Population = c(186, 318, 389, 117, 71, 19, 1,
218, 287, 260, 119, 72, 25, 1)
)
data$Age <- as.factor(data$Age)
# Visualize respondents' demographic profile
data <- data %>% mutate(Age = factor(Age, levels = c("20-29", "30-39", "40-49", "50-59", "60-69", "70-79", ">= 80")))
basic_plot <- ggplot(
data,
aes(
x = Age,
fill = Gender,
y = ifelse(
test = Gender == "Male",
yes = -Population,
no = Population
)
)
) +
geom_bar(stat = "identity")
population_pyramid <- basic_plot +
scale_y_continuous(
labels = abs,
limits = max(data$Population) * c(-1,1)
) +
coord_flip() +
theme_minimal() +
labs(
x = "Age",
y = "Population",
fill = "Age",
title = "Population Pyramid"
) + ggtitle(" ") +
theme(plot.title = element_text(hjust = 0.5))
# Display plot
population_pyramid+scale_fill_manual(values = c("rosybrown2", "powderblue"))
Now the analysis.
# Convert collected responses to conjoint-readable format
library(stringi)
text <- wsk[,5]
names(text)[1] <- c("whiskey")
w_list <- list()
for (i in 1:length(text))
{
cj <- stri_unescape_unicode(gsub("<U\\+(....)>", "\\\\u\\1", text[i]))
w_list[[i]] <- cj
}
# Switch back to Chinese language setting in order to display Chinese chaarcters
Sys.setlocale(category = "LC_ALL", locale = "cht")
wsk$cj <- unlist(w_list)
# Verify
head(wsk,2)
preferences <- wsk
preferences$cj <- as.character(preferences$cj)
# Make profiles data
profiles <- c(
"12年、單一麥芽、1000元內",
"18年、單一麥芽、1000元內",
"15年、調和、1000元內",
"12年、單一麥芽、1000以上至2500元內",
"18年、調和、1000以上至2500元內",
"15年、單一麥芽、2500元以上",
"18年、單一麥芽、2500元以上",
"12年、調和、2500元以上"
)
# Handling string text
library(stringr)
df <- data.frame(do.call(rbind, lapply(preferences$cj, function(x) {
match(
profiles,
str_replace_all(strsplit(x, "\\n")[[1]], "^[0-9]+. ", "")
)
})))
names(df) <- paste0("profile", 1:length(profiles))
design <- caFactorialDesign(data=c, type="orthogonal")
code <- caEncodedDesign(design) # Generate "profiles"
print(code) # Show labels that constitute the "profiles" of choice features
# Verification
print(round(cov(code),5))
print(round(cor(code),5))
encodedorthodesign <- data.frame(design, code)
print(encodedorthodesign)
# Set levels for the three attributes by the order of Age -> Malt -> Price
lev <- c("12年", "15年", "18年", "單一麥芽", "調和", "1000元內", "1000以上至2500元內", "2500元以上")
lev.df <- data.frame(lev)
# preferences from rank ordering (so-called ranking) into importance assessments (so-called rating)
preferences <- caRankToScore(y.rank = df)
# We now have pretty much everything we need to conduct conjoint analysis. Save design, code, leve.df, and the caRankToScore()-encoded preferences data to be used later.
# Load design, code, and preferences data externally
design <- readRDS(url("https://www.dropbox.com/scl/fi/3vwpnq0xm244cbyb8gswe/design.rds?rlkey=c2gylz9chnp71y2saszjovumm&st=09ol1mre&dl=1"))
code <- readRDS(url("https://www.dropbox.com/scl/fi/dvr61by009k956g4mczor/code.rds?rlkey=8qwnb9hrnl9w2qncc25l8rp14&st=yxtkzopt&dl=1"))
lev.df <- readRDS(url("https://www.dropbox.com/scl/fi/jdcfvsjvrln6fgr61jfg8/lev.df.rds?rlkey=j33xxt3cynlmf9oudk2ipnx4x&st=320us2tb&dl=1"))
preferences <- readRDS(url("https://www.dropbox.com/scl/fi/pegaymsyefhit65r27fgg/preferences.rds?rlkey=yw1ia0r5lkmt5r6ie21zvcael&st=o8d0vm5g&dl=1"))
# Measuring preferences at individual level (for the 1st respondent)
caModel(preferences[1,], code)
##
## Call:
## lm(formula = frml)
##
## Residuals:
## 1 2 3 4 5 6 7 8
## 2.38095 -2.28571 -0.09524 -1.23810 1.23810 0.09524 1.04762 -1.14286
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.2381 1.0817 3.918 0.0594 .
## factor(x$Age)1 0.2381 1.4423 0.165 0.8841
## factor(x$Age)2 -0.1429 1.6912 -0.084 0.9404
## factor(x$Malt)1 0.5714 1.0817 0.528 0.6501
## factor(x$Price)1 -1.4286 1.4423 -0.991 0.4263
## factor(x$Price)2 -0.8095 1.6912 -0.479 0.6794
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.862 on 2 degrees of freedom
## Multiple R-squared: 0.61, Adjusted R-squared: -0.3651
## F-statistic: 0.6256 on 5 and 2 DF, p-value: 0.7094
# Determining relative importance of features for the 1st respondent
importance = caImportance(y = preferences[1,], x = code)
print(importance)
## [1] 7.34 22.02 70.64
# Measuring preferences at aggregated level (in the cross-section of respondents)
partutilities = caPartUtilities(y = preferences, x = code, z = lev.df)
head(partutilities, 20)
## intercept 12年 15年 18年 單一麥芽 調和 1000元內
## [1,] 4.238 0.238 -0.143 -0.095 0.571 -0.571 -1.429
## [2,] 4.595 -1.905 1.143 0.762 -0.071 0.071 1.095
## [3,] 4.571 -1.429 0.524 0.905 0.571 -0.571 -2.762
## [4,] 4.119 0.286 -0.571 0.286 -0.214 0.214 2.952
## [5,] 4.857 -0.476 -0.381 0.857 -2.143 2.143 1.524
## [6,] 4.286 -0.381 0.429 -0.048 1.286 -1.286 1.619
## [7,] 4.476 -0.857 0.048 0.810 0.143 -0.143 2.476
## [8,] 3.929 0.095 -0.524 0.429 1.929 -1.929 1.429
## [9,] 4.286 -2.381 0.429 1.952 1.286 -1.286 -0.714
## [10,] 4.571 0.571 -0.143 -0.429 -0.429 0.429 2.571
## [11,] 4.190 0.524 -0.048 -0.476 0.857 -0.857 -1.810
## [12,] 4.643 -1.190 1.048 0.143 0.643 -0.643 -2.857
## [13,] 4.476 -1.524 -0.952 2.476 -0.857 0.857 1.810
## [14,] 5.024 -1.143 1.952 -0.810 0.357 -0.357 -0.143
## [15,] 4.405 -1.762 -0.810 2.571 0.071 -0.071 -1.429
## [16,] 4.786 0.286 -0.905 0.619 -1.214 1.214 -2.381
## [17,] 4.738 -1.429 0.190 1.238 -0.929 0.929 0.238
## [18,] 4.524 -2.476 -0.048 2.524 -0.143 0.143 -0.810
## [19,] 4.786 0.619 1.095 -1.714 0.786 -0.786 -0.381
## [20,] 4.333 -0.333 0.000 0.333 1.000 -1.000 -2.667
## 1000以上至2500元內 2500元以上
## [1,] -0.810 2.238
## [2,] -0.524 -0.571
## [3,] 1.190 1.571
## [4,] -2.905 -0.048
## [5,] -1.048 -0.476
## [6,] 0.429 -2.048
## [7,] 0.048 -2.524
## [8,] -0.190 -1.238
## [9,] 0.429 0.286
## [10,] -0.143 -2.429
## [11,] -0.714 2.524
## [12,] 1.381 1.476
## [13,] -0.952 -0.857
## [14,] 2.952 -2.810
## [15,] 0.190 1.238
## [16,] 0.762 1.619
## [17,] -0.143 -0.095
## [18,] -0.048 0.857
## [19,] 2.762 -2.381
## [20,] 0.667 2.000
# Measuring total utilities
totalutilities = caTotalUtilities(y = preferences, x = code)
head(totalutilities, 20)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 3.619 3.286 2.095 4.238 2.762 6.905 6.952 6.143
## [2,] 3.714 6.381 6.905 2.095 4.905 5.095 4.714 2.190
## [3,] 0.952 3.286 1.762 4.905 6.095 7.238 7.619 4.143
## [4,] 7.143 7.143 6.714 1.286 1.714 3.286 4.143 4.571
## [5,] 3.762 5.095 8.143 1.190 6.810 1.857 3.095 6.048
## [6,] 6.810 7.143 5.048 5.619 3.381 3.952 3.476 0.571
## [7,] 6.238 7.905 6.857 3.810 5.190 2.143 2.905 0.952
## [8,] 7.381 7.714 2.905 5.762 2.238 4.095 5.048 0.857
## [9,] 2.476 6.810 2.714 3.619 5.381 6.286 7.810 0.905
## [10,] 7.286 6.286 7.429 4.571 4.429 1.571 1.286 3.143
## [11,] 3.762 2.762 1.476 4.857 2.143 7.524 7.095 6.381
## [12,] 1.238 2.571 2.190 5.476 5.524 7.810 6.905 4.286
## [13,] 3.905 7.905 6.190 1.143 6.857 1.810 5.238 2.952
## [14,] 4.095 4.429 6.476 7.190 6.810 4.524 1.762 0.714
## [15,] 1.286 5.619 2.095 2.905 7.095 4.905 8.286 3.810
## [16,] 1.476 1.810 2.714 4.619 7.381 4.286 5.810 7.905
## [17,] 2.619 5.286 6.095 2.238 6.762 3.905 4.952 4.143
## [18,] 1.095 6.095 3.810 1.857 7.143 5.190 7.762 3.048
## [19,] 5.810 3.476 4.714 8.952 5.048 4.286 1.476 2.238
## [20,] 2.333 3.000 0.667 5.667 4.333 7.333 7.667 5.000
# The holy grail!
# Summary of the most important preference measurement results
Conjoint(y = preferences, x = code, z = lev.df)
##
## Call:
## lm(formula = frml)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4,2018 -1,8760 -0,0565 1,7982 4,9435
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4,48912 0,01833 244,887 < 2e-16 ***
## factor(x$Age)1 -0,48168 0,02444 -19,707 < 2e-16 ***
## factor(x$Age)2 0,25492 0,02866 8,894 < 2e-16 ***
## factor(x$Malt)1 0,40350 0,01833 22,012 < 2e-16 ***
## factor(x$Price)1 0,08241 0,02444 3,372 0,000748 ***
## factor(x$Price)2 0,46503 0,02866 16,226 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1
##
## Residual standard error: 2,214 on 16658 degrees of freedom
## Multiple R-squared: 0,06704, Adjusted R-squared: 0,06676
## F-statistic: 239,4 on 5 and 16658 DF, p-value: < 2,2e-16
## [1] "Part worths (utilities) of levels (model parameters for whole sample):"
## levnms utls
## 1 intercept 4,4891
## 2 12年 -0,4817
## 3 15年 0,2549
## 4 18年 0,2268
## 5 單一麥芽 0,4035
## 6 調和 -0,4035
## 7 1000元內 0,0824
## 8 1000以上至2500元內 0,465
## 9 2500元以上 -0,5474
## [1] "Average importance of factors (attributes):"
## [1] 31,18 20,55 48,27
## [1] Sum of average importance: 100
## [1] "Chart of average factors importance"
So which factor (attribute) is most important in driving whiskey consumers’ purchasin decisions? Price is clearly the most important factor!
Within each of the three factors, which level positively (negatively)
affects consumers’ purchasing decisions?
It appears that an average Taiwanese consumer would prefer a 15-year, single malt whiskey priced between NT$1000-2500. Taiwanese absolutely love single male, to say the least!
When all attributes combined, how much utility does each level (within an attribute) contribute to a consumer’s purchasing decision? To answer this question, we need to compute a metric called average Parts Worth utility. Average Parts Worth utility is a numerical score representing the average preference for each level of an attribute across all respondents. These scores are calculated by first finding the range of preferences for each attribute for each individual, and then averaging these values across all participants to get the mean part-worth utility. This helps to show the average impact each attribute level has on a customer’s decision.
### Calculate average parts worth
# profiles_apw
profiles_apw <- data.frame(Age = c(1, 3, 2, 1, 3, 2, 3, 1), Malt = c(1, 1, 2, 1, 2, 1, 1, 2),
Price = c(1, 1, 1, 2, 2, 3, 3, 3))
# Set levels
lev <- c("12年", "15年", "18年", "單一麥芽", "調和", "1000元內", "1000以上至2500元內", "2500元以上")
att = apply(profiles_apw,2,max)
attl = unlist(sapply(1:length(att), function(x) rep(x , att[x])))
attl[attl == 1] <- "gray72"
attl[attl==2] <- "gold2"
attl[attl==3] <- "red1"
# average parts worth
apw = caUtilities(y = preferences, x = profiles_apw, z = lev)
# plot apw
bar_apw <- barplot(apw[2:length(apw)],las = 2,col = attl, main="Average Part Worths (Utility)")
# Rotate x-axis labels by 45 degrees
text(x = bar_apw,
y = par("usr")[3] - 0.03, # Position below the axis
labels = lev,
srt = 45, # Rotate 45 degrees
adj = 1, # right-justify (adjust alignment)
xpd = TRUE, # Allow text outside plot area
cex = 0.7) # Label size
Using exactly the same procedure, we can infer what level of an attribute is prioritized by a particular respondent subgroup. At what follows, we visualize the difference in average parts worth by age group, gender, and between beginners and more experienced buyers. (Email me for replication code if you’ are’ interested)
Older consumers prefer single malt more than their younger counterparts, but somehow dislike whisket products priced over NT$2500.
When OurWhiskey Foundation first distributed an online survey asking female readers “Do you even like whisky?” It sparked an outcry against potential gender-based discrimination because many women are, in fact, frequent consumers of liquor products and do not view whiskey as a manly drink. But do women’s whiskey preferences differ from those of men? Let’s find out!
It seems that men appreciate more matured single malt than women but are less willing to spend big bucks on more expensive whiskies (go girls!)
Finally, we look at the difference in average parts worth between beginners and more experienced buyers. Clearly, beginners are budget-savvy.
Can we dig more deeply into this dataset? Yes, let’s estimate a tree-based model to uncover a typical respondent’s whiskey-purchasing decision-making process by using total average parts worth (apw) utilities aggregated from all levels of the three attributes as the outcome variable to be partitioned. An attribute level that appears early in the decision process contain larger variance and is the key factor in shaping the effects of other attribute levels along subsequent decision-making stages.
# Load the data
total_apw <- read.csv("C://Users/hktse/Dropbox/眾包平台/問卷資料備份/抽獎問卷/total_apw.csv")
#Rename columns
names(total_apw)[1:8] <- c("12-yr", "15-yr", "18-yr", "Single malt", "Blended", ">NT$1000", "NT$1000-2500", ">NT$2500")
# Load required packages
library(rsample) # Data splitting
library(dplyr) # Data wrangling
library(rpart) # Perform regression trees
library(rpart.plot) # Plot regression trees
library(rattle)
tree.fit <- rpart(
formula = total_utility ~ .,
data = total_apw,
method = "anova",
minsplit = 200, # Minimum number of observations in a single node
maxdepth = 7, # Maximum depth (node #) of the final tree, root node is counted as depth 0
xval = 10
)
rpart.plot(tree.fit)
Wow! It seems that whiskey companies and distributors should focus on marketing their 15-year single malt product lines.
Perhaps there are other crucial factors we have ignored that could plausibly invalidate the inference we drawn here, such as branding. (Speaking of which, how many Glen-ish whiskey brands do you know of? In Scottish, “glen” literally means a valley, especially a deep, steep-sided valley.)
Having said that, this analysis serves to demonstrate the potential to leverage off-the-shelf free survey tools to uncover useful insights from a conjoint analysis. Last but not least …
Acknowledgement: Special thanks to Willy Lin for helping to enrich the content of this research note.