# Remove scientific notation for cleaner output
options(scipen = 999)Survey Experiments: Design and Analysis
1 Introduction
In this lab session we will work on the analysis of experimental data. In experiments, the most important thing is the design, not the analysis — which is relatively straightforward for standard designs. Today we will cover the following topics:
- Statistical power (Section 2): How to calculate the minimum sample size needed to reliably detect a true effect, and avoid running underpowered experiments.
- Randomization (Section 3): How to implement and verify randomization procedures in R.
- Experimental analysis (Section 4): How to analyze data from real experiments with different designs and complications: a simple 2-group randomized experiment and a 4-arm experiment.
- Analysis of conjoint experiments (Section 5): How to analyze a conjoint experiment using R, including subgroup analysis.
2 Statistical Power
A well-powered experiment prevents two types of errors: a false positive (Type I error, α) — concluding there is an effect when there isn’t — and a false negative (Type II error, β) — missing a real effect. Statistical power is 1 − β, the probability of correctly detecting a true effect.
We have already discussed the logic of statistical power in class. To run power analyses in R, we use the pwr package.
# install.packages("pwr")
library(pwr)Power analyses require you to specify three of the following four quantities; R will solve for the fourth:
- Effect size: how large a difference do you expect?
- Sample size (n): how many observations per group?
- Significance level (α): the false positive rate, conventionally 0.05
- Power (1 − β): the probability of detecting the effect, conventionally 0.80
2.1 Sample Size for a Difference in Proportions
We use pwr.p.test() when we have two groups and a binary (yes/no) outcome. The function requires an effect size in Cohen’s h, which we calculate with ES.h() from the two proportions.
Scenario A: Suppose we expect 50% success in the control group and 75% in the treatment group — a fairly large effect.
# ES.h() converts two proportions into Cohen's h (the appropriate effect size metric
# for proportions). pwr.p.test() then finds the n needed per group.
sample1 <- pwr.p.test(
h = ES.h(p1 = 0.75, p2 = 0.50), # expected proportions in treatment vs. control
sig.level = 0.05, # alpha (Type I error rate)
power = 0.80, # desired power (1 - Type II error rate)
alternative = "two.sided" # we don't assume a direction in advance
)
sample1
proportion power calculation for binomial distribution (arcsine transformation)
h = 0.5235988
n = 28.62923
sig.level = 0.05
power = 0.8
alternative = two.sided
plot(sample1)Interpretation: The
nshown is the required sample size per group. Double it for the total study size. The power curve shows how power rises as sample size grows; the red dot marks your design.
Scenario B: Most interventions produce smaller effects. Let’s try 35% vs. 40% — a much more realistic expectation.
sample2 <- pwr.p.test(
h = ES.h(p1 = 0.40, p2 = 0.35),
sig.level = 0.05,
power = 0.80,
alternative = "two.sided"
)
plot(sample2)Notice how dramatically the required sample size increases for a smaller effect. This is a common surprise for new researchers — realistic effect sizes demand large samples.
2.2 Computing Power for a Fixed Sample Size
Sometimes budget or logistics fix your sample size, so instead of asking “how many do I need?”, you ask “what is my power given what I have?” Simply replace power = with n =:
# Here we fix n = 80 and ask: what is our power?
sample3 <- pwr.p.test(
h = ES.h(p1 = 0.75, p2 = 0.50),
sig.level = 0.05,
n = 80,
alternative = "greater" # one-sided: we expect treatment > control
)
sample3
proportion power calculation for binomial distribution (arcsine transformation)
h = 0.5235988
n = 80
sig.level = 0.05
power = 0.9988106
alternative = greater
plot(sample3)Interpretation: A power below 0.80 means a high probability of a false negative — we might miss a real effect. If power is very low (e.g., < 0.50), running the experiment may not be worthwhile.
2.3 Sample Size for a Difference in Means
When the outcome is continuous (e.g., a 0–10 attitude scale), we think in terms of differences in means. We need two inputs:
- delta (Δ): the expected difference in group means
- sd (σ): the expected standard deviation of the outcome
These two together determine the standardized effect size d = Δ/σ.
# delta and sd are on the same scale as your outcome variable
sample4 <- power.t.test(
delta = 0.75, # expected difference in means (e.g., 0.75 points on a scale)
sd = 2.25, # expected SD of the outcome
sig.level = 0.05,
power = 0.80,
n = NULL # solve for n
)
sample4
Two-sample t test power calculation
n = 142.2466
delta = 0.75
sd = 2.25
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
plot(sample4)Again, we can flip this around to assess power for a given sample:
sample5 <- power.t.test(
delta = 0.75,
sd = 2.25,
sig.level = 0.05,
n = 80 # solve for power
)
sample5
Two-sample t test power calculation
n = 80
delta = 0.75
sd = 2.25
sig.level = 0.05
power = 0.5538507
alternative = two.sided
NOTE: n is number in *each* group
If you find it hard to “guess” delta and sd separately, you can work directly with Cohen’s d. A common rule of thumb: d = 0.2 (small effect size), 0.5 (medium), 0.8 (large):
# d = delta / sd. Using d = 0.2 (a small but realistic effect)
sample6 <- power.t.test(
d = 0.2,
power = 0.80,
sig.level = 0.05
)
sample6
Two-sample t test power calculation
n = 393.4067
delta = 0.2
sd = 1
sig.level = 0.05
power = 0.8
alternative = two.sided
NOTE: n is number in *each* group
plot(sample6)Finally, if the two groups have unequal sizes (e.g., because one condition is rarer or more expensive), use pwr.t2n.test():
# n1 and n2 are the sizes of the two groups; d is Cohen's d
sample7 <- pwr.t2n.test(n1 = 28, n2 = 35, d = 0.5)
sample7
t test power calculation
n1 = 28
n2 = 35
d = 0.5
sig.level = 0.05
power = 0.4924588
alternative = two.sided
If you have no prior data to estimate effect size, use a small-to-medium effect (Cohen’s d ≈ 0.3, or a 5–10 percentage point difference in proportions) as a conservative planning assumption. This ensures you are powered for realistic but interesting effects rather than only for large, optimistic ones.
an alternative way of thinking is to power the experiment for the minimum substantively relevant effect you consider is worth reporting. Powering the experiments to detect super tiny effects may be a waste of resources.
3 Randomization
In survey experiments, randomization is usually handled by the survey software. In field or fieldwork experiments, however, we often need to randomize manually — for example, assigning villages or polling stations to treatment. This section shows how to do it in R and, crucially, how to verify that it worked.
3.1 Basic Randomization Procedure
The core idea: generate a vector of random numbers, rank them, and assign the top m units to treatment.
set.seed(1234567) # Set a seed: guarantees the same random draw every time you run this.
# Use any integer; record it for your pre-registration/replication file.
n <- 400 # Total number of units
m <- 200 # Number to assign to treatment (here, 50%)
data1 <- data.frame(ID = 1:n) # Create data frame with unit IDs
sel <- rnorm(n) # Draw n random numbers from a standard normal
sel <- order(sel, decreasing = TRUE) # Rank them (highest first)
data1$Tr <- 0 # Initialize all units as control (0)
data1$Tr[sel[1:m]] <- 1 # Assign the top m units to treatment (1)
head(data1) ID Tr
1 1 1
2 2 1
3 3 1
4 4 0
5 5 0
6 6 1
table(data1$Tr) # Check: should be 200 treated, 200 control
0 1
200 200
3.2 Randomization Check (Balance Test)
One of the most important steps after randomization is checking balance — that the treatment and control groups are similar on pre-treatment characteristics. With true randomization, any differences are due to chance, but we should verify this formally.
We create a larger dataset and add a non-random variable (gender assigned by ID order) to test against:
N2 <- 2000
m2 <- N2 / 2 # 50% treated
data2 <- data.frame(id = 1:N2)
# Assign the first 1000 observations as Male, the rest as Female.
# This is deliberately non-random to give the balance test something to detect IF
# our randomization fails.
data2$genere <- ifelse(data2$id <= N2 / 2, "Male", "Female")
table(data2$genere) # 1000 male, 1000 female
Female Male
1000 1000
Now randomize:
sel2 <- rnorm(N2)
sel2 <- order(sel2, decreasing = TRUE)
data2$Tr <- 0
data2$Tr[sel2[1:m2]] <- 1And check balance. A t-test should find no statistically significant difference in gender or ID across treatment arms — if randomization worked, treatment assignment is independent of these variables:
# Proportions table: should be roughly 50/50 male-female within each arm
check.gender <- table(data2$Tr, data2$genere)
prop.table(check.gender, margin = 1) # row proportions (by treatment arm)
Female Male
0 0.493 0.507
1 0.507 0.493
# Formal tests: p > 0.05 means no evidence of imbalance (which is what we want)
t.test(data2$Tr ~ data2$genere) # Is treatment related to gender? (Should be: NO)
Welch Two Sample t-test
data: data2$Tr by data2$genere
t = 0.62585, df = 1998, p-value = 0.5315
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
-0.02987034 0.05787034
sample estimates:
mean in group Female mean in group Male
0.507 0.493
t.test(data2$id ~ data2$Tr) # Is treatment related to ID order? (Should be: NO)
Welch Two Sample t-test
data: data2$id by data2$Tr
t = -0.13742, df = 1998, p-value = 0.8907
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-54.21182 47.11182
sample estimates:
mean in group 0 mean in group 1
998.725 1002.275
Interpretation: Non-significant p-values here are good news — they suggest that the random assignment is truly independent of pre-existing characteristics. If a test were significant, it would indicate a flaw in the randomization procedure.
4 Analysis of Experiments
Now we turn to real experimental data. We cover three examples that progressively introduce common complications:
- A simple two-arm experiment (election observers)
- A multi-arm experiment (housing and immigration)
- An experiment with non-compliance (GOTV)
library(readr)
library(tidyverse)
library(haven)
library(stargazer)
library(rempsyc)
library(sjmisc)
library(AER)
library(here)
library(tableone)
library(janitor)here() do?
here() constructs file paths relative to the project root (where your .Rproj file lives), not your current working directory. This makes scripts portable — they run correctly regardless of which subfolder you open them from.
4.1 Experiment 1: Electoral Fraud in Russia
Paper: Enikolopov, R., et al. (2013). “Field experiment estimate of electoral fraud in Russian parliamentary elections.” PNAS, 110(2), 448–452.
Design: During the December 2011 Russian parliamentary election, polling stations were randomly assigned to receive independent election observers (observers = 1) or not (observers = 0). If observers deter fraud, we expect higher turnout and lower incumbent vote share in observed stations.
Key identifying assumption: Because assignment was random, any differences in outcomes between observed and unobserved stations are caused by the observers (not by pre-existing differences between stations).
# --- File path ---
# Adjust these paths to match your project structure
exprussia <- here("data_exprussia.csv")
exprussia <- read_delim(exprussia,delim = ";",
escape_double = FALSE,
locale = locale(decimal_mark = ","))
names(exprussia)[1] "observers" "turnout" "ershare"
summary(exprussia) observers turnout ershare
Min. :0.0000 Min. :0.3348 Min. :0.1273
1st Qu.:0.0000 1st Qu.:0.5223 1st Qu.:0.2891
Median :0.0000 Median :0.5989 Median :0.4558
Mean :0.0493 Mean :0.6105 Mean :0.4438
3rd Qu.:0.0000 3rd Qu.:0.6852 3rd Qu.:0.5563
Max. :1.0000 Max. :1.0000 Max. :0.9400
table(exprussia$observers) # How many stations in each condition?
0 1
3008 156
4.1.1 Descriptive Comparison
The simplest analysis: compare group means.
Russia_summary <- exprussia %>%
group_by(observers) %>%
summarise(
mean_turnout = mean(turnout),
mean_ershare = mean(ershare)
)
knitr::kable(Russia_summary, format = "html", digits = 2, align = "lcc",
caption = "Mean turnout and United Russia vote share by observer presence")| observers | mean_turnout | mean_ershare |
|---|---|---|
| 0 | 0.61 | 0.45 |
| 1 | 0.55 | 0.34 |
4.1.2 Significance Testing
We already see important differences in turnout and vot for United Russia. To assess whether these differences are larger than we would expect from chance alone, we run t-tests:
# nice_t_test() from rempsyc runs t-tests and formats results in APA style
t.test.table <- nice_t_test(
data = exprussia,
response = c("turnout", "ershare"),
group = "observers",
warning = FALSE
) |>
nice_table()
t.test.tableDependent Variable | t | df | p | d | 95% CI |
|---|---|---|---|---|---|
turnout | 8.31 | 178.95 | < .001*** | 0.58 | [0.42, 0.74] |
ershare | 9.25 | 177.91 | < .001*** | 0.66 | [0.50, 0.82] |
4.1.3 Visualization: Bar Plot with Confidence Intervals
Bar plots summarizing group means are common in experimental papers. The error bars represent 95% confidence intervals.
# Compute descriptive stats including SE and 95% CI for each group
Russia_summary <- exprussia |>
group_by(observers) |>
summarize(
mean_turnout = mean(turnout),
mean_ershare = mean(ershare),
n = n(),
SE_turnout = sd(turnout) / sqrt(n()),
SE_ershare = sd(ershare) / sqrt(n()),
CI_turnout = 1.96 * SE_turnout, # half-width of 95% CI
CI_ershare = 1.96 * SE_ershare
)
Russia_summary$observers <- as.factor(Russia_summary$observers)
ggplot(Russia_summary, aes(observers, mean_turnout, fill = observers)) +
geom_col(width = 0.7) +
geom_errorbar(
aes(ymin = mean_turnout - CI_turnout, ymax = mean_turnout + CI_turnout),
width = 0.2
) +
labs(
y = "Turnout ± 95% CI",
x = "Observers present",
title = "Effect of election observers on voter turnout",
caption = "Enikolopov et al. (2013). Error bars = 95% CI."
) +
theme_classic() +
guides(fill = "none")4.1.4 Visualization: Violin + Box Plot (Showing Full Distributions)
Bar plots only show the mean and CI, which can hide important features of the data (e.g., skewness, bimodality). A violin plot shows the full distribution:
# Recode 0/1 to meaningful labels before plotting
exprussia <- exprussia |>
mutate(observers = factor(observers,
levels = c(0, 1),
labels = c("No observers", "Observers")))
ggplot(exprussia, aes(x = observers, y = turnout, fill = observers)) +
geom_violin(trim = FALSE, alpha = 0.6) +
geom_boxplot(width = 0.1, alpha = 0.5) + # Shows median and IQR inside violin
stat_summary( # Adds mean ± SD as a point + range
fun.data = mean_sdl, mult = 1,
geom = "pointrange", color = "black"
) +
labs(
y = "Turnout",
x = NULL,
title = "Distribution of turnout by observer assignment",
caption = "Point = mean; range = ±1 SD. Enikolopov et al. (2013)."
) +
theme_classic() +
guides(fill = "none")Why prefer the violin plot? It reveals whether distributions are symmetric, whether there are outliers, and whether the two groups differ in spread — none of which is visible from a bar chart.
4.2 Experiment 2: Public Housing and Immigration Attitudes
This experiment uses a factorial vignette design — more complex than a simple treatment/control setup. Respondents were shown a scenario about a municipal housing allocation and asked which family should receive the apartment. The first family in the waiting list, or another family with a delicate situation ( a sick daughter) that may deserve priority.
The immigrant status of the first and second family in the scenario was randomly varied, creating four conditions:
| Condition | 1st family | 2nd family |
|---|---|---|
| 1 (I–I) | Immigrant | Immigrant |
| 2 (I–NI) | Immigrant | Non-immigrant |
| 3 (NI–I) | Non-immigrant | Immigrant |
| 4 (NI–NI) | Non-immigrant | Non-immigrant |
By comparing these conditions, we can estimate whether respondents systematically favor one group over another and whether the order matters.
exp_pis <- here("exp_pis.dta")
exp_pis <- read_dta(exp_pis)
names(exp_pis)[1] "gender" "age" "education"
[4] "housing1" "housing1_dv" "leftright"
[7] "immigrants_economy" "immigrants_culture" "housing_dic"
4.2.1 Randomization Check (Balance)
Before analyzing outcomes, we verify that the four groups are similar on pre-treatment characteristics. If randomization worked, we should see no significant differences across groups.
Pis_summary <- exp_pis %>%
group_by(housing1) %>%
summarise(
mean_age = mean(age),
mean_education = mean(education),
mean_leftright = mean(leftright),
mean_imm_economy = mean(immigrants_economy),
mean_imm_culture = mean(immigrants_culture),
mean_gender = mean(gender),
n_obs = n()
)
knitr::kable(Pis_summary, format = "html", digits = 2, align = "lccccccc",
caption = "Pre-treatment characteristics by experimental condition")| housing1 | mean_age | mean_education | mean_leftright | mean_imm_economy | mean_imm_culture | mean_gender | n_obs |
|---|---|---|---|---|---|---|---|
| 1 | 50.03 | 3.86 | 5.26 | 4.55 | 4.16 | 1.48 | 148 |
| 2 | 47.64 | 4.00 | 5.40 | 4.43 | 4.15 | 1.51 | 145 |
| 3 | 47.41 | 3.98 | 5.37 | 4.75 | 4.17 | 1.50 | 167 |
| 4 | 49.95 | 3.64 | 5.51 | 4.67 | 4.26 | 1.46 | 152 |
We use the tableone package, which is purpose-built for balance tables, handles both continuous and binary variables appropriately, and includes significance tests across multiple groups in one clean output.
# Label the treatment variable for cleaner column headers
exp_pis <- exp_pis |>
mutate(condition = factor(housing1,
labels = c("I–I", "I–NI", "NI–I", "NI–NI")))
# Variables to check for balance
covariates <- c("age", "education", "leftright",
"immigrants_economy", "immigrants_culture", "gender")
# CreateTableOne runs the omnibus F-test (continuous) or chi-square (binary)
# across all four groups simultaneously
tab1 <- CreateTableOne(
vars = covariates,
strata = "condition", # grouping variable
data = exp_pis,
factorVars = "gender" # treat gender as categorical, not continuous
)
# Print with SMDs and the omnibus p-value
# smd = TRUE adds standardized mean differences between each group and the first
print(tab1,
smd = TRUE,
showAllLevels = FALSE,
printToggle = FALSE) |>
knitr::kable(caption = "Table 1. Covariate balance across experimental conditions",
notes = "p: omnibus F-test (continuous) or χ² (categorical).
SMD: standardized mean difference vs. I–I condition.")| I–I | I–NI | NI–I | NI–NI | p | test | SMD | |
|---|---|---|---|---|---|---|---|
| n | 148 | 145 | 167 | 152 | |||
| age (mean (SD)) | 50.03 (15.73) | 47.64 (16.41) | 47.41 (16.13) | 49.95 (15.65) | 0.301 | 0.106 | |
| education (mean (SD)) | 3.86 (1.52) | 4.00 (1.59) | 3.98 (1.66) | 3.64 (1.54) | 0.186 | 0.126 | |
| leftright (mean (SD)) | 5.26 (1.74) | 5.40 (1.80) | 5.37 (1.77) | 5.51 (1.72) | 0.654 | 0.077 | |
| immigrants_economy (mean (SD)) | 4.55 (2.87) | 4.43 (2.58) | 4.75 (2.73) | 4.67 (2.45) | 0.743 | 0.067 | |
| immigrants_culture (mean (SD)) | 4.16 (2.87) | 4.15 (2.84) | 4.17 (2.74) | 4.26 (2.67) | 0.987 | 0.020 | |
| gender = 2 (%) | 71 (48.0) | 74 (51.0) | 84 (50.3) | 70 (46.1) | 0.816 | 0.058 |
How to read the table: Each row is a pre-treatment variable. Columns show the mean (SD) for each condition, the omnibus p-value testing equality across all four groups simultaneously, and the SMD for each group relative to the first. All p-values should be > 0.05 and SMDs should be close to 0.
For a more formal pairwise test across all four groups, we can add Bonferroni-corrected pairwise t-tests. Because we run many comparisons simultaneously, we need to adjust the p-values to control the family-wise error rate:
# Cross-tabulate gender by treatment to check proportions
balance <- table(exp_pis$gender, exp_pis$housing1)
prop.table(balance, margin = 2) # Column proportions
1 2 3 4
1 0.5202703 0.4896552 0.4970060 0.5394737
2 0.4797297 0.5103448 0.5029940 0.4605263
# Pairwise t-tests with Bonferroni correction (adjusts for multiple comparisons).
# Goal: ALL p-values should be > 0.05.
pairwise.t.test(exp_pis$leftright, exp_pis$housing1, p.adjust.method = "bonferroni")
Pairwise comparisons using t tests with pooled SD
data: exp_pis$leftright and exp_pis$housing1
1 2 3
2 1 - -
3 1 1 -
4 1 1 1
P value adjustment method: bonferroni
pairwise.t.test(exp_pis$age, exp_pis$housing1, p.adjust.method = "bonferroni")
Pairwise comparisons using t tests with pooled SD
data: exp_pis$age and exp_pis$housing1
1 2 3
2 1.00 - -
3 0.89 1.00 -
4 1.00 1.00 0.95
P value adjustment method: bonferroni
pairwise.t.test(exp_pis$gender, exp_pis$housing1, p.adjust.method = "bonferroni")
Pairwise comparisons using t tests with pooled SD
data: exp_pis$gender and exp_pis$housing1
1 2 3
2 1 - -
3 1 1 -
4 1 1 1
P value adjustment method: bonferroni
pairwise.t.test(exp_pis$education, exp_pis$housing1, p.adjust.method = "bonferroni")
Pairwise comparisons using t tests with pooled SD
data: exp_pis$education and exp_pis$housing1
1 2 3
2 1.00 - -
3 1.00 1.00 -
4 1.00 0.32 0.37
P value adjustment method: bonferroni
Why Bonferroni correction? When we run many tests simultaneously, some will be significant by chance (the multiple comparisons problem). The Bonferroni correction controls the family-wise error rate by making each individual test more stringent.
4.2.2 Main Analysis: Pairwise Comparisons
We first produce a crosstabulation to observe the percentages across conditions:
library(gtsummary)
# Label the variables for clean output
exp_pis_labelled <- exp_pis |>
mutate(
housing1_dv = factor(housing1_dv,
levels = c(1, 2),
labels = c("1st family", "2nd family")),
condition = factor(housing1, levels = c(1, 2, 3, 4),
labels = c("I-I", "I-NI", "NI-I","NI-NI"))
)
# tbl_cross() produces a clean crosstab with row/column totals and
# an omnibus chi-square test across all cells
tbl_cross(
data = exp_pis_labelled,
row = housing1_dv, # outcome in rows
col = condition, # treatment in columns
percent = "column", # show column percentages (proportion within each arm)
statistic = "{n} ({p}%)", # format: count (percentage)
label = list(
housing1_dv ~ "Family chosen",
condition ~ "Experimental condition"
)
) |>
add_p(source_note = TRUE) |> # adds chi-square test p-value as a footnote
bold_labels()
Experimental condition
|
Total | ||||
|---|---|---|---|---|---|
| I-I | I-NI | NI-I | NI-NI | ||
| Family chosen | |||||
| 1st family | 40 (27%) | 48 (33%) | 94 (56%) | 48 (32%) | 230 (38%) |
| 2nd family | 108 (73%) | 97 (67%) | 73 (44%) | 104 (68%) | 382 (62%) |
| Total | 148 (100%) | 145 (100%) | 167 (100%) | 152 (100%) | 612 (100%) |
| Pearson’s Chi-squared test, p<0.001 | |||||
We can also use the pairwise t-test to check for significant differences.
# Are there significant differences across conditions in the outcome?
pairwise.t.test(exp_pis$housing1_dv, exp_pis$housing1, p.adjust.method = "bonferroni")
Pairwise comparisons using t tests with pooled SD
data: exp_pis$housing1_dv and exp_pis$housing1
1 2 3
2 1.0000 - -
3 0.00000034 0.0001 -
4 1.0000 1.0000 0.00002189
P value adjustment method: bonferroni
4.2.3 Regression Analysis
Regression is often preferable to pairwise t-tests because: (1) it gives you a single reference category to interpret against, (2) you can easily add covariates, and (3) outputs are easy to present in publication tables.
# The DV currently codes choice as 1 = first family, 2 = second family.
# Recode to 0/1 so coefficients represent the *proportion choosing the 2nd family*.
exp_pis$housing1_dv <- exp_pis$housing1_dv - 1
# Convert treatment to a factor with meaningful labels
exp_pis$housing1 <- as.factor(exp_pis$housing1)
exp_pis$housing2 <- fct_recode(exp_pis$housing1,
"I-I" = "1",
"I-NI" = "2",
"NI-I" = "3",
"NI-NI" = "4")Model 1: Treatment dummies with “I–I” as the reference category:
fit1 <- lm(housing1_dv ~ housing2, data = exp_pis)
stargazer(fit1, type = "html",
title = "Effect of family immigrant status on housing allocation",
dep.var.labels = "Chose 2nd family (0/1)",
notes = "Reference category: I-I (both families immigrant).")| Dependent variable: | |
| Chose 2nd family (0/1) | |
| housing2I-NI | -0.061 |
| (0.055) | |
| housing2NI-I | -0.293*** |
| (0.053) | |
| housing2NI-NI | -0.046 |
| (0.054) | |
| Constant | 0.730*** |
| (0.039) | |
| Observations | 612 |
| R2 | 0.058 |
| Adjusted R2 | 0.053 |
| Residual Std. Error | 0.472 (df = 608) |
| F Statistic | 12.485*** (df = 3; 608) |
| Note: | p<0.1; p<0.05; p<0.01 |
| Reference category: I-I (both families immigrant). | |
Reading the table: Each coefficient shows how much more (or less) likely respondents were to choose the 2nd family relative to the I–I condition. The intercept is the predicted probability of choosing the 2nd family when both are immigrant.
We observe how in the case of a 1st non-migrant family and a 2nd immigrant origin family, the likelihood of our respondents to advocate for a switch in the order of the list is significantly lower. Changing the reference category can make substantive comparisons clearer. Let’s use “NI–NI” as the baseline:
exp_pis$housing2 <- relevel(exp_pis$housing2, ref = "NI-NI")fit2 <- lm(housing1_dv ~ housing2, data = exp_pis)
stargazer(fit2, type = "html",
title = "Effect of family immigrant status on housing allocation",
dep.var.labels = "Chose 2nd family (0/1)",
notes = "Reference category: NI-NI (both families non-immigrant).")| Dependent variable: | |
| Chose 2nd family (0/1) | |
| housing2I-I | 0.046 |
| (0.054) | |
| housing2I-NI | -0.015 |
| (0.055) | |
| housing2NI-I | -0.247*** |
| (0.053) | |
| Constant | 0.684*** |
| (0.038) | |
| Observations | 612 |
| R2 | 0.058 |
| Adjusted R2 | 0.053 |
| Residual Std. Error | 0.472 (df = 608) |
| F Statistic | 12.485*** (df = 3; 608) |
| Note: | p<0.1; p<0.05; p<0.01 |
| Reference category: NI-NI (both families non-immigrant). | |
4.2.4 Adding Covariates
In a randomized experiment, adding pre-treatment covariates to the regression should not change the treatment effect estimates (because of randomization), but it can improve precision (reduce standard errors). If randomization worked, treatment coefficients should not change substantially. This is a useful check:
fit3 <- lm(
housing1_dv ~ housing2 + gender + age + education + leftright +
immigrants_economy + immigrants_culture,
data = exp_pis
)
stargazer(fit1, fit3, type = "html",
title = "Treatment effects with and without covariates",
dep.var.labels = "Chose 2nd family (0/1)",
column.labels = c("Without covariates", "With covariates"))| Dependent variable: | ||
| Chose 2nd family (0/1) | ||
| Without covariates | With covariates | |
| (1) | (2) | |
| housing2I-I | 0.047 | |
| (0.054) | ||
| housing2I-NI | -0.061 | -0.003 |
| (0.055) | (0.055) | |
| housing2NI-I | -0.293*** | -0.239*** |
| (0.053) | (0.053) | |
| housing2NI-NI | -0.046 | |
| (0.054) | ||
| gender | -0.025 | |
| (0.038) | ||
| age | 0.001 | |
| (0.001) | ||
| education | -0.028** | |
| (0.013) | ||
| leftright | -0.019* | |
| (0.011) | ||
| immigrants_economy | 0.007 | |
| (0.010) | ||
| immigrants_culture | -0.007 | |
| (0.009) | ||
| Constant | 0.730*** | 0.878*** |
| (0.039) | (0.132) | |
| Observations | 612 | 612 |
| R2 | 0.058 | 0.073 |
| Adjusted R2 | 0.053 | 0.059 |
| Residual Std. Error | 0.472 (df = 608) | 0.470 (df = 602) |
| F Statistic | 12.485*** (df = 3; 608) | 5.270*** (df = 9; 602) |
| Note: | p<0.1; p<0.05; p<0.01 | |
4.2.5 Heterogeneous Treatment Effects
Does the treatment affect everyone equally? Probably not — people with strong anti-immigration attitudes may react differently. We test this by interacting the treatment with prior immigration attitudes.
# Inspect the immigration culture variable (0 = enriches, 10 = undermines)
frq(exp_pis$immigrants_culture)Y, usando esta escala, ¿dirías que la vida cultural española se enriquece o se e (x) <numeric>
# total N=612 valid N=612 mean=4.19 sd=2.77
Value | Label | N | Raw % | Valid % | Cum. %
-----------------------------------------------------
0 | Enriquece 0 | 74 | 12.09 | 12.09 | 12.09
1 | 1 | 43 | 7.03 | 7.03 | 19.12
2 | 2 | 66 | 10.78 | 10.78 | 29.90
3 | 3 | 80 | 13.07 | 13.07 | 42.97
4 | 4 | 51 | 8.33 | 8.33 | 51.31
5 | 5 | 134 | 21.90 | 21.90 | 73.20
6 | 6 | 36 | 5.88 | 5.88 | 79.08
7 | 7 | 51 | 8.33 | 8.33 | 87.42
8 | 8 | 28 | 4.58 | 4.58 | 91.99
9 | 9 | 14 | 2.29 | 2.29 | 94.28
10 | Empobrece 10 | 35 | 5.72 | 5.72 | 100.00
<NA> | <NA> | 0 | 0.00 | <NA> | <NA>
# Simplify to 3 ordered categories for interpretability
exp_pis <- exp_pis |>
mutate(immigrants_cult3 = case_when(
immigrants_culture >= 0 & immigrants_culture <= 4 ~ "Positive",
immigrants_culture == 5 ~ "Neutral",
immigrants_culture >= 6 & immigrants_culture <= 10 ~ "Negative"
))
# Set "Negative" as the reference (most restrictive attitudes)
exp_pis$immigrants_cult3 <- factor(exp_pis$immigrants_cult3,
levels = c("Negative", "Neutral", "Positive"))# The * operator includes main effects AND their interaction
fit4 <- lm(housing1_dv ~ housing2 * immigrants_cult3, data = exp_pis)
stargazer(fit4, type = "html",
title = "Heterogeneous treatment effects by immigration attitudes",
dep.var.labels = "Chose 2nd family (0/1)",
notes = "Interaction terms show how treatment effects differ by attitude group.")| Dependent variable: | |
| Chose 2nd family (0/1) | |
| housing2I-I | 0.032 |
| (0.109) | |
| housing2I-NI | 0.063 |
| (0.105) | |
| housing2NI-I | -0.405*** |
| (0.101) | |
| immigrants_cult3Neutral | -0.051 |
| (0.109) | |
| immigrants_cult3Positive | -0.043 |
| (0.092) | |
| housing2I-I:immigrants_cult3Neutral | -0.008 |
| (0.152) | |
| housing2I-NI:immigrants_cult3Neutral | 0.040 |
| (0.160) | |
| housing2NI-I:immigrants_cult3Neutral | 0.239 |
| (0.154) | |
| housing2I-I:immigrants_cult3Positive | 0.035 |
| (0.134) | |
| housing2I-NI:immigrants_cult3Positive | -0.161 |
| (0.129) | |
| housing2NI-I:immigrants_cult3Positive | 0.213* |
| (0.125) | |
| Constant | 0.718*** |
| (0.075) | |
| Observations | 612 |
| R2 | 0.076 |
| Adjusted R2 | 0.059 |
| Residual Std. Error | 0.470 (df = 600) |
| F Statistic | 4.512*** (df = 11; 600) |
| Note: | p<0.1; p<0.05; p<0.01 |
| Interaction terms show how treatment effects differ by attitude group. | |
Interaction coefficients in a table are hard to read. It is much clearer to plot predicted values for each treatment × attitude combination:
# Create a grid of all treatment × attitude combinations
new_data <- expand.grid(
housing2 = levels(exp_pis$housing2),
immigrants_cult3 = levels(exp_pis$immigrants_cult3)
)
# Get predicted values and 95% CI from the model
predictions <- predict(fit4, newdata = new_data, interval = "confidence")
predictions_df <- as.data.frame(predictions)
colnames(predictions_df) <- c("fit", "lwr", "upr")
new_data <- cbind(new_data, predictions_df)
knitr::kable(new_data, digits = 2, format = "html", align = "lcccc",
caption = "Predicted probability of choosing 2nd family by treatment and attitudes")| housing2 | immigrants_cult3 | fit | lwr | upr |
|---|---|---|---|---|
| NI-NI | Negative | 0.72 | 0.57 | 0.87 |
| I-I | Negative | 0.75 | 0.60 | 0.90 |
| I-NI | Negative | 0.78 | 0.64 | 0.92 |
| NI-I | Negative | 0.31 | 0.18 | 0.45 |
| NI-NI | Neutral | 0.67 | 0.51 | 0.82 |
| I-I | Neutral | 0.69 | 0.55 | 0.83 |
| I-NI | Neutral | 0.77 | 0.59 | 0.95 |
| NI-I | Neutral | 0.50 | 0.33 | 0.67 |
| NI-NI | Positive | 0.68 | 0.57 | 0.78 |
| I-I | Positive | 0.74 | 0.63 | 0.85 |
| I-NI | Positive | 0.58 | 0.47 | 0.68 |
| NI-I | Positive | 0.48 | 0.39 | 0.58 |
ggplot(new_data, aes(x = housing2, y = fit, fill = immigrants_cult3)) +
geom_bar(stat = "identity", position = "dodge") +
geom_errorbar(
aes(ymin = lwr, ymax = upr),
position = position_dodge(width = 0.9),
width = 0.25
) +
labs(
title = "Predicted probability of choosing 2nd family",
subtitle = "By experimental condition and attitudes towards immigration",
x = "Experimental condition",
y = "Predicted probability",
fill = "Immigration attitudes"
) +
theme_minimal() +
theme(legend.position = "bottom")Interpretation guide: If the treatment effects (differences across bars within each attitude group) vary in size or direction across attitude groups, that is evidence of heterogeneous treatment effects — the experiment affected different people differently.
4.3 Experiment 3: ITT and CACE — Get Out the Vote
The third experiment illustrates a common real-world complication: non-compliance. In the Gerber & Green (2005) GOTV study, voters were randomly assigned to receive a personal visit encouraging them to vote. However, some assigned to the treatment group were not at home and thus never received the visit — these are “never-takers.”
This creates a gap between:
- Intent-to-Treat (ITT): the effect of being assigned to treatment (regardless of whether treatment was received)
- Complier Average Causal Effect (CACE): the effect of actually receiving treatment, estimated only among those who complied
The CACE is what we really care about, but we can only estimate it indirectly, using the assignment as an instrumental variable for actual treatment receipt.
data1 <- read_csv("Gerber_Green_APSRsubset_2005.csv")# Keep only single-person households in the door-to-door treatment arm
# (filtering out mail and phone treatments for clean comparison)
data2 <- data1 %>%
filter(onetreat == 1, mailings == 0, phongotv == 0, persons == 1)4.3.1 Intent-to-Treat (ITT) Estimate
The ITT is the simplest estimator: regress the outcome on the assignment indicator. This is always valid because assignment is random, even if compliance is imperfect.
# persngrp = 1 if assigned to personal visit condition
# v98_1 = voted in 1998 election (0/1)
itt_fit <- lm(v98_1 ~ persngrp, data = data2)
stargazer(itt_fit, type = "html",
title = "Intent-to-Treat estimate (effect of assignment)",
dep.var.labels = "Voted (0/1)",
covariate.labels = c("Assigned to personal visit"),
notes = "ITT: effect of being assigned to treatment, regardless of compliance.")| Dependent variable: | |
| Voted (0/1) | |
| Assigned to personal visit | -0.121 |
| (0.657) | |
| Constant | 5.742*** |
| (0.296) | |
| Observations | 7,090 |
| R2 | 0.00000 |
| Adjusted R2 | -0.0001 |
| Residual Std. Error | 22.269 (df = 7088) |
| F Statistic | 0.034 (df = 1; 7088) |
| Note: | p<0.1; p<0.05; p<0.01 |
| ITT: effect of being assigned to treatment, regardless of compliance. | |
Interpretation: The ITT estimates the effect of offering the treatment to the full assigned group, diluted by non-compliers who did not actually receive it.
4.3.2 First Stage: Effect of Assignment on Compliance (ITT_D)
We next estimate how many of those assigned to treatment actually received it (were visited). This is called the “first stage” in IV terminology, or ITT_D:
# cntany = 1 if the voter was actually contacted
itt_d_fit <- lm(cntany ~ persngrp, data = data2)
stargazer(itt_d_fit, type = "html",
title = "First stage: Effect of assignment on treatment receipt (ITT_D)",
dep.var.labels = "Actually contacted (0/1)",
covariate.labels = c("Assigned to personal visit"),
notes = "ITT_D: proportion of assigned units who complied.")| Dependent variable: | |
| Actually contacted (0/1) | |
| Assigned to personal visit | 0.273*** |
| (0.006) | |
| Constant | 0.000 |
| (0.003) | |
| Observations | 7,090 |
| R2 | 0.230 |
| Adjusted R2 | 0.230 |
| Residual Std. Error | 0.201 (df = 7088) |
| F Statistic | 2,122.996*** (df = 1; 7088) |
| Note: | p<0.1; p<0.05; p<0.01 |
| ITTproportion of assigned units who complied. | |
Key quantity: The coefficient on
persngrpis the compliance rate — the share of assigned units who actually received the treatment.
4.3.3 Complier Average Causal Effect (CACE)
The CACE answers the question: “For those who would comply with assignment, what is the effect of actually receiving the treatment?”
By the exclusion restriction (assignment only affects voting through actual contact), we can recover CACE as:
\[\text{CACE} = \frac{\text{ITT}}{\text{ITT}_D}\]
# Manual calculation: ratio of the two regression coefficients
cace_manual <- coef(itt_fit)[2] / coef(itt_d_fit)[2]
cat("CACE (manual):", round(cace_manual, 4))CACE (manual): -0.4431
This is algebraically equivalent to Two-Stage Least Squares (2SLS), where we use assignment (persngrp) as an instrument for actual contact (cntany):
# ivreg() syntax: outcome ~ endogenous_variable | instrument
# The | separates the "structural equation" from the "instrument"
iv_model <- ivreg(v98_1 ~ cntany | persngrp, data = data2)
summary(iv_model)
Call:
ivreg(formula = v98_1 ~ cntany | persngrp, data = data2)
Residuals:
Min 1Q Median 3Q Max
-5.742 -5.742 -5.742 -4.742 93.701
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.7419 0.2964 19.375 <0.0000000000000002 ***
cntany -0.4431 2.4015 -0.185 0.854
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 22.27 on 7088 degrees of freedom
Multiple R-Squared: 0.0002198, Adjusted R-squared: 7.873e-05
Wald test: 0.03405 on 1 and 7088 DF, p-value: 0.8536
Interpretation: The CACE estimate is larger than the ITT estimate. This makes intuitive sense: the ITT averages across compliers and never-takers (who were unaffected), so it underestimates the effect for those who actually received the visit.
| Estimator | What it estimates | When to use |
|---|---|---|
| ITT | Effect of assignment | Always valid; conservative |
| ITT_D | Compliance rate | Needed to compute CACE |
| CACE (IV) | Effect of receipt among compliers | When you care about the treatment, not just the offer |
The CACE is only identified if: (1) assignment affects the outcome only through treatment receipt (exclusion restriction), and (2) assignment increases (not decreases) treatment receipt for all units (monotonicity).
5 Analysis of Conjoint Experiments
This part walks through a conjoint analysis of a public policy experiment conducted during the COVID-19 pandemic. Respondents were shown pairs of hypothetical exit-from-lockdown policies and asked which they preferred. Each policy profile varied randomly across five attributes.
Our goal is to estimate and visualize two complementary quantities:
- AMCEs (Average Marginal Component Effects): How much does each attribute level change the probability of a profile being chosen, relative to a baseline?
- Marginal Means (MMs): What is the average predicted probability that a profile is chosen when a given level is present?
We will also conduct a subgroup analysis by partisanship to see whether government supporters and non-supporters weight these attributes differently.
AMCEs answer “compared to the baseline level, how much does this level matter?” They are intuitive for testing hypotheses but depend on what the baseline is. Marginal Means don’t require a baseline and are easier to interpret substantively: a MM above 0.5 means the level makes a profile more likely to be chosen than average. For a good discussion, see Leeper, Hobolt & Tilley (2020).
5.1 Setup
We use four packages:
tidyverse— data manipulation and plottinghaven— reading SPSS.savfilescregg— conjoint analysis (AMCEs and Marginal Means)here— robust relative file paths
#Remove scientific notation
options(scipen=999)
# Install any missing packages automatically
required_packages <- c("tidyverse", "haven", "here")
to_install <- required_packages[!sapply(required_packages, requireNamespace, quietly = TRUE)]
if (length(to_install) > 0) install.packages(to_install)
# Cregg is not on CRAN, we install it from github page
if (!require("remotes")) {
install.packages("remotes")
}
remotes::install_github("leeper/cregg")
library(tidyverse)
library(haven)
library(cregg)
library(here)5.2 Data Loading and Merging
The experiment produces two separate files:
- Conjoint results (
resultados Cj.sav): one row per profile shown (each respondent saw multiple pairs) - Survey data (
UBCNES_196285_20200625.sav): one row per respondent, containing background variables like partisanship
We merge them on the respondent ID (numericalId) to attach partisanship to the choice data.
# --- File paths ---
# Adjust these paths to match your project structure
cj_path <- here("UBCNES_196285_20200625_resultados Cj.sav")
survey_path <- here("UBCNES_196285_20200625_short.sav")
conjoint_raw <- haven::read_spss(cj_path)
survey_raw <- haven::read_spss(survey_path)# From the survey, extract partisanship (P5) and recode into a binary variable.
# P5 == 1 or P5 == 4 are coded as government supporters in the original Stata script.
survey_partisanship <- survey_raw |>
select(numericalId, P5) |>
mutate(
govt_supporter = factor(
if_else(P5 %in% c(1, 4), "Govt Supporter", "Non-supporter"),
levels = c("Govt Supporter", "Non-supporter")
)
)5.3 Data Preparation
This is the most important step. We need to:
- Merge the conjoint data with the partisanship variable
- Derive the choice variable: which profile did the respondent pick?
- Label the attribute levels so they are human-readable in plots
5.3.1 Merge and Create the Choice Variable
In the raw data, selection contains the position (1 or 2) of the profile the respondent chose. Within each task (set), the chosen profile is the one where pos == selection.
merged_data <- left_join(conjoint_raw, survey_partisanship, by = "numericalId")
# `choice` is TRUE for the profile the respondent selected, FALSE for the other.
# We check this *within* each (respondent, task) pair.
merged_data <- merged_data |>
group_by(numericalId, set) |>
mutate(choice = (max(selection) == position)) |>
ungroup()max(selection) == position
In a forced-choice conjoint, each task shows exactly two profiles. selection records which position (1 or 2) was chosen. position records the position of this row’s profile (1 or 2). So for each group, max(selection) pulls out the chosen value, then compares it to each row’s position. The result is TRUE for the selected option and FALSE for all others.
5.3.2 Label Attributes and Rename ID
labeled_data <- merged_data |>
mutate(
`End of Confinement` = factor(
atr1, levels = 1:3,
labels = c("July", "September", "November")
),
`Economic Impact` = factor(
atr2, levels = 1:3,
labels = c("Same unemployment as now", "1M more unemployed", "2M more unemployed")
),
`Risk of Second Wave` = factor(
atr3, levels = 1:4,
labels = c("5% prob 2nd wave", "20% prob 2nd wave", "50% prob 2nd wave", "80% prob 2nd wave")
),
`Privacy` = factor(
atr4, levels = 1:2,
labels = c("No cellphone control", "Cellphone control")
),
`Executive Power` = factor(
atr5, levels = 1:2,
labels = c("No special powers", "Special powers")
)
) |>
rename(id = numericalId) # we rename for clarity5.4 Data Structure
Before modelling, let’s understand what the data looks like. Each row is one profile shown to one respondent in one task.
cat("Number of unique respondents:", n_distinct(labeled_data$id), "\n")Number of unique respondents: 1236
cat("Total choice tasks shown:", nrow(distinct(labeled_data, id, set)), "\n")Total choice tasks shown: 12360
cat("Rows per task (profiles per task):", nrow(labeled_data) / nrow(distinct(labeled_data, id, set)), "\n")Rows per task (profiles per task): 2
cat("\nSample: first task for the first respondent\n")
Sample: first task for the first respondent
labeled_data |>
filter(id == min(id), set == 1) |>
select(id, set, position, choice, `End of Confinement`, `Economic Impact`,
`Risk of Second Wave`, `Privacy`, `Executive Power`) |>
print()# A tibble: 2 × 9
id set position choice `End of Confinement` `Economic Impact`
<dbl> <dbl> <dbl> <lgl> <fct> <fct>
1 378656406 1 1 TRUE July 1M more unemployed
2 378656406 1 2 FALSE November 2M more unemployed
# ℹ 3 more variables: `Risk of Second Wave` <fct>, Privacy <fct>,
# `Executive Power` <fct>
Each respondent sees 10 tasks. Each task has exactly 2 rows — one per profile. In each task, exactly one row has choice = TRUE. The attributes vary randomly across profiles and tasks. This randomization is what allows us to estimate causal effects of each attribute.
5.5 Conjoint Analysis: Full Sample
We define the formula once and reuse it throughout. This ensures consistency and makes it easy to add or remove attributes.
# Define the conjoint formula — reused for all models below
cj_formula <- choice ~ `End of Confinement` + `Economic Impact` +
`Risk of Second Wave` + `Privacy` + `Executive Power`# Average Marginal Component Effects (AMCEs)
amce_results <- cj(labeled_data, cj_formula, id = ~id, estimate = "amce")
# Marginal Means
mm_results <- cj(labeled_data, cj_formula, id = ~id, estimate = "mm")cregg estimating?
cj() fits a linear probability model (OLS) for AMCEs, or simply computes weighted means for MMs, with standard errors clustered by respondent. The id = ~id argument tells it which column identifies respondents for clustering.
The reference (baseline) level for AMCEs is automatically set to the first factor level in each attribute — in our case, “July”, “Same unemployment as now”, “5% prob 2nd wave”, “No cellphone control”, and “No special powers”. These appear as zero-effect points in the AMCE plot.
5.6 Visualization: Full Sample
We define a shared theme so all plots look consistent.
plot_theme <- theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", hjust = 0.5),
plot.subtitle = element_text(hjust = 0.5, size = 12),
legend.position = "none", # suppress redundant legend (levels already on y-axis)
strip.text.y = element_text(angle = 0, face = "bold"),
panel.border = element_rect(color = "grey90", fill = NA),
strip.background = element_rect(color = "grey90", fill = "grey95")
)5.6.1 AMCE Plot
The AMCE shows how much each level changes the probability of a profile being chosen, relative to the baseline level of that attribute.
ggplot(amce_results, aes(x = estimate, y = level)) +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey40") +
geom_pointrange(aes(xmin = lower, xmax = upper), size = 0.5) +
facet_grid(feature ~ ., scales = "free_y", space = "free_y") +
labs(
title = "Which Policy Features Drive Public Choice?",
subtitle = "AMCEs with 95% Confidence Intervals (reference levels shown at 0)",
x = "Change in Probability of Choosing Profile",
y = NULL
) +
plot_theme- Points at exactly 0 are baseline levels (they don’t show up as lines because the estimate is zero by construction).
- A point to the right of 0 means this level makes the profile more likely to be chosen than the baseline.
- A point to the left of 0 means less likely.
- Bars show 95% confidence intervals — if a bar crosses zero, the effect is not statistically distinguishable from zero at the 5% level.
5.6.2 Marginal Means Plot
Marginal Means show the predicted probability that a profile is chosen when a given level is present, averaged across all combinations of other attributes. The reference point is 0.5 (random chance in a 2-profile task).
ggplot(mm_results, aes(x = estimate, y = level)) +
geom_vline(xintercept = 0.5, linetype = "dashed", color = "grey40") +
geom_pointrange(aes(xmin = lower, xmax = upper), size = 0.5) +
facet_grid(feature ~ ., scales = "free_y", space = "free_y") +
scale_x_continuous(limits = c(0, 0.7)) +
labs(
title = "Predicted Public Support for Policy Features",
subtitle = "Marginal Means with 95% CIs (dashed line = random selection at 0.5)",
x = "Predicted Probability of Choosing Profile",
y = NULL
) +
plot_themeBoth plots convey similar information, but:
| AMCE | Marginal Mean | |
|---|---|---|
| Baseline needed? | Yes | No |
| Intuitive scale | Change in probability | Absolute probability |
| Best for | Hypothesis testing | Substantive interpretation |
Notice that the rank order of levels within an attribute is the same in both plots. The difference is just the reference point (0 vs 0.5).
5.7 Subgroup Analysis: Partisanship
Does partisanship shape how citizens evaluate these trade-offs? We now estimate Marginal Means separately for government supporters and non-supporters using the by argument in cj().
mm_subgroup <- cj(
labeled_data,
cj_formula,
id = ~id,
estimate = "mm",
by = ~govt_supporter
)ggplot(mm_subgroup, aes(x = estimate, y = level, color = BY, shape = BY)) +
geom_vline(xintercept = 0.5, linetype = "dashed", color = "grey40") +
geom_pointrange(
aes(xmin = lower, xmax = upper),
size = 0.5,
position = position_dodge(width = 0.6)
) +
scale_color_manual(
values = c("Govt Supporter" = "#2166ac", "Non-supporter" = "#d6604d")
) +
scale_shape_manual(
values = c("Govt Supporter" = 16, "Non-supporter" = 17)
) +
scale_x_continuous(limits = c(0.2, 0.8)) +
facet_grid(feature ~ ., scales = "free_y", space = "free_y") +
labs(
title = "Do Government Supporters and Non-supporters Have Different Priorities?",
subtitle = "Marginal Means estimated separately for each partisan group",
x = "Predicted Probability of Choosing Profile",
y = NULL,
color = "Partisanship",
shape = "Partisanship"
) +
plot_theme +
theme(legend.position = "bottom") # restore legend for subgroup plot5.7.1 Interpreting the Subgroup Results
The plot above reveals where partisan groups converge and where they diverge:
Economic Impact & Second Wave Risk: Both groups respond similarly to economic and health risks — higher unemployment and higher probability of a second wave both reduce support, roughly equally across groups. This suggests these considerations are less politically filtered.
Privacy (Cellphone Control): Non-supporters show markedly lower support for policies involving cellphone tracking compared to government supporters. The difference in predicted probabilities is substantively meaningful and statistically clear (non-overlapping CIs).
Executive Power (Special Powers): The starkest divergence. Non-supporters strongly penalize the granting of special powers to the government, while supporters are considerably more permissive. This is consistent with well-established findings that in-group members are more tolerant of power concentration.
The subgroup MMs tell us that the two groups differ in their predicted probabilities, but to formally test whether these differences are statistically significant, you should compute interaction AMCEs using cj() with estimate = "amce" and by = ~govt_supporter, then use cj_anova() to test for differences. Visualizing the subgroup plots is a useful first step, but it is not a formal test.
5.7.2 Formal test of subgroup comparison
We now test formally the interaction using the AMCES
# Step 1: Estimate interaction AMCEs separately by partisanship
amce_subgroup <- cj(
labeled_data,
cj_formula,
id = ~id,
estimate = "amce",
by = ~govt_supporter
)
# Step 2: Formal test of whether AMCEs differ across partisan groups
cj_anova(labeled_data, cj_formula, id = ~id, by = ~govt_supporter)Analysis of Deviance Table
Model 1: choice ~ `End of Confinement` + `Economic Impact` + `Risk of Second Wave` +
Privacy + `Executive Power`
Model 2: choice ~ `End of Confinement` + `Economic Impact` + `Risk of Second Wave` +
Privacy + `Executive Power` + govt_supporter + `End of Confinement`:govt_supporter +
`Economic Impact`:govt_supporter + `Risk of Second Wave`:govt_supporter +
Privacy:govt_supporter + `Executive Power`:govt_supporter
Resid. Df Resid. Dev Df Deviance F Pr(>F)
1 24710 5291.4
2 24700 5251.9 10 39.509 18.582 < 0.00000000000000022 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
5.8 Conclusion
This analysis demonstrates a complete conjoint workflow in R:
- Data preparation: merging conjoint and survey data, deriving the choice variable, and labeling attributes
- Estimation: using
cregg::cj()to compute AMCEs and Marginal Means - Visualization: readable
ggplot2plots with appropriate reference lines - Subgroup analysis: comparing partisan groups to uncover heterogeneity in preferences
The central substantive finding is that while economic and health considerations are shared across partisan lines, civil liberties trade-offs (cellphone surveillance, executive power) are strongly moderated by partisanship. Government supporters are considerably more willing to accept restrictions on civil liberties in exchange for security during a health crisis.