Exercise Nudge RCT — Tutorial Analysis

Author

Giuseppe A. Veltri for BIH5106

Published

March 9, 2026

Introduction

This tutorial walks through the analysis of a Randomized Controlled Trial (RCT) testing three behavioural nudge interventions designed to increase physical exercise. Participants were randomly assigned to one of four groups:

Group	Description
Control	No intervention
Social Comparison	Participants received information about how their exercise behaviour compared to peers
Commitment Device	Participants made a commitment (e.g., depositing money) they would forfeit if they did not meet exercise goals
Default Scheduling	Exercise sessions were pre-scheduled in participants’ calendars, requiring them to opt out rather than opt in

The primary outcome is the change in exercise days per week from baseline to 8 weeks (change_days_8wk). A secondary binary outcome indicates whether participants met a success threshold (success_8wk).

We will cover:

Descriptive statistics and balance checks
Outcome distributions
Chi-squared tests for success rates
ANOVA and linear regression for main effects
Moderation (interaction) analysis
Mediation analysis
Logistic regression for predicting success

Setup

Load packages

We use library() to load R packages — collections of functions that extend R’s capabilities. Each package serves a specific purpose:

Package	Purpose
`haven`	Read data files from other statistical software (Stata, SPSS, SAS)
`dplyr`	Data manipulation — filtering, grouping, summarising, creating new variables
`ggplot2`	Creating plots and visualisations using a grammar of graphics
`broom`	Convert model output into tidy data frames
`car`	Companion to Applied Regression — additional regression diagnostics
`patchwork`	Combine multiple ggplot2 plots into a single figure

library(haven)
library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(ggplot2)
library(broom)
library(car)

Loading required package: carData


Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

library(patchwork)

Load and prepare data

The data are stored as a Stata .dta file. After loading, we convert the numeric group and gender codes into labelled factors so that tables and plots display meaningful names.

Commands used:

read_dta() (from haven): Reads a Stata .dta file into R as a data frame. The result is stored in an object called data using the assignment operator <-.
|> (pipe operator): Takes the result on the left and passes it as input to the function on the right. This lets us chain operations in a readable sequence instead of nesting functions.
mutate() (from dplyr): Creates or modifies columns in a data frame. Here we overwrite two existing columns.
factor(): Converts a numeric variable into a categorical (factor) variable. The levels argument specifies which numeric codes exist; labels assigns meaningful names to each level.

data <- read_dta("exercise_nudge_rct.dta")

data <- data |>
  mutate(
    group_assignment = factor(
      group_assignment,
      levels = c(0, 1, 2, 3),
      labels = c("Control", "Social Comparison",
                  "Commitment Device", "Default Scheduling")
    ),
    gender = factor(gender, levels = c(0, 1), labels = c("Male", "Female"))
  )

A quick look at the first few rows using head(), which displays the first 6 rows of a data frame by default:

head(data)

# A tibble: 6 × 23
  participant_id   age gender education income   bmi baseline_days_exercise
           <dbl> <dbl> <fct>      <dbl>  <dbl> <dbl>                  <dbl>
1           1001    48 Female         2      1  25.6                      2
2           1002    40 Female         3      3  27.6                      2
3           1003    50 Male           3      3  29.2                      2
4           1004    60 Female         4      4  29.2                      3
5           1005    39 Male           4      3  23.2                      2
6           1006    39 Male           3      3  21.4                      1
# ℹ 16 more variables: baseline_exercise_minutes <dbl>, self_efficacy <dbl>,
#   loss_aversion <dbl>, social_norm_sensitivity <dbl>, health_literacy <dbl>,
#   group_assignment <fct>, adherence <dbl>, days_exercise_8wk <dbl>,
#   exercise_duration_8wk <dbl>, change_days_8wk <dbl>,
#   change_minutes_8wk <dbl>, success_8wk <dbl>, gender_label <chr>,
#   education_label <chr>, income_label <chr>, group_label <chr>

Descriptive statistics and balance checks

In an RCT, randomisation should make the groups comparable on baseline characteristics. We verify this by computing the mean and standard deviation of key variables within each group. Large differences would suggest a problem with the randomisation.

Commands used:

group_by() (from dplyr): Splits the data frame into groups defined by one or more variables. All subsequent operations (like summarise()) are then performed separately within each group.
summarise() (from dplyr): Collapses each group into a single summary row. Inside it, we call summary functions on each variable.
mean() and sd(): Base R functions that compute the arithmetic mean and standard deviation. The argument na.rm = TRUE tells R to ignore missing values instead of returning NA.

balance_stats <- data |>
  group_by(group_assignment) |>
  summarise(
    age_mean = mean(age, na.rm = TRUE),
    age_sd   = sd(age, na.rm = TRUE),
    bmi_mean = mean(bmi, na.rm = TRUE),
    bmi_sd   = sd(bmi, na.rm = TRUE),
    self_efficacy_mean = mean(self_efficacy, na.rm = TRUE),
    self_efficacy_sd   = sd(self_efficacy, na.rm = TRUE),
    loss_aversion_mean = mean(loss_aversion, na.rm = TRUE),
    loss_aversion_sd   = sd(loss_aversion, na.rm = TRUE),
    social_norm_sensitivity_mean = mean(social_norm_sensitivity, na.rm = TRUE),
    social_norm_sensitivity_sd   = sd(social_norm_sensitivity, na.rm = TRUE),
    health_literacy_mean = mean(health_literacy, na.rm = TRUE),
    health_literacy_sd   = sd(health_literacy, na.rm = TRUE),
    baseline_days_exercise_mean = mean(baseline_days_exercise, na.rm = TRUE),
    baseline_days_exercise_sd   = sd(baseline_days_exercise, na.rm = TRUE)
  )

balance_stats

# A tibble: 4 × 15
  group_assignment   age_mean age_sd bmi_mean bmi_sd self_efficacy_mean
  <fct>                 <dbl>  <dbl>    <dbl>  <dbl>              <dbl>
1 Control                42.8   11.6     26.8   5.05               4.70
2 Social Comparison      42.4   11.6     27.2   4.82               4.66
3 Commitment Device      42.3   11.9     27.1   4.77               4.74
4 Default Scheduling     42.3   11.3     26.9   4.65               4.68
# ℹ 9 more variables: self_efficacy_sd <dbl>, loss_aversion_mean <dbl>,
#   loss_aversion_sd <dbl>, social_norm_sensitivity_mean <dbl>,
#   social_norm_sensitivity_sd <dbl>, health_literacy_mean <dbl>,
#   health_literacy_sd <dbl>, baseline_days_exercise_mean <dbl>,
#   baseline_days_exercise_sd <dbl>

What to look for

If the means and SDs are roughly similar across the four groups, the randomisation appears to have worked well. Formal balance tests (e.g., ANOVA or chi-squared tests on each baseline variable) can be used but are not strictly necessary — randomisation guarantees balance in expectation.

Outcome distribution

Before running any statistical tests, it is good practice to inspect the distribution of the outcome variable. This helps identify skewness, outliers, or floor/ceiling effects that might affect the choice of statistical model.

Commands used:

summary(): A base R function that produces a quick numeric summary — minimum, 1st quartile, median, mean, 3rd quartile, and maximum. The $ operator extracts a single column from a data frame (e.g., data$change_days_8wk).

summary(data$days_exercise_8wk)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   1.500   2.800   2.995   4.200   7.000

summary(data$change_days_8wk)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-2.1000  0.0000  0.5000  0.5533  1.1000  3.3000

Next we create histograms using ggplot2. The key components:

ggplot(): Initialises a plot. The aes() function inside it maps variables to visual properties (here, x = change_days_8wk maps the outcome to the horizontal axis).
geom_histogram(): Adds a histogram layer. bins = 30 controls how many bars the range is divided into.
facet_wrap(~ group_assignment): Creates a separate panel for each level of group_assignment, so we can compare distributions side by side.
labs(): Sets the plot title and axis labels.

ggplot(data, aes(x = change_days_8wk)) +
  geom_histogram(bins = 30, fill = "skyblue", color = "black") +
  facet_wrap(~ group_assignment) +
  labs(
    title = "Change in exercise days by group",
    x = "Change in days per week",
    y = "Count"
  )

What to look for

Check whether the distributions are roughly symmetric and whether any group shows a clear shift relative to the control group.

Success rates by group

The binary variable success_8wk indicates whether a participant met a pre-defined exercise threshold. We use a cross-tabulation to inspect success rates and a chi-squared test to assess whether success rates differ significantly across groups.

Commands used:

table(): Creates a cross-tabulation (contingency table) of two categorical variables. The first argument becomes the rows, the second becomes the columns. The cells contain counts.
prop.table(): Converts a table of counts into proportions. The argument margin = 1 means “compute proportions within each row” (i.e., each row sums to 1). Using margin = 2 would compute column proportions instead.
chisq.test(): Performs Pearson’s chi-squared test of independence. It tests the null hypothesis that the row variable (group assignment) and the column variable (success) are independent — i.e., that success rates are the same across all groups.

success_table <- table(data$group_assignment, data$success_8wk)
success_table

                    
                       0   1
  Control            745   5
  Social Comparison  611 139
  Commitment Device  553 197
  Default Scheduling 683  67

Row proportions show the success rate within each group:

prop.table(success_table, margin = 1)

                    
                               0           1
  Control            0.993333333 0.006666667
  Social Comparison  0.814666667 0.185333333
  Commitment Device  0.737333333 0.262666667
  Default Scheduling 0.910666667 0.089333333

chisq.test(success_table)


    Pearson's Chi-squared test

data:  success_table
X-squared = 238.61, df = 3, p-value < 2.2e-16

Interpretation

A statistically significant chi-squared test (p < 0.05) indicates that the probability of success differs across at least some of the groups. It does not tell us which groups differ — for that, we would need pairwise comparisons.

Main effects analysis

One-way ANOVA

A one-way ANOVA tests whether the mean change in exercise days differs across the four groups. It is the multi-group generalisation of a two-sample t-test.

Commands used:

oneway.test(): Performs a one-way analysis of variance. The formula change_days_8wk ~ group_assignment reads as “change in exercise days as a function of group assignment.” The ~ (tilde) is R’s formula notation: the outcome goes on the left, the predictor(s) on the right. By default, this version does not assume equal variances across groups (Welch’s ANOVA).

oneway.test(change_days_8wk ~ group_assignment, data = data)


    One-way analysis of means (not assuming equal variances)

data:  change_days_8wk and group_assignment
F = 392.93, num df = 3, denom df = 1660, p-value < 2.2e-16

Linear regression (basic)

An equivalent approach is to fit a linear regression with group_assignment as the predictor. The control group serves as the reference category, so each coefficient represents the difference in mean change between that intervention group and the control.

Commands used:

lm(): Fits a linear regression model (ordinary least squares). The formula change_days_8wk ~ group_assignment specifies the outcome and predictor. When the predictor is a factor, R automatically creates dummy variables — one for each level except the reference level (the first level, here “Control”).
summary(): When applied to a model object (not a data vector), this prints the full regression table — coefficients, standard errors, t-statistics, p-values, R-squared, and the F-test.

basic_reg <- lm(change_days_8wk ~ group_assignment, data = data)
summary(basic_reg)


Call:
lm(formula = change_days_8wk ~ group_assignment, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.21760 -0.49467  0.00533  0.47507  2.80533 

Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)                        -0.07507    0.02487  -3.018  0.00257 ** 
group_assignmentSocial Comparison   0.89267    0.03518  25.376  < 2e-16 ***
group_assignmentCommitment Device   1.05093    0.03518  29.875  < 2e-16 ***
group_assignmentDefault Scheduling  0.56973    0.03518  16.196  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6812 on 2996 degrees of freedom
Multiple R-squared:  0.2586,    Adjusted R-squared:  0.2579 
F-statistic: 348.4 on 3 and 2996 DF,  p-value: < 2.2e-16

Reading the output

The intercept is the mean change for the control group.
Each group_assignment coefficient is the additional change relative to control.
A significant coefficient (p < 0.05) suggests that group’s nudge had an effect beyond the control condition.

Regression with covariates

Adding baseline covariates (age, gender, baseline exercise) can improve precision by reducing residual variance. In an RCT, covariate adjustment should not change the point estimates much (if randomisation worked), but it can tighten confidence intervals.

Commands used:

lm() again, but now with additional predictors joined by +. The formula change_days_8wk ~ group_assignment + age + gender + baseline_days_exercise includes the treatment variable and three covariates. The + in a formula means “and also control for” — it does not imply an interaction between the variables.

covariate_reg <- lm(
  change_days_8wk ~ group_assignment + age + gender + baseline_days_exercise,
  data = data
)
summary(covariate_reg)


Call:
lm(formula = change_days_8wk ~ group_assignment + age + gender + 
    baseline_days_exercise, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.00406 -0.45097 -0.03391  0.44709  2.66079 

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)                         0.100751   0.054593   1.845   0.0651 .  
group_assignmentSocial Comparison   0.891009   0.034154  26.088   <2e-16 ***
group_assignmentCommitment Device   1.059475   0.034157  31.018   <2e-16 ***
group_assignmentDefault Scheduling  0.569158   0.034149  16.667   <2e-16 ***
age                                 0.000457   0.001043   0.438   0.6611    
genderFemale                        0.029561   0.024157   1.224   0.2212    
baseline_days_exercise             -0.086766   0.006383 -13.593   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6612 on 2993 degrees of freedom
Multiple R-squared:  0.3022,    Adjusted R-squared:  0.3008 
F-statistic: 216.1 on 6 and 2993 DF,  p-value: < 2.2e-16

Moderation analysis

Moderation (interaction) analysis asks: does the effect of the intervention depend on a participant characteristic? For example, does the Social Comparison nudge work better for people who are more sensitive to social norms?

We fit separate interaction models for four moderators, each motivated by a specific theoretical link to the interventions.

Social norm sensitivity

We expect that the Social Comparison nudge might be more effective for participants who are more sensitive to social norms.

Commands used (these apply to all four moderation models below):

lm() with the * operator: In a formula, A * B is shorthand for A + B + A:B — it includes the main effects of both variables and their interaction. The interaction term A:B allows the effect of A to vary depending on the value of B (and vice versa). This is the key ingredient of moderation analysis.
expand.grid(): Creates a data frame containing every combination of the supplied values. Here we generate all combinations of the 4 group levels and social norm sensitivity values 1 through 10. This gives us a grid of hypothetical participants to generate predictions for.
predict(): Takes a fitted model and new data, and returns the model’s predicted outcome for each row. The argument newdata = specifies the data frame of values to predict for. This is how we produce the predicted lines in the interaction plots.
geom_line(): A ggplot2 layer that draws lines connecting data points. Combined with color = group_assignment, it draws a separate line for each group.

social_norm_mod <- lm(
  change_days_8wk ~ group_assignment * social_norm_sensitivity,
  data = data
)
summary(social_norm_mod)


Call:
lm(formula = change_days_8wk ~ group_assignment * social_norm_sensitivity, 
    data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.16281 -0.46206  0.01006  0.44698  2.80753 

Coefficients:
                                                            Estimate Std. Error
(Intercept)                                                 0.010252   0.103698
group_assignmentSocial Comparison                          -0.448452   0.146098
group_assignmentCommitment Device                           1.158779   0.141934
group_assignmentDefault Scheduling                          0.547337   0.148719
social_norm_sensitivity                                    -0.012436   0.014696
group_assignmentSocial Comparison:social_norm_sensitivity   0.194408   0.020640
group_assignmentCommitment Device:social_norm_sensitivity  -0.016179   0.020228
group_assignmentDefault Scheduling:social_norm_sensitivity  0.003392   0.020943
                                                           t value Pr(>|t|)    
(Intercept)                                                  0.099 0.921250    
group_assignmentSocial Comparison                           -3.070 0.002163 ** 
group_assignmentCommitment Device                            8.164 4.72e-16 ***
group_assignmentDefault Scheduling                           3.680 0.000237 ***
social_norm_sensitivity                                     -0.846 0.397504    
group_assignmentSocial Comparison:social_norm_sensitivity    9.419  < 2e-16 ***
group_assignmentCommitment Device:social_norm_sensitivity   -0.800 0.423887    
group_assignmentDefault Scheduling:social_norm_sensitivity   0.162 0.871357    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6638 on 2992 degrees of freedom
Multiple R-squared:  0.2969,    Adjusted R-squared:  0.2953 
F-statistic: 180.5 on 7 and 2992 DF,  p-value: < 2.2e-16

social_norm_df <- expand.grid(
  group_assignment = levels(data$group_assignment),
  social_norm_sensitivity = 1:10
)
social_norm_df$predicted <- predict(social_norm_mod, newdata = social_norm_df)

ggplot(social_norm_df, aes(x = social_norm_sensitivity, y = predicted,
                           color = group_assignment)) +
  geom_line(linewidth = 1) +
  labs(
    title = "Predicted change by social norm sensitivity",
    x = "Social Norm Sensitivity",
    y = "Predicted Change in Exercise Days",
    color = "Group"
  ) +
  theme_minimal()

Reading interaction plots

If the lines are roughly parallel, there is little moderation — the intervention effect is similar regardless of the moderator value. Non-parallel lines (especially crossing lines) suggest moderation. Check the regression output to see whether the interaction terms are statistically significant.

How to interpret this model

The regression output includes several types of coefficients. Here is how to read each one:

Main effects of group (e.g., group_assignmentSocial Comparison): These represent the difference between that intervention group and the control group when social norm sensitivity equals zero. Because the scale does not include zero in practice, these coefficients are not directly meaningful on their own — they are the intercept difference extrapolated to the edge of the scale.
Main effect of social norm sensitivity (social_norm_sensitivity): This is the slope of social norm sensitivity for the control group only (the reference category). It tells you how much the predicted change in exercise days increases per one-unit increase in social norm sensitivity among control participants.
Interaction terms (e.g., group_assignmentSocial Comparison:social_norm_sensitivity): These are the key coefficients for moderation. Each one tells you how much the slope of social norm sensitivity differs in that intervention group compared to the control group. A positive and significant interaction means the intervention effect grows as social norm sensitivity increases. A negative and significant interaction means the intervention effect shrinks (or reverses) at higher levels of the moderator.

In the plot, look specifically at the Social Comparison line relative to Control. If the gap between these two lines widens as social norm sensitivity increases, this confirms a positive interaction — the Social Comparison nudge is more effective for participants who care more about social norms.

Loss aversion

The Commitment Device nudge involves a potential loss (forfeiting a deposit). We test whether participants with higher loss aversion respond more strongly to this nudge.

The commands below follow the same pattern as the social norm sensitivity model: lm() with * for the interaction, expand.grid() + predict() for the prediction grid, and geom_line() for the plot.

loss_aversion_mod <- lm(
  change_days_8wk ~ group_assignment * loss_aversion,
  data = data
)
summary(loss_aversion_mod)


Call:
lm(formula = change_days_8wk ~ group_assignment * loss_aversion, 
    data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.23062 -0.46799 -0.00692  0.45435  2.75550 

Coefficients:
                                                  Estimate Std. Error t value
(Intercept)                                      -0.059100   0.082997  -0.712
group_assignmentSocial Comparison                 0.993732   0.121750   8.162
group_assignmentCommitment Device                 0.375575   0.122944   3.055
group_assignmentDefault Scheduling                0.678056   0.122105   5.553
loss_aversion                                    -0.002702   0.013414  -0.201
group_assignmentSocial Comparison:loss_aversion  -0.016559   0.019453  -0.851
group_assignmentCommitment Device:loss_aversion   0.112297   0.019759   5.683
group_assignmentDefault Scheduling:loss_aversion -0.017979   0.019626  -0.916
                                                 Pr(>|t|)    
(Intercept)                                       0.47647    
group_assignmentSocial Comparison                4.80e-16 ***
group_assignmentCommitment Device                 0.00227 ** 
group_assignmentDefault Scheduling               3.05e-08 ***
loss_aversion                                     0.84035    
group_assignmentSocial Comparison:loss_aversion   0.39469    
group_assignmentCommitment Device:loss_aversion  1.45e-08 ***
group_assignmentDefault Scheduling:loss_aversion  0.35972    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6748 on 2992 degrees of freedom
Multiple R-squared:  0.2735,    Adjusted R-squared:  0.2718 
F-statistic: 160.9 on 7 and 2992 DF,  p-value: < 2.2e-16

loss_aversion_df <- expand.grid(
  group_assignment = levels(data$group_assignment),
  loss_aversion = 1:10
)
loss_aversion_df$predicted <- predict(loss_aversion_mod, newdata = loss_aversion_df)

ggplot(loss_aversion_df, aes(x = loss_aversion, y = predicted,
                             color = group_assignment)) +
  geom_line(linewidth = 1) +
  labs(
    title = "Predicted change by loss aversion",
    x = "Loss Aversion",
    y = "Predicted Change in Exercise Days",
    color = "Group"
  ) +
  theme_minimal()

How to interpret this model

The structure of this output mirrors the social norm sensitivity model above:

The interaction term group_assignmentCommitment Device:loss_aversion is the most theoretically relevant coefficient. A positive and significant value would indicate that the Commitment Device nudge becomes more effective as loss aversion increases — consistent with the idea that people who are more averse to losses respond more strongly to the threat of forfeiting their deposit.
Interaction terms for the other groups (Social Comparison, Default Scheduling) test whether loss aversion moderates those interventions as well, which may or may not be theoretically expected.

In the plot, focus on the Commitment Device line. If it diverges upward from Control as loss aversion increases, this supports the hypothesis. If all four lines remain roughly parallel, loss aversion does not meaningfully moderate any of the intervention effects.

Health literacy

The Default Scheduling nudge may be particularly helpful for participants with lower health literacy, who might otherwise struggle to plan exercise independently.

Same commands as the previous moderation models (lm() with *, expand.grid(), predict(), geom_line()).

health_literacy_mod <- lm(
  change_days_8wk ~ group_assignment * health_literacy,
  data = data
)
summary(health_literacy_mod)


Call:
lm(formula = change_days_8wk ~ group_assignment * health_literacy, 
    data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.21264 -0.46959  0.01271  0.44206  2.63991 

Coefficients:
                                                    Estimate Std. Error t value
(Intercept)                                        -0.125101   0.058491  -2.139
group_assignmentSocial Comparison                   0.871741   0.083621  10.425
group_assignmentCommitment Device                   1.074236   0.082711  12.988
group_assignmentDefault Scheduling                  0.200840   0.081999   2.449
health_literacy                                     0.016683   0.017692   0.943
group_assignmentSocial Comparison:health_literacy   0.006887   0.025302   0.272
group_assignmentCommitment Device:health_literacy  -0.007879   0.024865  -0.317
group_assignmentDefault Scheduling:health_literacy  0.125841   0.025003   5.033
                                                   Pr(>|t|)    
(Intercept)                                          0.0325 *  
group_assignmentSocial Comparison                   < 2e-16 ***
group_assignmentCommitment Device                   < 2e-16 ***
group_assignmentDefault Scheduling                   0.0144 *  
health_literacy                                      0.3458    
group_assignmentSocial Comparison:health_literacy    0.7855    
group_assignmentCommitment Device:health_literacy    0.7514    
group_assignmentDefault Scheduling:health_literacy 5.11e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6741 on 2992 degrees of freedom
Multiple R-squared:  0.2751,    Adjusted R-squared:  0.2734 
F-statistic: 162.2 on 7 and 2992 DF,  p-value: < 2.2e-16

health_literacy_df <- expand.grid(
  group_assignment = levels(data$group_assignment),
  health_literacy = 1:10
)
health_literacy_df$predicted <- predict(health_literacy_mod, newdata = health_literacy_df)

ggplot(health_literacy_df, aes(x = health_literacy, y = predicted,
                               color = group_assignment)) +
  geom_line(linewidth = 1) +
  labs(
    title = "Predicted change by health literacy",
    x = "Health Literacy",
    y = "Predicted Change in Exercise Days",
    color = "Group"
  ) +
  theme_minimal()

How to interpret this model

Here the theoretical prediction runs in the opposite direction from the previous two models. We expect Default Scheduling to be more helpful for people with lower health literacy, because these participants may lack the knowledge to structure their own exercise routines. This means we are looking for a negative interaction term for group_assignmentDefault Scheduling:health_literacy — indicating that the advantage of Default Scheduling over Control decreases as health literacy goes up.

In the plot, look for the Default Scheduling line to be well above Control on the left side (low health literacy) but converging toward Control on the right (high health literacy). This pattern would confirm that the nudge compensates for low health literacy.

Self-efficacy

Self-efficacy (belief in one’s ability to exercise) may moderate the effect of all interventions. Participants with higher self-efficacy might benefit more because they feel capable of acting on the nudge.

Same commands as the previous moderation models.

self_efficacy_mod <- lm(
  change_days_8wk ~ group_assignment * self_efficacy,
  data = data
)
summary(self_efficacy_mod)


Call:
lm(formula = change_days_8wk ~ group_assignment * self_efficacy, 
    data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.18857 -0.45483  0.00079  0.45314  2.50985 

Coefficients:
                                                 Estimate Std. Error t value
(Intercept)                                      -0.24923    0.07715  -3.230
group_assignmentSocial Comparison                 0.68695    0.10900   6.302
group_assignmentCommitment Device                 0.98387    0.10885   9.039
group_assignmentDefault Scheduling                0.35022    0.11012   3.180
self_efficacy                                     0.03702    0.01554   2.382
group_assignmentSocial Comparison:self_efficacy   0.04457    0.02207   2.019
group_assignmentCommitment Device:self_efficacy   0.01392    0.02185   0.637
group_assignmentDefault Scheduling:self_efficacy  0.04702    0.02226   2.112
                                                 Pr(>|t|)    
(Intercept)                                       0.00125 ** 
group_assignmentSocial Comparison                3.37e-10 ***
group_assignmentCommitment Device                 < 2e-16 ***
group_assignmentDefault Scheduling                0.00149 ** 
self_efficacy                                     0.01729 *  
group_assignmentSocial Comparison:self_efficacy   0.04357 *  
group_assignmentCommitment Device:self_efficacy   0.52420    
group_assignmentDefault Scheduling:self_efficacy  0.03474 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6737 on 2992 degrees of freedom
Multiple R-squared:  0.276, Adjusted R-squared:  0.2743 
F-statistic: 162.9 on 7 and 2992 DF,  p-value: < 2.2e-16

self_efficacy_df <- expand.grid(
  group_assignment = levels(data$group_assignment),
  self_efficacy = 1:10
)
self_efficacy_df$predicted <- predict(self_efficacy_mod, newdata = self_efficacy_df)

ggplot(self_efficacy_df, aes(x = self_efficacy, y = predicted,
                             color = group_assignment)) +
  geom_line(linewidth = 1) +
  labs(
    title = "Predicted change by self-efficacy",
    x = "Self-Efficacy",
    y = "Predicted Change in Exercise Days",
    color = "Group"
  ) +
  theme_minimal()

How to interpret this model

Unlike the previous three moderators, self-efficacy is not tied to a single intervention — it could plausibly moderate all of them. A person who believes they can exercise regularly may be better positioned to act on any nudge they receive.

Look at the interaction terms for all three intervention groups. Positive and significant values would suggest that participants with higher self-efficacy benefit more from the nudge.
Alternatively, if the interaction is negative, it could mean that the nudge is most helpful for people with low self-efficacy — the intervention compensates for a lack of internal motivation.

In the plot, check whether the intervention lines fan out (diverge from Control) as self-efficacy increases (positive moderation) or whether they converge (negative moderation). Also note whether the pattern is similar across interventions or specific to one.

A note on multiple interaction models

We have fitted four separate interaction models, each with a different moderator. Each model includes interaction terms for all three intervention groups, meaning we are testing many coefficients across these models. Be cautious about over-interpreting any single marginally significant interaction (e.g., p-values near 0.05). Ideally, interpret these results in light of the pre-specified theoretical hypotheses rather than treating every significant coefficient as a confirmed finding.

Mediation analysis

Mediation analysis asks: through what mechanism does the intervention affect the outcome? Here we test whether adherence (how closely participants followed the intervention protocol) mediates the effect of group assignment on the change in exercise days.

We follow the classic Baron and Kenny (1986) approach:

Step 1: Show that the intervention affects the mediator (adherence).
Step 2: Show that the mediator affects the outcome, controlling for the intervention.

If both steps yield significant effects, and the direct effect of the intervention is reduced when controlling for the mediator, we have evidence of mediation.

Step 1: Intervention → Adherence

Commands used:

Both steps use lm() and summary(), which we have already seen. The only difference is the choice of outcome and predictors in the formula.

In Step 1, the outcome is the mediator (adherence) and the predictor is group_assignment. This tests whether the intervention affects adherence.
In Step 2, the outcome is the final outcome (change_days_8wk) and we include both adherence and group_assignment using +. This lets us see (a) whether adherence predicts the outcome after controlling for group, and (b) whether the group coefficients shrink compared to the basic regression — which would indicate mediation.

adherence_mod <- lm(adherence ~ group_assignment, data = data)
summary(adherence_mod)


Call:
lm(formula = adherence ~ group_assignment, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-31.097  -6.783  -0.064   6.903  38.903 

Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)                         47.7827     0.3490  136.92   <2e-16 ***
group_assignmentSocial Comparison    8.2813     0.4935   16.78   <2e-16 ***
group_assignmentCommitment Device   10.0427     0.4935   20.35   <2e-16 ***
group_assignmentDefault Scheduling   5.3147     0.4935   10.77   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.557 on 2996 degrees of freedom
Multiple R-squared:  0.1371,    Adjusted R-squared:  0.1363 
F-statistic: 158.7 on 3 and 2996 DF,  p-value: < 2.2e-16

Step 2: Adherence → Outcome (controlling for intervention)

mediation_mod <- lm(change_days_8wk ~ adherence + group_assignment, data = data)
summary(mediation_mod)


Call:
lm(formula = change_days_8wk ~ adherence + group_assignment, 
    data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.49456 -0.31504  0.01512  0.31666  1.58562 

Coefficients:
                                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)                        -2.4799715  0.0474607  -52.25   <2e-16 ***
adherence                           0.0503301  0.0009223   54.57   <2e-16 ***
group_assignmentSocial Comparison   0.4758666  0.0260588   18.26   <2e-16 ***
group_assignmentCommitment Device   0.5454852  0.0265803   20.52   <2e-16 ***
group_assignmentDefault Scheduling  0.3022458  0.0253919   11.90   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4825 on 2995 degrees of freedom
Multiple R-squared:  0.6283,    Adjusted R-squared:  0.6278 
F-statistic:  1265 on 4 and 2995 DF,  p-value: < 2.2e-16

Interpretation

Compare the group coefficients in this model with those from the basic regression (without adherence). If the coefficients shrink when adherence is added, this suggests that part of the intervention effect operates through adherence. A significant coefficient on adherence is necessary for mediation.

Logistic regression

When the outcome is binary (success_8wk), we use logistic regression instead of linear regression. The model estimates the log-odds of success as a function of group assignment and psychological factors.

Commands used:

glm(): Stands for Generalised Linear Model. It extends lm() to handle non-normal outcomes. The formula works the same way as in lm(). The critical difference is the family = binomial() argument, which tells R to fit a logistic regression (binary outcome, logit link function). The coefficients are on the log-odds scale, not the probability scale.
summary() on a glm object: Similar to a linear model summary, but reports z-statistics instead of t-statistics. Coefficients represent changes in log-odds per unit increase in the predictor.

logit_mod <- glm(
  success_8wk ~ group_assignment + self_efficacy + loss_aversion + social_norm_sensitivity,
  data = data,
  family = binomial()
)
summary(logit_mod)


Call:
glm(formula = success_8wk ~ group_assignment + self_efficacy + 
    loss_aversion + social_norm_sensitivity, family = binomial(), 
    data = data)

Coefficients:
                                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)                        -7.86185    0.59783 -13.151  < 2e-16 ***
group_assignmentSocial Comparison   3.55527    0.45919   7.742 9.75e-15 ***
group_assignmentCommitment Device   4.02467    0.45717   8.803  < 2e-16 ***
group_assignmentDefault Scheduling  2.68794    0.46724   5.753 8.78e-09 ***
self_efficacy                       0.24173    0.03648   6.626 3.46e-11 ***
loss_aversion                       0.12284    0.03306   3.715 0.000203 ***
social_norm_sensitivity             0.12847    0.03408   3.769 0.000164 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2385.8  on 2999  degrees of freedom
Residual deviance: 2025.0  on 2993  degrees of freedom
AIC: 2039

Number of Fisher Scoring iterations: 7

Predicted probabilities

To make the logistic regression results more interpretable, we compute predicted probabilities of success for each group at low (3) and high (7) levels of self-efficacy, holding the other variables at their midpoints.

Commands used:

data.frame(): Creates a new data frame manually by specifying column names and their values. We use rep() (repeat) to generate repeated values for each column.
predict() with type = "response": When applied to a logistic model, type = "response" converts the predictions from the log-odds scale back to the probability scale (values between 0 and 1). Without this argument, predict() would return log-odds, which are harder to interpret.

logit_predictions <- data.frame(
  group_assignment = rep(levels(data$group_assignment), each = 2),
  self_efficacy = rep(c(3, 7), times = 4),
  loss_aversion = 5,
  social_norm_sensitivity = 5
)

logit_predictions$predicted_prob <- predict(
  logit_mod,
  newdata = logit_predictions,
  type = "response"
)

logit_predictions

    group_assignment self_efficacy loss_aversion social_norm_sensitivity
1            Control             3             5                       5
2            Control             7             5                       5
3  Social Comparison             3             5                       5
4  Social Comparison             7             5                       5
5  Commitment Device             3             5                       5
6  Commitment Device             7             5                       5
7 Default Scheduling             3             5                       5
8 Default Scheduling             7             5                       5
  predicted_prob
1    0.002786619
2    0.007295082
3    0.089084402
4    0.204571524
5    0.135232478
6    0.291406540
7    0.039460372
8    0.097501825

Interpretation

The predicted_prob column shows the estimated probability of meeting the exercise success threshold. Comparing rows within a group shows the effect of self-efficacy; comparing across groups shows the intervention effect.

Summary visualisations

Change in exercise by group

Commands used:

geom_boxplot(): A ggplot2 layer that draws box-and-whisker plots. The box shows the interquartile range (25th to 75th percentile), the line inside is the median, and the whiskers extend to 1.5× the IQR. Points beyond the whiskers are plotted individually as potential outliers.
theme_minimal(): Applies a clean, minimal visual theme to the plot (light background, no grid boxes).
theme(legend.position = "none"): Hides the legend, since the x-axis labels already identify the groups.

ggplot(data, aes(x = group_assignment, y = change_days_8wk,
                 fill = group_assignment)) +
  geom_boxplot() +
  labs(
    title = "Change in exercise frequency by intervention",
    x = "Intervention Group",
    y = "Change in Days per Week"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

Combined interaction effects

Commands used:

Here we use the patchwork package to arrange multiple plots into a single figure.

We save each plot to a variable (p1, p2, p3, p4) using <- instead of displaying them immediately.
The + operator (from patchwork, not base R) places plots side by side.
The / operator stacks plots vertically.
So (p1 + p2) / (p3 + p4) creates a 2×2 grid: top row is p1 and p2, bottom row is p3 and p4.
plot_annotation() adds an overall title to the combined figure.

p1 <- ggplot(social_norm_df, aes(x = social_norm_sensitivity, y = predicted,
                                  color = group_assignment)) +
  geom_line(linewidth = 1) +
  labs(title = "Social norm sensitivity", x = "Social Norm Sensitivity",
       y = "Predicted Change", color = "Group") +
  theme_minimal()

p2 <- ggplot(loss_aversion_df, aes(x = loss_aversion, y = predicted,
                                    color = group_assignment)) +
  geom_line(linewidth = 1) +
  labs(title = "Loss aversion", x = "Loss Aversion",
       y = "Predicted Change", color = "Group") +
  theme_minimal()

p3 <- ggplot(health_literacy_df, aes(x = health_literacy, y = predicted,
                                      color = group_assignment)) +
  geom_line(linewidth = 1) +
  labs(title = "Health literacy", x = "Health Literacy",
       y = "Predicted Change", color = "Group") +
  theme_minimal()

p4 <- ggplot(self_efficacy_df, aes(x = self_efficacy, y = predicted,
                                    color = group_assignment)) +
  geom_line(linewidth = 1) +
  labs(title = "Self-efficacy", x = "Self-Efficacy",
       y = "Predicted Change", color = "Group") +
  theme_minimal()

(p1 + p2) / (p3 + p4) +
  plot_annotation(title = "Interaction effects by cognitive factors")

Exercise Nudge RCT — Tutorial Analysis

Introduction

Setup

Load packages

Load and prepare data

Descriptive statistics and balance checks

Outcome distribution

Success rates by group

Main effects analysis

One-way ANOVA

Linear regression (basic)

Regression with covariates

Moderation analysis

Loss aversion

How to interpret this model

Health literacy

How to interpret this model

Self-efficacy

How to interpret this model

Mediation analysis

Step 1: Intervention → Adherence

Step 2: Adherence → Outcome (controlling for intervention)

Logistic regression

Predicted probabilities

Summary visualisations

Change in exercise by group

Combined interaction effects

Further reading

References