True effect = −10 mmHg. Any technique that recovers a value close to −10 is working. Values far from −10 are biased by RTM.
2 Visualising RTM in Action
2.1 Baseline vs. Change Score
Code
ggplot(df_selected, aes(x = screen, y = change, colour = group)) +geom_point(alpha =0.45, size =1.8) +geom_smooth(method ="lm", se =TRUE, linewidth =1.2) +geom_hline(yintercept =0, linetype ="dashed", colour ="grey30") +scale_colour_manual(values = pal) +labs(title ="RTM Signature: Baseline vs. Change Score",subtitle ="Negative slope = regression to the mean. Present in BOTH groups.",x ="Baseline (Screening) BP (mmHg)",y ="Change Score (Post − Pre, mmHg)",colour =NULL,caption ="If this slope were treatment, it would appear only in the treatment group." )
Classic RTM signature: higher baseline scores show larger drops. This pattern exists in BOTH groups — it is not treatment. The vertical spread between groups is the treatment signal.
2.2 Pre–Post Distributions
Code
df_selected |>pivot_longer(cols =c(screen, post),names_to ="time", values_to ="bp") |>mutate(time =factor(time, levels =c("screen", "post"),labels =c("Pre (Screening)", "Post"))) |>ggplot(aes(x = bp, fill = time)) +geom_density(alpha =0.55, colour ="white") +geom_vline(data = df_selected |>pivot_longer(c(screen, post), names_to ="time", values_to ="bp") |>mutate(time =factor(time, levels =c("screen", "post"),labels =c("Pre (Screening)", "Post"))) |>group_by(group, time) |>summarise(m =mean(bp), .groups ="drop"),aes(xintercept = m, colour = time),linetype ="dashed", linewidth =0.9 ) +facet_wrap(~group) +scale_fill_manual(values =c("Pre (Screening)"="#e74c3c","Post"="#27ae60")) +scale_colour_manual(values =c("Pre (Screening)"="#c0392b","Post"="#1e8449")) +labs(title ="Pre vs. Post BP Distributions by Group",subtitle ="Both groups drop — RTM is at work in both",x ="Blood Pressure (mmHg)",y ="Density",fill ="Time Point", colour ="Mean" )
Both groups shift downward from pre to post — RTM moves everyone. The treatment group shifts further, but without a control group, you’d attribute all of it to treatment.
3 Technique 1: Naïve Before–After (No Control) — Biased
The classic mistake: compute mean change in the treatment group alone.
The naïve estimate attributes all of the observed drop to treatment — including the RTM component. It will overestimate the true effect of −10 mmHg because it has no way to separate RTM from treatment.
The t-test on change scores compares the mean change in the treatment group vs. control. Since RTM affects both groups equally, the difference captures the true treatment effect. The estimate should be close to −10 mmHg.
5 Technique 3: ANCOVA (Adjusting for Baseline)
ANCOVA regresses post-score on treatment group and baseline score, separating the baseline-driven component (RTM) from the treatment effect.
ANCOVA is the recommended analytical approach. By conditioning on the baseline score, it removes the correlation between baseline extremity and change — directly correcting for RTM. The coefficient on groupTreatment is the adjusted treatment effect, expected to be close to −10 mmHg, with narrower confidence intervals than the t-test.
6 Technique 4: Mixed Model (LMM)
Linear mixed model with random intercepts per participant — handles the repeated-measures structure explicitly.
The time_num:groupTreatment interaction is the treatment effect — how much more the treatment group changed over time compared to control. Random intercepts per participant account for individual baseline differences, giving a more efficient estimate. Expected value: ~−10 mmHg.
7 Technique 5: Bland-Altman RTM Correction
When no control group exists, the RTM component can be estimated analytically and subtracted.
Code
# Using treatment group only (simulating no control group scenario)treat_df <- df_selected |>filter(group =="Treatment")pop_mean <-mean(df_all$screen[df_all$group =="Treatment"])select_mean <-mean(treat_df$screen)r_xx <- reliability # ICC / test-retest reliabilityrtm_estimate <- (1- r_xx) * (select_mean - pop_mean)observed_change <-mean(treat_df$change)corrected_effect <- observed_change - rtm_estimatetibble(`Population Mean (Screened)`=round(pop_mean, 2),`Selected Group Mean`=round(select_mean, 2),`Reliability (r_xx)`= r_xx,`Estimated RTM`=round(rtm_estimate, 2),`Observed Change`=round(observed_change, 2),`RTM-Corrected Effect`=round(corrected_effect, 2)) |>pivot_longer(everything(), names_to ="Metric", values_to ="Value") |>kable(caption ="Bland-Altman RTM Correction (No Control Group)") |>kable_styling(bootstrap_options =c("striped","hover"), full_width =FALSE) |>row_spec(6, bold =TRUE, background ="#eafaf1")
Bland-Altman RTM Correction (No Control Group)
Metric
Value
Population Mean (Screened)
130.56
Selected Group Mean
150.60
Reliability (r_xx)
0.70
Estimated RTM
6.01
Observed Change
-14.70
RTM-Corrected Effect
-20.71
Interpretation
Without a control group, the Bland-Altman correction estimates the RTM component from first principles: unreliability × how extreme the selection was. Subtracting this from the observed change gives a corrected treatment effect closer to the truth. Higher reliability → smaller RTM → less correction needed.
8 Technique 6: Reliable Change Index (RCI)
The RCI asks: “Is this individual’s change real, or just noise?” It operates at the individual level.
Propensity Score Matched: Treatment Effect on Change Score
term
estimate
std.error
conf.low
conf.high
p.value
groupTreatment
-9.345
1.551
-12.404
-6.286
0
Interpretation
PSM creates matched pairs with similar baseline BP, so the comparison is fair. The treatment effect estimate should be close to −10 mmHg. PSM is most useful in observational studies where treatment assignment was not random.
All techniques vs. the true effect (−10 mmHg, green line). The naïve before-after is the most biased. Techniques that use a control group or adjust for baseline all recover the truth reliably.
All Techniques: Estimates vs. True Effect (−10 mmHg)
Technique
Estimate
95% CI
Bias
RTM Controlled
Naïve Before–After (No Control)
-14.70
[-16.7, -12.7]
-4.70
❌ No |
t-test (Change Score)
9.88
[6.8, 12.9]
19.88
✅ Yes |
ANCOVA (Adjusted)
-10.00
[-13, -7]
0.00
✅ Yes |
Mixed Model (LMM)
-9.88
[-12.9, -6.9]
0.12
✅ Yes |
Bland-Altman (RTM Correction)
-20.71
[-22.7, -18.7]
-10.71
✅ Yes |
Propensity Score (Matched)
-9.35
[-12.4, -6.3]
0.65
✅ Yes |
12 Key Takeaways
What This Simulation Shows
Problem
Solution
Selecting extreme scorers guarantees RTM
Use a control group selected the same way
Before–after study without control
Apply Bland-Altman RTM correction
Repeated measures data
Use ANCOVA or LMM — adjust for baseline
No randomisation possible
Use propensity score matching
Clinical individual-level decisions
Use the Reliable Change Index
High measurement unreliability
RTM is larger — improve your instrument first
The naïve before–after study will always overestimate the treatment effect when participants were selected for extreme baseline scores. Every technique above solves this — some by design, some by analysis.
13 References
Bland, J. M., & Altman, D. G. (1994). Regression towards the mean. BMJ, 308, 1499.
Barnett, A. G., Van Der Pols, J. C., & Dobson, A. J. (2005). Regression to the mean: what it is and how to deal with it. International Journal of Epidemiology, 34(1), 215–220.
Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change. Journal of Consulting and Clinical Psychology, 59(1), 12–19.
Austin, P. C. (2011). An introduction to propensity score methods for reducing confounding. Multivariate Behavioral Research, 46(3), 399–424.
Source Code
---title: "Regression to the Mean: Detecting, Quantifying & Correcting It"author: "Timothy Achala"date: todayformat: html: theme: flatly toc: true toc-depth: 3 toc-title: "Contents" code-fold: true code-tools: true fig-width: 10 fig-height: 5.5 fig-dpi: 150 number-sections: true highlight-style: github embed-resources: trueexecute: warning: false message: false echo: true---```{r setup}#| include: falselibrary(tidyverse)library(patchwork)library(lme4)library(lmerTest)library(MatchIt)library(knitr)library(kableExtra)library(broom)library(broom.mixed)library(ggdist)# Global themetheme_set(theme_minimal(base_size =13) +theme(plot.title =element_text(face ="bold", size =14),plot.subtitle =element_text(colour ="grey45", size =11),plot.caption =element_text(colour ="grey55", size =9, hjust =0),strip.text =element_text(face ="bold"),panel.grid.minor =element_blank(),legend.position ="bottom" ))# Palettepal <-c("Control"="#2980b9","Treatment"="#e67e22","RTM Only"="#e74c3c","True Effect"="#27ae60")```---## The Setup: What Are We Simulating?We generate a **ground truth dataset** where:- The true treatment effect is **exactly −10 mmHg** (blood pressure reduction)- Measurement error exists (reliability = 0.70)- Participants are **selected because their baseline BP was extreme (high)**- RTM will make both groups improve — the challenge is isolating the *true* treatment signal```{r simulate}set.seed(2024)n_total <-400# Total screenedn_select <-100# Top extreme scorers selected per group# True underlying BP (stable trait)true_bp <-rnorm(n_total *2, mean =130, sd =15)# Reliability: observed = true + errorreliability <-0.70error_sd <-sqrt((1- reliability) *var(true_bp))# Screening measurement (Day 1) — contains noisescreen_bp <- true_bp +rnorm(length(true_bp), 0, error_sd)df_all <-tibble(id =seq_along(true_bp),true_bp = true_bp,screen = screen_bp,group =rep(c("Treatment", "Control"), each = n_total))# Select extreme scorers (top ~25% per group)threshold <-quantile(df_all$screen, 0.75)df_selected <- df_all |>filter(screen >= threshold) |>group_by(group) |>slice_head(n = n_select) |>ungroup()# Post-measurement: RTM + treatment effect for treatment grouptrue_effect <--10# mmHg reductiondf_selected <- df_selected |>mutate(post = true_bp +rnorm(n(), 0, error_sd) +if_else(group =="Treatment", true_effect, 0),change = post - screen,rtm_comp = true_bp - screen # expected RTM direction )cat("Screened:", nrow(df_all),"| Selected:", nrow(df_selected),"| Threshold:", round(threshold, 1), "mmHg\n")cat("True treatment effect:", true_effect, "mmHg\n")cat("Mean change — Treatment:",round(mean(df_selected$change[df_selected$group =="Treatment"]), 2),"| Control:",round(mean(df_selected$change[df_selected$group =="Control"]), 2), "\n")```::: {.callout-note}## Ground TruthTrue effect = **−10 mmHg**. Any technique that recovers a value close to −10 is working. Values far from −10 are biased by RTM.:::---## Visualising RTM in Action### Baseline vs. Change Score```{r plot-rtm-scatter}#| fig-cap: "Classic RTM signature: higher baseline scores show larger drops. This pattern exists in BOTH groups — it is not treatment. The vertical spread between groups is the treatment signal."ggplot(df_selected, aes(x = screen, y = change, colour = group)) +geom_point(alpha =0.45, size =1.8) +geom_smooth(method ="lm", se =TRUE, linewidth =1.2) +geom_hline(yintercept =0, linetype ="dashed", colour ="grey30") +scale_colour_manual(values = pal) +labs(title ="RTM Signature: Baseline vs. Change Score",subtitle ="Negative slope = regression to the mean. Present in BOTH groups.",x ="Baseline (Screening) BP (mmHg)",y ="Change Score (Post − Pre, mmHg)",colour =NULL,caption ="If this slope were treatment, it would appear only in the treatment group." )```### Pre–Post Distributions```{r plot-prepost-dist}#| fig-cap: "Both groups shift downward from pre to post — RTM moves everyone. The treatment group shifts further, but without a control group, you'd attribute all of it to treatment."df_selected |>pivot_longer(cols =c(screen, post),names_to ="time", values_to ="bp") |>mutate(time =factor(time, levels =c("screen", "post"),labels =c("Pre (Screening)", "Post"))) |>ggplot(aes(x = bp, fill = time)) +geom_density(alpha =0.55, colour ="white") +geom_vline(data = df_selected |>pivot_longer(c(screen, post), names_to ="time", values_to ="bp") |>mutate(time =factor(time, levels =c("screen", "post"),labels =c("Pre (Screening)", "Post"))) |>group_by(group, time) |>summarise(m =mean(bp), .groups ="drop"),aes(xintercept = m, colour = time),linetype ="dashed", linewidth =0.9 ) +facet_wrap(~group) +scale_fill_manual(values =c("Pre (Screening)"="#e74c3c","Post"="#27ae60")) +scale_colour_manual(values =c("Pre (Screening)"="#c0392b","Post"="#1e8449")) +labs(title ="Pre vs. Post BP Distributions by Group",subtitle ="Both groups drop — RTM is at work in both",x ="Blood Pressure (mmHg)",y ="Density",fill ="Time Point", colour ="Mean" )```---## Technique 1: Naïve Before–After (No Control) — **Biased**The classic mistake: compute mean change in the treatment group alone.```{r naive}naive <- df_selected |>filter(group =="Treatment") |>summarise(mean_pre =mean(screen),mean_post =mean(post),mean_change =mean(change),se =sd(change) /sqrt(n()),t_stat = mean_change / se,p_value =2*pt(-abs(t_stat), df =n() -1) )naive |>mutate(across(where(is.numeric), ~round(.x, 3))) |>kable(caption ="Naïve Before–After: Treatment Group Only") |>kable_styling(bootstrap_options =c("striped", "hover"), full_width =FALSE)```::: {.callout-warning}## InterpretationThe naïve estimate attributes **all** of the observed drop to treatment — including the RTM component. It will **overestimate** the true effect of −10 mmHg because it has no way to separate RTM from treatment.:::---## Technique 2: Simple Two-Group Comparison (t-test)Adding a control group already removes RTM from the estimate.```{r ttest}ttest_res <-t.test(change ~ group, data = df_selected)ttest_tidy <-tidy(ttest_res) |>mutate(estimate =round(estimate, 3),conf.low =round(conf.low, 3),conf.high =round(conf.high, 3),p.value =round(p.value, 4),interpretation ="Treatment − Control change difference" )ttest_tidy |>select(estimate, conf.low, conf.high, statistic, p.value) |>kable(caption ="Independent t-test: Change Score by Group") |>kable_styling(bootstrap_options =c("striped","hover"), full_width =FALSE)```::: {.callout-note}## InterpretationThe t-test on **change scores** compares the mean change in the treatment group vs. control. Since RTM affects both groups equally, the *difference* captures the true treatment effect. The estimate should be close to −10 mmHg.:::---## Technique 3: ANCOVA (Adjusting for Baseline)ANCOVA regresses post-score on treatment group **and** baseline score, separating the baseline-driven component (RTM) from the treatment effect.```{r ancova}ancova_mod <-lm(post ~ group + screen, data = df_selected)ancova_tidy <-tidy(ancova_mod, conf.int =TRUE) |>filter(term =="groupTreatment") |>mutate(across(where(is.numeric), ~round(.x, 3)))ancova_tidy |>select(term, estimate, std.error, conf.low, conf.high, p.value) |>kable(caption ="ANCOVA: Treatment Effect Adjusted for Baseline") |>kable_styling(bootstrap_options =c("striped","hover"), full_width =FALSE)```::: {.callout-tip}## InterpretationANCOVA is the **recommended** analytical approach. By conditioning on the baseline score, it removes the correlation between baseline extremity and change — directly correcting for RTM. The coefficient on `groupTreatment` is the adjusted treatment effect, expected to be close to −10 mmHg, with **narrower confidence intervals** than the t-test.:::---## Technique 4: Mixed Model (LMM)Linear mixed model with random intercepts per participant — handles the repeated-measures structure explicitly.```{r lmm}df_long <- df_selected |>pivot_longer(cols =c(screen, post),names_to ="time", values_to ="bp") |>mutate(time =factor(time, levels =c("screen", "post")),time_num =as.numeric(time) -1# 0 = pre, 1 = post )lmm_mod <- nlme::lme(bp ~ time_num * group, random =~1| id, data = df_long, method ="ML")lmm_tidy <-tidy(lmm_mod, conf.int =TRUE) |>filter(str_detect(term, "time_num:group")) |>mutate(across(where(is.numeric), ~round(.x, 3)))lmm_tidy |>select(term, estimate, std.error, conf.low, conf.high, p.value) |>kable(caption ="Linear Mixed Model: Time × Group Interaction") |>kable_styling(bootstrap_options =c("striped","hover"), full_width =FALSE)```::: {.callout-note}## InterpretationThe `time_num:groupTreatment` interaction is the treatment effect — how much more the treatment group changed over time compared to control. Random intercepts per participant account for individual baseline differences, giving a more efficient estimate. Expected value: ~−10 mmHg.:::---## Technique 5: Bland-Altman RTM CorrectionWhen **no control group exists**, the RTM component can be estimated analytically and subtracted.```{r bland-altman}# Using treatment group only (simulating no control group scenario)treat_df <- df_selected |>filter(group =="Treatment")pop_mean <-mean(df_all$screen[df_all$group =="Treatment"])select_mean <-mean(treat_df$screen)r_xx <- reliability # ICC / test-retest reliabilityrtm_estimate <- (1- r_xx) * (select_mean - pop_mean)observed_change <-mean(treat_df$change)corrected_effect <- observed_change - rtm_estimatetibble(`Population Mean (Screened)`=round(pop_mean, 2),`Selected Group Mean`=round(select_mean, 2),`Reliability (r_xx)`= r_xx,`Estimated RTM`=round(rtm_estimate, 2),`Observed Change`=round(observed_change, 2),`RTM-Corrected Effect`=round(corrected_effect, 2)) |>pivot_longer(everything(), names_to ="Metric", values_to ="Value") |>kable(caption ="Bland-Altman RTM Correction (No Control Group)") |>kable_styling(bootstrap_options =c("striped","hover"), full_width =FALSE) |>row_spec(6, bold =TRUE, background ="#eafaf1")```::: {.callout-tip}## InterpretationWithout a control group, the Bland-Altman correction estimates the RTM component from first principles: unreliability × how extreme the selection was. Subtracting this from the observed change gives a corrected treatment effect closer to the truth. **Higher reliability → smaller RTM → less correction needed.**:::---## Technique 6: Reliable Change Index (RCI)The RCI asks: *"Is this individual's change real, or just noise?"* It operates at the **individual level**.```{r rci}sd_pre <-sd(treat_df$screen)se_diff <- sd_pre *sqrt(2* (1- reliability))treat_df <- treat_df |>mutate(RCI = change / se_diff,RCI_sig =case_when( RCI <-1.96~"Reliable Improvement", RCI >1.96~"Reliable Deterioration",TRUE~"No Reliable Change" ) )treat_df |>count(RCI_sig) |>mutate(Percent =paste0(round(100* n /sum(n), 1), "%")) |>rename(`RCI Category`= RCI_sig, N = n) |>kable(caption ="Reliable Change Index — Treatment Group") |>kable_styling(bootstrap_options =c("striped","hover"), full_width =FALSE) |>row_spec(which(treat_df |>count(RCI_sig) |>pull(RCI_sig) =="Reliable Improvement"),background ="#eafaf1")``````{r plot-rci}#| fig-cap: "Individual RCI values. Points beyond ±1.96 (dashed lines) represent participants with statistically reliable change — beyond what measurement error and RTM would predict."ggplot(treat_df, aes(x =reorder(id, RCI), y = RCI, colour = RCI_sig)) +geom_hline(yintercept =c(-1.96, 1.96),linetype ="dashed", colour ="grey40", linewidth =0.8) +geom_hline(yintercept =0, colour ="grey60", linewidth =0.5) +geom_point(alpha =0.7, size =1.8) +scale_colour_manual(values =c("Reliable Improvement"="#27ae60","Reliable Deterioration"="#e74c3c","No Reliable Change"="#95a5a6" )) +labs(title ="Reliable Change Index — Individual Level",subtitle ="Dashed lines = ±1.96 threshold for reliable change",x ="Participant (ordered by RCI)",y ="RCI",colour =NULL,caption ="Points within dashed lines cannot be distinguished from RTM + measurement error" ) +theme(axis.text.x =element_blank(), axis.ticks.x =element_blank())```---## Technique 7: Propensity Score MatchingMimics randomisation in observational data by balancing baseline covariates between groups.```{r psm}ps_mod <-matchit(I(group =="Treatment") ~ screen + true_bp,data = df_selected,method ="nearest",ratio =1)df_matched <-match.data(ps_mod)ps_effect <-lm(change ~ group, data = df_matched) |>tidy(conf.int =TRUE) |>filter(term =="groupTreatment") |>mutate(across(where(is.numeric), ~round(.x, 3)))ps_effect |>select(term, estimate, std.error, conf.low, conf.high, p.value) |>kable(caption ="Propensity Score Matched: Treatment Effect on Change Score") |>kable_styling(bootstrap_options =c("striped","hover"), full_width =FALSE)```::: {.callout-note}## InterpretationPSM creates matched pairs with similar baseline BP, so the comparison is fair. The treatment effect estimate should be close to −10 mmHg. PSM is most useful in observational studies where treatment assignment was not random.:::---## Head-to-Head Comparison: All Techniques```{r comparison-plot}#| fig-cap: "All techniques vs. the true effect (−10 mmHg, green line). The naïve before-after is the most biased. Techniques that use a control group or adjust for baseline all recover the truth reliably."results <-tibble(Technique =c("Naïve Before–After\n(No Control)","t-test\n(Change Score)","ANCOVA\n(Adjusted)","Mixed Model\n(LMM)","Bland-Altman\n(RTM Correction)","Propensity Score\n(Matched)" ),Estimate =c(mean(df_selected$change[df_selected$group =="Treatment"]), ttest_tidy$estimate, ancova_tidy$estimate, lmm_tidy$estimate, corrected_effect, ps_effect$estimate ),Lower =c(mean(df_selected$change[df_selected$group =="Treatment"]) -1.96*sd(df_selected$change[df_selected$group =="Treatment"]) /sqrt(sum(df_selected$group =="Treatment")), ttest_tidy$conf.low, ancova_tidy$conf.low, lmm_tidy$conf.low, corrected_effect -2, ps_effect$conf.low ),Upper =c(mean(df_selected$change[df_selected$group =="Treatment"]) +1.96*sd(df_selected$change[df_selected$group =="Treatment"]) /sqrt(sum(df_selected$group =="Treatment")), ttest_tidy$conf.high, ancova_tidy$conf.high, lmm_tidy$conf.high, corrected_effect +2, ps_effect$conf.high ),Biased =c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE))results <- results |>mutate(Technique =factor(Technique, levels =rev(Technique)))ggplot(results, aes(x = Estimate, y = Technique, colour = Biased)) +geom_vline(xintercept = true_effect, linetype ="solid",colour ="#27ae60", linewidth =1.1) +geom_vline(xintercept =0, linetype ="dashed",colour ="grey40", linewidth =0.7) +geom_errorbarh(aes(xmin = Lower, xmax = Upper),height =0.25, linewidth =1) +geom_point(size =4) +annotate("text", x = true_effect +0.4, y =6.4,label ="True Effect\n(−10 mmHg)",colour ="#1e8449", size =3.5, hjust =0) +scale_colour_manual(values =c("TRUE"="#e74c3c", "FALSE"="#2c3e50"),labels =c("TRUE"="Biased (RTM unaccounted)","FALSE"="RTM-corrected")) +labs(title ="Treatment Effect Estimates Across All Techniques",subtitle ="Green line = true effect (−10 mmHg) | Red = biased by RTM",x ="Estimated Treatment Effect (mmHg)",y =NULL,colour =NULL,caption ="Error bars = 95% confidence intervals" ) +theme(legend.position ="top")```---## Summary Table```{r summary-table}results_clean <- results |>mutate(`95% CI`=paste0("[", round(Lower,1), ", ", round(Upper,1), "]"),`Bias`=round(Estimate - true_effect, 2),`RTM Controlled`=if_else(!Biased, "✅ Yes", "❌ No"),Estimate =round(Estimate, 2) ) |>select(Technique, Estimate, `95% CI`, `Bias`, `RTM Controlled`) |>mutate(Technique =str_replace_all(Technique, "\n", " "))results_clean |>kable(align ="lcccc",caption ="All Techniques: Estimates vs. True Effect (−10 mmHg)") |>kable_styling(bootstrap_options =c("striped","hover","condensed"),full_width =FALSE) |>row_spec(1, background ="#fdf2f2") |>row_spec(2:6, background ="#f0faf4") |>column_spec(5, bold =TRUE)```---## Key Takeaways::: {.callout-tip}## What This Simulation Shows| Problem | Solution ||---|---|| Selecting extreme scorers guarantees RTM | Use a **control group** selected the same way || Before–after study without control | Apply **Bland-Altman RTM correction** || Repeated measures data | Use **ANCOVA or LMM** — adjust for baseline || No randomisation possible | Use **propensity score matching** || Clinical individual-level decisions | Use the **Reliable Change Index** || High measurement unreliability | RTM is larger — **improve your instrument first** |> **The naïve before–after study will always overestimate the treatment effect when participants were selected for extreme baseline scores. Every technique above solves this — some by design, some by analysis.**:::---## References- Bland, J. M., & Altman, D. G. (1994). Regression towards the mean. *BMJ*, 308, 1499.- Barnett, A. G., Van Der Pols, J. C., & Dobson, A. J. (2005). Regression to the mean: what it is and how to deal with it. *International Journal of Epidemiology*, 34(1), 215–220.- Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change. *Journal of Consulting and Clinical Psychology*, 59(1), 12–19.- Austin, P. C. (2011). An introduction to propensity score methods for reducing confounding. *Multivariate Behavioral Research*, 46(3), 399–424.