Linear Mixed Effects Models for Repeated Measures in R

Introduction
Data Preparation
Understanding Model Structure
Fitting Different Models
Model Interpretation
Model Comparison
Diagnostics
Predictions and Visualization
Complete Example

Introduction

Linear Mixed Effects Models (LMMs) are ideal for repeated measures data because they: - Handle unbalanced data (missing observations) - Account for correlation between measurements from the same subject - Allow for subject-specific intercepts and slopes - Provide more power than traditional repeated measures ANOVA

Basic Model Structure

Y = Fixed Effects + Random Effects + Error

Fixed Effects: Population-level effects (same for all subjects)
Random Effects: Subject-specific deviations from population effects
Error: Residual variation

Data Preparation

First, let’s set up our environment and prepare the data:

```{r setup, message=FALSE, warning=FALSE} # Load required packages library(lme4) # For fitting mixed models library(lmerTest) # For p-values and degrees of freedom library(ggplot2) # For plotting library(dplyr) # For data manipulation library(broom.mixed) # For tidy model output library(performance) # For model diagnostics library(lattice) # For plotting random effects

Ensure your data is properly formatted

Subject should be a factor

long_data$Subject <- as.factor(long_data$Subject) long_data$Group <- as.factor(long_data$Group)

Check your data structure

str(long_data) head(long_data, 12)


### Creating Sample Data (if needed)
```{r sample-data}
# Create sample wide-format data
set.seed(123)
wide_data <- data.frame(
  Subject = 1:20,
  Group = rep(c("Treatment", "Control"), each = 10),
  Age = sample(18:65, 20),
  Baseline = rnorm(20, 50, 10),
  Week4 = rnorm(20, 52, 10),
  Week8 = rnorm(20, 54, 10),
  Week12 = rnorm(20, 56, 10)
)

# Convert to long format
long_data <- wide_data %>%
  pivot_longer(
    cols = c(Baseline, Week4, Week8, Week12),
    names_to = "TimePoint",
    values_to = "Score"
  ) %>%
  mutate(
    # Create numeric time variable
    Time = case_when(
      TimePoint == "Baseline" ~ 0,
      TimePoint == "Week4" ~ 4,
      TimePoint == "Week8" ~ 8,
      TimePoint == "Week12" ~ 12
    ),
    # Ensure Subject is a factor
    Subject = as.factor(Subject),
    Group = as.factor(Group)
  ) %>%
  arrange(Subject, Time)

head(long_data, 12)

Understanding Model Structure

Random Effects Syntax

(1|Subject): Random intercept only
(Time|Subject): Random intercept and slope (correlated)
(1|Subject) + (0 + Time|Subject): Random intercept and slope (uncorrelated)
(0 + Time|Subject): Random slope only (no random intercept)

Fitting Different Models

Model 1: Random Intercept Only

This allows each subject to have their own baseline level.

```{r model1} # Model 1: Random intercept only # Each subject gets their own baseline (intercept) model1 <- lmer(Score ~ Time * Group + (1|Subject), data = long_data)

summary(model1)


**Interpretation of `(1|Subject)`:**
- Each subject has their own intercept (baseline level)
- These intercepts follow a normal distribution around the population intercept
- Accounts for consistent individual differences

### Model 2: Random Intercept and Slope
This allows both intercepts and slopes to vary by subject.

```{r model2}
# Model 2: Random intercept and random slope for Time
# Each subject has their own baseline AND rate of change over time
model2 <- lmer(Score ~ Time * Group + (Time|Subject), 
               data = long_data)

summary(model2)

Interpretation of (Time|Subject): - Each subject has their own intercept AND slope for Time - Some subjects improve faster/slower than others - Includes correlation between random intercepts and slopes

Model 3: Uncorrelated Random Effects

Forces random intercept and slope to be uncorrelated.

```{r model3} # Model 3: Uncorrelated random intercept and slope model3 <- lmer(Score ~ Time * Group + (1|Subject) + (0 + Time|Subject), data = long_data)

summary(model3)


---

## Model Interpretation

### Fixed Effects Interpretation

```{r fixed-effects}
# Extract and display fixed effects
fixed_effects <- fixef(model2)
print(fixed_effects)

# Get confidence intervals
confint(model2, parm = "beta_")

Example Fixed Effects Interpretation: - (Intercept): Expected score for reference group (Control) at Time = 0 - Time: Rate of change per time unit for reference group - GroupTreatment: Difference between groups at Time = 0 (baseline) - Time:GroupTreatment: Additional rate of change for Treatment group

Random Effects Interpretation

```{r random-effects} # Extract random effects random_effects <- ranef(model2) print(random_effects)

Variance components

VarCorr(model2)


**Random Effects Components:**
- **Subject (Intercept) Variance**: How much subjects vary in baseline scores
- **Subject Time Variance**: How much subjects vary in rate of change
- **Correlation**: Relationship between random intercepts and slopes
- **Residual Variance**: Unexplained variation

---

## Model Comparison

```{r model-comparison}
# Compare models using information criteria
model_comparison <- AIC(model1, model2, model3)
print(model_comparison)

# Likelihood ratio test for nested models
anova(model1, model2)  # Test if random slopes are needed

Diagnostics

Residual Diagnostics

```{r diagnostics} # Basic residual plot plot(model2, main = “Residuals vs Fitted Values”)

Q-Q plot for normality of residuals

qqnorm(resid(model2), main = “Q-Q Plot of Residuals”) qqline(resid(model2))


### Random Effects Diagnostics

```{r random-diagnostics}
# Check normality of random intercepts
qqnorm(ranef(model2)$Subject[,1], main = "Q-Q Plot: Random Intercepts")
qqline(ranef(model2)$Subject[,1])

# Check normality of random slopes
qqnorm(ranef(model2)$Subject[,2], main = "Q-Q Plot: Random Slopes")
qqline(ranef(model2)$Subject[,2])

# Comprehensive model diagnostics
check_model(model2)

Predictions and Visualization

Extracting Predictions

```{r predictions} # Get fitted values for observed data long_data$fitted <- fitted(model2)

Create new data for population-level predictions

new_data <- expand.grid( Time = seq(0, 12, by = 2), Group = c(“Control”, “Treatment”) )

Population-level predictions (fixed effects only)

new_data$predicted <- predict(model2, newdata = new_data, re.form = ~0)

print(new_data)


### Visualization

```{r visualization}
# Plot raw data with model fit
ggplot(long_data, aes(x = Time, y = Score, color = Group)) +
  geom_point(alpha = 0.6, size = 2) +                    # Raw data points
  geom_line(aes(group = Subject), alpha = 0.3) +         # Individual trajectories
  geom_smooth(method = "lm", se = TRUE, size = 1.5) +    # Group trends
  labs(title = "Repeated Measures Data with Model Fit",
       subtitle = "Individual trajectories (thin lines) and group trends (thick lines)",
       x = "Time (weeks)", 
       y = "Score") +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold"),
        legend.position = "bottom")

# Plot population-level predictions
ggplot(new_data, aes(x = Time, y = predicted, color = Group)) +
  geom_line(size = 2) +
  geom_point(size = 3) +
  labs(title = "Population-Level Predictions",
       x = "Time (weeks)", 
       y = "Predicted Score") +
  theme_minimal()

Random Effects Plots

```{r random-plots} # Plot random effects dotplot(ranef(model2), main = “Random Effects by Subject”)

Individual subject trajectories

ggplot(long_data, aes(x = Time, y = Score)) + geom_point() + geom_line() + facet_wrap(~Subject, scales = “free_y”) + labs(title = “Individual Subject Trajectories”) + theme_minimal()


---

## Complete Example

### Final Model and Comprehensive Interpretation

```{r final-example}
# Fit the final model
final_model <- lmer(Score ~ Time * Group + (Time|Subject), 
                    data = long_data)

# Get tidy output
tidy_fixed <- tidy(final_model, effects = "fixed", conf.int = TRUE)
tidy_random <- tidy(final_model, effects = "ran_pars")

print("Fixed Effects:")
print(tidy_fixed)

print("Random Effects:")
print(tidy_random)

# Extract key estimates for interpretation
intercept <- fixef(final_model)[1]
time_effect <- fixef(final_model)[2]
group_effect <- fixef(final_model)[3]
interaction_effect <- fixef(final_model)[4]

# Variance components
var_components <- as.data.frame(VarCorr(final_model))
intercept_sd <- sqrt(var_components$vcov[1])
slope_sd <- sqrt(var_components$vcov[2])
residual_sd <- sqrt(var_components$vcov[3])

Model Interpretation Summary

```{r interpretation-summary} cat(“=== MODEL INTERPRETATION SUMMARY ===”)

cat(“FIXED EFFECTS:”) cat(“1. Control group baseline (Time = 0):”, round(intercept, 2), “points”) cat(“2. Control group change per week:”, round(time_effect, 2), “points/week”) cat(“3. Treatment advantage at baseline:”, round(group_effect, 2), “points”) cat(“4. Additional treatment benefit per week:”, round(interaction_effect, 2), “points/week”)

cat(“RANDOM EFFECTS (Individual Variation):”) cat(“5. Between-subject SD in baselines:”, round(intercept_sd, 2), “points”) cat(“6. Between-subject SD in slopes:”, round(slope_sd, 2), “points/week”) cat(“7. Residual SD (unexplained variation):”, round(residual_sd, 2), “points”)

cat(“PRACTICAL INTERPRETATION:”) if (group_effect > 0) { cat(“- Treatment group starts”, abs(round(group_effect, 1)), “points higher than Control”) } else { cat(“- Treatment group starts”, abs(round(group_effect, 1)), “points lower than Control”) }

if (interaction_effect > 0) { cat(“- Treatment group improves”, round(interaction_effect, 2), “points/week faster than Control”) } else if (interaction_effect < 0) { cat(“- Treatment group improves”, abs(round(interaction_effect, 2)), “points/week slower than Control”) } else { cat(“- Both groups improve at the same rate over time”) }

cat(“- Substantial individual differences in baselines (SD =”, round(intercept_sd, 1), “)”) cat(“- Individual differences in improvement rates (SD =”, round(slope_sd, 2), “)”)


### Model Assumptions Check

```{r assumptions}
cat("=== MODEL ASSUMPTIONS CHECKLIST ===\n\n")

# 1. Linearity
cat("1. LINEARITY: Check residuals vs fitted plot for patterns\n")

# 2. Normality of residuals
shapiro_resid <- shapiro.test(resid(final_model))
cat("2. RESIDUAL NORMALITY: Shapiro-Wilk p-value =", 
    round(shapiro_resid$p.value, 4), "\n")

# 3. Normality of random effects
shapiro_intercepts <- shapiro.test(ranef(final_model)$Subject[,1])
shapiro_slopes <- shapiro.test(ranef(final_model)$Subject[,2])
cat("3. RANDOM INTERCEPTS NORMALITY: Shapiro-Wilk p-value =", 
    round(shapiro_intercepts$p.value, 4), "\n")
cat("4. RANDOM SLOPES NORMALITY: Shapiro-Wilk p-value =", 
    round(shapiro_slopes$p.value, 4), "\n")

# 5. Homoscedasticity
cat("5. HOMOSCEDASTICITY: Check residuals plot for equal variance\n")

cat("\nNote: p > 0.05 suggests assumption is met for normality tests\n")

Key Takeaways

When to Use Different Random Effects Structures:

Random Intercept Only (1|Subject):
- Use when subjects differ in baseline but change at same rate
- Computationally simpler
- Good starting point
Random Intercept and Slope (Time|Subject):
- Use when subjects differ in both baseline and rate of change
- More realistic for most longitudinal data
- May have convergence issues with small samples
Uncorrelated Random Effects (1|Subject) + (0 + Time|Subject):
- Use when random intercepts and slopes are not correlated
- Helpful if correlated model doesn’t converge

Model Selection Strategy:

Start with random intercept model
Add random slopes if theoretically justified
Compare models using likelihood ratio tests
Check convergence and diagnostics
Choose most parsimonious model that fits well

Interpretation Guidelines:

Fixed effects tell you about population-average effects
Random effects tell you about individual variability
Interactions show whether groups change differently over time
Always check model assumptions and diagnostics
Consider practical significance, not just statistical significance

Additional Resources

lme4 package documentation
Mixed Effects Models in R
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1-48.