Exercise 1, a typical RM-ANOVA scenario

1(a):

	numDF	F-value	p-value
treat	3	320.210	0.000
time	3	3.175	0.027
treat:time	6	0.585	0.742

	Value	Std.Error	t-value	p-value
treatA	18.528	1.763	10.512	0.000
treatB	18.045	1.849	9.762	0.000
treatC	20.115	1.949	10.323	0.000
time2mid	-1.618	2.493	-0.649	0.518
time3post	-3.594	2.493	-1.442	0.152
time4followup	-4.114	2.493	-1.650	0.102
treatB:time2mid	-0.810	3.612	-0.224	0.823
treatC:time2mid	0.341	3.716	0.092	0.927
treatB:time3post	-2.072	3.612	-0.574	0.567
treatC:time3post	1.053	3.716	0.283	0.777
treatB:time4followup	-2.763	3.612	-0.765	0.446
treatC:time4followup	3.779	3.716	1.017	0.311

Examining the interaction plot, treatment groups A and B begin with similar pre-treatment depression scores (around 18), while group C starts slightly higher (around 20). All three groups exhibit a decrease in scores from pre to post-treatment (1pre \(\rightarrow\) 3post). However, between post-treatment and follow-up (3post \(\rightarrow\) 4followup), the trajectories of the three groups diverge. Group C’s scores increase toward baseline while groups A and B continue to decline, though more gradually.

Each group’s average response over time appears vertically separated from the others. Participants in group C consistently report the highest depression scores, while participants in groups A and B report the second and third highest scores, respectively. Based on these observations, I would suspect a marginal treatment effect.

Despite group C’s divergence at follow-up, we initially see a near-parallel downward trend across all groups, especially from pre to post. This would suggest a marginal time effect, as all groups saw an average decrease in depression scores over time. Conversely, this consistent trend argues against a strong interaction effect. Group C’s divergence may provide some evidence of one, but I don’t believe it will be enough. Thus, based on the interaction plot, I would expect to see evidence of main effects for both time and treatment, but likely no significant interaction effect.

Analysis of the naive two-way ANOVA results reveals strong evidence for a marginal effect of treatment (F-value = 320.2, p < 0.0001), weak but “significant” evidence for a marginal effect of time (F-value = 3.2, p = 0.0271), and no evidence of an interaction effect (F-value = 0.585, p = 0.7418). All of these outputs align with my interpretation of the interaction plot. The AIC and BIC are 741.4738 and 776.3415, respectively.

1(b):

ANOVA Output from fitcs1
	numDF	F-value	p-value
treat	3	105.935	0.000
time	3	9.745	0.000
treat:time	6	1.795	0.107

ANOVA Output from fitar1
	numDF	F-value	p-value
treat	3	82.177	0.000
time	3	11.411	0.000
treat:time	6	2.122	0.057

ANOVA Output from fitma1
	numDF	F-value	p-value
treat	3	274.404	0.000
time	3	3.632	0.015
treat:time	6	0.905	0.494

In the compound symmetry model (fitcs1), we find strong evidence of a marginal effect of treatment (F = 105.9, p < 0.0001) and time (F = 9.75, p < 0.0001), but not for an interaction effect (F = 1.80, p = 0.1068). In the autoregressive of order 1 model (fitar1), both treatment (F = 82.18, p < 0.0001) and time (F = 11.41, p < 0.0001) effects remain significant, and the interaction effect approaches weak significance (F = 2.12, p = 0.0566). Lastly in the moving average of order 1 model (fitma1), we again observe strong evidence for treatment (F = 274.4, p < 0.0001), weak/moderate evidence for time (F = 3.63, p = 0.0153), but no evidence for an interaction effect (F = 0.91, p = 0.4942).

Across all models, the treatment effect remains highly significant and consistent. However, the strength of statistical significance of the time and interaction effects varies depending on the assumed residual var/covar structure. All three var/covar-adjusted models yield lower p-values for the time and interaction terms compared to the naive model in part (a), suggesting that accounting for dependency in the residuals increases power. The output of model fitma1 leads to an interpretation similar to that of the naive ANOVA from part (a).

1(c):

Compound Symmetry
	Value	Std.Error	t-value	p-value
treatA	18.528	1.763	10.512	0.000
treatB	18.045	1.849	9.762	0.000
treatC	20.115	1.949	10.323	0.000
time2mid	-1.618	1.423	-1.138	0.258
time3post	-3.594	1.423	-2.526	0.013
time4followup	-4.114	1.423	-2.891	0.005
treatB:time2mid	-0.810	2.062	-0.393	0.695
treatC:time2mid	0.341	2.121	0.161	0.872
treatB:time3post	-2.072	2.062	-1.005	0.317
treatC:time3post	1.053	2.121	0.497	0.620
treatB:time4followup	-2.763	2.062	-1.340	0.183
treatC:time4followup	3.779	2.121	1.782	0.078

AR(1)
	Value	Std.Error	t-value	p-value
treatA	18.528	1.906	9.720	0.000
treatB	18.045	1.999	9.026	0.000
treatC	20.115	2.107	9.545	0.000
time2mid	-1.618	0.815	-1.985	0.050
time3post	-3.594	1.126	-3.191	0.002
time4followup	-4.114	1.348	-3.052	0.003
treatB:time2mid	-0.810	1.181	-0.686	0.494
treatC:time2mid	0.341	1.215	0.281	0.779
treatB:time3post	-2.072	1.632	-1.270	0.207
treatC:time3post	1.053	1.679	0.628	0.532
treatB:time4followup	-2.763	1.953	-1.414	0.160
treatC:time4followup	3.779	2.009	1.881	0.063

MA(1)
	Value	Std.Error	t-value	p-value
treatA	18.528	1.489	12.441	0.000
treatB	18.045	1.562	11.552	0.000
treatC	20.115	1.647	12.217	0.000
time2mid	-1.618	1.489	-1.087	0.280
time3post	-3.594	2.106	-1.706	0.091
time4followup	-4.114	2.106	-1.953	0.053
treatB:time2mid	-0.810	2.158	-0.376	0.708
treatC:time2mid	0.341	2.220	0.154	0.878
treatB:time3post	-2.072	3.052	-0.679	0.499
treatC:time3post	1.053	3.140	0.335	0.738
treatB:time4followup	-2.763	3.052	-0.905	0.367
treatC:time4followup	3.779	3.140	1.204	0.231

AIC BIC Model Comparisons
Model	AIC	BIC
Naive ANOVA	741.474	776.342
Compound Symmetry	682.491	720.041
AR(1)	618.984	656.533
MA(1)	675.687	713.237

Model Lags
Model	Lag.1	Lag.2	Lag.3
Compound Symmetry	0.670	0.670	0.67
AR(1)	0.909	0.826	0.75
MA(1)	0.500	0.000	0.00

It is clear from the AIC and BIC values that the AR(1) model (fitar1) provides the best fit to the data. Comparing autocorrelations across models highlights why this is the case. The compound symmetry (CS) model assumes a constant correlation of 0.67 across all lags, regardless of time separation. The moving average (MA(1)) model estimates a moderate autocorrelation of 0.50 at lag 1, but drops to zero for lags 2 and beyond. In contrast, the AR(1) model captures a more theoretically realistic, gradually decaying correlation structure, estimating autocorrelations of approximately 0.91 at lag 1, 0.83 at lag 2, and 0.75 at lag 3. The notably high lag-1 autocorrelation suggests a strong temporal dependency between repeated measures, which neither the CS nor MA(1) structures adequately capture.
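The three correlation patterns described above can be sketched in a few lines of Python. This is only an illustration: the function name `implied_correlations` is hypothetical, and the inputs are the rounded lag-1 values from the Model Lags table, so the AR(1) lag-3 value may differ from the table in the third decimal.

```python
def implied_correlations(structure, rho, max_lag=3):
    """Correlation between residuals k occasions apart, for k = 1..max_lag.

    `rho` is the lag-1 correlation taken from the Model Lags table.
    """
    lags = range(1, max_lag + 1)
    if structure == "cs":    # compound symmetry: same correlation at every lag
        return [rho for _ in lags]
    if structure == "ar1":   # AR(1): geometric decay, rho ** k
        return [rho ** k for k in lags]
    if structure == "ma1":   # MA(1): nonzero at lag 1 only
        return [rho if k == 1 else 0.0 for k in lags]
    raise ValueError(f"unknown structure: {structure}")


for name, rho in [("cs", 0.670), ("ar1", 0.909), ("ma1", 0.500)]:
    print(name, [round(r, 3) for r in implied_correlations(name, rho)])
```

The AR(1) row shows why a lag-1 correlation near 0.91 still implies a correlation of about 0.75 three occasions apart, something neither the flat CS pattern nor the truncated MA(1) pattern can reproduce.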

Conventional wisdom holds that the choice of residual correlation structure has minimal impact in repeated measures designs with few within-person observations, largely because such datasets typically exhibit weak dependence or near-independence between time points. However, this dataset violates that expectation due to its unusually strong autoregressive pattern. The high autocorrelation observed here means that properly modeling the decay of correlation over time is critical for accurately estimating fixed effects and variance components. Additionally, a potential violation of weak stationarity could contribute to this effect.

1(d)

##    
##     1pre 2mid 3post 4followup
##   A   11   10    10         9
##   B   10   10     9         8
##   C    9    8     8         6

ANOVA Dropouts (fit2)
	numDF	F-value	p-value
treat	3	429.357	0.000
time	3	3.191	0.027
treat:time	6	0.756	0.606

All three groups see individuals drop out during the study. Group A loses two individuals, the first between 1pre \(\rightarrow\) 2mid and the second between 3post \(\rightarrow\) 4followup. Group B also loses two individuals, the first between 2mid \(\rightarrow\) 3post and the second between 3post \(\rightarrow\) 4followup. Group C loses three participants, the most of any group: the first between 1pre \(\rightarrow\) 2mid and the remaining two between 3post \(\rightarrow\) 4followup.
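The dropout pattern can be read off the count table with a short Python sketch. It assumes monotone dropout (nobody re-enters after missing a wave), which the table alone cannot confirm; the helper name `dropouts_by_interval` is hypothetical.

```python
# Observed participants per group at each wave, copied from the table above.
waves = ["1pre", "2mid", "3post", "4followup"]
counts = {
    "A": [11, 10, 10, 9],
    "B": [10, 10, 9, 8],
    "C": [9, 8, 8, 6],
}


def dropouts_by_interval(n_per_wave):
    """Number of participants lost between each pair of consecutive waves."""
    return [n_per_wave[i] - n_per_wave[i + 1] for i in range(len(n_per_wave) - 1)]


dropouts = {group: dropouts_by_interval(n) for group, n in counts.items()}

for group, lost in dropouts.items():
    intervals = [f"{waves[i]}->{waves[i + 1]}: {k}" for i, k in enumerate(lost)]
    print(group, "total lost =", sum(lost), "|", ", ".join(intervals))
```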

These dropouts led to noticeable changes in the interaction plot, particularly for Groups A and C during the 3post \(\rightarrow\) 4followup phase. For Group C, follow-up scores decreased from approximately 20 to 18, shifting the interpretation from a full return to baseline (as seen in part (a)) to a modest post-treatment reduction. More strikingly, Group A’s follow-up levels increased from about 15 to 18, suggesting a near return to pre-treatment levels. This interpretation differs markedly from part (a), where follow-up scores remained lower. The shift highlights how sensitive the trajectory is to the dropout of a single participant, likely an outlier, in this phase.

In contrast, Group B showed minimal change, with 3post levels decreasing slightly, resulting in an interpretation largely consistent with part (a).

Based on the plot alone, I would still not expect a strong interaction effect to emerge. I would still expect a treatment effect, but slightly weakened as group means now look more similar overall. I would also still expect to see a marginal effect of time, though slightly weakened as groups now show a trend toward baseline.

That said, without confidence intervals/error bars, it is difficult to assess whether these visual differences are meaningful. Moreover, the reduced sample size due to dropout likely increases standard errors, decreasing our power to detect significant effects and making any conclusions more uncertain.

1(e):

ANOVA Results Grouped by Effect
Effect	Model	F-value	p-value
time	fitar1	11.411	0.000
time	fitar2	12.063	0.000
time	fitcs1	9.745	0.000
time	fitcs2	10.338	0.000
time	fitma1	3.632	0.015
time	fitma2	3.886	0.011
treat	fitar1	82.177	0.000
treat	fitar2	126.483	0.000
treat	fitcs1	105.935	0.000
treat	fitcs2	156.030	0.000
treat	fitma1	274.404	0.000
treat	fitma2	365.794	0.000
treat:time	fitar1	2.122	0.057
treat:time	fitar2	1.873	0.093
treat:time	fitcs1	1.795	0.107
treat:time	fitcs2	1.986	0.075
treat:time	fitma1	0.905	0.494
treat:time	fitma2	0.890	0.505

Across these six models, there is consistent and strong evidence for a marginal treatment effect. Every model, regardless of covariance structure or dataset, reports highly significant F-values for the treatment effect (all p < 0.0001), with F-values ranging from 82.18 to 365.79. This indicates that, on average, there are clear differences in response between treatment groups.

Similarly, there is evidence for a marginal time effect in every model, though its strength varies. In particular, models fitma1 and fitma2 show weaker evidence (F = 3.63, p = 0.015 and F = 3.89, p = 0.011, respectively). All other models show strong evidence, with F-values for time ranging from 9.75 to 12.06 (all p < 0.0001).

In contrast, the interaction effect shows little to no evidence of significance in any model. Both AR(1) models and fitcs2 provide only very weak evidence for an interaction effect (p = 0.057, p = 0.093, and p = 0.075, respectively), and no model reaches the 0.05 level. This suggests that the pattern of change over time is generally similar across treatment groups.

Notably, the AR(1) and CS models produce considerably higher F-values for the time and interaction effects than the MA(1) models (roughly two to three times as large), but these differences do not alter the overall interpretation. The dropout-data models (fitar2, fitcs2, fitma2) produce similar patterns to the complete-data models, indicating that the loss of participants did not drastically change the conclusions.

Despite the sample reduction, both marginal effects remain consistently significant across models, suggesting that the treatment and time effects are robust. RMdat2 models still fail to provide persuasive evidence against the null for an interaction effect.

Bonus Interpretation (plz don’t grade just curious if this is true)

The F-values for the marginal effects of treatment and time are consistently higher in the dropout data set than in the complete data set, which is counterintuitive given the expected reduction in power with fewer observations. One possible explanation is that the unbalanced dropout disproportionately affected one group. For example, losing an outlier in Group A may have reduced within-group variability, making between-group contrasts more apparent.
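As a quick check of the pattern described here, the following Python sketch computes the ratio of each dropout-data F-value to its complete-data counterpart, using the values from the grouped ANOVA table in 1(e). The ratios are purely descriptive: the two sets of models are fit to different data, so this is not a formal comparison.

```python
# (complete-data F, dropout-data F) for each effect and covariance structure,
# copied from the "ANOVA Results Grouped by Effect" table.
f_values = {
    ("treat", "AR(1)"): (82.177, 126.483),
    ("treat", "CS"): (105.935, 156.030),
    ("treat", "MA(1)"): (274.404, 365.794),
    ("time", "AR(1)"): (11.411, 12.063),
    ("time", "CS"): (9.745, 10.338),
    ("time", "MA(1)"): (3.632, 3.886),
    ("treat:time", "AR(1)"): (2.122, 1.873),
    ("treat:time", "CS"): (1.795, 1.986),
    ("treat:time", "MA(1)"): (0.905, 0.890),
}

# Ratio > 1 means the F-value is larger in the dropout data set.
ratios = {key: dropout / complete for key, (complete, dropout) in f_values.items()}

for (effect, structure), ratio in ratios.items():
    print(f"{effect:<11}{structure:<7}ratio = {ratio:.2f}")
```

Under these numbers the treatment ratios are well above 1 in every structure, the time ratios only slightly so, and the interaction ratios fall on both sides of 1.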

1(f):

Compound Symmetry
	Value	Std.Error	t-value	p-value
treatA	18.528	1.471	12.598	0.000
treatB	18.045	1.543	11.698	0.000
treatC	20.115	1.626	12.371	0.000
time2mid	-1.169	1.246	-0.938	0.351
time3post	-2.685	1.246	-2.154	0.034
time4followup	-1.442	1.291	-1.117	0.267
treatB:time2mid	-1.260	1.769	-0.712	0.478
treatC:time2mid	-0.212	1.868	-0.113	0.910
treatB:time3post	-3.421	1.804	-1.896	0.061
treatC:time3post	-0.018	1.868	-0.009	0.993
treatB:time4followup	-5.399	1.873	-2.883	0.005
treatC:time4followup	-0.190	2.006	-0.095	0.925

AR(1)
	Value	Std.Error	t-value	p-value
treatA	18.528	1.567	11.823	0.000
treatB	18.045	1.644	10.979	0.000
treatC	20.115	1.732	11.611	0.000
time2mid	-1.246	0.775	-1.608	0.111
time3post	-2.727	1.062	-2.567	0.012
time4followup	-1.994	1.287	-1.549	0.125
treatB:time2mid	-1.183	1.097	-1.078	0.284
treatC:time2mid	-0.129	1.162	-0.111	0.912
treatB:time3post	-3.314	1.527	-2.171	0.032
treatC:time3post	0.028	1.593	0.018	0.986
treatB:time4followup	-5.034	1.862	-2.704	0.008
treatC:time4followup	0.786	1.970	0.399	0.691

MA(1)
	Value	Std.Error	t-value	p-value
treatA	18.528	1.265	14.644	0.000
treatB	18.045	1.327	13.599	0.000
treatC	20.115	1.399	14.381	0.000
time2mid	-1.110	1.312	-0.846	0.400
time3post	-2.451	1.833	-1.337	0.184
time4followup	-1.192	1.867	-0.639	0.525
treatB:time2mid	-1.319	1.866	-0.707	0.481
treatC:time2mid	-0.276	1.965	-0.140	0.889
treatB:time3post	-3.603	2.648	-1.361	0.177
treatC:time3post	-0.271	2.742	-0.099	0.922
treatB:time4followup	-5.662	2.712	-2.088	0.039
treatC:time4followup	-0.326	2.846	-0.114	0.909

AIC BIC Model Comparisons
Model	AIC	BIC
Compound Symmetry	580.659	616.560
AR(1)	535.739	571.640
MA(1)	575.710	611.611

Across the three models, AR(1) provides the best fit (AIC = 535.74, BIC = 571.64), suggesting it captures the underlying correlation structure of the residuals more efficiently than compound symmetry (AIC = 580.66, BIC = 616.56) or MA(1) (AIC = 575.71, BIC = 611.61). Because AR(1) allows correlations to decay over time, it weights repeated observations according to how far apart they are rather than treating every pair alike. This appears to be a valuable property in unbalanced designs, where the number of measurements varies across individuals. The CS structure, which assumes equal correlation between all pairs of occasions, can bias estimates if dropouts are more common at specific time points, leading to inflated or deflated group means. Similarly, the MA(1) structure may not fully capture the longer autocorrelation patterns in the data.
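To put the size of these AIC gaps in perspective, the sketch below computes AIC differences and Akaike weights from the tabulated values. It assumes the three models were fit to the same data with the same estimation method, which the output alone does not show; the function names are hypothetical.

```python
import math

# AIC values for the dropout-data models, from the comparison table in 1(f).
aic = {"CS": 580.659, "AR(1)": 535.739, "MA(1)": 575.710}


def aic_deltas(aic_by_model):
    """Difference between each model's AIC and the smallest AIC."""
    best = min(aic_by_model.values())
    return {model: value - best for model, value in aic_by_model.items()}


def akaike_weights(aic_by_model):
    """Relative support for each model, exp(-delta / 2) normalised to sum to 1."""
    raw = {m: math.exp(-0.5 * d) for m, d in aic_deltas(aic_by_model).items()}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}


deltas = aic_deltas(aic)
weights = akaike_weights(aic)
for model in aic:
    print(f"{model:<6} delta AIC = {deltas[model]:6.2f}  weight = {weights[model]:.3g}")
```

Differences of about 40 to 45 AIC units leave essentially all of the weight on AR(1), which matches the conclusion drawn from the table.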

Overall, the point estimates for the fixed effects are quite consistent across all three models (CS, AR(1), MA(1)), indicating that the overall trends in treatment, time, and interaction effects are robust to the choice of residual correlation structure. All models provide strong evidence for a marginal treatment effect. Given its lower AIC/BIC values, it is not surprising that model fitar2 produces the smallest standard errors, and hence the narrowest confidence intervals, for every time and interaction term (the MA(1) model has the smallest standard errors for the three treatment terms). However, there are subtle differences in the precision of estimates:

Time effects:

3post: The CS and AR(1) models reach weak significance (p = 0.03 and p = 0.01, respectively), indicating a significant change in Group A’s depression scores following treatment compared to baseline. In contrast, the MA(1) model’s 95% CI spans zero [-6.04, 1.14], indicating no significant difference between responses at these time points.

Interaction effects:

treatB:time3post: Only the AR(1) model provides evidence at the classical significance threshold (p = 0.032) that the change from baseline to post-treatment in Group B differs from the same change in Group A. The CS model provides weak, non-significant evidence (p = 0.061), while the MA(1) model again provides no evidence (p = 0.177).
treatB:time4followup: The CS and AR(1) models provide moderate evidence (p = 0.005 and p = 0.008, respectively) that the change from baseline to follow-up in Group B differs from the same change in Group A, while the MA(1) model provides weak evidence (p = 0.039).

Other interaction terms remain non-significant across all models, though the exact estimates and t-values fluctuate depending on the covariance structure.
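The interval quoted for the MA(1) 3post term can be reproduced from the tabulated estimate and standard error. The sketch below uses a normal-quantile (Wald) interval, which is an assumption: intervals computed from t quantiles with the model's degrees of freedom would be slightly wider. The helper name `wald_ci` is hypothetical.

```python
from statistics import NormalDist


def wald_ci(estimate, std_error, level=0.95):
    """Normal-approximation confidence interval: estimate +/- z * SE."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


# (estimate, standard error) for time3post in the dropout-data models of 1(f).
time3post = {
    "CS": (-2.685, 1.246),
    "AR(1)": (-2.727, 1.062),
    "MA(1)": (-2.451, 1.833),
}

intervals = {model: wald_ci(est, se) for model, (est, se) in time3post.items()}
for model, (lower, upper) in intervals.items():
    verdict = "spans zero" if lower < 0 < upper else "excludes zero"
    print(f"{model:<6} [{lower:.2f}, {upper:.2f}]  {verdict}")
```

The point estimates are close across the three structures; it is the standard error that decides whether each interval excludes zero.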

These patterns reinforce that while fixed effect estimates may not shift dramatically, their associated uncertainty and interpretability do depend on the assumed covariance structure. The AR(1) model appears to offer the most efficient and plausible estimates in the context of unbalanced, longitudinal data.

Exercise 2, an interrupted time series analysis

2(a):

\[y_{ij} \mid d_i \sim \textrm{N}(\beta_0 + \beta_1 \text{time} + \beta_2 \text{treatment} + \beta_3 \text{time} \times \text{treatment} + d_i, \sigma^2),\]

\[y_{ij} \mid d_i \sim \textrm{N}(\beta_0 + \beta_1 \text{time} + d_i, \ \sigma^2) \tag{1}\]

Expression (1) describes the pre-treatment phase (treatment = 0) linear relationship between time and response.

\[y_{ij} \mid d_i \sim \textrm{N}((\beta_0 + \beta_2) + (\beta_1 + \beta_3) \text{time} + d_i, \ \sigma^2) \tag{2}\]

Expression (2) describes the treatment phase (treatment = 1) linear relationship between time and response.

\[(\beta_1 + \beta_3) - \beta_1 = \beta_3 \tag{3}\]

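The \(\beta_3\) parameter describes the trend change between the pre-intervention and treatment phases. This was determined by taking the difference in slope between the treatment (\(\beta_1 + \beta_3\)) and pre-treatment (\(\beta_1\)) phases, as shown in (3). The level change is given by the \(\beta_2\) term, the difference between the intercepts of (2) and (1); note that this is the gap between the two phase lines at time = 0, so its interpretation depends on how time is coded.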

2(b):

Random Intercept
	Value	Std.Error	DF	t-value	p-value
(Intercept)	30.182	3.279	137	9.204	0.000
time	-0.896	0.606	137	-1.477	0.142
treatment	-21.278	4.389	137	-4.848	0.000
time:treatment	3.282	0.689	137	4.763	0.000

Random Slope and Random Intercept
	Value	Std.Error	DF	t-value	p-value
(Intercept)	30.182	3.498	137	8.629	0.000
time	-0.896	0.598	137	-1.498	0.136
treatment	-21.278	2.882	137	-7.382	0.000
time:treatment	3.282	0.453	137	7.253	0.000

Random Intercept AR(1)
	Value	Std.Error	t-value	p-value
(Intercept)	30.154	3.980	7.576	0.000
time	-0.822	0.684	-1.201	0.232
treatment	-19.435	6.227	-3.121	0.002
time:treatment	3.053	0.938	3.256	0.001

Model Comparisons
Model	AIC	BIC	B_3	p.value
mod.hlm1	1077.576	1095.478	3.282	0.000
mod.hlm2	991.420	1015.289	3.282	0.000
mod.ar	939.377	957.279	3.053	0.001

All models estimate a positive trend change near 3 with strong to moderate evidence against the null. This suggests that the treatment phase is associated with a steeper positive slope in response over time compared to the pre-treatment phase.

Both the random intercept model (mod.hlm1) and the random slope model (mod.hlm2) provide strong evidence for a positive trend change of 3.28 (p < 0.0001). The random intercept model that assumes an AR(1) residual structure estimates a slightly lower value of 3.05 with moderate evidence (p = 0.0014). Despite the smaller estimate, this model (mod.ar) provides a better fit to the data based on AIC and BIC (AIC = 939.38, BIC = 957.28) compared to mod.hlm1 (AIC = 1077.58, BIC = 1095.48) and mod.hlm2 (AIC = 991.42, BIC = 1015.29).
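To connect these estimates back to expressions (1) and (2) in part 2(a), the sketch below computes the implied intercept and slope of each phase from the fixed-effect tables in 2(b). The dictionary keys `b0` to `b3` and the helper `phase_lines` are hypothetical names for \(\beta_0\) to \(\beta_3\); the random effects \(d_i\) are ignored, so these are population-average lines.

```python
# Fixed-effect estimates from the three model tables in 2(b).
fixed_effects = {
    "mod.hlm1": {"b0": 30.182, "b1": -0.896, "b2": -21.278, "b3": 3.282},
    "mod.hlm2": {"b0": 30.182, "b1": -0.896, "b2": -21.278, "b3": 3.282},
    "mod.ar": {"b0": 30.154, "b1": -0.822, "b2": -19.435, "b3": 3.053},
}


def phase_lines(b):
    """(intercept, slope) of the mean line in each phase."""
    return {
        "pre": (b["b0"], b["b1"]),                          # expression (1)
        "treatment": (b["b0"] + b["b2"], b["b1"] + b["b3"]),  # expression (2)
    }


lines = {model: phase_lines(b) for model, b in fixed_effects.items()}
for model, phases in lines.items():
    pre_slope = phases["pre"][1]
    trt_slope = phases["treatment"][1]
    print(f"{model:<9} pre slope = {pre_slope:.3f}, treatment slope = {trt_slope:.3f}, "
          f"difference = {trt_slope - pre_slope:.3f}")
```

In every model the estimated slope changes sign, from slightly negative before the intervention to clearly positive during treatment, and the difference between the two slopes recovers \(\beta_3\).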

2(c):

These plots reveal important heterogeneity in individual responses to the treatment over time. Broadly, we observe three distinct temporal trends across participants:

1. Initial decline followed by a post-treatment increase (Ind. 1, 3, 4, 8, and 10):

These trajectories show a clear trend change, consistent with the theoretical structure of the study: a flat or declining pre-intervention phase followed by a strong positive slope after treatment begins. This is the most common pattern, seen in five participants.

2. Continuous increase across both phases (Ind. 6 and 9):

These participants exhibit a monotonic increase in responses across the entire study period, without an apparent breakpoint at the time of intervention. For these individuals, the treatment may have simply accelerated an already increasing response pattern, or there may be no meaningful trend change at all.

3. Continuous decline across both phases (Ind. 2 and 5):

These individuals show a monotonic decrease, suggesting either non-responsiveness to the treatment or a confounding process overriding any treatment effect. Their responses contrast sharply with those in trends 1 and 2.

2(d):

## 
## Multivariate Meta-Analysis Model (k = 10; method: REML)
## 
##   logLik  Deviance       AIC       BIC      AICc   
## -20.1216   40.2431   44.2431   44.6376   46.2431   
## 
## Variance Components:
## 
##             estim    sqrt  nlvls  fixed  factor 
## sigma^2    2.8409  1.6855     10     no     Ind 
## 
## Test for Heterogeneity:
## Q(df = 9) = 23.5579, p-val = 0.0051
## 
## Model Results:
## 
## estimate      se    zval    pval   ci.lb   ci.ub      
##   3.3761  0.7532  4.4820  <.0001  1.8997  4.8524  *** 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This person-specific random-effects meta-analysis revealed a significant average treatment effect (p < 0.0001), with a pooled estimate of 3.38 and 95% CI [1.90, 4.85]. This suggests that individuals on average experienced a positive change in slope following the intervention (\(\beta_3\)). However, the analysis also identified substantial heterogeneity across individuals, as indicated by a between-person variance of 2.84 and a significant test for heterogeneity (p = 0.0051). The forest plot visually illustrates this variability. Some individuals (e.g., Ind 1 and Ind 10) show strong, clearly positive effects, while others exhibit wide confidence intervals overlapping zero. This is reflected in the prediction interval (dashed line), which overlaps zero: a new individual drawn from the same population would be expected to have a trend change between roughly -0.1 and 6. Importantly, this reflects the heterogeneity in individual-level trends seen in part (c). Overall, while the average effect is positive, these findings emphasize the importance of accounting for individual differences when evaluating treatment impact.
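The pooled confidence interval can be checked directly from the printed estimate and standard error, and the Q statistic gives a rough sense of how much of the variability reflects heterogeneity. The sketch below is a hand calculation, not part of the model output: the Q-based I² formula, (Q - df) / Q, is a standard descriptive measure that the fitted model does not itself report, so treat it as an approximation.

```python
from statistics import NormalDist

# Values copied from the meta-analysis output in 2(d).
estimate, se = 3.3761, 0.7532
q_stat, q_df = 23.5579, 9

# 95% Wald interval for the pooled trend change.
z = NormalDist().inv_cdf(0.975)
ci = (estimate - z * se, estimate + z * se)

# Rough Q-based I^2: share of total variability attributed to heterogeneity.
i_squared = max(0.0, (q_stat - q_df) / q_stat)

print(f"z statistic  = {estimate / se:.3f}")
print(f"95% CI       = [{ci[0]:.3f}, {ci[1]:.3f}]")
print(f"I^2 (approx) = {100 * i_squared:.0f}%")
```

The small differences from the printed output in the last decimal come from working with rounded inputs.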

chb_HW_3

2025-04-17