HW4

Author

Dan Reef

Imagine you are consulting for a research team analyzing data from AIDS patients with advanced immune suppression (CD4 ≤ 50 cells/mm³).

Patients received 1 of several possible HIV treatments,

and CD4 counts were tracked over time.

Your primary goal is to evaluate whether the combination of zidovudine + zalcitabine offers benefits over other regimens, accounting for repeated measures within patients.

As the study biostatistician, your team is relying on you to apply appropriate statistical methods, interpret findings in a clinical context, and guide decision-making. Complete the tasks below, ensuring clear explanations accompany your statistical output. Please note, the dataset is in person-period (long) format, which is the appropriate structure for multilevel modeling.


Variable	Description	Coding/Notes

id	Patient id

age	Age in years	Continuous (years)

sex	Sex assigned at birth	0=female, 1=male

week	Number of weeks since study enrollment	Continuous (weeks)

cd4	Patient CD4 cell count	Continuous (count)

logcd4	Natural logarithm of the patient’s CD4 cell count (cd4), calculated as log(cd4 + 1) to handle zero values and ensure the transformation is defined for all non-negative counts.	Continuous (log count)

trt	Treatment group	0=other HIV treatment regimens, 1=zidovudine plus 2.25mg of zalcitabine

Part I: Data Checking, Descriptives, and Preparation (33 points)

As part of the data cleaning process, please complete the steps below.

1. Create a comprehensive descriptive table (Table 1) summarizing patient demographics (age, sex), treatment assignment, and baseline CD4 counts. (Hint: you’ll need to create a variable to select only one case per patient) (4 points)

Use the first measurement from each subject to create a comprehensive descriptive table summarizing patient demographics
(age, sex), treatment assignment, and baseline CD4 counts.

Table 1 shows baseline characteristics for the study cohort.

Code

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Code

library(readxl)
library(skimr)
library(glue)
library(forcats)
library(gtsummary)
library(gt)
library(haven)
library(flextable)


Attaching package: 'flextable'

The following object is masked from 'package:gtsummary':

    continuous_summary

The following object is masked from 'package:purrr':

    compose

Code

library(scales)


Attaching package: 'scales'

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor

Code

library(broom)
library(lme4)

Loading required package: Matrix

Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack

Code

library(lmerTest)


Attaching package: 'lmerTest'

The following object is masked from 'package:lme4':

    lmer

The following object is masked from 'package:stats':

    step

Code

library(ggeffects)

cd4_dta_path <- r"(G:\My Drive\UNC\BIOS 642\HW4\cd4-1.dta)"

cd4_dta_df <- read_dta(cd4_dta_path)

cd4_dta_df <- cd4_dta_df %>%
  zap_labels() %>%
  zap_label() %>%
  zap_formats() %>%
  zap_missing() %>%
  mutate(
    sex = factor(sex), 
    sex = case_when(
      sex == 0 ~ "female", 
      sex == 1 ~ "male"), 
    trt = case_when(
      trt == 0 ~ "Other HIV Tx", 
      trt == 1 ~ "Zidovudine/Zalcitabine"))

baseline_rows_df <- cd4_dta_df %>%
  group_by(id) %>%
  slice(1) %>%
  ungroup()

Code

table_1 <- baseline_rows_df %>%
  tbl_summary(
    include = c(age, sex, CD4, trt), 
    by = trt
  ) %>% 
  as_gt()

table_1

Table 1: Baseline Characteristics

Characteristic	Other HIV Tx N = 985¹	Zidovudine/Zalcitabine N = 324¹
age	37 (32, 42)	37 (32, 42)
sex
female	120 (12%)	42 (13%)
male	865 (88%)	282 (87%)
CD4	20 (10, 34)	21 (10, 36)
¹ Median (Q1, Q3); n (%)

2. Compare and contrast the distributions of the cd4 and logcd4. In your answer, make sure to note whether each is sufficiently normally distributed to be used as the dependent variable in a parametric test, presenting at least 3 pieces of statistical or graphical information that support your view. (4 points)

For these two specific questions, treat each row of data as an independent observation, as if each came from a different individual.

Code

cd4_dta_df %>%
  ggplot(aes(x = CD4)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5) + 
  geom_density(color = "black", linewidth = 1) +
  theme_minimal() + 
  theme(
    panel.grid = element_blank(), 
    axis.line = element_line(color = "black")
  )

cd4_dta_df %>%
  ggplot(aes(x = logcd4)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.25) + 
  geom_density(color = "black", linewidth = 1) +
  theme_minimal() + 
  theme(
    panel.grid = element_blank(), 
    axis.line = element_line(color = "black")
  )

Code

cd4_dta_df %>%
  ggplot(aes(sample = CD4)) +
  stat_qq() + 
  stat_qq_line() + 
  theme_minimal() + 
  theme(
    panel.grid = element_blank(), 
    axis.line = element_line(color = "black")
  ) +
  labs(
  x = "Theoretical Quantiles (Normal Distribution)",
  y = "Observed Quantiles of CD4"
  )

cd4_dta_df %>%
  ggplot(aes(sample = logcd4)) +
  stat_qq() + 
  stat_qq_line() + 
  theme_minimal() + 
  theme(
    panel.grid = element_blank(), 
    axis.line = element_line(color = "black")
  ) +
  labs(
  x = "Theoretical Quantiles (Normal Distribution)",
  y = "Observed Quantiles of Log(CD4)"
  )

Code

ks_result <- ks.test(cd4_dta_df$CD4, "pnorm", 
                     mean = mean(cd4_dta_df$CD4, na.rm = TRUE), 
                     sd = sd(cd4_dta_df$CD4, na.rm = TRUE))

Warning in ks.test.default(cd4_dta_df$CD4, "pnorm", mean = mean(cd4_dta_df$CD4,
: ties should not be present for the one-sample Kolmogorov-Smirnov test

Code

D_val <- round(ks_result$statistic, 3)

p_val <- format.pval(ks_result$p.value, digits = 3, scientific = TRUE)

ks_result


    Asymptotic one-sample Kolmogorov-Smirnov test

data:  cd4_dta_df$CD4
D = 0.21656, p-value < 2.2e-16
alternative hypothesis: two-sided

For CD4, the Kolmogorov-Smirnov test indicated a significant departure from normality (D = 0.217, p = <2e-16). However, given the large sample size (n = 5036, even minor deviations may result in statistical significance.

Code

ks_result <- ks.test(cd4_dta_df$logcd4, "pnorm", 
        alternative = "two.sided", 
        mean = mean(cd4_dta_df$logcd4, na.rm = TRUE), 
        sd = sd(cd4_dta_df$logcd4, na.rm = TRUE))

Warning in ks.test.default(cd4_dta_df$logcd4, "pnorm", alternative =
"two.sided", : ties should not be present for the one-sample Kolmogorov-Smirnov
test

Code

D_val <- round(ks_result$statistic, 3)

p_val <- format.pval(ks_result$p.value, digits = 3, scientific = TRUE)

ks_result


    Asymptotic one-sample Kolmogorov-Smirnov test

data:  cd4_dta_df$logcd4
D = 0.06844, p-value < 2.2e-16
alternative hypothesis: two-sided

For log(CD4), the Kolmogorov-Smirnov test indicated a significant departure from normality (D = 0.068, p = <2e-16). However, given the large sample size (n = 5036, even minor deviations may result in statistical significance.

CD4 is not normally distributed. Rather, it is right skewed, as demonstrated by Figure 1 (a) and Figure 2 (a). The Kolmogorov-Smirnov test results outlined above add further evidence for non-normality.

Log(CD4) is also not normally distributed, though it is closer to normal than is CD4. Both the histogram (Figure 1 (b)) and the QQ plot (Figure 2 (b)) demonstrate a heavy left tail and zero-inflation. The Kolmogorov-Smirnov test results outlined above add further evidence for non-normality.

3. Recode the week variable, with patients remaining in the study for more than 8 months (i.e., > 32 weeks) coded as "1", and patients in the study for 8 months or less (i.e., ≤ 32 weeks) coded as "0". (3 points)

Code

cd4_dta_df <- cd4_dta_df %>%
  mutate(
    week_binary = case_when(
      week <= 32 ~ "≤ 32 weeks", 
      week > 32 ~ "> 32 weeks"
    )
  )

cd4_dta_df %>% select(c(week, week_binary)) %>% head(n = 10)

# A tibble: 10 × 2
    week week_binary
   <dbl> <chr>      
 1  0    ≤ 32 weeks 
 2  7.57 ≤ 32 weeks 
 3 15.6  ≤ 32 weeks 
 4 23.6  ≤ 32 weeks 
 5 32.6  > 32 weeks 
 6 40    > 32 weeks 
 7  0    ≤ 32 weeks 
 8  8    ≤ 32 weeks 
 9 16    ≤ 32 weeks 
10 23    ≤ 32 weeks

Due to scaling issues and typos in Part III, please create a new variable called month (where 4 weeks = 1 month) and use it as the time variable in your models for Part III. This adjustment will help avoid scaling inconsistencies and improve the clarity of the results.

Code

cd4_dta_df <- cd4_dta_df %>%
  mutate(
    month = week/4
  )

cd4_dta_df %>% select(c(week, month)) %>% head(n = 10)

# A tibble: 10 × 2
    week month
   <dbl> <dbl>
 1  0     0   
 2  7.57  1.89
 3 15.6   3.89
 4 23.6   5.89
 5 32.6   8.14
 6 40    10   
 7  0     0   
 8  8     2   
 9 16     4   
10 23     5.75

a) Did the average number of days in the study significantly differ between treatment and control groups? How did you come to your conclusion? (3 points)

Code

days_on_study_df <- cd4_dta_df %>% 
  # Get the row of each id with maximum value for week
  group_by(id) %>%
  slice_max(order_by = week, n = 1) %>%
  ungroup() %>% 
  arrange(as.numeric(id)) %>%
  # convert weeks on study to days_on_study
  mutate(days_on_study = week * 7)

t.test.result <- t.test(
  x = days_on_study_df %>% 
    filter(trt == "Zidovudine/Zalcitabine") %>%
    pull(days_on_study), 
  y = days_on_study_df %>% 
    filter(trt == "Other HIV Tx") %>%
    pull(days_on_study)
  )

t_stat <- t.test.result$statistic
df <- t.test.result$parameter
p_val <- t.test.result$p.value
ci <- t.test.result$conf.int

t.test.result


    Welch Two Sample t-test

data:  days_on_study_df %>% filter(trt == "Zidovudine/Zalcitabine") %>% pull(days_on_study) and days_on_study_df %>% filter(trt == "Other HIV Tx") %>% pull(days_on_study)
t = -0.37754, df = 523.86, p-value = 0.7059
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -13.129619   8.896594
sample estimates:
mean of x mean of y 
 185.9413  188.0579

The average number of days on study did not significantly differ between treatment groups. The Welch two-sample t-test yielded a t-statistic of -0.378 with 523.86 degrees of freedom, and a p-value of 0.706. The 95% confidence interval for the difference in means was (-13.13, 8.9). The mean time on study was 185.9 days for the Zidovudine/Zalcitabine group and 188.1 days for the other HIV treatment group.

b) What proportion of patients in the study were in the study more than 8 months? Was there a significant difference between treatment and control groups? How did you come to your conclusion? (3 points)

Code

days_on_study_df <- days_on_study_df %>%
  mutate(
    month_binary = case_when(
      month > 8 ~ ">8 months", 
      month <= 8 ~ "≤8 months"
    ))

prop_by_month_binary_df <- days_on_study_df %>%
  group_by(month_binary) %>%
  summarize(
    n = n(),
    proportion = n / nrow(days_on_study_df)
  )

prop_over_8_months <- prop_by_month_binary_df %>% filter(month_binary == ">8 months") %>% pull(proportion)

prop_8_months_or_below <- prop_by_month_binary_df %>% filter(month_binary == "≤8 months") %>% pull(proportion)

prop_by_month_binary_df

# A tibble: 2 × 3
  month_binary     n proportion
  <chr>        <int>      <dbl>
1 >8 months      607      0.464
2 ≤8 months      702      0.536

Proportion of patients in the study >8 months: 46.37%.

Code

prop.test.result <- prop.test(
  x = c(
    nrow(days_on_study_df %>% filter(trt == "Zidovudine/Zalcitabine", month_binary == ">8 months")), 
    nrow(days_on_study_df %>% filter(trt == "Other HIV Tx", month_binary == ">8 months"))), 
  n = c(
    nrow(days_on_study_df %>% filter(trt == "Zidovudine/Zalcitabine")), 
    nrow(days_on_study_df %>% filter(trt == "Other HIV Tx"))
    )
  )

x2 <- prop.test.result$statistic
df <- prop.test.result$parameter
p.val <- prop.test.result$p.val
ci <- prop.test.result$conf.int
estimates <- prop.test.result$estimate

prop.test.result


    2-sample test for equality of proportions with continuity correction

data:  c(nrow(days_on_study_df %>% filter(trt == "Zidovudine/Zalcitabine", month_binary == ">8 months")), nrow(days_on_study_df %>% filter(trt == "Other HIV Tx", month_binary == ">8 months"))) out of c(nrow(days_on_study_df %>% filter(trt == "Zidovudine/Zalcitabine")), nrow(days_on_study_df %>% filter(trt == "Other HIV Tx")))
X-squared = 0.026063, df = 1, p-value = 0.8717
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.05747088  0.07188461
sample estimates:
   prop 1    prop 2 
0.4691358 0.4619289

The proportion of patients remaining in the study for more than 8 months did not significantly differ between treatment groups. The two-sample test for equality of proportions yielded a chi-squared statistic of 0.026 with 1 degree of freedom, p = 0.872. The estimated proportions of patients in the study >8 months were 0.469 for the Zidovudine/Zalcitabine group and 0.462 for the other HIV treatment group. The 95% confidence interval for the difference in proportions was (-0.057, 0.072).

c) Regardless of whether there was a difference between treatment groups on length of time in the study, provide at least two possible reasons why one group of patients, on average, would remain in the study longer than another. What methods or measures would study researchers assess to determine whether these reasons might be contributing factors? (6 points)

Two possible reasons why one group of patients, on average, would remain in the study longer than another:

(1) One treatment causes more side effects than the other, leading subjects on the treatment arm associated with greater toxicity to drop out sooner due to toxicity. This can be summarized as dropout due to toxicity. Researchers could assess adverse event rates or quality-of-life measures to assess dropout due to toxicity as a possibility.

(2) There is public knowledge suggesting that one treatment arm is superior to the other and treatment is not blinded (either by design or due to flaws in blinding). Subjects on the arm perceived to be inferior, particularly those with means and motivation to seek out alternatives, drop out sooner than those on the arm perceived to be more favorable. This can be summarized as dropout due to disappointment. Researchers could assess concordance between a subject’s guess as to their treatment arm and the correct treatment arm and/or subjects’ satisfaction with their perceived treatment arm to investigate this possibility.

4. Recode the status variable, collapsing the “transplant” and “censored” categories into a single category (transplant or censored = 0) and death as the “failure” (death = 1). Provide a frequency table showing the distribution for the new variable. (2 points)

Please note that Question 4 on Homework 4 is a duplicate of a question from Homework 3. You can safely ignore Question 4 when completing your assignment.

5. Summarize missing data patterns across weeks (log_cd4). (Note: The results might be trivial) (3 points)

Code

sum(is.na(cd4_dta_df$logcd4))

[1] 0

There are no missing values for logcd4.

Part II: OLS vs. Multilevel Modeling (17 points)

6. Fit a linear regression with log_cd4 as the outcome and treatment, mean-centered age, and sex as predictors. Present the treatment effect and discuss its interpretation in terms of CD4 improvement. Hint: Ignore the data clustering for this analysis, treating each case as independent. (6 points)

For these two specific questions, treat each row of data as an independent observation, as if each came from a different individual.

Code

cd4_dta_df <- cd4_dta_df %>%
  mutate(age_centered = scale(age, center = TRUE))

logcd4_lm <- lm(logcd4 ~ trt + age_centered + sex, data = cd4_dta_df)

logcd4_coef <- summary(logcd4_lm)$coefficients

summary(logcd4_lm)


Call:
lm(formula = logcd4 ~ trt + age_centered + sex, data = cd4_dta_df)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0696 -0.5758  0.0819  0.6963  3.4823 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                3.01522    0.04644  64.926  < 2e-16 ***
trtZidovudine/Zalcitabine -0.09724    0.03500  -2.778  0.00548 ** 
age_centered               0.09670    0.01520   6.364 2.14e-10 ***
sexmale                   -0.13421    0.04830  -2.779  0.00548 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.073 on 5032 degrees of freedom
Multiple R-squared:  0.01036,   Adjusted R-squared:  0.009766 
F-statistic: 17.55 on 3 and 5032 DF,  p-value: 2.476e-11

A linear regression model was fit to predict log-transformed CD4 counts with treatment group, mean-centered age, and sex as predictors, treating each observation as independent.

The treatment effect estimate was -0.097 (SE = 0.035), t = -2.778, p = 0.00548.

This suggests that, on average, patients receiving Zidovudine/Zalcitabine had log(CD4) values 0.097 units lower than those receiving other HIV treatments, adjusting for age and sex.

The overall model was statistically significant, F(3, 5032) = 17.55, p = 2.476e-11.

7. Fit a random intercept MLM with repeated measures (Level 1) nested within patient (Level 2) with log_cd4 as the outcome and treatment, mean-centered age, and sex as predictors. Calculate and interpret in words the Intraclass Correlation Coefficient (ICC), making sure to explain what the statistic reveals about variability in CD4 counts at each level. (6 points)

Code

logcd4_mlm <- lmer(
  logcd4 ~ trt + age_centered + sex + (1 | id),
  data = cd4_dta_df,
  na.action = na.exclude
)

p_val_trt <- summary(logcd4_mlm)$coefficients["trtZidovudine/Zalcitabine", "Pr(>|t|)"]

print(summary(logcd4_mlm))

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: logcd4 ~ trt + age_centered + sex + (1 | id)
   Data: cd4_dta_df

REML criterion at convergence: 12429.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.4991 -0.4690  0.0354  0.5376  3.7605 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.7365   0.8582  
 Residual             0.4107   0.6409  
Number of obs: 5036, groups:  id, 1309

Fixed effects:
                            Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)                  2.98235    0.07584 1337.91404  39.325  < 2e-16 ***
trtZidovudine/Zalcitabine   -0.07172    0.05967 1293.52460  -1.202 0.229564    
age_centered                 0.09486    0.02579 1294.17955   3.679 0.000244 ***
sexmale                     -0.12284    0.07924 1331.54148  -1.550 0.121294    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) trtZ/Z ag_cnt
trtZdvdn/Zl -0.206              
age_centerd  0.097 -0.001       
sexmale     -0.920  0.013 -0.105

Code

var_components <- as.data.frame(VarCorr(logcd4_mlm))

# Between-person variance (random intercept variance)
var_between <- var_components$vcov[var_components$grp == "id"]

# Within-person (residual) variance
var_within <- var_components$vcov[var_components$grp == "Residual"]

# ICC
icc <- var_between / (var_between + var_within)
print(icc)

[1] 0.6419785

A random intercept multilevel model was fit to predict log-transformed CD4 counts using treatment, mean-centered age, and sex as fixed effects, and allowing intercepts to vary across patients.

The intraclass correlation coefficient (ICC) is the ratio between the random-intercept (between-person) variance and the total variance (random intercept [between-person] variance + residual [within-person] variance).

The ICC was 0.642, indicating that 64.2% of the variance in log(CD4) was attributable to between-person differences (ie, stable differences across patients). The remaining 35.8% of the variance reflected within-person variability (ie, fluctuations over time within individuals).

This suggests that while there are meaningful stable differences in CD4 levels across patients, a substantial proportion of variability occurs within patients over time.

8. Compare and contrast the OLS and MLM results. Which model better accounts for the study design? (5 points)

The OLS model suggested a significant negative effect of treatment with Zidovudine/Zalcitabine on log-transformed CD4 counts (b = -0.097, p = 2.476e-11).

However, when accounting for the nesting of repeated measures within individuals using a random intercept MLM, the treatment effect became smaller and non-significant (b = -0.072, p = 2.296e-01).

As discussed above, the ICC indicates that most of the variance in log(CD4) was attributable to stable between-person differences.

The multilevel model provided a better fit to the study design by accounting for the longitudinal measurements within patients and providing a more correct estimation of the effective sample size and standard errors.

Part III: Investigating Time Trends and Interactions (40 points)

Code

head(cd4_dta_df %>%
       select(c(week, month)), n = 10)

# A tibble: 10 × 2
    week month
   <dbl> <dbl>
 1  0     0   
 2  7.57  1.89
 3 15.6   3.89
 4 23.6   5.89
 5 32.6   8.14
 6 40    10   
 7  0     0   
 8  8     2   
 9 16     4   
10 23     5.75

9. Test a MLM predicting logcd4 with a linear fixed effect of time (month) variable (no treatment variable here). Is logcd4 significantly associated with linear time? (5 points)

Code

logcd4_month_mlm <- lmer(logcd4 ~ month + (1|id), data = cd4_dta_df)

summary(logcd4_month_mlm)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: logcd4 ~ month + (1 | id)
   Data: cd4_dta_df

REML criterion at convergence: 12262.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.3527 -0.4555  0.0330  0.5292  3.7115 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 0.7660   0.8752  
 Residual             0.3892   0.6239  
Number of obs: 5036, groups:  id, 1309

Fixed effects:
              Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)  2.996e+00  2.803e-02  1.723e+03  106.90   <2e-16 ***
month       -4.042e-02  2.962e-03  3.910e+03  -13.64   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
      (Intr)
month -0.366

A multilevel model was fit with log-transformed CD4 count as the outcome and month as a fixed effect, allowing random intercepts by patient.

The effect of month was statistically significant (b = -0.04, SE = 0.003, t = -13.65, p = 1.892e-41), indicating that log(CD4) decreased by approximately 0.04 units per month on average.

10. Test a second MLM with both a linear fixed and random effect of time (month) variables in the same model. How does this model compare with the model from question 9 with respect to model fit? (6 points)

Code

logcd4_month_randslope_mlm <- lmer(
  logcd4 ~ month + (1 + month | id), 
  data = cd4_dta_df, 
  na.action = na.exclude
  )

Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00286946 (tol = 0.002, component 1)

Code

summary(logcd4_month_randslope_mlm)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: logcd4 ~ month + (1 + month | id)
   Data: cd4_dta_df

REML criterion at convergence: 12110.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.1900 -0.4264  0.0267  0.4844  3.5973 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 id       (Intercept) 0.648362 0.80521      
          month       0.004332 0.06582  0.20
 Residual             0.339943 0.58305      
Number of obs: 5036, groups:  id, 1309

Fixed effects:
              Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)    2.99710    0.02593 1300.69326  115.58   <2e-16 ***
month         -0.04194    0.00351  952.63639  -11.95   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
      (Intr)
month -0.215
optimizer (nloptwrap) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.00286946 (tol = 0.002, component 1)

a. How does this model compare with the model from question 9?

Code

lrt_result <- anova(logcd4_month_mlm, logcd4_month_randslope_mlm, test = "LRT")

refitting model(s) with ML (instead of REML)

Code

chisq_val <- lrt_result$Chisq[2]
df_val <- lrt_result$Df[2]
p_val <- lrt_result$`Pr(>Chisq)`[2]

print(lrt_result)

Data: cd4_dta_df
Models:
logcd4_month_mlm: logcd4 ~ month + (1 | id)
logcd4_month_randslope_mlm: logcd4 ~ month + (1 + month | id)
                           npar   AIC   BIC  logLik -2*log(L)  Chisq Df
logcd4_month_mlm              4 12255 12282 -6123.7     12247          
logcd4_month_randslope_mlm    6 12108 12147 -6047.8     12096 151.71  2
                           Pr(>Chisq)    
logcd4_month_mlm                         
logcd4_month_randslope_mlm  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Adding a random slope for month significantly improved model fit compared to the random intercept-only model, χ²(2) = 151.71, p = 1.138e-33. The random slope model also had substantially lower AIC (1.212268^{4}) and BIC (1.2161826^{4}) compared to the random intercept-only model (AIC = 1.2270672^{4}, BIC = 1.229677^{4}), indicating improved model fit with appropriate adjustment for model complexity.

b. Use predictive margins and plot the linear effect of time (Hint: margins and marginsplot commands will help here). Briefly describe the shape and direction of the linear time trajectory.

Code

# Get predicted values for month across the observed range
pred_month <- ggpredict(logcd4_month_randslope_mlm, terms = "month")

pred_month %>%
  ggplot(aes(x = x, y = predicted)) +
    geom_line(color = "blue") +
    geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) +
    labs(
      x = "Time (Months)",
      y = "Predicted log(CD4)",
      title = "Predicted Linear Effect of Time (Month) on log(CD4)"
    ) +
    theme_minimal() + 
  theme(
    panel.grid = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black")
  )

Log(CD4) decreases linearly as time increases.

11. Test a MLM predicting logcd4 with a quadratic fixed effect of time (month##month) and linear random effect of time (month). (6 points)

Code

logcd4_month2_mlm <- lmer(
  logcd4 ~ poly(month, 2) + (1 + month | id), 
  data = cd4_dta_df, 
  na.action = na.exclude
  )

Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00325604 (tol = 0.002, component 1)

Code

summary(logcd4_month2_mlm)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: logcd4 ~ poly(month, 2) + (1 + month | id)
   Data: cd4_dta_df

REML criterion at convergence: 12041.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.2981 -0.4429  0.0397  0.4893  3.5627 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 id       (Intercept) 0.648592 0.80535      
          month       0.004627 0.06802  0.19
 Residual             0.332345 0.57649      
Number of obs: 5036, groups:  id, 1309

Fixed effects:
                  Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)        2.83463    0.02658 1253.40402 106.634  < 2e-16 ***
poly(month, 2)1   -9.75541    0.78893  926.22427 -12.365  < 2e-16 ***
poly(month, 2)2   -4.72939    0.61846 3403.88193  -7.647 2.66e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) p(,2)1
ply(mnt,2)1 0.308        
ply(mnt,2)2 0.000  0.063 
optimizer (nloptwrap) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.00325604 (tol = 0.002, component 1)

a. How does this model compare with the model from question 10?

Code

lrt_result <- anova(logcd4_month_randslope_mlm, logcd4_month2_mlm, test = "LRT")

refitting model(s) with ML (instead of REML)

Code

chisq_val <- lrt_result$Chisq[2]
df_val <- lrt_result$Df[2]
p_val <- lrt_result$`Pr(>Chisq)`[2]

aic_q10 <- AIC(logcd4_month_randslope_mlm)
aic_q11 <- AIC(logcd4_month2_mlm)
delta_aic <- aic_q11 - aic_q10

print(lrt_result)

Data: cd4_dta_df
Models:
logcd4_month_randslope_mlm: logcd4 ~ month + (1 + month | id)
logcd4_month2_mlm: logcd4 ~ poly(month, 2) + (1 + month | id)
                           npar   AIC   BIC  logLik -2*log(L)  Chisq Df
logcd4_month_randslope_mlm    6 12108 12147 -6047.8     12096          
logcd4_month2_mlm             7 12052 12098 -6019.0     12038 57.689  1
                           Pr(>Chisq)    
logcd4_month_randslope_mlm               
logcd4_month2_mlm            3.07e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Adding a quadratic fixed effect of time (month²) significantly improved model fit compared to the linear random slope model, likelihood ratio test result: χ²(1) = 57.69, p = 3.070e-14. The quadratic model also had a substantially lower AIC (1.205529^{4}) than the linear model (1.212268^{4}), with ΔAIC = -67.39. This provides strong evidence that a curved (quadratic) trajectory better describes the association between time and log(CD4) than a linear trend.

b. Use predictive margins and plot the quadratic effect of time. Briefly describe the shape and direction of the quadratic time trajectory.

Code

# Get predicted values for month across the observed range
pred_month <- ggpredict(logcd4_month2_mlm, terms = "month [all]")

pred_month %>%
  ggplot(aes(x = x, y = predicted)) +
    geom_line(color = "blue") +
    geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) +
    labs(
      x = "Time (Months)",
      y = "Predicted log(CD4)",
      title = "Predicted Quadratic Effect of Time (Month) on log(CD4)"
    ) +
    theme_minimal() + 
  theme(
    panel.grid = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black")
  )

As time increases, predicted log(CD4) increases slightly, levels off, and then decreases.

12. Test a MLM predicting logcd4 with a cubic fixed effect of time (month##month##month) and linear random effect of time (month). (6 points)

Code

logcd4_month3_mlm <- lmer(
  logcd4 ~ poly(month, 3) + (1 + month | id), 
  data = cd4_dta_df, 
  na.action = na.exclude
  )

summary(logcd4_month3_mlm)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: logcd4 ~ poly(month, 3) + (1 + month | id)
   Data: cd4_dta_df

REML criterion at convergence: 12006.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.4110 -0.4389  0.0400  0.4859  3.5649 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 id       (Intercept) 0.649842 0.80613      
          month       0.004632 0.06806  0.20
 Residual             0.329010 0.57359      
Number of obs: 5036, groups:  id, 1309

Fixed effects:
                  Estimate Std. Error         df t value Pr(>|t|)    
(Intercept)        2.83685    0.02662 1253.94700 106.581  < 2e-16 ***
poly(month, 3)1   -9.60747    0.78716  922.75745 -12.205  < 2e-16 ***
poly(month, 3)2   -4.47920    0.61705 3442.72996  -7.259 4.79e-13 ***
poly(month, 3)3    3.57460    0.61503 3299.62642   5.812 6.76e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) p(,3)1 p(,3)2
ply(mnt,3)1 0.311               
ply(mnt,3)2 0.001  0.065        
ply(mnt,3)3 0.016  0.035  0.071

a. How does this model compare with the model from question 11?

Code

lrt_result <- anova(logcd4_month2_mlm, logcd4_month3_mlm, test = "LRT")

refitting model(s) with ML (instead of REML)

Code

chisq_val <- lrt_result$Chisq[2]
df_val <- lrt_result$Df[2]
p_val <- lrt_result$`Pr(>Chisq)`[2]

aic_q11 <- AIC(logcd4_month2_mlm)
aic_q12 <- AIC(logcd4_month3_mlm)
delta_aic <- aic_q12 - aic_q11

print(lrt_result)

Data: cd4_dta_df
Models:
logcd4_month2_mlm: logcd4 ~ poly(month, 2) + (1 + month | id)
logcd4_month3_mlm: logcd4 ~ poly(month, 3) + (1 + month | id)
                  npar   AIC   BIC  logLik -2*log(L)  Chisq Df Pr(>Chisq)    
logcd4_month2_mlm    7 12052 12098 -6019.0     12038                         
logcd4_month3_mlm    8 12020 12073 -6002.2     12004 33.621  1  6.696e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Including a cubic fixed effect of time (month³) significantly improved model fit compared to the quadratic model, χ²(1) = 33.62, p = 6.696e-09. The cubic model also had a lower AIC (1.202282^{4}) than the quadratic model (1.205529^{4}), with ΔAIC = -32.47. This indicates that the time trajectory of log(CD4) is better captured by a cubic curve than by a quadratic function.

b. Use predictive margins and plot the cubic effect of time. Briefly describe the shape and direction of the cubic time trajectory.

Code

# Get predicted values for month across the observed range
pred_month <- ggpredict(logcd4_month3_mlm, terms = "month [all]")

pred_month %>%
  ggplot(aes(x = x, y = predicted)) +
    geom_line(color = "blue") +
    geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.2) +
    labs(
      x = "Time (Months)",
      y = "Predicted log(CD4)",
      title = "Predicted Cubic Effect of Time (Month) on log(CD4)"
    ) +
    theme_minimal() + 
  theme(
    panel.grid = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black")
  )

Log(CD4) progresses with a sinusoidal shape over time. It increases, levels off, decreases, levels off, and begins increasing slightly. I question whether this final rise makes biological sense, or from a statistical perspective whether this is due to overfitting.

13. In your view, which model from questions 10, 11 and 12 is the “best” model for describing the association between time (month) and logcd4? What information are you using to make this assessment? (5 points)

Code

aic_q10 <- AIC(logcd4_month_randslope_mlm)  # Model 10: Linear random slope
aic_q11 <- AIC(logcd4_month2_mlm)           # Model 11: Quadratic
aic_q12 <- AIC(logcd4_month3_mlm)           # Model 12: Cubic

aic_q12 - aic_q10

[1] -99.86063

The LRT and AIC shows an improvement in model fit for models from q11 over q10 and q12 over q11 (with AIC also penalizing model complexity). Additionally, AIC for model q12 is substantially smaller than that for model q10. Together, these data demonstrate the model that best fits the data is the model in q12 (cubic).

14. Using the “best” model identified in question 13, test a model interacting the time (month)interaction term with treatment group. Describe the shape and direction of the time trajectories, making sure to note whether and how the treatment groups significantly differ from each other with respect to time. (Hint: Use predictive margins to help you answer this question). (6 points)

Code

logcd4_month3_trt_mlm <- lmer(
  logcd4 ~ poly(month, 3) * trt + (1 + month | id), 
  data = cd4_dta_df, 
  na.action = na.exclude
  )

Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00529595 (tol = 0.002, component 1)

Code

summary(logcd4_month3_trt_mlm)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: logcd4 ~ poly(month, 3) * trt + (1 + month | id)
   Data: cd4_dta_df

REML criterion at convergence: 11991.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.4126 -0.4432  0.0399  0.4902  3.5504 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 id       (Intercept) 0.651142 0.80693      
          month       0.004578 0.06766  0.19
 Residual             0.328668 0.57330      
Number of obs: 5036, groups:  id, 1309

Fixed effects:
                                            Estimate Std. Error         df
(Intercept)                                  2.85649    0.03065 1252.52800
poly(month, 3)1                             -8.47333    0.90566  928.08517
poly(month, 3)2                             -5.17279    0.71491 3450.66657
poly(month, 3)3                              3.57204    0.71038 3291.90492
trtZidovudine/Zalcitabine                   -0.08081    0.06168 1254.32493
poly(month, 3)1:trtZidovudine/Zalcitabine   -4.63838    1.81874  911.02982
poly(month, 3)2:trtZidovudine/Zalcitabine    2.80945    1.41382 3423.74217
poly(month, 3)3:trtZidovudine/Zalcitabine    0.02709    1.41833 3311.18603
                                          t value Pr(>|t|)    
(Intercept)                                93.201  < 2e-16 ***
poly(month, 3)1                            -9.356  < 2e-16 ***
poly(month, 3)2                            -7.236 5.69e-13 ***
poly(month, 3)3                             5.028 5.21e-07 ***
trtZidovudine/Zalcitabine                  -1.310   0.1904    
poly(month, 3)1:trtZidovudine/Zalcitabine  -2.550   0.0109 *  
poly(month, 3)2:trtZidovudine/Zalcitabine   1.987   0.0470 *  
poly(month, 3)3:trtZidovudine/Zalcitabine   0.019   0.9848    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) pl(,3)1 pl(,3)2 pl(,3)3 trtZ/Z p(,3)1: p(,3)2:
ply(mnt,3)1  0.307                                               
ply(mnt,3)2  0.004  0.073                                        
ply(mnt,3)3  0.019  0.047   0.083                                
trtZdvdn/Zl -0.497 -0.152  -0.002  -0.009                        
pl(,3)1:Z/Z -0.153 -0.498  -0.036  -0.023   0.310                
pl(,3)2:Z/Z -0.002 -0.037  -0.506  -0.042  -0.005  0.050         
pl(,3)3:Z/Z -0.009 -0.024  -0.041  -0.501   0.009  0.012   0.047 
optimizer (nloptwrap) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.00529595 (tol = 0.002, component 1)

Code

pred_trt_month <- ggpredict(logcd4_month3_trt_mlm, terms = c("month [all]", "trt"))

Some of the focal terms are of type `character`. This may lead to
  unexpected results. It is recommended to convert these variables to
  factors before fitting the model.
  The following variables are of type character: `trt`

Code

ggplot(pred_trt_month, aes(x = x, y = predicted, color = group)) +
  geom_line() +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high, fill = group), alpha = 0.2, color = NA) +
  labs(
    x = "Time (Months)",
    y = "Predicted log(CD4)",
    title = "Predicted CD4 Trajectories Over Time by Treatment Group",
    color = "Treatment Group",
    fill = "Treatment Group"
  ) +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black")
  )

Again, for both curves: Log(CD4) progresses with a sinusoidal shape over time. It increases, levels off, decreases, levels off, and begins increasing slightly. I question whether this final rise makes biological sense, or from a statistical perspective whether this is due to overfitting.

However, beginning shortly at <1 month log(CD4) is higher at all time points for Other HIV Tx than it is for Zidovudine/Zalcitabine. The Zidovudine/Zalcitabine group experienced a steeper decline in log(CD4) during the middle months, as evidenced by significant interaction terms between time and treatment in the mixed-effects model (0.0109 for the linear interaction term, with a similar trend for the quadratic term. These findings suggest that not only does log(CD4) change nonlinearly over time, but the pattern and magnitude of these changes differ significantly by treatment group.

15. Test a cross-level interaction between mean-centered age and month, while including treatment and sex as fixed main effects in the model. Does it appear that age is associated with logcd4 trends? (6 points)

Code

q15_mlm <- lmer(
  logcd4 ~ poly(month, 3) * age_centered + trt + sex + (1 + month | id), 
  data = cd4_dta_df, 
  na.action = na.exclude
  )

Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00346236 (tol = 0.002, component 1)

Code

# Fixed effects table
fixef_q15 <- summary(q15_mlm)$coefficients

# p-values for interactions (important for interpretation)
p_linear <- fixef_q15["poly(month, 3)1:age_centered", "Pr(>|t|)"]
p_quadratic <- fixef_q15["poly(month, 3)2:age_centered", "Pr(>|t|)"]
p_cubic <- fixef_q15["poly(month, 3)3:age_centered", "Pr(>|t|)"]

# Main effect of age
p_age_main <- fixef_q15["age_centered", "Pr(>|t|)"]

# Format for inline reporting
p_linear_fmt <- label_scientific(digits = 2)(p_linear)
p_quadratic_fmt <- label_scientific(digits = 2)(p_quadratic)
p_cubic_fmt <- label_scientific(digits = 2)(p_cubic)
p_age_main_fmt <- label_scientific(digits = 2)(p_age_main)

summary(q15_mlm)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: logcd4 ~ poly(month, 3) * age_centered + trt + sex + (1 + month |  
    id)
   Data: cd4_dta_df

REML criterion at convergence: 12000.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.3894 -0.4376  0.0412  0.4822  3.5528 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 id       (Intercept) 0.645090 0.80317      
          month       0.004637 0.06809  0.19
 Residual             0.328992 0.57358      
Number of obs: 5036, groups:  id, 1309

Fixed effects:
                               Estimate Std. Error         df t value Pr(>|t|)
(Intercept)                     2.91545    0.07438 1343.28107  39.195  < 2e-16
poly(month, 3)1                -9.58252    0.78728  922.95070 -12.172  < 2e-16
poly(month, 3)2                -4.48352    0.61717 3439.27580  -7.265  4.6e-13
poly(month, 3)3                 3.58576    0.61513 3296.41337   5.829  6.1e-09
age_centered                    0.09531    0.02654 1254.92410   3.591 0.000343
trtZidovudine/Zalcitabine      -0.03036    0.05839 1290.33522  -0.520 0.603167
sexmale                        -0.08051    0.07728 1312.18629  -1.042 0.297695
poly(month, 3)1:age_centered    0.60050    0.79524  935.35230   0.755 0.450366
poly(month, 3)2:age_centered   -0.89535    0.61863 3472.66201  -1.447 0.147896
poly(month, 3)3:age_centered   -0.08415    0.61223 3263.27356  -0.137 0.890689
                                
(Intercept)                  ***
poly(month, 3)1              ***
poly(month, 3)2              ***
poly(month, 3)3              ***
age_centered                 ***
trtZidovudine/Zalcitabine       
sexmale                         
poly(month, 3)1:age_centered    
poly(month, 3)2:age_centered    
poly(month, 3)3:age_centered    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) pl(,3)1 pl(,3)2 pl(,3)3 ag_cnt trtZ/Z sexmal p(,3)1: p(,3)2:
ply(mnt,3)1  0.111                                                             
ply(mnt,3)2 -0.007  0.065                                                      
ply(mnt,3)3  0.009  0.036   0.071                                              
age_centerd  0.092  0.000  -0.005   0.000                                      
trtZdvdn/Zl -0.204  0.002  -0.001   0.000  -0.001                              
sexmale     -0.914 -0.002   0.008  -0.003  -0.100  0.011                       
ply(m,3)1:_  0.001 -0.003  -0.008  -0.006   0.311  0.002 -0.001                
ply(m,3)2:_  0.000 -0.008  -0.007  -0.012   0.006 -0.001 -0.001  0.083         
ply(m,3)3:_  0.001 -0.006  -0.012  -0.011   0.013 -0.004  0.000  0.029   0.069 
optimizer (nloptwrap) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.00346236 (tol = 0.002, component 1)

Code

pred_q15 <- ggpredict(q15_mlm, terms = c("month [all]"))

ggplot(pred_q15, aes(x = x, y = predicted, color = group)) +
  geom_line() +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high, fill = group), alpha = 0.2, color = NA) +
  labs(
    x = "Time (Months)",
    y = "Predicted log(CD4)",
    title = "Predicted CD4 Trajectories Over Time"
  ) +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black"), 
    legend.position = "none"
  )

pred_q15 <- ggpredict(q15_mlm, terms = c("age_centered"))

Model contains splines or polynomial terms. Consider using
  `terms="age_centered [all]"` to get smooth plots. See also
  package-vignette 'Adjusted Predictions at Specific Values'.

Code

ggplot(pred_q15, aes(x = x, y = predicted, color = group)) +
  geom_line() +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high, fill = group), alpha = 0.2, color = NA) +
  labs(
    x = "Age (Mean Centered)",
    y = "Predicted log(CD4)",
    title = "Predicted CD4 Trajectories By Age (Mean Centered)"
  ) +
  theme_minimal() +
  theme(
    panel.grid = element_blank(),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black"), 
    legend.position = "none"
  )

Adjusting for time, sex, and treatment group, the estimated coefficient for age was 0.095, corresponding to an approximate increase of 0.095 log(CD4) units per additional year of age. The main effect of age was statistically significant (p = 3.4e-04).

None of the interactions between age and the linear, quadratic, or cubic time terms were statistically significant (linear: p = 4.5e-01; quadratic: p = 1.5e-01; cubic: p = 8.9e-01). This suggests that while age was associated with overall log(CD4), the shape and direction of log(CD4) change over time did not significantly differ by age.

The predicted values plotted in Figure 3 illustrate these findings:

Figure 3 (a) shows the log(CD4) trajectory over time, which followed a sinusoidal trajectory as described previously.

Figure 3 (b) shows the positive association between age and log(CD4).