Purpose: To provide an introduction to the joint parametric modeling of the mean response and covariance for longitudinal data.
Instructions:
For each question requiring data analysis, support your conclusions by including only the relevant SAS/R output in your answer.
Include your SAS/R program (but not your SAS output) as an appendix to your solutions.
Joint Modeling of Mean and Covariance in a Study on the Effect of Ozone on Pulmonary Function
In a study designed to examine the acute responses of normal subjects to ozone exposure, researchers randomized 20 subjects to exposure to either room air or 0.12 ppb ozone. The exposure lasted six hours. A baseline measure and 6 subsequent measures of FEV1 (a measure of pulmonary function), made at hourly intervals, were recorded for all study participants. Subjects cycled on an exercise bike before each measure of FEV1 was obtained. The investigators were interested in determining whether changes in pulmonary function during the 6 hours of exposure differed between the ozone and room air exposed groups.
In the analyses of the data from this study, the response variable of interest was FEV1 (ml)/100. From here on out when we refer to FEV1 we mean this rescaled response, i.e., FEV1 (ml)/100. The measurement times were coded 0-6, with time=0 for the baseline measurement, time=1 for the measurement at hour 1,…,time=6 for the measurement at hour 6. For this homework, we will consider the data from hours 0, 2, 4, 6. The two exposure groups were coded 0 and 1, with group=1 denoting exposure to ozone and group=0 denoting exposure to room air.
The data are in the file “ozone0246.dat” on the course web page. Each row of the data set contains the following four variables: subject ID, hour, group and the y = FEV1/100 measurements, respectively (note it is already in long format).
There are 20 subjects randomized to either room air or ozone whose FEV1 (ml)/100 is measured at 4 equally spaced time points. All 20 subjects have 4 measurements at the same time points: baseline, 2, 4, and 6 hours. The dataset is balanced and complete.
Subjects randomized to room air exposure have FEV1 measurements ranging from 37 to 47 at baseline and 38 to 49 at 6 hours. There is more variability between subjects than within subjects. Overall there is a slight, gradual increasing trend in FEV1 across the time points. Some subjects show sharp changes while others show only small, gradual changes.
Subjects randomized to ozone exposure have baseline FEV1 measurements ranging from 36 to 46, while the FEV1 measurements at 6 hours range from 23 to 45. There is greater variability within subjects than between subjects. Overall there is a decreasing trend in FEV1 across time. The rate of decrease differs between subjects; some show sharp declines while others decline gradually.
Both groups have similar baseline FEV1 measurements. However, subjects in the ozone exposure group show greater changes in FEV1 over time, along with greater variability between subjects.
Mean Response Profile
[Table of observation counts, means, and variances of FEV1 by group and time point not shown.]
The mean baseline FEV1 values for the two exposure groups are similar. After baseline, the mean FEV1 for the room air group is consistently higher than the mean FEV1 for the ozone group. The mean trends for the two groups move in different directions and at different rates; the profiles are non-parallel. From the 2 hour time point to the 6 hour time point, the variance is greater in the ozone group than in the room air group.
Sample Covariance and Sample Correlation
[Sample covariance and correlation matrices of FEV1 across the four time points not shown.]
The sample covariance matrix shows that the variance is not constant over time. The sample correlation decreases as the time gap between measurements increases.
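A minimal sketch (using the FEV data frame read in by the appendix code) of how pooled sample covariance and correlation matrices can be computed; the matrices reported above may instead have been computed separately by exposure group:

```r
library(tidyverse)  # pivot_wider() and the pipe

# one row per subject, one column per measurement occasion
FEV_wide <- FEV %>%
  pivot_wider(id_cols = c(id, group),
              names_from = time, names_prefix = "hour", values_from = y)

FEV_wide %>% select(starts_with("hour")) %>% cov()  # sample covariance matrix
FEV_wide %>% select(starts_with("hour")) %>% cor()  # sample correlation matrix
```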
a) Define a reasonable “maximal” mean model for this study. Fit this model using an unstructured variance-covariance matrix. Comment on the variance structure and on the correlation structure. What simplified variance-covariance structure(s) might be reasonable? Justify your answer.
Maximal mean model:
\(FEV1_{ij} = \beta_{0}+\beta_{1}Group_{i}+\beta_{2}I(t_{j}=2)+\beta_{3}I(t_{j}=4)+\beta_{4}I(t_{j}=6)+\beta_{5}Group_{i}I(t_{j}=2)+\beta_{6}Group_{i}I(t_{j}=4)+\beta_{7}Group_{i}I(t_{j}=6)+\epsilon_{ij}\)
where \(Group_{i}\) is the ozone indicator (1 = ozone, 0 = room air), \(I(t_{j}=k)\) indicates the measurement at hour \(k\), and the within-subject errors \(\epsilon_{i}=(\epsilon_{i1},\ldots,\epsilon_{i4})' \sim {\sf N}(0,\Sigma)\), with \(\Sigma\) left unstructured.
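For the four measurement occasions, the unstructured covariance matrix estimates every variance and covariance separately (10 parameters, matching the parameter count in the model-comparison table below):

\[
\Sigma =
\begin{pmatrix}
\sigma_{1}^{2} & \sigma_{12} & \sigma_{13} & \sigma_{14}\\
\sigma_{12} & \sigma_{2}^{2} & \sigma_{23} & \sigma_{24}\\
\sigma_{13} & \sigma_{23} & \sigma_{3}^{2} & \sigma_{34}\\
\sigma_{14} & \sigma_{24} & \sigma_{34} & \sigma_{4}^{2}
\end{pmatrix}
\]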
UN Covariance and UN Correlation
[Estimated unstructured covariance and correlation matrices not shown.]
From the unstructured covariance matrix, the variance is not constant across time points. The correlation matrix shows that correlation decreases as the time gap increases, although the magnitude of the decrease is modest. A heterogeneous first-order autoregressive (ARH(1)) covariance structure might therefore be a reasonable simplified variance-covariance structure.
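For reference, the heterogeneous first-order autoregressive (ARH(1)) structure keeps a separate variance at each occasion while letting the correlation decay with the number of occasions separating two measurements (5 parameters: four standard deviations and one correlation):

\[
\mathrm{Var}(Y_{ij}) = \sigma_{j}^{2}, \qquad
\mathrm{Cov}(Y_{ij}, Y_{ik}) = \sigma_{j}\,\sigma_{k}\,\rho^{\,|j-k|}
\]

where \(j\) and \(k\) index the four measurement occasions.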
b) Keeping the same maximal mean model, evaluate whether your suggestion(s) for the variance-covariance structure from question 2a as well as the following models for the variance-covariance structure provide an adequate fit to the data compared with an unstructured variance-covariance:
i) compound symmetry
The compound symmetry covariance structure does not resemble the pattern shown in either the sample covariance matrix or the unstructured covariance estimate, since it forces a constant variance across time. Its correlation matrix, with a single common correlation, is also inconsistent with the observed decline in correlation with increasing time lag.
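Under compound symmetry, every occasion shares one variance and every pair of occasions shares one correlation (2 parameters):

\[
\mathrm{Var}(Y_{ij}) = \sigma^{2}, \qquad
\mathrm{Cov}(Y_{ij}, Y_{ik}) = \rho\,\sigma^{2} \quad (j \neq k)
\]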
ii) heterogeneous compound symmetry (hint use: type=csh in proc mixed)
CSH Covariance and CSH Correlation
[Estimated heterogeneous compound symmetry covariance and correlation matrices not shown.]
The compound symmetry covariance structure with heterogeneous variances does resemble the patterns seen in the unstructured covariance and correlation matrices.
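Heterogeneous compound symmetry keeps the single common correlation but allows a different variance at each occasion (5 parameters):

\[
\mathrm{Var}(Y_{ij}) = \sigma_{j}^{2}, \qquad
\mathrm{Cov}(Y_{ij}, Y_{ik}) = \rho\,\sigma_{j}\,\sigma_{k} \quad (j \neq k)
\]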
iii) 1st-order autoregressive
The first-order autoregressive covariance matrix does not follow the variance pattern seen in the sample or unstructured covariance matrices, since it assumes a constant variance. Although its correlation does decrease with time lag, the correlations are much lower than those shown in the data and in the unstructured fit.
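The homogeneous AR(1) structure uses a single variance and a correlation that decays geometrically with the separation between occasions (2 parameters):

\[
\mathrm{Var}(Y_{ij}) = \sigma^{2}, \qquad
\mathrm{Cov}(Y_{ij}, Y_{ik}) = \sigma^{2}\,\rho^{\,|j-k|}
\]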
iv) heterogeneous 1st-order autoregressive (hint use: type=arh(1) in proc mixed)
ARH(1) Covariance and ARH(1) Correlation
[Estimated heterogeneous AR(1) covariance and correlation matrices not shown.]
The first-order autoregressive covariance structure with heterogeneous variances also resembles the sample and unstructured covariance matrices and their corresponding correlation matrices.
Using likelihood ratio tests and the AIC criterion as appropriate, identify a model for the variance-covariance structure that provides a good fit to the data. Provide estimates for the parameters used in defining this variance-covariance model. Also provide estimated variance-covariance and correlation matrices.
| Model | LogLikelihood | -2LogLikelihood | Number of Parameters |
|---|---|---|---|
| Unstructured | -177.5752 | 355.1504 | 10 |
| Compound Symmetry | -197.0629 | 394.1258 | 2 |
| Compound Symmetry, Hetero | -182.6467 | 365.2934 | 5 |
| AR(1) | -190.5644 | 381.1288 | 2 |
| AR(1), Hetero | -179.1189 | 358.2378 | 5 |
The likelihood ratio tests show that neither the compound symmetry nor the first-order autoregressive covariance structure is adequate compared with the unstructured model. Both the heterogeneous compound symmetry and the heterogeneous first-order autoregressive structures are adequate relative to the unstructured covariance, and both use fewer parameters than the unstructured model.
| Structure | -2LL | Parameters | AIC |
|---|---|---|---|
| CSH | 365.2934 | 5 | 375.2934 |
| ARH(1) | 358.2378 | 5 | 368.2378 |
Comparing the heterogeneous compound symmetry and heterogeneous first-order autoregressive structures using AIC, the heterogeneous AR(1) structure, ARH(1), provides the better fit for these data.
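As a sketch, these comparisons can also be obtained directly in R from the model objects defined in the appendix; anova() on two gls fits with the same mean model reports AIC, BIC, the log-likelihoods, the likelihood ratio statistic, and its p-value:

```r
anova(CSmod,   UNmod)   # LRT: compound symmetry vs unstructured
anova(CSHmod,  UNmod)   # LRT: heterogeneous compound symmetry vs unstructured
anova(AR1Hmod, UNmod)   # LRT: heterogeneous AR(1) vs unstructured
AIC(CSHmod, AR1Hmod)    # AIC comparison of the two heterogeneous structures
```

Because all of these fits share the same maximal mean model, their REML likelihoods are comparable.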
Fit the usual model for the analysis of mean profiles using room air exposure as the reference level for group and baseline as the reference level for time. Use the variance-covariance structure identified in your answer to question 2b. Based on this model:
a) Test the null hypothesis that the pattern of means over hours is identical (coincides) for the two exposure groups. What do you conclude?
| Model | -2LogLikelihood | Number of Parameters |
|---|---|---|
| Saturated | 358.2378 | 8 |
| Time Only | 374.1760 | 4 |
| Difference | 15.9382 | 4 |
With \(\Pr(\chi^{2}_{4} > 15.9382) = 0.003103\), the simpler model with only time effects is not adequate compared with the saturated means model; we conclude that the pattern of means over hours is not the same in the two exposure groups.
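The p-value can be verified directly from the \(\chi^{2}\) reference distribution:

```r
# LRT statistic = difference in -2 log-likelihoods; df = 8 - 4 = 4 mean parameters
pchisq(374.1760 - 358.2378, df = 4, lower.tail = FALSE)  # approximately 0.0031
```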
b) Test the null hypothesis that the mean response profiles of the two groups are parallel. What do you conclude?
| Model | -2LogLikelihood | Number of Parameters |
|---|---|---|
| Saturated | 358.2378 | 8 |
| Time and Group | 371.6128 | 5 |
| Difference | 13.3750 | 3 |
With \(\Pr(\chi^{2}_{3} > 13.3750) = 0.003892\), the simpler model with only time and group effects (no interaction) does not adequately describe the group means over time; we conclude that the mean response profiles of the two groups are not parallel.
Fit a model that includes hour as a continuous variable, group and their interaction. Use the model for the variance-covariance structure that you identified in question 2b.
a) What is the estimated rate of change in mean response for the room air group?
b) What is the estimated rate of change in mean response for the ozone group?
c) Test the hypothesis that the rates of change in mean response are identical in the two groups. What do you conclude?
d) What is the estimated difference in rate of mean change between the two groups? By calculating a 95% confidence interval for this difference, identify what are plausible values for the underlying true difference.
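Under this model the mean response is linear in hour within each group, so the quantities in a)–d) are simple functions of the regression coefficients (no numerical estimates are shown here). Writing the mean model as

\[
E(FEV1_{ij}) = \beta_{0} + \beta_{1}\,hour_{j} + \beta_{2}\,Group_{i} + \beta_{3}\,(hour_{j}\times Group_{i}),
\]

the estimated rate of change per hour is \(\hat{\beta}_{1}\) in the room air group and \(\hat{\beta}_{1}+\hat{\beta}_{3}\) in the ozone group. The difference in rates is \(\hat{\beta}_{3}\), the test in c) is the test of \(\beta_{3}=0\), and an approximate 95% confidence interval for the difference is \(\hat{\beta}_{3}\pm 1.96\,\widehat{SE}(\hat{\beta}_{3})\) (or the corresponding interval reported by the software).

Appendix: R Code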
#### Libraries ####
library(tidyverse)
library(kableExtra)
library(nlme)
#### Dataset ####
FEV <- read_delim("/Users/questwhalen/Desktop/Professional/UMMC/Schedule\ and\ Classes\ and\ Material/BDS\ 724\ Longitudinal\ and\ Multilevel\ Models/HW_Projects/Longitudinal_Multilevel_Model/HW\ 4/ozone0246.dat",
delim = " ",
trim_ws = TRUE,
col_names = c("id", "time", "group", "y"))
# map(.x = 1:3, .f = ~ FEV[,.x] %>% unique)
#### Q1 ####
#### 1a. ####
# kable(list(FEV %>% group_by(id, group) %>% summarize(Count = n()) %>% ungroup %>% slice(1:5),
# FEV %>% group_by(id, group) %>% summarize(Count = n()) %>% ungroup %>% slice(6:10),
# FEV %>% group_by(id, group) %>% summarize(Count = n()) %>% ungroup %>% slice(11:15),
# FEV %>% group_by(id, group) %>% summarize(Count = n()) %>% ungroup %>% slice(16:20))) %>%
# kable_styling() %>%
# kable_paper("striped", full_width = F)
# FEV %>% group_by(time, group) %>% summarize(Count = n()) %>%
# kable %>%
# kable_styling() %>%
# kable_paper("striped", full_width = F)
# data are complete; every participant has the same number of observations at the same times
# xtabs(~ id + time, FEV)
#### 1.b ####
## Individual Spaghetti Plot
# interaction.plot(x.factor = FEV$time,
# trace.factor = FEV$id,
# response = FEV$y,
# legend = FALSE,
# xlab = "Time (hours)",
# ylab = "FEV1 (ml/100)",
# main = "Plot of Mean Response Profile in Subjects",
# type = "b",
# pch = 19,
# col = 1:20)
# interaction.plot(x.factor = FEV$time,
# trace.factor = FEV$id,
# response = FEV$y,
# legend = FALSE,
# xlab = "Time (hours)",
# ylab = "FEV1 (ml/100)",
# main = "Plot of Mean Response Profile in Subjects",
# type = "b",
# pch = c(rep(19,10),rep(17,10)),
# lty = c(rep(1,10),rep(2,10)),
# col = c(rep(4,10), rep(8,10)))
# legend(1.1, 30, lty = 1:2, pch = c(19,17), legend=c("0: Room Air","1: Ozone"), col=c(4,8), title = "Group")
# kable(list(FEV %>% mutate(Id = factor(id), Group = factor(group)) %>% group_by(Id, Group) %>%
# summarise(Observations = n(), Mean = mean(y), StdDev = var(y) %>% sqrt) %>% ungroup %>% slice(1:10),
# FEV %>% mutate(Id = factor(id), Group = factor(group)) %>% group_by(Id, Group) %>%
# summarise(Observations = n(), Mean = mean(y), StdDev = var(y) %>% sqrt) %>% ungroup %>% slice(11:20)
# )) %>%
# kable_styling() %>%
# kable_paper("striped", full_width = F) %>%
# add_header_above(header = c("Within Subject Variability", ""))
interaction.plot(x.factor = FEV$time[1:40],
trace.factor = FEV$id[1:40],
response = FEV$y[1:40],
legend = FALSE,
xlab = "Time (hours)",
ylab = "FEV1 (ml/100)",
main = "Subjects with Room Air Exposure FEV1 Overtime",
type = "b",
pch = c(rep(19,10)),
lty = c(rep(1,10)),
col = c(rep(4,10)))
interaction.plot(x.factor = FEV$time[41:80],
trace.factor = FEV$id[41:80],
response = FEV$y[41:80],
legend = FALSE,
xlab = "Time (hours)",
ylab = "FEV1 (ml/100)",
main = "Subjects with Ozone Exposure FEV1 Overtime",
type = "b",
pch = c(rep(17,10)),
lty = c(rep(2,10)),
col = c(rep(8,10)))
#### 1.c ####
## Mean Response Profile Plot by Group
interaction.plot(x.factor = FEV$time,
trace.factor = FEV$group,
response = FEV$y,
legend = FALSE,
xlab = "Time (hours)",
ylab = "FEV1 (ml/100)",
main = "Plot of Mean Response Profile by group",
type = "b",
pch = c(19,17),
lty = 1:2,
col = c("dark blue", "dark red"))
legend(1, 40, lty = 1:2, pch = c(19,17), legend=c("0: Room Air","1: Ozone"), col=c("dark blue", "dark red"), title = "Group")
kable(list(FEV %>% mutate(Time = factor(time), Group = group) %>% group_by(Time, Group) %>%
summarise(Observations = n(), Mean = mean(y), Var = var(y)) %>% ungroup)) %>%
kable_paper("striped", full_width = F) %>%
add_header_above(header = c("Mean Response Profile"))
#### Q2 ####
#### 2.a ####
UNmod <- gls(y ~ group + factor(time) + group*factor(time),
data = FEV,
corr= corSymm(form = ~ 1 | id),
weights = varIdent(form = ~ 1 |time),
method = 'REML')
# UNmod %>% summary
# varIdent coefficients: ratios of each hour's residual SD to the hour-0 SD
# (getVarCov() on a gls fit may already incorporate these weights, in which case the scaling by L is redundant)
L <- diag(c(1.000000, 1.019838, 1.585709, 1.986173))
UNcov <- L %*% getVarCov(UNmod) %*% L
list(UNcov, #covariance matrix
     cov2cor(UNcov)) %>% #correlation matrix (cov2cor, not cor(), converts a covariance matrix)
kable()%>%
kable_paper("striped", full_width = F) %>%
add_header_above(header = c("UN Covariance", "UN Correlation"))
# getVarCov(UNmod)
# getVarCov(UNmod) %>% cor()
#varIdent(form = ~ variance covariate | grouping factor); when a grouping factor is present, a different variance coefficient is estimated for each of its levels except one reference level
#getVarCov gives the marginal variance-covariance matrix of the responses for a chosen individual
#### 2b.i ####
CSmod <- gls(y ~ group + factor(time) + group*factor(time),
data = FEV,
correlation = corCompSymm(form = ~ 1|id),
weights = varIdent(form = ~ 1), # for homogeneous (constant) variance
method = 'REML')
# CSmod %>% summary
list(getVarCov(CSmod), cov2cor(getVarCov(CSmod))) %>% # covariance and correlation matrices
kable%>%
kable_paper("striped", full_width = F) %>%
add_header_above(header = c("CS Covariance", "CS Correlation"))
#### 2b.ii ####
CSHmod <- gls(y ~ group + factor(time) + group*factor(time),
data = FEV,
correlation = corCompSymm(form = ~ 1|id),
weights = varIdent(form = ~ 1|time), #include for heterogeneous variance; coefficients are ratios of SDs relative to the reference time point
method = 'REML')
# CSHmod %>% summary
LCSH <- diag(c(1.000000, 0.963340, 1.513376, 1.980123)) # varIdent SD ratios from summary(CSHmod)
CSHcov <- LCSH %*% getVarCov(CSHmod) %*% LCSH
list(CSHcov, cov2cor(CSHcov)) %>%
kable%>%
kable_paper("striped", full_width = F) %>%
add_header_above(header = c("CSH Covariance", "CSH Correlation"))
#### 2b.iii ####
AR1mod <- gls(y ~ group + factor(time) + group*factor(time),
data = FEV,
correlation = corAR1(form = ~ 1|id),
weights = varIdent(form = ~ 1), #include for homogeneous variance
method = 'REML')
# AR1mod %>% summary
list(getVarCov(AR1mod), cov2cor(getVarCov(AR1mod))) %>% # AR(1) covariance and correlation matrices
kable %>%
kable_paper("striped", full_width = F) %>%
add_header_above(header = c("AR1 Covariance", "AR1 Correlation"))
#### 2b.iv ####
AR1Hmod <- gls(y ~ group + factor(time) + group*factor(time),
data = FEV,
correlation = corAR1(form = ~ 1|id),
weights = varIdent(form = ~ 1|time), #include for heterogeneous variance
method = 'REML')
# AR1Hmod %>% summary
LAR1H <- diag(c(1.000000, 1.016444, 1.598048, 2.024784)) # varIdent SD ratios from summary(AR1Hmod)
AR1Hcov <- LAR1H %*% getVarCov(AR1Hmod) %*% LAR1H
list(AR1Hcov, cov2cor(AR1Hcov)) %>%
kable%>%
kable_paper("striped", full_width = F) %>%
add_header_above(header = c("AR1H Covariance", "AR1H Correlation"))
#### 2b.comparison ####
Best <- tibble("Model" = c("Unstructured", "Compound Symmetry", "Compound Symmetry, Hetero", "AR(1)", "AR(1), Hetero"),
"LogLikelihood" = c(-177.5752, -197.0629, -182.6467, -190.5644, -179.1189),
"Number of Paramters" = c(10, 2, 5, 2, 5)) %>%
mutate("-2LogLikelihood" = -2*LogLikelihood) %>%
select(c(1,2,4,3))
Best %>%
kable %>%
kable_paper("striped", full_width = F)
Comp <- tibble("Structure Comparisons" = c("UN vs CS", "UN vs CSH", "UN vs AR(1)", "UN vs ARH(1)"),
               "LRT = Diff -2LL" = c(Best[[2,3]]-Best[[1,3]], Best[[3,3]]-Best[[1,3]], Best[[4,3]]-Best[[1,3]], Best[[5,3]]-Best[[1,3]]),
               "df = Diff Parameters" = c(8, 5, 8, 5),
               "Pr(chi2_df > LRT)" = pchisq(`LRT = Diff -2LL`, df = `df = Diff Parameters`, lower.tail = FALSE))
list(Comp[1:2,], Comp[3:4,]) %>%
kable %>%
kable_paper("striped", full_width = F)
tibble("Structure" = c("CSH", "ARH(1)"),
"-2LL"= c(Best[[3,3]], Best[[5,3]]),
"Parameters" = c(5,5),
"AIC" = c(Best[[3,3]]+(2*5), Best[[5,3]]+(2*5))) %>%
kable %>%
kable_paper("striped", full_width = F)
#### Q3 ####
# AR1Hmod %>% summary
#### 3a ####
# H0: the two groups share the same (arbitrary) pattern of means over time
# (LRTs comparing nested mean models are usually based on ML rather than REML fits)
AR1Hmod.equalgroups <- gls(y ~ factor(time), # common time pattern only; 4 mean parameters
                           data = FEV,
                           correlation = corAR1(form = ~ 1|id),
                           weights = varIdent(form = ~ 1|time), #include for heterogeneous variance
                           method = 'REML')
# AR1Hmod.equalgroups %>% summary
# anova(AR1Hmod.equalgroups, AR1Hmod)
tibble("Model" = c("Saturated","Time Only", "Difference"),
"-2LogLikelihood" = c(-2*-179.1189, -2*-189.5657, (-2*-189.5657)-(-2*-179.1189)),
"Number of Parameters" = c(8, 4, 4)) %>%
kable %>%
kable_paper("striped", full_width = F)
#### 3b ####
AR1Hmod.parallel <- gls(y~ factor(time) + group,
data = FEV,
correlation = corAR1(form = ~ 1|id),
weights = varIdent(form = ~ 1|time), #include for heterogeneous variance
method = 'REML')
# AR1Hmod.parallel %>% summary
# anova(AR1Hmod.parallel, AR1Hmod)
tibble("Model" = c("Saturated","Time and Group", "Difference"),
"-2LogLikelihood" = c(-2*-179.1189, -2*-185.8064, (-2*-185.8064)-(-2*-179.1189)),
"Number of Parameters" = c(8, 5, 3)) %>%
kable %>%
kable_paper("striped", full_width = F)
#### Q4 ####
model.linear<- gls(y~ time + group + group*time,
data = FEV,
correlation = corAR1(form = ~ 1|id),
weights = varIdent(form = ~ 1|time),
method = 'REML')
model.linear %>% summary
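#### Q4 extraction (sketch) ####
# A sketch (output not reproduced in the report) of how the quantities in Q4 a)-d)
# can be read off model.linear.
est <- coef(model.linear)
inter <- est[grep(":", names(est))]  # the hour-by-group interaction coefficient
est["time"]                          # a) estimated rate of change per hour, room air group
est["time"] + inter                  # b) estimated rate of change per hour, ozone group
summary(model.linear)$tTable         # c) the interaction row tests equal rates of change in the two groups
inter                                # d) estimated difference in rates (ozone minus room air)
intervals(model.linear)$coef         # 95% confidence intervals; the interaction row is the CI for the difference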