Purpose: To provide an introduction to the joint parametric modeling of the mean response and covariance for longitudinal data.

Instructions:

  1. For each question requiring data analysis, support your conclusions by including only the relevant SAS/R output in your answer.

  2. Include your SAS/R program (but not your SAS output) as an appendix to your solutions.

Joint Modeling of Mean and Covariance in a Study on the Effect of Ozone on Pulmonary Function

In a study designed to examine the acute responses of normal subjects to ozone exposure, researchers randomized 20 subjects to exposure to either room air or 0.12 ppb ozone. The exposure lasted six hours. A baseline measure and 6 additional measures of FEV1 (a measure of pulmonary function), made at hourly intervals, were recorded for all study participants. Subjects cycled on an exercise bike before each measure of FEV1 was obtained. The investigators were interested in determining whether changes in pulmonary function during the 6 hours of exposure differed between the ozone and room air exposed groups.

In the analyses of the data from this study, the response variable of interest was FEV1 (ml)/100. From here on out when we refer to FEV1 we mean this rescaled response, i.e., FEV1 (ml)/100. The measurement times were coded 0-6, with time=0 for the baseline measurement, time=1 for the measurement at hour 1,…,time=6 for the measurement at hour 6. For this homework, we will consider the data from hours 0, 2, 4, 6. The two exposure groups were coded 0 and 1, with group=1 denoting exposure to ozone and group=0 denoting exposure to room air.

The data are in the file “ozone0246.dat” on the course web page. Each row of the data set contains the following four variables: subject ID, hour, group and the y = FEV1/100 measurements, respectively (note it is already in long format).


1. (10 pts) Descriptive Analyses:
  a) Describe key aspects of the longitudinal design and completeness of data.

There are 20 subjects randomized to either room air or ozone whose FEV1 (ml)/100 is measured at 4 equally spaced time points. All 20 subjects have 4 measurements at the same time points: baseline, 2, 4, and 6 hours. The dataset is balanced and complete.

  b) Plot the FEV1 response against hour for each subject exposed to room air (all on the first plot) and for each subject exposed to ozone (all on a second plot). Comment on any patterns in the data or other notable aspects of the data.

Subjects randomized to the room air exposure have FEV1 measurements ranging from 37 to 47 at baseline and from 38 to 49 at 6 hours. There is more variability between subjects than within subjects. Overall there is a slight, gradual increasing trend in FEV1 across the time points. Some subjects show sharp changes between measurements while others show only small, gradual changes.

Subjects randomized to the ozone exposure have baseline FEV1 measurements ranging from 36 to 46. The FEV1 measurement at 6 hours ranges from 23 to 45. There is greater variability within subjects than between subjects. Overall there is a decreasing trend in FEV1 across time. The rate of decrease differs between subjects; some show sharp changes while others change gradually.

Both groups have similar baseline FEV1 measurements. However, FEV1 changes are greater for subjects in the ozone exposure group, along with greater between-subject variability.

  c) Obtain the mean FEV1 value at each hour of measurement for ozone and room air subjects separately. Plot the means against hour. Comment on the pattern of change in mean FEV1 with hour for ozone and room air subjects.

Mean Response Profile
Time   Group   Observations    Mean         Var
  0      0          10        42.383   10.061979
  0      1          10        42.617    8.597179
  2      0          10        41.868    7.859729
  2      1          10        41.009   11.547299
  4      0          10        42.515    9.356672
  4      1          10        39.600   37.561489
  6      0          10        43.124   11.709160
  6      1          10        37.211   61.898943

The mean baseline FEV1 values for the two exposure groups are similar. At each post-baseline time point, the mean FEV1 for the room air exposure group is higher than the mean FEV1 for the ozone exposure group. The mean trends for the two groups move in different directions and at different rates; they are non-parallel. From the 2 hour time point to the 6 hour time point, the variance in the ozone group is greater than in the room air group.

Sample Covariance
         t0          t2          t4          t6
t0    8.852958    7.570342   10.384595   10.243174
t2    7.570342    9.386982   12.959549   15.453607
t4   10.384595   12.959549   24.46050    27.52209
t6   10.243174   15.453607   27.52209    44.06794

Sample Correlation
         t0          t2          t4          t6
t0   1.0000000   0.8304401   0.7056879   0.5185958
t2   0.8304401   1.0000000   0.8552522   0.7598109
t4   0.7056879   0.8552522   1.0000000   0.8382766
t6   0.5185958   0.7598109   0.8382766   1.0000000

The sample covariance matrix shows that the variance is not constant over time; it increases from about 8.9 at baseline to about 44.1 at hour 6.

Sample correlation decreases as time gap increases.

2. (20 pts) Fitting a “Maximal” Model and Evaluating Variance-Covariance Structure:

a) Define a reasonable “maximal” mean model for this study. Fit this model using an unstructured variance-covariance matrix. Comment on the variance structure and on the correlation structure. What simplified variance-covariance structure(s) might be reasonable? Justify your answer.

Maximal mean model (exposure group, hour treated as a categorical variable, and their interaction):

\(FEV1_{ij} = \beta_{0}+\beta_{1}G_{i}+\beta_{2}t_{2ij}+\beta_{3}t_{4ij}+\beta_{4}t_{6ij}+\beta_{5}G_{i}t_{2ij}+\beta_{6}G_{i}t_{4ij}+\beta_{7}G_{i}t_{6ij}+\epsilon_{ij}\)

where \(G_{i}\) is the exposure group indicator (1 = ozone, 0 = room air), \(t_{kij}\) is an indicator that the \(j\)-th measurement on subject \(i\) was taken at hour \(k\) (\(k = 2, 4, 6\); baseline is the reference), and the vector of errors for subject \(i\), \(\epsilon_{i} = (\epsilon_{i1},\ldots,\epsilon_{i4})'\), is assumed \({\sf N}(0,\Sigma)\) with \(\Sigma\) left unstructured.

UN Covariance
         t0          t2          t4          t6
t0    9.329589    8.206349   17.682163   22.238252
t2    8.206349   10.092233   20.997101   30.183379
t4   17.682163   20.997101   58.98705    76.41663
t6   22.238252   30.183379   76.41663   145.18795

UN Correlation
         t0          t2          t4          t6
t0   1.0000000   0.9851576   0.9929213   0.9632263
t2   0.9851576   1.0000000   0.9916086   0.9893279
t4   0.9929213   0.9916086   1.0000000   0.9632258
t6   0.9632263   0.9893279   0.9632258   1.0000000

From the fitted unstructured covariance matrix, the variance is not constant across time points; it increases from baseline to hour 6. The corresponding correlation matrix shows correlations that decrease as the time gap increases, although the magnitude of the decrease is modest. A heterogeneous first-order autoregressive covariance structure might therefore be a reasonable simplified variance-covariance structure.
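
Under the heterogeneous first-order autoregressive (ARH(1)) structure suggested here, each occasion has its own variance while the correlation decays with the number of occasions separating two measurements: \(\mathrm{Var}(Y_{ij}) = \sigma_{j}^{2}\) and \(\mathrm{Corr}(Y_{ij}, Y_{ik}) = \rho^{|j-k|}\), where \(j\) and \(k\) index the (equally spaced) measurement occasions.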

b) Keeping the same maximal mean model, evaluate whether your suggestion(s) for the variance-covariance structure from question 2a as well as the following models for the variance-covariance structure provide an adequate fit to the data compared with an unstructured variance-covariance:

i) compound symmetry

The compound symmetry covariance structure does not resemble the pattern shown in either the sample covariance matrix or the unstructured fit, since it forces a single common variance across time points. Its correlation structure, a single common correlation for every pair of time points, is also inconsistent with the observed pattern.
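
For reference, compound symmetry specifies \(\mathrm{Var}(Y_{ij}) = \sigma^{2}\) and \(\mathrm{Corr}(Y_{ij}, Y_{ik}) = \rho\) for all \(j \neq k\), i.e., a common variance and a common correlation regardless of time separation.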

ii) heterogeneous compound symmetry (hint use: type=csh in proc mixed)
CSH Covariance
         t0          t2          t4          t6
t0    9.853779    7.144273   17.631637   30.184439
t2    7.144273    8.486354   16.362581   28.011881
t4   17.631637   16.362581   51.68807    69.13164
t6   30.184439   28.011881   69.13164   151.48558

CSH Correlation
         t0          t2          t4          t6
t0   1.0000000   0.9864895   0.9663038   0.9908609
t2   0.9864895   1.0000000   0.9675543   0.9922139
t4   0.9663038   0.9675543   1.0000000   0.9409958
t6   0.9908609   0.9922139   0.9409958   1.0000000

The compound symmetry covariance structure with heterogeneous variance does resemble the patterns seen in the unstructured covariance and correlation matrices.
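
Heterogeneous compound symmetry relaxes the common-variance assumption while keeping a single correlation: \(\mathrm{Var}(Y_{ij}) = \sigma_{j}^{2}\) and \(\mathrm{Corr}(Y_{ij}, Y_{ik}) = \rho\) for \(j \neq k\).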

iii) 1st-order autoregressive

The first-order autoregressive covariance matrix does not follow the variance pattern seen in the sample or unstructured covariance estimates, since it assumes a constant variance across time. Although its correlation does decrease with increasing time lag, the correlations are much lower than those shown in the data and in the unstructured fit.
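
The homogeneous AR(1) structure specifies \(\mathrm{Var}(Y_{ij}) = \sigma^{2}\) and \(\mathrm{Corr}(Y_{ij}, Y_{ik}) = \rho^{|j-k|}\), so it can capture decaying correlation but not the increasing variance seen in these data.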

iv) heterogeneous 1st-order autoregressive (hint use: type=arh(1) in proc mixed)
AR1H Covariance
         t0          t2          t4          t6
t0    9.183796    8.000444   16.674472   22.571217
t2    8.000444    9.802927   20.431194   27.656463
t4   16.674472   20.431194   59.89375    81.07452
t6   22.571217   27.656463   81.07452   154.36082

AR1H Correlation
         t0          t2          t4          t6
t0   1.0000000   0.9880637   0.9895792   0.9835088
t2   0.9880637   1.0000000   0.9985443   0.9819967
t4   0.9895792   0.9985443   1.0000000   0.9737329
t6   0.9835088   0.9819967   0.9737329   1.0000000

The first-order autoregressive covariance structure with heterogeneous variance also resembles the sample and unstructured covariance and correlation matrices.

Using likelihood ratio tests and the AIC criterion as appropriate, identify a model for the variance-covariance structure that provides a good fit to the data. Provide estimates for the parameters used in defining this variance-covariance model. Also provide estimated variance-covariance and correlation matrices.
Model                        LogLikelihood   -2LogLikelihood   Number of Parameters
Unstructured                     -177.5752          355.1504                     10
Compound Symmetry                -197.0629          394.1258                      2
Compound Symmetry, Hetero        -182.6467          365.2934                      5
AR(1)                            -190.5644          381.1288                      2
AR(1), Hetero                    -179.1189          358.2378                      5

Structure Comparison   LRT = Diff(-2LL)   df = Diff(Parameters)   Pr(chi2_df > LRT)
UN vs CS                        38.9754                       8           < 0.00001
UN vs CSH                       10.1430                       5            0.054941
UN vs AR(1)                     25.9784                       8            0.001157
UN vs ARH(1)                     3.0874                       5            0.749359

The likelihood ratio tests show that neither the compound symmetry nor the first-order autoregressive covariance structure is adequate compared with the unstructured covariance model. Both the heterogeneous compound symmetry and heterogeneous first-order autoregressive structures are adequate compared with the unstructured covariance in describing the pattern, and both use fewer parameters than the unstructured model.

Structure       -2LL   Parameters        AIC
CSH         365.2934            5   375.2934
ARH(1)      358.2378            5   368.2378

Comparing the heterogeneous compound symmetry and heterogeneous first-order autoregressive structures using AIC, the heterogeneous first-order autoregressive structure has the smaller AIC and is the better structure for these data.

3. (20 pts) Analysis of Response Profiles:

Fit the usual model for the analysis of mean profiles using room air exposure as the reference level for group and baseline as the reference group for time. Use the variance-covariance structure identified in your answer to question 2b. Based on this model:

a) Test the null hypothesis that the pattern of means over hours is identical (coincides) for the two exposure groups. What do you conclude?
Model         -2LogLikelihood   Number of Parameters
Saturated            358.2378                      8
Time Only            374.1760                      4
Difference            15.9382                      4

With \(Pr(\chi^{2}_{4} > 15.9382) = 0.003103\), we reject the null hypothesis: the simpler model with only time effects is not adequate compared with the saturated model, so the pattern of means over hours does not coincide for the two exposure groups.

b) Test the null hypothesis that the mean response profiles of the two groups are parallel. What do you conclude?
Model              -2LogLikelihood   Number of Parameters
Saturated                 358.2378                      8
Time and Group            371.6128                      5
Difference                 13.3750                      3

With \(Pr(\chi^{2}_{3} > 13.3750) = 0.003892\), we reject the null hypothesis: the simpler model with only main effects of time and group does not adequately describe the group means over time, so the mean response profiles of the two groups are not parallel (i.e., there is a group-by-time interaction).

4. (20 pts) Fitting a Linear Model in Time:

Fit a model that includes hour as a continuous variable, group and their interaction. Use the model for the variance-covariance structure that you identified in question 2b.

a) What is the estimated rate of change in mean response for the room air group?

b) What is the estimated rate of change in mean response for the ozone group?

c) Test the hypothesis that the rates of change in mean response are identical in the two groups. What do you conclude?

d) What is the estimated difference in rate of mean change between the two groups? By calculating a 95% confidence interval for this difference, identify what are plausible values for the underlying true difference.

5. (20 pts) Evaluating the Fit of a Linear Model in Time: Does a model with a linear trend in hour for each exposure group adequately describe the pattern of change in the two groups? Justify your answer with appropriate statistical analysis. (Hint: compare it with a quadratic model and the saturated model.)
6. (10 pts) Summarizing the Key Results and Conclusions: Write a brief structured abstract (maximum 200 words) summarizing the objective, methods, results and conclusions that might be drawn concerning exposure differences in patterns of pulmonary function over time.

#### Libraries ####
library(tidyverse)
library(kableExtra)
library(nlme)

#### Dataset ####
FEV <- read_delim("/Users/questwhalen/Desktop/Professional/UMMC/Schedule\ and\ Classes\ and\ Material/BDS\ 724\ Longitudinal\ and\ Multilevel\ Models/HW_Projects/Longitudinal_Multilevel_Model/HW\ 4/ozone0246.dat", 
                  delim = " ", 
                  trim_ws = TRUE,
                  col_names = c("id", "time", "group", "y"))

# map(.x = 1:3, .f = ~ FEV[,.x] %>% unique)

#### Q1 ####
#### 1a. ####
# kable(list(FEV %>% group_by(id, group) %>% summarize(Count = n()) %>% ungroup %>% slice(1:5),
#            FEV %>% group_by(id, group) %>% summarize(Count = n()) %>% ungroup %>% slice(6:10),
#            FEV %>% group_by(id, group) %>% summarize(Count = n()) %>% ungroup %>% slice(11:15),
#            FEV %>% group_by(id, group) %>% summarize(Count = n()) %>% ungroup %>% slice(16:20))) %>%
#     kable_styling()  %>%
#     kable_paper("striped", full_width = F)

# FEV %>% group_by(time, group) %>% summarize(Count = n()) %>%
#     kable %>%
#     kable_styling()  %>%
#     kable_paper("striped", full_width = F)
# data are complete: every participant has the same number of observations at the same time points
# xtabs(~ id + time, FEV)
#### 1.b ####
## Individual Spaghetti Plot
# interaction.plot(x.factor = FEV$time,
#                  trace.factor = FEV$id,
#                  response = FEV$y,
#                  legend = FALSE,
#                  xlab = "Time (hours)",
#                  ylab = "FEV1 (ml/100)",
#                  main = "Plot of Mean Response Profile in Subjects",
#                  type = "b",
#                  pch = 19,
#                  col = 1:20)
# interaction.plot(x.factor = FEV$time,
#                  trace.factor = FEV$id,
#                  response = FEV$y,
#                  legend = FALSE,
#                  xlab = "Time (hours)",
#                  ylab = "FEV1 (ml/100)",
#                  main = "Plot of Mean Response Profile in Subjects",
#                  type = "b",
#                  pch = c(rep(19,10),rep(17,10)),
#                  lty = c(rep(1,10),rep(2,10)),
#                  col = c(rep(4,10), rep(8,10)))
# legend(1.1, 30, lty = 1:2, pch = c(19,17), legend=c("0: Room Air","1: Ozone"), col=c(4,8), title = "Group")

# kable(list(FEV %>% mutate(Id = factor(id), Group = factor(group)) %>% group_by(Id, Group) %>%
#              summarise(Observations = n(), Mean = mean(y), StdDev = var(y) %>% sqrt) %>% ungroup %>% slice(1:10),
#            FEV %>% mutate(Id = factor(id), Group = factor(group)) %>% group_by(Id, Group) %>%
#              summarise(Observations = n(), Mean = mean(y), StdDev = var(y) %>% sqrt) %>% ungroup %>% slice(11:20)
#             )) %>%
#     kable_styling() %>%
#     kable_paper("striped", full_width = F) %>%
#   add_header_above(header = c("Within Subject Variability", ""))
interaction.plot(x.factor = FEV$time[1:40],
                 trace.factor = FEV$id[1:40],
                 response = FEV$y[1:40],
                 legend = FALSE,
                 xlab = "Time (hours)",
                 ylab = "FEV1 (ml/100)",
                 main = "Subjects with Room Air Exposure FEV1 Overtime",
                 type = "b",
                 pch = c(rep(19,10)),
                 lty = c(rep(1,10)),
                 col = c(rep(4,10)))
interaction.plot(x.factor = FEV$time[41:80],
                 trace.factor = FEV$id[41:80],
                 response = FEV$y[41:80],
                 legend = FALSE,
                 xlab = "Time (hours)",
                 ylab = "FEV1 (ml/100)",
                 main = "Subjects with Ozone Exposure FEV1 Overtime",
                 type = "b",
                 pch = c(rep(17,10)),
                 lty = c(rep(2,10)),
                 col = c(rep(8,10)))
#### 1.c ####
## Mean Response Profile Spaghetti Plot
interaction.plot(x.factor = FEV$time,
                 trace.factor = FEV$group,
                 response = FEV$y,
                 legend = FALSE,
                 xlab = "Time (hours)",
                 ylab = "FEV1 (ml/100)",
                 main = "Plot of Mean Response Profile by group",
                 type = "b",
                 pch = c(19,17),
                 lty = 1:2,
                 col = c("dark blue", "dark red"))
legend(1, 40, lty = 1:2, pch = c(19,17), legend=c("0: Room Air","1: Ozone"), col=c("dark blue", "dark red"), title = "Group")
kable(list(FEV %>% mutate(Time = factor(time), Group = group) %>% group_by(Time, Group) %>% 
                summarise(Observations = n(), Mean = mean(y), Var = var(y)) %>% ungroup)) %>%
  kable_paper("striped", full_width = F) %>% 
  add_header_above(header = c("Mean Response Profile"))
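
## Sample covariance and correlation across the four time points
## (a sketch of one way to obtain matrices like those reported in the write-up,
##  not necessarily the code that produced them; pivot_wider is from tidyr/tidyverse)
FEV_wide <- FEV %>%
  pivot_wider(id_cols = c(id, group), names_from = time,
              values_from = y, names_prefix = "t")
cov(FEV_wide %>% select(starts_with("t")))           # sample covariance matrix
cov2cor(cov(FEV_wide %>% select(starts_with("t"))))  # sample correlation matrix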


#### Q2 ####
#### 2.a ####
UNmod <- gls(y ~ group + factor(time) + group*factor(time), 
             data = FEV,
             corr= corSymm(form = ~ 1 | id),
             weights = varIdent(form = ~ 1 |time),
             method = 'REML')
# UNmod %>% summary
# rescaling by the varIdent coefficients reported by summary(UNmod); note that
# getVarCov() for gls fits generally already incorporates the varIdent weights,
# so check whether this rescaling double-counts the heterogeneous variances
L <- diag(c(1.000000, 1.019838, 1.585709, 1.986173))
list(L %*% getVarCov(UNmod) %*% L,              #covariance matrix
     cov2cor(L %*% getVarCov(UNmod) %*% L)) %>% #correlation matrix (cov2cor converts a covariance matrix; cor() would treat it as data)
  kable()%>%
  kable_paper("striped", full_width = F) %>% 
  add_header_above(header = c("UN Covariance", "UN Correlation"))

# getVarCov(UNmod)
# getVarCov(UNmod) %>% cor()

#varIdent(form = ~ variance covariate | grouping factor); when a grouping factor is present, a different coefficient value is used for each at its levels less one reference level
#getVarCov gives marginal variance-covariance of the responses by group
#### 2b.i ####
CSmod <- gls(y ~ group + factor(time) + group*factor(time),
             data = FEV,
             correlation = corCompSymm(form = ~ 1|id),
             weights = varIdent(form = ~ 1), # for homogenous variance
             method = 'REML')
# CSmod %>% summary
list(getVarCov(CSmod), cov2cor(getVarCov(CSmod))) %>%  #covariance and correlation (homogeneous variance)
  kable%>%
  kable_paper("striped", full_width = F) %>% 
  add_header_above(header = c("CS Covariance", "CS Correlation"))
#### 2b.ii ####
CSHmod <- gls(y ~ group + factor(time) + group*factor(time),
             data = FEV,
             correlation = corCompSymm(form = ~ 1|id),
             weights = varIdent(form = ~ 1|time), #include for heterogenous variance; ratio of variances to reference time point
             method = 'REML')
# CSHmod %>% summary
LCSH <- diag(c(1.000000, 0.963340, 1.513376, 1.980123 ))
list(LCSH %*% getVarCov(CSHmod) %*% LCSH, cov2cor(LCSH %*% getVarCov(CSHmod) %*% LCSH)) %>% 
  kable%>%
  kable_paper("striped", full_width = F) %>% 
  add_header_above(header = c("CSH Covariance", "CSH Correlation"))
#### 2b.iii ####
AR1mod <- gls(y ~ group + factor(time) + group*factor(time),
             data = FEV,
             correlation = corAR1(form = ~ 1|id),
             weights = varIdent(form = ~ 1), #include for homogenous variance
             method = 'REML')
# AR1mod %>% summary
list(getVarCov(AR1mod), cov2cor(getVarCov(AR1mod))) %>%  #covariance and correlation for the AR(1) fit
  kable %>%
  kable_paper("striped", full_width = F) %>% 
  add_header_above(header = c("AR1 Covariance", "AR1 Correlation"))
#### 2b.iv ####
AR1Hmod <- gls(y ~ group + factor(time) + group*factor(time),
             data = FEV,
             correlation = corAR1(form = ~ 1|id),
             weights = varIdent(form = ~ 1|time), #include for heterogenous variance
             method = 'REML')
# AR1Hmod %>% summary
LAR1H <- diag(c(1.000000, 1.016444, 1.598048, 2.024784))
list(LAR1H %*% getVarCov(AR1Hmod) %*% LAR1H, cov2cor(LAR1H %*% getVarCov(AR1Hmod) %*% LAR1H)) %>% 
  kable%>%
  kable_paper("striped", full_width = F) %>% 
  add_header_above(header = c("AR1H Covariance", "AR1H Correlation"))
#### 2b.comparison ####
Best <- tibble("Model" = c("Unstructured", "Compound Symmetry", "Compound Symmetry, Hetero", "AR(1)", "AR(1), Hetero"),
       "LogLikelihood" = c(-177.5752, -197.0629, -182.6467, -190.5644, -179.1189),
       "Number of Paramters" = c(10, 2, 5, 2, 5)) %>%
  mutate("-2LogLikelihood" = -2*LogLikelihood) %>% 
  select(c(1,2,4,3)) 
Best %>% 
  kable %>%
  kable_paper("striped", full_width = F)

Comp <- tibble("Structure Comparison" = c("UN vs CS", "UN vs CSH", "UN vs AR(1)", "UN vs ARH(1)"),
       "LRT = Diff(-2LL)" = c(Best[[2,3]]-Best[[1,3]], Best[[3,3]]-Best[[1,3]], Best[[4,3]]-Best[[1,3]], Best[[5,3]]-Best[[1,3]]),
       "df = Diff(Parameters)" = c(8, 5, 8, 5),
       "Pr(chi2_df > LRT)" = c("< 0.00001", 0.054941, 0.001157, 0.749359))
Comp %>% 
  kable %>%
  kable_paper("striped", full_width = F)

tibble("Structure" = c("CSH", "ARH(1)"),
       "-2LL"= c(Best[[3,3]], Best[[5,3]]),
       "Parameters" = c(5,5),
       "AIC" = c(Best[[3,3]]+(2*5), Best[[5,3]]+(2*5))) %>% 
  kable %>%
  kable_paper("striped", full_width = F)


#### Q3 ####
# AR1Hmod %>% summary
#### 3a ####
# reduced model: a common (coincident) mean profile over hours for both groups,
# so hour enters as a factor with no group terms
# (note: nested mean-model comparisons are usually based on ML fits, since REML
#  log-likelihoods are not strictly comparable across different mean structures)
AR1Hmod.equalgroups <- gls(y ~ factor(time),
             data = FEV,
             correlation = corAR1(form = ~ 1|id),
             weights = varIdent(form = ~ 1|time), #include for heterogeneous variance
             method = 'REML')
# AR1Hmod.equalgroups %>% summary
# anova(AR1Hmod.equalgroups, AR1Hmod)

tibble("Model" = c("Saturated","Time Only", "Difference"),
       "-2LogLikelihood" = c(-2*-179.1189, -2*-189.5657, (-2*-189.5657)-(-2*-179.1189)),
       "Number of Parameters" = c(8, 4, 4)) %>% 
  kable %>%
  kable_paper("striped", full_width = F)
#### 3b ####
AR1Hmod.parallel <- gls(y~ factor(time) + group,
             data = FEV,
             correlation = corAR1(form = ~ 1|id),
             weights = varIdent(form = ~ 1|time), #include for heterogenous variance
             method = 'REML')
# AR1Hmod.parallel %>% summary
# anova(AR1Hmod.parallel, AR1Hmod)

tibble("Model" = c("Saturated","Time and Group", "Difference"),
       "-2LogLikelihood" = c(-2*-179.1189, -2*-185.8064, (-2*-185.8064)-(-2*-179.1189)),
       "Number of Parameters" = c(8, 5, 3)) %>% 
  kable %>%
  kable_paper("striped", full_width = F)


#### Q4 ####
model.linear<- gls(y~ time + group + group*time, 
                data = FEV,
                correlation = corAR1(form = ~ 1|id),
                weights = varIdent(form = ~ 1|time), 
                method = 'REML')
model.linear %>% summary
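
#### Q4 (sketch) ####
# A sketch of how the quantities in 4a-4d could be read off model.linear; the
# coefficient names below assume group enters as the numeric 0/1 variable from
# the data file, so "time:group" is the difference in slopes (ozone - room air)
coef(model.linear)["time"]                                     # 4a: slope for room air
coef(model.linear)["time"] + coef(model.linear)["time:group"]  # 4b: slope for ozone
intervals(model.linear, which = "coef")  # 4c/4d: the "time:group" row gives the
                                         # estimated difference in slopes and its 95% CI

#### Q5 (sketch) ####
# A sketch for assessing the linear-in-hour model against a quadratic and the
# saturated (factor(time)) mean model; nested mean models are compared with ML
# fits because REML log-likelihoods are not comparable across mean structures
model.linear.ml    <- update(model.linear, method = "ML")
model.quadratic.ml <- gls(y ~ (time + I(time^2)) * group,
                          data = FEV,
                          correlation = corAR1(form = ~ 1|id),
                          weights = varIdent(form = ~ 1|time),
                          method = "ML")
model.saturated.ml <- gls(y ~ factor(time) * group,
                          data = FEV,
                          correlation = corAR1(form = ~ 1|id),
                          weights = varIdent(form = ~ 1|time),
                          method = "ML")
anova(model.linear.ml, model.quadratic.ml, model.saturated.ml)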