The purpose of this assignment is to analyze changes in forced expiratory volume in 1 second (FEV1) over time and to determine whether the trend differs by treatment group. The dataset to be analyzed is from the Childhood Asthma Management Program (CAMP), which enrolled children aged 5 to 12 years. It includes data from 1993 to 1995.

The variables in the dataset are as follows:

Data Management & Recreating Variables

The first step was to create dichotomous variables and to code variables consistently so that they could be used in regression models. Some variables were renamed for clarity during interpretation.

Data Exploration

Here, various characteristics (demographics, treatment groups, etc.) are explored within the dataset.

a) Baseline data

Summary of baseline FEV1

This table shows baseline FEV1 from the first visit: the mean, standard deviation (SD), minimum, and maximum. Age at the start of the study is summarized in (c).

Statistics of FEV1

Mean FEV1    SD of FEV1    Minimum FEV1    Maximum FEV1
102.82       12.77         64              154

b) Exploring the participants in different treatment groups

Treatment Groups Number Percentage
Placebo 275 39.57
Budesonide 210 30.22
Nedocromil 210 30.22
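
As a quick check on the table above, the percentages can be recomputed from the counts. This is only a minimal sketch in plain Python; the dictionary and variable names are illustrative and are not taken from the original analysis.

```python
# Counts copied from the treatment-group table above.
counts = {"Placebo": 275, "Budesonide": 210, "Nedocromil": 210}

total = sum(counts.values())  # total number of participants across arms

# Percentage of the cohort in each arm, rounded to two decimals as in the table.
percentages = {group: round(100 * n / total, 2) for group, n in counts.items()}

for group, n in counts.items():
    print(f"{group:<11} {n:>4} {percentages[group]:>6.2f}%")
```

The total of 695 matches the number of subjects at Month0 in (g).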

c) Mean age of participants at the start of the study

Mean Age
8.38

d) Gender distribution in the dataset

Gender Frequency
Male 412
Female 283

e) Number of subjects according to ethnic groups

Ethnicity Frequency
White 479
Black 89
Hispanic 68
Other 59

Most subjects are White (479); 89 are Black, 68 are Hispanic, and 59 are of other ethnicities.

f) Summary of minimum, maximum, mean, and standard deviation of the FEV1 mean differences

Minimum Maximum Mean SD
-70 45 0.43 10.33

The minimum difference is -70, maximum is 45, mean 0.43, and standard deviation is 10.33.

g) Comparing the number of subjects at various visits

Visit Number
Month0 695
Month2 664
Month4 673
Month12 674
Month16 666
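
To put the visit counts above on a common scale, each can be expressed as a percentage of the Month0 count. The sketch below assumes that the Month0 count (695) is the appropriate denominator; the variable names are illustrative.

```python
# Number of subjects per visit, copied from the table above.
visit_counts = {
    "Month0": 695,
    "Month2": 664,
    "Month4": 673,
    "Month12": 674,
    "Month16": 666,
}

baseline = visit_counts["Month0"]  # assumed denominator: subjects at Month0

# Percentage of the Month0 count observed at each visit.
retention = {visit: round(100 * n / baseline, 2) for visit, n in visit_counts.items()}

for visit, pct in retention.items():
    print(f"{visit:<8} {visit_counts[visit]:>4} {pct:>7.2f}%")
```

Under this assumption, every follow-up visit retains more than 95% of the Month0 count.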

h) Plots

Fig. 1. FEV1 distribution at each visit

The FEV1 distributions at each visit appear approximately normal and differ little from one another. Each peaks at about 100% of predicted.

Fig. 2. Spaghetti plot showing individual FEV1 trajectories across visits

In the figure above, none of the treatment groups follows a specific pattern in FEV1 values.

Fig. 3. FEV1 mean difference in each treatment group

The plot above shows no apparent difference in the FEV1 mean differences among the three treatment groups.

Missingness of data

The wide format of the longitudinal dataset is used to explore the missingness of the data.

Missingness pattern

##     id TG GENDER ETHNIC age_rz anypet parent_smokes any_smokes woodstove
## 593  1  1      1      1      1      1             1          1         1
## 17   1  1      1      1      1      1             1          1         1
## 13   1  1      1      1      1      1             1          1         1
## 12   1  1      1      1      1      1             1          1         1
## 2    1  1      1      1      1      1             1          1         1
## 1    1  1      1      1      1      1             1          1         1
## 2    1  1      1      1      1      1             1          1         1
## 6    1  1      1      1      1      1             1          1         1
## 2    1  1      1      1      1      1             1          1         1
## 1    1  1      1      1      1      1             1          1         1
## 4    1  1      1      1      1      1             1          1         1
## 4    1  1      1      1      1      1             1          1         1
## 13   1  1      1      1      1      1             1          1         1
## 2    1  1      1      1      1      1             1          1         1
## 1    1  1      1      1      1      1             1          1         1
## 7    1  1      1      1      1      1             1          1         1
## 1    1  1      1      1      1      1             1          1         1
## 6    1  1      1      1      1      1             1          1         1
## 1    1  1      1      1      1      1             1          1         1
## 1    1  1      1      1      1      1             1          1         1
## 3    1  1      1      1      1      1             1          1         1
## 1    1  1      1      1      1      1             1          1         1
## 1    1  1      1      1      1      1             1          1         1
## 1    1  1      1      1      1      1             1          1         0
##      0  0      0      0      0      0             0          0         1
##     dehumid wbc hemog Month0 agehome Month12 Month4 Month16 Month2 diff    
## 593       1   1     1      1       1       1      1       1      1    1   0
## 17        1   1     1      1       1       1      1       1      0    1   1
## 13        1   1     1      1       1       1      1       0      1    0   2
## 12        1   1     1      1       1       1      0       1      1    1   1
## 2         1   1     1      1       1       1      0       1      0    1   2
## 1         1   1     1      1       1       0      1       1      1    1   1
## 2         1   1     1      1       1       0      1       1      0    1   2
## 6         1   1     1      1       1       0      1       0      1    0   3
## 2         1   1     1      1       1       0      1       0      0    0   4
## 1         1   1     1      1       1       0      0       1      0    1   3
## 4         1   1     1      1       1       0      0       0      1    0   4
## 4         1   1     1      1       1       0      0       0      0    0   5
## 13        1   1     1      1       0       1      1       1      1    1   1
## 2         1   1     1      1       0       1      0       1      1    1   2
## 1         1   1     1      1       0       0      1       1      1    1   2
## 7         1   1     1      0       1       1      1       1      1    0   2
## 1         1   1     1      0       1       1      1       1      0    0   3
## 6         1   1     0      1       1       1      1       1      1    1   1
## 1         1   1     0      1       1       1      0       0      0    0   5
## 1         1   1     0      1       1       0      1       0      0    0   5
## 3         1   0     1      1       1       1      1       1      1    1   1
## 1         1   0     1      0       1       1      1       1      0    0   4
## 1         0   1     1      1       1       1      1       1      1    1   1
## 1         1   1     1      1       1       1      1       1      1    1   1
##           1   4     8      9      16      22     26      31     32   40 190

The pink boxes indicate missing data, and no pattern to the missingness is apparent.
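
The pattern table above counts how many subjects share each combination of observed (1) and missing (0) values. A minimal sketch of that tabulation in plain Python follows. The five rows are hypothetical toy data, not CAMP records, and the function name `pattern` is illustrative.

```python
from collections import Counter

VISITS = ["Month0", "Month2", "Month4", "Month12", "Month16"]

# Hypothetical wide-format rows (one per subject); None marks a missing FEV1.
rows = [
    {"id": "A", "Month0": 101, "Month2": 99, "Month4": 103, "Month12": 100, "Month16": 98},
    {"id": "B", "Month0": 95, "Month2": None, "Month4": 97, "Month12": 96, "Month16": 94},
    {"id": "C", "Month0": 110, "Month2": 108, "Month4": 109, "Month12": None, "Month16": None},
    {"id": "D", "Month0": 88, "Month2": 90, "Month4": 91, "Month12": 92, "Month16": 93},
    {"id": "E", "Month0": 104, "Month2": None, "Month4": 102, "Month12": 101, "Month16": 100},
]


def pattern(row):
    """Return a tuple of 1 (observed) / 0 (missing) flags, one per visit."""
    return tuple(0 if row[visit] is None else 1 for visit in VISITS)


# How many subjects share each observed/missing pattern.
pattern_counts = Counter(pattern(row) for row in rows)

# Number of missing values per visit (the bottom row of a pattern table).
missing_per_visit = {visit: sum(row[visit] is None for row in rows) for visit in VISITS}

for flags, n in pattern_counts.most_common():
    print(n, flags, "missing:", flags.count(0))
print(missing_per_visit)
```

Each distinct tuple corresponds to one row of the pattern table, and the per-visit totals correspond to its final row.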

Visualize plot for missing data

Very few data are missing according to the above plot.

Scatter plot for missing data

The pattern table and plots above show some missing data but no clear trend or pattern in the missingness. The data are therefore assumed to be missing completely at random (MCAR).

Designing models

Several generalized estimating equation (GEE) models with different working correlation structures will be fitted. The model that best fits our data will be used for our analysis.

a) Independence Correlation

term estimate conf.low conf.high
(Intercept) 103.55 102.24 104.85
TGBudesonide 1.37 -0.72 3.46
TGNedocromil -0.82 -2.78 1.15
  • Beta0: The mean FEV1 for individuals receiving the placebo is 103.55 (95% CI: 102.24, 104.85).
  • Beta1: The average difference in FEV1 comparing budesonide to placebo is 1.37 (95% CI: -0.72, 3.46).
  • Beta2: The average difference in FEV1 comparing nedocromil to placebo is -0.82 (95% CI: -2.78, 1.15).

b) Exchangeable Correlation

term estimate conf.low conf.high
(Intercept) 103.75 102.45 105.05
TGBudesonide 1.12 -0.96 3.20
TGNedocromil -1.01 -2.97 0.95
  • Beta0: The mean FEV1 for individuals receiving the placebo is 103.75 (95% CI: 102.45, 105.05).
  • Beta1: The average difference in FEV1 comparing budesonide to placebo is 1.12 (95% CI: -0.96, 3.20).
  • Beta2: The average difference in FEV1 comparing nedocromil to placebo is -1.01 (95% CI: -2.97, 0.95).

c) Autoregressive Correlation

term estimate conf.low conf.high
(Intercept) 103.41 102.09 104.72
TGBudesonide 0.68 -1.40 2.76
TGNedocromil -1.12 -3.15 0.90
  • Beta0: The mean FEV1 for individuals receiving the placebo is 103.41 (95% CI: 102.09, 104.72).
  • Beta1: The average difference in FEV1 comparing budesonide to placebo is 0.68 (95% CI: -1.40, 2.76).
  • Beta2: The average difference in FEV1 comparing nedocromil to placebo is -1.12 (95% CI: -3.15, 0.90).

d) Unstructured Correlation

term estimate conf.low conf.high
(Intercept) 103.68 102.38 104.97
TGBudesonide 1.03 -1.04 3.10
TGNedocromil -1.04 -3.00 0.92
  • Beta0: The mean FEV1 for individuals receiving the placebo is 103.68 (95% CI: 102.38, 104.97).
  • Beta1: The average difference in FEV1 comparing budesonide to placebo is 1.03 (95% CI: -1.04, 3.10).
  • Beta2: The average difference in FEV1 comparing nedocromil to placebo is -1.04 (95% CI: -3.00, 0.92).

Comparing & selecting a model for our analysis

Independence QIC

x
QIC 530585.2
QICu 530567.6
Quasi Lik -265280.8
CIC 11.8
params 3.0
QICC 530585.2

Exchangeable QIC

x
QIC 530623.58
QICu 530623.53
Quasi Lik -265308.76
CIC 3.03
params 3.00
QICC 530623.60

Autoregressive QIC

x
QIC 531496.11
QICu 531495.84
Quasi Lik -265744.92
CIC 3.14
params 3.00
QICC 531496.12

Unstructured QIC

x
QIC 530644.76
QICu 530644.72
Quasi Lik -265319.36
CIC 3.02
params 3.00
QICC 530644.86

The independence (530585.2) and exchangeable (530623.58) structures have the lowest QICs.
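
The comparison above can be made explicit by ranking the four structures by QIC, where lower values indicate a better fit. This is a minimal sketch using the QIC values reported in the tables above; the variable names are illustrative.

```python
# QIC values copied from the four tables above.
qic = {
    "Independence": 530585.2,
    "Exchangeable": 530623.58,
    "Autoregressive": 531496.11,
    "Unstructured": 530644.76,
}

# Rank structures from lowest (best) to highest QIC.
ranked = sorted(qic.items(), key=lambda item: item[1])
best_value = ranked[0][1]

for name, value in ranked:
    # Difference from the best-fitting structure.
    print(f"{name:<15} {value:>11.2f}  (+{value - best_value:.2f})")
```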

Correlation Structure

This will give an idea of the type of correlation the data follows.

Month0 Month2 Month4 Month12 Month16
Month0 1.00 0.78 0.76 0.74 0.67
Month2 0.78 1.00 0.80 0.77 0.72
Month4 0.76 0.80 1.00 0.77 0.74
Month12 0.74 0.77 0.77 1.00 0.81
Month16 0.67 0.72 0.74 0.81 1.00

Although the correlations decline somewhat as visits grow further apart, as in an autoregressive pattern, they remain high at every lag (0.67 to 0.81), so an exchangeable model is more suitable for our data.
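
One way to examine this choice is to average the observed correlations by lag and compare them with what each working structure would imply. The sketch below is illustrative only: it treats lag as the number of visits apart (ignoring the unequal spacing in months), takes the lag-1 average as the AR(1) parameter, and takes the mean of all off-diagonal entries as the exchangeable parameter. These are simplifying assumptions, not the estimates from the fitted GEE models.

```python
# Observed correlation matrix, copied from the table above.
corr = [
    [1.00, 0.78, 0.76, 0.74, 0.67],
    [0.78, 1.00, 0.80, 0.77, 0.72],
    [0.76, 0.80, 1.00, 0.77, 0.74],
    [0.74, 0.77, 0.77, 1.00, 0.81],
    [0.67, 0.72, 0.74, 0.81, 1.00],
]
n = len(corr)

# Average observed correlation at each lag (number of visits apart).
lag_means = {}
for lag in range(1, n):
    values = [corr[i][i + lag] for i in range(n - lag)]
    lag_means[lag] = sum(values) / len(values)

# Assumed parameters: AR(1) rho from lag 1, exchangeable alpha from all pairs.
rho = lag_means[1]
upper = [corr[i][j] for i in range(n) for j in range(i + 1, n)]
alpha = sum(upper) / len(upper)

# Total absolute deviation of each structure from the observed lag averages.
ar1_error = sum(abs(lag_means[lag] - rho ** lag) for lag in lag_means)
exch_error = sum(abs(lag_means[lag] - alpha) for lag in lag_means)

for lag, observed in lag_means.items():
    print(f"lag {lag}: observed {observed:.3f}, AR(1) {rho ** lag:.3f}, exchangeable {alpha:.3f}")
```

Under these assumptions, an AR(1) structure would predict a much faster decay than the one observed, which is consistent with preferring the exchangeable structure.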

Data Analysis

Simple GEE Models

FEV1 by Treatment Group

Term Estimate Lower Bound Upper Bound P-value
(Intercept) 103.75 102.45 105.05 0.00
TGBudesonide 1.12 -0.96 3.20 0.29
TGNedocromil -1.01 -2.97 0.95 0.31
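
The p-values and confidence limits in the table above are linked. Assuming the intervals are symmetric Wald intervals (estimate ± 1.96 × standard error), which is an assumption rather than something stated in the output, the standard error and two-sided p-value can be recovered as sketched below. The function name `wald_from_ci` is illustrative.

```python
import math


def wald_from_ci(estimate, low, high, z_crit=1.96):
    """Recover SE, z statistic, and two-sided p-value from a Wald 95% CI.

    Assumes the interval is estimate +/- z_crit * SE.
    """
    se = (high - low) / (2 * z_crit)
    z = estimate / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Treatment coefficients copied from the table above.
coefficients = {
    "TGBudesonide": (1.12, -0.96, 3.20),
    "TGNedocromil": (-1.01, -2.97, 0.95),
}

results = {term: wald_from_ci(*values) for term, values in coefficients.items()}

for term, (se, z, p) in results.items():
    print(f"{term}: SE = {se:.2f}, z = {z:.2f}, p = {p:.2f}")
```

Both recovered p-values agree with the reported ones to two decimals, and both intervals include 0.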

Plot to show FEV1 by time

The plot above confirms linearity of time.

Analysis of time

When time is analyzed as a whole (a single term for visit)

Term Estimate Lower Bound Upper Bound P-value
(Intercept) 103.84 102.94 104.74 0.00
visitc -0.01 -0.05 0.03 0.68

When time is analyzed separately (one term per follow-up visit)

Term Estimate P-value Lower Bound Upper Bound
(Intercept) 102.66 0.00 101.30 104.02
TGBudesonide 1.37 0.20 -0.72 3.46
TGNedocromil -0.82 0.41 -2.79 1.14
visitcMonth2 1.88 0.00 1.20 2.56
visitcMonth4 0.98 0.01 0.28 1.68
visitcMonth12 1.26 0.00 0.56 1.96
visitcMonth16 0.36 0.37 -0.43 1.16

The estimates differ depending on whether month is analyzed as a whole or separately. The intercept (beta0) decreased from 103.84 to 102.66 when month was analyzed separately, and the coefficients for budesonide and nedocromil are higher (more positive) than in the simple treatment-group model (1.37 vs. 1.12 and -0.82 vs. -1.01).
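
To make the separate-visit model easier to read, the fitted mean FEV1 for any treatment group and visit is the intercept plus the relevant treatment and visit coefficients. The sketch below assumes that placebo and Month0 are the reference levels, as the term names in the table suggest; the function name `predicted_mean` is illustrative.

```python
# Coefficients copied from the separate-visit table above.
intercept = 102.66  # assumed reference cell: placebo at Month0
treatment = {"Placebo": 0.0, "Budesonide": 1.37, "Nedocromil": -0.82}
visit = {"Month0": 0.0, "Month2": 1.88, "Month4": 0.98, "Month12": 1.26, "Month16": 0.36}


def predicted_mean(group, month):
    """Fitted mean FEV1 for one treatment group at one visit."""
    return intercept + treatment[group] + visit[month]


# Print a group-by-visit grid of fitted means.
print("Group       " + "".join(f"{month:>9}" for month in visit))
for group in treatment:
    cells = "".join(f"{predicted_mean(group, month):>9.2f}" for month in visit)
    print(f"{group:<12}{cells}")
```

Because this model has no treatment-by-visit interaction, the gaps between groups are the same at every visit.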

Finalized model

This model includes more covariates to account for other factors such as demographic, blood, and environmental characteristics. Backward elimination is used to remove variables that are not statistically significant (those with large p-values).

Term Estimate Lower Bound Upper Bound P-value
(Intercept) 93.58 83.21 103.96 0.00
visitc12 1.22 0.52 1.92 0.00
visitc16 0.44 -0.37 1.24 0.28
visitc2 1.94 1.29 2.60 0.00
visitc4 1.08 0.39 1.77 0.00
TGBudesonide 1.37 -0.76 3.51 0.21
TGNedocromil -0.87 -2.87 1.12 0.39
GENDERFemale 0.33 -1.43 2.09 0.71
ETHNICBlack -0.30 -3.08 2.48 0.83
ETHNICHispanic -1.70 -4.56 1.16 0.24
ETHNICOther 0.11 -2.82 3.04 0.94
age_rz -0.17 -0.60 0.26 0.43
hemog 0.59 -0.02 1.21 0.06
wbc 0.02 -0.02 0.06 0.26
anypet1 1.80 -0.08 3.68 0.06
woodstove1 -0.84 -4.26 2.58 0.63
dehumid1 1.11 -2.28 4.49 0.52
parent_smokes1 6.71 2.03 11.39 0.00
any_smokes1 -7.03 -11.54 -2.52 0.00

Finalized model QIC

x
QIC 494934.39
QICu 494934.06
Quasi Lik -247448.03
CIC 19.16
params 19.00
QICC 494934.65
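
To illustrate how the coefficient table above can inform backward elimination, the sketch below lists the terms whose 95% CI excludes 0 and identifies the term with the largest p-value, which would be the usual first candidate for removal. This is a simplified illustration and not necessarily the procedure the analysis followed; in particular, it treats each level of a categorical variable as a separate term.

```python
# (estimate, lower bound, upper bound, p-value), copied from the table above.
terms = {
    "visitc12": (1.22, 0.52, 1.92, 0.00),
    "visitc16": (0.44, -0.37, 1.24, 0.28),
    "visitc2": (1.94, 1.29, 2.60, 0.00),
    "visitc4": (1.08, 0.39, 1.77, 0.00),
    "TGBudesonide": (1.37, -0.76, 3.51, 0.21),
    "TGNedocromil": (-0.87, -2.87, 1.12, 0.39),
    "GENDERFemale": (0.33, -1.43, 2.09, 0.71),
    "ETHNICBlack": (-0.30, -3.08, 2.48, 0.83),
    "ETHNICHispanic": (-1.70, -4.56, 1.16, 0.24),
    "ETHNICOther": (0.11, -2.82, 3.04, 0.94),
    "age_rz": (-0.17, -0.60, 0.26, 0.43),
    "hemog": (0.59, -0.02, 1.21, 0.06),
    "wbc": (0.02, -0.02, 0.06, 0.26),
    "anypet1": (1.80, -0.08, 3.68, 0.06),
    "woodstove1": (-0.84, -4.26, 2.58, 0.63),
    "dehumid1": (1.11, -2.28, 4.49, 0.52),
    "parent_smokes1": (6.71, 2.03, 11.39, 0.00),
    "any_smokes1": (-7.03, -11.54, -2.52, 0.00),
}

# Terms whose 95% CI lies entirely above or entirely below 0.
excludes_zero = [name for name, (_, low, high, _) in terms.items() if low > 0 or high < 0]

# Term with the largest p-value: the first candidate to drop in this sketch.
largest_p_term = max(terms, key=lambda name: terms[name][3])

print("CI excludes 0:", excludes_zero)
print("Largest p-value:", largest_p_term, terms[largest_p_term][3])
```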

Confounding & Interaction

Confounding
  • Age
Term Estimate Lower Bound Upper Bound P-value
(Intercept) 104.98 101.43 108.53 0.00
TGBudesonide 1.15 -0.94 3.23 0.28
TGNedocromil -1.01 -2.98 0.95 0.31
visitc -0.01 -0.05 0.03 0.68
age_rz -0.14 -0.55 0.27 0.50
  • Gender
Term Estimate Lower Bound Upper Bound P-value
(Intercept) 103.95 102.37 105.54 0.00
TGBudesonide 1.14 -0.94 3.23 0.28
TGNedocromil -0.97 -2.95 1.01 0.34
visitc -0.01 -0.05 0.03 0.69
GENDERm -0.28 -1.99 1.44 0.75
  • Race
Term Estimate Lower Bound Upper Bound P-value
(Intercept) 103.48 100.80 106.17 0.00
TGBudesonide 1.12 -0.96 3.20 0.29
TGNedocromil -0.98 -2.95 0.98 0.33
visitc -0.01 -0.05 0.03 0.69
ETHNICh -0.87 -4.56 2.82 0.64
ETHNICo 0.87 -2.84 4.58 0.65
ETHNICw 0.47 -2.22 3.17 0.73

Interaction

This model is designed to assess whether the effect of visit (time) on FEV1 differs by treatment group.

Term Estimate Lower Bound Upper Bound
(Intercept) 103.97 102.60 105.35
visitc -0.03 -0.11 0.04
TGBudesonide 0.64 -1.57 2.84
TGNedocromil -1.09 -3.20 1.03
visitc:TGBudesonide 0.07 -0.03 0.17
visitc:TGNedocromil 0.01 -0.09 0.12
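
The interaction coefficients above can be read as differences in time slope relative to placebo. A minimal sketch of that arithmetic follows, assuming placebo is the reference group and `visitc` is a single numeric time term; the variable names are illustrative.

```python
# Coefficients copied from the interaction table above.
time_slope = -0.03  # visitc: change in FEV1 per unit of time in the reference group

# (estimate, lower bound, upper bound) for each interaction term.
interaction = {
    "Budesonide": (0.07, -0.03, 0.17),
    "Nedocromil": (0.01, -0.09, 0.12),
}

# Group-specific slope = reference slope + interaction coefficient.
slopes = {"Placebo": time_slope}
for group, (estimate, _, _) in interaction.items():
    slopes[group] = round(time_slope + estimate, 2)

# An interaction CI that includes 0 gives no evidence that the slope differs from placebo.
ci_includes_zero = {group: low <= 0 <= high for group, (_, low, high) in interaction.items()}

for group, slope in slopes.items():
    print(f"{group:<11} slope per unit of time: {slope:+.2f}")
print(ci_includes_zero)
```

Both interaction intervals include 0, so this model gives no evidence that the FEV1 trend differs between either active treatment and placebo.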

Assumptions

Normality

The histogram shows an approximately normal curve, supporting normality within our sample.

Homoscedasticity

Treatment Group

The boxplots show little variation in spread across the treatment groups.

Follow-up Visit

There is little variation in spread across the follow-up visits.

Linear Mixed-Effects Model

A linear mixed-effects model (LMM) can also be used to analyze the CAMP dataset. LMMs are usually used for subject-specific inference. As with GEE models, LMMs are selected according to the p-values of the coefficients, AIC, and scientific expertise. After fitting the LMMs, the treatment-group estimates were compared between the GEE and LMM models.

The LMM and GEE models show no significant differences from each other; the only variations are in the 95% CI ranges. The finalized GEE model seems to be the best fit for our dataset.