The purpose of this assignment is to analyze changes in forced expiratory volume in 1 second (FEV1) over time and to determine whether the trend differs by treatment group. The dataset to be analyzed is from the Childhood Asthma Management Program (CAMP), which enrolled children aged 5 to 12 years. It includes data from 1993 to 1995.
The variables in the dataset are as follows:
id: Participant ID
POSFEVPP: FEV1 after the administration of a bronchodilator (units = percent of predicted values)
TG: Treatment group, A=budesonide, B=nedocromil, C=placebo
visitc: Followup visit in months
GENDER: m=male, f=female
ETHNIC: w=white,b=black,h=hispanic,o=other
age_rz: Age in years at start of the study
hemog: Hemoglobin (g/dl)
wbc: White Blood Cell count (1000 cells/ul)
agehome: Age of current home (years)
anypet: Any pets, 1=Yes 2=No
woodstove: Used wood stove for heating/cooking, 1=Yes 2=No
dehumid: Use a dehumidifier, 1=Yes 2=No 3=DK
parent_smokes: Either Parent/partner smokes in home, 1=Yes 2=No
any_smokes: Anyone (including visitors) smokes in home, 1=Yes 2=No
The first step was to create dichotomous variables and to make the coding consistent across variables before fitting regression models. Some variable names were changed for clearer interpretation.
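The recoding step can be sketched as follows. This is a minimal illustration in Python, not the code used for the analysis. The record values are hypothetical, the label names follow the coefficient names in the later tables, and treating "DK" (3) as missing is an assumption.

```python
# Hypothetical raw record using the codes from the variable list above.
raw = {"id": 1, "TG": "A", "GENDER": "m", "ETHNIC": "w",
       "anypet": 1, "woodstove": 2, "dehumid": 3,
       "parent_smokes": 2, "any_smokes": 1}

TG_LABELS = {"A": "Budesonide", "B": "Nedocromil", "C": "Placebo"}
GENDER_LABELS = {"m": "Male", "f": "Female"}
ETHNIC_LABELS = {"w": "White", "b": "Black", "h": "Hispanic", "o": "Other"}
YES_NO_VARS = ["anypet", "woodstove", "dehumid", "parent_smokes", "any_smokes"]


def recode(record):
    """Return a copy of the record with labeled factors and 0/1 indicators."""
    out = dict(record)
    out["TG"] = TG_LABELS[record["TG"]]
    out["GENDER"] = GENDER_LABELS[record["GENDER"]]
    out["ETHNIC"] = ETHNIC_LABELS[record["ETHNIC"]]
    for var in YES_NO_VARS:
        # 1=Yes -> 1, 2=No -> 0; any other code (e.g. 3=DK) -> None.
        # Mapping DK to missing is an assumption, not taken from the source.
        out[var] = {1: 1, 2: 0}.get(record[var])
    return out


recoded = recode(raw)
print(recoded)
```

With this coding, every yes/no variable shares the same 0/1 scale, so their regression coefficients are read the same way.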
Here, various characteristics (demographics, treatment groups, etc.) are explored within the dataset.
This table shows the baseline data from the first visit which includes the mean, standard deviation (SD), minimum, and maximum for age at the start of the study, and the FEV1.
| Mean FEV1 | SD of FEV1 | Minimum FEV1 | Maximum FEV1 |
|---|---|---|---|
| 102.82 | 12.77 | 64 | 154 |
| Treatment Groups | Number | Percentage |
|---|---|---|
| Placebo | 275 | 39.57 |
| Budesonide | 210 | 30.22 |
| Nedocromil | 210 | 30.22 |
| Mean Age |
|---|
| 8.38 |
| Gender | Frequency |
|---|---|
| Male | 412 |
| Female | 283 |
| Ethnicity | Frequency |
|---|---|
| White | 479 |
| Black | 89 |
| Hispanic | 68 |
| Other | 59 |
Most participants are White (479); 89 are Black, 68 are Hispanic, and 59 are of other ethnicities.
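As a quick check of the frequency tables, the percentages can be recomputed from the counts. The sketch below uses only the counts reported above; the helper name `percentages` is illustrative.

```python
def percentages(counts):
    """Convert a dict of counts into percentages rounded to two decimals."""
    total = sum(counts.values())
    return {group: round(100 * n / total, 2) for group, n in counts.items()}


treatment = {"Placebo": 275, "Budesonide": 210, "Nedocromil": 210}
gender = {"Male": 412, "Female": 283}
ethnicity = {"White": 479, "Black": 89, "Hispanic": 68, "Other": 59}

for name, counts in [("Treatment", treatment), ("Gender", gender),
                     ("Ethnicity", ethnicity)]:
    print(name, sum(counts.values()), percentages(counts))
```

Each table sums to the same 695 participants, and the treatment percentages match those reported (39.57, 30.22, 30.22).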
| Minimum | Maximum | Mean | SD |
|---|---|---|---|
| -70 | 45 | 0.43 | 10.33 |
The minimum difference is -70, maximum is 45, mean 0.43, and standard deviation is 10.33.
| Visit | Number |
|---|---|
| Month0 | 695 |
| Month2 | 664 |
| Month4 | 673 |
| Month12 | 674 |
| Month16 | 666 |
The distributions of FEV1 at each visit appear approximately normal and do not differ markedly from one another. The peak is at about 100% of predicted for each follow-up month.
From the figure above, none of the treatment groups follows a specific pattern in the FEV1 values.
The plot above shows no marked difference in the mean differences among the 3 treatment groups.
The wide format of the longitudinal dataset is used to explore the missingness of the data.
## id TG GENDER ETHNIC age_rz anypet parent_smokes any_smokes woodstove
## 593 1 1 1 1 1 1 1 1 1
## 17 1 1 1 1 1 1 1 1 1
## 13 1 1 1 1 1 1 1 1 1
## 12 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1 1
## 4 1 1 1 1 1 1 1 1 1
## 13 1 1 1 1 1 1 1 1 1
## 2 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1 1
## 7 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1 1
## 6 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1 1
## 3 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1 0
## 0 0 0 0 0 0 0 0 1
## dehumid wbc hemog Month0 agehome Month12 Month4 Month16 Month2 diff
## 593 1 1 1 1 1 1 1 1 1 1 0
## 17 1 1 1 1 1 1 1 1 0 1 1
## 13 1 1 1 1 1 1 1 0 1 0 2
## 12 1 1 1 1 1 1 0 1 1 1 1
## 2 1 1 1 1 1 1 0 1 0 1 2
## 1 1 1 1 1 1 0 1 1 1 1 1
## 2 1 1 1 1 1 0 1 1 0 1 2
## 6 1 1 1 1 1 0 1 0 1 0 3
## 2 1 1 1 1 1 0 1 0 0 0 4
## 1 1 1 1 1 1 0 0 1 0 1 3
## 4 1 1 1 1 1 0 0 0 1 0 4
## 4 1 1 1 1 1 0 0 0 0 0 5
## 13 1 1 1 1 0 1 1 1 1 1 1
## 2 1 1 1 1 0 1 0 1 1 1 2
## 1 1 1 1 1 0 0 1 1 1 1 2
## 7 1 1 1 0 1 1 1 1 1 0 2
## 1 1 1 1 0 1 1 1 1 0 0 3
## 6 1 1 0 1 1 1 1 1 1 1 1
## 1 1 1 0 1 1 1 0 0 0 0 5
## 1 1 1 0 1 1 0 1 0 0 0 5
## 3 1 0 1 1 1 1 1 1 1 1 1
## 1 1 0 1 0 1 1 1 1 0 0 4
## 1 0 1 1 1 1 1 1 1 1 1 1
## 1 1 1 1 1 1 1 1 1 1 1 1
## 1 4 8 9 16 22 26 31 32 40 190
The pink boxes indicate missing data; they show no pattern to the missingness.
Very few data are missing according to the above plot.
There is some missingness according to the pattern plot and the table above, but no clear trend or pattern. The data are assumed to be missing completely at random (MCAR).
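The amount of missingness can be quantified from the pattern table. The sketch below copies, for each row of the table, the number of subjects and the number of missing variables; the count of 19 variables is read off the two column headers (9 + 10).

```python
# (number of subjects, number of missing variables) for each row of the
# missingness pattern table, in the order the rows appear.
patterns = [
    (593, 0), (17, 1), (13, 2), (12, 1), (2, 2), (1, 1), (2, 2), (6, 3),
    (2, 4), (1, 3), (4, 4), (4, 5), (13, 1), (2, 2), (1, 2), (7, 2),
    (1, 3), (6, 1), (1, 5), (1, 5), (3, 1), (1, 4), (1, 1), (1, 1),
]
N_VARIABLES = 19  # columns in the pattern table (9 + 10)

n_subjects = sum(count for count, _ in patterns)
n_complete = sum(count for count, n_missing in patterns if n_missing == 0)
missing_cells = sum(count * n_missing for count, n_missing in patterns)

pct_complete = round(100 * n_complete / n_subjects, 2)
pct_missing_cells = round(100 * missing_cells / (n_subjects * N_VARIABLES), 2)

print(n_subjects, n_complete, missing_cells)
print(pct_complete, pct_missing_cells)
```

The totals agree with the table: 695 subjects, 593 of them with complete data, and 190 missing values overall.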
Several generalized estimating equation (GEE) models with different working correlation structures will be fitted. The model that best fits our data will be used for the analysis.
| term | estimate | conf.low | conf.high |
|---|---|---|---|
| (Intercept) | 103.55 | 102.24 | 104.85 |
| TGBudesonide | 1.37 | -0.72 | 3.46 |
| TGNedocromil | -0.82 | -2.78 | 1.15 |
| term | estimate | conf.low | conf.high |
|---|---|---|---|
| (Intercept) | 103.75 | 102.45 | 105.05 |
| TGBudesonide | 1.12 | -0.96 | 3.20 |
| TGNedocromil | -1.01 | -2.97 | 0.95 |
| term | estimate | conf.low | conf.high |
|---|---|---|---|
| (Intercept) | 103.41 | 102.09 | 104.72 |
| TGBudesonide | 0.68 | -1.40 | 2.76 |
| TGNedocromil | -1.12 | -3.15 | 0.90 |
| term | estimate | conf.low | conf.high |
|---|---|---|---|
| (Intercept) | 103.68 | 102.38 | 104.97 |
| TGBudesonide | 1.03 | -1.04 | 3.10 |
| TGNedocromil | -1.04 | -3.00 | 0.92 |
| Statistic | Value |
|---|---|
| QIC | 530585.2 |
| QICu | 530567.6 |
| Quasi Lik | -265280.8 |
| CIC | 11.8 |
| params | 3.0 |
| QICC | 530585.2 |
| Statistic | Value |
|---|---|
| QIC | 530623.58 |
| QICu | 530623.53 |
| Quasi Lik | -265308.76 |
| CIC | 3.03 |
| params | 3.00 |
| QICC | 530623.60 |
| Statistic | Value |
|---|---|
| QIC | 531496.11 |
| QICu | 531495.84 |
| Quasi Lik | -265744.92 |
| CIC | 3.14 |
| params | 3.00 |
| QICC | 531496.12 |
| Statistic | Value |
|---|---|
| QIC | 530644.76 |
| QICu | 530644.72 |
| Quasi Lik | -265319.36 |
| CIC | 3.02 |
| params | 3.00 |
| QICC | 530644.86 |
Independence and exchangeable structures have the lowest QICs.
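The QIC comparison can be made explicit by ranking the four fits. The tables above do not state which correlation structure each one belongs to, so the labels "fit 1" to "fit 4" below are placeholders that simply follow the order in which the QIC tables appear.

```python
# QIC values copied from the four tables above, in order of appearance.
# The labels are placeholders; the source does not name the structures here.
qic = {
    "fit 1": 530585.2,
    "fit 2": 530623.58,
    "fit 3": 531496.11,
    "fit 4": 530644.76,
}

best = min(qic.values())
# Sort from lowest (preferred) to highest QIC and report the gap to the best.
ranked = sorted(qic.items(), key=lambda item: item[1])
for label, value in ranked:
    print(label, value, round(value - best, 2))
```

Lower QIC indicates a better trade-off between fit and complexity, so the first two fits in the ranking are the ones the text refers to.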
The correlation matrix of FEV1 across visits gives an idea of the type of correlation the data follow.
| | Month0 | Month2 | Month4 | Month12 | Month16 |
|---|---|---|---|---|---|
| Month0 | 1.00 | 0.78 | 0.76 | 0.74 | 0.67 |
| Month2 | 0.78 | 1.00 | 0.80 | 0.77 | 0.72 |
| Month4 | 0.76 | 0.80 | 1.00 | 0.77 | 0.74 |
| Month12 | 0.74 | 0.77 | 0.77 | 1.00 | 0.81 |
| Month16 | 0.67 | 0.72 | 0.74 | 0.81 | 1.00 |
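Though the correlations decline slightly as visits get further apart (from about 0.8 between adjacent visits to 0.67 between Month0 and Month16), which resembles an autoregressive pattern, an exchangeable structure is considered more suitable for our data.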
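A rough way to compare the two candidate structures is to average the observed correlations by lag and set them against what each structure would imply. This is only a descriptive sketch: it treats lag as the number of visits apart rather than months, and it uses simple averages instead of the residual-based estimates that GEE software produces.

```python
# Correlation matrix copied from the table above (Month0 ... Month16).
corr = [
    [1.00, 0.78, 0.76, 0.74, 0.67],
    [0.78, 1.00, 0.80, 0.77, 0.72],
    [0.76, 0.80, 1.00, 0.77, 0.74],
    [0.74, 0.77, 0.77, 1.00, 0.81],
    [0.67, 0.72, 0.74, 0.81, 1.00],
]
n = len(corr)

# Mean observed correlation for visits that are 1, 2, 3, or 4 visits apart.
by_lag = {}
for lag in range(1, n):
    values = [corr[i][i + lag] for i in range(n - lag)]
    by_lag[lag] = sum(values) / len(values)

# Exchangeable: one common correlation (mean of all off-diagonal entries).
off_diagonal = [corr[i][j] for i in range(n) for j in range(i + 1, n)]
exchangeable_alpha = sum(off_diagonal) / len(off_diagonal)

# AR(1): correlation alpha**lag, with alpha taken as the lag-1 mean.
ar1_alpha = by_lag[1]

for lag, observed in by_lag.items():
    print(lag, round(observed, 3), round(exchangeable_alpha, 3),
          round(ar1_alpha ** lag, 3))
```

Under these assumptions, an AR(1) structure would predict a much steeper drop at the longest lag than the 0.67 observed, while a single common correlation stays close to every observed value.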
FEV1 by Treatment Group
| Term | Estimate | Lower Bound | Upper Bound | P-value |
|---|---|---|---|---|
| (Intercept) | 103.75 | 102.45 | 105.05 | 0.00 |
| TGBudesonide | 1.12 | -0.96 | 3.20 | 0.29 |
| TGNedocromil | -1.01 | -2.97 | 0.95 | 0.31 |
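
The p-values in this table can be approximately recovered from the estimates and confidence bounds. The sketch assumes the intervals are 95% Wald intervals (estimate ± 1.96 × SE), which is common for GEE output but is not stated in the source; the function name is illustrative.

```python
import math


def wald_p_value(estimate, lower, upper, z_crit=1.96):
    """Back out SE, z statistic, and two-sided p-value from a Wald interval."""
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


se_bud, z_bud, p_bud = wald_p_value(1.12, -0.96, 3.20)
se_ned, z_ned, p_ned = wald_p_value(-1.01, -2.97, 0.95)

print(round(se_bud, 2), round(z_bud, 2), round(p_bud, 2))
print(round(se_ned, 2), round(z_ned, 2), round(p_ned, 2))
```

Both recovered p-values round to the tabulated 0.29 and 0.31, and both intervals include zero, so neither active treatment differs significantly from placebo in this model.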
Plot to show FEV1 by time
The plot above confirms linearity of time.
Analysis of time
When time is analyzed as a whole (visit month entered as a single continuous term)
| Term | Estimate | Lower Bound | Upper Bound | P-value |
|---|---|---|---|---|
| (Intercept) | 103.84 | 102.94 | 104.74 | 0.00 |
| visitc | -0.01 | -0.05 | 0.03 | 0.68 |
When time is analyzed separately (visit month entered as a categorical term)
| Term | Estimate | P-value | Lower Bound | Upper Bound |
|---|---|---|---|---|
| (Intercept) | 102.66 | 0.00 | 101.30 | 104.02 |
| TGBudesonide | 1.37 | 0.20 | -0.72 | 3.46 |
| TGNedocromil | -0.82 | 0.41 | -2.79 | 1.14 |
| visitcMonth2 | 1.88 | 0.00 | 1.20 | 2.56 |
| visitcMonth4 | 0.98 | 0.01 | 0.28 | 1.68 |
| visitcMonth12 | 1.26 | 0.00 | 0.56 | 1.96 |
| visitcMonth16 | 0.36 | 0.37 | -0.43 | 1.16 |
The estimates differ depending on whether month is analyzed as a whole or separately. The intercept (beta0) decreased from 104.98 to 102.66, and the coefficients for budesonide and nedocromil increased (became more positive) when month was analyzed separately.
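To see what the model with separate month terms implies, the fitted mean FEV1 for any group and visit is the intercept plus the relevant coefficients. The sketch assumes that Placebo and Month0 are the reference levels, which is inferred from their absence in the coefficient table.

```python
# Coefficients copied from the model with visit month as separate terms.
coef = {
    "(Intercept)": 102.66,
    "TGBudesonide": 1.37,
    "TGNedocromil": -0.82,
    "visitcMonth2": 1.88,
    "visitcMonth4": 0.98,
    "visitcMonth12": 1.26,
    "visitcMonth16": 0.36,
}


def predicted_mean(group, visit):
    """Fitted mean FEV1 (% predicted); Placebo and Month0 assumed as references."""
    value = coef["(Intercept)"]
    if group != "Placebo":
        value += coef["TG" + group]
    if visit != "Month0":
        value += coef["visitc" + visit]
    return round(value, 2)


for visit in ["Month0", "Month2", "Month4", "Month12", "Month16"]:
    print(visit, [predicted_mean(g, visit)
                  for g in ("Placebo", "Budesonide", "Nedocromil")])
```

For example, the fitted placebo mean rises from 102.66 at Month0 to 104.54 at Month2 and returns to 103.02 by Month16.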
This model includes more covariates to account for other factors such as demographic, blood, and environmental characteristics. Backward elimination is used to remove variables that are not statistically significant and have large p-values.
| Term | Estimate | Lower Bound | Upper Bound | P-value |
|---|---|---|---|---|
| (Intercept) | 93.58 | 83.21 | 103.96 | 0.00 |
| visitc12 | 1.22 | 0.52 | 1.92 | 0.00 |
| visitc16 | 0.44 | -0.37 | 1.24 | 0.28 |
| visitc2 | 1.94 | 1.29 | 2.60 | 0.00 |
| visitc4 | 1.08 | 0.39 | 1.77 | 0.00 |
| TGBudesonide | 1.37 | -0.76 | 3.51 | 0.21 |
| TGNedocromil | -0.87 | -2.87 | 1.12 | 0.39 |
| GENDERFemale | 0.33 | -1.43 | 2.09 | 0.71 |
| ETHNICBlack | -0.30 | -3.08 | 2.48 | 0.83 |
| ETHNICHispanic | -1.70 | -4.56 | 1.16 | 0.24 |
| ETHNICOther | 0.11 | -2.82 | 3.04 | 0.94 |
| age_rz | -0.17 | -0.60 | 0.26 | 0.43 |
| hemog | 0.59 | -0.02 | 1.21 | 0.06 |
| wbc | 0.02 | -0.02 | 0.06 | 0.26 |
| anypet1 | 1.80 | -0.08 | 3.68 | 0.06 |
| woodstove1 | -0.84 | -4.26 | 2.58 | 0.63 |
| dehumid1 | 1.11 | -2.28 | 4.49 | 0.52 |
| parent_smokes1 | 6.71 | 2.03 | 11.39 | 0.00 |
| any_smokes1 | -7.03 | -11.54 | -2.52 | 0.00 |
| Statistic | Value |
|---|---|
| QIC | 494934.39 |
| QICu | 494934.06 |
| Quasi Lik | -247448.03 |
| CIC | 19.16 |
| params | 19.00 |
| QICC | 494934.65 |
| Term | Estimate | Lower Bound | Upper Bound | P-value |
|---|---|---|---|---|
| (Intercept) | 104.98 | 101.43 | 108.53 | 0.00 |
| TGBudesonide | 1.15 | -0.94 | 3.23 | 0.28 |
| TGNedocromil | -1.01 | -2.98 | 0.95 | 0.31 |
| visitc | -0.01 | -0.05 | 0.03 | 0.68 |
| age_rz | -0.14 | -0.55 | 0.27 | 0.50 |
| Term | Estimate | Lower Bound | Upper Bound | P-value |
|---|---|---|---|---|
| (Intercept) | 103.95 | 102.37 | 105.54 | 0.00 |
| TGBudesonide | 1.14 | -0.94 | 3.23 | 0.28 |
| TGNedocromil | -0.97 | -2.95 | 1.01 | 0.34 |
| visitc | -0.01 | -0.05 | 0.03 | 0.69 |
| GENDERm | -0.28 | -1.99 | 1.44 | 0.75 |
| Term | Estimate | Lower Bound | Upper Bound | P-value |
|---|---|---|---|---|
| (Intercept) | 103.48 | 100.80 | 106.17 | 0.00 |
| TGBudesonide | 1.12 | -0.96 | 3.20 | 0.29 |
| TGNedocromil | -0.98 | -2.95 | 0.98 | 0.33 |
| visitc | -0.01 | -0.05 | 0.03 | 0.69 |
| ETHNICh | -0.87 | -4.56 | 2.82 | 0.64 |
| ETHNICo | 0.87 | -2.84 | 4.58 | 0.65 |
| ETHNICw | 0.47 | -2.22 | 3.17 | 0.73 |
This model includes an interaction between visit and treatment group, to assess whether the trend in FEV1 over time differs by treatment group.
| Term | Estimate | Lower Bound | Upper Bound |
|---|---|---|---|
| (Intercept) | 103.97 | 102.60 | 105.35 |
| visitc | -0.03 | -0.11 | 0.04 |
| TGBudesonide | 0.64 | -1.57 | 2.84 |
| TGNedocromil | -1.09 | -3.20 | 1.03 |
| visitc:TGBudesonide | 0.07 | -0.03 | 0.17 |
| visitc:TGNedocromil | 0.01 | -0.09 | 0.12 |
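
The interaction terms translate into a separate slope per treatment group. The sketch below works this out from the coefficients in the table, assuming Placebo is the reference group and that visit is measured in months.

```python
# Coefficients copied from the visit-by-treatment interaction model.
coef = {
    "(Intercept)": 103.97,
    "visitc": -0.03,
    "TGBudesonide": 0.64,
    "TGNedocromil": -1.09,
    "visitc:TGBudesonide": 0.07,
    "visitc:TGNedocromil": 0.01,
}


def slope(group):
    """Change in FEV1 (% predicted) per month; Placebo assumed as reference."""
    extra = 0.0 if group == "Placebo" else coef["visitc:TG" + group]
    return coef["visitc"] + extra


def predicted_mean(group, month):
    """Fitted mean FEV1 for a group at a given follow-up month."""
    offset = 0.0 if group == "Placebo" else coef["TG" + group]
    return coef["(Intercept)"] + offset + slope(group) * month


for group in ("Placebo", "Budesonide", "Nedocromil"):
    print(group, round(slope(group), 2), round(predicted_mean(group, 16), 2))
```

The point estimates give a slightly positive slope for budesonide and slightly negative slopes for the other two groups, but the intervals for both interaction terms include zero, so the table gives no evidence that the trends differ.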
The histogram follows an approximately normal curve, supporting normality within our sample.
Treatment Group
The boxplots do not vary much from one treatment group to another.
Follow-up Visit
The boxplots do not vary much from one follow-up month to another.
A linear mixed-effects model (LMM) can also be used to analyze the CAMP dataset. LMMs are usually used for subject-specific inference. Like GEE models, LMMs are selected according to the p-values of the coefficients, AIC, and scientific expertise. After fitting the LMMs, the treatment group estimates were compared between the GEE and LMM models.
The LMM and GEE models show no significant differences from each other; the only variations are in the widths of the 95% CIs. The finalized GEE model seems to be the best fit for our dataset.