Assignment 4: Longitudinal and Multilevel Data Analysis

1) Calculate the mean and 95% confidence interval for blood lead level at each time point.


	week	bloodlead.mean	bloodlead.lower	bloodlead.upper
1	baseline	26.541	25.084	27.998
2	week1	13.496	11.270	15.722
3	week4	15.433	13.160	17.705
4	week6	20.757	18.074	23.441

2) Create a spaghetti plot of all blood lead levels

Consistent with the table above, the plot suggests that blood lead level does not have a linear relationship with time. The baseline measurements are the highest; levels dip at week 1 and then rise at weeks 4 and 6. This roughly parabolic trend suggests that a quadratic time term might suit these data better than a linear one.

3) Linear regression:

3a) Perform linear regression on blood lead level as a function of the number of weeks of treatment. Be sure to account for clustering by individual, allowing for both random intercept and random slope.

## Linear mixed-effects model fit by REML
##  Data: bloodpb_long 
##        AIC      BIC    logLik
##   1415.579 1435.186 -701.7894
## 
## Random effects:
##  Formula: ~time | ID
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 3.2932693 (Intr)
## time        0.4814887 1     
## Residual    7.7994689       
## 
## Fixed effects: bloodlead ~ time 
##                 Value Std.Error  DF   t-value p-value
## (Intercept) 20.247724 0.9717970 146 20.835342  0.0000
## time        -0.433124 0.2435181 146 -1.778611  0.0774
##  Correlation: 
##      (Intr)
## time -0.497
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.7999173 -0.7190207  0.1124234  0.5199816  4.4078019 
## 
## Number of Observations: 196
## Number of Groups: 49

The fixed-effect estimate for time is -0.43, which is the average change in blood lead level per week of treatment after accounting for clustering within individuals. That is, each one-week increase in time corresponds to a 0.43-unit decrease in mean blood lead level. However, this estimate is not statistically significant at the 0.05 level (p = 0.0774).
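As a quick check on the time coefficient, the t statistic and an approximate 95% interval can be recomputed from the estimate and standard error printed in the output. This is a minimal sketch: it uses a normal quantile in place of the t quantile on 146 degrees of freedom (an assumption made here so that only the standard library is needed), so the interval is very slightly narrower than the one the fitted model would report.

```python
from statistics import NormalDist

# Values copied from the "Fixed effects" table of the first model.
estimate = -0.433124
std_error = 0.2435181

# t statistic: the estimate divided by its standard error.
t_value = estimate / std_error

# Normal-approximation 95% interval (assumption: z replaces the t quantile
# with 146 degrees of freedom, which is only marginally larger).
z = NormalDist().inv_cdf(0.975)
ci_lower = estimate - z * std_error
ci_upper = estimate + z * std_error

print(f"t = {t_value:.3f}")
print(f"approx. 95% CI: ({ci_lower:.3f}, {ci_upper:.3f})")
```

The interval spans zero, which agrees with the p-value of 0.0774 being above 0.05.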

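The output also describes the random effects. The random-intercept variance (the standard deviation squared), 10.85, is the variability of the intercept across individuals, that is, the amount of variance attributable to baseline differences between individuals. The random-slope variance for time, 0.23, is the variability of the slope across individuals, that is, how much the weekly change in blood lead level differs from child to child. The residual variance, 60.83, is the remaining variation of observations around each individual's fitted line. The estimated correlation between the random intercept and slope is 1, which suggests the random-effects covariance estimate lies on the boundary of its parameter space, so these components should be interpreted with caution.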
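The variance components discussed here are the squares of the standard deviations printed in the "Random effects" block. A short sketch of that conversion, using only the numbers from the output (the dictionary keys are labels chosen for this illustration):

```python
# Standard deviations copied from the "Random effects" block of the first model.
std_devs = {
    "intercept": 3.2932693,
    "time": 0.4814887,
    "residual": 7.7994689,
}

# A variance is the square of the corresponding standard deviation.
variances = {name: sd ** 2 for name, sd in std_devs.items()}

for name, var in variances.items():
    print(f"{name}: variance = {var:.2f}")
```

Note that the residual variance is by far the largest of the three components.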

## Linear mixed-effects model fit by REML
##  Data: bloodpb_long 
##       AIC      BIC    logLik
##   1288.02 1336.882 -629.0099
## 
## Random effects:
##  Formula: ~week | ID
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev   Corr                
## (Intercept) 4.368967 (Intr) wekwk1 wekwk4
## weekweek1   6.397047 -0.118              
## weekweek4   6.643135 -0.125  0.769       
## weekweek6   7.265425  0.138  0.356  0.287
## Residual    2.578185                     
## 
## Fixed effects: bloodlead ~ week 
##                  Value Std.Error  DF   t-value p-value
## (Intercept)  26.540816 0.7247084 144  36.62275       0
## weekweek1   -13.044898 1.0518816 144 -12.40149       0
## weekweek4   -11.108163 1.0825643 144 -10.26097       0
## weekweek6    -5.783673 1.1612843 144  -4.98041       0
##  Correlation: 
##           (Intr) wekwk1 wekwk4
## weekweek1 -0.266              
## weekweek4 -0.267  0.705       
## weekweek6 -0.055  0.387  0.333
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -1.05817575 -0.29992778 -0.06805226  0.24769639  1.88721251 
## 
## Number of Observations: 196
## Number of Groups: 49

A second model treated time as a categorical variable. The estimated mean blood lead level is 26.54 at baseline, 13.50 at week 1, 15.43 at week 4, and 20.76 at week 6. All three week coefficients, which are differences from baseline, are statistically significant at the 0.05 level and are notably larger in magnitude than the coefficient for the numeric time variable in the first model. Using the “week” variable also reduced the residual standard deviation, from 7.80 to 2.58. In addition, the lower AIC (1288.02 versus 1415.58) and BIC (1336.88 versus 1435.19) suggest that the second model fits the data better, although both models were fit by REML with different fixed effects, so this comparison is best confirmed by refitting with maximum likelihood.
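With a categorical week variable and baseline as the reference level, the intercept is the baseline mean and each week coefficient is a difference from baseline. The sketch below adds the printed coefficients to the intercept to recover the week-specific means; the variable names are illustrative only.

```python
# Fixed-effect estimates copied from the second model's output.
intercept = 26.540816  # mean at baseline (reference level)
offsets = {
    "week1": -13.044898,
    "week4": -11.108163,
    "week6": -5.783673,
}

# Mean at each week = baseline mean + that week's coefficient.
means = {"baseline": intercept}
means.update({week: intercept + offset for week, offset in offsets.items()})

for week, mean in means.items():
    print(f"{week}: {mean:.3f}")
```

The recovered means agree with the descriptive means in the table for question 1.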

3b) Is there evidence of individual variability in the intercept or slope?

In the first model, there appears to be individual variability in both the intercept and the slope: the random-slope standard deviation (0.48) is as large as the fixed-effect estimate for time (-0.43) itself. The residual standard deviation (7.80) indicates high variability of blood lead levels around each child’s individual regression line. The intercept standard deviation (3.29) also indicates considerable individual variability in the intercept. Because the estimated intercept variance is not zero, this suggests that a random effect is needed to account for unmeasured individual-level factors that affect each child’s blood lead level.

Individual variability is also evident in the second model with the categorical week variable. Although the residual standard deviation is smaller than in the first model, the estimated standard deviations of the random intercept (4.37) and of the random week effects (6.40 to 7.27) are large. The approximate 95% confidence intervals for these standard deviations exclude zero, but they are constructed to be positive and are very wide (for example, 0.52 to 36.37 for the intercept), so they offer only limited further evidence of individual variability.

## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                  lower       est.      upper
## (Intercept)  25.108376  26.540816  27.973257
## weekweek1   -15.124021 -13.044898 -10.965775
## weekweek4   -13.247933 -11.108163  -8.968394
## weekweek6    -8.079039  -5.783673  -3.488308
## attr(,"label")
## [1] "Fixed effects:"
## 
##  Random Effects:
##   Level: ID 
##                                 lower       est.      upper
## sd((Intercept))             0.5248365  4.3689672 36.3691847
## sd(weekweek1)               0.9060366  6.3970470 45.1661806
## sd(weekweek4)               1.0812319  6.6431355 40.8157109
## sd(weekweek6)               1.5906805  7.2654254 33.1847961
## cor((Intercept),weekweek1) -0.9880020 -0.1180442  0.9807889
## cor((Intercept),weekweek4) -0.9854442 -0.1251083  0.9760417
## cor((Intercept),weekweek6) -0.9941429  0.1378789  0.9966336
## cor(weekweek1,weekweek4)   -0.9145553  0.7690160  0.9984794
## cor(weekweek1,weekweek6)   -0.2238389  0.3559475  0.7496560
## cor(weekweek4,weekweek6)   -0.4296094  0.2870889  0.7818837
## 
##  Within-group standard error:
##        lower         est.        upper 
## 6.093211e-03 2.578185e+00 1.090892e+03
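One rough way to gauge how precisely the standard deviations above are estimated is the ratio of each interval's upper limit to its lower limit. This ratio is an illustrative diagnostic chosen for this sketch, not something the model output reports; the limits are copied from the intervals above.

```python
# (lower, upper) limits copied from the approximate 95% confidence intervals.
intervals = {
    "sd((Intercept))": (0.5248365, 36.3691847),
    "sd(weekweek1)": (0.9060366, 45.1661806),
    "sd(weekweek4)": (1.0812319, 40.8157109),
    "sd(weekweek6)": (1.5906805, 33.1847961),
    "residual": (6.093211e-03, 1.090892e+03),
}

# Upper/lower ratio: larger values mean a less precisely estimated parameter.
ratios = {name: upper / lower for name, (lower, upper) in intervals.items()}

for name, ratio in ratios.items():
    print(f"{name}: upper/lower = {ratio:,.1f}")
```

Every ratio exceeds 20, and the residual interval spans several orders of magnitude, so the random-effects structure of this model is estimated with very little precision from four observations per child.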

3c) Does a linear model fit the data reasonably well?

The residual plots suggest that both models fit the data reasonably well but with some bias (although the pattern in the residual plot for model 2 appears to be driven by an outlier). The Q-Q plots of the residuals suggest that model 2 improves on model 1, with residuals closer to a normal distribution in the middle of the range. Although heavy tails remain at both ends of the distribution, the linear model may be adequate for these data.

4) Compute and interpret the intraclass correlation coefficient for blood lead level.

The intraclass correlation coefficient is 0.24.

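That is, 24% of the total variance in blood lead level can be attributed to clustering within individuals, which is equivalent to saying that the correlation between two measurements on the same individual is 0.24. This suggests a need for a model that accounts for this clustering, such as a mixed-effects model with random effects for individuals, rather than a fixed-effects-only model.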
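For reference, the intraclass correlation is conventionally computed as the between-individual variance divided by the total variance. The write-up does not show the variance components behind the reported 0.24, so the sketch below assumes the usual random-intercept definition and uses hypothetical variances chosen only so that the ratio equals 0.24; the function name `icc` is likewise illustrative.

```python
def icc(var_between: float, var_within: float) -> float:
    """Share of total variance attributable to between-individual differences."""
    total = var_between + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total


# Hypothetical variance components (NOT taken from the model output),
# picked only so that the ratio reproduces the reported value.
example = icc(var_between=24.0, var_within=76.0)
print(f"ICC = {example:.2f}")
```

An ICC of 0 would mean that repeated measurements on the same individual are no more alike than measurements on different individuals, while a value near 1 would mean that almost all variation lies between individuals.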