Our objective is to seek a model that embodies two types of research questions:
level-1 questions about within-person change
level-2 questions about between-person differences in change.
The level-1 component of the multilevel model, also known as the individual growth model, represents the change we expect each member of the population to experience during the time period under study. (each person has a model across the time period).
\(y_{ij} = \pi_{0i} + \pi_{1i}x_{ij} + \epsilon_{ij}\), where i and j represent individuals and occasions/time.
The above model assumes that a straight line adequately represents each person’s true change over time and any deviations from linearity observed in sample data result from random measurement error (\(\epsilon_{ij}\)).
From real data analysis, we could say the stochastic part of level-1 model (\(\epsilon_{ij}\)) is not just measurement errors, which violates the independence of residuals.
Generally, we assume \(\epsilon_{ij}\) ~ \(N(0, \sigma^2)\). This assumption may be less credible in longitudinal data. For level-1 model, residuals may be autocorrealted and the variance may not equal across time. Because the same person is measured on several occasions, any unexplained person-specific time-invariant effect in the residuals will create a correlation across occasions.
Level-2 model codifies the relationship between inter-individual differences and time-invariant characteristics of the individual; which simultaneously account for both the general patterns (between group differences in intercepts and slopes) and inter-individual patterns within groups.
Here, we assume z is binary variable and is a predictor for the individual change.
Level-2 sub-model is
\(\pi_{0i} = \gamma_{00} + \gamma_{01}z_i+\xi_{0i}\), \(\pi_{1i} = \gamma_{10} + \gamma_{11}z_i+\xi_{1i}\).
We call \(\gamma_{00}\), \(\gamma_{01}\), \(\gamma_{10}\) and \(\gamma_{11}\) as fixed effects.
For z=0, we have \(\pi_{0i} = \gamma_{00} +\xi_{0i}\) and \(\pi_{1i} = \gamma_{10}+\xi_{1i}\). \(\gamma_{00}\) represents the average true initial status and \(\gamma_{10}\) is the average true rate of change.
The multilevel model (z=0) is for addressing: what is the average true change in the population with z=0?
For z=1, we have \(\pi_{0i}=\gamma_{00} + \gamma_{01}+\xi_{0i}\), \(\pi_{1i} = \gamma_{10} + \gamma_{11}+\xi_{1i}\). \(\gamma_{01}\) represents the hypothesized difference in average true initial status between groups; \(\gamma_{11}\) is the difference in average true annual rate of change.
The multilevel model (z=1) is for addressing: what is the difference in the average of trajectory of the true change associate with z=1?
We permit the level-2 residuals to be correlated.
\(\xi_{0i}\) and \(\xi_{1i}\) are the deviations of the individual growth parameters from their population average, their population covariance summarizes the association between the true individual intercepts and slopes. The population covariance of the level-2 residuals \(\sigma_{01}\), summarizes the magnitude and direction of the association between true initial status and true annual rate of change, controlling for the z.
\(\sigma_{01}\) addresses: controlling for z=1,are true initial status and true rate of change related?
We assume \(\xi_{0i}\) and \(\xi_{1i}\) are bivariate normal with mean 0 and unknown variance.
1
The reason we cannot use ANOVA on multilevel data is the residuals are NOT independent.
ICC (Intraclass Correlation \(\rho\)): measure of the proportion of variation in the outcome variable that occurs between groups versus the total variation present.
\(\rho=\frac{\tau^2}{\tau^2 + \sigma^2}\), where \(\tau^2\) is the population variance between clusters and \(\sigma^2\) is the population variance within the cluster. Higher ICC indicates the total variation in the outcome is highly associated with the cluster memberships (a relatively strong relationship among the scores for 2 individuals from the same cluster). Another way to say this is individuals from the same cluster are more alike on the measured variable.
\(\sigma^2\) is the weighted average of within-cluster variances. \(\tau^2\) could also be viewed as the impact of the cluster on the dependent variable; and testing it for significance is equivalent to testing the cluster has no impact on the dependent variable.
Larger ICC values are indicative of a greater impact of clustering. Thus, as the ICC increases, we must be more cognizant of employing multilevel modeling strategies in data analysis.
In practice, if we find that \(\tau^2\) is not 0, we must be careful in describing the relationship between the independent and dependent variables, as it is not the same for all clusters.
Why? - reduce collinearity caused by including an interaction term in a model;
A variant MLE known as restricted MLE has proven more accurate than MLE for estimating variance parameters.
As the number of level-2 cluster increases, the difference in value for MLE and REML estimates become very small.
Some notations:
Level-1 model: \(y_{ij}=\beta_{0j} + \beta_{1j}x + \epsilon_{ij}\); Level-2 model: \(\beta_{0j}=\gamma_{00} + U_{0j}\), \(\beta_{1j}=\gamma_{10}\)
No independent variable.
m1 <- lme(fixed = geread ~ 1, random = ~1|school, data = Achieve)
summary(m1)## Linear mixed-effects model fit by REML
## Data: Achieve
## AIC BIC logLik
## 46274.31 46296.03 -23134.15
##
## Random effects:
## Formula: ~1 | school
## (Intercept) Residual
## StdDev: 0.6257119 2.24611
##
## Fixed effects: geread ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 4.306753 0.05497501 10160 78.3402 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.3229469 -0.6377948 -0.2137753 0.2849664 3.8811630
##
## Number of Observations: 10320
## Number of Groups: 160
The above result shows the ICC is 0.6257/(0.6257+2.2461)=0.218, which means the correlation of reading test scores among students within the same school is 0.22.
m2 <- lme(fixed = geread ~ gevocab, random = ~1|school, data = Achieve)
summary(m2)## Linear mixed-effects model fit by REML
## Data: Achieve
## AIC BIC logLik
## 43145.2 43174.17 -21568.6
##
## Random effects:
## Formula: ~1 | school
## (Intercept) Residual
## StdDev: 0.3158785 1.94074
##
## Fixed effects: geread ~ gevocab
## Value Std.Error DF t-value p-value
## (Intercept) 2.0233559 0.04930868 10159 41.03447 0
## gevocab 0.5128977 0.00837268 10159 61.25850 0
## Correlation:
## (Intr)
## gevocab -0.758
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.0822506 -0.5734728 -0.2103488 0.3206692 4.4334337
##
## Number of Observations: 10320
## Number of Groups: 160
The above code random=~1|school indicates “1” stands for the intercept, and only random intercept model will used and that the random intercept varies within school. Each school will have a model geread ~ gevocab.
model 2 is better than model 1 based on AIC, BIC and LogLik. To estimate the proportion of variance in the outcome variable accounted for at each level of model, for level 1 model, we have the following:
\(R^2=1-\frac{0.3158785 +1.94074}{0.6257119+2.24611}=0.214\), which tells us that model 1 explains around 21% of the variance in the reading score.
For level-2 model, we have \(R_2^2=1-\frac{0.3158785 +1.94074/B}{0.6257119+2.24611/B}\), where B=10320(#. of rows)/160(unique schools)=64.5, which leads to \(R_2^2\) is 47.6%.
The impact of independent variable on the dependent is allowed to vary across the Level 2 effects. In the current example, this would mean that we allow the impact of gevocab on geread to vary from one school to another.
m3 <- lme(fixed = geread ~ gevocab, random=~gevocab|school, data = Achieve)
summary(m3)## Linear mixed-effects model fit by REML
## Data: Achieve
## AIC BIC logLik
## 43004.85 43048.3 -21496.43
##
## Random effects:
## Formula: ~gevocab | school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.5316531 (Intr)
## gevocab 0.1389443 -0.859
## Residual 1.9146626
##
## Fixed effects: geread ~ gevocab
## Value Std.Error DF t-value p-value
## (Intercept) 2.0057064 0.06108786 10159 32.83314 0
## gevocab 0.5203554 0.01441548 10159 36.09699 0
## Correlation:
## (Intr)
## gevocab -0.866
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.7102091 -0.5674433 -0.2074358 0.3176380 4.6774569
##
## Number of Observations: 10320
## Number of Groups: 160
intervals(m3)## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 1.8859621 2.0057064 2.1254506
## gevocab 0.4920982 0.5203554 0.5486126
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: school
## lower est. upper
## sd((Intercept)) 0.4250700 0.5316531 0.6649611
## sd(gevocab) 0.1153701 0.1389443 0.1673356
## cor((Intercept),gevocab) -0.9178709 -0.8585096 -0.7615768
##
## Within-group standard error:
## lower est. upper
## 1.888327 1.914663 1.941365
The result shows that gevocab is significantly related to geread across schools. The estimated coefficient of gecovab is 0.52, which can be interpreted as the average impact of the predictor on the outcome across schools. 0.1389443 represents \(\tau^2_1\) and reflects the variation in coefficients across schools. This tells how the relationship between geread and gevocab varies from school to school. As before, we have \(\sigma^2=1.91466\) and \(\tau^2=0.53165\). Taken together, the largest source of random variation in geread is variation among students within schools, with lesser variation from differences in the conditional mean (intercept) and coefficient for gevocab across shcools.
Adding age: if we like to see whether the age of a student impacts reading performance and want to know this effect to vary from one school to another.
m4 <- lme(fixed = geread ~ gevocab+age, random=~gevocab+age|school, data = Achieve)
summary(m4)## Linear mixed-effects model fit by REML
## Data: Achieve
## AIC BIC logLik
## 43015.77 43088.18 -21497.88
##
## Random effects:
## Formula: ~gevocab + age | school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.492487773 (Intr) gevocb
## gevocab 0.137976606 -0.073
## age 0.006387997 -0.649 -0.601
## Residual 1.914030473
##
## Fixed effects: geread ~ gevocab + age
## Value Std.Error DF t-value p-value
## (Intercept) 2.9614055 0.4151887 10158 7.13267 0.0000
## gevocab 0.5191496 0.0143563 10158 36.16175 0.0000
## age -0.0088390 0.0038396 10158 -2.30208 0.0214
## Correlation:
## (Intr) gevocb
## gevocab -0.095
## age -0.989 -0.032
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.6805531 -0.5687008 -0.2091071 0.3180583 4.6850735
##
## Number of Observations: 10320
## Number of Groups: 160
intervals(m4, which = "fixed")## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 2.14755358 2.961405543 3.775257505
## gevocab 0.49100837 0.519149589 0.547290809
## age -0.01636526 -0.008838966 -0.001312672
## attr(,"label")
## [1] "Fixed effects:"
Age is also significantly associated with geread. Age has negative effect on gevocab. 0.006387997 indicates that the relationship of vocabulary on reading varies more across schools than dose the impact of age.
We add a new layer classroom: we may assume that at least a portion of a students’ performance on a reading test is due to the classroom in which he or she learns (the quality of the teacher, the presence of distuptive students, etc…).
m1 <- lme(fixed = geread ~ 1,random = ~1|school/class, data = dat)
summary(m1)## Linear mixed-effects model fit by REML
## Data: dat
## AIC BIC logLik
## 46154 46182.97 -23073
##
## Random effects:
## Formula: ~1 | school
## (Intercept)
## StdDev: 0.5583923
##
## Formula: ~1 | class %in% school
## (Intercept) Residual
## StdDev: 0.5221676 2.201589
##
## Fixed effects: geread ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 4.30806 0.05499164 9752 78.34027 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.3052003 -0.6289603 -0.2093700 0.3049085 3.8673234
##
## Number of Observations: 10320
## Number of Groups:
## school class %in% school
## 160 568
intervals(m1)## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 4.200265 4.30806 4.415855
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: school
## lower est. upper
## sd((Intercept)) 0.4702517 0.5583923 0.6630533
## Level: class
## lower est. upper
## sd((Intercept)) 0.4545912 0.5221676 0.5997895
##
## Within-group standard error:
## lower est. upper
## 2.170908 2.201589 2.232704
Adding more independent variables.
m2 <- lme(fixed = geread ~ gevocab + clenroll+cenroll, random = ~1|school/class, data = dat)
summary(m2)## Linear mixed-effects model fit by REML
## Data: dat
## AIC BIC logLik
## 43144.87 43195.56 -21565.43
##
## Random effects:
## Formula: ~1 | school
## (Intercept)
## StdDev: 0.2766194
##
## Formula: ~1 | class %in% school
## (Intercept) Residual
## StdDev: 0.3007871 1.922991
##
## Fixed effects: geread ~ gevocab + clenroll + cenroll
## Value Std.Error DF t-value p-value
## (Intercept) 1.6751266 0.20809605 9751 8.04978 0.0000
## gevocab 0.5075566 0.00842654 9751 60.23313 0.0000
## clenroll 0.0189860 0.00955860 407 1.98628 0.0477
## cenroll -0.0000037 0.00000364 158 -1.02193 0.3084
## Correlation:
## (Intr) gevocb clnrll
## gevocab -0.124
## clenroll -0.961 -0.062
## cenroll -0.134 0.025 -0.007
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.2211629 -0.5672782 -0.2079045 0.3183508 4.4736276
##
## Number of Observations: 10320
## Number of Groups:
## school class %in% school
## 160 568
intervals(m2)## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 1.267215e+00 1.675127e+00 2.083038e+00
## gevocab 4.910389e-01 5.075566e-01 5.240744e-01
## clenroll 1.956547e-04 1.898604e-02 3.777642e-02
## cenroll -1.091387e-05 -3.721429e-06 3.471016e-06
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: school
## lower est. upper
## sd((Intercept)) 0.2173971 0.2766194 0.3519749
## Level: class
## lower est. upper
## sd((Intercept)) 0.2409209 0.3007871 0.3755294
##
## Within-group standard error:
## lower est. upper
## 1.896210 1.922991 1.950151
Adding an interaction term.
m3 <- lme(fixed = geread ~ gevocab + clenroll+cenroll+gevocab*cenroll, random = ~1|school/class, data = dat)
summary(m3)## Linear mixed-effects model fit by REML
## Data: dat
## AIC BIC logLik
## 43167.75 43225.69 -21575.88
##
## Random effects:
## Formula: ~1 | school
## (Intercept)
## StdDev: 0.2740961
##
## Formula: ~1 | class %in% school
## (Intercept) Residual
## StdDev: 0.2975919 1.923059
##
## Fixed effects: geread ~ gevocab + clenroll + cenroll + gevocab * cenroll
## Value Std.Error DF t-value p-value
## (Intercept) 1.7515430 0.20999286 9750 8.34096 0.0000
## gevocab 0.4899998 0.01168332 9750 41.94013 0.0000
## clenroll 0.0188007 0.00951172 407 1.97659 0.0488
## cenroll -0.0000132 0.00000563 158 -2.33721 0.0207
## gevocab:cenroll 0.0000023 0.00000107 9750 2.18957 0.0286
## Correlation:
## (Intr) gevocb clnrll cenrll
## gevocab -0.203
## clenroll -0.949 -0.041
## cenroll -0.212 0.542 0.000
## gevocab:cenroll 0.166 -0.693 -0.007 -0.766
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.1901563 -0.5682666 -0.2060729 0.3183307 4.4723839
##
## Number of Observations: 10320
## Number of Groups:
## school class %in% school
## 160 568
intervals(m3)## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 1.339913e+00 1.751543e+00 2.163172e+00
## gevocab 4.670981e-01 4.899998e-01 5.129015e-01
## clenroll 1.025107e-04 1.880075e-02 3.749899e-02
## cenroll -2.427125e-05 -1.315470e-05 -2.038145e-06
## gevocab:cenroll 2.450820e-07 2.339608e-06 4.434134e-06
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: school
## lower est. upper
## sd((Intercept)) 0.2149691 0.2740961 0.349486
## Level: class
## lower est. upper
## sd((Intercept)) 0.2375104 0.2975919 0.3728718
##
## Within-group standard error:
## lower est. upper
## 1.896272 1.923059 1.950225
corp is like district.
m4 <- lme(fixed = geread ~ 1, random = ~1|corp/school/class, data = dat)
summary(m4)## Linear mixed-effects model fit by REML
## Data: dat
## AIC BIC logLik
## 46113.22 46149.43 -23051.61
##
## Random effects:
## Formula: ~1 | corp
## (Intercept)
## StdDev: 0.4209979
##
## Formula: ~1 | school %in% corp
## (Intercept)
## StdDev: 0.295833
##
## Formula: ~1 | class %in% school %in% corp
## (Intercept) Residual
## StdDev: 0.5247746 2.201587
##
## Fixed effects: geread ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 4.32583 0.07197848 9752 60.09894 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.2995234 -0.6304742 -0.2130696 0.3028624 3.9448255
##
## Number of Observations: 10320
## Number of Groups:
## corp school %in% corp
## 59 160
## class %in% school %in% corp
## 568
intervals(m4)## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 4.184738 4.32583 4.466923
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: corp
## lower est. upper
## sd((Intercept)) 0.321723 0.4209979 0.5509065
## Level: school
## lower est. upper
## sd((Intercept)) 0.2003532 0.295833 0.4368144
## Level: class
## lower est. upper
## sd((Intercept)) 0.4578135 0.5247746 0.6015295
##
## Within-group standard error:
## lower est. upper
## 2.170912 2.201587 2.232695
m5 <- lme(fixed = geread ~ gevocab + gender, random = ~gender|school/class, data = dat)
summary(m5)## Linear mixed-effects model fit by REML
## Data: dat
## AIC BIC logLik
## 43127.93 43200.35 -21553.97
##
## Random effects:
## Formula: ~gender | school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.2207298 (Intr)
## gender 0.1101379 -0.019
##
## Formula: ~gender | class %in% school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.302941367 (Intr)
## gender 0.004914493 -0.022
## Residual 1.922517135
##
## Fixed effects: geread ~ gevocab + gender
## Value Std.Error DF t-value p-value
## (Intercept) 2.0150114 0.07570370 9750 26.61708 0.0000
## gevocab 0.5091256 0.00840838 9750 60.54976 0.0000
## gender 0.0175517 0.03929581 9750 0.44666 0.6551
## Correlation:
## (Intr) gevocb
## gevocab -0.527
## gender -0.758 0.039
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.2117003 -0.5676457 -0.2072293 0.3161412 4.4474380
##
## Number of Observations: 10320
## Number of Groups:
## school class %in% school
## 160 568
If we like to hypothesize that the relationship of gender and reading varies across classrooms but not across schools. We have the following model.
m6 <- lme(fixed = geread ~ gevocab + gender, random = list(school=~1, class=~gender), data = dat)
summary(m6)## Linear mixed-effects model fit by REML
## Data: dat
## AIC BIC logLik
## 43125.18 43183.11 -21554.59
##
## Random effects:
## Formula: ~1 | school
## (Intercept)
## StdDev: 0.2737238
##
## Formula: ~gender | class %in% school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.3623693 (Intr)
## gender 0.1651168 -0.563
## Residual 1.9215122
##
## Fixed effects: geread ~ gevocab + gender
## Value Std.Error DF t-value p-value
## (Intercept) 2.0128841 0.07726503 9750 26.05168 0.0000
## gevocab 0.5090473 0.00841459 9750 60.49582 0.0000
## gender 0.0190565 0.03880626 9750 0.49107 0.6234
## Correlation:
## (Intr) gevocb
## gevocab -0.516
## gender -0.768 0.039
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.2117225 -0.5676842 -0.2072084 0.3182780 4.4324392
##
## Number of Observations: 10320
## Number of Groups:
## school class %in% school
## 160 568
Level 1 model: \(Y_{it}=\pi_{0i} + \pi_{1i}T_{it}+ \pi_{2i}X_{it}+\epsilon_{it}\) \(\pi_{0i} = \beta_{00} + \beta_{01}Z_i+ r_{0i}\)
Level 2 model: \(\pi_{1i} = \beta_{10} + r{1i}\) \(\pi_{2i} = \beta_{20} + r_{2i}\)
Now, we have time predictor, time-varying predictor and time-invariant predictors.
Time-varying predictors should appear at level 1 since they are associated with specific measurements, and time-invariant predictors should appear at level 2 or higher because they are associated with individuals across all measurement conditions.
Using stack function in R to transform data to person-period format, shows as the following code:
Lang <- read.csv("Language.csv", header = TRUE)
LangPP <- data.frame(ID=Lang$ID, school = Lang$school, Process = Lang$Process, Application = Lang$Application, Grammar = Lang$Grammar, Mechanics = Lang$Goal4RitScoref08, stack(Lang, select = LangScore1:LangScore6))
names(LangPP)[c(7)] <- "Language"The above code creates a new time variable named “ind”.
Change time variable as continuous numeric variable as follows:
LangPP$Time <- car::Recode(LangPP$ind, '"LangScore1"=0; "LangScore2"=1; "LangScore3"=2; "LangScore4"=3; "LangScore5"=4; "LangScore6"=5;', as.factor=FALSE)
head(LangPP)## ID school Process Application Grammar Mechanics Language ind Time
## 1 1 4 231 217 226 222 224 LangScore1 0
## 2 2 4 213 209 204 220 211 LangScore1 0
## 3 3 4 226 227 220 222 223 LangScore1 0
## 4 4 4 211 223 214 203 213 LangScore1 0
## 5 5 4 218 209 219 217 216 LangScore1 0
## 6 6 4 183 180 188 180 183 LangScore1 0
m1 <- lme(fixed = Language ~ Time, random = ~1|ID, data = LangPP)
summary(m1)## Linear mixed-effects model fit by REML
## Data: LangPP
## AIC BIC logLik
## 135173.6 135204.9 -67582.82
##
## Random effects:
## Formula: ~1 | ID
## (Intercept) Residual
## StdDev: 15.23617 7.526427
##
## Fixed effects: Language ~ Time
## Value Std.Error DF t-value p-value
## (Intercept) 197.21573 0.29356329 15189 671.7997 0
## Time 3.24619 0.03264194 15189 99.4483 0
## Correlation:
## (Intr)
## Time -0.278
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -6.40395610 -0.49863770 0.03877858 0.56691711 4.86362966
##
## Number of Observations: 18228
## Number of Groups: 3038
Adding predictors to the model is the same way as previous stated, whether they are time varying or time invariant.
nd <- LangPP[complete.cases(LangPP), ]
m2 <- lme(fixed = Language ~ Time + Grammar, random = ~1|ID, data = nd)
summary(m2)## Linear mixed-effects model fit by REML
## Data: nd
## AIC BIC logLik
## 130031.1 130070.2 -65010.56
##
## Random effects:
## Formula: ~1 | ID
## (Intercept) Residual
## StdDev: 6.01123 7.526131
##
## Fixed effects: Language ~ Time + Grammar
## Value Std.Error DF t-value p-value
## (Intercept) 73.70355 1.0913675 15179 67.53321 0
## Time 3.24548 0.0326514 15179 99.39795 0
## Grammar 0.63089 0.0055231 3034 114.22629 0
## Correlation:
## (Intr) Time
## Time -0.075
## Grammar -0.991 0.000
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -6.62860687 -0.52600469 0.03741845 0.57880493 4.67614317
##
## Number of Observations: 18216
## Number of Groups: 3036
Grammar is also statistically significantly related to language test scores, meaning that measurement occasions revealing higher Grammar scores also demonstrated higher Language scores.
To allow the growth rate to vary randomly across individuals:
m3 <- lme(fixed = Language ~ Time + Grammar, random = ~Time|ID, data = nd)
summary(m3)## Linear mixed-effects model fit by REML
## Data: nd
## AIC BIC logLik
## 128617.1 128671.7 -64301.54
##
## Random effects:
## Formula: ~Time | ID
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 4.645126 (Intr)
## Time 1.792940 0.026
## Residual 6.737546
##
## Fixed effects: Language ~ Time + Grammar
## Value Std.Error DF t-value p-value
## (Intercept) 54.47082 1.0025930 15179 54.32994 0
## Time 3.24548 0.0437406 15179 74.19834 0
## Grammar 0.72912 0.0050825 3034 143.45691 0
## Correlation:
## (Intr) Time
## Time -0.047
## Grammar -0.993 0.000
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -5.316263678 -0.523957279 -0.001213543 0.536332069 5.032113994
##
## Number of Observations: 18216
## Number of Groups: 3036
intervals(m3)## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 52.5056194 54.4708223 56.4360252
## Time 3.1597459 3.2454828 3.3312197
## Grammar 0.7191527 0.7291182 0.7390837
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: ID
## lower est. upper
## sd((Intercept)) 4.43512829 4.64512564 4.86506607
## sd(Time) 1.72057864 1.79294012 1.86834487
## cor((Intercept),Time) 0.01112055 0.02613533 0.04113833
##
## Within-group standard error:
## lower est. upper
## 6.655244 6.737546 6.820867
Time is significant, indicates that growth rate in language over the 6 time points do differ across individuals in the sample.
Adding a third level of data.
m4 <- lme(fixed = Language ~ Time + Grammar, random = ~1|school/ID, data = nd)
summary(m4)## Linear mixed-effects model fit by REML
## Data: nd
## AIC BIC logLik
## 129906 129952.9 -64947
##
## Random effects:
## Formula: ~1 | school
## (Intercept)
## StdDev: 1.864035
##
## Formula: ~1 | ID %in% school
## (Intercept) Residual
## StdDev: 5.781975 7.526131
##
## Fixed effects: Language ~ Time + Grammar
## Value Std.Error DF t-value p-value
## (Intercept) 74.44782 1.2133745 15179 61.35601 0
## Time 3.24548 0.0326514 15179 99.39795 0
## Grammar 0.62608 0.0059030 3000 106.06152 0
## Correlation:
## (Intr) Time
## Time -0.067
## Grammar -0.951 0.000
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -6.66711402 -0.52788280 0.03525493 0.58085294 4.67186329
##
## Number of Observations: 18216
## Number of Groups:
## school ID %in% school
## 35 3036
Comparing with and without school with ANOVA
anova(m2,m4)## Model df AIC BIC logLik Test L.Ratio p-value
## m2 1 5 130031.1 130070.2 -65010.56
## m4 2 6 129906.0 129952.9 -64947.00 1 vs 2 127.1166 <.0001
Including school leads to better model fit.
\(\bullet\) The corAR1 error structure is a first-order autoregressive error structure for situations when time is measured in fixed intervals \(\bullet\) The corCAR1 error structure is a first-order autoregressive error structure for situations when time is measured in varying intervals \(\bullet\) The corARMA error structure is an error structure that incorporates both an autoregressive component, and a moving average process
If we believe that the relationship among the longitudinal measures follow an autoregressive process we should use the following code:
m5 <- lme(fixed = Language ~ Time, random = ~1|ID, correlation = corAR1(), data = nd)
summary(m5)## Linear mixed-effects model fit by REML
## Data: nd
## AIC BIC logLik
## 134812.9 134852 -67401.47
##
## Random effects:
## Formula: ~1 | ID
## (Intercept) Residual
## StdDev: 15.12726 7.826729
##
## Correlation Structure: AR(1)
## Formula: ~1 | ID
## Parameter estimate(s):
## Phi
## 0.187954
## Fixed effects: Language ~ Time
## Value Std.Error DF t-value p-value
## (Intercept) 196.93995 0.29731496 15179 662.3950 0
## Time 3.32366 0.03678136 15179 90.3625 0
## Correlation:
## (Intr)
## Time -0.309
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -6.04160440 -0.47534079 0.05309946 0.56304378 4.68692543
##
## Number of Observations: 18216
## Number of Groups: 3036
intervals(m5)## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 196.35718 196.939953 197.522726
## Time 3.25156 3.323656 3.395752
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: ID
## lower est. upper
## sd((Intercept)) 14.72836 15.12726 15.53697
##
## Correlation structure:
## lower est. upper
## Phi 0.1646125 0.187954 0.2110851
## attr(,"label")
## [1] "Correlation structure:"
##
## Within-group standard error:
## lower est. upper
## 7.718620 7.826729 7.936352
The point estimate for the autocorrelation is 0.188 with 95% CI (0.165, 0.211), meaning this autocorrelation is significantly different from 0. This result indicates a positive relationship between adjacent pairs of scores so that a relatively higher score at one point in time is associated with a relatively higher socre at the next point in time.
When adding an error structure to a random coefficients model, the random coefficients structure must be specified in the syntax for specifying the correlation as follows:
m6 <- lme(fixed = Language ~ Time, random = ~Time|ID, correlation = corAR1(form = ~Time|ID), data = nd)
summary(m6)## Linear mixed-effects model fit by REML
## Data: nd
## AIC BIC logLik
## 133477.4 133532.1 -66731.7
##
## Random effects:
## Formula: ~Time | ID
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 18.405189 (Intr)
## Time 1.876833 -0.729
## Residual 6.567955
##
## Correlation Structure: AR(1)
## Formula: ~Time | ID
## Parameter estimate(s):
## Phi
## -0.08333152
## Fixed effects: Language ~ Time
## Value Std.Error DF t-value p-value
## (Intercept) 197.34089 0.3439154 15179 573.8064 0
## Time 3.21335 0.0436252 15179 73.6582 0
## Correlation:
## (Intr)
## Time -0.677
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -5.79781447 -0.49041475 0.01559593 0.52065744 5.17467096
##
## Number of Observations: 18216
## Number of Groups: 3036
intervals(m6)## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 196.666771 197.340887 198.015003
## Time 3.127841 3.213352 3.298863
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: ID
## lower est. upper
## sd((Intercept)) 17.9215268 18.4051892 18.9019047
## sd(Time) 1.7972087 1.8768326 1.9599842
## cor((Intercept),Time) -0.7527637 -0.7290973 -0.7035486
##
## Correlation structure:
## lower est. upper
## Phi -0.1136616 -0.08333152 -0.0528463
## attr(,"label")
## [1] "Correlation structure:"
##
## Within-group standard error:
## lower est. upper
## 6.473317 6.567955 6.663977
Using package lattice.
dat <- Achieve[Achieve$corp == 940 & Achieve$school==767, ]
dotplot(class ~ geread, data = dat, jitter.y=TRUE)We could re-order the classroom with ascending order of geread.
dotplot(reorder(class, geread)~geread, data = dat)Classroom 3 is more homogeneous (smaller variance) and lower performing (smaller mean) than classroom 2 and 4.
One method for visualizing such large and complex data would be to focus on a higher level of data aggregation.
Achieve <- cbind(Achieve, Classroom_unique = paste(Achieve$corp, Achieve$school, Achieve$class, sep=""))
Achieve.class <- aggregate(Achieve, by = list(Achieve$Classroom_unique), FUN = mean)
dotplot(reorder(corp, geread)~geread, data = Achieve.class, jitter.y=TRUE)The above aggregating is just for plot, NOT for modeling the data.