Cross-sectional contextual effects models offer a method to estimate the mean difference in an outcome associated with a change in the group mean of a covariate while holding constant the individual value (Iversen 1991; Blalock 1984). This coding strategy can be extended to longitudinal data, whereby researchers can estimate the impact of contextual variables on student growth. This document offers a model for estimating these effects with cross-sectional and cross-classified longitudinal research designs
In the context of student data, where students were sampled within educational clusters such as schools or classrooms, contextual effects models allow researchers to estimate the average change in an outcome associated with increasing the cluster mean of a characteristic while holding constant the within-group dynamics.
If the data are structured such that there are \(j = \{1, 2, \dots, m\}\) clusters (such as schools) each with \(i = \{1, 2, \dots, n\}\) units (such as students) and the data generating process of \(y\) is involves a covariate, \(x\), with a fixed within group effect, \(\lambda\), and an independent contextual effect, \(\gamma\), then a mixed regression model can be estimated in the form of
\[ y_{ij} = \alpha + \lambda x_{ij} + \gamma \bar{x}_j + u_j + e_{ij} \]
Group-mean centering the level-1 value of \(x\) changes the slope of \(\bar{x}_j\) to be the between effect, \(\beta\), which is the slope of the group-mean of \(y\) on the group mean of \(x\)
\[ y_{ij} = \alpha + \lambda \left(x_{ij}-\bar{x}_j\right) + \beta \bar{x}_j + u_j + e_{ij}\]
These models are related in such a way that the estimate of within effect, \(\lambda\), plus the contextual effect, \(\gamma\), equals the between effect, \(\beta\).
For example, examine the data in Table 1, which are a small sample from the 1998 ECLS-K survey. In these data, there are 4 schools, each with 4 students. For each student, the data has recorded their spring math T-score and the students’ parents’ highest years of education. For each school, we have also recorded the average parents’ highest years of education.
Table 1: ECLS Data
## school SpngMath PrntEduc MeanPrntEduc
## 1 766 38 11 11.50
## 2 766 38 11 11.50
## 3 766 37 12 11.50
## 4 766 60 12 11.50
## 5 214 35 12 13.50
## 6 214 45 12 13.50
## 7 214 55 14 13.50
## 8 214 59 16 13.50
## 9 452 47 14 14.00
## 10 452 49 14 14.00
## 11 452 50 14 14.00
## 12 452 58 14 14.00
## 13 1212 67 16 17.75
## 14 1212 64 17 17.75
## 15 1212 69 18 17.75
## 16 1212 64 20 17.75
To build the intuition for these contextual models, consider the following regression models that produce unique effects. The first model will estimate the overall effect, \(\pi\) by using as the covariate the unit-level value of \(x\), \(x_{ij}\) to predict \(y\). The second model estimates the between-group effect, \(\beta\), by using the group-level average value of \(x\), \(\bar{x}_j\), to predict \(y\).
The first model estimates the overall effect, \(\pi\), and is estimated with \[ y_{ij} = \alpha + \pi x_{ij} + e_{ij}. \] For example, the overall effect for the data in Table 1 is presented in Table 2.
Table 2: Overall and Between Effect of Parental Education on Math T-Scores
## [1] "Overall Effect"
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.57474 9.69296 0.1625 0.8732635
## PrntEduc 3.56742 0.67259 5.3040 0.0001114 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## [1] "Between Effect"
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.60042 12.17656 -0.0493 0.9613692
## MeanPrntEduc 3.72073 0.84758 4.3898 0.0006168 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The overall and between effects can be related to the within and contextual effects by the following identity \[ b_{yx|z} = \frac{b_{yx}-b_{yx}b_{zx}}{1-b_{xz}b_{zx}} \] where \(b_{\mbox{(Dependent, Independent)}}\) is the regression slope for the independent variable predicting the dependent variable. We can calculate the within and contextual effect with knowledge of how the unit-level value and the group-mean relate. The first relationship is the value of \(\eta^2\), which is the classic effect size from the ANOVA framework (Cohen 1992) defined as the ratio of the sum of squares between the groups to the sum of squares total,
\[\eta^2_x = \frac{SSB_x}{SST_x}. \]
This measures the level of clustering in variable and ranges from 0 to 1. High values indicate that the values of the variable cluster around the group means to a large extent, while low values indicate that group means are very similar. This value is also the regression slope of the unit-level values when predicting the group means, and thus it comprises one of the bi-variate regression relationships. The final bivariate relationship is the slope of the group means when predicting the unit-level values. because of how regression works (conditional means), this slope is by definition 1.
Table 3: ANOVA Table for Parental Education
## Loading required package: lsr
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(school) 3 81.69 27.229 15.75 0.000184 ***
## Residuals 12 20.75 1.729
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## eta.sq eta.sq.part
## factor(school) 0.7974375 0.7974375
With these effects defined, it is possible to understand the nature of the within and contextual effects. Using the value of \(\eta^2\) as it relates to the predictor, the overall effect (\(\pi\)) of \(x\) predicting \(y\) and between effect (\(\beta\)) of \(\bar{x}_j\) predicting \(y\), the within-group effect, \(\lambda\), can be computed as
\[ \lambda = \frac{\pi-\beta\eta^2}{1-\eta^2}. \]
Conceptually, this indicates that the within effect, \(\lambda\), is the overall effect, \(\pi\), net of the between effect, \(\beta\), scaled by how clustered the predictor is, \(\eta^2\). If the predictor isn’t clustered (i.e., \(\eta^2 = 0\)) then the within effect is the overall effect.
For example, the within-group effect for the ECLS data can be calculated as
(overall.effect-between.effect*eta.2)/(1-eta.2)
## PrntEduc
## 2.963855
Which is equivalent to an econometric fixed effects regression that enters dummy variables for each group
printCoefmat(coefficients(summary(lm(SpngMath~PrntEduc+factor(school)))))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.48795 22.69971 0.3739 0.7156
## PrntEduc 2.96386 1.65803 1.7876 0.1014
## factor(school)452 1.01807 5.40451 0.1884 0.8540
## factor(school)766 0.67771 6.28631 0.1078 0.9161
## factor(school)1212 4.90361 8.84174 0.5546 0.5903
or group mean centering the predictor
printCoefmat(coefficients(summary(lm(SpngMath~I(PrntEduc-MeanPrntEduc)))))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 52.1875 2.8111 18.5645 2.944e-11 ***
## I(PrntEduc - MeanPrntEduc) 2.9639 2.4685 1.2007 0.2498
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The contextual effect, \(\gamma\) can be expressed as \[ \gamma = \frac{\beta - \pi}{1-\eta^2}. \] This is the between effect, \(\beta\), minus the overall effect \(\pi\), scaled by \(1-\eta^2\). If the predictor isn’t clustered (i.e., \(\eta^2 = 0\)) then the contextual effect is the difference between the between effect and the overall effect.
For example we can calculate the contextual effect as
(between.effect-overall.effect)/(1-eta.2)
## MeanPrntEduc
## 0.7568791
To estimate the contextual effect in the context of regression, we can fit an OLS model that includes both the (uncentered) unit-level value and the group mean
\[ y_{ij} = \alpha + \lambda x_{ij} + \gamma \bar{x}_j + e_{ij}\]
Where we can see that the OLS estimates in Table 4 reproduce our “by-hand” calculations.
Table 4: OLS Contextual Effects Regression
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.60042 11.14665 -0.0539 0.95786
## PrntEduc 2.96386 1.53946 1.9253 0.07636 .
## MeanPrntEduc 0.75688 1.72393 0.4390 0.66784
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Finally, we can also estimate the between effect in a similar regression model that group-mean centers the unit-level value
\[ y_{ij} = \alpha + \lambda\left(x_{ij}-\bar{x}_j\right) + \beta \bar{x}_j + e_{ij}\]
Table 5: OLS Regression Estimating Within and Between Effects
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.60042 11.14665 -0.0539 0.9578612
## I(PrntEduc - MeanPrntEduc) 2.96386 1.53946 1.9253 0.0763561 .
## MeanPrntEduc 3.72073 0.77589 4.7954 0.0003497 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can also see the relationship between the between effect, within-effect, and contextual effect with the following identities
\[ \pi = \lambda\left(1-\eta^2\right) + \beta\eta^2 \]
which tells us that the overall effect is the within-effect and the between effect weighted by the level of clustering in the predictor; that is, the more clustering there is in the predictor, the more weight is given to the between effect.
Finally, we can see from the following how the sum of the within and contextual effect is the between effect
\[ \lambda+\gamma = \frac{\pi-\beta\eta^2+\beta-\pi}{1-\eta^2}=\beta \]
At the very least, researchers should employ a generalized least squares estimator to produce the regression slopes and cluster-robust standard errors for the slopes. A typical method used in education research is the Hierarchical Linear Model (HLM, Raudenbush and Bryk 2002), also known as a mixed model. In mixed notation, the regression model to estimate contextual effects is similar to the model above, except now we add a random effect for the intercept (we make the assumption that the within-group effect is fixed, but this can certainly be relaxed). Thus, the multilevel (mixed) model is
\[ y_{ij} = \alpha + \lambda x_{ij} + \gamma \bar{x}_j + u_j + e_{ij}\]
and the intraclass correlation (\(\rho\)) is defined as
\[ \rho = \frac{\mbox{var}\left(u_j\right)}{\mbox{var}\left(u_j\right)+\mbox{var}\left(e_{ij}\right)}\]
The interpretation of the contextual effect, \(\gamma\) is straightforward. Literally, by way of regression, the slope \(\gamma\) for \(\bar{x}_j\) in the model that controls for \(x_{ij}\) is the impact of an increase in the group mean while holding constant the unit-level value. This can be seen in Figure 1, which plots six observations across 3 groups, each with a solid line indicating the within-group regression as a solid line and the between group regression (short dashes) that fits the group means.
In this case, the between group slope is 0.75 and the within group effect is 0.25. The contextual effect is then 0.5, represented by the long-dashed line that compares the value of the higher value of \(x\) in group 2 with the same value of \(x\) in the group with the higher mean. That vertical difference is the mean difference in \(y\) associated with a single unit change in the group mean.
Figure 1: Conceptual Representation of Contextual Effects Model
We can simulate this with code to specify \(\lambda\), \(\gamma\), and \(\rho\) to partition the random variance into within and between error.
sim.context.ml <- function(n,m,lambda,gamma,rho) {
#initialize data
data <- c()
#for each group, create unit datasets
for (j in 1:m) {
#group ids
group <- rep(j,n)
#random x variable
x <- rnorm(n)
#group mean
xbar <- rep(mean(x),n)
#define outcome
y <- lambda*x + gamma*xbar +
rnorm(n, sd = sqrt(1-rho)) + rep(rnorm(1, sd = sqrt(rho)),n)
#compile data
data <- rbind(data,cbind(group,x,xbar,y))
}
data <- as.data.frame(data)
return(data)
}
for example, we can specify a dataset with 3 groups and 3 units per group
## group x xbar y
## 1 1 0.3163503 -0.67433177 1.08672957
## 2 1 -1.4567236 -0.67433177 0.64019711
## 3 1 -0.8826220 -0.67433177 -0.47218334
## 4 2 -0.1989137 -0.70238225 -1.02027723
## 5 2 -1.1677561 -0.70238225 -1.61767364
## 6 2 -0.7404770 -0.70238225 -1.20539758
## 7 3 -0.1891049 -0.09788458 0.04633735
## 8 3 0.2791218 -0.09788458 -1.20656656
## 9 3 -0.3836707 -0.09788458 0.07809291
Suppose we simulate data whereby \(\lambda = 0.5\), \(\gamma = 0.25\), \(\rho\) is 0.1, and run a mixed model on that simulated data where 100 groups have 25 units each
require(lme4)
## Loading required package: lme4
## Loading required package: Matrix
## Loading required package: Rcpp
model <- lmer(y ~ x + xbar + (1 | group),
data = sim.context.ml(n = 25,
m = 100,
lambda = .5,
gamma = .25,
rho = .1))
summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ x + xbar + (1 | group)
## Data:
## sim.context.ml(n = 25, m = 100, lambda = 0.5, gamma = 0.25, rho = 0.1)
##
## REML criterion at convergence: 6895
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.7463 -0.6347 0.0081 0.6727 2.9491
##
## Random effects:
## Groups Name Variance Std.Dev.
## group (Intercept) 0.1131 0.3364
## Residual 0.8681 0.9317
## Number of obs: 2500, groups: group, 100
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.03808 0.03867 0.985
## x 0.48948 0.01886 25.959
## xbar -0.02767 0.20141 -0.137
##
## Correlation of Fixed Effects:
## (Intr) x
## x 0.000
## xbar 0.106 -0.094
Table 6 presents the means and standard deviations for the parameters based on 30 replications of the same model.
Table 6: Mean and Standard Deviation of 50 simulations of a Cross-sectional Contextual Model
## Loading required package: parallel
## mean sd
## lambda (.5) 0.5017306 0.01820253
## gamma (.25) 0.2328467 0.19029598
## rho (.1) 0.1010155 0.01632070
Before we consider contextual effects on growth, we must first consider the case of longitudinal data often encountered in an educational research context.
We have \(j = \{1, 2, \dots, m\}\) groups, \(i = \{1, 2, \dots, n\}\) units, and each unit is observed over t = {1, 2, , k} time points. However, each unit may encounter a different group at different time points. This creates a challenge because not only will the group have an effect on the baseline score, but will also have an impact on the growth trajectory.
To simplify the model, suppose the following:
Thus, the model is the following mixed regression:
\[ y_{tij} = \alpha + \left(\zeta + g_i + w_{j(t)} \right) t + r_i + u_j + e_{tij} \]
The measure of the variability in the growth slopes are based on the \(\omega\) parameter (Hedges and Rhoads 2010). This measure is a ratio of the variance in the slope at one level to the variance of the intercept at the same level. Thus,
\(\omega_2\) is defined as
\[\omega_2 = \frac{\mbox{var}\left(g_i\right)}{\mbox{var}\left(r_i\right)} \]
which makes the variation in the slope at the unit level
\[ \mbox{var}\left(g_i\right) = \omega_2 \mbox{var}\left(r_i\right)\]
\(\omega_3\) is defined as
\[\omega_3 = \frac{\mbox{var}\left(w_j\right)}{\mbox{var}\left(u_j\right)} \]
which makes the variation in the slope at the group level
\[ \mbox{var}\left(w_j\right) = \omega_3 \mbox{var}\left(u_j\right)\]
Variation in the baseline score is of course captured by the ICCs at the unit (\(\rho_2\)) and group (\(\rho_3\)) levels
\(\rho_2\) is defined as
\[\rho_2 = \frac{\mbox{var}\left(r_i\right)}{\mbox{var}\left(u_j\right)+\mbox{var}\left(r_i\right)+\mbox{var}\left(e_{ijk}\right)}\]
\(\rho_3\) is defined as
\[\rho_2 = \frac{\mbox{var}\left(u_j\right)}{\mbox{var}\left(u_j\right)+\mbox{var}\left(r_i\right)+\mbox{var}\left(e_{ijk}\right)}\]
With these parameters defined, it is possible to simulate data.
The function to simulate the data for this model is
sim.cross.growth <- function(n, m, k, zeta,
omega.2, omega.3,
rho.2, rho.3) {
#set each group's random effects
groupdata <- cbind(group = seq(from = 1, to = m),
u = rnorm(m,sd = sqrt(rho.3)),
w = rnorm(m,sd = sqrt(omega.3*rho.3)))
#set each units's random effects
unitdata <- cbind(unit = seq(from = 1, to = n),
r = rnorm(n,sd = sqrt(rho.2)),
g = rnorm(n,sd = sqrt(omega.2*rho.2)))
#initialize the time dataset
timedata <- c()
for (i in 1:n) {
#for each unit, initialize the set of groups
groupset <- c()
for (t in 1:k) {
#for each timepoint, randomly pick a group
groupset <- c(groupset, sample(1:m, 1))
}
#create the unit's time data
timedata <- rbind(timedata, cbind(group = groupset,
unit = rep(i,k),
time = seq(from = 0, to = k - 1),
e = rnorm(k, sd = sqrt(1-(rho.3+rho.2)))))
}
#merge everything together
data <- data.frame(merge(merge(timedata, groupdata),unitdata))
#calculate the outcome
data$y <- (zeta + data$g + data$w)*data$t + data$r + data$u + data$e
#keep only the needed variables
data <- data.frame(cbind(unit = data$unit, t = data$t,
group = data$group, y = data$y))
#return dataset
return(data)
}
Thus, we can make a simple dataset where each unit is randomly nested in a group for each timepoint. For example, suppose 4 units, each with three timepoints, entered in to one of three groups at any given timepoint.
## unit t group y
## 1 1 0 1 -0.1597215
## 2 1 1 2 0.5542255
## 3 1 2 3 0.2034560
## 6 2 0 1 1.5625844
## 5 2 1 1 1.5351643
## 4 2 2 1 2.4548574
## 7 3 0 1 0.5175272
## 8 3 1 1 0.7124162
## 9 3 2 2 -0.1134085
## 12 4 0 3 -0.5826133
## 11 4 1 2 0.5325167
## 10 4 2 1 0.5074660
With a large sample simulation, we can run a cross-classified model. Here is one with 2,500 units nested in one of 100 groups per timepoint, with 3 timepoints per unit. The fixed slope is 0.5, and the \(\omega\)’s and \(\rho\)’s control the variance components.
require(lme4)
model <- lmer(y ~ t + (1 + t | group) + (1 + t | unit),
data = sim.cross.growth(n = 2500,
m = 100,
k = 3,
zeta = .5,
omega.2 = .05,
omega.3 = .1,
rho.2 = .6,
rho.3 = .1))
summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ t + (1 + t | group) + (1 + t | unit)
## Data:
## sim.cross.growth(n = 2500, m = 100, k = 3, zeta = 0.5, omega.2 = 0.05,
## omega.3 = 0.1, rho.2 = 0.6, rho.3 = 0.1)
##
## REML criterion at convergence: 18030
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.9676 -0.5381 0.0061 0.5384 3.3871
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## unit (Intercept) 0.629763 0.79358
## t 0.050771 0.22532 -0.11
## group (Intercept) 0.078002 0.27929
## t 0.008565 0.09255 0.02
## Residual 0.282449 0.53146
## Number of obs: 7500, groups: unit, 2500; group, 100
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.04800 0.03360 -1.43
## t 0.48712 0.01284 37.93
##
## Correlation of Fixed Effects:
## (Intr)
## t -0.140
Table 7 presents the means and standard deviations for the parameters based on 50 replications of the same model.
Table 7: Mean and Standard Deviation of 30 simulations of a Cross-classified Growth Model
## mean sd
## zeta (.5) 0.50166902 0.01329367
## omega.2 (.05) 0.05164811 0.01051132
## omega.3 (.1) 0.10148749 0.02695889
## rho.2 (.6) 0.60371939 0.01648643
## rho.3 (.1) 0.10067275 0.01480845
We can offer an analogy to the cross-sectional contextual effect as it pertains to growth. Again, we have \(j = \{1, 2, \dots, m\}\) groups, \(i = \{1, 2, \dots, n\}\) units, and each unit is observed over t = {1, 2, , k} time points. Again, each unit may encounter a different group at different time points.
The data generating process of \(y\) involves a covariate, \(x\), with a fixed within group effect, \(\lambda\), and an independent contextual effect, \(\gamma\) on the baseline score, with an average growth slope of \(\zeta\) moderated by the unit value of the covariate with a slope of \(\phi\) and a contextual effect on the slope \(\kappa\) based on the mean of the covariate for all the units nested in the group at a specific time (\(\bar{x}_{j(t)}\).
\[ y_{tij} = \alpha + + \lambda x_i + \gamma \bar{x}_j + \left(\zeta + \phi x_i + \kappa \bar{x}_{j(t)} + g_i + w_{j(t)} \right) t + r_i + u_j + e_{tij} \]
Again, the measure of the variability in the growth slopes are based on the \(\omega\)s, \(\omega_2\) is defined as
\[\omega_2 = \frac{\mbox{var}\left(g_i\right)}{\mbox{var}\left(r_i\right)} \]
which makes the variation in the slope at the unit level
\[ \mbox{var}\left(g_i\right) = \omega_2 \mbox{var}\left(r_i\right)\]
\(\omega_3\) is defined as
\[\omega_3 = \frac{\mbox{var}\left(w_j\right)}{\mbox{var}\left(u_j\right)} \]
which makes the variation in the slope at the group level
\[ \mbox{var}\left(w_j\right) = \omega_3 \mbox{var}\left(u_j\right)\]
Variation in the baseline score is of course captured by the ICCs at the unit (\(\rho_2\)) and group (\(\rho_3\)) levels
\(\rho_2\) is defined as
\[\rho_2 = \frac{\mbox{var}\left(r_i\right)}{\mbox{var}\left(u_j\right)+\mbox{var}\left(r_i\right)+\mbox{var}\left(e_{ijk}\right)}\]
\(\rho_3\) is defined as
\[\rho_2 = \frac{\mbox{var}\left(u_j\right)}{\mbox{var}\left(u_j\right)+\mbox{var}\left(r_i\right)+\mbox{var}\left(e_{ijk}\right)}\]
In this model, \(\zeta\) is the slope for time, or the growth trend, \(\phi\) is the effect of the covariate on the growth trend, and \(\kappa\) is the contextual effect on the growth trend.
What is happening here? Essentially, we are using a slopes-as-outcomes approach. Just as we can estimate a contextual effect crosssectionally, we can also estimate a contextual effect as a moderator to the growth slope.
Here is the simulation function
sim.cross.growth.context <- function(n, m, k, lambda, gamma,
zeta, phi, kappa,
omega.2, omega.3,
rho.2, rho.3) {
#set each group's random effects
groupdata <- cbind(group = seq(from = 1, to = m),
u = rnorm(m,sd = sqrt(rho.3)),
w = rnorm(m,sd = sqrt(omega.3*rho.3)))
#set each units's unit's x value and random effects
unitdata <- cbind(unit = seq(from = 1, to = n),
x = rnorm(n),
r = rnorm(n,sd = sqrt(rho.2)),
g = rnorm(n,sd = sqrt(omega.2*rho.2)))
#initialize the time dataset
timedata <- c()
for (i in 1:n) {
#for each unit, initialize the set of groups
groupset <- c()
for (t in 1:k) {
#for each timepoint, randomly pick a group
groupset <- c(groupset, sample(1:m, 1))
}
#create the unit's time data
timedata <- rbind(timedata, cbind(group = groupset,
unit = rep(i,k),
time = seq(from = 0, to = k - 1),
e = rnorm(k, sd = sqrt(1-(rho.3+rho.2)))))
}
#merge everything together
data <- data.frame(merge(merge(timedata, groupdata),unitdata))
# calculate the time/unit average of x
data$xbar <- ave(data$x, data$group,data$t)
#calculate the outcome
data$y <- data$x*lambda + data$xbar*gamma +
(zeta + data$x*phi + data$xbar*kappa + data$g + data$w)*data$t +
data$r + data$u + data$e
#keep only the needed variables
data <- data.frame(cbind(unit = data$unit, t = data$t,
group = data$group, x = data$x,
xbar = data$xbar, y = data$y))
#return dataset
return(data)
}
Thus, we can make a simple dataset where each unit is randomly nested in a group for each timepoint. For example, suppose 4 units, each with three timepoints, entered in to one of three groups at any given timepoint.
## unit t group x xbar y
## 1 1 0 1 0.5894040 0.5894040 1.208228614
## 2 1 1 1 0.5894040 0.8862016 3.452460113
## 3 1 2 1 0.5894040 0.8977645 3.905745551
## 5 2 0 3 1.2061250 1.0346004 2.558918627
## 4 2 1 1 1.2061250 0.8862016 2.896250146
## 6 2 2 1 1.2061250 0.8977645 4.365149678
## 9 3 0 3 0.8630758 1.0346004 0.006751706
## 7 3 1 1 0.8630758 0.8862016 0.893841100
## 8 3 2 3 0.8630758 0.7781954 1.627596419
## 10 4 0 2 0.6933150 0.6933150 0.403475962
## 11 4 1 3 0.6933150 0.6933150 2.908962236
## 12 4 2 3 0.6933150 0.7781954 3.287975021
With a large sample simulation, we can run a cross-classified model. Here is one with 2,500 units nested in one of 100 groups per timepoint, with 3 timepoints per unit. The baseline score is impacted by the unit value of x with a slope of .5, with a contextual effect of .25, the fixed growth slope is 0.5, and is moderated by the unit value of x with a slope of .15 and a contextual effect of .25, and the \(\omega\)’s and \(\rho\)’s control the variance components.
model <- lmer(y ~ x + xbar + t + I(x*t) + I(xbar*t)
+ (1 + t | group) + (1 + t | unit),
data = sim.cross.growth.context(n = 5000,
m = 200,
k = 3,
lambda = .5,
gamma = .25,
zeta = .5,
phi= .15,
kappa = .25,
omega.2 = .05,
omega.3 = .1,
rho.2 = .6,
rho.3 = .1))
summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ x + xbar + t + I(x * t) + I(xbar * t) + (1 + t | group) +
## (1 + t | unit)
## Data: sim.cross.growth.context(n = 5000, m = 200, k = 3, lambda = 0.5,
## gamma = 0.25, zeta = 0.5, phi = 0.15, kappa = 0.25, omega.2 = 0.05,
## omega.3 = 0.1, rho.2 = 0.6, rho.3 = 0.1)
##
## REML criterion at convergence: 36175.4
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.9522 -0.5442 0.0006 0.5473 3.1745
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## unit (Intercept) 0.586762 0.76600
## t 0.026215 0.16191 0.04
## group (Intercept) 0.128929 0.35907
## t 0.009912 0.09956 -0.06
## Residual 0.306659 0.55377
## Number of obs: 15000, groups: unit, 5000; group, 200
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -0.035236 0.028660 -1.23
## x 0.508330 0.013164 38.62
## xbar 0.235984 0.064604 3.65
## t 0.502436 0.009523 52.76
## I(x * t) 0.145945 0.006191 23.57
## I(xbar * t) 0.305038 0.051224 5.95
##
## Correlation of Fixed Effects:
## (Intr) x xbar t I(x*t)
## x -0.010
## xbar -0.089 -0.095
## t -0.165 -0.006 0.165
## I(x * t) -0.004 -0.393 0.122 0.008
## I(xbar * t) 0.069 0.073 -0.784 -0.210 -0.158
Table 8 presents the means and standard deviations for the parameters based on 30 replications of the same model.
Table 8: Mean and Standard Deviation of 50 simulations of a Cross-classified Longitudinal Contextual Model
## mean sd
## lambda (.5) 0.50079324 0.021388120
## gamma (.25) 0.24778582 0.104840052
## zeta (.5) 0.50030353 0.013841776
## phi (.15) 0.14995213 0.008193117
## kappa (.25) 0.24790754 0.085981695
## omega.2 (.05) 0.04906650 0.010785213
## omega.3 (.1) 0.10410728 0.029444242
## rho.2 (.6) 0.60029789 0.017217519
## rho.3 (.1) 0.09850469 0.014424095
Blalock, Hubert M. 1984. “Contextual-Effects Models: Theoretical and Methodological Issues.” Annual Review of Sociology. JSTOR, 353–72.
Cohen, Jacob. 1992. “A Power Primer.” Psychological Bulletin 112 (1). American Psychological Association: 155.
Hedges, Larry V, and Christopher Rhoads. 2010. “Statistical Power Analysis in Education Research. NCSER 2010-3006.” National Center for Special Education Research. ERIC.
Iversen, Gudmund R. 1991. Contextual Analysis. 7. Sage.
Raudenbush, Stephen W, and Anthony S Bryk. 2002. Hierarchical Linear Models: Applications and Data Analysis Methods. Vol. 1. Sage.