Bliese, P.D., Maltarich, M.A. & Hendricks, J.L. Back to Basics with Mixed-Effects Models: Nine Take-Away Points. J Bus Psychol 33, 1–23 (2018).

https://doi.org/10.1007/s10869-017-9491-z

1 Data Management

##Simulation 1: Independent Data

##Set up Independent data
set.seed(12532) #使用set.seed生成隨機序列數據,()內設定起始值
#建立X和Y向量
Y<-rnorm(1000) #用rnormY隨機生成一組1000筆數據Y,Y是常態分配,平均為0,標準差為1

X<-.30*Y+sqrt(1-.30^2)*rnorm(1000) #sqrt是square root
#X和Y相關係數
cor(X,Y)
## [1] 0.2937485
#建立group ID向量,ID從1到100,重複10次,平均一組有10個成員
G.ID<-rep(1:100,10) 
#隨機分派Group ID給observations(X,Y)
NO.GRP<-data.frame(G.ID=G.ID,Y=Y,X=X) 
head(NO.GRP)
##   G.ID          Y          X
## 1    1 -0.3562218 -0.2533692
## 2    2  0.3926757  1.9169494
## 3    3 -0.4866951 -0.3444651
## 4    4  0.2367052 -1.6619983
## 5    5  0.5555600  1.3272490
## 6    6  0.1011640  2.1926501
#Calculate group means for X.
TDAT<-aggregate(NO.GRP[,c("G.ID","X")],by=list(NO.GRP$G.ID),mean) #list()取得G.ID變量
#Assign means back to members as X.G
NO.GRP<-merge(NO.GRP,TDAT[,c("G.ID","X")],by="G.ID", #merge()將兩個data.frame 透過by="G.ID"進行合併
suffixes=c("",".G")) #The default is c(".x",".y")

2 Table 1

#These data have no group-level properties.
#NO.GRP[1:15,] 
knitr::kable(head(NO.GRP,15))
G.ID Y X X.G
1 -0.3562218 -0.2533692 -0.1805595
1 -0.4222437 -1.1081410 -0.1805595
1 -0.4751256 0.1228866 -0.1805595
1 -0.0754015 -0.2089118 -0.1805595
1 0.0401238 -0.5647489 -0.1805595
1 -0.0425110 0.3182596 -0.1805595
1 1.8467950 0.9231918 -0.1805595
1 0.5551599 -0.5044792 -0.1805595
1 1.6542047 0.7838111 -0.1805595
1 -2.3206311 -1.3140941 -0.1805595
2 -0.8275048 -0.6610696 -0.2058514
2 0.3926757 1.9169494 -0.2058514
2 0.3734971 -0.8080668 -0.2058514
2 -2.4918987 -2.2593381 -0.2058514
2 -1.4049858 1.5874561 -0.2058514

3 Take-away point 1

#install packages
pacman::p_load(nlme,multilevel)
##ICC(1) estimate via mixed-effects
tmod<-lme(fixed=Y~1,random=~1|G.ID,data=NO.GRP) 
VarCorr(tmod)
## G.ID = pdLogChol(1) 
##             Variance     StdDev      
## (Intercept) 4.299649e-09 6.557171e-05
## Residual    9.986574e-01 9.993285e-01
#組內相關係數ICC(1)
4.299649e-09/(4.299649e-09+9.986574e-01)
## [1] 4.305429e-09
# ICC(1) estimate via ANOVA
tmod<-aov(Y~as.factor(G.ID),data=NO.GRP)
ICC1(tmod)
## [1] -0.01560787

ICC(1)用來描述組內資料的相似性或資料的不獨立性。以上用兩種方式計算組內相關係數ICC(1)皆趨近於0,表示隨機抽取任一群組內的兩名員工,其response variable完全無共通性。

將完全獨立的資料應用在OLS和Mixed effect model,結果會如何?

4 Table 2

#Table 2 (upper panel) presents the results from regressing Y on X and X.G in OLS regression in simulated data with near-zero ICC(1)

tmod<-lm(Y~X+X.G, data=NO.GRP)
knitr::kable(summary(tmod)$coef)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0615821 0.0302628 2.034908 0.0421243
X 0.3019515 0.0316102 9.552340 0.0000000
X.G -0.1098059 0.0981465 -1.118796 0.2634968
#Table 2 (lower panel) presents the results from regressing Y on X and X.G in mixed-effects random intercept model in simulated data with near-zero ICC(1)

tmod<-lme(fixed=Y~X+X.G,random=~1|G.ID,data=NO.GRP)
#summary(tmod)$tTable
knitr::kable(summary(tmod)$tTable)
Value Std.Error DF t-value p-value
(Intercept) 0.0615821 0.0302629 899 2.034908 0.0421531
X 0.3019515 0.0316102 899 9.552340 0.0000000
X.G -0.1098059 0.0981465 98 -1.118796 0.2659622

兩種模型所估計的參數與Std.Error都相同。

Take-away point 1:即使沒有預期或明顯的總體層次效果,混合效果模型仍可被用於巢套資料。

5 Take-away point 2

#Re-run simulation with data that returns a slightly positive ICC(1)
set.seed(125321) #變更起始值
Y<-rnorm(1000)
X<-.30*Y+sqrt(1-.30^2)*rnorm(1000)
cor(X,Y)
## [1] 0.2827705
G.ID<-rep(1:100,10)
NO.GRP<-data.frame(G.ID=G.ID,Y=Y,X=X)
# Calculate group means for X. Assign means back to members as X.G
TDAT<-aggregate(NO.GRP[,c("G.ID","X")],list(NO.GRP$G.ID),mean)
NO.GRP<-merge(NO.GRP,TDAT[,c("G.ID","X")],by="G.ID",
suffixes=c("",".G"))
knitr::kable(head(NO.GRP,15))
G.ID Y X X.G
1 -1.9711215 0.5262381 0.1933068
1 -0.2086705 -1.2067246 0.1933068
1 0.1279318 0.7974741 0.1933068
1 0.9715064 -0.2736225 0.1933068
1 0.9996284 0.1510868 0.1933068
1 -0.1924392 0.7063790 0.1933068
1 1.0780661 -0.9318504 0.1933068
1 -0.7626443 0.9025238 0.1933068
1 0.1578146 0.3128734 0.1933068
1 2.8152661 0.9486898 0.1933068
2 0.6172546 1.0773918 -0.1148806
2 0.1150379 0.9117392 -0.1148806
2 -0.6256317 -0.8983085 -0.1148806
2 0.8985839 2.0294684 -0.1148806
2 -1.1983031 -0.8236887 -0.1148806
# ICC(1) estimate via mixed-effects
tmod<-lme(fixed=Y~1,random=~1|G.ID,data=NO.GRP)
VarCorr(tmod)
## G.ID = pdLogChol(1) 
##             Variance   StdDev   
## (Intercept) 0.01292908 0.1137061
## Residual    1.00240352 1.0012010
#ICC(1)
(0.01292908)/(0.01292908+1.00240352)
## [1] 0.01273384
# ICC(1) estimate via ANOVA
tmod<-aov(Y~as.factor(G.ID),data=NO.GRP)
summary(tmod)
##                  Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(G.ID)  99  112.0   1.132   1.129  0.194
## Residuals       900  902.2   1.002
ICC1(tmod)
## [1] 0.01273379

ICC(1)=0.013,表示任兩個成員之間的相似性有1.3%可歸因於群組的影響。 但F-test 檢定結果指出ICC(1)=0.013不具有統計上的顯著性(p = .194),因此一般情況下仍被視為independent data。

忽略資料的非獨立性,將它視作完全獨立的資料應用在OLS和Mixed effect model,結果會如何?

6 Table3

#Table 3 (upper panel) presents the results of regressing Y on X and X.G in OLS in simulated data with small positive ICC(1)

tmod<-lm(Y~X+X.G, data=NO.GRP)
knitr::kable(summary(tmod)$coef)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0148230 0.0307618 0.4818625 0.6300094
X 0.2826789 0.0328511 8.6048573 0.0000000
X.G 0.0947241 0.1080628 0.8765653 0.3809340
#Table 3 (lower panel) presents the results of regressing Y on X and X.G in mixed-effects random intercept model in simulated data with small positive ICC(1)

tmod<-lme(fixed=Y~X+X.G, random=~1|G.ID,data=NO.GRP)
knitr::kable(summary(tmod)$tTable)
Value Std.Error DF t-value p-value
(Intercept) 0.0148230 0.0320474 899 0.4625321 0.6438117
X 0.2826789 0.0326979 899 8.6451599 0.0000000
X.G 0.0947241 0.1121245 98 0.8448117 0.4002738

兩模型所估計的參數相同,但是在OLS中,X的標準誤略大於Mixed-effects model,會增加型二誤差的風險;而X.G的標準誤略小於Mixed-effects model,會增加型一誤差的風險。這就是本文要告訴我們的第二個重點。

Take-away point 2:Nested data沒有選擇使用mixed-effect model,group-level effects會增加型一錯誤,lower-level effects會增加型二錯誤的風險。

接下來要討論平減(centering)

7 Take-away point 3

# Create a group-mean centered variable (W.X)
NO.GRP$W.X<-NO.GRP$X-NO.GRP$X.G
#Table 4 (upper panel) Results of regressing Y on W.X and X.G in OLS (upper panel) in simulated data with small positive ICC(1)

tmod<-lm(Y~W.X+X.G, data=NO.GRP)
knitr::kable(summary(tmod)$coef)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0148230 0.0307618 0.4818625 0.6300094
W.X 0.2826789 0.0328511 8.6048573 0.0000000
X.G 0.3774030 0.1029484 3.6659436 0.0002594
#Table 4 (lower panel) Results of regressing Y on W.X and X.G in mixed-effects random intercept model in simulated data with small positive ICC(1)

tmod<-lme(fixed=Y~W.X+X.G,random=~1|G.ID,data=NO.GRP)
knitr::kable(summary(tmod)$tTable)
Value Std.Error DF t-value p-value
(Intercept) 0.0148230 0.0320474 899 0.4625321 0.6438117
W.X 0.2826789 0.0326979 899 8.6451599 0.0000000
X.G 0.3774030 0.1072509 98 3.5188807 0.0006589

表4呈現兩模型level-1變數經過centering後的結果,參數估計結果相同。 我們再和表3沒有centering相比,X和W.X參數估計結果也相同,但是X.G參數卻變大了,此現象稱為“emergent effect”,又稱作contextual effect 或incremental effect (詳見Hofmann and Gavin 1998)

8 Take-away point 5

Interpreting Level-2 Coefficients

# create a dataset that contains 100 (X.G, Y.G) pairs instead of 1000 (X, Y) pairs

TDAT<-aggregate(NO.GRP[,c("Y","X")],list(NO.GRP$G.ID),mean)
names(TDAT)<-c("G.ID","Y.G","X.G")
knitr::kable(head(TDAT,15))
G.ID Y.G X.G
1 0.3015338 0.1933068
2 0.1172007 -0.1148806
3 0.6550346 0.6917088
4 0.0919974 -0.4618666
5 0.2392360 0.6067226
6 0.6261253 0.0598669
7 -0.2403001 -0.0295834
8 -0.5648905 -0.2815494
9 0.1233002 -0.1471086
10 -0.4606494 0.0345138
11 0.2488465 -0.0363647
12 0.2269388 0.0661628
13 0.0349877 -0.1330073
14 0.1542718 -0.0606252
15 -0.9466745 -0.1054040
#Table 5 (results from 100 group means) Results of OLS group-mean regression of Y.G on X.G in simulated data with small positive ICC(1)

tmod<-lm(Y.G~X.G, data=TDAT)
knitr::kable(summary(tmod)$coef)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.014823 0.0320474 0.4625321 0.6447251
X.G 0.377403 0.1072509 3.5188807 0.0006589
#Add another group-level predictor, Z.G, with no level-1 analogue

set.seed(125321)
Y<-rnorm(1000)
X<-.30*Y+sqrt(1-.30^2)*rnorm(1000)
G.ID<-rep(1:100,10)
NO.GRP<-data.frame(G.ID=G.ID,Y=Y,X=X)
NO.GRP<-NO.GRP[order(NO.GRP$G.ID),]
NO.GRP$Z.G<-rep(rnorm(100),each=10) #新增group-level變數Z.G
knitr::kable(head(NO.GRP,10))
G.ID Y X Z.G
1 1 -1.9711215 0.5262381 -0.6305105
101 1 -0.7626443 0.9025238 -0.6305105
201 1 0.1578146 0.3128734 -0.6305105
301 1 -0.2086705 -1.2067246 -0.6305105
401 1 2.8152661 0.9486898 -0.6305105
501 1 0.9715064 -0.2736225 -0.6305105
601 1 1.0780661 -0.9318504 -0.6305105
701 1 0.1279318 0.7974741 -0.6305105
801 1 0.9996284 0.1510868 -0.6305105
901 1 -0.1924392 0.7063790 -0.6305105
#mixed effect model
tmod<-lme(fixed=Y~Z.G, random=~1|G.ID, data=NO.GRP) #Y~Z.G
knitr::kable(summary(tmod)$tTable)
Value Std.Error DF t-value p-value
(Intercept) 0.0272597 0.0337761 900 0.8070708 0.4198389
Z.G 0.0174623 0.0366474 98 0.4764947 0.6347827
#OLS regression
TDAT<-aggregate(NO.GRP[,c("Y","Z.G")],list(NO.GRP$G.ID),mean)
names(TDAT)<-c("G.ID","Y.G","Z.G")

tmod<-lm(Y.G~Z.G, data=TDAT) #Y.G~Z.G
knitr::kable(summary(tmod)$coef)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0272597 0.0337761 0.8070711 0.4215800
Z.G 0.0174623 0.0366474 0.4764948 0.6347826

Take-away point 5: 在mixed-effects models,高層次的變數Z.G可預測低層次結果變數Y的組平均Y.G。