Bliese, P.D., Maltarich, M.A. & Hendricks, J.L. Back to Basics with Mixed-Effects Models: Nine Take-Away Points. J Bus Psychol 33, 1–23 (2018).

https://doi.org/10.1007/s10869-017-9491-z

1 Data Management

##Simulation 1: Independent Data

##Set up Independent data
set.seed(12532) #使用set.seed生成隨機序列數據，()內設定起始值

#建立X和Y向量
Y<-rnorm(1000) #用rnormY隨機生成一組1000筆數據Y，Y是常態分配，平均為0，標準差為1

X<-.30*Y+sqrt(1-.30^2)*rnorm(1000) #sqrt是square root

#X和Y相關係數
cor(X,Y)

## [1] 0.2937485

#建立group ID向量，ID從1到100，重複10次，平均一組有10個成員
G.ID<-rep(1:100,10)

#隨機分派Group ID給observations(X,Y)
NO.GRP<-data.frame(G.ID=G.ID,Y=Y,X=X)

head(NO.GRP)

##   G.ID          Y          X
## 1    1 -0.3562218 -0.2533692
## 2    2  0.3926757  1.9169494
## 3    3 -0.4866951 -0.3444651
## 4    4  0.2367052 -1.6619983
## 5    5  0.5555600  1.3272490
## 6    6  0.1011640  2.1926501

#Calculate group means for X.
TDAT<-aggregate(NO.GRP[,c("G.ID","X")],by=list(NO.GRP$G.ID),mean) #list()取得G.ID變量

#Assign means back to members as X.G
NO.GRP<-merge(NO.GRP,TDAT[,c("G.ID","X")],by="G.ID", #merge()將兩個data.frame 透過by="G.ID"進行合併
suffixes=c("",".G")) #The default is c(".x",".y")

2 Table 1

#These data have no group-level properties.
#NO.GRP[1:15,] 
knitr::kable(head(NO.GRP,15))

G.ID	Y	X	X.G
1	-0.3562218	-0.2533692	-0.1805595
1	-0.4222437	-1.1081410	-0.1805595
1	-0.4751256	0.1228866	-0.1805595
1	-0.0754015	-0.2089118	-0.1805595
1	0.0401238	-0.5647489	-0.1805595
1	-0.0425110	0.3182596	-0.1805595
1	1.8467950	0.9231918	-0.1805595
1	0.5551599	-0.5044792	-0.1805595
1	1.6542047	0.7838111	-0.1805595
1	-2.3206311	-1.3140941	-0.1805595
2	-0.8275048	-0.6610696	-0.2058514
2	0.3926757	1.9169494	-0.2058514
2	0.3734971	-0.8080668	-0.2058514
2	-2.4918987	-2.2593381	-0.2058514
2	-1.4049858	1.5874561	-0.2058514

3 Take-away point 1

#install packages
pacman::p_load(nlme,multilevel)

##ICC(1) estimate via mixed-effects
tmod<-lme(fixed=Y~1,random=~1|G.ID,data=NO.GRP)

VarCorr(tmod)

## G.ID = pdLogChol(1) 
##             Variance     StdDev      
## (Intercept) 4.299649e-09 6.557171e-05
## Residual    9.986574e-01 9.993285e-01

#組內相關係數ICC(1)
4.299649e-09/(4.299649e-09+9.986574e-01)

## [1] 4.305429e-09

# ICC(1) estimate via ANOVA
tmod<-aov(Y~as.factor(G.ID),data=NO.GRP)

ICC1(tmod)

## [1] -0.01560787

ICC(1)用來描述組內資料的相似性或資料的不獨立性。以上用兩種方式計算組內相關係數ICC(1)皆趨近於0，表示隨機抽取任一群組內的兩名員工，其response variable完全無共通性。

將完全獨立的資料應用在OLS和Mixed effect model，結果會如何?

4 Table 2

#Table 2 (upper panel) presents the results from regressing Y on X and X.G in OLS regression in simulated data with near-zero ICC(1)

tmod<-lm(Y~X+X.G, data=NO.GRP)
knitr::kable(summary(tmod)$coef)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	0.0615821	0.0302628	2.034908	0.0421243
X	0.3019515	0.0316102	9.552340	0.0000000
X.G	-0.1098059	0.0981465	-1.118796	0.2634968

#Table 2 (lower panel) presents the results from regressing Y on X and X.G in mixed-effects random intercept model in simulated data with near-zero ICC(1)

tmod<-lme(fixed=Y~X+X.G,random=~1|G.ID,data=NO.GRP)
#summary(tmod)$tTable
knitr::kable(summary(tmod)$tTable)

	Value	Std.Error	DF	t-value	p-value
(Intercept)	0.0615821	0.0302629	899	2.034908	0.0421531
X	0.3019515	0.0316102	899	9.552340	0.0000000
X.G	-0.1098059	0.0981465	98	-1.118796	0.2659622

兩種模型所估計的參數與Std.Error都相同。

Take-away point 1：即使沒有預期或明顯的總體層次效果，混合效果模型仍可被用於巢套資料。

5 Take-away point 2

#Re-run simulation with data that returns a slightly positive ICC(1)
set.seed(125321) #變更起始值
Y<-rnorm(1000)
X<-.30*Y+sqrt(1-.30^2)*rnorm(1000)
cor(X,Y)

## [1] 0.2827705

G.ID<-rep(1:100,10)
NO.GRP<-data.frame(G.ID=G.ID,Y=Y,X=X)

# Calculate group means for X. Assign means back to members as X.G
TDAT<-aggregate(NO.GRP[,c("G.ID","X")],list(NO.GRP$G.ID),mean)
NO.GRP<-merge(NO.GRP,TDAT[,c("G.ID","X")],by="G.ID",
suffixes=c("",".G"))
knitr::kable(head(NO.GRP,15))

G.ID	Y	X	X.G
1	-1.9711215	0.5262381	0.1933068
1	-0.2086705	-1.2067246	0.1933068
1	0.1279318	0.7974741	0.1933068
1	0.9715064	-0.2736225	0.1933068
1	0.9996284	0.1510868	0.1933068
1	-0.1924392	0.7063790	0.1933068
1	1.0780661	-0.9318504	0.1933068
1	-0.7626443	0.9025238	0.1933068
1	0.1578146	0.3128734	0.1933068
1	2.8152661	0.9486898	0.1933068
2	0.6172546	1.0773918	-0.1148806
2	0.1150379	0.9117392	-0.1148806
2	-0.6256317	-0.8983085	-0.1148806
2	0.8985839	2.0294684	-0.1148806
2	-1.1983031	-0.8236887	-0.1148806

# ICC(1) estimate via mixed-effects
tmod<-lme(fixed=Y~1,random=~1|G.ID,data=NO.GRP)
VarCorr(tmod)

## G.ID = pdLogChol(1) 
##             Variance   StdDev   
## (Intercept) 0.01292908 0.1137061
## Residual    1.00240352 1.0012010

#ICC(1)
(0.01292908)/(0.01292908+1.00240352)

## [1] 0.01273384

# ICC(1) estimate via ANOVA
tmod<-aov(Y~as.factor(G.ID),data=NO.GRP)
summary(tmod)

##                  Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(G.ID)  99  112.0   1.132   1.129  0.194
## Residuals       900  902.2   1.002

ICC1(tmod)

## [1] 0.01273379

ICC(1)=0.013，表示任兩個成員之間的相似性有1.3%可歸因於群組的影響。但F-test 檢定結果指出ICC(1)=0.013不具有統計上的顯著性(p = .194)，因此一般情況下仍被視為independent data。

忽略資料的非獨立性，將它視作完全獨立的資料應用在OLS和Mixed effect model，結果會如何?

6 Table3

#Table 3 (upper panel) presents the results of regressing Y on X and X.G in OLS in simulated data with small positive ICC(1)

tmod<-lm(Y~X+X.G, data=NO.GRP)
knitr::kable(summary(tmod)$coef)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	0.0148230	0.0307618	0.4818625	0.6300094
X	0.2826789	0.0328511	8.6048573	0.0000000
X.G	0.0947241	0.1080628	0.8765653	0.3809340

#Table 3 (lower panel) presents the results of regressing Y on X and X.G in mixed-effects random intercept model in simulated data with small positive ICC(1)

tmod<-lme(fixed=Y~X+X.G, random=~1|G.ID,data=NO.GRP)
knitr::kable(summary(tmod)$tTable)

	Value	Std.Error	DF	t-value	p-value
(Intercept)	0.0148230	0.0320474	899	0.4625321	0.6438117
X	0.2826789	0.0326979	899	8.6451599	0.0000000
X.G	0.0947241	0.1121245	98	0.8448117	0.4002738

兩模型所估計的參數相同，但是在OLS中，X的標準誤略大於Mixed-effects model，會增加型二誤差的風險；而X.G的標準誤略小於Mixed-effects model，會增加型一誤差的風險。這就是本文要告訴我們的第二個重點。

Take-away point 2：Nested data沒有選擇使用mixed-effect model，group-level effects會增加型一錯誤，lower-level effects會增加型二錯誤的風險。

接下來要討論平減(centering)

7 Take-away point 3

# Create a group-mean centered variable (W.X)
NO.GRP$W.X<-NO.GRP$X-NO.GRP$X.G

#Table 4 (upper panel) Results of regressing Y on W.X and X.G in OLS (upper panel) in simulated data with small positive ICC(1)

tmod<-lm(Y~W.X+X.G, data=NO.GRP)
knitr::kable(summary(tmod)$coef)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	0.0148230	0.0307618	0.4818625	0.6300094
W.X	0.2826789	0.0328511	8.6048573	0.0000000
X.G	0.3774030	0.1029484	3.6659436	0.0002594

#Table 4 (lower panel) Results of regressing Y on W.X and X.G in mixed-effects random intercept model in simulated data with small positive ICC(1)

tmod<-lme(fixed=Y~W.X+X.G,random=~1|G.ID,data=NO.GRP)
knitr::kable(summary(tmod)$tTable)

	Value	Std.Error	DF	t-value	p-value
(Intercept)	0.0148230	0.0320474	899	0.4625321	0.6438117
W.X	0.2826789	0.0326979	899	8.6451599	0.0000000
X.G	0.3774030	0.1072509	98	3.5188807	0.0006589

表4呈現兩模型level-1變數經過centering後的結果，參數估計結果相同。我們再和表3沒有centering相比，X和W.X參數估計結果也相同，但是X.G參數卻變大了，此現象稱為“emergent effect”，又稱作contextual effect 或incremental effect (詳見Hofmann and Gavin 1998)

8 Take-away point 5

Interpreting Level-2 Coefficients

# create a dataset that contains 100 (X.G, Y.G) pairs instead of 1000 (X, Y) pairs

TDAT<-aggregate(NO.GRP[,c("Y","X")],list(NO.GRP$G.ID),mean)
names(TDAT)<-c("G.ID","Y.G","X.G")
knitr::kable(head(TDAT,15))

G.ID	Y.G	X.G
1	0.3015338	0.1933068
2	0.1172007	-0.1148806
3	0.6550346	0.6917088
4	0.0919974	-0.4618666
5	0.2392360	0.6067226
6	0.6261253	0.0598669
7	-0.2403001	-0.0295834
8	-0.5648905	-0.2815494
9	0.1233002	-0.1471086
10	-0.4606494	0.0345138
11	0.2488465	-0.0363647
12	0.2269388	0.0661628
13	0.0349877	-0.1330073
14	0.1542718	-0.0606252
15	-0.9466745	-0.1054040

#Table 5 (results from 100 group means) Results of OLS group-mean regression of Y.G on X.G in simulated data with small positive ICC(1)

tmod<-lm(Y.G~X.G, data=TDAT)
knitr::kable(summary(tmod)$coef)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	0.014823	0.0320474	0.4625321	0.6447251
X.G	0.377403	0.1072509	3.5188807	0.0006589

#Add another group-level predictor, Z.G, with no level-1 analogue

set.seed(125321)
Y<-rnorm(1000)
X<-.30*Y+sqrt(1-.30^2)*rnorm(1000)
G.ID<-rep(1:100,10)
NO.GRP<-data.frame(G.ID=G.ID,Y=Y,X=X)
NO.GRP<-NO.GRP[order(NO.GRP$G.ID),]
NO.GRP$Z.G<-rep(rnorm(100),each=10) #新增group-level變數Z.G
knitr::kable(head(NO.GRP,10))

	G.ID	Y	X	Z.G
1	1	-1.9711215	0.5262381	-0.6305105
101	1	-0.7626443	0.9025238	-0.6305105
201	1	0.1578146	0.3128734	-0.6305105
301	1	-0.2086705	-1.2067246	-0.6305105
401	1	2.8152661	0.9486898	-0.6305105
501	1	0.9715064	-0.2736225	-0.6305105
601	1	1.0780661	-0.9318504	-0.6305105
701	1	0.1279318	0.7974741	-0.6305105
801	1	0.9996284	0.1510868	-0.6305105
901	1	-0.1924392	0.7063790	-0.6305105

#mixed effect model
tmod<-lme(fixed=Y~Z.G, random=~1|G.ID, data=NO.GRP) #Y~Z.G
knitr::kable(summary(tmod)$tTable)

	Value	Std.Error	DF	t-value	p-value
(Intercept)	0.0272597	0.0337761	900	0.8070708	0.4198389
Z.G	0.0174623	0.0366474	98	0.4764947	0.6347827

#OLS regression
TDAT<-aggregate(NO.GRP[,c("Y","Z.G")],list(NO.GRP$G.ID),mean)
names(TDAT)<-c("G.ID","Y.G","Z.G")

tmod<-lm(Y.G~Z.G, data=TDAT) #Y.G~Z.G
knitr::kable(summary(tmod)$coef)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	0.0272597	0.0337761	0.8070711	0.4215800
Z.G	0.0174623	0.0366474	0.4764948	0.6347826

Take-away point 5: 在mixed-effects models,高層次的變數Z.G可預測低層次結果變數Y的組平均Y.G。

Tutorial：Mixed-effect model (Simulation 1)

Ching-Fang Wu

Sat Dec 12 17:01:43 2020