Perform an analysis of the anorexia data example following the paired or independent samples example in the course note. You can start with this R script .
library(PairedData)
#
data(Anorexia, package="PairedData")
dta <- Anorexia
head(dta)
Prior Post
1 83.8 95.2
2 83.3 94.3
3 86.0 91.5
4 82.5 91.9
5 86.7 100.3
6 79.6 76.7
with(dta, plot(paired(Prior, Post), type="profile")) +
labs(x="Therapy", y="Weight (in lb)") +
theme_bw()
## Compute variable means
colMeans(dta)
Prior Post
83.229 90.494
print(apply(dta, 2, sd), 3)
Prior Post
5.02 8.48
cov(dta)
Prior Post
Prior 25.167 22.883
Post 22.883 71.827
print(cor(dta),3)
Prior Post
Prior 1.000 0.538
Post 0.538 1.000
t.test(dta$Prior, dta$Post)
Welch Two Sample t-test
data: dta$Prior and dta$Post
t = -3.04, df = 26, p-value = 0.0053
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-12.1747 -2.3547
sample estimates:
mean of x mean of y
83.229 90.494
The results revealed by independent sample t-test showed that Session (prior and post test) yielded a significant effect with t = 3.04, p = 0.005.
t.test(dta$Prior, dta$Post, paired = TRUE)
Paired t-test
data: dta$Prior and dta$Post
t = -4.18, df = 16, p-value = 7e-04
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.9447 -3.5847
sample estimates:
mean of the differences
-7.2647
The results revealed by paired t-test showed that Session (prior and post test) still yielded a significant effect with t = 4.18, p < .001. As Prof. Sheu mentioned in the class, if we did not consider the effect of correlation, the t or F values would be smaller.
Identify the study design of the ergoStool example and replicate the results of analysis reported.
data(ergoStool, package="nlme")
dta <- ergoStool
head(dta)
effort Type Subject
1 12 T1 1
2 15 T2 1
3 12 T3 1
4 10 T4 1
5 10 T1 2
6 14 T2 2
Complete block designs or Randomized complete block designs
library(tidyverse)
dta_sum <- dta %>%
group_by(Type) %>%
summarise(m_Type = mean(effort), sd_Type = sd(effort))
dta_sum
# A tibble: 4 x 3
Type m_Type sd_Type
<fct> <dbl> <dbl>
1 T1 8.56 1.67
2 T2 12.4 1.59
3 T3 10.8 1.92
4 T4 9.22 1.72
ggplot(dta,
aes(x=reorder(Subject, effort, max),
y=effort,
color=Type)) +
geom_point(position=position_jitter(width=.1),
size=rel(3)) +
geom_hline(yintercept=mean(dta$effort),
linetype="dotted") +
labs(y="Effort",
x="Subject ID") +
guides(color=guide_legend(reverse=T)) +
theme_minimal() +
theme(legend.position=c(.1,.85))
## ANOVA (set Subject as random effect)
summary(m1 <- aov(effort ~ Type + Error(Subject), data=dta))
Error: Subject
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 8 66.5 8.31
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Type 3 81.2 27.06 22.4 3.9e-07
Residuals 24 29.1 1.21
The results revealed that Type yielded a significant effect (F = 22.4, p < .001) on effort.
pacman::p_load(car, tidyverse, lme4, GGally)
# input data
dta <- read.table("Data/thetaEEG.txt", header=T)
# data structure
str(dta)
'data.frame': 19 obs. of 10 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ Ch3 : num -3.54 5.72 0.52 0 2.07 1.67 9.13 -0.43 -0.56 1.28 ...
$ Ch4 : num -3.11 5.07 -0.18 0.74 0.76 ...
$ Ch5 : num -0.24 6.87 0.9 1.1 3.51 2.77 3.44 -0.31 -1.22 1.89 ...
$ Ch6 : num 0.42 5.96 0.6 0.13 0.6 4.55 4.8 -0.61 0.67 1.77 ...
$ Ch7 : num -0.49 8.2 1.27 0.19 3.71 1.8 0.48 -1.04 -0.97 1.83 ...
$ Ch8 : num 2.13 4.87 1.28 0.07 1.86 4.79 1.63 -0.13 -0.98 0.91 ...
$ Ch17: num -4.15 5.48 -0.95 0.8 1.49 4.51 9.94 -0.61 0 1.4 ...
$ Ch18: num 2.87 5.57 1.74 0.25 3.11 3.24 1.34 -0.61 -1.22 1.1 ...
$ Ch19: num 1.34 6.33 0.79 -0.66 1.8 3.99 1.53 -0.43 -0.91 -0.12 ...
head(dta)
ID Ch3 Ch4 Ch5 Ch6 Ch7 Ch8 Ch17 Ch18 Ch19
1 1 -3.54 -3.11 -0.24 0.42 -0.49 2.13 -4.15 2.87 1.34
2 2 5.72 5.07 6.87 5.96 8.20 4.87 5.48 5.57 6.33
3 3 0.52 -0.18 0.90 0.60 1.27 1.28 -0.95 1.74 0.79
4 4 0.00 0.74 1.10 0.13 0.19 0.07 0.80 0.25 -0.66
5 5 2.07 0.76 3.51 0.60 3.71 1.86 1.49 3.11 1.80
6 6 1.67 4.18 2.77 4.55 1.80 4.79 4.51 3.24 3.99
eeg <- as.factor(colnames(dta[, -1]))
eeg
[1] Ch3 Ch4 Ch5 Ch6 Ch7 Ch8 Ch17 Ch18 Ch19
Levels: Ch17 Ch18 Ch19 Ch3 Ch4 Ch5 Ch6 Ch7 Ch8
m0 <- lm(as.matrix(dta[, -1]) ~ 1)
m0_aov <- Anova(m0, idata = data.frame(eeg), idesign = ~ eeg)
summary(m0_aov)
Type III Repeated Measures MANOVA Tests:
------------------------------------------
Term: (Intercept)
Response transformation matrix:
(Intercept)
Ch3 1
Ch4 1
Ch5 1
Ch6 1
Ch7 1
Ch8 1
Ch17 1
Ch18 1
Ch19 1
Sum of squares and products for the hypothesis:
(Intercept)
(Intercept) 1761.6
Multivariate Tests: (Intercept)
Df test stat approx F num Df den Df Pr(>F)
Pillai 1 0.20359 4.6014 1 18 0.04584
Wilks 1 0.79641 4.6014 1 18 0.04584
Hotelling-Lawley 1 0.25564 4.6014 1 18 0.04584
Roy 1 0.25564 4.6014 1 18 0.04584
------------------------------------------
Term: eeg
Response transformation matrix:
eeg1 eeg2 eeg3 eeg4 eeg5 eeg6 eeg7 eeg8
Ch3 0 0 0 1 0 0 0 0
Ch4 0 0 0 0 1 0 0 0
Ch5 0 0 0 0 0 1 0 0
Ch6 0 0 0 0 0 0 1 0
Ch7 0 0 0 0 0 0 0 1
Ch8 -1 -1 -1 -1 -1 -1 -1 -1
Ch17 1 0 0 0 0 0 0 0
Ch18 0 1 0 0 0 0 0 0
Ch19 0 0 1 0 0 0 0 0
Sum of squares and products for the hypothesis:
eeg1 eeg2 eeg3 eeg4 eeg5 eeg6 eeg7
eeg1 5.7475 1.6500e-02 1.4850000 0.5060000 7.700000 1.9305000 3.0635000
eeg2 0.0165 4.7368e-05 0.0042632 0.0014526 0.022105 0.0055421 0.0087947
eeg3 1.4850 4.2632e-03 0.3836842 0.1307368 1.989474 0.4987895 0.7915263
eeg4 0.5060 1.4526e-03 0.1307368 0.0445474 0.677895 0.1699579 0.2697053
eeg5 7.7000 2.2105e-02 1.9894737 0.6778947 10.315789 2.5863158 4.1042105
eeg6 1.9305 5.5421e-03 0.4987895 0.1699579 2.586316 0.6484263 1.0289842
eeg7 3.0635 8.7947e-03 0.7915263 0.2697053 4.104211 1.0289842 1.6328895
eeg8 -0.0165 -4.7368e-05 -0.0042632 -0.0014526 -0.022105 -0.0055421 -0.0087947
eeg8
eeg1 -1.6500e-02
eeg2 -4.7368e-05
eeg3 -4.2632e-03
eeg4 -1.4526e-03
eeg5 -2.2105e-02
eeg6 -5.5421e-03
eeg7 -8.7947e-03
eeg8 4.7368e-05
Multivariate Tests: eeg
Df test stat approx F num Df den Df Pr(>F)
Pillai 1 0.52437 1.5159 8 11 0.2559
Wilks 1 0.47563 1.5159 8 11 0.2559
Hotelling-Lawley 1 1.10246 1.5159 8 11 0.2559
Roy 1 1.10246 1.5159 8 11 0.2559
Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
Sum Sq num Df Error SS den Df F value Pr(>F)
(Intercept) 195.7 1 766 18 4.60 0.046
eeg 10.7 8 389 144 0.49 0.858
Mauchly Tests for Sphericity
Test statistic p-value
eeg 0.000374 9.74e-11
Greenhouse-Geisser and Huynh-Feldt Corrections
for Departure from Sphericity
GG eps Pr(>F[GG])
eeg 0.315 0.66
HF eps Pr(>F[HF])
eeg 0.37092 0.6854
The results revealed by MANOVA yielded a significant effect (I think it is the subject effect? It means that there are individual difference in theta power.)
dtaL <- dta %>%
gather(key = Channel, value = Theta, 2:10) %>%
mutate( Channel = factor(Channel))
#
str(dtaL)
'data.frame': 171 obs. of 3 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ Channel: Factor w/ 9 levels "Ch17","Ch18",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Theta : num -3.54 5.72 0.52 0 2.07 1.67 9.13 -0.43 -0.56 1.28 ...
# set theme to black and white and save parameters
ot <- theme_set(theme_bw())
ggplot(dtaL, aes(reorder(Channel, Theta, median), Theta)) +
geom_boxplot() +
geom_point(alpha = .3) +
geom_hline(yintercept = mean(dtaL$Theta)) +
labs(x ="Channel Number", y = "Theta (Power)")
summary(m1 <- aov(Theta ~ Channel + Error(ID/Channel), data = dtaL))
Error: ID
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 1 112 112
Error: ID:Channel
Df Sum Sq Mean Sq
Channel 8 23.1 2.89
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Channel 8 10 1.23 0.18 0.99
Residuals 153 1020 6.67
The results revealed by Univariate ANOVA showed that the channels did not reach signifiant difference on theta power.
# Theta Power over Channel with CIs
ggplot(dtaL, aes(reorder(Channel, Theta, median), Theta)) +
stat_summary(fun.data = "mean_cl_boot", geom = "pointrange", col = "firebrick") +
geom_hline(yintercept = mean(dtaL$Theta), linetype = "dashed") +
geom_point(alpha = .3, size = rel(.7)) +
labs(x = "Channel Number", y = "Theta (Power)")
ggplot(dtaL, aes(reorder(ID, Theta, mean), Theta, color = Channel)) +
geom_point() +
geom_hline(yintercept = mean(dtaL$Theta), linetype = "dashed") +
coord_flip() +
labs(y = "Theta (Power)", x = "Subject ID") +
theme(legend.position = c(.9, .2))
summary(m2 <- lmer(Theta ~ Channel + (1 | ID), data = dtaL))
Linear mixed model fit by REML ['lmerMod']
Formula: Theta ~ Channel + (1 | ID)
Data: dtaL
REML criterion at convergence: 697
Scaled residuals:
Min 1Q Median 3Q Max
-3.650 -0.388 0.000 0.422 4.639
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 4.43 2.10
Residual 2.70 1.64
Number of obs: 171, groups: ID, 19
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.403 0.613 2.29
ChannelCh18 -0.548 0.533 -1.03
ChannelCh19 -0.408 0.533 -0.76
ChannelCh3 -0.502 0.533 -0.94
ChannelCh4 0.187 0.533 0.35
ChannelCh5 -0.365 0.533 -0.68
ChannelCh6 -0.257 0.533 -0.48
ChannelCh7 -0.552 0.533 -1.03
ChannelCh8 -0.550 0.533 -1.03
Correlation of Fixed Effects:
(Intr) ChnC18 ChnC19 ChnnC3 ChnnC4 ChnnC5 ChnnC6 ChnnC7
ChannelCh18 -0.435
ChannelCh19 -0.435 0.500
ChannelCh3 -0.435 0.500 0.500
ChannelCh4 -0.435 0.500 0.500 0.500
ChannelCh5 -0.435 0.500 0.500 0.500 0.500
ChannelCh6 -0.435 0.500 0.500 0.500 0.500 0.500
ChannelCh7 -0.435 0.500 0.500 0.500 0.500 0.500 0.500
ChannelCh8 -0.435 0.500 0.500 0.500 0.500 0.500 0.500 0.500
confint(m2)
2.5 % 97.5 %
.sig01 1.49441 2.97200
.sigma 1.43604 1.79864
(Intercept) 0.20522 2.60004
ChannelCh18 -1.57256 0.47572
ChannelCh19 -1.43204 0.61625
ChannelCh3 -1.52572 0.52256
ChannelCh4 -0.83730 1.21098
ChannelCh5 -1.38940 0.65888
ChannelCh6 -1.28098 0.76730
ChannelCh7 -1.57572 0.47256
ChannelCh8 -1.57414 0.47414
ranef(m2)
$ID
(Intercept)
1 -1.498201
2 4.624063
3 -0.380711
4 -0.729276
5 0.965688
6 2.275669
7 3.702186
8 -1.601210
9 -1.446177
10 0.140576
11 1.246621
12 0.115604
13 -2.645866
14 -1.248483
15 -1.247442
16 2.160174
17 -0.099778
18 -2.341001
19 -1.992436
with conditional variances for "ID"
plot(m2, form = resid(., type = "pearson") ~ fitted(.) | Channel,
abline = 0, lty = 2, layout = c(4,1), pch = 20,
xlab = "Fitted values", ylab = "Residuals")
dtaL <- dtaL %>%
group_by(Channel) %>%
mutate( var_eeg = var(Theta)) %>%
as.data.frame
head(dtaL)
ID Channel Theta var_eeg
1 1 Ch3 -3.54 8.6459
2 2 Ch3 5.72 8.6459
3 3 Ch3 0.52 8.6459
4 4 Ch3 0.00 8.6459
5 5 Ch3 2.07 8.6459
6 6 Ch3 1.67 8.6459
summary(m3 <- lmer(Theta ~ Channel + (1 | ID), data = dtaL,
weights = 1/var_eeg))
Linear mixed model fit by REML ['lmerMod']
Formula: Theta ~ Channel + (1 | ID)
Data: dtaL
Weights: 1/var_eeg
REML criterion at convergence: 683.9
Scaled residuals:
Min 1Q Median 3Q Max
-3.861 -0.416 -0.021 0.430 4.029
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 3.983 1.996
Residual 0.371 0.609
Number of obs: 171, groups: ID, 19
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.403 0.648 2.17
ChannelCh18 -0.548 0.585 -0.94
ChannelCh19 -0.408 0.564 -0.72
ChannelCh3 -0.502 0.615 -0.82
ChannelCh4 0.187 0.672 0.28
ChannelCh5 -0.365 0.564 -0.65
ChannelCh6 -0.257 0.556 -0.46
ChannelCh7 -0.552 0.558 -0.99
ChannelCh8 -0.550 0.542 -1.02
Correlation of Fixed Effects:
(Intr) ChnC18 ChnC19 ChnnC3 ChnnC4 ChnnC5 ChnnC6 ChnnC7
ChannelCh18 -0.554
ChannelCh19 -0.574 0.636
ChannelCh3 -0.526 0.583 0.604
ChannelCh4 -0.482 0.534 0.553 0.508
ChannelCh5 -0.574 0.636 0.659 0.604 0.554
ChannelCh6 -0.583 0.645 0.669 0.613 0.562 0.669
ChannelCh7 -0.581 0.643 0.667 0.611 0.560 0.667 0.677
ChannelCh8 -0.598 0.662 0.686 0.629 0.577 0.686 0.697 0.694
plot(m3, form = resid(., type = "pearson") ~ fitted(.) | Channel,
abline = 0, lty = 2, layout = c(4,1), pch = 20,
xlab = "Fitted values", ylab = "Residuals")
In my opinion, I need to do some model comparisons to see which model is the best fitting, but the example code can’t run. I still looking for some procedure to do it.
Conduct an analysis of the cognitive task example and commment on the results produced by each code chunk.
dta <- read.table("data/cognitive_task.txt", h=T)
head(dta)
ID Score Method Class Klass
1 S01 3 I1 C1 K1
2 S02 6 I1 C1 K1
3 S03 3 I1 C1 K1
4 S04 3 I1 C1 K1
5 S05 1 I1 C2 K2
6 S06 2 I1 C2 K2
VCA::varPlot(Score ~ Method/Class/ID,
Data=dta,
YLabel=list(text="Score", side=2, cex=1),
MeanLine=list(var=c("Method", "Class"),
col=c("darkred", "salmon"), lwd=c(1, 2)))
dta_sum <- dta %>%
group_by(Method) %>%
summarise(m_Score = mean(Score), sd_Score = sd(Score))
dta_sum
# A tibble: 2 x 3
Method m_Score sd_Score
<chr> <dbl> <dbl>
1 I1 3.5 1.63
2 I2 7.25 2.35
dta_sum1 <- dta %>%
group_by(Method, Class) %>%
summarise(m_Score = mean(Score), sd_Score = sd(Score))
dta_sum1
# A tibble: 8 x 4
# Groups: Method [2]
Method Class m_Score sd_Score
<chr> <chr> <dbl> <dbl>
1 I1 C1 3.75 1.5
2 I1 C2 1.75 0.5
3 I1 C3 5.5 0.577
4 I1 C4 3 0.816
5 I2 C5 7 0.816
6 I2 C6 4 0.816
7 I2 C7 8 0.816
8 I2 C8 10 0.816
m0 <- aov(Score ~ Method + Error(Class), data=dta)
#
summary(m0)
Error: Class
Df Sum Sq Mean Sq F value Pr(>F)
Method 1 112 112.5 6.46 0.044
Residuals 6 104 17.4
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24 18.5 0.771
The results showed that Method yielded a significant effect on scores.
m01 <- aov(Score ~ Method + Error(Method/Klass), data=dta)
#
summary(m01)
Error: Method
Df Sum Sq Mean Sq
Method 1 112 112
Error: Method:Klass
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 6 104 17.4
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24 18.5 0.771
112/17.4
[1] 6.4368
m1 <- lme4::lmer(Score ~ Method + (1 | Class), data=dta)
m1
Linear mixed model fit by REML ['lmerMod']
Formula: Score ~ Method + (1 | Class)
Data: dta
REML criterion at convergence: 101.58
Random effects:
Groups Name Std.Dev.
Class (Intercept) 2.040
Residual 0.878
Number of obs: 32, groups: Class, 8
Fixed Effects:
(Intercept) MethodI2
3.50 3.75
m11 <- lme4::lmer(Score ~ Method + (1 | Method:Klass), data=dta)
m11
Linear mixed model fit by REML ['lmerMod']
Formula: Score ~ Method + (1 | Method:Klass)
Data: dta
REML criterion at convergence: 101.58
Random effects:
Groups Name Std.Dev.
Method:Klass (Intercept) 2.040
Residual 0.878
Number of obs: 32, groups: Method:Klass, 8
Fixed Effects:
(Intercept) MethodI2
3.50 3.75
confint(m1, method="boot")
2.5 % 97.5 %
.sig01 0.82428 3.4025
.sigma 0.65193 1.1422
(Intercept) 1.52570 5.6800
MethodI2 0.92168 6.7614