Perform an analysis of the anorexia data example following the paired or independent samples example in the course note. You can start with this R script .
pacman::p_load("tidyverse")
讀取資料
# install.packages("PairedData")
library(PairedData)
data(Anorexia, package="PairedData")
str(Anorexia)
'data.frame': 17 obs. of 2 variables:
$ Prior: num 83.8 83.3 86 82.5 86.7 79.6 76.9 94.2 73.4 80.5 ...
$ Post : num 95.2 94.3 91.5 91.9 100.3 ...
治療前後體重比較圖
with(Anorexia, plot(paired(Prior, Post), type="profile")) +
labs(x="Therapy", y="Weight (in lb)") +
theme_bw()
各欄位變項平均值
colMeans(Anorexia)
Prior Post
83.229 90.494
各欄位變項標準差
print(apply(Anorexia, 2, sd), 3)
Prior Post
5.02 8.48
共變數
cov(Anorexia)
Prior Post
Prior 25.167 22.883
Post 22.883 71.827
相關係數
cor(Anorexia)
Prior Post
Prior 1.0000 0.5382
Post 0.5382 1.0000
2-sample t-test
t.test(Anorexia$Prior, Anorexia$Post)
Welch Two Sample t-test
data: Anorexia$Prior and Anorexia$Post
t = -3.04, df = 26, p-value = 0.0053
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-12.1747 -2.3547
sample estimates:
mean of x mean of y
83.229 90.494
從 2-sample t-test 使用差平均的檢定統計值的結果可以看到,治療前後之體重具有顯著差異。
paired t-test
t.test(Anorexia$Prior, Anorexia$Post, pair = T)
Paired t-test
data: Anorexia$Prior and Anorexia$Post
t = -4.18, df = 16, p-value = 7e-04
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-10.9447 -3.5847
sample estimates:
mean of the differences
-7.2647
由此可知,治療前後的體重確實有極顯著差異,p 值和 2-sample t-test 有很大程度的不同,應該是 2-sample t-test 將治療前後作成兩個獨立樣本。
重新整理資料,並改成長型資料
names(Anorexia) <- c("A", "B")
dta1 <- gather(Anorexia, key = "Therapy", value = "Weight", A, B)
head(dta1)
Therapy Weight
1 A 83.8
2 A 83.3
3 A 86.0
4 A 82.5
5 A 86.7
6 A 79.6
str(dta1)
'data.frame': 34 obs. of 2 variables:
$ Therapy: chr "A" "A" "A" "A" ...
$ Weight : num 83.8 83.3 86 82.5 86.7 79.6 76.9 94.2 73.4 80.5 ...
治療前(A)後(B)之體重散布圖與各組之平均
ggplot(dta1, aes(Therapy, Weight)) +
geom_point(shape = 1) +
stat_summary(fun.data = mean_cl_boot, geom = "pointrange") +
coord_flip() +
labs(x = "Therapy", y = "Weight (in lb)") +
theme_bw()
ANOVA
summary(lm(Weight ~ Therapy, data=dta1))
Call:
lm(formula = Weight ~ Therapy, data = dta1)
Residuals:
Min 1Q Median 3Q Max
-15.29 -2.45 1.11 4.00 11.11
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 83.23 1.69 49.28 <2e-16
TherapyB 7.26 2.39 3.04 0.0047
Residual standard error: 6.96 on 32 degrees of freedom
Multiple R-squared: 0.224, Adjusted R-squared: 0.2
F-statistic: 9.25 on 1 and 32 DF, p-value: 0.00467
summary(aov(Weight ~ Therapy, data=dta1))
Df Sum Sq Mean Sq F value Pr(>F)
Therapy 1 449 449 9.25 0.0047
Residuals 32 1552 48
The end
Identify the study design of the ergoStool example and replicate the results of analysis reported.
資料讀取
# install.packages("nlme")
library(nlme)
data(ergoStool, package="nlme")
head(ergoStool)
Grouped Data: effort ~ Type | Subject
effort Type Subject
1 12 T1 1
2 15 T2 1
3 12 T3 1
4 10 T4 1
5 10 T1 2
6 14 T2 2
str(ergoStool)
Classes 'nffGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame': 36 obs. of 3 variables:
$ effort : num 12 15 12 10 10 14 13 12 7 14 ...
$ Type : Factor w/ 4 levels "T1","T2","T3",..: 1 2 3 4 1 2 3 4 1 2 ...
$ Subject: Ord.factor w/ 9 levels "8"<"5"<"4"<"9"<..: 8 8 8 8 9 9 9 9 6 6 ...
- attr(*, "formula")=Class 'formula' language effort ~ Type | Subject
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
- attr(*, "labels")=List of 2
..$ x: chr "Type of stool"
..$ y: chr "Effort required to arise"
- attr(*, "units")=List of 1
..$ y: chr "(Borg scale)"
- attr(*, "FUN")=function (x)
..- attr(*, "source")= chr "function (x) mean(x, na.rm = TRUE)"
- attr(*, "order.groups")= logi TRUE
還沒想到怎麼做QQ
Examine the channel effects of the EEG study by following the analysis in the left brow EMG example.
pacman::p_load(car, tidyverse, lme4, GGally)
資料輸入
dta3 <- read.table("~/data/thetaEEG.txt", header=T)
str(dta3)
'data.frame': 19 obs. of 10 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ Ch3 : num -3.54 5.72 0.52 0 2.07 1.67 9.13 -0.43 -0.56 1.28 ...
$ Ch4 : num -3.11 5.07 -0.18 0.74 0.76 ...
$ Ch5 : num -0.24 6.87 0.9 1.1 3.51 2.77 3.44 -0.31 -1.22 1.89 ...
$ Ch6 : num 0.42 5.96 0.6 0.13 0.6 4.55 4.8 -0.61 0.67 1.77 ...
$ Ch7 : num -0.49 8.2 1.27 0.19 3.71 1.8 0.48 -1.04 -0.97 1.83 ...
$ Ch8 : num 2.13 4.87 1.28 0.07 1.86 4.79 1.63 -0.13 -0.98 0.91 ...
$ Ch17: num -4.15 5.48 -0.95 0.8 1.49 4.51 9.94 -0.61 0 1.4 ...
$ Ch18: num 2.87 5.57 1.74 0.25 3.11 3.24 1.34 -0.61 -1.22 1.1 ...
$ Ch19: num 1.34 6.33 0.79 -0.66 1.8 3.99 1.53 -0.43 -0.91 -0.12 ...
head(dta3)
ID Ch3 Ch4 Ch5 Ch6 Ch7 Ch8 Ch17 Ch18 Ch19
1 1 -3.54 -3.11 -0.24 0.42 -0.49 2.13 -4.15 2.87 1.34
2 2 5.72 5.07 6.87 5.96 8.20 4.87 5.48 5.57 6.33
3 3 0.52 -0.18 0.90 0.60 1.27 1.28 -0.95 1.74 0.79
4 4 0.00 0.74 1.10 0.13 0.19 0.07 0.80 0.25 -0.66
5 5 2.07 0.76 3.51 0.60 3.71 1.86 1.49 3.11 1.80
6 6 1.67 4.18 2.77 4.55 1.80 4.79 4.51 3.24 3.99
car package 進行多變量探討重新命名並變更變量屬性
ch <- as.factor(colnames(dta3[, -1]))
線性迴歸模型
m0 <- lm(as.matrix(dta3[, -1]) ~ 1)
m0_aov <- Anova(m0, idata = data.frame(ch), idesign = ~ ch)
summary(m0_aov)
Type III Repeated Measures MANOVA Tests:
------------------------------------------
Term: (Intercept)
Response transformation matrix:
(Intercept)
Ch3 1
Ch4 1
Ch5 1
Ch6 1
Ch7 1
Ch8 1
Ch17 1
Ch18 1
Ch19 1
Sum of squares and products for the hypothesis:
(Intercept)
(Intercept) 1761.6
Multivariate Tests: (Intercept)
Df test stat approx F num Df den Df Pr(>F)
Pillai 1 0.20359 4.6014 1 18 0.04584
Wilks 1 0.79641 4.6014 1 18 0.04584
Hotelling-Lawley 1 0.25564 4.6014 1 18 0.04584
Roy 1 0.25564 4.6014 1 18 0.04584
------------------------------------------
Term: ch
Response transformation matrix:
ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8
Ch3 0 0 0 1 0 0 0 0
Ch4 0 0 0 0 1 0 0 0
Ch5 0 0 0 0 0 1 0 0
Ch6 0 0 0 0 0 0 1 0
Ch7 0 0 0 0 0 0 0 1
Ch8 -1 -1 -1 -1 -1 -1 -1 -1
Ch17 1 0 0 0 0 0 0 0
Ch18 0 1 0 0 0 0 0 0
Ch19 0 0 1 0 0 0 0 0
Sum of squares and products for the hypothesis:
ch1 ch2 ch3 ch4 ch5 ch6 ch7
ch1 5.7475 1.6500e-02 1.4850000 0.5060000 7.700000 1.9305000 3.0635000
ch2 0.0165 4.7368e-05 0.0042632 0.0014526 0.022105 0.0055421 0.0087947
ch3 1.4850 4.2632e-03 0.3836842 0.1307368 1.989474 0.4987895 0.7915263
ch4 0.5060 1.4526e-03 0.1307368 0.0445474 0.677895 0.1699579 0.2697053
ch5 7.7000 2.2105e-02 1.9894737 0.6778947 10.315789 2.5863158 4.1042105
ch6 1.9305 5.5421e-03 0.4987895 0.1699579 2.586316 0.6484263 1.0289842
ch7 3.0635 8.7947e-03 0.7915263 0.2697053 4.104211 1.0289842 1.6328895
ch8 -0.0165 -4.7368e-05 -0.0042632 -0.0014526 -0.022105 -0.0055421 -0.0087947
ch8
ch1 -1.6500e-02
ch2 -4.7368e-05
ch3 -4.2632e-03
ch4 -1.4526e-03
ch5 -2.2105e-02
ch6 -5.5421e-03
ch7 -8.7947e-03
ch8 4.7368e-05
Multivariate Tests: ch
Df test stat approx F num Df den Df Pr(>F)
Pillai 1 0.52437 1.5159 8 11 0.2559
Wilks 1 0.47563 1.5159 8 11 0.2559
Hotelling-Lawley 1 1.10246 1.5159 8 11 0.2559
Roy 1 1.10246 1.5159 8 11 0.2559
Univariate Type III Repeated-Measures ANOVA Assuming Sphericity
Sum Sq num Df Error SS den Df F value Pr(>F)
(Intercept) 195.7 1 766 18 4.60 0.046
ch 10.7 8 389 144 0.49 0.858
Mauchly Tests for Sphericity
Test statistic p-value
ch 0.000374 9.74e-11
Greenhouse-Geisser and Huynh-Feldt Corrections
for Departure from Sphericity
GG eps Pr(>F[GG])
ch 0.315 0.66
HF eps Pr(>F[HF])
ch 0.37092 0.6854
寬型資料轉換成長型資料
dta3L <- dta3 %>%
gather( key = Channel, value = Value, 2:10) %>%
mutate( Channel = factor(Channel))
str(dta3L)
'data.frame': 171 obs. of 3 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ Channel: Factor w/ 9 levels "Ch17","Ch18",..: 4 4 4 4 4 4 4 4 4 4 ...
$ Value : num -3.54 5.72 0.52 0 2.07 1.67 9.13 -0.43 -0.56 1.28 ...
資料視覺化
ot <- theme_set(theme_bw())
ggplot(dta3L, aes(reorder(Channel, Value, median), Value)) +
geom_boxplot() +
geom_point(alpha = .3) +
geom_hline(yintercept = mean(dta3L$Value)) +
labs(x ="Channel of the patients' brain", y = "Changes in absolute theta power of EEG")
模型
m1 <- aov(Value ~ Channel + Error(ID/Channel), data = dta3L)
summary(m1)
Error: ID
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 1 112 112
Error: ID:Channel
Df Sum Sq Mean Sq
Channel 8 23.1 2.89
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Channel 8 10 1.23 0.18 0.99
Residuals 153 1020 6.67
重新排序channel factor
dta3L$Channel <- factor(dta3L$Channel, levels(dta3L$Channel)[c(2, 4, 8, 3, 1, 6, 7, 9, 5)])
按各 Channel 平均大小排序之散布圖
ggplot(dta3L, aes(Channel, Value)) +
stat_summary(fun.data = "mean_cl_boot", geom = "pointrange", col = "firebrick") +
geom_hline(yintercept = mean(dta3L$Value), linetype = "dashed") +
geom_point(alpha = .3, size = rel(.7)) +
labs(x ="Channel of the patients' brain", y = "Changes in absolute theta power of EEG")
各病患之 EEG 以不同 channel 分別呈色
ggplot(dta3L, aes(reorder(ID, Value, mean), Value, color = Channel)) +
geom_point() +
geom_hline(yintercept = mean(dta3L$Value), linetype = "dashed") +
coord_flip() +
scale_color_manual(values = c("gray", "forestgreen", "orangered", "navy", "yellow", "pink", "brown", "purple", "black"))+
labs(y = "Changes in absolute theta power of EEG", x = "Patient ID") +
theme(legend.position = c(.9, .4))
summary(m2 <- lmer(Value ~ Channel + (1 | ID), data = dta3L))
Linear mixed model fit by REML ['lmerMod']
Formula: Value ~ Channel + (1 | ID)
Data: dta3L
REML criterion at convergence: 697
Scaled residuals:
Min 1Q Median 3Q Max
-3.650 -0.388 0.000 0.422 4.639
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 4.43 2.10
Residual 2.70 1.64
Number of obs: 171, groups: ID, 19
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.85421 0.61257 1.39
ChannelCh3 0.04684 0.53346 0.09
ChannelCh7 -0.00316 0.53346 -0.01
ChannelCh19 0.14053 0.53346 0.26
ChannelCh17 0.54842 0.53346 1.03
ChannelCh5 0.18316 0.53346 0.34
ChannelCh6 0.29158 0.53346 0.55
ChannelCh8 -0.00158 0.53346 0.00
ChannelCh4 0.73526 0.53346 1.38
Correlation of Fixed Effects:
(Intr) ChnnC3 ChnnC7 ChnC19 ChnC17 ChnnC5 ChnnC6 ChnnC8
ChannelCh3 -0.435
ChannelCh7 -0.435 0.500
ChannelCh19 -0.435 0.500 0.500
ChannelCh17 -0.435 0.500 0.500 0.500
ChannelCh5 -0.435 0.500 0.500 0.500 0.500
ChannelCh6 -0.435 0.500 0.500 0.500 0.500 0.500
ChannelCh8 -0.435 0.500 0.500 0.500 0.500 0.500 0.500
ChannelCh4 -0.435 0.500 0.500 0.500 0.500 0.500 0.500 0.500
confint(m2)
2.5 % 97.5 %
.sig01 1.49441 2.9720
.sigma 1.43604 1.7986
(Intercept) -0.34320 2.0516
ChannelCh3 -0.97730 1.0710
ChannelCh7 -1.02730 1.0210
ChannelCh19 -0.88362 1.1647
ChannelCh17 -0.47572 1.5726
ChannelCh5 -0.84098 1.2073
ChannelCh6 -0.73256 1.3157
ChannelCh8 -1.02572 1.0226
ChannelCh4 -0.28888 1.7594
ranef(m2)
$ID
(Intercept)
1 -1.498201
2 4.624063
3 -0.380711
4 -0.729276
5 0.965688
6 2.275669
7 3.702186
8 -1.601210
9 -1.446177
10 0.140576
11 1.246621
12 0.115604
13 -2.645866
14 -1.248483
15 -1.247442
16 2.160174
17 -0.099778
18 -2.341001
19 -1.992436
with conditional variances for "ID"
殘差圖
plot(m2, form = resid(., type = "pearson") ~ fitted(.) | Channel,
abline = 0, lty = 2, layout = c(5,2), pch = 20,
xlab = "Fitted values", ylab = "Residuals")
EEG 對 Channel 的變異數
v_ch <- dta3L %>%
group_by(Channel) %>%
summarize( var_ch = var(Value)) %>%
as.data.frame
dta3L2 <- merge(dta3L, v_ch, by = "Channel")
str(dta3L2)
'data.frame': 171 obs. of 4 variables:
$ Channel: Factor w/ 9 levels "Ch18","Ch3","Ch7",..: 5 5 5 5 5 5 5 5 5 5 ...
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ Value : num -4.15 5.48 -0.95 0.8 1.49 4.51 9.94 -0.61 0 1.4 ...
$ var_ch : num 10.7 10.7 10.7 10.7 10.7 ...
加權
m3 <- lmer(Value ~ Channel + (1 | ID), data = dta3L2,
weights = 1/var_ch)
summary(m3)
Linear mixed model fit by REML ['lmerMod']
Formula: Value ~ Channel + (1 | ID)
Data: dta3L2
Weights: 1/var_ch
REML criterion at convergence: 683.9
Scaled residuals:
Min 1Q Median 3Q Max
-3.861 -0.416 -0.021 0.430 4.029
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 3.983 1.996
Residual 0.371 0.609
Number of obs: 171, groups: ID, 19
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.85421 0.58474 1.46
ChannelCh3 0.04684 0.54881 0.09
ChannelCh7 -0.00316 0.48322 -0.01
ChannelCh19 0.14053 0.49095 0.29
ChannelCh17 0.54842 0.58488 0.94
ChannelCh5 0.18316 0.49082 0.37
ChannelCh6 0.29158 0.48104 0.61
ChannelCh8 -0.00158 0.46467 0.00
ChannelCh4 0.73526 0.61118 1.20
Correlation of Fixed Effects:
(Intr) ChnnC3 ChnnC7 ChnC19 ChnC17 ChnnC5 ChnnC6 ChnnC8
ChannelCh3 -0.412
ChannelCh7 -0.468 0.499
ChannelCh19 -0.461 0.491 0.558
ChannelCh17 -0.387 0.412 0.468 0.461
ChannelCh5 -0.461 0.491 0.558 0.549 0.461
ChannelCh6 -0.470 0.501 0.569 0.560 0.470 0.560
ChannelCh8 -0.487 0.519 0.589 0.580 0.487 0.580 0.592
ChannelCh4 -0.370 0.394 0.448 0.441 0.370 0.441 0.450 0.466
殘差圖
plot(m3, form = resid(., type = "pearson") ~ fitted(.) | Channel,
abline = 0, lty = 2, layout = c(5,2), pch = 20,
xlab = "Fitted values", ylab = "Residuals")
*residual plots - combined and compared**
目前無法解決 fortify() 的問題…
The end
Conduct an analysis of the cognitive task example and commment on the results produced by each code chunk.
資料輸入
dta4 <- read.table("~/data/cognitive_task.txt", h=T)
分層級群組顯示散布圖
VCA::varPlot(Score ~ Method/Class/ID,
Data=dta4,
YLabel=list(text="Score", side=2, cex=1),
MeanLine=list(var=c("Method", "Class"),
col=c("darkred", "salmon"), lwd=c(1, 2)))
在class具random effect 下分數與方法的迴歸模型
m0 <- aov(Score ~ Method + Error(Class), data=dta4)
在klass和method具random effect下分數與方法的迴歸模型
m01 <- aov(Score ~ Method + Error(Method/Klass), data=dta4)
summary(m0)
Error: Class
Df Sum Sq Mean Sq F value Pr(>F)
Method 1 112 112.5 6.46 0.044
Residuals 6 104 17.4
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24 18.5 0.771
summary(m01)
Error: Method
Df Sum Sq Mean Sq
Method 1 112 112
Error: Method:Klass
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 6 104 17.4
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24 18.5 0.771
m1 <- lme4::lmer(Score ~ Method + (1 | Class), data=dta4)
m11 <- lme4::lmer(Score ~ Method + (1 | Method:Klass), data=dta4)
信賴區間
confint(m1, method="boot")
2.5 % 97.5 %
.sig01 0.86539 3.2472
.sigma 0.62091 1.1177
(Intercept) 1.37717 5.6343
MethodI2 0.57779 6.3374
The end