1 Homework 1

Perform an analysis of the anorexia data example following the paired or independent samples example in the course note. You can start with this R script .

pacman::p_load("tidyverse")

讀取資料

# install.packages("PairedData")
library(PairedData)
data(Anorexia, package="PairedData")
str(Anorexia)
'data.frame':   17 obs. of  2 variables:
 $ Prior: num  83.8 83.3 86 82.5 86.7 79.6 76.9 94.2 73.4 80.5 ...
 $ Post : num  95.2 94.3 91.5 91.9 100.3 ...

治療前後體重比較圖

with(Anorexia, plot(paired(Prior, Post), type="profile")) +
 labs(x="Therapy", y="Weight (in lb)") +
 theme_bw()

各欄位變項平均值

colMeans(Anorexia)
 Prior   Post 
83.229 90.494 

各欄位變項標準差

print(apply(Anorexia, 2, sd), 3)
Prior  Post 
 5.02  8.48 

共變數

cov(Anorexia)
       Prior   Post
Prior 25.167 22.883
Post  22.883 71.827

相關係數

cor(Anorexia)
       Prior   Post
Prior 1.0000 0.5382
Post  0.5382 1.0000

2-sample t-test

t.test(Anorexia$Prior, Anorexia$Post)

    Welch Two Sample t-test

data:  Anorexia$Prior and Anorexia$Post
t = -3.04, df = 26, p-value = 0.0053
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -12.1747  -2.3547
sample estimates:
mean of x mean of y 
   83.229    90.494 

從 2-sample t-test 使用差平均的檢定統計值的結果可以看到,治療前後之體重具有顯著差異。

paired t-test

t.test(Anorexia$Prior, Anorexia$Post, pair = T)

    Paired t-test

data:  Anorexia$Prior and Anorexia$Post
t = -4.18, df = 16, p-value = 7e-04
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.9447  -3.5847
sample estimates:
mean of the differences 
                -7.2647 

由此可知,治療前後的體重確實有極顯著差異,p 值和 2-sample t-test 有很大程度的不同,應該是 2-sample t-test 將治療前後作成兩個獨立樣本。

重新整理資料,並改成長型資料

names(Anorexia) <- c("A", "B")
dta1 <- gather(Anorexia, key = "Therapy", value = "Weight", A, B)
head(dta1)
  Therapy Weight
1       A   83.8
2       A   83.3
3       A   86.0
4       A   82.5
5       A   86.7
6       A   79.6
str(dta1)
'data.frame':   34 obs. of  2 variables:
 $ Therapy: chr  "A" "A" "A" "A" ...
 $ Weight : num  83.8 83.3 86 82.5 86.7 79.6 76.9 94.2 73.4 80.5 ...

治療前(A)後(B)之體重散布圖與各組之平均

ggplot(dta1, aes(Therapy, Weight)) + 
 geom_point(shape = 1) + 
 stat_summary(fun.data = mean_cl_boot, geom = "pointrange") + 
 coord_flip() +
 labs(x = "Therapy", y = "Weight (in lb)") +
 theme_bw()

ANOVA

summary(lm(Weight ~ Therapy, data=dta1))

Call:
lm(formula = Weight ~ Therapy, data = dta1)

Residuals:
   Min     1Q Median     3Q    Max 
-15.29  -2.45   1.11   4.00  11.11 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    83.23       1.69   49.28   <2e-16
TherapyB        7.26       2.39    3.04   0.0047

Residual standard error: 6.96 on 32 degrees of freedom
Multiple R-squared:  0.224, Adjusted R-squared:   0.2 
F-statistic: 9.25 on 1 and 32 DF,  p-value: 0.00467
summary(aov(Weight ~ Therapy, data=dta1))
            Df Sum Sq Mean Sq F value Pr(>F)
Therapy      1    449     449    9.25 0.0047
Residuals   32   1552      48               

The end

2 Homework 2

Identify the study design of the ergoStool example and replicate the results of analysis reported.

資料讀取

# install.packages("nlme")
library(nlme)
data(ergoStool, package="nlme")
head(ergoStool)
Grouped Data: effort ~ Type | Subject
  effort Type Subject
1     12   T1       1
2     15   T2       1
3     12   T3       1
4     10   T4       1
5     10   T1       2
6     14   T2       2
str(ergoStool)
Classes 'nffGroupedData', 'nfGroupedData', 'groupedData' and 'data.frame':  36 obs. of  3 variables:
 $ effort : num  12 15 12 10 10 14 13 12 7 14 ...
 $ Type   : Factor w/ 4 levels "T1","T2","T3",..: 1 2 3 4 1 2 3 4 1 2 ...
 $ Subject: Ord.factor w/ 9 levels "8"<"5"<"4"<"9"<..: 8 8 8 8 9 9 9 9 6 6 ...
 - attr(*, "formula")=Class 'formula'  language effort ~ Type | Subject
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
 - attr(*, "labels")=List of 2
  ..$ x: chr "Type of stool"
  ..$ y: chr "Effort required to arise"
 - attr(*, "units")=List of 1
  ..$ y: chr "(Borg scale)"
 - attr(*, "FUN")=function (x)  
  ..- attr(*, "source")= chr "function (x) mean(x, na.rm = TRUE)"
 - attr(*, "order.groups")= logi TRUE

還沒想到怎麼做QQ

3 Homework 3

Examine the channel effects of the EEG study by following the analysis in the left brow EMG example.

pacman::p_load(car, tidyverse, lme4, GGally)

資料輸入

dta3 <- read.table("~/data/thetaEEG.txt", header=T)
str(dta3)
'data.frame':   19 obs. of  10 variables:
 $ ID  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Ch3 : num  -3.54 5.72 0.52 0 2.07 1.67 9.13 -0.43 -0.56 1.28 ...
 $ Ch4 : num  -3.11 5.07 -0.18 0.74 0.76 ...
 $ Ch5 : num  -0.24 6.87 0.9 1.1 3.51 2.77 3.44 -0.31 -1.22 1.89 ...
 $ Ch6 : num  0.42 5.96 0.6 0.13 0.6 4.55 4.8 -0.61 0.67 1.77 ...
 $ Ch7 : num  -0.49 8.2 1.27 0.19 3.71 1.8 0.48 -1.04 -0.97 1.83 ...
 $ Ch8 : num  2.13 4.87 1.28 0.07 1.86 4.79 1.63 -0.13 -0.98 0.91 ...
 $ Ch17: num  -4.15 5.48 -0.95 0.8 1.49 4.51 9.94 -0.61 0 1.4 ...
 $ Ch18: num  2.87 5.57 1.74 0.25 3.11 3.24 1.34 -0.61 -1.22 1.1 ...
 $ Ch19: num  1.34 6.33 0.79 -0.66 1.8 3.99 1.53 -0.43 -0.91 -0.12 ...
head(dta3)
  ID   Ch3   Ch4   Ch5  Ch6   Ch7  Ch8  Ch17 Ch18  Ch19
1  1 -3.54 -3.11 -0.24 0.42 -0.49 2.13 -4.15 2.87  1.34
2  2  5.72  5.07  6.87 5.96  8.20 4.87  5.48 5.57  6.33
3  3  0.52 -0.18  0.90 0.60  1.27 1.28 -0.95 1.74  0.79
4  4  0.00  0.74  1.10 0.13  0.19 0.07  0.80 0.25 -0.66
5  5  2.07  0.76  3.51 0.60  3.71 1.86  1.49 3.11  1.80
6  6  1.67  4.18  2.77 4.55  1.80 4.79  4.51 3.24  3.99

3.1car package 進行多變量探討

重新命名並變更變量屬性

ch <- as.factor(colnames(dta3[, -1]))

線性迴歸模型

m0 <- lm(as.matrix(dta3[, -1]) ~ 1)
m0_aov <- Anova(m0, idata = data.frame(ch), idesign = ~ ch)
summary(m0_aov)

Type III Repeated Measures MANOVA Tests:

------------------------------------------
 
Term: (Intercept) 

 Response transformation matrix:
     (Intercept)
Ch3            1
Ch4            1
Ch5            1
Ch6            1
Ch7            1
Ch8            1
Ch17           1
Ch18           1
Ch19           1

Sum of squares and products for the hypothesis:
            (Intercept)
(Intercept)      1761.6

Multivariate Tests: (Intercept)
                 Df test stat approx F num Df den Df  Pr(>F)
Pillai            1   0.20359   4.6014      1     18 0.04584
Wilks             1   0.79641   4.6014      1     18 0.04584
Hotelling-Lawley  1   0.25564   4.6014      1     18 0.04584
Roy               1   0.25564   4.6014      1     18 0.04584

------------------------------------------
 
Term: ch 

 Response transformation matrix:
     ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8
Ch3    0   0   0   1   0   0   0   0
Ch4    0   0   0   0   1   0   0   0
Ch5    0   0   0   0   0   1   0   0
Ch6    0   0   0   0   0   0   1   0
Ch7    0   0   0   0   0   0   0   1
Ch8   -1  -1  -1  -1  -1  -1  -1  -1
Ch17   1   0   0   0   0   0   0   0
Ch18   0   1   0   0   0   0   0   0
Ch19   0   0   1   0   0   0   0   0

Sum of squares and products for the hypothesis:
        ch1         ch2        ch3        ch4       ch5        ch6        ch7
ch1  5.7475  1.6500e-02  1.4850000  0.5060000  7.700000  1.9305000  3.0635000
ch2  0.0165  4.7368e-05  0.0042632  0.0014526  0.022105  0.0055421  0.0087947
ch3  1.4850  4.2632e-03  0.3836842  0.1307368  1.989474  0.4987895  0.7915263
ch4  0.5060  1.4526e-03  0.1307368  0.0445474  0.677895  0.1699579  0.2697053
ch5  7.7000  2.2105e-02  1.9894737  0.6778947 10.315789  2.5863158  4.1042105
ch6  1.9305  5.5421e-03  0.4987895  0.1699579  2.586316  0.6484263  1.0289842
ch7  3.0635  8.7947e-03  0.7915263  0.2697053  4.104211  1.0289842  1.6328895
ch8 -0.0165 -4.7368e-05 -0.0042632 -0.0014526 -0.022105 -0.0055421 -0.0087947
            ch8
ch1 -1.6500e-02
ch2 -4.7368e-05
ch3 -4.2632e-03
ch4 -1.4526e-03
ch5 -2.2105e-02
ch6 -5.5421e-03
ch7 -8.7947e-03
ch8  4.7368e-05

Multivariate Tests: ch
                 Df test stat approx F num Df den Df Pr(>F)
Pillai            1   0.52437   1.5159      8     11 0.2559
Wilks             1   0.47563   1.5159      8     11 0.2559
Hotelling-Lawley  1   1.10246   1.5159      8     11 0.2559
Roy               1   1.10246   1.5159      8     11 0.2559

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

            Sum Sq num Df Error SS den Df F value Pr(>F)
(Intercept)  195.7      1      766     18    4.60  0.046
ch            10.7      8      389    144    0.49  0.858


Mauchly Tests for Sphericity

   Test statistic  p-value
ch       0.000374 9.74e-11


Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

   GG eps Pr(>F[GG])
ch  0.315       0.66

    HF eps Pr(>F[HF])
ch 0.37092     0.6854

寬型資料轉換成長型資料

dta3L <- dta3 %>% 
        gather( key = Channel, value = Value, 2:10) %>% 
        mutate( Channel = factor(Channel))
str(dta3L)
'data.frame':   171 obs. of  3 variables:
 $ ID     : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Channel: Factor w/ 9 levels "Ch17","Ch18",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ Value  : num  -3.54 5.72 0.52 0 2.07 1.67 9.13 -0.43 -0.56 1.28 ...

資料視覺化

ot <- theme_set(theme_bw())
ggplot(dta3L, aes(reorder(Channel, Value, median), Value)) +
  geom_boxplot() +
  geom_point(alpha = .3) +
  geom_hline(yintercept = mean(dta3L$Value)) +
  labs(x ="Channel of the patients' brain", y = "Changes in absolute theta power of EEG")

3.2 單一變量

模型

m1 <- aov(Value ~ Channel + Error(ID/Channel), data = dta3L)
summary(m1)

Error: ID
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  1    112     112               

Error: ID:Channel
        Df Sum Sq Mean Sq
Channel  8   23.1    2.89

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Channel     8     10    1.23    0.18   0.99
Residuals 153   1020    6.67               

重新排序channel factor

dta3L$Channel <- factor(dta3L$Channel, levels(dta3L$Channel)[c(2, 4, 8, 3, 1, 6, 7, 9, 5)])

按各 Channel 平均大小排序之散布圖

ggplot(dta3L, aes(Channel, Value)) +
 stat_summary(fun.data = "mean_cl_boot", geom = "pointrange", col = "firebrick") + 
 geom_hline(yintercept = mean(dta3L$Value), linetype = "dashed") +
 geom_point(alpha = .3, size = rel(.7)) +
 labs(x ="Channel of the patients' brain", y = "Changes in absolute theta power of EEG") 

各病患之 EEG 以不同 channel 分別呈色

ggplot(dta3L, aes(reorder(ID, Value, mean), Value, color = Channel)) +
 geom_point() +
 geom_hline(yintercept = mean(dta3L$Value), linetype = "dashed") + 
 coord_flip() +
 scale_color_manual(values = c("gray", "forestgreen", "orangered", "navy", "yellow", "pink", "brown", "purple", "black"))+
 labs(y = "Changes in absolute theta power of EEG", x = "Patient ID") +
 theme(legend.position = c(.9, .4))

3.3 mixed-effect approach

summary(m2 <- lmer(Value ~ Channel + (1 | ID), data = dta3L))
Linear mixed model fit by REML ['lmerMod']
Formula: Value ~ Channel + (1 | ID)
   Data: dta3L

REML criterion at convergence: 697

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-3.650 -0.388  0.000  0.422  4.639 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 4.43     2.10    
 Residual             2.70     1.64    
Number of obs: 171, groups:  ID, 19

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.85421    0.61257    1.39
ChannelCh3   0.04684    0.53346    0.09
ChannelCh7  -0.00316    0.53346   -0.01
ChannelCh19  0.14053    0.53346    0.26
ChannelCh17  0.54842    0.53346    1.03
ChannelCh5   0.18316    0.53346    0.34
ChannelCh6   0.29158    0.53346    0.55
ChannelCh8  -0.00158    0.53346    0.00
ChannelCh4   0.73526    0.53346    1.38

Correlation of Fixed Effects:
            (Intr) ChnnC3 ChnnC7 ChnC19 ChnC17 ChnnC5 ChnnC6 ChnnC8
ChannelCh3  -0.435                                                 
ChannelCh7  -0.435  0.500                                          
ChannelCh19 -0.435  0.500  0.500                                   
ChannelCh17 -0.435  0.500  0.500  0.500                            
ChannelCh5  -0.435  0.500  0.500  0.500  0.500                     
ChannelCh6  -0.435  0.500  0.500  0.500  0.500  0.500              
ChannelCh8  -0.435  0.500  0.500  0.500  0.500  0.500  0.500       
ChannelCh4  -0.435  0.500  0.500  0.500  0.500  0.500  0.500  0.500
confint(m2)
               2.5 % 97.5 %
.sig01       1.49441 2.9720
.sigma       1.43604 1.7986
(Intercept) -0.34320 2.0516
ChannelCh3  -0.97730 1.0710
ChannelCh7  -1.02730 1.0210
ChannelCh19 -0.88362 1.1647
ChannelCh17 -0.47572 1.5726
ChannelCh5  -0.84098 1.2073
ChannelCh6  -0.73256 1.3157
ChannelCh8  -1.02572 1.0226
ChannelCh4  -0.28888 1.7594
ranef(m2)
$ID
   (Intercept)
1    -1.498201
2     4.624063
3    -0.380711
4    -0.729276
5     0.965688
6     2.275669
7     3.702186
8    -1.601210
9    -1.446177
10    0.140576
11    1.246621
12    0.115604
13   -2.645866
14   -1.248483
15   -1.247442
16    2.160174
17   -0.099778
18   -2.341001
19   -1.992436

with conditional variances for "ID" 

殘差圖

plot(m2, form = resid(., type = "pearson") ~ fitted(.) | Channel, 
     abline = 0, lty = 2, layout = c(5,2), pch = 20, 
     xlab = "Fitted values", ylab = "Residuals")

EEG 對 Channel 的變異數

v_ch <- dta3L %>% 
    group_by(Channel) %>% 
    summarize( var_ch = var(Value)) %>% 
    as.data.frame
dta3L2 <- merge(dta3L, v_ch, by = "Channel")
str(dta3L2)
'data.frame':   171 obs. of  4 variables:
 $ Channel: Factor w/ 9 levels "Ch18","Ch3","Ch7",..: 5 5 5 5 5 5 5 5 5 5 ...
 $ ID     : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Value  : num  -4.15 5.48 -0.95 0.8 1.49 4.51 9.94 -0.61 0 1.4 ...
 $ var_ch : num  10.7 10.7 10.7 10.7 10.7 ...

加權

m3 <- lmer(Value ~ Channel + (1 | ID), data = dta3L2,
                   weights = 1/var_ch)
summary(m3)
Linear mixed model fit by REML ['lmerMod']
Formula: Value ~ Channel + (1 | ID)
   Data: dta3L2
Weights: 1/var_ch

REML criterion at convergence: 683.9

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-3.861 -0.416 -0.021  0.430  4.029 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 3.983    1.996   
 Residual             0.371    0.609   
Number of obs: 171, groups:  ID, 19

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.85421    0.58474    1.46
ChannelCh3   0.04684    0.54881    0.09
ChannelCh7  -0.00316    0.48322   -0.01
ChannelCh19  0.14053    0.49095    0.29
ChannelCh17  0.54842    0.58488    0.94
ChannelCh5   0.18316    0.49082    0.37
ChannelCh6   0.29158    0.48104    0.61
ChannelCh8  -0.00158    0.46467    0.00
ChannelCh4   0.73526    0.61118    1.20

Correlation of Fixed Effects:
            (Intr) ChnnC3 ChnnC7 ChnC19 ChnC17 ChnnC5 ChnnC6 ChnnC8
ChannelCh3  -0.412                                                 
ChannelCh7  -0.468  0.499                                          
ChannelCh19 -0.461  0.491  0.558                                   
ChannelCh17 -0.387  0.412  0.468  0.461                            
ChannelCh5  -0.461  0.491  0.558  0.549  0.461                     
ChannelCh6  -0.470  0.501  0.569  0.560  0.470  0.560              
ChannelCh8  -0.487  0.519  0.589  0.580  0.487  0.580  0.592       
ChannelCh4  -0.370  0.394  0.448  0.441  0.370  0.441  0.450  0.466

殘差圖

plot(m3, form = resid(., type = "pearson") ~ fitted(.) | Channel, 
     abline = 0, lty = 2, layout = c(5,2), pch = 20,
     xlab = "Fitted values", ylab = "Residuals")

*residual plots - combined and compared**

目前無法解決 fortify() 的問題…

The end

4 Homework 4

Conduct an analysis of the cognitive task example and commment on the results produced by each code chunk.

資料輸入

dta4 <- read.table("~/data/cognitive_task.txt", h=T)

分層級群組顯示散布圖

VCA::varPlot(Score ~ Method/Class/ID, 
             Data=dta4,
             YLabel=list(text="Score", side=2, cex=1),
             MeanLine=list(var=c("Method", "Class"),
                           col=c("darkred", "salmon"), lwd=c(1, 2)))

在class具random effect 下分數與方法的迴歸模型

m0 <- aov(Score ~ Method + Error(Class), data=dta4)

在klass和method具random effect下分數與方法的迴歸模型

m01 <- aov(Score ~ Method + Error(Method/Klass), data=dta4)
summary(m0)

Error: Class
          Df Sum Sq Mean Sq F value Pr(>F)
Method     1    112   112.5    6.46  0.044
Residuals  6    104    17.4               

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24   18.5   0.771               
summary(m01)

Error: Method
       Df Sum Sq Mean Sq
Method  1    112     112

Error: Method:Klass
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  6    104    17.4               

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24   18.5   0.771               
m1 <- lme4::lmer(Score ~ Method + (1 | Class), data=dta4)
m11 <- lme4::lmer(Score ~ Method + (1 | Method:Klass), data=dta4)

信賴區間

confint(m1, method="boot")
              2.5 % 97.5 %
.sig01      0.86539 3.2472
.sigma      0.62091 1.1177
(Intercept) 1.37717 5.6343
MethodI2    0.57779 6.3374

The end