1 Homework 1:

Perform an analysis of the anorexia data example following the paired or independent samples example in the course note. You can start with this R script .

1.1 Load data file

library(PairedData)

#
data(Anorexia, package="PairedData")
dta <- Anorexia
head(dta)

  Prior  Post
1  83.8  95.2
2  83.3  94.3
3  86.0  91.5
4  82.5  91.9
5  86.7 100.3
6  79.6  76.7

1.2 Visualization

with(dta, plot(paired(Prior, Post), type="profile")) +
 labs(x="Therapy", y="Weight (in lb)") +
 theme_bw()

## Compute variable means

colMeans(dta)

 Prior   Post 
83.229 90.494

1.3 Compute variable SD

print(apply(dta, 2, sd), 3)

Prior  Post 
 5.02  8.48

1.4 Compute covariances

cov(dta)

       Prior   Post
Prior 25.167 22.883
Post  22.883 71.827

1.5 Compute correlations

print(cor(dta),3)

      Prior  Post
Prior 1.000 0.538
Post  0.538 1.000

1.6 2-sample t-test

t.test(dta$Prior, dta$Post)


    Welch Two Sample t-test

data:  dta$Prior and dta$Post
t = -3.04, df = 26, p-value = 0.0053
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -12.1747  -2.3547
sample estimates:
mean of x mean of y 
   83.229    90.494

The results revealed by independent sample t-test showed that Session (prior and post test) yielded a significant effect with t = 3.04, p = 0.005.

1.7 Paired t-test

t.test(dta$Prior, dta$Post, paired = TRUE)


    Paired t-test

data:  dta$Prior and dta$Post
t = -4.18, df = 16, p-value = 7e-04
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.9447  -3.5847
sample estimates:
mean of the differences 
                -7.2647

The results revealed by paired t-test showed that Session (prior and post test) still yielded a significant effect with t = 4.18, p < .001. As Prof. Sheu mentioned in the class, if we did not consider the effect of correlation, the t or F values would be smaller.

2 Homework 2:

Identify the study design of the ergoStool example and replicate the results of analysis reported.

2.1 Load data file

data(ergoStool, package="nlme")
dta <- ergoStool

head(dta)

  effort Type Subject
1     12   T1       1
2     15   T2       1
3     12   T3       1
4     10   T4       1
5     10   T1       2
6     14   T2       2

Complete block designs or Randomized complete block designs

2.2 Compute variable means and SD

library(tidyverse)
dta_sum <- dta %>%
  group_by(Type) %>%
  summarise(m_Type = mean(effort), sd_Type = sd(effort))
dta_sum

# A tibble: 4 x 3
  Type  m_Type sd_Type
  <fct>  <dbl>   <dbl>
1 T1      8.56    1.67
2 T2     12.4     1.59
3 T3     10.8     1.92
4 T4      9.22    1.72

2.3 Visualization

ggplot(dta, 
       aes(x=reorder(Subject, effort, max), 
           y=effort, 
           color=Type)) + 
 geom_point(position=position_jitter(width=.1),
            size=rel(3)) + 
 geom_hline(yintercept=mean(dta$effort), 
            linetype="dotted") +
 labs(y="Effort", 
      x="Subject ID") +
 guides(color=guide_legend(reverse=T)) +
 theme_minimal() +
 theme(legend.position=c(.1,.85))

## ANOVA (set Subject as random effect)

summary(m1 <- aov(effort ~ Type + Error(Subject), data=dta))


Error: Subject
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  8   66.5    8.31               

Error: Within
          Df Sum Sq Mean Sq F value  Pr(>F)
Type       3   81.2   27.06    22.4 3.9e-07
Residuals 24   29.1    1.21

The results revealed that Type yielded a significant effect (F = 22.4, p < .001) on effort.

3 Homework 3:

3.1 Load data file

pacman::p_load(car, tidyverse, lme4, GGally)

# input data
dta <- read.table("Data/thetaEEG.txt", header=T)

# data structure
str(dta)

'data.frame':   19 obs. of  10 variables:
 $ ID  : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Ch3 : num  -3.54 5.72 0.52 0 2.07 1.67 9.13 -0.43 -0.56 1.28 ...
 $ Ch4 : num  -3.11 5.07 -0.18 0.74 0.76 ...
 $ Ch5 : num  -0.24 6.87 0.9 1.1 3.51 2.77 3.44 -0.31 -1.22 1.89 ...
 $ Ch6 : num  0.42 5.96 0.6 0.13 0.6 4.55 4.8 -0.61 0.67 1.77 ...
 $ Ch7 : num  -0.49 8.2 1.27 0.19 3.71 1.8 0.48 -1.04 -0.97 1.83 ...
 $ Ch8 : num  2.13 4.87 1.28 0.07 1.86 4.79 1.63 -0.13 -0.98 0.91 ...
 $ Ch17: num  -4.15 5.48 -0.95 0.8 1.49 4.51 9.94 -0.61 0 1.4 ...
 $ Ch18: num  2.87 5.57 1.74 0.25 3.11 3.24 1.34 -0.61 -1.22 1.1 ...
 $ Ch19: num  1.34 6.33 0.79 -0.66 1.8 3.99 1.53 -0.43 -0.91 -0.12 ...

head(dta)

  ID   Ch3   Ch4   Ch5  Ch6   Ch7  Ch8  Ch17 Ch18  Ch19
1  1 -3.54 -3.11 -0.24 0.42 -0.49 2.13 -4.15 2.87  1.34
2  2  5.72  5.07  6.87 5.96  8.20 4.87  5.48 5.57  6.33
3  3  0.52 -0.18  0.90 0.60  1.27 1.28 -0.95 1.74  0.79
4  4  0.00  0.74  1.10 0.13  0.19 0.07  0.80 0.25 -0.66
5  5  2.07  0.76  3.51 0.60  3.71 1.86  1.49 3.11  1.80
6  6  1.67  4.18  2.77 4.55  1.80 4.79  4.51 3.24  3.99

3.2 multivariate approach: design

eeg <- as.factor(colnames(dta[, -1]))
eeg

[1] Ch3  Ch4  Ch5  Ch6  Ch7  Ch8  Ch17 Ch18 Ch19
Levels: Ch17 Ch18 Ch19 Ch3 Ch4 Ch5 Ch6 Ch7 Ch8

3.3 linear model

m0 <- lm(as.matrix(dta[, -1]) ~ 1)

m0_aov <- Anova(m0, idata = data.frame(eeg), idesign = ~ eeg)

summary(m0_aov)


Type III Repeated Measures MANOVA Tests:

------------------------------------------
 
Term: (Intercept) 

 Response transformation matrix:
     (Intercept)
Ch3            1
Ch4            1
Ch5            1
Ch6            1
Ch7            1
Ch8            1
Ch17           1
Ch18           1
Ch19           1

Sum of squares and products for the hypothesis:
            (Intercept)
(Intercept)      1761.6

Multivariate Tests: (Intercept)
                 Df test stat approx F num Df den Df  Pr(>F)
Pillai            1   0.20359   4.6014      1     18 0.04584
Wilks             1   0.79641   4.6014      1     18 0.04584
Hotelling-Lawley  1   0.25564   4.6014      1     18 0.04584
Roy               1   0.25564   4.6014      1     18 0.04584

------------------------------------------
 
Term: eeg 

 Response transformation matrix:
     eeg1 eeg2 eeg3 eeg4 eeg5 eeg6 eeg7 eeg8
Ch3     0    0    0    1    0    0    0    0
Ch4     0    0    0    0    1    0    0    0
Ch5     0    0    0    0    0    1    0    0
Ch6     0    0    0    0    0    0    1    0
Ch7     0    0    0    0    0    0    0    1
Ch8    -1   -1   -1   -1   -1   -1   -1   -1
Ch17    1    0    0    0    0    0    0    0
Ch18    0    1    0    0    0    0    0    0
Ch19    0    0    1    0    0    0    0    0

Sum of squares and products for the hypothesis:
        eeg1        eeg2       eeg3       eeg4      eeg5       eeg6       eeg7
eeg1  5.7475  1.6500e-02  1.4850000  0.5060000  7.700000  1.9305000  3.0635000
eeg2  0.0165  4.7368e-05  0.0042632  0.0014526  0.022105  0.0055421  0.0087947
eeg3  1.4850  4.2632e-03  0.3836842  0.1307368  1.989474  0.4987895  0.7915263
eeg4  0.5060  1.4526e-03  0.1307368  0.0445474  0.677895  0.1699579  0.2697053
eeg5  7.7000  2.2105e-02  1.9894737  0.6778947 10.315789  2.5863158  4.1042105
eeg6  1.9305  5.5421e-03  0.4987895  0.1699579  2.586316  0.6484263  1.0289842
eeg7  3.0635  8.7947e-03  0.7915263  0.2697053  4.104211  1.0289842  1.6328895
eeg8 -0.0165 -4.7368e-05 -0.0042632 -0.0014526 -0.022105 -0.0055421 -0.0087947
            eeg8
eeg1 -1.6500e-02
eeg2 -4.7368e-05
eeg3 -4.2632e-03
eeg4 -1.4526e-03
eeg5 -2.2105e-02
eeg6 -5.5421e-03
eeg7 -8.7947e-03
eeg8  4.7368e-05

Multivariate Tests: eeg
                 Df test stat approx F num Df den Df Pr(>F)
Pillai            1   0.52437   1.5159      8     11 0.2559
Wilks             1   0.47563   1.5159      8     11 0.2559
Hotelling-Lawley  1   1.10246   1.5159      8     11 0.2559
Roy               1   1.10246   1.5159      8     11 0.2559

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

            Sum Sq num Df Error SS den Df F value Pr(>F)
(Intercept)  195.7      1      766     18    4.60  0.046
eeg           10.7      8      389    144    0.49  0.858


Mauchly Tests for Sphericity

    Test statistic  p-value
eeg       0.000374 9.74e-11


Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

    GG eps Pr(>F[GG])
eeg  0.315       0.66

     HF eps Pr(>F[HF])
eeg 0.37092     0.6854

The results revealed by MANOVA yielded a significant effect (I think it is the subject effect? It means that there are individual difference in theta power.)

3.4 wide format to long format

dtaL <- dta %>% 
        gather(key = Channel, value = Theta, 2:10) %>% 
        mutate( Channel = factor(Channel))

#
str(dtaL)

'data.frame':   171 obs. of  3 variables:
 $ ID     : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Channel: Factor w/ 9 levels "Ch17","Ch18",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ Theta  : num  -3.54 5.72 0.52 0 2.07 1.67 9.13 -0.43 -0.56 1.28 ...

3.5 boxplot

# set theme to black and white and save parameters
ot <- theme_set(theme_bw())

ggplot(dtaL, aes(reorder(Channel, Theta, median), Theta)) +
 geom_boxplot() +
 geom_point(alpha = .3) +
 geom_hline(yintercept = mean(dtaL$Theta)) +
 labs(x ="Channel Number", y = "Theta (Power)")

3.6 Univariate approach

summary(m1 <- aov(Theta ~ Channel + Error(ID/Channel), data = dtaL))


Error: ID
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  1    112     112               

Error: ID:Channel
        Df Sum Sq Mean Sq
Channel  8   23.1    2.89

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
Channel     8     10    1.23    0.18   0.99
Residuals 153   1020    6.67

The results revealed by Univariate ANOVA showed that the channels did not reach signifiant difference on theta power.

3.7 Visualization

# Theta Power over Channel with CIs
ggplot(dtaL, aes(reorder(Channel, Theta, median), Theta)) +
 stat_summary(fun.data = "mean_cl_boot", geom = "pointrange", col = "firebrick") + 
 geom_hline(yintercept = mean(dtaL$Theta), linetype = "dashed") +
 geom_point(alpha = .3, size = rel(.7)) +
 labs(x = "Channel Number", y = "Theta (Power)")

3.8 Patients’ theta power with channels colors

ggplot(dtaL, aes(reorder(ID, Theta, mean), Theta, color = Channel)) +
 geom_point() +
 geom_hline(yintercept = mean(dtaL$Theta), linetype = "dashed") + 
 coord_flip() +
 labs(y = "Theta (Power)", x = "Subject ID") +
 theme(legend.position = c(.9, .2))

3.9 Mixed-effect approach

summary(m2 <- lmer(Theta ~ Channel + (1 | ID), data = dtaL))

Linear mixed model fit by REML ['lmerMod']
Formula: Theta ~ Channel + (1 | ID)
   Data: dtaL

REML criterion at convergence: 697

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-3.650 -0.388  0.000  0.422  4.639 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 4.43     2.10    
 Residual             2.70     1.64    
Number of obs: 171, groups:  ID, 19

Fixed effects:
            Estimate Std. Error t value
(Intercept)    1.403      0.613    2.29
ChannelCh18   -0.548      0.533   -1.03
ChannelCh19   -0.408      0.533   -0.76
ChannelCh3    -0.502      0.533   -0.94
ChannelCh4     0.187      0.533    0.35
ChannelCh5    -0.365      0.533   -0.68
ChannelCh6    -0.257      0.533   -0.48
ChannelCh7    -0.552      0.533   -1.03
ChannelCh8    -0.550      0.533   -1.03

Correlation of Fixed Effects:
            (Intr) ChnC18 ChnC19 ChnnC3 ChnnC4 ChnnC5 ChnnC6 ChnnC7
ChannelCh18 -0.435                                                 
ChannelCh19 -0.435  0.500                                          
ChannelCh3  -0.435  0.500  0.500                                   
ChannelCh4  -0.435  0.500  0.500  0.500                            
ChannelCh5  -0.435  0.500  0.500  0.500  0.500                     
ChannelCh6  -0.435  0.500  0.500  0.500  0.500  0.500              
ChannelCh7  -0.435  0.500  0.500  0.500  0.500  0.500  0.500       
ChannelCh8  -0.435  0.500  0.500  0.500  0.500  0.500  0.500  0.500

3.10 Show CIs for each channels

confint(m2)

               2.5 %  97.5 %
.sig01       1.49441 2.97200
.sigma       1.43604 1.79864
(Intercept)  0.20522 2.60004
ChannelCh18 -1.57256 0.47572
ChannelCh19 -1.43204 0.61625
ChannelCh3  -1.52572 0.52256
ChannelCh4  -0.83730 1.21098
ChannelCh5  -1.38940 0.65888
ChannelCh6  -1.28098 0.76730
ChannelCh7  -1.57572 0.47256
ChannelCh8  -1.57414 0.47414

3.11 Show random effect

ranef(m2)

$ID
   (Intercept)
1    -1.498201
2     4.624063
3    -0.380711
4    -0.729276
5     0.965688
6     2.275669
7     3.702186
8    -1.601210
9    -1.446177
10    0.140576
11    1.246621
12    0.115604
13   -2.645866
14   -1.248483
15   -1.247442
16    2.160174
17   -0.099778
18   -2.341001
19   -1.992436

with conditional variances for "ID"

3.12 Residual plot

plot(m2, form = resid(., type = "pearson") ~ fitted(.) | Channel, 
     abline = 0, lty = 2, layout = c(4,1), pch = 20, 
     xlab = "Fitted values", ylab = "Residuals")

3.13 Variance Theta Power by Channel

dtaL <- dtaL %>% 
    group_by(Channel) %>% 
    mutate( var_eeg = var(Theta)) %>% 
    as.data.frame
head(dtaL)

  ID Channel Theta var_eeg
1  1     Ch3 -3.54  8.6459
2  2     Ch3  5.72  8.6459
3  3     Ch3  0.52  8.6459
4  4     Ch3  0.00  8.6459
5  5     Ch3  2.07  8.6459
6  6     Ch3  1.67  8.6459

3.14 Weighted fit (Subject as random effect)

summary(m3 <- lmer(Theta ~ Channel + (1 | ID), data = dtaL,
                   weights = 1/var_eeg))

Linear mixed model fit by REML ['lmerMod']
Formula: Theta ~ Channel + (1 | ID)
   Data: dtaL
Weights: 1/var_eeg

REML criterion at convergence: 683.9

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-3.861 -0.416 -0.021  0.430  4.029 

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept) 3.983    1.996   
 Residual             0.371    0.609   
Number of obs: 171, groups:  ID, 19

Fixed effects:
            Estimate Std. Error t value
(Intercept)    1.403      0.648    2.17
ChannelCh18   -0.548      0.585   -0.94
ChannelCh19   -0.408      0.564   -0.72
ChannelCh3    -0.502      0.615   -0.82
ChannelCh4     0.187      0.672    0.28
ChannelCh5    -0.365      0.564   -0.65
ChannelCh6    -0.257      0.556   -0.46
ChannelCh7    -0.552      0.558   -0.99
ChannelCh8    -0.550      0.542   -1.02

Correlation of Fixed Effects:
            (Intr) ChnC18 ChnC19 ChnnC3 ChnnC4 ChnnC5 ChnnC6 ChnnC7
ChannelCh18 -0.554                                                 
ChannelCh19 -0.574  0.636                                          
ChannelCh3  -0.526  0.583  0.604                                   
ChannelCh4  -0.482  0.534  0.553  0.508                            
ChannelCh5  -0.574  0.636  0.659  0.604  0.554                     
ChannelCh6  -0.583  0.645  0.669  0.613  0.562  0.669              
ChannelCh7  -0.581  0.643  0.667  0.611  0.560  0.667  0.677       
ChannelCh8  -0.598  0.662  0.686  0.629  0.577  0.686  0.697  0.694

3.15 Residual plot

plot(m3, form = resid(., type = "pearson") ~ fitted(.) | Channel, 
     abline = 0, lty = 2, layout = c(4,1), pch = 20,
     xlab = "Fitted values", ylab = "Residuals")

In my opinion, I need to do some model comparisons to see which model is the best fitting, but the example code can’t run. I still looking for some procedure to do it.

4 Homework 4:

Conduct an analysis of the cognitive task example and commment on the results produced by each code chunk.

4.1 Load data file

dta <- read.table("data/cognitive_task.txt", h=T)
head(dta)

   ID Score Method Class Klass
1 S01     3     I1    C1    K1
2 S02     6     I1    C1    K1
3 S03     3     I1    C1    K1
4 S04     3     I1    C1    K1
5 S05     1     I1    C2    K2
6 S06     2     I1    C2    K2

4.2 Visualization

VCA::varPlot(Score ~ Method/Class/ID, 
             Data=dta,
             YLabel=list(text="Score", side=2, cex=1),
             MeanLine=list(var=c("Method", "Class"),
                           col=c("darkred", "salmon"), lwd=c(1, 2)))

4.3 Compute mean and SD for each mehod

dta_sum <- dta %>%
  group_by(Method) %>%
  summarise(m_Score = mean(Score), sd_Score = sd(Score))
dta_sum

# A tibble: 2 x 3
  Method m_Score sd_Score
  <chr>    <dbl>    <dbl>
1 I1        3.5      1.63
2 I2        7.25     2.35

4.4 Compute mean and SD for each level

dta_sum1 <- dta %>%
  group_by(Method, Class) %>%
  summarise(m_Score = mean(Score), sd_Score = sd(Score))
dta_sum1

# A tibble: 8 x 4
# Groups:   Method [2]
  Method Class m_Score sd_Score
  <chr>  <chr>   <dbl>    <dbl>
1 I1     C1       3.75    1.5  
2 I1     C2       1.75    0.5  
3 I1     C3       5.5     0.577
4 I1     C4       3       0.816
5 I2     C5       7       0.816
6 I2     C6       4       0.816
7 I2     C7       8       0.816
8 I2     C8      10       0.816

4.5 anova (Class as random effect)

m0 <- aov(Score ~ Method + Error(Class), data=dta)

#
summary(m0)


Error: Class
          Df Sum Sq Mean Sq F value Pr(>F)
Method     1    112   112.5    6.46  0.044
Residuals  6    104    17.4               

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24   18.5   0.771

The results showed that Method yielded a significant effect on scores.

4.6

m01 <- aov(Score ~ Method + Error(Method/Klass), data=dta)

#
summary(m01)


Error: Method
       Df Sum Sq Mean Sq
Method  1    112     112

Error: Method:Klass
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  6    104    17.4               

Error: Within
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals 24   18.5   0.771

4.7 F Value for Method in m01

112/17.4

[1] 6.4368

4.8 ANOVA (set Class as random effect)

m1 <- lme4::lmer(Score ~ Method + (1 | Class), data=dta)
m1

Linear mixed model fit by REML ['lmerMod']
Formula: Score ~ Method + (1 | Class)
   Data: dta
REML criterion at convergence: 101.58
Random effects:
 Groups   Name        Std.Dev.
 Class    (Intercept) 2.040   
 Residual             0.878   
Number of obs: 32, groups:  Class, 8
Fixed Effects:
(Intercept)     MethodI2  
       3.50         3.75

m11 <- lme4::lmer(Score ~ Method + (1 | Method:Klass), data=dta)
m11

Linear mixed model fit by REML ['lmerMod']
Formula: Score ~ Method + (1 | Method:Klass)
   Data: dta
REML criterion at convergence: 101.58
Random effects:
 Groups       Name        Std.Dev.
 Method:Klass (Intercept) 2.040   
 Residual                 0.878   
Number of obs: 32, groups:  Method:Klass, 8
Fixed Effects:
(Intercept)     MethodI2  
       3.50         3.75

4.9 Show CIs

confint(m1, method="boot")

              2.5 % 97.5 %
.sig01      0.82428 3.4025
.sigma      0.65193 1.1422
(Intercept) 1.52570 5.6800
MethodI2    0.92168 6.7614

Analysis of Variance: Homework exercises

Hao-Lun Fu

2020-10-05