0.1 load package

pacman::p_load(tidyverse, lme4)

0.2 input data

dta <- read.table("finger_ridges.txt", header = T)
str(dta)
## 'data.frame':    12 obs. of  3 variables:
##  $ Pair  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Twin_1: int  71 79 105 115 76 83 114 57 114 94 ...
##  $ Twin_2: int  71 82 99 114 70 82 113 44 113 91 ...

0.3 coerce pair variable to factor type

dta$Pair <- factor(dta$Pair)

將Pair變項屬性改為factor

0.4 show first 6 lines

head(dta)
##   Pair Twin_1 Twin_2
## 1    1     71     71
## 2    2     79     82
## 3    3    105     99
## 4    4    115    114
## 5    5     76     70
## 6    6     83     82

0.5 historical idea of intraclass correlation

cor(c(dta$Twin_1, dta$Twin_2), c(dta$Twin_2, dta$Twin_1))
## [1] 0.9625626

Twin_1和Twin_2的相關性為0.96

0.6 turn data from wide format to long format

dtaL <- dta %>% 
        gather(key = Twin, value = Ridges, Twin_1, Twin_2) %>%
        mutate( Twin = as.factor(Twin)) 

將資料型態從wide轉為long,並把Twin的資料屬性改為factor

0.7 show mean and sd of ridges by twin pair

dtaL %>% group_by(Pair) %>%
         summarize( ridge_ave = mean(Ridges), ridge_sd = sd(Ridges)) %>%
         arrange( desc(ridge_ave)) 
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 12 x 3
##    Pair  ridge_ave ridge_sd
##    <fct>     <dbl>    <dbl>
##  1 4         114.     0.707
##  2 7         114.     0.707
##  3 9         114.     0.707
##  4 3         102      4.24 
##  5 10         92.5    2.12 
##  6 6          82.5    0.707
##  7 2          80.5    2.12 
##  8 11         79      5.66 
##  9 12         74      2.83 
## 10 5          73      4.24 
## 11 1          71      0    
## 12 8          50.5    9.19

計算每一pair的平均數與標準差並用arrange( desc(ridge_ave)) 針對ridge_ave降冪排序

0.8 dot plot

ggplot(dtaL, aes(reorder(Pair, Ridges, mean), Ridges, color = Pair)) + 
 geom_point() + 
 geom_hline(yintercept = mean(dtaL$Ridges), linetype = "dashed") +
 labs(y = "Number of finger ridges", x = "Twin ID") +
 theme_bw() +
 theme(legend.position = "NONE")

data visualization:reorder x=pair by y=Ridges mean

0.9 random effect anova for balanced designs

summary(m0 <- aov(Ridges ~ Error(Pair), data = dtaL))
## 
## Error: Pair
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 11   8990   817.3               
## 
## Error: Within
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 12  171.5   14.29

Pair間的殘差變異量為8990,Pair內的殘差變異量為171.5

0.10 random effects anova

summary(m1 <- lmer(Ridges ~ 1 + (1 | Pair), data = dtaL))
## Linear mixed model fit by REML ['lmerMod']
## Formula: Ridges ~ 1 + (1 | Pair)
##    Data: dtaL
## 
## REML criterion at convergence: 174.1
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.88917 -0.38618 -0.00834  0.37963  1.54959 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  Pair     (Intercept) 401.51   20.04   
##  Residual              14.29    3.78   
## Number of obs: 24, groups:  Pair, 12
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   87.208      5.836   14.94

考慮不同pair間的差異,Ridges的截距為87.208
Pair間的隨機效果(variance=401.51)比Pair內的隨機效果(variance=14.29)大,因此確實需要考量不同Pair間的隨機效果

ranef(m1)
## $Pair
##    (Intercept)
## 1   -15.924912
## 2    -6.591031
## 3    14.533018
## 4    26.814441
## 5   -13.959885
## 6    -4.626003
## 7    25.831927
## 8   -36.066447
## 9    25.831927
## 10    5.199136
## 11   -8.064801
## 12  -12.977371
## 
## with conditional variances for "Pair"

分別計算12個pair的隨機效果

0.11 The end

  1. Estimated Var(twin) = (817.3 - 14.29) /2 = 401.51

  2. The estimate of intraclass correlation = 401.51/(401.51+14.29) = 0.966

0.12 Question

1.計算Twin_1和Twin_2的Twin_2的相關,為什麼不能直接寫cor (dta\(Twin_1, dta\)Twin_2)?雖計算出來的結果相似,但與cor(c(dta\(Twin_1, dta\)Twin_2), c(dta\(Twin_2, dta\)Twin_1))不曉得是否有什麼差異?

2.對於error(v)和用lmer做出來的報表解釋,不是很確定是否正確?