pacman::p_load(tidyverse, lme4)
dta <- read.table("finger_ridges.txt", header = T)
str(dta)
## 'data.frame': 12 obs. of 3 variables:
## $ Pair : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Twin_1: int 71 79 105 115 76 83 114 57 114 94 ...
## $ Twin_2: int 71 82 99 114 70 82 113 44 113 91 ...
dta$Pair <- factor(dta$Pair)
將Pair變項屬性改為factor
head(dta)
## Pair Twin_1 Twin_2
## 1 1 71 71
## 2 2 79 82
## 3 3 105 99
## 4 4 115 114
## 5 5 76 70
## 6 6 83 82
cor(c(dta$Twin_1, dta$Twin_2), c(dta$Twin_2, dta$Twin_1))
## [1] 0.9625626
Twin_1和Twin_2的相關性為0.96
dtaL <- dta %>%
gather(key = Twin, value = Ridges, Twin_1, Twin_2) %>%
mutate( Twin = as.factor(Twin))
將資料型態從wide轉為long,並把Twin的資料屬性改為factor
dtaL %>% group_by(Pair) %>%
summarize( ridge_ave = mean(Ridges), ridge_sd = sd(Ridges)) %>%
arrange( desc(ridge_ave))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 12 x 3
## Pair ridge_ave ridge_sd
## <fct> <dbl> <dbl>
## 1 4 114. 0.707
## 2 7 114. 0.707
## 3 9 114. 0.707
## 4 3 102 4.24
## 5 10 92.5 2.12
## 6 6 82.5 0.707
## 7 2 80.5 2.12
## 8 11 79 5.66
## 9 12 74 2.83
## 10 5 73 4.24
## 11 1 71 0
## 12 8 50.5 9.19
計算每一pair的平均數與標準差並用arrange( desc(ridge_ave)) 針對ridge_ave降冪排序
ggplot(dtaL, aes(reorder(Pair, Ridges, mean), Ridges, color = Pair)) +
geom_point() +
geom_hline(yintercept = mean(dtaL$Ridges), linetype = "dashed") +
labs(y = "Number of finger ridges", x = "Twin ID") +
theme_bw() +
theme(legend.position = "NONE")
data visualization:reorder x=pair by y=Ridges mean
summary(m0 <- aov(Ridges ~ Error(Pair), data = dtaL))
##
## Error: Pair
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 11 8990 817.3
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 12 171.5 14.29
Pair間的殘差變異量為8990,Pair內的殘差變異量為171.5
summary(m1 <- lmer(Ridges ~ 1 + (1 | Pair), data = dtaL))
## Linear mixed model fit by REML ['lmerMod']
## Formula: Ridges ~ 1 + (1 | Pair)
## Data: dtaL
##
## REML criterion at convergence: 174.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.88917 -0.38618 -0.00834 0.37963 1.54959
##
## Random effects:
## Groups Name Variance Std.Dev.
## Pair (Intercept) 401.51 20.04
## Residual 14.29 3.78
## Number of obs: 24, groups: Pair, 12
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 87.208 5.836 14.94
考慮不同pair間的差異,Ridges的截距為87.208
Pair間的隨機效果(variance=401.51)比Pair內的隨機效果(variance=14.29)大,因此確實需要考量不同Pair間的隨機效果
ranef(m1)
## $Pair
## (Intercept)
## 1 -15.924912
## 2 -6.591031
## 3 14.533018
## 4 26.814441
## 5 -13.959885
## 6 -4.626003
## 7 25.831927
## 8 -36.066447
## 9 25.831927
## 10 5.199136
## 11 -8.064801
## 12 -12.977371
##
## with conditional variances for "Pair"
分別計算12個pair的隨機效果
Estimated Var(twin) = (817.3 - 14.29) /2 = 401.51
The estimate of intraclass correlation = 401.51/(401.51+14.29) = 0.966
1.計算Twin_1和Twin_2的Twin_2的相關,為什麼不能直接寫cor (dta\(Twin_1, dta\)Twin_2)?雖計算出來的結果相似,但與cor(c(dta\(Twin_1, dta\)Twin_2), c(dta\(Twin_2, dta\)Twin_1))不曉得是否有什麼差異?
2.對於error(v)和用lmer做出來的報表解釋,不是很確定是否正確?