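The statistical foundation of analysis of variance, proposed by R. A. Fisher: split the total variation into the part produced by the factor under study and the part due to sampling error. Statistical inference is then drawn by comparing the variation from these two sources.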
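Total variation = variation due to the study factor (between-group variation) + sampling error (within-group variation), i.e. SS(total) = SS(between) + SS(within).
F = MS(between) / MS(within). Each mean square is a sum of squares divided by its degrees of freedom: MS(between) = SS(between)/(k - 1) and MS(within) = SS(within)/(N - k), for k groups and N observations in total.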
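To make the decomposition concrete, here is a minimal sketch that recomputes it by hand for the cholesterol data analysed below. The variable names (ss_total, ss_between, ms_within, and so on) are illustrative only. The sums of squares and F should agree with the summary(aov_response) table later in this article (1351.4, 468.8, F = 32.43).

```r
# Sketch: one-way ANOVA variance decomposition computed by hand
library(multcomp)                      # provides the cholesterol data set
data("cholesterol", package = "multcomp")

y <- cholesterol$response
g <- cholesterol$trt                   # treatment factor with k levels

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)      # mean of each treatment group
n_i         <- tapply(y, g, length)    # size of each group

ss_total   <- sum((y - grand_mean)^2)                  # total variation
ss_between <- sum(n_i * (group_means - grand_mean)^2)  # between-group (treatment) variation
ss_within  <- sum((y - ave(y, g))^2)                   # within-group variation (sampling error)

k <- nlevels(g)
N <- length(y)
ms_between <- ss_between / (k - 1)     # mean square = SS / df
ms_within  <- ss_within / (N - k)
f_value    <- ms_between / ms_within
p_value    <- pf(f_value, k - 1, N - k, lower.tail = FALSE)  # upper-tail P from F(k-1, N-k)

round(c(SS_total = ss_total, SS_between = ss_between,
        SS_within = ss_within, F = f_value), 2)
p_value
```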
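In theory, if the treatment factor has no effect, F should be close to 1. The further F rises above 1, the stronger the evidence that the treatment has an effect. Sampling error, however, is always present and cannot be avoided (it is related to how the sample is drawn). As a result, F varies from sample to sample and is rarely exactly 1 even when the factor has no effect. Whether the study factor really affects the outcome must therefore be decided with a statistical test: the observed F is compared with the F distribution to obtain a P value.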
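The point that F fluctuates around 1 purely because of sampling error can be illustrated with a small simulation. This is a hypothetical sketch, not from the original article. It draws 5 groups of 10 observations (the same layout as the cholesterol data) from one normal distribution, so the factor truly has no effect. The seed and the 2000 replications are arbitrary choices.

```r
# Hypothetical simulation: distribution of F when the null hypothesis is true
set.seed(2024)
k <- 5; n <- 10
g <- gl(k, n)                          # factor with k levels, n observations each

f_null <- replicate(2000, {
  y <- rnorm(k * n)                    # pure sampling error, no group effect
  summary(aov(y ~ g))[[1]][["F value"]][1]
})

mean(f_null)                           # near 1 (exact expectation (N-k)/(N-k-2) = 45/43)
mean(f_null > 1)                       # F still exceeds 1 in a sizeable share of samples
crit <- qf(0.95, k - 1, k * n - k)     # 5% critical value of F(4, 45)
mean(f_null > crit)                    # about 0.05: the false-positive rate the test controls
```

Because F > 1 happens often under the null, the decision rests on the critical value (or P value), not on F > 1 alone.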
[Note] ANOVA is used when the factor has three or more levels. When there are only two levels, a t test can be used instead.
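The two-level case connects directly to the t test: with two groups, the one-way ANOVA F equals the square of the pooled-variance (equal-variance) t statistic, and the two P values coincide. This sketch checks that on two of the cholesterol groups. The choice of "1time" and "2times" is arbitrary.

```r
library(multcomp)                      # provides the cholesterol data set
data("cholesterol", package = "multcomp")

# keep only two treatment groups and drop the now-empty factor levels
two <- droplevels(subset(cholesterol, trt %in% c("1time", "2times")))

tt  <- t.test(response ~ trt, data = two, var.equal = TRUE)  # pooled-variance t test
tab <- summary(aov(response ~ trt, data = two))[[1]]         # one-way ANOVA table

c(t_squared = unname(tt$statistic)^2, F = tab[["F value"]][1])
c(p_t = tt$p.value, p_F = tab[["Pr(>F)"]][1])
```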
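Conditions for one-way ANOVA:
(1) observations are independent and randomly sampled;
(2) the dependent variable is normally distributed within each group;
(3) the group variances are equal (homogeneity of variance).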
library(multcomp)  # provides the cholesterol data set used below
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
##
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
##
## geyser
data('cholesterol')  # columns: trt (treatment group, 5 levels) and response
head(cholesterol)
## trt response
## 1 1time 3.8612
## 2 1time 10.3868
## 3 1time 5.9059
## 4 1time 3.0609
## 5 1time 7.7204
## 6 1time 2.7139
shapiro.test(cholesterol$response)  # Shapiro-Wilk normality test on all responses pooled across groups
##
## Shapiro-Wilk normality test
##
## data: cholesterol$response
## W = 0.97722, p-value = 0.4417
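The test above pools all 50 responses. The normality assumption, however, refers to the outcome within each group, or equivalently to the residuals (each value minus its group mean), and pooling groups with different means can blur the check. A minimal sketch of both alternatives, assuming the same cholesterol data; with only 10 observations per group, the per-group tests have little power.

```r
library(multcomp)                      # provides the cholesterol data set
data("cholesterol", package = "multcomp")

# Shapiro-Wilk P value within each treatment group
p_by_group <- tapply(cholesterol$response, cholesterol$trt,
                     function(x) shapiro.test(x)$p.value)
round(p_by_group, 3)

# Shapiro-Wilk test on the residuals of the one-way model
fit <- aov(response ~ trt, data = cholesterol)
shapiro.test(residuals(fit))
```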
bartlett.test(response~trt,data=cholesterol)  # Bartlett test: are the 5 group variances equal?
##
## Bartlett test of homogeneity of variances
##
## data: response by trt
## Bartlett's K-squared = 0.57975, df = 4, p-value = 0.9653
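Bartlett's test is itself sensitive to departures from normality. As a cross-check, base R's fligner.test() is a rank-based test of equal variances that is less affected by non-normality. If variances did turn out to be unequal, oneway.test(var.equal = FALSE) (Welch's one-way ANOVA) is one common fallback. This is a sketch of both calls on the same data, not part of the original analysis.

```r
library(multcomp)                      # provides the cholesterol data set
data("cholesterol", package = "multcomp")

# Fligner-Killeen test of homogeneity of variances (rank based)
fl <- fligner.test(response ~ trt, data = cholesterol)
fl

# Welch's one-way ANOVA: does not assume equal group variances
welch <- oneway.test(response ~ trt, data = cholesterol, var.equal = FALSE)
welch
```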
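Both assumption checks are non-significant (Shapiro-Wilk P = 0.4417, Bartlett P = 0.9653, both > 0.05). Normality and equal variances are therefore not rejected, and the one-way ANOVA can be fitted.
attach(cholesterol)  # make trt and response available by name
aov_response <- aov(response~trt)  # one-way ANOVA: response by treatment group
summary(aov_response)  # ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)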
## Df Sum Sq Mean Sq F value Pr(>F)
## trt 4 1351.4 337.8 32.43 9.82e-13 ***
## Residuals 45 468.8 10.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
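The F test is significant (F = 32.43 with 4 and 45 degrees of freedom, P = 9.82e-13 < 0.01), so the treatment group means are not all equal. ANOVA does not show which groups differ, so pairwise comparisons follow, using Tukey's HSD.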
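For reporting, the F value, degrees of freedom and P value can be pulled out of the summary object instead of being copied from the printed table. This is a minimal sketch; the report format in sprintf() is an arbitrary choice. It also confirms that oneway.test(var.equal = TRUE) gives the same F as aov().

```r
library(multcomp)                      # provides the cholesterol data set
data("cholesterol", package = "multcomp")
fit <- aov(response ~ trt, data = cholesterol)

tab     <- summary(fit)[[1]]           # data frame behind the printed ANOVA table
f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]
df1     <- as.integer(tab[["Df"]][1])  # between-group df
df2     <- as.integer(tab[["Df"]][2])  # residual (within-group) df

sprintf("F(%d, %d) = %.2f, P = %.3g", df1, df2, f_value, p_value)

# classic (equal-variance) one-way test: same F statistic as aov()
ow <- oneway.test(response ~ trt, data = cholesterol, var.equal = TRUE)
unname(ow$statistic)
```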
TukeyHSD(aov_response)  # Tukey HSD: all pairwise differences with family-wise 95% intervals
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = response ~ trt)
##
## $trt
## diff lwr upr p adj
## 2times-1time 3.44300 -0.6582817 7.544282 0.1380949
## 4times-1time 6.59281 2.4915283 10.694092 0.0003542
## drugD-1time 9.57920 5.4779183 13.680482 0.0000003
## drugE-1time 15.16555 11.0642683 19.266832 0.0000000
## 4times-2times 3.14981 -0.9514717 7.251092 0.2050382
## drugD-2times 6.13620 2.0349183 10.237482 0.0009611
## drugE-2times 11.72255 7.6212683 15.823832 0.0000000
## drugD-4times 2.98639 -1.1148917 7.087672 0.2512446
## drugE-4times 8.57274 4.4714583 12.674022 0.0000037
## drugE-drugD 5.58635 1.4850683 9.687632 0.0030633
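In the table above, a pair differs significantly when its adjusted P value is below 0.05, or equivalently when its interval [lwr, upr] does not contain 0. This sketch filters the pairs programmatically rather than by eye. The 0.05 threshold is the usual convention, not something TukeyHSD() imposes.

```r
library(multcomp)                      # provides the cholesterol data set
data("cholesterol", package = "multcomp")
fit <- aov(response ~ trt, data = cholesterol)

tk  <- TukeyHSD(fit)$trt               # matrix with columns diff, lwr, upr, p adj
sig <- tk[tk[, "p adj"] < 0.05, , drop = FALSE]

sig[order(sig[, "p adj"]), , drop = FALSE]  # significant pairs, smallest adjusted P first
rownames(tk)[tk[, "p adj"] >= 0.05]         # pairs whose difference is not significant
```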
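boxplot(response~trt)  # distribution of response in each treatment group
op <- par(las = 2, mar = c(5, 8, 4, 2))  # perpendicular axis labels; wider left margin for the pair names
plot(TukeyHSD(aov_response))  # Tukey intervals; intervals that do not cross 0 are significant
par(op)  # restore the previous graphics settings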
WeChat official account 医学统计园, article 《单因素方差分析–R语言》 (One-way ANOVA in R)
Video course 《R语言与高级医学统计学》 (R and Advanced Medical Statistics)