1. Basic idea of one-way ANOVA

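The statistical foundation was laid by R. A. Fisher: the total variation is partitioned into the variation produced by the study factor and the variation due to sampling error, and inference is based on comparing the variation that comes from these different sources.

Total variation = variation produced by the study factor (between-group variation) + sampling error (within-group variation)

F = MS(between) / MS(within)

In theory, if the treatment factor has no effect, the between-group and within-group mean squares both estimate the same error variance, so F is expected to be close to 1. An F clearly greater than 1, large enough to exceed the critical value of the F distribution for the corresponding degrees of freedom, suggests that the treatment factor has an effect. Sampling error is always present and cannot be avoided, so the statistical analysis judges whether the study factor affects the outcome beyond what sampling error alone would produce.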
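The decomposition can be checked by hand. The following is a minimal sketch on a small made-up data set (the values and the names `y` and `g` are illustrative assumptions, not part of the cholesterol example). It computes the sums of squares, the mean squares and F directly, then compares F with the value reported by `aov()`.

```r
# Illustrative data: 3 groups of 4 observations (made-up values)
y <- c(4, 5, 6, 5,   7, 8, 6, 7,   10, 9, 11, 10)
g <- factor(rep(c("A", "B", "C"), each = 4))

k <- nlevels(g)                      # number of groups
n <- length(y)                       # total number of observations
grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)    # one mean per group
group_sizes <- tapply(y, g, length)  # one sample size per group

# Sums of squares: total = between + within
ss_total   <- sum((y - grand_mean)^2)
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((y - ave(y, g))^2) # ave() returns each observation's group mean

# Mean squares divide each sum of squares by its degrees of freedom
ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (n - k)

f_manual <- ms_between / ms_within
p_manual <- pf(f_manual, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

# Proportion of total variation explained by the factor (eta squared)
eta_sq <- ss_between / ss_total

# Same F statistic from the built-in fit
f_aov <- summary(aov(y ~ g))[[1]][["F value"]][1]

print(c(F_manual = f_manual, F_aov = f_aov, p = p_manual, eta_sq = eta_sq))
```

Dividing the between-group sum of squares by the total gives a simple effect size: the share of the total variation attributable to the factor.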

[Note] ANOVA is used when the factor has several levels (three or more). With only two levels, a t-test can be used instead.
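For two groups the pooled-variance t-test and one-way ANOVA are the same test: F equals t squared and the p-values coincide. With three or more groups, running every pairwise t-test inflates the overall type I error rate, which is why a single F test is preferred. The sketch below illustrates the two-group equivalence on made-up data (the values and variable names are assumptions for illustration only).

```r
# Illustrative two-group data (made-up values)
y <- c(5.1, 4.8, 6.0, 5.5, 5.2,   6.9, 7.4, 6.5, 7.1, 6.8)
g <- factor(rep(c("control", "treated"), each = 5))

# Student's t-test with pooled variance (var.equal = TRUE)
t_res <- t.test(y ~ g, var.equal = TRUE)

# One-way ANOVA on the same data
a_tab <- summary(aov(y ~ g))[[1]]

t_stat <- unname(t_res$statistic)
f_stat <- a_tab[["F value"]][1]
p_t    <- t_res$p.value
p_f    <- a_tab[["Pr(>F)"]][1]

print(c(t_squared = t_stat^2, F = f_stat))
print(c(p_ttest = p_t, p_anova = p_f))
```

Note that R's default `t.test()` uses the Welch correction (`var.equal = FALSE`), which does not reproduce the ANOVA F exactly.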

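Conditions for applying one-way ANOVA:

(1) The observations are sampled independently and at random

(2) The dependent variable is normally distributed within each group

(3) Homogeneity of variance: the groups share the same population variance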
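Condition (1) follows from the study design and cannot be verified from the data alone, but condition (3) can be examined, and there are standard fallbacks when a condition looks doubtful. The sketch below uses the `PlantGrowth` data set that ships with R purely as a stand-in (it is not the data analysed later). The choice of tests is one reasonable option among several, not a fixed recipe.

```r
# PlantGrowth ships with R: 'weight' measured in three levels of 'group'

# Homogeneity of variance: Bartlett's test assumes normal data;
# the Fligner-Killeen test is a rank-based alternative that is less
# sensitive to departures from normality
bart <- bartlett.test(weight ~ group, data = PlantGrowth)
flig <- fligner.test(weight ~ group, data = PlantGrowth)

# Fallback if the variances look unequal: Welch's one-way test
welch <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = FALSE)

# Fallback if normality looks doubtful: Kruskal-Wallis rank sum test
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

print(c(bartlett_p = bart$p.value, fligner_p = flig$p.value))
print(c(welch_p = welch$p.value, kruskal_p = kw$p.value))
```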

2. One-way ANOVA: an R example

library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
## 
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
## 
##     geyser
data('cholesterol')
head(cholesterol)
##     trt response
## 1 1time   3.8612
## 2 1time  10.3868
## 3 1time   5.9059
## 4 1time   3.0609
## 5 1time   7.7204
## 6 1time   2.7139
shapiro.test(cholesterol$response)
## 
##  Shapiro-Wilk normality test
## 
## data:  cholesterol$response
## W = 0.97722, p-value = 0.4417
bartlett.test(response~trt,data=cholesterol)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  response by trt
## Bartlett's K-squared = 0.57975, df = 4, p-value = 0.9653
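The Shapiro-Wilk call above is applied to the pooled response, which mixes the five treatment means. Since the normality condition concerns the distribution within groups, a stricter check looks at the model residuals or at each group separately. The following is a sketch of that check, not part of the original analysis; `fit_check` and `group_p` are names introduced here, and no output is shown because these calls were not run in the source.

```r
library(multcomp)      # provides the cholesterol data set
data("cholesterol")

fit_check <- aov(response ~ trt, data = cholesterol)

# Normality of the residuals (each response minus its group mean)
resid_test <- shapiro.test(residuals(fit_check))

# Normality within each treatment group: one p-value per group
group_p <- tapply(cholesterol$response, cholesterol$trt,
                  function(x) shapiro.test(x)$p.value)

print(resid_test)
print(round(group_p, 3))

# Q-Q plot of the residuals as a visual complement to the tests
qqnorm(residuals(fit_check))
qqline(residuals(fit_check))
```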
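aov_response <- aov(response ~ trt, data = cholesterol)  # pass the data frame explicitly instead of attach()
summary(aov_response)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## trt          4 1351.4   337.8   32.43 9.82e-13 ***
## Residuals   45  468.8    10.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The test shows that the group means differ significantly (P = 9.82e-13, i.e. P < 0.01). A significant F test only says that at least one group mean differs from the others; it does not say which ones, so Tukey's HSD is used for the pairwise comparisons.

TukeyHSD(aov_response)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = response ~ trt, data = cholesterol)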
## 
## $trt
##                   diff        lwr       upr     p adj
## 2times-1time   3.44300 -0.6582817  7.544282 0.1380949
## 4times-1time   6.59281  2.4915283 10.694092 0.0003542
## drugD-1time    9.57920  5.4779183 13.680482 0.0000003
## drugE-1time   15.16555 11.0642683 19.266832 0.0000000
## 4times-2times  3.14981 -0.9514717  7.251092 0.2050382
## drugD-2times   6.13620  2.0349183 10.237482 0.0009611
## drugE-2times  11.72255  7.6212683 15.823832 0.0000000
## drugD-4times   2.98639 -1.1148917  7.087672 0.2512446
## drugE-4times   8.57274  4.4714583 12.674022 0.0000037
## drugE-drugD    5.58635  1.4850683  9.687632 0.0030633
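In the table above, three comparisons have intervals that contain zero (2times-1time, 4times-2times and drugD-4times) and are therefore not significant at the 5% level; the remaining seven are. Since `multcomp` is already loaded, the same Tukey-type comparisons can also be obtained through its `glht()` interface. The sketch below is one possible alternative rather than the method used in the source. Its adjusted p-values are computed differently from `TukeyHSD()` and should be close, but not necessarily identical.

```r
library(multcomp)
data("cholesterol")

fit <- aov(response ~ trt, data = cholesterol)

# All pairwise (Tukey-type) contrasts among the levels of trt
tuk <- glht(fit, linfct = mcp(trt = "Tukey"))

print(summary(tuk))   # multiplicity-adjusted p-values
print(confint(tuk))   # simultaneous 95% confidence intervals

# Compact letter display: groups sharing a letter are not
# significantly different at the chosen level
print(cld(tuk, level = 0.05))
```

The compact letter display is convenient for annotating a boxplot, since it summarises the ten comparisons as one label per treatment.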

Visualizing the results

boxplot(response ~ trt, data = cholesterol)  # one box per treatment group

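par(las = 2)               # draw axis labels perpendicular to the axes
par(mar = c(5, 8, 4, 2))   # widen the left margin so the comparison labels fit
plot(TukeyHSD(aov_response))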

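References

Article 《单因素方差分析–R语言》 ("One-Way ANOVA in R"), from the WeChat public account 医学统计园

Video 《R语言与高级医学统计学》 ("R Language and Advanced Medical Statistics")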