Packages for one-way ANOVA in R
#install.packages("compute.es",repos = "http://cran.us.r-project.org");
#install.packages("car",repos = "http://cran.us.r-project.org");
#install.packages ("ggplot2",repos = "http://cran.us.r-project.org");
#install.packages("multcomp",repos = "http://cran.us.r-project.org");
#install.packages("pastecs",repos = "http://cran.us.r-project.org");
#install.packages("WRS2", repos="http://R-Forge.R-project.org")
# Steps to install WRS
# first: install dependent packages
#install.packages(c("MASS", "akima", "robustbase"),repos = "http://cran.us.r-project.org")
# second: install suggested packages
#install.packages(c("cobs", "robust", "mgcv", "scatterplot3d", "quantreg", "rrcov", "lars", "pwr", "trimcluster", "parallel", "mc2d", "psych", "Rfit"),repos = "http://cran.us.r-project.org")
# third: install an additional package which provides some C functions
#install.packages("devtools",repos = "http://cran.us.r-project.org")
library("devtools")
#install_github("mrxiaohe/WRScpp",repos = "http://cran.us.r-project.org")
# fourth: install WRS
#install_github("nicebread/WRS", subdir="pkg",repos = "http://cran.us.r-project.org")
library(compute.es);
library(car);
library(ggplot2);
library(multcomp);
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
##
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
##
## geyser
library(pastecs);
## Loading required package: boot
##
## Attaching package: 'boot'
## The following object is masked from 'package:survival':
##
## aml
## The following object is masked from 'package:car':
##
## logit
library(WRS2);
library(WRS)
##
## Attaching package: 'WRS'
## The following objects are masked from 'package:WRS2':
##
## akp.effect, ancboot, ancova, binband, bwtrim, discANOVA,
## discmcp, discstep, Dqcomhd, lincon, mcp2a, mcp2atm, mcppb20,
## med1way, med2way, medpb2, mest, mestse, msmedse, pairdepb,
## pb2gen, pbad2way, pball, pbcor, Qanova, qcomhd, rmanova,
## rmanovab, rmmcp, rungen, runmbo, runmean, sppba, sppbb, sppbi,
## t1way, t1waybt, t2way, t3way, trimse, tsplit, twocor, twopcor,
## winall, wincor, winmean, winse, yuen, yuen.effect.ci, yuenbt,
## yuend, ZYmediate
## The following object is masked from 'package:MASS':
##
## ltsreg
## The following object is masked from 'package:car':
##
## ellipse
## The following objects are masked from 'package:stats':
##
## cov2cor, ecdf
## The following object is masked from 'package:grDevices':
##
## bmp
General procedure for one-way ANOVA
To conduct one-way ANOVA you should follow this general procedure:
1 Enter data: obviously you need to enter your data.
2 Explore your data: as with any analysis, it’s a good idea to begin by graphing your data and computing some descriptive statistics. You should also check distributional assumptions and use Levene’s test to check for homogeneity of variance.
3 Compute the basic ANOVA: you can then run the main analysis of variance.
Depending on what you found in the previous step, you might need to run a robust version of the test.
4 Compute post hoc tests: having conducted the main ANOVA you can follow it up with post hoc tests.
Enter the data
bmi<-c(3,2,1,1,4,5,2,4,2,3,7,4,5,3,6)
dose<-gl(3,5,labels = c("Placebo", "Low Dose", "High Dose"))
medicalData<-data.frame(dose, bmi)
medicalData
## dose bmi
## 1 Placebo 3
## 2 Placebo 2
## 3 Placebo 1
## 4 Placebo 1
## 5 Placebo 4
## 6 Low Dose 5
## 7 Low Dose 2
## 8 Low Dose 4
## 9 Low Dose 2
## 10 Low Dose 3
## 11 High Dose 7
## 12 High Dose 4
## 13 High Dose 5
## 14 High Dose 3
## 15 High Dose 6
Explore the data
by(medicalData$bmi,medicalData$dose, stat.desc)
## medicalData$dose: Placebo
## nbr.val nbr.null nbr.na min max
## 5.0000000 0.0000000 0.0000000 1.0000000 4.0000000
## range sum median mean SE.mean
## 3.0000000 11.0000000 2.0000000 2.2000000 0.5830952
## CI.mean.0.95 var std.dev coef.var
## 1.6189318 1.7000000 1.3038405 0.5926548
## --------------------------------------------------------
## medicalData$dose: Low Dose
## nbr.val nbr.null nbr.na min max
## 5.0000000 0.0000000 0.0000000 2.0000000 5.0000000
## range sum median mean SE.mean
## 3.0000000 16.0000000 3.0000000 3.2000000 0.5830952
## CI.mean.0.95 var std.dev coef.var
## 1.6189318 1.7000000 1.3038405 0.4074502
## --------------------------------------------------------
## medicalData$dose: High Dose
## nbr.val nbr.null nbr.na min max
## 5.0000000 0.0000000 0.0000000 3.0000000 7.0000000
## range sum median mean SE.mean
## 4.0000000 25.0000000 5.0000000 5.0000000 0.7071068
## CI.mean.0.95 var std.dev coef.var
## 1.9632432 2.5000000 1.5811388 0.3162278
Shapiro-Wilk Test
shapiro.test(medicalData$bmi)
##
## Shapiro-Wilk normality test
##
## data: medicalData$bmi
## W = 0.95351, p-value = 0.5812
shapiro.test(medicalData[medicalData$dose=="Placebo",]$bmi)
##
## Shapiro-Wilk normality test
##
## data: medicalData[medicalData$dose == "Placebo", ]$bmi
## W = 0.90202, p-value = 0.4211
shapiro.test(medicalData[medicalData$dose=="Low Dose",]$bmi)
##
## Shapiro-Wilk normality test
##
## data: medicalData[medicalData$dose == "Low Dose", ]$bmi
## W = 0.90202, p-value = 0.4211
shapiro.test(medicalData[medicalData$dose=="High Dose",]$bmi)
##
## Shapiro-Wilk normality test
##
## data: medicalData[medicalData$dose == "High Dose", ]$bmi
## W = 0.98676, p-value = 0.9672
From the output, the p-values are all greater than 0.05, implying that the distribution of the data is not significantly different from a normal distribution.
In other words, we can assume normality.
Homogeneity of Variance (Homoscedasticity) Test
There are many ways of testing data for homogeneity of variance. Two methods are shown here.
* Bartlett’s test - if the data are normally distributed, this is the best test to use. However, it is sensitive to departures from normality: it is more likely to return a “false positive” when the data are non-normal.
* Levene’s test - this is more robust to departures from normality than Bartlett’s test.
It is in the car package.
Bartlett’s test
bartlett.test(medicalData$bmi~medicalData$dose)
##
## Bartlett test of homogeneity of variances
##
## data: medicalData$bmi by medicalData$dose
## Bartlett's K-squared = 0.1853, df = 2, p-value = 0.9115
Levene’s test
leveneTest(medicalData$bmi, medicalData$dose)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.1176 0.89
## 12
Levene’s test is very non-significant, F(2,12) = 0.118, p = .89. This means that for these data the variances are very similar (hence the high probability value).
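The output header shows center = median, which is leveneTest()’s default (the Brown-Forsythe variant of the test). As a hedged aside, you can centre on the mean instead via the center argument; for these data the conclusion should be the same.
# Hedged aside: Levene's test centred on the mean rather than the default median
leveneTest(medicalData$bmi, medicalData$dose, center = mean)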
When test assumptions are met
medicalModel<-aov(bmi~dose, data = medicalData)
summary(medicalModel)
## Df Sum Sq Mean Sq F value Pr(>F)
## dose 2 20.13 10.067 5.119 0.0247 *
## Residuals 12 23.60 1.967
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F value is 5.119 and the p-value is 0.0247.
In other words, the variation in mean bmi between the dose groups (the numerator) is much larger than the variation in bmi within each dose group, and our p-value is less than 0.05.
Hence, at the 5% significance level, we reject the null hypothesis in favour of the alternative hypothesis H1 that there is a significant relationship between dose and bmi.
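To make that reasoning concrete, here is a minimal sketch (using only the table printed above) that recomputes the F-ratio by hand as the ratio of between-group to within-group variation; the row indexing assumes the usual layout of summary(aov()) output.
# Minimal sketch: recompute the F-ratio by hand from the ANOVA table above
aovTable <- summary(medicalModel)[[1]]   # row 1 = dose, row 2 = Residuals
msBetween <- aovTable[1, "Mean Sq"]      # variation among the dose-group means
msWithin  <- aovTable[2, "Mean Sq"]      # variation of bmi within each dose group
msBetween / msWithin                     # 10.067 / 1.967, roughly 5.12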
par(mfrow=c(2,2))
plot(medicalModel)
You will actually see four graphs, but the first two are the most important for ANOVA.
The first graph can be used for testing homogeneity of variance. Essentially, if it has a funnel shape then we’re in trouble. The plot shows points that are equally spread for the three groups, which implies that variances are similar across groups (which was also the conclusion reached by Levene’s test).
The second plot (on the right) is a Q-Q plot, which tells us something about the normality of residuals in the model. We want our residuals to be normally distributed, which means that the dots on the graph should cling to the diagonal line. Ours look like they have had a bit of an argument with the diagonal line, which suggests that we may not be able to assume normality of errors and should perhaps use a robust version of ANOVA instead.
# Reset the earlier partition command
dev.off()
Post Hoc Test
TukeyHSD(medicalModel)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = bmi ~ dose, data = medicalData)
##
## $dose
## diff lwr upr p adj
## Low Dose-Placebo 1.0 -1.3662412 3.366241 0.5162761
## High Dose-Placebo 2.8 0.4337588 5.166241 0.0209244
## High Dose-Low Dose 1.8 -0.5662412 4.166241 0.1474576
This output confirms a significant difference between the high dose and placebo groups, p < .05, but not between the low dose and placebo, where p = .52, or between the high dose and low dose, where p = .15.
The confidence intervals also confirm this because they do not cross zero for the comparison of the high dose and placebo group, which means that the true difference between group means is likely not to be zero (i.e., no difference); conversely, for others the confidence intervals cross zero, implying that the true difference between means could be zero.
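As a possible complement (not part of the Tukey output above), base R’s pairwise.t.test() runs the same pairwise comparisons with Bonferroni-adjusted p-values.
# Hedged complement: pairwise t-tests with Bonferroni-corrected p-values (base R)
pairwise.t.test(medicalData$bmi, medicalData$dose, p.adjust.method = "bonferroni")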
Robust ANOVA
Wilcox (2005) describes a set of robust procedures for conducting one-way ANOVA.
The first issue with using these functions is that most of them require the data to be in wide format rather than the long format.
medicalWide<-unstack(medicalData, bmi~dose) #convert data into wide format
medicalWide
## Placebo Low.Dose High.Dose
## 1 3 5 7
## 2 2 2 4
## 3 1 4 5
## 4 1 2 3
## 5 4 3 6
This is the format that Wilcox’s functions expect. The first robust function, t1way(), is based on a trimmed mean. It takes the general form:
t1way(dataFrame, tr = .2, grp = c(x, y, z))
in which,
dataFrame is the name of the dataframe to be analysed.
tr is the proportion of trimming to be done. The default is .2 or 20%, and you need to use this option only if you want to specify an amount other than 20%.
grp can be used to specify particular groups by referring to their columns in the dataframe; for example, if we wanted to analyse only the placebo and high-dose groups, we could do this using grp = c(1, 3).
t1way(medicalWide, tr=.1)
## $TEST
## [1] 4.320451
##
## $nu1
## [1] 2
##
## $nu2
## [1] 7.943375
##
## $n
## [1] 5 5 5
##
## $p.value
## [1] 0.05373847
Based on this robust test, there is not a significant difference in bmi scores across the three dose groups, Ft(2, 7.94) = 4.32, p = .054.
F=4.320451, num df=2, denom df=7.943375, p-value=0.05373847
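If you want to take the robust analysis further, a hedged sketch follows: Wilcox’s bootstrap version of the trimmed-mean test and his bootstrap post hoc comparisons, assuming the WRS versions of these functions accept the same wide-format dataframe as t1way().
# Hedged sketch (assumed WRS calls): bootstrap analogues on the wide-format data
t1waybt(medicalWide, tr = .2, nboot = 599)    # bootstrap test based on trimmed means
mcppb20(medicalWide, tr = .2, nboot = 2000)   # bootstrap post hoc comparisons, 20% trim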
Alternative Approach
When variances are not equal across groups
If Levene’s test is significant then it is reasonable to assume that population variances are different across groups.
In this case, we can apply Welch’s F to the data, which makes adjustments for differences in group variances. This test is produced by the oneway.test() function, which is built into R.
oneway.test(bmi ~ dose, data = medicalData)
##
## One-way analysis of means (not assuming equal variances)
##
## data: bmi and dose
## F = 4.3205, num df = 2.0000, denom df = 7.9434, p-value = 0.05374
For our data we didn’t need this test because our Levene’s test was not significant, indicating that our population variances were similar. However, when homogeneity of variance has been violated you should report this F-ratio rather than the one from the standard ANOVA.