Foreword: The Elementary Statistics for Medical Students (ESMS) project
This project aims to strengthen the statistical skills for Medical students in Vietnam. We provide the problem-based tutorials, each one will show the student how to resolve a study question using the appropriate methods. We focus on the basic statistical methods and common tasks All tutorials will be based on R, not only because this is a great tool for data visualization and analysis, but also for encouraging the students to use a free software instead of using the pirated packages like SPSS or Medcalc. Through these tutorials you will realize that R is much more powerful than any commercial package you ever know.
Introduction
The one-way analysis of variance (ANOVA) is an omnibus statistical method for verifying whether statistically significant differences exist between the means of independent groups. It worth to mention that the ANOVA is based on a linear model (LM), so our true intention is to evaluate the “effect” of a multi-levels factor on the outcome. Further analysis such as pairwise comparison or planned contrast would also be required for verifying our hypothesis, because stand alone ANOVA cannot locate which specific groups were statistically significantly different from each other.
This tutorial will show you how to carry out an One-way ANOVA in R. We also suggest some popular techniques for pairwise comparison, including Tukey post-hoc test, Bonferroni adjustment and planned contrast analysis. Last but not least, we also introduce some bootstraping procedures for ANOVA and post-hoc test, as well as how to interpret and report the results.
Context
In this parrallel group clinical trial, 606 patients were randomized into 4 treatment groups (A,B,C and D). Each group was treated with a different dose of drug X (A=lowest, D=highest). All patients underwent a liver function test before and after the treatment. The outcomes include Alkaline phosphatase, Alanine aminotransferase, Aspartate aminotransferase and Total bilirubin levels. For this tutorial we will only focus on the Total bilirubin level (TBL). Our study question is whether the drug doses have an effect on post-treatment TBL level?
** Method**
Data analysis was performed using R programming language. The drug dose effect on post treatment TBL was evaluated using One-way ANOVA. The TBL levels were compared among 4 dose groups by Bonferroni post-hoc test and contrast analysis.
Data preparation
The dataset could be downloaded from the famous vincentarebundock database as follow:
library(tidyverse)
data=read.csv("http://vincentarelbundock.github.io/Rdatasets/csv/texmex/liver.csv")%>%as_tibble()
Step 1: Exploring data and Testing of Assumptions
Before making any move, we should make sure that our data can actually be handled by ANOVA. Unfortunately, most of the published studies never consider checking of assumption on their data, whilst the real world data are far from perfect.
The result obtained from One-way ANOVA is only reliable if our data satisfy all the required assumptions as follows:
In fact, checking for these six assumptions is easy, but dealing with the violations is more difficult. First, we will explore our data and checking for any violation of those assumptions:
Table 1: Post treatment TBL levels in four Dose groups
d=psych::describeBy(data$TBL.M,data$dose)
t1=rbind(d$A[,c(2:13)], d$B[,c(2:13)], d$C[,c(2:13)], d$D[,c(2:13)])%>%as.data.frame()
row.names(t1)=c("A","B","C","D")
knitr::kable(t1)
n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
A | 152 | 9.570375 | 3.557731 | 8.4645 | 9.162516 | 2.788771 | 3.249 | 23.085 | 19.836 | 1.271402 | 2.066067 | 0.2885701 |
B | 148 | 10.554628 | 4.166143 | 9.4905 | 10.131750 | 3.422582 | 3.420 | 25.992 | 22.572 | 1.162064 | 1.648718 | 0.3424549 |
C | 148 | 10.747581 | 5.235447 | 9.4050 | 10.023450 | 3.422582 | 4.275 | 42.750 | 38.475 | 2.895294 | 12.872133 | 0.4303511 |
D | 158 | 11.845538 | 5.271410 | 10.6020 | 11.223211 | 4.309918 | 3.933 | 39.843 | 35.910 | 2.014411 | 6.771180 | 0.4193708 |
Checking for Normal distribution
The normal distribution of data can be verified by QQ-plot and/or statistical tests.
qqnorm(data$TBL.M)
qqline(data$TBL.M, col = "red")
or by using the D’Agostino test:
library(fBasics)
dagoTest(data$TBL.M)
##
## Title:
## D'Agostino Normality Test
##
## Test Results:
## STATISTIC:
## Chi2 | Omnibus: 320.5378
## Z3 | Skewness: 14.4439
## Z4 | Kurtosis: 10.5788
## P VALUE:
## Omnibus Test: < 2.2e-16
## Skewness Test: < 2.2e-16
## Kurtosis Test: < 2.2e-16
##
## Description:
## Thu Feb 23 01:20:06 2017 by user: Admin
or by looking directly at the distribution density curve:
data%>%ggplot(aes(x=TBL.M,fill=dose))+geom_density(alpha=0.5)+scale_y_continuous("Total bilirubin after treatment")+theme_bw()+scale_fill_manual(values=c("white","grey80","grey30","grey5"))
Those results indicate that our data is NOT normaly distributed. Such violation could be resolved by data transformation. Though many techniques are availabe for data transformation, we introduce here a very flexible method which is the BOX-COX transformation.
The Box-Cox transformation could be applied directly using the CARET package:
BCT=data[,c(2:9)]%>%apply(.,2,as.numeric)%>%preProcess(.,method="BoxCox")
data[,c(2:9)]=data[,c(2:9)]%>%as.matrix()%>%predict(BCT,.)
After being transformed, the dataset is now normally distributed with less ourlier
dagoTest(data$TBL.M)
##
## Title:
## D'Agostino Normality Test
##
## Test Results:
## STATISTIC:
## Chi2 | Omnibus: 2.2161
## Z3 | Skewness: -0.472
## Z4 | Kurtosis: 1.4119
## P VALUE:
## Omnibus Test: 0.3302
## Skewness Test: 0.637
## Kurtosis Test: 0.158
##
## Description:
## Thu Feb 23 01:20:06 2017 by user: Admin
qqnorm(data$TBL.M)
qqline(data$TBL.M, col = "red")
data%>%ggplot(aes(x=dose,y=TBL.M,fill=dose))+geom_boxplot(alpha=0.7)+scale_y_continuous("Total bilirubin after treatment (BoxCox transformed)")+coord_flip()+theme_bw()+scale_fill_manual(values=c("white","grey60","grey40","grey20"))
Step 2: Performing One-way ANOVA
This step consists of performing an ANOVA or F-test, based on a linear model of “Outcome ~ Factor + error”. It could be easily done using the aov () function.
aov(TBL.M ~ dose, data=data)%>%summary()
## Df Sum Sq Mean Sq F value Pr(>F)
## dose 3 0.705 0.23501 6.609 0.000213 ***
## Residuals 602 21.408 0.03556
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The table shows us the output of the ANOVA analysis. First, We can see that the significance value is 0.000319 (i.e., p=0.0003), which is below 0.05, indicating that there is statistically significant difference in mean TBL level between at least two drug doses. Unfortunately, the F-test cannot tell us which of the specific drug doses differed from other.
The Levene test for variance homogeneity confirms that our ANOVA result satisfies the Assumption N°6
aov(TBL.M ~ dose, data=data)%>%car::leveneTest()
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 0.1754 0.9131
## 602
Effect size of ANOVA
The effect size for ANOVA is measured by two metrics: Omega squared and Eta squared. They could be manually calculated as follow:
effectsize=function(output){
out=output%>%summary()
df=out[[1]]$Df[1]
SS=out[[1]]$`Sum Sq`[1]
MS=out[[1]]$`Mean Sq`[1]
SSR=out[[1]]$`Sum Sq`[2]
MSR=out[[1]]$`Mean Sq`[2]
eta2=SS/(SS+SSR)
omega2<- df*(MS-MSR) / (SS + SSR + MSR)
cbind(Effectsizes=c("Eta_squared","Omega_squared"),Estimated=c(eta2,omega2))
}
output=aov(TBL.M ~ dose, data=data)
effectsize(output)
## Effectsizes Estimated
## [1,] "Eta_squared" "0.0318832046673342"
## [2,] "Omega_squared" "0.0270152573079852"
Step 3: Multiple Comparisons Table
Our thrid step consists of performing some pariwise comparisons. As mentioned above, this could be done via a pairwise post-hoc test likes Bonferroni, or by analysing the planned contrast matrix:
Bonferroni Post-hoc test
pairwise.t.test(data$TBL.M, data$dose,p.adjust.method="bonferroni")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: data$TBL.M and data$dose
##
## A B C
## B 0.285 - -
## C 0.206 1.000 -
## D 6.3e-05 0.098 0.141
##
## P value adjustment method: bonferroni
The Bonferroni post-hoc test’s result indicates that the difference in mean TBL level was only significative between the Highest (D) and Lowest (A) doses. No significant difference was found for the intermediate doses (B,C).
An alternative method is Tukey posthoc test:
aov(TBL.M ~ dose, data=data)%>%TukeyHSD()
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = TBL.M ~ dose, data = data)
##
## $dose
## diff lwr upr p adj
## B-A 0.043239481 -0.012862272 0.09934123 0.1945352
## C-A 0.046193976 -0.009907777 0.10229573 0.1474914
## D-A 0.095203892 0.040008990 0.15039879 0.0000621
## C-B 0.002954495 -0.053520031 0.05942902 0.9991256
## D-B 0.051964411 -0.003609348 0.10753817 0.0764662
## D-C 0.049009915 -0.006563843 0.10458367 0.1057793
aov(TBL.M ~ dose, data=data)%>%TukeyHSD()%>%plot()
First, the TukeyHSD function provides mean, as well as upper and lower limits of confidence interval for difference in TBL level between pairs of drug doses. Only the difference between Highest (D) and Lowest (A) doses could be considered significative, at a significance level of 0.0001, as the 95%CI does not contain zero (both lwr and upr limits were positive).
Planned contrast analysis
Compared to the conventional post-hoc tests, a planned contrast analysis is much more flexible, as it allows us to test every possible hypothesis. For example, the following contrast matrix supports not only simple contrasts (between two drug doses) but also the complexe hypothesis, likes: whether the mean TBL in Highest dose (D) differs significantly to the mean TBL of 3 lower doses (A, B and C). In a similar way, we can test our hypothesis about the difference in TBL between Upper dose groups (C+D) and Lower dose groups (A+B). All we need to do is setting the weight for contrasts.
cntrMat <- rbind("D vs A"=c( -1, 0, 0, 1),
"D vs B"=c( 0, -1, 0, 1),
"D vs C"=c( 0, 0, -1, 1),
"(BCD) vs A"=c( -1, 1/3, 1/3, 1/3),
"((CD) vs (AB))" =c( -1/2, -1/2, 1/2, 1/2),
"D vs (ABC)" =c( -1/3, -1/3, -1/3, 1),
"(CD) vs A" =c( -1, 0, 1/2, 1/2),
"(CD) vs B" =c( 0, -1, 1/2, 1/2)
)
library(multcomp)
out=aov(TBL.M ~ dose, data=data)
summary(glht(out,linfct=mcp(dose=cntrMat),alternative="two.sided"),test=adjusted("bonferroni"))
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: User-defined Contrasts
##
##
## Fit: aov(formula = TBL.M ~ dose, data = data)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## D vs A == 0 0.09520 0.02142 4.444 8.42e-05 ***
## D vs B == 0 0.05196 0.02157 2.409 0.13040
## D vs C == 0 0.04901 0.02157 2.272 0.18754
## (BCD) vs A == 0 0.06155 0.01767 3.482 0.00426 **
## ((CD) vs (AB)) == 0 0.04908 0.01533 3.202 0.01148 *
## D vs (ABC) == 0 0.06539 0.01745 3.748 0.00157 **
## (CD) vs A == 0 0.07070 0.01872 3.777 0.00139 **
## (CD) vs B == 0 0.02746 0.01888 1.454 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- bonferroni method)
Based on this result, we could say that:
The Highest Dose (D) results in higher TBL level, compared to all lower doses (A,B and C)
The Upper doses (C and D) also represent a higher TBL level than that in Lower doses (A and B)
The mean TBL in pooled group of upper doses (B,C and D) was significantly higher than that in lowest dose (A)
Step 4: Bootstraping ANOVA analysis
This is an advance technique that implies skills in function writing and extracting information from the output and therefore not mandatory for a routine analysis. However, Bootstraping is very helpful if you have a limited sample size and/or outliers.
First, we should write down a statistical function called anovaboot, it will provide the F-test, etasquared and Omegasquared for our resampled data.
The we call the boot package and perform a bootstrap with our function:
anovaboot=function(formula,data,i){
drs= data[i,]
out=summary(aov(lm(formula=formula,data=drs)))
df=out[[1]]$Df[1]
SS=out[[1]]$`Sum Sq`[1]
MS=out[[1]]$`Mean Sq`[1]
SSR=out[[1]]$`Sum Sq`[2]
MSR=out[[1]]$`Mean Sq`[2]
Fstat=out[[1]]$`F value`[1]
p=out[[1]]$`Pr(>F)`[1]
eta2=SS/(SS+SSR)
omega2<- df*(MS-MSR) / (SS + SSR + MSR)
return=c(SS,MS,Fstat,eta2,omega2,p)
}
#Bootstrap
library(boot)
set.seed(123)
res=boot(statistic=anovaboot,formula="TBL.M~dose",data,R=1000)%>%.$t%>%as_tibble()
names(res)=c("SSq","MSq","F_stats","Eta_sq","Omega_sq","P_value")
Hmisc::describe(res)
## res
##
## 6 Variables 1000 Observations
## ---------------------------------------------------------------------------
## SSq
## n missing distinct Info Mean Gmd .05 .10
## 1000 0 1000 1 0.7959 0.3606 0.3315 0.4091
## .25 .50 .75 .90 .95
## 0.5712 0.7594 0.9889 1.2308 1.3731
##
## lowest : 0.1377494 0.1459124 0.1749832 0.1761131 0.1785651
## highest: 1.8777922 1.9456574 1.9649548 1.9666145 2.1809989
## ---------------------------------------------------------------------------
## MSq
## n missing distinct Info Mean Gmd .05 .10
## 1000 0 1000 1 0.2653 0.1202 0.1105 0.1364
## .25 .50 .75 .90 .95
## 0.1904 0.2531 0.3296 0.4103 0.4577
##
## lowest : 0.04591646 0.04863747 0.05832773 0.05870438 0.05952171
## highest: 0.62593073 0.64855248 0.65498492 0.65553816 0.72699964
## ---------------------------------------------------------------------------
## F_stats
## n missing distinct Info Mean Gmd .05 .10
## 1000 0 1000 1 7.536 3.459 3.146 3.842
## .25 .50 .75 .90 .95
## 5.344 7.194 9.347 11.689 13.088
##
## lowest : 1.245522 1.405630 1.610905 1.640770 1.720563
## highest: 17.134346 17.226800 17.590400 19.705958 20.257045
## ---------------------------------------------------------------------------
## Eta_sq
## n missing distinct Info Mean Gmd .05 .10
## 1000 0 1000 1 0.03598 0.01594 0.01544 0.01878
## .25 .50 .75 .90 .95
## 0.02594 0.03461 0.04451 0.05504 0.06123
##
## lowest : 0.006168634 0.006956073 0.007963836 0.008110281 0.008501343
## highest: 0.078669724 0.079060654 0.080594869 0.089421079 0.091692491
## ---------------------------------------------------------------------------
## Omega_sq
## n missing distinct Info Mean Gmd .05 .10
## 1000 0 1000 1 0.03113 0.016 0.01051 0.01387
## .25 .50 .75 .90 .95
## 0.02105 0.02975 0.03968 0.05026 0.05646
##
## lowest : 0.001213982 0.002004043 0.003015165 0.003162099 0.003554466
## highest: 0.073965177 0.074357504 0.075897202 0.084755111 0.087034722
## ---------------------------------------------------------------------------
## P_value
## n missing distinct Info Mean Gmd .05
## 1000 0 1000 1 0.00546 0.009922 2.736e-08
## .10 .25 .50 .75 .90 .95
## 1.876e-07 4.782e-06 9.470e-05 1.226e-03 9.634e-03 2.474e-02
##
## lowest : 1.616239e-12 3.386410e-12 5.874875e-11 9.614896e-11 1.089908e-10
## highest: 1.615548e-01 1.787836e-01 1.856657e-01 2.400944e-01 2.923597e-01
## ---------------------------------------------------------------------------
res%>%gather(SSq:P_value,key="Metric",value="Value")%>%ggplot(aes(x=Value,fill=Metric))+geom_histogram(alpha=0.7,color="black",show.legend = F)+facet_wrap(~Metric,ncol=2,scales="free")+theme_bw()
etap=res%>%ggplot(aes(x=Eta_sq))+geom_histogram(alpha=0.8,fill="gold",color="black")+theme_bw()+geom_vline(xintercept = mean(res$Eta_sq),color="blue",size=0.8,linetype="dashed")
pp=res%>%ggplot(aes(x=P_value))+geom_histogram(alpha=0.8,fill="gold",color="black")+theme_bw()+geom_vline(xintercept = 0.02,color="red",size=0.8,linetype="dashed")
etap=res%>%ggplot(aes(x=Eta_sq))+geom_histogram(alpha=0.8,fill="gold",color="black")+theme_bw()+geom_vline(xintercept = mean(res$Eta_sq),color="blue",size=0.8,linetype="dashed")
pp=res%>%ggplot(aes(x=P_value))+geom_histogram(alpha=0.8,fill="gold",color="black")+theme_bw()+geom_vline(xintercept = 0.02,color="red",size=0.8,linetype="dashed")
library(gridExtra)
grid.arrange(etap,pp)
res%>%mutate(iteration=c(1:nrow(.)))%>%gather(SSq:P_value,key="Metric",value="Value")%>%ggplot(aes(x=iteration,y=Value,color=Metric))+geom_path(alpha=0.8,size=0.6,show.legend = F)+facet_wrap(~Metric,ncol=2,scales="free")+theme_bw()
The bootstrap result could be explored by either descriptive statistics or graphical tools
Those results, based on 1000 iterations confirm the significance level of ANOVA F-test. However, the Bootstrap 95%CI show a relatively low effect-size (Etasq = 0.02 to 0.06, which is lower than 0.12, the medium level based on Cohen’s rule).
The bootstrap analysis could also be performed on a Tukey post hoc test, as follows:
tukeyboot=function(formula,data,i){
drs= data[i,]
out=aov(lm(formula=formula,data=drs))
tk=TukeyHSD(out)
return=cbind(d1=tk$dose[1,1],
d2=tk$dose[2,1],
d3=tk$dose[3,1],
d4=tk$dose[4,1],
d5=tk$dose[5,1],
d6=tk$dose[6,1],
p1=tk$dose[1,4],
p2=tk$dose[2,4],
p3=tk$dose[3,4],
p4=tk$dose[4,4],
p5=tk$dose[5,4],
p6=tk$dose[6,4])
}
restk=boot(statistic=tukeyboot,formula="TBL.M~dose",data,R=1000)%>%.$t%>%as_tibble()
restkdif=restk[,c(1:6)]
names(restkdif)=c("BA","CA","DA","CB","DB","DC")
restkdif=restkdif%>%gather(BA:DC,key="Pair",value="Value")%>%mutate(Para="Meandif")
restkp=restk[,c(7:12)]
names(restkp)=c("BA","CA","DA","CB","DB","DC")
restkp=restkp%>%gather(BA:DC,key="Pair",value="Value")%>%mutate(Para="P_value")
restklong=rbind(restkdif,restkp)
restklong%>%ggplot(aes(x=Value,fill=Pair))+geom_histogram(color="black",show.legend = F)+facet_grid(Pair~Para,scales="free")+theme_bw()
We just rebuilt a pairwise comparison table using Tukey posthoc test, based on 1000 resampled datasets (i.e 606,000 observations). Interestingly, those results indicate that the mean TBL is not only significantly different between Highest and Lowest doses, but also between the D vs B and D vs C dose groups. This is the magic of bootstraping.
Step 5: Reporting the ANOVA results
Based on these tables and figures, we could report the results of our study as follows:
“The higher drug doses had a significantly effect on the post-treatment TBL level, as determined by one-way ANOVA (F(3,606) = 6.318,p=0.0003, Eta-squared =0.03). The contrast analysis indicates a significant contrast not only between the highest dose and lowest dose groups (D vs A, p=0.0001), but also the Upper dose and Lower dose groups (C+D vs A+B, p=0.009). There was no statistically significant difference between the intermediate and low dose groups.”
Exercise
Selecting any outcome among the remaining parameters, including Alkaline phosphatase, Alanine aminotransferase, Aspartate aminotransferase, then perform your own ANOVA study on this parameter.
Evaluate the change in liver function indices from Baseline to Post-treatment points, then perfom an ANOVA on those changes.
Thank you