library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openxlsx)
setwd("E:/Biostat and Study Design/204/Lectures/Data")
BP_df <- openxlsx::read.xlsx('SBP_ANOVA_DataSet.xlsx')
For two normally distributed populations with equal variances (\(\sigma_1^2 = \sigma_2^2\)), the sampling distribution of the test statistic \(F = s_1^2 / s_2^2\) is the F distribution: if you repeatedly draw samples from the two populations and compute the ratio \(s_1^2 / s_2^2\), those ratios follow the F distribution.
There is a different F distribution for each pair of degrees of freedom for the numerator and denominator. These are the properties of the F distribution: it is not symmetric (it is skewed to the right); its values are never negative; and its exact shape depends on the two degrees of freedom.
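To build intuition, here is a minimal simulation sketch; the sample sizes (10 and 15) and the seed are arbitrary choices for illustration.
# Simulate the ratio of sample variances from two normal populations
# with equal variances
set.seed(1)
n1 <- 10
n2 <- 15
ratios <- replicate(10000, var(rnorm(n1)) / var(rnorm(n2)))
# The simulated 95th percentile should be close to the theoretical
# F quantile with df1 = n1 - 1 and df2 = n2 - 1
quantile(ratios, 0.95)
qf(0.95, df1 = n1 - 1, df2 = n2 - 1)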
One-way analysis of variance (ANOVA) is a method of testing the equality of three or more population means by analyzing sample variances. One-way ANOVA has the following requirements: the populations are approximately normally distributed; the populations have the same variance (a common rule of thumb: the largest sample standard deviation is less than twice the smallest); the samples are simple random samples; the samples are independent of each other; and the groups are categorized by a single factor.
The F test statistic is the ratio of two variance estimates: (1) the variance between samples (based on the variation among the sample means); and (2) the variance within samples (based on the individual sample variances).
\[F = \frac{\text{variance between samples}}{\text{variance within samples}}\]
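To make the ratio concrete, the following sketch computes the two variance estimates by hand for the blood pressure data loaded above; the variable names are ours, chosen for illustration, and the result should match the F value reported by aov() later in this section.
# Sketch: compute the between- and within-sample variance estimates by hand
k <- length(unique(BP_df$Intervention))   # number of groups
N <- nrow(BP_df)                          # total sample size
grand_mean <- mean(BP_df$SBP)
group_means <- tapply(BP_df$SBP, BP_df$Intervention, mean)
group_sizes <- table(BP_df$Intervention)
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within <- sum((BP_df$SBP - ave(BP_df$SBP, BP_df$Intervention))^2)
# F = MS_between / MS_within
(ss_between / (k - 1)) / (ss_within / (N - k))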
You might be wondering: why not just test two samples at a time? For example, to test the claim that three populations have the same mean, pair the samples and test each pair: \(H_0: \mu_1 = \mu_2\), \(H_0: \mu_2 = \mu_3\), and \(H_0: \mu_1 = \mu_3\). If we use a 0.05 significance level for each of those three hypothesis tests, the actual overall confidence level could be as low as \(0.95^3\) (about 0.857). In general, as we increase the number of individual tests of significance, we increase the risk of a type I error: finding a difference in one of the pairs by chance alone, when no such difference actually exists.
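The arithmetic behind that claim:
# Family-wise type I error risk for three tests at alpha = 0.05
1 - (1 - 0.05)^3   # = 0.142625, about a 14% chance of at least one false rejection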
Example: Using the blood pressure dataset, determine whether there is an overall difference in mean systolic blood pressure among the three intervention groups (placebo, lisinopril, and new_drug).
Let’s explore the data.
by(BP_df$SBP,BP_df$Intervention,summary)
## BP_df$Intervention: lisinopril
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 99.0 114.8 120.0 120.3 127.0 146.0
## ------------------------------------------------------------
## BP_df$Intervention: new_drug
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 96.0 116.8 123.0 123.0 129.0 150.0
## ------------------------------------------------------------
## BP_df$Intervention: placebo
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 114.0 139.8 149.0 148.9 157.0 185.0
by(BP_df$SBP,BP_df$Intervention,length)
## BP_df$Intervention: lisinopril
## [1] 100
## ------------------------------------------------------------
## BP_df$Intervention: new_drug
## [1] 100
## ------------------------------------------------------------
## BP_df$Intervention: placebo
## [1] 100
BP_df %>% ggplot(aes(y=SBP,x=Intervention)) +
stat_boxplot(geom = 'errorbar', width = 0.2) +
geom_boxplot(fill='deepskyblue',outlier.colour="red", outlier.size=4) +
theme_light()
We assess normality using a QQ plot for each group.
BP_df %>% ggplot(aes(sample = SBP)) +
  # set the line colour directly (outside aes()) so the reference line is red
  stat_qq_line(linewidth = 2, color = 'red') +
  stat_qq(size = 2) +
  theme_light() +
  facet_grid(Intervention ~ .)
The data appear approximately normally distributed in each group. Next we assess the equal-variance assumption by comparing \(s_{min}\) and \(s_{max}\).
by(BP_df$SBP,BP_df$Intervention,sd)
## BP_df$Intervention: lisinopril
## [1] 9.547415
## ------------------------------------------------------------
## BP_df$Intervention: new_drug
## [1] 10.60636
## ------------------------------------------------------------
## BP_df$Intervention: placebo
## [1] 14.15058
14.15058/9.547415
## [1] 1.482137
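Equivalently, the ratio can be computed without hardcoding the values:
# Ratio of the largest to the smallest group standard deviation
sds <- tapply(BP_df$SBP, BP_df$Intervention, sd)
max(sds) / min(sds)   # same value as above, about 1.48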
The ratio of the largest to the smallest standard deviation is 1.48, which is less than 2, so the equal-variance assumption is reasonable. We proceed with step 1 of our analysis: testing the null hypothesis.
\(H_0: \mu_1 = \mu_2 = \mu_3\)
\(H_1: \mu_1 \neq \mu_2 \,\, OR \,\, \mu_1 \neq \mu_3 \,\, OR \,\, \mu_2 \neq \mu_3\)
ANOVA_mod <- aov(SBP ~ Intervention , data=BP_df)
summary(ANOVA_mod)
## Df Sum Sq Mean Sq F value Pr(>F)
## Intervention 2 50040 25020 185.8 <2e-16 ***
## Residuals 297 39985 135
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
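As a quick sanity check, the F value can be reproduced from the table above: each mean square is its sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares.
# Reproduce the F statistic from the ANOVA table
(50040 / 2) / (39985 / 297)   # about 185.8, matching the F value above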
The overall F test statistic is \(F_{stat}\) = 185.8 with a P-value < 2e-16. Since the P-value is less than 0.05, we reject the null hypothesis and conclude that the group means are not all equal.
Having rejected the null hypothesis that all group means are equal, we proceed to step 2: post-hoc comparisons. To determine which groups differ, we run all pairwise group comparisons while correcting for multiple testing. We will learn two methods of correction: the Bonferroni correction and Tukey's HSD. Generally, Tukey's HSD is the preferred method because it yields more accurate results; the Bonferroni correction is more conservative, giving a greater chance of failing to reject a false null hypothesis.
First, we start with the Bonferroni correction. The per-comparison significance level is calculated by dividing the original alpha by the number of comparisons (equivalently, each raw p-value is multiplied by the number of comparisons before being compared with the original alpha).
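With three pairwise comparisons, the adjusted per-comparison significance level is:
# Bonferroni-adjusted per-comparison alpha for three pairwise tests
0.05 / 3   # about 0.0167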
pairwise.t.test(BP_df$SBP, BP_df$Intervention, p.adj = "bonf")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: BP_df$SBP and BP_df$Intervention
##
## lisinopril new_drug
## new_drug 0.29 -
## placebo <2e-16 <2e-16
##
## P value adjustment method: bonferroni
Post-hoc analysis indicates that the lisinopril and new drug groups each differ significantly from the placebo group, but not from each other (p = 0.29), after the Bonferroni correction.
Tukey’s HSD cannot be performed using the pairwise.t.test() function. Instead, we use the TukeyHSD() function on the fitted aov model.
TukeyHSD(ANOVA_mod)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = SBP ~ Intervention, data = BP_df)
##
## $Intervention
## diff lwr upr p adj
## new_drug-lisinopril 2.73 -1.135199 6.595199 0.2209626
## placebo-lisinopril 28.66 24.794801 32.525199 0.0000000
## placebo-new_drug 25.93 22.064801 29.795199 0.0000000
The results of the Tukey HSD comparisons are consistent with those of the Bonferroni-corrected comparisons. Tukey’s HSD is generally considered to be more accurate.
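The Tukey intervals can also be inspected visually with base R’s plot method for TukeyHSD objects (figure not shown here):
# Plot the 95% family-wise confidence intervals for the pairwise differences
plot(TukeyHSD(ANOVA_mod))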
The Kruskal-Wallis test (also called the H test) is a nonparametric test that uses the ranks of combined simple random samples from three or more independent populations to test the null hypothesis that the populations have the same median. The F procedure is robust against deviations from normality, especially when the sample size is >= 30. However, if normality cannot be assumed, use the Kruskal-Wallis test, which has the following requirements: at least three independent simple random samples, each with at least five observations; there is no requirement that the populations be normally distributed.
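As an aside, because the test operates on ranks, it is closely related (though not identical) to a one-way ANOVA performed on the ranks of the pooled data; a quick sketch of that idea:
# Sketch: one-way ANOVA on the ranks of the pooled observations,
# an approximation of what the Kruskal-Wallis test does
summary(aov(rank(SBP) ~ Intervention, data = BP_df))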
Example: Using the dataset above, test the same claim using the Kruskal-Wallis test.
\(H_0: \text{median}_1 = \text{median}_2 = \text{median}_3\)
\(H_1: \text{median}_1 \neq \text{median}_2 \,\, OR \,\, \text{median}_1 \neq \text{median}_3 \,\, OR \,\, \text{median}_2 \neq \text{median}_3\)
kruskal.test(SBP ~ Intervention, data=BP_df)
##
## Kruskal-Wallis rank sum test
##
## data: SBP by Intervention
## Kruskal-Wallis chi-squared = 161.84, df = 2, p-value < 2.2e-16
Similar to the F procedure (one-way ANOVA), the Kruskal-Wallis test detected a significant difference among the groups.
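For reference, the test statistic is compared against a chi-square distribution with df = 2; the 0.05 critical value is:
# Chi-square critical value for the Kruskal-Wallis approximation (df = 2)
qchisq(0.95, df = 2)   # 5.991465; the observed statistic 161.84 far exceeds this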
The post-hoc pairwise comparisons following the Kruskal-Wallis test use the Wilcoxon rank-sum test and are implemented in pairwise.wilcox.test().
pairwise.wilcox.test(BP_df$SBP, BP_df$Intervention, p.adjust.method = "bonferroni")
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: BP_df$SBP and BP_df$Intervention
##
## lisinopril new_drug
## new_drug 0.22 -
## placebo <2e-16 <2e-16
##
## P value adjustment method: bonferroni
Post-hoc analysis indicates that the lisinopril and new_drug groups each differ significantly from the placebo group, but not from each other (p = 0.22), after the Bonferroni correction.