library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openxlsx)
setwd("E:/Biostat and Study Design/204/Lectures/Data")
BP_df <- openxlsx::read.xlsx('SBP_ANOVA_DataSet.xlsx')
For two normally distributed populations with equal variances (\(\sigma_1^2 = \sigma_2^2\)), the sampling distribution of the test statistic \(F = s_1^2 / s_2^2\) is the F distribution: if you repeatedly draw samples from the two populations and compute the ratio \(s_1^2 / s_2^2\), those ratios follow the F distribution.
There is a different F distribution for each pair of degrees of freedom for the numerator and denominator. These are the properties of the F distribution: it is not symmetric (it is skewed to the right); its values are never negative; and its exact shape depends on the two degrees of freedom.
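To build intuition, here is a minimal simulation sketch; the sample sizes (10 and 15) and the seed are arbitrary choices for illustration.
# Simulate the ratio of sample variances from two normal populations
# with equal variances
set.seed(1)
n1 <- 10
n2 <- 15
ratios <- replicate(10000, var(rnorm(n1)) / var(rnorm(n2)))
# The simulated 95th percentile should be close to the theoretical
# F quantile with df1 = n1 - 1 and df2 = n2 - 1
quantile(ratios, 0.95)
qf(0.95, df1 = n1 - 1, df2 = n2 - 1)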
One-way analysis of variance (ANOVA) is a method of testing the equality of three or more population means by analyzing sample variances. One-way ANOVA has the following requirements: the populations are approximately normally distributed; the populations have the same variance (a common rule of thumb: the largest sample standard deviation is less than twice the smallest); the samples are simple random samples; the samples are independent of each other; and the groups are categorized by a single factor.
The F test statistic is the ratio of two variance estimates: (1) the variance between samples (based on the variation among the sample means); and (2) the variance within samples (based on the individual sample variances).
\[F = \frac{\text{variance between samples}}{\text{variance within samples}}\]
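To make the ratio concrete, the following sketch computes the two variance estimates by hand for the blood pressure data loaded above; the variable names are ours, chosen for illustration, and the result should match the F value reported by aov() later in this section.
# Sketch: compute the between- and within-sample variance estimates by hand
k <- length(unique(BP_df$Intervention))   # number of groups
N <- nrow(BP_df)                          # total sample size
grand_mean <- mean(BP_df$SBP)
group_means <- tapply(BP_df$SBP, BP_df$Intervention, mean)
group_sizes <- table(BP_df$Intervention)
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within <- sum((BP_df$SBP - ave(BP_df$SBP, BP_df$Intervention))^2)
# F = MS_between / MS_within
(ss_between / (k - 1)) / (ss_within / (N - k))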
You might be wondering: why not just test two samples at a time? For example, to test the claim that three populations have the same mean, pair the samples and test each pair: \(H_0: \mu_1 = \mu_2\), \(H_0: \mu_2 = \mu_3\), and \(H_0: \mu_1 = \mu_3\). If we use a 0.05 significance level for each of those three hypothesis tests, the actual overall confidence level could be as low as \(0.95^3\) (about 0.857). In general, as we increase the number of individual tests of significance, we increase the risk of a type I error: finding a difference in one of the pairs by chance alone, when no such difference actually exists.
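The arithmetic behind that claim:
# Family-wise type I error risk for three tests at alpha = 0.05
1 - (1 - 0.05)^3   # = 0.142625, about a 14% chance of at least one false rejection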
Example: Using the blood pressure dataset, determine whether there is an overall difference in mean systolic blood pressure among the three intervention groups (placebo, lisinopril, and new_drug).
Let’s explore the data.
by(BP_df$SBP,BP_df$Intervention,summary)
## BP_df$Intervention: lisinopril
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 99.0 114.8 120.0 120.3 127.0 146.0
## ------------------------------------------------------------
## BP_df$Intervention: new_drug
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 96.0 116.8 123.0 123.0 129.0 150.0
## ------------------------------------------------------------
## BP_df$Intervention: placebo
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 114.0 139.8 149.0 148.9 157.0 185.0
by(BP_df$SBP,BP_df$Intervention,length)
## BP_df$Intervention: lisinopril
## [1] 100
## ------------------------------------------------------------
## BP_df$Intervention: new_drug
## [1] 100
## ------------------------------------------------------------
## BP_df$Intervention: placebo
## [1] 100
BP_df %>% ggplot(aes(y=SBP,x=Intervention)) +
stat_boxplot(geom = 'errorbar', width = 0.2) +
geom_boxplot(fill='deepskyblue',outlier.colour="red", outlier.size=4) +
theme_light()
We assess normality using a QQ plot for each group.
BP_df %>% ggplot(aes(sample = SBP)) +
  # set the line colour directly (outside aes()) so the reference line is red
  stat_qq_line(linewidth = 2, color = 'red') +
  stat_qq(size = 2) +
  theme_light() +
  facet_grid(Intervention ~ .)
The data appear approximately normally distributed in each group. Next we assess the equal-variance assumption by comparing \(s_{min}\) and \(s_{max}\).
by(BP_df$SBP,BP_df$Intervention,sd)
## BP_df$Intervention: lisinopril
## [1] 9.547415
## ------------------------------------------------------------
## BP_df$Intervention: new_drug
## [1] 10.60636
## ------------------------------------------------------------
## BP_df$Intervention: placebo
## [1] 14.15058
14.15058/9.547415
## [1] 1.482137
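Equivalently, the ratio can be computed without hardcoding the values:
# Ratio of the largest to the smallest group standard deviation
sds <- tapply(BP_df$SBP, BP_df$Intervention, sd)
max(sds) / min(sds)   # same value as above, about 1.48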
The ratio of the largest to the smallest standard deviation is 1.48, which is less than 2, so the equal-variance assumption is reasonable. We proceed with step 1 of our analysis: testing the null hypothesis.
\(H_0: \mu_1 = \mu_2 = \mu_3\)
\(H_1: \mu_1 \neq \mu_2 \,\, OR \,\, \mu_1 \neq \mu_3 \,\, OR \,\, \mu_2 \neq \mu_3\)
ANOVA_mod <- aov(SBP ~ Intervention , data=BP_df)
summary(ANOVA_mod)
## Df Sum Sq Mean Sq F value Pr(>F)
## Intervention 2 50040 25020 185.8 <2e-16 ***
## Residuals 297 39985 135
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
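As a quick sanity check, the F value can be reproduced from the table above: each mean square is its sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares.
# Reproduce the F statistic from the ANOVA table
(50040 / 2) / (39985 / 297)   # about 185.8, matching the F value above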
The overall F test statistic is \(F_{stat}\) = 185.8 with a P-value < 2e-16. Since the P-value is less than 0.05, we reject the null hypothesis and conclude that the group means are not all equal.
Having rejected the null hypothesis that all group means are equal, we proceed to step 2: post-hoc comparisons. To determine which groups differ, we run all pairwise group comparisons while correcting for multiple testing. We will learn two methods of correction: the Bonferroni correction and Tukey's HSD. Generally, Tukey's HSD is the preferred method because it yields more accurate results; the Bonferroni correction is more conservative, giving a greater chance of failing to reject a false null hypothesis.
First, we start with the Bonferroni correction. The per-comparison significance level is calculated by dividing the original alpha by the number of comparisons (equivalently, each raw p-value is multiplied by the number of comparisons before being compared with the original alpha).
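With three pairwise comparisons, the adjusted per-comparison significance level is:
# Bonferroni-adjusted per-comparison alpha for three pairwise tests
0.05 / 3   # about 0.0167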
pairwise.t.test(BP_df$SBP, BP_df$Intervention, p.adj = "bonf")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: BP_df$SBP and BP_df$Intervention
##
## lisinopril new_drug
## new_drug 0.29 -
## placebo <2e-16 <2e-16
##
## P value adjustment method: bonferroni
Post-hoc analysis indicates that the lisinopril and new drug groups each differ significantly from the placebo group, but not from each other (p = 0.29), after the Bonferroni correction.
Tukey’s HSD cannot be performed using the pairwise.t.test() function. Instead, we use the TukeyHSD() function on the fitted aov model.
TukeyHSD(ANOVA_mod)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = SBP ~ Intervention, data = BP_df)
##
## $Intervention
## diff lwr upr p adj
## new_drug-lisinopril 2.73 -1.135199 6.595199 0.2209626
## placebo-lisinopril 28.66 24.794801 32.525199 0.0000000
## placebo-new_drug 25.93 22.064801 29.795199 0.0000000
The results of the Tukey HSD comparisons are consistent with those of the Bonferroni-corrected comparisons. Tukey’s HSD is generally considered to be more accurate.
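The Tukey intervals can also be inspected visually with base R’s plot method for TukeyHSD objects (figure not shown here):
# Plot the 95% family-wise confidence intervals for the pairwise differences
plot(TukeyHSD(ANOVA_mod))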
The Kruskal-Wallis test (also called the H test) is a nonparametric test that uses the ranks of combined simple random samples from three or more independent populations to test the null hypothesis that the populations have the same median. The F procedure is robust against deviations from normality, especially when the sample size is >= 30. However, if normality cannot be assumed, use the Kruskal-Wallis test, which has the following requirements: at least three independent simple random samples, each with at least five observations; there is no requirement that the populations be normally distributed.
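As an aside, because the test operates on ranks, it is closely related (though not identical) to a one-way ANOVA performed on the ranks of the pooled data; a quick sketch of that idea:
# Sketch: one-way ANOVA on the ranks of the pooled observations,
# an approximation of what the Kruskal-Wallis test does
summary(aov(rank(SBP) ~ Intervention, data = BP_df))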
Example: Using the dataset above, test the same claim using the Kruskal-Wallis test.
\(H_0: \text{median}_1 = \text{median}_2 = \text{median}_3\)
\(H_1: \text{median}_1 \neq \text{median}_2 \,\, OR \,\, \text{median}_1 \neq \text{median}_3 \,\, OR \,\, \text{median}_2 \neq \text{median}_3\)
kruskal.test(SBP ~ Intervention, data=BP_df)
##
## Kruskal-Wallis rank sum test
##
## data: SBP by Intervention
## Kruskal-Wallis chi-squared = 161.84, df = 2, p-value < 2.2e-16
Similar to the F procedure (one-way ANOVA), the Kruskal-Wallis test detected a significant difference among the groups.
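For reference, the test statistic is compared against a chi-square distribution with df = 2; the 0.05 critical value is:
# Chi-square critical value for the Kruskal-Wallis approximation (df = 2)
qchisq(0.95, df = 2)   # 5.991465; the observed statistic 161.84 far exceeds this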
The post-hoc pairwise comparisons following the Kruskal-Wallis test use the Wilcoxon rank-sum test and are implemented in pairwise.wilcox.test().
pairwise.wilcox.test(BP_df$SBP, BP_df$Intervention, p.adjust.method = "bonferroni")
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: BP_df$SBP and BP_df$Intervention
##
## lisinopril new_drug
## new_drug 0.22 -
## placebo <2e-16 <2e-16
##
## P value adjustment method: bonferroni
Post-hoc analysis indicates that the lisinopril and new_drug groups each differ significantly from the placebo group, but not from each other (p = 0.22), after the Bonferroni correction.