We’ll use the PlantGrowth data set that comes with R. It provides the weight of plants produced under two distinct treatment conditions and a control condition.

data <- PlantGrowth

We utilize the function sample n() [in the dplyr package] to get a sense of how the data looks.

set.seed(123)
dplyr::sample_n(data, 10)

   weight group
1    5.87  trt1
2    4.32  trt1
3    3.59  trt1
4    5.18  ctrl
5    5.14  ctrl
6    4.89  trt1
7    5.12  trt2
8    4.81  trt1
9    4.50  ctrl
10   4.69  trt1

The column “group” is known as a factor in R, while the different categories (“ctr”, “trt1”, “trt2”) are known as factor levels. The levels are listed in alphabetical order.

levels(data$group)

[1] "ctrl" "trt1" "trt2"

If the levels are not in the correct order automatically, reorder them as follows:

data$group <- ordered(data$group,
                         levels = c("ctrl", "trt1", "trt2"))

The dplyr package can be used to compute summary statistics (mean and sd) by groups.

Calculate summary statistics by groups – count, mean, and standard deviation:

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

group_by(data, group) %>%
  summarise(
    count = n(),
    mean = mean(weight, na.rm = TRUE),
    sd = sd(weight, na.rm = TRUE)
  )

# A tibble: 3 × 4
  group count  mean    sd
  <ord> <int> <dbl> <dbl>
1 ctrl     10  5.03 0.583
2 trt1     10  4.66 0.794
3 trt2     10  5.53 0.443

Visualize your information

Read R base graphs to learn how to utilize them. For an easy ggplot2-based data visualization, we’ll use the ggpubr R tool.

Visualize your data with ggpubr:

library("ggpubr")

Warning: package 'ggpubr' was built under R version 4.2.1

Loading required package: ggplot2

Warning: package 'ggplot2' was built under R version 4.2.1

ggboxplot(data, x = "group", y = "weight",
          color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          order = c("ctrl", "trt1", "trt2"),
          ylab = "Weight", xlab = "Treatment")

One-way ANOVA Test in R

Add error bars: mean_se

library("ggpubr")
ggline(data, x = "group", y = "weight",
       add = c("mean_se", "jitter"),
       order = c("ctrl", "trt1", "trt2"),
       ylab = "Weight", xlab = "Treatment")

R’s one-way ANOVA test

boxplot(weight ~ group, data = data,
        xlab = "Treatment", ylab = "Weight",
        frame = FALSE, col = c("#00AFBB", "#E7B800", "#FC4E07"))

For plot means

library("gplots")

Warning: package 'gplots' was built under R version 4.2.1


Attaching package: 'gplots'

The following object is masked from 'package:stats':

    lowess

plotmeans(weight ~ group, data = data, frame = FALSE,
          xlab = "Treatment", ylab = "Weight",
          main="Mean Plot with 95% CI")

Warning in plot.xy(xy.coords(x, y), type = type, ...): "frame" is not a
graphical parameter

Warning in axis(1, at = 1:length(means), labels = legends, ...): "frame" is not
a graphical parameter

Warning in plot.xy(xy.coords(x, y), type = type, ...): "frame" is not a
graphical parameter

Calculate the one-way ANOVA test

We want to see if the average weights of the plants in the three experimental circumstances vary significantly.

This question can be answered using the R function aov(). Summary of the function The analysis of the variance model is summarised using aov().

Perform an analysis of variance.

res.aov <- aov(weight ~ group, data = data)

Summary of the analysis

summary(res.aov)

            Df Sum Sq Mean Sq F value Pr(>F)  
group        2  3.766  1.8832   4.846 0.0159 *
Residuals   27 10.492  0.3886                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The columns F value and Pr(>F) in the output corresponding to the p-value of the test.

The results of one-way ANOVA testing should be interpreted.

We can conclude that there are significant differences between the groups highlighted with “*” in the model summary because the p-value is less than the significance level of 0.05.

Multiple pairwise comparisons between group means

Multiple Tukey pairwise comparisons

We can compute Tukey HSD (Tukey Honest Significant Differences, R function: TukeyHSD()) for doing numerous pairwise comparisons between the means of groups because the ANOVA test is significant.

The fitted ANOVA is passed to TukeyHD() as an input.

TukeyHSD(res.aov)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = weight ~ group, data = data)

$group
            diff        lwr       upr     p adj
trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
trt2-ctrl  0.494 -0.1972161 1.1852161 0.1979960
trt2-trt1  0.865  0.1737839 1.5562161 0.0120064

lwr, upr: the lower and upper-end points of the confidence interval at 95 percent (default) p adj: p-value after multiple comparisons adjustment.

Only the difference between trt2 and trt1 is significant, as shown by the output, with an adjusted p-value of 0.012.

Using the multcomp package, perform several comparisons.

The function glht() [in the multcomp package] can be used to do multiple comparison processes for an ANOVA. General linear hypothesis tests are abbreviated as glht. The following is a simplified format:

library(multcomp)

Warning: package 'multcomp' was built under R version 4.2.1

Loading required package: mvtnorm

Loading required package: survival

Loading required package: TH.data

Warning: package 'TH.data' was built under R version 4.2.1

Loading required package: MASS


Attaching package: 'MASS'

The following object is masked from 'package:dplyr':

    select


Attaching package: 'TH.data'

The following object is masked from 'package:MASS':

    geyser

glht(model, lincft)

model: a model that has been fitted, such as an object returned by aov ().

lincft() specifies the linear hypotheses that will be tested. Objects provided from the function mcp are used to specify multiple comparisons in ANOVA models ().

For a one-way ANOVA, use glht() to make numerous pairwise comparisons:

library(multcomp)
summary(glht(res.aov, linfct = mcp(group = "Tukey")))


     Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts


Fit: aov(formula = weight ~ group, data = data)

Linear Hypotheses:
                 Estimate Std. Error t value Pr(>|t|)  
trt1 - ctrl == 0  -0.3710     0.2788  -1.331   0.3908  
trt2 - ctrl == 0   0.4940     0.2788   1.772   0.1979  
trt2 - trt1 == 0   0.8650     0.2788   3.103   0.0121 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)

T-test with pairs

Pairwise comparisons across group levels with various testing corrections can also be calculated using the pairewise.t.test() function.

pairwise.t.test(data$weight, data$group,
                 p.adjust.method = "BH")


    Pairwise comparisons using t tests with pooled SD 

data:  data$weight and data$group 

     ctrl  trt1 
trt1 0.194 -    
trt2 0.132 0.013

P value adjustment method: BH

The output is a table of pairwise comparison p-values. The Benjamini-Hochberg method was used to alter the p-values in this case.

Examine the assumption of homogeneity of variance

The residuals versus fits plot can be used to assess variance homogeneity.

There are no clear correlations between residuals and fitted values (the mean of each group) in the plot below, which is good. As a result, we can assume that the variances are homogeneous.

1: Variance homogeneity

plot(res.aov, 1)

ANOVA (one-way) Outliers are discovered in R Points 17, 15, 4, which can have a significant impact on normality and homogeneity of variance.

To meet the test assumptions, it can be beneficial to remove outliers.

The homogeneity of variances can also be checked using Bartlett’s or Levene’s tests.

Levene’s test is recommended since it is less sensitive to deviations from normal distribution. The leveneTest() method [from the car package] will be used:

library(car)

Warning: package 'car' was built under R version 4.2.1

Loading required package: carData

Warning: package 'carData' was built under R version 4.2.1


Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

leveneTest(weight ~ group, data = data)

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  2  1.1192 0.3412
      27

The p-value is not less than the significance level of 0.05, as seen in the output above. This indicates that there is no indication that the variance across groups is statistically significant.

As a result, we can infer that the variations in the different treatment groups are homogeneous.

Relaxing the premise of homogeneity of variance

ANOVA test with no equal variance assumption

oneway.test(weight ~ group, data = data)


    One-way analysis of means (not assuming equal variances)

data:  weight and group
F = 5.181, num df = 2.000, denom df = 17.128, p-value = 0.01739

Pairwise t-tests with no assumption of equal variances

pairwise.t.test(data$weight, data$group,
                 p.adjust.method = "BH", pool.sd = FALSE)


    Pairwise comparisons using t tests with non-pooled SD 

data:  data$weight and data$group 

     ctrl  trt1 
trt1 0.250 -    
trt2 0.072 0.028

P value adjustment method: BH

Examine the presumption of normality

The residuals’ normal probability plot is used to verify that the residuals are normally distributed. It should be about in a straight line.

2: Normality

plot(res.aov, 2)

ANOVA (one-way) R-based test

We can infer normality because all of the points lie roughly along this reference line.

Extract the residuals

aov_residuals <- residuals(object = res.aov )

Run Shapiro-Wilk test

Shapiro-Wilk normality test

shapiro.test(x = aov_residuals )


    Shapiro-Wilk normality test

data:  aov_residuals
W = 0.96607, p-value = 0.4379

The Shapiro-Wilk test on the ANOVA residuals (W = 0.96, p = 0.43), which finds no evidence of normality violation, supports the previous conclusion.

ANOVA test with a non-parametric alternative

The Kruskal-Wallis rank-sum test is a non-parametric alternative to one-way ANOVA that can be employed when the ANOVA assumptions are not met.

kruskal.test(weight ~ group, data = data)


    Kruskal-Wallis rank sum test

data:  weight by group
Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842

EXERCISE 5 (One-way ANOVA) - STAT 55

Ian Duhaylungsod

2022-09-27