Computational Biology & Data Science

One-way ANOVA example with `chicken diet` data

The “chicken diet data” is a classic example in statistical analysis that is often used to demonstrate the use of different statistical tests. The data consists of the weights of chickens that were fed different diets over a period of time. Here is a brief summary of the story behind the chicken diet data:

A group of chickens were divided into three groups, with each group being fed a different diet: a control diet, a high-protein diet, and a low-protein diet.
The weights of the chickens were measured at the beginning and end of the experiment.
The goal of the experiment was to determine whether there was a significant difference in the weight gain of the chickens between the three groups.
To analyze the data, a statistical test (such as a t-test or an ANOVA) was used to compare the mean weight gain of the chickens between the three groups.
The results of the statistical test allowed the researchers to conclude whether or not there was a significant difference in the weight gain of the chickens between the three groups.

# Set the seed for reproducibility
set.seed(123)

# Generate the synthetic data
c.data <- data.frame(
  Diet = rep(c("Control", "High Protein", "Low Protein"), each = 10),
  Weight = c(rnorm(10, mean = 1, sd = 0.5), rnorm(10, mean = 1.5, sd = 0.5), rnorm(10, mean = 1.2, sd = 0.5)))

head(c.data)

##      Diet    Weight
## 1 Control 0.7197622
## 2 Control 0.8849113
## 3 Control 1.7793542
## 4 Control 1.0352542
## 5 Control 1.0646439
## 6 Control 1.8575325

The one-way ANOVA is a statistical test that is used to compare the means of three or more groups. In the context of the chicken diet data, the one-way ANOVA can be used to determine whether there is a significant difference in the weight gain of the chickens between the three groups (control, high-protein, and low-protein).

# Fit a one-way ANOVA model
model <- aov(Weight ~ Diet, data = c.data)

# Perform the ANOVA test
anova <- anova(model)

The aov() function is used to fit a one-way ANOVA model to the data, and the anova() function is used to perform the actual ANOVA test.

The aov() function fits a linear model to the data, where the response variable is the weight of the chickens and the predictor variable is the type of diet that they were fed. The anova() function then takes the fitted model as an input and performs the ANOVA test to determine whether there is a significant difference in the means of the groups.

Using separate functions to fit the model and perform the test allows you to have more control over the analysis. For example, you can use the aov() function to fit different types of models (such as a two-way ANOVA or a mixed-effects model) and then use the anova() function to perform the ANOVA test on the fitted model. This separation of the model fitting and testing steps also allows you to perform other types of statistical tests (such as a t-test or a regression analysis) on the same data, using different functions.

The results of the one-way ANOVA are typically presented in the form of an ANOVA table, which provides information about the variance between the groups and within the groups. Here is an example of how the ANOVA table for the chicken diet data might look:

# Print the results of the ANOVA test
print(anova)

## Analysis of Variance Table
## 
## Response: Weight
##           Df Sum Sq Mean Sq F value  Pr(>F)  
## Diet       2 2.3471 1.17355  4.9348 0.01491 *
## Residuals 27 6.4208 0.23781                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The first row of the table shows the degrees of freedom (Df) for the diet group, the sum of squares (Sum Sq) between the groups, the mean square (Mean Sq) between the groups, the F value, and the p-value (Pr(>F)). The second row shows the degrees of freedom for the residuals (errors), the sum of squares within the groups, and the mean square within the groups.

To interpret the results of the one-way ANOVA, you can focus on the p-value, which indicates the probability of obtaining the observed difference in means between the groups by chance, given that there is no actual difference in the population. If the p-value is less than the chosen alpha level (usually 0.05), then the difference in means between the groups is considered to be statistically significant, and you can conclude that there is a significant difference in the weight gain of the chickens between the three groups.

In the example ANOVA table above, the p-value is 0.0149, which is less than 0.05. This means that the difference in means between the groups is statistically significant, and you can conclude that there is a significant difference in the weight gain of the chickens between the three groups.

Permutation Test

A permutation test is a statistical procedure that involves rearranging the values in a dataset and comparing the resulting distribution of a test statistic to the distribution of the test statistic calculated from the original data. Permutation tests are often used as a non-parametric alternative to traditional hypothesis tests, particularly when the assumptions of parametric tests are not met.

Here is a tutorial on how to conduct a simple permutation test in R to test whether there is a significant difference between the means of two groups of biological data:

Please, download the data from the Google drive link. “ChickData.xlsx”

library(tidyverse)
# Load the chicken data from an excel table.
d <- readxl::read_excel("~/Downloads/ChickData.xlsx")

head(d)

## # A tibble: 6 × 2
##   weight feed    
##    <dbl> <chr>   
## 1    325 meatmeal
## 2    257 meatmeal
## 3    303 meatmeal
## 4    315 meatmeal
## 5    380 meatmeal
## 6    153 meatmeal

# how many observations in each diet?
table(d$feed)

## 
##   casein meatmeal 
##       12       11

# let's look at a boxplot of weight gain by those 2 diets
boxplot(d$weight~d$feed, las = 1,
        ylab = "weight (g)",
        xlab = "feed",
        main = "Weight by Feed")

# calculate the difference in sample MEANS
mean(d$weight[d$feed == "casein"]) # mean for casein

## [1] 323.5833

mean(d$weight[d$feed == "meatmeal"]) # mean for meatmeal

## [1] 276.9091

# lets calculate the absolute diff in means
test.stat1 <- abs(mean(d$weight[d$feed == "casein"]) -
                    mean(d$weight[d$feed == "meatmeal"]))
test.stat1

## [1] 46.67424

# Permutation Test

# for reproducability of results
set.seed(1979)

# the number of observations to sample
n <- length(d$feed)

# the number of permutation samples to take
P <- 100000

# the variable we will resample from
variable <- d$weight

# initialize a matrix to store the permutation data
PermSamples <- matrix(0, nrow = n, ncol = P)

# each column is a permutation sample of data
# now, get those permutation samples, using a loop
# let's take a moment to discuss what that code is doing
for(i in 1:P){
  PermSamples[, i] <- sample(variable,
                             size = n,
                             replace = FALSE)
}

# we can take a quick look at the first 5 columns of PermSamples
PermSamples[, 1:5]

##       [,1] [,2] [,3] [,4] [,5]
##  [1,]  379  283  380  352  206
##  [2,]  380  303  258  260  380
##  [3,]  257  206  379  380  153
##  [4,]  283  242  222  404  359
##  [5,]  222  260  325  258  258
##  [6,]  315  352  153  379  263
##  [7,]  352  263  263  325  325
##  [8,]  153  325  315  359  216
##  [9,]  368  379  344  242  260
## [10,]  344  258  368  368  257
## [11,]  359  257  206  257  315
## [12,]  206  153  404  222  303
## [13,]  404  344  303  390  390
## [14,]  325  318  318  303  352
## [15,]  242  404  332  263  404
## [16,]  390  380  257  206  379
## [17,]  260  332  216  315  318
## [18,]  303  359  352  344  368
## [19,]  263  222  242  283  222
## [20,]  332  368  260  332  344
## [21,]  318  315  283  318  283
## [22,]  216  390  390  153  332
## [23,]  258  216  359  216  242

# initialize vectors to store all of the Test-stats
Perm.test.stat1 <- rep(0, P)

# loop through and calculate the test-stats
for (i in 1:P){
  # calculate the perm-test-stat1 and save it
  Perm.test.stat1[i] <- abs(mean(PermSamples[d$feed == "casein", i]) -
                              mean(PermSamples[d$feed == "meatmeal", i]))
  
}
# before going too far with this,
# let's remind ourselves of
# the TEST STATS
test.stat1

## [1] 46.67424

# and, let's calculate the permutation p-value
# notice how we can ask R a true/false question
(Perm.test.stat1 >= test.stat1)[1:10]

##  [1] FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE

# Calculate the p-value, for all P = 100,000
mean(Perm.test.stat1 >= test.stat1)

## [1] 0.09959

Disclaimer: This tutorial’s Permutation test part has been modified mainly from https://www.geeksforgeeks.org/permutation-hypothesis-test-in-r-programming/

sessionInfo()

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] lubridate_1.9.2 forcats_1.0.0   stringr_1.5.0   dplyr_1.1.2    
##  [5] purrr_1.0.1     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1   
##  [9] ggplot2_3.4.3   tidyverse_2.0.0
## 
## loaded via a namespace (and not attached):
##  [1] highr_0.10        cellranger_1.1.0  bslib_0.5.1       compiler_4.1.0   
##  [5] pillar_1.9.0      jquerylib_0.1.4   tools_4.1.0       digest_0.6.31    
##  [9] timechange_0.2.0  jsonlite_1.8.4    evaluate_0.21     lifecycle_1.0.3  
## [13] gtable_0.3.3      pkgconfig_2.0.3   rlang_1.1.1       cli_3.6.1        
## [17] rstudioapi_0.15.0 yaml_2.3.7        xfun_0.39         fastmap_1.1.1    
## [21] withr_2.5.0       knitr_1.43        hms_1.1.3         generics_0.1.3   
## [25] sass_0.4.6        vctrs_0.6.2       grid_4.1.0        tidyselect_1.2.0 
## [29] glue_1.6.2        R6_2.5.1          fansi_1.0.4       readxl_1.4.2     
## [33] rmarkdown_2.24    tzdb_0.3.0        magrittr_2.0.3    scales_1.2.1     
## [37] htmltools_0.5.5   colorspace_2.1-0  utf8_1.2.3        stringi_1.7.12   
## [41] munsell_0.5.0     cachem_1.0.8

Computational Biology & Data Science - Lecture 7

Yasin Kaymaz

2022-12-25

One-way ANOVA example with `chicken diet` data

Permutation Test

Computational Biology & Data Science - Lecture 7

Yasin Kaymaz

2022-12-25

One-way ANOVA example with chicken diet data

Permutation Test

One-way ANOVA example with `chicken diet` data