chicken diet
dataThe “chicken diet data” is a classic example in statistical analysis that is often used to demonstrate the use of different statistical tests. The data consists of the weights of chickens that were fed different diets over a period of time. Here is a brief summary of the story behind the chicken diet data:
A group of chickens were divided into three groups, with each group being fed a different diet: a control diet, a high-protein diet, and a low-protein diet.
The weights of the chickens were measured at the beginning and end of the experiment.
The goal of the experiment was to determine whether there was a significant difference in the weight gain of the chickens between the three groups.
To analyze the data, a statistical test (such as a t-test or an ANOVA) was used to compare the mean weight gain of the chickens between the three groups.
The results of the statistical test allowed the researchers to conclude whether or not there was a significant difference in the weight gain of the chickens between the three groups.
# Set the seed for reproducibility
set.seed(123)
# Generate the synthetic data
c.data <- data.frame(
Diet = rep(c("Control", "High Protein", "Low Protein"), each = 10),
Weight = c(rnorm(10, mean = 1, sd = 0.5), rnorm(10, mean = 1.5, sd = 0.5), rnorm(10, mean = 1.2, sd = 0.5)))
head(c.data)
## Diet Weight
## 1 Control 0.7197622
## 2 Control 0.8849113
## 3 Control 1.7793542
## 4 Control 1.0352542
## 5 Control 1.0646439
## 6 Control 1.8575325
The one-way ANOVA is a statistical test that is used to compare the means of three or more groups. In the context of the chicken diet data, the one-way ANOVA can be used to determine whether there is a significant difference in the weight gain of the chickens between the three groups (control, high-protein, and low-protein).
# Fit a one-way ANOVA model
model <- aov(Weight ~ Diet, data = c.data)
# Perform the ANOVA test
anova <- anova(model)
The aov()
function is used to fit a one-way ANOVA model
to the data, and the anova()
function is used to perform
the actual ANOVA test.
The aov() function fits a linear model to the data, where the response variable is the weight of the chickens and the predictor variable is the type of diet that they were fed. The anova() function then takes the fitted model as an input and performs the ANOVA test to determine whether there is a significant difference in the means of the groups.
Using separate functions to fit the model and perform the test allows
you to have more control over the analysis. For example, you can use the
aov()
function to fit different types of models (such as a
two-way ANOVA or a mixed-effects model) and then use the
anova()
function to perform the ANOVA test on the fitted
model. This separation of the model fitting and testing steps also
allows you to perform other types of statistical tests (such as a t-test
or a regression analysis) on the same data, using different
functions.
The results of the one-way ANOVA are typically presented in the form of an ANOVA table, which provides information about the variance between the groups and within the groups. Here is an example of how the ANOVA table for the chicken diet data might look:
# Print the results of the ANOVA test
print(anova)
## Analysis of Variance Table
##
## Response: Weight
## Df Sum Sq Mean Sq F value Pr(>F)
## Diet 2 2.3471 1.17355 4.9348 0.01491 *
## Residuals 27 6.4208 0.23781
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The first row of the table shows the degrees of freedom (Df) for the diet group, the sum of squares (Sum Sq) between the groups, the mean square (Mean Sq) between the groups, the F value, and the p-value (Pr(>F)). The second row shows the degrees of freedom for the residuals (errors), the sum of squares within the groups, and the mean square within the groups.
To interpret the results of the one-way ANOVA, you can focus on the p-value, which indicates the probability of obtaining the observed difference in means between the groups by chance, given that there is no actual difference in the population. If the p-value is less than the chosen alpha level (usually 0.05), then the difference in means between the groups is considered to be statistically significant, and you can conclude that there is a significant difference in the weight gain of the chickens between the three groups.
In the example ANOVA table above, the p-value is 0.0149, which is
less than 0.05. This means that the difference in means between the
groups is
statistically significant, and you can conclude
that there is a significant difference in the weight gain of the
chickens between the three groups.
A permutation test is a statistical procedure that involves rearranging the values in a dataset and comparing the resulting distribution of a test statistic to the distribution of the test statistic calculated from the original data. Permutation tests are often used as a non-parametric alternative to traditional hypothesis tests, particularly when the assumptions of parametric tests are not met.
Here is a tutorial on how to conduct a simple permutation test in R to test whether there is a significant difference between the means of two groups of biological data:
Please, download the data from the Google drive link. “ChickData.xlsx”
library(tidyverse)
# Load the chicken data from an excel table.
d <- readxl::read_excel("~/Downloads/ChickData.xlsx")
head(d)
## # A tibble: 6 × 2
## weight feed
## <dbl> <chr>
## 1 325 meatmeal
## 2 257 meatmeal
## 3 303 meatmeal
## 4 315 meatmeal
## 5 380 meatmeal
## 6 153 meatmeal
# how many observations in each diet?
table(d$feed)
##
## casein meatmeal
## 12 11
# let's look at a boxplot of weight gain by those 2 diets
boxplot(d$weight~d$feed, las = 1,
ylab = "weight (g)",
xlab = "feed",
main = "Weight by Feed")
# calculate the difference in sample MEANS
mean(d$weight[d$feed == "casein"]) # mean for casein
## [1] 323.5833
mean(d$weight[d$feed == "meatmeal"]) # mean for meatmeal
## [1] 276.9091
# lets calculate the absolute diff in means
test.stat1 <- abs(mean(d$weight[d$feed == "casein"]) -
mean(d$weight[d$feed == "meatmeal"]))
test.stat1
## [1] 46.67424
# Permutation Test
# for reproducability of results
set.seed(1979)
# the number of observations to sample
n <- length(d$feed)
# the number of permutation samples to take
P <- 100000
# the variable we will resample from
variable <- d$weight
# initialize a matrix to store the permutation data
PermSamples <- matrix(0, nrow = n, ncol = P)
# each column is a permutation sample of data
# now, get those permutation samples, using a loop
# let's take a moment to discuss what that code is doing
for(i in 1:P){
PermSamples[, i] <- sample(variable,
size = n,
replace = FALSE)
}
# we can take a quick look at the first 5 columns of PermSamples
PermSamples[, 1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 379 283 380 352 206
## [2,] 380 303 258 260 380
## [3,] 257 206 379 380 153
## [4,] 283 242 222 404 359
## [5,] 222 260 325 258 258
## [6,] 315 352 153 379 263
## [7,] 352 263 263 325 325
## [8,] 153 325 315 359 216
## [9,] 368 379 344 242 260
## [10,] 344 258 368 368 257
## [11,] 359 257 206 257 315
## [12,] 206 153 404 222 303
## [13,] 404 344 303 390 390
## [14,] 325 318 318 303 352
## [15,] 242 404 332 263 404
## [16,] 390 380 257 206 379
## [17,] 260 332 216 315 318
## [18,] 303 359 352 344 368
## [19,] 263 222 242 283 222
## [20,] 332 368 260 332 344
## [21,] 318 315 283 318 283
## [22,] 216 390 390 153 332
## [23,] 258 216 359 216 242
# initialize vectors to store all of the Test-stats
Perm.test.stat1 <- rep(0, P)
# loop through and calculate the test-stats
for (i in 1:P){
# calculate the perm-test-stat1 and save it
Perm.test.stat1[i] <- abs(mean(PermSamples[d$feed == "casein", i]) -
mean(PermSamples[d$feed == "meatmeal", i]))
}
# before going too far with this,
# let's remind ourselves of
# the TEST STATS
test.stat1
## [1] 46.67424
# and, let's calculate the permutation p-value
# notice how we can ask R a true/false question
(Perm.test.stat1 >= test.stat1)[1:10]
## [1] FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
# Calculate the p-value, for all P = 100,000
mean(Perm.test.stat1 >= test.stat1)
## [1] 0.09959
Disclaimer: This tutorial’s Permutation test part has been modified
mainly from
https://www.geeksforgeeks.org/permutation-hypothesis-test-in-r-programming/
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2
## [5] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
## [9] ggplot2_3.4.3 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] highr_0.10 cellranger_1.1.0 bslib_0.5.1 compiler_4.1.0
## [5] pillar_1.9.0 jquerylib_0.1.4 tools_4.1.0 digest_0.6.31
## [9] timechange_0.2.0 jsonlite_1.8.4 evaluate_0.21 lifecycle_1.0.3
## [13] gtable_0.3.3 pkgconfig_2.0.3 rlang_1.1.1 cli_3.6.1
## [17] rstudioapi_0.15.0 yaml_2.3.7 xfun_0.39 fastmap_1.1.1
## [21] withr_2.5.0 knitr_1.43 hms_1.1.3 generics_0.1.3
## [25] sass_0.4.6 vctrs_0.6.2 grid_4.1.0 tidyselect_1.2.0
## [29] glue_1.6.2 R6_2.5.1 fansi_1.0.4 readxl_1.4.2
## [33] rmarkdown_2.24 tzdb_0.3.0 magrittr_2.0.3 scales_1.2.1
## [37] htmltools_0.5.5 colorspace_2.1-0 utf8_1.2.3 stringi_1.7.12
## [41] munsell_0.5.0 cachem_1.0.8