What this lab does (read me first):
- You will complete a fully guided one-way ANOVA for HCNp values for each Organ studied in this paper (with plots, assumptions, transformations).
- Then you will replicate the workflow for Protein ~ Organ - After each task, answer the numbered interpretation questions in your own words.
As always work through the lab one chunk of code at a time and answer the student questions as they occur. This lab is different as once you finish the guided work for the HCNp values you’ll need to copy and paste the code to do the same plots and analyses using the protein data. You will use the same procedures as last time - create a new folder for this lab and save the .Rmd file in that folder. Within that folder create a new folder called “data” which is where you will put the data files. I’m giving you all three data files from the paper you discussed on Wednesday but this lab will only use the Additional file 1 data to analyze.
Load packages, locate the CSV and coerce Organ
to a
factor so ANOVA treats it as a categorical group variable.
suppressPackageStartupMessages({
library(readr) # fast, tidy CSV reading
library(dplyr) # data wrangling verbs
library(ggplot2) # plotting
library(car) # Levene's test & power transforms
library(broom) # tidy model outputs
library(knitr) # kable tables
})
knitr::opts_chunk$set(fig.width = 6, fig.asp = 1)
# Load and prepare the dataset for ANOVA
# Read in the CSV file
# - readr::read_csv() reads a CSV (comma-separated values) file into R as a tibble (a tidyverse-friendly version of a data frame)
hc <- readr::read_csv(here::here("data", "Additional file 1.csv"))
glimpse(hc) #look at the data to make sure everything looks right
## Rows: 80
## Columns: 3
## $ Organ <chr> "young leaf", "young leaf", "young leaf", "young leaf", "young…
## $ Protein <dbl> 12.71, 13.42, 12.19, 12.18, 14.94, 13.11, 12.94, 13.77, 10.21,…
## $ HCNp <dbl> 68.300000, 68.700000, 65.200000, 64.500000, 64.214383, 69.9803…
hc1 <- hc %>% #creates a new dataframe with Organ (categorical) and HCNp
select(Organ, HCNp) %>%
mutate(Organ = factor(Organ)) # assures that Organ is categorical
Student questions (Setup):
Organ
to a factor before
ANOVA?glimpse()
command and what types of variables are they (e.g., numeric, character)?
Young leaf, young leaf, young leaf, young leaf over and over. They are
categorical. Organ = category. Category is young leaf.Plots help us see patterns, spread, and outliers before modeling. Boxplots summarize the distribution per group; jittered points show raw data.
#tells knitr how big to make any figures (6 inches by 4 inches) produced #in this chunk
ggplot(
data = hc1,
aes(x = Organ, y = HCNp, fill = Organ)
) +
geom_boxplot(
alpha = 0.75,
outlier.shape = NA
) +
# Draws boxplots
# alpha = 0.75 makes boxes semi-transparent so points are visible on top.
# outlier.shape = NA hides the built-in outlier dots (we’ll show raw #points ourselves). Add raw observations as jittered points so they don’t #stack vertically.
geom_jitter(
aes(color = Organ),
width = 0.15,
height = 0,
alpha = 0.6,
size = 1
) +
# Map Organ to 'color' so point outlines match the box color.
# width controls horizontal jitter; height = 0 keeps true HCNp values.
# alpha < 1 and small size help with overplotting.
# Title and axis labels.
labs(
title = "HCNp across organs",
y = "HCNp",
x = "Organ"
) +
# Theme tweaks below:
# - rotate x labels vertically for long organ names
# - drop legend (x axis already identifies groups)
theme(
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
legend.position = "none"
)
Student questions:
ANOVA tests whether all group means are equal vs at least one differs. It partitions total variability into between-group and within-group components and compares them via an F-ratio.
# Safety: ensure required columns exist and drop NAs (if there is missing
#data in an excel file, make sure the excel file has NA in that cell)
stopifnot(all(c("HCNp","Organ") %in% names(hc1)))
# Fit the ANOVA model (HCNp as response; Organ as categorical predictor)
fit1 <- aov(HCNp ~ Organ, data = hc1)
# Prints the ANOVA table so it appears in the knitted output
cat("\nANOVA table: HCNp ~ Organ\n")
##
## ANOVA table: HCNp ~ Organ
print(summary(fit1))
## Df Sum Sq Mean Sq F value Pr(>F)
## Organ 9 30900 3433 381.6 <2e-16 ***
## Residuals 70 630 9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Context: group sizes & means (helps when interpreting differences)
means_hcn <- hc1 %>%
dplyr::group_by(Organ) %>%
dplyr::summarise(
n = dplyr::n(),
mean = mean(HCNp),
sd = sd(HCNp),
.groups = "drop"
)
knitr::kable(means_hcn, digits = 3, caption = "Group sizes and means for HCNp by Organ")# kable (from knitr) turns a data frame/tibble/matrix into #a formatted table using the means_hcn when you knit an R Markdown. It allows you to #add captions, rounding, alignment, and custom column names.
Organ | n | mean | sd |
---|---|---|---|
Fully developed flower (white) | 8 | 8.662 | 2.192 |
Fully developed flower (yellow) | 8 | 6.235 | 1.253 |
Intermediate fruit | 8 | 2.331 | 0.753 |
large flower bud | 8 | 10.030 | 1.728 |
Mature fruit | 8 | 0.093 | 0.081 |
mature leaf | 8 | 6.464 | 1.708 |
medium leaf | 8 | 32.111 | 1.572 |
small flower bud | 8 | 25.255 | 3.823 |
Young fruit | 8 | 32.055 | 7.080 |
young leaf | 8 | 67.357 | 3.147 |
# --- Extract reporting numbers ---
# Build the ANOVA-like summary table from the fitted model object `fit1`.
# For aov objects, summary(fit1)[[1]] returns a matrix/data.frame with rows for model terms (e.g., "Organ") and "Residuals", and columns like "Df", "F value", "Pr(>F)".
aov_tab <- summary(fit1)[[1]]
# Pull the F statistic for the factor named exactly "Organ" from the #table.Assumes the row name is "Organ" (i.e., the model was fit with Organ #as a factor term).
F_value <- aov_tab["Organ","F value"]
# Pull the p-value for the Organ effect (column name "Pr(>F)").
p_value <- aov_tab["Organ","Pr(>F)"]
# Extract degrees of freedom:
# df1 = numerator df for Organ (its row's "Df")
# df2 = denominator df from the Residuals row
df1 <- aov_tab["Organ","Df"]; df2 <- aov_tab["Residuals","Df"]
# Print a compact report line:
# F(df1, df2) = F_value, p = p_value
# Note: %.4g prints p in up to 4 significant digits (switching to scientific notation if tiny).
cat(sprintf("\nReport: F(%d, %d) = %.3f, p = %.4g\n", df1, df2, F_value, p_value))
##
## Report: F(9, 70) = 381.636, p = 8.289e-56
Student questions (A2):
Write a one-sentence result with the F statistic, df, and
p-value. Is there evidence that mean HCNp differs among organs?
The results indicate a vary large F-value (381.636) with 9 degrees of
freedom (8 + 1) between groups and 70 degrees of freedom within groups.
This indicates the variance between groups is much greater than the
variance within groups. These values, along with the p-value (which is
very close to 0) indicate that variance is significantly different
between at least 1 of the groups.
Looking at the means table, which organs appear
most different in average HCNp? (This is just
descriptive; we will do post hoc testing to confirm pairwise
differences.)
Young leaf had the highest average HCNp (much larger than the rest) and
mature fruit had the lowest average HCNp (smaller than the
rest)
ANOVA assumes independent observations, roughly normal residuals, and homogeneous variances. We check residual plots and use tests as guides (not absolute rules).
# Residuals vs Fitted: look for no pattern (looking for a random cloud of #points), and roughly constant vertical spread
ggplot(data.frame(fit = fitted(fit1), res = resid(fit1)), aes(fit, res)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(title = "Residuals vs Fitted (HCNp ~ Organ)", x = "Fitted values", y = "Residuals")
# QQ plot: residuals should follow the reference line if approximately normal
ggplot(data.frame(res = resid(fit1)), aes(sample = res)) +
stat_qq() + stat_qq_line() +
labs(title = "QQ plot of residuals (HCNp ~ Organ)", x = "Theoretical Quantiles", y = "Sample Quantiles")
# 1) Normality of residuals (ANOVA assumes normally distributed errors)
# We test the residuals from the fitted model, not each raw group,
# because ANOVA is about model errors, not the raw data per se.
cat("\nShapiro–Wilk normality test (residuals): HCNp ~ Organ\n")
##
## Shapiro–Wilk normality test (residuals): HCNp ~ Organ
res <- residuals(fit1)
print(shapiro.test(resid(fit1)))
##
## Shapiro-Wilk normality test
##
## data: resid(fit1)
## W = 0.87512, p-value = 1.254e-06
# Interpretation tip:
# p < 0.05 → evidence residuals deviate from normality
# p ≥ 0.05 → no strong evidence against normality
# Remember: with moderate n, ANOVA is fairly robust to mild non-normality.
# 2) test for Homogeneity of variances (equal variances across groups)
cat("\nLevene's test for equal variances (median-centered): HCNp ~ Organ\n")
##
## Levene's test for equal variances (median-centered): HCNp ~ Organ
print(leveneTest(HCNp ~ Organ, data = hc1, center = median))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 9 4.9223 4.049e-05 ***
## 70
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 3) Welch's ANOVA is a statistical test used as an alternative to the traditional one-way ANOVA when the assumption of equal variances across groups is violated. It is particularly useful for comparing the means of three or more groups when their variances are unequal. Welch's ANOVA still assumes ~normality)
# Use when Levene indicates heteroscedasticity.
cat("\nWelch's ANOVA (robust to unequal variances): HCNp ~ Organ\n")
##
## Welch's ANOVA (robust to unequal variances): HCNp ~ Organ
print(oneway.test(HCNp ~ Organ, data = hc1, var.equal = FALSE))
##
## One-way analysis of means (not assuming equal variances)
##
## data: HCNp and Organ
## F = 752.54, num df = 9.000, denom df = 25.805, p-value < 2.2e-16
# Interpretation tip:
# If Welch is significant while classic ANOVA disagrees, trust Welch when variances are unequal.
# Practical guidance:
# - These tests inform, they don’t automatically veto ANOVA.
# - If assumptions look shaky: consider data transformations, Welch’s ANOVA,
# or nonparametric alternatives (e.g., Kruskal–Wallis) depending on the issue.
Student questions (A3):
If variance increases with the mean (fan shape) or residuals are skewed, a log (or power) transform can stabilize variance and improve normality.
.safe_log <- function(x) if (all(x > 0, na.rm = TRUE)) log(x) else log1p(x)
hc_hcn <- hc1 %>% dplyr::mutate(HCNp_log = .safe_log(HCNp))
fit1_log <- aov(HCNp_log ~ Organ, data = hc_hcn)
cat("\nANOVA (log scale): HCNp_log ~ Organ\n")
##
## ANOVA (log scale): HCNp_log ~ Organ
print(summary(fit1_log))
## Df Sum Sq Mean Sq F value Pr(>F)
## Organ 9 109.2 12.129 425.2 <2e-16 ***
## Residuals 70 2.0 0.029
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
cat("\nLevene on log scale (median-centered):\n")
##
## Levene on log scale (median-centered):
print(leveneTest(HCNp_log ~ Organ, data = hc_hcn, center = median))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 9 2.4378 0.01788 *
## 70
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ggplot(data.frame(fit = fitted(fit1_log), res = resid(fit1_log)), aes(fit, res)) +
geom_point(alpha = 0.6) + geom_hline(yintercept = 0, linetype = "dashed") +
labs(title = "Residuals vs Fitted (log HCNp ~ Organ)", x = "Fitted (log HCNp)", y = "Residuals")
ggplot(data.frame(res = resid(fit1_log)), aes(sample = res)) +
stat_qq() + stat_qq_line() +
labs(title = "QQ plot (log HCNp ~ Organ)", x = "Theoretical Quantiles", y = "Sample Quantiles")
Student questions (A4):
Did the log transform improve the diagnostics? Cite plots and Levene’s p-value. Yes, the log transformation improved diagnostics by a little bit. There is still some variation in the residual vs fitted plot, but it makes less of a shape. There is still some variation in the QQ plot at the tails. Levene’s p-value is greater than 0.001, meaning that the data is homogeneus and the data has equal variance.
Which model will you report (original vs log)? Justify briefly
(1–2 sentences).
I will report the log transformed analysis because the transformation
resulted in equal variances of the groups, which qualifies the data for
a classic ANOVA. The p-value from the ANOVA is also smaller than in the
transformed data than in the original data. —
If your chosen ANOVA is significant, Tukey HSD controls the family-wise error while comparing all pairs. The code below is there for the log transformed values. You would use the fit1 if you are doing the Tukey test with the non-transformed data (the original dataset) - I don’t care which one you choose, but whichever you choose justify the choice
tuk <- TukeyHSD(if (exists("fit1_log")) fit1_log else fit1) # use transformed fit if you chose it
tuk_df <- broom::tidy(tuk) %>%
dplyr::mutate(sig = ifelse(adj.p.value < 0.05, "Significant", "Not significant")) %>%
dplyr::arrange(adj.p.value)
knitr::kable(tuk_df, digits = 3, caption = "Tukey HSD pairwise comparisons (HCNp)")
term | contrast | null.value | estimate | conf.low | conf.high | adj.p.value | sig |
---|---|---|---|---|---|---|---|
Organ | Intermediate fruit-Fully developed flower (white) | 0 | -1.066 | -1.342 | -0.790 | 0.000 | Significant |
Organ | Mature fruit-Fully developed flower (white) | 0 | -2.160 | -2.436 | -1.884 | 0.000 | Significant |
Organ | medium leaf-Fully developed flower (white) | 0 | 1.252 | 0.976 | 1.528 | 0.000 | Significant |
Organ | small flower bud-Fully developed flower (white) | 0 | 1.012 | 0.736 | 1.288 | 0.000 | Significant |
Organ | Young fruit-Fully developed flower (white) | 0 | 1.232 | 0.956 | 1.508 | 0.000 | Significant |
Organ | young leaf-Fully developed flower (white) | 0 | 1.977 | 1.701 | 2.253 | 0.000 | Significant |
Organ | Intermediate fruit-Fully developed flower (yellow) | 0 | -0.786 | -1.062 | -0.510 | 0.000 | Significant |
Organ | Mature fruit-Fully developed flower (yellow) | 0 | -1.881 | -2.157 | -1.605 | 0.000 | Significant |
Organ | medium leaf-Fully developed flower (yellow) | 0 | 1.532 | 1.256 | 1.808 | 0.000 | Significant |
Organ | small flower bud-Fully developed flower (yellow) | 0 | 1.291 | 1.015 | 1.567 | 0.000 | Significant |
Organ | Young fruit-Fully developed flower (yellow) | 0 | 1.511 | 1.235 | 1.787 | 0.000 | Significant |
Organ | young leaf-Fully developed flower (yellow) | 0 | 2.257 | 1.981 | 2.533 | 0.000 | Significant |
Organ | large flower bud-Intermediate fruit | 0 | 1.209 | 0.933 | 1.485 | 0.000 | Significant |
Organ | Mature fruit-Intermediate fruit | 0 | -1.095 | -1.371 | -0.819 | 0.000 | Significant |
Organ | mature leaf-Intermediate fruit | 0 | 0.804 | 0.528 | 1.080 | 0.000 | Significant |
Organ | medium leaf-Intermediate fruit | 0 | 2.318 | 2.042 | 2.594 | 0.000 | Significant |
Organ | small flower bud-Intermediate fruit | 0 | 2.077 | 1.801 | 2.353 | 0.000 | Significant |
Organ | Young fruit-Intermediate fruit | 0 | 2.298 | 2.022 | 2.574 | 0.000 | Significant |
Organ | young leaf-Intermediate fruit | 0 | 3.043 | 2.767 | 3.319 | 0.000 | Significant |
Organ | Mature fruit-large flower bud | 0 | -2.304 | -2.580 | -2.028 | 0.000 | Significant |
Organ | medium leaf-large flower bud | 0 | 1.109 | 0.833 | 1.385 | 0.000 | Significant |
Organ | small flower bud-large flower bud | 0 | 0.868 | 0.592 | 1.144 | 0.000 | Significant |
Organ | Young fruit-large flower bud | 0 | 1.088 | 0.812 | 1.364 | 0.000 | Significant |
Organ | young leaf-large flower bud | 0 | 1.834 | 1.558 | 2.110 | 0.000 | Significant |
Organ | mature leaf-Mature fruit | 0 | 1.899 | 1.623 | 2.175 | 0.000 | Significant |
Organ | medium leaf-Mature fruit | 0 | 3.412 | 3.136 | 3.688 | 0.000 | Significant |
Organ | small flower bud-Mature fruit | 0 | 3.172 | 2.896 | 3.448 | 0.000 | Significant |
Organ | Young fruit-Mature fruit | 0 | 3.392 | 3.116 | 3.668 | 0.000 | Significant |
Organ | young leaf-Mature fruit | 0 | 4.137 | 3.861 | 4.413 | 0.000 | Significant |
Organ | medium leaf-mature leaf | 0 | 1.514 | 1.238 | 1.790 | 0.000 | Significant |
Organ | small flower bud-mature leaf | 0 | 1.273 | 0.997 | 1.549 | 0.000 | Significant |
Organ | Young fruit-mature leaf | 0 | 1.493 | 1.217 | 1.769 | 0.000 | Significant |
Organ | young leaf-mature leaf | 0 | 2.239 | 1.963 | 2.515 | 0.000 | Significant |
Organ | young leaf-small flower bud | 0 | 0.966 | 0.690 | 1.241 | 0.000 | Significant |
Organ | young leaf-Young fruit | 0 | 0.745 | 0.469 | 1.021 | 0.000 | Significant |
Organ | young leaf-medium leaf | 0 | 0.725 | 0.449 | 1.001 | 0.000 | Significant |
Organ | large flower bud-Fully developed flower (yellow) | 0 | 0.423 | 0.147 | 0.699 | 0.000 | Significant |
Organ | mature leaf-large flower bud | 0 | -0.405 | -0.681 | -0.129 | 0.000 | Significant |
Organ | Fully developed flower (yellow)-Fully developed flower (white) | 0 | -0.279 | -0.555 | -0.003 | 0.045 | Significant |
Organ | mature leaf-Fully developed flower (white) | 0 | -0.262 | -0.537 | 0.014 | 0.078 | Not significant |
Organ | small flower bud-medium leaf | 0 | -0.241 | -0.516 | 0.035 | 0.141 | Not significant |
Organ | Young fruit-small flower bud | 0 | 0.220 | -0.056 | 0.496 | 0.233 | Not significant |
Organ | large flower bud-Fully developed flower (white) | 0 | 0.144 | -0.132 | 0.420 | 0.792 | Not significant |
Organ | Young fruit-medium leaf | 0 | -0.020 | -0.296 | 0.256 | 1.000 | Not significant |
Organ | mature leaf-Fully developed flower (yellow) | 0 | 0.018 | -0.258 | 0.294 | 1.000 | Not significant |
Student questions (A5):
Now repeat the workflow for analysis of HCNp for Protein. The steps are listed; write/copy and the code yourself. This tests your ability to execute the analysis and interpret results independently.
General tip when copying/pasting then editing code: make sure to change the names of the variables you create - for example, make sure to change hc1 to hc2, and fit1 to fit2, etc.
Why: See relative medians, spread, and outliers before modeling.
Do: Create a boxplot with jittered points
(geom_boxplot(...) + geom_jitter(...)
). Make the panel
square using aspect.ratio = 1
and rotate x-axis labels for
readability.
suppressPackageStartupMessages({
library(readr) # fast, tidy CSV reading
library(dplyr) # data wrangling verbs
library(ggplot2) # plotting
library(car) # Levene's test & power transforms
library(broom) # tidy model outputs
library(knitr) # kable tables
})
knitr::opts_chunk$set(fig.width = 6, fig.asp = 1)
# Load and prepare the dataset for ANOVA
# Read in the CSV file
# - readr::read_csv() reads a CSV (comma-separated values) file into R as a tibble (a tidyverse-friendly version of a data frame)
hc <- readr::read_csv(here::here("data", "Additional file 1.csv"))
## Rows: 80 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Organ
## dbl (2): Protein, HCNp
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(hc) #look at the data to make sure everything looks right
## Rows: 80
## Columns: 3
## $ Organ <chr> "young leaf", "young leaf", "young leaf", "young leaf", "young…
## $ Protein <dbl> 12.71, 13.42, 12.19, 12.18, 14.94, 13.11, 12.94, 13.77, 10.21,…
## $ HCNp <dbl> 68.300000, 68.700000, 65.200000, 64.500000, 64.214383, 69.9803…
hc2 <- hc %>% #creates a new dataframe with Organ (categorical) and protein
select(Organ, Protein) %>%
mutate(Organ = factor(Organ)) # assures that Organ is categorical
#tells knitr how big to make any figures (6 inches by 4 inches) produced #in this chunk
ggplot(
data = hc2,
aes(x = Organ, y = Protein, fill = Organ)
) +
geom_boxplot(
alpha = 0.75,
outlier.shape = NA
) +
# Draws boxplots
# alpha = 0.75 makes boxes semi-transparent so points are visible on top.
# outlier.shape = NA hides the built-in outlier dots (we’ll show raw #points ourselves). Add raw observations as jittered points so they don’t #stack vertically.
geom_jitter(
aes(color = Organ),
width = 0.15,
height = 0,
alpha = 0.6,
size = 1
) +
# Map Organ to 'color' so point outlines match the box color.
# width controls horizontal jitter; height = 0 keeps true protein values.
# alpha < 1 and small size help with overplotting.
# Title and axis labels.
labs(
title = "Protein across organs",
y = "Protein",
x = "Organ"
) +
# Theme tweaks below:
# - rotate x labels vertically for long organ names
# - drop legend (x axis already identifies groups)
theme(
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1),
legend.position = "none"
)
ANOVA tests whether all group means are equal vs at least one differs. It partitions total variability into between-group and within-group components and compares them via an F-ratio.
# Safety: ensure required columns exist and drop NAs (if there is missing
#data in an excel file, make sure the excel file has NA in that cell)
stopifnot(all(c("Protein","Organ") %in% names(hc2)))
# Fit the ANOVA model (HCNp as response; Organ as categorical predictor)
fit2 <- aov(Protein ~ Organ, data = hc2)
# Prints the ANOVA table so it appears in the knitted output
cat("\nANOVA table: Protein ~ Organ\n")
##
## ANOVA table: Protein ~ Organ
print(summary(fit2))
## Df Sum Sq Mean Sq F value Pr(>F)
## Organ 9 657.5 73.06 21.68 <2e-16 ***
## Residuals 70 235.9 3.37
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Context: group sizes & means (helps when interpreting differences)
means_protein <- hc2 %>%
dplyr::group_by(Organ) %>%
dplyr::summarise(
n = dplyr::n(),
mean = mean(Protein),
sd = sd(Protein),
.groups = "drop"
)
knitr::kable(means_protein, digits = 3, caption = "Group sizes and means for Protein by Organ")# kable (from knitr) turns a data frame/tibble/matrix into #a formatted table using the means_hcn when you knit an R Markdown. It allows you to #add captions, rounding, alignment, and custom column names.
Organ | n | mean | sd |
---|---|---|---|
Fully developed flower (white) | 8 | 12.137 | 1.604 |
Fully developed flower (yellow) | 8 | 11.470 | 2.366 |
Intermediate fruit | 8 | 7.862 | 1.357 |
large flower bud | 8 | 13.598 | 2.460 |
Mature fruit | 8 | 7.338 | 1.271 |
mature leaf | 8 | 8.717 | 0.617 |
medium leaf | 8 | 10.308 | 1.128 |
small flower bud | 8 | 15.543 | 1.620 |
Young fruit | 8 | 15.820 | 3.304 |
young leaf | 8 | 13.157 | 0.907 |
# --- Extract reporting numbers ---
# Build the ANOVA-like summary table from the fitted model object `fit2`.
# For aov objects, summary(fit2)[[1]] returns a matrix/data.frame with rows for model terms (e.g., "Organ") and "Residuals", and columns like "Df", "F value", "Pr(>F)".
aov_tab <- summary(fit2)[[1]]
# Pull the F statistic for the factor named exactly "Organ" from the #table.Assumes the row name is "Organ" (i.e., the model was fit with Organ #as a factor term).
F_value <- aov_tab["Organ","F value"]
# Pull the p-value for the Organ effect (column name "Pr(>F)").
p_value <- aov_tab["Organ","Pr(>F)"]
# Extract degrees of freedom:
# df1 = numerator df for Organ (its row's "Df")
# df2 = denominator df from the Residuals row
df1 <- aov_tab["Organ","Df"]; df2 <- aov_tab["Residuals","Df"]
# Print a compact report line:
# F(df1, df2) = F_value, p = p_value
# Note: %.4g prints p in up to 4 significant digits (switching to scientific notation if tiny).
cat(sprintf("\nReport: F(%d, %d) = %.3f, p = %.4g\n", df1, df2, F_value, p_value))
##
## Report: F(9, 70) = 21.680, p = 5.497e-17
# Residuals vs Fitted: look for no pattern (looking for a random cloud of #points), and roughly constant vertical spread
ggplot(data.frame(fit = fitted(fit2), res = resid(fit2)), aes(fit, res)) +
geom_point(alpha = 0.6) +
geom_hline(yintercept = 0, linetype = "dashed") +
labs(title = "Residuals vs Fitted (Protein ~ Organ)", x = "Fitted values", y = "Residuals")
# QQ plot: residuals should follow the reference line if approximately normal
ggplot(data.frame(res = resid(fit2)), aes(sample = res)) +
stat_qq() + stat_qq_line() +
labs(title = "QQ plot of residuals (Protein ~ Organ)", x = "Theoretical Quantiles", y = "Sample Quantiles")
# 1) Normality of residuals (ANOVA assumes normally distributed errors)
# We test the residuals from the fitted model, not each raw group,
# because ANOVA is about model errors, not the raw data per se.
cat("\nShapiro–Wilk normality test (residuals): Protein ~ Organ\n")
##
## Shapiro–Wilk normality test (residuals): Protein ~ Organ
res <- residuals(fit2)
print(shapiro.test(resid(fit2)))
##
## Shapiro-Wilk normality test
##
## data: resid(fit2)
## W = 0.96671, p-value = 0.03509
# Interpretation tip:
# p < 0.05 → evidence residuals deviate from normality
# p ≥ 0.05 → no strong evidence against normality
# Remember: with moderate n, ANOVA is fairly robust to mild non-normality.
# 2) test for Homogeneity of variances (equal variances across groups)
cat("\nLevene's test for equal variances (median-centered): Protein ~ Organ\n")
##
## Levene's test for equal variances (median-centered): Protein ~ Organ
print(leveneTest(Protein ~ Organ, data = hc2, center = median))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 9 2.3298 0.02333 *
## 70
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# 3) Welch's ANOVA is a statistical test used as an alternative to the traditional one-way ANOVA when the assumption of equal variances across groups is violated. It is particularly useful for comparing the means of three or more groups when their variances are unequal. Welch's ANOVA still assumes ~normality)
# Use when Levene indicates heteroscedasticity.
cat("\nWelch's ANOVA (robust to unequal variances): Protein ~ Organ\n")
##
## Welch's ANOVA (robust to unequal variances): Protein ~ Organ
print(oneway.test(Protein ~ Organ, data = hc2, var.equal = FALSE))
##
## One-way analysis of means (not assuming equal variances)
##
## data: Protein and Organ
## F = 31.22, num df = 9.000, denom df = 28.153, p-value = 2.474e-12
# Interpretation tip:
# If Welch is significant while classic ANOVA disagrees, trust Welch when variances are unequal.
# Practical guidance:
# - These tests inform, they don’t automatically veto ANOVA.
# - If assumptions look shaky: consider data transformations, Welch’s ANOVA,
# or nonparametric alternatives (e.g., Kruskal–Wallis) depending on the issue.
If variance increases with the mean (fan shape) or residuals are skewed, a log (or power) transform can stabilize variance and improve normality.
.safe_log <- function(x) if (all(x > 0, na.rm = TRUE)) log(x) else log1p(x)
hc_protein <- hc2 %>% dplyr::mutate(Protein_log = .safe_log(Protein))
fit2_log <- aov(Protein_log ~ Organ, data = hc_protein)
cat("\nANOVA (log scale): Protein_log ~ Organ\n")
##
## ANOVA (log scale): Protein_log ~ Organ
print(summary(fit2_log))
## Df Sum Sq Mean Sq F value Pr(>F)
## Organ 9 5.350 0.5945 25.64 <2e-16 ***
## Residuals 70 1.623 0.0232
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
cat("\nLevene on log scale (median-centered):\n")
##
## Levene on log scale (median-centered):
print(leveneTest(Protein_log ~ Organ, data = hc_protein, center = median))
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 9 1.6619 0.1149
## 70
ggplot(data.frame(fit = fitted(fit2_log), res = resid(fit2_log)), aes(fit, res)) +
geom_point(alpha = 0.6) + geom_hline(yintercept = 0, linetype = "dashed") +
labs(title = "Residuals vs Fitted (log Protein ~ Organ)", x = "Fitted (log Protein)", y = "Residuals")
ggplot(data.frame(res = resid(fit2_log)), aes(sample = res)) +
stat_qq() + stat_qq_line() +
labs(title = "QQ plot (log Protein ~ Organ)", x = "Theoretical Quantiles", y = "Sample Quantiles")
tuk <- TukeyHSD(if (exists("fit2_log")) fit2_log else fit2) # use transformed fit if you chose it
tuk_df <- broom::tidy(tuk) %>%
dplyr::mutate(sig = ifelse(adj.p.value < 0.05, "Significant", "Not significant")) %>%
dplyr::arrange(adj.p.value)
knitr::kable(tuk_df, digits = 3, caption = "Tukey HSD pairwise comparisons (HCNp)")
term | contrast | null.value | estimate | conf.low | conf.high | adj.p.value | sig |
---|---|---|---|---|---|---|---|
Organ | small flower bud-Intermediate fruit | 0 | 0.688 | 0.440 | 0.937 | 0.000 | Significant |
Organ | Young fruit-Intermediate fruit | 0 | 0.694 | 0.445 | 0.942 | 0.000 | Significant |
Organ | small flower bud-Mature fruit | 0 | 0.758 | 0.509 | 1.007 | 0.000 | Significant |
Organ | Young fruit-Mature fruit | 0 | 0.763 | 0.515 | 1.012 | 0.000 | Significant |
Organ | Mature fruit-large flower bud | 0 | -0.615 | -0.864 | -0.366 | 0.000 | Significant |
Organ | young leaf-Mature fruit | 0 | 0.595 | 0.346 | 0.844 | 0.000 | Significant |
Organ | Young fruit-mature leaf | 0 | 0.580 | 0.332 | 0.829 | 0.000 | Significant |
Organ | small flower bud-mature leaf | 0 | 0.575 | 0.326 | 0.824 | 0.000 | Significant |
Organ | large flower bud-Intermediate fruit | 0 | 0.545 | 0.296 | 0.794 | 0.000 | Significant |
Organ | young leaf-Intermediate fruit | 0 | 0.525 | 0.276 | 0.774 | 0.000 | Significant |
Organ | Mature fruit-Fully developed flower (white) | 0 | -0.508 | -0.757 | -0.260 | 0.000 | Significant |
Organ | Mature fruit-Fully developed flower (yellow) | 0 | -0.440 | -0.689 | -0.191 | 0.000 | Significant |
Organ | Intermediate fruit-Fully developed flower (white) | 0 | -0.438 | -0.687 | -0.190 | 0.000 | Significant |
Organ | mature leaf-large flower bud | 0 | -0.432 | -0.681 | -0.183 | 0.000 | Significant |
Organ | Young fruit-medium leaf | 0 | 0.416 | 0.167 | 0.665 | 0.000 | Significant |
Organ | young leaf-mature leaf | 0 | 0.412 | 0.163 | 0.661 | 0.000 | Significant |
Organ | small flower bud-medium leaf | 0 | 0.411 | 0.162 | 0.660 | 0.000 | Significant |
Organ | Intermediate fruit-Fully developed flower (yellow) | 0 | -0.370 | -0.619 | -0.121 | 0.000 | Significant |
Organ | medium leaf-Mature fruit | 0 | 0.347 | 0.099 | 0.596 | 0.001 | Significant |
Organ | mature leaf-Fully developed flower (white) | 0 | -0.325 | -0.574 | -0.076 | 0.002 | Significant |
Organ | Young fruit-Fully developed flower (yellow) | 0 | 0.324 | 0.075 | 0.573 | 0.002 | Significant |
Organ | small flower bud-Fully developed flower (yellow) | 0 | 0.319 | 0.070 | 0.567 | 0.003 | Significant |
Organ | medium leaf-Intermediate fruit | 0 | 0.278 | 0.029 | 0.526 | 0.017 | Significant |
Organ | medium leaf-large flower bud | 0 | -0.267 | -0.516 | -0.019 | 0.025 | Significant |
Organ | mature leaf-Fully developed flower (yellow) | 0 | -0.257 | -0.505 | -0.008 | 0.038 | Significant |
Organ | Young fruit-Fully developed flower (white) | 0 | 0.255 | 0.006 | 0.504 | 0.040 | Significant |
Organ | small flower bud-Fully developed flower (white) | 0 | 0.250 | 0.001 | 0.499 | 0.048 | Significant |
Organ | young leaf-medium leaf | 0 | 0.247 | -0.001 | 0.496 | 0.052 | Not significant |
Organ | mature leaf-Mature fruit | 0 | 0.183 | -0.066 | 0.432 | 0.339 | Not significant |
Organ | large flower bud-Fully developed flower (yellow) | 0 | 0.175 | -0.074 | 0.424 | 0.402 | Not significant |
Organ | young leaf-Young fruit | 0 | -0.168 | -0.417 | 0.080 | 0.458 | Not significant |
Organ | medium leaf-mature leaf | 0 | 0.164 | -0.084 | 0.413 | 0.494 | Not significant |
Organ | young leaf-small flower bud | 0 | -0.163 | -0.412 | 0.085 | 0.503 | Not significant |
Organ | medium leaf-Fully developed flower (white) | 0 | -0.161 | -0.410 | 0.088 | 0.525 | Not significant |
Organ | young leaf-Fully developed flower (yellow) | 0 | 0.155 | -0.094 | 0.404 | 0.576 | Not significant |
Organ | Young fruit-large flower bud | 0 | 0.149 | -0.100 | 0.397 | 0.634 | Not significant |
Organ | small flower bud-large flower bud | 0 | 0.143 | -0.105 | 0.392 | 0.680 | Not significant |
Organ | mature leaf-Intermediate fruit | 0 | 0.113 | -0.136 | 0.362 | 0.893 | Not significant |
Organ | large flower bud-Fully developed flower (white) | 0 | 0.106 | -0.142 | 0.355 | 0.924 | Not significant |
Organ | medium leaf-Fully developed flower (yellow) | 0 | -0.092 | -0.341 | 0.157 | 0.968 | Not significant |
Organ | young leaf-Fully developed flower (white) | 0 | 0.087 | -0.162 | 0.335 | 0.979 | Not significant |
Organ | Mature fruit-Intermediate fruit | 0 | -0.070 | -0.319 | 0.179 | 0.995 | Not significant |
Organ | Fully developed flower (yellow)-Fully developed flower (white) | 0 | -0.069 | -0.317 | 0.180 | 0.996 | Not significant |
Organ | young leaf-large flower bud | 0 | -0.020 | -0.269 | 0.229 | 1.000 | Not significant |
Organ | Young fruit-small flower bud | 0 | 0.005 | -0.244 | 0.254 | 1.000 | Not significant |
Student questions (B1):
Why: Test if mean Protein differs among organs.
Do: Drop NAs, ensure Organ
is a factor,
fit aov(Protein ~ Organ, data = ...)
, print
summary(...)
, and show a small table of group sizes and
means.
Student questions (B2):
df = 9 (within), 70 (between)
p-value = <2e-16
There is evidence that mean Protein differs among organs.
Why: Validate model assumptions and pick a robust alternative if needed.
Do: Plot residuals vs fitted and a QQ plot; run
shapiro.test(resid(...))
; run
car::leveneTest(...)
; if Levene’s p < 0.05, report
Welch’s ANOVA
(oneway.test(..., var.equal = FALSE)
).
Student questions (B3):
Why: Stabilize variance and improve normality when spread increases with the mean.
Do: Create Protein_log
via
log(...)
or log1p(...)
, fit
aov(Protein_log ~ Organ, ...)
, and repeat diagnostics.
Student questions (B4):
Why: Identify which specific organ means differ, controlling family-wise error.
Do: Run TukeyHSD(fit_protein_chosen)
where fit_protein_chosen
is the model you decided to
report.
Student questions (B5):
File names:
Lastname_Firstname_Lab_ANOVA.html
and
Lastname_Firstname_Lab_ANOVA.Rmd
.
Do the if you have time just to see how it works, but not required since the parametric tests are probably appropriate for these data. It is good to have the code for future reference however.
suppressPackageStartupMessages({
library(dplyr)
library(knitr)
# install.packages("rstatix") # <- run once if needed
library(rstatix) # for dunn_test()
})
# Use the existing 'hc' if present; require necessary columns
stopifnot(exists("hc"))
stopifnot(all(c("Organ","HCNp") %in% names(hc)))
# Analysis frame: drop NAs
hc_kw <- hc %>%
dplyr::filter(!is.na(Organ), !is.na(HCNp))
# Descriptive medians & IQRs (rank-based summary)
hc_kw %>%
dplyr::group_by(Organ) %>%
dplyr::summarise(
n = dplyr::n(),
median = median(HCNp),
IQR = IQR(HCNp),
.groups = "drop"
) %>%
knitr::kable(digits = 3, caption = "HCNp by Organ: medians and IQRs")
Organ | n | median | IQR |
---|---|---|---|
Fully developed flower (white) | 8 | 8.420 | 3.219 |
Fully developed flower (yellow) | 8 | 6.167 | 0.624 |
Intermediate fruit | 8 | 2.357 | 1.048 |
Mature fruit | 8 | 0.083 | 0.138 |
Young fruit | 8 | 32.533 | 6.834 |
large flower bud | 8 | 9.939 | 2.294 |
mature leaf | 8 | 6.350 | 1.032 |
medium leaf | 8 | 32.550 | 0.399 |
small flower bud | 8 | 25.408 | 6.229 |
young leaf | 8 | 66.750 | 4.151 |
# Omnibus Kruskal–Wallis test
kw <- kruskal.test(HCNp ~ Organ, data = hc_kw)
kw
##
## Kruskal-Wallis rank sum test
##
## data: HCNp by Organ
## Kruskal-Wallis chi-squared = 75.138, df = 9, p-value = 1.484e-12
# Effect size: epsilon-squared
k <- nlevels(hc_kw$Organ); n <- nrow(hc_kw); H <- unname(kw$statistic)
epsilon2 <- (H - k + 1) / (n - k)
cat(sprintf("\nEpsilon-squared effect size: %.3f\n", epsilon2))
##
## Epsilon-squared effect size: 0.952
# ---- Dunn post-hoc pairwise comparisons (Holm-adjusted) --------------
dunn_tbl <- dunn_test(hc_kw, HCNp ~ Organ, p.adjust.method = "holm") %>%
arrange(p.adj) %>%
transmute(
group1, group2,
z = statistic,
p = p,
p_adj = p.adj,
significant = ifelse(p_adj < 0.05, "Yes", "No")
)
knitr::kable(dunn_tbl, digits = 3,
caption = "Dunn post-hoc (Holm-adjusted) for HCNp by Organ")
group1 | group2 | z | p | p_adj | significant |
---|---|---|---|---|---|
Mature fruit | young leaf | 6.197 | 0.000 | 0.000 | Yes |
Intermediate fruit | young leaf | 5.498 | 0.000 | 0.000 | Yes |
Mature fruit | medium leaf | 5.132 | 0.000 | 0.000 | Yes |
Mature fruit | Young fruit | 5.035 | 0.000 | 0.000 | Yes |
Fully developed flower (yellow) | young leaf | 4.465 | 0.000 | 0.000 | Yes |
Intermediate fruit | medium leaf | 4.432 | 0.000 | 0.000 | Yes |
Intermediate fruit | Young fruit | 4.336 | 0.000 | 0.001 | Yes |
Mature fruit | small flower bud | 4.293 | 0.000 | 0.001 | Yes |
mature leaf | young leaf | 4.250 | 0.000 | 0.001 | Yes |
Intermediate fruit | small flower bud | 3.593 | 0.000 | 0.012 | Yes |
Fully developed flower (white) | young leaf | 3.421 | 0.001 | 0.022 | Yes |
Fully developed flower (yellow) | medium leaf | 3.400 | 0.001 | 0.023 | Yes |
Fully developed flower (yellow) | Young fruit | 3.303 | 0.001 | 0.032 | Yes |
mature leaf | medium leaf | 3.185 | 0.001 | 0.046 | Yes |
large flower bud | Mature fruit | -3.174 | 0.002 | 0.047 | Yes |
mature leaf | Young fruit | 3.088 | 0.002 | 0.061 | No |
large flower bud | young leaf | 3.023 | 0.003 | 0.073 | No |
Fully developed flower (white) | Mature fruit | -2.776 | 0.006 | 0.154 | No |
Fully developed flower (yellow) | small flower bud | 2.561 | 0.010 | 0.282 | No |
Intermediate fruit | large flower bud | 2.474 | 0.013 | 0.347 | No |
Fully developed flower (white) | medium leaf | 2.356 | 0.018 | 0.462 | No |
mature leaf | small flower bud | 2.345 | 0.019 | 0.462 | No |
Fully developed flower (white) | Young fruit | 2.259 | 0.024 | 0.549 | No |
Fully developed flower (white) | Intermediate fruit | -2.076 | 0.038 | 0.833 | No |
Fully developed flower (white) | Fully developed flower (yellow) | -1.044 | 0.297 | 1.000 | No |
Fully developed flower (white) | large flower bud | 0.398 | 0.691 | 1.000 | No |
Fully developed flower (white) | mature leaf | -0.828 | 0.407 | 1.000 | No |
Fully developed flower (white) | small flower bud | 1.517 | 0.129 | 1.000 | No |
Fully developed flower (yellow) | Intermediate fruit | -1.033 | 0.302 | 1.000 | No |
Fully developed flower (yellow) | large flower bud | 1.442 | 0.149 | 1.000 | No |
Fully developed flower (yellow) | Mature fruit | -1.732 | 0.083 | 1.000 | No |
Fully developed flower (yellow) | mature leaf | 0.215 | 0.830 | 1.000 | No |
Intermediate fruit | Mature fruit | -0.699 | 0.484 | 1.000 | No |
Intermediate fruit | mature leaf | 1.248 | 0.212 | 1.000 | No |
large flower bud | mature leaf | -1.226 | 0.220 | 1.000 | No |
large flower bud | medium leaf | 1.958 | 0.050 | 1.000 | No |
large flower bud | small flower bud | 1.119 | 0.263 | 1.000 | No |
large flower bud | Young fruit | 1.861 | 0.063 | 1.000 | No |
Mature fruit | mature leaf | 1.947 | 0.052 | 1.000 | No |
medium leaf | small flower bud | -0.839 | 0.401 | 1.000 | No |
medium leaf | Young fruit | -0.097 | 0.923 | 1.000 | No |
medium leaf | young leaf | 1.065 | 0.287 | 1.000 | No |
small flower bud | Young fruit | 0.742 | 0.458 | 1.000 | No |
small flower bud | young leaf | 1.904 | 0.057 | 1.000 | No |
Young fruit | young leaf | 1.162 | 0.245 | 1.000 | No |
Student questions (Kruskal–Wallis): Bonus Question