🎯 Objectives
By the end of this class, you will be able to:
- Perform one-way ANOVA and post-hoc analysis.
- Use
ggplot2 to create advanced plots with summary statistics and custom themes.
- Apply
RColorBrewer palettes for improved visuals.
- Layer multiple geometries including bars, error bars, points, and annotations.
1. 📊 Loading and Cleaning Data
We work with MyoD_intensity.xlsx, a file with intensity values from three replicates per treatment.
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
myod <- read_excel("MyoD_intensity.xlsx")
myod_clean <- myod |>
select(Treatment, matches("-")) |>
setNames(c("treatment", "int_r1", "int_r2", "int_r3")) |>
mutate(treatment = fct(treatment)) |>
mutate(treatment = fct_recode(
treatment,
"40mM Glc" = "glc",
"100ng/ml\nLXA4" = "lxa4",
"Glc +LXA4" = "g+l"
)) |>
pivot_longer(2:4, names_to = "int_reps", values_to = "val") |>
group_by(treatment)
2. 📈 One-Way ANOVA & Tukey HSD
summary(aov(val ~ treatment, data = myod_clean))
Df Sum Sq Mean Sq F value Pr(>F)
treatment 3 5.284e+09 1.761e+09 21.8 0.000332 ***
Residuals 8 6.464e+08 8.080e+07
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(aov(val ~ treatment, data = myod_clean))
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = val ~ treatment, data = myod_clean)
$treatment
diff lwr upr p adj
40mM Glc-Control -52720.67 -76224.28 -29217.056 0.0004298
100ng/ml\nLXA4-Control -2840.00 -26343.61 20663.611 0.9789669
Glc +LXA4-Control -16610.00 -40113.61 6893.611 0.1862127
100ng/ml\nLXA4-40mM Glc 49880.67 26377.06 73384.278 0.0006298
Glc +LXA4-40mM Glc 36110.67 12607.06 59614.278 0.0050999
Glc +LXA4-100ng/ml\nLXA4 -13770.00 -37273.61 9733.611 0.3091419
3. 🎨 Setting a Custom ggplot Theme
theme_set(
theme_bw() +
theme(
axis.line = element_line(linewidth = 0.3),
axis.text = element_text(size = 8, colour = "black"),
legend.position = "none",
plot.margin = margin(1, 1, 1, 1, unit = "cm"),
axis.title.y = element_text(vjust = 3, size = 12, face = "bold"),
axis.title.x = element_blank(),
panel.grid = element_blank(),
panel.background = element_rect(fill = "transparent", color = NA),
plot.background = element_rect(fill = "transparent", color = NA)
)
)
4. 🧪 Plotting Means, Error Bars, and Points
library(RColorBrewer)
myod_clean |>
ggplot(aes(treatment, val)) +
stat_summary(geom = "bar", fun.data = mean_se,
fill = brewer.pal(4, "Pastel1"), color = "grey30") +
stat_summary(geom = "errorbar", fun.data = mean_se, width = 0.3) +
stat_summary(geom = "pointrange", fun.data = mean_se, size = 0.2) +
geom_jitter(data = myod |>
select(Treatment, matches("-")) |>
setNames(c("treatment", "int_r1", "int_r2", "int_r3")) |>
mutate(treatment = fct(treatment)) |>
mutate(treatment = fct_recode(
treatment,
"40mM Glc" = "glc",
"100ng/ml\nLXA4" = "lxa4",
"Glc +LXA4" = "g+l"
)) |>
pivot_longer(2:4, names_to = "int_reps", values_to = "val"),
aes(treatment, val), width = 0.2, size = 3, alpha = 0.5) +
scale_y_continuous(expand = c(0, 0), limits = c(0, 1e5), name = "Intensity") +
geom_line(data = tibble(x = c(1, 2), y = c(77000)), aes(x, y)) +
geom_text(data = tibble(x = 1.5, y = 78000), aes(x, y, label = "**"), size = 8)
🧮 Understanding stat_summary()
The stat_summary() function in ggplot2 is used to apply summary functions (like mean, median, or standard error) to your data before plotting.
Unlike geom_point() or geom_col() that work with raw values, stat_summary() lets you:
- Compute and display summary statistics dynamically.
- Layer statistical summaries (e.g., error bars) on top of plots.
- Avoid manual summarization of data frames before plotting.
📌 Syntax
stat_summary(geom = "bar", fun = mean)
5. 📋 Common geom_*() Functions and Their Default stat_*()
geom_point() |
stat_identity |
Plots raw data points |
geom_line() |
stat_identity |
Connects data points in order |
geom_path() |
stat_identity |
Connects points in data order (not sorted by x) |
geom_bar() |
stat_count |
Counts observations in each category |
geom_col() |
stat_identity |
Uses actual y-values (no aggregation) |
geom_histogram() |
stat_bin |
Bins continuous data and counts in each bin |
geom_boxplot() |
stat_boxplot |
Computes and displays boxplot statistics |
geom_violin() |
stat_ydensity |
Computes a density estimate along y-axis |
geom_density() |
stat_density |
Displays kernel density estimate |
geom_smooth() |
stat_smooth |
Fits and plots a model (e.g., LOESS, linear) |
geom_text() |
stat_identity |
Displays text exactly as provided |
geom_errorbar() |
stat_identity |
Requires ymin and ymax to be specified manually |
geom_pointrange() |
stat_identity |
Point with a range (y, ymin, ymax) |
geom_ribbon() |
stat_identity |
Area between ymin and ymax |
geom_area() |
stat_identity |
Area under the curve from y=0 |
geom_dotplot() |
stat_bindot |
Dots for binned values along x |