R Programming - Class 4: ANOVA, Advanced ggplot, and Data Visualization

Author

Akash Mitra

🎯 Objectives

By the end of this class, you will be able to:

  • Perform one-way ANOVA and post-hoc analysis.
  • Use ggplot2 to create advanced plots with summary statistics and custom themes.
  • Apply RColorBrewer palettes for improved visuals.
  • Layer multiple geometries including bars, error bars, points, and annotations.

1. 📊 Loading and Cleaning Data

We work with MyoD_intensity.xlsx, a file with intensity values from three replicates per treatment.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)

myod <- read_excel("MyoD_intensity.xlsx")

myod_clean <- myod |> 
  select(Treatment, matches("-")) |> 
  setNames(c("treatment", "int_r1", "int_r2", "int_r3")) |> 
  mutate(treatment = fct(treatment)) |> 
  mutate(treatment = fct_recode(
    treatment,
    "40mM Glc" = "glc",
    "100ng/ml\nLXA4" = "lxa4",
    "Glc +LXA4" = "g+l"
  )) |> 
  pivot_longer(2:4, names_to = "int_reps", values_to = "val") |> 
  group_by(treatment)

2. 📈 One-Way ANOVA & Tukey HSD

summary(aov(val ~ treatment, data = myod_clean))
            Df    Sum Sq   Mean Sq F value   Pr(>F)    
treatment    3 5.284e+09 1.761e+09    21.8 0.000332 ***
Residuals    8 6.464e+08 8.080e+07                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(aov(val ~ treatment, data = myod_clean))
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = val ~ treatment, data = myod_clean)

$treatment
                              diff       lwr        upr     p adj
40mM Glc-Control         -52720.67 -76224.28 -29217.056 0.0004298
100ng/ml\nLXA4-Control    -2840.00 -26343.61  20663.611 0.9789669
Glc +LXA4-Control        -16610.00 -40113.61   6893.611 0.1862127
100ng/ml\nLXA4-40mM Glc   49880.67  26377.06  73384.278 0.0006298
Glc +LXA4-40mM Glc        36110.67  12607.06  59614.278 0.0050999
Glc +LXA4-100ng/ml\nLXA4 -13770.00 -37273.61   9733.611 0.3091419

3. 🎨 Setting a Custom ggplot Theme

theme_set(
  theme_bw() +
    theme(
      axis.line = element_line(linewidth = 0.3),
      axis.text = element_text(size = 8, colour = "black"),
      legend.position = "none",
      plot.margin = margin(1, 1, 1, 1, unit = "cm"),
      axis.title.y = element_text(vjust = 3, size = 12, face = "bold"),
      axis.title.x = element_blank(),
      panel.grid = element_blank(),
      panel.background = element_rect(fill = "transparent", color = NA),
      plot.background  = element_rect(fill = "transparent", color = NA)
    )
)

4. 🧪 Plotting Means, Error Bars, and Points

library(RColorBrewer)

myod_clean |> 
  ggplot(aes(treatment, val)) +
  stat_summary(geom = "bar", fun.data = mean_se,
               fill = brewer.pal(4, "Pastel1"), color = "grey30") +
  stat_summary(geom = "errorbar", fun.data = mean_se, width = 0.3) +
  stat_summary(geom = "pointrange", fun.data = mean_se, size = 0.2) +
  geom_jitter(data = myod |> 
                select(Treatment, matches("-")) |> 
                setNames(c("treatment", "int_r1", "int_r2", "int_r3")) |> 
                mutate(treatment = fct(treatment)) |> 
                mutate(treatment = fct_recode(
                  treatment,
                  "40mM Glc" = "glc",
                  "100ng/ml\nLXA4" = "lxa4",
                  "Glc +LXA4" = "g+l"
                )) |> 
                pivot_longer(2:4, names_to = "int_reps", values_to = "val"),
              aes(treatment, val), width = 0.2, size = 3, alpha = 0.5) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 1e5), name = "Intensity") +
  geom_line(data = tibble(x = c(1, 2), y = c(77000)), aes(x, y)) +
  geom_text(data = tibble(x = 1.5, y = 78000), aes(x, y, label = "**"), size = 8)

🧮 Understanding stat_summary()

The stat_summary() function in ggplot2 is used to apply summary functions (like mean, median, or standard error) to your data before plotting.

Unlike geom_point() or geom_col() that work with raw values, stat_summary() lets you:

  • Compute and display summary statistics dynamically.
  • Layer statistical summaries (e.g., error bars) on top of plots.
  • Avoid manual summarization of data frames before plotting.

📌 Syntax

stat_summary(geom = "bar", fun = mean)


5. 📋 Common geom_*() Functions and Their Default stat_*()

geom_*() Default stat_*() Description
geom_point() stat_identity Plots raw data points
geom_line() stat_identity Connects data points in order
geom_path() stat_identity Connects points in data order (not sorted by x)
geom_bar() stat_count Counts observations in each category
geom_col() stat_identity Uses actual y-values (no aggregation)
geom_histogram() stat_bin Bins continuous data and counts in each bin
geom_boxplot() stat_boxplot Computes and displays boxplot statistics
geom_violin() stat_ydensity Computes a density estimate along y-axis
geom_density() stat_density Displays kernel density estimate
geom_smooth() stat_smooth Fits and plots a model (e.g., LOESS, linear)
geom_text() stat_identity Displays text exactly as provided
geom_errorbar() stat_identity Requires ymin and ymax to be specified manually
geom_pointrange() stat_identity Point with a range (y, ymin, ymax)
geom_ribbon() stat_identity Area between ymin and ymax
geom_area() stat_identity Area under the curve from y=0
geom_dotplot() stat_bindot Dots for binned values along x