R Programming - Class 4: ANOVA, Advanced ggplot, and Data Visualization

Author

Akash Mitra

🎯 Objectives

By the end of this class, you will be able to:

Perform one-way ANOVA and post-hoc analysis.
Use ggplot2 to create advanced plots with summary statistics and custom themes.
Apply RColorBrewer palettes for improved visuals.
Layer multiple geometries including bars, error bars, points, and annotations.

1. 📊 Loading and Cleaning Data

We work with MyoD_intensity.xlsx, a file with intensity values from three replicates per treatment.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)

myod <- read_excel("MyoD_intensity.xlsx")

myod_clean <- myod |> 
  select(Treatment, matches("-")) |> 
  setNames(c("treatment", "int_r1", "int_r2", "int_r3")) |> 
  mutate(treatment = fct(treatment)) |> 
  mutate(treatment = fct_recode(
    treatment,
    "40mM Glc" = "glc",
    "100ng/ml\nLXA4" = "lxa4",
    "Glc +LXA4" = "g+l"
  )) |> 
  pivot_longer(2:4, names_to = "int_reps", values_to = "val") |> 
  group_by(treatment)

2. 📈 One-Way ANOVA & Tukey HSD

summary(aov(val ~ treatment, data = myod_clean))

            Df    Sum Sq   Mean Sq F value   Pr(>F)    
treatment    3 5.284e+09 1.761e+09    21.8 0.000332 ***
Residuals    8 6.464e+08 8.080e+07                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

TukeyHSD(aov(val ~ treatment, data = myod_clean))

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = val ~ treatment, data = myod_clean)

$treatment
                              diff       lwr        upr     p adj
40mM Glc-Control         -52720.67 -76224.28 -29217.056 0.0004298
100ng/ml\nLXA4-Control    -2840.00 -26343.61  20663.611 0.9789669
Glc +LXA4-Control        -16610.00 -40113.61   6893.611 0.1862127
100ng/ml\nLXA4-40mM Glc   49880.67  26377.06  73384.278 0.0006298
Glc +LXA4-40mM Glc        36110.67  12607.06  59614.278 0.0050999
Glc +LXA4-100ng/ml\nLXA4 -13770.00 -37273.61   9733.611 0.3091419

3. 🎨 Setting a Custom ggplot Theme

theme_set(
  theme_bw() +
    theme(
      axis.line = element_line(linewidth = 0.3),
      axis.text = element_text(size = 8, colour = "black"),
      legend.position = "none",
      plot.margin = margin(1, 1, 1, 1, unit = "cm"),
      axis.title.y = element_text(vjust = 3, size = 12, face = "bold"),
      axis.title.x = element_blank(),
      panel.grid = element_blank(),
      panel.background = element_rect(fill = "transparent", color = NA),
      plot.background  = element_rect(fill = "transparent", color = NA)
    )
)

4. 🧪 Plotting Means, Error Bars, and Points

library(RColorBrewer)

myod_clean |> 
  ggplot(aes(treatment, val)) +
  stat_summary(geom = "bar", fun.data = mean_se,
               fill = brewer.pal(4, "Pastel1"), color = "grey30") +
  stat_summary(geom = "errorbar", fun.data = mean_se, width = 0.3) +
  stat_summary(geom = "pointrange", fun.data = mean_se, size = 0.2) +
  geom_jitter(data = myod |> 
                select(Treatment, matches("-")) |> 
                setNames(c("treatment", "int_r1", "int_r2", "int_r3")) |> 
                mutate(treatment = fct(treatment)) |> 
                mutate(treatment = fct_recode(
                  treatment,
                  "40mM Glc" = "glc",
                  "100ng/ml\nLXA4" = "lxa4",
                  "Glc +LXA4" = "g+l"
                )) |> 
                pivot_longer(2:4, names_to = "int_reps", values_to = "val"),
              aes(treatment, val), width = 0.2, size = 3, alpha = 0.5) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 1e5), name = "Intensity") +
  geom_line(data = tibble(x = c(1, 2), y = c(77000)), aes(x, y)) +
  geom_text(data = tibble(x = 1.5, y = 78000), aes(x, y, label = "**"), size = 8)

🧮 Understanding `stat_summary()`

The stat_summary() function in ggplot2 is used to apply summary functions (like mean, median, or standard error) to your data before plotting.

Unlike geom_point() or geom_col() that work with raw values, stat_summary() lets you:

Compute and display summary statistics dynamically.
Layer statistical summaries (e.g., error bars) on top of plots.
Avoid manual summarization of data frames before plotting.

📌 Syntax

stat_summary(geom = "bar", fun = mean)

**5. 📋 Common geom_() Functions and Their Default stat_()**

`geom_*()`	Default `stat_*()`	Description
`geom_point()`	`stat_identity`	Plots raw data points
`geom_line()`	`stat_identity`	Connects data points in order
`geom_path()`	`stat_identity`	Connects points in data order (not sorted by x)
`geom_bar()`	`stat_count`	Counts observations in each category
`geom_col()`	`stat_identity`	Uses actual y-values (no aggregation)
`geom_histogram()`	`stat_bin`	Bins continuous data and counts in each bin
`geom_boxplot()`	`stat_boxplot`	Computes and displays boxplot statistics
`geom_violin()`	`stat_ydensity`	Computes a density estimate along y-axis
`geom_density()`	`stat_density`	Displays kernel density estimate
`geom_smooth()`	`stat_smooth`	Fits and plots a model (e.g., LOESS, linear)
`geom_text()`	`stat_identity`	Displays text exactly as provided
`geom_errorbar()`	`stat_identity`	Requires ymin and ymax to be specified manually
`geom_pointrange()`	`stat_identity`	Point with a range (y, ymin, ymax)
`geom_ribbon()`	`stat_identity`	Area between ymin and ymax
`geom_area()`	`stat_identity`	Area under the curve from y=0
`geom_dotplot()`	`stat_bindot`	Dots for binned values along x