This worksheet will guide you through basic statistical analysis and data visualization techniques in R using simulated crop treatment data. You will learn about ANOVA, LSD, t-tests, and how to make boxplots with informative labels and color themes.
# Required for reading files (not used in this example)
library(readxl) # Not essential here, but useful if reading Excel input
library(readr) # Not essential here, but useful if reading CSV input
library(data.table) # Not essential here, but useful for speed + large files
# Core plotting and data handling
library(ggplot2) # Essential: used for all visualizations
library(dplyr) # Essential: used for summarizing and transforming data
# Optional enhancements to plots
library(ggbeeswarm) # Optional: jittered swarm plots
library(viridisLite) # Optional: for viridis color scales
library(viridis) # Optional: color scale options
library(RColorBrewer) # Optional: for using ColorBrewer palettes
# Plot annotation and combination
library(ggpubr) # Essential: used for adding statistical comparisons
library(cowplot) # Optional: combines multiple plots into one
# Image support
library(magick) # Optional: advanced image processing
library(plotly)
# Document support
library(knitr) # Essential: powers R Markdown knitting
# Post-hoc testing
library(agricolae) # Essential: needed for LSD.test after ANOVA
π Instructions: Simulate or load a dataset with a Treatment column and a numeric outcome (e.g., Yield).
# Simulate data with distinct mean for Fungicide_B
set.seed(123) #exactly the same numbers every time we run the code, for reproducibility.
data <- data.frame(
Treatment = rep(c("Control", "Fungicide_A", "Fungicide_B", "Fungicide_C"), each = 50),
Yield = c(
rnorm(50, mean = 50, sd = 10), # Control
rnorm(50, mean = 50, sd = 10), # Fungicide_A
rnorm(50, mean = 60, sd = 10), # Fungicide_B --> deliberately higher
rnorm(50, mean = 50, sd = 10) # Fungicide_C
)
)
# View the first few rows
head(data)
π Question: What type of variable is
Treatment? What type is Yield?
π Instructions: Use ggplot2 to create
a boxplot of Yield by Treatment.
library(ggplot2)
ggplot(data, aes(x = Treatment, y = Yield)) +
geom_boxplot()
# This visual helps us to quickly see distribution, central tendency, spread, and outliers in the data.
π Question: What do the horizontal lines in the boxplot represent?
π Instructions: Add axis labels and fill colors based on Treatment.
ggplot(data, aes(x = Treatment, y = Yield, fill = Treatment)) +
geom_boxplot() +
xlab("Treatment Applied") +
ylab("Yield (bu/acre)") +
scale_fill_brewer(palette = "Set2")
π Question: Why do we map
fill = Treatment inside aes()?
π Instructions: Clean up the plot using a white background and rotated axis labels.
ggplot(data, aes(x = Treatment, y = Yield, fill = Treatment)) +
geom_boxplot() +
xlab("Treatment Applied") +
ylab("Yield (bu/acre)") +
scale_fill_brewer(palette = "Set2") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
π Question: What does theme_bw()
do?
shapiro_results <- data %>%
group_by(Treatment) %>%
summarise(p_value = shapiro.test(Yield)$p.value)
shapiro_results
π Question: If any groupβs p-value < 0.05, data may not be normally distributed. Should we use ANOVA or Kruskal-Wallis? In case: kruskal.test(Yield ~ Treatment, data = data) Follow it with Dunnβs test for pairwise comparisons library(FSA) dunnTest(Yield ~ Treatment, data = data, method = βbonferroniβ)
Purpose: To test whether the means of three or more groups are significantly different. It compares.
Null Hypothesis (Hβ): All group means are equal Alternative (Hβ): At least one group mean is different
How it works: It compares the variation between groups to the variation within groups. If variation between is much larger than we conclude: significant difference
aov(Yield ~ Treatment, data = data)
## Call:
## aov(formula = Yield ~ Treatment, data = data)
##
## Terms:
## Treatment Residuals
## Sum of Squares 1738.183 17260.573
## Deg. of Freedom 3 196
##
## Residual standard error: 9.38425
## Estimated effects may be unbalanced
library(agricolae)
model <- aov(Yield ~ Treatment, data = data)
LSD.test(model, "Treatment", p.adj = "none")
ggplot(data, aes(x = Treatment, y = Yield, fill = Treatment)) +
geom_boxplot() +
stat_compare_means(comparisons = list(c("Control", "Fungicide_A")), method = "t.test")
ns = not significant* = p < 0.05** = p < 0.01*** = p < 0.001π Instructions: Test whether yields are significantly different among treatments.
model <- aov(Yield ~ Treatment, data = data)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Treatment 3 1738 579.4 6.579 0.000293 ***
## Residuals 196 17261 88.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
π Question: What is the null hypothesis of ANOVA? Is it rejected?
π Instructions: Add the p-value to your plot.
library(ggpubr)
ggplot(data, aes(x = Treatment, y = Yield, fill = Treatment)) +
geom_boxplot() +
stat_compare_means(method = "anova", label.y = 70) +
scale_fill_brewer(palette = "Set2") +
theme_bw()
π Question: What does a p-value < 0.05 indicate?
π Instructions: Compare individual treatment pairs using t-tests.
pairs <- list(c("Control", "Fungicide_A"),
c("Control", "Fungicide_B"),
c("Control", "Fungicide_C"))
ggplot(data, aes(x = Treatment, y = Yield, fill = Treatment)) +
geom_boxplot() +
stat_compare_means(comparisons = pairs, method = "t.test", label = "p.signif") +
scale_fill_brewer(palette = "Set2") +
theme_bw()
π Question: Which treatment(s) show statistically significant differences from the Control?
fill argument in aes()
do?theme_bw()?Save your final plots as PNG or PDF and submit them along with your written answers.
ggsave("your_final_plot.png", width = 8, height = 6, dpi = 300)
Happy plotting! π
aes()
β Aesthetic Mappingsaes(x = Treatment, y = Yield, fill = Treatment)
## Aesthetic mapping:
## * `x` -> `Treatment`
## * `y` -> `Yield`
## * `fill` -> `Treatment`
Maps variables to plot features: x-axis, y-axis, color, size, etc.
geom_boxplot()Draws boxplots to show distributions and differences.
xlab() and ylab()Set axis labels.
ggplot(data, aes(x = Treatment, y = Yield)) +
geom_boxplot() +
xlab("Treatment") +
ylab("Yield (bu/acre)")
fill
and Color ScalesCustomize plot colors:
theme() and theme_bw()Customize the overall look:
ggplot(data, aes(x = Treatment, y = Yield, fill = Treatment)) +
geom_boxplot() +
scale_fill_viridis_d(option = "plasma") +
theme_bw() +
theme(axis.text.x = element_text(angle = 1, face = "bold"))
aspect.ratioAdjust the plot height vs.Β width:
theme(aspect.ratio = 1.25)
## List of 1
## $ aspect.ratio: num 1.25
## - attr(*, "class")= chr [1:2] "theme" "gg"
## - attr(*, "complete")= logi FALSE
## - attr(*, "validate")= logi TRUE
stat_compare_means()Adds statistical test results on plots:
stat_compare_means(method = "anova")
## geom_text: na.rm = FALSE
## stat_compare_means: label.x.npc = left, label.y.npc = top, label.x = NULL, label.y = NULL, label.sep = , , method = anova, method.args = list(), paired = FALSE, ref.group = NULL, symnum.args = list(), hide.ns = FALSE, na.rm = FALSE
## position_identity
| Concept | Description |
|---|---|
| ANOVA | Tests for overall mean differences among 3+ groups |
| LSD | Identifies which groups are different after ANOVA |
| t-Test | Compares two groups directly |
| p-value | Probability that observed difference is due to chance |
aes() |
Maps data to plot elements (e.g., x, y, fill) |
geom_boxplot() |
Visualizes distribution and outliers for each group |
fill |
Colors plot elements based on grouping |
xlab(), ylab() |
Labels for axes |
theme() |
Customize fonts, colors, spacing |
stat_compare_means() |
Add test results to plots visually |
# Simulated Example Data
set.seed(123)
data_PS433 <- data.frame(
Treatment = rep(c("Control", "Fungicide_A", "Fungicide_B", "Fungicide_C"), each = 50),
Yield = c(
rnorm(50, mean = 50, sd = 10), # Control
rnorm(50, mean = 50, sd = 10), # Fungicide_A
rnorm(50, mean = 60, sd = 10), # Fungicide_B --> deliberately higher
rnorm(50, mean = 50, sd = 10) # Fungicide_C
)
)
# Boxplot with ANOVA and Pairwise
library(ggplot2)
library(ggpubr)
p <- ggplot(data=data_PS433, aes(x = Treatment, y = Yield, fill = Treatment)) +
geom_boxplot() +
scale_fill_brewer(palette = "Set2") +
stat_compare_means(method = "anova", label.y = 70) +
stat_compare_means(comparisons = list(c("Control", "Fungicide_A"), c("Control", "Fungicide_B"),
c("Control", "Fungicide_C")), label = "p.signif") +
xlab("Treatment") + ylab("Yield (bu/acre)") +
theme_bw() +
theme(axis.text.x = element_text(angle = 0, face = "bold"))
p
if (!requireNamespace(βsmplot2β, quietly = TRUE)) remotes::install_github(βsmin95/smplot2β) if (!requireNamespace(βsdamrβ, quietly = TRUE)) remotes::install_github(βsmin95/sdamrβ)
# Load required libraries
library(ggplot2) # Data visualization package
library(smplot2) # For polished smplots
library(sdamr) # For sm_raincloud and jitter-enhanced raincloud plots
library(dplyr) # For data wrangling (e.g., summarizing by group)
library(agricolae) # For performing LSD (Least Significant Difference) test
library(ggpubr) # For adding significance stars to ggplot
# Set random seed to make results reproducible
set.seed(123)
# Simulate example dataset
# Each treatment has 50 observations (like 50 plots or plants per treatment)
data_PS533_SDSU_Solanki <- data.frame(
Treatment = rep(c("Control", "Fungicide_A", "Fungicide_B", "Fungicide_C"), each = 50),
Yield = c(
rnorm(50, mean = 50, sd = 10), # Control group
rnorm(50, mean = 50, sd = 10), # Fungicide A group
rnorm(50, mean = 60, sd = 10), # Fungicide B group (higher mean yield on purpose)
rnorm(50, mean = 50, sd = 10) # Fungicide C group
)
)
# Calculate group means and sample sizes for labeling the plot
means <- data_PS533_SDSU_Solanki %>%
group_by(Treatment) %>%
summarise(mean_yield = round(mean(Yield), 2),
n = n())
# Perform ANOVA to check for overall treatment differences
model <- aov(Yield ~ Treatment, data = data_PS533_SDSU_Solanki)
# Perform LSD test to see which specific groups are significantly different
lsd_result <- LSD.test(model, "Treatment", p.adj = "none")
# Extract the LSD value (how big a difference needs to be to count as "significant")
lsd_value <- round(lsd_result$statistics$LSD, 2)
# Get group comparison letters from LSD (e.g., a, b, ab)
group_letters <- as.data.frame(lsd_result$groups)
group_letters$Treatment <- rownames(group_letters)
# Combine group letters with mean values
label_data <- merge(means, group_letters, by = "Treatment")
# Create raincloud plot with statistical annotations
Solanki <- ggplot(data = data_PS533_SDSU_Solanki, aes(x = Treatment, y = Yield, fill = Treatment)) +
# Add raincloud layers (half-violin, boxplot, and points)
sm_raincloud(
position = position_nudge(x = 0.15),
which_side = "right",
show.legend = TRUE,
point.params = list(
size = 2, shape = 22, color = "black", alpha = 0.4,
position = sdamr::position_jitternudge(nudge.x = 0.06, seed = 123, jitter.width = 0.06)
)
) +
# Add t-test comparison between selected treatment pairs
stat_compare_means(
comparisons = list(
c("Control", "Fungicide_A"),
c("Control", "Fungicide_B"),
c("Control", "Fungicide_C")
),
method = "t.test",
label = "p.signif", # Use *, **, *** based on p-value
tip.length = 0.01
) +
# Label each group with its mean and sample size
geom_text(data = means,
aes(x = Treatment, y = mean_yield + 6,
label = paste0("Mean = ", mean_yield, "\nn = ", n)),
inherit.aes = FALSE,
size = 3, fontface = "bold", vjust = 0, hjust = 1) +
# Label with group letters from LSD test (e.g., a, b, ab)
geom_text(data = label_data,
aes(x = Treatment, y = mean_yield + 12, label = groups),
inherit.aes = FALSE,
size = 5, fontface = "bold", vjust = -1, hjust = 1.5) +
# Annotate the LSD value on the plot
annotate("text",
x = 2.5,
y = max(data_PS533_SDSU_Solanki$Yield) + 15,
label = paste0("LSD = ", lsd_value),
size = 5, fontface = "bold", color = "darkred") +
# Define color scheme for treatments
scale_fill_manual(values = sm_color("blue", "orange", "purple", "darkred")) +
# Set general visual theme
theme_minimal(base_size = 14) +
labs(
x = "Treatment", # X-axis label
y = "Soybean Yield (bu/acre)", # Y-axis label
title = "Brookings: Effect of Fungicide Treatment on Soybean Yield" # Title
) +
# Customize theme appearance: bold axis lines and text
theme(
axis.line.x = element_line(size = 1, color = "black"),
axis.line.y = element_line(size = 1, color = "black"),
axis.text.x = element_text(angle = 20, hjust = 1, face = "bold"),
axis.title = element_text(face = "bold"),
plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
legend.position = c(.95, 0.9), # Top-right corner
legend.title = element_blank(),
legend.text = element_text(size = 8),
legend.key.size = unit(0.2, "cm") # Shrink legend keys
)
# Print the plot
Solanki
ggsave("raincloud_plot.png", plot = Solanki, width = 10, height = 6, dpi = 900)
# When you will create picture (png) in R and save, it will look much better and polished.
R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/
RStudio Team (2019). RStudio: Integrated Development for R. RStudio, Inc., Boston, MA. URL http://www.rstudio.com/.
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. URL: https://ggplot2.tidyverse.org/
Min, S. (2023). smplot2: Simplified Plotting Functions for ggplot2. GitHub repository: https://github.com/smin95/smplot2
R Markdown Cookbook Yihui Xie, Christophe Dervieux, Emily Riederer.2025-04-03 https://bookdown.org/yihui/rmarkdown-cookbook/