1 Introduction

This 433/533 class exercise will focus on the analysis of fungicide application timing and the efficacy of different products on crop yield using field plot data. I extracted the first set of data (mock_yield_data) from the picture (multi-location trial) we used in one of our classes. Another set of data is created from multi-location, multi-product trials (test, test2, test4, test4, etc.). The data set is stored in an Excel file that is in the working directory and includes different treatments applied at various locations. Make sure use a correct folder path where you have all the files for subsequent analysis.

This exercise will help you learn how to:

  • load field data into R,
  • summarize treatment means,
  • run ANOVA and LSD tests,
  • and create a publication-style bar plot with error bars and annotations.

2 R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com. It’s a combination of: - Markdown (text with formatting) - R code chunks (sections of code you can run) - Output (plots, summaries, tables)

3 R and required packages Installation

3.1 R Installation

  • R is a programming language used for data analysis and visualization.
  • RStudio is a user-friendly interface to run R code. It’s where you’ll write scripts, view plots, and work with files. Before you begin:
  1. Install R: https://cran.r-project.org/
  2. Install RStudio: https://posit.co/download/rstudio-desktop/
# Webling: https://posit.co/download/rstudio-desktop/
# Note first install 'R' then 'Rstudio':

3.2 Installing necessary packages

Packages are tools that add functions to R. Run on console to get required packages.

install.packages(“librarian”) install.packages(“bookdown”)

packages <- c(“ggplot2”, “dplyr”, “readxl”, “knitr”, “tidyr”, “agricolae”, “here”, “bookdown”)

new_packages <- packages[!(packages %in% installed.packages()[,“Package”])] if(length(new_packages)) install.packages(new_packages)

new_packages <- packages[!(packages %in% installed.packages()[,“Package”])]; if(length(new_packages)) install.packages(new_packages)

librarian::shelf(readr, knitr,readxl, dplyr, tidyr, ggplot2, agricolae, here, bookdown)

Alternatively

packages <- c(
  "readr", "readxl", "dplyr", "tidyr",
  "ggplot2", "agricolae", "here", "bookdown", "knitr"
)

new_packages <- packages[!(packages %in% installed.packages()[, "Package"])]
if (length(new_packages)) install.packages(new_packages)

invisible(lapply(packages, library, character.only = TRUE))
# The 'librarian' package checks if packages are installed, verifies their version,
# installs only if needed, and then loads all desired packages into the R session.
# however it required CRAN mirroring, otherwise fails.

librarian::shelf(readr, knitr,readxl, dplyr, tidyr, ggplot2, agricolae, here)

3.3 Setting working directory, file location

This is the folder where R will look for your files.

setwd("C:/Users/Shyam.Solanki/Dropbox/SDSU/Teaching/Diseases_of_Field_Crops_Major_years/2026_spring/Class_40/DFC_data")
#If you get error: Classic Windows path issue in R:backslashes (\) in the file path
# Use forward slashes /: if you are copying path from Windows
 # use your working directory. Can I make red colored?

💡 Note: Make sure the folder path above is correct for your computer.
Example path: C:/Users/huma0/Desktop/Data

4 Class data: Load Excel/csv data from the working directory.

We will load a CSV file named mock_yield_data.csv.

data <- read_csv("mock_yield_data.csv")
## Rows: 96 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Location, Treatment
## dbl (3): Rep, Yield, Severity
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#use read_csv or read_excel
# View the first few rows
#Explore the Data
head(data)
## # A tibble: 6 × 5
##   Location   Treatment   Rep Yield Severity
##   <chr>      <chr>     <dbl> <dbl>    <dbl>
## 1 ClayCenter Check         1  77.5      8  
## 2 ClayCenter Check         2  78.8      7  
## 3 ClayCenter Check         3  85.9      7  
## 4 ClayCenter Check         4  80.0      9  
## 5 ClayCenter R3            1  77.2      3.1
## 6 ClayCenter R3            2  83.3      3.2
str(data)
## spc_tbl_ [96 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Location : chr [1:96] "ClayCenter" "ClayCenter" "ClayCenter" "ClayCenter" ...
##  $ Treatment: chr [1:96] "Check" "Check" "Check" "Check" ...
##  $ Rep      : num [1:96] 1 2 3 4 1 2 3 4 1 2 ...
##  $ Yield    : num [1:96] 77.5 78.8 85.9 80 77.2 ...
##  $ Severity : num [1:96] 8 7 7 9 3.1 3.2 3.4 3.4 6 5.4 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Location = col_character(),
##   ..   Treatment = col_character(),
##   ..   Rep = col_double(),
##   ..   Yield = col_double(),
##   ..   Severity = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
summary(data)
##    Location          Treatment              Rep           Yield      
##  Length:96          Length:96          Min.   :1.00   Min.   :58.95  
##  Class :character   Class :character   1st Qu.:1.75   1st Qu.:72.21  
##  Mode  :character   Mode  :character   Median :2.50   Median :77.79  
##                                        Mean   :2.50   Mean   :77.92  
##                                        3rd Qu.:3.25   3rd Qu.:84.82  
##                                        Max.   :4.00   Max.   :94.99  
##     Severity     
##  Min.   : 2.700  
##  1st Qu.: 3.200  
##  Median : 4.550  
##  Mean   : 5.146  
##  3rd Qu.: 6.425  
##  Max.   :10.000

4.1 Summary Statistics by Location and Treatment

Summarize by Group. We want the mean, standard deviation, and standard error by location and treatment.

# Summary statistics by location and treatment
# %>% is the pipe operator. It sends the output from one step into the next step.
# This step calculates the mean yield, standard deviation,
# number of observations, and standard error for each treatment
# within each location.
# data → group it → then summarize it.
summary_df <- data %>%
  group_by(Location, Treatment) %>% 
  summarize(
    MeanYield = mean(Yield, na.rm = TRUE),
    SD = sd(Yield, na.rm = TRUE),
    n = n(),
    SE = SD / sqrt(n),
    .groups = "drop"
  )
head(summary_df)
## # A tibble: 6 × 6
##   Location   Treatment MeanYield    SD     n    SE
##   <chr>      <chr>         <dbl> <dbl> <int> <dbl>
## 1 ClayCenter Check          80.5  3.73     4 1.86 
## 2 ClayCenter R3             77.7  4.70     4 2.35 
## 3 ClayCenter V4             84.0  3.61     4 1.81 
## 4 ClayCenter V4R3           85.3  4.12     4 2.06 
## 5 Clinton    Check          84.4  5.22     4 2.61 
## 6 Clinton    R3             84.1  1.71     4 0.856
# Set plotting order for Treatment
#Converts Treatment into a factor.
#Sets the order of those categories. We can change order if needed.
summary_df$Treatment <- factor(summary_df$Treatment, levels = c("Check", "V4", "R3", "V4R3"))
head(summary_df)
## # A tibble: 6 × 6
##   Location   Treatment MeanYield    SD     n    SE
##   <chr>      <fct>         <dbl> <dbl> <int> <dbl>
## 1 ClayCenter Check          80.5  3.73     4 1.86 
## 2 ClayCenter R3             77.7  4.70     4 2.35 
## 3 ClayCenter V4             84.0  3.61     4 1.81 
## 4 ClayCenter V4R3           85.3  4.12     4 2.06 
## 5 Clinton    Check          84.4  5.22     4 2.61 
## 6 Clinton    R3             84.1  1.71     4 0.856

4.1.1 ANOVA and LSD Test

This compares treatment effects statistically.This step runs a separate ANOVA and LSD test for each location. It creates a results table containing the LSD value and ANOVA p-value, which are later used to decide whether treatment differences are significant and to annotate the plot.

lsd_results <- data %>%
  group_by(Location) %>%
  do({
    model <- aov(Yield ~ Treatment, data = .)
    pval <- summary(model)[[1]]$`Pr(>F)`[1]
    lsd_out <- LSD.test(model, "Treatment", alpha = 0.10, group = FALSE)
    data.frame(LSD = lsd_out$statistics$LSD, p_value = pval)
  }) %>%
  ungroup()

lsd_results
## # A tibble: 6 × 3
##   Location     LSD p_value
##   <chr>      <dbl>   <dbl>
## 1 ClayCenter  5.12 0.0806 
## 2 Clinton     5.36 0.735  
## 3 Janesville  5.05 0.769  
## 4 Malta       3.13 0.00257
## 5 Slater      5.38 0.525  
## 6 Waterloo    3.08 0.286

4.1.2 Preparing Annotations for Plot

annotation_df <- summary_df %>%
  group_by(Location) %>%
  summarize(MaxMean = max(MeanYield)) %>%
  left_join(lsd_results, by = "Location") %>%
  mutate(
    LSD_label = ifelse(
      p_value < 0.10,
      paste0("LSD(0.10) = ", round(LSD, 1)),
      "LSD(0.10) = NS"
    )
  )

annotation_df
## # A tibble: 6 × 5
##   Location   MaxMean   LSD p_value LSD_label      
##   <chr>        <dbl> <dbl>   <dbl> <chr>          
## 1 ClayCenter    85.3  5.12 0.0806  LSD(0.10) = 5.1
## 2 Clinton       87.0  5.36 0.735   LSD(0.10) = NS 
## 3 Janesville    87.7  5.05 0.769   LSD(0.10) = NS 
## 4 Malta         71.8  3.13 0.00257 LSD(0.10) = 3.1
## 5 Slater        76.5  5.38 0.525   LSD(0.10) = NS 
## 6 Waterloo      75.7  3.08 0.286   LSD(0.10) = NS

4.2 Plot: Fungicide Timing Effect on Yield

# this code chunk has a figure plotted, so 'fig.cap' needed.



my_colors <- c(
  "Check" = "#999999",
  "V4"    = "#0072B2",
  "R3"    = "#E69F00",
  "V4R3"  = "#009E73"
)

   #"Check" = "#E69F00",
 # "R3" = "#F0E442",
 # "V4" = "#0072B2",
 # "V4R3" = "#444444"
#)


yield_plot <- ggplot(summary_df, aes(x = Location, y = MeanYield, fill = Treatment)) +
  geom_col(position = position_dodge(0.8), width = 0.7, color = "black") +
  geom_errorbar(
    aes(ymin = MeanYield - SE, ymax = MeanYield + SE),
    position = position_dodge(0.8), width = 0.3
  ) +
  geom_text(
    aes(label = round(MeanYield, 1), group = Treatment),
    position = position_dodge(0.8),
    vjust = 0.7,
    hjust = 4,
    angle = 90,
    fontface = "bold",
    color = "white",
    size = 3
  ) +
  geom_text(
    data = annotation_df,
    aes(x = Location, y = MaxMean + 5, label = LSD_label),
    inherit.aes = FALSE,
    size = 2.8,
    fontface = "bold"
  ) +
  scale_fill_manual(values = my_colors) +
  scale_y_continuous(limits = c(0, 100), expand = expansion(mult = c(0, 0.12))) +
  labs(
    title = "Effect of Fungicide Timing on Yield",
    x = "Location",
    y = "Yield (bu/A)",
    fill = NULL
  ) +
  theme_minimal(base_size = 14) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, face = "bold", size = 12, color = "black"),
    axis.text.y = element_text(face = "bold", size = 12, color = "black"),
    axis.title = element_text(face = "bold", size = 14, color = "black"),
    axis.line = element_line(color = "black"),
    axis.ticks = element_line(color = "black"),
    panel.grid.major.x = element_blank(),
    panel.grid.minor = element_blank(),
    plot.title = element_text(hjust = 0.5, face = "bold", size = 16)
  )

print(yield_plot)
Effect of Fungicide Timing on Yield

Figure 4.1: Effect of Fungicide Timing on Yield

4.3 Saving plot in working directory

ggsave("Yield_Fungicide_Effect.png", plot = yield_plot, dpi = 900, width = 12, height = 10, units = "in")

4.4 Congrats!

You have completed a full R analysis workflow on given a dataset:

  • Loaded data.

  • Summarized results.

  • Ran statistical tests.

  • Created and saved a custom plot.

Let me know if you have questions, or experiment by changing some numbers in the dataset and re-running the analysis!