This 433/533 class exercise will focus on the analysis of fungicide application timing and the efficacy of different products on crop yield using field plot data. I extracted the first set of data (mock_yield_data) from the picture (multi-location trial) we used in one of our classes. Another set of data is created from multi-location, multi-product trials (test, test2, test4, test4, etc.). The data set is stored in an Excel file that is in the working directory and includes different treatments applied at various locations. Make sure use a correct folder path where you have all the files for subsequent analysis.
This exercise will help you learn how to:
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com. It’s a combination of: - Markdown (text with formatting) - R code chunks (sections of code you can run) - Output (plots, summaries, tables)
# Webling: https://posit.co/download/rstudio-desktop/
# Note first install 'R' then 'Rstudio':
Packages are tools that add functions to R. Run on console to get required packages.
install.packages(“librarian”) install.packages(“bookdown”)
packages <- c(“ggplot2”, “dplyr”, “readxl”, “knitr”, “tidyr”, “agricolae”, “here”, “bookdown”)
new_packages <- packages[!(packages %in% installed.packages()[,“Package”])] if(length(new_packages)) install.packages(new_packages)
new_packages <- packages[!(packages %in% installed.packages()[,“Package”])]; if(length(new_packages)) install.packages(new_packages)
librarian::shelf(readr, knitr,readxl, dplyr, tidyr, ggplot2, agricolae, here, bookdown)
Alternatively
packages <- c(
"readr", "readxl", "dplyr", "tidyr",
"ggplot2", "agricolae", "here", "bookdown", "knitr"
)
new_packages <- packages[!(packages %in% installed.packages()[, "Package"])]
if (length(new_packages)) install.packages(new_packages)
invisible(lapply(packages, library, character.only = TRUE))
# The 'librarian' package checks if packages are installed, verifies their version,
# installs only if needed, and then loads all desired packages into the R session.
# however it required CRAN mirroring, otherwise fails.
librarian::shelf(readr, knitr,readxl, dplyr, tidyr, ggplot2, agricolae, here)
This is the folder where R will look for your files.
setwd("C:/Users/Shyam.Solanki/Dropbox/SDSU/Teaching/Diseases_of_Field_Crops_Major_years/2026_spring/Class_40/DFC_data")
#If you get error: Classic Windows path issue in R:backslashes (\) in the file path
# Use forward slashes /: if you are copying path from Windows
# use your working directory. Can I make red colored?
💡 Note: Make sure the folder path above is correct for your computer.
Example path: C:/Users/huma0/Desktop/Data
We will load a CSV file named mock_yield_data.csv.
data <- read_csv("mock_yield_data.csv")
## Rows: 96 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Location, Treatment
## dbl (3): Rep, Yield, Severity
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#use read_csv or read_excel
# View the first few rows
#Explore the Data
head(data)
## # A tibble: 6 × 5
## Location Treatment Rep Yield Severity
## <chr> <chr> <dbl> <dbl> <dbl>
## 1 ClayCenter Check 1 77.5 8
## 2 ClayCenter Check 2 78.8 7
## 3 ClayCenter Check 3 85.9 7
## 4 ClayCenter Check 4 80.0 9
## 5 ClayCenter R3 1 77.2 3.1
## 6 ClayCenter R3 2 83.3 3.2
str(data)
## spc_tbl_ [96 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Location : chr [1:96] "ClayCenter" "ClayCenter" "ClayCenter" "ClayCenter" ...
## $ Treatment: chr [1:96] "Check" "Check" "Check" "Check" ...
## $ Rep : num [1:96] 1 2 3 4 1 2 3 4 1 2 ...
## $ Yield : num [1:96] 77.5 78.8 85.9 80 77.2 ...
## $ Severity : num [1:96] 8 7 7 9 3.1 3.2 3.4 3.4 6 5.4 ...
## - attr(*, "spec")=
## .. cols(
## .. Location = col_character(),
## .. Treatment = col_character(),
## .. Rep = col_double(),
## .. Yield = col_double(),
## .. Severity = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(data)
## Location Treatment Rep Yield
## Length:96 Length:96 Min. :1.00 Min. :58.95
## Class :character Class :character 1st Qu.:1.75 1st Qu.:72.21
## Mode :character Mode :character Median :2.50 Median :77.79
## Mean :2.50 Mean :77.92
## 3rd Qu.:3.25 3rd Qu.:84.82
## Max. :4.00 Max. :94.99
## Severity
## Min. : 2.700
## 1st Qu.: 3.200
## Median : 4.550
## Mean : 5.146
## 3rd Qu.: 6.425
## Max. :10.000
Summarize by Group. We want the mean, standard deviation, and standard error by location and treatment.
# Summary statistics by location and treatment
# %>% is the pipe operator. It sends the output from one step into the next step.
# This step calculates the mean yield, standard deviation,
# number of observations, and standard error for each treatment
# within each location.
# data → group it → then summarize it.
summary_df <- data %>%
group_by(Location, Treatment) %>%
summarize(
MeanYield = mean(Yield, na.rm = TRUE),
SD = sd(Yield, na.rm = TRUE),
n = n(),
SE = SD / sqrt(n),
.groups = "drop"
)
head(summary_df)
## # A tibble: 6 × 6
## Location Treatment MeanYield SD n SE
## <chr> <chr> <dbl> <dbl> <int> <dbl>
## 1 ClayCenter Check 80.5 3.73 4 1.86
## 2 ClayCenter R3 77.7 4.70 4 2.35
## 3 ClayCenter V4 84.0 3.61 4 1.81
## 4 ClayCenter V4R3 85.3 4.12 4 2.06
## 5 Clinton Check 84.4 5.22 4 2.61
## 6 Clinton R3 84.1 1.71 4 0.856
# Set plotting order for Treatment
#Converts Treatment into a factor.
#Sets the order of those categories. We can change order if needed.
summary_df$Treatment <- factor(summary_df$Treatment, levels = c("Check", "V4", "R3", "V4R3"))
head(summary_df)
## # A tibble: 6 × 6
## Location Treatment MeanYield SD n SE
## <chr> <fct> <dbl> <dbl> <int> <dbl>
## 1 ClayCenter Check 80.5 3.73 4 1.86
## 2 ClayCenter R3 77.7 4.70 4 2.35
## 3 ClayCenter V4 84.0 3.61 4 1.81
## 4 ClayCenter V4R3 85.3 4.12 4 2.06
## 5 Clinton Check 84.4 5.22 4 2.61
## 6 Clinton R3 84.1 1.71 4 0.856
This compares treatment effects statistically.This step runs a separate ANOVA and LSD test for each location. It creates a results table containing the LSD value and ANOVA p-value, which are later used to decide whether treatment differences are significant and to annotate the plot.
lsd_results <- data %>%
group_by(Location) %>%
do({
model <- aov(Yield ~ Treatment, data = .)
pval <- summary(model)[[1]]$`Pr(>F)`[1]
lsd_out <- LSD.test(model, "Treatment", alpha = 0.10, group = FALSE)
data.frame(LSD = lsd_out$statistics$LSD, p_value = pval)
}) %>%
ungroup()
lsd_results
## # A tibble: 6 × 3
## Location LSD p_value
## <chr> <dbl> <dbl>
## 1 ClayCenter 5.12 0.0806
## 2 Clinton 5.36 0.735
## 3 Janesville 5.05 0.769
## 4 Malta 3.13 0.00257
## 5 Slater 5.38 0.525
## 6 Waterloo 3.08 0.286
annotation_df <- summary_df %>%
group_by(Location) %>%
summarize(MaxMean = max(MeanYield)) %>%
left_join(lsd_results, by = "Location") %>%
mutate(
LSD_label = ifelse(
p_value < 0.10,
paste0("LSD(0.10) = ", round(LSD, 1)),
"LSD(0.10) = NS"
)
)
annotation_df
## # A tibble: 6 × 5
## Location MaxMean LSD p_value LSD_label
## <chr> <dbl> <dbl> <dbl> <chr>
## 1 ClayCenter 85.3 5.12 0.0806 LSD(0.10) = 5.1
## 2 Clinton 87.0 5.36 0.735 LSD(0.10) = NS
## 3 Janesville 87.7 5.05 0.769 LSD(0.10) = NS
## 4 Malta 71.8 3.13 0.00257 LSD(0.10) = 3.1
## 5 Slater 76.5 5.38 0.525 LSD(0.10) = NS
## 6 Waterloo 75.7 3.08 0.286 LSD(0.10) = NS
# this code chunk has a figure plotted, so 'fig.cap' needed.
my_colors <- c(
"Check" = "#999999",
"V4" = "#0072B2",
"R3" = "#E69F00",
"V4R3" = "#009E73"
)
#"Check" = "#E69F00",
# "R3" = "#F0E442",
# "V4" = "#0072B2",
# "V4R3" = "#444444"
#)
yield_plot <- ggplot(summary_df, aes(x = Location, y = MeanYield, fill = Treatment)) +
geom_col(position = position_dodge(0.8), width = 0.7, color = "black") +
geom_errorbar(
aes(ymin = MeanYield - SE, ymax = MeanYield + SE),
position = position_dodge(0.8), width = 0.3
) +
geom_text(
aes(label = round(MeanYield, 1), group = Treatment),
position = position_dodge(0.8),
vjust = 0.7,
hjust = 4,
angle = 90,
fontface = "bold",
color = "white",
size = 3
) +
geom_text(
data = annotation_df,
aes(x = Location, y = MaxMean + 5, label = LSD_label),
inherit.aes = FALSE,
size = 2.8,
fontface = "bold"
) +
scale_fill_manual(values = my_colors) +
scale_y_continuous(limits = c(0, 100), expand = expansion(mult = c(0, 0.12))) +
labs(
title = "Effect of Fungicide Timing on Yield",
x = "Location",
y = "Yield (bu/A)",
fill = NULL
) +
theme_minimal(base_size = 14) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1, face = "bold", size = 12, color = "black"),
axis.text.y = element_text(face = "bold", size = 12, color = "black"),
axis.title = element_text(face = "bold", size = 14, color = "black"),
axis.line = element_line(color = "black"),
axis.ticks = element_line(color = "black"),
panel.grid.major.x = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(hjust = 0.5, face = "bold", size = 16)
)
print(yield_plot)
Figure 4.1: Effect of Fungicide Timing on Yield
ggsave("Yield_Fungicide_Effect.png", plot = yield_plot, dpi = 900, width = 12, height = 10, units = "in")
You have completed a full R analysis workflow on given a dataset:
Loaded data.
Summarized results.
Ran statistical tests.
Created and saved a custom plot.
Let me know if you have questions, or experiment by changing some numbers in the dataset and re-running the analysis!