Fusiform rust disease is one of the most important tree diseases in the Southeastern U.S. Caused by Cronartium quercuum f.sp fusiforme (Cfq), the disease greatly impacts loblolly pine, the most widely planted timber species for this region (Amerson et al. 2015). Annual losses due to Fusiform rust infection in loblolly and slash pine is estimate to be $35 million annually for the range of Virginia to Florida in the Southeast (Anderson et al. 1986). However, genetic variation for resistance in loblolly pine is recognized for the Fusiform rust fungus (McKeand et al. 1999).
The Tree Improvement Program at North Carolina State University (NCSUCTIP) is a cooperative designed to breed loblolly pine for genetic gain and improved performance. NCSUCTIP chose the Atlantic Coastal Elite (ACE) population, an intensively managed elite breeding population, to conduct an analysis of genetic variation of Fusiform rust disease resistance. A three-disconnected diallel mating design was used to mate 24 trees with ach diallel comprising of 8 trees. 76 crosses were produced and for each cross, 120 full-sib seedlings were grown in a greenhouse for Cqf resistance screening prior to field planting.
The inoculation of the Atlantic Coastal Elite population took place at the Resistance Screening Center (RSC) located in Asheville, North Carolina. 76 crosses were inoculated with 120 seedlings per cross and 53 families were analyzed for disease incidence. 23 crosses were removed from analysis due to higher disease incidence. Additionally, seven coastal checklots were inoculated to serve as baseline comparison against the crosses of interest in the ACE population.
A mixed inoculum was used in the experiment that represents the coastal deployment of loblolly pine. Five regions were collected by the RSC to create the single bulk mix comprised of three ten-gall mixes from random trees from each region. Inocula were sprayed at a density of 50,000 spores per milliliter using a controlled basidiospores inoculation system, placed in a humidity chamber set for optimal growth with temperatures ideal for fungal infection, then moved to the RSC greenhouses (Young et al. 2018, unpublished). Screening occurred after nine months for the presence or absence of galls on the seedlings (no gall=0, gall=1).
Categorical data analysis for the female and male parents were identified as predictors variables and the results (no gall=0, gall=1) of the inoculation were treated as qualitative binary data.
We fit a generalized linear mixed model to partition observed variation into genetic and environmental components.
\[log(p/(1-p))= μ+ R_i+F_j+RF_{ij}+E_{ij}\]
\(log(p/(1-p)\) is the log odds of the presence of a gall.
\(μ\) is the intercept and conditional mean.
\(R_i\) is the replication effect.
\(F_i\) is the family effect.
\(RF_{ij}\) is the random effect for the replication by family interaction \(∼N(0,Iσ_{rf}^2)\).
\(E_{ij}\) is the residual error with \(∼N(0,Iσ^2)\).
R was used in the statistical analysis of this data as well as the use of packages ggplot2, tidyverse, asremlPlus, and lme4.
R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686
Chris Brien (2021). asremlPlus: Augments ‘ASReml-R’ in Fitting Mixed Models and Packages Generally in Exploring Prediction Differences. R package version 4.2-32. https://CRAN.R-project.org/package=asremlPlus
Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
ACEdata <- read.table("/Users/sarahconner/Desktop/R data/ACE_screen_data_nochecks_recoded.csv", sep=",", header = TRUE)
str(ACEdata)
## 'data.frame': 8846 obs. of 9 variables:
## $ female : chr "N111069" "N111069" "N111069" "N111069" ...
## $ male : chr "N11210" "N11210" "N11210" "N11210" ...
## $ code : chr "N111069_x_N11210" "N111069_x_N11210" "N111069_x_N11210" "N111069_x_N11210" ...
## $ rep : chr "A" "A" "A" "A" ...
## $ tray : int 1 1 1 1 1 1 1 1 1 1 ...
## $ disease: int 1 1 1 1 1 1 1 1 0 0 ...
## $ femaleR: int 146 146 146 146 146 146 146 146 146 146 ...
## $ maleR : int 137 137 137 137 137 137 137 137 137 137 ...
## $ codeR : chr "146_x_137" "146_x_137" "146_x_137" "146_x_137" ...
names(ACEdata)
## [1] "female" "male" "code" "rep" "tray" "disease" "femaleR"
## [8] "maleR" "codeR"
attach(ACEdata)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.6 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# overall mean for disease
ACEdata %>%
dplyr::summarize(diseasemean=mean(disease, na.rm=TRUE), n=n())
## diseasemean n
## 1 0.4739995 8846
#Family means
ACEdata %>%
group_by(codeR) %>%
dplyr::summarize(diseasemean=mean(disease, na.rm=TRUE), n=n())
## # A tibble: 60 x 3
## codeR diseasemean n
## * <chr> <dbl> <int>
## 1 110_x_113 0.639 119
## 2 110_x_116 0.575 240
## 3 110_x_121 0.2 120
## 4 110_x_143 0.432 111
## 5 110_x_160 0.592 360
## 6 110_x_179 0.678 59
## 7 113_x_143 0.567 120
## 8 113_x_160 0.458 120
## 9 116_x_116 0.625 120
## 10 121_x_113 0.2 120
## # … with 50 more rows
# Family Means for Disease
familym <- ACEdata %>%
group_by(codeR) %>%
dplyr::summarize(
diseasemean=mean(disease, na.rm=TRUE)
)
ggplot(data=familym, aes(x=reorder(codeR, -diseasemean), y=diseasemean), na.rm = T) +
geom_bar(stat="identity", color="skyblue", fill="white")+
scale_fill_brewer(palette="Blues") +
ylim(0,1) +
theme_minimal() +
labs(x="Family ID", y="Disease")
The following code is an example code using ASReml-R and glmer to fit a Generalized Linear Mixed Model for this binary dataset that includes fixed and random effects.
#library(asremlPlus)
#asr <- asreml(fixed = disease ~ 1 + rep + codeR,
# random= ~ rep:codeR,
# family = list(asr_binomial(link = "logit", dispersion = 1, total = NULL)),
# na.action=na.method(x="include", y="include"),
# data=tfir, maxit=100)
#summary(asr)
#library(lme4)
#model <- glmer(disease ~ rep + codeR + (rep | codeR), data = ACEdata, family = binomial)
#summary(model)