Introduction

Fusiform rust disease is one of the most important tree diseases in the Southeastern U.S. Caused by Cronartium quercuum f.sp fusiforme (Cfq), the disease greatly impacts loblolly pine, the most widely planted timber species for this region (Amerson et al. 2015). Annual losses due to Fusiform rust infection in loblolly and slash pine is estimate to be $35 million annually for the range of Virginia to Florida in the Southeast (Anderson et al. 1986). However, genetic variation for resistance in loblolly pine is recognized for the Fusiform rust fungus (McKeand et al. 1999).

Materials and Methods

Genetic Material

The Tree Improvement Program at North Carolina State University (NCSUCTIP) is a cooperative designed to breed loblolly pine for genetic gain and improved performance. NCSUCTIP chose the Atlantic Coastal Elite (ACE) population, an intensively managed elite breeding population, to conduct an analysis of genetic variation of Fusiform rust disease resistance. A three-disconnected diallel mating design was used to mate 24 trees with ach diallel comprising of 8 trees. 76 crosses were produced and for each cross, 120 full-sib seedlings were grown in a greenhouse for Cqf resistance screening prior to field planting.

Artificial Inoculation

The inoculation of the Atlantic Coastal Elite population took place at the Resistance Screening Center (RSC) located in Asheville, North Carolina. 76 crosses were inoculated with 120 seedlings per cross and 53 families were analyzed for disease incidence. 23 crosses were removed from analysis due to higher disease incidence. Additionally, seven coastal checklots were inoculated to serve as baseline comparison against the crosses of interest in the ACE population.

A mixed inoculum was used in the experiment that represents the coastal deployment of loblolly pine. Five regions were collected by the RSC to create the single bulk mix comprised of three ten-gall mixes from random trees from each region. Inocula were sprayed at a density of 50,000 spores per milliliter using a controlled basidiospores inoculation system, placed in a humidity chamber set for optimal growth with temperatures ideal for fungal infection, then moved to the RSC greenhouses (Young et al. 2018, unpublished). Screening occurred after nine months for the presence or absence of galls on the seedlings (no gall=0, gall=1).

Statistical Analysis

Categorical data analysis for the female and male parents were identified as predictors variables and the results (no gall=0, gall=1) of the inoculation were treated as qualitative binary data.

We fit a generalized linear mixed model to partition observed variation into genetic and environmental components.

\[log⁡(p/(1-p))= μ+ R_i+F_j+RF_{ij}+E_{ij}\]

\(log⁡(p/(1-p)\) is the log odds of the presence of a gall.

\(μ\) is the intercept and conditional mean.

\(R_i\) is the replication effect.

\(F_i\) is the family effect.

\(RF_{ij}\) is the random effect for the replication by family interaction \(∼N(0,Iσ_{rf}^2)\).

\(E_{ij}\) is the residual error with \(∼N(0,Iσ^2)\).

R was used in the statistical analysis of this data as well as the use of packages ggplot2, tidyverse, asremlPlus, and lme4.

R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.

H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.

Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686

Chris Brien (2021). asremlPlus: Augments ‘ASReml-R’ in Fitting Mixed Models and Packages Generally in Exploring Prediction Differences. R package version 4.2-32. https://CRAN.R-project.org/package=asremlPlus

Douglas Bates, Martin Maechler, Ben Bolker, Steve Walker (2015). Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.

Load Data

ACEdata <- read.table("/Users/sarahconner/Desktop/R data/ACE_screen_data_nochecks_recoded.csv",  sep=",", header = TRUE)
str(ACEdata)
## 'data.frame':    8846 obs. of  9 variables:
##  $ female : chr  "N111069" "N111069" "N111069" "N111069" ...
##  $ male   : chr  "N11210" "N11210" "N11210" "N11210" ...
##  $ code   : chr  "N111069_x_N11210" "N111069_x_N11210" "N111069_x_N11210" "N111069_x_N11210" ...
##  $ rep    : chr  "A" "A" "A" "A" ...
##  $ tray   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ disease: int  1 1 1 1 1 1 1 1 0 0 ...
##  $ femaleR: int  146 146 146 146 146 146 146 146 146 146 ...
##  $ maleR  : int  137 137 137 137 137 137 137 137 137 137 ...
##  $ codeR  : chr  "146_x_137" "146_x_137" "146_x_137" "146_x_137" ...
names(ACEdata)
## [1] "female"  "male"    "code"    "rep"     "tray"    "disease" "femaleR"
## [8] "maleR"   "codeR"
attach(ACEdata)

Visualize the Data

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.6     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# overall mean for disease
ACEdata %>% 
  dplyr::summarize(diseasemean=mean(disease, na.rm=TRUE), n=n())
##   diseasemean    n
## 1   0.4739995 8846
#Family means
ACEdata %>% 
  group_by(codeR) %>% 
  dplyr::summarize(diseasemean=mean(disease, na.rm=TRUE), n=n())
## # A tibble: 60 x 3
##    codeR     diseasemean     n
##  * <chr>           <dbl> <int>
##  1 110_x_113       0.639   119
##  2 110_x_116       0.575   240
##  3 110_x_121       0.2     120
##  4 110_x_143       0.432   111
##  5 110_x_160       0.592   360
##  6 110_x_179       0.678    59
##  7 113_x_143       0.567   120
##  8 113_x_160       0.458   120
##  9 116_x_116       0.625   120
## 10 121_x_113       0.2     120
## # … with 50 more rows

Summary of Families

# Family Means for Disease
familym <- ACEdata %>% 
  group_by(codeR) %>% 
  dplyr::summarize(
    diseasemean=mean(disease, na.rm=TRUE)
  )

ggplot(data=familym, aes(x=reorder(codeR, -diseasemean), y=diseasemean), na.rm = T) +
  geom_bar(stat="identity", color="skyblue", fill="white")+
  scale_fill_brewer(palette="Blues") +
  ylim(0,1) +
  theme_minimal() +
  labs(x="Family ID", y="Disease") 

Model Diagnostic

The following code is an example code using ASReml-R and glmer to fit a Generalized Linear Mixed Model for this binary dataset that includes fixed and random effects.

#library(asremlPlus)
#asr <- asreml(fixed = disease ~ 1 + rep + codeR,
#              random= ~ rep:codeR,
#              family = list(asr_binomial(link = "logit", dispersion = 1, total = NULL)),
#              na.action=na.method(x="include", y="include"),
#              data=tfir, maxit=100)
#summary(asr)

#library(lme4)
#model <- glmer(disease ~ rep + codeR + (rep | codeR), data = ACEdata, family = binomial)
#summary(model)