Alencar Xavier

Outline

Part 1: Overview

  • Introduction
  • Breeding analytics, impact, history
  • General framework

Part 2: Operations

  • Routine analytics
  • Data visualization
  • Exploratory analysis

Part 1 - Overview

Introduction

  • Analytics help us to work smart and to achieve our goals more efficiently.

  • Analytics empower breeders to understand the nuances, strengths and pitfalls of their data.

Breeding analytics

  • Data visualization: Quality control of locations, experiments, traits

  • Phenotypic analysis: BLUPs, repeatability, spatial, product placement

  • Genomic analysis: Predictions, cross combinations, diversity, QTLs

  • Phenomic analysis: Automation, multi-trait analysis, quality control

  • Envirotyping analysis: Environmental characterization, GxE predictions

  • Optimization analysis: Experimental designs, resource allocation (replicates vs. locations)

Impact of breeding analytics

Where is the main impact of breeding analytics?

  • ADD VALUE: Optimize the breeding pipeline
  • INTUITIVE DECISIONS: Descriptive + Predictive + Prescriptive
  • EMPOWERMENT: Enhance understanding and free up breeders’ time

Analytics must be (1) simple, (2) useful and (3) interpretable

History of breeding analytics

  • Francis Galton (1886): Regression and heritability
  • Ronald Fisher (1918): Infinitesimal model, P = G + E
  • Sewall Wright (1921, 1922): Inheritance and genetic relationship
  • Charles Henderson (1968): Modeling genetics with relationship

General framework

Why are mixed models so popular?

Common model setup

Mixed model notation

  • Linear model: \(y=Xb+Zu+e\)
  • Genetic variance: \(V(u)=G=A\sigma^2_a\)
  • Residual variance: \(V(e)=R=I\sigma^2_e\)
  • Henderson’s equation (\(Cg=r\))

\[ \left[\begin{array}{rr}X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z+G^{-1}\end{array}\right] \left[\begin{array}{r} b \\ u \end{array}\right] =\left[\begin{array}{r} X'R^{-1}y \\ Z'R^{-1}y \end{array}\right] \]

  • We know (data): \(x=\{y,X,Z,A\}\)
  • We want (parameters): \(\theta=\{b,u,\sigma^{2}_{a},\sigma^{2}_{e}\}\)
  • Parameter estimation based on Gaussian likelihood: \(L(x|\theta)\)
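As a minimal sketch of the notation above, the following NumPy code builds the coefficient matrix \(C\) and right-hand side \(r\) of Henderson's equations and solves \(Cg=r\) for known variance components. The function name `solve_mme`, the toy phenotypes, the relationship matrix and the variance values are all hypothetical, chosen only for illustration.

```python
import numpy as np

def solve_mme(y, X, Z, A, var_a, var_e):
    """Build and solve Henderson's equations C g = r for y = Xb + Zu + e."""
    p = X.shape[1]
    R_inv = np.eye(len(y)) / var_e        # R = I * var_e
    G_inv = np.linalg.inv(A) / var_a      # G = A * var_a
    C = np.block([[X.T @ R_inv @ X, X.T @ R_inv @ Z],
                  [Z.T @ R_inv @ X, Z.T @ R_inv @ Z + G_inv]])
    r = np.concatenate([X.T @ R_inv @ y, Z.T @ R_inv @ y])
    g = np.linalg.solve(C, r)             # g stacks the solutions for b and u
    return g[:p], g[p:], C

# Hypothetical toy data: 3 genotypes, 2 replicates each, intercept-only fixed effect
y = np.array([10.0, 12.0, 14.0, 15.0, 9.0, 8.0])
X = np.ones((6, 1))
Z = np.kron(np.eye(3), np.ones((2, 1)))   # incidence matrix: plots to genotypes
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])            # assumed relationship matrix
var_a, var_e = 1.0, 1.0                   # variance components treated as known

b_hat, u_hat, C = solve_mme(y, X, Z, A, var_a, var_e)
print("b_hat:", b_hat)
print("u_hat:", u_hat)
```

The dense inverse of \(A\) is only reasonable for small examples; production software typically builds \(A^{-1}\) directly and exploits sparsity.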

Mixed model solutions

Variance components (\(\partial L / \partial \sigma^2_i\))

  • \(\sigma^2_a = \frac{\hat{u}' A^{-1}\hat{u} + tr(A^{-1}C^{22})}{q} \approx \frac{y'S'Z\hat{u}}{tr(SZAZ')}\)
  • \(\sigma^2_e = \frac{y'\hat{e}}{n-rank(X)}\)
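The updates above can be iterated until the variance components stabilize. The sketch below is one possible EM-style loop, not a definitive implementation: it assumes the genetic update uses the trace term \(tr(A^{-1}C^{22})\), where \(C^{22}\) is the random-effect block of the inverse coefficient matrix, and it uses simulated data (50 unrelated genotypes, 4 replicates, true variances 2 and 1) that are purely hypothetical. The function name `em_reml` and its defaults are also illustrative.

```python
import numpy as np

def em_reml(y, X, Z, A, var_a=1.0, var_e=1.0, n_iter=200, tol=1e-8):
    """Iterate the variance component updates alongside the mixed model solutions."""
    n, p = X.shape
    q = Z.shape[1]
    A_inv = np.linalg.inv(A)
    rank_X = np.linalg.matrix_rank(X)
    for _ in range(n_iter):
        # Henderson's equations with R = I * var_e and G = A * var_a
        C = np.block([[X.T @ X / var_e, X.T @ Z / var_e],
                      [Z.T @ X / var_e, Z.T @ Z / var_e + A_inv / var_a]])
        r = np.concatenate([X.T @ y, Z.T @ y]) / var_e
        C_inv = np.linalg.inv(C)
        sol = C_inv @ r
        b, u = sol[:p], sol[p:]
        C22 = C_inv[p:, p:]                       # prediction error variance of u
        new_a = (u @ A_inv @ u + np.trace(A_inv @ C22)) / q
        e_hat = y - X @ b - Z @ u
        new_e = (y @ e_hat) / (n - rank_X)
        converged = abs(new_a - var_a) + abs(new_e - var_e) < tol
        var_a, var_e = new_a, new_e
        if converged:
            break
    return var_a, var_e, b, u

# Hypothetical simulated trial
rng = np.random.default_rng(1)
q, reps = 50, 4
Z = np.kron(np.eye(q), np.ones((reps, 1)))
X = np.ones((q * reps, 1))
A = np.eye(q)
u_true = rng.normal(0.0, np.sqrt(2.0), q)
y = 10.0 + Z @ u_true + rng.normal(0.0, 1.0, q * reps)

va_hat, ve_hat, b_hat, u_hat = em_reml(y, X, Z, A)
print("var_a:", va_hat, "var_e:", ve_hat)
```

EM-type updates are simple and keep the estimates positive, but they can converge slowly; that trade-off is one reason faster approximations are of interest.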

Key summary statistics

  • \(h^2 = \frac{V(g)}{V(y)} = V^{-1}G\)
  • \(sd(\hat{g}) = \sqrt{Diag(G-C^{22})}\)
  • \(Acc = Cor(\hat{g},g) = \frac{Cov(\hat{g},g)}{\sqrt{V(\hat{g})V(g)} } = \sqrt{\frac{GV^{-1}G}{G}}\)
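A short numerical sketch of these summary statistics, assuming known variance components and the same style of hypothetical toy design as a small replicated trial. Here \(C^{22}\) is taken from the inverse of Henderson's coefficient matrix, \(h^2\) is computed on a plot basis assuming \(diag(A)=1\), and accuracy is derived per genotype from \(V(\hat{g}) = G - C^{22}\); all data values are illustrative.

```python
import numpy as np

# Hypothetical design: 3 genotypes, 2 replicates each, intercept-only fixed effect
X = np.ones((6, 1))
Z = np.kron(np.eye(3), np.ones((2, 1)))
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])            # assumed relationship matrix
var_a, var_e = 1.0, 1.0                   # variance components treated as known
p = X.shape[1]

G = A * var_a
C = np.block([[X.T @ X / var_e, X.T @ Z / var_e],
              [Z.T @ X / var_e, Z.T @ Z / var_e + np.linalg.inv(G)]])
C22 = np.linalg.inv(C)[p:, p:]            # prediction error variance of the genetic values

h2 = var_a / (var_a + var_e)              # plot-level heritability
var_g_hat = G - C22                       # variance of the predicted genetic values
sd_g_hat = np.sqrt(np.diag(var_g_hat))
reliability = np.diag(var_g_hat) / np.diag(G)
accuracy = np.sqrt(reliability)           # Cor(g_hat, g) per genotype

print("h2:", h2)
print("sd(g_hat):", sd_g_hat)
print("accuracy:", accuracy)
```

Genotypes with more replicates or more informative relatives have smaller diagonal elements in \(C^{22}\) and therefore higher accuracy.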

Multivariate mixed models

Mixed models also enable us to evaluate multiple traits

\[ y=\{y_1,y_2,...,y_k\} \]

With multiple traits, the relation among traits is modeled

\[ V(u) = \Sigma_a \otimes A = \left[\begin{array}{rr} A \sigma^2_{a_1} & A \sigma_{a_{12}} \\ A \sigma_{a_{21}} & A \sigma^2_{a_2} \end{array}\right] \]

\[ V(e) = \Sigma_e \otimes I = \left[\begin{array}{rr} I\sigma^2_{e_1} & I\sigma_{e_{12}} \\ I\sigma_{e_{21}} & I\sigma^2_{e_2} \end{array}\right] \]

Why does it matter? Covariances (\(\sigma_{a_{12}}\), \(\sigma_{e_{12}}\)) are extra information!
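To make the two-trait block structure concrete, the sketch below assembles the genetic and residual covariance matrices with `np.kron`, assuming the observations are stacked trait by trait. With that ordering, `np.kron(Sigma_a, A)` reproduces the block layout shown above. The covariance values and the relationship matrix are hypothetical.

```python
import numpy as np

# Hypothetical relationship matrix for 3 genotypes
A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.25],
              [0.0, 0.25, 1.0]])
q = A.shape[0]

# Hypothetical trait covariance matrices (variances on the diagonal)
Sigma_a = np.array([[1.0, 0.6],
                    [0.6, 2.0]])
Sigma_e = np.array([[1.5, 0.3],
                    [0.3, 1.0]])

# Traits stacked one after another: each block is A (or I) times one (co)variance
V_u = np.kron(Sigma_a, A)
V_e = np.kron(Sigma_e, np.eye(q))

# Genetic correlation between the two traits
r_g = Sigma_a[0, 1] / np.sqrt(Sigma_a[0, 0] * Sigma_a[1, 1])

print("V(u) shape:", V_u.shape)
print("genetic correlation:", r_g)
```

The off-diagonal blocks are what let information flow between traits: a well-measured trait can sharpen predictions for a correlated trait that is expensive or noisy to measure.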

Ongoing MM research in plant breeding

Part 2 - Operations

Routine analytics

  • Interpretation varies with model: purely phenotypic or genomic enhanced
  • Granularity: within/across environments, within/across experiments, within/across families

Workflow 1 - Field data analytics

Workflow 2 - Genomic-driven predictive analytics

Data visualization

Phenotypic GxE (observed data)

Additive Genetic GxE (modeled)

Exploratory analysis

Optimization of population size for predictions

Assessing the value of phenomic traits

Dissection of genetic architecture

Germplasm classification

Evaluation of genomic prediction methods

Evaluation of spatial and genetic parametrizations

Development of an efficient multi-trait solver

Thanks!