October 26, 2018

Outline

Part 1: Concepts

  • History of mixed models
  • Mixed models in plant breeding
  • Fixed and random terms
  • Model notation
  • Variance decomposition

Part 2: Applications

  • Selection models
  • Practical examples
  • Variance components
  • Ridges and Kernels

Part 1 - Concepts

History of mixed models

Francis Galton - 1886: Regression and heritability

Ronald Fisher - 1918: Infinitesimal model (P = G + E)

Sewall Wright - 1922: Genetic relationship

Charles Henderson - 1950, 1968: BLUP using relationship

Mixed models in plant breeding

Fixed and random terms

Fixed effect

  • Levels are assumed invariable (often you could not re-collect the data)
  • Inferences are made on the parameters themselves
  • Results cannot be extrapolated to other levels
  • Example: Overall mean and environmental effects

Random effects

  • You may not have all the levels available
  • Inferences are made on variance components
  • Prior assumption: coefficients are normally distributed
  • Results can be extrapolated to other levels of the same population
  • Example: Genetic effects

Let's unleash the beast

Model notation

  • Linear model: \(y=Xb+Zu+e\)
  • With variance: \(y \sim N(Xb,ZKZ'\sigma^{2}_{u}+I\sigma^{2}_{e})\)

Assuming: \(u \sim N(0,K\sigma^{2}_{u})\) and \(e \sim N(0,I\sigma^{2}_{e})\)

Henderson equation

\[ \left[\begin{array}{rr} X'X & X'Z \\ Z'X & Z'Z+\lambda K^{-1} \end{array}\right] \left[\begin{array}{r} b \\ u \end{array}\right] =\left[\begin{array}{r} X'y \\ Z'y \end{array}\right] \]

Summary:

  • We know (data): \(x=\{y,X,Z,K\}\)
  • We want (parameters): \(\theta=\{b,u,\sigma^{2}_{u},\sigma^{2}_{e}\}\)
  • Estimation based on Gaussian likelihood: \(L(x|\theta)\)
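
As a reference, Henderson's equations can be assembled and solved directly in base R. This is a minimal sketch, assuming \(y\), \(X\), \(Z\), \(K\) and the ratio \(\lambda=\sigma^{2}_{e}/\sigma^{2}_{u}\) are already known; solve_mme is a hypothetical helper name.

# Minimal sketch: build and solve Henderson's mixed model equations
solve_mme = function(y, X, Z, K, lambda){
  LHS = rbind(cbind(crossprod(X), crossprod(X, Z)),
              cbind(crossprod(Z, X), crossprod(Z) + lambda * solve(K)))
  RHS = c(crossprod(X, y), crossprod(Z, y))
  g = solve(LHS, RHS)                        # stacked coefficients [b; u]
  list(b = g[1:ncol(X)], u = g[-(1:ncol(X))])
}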

Model notation

  • y = vector of observations (n)
  • X = design matrix of fixed effects (n x p)
  • Z = design (or incidence) matrix of random effects (n x q)
  • K = random effect relationship (covariance structure) matrix (q x q)
  • u = vector of random effect coefficients (q)
  • b = vector of fixed effect coefficients (p)
  • e = vector of residuals (n)
  • \(\sigma^{2}_{a}\) = marker effect variance (1)
  • \(\sigma^{2}_{u}\) = random effect variance (1)
  • \(\sigma^{2}_{e}\) = residual variance (1)
  • \(\lambda=\sigma^{2}_{e}/\sigma^{2}_{u}\) (regularization parameter) (1)

Model notation

The mixed model can also be written as follows

\[y=Wg+e\]

Solved as

\[[W'W+\Sigma] g = [W'y]\]

Where

\(W=[X,Z]\)

\(g=[b,u]\)

\(\Sigma = \left[\begin{array}{rr} 0 & 0 \\ 0 & \lambda K^{-1} \end{array}\right]\)
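
In base R, this compact form amounts to stacking the design matrices and padding the penalty; a minimal sketch, assuming X, Z, K, y and lambda are already in hand:

W = cbind(X, Z)                        # stack fixed and random designs
Sigma = matrix(0, ncol(W), ncol(W))    # no penalty on the fixed block
idx = ncol(X) + seq_len(ncol(Z))
Sigma[idx, idx] = lambda * solve(K)    # penalty only on the random block
g = solve(crossprod(W) + Sigma, crossprod(W, y))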

Variance decomposition

Part 2 - Applications

Selection

1. Genetic values

  • BLUPs or BLUEs from replicated trials
  • Captures additive and non-additive genetics together

2. Breeding values

  • Use pedigree information to create \(K\)
  • Captures additive genetics (heritable)
  • Trials not necessarily replicated

3. Genomic breeding values

  • Genotypic information replaces pedigree
  • Any signal: additivity, dominance and epistasis

Examples

  • Example 1: Balanced data, no kinship
  • Example 2: Balanced data, with kinship
  • Example 3: Unbalanced data, with kinship
  • Example 4: Balanced data, missing individual

Example 1

Data:

##    Env Gen Phe
## 1   E1  G1  47
## 2   E1  G2  51
## 3   E1  G3  46
## 4   E1  G4  58
## 5   E2  G1  52
## 6   E2  G2  46
## 7   E2  G3  52
## 8   E2  G4  54
## 9   E3  G1  53
## 10  E3  G2  48
## 11  E3  G3  58
## 12  E3  G4  52

Model: \(Phenotype = Environment_{(F)} + Genotype_{(R)}\)
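
This small dataset is easy to reconstruct in R; a sketch with the values copied from the table above (object names are assumptions):

dta = data.frame(Env = rep(c('E1','E2','E3'), each = 4),
                 Gen = rep(c('G1','G2','G3','G4'), times = 3),
                 Phe = c(47,51,46,58, 52,46,52,54, 53,48,58,52))
X = model.matrix(~ Env - 1, dta)   # fixed: environments
Z = model.matrix(~ Gen - 1, dta)   # random: genotypes
W = cbind(X, Z)                    # matches the design matrix on the next slide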

Example 1

Design matrix \(W\):

##    EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
## 1      1     0     0     1     0     0     0
## 2      1     0     0     0     1     0     0
## 3      1     0     0     0     0     1     0
## 4      1     0     0     0     0     0     1
## 5      0     1     0     1     0     0     0
## 6      0     1     0     0     1     0     0
## 7      0     1     0     0     0     1     0
## 8      0     1     0     0     0     0     1
## 9      0     0     1     1     0     0     0
## 10     0     0     1     0     1     0     0
## 11     0     0     1     0     0     1     0
## 12     0     0     1     0     0     0     1

Example 1

\(W'W\):

##       EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
## EnvE1     4     0     0     1     1     1     1
## EnvE2     0     4     0     1     1     1     1
## EnvE3     0     0     4     1     1     1     1
## GenG1     1     1     1     3     0     0     0
## GenG2     1     1     1     0     3     0     0
## GenG3     1     1     1     0     0     3     0
## GenG4     1     1     1     0     0     0     3

Example 1

Left-hand side (\(W'W+\Sigma\)):

##       EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
## EnvE1     4     0     0  1.00  1.00  1.00  1.00
## EnvE2     0     4     0  1.00  1.00  1.00  1.00
## EnvE3     0     0     4  1.00  1.00  1.00  1.00
## GenG1     1     1     1  3.17  0.00  0.00  0.00
## GenG2     1     1     1  0.00  3.17  0.00  0.00
## GenG3     1     1     1  0.00  0.00  3.17  0.00
## GenG4     1     1     1  0.00  0.00  0.00  3.17

Assuming independent individuals: \(K=I\)

Regularization: \(\lambda=\sigma^{2}_{e}/\sigma^{2}_{u}=1.64/9.56=0.17\)

Example 1

Right-hand side (\(W'y\)):

##       [,1]
## EnvE1  202
## EnvE2  204
## EnvE3  211
## GenG1  152
## GenG2  145
## GenG3  156
## GenG4  164

Example 1

We can find the coefficients through the penalized least-squares solution

\(g=(LHS)^{-1}(RHS)=(W'W+\Sigma)^{-1}W'y\)

##        [,1]
## EnvE1 50.50
## EnvE2 51.00
## EnvE3 52.75
## GenG1 -0.71
## GenG2 -2.92
## GenG3  0.55
## GenG4  3.08
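
Continuing the sketch from the data slide, these numbers can be reproduced in a few lines, taking the lambda reported earlier as given:

lambda = 1.64 / 9.56                        # regularization from the slide
Sigma = diag(c(rep(0, 3), rep(lambda, 4)))  # K = I, so lambda*K^-1 = lambda*I
g = solve(crossprod(W) + Sigma, crossprod(W, dta$Phe))
round(g, 2)                                 # should match the table above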

Shrinkage

\(BLUE=\frac{w'y}{w'w}=\frac{sum}{n}\) = simple average

\(BLUP=\frac{w'y}{w'w+\lambda}=\frac{sum}{n+\lambda}\) = shrunken (biased) average; with a single observation, \(BLUP=BLUE\times h^2\)

Note:

  • More observations = less shrinkage
  • Higher heritability = less shrinkage: \(\lambda = \frac{1-h^{2}}{h^2}\)
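
For instance, with \(h^2=0.5\) the penalty is \(\lambda=(1-0.5)/0.5=1\), so a genotype observed in \(n=3\) environments is shrunk by \(n/(n+\lambda)=0.75\); with \(h^2=0.9\), \(\lambda\approx 0.11\) and the shrinkage factor rises to about \(0.96\).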

Example 2

If we know the relationship among individuals:

##       GenG1 GenG2 GenG3 GenG4
## GenG1  1.00  0.64  0.23  0.48
## GenG2  0.64  1.00  0.33  0.67
## GenG3  0.23  0.33  1.00  0.31
## GenG4  0.48  0.67  0.31  1.00

Example 2

Then we estimate \(\lambda K^{-1}\)

##       GenG1 GenG2 GenG3 GenG4
## GenG1  0.15 -0.09  0.00 -0.01
## GenG2 -0.09  0.22 -0.02 -0.10
## GenG3  0.00 -0.02  0.10 -0.02
## GenG4 -0.01 -0.10 -0.02  0.17

Regularization: \(\lambda=\sigma^{2}_{e}/\sigma^{2}_{u}=1.64/17.70=0.09\)
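
The penalty block can be checked with a short sketch, typing K in from the slide:

K = matrix(c(1.00, 0.64, 0.23, 0.48,
             0.64, 1.00, 0.33, 0.67,
             0.23, 0.33, 1.00, 0.31,
             0.48, 0.67, 0.31, 1.00), 4, 4)
lambda = 1.64 / 17.70
round(lambda * solve(K), 2)    # compare with the block above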

Example 2

And the left-hand side becomes

##       EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
## EnvE1     4     0     0  1.00  1.00  1.00  1.00
## EnvE2     0     4     0  1.00  1.00  1.00  1.00
## EnvE3     0     0     4  1.00  1.00  1.00  1.00
## GenG1     1     1     1  3.15 -0.09  0.00 -0.01
## GenG2     1     1     1 -0.09  3.22 -0.02 -0.10
## GenG3     1     1     1  0.00 -0.02  3.10 -0.02
## GenG4     1     1     1 -0.01 -0.10 -0.02  3.17

Example 2

We can find the coefficients through the penalized least-squares solution

\(g=(LHS)^{-1}(RHS)=(W'W+\Sigma)^{-1}W'y\)

##        [,1]
## EnvE1 51.05
## EnvE2 51.55
## EnvE3 53.30
## GenG1 -1.32
## GenG2 -3.34
## GenG3  0.03
## GenG4  2.45

Genetic coefficients shrink more: Var(A) < Var(G)
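
Continuing the earlier sketches (W and dta from Example 1; K and lambda from the previous slide), the kinship version only changes the penalty block:

Sigma = matrix(0, 7, 7)
Sigma[4:7, 4:7] = lambda * solve(K)   # kinship-based penalty on the Gen block
g = solve(crossprod(W) + Sigma, crossprod(W, dta$Phe))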

Example 3

What if we have missing data?

##    Env Gen Phe
## 1   E1  G1  47
## 2   E1  G2  51
## 3   E1  G3  NA
## 4   E1  G4  58
## 5   E2  G1  52
## 6   E2  G2  46
## 7   E2  G3  52
## 8   E2  G4  NA
## 9   E3  G1  53
## 10  E3  G2  48
## 11  E3  G3  58
## 12  E3  G4  52

Example 3

Rows with missing records are removed

##    EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
## 1      1     0     0     1     0     0     0
## 2      1     0     0     0     1     0     0
## 4      1     0     0     0     0     0     1
## 5      0     1     0     1     0     0     0
## 6      0     1     0     0     1     0     0
## 7      0     1     0     0     0     1     0
## 9      0     0     1     1     0     0     0
## 10     0     0     1     0     1     0     0
## 11     0     0     1     0     0     1     0
## 12     0     0     1     0     0     0     1
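
In the running R sketch this is just a subset of the full design, dropping rows 3 and 8:

dta$Phe[c(3, 8)] = NA                 # the two missing records from the slide
obs = !is.na(dta$Phe)
W3 = cbind(model.matrix(~ Env - 1, dta[obs, ]),
           model.matrix(~ Gen - 1, dta[obs, ]))
y3 = dta$Phe[obs]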

Example 3

\(W'W\):

##       EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
## EnvE1     3     0     0     1     1     0     1
## EnvE2     0     3     0     1     1     1     0
## EnvE3     0     0     4     1     1     1     1
## GenG1     1     1     1     3     0     0     0
## GenG2     1     1     1     0     3     0     0
## GenG3     0     1     1     0     0     2     0
## GenG4     1     0     1     0     0     0     2

Example 3

Left-hand side (\(W'W+\Sigma\)):

##       EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
## EnvE1     3     0     0  1.00  1.00  0.00  1.00
## EnvE2     0     3     0  1.00  1.00  1.00  0.00
## EnvE3     0     0     4  1.00  1.00  1.00  1.00
## GenG1     1     1     1  3.10 -0.06  0.00 -0.01
## GenG2     1     1     1 -0.06  3.15 -0.01 -0.07
## GenG3     0     1     1  0.00 -0.01  2.07 -0.01
## GenG4     1     0     1 -0.01 -0.07 -0.01  2.11

Regularization: \(\lambda=\sigma^{2}_{e}/\sigma^{2}_{u}=1.21/19.61=0.06\)

Example 3

Right-hand side (\(W'y\)):

##       [,1]
## EnvE1  156
## EnvE2  150
## EnvE3  211
## GenG1  152
## GenG2  145
## GenG3  110
## GenG4  110

Example 3

Find the coefficients through the penalized least-squares solution

\(g=(LHS)^{-1}(RHS)=(W'W+\Sigma)^{-1}W'y\)

##        [,1]
## EnvE1 54.14
## EnvE2 51.70
## EnvE3 53.82
## GenG1 -2.56
## GenG2 -4.68
## GenG3  2.15
## GenG4  0.81
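
Continuing the sketch (W3 and y3 from the subsetting step; K from Example 2):

lambda3 = 1.21 / 19.61                # regularization from the slide
Sigma3 = matrix(0, 7, 7)
Sigma3[4:7, 4:7] = lambda3 * solve(K)
g3 = solve(crossprod(W3) + Sigma3, crossprod(W3, y3))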

Example 4

What if we are missing all data from an individual?

##    Env Gen Phe
## 1   E1  G1  NA
## 2   E1  G2  51
## 3   E1  G3  46
## 4   E1  G4  58
## 5   E2  G1  NA
## 6   E2  G2  46
## 7   E2  G3  52
## 8   E2  G4  54
## 9   E3  G1  NA
## 10  E3  G2  48
## 11  E3  G3  58
## 12  E3  G4  52

Example 4

Rows with missing records are removed

##    EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
## 2      1     0     0     0     1     0     0
## 3      1     0     0     0     0     1     0
## 4      1     0     0     0     0     0     1
## 6      0     1     0     0     1     0     0
## 7      0     1     0     0     0     1     0
## 8      0     1     0     0     0     0     1
## 10     0     0     1     0     1     0     0
## 11     0     0     1     0     0     1     0
## 12     0     0     1     0     0     0     1

Example 4

\(W'W\):

##       EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
## EnvE1     3     0     0     0     1     1     1
## EnvE2     0     3     0     0     1     1     1
## EnvE3     0     0     3     0     1     1     1
## GenG1     0     0     0     0     0     0     0
## GenG2     1     1     1     0     3     0     0
## GenG3     1     1     1     0     0     3     0
## GenG4     1     1     1     0     0     0     3

Example 4

Left-hand side (\(W'W+\Sigma\)):

##       EnvE1 EnvE2 EnvE3 GenG1 GenG2 GenG3 GenG4
## EnvE1     3     0     0  0.00  1.00  1.00  1.00
## EnvE2     0     3     0  0.00  1.00  1.00  1.00
## EnvE3     0     0     3  0.00  1.00  1.00  1.00
## GenG1     0     0     0  0.14 -0.08  0.00 -0.01
## GenG2     1     1     1 -0.08  3.19 -0.02 -0.09
## GenG3     1     1     1  0.00 -0.02  3.09 -0.01
## GenG4     1     1     1 -0.01 -0.09 -0.01  3.15

Regularization: \(\lambda=\sigma^{2}_{e}/\sigma^{2}_{u}=1.79/22.78=0.08\)

Example 4

Right-hand side (\(W'y\)):

##       [,1]
## EnvE1  155
## EnvE2  152
## EnvE3  158
## GenG1    0
## GenG2  145
## GenG3  156
## GenG4  164

Example 4

Find the coefficients through the penalized least-squares solution

\(g=(LHS)^{-1}(RHS)=(W'W+\Sigma)^{-1}W'y\)

##        [,1]
## EnvE1 52.06
## EnvE2 51.06
## EnvE3 53.06
## GenG1 -1.82
## GenG2 -3.48
## GenG3 -0.07
## GenG4  2.38
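
Note that G1 still receives a prediction: its rows vanish from \(W'W\) and \(W'y\), but the off-diagonals of \(\lambda K^{-1}\) tie it to its relatives. A sketch continuing from the earlier objects (dta from Example 1, K from Example 2); outer() is used here so the empty GenG1 column is kept:

dta$Phe = c(47,51,46,58, 52,46,52,54, 53,48,58,52)   # restore the full data
dta$Phe[c(1, 5, 9)] = NA                             # G1 unobserved everywhere
obs = !is.na(dta$Phe)
X4 = 1 * outer(dta$Env[obs], c('E1','E2','E3'), '==')
Z4 = 1 * outer(dta$Gen[obs], c('G1','G2','G3','G4'), '==')  # GenG1 column is all zeros
W4 = cbind(X4, Z4)
lambda4 = 1.79 / 22.78                               # regularization from the slide
Sigma4 = matrix(0, 7, 7)
Sigma4[4:7, 4:7] = lambda4 * solve(K)
g4 = solve(crossprod(W4) + Sigma4, crossprod(W4, dta$Phe[obs]))
round(g4, 2)   # GenG1's value comes entirely through the kinship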

Variance components

Expectation-Maximization REML (1977)

\(\sigma^{2}_{u} = \frac{u'K^{-1}u}{q-\lambda tr(K^{-1}C^{22})}\) and \(\sigma^{2}_{e} = \frac{e'y}{n-p}\)
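
A minimal base-R sketch that iterates these two updates to a fixed point; the starting values, tolerance, and the helper name emreml are assumptions, not part of the original description:

emreml = function(y, X, Z, K, tol = 1e-8){
  n = length(y); p = ncol(X); q = ncol(Z)
  Ki = solve(K)
  s2u = s2e = var(y) / 2                   # assumed starting values
  repeat{
    lambda = s2e / s2u
    LHS = rbind(cbind(crossprod(X), crossprod(X, Z)),
                cbind(crossprod(Z, X), crossprod(Z) + lambda * Ki))
    g = solve(LHS, c(crossprod(X, y), crossprod(Z, y)))
    b = g[1:p]; u = g[-(1:p)]
    C22 = solve(LHS)[-(1:p), -(1:p)]       # random-effect block of the inverse
    e = y - X %*% b - Z %*% u
    s2u.new = c(crossprod(u, Ki %*% u)) / (q - lambda * sum(diag(Ki %*% C22)))
    s2e.new = c(crossprod(e, y)) / (n - p)
    if(abs(s2u.new - s2u) + abs(s2e.new - s2e) < tol) break
    s2u = s2u.new; s2e = s2e.new
  }
  list(b = b, u = u, Vu = s2u.new, Ve = s2e.new)
}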

Bayesian Gibbs Sampling (1993)

\(\sigma^{2}_{u} = \frac{u'K^{-1}u+S_u\nu_u}{\chi^2(q+\nu_u)}\) and \(\sigma^{2}_{e} = \frac{e'e+S_e\nu_e}{\chi^2(n+\nu_e)}\)

Predicted Residual Error Sum of Squares (PRESS) (2017)

  • \(\lambda = argmin(\sum{e^2_i/(1-h_{ii})^2})\)
  • Where \(H=(I+\lambda K^{-1})^{-1}\) and \(e=(I-H)(y-\mu)\)
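
A literal grid-search sketch of this criterion; the grid, the use of the sample mean for \(\mu\), and the helper name press_lambda are assumptions:

press_lambda = function(y, K, grid = exp(seq(-5, 5, by = 0.1))){
  n = length(y); yc = y - mean(y)        # center on the sample mean
  Ki = solve(K)
  press = sapply(grid, function(lambda){
    H = solve(diag(n) + lambda * Ki)     # hat matrix of the random part
    e = yc - H %*% yc                    # e = (I - H)(y - mu)
    sum((e / (1 - diag(H)))^2)           # PRESS statistic
  })
  grid[which.min(press)]                 # lambda minimizing PRESS
}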

Ridges and Kernels

Kernel methods:

  • Genetic signal is captured by the relationship matrix \(K\)
  • Random effect coefficients are the breeding values (BV)
  • Efficient to compute BVs when \(\text{markers} \gg \text{individuals}\)
  • Easy to use and to combine pedigree, markers, and interactions

Ridge methods:

  • Genetic signal is captured by the design matrix \(M\)
  • Random effect coefficients are the marker effects
  • Easy way to make predictions for unobserved individuals
  • Enables visualizing where QTLs are located in the genome

Ridges and Kernels

Kernel

\(y=Xb+Zu+e\), \(u\sim N(0,K\sigma^2_u)\)

Ridge

\(y=Xb+Ma+e\), \(a\sim N(0,I\sigma^2_a)\)

Where

  • \(M\) is the genotypic matrix, \(m_{ij} \in \{0,1,2\}\)
  • \(K=\alpha MM'\)
  • \(u=Ma\)
  • \(\sigma^2_a=\alpha\sigma^2_u\)

Ridges and Kernels

Kernel model

\[ \left[\begin{array}{rr} X'X & X'Z \\ Z'X & Z'Z+K^{-1}(\sigma^2_e/\sigma^2_u) \end{array}\right] \left[\begin{array}{r} b \\ u \end{array}\right] =\left[\begin{array}{r} X'y \\ Z'y \end{array}\right] \]

Ridge model

\[ \left[\begin{array}{rr} X'X & X'M \\ M'X & M'M+I(\sigma^2_e/\sigma^2_a) \end{array}\right] \left[\begin{array}{r} b \\ a \end{array}\right] =\left[\begin{array}{r} X'y \\ M'y \end{array}\right] \]

Both models capture the same genetic signal (de los Campos 2015)

Ridges and Kernels

# Genomic kinship from the genotype matrix M (here alpha = 1/ncol(M))
K = tcrossprod(M) / ncol(M)
# reml() is the course's REML solver, assumed to return EBVs in $EBV
GBLUP = reml(y = y, K = K); Kernel_GEBV = GBLUP$EBV          # kernel: BVs directly
RRBLUP = reml(y = y, Z = M); Ridge_GEBV = M %*% RRBLUP$EBV   # ridge: u = Ma
plot(Kernel_GEBV, Ridge_GEBV, main = 'Comparing results')    # points fall on a line

Break