March 2022

Alencar Xavier

Outline

  • Get data
  • Build GRM
  • Statistical model
  • Matrices
  • Henderson’s equation
  • Reduced animal model
  • Variance of BLUEs
  • Projection matrices
  • BLUP solutions
  • Variance of BLUPs
  • Reliability
  • Accuracy
  • REML
  • Expectation-Maximization
  • Pseudo-Expectation
  • MIVQUE
  • Average-Information
  • Gibbs sampling
  • Marker effects
  • Gauss-Seidel
  • Final remarks

Get some data

The code below fetches soybean data, a small subset from the SoyNAM project. We use two families (05 and 15) as the training set and one (04) as the prediction target. Phenotypes come from 10 incomplete blocks observed over 3 years.

tr = function(x) sum(diag(x)) # Trace function
data(met,package='NAM')
Obs$Sp = with(Obs,NAM::SPC(YLD,Block,Row,Col,4,4))
Obs$Block = factor(Obs$Block)
Gen2 = data.matrix(Gen[grep('-04',rownames(Gen)),]) # Prediction target
Gen = data.matrix(Gen[grep('-05|-15',rownames(Gen)),]) # Observed individuals
Obs = droplevels(Obs[grep('-05|-15',Obs$ID),])
Obs = Obs[,c('Block','Sp','ID','YLD')]
Gen = Gen[,apply(Gen,2,var)>0.1] # Drop low-variance markers
Gen2 = Gen2[,colnames(Gen)] # Match marker columns

Get some data

dim(Obs)
## [1] 1117    4
dim(Gen)
## [1]  280 3974
head(Obs)
##     Block       Sp         ID   YLD
## 16      9 56.56612 DS11-15004 52.16
## 28      7 55.90857 DS11-05089 68.25
## 36      8 57.66000 DS11-05042 58.94
## 76      3 66.27364 DS11-05156 75.38
## 83      9 56.98647 DS11-15133 66.93
## 113     8 53.03104 DS11-15119 72.64

Build GRM

From Habier et al. 2007 and VanRaden 2008, we have

\[A=\frac{(M-P)(M-P)'}{2\sum{(1+f_j)p_j(1-p_j)}}\]

Pre-center the markers (\(\tilde{Z}=M-P\)), then you get

\[A=\alpha\tilde{Z}\tilde{Z}'\]

where \(\alpha^{-1} = n^{-1}\sum_{j=1}^p(\tilde{z}_j'\tilde{z}_j)\). Note that \(A\) here denotes the additive genomic relationship matrix, not a pedigree-based relationship matrix.

Build GRM

Z1 = apply(Gen,2,function(x) x-mean(x)) # Observed individuals
Z2 = apply(Gen2,2,function(x) x-mean(x)) # Prediction target
A = tcrossprod(Z1)
alpha = 1/mean(diag(A))
A = A*alpha
diag(A) = diag(A)+0.01 # Stabilize GRM
iA = solve(A) # Get inverse

Statistical model

  • Linear model

\[y=Xb+Zu+e\]

\[y\sim N(Xb,V)\]

  • Variances

\[Var(y)=V=ZGZ'+R\]

\[Var(u)=G=A\sigma^2_u\]

\[Var(e)=R=I\sigma^2_e\]

Statistical model

The joint variance is defined as

\[Var\left[\begin{array}{r} y \\ u \\ e \end{array}\right] = \left[\begin{array}{rrr} V & ZG & R \\ GZ' & G & 0 \\ R & 0 & R \end{array}\right]\]

Matrices - Design matrices

y = Obs$YLD
X = model.matrix(~Block+Block:Sp-1,Obs)
Z = model.matrix(~ID-1,Obs)

# Constants
n = length(y) # number of obs
q = ncol(Z) # levels of Z
rX = ncol(X) # rank of X

# If you are not sure X is full-rank, run
# rX = qr(X)$rank

# Check dimensions
print(c(n=n,q=q,rX=rX))
##    n    q   rX 
## 1117  280   18

Matrices - Design matrix X

Matrix::image(Matrix::Matrix(X[1:50,]),main='First 50 rows of X')

Matrices - Design matrix Z

Matrix::image(Matrix::Matrix(Z[1:500,]),main='First 500 rows of Z')

Matrices - Variance starting values

Let’s assume a starting value of \(h_0^2\cong0.25\). Now what?

Let \(\hat{\sigma}^2_{Y0}=\frac{(y-Xb_{LS})'(y-Xb_{LS})}{n-r_X}\) be the residual variance of the least-squares fit. Then,

\(\hat{\sigma}^2_{e} = (1-h_0^2)\hat{\sigma}^2_{Y0}\) and \(\hat{\sigma}^2_{u} = h_0^2\frac{\hat{\sigma}^2_{Y0}}{\sum^q_{j=1}{\sigma^2_{z_j}}}\).

# Starting values for variance components
b = qr.solve(X,y)
vy0 = c(crossprod(y-X%*%b)/(n-rX))
vu = 0.25*vy0/sum(apply(Z,2,var))
ve = 0.75*vy0
vc = c(vu=vu,ve=ve)
print(vc)
##       vu       ve 
## 15.42179 46.14134

Matrices - Variance-covariances

I = diag(n)
R = I*ve
G = A*vu
ZAZ = Z %*% A %*% t(Z)
V = Z%*%G%*%t(Z) + R
iG = solve(G)
iR = I/ve # R is diagonal, so invert elementwise
iV = solve(V)  

Henderson’s equation

  • Henderson’s mixed model equation (HMME) for

\[y=Xb+Zu+e\]

\[ \left[\begin{array}{rr}X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z+G^{-1}\end{array}\right] \left[\begin{array}{r} \hat{b} \\ \hat{u} \end{array}\right] =\left[\begin{array}{r} X'R^{-1}y \\ Z'R^{-1}y \end{array}\right] \]

\[ \left[\begin{array}{rr} C_{11} & C_{12} \\ C_{21}& C_{22}\end{array}\right] \left[\begin{array}{r} \hat{g}_1 \\ \hat{g}_2 \end{array}\right] =\left[\begin{array}{r} r_1 \\ r_2 \end{array}\right] \]

\[C\hat{g}=r\]

where \(C\) is the left-hand side, \(r\) is the right-hand side.

Henderson’s equation

\[y=Xb+Zu+e\]

Alternative notation

\[y=Wg+e\]

where \(W=\{X,Z\}\) and \(g=\{b,u\}\). Thus, the MME become

\[[W'R^{-1}W+\Sigma] \hat{g} = W'R^{-1}y\]

where

\[\Sigma= \left[\begin{array}{rr} 0 & 0 \\ 0 & G^{-1}\end{array}\right] \]

Henderson’s equation

System of equations

Sigma = as.matrix(Matrix::bdiag(diag(0,rX),iG))
W = cbind(X,Z)
C = t(W) %*% iR %*% W + Sigma
iC = solve(C)
r = t(W) %*% iR %*% y 

Compute coefficients

g = iC %*% r
b = g[1:rX]
u = g[-c(1:rX)]
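As a sanity check (a minimal sketch reusing objects from the slides above), the MME solutions should match the GLS estimator \(\hat{b}=(X'V^{-1}X)^{-1}X'V^{-1}y\) and the BLUP \(\hat{u}=GZ'V^{-1}(y-X\hat{b})\):

b_gls = solve( t(X) %*% iV %*% X , t(X) %*% iV %*% y )
u_gls = G %*% t(Z) %*% iV %*% c(y - X %*% b_gls)
range(b - b_gls) # near zero
range(u - u_gls) # near zero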

Reduced animal model

The “reduced animal model” simplifies the mixed model equations

\[ \left[\begin{array}{rr}X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z+G^{-1}\end{array}\right] \left[\begin{array}{r} \hat{b} \\ \hat{u} \end{array}\right] =\left[\begin{array}{r} X'R^{-1}y \\ Z'R^{-1}y \end{array}\right] \]

into

\[ \left[\begin{array}{rr}X'X & X'Z \\ Z'X & Z'Z+\lambda A^{-1}\end{array}\right] \left[\begin{array}{r} \hat{b} \\ \hat{u} \end{array}\right] =\left[\begin{array}{r} X'y \\ Z'y \end{array}\right] \]

where \(\lambda=\sigma^{2}_e\sigma^{-2}_u\). This is achieved by multiplying both sides by \(\sigma^{2}_e\).
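A minimal sketch checking this equivalence, reusing X, Z, y, iA, vu, ve and the solutions g from the previous slides:

lambda = ve/vu
C_lam = rbind(cbind(t(X) %*% X, t(X) %*% Z),
              cbind(t(Z) %*% X, t(Z) %*% Z + iA*lambda))
g_lam = solve(C_lam, c(t(X) %*% y, t(Z) %*% y))
plot(g, g_lam) # same solutions as the R-weighted system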

Reduced animal model

From a data-augmentation standpoint, the reduced model is:

\[ \left[\begin{array}{rr}X'X & X'Z \\ Z'X & Z'Z \\ 0 & \lambda A^{-1} \end{array}\right] \left[\begin{array}{r} \hat{b} \\ \hat{u} \end{array}\right] =\left[\begin{array}{r} X'y \\ Z'y \\ 0 \end{array}\right] \]

Variance of BLUEs

Let \(C^{-1}\) be described as

\[ C^{-1} = \left[\begin{array}{rr} C^{11} & C^{12} \\ C^{21}& C^{22}\end{array}\right] \]

This is useful because

\[Var(\hat{b}) = C^{11} = (X'V^{-1}X)^{-1}\]

Also, we’ll need \(C^{22}\) to estimate variances later

C22 = as.matrix(iC[-c(1:rX),-c(1:rX)])
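A minimal sketch verifying the identity numerically (objects from the previous slides):

C11 = as.matrix(iC[1:rX, 1:rX])
range(C11 - solve(t(X) %*% iV %*% X)) # near zero
SEb = sqrt(diag(C11)) # standard errors of the BLUEs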

Projection matrices

The HAT matrix projects \(y\) onto the fitted-value space:

\[Hy=\hat{y}\]

The ABSORPTION matrix projects \(y\) onto the residual space:

\[Sy=\hat{e}\]

NOTE: In the least-squares model, \(H\) follows from the coefficient solution \(\hat{b}=(X'X)^{-1}X'y\): since \(X\hat{b}=\hat{y}\), we have \(H=X(X'X)^{-1}X'\). The absorption matrix is then simply \(S=I-H\).

Projection matrices

LS absorption

\[S=I-X(X'X)^{-1}X'\]

BLUE absorption

\[M=I-X(X' V^{-1} X)^{-1}X'V^{-1}\]

Random effect absorption

\[V^{-1}=R^{-1}-R^{-1}Z(Z'R^{-1}Z+G^{-1})^{-1}Z'R^{-1}\]

Mixed model absorption

\[P=V^{-1}-V^{-1}X(X' V^{-1} X)^{-1}X'V^{-1}\]
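The random-effect absorption line is the Woodbury identity; a minimal sketch checking it with the matrices built on earlier slides:

iV_w = iR - iR %*% Z %*% solve( t(Z) %*% iR %*% Z + iG ) %*% t(Z) %*% iR
range(iV - iV_w) # near zero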

Projection matrices

\[H=WC^{-1}W'R^{-1}\] \[Hy=\hat{y}\] \[P=R^{-1}(I-H)\]

S = I - X %*% solve( t(X)%*%X ) %*% t(X)
M = I - X %*% solve( t(X) %*% iV %*%X ) %*% t(X) %*% iV
P = iV %*% M
H = W %*% iC %*% t(W) %*% iR
yHat = H %*% y
e = y - yHat

Projection matrices - Identities

\[SX = 0\] \[PX = 0\] \[SS = S\] \[MX = 0\]

\[My = y-X\hat{b}\] \[MM = M\] \[MS = M\] \[SM = S\] \[PVP = P\] \[P=V^{-1}M\] \[P=(XX'\sigma^2_\infty+V)^{-1}\] \[ZAZ'Py=\frac{Z\hat{u}}{\hat{\sigma}^2_u}\] \[Py=\frac{\hat{e}}{\hat{\sigma}^2_e}\]
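A minimal sketch spot-checking some of these identities with the matrices already in memory:

range(S %*% X)      # SX = 0
range(P %*% X)      # PX = 0
range(S %*% S - S)  # SS = S
range(P - iV %*% M) # P = V^-1 M
range(ZAZ %*% P %*% y - Z %*% u/vu) # ZAZ'Py = Z u_hat / vu
range(P %*% y - e/ve)               # Py = e_hat / ve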

BLUP solutions

\[\hat{u}=\hat{g}_2\]

\[\hat{u}=GZ'Py\] \[\hat{u}=(Z'R^{-1}Z+G^{-1})^{-1}Z'R^{-1}(y-X\hat{b})\]

\[\hat{u}=(Z'Z+\sigma^{2}_e\sigma^{-2}_u A^{-1})^{-1}Z'(y-X\hat{b})\]

BLUP solutions

# Other ways to get BLUPs
u2 = G %*% t(Z) %*% P %*% y
u3 = solve( t(Z)%*%iR%*%Z + iG , t(Z)%*%iR%*%c(y-X%*%b) )
u4 = solve( t(Z)%*%Z + iA*(ve/vu) , t(Z)%*% c(y-X%*%b) )
plot(data.frame(u,u2,u3,u4))

Variance of BLUPs

The variance of \(u\) is

\[Var(u)=G\]

However, the variance of \(\hat{u}\)

\[Var(\hat{u})=G Z' P Z G\]

Or, in terms of C,

\[Var(\hat{u})=G - C^{22}\]

Variance of BLUPs

# Variance of u hat
VarUhat = G %*% t(Z) %*% P %*% Z %*% G
VarUhat2 = G - C22
plot(diag(VarUhat),diag(VarUhat2))
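A closely related quantity is the prediction error variance (PEV): a standard MME result gives \(Var(\hat{u}-u)=C^{22}\), so (as a minimal sketch):

PEV = diag(C22) # prediction error variance of each BLUP
SE = sqrt(PEV)  # standard error of each BLUP
summary(SE)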

Reliability

Reliability (\(r^2\)) is the squared correlation between true and estimated breeding values, an individual-level analogue of heritability.

Under ML

\[r^2= Z' V^{-1} Z G\]

Under REML

\[r^2= Z' P Z G\]

\[r^2= I-C^{22}G^{-1}\]

Alternatively

\[r^2 = \frac{Var(\hat{u})}{Var(u)}\]

Reliability

rel1 = diag(diag(q)-C22 %*% iG)
rel2 = diag( t(Z) %*% P %*% Z %*% G )
plot(rel1,rel2)

Accuracy

Accuracy is the square root of reliability (\(a=\sqrt{r^2}\)). Accuracy is generally defined as the correlation between estimated and true breeding values.

\[a = cor(u,\hat{u}) = \frac{cov(u,\hat{u})}{sd(u)\times sd(\hat{u})}\]

If the statistical model is the true model, \(cov(u,\hat{u})=cov(\hat{u},\hat{u})=var(\hat{u})\)

\[a_i = \sqrt \frac{cov(u,\hat{u})^2}{var(\hat{u})var(u)}= \sqrt \frac{cov(u,\hat{u})}{var(u)} = \sqrt{\frac{ G_{iy} Z' V^{-1} Z G_{yi} }{ G_{ii} }}\]

When all individuals are observed, the denominator \(G_{ii}\) cancels with the numerator \(G_{iy}\), yielding \(a=\sqrt{Z' V^{-1} Z G}\).

Accuracy

R2_by_ind = function(z){
  Cov12 = (z %*% t(Z1) * alpha) * vu
  Var22 = c(crossprod(z) * alpha) * vu
  R2 = Cov12 %*% t(Z) %*% iV %*% Z %*% t(Cov12) / Var22
  return(R2)}
Acc = sqrt(apply(Z2,1,R2_by_ind))
hist(Acc)

REML

Gaussian likelihood

\[L(y|\theta)=\frac{\exp\left(-\frac{1}{2}(y-Xb)'V^{-1}(y-Xb)\right)}{\sqrt{(2\pi)^n|V|}}\]

Log-likelihood (dropping constants)

\[-2LL(y|\theta)= \ln|V| + (y-Xb)'V^{-1}(y-Xb)\]

Restricted log-likelihood (REML)

\[-2LL(y|\theta)= \ln|V| + \ln|X'V^{-1}X| + (y-Xb)'V^{-1}(y-Xb)\]

REML derived by treating \(b\) as pseudo-random (cf. \(P=(XX'\sigma^2_\infty+V)^{-1}\))

\[-2LL(y|\theta)= -\ln|P| + y'Py\]
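A minimal sketch evaluating \(-2LL\) (REML form, up to an additive constant) at the current variance components, reusing V, iV and b from the earlier slides:

neg2LL = c( determinant(V)$modulus + determinant(t(X) %*% iV %*% X)$modulus +
            t(y - X %*% b) %*% iV %*% (y - X %*% b) )
neg2LL # useful to monitor convergence across REML iterations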

Expectation-Maximization

We obtain the EM updates simply by rearranging \(\partial L / \partial\sigma^2_i=0\).

Genetic variance

\[\partial L / \partial\sigma^2_u = tr(P ZAZ') - y'PZAZ'Py \]

\[\partial L / \partial\sigma^2_u = \frac{q}{\sigma^2_u} - \frac{u'A^{-1}u}{\sigma^4_u}- \frac{tr(A^{-1}C^{22})}{\sigma^4_u} \]

Residual variance

\[\partial L / \partial\sigma^2_e = tr(P I) - y'PIPy = tr(P) - y'P^2y\]

\[\partial L / \partial\sigma^2_e = \frac{n-r_X}{\sigma^2_e} - \frac{q}{\sigma^2_e} + \frac{tr(A^{-1}C^{22})}{\sigma^2_e\sigma^2_u}- \frac{e'e}{\sigma^4_e}\]

Expectation-Maximization

The final estimators turn out to be:

\[\hat{\sigma}^2_u=\frac{\hat{u}'A^{-1}\hat{u}+tr(A^{-1}C^{22})}{q} = \frac{\hat{u}'A^{-1}\hat{u}}{q-tr(G^{-1}C^{22}) } \]

\[\hat{\sigma}^2_e=\frac{\hat{e}'\hat{e}+tr(WC^{-1}W')\hat{\sigma}^2_e}{n}=\frac{y'e}{n-r_X}\]

Expectation-Maximization

# Var e
Ve = c(crossprod(y,e)) / (n-rX)
# Var U
Vu = c(t(u) %*% iA %*% u + tr(iA%*%C22)) / q
# Var U (faster converging alternative)
Vu2 = c(t(u) %*% iA %*% u) / ( q - tr(iG%*%C22) )
# Check
print(c(Ve=Ve, Vu=Vu, Vu2=Vu2))
##       Ve       Vu      Vu2 
## 52.18652 14.67906 12.66814
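These updates are meant to be iterated. A minimal sketch of the full EM loop, rebuilding the MME each round (reusing W, y, iA, n, q, rX and the tr function; it uses the faster-converging form of \(\hat{\sigma}^2_u\)):

vu_t = vu; ve_t = ve
for(i in 1:50){
  iG_t = iA/vu_t
  C_t = t(W) %*% W/ve_t + as.matrix(Matrix::bdiag(diag(0,rX), iG_t))
  iC_t = solve(C_t)
  g_t = iC_t %*% t(W) %*% y/ve_t
  u_t = g_t[-c(1:rX)]
  e_t = y - W %*% g_t
  C22_t = iC_t[-c(1:rX), -c(1:rX)]
  vu_t = c(t(u_t) %*% iA %*% u_t)/(q - tr(iG_t %*% C22_t))
  ve_t = c(crossprod(y, e_t))/(n - rX)
}
c(vu=vu_t, ve=ve_t) # EM-REML estimates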

Pseudo-Expectation

Simplification of the likelihood. From

\[\partial L / \partial\sigma^2_i = tr(P V_i ) - y'PV_iPy \]

to

\[\partial PL / \partial\sigma^2_i = tr(S V_i ) - y'SV_iPy \]

Yielding (Schaeffer 1986; VanRaden 1988):

\(\hat{\sigma}^2_u=\frac{ \tilde{u}'\hat{u}}{tr(AZ'SZ)}\) and \(\hat{\sigma}^2_e=\frac{y'S'\hat{e}}{n-r_X}\)

where \(\tilde{u} = Z'Sy\).

Pseudo-Expectation

Sy = S %*% y
ZSy = t(Z) %*% Sy
trAZSZ = tr( S %*% ZAZ ) # In practice, only diagonals are computed
# Var U
c(u %*% ZSy) / trAZSZ
## [1] 9.506624
# Var E
c(t(e) %*% Sy) / (n-rX)
## [1] 52.18652

P.S.: Pseudo-Expectation works great for SNP-BLUP.

MIVQUE

Minimum Variance Quadratic Unbiased Estimator (Rao 1971):

  • Unbiased variances: \(E[\hat{\sigma}]=\sigma\)
  • Quadratic unbiasedness: \(E[y'Qy]=E[\epsilon_i'Q\epsilon_i]=0\)
  • Invariant: \(QX=0\)
  • \(y'Qy=y'Q\epsilon_0+y'Q\epsilon_1\) where \(\epsilon_0=e\) and \(\epsilon_1=Zu\).
  • Iterating MIVQUE yields REML; here we use \(Q=P\).
  • General formulation:

\[\hat{\sigma}=S^{-1}q\]

where \(S\) and \(q\) are defined on the next slide.

MIVQUE

\[\hat{\sigma}=S^{-1}q\]

Terms are defined as

\[ S = \left[\begin{array}{rr} tr\left(P\frac{\partial V}{\partial\sigma^2_u}P\frac{\partial V}{\partial\sigma^2_u}\right) & tr\left(P\frac{\partial V}{\partial\sigma^2_u}P\frac{\partial V}{\partial\sigma^2_e}\right) \\ tr\left(P\frac{\partial V}{\partial\sigma^2_e}P\frac{\partial V}{\partial\sigma^2_u}\right) & tr\left(P\frac{\partial V}{\partial\sigma^2_e}P\frac{\partial V}{\partial\sigma^2_e}\right) \end{array}\right]\]

\[q = \left[\begin{array}{rr} y'P \frac{\partial V}{\partial\sigma^2_u} Py \\ y'P \frac{\partial V}{\partial\sigma^2_e} Py \end{array}\right] \]

where \(\frac{\partial V}{\partial\sigma^2_u}=ZAZ'\) and \(\frac{\partial V}{\partial\sigma^2_e}=I\)

MIVQUE

PZAZ = P %*% ZAZ
S = matrix(c(
  tr( PZAZ %*% PZAZ ),    tr( PZAZ %*% P ),
  tr( P %*% I %*% PZAZ ), tr( P %*% P    )),2,2)
qs = c(Vu = c(t(y) %*% PZAZ %*% P %*% y),
       Ve = c(t(y) %*% P %*% P %*% y))
solve(S,qs)
## [1]  8.495934 54.161688

Average-Information

Second-derivative methods (NR, FS, AI) work with:

\[\theta^{t+1} = \theta^{t}-\frac{f'}{f''}\]

where \(f'\) and \(f''\) are the first and second derivatives, respectively. The first derivative, \(f'=\Delta(\theta)\), is computed as

\[\Delta(\sigma^2_i) = \partial L / \partial\sigma^2_i = tr(P \frac{\partial V}{\partial\sigma^2_i} ) - y'P \frac{\partial V}{\partial\sigma^2_i}Py \]

and the average-information second derivative, \(f''=AI(\theta)\), is computed as:

\[AI(\sigma^2_i,\sigma^2_j) = y'P\frac{\partial V}{\partial\sigma^2_i} P\frac{\partial V}{\partial\sigma^2_j} P y\]

Average-Information

SecDer1 = matrix(c(
  t(y)%*%PZAZ%*%PZAZ%*%P%*%y, t(y)%*%PZAZ%*%P%*%P%*% y,
  t(y)%*%P%*%PZAZ%*%P%*%y, t(y) %*%P%*%P%*%P%*%y ),2,2)
FirDer1 = c( vu = tr(PZAZ) - t(y) %*% PZAZ %*% P %*% y ,
             ve = tr(P %*% I) - t(y) %*% P %*% P %*% y )
vc - solve(SecDer1,FirDer1)
##        vu        ve 
##  5.614032 53.308581
SecDer1 # AI matrix
##            [,1]       [,2]
## [1,] 0.12285971 0.04612028
## [2,] 0.04612028 0.53927449

Average-Information

Building \(V\) and \(P\) explicitly is often not feasible. Let’s see how to solve the same system through the mixed model equations (Meyer 1997). We have already seen MME-based expressions for the first derivatives, \(\Delta(\theta)\), through \(C\):

\[\partial L / \partial\sigma^2_u = \frac{q}{\sigma^2_u} - \frac{u'A^{-1}u}{\sigma^4_u}- \frac{tr(A^{-1}C^{22})}{\sigma^4_u} \]

\[\partial L / \partial\sigma^2_e = \frac{n-r_X}{\sigma^2_e} - \frac{q}{\sigma^2_e} + \frac{tr(A^{-1}C^{22})}{\sigma^2_e\sigma^2_u}- \frac{e'e}{\sigma^4_e}\]

Average-Information

The \(AI(\theta)\) is obtained as follows.

\[M_B = \left[\begin{array}{rr} C & W'R^{-1}B \\ B'R^{-1}W & B'R^{-1}B \end{array}\right]\]

\[B = \left[\begin{array}{r} ZAZ'Py & Py \end{array}\right] = \left[\begin{array}{r} Zu\sigma^{-2}_u & e\sigma^{-2}_e \end{array}\right] \]

Take the Cholesky

\[M_B = LL'\]

Then we reconstruct the sub-matrix corresponding to \(B'R^{-1}B\).

\[AI(\theta) = L_{22}'L_{22}\]

Average-Information

B = cbind( vu = Z %*% u / vu, ve = e / ve )
MB = chol(rbind(cbind(as.matrix(C),       t(W) %*% iR %*% B),
                cbind( t(B) %*% iR %*% W, t(B) %*% iR %*% B)))
LB = MB[(ncol(MB)-1):ncol(MB),(ncol(MB)-1):ncol(MB)]
SecDer2 =crossprod(LB)
FirDer2 =c(q/vu-(t(u)%*%iA%*%u)/(vu^2)-tr(iA%*%C22)/(vu^2),
    ((n-rX)/ve-(1/ve)*(q-tr(iA%*%C22)/(vu)))-crossprod(e)/(ve^2))
vc - solve(SecDer2,FirDer2)
##        vu        ve 
##  5.614032 53.308581
SecDer2 # AI matrix
##                       
##  0.12285971 0.04612028
##  0.04612028 0.53927449

Gibbs sampling

A simple hierarchical Bayesian approach to estimate variance components: the final estimates are the averages of the posterior samples from \(\pi(\theta|x)\), the distribution of the parameters (\(\theta=\{\sigma^2_u,\sigma^2_e,b,u\}\)) given the data (\(x=\{y,X,Z,A\}\)). The posterior is a function of the probability of the data, \(p(x|\theta)\), and the probability of the priors, \(\pi(\theta)\).

\[\pi(\theta|x)\propto p(x|\theta)\pi(\theta)\]

\[p(x|\theta)=N(y|Xb+Zu,R)\,N(u|0,G)\] \[\pi(\theta)=\chi^{-2}(\sigma^2_u|S_u,\nu_u)\chi^{-2}(\sigma^2_e|S_e,\nu_e)\]

Gibbs sampling

We sample variance components from a scaled inverse-chi square distribution:

\[\sigma^2_u|u,S_u,\nu_u \sim \frac{u'A^{-1}u+S_u\nu_u}{\chi^2(q+\nu_u)}\] \[\sigma^2_e|e,S_e,\nu_e \sim \frac{e'e+S_e\nu_e}{\chi^2(n+\nu_e)}\]

Using the mixed model equation (\(Cg=r\)) notation, we sample the coefficients from:

\[g_j|x,g_{-j} \sim N\left( \frac{r_j - C_{j,-j}g_{-j}}{C_{j,j}},\frac{1}{C_{j,j}} \right)\]

Gibbs sampling

# Priors
df0 = 5
Su = vu
Se = ve
# Sample genetic variance
c(t(u) %*% iA %*% u + Su*df0 ) / rchisq(1,q+df0)
## [1] 3.521494
# Sample residual variance
c(t(e) %*% e + Se*df0 ) / rchisq(1,n+df0)
## [1] 49.31188

Gibbs sampling

g_samp = sapply(1:(rX+q), function(j) rnorm(n=1,
        mean=c(r[j]-C[j,-j]%*%g[-j])/C[j,j], sd=sqrt(1/C[j,j])))
plot(g[-c(1:rX)],g_samp[-c(1:rX)],xlab='u hat',ylab='sample u')
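Putting the pieces together, a minimal sketch of a short Gibbs chain that rebuilds the MME each round and averages the posterior samples (reusing W, y, iA, rX, q, n and the priors Su, Se, df0 from the slides above):

vu_s = vu; ve_s = ve; g_s = c(g)
store = matrix(NA, 200, 2, dimnames=list(NULL, c('vu','ve')))
for(iter in 1:200){
  C_s = t(W) %*% W/ve_s + as.matrix(Matrix::bdiag(diag(0,rX), iA/vu_s))
  r_s = t(W) %*% y/ve_s
  for(j in 1:(rX+q)) # sample coefficients one at a time
    g_s[j] = rnorm(1, c(r_s[j] - C_s[j,-j] %*% g_s[-j])/C_s[j,j],
                   sqrt(1/C_s[j,j]))
  u_s = g_s[-c(1:rX)]
  e_s = y - W %*% g_s
  vu_s = c(t(u_s) %*% iA %*% u_s + Su*df0)/rchisq(1, q+df0)
  ve_s = c(crossprod(e_s) + Se*df0)/rchisq(1, n+df0)
  store[iter,] = c(vu_s, ve_s)
}
colMeans(store[-c(1:50),]) # posterior means after burn-in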

Marker effects

\[\hat{\beta}|\hat{u}=\hat{\sigma}^2_\beta \tilde{Z}'G^{-1}\hat{u}\]

vb = vu * alpha
beta = vb * c( t(Z1) %*% iG %*% u)
plot(beta)
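A minimal sketch using the backsolved marker effects to predict the target family (assuming the centering of Z2 is compatible with the training set):

gebv2 = c(Z2 %*% beta) # predicted breeding values for family 04
hist(gebv2)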

Marker effects - Statistical model

The joint variance, including marker effects \(\beta\), is defined as

\[Var\left[\begin{array}{r} y \\ u \\ \beta \\ e \end{array}\right] = \left[\begin{array}{rrrr} V & ZG & Z\tilde{Z}D & R \\ GZ' & G & \tilde{Z}D & 0 \\ D\tilde{Z}'Z' & D\tilde{Z}' & D & 0 \\ R & 0 & 0 & R \end{array}\right]\]

where \(u=\tilde{Z}\beta\), \(\beta\sim N(0,D)\), \(D=I_m\sigma^2_\beta\), and \(G=A\sigma^2_u=\tilde{Z}D\tilde{Z}'\). The model can be written as:

\[y=Xb+Z(\tilde{Z}\beta)+e\]

Gauss-Seidel

Gauss-Seidel can be used in SNP-BLUP models (\(y=Xb+e\), with \(X\) as the marker matrix) to avoid building the whole system of equations. A common approach (Legarra and Misztal 2008) is to update one effect at a time

\[\hat{b}_j^{t+1}=\frac{ x_j'e^t+x_j'x_j\hat{b}_j^t}{x_j'x_j+\lambda}\]

with a subsequent update of the residuals

\[e^{t+1}=e^t- x_j(\hat{b}_j^{t+1}-\hat{b}_j^{t})\]

Only the diagonal components of the LHS (\(x_j'x_j\)) are necessary, and those can be computed beforehand. Convergence is faster when updates occur in random order (Ma et al. 2015).

Gauss-Seidel

# Get some data
data(tpod, package='NAM')
y = y-mean(y) # center the phenotype
X = NAM::CNT(gen) # center the markers
b = rep(0,ncol(X)) # coefficient starting values
xx = apply(X,2,crossprod) # pre-compute X'X
lambda = mean(xx) # lambda for h2=0.5
e = y # starting value of e

# Gauss-Seidel
for(j in sample(1:ncol(X))){
  b_old = b[j]
  b[j] = (c(X[,j]%*%e) +xx[j]*b_old)/(xx[j]+lambda)
  e = e - X[,j]*(b[j]-b_old)
}

Gauss-Seidel

# Check convergence over 10 iterations
for(i in 1:10){
  # Store current version of beta
  b_vec = b
  # Gauss-Seidel
  for(j in sample(1:ncol(X))){
    b_old = b[j]
    b[j] = (c(X[,j]%*%e) +xx[j]*b_old)/(xx[j]+lambda)
    e = e - X[,j]*(b[j]-b_old)}
  # Print log convergence
  cat(round(log10(crossprod(b_vec-b)[1,1]),2),' ')}
## -2.94  -3.61  -4.15  -4.67  -5.34  -5.94  -6.57  -7.07  -7.59  -8.21

Gauss-Seidel

plot(b,xlab='SNP',ylab='Marker effect')

Final remarks

  • Design matrices (\(Z\),\(X\)) are sparse: use the Matrix package (see the sketch below)
  • Symmetric matrices are stored as upper triangles
  • Use good linear algebra libraries (RcppEigen)
  • Some matrices are never built explicitly (\(P\),\(V\),\(R\),\(A\))
  • For traces, only compute the diagonals (\(tr(Z'SZ)\))
  • Effect absorption can be done one vector at a time (\(SZ\))
  • Sometimes factorization can help (SVD, Cholesky)
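A minimal sketch of the first point, storing the design matrix Z in sparse form with the Matrix package (assuming Obs and Z are still in memory from the earlier slides):

Zs = Matrix::sparse.model.matrix(~ID-1, Obs) # sparse version of Z
print(c(dense=object.size(Z), sparse=object.size(Zs))) # bytes used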

Thank you