Modeling time-varying genetic effects

Our main goal is to estimate the effect of SNPs on the mean and variance of standardized Math and Reading scores of students from Grades 5, 8 and 9. For simplicity, we will first illustrate our pipeline only for the mean effect of SNP. We use a linear mixed effects model

\[Y = \textbf{X}\beta + \textbf{Z}\mu + \epsilon\]where, \(\textbf{X}\) is a matrix of predictor variables with \(\beta\) as a vector of their estimated effect and \(\textbf{Z}\) are covariates with their corresponding vector of random effects, \(\mu\). We assume \(\epsilon \sim N(0,\sigma_\epsilon^2)\) and the random components are iid with mean 0 and a variance-covariance matrix,

\[\Omega_\mu = \begin{bmatrix} \sigma^2_{\mu0} & \\ \sigma_{\mu01} & \sigma^2_{\mu1} \\ \end{bmatrix}\]

where \(\mu_0\) is the random intercept of individual and \(\mu_1\) is random slope, time.

2 Levels

We start by regressing academic performance on SNP and 10 principal components to control for genetic ancestry and then add additional terms step-wise.

\[AP = \beta_0 + \beta_1 \textbf{SNP} + \beta_2 \textbf{PC1} + ... + \beta_{11} \textbf{PC10} + \epsilon\]

\[AP = \beta_0 + \beta_1 \textbf{time} + \beta_2 \textbf{SNP} + \beta_3 \textbf{PC1} + ... + \beta_{12} \textbf{PC10} + \beta_{13} \textbf{SNP:time} + \epsilon\]

\[\begin{multline} \begin{split} AP_i = \beta_0 + \beta_1 \textbf{time}_i + \beta_2 \textbf{SNP}_i & + \beta_3 \textbf{PC1}_i + ... + \beta_{12} \textbf{PC10}_i + \beta_{13} \textbf{SNP:time}_i \\& + \mu_{0i} + \epsilon_{i} \end{split} \end{multline}\]

\[\begin{multline} \begin{split} AP_{ij} = \beta_0 + \beta_1 \textbf{time}_{ij} + \beta_2 & \textbf{SNP}_{i} + \beta_3 \textbf{PC1}_{i} + ... + \beta_{12} \textbf{PC10}_{i} + \beta_{13} \textbf{SNP:time}_{ij} \\& + \mu_{0i} + \mu_{1j} \textbf{time}_{ij} + \epsilon_{ij} \end{split} \end{multline}\]

We use the base model above to construct a PGI and add this PGI in an unconditional growth model where \(\mu_0\) and \(\mu_1\) are introduced.

\[PGI = \sum_{1}^{n}\beta_n \textbf{SNP}_n\]

\[ AP_{ij} = \beta_0 + \beta_1 \textbf{time}_{ij} + \mu_{0i} + \mu_{1j} \textbf{time}_{ij} + \epsilon_{ij} \]

\[ AP_{ij} = \beta_0 + \beta_1 \textbf{time}_{ij} + \beta_2 \textbf{PGI}_i + \beta_3 \textbf{PGI:time}_{ij} + \mu_{0i} + \mu_{1j} \textbf{time}_{ij} + \epsilon_{ij} \]

3 Levels

Exactly the same as above but with the addition of a random intercept, \(\gamma_0\), in \(\textbf{Z}\) representing the genetic relatedness matrix (GRM). This is common in GWAS because it is known to increase power and provide better estimates of genetic effects. We add \(\gamma_0\) in the base model and keep it in all succeeding models.

\[AP = \beta_0 + \beta_1 \textbf{SNP} + \beta_2 \textbf{PC1} + ... + \beta_{11} \textbf{PC10} + {\color{Red}{\gamma_0}} + \epsilon\]

\[AP = \beta_0 + \beta_1 \textbf{time} + \beta_2 \textbf{SNP} + \beta_3 \textbf{PC1} + ... + \beta_{12} \textbf{PC10} + \beta_{13} \textbf{SNP:time} + {\color{Red}{\gamma_0}}+ \epsilon\]

\[\begin{multline} \begin{split} AP_i = \beta_0 + \beta_1 \textbf{time}_i + \beta_2 \textbf{SNP}_i & + \beta_3 \textbf{PC1}_i + ... + \beta_{12} \textbf{PC10}_i + \beta_{13} \textbf{SNP:time}_i \\& + {\color{Red}{\gamma_{0i}}} + \mu_{0i} + \epsilon_{i} \end{split} \end{multline}\]

\[\begin{multline} \begin{split} AP_{ij} = \beta_0 + \beta_1 \textbf{time}_{ij} + \beta_2 & \textbf{SNP}_{i} + \beta_3 \textbf{PC1}_{i} + ... + \beta_{12} \textbf{PC10}_{i} + \beta_{13} \textbf{SNP:time}_{ij} \\& + {\color{Red}{\gamma_{0i}}} + \mu_{0i} + \mu_{1j} \textbf{time}_{ij} + \epsilon_{ij} \end{split} \end{multline}\] We can estimate additional covariance terms at L3.

\[L1: \epsilon \sim N(0, \sigma^2)\]

\[L2: \gamma_0 \sim N(0, \sigma^2_{\gamma0})\]

\[L3: \begin{bmatrix}\sigma^2_{\gamma0} & & \\\sigma_{\gamma0\mu0} & \sigma^2_{\mu0} & \\\sigma_{\gamma0\mu1} & \sigma_{\mu01} & \sigma^2_{\mu1}\\\end{bmatrix}\]

Using the SNP effect esimates from the revised base model we end up with exactly the same formula as above for PGI construction and growth modeling.

3 Levels simplified

We simplify our approach above by removing \(\mu_1\), the random slope of time. We previously estimated \(\mu_{01}\) in simulation and found it to be small. In addition, modeling it will introduce complexity and its interpretation may hamper our ability to more directly address our overarching research question. To further simplify our model, we also assume the random terms are uncorrelated.

We keep \(\gamma_0\) and \(\mu_0\) since we want to decouple the variance due to individual genetic predisposition from non-genetic factors within the individual. Hence, we propose the following.

\[AP = \beta_0 + \beta_1 \textbf{SNP} + \beta_2 \textbf{PC1} + ... + \beta_{11} \textbf{PC10} + \gamma_0 + \epsilon\]

\[AP = \beta_0 + \beta_1 \textbf{time} + \beta_2 \textbf{SNP} + \beta_3 \textbf{PC1} + ... + \beta_{12} \textbf{PC10} + \beta_{13} \textbf{SNP:time} + \gamma_0 + \epsilon\]

\[\begin{multline} \begin{split} AP_i = \beta_0 + \beta_1 \textbf{time}_i + \beta_2 \textbf{SNP}_i & + \beta_3 \textbf{PC1}_i + ... + \beta_{12} \textbf{PC10}_i + \beta_{13} \textbf{SNP:time}_i \\& + \gamma_{0i} + \mu_{0i} + \epsilon_{i} \end{split} \end{multline}\] Our variance components are

\[L1: \epsilon \sim N(0, \sigma^2)\]

\[L2: \gamma_0 \sim N(0, \sigma^2_{\gamma0})\] \[L3: \mu_0 \sim N(0, \sigma^2_{\mu0}).\]

SNP effect on phenotype variance

We have previously regressed our phenotype, academic performance in Math or English, on SNP. We can take the residual, \(\epsilon = \hat{AP} - AP\), square then divide it by its standard deviation to get the variance.

\[Var(AP) = (\hat{AP} - AP)^2 /\sigma_{\hat{AP}}\]

We then regress it on SNP and iteratively add covariates as above.

\[Var(AP) = \beta_0 + \beta_1 \textbf{SNP} + \beta_2 \textbf{PC1} + ... + \beta_{11} \textbf{PC10} + \gamma_0\] \[Var(AP) = \beta_0 + \beta_1 \textbf{time} + \beta_2 \textbf{SNP} + \beta_3 \textbf{PC1} + ... + \beta_{12} \textbf{PC10} + \beta_{13} \textbf{SNP:time} + \gamma_0\]

\[\begin{multline} \begin{split} Var(AP_i) = \beta_0 + \beta_1 \textbf{time}_i + \beta_2 \textbf{SNP}_i & + \beta_3 \textbf{PC1}_i + ... + \beta_{12} \textbf{PC10}_i + \beta_{13} \textbf{SNP:time}_i \\& + \gamma_{0i} + \mu_{0i} \end{split} \end{multline}\] The variance components are \(\gamma_0 \sim N(0, \sigma^2_{\gamma0})\) and \(\mu_0 \sim N(0, \sigma^2_{\mu0}).\)

SNP effect on mean and variance

We saw in our simulation that mean and variance are inherently correlated. In order to account for this relationship, we estimate SNPs’ effect on phenotype mean and variance concurrently using the simplified model discussed above but with 2 instead of 3 levels. We employ a hierarchical generalized linear mixed effects model (HGLM) [@cao2015; @rönnegård2010]:

\[Y_{ij} = \textbf{X}_{ij} \beta + G_{1j} \gamma_1 + G_{2j} \gamma_2 + \mu_{j} + \delta_{j} + \epsilon_{ij}, \] where \(i\) indexes time whereas \(j\) the person, \(\textbf{X}_{ij}\) is a matrix of covariates (i.e., time, sex, batch, 10 PCs) including the intercept with \(\beta\) as their respective effect size, \(G_{1j}\)/\(G_{2j}\) are the copies of the effect allele (expressed as deviation from the family mean) with \(\gamma_1\)/\(\gamma_2\) as their effect on phenotype mean, \(\mu_{j}\) is the random intercept of individual, \(\delta_{j}\) is the genetic relatedness matrix (GRM) and \[ \epsilon_{ij} \sim N (0, G_{0j} \sigma_0^2 + G_{1j} \sigma_1^2 + G_{2j} \sigma_2^2) \]

We assume uncorrelated variance components: \(\mu_{j} \sim N(0, \sigma^2_{\mu j})\), \(\delta_{j} \sim N(0, \sigma^2_{F}\Phi)\), and \(\epsilon_{ij} \sim N(0, \textbf{V}_\textbf{E})\) where \(\textbf{V}_\textbf{E}\) is an \(n \times n\) matrix with the \(i^{th}\) diagonal equal to \(G_{0j} \sigma_0^2 + G_{1j} \sigma_1^2 + G_{2j} \sigma_2^2\) and \(\Phi\) is \(2\times\) the kinship matrix.