To conduct the analysis, we selected 28 columns from the original dataset fms: the 24 columns representing the 24 akt1 SNPs, NDRM.CH (percentage change in non-dominant arm muscle strength), Race, Gender and Age. We then kept only the subjects whose race was Caucasian, which resulted in a new data frame of dimension \(791 \times 28\). For each akt1 SNP column, we mapped the genotypes to 0, 1 and 2; for example, AA/AT/TT or CC/CT/TT became 0/1/2 when A or C was the reference allele. This gave us 24 new columns coded as 0, 1 and 2 that could be treated as continuous variables. This new data frame was used throughout this homework.
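The genotype coding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `code_genotype`, the use of `None` for a missing genotype, and the toy genotype list are hypothetical and are not taken from the fms dataset or from the code actually used for this homework.

```python
def code_genotype(genotype, ref_allele):
    """Count the non-reference alleles in a two-letter genotype (0, 1 or 2).

    Returns None for a missing or malformed genotype (an assumption about
    how missing values are represented).
    """
    if genotype is None or len(genotype) != 2:
        return None
    return sum(1 for allele in genotype if allele != ref_allele)


# Toy genotypes, not real data: A is assumed to be the reference allele.
genotypes = ["AA", "AT", "TT", None, "TA"]
coded = [code_genotype(g, ref_allele="A") for g in genotypes]
print(coded)  # [0, 1, 2, None, 1]
```

Counting non-reference alleles gives the same 0/1/2 coding as the AA/AT/TT mapping in the text, and it treats the heterozygote the same way regardless of the order in which its two alleles are written.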
Two simple linear regression (SLR) models were fitted. In each, the quantitative trait NDRM.CH was regressed on one SNP at a time.
In the first model, shown below, \(SNP_{i, j}\) is the numerical coding (0, 1, 2) of the genotype at SNP \(j\) for the \(i^{th}\) subject. This is an additive model with 1 degree of freedom. Fitting it for every SNP gave us 24 SLRs, one per row of the table below. \(N\) in the table is the number of subjects used in each SLR. The column p-Value: Additive Model (df = 1) gives the p-value associated with \(SNP_j\) in that SLR.
\[NDRM.CH_i = \alpha + \beta \times SNP_{i, j} + \epsilon_{i}, \quad j = 1, \ldots, 24; \; i = 1, \ldots, N; \; SNP_{i,j} = 0, 1, 2\]
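As a rough sketch of what fitting one additive SLR involves, the closed-form least-squares estimates can be computed directly. Everything here is illustrative: `fit_additive` is a hypothetical helper, the toy values are made up, subjects with a missing value are assumed to be dropped, and the p-value uses a normal approximation to the t distribution (reasonable only for large \(N\)), so it will not exactly reproduce the p-values in the tables.

```python
from math import sqrt
from statistics import NormalDist


def fit_additive(snp, y):
    """Least-squares fit of y = alpha + beta * snp.

    Returns (alpha, beta, t_stat, p_approx, n), where n is the number of
    subjects with both values observed.
    """
    # Assumption: subjects with a missing genotype or response are dropped.
    pairs = [(x, v) for x, v in zip(snp, y) if x is not None and v is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(v for _, v in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in pairs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((v - alpha - beta * x) ** 2 for x, v in pairs)
    se_beta = sqrt(rss / (n - 2) / sxx)
    t_stat = beta / se_beta
    # Two-sided p-value from the standard normal, used here as an
    # approximation to the t distribution with n - 2 degrees of freedom.
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return alpha, beta, t_stat, p_approx, n


# Toy values only (not FAMuSS data); the last subject has no genotype.
toy_snp = [0, 0, 1, 1, 2, 2, None]
toy_y = [10, 14, 20, 22, 31, 29, 25]
alpha_hat, beta_hat, t_stat, p_approx, n_used = fit_additive(toy_snp, toy_y)
print(alpha_hat, beta_hat, n_used)
```

Repeating this fit once per SNP column would give one row per SNP, with `n_used` playing the role of \(N\).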
In the second model, shown below, \(GENOTYPE_{i, j}\) is the genotype (XX, XY, YY) at SNP \(j\) for the \(i^{th}\) subject, treated as a categorical variable with XX as the reference level. The two columns p-Value: Allele 1 (2 df Test) and p-Value: Allele 2 (2 df Test) give the p-values associated with the XY and YY coefficients (\(\beta_1\) and \(\beta_2\)), respectively. This model was also fitted separately for each of the 24 SNPs.
\[NDRM.CH_i = \alpha + \beta_1 \times [GENOTYPE_{i, j} == XY] + \beta_2 \times [GENOTYPE_{i, j} == YY] + \epsilon_{i}, \quad j = 1, \ldots, 24; \; i = 1, \ldots, N; \; GENOTYPE_{i, j} = XX, XY, YY\]
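For akt1_g1780a_g363a, only two genotype categories, GA and GG, are observed. As a result, the second genotype coefficient cannot be estimated, and the column p-Value: Allele 2 (2 df Test) is NA for this SNP. With only two categories, the genotype model is equivalent to the additive model, which is why the other two p-values for this SNP are identical.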
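The indicator coding of the genotype model, and the reason a SNP with only two observed genotypes loses one coefficient, can be illustrated with a small sketch. The helper `indicator_columns` and the toy genotype lists are hypothetical, and GG is assumed to be the reference level purely for illustration.

```python
def indicator_columns(genotypes, levels):
    """Build one 0/1 column per non-reference level.

    `levels` lists the reference level first; only levels[1:] get a column.
    """
    return {lvl: [1 if g == lvl else 0 for g in genotypes] for lvl in levels[1:]}


levels = ["GG", "GA", "AA"]  # reference level first (assumed)

# Toy genotypes, not real data.
three_level = ["GG", "GA", "AA", "GA", "GG"]
two_level = ["GG", "GA", "GG", "GA", "GG"]  # AA never observed

cols3 = indicator_columns(three_level, levels)
cols2 = indicator_columns(two_level, levels)

# A level whose indicator column is all zeros has no subjects, so its
# coefficient cannot be estimated.
empty = [lvl for lvl, col in cols2.items() if sum(col) == 0]
print(empty)  # ['AA']

# With two levels, the remaining indicator equals the 0/1 additive coding.
additive = [0 if g == "GG" else 1 for g in two_level]
print(additive == cols2["GA"])  # True
```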
We set the type I error rate to \(\alpha = 0.05\). With the Bonferroni correction, the significance level for each of the 24 tests becomes \(\alpha^* = \frac{0.05}{24} \approx 0.002\). According to the table below, no SNP is statistically significant at \(\alpha^* = 0.002\) under either the additive model or the genotype model with df = 2. Therefore, we conclude that none of the 24 SNPs contributes a significant amount of information to explaining the percentage change in non-dominant arm muscle strength in an SLR within the Caucasian population.
| SNP | N | p-Value: Additive Model (df = 1) | p-Value: Allele 1 (2 df Test) | p-Value: Allele 2 (2 df Test) |
|---|---|---|---|---|
| akt1_t22932c | 723 | 0.714 | 0.2312 | 0.2341 |
| akt1_g15129a | 725 | 0.9035 | 0.7504 | 0.8058 |
| akt1_g14803t | 761 | 0.8894 | 0.354 | 0.8778 |
| akt1_c10744t_c12886t | 722 | 0.5093 | 0.7334 | 0.426 |
| akt1_t10726c_t12868c | 719 | 0.1752 | 0.3834 | 0.186 |
| akt1_t10598a_t12740a | 722 | 0.5712 | 0.9658 | 0.5034 |
| akt1_c9756a_c11898t | 726 | 0.7819 | 0.6395 | 0.963 |
| akt1_t8407g | 723 | 0.7658 | 0.6014 | 0.703 |
| akt1_a7699g | 725 | 0.3656 | 0.5311 | 0.7446 |
| akt1_c6148t_c8290t | 761 | 0.6675 | 0.2316 | 0.2058 |
| akt1_c6024t_c8166t | 726 | 0.6783 | 0.6133 | 0.8282 |
| akt1_c5854t_c7996t | 761 | 0.5183 | 0.4803 | 0.4128 |
| akt1_c832g_c3359g | 761 | 0.3193 | 0.1987 | 0.6761 |
| akt1_g288c | 761 | 0.4077 | 0.8275 | 0.4643 |
| akt1_g1780a_g363a | 761 | 0.3295 | 0.3295 | NA |
| akt1_g2347t_g205t | 726 | 0.8749 | 0.991 | 0.8328 |
| akt1_g2375a_g233a | 725 | 0.7996 | 0.738 | 0.7112 |
| akt1_g4362c | 723 | 0.3966 | 0.6637 | 0.8875 |
| akt1_c15676t | 761 | 0.5623 | 0.7022 | 0.6153 |
| akt1_a15756t | 761 | 0.4804 | 0.6426 | 0.4843 |
| akt1_g20703a | 761 | 0.6664 | 0.4443 | 0.4212 |
| akt1_g22187a | 722 | 0.2061 | 0.7514 | 0.6279 |
| akt1_a22889g | 761 | 0.5641 | 0.1897 | 0.6987 |
| akt1_g23477a | 761 | 0.4478 | 0.7684 | 0.7273 |
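The Bonferroni arithmetic used above is short enough to check directly. In this sketch the variable names are hypothetical; the value 0.1752 is the smallest p-value appearing in the unadjusted table above (akt1_t10726c_t12868c, additive model).

```python
alpha = 0.05   # family-wise type I error rate
n_tests = 24   # one test per akt1 SNP

# Bonferroni-corrected per-test significance level.
alpha_star = alpha / n_tests
print(round(alpha_star, 4))  # 0.0021

# Smallest p-value in the unadjusted table.
smallest_p = 0.1752
print(smallest_p < alpha_star)  # False
```

Since even the smallest p-value is far above the corrected threshold, none of the other entries in the table can be significant either.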
Age is considered as a continuous covariate. If the model is adjusted for age only, the additive model can be re-written as:
\[NDRM.CH_i = \alpha + \beta \times SNP_{i, j} + \gamma \times Age_{i} + \epsilon_{i}, \quad j = 1, \ldots, 24; \; i = 1, \ldots, N; \; SNP_{i,j} = 0, 1, 2\]
Similarly, the model with \(df=2\) becomes:
\[NDRM.CH_i = \alpha + \beta_1 \times [GENOTYPE_{i, j} == XY] + \beta_2 \times [GENOTYPE_{i, j} == YY] + \gamma \times Age_{i} + \epsilon_{i}, \quad j = 1, \ldots, 24; \; i = 1, \ldots, N; \; GENOTYPE_{i, j} = XX, XY, YY\]
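Once a covariate is added, the fit is no longer a one-predictor regression, and the coefficients come from solving the normal equations \((X^\top X)\,b = X^\top y\). The following is a minimal, dependency-free sketch of that step for the age-adjusted additive model; the helpers `solve_linear_system` and `fit_ols` are hypothetical, and the toy data are noise-free values generated from known coefficients so the result can be checked by eye. It only estimates coefficients and does not compute p-values.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(design, y):
    """Least-squares coefficients from the normal equations."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * v for row, v in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data only: y is built from alpha = 5, beta = 2, gamma = 0.5.
toy_snp = [0, 1, 2, 0, 1, 2]
toy_age = [20, 25, 30, 35, 22, 28]
toy_y = [5 + 2 * s + 0.5 * a for s, a in zip(toy_snp, toy_age)]

# Design matrix columns: intercept, SNP (0/1/2), Age.
design = [[1, s, a] for s, a in zip(toy_snp, toy_age)]
alpha_hat, beta_hat, gamma_hat = fit_ols(design, toy_y)
print(round(alpha_hat, 6), round(beta_hat, 6), round(gamma_hat, 6))
```

The genotype model with \(df=2\) would use the same solver with the SNP column replaced by two indicator columns for XY and YY.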
In the table below, there is still no SNP that contributes significantly to the percentage change in non-dominant arm muscle strength within the Caucasian population at \(\alpha^* = 0.002\) after adjusting for age.
| SNP | N | p-Value: Additive Model (df = 1) | p-Value: Allele 1 (2 df Test) | p-Value: Allele 2 (2 df Test) |
|---|---|---|---|---|
| akt1_t22932c | 716 | 0.7761 | 0.3516 | 0.3547 |
| akt1_g15129a | 718 | 0.7615 | 0.8312 | 0.6683 |
| akt1_g14803t | 754 | 0.8466 | 0.3927 | 0.9261 |
| akt1_c10744t_c12886t | 715 | 0.6354 | 0.9776 | 0.3169 |
| akt1_t10726c_t12868c | 712 | 0.1929 | 0.3109 | 0.3511 |
| akt1_t10598a_t12740a | 715 | 0.7412 | 0.8211 | 0.6511 |
| akt1_c9756a_c11898t | 719 | 0.8738 | 0.878 | 0.9135 |
| akt1_t8407g | 717 | 0.9851 | 0.9829 | 0.989 |
| akt1_a7699g | 718 | 0.4859 | 0.8615 | 0.9775 |
| akt1_c6148t_c8290t | 754 | 0.9587 | 0.6122 | 0.3086 |
| akt1_c6024t_c8166t | 719 | 0.7709 | 0.8376 | 0.7988 |
| akt1_c5854t_c7996t | 754 | 0.4256 | 0.5096 | 0.4061 |
| akt1_c832g_c3359g | 754 | 0.4654 | 0.4012 | 0.9427 |
| akt1_g288c | 754 | 0.5346 | 0.8505 | 0.5781 |
| akt1_g1780a_g363a | 754 | 0.6617 | 0.6617 | NA |
| akt1_g2347t_g205t | 719 | 0.591 | 0.6964 | 0.6408 |
| akt1_g2375a_g233a | 718 | 0.7454 | 0.586 | 0.5668 |
| akt1_g4362c | 717 | 0.5617 | 0.9563 | 0.8283 |
| akt1_c15676t | 754 | 0.66 | 0.7767 | 0.6972 |
| akt1_a15756t | 754 | 0.5649 | 0.6001 | 0.5155 |
| akt1_g20703a | 754 | 0.7433 | 0.48 | 0.4699 |
| akt1_g22187a | 715 | 0.2931 | 0.6888 | 0.7564 |
| akt1_a22889g | 754 | 0.5142 | 0.1622 | 0.6503 |
| akt1_g23477a | 754 | 0.492 | 0.6838 | 0.8086 |
Gender is considered as a categorical covariate. If the model is adjusted for gender only, female is the reference level. The additive model can be re-written as:
\[NDRM.CH_i = \alpha + \beta \times SNP_{i, j} + \gamma \times [Gender_{i} == Male] + \epsilon_{i}, \quad j = 1, \ldots, 24; \; i = 1, \ldots, N; \; SNP_{i,j} = 0, 1, 2\]
Similarly, the model with \(df=2\) becomes:
\[NDRM.CH_i = \alpha + \beta_1 \times [GENOTYPE_{i, j} == XY] + \beta_2 \times [GENOTYPE_{i, j} == YY] + \gamma \times [Gender_{i} == Male] + \epsilon_{i}, \quad j = 1, \ldots, 24; \; i = 1, \ldots, N; \; GENOTYPE_{i, j} = XX, XY, YY; \; Gender_{i} = Male, Female\]
In the table below, there is still no SNP that contributes significantly to the percentage change in non-dominant arm muscle strength within the Caucasian population at \(\alpha^* = 0.002\) after adjusting for gender.
| SNP | N | p-Value: Additive Model (df = 1) | p-Value: Allele 1 (2 df Test) | p-Value: Allele 2 (2 df Test) |
|---|---|---|---|---|
| akt1_t22932c | 723 | 0.7466 | 0.5487 | 0.6358 |
| akt1_g15129a | 725 | 0.5501 | 0.9315 | 0.4644 |
| akt1_g14803t | 761 | 0.7 | 0.2561 | 0.5191 |
| akt1_c10744t_c12886t | 722 | 0.8066 | 0.9396 | 0.6913 |
| akt1_t10726c_t12868c | 719 | 0.1576 | 0.3293 | 0.2113 |
| akt1_t10598a_t12740a | 722 | 0.9081 | 0.698 | 0.8027 |
| akt1_c9756a_c11898t | 726 | 0.7848 | 0.9931 | 0.7019 |
| akt1_t8407g | 723 | 0.9122 | 0.661 | 0.719 |
| akt1_a7699g | 725 | 0.3225 | 0.6781 | 0.9106 |
| akt1_c6148t_c8290t | 761 | 0.8502 | 0.3324 | 0.1729 |
| akt1_c6024t_c8166t | 726 | 0.8909 | 0.9224 | 0.7996 |
| akt1_c5854t_c7996t | 761 | 0.6946 | 0.4495 | 0.437 |
| akt1_c832g_c3359g | 761 | 0.2994 | 0.2329 | 0.9167 |
| akt1_g288c | 761 | 0.7957 | 0.9233 | 0.8118 |
| akt1_g1780a_g363a | 761 | 0.6582 | 0.6582 | NA |
| akt1_g2347t_g205t | 726 | 0.4818 | 0.7034 | 0.491 |
| akt1_g2375a_g233a | 725 | 0.4312 | 0.3876 | 0.3163 |
| akt1_g4362c | 723 | 0.4066 | 0.5002 | 0.7193 |
| akt1_c15676t | 761 | 0.7806 | 0.8858 | 0.768 |
| akt1_a15756t | 761 | 0.5477 | 0.4473 | 0.4264 |
| akt1_g20703a | 761 | 0.8195 | 0.5465 | 0.5469 |
| akt1_g22187a | 722 | 0.5022 | 0.7005 | 0.9066 |
| akt1_a22889g | 761 | 0.8329 | 0.4703 | 0.9232 |
| akt1_g23477a | 761 | 0.7422 | 0.927 | 0.8648 |
If the model is adjusted for both gender and age, with female as the reference level for gender, the additive model can be re-written as:
\[NDRM.CH_i = \alpha + \beta \times SNP_{i, j} + \gamma \times [Gender_{i} == Male] + \eta \times Age_i + \epsilon_{i}, \quad j = 1, \ldots, 24; \; i = 1, \ldots, N; \; SNP_{i,j} = 0, 1, 2\]
Similarly, the model with \(df=2\) becomes:
\[NDRM.CH_i = \alpha + \beta_1 \times [GENOTYPE_{i, j} == XY] + \beta_2 \times [GENOTYPE_{i, j} == YY] + \gamma \times [Gender_{i} == Male] + \eta \times Age_i + \epsilon_{i}, \quad j = 1, \ldots, 24; \; i = 1, \ldots, N; \; GENOTYPE_{i, j} = XX, XY, YY; \; Gender_{i} = Male, Female\]
In the table below, there is still no SNP that contributes significantly to the percentage change in non-dominant arm muscle strength within the Caucasian population at \(\alpha^* = 0.002\) after adjusting for both gender and age.
For SNP akt1_g2375a_g233a, the subject whose genotype is coded as decided is removed from every model.
| SNP | N | p-Value: Additive Model (df = 1) | p-Value: Allele 1 (2 df Test) | p-Value: Allele 2 (2 df Test) |
|---|---|---|---|---|
| akt1_t22932c | 716 | 0.6767 | 0.7453 | 0.843 |
| akt1_g15129a | 718 | 0.4111 | 0.9449 | 0.3434 |
| akt1_g14803t | 754 | 0.7086 | 0.2893 | 0.5374 |
| akt1_c10744t_c12886t | 715 | 0.9555 | 0.8126 | 0.5502 |
| akt1_t10726c_t12868c | 712 | 0.1621 | 0.246 | 0.388 |
| akt1_t10598a_t12740a | 715 | 0.873 | 0.5318 | 0.9966 |
| akt1_c9756a_c11898t | 719 | 0.6797 | 0.7087 | 0.7632 |
| akt1_t8407g | 717 | 0.8107 | 0.9069 | 0.9713 |
| akt1_a7699g | 718 | 0.4531 | 0.9646 | 0.8102 |
| akt1_c6148t_c8290t | 754 | 0.7455 | 0.8046 | 0.2609 |
| akt1_c6024t_c8166t | 719 | 0.778 | 0.7877 | 0.8428 |
| akt1_c5854t_c7996t | 754 | 0.6039 | 0.4712 | 0.4302 |
| akt1_c832g_c3359g | 754 | 0.4603 | 0.478 | 0.8102 |
| akt1_g288c | 754 | 0.9893 | 0.9382 | 0.971 |
| akt1_g1780a_g363a | 754 | 0.9233 | 0.9233 | NA |
| akt1_g2347t_g205t | 719 | 0.2695 | 0.3944 | 0.3524 |
| akt1_g2375a_g233a | 718 | 0.3996 | 0.3143 | 0.254 |
| akt1_g4362c | 717 | 0.5918 | 0.8537 | 0.9934 |
| akt1_c15676t | 754 | 0.9251 | 0.992 | 0.8604 |
| akt1_a15756t | 754 | 0.6537 | 0.4386 | 0.4792 |
| akt1_g20703a | 754 | 0.9013 | 0.5857 | 0.6024 |
| akt1_g22187a | 715 | 0.6975 | 0.604 | 0.8995 |
| akt1_a22889g | 754 | 0.809 | 0.4329 | 0.9058 |
| akt1_g23477a | 754 | 0.8035 | 0.7949 | 0.9759 |
Even though none of the 24 akt1 SNPs is statistically significant after the Bonferroni correction, both Age and Gender are statistically significant at \(\alpha^* = 0.002\).
We use the additive model (model 1 in part 4) as an example: the p-values associated with Age and Gender|Male show that both covariates contribute a significant amount of information to the percentage change in non-dominant arm muscle strength within the Caucasian population at \(\alpha^* = 0.002\). Please see the figure below for more details.