Part 0. Data Processing

To conduct the analysis, we selected 28 columns from the original dataset fms: the 24 columns representing the 24 akt1 SNPs, NDRM.CH (percentage change in non-dominant arm muscle strength), Race, Gender and Age. We then kept only the subjects whose race was Caucasian, which resulted in a new data frame of dimension \(791 \times 28\). For each akt1 SNP column, we mapped the genotypes, such as AA/AT/TT or CC/CT/TT, to 0, 1 and 2 respectively, where A or C is the reference allele, so that each code counts the copies of the non-reference allele. This gave us 24 new columns coded as 0, 1 and 2 that could be treated as continuous variables. This new data frame was used throughout this homework.
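
The genotype coding described above can be sketched in a few lines. This is only an illustration in Python, not the code used for the homework; the function name `code_genotype` and the handling of missing genotypes as `None` are assumptions.

```python
def code_genotype(genotype, ref_allele):
    """Count the non-reference alleles in a two-letter genotype (0, 1 or 2).

    Missing or malformed genotypes are returned as None (an assumption).
    """
    if genotype is None or len(genotype) != 2:
        return None
    ref = ref_allele.upper()
    return sum(1 for allele in genotype.upper() if allele != ref)


# Toy genotypes for a SNP whose reference allele is A.
genotypes = ["AA", "AT", "TT", None, "TA"]
coded = [code_genotype(g, "A") for g in genotypes]
print(coded)  # [0, 1, 2, None, 1]
```

Counting non-reference alleles, rather than looking genotypes up in a fixed table, handles both AT and TA spellings of the heterozygote.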

Part 1. Unadjusted Analyses in Caucasians

Two simple linear regression (SLR) models were fitted for each SNP, with the quantitative trait NDRM.CH as the response.

In the first model, as shown, \(SNP_{i, j}\) is the numerical coding (0, 1, 2) of the genotype at SNP \(j\) for the \(i^{th}\) subject. This is an additive model with 1 degree of freedom. Fitting it to each SNP gave us 24 SLRs, one per row of the table below. \(N\) in the table is the number of subjects used in each SLR. The column p-Value: Additive Model (df = 1) gives the p-value associated with \(SNP_j\) in that SLR.

\[NDRM.CH_i = \alpha + \beta \times SNP_{i, j} + \epsilon_{i}, \quad j = 1, \ldots, 24; \; i = 1, \ldots, N; \; SNP_{i,j} = 0, 1, 2\]
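
As a rough illustration of one such fit, the sketch below estimates \(\alpha\) and \(\beta\) by least squares in plain Python. Two points are assumptions rather than part of the original analysis: subjects with a missing SNP or outcome are dropped before fitting (one possible reason why \(N\) differs across SNPs), and the p-value uses a normal approximation to the t distribution, which is only reasonable for large \(N\). The name `fit_additive` and the toy data are hypothetical.

```python
import math
from statistics import NormalDist


def fit_additive(y, snp):
    """Least-squares fit of y = alpha + beta * snp on complete cases only."""
    pairs = [(s, v) for s, v in zip(snp, y) if s is not None and v is not None]
    n = len(pairs)
    mean_x = sum(s for s, _ in pairs) / n
    mean_y = sum(v for _, v in pairs) / n
    sxx = sum((s - mean_x) ** 2 for s, _ in pairs)
    sxy = sum((s - mean_x) * (v - mean_y) for s, v in pairs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    # Residual sum of squares and the standard error of the slope.
    rss = sum((v - alpha - beta * s) ** 2 for s, v in pairs)
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    t_stat = beta / se_beta
    # Assumption: normal approximation instead of the exact t distribution.
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return {"n": n, "alpha": alpha, "beta": beta, "t": t_stat, "p_approx": p_approx}


# Toy data: two subjects have a missing value and are excluded from the fit.
snp = [0, 1, 2, 0, 1, 2, None, 1]
y = [10, 14, 12, 8, 9, 16, 11, None]
fit = fit_additive(y, snp)
print(fit["n"], fit["alpha"], fit["beta"])
```

Repeating this call once per SNP column would produce one row of \(N\) and p-value per SNP.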

In the second model, as shown, \(GENOTYPE_{i, j}\) is the genotype (XX, XY, YY) of the \(i^{th}\) subject at SNP \(j\), treated as a categorical variable with XX as the reference level. The two columns in the table, p-Value: Allele 1 (2 df Test) and p-Value: Allele 2 (2 df Test), give the p-values associated with the XY and YY indicators, that is, with \(\beta_1\) and \(\beta_2\), respectively. This model was also fitted separately for each of the 24 SNPs.

\[NDRM.CH_i = \alpha + \beta_1 \times [GENOTYPE_{i, j} == XY] + \beta_2 \times [GENOTYPE_{i, j} == YY] + \epsilon_{i}, \quad j = 1, \ldots, 24; \; i = 1, \ldots, N; \; GENOTYPE_{i, j} = XX, XY, YY\]

For akt1_g1780a_g363a, only two genotypes, GA and GG, are observed. As a result, only one genotype indicator can be estimated, and the column p-Value: Allele 2 (2 df Test) is NA for akt1_g1780a_g363a.
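
When genotype is the only predictor, the least-squares estimates of the categorical model reduce to group means: \(\alpha\) is the mean of the reference group, and \(\beta_1\) and \(\beta_2\) are the differences between the XY and YY group means and that reference mean. The sketch below uses this fact to show why a genotype with no subjects has no estimable coefficient. The function name `genotype_effects` and the toy values are hypothetical.

```python
def genotype_effects(y, genotypes, levels):
    """Estimates for y ~ genotype, with levels[0] as the reference level.

    Returns the reference mean as "alpha" and, for every other level, the
    difference in group means, or None when the level has no subjects.
    """
    groups = {level: [] for level in levels}
    for value, genotype in zip(y, genotypes):
        if value is not None and genotype in groups:
            groups[genotype].append(value)
    means = {lv: (sum(v) / len(v) if v else None) for lv, v in groups.items()}
    ref_mean = means[levels[0]]
    effects = {"alpha": ref_mean}
    for level in levels[1:]:
        if means[level] is None or ref_mean is None:
            effects[level] = None  # no data for this level: not estimable
        else:
            effects[level] = means[level] - ref_mean
    return effects


# Toy example in which the YY genotype is never observed.
y = [10, 12, 15, 17, 11]
genotypes = ["XX", "XX", "XY", "XY", "XX"]
effects = genotype_effects(y, genotypes, ["XX", "XY", "YY"])
print(effects)  # {'alpha': 11.0, 'XY': 5.0, 'YY': None}
```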

We set the overall type I error rate to \(\alpha = 0.05\). With the Bonferroni correction, the significance level for each of the 24 tests becomes \(\alpha^* = \frac{0.05}{24} \approx 0.002\). According to the table below, no SNP is statistically significant at \(\alpha^* = 0.002\) under either the additive model or the model with df = 2. Therefore, we conclude that none of the 24 SNPs contributes a significant amount of information to explain the percentage change in non-dominant arm muscle strength under the SLR setting within the Caucasian population.

| SNP | N | p-Value: Additive Model (df = 1) | p-Value: Allele 1 (2 df Test) | p-Value: Allele 2 (2 df Test) |
|---|---|---|---|---|
| akt1_t22932c | 723 | 0.714 | 0.2312 | 0.2341 |
| akt1_g15129a | 725 | 0.9035 | 0.7504 | 0.8058 |
| akt1_g14803t | 761 | 0.8894 | 0.354 | 0.8778 |
| akt1_c10744t_c12886t | 722 | 0.5093 | 0.7334 | 0.426 |
| akt1_t10726c_t12868c | 719 | 0.1752 | 0.3834 | 0.186 |
| akt1_t10598a_t12740a | 722 | 0.5712 | 0.9658 | 0.5034 |
| akt1_c9756a_c11898t | 726 | 0.7819 | 0.6395 | 0.963 |
| akt1_t8407g | 723 | 0.7658 | 0.6014 | 0.703 |
| akt1_a7699g | 725 | 0.3656 | 0.5311 | 0.7446 |
| akt1_c6148t_c8290t | 761 | 0.6675 | 0.2316 | 0.2058 |
| akt1_c6024t_c8166t | 726 | 0.6783 | 0.6133 | 0.8282 |
| akt1_c5854t_c7996t | 761 | 0.5183 | 0.4803 | 0.4128 |
| akt1_c832g_c3359g | 761 | 0.3193 | 0.1987 | 0.6761 |
| akt1_g288c | 761 | 0.4077 | 0.8275 | 0.4643 |
| akt1_g1780a_g363a | 761 | 0.3295 | 0.3295 | NA |
| akt1_g2347t_g205t | 726 | 0.8749 | 0.991 | 0.8328 |
| akt1_g2375a_g233a | 725 | 0.7996 | 0.738 | 0.7112 |
| akt1_g4362c | 723 | 0.3966 | 0.6637 | 0.8875 |
| akt1_c15676t | 761 | 0.5623 | 0.7022 | 0.6153 |
| akt1_a15756t | 761 | 0.4804 | 0.6426 | 0.4843 |
| akt1_g20703a | 761 | 0.6664 | 0.4443 | 0.4212 |
| akt1_g22187a | 722 | 0.2061 | 0.7514 | 0.6279 |
| akt1_a22889g | 761 | 0.5641 | 0.1897 | 0.6987 |
| akt1_g23477a | 761 | 0.4478 | 0.7684 | 0.7273 |
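
The Bonferroni comparison for Part 1 can be checked with a short calculation. The sketch below is only an illustration: it takes the 24 additive-model p-values from the table above, in row order, and compares each with \(0.05 / 24\). The variable names are hypothetical.

```python
ALPHA = 0.05

# Additive-model p-values from the Part 1 table, in row order.
additive_p = [
    0.714, 0.9035, 0.8894, 0.5093, 0.1752, 0.5712, 0.7819, 0.7658,
    0.3656, 0.6675, 0.6783, 0.5183, 0.3193, 0.4077, 0.3295, 0.8749,
    0.7996, 0.3966, 0.5623, 0.4804, 0.6664, 0.2061, 0.5641, 0.4478,
]

# Bonferroni: divide the overall level by the number of tests.
threshold = ALPHA / len(additive_p)
significant = [p for p in additive_p if p < threshold]

print(f"threshold = {threshold:.5f}")
print(f"smallest p-value = {min(additive_p)}")
print(f"number significant = {len(significant)}")
```

The smallest additive p-value, 0.1752, is far above the threshold, so the conclusion does not depend on whether the threshold is rounded to 0.002.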

Part 2. Age-adjusted Analyses in Caucasians

Age is considered as a continuous covariate. If the model is adjusted for age only, the additive model can be re-written as:

\[NDRM.CH_i = \alpha + \beta \times SNP_{i, j} + \gamma \times Age_{i} + \epsilon_{i}.\\ j = 1, \ldots, 24; i = 1, \ldots, N; SNP_{i,j} = 0, 1, 2\]

Similarly, the model with \(df=2\) becomes: \[NDRM.CH_i = \alpha + \beta_1 \times [GENOTYPE_{i, j} == XY]+ \beta_2 \times [GENOTYPE_{i, j} == YY] + \gamma \times Age_{i} + \epsilon_{i}. \\ j = 1, \ldots, 24; i = 1, \ldots, N; GENOTYPE_{i, j} = XX, XY, YY\]

In the table below, we still cannot identify any SNP that contributes significantly to the percentage change in non-dominant arm muscle strength in the age-adjusted linear regression within the Caucasian population at \(\alpha^* = 0.002\).

| SNP | N | p-Value: Additive Model (df = 1) | p-Value: Allele 1 (2 df Test) | p-Value: Allele 2 (2 df Test) |
|---|---|---|---|---|
| akt1_t22932c | 716 | 0.7761 | 0.3516 | 0.3547 |
| akt1_g15129a | 718 | 0.7615 | 0.8312 | 0.6683 |
| akt1_g14803t | 754 | 0.8466 | 0.3927 | 0.9261 |
| akt1_c10744t_c12886t | 715 | 0.6354 | 0.9776 | 0.3169 |
| akt1_t10726c_t12868c | 712 | 0.1929 | 0.3109 | 0.3511 |
| akt1_t10598a_t12740a | 715 | 0.7412 | 0.8211 | 0.6511 |
| akt1_c9756a_c11898t | 719 | 0.8738 | 0.878 | 0.9135 |
| akt1_t8407g | 717 | 0.9851 | 0.9829 | 0.989 |
| akt1_a7699g | 718 | 0.4859 | 0.8615 | 0.9775 |
| akt1_c6148t_c8290t | 754 | 0.9587 | 0.6122 | 0.3086 |
| akt1_c6024t_c8166t | 719 | 0.7709 | 0.8376 | 0.7988 |
| akt1_c5854t_c7996t | 754 | 0.4256 | 0.5096 | 0.4061 |
| akt1_c832g_c3359g | 754 | 0.4654 | 0.4012 | 0.9427 |
| akt1_g288c | 754 | 0.5346 | 0.8505 | 0.5781 |
| akt1_g1780a_g363a | 754 | 0.6617 | 0.6617 | NA |
| akt1_g2347t_g205t | 719 | 0.591 | 0.6964 | 0.6408 |
| akt1_g2375a_g233a | 718 | 0.7454 | 0.586 | 0.5668 |
| akt1_g4362c | 717 | 0.5617 | 0.9563 | 0.8283 |
| akt1_c15676t | 754 | 0.66 | 0.7767 | 0.6972 |
| akt1_a15756t | 754 | 0.5649 | 0.6001 | 0.5155 |
| akt1_g20703a | 754 | 0.7433 | 0.48 | 0.4699 |
| akt1_g22187a | 715 | 0.2931 | 0.6888 | 0.7564 |
| akt1_a22889g | 754 | 0.5142 | 0.1622 | 0.6503 |
| akt1_g23477a | 754 | 0.492 | 0.6838 | 0.8086 |

Part 3. Gender-adjusted Analyses in Caucasians

Gender is considered as a categorical covariate. If the model is adjusted for gender only, female is the reference level. The additive model can be re-written as:

\[NDRM.CH_i = \alpha + \beta \times SNP_{i, j} + \gamma \times [Gender_{i} == Male] + \epsilon_{i}. \\ j = 1, \ldots, 24; i = 1, \ldots, N; SNP_{i,j} = 0, 1, 2\]

Similarly, the model with \(df=2\) becomes: \[NDRM.CH_i = \alpha + \beta_1 \times [GENOTYPE_{i, j} == XY]+ \beta_2 \times [GENOTYPE_{i, j} == YY] + \gamma \times [Gender_{i} == Male] + \epsilon_{i}.\\ j = 1, \ldots, 24; i = 1, \ldots, N; GENOTYPE_{i, j} = XX, XY, YY; Gender_{i} = Male, Female.\]

In the table below, we still cannot identify any SNP that contributes significantly to the percentage change in non-dominant arm muscle strength in the gender-adjusted linear regression within the Caucasian population at \(\alpha^* = 0.002\).

| SNP | N | p-Value: Additive Model (df = 1) | p-Value: Allele 1 (2 df Test) | p-Value: Allele 2 (2 df Test) |
|---|---|---|---|---|
| akt1_t22932c | 723 | 0.7466 | 0.5487 | 0.6358 |
| akt1_g15129a | 725 | 0.5501 | 0.9315 | 0.4644 |
| akt1_g14803t | 761 | 0.7 | 0.2561 | 0.5191 |
| akt1_c10744t_c12886t | 722 | 0.8066 | 0.9396 | 0.6913 |
| akt1_t10726c_t12868c | 719 | 0.1576 | 0.3293 | 0.2113 |
| akt1_t10598a_t12740a | 722 | 0.9081 | 0.698 | 0.8027 |
| akt1_c9756a_c11898t | 726 | 0.7848 | 0.9931 | 0.7019 |
| akt1_t8407g | 723 | 0.9122 | 0.661 | 0.719 |
| akt1_a7699g | 725 | 0.3225 | 0.6781 | 0.9106 |
| akt1_c6148t_c8290t | 761 | 0.8502 | 0.3324 | 0.1729 |
| akt1_c6024t_c8166t | 726 | 0.8909 | 0.9224 | 0.7996 |
| akt1_c5854t_c7996t | 761 | 0.6946 | 0.4495 | 0.437 |
| akt1_c832g_c3359g | 761 | 0.2994 | 0.2329 | 0.9167 |
| akt1_g288c | 761 | 0.7957 | 0.9233 | 0.8118 |
| akt1_g1780a_g363a | 761 | 0.6582 | 0.6582 | NA |
| akt1_g2347t_g205t | 726 | 0.4818 | 0.7034 | 0.491 |
| akt1_g2375a_g233a | 725 | 0.4312 | 0.3876 | 0.3163 |
| akt1_g4362c | 723 | 0.4066 | 0.5002 | 0.7193 |
| akt1_c15676t | 761 | 0.7806 | 0.8858 | 0.768 |
| akt1_a15756t | 761 | 0.5477 | 0.4473 | 0.4264 |
| akt1_g20703a | 761 | 0.8195 | 0.5465 | 0.5469 |
| akt1_g22187a | 722 | 0.5022 | 0.7005 | 0.9066 |
| akt1_a22889g | 761 | 0.8329 | 0.4703 | 0.9232 |
| akt1_g23477a | 761 | 0.7422 | 0.927 | 0.8648 |

Part 4. Age- and Gender-adjusted Analyses in Caucasians

If the model is adjusted for gender and age, female is the reference level. The additive model can be re-written as:

\[NDRM.CH_i = \alpha + \beta \times SNP_{i, j} + \gamma \times [Gender_{i} == Male] +\eta \times Age_i + \epsilon_{i}. \\ j = 1, \ldots, 24; i = 1, \ldots, N; SNP_{i,j} = 0, 1, 2\]

Similarly, the model with \(df=2\) becomes: \[NDRM.CH_i = \alpha + \beta_1 \times [GENOTYPE_{i, j} == XY]+ \beta_2 \times [GENOTYPE_{i, j} == YY] + \gamma \times [Gender_{i} == Male] +\eta \times Age_i+ \epsilon_{i}.\\ j = 1, \ldots, 24; i = 1, \ldots, N; GENOTYPE_{i, j} = XX, XY, YY; Gender_{i} = Male, Female.\]
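
To make the adjusted additive model concrete, the sketch below fits \(\alpha\), \(\beta\), \(\gamma\) and \(\eta\) by ordinary least squares, solving the normal equations \(X^{T}X b = X^{T}y\) in plain Python. It is a minimal illustration under stated assumptions, not the procedure used for the homework: the data are synthetic and noise-free, the input has no missing values, and the names `solve_linear` and `fit_adjusted` are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_adjusted(y, snp, male, age):
    """OLS for y = alpha + beta*snp + gamma*male + eta*age."""
    # Design matrix columns: intercept, SNP code, Male indicator, Age.
    rows = [[1.0, s, g, a] for s, g, a in zip(snp, male, age)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Synthetic, noise-free data generated from known coefficients.
snp = [0, 1, 2, 0, 1, 2, 1, 0]
male = [0, 1, 0, 1, 0, 1, 1, 0]  # 1 = Male, 0 = Female (reference level)
age = [20, 25, 30, 35, 22, 28, 40, 19]
y = [5 + 2 * s - 3 * g + 0.5 * a for s, g, a in zip(snp, male, age)]

alpha, beta, gamma, eta = fit_adjusted(y, snp, male, age)
print(round(alpha, 6), round(beta, 6), round(gamma, 6), round(eta, 6))
```

Dropping the `male` or `age` column from the design matrix gives the age-only and gender-only models of Parts 2 and 3.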

In the table below, we still cannot identify any SNP that contributes significantly to the percentage change in non-dominant arm muscle strength in the age- and gender-adjusted linear regression within the Caucasian population at \(\alpha^* = 0.002\).

For SNP akt1_g2375a_g233a, the subject coded as decided is removed in every analysis.

| SNP | N | p-Value: Additive Model (df = 1) | p-Value: Allele 1 (2 df Test) | p-Value: Allele 2 (2 df Test) |
|---|---|---|---|---|
| akt1_t22932c | 716 | 0.6767 | 0.7453 | 0.843 |
| akt1_g15129a | 718 | 0.4111 | 0.9449 | 0.3434 |
| akt1_g14803t | 754 | 0.7086 | 0.2893 | 0.5374 |
| akt1_c10744t_c12886t | 715 | 0.9555 | 0.8126 | 0.5502 |
| akt1_t10726c_t12868c | 712 | 0.1621 | 0.246 | 0.388 |
| akt1_t10598a_t12740a | 715 | 0.873 | 0.5318 | 0.9966 |
| akt1_c9756a_c11898t | 719 | 0.6797 | 0.7087 | 0.7632 |
| akt1_t8407g | 717 | 0.8107 | 0.9069 | 0.9713 |
| akt1_a7699g | 718 | 0.4531 | 0.9646 | 0.8102 |
| akt1_c6148t_c8290t | 754 | 0.7455 | 0.8046 | 0.2609 |
| akt1_c6024t_c8166t | 719 | 0.778 | 0.7877 | 0.8428 |
| akt1_c5854t_c7996t | 754 | 0.6039 | 0.4712 | 0.4302 |
| akt1_c832g_c3359g | 754 | 0.4603 | 0.478 | 0.8102 |
| akt1_g288c | 754 | 0.9893 | 0.9382 | 0.971 |
| akt1_g1780a_g363a | 754 | 0.9233 | 0.9233 | NA |
| akt1_g2347t_g205t | 719 | 0.2695 | 0.3944 | 0.3524 |
| akt1_g2375a_g233a | 718 | 0.3996 | 0.3143 | 0.254 |
| akt1_g4362c | 717 | 0.5918 | 0.8537 | 0.9934 |
| akt1_c15676t | 754 | 0.9251 | 0.992 | 0.8604 |
| akt1_a15756t | 754 | 0.6537 | 0.4386 | 0.4792 |
| akt1_g20703a | 754 | 0.9013 | 0.5857 | 0.6024 |
| akt1_g22187a | 715 | 0.6975 | 0.604 | 0.8995 |
| akt1_a22889g | 754 | 0.809 | 0.4329 | 0.9058 |
| akt1_g23477a | 754 | 0.8035 | 0.7949 | 0.9759 |

Part 5. Other Comments

Even though none of the 24 akt1 SNPs is statistically significant after the Bonferroni correction, both Age and Gender are statistically significant at \(\alpha^* = 0.002\).

Taking the additive model (model 1 in Part 4) as an example, the p-values associated with Age and Gender (Male) show that both covariates contribute a significant amount of information about the percentage change in non-dominant arm muscle strength within the Caucasian population at \(\alpha^* = 0.002\). See the figure below for more details.