3.0 Genotypes

3.0.1 raw genotype

Initially, there are 784,256 raw SNPs in total. The raw SNP data was filtered by MAF (0.05), HWE (1e-6), missing rate (0.05). Then the sparse missing genotypes were imputed using Beagle.

Then, within each sub-group (All, Brothers, Sisters, Bro-Sis pairs, Bro pairs + Sis pairs, Subset sisters), we further filtered SNPs by MAF (0.05) and HWE (1e-6) and the number of SNPs are as follows:

  • All: 350,794
  • Brothers: 348,309
  • Sisters: 349,113
  • Bro-Sis pairs: 349,390
  • Bro pairs + Sis pairs: 349,622
  • Subset sisters: 348,319

3.1 Population genetics for GRM

3.1.1 Unrelated GRM - \(m_e\)

\[ m_e=1/\text{var}(\text{GRM}_{\text{offdiag}}) \]

3.1.2 Sib GRM - genome length (Morgan)

The genome length measured in Morgan can be put in the following:

\[ m_{e.\text{IBS}}=\text{var}(\text{GRM}_{\text{sib}})=\frac{1}{16L} \]

3.1.3 IBS distribution across All, Brothers, Sisiters, Bro-Sis pairs, Bro pairs + Sis pairs

We can depict the distribution of IBS across differnt groups of families and calculate the \(\frac{1}{\text{var}(\text{GRM}_{\text{sib}})}\) using clean SNPs

Compared with Visscher et al. 2006, we can see that there is a good-fitting line between \(\text{var}(\text{IBS})\) and the length calculated in their paper:

3.1.4 IBS for each locus

The variance IBS score (\(g_i\)) for \(i^{th}\) locus between a pair sib is \(\text{var}(g_i)=\theta^2r_{ii}^2+r_{ii}^2\). We calculated \(\text{var}(g_i)\) across all loci with minor allel frequency greater than 0.05 and number of genotyped families greater than 500, and the density of \(\text{var}(g_i)\) can be shown below:

The pattern of \(\text{var}(g_i)\) across differnt MAF can be observed below (The red line is theoritical line):

Simulation LD

According to SimulationLD.pdf, I simulated a toy example with two loci:

set.seed(12345)
# Allele frequency of two loci 
f_A<-0.6
f_a<-1-f_A
f_B<-0.7
f_b<-1-f_B

# D and generation
D<-0.1
t<-1

# Sample size 100
size<-100

# First locus sampled based on allele frequency
gen1<-sample(0:1,size*2,replace = T,prob=c(f_a,f_A))

# Second locus 
gen2<-numeric(size*2)

for(i in 1:length(gen2)){
  if(gen1[i]==1){
    gen2[i]<-sample(0:1,1,prob=c((f_A*f_b-D^t)/f_A,(f_A*f_B+D^t)/f_A))
  }else{
    gen2[i]<-sample(0:1,1,prob=c((f_a*f_b+D^t)/f_a,(f_a*f_B-D^t)/f_a))
  }
}

# Re-estimate allele frequencies and D from samples
p_A<-sum(gen1)/(2*size)
p_B<-sum(gen2)/(2*size)
D_hat<-sum(gen1==1 & gen2==1)/(2*size) - p_A*p_B

cat(" Real allele frequency of A is ", f_A, "; Estimated allele frequency of A is ", p_A, "\n",
    "Real allele frequency of B is ", f_B, "; Estimated allele frequency of A is ", p_B, "\n",
"Real D is ", D, "; Estimated D is ", D_hat, "\n"
)
##  Real allele frequency of A is  0.6 ; Estimated allele frequency of A is  0.51 
##  Real allele frequency of B is  0.7 ; Estimated allele frequency of A is  0.745 
##  Real D is  0.1 ; Estimated D is  0.09005

3.1.5 IBD correlation for sibs

In contrast, for IBD for a pair of sib is \((1-2c_{ij})^2\), in which \(c_{ij}\) is the recombination faction between a pair of loci \(i\) and \(j\).

3.2 Estimation for heritability (\(h^2_{\text{SNP}}\))

3.2.1 unrelated \(h^2\) - MOM

This part was calculated by Guoan using 280K individuals across 81 quantitative traits.

3.2.2 Sib \(h^2\) - IBS-linkage

Two scenarios:

  • SingleSib: randomly selected one individual from each family and calculate \(h^2\) using HE-reg method implemented in GCTA across 81 quantitative traits and 251 metabolism traits

  • Sib: Calculate \(h^2\) (cross-product or squared difference) using HE-reg based on family IBS across 81 quantitative traits and 251 metabolism traits

Results

  • Sib vs Unrelated

Square of difference:

Corss-product:

  • Sib vs SingleSib

Square of difference:

Corss-product:

  • SingleSib vs Unrelated

3.3 GWAS

3.3.1 Unrelated GWAS 280K unrelated UKB

\(y_{\text{self}}=a+bx+e\)

NCP and smpling variance, statistical power.

3.3.2 Single-sib GWAS, 18K unrelated UKB

\(y_{\text{self}}=a+bx+e\)

3.3.3 Alternative sib GWAS, 18K

\(y_{\text{sib}}=a+bx+e\), here \(y_{\text{sib}}\) means the phenotype of the other sib.

We can now check the results between Single-sib and Alternative sib:

Height

BMI

AgeDiabetes

4 Discrepancy between GWAS and linkage (IBS-linkage)

I have completed the comparison between IBS GWAS based on square difference and single-sib GWAS:

  • Height

  • BMI

  • AgeDiabetes

between IBS GWAS based on cross product and single-sib GWAS:

  • Height

  • BMI

  • AgeDiabetes