Table of contents

## [1] "2021-03-05 21:56:52 CST"

1. Original publication & tools

It is an linear model implementation for genomic scan for loci under selection

Related publications

1 EigenGWAS: finding loci under selection through genome-wide association studies of eigenvectors in structured populations. Heredity, 2016, 117:51-61

2 Identifying loci with breeding potential across temperate and tropical adaptation via EigenGWAS and EnvGWAS. Mol Ecology, 2019, 28:3544-60

Java tools for EigenGWAS analysis. GEAR

2. Discrete subgroups

A pair of subgroups

Step 1 Set the genetic backgroud differentiation between a pair of subgroups as \(F_{st}=f\), in which \(f\) quantifies the intensity of genetic drift.

Step 2 Simulating ancestral population that has the frequencies vector \(P\) for \(M\) loci.

Step 3 Simulating the diversed frequencies for two subpopulations \(Frq_1=beta(\alpha=\frac{P(1-f)}{f}, \beta=\frac{(1-P)(1-f)}{f})\), \(Frq_2=beta(\alpha=\frac{P(1-f)}{f}, \beta=\frac{(1-P)(1-f)}{f})\),

Under \(beta\) distribution, it will have mean of the two population \(\frac{\alpha}{\alpha+\beta}\), and variance \(\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}\). It is easy to see that for the simulated \(Frq_1\) and \(Frq_2\), the averaged allele frequency is

\[\frac{\alpha}{\alpha+\beta}=\frac{\frac{P(1-f)}{f}}{\frac{P(1-f)}{f}+\frac{(1-P)(1-f)}{f}}=P\]

identical to the ancestral population, and the variance is

\[\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}=\frac{\frac{P(1-f)}{f}\frac{(1-P)(1-f)}{f}}{[\frac{(1-f)}{f}]^2[\frac{1}{f}]}=P(1-P)f\]

It is known that under genetic drift the allele frequency randomly runs towards fixation, either 0 or 1, given its intial allele frequency (See binomail Tab for its demonstration). The speed of changing can be written as \(\sigma=\sqrt{p(1-p)/n}\), proportional to the sample size (effective sample size).

See more statistical properties of Beta distribution wiki.

Simulating LD

3. Admixted population

Admixture Dirichlet distribution

Dirichlet distribution wiki

\[D(\alpha_1, \alpha_2, ...\alpha_K)\] \(E(\alpha_i)=\frac{\alpha_i}{\sum_1^K\alpha_i}\), and \[var(\alpha_i)=\frac{\tilde{\alpha}_i(1-\tilde{\alpha}_i)}{\alpha_0+1}\] \[cov(\alpha_i, \alpha_j)=\frac{-\tilde{\alpha}_i\tilde{\alpha}_j}{\alpha_0+1}\]

in which \(\tilde{\alpha}_i=\frac{\alpha_i}{\sum_1^K\alpha_i}\) and \(\alpha_0=\sum_1^K\alpha_i\).

## Loading required package: coda
## Loading required package: MASS
## ##
## ## Markov Chain Monte Carlo Package (MCMCpack)
## ## Copyright (C) 2003-2021 Andrew D. Martin, Kevin M. Quinn, and Jong Hee Park
## ##
## ## Support provided by the U.S. National Science Foundation
## ## (Grants SES-0350646 and SES-0350613)
## ##

Six pops

M=c(10000)
fst1=0.05
fst2=0.01
for(m in 1:length(M)) {
  P=runif(M[m], 0.2, 0.8)
  Sz=c(100, 100, 100, 100, 100, 100)
  Fq=matrix(0, length(Sz), M[m])
  Fq[1,]=rbeta(M[m], P*(1-fst1)/fst1, (1-P)*(1-fst1)/fst1)

  P3=rbeta(M[m], P*(1-fst1)/fst1, (1-P)*(1-fst1)/fst1)

  P2=(0.5*Fq[1,]+0.5*P3)

  Fq[3,]=rbeta(M[m], P2*(1-fst2)/fst2, (1-P2)*(1-fst2)/fst2)
  Fq[4,]=rbeta(M[m], P2*(1-fst2)/fst2, (1-P2)*(1-fst2)/fst2)
  
  Fq[2,]=rbeta(M[m], P3*(1-fst2)/fst2, (1-P3)*(1-fst2)/fst2)
  Fq[5,]=rbeta(M[m], P3*(1-fst2)/fst2, (1-P3)*(1-fst2)/fst2)
  Fq[6,]=rbeta(M[m], P3*(1-fst2)/fst2, (1-P3)*(1-fst2)/fst2)

  G=matrix(0, sum(Sz), M[m])
  for(i in 1:Sz[1]) {
    G[i,] = rbinom(M[m], 2, Fq[1,])
  }
  
  for(i in (Sz[1]+1):(Sz[1]+Sz[2])) {
    G[i,] = rbinom(M[m], 2, Fq[2,])
  }
  
  for(i in (Sz[1]+Sz[2]+1):sum(Sz)) {
    G[i,] = rbinom(M[m], 2, Fq[3,])
  }

  for(i in (Sz[1]+Sz[2]+Sz[3]+1):sum(Sz)) {
    G[i,] = rbinom(M[m], 2, Fq[4,])
  }

  for(i in (Sz[1]+Sz[2]+Sz[3]+Sz[4]+1):sum(Sz)) {
    G[i,] = rbinom(M[m], 2, Fq[5,])
  }

  for(i in (Sz[1]+Sz[2]+Sz[3]+Sz[4]+Sz[5]+1):sum(Sz)) {
    G[i,] = rbinom(M[m], 2, Fq[6,])
  }

  Gs=apply(G, 2, scale)
  GG=Gs %*% t(Gs)
  EigenG=eigen(GG)
  layout(matrix(1:2, 1, 2))
  barplot(EigenG$values[1:20])
  plot(EigenG$vectors[,1], EigenG$vectors[,2])
  layout(matrix(1:6, 2, 3, byrow=T))
  plot(xlim=range(EigenG$vectors[,1]), ylim=range(EigenG$vectors[,2]), main=paste(M[m], "markers"), EigenG$vectors[1:100,1], EigenG$vectors[1:100,2], bty='n', xlab="PC 1", ylab="PC 2", pch=16)
  plot(xlim=range(EigenG$vectors[,1]), ylim=range(EigenG$vectors[,2]), main=paste(M[m], "markers"), EigenG$vectors[101:200,1], EigenG$vectors[101:200,2], bty='n', xlab="PC 1", ylab="PC 2", pch=16)
  plot(xlim=range(EigenG$vectors[,1]), ylim=range(EigenG$vectors[,2]), main=paste(M[m], "markers"), EigenG$vectors[201:300,1], EigenG$vectors[201:300,2], bty='n', xlab="PC 1", ylab="PC 2", pch=16)
  plot(xlim=range(EigenG$vectors[,1]), ylim=range(EigenG$vectors[,2]), main=paste(M[m], "markers"), EigenG$vectors[301:400,1], EigenG$vectors[301:400,2], bty='n', xlab="PC 1", ylab="PC 2", pch=16)
  plot(xlim=range(EigenG$vectors[,1]), ylim=range(EigenG$vectors[,2]), main=paste(M[m], "markers"), EigenG$vectors[401:500,1], EigenG$vectors[401:500,2], bty='n', xlab="PC 1", ylab="PC 2", pch=16)
  plot(xlim=range(EigenG$vectors[,1]), ylim=range(EigenG$vectors[,2]), main=paste(M[m], "markers"), EigenG$vectors[501:600,1], EigenG$vectors[501:600,2], bty='n', xlab="PC 1", ylab="PC 2", pch=16)
}

4 PCA methods

5 Homo and hetro

Heterogeneous cohort

rep=1
EV=matrix(0, rep, 10)
typeI=matrix(0, rep, 4)
fst=c(0.002, 0.01)
N1=c(100, 500, 1000)
N2=1000
Ml=c(8000, 2000)
M=sum(Ml)
ME=matrix(0, length(N1), 3)
for(s in 1:length(N1)) {
  for(r in 1:rep) {
    P=runif(M, 0.2, 0.8)
    p1=runif(Ml[1], 0.2, 0.8)
    p2=runif(Ml[2], 0.2, 0.8)
    P=c(p1, p2)
    P1=c(rbeta(Ml[1], p1*(1-fst[1])/fst[1], (1-p1)*(1-fst[1])/fst[1]),
         rbeta(Ml[2], p2*(1-fst[2])/fst[2], (1-p2)*(1-fst[2])/fst[2]))
    P2=c(rbeta(Ml[1], p1*(1-fst[1])/fst[1], (1-p1)*(1-fst[1])/fst[1]),
         rbeta(Ml[2], p2*(1-fst[2])/fst[2], (1-p2)*(1-fst[2])/fst[2]))

    Gn=matrix(0, nrow=N1[s]+N2, ncol=M)
    for(i in 1:N1[s]) {
      Gn[i,] = rbinom(M, 2, P1)
    }

    for(i in (N1[s]+1):(N2+N1[s])) {
      Gn[i,] = rbinom(M, 2, P2)
    }
    Frq1=apply(Gn[1:N1[s],], 2, mean)/2
    Frq2=apply(Gn[(N1[s]+1):(N1[s]+N2),], 2, mean)/2
    FrqM=apply(Gn, 2, mean)/2
    Fst=2*(N1[s]/(N1[s]+N2)*(Frq1-FrqM)^2 + N2/(N1[s]+N2) * (Frq2-FrqM)^2)/(FrqM*(1-FrqM))
    FstN=(Frq1-Frq2)^2/(2*FrqM*(1-FrqM))
    plot(Fst, FstN)
  }

  GnS=apply(Gn, 2, scale)
  G=GnS %*% t(GnS)/M
  EigenGN=eigen(G)
  
  RegB=matrix(0, M, 4)
  for(i in 1:M) {
    mod=lm(EigenGN$vectors[,1]~Gn[,i])
    RegB[i,1]=summary(mod)$coefficients[2,1]
    RegB[i,2]=summary(mod)$coefficients[2,2]
  }
  RegB[,3]=RegB[,1]^2/RegB[,2]^2
  qqplot(rchisq(M, 1), RegB[,3], bty='n', pch=16)
  abline(a=0, b=1, col="red")
  gc=median(RegB[,3])/0.455
  abline(a=0, b=gc, col="blue")
  
  ME[s,1]=mean(Fst)*(N1[s]+N2)
  ME[s,2]=gc
  ME[s,3]=EigenGN$values[1]
}
rownames(ME)=N1
barplot(t(ME), beside = T, border = F)
legend("topleft", legend = c("Fst", "GC", "Eigenvalue"), pch=15, col=c("black", "grey50", "grey"), bty='n')

6 Power analysis

Noncentrality parameter for \(\chi^2_1\)

\(\chi^2_1\) with NCP, wiki.

NCP for additive model

For linear mode \(\tilde{E}_k=\mu+\beta_lx_l+e\), in which \(\tilde{E}_k\) is the \(k^{th}\) standardized eigenvector of interest, \(x_l\) codes for the additive effect, and \(\beta_l\) is the regression coefficient, and \(e\) is the residual of the model. The regression coefficient can be estimated as \[ \hat{\beta}_l=\frac{cov(x_l, \tilde{E}_k)}{var(x_l)} \]

\(x_l\) Marginal probability
\(aa\) \(Aa\) \(AA\)
Subgroup 1 \(\tilde{E}_k\) 2 1 0
\(\color{cyan}{\frac{\sqrt{\frac{n_2}{n_1}}}{1+\omega_1\sigma^2_1+\omega_2\sigma^2_2}}\) \(\color{cyan}{p^2_1}\) \(\color{cyan}{2p_1q_1}\) \(\color{cyan}{q^2_1}\) \(1-\gamma_1\) \(\color{cyan}{\omega_1=\frac{n_1}{n_1+n_2}}\)
\(\color{pink}{p^2_2}\) \(\color{pink}{2p_2q_2}\) \(\color{pink}{q^2_2}\) \(\gamma_2\)
Subgroup 2 2 1 0
\(\color{pink}{\frac{-\sqrt{\frac{n_1}{n_2}}}{1+\omega_1\sigma^2_1+\omega_2\sigma^2_2}}\) \(\color{pink}{p^2_2}\) \(\color{pink}{2p_2q_2}\) \(\color{pink}{q^2_2}\) \(1-\gamma_2\) \(\color{pink}{\omega_1=\frac{n_1}{n_1+n_2}}\)
\(\color{cyan}{p^2_1}\) \(\color{cyan}{2p_1q_1}\) \(\color{cyan}{q^2_1}\) \(\gamma_1\)

\[cov(\tilde{E}_k,x_l)=E(\tilde{E}_kx_l)-E(\tilde{E}_k)E(x_l)=E(\tilde{E}_kx_l)\]

And, \[ \begin{align} E(\tilde{E}_kx_l)&=&\color{cyan}{\frac{\sqrt{\frac{n_2}{n_1}}}{1+\omega_1\sigma^2_1+\omega_2\sigma^2_2}}[\omega_1(2p^2_1+2p_1q_1)(1-\gamma_1)+\omega_2(2p^2_2+2p_2q_2)\gamma_2]\\ &&- \color{pink}{\frac{\sqrt{\frac{n_1}{n_2}}}{1+\omega_1\sigma^2_1+\omega_2\sigma^2_2}}[\omega_2(2p^2_2+2p_2q_2)(1-\gamma_2)+\omega_2(2p^2_1+2p_2q_1)\gamma_1]\\ &=&\color{cyan} { \frac{2\sqrt{\frac{n_2}{n_1}}}{1+\omega_1\sigma^2_1+\omega_2\sigma^2_2} }[\omega_1 \color{cyan}{p_1}(1-\gamma_1)+\omega_2 \color{cyan}{p_2}\gamma_2] -\color{pink} { \frac{2\sqrt{\frac{n_1}{n_2}}}{1+\omega_1\sigma^2_1+\omega_2\sigma^2_2} }[\omega_2 \color{pink}{p_2}(1-\gamma_2)+\omega_1 \color{pink}{p_1}\gamma_1] \end{align} \]

Under the condition that \(\gamma_1= 0\) and $ _2= 0$, and the sampling variance \(\sigma^2_.\) further shrinks to zero, the NCP becomes \[2\sqrt{\frac{n_1}{N}\frac{n_2}{N}}(p_1-p_2)\]

\(x_l\) Marginal probability
\(aa\) \(Aa\) \(AA\)
Subgroup 1 \(\tilde{E}_k\) 2 1 0
\(\color{cyan}{\sqrt{\frac{n_2}{n_1}}}\) \(\color{cyan}{p^2_1}\) \(\color{cyan}{2p_1q_1}\) \(\color{cyan}{q^2_1}\) \(\color{cyan}{\omega_1=\frac{n_1}{n_1+n_2}}\)
Subgroup 2 2 1 0
\(\color{pink}{-\sqrt{\frac{n_2}{n_1}}}\) \(\color{pink}{p^2_2}\) \(\color{pink}{2p_2q_2}\) \(\color{pink}{q^2_2}\) \(\color{pink}{\omega_1=\frac{n_1}{n_1+n_2}}\)

\[ E(\tilde{E}_kx_l)=\omega_1\color{cyan}{\sqrt{\frac{n_2}{n_1}}}[(2p^2_1+2p_1q_1)]- \omega_2\color{pink}{\sqrt{\frac{n_1}{n_2}}}[(2p^2_2+2p_2q_2)] =2\sqrt{\frac{n_1}{N}\frac{n_2}{N}}(p_1-p_2) \] We have \[E(\beta_l)\approx\frac{2\sqrt{\omega_1\omega_2} (p_1-p_2)}{2p_lq_l}\]

The sampling variance of \(\beta_l\) is \(\sigma_{\beta_l}=\sqrt{\frac{\sigma^2_{E_k}-\beta_l^2\sigma^2_{x_l}}{(N-1)\sigma^2_{x_l}}}\), so the z-statistic is \[z \approx \sqrt{(N-1)\omega_1\omega_2}\frac{2(p_1-p_2)}{\sqrt{2p_lq_l}}\]

and \(z^2\sim\chi^2_1\), with NCP of \((N-1)\omega_1\omega_2\frac{4(p_1-p_2)^2}{2p_lq_l}\).

NCP for Inbred line

\(x_l\) Marginal probability
\(aa\) \(AA\)
Subgroup 1 \(\tilde{E}_k\) 2 0
\(\color{cyan}{\sqrt{\frac{n_2}{n_1}}}\) \(\color{cyan}{p^2_1}\) \(\color{cyan}{q^2_1}\) \(\color{cyan}{\omega_1=\frac{n_1}{n_1+n_2}}\)
Subgroup 2 2 0
\(\color{pink}{-\sqrt{\frac{n_1}{n_2}}}\) \(\color{pink}{p^2_2}\) \(\color{pink}{q^2_2}\) \(\color{pink}{\omega_1=\frac{n_1}{n_1+n_2}}\)

\[ E(\tilde{E}_kx)=-\omega_1\sqrt{\frac{n_2}{n_1}}2p_1+\omega_2\sqrt{\frac{n_2}{n_1}}2p_2 =2\frac{\sqrt{n_1n_2}}{N}(p_1-p_2) \]

\[ E(\beta_l)=\frac{E(\tilde{E}_kx_l)}{4p_lq_l}=\frac{2\frac{\sqrt{n_1n_2}}{N}(p_1-p_2)}{4p_lq_l} \] \[ \sigma_{\beta_l}=\sqrt{\frac{\sigma^2_{E_k}-\beta_l^2\sigma^2_{x_l}}{(n-1)\sigma^2_{x_l}}} \]

\(z\approx\sqrt{(n-1)\omega_1\omega_2}\frac{2(p_1-p_2)}{4p_lq_l}\), and its NCP is \((n-1)\omega_1\omega_2\frac{(p_1-p_2)^2}{p_lq_l}\), which is half of that for a random mating population.

NCP for dominance model

For linear mode \(\tilde{E}_k=\mu+\beta_dx_d+e\), here we drop off the subscript \(l\) for the target locus.

\(x_d\) Marginal probability
\(aa\) \(Aa\) \(AA\)
Subgroup 1 \(\tilde{E}_k\) 0 1 0
\(\color{cyan}{\sqrt{\frac{n_2}{n_1}}}\) \(\color{cyan}{p^2_1}\) \(\color{cyan}{2p_1q_1}\) \(\color{cyan}{q^2_1}\) \(\color{cyan}{\omega_1=\frac{n_1}{n_1+n_2}}\)
Subgroup 2 0 1 0
\(\color{pink}{-\sqrt{\frac{n_1}{n_2}}}\) \(\color{pink}{p^2_2}\) \(\color{pink}{2p_2q_2}\) \(\color{pink}{q^2_2}\) \(\color{pink}{\omega_1=\frac{n_1}{n_1+n_2}}\)

\[ E(\tilde{E}_kx_d)=\omega_1\sqrt{\frac{n_2}{n_1}}2p_1q_1-\omega_2\sqrt{\frac{n_1}{n_2}}2p_2q_2=\sqrt{\frac{n_1}{N}\frac{n_2}{N}}(2p_1q_1-2p_2q_2) \] We have \(E(x_d)=\omega_1\), and \(var(x_d)=f_2(1-f_2)\), in which \(f_2=\omega_12p_1q_1+\omega_22p_2q_2\), and \[E(\beta_d)=\frac{\sqrt{\omega_1\omega_2}(2p_1q_1-2p_2q_2)}{f_2(1-f_2)}\]

The sampling variance for \(\beta_d\) is \(\sigma_{\beta_d}=\sqrt{\frac{\sigma^2_{E_k}-\beta_d^2\sigma_{x_k}^2}{(n-1)\sigma^2_{x_k}}}\).

So, the z-score is \[z\approx \frac{\frac{\sqrt{\omega_1\omega_2(2p_1q_1-2p_2q_2)}}{f_2(1-f_2)}}{\sqrt{\frac{\sigma^2_{E_k}-\beta^2_d\sigma^2_{x_d}}{(n-1)\sigma^2_{x_d}}}}=\sqrt{(n-1)\omega_1\omega_2}\frac{2p_1q_1-2p_2q_2}{\sqrt{f_2(1-f_2)}}\] and \(z^2\sim\chi^2_1\), with NCP of \((n-1)\omega_1\omega_2\frac{(2p_1q_1-2p_2q_2)^2}{f_2(1-f_2)}\). In comparsion, for the additive effect, the NCP is \(4n\omega_1\omega_2\frac{(p_{l|1}-p_{l|2})^2}{2p_lq_2}\).

A realized power calculation for EigenGWAS

EigenGWAS power analysis Shiny.

Related parameters:

\(p_1\) and \(p_2\) are the allele frequencies for the reference allele at two sub groups.

\(n_1\) and \(n_2\) are sample sizes for two subgroups, respectively. \(n=n_1+n_2\) the total sample size.

\(\omega_1=\frac{n_1}{n_1+n_2}\), and \(\omega_2=\frac{n_2}{n_1+n_2}\).

\(p=\omega_1p_1+\omega_2p_2\)

R validation

7 EigenGWAS pipeline

Random mating population (CEU vs TSI)

Replace the plink with your own path. Using HapMap CEU (northwest European) vs TSI (south Europeans) as example. The data can be found here.

library(qqman)
## 
## For example usage please run: vignette('qqman')
## 
## Citation appreciated but not required:
## Turner, S.D. qqman: an R package for visualizing GWAS results using Q-Q and manhattan plots. biorXiv DOI: 10.1101/005165 (2014).
## 
plink2='/Users/gc5k/bin/plink_mac/plink' #replace it with your own path
dat="./data/euro/euro_10K"
layout(matrix(1:6, 2, 3))

#make-grm
grmCmd=paste(plink2, "--bfile ", dat, "--make-grm-gz --out ", dat)
system(grmCmd)

gz=gzfile(paste0(dat, ".grm.gz"))
grm=read.table(gz, as.is = T)
Ne=-1/mean(grm[grm[,1]!=grm[,2], 4])
Me=1/var(grm[grm[,1]!=grm[,2], 4])
print(paste("Ne=", format(Ne, digits = 2), "Me=", format(Me, digits = 2)))
## [1] "Ne= 199 Me= 18440"
hist(grm[grm[,1]!=grm[,2],4], main="Pairwise relatedness", xlab="Relatedness score", breaks = 50)

#pca
pcaCmd=paste(plink2, "--bfile ", dat, "--pca 10 --out ", dat)
system(pcaCmd)
barplot(main="Top 10 eigenvalue", read.table(paste0(dat, ".eigenval"), as.is = T)[,1], border = F)
abline(h=1, col="red", lty=2, lwd=3)

pc=read.table(paste0(dat, ".eigenvec"), as.is = T)
plot(pc[,3], pc[,4], xlab="Eigenvector 1", ylab="Eigenvector 2", bty="n", main="Eigenspace", bty="n", col=ifelse(pc[,3]>0, "red", "blue"), pch=16, cex=0.5)

#EigenGWAS
liCmd=paste0(plink2, " --linear --bfile ", dat, " --pheno ", dat, ".eigenvec --out ", dat)
system(liCmd)

#plot
EigenRes=read.table(paste0(dat, ".assoc.linear"), as.is = T, header = T)
EigenRes$Praw=EigenRes$P
gc=qchisq(median(EigenRes$P), 1, lower.tail = F)/qchisq(0.5, 1, lower.tail = F)
print(paste("GC = ", format(gc, digits = 4)))
## [1] "GC =  1.701"
EigenRes$P=pchisq(qchisq(EigenRes$Praw, 1, lower.tail = F)/gc, 1, lower.tail = F)
manhattan(EigenRes, main="EigenGWAS 1", p="P", cex=0.3, bty='n')

#QQplot
chiseq=rchisq(nrow(EigenRes), 1)
qqplot(chiseq, qchisq(EigenRes$Praw, 1, lower.tail = F), xlab=expression(paste("Theoretical ", chi[1]^2)), ylab=expression(paste("Observed ", chi[1]^2)), bty="n", col="grey", pch=16, cex=0.5)
points(sort(chiseq), sort(qchisq(EigenRes$P, 1, lower.tail = F)), col="black", pch=16, cex=0.5)
legend("topleft", legend = c("Raw", "GC correction"), pch=16, cex=0.5, col=c("grey", "black"), bty='n')
abline(a=0, b=1, col="red", lty=2)

Inbred population (Arabdopsis)

Replace the plink with your own path. The Arabidopsis data can be found here.

library(qqman)
plink2='/Users/gc5k/bin/plink_mac/plink' #replace it with your own path
dat="./data/arab/arab"

layout(matrix(1:6, 2, 3))
#make-grm
grmCmd=paste(plink2, "--bfile ", dat, "--make-grm-gz --out ", dat)
system(grmCmd)

gz=gzfile(paste0(dat, ".grm.gz"))
grm=read.table(gz, as.is = T)
Ne=-1/mean(grm[grm[,1]!=grm[,2], 4]/2)
Me=1/var(grm[grm[,1]!=grm[,2], 4]/2)
print(paste("Ne=", format(Ne, digits = 2), "Me=", format(Me, digits = 2)))
## [1] "Ne= 293 Me= 395"
hist(grm[grm[,1]!=grm[,2],4]/2, main="Pairwise relatedness", xlab="Relatedness score", breaks = 50)

#pca
pcaCmd=paste(plink2, "--bfile ", dat, "--pca 10 --out ", dat)
system(pcaCmd)
barplot(main="Top 10 eigenvalue", read.table(paste0(dat, ".eigenval"), as.is = T)[,1]/2, border = F)
abline(h=1, col="red", lty=2, lwd=3)

pc=read.table(paste0(dat, ".eigenvec"), as.is = T)
plot(pc[,3], pc[,4], xlab="Eigenvector 1", ylab="Eigenvector 2", bty="n", main="Eigenspace", bty="n", col=ifelse(pc[,3]>0, "red", "blue"), pch=16, cex=0.5)

#EigenGWAS
liCmd=paste0(plink2, " --linear --bfile ", dat, " --pheno ", dat, ".eigenvec --out ", dat)
system(liCmd)

#plot
EigenRes=read.table(paste0(dat, ".assoc.linear"), as.is = T, header = T)
EigenRes$Praw=EigenRes$P
gc=qchisq(median(EigenRes$P), 1, lower.tail = F)/qchisq(0.5, 1, lower.tail = F)
print(paste("GC = ", format(gc, digits = 4)))
## [1] "GC =  9.047"
EigenRes$P=pchisq(qchisq(EigenRes$Praw, 1, lower.tail = F)/gc, 1, lower.tail = F)
manhattan(EigenRes, main="EigenGWAS 1", cex=0.3, bty='n')

#QQplot
chiseq=rchisq(nrow(EigenRes), 1)
qqplot(chiseq, qchisq(EigenRes$Praw, 1, lower.tail = F), xlab=expression(paste("Theoretical ", chi[1]^2)), ylab=expression(paste("Observed ", chi[1]^2)), bty="n", col="grey", pch=16, cex=0.5)
points(sort(chiseq), sort(qchisq(EigenRes$P, 1, lower.tail = F)), col="black", pch=16, cex=0.5)
legend("topleft", legend = c("Raw", "GC correction"), pch=16, cex=0.5, col=c("grey", "black"), bty='n')
abline(a=0, b=1, col="red", lty=2)

Bayes-P

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in pchisq(arabVec$Chi, 1, ncp = arabVec$Fst * 295, lower.tail = F): full
## precision may not have been achieved in 'pnchisq'

## Warning in locfdr(zzp, nulltype = 1, df = 50, bre = 100, plot = 1, main =
## "arabVec$P"): f(z) misfit = 16. Rerun with increased df

R.sys

## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS  10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] locfdr_1.1-8      RMTstat_0.3       qqman_0.1.4       MCMCpack_1.4-9   
## [5] MASS_7.3-53       coda_0.19-4       Compositional_4.1 Rcpp_1.0.5       
## 
## loaded via a namespace (and not attached):
##  [1] emplik_1.1-1        Rfast2_0.0.6        xfun_0.18          
##  [4] listenv_0.8.0       splines_3.6.2       lattice_0.20-41    
##  [7] htmltools_0.5.0     stats4_3.6.2        yaml_2.2.1         
## [10] rlang_0.4.8         sn_1.6-2            RcppZiggurat_0.1.5 
## [13] calibrate_1.7.7     matrixStats_0.57.0  foreach_1.5.1      
## [16] mda_0.5-2           stringr_1.4.0       MatrixModels_0.4-1 
## [19] future_1.19.1       codetools_0.2-16    evaluate_0.14      
## [22] knitr_1.30          SparseM_1.78        doParallel_1.0.16  
## [25] mixture_1.5         quantreg_5.74       parallel_3.6.2     
## [28] class_7.3-17        Rfast_2.0.1         codalm_0.1.0       
## [31] conquer_1.0.2       tmvnsim_1.0-2       mcmc_0.9-7         
## [34] FlexDir_1.0         mnormt_2.0.2        RANN_2.6.1         
## [37] digest_0.6.26       stringi_1.5.3       numDeriv_2016.8-1.1
## [40] grid_3.6.2          tools_3.6.2         magrittr_1.5       
## [43] future.apply_1.6.0  Matrix_1.2-18       SQUAREM_2020.4     
## [46] rmarkdown_2.4       iterators_1.0.13    globals_0.13.1     
## [49] nnet_7.3-14         compiler_3.6.2