In this tutorial on polygenic risk score (PRS) analysis with
PRSice-2, our objective is to explore the genetic architecture
underlying Alzheimer’s disease, a complex trait. A PRS combines
information from many genetic variants, typically single nucleotide
polymorphisms (SNPs), to quantify an individual’s genetic predisposition
to a specific trait or disease. A PRS can therefore capture subtle
genetic influences that would be missed by examining individual variants
alone.
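As a rough illustration of the idea, a score can be thought of as a weighted sum of risk-allele counts, with the weights taken from GWAS effect sizes (for example, log odds ratios). The sketch below computes such a sum with awk on made-up numbers. The file names `toy_weights.txt` and `toy_dosages.txt` and all values are hypothetical, and PRSice-2 itself applies further steps (such as clumping and p-value thresholding) that are not reproduced here.

```shell
workdir=$(mktemp -d)

# Toy effect sizes: SNP ID and log(OR) weight (made-up values)
cat > "$workdir/toy_weights.txt" <<'EOF'
rs1 0.5
rs2 -0.25
rs3 1.0
EOF

# Toy risk-allele counts (0, 1 or 2) for a single individual (made-up values)
cat > "$workdir/toy_dosages.txt" <<'EOF'
rs1 2
rs2 1
rs3 0
EOF

# First file: store each weight by SNP ID.
# Second file: add weight * allele count for every SNP that has a weight.
prs=$(awk 'NR == FNR { w[$1] = $2; next }
           ($1 in w) { score += w[$1] * $2 }
           END { printf "%.2f\n", score }' \
      "$workdir/toy_weights.txt" "$workdir/toy_dosages.txt")
echo "toy PRS: $prs"
```

Here the individual carries two copies of the rs1 risk allele and one of rs2, so the toy score is 0.5 × 2 − 0.25 × 1 + 1.0 × 0.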
Data used in preparation of this article were obtained from the
Alzheimer’s Disease Neuroimaging Initiative (ADNI) database
(adni.loni.usc.edu). As such, the investigators within the ADNI
contributed to the design and implementation of ADNI and/or provided
data but did not participate in analysis or writing of this report. A
complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
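Before running PRSice-2, both the base and the target data must be quality controlled to the standards used in GWAS, e.g. removing SNPs with a low genotyping rate, a low minor allele frequency, or a deviation from Hardy-Weinberg equilibrium, and removing individuals with a low genotyping rate (see Marees et al.). The commands below keep the individuals listed in each --keep file and then filter SNPs on minor allele frequency (--maf 0.01) and Hardy-Weinberg equilibrium (--hwe 1e-6).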
plink \
--bfile chip1_2_QC \
--keep ADNI1_to_keep_Dx.txt \
--make-bed \
--out ADNI1_DX
plink \
--bfile ADNI1_DX \
--maf 0.01 \
--hwe 1e-6 \
--make-bed \
--out ADNI1_DX.QC
plink \
--bfile chip1_2_QC \
--update-ids updated_ADNI3_iids2.txt \
--keep ADNI3_Dx.txt \
--make-bed \
--out ADNI3_DX
plink \
--bfile ADNI3_DX \
--maf 0.01 \
--hwe 1e-6 \
--make-bed \
--out ADNI3_DX.QC
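After filtering, it can be useful to check how many variants each step removed. A PLINK `.bim` file has one line per variant, so counting lines before and after QC gives the number dropped. The sketch below demonstrates this on two made-up `.bim` files. The helper name `count_variants` and the toy prefixes are hypothetical; on real data the prefixes would be ones such as `ADNI1_DX` and `ADNI1_DX.QC`.

```shell
# Hypothetical helper: number of variants in a PLINK fileset (lines in <prefix>.bim)
count_variants() {
    wc -l < "$1.bim" | tr -d ' '
}

# Made-up .bim files standing in for pre- and post-QC data
# (columns: chromosome, variant ID, cM position, bp position, allele 1, allele 2)
qcdir=$(mktemp -d)
printf '1\trs1\t0\t1000\tA\tG\n1\trs2\t0\t2000\tC\tT\n1\trs3\t0\t3000\tG\tA\n' > "$qcdir/toy_raw.bim"
printf '1\trs1\t0\t1000\tA\tG\n1\trs3\t0\t3000\tG\tA\n' > "$qcdir/toy_qc.bim"

before=$(count_variants "$qcdir/toy_raw")
after=$(count_variants "$qcdir/toy_qc")
echo "variants before QC: $before, after QC: $after, removed: $((before - after))"
```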
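PRSice-2 computes PRS from GWAS summary statistics, weighting each variant by the strength of its association with the trait of interest in a base sample. Here, the summary statistics come from a logistic regression of the DX_bl phenotype on each SNP in the ADNI3 sample. Duplicate SNP IDs are then removed, and the result is passed to PRSice-2 as the base data, with ADNI1 as the target sample, to identify genetic risk factors associated with Alzheimer’s disease.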
plink --bfile ADNI3_DX.QC \
--pheno ADNI3_Dx.txt \
--pheno-name DX_bl \
--allow-no-sex \
--logistic \
--out my_logistic_regression_results_Dx
awk '!seen[$2]++' my_logistic_regression_results_Dx.assoc.logistic > my_unique_results.assoc.logistic
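The `awk '!seen[$2]++'` idiom keeps a line only the first time the value in its second column (the SNP ID in PLINK's `.assoc.logistic` output) is encountered. Because the header's second field is also unique, the header line is kept as well. A minimal sketch on a made-up file (`toy.assoc.logistic`, with invented values) shows the effect:

```shell
dedupdir=$(mktemp -d)

# Made-up association results in which the SNP ID rs2 appears twice
cat > "$dedupdir/toy.assoc.logistic" <<'EOF'
CHR SNP BP A1 TEST NMISS OR STAT P
1 rs1 1000 A ADD 100 1.20 1.50 0.13
1 rs2 2000 C ADD 100 0.90 -0.80 0.42
1 rs2 2000 C ADD 100 0.95 -0.40 0.69
1 rs3 3000 G ADD 100 1.50 2.60 0.009
EOF

# seen[$2]++ evaluates to 0 the first time an ID is seen, so !seen[$2]++ is true
# and the line is printed; later lines with the same ID are skipped.
awk '!seen[$2]++' "$dedupdir/toy.assoc.logistic" > "$dedupdir/toy_unique.assoc.logistic"

n_lines=$(wc -l < "$dedupdir/toy_unique.assoc.logistic" | tr -d ' ')
n_rs2=$(grep -c ' rs2 ' "$dedupdir/toy_unique.assoc.logistic")
echo "lines kept (including header): $n_lines; rows for rs2: $n_rs2"
```

Note that the first occurrence is the one retained, regardless of which duplicate has the smaller p-value.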
Rscript PRSice.R \
--prsice ./PRSice_mac \
--base my_unique_results.assoc.logistic \
--target ADNI1_DX.QC \
--pheno ADNI1_Dx.txt \
--thread 1 \
--stat OR \
--binary-target T \
--perm 10000
The above figure shows that the model using SNPs with a base-sample p-value of up to 0.041 achieves the highest predictive value in the target sample, with a p-value of 0.001. We can therefore conclude that many SNPs associated with Alzheimer’s disease in the base sample can be used to predict diagnosis in the target sample. PRSice-2 also offers many additional options for adjusting the risk score analysis, including adding covariates and additional principal components and adjusting clumping parameters, which will be explored in future tutorials.
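To make the threshold concrete: at a p-value threshold of 0.041, only base-sample SNPs with P at or below 0.041 contribute to the score. The sketch below counts such SNPs in a made-up summary-statistics file laid out like PLINK's `.assoc.logistic` output (P in column 9). The file name and values are hypothetical, and PRSice-2 additionally clumps SNPs before thresholding, so a raw count like this will generally not match the number of SNPs it reports on real data.

```shell
thrdir=$(mktemp -d)

# Made-up base summary statistics (invented values)
cat > "$thrdir/toy_base.assoc.logistic" <<'EOF'
CHR SNP BP A1 TEST NMISS OR STAT P
1 rs1 1000 A ADD 100 1.20 1.50 0.13
1 rs2 2000 C ADD 100 0.90 -2.10 0.036
1 rs3 3000 G ADD 100 1.50 2.60 0.009
1 rs4 4000 T ADD 100 1.05 0.30 0.76
EOF

threshold=0.041

# Skip the header (NR > 1) and count rows whose P value (column 9)
# is at or below the threshold; "+ 0" forces a numeric comparison.
n_pass=$(awk -v t="$threshold" 'NR > 1 && $9 + 0 <= t + 0 { n++ } END { print n + 0 }' \
         "$thrdir/toy_base.assoc.logistic")
echo "SNPs with P <= $threshold: $n_pass"
```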