Investigate the genetic association between blood plasma proteins and Alzheimer’s Disease (AD) to inform disease prediction and provide signals for functional research
PENDING ACTUAL GRAPHIC
71 candidate proteins were identified from a literature review of 33 papers.
Proteins were marked as candidates if they were significantly associated with an AD-related phenotype in an independent discovery study (univariate or multivariate), or if they had been replicated in 3 or more studies in Kiddle et al.’s comprehensive 2014 review.
Of these, the 25 proteins with GWAS data available from Sun et al. were selected for further analysis.
| Protein | SOMAmer ID | Protein short code | UniProt ID | Studies with a significant association with an AD phenotype (n) |
|---|---|---|---|---|
| Apolipoprotein E | APOE.2937.10.2 | APOE | P02649 | 7 |
| Pancreatic prohormone | PPY.4588.1.2 | PPY | P01298 | 7 |
| Complement C3 | C3.2755.8.2 | C3 | P01024 | 6 |
| Complement factor H | CFH.4159.130.1 | CFH | P08603 | 6 |
| Clusterin | CLU.4542.24.2 | CLU | P10909 | 5 |
| Plasma protease C1 inhibitor | SERPING1.4479.14.2 | SERPING1 | P05155 | 5 |
| Serum amyloid P component | APCS.2474.54.5 | APCS | P02743 | 5 |
| Insulin-like growth factor-binding protein 2 | IGFBP2.2570.72.5 | IGFBP2 | P18065 | 4 |
| Interleukin-10 | IL10.2773.50.2 | IL10 | P22301 | 4 |
| Interleukin-3 | IL3.4717.55.2 | IL3 | P08700 | 4 |
| Vitronectin | VTN.13125.45.3 | VTN | P04004 | 4 |
| Fibronectin | FN1.3435.53.2 | FN1 | P02751 | 3 |
| Granulocyte colony-stimulating factor | CSF3.8952.65.3 | CSF3 | P09919 | 3 |
| Haptoglobin | HP.3054.3.2 | HP | P00738 | 3 |
| Complement C4 A/B | C4A.C4B.4481.34.2 | C4A | P0C0L4,P0C0L5 | 2 |
| Complement component C6 | C6.4127.75.1 | C6 | P13671 | 2 |
| Alpha-2-HS-glycoprotein | AHSG.3581.53.3 | AHSG | P02765 | 1 |
| Amyloid beta A4 protein | APP.3171.57.2 | APP | P05067 | 1 |
| Amyloid-beta A4 precursor protein-binding family B member 3 | APBB3.13589.10.3 | APBB3 | O95704 | 1 |
| Brain-derived neurotrophic factor | BDNF.2421.7.3 | BDNF | P23560 | 1 |
| Fibrinogen gamma chain | FGA.FGB.FGG.4907.56.1 | FGA | P02671,P02675,P02679 | 1 |
| Fibulin-1 | FBLN1.6470.19.3 | FBLN1 | P23142 | 1 |
| Inter-alpha-trypsin inhibitor heavy chain H1 | ITIH1.7955.195.3 | ITIH1 | P19827 | 1 |
| Prostate-specific antigen | KLK3.8468.19.3 | KLK3 | P07288 | 1 |
| Receptor tyrosine-protein kinase erbB-2 | ERBB2.2616.23.18 | ERBB2 | P04626 | 1 |
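The candidate selection rule above combines two conditions with an "or", followed by a data-availability filter. The sketch below expresses that logic. The record fields (`discovery_significant`, `kiddle_replications`, `has_sun_gwas`) are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProteinEvidence:
    name: str
    discovery_significant: bool  # hypothetical: significant in an independent discovery study
    kiddle_replications: int     # hypothetical: replicating studies in Kiddle et al. (2014)
    has_sun_gwas: bool           # hypothetical: GWAS summary statistics available from Sun et al.


def is_candidate(p: ProteinEvidence) -> bool:
    """Candidate if significant in a discovery study OR replicated in >= 3 studies."""
    return p.discovery_significant or p.kiddle_replications >= 3


def select_for_analysis(proteins):
    """Keep only candidates that also have Sun et al. GWAS data."""
    return [p.name for p in proteins if is_candidate(p) and p.has_sun_gwas]
```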
GWAS summary statistics for the 25 target proteins, based on 3,301 healthy individuals, were downloaded from the repository provided by the Cambridge Cardiovascular Epidemiology Unit (‘Base data’).
[X] non-bi-allelic (multi-allelic) and duplicate SNPs were removed, leaving [X] SNPs for analysis.
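A minimal sketch of the Base data clean-up, assuming each record is a `(snp_id, a1, a2)` tuple. The rules here are assumptions, not the pipeline actually used: an rsID that appears more than once is treated as duplicated or multi-allelic and every copy is dropped, and a SNP counts as bi-allelic only if both alleles are single, distinct nucleotides.

```python
from collections import Counter

VALID_ALLELES = {"A", "C", "G", "T"}


def clean_base(rows):
    """Drop duplicated and non-bi-allelic SNPs from (snp_id, a1, a2) rows."""
    counts = Counter(snp for snp, _, _ in rows)
    kept = []
    for snp, a1, a2 in rows:
        a1, a2 = a1.upper(), a2.upper()
        if counts[snp] > 1:
            continue  # duplicated ID, or a multi-allelic site split across rows
        if a1 not in VALID_ALLELES or a2 not in VALID_ALLELES or a1 == a2:
            continue  # indel, multi-character allele or malformed record
        kept.append((snp, a1, a2))
    return kept
```

Dropping every copy of a repeated ID (rather than keeping the first) is the conservative choice, since it is not clear which copy is correct.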
Raw genotype data for 6,244 individuals from GERAD, ADNI and ANM were accessed via the KCL Rosalind High Performance Computing cluster (‘Target data’).
Consistent QC was applied across the three samples [REQUIRES VALIDATION]:
Samples: European ancestry, unrelated individuals, no other dementia or mild cognitive impairment, call rate > 98%
SNPs: MAF > 0.01, call rate > 98%, HWE p > 10^-5 (SNPs with HWE p < 10^-5 excluded), imputation INFO score > 0.7
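The SNP-level thresholds listed above can be sketched as a single predicate. This is illustrative only: the `SnpStats` record is a hypothetical structure, and the use of strict inequalities at each boundary is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp: str
    maf: float        # minor allele frequency
    call_rate: float  # proportion of samples with a genotype call
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value
    info: float       # imputation INFO score


def passes_snp_qc(s: SnpStats) -> bool:
    """MAF > 0.01, call rate > 98%, HWE p > 1e-5 and INFO > 0.7."""
    return s.maf > 0.01 and s.call_rate > 0.98 and s.hwe_p > 1e-5 and s.info > 0.7
```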
Only Base data SNPs present in all three Target data samples were retained, resulting in 5,210,103 SNPs for consideration in the PRS model.
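Restricting the Base data to SNPs shared by every target sample is a set intersection. A minimal sketch, assuming each sample's SNP list is available as an iterable of rsIDs:

```python
def snps_for_prs(base_snps, *target_snp_sets):
    """Return Base data SNPs present in every target sample (e.g. GERAD, ADNI, ANM)."""
    shared = set(base_snps)
    for target in target_snp_sets:
        shared &= set(target)
    # Preserve Base data order so downstream files stay aligned with the Base data
    return [snp for snp in base_snps if snp in shared]
```

For example, `snps_for_prs(base, gerad_snps, adni_snps, anm_snps)` (hypothetical variable names) would yield the SNP list passed to PRSice.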
| | ADNI | ANM | GERAD | Total |
|---|---|---|---|---|
| Cases | 639 | 371 | 3277 | 4287 |
| Controls | 368 | 374 | 1215 | 1957 |
| Total | 1007 | 745 | 4492 | 6244 |
Observed SNP heritability (h²) for each protein was estimated using linkage disequilibrium score regression (LDSR).
The average mean χ² across proteins was low (1.014).
Results were therefore treated as indicative only, given the average SE of 0.192 and the known limitations of applying LDSR to samples of fewer than 5,000 individuals.
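Why a mean χ² close to 1 matters: LDSR models the expected association statistic of SNP j as E[χ²_j] = 1 + Na + (N h² / M) ℓ_j, where ℓ_j is the SNP's LD score, N the GWAS sample size and M the number of SNPs. If the mean χ² is barely above 1, there is little polygenic signal for the slope to capture (as far as we are aware, the ldsc software warns when mean χ² falls below about 1.02). The toy sketch below computes the mean χ² from z-scores and recovers h² from an unweighted least-squares slope. Unlike ldsc it uses no regression weights and no block-jackknife standard errors, so it only illustrates the relationship.

```python
from statistics import fmean


def mean_chi2(z_scores):
    """Mean chi-squared statistic, with chi2 = z^2 for each SNP."""
    return fmean(z * z for z in z_scores)


def ldsc_h2_ols(chi2, ld_scores, n, m):
    """Toy LD score regression: ordinary least squares of chi2 on LD score.

    Under E[chi2_j] = 1 + N*a + (N * h2 / M) * l_j the slope is N*h2/M,
    so h2 = slope * M / N. Returns (h2, intercept).
    """
    x_bar, y_bar = fmean(ld_scores), fmean(chi2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ld_scores, chi2))
    sxx = sum((x - x_bar) ** 2 for x in ld_scores)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope * m / n, intercept
```

With N = 3,301, a small slope translates directly into a small, noisy h² estimate, which is consistent with the large average SE reported above.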
Genetic correlations (rg) between proteins were also estimated using LDSR.
No protein pair had an rg > 0.99 with p < 0.05.
[X]% of results were NA. [Explain NA in context of LDSR].
[TABLE TO BE ADDED]
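The rg check and the NA count can be summarised in one pass over the pairwise results. A minimal sketch, assuming a hypothetical layout of `(protein_a, protein_b, rg, p)` tuples in which `rg` or `p` is `None` or NaN when LDSR returned no estimate:

```python
import math


def summarise_rg(results, rg_cutoff=0.99, p_cutoff=0.05):
    """Return (pairs with rg > rg_cutoff and p < p_cutoff, percentage of NA results)."""

    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    flagged, n_na = [], 0
    for a, b, rg, p in results:
        if missing(rg) or missing(p):
            n_na += 1  # LDSR returned no usable estimate for this pair
            continue
        if rg > rg_cutoff and p < p_cutoff:
            flagged.append((a, b))
    pct_na = 100.0 * n_na / len(results) if results else 0.0
    return flagged, pct_na
```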
PRS models were built for each of the 25 target proteins at 10 pre-defined p-value thresholds: 5e-08, 5e-05, 5e-04, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5 and 1.
Models were applied independently to each of the three target samples, with and without SNPs from the APOE region, and the results were meta-analysed.
Age, sex and 7 principal components (to control for population stratification) were included as covariates.
#PRSice command to generate independent PRS models
Rscript /mnt/lustre/groups/proitsi/Jodie/prs/Programmes/PRSice.R \
--dir /mnt/lustre/groups/proitsi/Alex/PRS_Outputs_All_Covariates \
--prsice /mnt/lustre/groups/proitsi/Jodie/prs/Programmes/PRSice_linux \
--base $PROTEIN_PATH \
--target $TARGET_FILE \
--stat BETA \
--score std \
--binary-target T \
--prevalence 0.07 \
--cov-file $TARGET_COVARIATE_FILE \
--cov-col PC1,PC2,PC3,PC4,PC5,PC6,PC7,AGE,SEX \
--bar-levels 5e-8,5e-5,5e-4,0.001,0.01,0.05,0.1,0.2,0.5,1 \
--fastscore \
--out /mnt/lustre/groups/proitsi/Alex/PRS_Outputs_All_Covariates/$OUTPUT
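At each threshold, a PRS is a weighted sum of effect-allele dosages over the Base data SNPs whose p-value passes the threshold, which is then standardised across individuals. The sketch below illustrates that idea only. It skips LD clumping, assumes every selected SNP has a dosage for every individual, and uses the population SD for standardisation; PRSice's own handling of these details (e.g. clumping and missing genotypes) differs.

```python
from statistics import fmean, pstdev


def standardise(values):
    """Centre to mean 0 and scale to SD 1 (all zeros if there is no variance)."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def prs_at_thresholds(base, dosages, thresholds):
    """Clumping-free thresholded PRS.

    base: list of (snp, beta, p) from the Base data.
    dosages: {individual: {snp: effect-allele dosage in [0, 2]}}.
    Returns {threshold: {individual: standardised score}}.
    """
    people = list(dosages)
    out = {}
    for t in thresholds:
        selected = [(snp, beta) for snp, beta, p in base if p <= t]
        raw = [sum(beta * dosages[i][snp] for snp, beta in selected) for i in people]
        out[t] = dict(zip(people, standardise(raw)))
    return out
```

Each standardised score would then be regressed on AD status with age, sex and the 7 PCs as covariates, as in the command above.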
#Example code for meta-analysis (random-effects, REML) across ADNI, ANM and GERAD
library(dplyr)
library(metafor)
# The 10 p-value thresholds passed to PRSice via --bar-levels
thresholds <- c(5e-08, 5e-05, 5e-04, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1)
metaResults <- list()
for (protein in proteins) {
  for (threshold in thresholds) {
    # Per-cohort PRS estimates for this protein at this threshold
    cohortResults <- prsResults %>% filter(Protein == protein, Threshold == threshold)
    # Skip protein/threshold combinations with no cohort results
    if (nrow(cohortResults) > 0) {
      metaResults[[paste(protein, threshold, sep = "_")]] <-
        rma(yi = Coefficient, sei = Standard.Error, method = "REML", data = cohortResults)
    }
  }
}
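To show what the random-effects meta-analysis computes, here is a dependency-free sketch. Note that it swaps in the DerSimonian–Laird moment estimator for the between-cohort variance τ², whereas `rma(method = "REML")` estimates τ² iteratively, so results can differ slightly from the R code above.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Random-effects meta-analysis with the DerSimonian-Laird tau^2 estimator.

    Returns (pooled estimate, standard error, two-sided p-value, tau^2).
    """
    v = [se ** 2 for se in std_errors]
    w = [1 / vi for vi in v]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    k = len(estimates)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    p = math.erfc(abs(pooled / se) / math.sqrt(2))  # two-sided normal p-value
    return pooled, se, p, tau2
```

When the three cohort estimates agree (Q ≤ k − 1), τ² is 0 and the result reduces to a fixed-effect inverse-variance meta-analysis.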
After meta-analysis, 6 proteins were nominally significant (p < 0.05) in the analysis including the APOE region: APOE, Serum amyloid P component, Complement component C6, Inter-alpha-trypsin inhibitor heavy chain H1, Prostate-specific antigen and Plasma protease C1 inhibitor.
When [X] SNPs in the APOE region were removed from the base and target data to control for the known association with APOE, 6 proteins remained nominally significant, with Insulin-like growth factor-binding protein 2 replacing APOE.
| Protein | Short code | P | R² |
|---|---|---|---|
| Complement component C6 | C6 | 0.0242 | 0.000814 |
| Plasma protease C1 inhibitor | SERPING1 | 0.0256 | 0.000799 |
| Inter-alpha-trypsin inhibitor heavy chain H1 | ITIH1 | 0.0272 | 0.000782 |
| Serum amyloid P component | APCS | 0.0283 | 0.000771 |
| Prostate-specific antigen | KLK3 | 0.0398 | 0.000678 |
| Insulin-like growth factor-binding protein 2 | IGFBP2 | 0.0435 | 0.000653 |
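Removing the APOE region amounts to dropping SNPs within a genomic window on chromosome 19 (PRSice-2 also offers an `--x-range` option for excluding regions). The boundaries used here are not stated, so the window below (chr19:44.4–46.5 Mb, GRCh37) is a hypothetical placeholder, as is the `(snp_id, chrom, bp)` record layout.

```python
# Hypothetical APOE window (GRCh37); the boundaries used in this analysis are not stated.
APOE_REGION = ("19", 44_400_000, 46_500_000)


def drop_region(snps, region=APOE_REGION):
    """Remove (snp_id, chrom, bp) records lying inside [start, end] on the given chromosome."""
    chrom, start, end = region
    return [s for s in snps if not (str(s[1]) == chrom and start <= s[2] <= end)]
```

The same filter would be applied to both the base and target SNP lists so the two stay consistent.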
An AD PRS model was built using the Kunkle et al. GWAS summary statistics and tested for association with the 20 target proteins whose blood plasma levels were measured in the ANM cohort.
No protein level data were available for the remaining 5: Pancreatic prohormone, Amyloid-beta A4 precursor protein-binding family B member 3, Fibulin-1, Inter-alpha-trypsin inhibitor heavy chain H1 and Vitronectin.
| | Male | Female | Mean age |
|---|---|---|---|
| AD | 50 | 50 | 50 |
| MCI | 50 | 50 | 50 |
| Controls | 50 | 50 | 50 |
AD PRS models were tested with and without SNPs from the APOE region at 10 pre-defined p-value thresholds: 5e-08, 5e-05, 5e-04, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5 and 1.
No covariates were included because the protein phenotypes were already residuals from a regression of protein levels on age, gender and centre.
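The residualisation step can be sketched as an ordinary least-squares fit of protein level on age, sex and centre, keeping the residuals. This is a dependency-free illustration assuming sex is coded 0/1 and centre is dummy-coded with the first centre (alphabetically) as reference; in practice `lm()` in R or an equivalent would be used.

```python
def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residualise(levels, ages, sexes, centres):
    """Residuals from OLS of protein level on intercept, age, sex and centre dummies."""
    centre_levels = sorted(set(centres))[1:]  # first centre is the reference level
    design = [
        [1.0, float(age), float(sex)] + [1.0 if c == lvl else 0.0 for lvl in centre_levels]
        for age, sex, c in zip(ages, sexes, centres)
    ]
    p = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, levels)) for i in range(p)]
    beta = _solve(xtx, xty)
    return [y - sum(b * x for b, x in zip(beta, row)) for row, y in zip(design, levels)]
```

Because these residuals are uncorrelated with age, sex and centre by construction, adding those variables again as PRS covariates would be redundant.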
ANALYSIS TO-DOS
#PRSice command to generate AD PRS models
Rscript /mnt/lustre/groups/proitsi/Jodie/prs/Programmes/PRSice.R \
--dir /mnt/lustre/groups/proitsi/Alex/AD_PRS_to_Protein/PRS_Outputs \
--prsice /mnt/lustre/groups/proitsi/Jodie/prs/Programmes/PRSice_linux \
--base /mnt/lustre/groups/proitsi/Alex/AD_PRS_to_Protein/Base_Data/Post_QC/Kunkle_Stage1_post_qc.txt \
--target /mnt/lustre/groups/proitsi/Alex/AD_PRS_to_Protein/Target_Data/three_batches_imputed.pQTL.QC \
--chr Chromosome \
--bp Position \
--snp MarkerName \
--A1 Effect_allele \
--A2 Non_Effect_allele \
--stat Beta \
--pvalue Pvalue \
--score std \
--binary-target F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F \
--pheno /mnt/lustre/groups/proitsi/Alex/AD_PRS_to_Protein/Target_Data/SOMA_Pheno_Oct19 \
--pheno-col $TARGET_PROTEINS \
--bar-levels 5e-8,5e-5,5e-4,0.001,0.01,0.05,0.1,0.2,0.5,1 \
--fastscore \
--out /mnt/lustre/groups/proitsi/Alex/AD_PRS_to_Protein/PRS_Outputs/Kunkle_AD_to_ANM_WITH_APOE
2 proteins were nominally significant (p < 0.05) in the analysis including the APOE region: APOE and Haptoglobin.
When [X] SNPs in the APOE region were removed from the base and target data to control for the known association with APOE, 2 proteins remained nominally significant, with Interleukin-10 replacing APOE.
| Protein | Short code | P | R² |
|---|---|---|---|
| Interleukin-10 | IL10 | 0.0392 | 0.01040 |
| Haptoglobin | HP | 0.0460 | 0.00974 |
Proteins that were nominally significant in either or both PRS analyses were considered for Mendelian randomisation (MR).
This gave 8 proteins (excluding APOE), of which 4 had SNPs that passed the criteria for use as genetic instruments:
ANALYSIS TO-DOS
| Protein | Short code | Protein PRS → AD | AD PRS → protein | cis-eQTL |
|---|---|---|---|---|
| Interleukin-10 | IL10 | N | Y | N |
| Haptoglobin | HP | N | Y | Y |
| Complement component C6 | C6 | Y | N | N |
| Plasma protease C1 inhibitor | SERPING1 | Y | N | Y |
| Inter-alpha-trypsin inhibitor heavy chain H1 | ITIH1 | Y | N | Y |
| Serum amyloid P component | APCS | Y | N | Y |
| Prostate-specific antigen | KLK3 | Y | N | N |
| Insulin-like growth factor-binding protein 2 | IGFBP2 | Y | N | N |
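The selection of MR candidates from the two PRS analyses is a set union followed by an intersection with the proteins that have instruments. The sketch below reproduces the counts using the short codes from the results tables in this section.

```python
# Nominally significant proteins (short codes) from the results in this section
protein_prs_hits = {"APOE", "APCS", "C6", "ITIH1", "KLK3", "SERPING1", "IGFBP2"}  # with and without APOE
ad_prs_hits = {"APOE", "HP", "IL10"}
has_instrument = {"HP", "SERPING1", "ITIH1", "APCS"}  # cis-eQTL = Y

# Significant in either analysis, excluding APOE
mr_candidates = (protein_prs_hits | ad_prs_hits) - {"APOE"}
# Candidates with SNPs usable as genetic instruments
mr_testable = sorted(mr_candidates & has_instrument)
```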
Each protein was analysed individually as the exposure, with AD as the outcome. AD was then used as the exposure, with each protein as the outcome, to test for reverse causation.
Inverse-variance weighted (IVW) MR, with sensitivity analyses (e.g. weighted median, weighted mode and MR-Egger), was conducted using the MR-Base TwoSampleMR R package.
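For intuition, the IVW and MR-Egger estimators can be written in a few lines from per-SNP summary statistics: `bx` (SNP effect on the exposure), `by` (SNP effect on the outcome) and `se_by` (its standard error). This sketch uses the fixed-effect, first-order form of IVW. TwoSampleMR's implementations differ in detail (for example, its IVW standard error may be scaled by the residual standard error), so treat this as an illustration rather than a replacement.

```python
import math


def ivw(bx, by, se_by):
    """Fixed-effect inverse-variance weighted MR estimate and standard error."""
    w = [x ** 2 / s ** 2 for x, s in zip(bx, se_by)]
    est = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_by)) / sum(w)
    return est, math.sqrt(1 / sum(w))


def mr_egger(bx, by, se_by):
    """MR-Egger: weighted regression of by on bx with an intercept.

    Returns (slope, intercept): the slope is the causal estimate and a non-zero
    intercept suggests directional pleiotropy.
    """
    # Orient each SNP so that its effect on the exposure is positive
    pts = [(abs(x), y if x >= 0 else -y, 1 / s ** 2) for x, y, s in zip(bx, by, se_by)]
    sw = sum(w for _, _, w in pts)
    x_bar = sum(w * x for x, _, w in pts) / sw
    y_bar = sum(w * y for _, y, w in pts) / sw
    sxy = sum(w * (x - x_bar) * (y - y_bar) for x, y, w in pts)
    sxx = sum(w * (x - x_bar) ** 2 for x, _, w in pts)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar
```

The reverse-causation analysis uses the same estimators with the roles swapped: AD instruments supply `bx` and the protein GWAS supplies `by`.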