Objective

Investigate the genetic association between blood plasma proteins and Alzheimer’s Disease (AD) to inform disease prediction and provide signals for functional research.

Graphical Abstract

PENDING ACTUAL GRAPHIC

Results

Step 1 - Select target proteins

71 candidate proteins were identified from a literature review of 33 papers.
Proteins were marked as candidates if they were significantly associated with an AD-related phenotype in an independent discovery study (univariate or multivariate) or had previously been replicated in 3 or more studies in Kiddle et al’s comprehensive 2014 review.

25 proteins with GWAS data from Sun et al were selected for further analysis.

Figure 1: Target proteins for further analysis
Protein | SOMAmerID | Protein Short Code | UniProtID | Number of studies with significant AD phenotype association
Apolipoprotein E | APOE.2937.10.2 | APOE | P02649 | 7
Pancreatic prohormone | PPY.4588.1.2 | PPY | P01298 | 7
Complement C3 | C3.2755.8.2 | C3 | P01024 | 6
Complement factor H | CFH.4159.130.1 | CFH | P08603 | 6
Clusterin | CLU.4542.24.2 | CLU | P10909 | 5
Plasma protease C1 inhibitor | SERPING1.4479.14.2 | SERPING1 | P05155 | 5
Serum amyloid P component | APCS.2474.54.5 | APCS | P02743 | 5
Insulin-like growth factor-binding protein 2 | IGFBP2.2570.72.5 | IGFBP2 | P18065 | 4
Interleukin-10 | IL10.2773.50.2 | IL10 | P22301 | 4
Interleukin-3 | IL3.4717.55.2 | IL3 | P08700 | 4
Vitronectin | VTN.13125.45.3 | VTN | P04004 | 4
Fibronectin | FN1.3435.53.2 | FN1 | P02751 | 3
Granulocyte colony-stimulating factor | CSF3.8952.65.3 | CSF3 | P09919 | 3
Haptoglobin | HP.3054.3.2 | HP | P00738 | 3
Complement C4 A/B | C4A.C4B.4481.34.2 | C4A | P0C0L4,P0C0L5 | 2
Complement component C6 | C6.4127.75.1 | C6 | P13671 | 2
Alpha-2-HS-glycoprotein | AHSG.3581.53.3 | AHSG | P02765 | 1
Amyloid beta A4 protein | APP.3171.57.2 | APP | P05067 | 1
Amyloid-beta A4 precursor protein-binding family B member 3 | APBB3.13589.10.3 | APBB3 | O95704 | 1
Brain-derived neurotrophic factor | BDNF.2421.7.3 | BDNF | P23560 | 1
Fibrinogen gamma chain | FGA.FGB.FGG.4907.56.1 | FGA | P02671,P02675,P02679 | 1
Fibulin-1 | FBLN1.6470.19.3 | FBLN1 | P23142 | 1
Inter-alpha-trypsin inhibitor heavy chain H1 | ITIH1.7955.195.3 | ITIH1 | P19827 | 1
Prostate-specific antigen | KLK3.8468.19.3 | KLK3 | P07288 | 1
Receptor tyrosine-protein kinase erbB-2 | ERBB2.2616.23.18 | ERBB2 | P04626 | 1

Step 2 - Protein PRS -> AD status

Data Preparation

Summary GWAS statistics for the 25 target proteins were downloaded from the repository provided by the Cambridge Cardiovascular Epidemiology Unit based on 3,301 healthy individuals (‘Base data’).

Removed [X] bi-allelic and duplicate SNPs resulting in [X] SNPs for analysis.

Raw genotype data from GERAD, ADNI and ANM for 6,244 individuals were accessed via the KCL Rosalind High Performance Computing cluster (‘Target data’).

Consistent QC applied [REQUIRES VALIDATION]
Samples: European ancestry, non-relatives, no other dementia or mild cognitive impairment, call rate > 98%
SNPs: MAF > 0.01, call rate > 98%, HWE p > 10^-5, imputed with INFO quality score > 0.7

Only SNPs genotyped in all three Target data samples were selected from the Base data SNPs resulting in 5,210,103 SNPs for consideration in the PRS model.
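The SNP-level filtering and intersection described above can be sketched as follows. This is an illustration only, not the pipeline that was run: the record layout and the names `passes_snp_qc` and `shared_snps` are hypothetical, the HWE threshold is assumed to be a retention criterion (SNPs with HWE p below 10^-5 are dropped), and the INFO filter is assumed to apply only to imputed SNPs.

```python
def passes_snp_qc(snp, maf_min=0.01, call_rate_min=0.98, hwe_p_min=1e-5, info_min=0.7):
    """Return True if a SNP record meets the retention thresholds listed above.

    `snp` is a hypothetical dict with keys maf, call_rate, hwe_p and an
    optional info score (None for directly genotyped SNPs).
    """
    info = snp.get("info")
    return (
        snp["maf"] > maf_min
        and snp["call_rate"] > call_rate_min
        and snp["hwe_p"] > hwe_p_min
        and (info is None or info > info_min)
    )


def shared_snps(base_ids, target_cohorts):
    """Keep Base data SNP IDs that pass QC in every Target cohort."""
    keep = set(base_ids)
    for cohort in target_cohorts:
        keep &= {snp["id"] for snp in cohort if passes_snp_qc(snp)}
    return keep
```

Intersecting across cohorts before scoring means every PRS is built from the same SNP set in GERAD, ADNI and ANM, which keeps the per-cohort estimates comparable at the meta-analysis stage.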

Figure 2: Target sample summary
Group | ADNI | ANM | GERAD | TOTAL
Cases | 639 | 371 | 3277 | 4287
Controls | 368 | 374 | 1215 | 1957
Total | 1007 | 745 | 4492 | 6244

Protein Heritability

Observed protein heritability (h2) was estimated using linkage disequilibrium score regression (LDSR).
The average mean chi-squared across proteins was low at 1.014048.
Results are treated as indicative given the average SE of 0.192148 and the known challenges of applying LDSR to samples of fewer than 5,000 individuals.

Figure 3: Protein Heritability Estimates

Genetic correlation (rg) between proteins was also estimated using LDSR.
No pair of proteins had an rg above 0.99 at a p-value below 0.05.
[X]% results were NA. [Explain NA in context of LDSR].

Figure 4: Protein Genetic Correlation Matrix

[TABLE TO BE ADDED]

Protein PRS Models

PRS models were built for each of the 25 target proteins at 10 pre-defined p-value thresholds: 5e-08 | 5e-05 | 5e-04 | 0.001 | 0.01 | 0.05 | 0.1 | 0.2 | 0.5 | 1.
Models applied independently to the three raw genotype samples with and without SNPs from the APOE region and meta-analysed.
Age, sex and 7 principal components for population stratification included as covariates.
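To make the thresholding step concrete, the sketch below shows the core idea of a p-value-thresholded score: SNPs at or below a threshold are selected, effect sizes are multiplied by effect-allele counts and summed, and the scores are standardised across samples. It is a simplified illustration, not PRSice's implementation. PRSice performs further steps (for example clumping and allele matching) that are not reproduced here, and the data layout and the name `prs_at_threshold` are hypothetical.

```python
from statistics import mean, pstdev


def prs_at_threshold(base, dosages, p_threshold):
    """Standardised polygenic score at one p-value threshold.

    base:    {snp_id: (beta, p_value)} from the Base data summary statistics
    dosages: {sample_id: {snp_id: effect-allele count}} from the Target data
    """
    selected = [snp for snp, (_, p) in base.items() if p <= p_threshold]
    raw = {
        sample: sum(base[snp][0] * geno[snp] for snp in selected if snp in geno)
        for sample, geno in dosages.items()
    }
    mu, sd = mean(raw.values()), pstdev(raw.values())
    # Standardise so effect estimates are per standard deviation of the score
    return {s: (v - mu) / sd if sd > 0 else 0.0 for s, v in raw.items()}
```

Repeating the call for each threshold gives one score per sample per threshold, and each score is then regressed on AD status with the covariates listed above.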

#PRSice command to generate independent PRS models

Rscript /mnt/lustre/groups/proitsi/Jodie/prs/Programmes/PRSice.R \
      --dir /mnt/lustre/groups/proitsi/Alex/PRS_Outputs_All_Covariates \
      --prsice /mnt/lustre/groups/proitsi/Jodie/prs/Programmes/PRSice_linux \
      --base $PROTEIN_PATH \
      --target $TARGET_FILE \
      --stat BETA \
      --score std \
      --binary-target T \
      --prevalence 0.07 \
      --cov-file $TARGET_COVARIATE_FILE \
      --cov-col PC1,PC2,PC3,PC4,PC5,PC6,PC7,AGE,SEX \
      --bar-levels 5e-8,5e-5,5e-4,0.001,0.01,0.05,0.1,0.2,0.5,1 \
      --fastscore \
      --out /mnt/lustre/groups/proitsi/Alex/PRS_Outputs_All_Covariates/$OUTPUT
      
#Example code for meta-analysis
library(dplyr)    # filter() and %>%
library(metafor)  # rma()

for (protein in proteins) {
  print("Select data from each of the 10 thresholds")
  test1 <- prsResults %>% filter(Protein == protein, Threshold == 5e-08)
  ...
  test10 <- prsResults %>% filter(Protein == protein, Threshold == 1)

  print("Run meta analysis on each threshold")
  # Only fit a model when the filter returned rows; exists() is always TRUE once testN has been assigned
  if (nrow(test1) > 0) test1.reml <- rma(yi = Coefficient, sei = Standard.Error, method = "REML", data = test1)
  ...
  if (nrow(test10) > 0) test10.reml <- rma(yi = Coefficient, sei = Standard.Error, method = "REML", data = test10)
}
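
The R loop above pools the per-cohort PRS coefficients with a random-effects model fitted by REML. As a rough illustration of what that pooling does, the Python sketch below applies inverse-variance weighting with the closed-form DerSimonian-Laird estimate of between-cohort variance. This is a swapped-in estimator chosen because it needs no iterative fitting, so its output can differ from REML. The function name `random_effects_meta` is hypothetical.

```python
import math


def random_effects_meta(estimates, std_errors):
    """Pool per-cohort estimates; returns (pooled, pooled_se, tau2, p_value)."""
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)

    # Cochran's Q and the DerSimonian-Laird between-cohort variance (tau^2)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Re-weight with tau^2 added to each cohort's sampling variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))

    # Two-sided p-value from a normal approximation
    p_value = math.erfc(abs(pooled / pooled_se) / math.sqrt(2))
    return pooled, pooled_se, tau2, p_value
```

With only three cohorts the between-cohort variance is poorly estimated by any method, which is one reason to read the pooled p-values as nominal.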

6 proteins were nominally significant (p < 0.05) after meta-analysis when SNPs in the APOE region were included (APOE, Serum amyloid P component, Complement component C6, Inter-alpha-trypsin inhibitor heavy chain H1, Prostate-specific antigen, Plasma protease C1 inhibitor).

When [X] SNPs in the APOE region were removed from the base and target data to control for the known association with APOE, 6 proteins were nominally significant with Insulin-like growth factor-binding protein 2 replacing APOE.
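
Removing a genomic region amounts to a positional filter applied identically to the base and target SNP lists. A minimal sketch follows, assuming each SNP record carries a chromosome and base-pair position. The function name `drop_region`, the record layout and the window used in the example call are hypothetical placeholders. The study's actual APOE window is not stated here and is not implied by these numbers.

```python
def drop_region(snps, chrom, start_bp, end_bp):
    """Return the SNPs lying outside the closed interval [start_bp, end_bp] on chrom."""
    return [
        snp for snp in snps
        if not (snp["chr"] == chrom and start_bp <= snp["bp"] <= end_bp)
    ]


# Illustrative call only: toy positions and a placeholder window
example = [
    {"id": "snp_a", "chr": 19, "bp": 1_000},
    {"id": "snp_b", "chr": 19, "bp": 5_000},
    {"id": "snp_c", "chr": 1, "bp": 5_000},
]
kept = drop_region(example, chrom=19, start_bp=4_000, end_bp=6_000)
```

Only `snp_b` falls inside the window on the named chromosome, so `kept` retains `snp_a` and `snp_c`.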

Figure 5a: Proteins with Protein PRS significantly associated with AD
Protein | ShortCode | P | R^2
Complement component C6 | C6 | 0.0242 | 0.000814
Plasma protease C1 inhibitor | SERPING1 | 0.0256 | 0.000799
Inter-alpha-trypsin inhibitor heavy chain H1 | ITIH1 | 0.0272 | 0.000782
Serum amyloid P component | APCS | 0.0283 | 0.000771
Prostate-specific antigen | KLK3 | 0.0398 | 0.000678
Insulin-like growth factor-binding protein 2 | IGFBP2 | 0.0435 | 0.000653

Figure 5b: Protein PRS Associations with AD (Meta-analysed, No APOE SNPs)

Figure 6: Protein PRS Associations with AD (Per sample, No APOE SNPs)

Step 3 - AD PRS -> protein

AD PRS Models

A PRS model was built for AD using Kunkle et al GWAS summary statistics and tested for association with the 20 proteins for which blood plasma protein levels were collected in the ANM cohort.
No protein level data were available for pancreatic prohormone, amyloid-beta A4 precursor protein-binding family B member 3, fibulin-1, inter-alpha-trypsin inhibitor heavy chain H1 and vitronectin.

Figure 7: ANM target sample summary
[TABLE TO BE COMPLETED]
Group | Male | Female | Mean Age
AD | 50 | 50 | 50
MCI | 50 | 50 | 50
Controls | 50 | 50 | 50

AD PRS models were tested with and without SNPs from the APOE region at 10 pre-defined p-value thresholds: 5e-08 | 5e-05 | 5e-04 | 0.001 | 0.01 | 0.05 | 0.1 | 0.2 | 0.5 | 1.
No covariates were included because the protein phenotype was created by taking the residuals from a regression of protein levels on age, gender and centre.
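
Residualising a phenotype means fitting an ordinary least squares regression of the protein level on the covariates and keeping what the covariates do not explain. The sketch below does this in plain Python via the normal equations. It assumes the covariates are already numeric (for example gender coded 0/1 and centre expanded into indicator columns with one reference level dropped). The function names `solve` and `residualise` are hypothetical, and this is not the code used to build the ANM phenotype file.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def residualise(y, covariates):
    """Residuals of y after OLS regression on the covariates plus an intercept."""
    X = [[1.0] + list(row) for row in covariates]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
```

Because the residuals are uncorrelated with the covariates by construction, adding age, gender and centre again in the PRS regression would contribute little.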

ANALYSIS TO-DOS

  • Test for collinearity between AD PRS score and AD status in ANM and deciles analysis
  • Confirm covariates and run with and without case / control as covariates
  • Run AD and Protein PRS stratified by age
  • Run Protein PRS deciles association analysis with protein levels in ANM cohort (to check level of association with protein)
  • Run Protein PRS deciles association analysis with AD for significant proteins

#PRSice command to generate AD PRS models

Rscript /mnt/lustre/groups/proitsi/Jodie/prs/Programmes/PRSice.R \
--dir /mnt/lustre/groups/proitsi/Alex/AD_PRS_to_Protein/PRS_Outputs \
--prsice /mnt/lustre/groups/proitsi/Jodie/prs/Programmes/PRSice_linux \
--base /mnt/lustre/groups/proitsi/Alex/AD_PRS_to_Protein/Base_Data/Post_QC/Kunkle_Stage1_post_qc.txt \
--target /mnt/lustre/groups/proitsi/Alex/AD_PRS_to_Protein/Target_Data/three_batches_imputed.pQTL.QC \
--chr Chromosome \
--bp Position \
--snp  MarkerName \
--A1 Effect_allele \
--A2 Non_Effect_allele \
--stat Beta \
--pvalue Pvalue \
--score std \
--binary-target F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F,F \
--pheno /mnt/lustre/groups/proitsi/Alex/AD_PRS_to_Protein/Target_Data/SOMA_Pheno_Oct19 \
--pheno-col $TARGET_PROTEINS \
--bar-levels 5e-8,5e-5,5e-4,0.001,0.01,0.05,0.1,0.2,0.5,1 \
--fastscore \
--out /mnt/lustre/groups/proitsi/Alex/AD_PRS_to_Protein/PRS_Outputs/Kunkle_AD_to_ANM_WITH_APOE

2 proteins were nominally significant (p < 0.05) when SNPs in the APOE region were included (APOE and Haptoglobin).

When [X] SNPs in the APOE region were removed from the base and target data to control for the known association with APOE, 2 proteins were nominally significant with Interleukin-10 replacing APOE.

Figure 8a: Proteins significantly associated with AD PRS
Protein | ShortCode | P | R^2
Interleukin-10 | IL10 | 0.0392 | 0.01040
Haptoglobin | HP | 0.0460 | 0.00974

Figure 8b: AD PRS Associations with blood plasma protein levels (No APOE SNPs)

Step 4 - Bidirectional MR

Proteins that were significant in one or both PRS were considered for Mendelian Randomisation (MR).
This resulted in 8 proteins, excluding APOE. 4 proteins had SNPs that passed criteria for use as genetic instruments:

  • Genome wide significant cis-eQTLs associated with gene encoding target protein
  • Not associated with confounders
  • Not associated with outcome (except through exposure)
  • Not in LD with other instrumental variables

ANALYSIS TO-DOS

  • Review constraints to include more SNPs for existing 4 proteins and other 4
  • Apply statistical biological screening for AD and protein SNP instruments

Figure 9: Proteins for MR
Protein | ShortCode | Protein PRS -> AD | AD PRS -> Protein | cis-eQTL
Interleukin-10 | IL10 | N | Y | N
Haptoglobin | HP | N | Y | Y
Complement component C6 | C6 | Y | N | N
Plasma protease C1 inhibitor | SERPING1 | Y | N | Y
Inter-alpha-trypsin inhibitor heavy chain H1 | ITIH1 | Y | N | Y
Serum amyloid P component | APCS | Y | N | Y
Prostate-specific antigen | KLK3 | Y | N | N
Insulin-like growth factor-binding protein 2 | IGFBP2 | Y | N | N

Proteins were analysed individually as an exposure with AD as the outcome. AD was then used as an exposure with each protein as the outcome to test for reverse causality.
Inverse variance weighted MR and sensitivity analyses (e.g. weighted median, weighted mode, MR Egger) were conducted using the MR-Base R package (TwoSampleMR).
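
For intuition, the inverse variance weighted estimate combines per-SNP Wald ratios (SNP-outcome effect divided by SNP-exposure effect), weighting each by the precision of its outcome association. The sketch below implements the basic fixed-effect form under a first-order approximation that ignores uncertainty in the SNP-exposure effects. It is an illustration rather than the package's implementation, which may apply additional adjustments, and the function name `ivw_mr` is hypothetical.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate; returns (estimate, se, p_value)."""
    # Weight for each instrument: beta_x^2 / se_y^2 (inverse variance of its Wald ratio)
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    # Two-sided p-value from a normal approximation
    p_value = math.erfc(abs(estimate / se) / math.sqrt(2))
    return estimate, se, p_value
```

For the reverse direction the same function applies with the roles swapped: AD SNP effects as the exposure and protein SNP effects as the outcome.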

Figure 10: MR Results