Gene Identification Linked to Differentially Methylated CpGs in Cord Blood and Timing of Birth within the PREG Dataset


Danielle Coates 1

1 Department of Life Sciences, Virginia Commonwealth University, Richmond, Virginia

Introduction

Preterm birth (PTB) refers to the delivery of a baby before 37 weeks of gestation, compared with the typical full-term pregnancy duration of around 40 weeks. PTB is associated with a range of complications, including respiratory problems, developmental delays, and cerebral palsy. The causes of PTB are multifactorial, involving genetic, environmental, and biological factors. Research has increasingly explored the role of epigenetics, specifically DNA methylation, in understanding the mechanisms behind PTB. By studying changes in DNA methylation patterns at specific CpG sites and their associated genes, we aim to identify genetic factors that influence the timing of birth. This information can aid in the development of innovative diagnostic tools and preventive strategies to reduce the incidence of preterm births and improve outcomes for affected infants.

Objectives

  Two datasets from the Pregnancy, Race, Genes, Environment Cohort (PREG) (Lapato 2018) were analyzed:
  • Standard EWAS: Contains 445,080 CpG sites; only the p-values and coefficients were used.
  • CellDMC dataset: Generated with the CellDMC algorithm included in the EpiDISH (Epigenetic Dissection of Intra-Sample Heterogeneity) package (Zheng 2018). Contains methylation data for seven cord blood cell types, each with 445,080 CpG sites; only the p-values and standard errors were used.

  1. Identify significant CpGs associated with gestational age (GA) within both datasets, using the p-values and coefficients.

  2. Annotate significant CpGs to identify genes potentially linked to the timing of birth.

  3. Perform gene ontology (GO) enrichment analysis on these genes to explore their functional roles in birth timing.

Methods

  • The PREG study (York, 2020) consists of 124 umbilical cord blood specimens collected at VCU health clinics.

  • Genome-wide DNA methylation (DNAm) for both datasets was measured using the Infinium HumanMethylation450 BeadChip (Illumina, San Diego, USA), which quantifies ~485,000 CpG sites.

  • The CellDMC algorithm (Zheng 2018) was applied to the CellDMC dataset; it identifies cell-type-specific effects using a deconvolution approach and linear modeling with interaction terms between phenotype and cell-type fractions.

  • Significant CpGs were annotated to genes using the IlluminaHumanMethylation450kanno.ilmn12.hg19 package (Hansen, 2021) from Bioconductor v3.18.

  • Gene ontology (GO) enrichment was performed using the org.Hs.eg.db and GO.db packages (Carlson, 2023) from Bioconductor v3.18.
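
The interaction model described for CellDMC above can be written, in one common form, as a per-CpG linear model. The notation below is an assumption introduced for illustration and is not taken from the PREG analysis itself:

\[
M_{cs} = \sum_{k=1}^{K} f_{ks}\,\mu_{ck} + \sum_{k=1}^{K} f_{ks}\,y_{s}\,\beta_{ck} + \varepsilon_{cs}
\]

Here \(M_{cs}\) is the methylation level at CpG \(c\) in sample \(s\), \(f_{ks}\) is the estimated fraction of cell type \(k\) in that sample, \(y_{s}\) is the phenotype (gestational age), and \(K = 7\) for the seven cord blood cell types. A nonzero interaction coefficient \(\beta_{ck}\) would indicate an association between the phenotype and methylation at CpG \(c\) that is specific to cell type \(k\).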

Results

  • At a Bonferroni-corrected p-value threshold of \(p_{B} < 0.05\), the Standard EWAS identified 161 CpG sites significantly associated with gestational age (GA). Among these, 131 CpG sites were linked to known gene names, highlighting potential genetic associations with birth timing (Fig 1).

    Figure 1: Jitter plot illustrating significant CpGs across chromosome numbers from the Standard EWAS. The x-axis represents the chromosome number, while the y-axis represents the −log10 p-values. The dashed black line denotes the Bonferroni-corrected genome-wide significance threshold (pB < 0.05). The top 10 most significant CpGs are annotated with their corresponding gene names.

  • A total of 120 significant CpG sites were identified as hypomethylated at a Bonferroni-corrected p-value threshold of \(p_{B} < 0.05\), with 103 associated with known gene names. Additionally, 41 significant CpG sites were found to be hypermethylated, with 28 linked to known gene names (Fig. 2).

Figure 2: The standard EWAS results show gray dots for nonsignificant associations and colored dots for Bonferroni-significant ones (pB < 0.05). Blue dots indicate negative effect sizes, orange dots positive ones. The x-axis represents coefficient estimates (β-values), the y-axis -log10 p-values, with the dashed line marking the Bonferroni-corrected significance threshold.

  • No significant cell-specific associations were observed between CpGs and birth timing (Fig. 3).

Figure 3: Seven cord blood cell types from CellDMC analysis, showing gray dots for nonsignificant associations and colored dots for Bonferroni-significant ones (pB < 0.05). Blue dots indicate negative effect sizes, orange dots positive ones. The x-axis shows coefficient estimates (β-values), the y-axis -log10 p-values, with the dashed line marking the Bonferroni-corrected significance threshold.
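
The thresholding and direction split reported above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used for the poster. It assumes the Bonferroni correction uses the 445,080 tested CpG sites as the number of tests, which puts the raw p-value cutoff at roughly 1.1 × 10⁻⁷. The function name `classify` and the toy rows are hypothetical.

```python
import math

N_TESTS = 445_080   # CpG sites in the Standard EWAS (assumed number of tests)
ALPHA = 0.05

# Bonferroni: compare each raw p-value against alpha / number of tests
# (equivalent to multiplying p by the number of tests and comparing to alpha).
threshold = ALPHA / N_TESTS


def classify(results, threshold):
    """Split Bonferroni-significant CpGs by the sign of the coefficient.

    results: iterable of (cpg_id, coefficient, p_value) tuples.
    Returns (negative_ids, positive_ids) for rows with p_value < threshold.
    A coefficient of exactly zero is placed in the positive list.
    """
    negative, positive = [], []
    for cpg_id, coef, p_value in results:
        if p_value < threshold:
            (negative if coef < 0 else positive).append(cpg_id)
    return negative, positive


# Hypothetical rows for illustration only (IDs and values are made up).
toy = [
    ("cg_example_1", -0.012, 3.0e-9),
    ("cg_example_2", 0.008, 5.0e-8),
    ("cg_example_3", -0.020, 2.0e-6),  # p < 0.05 but not Bonferroni-significant
]

neg, pos = classify(toy, threshold)
print(f"threshold = {threshold:.3g}, -log10(threshold) = {-math.log10(threshold):.2f}")
print("negative effect:", neg)
print("positive effect:", pos)
```

Under the same assumption, the dashed line in the plots would sit at the −log10 value printed by this sketch.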

Analysis

  • The top 10 gene-annotated CpGs from the standard EWAS map to eight unique genes: TMEM201, ZNF598, TSNARE1, NIN, C5orf62, PLAGL1, PTDSS2, and ACVR1B, with PTDSS2 and ZNF598 each appearing twice.

  • The genes annotated to the 131 gene-linked significant CpGs were associated with 1,006 unique GO terms. The most frequent term was protein binding, occurring 80 times (7.95%), followed by nucleus with 53 occurrences (5.27%) and cytosol with 46 occurrences (4.57%) (Fig. 4).

Figure 4: Enrichment results of significant standard EWAS genes.
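
The term frequencies above amount to a simple tally, which can be sketched as follows. This is an illustrative Python sketch, not the Bioconductor workflow used for the poster. The gene-to-term mapping, the gene names, and the helper names `tally_go_terms` and `percent` are hypothetical. The last loop only re-derives the reported percentages, which appear to use the 1,006 unique GO terms as the denominator.

```python
from collections import Counter


def tally_go_terms(gene_to_terms):
    """Count how many genes carry each GO term (one count per gene per term)."""
    counts = Counter()
    for terms in gene_to_terms.values():
        counts.update(set(terms))  # set() avoids double-counting within a gene
    return counts


def percent(occurrences, total):
    """Occurrences as a percentage of total, rounded to two decimals."""
    return round(100 * occurrences / total, 2)


# Hypothetical gene -> GO term mapping, for illustration only.
toy = {
    "GENE_A": ["protein binding", "nucleus"],
    "GENE_B": ["protein binding", "cytosol"],
    "GENE_C": ["protein binding", "nucleus", "cytosol"],
}
counts = tally_go_terms(toy)
for term, n in counts.most_common():
    print(term, n)

# Re-deriving the percentages reported in the text.
for term, n in [("protein binding", 80), ("nucleus", 53), ("cytosol", 46)]:
    print(f"{term}: {percent(n, 1006)}%")
```

A tally of this kind is descriptive: it shows which terms are most common among the annotated genes, but does not by itself test whether a term is over-represented relative to a background gene set.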
