Virginia Commonwealth University (VCU)
1 Department of Life Sciences, Virginia Commonwealth University, Richmond, Virginia
Preterm birth (PTB) refers to the delivery of a baby before 37 weeks of gestation, compared to the typical full-term pregnancy duration of around 40 weeks. PTB is associated with a range of complications, including respiratory problems, developmental delays, and cerebral palsy. The causes of PTB are multifactorial, involving genetic, environmental, and biological factors. Research has increasingly explored the role of epigenetics, specifically DNA methylation (DNAm), in understanding the mechanisms behind PTB. By studying changes in DNA methylation patterns at specific CpG sites and their associated genes, we aim to identify genes that may influence the timing of birth. This information can aid the development of diagnostic tools and preventive strategies to reduce the incidence of preterm births and improve outcomes for affected infants.
Examine and identify significant CpGs associated with gestational age (GA) within both datasets, using the p-values and coefficients.
Annotate significant CpGs to identify genes potentially linked to the timing of birth.
Perform gene ontology (GO) enrichment analysis on these genes to explore their functional roles in birth timing.
The PREG study (York, 2020) consists of 124 umbilical cord blood specimens collected at VCU health clinics.
Genome-wide DNAm for both datasets was measured using the Infinium Human Methylation 450K BeadChip (Illumina, San Diego, USA), which quantifies ~485,000 CpG sites.
The CellDMC algorithm (Zheng, 2017) was applied to the CellDMC dataset. It identifies cell-type-specific effects by combining a deconvolution approach with linear models that include interaction terms between the phenotype and the cell-type fractions.
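The interaction-term idea can be illustrated with a minimal sketch. It is not the published CellDMC implementation: the data are simulated, the sample size, cell-type count, and effect sizes are made up, and ordinary least squares via NumPy stands in for the full method. The model regresses methylation at one CpG on the cell-type fractions plus the products of each fraction with the (centered) phenotype, so each product term carries a cell-type-specific effect.

```python
import numpy as np

# Simulated example only: all numbers below are illustrative assumptions.
rng = np.random.default_rng(0)
n_samples, n_cell_types = 200, 3

# Estimated cell-type fractions per sample (rows sum to 1).
fractions = rng.dirichlet(alpha=[2.0, 2.0, 2.0], size=n_samples)

# Phenotype: gestational age in weeks, centered for numerical stability.
ga = rng.normal(39.0, 2.0, size=n_samples)
ga_centered = ga - ga.mean()

# Hypothetical truth: only the second cell type changes with GA.
baseline = np.array([0.3, 0.5, 0.7])      # mean methylation per cell type
true_effect = np.array([0.0, 0.05, 0.0])  # change per week, per cell type

interaction = fractions * ga_centered[:, None]  # fraction x phenotype terms
methylation = (
    fractions @ baseline
    + interaction @ true_effect
    + rng.normal(0.0, 0.005, size=n_samples)
)

# Design matrix: fractions (which absorb the intercept) plus interactions.
design = np.hstack([fractions, interaction])
coef, *_ = np.linalg.lstsq(design, methylation, rcond=None)

# The last n_cell_types coefficients are the cell-type-specific effects.
estimated_effect = coef[n_cell_types:]
print(np.round(estimated_effect, 3))
```

Because the fractions sum to one, no separate intercept column is included. In a real analysis each interaction coefficient would also be tested for significance per CpG and per cell type.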
Significant CpGs were annotated to genes using the IlluminaHumanMethylation450kanno.ilmn12.hg19 package (Hansen, 2021) from Bioconductor v3.18.
Gene ontology (GO) enrichment was performed using the org.Hs.eg.db and GO.db packages (Carlson, 2023) from Bioconductor v3.18.
Figure 1: Jitter plot illustrating significant CpGs across chromosome numbers from the Standard EWAS. The x-axis represents the chromosome number, while the y-axis represents the −log10 p-values. The dashed black line denotes the Bonferroni-corrected genome-wide significance threshold (pB < 0.05). The top 10 most significant CpGs are annotated with their corresponding gene names.
Figure 2: Standard EWAS results. Gray dots are nonsignificant associations and colored dots are Bonferroni-significant ones (pB < 0.05); blue dots indicate negative effect sizes and orange dots positive ones. The x-axis represents coefficient estimates (β-values) and the y-axis −log10 p-values. The dashed line marks the Bonferroni-corrected significance threshold.
Figure 3: CellDMC results for seven cord blood cell types. Gray dots are nonsignificant associations and colored dots are Bonferroni-significant ones (pB < 0.05); blue dots indicate negative effect sizes and orange dots positive ones. The x-axis represents coefficient estimates (β-values) and the y-axis −log10 p-values. The dashed line marks the Bonferroni-corrected significance threshold.
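The threshold and color scheme used in the figures can be sketched as follows. This is a minimal illustration that assumes the Bonferroni correction was applied over all ~485,000 CpG sites on the array; the number of tests actually used may differ after quality control. The function names are hypothetical.

```python
import math

ALPHA = 0.05
N_TESTS = 485_000  # assumption: approximate number of CpGs on the 450K array


def bonferroni_threshold(alpha=ALPHA, n_tests=N_TESTS):
    """Raw p-value cutoff equivalent to a Bonferroni-adjusted p < alpha."""
    return alpha / n_tests


def classify(p_value, coefficient, threshold):
    """Return the plotting category for one CpG."""
    if p_value >= threshold:
        return "nonsignificant"  # gray
    # Significant: blue for negative effects, orange for positive ones.
    return "negative" if coefficient < 0 else "positive"


threshold = bonferroni_threshold()
dashed_line_height = -math.log10(threshold)  # y-position of the dashed line

print(f"raw p-value threshold: {threshold:.3g}")
print(f"-log10 threshold: {dashed_line_height:.2f}")
print(classify(1e-9, -0.02, threshold))
```

Under this assumption the dashed line sits at roughly 7 on the −log10 scale.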
The known genes annotated to the top 10 most significant CpGs from the standard EWAS are TMEM201, ZNF598, TSNARE1, NIN, C5orf62, PLAGL1, PTDSS2, and ACVR1B, with PTDSS2 and ZNF598 each appearing twice.
The 131 significant genes identified were associated with 1,006 unique GO terms. The most frequent term was protein binding, occurring 80 times (7.95%), followed by nucleus with 53 occurrences (5.27%) and cytosol appearing 46 times (4.57%) (Fig. 4).
Figure 4: Enrichment results of significant standard EWAS genes.
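The term frequencies reported above can be reproduced with a simple tally. The sketch below uses a toy gene-to-term mapping with placeholder gene names (not the study's data) and then checks the reported percentages, assuming 1,006 is the denominator used for them.

```python
from collections import Counter


def term_frequencies(gene_to_terms):
    """Count genes per GO term and each term's share of all annotations."""
    counts = Counter(
        term for terms in gene_to_terms.values() for term in set(terms)
    )
    total = sum(counts.values())
    return [
        (term, n, round(100 * n / total, 2))
        for term, n in counts.most_common()
    ]


# Toy mapping with placeholder gene names, for illustration only.
toy = {
    "GENE_A": ["protein binding", "nucleus"],
    "GENE_B": ["protein binding", "cytosol"],
    "GENE_C": ["protein binding"],
}
toy_table = term_frequencies(toy)
print(toy_table[0])

# Arithmetic check of the reported counts, assuming a denominator of 1,006.
reported = {"protein binding": 80, "nucleus": 53, "cytosol": 46}
reported_pct = {t: round(100 * n / 1006, 2) for t, n in reported.items()}
print(reported_pct)
```

Each gene contributes at most once per term here, which is one possible counting convention; counting every annotation record separately would give different totals.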