DNA methylation

  • DNA methylation is a chemical modification of DNA in which a methyl group is added to the carbon-5 position of cytosine, converting cytosine to 5-methylcytosine (5mC).

  • It is the best-characterized epigenetic mechanism.

  • In humans, DNA methylation occurs predominantly at cytosines that precede guanines (hence, CpG).

DNA modifications

CpG Sites and CpG islands

  • CpG sites are not randomly distributed in the genome: CpG dinucleotides make up about 1% of the human genome, well below the ~4-6% expected from its base composition.

  • Around 60-90% of CpGs are methylated in mammals.

  • DNA methylation frequently occurs in repeated sequences, where it may help suppress transcription and support chromosomal stability.

CpG Sites and CpG islands

  • Some regions of DNA have a higher concentration of CpG sites (commonly defined as GC content above 50% and an observed-to-expected CpG ratio above 0.6). These CpG islands tend to be located in the promoter regions of many genes.

  • Typically 200-1000 bp in length.

  • Usually not methylated.
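
To make the idea of CpG density relative to expectation concrete, here is a minimal base R sketch that computes GC content and the observed-to-expected CpG ratio for a sequence. The helper name `cpg_stats` and both toy sequences are made up for this illustration, and the 0.6 cutoff is the commonly quoted Gardiner-Garden and Frommer criterion, used here as an assumption.

```r
# Hypothetical helper: GC content and observed/expected CpG ratio of one sequence.
cpg_stats <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  n <- length(bases)
  n_c <- sum(bases == "C")
  n_g <- sum(bases == "G")
  # Count CpG dinucleotides: a C immediately followed by a G.
  n_cpg <- sum(bases[-n] == "C" & bases[-1] == "G")
  list(
    gc_content = (n_c + n_g) / n,
    # Expected CpG count if C and G were placed independently: n_c * n_g / n.
    obs_exp = if (n_c > 0 && n_g > 0) n_cpg * n / (n_c * n_g) else NA_real_
  )
}

island_like <- cpg_stats("CGCGGCGCTACGCGCGATCGCGGC")  # CpG-rich toy sequence
bulk_like   <- cpg_stats("ATTCAGGATTCTAGCATTGACATA")  # toy sequence without CpGs

island_like$obs_exp > 0.6
bulk_like$obs_exp
```

A real CpG island scan would slide a window of at least 200 bp along the genome and apply both criteria per window; the sketch only scores a single sequence.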

Creation and maintenance of DNA methylation

  • In humans, DNA is methylated by three DNA methyltransferases: DNMT1, DNMT3a, and DNMT3b.

  • DNMT1 is the maintenance methyltransferase that is responsible for copying DNA methylation patterns to the daughter strands during DNA replication.

  • DNMT3a and 3b are the de novo methyltransferases that set up DNA methylation patterns early in development.

Roles of DNA methylation

  • Transcriptional gene silencing
  • Maintain genome stability
  • Embryonic development
  • Genomic imprinting
  • X chromosome inactivation (females)

Factors associated with changes in DNA methylation

  • Aging (developmental stage)
  • Diet
  • Inflammatory patterns
  • Environmental exposures
  • Smoking
  • Alcohol

DNA methylation and cancer

Hypomethylation – decreased methylation levels

  • A lower level of DNA methylation in tumors was one of the first epigenetic alterations found in human cancer (Feinberg AP, et al., 1983).
  • Demethylation of the promoter regions of proto-oncogenes can activate genes that are normally repressed.
  • Global hypomethylation of DNA sequences that are normally heavily methylated may result in:
    • Chromosomal instability
    • Increased transcription from transposable elements
    • An elevated mutation rate due to mitotic recombination

DNA hypermethylation

Hypermethylation – increased methylation levels

  • Hypermethylation of the CpG islands in the promoter regions of tumor-suppressor genes is a major event in the origin of many cancers.
  • Promoter hypermethylation can inactivate tumor-suppressor genes and genes involved in the cell cycle, DNA repair, and the metabolism of carcinogens, all of which contribute to the development of cancer.
  • The profiles of hypermethylation of the CpG islands in tumor-suppressor genes are specific to the cancer type.

Methylation assays

Sensitivity of restriction enzymes for methylated CpG sites

MeDIP (Methylated DNA immuno-precipitation)

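  • An anti-methylcytidine antibody (Ab) pulls down DNA containing methylated cytosine (Me-C); the enriched DNA is then hybridized to a microarray, as in ChIP-chip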
  • Doesn’t distinguish among nearby sites

Methylation assays

Sodium Bisulfite conversion

  • Modifies non-methylated cytosines
  • Differentiation of methylated and non-methylated cytosines

  • \(C \; \rightarrow \; U\)
  • \(C^M \; \rightarrow \; C\)
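
The two conversion rules can be sketched in a few lines of base R. The helper `bisulfite_convert` and the `methylated` position vector are hypothetical and exist only for this toy example; the sketch also assumes a single strand and writes the resulting uracil as T, which is how it is read after amplification.

```r
# Hypothetical helper: simulate bisulfite conversion of one strand.
# `methylated` holds 1-based positions of methylated cytosines (an assumption
# for this toy example; real data does not come with such a vector).
bisulfite_convert <- function(dna, methylated = integer(0)) {
  bases <- strsplit(toupper(dna), "")[[1]]
  is_c <- bases == "C"
  protected <- seq_along(bases) %in% methylated
  # Unmethylated C is deaminated to U, which is read as T after amplification.
  bases[is_c & !protected] <- "T"
  paste(bases, collapse = "")
}

original  <- "ACGTCCGA"
converted <- bisulfite_convert(original, methylated = 2)
converted
```

Comparing `converted` with `original` shows the principle behind the assay: a C that survives conversion marks a methylated position, while a C that now reads as T was unmethylated.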

(m)RRBS: (multiplexed) Reduced Representation Bisulfite Sequencing

  • Utilizes cutting pattern of MspI enzyme (C^CGG) to systematically digest CpG-poor DNA
  • Covers the majority of CpG islands and promoters, and a reasonable number of exons, shores and enhancers

  • Advantages:
    • Only need 50-200ng DNA
    • Can be from any species
    • Cost and time
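
As a rough illustration of the MspI cutting pattern, the base R sketch below splits a sequence at every C^CGG site. The function name `mspI_digest` and the toy sequence are made up for this example, and the sketch ignores the size-selection step that follows digestion in an actual RRBS protocol.

```r
# Hypothetical helper: in silico MspI digest, cutting between the two Cs of CCGG.
mspI_digest <- function(dna) {
  hits <- gregexpr("CCGG", dna, fixed = TRUE)[[1]]
  if (hits[1] == -1) return(dna)      # no recognition site: one uncut fragment
  cuts <- as.integer(hits)            # cut falls right after the first C of each site
  starts <- c(1, cuts + 1)
  ends <- c(cuts, nchar(dna))
  substring(dna, starts, ends)
}

fragments <- mspI_digest("AACCGGTTTTCCGGAA")
fragments
nchar(fragments)
```

Every internal fragment starts with CGG and ends with C, so each fragment end carries a CpG; this is what lets the method concentrate sequencing on CpG-containing regions.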

Application of DNA methylation assays

Early diagnosis

  • Detection of CpG-island hypermethylation in biological fluids and serum

Prognosis

  • Hypermethylation of specific genes
  • Whole DNA methylation profiles

Prediction

  • CpG island hypermethylation as a marker of response to chemotherapy

Prevention

  • Developing DNMT inhibitors as chemopreventive drugs to reactivate silenced genes

Bisulfite conversion-based Microarray Analysis

  • A DNA microarray is a technology that consists of thousands of spots with DNA oligonucleotides (probes) that are used to hybridize a target sequence.

  • Probe-target hybridization is usually detected and quantified by detection of fluorophore-, or chemiluminescence-labeled targets.

Illumina Infinium methylation assay

  • Unmethylated cytosines are chemically deaminated to uracil in the presence of bisulfite.

  • Methylated cytosines are refractory to the effects of bisulfite and remain cytosine.

  • After bisulfite conversion, each sample is whole-genome amplified (WGA) and enzymatically fragmented.

  • The bisulfite-converted WGA-DNA samples are purified and applied to the BeadChips.

Illumina Infinium methylation assay

  • Bead technology
  • Each bead has oligos containing 23-base address + 50-base probe complementary to bisulfite converted DNA

Illumina Infinium evolution

  • 2008: HumanMethylation27K. 27,578 probes targeting CpG sites within the proximal promoter regions.

  • 2011: HumanMethylation450K. 485,577 probes targeting additional CpG islands, shores and shelves, the 5' and 3' UTRs, gene bodies, some enhancer regions. Covers 99% of RefSeq genes.

  • 2015: MethylationEPIC. >850,000 probes. Additional coverage of regulatory elements: 58% of FANTOM5 enhancers, 7% of distal and 27% of proximal ENCODE regulatory elements.

Measurement of methylation level

Two types of probes

  • Type I probes have two separate probe sequences per CpG site (one each for methylated and unmethylated CpGs). ~28% of probes. Suggested to be more stable and reproducible than the Type II probes

  • Type II probes have just one probe sequence per CpG site. Use half of the physical space. ~ 72% of probes. Have a decreased quantitative dynamic range compared to Type I probes.

Measurement of methylation level

Beta-value

\[\beta = \frac{M}{U + M}\]

  • \(M\) - signal from methylated probes
  • \(U\) - signal from unmethylated probes

\(\beta = 0\) - all probes are non-methylated

\(\beta = 1\) - all probes are methylated

M-value

\[\text{M-value} = \log_2 \left( \frac{M}{U} \right)\]

\(\text{M-value} = - \infty\) - all probes are non-methylated

\(\text{M-value} = + \infty\) - all probes are methylated
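
A small worked example may help connect the two scales. The intensities below are made-up numbers, and the sketch assumes a base-2 logarithm for the M-value and no offset in the denominator of the beta-value.

```r
# Toy intensities for four CpG sites (made-up numbers).
meth   <- c(50, 500, 5000, 9500)    # signal from methylated probes (M)
unmeth <- c(9500, 5000, 500, 50)    # signal from unmethylated probes (U)

beta   <- meth / (meth + unmeth)
mvalue <- log2(meth / unmeth)

# The two scales are linked by a base-2 logit transform.
all.equal(mvalue, log2(beta / (1 - beta)))

round(data.frame(beta, mvalue), 3)
```

Beta-values are bounded between 0 and 1 and are easy to read as a proportion, whereas M-values are unbounded and symmetric around 0, which is why they are often preferred for statistical testing.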

Measurement of methylation level

  • \(\beta\) values obtained from Infinium II probes are slightly less accurate and reproducible than those obtained from Infinium I probes (Dedeurwaerder et al. 2011)
  • Peak correction methods (normalization) are available

Filter questionable probes

  • Remove probes that failed to hybridize, as judged by the detection p-value
    • The detection p-value tests whether the target signal is distinguishable from background noise; a large p-value marks a failed measurement
  • Common approaches
    • Drop probes that failed in more than n% of samples
      • Common thresholds are 20%, 10%, or 5% of samples, with failure defined as a detection p-value > 0.05 or > 0.01
    • Drop samples that failed in more than n% of probes
      • Common thresholds are 50% or 20% of probes, at the same p-value cutoffs
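
The filtering rules above can be sketched in base R on a toy detection p-value matrix. All values and names here are invented for illustration, and the cutoffs (p > 0.01, 50% of probes, 10% of samples) are just one combination of the thresholds listed. In minfi, a matrix of this shape can be obtained with `detectionP()`.

```r
set.seed(1)
# Toy detection p-value matrix: 6 probes (rows) x 5 samples (columns).
detP <- matrix(runif(30, 0, 0.005), nrow = 6,
               dimnames = list(paste0("probe", 1:6), paste0("sample", 1:5)))
detP["probe2", 1:3] <- 0.2     # probe failing in three samples
detP[, "sample5"]  <- 0.3      # sample in which every probe failed

failed <- detP > 0.01          # assumed p-value cutoff

# Drop samples that failed in more than 50% of probes.
keep_samples <- colMeans(failed) <= 0.5
# Then drop probes that failed in more than 10% of the remaining samples.
keep_probes <- rowMeans(failed[, keep_samples, drop = FALSE]) <= 0.1

filtered <- detP[keep_probes, keep_samples, drop = FALSE]
dim(filtered)
```

Removing bad samples first is a design choice: a single failed array would otherwise inflate the failure rate of every probe.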

Filter questionable probes

  • Probes on X and Y chromosomes
  • Probes with lowest variation
  • Probes with extreme methylation level (e.g. median = 0% or 100%)
  • Keep only those in regions of interest (e.g. CpG islands, shores)
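
A minimal base R sketch of the variation and extreme-level filters follows. The beta matrix is made up, and the cutoffs (standard deviation above 0.01, median strictly between 0 and 1) are assumptions chosen for the toy data rather than recommended values.

```r
# Toy beta matrix: 5 probes x 8 samples (made-up values).
betas <- rbind(
  variable   = c(0.20, 0.80, 0.30, 0.70, 0.25, 0.75, 0.40, 0.60),
  flat       = c(0.500, 0.501, 0.499, 0.500, 0.501, 0.499, 0.500, 0.500),
  always_off = rep(0, 8),
  always_on  = rep(1, 8),
  moderate   = c(0.30, 0.60, 0.35, 0.55, 0.40, 0.50, 0.45, 0.50)
)

probe_sd     <- apply(betas, 1, sd)
probe_median <- apply(betas, 1, median)

# Assumed cutoffs: drop probes with near-zero spread or a median pinned at 0 or 1.
keep <- probe_sd > 0.01 & probe_median > 0 & probe_median < 1
rownames(betas)[keep]
```

Filters like these reduce the multiple-testing burden, but they should not use the phenotype of interest, otherwise the later p-values are biased.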

minfi

  • Reads Illumina’s 450k array raw data (IDAT files) into R
  • Performs QC and normalization
  • Identifies differential methylation positions (DMP)
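# biocLite() is defunct in current Bioconductor releases; BiocManager replaces it
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("minfi", "minfiData"))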
library(minfi)

Methylation data

baseDir <- system.file("extdata", package = "minfiData")
list.files(baseDir)
## [1] "5723646052"      "5723646053"      "SampleSheet.csv"
targets <- read.metharray.sheet(baseDir)
## [read.metharray.sheet] Found the following CSV files:
## [1] "/Users/mdozmorov/Library/R/3.3/library/minfiData/extdata/SampleSheet.csv"
RGset <- read.metharray.exp(targets = targets)
pd <- pData(RGset) ## phenotypic data

QC

densityPlot(RGset, sampGroups = pd$Sample_Group, main = "Beta", xlab = "Beta")

Beta values are expected to cluster around 0 or 1.

QC

par(oma=c(2,10,1,1))
densityBeanPlot(RGset, sampGroups = pd$Sample_Group, sampNames = pd$Sample_Name)

Normalization

MSet.norm <- preprocessIllumina(RGset, bg.correct = TRUE, normalize = "controls", reference = 2)

Different methods for normalization have been proposed and are still being developed

  • Dye-bias adjustment
  • Probe type I and II adjustment

Yousefi P. et al. "Considerations for normalization of DNA methylation data by Illumina 450K BeadChip assay in population studies" Epigenetics 2013 http://www.tandfonline.com/doi/abs/10.4161/epi.26037

Multi-dimensional scaling (MDS) plot

mdsPlot(MSet.norm, numPositions = 1000, sampGroups = pd$Sample_Group, sampNames = pd$Sample_Name)

Similar to PCA, useful to identify outlier samples.
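
For intuition, an MDS plot of this kind can be approximated in base R: pick the most variable positions, compute sample-to-sample Euclidean distances, and project them into two dimensions with `cmdscale()`. The simulated beta matrix and group labels below are made up, and this is a sketch of the general technique rather than a reproduction of `mdsPlot()`.

```r
set.seed(3)
# Toy beta matrix: 200 positions x 6 samples, two groups of three (made up).
group <- rep(c("A", "B"), each = 3)
betas <- matrix(runif(200 * 6, 0.4, 0.6), nrow = 200)
betas[1:50, group == "B"] <- betas[1:50, group == "B"] + 0.3  # group effect

# Keep the most variable positions, then scale sample distances to 2D.
n_positions <- 50
top <- order(apply(betas, 1, var), decreasing = TRUE)[seq_len(n_positions)]
coords <- cmdscale(dist(t(betas[top, ])), k = 2)

plot(coords, col = ifelse(group == "A", "steelblue", "tomato"), pch = 19,
     xlab = "Dimension 1", ylab = "Dimension 2")
```

A sample that sits far from the rest of its group in such a plot is a candidate outlier and worth checking against the QC plots.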

Getting M-values

## A small subset to speed up the demo:
mset <- MSet.norm[1:20000,]

## Getting the M values:
M <- getM(mset, type = "beta", betaThreshold = 0.001)

M-values are centered around 0, which corresponds to 50% methylation (beta = 0.5).

Beta values below 0.001 or above 0.999 are truncated to those limits so that the M-values stay finite.
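
The effect of the truncation can be seen in a short base R sketch. The helper `beta_to_m` is hypothetical and only mirrors the behaviour described above; it is not minfi code.

```r
# Hypothetical helper: clip beta values, then convert them to M-values.
beta_to_m <- function(beta, threshold = 0.001) {
  clipped <- pmin(pmax(beta, threshold), 1 - threshold)
  log2(clipped / (1 - clipped))
}

m <- beta_to_m(c(0, 0.5, 1))
m
```

With a threshold of 0.001 the most extreme possible M-values are about ±9.96, that is, log2(999), instead of minus or plus infinity.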

Differentially methylated positions

dmp <- dmpFinder(M, pheno=pd$Sample_Group, type="categorical")
head(dmp)
##            intercept         f         pval       qval
## cg10805483 -9.964341 1706.1212 2.053224e-06 0.02639720
## cg20386875 -5.434480 1445.1107 2.859882e-06 0.02639720
## cg07155336 -5.799521  550.9746 1.952772e-05 0.05148498
## cg13059719 -2.505878  549.6611 1.962059e-05 0.05148498
## cg08343042 -3.565042  506.2230 2.310839e-05 0.05148498
## cg23098069  1.532107  497.6219 2.390872e-05 0.05148498

Rows ordered by p-value.

Plotting methylation levels

cpgs <- rownames(dmp)[1:4]
par(mfrow=c(2,2))
plotCpg(mset, cpg=cpgs, pheno=pd$Sample_Group)

My pipeline

  1. Filtering non-specific, polymorphic, SNP, and chromosome Y probes
  2. Pre-processing and QC
    • dasen (background correction and quantile normalization)
    • BMIQ (beta-mixture quantile normalization, correcting the bias between Infinium I and II probe chemistries)
    • Principal component analysis to detect batch effects
    • ComBat, ISVA (removing batch effects)
  3. Association analysis, or differential methylation
    • betareg regression model
    • Pearson correlation coefficient
    • limma, minfi for differentially methylated regions
    • Benjamini-Hochberg adjusted p-values < 0.05
  4. Functional enrichment analyses
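
The Benjamini-Hochberg step in the pipeline above can be illustrated with base R's `p.adjust()`. The raw p-values are invented for this example; the 0.05 cutoff is the one named in the list.

```r
# Made-up raw p-values for eight positions.
p_raw <- c(0.0001, 0.0004, 0.0019, 0.0095, 0.02, 0.028, 0.03, 0.6)

# Benjamini-Hochberg adjustment as implemented in base R.
p_bh <- p.adjust(p_raw, method = "BH")

data.frame(p_raw, p_bh = round(p_bh, 4), significant = p_bh < 0.05)
```

Adjusted values are never smaller than the raw ones, and with hundreds of thousands of probes on an array the difference between the two is usually much larger than in this small example.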

Interpretation

R packages for Illumina Infinium array analysis

Morris TJ, Beck S "Analysis pipelines and packages for Infinium HumanMethylation450 BeadChip (450k) data" Methods. 2015 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4304832/

R packages for Illumina Infinium array analysis

Methylation statistics packages

References

Thank you