1 Instructor names and contact information

  1. Benjamin P. Berman
  2. Tiago C. Silva

2 Workshop Description

This workshop demonstrates how to perform ELMER analysis using matched RNA-seq and DNA methylation data.

You can find detailed information about ELMER at http://bioconductor.org/packages/3.10/bioc/vignettes/ELMER/inst/doc/index.html.

Articles about ELMER:

  • Tiago C Silva, Simon G Coetzee, Nicole Gull, Lijing Yao, Dennis J Hazelett, Houtan Noushmehr, De-Chen Lin, Benjamin P Berman, ELMER v.2: an R/Bioconductor package to reconstruct gene regulatory networks from DNA methylation and transcriptome profiles, Bioinformatics, Volume 35, Issue 11, 1 June 2019, Pages 1974–1977, https://doi.org/10.1093/bioinformatics/bty902
  • Yao, Lijing, et al. “Inferring regulatory element landscapes and transcription factor networks from cancer methylomes.” Genome biology 16.1 (2015): 105. https://doi.org/10.1186/s13059-015-0668-3
  • Yao, Lijing, Benjamin P. Berman, and Peggy J. Farnham. “Demystifying the secret mission of enhancers: linking distal regulatory elements to target genes.” Critical reviews in biochemistry and molecular biology 50.6 (2015): 550-573. https://doi.org/10.3109/10409238.2015.1087961

2.1 Pre-requisites

  • Basic knowledge of R syntax
  • Familiarity with the SummarizedExperiment classes
  • Familiarity with ’omics data types including DNA methylation and gene expression
  • A machine with at least 8 GB of RAM

2.2 Workshop Participation

Participants will run an ELMER analysis on a provided MultiAssayExperiment object created from TCGA data obtained from the GDC data portal.

2.3 Goals and objectives

  • Gain familiarity with the ELMER input data, a MultiAssayExperiment object
  • Run an ELMER analysis on real data and interpret its results

3 R/Bioconductor packages used

library("ELMER")
library("MultiAssayExperiment")
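
If either package is missing, the library() calls above stop with an error. The following is a minimal sketch, assuming the usual Bioconductor installation route through BiocManager (the workshop itself does not say how the packages were installed). It only reports which packages are missing; the installation lines are left commented out, and the variable names are illustrative.

```r
# Packages this workshop loads.
pkgs <- c("ELMER", "MultiAssayExperiment")

# requireNamespace() returns TRUE when a package can be loaded,
# without attaching it to the search path.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!installed]

if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
  # Assumption: installation through BiocManager; uncomment to install.
  # if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
  # BiocManager::install(not_installed)
} else {
  message("All workshop packages are installed.")
}
```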

4 Retrieving the data

The RNA-seq data, DNA methylation data, and patient metadata (e.g., age, gender) used in this workshop are stored in a MultiAssayExperiment object. You can read more about MultiAssayExperiment objects at http://bioconductor.org/packages/MultiAssayExperiment/ and https://bioconductor.github.io/BiocWorkshops/workflow-for-multi-omics-analysis-with-multiassayexperiment.html.

In the following subsections we will:

  1. Load the data
  2. Verify the RNA-seq data
  3. Verify the DNA methylation data
  4. Verify the sample metadata

4.1 Download the data

The data is available in this Google Drive folder.

4.2 Loading the data

mae <- readRDS("Data/TCGA_ESCA_MAE_distal_regions.rds")
mae
## A MultiAssayExperiment object of 2 listed
##  experiments with user-defined names and respective classes. 
##  Containing an ExperimentList class object of length 2: 
##  [1] DNA methylation: RangedSummarizedExperiment with 148951 rows and 171 columns 
##  [2] Gene expression: RangedSummarizedExperiment with 56830 rows and 171 columns 
## Features: 
##  experiments() - obtain the ExperimentList instance 
##  colData() - the primary/phenotype DataFrame 
##  sampleMap() - the sample availability DataFrame 
##  `$`, `[`, `[[` - extract colData columns, subset, or experiment 
##  *Format() - convert into a long or wide DataFrame 
##  assays() - convert ExperimentList to a SimpleList of matrices
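
readRDS() fails with a fairly terse error when the file has not been downloaded to the expected location. As a sketch only, the hypothetical helper below (load_mae() is not part of ELMER or MultiAssayExperiment) checks that the file exists and warns if the loaded object is not a MultiAssayExperiment. The path is the one used above.

```r
# Hypothetical helper: load an .rds file and check what it contains.
load_mae <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path,
         ". Download it first and check the working directory with getwd().")
  }
  obj <- readRDS(path)
  if (!inherits(obj, "MultiAssayExperiment")) {
    warning("Loaded object has class '", class(obj)[1],
            "', not 'MultiAssayExperiment'.")
  }
  obj
}

# Path used in this workshop; the object is loaded only if the file is present.
data_file <- "Data/TCGA_ESCA_MAE_distal_regions.rds"
if (file.exists(data_file)) {
  mae <- load_mae(data_file)
} else {
  message("Skipping load, file not found: ", data_file)
}
```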

4.3 Verify the RNA-seq data

# Check the experiments
experiments(mae)
## ExperimentList class object of length 2: 
##  [1] DNA methylation: RangedSummarizedExperiment with 148951 rows and 171 columns 
##  [2] Gene expression: RangedSummarizedExperiment with 56830 rows and 171 columns
# Get the Gene expression object
rna.seq <- mae[[2]]

# Number of genes
nrow(rna.seq)
## [1] 56830
# Number of samples
ncol(rna.seq)
## [1] 171
# Check the gene metadata
rowRanges(rna.seq)
## GRanges object with 56830 ranges and 3 metadata columns:
##                   seqnames              ranges strand | ensembl_gene_id
##                      <Rle>           <IRanges>  <Rle> |     <character>
##   ENSG00000000003     chrX 100627109-100639991      - | ENSG00000000003
##   ENSG00000000005     chrX 100584802-100599885      + | ENSG00000000005
##   ENSG00000000419    chr20   50934867-50958555      - | ENSG00000000419
##   ENSG00000000457     chr1 169849631-169894267      - | ENSG00000000457
##   ENSG00000000460     chr1 169662007-169854080      + | ENSG00000000460
##               ...      ...                 ...    ... .             ...
##   ENSG00000281904     chr2   90365737-90367699      + | ENSG00000281904
##   ENSG00000281909    chr15   22480439-22484840      - | ENSG00000281909
##   ENSG00000281910    chr16   58559796-58559931      - | ENSG00000281910
##   ENSG00000281912     chr1   45303910-45305619      + | ENSG00000281912
##   ENSG00000281920     chr2   65623272-65628424      + | ENSG00000281920
##                   external_gene_name original_ensembl_gene_id
##                          <character>              <character>
##   ENSG00000000003             TSPAN6       ENSG00000000003.13
##   ENSG00000000005               TNMD        ENSG00000000005.5
##   ENSG00000000419               DPM1       ENSG00000000419.11
##   ENSG00000000457              SCYL3       ENSG00000000457.12
##   ENSG00000000460           C1orf112       ENSG00000000460.15
##               ...                ...                      ...
##   ENSG00000281904         AC233263.6        ENSG00000281904.1
##   ENSG00000281909            HERC2P7        ENSG00000281909.1
##   ENSG00000281910           SNORA50A        ENSG00000281910.1
##   ENSG00000281912          LINC01144        ENSG00000281912.1
##   ENSG00000281920         AC007389.5        ENSG00000281920.1
##   -------
##   seqinfo: 24 sequences from an unspecified genome; no seqlengths
# Check the gene expression values (first 4 genes, first 4 samples)
assay(rna.seq)[1:4,1:4]
##                 TCGA-2H-A9GF-01A-11R-A37I-31 TCGA-2H-A9GG-01A-11R-A37I-31
## ENSG00000000003                     17.63165                    16.758267
## ENSG00000000005                      0.00000                     7.721927
## ENSG00000000419                     19.33125                    19.188030
## ENSG00000000457                     15.87565                    16.251205
##                 TCGA-2H-A9GH-01A-11R-A37I-31 TCGA-2H-A9GI-01A-11R-A37I-31
## ENSG00000000003                    17.624508                     17.89604
## ENSG00000000005                     9.236257                      0.00000
## ENSG00000000419                    19.765411                     20.90188
## ENSG00000000457                    15.745243                     15.92405
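
The column names of the expression matrix are TCGA barcodes, which encode the patient and the sample type. The sketch below uses base R only and the four barcodes printed above. It relies on the standard TCGA barcode layout, where characters 1 to 12 identify the patient and characters 14 to 15 hold the sample type code. Only two codes are mapped here, and the variable names are illustrative.

```r
# Column names copied from the expression matrix shown above.
barcodes <- c(
  "TCGA-2H-A9GF-01A-11R-A37I-31",
  "TCGA-2H-A9GG-01A-11R-A37I-31",
  "TCGA-2H-A9GH-01A-11R-A37I-31",
  "TCGA-2H-A9GI-01A-11R-A37I-31"
)

# Characters 1-12 identify the patient; characters 14-15 hold the sample type code.
sample_info <- data.frame(
  barcode = barcodes,
  patient = substr(barcodes, 1, 12),
  sample_type_code = substr(barcodes, 14, 15),
  stringsAsFactors = FALSE
)

# TCGA code "01" is a primary solid tumor and "11" is solid tissue normal.
# Only these two codes are mapped in this sketch; any other code becomes NA.
type_labels <- c("01" = "Primary Solid Tumor", "11" = "Solid Tissue Normal")
sample_info$sample_type <- unname(type_labels[sample_info$sample_type_code])

print(sample_info[, c("patient", "sample_type_code", "sample_type")])
```

On the real object, colnames(rna.seq) would take the place of the hard-coded vector.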

4.4 Verify the DNA methylation data

# Check the experiments
experiments(mae)
## ExperimentList class object of length 2: 
##  [1] DNA methylation: RangedSummarizedExperiment with 148951 rows and 171 columns 
##  [2] Gene expression: RangedSummarizedExperiment with 56830 rows and 171 columns
# Get the DNA methylation object
dna.met <- mae[[1]]

# Number of DNA methylation probes
nrow(dna.met)
## [1] 148951
# Number of samples
ncol(dna.met)
## [1] 171
# Check the metadata of the DNA methylation probes (first 4 columns)
rowRanges(dna.met)[,1:4]
## GRanges object with 148951 ranges and 4 metadata columns:
##              seqnames              ranges strand | address_A address_B
##                 <Rle>           <IRanges>  <Rle> | <integer> <integer>
##   cg08258224     chr1       864703-864704      - |  23712478      <NA>
##   cg13938959     chr1       898803-898804      + |  38601418      <NA>
##   cg12445832     chr1       898915-898916      - |  70726486      <NA>
##   cg23999112     chr1       898976-898977      - |  49747457      <NA>
##   cg11527153     chr1       902156-902157      + |  47662324      <NA>
##          ...      ...                 ...    ... .       ...       ...
##   cg16428758     chrX 155539968-155539969      - |  28623392      <NA>
##   cg05682970     chrX 155609878-155609879      - |  34723345      <NA>
##   cg07211220     chrX 155616254-155616255      + |  33714463      <NA>
##   cg25059696     chrY     7563149-7563150      + |  61613414      <NA>
##   cg13851368     chrY   11954167-11954168      + |  36601306  32634316
##                  channel  designType
##              <character> <character>
##   cg08258224        Both          II
##   cg13938959        Both          II
##   cg12445832        Both          II
##   cg23999112        Both          II
##   cg11527153        Both          II
##          ...         ...         ...
##   cg16428758        Both          II
##   cg05682970        Both          II
##   cg07211220        Both          II
##   cg25059696        Both          II
##   cg13851368         Grn           I
##   -------
##   seqinfo: 26 sequences from an unspecified genome; no seqlengths
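
Beyond the probe annotation, it can help to check the methylation values themselves. The sketch below assumes the methylation assay stores beta values, which lie between 0 and 1; the workshop output above does not show them. The helper summarize_beta() and the toy matrix are hypothetical. Only the probe names come from the output above, and the numbers are made up for illustration.

```r
# Hypothetical helper: summarize a matrix of beta values
# (probes in rows, samples in columns).
summarize_beta <- function(beta) {
  stopifnot(is.matrix(beta), is.numeric(beta))
  observed <- beta[!is.na(beta)]
  list(
    n_probes = nrow(beta),
    n_samples = ncol(beta),
    fraction_missing = mean(is.na(beta)),
    all_in_unit_interval = all(observed >= 0 & observed <= 1)
  )
}

# Toy values, NOT taken from the workshop data.
toy_beta <- matrix(
  c(0.91, 0.12, NA, 0.55,
    0.88, 0.10, 0.47, 0.60),
  nrow = 4,
  dimnames = list(
    c("cg08258224", "cg13938959", "cg12445832", "cg23999112"),
    c("sample_1", "sample_2")
  )
)

beta_summary <- summarize_beta(toy_beta)
str(beta_summary)
```

With the workshop data loaded, the same helper could be applied as summarize_beta(assay(dna.met)).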