This tutorial shows how to reproduce the main results of the MESA paper, ‘Multimodal Epigenetic Sequencing Analysis (MESA) of Cell-free DNA for Non-invasive Colorectal Cancer Detection’.

1. Download the code from GitHub

git clone https://github.com/ChaorongC/MESA

If you want to start from the processed data, download it from https://doi.org/10.5281/zenodo.6812875 and skip to step 5.

2. Process targeted EM-seq data

# Install the required software
conda install -c bioconda trim-galore
conda install -c bioconda bsmap
conda install -c bioconda samtools
conda install -c bioconda deeptools
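
Before running the processing script, it may help to confirm that the installed tools are actually on your PATH. The sketch below is not part of the MESA repository: `check_tools` is a hypothetical helper, and the executable names in the usage comment are assumptions about what each conda package provides (for example, `bamCoverage` for deeptools).

```shell
# check_tools TOOL [TOOL ...]
# Hypothetical helper: prints each executable that is not found on PATH
# and returns 1 if at least one is missing, 0 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call (executable names are assumptions, adjust as needed):
#   check_tools trim_galore bsmap samtools bamCoverage
```

The same helper could be reused for the tools installed in the later steps.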

Then follow the commands in MESA/codes_PaperReproducibility/1.DataProcessing.sh to generate the BAM files for the next step.

3. Extract features from processed targeted EM-seq data

# Install the required software
conda install -c bioconda bedtools
conda install -c bioconda danpos
conda install -c mvdbeek ucsc_tools

Then follow the commands in MESA/codes_PaperReproducibility/2.FeaturesExtraction.sh to extract the features (cfDNA methylation, nucleosome occupancy, nucleosome fuzziness, and WPS) for each sample.

4. Process cfTAPS data and extract features

# Install the required software
conda install -c bioconda bwa
conda install -c bioconda methyldackel
conda install -c pwwang bwtool

Then follow the steps in MESA/codes_PaperReproducibility/3.cfTAPS_dataProcessing.sh to extract the features (cfDNA methylation, nucleosome occupancy, and WPS) for each sample.

5. Run MESA with the feature-by-sample matrices

# Generate feature-by-sample matrices for each modality (cfDNA methylation modality as an example)
# Sort features by feature ID for each sample
for file in *cancer_sample*_methylation.tsv; do
  # Strip the .tsv suffix (no spaces are allowed around '=' in a shell assignment)
  prefix="${file%.tsv}"
  sort -k1,1 "$file" >"${prefix}.sorted.tsv"
done
# Generate feature-by-sample matrix for cancer samples
cut -f1 cancer_sample1_methylation.sorted.tsv >cancer.siteMethyRatio.tsv
for file in cancer_sample*_methylation.sorted.tsv; do
  cut -f2 "$file" | paste cancer.siteMethyRatio.tsv - >tmp
  mv tmp cancer.siteMethyRatio.tsv
done
# Generate feature-by-sample matrix for non-cancer samples
cut -f1 non-cancer_sample1_methylation.sorted.tsv >non-cancer.siteMethyRatio.tsv
for file in non-cancer_sample*_methylation.sorted.tsv; do
  cut -f2 "$file" | paste non-cancer.siteMethyRatio.tsv - >tmp
  mv tmp non-cancer.siteMethyRatio.tsv
done
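
Because `paste` joins files line by line, the matrices above are only correct if every sorted per-sample file lists the same feature IDs in the same order. The following is a minimal sketch of a check for that assumption; `check_same_ids` is a hypothetical helper, not part of the MESA code, and it assumes the feature ID is in the first tab-separated column.

```shell
# check_same_ids FILE [FILE ...]
# Hypothetical helper: compares column 1 (feature IDs) of every file
# against column 1 of the first file. Returns 0 if all match, 1 otherwise.
check_same_ids() {
  local ref_ids file status=0
  ref_ids=$(mktemp) || return 2
  cut -f1 "$1" >"$ref_ids"
  for file in "$@"; do
    if [ ! -r "$file" ]; then
      echo "cannot read: $file" >&2
      status=1
    elif ! cut -f1 "$file" | cmp -s - "$ref_ids"; then
      echo "feature IDs differ: $file" >&2
      status=1
    fi
  done
  rm -f "$ref_ids"
  return "$status"
}

# Example call on the sorted cancer samples:
#   check_same_ids cancer_sample*_methylation.sorted.tsv
```

If the check reports a mismatch, the affected samples likely need to be restricted to a common set of features before the columns are pasted together.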

Then follow the code in MESA/codes_PaperReproducibility/4.runMESA.py to obtain predicted probabilities for both the single-modality and multimodal models.

6. Plot figures

Use the results from the previous steps to generate the figures in the MESA paper, following the code in MESA/codes_PaperReproducibility/5.PlotFigures.R.