So methpipe has finally finished the Day 10 samples and I figured I’d run them through the RADMeth model to see if anything interesting happens. RADMeth is different than MACAU as it uses a beta-bionmial GLM as opposed to the mixed-effects model of MACAU.

Beta-binomial still lives in proportion space, as opposed to count space like the MACAU model, so we lose some information there, but it’ll be interesting to see what the differences are.

The RADMeth documentation suggests that RADMeth be run on a computing cluster with “a few hundred” available nodes then offers that it can also be run on a personal workstation but the analysis will take “sigificantly” longer. Since I’m fresh out of computing clusters, personal laptop it is!

I’ll probably upload this prior to starting the actual RADMeth step, in case it either explodes mid-process or I get tired of waiting forever for it to finish. This avoids the problem that I’m having with the previous steps, the notebook’s been running off and on for a week and a half now, but hasn’t been uploaded because it’s not finished.

I’ll recap what’s happened in that notebook here. It’s all from the methpipe manual if you’re interested in the logic behind why.

  1. Copied the .bam files produced by the Bismark aligner.
  2. Converted them to .mr files (unique format for MethPipe) using the to-mr command.
  3. Sorted them using GNUSort
  4. Used duplicate-remover to remove PCR duplicated reads
  5. Used methcounts to get methylation counts by loci. This step takes 8-10 hours per sample.

That brings us to here!

library(readr)
setwd("~/Documents/RobertsLab/methpipe/Day10")
list.files(pattern = "*.meth")
 [1] "EPI-103.meth" "EPI-104.meth" "EPI-111.meth" "EPI-113.meth" "EPI-119.meth" "EPI-120.meth"
 [7] "EPI-127.meth" "EPI-128.meth" "EPI-135.meth" "EPI-136.meth" "EPI-143.meth" "EPI-145.meth"
system("head EPI-103.meth")
scaffold100 1   +   CXG 0   0
scaffold100 3   -   CXG 0   0
scaffold100 5   +   CHH 0   0
scaffold100 7   +   CpG 0   0
scaffold100 8   -   CpG 0   0
scaffold100 10  -   CHH 0   0
scaffold100 14  -   CHH 0   0
scaffold100 15  -   CHH 0   0
scaffold100 18  +   CXG 0   0
scaffold100 20  -   CXG 0   0

Well, that looks like some methylation count data with scaffold location, strand orientation, context, and what I assume are methylated and total coverage counts. Frustrating that I technically already had this data via the methylextract stuff I did previously, but methpipe wants it contorted it’s on specific way, so I get to re-do all the work.

The documentation says that RADMeth needs two input files, a proportion table with location and count files by sample. Something like the table below.

EPI-103 EPI-104 EPI-111 EPI-113
scaffold100:13 3 5 1 6 1 4 4 9
scaffold100:17 4 7 4 8 5 11 7 16
scaffold100:101 0 3 3 10 13 17 15 23

The second file it wants is a design matrix which specifies which samples belong to which treatment groups similar to the table below.

base trt
EPI-103 1 2
EPI-104 1 2
EPI-111 1 1
EPI-113 1 1

Lets start by making the proportion table. Thankfully methpipe gives us a comand to do this automatically from the .meth files via merge-methcounts. I’m going to take this opportunity to construct my proportion table such that it goes Ambient -> Low -> Super Low treatments in order, just for readability.

system("merge-methcounts EPI-119.meth EPI-120.meth EPI-135.meth EPI-136.meth EPI-103.meth EPI-104.meth EPI-127.meth EPI-128.meth EPI-111.meth EPI-113.meth EPI-143.meth EPI-145.meth -o Day10-merged.meth")

Thats done, lets see what it looks like.

system("head Day10-merged.meth")
scaffold100 1   +   CXG 0   0
scaffold100 3   -   CXG 0   6
scaffold100 5   +   CHH 0   0
scaffold100 7   +   CpG 0   1
scaffold100 8   -   CpG 0   6
scaffold100 10  -   CHH 0   6
scaffold100 14  -   CHH 0   6
scaffold100 15  -   CHH 0   6
scaffold100 18  +   CXG 0   1
scaffold100 20  -   CXG 0   6

Well… thats not right. Ah. Apparently that code is for the technical replicate combination. So it assumed that the different EPI-XXX files were re-runs of the same sample. Oops. Have to use a -t flag to say make the proportion table

Once more, this time with feeling!

system("merge-methcounts -t EPI-119.meth EPI-120.meth EPI-135.meth EPI-136.meth EPI-103.meth EPI-104.meth EPI-127.meth EPI-128.meth EPI-111.meth EPI-113.meth EPI-143.meth EPI-145.meth > Day10-merged.meth")

Lets see what this one looks like.

system("head Day10-merged.meth")
EPI-119 EPI-120 EPI-135 EPI-136 EPI-103 EPI-104 EPI-127 EPI-128 EPI-111 EPI-113 EPI-143 EPI-145 
scaffold100:1:+:CXG 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
scaffold100:3:-:CXG 1   0   1   0   0   0   1   0   0   0   0   0   1   0   0   0   0   0   2   0   0   0   0   0   
scaffold100:5:+:CHH 0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   
scaffold100:7:+:CpG 0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   
scaffold100:8:-:CpG 1   0   1   0   0   0   1   0   0   0   0   0   1   0   0   0   0   0   2   0   0   0   0   0   
scaffold100:10:-:CHH    1   0   1   0   0   0   1   0   0   0   0   0   1   0   0   0   0   0   2   0   0   0   0   0   
scaffold100:14:-:CHH    1   0   1   0   0   0   1   0   0   0   0   0   1   0   0   0   0   0   2   0   0   0   0   0   
scaffold100:15:-:CHH    1   0   1   0   0   0   1   0   0   0   0   0   1   0   0   0   0   0   2   0   0   0   0   0   
scaffold100:18:+:CXG    0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   

Much better!

I made the Design Matrix in excel real quick and saved it as a .csv file.

system("cat designmatrix.csv")
,base,trt
EPI-119,1,0
EPI-120,1,0
EPI-135,1,0
EPI-136,1,0
EPI-103,1,1
EPI-104,1,1
EPI-127,1,1
EPI-128,1,1
EPI-111,1,2
EPI-113,1,2
EPI-143,1,2
EPI-145,1,2

Hm, for some reason R doesn’t want to display the cat correctly, so you’re stuck with trusting me that it’s correct.

Now we get to start with the actual RADMeth part.

There are three main parts to a RADMeth run apparently. First, we do the regression, then we adjust p-values based on neighboring p-values, and finally we combine neighbors to form DMRs. I’m not 100% sure how this will work with our data, as we technically only have differentially methylated loci, but we’ll see.

First, the regression. RADMeth needs to be told what factor it’s looking at, in our case it’s called trt.

system("radmeth regression -factor trt designmatrix.csv Day10-merged.meth > cpgs.bed")
Error: trt is not a part of the design specification.

Hm. Maybe it doesn’t like the commas in the design matrix? I’ll change it over to a tab delimited file.

system("radmeth regression -factor trt designmatrix.txt Day10-merged.meth > cpgs.bed")
EPI-119 EPI-120 EPI-135 EPI-136 EPI-103 EPI-104 EPI-127 EPI-128 EPI-111 EPI-113 EPI-143 EPI-145  does not match factor names or their order in the design matrix. Please verify that the design matrix and the proportion table are correctly formatted.

Hmm. It looks like merge-methcounts forgot to make the first column name blank, so all of my column names are shifted by one and the final column has no name. Guess I’ll have to fix that.

After some googling and some help from Sam, I found that ( printf ‘’; cat Day10-merged.meth ) > Day10-merged2.meth

works to add that tab. Lets try it again.


system("radmeth regression -factor trt designmatrix.txt Day10-merged2.meth > cpgs.bed")

Well, that didn’t work either, so next i tried changing the delimiting character from tab (/t) to just whitespace using tr. The command was:

cat Day10-merged.meth | tr “\t” " " > Day10-merged2.meth

system("head Day10-merged2.meth")
EPI-119 EPI-120 EPI-135 EPI-136 EPI-103 EPI-104 EPI-127 EPI-128 EPI-111 EPI-113 EPI-143 EPI-145 
scaffold100:1:+:CXG 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
scaffold100:3:-:CXG 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 2 0 0 0 0 0 
scaffold100:5:+:CHH 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
scaffold100:7:+:CpG 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 
scaffold100:8:-:CpG 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 2 0 0 0 0 0 
scaffold100:10:-:CHH 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 2 0 0 0 0 0 
scaffold100:14:-:CHH 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 2 0 0 0 0 0 
scaffold100:15:-:CHH 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 2 0 0 0 0 0 
scaffold100:18:+:CXG 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 

That looks better? Maybe? Once more unto the breach.


system("radmeth regression -factor trt designmatrix2.txt Day10-merged2.meth > cpgs.bed")
---
title: "Day 10 RADMeth"
output: html_notebook
---

So methpipe has finally finished the Day 10 samples and I figured I'd run them through the RADMeth model to see if anything interesting happens. RADMeth is different than MACAU as it uses a beta-bionmial GLM as opposed to the mixed-effects model of MACAU. 

Beta-binomial still lives in proportion space, as opposed to count space like the MACAU model, so we lose some information there, but it'll be interesting to see what the differences are. 

The RADMeth documentation suggests that RADMeth be run on a computing cluster with "a few hundred" available nodes then offers that it can also be run on a personal workstation but the analysis will take "sigificantly" longer. Since I'm fresh out of computing clusters, personal laptop it is!

I'll probably upload this prior to starting the actual RADMeth step, in case it either explodes mid-process or I get tired of waiting forever for it to finish. This avoids the problem that I'm having with the previous steps, the notebook's been running off and on for a week and a half now, but hasn't been uploaded because it's not finished.

I'll recap what's happened in that notebook here. It's all from the methpipe manual if you're interested in the logic behind why. 

1. Copied the .bam files produced by the Bismark aligner.   
2. Converted them to .mr files (unique format for MethPipe) using the to-mr command.  
3. Sorted them using GNUSort
4. Used duplicate-remover to remove PCR duplicated reads
5. Used methcounts to get methylation counts by loci. This step takes 8-10 hours per sample. 

That brings us to here! 



```{r}

library(readr)
setwd("~/Documents/RobertsLab/methpipe/Day10")

list.files(pattern = "*.meth")

```


```{r}

system("head EPI-103.meth")

```

Well, that looks like some methylation count data with scaffold location, strand orientation, context, and what I assume are methylated and total coverage counts. Frustrating that I technically already had this data via the methylextract stuff I did previously, but methpipe wants it contorted it's on specific way, so I get to re-do all the work. 

The documentation says that RADMeth needs two input files, a proportion table with location and count files by sample. Something like the table below. 


|                 | EPI-103 |   | EPI-104 |    | EPI-111 |    | EPI-113 |    |
|-----------------|--------:|--:|--------:|---:|--------:|---:|--------:|---:|
| scaffold100:13  |       3 | 5 |       1 | 6  |       1 | 4  |       4 | 9  |
| scaffold100:17  |       4 | 7 |       4 | 8  |       5 | 11 |       7 | 16 |
| scaffold100:101 |       0 | 3 |       3 | 10 |      13 | 17 |      15 | 23 |

The second file it wants is a design matrix which specifies which samples belong to which treatment groups similar to the table below.

|         | base | trt |
|---------|-----:|----:|
| EPI-103 |    1 |   2 |
| EPI-104 |    1 |   2 |
| EPI-111 |    1 |   1 |
| EPI-113 |    1 |   1 |


Lets start by making the proportion table. Thankfully methpipe gives us a comand to do this automatically from the .meth files via merge-methcounts. I'm going to take this opportunity to construct my proportion table such that it goes Ambient -> Low -> Super Low treatments in order, just for readability.

```{r}

system("merge-methcounts EPI-119.meth EPI-120.meth EPI-135.meth EPI-136.meth EPI-103.meth EPI-104.meth EPI-127.meth EPI-128.meth EPI-111.meth EPI-113.meth EPI-143.meth EPI-145.meth -o Day10-merged.meth")

```

Thats done, lets see what it looks like.

```{r}

system("head Day10-merged.meth")

```

Well... thats not right. Ah. Apparently that code is for the technical replicate combination. So it assumed that the different EPI-XXX files were re-runs of the same sample. Oops. Have to use a -t flag to say make the proportion table

Once more, this time with feeling!

```{r}

system("merge-methcounts -t EPI-119.meth EPI-120.meth EPI-135.meth EPI-136.meth EPI-103.meth EPI-104.meth EPI-127.meth EPI-128.meth EPI-111.meth EPI-113.meth EPI-143.meth EPI-145.meth > Day10-merged.meth")

```

Lets see what this one looks like.

```{r}

system("head Day10-merged.meth")

```

Much better!

I made the Design Matrix in excel real quick and saved it as a .csv file. 

```{r}

system("cat designmatrix.csv")

```

Hm, for some reason R doesn't want to display the cat correctly, so you're stuck with trusting me that it's correct.

Now we get to start with the actual RADMeth part.

There are three main parts to a RADMeth run apparently. First, we do the regression, then we adjust p-values based on neighboring p-values, and finally we combine neighbors to form DMRs. I'm not 100% sure how this will work with our data, as we technically only have differentially methylated loci, but we'll see.

First, the regression. RADMeth needs to be told what factor it's looking at, in our case it's called trt. 


```{r}

system("radmeth regression -factor trt designmatrix.csv Day10-merged.meth > cpgs.bed")

```

Hm. Maybe it doesn't like the commas in the design matrix? I'll change it over to a tab delimited file. 

```{r}

system("radmeth regression -factor trt designmatrix.txt Day10-merged.meth > cpgs.bed")

```


Hmm. It looks like merge-methcounts forgot to make the first column name blank, so all of my column names are shifted by one and the final column has no name. Guess I'll have to fix that.

After some googling and some help from Sam, I found that 
( printf '\t'; cat Day10-merged.meth ) > Day10-merged2.meth 

works to add that tab. Lets try it again.


```{r}

system("radmeth regression -factor trt designmatrix.txt Day10-merged2.meth > cpgs.bed")

```

Well, that didn't work either, so next i tried changing the delimiting character from tab (/t) to just whitespace using tr. The command was:

cat Day10-merged.meth | tr "\\t" " " > Day10-merged2.meth

```{r}

system("head Day10-merged2.meth")

```


That looks better? Maybe? Once more unto the breach.

```{r}

system("radmeth regression -factor trt designmatrix2.txt Day10-merged2.meth > cpgs.bed")

```