We’ve been using Methyl Extract for our SNP and Methylation status calling, and it seems to be working well, except the methylation calling output is not that great. We plan to use a modeling software called MACAU, which expects two count files, one a matrix of methylation counts by location and sample, the second a matrix of total reads by location and sample. Methyl Extract provides this in a single matrix, but both counts are split by which strand they came from.

The goal of this notebook is to figure out how to combine the different samples in two ways. First, We need all of the samples to be in a single file, and second, we need the +/- strands summed so we have total counts by sample. This would seem easy at first glance, but there’s the additional problem that each sample may not have had a read happen at each position in each scaffold, so it’s likely there will be dissimilar numbers of rows per sample file.

write.csv(test.meth5, file = "methylation", sep = " ")
attempt to set 'sep' ignored
colnames(test.total7) <- c("loc", "EPI-104", "EPI-111", "EPI-113", "EPI-119")
Error in colnames(test.total7) <- c("loc", "EPI-104", "EPI-111", "EPI-113",  : 
  object 'test.total7' not found
LS0tCnRpdGxlOiAiTWVyZ2luZyBNZXRoeWxhdGlvbiBDb3VudCBmaWxlcyBieSBzYW1wbGUuIgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpXZSd2ZSBiZWVuIHVzaW5nIE1ldGh5bCBFeHRyYWN0IGZvciBvdXIgU05QIGFuZCBNZXRoeWxhdGlvbiBzdGF0dXMgY2FsbGluZywgYW5kIGl0IHNlZW1zIHRvIGJlIHdvcmtpbmcgd2VsbCwgZXhjZXB0IHRoZSBtZXRoeWxhdGlvbiBjYWxsaW5nIG91dHB1dCBpcyBub3QgdGhhdCBncmVhdC4gV2UgcGxhbiB0byB1c2UgYSBtb2RlbGluZyBzb2Z0d2FyZSBjYWxsZWQgTUFDQVUsIHdoaWNoIGV4cGVjdHMgdHdvIGNvdW50IGZpbGVzLCBvbmUgYSBtYXRyaXggb2YgbWV0aHlsYXRpb24gY291bnRzIGJ5IGxvY2F0aW9uIGFuZCBzYW1wbGUsIHRoZSBzZWNvbmQgYSBtYXRyaXggb2YgdG90YWwgcmVhZHMgYnkgbG9jYXRpb24gYW5kIHNhbXBsZS4gTWV0aHlsIEV4dHJhY3QgcHJvdmlkZXMgdGhpcyBpbiBhIHNpbmdsZSBtYXRyaXgsIGJ1dCBib3RoIGNvdW50cyBhcmUgc3BsaXQgYnkgd2hpY2ggc3RyYW5kIHRoZXkgY2FtZSBmcm9tLgoKVGhlIGdvYWwgb2YgdGhpcyBub3RlYm9vayBpcyB0byBmaWd1cmUgb3V0IGhvdyB0byBjb21iaW5lIHRoZSBkaWZmZXJlbnQgc2FtcGxlcyBpbiB0d28gd2F5cy4gRmlyc3QsIFdlIG5lZWQgYWxsIG9mIHRoZSBzYW1wbGVzIHRvIGJlIGluIGEgc2luZ2xlIGZpbGUsIGFuZCBzZWNvbmQsIHdlIG5lZWQgdGhlICsvLSBzdHJhbmRzIHN1bW1lZCBzbyB3ZSBoYXZlIHRvdGFsIGNvdW50cyBieSBzYW1wbGUuIFRoaXMgd291bGQgc2VlbSBlYXN5IGF0IGZpcnN0IGdsYW5jZSwgYnV0IHRoZXJlJ3MgdGhlIGFkZGl0aW9uYWwgcHJvYmxlbSB0aGF0IGVhY2ggc2FtcGxlIG1heSBub3QgaGF2ZSBoYWQgYSByZWFkIGhhcHBlbiBhdCBlYWNoIHBvc2l0aW9uIGluIGVhY2ggc2NhZmZvbGQsIHNvIGl0J3MgbGlrZWx5IHRoZXJlIHdpbGwgYmUgZGlzc2ltaWxhciBudW1iZXJzIG9mIHJvd3MgcGVyIHNhbXBsZSBmaWxlLgoKCmBgYHtyfQpsaWJyYXJ5KHJlYWRyKQoKIyMgUmVhZHMgaW4gc29tZSBvZiB0aGUgc2FtcGxlIGZpbGVzLCBpdCBhbHNvIGNoYW5nZXMgYW55ICIuIiBjaGFyYWN0ZXJzIHRvIE5Bcy4KRVBJXzExMV9DRyA8LSByZWFkX2RlbGltKCJ+L0RvY3VtZW50cy9Sb2JlcnRzIExhYi9Ib2xsaWUtR2VvZHVjay9NZXJnZS10ZXN0L1ZDRi1UZXN0L0VQSS0xMTEtQ0cub3V0cHV0IiwgIlx0IiwgZXNjYXBlX2RvdWJsZSA9IEZBTFNFLCB0cmltX3dzID0gVFJVRSwgbmEgPSAiLiIpCgpFUElfMTEzX0NHIDwtIHJlYWRfZGVsaW0oIn4vRG9jdW1lbnRzL1JvYmVydHMgTGFiL0hvbGxpZS1HZW9kdWNrL01lcmdlLXRlc3QvVkNGLVRlc3QvRVBJLTExMy1DRy5vdXRwdXQiLCAiXHQiLCBlc2NhcGVfZG91YmxlID0gRkFMU0UsIHRyaW1fd3MgPSBUUlVFLCBuYSA9ICIuIikKCkVQSV8xMDRfQ0cgPC0gcmVhZF9kZWxpbSgifi9Eb2N1bWVudHMvUm9iZXJ0cyBMYWIvSG9sbGllLUdlb2R1Y2svTWVyZ2UtdGVzdC9WQ0YtVGVzdC9FUEktMTA0LUNHLm91dHB1dCIsICJcdCIsIGVzY2FwZV9kb3VibGUgPSBGQUxTRSwgdHJpbV93cyA9IFRSVUUsIG5hID0gIi4iKQoKRVBJXzExOV9DRyA8LSByZWFkX2RlbGltKCJ+L0RvY3VtZW50cy9Sb2JlcnRzIExhYi9Ib2xsaWUtR2VvZHVjay9NZXJnZS10ZXN0L1ZDRi1UZXN0L0VQSS0xMTktQ0cub3V0cHV0IiwgIlx0IiwgZXNjYXBlX2RvdWJsZSA9IEZBTFNFLCB0cmltX3dzID0gVFJVRSwgbmEgPSAiLiIpCgoKIyMgU3VtcyB0aGUgbWV0aCBhbmQgY292ZXJhZ2UgY29sdW1ucyBiZXR3ZWVuIGJvdGggc3RyYW5kcywgc2luY2UgTWV0aEV4dHJhY3Qgb3V0cHV0cyB0aGVtIHNlcGFyYXRlCkVQSV8xMDRfQ0dbLDEwXSA8LSByb3dTdW1zKGNiaW5kKEVQSV8xMDRfQ0ckYFdhdHNvbiBNRVRIYCwgRVBJXzEwNF9DRyRgQ3JpY2sgTUVUSGApLCBuYS5ybSA9IFRSVUUpCkVQSV8xMDRfQ0dbLDExXSA8LSByb3dTdW1zKGNiaW5kKEVQSV8xMDRfQ0ckYFdhdHNvbiBDT1ZFUkFHRWAsIEVQSV8xMDRfQ0ckYENyaWNrIENPVkVSQUdFYCksIG5hLnJtID0gVFJVRSkKRVBJXzEwNF9DR1ssMTJdIDwtICJFUElfMTA0IgoKRVBJXzExMV9DR1ssMTBdIDwtIHJvd1N1bXMoY2JpbmQoRVBJXzExMV9DRyRgV2F0c29uIE1FVEhgLCBFUElfMTExX0NHJGBDcmljayBNRVRIYCksIG5hLnJtID0gVFJVRSkKRVBJXzExMV9DR1ssMTFdIDwtIHJvd1N1bXMoY2JpbmQoRVBJXzExMV9DRyRgV2F0c29uIENPVkVSQUdFYCwgRVBJXzExMV9DRyRgQ3JpY2sgQ09WRVJBR0VgKSwgbmEucm0gPSBUUlVFKQpFUElfMTExX0NHWywxMl0gPC0gIkVQSV8xMTEiCgpFUElfMTEzX0NHWywxMF0gPC0gcm93U3VtcyhjYmluZChFUElfMTEzX0NHJGBXYXRzb24gTUVUSGAsIEVQSV8xMTNfQ0ckYENyaWNrIE1FVEhgKSwgbmEucm0gPSBUUlVFKQpFUElfMTEzX0NHWywxMV0gPC0gcm93U3VtcyhjYmluZChFUElfMTEzX0NHJGBXYXRzb24gQ09WRVJBR0VgLCBFUElfMTEzX0NHJGBDcmljayBDT1ZFUkFHRWApLCBuYS5ybSA9IFRSVUUpCkVQSV8xMTNfQ0dbLDEyXSA8LSAiRVBJXzExMyIKCkVQSV8xMTlfQ0dbLDEwXSA8LSByb3dTdW1zKGNiaW5kKEVQSV8xMTlfQ0ckYFdhdHNvbiBNRVRIYCwgRVBJXzExOV9DRyRgQ3JpY2sgTUVUSGApLCBuYS5ybSA9IFRSVUUpCkVQSV8xMTlfQ0dbLDExXSA8LSByb3dTdW1zKGNiaW5kKEVQSV8xMTlfQ0ckYFdhdHNvbiBDT1ZFUkFHRWAsIEVQSV8xMTlfQ0ckYENyaWNrIENPVkVSQUdFYCksIG5hLnJtID0gVFJVRSkKRVBJXzExOV9DR1ssMTJdIDwtICJFUElfMTE5IgoKCiMjIE5hbWVzIHRoZSBjb2x1bW5zCmNvbG5hbWVzKEVQSV8xMDRfQ0cpWzEwOjEyXSA8LSBjKCJUb3RhbE1ldGgxMDQiLCAiVG90YWxDb3YxMDQiLCAiU2FtcGxlTmFtZSIpCmNvbG5hbWVzKEVQSV8xMTFfQ0cpWzEwOjEyXSA8LSBjKCJUb3RhbE1ldGgxMTEiLCAiVG90YWxDb3YxMTEiLCAiU2FtcGxlTmFtZSIpCmNvbG5hbWVzKEVQSV8xMTNfQ0cpWzEwOjEyXSA8LSBjKCJUb3RhbE1ldGgxMTMiLCAiVG90YWxDb3YxMTMiLCAiU2FtcGxlTmFtZSIpCmNvbG5hbWVzKEVQSV8xMTlfQ0cpWzEwOjEyXSA8LSBjKCJUb3RhbE1ldGgxMTkiLCAiVG90YWxDb3YxMTkiLCAiU2FtcGxlTmFtZSIpCgoKCgoKIyMgTWVyZ2VzIHRoZSBkaWZmZXJlbnQgc2FtcGxlcyBpbnRvIGEgc2luZ2xlIGZpbGUuIHRoZSBtZXJnZSgpIGZ1bmN0aW9uIGluIFIgb25seSBhbGxvd3MgdHdvIGRhdGEgZnJhbWVzIHRvIGJlIG1lcmdlZCBhdCBhIHRpbWUsIHNvIHRoZXkKIyMgaGF2ZSB0byBiZSBkb25lIHNlcXVlbnRpYWxseS4gQSBtb3JlIGVmZmljaWVudCBzb2x0dWlvbiB3aWxsIHByb2JhYmx5IG5lZWQgdG8gYmUgZm91bmQgcHJpb3IgdG8gYW5hbHl6aW5nIHRoZSBmdWxsIHNhbXBsZSBzZXQKdGVzdC5tZXRoIDwtIG1lcmdlKEVQSV8xMDRfQ0dbLGMoMSwgMiwgMywgMTApXSwgRVBJXzExMV9DR1ssYygxLCAyLCAzLCAxMCldLCBieSA9IG5hbWVzKEVQSV8xMTFfQ0dbLGMoMSwyLDMpXSksIGFsbCA9IFQpCgp0ZXN0Lm1ldGgyIDwtIG1lcmdlKHRlc3QubWV0aCwgRVBJXzExM19DR1ssYygxLCAyLCAzLCAxMCldLCBhbGwgPSBUKQoKdGVzdC5tZXRoMyA8LSBtZXJnZSh0ZXN0Lm1ldGgyLCBFUElfMTE5X0NHWyxjKDEsIDIsIDMsIDEwKV0sIGFsbCA9IFQpCgp3cml0ZS50YWJsZSh0ZXN0Lm1ldGg1LCBmaWxlID0gIm1ldGh5bGF0aW9uLnR4dCIsIHNlcCA9ICIgIikKCgp0ZXN0LnRvdGFsMSA8LSBtZXJnZShFUElfMTA0X0NHWyxjKDEsIDIsIDMsIDExKV0sIEVQSV8xMTFfQ0dbLGMoMSwgMiwgMywgMTEpXSwgYnkgPSBuYW1lcyhFUElfMTExX0NHWyxjKDEsMiwzKV0pLCBhbGwgPSBUKQoKdGVzdC50b3RhbDIgPC0gbWVyZ2UodGVzdC5tZXRoLCBFUElfMTEzX0NHWyxjKDEsIDIsIDMsIDExKV0sIGFsbCA9IFQpCgp0ZXN0LnRvdGFsMyA8LSBtZXJnZSh0ZXN0Lm1ldGgyLCBFUElfMTE5X0NHWyxjKDEsIDIsIDMsIDExKV0sIGFsbCA9IFQpCgpoZWFkKHRlc3QubWV0aDMpCmBgYAoKYGBge3J9CgojIyBNYWtlcyBhIG5ldyBsYWJlbCBjb2x1bW4sIGNvbmNhdGVuYXRpbmcgdGhlIHNjYWZmb2xkIG5hbWUgYW5kIGxvY2F0aW9uLiBBbHNvIGRyb3BzIGFsbCBvZiB0aGUgb3RoZXIgbm9uLWltcG9ydGFudCBjb2x1bW5zCnRlc3QubWV0aDQgPC0gYXMuZGF0YS5mcmFtZShwYXN0ZTAodGVzdC5tZXRoM1ssMV0sICItIiwgdGVzdC5tZXRoM1ssMl0pKQoKdGVzdC5tZXRoNFssYygyLDMsNCldIDwtIHRlc3QubWV0aDNbLGMoNCw1LDYpXQpjb2xuYW1lcyh0ZXN0Lm1ldGg0KVsxXSA8LSAiU2NhZmYiCgp0ZXN0Lm1ldGg1IDwtIGFzLm1hdHJpeCh0ZXN0Lm1ldGg0WyxjKDIsMyw0KV0pCgoKaW1wdXRlLmtubihkYXRhID0gdGVzdC5tZXRoNSkKCndyaXRlLmNzdih0ZXN0Lm1ldGg1LCBmaWxlID0gInRlc3QubWV0aDUuY3N2IikKCgojIyBDYWxjdWxhdGVzIHBlcmNlbnQgbWlzc2luZyBieSBzYW1wbGUsIGFzIHdlbGwgYXMgdG90YWwgbWlzc2luZy4gfjQwJSAKCnBlcmMubWlzc2luZy5zMSA8LSBzdW0oaXMubmEodGVzdC5tZXRoM1ssNF0pKSAvIG5yb3codGVzdC5tZXRoMykKcGVyYy5taXNzaW5nLnMyIDwtIHN1bShpcy5uYSh0ZXN0Lm1ldGgzWyw1XSkpIC8gbnJvdyh0ZXN0Lm1ldGgzKQpwZXJjLm1pc3NpbmcuczMgPC0gc3VtKGlzLm5hKHRlc3QubWV0aDNbLDZdKSkgLyBucm93KHRlc3QubWV0aDMpCnBlcmMubWlzc2luZy5zNCA8LSBzdW0oaXMubmEodGVzdC5tZXRoM1ssN10pKSAvIG5yb3codGVzdC5tZXRoMykKCnRvdGFsLm1pc3NpbmcgPC0gc3VtKCFpcy5uYSh0ZXN0Lm1ldGgzWyxjKDQsNSw2LDcpXSkpIC8gKDQgKiBucm93KHRlc3QubWV0aDMpKQoKCiMjIEFmdGVyIHRhbGtpbmcgdG8gU3RldmVuLCB3ZSBkZWNpZGVkIHRvIGRyb3AgYWxsIGxvY2kgdGhhdCBkaWQgbm90IGhhdmUgMTB4IGNvdmVyYWdlIGluIGV2ZXJ5IHNhbXBsZSAKdGVzdC5tZXRoM1tjb21wbGV0ZS5jYXNlcyh0ZXN0Lm1ldGgzWywgYyg0LDUsNiw3KV0pLF0gLT4gdGVzdC5tZXRoNAoKdGVzdC5tZXRoNSA8LSB0ZXN0Lm1ldGg0W2FwcGx5KHRlc3QubWV0aDRbLCBjKDQsNSw2LDcpXSwgTUFSR0lOID0gMSwgZnVuY3Rpb24oeCkgYWxsKHggPiAxMCkpLCBdCgp0ZXN0LnRvdGFsM1tjb21wbGV0ZS5jYXNlcyh0ZXN0LnRvdGFsM1ssIGMoNCw1LDYsNyldKSxdIC0+IHRlc3QudG90YWw0Cgp0ZXN0LnRvdGFsNSA8LSB0ZXN0Lm1ldGg0W2FwcGx5KHRlc3QudG90YWw0WywgYyg0LDUsNiw3KV0sIE1BUkdJTiA9IDEsIGZ1bmN0aW9uKHgpIGFsbCh4ID4gMTApKSwgXQoKCgoKCiMjIE1ha2VzIHRoZSBzYW1lIGNvbmNhdGVuYXRlZCBsYWJlbCBjb2x1bW4gYWdhaW4sIG9vcHMuIAp0ZXN0Lm1ldGg1Wyw4XSA8LSBwYXN0ZTAodGVzdC5tZXRoNVssMV0sICItIiwgdGVzdC5tZXRoNVssMl0pCgp0ZXN0LnRvdGFsNVssOF0gPC0gcGFzdGUwKHRlc3QudG90YWw1WywxXSwgIi0iLCB0ZXN0LnRvdGFsNVssMl0pCgojIyBSZW1vdmVzIGFsbCBsb2NpIHByZXNlbnQgaW4gdGhlIHRvdGFsIGNvdmVyYWdlIGZpbGUsIGJ1dCBub3QgaW4gdGhlIG1ldGh5bGF0aW9uIGZpbGUuCnRlc3QudG90YWw2IDwtIHRlc3QudG90YWw1W3Rlc3QudG90YWw1Wyw4XSAlaW4lIHRlc3QubWV0aDVbLDhdLF0KCiMjIENyZWF0ZXMgdGhlIGZpbmFsIGNvbHVtbiBmb3IgTWFjYXUgaW5wdXQsIHdpdGggbGFiZWxzLCB0aGVuIHdyaXRlcyB0aGVtIHRvIHdoaXRlLXNwYWNlIHNlcGFyYXRlZCBmaWxlcy4gCgp0ZXN0Lm1ldGg2IDwtIGNiaW5kKHRlc3QubWV0aDVbLDhdLCB0ZXN0Lm1ldGg1Wyw0XSwgdGVzdC5tZXRoNVssNV0sIHRlc3QubWV0aDVbLDZdLCB0ZXN0Lm1ldGg1Wyw3XSkKCmNvbG5hbWVzKHRlc3QubWV0aDYpIDwtIGMoImxvYyIsICJFUEktMTA0IiwgIkVQSS0xMTEiLCAiRVBJLTExMyIsICJFUEktMTE5IikKCnRlc3QudG90YWw3IDwtIGNiaW5kKHRlc3QudG90YWw2Wyw4XSwgdGVzdC50b3RhbDZbLDRdLCB0ZXN0LnRvdGFsNlssNV0sIHRlc3QudG90YWw2Wyw2XSwgdGVzdC50b3RhbDZbLDddKQpjb2xuYW1lcyh0ZXN0LnRvdGFsNykgPC0gYygibG9jIiwgIkVQSS0xMDQiLCAiRVBJLTExMSIsICJFUEktMTEzIiwgIkVQSS0xMTkiKQoKd3JpdGUudGFibGUodGVzdC5tZXRoNiwgZmlsZSA9ICJtZXRoLnR4dCIsIHNlcCA9ICIgIikKd3JpdGUudGFibGUodGVzdC50b3RhbDcsIGZpbGUgPSAidG90YWxjb3YudHh0Iiwgc2VwID0gIiAiKQoKYGBgCgoK