First, let’s test that the files I’ve been operating on on Emu are correct. Emu also holds the original 24 samples, which are not part of the second drop, so hopefully the set difference will contain only those 24 files; otherwise I’ll be profoundly sad.
Let’s look.
library(readr)
hd_md5s <- read_delim("~/Documents/fixNight/md5sum.txt",
" ", escape_double = FALSE, col_names = FALSE,
trim_ws = TRUE)
Parsed with column specification:
cols(
X1 = col_character(),
X2 = col_character()
)
emu_md5s <- read_delim("~/Documents/fixNight/emu_md5s.txt",
" ", escape_double = FALSE, col_names = FALSE,
trim_ws = TRUE)
Parsed with column specification:
cols(
X1 = col_character(),
X2 = col_character()
)
bad.emu_md5s.2 <- emu_md5s[!emu_md5s$X2 %in% hd_md5s$X2,]
print(bad.emu_md5s.2)
Huzzah. Those are the 24 original drop files, so I didn’t waste tons of time on bad files. Hooray.
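The filename check above is a set difference on the second column. A minimal sketch with toy data, assuming the same layout as the parsed listings (X1 = MD5 hash, X2 = filename); the data frames `hd_toy` and `emu_toy` and all of their values are invented for illustration:

```r
# Toy stand-ins for the parsed md5sum listings: X1 = MD5 hash, X2 = filename.
# Every value here is made up for illustration.
hd_toy <- data.frame(
  X1 = c("aaa111", "bbb222"),
  X2 = c("EPI-1.fastq.gz", "EPI-2.fastq.gz"),
  stringsAsFactors = FALSE
)
emu_toy <- data.frame(
  X1 = c("aaa111", "bbb222", "ccc333"),
  X2 = c("EPI-1.fastq.gz", "EPI-2.fastq.gz", "OLD-1.fastq.gz"),
  stringsAsFactors = FALSE
)

# Rows on Emu whose filename is absent from the hard-drive listing.
only_on_emu <- emu_toy[!emu_toy$X2 %in% hd_toy$X2, ]

# The same filenames via base R's setdiff().
only_on_emu_names <- setdiff(emu_toy$X2, hd_toy$X2)

print(only_on_emu)
```

Note that this only compares names. It says nothing yet about whether the contents of the shared files agree, which is what the hash comparisons further down are for.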
Next, let’s look at the RRBS2 directory under p_generosa on nightingales. I originally copied files here because A) I didn’t totally trust my script, and B) I had some permission issues on Owl. Only the second-drop files should be here, so hopefully there’s nothing in the badrrbs2_md5s data frame after I’m done.
rrbs2_md5s <- read_delim("~/Documents/fixNight/rrbs2_md5s.txt",
",", escape_double = FALSE, col_names = FALSE,
trim_ws = TRUE)
Parsed with column specification:
cols(
X1 = col_character(),
X2 = col_character()
)
badrrbs2_md5s <- rrbs2_md5s[!rrbs2_md5s$X1 %in% hd_md5s$X1,]
print(badrrbs2_md5s)
Aaaaand not so much. 4 bad files there.
How about in p_generosa itself? I’ll first subset the data frame to include only the drop 2 files, just for ease.
p_generosa_md5s <- read_delim("~/Documents/fixNight/p_generosa_md5s.txt",
" ", escape_double = FALSE, col_names = FALSE,
trim_ws = TRUE)
Parsed with column specification:
cols(
X1 = col_character(),
X2 = col_character()
)
p_generosa_md5s.2 <- p_generosa_md5s[p_generosa_md5s$X2 %in% hd_md5s$X2,]
p_generosa_md5s.3 <- p_generosa_md5s.2[!p_generosa_md5s.2$X1 %in% hd_md5s$X1,]
print(p_generosa_md5s.3)
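The two-step filter above asks whether each hash appears anywhere in the hard-drive list. A slightly stricter variant, sketched here under the same X1/X2 layout, pairs each file with its own reference hash by filename. The helper `find_mismatches` and the example data are hypothetical, not part of the original notebook:

```r
# Hypothetical helper: report files whose hash differs from the reference.
# `listing` and `reference` are data frames with X1 = MD5 hash, X2 = filename.
find_mismatches <- function(listing, reference) {
  # Inner join on filename keeps only files present in both listings.
  both <- merge(listing, reference, by = "X2", suffixes = c(".local", ".ref"))
  # Keep rows where the local hash disagrees with the reference hash.
  both[both$X1.local != both$X1.ref, ]
}

# Invented example data.
ref <- data.frame(
  X1 = c("aaa111", "bbb222"),
  X2 = c("EPI-1.fastq.gz", "EPI-2.fastq.gz"),
  stringsAsFactors = FALSE
)
loc <- data.frame(
  X1 = c("aaa111", "zzz999", "ccc333"),
  X2 = c("EPI-1.fastq.gz", "EPI-2.fastq.gz", "OLD-1.fastq.gz"),
  stringsAsFactors = FALSE
)

mismatches <- find_mismatches(loc, ref)
print(mismatches)
```

Joining by filename puts the local and reference hashes side by side, which makes it easy to see exactly which copy is wrong.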
Looks like there are 7 to fix, from my files on Emu. Once that’s done and they check out, everything should match the files on the Genewiz-supplied hard drive.
files.to.fix <- p_generosa_md5s.3$X2
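One way to do the replace-and-recheck step without shelling out is sketched below. It assumes a known-good copy of each file sits in a source directory. The helper `replace_and_checksum` and the temp directories are hypothetical stand-ins, and the demo runs on throwaway files rather than the real data:

```r
# Hypothetical helper: overwrite each listed file in dest_dir with the copy
# from src_dir, then return the MD5 of every replaced file.
replace_and_checksum <- function(files, src_dir, dest_dir) {
  for (f in files) {  # looping over the vector itself covers every element
    ok <- file.copy(file.path(src_dir, f), file.path(dest_dir, f),
                    overwrite = TRUE)
    if (!ok) stop("copy failed for ", f)
  }
  sums <- tools::md5sum(file.path(dest_dir, files))
  names(sums) <- files  # label hashes by bare filename
  sums
}

# Demo on throwaway temp directories with invented contents.
src  <- file.path(tempdir(), "src_demo")
dest <- file.path(tempdir(), "dest_demo")
dir.create(src, showWarnings = FALSE)
dir.create(dest, showWarnings = FALSE)
writeLines("good data", file.path(src, "a.txt"))
writeLines("corrupted", file.path(dest, "a.txt"))

new_sums <- replace_and_checksum("a.txt", src, dest)
src_sums <- tools::md5sum(file.path(src, "a.txt"))
print(unname(new_sums) == unname(src_sums))
```

Iterating with `for (f in files)` rather than an explicit index range means the first element cannot be skipped by accident, and stopping on a failed copy avoids checksumming a stale file.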
Let’s check the new files just copied over.
new_md5s <- read_delim("~/Documents/fixNight/newMD5s.txt",
" ", escape_double = FALSE, col_names = FALSE,
trim_ws = TRUE)
print(new_md5s)
check.md5s <- new_md5s[!new_md5s$X2 %in% hd_md5s$X2,]
print(check.md5s)
Woo, looks like all of the files match what was on the Genewiz HDD.
Just for fun, I’ll run the MD5s on the Nightingales/P_generosa files once more. Just because.
system("md5sum ~/Documents/owl/nightingales/P_generosa/EPI*.gz >> ~/Documents/fixNight/secondnightMD5s.txt")
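Because `>>` appends, rerunning the command above would add a second copy of every line to secondnightMD5s.txt. A sketch of an alternative that stays inside R, using `tools::md5sum` and overwriting the output each time; the helper `md5_table`, the demo directory, and the file names are all hypothetical:

```r
# Hypothetical helper: checksum every file in `dir` matching `pattern` and
# return a data frame shaped like the parsed listings (X1 = hash, X2 = filename).
md5_table <- function(dir, pattern) {
  paths <- list.files(dir, pattern = pattern, full.names = TRUE)
  data.frame(
    X1 = unname(tools::md5sum(paths)),
    X2 = basename(paths),  # drop the directory so names compare across machines
    stringsAsFactors = FALSE
  )
}

# Demo on a throwaway directory with invented files.
demo_dir <- file.path(tempdir(), "md5_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("x", file.path(demo_dir, "EPI-1.gz"))
writeLines("y", file.path(demo_dir, "notes.txt"))

tab <- md5_table(demo_dir, "^EPI.*\\.gz$")

# write.table() overwrites by default, unlike a shell `>>` append.
write.table(tab, file.path(demo_dir, "md5s.txt"),
            quote = FALSE, row.names = FALSE, col.names = FALSE)
print(tab)
```

Storing only the base name in X2 matters when listings are generated from different working directories, since a full path in one listing will never match a bare filename in another.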
Now repeat the same operations used on the P_generosa listing earlier: subset to just the 82 files from the hard drive, then look for MD5s that aren’t in the hard drive MD5 list.
secondnight_md5s <- read_delim("~/Documents/fixNight/secondnightMD5s.txt",
",", escape_double = FALSE, col_names = FALSE,
trim_ws = TRUE)
secondnight_md5s.2 <- secondnight_md5s[secondnight_md5s$X2 %in% hd_md5s$X2,]
secondnight_md5s.3 <- secondnight_md5s.2[!secondnight_md5s.2$X1 %in% hd_md5s$X1,]
print(secondnight_md5s.3)
Ok! Everything on nightingales now matches the files delivered by Genewiz.