Mostly just housekeeping stuff, setting working directories, making directories, and setting a sample vector to iterate over in future steps.
setwd("~/Documents/Genewiz_hdd/Day145")
The working directory was changed to /home/sean/Documents/Genewiz_hdd/Day145 inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the the working directory for notebook chunks.
system("mkdir SNPvcf")
system("mkdir CGoutput")
samples.vec <- c("EPI-175", "EPI-176", "EPI-181", "EPI-182", "EPI-184", "EPI-185", "EPI-187", "EPI-188", "EPI-193", "EPI-195", "EPI-199", "EPI-200", "EPI-205", "EPI-206", "EPI-208", "EPI-209", "EPI-214", "EPI-215", "EPI-220", "EPI-221", "EPI-226", "EPI-227", "EPI-229", "EPI-230")
This copies all of the SNP files to a SNP directory, and the cgoutput files to a cgoutput directory. Just for organization
system("scp *.vcf ~/Documents/Genewiz_hdd/Day135/SNPvcf")
system("scp *.output ~/Documents/Genewiz_hdd/Day135/CGoutput")
This bgzips, indexes, and merges all of the SNP files, and then runs vcftools to make the relatedness matrix.
setwd("~/Documents/Genewiz_hdd/Day135/SNPvcf")
for(i in 1:length(samples.vec)) {
system(paste0("bgzip ", samples.vec[i], "_SNVs.vcf"))
system(paste0("tabix ", samples.vec[i], "_SNVs.vcf.gz"))
}
system("vcf-merge *.gz > day10-mergedSNV.vcf")
system("vcftools --relatedness --vcf day10-mergedSNV.vcf --out day10-relatedness")
system("head day10-relatedness.relatedness")
After fixing the vcftools output in excel, it looks like this. Not very exciting.
this reads in the individual cg.output files from methylextract and combines them in to a single large data table.
library(readr)
setwd("~/Documents/Genewiz_hdd/Day135/CGoutput")
data <- list.files(path = ".", pattern = "*.output")
print(data)
db <- read_delim("~/Documents/Genewiz_hdd/Day135/CGoutput/EPI-151_CG.output", "\t", escape_double = FALSE, trim_ws = TRUE, na = ".")
db <- as.data.table(db)
db$EPI151Meth <- rowSums(cbind(db[,4], db[,7]), na.rm = TRUE)
db$EPI151TotCov <- rowSums(cbind(db[,5], db[,8]), na.rm = TRUE)
meth.db <- db[,c(1,2,3,10)]
totcov.db <- db[,c(1,2,3,11)]
for(i in 2:length(data)) {
temp <- read_delim(paste0("~/Documents/Genewiz_hdd/Day135/CGoutput/", data[i]), "\t", escape_double = FALSE, trim_ws = TRUE, na = ".")
temp <- as.data.table(temp)
temp$Meth <- rowSums(cbind(temp[,4], temp[,7]), na.rm = TRUE)
temp$TotCov <- rowSums(cbind(temp[,5], temp[,8]), na.rm = TRUE)
meth.db <- merge(meth.db, temp[,c(1,2,3,10)], by = names(meth.db[,c(1,2,3)]), all = TRUE)
colnames(meth.db)[ncol(meth.db)] <- paste0(substr(data[i], 1, 7), "Meth")
meth.db <- as.data.table(meth.db)
totcov.db <- merge(totcov.db, temp[,c(1,2,3,11)], by = names(totcov.db[,c(1,2,3)]), all = TRUE)
colnames(totcov.db)[ncol(totcov.db)] <- paste0(substr(data[i], 1, 7), "TotCov")
totcov.db <- as.data.table(totcov.db)
}
This is the part where I make my coverage files. It’s also the part where I made my logic error. The commented line applies our 10x coverage requirement to the methylated counts, which makes no sense now that I think about it. We lose some number of hypomethylated spots, which is not good. Glad I fixed it before we got too far in to everything.
setwd("~/Documents/Genewiz_hdd/Day10/macau")
The working directory was changed to /home/sean/Documents/Genewiz_hdd/Day10/macau inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the the working directory for notebook chunks.
write.table(meth.db5, file = "macau.meth2.txt", sep = " ", quote = FALSE, row.names = FALSE)
write.table(totcov.db4, file = "macau.totalcov2.txt", sep = " ", quote = FALSE, row.names = FALSE)
setwd("~/Documents/Genewiz_hdd/Day10/macau")
The working directory was changed to /home/sean/Documents/Genewiz_hdd/Day10/macau inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the the working directory for notebook chunks.
system("macau -g macau.meth2.txt -t macau.totalcov2.txt -p day10-pred.csv -k day10-relate-finished.csv -o output3 -s 10000 -seed 123")
Reading Files ...
## number of total individuals = 12
## number of analyzed individuals = 12
## number of covariates = 1
## number of total genes/sites = 15558
Performing Analysis 0.00%
Performing Analysis === 6.43%
Performing Analysis ====== 12.86%
Performing Analysis ========= 19.28%
Performing Analysis ============ 25.71%
Performing Analysis ================ 32.14%
Performing Analysis =================== 38.57%
Performing Analysis ====================== 45.00%
Performing Analysis ========================= 51.42%
Performing Analysis ============================ 57.85%
Performing Analysis ================================ 64.28%
Performing Analysis =================================== 70.71%
Performing Analysis ====================================== 77.14%
Performing Analysis ========================================= 83.56%
Performing Analysis ============================================ 89.99%
Performing Analysis ================================================ 96.42%
Performing Analysis ==================================================100.00%
setwd("~/Documents/Genewiz_hdd/Day135/macau/")
The working directory was changed to /home/sean/Documents/Genewiz_hdd/Day135/macau inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the the working directory for notebook chunks.
result_assoc <- read_delim("~/Documents/Genewiz_hdd/Day135/macau/output/output3.assoc.txt", "\t", escape_double = FALSE, trim_ws = TRUE)
Parsed with column specification:
cols(
id = col_character(),
n = col_integer(),
acpt_rate = col_double(),
beta = col_double(),
se_beta = col_double(),
pvalue = col_double(),
h = col_double(),
se_h = col_double(),
sigma2 = col_double(),
se_sigma2 = col_double(),
alpha0 = col_double(),
se_alpha0 = col_double()
)
sig.results <- result_assoc[result_assoc$pvalue <= 0.05,]
for(i in 1:nrow(sig.results)) {
temp <- strsplit(sig.results$id[i], "-")
sig.results[i,13] <- temp[[1]][1]
sig.results[i,14] <- temp[[1]][2]
}
colnames(sig.results)[c(13,14)] <- c("Scaffold", "Loc")
bed.test <- as.data.frame(cbind(sig.results[,13], (sig.results[,14]), (sig.results[,14]), sig.results$beta))
colnames(bed.test)[4] <- "beta"
write.table(bed.test, file = "Day135-10xCov-DiffMethRegions.bed", sep = "\t", quote = FALSE, col.names = FALSE, row.names = FALSE)
nrow(sig.results)
[1] 159
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).