Details

This is an R Markdown document. The data for this analysis were collected on 13 January 2017. I have a pool of 100 bp paired-end reads from Cedrela species. These reads were obtained via hybridization capture, targeted enrichment, and short-read sequencing on the Illumina HiSeq 3000.

I used kmercountexact.sh from BBTools to produce a k-mer frequency distribution, i.e. a histogram of how many distinct k-mers occur at each depth.
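
To make the idea of a k-mer frequency distribution concrete, here is a minimal base-R sketch that counts k-mers in a single made-up sequence and then tabulates how many distinct k-mers fall at each depth. The function `count_kmers`, the sequence `toy_seq`, and `k = 4` are illustrative assumptions only; this is not how kmercountexact.sh is implemented, and it ignores reverse complements and multiple reads.

```r
# Toy illustration only: the names and the sequence below are hypothetical
# and have nothing to do with the Cedrela reads.
count_kmers <- function(seq, k) {
  n <- nchar(seq) - k + 1            # number of k-mers in the sequence
  stopifnot(n >= 1)
  # Slide a window of width k along the sequence
  kmers <- substring(seq, 1:n, k:nchar(seq))
  table(kmers)                       # occurrences of each distinct k-mer
}

toy_seq <- "ACGTACGTAC"
kmer_counts <- count_kmers(toy_seq, k = 4)

# Histogram: for each depth, how many distinct k-mers occur that often
khist_toy <- as.data.frame(table(depth = as.integer(kmer_counts)))
khist_toy
```

In this toy case three k-mers occur twice and one occurs once, so the histogram has two rows. The real khist file has the same shape, one row per depth, just over far more k-mers.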

library(ggplot2)

Load Data

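# Read the k-mer histogram; columns are V1 = depth, V2 = count
ced_gen <- read.table("CEOD_khist.txt")

# Plot rows 5 to 100 only, with a red reference line at depth 25
khist <- ggplot(ced_gen[5:100, ], aes(x = V1, y = V2)) +
  geom_vline(xintercept = 25, color = "red") +
  geom_line(linewidth = 1) +  # ggplot2 >= 3.4.0; use size = 1 on older versions
  theme_bw() +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        plot.title = element_text(hjust = 0)) +
  labs(x = "k-mer Depth", y = "Count") +
  ggtitle("K-mer Frequency")
khist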
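
The red line at depth 25 is hard-coded in the plot. If it is meant to mark the main peak of the distribution (an assumption on my part), the position could be located programmatically rather than by eye. The sketch below uses a synthetic stand-in data frame, `toy_hist`, so that it runs without the input file; its values are made up and are not the Cedrela results. With the real data, `ced_gen` would take its place.

```r
# Hypothetical stand-in for the two-column histogram (V1 = depth, V2 = count):
# a steep low-depth tail plus a bump centred on depth 25.
toy_hist <- data.frame(V1 = 1:100)
toy_hist$V2 <- round(1e5 * exp(-toy_hist$V1) +
                     50000 * dnorm(toy_hist$V1, mean = 25, sd = 5))

# Use the same row window as the plot (rows 5 to 100), which leaves out
# the lowest depths, then take the depth with the largest count.
window <- toy_hist[5:100, ]
peak_depth <- window$V1[which.max(window$V2)]
peak_depth
```

Taking the maximum only works if the excluded low-depth rows are the ones that would otherwise dominate; on noisy data a smoothed curve or the peak-calling output of the k-mer counting tool may be more reliable.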

Optional: save the plot

#ggsave("khist.jpg",plot=khist, width=5, height=3.5)