Goal: get featureCounts’s output summary to look tidy, and plot the number of reads in each assignment bin. First, let’s look at the raw output from the featureCounts summary file:
read.delim("data/counts.summary.txt")
## Status wt1.star.Aligned.out.sam
## 1 Assigned 15704854
## 2 Unassigned_Ambiguity 162382
## 3 Unassigned_MultiMapping 11598208
## 4 Unassigned_NoFeatures 6735262
## 5 Unassigned_Unmapped 0
## 6 Unassigned_MappingQuality 0
## 7 Unassigned_FragementLength 0
## 8 Unassigned_Chimera 0
## 9 Unassigned_Secondary 0
## wt2.star.Aligned.out.sam ko1.star.Aligned.out.sam
## 1 18961834 18700313
## 2 178964 181625
## 3 12366479 15265678
## 4 7907991 8699869
## 5 0 0
## 6 0 0
## 7 0 0
## 8 0 0
## 9 0 0
## ko2.star.Aligned.out.sam
## 1 20053581
## 2 200727
## 3 17661981
## 4 8269903
## 5 0
## 6 0
## 7 0
## 8 0
## 9 0
Pretty ugly. Let’s make it tidy and make a plot.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
d <- read.delim("data/counts.summary.txt", row.names=1) %>%
# transpose and turn it from a matrix to a data frame
t %>%
as.data.frame %>%
# make a "sample" column from the rownames of the object.
# remove the ".star.Aligned.out.sam" from the pattern
mutate(sample=gsub(".star.Aligned.out.sam", "", rownames(.))) %>%
# Reorder the columns
select(sample, Assigned:Unassigned_NoFeatures) %>%
# Make it tidy
gather(Assignment, ReadCounts, -sample)
d
## sample Assignment ReadCounts
## 1 wt1 Assigned 15704854
## 2 wt2 Assigned 18961834
## 3 ko1 Assigned 18700313
## 4 ko2 Assigned 20053581
## 5 wt1 Unassigned_Ambiguity 162382
## 6 wt2 Unassigned_Ambiguity 178964
## 7 ko1 Unassigned_Ambiguity 181625
## 8 ko2 Unassigned_Ambiguity 200727
## 9 wt1 Unassigned_MultiMapping 11598208
## 10 wt2 Unassigned_MultiMapping 12366479
## 11 ko1 Unassigned_MultiMapping 15265678
## 12 ko2 Unassigned_MultiMapping 17661981
## 13 wt1 Unassigned_NoFeatures 6735262
## 14 wt2 Unassigned_NoFeatures 7907991
## 15 ko1 Unassigned_NoFeatures 8699869
## 16 ko2 Unassigned_NoFeatures 8269903
d %>% ggplot(aes(Assignment, ReadCounts)) + geom_bar(stat="identity", aes(fill=sample), position="dodge")