Processing featureCounts summary

Goal: get featureCounts’s output summary to look tidy, and plot the number of reads in each assignment bin. First, let’s look at the raw output from the featureCounts summary file:

read.delim("data/counts.summary.txt")

##                       Status wt1.star.Aligned.out.sam
## 1                   Assigned                 15704854
## 2       Unassigned_Ambiguity                   162382
## 3    Unassigned_MultiMapping                 11598208
## 4      Unassigned_NoFeatures                  6735262
## 5        Unassigned_Unmapped                        0
## 6  Unassigned_MappingQuality                        0
## 7 Unassigned_FragementLength                        0
## 8         Unassigned_Chimera                        0
## 9       Unassigned_Secondary                        0
##   wt2.star.Aligned.out.sam ko1.star.Aligned.out.sam
## 1                 18961834                 18700313
## 2                   178964                   181625
## 3                 12366479                 15265678
## 4                  7907991                  8699869
## 5                        0                        0
## 6                        0                        0
## 7                        0                        0
## 8                        0                        0
## 9                        0                        0
##   ko2.star.Aligned.out.sam
## 1                 20053581
## 2                   200727
## 3                 17661981
## 4                  8269903
## 5                        0
## 6                        0
## 7                        0
## 8                        0
## 9                        0

Pretty ugly. Let’s make it tidy and make a plot.

library(dplyr)

## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
library(ggplot2)


d <- read.delim("data/counts.summary.txt", row.names=1) %>%
  # transpose and turn it from a matrix to a data frame
  t %>%
  as.data.frame %>%
  # make a "sample" column from the rownames of the object.
  # remove the ".star.Aligned.out.sam" from the pattern
  mutate(sample=gsub(".star.Aligned.out.sam", "", rownames(.))) %>%
  # Reorder the columns
  select(sample, Assigned:Unassigned_NoFeatures) %>%
  # Make it tidy
  gather(Assignment, ReadCounts, -sample)
d

##    sample              Assignment ReadCounts
## 1     wt1                Assigned   15704854
## 2     wt2                Assigned   18961834
## 3     ko1                Assigned   18700313
## 4     ko2                Assigned   20053581
## 5     wt1    Unassigned_Ambiguity     162382
## 6     wt2    Unassigned_Ambiguity     178964
## 7     ko1    Unassigned_Ambiguity     181625
## 8     ko2    Unassigned_Ambiguity     200727
## 9     wt1 Unassigned_MultiMapping   11598208
## 10    wt2 Unassigned_MultiMapping   12366479
## 11    ko1 Unassigned_MultiMapping   15265678
## 12    ko2 Unassigned_MultiMapping   17661981
## 13    wt1   Unassigned_NoFeatures    6735262
## 14    wt2   Unassigned_NoFeatures    7907991
## 15    ko1   Unassigned_NoFeatures    8699869
## 16    ko2   Unassigned_NoFeatures    8269903

d %>% ggplot(aes(Assignment, ReadCounts)) + geom_bar(stat="identity", aes(fill=sample), position="dodge")

Processing featureCounts summary

Stephen Turner

April 20, 2015