MAGeCK Count Report

Author: Wei Li, weililab.org

Parameters

comparison_name is the prefix of your output files, defined by the “-n” parameter in your “mageck count” command. The report is generated from the following files:

comparison_name.countsummary.txt
comparison_name.count_normalized.txt
comparison_name.log

# define the comparison_name here; for example,
# comparison_name='demo'
comparison_name='cdh1'

Preprocessing

Read the input files. If any of these files is missing or malformed, an error message will be shown below.

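count_summary_file=paste0(comparison_name,'.countsummary.txt')        # e.g. cdh1.countsummary.txt
normalized_cnt_file=paste0(comparison_name,'.count_normalized.txt')   # e.g. cdh1.count_normalized.txt
cstable=read.table(count_summary_file,header = TRUE,as.is = TRUE,sep = '\t')
nc_table=read.table(normalized_cnt_file,header = TRUE,as.is = TRUE,sep = '\t')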
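A slightly more defensive way to load the inputs is sketched below. The helper name read_count_outputs and the minimal set of required columns are illustrative assumptions, not part of MAGeCK. It checks that the three files listed above exist and that the count summary has the expected columns, and stops with an informative error otherwise. Reading with sep = '\t' keeps file paths that contain spaces in a single column.

```r
# Hypothetical helper: verify and load the MAGeCK count outputs for a prefix.
read_count_outputs <- function(prefix) {
  files <- c(summary    = paste0(prefix, ".countsummary.txt"),
             normalized = paste0(prefix, ".count_normalized.txt"),
             log        = paste0(prefix, ".log"))
  # Stop early if any expected file is missing.
  missing <- files[!file.exists(files)]
  if (length(missing) > 0) {
    stop("Missing input file(s): ", paste(missing, collapse = ", "))
  }
  cs <- read.table(files[["summary"]], header = TRUE, as.is = TRUE, sep = "\t")
  nc <- read.table(files[["normalized"]], header = TRUE, as.is = TRUE, sep = "\t")
  # Assumed minimal set of columns needed by the summary table.
  required <- c("File", "Label", "Reads", "Mapped")
  absent <- setdiff(required, colnames(cs))
  if (length(absent) > 0) {
    stop("Count summary lacks column(s): ", paste(absent, collapse = ", "))
  }
  list(summary = cs, normalized = nc)
}
```

Usage in this report would be tables <- read_count_outputs(comparison_name), after which tables$summary and tables$normalized play the roles of cstable and nc_table.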

Summary

The summary of the count command is as follows.

Count command summary
File	Label	Reads	Mapped	Percentage	TotalsgRNAs	Zerocounts	GiniIndex
/juno/work/solitlab/jc/Project_11252_D/trimmed/CDH1hi.merged.fastq.gz	High	16112919	4105474	0.2548	77438	3324	0.17500
/juno/work/solitlab/jc/Project_11252_D/trimmed/CDH1lo.merged.fastq.gz	Low	18839795	5681368	0.3016	77438	3007	0.16920
/juno/work/solitlab/jc/Project_11252_D/trimmed/Lib.merged.fastq.gz	CTRL	43856292	40320961	0.9194	77438	40	0.05808
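The Percentage column is a proportion (Mapped / Reads) rather than a value out of 100. As a quick consistency check, the sketch below types in the values from the table above and recomputes it; the data frame name cs is only for illustration.

```r
# Values copied from the count summary table above.
cs <- data.frame(
  Label      = c("High", "Low", "CTRL"),
  Reads      = c(16112919, 18839795, 43856292),
  Mapped     = c(4105474, 5681368, 40320961),
  Percentage = c(0.2548, 0.3016, 0.9194)
)
# Recompute the mapped fraction and round to the precision shown in the table.
cs$Recomputed <- round(cs$Mapped / cs$Reads, 4)
# Recomputed matches Percentage to four decimal places for all three samples.
print(cs[, c("Label", "Percentage", "Recomputed")])
```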

The meanings of the columns are as follows.

File: The name of the FASTQ file;
Label: The label assigned to the sample;
Reads: The total number of reads in the FASTQ file;
Mapped: The number of reads that map to the sgRNA library;
Percentage: The fraction of reads that are mapped (Mapped / Reads), reported as a proportion between 0 and 1;
TotalsgRNAs: The number of sgRNAs in the library;
Zerocounts: The number of sgRNAs with zero read counts;
GiniIndex: The Gini index of the read count distribution. It measures how evenly reads are spread across sgRNAs; a smaller value means a more even distribution.
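To make the Gini index concrete, here is a minimal sketch of the standard Gini coefficient applied to a vector of sgRNA read counts. This is the textbook formula, not necessarily MAGeCK's exact implementation (which may, for example, transform the counts before computing it); the function name gini_index is hypothetical.

```r
# Standard Gini coefficient of a vector of read counts.
# With counts sorted ascending (x_1 <= ... <= x_n):
#   G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
# G is 0 when all counts are equal and (n - 1) / n when one sgRNA has every read.
gini_index <- function(x) {
  x <- sort(as.numeric(x))
  n <- length(x)
  total <- sum(x)
  if (n < 2 || total == 0) return(0)
  2 * sum(seq_len(n) * x) / (n * total) - (n + 1) / n
}

gini_index(c(100, 100, 100, 100))  # perfectly even counts: returns 0
gini_index(c(0, 0, 0, 400))        # all reads on one sgRNA: returns 0.75
```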

If the --day0-label and --gmt-file options are provided, the following metrics show the degree of negative selection of essential genes (provided by --gmt-file).

Count command summary: negative selection QC
File	Label	NegSelQCPval	NegSelQCPvalPermutation	NegSelQCPvalPermutationFDR
/juno/work/solitlab/jc/Project_11252_D/trimmed/CDH1hi.merged.fastq.gz	High	1	1	1
/juno/work/solitlab/jc/Project_11252_D/trimmed/CDH1lo.merged.fastq.gz	Low	1	1	1
/juno/work/solitlab/jc/Project_11252_D/trimmed/Lib.merged.fastq.gz	CTRL	1	1	1

The meanings of the columns are as follows (NegSelQC and NegSelQCGene are not shown in the table above).

NegSelQC: the enrichment score (ES) of essential genes in the negative selection list, calculated using GSEA;
NegSelQCPval: the p value associated with the enrichment score;
NegSelQCPvalPermutation: the permutation-based p value of the enrichment score;
NegSelQCPvalPermutationFDR: the FDR-adjusted permutation p value;
NegSelQCGene: the number of genes used in the analysis.
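The idea behind NegSelQC can be illustrated with a simplified, unweighted GSEA-style statistic. This is a sketch of the general technique, not MAGeCK's implementation: genes are ranked from most to least negatively selected, a running sum steps up at each essential gene and down otherwise, and the ES is the running-sum value farthest from zero. The function names, the one-sided test, and the plus-one permutation correction are assumptions.

```r
# Simplified, unweighted GSEA-style enrichment score (Kolmogorov-Smirnov-like).
# ranked_genes: gene names ordered from most to least negatively selected.
# gene_set: essential genes (e.g. one gene set from the GMT file).
enrichment_score <- function(ranked_genes, gene_set) {
  hit <- ranked_genes %in% gene_set
  n_hit <- sum(hit)
  n_miss <- length(ranked_genes) - n_hit
  if (n_hit == 0 || n_miss == 0) return(NA_real_)
  # Step up by 1/n_hit at each essential gene, down by 1/n_miss otherwise.
  running <- cumsum(ifelse(hit, 1 / n_hit, -1 / n_miss))
  # ES is the running-sum value with the largest absolute deviation from zero.
  running[which.max(abs(running))]
}

# One-sided permutation p value: how often a random ordering of the genes
# gives an ES at least as large as the observed one (with a +1 correction).
permutation_pvalue <- function(ranked_genes, gene_set, n_perm = 1000, seed = 1) {
  set.seed(seed)
  observed <- enrichment_score(ranked_genes, gene_set)
  null_es <- replicate(n_perm, enrichment_score(sample(ranked_genes), gene_set))
  (sum(null_es >= observed) + 1) / (n_perm + 1)
}

genes <- paste0("g", 1:100)
enrichment_score(genes, genes[1:10])  # essential genes at the top: ES close to 1
permutation_pvalue(genes, genes[1:10], n_perm = 200)
```

Across several samples, p.adjust(pvals, method = "BH") is one common way to turn permutation p values into FDR values; whether MAGeCK uses Benjamini-Hochberg for NegSelQCPvalPermutationFDR is an assumption here.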

Normalized read count distribution of all samples

The following figure shows the distribution of median-normalized read counts in all samples.
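Median normalization adjusts each sample for sequencing depth. A minimal sketch of median-of-ratios (DESeq-style) size-factor normalization is shown below, assuming this is close to what "median-normalized" means here; MAGeCK's own code may differ in details such as pseudocounts or the handling of zero counts. The function name median_normalize is hypothetical.

```r
# Median-of-ratios (DESeq-style) normalization, used as a stand-in for
# MAGeCK's median normalization; details may differ from MAGeCK's code.
# counts: matrix of raw counts, rows = sgRNAs, columns = samples.
median_normalize <- function(counts) {
  counts <- as.matrix(counts)
  # Only sgRNAs with non-zero counts in every sample enter the size factors.
  positive <- apply(counts > 0, 1, all)
  log_counts <- log(counts[positive, , drop = FALSE])
  log_geo_mean <- rowMeans(log_counts)  # log geometric mean per sgRNA
  # Size factor: median ratio of a sample's counts to the geometric means.
  size_factors <- exp(apply(log_counts - log_geo_mean, 2, median))
  # Divide each sample (column) by its size factor.
  sweep(counts, 2, size_factors, "/")
}

raw <- cbind(A = c(10, 20, 30, 40), B = c(20, 40, 60, 80))
median_normalize(raw)  # B was sequenced twice as deeply; both columns now agree
```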

The following figure shows the histogram of median-normalized read counts in all samples.
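A comparable histogram can be drawn from the normalized count table with a sketch like the following. It assumes the table is laid out like comparison_name.count_normalized.txt (sgRNA, Gene, then one column per sample label) and that counts are shown on a log2(count + 1) scale; the function name plot_count_histograms is hypothetical.

```r
# Draw one histogram of log2(normalized count + 1) per sample.
# nc_table: data frame with sgRNA and Gene in columns 1-2, samples after that.
plot_count_histograms <- function(nc_table, breaks = 40) {
  samples <- colnames(nc_table)[-(1:2)]
  log_counts <- log2(as.matrix(nc_table[, samples, drop = FALSE]) + 1)
  # Place the panels side by side and restore the layout afterwards.
  old <- par(mfrow = c(1, length(samples)))
  on.exit(par(old))
  for (s in samples) {
    hist(log_counts[, s], breaks = breaks, main = s,
         xlab = "log2(normalized count + 1)")
  }
  # Return the transformed values invisibly for further inspection.
  invisible(log_counts)
}
```

In this report the call would be plot_count_histograms(nc_table), using the table read in the Preprocessing step.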

Principal Component Analysis

The following figure shows the first two principal components (PCs) from the principal component analysis (PCA), and the percentage of variance explained by the top PCs.

The variance of the PCs
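Both PCA views can be computed with base R's prcomp. The sketch below treats samples as observations and sgRNAs as variables; the log2(count + 1) transform, centering without scaling, and the function name sample_pca are assumptions rather than the report's documented settings.

```r
# PCA of samples from a normalized count matrix (rows = sgRNAs, columns = samples).
# Assumptions: log2(count + 1) transform, samples as observations,
# sgRNAs centered but not scaled; constant sgRNAs are dropped.
sample_pca <- function(counts) {
  log_counts <- log2(as.matrix(counts) + 1)
  keep <- apply(log_counts, 1, var) > 0
  pca <- prcomp(t(log_counts[keep, , drop = FALSE]), center = TRUE, scale. = FALSE)
  list(
    scores = pca$x[, 1:2, drop = FALSE],          # PC1 and PC2 coordinates per sample
    var_explained = pca$sdev^2 / sum(pca$sdev^2)  # fraction of variance per PC
  )
}

set.seed(42)
m <- matrix(rpois(300, 50), ncol = 3, dimnames = list(NULL, c("High", "Low", "CTRL")))
res <- sample_pca(m)
res$scores
round(100 * res$var_explained, 1)  # percentage of variance per PC
```

For this report, sample_pca(nc_table[, -(1:2)]) would drop the sgRNA and Gene columns first; plot(res$scores) and barplot(100 * res$var_explained) then give rough equivalents of the two figures.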

Sample clustering

The following figure shows the sample clustering result.
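One common way to cluster samples is hierarchical clustering on a correlation-based distance. The sketch below uses 1 minus the Pearson correlation of log2(count + 1) values with complete linkage; these choices and the function name cluster_samples are assumptions, not necessarily what the report uses.

```r
# Hierarchical clustering of samples. Assumed choices: log2(count + 1)
# transform, distance = 1 - Pearson correlation, complete linkage.
cluster_samples <- function(counts, method = "complete") {
  log_counts <- log2(as.matrix(counts) + 1)
  d <- as.dist(1 - cor(log_counts))  # cor() correlates columns, i.e. samples
  hclust(d, method = method)
}

set.seed(1)
base <- rpois(200, 50)
m <- cbind(High = base + rpois(200, 2), Low = base + rpois(200, 2), CTRL = rpois(200, 50))
hc <- cluster_samples(m)
cutree(hc, k = 2)  # High and Low fall in one cluster, CTRL in the other
```

plot(hc) draws the corresponding dendrogram.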