This document describes contains some basic information from the RNA-seq data from Zebra fish. All commands needed to run the analysis that have been done. Based on these results I can perform downstream analysis of the results.
12 individuals from 3 different treatments (eg 4 replicates/treatment) were sequenced using Proton technology. The output per library is shown in the table below. The column ‘Number of reads’ is the actual macine output, whereas ‘Useful reads’ are the number of reads that are mapping to areas of the genome annotated as genes and hence part of the gene expression data.
Lib Name | Treatment | Replicate | Number of reads | Useful reads |
---|---|---|---|---|
up-87-1 | Control | 1 | 19741198 | 8153032 |
up-87-2 | Control | 2 | 22141691 | 9051586 |
up-87-3 | Control | 3 | 19379328 | 8847111 |
up-87-4 | Control | 4 | 20509164 | 5244292 |
up-87-5 | EE2-3 | 1 | 21248836 | 6671255 |
up-87-6 | EE2-3 | 2 | 19692093 | 9482823 |
up-87-7 | EE2-3 | 3 | 19737758 | 7617444 |
up-87-8 | EE2-3 | 4 | 14064762 | 6310603 |
up-87-9 | EE2-10 | 1 | 18336786 | 10896593 |
up-87-10 | EE2-10 | 2 | 14963248 | 8864047 |
up-87-11 | EE2-10 | 3 | 13291577 | 7756863 |
up-87-12 | EE2-10 | 4 | 21512379 | 7836155 |
All this data were mapped to the Danio rerio v. 9 and the ensembl annotation were used for extracting gene expression data.
The average quality or reads looks okay, but the mean level is slightly lower than commonly seen for Illumina reads. I think this expected with this technology as this is seen in all projects that I have been involved in that use proton data. I hence see no need to filter reads prior to mapping reads to the references sequence.
Control 1
Control 2
Control 3
Control 4
EE2-3 1
EE2-3 2
EE2-3 3
EE2-3 4
EE2-10 1
EE2-10 2
EE2-10 3
EE2-10 4
The program Star with the following settings.
genome=/home/thomask/Zebra092/nobackup/Danio_rerio/Ensembl/Zv9/Sequence/WholeGenomeFasta/genome.fa
gtf=/home/thomask/Zebra092/nobackup/Danio_rerio/Ensembl/Zv9/Annotation/Genes/genes.gtf
dir=/home/thomask/Zebra1481/private/Reference
Mapping=/home/thomask/Zebra1481/private/Mapping
cd /proj/b2014092/private/up87/
for file in *.fastq
do
star --readFilesIn ${file} --genomeDir ${dir} --outFileNamePrefix ~/Zebra1481/private/Mapping/${file} --outSAMtype BAM SortedByCoordinate --runThreadN 16
done
Prior to this the reference genome were indexed using with the following code
genome=/home/thomask/Zebra092/nobackup/Danio_rerio/Ensembl/Zv9/Sequence/WholeGenomeFasta/genome.fa
gtf=/home/thomask/Zebra092/nobackup/Danio_rerio/Ensembl/Zv9/Annotation/Genes/genes.gtf
dir=/home/thomask/Zebra1481/private/Reference
cd ${dir}
star --runThreadN 16 --runMode genomeGenerate --genomeDir ${dir} --genomeFastaFiles ${genome} --sjdbGTFfile ${gtf} --sjdbOverhang 100
The mapped reads were converted to read counts per gene with the script htseq-count as follows
gtf=/home/thomask/Zebra092/nobackup/Danio_rerio/Ensembl/Zv9/Annotation/Genes/genes.gtf
cd /home/thomask/Zebra1481/private/Mapping
for file in *.bam
do
htseq-count -f bam ${file} ${gtf} > Counts_${file}
done
From the count data additional quality control steps were performed. First of all correlations among samples and replicates were performed. The average correlation among samples were 0.91 on the unfiltered data and 0.97 when filtering the genes annotated as non-coding. The shift in average correlation is largely due to the control replicate 4 as seen in the plot ‘Correlations among samples, all genes’. This would suggest that filtering the non-coding genes will lead to a data set with less noise.
The same plot, but with non-coding genes clearly shows that sample control 4 is showing a better correlation to the other samples once the non-coding genes. It also highlights that the control 1 does over the coding genes show low correlations to all but one of the other samples.
library(lattice) # Plotting library
library(edgeR) # For short read analysis
## Loading required package: limma
library(biomaRt) # Convert ensembl data to common gene names etc
## Warning: package 'biomaRt' was built under R version 3.2.2
## Creating a generic function for 'nchar' from package 'base' in package 'S4Vectors'
rm(list=ls()) # remove objects
setwd("~/Dropbox/BilsWork/1481_b2014092/1481")
design <- read.table("Design.txt", header=T) # read sample info
treatment <- factor(design$Treatment) # convert Treatment to factor
related <- as.factor(c("A","B","C","D","E","B","F","D","B","D","C","D"))
designModel <- model.matrix(~treatment) # create design matrix from treatment
designModel2 <- model.matrix(~related + treatment)
zebraSamples <- dir("../Counts/",pattern="^Counts", full.names = TRUE) # read in bam file names
zebraSamples <- c(zebraSamples[1],zebraSamples[5:12],zebraSamples[2:4]) # rearrange order
controlSamples <- zebraSamples[c(1:4)]
EE2_3Samples <- zebraSamples[c(5:8)]
EE2_10Samples <- zebraSamples[c(9:12)]
Groups <- c(1,1,1,1,2,2,2,2,3,3,3,3)
labels=c("Control_1", "Control_2", "Control_3", "Control_4","EE2_3_1", "EE2_3_2", "EE2_3_3", "EE2_3_4","EE2_10_1", "EE2_10_2", "EE2_10_3", "EE2_10_4")
All_DGE <- readDGE(zebraSamples, header=F, group=Groups, comment.char = "_", labels=labels) # create digital gene expression matrix
All_DGE.nc.rm <- All_DGE # make a copy
listncRNA <- read.table('listofncRNA.txt') # read in list of non-coding loci
All_DGE.nc.rm$counts <- All_DGE.nc.rm$counts[!(row.names(All_DGE.nc.rm$counts)%in%listncRNA[,1]),] # filter all non-coding counts
plotMDS(All_DGE, dim.plot=c(1,2),labels=labels, xlim=c(-4,6), main = 'MDS plot showing sim. among samples, All genes')
plotMDS(All_DGE.nc.rm, dim.plot=c(1,2), xlim=c(-4,6), main = 'MDS plot showing sim. among samples, Only coding genes')
plotMDS(All_DGE, dim.plot=c(1,2),labels=related, xlim=c(-4,6), main = 'MDS plot showing sim. among samples, All genes')
levelplot(cor(All_DGE$counts), main ='Correlations among samples, all genes', xlab = 'Sample', ylab = 'Sample')
levelplot(cor(All_DGE.nc.rm$counts[,2:12]), main ='Correlations among samples, only coding genes', xlab = 'Sample', ylab = 'Sample')
As genes that have 0 or very few genes mapping to them is not meaningful to analyse for differential gene expression, one often filter genes that are lowly expressed. Below I filter all genes that show less than 1 read per million in more than 4 samples.
This drops 12086 genes from the 33737 that are found full data set. For the data set with only coding genes the corresponding figures are 6352 and 26812. In both data sets we hence end up with similar number of genes after filtering the lowly expressed once, suggesting that very few non-coding genes have high enough expression levels to be retained in the analysis.
All_DGE.keep14 <- rowSums(cpm(All_DGE)>1) >= 4 # create filtered that needs at least 1 cpm in 4 libraries (number of replicates in each treatment) for the genes to be retained in analysis
All_DGE.nc.rm.keep14 <- rowSums(cpm(All_DGE.nc.rm)>1) >= 4 # create filtered that needs at least 1 cpm in 4 libraries (number of replicates in each treatment) for the genes to be retained in analysis
table(All_DGE.keep14) # how many genes are retained after filtering
## All_DGE.keep14
## FALSE TRUE
## 12086 21651
table(All_DGE.nc.rm.keep14) # how many genes are retained after filtering
## All_DGE.nc.rm.keep14
## FALSE TRUE
## 6352 20460
All_DGE14 <- All_DGE[All_DGE.keep14,] # do the actual filtering
All_DGE.nc.rm14 <- All_DGE.nc.rm[All_DGE.nc.rm.keep14,] # do the actual filtering
# levelplot(cor(All_DGE14$counts))
# levelplot(cor(All_DGE.nc.rm14$counts))
All_DGE14 <- calcNormFactors(All_DGE14)
All_DGE14 <- estimateGLMCommonDisp(All_DGE14, designModel, verbose=TRUE)
## Disp = 0.17522 , BCV = 0.4186
All_DGE14 <- estimateGLMTrendedDisp(All_DGE14, designModel)
All_DGE14 <- estimateGLMTagwiseDisp(All_DGE14, designModel)
All_DGE.nc.rm14 <- calcNormFactors(All_DGE.nc.rm14)
All_DGE.nc.rm14 <- estimateGLMCommonDisp(All_DGE.nc.rm14, designModel, verbose=TRUE)
## Disp = 0.1739 , BCV = 0.417
All_DGE.nc.rm14 <- estimateGLMTrendedDisp(All_DGE.nc.rm14, designModel)
All_DGE.nc.rm14 <- estimateGLMTagwiseDisp(All_DGE.nc.rm14, designModel)
### Detection of DE for all samples no blocking
All_DGE14.fit <- glmFit(All_DGE14, designModel)
All_DGE14.fitEE2_3 <- glmLRT(All_DGE14.fit, coef = "treatmentEE2_3")
All_DGE14.fitEE2_10 <- glmLRT(All_DGE14.fit, coef = "treatmentEE2_10")
Table.All_DGE14EE2_3 <- topTags(All_DGE14.fitEE2_3, n=nrow(All_DGE14))$table
Table.All_DGE14EE2_10 <- topTags(All_DGE14.fitEE2_10, n=nrow(All_DGE14))$table
table(resultAll_DGE14EE2_3 <- decideTestsDGE(All_DGE14.fitEE2_3))
##
## -1 0 1
## 7 21640 4
topTags(All_DGE14.fitEE2_3)
## Coefficient: treatmentEE2_3
## logFC logCPM LR PValue FDR
## ENSDARG00000081077 -3.288345 3.3600861 40.20076 2.291587e-10 4.961514e-06
## ENSDARG00000083266 -2.401551 2.6333895 29.36339 6.000053e-08 6.495357e-04
## ENSDARG00000095618 1.976902 2.7687710 26.05853 3.312222e-07 2.390431e-03
## ENSDARG00000075193 1.871694 3.4215179 24.15760 8.876533e-07 4.804645e-03
## ENSDARG00000091399 -5.037778 -0.1574564 22.59628 1.998715e-06 7.339875e-03
## ENSDARG00000092458 -3.941008 0.8175363 22.56261 2.034052e-06 7.339875e-03
## ENSDARG00000085168 -6.851176 6.4469853 21.64316 3.283782e-06 1.015674e-02
## ENSDARG00000086157 -1.566132 3.4880740 19.48327 1.014845e-05 2.441617e-02
## ENSDARG00000003687 2.558923 1.3867636 19.48309 1.014944e-05 2.441617e-02
## ENSDARG00000083434 -2.311802 2.1263820 18.15711 2.034084e-05 4.403994e-02
table(resultAll_DGE14EE2_10 <- decideTestsDGE(All_DGE14.fitEE2_10))
##
## -1 0
## 1 21650
topTags(All_DGE14.fitEE2_10)
## Coefficient: treatmentEE2_10
## logFC logCPM LR PValue FDR
## ENSDARG00000085168 -8.088604 6.4469853 27.03828 1.994657e-07 0.004318632
## ENSDARG00000083042 3.449144 1.2876101 19.85616 8.349346e-06 0.063932314
## ENSDARG00000085829 -2.225485 4.0341587 19.40161 1.059175e-05 0.063932314
## ENSDARG00000063276 -7.164567 0.3178874 18.75954 1.482796e-05 0.063932314
## ENSDARG00000081218 -12.773096 13.1612192 18.31732 1.869993e-05 0.063932314
## ENSDARG00000080404 -2.209441 2.4327117 18.28056 1.906429e-05 0.063932314
## ENSDARG00000082850 -1.720103 4.5786217 18.02625 2.178797e-05 0.063932314
## ENSDARG00000080372 -1.957950 3.9352533 17.87235 2.362286e-05 0.063932314
## ENSDARG00000070000 -4.058072 4.4499515 17.31693 3.163566e-05 0.076104843
## ENSDARG00000080499 -1.750753 3.5362893 16.52946 4.789999e-05 0.103708261
### Detection of DE for coding genes no blocking
All_DGE.nc.rm14.fit <- glmFit(All_DGE.nc.rm14, designModel)
All_DGE.nc.rm14.fitEE2_3 <- glmLRT(All_DGE.nc.rm14.fit, coef = "treatmentEE2_3")
All_DGE.nc.rm14.fitEE2_10 <- glmLRT(All_DGE.nc.rm14.fit, coef = "treatmentEE2_10")
Table.All_DGE.nc.rm14EE2_3 <- topTags(All_DGE.nc.rm14.fitEE2_3, n=nrow(All_DGE.nc.rm14))$table
Table.All_DGE.nc.rm14EE2_10 <- topTags(All_DGE.nc.rm14.fitEE2_10, n=nrow(All_DGE.nc.rm14))$table
table(resultAll_DGE.nc.rm14EE2_3 <- decideTestsDGE(All_DGE.nc.rm14.fitEE2_3))
##
## -1 0 1
## 2 20457 1
topTags(All_DGE.nc.rm14.fitEE2_3)
## Coefficient: treatmentEE2_3
## logFC logCPM LR PValue FDR
## ENSDARG00000095618 1.974458 2.7682458 26.12925 3.193102e-07 0.006533087
## ENSDARG00000091399 -5.040822 -0.1575644 22.67616 1.917318e-06 0.015061744
## ENSDARG00000092458 -3.942801 0.8172499 22.40458 2.208467e-06 0.015061744
## ENSDARG00000003687 2.556425 1.3864546 19.34142 1.093091e-05 0.055911591
## ENSDARG00000041339 5.427560 1.7224635 17.71788 2.562076e-05 0.104840145
## ENSDARG00000062788 4.223931 3.9358022 16.98773 3.762222e-05 0.112064325
## ENSDARG00000070242 4.178573 3.3506308 16.95181 3.834068e-05 0.112064325
## ENSDARG00000058556 3.748632 3.4474707 16.12635 5.925373e-05 0.138706434
## ENSDARG00000070000 -3.848570 4.4497926 15.85823 6.826910e-05 0.138706434
## ENSDARG00000026925 4.178863 4.7479317 15.84352 6.880175e-05 0.138706434
table(resultAll_DGE.nc.rm14EE2_10 <- decideTestsDGE(All_DGE.nc.rm14.fitEE2_10))
##
## 0
## 20460
topTags(All_DGE.nc.rm14.fitEE2_10)
## Coefficient: treatmentEE2_10
## logFC logCPM LR PValue FDR
## ENSDARG00000063276 -7.169694 0.3182241 18.81848 1.437671e-05 0.2941475
## ENSDARG00000070000 -4.059054 4.4497926 17.34293 3.120577e-05 0.3192351
## ENSDARG00000034058 -3.790933 3.2150098 15.54090 8.073931e-05 0.3426614
## ENSDARG00000045835 6.744297 4.6691785 15.26711 9.332759e-05 0.3426614
## ENSDARG00000087368 -5.346894 3.8798049 14.86499 1.154867e-04 0.3426614
## ENSDARG00000003687 2.221157 1.3864546 14.74638 1.229835e-04 0.3426614
## ENSDARG00000092022 2.195444 4.2116451 14.60121 1.328295e-04 0.3426614
## ENSDARG00000037861 1.464917 3.6483854 14.30059 1.558161e-04 0.3426614
## ENSDARG00000033683 -3.700305 4.7099720 14.26634 1.586772e-04 0.3426614
## ENSDARG00000086739 -2.217343 0.9191311 14.16475 1.674787e-04 0.3426614
# plotBCV(All_DGE14)
# plotBCV(All_DGE.nc.rm14)
All_DGE14_2 <- estimateGLMCommonDisp(All_DGE14, designModel2, verbose=TRUE)
## Disp = 0.10716 , BCV = 0.3274
All_DGE14_2 <- estimateGLMTrendedDisp(All_DGE14_2, designModel2)
All_DGE14_2 <- estimateGLMTagwiseDisp(All_DGE14_2, designModel2)
All_DGE14.nc.rm_2 <- estimateGLMCommonDisp(All_DGE.nc.rm14, designModel2, verbose=TRUE)
## Disp = 0.10706 , BCV = 0.3272
All_DGE14.nc.rm_2 <- estimateGLMTrendedDisp(All_DGE14.nc.rm_2, designModel2)
All_DGE14.nc.rm_2 <- estimateGLMTagwiseDisp(All_DGE14.nc.rm_2, designModel2)
### Detection of DE for all genes with blocking
All_DGE14.fit_2 <- glmFit(All_DGE14_2, designModel2)
All_DGE14.fitEE2_3_2 <- glmLRT(All_DGE14.fit_2, coef = "treatmentEE2_3")
All_DGE14.fitEE2_10_2 <- glmLRT(All_DGE14.fit_2, coef = "treatmentEE2_10")
Table.All_DGE14EE2_3_2 <- topTags(All_DGE14.fitEE2_3_2, n=nrow(All_DGE14))$table
Table.All_DGE14EE2_10_2 <- topTags(All_DGE14.fitEE2_10_2, n=nrow(All_DGE14))$table
table(resultAll_DGE14EE2_3_2 <- decideTestsDGE(All_DGE14.fitEE2_3_2))
##
## -1 0 1
## 24 21402 225
topTags(All_DGE14.fitEE2_3_2)
## Coefficient: treatmentEE2_3
## logFC logCPM LR PValue FDR
## ENSDARG00000093844 7.256443 4.1152500 64.55146 9.404404e-16 2.036148e-11
## ENSDARG00000045834 6.698168 3.9178608 56.16264 6.671742e-14 5.853737e-10
## ENSDARG00000009443 6.078090 6.3709226 55.77858 8.111040e-14 5.853737e-10
## ENSDARG00000043722 5.287106 4.5613547 48.73027 2.936981e-12 1.589714e-08
## ENSDARG00000071306 8.366552 0.5449215 46.59824 8.713854e-12 3.495817e-08
## ENSDARG00000009153 5.546657 4.4599259 46.39064 9.687729e-12 3.495817e-08
## ENSDARG00000088158 4.456363 3.3894356 45.75943 1.337044e-11 3.952123e-08
## ENSDARG00000074613 6.133290 6.1891718 45.58670 1.460301e-11 3.952123e-08
## ENSDARG00000091609 4.870079 3.2311550 44.87660 2.098511e-11 5.048318e-08
## ENSDARG00000043171 4.545705 3.4507011 44.06328 3.179291e-11 6.883483e-08
table(resultAll_DGE14EE2_10_2 <- decideTestsDGE(All_DGE14.fitEE2_10_2))
##
## -1 0 1
## 2 21635 14
topTags(All_DGE14.fitEE2_10_2)
## Coefficient: treatmentEE2_10
## logFC logCPM LR PValue FDR
## ENSDARG00000041134 1.965283 5.3669909 36.82464 1.292462e-09 0.0000279831
## ENSDARG00000082628 5.348045 6.4865874 28.06254 1.174578e-07 0.0011249984
## ENSDARG00000097996 7.172022 0.5607916 27.40531 1.649780e-07 0.0011249984
## ENSDARG00000063276 -7.313692 0.3178874 26.95877 2.078423e-07 0.0011249984
## ENSDARG00000083042 4.059913 1.2876101 25.70350 3.981034e-07 0.0017238672
## ENSDARG00000073898 3.990033 3.7259670 25.27043 4.982891e-07 0.0017980763
## ENSDARG00000093844 3.904292 4.1152500 23.02343 1.600386e-06 0.0049499946
## ENSDARG00000088811 2.637950 4.3306905 21.78732 3.046063e-06 0.0082437897
## ENSDARG00000040553 3.213632 1.5835259 21.53709 3.470513e-06 0.0083488970
## ENSDARG00000045834 3.606974 3.9178608 20.80806 5.076887e-06 0.0109919691
### Detection of DE for only coding genes with blocking
All_DGE14.fit.nc.rm_2 <- glmFit(All_DGE14.nc.rm_2, designModel2)
All_DGE14.fit.nc.rm.EE2_3_2 <- glmLRT(All_DGE14.fit.nc.rm_2, coef = "treatmentEE2_3")
All_DGE14.fit.nc.rm.EE2_10_2 <- glmLRT(All_DGE14.fit.nc.rm_2, coef = "treatmentEE2_10")
Table.All_DGE14.nc.rm.EE2_3_2 <- topTags(All_DGE14.fit.nc.rm.EE2_3_2, n=nrow(All_DGE.nc.rm14))$table
Table.All_DGE14.nc.rm.EE2_10_2 <- topTags(All_DGE14.fit.nc.rm.EE2_10_2, n=nrow(All_DGE.nc.rm14))$table
table(resultAll_DGE14.nc.rm.EE2_3_2 <- decideTestsDGE(All_DGE14.fit.nc.rm.EE2_3_2))
##
## -1 0 1
## 21 20210 229
topTags(All_DGE14.fit.nc.rm.EE2_3_2)
## Coefficient: treatmentEE2_3
## logFC logCPM LR PValue FDR
## ENSDARG00000093844 7.255804 4.1151604 64.87644 7.974485e-16 1.631580e-11
## ENSDARG00000045834 6.698465 3.9177290 56.86054 4.678394e-14 4.442595e-10
## ENSDARG00000009443 6.074038 6.3706228 56.20966 6.514068e-14 4.442595e-10
## ENSDARG00000043722 5.283322 4.5610506 48.90390 2.688155e-12 1.374991e-08
## ENSDARG00000009153 5.543280 4.4597880 46.53255 9.010935e-12 2.945715e-08
## ENSDARG00000071306 8.362581 0.5447195 46.46166 9.342877e-12 2.945715e-08
## ENSDARG00000088158 4.454267 3.3890450 46.31321 1.007820e-11 2.945715e-08
## ENSDARG00000091609 4.874057 3.2309494 45.76412 1.333845e-11 3.148562e-08
## ENSDARG00000074613 6.129667 6.1890123 45.69040 1.384998e-11 3.148562e-08
## ENSDARG00000043171 4.544233 3.4506053 44.67218 2.329441e-11 4.766037e-08
table(resultAll_DGE14.nc.rm.EE2_10_2 <- decideTestsDGE(All_DGE14.fit.nc.rm.EE2_10_2))
##
## -1 0 1
## 1 20449 10
topTags(All_DGE14.fit.nc.rm.EE2_10_2)
## Coefficient: treatmentEE2_10
## logFC logCPM LR PValue FDR
## ENSDARG00000041134 1.962180 5.3661911 37.33345 9.956243e-10 2.037047e-05
## ENSDARG00000063276 -7.323259 0.3182241 27.20279 1.831938e-07 1.688309e-03
## ENSDARG00000073898 4.027082 3.7260486 26.62089 2.475526e-07 1.688309e-03
## ENSDARG00000093844 3.903872 4.1151604 23.12375 1.519029e-06 7.769834e-03
## ENSDARG00000088811 2.633743 4.3301115 22.04605 2.661872e-06 1.089238e-02
## ENSDARG00000040553 3.212572 1.5827829 21.44060 3.649607e-06 1.244516e-02
## ENSDARG00000045834 3.608489 3.9177290 21.04607 4.483724e-06 1.310529e-02
## ENSDARG00000043722 2.845846 4.5610506 20.34622 6.462070e-06 1.652674e-02
## ENSDARG00000074613 3.166565 6.1890123 18.57220 1.635887e-05 3.718916e-02
## ENSDARG00000053966 1.537037 5.2362430 17.96562 2.249316e-05 4.602100e-02
Below is the same analysis but omitting the sample Control 1 that seem to show a very different pattern.
All_DGEfiltered <- All_DGE[,c(2:12)] # remove the samples
designfiltered <- design[c(2:12),]
treatmentfiltered <- factor(designfiltered$Treatment)
designModelfiltered <- model.matrix(~treatmentfiltered)
All_DGE.nc.rmfiltered <- All_DGEfiltered
All_DGE.nc.rmfiltered$counts <- All_DGE.nc.rmfiltered$counts[!(row.names(All_DGE.nc.rmfiltered$counts)%in%listncRNA[,1]),]
All_DGE.keep13filtered <- rowSums(cpm(All_DGEfiltered)>1) >= 3 # create filtered that needs at least 1 cpm in 4 libraries (number of replicates in each treatment) for the genes to be retained in analysis
All_DGE.nc.rm.keep13filtered <- rowSums(cpm(All_DGE.nc.rmfiltered)>1) >= 3 # create filtered that needs at least 1 cpm in 4 libraries (number of replicates in each treatment) for the genes to be retained in analysis
table(All_DGE.keep13filtered) # how many genes are retained after filtering
## All_DGE.keep13filtered
## FALSE TRUE
## 11674 22063
table(All_DGE.nc.rm.keep13filtered) # how many genes are retained after filtering
## All_DGE.nc.rm.keep13filtered
## FALSE TRUE
## 6005 20807
All_DGE13filtered <- All_DGEfiltered[All_DGE.keep13filtered,] # do the actual filtering
All_DGE.nc.rm13filtered <- All_DGE.nc.rmfiltered[All_DGE.nc.rm.keep13filtered,] # do the actual filtering
plotMDS(All_DGE13filtered, dim.plot=c(1,2),labels=labels[c(2:12)], xlim=c(-4,6))
plotMDS(All_DGE.nc.rm13filtered, dim.plot=c(1,2),xlim=c(-4,6))
levelplot(cor(All_DGE13filtered$counts))
levelplot(cor(All_DGE.nc.rm13filtered$counts))
All_DGE13filtered <- calcNormFactors(All_DGE13filtered)
All_DGE13filtered <- estimateGLMCommonDisp(All_DGE13filtered, designModelfiltered, verbose=TRUE)
## Disp = 0.12291 , BCV = 0.3506
All_DGE13filtered <- estimateGLMTrendedDisp(All_DGE13filtered, designModelfiltered)
All_DGE13filtered <- estimateGLMTagwiseDisp(All_DGE13filtered, designModelfiltered)
All_DGE.nc.rm13filtered <- calcNormFactors(All_DGE.nc.rm13filtered)
All_DGE.nc.rm13filtered <- estimateGLMCommonDisp(All_DGE.nc.rm13filtered, designModelfiltered, verbose=TRUE)
## Disp = 0.12392 , BCV = 0.352
All_DGE.nc.rm13filtered <- estimateGLMTrendedDisp(All_DGE.nc.rm13filtered, designModelfiltered)
All_DGE.nc.rm13filtered <- estimateGLMTagwiseDisp(All_DGE.nc.rm13filtered, designModelfiltered)
All_DGE13filtered.fit <- glmFit(All_DGE13filtered, designModelfiltered)
All_DGE13filtered.fitEE2_3 <- glmLRT(All_DGE13filtered.fit, coef = "treatmentfilteredEE2_3")
All_DGE13filtered.fitEE2_10 <- glmLRT(All_DGE13filtered.fit, coef = "treatmentfilteredEE2_10")
Table.All_DGE13filteredEE2_3 <- topTags(All_DGE13filtered.fitEE2_3, n=nrow(All_DGE13filtered))$table
Table.All_DGE13filteredEE2_10 <- topTags(All_DGE13filtered.fitEE2_10, n=nrow(All_DGE13filtered))
table(resultAll_DGE13filteredEE2_3 <- decideTestsDGE(All_DGE13filtered.fitEE2_3))
##
## -1 0 1
## 6 22048 9
topTags(All_DGE13filtered.fitEE2_3, n = 16)
## Coefficient: treatmentfilteredEE2_3
## logFC logCPM LR PValue
## ENSDARG00000081077 -3.259998 3.2903327 36.34908 1.649571e-09
## ENSDARG00000085168 -7.244447 6.6105761 29.55016 5.448806e-08
## ENSDARG00000083266 -2.326151 2.5375884 25.65639 4.079421e-07
## ENSDARG00000092458 -4.182849 0.8985077 25.25233 5.029865e-07
## ENSDARG00000041339 7.412487 1.8527827 22.58908 2.006216e-06
## ENSDARG00000081218 -11.715476 13.3276119 22.53318 2.065456e-06
## ENSDARG00000097996 6.792520 0.6288484 22.43087 2.178447e-06
## ENSDARG00000091399 -5.263067 -0.1244900 22.31618 2.312499e-06
## ENSDARG00000095618 1.929696 2.8820710 21.78112 3.055921e-06
## ENSDARG00000073820 5.316315 1.8743178 21.64067 3.288048e-06
## ENSDARG00000062788 5.233288 4.0704738 19.89996 8.160215e-06
## ENSDARG00000097157 8.596447 1.3035282 19.52685 9.919581e-06
## ENSDARG00000069503 4.461870 2.3473625 19.20727 1.172661e-05
## ENSDARG00000075193 1.754298 3.5530263 19.20687 1.172908e-05
## ENSDARG00000088811 3.062316 4.0254160 18.55743 1.648615e-05
## ENSDARG00000006427 5.600417 2.5856179 16.68330 4.416814e-05
## FDR
## ENSDARG00000081077 3.639448e-05
## ENSDARG00000085168 6.010851e-04
## ENSDARG00000083266 2.774348e-03
## ENSDARG00000092458 2.774348e-03
## ENSDARG00000041339 6.377582e-03
## ENSDARG00000081218 6.377582e-03
## ENSDARG00000097996 6.377582e-03
## ENSDARG00000091399 6.377582e-03
## ENSDARG00000095618 7.254421e-03
## ENSDARG00000073820 7.254421e-03
## ENSDARG00000062788 1.636717e-02
## ENSDARG00000097157 1.823798e-02
## ENSDARG00000069503 1.848419e-02
## ENSDARG00000075193 1.848419e-02
## ENSDARG00000088811 2.424893e-02
## ENSDARG00000006427 6.090511e-02
table(resultAll_DGE13filteredEE2_10 <- decideTestsDGE(All_DGE13filtered.fitEE2_10))
##
## -1 0 1
## 2 22060 1
topTags(All_DGE13filtered.fitEE2_10)
## Coefficient: treatmentfilteredEE2_10
## logFC logCPM LR PValue
## ENSDARG00000085168 -8.488074 6.6105761 36.56610 1.475757e-09
## ENSDARG00000081218 -13.173566 13.3276119 25.95907 3.487329e-07
## ENSDARG00000097996 6.941902 0.6288484 23.64886 1.156143e-06
## ENSDARG00000083042 3.796417 1.4065348 18.76849 1.475851e-05
## ENSDARG00000080404 -2.263103 2.3943284 17.26530 3.250696e-05
## ENSDARG00000082850 -1.652810 4.4890515 16.51254 4.832936e-05
## ENSDARG00000082628 5.649366 6.1223891 16.05600 6.149642e-05
## ENSDARG00000063276 -6.961693 0.1743958 15.95148 6.498708e-05
## ENSDARG00000083431 -2.770505 7.9692558 15.46942 8.385103e-05
## ENSDARG00000080372 -1.876902 3.8522501 15.24261 9.454575e-05
## FDR
## ENSDARG00000085168 3.255963e-05
## ENSDARG00000081218 3.847047e-03
## ENSDARG00000097996 8.502662e-03
## ENSDARG00000083042 8.140427e-02
## ENSDARG00000080404 1.434402e-01
## ENSDARG00000082850 1.777151e-01
## ENSDARG00000082628 1.792262e-01
## ENSDARG00000063276 1.792262e-01
## ENSDARG00000083431 2.035962e-01
## ENSDARG00000080372 2.035962e-01
All_DGE.nc.rm13filtered.fit <- glmFit(All_DGE.nc.rm13filtered, designModelfiltered)
All_DGE.nc.rm13filtered.fitEE2_3 <- glmLRT(All_DGE.nc.rm13filtered.fit, coef = "treatmentfilteredEE2_3")
All_DGE.nc.rm13filtered.fitEE2_10 <- glmLRT(All_DGE.nc.rm13filtered.fit, coef = "treatmentfilteredEE2_10")
Table.All_DGE.nc.rm13filteredEE2_3 <- topTags(All_DGE.nc.rm13filtered.fitEE2_3, n=nrow(All_DGE.nc.rm13filtered))$table
Table.All_DGE.nc.rm13filteredEE2_10 <- topTags(All_DGE.nc.rm13filtered.fitEE2_10, n=nrow(All_DGE.nc.rm13filtered))$table
table(resultAll_DGE.nc.rm13filteredEE2_3 <- decideTestsDGE(All_DGE.nc.rm13filtered.fitEE2_3))
##
## -1 0 1
## 2 20799 6
topTags(All_DGE.nc.rm13filtered.fitEE2_3)
## Coefficient: treatmentfilteredEE2_3
## logFC logCPM LR PValue FDR
## ENSDARG00000092458 -4.192857 0.8989275 25.40946 4.636390e-07 0.009646936
## ENSDARG00000091399 -5.274667 -0.1238424 22.76679 1.828990e-06 0.014329419
## ENSDARG00000041339 7.393485 1.8446449 22.53262 2.066048e-06 0.014329419
## ENSDARG00000095618 1.921188 2.8800594 21.67142 3.235768e-06 0.014595547
## ENSDARG00000073820 5.304972 1.8712922 21.51683 3.507365e-06 0.014595547
## ENSDARG00000062788 5.207482 4.0533591 19.88044 8.243934e-06 0.028588588
## ENSDARG00000069503 4.444508 2.3419183 19.09502 1.243691e-05 0.036967841
## ENSDARG00000088811 3.055496 4.0237194 18.55239 1.652974e-05 0.042991793
## ENSDARG00000006427 5.574731 2.5729826 16.62898 4.545132e-05 0.105078391
## ENSDARG00000009544 5.339240 2.8029574 16.33786 5.299453e-05 0.106496225
table(resultAll_DGE.nc.rm13filteredEE2_10 <- decideTestsDGE(All_DGE.nc.rm13filtered.fitEE2_10))
##
## 0
## 20807
topTags(All_DGE.nc.rm13filtered.fitEE2_10)
## Coefficient: treatmentfilteredEE2_10
## logFC logCPM LR PValue FDR
## ENSDARG00000063276 -6.972995 0.1750804 16.03918 6.204529e-05 0.9990506
## ENSDARG00000016713 -3.435634 0.2128911 15.05291 1.045389e-04 0.9990506
## ENSDARG00000088811 2.610579 4.0237194 14.10608 1.727840e-04 0.9990506
## ENSDARG00000076761 -1.188794 4.9600651 13.74334 2.095628e-04 0.9990506
## ENSDARG00000010662 1.143213 5.2883990 13.25931 2.712295e-04 0.9990506
## ENSDARG00000052564 3.237355 -0.1653769 12.46131 4.154685e-04 0.9990506
## ENSDARG00000089687 1.343595 5.7110479 12.01383 5.280724e-04 0.9990506
## ENSDARG00000003687 2.196638 1.5092463 12.00695 5.300256e-04 0.9990506
## ENSDARG00000089762 -1.387607 2.2977225 11.96382 5.424362e-04 0.9990506
## ENSDARG00000037861 1.411788 3.7600213 11.50895 6.926184e-04 0.9990506