RNA analysis of Zebra fish Gonads

This document describes contains some basic information from the RNA-seq data from Zebra fish. All commands needed to run the analysis that have been done. Based on these results I can perform downstream analysis of the results.

Lib Name	Treatment	Replicate	Number of reads	Useful reads
up-87-1	Control	1	19741198	8153032
up-87-2	Control	2	22141691	9051586
up-87-3	Control	3	19379328	8847111
up-87-4	Control	4	20509164	5244292
up-87-5	EE2-3	1	21248836	6671255
up-87-6	EE2-3	2	19692093	9482823
up-87-7	EE2-3	3	19737758	7617444
up-87-8	EE2-3	4	14064762	6310603
up-87-9	EE2-10	1	18336786	10896593
up-87-10	EE2-10	2	14963248	8864047
up-87-11	EE2-10	3	13291577	7756863
up-87-12	EE2-10	4	21512379	7836155

Quality control and mapping

The average quality or reads looks okay, but the mean level is slightly lower than commonly seen for Illumina reads. I think this expected with this technology as this is seen in all projects that I have been involved in that use proton data. I hence see no need to filter reads prior to mapping reads to the references sequence.

Control 1 Control 2 Control 3 Control 4 EE2-3 1 EE2-3 1 EE2-3 2 EE2-3 2 EE2-3 3 EE2-3 3 EE2-3 4 EE2-3 4 EE2-10 1 EE2-10 2 EE2-10 3 EE2-10 4

The program Star with the following settings.


genome=/home/thomask/Zebra092/nobackup/Danio_rerio/Ensembl/Zv9/Sequence/WholeGenomeFasta/genome.fa
gtf=/home/thomask/Zebra092/nobackup/Danio_rerio/Ensembl/Zv9/Annotation/Genes/genes.gtf
dir=/home/thomask/Zebra1481/private/Reference
Mapping=/home/thomask/Zebra1481/private/Mapping

cd /proj/b2014092/private/up87/

for file in *.fastq
    do
    star --readFilesIn ${file} --genomeDir ${dir} --outFileNamePrefix ~/Zebra1481/private/Mapping/${file} --outSAMtype BAM SortedByCoordinate --runThreadN 16 
    done

Prior to this the reference genome were indexed using with the following code

genome=/home/thomask/Zebra092/nobackup/Danio_rerio/Ensembl/Zv9/Sequence/WholeGenomeFasta/genome.fa
gtf=/home/thomask/Zebra092/nobackup/Danio_rerio/Ensembl/Zv9/Annotation/Genes/genes.gtf
dir=/home/thomask/Zebra1481/private/Reference

cd ${dir}
star --runThreadN 16 --runMode genomeGenerate --genomeDir ${dir} --genomeFastaFiles ${genome} --sjdbGTFfile ${gtf} --sjdbOverhang 100

The mapped reads were converted to read counts per gene with the script htseq-count as follows

gtf=/home/thomask/Zebra092/nobackup/Danio_rerio/Ensembl/Zv9/Annotation/Genes/genes.gtf

cd /home/thomask/Zebra1481/private/Mapping
for file in *.bam
    do
    htseq-count -f bam ${file} ${gtf} > Counts_${file}
    done

From the count data additional quality control steps were performed. First of all correlations among samples and replicates were performed. The average correlation among samples were 0.91 on the unfiltered data and 0.97 when filtering the genes annotated as non-coding. The shift in average correlation is largely due to the control replicate 4 as seen in the plot ‘Correlations among samples, all genes’. This would suggest that filtering the non-coding genes will lead to a data set with less noise.

The same plot, but with non-coding genes clearly shows that sample control 4 is showing a better correlation to the other samples once the non-coding genes. It also highlights that the control 1 does over the coding genes show low correlations to all but one of the other samples.

library(lattice) # Plotting library
library(edgeR) # For short read analysis

## Loading required package: limma

library(biomaRt) # Convert ensembl data to common gene names etc

## Warning: package 'biomaRt' was built under R version 3.2.2

## Creating a generic function for 'nchar' from package 'base' in package 'S4Vectors'

rm(list=ls()) # remove objects
setwd("~/Dropbox/BilsWork/1481_b2014092/1481") 

design <- read.table("Design.txt", header=T) # read sample info
treatment <- factor(design$Treatment) # convert Treatment to factor
related <- as.factor(c("A","B","C","D","E","B","F","D","B","D","C","D"))
designModel <- model.matrix(~treatment) # create design matrix from treatment
designModel2 <- model.matrix(~related + treatment)

zebraSamples <- dir("../Counts/",pattern="^Counts", full.names = TRUE) # read in bam file names
zebraSamples <- c(zebraSamples[1],zebraSamples[5:12],zebraSamples[2:4]) # rearrange order
controlSamples <- zebraSamples[c(1:4)]
EE2_3Samples <- zebraSamples[c(5:8)]
EE2_10Samples <- zebraSamples[c(9:12)]
Groups <- c(1,1,1,1,2,2,2,2,3,3,3,3)
labels=c("Control_1", "Control_2", "Control_3", "Control_4","EE2_3_1", "EE2_3_2", "EE2_3_3", "EE2_3_4","EE2_10_1", "EE2_10_2", "EE2_10_3", "EE2_10_4")

All_DGE <- readDGE(zebraSamples, header=F, group=Groups, comment.char = "_", labels=labels) # create digital gene expression matrix


All_DGE.nc.rm <- All_DGE # make a copy
listncRNA <- read.table('listofncRNA.txt') # read in list of non-coding loci
All_DGE.nc.rm$counts <- All_DGE.nc.rm$counts[!(row.names(All_DGE.nc.rm$counts)%in%listncRNA[,1]),] # filter all non-coding counts

plotMDS(All_DGE, dim.plot=c(1,2),labels=labels, xlim=c(-4,6), main = 'MDS plot showing sim. among samples, All genes')

plotMDS(All_DGE.nc.rm, dim.plot=c(1,2), xlim=c(-4,6), main = 'MDS plot showing sim. among samples, Only coding genes')

plotMDS(All_DGE, dim.plot=c(1,2),labels=related, xlim=c(-4,6), main = 'MDS plot showing sim. among samples, All genes')

levelplot(cor(All_DGE$counts), main ='Correlations among samples, all genes', xlab = 'Sample', ylab = 'Sample')

levelplot(cor(All_DGE.nc.rm$counts[,2:12]), main ='Correlations among samples, only coding genes', xlab = 'Sample', ylab = 'Sample')

As genes that have 0 or very few genes mapping to them is not meaningful to analyse for differential gene expression, one often filter genes that are lowly expressed. Below I filter all genes that show less than 1 read per million in more than 4 samples.

This drops 12086 genes from the 33737 that are found full data set. For the data set with only coding genes the corresponding figures are 6352 and 26812. In both data sets we hence end up with similar number of genes after filtering the lowly expressed once, suggesting that very few non-coding genes have high enough expression levels to be retained in the analysis.

All_DGE.keep14 <- rowSums(cpm(All_DGE)>1) >= 4 # create filtered that needs at least 1 cpm in 4 libraries (number of replicates in each treatment) for the genes to be retained in analysis
All_DGE.nc.rm.keep14 <- rowSums(cpm(All_DGE.nc.rm)>1) >= 4 # create filtered that needs at least 1 cpm in 4 libraries (number of replicates in each treatment) for the genes to be retained in analysis
table(All_DGE.keep14) # how many genes are retained after filtering

## All_DGE.keep14
## FALSE  TRUE 
## 12086 21651

table(All_DGE.nc.rm.keep14) # how many genes are retained after filtering

## All_DGE.nc.rm.keep14
## FALSE  TRUE 
##  6352 20460

All_DGE14 <- All_DGE[All_DGE.keep14,] # do the actual filtering
All_DGE.nc.rm14 <- All_DGE.nc.rm[All_DGE.nc.rm.keep14,] # do the actual filtering
# levelplot(cor(All_DGE14$counts))
# levelplot(cor(All_DGE.nc.rm14$counts))


All_DGE14 <- calcNormFactors(All_DGE14)
All_DGE14 <- estimateGLMCommonDisp(All_DGE14, designModel, verbose=TRUE)

## Disp = 0.17522 , BCV = 0.4186

All_DGE14 <- estimateGLMTrendedDisp(All_DGE14, designModel)
All_DGE14 <- estimateGLMTagwiseDisp(All_DGE14, designModel)

All_DGE.nc.rm14 <- calcNormFactors(All_DGE.nc.rm14)
All_DGE.nc.rm14 <- estimateGLMCommonDisp(All_DGE.nc.rm14, designModel, verbose=TRUE)

## Disp = 0.1739 , BCV = 0.417

All_DGE.nc.rm14 <- estimateGLMTrendedDisp(All_DGE.nc.rm14, designModel)
All_DGE.nc.rm14 <- estimateGLMTagwiseDisp(All_DGE.nc.rm14, designModel)
### Detection of DE for all samples no blocking
All_DGE14.fit <- glmFit(All_DGE14, designModel)
All_DGE14.fitEE2_3 <- glmLRT(All_DGE14.fit, coef = "treatmentEE2_3")
All_DGE14.fitEE2_10 <- glmLRT(All_DGE14.fit, coef = "treatmentEE2_10")
Table.All_DGE14EE2_3 <- topTags(All_DGE14.fitEE2_3, n=nrow(All_DGE14))$table
Table.All_DGE14EE2_10 <- topTags(All_DGE14.fitEE2_10, n=nrow(All_DGE14))$table
table(resultAll_DGE14EE2_3 <- decideTestsDGE(All_DGE14.fitEE2_3))

## 
##    -1     0     1 
##     7 21640     4

topTags(All_DGE14.fitEE2_3)

## Coefficient:  treatmentEE2_3 
##                        logFC     logCPM       LR       PValue          FDR
## ENSDARG00000081077 -3.288345  3.3600861 40.20076 2.291587e-10 4.961514e-06
## ENSDARG00000083266 -2.401551  2.6333895 29.36339 6.000053e-08 6.495357e-04
## ENSDARG00000095618  1.976902  2.7687710 26.05853 3.312222e-07 2.390431e-03
## ENSDARG00000075193  1.871694  3.4215179 24.15760 8.876533e-07 4.804645e-03
## ENSDARG00000091399 -5.037778 -0.1574564 22.59628 1.998715e-06 7.339875e-03
## ENSDARG00000092458 -3.941008  0.8175363 22.56261 2.034052e-06 7.339875e-03
## ENSDARG00000085168 -6.851176  6.4469853 21.64316 3.283782e-06 1.015674e-02
## ENSDARG00000086157 -1.566132  3.4880740 19.48327 1.014845e-05 2.441617e-02
## ENSDARG00000003687  2.558923  1.3867636 19.48309 1.014944e-05 2.441617e-02
## ENSDARG00000083434 -2.311802  2.1263820 18.15711 2.034084e-05 4.403994e-02

table(resultAll_DGE14EE2_10 <- decideTestsDGE(All_DGE14.fitEE2_10))

## 
##    -1     0 
##     1 21650

topTags(All_DGE14.fitEE2_10)

## Coefficient:  treatmentEE2_10 
##                         logFC     logCPM       LR       PValue         FDR
## ENSDARG00000085168  -8.088604  6.4469853 27.03828 1.994657e-07 0.004318632
## ENSDARG00000083042   3.449144  1.2876101 19.85616 8.349346e-06 0.063932314
## ENSDARG00000085829  -2.225485  4.0341587 19.40161 1.059175e-05 0.063932314
## ENSDARG00000063276  -7.164567  0.3178874 18.75954 1.482796e-05 0.063932314
## ENSDARG00000081218 -12.773096 13.1612192 18.31732 1.869993e-05 0.063932314
## ENSDARG00000080404  -2.209441  2.4327117 18.28056 1.906429e-05 0.063932314
## ENSDARG00000082850  -1.720103  4.5786217 18.02625 2.178797e-05 0.063932314
## ENSDARG00000080372  -1.957950  3.9352533 17.87235 2.362286e-05 0.063932314
## ENSDARG00000070000  -4.058072  4.4499515 17.31693 3.163566e-05 0.076104843
## ENSDARG00000080499  -1.750753  3.5362893 16.52946 4.789999e-05 0.103708261

### Detection of DE for coding genes no blocking
All_DGE.nc.rm14.fit <- glmFit(All_DGE.nc.rm14, designModel)
All_DGE.nc.rm14.fitEE2_3 <- glmLRT(All_DGE.nc.rm14.fit, coef = "treatmentEE2_3")
All_DGE.nc.rm14.fitEE2_10 <- glmLRT(All_DGE.nc.rm14.fit, coef = "treatmentEE2_10")
Table.All_DGE.nc.rm14EE2_3 <- topTags(All_DGE.nc.rm14.fitEE2_3, n=nrow(All_DGE.nc.rm14))$table
Table.All_DGE.nc.rm14EE2_10 <- topTags(All_DGE.nc.rm14.fitEE2_10, n=nrow(All_DGE.nc.rm14))$table
table(resultAll_DGE.nc.rm14EE2_3 <- decideTestsDGE(All_DGE.nc.rm14.fitEE2_3))

## 
##    -1     0     1 
##     2 20457     1

topTags(All_DGE.nc.rm14.fitEE2_3)

## Coefficient:  treatmentEE2_3 
##                        logFC     logCPM       LR       PValue         FDR
## ENSDARG00000095618  1.974458  2.7682458 26.12925 3.193102e-07 0.006533087
## ENSDARG00000091399 -5.040822 -0.1575644 22.67616 1.917318e-06 0.015061744
## ENSDARG00000092458 -3.942801  0.8172499 22.40458 2.208467e-06 0.015061744
## ENSDARG00000003687  2.556425  1.3864546 19.34142 1.093091e-05 0.055911591
## ENSDARG00000041339  5.427560  1.7224635 17.71788 2.562076e-05 0.104840145
## ENSDARG00000062788  4.223931  3.9358022 16.98773 3.762222e-05 0.112064325
## ENSDARG00000070242  4.178573  3.3506308 16.95181 3.834068e-05 0.112064325
## ENSDARG00000058556  3.748632  3.4474707 16.12635 5.925373e-05 0.138706434
## ENSDARG00000070000 -3.848570  4.4497926 15.85823 6.826910e-05 0.138706434
## ENSDARG00000026925  4.178863  4.7479317 15.84352 6.880175e-05 0.138706434

table(resultAll_DGE.nc.rm14EE2_10 <- decideTestsDGE(All_DGE.nc.rm14.fitEE2_10))

## 
##     0 
## 20460

topTags(All_DGE.nc.rm14.fitEE2_10)

## Coefficient:  treatmentEE2_10 
##                        logFC    logCPM       LR       PValue       FDR
## ENSDARG00000063276 -7.169694 0.3182241 18.81848 1.437671e-05 0.2941475
## ENSDARG00000070000 -4.059054 4.4497926 17.34293 3.120577e-05 0.3192351
## ENSDARG00000034058 -3.790933 3.2150098 15.54090 8.073931e-05 0.3426614
## ENSDARG00000045835  6.744297 4.6691785 15.26711 9.332759e-05 0.3426614
## ENSDARG00000087368 -5.346894 3.8798049 14.86499 1.154867e-04 0.3426614
## ENSDARG00000003687  2.221157 1.3864546 14.74638 1.229835e-04 0.3426614
## ENSDARG00000092022  2.195444 4.2116451 14.60121 1.328295e-04 0.3426614
## ENSDARG00000037861  1.464917 3.6483854 14.30059 1.558161e-04 0.3426614
## ENSDARG00000033683 -3.700305 4.7099720 14.26634 1.586772e-04 0.3426614
## ENSDARG00000086739 -2.217343 0.9191311 14.16475 1.674787e-04 0.3426614

# plotBCV(All_DGE14)
# plotBCV(All_DGE.nc.rm14)


All_DGE14_2 <- estimateGLMCommonDisp(All_DGE14, designModel2, verbose=TRUE)

## Disp = 0.10716 , BCV = 0.3274

All_DGE14_2 <- estimateGLMTrendedDisp(All_DGE14_2, designModel2)
All_DGE14_2 <- estimateGLMTagwiseDisp(All_DGE14_2, designModel2)
All_DGE14.nc.rm_2 <- estimateGLMCommonDisp(All_DGE.nc.rm14, designModel2, verbose=TRUE)

## Disp = 0.10706 , BCV = 0.3272

All_DGE14.nc.rm_2 <- estimateGLMTrendedDisp(All_DGE14.nc.rm_2, designModel2)
All_DGE14.nc.rm_2 <- estimateGLMTagwiseDisp(All_DGE14.nc.rm_2, designModel2)


### Detection of DE for all genes with blocking
All_DGE14.fit_2 <- glmFit(All_DGE14_2, designModel2)
All_DGE14.fitEE2_3_2 <- glmLRT(All_DGE14.fit_2, coef = "treatmentEE2_3")
All_DGE14.fitEE2_10_2 <- glmLRT(All_DGE14.fit_2, coef = "treatmentEE2_10")
Table.All_DGE14EE2_3_2 <- topTags(All_DGE14.fitEE2_3_2, n=nrow(All_DGE14))$table
Table.All_DGE14EE2_10_2 <- topTags(All_DGE14.fitEE2_10_2, n=nrow(All_DGE14))$table
table(resultAll_DGE14EE2_3_2 <- decideTestsDGE(All_DGE14.fitEE2_3_2))

## 
##    -1     0     1 
##    24 21402   225

topTags(All_DGE14.fitEE2_3_2)

## Coefficient:  treatmentEE2_3 
##                       logFC    logCPM       LR       PValue          FDR
## ENSDARG00000093844 7.256443 4.1152500 64.55146 9.404404e-16 2.036148e-11
## ENSDARG00000045834 6.698168 3.9178608 56.16264 6.671742e-14 5.853737e-10
## ENSDARG00000009443 6.078090 6.3709226 55.77858 8.111040e-14 5.853737e-10
## ENSDARG00000043722 5.287106 4.5613547 48.73027 2.936981e-12 1.589714e-08
## ENSDARG00000071306 8.366552 0.5449215 46.59824 8.713854e-12 3.495817e-08
## ENSDARG00000009153 5.546657 4.4599259 46.39064 9.687729e-12 3.495817e-08
## ENSDARG00000088158 4.456363 3.3894356 45.75943 1.337044e-11 3.952123e-08
## ENSDARG00000074613 6.133290 6.1891718 45.58670 1.460301e-11 3.952123e-08
## ENSDARG00000091609 4.870079 3.2311550 44.87660 2.098511e-11 5.048318e-08
## ENSDARG00000043171 4.545705 3.4507011 44.06328 3.179291e-11 6.883483e-08

table(resultAll_DGE14EE2_10_2 <- decideTestsDGE(All_DGE14.fitEE2_10_2))

## 
##    -1     0     1 
##     2 21635    14

topTags(All_DGE14.fitEE2_10_2)

## Coefficient:  treatmentEE2_10 
##                        logFC    logCPM       LR       PValue          FDR
## ENSDARG00000041134  1.965283 5.3669909 36.82464 1.292462e-09 0.0000279831
## ENSDARG00000082628  5.348045 6.4865874 28.06254 1.174578e-07 0.0011249984
## ENSDARG00000097996  7.172022 0.5607916 27.40531 1.649780e-07 0.0011249984
## ENSDARG00000063276 -7.313692 0.3178874 26.95877 2.078423e-07 0.0011249984
## ENSDARG00000083042  4.059913 1.2876101 25.70350 3.981034e-07 0.0017238672
## ENSDARG00000073898  3.990033 3.7259670 25.27043 4.982891e-07 0.0017980763
## ENSDARG00000093844  3.904292 4.1152500 23.02343 1.600386e-06 0.0049499946
## ENSDARG00000088811  2.637950 4.3306905 21.78732 3.046063e-06 0.0082437897
## ENSDARG00000040553  3.213632 1.5835259 21.53709 3.470513e-06 0.0083488970
## ENSDARG00000045834  3.606974 3.9178608 20.80806 5.076887e-06 0.0109919691

### Detection of DE for only coding genes with blocking
All_DGE14.fit.nc.rm_2 <- glmFit(All_DGE14.nc.rm_2, designModel2)
All_DGE14.fit.nc.rm.EE2_3_2 <- glmLRT(All_DGE14.fit.nc.rm_2, coef = "treatmentEE2_3")
All_DGE14.fit.nc.rm.EE2_10_2 <- glmLRT(All_DGE14.fit.nc.rm_2, coef = "treatmentEE2_10")
Table.All_DGE14.nc.rm.EE2_3_2 <- topTags(All_DGE14.fit.nc.rm.EE2_3_2, n=nrow(All_DGE.nc.rm14))$table
Table.All_DGE14.nc.rm.EE2_10_2 <- topTags(All_DGE14.fit.nc.rm.EE2_10_2, n=nrow(All_DGE.nc.rm14))$table
table(resultAll_DGE14.nc.rm.EE2_3_2 <- decideTestsDGE(All_DGE14.fit.nc.rm.EE2_3_2))

## 
##    -1     0     1 
##    21 20210   229

topTags(All_DGE14.fit.nc.rm.EE2_3_2)

## Coefficient:  treatmentEE2_3 
##                       logFC    logCPM       LR       PValue          FDR
## ENSDARG00000093844 7.255804 4.1151604 64.87644 7.974485e-16 1.631580e-11
## ENSDARG00000045834 6.698465 3.9177290 56.86054 4.678394e-14 4.442595e-10
## ENSDARG00000009443 6.074038 6.3706228 56.20966 6.514068e-14 4.442595e-10
## ENSDARG00000043722 5.283322 4.5610506 48.90390 2.688155e-12 1.374991e-08
## ENSDARG00000009153 5.543280 4.4597880 46.53255 9.010935e-12 2.945715e-08
## ENSDARG00000071306 8.362581 0.5447195 46.46166 9.342877e-12 2.945715e-08
## ENSDARG00000088158 4.454267 3.3890450 46.31321 1.007820e-11 2.945715e-08
## ENSDARG00000091609 4.874057 3.2309494 45.76412 1.333845e-11 3.148562e-08
## ENSDARG00000074613 6.129667 6.1890123 45.69040 1.384998e-11 3.148562e-08
## ENSDARG00000043171 4.544233 3.4506053 44.67218 2.329441e-11 4.766037e-08

table(resultAll_DGE14.nc.rm.EE2_10_2 <- decideTestsDGE(All_DGE14.fit.nc.rm.EE2_10_2))

## 
##    -1     0     1 
##     1 20449    10

topTags(All_DGE14.fit.nc.rm.EE2_10_2)

## Coefficient:  treatmentEE2_10 
##                        logFC    logCPM       LR       PValue          FDR
## ENSDARG00000041134  1.962180 5.3661911 37.33345 9.956243e-10 2.037047e-05
## ENSDARG00000063276 -7.323259 0.3182241 27.20279 1.831938e-07 1.688309e-03
## ENSDARG00000073898  4.027082 3.7260486 26.62089 2.475526e-07 1.688309e-03
## ENSDARG00000093844  3.903872 4.1151604 23.12375 1.519029e-06 7.769834e-03
## ENSDARG00000088811  2.633743 4.3301115 22.04605 2.661872e-06 1.089238e-02
## ENSDARG00000040553  3.212572 1.5827829 21.44060 3.649607e-06 1.244516e-02
## ENSDARG00000045834  3.608489 3.9177290 21.04607 4.483724e-06 1.310529e-02
## ENSDARG00000043722  2.845846 4.5610506 20.34622 6.462070e-06 1.652674e-02
## ENSDARG00000074613  3.166565 6.1890123 18.57220 1.635887e-05 3.718916e-02
## ENSDARG00000053966  1.537037 5.2362430 17.96562 2.249316e-05 4.602100e-02

Below is the same analysis but omitting the sample Control 1 that seem to show a very different pattern.

All_DGEfiltered <- All_DGE[,c(2:12)] # remove the samples
designfiltered <- design[c(2:12),]
treatmentfiltered <- factor(designfiltered$Treatment)
designModelfiltered <- model.matrix(~treatmentfiltered)

All_DGE.nc.rmfiltered <- All_DGEfiltered
All_DGE.nc.rmfiltered$counts <- All_DGE.nc.rmfiltered$counts[!(row.names(All_DGE.nc.rmfiltered$counts)%in%listncRNA[,1]),]

All_DGE.keep13filtered <- rowSums(cpm(All_DGEfiltered)>1) >= 3 # create filtered that needs at least 1 cpm in 4 libraries (number of replicates in each treatment) for the genes to be retained in analysis
All_DGE.nc.rm.keep13filtered <- rowSums(cpm(All_DGE.nc.rmfiltered)>1) >= 3 # create filtered that needs at least 1 cpm in 4 libraries (number of replicates in each treatment) for the genes to be retained in analysis
table(All_DGE.keep13filtered) # how many genes are retained after filtering

## All_DGE.keep13filtered
## FALSE  TRUE 
## 11674 22063

table(All_DGE.nc.rm.keep13filtered) # how many genes are retained after filtering

## All_DGE.nc.rm.keep13filtered
## FALSE  TRUE 
##  6005 20807

All_DGE13filtered <- All_DGEfiltered[All_DGE.keep13filtered,] # do the actual filtering
All_DGE.nc.rm13filtered <- All_DGE.nc.rmfiltered[All_DGE.nc.rm.keep13filtered,] # do the actual filtering

plotMDS(All_DGE13filtered, dim.plot=c(1,2),labels=labels[c(2:12)], xlim=c(-4,6))

plotMDS(All_DGE.nc.rm13filtered, dim.plot=c(1,2),xlim=c(-4,6))

levelplot(cor(All_DGE13filtered$counts))

levelplot(cor(All_DGE.nc.rm13filtered$counts))

All_DGE13filtered <- calcNormFactors(All_DGE13filtered)
All_DGE13filtered <- estimateGLMCommonDisp(All_DGE13filtered, designModelfiltered, verbose=TRUE)

## Disp = 0.12291 , BCV = 0.3506

All_DGE13filtered <- estimateGLMTrendedDisp(All_DGE13filtered, designModelfiltered)
All_DGE13filtered <- estimateGLMTagwiseDisp(All_DGE13filtered, designModelfiltered)

All_DGE.nc.rm13filtered <- calcNormFactors(All_DGE.nc.rm13filtered)
All_DGE.nc.rm13filtered <- estimateGLMCommonDisp(All_DGE.nc.rm13filtered, designModelfiltered, verbose=TRUE)

## Disp = 0.12392 , BCV = 0.352

All_DGE.nc.rm13filtered <- estimateGLMTrendedDisp(All_DGE.nc.rm13filtered, designModelfiltered)
All_DGE.nc.rm13filtered <- estimateGLMTagwiseDisp(All_DGE.nc.rm13filtered, designModelfiltered)

All_DGE13filtered.fit <- glmFit(All_DGE13filtered, designModelfiltered)
All_DGE13filtered.fitEE2_3 <- glmLRT(All_DGE13filtered.fit, coef = "treatmentfilteredEE2_3")
All_DGE13filtered.fitEE2_10 <- glmLRT(All_DGE13filtered.fit, coef = "treatmentfilteredEE2_10")
Table.All_DGE13filteredEE2_3 <- topTags(All_DGE13filtered.fitEE2_3, n=nrow(All_DGE13filtered))$table
Table.All_DGE13filteredEE2_10 <- topTags(All_DGE13filtered.fitEE2_10, n=nrow(All_DGE13filtered))
table(resultAll_DGE13filteredEE2_3 <- decideTestsDGE(All_DGE13filtered.fitEE2_3))

## 
##    -1     0     1 
##     6 22048     9

topTags(All_DGE13filtered.fitEE2_3, n = 16)

## Coefficient:  treatmentfilteredEE2_3 
##                         logFC     logCPM       LR       PValue
## ENSDARG00000081077  -3.259998  3.2903327 36.34908 1.649571e-09
## ENSDARG00000085168  -7.244447  6.6105761 29.55016 5.448806e-08
## ENSDARG00000083266  -2.326151  2.5375884 25.65639 4.079421e-07
## ENSDARG00000092458  -4.182849  0.8985077 25.25233 5.029865e-07
## ENSDARG00000041339   7.412487  1.8527827 22.58908 2.006216e-06
## ENSDARG00000081218 -11.715476 13.3276119 22.53318 2.065456e-06
## ENSDARG00000097996   6.792520  0.6288484 22.43087 2.178447e-06
## ENSDARG00000091399  -5.263067 -0.1244900 22.31618 2.312499e-06
## ENSDARG00000095618   1.929696  2.8820710 21.78112 3.055921e-06
## ENSDARG00000073820   5.316315  1.8743178 21.64067 3.288048e-06
## ENSDARG00000062788   5.233288  4.0704738 19.89996 8.160215e-06
## ENSDARG00000097157   8.596447  1.3035282 19.52685 9.919581e-06
## ENSDARG00000069503   4.461870  2.3473625 19.20727 1.172661e-05
## ENSDARG00000075193   1.754298  3.5530263 19.20687 1.172908e-05
## ENSDARG00000088811   3.062316  4.0254160 18.55743 1.648615e-05
## ENSDARG00000006427   5.600417  2.5856179 16.68330 4.416814e-05
##                             FDR
## ENSDARG00000081077 3.639448e-05
## ENSDARG00000085168 6.010851e-04
## ENSDARG00000083266 2.774348e-03
## ENSDARG00000092458 2.774348e-03
## ENSDARG00000041339 6.377582e-03
## ENSDARG00000081218 6.377582e-03
## ENSDARG00000097996 6.377582e-03
## ENSDARG00000091399 6.377582e-03
## ENSDARG00000095618 7.254421e-03
## ENSDARG00000073820 7.254421e-03
## ENSDARG00000062788 1.636717e-02
## ENSDARG00000097157 1.823798e-02
## ENSDARG00000069503 1.848419e-02
## ENSDARG00000075193 1.848419e-02
## ENSDARG00000088811 2.424893e-02
## ENSDARG00000006427 6.090511e-02

table(resultAll_DGE13filteredEE2_10 <- decideTestsDGE(All_DGE13filtered.fitEE2_10))

## 
##    -1     0     1 
##     2 22060     1

topTags(All_DGE13filtered.fitEE2_10)

## Coefficient:  treatmentfilteredEE2_10 
##                         logFC     logCPM       LR       PValue
## ENSDARG00000085168  -8.488074  6.6105761 36.56610 1.475757e-09
## ENSDARG00000081218 -13.173566 13.3276119 25.95907 3.487329e-07
## ENSDARG00000097996   6.941902  0.6288484 23.64886 1.156143e-06
## ENSDARG00000083042   3.796417  1.4065348 18.76849 1.475851e-05
## ENSDARG00000080404  -2.263103  2.3943284 17.26530 3.250696e-05
## ENSDARG00000082850  -1.652810  4.4890515 16.51254 4.832936e-05
## ENSDARG00000082628   5.649366  6.1223891 16.05600 6.149642e-05
## ENSDARG00000063276  -6.961693  0.1743958 15.95148 6.498708e-05
## ENSDARG00000083431  -2.770505  7.9692558 15.46942 8.385103e-05
## ENSDARG00000080372  -1.876902  3.8522501 15.24261 9.454575e-05
##                             FDR
## ENSDARG00000085168 3.255963e-05
## ENSDARG00000081218 3.847047e-03
## ENSDARG00000097996 8.502662e-03
## ENSDARG00000083042 8.140427e-02
## ENSDARG00000080404 1.434402e-01
## ENSDARG00000082850 1.777151e-01
## ENSDARG00000082628 1.792262e-01
## ENSDARG00000063276 1.792262e-01
## ENSDARG00000083431 2.035962e-01
## ENSDARG00000080372 2.035962e-01

All_DGE.nc.rm13filtered.fit <- glmFit(All_DGE.nc.rm13filtered, designModelfiltered)
All_DGE.nc.rm13filtered.fitEE2_3 <- glmLRT(All_DGE.nc.rm13filtered.fit, coef = "treatmentfilteredEE2_3")
All_DGE.nc.rm13filtered.fitEE2_10 <- glmLRT(All_DGE.nc.rm13filtered.fit, coef = "treatmentfilteredEE2_10")
Table.All_DGE.nc.rm13filteredEE2_3 <- topTags(All_DGE.nc.rm13filtered.fitEE2_3, n=nrow(All_DGE.nc.rm13filtered))$table
Table.All_DGE.nc.rm13filteredEE2_10 <- topTags(All_DGE.nc.rm13filtered.fitEE2_10, n=nrow(All_DGE.nc.rm13filtered))$table
table(resultAll_DGE.nc.rm13filteredEE2_3 <- decideTestsDGE(All_DGE.nc.rm13filtered.fitEE2_3))

## 
##    -1     0     1 
##     2 20799     6

topTags(All_DGE.nc.rm13filtered.fitEE2_3)

## Coefficient:  treatmentfilteredEE2_3 
##                        logFC     logCPM       LR       PValue         FDR
## ENSDARG00000092458 -4.192857  0.8989275 25.40946 4.636390e-07 0.009646936
## ENSDARG00000091399 -5.274667 -0.1238424 22.76679 1.828990e-06 0.014329419
## ENSDARG00000041339  7.393485  1.8446449 22.53262 2.066048e-06 0.014329419
## ENSDARG00000095618  1.921188  2.8800594 21.67142 3.235768e-06 0.014595547
## ENSDARG00000073820  5.304972  1.8712922 21.51683 3.507365e-06 0.014595547
## ENSDARG00000062788  5.207482  4.0533591 19.88044 8.243934e-06 0.028588588
## ENSDARG00000069503  4.444508  2.3419183 19.09502 1.243691e-05 0.036967841
## ENSDARG00000088811  3.055496  4.0237194 18.55239 1.652974e-05 0.042991793
## ENSDARG00000006427  5.574731  2.5729826 16.62898 4.545132e-05 0.105078391
## ENSDARG00000009544  5.339240  2.8029574 16.33786 5.299453e-05 0.106496225

    table(resultAll_DGE.nc.rm13filteredEE2_10 <- decideTestsDGE(All_DGE.nc.rm13filtered.fitEE2_10))

## 
##     0 
## 20807

topTags(All_DGE.nc.rm13filtered.fitEE2_10)

## Coefficient:  treatmentfilteredEE2_10 
##                        logFC     logCPM       LR       PValue       FDR
## ENSDARG00000063276 -6.972995  0.1750804 16.03918 6.204529e-05 0.9990506
## ENSDARG00000016713 -3.435634  0.2128911 15.05291 1.045389e-04 0.9990506
## ENSDARG00000088811  2.610579  4.0237194 14.10608 1.727840e-04 0.9990506
## ENSDARG00000076761 -1.188794  4.9600651 13.74334 2.095628e-04 0.9990506
## ENSDARG00000010662  1.143213  5.2883990 13.25931 2.712295e-04 0.9990506
## ENSDARG00000052564  3.237355 -0.1653769 12.46131 4.154685e-04 0.9990506
## ENSDARG00000089687  1.343595  5.7110479 12.01383 5.280724e-04 0.9990506
## ENSDARG00000003687  2.196638  1.5092463 12.00695 5.300256e-04 0.9990506
## ENSDARG00000089762 -1.387607  2.2977225 11.96382 5.424362e-04 0.9990506
## ENSDARG00000037861  1.411788  3.7600213 11.50895 6.926184e-04 0.9990506

RNA analysis of Zebra fish Gonads

Thomas Källman

Summer 2015

Data

Quality control and mapping