Preparing normalized gene expression table. It is adapted from RNA-seq workflow from bioconductor.org. Full version can be found here: https://www.bioconductor.org/help/workflows/rnaseqGene/.
Install related R packages:
source("https://bioconductor.org/biocLite.R")
Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help
A new version of Bioconductor is available after installing the most recent version of R; see
http://bioconductor.org/install
biocLite(c("airway", "DESeq2", "pheatmap", "RColorBrewer", "AnnotationDbi", "org.Hs.eg.db"))
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.4 (BiocInstaller 1.24.0), R 3.3.2 (2016-10-31).
Installing package(s) ‘airway’, ‘DESeq2’, ‘pheatmap’, ‘RColorBrewer’, ‘AnnotationDbi’,
‘org.Hs.eg.db’
trying URL 'https://bioconductor.org/packages/3.4/bioc/bin/macosx/mavericks/contrib/3.3/DESeq2_1.14.1.tgz'
Content type 'application/x-gzip' length 1781581 bytes (1.7 MB)
==================================================
downloaded 1.7 MB
trying URL 'http://cran.r-project.org/bin/macosx/mavericks/contrib/3.3/pheatmap_1.0.8.tgz'
Content type 'application/x-gzip' length 39380 bytes (38 KB)
==================================================
downloaded 38 KB
trying URL 'http://cran.r-project.org/bin/macosx/mavericks/contrib/3.3/RColorBrewer_1.1-2.tgz'
Content type 'application/x-gzip' length 24183 bytes (23 KB)
==================================================
downloaded 23 KB
trying URL 'https://bioconductor.org/packages/3.4/bioc/bin/macosx/mavericks/contrib/3.3/AnnotationDbi_1.36.2.tgz'
Content type 'application/x-gzip' length 4643424 bytes (4.4 MB)
==================================================
downloaded 4.4 MB
The downloaded binary packages are in
/var/folders/nk/nmr633rd2gzdfmjxgwql30whp_pl0r/T//RtmpfpzwjS/downloaded_packages
installing the source packages ‘airway’, ‘org.Hs.eg.db’
trying URL 'https://bioconductor.org/packages/3.4/data/experiment/src/contrib/airway_0.108.0.tar.gz'
Content type 'application/x-gzip' length 11755934 bytes (11.2 MB)
==================================================
downloaded 11.2 MB
trying URL 'https://bioconductor.org/packages/3.4/data/annotation/src/contrib/org.Hs.eg.db_3.4.0.tar.gz'
Content type 'application/x-gzip' length 69557350 bytes (66.3 MB)
==================================================
downloaded 66.3 MB
Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help
A new version of Bioconductor is available after installing the most recent
version of R; see http://bioconductor.org/install
* installing *source* package ‘airway’ ...
** data
** inst
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help
A new version of Bioconductor is available after installing the most recent
version of R; see http://bioconductor.org/install
Warning: package ‘GenomicRanges’ was built under R version 3.3.3
Warning: package ‘S4Vectors’ was built under R version 3.3.3
Warning: package ‘IRanges’ was built under R version 3.3.3
* DONE (airway)
Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help
A new version of Bioconductor is available after installing the most recent
version of R; see http://bioconductor.org/install
* installing *source* package ‘org.Hs.eg.db’ ...
** R
** inst
** preparing package for lazy loading
Warning: package ‘IRanges’ was built under R version 3.3.3
Warning: package ‘S4Vectors’ was built under R version 3.3.3
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help
A new version of Bioconductor is available after installing the most recent
version of R; see http://bioconductor.org/install
Warning: package ‘IRanges’ was built under R version 3.3.3
Warning: package ‘S4Vectors’ was built under R version 3.3.3
* DONE (org.Hs.eg.db)
The downloaded source packages are in
‘/private/var/folders/nk/nmr633rd2gzdfmjxgwql30whp_pl0r/T/RtmpfpzwjS/downloaded_packages’
Step1: Read full dataset:
library(airway)
library(DESeq2)
data("airway")
se <- airway
se
class: RangedSummarizedExperiment
dim: 64102 8
metadata(1): ''
assays(1): counts
rownames(64102): ENSG00000000003 ENSG00000000005 ... LRG_98 LRG_99
rowData names(0):
colnames(8): SRR1039508 SRR1039509 ... SRR1039520 SRR1039521
colData names(9): SampleName cell ... Sample BioSample
library("DESeq2")
dds <- DESeqDataSet(se, design = ~ cell + dex)
nrow(dds)
[1] 64102
dds <- dds[ rowSums(counts(dds)) > 8, ]
nrow(dds)
[1] 22724
Step2: Gene expression normalization:
rld <- rlog(dds, blind = FALSE)
Step3: Visualization: Calcualte Sample distances.
sampleDists <- dist(t(assay(rld)))
sampleDists
SRR1039508 SRR1039509 SRR1039512 SRR1039513 SRR1039516 SRR1039517 SRR1039520
SRR1039509 45.64583
SRR1039512 39.22526 54.86814
SRR1039513 62.58200 44.47942 48.67620
SRR1039516 44.48078 59.02111 43.54852 63.69250
SRR1039517 64.46484 51.41921 59.20253 49.84901 47.44240
SRR1039520 39.53220 57.40815 36.69556 58.43376 46.36536 63.56139
SRR1039521 63.30851 45.00248 57.82845 36.44424 65.49325 52.28204 50.08456
sampleDistMatrix <- as.matrix( sampleDists )
rownames(sampleDistMatrix) <- paste( rld$dex, rld$cell, sep = " - " )
colnames(sampleDistMatrix) <- NULL
Heatmap of sample-to-sample distances using the rlog-transformed values.
library("pheatmap")
library("RColorBrewer")
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
clustering_distance_rows = sampleDists,
clustering_distance_cols = sampleDists,
col = colors)
PCA plot using the rlog-transformed values. Each unique combination of treatment and cell line is given its own color.
plotPCA(rld, intgroup = c("dex", "cell"))
Step 4: Differential gene expression analysis.
dds=DESeq(dds)
res <- results(dds, contrast=c("dex","trt","untrt"), lfcThreshold = 1, alpha=0.05)
summary(res)
plotMA(res, ylim = c(-5, 5))
Step 5: Gene annotation:
library("AnnotationDbi")
library("org.Hs.eg.db")
res$symbol <- mapIds(org.Hs.eg.db,
keys=row.names(res),
column="SYMBOL",
keytype="ENSEMBL",
multiVals="first")
'select()' returned 1:many mapping between keys and columns
res$entrez <- mapIds(org.Hs.eg.db,
keys=row.names(res),
column="ENTREZID",
keytype="ENSEMBL",
multiVals="first")
'select()' returned 1:many mapping between keys and columns
resOrdered <- res[order(res$padj),]
head(resOrdered)
log2 fold change (MAP): dex trt vs untrt
Wald test p-value: dex trt vs untrt
DataFrame with 6 rows and 8 columns
baseMean log2FoldChange lfcSE stat
<numeric> <numeric> <numeric> <numeric>
ENSG00000152583 997.4398 4.311996 0.1723934 19.21185
ENSG00000101347 12703.3871 3.617111 0.1494225 17.51484
ENSG00000165995 495.0929 3.186968 0.1277913 17.11358
ENSG00000189221 2341.7673 3.229328 0.1369682 16.27624
ENSG00000211445 12285.6151 3.551350 0.1585084 16.09599
ENSG00000120129 3409.0294 2.870872 0.1184702 15.79192
pvalue padj symbol entrez
<numeric> <numeric> <character> <character>
ENSG00000152583 2.945538e-82 5.655138e-78 SPARCL1 8404
ENSG00000101347 1.103913e-68 1.059701e-64 SAMHD1 25939
ENSG00000165995 1.175402e-65 7.522183e-62 CACNB2 783
ENSG00000189221 1.455309e-59 6.985119e-56 MAOA 4128
ENSG00000211445 2.721637e-58 1.045054e-54 GPX3 2878
ENSG00000120129 3.536655e-56 1.131671e-52 DUSP1 1843