Asthma Overview
Asthma affects 300+ million people worldwide and is characterized as a chronic inflammtory respiratory disease
Glucocorticoids have an anti-inflammatory effect on airway smooth muscle cells, and used as a therapy for asthma
Glucocorticoids act by binding to glucocorticoid receptors which then translocate to cellular nuclei and modulate gene expression
The following describes transcriptomic profiling of cells treated with dexamethasone - a potent synthetic glucocorticoid
N061011
Data Source
Himes BE, Jiang X, Wagner P, Hu R, Wang Q, Klanderman B, Whitaker RM, Duan Q, Lasky-Su J, Nikolos C, Jester W, Johnson M, Panettieri R Jr, Tantisira KG, Weiss ST, Lu Q. “RNA-Seq Transcriptome Profiling Identifies CRISPLD2 as a Glucocorticoid Responsive Gene that Modulates Cytokine Function in Airway Smooth Muscle Cells.” PLoS One. 2014 Jun 13;9(6):e99625. PMID: 24926665. GEO: GSE52778.
Airways Dataset
Asthma Image Source
Technical biases may lead to erroneous conclusions in downstream analysis
Certain samples may have been sequenced at different depths
rlog transofrmation of normalized counts
Data Description - Roughly 20,000 genes were analyzed - Illumina sequencing results - Fragment counts were analyzed for reads mapped to Chromosome 1 were analyzed
Genes Analyzed - Many genes were found to have differential expression pre- and post-dexamethasone treatment
The potential false discovery rate was accounted for by applying the Benjamini-Hochberg correction to the values and plotting differentially expressed genes with p <0.05
316 genes were significantly differentially expressed after correcting for false discovery rate by the Benjamini-Hochberg
The Euclidead distance between samples
Evaluation of the top 20 genes with the highest variance
Heatmap reflects the amount by which each gene deviates in a specific sample from the gene’s average across all samples
---
title: "BIOF439 Final Project"
author: Lesley M Chapman
output:
flexdashboard::flex_dashboard:
storyboard: true
social: menu
source: embed
---
```{r, include=FALSE}
#if (!('airway' %in% installed.packages()[,1])) #install.packages('airway', repos = 'http://cran.rstudio.com')
#if (!('ggbeeswarm' %in% installed.packages()[,1])) #install.packages('ggbeeswarm', repos = #'http://cran.rstudio.com')
#if (!('pheatmap' %in% installed.packages()[,1])) #install.packages('pheatmap', repos = #'http://cran.rstudio.com')
#if (!('PoiClaClu' %in% installed.packages()[,1])) #install.packages('PoiClaClu', repos = #'http://cran.rstudio.com')
#if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("apeglm")
#if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("DESeq2")
#if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("Gviz")
#if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("IHW")
#if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("org.Hs.eg.db")
#if (!requireNamespace("BiocManager", quietly = TRUE))
# install.packages("BiocManager")
#BiocManager::install("vsn")
library("AnnotationDbi")
library("apeglm")
library(airway)
library('DESeq2')
library("dplyr")
library(flexdashboard)
library("genefilter")
library("ggbeeswarm")
library("ggplot2")
library("Gviz")
library("IHW")
library("org.Hs.eg.db")
library("pheatmap")
library("PoiClaClu")
library("RColorBrewer")
library("vsn")
dir <- system.file("extdata", package="airway", mustWork=TRUE)
```
```{r, include=FALSE}
'
Part 1
----------
Load data and create dataframe for analysis
'
# Load sample descriptions
csvfile <- file.path(dir,"sample_table.csv")
(sampleTable <- read.csv(csvfile,row.names=1))
data("airway")
se <- airway
head(se)
dds <- DESeqDataSet(se, design = ~ cell + dex)
head(dds, 5)
countdata <- assay(se)
coldata <- colData(se)
head(countdata, 3)
ddsMat <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ cell + dex)
nrow(dds)
dds <- dds[ rowSums(counts(dds)) > 1, ]
nrow(dds)
lambda <- 10^seq(from = -1, to = 2, length = 1000)
cts <- matrix(rpois(1000*100, lambda), ncol = 100)
meanSdPlot(cts, ranks = FALSE)
#logarithm-transformed counts
log.cts.one <- log2(cts + 1)
meanSdPlot(log.cts.one, ranks = FALSE)
# FYI: fully unsupervised transformation blind = TRUE
vsd <- vst(dds, blind = FALSE)
head(assay(vsd), 3)
colData(vsd)
#rlog
rld <- rlog(dds, blind = FALSE)
```
### Project Overview:
Use visual exploratory data analysis (EDA) techniques to understand the structure and potential bias within a High Throughput Sequencing dataset.
```{r asthma, echo=FALSE, fig.cap="Asthma Overview", out.width = '100%'}
knitr::include_graphics("/Users/chapmanlm/Desktop/FAES/Homework/BIOF439/Project/normal-vs-asthma-lung.png")
```
***
##### Study Overview
* Asthma affects 300+ million people worldwide and is characterized as a chronic inflammtory respiratory disease
* Glucocorticoids have an anti-inflammatory effect on airway smooth muscle cells, and used as a therapy for asthma
* Glucocorticoids act by binding to glucocorticoid receptors which then translocate to cellular nuclei and modulate gene expression
* The following describes transcriptomic profiling of cells treated with dexamethasone - a potent synthetic glucocorticoid
* Project re-analyzes RNA-seq data from 4 airway smooth muscle (ASM) cell lines:
* N61311
* N052611
* N080611
* N061011
**Data Source**
Himes BE, Jiang X, Wagner P, Hu R, Wang Q, Klanderman B, Whitaker RM, Duan Q, Lasky-Su J, Nikolos C, Jester W, Johnson M, Panettieri R Jr, Tantisira KG, Weiss ST, Lu Q. “RNA-Seq Transcriptome Profiling Identifies CRISPLD2 as a Glucocorticoid Responsive Gene that Modulates Cytokine Function in Airway Smooth Muscle Cells.” PLoS One. 2014 Jun 13;9(6):e99625. PMID: 24926665. GEO: GSE52778.
[Airways Dataset](https://bioconnector.github.io/workshops/data.html)
[Asthma Image Source](https://community.aafa.org/blog/what-happens-in-your-airways-when-you-have-asthma)
### **Problem Description:**
A comparison of different read count normalization methods
```{r, include=FALSE}
dds <- estimateSizeFactors(dds)
df <- bind_rows(
as_data_frame(log2(counts(dds, normalized=TRUE)[, 1:2]+1)) %>%
mutate(transformation = "log2(x + 1)"),
as_data_frame(assay(vsd)[, 1:2]) %>% mutate(transformation = "vst"),
as_data_frame(assay(rld)[, 1:2]) %>% mutate(transformation = "rlog"))
colnames(df)[1:2] <- c("x", "y")
```
```{r}
ggplot(df, aes(x = x, y = y)) + geom_hex(bins = 80) +
coord_fixed() + facet_grid( . ~ transformation)
```
***
- Technical biases may lead to erroneous conclusions in downstream analysis
- Certain samples may have been sequenced at different depths
- The estimate size factors method was used to make comparison of gene counts more comparable
- The following normalization methods were then applied:
- log2 transformation of normalized counts
- variance stabilizing transformation(VST) transformation of normalized counts
- rlog transofrmation of normalized counts
### **Raw Data Description:**
Similarities between cell line samples pre- and post-treatment were found based on similarities in gene expression.
```{r, include=FALSE}
dds <- DESeq(dds)
# results : extracts the estimated log2 fold changes and p values for the last variable in the design formula.
res <- results(dds)
res <- results(dds, contrast=c("dex","trt","untrt"))
mcols(res, use.names = TRUE)
summary(res)
res
# Only take differentially expressed genes with FDR < 0.05
res.05 <- results(dds, alpha = 0.05)
table(res.05$padj < 0.05)
results(dds, contrast = c("cell", "N061011", "N61311"))
topGene <- rownames(res)[which.min(res$padj)]
geneCounts <- plotCounts(dds, gene = topGene, intgroup = c("dex","cell"),
returnData = TRUE)
```
```{r, include=FALSE}
sampleDists <- dist(t(assay(vsd)))
sampleDists
sampleDistMatrix <- as.matrix( sampleDists )
rownames(sampleDistMatrix) <- paste( vsd$dex, vsd$cell, sep = " - " )
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
```
```{r}
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
clustering_distance_rows = sampleDists,
clustering_distance_cols = sampleDists,
col = colors)
```
***
**Data Description**
- Roughly 20,000 genes were analyzed
- Illumina sequencing results
- Fragment counts were analyzed for reads mapped to Chromosome 1 were analyzed
**Genes Analyzed**
- Many genes were found to have differential expression pre- and post-dexamethasone treatment
- The potential false discovery rate was accounted for by applying the Benjamini-Hochberg correction to the values and plotting differentially expressed genes with p <0.05
- 316 genes were significantly differentially expressed after correcting for false discovery rate by the Benjamini-Hochberg
- The Euclidead distance between samples
### **Major Result**: Evaluation of Genes with highest variance in expression
```{r}
topVarGenes <- head(order(rowVars(assay(vsd)), decreasing = TRUE), 20)
mat <- assay(vsd)[ topVarGenes, ]
mat <- mat - rowMeans(mat)
anno <- as.data.frame(colData(vsd)[, c("cell","dex")])
pheatmap(mat, annotation_col = anno)
```
***
- Evaluation of the top 20 genes with the highest variance
- Heatmap reflects the amount by which each gene deviates
in a specific sample from the gene’s average across all samples
### **Manual Inspection**:
Select Sample comparison
```{r, include=FALSE}
# Only take differentially expressed genes with FDR < 0.05
res.05 <- results(dds, alpha = 0.05)
table(res.05$padj < 0.05)
results(dds, contrast = c("cell", "N061011", "N61311"))
topGene <- rownames(res)[which.min(res$padj)]
plotCounts(dds, gene = topGene, intgroup=c("dex"))
```
```{r}
geneCounts <- plotCounts(dds, gene = topGene, intgroup = c("dex","cell"),
returnData = TRUE)
ggplot(geneCounts, aes(x = dex, y = count, color = cell)) +
scale_y_log10() + geom_beeswarm(cex = 3)
```
***