Project Overview:
Use visual exploratory data analysis (EDA) techniques to understand the structure and potential bias within a High Throughput Sequencing dataset.

Asthma Overview

Asthma Overview


Study Overview
  • Asthma affects 300+ million people worldwide and is characterized as a chronic inflammtory respiratory disease

  • Glucocorticoids have an anti-inflammatory effect on airway smooth muscle cells, and used as a therapy for asthma

  • Glucocorticoids act by binding to glucocorticoid receptors which then translocate to cellular nuclei and modulate gene expression

  • The following describes transcriptomic profiling of cells treated with dexamethasone - a potent synthetic glucocorticoid

  • Project re-analyzes RNA-seq data from 4 airway smooth muscle (ASM) cell lines:
  • N61311
  • N052611
  • N080611
  • N061011

Data Source

Himes BE, Jiang X, Wagner P, Hu R, Wang Q, Klanderman B, Whitaker RM, Duan Q, Lasky-Su J, Nikolos C, Jester W, Johnson M, Panettieri R Jr, Tantisira KG, Weiss ST, Lu Q. “RNA-Seq Transcriptome Profiling Identifies CRISPLD2 as a Glucocorticoid Responsive Gene that Modulates Cytokine Function in Airway Smooth Muscle Cells.” PLoS One. 2014 Jun 13;9(6):e99625. PMID: 24926665. GEO: GSE52778.
Airways Dataset
Asthma Image Source

Problem Description:
A comparison of different read count normalization methods


Raw Data Description:
Similarities between cell line samples pre- and post-treatment were found based on similarities in gene expression.


Data Description - Roughly 20,000 genes were analyzed - Illumina sequencing results - Fragment counts were analyzed for reads mapped to Chromosome 1 were analyzed

Genes Analyzed - Many genes were found to have differential expression pre- and post-dexamethasone treatment

Major Result: Evaluation of Genes with highest variance in expression


Manual Inspection:
Select Sample comparison


---
title: "BIOF439 Final Project"
author: Lesley M Chapman
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
---

```{r, include=FALSE}
#if (!('airway' %in% installed.packages()[,1])) #install.packages('airway', repos = 'http://cran.rstudio.com')
#if (!('ggbeeswarm' %in% installed.packages()[,1])) #install.packages('ggbeeswarm', repos = #'http://cran.rstudio.com')
#if (!('pheatmap' %in% installed.packages()[,1])) #install.packages('pheatmap', repos = #'http://cran.rstudio.com')
#if (!('PoiClaClu' %in% installed.packages()[,1])) #install.packages('PoiClaClu', repos = #'http://cran.rstudio.com')


#if (!requireNamespace("BiocManager", quietly = TRUE))
#  install.packages("BiocManager")
#BiocManager::install("apeglm")

#if (!requireNamespace("BiocManager", quietly = TRUE))
#  install.packages("BiocManager")
#BiocManager::install("DESeq2")

#if (!requireNamespace("BiocManager", quietly = TRUE))
#  install.packages("BiocManager")
#BiocManager::install("Gviz")

#if (!requireNamespace("BiocManager", quietly = TRUE))
#  install.packages("BiocManager")
#BiocManager::install("IHW")

#if (!requireNamespace("BiocManager", quietly = TRUE))
#  install.packages("BiocManager")
#BiocManager::install("org.Hs.eg.db")

#if (!requireNamespace("BiocManager", quietly = TRUE))
#  install.packages("BiocManager")
#BiocManager::install("vsn")


library("AnnotationDbi")
library("apeglm")
library(airway)
library('DESeq2')
library("dplyr")
library(flexdashboard)
library("genefilter")
library("ggbeeswarm")
library("ggplot2")
library("Gviz")
library("IHW")
library("org.Hs.eg.db")
library("pheatmap")
library("PoiClaClu")
library("RColorBrewer")
library("vsn")

dir <- system.file("extdata", package="airway", mustWork=TRUE)

```


```{r, include=FALSE}
'
Part 1
----------
Load data and create dataframe for analysis
'
# Load sample descriptions
csvfile <- file.path(dir,"sample_table.csv")
(sampleTable <- read.csv(csvfile,row.names=1))

data("airway")
se <- airway
head(se)

dds <- DESeqDataSet(se, design = ~ cell + dex)
head(dds, 5)

countdata <- assay(se)
coldata <- colData(se)
head(countdata, 3)

ddsMat <- DESeqDataSetFromMatrix(countData = countdata,
                                 colData = coldata,
                                 design = ~ cell + dex)

nrow(dds)
dds <- dds[ rowSums(counts(dds)) > 1, ]
nrow(dds)



lambda <- 10^seq(from = -1, to = 2, length = 1000)
cts <- matrix(rpois(1000*100, lambda), ncol = 100)
meanSdPlot(cts, ranks = FALSE)

#logarithm-transformed counts
log.cts.one <- log2(cts + 1)
meanSdPlot(log.cts.one, ranks = FALSE)

# FYI: fully unsupervised transformation blind = TRUE
vsd <- vst(dds, blind = FALSE)
head(assay(vsd), 3)
colData(vsd)
#rlog
rld <- rlog(dds, blind = FALSE)

```

### Project Overview:
Use visual exploratory data analysis (EDA) techniques to understand the structure and potential bias within a High Throughput Sequencing dataset. ```{r asthma, echo=FALSE, fig.cap="Asthma Overview", out.width = '100%'} knitr::include_graphics("/Users/chapmanlm/Desktop/FAES/Homework/BIOF439/Project/normal-vs-asthma-lung.png") ``` *** ##### Study Overview * Asthma affects 300+ million people worldwide and is characterized as a chronic inflammtory respiratory disease * Glucocorticoids have an anti-inflammatory effect on airway smooth muscle cells, and used as a therapy for asthma * Glucocorticoids act by binding to glucocorticoid receptors which then translocate to cellular nuclei and modulate gene expression * The following describes transcriptomic profiling of cells treated with dexamethasone - a potent synthetic glucocorticoid * Project re-analyzes RNA-seq data from 4 airway smooth muscle (ASM) cell lines: * N61311 * N052611 * N080611 * N061011 **Data Source** Himes BE, Jiang X, Wagner P, Hu R, Wang Q, Klanderman B, Whitaker RM, Duan Q, Lasky-Su J, Nikolos C, Jester W, Johnson M, Panettieri R Jr, Tantisira KG, Weiss ST, Lu Q. “RNA-Seq Transcriptome Profiling Identifies CRISPLD2 as a Glucocorticoid Responsive Gene that Modulates Cytokine Function in Airway Smooth Muscle Cells.” PLoS One. 2014 Jun 13;9(6):e99625. PMID: 24926665. GEO: GSE52778.
[Airways Dataset](https://bioconnector.github.io/workshops/data.html)
[Asthma Image Source](https://community.aafa.org/blog/what-happens-in-your-airways-when-you-have-asthma) ### **Problem Description:**
A comparison of different read count normalization methods ```{r, include=FALSE} dds <- estimateSizeFactors(dds) df <- bind_rows( as_data_frame(log2(counts(dds, normalized=TRUE)[, 1:2]+1)) %>% mutate(transformation = "log2(x + 1)"), as_data_frame(assay(vsd)[, 1:2]) %>% mutate(transformation = "vst"), as_data_frame(assay(rld)[, 1:2]) %>% mutate(transformation = "rlog")) colnames(df)[1:2] <- c("x", "y") ``` ```{r} ggplot(df, aes(x = x, y = y)) + geom_hex(bins = 80) + coord_fixed() + facet_grid( . ~ transformation) ``` *** - Technical biases may lead to erroneous conclusions in downstream analysis - Certain samples may have been sequenced at different depths - The estimate size factors method was used to make comparison of gene counts more comparable - The following normalization methods were then applied: - log2 transformation of normalized counts - variance stabilizing transformation(VST) transformation of normalized counts - rlog transofrmation of normalized counts ### **Raw Data Description:**
Similarities between cell line samples pre- and post-treatment were found based on similarities in gene expression. ```{r, include=FALSE} dds <- DESeq(dds) # results : extracts the estimated log2 fold changes and p values for the last variable in the design formula. res <- results(dds) res <- results(dds, contrast=c("dex","trt","untrt")) mcols(res, use.names = TRUE) summary(res) res # Only take differentially expressed genes with FDR < 0.05 res.05 <- results(dds, alpha = 0.05) table(res.05$padj < 0.05) results(dds, contrast = c("cell", "N061011", "N61311")) topGene <- rownames(res)[which.min(res$padj)] geneCounts <- plotCounts(dds, gene = topGene, intgroup = c("dex","cell"), returnData = TRUE) ``` ```{r, include=FALSE} sampleDists <- dist(t(assay(vsd))) sampleDists sampleDistMatrix <- as.matrix( sampleDists ) rownames(sampleDistMatrix) <- paste( vsd$dex, vsd$cell, sep = " - " ) colnames(sampleDistMatrix) <- NULL colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255) ``` ```{r} colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255) pheatmap(sampleDistMatrix, clustering_distance_rows = sampleDists, clustering_distance_cols = sampleDists, col = colors) ``` *** **Data Description** - Roughly 20,000 genes were analyzed - Illumina sequencing results - Fragment counts were analyzed for reads mapped to Chromosome 1 were analyzed **Genes Analyzed** - Many genes were found to have differential expression pre- and post-dexamethasone treatment - The potential false discovery rate was accounted for by applying the Benjamini-Hochberg correction to the values and plotting differentially expressed genes with p <0.05 - 316 genes were significantly differentially expressed after correcting for false discovery rate by the Benjamini-Hochberg - The Euclidead distance between samples ### **Major Result**: Evaluation of Genes with highest variance in expression ```{r} topVarGenes <- head(order(rowVars(assay(vsd)), decreasing = TRUE), 20) mat <- assay(vsd)[ topVarGenes, ] mat <- mat - rowMeans(mat) anno <- as.data.frame(colData(vsd)[, c("cell","dex")]) pheatmap(mat, annotation_col = anno) ``` *** - Evaluation of the top 20 genes with the highest variance - Heatmap reflects the amount by which each gene deviates in a specific sample from the gene’s average across all samples ### **Manual Inspection**:
Select Sample comparison ```{r, include=FALSE} # Only take differentially expressed genes with FDR < 0.05 res.05 <- results(dds, alpha = 0.05) table(res.05$padj < 0.05) results(dds, contrast = c("cell", "N061011", "N61311")) topGene <- rownames(res)[which.min(res$padj)] plotCounts(dds, gene = topGene, intgroup=c("dex")) ``` ```{r} geneCounts <- plotCounts(dds, gene = topGene, intgroup = c("dex","cell"), returnData = TRUE) ggplot(geneCounts, aes(x = dex, y = count, color = cell)) + scale_y_log10() + geom_beeswarm(cex = 3) ``` ***