Ordinating all samples

The only goal of this is to take a top-down view of all samples; how do they cluster? This helps us to understand how “global” the changes are and what the dimensionality is. For example, if mutation and drug have the same effect, we expect a more 1-dimensional ordination, whereas if they are different (i.e. induce/repress a different set of genes), we will see a two-dimensional ordination.

library(limma)
library(statmod)
library(affy)
library(affycoretools)
library(oligo)
library(hta20transcriptcluster.db)
library(ggplot2)
library(RColorBrewer)
Annotation <- read.csv("HTA-2_0.na35.2.hg19.transcript.csv/HTA-2_0.na35.2.hg19.transcript.csv", header=TRUE, skip=19)
probe.gene.map <- read.csv("probe.gene.map.csv")
targets <- read.csv("targets.txt", as.is=T)
dat <- read.celfiles(targets$FileName) #read .CEL files
## Reading in : 20160505_HTA2.0_0316_WTH_1R_(HTA-2_0).CEL
## Reading in : 20160505_HTA2.0_0317_WTH_2R_(HTA-2_0).CEL
## Reading in : 20160505_HTA2.0_0318_WTH_3R_(HTA-2_0).CEL
## Reading in : 20160505_HTA2.0_0319_WTH_4R_(HTA-2_0).CEL
## Reading in : 20160616_HTA2.0_0333_WTH_1_R1_(HTA-2_0).CEL
## Reading in : 20160616_HTA2.0_0334_WTH_2_R1_(HTA-2_0).CEL
## Reading in : 20160616_HTA2.0_0335_WTH_3_R2_(HTA-2_0).CEL
## Reading in : 20160616_HTA2.0_0336_WTH_4_R1_(HTA-2_0).CEL
project.bgcorrect.norm.avg <- rma(dat, background=TRUE, normalize=TRUE) #normalize
## Background correcting
## Normalizing
## Calculating Expression
#this may remove control probes, it should only include probes which actually link to genes
project.bgcorrect.norm.avg.filt <- exprs(project.bgcorrect.norm.avg)[which(rownames(project.bgcorrect.norm.avg) %in% Annotation[,1]),]
Exp <- factor(targets$Experiment)
Strain <- factor(targets$Strain, levels=c("GF","GFR"))
Treatment <- factor(targets$Treatment, levels=c("control","drug"))
col.strain <- Strain
levels(col.strain) <-  brewer.pal(nlevels(col.strain), "Set1")
col.strain <- as.character(col.strain)
col.treatment <- Treatment
levels(col.treatment) <-  brewer.pal(nlevels(col.treatment), "Set2")
col.treatment <- as.character(col.treatment)
plotMDS(project.bgcorrect.norm.avg.filt, col=col.strain, pch = 19, labels = paste(Strain,"+",Treatment,sep=""))

plotMDS(project.bgcorrect.norm.avg.filt, col=col.treatment, pch = 12, labels = paste(Strain,"+",Treatment,sep=''))

We can see a nice two-dimensional clustering, with more variation amond the treated samples. However, it is clear that at least the large-scale, the same general set of genes is altered by drug exposure.

Are there any effects of experiment? (the two sets of four samples were prepared on different days). Color by experiment and examine the lesser axes, 3 and 4.

col.exp <- Exp
levels(col.exp) <-  brewer.pal(nlevels(col.exp), "Set1")
col.exp <- as.character(col.exp)
plotMDS(project.bgcorrect.norm.avg.filt, dim=c(3,4),col=col.exp, pch = 12, labels = paste(Strain,"+",Treatment,"+",Exp,sep=''))

Yes, the third axis delineates the experiments, which does justify controlling for experiment in the analyses.

General characterization of intensities, comparison of samples

We can plot intensity distributions of the probes in each sample:

targets <- read.csv("targets.txt", as.is=T)
project.bgcorrect.norm.avg <- rma(dat, background=TRUE, normalize=TRUE)
## Background correcting
## Normalizing
## Calculating Expression
project.bgcorrect.norm.avg.filt <- exprs(project.bgcorrect.norm.avg)[which(rownames(project.bgcorrect.norm.avg) %in% Annotation[,1]),]
plotDensities(project.bgcorrect.norm.avg.filt, legend = F)

Here we can have some assurance that the samples behaved very similarly to one-another. Not a rigorous method, but quickly allows for spotting any notable problems in intensity distributions.