Introduction to High Throughput Data

Lucas Schiffer
July 8, 2016

Data Analysis for the Life Sciences

Overview

The high throughput paradigm shift
Tools to work with “big data” in R
The structure of “omics” data in R
An example analysis of 175K data points
On to the exercises

The high throughput paradigm shift

Paper chromatography https://tinyurl.com/jl76ab9

Microarray platforms https://tinyurl.com/johvypu

Tools to work with “big data” in R

The devtools way

install.packages("devtools")
library(devtools)
install_github("genomicsclass/GSE5859Subset")

The Bioconductor way

source("https://bioconductor.org/biocLite.R")
biocLite()
library(BiocInstaller)
biocLite("genomicsclass/GSE5859Subset")
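Once either installation route has finished, the data can be loaded and inspected. The following is a minimal sketch, not part of the original slides: the helper name `load_subset` is hypothetical, and the object name `geneExpression` is assumed to be what the GSE5859Subset package provides. The check with `requireNamespace` lets the snippet run without error even when the package is missing.

```r
# Hypothetical helper: load the example data only if the package is installed.
load_subset <- function(pkg = "GSE5859Subset") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Package ", pkg, " is not installed; skipping.")
    return(FALSE)
  }
  # Load the data set shipped with the package into the global environment.
  data(list = pkg, package = pkg, envir = globalenv())
  TRUE
}

loaded <- load_subset()

if (loaded && exists("geneExpression")) {
  # Assumed object name: the matrix of measurements in the package.
  print(dim(geneExpression))
}
```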

The structure of “omics” data in R

High throughput experiments are usually made up of three tables: the measurements, their annotations, and information about the samples. The data are stored with the following structure:

Features (rownames) - genes, single base locations of the genome, genomic regions, image pixel intensities, etc.

Samples (colnames) - the individuals or specimens measured for gene expression, methylation, mutations, variations, etc.
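The three-table layout can be sketched with toy data in base R. This is an illustration only, assuming the common Bioconductor-style convention of features in rows and samples in columns. All object names, column names, and values here (`measurements`, `feature_annotation`, `sample_info`, `chromosome`, `group`) are made up for the example and do not come from any real data set.

```r
# Toy measurements: 4 features (rows) measured in 3 samples (columns).
set.seed(1)
measurements <- matrix(
  rnorm(12), nrow = 4, ncol = 3,
  dimnames = list(paste0("feature", 1:4), paste0("sample", 1:3))
)

# Annotation table: one row per feature, in the same order as the matrix rows.
feature_annotation <- data.frame(
  id = rownames(measurements),
  chromosome = c("chr1", "chr1", "chr2", "chrX"),
  stringsAsFactors = FALSE
)

# Sample table: one row per sample, in the same order as the matrix columns.
sample_info <- data.frame(
  id = colnames(measurements),
  group = c(0, 1, 1),
  stringsAsFactors = FALSE
)

# The tables only make sense together if their identifiers line up.
stopifnot(identical(feature_annotation$id, rownames(measurements)))
stopifnot(identical(sample_info$id, colnames(measurements)))

# Use the sample table to pick columns of the measurement matrix.
group1 <- measurements[, sample_info$group == 1, drop = FALSE]
print(dim(group1))
```

Keeping the row order of the sample table tied to the column order of the measurements is what lets a logical vector built from one table index the other.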

An example analysis of 175K data points

NYC noise pollution https://github.com/schifferl/cityNoise