Introduction to High Throughput Data

Lucas Schiffer
July 8, 2016

Data Analysis for the Life Sciences

Overview

  • The high throughput paradigm shift

  • Tools to work with “big data” in R

  • The structure of “omics” data in R

  • An example analysis of 175K data points

  • On to the exercises

The high throughput paradigm shift


Paper chromatography https://tinyurl.com/jl76ab9


Microarray platforms https://tinyurl.com/johvypu

Tools to work with "big data" in R

The devtools way

install.packages("devtools")
library(devtools)
install_github("genomicsclass/GSE5859Subset")

The Bioconductor way

source("https://bioconductor.org/biocLite.R")
biocLite()
library(BiocInstaller)
biocLite("genomicsclass/GSE5859Subset")
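On recent R installations the biocLite.R script is no longer the supported entry point; since Bioconductor 3.8 the BiocManager package has taken over that role. The following is a minimal sketch of an equivalent install, assuming R 3.5 or later and network access. The helper name install_course_data is hypothetical and exists only for illustration.

```r
# Hypothetical helper (not part of the original slides): installs the course
# data package from GitHub through BiocManager instead of biocLite().
install_course_data <- function(repo = "genomicsclass/GSE5859Subset") {
  # BiocManager itself is distributed on CRAN.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Installing "user/repo" GitHub targets relies on the remotes package.
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  BiocManager::install(repo)
}
```

Calling install_course_data() performs the install. If it succeeds, library(GSE5859Subset) followed by data(GSE5859Subset) should make the example data available in the session.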

The structure of "omics" data in R

High throughput experiments usually comprise three tables: the measurements, their annotations, and information about the samples. The measurement data are stored with the following structure:

Features (rownames) - genes, single base locations of the genome, genomic regions, image pixel intensities, etc.

Samples (colnames) - the individuals or specimens on which gene expression, methylation, mutations, variations, etc. are measured.
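The three-table layout can be sketched with base R alone. This toy example assumes the common convention of features in rows and samples in columns; every object name and value in it (measurements, feature_info, sample_info, the gene and sample labels) is made up for illustration and does not come from a real data set.

```r
# Toy data only: all names and values below are invented for illustration.
set.seed(1)
n_features <- 5
n_samples  <- 3

# Table 1: measurements, one row per feature and one column per sample.
measurements <- matrix(
  rnorm(n_features * n_samples),
  nrow = n_features,
  dimnames = list(paste0("gene", 1:n_features), paste0("sample", 1:n_samples))
)

# Table 2: feature annotations, one row per row of the measurement matrix.
feature_info <- data.frame(
  feature_id = rownames(measurements),
  chromosome = c("chr1", "chr1", "chr2", "chrX", "chrY"),
  stringsAsFactors = FALSE
)

# Table 3: sample information, one row per column of the measurement matrix.
sample_info <- data.frame(
  sample_id = colnames(measurements),
  group     = c("control", "case", "case"),
  stringsAsFactors = FALSE
)

# The three tables stay linked through shared identifiers.
stopifnot(identical(rownames(measurements), feature_info$feature_id))
stopifnot(identical(colnames(measurements), sample_info$sample_id))

# Use sample-level information to subset the columns of the measurements.
case_values <- measurements[, sample_info$group == "case", drop = FALSE]
dim(case_values)  # 5 features by 2 samples
```

Keeping the identifiers aligned across the tables is what allows sample-level or feature-level information to index directly into the measurement matrix.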

An example analysis of 175K data points

On to the exercises