.Rmd stands for R Markdown file.

First steps

Under the File tab, use Save As… to make a version of this file with a new name. In case things go sideways, we can go back to the original.

At the top of this document, put your name between the quotes after author. This is now your notebook.


The R programming language

R is a free, open-source programming language widely used in academia and industry for data analysis, statistical modeling, and data visualization. Throughout the DREAM-High Program, we will be coding with R.


RStudio

RStudio is a free, open-source environment for the R programming language. It provides a user-friendly interface that makes it easier to work with R and generate reports. We are working in RStudio now!


R Markdown

An R Markdown file, with the .Rmd extension, is a plain text document that combines text formatted in Markdown syntax with code written in R and other languages. Click the “Insert” tab at the top right of this window to see what kinds of programming languages can be used.

With R Markdown, we can generate reproducible and generalizable workflows. We will create beautiful reports we can share: Our R Markdown files can be “knit” (or rendered) into HTML pages, PDFs, Word documents, or slides.

Examples of R Markdown syntax

  • Headings: Use hash (#) symbols
  • Bold text: **bold**
  • Italic text: *italic*
  • Inline code: `x <- 5`
  • Links to websites: [DREAM-High](URL)
  • Add blank lines:
  • Add a line between sections: ---

We won’t see the effects of the syntax until we create our report with the “Knit” button at the top of this window.


Heatmaps

Heatmaps are a way to colorize, visualize, and organize a data set with the goal of finding relationships among observations and features.

We will use heatmaps in this course to find patterns in the gene expression data for the 1K breast cancer patients from The Cancer Genome Atlas. Here, we will learn how to create heatmaps with a practice data set.

mtcars data

R provides many data sets to work with, so we can learn new analysis skills before scaling up. mtcars is a classic go-to R data frame. It was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 design and performance features for 32 automobiles (1973–74 models).

Jurui's notes: gray areas are code chunks, and white areas are notes.

# This is a comment line

# Functions in R take arguments within the parentheses

# The function head() returns the first few lines of the mtcars table
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
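
As a small illustration of how arguments work, head() also accepts an optional second argument, n, that sets how many rows to return. The sketch below uses only base R; the variable names first_three and last_two are made up for this example.

```r
# head() takes the data as its first argument and, optionally,
# the number of rows to return as the argument n (the default is 6)
first_three <- head(mtcars, n = 3)
print(first_three)

# tail() works the same way but returns the last rows instead
last_two <- tail(mtcars, n = 2)
print(last_two)
```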


We can create a table of the entire data set in a new tab with the View() function.

# Check out the full data set
View(mtcars)


Each row of mtcars is an automobile, and each column is a performance feature. For example:

  • mpg is miles per (US) gallon
  • wt is weight (in units of 1,000 lbs)
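
To check the row and column layout directly, a few base R functions can help. This is a minimal sketch; mpg_values is just an illustrative variable name.

```r
# dim() returns the number of rows (cars) and columns (features)
dim(mtcars)

# Column names are the features; row names are the car models
colnames(mtcars)
head(rownames(mtcars))

# The $ operator pulls out a single column by name
mpg_values <- mtcars$mpg

# range() returns the smallest and largest value
range(mpg_values)
```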

The function help() provides information on R functions and data. We can find out what all the performance features are:

# What exactly is in mtcars?
help(mtcars)
# The help page describes each feature, including its units, and provides notes

Preparing the mtcars data

The function heatmap() is an easy way to convert the values in mtcars to colors, which helps us visualize the data and look for relationships.

Let’s check out the help page for heatmap().

# The help() function can take a function as its argument
help(heatmap)


In the help file, we learn that heatmap() plots a numeric matrix of values. So our first step will be to convert the data from a table (data frame) to a matrix of numeric values. We will do most of our analysis on data in matrix form.

The symbol <- is the assignment operator. It assigns a value on the right side of the operator to a variable on the left side. It functions, for us, like an equals (=) sign.

# Convert mtcars into a matrix of numbers
# Assign the output to the variable data
data <- as.matrix(mtcars)
# We can also use an equals sign (=) instead of <-
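
A quick way to confirm both points, the assignment operator and the conversion to a matrix, is sketched below. The variables x and y are illustrative only.

```r
# Both operators assign the value 5; <- is the conventional R style
x <- 5
y = 5
identical(x, y)

# mtcars starts out as a data frame
is.data.frame(mtcars)

# as.matrix() gives a numeric matrix here because every column is numeric
data <- as.matrix(mtcars)
is.matrix(data)
is.numeric(data)
```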

Heatmap for mtcars

The heatmap() function is powerful: It not only converts our data values to colors, it also rearranges the rows (automobiles) and columns (performance features) so we can more easily find patterns in the data.

# A heat map is a color image of our data with dendrograms
heatmap(data)  


The rows correspond to cars (observations) and the columns to the 11 features: fuel consumption (mpg) plus the 10 design and performance features.

The dendrograms (or tree diagrams) show how close the cars and features are according to the values in our data set.
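
According to its help page, heatmap() builds its dendrograms with the functions dist() and hclust() by default. The sketch below repeats that step by hand so the tree can be viewed on its own. The names hc_cars and hc_features are made up for this example, and because heatmap() also reorders the branches, the leaf order in its plot may differ from this one.

```r
# Rebuild the matrix so this example runs on its own
data <- as.matrix(mtcars)

# Compute distances between cars (rows), then cluster them
hc_cars <- hclust(dist(data))

# Transpose with t() to cluster the features (columns) instead
hc_features <- hclust(dist(t(data)))

# Draw the tree diagram for the cars
plot(hc_cars, cex = 0.6, main = "Cars clustered by their feature values")
```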

In the default coloring scheme, the highest values have the darkest colors. We can see that some features, such as disp and hp, have higher values than others, but otherwise the visualization is not helpful.

Look at the mtcars table. Different features have very different scales, so a value that is high (red) for one feature, e.g. cyl, would be low for another feature, e.g. disp.
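
One way to see the difference in scales is to list the smallest and largest value of every feature. This is a minimal sketch in base R; feature_ranges is an illustrative name.

```r
# Rebuild the matrix so this example runs on its own
data <- as.matrix(mtcars)

# apply() with MARGIN = 2 runs range() on every column
feature_ranges <- apply(data, 2, range)
rownames(feature_ranges) <- c("min", "max")

# Compare, for example, the cyl column with the disp column
print(feature_ranges)
```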


Heatmap for scaled mtcars

The scale() function normalizes the features so they are comparable. By default, it centers each column to a mean of 0 and scales it to a standard deviation of 1.

# Let's change the range of each feature so they are comparable
# We'll assign the output to a new variable, data_scaled
data_scaled <- scale(data)


# First few rows of the scaled data
head(data_scaled)
##                          mpg        cyl        disp         hp       drat
## Mazda RX4          0.1508848 -0.1049878 -0.57061982 -0.5350928  0.5675137
## Mazda RX4 Wag      0.1508848 -0.1049878 -0.57061982 -0.5350928  0.5675137
## Datsun 710         0.4495434 -1.2248578 -0.99018209 -0.7830405  0.4739996
## Hornet 4 Drive     0.2172534 -0.1049878  0.22009369 -0.5350928 -0.9661175
## Hornet Sportabout -0.2307345  1.0148821  1.04308123  0.4129422 -0.8351978
## Valiant           -0.3302874 -0.1049878 -0.04616698 -0.6080186 -1.5646078
##                             wt       qsec         vs         am       gear
## Mazda RX4         -0.610399567 -0.7771651 -0.8680278  1.1899014  0.4235542
## Mazda RX4 Wag     -0.349785269 -0.4637808 -0.8680278  1.1899014  0.4235542
## Datsun 710        -0.917004624  0.4260068  1.1160357  1.1899014  0.4235542
## Hornet 4 Drive    -0.002299538  0.8904872  1.1160357 -0.8141431 -0.9318192
## Hornet Sportabout  0.227654255 -0.4637808 -0.8680278 -0.8141431 -0.9318192
## Valiant            0.248094592  1.3269868  1.1160357 -0.8141431 -0.9318192
##                         carb
## Mazda RX4          0.7352031
## Mazda RX4 Wag      0.7352031
## Datsun 710        -1.1221521
## Hornet 4 Drive    -1.1221521
## Hornet Sportabout -0.5030337
## Valiant           -1.1221521
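
To see what scale() did, we can check the mean and standard deviation of each scaled column, and repeat the calculation by hand for one feature. This is a sketch of the default behavior of scale(); mpg_by_hand is an illustrative name.

```r
# Rebuild both objects so this example runs on its own
data <- as.matrix(mtcars)
data_scaled <- scale(data)

# After scaling, every column has mean 0 and standard deviation 1
round(colMeans(data_scaled), 10)
apply(data_scaled, 2, sd)

# The same calculation by hand for one feature:
# subtract the column mean, then divide by the column standard deviation
mpg_by_hand <- (data[, "mpg"] - mean(data[, "mpg"])) / sd(data[, "mpg"])
all.equal(mpg_by_hand, data_scaled[, "mpg"])
```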

Let’s see if a heatmap for the scaled data is more informative.

# A heat map is a color image of our data with dendrograms
heatmap(data_scaled)

  • The map is organized to place similar cars close together

Now the patterns emerge!

What relationships do you find? Are the values and groupings for wt and mpg surprising? Does the clustering of vehicles make sense?
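
One way to check an impression from the heatmap is to compute a correlation between two features. The sketch below does this for wt and mpg; it is only one possible check, and wt_mpg_cor is an illustrative name.

```r
# Correlation ranges from -1 to 1; a negative value means
# one feature tends to go down as the other goes up
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
print(wt_mpg_cor)

# A scatter plot shows the same relationship car by car
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```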


Color schemes

We can use a color palette to change the color coding and style of our heatmap.

RColorBrewer is an R package that contains ready-to-use color palettes for creating nice graphics. The palettes come from the ColorBrewer project, designed by cartographer Cynthia Brewer.

# Packages are loaded with the library() function
library(RColorBrewer)

# Parameters for plotting 
par(cex = 0.5)

# Get a graphic for all color schemes
display.brewer.all()

The default color coding used by heatmap() is “YlOrRd”, which is the top row of the graphic.

We can use any of the palettes provided. Perhaps another scheme reveals relationships in the data more effectively or it is just more fun.

# Change the palette name in quotes (here "RdPu") to any of the palettes
heatmap(data_scaled, col = brewer.pal(8, "RdPu"))
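
It can help to look at what brewer.pal() actually returns. The sketch below prints the colors and then, as one optional variation, uses the base R function colorRampPalette() to stretch the 8 colors into a smoother gradient. It assumes the RColorBrewer package is installed; pink_purple and smooth_palette are illustrative names.

```r
library(RColorBrewer)

# brewer.pal() returns a character vector of hex color codes
pink_purple <- brewer.pal(8, "RdPu")
print(pink_purple)

# colorRampPalette() interpolates between those colors,
# giving a smoother gradient with more steps (50 here)
smooth_palette <- colorRampPalette(pink_purple)(50)

# Rebuild the scaled matrix so this example runs on its own
data_scaled <- scale(as.matrix(mtcars))
heatmap(data_scaled, col = smooth_palette)
```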


Create your report

Click on the Knit icon next to the ball of blue yarn and select Knit to HTML. This will create an HTML file of your report. We will publish our reports online so we can share what we’ve done with others.


Summary

Great work! We learned a lot about R.

We will get lots of practice with all the functionality in our activities. We will use heatmaps and dendrograms to look for patterns in breast cancer gene expression data.