R Programming for Biologists: Workshop 3

Gavin Douglas
Aug. 21st, 2018

Fibonacci's rabbits!

Success?

Warm-up question

How could you calculate the mean of each element in the list below?

my_list <- list("a"=c(15,14,526),
                "b"=c(417,6541,14), "c"=c(5,4))

lapply

Using lapply is easier!

lapply(my_list, mean)

$a
[1] 185

$b
[1] 2324

$c
[1] 4.5
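
For comparison, the same calculation can be written with an explicit for loop. This is only a sketch of one possible loop-based answer to the warm-up question; the names list_means and element_name are made up for illustration.

```r
my_list <- list("a"=c(15,14,526),
                "b"=c(417,6541,14), "c"=c(5,4))

# Pre-allocate an empty list with one slot per element of my_list.
list_means <- vector("list", length(my_list))
names(list_means) <- names(my_list)

# Fill each slot with the mean of the corresponding element.
for (element_name in names(my_list)) {
  list_means[[element_name]] <- mean(my_list[[element_name]])
}

list_means
```

The loop needs set-up and bookkeeping that lapply(my_list, mean) handles in a single call.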

lapply always returns a new list, whatever type of object is passed in.

my_df <- data.frame(matrix(c(1,5,7,4,1,5,7,3,14), 
                  nrow=3, ncol=3))

lapply(my_df, mean)

$X1
[1] 4.333333

$X2
[1] 3.333333

$X3
[1] 8

Coursera runif lapply example (1/2)

x <- 1:4
lapply(x, runif)

[[1]]
[1] 0.04721432

[[2]]
[1] 0.5410313 0.4316071

[[3]]
[1] 0.2415562 0.6597251 0.4966759

[[4]]
[1] 0.9568958 0.3634965 0.8942512 0.6853268

Coursera runif lapply example (2/2)

x <- 1:4
lapply(x, runif, min=0, max=10)

[[1]]
[1] 2.047823

[[2]]
[1] 1.5909216 0.6347786

[[3]]
[1] 7.711469 6.559685 3.702482

[[4]]
[1] 7.6326175 0.8124635 0.6938888 9.7873019
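
The two runif examples above rely on lapply passing any extra arguments (here min and max) on to the function it calls. A minimal sketch of that equivalence is below. The seed value and the names passed_through and wrapped are arbitrary choices for illustration; they are not from the Coursera example.

```r
x <- 1:4

# Extra arguments given to lapply (here min and max) are passed on to runif.
set.seed(1)  # arbitrary seed, only so that both calls draw the same numbers
passed_through <- lapply(x, runif, min=0, max=10)

# The same call written with an explicit wrapper function.
set.seed(1)
wrapped <- lapply(x, function(n) { runif(n, min=0, max=10) })

# Both approaches give exactly the same list.
identical(passed_through, wrapped)
```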

Anonymous functions

A function that is unnamed and defined only inside the call where it is used.

my_list <- list("a"=c(15,14,526),
                "b"=c(417,6541,14), "c"=c(5,4))

lapply(my_list, function(x) { x[1] + x[2] })

$a
[1] 29

$b
[1] 6958

$c
[1] 9
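
To see what "anonymous" means here, it may help to compare with a named version of the same function. In this sketch the name sum_first_two is hypothetical; the result should match the anonymous-function call above.

```r
my_list <- list("a"=c(15,14,526),
                "b"=c(417,6541,14), "c"=c(5,4))

# A named function that adds the first two values of a vector.
sum_first_two <- function(x) {
  x[1] + x[2]
}

# Passing the function by name gives the same result as defining it inline.
named_result <- lapply(my_list, sum_first_two)
named_result
```

A named function is worth the extra lines when it is reused in several places; for a one-off calculation the anonymous form keeps the logic next to the call.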

apply: collapsing by rows or columns.

my_df <- data.frame(matrix(c(1,5,7,4,1,5,7,3,14), 
                  nrow=3, ncol=3))
my_df

apply(my_df, 2, mean)

      X1       X2       X3 
4.333333 3.333333 8.000000

apply(my_df, 1, sum)

[1] 12  9 26

Specialized functions are usually faster!

If summing or averaging rows or columns, you should use these specialized functions:

rowSums
colSums
rowMeans
colMeans
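
As a quick check, the specialized functions can be compared with the apply calls shown earlier. This sketch only confirms that the values agree; it does not measure speed, and the variable names are made up for illustration.

```r
my_df <- data.frame(matrix(c(1,5,7,4,1,5,7,3,14),
                  nrow=3, ncol=3))

# Column means, calculated two ways.
col_means_apply <- apply(my_df, 2, mean)
col_means_fast <- colMeans(my_df)

# Row sums, calculated two ways.
row_sums_apply <- apply(my_df, 1, sum)
row_sums_fast <- rowSums(my_df)

# Compare the values only (unname drops any names attached to the results).
all.equal(unname(col_means_apply), unname(col_means_fast))
all.equal(unname(row_sums_apply), unname(row_sums_fast))
```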

Coursera quantile apply example

x <- matrix(rnorm(200), 4, 50)
apply(x, 1, quantile, probs=c(0.25, 0.75))

          [,1]       [,2]       [,3]       [,4]
25% -0.8391107 -0.5240767 -0.7099116 -0.8795485
75%  0.4337546  0.8268244  0.8929874  0.4903421

Other "apply" functions

sapply: like lapply, but simplifies the output to a vector or matrix if possible.
mapply: applies a function element by element over several sets of arguments at once.
tapply: applies a function to subsets of a vector, with the subsets defined by a factor.
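
One small illustration of each of the three functions is sketched below, reusing my_list from earlier and the built-in iris dataset. The examples and variable names are chosen for illustration only.

```r
my_list <- list("a"=c(15,14,526),
                "b"=c(417,6541,14), "c"=c(5,4))
data(iris)

# sapply: same means as lapply, but returned as a named numeric vector.
list_means <- sapply(my_list, mean)

# mapply: pairs up the arguments, calling rep(1, 3), rep(2, 2) and rep(3, 1).
repeated <- mapply(rep, 1:3, 3:1)

# tapply: mean sepal length within each species.
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)

list_means
repeated
species_means
```

Because the three rep calls return vectors of different lengths, mapply cannot simplify the result and returns a list.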

Major plotting systems in R

base –> basic plots
lattice –> focus on breaking plots down by facets
ggplot2 –> popular and powerful (but steeper learning curve)

base plots - we've already been using these.

plot
boxplot
hist
etc…

"plot" function

data(iris)
plot(iris$Sepal.Length, iris$Petal.Length, col=iris$Species, pch=16)
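
The scatterplot above colors points by species but does not say which color is which. One possible way to add a legend is sketched below; the "topleft" position and the species_levels name are arbitrary choices for illustration.

```r
data(iris)
plot(iris$Sepal.Length, iris$Petal.Length, col=iris$Species, pch=16)

# col=iris$Species uses the factor's integer codes, so color i matches level i.
species_levels <- levels(iris$Species)
legend("topleft", legend=species_levels,
       col=seq_along(species_levels), pch=16)
```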


boxplot

boxplot(iris$Sepal.Length ~ iris$Species, outline=FALSE, col="grey")


You can add to base plots - like on a canvas

library(beeswarm)
boxplot(iris$Sepal.Length ~ iris$Species, outline=FALSE, col="grey")
beeswarm(iris$Sepal.Length ~ iris$Species, pch=16, add=TRUE)


Adding a line to a base plot

hist(iris$Petal.Length, col="grey", breaks=50)
abline(v=2.5, lwd=2, lty=2, col="black")


lattice plots

Useful when you have multiple factors and you want to compare the levels (or facets).

data(CO2)
summary(CO2)

     Plant             Type         Treatment       conc     
 Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95  
 Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175  
 Qn3    : 7                                    Median : 350  
 Qc1    : 7                                    Mean   : 435  
 Qc3    : 7                                    3rd Qu.: 675  
 Qc2    : 7                                    Max.   :1000  
 (Other):42                                                  
     uptake     
 Min.   : 7.70  
 1st Qu.:17.90  
 Median :28.30  
 Mean   :27.21  
 3rd Qu.:37.12  
 Max.   :45.50

Histogram example with lattice

library(lattice)
histogram(~ conc | Type + Treatment, data=CO2)


Scatterplot example with lattice

xyplot(uptake ~ conc | Type + Treatment, data=CO2)
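
As a variation on the scatterplot above, lattice can also overlay the levels of one factor within each panel rather than giving every combination its own panel. This is a sketch using the groups and auto.key arguments of xyplot; the name grouped_plot is made up for illustration.

```r
library(lattice)
data(CO2)

# One panel per Type; the two Treatment levels are overlaid within each panel.
# auto.key=TRUE adds a legend for the groups.
grouped_plot <- xyplot(uptake ~ conc | Type, groups=Treatment, data=CO2,
                       auto.key=TRUE)

# lattice functions return an object; printing it draws the plot.
print(grouped_plot)
```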


ggplot2


See more on this cheatsheet: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

Making a scatterplot with ggplot2

library(ggplot2)
data(iris)
ggplot(data=iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()


Adding additional settings before plotting

library(ggplot2)
data(iris)
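# Start with the data and axis mappings only; the points are added below.
my_plot <- ggplot(data=iris, aes(x = Sepal.Length, y = Sepal.Width))

# Add a theme, points colored and shaped by species, and axis labels.
my_plot <- my_plot + theme_bw() + geom_point(aes(color=Species, shape=Species)) + xlab("Sepal Length") + ylab("Sepal Width")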

Final sepal length vs width plot

[Plot: sepal width against sepal length, with points colored and shaped by species]

Additional ggplot2 examples

Many plot types, in both base R and ggplot2, are demonstrated on the iris dataset here: https://www.mailman.columbia.edu/sites/default/files/media/fdawg_ggplot2.html

Coursera videos for workshop 4

The 3 “Simulation” videos. I'll cover these concepts along with basic statistical tests and how to build basic models in R.