Gavin Douglas
Aug. 21st, 2018
Success?
How could you calculate the mean of each element in the list below?
my_list <- list("a"=c(15,14,526),
"b"=c(417,6541,14), "c"=c(5,4))
Using lapply is easier!
lapply(my_list, mean)
$a
[1] 185
$b
[1] 2324
$c
[1] 4.5
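For comparison, here is a minimal sketch of the same calculation written as an explicit for loop. The names loop_means and element_name are hypothetical and chosen only for illustration; the point is how much bookkeeping lapply saves.

```r
my_list <- list("a"=c(15,14,526),
                "b"=c(417,6541,14), "c"=c(5,4))

# Pre-allocate a list of the same length to hold the results
# (loop_means is a hypothetical name used only in this sketch).
loop_means <- vector("list", length(my_list))
names(loop_means) <- names(my_list)

# Loop over the element names and store the mean of each element.
for (element_name in names(my_list)) {
  loop_means[[element_name]] <- mean(my_list[[element_name]])
}

# Check whether the loop and lapply produce the same list.
identical(loop_means, lapply(my_list, mean))
```

The loop needs pre-allocation, naming, and indexing, while lapply handles all three in a single call.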
my_df <- data.frame(matrix(c(1,5,7,4,1,5,7,3,14),
nrow=3, ncol=3))
lapply(my_df, mean)
$X1
[1] 4.333333
$X2
[1] 3.333333
$X3
[1] 8
x <- 1:4
lapply(x, runif)
[[1]]
[1] 0.04721432
[[2]]
[1] 0.5410313 0.4316071
[[3]]
[1] 0.2415562 0.6597251 0.4966759
[[4]]
[1] 0.9568958 0.3634965 0.8942512 0.6853268
x <- 1:4
lapply(x, runif, min=0, max=10)
[[1]]
[1] 2.047823
[[2]]
[1] 1.5909216 0.6347786
[[3]]
[1] 7.711469 6.559685 3.702482
[[4]]
[1] 7.6326175 0.8124635 0.6938888 9.7873019
An anonymous function is unnamed and defined only within the looping call that uses it (here, lapply).
my_list <- list("a"=c(15,14,526),
"b"=c(417,6541,14), "c"=c(5,4))
lapply(my_list, function(x) { x[1] + x[2] })
$a
[1] 29
$b
[1] 6958
$c
[1] 9
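To make the contrast concrete, the sketch below runs the same calculation with a named function and with the anonymous version. The name sum_first_two is hypothetical and used only for illustration.

```r
my_list <- list("a"=c(15,14,526),
                "b"=c(417,6541,14), "c"=c(5,4))

# Named version: defined once and reusable elsewhere in the script
# (sum_first_two is a hypothetical name for this sketch).
sum_first_two <- function(x) {
  x[1] + x[2]
}
named_result <- lapply(my_list, sum_first_two)

# Anonymous version: defined inline and gone once lapply returns.
anon_result <- lapply(my_list, function(x) { x[1] + x[2] })

# Check whether the two approaches give the same list.
identical(named_result, anon_result)
```

An anonymous function is usually the tidier choice when the logic is short and needed in only one place.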
my_df <- data.frame(matrix(c(1,5,7,4,1,5,7,3,14),
nrow=3, ncol=3))
my_df
  X1 X2 X3
1  1  4  7
2  5  1  3
3  7  5 14
apply(my_df, 2, mean)
X1 X2 X3
4.333333 3.333333 8.000000
apply(my_df, 1, sum)
[1] 12 9 26
If summing or averaging rows or columns, you should use these specialized functions:
rowSums
colSums
rowMeans
colMeans
x <- matrix(rnorm(200), 4, 50)
apply(x, 1, quantile, probs=c(0.25, 0.75))
          [,1]       [,2]       [,3]       [,4]
25% -0.8391107 -0.5240767 -0.7099116 -0.8795485
75%  0.4337546  0.8268244  0.8929874  0.4903421
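Returning to the specialized row and column functions mentioned earlier, here is a brief sketch using the same my_df as before. It compares them against the equivalent apply calls; the variable names col_means and row_sums are hypothetical.

```r
my_df <- data.frame(matrix(c(1,5,7,4,1,5,7,3,14),
                           nrow=3, ncol=3))

# Specialized functions: column means and row sums.
col_means <- colMeans(my_df)
row_sums <- rowSums(my_df)

# Compare against the general apply() calls. Names are dropped with
# unname() so that only the numeric values are compared.
all.equal(unname(col_means), unname(apply(my_df, 2, mean)))
all.equal(unname(row_sums), unname(apply(my_df, 1, sum)))
```

The specialized functions are typically faster than apply on large matrices and make the intent clearer.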
sapply: returns output of lapply as vector if possible.
mapply: applies function in parallel to set of arguments.
tapply: applies function over subsets of a vector.
plot
boxplot
hist
etc…
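Before moving on to plotting, here is a short sketch of the sapply, mapply, and tapply variants described above. The variable names are hypothetical, and the built-in iris dataset is used for the tapply call only as a convenient example.

```r
my_list <- list("a"=c(15,14,526),
                "b"=c(417,6541,14), "c"=c(5,4))

# sapply: like lapply, but the three means are simplified to a named vector.
sapply_means <- sapply(my_list, mean)

# mapply: pairs the arguments element-wise, so this calls
# rep(1, 3), rep(2, 2), and rep(3, 1).
mapply_reps <- mapply(rep, 1:3, 3:1)

# tapply: mean sepal length within each level of Species.
tapply_means <- tapply(iris$Sepal.Length, iris$Species, mean)

sapply_means
mapply_reps
tapply_means
```

Because the mapply results have different lengths, they cannot be simplified and come back as a list.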
data(iris)
plot(iris$Sepal.Length, iris$Petal.Length, col=iris$Species, pch=16)
boxplot(iris$Sepal.Length ~ iris$Species, outline=FALSE, col="grey")
library(beeswarm)
boxplot(iris$Sepal.Length ~ iris$Species, outline=FALSE, col="grey")
beeswarm(iris$Sepal.Length ~ iris$Species, pch=16, add=TRUE)
hist(iris$Petal.Length, col="grey", breaks=50)
abline(v=2.5, lwd=2, lty=2, col="black")
Lattice plots are useful when you have multiple factors and want to compare their levels (or facets).
data(CO2)
summary(CO2)
     Plant             Type         Treatment       conc
 Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95
 Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175
 Qn3    : 7                                    Median : 350
 Qc1    : 7                                    Mean   : 435
 Qc3    : 7                                    3rd Qu.: 675
 Qc2    : 7                                    Max.   :1000
 (Other):42
     uptake
 Min.   : 7.70
 1st Qu.:17.90
 Median :28.30
 Mean   :27.21
 3rd Qu.:37.12
 Max.   :45.50
library(lattice)
histogram(~ conc | Type + Treatment, data=CO2)
xyplot(uptake ~ conc | Type + Treatment, data=CO2)
See more on this cheatsheet: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
library(ggplot2)
data(iris)
ggplot(data=iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
library(ggplot2)
data(iris)
my_plot <- ggplot(data=iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
my_plot <- my_plot + theme_bw() + geom_point(aes(color=Species, shape=Species)) + xlab("Sepal Length") + ylab("Sepal Width")
Many plot types both in base and ggplot2 are shown here on the iris dataset: https://www.mailman.columbia.edu/sites/default/files/media/fdawg_ggplot2.html
The 3 “Simulation” videos. I'll cover these concepts along with basic statistical tests and how to build basic models in R.