We will work with a mouse dataset, containing “gene expression profiles of purified photoreceptors at distinct developmental stages and from different genetic backgrounds”. The microarray platform was Affymetrix mouse genomic expression array 430 2.0.
Load the photoRec (mini) data
kDat <- readRDS("GSE4051_MINI.rds")
str(kDat)
## 'data.frame': 39 obs. of 7 variables:
## $ sidChar : chr "Sample_11" "Sample_12" "Sample_2" "Sample_9" ...
## $ sidNum : num 11 12 2 9 36 37 38 39 16 17 ...
## $ devStage : Factor w/ 5 levels "E16","P2","P6",..: 5 5 5 5 5 5 5 5 1 1 ...
## $ gType : Factor w/ 2 levels "wt","NrlKO": 2 2 2 2 1 1 1 1 2 2 ...
## $ crabHammer: num 9.68 9.13 9.74 9.82 9.96 ...
## $ eggBomb : num 7.2 7.16 7.11 6.56 7.87 ...
## $ poisonFang: num 6.98 7.35 7.08 7.04 6.99 ...
table(kDat$devStage)
##
## E16 P2 P6 P10 4_weeks
## 7 8 8 8 8
table(kDat$gType)
##
## wt NrlKO
## 20 19
with(kDat, table(devStage, gType))
## gType
## devStage wt NrlKO
## E16 4 3
## P2 4 4
## P6 4 4
## P10 4 4
## 4_weeks 4 4
We see there are 39 samples (rows or observations). We have
- sample ID character (sidChar): a string recording sample ID
- sample ID number (sidNum): integers between 1 and 39
- developmental stage of the mouse (devStage): a factor with levels
* E16 (day 16 of embryonic development),
* P2 (postnatal day 2),
* P6 (postnatal day 6),
* P10 (postnatal day 10),
* 4_weeks (4 weeks postnatal)
- genotype of the mouse (gType):
* wt (wild type) or
* NrlKO (gene Nrl has been knocked out)
- gene expression level (in some abstract unit-less sense) for
* crabHammer,
* eggBomb, and
* poisonFang
It's pretty clear the intent was to have 4 mice at each combination of genotype and developmental stage, but there was some mishap for the embryonic knockouts, i.e. E16-NrlKO, where we only have data from 3 mice.
The most important plot type is arguably the scatterplot. The lattice function for this is xyplot(). Let's plot the expression data for two of these genes against each other using the lattice function xyplot().
To write the figure to file with an R code precede the figure-making code by opening a graphics device and follow it with a command that closes the device. You can view the file whatever way is appropriate to your OS. It's a silly figure but we've shown the process. Don't be surprised when you don't see any figure appear in your screen device. While you're sending graphics output to, e.g., the pdf() device, you'll be “flying blind”, which is why it's important to work out the graphics commands in advance.
library(lattice)
pdf("Fig1_eggBomb_vs_crabHammer.pdf")
xyplot(eggBomb ~ crabHammer, kDat)
dev.off()
## pdf
## 2
list.files(pattern = "Fig1_eggBomb_vs_crabHammer.pdf")
## [1] "Fig1_eggBomb_vs_crabHammer.pdf"
lattice graphing functions, like many other functions in R, utilize a special formula syntax and that is what is expected as the first inputted argument. The general form is y ~ x, read as “y twiddle x”. Here it requests that the first variable, eggBomb, be associated with the y direction and the second, crabHammer, be associated with x. The second argument kDat specifies the data.frame where the variables eggBomb and crabHammer can be found. It is being matched against the formal argument data by its position. As with the formula syntax, the use of a data argument, usually in the second position, is common in R, though sadly not universal. Take advantage of it whenever you can!
You try: request a scatterplot of the variable poisonFang against crabHammer.
xyplot(poisonFang ~ crabHammer, kDat)
Let's imagine that crabHammer is somehow a natural explanatory variable or predictor (weird here, I admit, but go with me) and eggBomb and poisonFang are natural response variables. We might want to see both responses plotted against crabHammer at the same time. Here is a first way to do so, using a bit of a cheat known as the “extended formula interface” in lattice.
xyplot(poisonFang + eggBomb ~ crabHammer, kDat, auto.key = T)
The “+” sign between the two response variables is, in this context, purely artificial, i.e. we don't literally add them up and scatterplot the sum against crabHammer. Read it as “and”. We've added auto.key = TRUE which produces the legend at the top and is a great convenience. Custom legends are absolutely possible but don't go there until you need to!
xyplot(poisonFang + eggBomb ~ crabHammer, kDat, outer = T, grid = T)
The addition of outer = TRUE has caused the information to be presented in two panels, which is the lattice term for the individual plotting cells. The addition of a background grid, via grid = TRUE, is a great help for viewers who want to make comparisons across panels.
xyplot(poisonFang + eggBomb ~ crabHammer, kDat, group = gType, outer = T, grid = T,
auto.key = T)
The lattice paradigm is for panels to correspond to a level of a factor or to a unique combination of levels of two or more factors.
Since reshaping is a topic in and of itself, we present the reshaping code without explanation here.
nDat <- with(kDat, data.frame(sidChar, sidNum, devStage, gType, crabHammer,
probeset = factor(rep(c("eggBomb", "poisonFang"), each = nrow(kDat))), geneExp = c(eggBomb,
poisonFang)))
str(nDat)
## 'data.frame': 78 obs. of 7 variables:
## $ sidChar : Factor w/ 39 levels "Sample_1","Sample_10",..: 3 4 12 39 30 31 32 33 8 9 ...
## $ sidNum : num 11 12 2 9 36 37 38 39 16 17 ...
## $ devStage : Factor w/ 5 levels "E16","P2","P6",..: 5 5 5 5 5 5 5 5 1 1 ...
## $ gType : Factor w/ 2 levels "wt","NrlKO": 2 2 2 2 1 1 1 1 2 2 ...
## $ crabHammer: num 9.68 9.13 9.74 9.82 9.96 ...
## $ probeset : Factor w/ 2 levels "eggBomb","poisonFang": 1 1 1 1 1 1 1 1 1 1 ...
## $ geneExp : num 7.2 7.16 7.11 6.56 7.87 ...
head(nDat)
## sidChar sidNum devStage gType crabHammer probeset geneExp
## 1 Sample_11 11 4_weeks NrlKO 9.677 eggBomb 7.204
## 2 Sample_12 12 4_weeks NrlKO 9.129 eggBomb 7.165
## 3 Sample_2 2 4_weeks NrlKO 9.744 eggBomb 7.107
## 4 Sample_9 9 4_weeks NrlKO 9.822 eggBomb 6.558
## 5 Sample_36 36 4_weeks wt 9.960 eggBomb 7.866
## 6 Sample_37 37 4_weeks wt 9.667 eggBomb 6.992
tail(nDat)
## sidChar sidNum devStage gType crabHammer probeset geneExp
## 73 Sample_4 4 P6 NrlKO 8.992 poisonFang 7.342
## 74 Sample_7 7 P6 NrlKO 8.803 poisonFang 7.754
## 75 Sample_28 28 P6 wt 8.214 poisonFang 7.428
## 76 Sample_29 29 P6 wt 10.130 poisonFang 7.100
## 77 Sample_30 30 P6 wt 8.951 poisonFang 7.274
## 78 Sample_31 31 P6 wt 8.693 poisonFang 7.409
Our reshaped dataset nDat has 78 observations of 7 variables, whereas kDat had 39 = 78 / 2 observations of 7 variables. The variables sidChar, sidNum, devStage, gType, and crabHammer have simply been repeated twice, resulting 39 * 2 = 78 rows. The variables eggBomb and poisonFang have been concatenated into a new variable geneExp with a new factor probeset identifying which gene the data is for.
xyplot(geneExp ~ crabHammer | probeset, nDat, grid = T, group = gType, auto.key = T)
We are using another feature of the formula syntax, y ~ x | z, which requests a scatterplot of y versus x for each level of the factor z. In general, in lattice plots the use of the vertical bar | requests individual plots in panels for each value of the factor specified on the right.
xyplot(geneExp ~ crabHammer | probeset, nDat, grid = T, group = devStage, auto.key = T)
The next set of figures we will make requires yet more data reshaping, which is a substantial background task in many analyses. We drop the idea of crabHammer being a predictor and eggBomb and poisonFang being responses and we just treat them all equivalently.
oDat <- with(kDat, data.frame(sidChar, sidNum, devStage, gType, probeset = factor(rep(c("crabHammer",
"eggBomb", "poisonFang"), each = nrow(kDat))), geneExp = c(crabHammer, eggBomb,
poisonFang)))
str(oDat)
## 'data.frame': 117 obs. of 6 variables:
## $ sidChar : Factor w/ 39 levels "Sample_1","Sample_10",..: 3 4 12 39 30 31 32 33 8 9 ...
## $ sidNum : num 11 12 2 9 36 37 38 39 16 17 ...
## $ devStage: Factor w/ 5 levels "E16","P2","P6",..: 5 5 5 5 5 5 5 5 1 1 ...
## $ gType : Factor w/ 2 levels "wt","NrlKO": 2 2 2 2 1 1 1 1 2 2 ...
## $ probeset: Factor w/ 3 levels "crabHammer","eggBomb",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ geneExp : num 9.68 9.13 9.74 9.82 9.96 ...
Our newly reshaped version of kDat, named oDat, has 39 * 3 = 117 observations but is otherwise very similar to nDat.
stripplot(~geneExp, oDat)
Pretty boring and slightly nonsensical! We had to start somewhere.
stripplot(probeset ~ geneExp, oDat)
stripplot(probeset ~ geneExp, oDat, jitter.data = T)
Notice that all the data is presented in one panel but with the different genes corresponding to different locations in the y direction.
stripplot(~geneExp | probeset, oDat, layout = c(nlevels(oDat$probeset), 1))
stripplot(~geneExp | probeset, oDat, layout = c(nlevels(oDat$probeset), 1),
groups = gType, auto.key = T)
stripplot(geneExp ~ devStage, oDat)
stripplot(geneExp ~ devStage | probeset, oDat, layout = c(nlevels(oDat$probeset),
1))
stripplot(geneExp ~ devStage | probeset, oDat, layout = c(nlevels(oDat$probeset),
1), groups = gType, auto.key = T)
stripplot(geneExp ~ devStage | probeset, oDat, layout = c(nlevels(oDat$probeset),
1), groups = gType, auto.key = T, grid = T, type = c("a", "p"))
The argument 'type' can be used to add a variety of enhancements. Type is specified as a vector (through the use of 'c'). The option 'p' in the above example specifies the data as points on the plot, 'a' refers to getting the average of each category and joining them by a line (other summaries can be requested too). Some of the other options include 'l' for joining points by lines, 'b' for both points and lines, 'r' for adding the fit from a simple linear regression and 'smooth' for adding a nonparametric “smooth” fitted curve.
stripplot(geneExp ~ devStage | probeset, oDat, layout = c(nlevels(oDat$probeset),
1), groups = gType, auto.key = T, grid = T, type = c("r", "p"))
stripplot(geneExp ~ devStage | probeset, oDat, layout = c(nlevels(oDat$probeset),
1), groups = gType, auto.key = T, grid = T, type = c("smooth", "p"))
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
## Warning: pseudoinverse used at 3
## Warning: neighborhood radius 1
## Warning: reciprocal condition number 0
## Warning: There are other near singularities as well. 1
stripplot(geneExp ~ devStage | probeset, oDat, layout = c(nlevels(oDat$probeset),
1), groups = gType, auto.key = T, grid = T, type = c("b", "p"))
Here's a nice alternative to histograms!
densityplot(~geneExp, oDat)
densityplot(~geneExp | gType, oDat, grid = T)
densityplot(~geneExp, oDat, groups = gType, auto.key = T)
It controls how smooth this smoothed histogram will be. Though densityplot() has a sensible default, you can always specify directly if you wish. The argument 'n' controls the number of points at which the kernel density estimate is evaluated. It is easy to confuse this with the usual use of 'n' to denote sample size, so beware. If your density looks jaggedy, try increasing 'n'.
jBw <- 0.2
jn <- 400
densityplot(~geneExp, oDat, groups = gType, auto.key = T, bw = jBw, n = jn,
main = paste("bw=", jBw, ",n=", jn))
You try: use densityplot() to explore the gene expression distribution by gene and/or developmental stage. Play with 'bw' and 'n' if you like.
jBw <- 0.5
jn <- 200
densityplot(~geneExp, oDat, groups = gType, auto.key = T, bw = jBw, n = jn,
main = paste("bw=", jBw, ",n=", jn))
densityplot(~geneExp, oDat, groups = devStage, auto.key = T, bw = jBw, n = jn,
main = paste("bw=", jBw, ", n=", jn))
densityplot(~geneExp | devStage, oDat, groups = gType, auto.key = T, grid = T,
bw = jBw, n = jn, main = paste("bw=", jBw, ", n=", jn))
There is also a time and place for boxplots, obtained with the lattice function bwplot() for “box-and-whiskers plot”.
bwplot(geneExp ~ devStage, oDat)
bwplot(geneExp ~ devStage | gType, oDat)
bwplot(geneExp ~ devStage, oDat, panel = panel.violin)
This is our first explicit glimpse of how a panel function might be specified to some non-default value. More on that can be found in Sarkar's book and the STAT545A materials.
Now we need a larger dataset. Let's load the real full data matrix and experimental design now.
The experimental data, i.e. covariate information for the samples, is available in plain text form in the file GSE4051_design.tsv but you will notice below that we prefer to load it from a RDS file for better factor levels.
prDat <- read.table("GSE4051_data.tsv")
str(prDat, max.level = 0)
## 'data.frame': 29949 obs. of 39 variables:
prDes <- readRDS("GSE4051_design.rds")
str(prDes)
## 'data.frame': 39 obs. of 4 variables:
## $ sidChar : chr "Sample_11" "Sample_12" "Sample_2" "Sample_9" ...
## $ sidNum : num 11 12 2 9 36 37 38 39 16 17 ...
## $ devStage: Factor w/ 5 levels "E16","P2","P6",..: 5 5 5 5 5 5 5 5 1 1 ...
## $ gType : Factor w/ 2 levels "wt","NrlKO": 2 2 2 2 1 1 1 1 2 2 ...
Let's draw 50 probesets at random – but in a repeatable way!
set.seed(1)
(yo <- sample(1:nrow(prDat), size = 50))
## [1] 7952 11145 17156 27198 6040 26902 28287 19786 18837 1850 6167
## [12] 5286 20568 11499 23046 14899 21481 29690 11375 23269 27975 6350
## [23] 19503 3758 7997 11555 401 11442 26023 10184 14424 17938 14766
## [34] 5571 24751 19997 23759 3229 21647 12302 24554 19353 23416 16540
## [45] 15842 23605 698 14271 21897 20713
hDat <- prDat[yo, ]
str(hDat)
## 'data.frame': 50 obs. of 39 variables:
## $ Sample_11: num 9.09 6.23 7 6.4 6.22 ...
## $ Sample_12: num 9.21 6.47 7.17 6.78 5.76 ...
## $ Sample_2 : num 9.05 6.2 7.2 6.54 5.9 ...
## $ Sample_9 : num 8.89 7.67 6.86 6.72 6.26 ...
## $ Sample_36: num 10.41 7.9 6.57 6.87 6.09 ...
## $ Sample_37: num 9.48 6.36 6.99 6.85 6.08 ...
## $ Sample_38: num 10.14 7.24 6.95 6.99 6.13 ...
## $ Sample_39: num 9.48 6.21 6.97 7.13 6.02 ...
## $ Sample_16: num 8.51 6.66 7.75 7.25 5.5 ...
## $ Sample_17: num 8.18 7.95 6.85 6.38 7.51 ...
## $ Sample_6 : num 7.96 8.45 7.42 6.19 7.64 ...
## $ Sample_20: num 8.3 8.25 6.91 6.79 6.4 ...
## $ Sample_21: num 8.33 9.14 6.55 6.23 6.69 ...
## $ Sample_22: num 8.43 8.19 6.59 6.69 6.14 ...
## $ Sample_23: num 8.49 8.66 6.58 6.34 6.34 ...
## $ Sample_13: num 8.73 6.83 7.14 6.46 6.25 ...
## $ Sample_15: num 8.57 6.86 7.12 6.72 5.84 ...
## $ Sample_18: num 8.96 6.95 6.96 6.31 6.44 ...
## $ Sample_19: num 8.65 6.69 7.03 6.91 6.32 ...
## $ Sample_32: num 9.7 7.62 6.96 6.89 6.82 ...
## $ Sample_33: num 8.72 6.83 7.36 7.21 5.93 ...
## $ Sample_34: num 8.58 6.73 7.63 7.19 5.93 ...
## $ Sample_35: num 8.54 6.91 7.14 6.71 5.59 ...
## $ Sample_14: num 8.46 6.99 7.14 6.78 6.29 ...
## $ Sample_3 : num 8.53 7.14 7.23 6.88 6.15 ...
## $ Sample_5 : num 8.45 6.71 7.36 6.92 6 ...
## $ Sample_8 : num 8.62 6.66 7.9 6.97 5.95 ...
## $ Sample_24: num 8.34 7.49 7.17 6.84 5.83 ...
## $ Sample_25: num 8.14 7.39 7.12 7.02 5.85 ...
## $ Sample_26: num 8.45 6.94 7.46 7.43 6.32 ...
## $ Sample_27: num 8.25 6.5 7.23 6.91 5.8 ...
## $ Sample_1 : num 8.66 6.81 7.22 6.56 6.03 ...
## $ Sample_10: num 8.68 7.41 6.97 6.5 5.99 ...
## $ Sample_4 : num 8.74 7.23 6.91 6.78 5.82 ...
## $ Sample_7 : num 8.69 6.61 7.43 7.1 5.64 ...
## $ Sample_28: num 8.63 6.46 7.45 7.17 6.03 ...
## $ Sample_29: num 8.58 7.84 6.72 6.91 6.31 ...
## $ Sample_30: num 8.28 7.01 6.81 6.75 5.81 ...
## $ Sample_31: num 8.47 6.88 7.18 6.89 5.79 ...
The functions for heatmapping expect a matrix not a data.frame, so we will convert hDat and also transpose for a nicer heatmap orientation below. I also give the samples more informative names that capture genotype and developmental stage.
hDat <- as.matrix(t(hDat))
rownames(hDat) <- with(prDes, paste(devStage, gType, sidChar, sep = "_"))
str(hDat)
## num [1:39, 1:50] 9.09 9.21 9.05 8.89 10.41 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:39] "4_weeks_NrlKO_Sample_11" "4_weeks_NrlKO_Sample_12" "4_weeks_NrlKO_Sample_2" "4_weeks_NrlKO_Sample_9" ...
## ..$ : chr [1:50] "1426822_at" "1431375_s_at" "1440076_at" "1456157_at" ...
heatmap(hDat, Rowv = NA, Colv = NA, scale = "none", margins = c(5, 8))
heatmap(hDat, Rowv = NA, Colv = NA, scale = "none", col = cm.colors(256), margins = c(5,
8))
install.packages("RColorBrewer")
## Installing package into 'C:/Users/JML/Documents/R/win-library/3.0'
## (as 'lib' is unspecified)
## Error: trying to use CRAN without setting a mirror
library(RColorBrewer)
display.brewer.all()
Warning: this will seem arcane. To map a small number of colors into a ramp or palette suitable for encoding a quantitative variable as colors, the function colorRampPalette() is useful. Its somewhat surprising return value is a function itself that takes a number n as input and returns a correspondingly sized color ramp. Here I prepare to use a subdued gray palette and also a more stimulating pale blue to purple palette.
jGreysFun <- colorRampPalette(brewer.pal(n = 9, "Greys"))
jBuPuFun <- colorRampPalette(brewer.pal(n = 9, "BuPu"))
heatmap(hDat, Rowv = NA, Colv = NA, scale = "none", margins = c(5, 8), col = jGreysFun(256))
heatmap(hDat, Rowv = NA, Colv = NA, scale = "none", margins = c(5, 8), col = jBuPuFun(256))
By specifying Rowv = NA, Colv = NA, scale = “none”, we have been suppressing some rather common heatmap features:
Let's look at the heatmap as it would be rendered by default.
heatmap(hDat, margins = c(5, 8), col = jBuPuFun(256))
heatmap(hDat, col = jBuPuFun(256), margins = c(5, 8), scale = c("column"))
Finally we try out another popular heatmapping function heatmap.2() from the gplots package. This adds an automatic color legend, which helps you determine what each color extreme actually means.
install.packages("gplots")
## Installing package into 'C:/Users/JML/Documents/R/win-library/3.0'
## (as 'lib' is unspecified)
## Error: trying to use CRAN without setting a mirror
library(gplots)
## KernSmooth 2.23 loaded
## Copyright M. P. Wand 1997-2009
##
## Attaching package: 'gplots'
##
## The following object is masked from 'package:stats':
##
## lowess
heatmap.2(hDat, col = jGreysFun(256), trace = "none")
heatmap.2(hDat, col = jBuPuFun(256), trace = "none")
Now that we've loaded the main dataset we can also explore high-volume scatterplotting and the solutions to overplotting. First let's pick two samples at random to plot against each other.
set.seed(924)
(yo <- sample(1:ncol(prDat), size = 2))
## [1] 25 24
y <- prDat[[yo[1]]]
z <- prDat[[yo[2]]]
str(y)
## num [1:29949] 7.16 9.55 9.84 8.33 8.5 ...
str(z)
## num [1:29949] 7.09 9.56 9.88 8.57 8.59 ...
library(lattice)
xyplot(y ~ z, asp = 1)
See the giant dark point cloud of death? What's going on in there? Who knows. The overplotting is potentially hiding information about the joint distribution. Also, these plots use up lots of ink when printing and take a long time to load if you're giving a talk. Stop using plain vanilla scatterplot tools for very high-volume plots. The superior alternatives divide the plane into small neighborhoods and shade them according to how many points fall there.
smoothScatter(y ~ z, asp = 1)
You can see that we were missing some information in the dark cloud above. There is one main clump of data, concentrated around (6, 6) and then petering out diagonally up the x = y line. There's arguably a second, smaller clump of data on a steeper line running through the points ~(10, 8) and ~(14, 14).
xyplot(y ~ z, asp = 1, panel = panel.smoothScatter, nbin = 150)
install.packages("hexbin")
## Installing package into 'C:/Users/JML/Documents/R/win-library/3.0'
## (as 'lib' is unspecified)
## Error: trying to use CRAN without setting a mirror
library(grid)
library(hexbin)
hexbinplot(y ~ z)
Finally, if you want to plot several variables against each other, i.e. create a scatterplot matrix, there are several functions for that.
- base graphics offers **pairs()**
- lattice offers **splom()** for scatterplot matrix
- hexbin offers **hexplom()**
- ggplot2 offers **plotmatrix()** (sorry we won't go here today)
We need to take a slightly larger sample of columns now.
set.seed(3)
(yo <- sample(1:ncol(prDat), size = 4))
## [1] 7 31 15 12
pairDat <- subset(prDat, select = yo)
str(pairDat)
## 'data.frame': 29949 obs. of 4 variables:
## $ Sample_38: num 7.37 9.44 9.48 8.49 8.34 ...
## $ Sample_27: num 7.21 8.64 9.43 8.33 8.43 ...
## $ Sample_23: num 7.07 10.13 9.91 8.49 8.64 ...
## $ Sample_20: num 7.24 9.48 10.01 8.36 8.59 ...
pairs(pairDat)
pairs(pairDat, panel = function(...) smoothScatter(..., add = T))
splom(pairDat)
splom(pairDat, panel = panel.smoothScatter, raster = T)
hexplom(pairDat)