Correlation between numerical variables: R code for Chapter 16 examples

Note: This document was converted to R-Markdown from this page by M. Drew LaMar.


New methods

The variable names are taken from the examples below.

Linear correlation:

cor.test(booby$futureBehavior, booby$nVisitsNestling)

Spearman rank correlation:

cor.test(trick$years, trick$impressivenessScore, method = "spearman")

Example 16.1. Flipping Booby

Estimate a linear correlation between the number of non-parent adult visits experienced by boobies as chicks and the number of similar behaviors performed by the same birds when adult.

Read and inspect the data

booby <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e1FlippingBird.csv"))
head(booby)

##   nVisitsNestling futureBehavior
## 1               1          -0.80
## 2               7          -0.92
## 3              15          -0.80
## 4               4          -0.46
## 5              11          -0.47
## 6              14          -0.46

Scatter plot.

plot(futureBehavior ~ nVisitsNestling, data = booby)

For a fancier scatter plot using more options (Figure 16.1-4):

plot(futureBehavior ~ nVisitsNestling, data = booby, pch = 16, col = "firebrick", las = 1, bty = "l", cex = 1.2, xlab = "Events experienced as a nestling", ylab = "Future behavior")

Correlation coefficient. The cor.test function computes a number of useful quantities, which we save in the object boobyCor. The quantities can be extracted one at a time or shown all at once.

boobyCor <- cor.test(booby$futureBehavior, booby$nVisitsNestling)
boobyCor

## 
##  Pearson's product-moment correlation
## 
## data:  booby$futureBehavior and booby$nVisitsNestling
## t = 2.9603, df = 22, p-value = 0.007229
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1660840 0.7710999
## sample estimates:
##       cor 
## 0.5337225
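The t statistic in this output tests the null hypothesis of zero correlation. For a Pearson correlation it is t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. Below is a minimal sketch that recomputes t and its two-sided P-value from the printed estimate, assuming n = 24 as implied by the reported df = 22. The printed r is rounded, so expect agreement only to about four decimal places.

```r
# Printed values from the cor.test output above (r is rounded to 7 digits)
rBooby <- 0.5337225
df <- 22               # reported degrees of freedom
n <- df + 2            # sample size implied by df = n - 2

# t statistic for testing zero correlation
tStat <- rBooby * sqrt(n - 2) / sqrt(1 - rBooby^2)

# Two-sided P-value from the t distribution with n - 2 df
pValue <- 2 * pt(-abs(tStat), df = n - 2)

round(c(t = tStat, p = pValue), 4)
```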

If only the estimated correlation and its standard error are of interest, they can be obtained as follows. The standard error calculation uses nrow(booby) as the sample size, which is correct only if neither variable has missing values.

r <- boobyCor$estimate
r

##       cor 
## 0.5337225

# Standard error of r: sqrt((1 - r^2)/(n - 2))
SE <- sqrt( (1 - r^2)/(nrow(booby) - 2) )
unname(SE)

## [1] 0.1802952
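When either variable has missing values, cor.test drops the incomplete pairs, so nrow() overstates the sample size. A minimal sketch of a safer approach counts only complete pairs with complete.cases(). The vectors x and y are hypothetical, made up only to include missing values.

```r
# Hypothetical data with missing values (NA) in both variables
x <- c(2, 5, 9, NA, 12, 3, 8)
y <- c(0.1, 0.4, 0.3, 0.9, NA, 0.2, 0.6)

# Number of pairs with no missing value in either variable
n <- sum(complete.cases(x, y))

# cor.test() drops the incomplete pairs before computing r
res <- cor.test(x, y)
r <- unname(res$estimate)

# Standard error of r using the complete-pair sample size
SE <- sqrt((1 - r^2) / (n - 2))

c(n = n, df = unname(res$parameter), r = r, SE = SE)
```

The df reported by cor.test equals n - 2 for the complete pairs, which is a quick way to confirm the sample size it actually used.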

Confidence interval for a correlation coefficient. The 95% confidence interval for the correlation is included in the output of cor.test. If all you want is the interval, extract it from the object boobyCor created earlier.

boobyCor$conf.int

## [1] 0.1660840 0.7710999
## attr(,"conf.level")
## [1] 0.95
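The interval that cor.test reports for Pearson's correlation is based on Fisher's z-transformation: z = atanh(r) is approximately normal with standard error 1/sqrt(n - 3), and the limits are transformed back to the correlation scale with tanh(). Note that 1/sqrt(n - 3) is the standard error of z, not of r. A sketch that reproduces the interval from the printed r, assuming n = 24 as implied by df = 22; because r is rounded, the last digits may differ slightly.

```r
# Printed values from the cor.test output (r is rounded to 7 digits)
rBooby <- 0.5337225
n <- 24                            # implied by df = n - 2 = 22

z <- atanh(rBooby)                 # Fisher's z = 0.5 * log((1 + r) / (1 - r))
seZ <- 1 / sqrt(n - 3)             # approximate standard error of z
zCrit <- qnorm(0.975)              # about 1.96 for a 95% interval

ciZ <- z + c(-1, 1) * zCrit * seZ  # lower and upper limits on the z scale
ciR <- tanh(ciZ)                   # back-transform to the correlation scale
ciR
```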

Example 16.2. Inbreeding wolves

Test for a linear correlation between the inbreeding coefficient of litters produced by mated wolf pairs and the number of pups surviving their first winter.

Read and inspect the data.

wolf <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e2InbreedingWolves.csv"))
head(wolf)

##   inbreedCoef nPups
## 1        0.00     6
## 2        0.00     6
## 3        0.13     7
## 4        0.13     5
## 5        0.13     4
## 6        0.19     8

Scatter plot.

plot(nPups ~ inbreedCoef, data = wolf)

A fancier scatter plot with more options:

plot(nPups ~ inbreedCoef, data = wolf, pch = 16, col = "firebrick", las = 1, bty = "l", cex = 1.2, xlab = "Inbreeding coefficient", ylab = "Number of pups")

Test of zero correlation. The results of the test are included in the output of cor.test.

cor.test(wolf$nPups, wolf$inbreedCoef)

## 
##  Pearson's product-moment correlation
## 
## data:  wolf$nPups and wolf$inbreedCoef
## t = -3.5893, df = 22, p-value = 0.001633
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8120418 -0.2706791
## sample estimates:
##        cor 
## -0.6077184
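The P-value in this output is the two-sided tail probability of a t distribution with n - 2 = 22 degrees of freedom. A minimal sketch that recovers it from the printed t statistic; because t is rounded to four decimals, expect agreement to about three significant digits.

```r
tWolf <- -3.5893   # printed t statistic
df <- 22           # printed degrees of freedom

# Two-sided P-value: probability of a t at least this extreme in either tail
pWolf <- 2 * pt(-abs(tWolf), df = df)
pWolf
```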

Figure 16.4-1. Stream invertebrates

Effect of the range of the data on the correlation coefficient between population density (log base 10 of the number of individuals per square meter) and body mass (log base 10 of mass in g) of different species of stream invertebrates.

Read and inspect the data.

streamInvert <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16f4_1StreamInvertebrates.csv"))
head(streamInvert)

##   log10Mass log10Density
## 1     -3.22         4.22
## 2     -2.70         4.25
## 3     -2.70         4.13
## 4     -2.60         3.99
## 5     -2.03         5.34
## 6     -2.12         4.09

Scatter plot.

plot(log10Density ~ log10Mass, data = streamInvert)

Commands to make a scatter plot of these data with more options:

plot(log10Density ~ log10Mass, data = streamInvert, pch = 16, col = "firebrick", las = 1, bty = "l", cex = 1.2, xlab = "Log body mass", ylab = "Log population density")

Effect of the range of the data on the correlation coefficient. Here is the correlation coefficient for the full range of the data. The command uses cor.test but we extract just the correlation coefficient for this exercise.

cor.test(streamInvert$log10Density, streamInvert$log10Mass)$estimate

##        cor 
## -0.7651667

Here is the correlation coefficient for the subset of the data with log10Mass between 0 and 2.

streamInvertReduced <- subset(streamInvert, log10Mass > 0 & log10Mass < 2)
cor.test(streamInvertReduced$log10Density, streamInvertReduced$log10Mass)$estimate

##        cor 
## -0.2552496
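Why the correlation weakens when the range is restricted can be illustrated with simulated data: the same linear relationship and the same noise give a weaker correlation when x varies over a narrower interval, because the noise then accounts for a larger share of the variation in y. The data below are simulated, not the stream invertebrate data; the slope, noise level, seed, and sample size are arbitrary assumptions. The sketch also uses cor(), which returns the same Pearson estimate as cor.test()$estimate.

```r
set.seed(1)  # arbitrary seed, for reproducibility
n <- 1000

# Simulated (not real) data: a linear relationship plus noise
x <- runif(n, min = -3.5, max = 3.5)
y <- 4 - 0.5 * x + rnorm(n, sd = 0.5)

# Correlation over the full range of x
rFull <- cor(x, y)

# Keep only observations with 0 < x < 2, mirroring the subset above
keep <- x > 0 & x < 2
rReduced <- cor(x[keep], y[keep])

round(c(full = rFull, reduced = rReduced), 3)
```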

Example 16.5. Indian rope trick

Spearman rank correlation between the impressiveness score of the Indian rope trick and the number of years elapsed between witnessing the trick and writing an account of it.

Read and inspect the data.

trick <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter16/chap16e5IndianRopeTrick.csv"))
head(trick)

##   years impressivenessScore
## 1     2                   1
## 2     5                   1
## 3     5                   1
## 4     4                   2
## 5    17                   2
## 6    17                   2

Scatter plot.

plot(impressivenessScore ~ years, data = trick)

Commands to make a scatter plot of these data with more options:

plot(impressivenessScore ~ years, data = trick, pch = 16, col = "firebrick", las = 1, bty = "l", cex = 1.2, xlab = "Years elapsed", ylab = "Impressiveness score")

Test of zero Spearman rank correlation. In this example, the variable "impressivenessScore" is a numerical score with many tied observations. Because of the ties, R cannot compute an exact P-value; it reports an approximate one and issues a warning.

cor.test(trick$years, trick$impressivenessScore, method = "spearman")

## Warning in cor.test.default(trick$years, trick$impressivenessScore, method
## = "spearman"): Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  trick$years and trick$impressivenessScore
## S = 332.12, p-value = 2.571e-05
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.7843363
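Spearman's rank correlation is Pearson's correlation computed on the ranks of the data, with tied values given the average of the ranks they span. A minimal sketch that checks this on small hypothetical vectors containing ties, and also recomputes the S statistic that cor.test reports, S = (n^3 - n)(1 - rho)/6. Setting exact = FALSE asks for the approximate P-value explicitly, so no warning about ties is issued.

```r
# Hypothetical values with ties, chosen only for illustration
years <- c(3, 8, 8, 12, 20, 20, 25, 34)
score <- c(1, 1, 1, 2, 2, 3, 4, 5)

# Spearman's rho is Pearson's r on the ranks (ties get average ranks)
rhoByHand <- cor(rank(years), rank(score))

# Built-in equivalents
rhoCor <- cor(years, score, method = "spearman")
res <- cor.test(years, score, method = "spearman", exact = FALSE)

# The S statistic reported by cor.test
n <- length(years)
S <- (n^3 - n) * (1 - rhoByHand) / 6

c(byHand = rhoByHand, cor = rhoCor, cor.test = unname(res$estimate), S = S)
```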

Michael Whitlock and Dolph Schluter
