Regression: R code for Chapter 17 examples

Note: This document was converted to R-Markdown from this page by M. Drew LaMar. You can download the R-Markdown here.

Download the R code on this page as a single file here

New methods

Hover over a function argument for a short description of its meaning. The variable names are plucked from the examples further below.

Linear regression:

lionRegression <- lm(ageInYears ~ proportionBlack, data = lion) summary(lionRegression)

Predict mean y on a regression line corresponding to a given x-value:

predict(lionRegression, data.frame(proportionBlack = 0.50), se.fit = TRUE)

ANOVA test of zero regression slope:

anova(prairieRegression)

Other new methods:

95% confidence interval for the slope.
95% confidence bands.
95% prediction intervals.
Scatter plot smoothing.
Nonlinear regression.
Logistic regression.

Example 17.1. Lion noses

Estimate a linear regression and calculate predicted values and their uncertainty for the relationship between lion age and proportion of black on the lion nose.

Read and inspect the data.

lion <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17e1LionNoses.csv"))
head(lion)

##   proportionBlack ageInYears
## 1            0.21        1.1
## 2            0.14        1.5
## 3            0.11        1.9
## 4            0.13        2.2
## 5            0.12        2.6
## 6            0.13        3.2

Basic scatter plot.

plot(ageInYears ~ proportionBlack, data = lion)

Commands for a fancier scatter plot using more options (Figure 17.1-1):

plot(ageInYears ~ proportionBlack, data = lion, pch = 16, col="firebrick", las = 1, cex = 1.5, bty = "l", xlim = c(0,0.8), ylim = c(0,14), ylab = "Age (years)", xlab = "Proportion black")

Fit the linear regression to the data using least squares. Use lm, which was also used for ANOVA. Save the results into a model object, and then use other commands to extract the quantities we want.

lionRegression <- lm(ageInYears ~ proportionBlack, data = lion)

Show the estimated linear regression coefficients: slope and intercept. The output of summary includes other valuable quantities, including standard errors and a \(t\)-test of zero regression slope. If you want only the regression coeffients without the rest of the data, use coef(lionRegression).

summary(lionRegression)

## 
## Call:
## lm(formula = ageInYears ~ proportionBlack, data = lion)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5449 -1.1117 -0.5285  0.9635  4.3421 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       0.8790     0.5688   1.545    0.133    
## proportionBlack  10.6471     1.5095   7.053 7.68e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.669 on 30 degrees of freedom
## Multiple R-squared:  0.6238, Adjusted R-squared:  0.6113 
## F-statistic: 49.75 on 1 and 30 DF,  p-value: 7.677e-08

Add the least squares regression line to the plot. The quick method is to use abline. If needed, redraw the scatter plot first.

plot(ageInYears ~ proportionBlack, data = lion)
abline(lionRegression)

Another way to draw the regression line is to generate predicted values and then connect them with a line (Figure 17.1-4). This has the advantage of more control - for example, we can make sure that the line doesn’t fall outside the range of the data, avoiding extrapolation. Commands to accomplish this:

plot(ageInYears ~ proportionBlack, data = lion)
xpts <- range(lion$proportionBlack)
ypts <- predict(lionRegression, data.frame(proportionBlack = xpts))
lines(ypts ~ xpts, lwd = 1.5)

Use the regression line for prediction. For example, here is how to predict mean lion age corresponding to a value of 0.50 of proportion black in the nose. Standard error of the predicted mean is included.

predict(lionRegression, data.frame(proportionBlack = 0.50), se.fit = TRUE)

## $fit
##        1 
## 6.202566 
## 
## $se.fit
## [1] 0.3988321
## 
## $df
## [1] 30
## 
## $residual.scale
## [1] 1.668764

Calculate the residual and regression mean squares. All the mean squares are included in the anova table, which is generate using the anova command.

anova(lionRegression)

## Analysis of Variance Table
## 
## Response: ageInYears
##                 Df  Sum Sq Mean Sq F value    Pr(>F)    
## proportionBlack  1 138.544 138.544  49.751 7.677e-08 ***
## Residuals       30  83.543   2.785                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Standard error of slope. The standard errors are provided in the output of the summary function.

summary(lionRegression)

## 
## Call:
## lm(formula = ageInYears ~ proportionBlack, data = lion)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5449 -1.1117 -0.5285  0.9635  4.3421 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       0.8790     0.5688   1.545    0.133    
## proportionBlack  10.6471     1.5095   7.053 7.68e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.669 on 30 degrees of freedom
## Multiple R-squared:  0.6238, Adjusted R-squared:  0.6113 
## F-statistic: 49.75 on 1 and 30 DF,  p-value: 7.677e-08

95% confidence interval for slope.

confint(lionRegression)

##                      2.5 %    97.5 %
## (Intercept)     -0.2826733  2.040686
## proportionBlack  7.5643082 13.729931

95% confidence bands (Figure 17.2-1). If needed, redraw the scatter plot, including the regression line, before executing the commands below. The seq function below generates a sequence of 100 new x-values evenly spaced between the smallest and largest values of proportionBlack. The predict function calculates the corresponding predicted mean y-values on the regression line, as well as the lower and upper limits of the confidence intervals for the predicted mean y-values at each x-value (these are put into the data frame named ypt). Finally, the lines functions connect the consecutive lower limits of the confidence intervals with a line, and do the same for the consecutive upper limits, forming the bands.

plot(ageInYears ~ proportionBlack, data = lion)
abline(lionRegression)

xpt <- seq(min(lion$proportionBlack), max(lion$proportionBlack), length.out = 100)
ypt <- data.frame( predict(lionRegression, newdata = data.frame(proportionBlack = xpt), interval = "confidence") )
lines(ypt$lwr ~ xpt, lwd = 1.5, lty = 2)
lines(ypt$upr ~ xpt, lwd = 1.5, lty = 2)

95% prediction intervals (Figure 17.2-1). If needed, redraw the scatter plot, including the regression line, before executing the commands below. The commands below are the same as those used to generate the confidence bands (see above), but here the lower and upper limits are confidence intervals for the prediction of a single individual y-value, rather than the mean value, corresponding to each x-value.

plot(ageInYears ~ proportionBlack, data = lion)
abline(lionRegression)

xpt <- seq(min(lion$proportionBlack), max(lion$proportionBlack), length.out = 100)
ypt <- data.frame(predict(lionRegression, newdata = data.frame(proportionBlack = xpt), interval = "prediction", level = 0.95))
lines(ypt$lwr ~ xpt, lwd = 1.5, lty = 3)
lines(ypt$upr ~ xpt, lwd = 1.5, lty = 3)

Example 17.3. Prairie home campion

Test the null hypothesis of zero regression slope. The data are from an experiment investigating the effect of plant species diversity on the stability of plant biomass production.

Read and inspect the data.

prairie <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17e3PlantDiversityAndStability.csv"))
head(prairie)

##   nSpecies biomassStability
## 1        1             7.47
## 2        1             6.74
## 3        1             6.61
## 4        1             6.40
## 5        1             5.67
## 6        1             5.26

Log-transform stability and include the new variable in the data frame. Inspect the data frame once again to make sure the function worked as intended.

prairie$logStability <- log(prairie$biomassStability)
head(prairie)

##   nSpecies biomassStability logStability
## 1        1             7.47     2.010895
## 2        1             6.74     1.908060
## 3        1             6.61     1.888584
## 4        1             6.40     1.856298
## 5        1             5.67     1.735189
## 6        1             5.26     1.660131

Scatter plot with regression line.

plot(logStability ~ nSpecies, data = prairie) 
prairieRegression <- lm(logStability ~ nSpecies, data = prairie)
abline(prairieRegression)

Commands for a fancier scatter plot with more options (Figure 17.3-1):

plot(logStability ~ nSpecies, data = prairie, bty = "l", col = "firebrick", pch = 1, las = 1, cex = 1.5, xlim = c(0,16), xaxp = c(0,16,8), xlab = "Species number treatment", ylab = "Log-transformed ecosystem stability")
prairieRegression <- lm(logStability ~ nSpecies, data = prairie)
xpts <- c(1,16)
ypts <- predict(prairieRegression, newdata = data.frame(nSpecies = xpts))
lines(ypts ~ xpts, lwd=1.5)

\(t\)-test and 95% confidence interval of regression slope.

prairieRegression <- lm(logStability ~ nSpecies, data = prairie)
summary(prairieRegression)

## 
## Call:
## lm(formula = logStability ~ nSpecies, data = prairie)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.97148 -0.25984 -0.00234  0.23100  1.03237 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.198294   0.041298  29.016  < 2e-16 ***
## nSpecies    0.032926   0.004884   6.742 2.73e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3484 on 159 degrees of freedom
## Multiple R-squared:  0.2223, Adjusted R-squared:  0.2174 
## F-statistic: 45.45 on 1 and 159 DF,  p-value: 2.733e-10

confint(prairieRegression)

##                  2.5 %     97.5 %
## (Intercept) 1.11673087 1.27985782
## nSpecies    0.02328063 0.04257117

ANOVA test of zero regression slope (Table 17.3-2).

anova(prairieRegression)

## Analysis of Variance Table
## 
## Response: logStability
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## nSpecies    1  5.5169  5.5169  45.455 2.733e-10 ***
## Residuals 159 19.2980  0.1214                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\(R^{2}\). The \(R^{2}\) associated with the regression is part of the output of the summary command.

summary(prairieRegression)$r.squared

## [1] 0.2223217

Figure 17.5-2. Tail feather outlier

Examine the effect on an outlier on linear regression. The data are percentage of white in the tail feathers of the dark-eyed junco at sites at different latitudes.

Read and inspect data.

junco <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f5_2JuncoOutlier.csv"))
head(junco)

##   latitude percentWhiteTail
## 1     33.7             46.2
## 2     32.8             45.2
## 3     32.9             44.0
## 4     33.7             42.6
## 5     34.3             42.5
## 6     35.5             40.9

Scatter plot of all the data.

plot(percentWhiteTail ~ latitude, data = junco)

Commands for a more elaborate scatter plot using more options (Figure 17.5-2):

plot(percentWhiteTail ~ latitude, data = junco, pch = 16, col="firebrick", las = 1, cex = 1.5, bty = "l", xlab = "Latitude (degrees N)", ylab = "Percent white in tail")

Add regression lines to the scatter plot, calculated both with and without the outlier. You may need to redraw the scatter plot using the above commands before executing the following functions.

plot(percentWhiteTail ~ latitude, data = junco)

juncoRegression <- lm(percentWhiteTail ~ latitude, data = junco)
abline(juncoRegression, col = "red", lwd = 2)
juncoRegressionNoOut <- lm(percentWhiteTail ~ latitude, data = junco, subset = percentWhiteTail > 38)
abline(juncoRegressionNoOut, lwd = 2)

Figure 17.5-3. Desert pool fish

Detect a nonlinear relationship graphically, using data on surface area of desert pools and the number of fish species present.

Read and inspect data.

desertFish <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f5_3DesertPoolFish.csv"))
head(desertFish)

##   poolArea nFishSpecies
## 1        9            1
## 2       20            1
## 3       30            1
## 4       50            1
## 5      200            1
## 6       60            2

Scatter plot.

plot(nFishSpecies ~ poolArea, data = desertFish)

Commands for a fancier scatter plot:

plot(nFishSpecies ~ poolArea, data = desertFish, pch = 16, col = "firebrick", las = 1, cex = 1.5, bty = "l", xlab = expression(Area~(m^2)), ylab = "Number of species", ylim = c(0,6))

Add regression line to previous scatter plot.

plot(nFishSpecies ~ poolArea, data = desertFish)

desertRegression <- lm(nFishSpecies ~ poolArea, data = desertFish)
abline(desertRegression)

Add curve based on smoothing to previous scatter plot.

plot(nFishSpecies ~ poolArea, data = desertFish)

desertSmooth <-loess(nFishSpecies ~ poolArea, data = desertFish)
xpts <- seq(min(desertFish$poolArea), max(desertFish$poolArea), length.out = 100)
ypts <- predict(desertSmooth, new = data.frame(poolArea = xpts))
lines(xpts, ypts)

Here is another method to produce a smooth curve in a scatter plot.

plot(nFishSpecies ~ poolArea, data = desertFish)

desertSmooth <-smooth.spline(desertFish$poolArea, desertFish$nFishSpecies, df = 4)
xpts <- seq(min(desertFish$poolArea), max(desertFish$poolArea), length.out = 100)
ypts <- predict(desertSmooth, xpts)$y
lines(xpts, ypts)

Figure 17.5-4 (left). Blue tit cap color

Create a residual plot to check assumptions. The data are cap color of offspring and parents in a sample of the blue tit.

Read and inspect the data.

capColor <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f5_4BlueTitCapColor.csv"))
head(capColor)

##   midparentColorScore offspringColorScore
## 1               -0.62               -0.33
## 2               -0.27               -0.17
## 3               -0.25               -0.23
## 4               -0.12               -0.49
## 5                0.12               -0.59
## 6                0.18               -0.55

Basic scatter plot with regression line.

plot(offspringColorScore ~ midparentColorScore, data = capColor)
capColorRegression <- lm(offspringColorScore ~ midparentColorScore, data = capColor)
abline(capColorRegression)

Residual plot. Residuals are calculated from the model object created from the previous lm command, using the residuals command.

plot(residuals(capColorRegression) ~ midparentColorScore, data = capColor)
abline(c(0,0))

Commands for a fancier residual plot using more options:

plot(residuals(capColorRegression) ~ midparentColorScore, data = capColor, pch = 16, col = "firebrick", las = 1, cex = 1.5, bty = "l", xlab = "Midparent color score", ylab = "Residuals")
abline(c(0,0))

Figure 17.5-4 (right). Cockroach neurons

Make a residual plot to check assumptions. The data are temperature and the firing rate of neurons in cockroach.

Read and inspect the data.

roach <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f5_4CockroachNeurons.csv"))
head(roach)

##   temperature   rate
## 1        11.9 155.03
## 2        14.4 142.33
## 3        15.3 140.11
## 4        16.4 142.65
## 5        17.9 135.24
## 6        20.4 130.16

Scatter plot with regression line.

plot(rate ~ temperature, data = roach)
roachRegression <- lm(rate ~ temperature, data = roach)
abline(roachRegression)

Residual plot.

plot(residuals(roachRegression) ~ temperature, data = roach)
abline(c(0,0))

Commands for a more elaborate residual plot:

plot(residuals(roachRegression) ~ temperature, data = roach, pch = 16, col = "firebrick", las = 1, cex = 1.5, bty = "l", xlab = "Temperature", ylab = "Residuals")
abline(c(0,0))

Figure 17.6-3. Iris pollination

Use a square root transformation to improve the fit to assumptions of linear regression. The data are number of pollen grains received and floral tube length of an iris species.

Read and inspect data.

iris <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f6_3IrisPollination.csv"))
head(iris)

##   tubeLengthMm grainsDeposited
## 1           38             299
## 2           54             205
## 3           53             167
## 4           55             122
## 5           64             129
## 6           64              64

Scatter plot with regression line.

plot(grainsDeposited ~ tubeLengthMm, data = iris)
irisRegression <- lm(grainsDeposited ~ tubeLengthMm, data = iris)
abline(irisRegression)

Residual plot.

plot(residuals(irisRegression) ~ tubeLengthMm, data = iris)
abline(c(0,0))

Commands for a residual plot with more options:

plot(residuals(irisRegression) ~ tubeLengthMm, data = iris, pch = 16,  col = "firebrick", las = 1, cex = 1.5, bty = "l",xlab = "Floral tube length (mm)", ylab = "Residuals")
abline(c(0,0))

Square root transformation.

iris$sqRootGrains <- sqrt(iris$grainsDeposited + 1/2)

Scatter plot using transformed data, with new regression line added.

plot(sqRootGrains ~ tubeLengthMm, data = iris)
irisRegressionSqrt <- lm(sqRootGrains ~ tubeLengthMm, data = iris)
abline(irisRegressionSqrt)

Residual plot based on the transformed data.

plot(residuals(irisRegressionSqrt) ~ tubeLengthMm, data = iris)
abline(c(0,0))

Instructions for a residual plot with more options:

plot(residuals(irisRegressionSqrt) ~ tubeLengthMm, data = iris,  pch= 16, col = "firebrick", las = 1, cex=1.5, bty = "l", xlab = "Floral tube length (mm)", ylab = "Residuals")
abline(c(0,0))

Figure 17.8-1. Iron and phytoplankton growth

Fit a nonlinear regression curve having an asymptote (Michaelis-Menten curve). The data are the relationship between population growth rate of a phytoplankton in culture and the concentration of iron in the medium.

Read and examine data.

phytoplankton <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f8_1IronAndPhytoplanktonGrowth.csv"))
head(phytoplankton)

##   ironConcentration phytoGrowthRate
## 1        0.01096478            0.00
## 2        0.02398833            0.50
## 3        0.02818383            0.35
## 4        0.07244360            1.04
## 5        0.07585776            0.97
## 6        0.11481536            1.19

Scatter plot.

plot(phytoGrowthRate ~ ironConcentration, data = phytoplankton)

Instructions for a scatter plot with more options:

plot(phytoGrowthRate ~ ironConcentration, data = phytoplankton, pch = 16, col = "firebrick", las = 1, cex = 1.5, bty = "l", ylab = "Growth rate (no./day)", xlab = expression(paste("Iron concentration (", mu, "mol)")) )

Fit a Michaelis-Menten curve to the phytoplankton data using the nls (nonlinear least squares). To fit the curve, provide a formula that also includes symbols (letters) for the parameters to be estimated. In the following function, we use “a” and “b” to indicate the parameters of the Michaelis-Menten curve we want to estimate. The function includes an argument where we must provide an initial guess of parameter values. The value of the initial guess is not so important – here we choose a=1 and b=1 as initial guesses. The first function below carried out the model fit and save the results in an R object named phytoCurve.

phytoCurve <- nls(phytoGrowthRate ~ a*ironConcentration / ( b+ironConcentration), data = phytoplankton, list(a = 1, b = 1))

Obtain the parameter estimates using the summary command, including standard errors and \(t\)-tests of null hypotheses that parameter values are zero.

summary(phytoCurve)

## 
## Formula: phytoGrowthRate ~ a * ironConcentration/(b + ironConcentration)
## 
## Parameters:
##   Estimate Std. Error t value Pr(>|t|)    
## a  1.94164    0.06802  28.545 1.14e-11 ***
## b  0.07833    0.01120   6.994 2.29e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1133 on 11 degrees of freedom
## 
## Number of iterations to convergence: 6 
## Achieved convergence tolerance: 6.672e-06

Add the nonlinear regression curve to scatter plot. If necessary, redraw the scatter plot before issuing the following commands.

plot(phytoGrowthRate ~ ironConcentration, data = phytoplankton)

xpts <- seq(min(phytoplankton$ironConcentration), max(phytoplankton$ironConcentration), length.out = 100)
ypts <- predict(phytoCurve, new = data.frame(ironConcentration = xpts))
lines(xpts, ypts)

Many of the functions that can be used to extract results from a saved lm object work in the same way when applied to an nls object, such as predict, residuals, and coef.

Figure 17.8-2. Pond plants and productivity

Fit a quadratic curve to the relationship between the number of plant species present in ponds and pond productivity.

Read and examine data.

pondProductivity <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f8_2PondPlantsAndProductivity.csv"))
head(pondProductivity)

##   productivity species
## 1            4       3
## 2           11       2
## 3           10       1
## 4           10       0
## 5           12       4
## 6           15       4

Scatter plot.

plot(species ~ productivity, data = pondProductivity)

Commands for a fancier scatter plot:

plot(species ~ productivity, data = pondProductivity, pch = 16, col = "firebrick", las = 1, cex = 1.5, bty = "l", ylab = "Number of species", xlab = "Productivity (g/15 days)" )

Fit a quadratic curve to the data. Here, the single variable productivity in the data frame is included in the formula both as itself and as the squared term, productivity\(^2\). To make the squared term work, we need to wrap the term with I(). The results of the model fit are saved in an R object productivityCurve.

productivityCurve <- lm(species ~ productivity + I(productivity^2), data = pondProductivity)

Show estimates of the parameters of the quadratic curve (regression coefficients) are obtained as follows, along with standard errors and \(t\)-tests.

summary(productivityCurve)

## 
## Call:
## lm(formula = species ~ productivity + I(productivity^2), data = pondProductivity)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6330 -0.9952 -0.0158  1.4458  3.5215 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.7535477  1.1118289   1.577 0.126401    
## productivity       0.2083549  0.0558421   3.731 0.000897 ***
## I(productivity^2) -0.0020407  0.0005677  -3.595 0.001280 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.999 on 27 degrees of freedom
## Multiple R-squared:  0.3402, Adjusted R-squared:  0.2913 
## F-statistic: 6.961 on 2 and 27 DF,  p-value: 0.003648

Add quadratic regression curve to scatter plot. If necessary, redraw the previous scatter plot before issuing the following commands.

plot(species ~ productivity, data = pondProductivity)

xpts <- seq(min(pondProductivity$productivity), max(pondProductivity$productivity), length.out = 100)
ypts <- predict(productivityCurve, new = data.frame(productivity = xpts))
lines(xpts, ypts)

Example 17.8. Incredible shrinking seal

Fit a formula-free curve (cubic spline) to the relationship between body length and age for female fur seals.

Read and inspect data

shrink <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17e8ShrinkingSeals.csv"))
head(shrink)

##   ageInDays length
## 1      5337    131
## 2      2081    123
## 3      2085    122
## 4      4299    136
## 5      2861    122
## 6      5052    131

Scatter plot.

plot(length ~ ageInDays, data = shrink, pch = ".")

Commands for a fancier scatter plot with more options:

plot(jitter(length, factor = 2) ~ ageInDays, data = shrink, pch = ".", col = "firebrick", las = 1, bty = "l", ylab = "Body length (cm)", xlab = "Female age (days)")

Fit a cubic spline. The argument df stands for effective degrees of freedom, which allows you to control how complicated the curve should be. The simplest possible curve is a straight line, which has df=2 (one for slope and another for intercept). More complex curves require more df. Here we fit a very complicated curve.

shrinkCurve <- smooth.spline(shrink$ageInDays, shrink$length, df = 40)

Add curve to scatter plot. If needed, replot the scatter plot before issuing the commands below.

plot(length ~ ageInDays, data = shrink, pch = ".")

xpts <- seq(min(shrink$ageInDays), max(shrink$ageInDays), length.out = 1000)
ypts <- predict(shrinkCurve, xpts)$y
lines(xpts, ypts)

Figure 17.9-1. Guppy cold death

Fit a logistic regression to the relationship between guppy mortality and duration of exposure to a temperature of 5 degrees C.

Read and inspect the data.

guppy <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f9_1GuppyColdDeath.csv"))
head(guppy)

##   fish exposureDurationMin mortality
## 1    1                   3         1
## 2    2                   3         1
## 3    3                   3         1
## 4    4                   3         1
## 5    5                   3         1
## 6    6                   3         1

Draw a frequency table of mortality at different exposure times.

table(guppy$mortality, guppy$exposureDurationMin, dnn = c("Mortality","Exposure (min)"))

##          Exposure (min)
## Mortality  3  8 12 18
##         0 29 16 11  2
##         1 11 24 29 38

Scatter plot of the data.

plot(mortality ~ jitter(exposureDurationMin), data = guppy)

Commands for a fancier scatter plot, with more options:

plot(jitter(mortality, factor = 0.1) ~ jitter(exposureDurationMin, factor = 1), data = guppy, col = "firebrick", las = 1, bty = "l", ylab = "Mortality", xlab = "Duration of exposure (min)")

Fit a logistic regression.

guppyGlm <- glm(mortality ~ exposureDurationMin, data = guppy, family = binomial(link = logit))

Add logistic regression curve to scatter plot.

plot(mortality ~ jitter(exposureDurationMin), data = guppy)

xpts <- seq(min(guppy$exposureDurationMin), max(guppy$exposureDurationMin), length.out = 100)
ypts <- predict(guppyGlm, newdata = data.frame(exposureDurationMin = xpts), type = "response")
lines(xpts, ypts)

Table of regression coefficients, with parameter estimates (Table 17.9-2).

summary(guppyGlm)

## 
## Call:
## glm(formula = mortality ~ exposureDurationMin, family = binomial(link = logit), 
##     data = guppy)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3332  -0.8115   0.3688   0.7206   1.5943  
## 
## Coefficients:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -1.66081    0.40651  -4.086 4.40e-05 ***
## exposureDurationMin  0.23971    0.04245   5.646 1.64e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 209.55  on 159  degrees of freedom
## Residual deviance: 164.69  on 158  degrees of freedom
## AIC: 168.69
## 
## Number of Fisher Scoring iterations: 4

95% confidence intervals for parameter estimates. It is necessary to load the MASS package first.

if (!require("MASS")) { install.packages("MASS", dependencies=T, repos="http://cran.rstudio.com/") }

## Loading required package: MASS

library(MASS)
confint(guppyGlm)

## Waiting for profiling to be done...

##                          2.5 %     97.5 %
## (Intercept)         -2.4964323 -0.8931288
## exposureDurationMin  0.1614356  0.3288161

Predict probability of mortality (mean mortality) for a given x-value, 10 min duration, including standard error of prediction.

predict(guppyGlm, newdata = data.frame(exposureDurationMin = 10), type = "response", se.fit = TRUE)

## $fit
##         1 
## 0.6761874 
## 
## $se.fit
##          1 
## 0.04421466 
## 
## $residual.scale
## [1] 1

Estimate the LD50, the dose at which half the individuals are predicted to die from exposure.

library(MASS)
dose.p(guppyGlm, p = 0.50)

##              Dose       SE
## p = 0.5: 6.928366 0.841132

Analysis of deviance table, with a null hypothesis of zero slope (Table 17.9-3).

anova(guppyGlm, test = "Chi")

## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: mortality
## 
## Terms added sequentially (first to last)
## 
## 
##                     Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                                  159     209.55              
## exposureDurationMin  1    44.86       158     164.69 2.116e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Regression: R code for Chapter 17 examples

Michael Whitlock and Dolph Schluter

New methods

Example 17.1. Lion noses

Example 17.3. Prairie home campion

Figure 17.5-2. Tail feather outlier

Figure 17.5-3. Desert pool fish

Figure 17.5-4 (left). Blue tit cap color

Figure 17.5-4 (right). Cockroach neurons

Figure 17.6-3. Iris pollination

Figure 17.8-1. Iron and phytoplankton growth

Figure 17.8-2. Pond plants and productivity

Example 17.8. Incredible shrinking seal

Figure 17.9-1. Guppy cold death