Note: This document was converted to R-Markdown from this page by M. Drew LaMar. You can download the R-Markdown here.
Download the R code on this page as a single file here
Hover over a function argument for a short description of its meaning. The variable names are plucked from the examples further below.
Linear regression:
lionRegression <- lm(ageInYears ~ proportionBlack, data = lion) summary(lionRegression)
Predict mean y on a regression line corresponding to a given x-value:
predict(lionRegression, data.frame(proportionBlack = 0.50), se.fit = TRUE)
ANOVA test of zero regression slope:
anova(prairieRegression)
Other new methods:
Estimate a linear regression and calculate predicted values and their uncertainty for the relationship between lion age and proportion of black on the lion nose.
Read and inspect the data.
lion <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17e1LionNoses.csv"))
head(lion)
## proportionBlack ageInYears
## 1 0.21 1.1
## 2 0.14 1.5
## 3 0.11 1.9
## 4 0.13 2.2
## 5 0.12 2.6
## 6 0.13 3.2
Basic scatter plot.
plot(ageInYears ~ proportionBlack, data = lion)
Commands for a fancier scatter plot using more options (Figure 17.1-1):
plot(ageInYears ~ proportionBlack, data = lion, pch = 16, col="firebrick", las = 1, cex = 1.5, bty = "l", xlim = c(0,0.8), ylim = c(0,14), ylab = "Age (years)", xlab = "Proportion black")
Fit the linear regression to the data using least squares. Use lm
, which was also used for ANOVA. Save the results into a model object, and then use other commands to extract the quantities we want.
lionRegression <- lm(ageInYears ~ proportionBlack, data = lion)
Show the estimated linear regression coefficients: slope and intercept. The output of summary
includes other valuable quantities, including standard errors and a \(t\)-test of zero regression slope. If you want only the regression coeffients without the rest of the data, use coef(lionRegression)
.
summary(lionRegression)
##
## Call:
## lm(formula = ageInYears ~ proportionBlack, data = lion)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5449 -1.1117 -0.5285 0.9635 4.3421
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.8790 0.5688 1.545 0.133
## proportionBlack 10.6471 1.5095 7.053 7.68e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.669 on 30 degrees of freedom
## Multiple R-squared: 0.6238, Adjusted R-squared: 0.6113
## F-statistic: 49.75 on 1 and 30 DF, p-value: 7.677e-08
Add the least squares regression line to the plot. The quick method is to use abline
. If needed, redraw the scatter plot first.
plot(ageInYears ~ proportionBlack, data = lion)
abline(lionRegression)
Another way to draw the regression line is to generate predicted values and then connect them with a line (Figure 17.1-4). This has the advantage of more control - for example, we can make sure that the line doesn’t fall outside the range of the data, avoiding extrapolation. Commands to accomplish this:
plot(ageInYears ~ proportionBlack, data = lion)
xpts <- range(lion$proportionBlack)
ypts <- predict(lionRegression, data.frame(proportionBlack = xpts))
lines(ypts ~ xpts, lwd = 1.5)
Use the regression line for prediction. For example, here is how to predict mean lion age corresponding to a value of 0.50 of proportion black in the nose. Standard error of the predicted mean is included.
predict(lionRegression, data.frame(proportionBlack = 0.50), se.fit = TRUE)
## $fit
## 1
## 6.202566
##
## $se.fit
## [1] 0.3988321
##
## $df
## [1] 30
##
## $residual.scale
## [1] 1.668764
Calculate the residual and regression mean squares. All the mean squares are included in the anova table, which is generate using the anova
command.
anova(lionRegression)
## Analysis of Variance Table
##
## Response: ageInYears
## Df Sum Sq Mean Sq F value Pr(>F)
## proportionBlack 1 138.544 138.544 49.751 7.677e-08 ***
## Residuals 30 83.543 2.785
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Standard error of slope. The standard errors are provided in the output of the summary
function.
summary(lionRegression)
##
## Call:
## lm(formula = ageInYears ~ proportionBlack, data = lion)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5449 -1.1117 -0.5285 0.9635 4.3421
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.8790 0.5688 1.545 0.133
## proportionBlack 10.6471 1.5095 7.053 7.68e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.669 on 30 degrees of freedom
## Multiple R-squared: 0.6238, Adjusted R-squared: 0.6113
## F-statistic: 49.75 on 1 and 30 DF, p-value: 7.677e-08
95% confidence interval for slope.
confint(lionRegression)
## 2.5 % 97.5 %
## (Intercept) -0.2826733 2.040686
## proportionBlack 7.5643082 13.729931
95% confidence bands (Figure 17.2-1). If needed, redraw the scatter plot, including the regression line, before executing the commands below. The seq
function below generates a sequence of 100 new x-values evenly spaced between the smallest and largest values of proportionBlack
. The predict
function calculates the corresponding predicted mean y-values on the regression line, as well as the lower and upper limits of the confidence intervals for the predicted mean y-values at each x-value (these are put into the data frame named ypt
). Finally, the lines
functions connect the consecutive lower limits of the confidence intervals with a line, and do the same for the consecutive upper limits, forming the bands.
plot(ageInYears ~ proportionBlack, data = lion)
abline(lionRegression)
xpt <- seq(min(lion$proportionBlack), max(lion$proportionBlack), length.out = 100)
ypt <- data.frame( predict(lionRegression, newdata = data.frame(proportionBlack = xpt), interval = "confidence") )
lines(ypt$lwr ~ xpt, lwd = 1.5, lty = 2)
lines(ypt$upr ~ xpt, lwd = 1.5, lty = 2)
95% prediction intervals (Figure 17.2-1). If needed, redraw the scatter plot, including the regression line, before executing the commands below. The commands below are the same as those used to generate the confidence bands (see above), but here the lower and upper limits are confidence intervals for the prediction of a single individual y-value, rather than the mean value, corresponding to each x-value.
plot(ageInYears ~ proportionBlack, data = lion)
abline(lionRegression)
xpt <- seq(min(lion$proportionBlack), max(lion$proportionBlack), length.out = 100)
ypt <- data.frame(predict(lionRegression, newdata = data.frame(proportionBlack = xpt), interval = "prediction", level = 0.95))
lines(ypt$lwr ~ xpt, lwd = 1.5, lty = 3)
lines(ypt$upr ~ xpt, lwd = 1.5, lty = 3)
Test the null hypothesis of zero regression slope. The data are from an experiment investigating the effect of plant species diversity on the stability of plant biomass production.
Read and inspect the data.
prairie <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17e3PlantDiversityAndStability.csv"))
head(prairie)
## nSpecies biomassStability
## 1 1 7.47
## 2 1 6.74
## 3 1 6.61
## 4 1 6.40
## 5 1 5.67
## 6 1 5.26
Log-transform stability and include the new variable in the data frame. Inspect the data frame once again to make sure the function worked as intended.
prairie$logStability <- log(prairie$biomassStability)
head(prairie)
## nSpecies biomassStability logStability
## 1 1 7.47 2.010895
## 2 1 6.74 1.908060
## 3 1 6.61 1.888584
## 4 1 6.40 1.856298
## 5 1 5.67 1.735189
## 6 1 5.26 1.660131
Scatter plot with regression line.
plot(logStability ~ nSpecies, data = prairie)
prairieRegression <- lm(logStability ~ nSpecies, data = prairie)
abline(prairieRegression)
Commands for a fancier scatter plot with more options (Figure 17.3-1):
plot(logStability ~ nSpecies, data = prairie, bty = "l", col = "firebrick", pch = 1, las = 1, cex = 1.5, xlim = c(0,16), xaxp = c(0,16,8), xlab = "Species number treatment", ylab = "Log-transformed ecosystem stability")
prairieRegression <- lm(logStability ~ nSpecies, data = prairie)
xpts <- c(1,16)
ypts <- predict(prairieRegression, newdata = data.frame(nSpecies = xpts))
lines(ypts ~ xpts, lwd=1.5)
\(t\)-test and 95% confidence interval of regression slope.
prairieRegression <- lm(logStability ~ nSpecies, data = prairie)
summary(prairieRegression)
##
## Call:
## lm(formula = logStability ~ nSpecies, data = prairie)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.97148 -0.25984 -0.00234 0.23100 1.03237
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.198294 0.041298 29.016 < 2e-16 ***
## nSpecies 0.032926 0.004884 6.742 2.73e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3484 on 159 degrees of freedom
## Multiple R-squared: 0.2223, Adjusted R-squared: 0.2174
## F-statistic: 45.45 on 1 and 159 DF, p-value: 2.733e-10
confint(prairieRegression)
## 2.5 % 97.5 %
## (Intercept) 1.11673087 1.27985782
## nSpecies 0.02328063 0.04257117
ANOVA test of zero regression slope (Table 17.3-2).
anova(prairieRegression)
## Analysis of Variance Table
##
## Response: logStability
## Df Sum Sq Mean Sq F value Pr(>F)
## nSpecies 1 5.5169 5.5169 45.455 2.733e-10 ***
## Residuals 159 19.2980 0.1214
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\(R^{2}\). The \(R^{2}\) associated with the regression is part of the output of the summary
command.
summary(prairieRegression)$r.squared
## [1] 0.2223217
Examine the effect on an outlier on linear regression. The data are percentage of white in the tail feathers of the dark-eyed junco at sites at different latitudes.
Read and inspect data.
junco <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f5_2JuncoOutlier.csv"))
head(junco)
## latitude percentWhiteTail
## 1 33.7 46.2
## 2 32.8 45.2
## 3 32.9 44.0
## 4 33.7 42.6
## 5 34.3 42.5
## 6 35.5 40.9
Scatter plot of all the data.
plot(percentWhiteTail ~ latitude, data = junco)
Commands for a more elaborate scatter plot using more options (Figure 17.5-2):
plot(percentWhiteTail ~ latitude, data = junco, pch = 16, col="firebrick", las = 1, cex = 1.5, bty = "l", xlab = "Latitude (degrees N)", ylab = "Percent white in tail")
Add regression lines to the scatter plot, calculated both with and without the outlier. You may need to redraw the scatter plot using the above commands before executing the following functions.
plot(percentWhiteTail ~ latitude, data = junco)
juncoRegression <- lm(percentWhiteTail ~ latitude, data = junco)
abline(juncoRegression, col = "red", lwd = 2)
juncoRegressionNoOut <- lm(percentWhiteTail ~ latitude, data = junco, subset = percentWhiteTail > 38)
abline(juncoRegressionNoOut, lwd = 2)
Detect a nonlinear relationship graphically, using data on surface area of desert pools and the number of fish species present.
Read and inspect data.
desertFish <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f5_3DesertPoolFish.csv"))
head(desertFish)
## poolArea nFishSpecies
## 1 9 1
## 2 20 1
## 3 30 1
## 4 50 1
## 5 200 1
## 6 60 2
Scatter plot.
plot(nFishSpecies ~ poolArea, data = desertFish)
Commands for a fancier scatter plot:
plot(nFishSpecies ~ poolArea, data = desertFish, pch = 16, col = "firebrick", las = 1, cex = 1.5, bty = "l", xlab = expression(Area~(m^2)), ylab = "Number of species", ylim = c(0,6))
Add regression line to previous scatter plot.
plot(nFishSpecies ~ poolArea, data = desertFish)
desertRegression <- lm(nFishSpecies ~ poolArea, data = desertFish)
abline(desertRegression)
Add curve based on smoothing to previous scatter plot.
plot(nFishSpecies ~ poolArea, data = desertFish)
desertSmooth <-loess(nFishSpecies ~ poolArea, data = desertFish)
xpts <- seq(min(desertFish$poolArea), max(desertFish$poolArea), length.out = 100)
ypts <- predict(desertSmooth, new = data.frame(poolArea = xpts))
lines(xpts, ypts)
Here is another method to produce a smooth curve in a scatter plot.
plot(nFishSpecies ~ poolArea, data = desertFish)
desertSmooth <-smooth.spline(desertFish$poolArea, desertFish$nFishSpecies, df = 4)
xpts <- seq(min(desertFish$poolArea), max(desertFish$poolArea), length.out = 100)
ypts <- predict(desertSmooth, xpts)$y
lines(xpts, ypts)
Create a residual plot to check assumptions. The data are cap color of offspring and parents in a sample of the blue tit.
Read and inspect the data.
capColor <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f5_4BlueTitCapColor.csv"))
head(capColor)
## midparentColorScore offspringColorScore
## 1 -0.62 -0.33
## 2 -0.27 -0.17
## 3 -0.25 -0.23
## 4 -0.12 -0.49
## 5 0.12 -0.59
## 6 0.18 -0.55
Basic scatter plot with regression line.
plot(offspringColorScore ~ midparentColorScore, data = capColor)
capColorRegression <- lm(offspringColorScore ~ midparentColorScore, data = capColor)
abline(capColorRegression)
Residual plot. Residuals are calculated from the model object created from the previous lm
command, using the residuals
command.
plot(residuals(capColorRegression) ~ midparentColorScore, data = capColor)
abline(c(0,0))
Commands for a fancier residual plot using more options:
plot(residuals(capColorRegression) ~ midparentColorScore, data = capColor, pch = 16, col = "firebrick", las = 1, cex = 1.5, bty = "l", xlab = "Midparent color score", ylab = "Residuals")
abline(c(0,0))
Make a residual plot to check assumptions. The data are temperature and the firing rate of neurons in cockroach.
Read and inspect the data.
roach <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f5_4CockroachNeurons.csv"))
head(roach)
## temperature rate
## 1 11.9 155.03
## 2 14.4 142.33
## 3 15.3 140.11
## 4 16.4 142.65
## 5 17.9 135.24
## 6 20.4 130.16
Scatter plot with regression line.
plot(rate ~ temperature, data = roach)
roachRegression <- lm(rate ~ temperature, data = roach)
abline(roachRegression)
Residual plot.
plot(residuals(roachRegression) ~ temperature, data = roach)
abline(c(0,0))
Commands for a more elaborate residual plot:
plot(residuals(roachRegression) ~ temperature, data = roach, pch = 16, col = "firebrick", las = 1, cex = 1.5, bty = "l", xlab = "Temperature", ylab = "Residuals")
abline(c(0,0))
Use a square root transformation to improve the fit to assumptions of linear regression. The data are number of pollen grains received and floral tube length of an iris species.
Read and inspect data.
iris <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f6_3IrisPollination.csv"))
head(iris)
## tubeLengthMm grainsDeposited
## 1 38 299
## 2 54 205
## 3 53 167
## 4 55 122
## 5 64 129
## 6 64 64
Scatter plot with regression line.
plot(grainsDeposited ~ tubeLengthMm, data = iris)
irisRegression <- lm(grainsDeposited ~ tubeLengthMm, data = iris)
abline(irisRegression)
Residual plot.
plot(residuals(irisRegression) ~ tubeLengthMm, data = iris)
abline(c(0,0))
Commands for a residual plot with more options:
plot(residuals(irisRegression) ~ tubeLengthMm, data = iris, pch = 16, col = "firebrick", las = 1, cex = 1.5, bty = "l",xlab = "Floral tube length (mm)", ylab = "Residuals")
abline(c(0,0))
Square root transformation.
iris$sqRootGrains <- sqrt(iris$grainsDeposited + 1/2)
Scatter plot using transformed data, with new regression line added.
plot(sqRootGrains ~ tubeLengthMm, data = iris)
irisRegressionSqrt <- lm(sqRootGrains ~ tubeLengthMm, data = iris)
abline(irisRegressionSqrt)
Residual plot based on the transformed data.
plot(residuals(irisRegressionSqrt) ~ tubeLengthMm, data = iris)
abline(c(0,0))
Instructions for a residual plot with more options:
plot(residuals(irisRegressionSqrt) ~ tubeLengthMm, data = iris, pch= 16, col = "firebrick", las = 1, cex=1.5, bty = "l", xlab = "Floral tube length (mm)", ylab = "Residuals")
abline(c(0,0))
Fit a nonlinear regression curve having an asymptote (Michaelis-Menten curve). The data are the relationship between population growth rate of a phytoplankton in culture and the concentration of iron in the medium.
Read and examine data.
phytoplankton <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f8_1IronAndPhytoplanktonGrowth.csv"))
head(phytoplankton)
## ironConcentration phytoGrowthRate
## 1 0.01096478 0.00
## 2 0.02398833 0.50
## 3 0.02818383 0.35
## 4 0.07244360 1.04
## 5 0.07585776 0.97
## 6 0.11481536 1.19
Scatter plot.
plot(phytoGrowthRate ~ ironConcentration, data = phytoplankton)
Instructions for a scatter plot with more options:
plot(phytoGrowthRate ~ ironConcentration, data = phytoplankton, pch = 16, col = "firebrick", las = 1, cex = 1.5, bty = "l", ylab = "Growth rate (no./day)", xlab = expression(paste("Iron concentration (", mu, "mol)")) )
Fit a Michaelis-Menten curve to the phytoplankton data using the nls
(nonlinear least squares). To fit the curve, provide a formula that also includes symbols (letters) for the parameters to be estimated. In the following function, we use “a” and “b” to indicate the parameters of the Michaelis-Menten curve we want to estimate. The function includes an argument where we must provide an initial guess of parameter values. The value of the initial guess is not so important – here we choose a=1 and b=1 as initial guesses. The first function below carried out the model fit and save the results in an R object named phytoCurve
.
phytoCurve <- nls(phytoGrowthRate ~ a*ironConcentration / ( b+ironConcentration), data = phytoplankton, list(a = 1, b = 1))
Obtain the parameter estimates using the summary
command, including standard errors and \(t\)-tests of null hypotheses that parameter values are zero.
summary(phytoCurve)
##
## Formula: phytoGrowthRate ~ a * ironConcentration/(b + ironConcentration)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## a 1.94164 0.06802 28.545 1.14e-11 ***
## b 0.07833 0.01120 6.994 2.29e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1133 on 11 degrees of freedom
##
## Number of iterations to convergence: 6
## Achieved convergence tolerance: 6.672e-06
Add the nonlinear regression curve to scatter plot. If necessary, redraw the scatter plot before issuing the following commands.
plot(phytoGrowthRate ~ ironConcentration, data = phytoplankton)
xpts <- seq(min(phytoplankton$ironConcentration), max(phytoplankton$ironConcentration), length.out = 100)
ypts <- predict(phytoCurve, new = data.frame(ironConcentration = xpts))
lines(xpts, ypts)
Many of the functions that can be used to extract results from a saved lm
object work in the same way when applied to an nls
object, such as predict
, residuals
, and coef
.
Fit a quadratic curve to the relationship between the number of plant species present in ponds and pond productivity.
Read and examine data.
pondProductivity <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f8_2PondPlantsAndProductivity.csv"))
head(pondProductivity)
## productivity species
## 1 4 3
## 2 11 2
## 3 10 1
## 4 10 0
## 5 12 4
## 6 15 4
Scatter plot.
plot(species ~ productivity, data = pondProductivity)
Commands for a fancier scatter plot:
plot(species ~ productivity, data = pondProductivity, pch = 16, col = "firebrick", las = 1, cex = 1.5, bty = "l", ylab = "Number of species", xlab = "Productivity (g/15 days)" )
Fit a quadratic curve to the data. Here, the single variable productivity
in the data frame is included in the formula both as itself and as the squared term, productivity\(^2\). To make the squared term work, we need to wrap the term with I()
. The results of the model fit are saved in an R object productivityCurve
.
productivityCurve <- lm(species ~ productivity + I(productivity^2), data = pondProductivity)
Show estimates of the parameters of the quadratic curve (regression coefficients) are obtained as follows, along with standard errors and \(t\)-tests.
summary(productivityCurve)
##
## Call:
## lm(formula = species ~ productivity + I(productivity^2), data = pondProductivity)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6330 -0.9952 -0.0158 1.4458 3.5215
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.7535477 1.1118289 1.577 0.126401
## productivity 0.2083549 0.0558421 3.731 0.000897 ***
## I(productivity^2) -0.0020407 0.0005677 -3.595 0.001280 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.999 on 27 degrees of freedom
## Multiple R-squared: 0.3402, Adjusted R-squared: 0.2913
## F-statistic: 6.961 on 2 and 27 DF, p-value: 0.003648
Add quadratic regression curve to scatter plot. If necessary, redraw the previous scatter plot before issuing the following commands.
plot(species ~ productivity, data = pondProductivity)
xpts <- seq(min(pondProductivity$productivity), max(pondProductivity$productivity), length.out = 100)
ypts <- predict(productivityCurve, new = data.frame(productivity = xpts))
lines(xpts, ypts)
Fit a formula-free curve (cubic spline) to the relationship between body length and age for female fur seals.
Read and inspect data
shrink <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17e8ShrinkingSeals.csv"))
head(shrink)
## ageInDays length
## 1 5337 131
## 2 2081 123
## 3 2085 122
## 4 4299 136
## 5 2861 122
## 6 5052 131
Scatter plot.
plot(length ~ ageInDays, data = shrink, pch = ".")
Commands for a fancier scatter plot with more options:
plot(jitter(length, factor = 2) ~ ageInDays, data = shrink, pch = ".", col = "firebrick", las = 1, bty = "l", ylab = "Body length (cm)", xlab = "Female age (days)")
Fit a cubic spline. The argument df
stands for effective degrees of freedom, which allows you to control how complicated the curve should be. The simplest possible curve is a straight line, which has df=2
(one for slope and another for intercept). More complex curves require more df
. Here we fit a very complicated curve.
shrinkCurve <- smooth.spline(shrink$ageInDays, shrink$length, df = 40)
Add curve to scatter plot. If needed, replot the scatter plot before issuing the commands below.
plot(length ~ ageInDays, data = shrink, pch = ".")
xpts <- seq(min(shrink$ageInDays), max(shrink$ageInDays), length.out = 1000)
ypts <- predict(shrinkCurve, xpts)$y
lines(xpts, ypts)
Fit a logistic regression to the relationship between guppy mortality and duration of exposure to a temperature of 5 degrees C.
Read and inspect the data.
guppy <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17f9_1GuppyColdDeath.csv"))
head(guppy)
## fish exposureDurationMin mortality
## 1 1 3 1
## 2 2 3 1
## 3 3 3 1
## 4 4 3 1
## 5 5 3 1
## 6 6 3 1
Draw a frequency table of mortality at different exposure times.
table(guppy$mortality, guppy$exposureDurationMin, dnn = c("Mortality","Exposure (min)"))
## Exposure (min)
## Mortality 3 8 12 18
## 0 29 16 11 2
## 1 11 24 29 38
Scatter plot of the data.
plot(mortality ~ jitter(exposureDurationMin), data = guppy)
Commands for a fancier scatter plot, with more options:
plot(jitter(mortality, factor = 0.1) ~ jitter(exposureDurationMin, factor = 1), data = guppy, col = "firebrick", las = 1, bty = "l", ylab = "Mortality", xlab = "Duration of exposure (min)")
Fit a logistic regression.
guppyGlm <- glm(mortality ~ exposureDurationMin, data = guppy, family = binomial(link = logit))
Add logistic regression curve to scatter plot.
plot(mortality ~ jitter(exposureDurationMin), data = guppy)
xpts <- seq(min(guppy$exposureDurationMin), max(guppy$exposureDurationMin), length.out = 100)
ypts <- predict(guppyGlm, newdata = data.frame(exposureDurationMin = xpts), type = "response")
lines(xpts, ypts)
Table of regression coefficients, with parameter estimates (Table 17.9-2).
summary(guppyGlm)
##
## Call:
## glm(formula = mortality ~ exposureDurationMin, family = binomial(link = logit),
## data = guppy)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.3332 -0.8115 0.3688 0.7206 1.5943
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.66081 0.40651 -4.086 4.40e-05 ***
## exposureDurationMin 0.23971 0.04245 5.646 1.64e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 209.55 on 159 degrees of freedom
## Residual deviance: 164.69 on 158 degrees of freedom
## AIC: 168.69
##
## Number of Fisher Scoring iterations: 4
95% confidence intervals for parameter estimates. It is necessary to load the MASS
package first.
if (!require("MASS")) { install.packages("MASS", dependencies=T, repos="http://cran.rstudio.com/") }
## Loading required package: MASS
library(MASS)
confint(guppyGlm)
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) -2.4964323 -0.8931288
## exposureDurationMin 0.1614356 0.3288161
Predict probability of mortality (mean mortality) for a given x-value, 10 min duration, including standard error of prediction.
predict(guppyGlm, newdata = data.frame(exposureDurationMin = 10), type = "response", se.fit = TRUE)
## $fit
## 1
## 0.6761874
##
## $se.fit
## 1
## 0.04421466
##
## $residual.scale
## [1] 1
Estimate the LD50, the dose at which half the individuals are predicted to die from exposure.
library(MASS)
dose.p(guppyGlm, p = 0.50)
## Dose SE
## p = 0.5: 6.928366 0.841132
Analysis of deviance table, with a null hypothesis of zero slope (Table 17.9-3).
anova(guppyGlm, test = "Chi")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: mortality
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 159 209.55
## exposureDurationMin 1 44.86 158 164.69 2.116e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1