Note: This document was converted to R-Markdown from this page by M. Drew LaMar. You can download the R-Markdown here.
Download the R code on this page as a single file here
Hover over a function argument for a short description of its meaning. The variable names are plucked from the examples further below.
Paired \(t\)-test and 95% confidence interval of a mean difference:
t.test(blackbird$logAfterImplant, blackbird$logBeforeImplant, paired = TRUE)
Two-sample \(t\)-test and 95% confidence interval for a difference between two means:
t.test(squamosalHornLength ~ Survival, data = lizard, var.equal = TRUE)
Other new methods:
Confidence interval for mean difference and the paired t-test, comparing immunocompetence of red-winged blackbirds before and after testosterone implants.
Read the data into a data frame. The data are in “wide” format (one row per individual).
blackbird <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter12/chap12e2BlackbirdTestosterone.csv"))
blackbird
## blackbird beforeImplant afterImplant logBeforeImplant logAfterImplant
## 1 1 105 85 4.65 4.44
## 2 2 50 74 3.91 4.30
## 3 3 136 145 4.91 4.98
## 4 4 90 86 4.50 4.45
## 5 5 122 148 4.80 5.00
## 6 6 132 148 4.88 5.00
## 7 7 131 150 4.88 5.01
## 8 8 119 142 4.78 4.96
## 9 9 145 151 4.98 5.02
## 10 10 130 113 4.87 4.73
## 11 11 116 118 4.75 4.77
## 12 12 110 99 4.70 4.60
## 13 13 138 150 4.93 5.01
Calculate and plot differences (Figure 12.2-2). We add a variable called d
to the data frame with the After minus Before difference.
blackbird$d <- blackbird$logAfterImplant - blackbird$logBeforeImplant
head(blackbird)
## blackbird beforeImplant afterImplant logBeforeImplant logAfterImplant
## 1 1 105 85 4.65 4.44
## 2 2 50 74 3.91 4.30
## 3 3 136 145 4.91 4.98
## 4 4 90 86 4.50 4.45
## 5 5 122 148 4.80 5.00
## 6 6 132 148 4.88 5.00
## d
## 1 -0.21
## 2 0.39
## 3 0.07
## 4 -0.05
## 5 0.20
## 6 0.12
hist(blackbird$d, right = FALSE, col = "firebrick")
To see how to produce the strip chart with lines (Figure 12.2-1):
# It helps to obtain a version of the data in "long" format.
blackbird2 = reshape(blackbird, varying = 4:5, direction = "long", idvar = "blackbird", v.names = "logAntibody", times = factor(c("before","after"), levels = c("before","after")))
head(blackbird2)
## blackbird beforeImplant afterImplant d time logAntibody
## 1.before 1 105 85 -0.21 before 4.65
## 2.before 2 50 74 0.39 before 3.91
## 3.before 3 136 145 0.07 before 4.91
## 4.before 4 90 86 -0.05 before 4.50
## 5.before 5 122 148 0.20 before 4.80
## 6.before 6 132 148 0.12 before 4.88
par(bty="l")
stripchart(logAntibody ~ time, data = blackbird2, vertical = TRUE, xlab = "Implant treatment", ylab="Antibody production rate (ln[mOD/min])", xlim = c(0.6,2.4), las = 1, pch = 16, col = "firebrick")
segments(1, blackbird$logBeforeImplant, 2, blackbird$logAfterImplant)
95% confidence interval for the mean difference. 95% confidence intervals are part of the output of the t.test
function, viewed in isolation by adding $conf.int
to the command.
t.test(blackbird$d)$conf.int
## [1] -0.04007695 0.15238464
## attr(,"conf.level")
## [1] 0.95
or using the paired = TRUE
argument of t.test
and specifying the paired variables.
t.test(blackbird$logAfterImplant, blackbird$logBeforeImplant, paired = TRUE)$conf.int
## [1] -0.04007695 0.15238464
## attr(,"conf.level")
## [1] 0.95
Paired t-test. A paired \(t\)-test can be done either on differences you have already calculated (d
here) or by using the paired=TRUE
argument with the measurements from the two groups.
t.test(blackbird$d)
##
## One Sample t-test
##
## data: blackbird$d
## t = 1.2714, df = 12, p-value = 0.2277
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.04007695 0.15238464
## sample estimates:
## mean of x
## 0.05615385
or
t.test(blackbird$logAfterImplant, blackbird$logBeforeImplant, paired = TRUE)
##
## Paired t-test
##
## data: blackbird$logAfterImplant and blackbird$logBeforeImplant
## t = 1.2714, df = 12, p-value = 0.2277
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.04007695 0.15238464
## sample estimates:
## mean of the differences
## 0.05615385
Confidence interval for the difference between two means, and the two-sample \(t\)-test, comparing horn length of live and dead (spiked) horned lizards.
lizard <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter12/chap12e3HornedLizards.csv"))
lizard
## squamosalHornLength Survival
## 1 25.2 living
## 2 26.9 living
## 3 26.6 living
## 4 25.6 living
## 5 25.7 living
## 6 25.9 living
## 7 27.3 living
## 8 25.1 living
## 9 30.3 living
## 10 25.6 living
## 11 26.0 living
## 12 24.6 living
## 13 25.6 living
## 14 25.3 living
## 15 23.5 living
## 16 24.5 living
## 17 23.3 living
## 18 26.0 living
## 19 23.9 living
## 20 27.3 living
## 21 25.4 living
## 22 25.5 living
## 23 21.4 living
## 24 23.8 living
## 25 25.5 living
## 26 19.2 living
## 27 20.7 living
## 28 19.2 living
## 29 25.5 living
## 30 20.5 living
## 31 20.6 living
## 32 24.9 living
## 33 23.7 living
## 34 23.4 living
## 35 25.6 living
## 36 26.6 living
## 37 27.7 living
## 38 25.7 living
## 39 27.0 living
## 40 26.5 living
## 41 25.0 living
## 42 19.7 living
## 43 27.1 living
## 44 23.0 living
## 45 22.7 living
## 46 25.8 living
## 47 28.8 living
## 48 23.5 living
## 49 23.2 living
## 50 25.3 living
## 51 23.8 living
## 52 25.1 living
## 53 26.7 living
## 54 27.1 living
## 55 22.5 living
## 56 25.5 living
## 57 24.4 living
## 58 25.6 living
## 59 20.6 living
## 60 25.5 living
## 61 24.5 living
## 62 23.0 living
## 63 27.6 living
## 64 27.3 living
## 65 20.7 living
## 66 26.0 living
## 67 25.1 living
## 68 19.9 living
## 69 23.5 living
## 70 24.8 living
## 71 22.4 living
## 72 24.7 living
## 73 27.5 living
## 74 21.5 living
## 75 24.0 living
## 76 22.4 living
## 77 21.3 living
## 78 25.0 living
## 79 24.3 living
## 80 24.5 living
## 81 23.6 living
## 82 21.1 living
## 83 25.6 living
## 84 26.1 living
## 85 23.7 living
## 86 24.2 living
## 87 23.2 living
## 88 26.5 living
## 89 28.1 living
## 90 24.1 living
## 91 29.5 living
## 92 24.0 living
## 93 26.8 living
## 94 25.8 living
## 95 23.3 living
## 96 22.4 living
## 97 24.2 living
## 98 26.1 living
## 99 23.2 living
## 100 21.7 living
## 101 24.7 living
## 102 21.4 living
## 103 18.5 living
## 104 23.2 living
## 105 NA living
## 106 21.5 living
## 107 29.1 living
## 108 21.7 living
## 109 23.5 living
## 110 23.0 living
## 111 20.0 living
## 112 21.9 living
## 113 27.4 living
## 114 27.1 living
## 115 23.1 living
## 116 23.8 living
## 117 26.5 living
## 118 27.0 living
## 119 26.4 living
## 120 26.3 living
## 121 20.6 living
## 122 24.5 living
## 123 22.8 living
## 124 25.5 living
## 125 25.3 living
## 126 28.0 living
## 127 20.8 living
## 128 25.9 living
## 129 24.0 living
## 130 22.5 living
## 131 26.3 living
## 132 23.3 living
## 133 24.5 living
## 134 21.6 living
## 135 23.6 living
## 136 23.0 living
## 137 22.4 living
## 138 27.4 living
## 139 25.0 living
## 140 23.9 living
## 141 28.2 living
## 142 27.2 living
## 143 13.1 living
## 144 22.9 living
## 145 26.6 living
## 146 25.8 living
## 147 26.3 living
## 148 20.9 living
## 149 25.6 living
## 150 24.8 living
## 151 19.2 living
## 152 27.4 living
## 153 24.2 living
## 154 15.7 living
## 155 17.7 living
## 156 21.4 killed
## 157 23.9 killed
## 158 23.2 killed
## 159 22.6 killed
## 160 22.5 killed
## 161 19.3 killed
## 162 23.5 killed
## 163 23.4 killed
## 164 19.0 killed
## 165 21.7 killed
## 166 20.2 killed
## 167 26.7 killed
## 168 21.7 killed
## 169 21.0 killed
## 170 23.9 killed
## 171 24.6 killed
## 172 21.6 killed
## 173 25.3 killed
## 174 25.0 killed
## 175 25.2 killed
## 176 15.2 killed
## 177 22.9 killed
## 178 21.4 killed
## 179 23.9 killed
## 180 17.2 killed
## 181 15.5 killed
## 182 22.0 killed
## 183 22.0 killed
## 184 23.1 killed
## 185 20.7 killed
Note that there is one missing value for the variable squamosalHornLength
. Everything is easier if we eliminate the row with missing data.
lizard2 <- na.omit(lizard)
lizard2
## squamosalHornLength Survival
## 1 25.2 living
## 2 26.9 living
## 3 26.6 living
## 4 25.6 living
## 5 25.7 living
## 6 25.9 living
## 7 27.3 living
## 8 25.1 living
## 9 30.3 living
## 10 25.6 living
## 11 26.0 living
## 12 24.6 living
## 13 25.6 living
## 14 25.3 living
## 15 23.5 living
## 16 24.5 living
## 17 23.3 living
## 18 26.0 living
## 19 23.9 living
## 20 27.3 living
## 21 25.4 living
## 22 25.5 living
## 23 21.4 living
## 24 23.8 living
## 25 25.5 living
## 26 19.2 living
## 27 20.7 living
## 28 19.2 living
## 29 25.5 living
## 30 20.5 living
## 31 20.6 living
## 32 24.9 living
## 33 23.7 living
## 34 23.4 living
## 35 25.6 living
## 36 26.6 living
## 37 27.7 living
## 38 25.7 living
## 39 27.0 living
## 40 26.5 living
## 41 25.0 living
## 42 19.7 living
## 43 27.1 living
## 44 23.0 living
## 45 22.7 living
## 46 25.8 living
## 47 28.8 living
## 48 23.5 living
## 49 23.2 living
## 50 25.3 living
## 51 23.8 living
## 52 25.1 living
## 53 26.7 living
## 54 27.1 living
## 55 22.5 living
## 56 25.5 living
## 57 24.4 living
## 58 25.6 living
## 59 20.6 living
## 60 25.5 living
## 61 24.5 living
## 62 23.0 living
## 63 27.6 living
## 64 27.3 living
## 65 20.7 living
## 66 26.0 living
## 67 25.1 living
## 68 19.9 living
## 69 23.5 living
## 70 24.8 living
## 71 22.4 living
## 72 24.7 living
## 73 27.5 living
## 74 21.5 living
## 75 24.0 living
## 76 22.4 living
## 77 21.3 living
## 78 25.0 living
## 79 24.3 living
## 80 24.5 living
## 81 23.6 living
## 82 21.1 living
## 83 25.6 living
## 84 26.1 living
## 85 23.7 living
## 86 24.2 living
## 87 23.2 living
## 88 26.5 living
## 89 28.1 living
## 90 24.1 living
## 91 29.5 living
## 92 24.0 living
## 93 26.8 living
## 94 25.8 living
## 95 23.3 living
## 96 22.4 living
## 97 24.2 living
## 98 26.1 living
## 99 23.2 living
## 100 21.7 living
## 101 24.7 living
## 102 21.4 living
## 103 18.5 living
## 104 23.2 living
## 106 21.5 living
## 107 29.1 living
## 108 21.7 living
## 109 23.5 living
## 110 23.0 living
## 111 20.0 living
## 112 21.9 living
## 113 27.4 living
## 114 27.1 living
## 115 23.1 living
## 116 23.8 living
## 117 26.5 living
## 118 27.0 living
## 119 26.4 living
## 120 26.3 living
## 121 20.6 living
## 122 24.5 living
## 123 22.8 living
## 124 25.5 living
## 125 25.3 living
## 126 28.0 living
## 127 20.8 living
## 128 25.9 living
## 129 24.0 living
## 130 22.5 living
## 131 26.3 living
## 132 23.3 living
## 133 24.5 living
## 134 21.6 living
## 135 23.6 living
## 136 23.0 living
## 137 22.4 living
## 138 27.4 living
## 139 25.0 living
## 140 23.9 living
## 141 28.2 living
## 142 27.2 living
## 143 13.1 living
## 144 22.9 living
## 145 26.6 living
## 146 25.8 living
## 147 26.3 living
## 148 20.9 living
## 149 25.6 living
## 150 24.8 living
## 151 19.2 living
## 152 27.4 living
## 153 24.2 living
## 154 15.7 living
## 155 17.7 living
## 156 21.4 killed
## 157 23.9 killed
## 158 23.2 killed
## 159 22.6 killed
## 160 22.5 killed
## 161 19.3 killed
## 162 23.5 killed
## 163 23.4 killed
## 164 19.0 killed
## 165 21.7 killed
## 166 20.2 killed
## 167 26.7 killed
## 168 21.7 killed
## 169 21.0 killed
## 170 23.9 killed
## 171 24.6 killed
## 172 21.6 killed
## 173 25.3 killed
## 174 25.0 killed
## 175 25.2 killed
## 176 15.2 killed
## 177 22.9 killed
## 178 21.4 killed
## 179 23.9 killed
## 180 17.2 killed
## 181 15.5 killed
## 182 22.0 killed
## 183 22.0 killed
## 184 23.1 killed
## 185 20.7 killed
Multiple histograms using the lattice package.
library(lattice)
histogram( ~ squamosalHornLength | Survival, data = lizard2, layout = c(1,2), col = "firebrick", breaks = seq(12, 32, by = 2), xlab = "Horn length (mm)")
Here is code to make multiple histograms using hist
in base R instead.
oldpar = par(no.readonly = TRUE) # make backup of default graph settings
par(mfrow = c(2,1), oma = c(4, 6, 2, 6), mar = c(3, 4, 3, 2))
hist(lizard2$squamosalHornLength[lizard2$Survival == "living"], breaks = seq(12,32,by=2), col = "firebrick", las = 1, main = "living", ylab = "Frequency")
hist(lizard2$squamosalHornLength[lizard2$Survival == "killed"], breaks = seq(12,32,by=2), col = "firebrick", las = 1, main = "killed", ylab = "Frequency")
mtext("Horn length (mm)", side = 1, outer = TRUE, padj = 0)
par(oldpar) # revert to default graph settings
95% confidence interval for the difference between means
The output of t.test
includes the 95% confidence interval for the difference between means. Add $confint
after calling the function to get R to report only the confidence interval. The formula in the following command tells R to compare squamosalHornLength
between the two groups indicated by Survival
.
t.test(squamosalHornLength ~ Survival, data = lizard, var.equal = TRUE)$conf.int
## [1] -3.335402 -1.253602
## attr(,"conf.level")
## [1] 0.95
A two-sample \(t\)-test of the difference between two means can be carried out with t.test
by using a formula, asking if squamosalHornLength
is predicted by Survival
, and specifying that the variables are in the data frame lizard
.
t.test(squamosalHornLength ~ Survival, data = lizard, var.equal = TRUE)
##
## Two Sample t-test
##
## data: squamosalHornLength by Survival
## t = -4.3494, df = 182, p-value = 2.27e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.335402 -1.253602
## sample estimates:
## mean in group killed mean in group living
## 21.98667 24.28117
Welch’s two-sample t-test for unequal variances, comparing chinook salmon survival in the presents and absence of brook trout. Below, we use this same example to demonstrate the 95% confidence interval for the ratio of two variances, and the F-test of equal variances.
Read the data.
chinook <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter12/chap12e4ChinookWithBrookTrout.csv"))
Set the preferred order of categories in tables and graphs
chinook$troutTreatment <- factor(chinook$troutTreatment, levels = c("present", "absent"))
Strip chart of the data (bare bones version of Figure 12.4-1)
stripchart(proportionSurvived ~ troutTreatment, data = chinook, method = "jitter", vertical = TRUE)
Adding the means and confidence intervals to the plot is trickier. Code for a fancier plot is as follows.
# Calculate means and confidence intervals for the means.
meanProportion = tapply(chinook$proportionSurvived, chinook$troutTreatment, mean)
ciPresence = t.test(chinook$proportionSurvived[chinook$troutTreatment == "present"])$conf.int
ciAbsence = t.test(chinook$proportionSurvived[chinook$troutTreatment == "absent"])$conf.int
lower = c(ciPresence[1], ciAbsence[1])
upper = c(ciPresence[2], ciAbsence[2])
# Stripchart with options
adjustAmount = 0.2
par(bty = "l")
stripchart(proportionSurvived ~ troutTreatment, data = chinook, vertical = TRUE,method = "jitter", jitter = 0.1, pch = 1, col = "firebrick", cex = 1.5, las = 1, ylim = c(0, max(chinook$proportionSurvived)), lwd = 1.5, xlab = "Trout treatment", ylab = "Proportion surviving")
segments( c(1,2) + adjustAmount, lower, c(1,2) + adjustAmount, upper)
points(meanProportion ~ c( c(1,2) + adjustAmount ), pch = 16, cex = 1.2)
Calculating summary statistics by group (as found in Table 12.4-3)
Use tapply
to calculate statistics by group. It has three required arguments. The first is the numeric variable of interest (a vector). The second argument is a categorical variable (a vector of the same length) identifying the groups that individuals belong to. The third argument is the name of the R function that you want to apply to the variable by group.
meanProportion <- tapply(chinook$proportionSurvived, chinook$troutTreatment, mean)
sdProportion <- tapply(chinook$proportionSurvived, chinook$troutTreatment, sd)
nProportion <- tapply(chinook$proportionSurvived, chinook$troutTreatment, length)
data.frame(mean = meanProportion, std.dev = sdProportion, n = nProportion)
## mean std.dev n
## present 0.1941667 0.02971476 6
## absent 0.2351667 0.10364056 6
Welch’s two-sample \(t\)-test of means for unequal variances can also be done with t.test
, when the var.equal
argument is set to FALSE
(as it is by default):
t.test(proportionSurvived ~ troutTreatment, data = chinook, var.equal = FALSE)
##
## Welch Two Sample t-test
##
## data: proportionSurvived by troutTreatment
## t = -0.93148, df = 5.8165, p-value = 0.3886
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.14953173 0.06753173
## sample estimates:
## mean in group present mean in group absent
## 0.1941667 0.2351667
Here, we demonstrate the 95% confidence interval for the ratio of two variances, and F-test of equal variances, using the chinook salmon experiment.
95% confidence interval for variance ratio. (Warning: remember that the method is not robust to departures from assumption of normality.)
var.test(proportionSurvived ~ troutTreatment, data = chinook)$conf.int
## [1] 0.01150267 0.58745010
## attr(,"conf.level")
## [1] 0.95
\(F\)-test of equal variances (Warning: Remember that the \(F\)-test is not robust to departures from assumption of normality.)
var.test(proportionSurvived ~ troutTreatment, data = chinook)
##
## F test to compare two variances
##
## data: proportionSurvived by troutTreatment
## F = 0.082202, num df = 5, denom df = 5, p-value = 0.01589
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.01150267 0.58745010
## sample estimates:
## ratio of variances
## 0.08220245
Levene’s test of equal variances. This function is in the car
package, which must first be installed and loaded with the library function before use.
if (!require("car")) {install.packages("car", dependencies = TRUE, repos="http://cran.rstudio.com/")}
library(car)
leveneTest(chinook$proportionSurvived, group = chinook$troutTreatment, center = mean)
## Levene's Test for Homogeneity of Variance (center = mean)
## Df F value Pr(>F)
## group 1 10.315 0.009306 **
## 10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1