Because there is no pairing between wells, item c) is not a matched-pair design, and the methods of this chapter do not apply to those data.
North <- c(255, 353, 470, 353, 353, 295, 199, 410, 346, 405)
South <- c(194, 348, 383, 225, 266, 194, 212, 320, 340, 310)
wilcox.test(North, South, alt = "greater", paired = TRUE)
## Warning in wilcox.test.default(North, South, alt = "greater", paired =
## TRUE): cannot compute exact p-value with ties
##
## Wilcoxon signed rank test with continuity correction
##
## data: North and South
## V = 52, p-value = 0.007185
## alternative hypothesis: true location shift is greater than 0
At the alpha = 0.05 level, reject the null hypothesis that the two forks have similar specific conductance (p = 0.0072), in favor of the alternative that the North Fork has higher conductance than the South Fork.
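Where do V = 52 and the ties warning come from? The sketch below recomputes the statistic by hand: V is the sum of the ranks of |North - South| over the pairs where North is larger. Because two differences tie (both are 87), `wilcox.test` cannot use the exact distribution and, as far as we understand its implementation, falls back on a normal approximation with tie and continuity corrections. Reproducing that approximation should recover the printed p-value. The variable names are illustrative only.

```r
North <- c(255, 353, 470, 353, 353, 295, 199, 410, 346, 405)
South <- c(194, 348, 383, 225, 266, 194, 212, 320, 340, 310)

d <- North - South               # paired differences, North minus South
d <- d[d != 0]                   # zero differences would be dropped (there are none here)
n <- length(d)
r <- rank(abs(d))                # tied |d| values (the two 87s) share the average rank 5.5
V <- sum(r[d > 0])               # signed-rank statistic: sum of ranks of positive differences

# Large-sample normal approximation with tie and continuity corrections
ties  <- table(r)                # sizes of the groups of tied ranks
mu    <- n * (n + 1) / 4
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(ties^3 - ties) / 48)
z     <- (V - mu - 0.5) / sigma  # subtract 0.5: continuity correction for alt = "greater"
p     <- pnorm(z, lower.tail = FALSE)
c(V = V, p.value = p)
```

Only one pair (199 vs 212) has South higher; its difference has rank 3, so V = 55 - 3 = 52.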
wilcox.test(North, South, alt = "greater", paired = TRUE, conf.int = TRUE)
## Warning in wilcox.test.default(North, South, alt = "greater", paired =
## TRUE, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(North, South, alt = "greater", paired =
## TRUE, : cannot compute exact confidence interval with ties
##
## Wilcoxon signed rank test with continuity correction
##
## data: North and South
## V = 52, p-value = 0.007185
## alternative hypothesis: true location shift is greater than 0
## 95 percent confidence interval:
## 37.00004 Inf
## sample estimates:
## (pseudo)median
## 66.99994
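The Hodges-Lehmann estimate of the difference (North minus South) is 67 units, with a one-sided 95 percent lower confidence bound of 37 units.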
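The `(pseudo)median` reported by `wilcox.test` is the Hodges-Lehmann estimator. For paired data it can be computed directly as the median of the Walsh averages, (d_i + d_j)/2 for all i <= j, of the differences. A minimal sketch follows; with ties present, `wilcox.test` appears to locate the estimate by a numerical root search, which would explain why it prints 66.99994 rather than 67.

```r
North <- c(255, 353, 470, 353, 353, 295, 199, 410, 346, 405)
South <- c(194, 348, 383, 225, 266, 194, 212, 320, 340, 310)

d <- North - South                                   # paired differences
walsh <- outer(d, d, "+") / 2                        # matrix of (d_i + d_j) / 2
hl <- median(walsh[upper.tri(walsh, diag = TRUE)])   # keep i <= j: 10 * 11 / 2 = 55 averages
hl
```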
par(las = 1, tck = 0.02, xaxs = "i", yaxs = "i") # USGS graphics standards
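# Hodges-Lehmann estimate of North - South, negated because South is on the
# y axis, so the reference line is South = North - 67
d.hl <- -1 * wilcox.test(North, South, paired = TRUE, alternative = "greater",
                         conf.int = TRUE)$estimate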
## Warning in wilcox.test.default(North, South, paired = TRUE, alternative =
## "greater", : cannot compute exact p-value with ties
## Warning in wilcox.test.default(North, South, paired = TRUE, alternative =
## "greater", : cannot compute exact confidence interval with ties
plot(North, South,
     xlab = "Specific Conductance, North Fork of the Shenandoah River",
     ylab = "Specific Conductance, South Fork of the Shenandoah River",
     xlim = c(150, 500), ylim = c(150, 450))
abline(0, 1)                     # 1:1 line (no difference between forks)
abline(d.hl, 1, lty = "dashed")  # 1:1 line shifted by the Hodges-Lehmann difference
The dashed line, offset from the solid 1:1 line by the 67-unit Hodges-Lehmann difference between the forks, gives a reasonable fit to the data.
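Use the signed-rank test to determine whether the paired June and September atrazine concentrations differ.
# Nondetects are coded as -0.01, below every detected value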
June.atra <- c(0.38, 0.04, -0.01, 0.03, 0.03, 0.05, 0.02, -0.01, -0.01, -0.01, 0.11, 0.09, -0.01, -0.01, -0.01, -0.01, 0.02, 0.03, 0.02, 0.02, 0.05, 0.03, 0.05, -0.01)
Sept.atra <- c(2.66, 0.63, 0.59, 0.05, 0.84, 0.58, 0.02, 0.01, -0.01, -0.01, 0.09, 0.31, 0.02, -0.01, 0.5, 0.03, 0.09, 0.06, 0.03, 0.01, 0.1, 0.25, 0.03, 88.36)
wilcox.test(June.atra, Sept.atra, alt = "less", paired = TRUE, conf.int = TRUE)
## Warning in wilcox.test.default(June.atra, Sept.atra, alt = "less", paired =
## TRUE, : cannot compute exact p-value with ties
## Warning in wilcox.test.default(June.atra, Sept.atra, alt = "less", paired =
## TRUE, : cannot compute exact confidence interval with ties
## Warning in wilcox.test.default(June.atra, Sept.atra, alt = "less", paired =
## TRUE, : cannot compute exact p-value with zeroes
## Warning in wilcox.test.default(June.atra, Sept.atra, alt = "less", paired =
## TRUE, : cannot compute exact confidence interval with zeroes
##
## Wilcoxon signed rank test with continuity correction
##
## data: June.atra and Sept.atra
## V = 12, p-value = 0.0002751
## alternative hypothesis: true location shift is less than 0
## 95 percent confidence interval:
## -Inf -0.04997844
## sample estimates:
## (pseudo)median
## -0.2750312
At the alpha = 0.05 level, strongly reject the null hypothesis (p = 0.00028) and conclude that atrazine concentrations are higher in September than in June, by an estimated (Hodges-Lehmann) median difference of 0.275 ug/L.
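The warnings about zeroes refer to pairs whose June and September values are identical, including pairs where both samples are nondetects; the signed-rank test drops these pairs before ranking. A simple tally of the signs of the differences shows why the test is so decisive. A minimal sketch using the data as coded above:

```r
June.atra <- c(0.38, 0.04, -0.01, 0.03, 0.03, 0.05, 0.02, -0.01, -0.01, -0.01, 0.11, 0.09,
               -0.01, -0.01, -0.01, -0.01, 0.02, 0.03, 0.02, 0.02, 0.05, 0.03, 0.05, -0.01)
Sept.atra <- c(2.66, 0.63, 0.59, 0.05, 0.84, 0.58, 0.02, 0.01, -0.01, -0.01, 0.09, 0.31,
               0.02, -0.01, 0.5, 0.03, 0.09, 0.06, 0.03, 0.01, 0.1, 0.25, 0.03, 88.36)

d <- June.atra - Sept.atra   # June minus September, nondetects still coded as -0.01
table(sign(d))               # -1: June lower, 0: tied pair (dropped by the test), 1: June higher
```

Only 3 of the 20 untied pairs have the higher value in June.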
Now consider the consequences of testing for a difference in mean atrazine concentrations between the two months with a paired t-test. First, substitute zero for the nondetects (the negative values).
June.atra[June.atra < 0] <- 0
Sept.atra[Sept.atra < 0] <- 0
t.test(June.atra, Sept.atra, alt = "less", paired = TRUE)
##
## Paired t-test
##
## data: June.atra and Sept.atra
## t = -1.0698, df = 23, p-value = 0.1479
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf 2.365135
## sample estimates:
## mean of the differences
## -3.92875
At the alpha = 0.05 level, do not reject the null hypothesis that mean concentrations are the same in the two months (p = 0.148)! This is despite a mean difference of 3.9 ug/L and a highly significant difference found by the signed-rank test. Why?
1. Testing for a difference between means is not the same as testing whether one group has larger values more often than the other. The test on means evaluates the total ‘mass’ of each group; the signed-rank test determines whether one group is higher than the other more frequently than could be expected by chance. These are two different objectives.
2. The one large September outlier inflates the standard deviation of the differences (the noise level for the test), making the signal harder to see. The signed-rank test works with the ranks of the differences rather than their magnitudes, so it does not use the standard deviation, and the outlier counts only as the largest rank. The nonparametric test is resistant to the effects of outliers.
3. Neither the June nor the September data follow a normal distribution, and neither do their paired differences. The t-test therefore has lower power to detect differences.
4. Setting the nondetects to an arbitrary value (here, zero) can either increase or decrease the standard deviations. Another arbitrary choice, such as one-half the detection limit, could produce a different yet equally arbitrary outcome. A big problem is that substitution tells the t-test that you know exactly what each of those values is. That is false.
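The outlier argument can be checked directly. This sketch applies the same zero substitution used for the t-test, confirms that the paired t-test is a one-sample t statistic computed on the differences, and then compares the standard deviation of the differences with and without the single largest one (the pair containing the 88.36 ug/L September value). The variable names are our own.

```r
June.atra <- c(0.38, 0.04, -0.01, 0.03, 0.03, 0.05, 0.02, -0.01, -0.01, -0.01, 0.11, 0.09,
               -0.01, -0.01, -0.01, -0.01, 0.02, 0.03, 0.02, 0.02, 0.05, 0.03, 0.05, -0.01)
Sept.atra <- c(2.66, 0.63, 0.59, 0.05, 0.84, 0.58, 0.02, 0.01, -0.01, -0.01, 0.09, 0.31,
               0.02, -0.01, 0.5, 0.03, 0.09, 0.06, 0.03, 0.01, 0.1, 0.25, 0.03, 88.36)
June.atra[June.atra < 0] <- 0   # zero substitution for nondetects, as in the t-test
Sept.atra[Sept.atra < 0] <- 0

d <- June.atra - Sept.atra                       # paired differences
t.stat <- mean(d) / (sd(d) / sqrt(length(d)))   # paired t = one-sample t on the differences
big <- which.max(abs(d))                         # the pair holding the September outlier
c(t = t.stat, sd.all = sd(d), sd.without.outlier = sd(d[-big]))
```

The t statistic should match the -1.0698 printed by `t.test`. Dropping that one pair shrinks the standard deviation of the differences by more than an order of magnitude, whereas in the signed-rank test the same pair is simply the top-ranked difference.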
Together, these four problems show that substituting values for nondetects and then running a t-test is an arbitrary and invalid procedure. It is also entirely avoidable, as the signed-rank test and other familiar nonparametric methods work well for data with nondetects at a single detection limit.