Exercise 6.1

Because there is no pairing between wells, the data in item c) are not matched pairs, and the methods of this chapter do not apply to them.

Exercise 6.2

The null hypothesis is that the two forks have similar conductance. The (one-sided) alternative is that the North fork has higher conductance than the South fork.
The question concerns only whether one fork is higher than the other, not cumulative amounts or mass over time. This is a frequency question, so it is best answered by a nonparametric test.
Using the signed-rank test:

North <- c(255, 353, 470, 353, 353, 295, 199, 410, 346, 405)
South <- c(194, 348, 383, 225, 266, 194, 212, 320, 340, 310)
wilcox.test(North, South, alt = "greater", paired = TRUE)

## Warning in wilcox.test.default(North, South, alt = "greater", paired =
## TRUE): cannot compute exact p-value with ties

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  North and South
## V = 52, p-value = 0.007185
## alternative hypothesis: true location shift is greater than 0

At an alpha=0.05 level, reject that the two forks have similar specific conductance, in favor of the North fork having higher conductance than the South.
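The V statistic reported above can be reproduced by hand, which makes explicit what the test measures. The following is a minimal sketch of the textbook definition, not the internals of wilcox.test(): it ranks the absolute paired differences (ties receive average ranks) and sums the ranks belonging to the positive differences. The names d, r, and V.hand are illustrative only.

```r
North <- c(255, 353, 470, 353, 353, 295, 199, 410, 346, 405)
South <- c(194, 348, 383, 225, 266, 194, 212, 320, 340, 310)

d <- North - South        # paired differences, North minus South
d <- d[d != 0]            # zero differences would be dropped (none occur here)
r <- rank(abs(d))         # ranks of |d|; tied values get the average rank
V.hand <- sum(r[d > 0])   # signed-rank statistic: sum of positive ranks
V.hand
```

Only one of the ten differences is negative (199 versus 212), and its absolute value ranks third smallest, so V.hand is 52 out of a possible 55, in agreement with the test output.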

Estimate the difference in conductance using a Hodges-Lehmann estimator:

wilcox.test(North, South, alt = "greater", paired = TRUE, conf.int = TRUE)

## Warning in wilcox.test.default(North, South, alt = "greater", paired =
## TRUE, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(North, South, alt = "greater", paired =
## TRUE, : cannot compute exact confidence interval with ties

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  North and South
## V = 52, p-value = 0.007185
## alternative hypothesis: true location shift is greater than 0
## 95 percent confidence interval:
##  37.00004      Inf
## sample estimates:
## (pseudo)median 
##       66.99994

The estimated difference is 67 units.
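For paired data, the Hodges-Lehmann estimate is the median of the Walsh averages, that is, the averages (d_i + d_j)/2 of the paired differences taken over all pairs with i <= j. A minimal sketch of that definition follows; the names d, avg, walsh, and hl.hand are illustrative. The value 66.99994 printed by wilcox.test() appears to differ slightly from 67 because, with ties present, the function falls back on a numerical approximation (see the warnings above).

```r
North <- c(255, 353, 470, 353, 353, 295, 199, 410, 346, 405)
South <- c(194, 348, 383, 225, 266, 194, 212, 320, 340, 310)

d <- North - South                          # paired differences
avg <- outer(d, d, "+") / 2                 # all pairwise averages (symmetric matrix)
walsh <- avg[lower.tri(avg, diag = TRUE)]   # keep each pair i <= j once: n(n+1)/2 values
hl.hand <- median(walsh)                    # Hodges-Lehmann estimate
hl.hand
```

With ten pairs there are 55 Walsh averages, and their median is 67.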

A scatterplot with a 1:1 line and a line representing the Hodges-Lehmann difference is:

par(las = 1, tck = 0.02, xaxs = "i", yaxs = "i") # USGS graphics standards
d.hl <- -1 * wilcox.test(North, South, paired = TRUE, alternative = "greater", 
                         conf.int = TRUE)$estimate # Hodges-Lehmann estimate

## Warning in wilcox.test.default(North, South, paired = TRUE, alternative =
## "greater", : cannot compute exact p-value with ties

## Warning in wilcox.test.default(North, South, paired = TRUE, alternative =
## "greater", : cannot compute exact confidence interval with ties

plot(North, South,
     xlab = "Specific Conductance, North Fork of the Shenandoah River",
     ylab = "Specific Conductance, South Fork of the Shenandoah River",
     xlim = c(150, 500), ylim = c(150, 450))
abline(0, 1)
abline(d.hl, 1, lty = "dashed")

The dashed line, which represents a 67-unit difference between the forks, gives a reasonable fit to the data.

Exercise 6.3

Use the signed-rank test to determine similarity or difference.

June.atra <- c(0.38, 0.04, -0.01, 0.03, 0.03, 0.05, 0.02, -0.01, -0.01, -0.01, 0.11, 0.09, -0.01, -0.01, -0.01, -0.01, 0.02, 0.03, 0.02, 0.02, 0.05, 0.03, 0.05, -0.01)
Sept.atra <- c(2.66, 0.63, 0.59, 0.05, 0.84, 0.58, 0.02, 0.01, -0.01, -0.01, 0.09, 0.31, 0.02, -0.01, 0.5, 0.03, 0.09, 0.06, 0.03, 0.01, 0.1, 0.25, 0.03, 88.36)
wilcox.test(June.atra, Sept.atra, alt = "less", paired = TRUE, conf.int = TRUE)

## Warning in wilcox.test.default(June.atra, Sept.atra, alt = "less", paired =
## TRUE, : cannot compute exact p-value with ties

## Warning in wilcox.test.default(June.atra, Sept.atra, alt = "less", paired =
## TRUE, : cannot compute exact confidence interval with ties

## Warning in wilcox.test.default(June.atra, Sept.atra, alt = "less", paired =
## TRUE, : cannot compute exact p-value with zeroes

## Warning in wilcox.test.default(June.atra, Sept.atra, alt = "less", paired =
## TRUE, : cannot compute exact confidence interval with zeroes

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  June.atra and Sept.atra
## V = 12, p-value = 0.0002751
## alternative hypothesis: true location shift is less than 0
## 95 percent confidence interval:
##         -Inf -0.04997844
## sample estimates:
## (pseudo)median 
##     -0.2750312

At an alpha=0.05 level, strongly reject the null hypothesis (p=0.00027) and conclude that concentrations are higher in September than they were in June, by a median of about 0.275 ug/L.
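A simple tabulation of the signs of the paired differences shows why the test rejects so strongly. This is only a supporting check, not a replacement for the signed-rank test; the names d, n.lower, n.tied, and n.higher are illustrative.

```r
June.atra <- c(0.38, 0.04, -0.01, 0.03, 0.03, 0.05, 0.02, -0.01, -0.01, -0.01,
               0.11, 0.09, -0.01, -0.01, -0.01, -0.01, 0.02, 0.03, 0.02, 0.02,
               0.05, 0.03, 0.05, -0.01)
Sept.atra <- c(2.66, 0.63, 0.59, 0.05, 0.84, 0.58, 0.02, 0.01, -0.01, -0.01,
               0.09, 0.31, 0.02, -0.01, 0.5, 0.03, 0.09, 0.06, 0.03, 0.01,
               0.1, 0.25, 0.03, 88.36)

d <- June.atra - Sept.atra   # paired differences, June minus Sept
n.lower  <- sum(d < 0)       # pairs where June is lower than Sept
n.tied   <- sum(d == 0)      # pairs with identical recorded values
n.higher <- sum(d > 0)       # pairs where June is higher than Sept
c(June.lower = n.lower, tied = n.tied, June.higher = n.higher)
```

June is lower in 17 of the 24 pairs and higher in only 3. The 4 tied pairs are the zero differences behind the "zeroes" warnings above.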

Exercise 6.4

Here are the consequences of testing for differences in mean atrazine concentrations between the two months using the t-test. First, set the nondetects to zero.

June.atra[June.atra < 0] <- 0 # nondetects are coded as negative values
Sept.atra[Sept.atra < 0] <- 0
t.test(June.atra, Sept.atra, alt = "less", paired = TRUE)

## 
##  Paired t-test
## 
## data:  June.atra and Sept.atra
## t = -1.0698, df = 23, p-value = 0.1479
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf 2.365135
## sample estimates:
## mean of the differences 
##                -3.92875

At an alpha=0.05 level, do not reject the null hypothesis that mean concentrations are the same in the two months! This is in spite of a mean difference of 3.9 ug/L and a very significant difference found by the signed-rank test. Why this result?
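The paired t statistic is the mean of the differences divided by their standard error, so the standard deviation of the differences sits directly in the denominator. A minimal sketch of that arithmetic follows, using copies of the data with nondetects set to zero; the names june0, sept0, d, t.hand, and p.hand are illustrative and are chosen so that June.atra and Sept.atra are not overwritten.

```r
# Same data as above, with the negative (nondetect) codes replaced by zero
june0 <- pmax(c(0.38, 0.04, -0.01, 0.03, 0.03, 0.05, 0.02, -0.01, -0.01, -0.01,
                0.11, 0.09, -0.01, -0.01, -0.01, -0.01, 0.02, 0.03, 0.02, 0.02,
                0.05, 0.03, 0.05, -0.01), 0)
sept0 <- pmax(c(2.66, 0.63, 0.59, 0.05, 0.84, 0.58, 0.02, 0.01, -0.01, -0.01,
                0.09, 0.31, 0.02, -0.01, 0.5, 0.03, 0.09, 0.06, 0.03, 0.01,
                0.1, 0.25, 0.03, 88.36), 0)

d <- june0 - sept0                       # paired differences
n <- length(d)
t.hand <- mean(d) / (sd(d) / sqrt(n))    # signal (mean) over noise (standard error)
p.hand <- pt(t.hand, df = n - 1)         # one-sided p-value for alternative "less"
c(mean = mean(d), sd = sd(d), t = t.hand, p = p.hand)
```

The values of t.hand and p.hand should reproduce the t = -1.0698 and p-value = 0.1479 reported by t.test() above.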

1. Because testing for differences between means is not the same as testing whether one group has larger values more often than the other. The test on means evaluates the total ‘mass’ of each group. The signed-rank test determines whether one group is higher than the other more frequently than could be expected by chance. These are two different objectives.
2. Because the one large September outlier inflates the standard deviation (the noise level for the test), making the signal more difficult to see. The signed-rank test looks at frequencies of occurrence and doesn’t use the standard deviation in its computation. The nonparametric test is resistant to the effects of outliers.
3. Because neither the June nor the September data follow a normal distribution. The t-test therefore has lower power to detect differences.
4. Because setting the nondetects to an arbitrary value (here, zero) could either increase or decrease the standard deviations. Using another arbitrary choice, such as one-half the detection limit, could result in a different yet equally arbitrary outcome. A big problem is that substitution declares to the t-test that you know exactly what each of those values is. That is false.

Together, these four problems show that substitution for nondetects followed by a t-test is an arbitrary and invalid procedure. It is also totally avoidable, as the signed-rank test and other familiar nonparametric methods work well for data with nondetects at one detection limit.
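The point above about the large September outlier can be checked by comparing the standard deviation of the paired differences with and without the most extreme pair. This is a diagnostic sketch only, not a recommendation to delete observations; the names june0, sept0, d, worst, sd.all, and sd.trim are illustrative.

```r
# Same data as above, with the negative (nondetect) codes replaced by zero
june0 <- pmax(c(0.38, 0.04, -0.01, 0.03, 0.03, 0.05, 0.02, -0.01, -0.01, -0.01,
                0.11, 0.09, -0.01, -0.01, -0.01, -0.01, 0.02, 0.03, 0.02, 0.02,
                0.05, 0.03, 0.05, -0.01), 0)
sept0 <- pmax(c(2.66, 0.63, 0.59, 0.05, 0.84, 0.58, 0.02, 0.01, -0.01, -0.01,
                0.09, 0.31, 0.02, -0.01, 0.5, 0.03, 0.09, 0.06, 0.03, 0.01,
                0.1, 0.25, 0.03, 88.36), 0)

d <- june0 - sept0               # paired differences
worst <- which.max(abs(d))       # position of the most extreme pair
sd.all  <- sd(d)                 # noise level seen by the paired t-test
sd.trim <- sd(d[-worst])         # noise level with that one pair set aside
c(pair = worst, sd.all = sd.all, sd.trim = sd.trim)
```

The most extreme pair is the last one (88.36 ug/L in September), and the standard deviation is far larger with it than without it. The signed-rank test is unaffected by how far that value lies from the rest, because it uses only the rank of the difference.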

Chapter 6, Exercises

Statistical Methods in Water Resources: 2nd Edition Team

June 29, 2016
