Example taken from [1].
“Consider the data \(d = (49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48)\). These were obtained by Darwin to investigate the difference between cross and self fertilization in plants. They are the differences in height between cross- and self-fertilized pairs of plants in eighths of an inch. They were used by Fisher (1991b, Chapter 3) to illustrate the use of the Student t test on the hypothesis of no difference between cross- and self-fertilized plants \(H: \delta = 0\).”
References
# Differences between pairs
d <- c(49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48)
n <- length(d)
# 95% t-interval for the mean difference: mean(d) -/+ t quantile * standard error
se <- sd(d)/sqrt(n)
Int <- mean(d) + c(-1, 1)*qt(0.975, df = n-1)*se
library(knitr)
kable(Int, digits = 4)
x |
---|
0.0312 |
41.8355 |
# same result using the command t.test
t.test(d, conf.level = 0.95)$conf.int
## [1] 0.03119332 41.83547335
## attr(,"conf.level")
## [1] 0.95
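For reference, the interval computed above is the usual Student t interval for the mean difference. As a sketch, write \(\bar d\) for the sample mean, \(s\) for the sample standard deviation and \(t_{n-1;\,0.975}\) for the 0.975 quantile of the t distribution with \(n-1\) degrees of freedom (this notation is assumed here for illustration and is not taken from [1]):

\[
\bar d \pm t_{n-1;\,0.975}\,\frac{s}{\sqrt{n}} \;\approx\; 20.93 \pm 2.145 \times 9.746 \;\approx\; (0.03,\; 41.84), \qquad n = 15 .
\]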
The left endpoint of this confidence interval is very close to \(0\) because of the two anomalous negative observations, \(-67\) and \(-48\). There is controversy about whether these observations were misclassified, that is, whether the roles of \(X_i\) and \(Y_i\) were inadvertently interchanged. Let’s explore this idea by changing the sign of these observations.
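One way to apply this sign change without retyping the data is to flip the negative entries programmatically. The sketch below assumes, as in the text, that only the two negative differences are suspected of being interchanged; the name `d_flipped` is introduced here purely for illustration.

```r
# Original differences (cross- minus self-fertilized, in eighths of an inch)
d <- c(49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48)

# Flip the sign of the negative observations only; for these data this
# is the same as taking absolute values
d_flipped <- ifelse(d < 0, -d, d)

# Positions of the observations whose sign was changed
which(d < 0)
```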
# Differences between pairs
d <- c(49, 67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, 48)
n <- length(d)
# 95% t-interval for the mean difference: mean(d) -/+ t quantile * standard error
se <- sd(d)/sqrt(n)
Int <- mean(d) + c(-1, 1)*qt(0.975, df = n-1)*se
library(knitr)
kable(Int, digits = 4)
x |
---|
24.0719 |
48.4615 |
# same result using the command t.test
t.test(d, conf.level = 0.95)$conf.int
## [1] 24.07185 48.46148
## attr(,"conf.level")
## [1] 0.95
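A 95% interval that excludes \(0\) corresponds to rejecting \(H: \delta = 0\) at the 5% level with the two-sided t test. The sketch below checks whether \(0\) falls inside the interval for both versions of the data and at two confidence levels. `covers_zero` is a hypothetical helper written for this illustration, not a function from any package.

```r
d_orig <- c(49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48)
d_flip <- abs(d_orig)  # sign of the two negative differences changed

# TRUE if the t-interval for the mean at the given level contains 0
covers_zero <- function(x, level = 0.95) {
  ci <- t.test(x, conf.level = level)$conf.int
  ci[1] <= 0 && 0 <= ci[2]
}

covers_zero(d_orig)                # 95% interval, original data
covers_zero(d_orig, level = 0.99)  # 99% interval, original data
covers_zero(d_flip, level = 0.99)  # 99% interval, sign-changed data
```

For the original data the answer depends on the level chosen, since the 95% interval only just excludes \(0\). The sign-changed data exclude \(0\) at both levels.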
# Differences between pairs
d <- c(49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48)
n <- length(d)
# Discrepancy measure (t test statistic)
t0 <- mean(d)/sqrt(var(d)/n)
# Two-sided P value; abs() keeps the formula valid when t0 is negative
P <- 2*pt(-abs(t0), df = n-1)
library(knitr)
kable(P, digits = 4)
x |
---|
0.0497 |
# Same result using the command t.test, fixing the alternative accordingly
t.test(d, alternative = "two.sided" )
##
## One Sample t-test
##
## data: d
## t = 2.148, df = 14, p-value = 0.0497
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.03119332 41.83547335
## sample estimates:
## mean of x
## 20.93333
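The quantities reported above follow from the standard two-sided t test. As a sketch, with \(T_{14}\) denoting a Student t random variable with 14 degrees of freedom (notation assumed here for illustration):

\[
t_0 = \frac{\bar d}{s/\sqrt{n}} \approx \frac{20.93}{9.746} \approx 2.148, \qquad
P = \Pr\left(|T_{14}| \ge |t_0|\right) = 2\,\Pr\left(T_{14} \le -|t_0|\right) \approx 0.0497 .
\]

Because this P value is just below 0.05, it agrees with the 95% interval narrowly excluding \(0\).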
# Differences between pairs
d <- c(49, 67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, 48)
n <- length(d)
# Discrepancy measure (t test statistic)
t0 <- mean(d)/sqrt(var(d)/n)
# Two-sided P value; abs() keeps the formula valid when t0 is negative
P <- 2*pt(-abs(t0), df = n-1)
library(knitr)
kable(P, digits = 8)
x |
---|
1.715e-05 |
# Same result using the command t.test, fixing the alternative accordingly
t.test(d, alternative = "two.sided" )
##
## One Sample t-test
##
## data: d
## t = 6.3785, df = 14, p-value = 1.715e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 24.07185 48.46148
## sample estimates:
## mean of x
## 36.26667
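To compare the two analyses side by side, the main quantities from `t.test` can be collected in one table. This is a minimal sketch: `t_summary` and its `label` argument are hypothetical names introduced for this illustration.

```r
d_orig <- c(49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48)
d_flip <- abs(d_orig)  # sign of the two negative differences changed

# Collect the main quantities of the one-sample two-sided t test in one row
t_summary <- function(x, label) {
  tt <- t.test(x, alternative = "two.sided")
  data.frame(data    = label,
             mean    = unname(tt$estimate),
             t       = unname(tt$statistic),
             p.value = tt$p.value,
             lower   = tt$conf.int[1],
             upper   = tt$conf.int[2])
}

res <- rbind(t_summary(d_orig, "original"),
             t_summary(d_flip, "sign changed"))
print(res, digits = 4)
```

Changing the sign of only two of the fifteen observations moves the P value from about 0.05 to about 0.00002, which shows how sensitive the conclusion is to those two data points.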