\(t\)-intervals: paired observations

Example taken from [1].

“Consider the data \(d = (49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48)\). These were obtained by Darwin to investigate the difference between cross and self fertilization in plants. They are the differences in height between cross- and self-fertilized pairs of plants in eighths of an inch. They were used by Fisher (1991b, Chapter 3) to illustrate the use of the Student t test on the hypothesis of no difference between cross- and self-fertilized plants \(H: \delta = 0\).”

References

[1] Sprott, D. A. (2000). Statistical Inference in Science. Springer, New York.

# Differences between pairs
d <- c(49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48)

n <- length(d)

# t-interval
half <- qt(0.975, df = n - 1) * sd(d) / sqrt(n)  # half-width of the 95% interval
Int <- c(mean(d) - half, mean(d) + half)

library(knitr)
kable(Int, digits = 4)

x
0.0312
41.8355

# same result using the command t.test
t.test(d, conf.level = 0.95)$conf.int

## [1]  0.03119332 41.83547335
## attr(,"conf.level")
## [1] 0.95
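For reference, the interval computed above can be written in symbols. This is the standard one-sample Student \(t\) interval applied to the differences; the notation \(\bar{d}\) (sample mean), \(s\) (sample standard deviation) and \(t_{n-1,\,0.975}\) (the 0.975 quantile of the Student \(t\) distribution with \(n-1\) degrees of freedom) is introduced here for illustration and does not appear in the original text.

\[
\bar{d} \;\pm\; t_{n-1,\,0.975}\,\frac{s}{\sqrt{n}},
\qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2,
\qquad n = 15 .
\]

In the code, `mean(d)` plays the role of \(\bar{d}\), `sqrt(var(d))` that of \(s\), and `qt(0.975, df = n-1)` that of \(t_{n-1,\,0.975}\).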

Controversy about misclassification in this data set

The lower endpoint of this confidence interval is very close to \(0\) because of the two anomalous negative observations, \(-67\) and \(-48\). There is controversy about whether these observations were actually misclassified, that is, whether the roles of \(X_i\) and \(Y_i\) were inadvertently interchanged. Let’s explore this idea by changing the sign of these two observations.

# Differences between pairs
d <- c(49, 67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, 48)

n <- length(d)

# t-interval
half <- qt(0.975, df = n - 1) * sd(d) / sqrt(n)  # half-width of the 95% interval
Int <- c(mean(d) - half, mean(d) + half)

library(knitr)
kable(Int, digits = 4)

x
24.0719
48.4615

# same result using the command t.test
t.test(d, conf.level = 0.95)$conf.int

## [1] 24.07185 48.46148
## attr(,"conf.level")
## [1] 0.95
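Rather than retyping the data, the sign change can also be applied programmatically. The following is a minimal sketch; the names `d_orig`, `disputed`, `d_flip` and `ci_flip` are hypothetical and chosen here so that the vector `d` used in the example is not overwritten.

```r
# Original differences, as given in the example
d_orig <- c(49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48)

# Flag the two disputed observations and flip only their signs
disputed <- d_orig %in% c(-67, -48)
d_flip   <- ifelse(disputed, -d_orig, d_orig)

# 95% t-interval for the modified data
ci_flip <- t.test(d_flip, conf.level = 0.95)$conf.int
ci_flip
```

Selecting the two values explicitly, instead of calling `abs(d_orig)`, makes it clear which observations are being treated as misclassified.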

Test of Significance

Original data

# Differences between pairs
d <- c(49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48)
n <- length(d)

# Discrepancy measure (t test statistic)
t0 <- mean(d)/sqrt(var(d)/n)

# P value

# Two-sided P value; abs() keeps the formula valid if t0 is negative
P <- 2 * pt(-abs(t0), df = n - 1)
library(knitr)
kable(P, digits = 4)

x
0.0497

# Same result using the command t.test, fixing the alternative accordingly

t.test(d, alternative = "two.sided" )

## 
##  One Sample t-test
## 
## data:  d
## t = 2.148, df = 14, p-value = 0.0497
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   0.03119332 41.83547335
## sample estimates:
## mean of x 
##  20.93333
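In symbols, the quantities computed above are as follows. This is a sketch using notation introduced here for illustration: \(\bar{d}\) and \(s\) are the sample mean and standard deviation of the differences, and \(T_{n-1}\) denotes a Student \(t\) random variable with \(n-1\) degrees of freedom.

\[
t_0 = \frac{\bar{d}}{s/\sqrt{n}},
\qquad
P = \Pr\!\left(|T_{n-1}| \ge |t_0|\right) = 2\,\Pr\!\left(T_{n-1} \le -|t_0|\right).
\]

With the values reported in the output, \(t_0 = 2.148\) on \(14\) degrees of freedom gives \(P = 0.0497\). This agrees with the 95% interval, whose lower endpoint lies just above \(0\): a two-sided P value below \(0.05\) corresponds to \(0\) falling outside the 95% \(t\)-interval.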

Data with the sign of two observations changed

# Differences between pairs
d <- c(49, 67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, 48)
n <- length(d)

# Discrepancy measure (t test statistic)
t0 <- mean(d)/sqrt(var(d)/n)

# P value

# Two-sided P value; abs() keeps the formula valid if t0 is negative
P <- 2 * pt(-abs(t0), df = n - 1)
library(knitr)
kable(P, digits = 8)

x
1.715e-05

# Same result using the command t.test, fixing the alternative accordingly

t.test(d, alternative = "two.sided" )

## 
##  One Sample t-test
## 
## data:  d
## t = 6.3785, df = 14, p-value = 1.715e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  24.07185 48.46148
## sample estimates:
## mean of x 
##  36.26667

Paired differences: the Darwin data

Francisco Javier Rubio

January 02, 2019
