Comparing two means

M. Drew LaMar
March 18, 2016

“Statisticians, like artists, have the bad habit of falling in love with their models.”

- George Box

Class Announcements

Reading Assignment for Wednesday: Ruxton & Colegrave, Chapter 3
No HW due next week

It all starts with experimental design

We will be comparing the means of numerical variables between two groups.

Definition: In the paired design, both treatments are applied to every sampled unit. In the two-sample design, each treatment group is composed of an independent, random sample of units.

Paired design

Remember standard error: \[ \sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}} \]

We can increase power and the precision of our estimates by decreasing the standard error through…

…increasing the sample size (denominator).
…decreasing the variability \( \sigma \) in our measured variable (numerator).

The paired design mainly affects the second point above, i.e. it reduces variability. How?
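One way to see the effect of pairing is a small simulation. The sketch below is only an illustration: every parameter value (`n`, `sd_unit`, `sd_noise`, `effect`) is an arbitrary assumption, not a value from the lecture. It compares the empirical standard error of the estimated treatment effect under a paired and a two-sample design when most of the variability comes from differences between units.

```r
# Simulated illustration; all parameter values are arbitrary assumptions.
set.seed(1)
n        <- 20   # units per treatment group
sd_unit  <- 5    # variability between units
sd_noise <- 1    # measurement noise within a unit
effect   <- 2    # true treatment effect

one_rep <- function() {
  # Paired: the same units get both treatments, so the unit effect cancels in d
  unit   <- rnorm(n, mean = 50, sd = sd_unit)
  y_c    <- unit + rnorm(n, sd = sd_noise)
  y_t    <- unit + effect + rnorm(n, sd = sd_noise)
  paired <- mean(y_t - y_c)

  # Two-sample: each group is an independent sample of units
  g_c      <- rnorm(n, mean = 50, sd = sd_unit) + rnorm(n, sd = sd_noise)
  g_t      <- rnorm(n, mean = 50, sd = sd_unit) + effect + rnorm(n, sd = sd_noise)
  unpaired <- mean(g_t) - mean(g_c)

  c(paired = paired, unpaired = unpaired)
}

est <- replicate(2000, one_rep())  # 2 x 2000 matrix of estimates
se  <- apply(est, 1, sd)           # empirical standard error of each estimator
se
```

Under these assumptions both designs estimate the same effect, but the paired estimate has the smaller standard error because the between-unit variability drops out of the differences.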

Experimental Design

[Figure: Unpaired Design]

[Figure: Paired Design]

Paired vs. Unpaired

[Figure: Unpaired]

[Figure: Paired]

Paired design examples

Discuss: Can you come up with an example of a paired and unpaired design?

From the book:

Comparing patient weight before and after hospitalization
Comparing fish species diversity in lakes before and after heavy metal contamination
Testing effects of sunscreen applied to one arm of each subject compared with a placebo applied to the other arm
Testing effects of smoking in a sample of smokers, each of which is compared with a nonsmoker closely matched by age, weight, and ethnic background

Paired design: What is our resulting variable?

Definition: Paired measurements are converted to a single measurement by taking the difference between them.

\[ d = Y_{T}-Y_{C}, \]

where \( Y_{T} \) and \( Y_{C} \) denote the measurements on the same sampling unit under the treatment and the control, respectively.

Paired design: Estimation

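If \( Y_{T}\sim N(\mu_{T},\sigma_{T}^2) \), \( Y_{C}\sim N(\mu_{C},\sigma_{C}^2) \) are jointly normal, and \( d = Y_{T}-Y_{C} \), then

\[ d \sim N(\mu_{T}-\mu_{C},\ \sigma_{T}^2 + \sigma_{C}^2 - 2\,\mathrm{Cov}(Y_{T},Y_{C})) \]

The covariance term vanishes only when \( Y_{T} \) and \( Y_{C} \) are independent. In a paired design the two measurements on a unit are usually positively correlated, which is why the differences vary less than the original measurements.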

Confidence intervals

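\[ \bar{d} - t_{\alpha(2),df}\mathrm{SE}_{\bar{d}} < \mu_{d} < \bar{d} + t_{\alpha(2),df}\mathrm{SE}_{\bar{d}} \]

where \( \mathrm{SE}_{\bar{d}} = s_{d}/\sqrt{n} \), \( s_{d} \) is the sample standard deviation of the differences, \( n \) is the number of pairs, and \( df = n - 1 \).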

Paired design: Hypothesis testing

Paired \( t \)-test: a one-sample \( t \)-test on the differences \( d \)

\[ H_{0}: \mu_{d} = \mu_{d0} \] \[ H_{A}: \mu_{d} \neq \mu_{d0} \]

Test statistic:

\[ t = \frac{\bar{d} - \mu_{d0}}{\mathrm{SE}_{\bar{d}}} \]

Assumptions: same as for the one-sample \( t \)-test

The sampling units are randomly sampled from the population.
The paired differences have a normal distribution in the population. ~~Original measurements DO NOT have to be normal.~~

Paired design: Practice Problem #1

Question: Can the death rate be influenced by tax incentives?

Kopczuk and Slemrod (2003) investigated this possibility using data on deaths in the United States in years in which the government announced it was changing (usually raising) the tax rate on inheritance (the estate tax). The authors calculated the death rate during the 14 days before, and the 14 days after, the changes in the estate tax rates took effect. The number of deaths per day for each of these periods was recorded.

Paired design: Practice Problem #1

(deathRate <- read.csv("tmp_data/chap12q01DeathAndTaxes.csv"))

   yearOfChange HigherTaxDeaths lowerTaxDeaths
1          1917           22.21          24.93
2          1917           18.86          20.00
3          1919           28.21          29.93
4          1924           31.64          30.64
5          1926           18.43          20.86
6          1932            9.50          10.14
7          1934           24.29          28.00
8          1935           26.64          25.29
9          1940           35.07          35.00
10         1941           38.86          37.57
11         1942           28.50          34.79

Paired design: Practice Problem #1

with(deathRate, stripchart(list(HigherTaxDeaths, lowerTaxDeaths), vertical = TRUE,
     group.names = c("Higher", "Lower"), xlim = c(0.5, 2.5), pch = 16, col = "firebrick",
     ylab = "Death Rate", xlab = "Estate tax rate", cex = 1.5, cex.lab = 1.5, cex.axis = 1.5))
# Join each pair of observations with a line segment
with(deathRate, segments(1, HigherTaxDeaths, 2, lowerTaxDeaths))

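[Figure: strip chart of death rate under the higher and lower estate tax rate, with a line segment joining each pair of observations]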

Paired design: Practice Problem #1

Question: What are the null and alternative hypotheses?

Answer (in words):
\[ \begin{aligned} H_{0}: & \ \text{Mean change in death rate is zero}\\ H_{A}: & \ \text{Mean change in death rate is not zero} \end{aligned} \]

Answer (in symbols):
\[ H_{0}: \mu_{d} = 0 \] \[ H_{A}: \mu_{d} \neq 0 \]

Paired design: Practice Problem #1

Let's do a one-sample \( t \)-test on the differences, step by step

d <- deathRate$HigherTaxDeaths - deathRate$lowerTaxDeaths  # paired differences
n <- length(d)                                              # number of pairs
sderr <- sd(d)/sqrt(n)                                      # standard error of the mean difference
tstat <- (mean(d) - 0)/sderr                                # null value mu_d0 = 0
pval <- 2*pt(abs(tstat), df=n-1, lower.tail=FALSE)          # two-sided P-value
results <- signif(c(mean(d), n, tstat, pval), 3)
names(results) <- c("Mean diff.", "n", "Statistic", "P-value")
results

Mean diff.          n  Statistic    P-value 
   -1.3600    11.0000    -1.9100     0.0849
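The test assumes that the paired differences are normally distributed in the population. A minimal sketch of how one might check this for the death-rate data follows. The vectors are retyped from the table printed earlier so that the block runs on its own, and the choice of a normal quantile plot plus `shapiro.test()` is an assumption about a reasonable check, not something the lecture prescribes.

```r
# Deaths per day, copied from the printed data frame (higher-tax and lower-tax periods)
higher <- c(22.21, 18.86, 28.21, 31.64, 18.43, 9.50, 24.29, 26.64, 35.07, 38.86, 28.50)
lower  <- c(24.93, 20.00, 29.93, 30.64, 20.86, 10.14, 28.00, 25.29, 35.00, 37.57, 34.79)
d <- higher - lower   # paired differences

# Visual check: points should fall roughly along the reference line
qqnorm(d, pch = 16, col = "firebrick")
qqline(d)

# Formal check; with only 11 pairs it has little power, so treat it as a rough guide
sw <- shapiro.test(d)
sw$p.value
```

With so few pairs, the quantile plot is usually more informative than the test's P-value.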

Paired design: Practice Problem #1

Let's do the same test with R's built-in t.test(), using paired = TRUE

with(deathRate, t.test(HigherTaxDeaths, lowerTaxDeaths, mu = 0, paired = TRUE))


    Paired t-test

data:  HigherTaxDeaths and lowerTaxDeaths
t = -1.9121, df = 10, p-value = 0.08491
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.9408501  0.2244865
sample estimates:
mean of the differences 
              -1.358182
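The 95 percent confidence interval in this output can be reproduced from the confidence interval formula for \( \mu_{d} \). The sketch below retypes the data from the printed table so that it is self-contained; the variable names `se_d`, `t_crit`, and `ci` are illustrative choices.

```r
# Deaths per day, copied from the printed data frame
higher <- c(22.21, 18.86, 28.21, 31.64, 18.43, 9.50, 24.29, 26.64, 35.07, 38.86, 28.50)
lower  <- c(24.93, 20.00, 29.93, 30.64, 20.86, 10.14, 28.00, 25.29, 35.00, 37.57, 34.79)

d      <- higher - lower            # paired differences
n      <- length(d)                 # number of pairs
se_d   <- sd(d) / sqrt(n)           # standard error of the mean difference
t_crit <- qt(0.975, df = n - 1)     # two-sided critical value for alpha = 0.05

# Lower and upper limits: mean difference minus/plus t_crit * SE
ci <- mean(d) + c(-1, 1) * t_crit * se_d
ci
```

The two limits should agree with the interval reported by `t.test()` up to rounding, and because the interval contains zero it is consistent with a P-value above 0.05.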