Comparing two means (Part 2)

M. Drew LaMar
October 16, 2020

“Statisticians, like artists, have the bad habit of falling in love with their models.”

- George Box

It all starts with experimental design

We will be comparing the means of a numerical variable between two groups.

Definition: In the paired design, both treatments are applied to every sampled unit. In the two-sample design, each treatment group is composed of an independent, random sample of units.

It all starts with experimental design

We will be comparing the means of a numerical variable between two groups.

Definition: In the paired design, both treatments are applied to every sampled unit. In the two-sample design, each treatment group is composed of an independent, random sample of units.

Data:

Response: One numerical variable
Explanatory: One categorical variable with 2 levels

Paired vs. unpaired

Paired designs

The sample size in each group is the same.
We want to estimate the mean of the differences.

Unpaired designs

The sample size in each group may not be the same.
We want to estimate the difference of the means.

Paired design: Practice Problem #1

Question: Can the death rate be influenced by tax incentives?

Kopczuk and Slemrod (2003) investigated this possibility using data on deaths in the United States in years in which the government announced it was changing (usually raising) the tax rate on inheritance (the estate tax). The authors calculated the death rate during the 14 days before, and the 14 days after, the changes in the estate tax rates took effect. The number of deaths per day for each of these periods was recorded.

Paired design: Practice Problem #1

   yearOfChange HigherTaxDeaths lowerTaxDeaths
1          1917           22.21          24.93
2          1917           18.86          20.00
3          1919           28.21          29.93
4          1924           31.64          30.64
5          1926           18.43          20.86
6          1932            9.50          10.14
7          1934           24.29          28.00
8          1935           26.64          25.29
9          1940           35.07          35.00
10         1941           38.86          37.57
11         1942           28.50          34.79

Paired design: Practice Problem #1

plot of chunk stripchart

Paired design: Practice Problem #1

with(deathRate, 
     stripchart(list(HigherTaxDeaths, 
                     lowerTaxDeaths), 
                vertical = TRUE, 
                group.names = c("Higher","Lower"), 
                xlim=c(0.5, 2.5), 
                pch = 16, 
                col = "firebrick", 
                ylab="Death Rate", xlab="Estate tax rate", 
                cex=1.5, cex.lab=1.5, cex.axis=1.5))

with(deathRate, 
     segments(1, HigherTaxDeaths, 
              2, lowerTaxDeaths))

Paired design: Practice Problem #1

Question: What are the null and alternate hypotheses?

Answer:
\[ \begin{align} H_{0}: & \mathrm{Mean \ change \ in \ death \ rate \ is \ zero}\\ H_{A}: & \mathrm{Mean \ change \ in \ death \ rate \ is \ not \ zero} \end{align} \]

Answer:
\[ H_{0}: \mu_{d} = 0 \] \[ H_{A}: \mu_{d} \neq 0 \]

Paired design: Practice Problem #1

Let's do a one-sample \( t \)-test

deathRate %>% 
  mutate(d = HigherTaxDeaths - lowerTaxDeaths) %>% 
  summarize(n = length(d),
            sderr = sd(d)/sqrt(n),
            tstat = (mean(d) - 0)/sderr,
            pval = 2*pt(abs(tstat), df=n-1, lower.tail=FALSE))

   n     sderr     tstat       pval
1 11 0.7103096 -1.912098 0.08491016

Paired design: Practice Problem #1

Let's do a one-sample \( t \)-test

with(deathRate, 
     t.test(HigherTaxDeaths, 
            lowerTaxDeaths, 
            mu = 0, 
            paired = TRUE))

Paired design: Practice Problem #1

Let's do a one-sample \( t \)-test


    Paired t-test

data:  HigherTaxDeaths and lowerTaxDeaths
t = -1.9121, df = 10, p-value = 0.08491
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.9408501  0.2244865
sample estimates:
mean of the differences 
              -1.358182

Two-sample design: Estimation

\[ \bar{Y}_{1}\sim N(\mu_{1},\sigma_{\bar{Y}_{1}}^2) \ \mathrm{and} \ \bar{Y}_{2}\sim N(\mu_{2},\sigma_{\bar{Y}_{2}}^2) \\ \bar{Y}_{1} - \bar{Y}_{2}\sim N(\mu_{1}-\mu_{2}, \sigma_{\bar{Y}_{1}}^2+\sigma_{\bar{Y}_{2}}^2) \]

Definition: The standard error of the difference of the means between two groups is given by \[ \mathrm{SE}_{\bar{Y_{1}}-\bar{Y_{2}}} = \sqrt{s_{p}^2\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)} \] where pooled sample variance \( s_{p}^{2} \) is given by

\[ s_{p}^2 = \frac{df_{1}s_{1}^2 + df_{2}s_{2}^2}{df_{1}+df_{2}}. \]

Two-sample design: Estimation

Since sampling distribution of \( \bar{Y}_{1} - \bar{Y}_{2} \) is normal

\[ \bar{Y}_{1} - \bar{Y}_{2}\sim N(\mu_{1}-\mu_{2}, \sigma_{\bar{Y}_{1}}^2+\sigma_{\bar{Y}_{2}}^2) \]

the sampling distribution of the statistic

\[ t = \frac{\left(\bar{Y}_{1} - \bar{Y}_{2}\right) - \left(\mu_{1}-\mu_{2}\right)}{\mathrm{SE}_{\bar{Y}_{1} - \bar{Y}_{2}}} \]

has a Student's \( t \)-distribution with total degrees of freedom given by

\[ df = df_{1} + df_{2} = n_{1} + n_{2} - 2. \]

Two-sample design: Estimation

Confidence intervals

\[ \left(\bar{Y}_{1} - \bar{Y}_{2}\right) - t_{\alpha(2),df}\mathrm{SE}_{\bar{Y}_{1} - \bar{Y}_{2}} < \mu_{1}-\mu_{2} < \left(\bar{Y}_{1} - \bar{Y}_{2}\right) + t_{\alpha(2),df}\mathrm{SE}_{\bar{Y}_{1} - \bar{Y}_{2}} \]

Two-sample design: Estimation Example

Practice Problem #16

A study in West Africa (Lefèvre et al. 2010), working with the mosquito species that carry malaria, wondered whether drinking the local beer influenced attractiveness to mosquitoes. They opened a container holding 50 mosquitoes next to each of 25 alcohol-free participants and measured the proportion of mosquitoes that left the container and flew toward the participants. They repeated this procedure 15 minutes after each of the same participants had consumed a liter of beer, measuring the change in proportion (treatment group).

Two-sample design: Estimation Example

Practice Problem #16

(cont'd) This procedure was also carried out on another 18 human participants who were given water instead of beer (control group).

Two-sample design: Estimation Example

mydata <- read.csv("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter12/chap12q16BeerAndMosquitoes.csv")
str(mydata)

'data.frame':   43 obs. of  4 variables:
 $ drink      : chr  "beer" "beer" "beer" "beer" ...
 $ beforeDrink: num  0.13 0.13 0.21 0.25 0.25 0.32 0.43 0.44 0.46 0.5 ...
 $ afterDrink : num  0.49 0.59 0.27 0.43 0.5 0.5 0.37 0.3 0.58 0.89 ...
 $ change     : num  0.36 0.46 0.06 0.18 0.25 0.18 -0.06 -0.14 0.12 0.39 ...

Two-sample design: Estimation Example

Discuss: What type of plot would be good here?

plot of chunk unnamed-chunk-8

Two-sample design: Estimation Example

The long way, first computing the means, and then sample sizes:

(meanChange <- tapply(mydata$change, 
                      mydata$drink, 
                      mean))

       beer       water 
0.154400000 0.007777778

(n <- tapply(mydata$change, 
             mydata$drink, 
             length))

 beer water 
   25    18

Two-sample design: Estimation Example

Now, degree of freedoms and variances:

dfs <- n-1
(vari <- tapply(mydata$change, mydata$drink, var))

      beer      water 
0.02632567 0.01611242

Pooled sample variance, which is actually a weighted mean (weighted by degrees of freedom):

(sp <- weighted.mean(vari, dfs))

[1] 0.02209091

Two-sample design: Estimation Example

Finally, find 95% confidence interval:

sderr <- sqrt(sp*sum(1/n)) # Standard error
df <- sum(n) - 2 # Degree of freedom
tcrit <- qt(0.025, df=df, lower.tail=FALSE)
diffMeans <- meanChange["beer"] - meanChange["water"]
lower <- diffMeans - tcrit*sderr
upper <- diffMeans + tcrit*sderr
(CI <- unname(c(lower, upper)))

[1] 0.05383517 0.23940928

Two-sample design: Estimation Example

Using dplyr:

mydata %>% 
  group_by(drink) %>%
  summarize(mu = mean(change),
            n = n(),
            df = n-1,
            vari = var(change)) %>% 
  summarize(sp = weighted.mean(vari, df),
            sderr = sqrt(sp*sum(1/n)),
            df = sum(df),
            tcrit = qt(0.025, df=df, lower.tail=FALSE),
            diffMeans = first(mu)-last(mu),
            lower = diffMeans - tcrit*sderr,
            upper = diffMeans + tcrit*sderr)

# A tibble: 1 x 7
      sp  sderr    df tcrit diffMeans  lower upper
   <dbl>  <dbl> <dbl> <dbl>     <dbl>  <dbl> <dbl>
1 0.0221 0.0459    41  2.02     0.147 0.0538 0.239

Two-sample design: Estimation Example

Now the fast way using t.test:

t.test(change ~ drink, 
       data=mydata, 
       mu=0, 
       var.equal=TRUE)$conf.int

[1] 0.05383517 0.23940928
attr(,"conf.level")
[1] 0.95

Two-sample design: Hypothesis testing

Two-sample \( t \)-test

\[ H_{0}: \mu_{1} - \mu_{2} = \left(\mu_{1} - \mu_{2}\right)_{0} \] \[ H_{A}: \mu_{1} - \mu_{2} \neq \left(\mu_{1} - \mu_{2}\right)_{0} \]

Test statistic:

\[ t = \frac{\left(\bar{Y}_{1} - \bar{Y}_{2}\right) - \left(\mu_{1} - \mu_{2}\right)_{0}}{SE_{\bar{Y}_{1} - \bar{Y}_{2}}} \]

Assumptions:

Each sample is randomly sampled from population.
Numerical variable is normally distributed.

Two-sample design: Hypothesis testing

Two-sample \( t \)-test

\[ H_{0}: \mu_{1} - \mu_{2} = \left(\mu_{1} - \mu_{2}\right)_{0} \] \[ H_{A}: \mu_{1} - \mu_{2} \neq \left(\mu_{1} - \mu_{2}\right)_{0} \]

Test statistic:

\[ t = \frac{\left(\bar{Y}_{1} - \bar{Y}_{2}\right) - \left(\mu_{1} - \mu_{2}\right)_{0}}{SE_{\bar{Y}_{1} - \bar{Y}_{2}}} \]

Assumptions:

~~Variation is same in both populations.~~

Two-sample design: Testing Example

Practice Problem #16

Two-sample \( t \)-test

\[ H_{0}: \mu_{\mathrm{beer}} - \mu_{\mathrm{water}} = 0 \] \[ H_{A}: \mu_{\mathrm{beer}} - \mu_{\mathrm{water}} \neq 0 \]

\[ H_{0}: \mu_{\mathrm{beer}} = \mu_{\mathrm{water}} \] \[ H_{A}: \mu_{\mathrm{beer}} \neq \mu_{\mathrm{water}} \]

Two-sample design: Testing Example

Starting with, as usual, the long way, we need to calculate the test statistic:

\[ t = \frac{\left(\bar{Y}_{1} - \bar{Y}_{2}\right)}{SE_{\bar{Y}_{1} - \bar{Y}_{2}}} \]

(tstat <- unname(meanChange["beer"] - meanChange["water"])/sderr)

[1] 3.191281

tstat > tcrit

[1] TRUE

Two-sample design: Testing Example

Let's calculate \( P \)-value as well:

(pval <- 2*pt(abs(tstat), 
              df=df, 
              lower.tail=FALSE))

[1] 0.00271747

Two-sample design: Testing Example

Short way, again using t.test (note the var.equal=TRUE):

t.test(change ~ drink, 
       mu=0, 
       var.equal=TRUE,
       data=mydata)

Two-sample design: Testing Example

Short way, again using t.test (note the var.equal=TRUE):


    Two Sample t-test

data:  change by drink
t = 3.1913, df = 41, p-value = 0.002717
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.05383517 0.23940928
sample estimates:
 mean in group beer mean in group water 
        0.154400000         0.007777778

Two-sample design: Assumptions

Heuristic for meeting two-sample assumptions:

Moderate sample sizes (\( n_{1}, n_{2} > 30 \))
Balanced: \( n_{1} \approx n_{2} \)
\( |s_{1} - s_{2}| \leq 3 \)

Robust to deviations in normality.

Two-sample design: Testing Example

Moderate sample sizes (\( n_{1}, n_{2} > 30 \))
Balanced: \( n_{1} \approx n_{2} \)
\( |s_{1} - s_{2}| \leq 3 \)

 beer water 
   25    18

sqrt(vari)

     beer     water 
0.1622519 0.1269347

Two-sample design: Testing Example

What to do if can't meet assumptions of two-sample \( t \)-test?

Definition: Welch’s t-test compares the mean of two groups and can be used even when the variances of the two groups are not equal.

Standard error and degrees of freedom are calculated differently than two-sample \( t \)-test, but otherwise the same (i.e. uses a \( t \)-distribution).

Two-sample design: Welch's t-test

Same as two-sample in R, except var.equal=FALSE (default).

t.test(change ~ drink, 
       mu=0, 
       var.equal=FALSE, 
       data=mydata)

Two-sample design: Welch's t-test

Same as two-sample in R, except var.equal=FALSE (default).


    Welch Two Sample t-test

data:  change by drink
t = 3.3219, df = 40.663, p-value = 0.001897
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.05746134 0.23578311
sample estimates:
 mean in group beer mean in group water 
        0.154400000         0.007777778

Section 12.4: Using the correct sampling units

~~This topic (replication and pseudoreplication) is covered in Chapter 5 of Ruxton & Colegrave~~