“Statisticians, like artists, have the bad habit of falling in love with their models.”
– George Box
It all starts with experimental design
We will be comparing the means of a numerical variable between two groups.
Definition: In the paired design, both treatments are applied to every sampled unit. In the two-sample design, each treatment group is composed of an independent, random sample of units.
Data:
Response: One numerical variable
Explanatory: One categorical variable with 2 levels
Paired vs. unpaired
Paired designs
The sample size in each group is the same.
We want to estimate the mean of the differences.
Unpaired designs
The sample size in each group may not be the same.
We want to estimate the difference of the means.
Paired design
Remember standard error: \[
\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}
\]
We can increase power and the precision of our estimates by decreasing the standard error through…
…increasing the sample size (denominator).
…decreasing the variability \(\sigma\) in our measured variable (numerator).
The paired design mainly affects the second of these, i.e. it reduces variability. How?
Experimental Design
[Figure panels: Unpaired Design; Paired Design]
Paired vs. Unpaired
[Figure panels: Unpaired; Paired]
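One way to see how pairing reduces variability is a small simulation. The sketch below uses made-up numbers (30 units, large unit-to-unit variation, a true treatment effect of 2); none of these values come from the lecture data. Because both measurements on a unit share that unit's baseline, the baseline cancels in the difference, so the standard error of the mean difference should be much smaller than the two-sample standard error computed from the same measurements.

```r
# Simulated illustration only: every number here is an assumption.
set.seed(1)
n <- 30
unit_baseline <- rnorm(n, mean = 50, sd = 10)       # large variation among units
control   <- unit_baseline + rnorm(n, sd = 1)        # each unit measured as control
treatment <- unit_baseline + 2 + rnorm(n, sd = 1)    # same units, true effect = 2

# Paired analysis: standard error of the mean of the differences
d <- treatment - control
se_paired <- sd(d) / sqrt(n)

# Ignoring the pairing: pooled two-sample standard error
sp2 <- ((n - 1) * var(treatment) + (n - 1) * var(control)) / (2 * n - 2)
se_unpaired <- sqrt(sp2 * (1 / n + 1 / n))

c(paired = se_paired, unpaired = se_unpaired)
```

The unit-to-unit variation dominates the unpaired standard error but drops out of the paired one.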
Paired design examples
Discuss: Can you come up with an example of a paired and unpaired design?
From the book:
Comparing patient weight before and after hospitalization
Comparing fish species diversity in lakes before and after heavy metal contamination
Testing effects of sunscreen applied to one arm of each subject compared with a placebo applied to the other arm
Testing effects of smoking in a sample of smokers, each of whom is compared with a nonsmoker closely matched by age, weight, and ethnic background
Paired design: What is our resulting variable?
Definition: Paired measurements are converted to a single measurement by taking the difference between them.
\[d = Y_{T}-Y_{C},\]
where \(Y_{T}\) and \(Y_{C}\) denote the variable in the treatment and control groups, respectively.
Paired design: Estimation
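If \(Y_{T}\sim N(\mu_{T},\sigma_{T}^2)\), \(Y_{C}\sim N(\mu_{C},\sigma_{C}^2)\), and \(d = Y_{T}-Y_{C}\), then the mean of the differences is \(\mu_{d} = \mu_{T}-\mu_{C}\), which we estimate with the sample mean difference \(\bar{d}\) and its standard error \(\mathrm{SE}_{\bar{d}} = s_{d}/\sqrt{n}\).
Assumptions:
The sampling units are randomly sampled from the population.
The paired differences are normally distributed in the population. The original measurements DO NOT have to be normal.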
Paired design: Practice Problem #1
Question: Can the death rate be influenced by tax incentives?
Kopczuk and Slemrod (2003) investigated this possibility using data on deaths in the United States in years in which the government announced it was changing (usually raising) the tax rate on inheritance (the estate tax). The authors calculated the death rate during the 14 days before, and the 14 days after, the changes in the estate tax rates took effect. The number of deaths per day for each of these periods was recorded.
n sderr tstat pval
1 11 0.7103096 -1.912098 0.08491016
Paired design: Practice Problem #1
Let’s do a paired \(t\)-test, i.e. a one-sample \(t\)-test on the differences:
with(deathRate, t.test(HigherTaxDeaths, lowerTaxDeaths, mu = 0, paired = TRUE))
Paired t-test
data: HigherTaxDeaths and lowerTaxDeaths
t = -1.9121, df = 10, p-value = 0.08491
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.9408501 0.2244865
sample estimates:
mean of the differences
-1.358182
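As a check on the output above, the test statistic, p-value, and confidence interval can be recomputed by hand from the reported summary values alone (n = 11, standard error 0.7103096, mean difference -1.358182). This is a minimal sketch of the arithmetic behind the paired test; the variable names are chosen here for illustration.

```r
# Summary values copied from the output above
n      <- 11
mean_d <- -1.358182    # mean of the differences
se_d   <- 0.7103096    # standard error of the mean difference

df     <- n - 1
t_stat <- (mean_d - 0) / se_d                  # null hypothesis: mu_d = 0
p_val  <- 2 * pt(-abs(t_stat), df)             # two-sided p-value
ci_95  <- mean_d + c(-1, 1) * qt(0.975, df) * se_d

round(c(t = t_stat, df = df, p = p_val), 5)
round(ci_95, 4)
```

The results should agree with the t.test output up to rounding of the inputs.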
Definition: The standard error of the difference of the means between two groups is given by \[
\mathrm{SE}_{\bar{Y}_{1}-\bar{Y}_{2}} = \sqrt{s_{p}^2\left(\frac{1}{n_{1}} + \frac{1}{n_{2}}\right)}
\] where the pooled sample variance \(s_{p}^{2}\) is given by \[
s_{p}^2 = \frac{df_{1}s_{1}^2 + df_{2}s_{2}^2}{df_{1}+df_{2}},
\] with \(df_{i} = n_{i}-1\).
Two-sample design: Estimation
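Since the sampling distribution of \(\bar{Y}_{1} - \bar{Y}_{2}\) is normal, the standardized difference \[
t = \frac{(\bar{Y}_{1} - \bar{Y}_{2}) - (\mu_{1}-\mu_{2})}{\mathrm{SE}_{\bar{Y}_{1}-\bar{Y}_{2}}}
\] follows a \(t\)-distribution with \(df_{1}+df_{2} = n_{1}+n_{2}-2\) degrees of freedom.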
A study in West Africa (Lefèvre et al. 2010), working with the mosquito species that carry malaria, wondered whether drinking the local beer influenced attractiveness to mosquitoes. They opened a container holding 50 mosquitoes next to each of 25 alcohol-free participants and measured the proportion of mosquitoes that left the container and flew toward the participants. They repeated this procedure 15 minutes after each of the same participants had consumed a liter of beer, measuring the change in proportion (treatment group). This procedure was also carried out on another 18 human participants who were given water instead of beer (control group).
Two Sample t-test
data: change by drink
t = 3.1913, df = 41, p-value = 0.002717
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.05383517 0.23940928
sample estimates:
mean in group beer mean in group water
0.154400000 0.007777778
Short way, again using t.test (note the var.equal=TRUE):
Two Sample t-test
data: change by drink
t = 3.1913, df = 41, p-value = 0.002717
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.05383517 0.23940928
sample estimates:
mean in group beer mean in group water
0.154400000 0.007777778
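The pooled two-sample calculation can also be sketched by hand from summary statistics. The group means come from the output above; the group sizes (25 and 18) and standard deviations (0.1622519 and 0.1269347) are the values reported for this example. Variable names are chosen here for illustration.

```r
# Summary statistics for the beer (1) and water (2) groups
n1 <- 25;            n2 <- 18
mean1 <- 0.154400000; mean2 <- 0.007777778
s1 <- 0.1622519;     s2 <- 0.1269347

# Pooled variance and standard error of the difference of the means
df1 <- n1 - 1
df2 <- n2 - 1
sp2 <- (df1 * s1^2 + df2 * s2^2) / (df1 + df2)
se_pooled <- sqrt(sp2 * (1 / n1 + 1 / n2))

# Test statistic, two-sided p-value, and 95% confidence interval
df_pooled <- df1 + df2
t_pooled  <- (mean1 - mean2) / se_pooled
p_pooled  <- 2 * pt(-abs(t_pooled), df_pooled)
ci_pooled <- (mean1 - mean2) + c(-1, 1) * qt(0.975, df_pooled) * se_pooled

round(c(t = t_pooled, df = df_pooled, p = p_pooled), 6)
round(ci_pooled, 5)
```

Up to rounding of the inputs, this should reproduce t = 3.1913 on 41 degrees of freedom.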
Two-sample design: Assumptions
Heuristic for meeting two-sample assumptions:
Moderate sample sizes (\(n_{1}, n_{2} > 30\))
Balanced: \(n_{1} \approx n_{2}\)
\(1/3 \leq s_{2}/s_{1} \leq 3\)
The two-sample \(t\)-test is robust to deviations from normality.
Two-sample design: Testing Example
Moderate sample sizes (\(n_{1}, n_{2} > 30\))
Balanced: \(n_{1} \approx n_{2}\)
\(1/3 \leq s_{2}/s_{1} \leq 3\)
n
beer water
25 18
sqrt(vari)
beer water
0.1622519 0.1269347
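The heuristics can be checked directly against the reported group sizes and standard deviations. The sketch below is one way to do it; the heuristic "balanced" has no numeric cutoff in these notes, so the code only reports the ratio of sample sizes rather than assuming a threshold.

```r
# Group sizes and standard deviations reported above
n <- c(beer = 25, water = 18)
s <- c(beer = 0.1622519, water = 0.1269347)

moderate_n  <- all(n > 30)                      # are both groups larger than 30?
size_ratio  <- max(n) / min(n)                  # 1 would be perfectly balanced
sd_ratio    <- s[["water"]] / s[["beer"]]       # s2 / s1
sd_ratio_ok <- sd_ratio >= 1 / 3 && sd_ratio <= 3

c(moderate_n = moderate_n, sd_ratio_ok = sd_ratio_ok)
round(c(size_ratio = size_ratio, sd_ratio = sd_ratio), 3)
```

Here the standard deviations are comparable, but neither group exceeds 30 observations.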
Two-sample design: Testing Example
What should we do if we can’t meet the assumptions of the two-sample \(t\)-test?
Definition: Welch’s \(t\)-test compares the means of two groups and can be used even when the variances of the two groups are not equal.
The standard error and degrees of freedom are calculated differently than in the two-sample \(t\)-test, but the procedure is otherwise the same (i.e. it uses a \(t\)-distribution).
Two-sample design: Welch’s t-test
Same as two-sample in R, except var.equal=FALSE (default).
Welch Two Sample t-test
data: change by drink
t = 3.3219, df = 40.663, p-value = 0.001897
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.05746134 0.23578311
sample estimates:
mean in group beer mean in group water
0.154400000 0.007777778
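To make "calculated differently" concrete, here is a sketch of the Welch calculation from the same summary statistics (group sizes 25 and 18, standard deviations 0.1622519 and 0.1269347, means as in the output above). The standard error uses each group's own variance, and the degrees of freedom come from the Welch-Satterthwaite approximation. Variable names are chosen here for illustration.

```r
# Summary statistics for the beer (1) and water (2) groups
n1 <- 25;            n2 <- 18
mean1 <- 0.154400000; mean2 <- 0.007777778
s1 <- 0.1622519;     s2 <- 0.1269347

# Per-group variance of the sample mean (no pooling)
v1 <- s1^2 / n1
v2 <- s2^2 / n2
se_welch <- sqrt(v1 + v2)

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

t_welch  <- (mean1 - mean2) / se_welch
p_welch  <- 2 * pt(-abs(t_welch), df_welch)
ci_welch <- (mean1 - mean2) + c(-1, 1) * qt(0.975, df_welch) * se_welch

round(c(t = t_welch, df = df_welch, p = p_welch), 6)
round(ci_welch, 5)
```

Up to rounding of the inputs, this should match the Welch output above, including the non-integer degrees of freedom.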
Comparing variances
Question: Do populations differ in the variability of measurements?
Remember, it isn’t always about inferring central tendency!
There are two main tests:
\(F\)-test (Warning: Highly sensitive to departures from normality assumption)
Levene’s test (More robust to departures from normality, but at a cost - loss of power!)
Comparing variances
Example 12.4
The brook trout is a species native to eastern North America that has been introduced into streams in the West for sport fishing. Biologists followed the survivorship of a native species, chinook salmon, in a series of 12 streams that either had brook trout introduced or did not (Levin et al. 2002). Their goal was to determine whether the presence of brook trout affected the survivorship of the salmon. In each stream, they released a number of tagged juvenile chinook and then recorded whether or not each chinook survived over one year.
var.test(proportionSurvived ~ troutTreatment, data = chinook)
F test to compare two variances
data: proportionSurvived by troutTreatment
F = 12.165, num df = 5, denom df = 5, p-value = 0.01589
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
1.702272 86.936360
sample estimates:
ratio of variances
12.16509
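The \(F\)-test output can be unpacked from the variance ratio and its degrees of freedom alone. The sketch below takes the reported ratio (12.16509) and degrees of freedom (5 and 5) and recomputes the two-sided p-value and the confidence interval for the ratio from the \(F\)-distribution. Variable names are chosen here for illustration.

```r
# Values copied from the var.test output above
f_ratio <- 12.16509    # ratio of the two sample variances
df_num  <- 5
df_den  <- 5

# Two-sided p-value: double the smaller tail probability
lower_tail <- pf(f_ratio, df_num, df_den)
p_f <- 2 * min(lower_tail, 1 - lower_tail)

# 95% confidence interval for the true ratio of variances
ci_f <- c(f_ratio / qf(0.975, df_num, df_den),
          f_ratio / qf(0.025, df_num, df_den))

round(p_f, 5)
round(ci_f, 4)
```

With only 5 degrees of freedom per group, the interval for the ratio is very wide even though it excludes 1.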
Comparing variances - Levene’s test
library(car)
leveneTest(chinook$proportionSurvived, group = chinook$troutTreatment, center = mean)
Levene's Test for Homogeneity of Variance (center = mean)
Df F value Pr(>F)
group 1 10.315 0.009306 **
10
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
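With center = mean, Levene's test amounts to a one-way ANOVA on the absolute deviations of each observation from its group mean. The sketch below shows that idea in base R on made-up numbers (these are NOT the chinook measurements), and then recovers the p-value reported above from its F value and degrees of freedom (1 and 10).

```r
# Made-up illustrative data: two groups of 6, NOT the chinook data
y <- c(4.1, 5.0, 5.2, 4.8, 5.5, 4.9,
       2.0, 8.1, 5.3, 9.4, 1.2, 6.6)
g <- factor(rep(c("A", "B"), each = 6))

# Absolute deviation of each observation from its own group mean
abs_dev <- abs(y - ave(y, g, FUN = mean))

# One-way ANOVA on the absolute deviations
levene_fit <- anova(lm(abs_dev ~ g))
levene_F   <- levene_fit[["F value"]][1]
levene_fit

# p-value implied by the F value reported in the output above
p_reported <- pf(10.315, df1 = 1, df2 = 10, lower.tail = FALSE)
round(p_reported, 6)
```

Working with absolute deviations rather than squared ones is what makes the test less sensitive to non-normal data than the \(F\)-test.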
Visualizing significance between groups
How to compare between two groups with only confidence intervals?
The fallacy of indirect comparison
Example 12.5: Mommy’s baby, Daddy’s maybe
Question: Do babies look more like their fathers or their mothers?
Christenfeld and Hill (1995) predicted that babies more closely resemble their fathers, on the hypothesis that this resemblance affords an evolutionary advantage through increased paternal care. They tested this by obtaining pictures of a series of babies and their mothers and fathers. Participants were shown a picture of a child and either three possible mothers or three possible fathers (one of whom was the correct parent).
The fallacy of indirect comparison
Conclusion: The authors concluded that, since the result for fathers was statistically significant and the result for mothers was not, babies resemble their fathers more than their mothers.
Discuss: What’s the mistake here?
Mistake: Misinterpretation of statistical significance
The fallacy of indirect comparison
Fallacy: If one test in Group 1 shows with statistical significance that \(\mu_{1} > \mu_{0}\), and the same test in Group 2 does not show \(\mu_{2} > \mu_{0}\), then this shows with statistical significance that \(\mu_{1} > \mu_{2}\).
The fallacy of indirect comparison
Fallacy: If \(\bar{Y}_{1} > \bar{Y}_{2}\), then \(\mu_{1} > \mu_{2}\).
Mistake: Relying on point estimates rather than interval estimates
The fallacy of indirect comparison
Conclusion: Comparisons between two groups should always be made directly using the appropriate statistical test, not indirectly by comparing both to the same null hypothesized value.
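A small numerical sketch shows how the indirect comparison can mislead. The summary statistics below are hypothetical (two groups of 20, common standard deviation 1, sample means 0.5 and 0.3, null value 0) and do not come from any study. Each group is first tested against the null value, and then the two groups are compared directly with a pooled two-sample \(t\)-test.

```r
# Hypothetical summary statistics, chosen only for illustration
n     <- 20      # observations per group
s     <- 1       # common sample standard deviation
mu0   <- 0       # null hypothesized value
mean1 <- 0.5
mean2 <- 0.3

# Indirect route: one-sample t-test of each group against mu0
one_sample_p <- function(m) {
  t_stat <- (m - mu0) / (s / sqrt(n))
  2 * pt(-abs(t_stat), df = n - 1)
}
p1 <- one_sample_p(mean1)   # group 1 vs. mu0
p2 <- one_sample_p(mean2)   # group 2 vs. mu0

# Direct route: pooled two-sample t-test of group 1 vs. group 2
se_diff  <- sqrt(s^2 * (1 / n + 1 / n))
p_direct <- 2 * pt(-abs((mean1 - mean2) / se_diff), df = 2 * n - 2)

round(c(group1_vs_null = p1, group2_vs_null = p2, direct = p_direct), 3)
```

Group 1 differs significantly from the null value at the 5% level and group 2 does not, yet the direct comparison gives no evidence that the two groups differ from each other.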