Two-Sample t-Tests

August 25, 2017

Introduction

Background

t-tests are a standard way to assess whether two group means differ by more than sampling variation alone would explain.

This can be done by comparing a sample mean to a known or hypothesized population value (one-sample) or by comparing two different samples (two-sample).

Applications

In the military, a common application of t-tests is in Test and Evaluation (T&E).

Does a new acquisition program meet or exceed standards?
Does an upgrade to an existing program perform at least as well as the legacy system?

t-tests are especially advantageous in OT&E, where sample sizes are small due to cost.

Packages

No packages are needed to perform t-tests.

All functions come from the built-in stats package: t.test and pairwise.t.test.

However, I use dplyr for convenient data manipulation.

library(dplyr)

Data

We will leverage built-in R data from sleep and airquality.

sleep: effects of two drugs on sleep for 10 patients. There are 20 observations on 3 variables: increase in hours of sleep, drug given, patient ID.

airquality: daily air quality measurements in New York, from May to September 1973. There are 153 observations on 6 variables: ozone, solar radiation, wind, temperature, month, and day.

Two-Sample t-Tests

Unpaired t-Test

The unpaired t-test is used when you want to compare two independent groups.

Suppose we want to test whether the average increase in sleep is the same for the two drugs, using the sleep dataset and the t.test function.

t.test(extra~group, data = sleep)

## 
##  Welch Two Sample t-test
## 
## data:  extra by group
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.3654832  0.2054832
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33

What does this mean?

Notice the results that t.test provides:

test statistic
degrees of freedom
p-value
confidence interval
sample mean of each group
the alternative hypothesis

You can also save the result of t.test and use the $ operator to extract them:

sleep_unpaired <- t.test(extra~group, data = sleep)
sleep_unpaired$p.value

## [1] 0.07939414
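As a small sketch of the same idea, the saved result is a list of class "htest", so every item in the list above can be pulled out by name. The component names used here are the standard ones returned by t.test; calling names() on the object shows the full set.

```r
# Fit once, then pull individual pieces out of the "htest" object.
sleep_unpaired <- t.test(extra ~ group, data = sleep)

names(sleep_unpaired)        # every component that can be extracted
sleep_unpaired$statistic     # t statistic
sleep_unpaired$parameter     # degrees of freedom
sleep_unpaired$conf.int      # confidence interval (carries a conf.level attribute)
sleep_unpaired$estimate      # sample mean of each group
```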

Based on the sleep data, is the unpaired t-test the appropriate test to use?

Paired t-Test

Use paired t-tests when observations from one group are paired with observations from the other.

This can be done easily in R, by simply adding paired = TRUE when calling t.test and/or pairwise.t.test.

Using the sleep dataset and t.test function again, does one drug significantly increase sleep more than the other?

t.test(extra~group, data = sleep, paired = TRUE)

## 
##  Paired t-test
## 
## data:  extra by group
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean of the differences 
##                   -1.58
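Some recent R releases reportedly reject paired = TRUE in the formula interface, so it may help to know an equivalent call that avoids it. The sketch below assumes the rows of sleep are ordered by patient ID within each drug group (which is how the built-in data is stored); drug1 and drug2 are illustrative names, not part of the dataset.

```r
# Split the response by drug. Element i of each vector belongs to the same patient.
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]

# Paired test through the default (vector) interface.
sleep_paired <- t.test(drug1, drug2, paired = TRUE)
sleep_paired$p.value

# Equivalent one-sample test on the within-patient differences.
t.test(drug1 - drug2)$p.value
```

Both calls should reproduce the paired p-value reported above.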

Which drug increases sleep more?

If you compare the results of the paired test with the unpaired, you may notice that the p-values are different: 0.079 for the unpaired test and 0.003 for the paired. Why is this?

Even though both methods compare group means, they look at the data in different ways. Just look at how the test statistics are computed. The unpaired statistic is shown in the Welch (unequal variances) form that t.test uses by default.

\[ t_{unpaired} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

\[ t_{paired} = \frac{\bar{d}}{\frac{s_d}{\sqrt{n}}} \]

The unpaired test treats the two groups as independent samples, so the variability between patients stays in the denominator. In the paired design, the two observations from each patient are not independent, so the test works with the within-patient differences $d_i$ and asks whether their mean $\bar{d}$ differs significantly from zero.
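To make the contrast concrete, here is a minimal sketch that computes both statistics by hand from the sleep data. It uses the Welch form for the unpaired statistic, since that is the t.test default; the variable names are illustrative.

```r
x1 <- sleep$extra[sleep$group == 1]
x2 <- sleep$extra[sleep$group == 2]

# Unpaired (Welch): difference in means over the combined standard error.
t_unpaired <- (mean(x1) - mean(x2)) /
  sqrt(var(x1) / length(x1) + var(x2) / length(x2))

# Paired: mean of the within-patient differences over its standard error.
d <- x1 - x2
t_paired <- mean(d) / (sd(d) / sqrt(length(d)))

round(c(unpaired = t_unpaired, paired = t_paired), 4)
```

The numerators are identical (both equal the difference in group means), so the gap between the two p-values comes from the denominators: the standard deviation of the differences is much smaller than the spread within each group.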

Other t-Test Adjustments

Alternative Hypothesis

You can adjust the alternative hypothesis easily when calling the function.

The default alternative is two-sided, which means the null hypothesis is $H_0: \mu_1 - \mu_2 = 0$ and the alternative is $H_A: \mu_1 - \mu_2 \neq 0$.

For one-sided tests, set

alternative = "less" if $H_0: \mu_1 - \mu_2 \ge 0$ and $H_A: \mu_1 - \mu_2 < 0$, and
alternative = "greater" if $H_0: \mu_1 - \mu_2 \le 0$ and $H_A: \mu_1 - \mu_2 > 0$.

Is the difference in average increased sleep for the two groups $\le 0$ ?

t.test(extra~group, data = sleep, alternative = "greater")

## 
##  Welch Two Sample t-test
## 
## data:  extra by group
## t = -1.8608, df = 17.776, p-value = 0.9603
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -3.053381       Inf
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33

What does this mean?
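One way to read the one-sided result is to set it beside the other two alternatives. The sketch below runs all three on the same data; the object names are illustrative.

```r
two_sided <- t.test(extra ~ group, data = sleep)
greater   <- t.test(extra ~ group, data = sleep, alternative = "greater")
less      <- t.test(extra ~ group, data = sleep, alternative = "less")

# The t statistic is negative, so "less" gets half of the two-sided p-value
# and "greater" gets the remainder.
c(two_sided = two_sided$p.value,
  less      = less$p.value,
  greater   = greater$p.value)
```

Because group 1 has the smaller sample mean, the data offer no support for a positive difference, which is why the "greater" p-value is close to 1.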

Defining $\mu$

You can also test against a hypothesized mean difference other than zero by setting mu = d, where d is the value of the difference in means under the null hypothesis.

Is the difference in average increased sleep for the two groups $\ge 1$ ?

t.test(extra~group, data = sleep, alternative = "less", mu = 1)

## 
##  Welch Two Sample t-test
## 
## data:  extra by group
## t = -3.0385, df = 17.776, p-value = 0.003571
## alternative hypothesis: true difference in means is less than 1
## 95 percent confidence interval:
##        -Inf -0.1066185
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33

What does this mean?
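The shift in the test statistic can be checked by hand. Taking the group means from the output and a standard error of about 0.8491 (an inferred value, obtained by dividing the mean difference of -1.58 by the earlier t statistic of -1.8608), subtracting the hypothesized difference in the numerator gives approximately:

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - \mu_0}{SE} \approx \frac{(0.75 - 2.33) - 1}{0.8491} \approx -3.0385 \]

Only the numerator changes when mu is set; the standard error and degrees of freedom are the same as in the default test.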

Equal Variances

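The default for t.test is to assume unequal variances (the Welch test). By contrast, pairwise.t.test defaults to pool.sd = TRUE for unpaired comparisons, which pools the standard deviation across all groups and therefore assumes a common variance; set pool.sd = FALSE to run a separate t.test for each pair.

However, if you would like t.test to assume that the two variances are equal, you can add var.equal = TRUE when you call the function.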

Assuming that the variances are equal, is the average increased sleep for the two groups equal?

t.test(extra~group, data = sleep, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  extra by group
## t = -1.8608, df = 18, p-value = 0.07919
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.363874  0.203874
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33

What does this mean?
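A short sketch of the pooled calculation may help explain why the t statistic matches the Welch result here while the degrees of freedom do not. With equal group sizes the two standard errors coincide, so only the degrees of freedom (and hence the p-value) change. The variable names are illustrative.

```r
x1 <- sleep$extra[sleep$group == 1]
x2 <- sleep$extra[sleep$group == 2]
n1 <- length(x1)
n2 <- length(x2)

# Pooled variance: the two sample variances weighted by their degrees of freedom.
s2_pooled <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)

t_pooled  <- (mean(x1) - mean(x2)) / sqrt(s2_pooled * (1 / n1 + 1 / n2))
df_pooled <- n1 + n2 - 2

# Two-sided p-value from the t distribution.
p_pooled <- 2 * pt(-abs(t_pooled), df = df_pooled)
c(t = t_pooled, df = df_pooled, p = p_pooled)
```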

Confidence Level

The default confidence level of the interval is 0.95 (95%).

You can adjust the confidence level by adding conf.level = $1 - \alpha$, where $\alpha$ is the significance level and 0 < $\alpha$ < 1.

Can we detect a difference in average increased sleep for the two groups at an 80% confidence level?

t.test(extra~group, data = sleep, conf.level = .8)

## 
##  Welch Two Sample t-test
## 
## data:  extra by group
## t = -1.8608, df = 17.776, p-value = 0.07939
## alternative hypothesis: true difference in means is not equal to 0
## 80 percent confidence interval:
##  -2.7101645 -0.4498355
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33

What does this mean?
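As a quick sketch, comparing the two fits side by side shows what conf.level does and does not change. The object names ci95 and ci80 are illustrative.

```r
ci95 <- t.test(extra ~ group, data = sleep)                     # default conf.level = 0.95
ci80 <- t.test(extra ~ group, data = sleep, conf.level = 0.80)

# The test itself is unchanged ...
identical(ci95$p.value, ci80$p.value)

# ... but the interval narrows as the confidence level drops.
rbind(`95%` = ci95$conf.int, `80%` = ci80$conf.int)
```

The 80% interval excludes zero while the 95% interval does not, which is consistent with a p-value of about 0.079 lying between 0.05 and 0.20.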

Multiple Pairwise Comparisons

Until now, we have looked solely at applications of t.test. Now we will focus on the advantages of using pairwise.t.test.

Unlike t.test, pairwise.t.test only provides the p-value for the comparison.

The t.test function can only test two samples at a time, whereas pairwise.t.test can perform multiple pairwise comparisons and output a triangular matrix of the p-values between all groups.

In addition, pairwise.t.test will adjust the p-values using the "holm" method by default. You can set p.adjust.method = to any of the following options (the names are stored in p.adjust.methods):

## [1] "holm"       "hochberg"   "hommel"     "bonferroni" "BH"        
## [6] "BY"         "fdr"        "none"

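The "holm", "hochberg", "hommel", and "bonferroni" methods control the family-wise error rate across the comparisons, while "BH" (alias "fdr") and "BY" control the false discovery rate. Choosing "none" leaves the p-values unadjusted.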
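To see what an adjustment does to individual p-values, the same methods can be applied directly with the p.adjust function from the stats package. The vector raw_p below is a made-up example, not taken from any dataset.

```r
# Three hypothetical unadjusted p-values.
raw_p <- c(0.01, 0.02, 0.03)

p.adjust(raw_p, method = "bonferroni")  # each p-value times the number of tests, capped at 1
p.adjust(raw_p, method = "holm")        # step-down version of Bonferroni, never more conservative
p.adjust(raw_p, method = "BH")          # controls the false discovery rate instead
```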

Using the airquality dataset, can we detect a difference in mean Ozone between months?

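# Recode Month as a labelled factor in a copy of the data, then test within that copy
aq <- transform(airquality, Month = factor(Month, labels = month.abb[5:9]))
with(aq, pairwise.t.test(Ozone, Month))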

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  Ozone and Month 
## 
##     May     Jun     Jul     Aug    
## Jun 1.00000 -       -       -      
## Jul 0.00026 0.05113 -       -      
## Aug 0.00019 0.04987 1.00000 -      
## Sep 1.00000 1.00000 0.00488 0.00388
## 
## P value adjustment method: holm
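The printed triangle is also available as a matrix, which is handy when only one comparison is of interest. This sketch stores the result in a variable (res and aq are illustrative names) and also shows the pool.sd = FALSE variant, which runs a separate Welch test for each pair instead of pooling the standard deviation.

```r
# Work on a copy with Month recoded as a labelled factor.
aq  <- transform(airquality, Month = factor(Month, labels = month.abb[5:9]))
res <- with(aq, pairwise.t.test(Ozone, Month))

res$p.value                  # lower-triangular matrix of adjusted p-values
res$p.value["Jul", "May"]    # a single comparison, indexed as [row, column]

# Separate Welch tests for each pair instead of one pooled SD.
with(aq, pairwise.t.test(Ozone, Month, pool.sd = FALSE))$p.value
```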

How do the mean ozone levels compare in May and July?

(MayAvg <- airquality %>%
           filter(Month == 5) %>%
           pull(Ozone) %>%
           mean(na.rm = TRUE))

## [1] 23.61538

(JulyAvg <- airquality %>%
            filter(Month == 7) %>%
            pull(Ozone) %>%
            mean(na.rm = TRUE))

## [1] 59.11538
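The same comparison can be made for every month at once. The following is a sketch using dplyr's group_by and summarize; the column names mean_ozone and n_obs and the object name monthly_ozone are illustrative choices.

```r
library(dplyr)

# One row per month: mean ozone and the number of non-missing readings behind it.
monthly_ozone <- airquality %>%
  group_by(Month) %>%
  summarize(mean_ozone = mean(Ozone, na.rm = TRUE),
            n_obs      = sum(!is.na(Ozone)))

monthly_ozone
```

Seeing the group sizes next to the means is a useful reminder that the pairwise tests above are based on unequal numbers of valid observations per month.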

Conclusion

You can easily do two-sample t-tests with built-in R functions.

You have learned how to do unpaired, paired, and multiple pairwise t-tests.

In addition, you now know how to change the default settings in order to customize the t-tests to get the results you need.

Now…let's test your knowledge!

Exercises

For the following questions, utilize the built-in npk dataset.

This is a three-factor fractional factorial experiment conducted on 6 blocks.

The three factors are N (nitrogen), P (phosphate), and K (potassium).
If N, P, or K equals 1, then that element was applied to that plot; 0 indicates that it was not applied.
The response is labeled yield and indicates the yield of peas in pounds/plot.

Is there a difference in average yield between groups with nitrogen and groups without? Suppose you want a 90% confidence interval for the difference.
Using the Bonferroni adjustment at the 5% significance level, calculate all pairwise comparisons between the 7 treatments (i.e., N, P, K, NP, NK, PK, NPK).
Which factor (or combination of factors) appears to affect pea yield the most? That is, which factor or combination yields the most peas in pounds/plot?

Exercise 1

t.test(yield~N, data = npk, conf.level = .9)

## 
##  Welch Two Sample t-test
## 
## data:  yield by N
## t = -2.4618, df = 21.88, p-value = 0.02218
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
##  -9.535247 -1.698086
## sample estimates:
## mean in group 0 mean in group 1 
##        52.06667        57.68333

Exercise 2

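# Label each plot by the fertilizers it received.
# Note: the final fall-through label "NPK" also catches the untreated plots (N = P = K = 0).
npk <- mutate(npk, fac_combined = 
                ifelse(N == 1 & P == 0 & K == 0, "N" ,
                ifelse(N == 0 & P == 1 & K == 0, "P",
                ifelse(N == 0 & P == 0 & K == 1, "K",       
                ifelse(N == 1 & P == 1 & K == 0, "NP",
                ifelse(N == 0 & P == 1 & K == 1, "PK", 
                ifelse(N == 1 & P == 0 & K == 1, "NK", "NPK")))))))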

# pairwise.t.test has no data argument, so the columns are found via attach()
attach(npk)
pairwise.t.test(yield, fac_combined, p.adjust.method = "bonferroni")

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  yield and fac_combined 
## 
##     K    N    NK   NP   NPK  P   
## N   0.36 -    -    -    -    -   
## NK  1.00 1.00 -    -    -    -   
## NP  1.00 1.00 1.00 -    -    -   
## NPK 1.00 0.25 1.00 1.00 -    -   
## P   1.00 1.00 1.00 1.00 1.00 -   
## PK  1.00 0.18 1.00 1.00 1.00 1.00
## 
## P value adjustment method: bonferroni
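The nested ifelse above sends every plot that matches none of the six listed conditions to "NPK", which includes the plots that received no fertilizer at all. As a hedged alternative, the sketch below builds the label from the three factors directly and keeps the untreated plots in their own group. The names npk_labeled, treatment, and "none" are illustrative, and datasets::npk is used so the sketch starts from the unmodified data.

```r
library(dplyr)

# Build the label by concatenating the letters of the fertilizers that were applied.
npk_labeled <- datasets::npk %>%
  mutate(treatment = paste0(ifelse(N == 1, "N", ""),
                            ifelse(P == 1, "P", ""),
                            ifelse(K == 1, "K", "")),
         # Plots with no fertilizer get an explicit label instead of an empty string.
         treatment = ifelse(treatment == "", "none", treatment))

# Count the plots in each treatment group.
table(npk_labeled$treatment)
```

With this labelling there are eight groups rather than seven, so any pairwise comparison run on it would differ from the output shown above.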

Exercise 3

boxplot(yield~fac_combined, data = npk,
        xlab =  "Treatment", 
        ylab = "Pea Yield in Pounds/Plot", 
        main = "Yield Comparison Across Treatments")

(AvgYields <- group_by(npk, fac_combined) %>%
              summarize(mean(yield)) %>%
              arrange(desc(`mean(yield)`)))

## # A tibble: 7 x 2
##   fac_combined `mean(yield)`
##          <chr>         <dbl>
## 1            N      63.76667
## 2           NP      57.93333
## 3           NK      54.66667
## 4            P      54.33333
## 5          NPK      52.90000
## 6            K      52.00000
## 7           PK      50.50000

t.test(yield[fac_combined == "N"], yield[fac_combined == "NP"])

## 
##  Welch Two Sample t-test
## 
## data:  yield[fac_combined == "N"] and yield[fac_combined == "NP"]
## t = 1.3516, df = 3.9781, p-value = 0.2482
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6.175176 17.841843
## sample estimates:
## mean of x mean of y 
##  63.76667  57.93333

t.test(yield[fac_combined == "N"], yield[fac_combined == "NK"])

## 
##  Welch Two Sample t-test
## 
## data:  yield[fac_combined == "N"] and yield[fac_combined == "NK"]
## t = 2.386, df = 3.8671, p-value = 0.07771
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.634076 19.834076
## sample estimates:
## mean of x mean of y 
##  63.76667  54.66667

t.test(yield[fac_combined == "N"], yield[fac_combined == "P"])

## 
##  Welch Two Sample t-test
## 
## data:  yield[fac_combined == "N"] and yield[fac_combined == "P"]
## t = 1.5274, df = 3.0762, p-value = 0.2219
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.949417 28.816083
## sample estimates:
## mean of x mean of y 
##  63.76667  54.33333

t.test(yield[fac_combined == "N"], yield[fac_combined == "NPK"])

## 
##  Welch Two Sample t-test
## 
## data:  yield[fac_combined == "N"] and yield[fac_combined == "NPK"]
## t = 3.1197, df = 3.7148, p-value = 0.0393
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   0.8956038 20.8377296
## sample estimates:
## mean of x mean of y 
##  63.76667  52.90000

Questions?