library(rstatix) #Allows you to do various statistical tests
## Warning: package 'rstatix' was built under R version 4.2.3
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
library(haven) #Allows you to import wide variety of data types
## Warning: package 'haven' was built under R version 4.2.3
library(tidyverse) #General coding
## Warning: package 'tidyverse' was built under R version 4.2.3
## Warning: package 'ggplot2' was built under R version 4.2.3
## Warning: package 'tibble' was built under R version 4.2.3
## Warning: package 'tidyr' was built under R version 4.2.3
## Warning: package 'readr' was built under R version 4.2.3
## Warning: package 'purrr' was built under R version 4.2.3
## Warning: package 'dplyr' was built under R version 4.2.3
## Warning: package 'stringr' was built under R version 4.2.3
## Warning: package 'forcats' was built under R version 4.2.3
## Warning: package 'lubridate' was built under R version 4.2.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks rstatix::filter(), stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (http://conflicted.r-lib.org/) to force all conflicts to become errors
library(descr) #Easy frequencies and distributions of discrete variables
## Warning: package 'descr' was built under R version 4.2.3
library(ggplot2) #Graphing
library(skimr) #Easy descriptive data of all variables in dataset
## Warning: package 'skimr' was built under R version 4.2.3
data(mtcars)
data(sleep)
In this tutorial, we will review how to conduct a variety of t-tests and an ANOVA. Specifically, we look at the following:
1. Independent samples t-test with a continuous/ordinal DV
2. Paired samples t-test
3. ANOVA
Strictly speaking, the dependent variable in these tests should be drawn from a continuous distribution. Practically speaking, you will see these tests conducted, in both academia and industry, on continuous variables as well as ordinal ones (typically with 5+ scale points).
We will be working with the built-in R datasets, mtcars and sleep, in this tutorial.
As previously discussed, you should always review the variables you have in your dataset prior to conducting any type of analysis. Here, we start by using the names function from base R, which simply lists all of the variables in any given dataset. This is useful for easily finding the names of the variable(s) you might want to review. We also use the skim function from the skimr package to quickly assess the descriptive results for all of the quantitative variables in the data.
We will be using the mpg variable as our main dependent variable in this analysis, which is a continuous variable that ranges from 10.4 miles per gallon at a minimum to 33.9 at a maximum.
names(mtcars)
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
skim(mtcars)
Name | mtcars
Number of rows | 32
Number of columns | 11
_______________________
Column type frequency:
numeric | 11
________________________
Group variables | None
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
mpg | 0 | 1 | 20.09 | 6.03 | 10.40 | 15.43 | 19.20 | 22.80 | 33.90 | ▃▇▅▁▂ |
cyl | 0 | 1 | 6.19 | 1.79 | 4.00 | 4.00 | 6.00 | 8.00 | 8.00 | ▆▁▃▁▇ |
disp | 0 | 1 | 230.72 | 123.94 | 71.10 | 120.83 | 196.30 | 326.00 | 472.00 | ▇▃▃▃▂ |
hp | 0 | 1 | 146.69 | 68.56 | 52.00 | 96.50 | 123.00 | 180.00 | 335.00 | ▇▇▆▃▁ |
drat | 0 | 1 | 3.60 | 0.53 | 2.76 | 3.08 | 3.70 | 3.92 | 4.93 | ▇▃▇▅▁ |
wt | 0 | 1 | 3.22 | 0.98 | 1.51 | 2.58 | 3.33 | 3.61 | 5.42 | ▃▃▇▁▂ |
qsec | 0 | 1 | 17.85 | 1.79 | 14.50 | 16.89 | 17.71 | 18.90 | 22.90 | ▃▇▇▂▁ |
vs | 0 | 1 | 0.44 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
am | 0 | 1 | 0.41 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▆ |
gear | 0 | 1 | 3.69 | 0.74 | 3.00 | 3.00 | 4.00 | 4.00 | 5.00 | ▇▁▆▁▂ |
carb | 0 | 1 | 2.81 | 1.62 | 1.00 | 2.00 | 2.00 | 4.00 | 8.00 | ▇▂▅▁▁ |
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
Independent samples t-tests are a useful analytic tool to understand differences between exactly two independent groups on some dependent variable in any set of data. Here, independent means that a unit cannot be a member of both groups being used in the analysis.
Because it requires exactly two groups, the first thing to do is ensure that the independent variable you want to use has two scale points. Based on the results from the skim function above, we have a good sense of which variables have two scale points, but we start by using the freq function from the descr package to examine the am variable, which measures the transmission type of the car.
freq(mtcars$am)
## mtcars$am
## Frequency Percent
## 0 19 59.38
## 1 13 40.62
## Total 32 100.00
Here, we see that the am variable does in fact have two values, where 1 = manual transmission and 0 = automatic. Always start by writing out your null and alternative hypotheses:
Ho: No difference in mpg between manual and automatic transmissions
Ha: Difference in mpg between manual and automatic transmissions
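Optionally, if you want the transmission labels to appear in your output rather than 0/1, you can create a labeled factor version of the variable first. This is just a sketch; the am_f column name is our own addition, not part of mtcars:
mtcars$am_f <- factor(mtcars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual")) #Attach readable labels
freq(mtcars$am_f) #Same frequencies as above, now labeled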
From lecture and the readings for this week, you learned that depending on whether the variance of the DV is the same or different by group, you should use either the Student's t-test or Welch's test. Because of this, we will start by examining the difference in variances for mpg by transmission type.
We will also use the var.test function to test for equality of variances across the two groups on the dependent variable. This test is evaluated in a similar way to others in that you evaluate the p-value to decide whether you reject or fail to reject the null hypothesis. In this test, the null hypothesis is that the variances are equal, so a significant p-value indicates the variances between the two groups should not be considered equivalent.
We use two specific functions from tidyverse to do this:
1. group_by, which groups the data by your grouping variable
2. summarise, which does the actual calculation of the specified values within each group
var_comp <- mtcars %>%
  group_by(am) %>% #Place your IV here that you want to compare variances between
  summarise(mean = mean(mpg), #Your DV goes here
            var = var(mpg),
            sd = sd(mpg),
            n = n())
print(var_comp)
## # A tibble: 2 × 5
## am mean var sd n
## <dbl> <dbl> <dbl> <dbl> <int>
## 1 0 17.1 14.7 3.83 19
## 2 1 24.4 38.0 6.17 13
var.test(mpg~am, data=mtcars) #Statistically assesses the difference in variance between the two groups
##
## F test to compare two variances
##
## data: mpg by am
## F = 0.38656, num df = 18, denom df = 12, p-value = 0.06691
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.1243721 1.0703429
## sample estimates:
## ratio of variances
## 0.3865615
Results show the mean, variance, standard deviation, and n-counts for mpg by transmission type. We see that for manual transmissions (when am = 1) the average MPG is 24.4, while for automatic transmissions the average is 17.1. However, we are interested in the variance here to decide which t-test to use. The variance for the manual transmission type is 38 while for the automatic it is 14.7. The p-value on the var.test is .06691, which is greater than .05. This means we fail to reject the null hypothesis that the variances of the two groups are equal.
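Note that you can recover the F statistic reported by var.test directly from the group variances we saved in var_comp; a quick sketch:
var_comp$var[1] / var_comp$var[2] #Automatic variance over manual variance, ~0.387, matching the F value above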
We use the Student’s t approach as a result of this test, which assumes constant variance. Note, we will look at both here so you can understand how to calculate each one and to give you a sense of the differences in results.
We use the t.test function from base R's stats package to do this (the same function handles both types of t-tests). The only difference between the code for these two approaches, as seen in the code chunks below, is that Welch's test uses var.equal = FALSE while Student's uses var.equal = TRUE.
#Conduct Welch's test (t.test comes from base R, not rstatix)
welch_t_test <- mtcars %>% #Dataset name goes in this line of code
  t.test(mpg ~ am, data = ., var.equal = FALSE) #DV comes before ~ while IV comes after it
print(welch_t_test)
##
## Welch Two Sample t-test
##
## data: mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group 0 mean in group 1
## 17.14737 24.39231
#Conduct Student's t-test (again using base R's t.test)
student_t_test <- mtcars %>% #Dataset name goes in this line of code
  t.test(mpg ~ am, data = ., var.equal = TRUE) #DV comes before ~ while IV comes after it
print(student_t_test)
##
## Two Sample t-test
##
## data: mpg by am
## t = -4.1061, df = 30, p-value = 0.000285
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -10.84837 -3.64151
## sample estimates:
## mean in group 0 mean in group 1
## 17.14737 24.39231
Start by reviewing the actual means for the DV at each grouping level. As noted above, the average MPG for manual transmissions is 24.4 while for automatic transmissions it is 17.1. While this seems like a large difference, remember that the sample size is small for both groups and there is a decent amount of variation. This is one reason why we always review the test statistic and associated p-values before rejecting or failing to reject the null hypothesis.
Results for the Welch’s test show a t-test statistic of -3.77 and a p-value of .0013, which is smaller than .05 indicating a significant difference between the two groups. From this approach, we would reject the null hypothesis that there is no difference in miles per gallon between a car with an automatic and manual transmission. Looking at the 95% confidence interval in the results, you notice that both the lower and upper bound limits are negative. This is because the CI from this test is about the difference between the two means and group 0 (automatic transmission) is lower than the mean for group 1 (manual transmission). We would conclude from that CI that the true population difference between the two types of cars - since this is just a sample of cars from the 1970s rather than all of them - is somewhere between -11.3 and -3.2. Note that 0 is not included in this confidence interval, which is another way for you to review statistical significance.
Turning to the Student's t-test, which assumes constant variance between the groups, we once again draw the same conclusion - i.e., we reject the null hypothesis that there is no difference in mpg between the transmission types. However, there are differences in the results between these two approaches. First, the p-value is smaller in the Student's t result and the t-test statistic is more extreme as well. This is also reflected in the confidence interval, which is tighter than in the Welch's test. One final thing to note, which we reviewed in lecture, is that the degrees of freedom used in the analysis are different. Welch's test uses a more complex formula to calculate the DF, as reviewed in class, whereas the Student's t-test uses n - 2.
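To make the degrees of freedom difference concrete, here is a sketch of the Welch-Satterthwaite calculation using the group statistics saved in var_comp earlier. This is purely illustrative; t.test does it for you internally:
se2 <- var_comp$var / var_comp$n #Squared standard error of each group mean
sum(se2)^2 / sum(se2^2 / (var_comp$n - 1)) #~18.33, matching the Welch df; Student's t simply uses 19 + 13 - 2 = 30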
For the paired samples t-test, your data must consist of the same units being measured at two time periods. Here we will use the sleep dataset, which measures 10 students who were each given two different sleeping medications. Their sleep hours are measured first with one sleeping medication and then with the other for the same individual, which makes the data appropriate for the paired samples t-test.
skim(sleep)
Name | sleep
Number of rows | 20
Number of columns | 3
_______________________
Column type frequency:
factor | 2
numeric | 1
________________________
Group variables | None
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
group | 0 | 1 | FALSE | 2 | 1: 10, 2: 10 |
ID | 0 | 1 | FALSE | 10 | 1: 2, 2: 2, 3: 2, 4: 2 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
extra | 0 | 1 | 1.54 | 2.02 | -1.6 | -0.03 | 0.95 | 3.4 | 5.5 | ▃▇▃▃▃ |
summary(sleep)
## extra group ID
## Min. :-1.600 1:10 1 :2
## 1st Qu.:-0.025 2:10 2 :2
## Median : 0.950 3 :2
## Mean : 1.540 4 :2
## 3rd Qu.: 3.400 5 :2
## Max. : 5.500 6 :2
## (Other):8
levels(sleep$group)
## [1] "1" "2"
Like always, we want to review the dataset prior to beginning the analysis. From this descriptive review, we see that we have one variable considered continuous - extra - which measures the number of additional hours slept by the student (compared to a baseline no-medication measurement not included in the dataset); this will be the dependent variable in the analysis. There is also an ID variable, which simply identifies the student the additional sleep hours belong to, and a group variable, which records which of the two medications the measurement was taken under. The group variable will be used as the independent variable in the analysis.
In the below code chunk, we run the actual paired samples t-test. Note that the code is largely the same as above but with the difference of using paired = TRUE, which tells R that this is a paired sample. Functionally, this matches the IDs for groups 1 and 2 to calculate the within-pair differences. Due to how the results are displayed in the t-test, it can be beneficial to use the same group_by approach for calculating means as discussed above.
slp <- sleep %>%
  group_by(group) %>% #Place your IV here
  summarise(mean = mean(extra), #DV goes here
            var = var(extra),
            sd = sd(extra),
            n = n())
print(slp)
## # A tibble: 2 × 5
## group mean var sd n
## <fct> <dbl> <dbl> <dbl> <int>
## 1 1 0.75 3.20 1.79 10
## 2 2 2.33 4.01 2.00 10
t.test(extra ~ group, data = sleep, paired = TRUE)
##
## Paired t-test
##
## data: extra by group
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -2.4598858 -0.7001142
## sample estimates:
## mean difference
## -1.58
Results show a variety of metrics. First, it gives you the mean difference of -1.58, which means that the additional hours of sleep for group 1 are about 1.58 hours fewer than for group 2. It also provides the t-test statistic (-4.06), the degrees of freedom (9, since we have 10 unique units in the analysis), and the p-value of .0028. You interpret this in the same way as before: review the p-value versus the standard cut-point of .05 and evaluate whether it is greater than or less than that value. Here we have a significant p-value, which means we would reject the null hypothesis that the two sleep medications had the same impact on the students' sleep. Ultimately, you would conclude that medication 2 (group 2) increased sleep hours more than medication 1 (group 1).
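One way to see what paired = TRUE is doing under the hood is to compute the within-student differences yourself and run a one-sample t-test against 0. This sketch relies on the fact that sleep is sorted so the IDs line up across the two groups:
diffs <- sleep$extra[sleep$group == 1] - sleep$extra[sleep$group == 2] #One difference per student
t.test(diffs, mu = 0) #Same t = -4.0621, df = 9, and p-value as the paired test above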
ANOVA is an acronym for analysis of variance, and this analytic technique is used when you have 3 or more independent groups and want to understand if there are differences between them on your dependent variable. With ANOVAs, there are two different corrections that you can apply to reduce the possibility of Type I error: Bonferroni and Tukey HSD.
The Bonferroni adjustment will be more conservative than the TukeyHSD test as it is most concerned with preventing Type I error. The TukeyHSD test tries to balance Type I and Type II error in its calculation, so it will be more liberal in what it considers significant. The TukeyHSD test also assumes constant variance across groups, which the Bonferroni adjustment does not.
Once again, you should always review your variable to ensure it is appropriate for the analytic technique you propose to use. That is where we start here. Let's review the cyl variable, which is the number of cylinders the car has. We see below that there are three different cylinder counts in the dataset: 4, 6, and 8. This makes it appropriate for an ANOVA approach since that requires 3+ groups.
freq(mtcars$cyl)
## mtcars$cyl
## Frequency Percent
## 4 11 34.38
## 6 7 21.88
## 8 14 43.75
## Total 32 100.00
Just like with a t-test, you should have a null and alternative hypothesis. However, with the ANOVA you have a decision to make: state a general hypothesis, or advocate specific null and alternative hypotheses based on individual pairwise comparisons. This is because there is more than one comparison being made in an ANOVA. Here, with cyl having three values, we will make three comparisons (4 cylinder to 6 cylinder; 4 cylinder to 8 cylinder; 6 cylinder to 8 cylinder).
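Note that the number of pairwise comparisons grows quickly as groups are added, which is one reason the corrections discussed above matter; a quick sketch of the arithmetic:
choose(3, 2) #3 pairwise comparisons with 3 groups
choose(5, 2) #10 pairwise comparisons with 5 groups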
If you think that all of the pairwise comparisons will be significant, then simply use a general null and alternative hypothesis.
Ho: No differences between cylinders in miles per gallon
Ha: Each comparison between cylinders will be different in miles per gallon
If you think only some of the comparisons will be different, then be specific like with the t-tests writing out the exact comparisons that you expect significant differences to occur on.
Unlike the t-tests reviewed above, the ANOVA function in R does not return the means by group, so we must do that separately using tidyverse code. The second thing to note is that the ANOVA results themselves do not contain the pairwise comparisons (i.e., calculating significant differences between each of the grouping variable values for the DV). We must do that separately, which is shown for both the Bonferroni and the Tukey adjustments at the bottom of the code chunk.
#Step 1: Calculate the mean of the DV by grouping variable
means <- mtcars %>% #Dataset goes here
  group_by(cyl) %>% #Grouping variable goes here
  summarise(mean = mean(mpg), #DV goes inside the ()
            var = var(mpg), #Not necessary to calculate variance and SD but I always do
            sd = sd(mpg),
            n = n())
print(means)
## # A tibble: 3 × 5
## cyl mean var sd n
## <dbl> <dbl> <dbl> <dbl> <int>
## 1 4 26.7 20.3 4.51 11
## 2 6 19.7 2.11 1.45 7
## 3 8 15.1 6.55 2.56 14
#Step 2: Save ANOVA Results
anova_result <- aov(mpg ~ as.factor(cyl), data = mtcars) #The as.factor code tells R to treat the variable inside the () as a factor; otherwise it will not calculate correctly
# Step 3: Display the ANOVA result
print(summary(anova_result))
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(cyl) 2 824.8 412.4 39.7 4.98e-09 ***
## Residuals 29 301.3 10.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
First, review the mean differences for the DV - here miles per gallon - by the independent variable - here the number of cylinders in the car. We see that for cars with 4 cylinders the average MPG is 26.7, for 6 cylinder cars it is 19.7 and for 8 cylinder cars it is 15.1. While these might seem like large differences, you cannot rely simply on the means to decide if you reject or fail to reject the null hypothesis. You must statistically assess these differences.
First, we review the overall ANOVA results, which show a very small p-value of 4.98e-09 (scientific notation for 0.00000000498). This result tells us that overall there are significant differences in the mean of the DV between the levels of the independent variable. What it does not tell us is where those differences arise. Sometimes you will have a significant overall ANOVA result but with some of the pairwise comparisons not being significant. This is why it is important to examine the Bonferroni or Tukey's adjustments, so you can understand more specifically where differences might be. We do this in the next code chunk.
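Before that, as a quick sanity check, you can reproduce the overall F value by hand from the ANOVA table above: it is the between-group mean square divided by the residual mean square (each Sum Sq divided by its Df).
(824.8 / 2) / (301.3 / 29) #= 412.4 / 10.4, ~39.7, matching the F value above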
#Step 4: Add your pairwise comparisons
# Add Pairwise Comparison Tests - Tukey
tukey_result <- TukeyHSD(anova_result)
print(tukey_result)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = mpg ~ as.factor(cyl), data = mtcars)
##
## $`as.factor(cyl)`
## diff lwr upr p adj
## 6-4 -6.920779 -10.769350 -3.0722086 0.0003424
## 8-4 -11.563636 -14.770779 -8.3564942 0.0000000
## 8-6 -4.642857 -8.327583 -0.9581313 0.0112287
# Add Pairwise Comparison Tests - Bonferroni
pairwise.t.test(mtcars$mpg, mtcars$cyl,
p.adjust.method = "bonferroni")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: mtcars$mpg and mtcars$cyl
##
## 4 6
## 6 0.00036 -
## 8 2.6e-09 0.01246
##
## P value adjustment method: bonferroni
Both tests show the same information - pairwise comparison p-values for significant differences between groups - they just show it in a different format. There will be a p-value for each comparison; since we have 3 possible values of the IV, we get 3 unique p-values. Each one should be interpreted independently.
The TukeyHSD test gives you the mean difference, 95% CI, and associated p-value for each comparison, while the Bonferroni adjustment just gives you the p-values. The TukeyHSD output also shows one comparison per row, while Bonferroni shows them in a matrix.
The p-values in both tests are interpreted in the same way as we have been discussing. If they are smaller than your predefined significance level, you would consider that relationship to be statistically significant. Here, in both tests, each p-value is < .05, and oftentimes quite a bit less, indicating that each pairwise comparison shows a statistically significant difference between groups.
As stated above, the Bonferroni adjustment will be more conservative (i.e., have larger p-values) than the TukeyHSD test, which we see in these results. Because there are only 3 groups, these two tests are very similar; however, as the number of comparisons increases, the differences between the two tests will grow.
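To see exactly what the Bonferroni adjustment is doing, you can reproduce it by hand: multiply each unadjusted p-value by the number of comparisons and cap the result at 1. A quick sketch:
raw <- pairwise.t.test(mtcars$mpg, mtcars$cyl, p.adjust.method = "none")$p.value #Unadjusted p-values
pmin(raw * 3, 1) #3 comparisons; matches the Bonferroni-adjusted matrix above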
We would want to put these results into less technical language if we were writing them into a report or presentation. Recall from above that for cars with 4 cylinders the average MPG is 26.7, for 6 cylinder cars it is 19.7, and for 8 cylinder cars it is 15.1. We would conclude that cars with a 4 cylinder engine get the best fuel economy and that as the number of cylinders increases to 6 and then to 8, fuel economy gets significantly worse.