library(rstatix) #Allows you to do various statistical tests
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter
library(haven) #Allows you to import a wide variety of data types
library(tidyverse) #General coding
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks rstatix::filter(), stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (http://conflicted.r-lib.org/) to force all conflicts to become errors
library(descr) #Easy frequencies and distributions of discrete variables 
library(ggplot2) #Graphing 
library(skimr) #Easy descriptive data of all variables in dataset
data(mtcars)
data(sleep)

Introduction

In this tutorial, we will review how to conduct a variety of t-tests and an ANOVA. Specifically, we look at the following:

1. Independent samples t-tests (both the Student's t and Welch's approaches)

2. Paired samples t-tests

3. ANOVA, with Bonferroni and Tukey HSD pairwise comparisons

Strictly speaking, the dependent variable in these tests should be drawn from a continuous distribution. Practically speaking, you will see these tests conducted, in both academia and industry, on both continuous and ordinal variables (typically those with 5+ scale points).

We will be working with the built-in R datasets mtcars and sleep in this tutorial.

T-Tests and ANOVA

As previously discussed, you should always review the variables in your dataset prior to conducting any type of analysis. Here, we start with the names function from Base R, which simply lists all of the variables in a given dataset; this makes it easy to find the names of the variable(s) you want to review. We also use the skim function from the skimr package to quickly assess descriptive statistics for all of the quantitative variables in the data.

We will be using the mpg variable as our main dependent variable in this analysis, which is a continuous variable that ranges from a minimum of 10.4 miles per gallon to a maximum of 33.9.

names(mtcars)
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"
skim(mtcars)
Data summary
Name                   mtcars
Number of rows         32
Number of columns      11
_______________________
Column type frequency:
  numeric              11
________________________
Group variables        None

Variable type: numeric

skim_variable n_missing complete_rate   mean     sd    p0    p25    p50    p75   p100 hist
mpg                   0             1  20.09   6.03 10.40  15.43  19.20  22.80  33.90 ▃▇▅▁▂
cyl                   0             1   6.19   1.79  4.00   4.00   6.00   8.00   8.00 ▆▁▃▁▇
disp                  0             1 230.72 123.94 71.10 120.83 196.30 326.00 472.00 ▇▃▃▃▂
hp                    0             1 146.69  68.56 52.00  96.50 123.00 180.00 335.00 ▇▇▆▃▁
drat                  0             1   3.60   0.53  2.76   3.08   3.70   3.92   4.93 ▇▃▇▅▁
wt                    0             1   3.22   0.98  1.51   2.58   3.33   3.61   5.42 ▃▃▇▁▂
qsec                  0             1  17.85   1.79 14.50  16.89  17.71  18.90  22.90 ▃▇▇▂▁
vs                    0             1   0.44   0.50  0.00   0.00   0.00   1.00   1.00 ▇▁▁▁▆
am                    0             1   0.41   0.50  0.00   0.00   0.00   1.00   1.00 ▇▁▁▁▆
gear                  0             1   3.69   0.74  3.00   3.00   4.00   4.00   5.00 ▇▁▆▁▂
carb                  0             1   2.81   1.62  1.00   2.00   2.00   4.00   8.00 ▇▂▅▁▁
summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000

T-Tests

Independent Samples T-Test with Continuous/Ordinal DV

Independent samples t-tests are a useful analytic tool for understanding differences between exactly two independent groups on some dependent variable. Here, independent means that a unit cannot be a member of both groups used in the analysis.

Because the test requires exactly two groups, the first thing to do is ensure that the independent variable you want to use has exactly two values. Based on the results from the skim function above, we have a good sense of which variables have two values, but we confirm this using the freq function from the descr package on the am variable, which measures the transmission type of the car.

freq(mtcars$am)

## mtcars$am 
##       Frequency Percent
## 0            19   59.38
## 1            13   40.62
## Total        32  100.00

Here, we see that the am variable does in fact have two values, where 1 = manual transmission and 0 = automatic. Always start by writing out your null and alternative hypotheses:

Ho: No difference in mpg between manual and automatic transmissions

Ha: Difference in mpg between manual and automatic transmissions.

From lecture and the readings for this week, you learned that depending on whether the variance of the DV is the same or different by group, you should use either the Student's t or the Welch's test. Because of this, we will start by examining the difference in variances for mpg by transmission type.

We will also use the var.test command to test for equality of variances across the two groups on the dependent variable. This test is interpreted like the others: you evaluate the p-value to decide whether you reject or fail to reject the null hypothesis. In this test, the null hypothesis is that the variances are equal, so a significant p-value indicates the variances of the two groups should not be considered equivalent.

We use two specific functions from tidyverse to do this:

1. group_by, which groups the data by your grouping variable

2. summarise, which calculates the specified values within each group

var_comp<-mtcars %>% 
  group_by(am) %>% #Place your IV here that you want to compare variances between 
  summarise(mean = mean(mpg), #Your DV goes here
            var = var(mpg), 
            sd = sd(mpg), 
            n=n())

print(var_comp)
## # A tibble: 2 × 5
##      am  mean   var    sd     n
##   <dbl> <dbl> <dbl> <dbl> <int>
## 1     0  17.1  14.7  3.83    19
## 2     1  24.4  38.0  6.17    13
var.test(mpg~am, data=mtcars) #Statistically assesses the difference in variance between the two groups 
## 
##  F test to compare two variances
## 
## data:  mpg by am
## F = 0.38656, num df = 18, denom df = 12, p-value = 0.06691
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.1243721 1.0703429
## sample estimates:
## ratio of variances 
##          0.3865615

Results show the mean, variance, standard deviation, and n-counts for mpg by transmission type. We see that for manual transmissions (when am = 1) the average MPG is 24.4 while for automatic transmissions the average is 17.1. However, we are interested in the variance here to decide which t-test to use. The variance for the manual transmission type is 38 while for the automatic it is 14.7. The p-value on the var.test is .06691, which is greater than .05. This means we fail to reject the null hypothesis that the variances of the two groups are equal.
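
As an aside, it is worth seeing where the "ratio of variances" value in the var.test output comes from: the F statistic is simply one group's variance dividedded by the other's. A minimal sketch using the var_comp object created above:

var_comp$var[1] / var_comp$var[2] #Group 0 variance over group 1 variance = 0.3865615, matching the var.test output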

As a result of this test, we use the Student's t approach, which assumes constant variance. Note, we will look at both tests here so you can understand how to calculate each one and to give you a sense of the differences in results.

We use the t.test function, which comes from base R's stats package rather than rstatix, to do this (true for both types of t-tests). The only difference between the code for these two approaches, as seen in the code chunk below, is that the Welch's test uses var.equal = FALSE while the Student's uses var.equal = TRUE.

#Conduct Welch's Test using base R's t.test
welch_t_test <- mtcars %>% #Dataset name goes in this line of code
  t.test(mpg ~ am, data = ., var.equal = FALSE) #DV comes before ~ while IV comes after it
print(welch_t_test)
## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group 0 mean in group 1 
##        17.14737        24.39231
#Conduct Student's T Test using base R's t.test
student_t_test <- mtcars %>%  #Dataset name goes in this line of code
  t.test(mpg ~ am, data = ., var.equal = TRUE)  #DV comes before ~ while IV comes after it
print(student_t_test)
## 
##  Two Sample t-test
## 
## data:  mpg by am
## t = -4.1061, df = 30, p-value = 0.000285
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -10.84837  -3.64151
## sample estimates:
## mean in group 0 mean in group 1 
##        17.14737        24.39231

Start by reviewing the actual means for the DV at each grouping level. As noted above, the average MPG for manual transmissions is 24.4 while for automatic transmissions the average is 17.1. While this seems like a large difference, remember that the sample size is small for both groups and there is a decent amount of variation. This is one reason why we always review the test statistic and associated p-values before rejecting or failing to reject the null hypothesis.

Results for the Welch's test show a t-test statistic of -3.77 and a p-value of .0014, which is smaller than .05, indicating a significant difference between the two groups. From this approach, we would reject the null hypothesis that there is no difference in miles per gallon between a car with an automatic and a manual transmission. Looking at the 95% confidence interval in the results, you will notice that both the lower and upper bound limits are negative. This is because the CI from this test is about the difference between the two means, and the mean for group 0 (automatic transmission) is lower than the mean for group 1 (manual transmission). We would conclude from that CI that the true population difference between the two types of cars - since this is just a sample of cars from the 1970s rather than all of them - is somewhere between -11.3 and -3.2. Note that 0 is not included in this confidence interval, which is another way for you to review statistical significance.

Turning to the Student's t-test, which assumes constant variance between the groups, we once again draw the same conclusion - i.e., we reject the null hypothesis that there is no difference in mpg between the transmission types. However, there are differences in the results between these two approaches. First, the p-value is smaller in the Student's t result and the t-test statistic is more extreme as well. This is also reflected in the confidence interval, which is tighter than in the Welch's test. One final thing to note, which we reviewed in lecture, is that the degrees of freedom used in the analysis are different. The Welch's test uses a more complex formula to calculate its DFs, as reviewed in class, whereas the Student's t-test uses n - 2.
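
That "more complex formula" can feel like a black box, so here is a minimal sketch that hand-computes the Welch-Satterthwaite degrees of freedom from the group variances and sample sizes. This is purely illustrative; t.test performs this calculation for you.

v <- tapply(mtcars$mpg, mtcars$am, var)    #Group variances
n <- tapply(mtcars$mpg, mtcars$am, length) #Group sizes
se2 <- v / n                               #Each group's squared standard error contribution
df_w <- sum(se2)^2 / sum(se2^2 / (n - 1))  #Welch-Satterthwaite approximation
df_w #Approximately 18.332, matching the df reported in the Welch output above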

Paired Samples T-Test with Continuous/Ordinal DV

For the paired samples t-test, your data must consist of the same units being measured at two time points. Here we will use the sleep dataset, which covers 10 students who were given two different sleep medications (20 observations in total). Each student's sleep hours are measured once on the first medication and once on the second, which makes the data appropriate for the paired samples t-test.

skim(sleep)
Data summary
Name                   sleep
Number of rows         20
Number of columns      3
_______________________
Column type frequency:
  factor               2
  numeric              1
________________________
Group variables        None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
group                 0             1 FALSE          2 1: 10, 2: 10
ID                    0             1 FALSE         10 1: 2, 2: 2, 3: 2, 4: 2

Variable type: numeric

skim_variable n_missing complete_rate mean   sd   p0   p25  p50 p75 p100 hist
extra                 0             1 1.54 2.02 -1.6 -0.03 0.95 3.4  5.5 ▃▇▃▃▃
summary(sleep)
##      extra        group        ID   
##  Min.   :-1.600   1:10   1      :2  
##  1st Qu.:-0.025   2:10   2      :2  
##  Median : 0.950          3      :2  
##  Mean   : 1.540          4      :2  
##  3rd Qu.: 3.400          5      :2  
##  Max.   : 5.500          6      :2  
##                          (Other):8
levels(sleep$group)
## [1] "1" "2"

Like always, we want to review the dataset prior to beginning the analysis. From this descriptive review, we see that we have one variable considered continuous - extra - which measures the number of additional hours slept by the student (compared to a baseline no-medication measurement not included in the dataset); this will be the dependent variable in the analysis. There is also an ID variable, which simply identifies the student each measurement belongs to, and a group variable, which indicates which medication (i.e., which time period) the DV is measured under. The group variable will be used as the independent variable in the analysis.

In the below code chunk, we run the actual paired samples t-test. Note that the code is largely the same as above but with the difference of using paired = TRUE, which tells R that this is a paired sample. Functionally, this matches the IDs for groups 1 and 2 to calculate the within-pair differences. Because of how the results are displayed for this t-test, it can be beneficial to use the same group_by approach for calculating means as discussed above.

slp<-sleep %>% 
  group_by(group) %>% #Place your IV here
  summarise(mean = mean(extra), #DV goes here
            var = var(extra), 
            sd = sd(extra), 
            n=n())

print(slp)
## # A tibble: 2 × 5
##   group  mean   var    sd     n
##   <fct> <dbl> <dbl> <dbl> <int>
## 1 1      0.75  3.20  1.79    10
## 2 2      2.33  4.01  2.00    10
t.test(extra ~ group, data = sleep, paired = TRUE)
## 
##  Paired t-test
## 
## data:  extra by group
## t = -4.0621, df = 9, p-value = 0.002833
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -2.4598858 -0.7001142
## sample estimates:
## mean difference 
##           -1.58

Results show a variety of metrics. First, the output gives you the mean difference of -1.58, which means that the additional hours of sleep for group 1 are about 1.58 hours fewer than for group 2. It also provides the t-test statistic (-4.06), the degrees of freedom (9, since we have 10 unique units in the analysis), and the p-value of .0028. You interpret this in the same way as before: compare the p-value to the standard cut-point of .05 and evaluate whether it is greater than or less than that value. Here we have a significant p-value, which means we would reject the null hypothesis that the two sleep medications had the same impact on the amount of sleep for the students. Ultimately, you would conclude that medication 2 (group 2) increased sleep hours more than medication 1 (group 1).
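
To see what paired = TRUE is doing under the hood, note that a paired t-test is equivalent to a one-sample t-test on the within-student differences. A minimal sketch, relying on the fact that the built-in sleep data are ordered by ID within each group:

diffs <- with(sleep, extra[group == 1] - extra[group == 2]) #Within-student differences, matched by ID
t.test(diffs) #Reproduces t = -4.0621, df = 9, and p-value = 0.002833 from above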

ANOVAs

ANOVA is an acronym for analysis of variance, and this analytic technique is used when you have 3 or more independent groups and want to understand if there are differences between them on your dependent variable. With ANOVAs, there are two different corrections that you can apply to reduce the possibility of Type I error: Bonferroni and Tukey HSD.

The Bonferroni adjustment will be more conservative than the TukeyHSD test, as it is most concerned with preventing Type I error. The TukeyHSD test tries to balance Type I and Type II error in its calculation, so it will be more liberal in what it considers significant. The TukeyHSD test also assumes constant variance across groups, which the Bonferroni adjustment does not.
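
To make the Bonferroni adjustment concrete: it simply multiplies each raw p-value by the number of comparisons being made (capping the result at 1). A minimal sketch using p.adjust from base R, with hypothetical raw p-values that are not from mtcars:

raw_p <- c(0.012, 0.030, 0.200) #Hypothetical raw p-values for three comparisons
p.adjust(raw_p, method = "bonferroni") #Each p-value multiplied by 3
## [1] 0.036 0.090 0.600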

Once again, you should always review your variables to ensure they are appropriate for the analytic technique you propose to use. That is where we start here. Let's review the cyl variable, which is the number of cylinders the car has. We see below that there are three different cylinder counts in the dataset: 4, 6, and 8. This makes it appropriate for an ANOVA approach, since that requires 3+ groups.

freq(mtcars$cyl)

## mtcars$cyl 
##       Frequency Percent
## 4            11   34.38
## 6             7   21.88
## 8            14   43.75
## Total        32  100.00

Just like with a t-test, you should have a null and alternative hypothesis. However, with ANOVA hypotheses you have a decision to make: write a general statement, or write specific null and alternative hypotheses based on individual pairwise comparisons. This is because more than one comparison is being made in an ANOVA. Here, with cyl having three values, we will make three comparisons (4 cylinder to 6 cylinder; 4 cylinder to 8 cylinder; 6 cylinder to 8 cylinder).

If you think that all of the pairwise comparisons will be significant, then simply use a general null and alternative hypothesis.

Ho: No differences between cylinders in miles per gallon

Ha: Each comparison between cylinders will be different in miles per gallon

If you think only some of the comparisons will be different, then be specific like with the t-tests writing out the exact comparisons that you expect significant differences to occur on.

Unlike the t-tests reviewed above, the ANOVA function in R does not return the means by group, so we must calculate those separately using tidyverse code. The second thing to note is that the ANOVA results themselves do not contain the pairwise comparisons (i.e., tests for significant differences between each pair of grouping-variable values on the DV). We must run those separately, which is shown for both the Bonferroni and the Tukey adjustments in a later code chunk.

#Step 1: Calculate the mean of the DV by grouping variable
means<-mtcars %>% #Dataset goes here 
  group_by(cyl) %>% #Grouping variable goes here
  summarise(mean = mean(mpg), #DV goes inside the ()
            var = var(mpg), #Not necessary to calculate variance and SD but I always do 
            sd = sd(mpg), 
            n=n())

print(means)
## # A tibble: 3 × 5
##     cyl  mean   var    sd     n
##   <dbl> <dbl> <dbl> <dbl> <int>
## 1     4  26.7 20.3   4.51    11
## 2     6  19.7  2.11  1.45     7
## 3     8  15.1  6.55  2.56    14
#Step 2: Save ANOVA Results
anova_result <- aov(mpg ~ as.factor(cyl), data = mtcars) #The as.factor code tells R to make the variable inside the () a factor otherwise it will not calculate correctly

# Step 3: Display the ANOVA result
print(summary(anova_result))
##                Df Sum Sq Mean Sq F value   Pr(>F)    
## as.factor(cyl)  2  824.8   412.4    39.7 4.98e-09 ***
## Residuals      29  301.3    10.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

First, review the mean differences for the DV - here miles per gallon - by the independent variable - here the number of cylinders in the car. We see that for cars with 4 cylinders the average MPG is 26.7, for 6 cylinder cars it is 19.7 and for 8 cylinder cars it is 15.1. While these might seem like large differences, you cannot rely simply on the means to decide if you reject or fail to reject the null hypothesis. You must statistically assess these differences.

First, we review the overall ANOVA results, which show a very small p-value of 4.98e-09 - scientific notation for 0.00000000498, i.e., the decimal point shifted nine places to the left. This result tells us that, overall, there are significant differences in the mean of the DV between the levels of the independent variable. What it does not tell us is where those differences arise. Sometimes you will have a significant overall ANOVA result but with some of the pairwise comparisons not being significant. This is why it is important to examine the Bonferroni or Tukey's adjustments, so you can understand more specifically where the differences are. We do this in the next code chunk.

#Step 4: Add your pairwise comparisons
# Add Pairwise Comparison Tests - Tukey
tukey_result <- TukeyHSD(anova_result)
print(tukey_result)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = mpg ~ as.factor(cyl), data = mtcars)
## 
## $`as.factor(cyl)`
##           diff        lwr        upr     p adj
## 6-4  -6.920779 -10.769350 -3.0722086 0.0003424
## 8-4 -11.563636 -14.770779 -8.3564942 0.0000000
## 8-6  -4.642857  -8.327583 -0.9581313 0.0112287
# Add Pairwise Comparison Tests - Bonferroni
pairwise.t.test(mtcars$mpg, mtcars$cyl, 
                p.adjust.method = "bonferroni")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  mtcars$mpg and mtcars$cyl 
## 
##   4       6      
## 6 0.00036 -      
## 8 2.6e-09 0.01246
## 
## P value adjustment method: bonferroni

Both tests show the same information - pairwise comparison p-values for significant differences between groups - they just show it in different formats. There is a p-value for each comparison; since the IV has three values, there are three unique p-values. Each one should be interpreted independently.

The TukeyHSD test gives you the mean difference, 95% CI, and associated p-value for each comparison, while the Bonferroni adjustment gives you only the p-values. The TukeyHSD output also shows each comparison as its own row in a single table, while Bonferroni shows the comparisons in a matrix.

The p-values in both tests are interpreted in the same way as we have been discussing. If they are smaller than your predefined significance level, you would consider that relationship to be statistically significant. Here, in both tests, each p-value is < .05 (oftentimes quite a bit less), indicating that each pairwise comparison shows a statistically significant difference between groups.

As stated above, the Bonferroni adjustment will be more conservative (i.e., have larger p-values) compared to the TukeyHSD test, which we see in these results. Because there are only 3 groups, these two tests are very similar; however, as the number of comparisons increases, the differences between the two tests will grow.
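
For reference, the number of pairwise comparisons for k groups is choose(k, 2), so the comparison count - and with it the gap between the two corrections - grows quickly:

choose(3:8, 2) #Pairwise comparisons for 3 through 8 groups
## [1]  3  6 10 15 21 28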

We would want to put these results into less technical language if we were writing them into a report or presentation. Recall from above that for cars with 4 cylinders the average MPG is 26.7, for 6 cylinder cars it is 19.7, and for 8 cylinder cars it is 15.1. We would conclude that cars with a 4 cylinder engine get the best fuel economy and that, as the number of cylinders increases to 6 and then to 8, fuel economy gets significantly worse.
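
Finally, since ggplot2 is loaded above for graphing, a boxplot is a quick way to visualize these group differences before writing them up. A minimal sketch (the axis labels are just suggestions):

#Visualize the distribution of mpg by cylinder count
ggplot(mtcars, aes(x = as.factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(x = "Number of cylinders", y = "Miles per gallon")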