The unpaired two-samples t-test is used to compare the means of two independent groups.
For example, suppose that we have measured the weight of 100 individuals: 50 women (group A) and 50 men (group B). We want to know if the mean weight of women (mA) is significantly different from that of men (mB).
In this case, we have two unrelated (i.e., independent or unpaired) groups of samples. Therefore, it’s possible to use an independent t-test to evaluate whether the means are different.
Note that the unpaired two-samples t-test can be used only under certain conditions:
- the two groups of samples (A and B) being compared are normally distributed. This can be checked using the Shapiro-Wilk test.
- the variances of the two groups are equal. This can be checked using the F-test.
library(ggpubr)
t.test(x, y, alternative = "two.sided", var.equal = FALSE)

- x, y: numeric vectors
- alternative: the alternative hypothesis. Allowed values are "two.sided" (default), "greater" or "less".
- var.equal: a logical value indicating whether to treat the two variances as being equal. If TRUE, the pooled variance is used to estimate the variance; otherwise the Welch test is used.
Here, we’ll use an example data set, which contains the weight of 18 individuals (9 women and 9 men):
# Data in two numeric vectors
women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)
# Create a data frame
my_data <- data.frame(
group = rep(c("Woman", "Man"), each = 9),
weight = c(women_weight, men_weight)
)
We want to know whether the average weight of women differs from the average weight of men.
# Print all data
print(my_data)
## group weight
## 1 Woman 38.9
## 2 Woman 61.2
## 3 Woman 73.3
## 4 Woman 21.8
## 5 Woman 63.4
## 6 Woman 64.6
## 7 Woman 48.4
## 8 Woman 48.8
## 9 Woman 48.5
## 10 Man 67.8
## 11 Man 60.0
## 12 Man 63.4
## 13 Man 76.0
## 14 Man 89.4
## 15 Man 73.3
## 16 Man 67.3
## 17 Man 61.3
## 18 Man 62.4
It’s possible to compute summary statistics (mean and sd) by groups. The dplyr package can be used.
library(dplyr)
# Summary statistics by group
group_by(my_data, group) %>%
summarise(
count = n(),
mean = mean(weight, na.rm = TRUE),
sd = sd(weight, na.rm = TRUE)
)
## # A tibble: 2 x 4
## group count mean sd
## <fct> <int> <dbl> <dbl>
## 1 Man 9 69.0 9.38
## 2 Woman 9 52.1 15.6
# Plot weight by group and color by group
ggboxplot(my_data, x = "group", y = "weight",
color = "group", palette = c("#00AFBB", "#E7B800"),
ylab = "Weight", xlab = "Groups")
Assumption 1: Are the two samples independent? Yes, since the samples from men and women are not related.
Assumption 2: Do the data from each of the 2 groups follow a normal distribution? Use the Shapiro-Wilk normality test as described at: Normality Test in R.
- Null hypothesis: the data are normally distributed
- Alternative hypothesis: the data are not normally distributed
We’ll use the functions with() and shapiro.test() to compute Shapiro-Wilk test for each group of samples.
# Shapiro-Wilk normality test for Men's weights
with(my_data, shapiro.test(weight[group == "Man"]))
##
## Shapiro-Wilk normality test
##
## data: weight[group == "Man"]
## W = 0.86425, p-value = 0.1066
# Shapiro-Wilk normality test for Women's weights
with(my_data, shapiro.test(weight[group == "Woman"]))
##
## Shapiro-Wilk normality test
##
## data: weight[group == "Woman"]
## W = 0.94266, p-value = 0.6101
From the output, the two p-values are greater than the significance level 0.05, implying that the distribution of the data in each group is not significantly different from the normal distribution. In other words, we can assume normality.
Note that if the data are not normally distributed, it’s recommended to use the non-parametric two-samples Wilcoxon rank test.
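As a rough sketch of that non-parametric alternative, the base R function wilcox.test() accepts the same formula interface as t.test(). The data frame is rebuilt here only so that the snippet runs on its own. Setting exact = FALSE is an assumption made for this example: the sample contains tied values (63.4 and 73.3 appear in both groups), and an exact p-value cannot be computed with ties.

```r
# Rebuild the example data so this snippet is self-contained
women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)
my_data <- data.frame(
  group = rep(c("Woman", "Man"), each = 9),
  weight = c(women_weight, men_weight)
)

# Two-samples Wilcoxon rank test (unpaired);
# exact = FALSE requests the normal approximation because of ties
res_wilcox <- wilcox.test(weight ~ group, data = my_data, exact = FALSE)
res_wilcox
```

The returned object has the same structure as the result of t.test(), so the p-value is available as res_wilcox$p.value.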
Assumption 3: Do the two populations have the same variance? We’ll use the F-test to test for homogeneity of variances. This can be performed with the function var.test() as follows:
res.ftest <- var.test(weight ~ group, data = my_data)
res.ftest
##
## F test to compare two variances
##
## data: weight by group
## F = 0.36134, num df = 8, denom df = 8, p-value = 0.1714
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.08150656 1.60191315
## sample estimates:
## ratio of variances
## 0.3613398
The p-value of the F-test is p = 0.1713596, which is greater than the significance level alpha = 0.05. In conclusion, there is no significant difference between the variances of the two sets of data. Therefore, we can use the classic t-test, which assumes equality of the two variances.
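Had the F-test indicated unequal variances, the usual fallback would be the Welch t-test, which is what t.test() computes when var.equal = FALSE (its default). The following is a minimal sketch on the same example data, rebuilt here so the snippet runs on its own; the name res_welch is only illustrative.

```r
# Rebuild the example data so this snippet is self-contained
women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)
my_data <- data.frame(
  group = rep(c("Woman", "Man"), each = 9),
  weight = c(women_weight, men_weight)
)

# Welch two-samples t-test: variances are NOT assumed equal
res_welch <- t.test(weight ~ group, data = my_data, var.equal = FALSE)
res_welch

# Degrees of freedom are estimated from the data (Welch-Satterthwaite),
# so they are generally not a whole number
res_welch$parameter
```

With equal group sizes, as here, the Welch statistic coincides with the pooled-variance statistic; only the degrees of freedom, and therefore the p-value and confidence interval, differ.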
Question: Is there any significant difference between women’s and men’s weights?
# Compute t-test (method 1: data in two numeric vectors)
res <- t.test(women_weight, men_weight, var.equal = TRUE)
res
##
## Two Sample t-test
##
## data: women_weight and men_weight
## t = -2.7842, df = 16, p-value = 0.01327
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -29.748019 -4.029759
## sample estimates:
## mean of x mean of y
## 52.10000 68.98889
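To make the computation behind this output concrete, the pooled-variance t statistic can be reproduced by hand. This is a sketch using the standard textbook formula rather than code from t.test() itself; the variable names (pooled_var, t_manual, df_manual, p_manual) are illustrative.

```r
# Example data (same vectors as above)
women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)

n1 <- length(women_weight)
n2 <- length(men_weight)

# Pooled variance: weighted average of the two sample variances
pooled_var <- ((n1 - 1) * var(women_weight) + (n2 - 1) * var(men_weight)) /
  (n1 + n2 - 2)

# t statistic: difference in means divided by its standard error
t_manual <- (mean(women_weight) - mean(men_weight)) /
  sqrt(pooled_var * (1 / n1 + 1 / n2))

# Degrees of freedom and two-sided p-value from the t distribution
df_manual <- n1 + n2 - 2
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

c(t = t_manual, df = df_manual, p.value = p_manual)
```

The three values should agree with the t, df and p-value lines printed by t.test(women_weight, men_weight, var.equal = TRUE).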
# Compute t-test (method 2: data in a data frame, using a formula)
res <- t.test(weight ~ group, data = my_data, var.equal = TRUE)
res
##
## Two Sample t-test
##
## data: weight by group
## t = 2.7842, df = 16, p-value = 0.01327
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 4.029759 29.748019
## sample estimates:
## mean in group Man mean in group Woman
## 68.98889 52.10000
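In the result above:
- t is the t-test statistic value (t = 2.784),
- df is the degrees of freedom (df = 16),
- p-value is the p-value of the t-test (p-value = 0.01327),
- conf.int is the 95% confidence interval of the difference in means (conf.int = [4.0298, 29.748]),
- sample estimates are the mean values of the two samples (mean = 68.9888889, 52.1).

The sign of t and of the confidence interval is reversed compared with the first call because the formula interface orders the groups alphabetically (Man, then Woman), whereas the first call passed women_weight first.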
Note that:
- if you want to test whether the average men’s weight is less than the average women’s weight, type this:
t.test(weight ~ group, data = my_data, var.equal = TRUE, alternative = "less")
- or, if you want to test whether the average men’s weight is greater than the average women’s weight, type this:
t.test(weight ~ group, data = my_data, var.equal = TRUE, alternative = "greater")

## Interpretation of the result
The p-value of the test is 0.01327, which is less than the significance level alpha = 0.05. We can conclude that men’s average weight is significantly different from women’s average weight with a p-value = 0.01327.
The result of the t.test() function is a list containing the following components:
- statistic: the value of the t-test statistic
- parameter: the degrees of freedom for the t-test statistic
- p.value: the p-value for the test
- conf.int: a confidence interval for the mean (here, for the difference in means) appropriate to the specified alternative hypothesis
- estimate: the means of the two groups being compared (in the case of independent t-test) or the difference in means (in the case of paired t-test).
# printing the p-value
res$p.value
## [1] 0.0132656
# printing the mean
res$estimate
## mean in group Man mean in group Woman
## 68.98889 52.10000
# printing the confidence interval
res$conf.int
## [1] 4.029759 29.748019
## attr(,"conf.level")
## [1] 0.95
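These components can also be combined programmatically, for instance to build a one-line summary for a report. The sketch below is only one possible layout; the name report and the chosen rounding are illustrative assumptions.

```r
# Rebuild the example data and the test result so this snippet is self-contained
women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)
my_data <- data.frame(
  group = rep(c("Woman", "Man"), each = 9),
  weight = c(women_weight, men_weight)
)
res <- t.test(weight ~ group, data = my_data, var.equal = TRUE)

# Assemble a compact summary string from the components of the result
report <- sprintf(
  "t(%.0f) = %.3f, p = %.4f, 95%% CI [%.2f, %.2f]",
  res$parameter, res$statistic, res$p.value,
  res$conf.int[1], res$conf.int[2]
)
print(report)
```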