Unpaired_two_Sample_T

Unpaired two Sample t-

The unpaired two-samples t-test is used to compare the mean of two independent groups.

For example, suppose that we have measured the weight of 100 individuals: 50 women (group A) and 50 men (group B). We want to know if the mean weight of women (mA) is significantly different from that of men (mB).

In this case, we have two unrelated (i.e., independent or unpaired) groups of samples. Therefore, it’s possible to use an independent t-test to evaluate whether the means are different.

Note that, unpaired two-samples t-test can be used only under certain conditions:

when the two groups of samples (A and B), being compared, are normally distributed. This can be checked using Shapiro-Wilk test. and when the variances of the two groups are equal. This can be checked using F-test.

load package

library(ggpubr)

## Warning: package 'ggpubr' was built under R version 3.6.2

## Loading required package: ggplot2

## Registered S3 methods overwritten by 'ggplot2':
##   method         from 
##   [.quosures     rlang
##   c.quosures     rlang
##   print.quosures rlang

## Loading required package: magrittr

R Function

t.test(x, y, alternative = “two.sided”, var.equal = FALSE) x,y: numeric vectors alternative: the alternative hypothesis. Allowed value is one of “two.sided” (default), “greater” or “less”. var.equal: a logical variable indicating whether to treat the two variances as being equal. If TRUE then the pooled variance is used to estimate the variance otherwise the Welch test is used.

Here, we’ll use an example data set, which contains the weight of 18 individuals (9 women and 9 men):

# Data in two numeric vectors
women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4) 
# Create a data frame
my_data <- data.frame( 
                group = rep(c("Woman", "Man"), each = 9),
                weight = c(women_weight,  men_weight)
                )

We want to know, if the average women’s weight differs from the average men’s weight?

Check your data

# Print all data
print(my_data)

##    group weight
## 1  Woman   38.9
## 2  Woman   61.2
## 3  Woman   73.3
## 4  Woman   21.8
## 5  Woman   63.4
## 6  Woman   64.6
## 7  Woman   48.4
## 8  Woman   48.8
## 9  Woman   48.5
## 10   Man   67.8
## 11   Man   60.0
## 12   Man   63.4
## 13   Man   76.0
## 14   Man   89.4
## 15   Man   73.3
## 16   Man   67.3
## 17   Man   61.3
## 18   Man   62.4

It’s possible to compute summary statistics (mean and sd) by groups. The dplyr package can be used.

library(dplyr)

## Warning: package 'dplyr' was built under R version 3.6.3

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

group summary

group_by(my_data, group) %>%
  summarise(
    count = n(),
    mean = mean(weight, na.rm = TRUE),
    sd = sd(weight, na.rm = TRUE)
  )

## # A tibble: 2 x 4
##   group count  mean    sd
##   <fct> <int> <dbl> <dbl>
## 1 Man       9  69.0  9.38
## 2 Woman     9  52.1 15.6

Visualize your data using box plot

# Plot weight by group and color by group
ggboxplot(my_data, x = "group", y = "weight", 
          color = "group", palette = c("#00AFBB", "#E7B800"),
        ylab = "Weight", xlab = "Groups")

Preliminary test to check independent t-test assumption

Assumption 1: Are the two samples independents? Yes, since the samples from men and women are not related. Assumtion 2: Are the data from each of the 2 groups follow a normal distribution? Use Shapiro-Wilk normality test as described at: Normality Test in R. - Null hypothesis: the data are normally distributed - Alternative hypothesis: the data are not normally distributed

We’ll use the functions with() and shapiro.test() to compute Shapiro-Wilk test for each group of samples.

# Shapiro-Wilk normality test for Men's weights
with(my_data, shapiro.test(weight[group == "Man"]))

## 
##  Shapiro-Wilk normality test
## 
## data:  weight[group == "Man"]
## W = 0.86425, p-value = 0.1066

# Shapiro-Wilk normality test for Women's weights
with(my_data, shapiro.test(weight[group == "Woman"]))

## 
##  Shapiro-Wilk normality test
## 
## data:  weight[group == "Woman"]
## W = 0.94266, p-value = 0.6101

From the output, the two p-values are greater than the significance level 0.05 implying that the distribution of the data are not significantly different from the normal distribution. In other words, we can assume the normality.

Note that, if the data are not normally distributed, it’s recommended to use the non parametric two-samples Wilcoxon rank test.

Assumption 3. Do the two populations have the same variances? We’ll use F-test to test for homogeneity in variances. This can be performed with the function var.test() as follow:

res.ftest <- var.test(weight ~ group, data = my_data)
res.ftest

## 
##  F test to compare two variances
## 
## data:  weight by group
## F = 0.36134, num df = 8, denom df = 8, p-value = 0.1714
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.08150656 1.60191315
## sample estimates:
## ratio of variances 
##          0.3613398

The p-value of F-test is p = 0.1713596. It’s greater than the significance level alpha = 0.05. In conclusion, there is no significant difference between the variances of the two sets of data. Therefore, we can use the classic t-test witch assume equality of the two variances.

Compute unpaired two-sample t-test

Question : Is there any significant difference between women and men weights?

Compute independent t-test - Method 1: The data are saved in two different numeric vectors

# Compute t-test
res <- t.test(women_weight, men_weight, var.equal = TRUE)
res

## 
##  Two Sample t-test
## 
## data:  women_weight and men_weight
## t = -2.7842, df = 16, p-value = 0.01327
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -29.748019  -4.029759
## sample estimates:
## mean of x mean of y 
##  52.10000  68.98889

Compute independent t-test - Method 2: The data are saved in a data frame.

# Compute t-test
res <- t.test(weight ~ group, data = my_data, var.equal = TRUE)
res

## 
##  Two Sample t-test
## 
## data:  weight by group
## t = 2.7842, df = 16, p-value = 0.01327
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   4.029759 29.748019
## sample estimates:
##   mean in group Man mean in group Woman 
##            68.98889            52.10000

In the result above :

t is the t-test statistic value (t = 2.784), df is the degrees of freedom (df= 16), p-value is the significance level of the t-test (p-value = 0.01327). conf.int is the confidence interval of the mean at 95% (conf.int = [4.0298, 29.748]); sample estimates is he mean value of the sample (mean = 68.9888889, 52.1).

Note that:

if you want to test whether the average men’s weight is less than the average women’s weight, type this: t.test(weight ~ group, data = my_data, var.equal = TRUE, alternative = “less”) Or, if you want to test whether the average men’s weight is greater than the average women’s weight, type this t.test(weight ~ group, data = my_data, var.equal = TRUE, alternative = “greater”) ## Interpretation of Result

The p-value of the test is 0.01327, which is less than the significance level alpha = 0.05. We can conclude that men’s average weight is significantly different from women’s average weight with a p-value = 0.01327.

Access to the values returned by t.test()function

The result of t.test() function is a list containing the following components:

statistic: the value of the t test statistics parameter: the degrees of freedom for the t test statistics p.value: the p-value for the test conf.int: a confidence interval for the mean appropriate to the specified alternative hypothesis. estimate: the means of the two groups being compared (in the case of independent t test) or difference in means (in the case of paired t test).

# printing the p-value
res$p.value

## [1] 0.0132656

# printing the mean
res$estimate

##   mean in group Man mean in group Woman 
##            68.98889            52.10000

# printing the confidence interval
res$conf.int

## [1]  4.029759 29.748019
## attr(,"conf.level")
## [1] 0.95

Unpaired_two_Sample_T_test

Priyank Goyal

03/04/2020