Comparing Two Independent Continuous Variables and Nonparametric Tests

Learning Objectives

Develop the ability to perform an independent two-sample t-test, with equal or unequal variances, on normally distributed continuous data.
Develop the ability to perform the Wilcoxon signed-rank test and the Wilcoxon rank-sum test on continuous data that are not normally distributed.

Sources

Rosner, Bernard. Fundamentals of Biostatistics. Cengage Learning, 2017
Triola, Mario F., and Laura Iossi. Elementary Statistics. Pearson, 2018

Load Libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(openxlsx)

Load data

setwd("E:/Biostat and Study Design/204/Lectures/Data")
NHANES_df <- openxlsx::read.xlsx('NHEFS.xlsx')

Errors in Hypothesis Testing

Even when we apply the proper statistical procedures, we might sometimes arrive at the wrong conclusion when rejecting or failing to reject the null hypothesis. The two possible errors are called Type I and Type II errors:

Type I error: The mistake of rejecting the null hypothesis when it is actually true. The probability of a Type I error is the significance level \(\alpha\).
Type II error: The mistake of failing to reject the null hypothesis when it is actually false. The probability of a Type II error is denoted by \(\beta\) and is equal to 1 - power.
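
To make these two error rates concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption and not part of the NHANES analysis: the seed, the group size of 30, the number of simulations, and the true difference of 0.5 standard deviations. It repeatedly runs a two-sample t-test, first when the null hypothesis is true and then when it is false.

```r
# Sketch: estimate Type I and Type II error rates by simulation.
# All settings below are illustrative assumptions.
set.seed(1)
alpha  <- 0.05
n_sims <- 5000

# H0 true: both groups share the same population mean
p_null <- replicate(n_sims, {
  x <- rnorm(30, mean = 0, sd = 1)
  y <- rnorm(30, mean = 0, sd = 1)
  t.test(x, y, var.equal = TRUE)$p.value
})
type1_rate <- mean(p_null < alpha)   # proportion of false rejections

# H0 false: the population means differ by 0.5
p_alt <- replicate(n_sims, {
  x <- rnorm(30, mean = 0,   sd = 1)
  y <- rnorm(30, mean = 0.5, sd = 1)
  t.test(x, y, var.equal = TRUE)$p.value
})
type2_rate <- mean(p_alt >= alpha)   # proportion of missed differences
power      <- 1 - type2_rate

c(type1 = type1_rate, type2 = type2_rate, power = power)
```

Under these assumptions the estimated Type I error rate should land close to \(\alpha\) = 0.05, while the Type II error rate depends on the sample size and on the size of the true difference.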

Comparing Two Samples

Two samples are independent if the sample values from one population are not related to, or somehow naturally paired or matched with, the sample values from the other population.

Two samples are dependent if the sample values are somehow matched, where the matching is based on some inherent relationship. For example, sample values consist of two measurements from the same subject (before and after data), or a pair of sample values consists of matched pairs such as husband/wife data.

Example 1: A P1 pharmacy student conducted a study to compare the mean age of registered pharmacists in San Diego vs. San Francisco. The P1 pharmacy student randomly surveyed registered pharmacists in San Diego and San Francisco via phone. This is an example of independent samples because they are not matched according to some inherent relationship.

Example 2: You discovered a new drug that could potentially reduce blood pressure in patients with hypertension. You conducted a clinical trial to evaluate the efficacy of the drug by measuring the participants’ blood pressure before and after receiving the drug. This is an example of paired samples because the measurements belong to the same person.

Independent Two-Sample T-Test

The independent two-sample t-test compares the means of two independent groups to determine whether there is statistical evidence that the corresponding population means differ. The following assumptions must be met for the independent two-sample t-test to be valid:

Data must be continuous.
Both samples are independent, which means that there is no relationship between the observations in one group and those in the other.
Neither sample contains significant outliers.
Data are approximately normally distributed, or both sample sizes are large (\(n_1\) ≥ 30 and \(n_2\) ≥ 30).

As a rule of thumb, we use the pooled-variance t-test as long as \(s_1/s_2<2\) and \(s_2/s_1<2\), where \(s_1,s_2\) are the standard deviations of the outcome in the two groups.
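
The sketch below shows what "pooled variance" means in practice. It uses simulated, hypothetical data (the vectors `g1` and `g2` are not from NHANES). The pooled variance is the weighted average \(s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\), and the test statistic is \(t = \frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{1/n_1+1/n_2}}\). We compute both by hand and compare the result with `t.test()`.

```r
# Sketch: pooled-variance t statistic by hand, on hypothetical data.
set.seed(2)
g1 <- rnorm(40, mean = 220, sd = 45)  # hypothetical group 1
g2 <- rnorm(35, mean = 215, sd = 40)  # hypothetical group 2

n1 <- length(g1); n2 <- length(g2)
s1 <- sd(g1);     s2 <- sd(g2)

# Rule of thumb from the text: pool only if neither SD is twice the other
pool_ok <- (s1 / s2 < 2) && (s2 / s1 < 2)

# Pooled variance: sample variances weighted by their degrees of freedom
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)

# t statistic using the pooled standard error
t_hand <- (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

# Same statistic from the built-in test
t_builtin <- unname(t.test(g1, g2, var.equal = TRUE)$statistic)

all.equal(t_hand, t_builtin)
```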

Example: As part of the NHANES study, serum cholesterol level (mg/100 mL) was collected in 1971 to determine the effect of cholesterol on health. Using a significance level of 0.05, determine if the mean cholesterol level is equal between males and females.

\({H_0}: \mu_{males} = \mu_{females}\)

\({H_1}: \mu_{males} \neq \mu_{females}\)

Check if the data is numeric

class(NHANES_df$cholesterol)

## [1] "numeric"

Let’s explore the data

by(NHANES_df$cholesterol, NHANES_df$sex, length)

## NHANES_df$sex: Female
## [1] 830
## ------------------------------------------------------------ 
## NHANES_df$sex: Male
## [1] 799

by(NHANES_df$cholesterol, NHANES_df$sex, summary)

## NHANES_df$sex: Female
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    78.0   188.0   218.0   221.6   252.0   377.0      11 
## ------------------------------------------------------------ 
## NHANES_df$sex: Male
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    79.0   189.0   215.0   218.3   242.0   416.0       5

Next, draw a few plots. We start with a boxplot of cholesterol level grouped by sex

NHANES_df$sex <- as.factor(NHANES_df$sex) #convert sex to categorical variable

NHANES_df %>%  ggplot(aes(y=cholesterol,x=sex)) +
    stat_boxplot(geom = 'errorbar',  width = 0.2) + 
    geom_boxplot(fill='deepskyblue',outlier.colour="red", outlier.size=4) +
    theme_light()

## Warning: Removed 16 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
## Removed 16 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

Next, we plot a histogram of cholesterol level grouped by sex.

NHANES_df %>% ggplot(aes(x=cholesterol)) +
    geom_histogram( bins=29, fill="deepskyblue", color="black") + 
    theme_light() + facet_grid(sex ~.)

## Warning: Removed 16 rows containing non-finite outside the scale range
## (`stat_bin()`).

We assess normality using Q-Q plot.

NHANES_df %>% ggplot(aes(sample = cholesterol)) +
  stat_qq_line(linewidth = 2, color = 'red') + # fixed colour is set outside aes()
  stat_qq(size = 2) +
  theme_light() +
  facet_grid(sex ~ .)

The plots suggest that males have slightly lower serum cholesterol than females and that the data are slightly right-skewed in both groups. Although the data are slightly skewed, both samples are large (\(n_1\) ≥ 30 and \(n_2\) ≥ 30), so the independent two-sample t-test is appropriate.

To assess if it is appropriate to use pooled variance, we calculate the standard deviations of cholesterol levels in males and females.

s1 <- sd(NHANES_df$cholesterol[NHANES_df$sex=='Male'],na.rm = TRUE)
s2 <- sd(NHANES_df$cholesterol[NHANES_df$sex=='Female'],na.rm = TRUE)

s1/s2

## [1] 0.9249073

s2/s1

## [1] 1.081189

Since both ratios are less than 2, we perform t-test with pooled variance.

t.test(cholesterol ~ sex,data=NHANES_df,var.equal=TRUE)

## 
##  Two Sample t-test
## 
## data:  cholesterol by sex
## t = 1.487, df = 1611, p-value = 0.1372
## alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
## 95 percent confidence interval:
##  -1.073375  7.801962
## sample estimates:
## mean in group Female   mean in group Male 
##             221.6300             218.2657

If s1/s2 or s2/s1 were ≥ 2, we would have used Welch’s correction (var.equal=FALSE).
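
As a minimal sketch of that alternative, the code below uses simulated, hypothetical data with deliberately unequal spreads (the vectors `a` and `b` and their parameters are assumptions, not NHANES values). It compares the pooled test with Welch's test, which is what `t.test()` runs by default when `var.equal` is not specified.

```r
# Sketch: pooled vs. Welch t-test on hypothetical data with unequal variances.
set.seed(3)
a <- rnorm(50, mean = 100, sd = 10)   # hypothetical group with small spread
b <- rnorm(50, mean = 100, sd = 40)   # hypothetical group with large spread

sd_ratio <- sd(b) / sd(a)             # expected to exceed 2 by construction

pooled <- t.test(a, b, var.equal = TRUE)
welch  <- t.test(a, b, var.equal = FALSE)  # Welch's correction (R's default)

pooled$parameter   # df = n1 + n2 - 2 = 98
welch$parameter    # Welch-Satterthwaite df, smaller than 98
```

Welch's test does not assume equal variances; it adjusts the standard error and the degrees of freedom, which is why its df is usually not a whole number.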

Interpretation: Since the P-Value (0.1372) is > 0.05, we fail to reject the null hypothesis. There is insufficient evidence to warrant rejection of the claim that males and females have equal mean serum cholesterol.

If the P-Value were ≤ 0.05, we would reject the null hypothesis and conclude that there is sufficient evidence that males and females have different mean serum cholesterol.

Nonparametric Tests

The z-test and t-test are examples of parametric tests. Parametric tests assume that the sample data come from a population that can be adequately modeled by a particular probability distribution, such as the normal distribution. Nonparametric tests are distribution-free tests that don’t require the samples to come from a population with a normal distribution or any other particular distribution.

Advantages of nonparametric tests:

They have less rigid requirements compared to parametric tests and they can be applied to a wider variety of situations.
They can be applied to more data types compared to parametric tests.

Disadvantages of nonparametric tests:

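They tend to waste information because they reduce exact numerical data to a qualitative form such as ranks or signs.
They are less efficient than parametric tests when the parametric assumptions hold, so they generally need stronger evidence (such as a larger sample or larger differences) to reject the null hypothesis.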

Wilcoxon Signed-Rank Test

The Wilcoxon signed-rank test can be used to test a claim that a single population of individual values has a median equal to some claimed value. Because it ranks the absolute differences between each value and the claimed median, the test takes the magnitudes of the differences into account and not just their signs, so it uses more of the information in the data.
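
To show how the ranks enter the test, here is a small sketch that computes the statistic V by hand and compares it with `wilcox.test()`. The ages in `x` are hypothetical values chosen for illustration, with no ties and no value equal to the claimed median.

```r
# Sketch: Wilcoxon signed-rank statistic V by hand, on hypothetical ages.
x   <- c(25, 31, 38, 45, 52, 57, 63, 70)  # hypothetical sample
mu0 <- 48                                  # claimed median

d     <- x - mu0            # differences from the claimed median
d     <- d[d != 0]          # zero differences are dropped
ranks <- rank(abs(d))       # rank the absolute differences
V     <- sum(ranks[d > 0])  # sum of the ranks of the positive differences

V_builtin <- unname(wilcox.test(x, mu = mu0)$statistic)
c(by_hand = V, wilcox.test = V_builtin)
```

If the claimed median is correct, positive and negative differences should share the ranks about evenly, so V should be near half of the total rank sum n(n + 1)/2.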

Example: Using a random sample of size 20 from the NHANES study, determine whether the study participants have a median age of 48 years.

\({H_0}: median=48\)

\({H_1}: median \neq\ 48\)

set.seed(700)
NHANES_df_20 <- NHANES_df %>% sample_n(20) #select a sample of 20 participants

Let’s explore the data!

summary(NHANES_df_20$age)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   25.00   29.75   38.50   41.80   51.00   74.00

Graph a histogram of age.

NHANES_df_20 %>% ggplot(aes(x=age)) +
    geom_histogram( binwidth=5, fill="deepskyblue", color="black") + 
    theme_light()

Graph a boxplot of age.

NHANES_df_20 %>%  ggplot(aes(y=age)) +
    stat_boxplot(geom = 'errorbar',  width = 0.2) + 
    geom_boxplot(fill='deepskyblue',outlier.colour="red", outlier.size=4) +
    theme_light()

Assess normality using a Q-Q plot.

NHANES_df_20 %>% ggplot(aes(sample = age)) +
  stat_qq_line(linewidth = 2, color = 'red') + # fixed colour is set outside aes()
  stat_qq(size = 2) +
  theme_light()

The data do not appear to be normally distributed and the sample size is < 30; therefore, a one-sample t-test is not appropriate and we instead use the Wilcoxon signed-rank test.

wilcox.test(NHANES_df_20$age,mu =48 ,exact=FALSE)

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  NHANES_df_20$age
## V = 54.5, p-value = 0.06177
## alternative hypothesis: true location is not equal to 48

Interpretation: Since the P-Value (0.06177) is > 0.05, we fail to reject the null hypothesis. There is insufficient evidence to warrant rejection of the claim that the median age of NHANES participants is 48 years.

Wilcoxon Rank-Sum Test

Wilcoxon rank-sum test uses ranks of values from two independent samples to test the null hypothesis that the samples are from populations having equal medians. The Wilcoxon rank-sum test is equivalent to the Mann-Whitney U test.

The basic idea underlying the Wilcoxon rank-sum test: If two samples are drawn from identical populations and the individual values are all ranked as one combined collection of values, then the high and low ranks should fall evenly between the two samples. If the low ranks are found predominantly in one sample and the high ranks are found predominantly in the other sample, we have an indication that the two populations have different medians.
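
The ranking idea above can be sketched in a few lines. The two age vectors are hypothetical values with no ties, chosen only for illustration. Note that the W reported by R's `wilcox.test()` is the rank sum of the first group minus its smallest possible value, \(n_1(n_1+1)/2\).

```r
# Sketch: Wilcoxon rank-sum statistic by hand, on hypothetical ages.
g1 <- c(29, 37, 42, 46, 51)       # hypothetical group 1
g2 <- c(25, 30, 34, 53, 60, 69)   # hypothetical group 2

ranks <- rank(c(g1, g2))              # rank all values as one combined sample
R1    <- sum(ranks[seq_along(g1)])    # rank sum for group 1
n1    <- length(g1)
W     <- R1 - n1 * (n1 + 1) / 2       # statistic as reported by wilcox.test()

W_builtin <- unname(wilcox.test(g1, g2)$statistic)
c(rank_sum = R1, W_by_hand = W, wilcox.test = W_builtin)
```

When the two populations are identical, W is expected to be near \(n_1 n_2/2\); values far from that suggest that one group tends to have higher ranks than the other.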

Example: Using a random sample of 20 participants from the NHANES study, determine if males and females have equal median age.

\({H_0}: median_{males} = median_{females}\)

\({H_1}: median_{males} \neq\ median_{females}\)

set.seed(600)
NHANES_df_20 <- NHANES_df %>% sample_n(20) # select a new random sample of 20 participants

Let’s evaluate the counts and proportions of males and females.

table(NHANES_df_20$sex)

## 
## Female   Male 
##      9     11

prop.table(table(NHANES_df_20$sex))

## 
## Female   Male 
##   0.45   0.55

Summarize data grouped by sex.

by(NHANES_df_20$age, NHANES_df_20$sex, summary)

## NHANES_df_20$sex: Female
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      29      37      42      41      46      51 
## ------------------------------------------------------------ 
## NHANES_df_20$sex: Male
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   25.00   29.00   34.00   41.45   53.50   69.00

Plot a histogram of age by sex.

NHANES_df_20 %>% ggplot(aes(x=age)) +
    geom_histogram( binwidth=5, fill="deepskyblue", color="black") + 
    theme_light() + facet_grid(sex ~.)

Plot a boxplot of age by sex.

NHANES_df_20 %>%  ggplot(aes(y=age,x=sex)) +
    stat_boxplot(geom = 'errorbar',  width = 0.2) + 
    geom_boxplot(fill='deepskyblue',outlier.colour="red", outlier.size=4) +
    theme_light()

Assess data using a Q-Q plot.

NHANES_df_20 %>% ggplot(aes(sample = age)) +
  stat_qq_line(linewidth = 2, color = 'red') + # fixed colour is set outside aes()
  stat_qq(size = 2) +
  theme_light() +
  facet_grid(sex ~ .)

The data do not appear to be normally distributed and both group sizes are < 30; therefore, a two-sample t-test is not appropriate and we instead use the Wilcoxon rank-sum test.

wilcox.test(age~sex,data = NHANES_df_20,exact=FALSE)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  age by sex
## W = 52, p-value = 0.879
## alternative hypothesis: true location shift is not equal to 0

Interpretation: Since the P-Value (0.879) is > 0.05, we fail to reject the null hypothesis. There is insufficient evidence to warrant rejection of the claim that males and females have equal median age in the NHANES study.