Case study 2: Independent samples t-test

Foreword: The Elementary Statistics for Medical Students (ESMS) project

You are reading the second tutorial of ESMS project. The main objective of this project is to help Vietnamese medical students in developing their skills of data analysis. We provide the problem-based tutorials, each one will show the student how to resolve a common study question using the appropriate methods in R.

We aknowledge the fact that learning a new programming language likes R might be difficult for students, whilst their faculty have been using commercial packages such as SPSS or STATA for years. However on the long run, R is definitively better for the learner’s own benefits. By writing down a statistical procedure in R, we can understand our method in deeper level and take control of verything, down to the smallest detail.

Introduction

T-test is a general term that covers all statistical hypothesis test that based on a Student’s t distribution (as introduced by William Sealy Gosset in 1908 under the pseudonym of Student). The present tutorial only focus on the Independent samples t-test. This method is widely used by medical students to determine whether a difference exists between the mean of two independent samples. This comparison task could also be extended as a generalized linear regression analysis, in which a linear model is fitted for a continuous outcome in function of a two-levels factor in order to assess the effect of this factor on the outcome variance. Such linear model may also include covariates (a special form of ANCOVA).

This tutorial will show you how to carry out an Independent sample t-test in R. We also introduce a bootstraping procedure for t-test, as well as how to interpret and report the results.

Context

Our study was conducted in a hospital’s Intensive Care Unit (ICU). Systolic blood pressure (SysBP, in mmHg), Heart rate (beat per minute) and other clinical data including suspected Infection, admission type and survival outcomes were recorded in 200 patients. In this tutorial we will consider the SysBP and Infection as dependent and independent variables, respectively. Our study question is whether the SysBP was significantly lower in patients with Infection, compared to the patient without infection. Such question could be handled by a Student’s t-test.

How the Independent t-test works ?

First, we assume a null hypothesis (H0) that: the means of the two groups are equal (i.e the difference between two means is zero); thus our alternative hypothesis (HA) is that the means of the two groups are not equal (so the difference between them could be greater or lower than zero).

Then we will determine a significance level (p-value) which is the probability of group means being at least as high as observed in our data, given that the null hypothesis is indeed true. If such probability is small enough (based on a threshold, such as 0.05), we can reject our Null hypothesis and thus accept the alternative one.

Once the probability is higher than 0.05, we fail to reject the null hypothesis (but we cannot neither accept it), so there might be a high risk that our alternative hypothesis being false.

Data preparation

The dataset could be downloaded from the famous vincentarebundock database. Then we modified the labels of some categorical variables in our data.

library(tidyverse)

df=read.csv("http://vincentarelbundock.github.io/Rdatasets/csv/Stat2Data/ICU.csv")%>%as_tibble()

df$Emergency%<>%as.factor()%>%recode_factor(.,`0` = "Elective", `1` = "Emergency")
df$Survive%<>%as.factor()%>%recode_factor(.,`0` = "Dead", `1` = "Survived")
df$AgeGroup%<>%as.factor()%>%recode_factor(.,`1` = "Young", `2` = "Middle",`3` = "Old")
df$Infection%<>%as.factor()%>%recode_factor(.,`0` = "No_infection", `1` = "With_infection")
df$Sex%<>%as.factor()%>%recode_factor(.,`0` = "Male", `1` = "Female")

df%>%head()%>%knitr::kable()

X	ID	Survive	Age	AgeGroup	Sex	Infection	SysBP	Pulse	Emergency
1	4	Dead	87	Old	Female	With_infection	80	96	Emergency
2	8	Survived	27	Young	Female	With_infection	142	88	Emergency
3	12	Survived	59	Middle	Male	No_infection	112	80	Emergency
4	14	Survived	77	Old	Male	No_infection	100	70	Elective
5	27	Dead	76	Old	Female	With_infection	128	90	Emergency
6	28	Survived	54	Middle	Male	With_infection	142	103	Emergency

Step 1: Exploring data and Testing of Assumptions

Prior to performing an independent samples t-test, we should verify following assumptions:

1. The outcome was measured on a continuous scale. Some clinical scores can also be acceptable if they are normally distributed. Age, time and ratios are also acceptable. Conventional t-test is not recommended for count data, as they usually follow specific distribution families, like Poisson or Negative binomial. Percentage and other variables that vary in the (0-1) interval require Beta regression analysis.
1. The factor consists of a dichotomous variable. This could be interpreted as “conditions”, “group”, “categories” or “levels”. Our factor could be result of either data splitting or prospective sampling.
1. The observations should be independent, i.e there was no relationship between the observations in each subgroup or between two samples. Though in most of cases, such assumption can be easily verified (Case control design, or pure dichotomous variables like gender, age range, treatment mode…), sometimes it could be complicated as the individuals in both groups could have been influenced by a third, unknown factor. In such cases, we must consider other methods such as ANCOVA, multivariate models or moderating, mediating effect analysis.
1. There must be no significant outliers in the two groups. Outliers may consist the extremely low or high values compared to other scores in our data. The effect of outliers on the statistics could be negative or positive, but in both cases the statistical inference is no more reliable. Outliers could be visually detected using the Box-plot.

-5) The outcome must be approximately normally distributed. This assumption is only relative, as the comparison is also acceptable if both samples are skewed in the same way.

1. The variance must be homogenous in each sample. This could be verified using Levene test

Table 1: Descriptive analysis

psych::describeBy(df$SysBP,df$Infection)

## $No_infection
##    vars   n   mean    sd median trimmed   mad min max range skew kurtosis
## X1    1 116 138.67 32.71    134  136.63 32.62  66 256   190 0.64      0.8
##      se
## X1 3.04
## 
## $With_infection
##    vars  n   mean    sd median trimmed   mad min max range  skew kurtosis
## X1    1 84 123.45 31.39    124  124.38 24.46  36 200   164 -0.26     0.39
##      se
## X1 3.42
## 
## attr(,"call")
## by.default(data = x, INDICES = group, FUN = describe, type = type)

Checking for Normal distribution

The normal distribution of data can be verified by QQ-plot and/or statistical tests.

qqplot=function(x){
qqnorm(x)
qqline(x, col = "red")
}

df$SysBP%>%split(df$Infection)%>%map(~qqplot(.))

## $No_infection
## NULL
## 
## $With_infection
## NULL

library(fBasics)

df$SysBP%>%split(df$Infection)%>%map(~dagoTest(.))

## $No_infection
## 
## Title:
##  D'Agostino Normality Test
## 
## Test Results:
##   STATISTIC:
##     Chi2 | Omnibus: 11.1662
##     Z3  | Skewness: 2.8044
##     Z4  | Kurtosis: 1.817
##   P VALUE:
##     Omnibus  Test: 0.003761 
##     Skewness Test: 0.005041 
##     Kurtosis Test: 0.06921 
## 
## Description:
##  Fri Feb 24 00:04:12 2017 by user: Admin
## 
## 
## $With_infection
## 
## Title:
##  D'Agostino Normality Test
## 
## Test Results:
##   STATISTIC:
##     Chi2 | Omnibus: 2.4326
##     Z3  | Skewness: -1.0363
##     Z4  | Kurtosis: 1.1656
##   P VALUE:
##     Omnibus  Test: 0.2963 
##     Skewness Test: 0.3001 
##     Kurtosis Test: 0.2438 
## 
## Description:
##  Fri Feb 24 00:04:12 2017 by user: Admin

df$SysBP%>%split(df$Infection)%>%map(~shapiro.test(.))

## $No_infection
## 
##  Shapiro-Wilk normality test
## 
## data:  .
## W = 0.97222, p-value = 0.0164
## 
## 
## $With_infection
## 
##  Shapiro-Wilk normality test
## 
## data:  .
## W = 0.97995, p-value = 0.2148

Those results indicate that the distribution of systolic BP is only normal in Infection group (p>0.05) but not in the control group (as the outcome was more negatively skewed).

However, by inspecting the density curves, we found that this violation is tolerable. The curve showed an approximately normal distribution in both groups. Though there were some outliers in the data, we decided to retain the original data as the number of outliers was relatively small and most of observations were well distributed around the median.

df%>%ggplot(aes(x=SysBP,fill=Infection))+geom_density(alpha=0.5)+theme_bw()+scale_fill_manual(values=c("grey80","grey20"))

df%>%ggplot(aes(x=Infection,y=SysBP,fill=Infection))+geom_jitter(size=3,alpha=0.3,shape=21,color="black")+geom_boxplot(alpha=0.7,show.legend = F)+theme_bw()+scale_fill_manual(values=c("grey80","grey20"))+coord_flip()

Step 2: Performing Independent sample t test

This step consists of performing our t-test. This method is supported by the basic R. However, before doing this we would like to verify the 6th assumption about homogenous variance using Levene test.

aov(SysBP ~ Infection, data=df)%>%car::leveneTest()

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   1  0.0897 0.7648
##       198

The Levene test for variance homogeneity confirms that our data met the 6th Assumption

There are 2 important arguments in the function that could change our study design and the content of our alternative hypothesis :

alternative could be set as “two.side” if we don’t want to check whether the difference is negative or positive. By setting this argument to “greater”, our question will be whether the Control group would has SysBP higher than that in Infection group.

The paired argument could be ignored now, but to warrant a correct study design (independent sample t test), we set this as ‘FALSE’. When the paired is TRUE, the test consists of compring the mean value of SysBP at two time points on the SAME patients, which is not our situation.

t.test(SysBP ~ Infection, alternative="two.sided", paired=FALSE, data=df)

## 
##  Welch Two Sample t-test
## 
## data:  SysBP by Infection
## t = 3.3252, df = 183.13, p-value = 0.001067
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   6.189174 24.250892
## sample estimates:
##   mean in group No_infection mean in group With_infection 
##                     138.6724                     123.4524

t.test(SysBP ~ Infection, alternative="greater", paired=FALSE, data=df)

## 
##  Welch Two Sample t-test
## 
## data:  SysBP by Infection
## t = 3.3252, df = 183.13, p-value = 0.0005337
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  7.652902      Inf
## sample estimates:
##   mean in group No_infection mean in group With_infection 
##                     138.6724                     123.4524

The output of t test provides following information:

The name of test (Two samples t-test) and the t value indicate that we are using t-distribution for comparing two samples.

df Indicates the degrees of freedom, which is calculated by N-2 t=3.3252 indicates the value of t statistic

The 95% CI represents the confidence interval for the difference in mean SysBP between two samples.

Effect size of the t-test

Likes all hypothesis statistical test, the independent t-test can only confirm whether a difference between two sample might exist. However, the p-value cannot provide any idea about the size of difference (or the importance of factor’s effect on the outcome).

This could be measured via the “effect size”. In this tutorial we will introduce all 3 types of effect-size measures that could be used to assess the effect of a two-levels factor on the outcome. These measures include: Cohen’s d (Jacob Cohen,1988), Glass’s Delta (Gene V Glass, 1976) and Hedge’s g (Larry Hedges, 1981).

effectsize=function(outcome,factor,data){
  d=data
  n1=Hmisc::describe(d[factor])%>%.[1]%>%.[[factor]]%>%.$values%>%.$frequency%>%.[1]
  n2=Hmisc::describe(d[factor])%>%.[1]%>%.[[factor]]%>%.$values%>%.$frequency%>%.[2]
  var=d[outcome]%>%split(d[factor])%>%map(~var(.))
  v1=var[[1]]%>%.[1]
  v2=var[[2]]%>%.[1]
  m1=d[outcome]%>%split(d[factor])%>%map(~psych::describe(.))%>%.[1]%>%.[[1]]%>%.$mean
  m2=d[outcome]%>%split(d[factor])%>%map(~psych::describe(.))%>%.[2]%>%.[[1]]%>%.$mean
  sd2=d[outcome]%>%split(d[factor])%>%map(~psych::describe(.))%>%.[2]%>%.[[1]]%>%.$sd
  s=sqrt(((n1-1)*(v1)+(n2-1)*(v2))/(n1+n2-2))
  sdPool=sqrt(((n1-1)*v1+(n2-1)*v2)/(n1+n2-2))
  cohend=((m1-m2)/sdPool)
  glassdelta=(m1-m2)/sd2 
  hedgeG=((m1-m2)/s)*(1- (3/(4*(n1+n2)-9)))
cat("Effect size for Independent t test","\n")
cat("Cohen's d value=",cohend,"\n")
cat("Glass's delta value=",glassdelta,"\n")
cat("Hedge's G=",hedgeG,"\n")
  }

effectsize(outcome="SysBP",factor="Infection",data=df)

## Effect size for Independent t test 
## Cohen's d value= 0.4732461 
## Glass's delta value= 0.4848978 
## Hedge's G= 0.4714513

Step 3: Bootstraping a t-test

Bootstraping is not mandatory for a routine analysis as this might require more skills in function writing and extracting information from the output. However, you might find it helpful when handling a dataset with limited sample size and/or more outliers.

First, we should write down a statistical function called bootT, it will provide the F-test, etasquared and Omegasquared for our resampled data.

The we call the boot package and perform a bootstrap with our function:

bootT=function(outcome,factor,data,alternative,i){
  d= data[i,]
  out=t.test(get(outcome)~get(factor),alternative=alternative,paired=FALSE,data=d)
  tv=out$statistic[[1]]
  pv=out$p.value
  md=out$estimate[[1]]-out$estimate[[2]]
  n1=Hmisc::describe(d[factor])%>%.[1]%>%.[[factor]]%>%.$values%>%.$frequency%>%.[1]
  n2=Hmisc::describe(d[factor])%>%.[1]%>%.[[factor]]%>%.$values%>%.$frequency%>%.[2]
  var=d[outcome]%>%split(d[factor])%>%map(~var(.))
  v1=var[[1]]%>%.[1]
  v2=var[[2]]%>%.[1]
  m1=d[outcome]%>%split(d[factor])%>%map(~psych::describe(.))%>%.[1]%>%.[[1]]%>%.$mean
  m2=d[outcome]%>%split(d[factor])%>%map(~psych::describe(.))%>%.[2]%>%.[[1]]%>%.$mean
  sd2=d[outcome]%>%split(d[factor])%>%map(~psych::describe(.))%>%.[2]%>%.[[1]]%>%.$sd
  s=sqrt(((n1-1)*(v1)+(n2-1)*(v2))/(n1+n2-2))
  sdPool=sqrt(((n1-1)*v1+(n2-1)*v2)/(n1+n2-2))
  cohend=((m1-m2)/sdPool)
  glassdelta=(m1-m2)/sd2 
  hedgeG=((m1-m2)/s)*(1- (3/(4*(n1+n2)-9)))
  return=cbind(md,tv,pv,cohend,glassdelta,hedgeG)
}

set.seed(123)
library(boot)
res=boot(statistic=bootT,outcome="SysBP",factor="Infection",alternative="two.sided",data=df,R=1000)%>%.$t%>%as_tibble()

names(res)=c("Mean_dif","t_value","p_value","Cohen_d","Glass_delta","Hedge_g")
res$iteration=c(1:nrow(res))

md=res%>%ggplot(aes(x=Mean_dif))+geom_histogram(color="black",fill="gold",alpha=0.8)+theme_bw()+geom_vline(xintercept=0,size=0.8,linetype="dashed",color="red4")
pv=res%>%ggplot(aes(x=p_value))+geom_histogram(color="black",fill="gold",alpha=0.8)+theme_bw()+geom_vline(xintercept=0.05,size=0.8,linetype="dashed",color="red4")+scale_x_continuous(limits=c(0,0.1))
chd=res%>%ggplot(aes(x=Cohen_d))+geom_histogram(color="black",fill="gold",alpha=0.8)+theme_bw()+geom_vline(xintercept=0.5,size=0.8,linetype="dashed",color="red4")
tv=res%>%ggplot(aes(x=t_value))+geom_histogram(color="black",fill="gold",alpha=0.8)+theme_bw()+geom_vline(xintercept=0.5,size=0.8,linetype="dashed",color="red4")

library(gridExtra)
grid.arrange(md,tv,pv,chd,ncol=2)

p1=res%>%ggplot(aes(x=iteration,y=Mean_dif))+geom_path(color="skyblue",size=0.5,alpha=0.8)+theme_bw()+geom_hline(yintercept=0,size=0.8,linetype="dashed",color="red4")
p2=res%>%ggplot(aes(x=iteration,y=p_value))+geom_path(color="red",size=0.5,alpha=0.6)+theme_bw()+geom_hline(yintercept=0.05,size=0.8,linetype="dashed",color="red4")+scale_y_continuous(limits=c(0,0.1))
p3=res%>%ggplot(aes(x=iteration,y=Cohen_d))+geom_path(color="violet",size=0.5,alpha=0.7)+theme_bw()+geom_hline(yintercept=0.5,size=0.8,linetype="dashed",color="red4")+geom_hline(yintercept=0.2,size=0.8,linetype="dashed",color="red2")+geom_hline(yintercept=0.8,size=0.8,linetype="dashed",color="red2")

grid.arrange(p1,p2,p3)

Hmisc::describe(res[,-7])

## res[, -7] 
## 
##  6  Variables      1000  Observations
## ---------------------------------------------------------------------------
## Mean_dif 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1000        0     1000        1    15.19    5.076    8.426    9.351 
##      .25      .50      .75      .90      .95 
##   11.823   15.157   18.316   20.943   22.338 
## 
## lowest :  1.901425  3.079167  3.678571  3.715620  3.730349
## highest: 28.016420 28.212158 28.236531 28.352000 30.940976
## ---------------------------------------------------------------------------
## t_value 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1000        0     1000        1    3.326    1.082    1.825    2.109 
##      .25      .50      .75      .90      .95 
##    2.638    3.342    3.979    4.517    4.911 
## 
## lowest : 0.4645500 0.6764279 0.7914471 0.8084723 0.8639911
## highest: 5.8327276 5.9289026 6.3084548 6.6720448 6.7438653
## ---------------------------------------------------------------------------
## p_value 
##         n   missing  distinct      Info      Mean       Gmd       .05 
##      1000         0      1000         1   0.01455   0.02443 2.039e-06 
##       .10       .25       .50       .75       .90       .95 
## 1.098e-05 9.974e-05 1.016e-03 9.148e-03 3.632e-02 6.981e-02 
## 
## lowest : 1.748234e-10 2.580845e-10 1.865078e-09 1.847772e-08 2.375263e-08
## highest: 3.887575e-01 4.198262e-01 4.299838e-01 4.997410e-01 6.428143e-01
## ---------------------------------------------------------------------------
## Cohen_d 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1000        0     1000        1   0.4741   0.1533   0.2608   0.2994 
##      .25      .50      .75      .90      .95 
##   0.3739   0.4758   0.5666   0.6411   0.6964 
## 
## lowest : 0.06083892 0.09908121 0.11451080 0.11871247 0.12209921
## highest: 0.85991575 0.86432355 0.87774117 0.94583365 0.95701867
## ---------------------------------------------------------------------------
## Glass_delta 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1000        0     1000        1   0.4898   0.1652   0.2659   0.3035 
##      .25      .50      .75      .90      .95 
##   0.3851   0.4925   0.5856   0.6769   0.7297 
## 
## lowest : 0.08405683 0.09493313 0.11046839 0.11766258 0.13147488
## highest: 0.93610961 0.95550654 0.98456545 0.98680599 1.00106522
## ---------------------------------------------------------------------------
## Hedge_g 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1000        0     1000        1   0.4723   0.1527   0.2598   0.2983 
##      .25      .50      .75      .90      .95 
##   0.3725   0.4740   0.5644   0.6386   0.6937 
## 
## lowest : 0.06060817 0.09870543 0.11407650 0.11826224 0.12163613
## highest: 0.85665437 0.86104546 0.87441219 0.94224642 0.95338902
## ---------------------------------------------------------------------------

The bootstrap result could be explored by either descriptive statistics or graphical tools.

Those graphs show the distribution of 4 main metrics in our t-test, including the mean difference, value of t statistic, p value or significance level of our test, and the Cohen’s d as effect size of our factor. They were evaluated for each t-test on 1000 different version of data (i.e 200,000 observations in total). The line graphs represent the evolution of these statistics through out 1000 iterations.

Step 5: Reporting the results

Based on our results, we could report the results of our study as follows:

“Data are expressed as mean ± standard deviation. The studied population included 84 patients with suspected Infection and 116 patients without any infection. The systolic blood pressure was significantly lower in patients with Infection compared to the other group (123.39 ± 31.39 vs 138.67 ± 32.71; t (183.13) = 3.3252, p (two-sided) <0.001)”

We can also report our result using the bootstrap output, that might generalise our results: The difference in mean sysBP between control and infection groups was consistently above zero. The upper limit of highest density interval for p-value was 0.069 which is a little bit higher than 0.05, suggesting that there is still a slight probability we could fail to reject the null hypothesis. The Cohen’s d varied from 0.26 (small effect) to 0.79 (large effect).

Exercise

Perform the same analysis but replacing the outcome by Heart rate.
Try to perform a t-test for comparing the sysBP between the patients who died and patients who survived.
Try to modify the bootstrap function to explore the mean of Systolic BP and Heart rate between two levels of any factor in the dataset. (Hint: the describeBy function might help)

Thank you

ESMS - Case study 2: Independent samples t-test

Lê Đông Nhật Nam

23 Feb 2017

Case study 2: Independent samples t-test