Agenda

We will learn about the following topics in this section.

Usual Preparation

## Load tidyverse
library(tidyverse)
## Load CSV file
framingham <- read_csv("./framingham.csv")

Hypothesis testing functions

t-test

The two-sample t-test is commonly used to test the hypothesis regarding mean difference between two independent groups. In R, this has the following syntax if we are comparing the mean BMI across SEX.

## Welch t-test is the default
(ttest_welch <- t.test(BMI ~ SEX, data = framingham))
## 
##  Welch Two Sample t-test
## 
## data:  BMI by SEX
## t = -4.8099, df = 4403.8, p-value = 0.00000156
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.8117584 -0.3416388
## sample estimates:
## mean in group 0 mean in group 1 
##        25.59288        26.16958
## Student t-test
(ttest_student <- t.test(BMI ~ SEX, data = framingham, var.equal = TRUE))
## 
##  Two Sample t-test
## 
## data:  BMI by SEX
## t = -4.6471, df = 4413, p-value = 0.000003464
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.8199941 -0.3334030
## sample estimates:
## mean in group 0 mean in group 1 
##        25.59288        26.16958

In R, most functions do not print the result, but return a result object that contains detail information, which then is printed. To examine what types of information is held in the result object, use the names() function.

names(ttest_welch)
## [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
## [6] "null.value"  "alternative" "method"      "data.name"

Elements can be extracted as follows.

## p-value using $ operator
ttest_welch$p.value
## [1] 0.00000156033
## t-statistic using [[]] operator
ttest_welch[["statistic"]]
##         t 
## -4.809922

The object created by BMI ~ SEX is of the “formula” class. it roughly means, “explain LHS by RHS”. “formula” objects are extensively used in regression modeling.

class(BMI ~ SEX)
## [1] "formula"

Tests for 2x2 tables

In epidemiology, 2x2 tables are often encountered. Given the raw data, one way to construct a 2x2 table is the following using xtabs(). The formula “~ SEX + CURSMOKE” means explain the LHS (implicitly count of observations) by SEX and CURSMOKE (current smoking status).

cross_tab1 <- xtabs( ~ SEX + CURSMOKE, data = framingham)
cross_tab1
##    CURSMOKE
## SEX    0    1
##   0 1484 1006
##   1  769 1175

If you already have a 2x2 table and want to manually enter it, it’s a bit clumsy…

cross_tab2 <- matrix(c(1484, 1006,
                       769, 1175),
                     ## Number of columns 2
                     ncol = 2,
                     ## Row-based entry
                     byrow = TRUE)
cross_tab2
##      [,1] [,2]
## [1,] 1484 1006
## [2,]  769 1175

\(\chi^2\) test

The \(\chi^2\) test can be conducted using the 2x2 table created above.

## Default is with continuity correction by subtraction of 0.5.
(chisq_correct <- chisq.test(cross_tab1))
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  cross_tab1
## X-squared = 174.63, df = 1, p-value < 2.2e-16
## Without continuity correction
(chisq_no_correct <- chisq.test(cross_tab1, correct = FALSE))
## 
##  Pearson's Chi-squared test
## 
## data:  cross_tab1
## X-squared = 175.43, df = 1, p-value < 2.2e-16

Check what elements the result object has using names().

Fisher’s exact test

Fisher’s exact test can use the same syntax.

fisher.test(cross_tab1)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  cross_tab1
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.993100 2.548945
## sample estimates:
## odds ratio 
##   2.253545

More 2x2 table tools

Different types of 2x2 tables exist depending on the type of assessment you are trying to do.

Exposure-outcome 2x2 table

This type of 2x2 tables are used to assess the association between an exposure of interest (environmental exposures, drugs, etc) and the outcome of interest (cancer, etc). Both variables have to be binary. Let’s examine the association between current smoking and death during the follow up. First create a two by two table.

tab_death_cursmoke <- xtabs( ~ CURSMOKE + DEATH, data = framingham)
tab_death_cursmoke
##         DEATH
## CURSMOKE    0    1
##        0 1491  762
##        1 1393  788

epiR::epi.2by2() function can be used to obtain measures that you will hear about in your epidemiology and biostatistics courses. This function takes a 2x2 table of specific format. The outcome occupies the columns, positive first and negative next. The rows are exposure status, positive first and negative next.

     Where method is ‘cohort.count’, ‘case.control’, or
     ‘cross.sectional’ and ‘outcome = as.columns’ the required 2 by 2
     table format is:

       ----      ----       ----       ----
                 Disease +  Disease -  Total
       ----      ----       ----       ----
       Expose +  a          b          a+b
       Expose -  c          d          c+d
       ----      ----       ----       ----
       Total     a+c        b+d        a+b+c+d
       ----      ----       ----       ----
library(epiR)
## Loading required package: survival
## Package epiR 0.9-82 is loaded
## Type help(epi.about) for summary information
## 
## Reorder columns and rows
tab_death_cursmoke[2:1, 2:1]
##         DEATH
## CURSMOKE    1    0
##        1  788 1393
##        0  762 1491
## epi.2by2()
res_2by2 <- epi.2by2(tab_death_cursmoke[2:1, 2:1], method = "cohort.count")
## Warning in N0 * (N0 + N1) * a: NAs produced by integer overflow
## Warning in N0 * N1 * (c + a): NAs produced by integer overflow

## Warning in N0 * N1 * (c + a): NAs produced by integer overflow
## Warning in N0 * (N0 + N1) * a: NAs produced by integer overflow
## Warning in N0 * N1 * (c + a): NAs produced by integer overflow

## Warning in N0 * N1 * (c + a): NAs produced by integer overflow
res_2by2
##              Outcome +    Outcome -      Total        Inc risk *
## Exposed +          788         1393       2181              36.1
## Exposed -          762         1491       2253              33.8
## Total             1550         2884       4434              35.0
##                  Odds
## Exposed +       0.566
## Exposed -       0.511
## Total           0.537
## 
## Point estimates and 95 % CIs:
## -------------------------------------------------------------------
## Inc risk ratio                               1.07 (0.99, 1.16)
## Odds ratio                                   1.11 (0.98, 1.25)
## Attrib risk *                                2.31 (-0.50, 5.12)
## Attrib risk in population *                  1.14 (-1.27, 3.54)
## Attrib fraction in exposed (%)               6.39 (-1.44, 13.61)
## Attrib fraction in population (%)            3.25 (-0.78, 7.12)
## -------------------------------------------------------------------
##  X2 test statistic: 2.598 p-value: 0.107
##  Wald confidence limits
##  * Outcomes per 100 population units
## Unformatted more detailed results
summary(res_2by2)
## $RR.strata.wald
##       est     lower    upper
## 1 1.06826 0.9858212 1.157592
## 
## $RR.strata.score
##       est lower upper
## 1 1.06826    NA    NA
## 
## $OR.strata.wald
##        est     lower    upper
## 1 1.106873 0.9782858 1.252362
## 
## $OR.strata.score
##        est     lower    upper
## 1 1.106873 0.9774559 1.253368
## 
## $OR.strata.cfield
##        est     lower    upper
## 1 1.106873 0.9782489 1.252407
## 
## $OR.strata.mle
##        est     lower    upper
## 1 1.106846 0.9763701 1.254817
## 
## $ARisk.strata.wald
##        est      lower    upper
## 1 2.308644 -0.4986352 5.115924
## 
## $ARisk.strata.score
##        est      lower    upper
## 1 2.308644 -0.4986573 5.114424
## 
## $PARisk.strata.wald
##        est     lower    upper
## 1 1.135578 -1.269871 3.541028
## 
## $PARisk.strata.piri
##        est     lower    upper
## 1 1.135578 -0.245687 2.516843
## 
## $AFRisk.strata.wald
##          est       lower     upper
## 1 0.06389788 -0.01438277 0.1361376
## 
## $PAFRisk.strata.wald
##          est        lower     upper
## 1 0.03248486 -0.007847079 0.0712028
## 
## $chisq.strata
##   test.statistic df   p.value
## 1       2.597764  1 0.1070146

Test-gold standard 2x2 table

This type of 2x2 tables are used when you are assessing performance of diagnostic tests. Here we use an example mentioned in the epiR package. The 2x2 table is similar except that instead of an exposure, we have a test as the rows.

## epiR package's example
## Scott et al. 2008, Table 1:
## A new diagnostic test was trialled on 1586 patients. Of 744 patients
## that were disease positive, 670 tested positive. Of 842 patients that
## were disease negative, 640 tested negative. What is the likeliood
## ratio of a positive test? What is the number needed to diagnose?
dat <- matrix(c(670, 202, 74, 640), nrow = 2, byrow = TRUE)
colnames(dat) <- c("Dis+","Dis-")
rownames(dat) <- c("Test+","Test-")
dat
##       Dis+ Dis-
## Test+  670  202
## Test-   74  640
## epi.tests()
rval <- epi.tests(dat, conf.level = 0.95)
rval
##           Outcome +    Outcome -      Total
## Test +          670          202        872
## Test -           74          640        714
## Total           744          842       1586
## 
## Point estimates and 95 % CIs:
## ---------------------------------------------------------
## Apparent prevalence                    0.55 (0.52, 0.57)
## True prevalence                        0.47 (0.44, 0.49)
## Sensitivity                            0.90 (0.88, 0.92)
## Specificity                            0.76 (0.73, 0.79)
## Positive predictive value              0.77 (0.74, 0.80)
## Negative predictive value              0.90 (0.87, 0.92)
## Positive likelihood ratio              3.75 (3.32, 4.24)
## Negative likelihood ratio              0.13 (0.11, 0.16)
## ---------------------------------------------------------
## Unformatted more detailed results
summary(rval)
##                 est      lower      upper
## aprev     0.5498108  0.5249373  0.5744996
## tprev     0.4691047  0.4443055  0.4940184
## se        0.9005376  0.8767462  0.9210923
## sp        0.7600950  0.7297765  0.7885803
## diag.acc  0.8259773  0.8064049  0.8443346
## diag.or  28.6861119 21.5181917 38.2417364
## nnd       1.5137005  1.4091004  1.6487431
## youden    0.6606326  0.6065226  0.7096726
## ppv       0.7683486  0.7388926  0.7959784
## npv       0.8963585  0.8716393  0.9177402
## plr       3.7537262  3.3206884  4.2432346
## nlr       0.1308552  0.1050643  0.1629771