04 statistical tools

Agenda

We will learn about the following topics in this section.

Probability related functions
- Probability density/mass functions
- Cumulative distribution functions
- Inverse cumulative distribution functions
- (Pseudo)random number generators
Hypothesis testing functions
- t-test
- Tests for 2x2 tables
More 2x2 table tools
- Exposure-outcome 2x2 table
- Test-gold standard 2x2 table

Usual Preparation

## Load tidyverse
library(tidyverse)
## Load CSV file
framingham <- read_csv("./framingham.csv")

Probability related functions

For each well-known distribution (e.g., norm, binom, t), R provides the following four functions, where DIST stands for the distribution's name.

dDIST: Probability density or mass function (pdf/pmf). Often denoted with \(f\).
pDIST: Cumulative distribution function (cdf). Often denoted with \(F\).
qDIST: Inverse cumulative distribution function (quantile function). Often denoted with \(F^{-1}\).
rDIST: (Pseudo)random number generator for the distribution.
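As a minimal sketch (base R only), here is how the four prefixes fit together for the standard normal distribution: qnorm() undoes pnorm(), and dnorm() is the slope of pnorm(). The evaluation point 1.5 and the step size h are arbitrary choices for illustration.

```r
## d, p, q, r for the standard normal distribution at an arbitrary point.
x0 <- 1.5
f_x0 <- dnorm(x0)          # density f(x0)
F_x0 <- pnorm(x0)          # cdf F(x0) = P(X <= x0)
x_back <- qnorm(F_x0)      # inverse cdf recovers x0
set.seed(1)
draws <- rnorm(5)          # five pseudorandom N(0, 1) draws

## The density is the derivative of the cdf (central difference approximation).
h <- 1e-5
slope <- (pnorm(x0 + h) - pnorm(x0 - h)) / (2 * h)
c(density = f_x0, numerical_slope = slope, recovered_x = x_back)
```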

Probability density/mass functions

A probability density function (continuous distributions) maps a real number to a nonnegative density in \([0,\infty)\). A probability mass function (discrete distributions) maps a real number to a probability in \([0,1]\).
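One consequence worth seeing once: a density can exceed 1, because only the area under it has to equal 1, whereas a pmf value never can. A quick base R check; sd = 0.1 and \(Binom(10, 0.2)\) are arbitrary choices for illustration.

```r
## A narrow normal density exceeds 1 at its peak...
peak <- dnorm(0, mean = 0, sd = 0.1)
## ...but the area under the curve (here over +/- 10 SDs) is still 1.
area <- integrate(dnorm, lower = -1, upper = 1, mean = 0, sd = 0.1)$value
## A pmf lists probabilities, so each value is in [0, 1] and they sum to 1.
pmf <- dbinom(0:10, size = 10, prob = 0.2)
c(peak = peak, area = area, pmf_max = max(pmf), pmf_total = sum(pmf))
```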

Normal density

The normal density function is dnorm(). Let’s evaluate it at some values. Without any additional arguments, it gives the standard normal density (mean 0, SD 1). Instead of typing c(-3,-2,-1,0,1,2,3), we can also write -3:3, which gives the same vector of integers.

dnorm(c(-3,-2,-1,0,1,2,3))

## [1] 0.004431848 0.053990967 0.241970725 0.398942280 0.241970725 0.053990967
## [7] 0.004431848
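As a check (base R only), dnorm() with no extra arguments matches the standard normal formula \(\phi(x) = e^{-x^2/2}/\sqrt{2\pi}\), and -3:3 matches the explicit vector.

```r
x <- -3:3
## Same values as c(-3, -2, -1, 0, 1, 2, 3).
all(x == c(-3, -2, -1, 0, 1, 2, 3))
## Standard normal density written out by hand.
f_manual <- exp(-x^2 / 2) / sqrt(2 * pi)
all.equal(dnorm(x), f_manual)
```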

The familiar standard normal density curve can be drawn by passing dnorm to stat_function().

## tibble(x = 0) is a dummy data object.
ggplot(data = tibble(x = 0), mapping = aes(x = x)) +
    ## geom_path is implicit.
    stat_function(fun = dnorm) +
    ## Widen the range of the X axis.
    scale_x_continuous(limits = c(-3,+3)) +
    ## Avoid the gray background.
    theme_bw()

The \(N(1,2^2)\) density can be evaluated by specifying the mean and sd arguments.

dnorm(-3:3, mean = 1, sd = 2)

## [1] 0.02699548 0.06475880 0.12098536 0.17603266 0.19947114 0.17603266
## [7] 0.12098536

It can be graphed as follows.

## tibble(x = 0) is a dummy data object.
ggplot(data = tibble(x = 0), mapping = aes(x = x)) +
    ## purrr's partial(dnorm, mean = 1, sd = 2) is the same as function(x){dnorm(x = x, mean = 1, sd = 2)}
    stat_function(fun = partial(dnorm, mean = 1, sd = 2)) +
    ## Widen the range of the X axis.
    scale_x_continuous(limits = c(-5,+5)) +
    ## Avoid the gray background.
    theme_bw()
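Any normal density is a rescaled standard normal density, \(f(x; \mu, \sigma) = \phi((x - \mu)/\sigma)/\sigma\). A short base R check with the same \(\mu = 1\) and \(\sigma = 2\):

```r
x <- -3:3
via_args <- dnorm(x, mean = 1, sd = 2)
## Standardize, evaluate the standard normal density, then divide by sigma.
via_standardization <- dnorm((x - 1) / 2) / 2
all.equal(via_args, via_standardization)
```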

Sometimes it’s helpful to create your own custom function. Let’s see how close the BMI distribution is to a normal distribution with the same mean and SD. The option na.rm = TRUE ignores missing values (NA) when calculating the summary; without it, the result would be NA.

## Outer parentheses force printing.
(bmi_mean <- mean(framingham$BMI, na.rm = TRUE))

## [1] 25.84616

(bmi_sd <- sd(framingham$BMI, na.rm = TRUE))

## [1] 4.101821

## Create a custom density function with the above mean and sd.
## We could also write bmi_dnorm <- partial(dnorm, mean = bmi_mean, sd = bmi_sd).
bmi_dnorm <- function(x) {
    dnorm(x = x, mean = bmi_mean, sd = bmi_sd)
}
bmi_dnorm

## function(x) {
##     dnorm(x = x, mean = bmi_mean, sd = bmi_sd)
## }
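The same idea can be sketched in base R on a small made-up vector (toy_bmi is hypothetical data, not from framingham.csv). It shows why na.rm = TRUE matters, and a hypothetical helper make_normal_density() that returns a density function with the mean and SD fixed, playing the same role as partial().

```r
## Hypothetical toy data with one missing value.
toy_bmi <- c(22.1, 27.4, NA, 31.0, 24.5)
mean(toy_bmi)                 # NA: one missing value makes the summary NA
toy_mean <- mean(toy_bmi, na.rm = TRUE)
toy_sd <- sd(toy_bmi, na.rm = TRUE)

## Hypothetical helper: a closure that fixes mean and sd.
make_normal_density <- function(mean, sd) {
    force(mean)
    force(sd)
    function(x) dnorm(x = x, mean = mean, sd = sd)
}
toy_dnorm <- make_normal_density(toy_mean, toy_sd)
toy_dnorm(25)
```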

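## Histogram of BMI with a normal density overlay.
ggplot(data = framingham, mapping = aes(x = BMI)) +
    ## First layer
    ## Force Y axis to be density rather than frequency (count).
    ## after_stat() replaces the older ..density.. notation (ggplot2 >= 3.3.0).
    geom_histogram(mapping = aes(y = after_stat(density))) +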
    ## Second layer
    ## Overlay normal distribution.
    stat_function(fun = bmi_dnorm) +
    theme_bw() + theme(legend.key = element_blank())

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 19 rows containing non-finite values (stat_bin).

Binomial probability

A binomial probability mass function is given by the dbinom function. Let’s examine \(Binom(10, 0.2)\). It is probably clearer to produce a dataset here.

## Generate a dataset holding x's and corresponding probabilities.
data1 <- tibble(x = seq(from = 0, to = 10, by = 1), # 0,1,...,10
                ## Corresponding probabilities.
                p = dbinom(x = x, size = 10, prob = 0.2))
data1

## # A tibble: 11 x 2
##        x            p
##    <dbl>        <dbl>
##  1     0 0.1073741824
##  2     1 0.2684354560
##  3     2 0.3019898880
##  4     3 0.2013265920
##  5     4 0.0880803840
##  6     5 0.0264241152
##  7     6 0.0055050240
##  8     7 0.0007864320
##  9     8 0.0000737280
## 10     9 0.0000040960
## 11    10 0.0000001024
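The values in this table follow the binomial pmf \(P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}\). A base R check for \(n = 10\), \(p = 0.2\):

```r
x <- 0:10
## Binomial pmf written out by hand.
manual <- choose(10, x) * 0.2^x * 0.8^(10 - x)
all.equal(dbinom(x, size = 10, prob = 0.2), manual)
sum(manual)   # probabilities over the full support sum to 1
```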

##
ggplot(data = data1, mapping = aes(x = x, y = p)) +
    ## Use the bar geometric object without usual statistical transformation.
    ## geom_col() is equivalent here.
    geom_bar(stat = "identity") +
    ## Force X axis ticks at all values of x.
    scale_x_continuous(breaks = data1$x) +
    theme_bw()

Cumulative distribution functions

Normal distribution

pnorm(q) gives the lower-tail probability \(P(X \le q)\). This is easier to understand if visualized as an area under the density curve.

## Area under the curve <= -1
pnorm(-1)

## [1] 0.1586553

## Visualize the area under the curve <= -1
## Create a dataset with fine increments for smooth plotting.
data2 <- tibble(x = seq(from = -5, to = +5, by = 0.001),
                d = dnorm(x = x))
ggplot(data = data2, mapping = aes(x = x, y = d)) +
    ## Entire curve
    geom_path() +
    ## Fill the desired area <= -1. Ribbon gives a colored band.
    geom_ribbon(data = filter(data2, x <= -1), # Use filtered data for this geom only.
                ## Constants are placed outside aes().
                alpha = 0.5,                   # Semi-transparent.
                ymin = 0,                      # Lower end at 0.
                ## Variables should be specified in aes().
                mapping = aes(ymax = d)) +     # Upper end at density.
    theme_bw()
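Since pnorm() is the area under dnorm() to the left of q, numerical integration should agree with it up to integrate()'s tolerance. A base R check:

```r
## Area under the standard normal density from -Inf to -1.
area <- integrate(dnorm, lower = -Inf, upper = -1)$value
c(integrate = area, pnorm = pnorm(-1))
```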

Binomial distribution

pbinom(q) also gives the lower-tail probability. With a discrete distribution, be careful with the bounds: pbinom(q) gives \(P(X \le q)\), which includes q, whereas 1 - pbinom(q), or equivalently pbinom(q, lower.tail = FALSE), gives \(P(X > q)\), which excludes q.

## Lower-tail probability including 1.
pbinom(q = 1, size = 10, prob = 0.2)

## [1] 0.3758096

## Summation sum() can be used with pmf.
sum(dbinom(x = 0:1, # This creates c(0,1) vector.
           size = 10, prob = 0.2))

## [1] 0.3758096

## Visualize reusing data1 above.
ggplot(data = data1, mapping = aes(x = x, y = p)) +
    ## Use the bar geometric object without usual statistical transformation.
    ## Blank bars.
    geom_bar(stat = "identity", fill = "white", color = "black") +
    ## Filled bars.
    geom_bar(data = filter(data1, x <= 1),
             stat = "identity", fill = "gray", color = "black") +
    ## Force X axis ticks at all values of x.
    scale_x_continuous(breaks = data1$x) +
    theme_bw()

## Upper-tail probability excluding 1.
pbinom(q = 1, size = 10, prob = 0.2, lower.tail = FALSE)

## [1] 0.6241904

## Summation sum() can be used with pmf.
sum(dbinom(x = 2:10, # This creates c(2,3,...,10) vector.
           size = 10, prob = 0.2))

## [1] 0.6241904

## Visualize reusing data1 above.
ggplot(data = data1, mapping = aes(x = x, y = p)) +
    ## Use the bar geometric object without usual statistical transformation.
    ## Blank bars.
    geom_bar(stat = "identity", fill = "white", color = "black") +
    ## Filled bars.
    geom_bar(data = filter(data1, x > 1),
             stat = "identity", fill = "gray", color = "black") +
    ## Force X axis ticks at all values of x.
    scale_x_continuous(breaks = data1$x) +
    theme_bw()
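Because the two tails partition the support, the inclusive lower tail and the exclusive upper tail always add to 1. For a strict inequality such as \(P(X < 2)\), shift q down by one. A base R check:

```r
lower_incl <- pbinom(q = 1, size = 10, prob = 0.2)                     # P(X <= 1)
upper_excl <- pbinom(q = 1, size = 10, prob = 0.2, lower.tail = FALSE) # P(X > 1)
lower_incl + upper_excl
## For integer-valued X, P(X < 2) is the same event as P(X <= 1).
strict_lt2 <- sum(dbinom(0:1, size = 10, prob = 0.2))
all.equal(strict_lt2, lower_incl)
```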

Inverse cumulative distribution functions

Normal distribution

Let’s find the value below which the lower-tail probability is 2.5% and the value above which the upper-tail probability is 5.0%. They correspond to the upper end of the blue region and the lower end of the red region in the plot below.

## Value at which the lower-tail probability is 2.5%.
qnorm(p = 0.025)

## [1] -1.959964

## Value at which the upper-tail probability is 5%.
qnorm(p = 0.05, lower.tail = FALSE)

## [1] 1.644854
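Because the normal distribution is symmetric, the upper-tail call can be written in other equivalent ways, and pnorm() undoes qnorm(). A base R check:

```r
upper5 <- qnorm(p = 0.05, lower.tail = FALSE)
## Three equivalent ways to get the upper 5% point.
c(upper5, qnorm(p = 0.95), -qnorm(p = 0.05))
## pnorm() undoes qnorm().
pnorm(qnorm(p = 0.025))
```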

ggplot(data = data2, mapping = aes(x = x, y = d)) +
    ## Entire curve
    geom_path() +
    ## Fill the lower 2.5% tail in blue. Ribbon gives a colored band.
    geom_ribbon(data = filter(data2, x <= qnorm(p = 0.025)), # Use filtered data for this geom only.
                ## Constants are placed outside aes().
                fill = "blue",
                alpha = 0.5,                   # Semi-transparent.
                ymin = 0,                      # Lower end at 0.
                ## Variables should be specified in aes().
                mapping = aes(ymax = d)) +     # Upper end at density.
    ## Fill the upper 5% tail in red.
    geom_ribbon(data = filter(data2, x > qnorm(p = 0.05, lower.tail = FALSE)),
                ## Constants are placed outside aes().
                fill = "red",
                alpha = 0.5,                   # Semi-transparent.
                ymin = 0,                      # Lower end at 0.
                ## Variables should be specified in aes().
                mapping = aes(ymax = d)) +     # Upper end at density.
    theme_bw()

Binomial distribution

In the discrete case, the qbinom() help page (type ?qbinom) defines the quantile as follows.

The quantile is defined as the smallest value \(x\) such that \(F(x) \ge p\), where \(F\) is the distribution function.

Now let’s examine what this means in the discrete case.

## Smallest x at which the lower-tail probability P(X <= x) is >= 0.025
qbinom(p = 0.025, size = 10, prob = 0.2)

## [1] 0

## Visualize reusing data1 above.
ggplot(data = data1, mapping = aes(x = x, y = p)) +
    ## Use the bar geometric object without usual statistical transformation.
    ## Blank bars.
    geom_bar(stat = "identity", fill = "white", color = "black") +
    ## Filled bars.
    geom_bar(data = filter(data1, x == 0),
             stat = "identity", fill = "blue", color = "black", alpha = 0.5) +
    ## Force X axis ticks at all values of x.
    scale_x_continuous(breaks = data1$x) +
    theme_bw()

The blue region, \(P(X = 0) \approx 0.107\), is already >= 0.025, so the quantile is 0.
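The help-page definition can be coded directly as a sketch: scan the support and keep the smallest x with \(F(x) \ge p\). The helper quantile_by_definition() is hypothetical and only for illustration; in practice use qbinom().

```r
## Hypothetical helper: smallest x in 0..size with F(x) >= p.
quantile_by_definition <- function(p, size, prob) {
    support <- 0:size
    cdf <- pbinom(support, size = size, prob = prob)
    min(support[cdf >= p])
}
ps <- c(0.025, 0.5, 0.95)
rbind(by_definition = sapply(ps, quantile_by_definition, size = 10, prob = 0.2),
      qbinom        = qbinom(ps, size = 10, prob = 0.2))
```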

## With lower.tail = FALSE: smallest x such that P(X > x) <= 0.05.
qbinom(p = 0.05, size = 10, prob = 0.2, lower.tail = FALSE)

## [1] 4

## P(X >= 5) = P(X > 4) is below 0.05: not enough.
sum(dbinom(5:10, size = 10, prob = 0.2))

## [1] 0.0327935

## P(X >= 4) = P(X > 3) is >= 0.05.
sum(dbinom(4:10, size = 10, prob = 0.2))

## [1] 0.1208739

## Visualize reusing data1 above.
ggplot(data = data1, mapping = aes(x = x, y = p)) +
    ## Use the bar geometric object without usual statistical transformation.
    ## Blank bars.
    geom_bar(stat = "identity", fill = "white", color = "black") +
    ## Filled bars.
    geom_bar(data = filter(data1, x >= 4),
             stat = "identity", fill = "red", color = "black", alpha = 0.5) +
    ## Force X axis ticks at all values of x.
    scale_x_continuous(breaks = data1$x) +
    theme_bw()

The red region \(X \ge 4\) has probability >= 0.05, while \(X \ge 5\) alone does not.
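With lower.tail = FALSE, qbinom() works on the upper tail; in this example it returns the same value as qbinom(0.95). A base R check that 4 is the smallest x with \(P(X > x) \le 0.05\):

```r
x_up <- qbinom(p = 0.05, size = 10, prob = 0.2, lower.tail = FALSE)
x_up
qbinom(p = 0.95, size = 10, prob = 0.2)
## P(X > 4) is at most 0.05, but P(X > 3) is not.
pbinom(q = x_up,     size = 10, prob = 0.2, lower.tail = FALSE)
pbinom(q = x_up - 1, size = 10, prob = 0.2, lower.tail = FALSE)
```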

(Pseudo)random number generators

Nothing a digital computer does is truly random. Thus, the “random numbers” that computers generate are really pseudorandom numbers, but for most purposes they are good enough. In R, (pseudo)random number generators are named rDIST. Because these numbers are pseudorandom, results can be made reproducible by setting the random number seed with set.seed().
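A minimal base R demonstration of reproducibility: resetting the seed replays the same stream, while continuing without a reset does not. The seed 613 simply matches the one used below.

```r
set.seed(seed = 613)
first <- rnorm(n = 3)
set.seed(seed = 613)
second <- rnorm(n = 3)
identical(first, second)   # same seed, same numbers
third <- rnorm(n = 3)      # continues the stream, so it differs
identical(first, third)
```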

Normal distribution

Let’s simulate \(X_i \sim N(\mu, \sigma^2)\) where \(\mu\) is the mean BMI and \(\sigma\) is the SD of BMI.

set.seed(seed = 613)
## Use rnorm
data3 <- tibble(X = rnorm(n = 3000, mean = bmi_mean, sd = bmi_sd))
data3

## # A tibble: 3,000 x 1
##           X
##       <dbl>
##  1 34.15806
##  2 20.67401
##  3 23.98550
##  4 26.48643
##  5 22.13409
##  6 27.05949
##  7 24.82091
##  8 26.85969
##  9 22.81506
## 10 18.42517
## # ... with 2,990 more rows

## Histogram
ggplot(data = data3, mapping = aes(x = X)) +
    geom_histogram() +
    theme_bw() + theme(legend.key = element_blank())

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
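As a rough check on the simulation, the sample mean and SD of 3000 draws should land close to the inputs, with the sample mean typically within a few standard errors \(\sigma/\sqrt{n}\). A self-contained sketch that hard-codes the rounded BMI mean and SD printed earlier:

```r
mu <- 25.84616       # rounded BMI mean printed above
sigma <- 4.101821    # rounded BMI SD printed above
set.seed(seed = 613)
X <- rnorm(n = 3000, mean = mu, sd = sigma)
c(sample_mean = mean(X), sample_sd = sd(X))
## Standard error of the sample mean.
sigma / sqrt(3000)
```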

Binomial distribution

Now let’s try \(X_i \sim Binom(10, 0.2)\).

data4 <- tibble(X = rbinom(n = 3000, size = 10, prob = 0.2))
data4

## # A tibble: 3,000 x 1
##        X
##    <int>
##  1     0
##  2     7
##  3     3
##  4     3
##  5     1
##  6     0
##  7     2
##  8     2
##  9     1
## 10     2
## # ... with 2,990 more rows

ggplot(data = data4, mapping = aes(x = X)) +
    ## Default stat is "count".
    geom_bar() +
    ## Force the limits to include 0,...,10.
    scale_x_continuous(breaks = 0:10, limits = c(-1,11)) +
    theme_bw() + theme(legend.key = element_blank())
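To connect the simulation back to dbinom(), one can compare observed proportions with the pmf. A base R sketch (the seed is an arbitrary choice):

```r
set.seed(seed = 613)
draws <- rbinom(n = 3000, size = 10, prob = 0.2)
## factor() with explicit levels keeps zero-count values in the table.
observed <- as.numeric(table(factor(draws, levels = 0:10))) / length(draws)
expected <- dbinom(0:10, size = 10, prob = 0.2)
round(cbind(x = 0:10, observed, expected), 3)
```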

Examining the sampling distribution of a sample mean

Let \(\bar{X}_{30} = \frac{1}{30}\sum_{i=1}^{30} X_i\), where the \(X_i \sim Binom(1, 0.2) = Bernoulli(0.2)\) are independent. What does the distribution of \(\bar{X}_{30}\) look like if we repeat the experiment 10,000 times?

## Create a vector to hold sample means.
Xbars30 <- numeric(length = 10^4)
## Repeat sample mean generation 10^4 times.
for (i in 1:10^4) {
    ## Assign the generated value to the i-th element of the vector.
    Xbars30[i] <- mean(rbinom(n = 30, size = 1, prob = 0.2))
}
## Check the moments.
mean(Xbars30)

## [1] 0.19919

var(Xbars30)

## [1] 0.00541044

## Plot frequencies
ggplot(data = tibble(Xbar = Xbars30), mapping = aes(x = Xbar)) +
    geom_bar() +
    scale_x_continuous(limits = c(0,1)) +
    theme_bw() + theme(legend.key = element_blank())

What if the sample size is n = 100 each time?

## Create a vector to hold sample means.
Xbars100 <- numeric(length = 10^4)
## Repeat sample mean generation 10^4 times.
for (i in 1:10^4) {
    ## Assign the generated value to the i-th element of the vector.
    Xbars100[i] <- mean(rbinom(n = 100, size = 1, prob = 0.2))
}
## Check the moments.
mean(Xbars100)

## [1] 0.199929

var(Xbars100)

## [1] 0.001606926

## Plot frequencies
ggplot(data = tibble(Xbar = Xbars100), mapping = aes(x = Xbar)) +
    geom_bar() +
    scale_x_continuous(limits = c(0,1)) +
    theme_bw() + theme(legend.key = element_blank())

What do you notice as you increase the sample size?
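Theory predicts the pattern: \(E[\bar{X}_n] = p\) and \(\mathrm{Var}(\bar{X}_n) = p(1-p)/n\), so the spread shrinks as n grows (\(0.2 \times 0.8/30 \approx 0.00533\) and \(0.2 \times 0.8/100 = 0.0016\)). A vectorized base R sketch: the sum of n independent Bernoulli(p) draws is \(Binom(n, p)\), so dividing rbinom(size = n) by n yields sample means directly, without a loop.

```r
p <- 0.2
theory_var <- function(n) p * (1 - p) / n
set.seed(seed = 613)
## Each element is the mean of n Bernoulli(p) variables.
xbar30  <- rbinom(n = 10^4, size = 30,  prob = p) / 30
xbar100 <- rbinom(n = 10^4, size = 100, prob = p) / 100
rbind(simulated = c(n30 = var(xbar30),     n100 = var(xbar100)),
      theory    = c(n30 = theory_var(30),  n100 = theory_var(100)))
```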

Hypothesis testing functions

t-test

The two-sample t-test is commonly used to test a hypothesis about the difference in means between two independent groups. In R, the syntax for comparing mean BMI between the two SEX groups is as follows.

## Welch t-test is the default
(ttest_welch <- t.test(BMI ~ SEX, data = framingham))

## 
##  Welch Two Sample t-test
## 
## data:  BMI by SEX
## t = -4.8099, df = 4403.8, p-value = 0.00000156
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.8117584 -0.3416388
## sample estimates:
## mean in group 0 mean in group 1 
##        25.59288        26.16958

## Student t-test
(ttest_student <- t.test(BMI ~ SEX, data = framingham, var.equal = TRUE))

## 
##  Two Sample t-test
## 
## data:  BMI by SEX
## t = -4.6471, df = 4413, p-value = 0.000003464
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.8199941 -0.3334030
## sample estimates:
## mean in group 0 mean in group 1 
##        25.59288        26.16958
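The Welch statistic is \(t = (\bar{x}_1 - \bar{x}_2)/\sqrt{s_1^2/n_1 + s_2^2/n_2}\), with Welch-Satterthwaite degrees of freedom. A base R sketch on simulated groups (the sizes, means, and SDs are arbitrary) reproduces the t.test() numbers by hand:

```r
set.seed(seed = 613)
g0 <- rnorm(n = 50, mean = 25.6, sd = 4)
g1 <- rnorm(n = 60, mean = 26.2, sd = 5)
## Squared standard errors of each group mean.
v0 <- var(g0) / length(g0)
v1 <- var(g1) / length(g1)
t_manual  <- (mean(g0) - mean(g1)) / sqrt(v0 + v1)
df_manual <- (v0 + v1)^2 / (v0^2 / (length(g0) - 1) + v1^2 / (length(g1) - 1))
res <- t.test(g0, g1)   # Welch is the default
c(t_manual = t_manual, t_test = unname(res$statistic))
c(df_manual = df_manual, t_test = unname(res$parameter))
```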

Most R testing and modeling functions do not print results directly; they return a result object that holds detailed information, and it is this object that gets printed. To see which components the result object holds, use the names() function.

names(ttest_welch)

## [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
## [6] "null.value"  "alternative" "method"      "data.name"

Elements can be extracted as follows.

## p-value using $ operator
ttest_welch$p.value

## [1] 0.00000156033

## t-statistic using [[]] operator
ttest_welch[["statistic"]]

##         t 
## -4.809922
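Other components are extracted the same way. A sketch on R's built-in sleep data (used only because it ships with R): unname() drops the label on the statistic, and the confidence interval carries its confidence level as an attribute.

```r
res_sleep <- t.test(extra ~ group, data = sleep)
unname(res_sleep$statistic)             # bare number, no "t" label
res_sleep$conf.int[1:2]                 # lower and upper limits only
attr(res_sleep$conf.int, "conf.level")  # confidence level
res_sleep$estimate                      # group means
```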

The object created by BMI ~ SEX is of the “formula” class. It roughly means “explain the LHS by the RHS”. “formula” objects are used extensively in regression modeling.

class(BMI ~ SEX)

## [1] "formula"
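A formula is stored unevaluated, so BMI and SEX need not exist when it is created. Base R helpers can take it apart; a one-sided formula (no LHS), like the one given to xtabs() in the next subsection, is shorter by one element.

```r
f2 <- BMI ~ SEX                 # two-sided: LHS and RHS
f1 <- ~ SEX + CURSMOKE          # one-sided: RHS only
all.vars(f2)
all.vars(f1)
c(two_sided = length(f2), one_sided = length(f1))
```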

Tests for 2x2 tables

In epidemiology, 2x2 tables are often encountered. Given the raw data, one way to construct a 2x2 table is with xtabs(). The formula “~ SEX + CURSMOKE” means: explain the LHS (implicitly, the count of observations) by SEX and CURSMOKE (current smoking status).

cross_tab1 <- xtabs( ~ SEX + CURSMOKE, data = framingham)
cross_tab1

##    CURSMOKE
## SEX    0    1
##   0 1484 1006
##   1  769 1175
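To see exactly what xtabs() counts, here is a tiny made-up data frame (toy, with hypothetical values); base table() produces the same counts.

```r
## Hypothetical toy data: five people.
toy <- data.frame(SEX      = c(0, 0, 1, 1, 1),
                  CURSMOKE = c(0, 1, 1, 1, 0))
tab_xtabs <- xtabs(~ SEX + CURSMOKE, data = toy)
tab_table <- table(SEX = toy$SEX, CURSMOKE = toy$CURSMOKE)
tab_xtabs
all(tab_xtabs == tab_table)
```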

If you already have a 2x2 table and want to manually enter it, it’s a bit clumsy…

cross_tab2 <- matrix(c(1484, 1006,
                       769, 1175),
                     ## Number of columns 2
                     ncol = 2,
                     ## Row-based entry
                     byrow = TRUE)
cross_tab2

##      [,1] [,2]
## [1,] 1484 1006
## [2,]  769 1175
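When entering counts by hand, supplying dimnames makes the matrix print and index like the xtabs() table. A sketch with the same counts:

```r
cross_tab3 <- matrix(c(1484, 1006,
                       769, 1175),
                     ncol = 2, byrow = TRUE,
                     ## Named dimnames give row/column headers like xtabs().
                     dimnames = list(SEX = c("0", "1"),
                                     CURSMOKE = c("0", "1")))
cross_tab3
cross_tab3["1", "1"]   # SEX = 1, CURSMOKE = 1
```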

\(\chi^2\) test

The \(\chi^2\) test can be conducted using the 2x2 table created above.

## Default applies Yates' continuity correction (0.5 is subtracted from each |O - E|).
(chisq_correct <- chisq.test(cross_tab1))

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  cross_tab1
## X-squared = 174.63, df = 1, p-value < 2.2e-16

## Without continuity correction
(chisq_no_correct <- chisq.test(cross_tab1, correct = FALSE))

## 
##  Pearson's Chi-squared test
## 
## data:  cross_tab1
## X-squared = 175.43, df = 1, p-value < 2.2e-16

Check what elements the result object has using names().
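Among the components is expected, the expected counts under independence (row total × column total / grand total). The base R sketch below uses the same counts to rebuild both statistics from those expected counts: Pearson's \(\sum (O-E)^2/E\) and Yates' \(\sum (|O-E| - 0.5)^2/E\).

```r
observed <- matrix(c(1484, 1006,
                     769, 1175), ncol = 2, byrow = TRUE)
res_chisq <- chisq.test(observed, correct = FALSE)
names(res_chisq)
## Expected counts under independence.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
all.equal(expected, res_chisq$expected)
pearson <- sum((observed - expected)^2 / expected)
yates   <- sum((abs(observed - expected) - 0.5)^2 / expected)
c(pearson = pearson, yates = yates)
```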

Fisher’s exact test

Fisher’s exact test can use the same syntax.

fisher.test(cross_tab1)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  cross_tab1
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.993100 2.548945
## sample estimates:
## odds ratio 
##   2.253545
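Note that fisher.test() reports a conditional maximum likelihood estimate of the odds ratio, which is generally close to, but not identical to, the sample cross-product ratio \(ad/bc\). A base R comparison using the same counts:

```r
counts <- matrix(c(1484, 1006,
                   769, 1175), ncol = 2, byrow = TRUE)
## Sample odds ratio ad / bc.
or_sample <- (counts[1, 1] * counts[2, 2]) / (counts[1, 2] * counts[2, 1])
## Conditional MLE reported by fisher.test().
or_fisher <- unname(fisher.test(counts)$estimate)
c(sample = or_sample, conditional_mle = or_fisher)
```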

More 2x2 table tools

Different types of 2x2 tables exist depending on the type of assessment you are trying to do.

Exposure-outcome 2x2 table

This type of 2x2 table is used to assess the association between an exposure of interest (environmental exposures, drugs, etc.) and an outcome of interest (cancer, etc.). Both variables have to be binary. Let’s examine the association between current smoking and death during follow-up. First, create a 2x2 table.

tab_death_cursmoke <- xtabs( ~ CURSMOKE + DEATH, data = framingham)
tab_death_cursmoke

##         DEATH
## CURSMOKE    0    1
##        0 1491  762
##        1 1393  788

The epiR::epi.2by2() function can be used to obtain measures that you will hear about in your epidemiology and biostatistics courses. This function takes a 2x2 table in a specific format: the outcome occupies the columns (positive first, negative next), and exposure status occupies the rows (positive first, negative next). From the epi.2by2() help page:

     Where method is ‘cohort.count’, ‘case.control’, or
     ‘cross.sectional’ and ‘outcome = as.columns’ the required 2 by 2
     table format is:

       ----      ----       ----       ----
                 Disease +  Disease -  Total
       ----      ----       ----       ----
       Expose +  a          b          a+b
       Expose -  c          d          c+d
       ----      ----       ----       ----
       Total     a+c        b+d        a+b+c+d
       ----      ----       ----       ----

library(epiR)

## Loading required package: survival

## Package epiR 0.9-82 is loaded

## Type help(epi.about) for summary information

##

## Reorder columns and rows
tab_death_cursmoke[2:1, 2:1]

##         DEATH
## CURSMOKE    1    0
##        1  788 1393
##        0  762 1491

## epi.2by2()
res_2by2 <- epi.2by2(tab_death_cursmoke[2:1, 2:1], method = "cohort.count")

## Warning in N0 * (N0 + N1) * a: NAs produced by integer overflow

## Warning in N0 * N1 * (c + a): NAs produced by integer overflow

## Warning in N0 * N1 * (c + a): NAs produced by integer overflow

## Warning in N0 * (N0 + N1) * a: NAs produced by integer overflow

## Warning in N0 * N1 * (c + a): NAs produced by integer overflow

## Warning in N0 * N1 * (c + a): NAs produced by integer overflow

res_2by2

##              Outcome +    Outcome -      Total        Inc risk *
## Exposed +          788         1393       2181              36.1
## Exposed -          762         1491       2253              33.8
## Total             1550         2884       4434              35.0
##                  Odds
## Exposed +       0.566
## Exposed -       0.511
## Total           0.537
## 
## Point estimates and 95 % CIs:
## -------------------------------------------------------------------
## Inc risk ratio                               1.07 (0.99, 1.16)
## Odds ratio                                   1.11 (0.98, 1.25)
## Attrib risk *                                2.31 (-0.50, 5.12)
## Attrib risk in population *                  1.14 (-1.27, 3.54)
## Attrib fraction in exposed (%)               6.39 (-1.44, 13.61)
## Attrib fraction in population (%)            3.25 (-0.78, 7.12)
## -------------------------------------------------------------------
##  X2 test statistic: 2.598 p-value: 0.107
##  Wald confidence limits
##  * Outcomes per 100 population units

## Unformatted more detailed results
summary(res_2by2)

## $RR.strata.wald
##       est     lower    upper
## 1 1.06826 0.9858212 1.157592
## 
## $RR.strata.score
##       est lower upper
## 1 1.06826    NA    NA
## 
## $OR.strata.wald
##        est     lower    upper
## 1 1.106873 0.9782858 1.252362
## 
## $OR.strata.score
##        est     lower    upper
## 1 1.106873 0.9774559 1.253368
## 
## $OR.strata.cfield
##        est     lower    upper
## 1 1.106873 0.9782489 1.252407
## 
## $OR.strata.mle
##        est     lower    upper
## 1 1.106846 0.9763701 1.254817
## 
## $ARisk.strata.wald
##        est      lower    upper
## 1 2.308644 -0.4986352 5.115924
## 
## $ARisk.strata.score
##        est      lower    upper
## 1 2.308644 -0.4986573 5.114424
## 
## $PARisk.strata.wald
##        est     lower    upper
## 1 1.135578 -1.269871 3.541028
## 
## $PARisk.strata.piri
##        est     lower    upper
## 1 1.135578 -0.245687 2.516843
## 
## $AFRisk.strata.wald
##          est       lower     upper
## 1 0.06389788 -0.01438277 0.1361376
## 
## $PAFRisk.strata.wald
##          est        lower     upper
## 1 0.03248486 -0.007847079 0.0712028
## 
## $chisq.strata
##   test.statistic df   p.value
## 1       2.597764  1 0.1070146
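The main point estimates are simple functions of the cells a, b, c, d in the layout above. As a base R sketch (cell counts copied from the reordered table; the variable is named cc instead of c only to avoid masking c()), the risk ratio with its Wald 95% CI, the odds ratio, and the attributable risk per 100 can be recomputed by hand:

```r
a  <- 788; b <- 1393   # exposed:   outcome +, outcome -
cc <- 762; d <- 1491   # unexposed: outcome +, outcome -
risk_exp   <- a / (a + b)
risk_unexp <- cc / (cc + d)
rr <- risk_exp / risk_unexp
## Wald CI for the risk ratio, computed on the log scale.
se_log_rr <- sqrt(1 / a - 1 / (a + b) + 1 / cc - 1 / (cc + d))
rr_ci <- exp(log(rr) + c(-1, 1) * qnorm(0.975) * se_log_rr)
or <- (a * d) / (b * cc)
attrib_risk <- 100 * (risk_exp - risk_unexp)   # per 100 population units
round(c(rr = rr, rr_lower = rr_ci[1], rr_upper = rr_ci[2],
        or = or, attrib_risk = attrib_risk), 4)
```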

Test-gold standard 2x2 table

This type of 2x2 table is used to assess the performance of a diagnostic test. Here we use an example from the epiR package documentation. The 2x2 table is similar, except that the rows are the test result instead of an exposure.

## epiR package's example
## Scott et al. 2008, Table 1:
## A new diagnostic test was trialled on 1586 patients. Of 744 patients
## that were disease positive, 670 tested positive. Of 842 patients that
## were disease negative, 640 tested negative. What is the likelihood
## ratio of a positive test? What is the number needed to diagnose?
dat <- matrix(c(670, 202, 74, 640), nrow = 2, byrow = TRUE)
colnames(dat) <- c("Dis+","Dis-")
rownames(dat) <- c("Test+","Test-")
dat

##       Dis+ Dis-
## Test+  670  202
## Test-   74  640

## epi.tests()
rval <- epi.tests(dat, conf.level = 0.95)
rval

##           Outcome +    Outcome -      Total
## Test +          670          202        872
## Test -           74          640        714
## Total           744          842       1586
## 
## Point estimates and 95 % CIs:
## ---------------------------------------------------------
## Apparent prevalence                    0.55 (0.52, 0.57)
## True prevalence                        0.47 (0.44, 0.49)
## Sensitivity                            0.90 (0.88, 0.92)
## Specificity                            0.76 (0.73, 0.79)
## Positive predictive value              0.77 (0.74, 0.80)
## Negative predictive value              0.90 (0.87, 0.92)
## Positive likelihood ratio              3.75 (3.32, 4.24)
## Negative likelihood ratio              0.13 (0.11, 0.16)
## ---------------------------------------------------------

## Unformatted more detailed results
summary(rval)

##                 est      lower      upper
## aprev     0.5498108  0.5249373  0.5744996
## tprev     0.4691047  0.4443055  0.4940184
## se        0.9005376  0.8767462  0.9210923
## sp        0.7600950  0.7297765  0.7885803
## diag.acc  0.8259773  0.8064049  0.8443346
## diag.or  28.6861119 21.5181917 38.2417364
## nnd       1.5137005  1.4091004  1.6487431
## youden    0.6606326  0.6065226  0.7096726
## ppv       0.7683486  0.7388926  0.7959784
## npv       0.8963585  0.8716393  0.9177402
## plr       3.7537262  3.3206884  4.2432346
## nlr       0.1308552  0.1050643  0.1629771
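Likewise, the main epi.tests() estimates follow from the four cells. A base R sketch with counts copied from dat (the names tp, fp, fn, tn are our own labels):

```r
tp <- 670; fp <- 202   # Test+ row: Dis+, Dis-
fn <- 74;  tn <- 640   # Test- row: Dis+, Dis-
se  <- tp / (tp + fn)        # sensitivity
sp  <- tn / (tn + fp)        # specificity
ppv <- tp / (tp + fp)        # positive predictive value
npv <- tn / (tn + fn)        # negative predictive value
plr <- se / (1 - sp)         # positive likelihood ratio
nlr <- (1 - se) / sp         # negative likelihood ratio
youden <- se + sp - 1
round(c(se = se, sp = sp, ppv = ppv, npv = npv, plr = plr, nlr = nlr,
        diag_or = plr / nlr, youden = youden, nnd = 1 / youden), 4)
```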


Kazuki Yoshida

6/5/2017
