Biostatistics

Day 2: Inferential Statistics

Priv. Doz. Dr. Raimund Kovacevic

09-2024

What is Statistical Inference?

What is the “true” distribution of a relevant variable, given observations?
What is the true joint distribution of several variables, given observations?
Drawing conclusions about populations based on sample data …
Main approaches:
1. Estimation: Determining likely values of population parameters
  Example: Estimating the average blood pressure of adults in a city based on a sample of 1000 residents.
2. Hypothesis Testing: Assessing claims about populations
  Example: Determining if a new drug significantly reduces cholesterol levels compared to a placebo
3. Forecasting (prognosis): What will be the future values of relevant medical parameters?
  Example: Predicting the number of flu cases in the next winter season based on historical data and current trends.

Estimation

Point Estimation

Point Estimator: A single value that serves as a “best guess” of a population parameter
Example: Assume that systolic blood pressure is normally distributed. What are good point estimators \(\hat{\mu}, \hat{\sigma}\) for the expectation \(\mu\) and the standard deviation \(\sigma\)?
Typical properties of reasonable estimators:
1. Unbiasedness: \(E(\hat{\theta}) = \theta\)
2. Efficiency: Smallest variance among unbiased estimators
3. Consistency: \(\text{plim}_{n \to \infty} \hat{\theta}=\theta\)

Common Point Estimators

Binomial distribution: Sample Proportion (\(\hat{p}\)) for population proportion (\(p\)) \(\hat{p} = \frac{x}{n}\), where \(x\) is the number of successes
Poisson distribution: Estimated cases per period (\(\hat{\lambda}\)) for cases per period in population (\(\lambda\)) \(\hat{\lambda} = \frac{\sum_{i=1}^n x_i}{n}\), where \(x_i\) are counted cases in \(n\) subsequent time periods.
Normal distribution:
- Sample Mean (\(\bar{X}\)) for population mean (\(\mu\)): \(\bar{X} = \frac{1}{n}\sum_{i=1}^n x_i\)
- Sample Variance (\(s^2\)) for population variance (\(\sigma^2\)): \(s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{X})^2\)
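Why the sample variance divides by \(n-1\) can be checked with a small simulation. The following is only a sketch: the true standard deviation, the sample size and the number of repetitions are illustrative assumptions, not values from the course data.

```r
set.seed(1)                       # for reproducibility
n     <- 10                       # small sample size makes the bias visible
sigma <- 2                        # assumed true standard deviation (sigma^2 = 4)
M     <- 20000                    # number of simulated samples

sims <- replicate(M, {
  x <- rnorm(n, mean = 0, sd = sigma)
  c(unbiased = var(x),                     # divides by n - 1
    biased   = sum((x - mean(x))^2) / n)   # divides by n
})

# Average of each estimator over all simulated samples
est <- rowMeans(sims)
est
```

The average of the \(n-1\) version should be close to \(\sigma^2 = 4\), while the version dividing by \(n\) is smaller by the factor \((n-1)/n\).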

Interval Estimate: Confidence Interval

An interval \(CI=[v_l,v_u]\) “likely” to contain the population parameter \(\theta\)
Trustworthiness specified by the level of confidence \(\gamma=1-\alpha\), e.g., 0.99
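Common misreading: “the probability that the true population parameter lies in the computed CI is \(\gamma\)”. This is not correct: \(\theta\) is fixed, and a computed interval either contains it or not.
Correct: If the sampling and the calculation of the CI are repeated many times, then \(\theta\in \text{CI}\) in \(100\cdot\gamma\%\) of the repetitions: \[P(v_l(X)\leq \theta \leq v_u(X))=\gamma\]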

Confidence Intervals for expectation \(\mu\)

For a sample \(X_i\) i.i.d. \(N(\mu,\sigma^2)\) and confidence level \(1-\alpha\): \[v=\bar{x} \pm t_{\alpha/2, n-1} \cdot \frac{s}{\sqrt{n}}\]
Where:
- \(\bar{x}\) is the sample mean
- \(t_{\alpha/2, n-1}\) is the upper \(\alpha/2\) quantile (i.e., the \(1-\alpha/2\) quantile) of the t-distribution with \(n-1\) degrees of freedom
- \(s\) is the sample standard deviation
- \(n\) is the sample size

CI for expectation in R

# Load a medical dataset
library(MASS)
data(birthwt)

x <- birthwt$bwt
n <- length(x)

c(mean(x) + qt(0.05/2,n-1) * sd(x) / sqrt(n),mean(x) - qt(0.05/2,n-1) * sd(x) / sqrt(n))

[1] 2839.952 3049.222

# alternative
ci_result <- t.test(birthwt$bwt, conf.level = 0.95)
ci_result$conf.int

[1] 2839.952 3049.222
attr(,"conf.level")
[1] 0.95
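The repeated-sampling interpretation of the confidence level can be checked by simulation. This is a minimal sketch; the population values `mu` and `sigma`, the sample size and the number of repetitions are hypothetical choices for illustration.

```r
set.seed(42)
mu <- 120; sigma <- 15   # hypothetical true population values
n  <- 25                 # sample size per repetition
M  <- 5000               # number of repeated samples

# For each repetition: draw a sample, compute the 95% CI, check whether it covers mu
covered <- replicate(M, {
  x  <- rnorm(n, mean = mu, sd = sigma)
  ci <- t.test(x, conf.level = 0.95)$conf.int
  ci[1] <= mu && mu <= ci[2]
})

# Proportion of intervals that contain the true mean
coverage <- mean(covered)
coverage
```

The resulting proportion should be close to 0.95, although any single interval either contains \(\mu\) or does not.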

Confidence Intervals: Proportions

Sample \(X_i\) i.i.d. Bernoulli with parameter \(p\) and sample size \(n\)
CI for parameter \(p\), if large sample size \(n\): \[v=\hat{p}\pm z_{\alpha/2}\cdot\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]
Where:
- \(\hat{p}\) is the relative frequency in the sample
- \(z_{\alpha/2}\) is the upper \(\alpha/2\) quantile (i.e., the \(1-\alpha/2\) quantile) of the standard normal distribution for the desired confidence level \(\gamma=1-\alpha\)
- \(n\) is the sample size

CI for Proportion in R

# Calculate proportion of low birth weight babies
p <- mean(birthwt$low)
p

[1] 0.3121693

c(p + qnorm(0.05/2)*sqrt(p*(1-p)/n),p - qnorm(0.05/2)*sqrt(p*(1-p)/n))

[1] 0.2461071 0.3782315

# alternative: Wilson score interval with continuity correction
prop_ci <- prop.test(sum(birthwt$low), nrow(birthwt), conf.level = 0.95)

# Display results
round(prop_ci$conf.int, 3)

[1] 0.248 0.384
attr(,"conf.level")
[1] 0.95

Bootstrapping

Resampling technique for estimating the sampling distribution of statistics
useful when
- no distribution known
- no standard CI-formula can be used
- small sample size
Resampling
1. Draw a sample of size n with replacement from original data with size \(n\)
2. Calculate the statistic of interest for this resample
3. Repeat steps 1-2 \(M\) times (typically \(M\geq 1000\))
4. Use the simulated distribution of the statistics to estimate CI by quantiles

Bootstrapping

For a statistic \(\hat{\theta}\):
Bootstrap estimate:

\[\hat{\theta}^* = \frac{1}{M} \sum_{i=1}^M \hat{\theta}_i\]

Bootstrap standard error:

\[SE(\hat{\theta}^*) = \sqrt{\frac{1}{M-1} \sum_{i=1}^M (\hat{\theta}_i - \hat{\theta}^*)^2}\]

Where \(\hat{\theta}_i\) is the statistic calculated from the \(i\)-th bootstrap sample, \(i=1,\dots,M\)
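The resampling steps and the two formulas can be sketched in base R, using `sample()` instead of the `resample()` helper from mosaic. The data vector is a hypothetical skewed sample, and the statistic (the median) is chosen only for illustration.

```r
set.seed(2024)
x <- rexp(40, rate = 1 / 50)   # hypothetical skewed sample of size 40
M <- 2000                      # number of bootstrap resamples

# Steps 1-3: resample with replacement and recompute the statistic M times
boot_med <- replicate(M, median(sample(x, size = length(x), replace = TRUE)))

# Bootstrap estimate and bootstrap standard error
boot_est <- mean(boot_med)
boot_se  <- sd(boot_med)       # sd() divides by M - 1

# Step 4: percentile confidence interval from the simulated distribution
boot_ci <- quantile(boot_med, probs = c(0.025, 0.975))

boot_est
boot_se
boot_ci
```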

Bootstrapping in R

library(mosaic)
# mean
simMean <- do(10000) * mean(~bwt,data=resample(birthwt)) 
# bootstrap estimate for the mean
mean(simMean$mean)

[1] 2945.304

# bootstrap 95% confidence interval (2.5% and 97.5% quantiles)
# endpoints vary slightly between runs because no seed is set
quantile(simMean$mean,probs=c(0.025,0.975))

# proportion
simProp <- do(10000) *  mean(resample(birthwt$low == 1)) 
# bootstrap 95% confidence interval (2.5% and 97.5% quantiles)
quantile(simProp$mean,probs=c(0.025,0.975))

# median
simMed <- do(10000) * median(~bwt,data=resample(birthwt)) 
# bootstrap estimate for the median
mean(simMed$median)

[1] 2976.418

# bootstrap 95% confidence interval (2.5% and 97.5% quantiles)
quantile(simMed$median,probs=c(0.025,0.975))

Main Paradigms of Estimation

Maximum Likelihood Estimation (MLE)
- Chooses parameters that maximize the likelihood of observing the data
- Widely used, asymptotically efficient
Method of Moments (MoM)
- Equates sample moments with theoretical moments
- Simple, but the estimators generally do not have the good properties of MLE
Bayesian Estimation
- Incorporates prior beliefs about parameters
- Provides full posterior distribution of parameters
- Good approach for updating beliefs over time

Maximum Likelihood Estimation

Idea: Choose parameters that maximize the probability of observing the data

Likelihood function: \[L(\theta|x) = f(x|\theta) = \prod_i f(x_i|\theta)\]

Log-likelihood: \[\ell(\theta|x) = \log L(\theta|x) = \sum_i \log f(x_i|\theta)\]

MLE estimator: \[\hat{\theta}_{MLE} = \arg\max_{\theta} \ell(\theta|x)\]
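As a small illustration of the definition, the log-likelihood can be maximized numerically and compared with a known closed-form estimator. The sketch below assumes Poisson-distributed counts; the data are simulated and the search interval is an arbitrary assumption.

```r
set.seed(7)
x <- rpois(200, lambda = 3.5)   # hypothetical counts per period

# Negative log-likelihood of a Poisson model as a function of lambda
negloglik <- function(lambda) -sum(dpois(x, lambda, log = TRUE))

# Minimizing the negative log-likelihood maximizes the log-likelihood
fit        <- optimize(negloglik, interval = c(0.01, 20))
lambda_hat <- fit$minimum

# For the Poisson distribution the MLE is known to be the sample mean
c(numerical = lambda_hat, closed_form = mean(x))
```

Both values should agree up to the numerical tolerance of `optimize()`.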

Ronald A. Fisher (1890-1962)

Life and Career
- British statistician, evolutionary biologist, and geneticist
- Founder of modern statistics and evolutionary biology
- Professor: UCL (1933-1943), Cambridge (1943-1957)
Key Contributions
- Developed ANOVA, maximum likelihood
- Pioneered design of experiments
- Fundamental work in genetics and evolution
- Major books: “Statistical Methods for Research Workers” (1925), “Genetical Theory of Natural Selection” (1930), “The Design of Experiments” (1935)
Controversies:
- Supported (positive) eugenics

Maximum Likelihood: Main Properties

Consistency: \[\hat{\theta}_{MLE} \xrightarrow{p} \theta_0 \text{ as } n \to \infty\]
Asymptotic normality: \[\sqrt{n}(\hat{\theta}_{MLE} - \theta_0) \xrightarrow{d} N(0, I(\theta_0)^{-1})\]
Asymptotic efficiency: ML estimators achieve the Cramér-Rao lower bound as \(n \to \infty\). Asymptotically, there is no unbiased estimator with a smaller variance …

Example: Glucose level

load("diabetes.RData")
glucLev <- diabetes$Glucose[diabetes$Glucose>0]

library(fitdistrplus)
plotdist(glucLev, histo = TRUE, demp = TRUE)

Example: Glucose level

fg=fitdist(glucLev,"gamma")
summary(fg)

Fitting of the distribution ' gamma ' by maximum likelihood 
Parameters : 
        estimate  Std. Error
shape 16.2787190 0.824061426
rate   0.1338447 0.006880465
Loglikelihood:  -3667.141   AIC:  7338.281   BIC:  7347.556 
Correlation matrix:
          shape      rate
shape 1.0000000 0.9846507
rate  0.9846507 1.0000000

fln=fitdist(glucLev,"lnorm")
summary(fln)

Fitting of the distribution ' lnorm ' by maximum likelihood 
Parameters : 
         estimate  Std. Error
meanlog 4.7703198 0.009063607
sdlog   0.2503591 0.006408478
Loglikelihood:  -3665.757   AIC:  7335.513   BIC:  7344.788 
Correlation matrix:
        meanlog sdlog
meanlog       1     0
sdlog         0     1

fw=fitdist(glucLev,"weibull")
summary(fw)

Fitting of the distribution ' weibull ' by maximum likelihood 
Parameters : 
        estimate Std. Error
shape   4.213099  0.1134178
scale 133.645611  1.2178178
Loglikelihood:  -3706.577   AIC:  7417.154   BIC:  7426.428 
Correlation matrix:
          shape     scale
shape 1.0000000 0.3322829
scale 0.3322829 1.0000000

Example: Glucose level

plot.legend <- c("Gamma", "lognormal", "Weibull")

par(mfrow=c(1,2))
denscomp(list(fg, fln, fw), legendtext = plot.legend)
qqcomp(list(fg, fln, fw), legendtext = plot.legend)

Hypothesis Testing

Hypotheses and Rejection

In statistics, hypotheses are statements about the underlying distributions at population level
In parametric statistics: hypotheses about distribution parameters
Content-related hypotheses have to be translated into statistical hypotheses!
- Glucose level in patients decreases “after” administering some drug.
  -> Assume that glucose level \(G\) is normally distributed with expectation \(\mu_b\) before, and \(\mu_a\) after administration. Hypothesis: \(H: \;\mu_a < \mu_b\).
It is not possible to prove a statistical hypothesis directly.
Instead, we formulate two contradictory hypotheses and try to “reject” one of them.
To confirm a hypothesis, reject the contradicting hypothesis!
It is not possible to reject a hypothesis about a larger population with certainty. We want evidence that rejection is reasonable.

Hypotheses and Rejection

How to decide about rejection?

Test for a parameter \(\theta\) of the distribution of a random variable \(X\)
Consider simple Hypotheses: \(H_0: \theta=\theta_0\) versus \(H_1: \theta=\theta_1\)
Distribution of random variable \(X\):
- Under \(H_0\) with PDF \(f(x|\theta_0)\)
- Under \(H_1\) with PDF \(f(x|\theta_1)\)

Errors and Power

Data are random, hence any decision can turn out to be an error by chance.
The Type I error or \(\alpha\)-error consists of incorrectly rejecting the null hypothesis. The probability that this happens should be no more than \(\alpha\), the significance level.
Significance level is key: Tests at level \(\alpha\).
The Type II error or \(\beta\)-error consists of failing to reject the null hypothesis when the alternative is true. The probability of this error is denoted by \(\beta\).
The power \(1-\beta\) of a test is the probability of correctly rejecting the null hypothesis when the alternative is true.
In medical contexts often
- power 1 - β is called sensitivity
- 1 - α is called specificity.

Errors and Power

| Decision | Reality: \(H_0\) is true | Reality: \(H_1\) is true |
|---|---|---|
| for \(H_0\) | right decision (true negative), specificity \(1-\alpha\) | type II error (false negative), \(\beta\) |
| for \(H_1\) | type I error (false positive), \(\alpha\) | right decision (true positive), sensitivity \(1-\beta\) |
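The two error probabilities can be estimated by simulation. The sketch below uses a one-sided z-test with known \(\sigma\); all numbers (null value 130, true mean 133 under the alternative, \(\sigma = 5\), \(n = 20\)) are hypothetical and serve only as an illustration.

```r
set.seed(11)
mu0 <- 130; sigma <- 5; n <- 20; alpha <- 0.05   # hypothetical setting
se     <- sigma / sqrt(n)
z_crit <- qnorm(1 - alpha)                       # reject H0 if z > z_crit

# Proportion of rejections when the true mean is mu_true
reject_rate <- function(mu_true, M = 20000) {
  xbar <- rnorm(M, mean = mu_true, sd = se)      # sample means drawn directly
  z    <- (xbar - mu0) / se
  mean(z > z_crit)
}

type1 <- reject_rate(mu_true = 130)   # H0 true: estimates alpha
power <- reject_rate(mu_true = 133)   # H1 true: estimates 1 - beta

# Theoretical power of the one-sided z-test for comparison
power_theory <- 1 - pnorm(z_crit - (133 - mu0) / se)

c(type1 = type1, power = power, power_theory = power_theory)
```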

How to decide about rejection?

Neyman-Pearson test: compare likelihoods of observations \(x_1, \dots, x_n\) \[L(\theta_i|x) = \prod_j f(x_j|\theta_i)\]
Given the data \(x\), reject the null hypothesis if \[T(x) = \frac{L(\theta_1|x)}{L(\theta_0|x)}>\gamma,\]where \(\gamma\) is the \(1-\alpha\) quantile of the distribution of \(T\) under \(H_0\).
The test is based on the same idea as maximum likelihood
Most powerful test at level \(\alpha\).
Generalizes to hypotheses like \(\theta\leq \theta_0\) vs. \(\theta>\theta_0\) (certain distributions).

Simplified view

Start with a test statistic \(T\), derived earlier by the Neyman-Pearson approach
From the data \(x\), the value of the statistic is calculated as \(t=T(x)\).
Under \(H_0\) this statistic has a certain distribution with PDF \(f_0(t)\)
Consider the null hypothesis \(\theta\leq \theta_0\); then typically \(H_0\) is rejected when the value \(t\) is large enough.
“Large enough” means that \(t\) is larger than the \(1-\alpha\) quantile of the distribution of \(T\) under \(H_0\)

p-Values

In similar manner for the null hypothesis \(\theta\geq \theta_0\), typically \(H_0\) is rejected when value \(t\) is small enough.
For the null hypothesis \(\theta= \theta_0\), typically \(H_0\) is rejected when value \(t\) is either small or large enough.
Alternatively and equivalently, the decision can be based on the p-value:
- For \(H_0: \theta\leq \theta_0\), calculate the probability \(p=P_0(T\geq t)\) under the null hypothesis
- For \(H_0: \theta\geq \theta_0\), calculate the probability \(p=P_0(T\leq t)\) under the null hypothesis
- For \(H_0: \theta = \theta_0\), calculate the probabilities \(p_1=P_0(T\geq t)\) and \(p_2=P_0(T\leq t)\) under the null hypothesis and use \(p=2\min(p_1,p_2)\).
If the relevant \(p\)-value is smaller than the significance level \(\alpha\), reject \(H_0\)
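The three cases can be written as one small helper function. This is a sketch under the assumption that the test statistic is standard normal under \(H_0\); the function name `p_value` and the observed value are hypothetical.

```r
# p-value of an observed statistic z, assuming T ~ N(0, 1) under H0
p_value <- function(z, alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         greater   = 1 - pnorm(z),                       # H0: theta <= theta0
         less      = pnorm(z),                           # H0: theta >= theta0
         two.sided = 2 * min(pnorm(z), 1 - pnorm(z)))    # H0: theta =  theta0
}

z_obs <- 1.79   # hypothetical observed value of the statistic
c(greater   = p_value(z_obs, "greater"),
  less      = p_value(z_obs, "less"),
  two.sided = p_value(z_obs, "two.sided"))
```

The same observed value can lead to different p-values depending on how the null hypothesis is formulated.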

Simplest Test

Assume a sample \(x_1,\dots,x_n\) from i.i.d. \(X_i\sim N(\mu,\sigma^2)\). The sample mean (\(n=20\)) has been calculated as \(\bar{x}=132\)
The parameter \(\sigma = 5\) is known.
We use \(\alpha=0.05\)
\(H_0: \mu\leq 130\)
The distribution of the test statistic \(T=\frac{1}{n}\sum_{i=1}^n X_i\) under \(H_0\) (at the boundary \(\mu=\mu_0\)):
- A sum of independent normally distributed random variables is normally distributed
- \(E(T) = \frac{1}{n}\sum_{i=1}^n E(X_i) = \mu_0=130\)
- \(Var(T)=\frac{1}{n^2}\sum_{i=1}^n Var(X_i) = \frac{\sigma^2}{n} = \frac{25}{20} = 1.25\)

Simplest Test

Null hypothesis mean (mu0): 130

Critical value: 131.839

Rejection region: T > 131.839
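The critical value on the original scale can be reproduced from the distribution of \(T\) under \(H_0\). A short sketch with the numbers of this example:

```r
mu0 <- 130; sigma <- 5; n <- 20; alpha <- 0.05
se <- sigma / sqrt(n)                  # standard deviation of the sample mean

# 1 - alpha quantile of N(mu0, se^2): values of T above it lead to rejection
crit <- qnorm(1 - alpha, mean = mu0, sd = se)
round(crit, 3)

xbar <- 132
xbar > crit                            # TRUE: the sample mean lies in the rejection region
```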

Calculation

Usually, the standardized statistic, which is \(N(0,1)\)-distributed under \(H_0\), is used: \[z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.\]

mu0 <- 130  # null hypothesis mean
sigma <- 5  # known standard deviation
n <- 20     # sample size
se <- sigma / sqrt(n)  # standard error
xBar <- 132
# Calculate critical value and z-value
alpha <- 0.05
(t_crit <- qnorm(1 - alpha))

[1] 1.644854

(z <- (xBar - mu0)/se)

[1] 1.788854

#p-value
1 - pnorm(z)

[1] 0.03681914

Steps in Hypothesis Testing

State the hypotheses
Choose the significance level (α)
Select the appropriate test statistic
Calculate the test statistic
Determine the p-value
Make a decision and interpret results

One-Sample t-test

Compare a sample mean to a hypothesized population mean
Assumption: data are from a normal distribution \(N(\mu,\sigma^2)\), \(\sigma\) unknown
Hypotheses:
- \(H_0: \mu = \mu_0\) versus \(H_1: \mu \neq \mu_0\) (two-tailed)
- \(H_0: \mu \leq \mu_0\) versus \(H_1: \mu > \mu_0\)
- \(H_0: \mu \geq \mu_0\) versus \(H_1: \mu < \mu_0\)
Test statistic: \[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}\]
Under \(H_0\): The test statistic \(t\) follows a \(t\)-distribution with \(df = n - 1\) degrees of freedom
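The test statistic and the two-tailed p-value can be computed by hand and compared with `t.test()`. The data vector below is hypothetical and only serves as an illustration.

```r
# Hypothetical birth weights in grams
x   <- c(2523, 3100, 2880, 3450, 2710, 3005, 2650, 3320, 2790, 2940)
mu0 <- 3000
n   <- length(x)

# Test statistic and two-tailed p-value from the t-distribution with n - 1 df
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
p_two  <- 2 * pt(-abs(t_stat), df = n - 1)

# Reference result from the built-in function
ref <- t.test(x, mu = mu0)
c(by_hand = t_stat, t.test = unname(ref$statistic))
c(by_hand = p_two,  t.test = ref$p.value)
```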

Student’s t-Distribution

Symmetric, bell-shaped distribution (like normal distribution)
Developed for small sample inference when σ unknown
Defined by degrees of freedom (df = n-1)
Heavier tails than normal distribution
Mean: \(\mu = 0\) (for all df)
Variance: \(\sigma^2 = \frac{df}{df-2}\) for df > 2
Probability Density Function: \(f(t) = \frac{\Gamma(\frac{df+1}{2})}{\sqrt{df\pi}\,\Gamma(\frac{df}{2})} \left(1 + \frac{t^2}{df}\right)^{-\frac{df+1}{2}}\)

Relationship to Normal Distribution

Test statistic:
- z-statistic (known σ): \(z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\)
- t-statistic (unknown σ): \(t = \frac{\bar{x} - \mu}{s/\sqrt{n}}\)
Key differences:
- t-distribution has heavier tails
- Critical values are larger than those of the normal distribution
As df → ∞, t-distribution → standard normal distribution
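The convergence of the critical values can be made visible with a few quantiles. The chosen degrees of freedom are arbitrary illustrative values.

```r
df     <- c(2, 5, 10, 30, 100, 1000)
t_crit <- qt(0.975, df)      # two-sided 5% critical values of the t-distribution
z_crit <- qnorm(0.975)       # corresponding standard normal critical value

# The t critical values decrease towards the normal value as df grows
round(data.frame(df = df, t_crit = t_crit, z_crit = z_crit), 3)
```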

Student’s t-Distribution

William Sealy Gosset (1876-1937)

Life and Career
- English statistician and chemist
- Worked at Guinness Brewery (1899-1937)
- Published under pseudonym “Student”
- Collaborated with Karl Pearson and R.A. Fisher
Key contributions
- Student’s t-distribution and t-test (1908)
- Pioneered small sample statistics
- Contributions to experimental design
- Introduced sequential analysis concept

Example: One-Sample t-Test in R

# Test if expected birth weight differs from 3000 grams
t_test_result <- t.test(birthwt$bwt, mu = 3000)

t_test_result


    One Sample t-test

data:  birthwt$bwt
t = -1.0447, df = 188, p-value = 0.2975
alternative hypothesis: true mean is not equal to 3000
95 percent confidence interval:
 2839.952 3049.222
sample estimates:
mean of x 
 2944.587

t-Test: Syntax

Visualizing One-Sample t-test

Reporting One-Sample t-test Results

“In a sample of 189 infants, the mean birth weight (M = 2944.6g, SD = 729.2) was not significantly different from the hypothesized population mean of 3000g (t(188) = -1.04, p = 0.298, 95% CI [2840.0g, 3049.2g]).”

Two-Sample t-test (Independent)

Use: Compare means between two independent groups
Hypotheses:
- \(H_0: \mu_1 = \mu_2, \;\mu_1\leq \mu_2,\; \mu_1\geq\mu_2\)
- \(H_1: \mu_1 \neq \mu_2, \; \mu_1>\mu_2,\;\mu_1<\mu_2\)
Test statistic: \(t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2(\frac{1}{n_1} + \frac{1}{n_2})}}\)
Pooled variance: \(s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}\)
Degrees of freedom: \(df = n_1 + n_2 - 2\)
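The pooled-variance formulas can be checked against `t.test()` with `var.equal = TRUE`. The two groups below are simulated, hypothetical data.

```r
set.seed(3)
g1 <- rnorm(12, mean = 3100, sd = 700)   # hypothetical group 1
g2 <- rnorm(15, mean = 2800, sd = 700)   # hypothetical group 2
n1 <- length(g1); n2 <- length(g2)

# Pooled variance, test statistic and degrees of freedom
sp2    <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)
t_stat <- (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
df     <- n1 + n2 - 2

# Reference: classical two-sample t-test with equal variances
ref <- t.test(g1, g2, var.equal = TRUE)
c(by_hand = t_stat, t.test = unname(ref$statistic))
```

Note that `t.test()` without `var.equal = TRUE` performs Welch's test, which does not pool the variances and uses a different number of degrees of freedom.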

Example: Two-Sample t-test in R

# Compare birth weights between smoking and non-smoking mothers
# (t.test() performs Welch's test by default; var.equal = TRUE gives the pooled-variance test)
t.test(bwt ~ smoke, data = birthwt)


    Welch Two Sample t-test

data:  bwt by smoke
t = 2.7299, df = 170.1, p-value = 0.007003
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
  78.57486 488.97860
sample estimates:
mean in group 0 mean in group 1 
       3055.696        2771.919

# One-sided alternative. Note that the difference is group 0 minus group 1 (non-smokers minus smokers)!
t.test(bwt ~ smoke, data = birthwt, alternative="greater")


    Welch Two Sample t-test

data:  bwt by smoke
t = 2.7299, df = 170.1, p-value = 0.003501
alternative hypothesis: true difference in means between group 0 and group 1 is greater than 0
95 percent confidence interval:
 111.8548      Inf
sample estimates:
mean in group 0 mean in group 1 
       3055.696        2771.919

Visualizing the Two-Sample t-test

Reporting Two-Sample t-test Results

“In a study of 189 infants, birth weight was significantly lower for infants of smokers (n = 74, M = 2771.9g, SD = 659.6) than for infants of non-smokers (n = 115, M = 3055.7g, SD = 752.7), Welch t(170.1) = 2.73, p = 0.007, 95% CI of the difference (non-smokers minus smokers) [78.6g, 489.0g].”

Origins of the Two-Sample t-test

Built upon the foundation of the one-sample t-test
Formalized by Ronald A. Fisher in the 1920s as part of his work on statistical methods in research
Initially developed for agricultural research, particularly for comparing crop yields under different conditions
The test addressed the need to compare means from two independent groups, a common scenario in experimental research
Fisher’s work on the two-sample t-test was part of his broader contributions to the field of analysis of variance (ANOVA)
The test quickly found applications beyond agriculture, becoming a staple in medical and social science research

Paired t-test

Use: Compare means between two samples when each individual is measured twice (e.g., before and after treatment)
There is dependency between the two groups.
All calculations based on differences \(D=X_1-X_2\)
Hypotheses (two-tailed):
- \(H_0: \mu_D = 0\)
- \(H_1: \mu_D \neq 0\)
Test statistic:\[t=\frac{\bar{D}}{s_D/\sqrt{n}}\]
Degrees of freedom: \(df = n - 1\)
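Since all calculations are based on the differences, the paired t-test is a one-sample t-test applied to \(D\). A brief sketch with hypothetical before/after measurements:

```r
# Hypothetical systolic blood pressure of 8 patients, before and after treatment
before <- c(142, 150, 138, 160, 147, 155, 139, 151)
after  <- c(138, 146, 139, 152, 141, 150, 137, 144)

d      <- before - after
t_stat <- mean(d) / (sd(d) / sqrt(length(d)))   # test statistic by hand

paired <- t.test(before, after, paired = TRUE)  # paired t-test
onesmp <- t.test(d, mu = 0)                     # one-sample t-test on differences

c(by_hand = t_stat,
  paired  = unname(paired$statistic),
  onesmp  = unname(onesmp$statistic))
```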

Paired t-Test in R

# Simulate paired data (e.g., blood pressure before and after treatment)
set.seed(123)
bp_before <- rnorm(30, mean = 140, sd = 10)
bp_after <- bp_before + rnorm(30, mean = -5, sd = 5)

# Perform paired t-test
t.test(bp_before, bp_after, paired = TRUE, alternative = "greater")


    Paired t-test

data:  bp_before and bp_after
t = 5.3889, df = 29, p-value = 4.305e-06
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
 2.812955      Inf
sample estimates:
mean difference 
       4.108308

Reporting Paired t-test Results

“In a study of 30 patients, blood pressure decreased significantly after treatment (mean reduction M = 4.1 mmHg, SD = 4.2), t(29) = 5.39, p < 0.001 (one-sided), one-sided 95% CI [2.8, ∞).”

Statistical Power

Definition: Probability of correctly rejecting a false null hypothesis
Factors affecting power:
1. Sample size (n)
2. Effect size (d)
3. Significance level (α)
4. Variability in the data (σ)
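Each test has its own formula. For the two-sample t-test, the required sample size per group is approximately \[n = \frac{2(z_{\alpha/2} + z_{\beta})^2 \sigma^2}{\Delta^2} = \frac{2(z_{\alpha/2} + z_{\beta})^2}{d^2}\] where \(\Delta=\mu_1-\mu_2\) is the difference to be detected, \(d=\Delta/\sigma\) is the standardized effect size, and \(z_{\alpha/2}\), \(z_{\beta}\) are upper quantiles of the standard normal distribution.
The definition of the effect size \(d\) also depends on the test. For the one-sample t-test (Cohen): \[d=\frac{\bar{x}-\mu_0}{s}\]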
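The normal-approximation sample size formula can be evaluated directly and compared with `power.t.test()` from base R, which works with the t-distribution. The difference to detect (`delta`), the standard deviation and the target power are hypothetical choices.

```r
alpha <- 0.05; power <- 0.80
sigma <- 1; delta <- 0.5    # hypothetical: detect a difference of half a standard deviation

# Normal approximation for the sample size per group
n_approx <- 2 * (qnorm(1 - alpha / 2) + qnorm(power))^2 * sigma^2 / delta^2

# t-based calculation from the stats package (two-sample, two-sided by default)
n_exact <- power.t.test(delta = delta, sd = sigma,
                        sig.level = alpha, power = power)$n

c(normal_approximation = n_approx, power.t.test = n_exact)
```

The t-based result is slightly larger, because the t-distribution has heavier tails than the normal distribution.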

Example: Power Analysis in R

library(pwr)
# Calculate power for two-sample t-test.
pwr.t.test(n = 30, d = 0.5, sig.level = 0.05)


     Two-sample t test power calculation 

              n = 30
              d = 0.5
      sig.level = 0.05
          power = 0.4778965
    alternative = two.sided

NOTE: n is number in *each* group

# Calculate n, necessary to reach some power
pwr.t.test(power = 0.8, d = 0.5, sig.level = 0.05)


     Two-sample t test power calculation 

              n = 63.76561
              d = 0.5
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number in *each* group

Power Calculations in R

Tests for Proportions

The binomial test can be used to compare the proportion of a binary outcome (“success”, “failure”) with a hypothesized value.
It assumes that the number of successes follows a binomial distribution with \(n\) trials and probability of success \(p\).
Hypotheses are then formulated about \(p\).
Let’s test whether the proportion of low birthweight babies exceeds 10%: \(\text{H}_0: p\leq 0.10 \text{ versus H}_1: p>0.10\)

# Test proportion of low birthweight babies
binom.test(~(low==1), data=birthwt, p=0.10, alternative = "greater")




data:  birthwt$(low == 1)  [with success = TRUE]
number of successes = 59, number of trials = 189, p-value = 8.405e-16
alternative hypothesis: true probability of success is greater than 0.1
95 percent confidence interval:
 0.2565953 1.0000000
sample estimates:
probability of success 
             0.3121693

Tests for Proportions

In this dataset from the Baystate Medical Center in Springfield, Massachusetts, we observe a much higher rate of low birthweight babies than 10%. This is because this was a targeted study of risk factors for low birthweight, not a representative population sample.

Tests for Proportions

It is also possible to test whether the proportions of one variable differ between groups defined by another variable

tally(low ~ smoke,data=birthwt)

   smoke
low  0  1
  0 86 44
  1 29 30

prop.test(low ~ smoke, data = birthwt, 
          alternative = "less",
          success = 1)


    2-sample test for equality of proportions with continuity correction

data:  tally(low ~ smoke)
X-squared = 4.2359, df = 1, p-value = 0.01979
alternative hypothesis: less
95 percent confidence interval:
 -1.00000000 -0.02701885
sample estimates:
   prop 1    prop 2 
0.2521739 0.4054054

Chi-square Test of Independence

Use: Test association between two categorical variables
Hypotheses:
- \(H_0\): No association between variables
- \(H_1\): Association exists between variables
Test statistic: \(\chi^2 = \sum_{ij} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\)
Degrees of freedom: \(df = (r-1)(c-1)\), where \(r\) = number of rows, \(c\) = number of columns
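Under \(H_0\): The test statistic approximately follows a \(\chi^2\)-distribution with \(df\) degrees of freedom (for sufficiently large expected counts)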
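The expected counts and the test statistic can be computed by hand. The sketch below uses the 2x2 counts of low birth weight by smoking status tabulated earlier and compares the result with `chisq.test()` without continuity correction.

```r
# Observed counts: rows = low (0/1), columns = smoke (0/1)
obs <- matrix(c(86, 29, 44, 30), nrow = 2,
              dimnames = list(low = c("0", "1"), smoke = c("0", "1")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Test statistic, degrees of freedom and p-value
chi2 <- sum((obs - expected)^2 / expected)
df   <- (nrow(obs) - 1) * (ncol(obs) - 1)
p    <- pchisq(chi2, df = df, lower.tail = FALSE)

# Reference: Pearson's test without Yates' continuity correction
ref <- chisq.test(obs, correct = FALSE)
c(by_hand = chi2, chisq.test = unname(ref$statistic))
```

By default, `chisq.test()` applies Yates' continuity correction to 2x2 tables, so its default output differs from this uncorrected statistic.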

Example: Chi-square Test in R

# frequency table
(tab <- table(birthwt$low, birthwt$smoke))

# Test association between low birth weight and smoking
chisq.test(tab)


    Pearson's Chi-squared test with Yates' continuity correction

data:  tab
X-squared = 4.2359, df = 1, p-value = 0.03958

Reporting Chi-square Test Results

Example: “In a sample of 189 mother-infant pairs, there was a significant association between low birth weight and maternal smoking status, χ²(1, N = 189) = 4.24, p = 0.040. The odds of having a low birth weight baby were 2.02 times higher for smokers compared to non-smokers (Wald 95% CI [1.08, 3.78]).”

Origins of the Chi-square Test

Developed by Karl Pearson in 1900
Introduced in his paper “On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling”
Pearson was addressing problems in biology and evolution, inspired by questions raised by Walter Frank Raphael Weldon
Originally used to test goodness of fit between observed and theoretical distributions
The test was a breakthrough in allowing researchers to quantify the agreement between observed data and a hypothesized distribution
Later extended to test independence between categorical variables
It became a cornerstone of contingency table analysis, widely used in medical research for analyzing categorical data

Nonparametric Tests

Nonparametric Statistics

What is Nonparametric Statistics?
- Statistical methods that do not rely on assumptions about the underlying distribution of the data
- Useful when data doesn’t follow a normal distribution or when sample sizes are small
- May be used with ordinal data or unclear scaling
- When dealing with outliers that might skew parametric results
- Often based on ranks of the data rather than the actual values
- In small samples where it’s difficult to verify distributional assumptions
- Generally more robust but less powerful than parametric tests when assumptions are met

Choosing Between Parametric and Nonparametric Tests

Consider the nature of your data (continuous, ordinal, nominal)
Check the assumptions of parametric tests (normality, homogeneity of variance)
Evaluate sample size and presence of outliers
Consider the research question and desired power of the analysis

Mann-Whitney U Test (Wilcoxon Rank-Sum Test)

Nonparametric alternative to the independent two-sample t-test
Used to compare the medians of two independent groups
Based on the ranks of the observations across both groups
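How the ranks enter the test can be seen by computing the statistic by hand. The two small groups are hypothetical data without ties; the statistic `W` reported by `wilcox.test()` is the rank sum of the first group minus its smallest possible value.

```r
x <- c(2410, 2600, 2750, 2900, 3010, 3300)   # hypothetical group A
y <- c(2950, 3120, 3250, 3400, 3560)         # hypothetical group B

# Rank all observations jointly, then sum the ranks of group A
r      <- rank(c(x, y))
rank_x <- sum(r[seq_along(x)])

# Subtract the minimum possible rank sum n_x * (n_x + 1) / 2
W <- rank_x - length(x) * (length(x) + 1) / 2

ref <- wilcox.test(x, y)
c(by_hand = W, wilcox.test = unname(ref$statistic))
```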

Example: Mann-Whitney U Test in R

# Perform Mann-Whitney U test on birth weight between smokers and non-smokers
wilcox.test(bwt ~ smoke, data = birthwt)


    Wilcoxon rank sum test with continuity correction

data:  bwt by smoke
W = 5249.5, p-value = 0.006768
alternative hypothesis: true location shift is not equal to 0

Reporting Mann-Whitney U Test Results

Example: “A Mann-Whitney U test revealed a significant difference in birth weights between infants of smokers and non-smokers (W = 5249.5, p = 0.007).”

Wilcoxon Signed-Rank Test

Nonparametric alternative to the paired t-test
Can be also applied to the one-sample case
No normality assumption (the differences are only assumed to be symmetrically distributed)
Hypotheses about the median
Used to compare a sample median to a hypothesized value or to compare two related samples
Based on the ranks of the (absolute) differences between pairs of observations

Wilcoxon Signed-Rank Test in R

# Simulating blood pressure data before and after treatment
set.seed(123)
bp_before <- rnorm(30, mean = 140, sd = 10)
bp_after <- bp_before + rnorm(30, mean = -5, sd = 5)

# Performing Wilcoxon Signed-Rank Test
wilcox.test(bp_before, bp_after, paired = TRUE)


    Wilcoxon signed rank exact test

data:  bp_before and bp_after
V = 425, p-value = 1.598e-05
alternative hypothesis: true location shift is not equal to 0

Reporting Wilcoxon Signed-Rank Test Results

Example: “A Wilcoxon signed-rank test indicated that the median reduction in blood pressure after treatment was statistically significant, V = 425, p < 0.001.”

Multiple Testing Problem

Introduction to Multiple Testing

Issue: Increased likelihood of Type I errors when performing multiple tests
Family-wise error rate (FWER): Probability of making at least one Type I error in a set of tests
False Discovery Rate (FDR): Expected proportion of false discoveries among all discoveries

Bonferroni Correction

Simplest and most conservative approach
Adjusted significance level: \(\alpha_{adjusted} = \alpha / m\), where \(m\) is the number of tests
Pros: Easy to implement and understand
Cons: Can be overly conservative, especially for large numbers of tests

Holm-Bonferroni Method

Order the p-values from smallest to largest: \(p_{(1)}, p_{(2)}, ..., p_{(m)}\)
For each \(p_{(i)}\), compare with \(\alpha / (m - i + 1)\)
Find the first \(k\) such that \(p_{(k)} > \alpha / (m - k + 1)\)
Reject the null hypotheses for tests 1 to \(k-1\); do not reject the null hypotheses for tests \(k\) to \(m\)

Pros: More powerful than Bonferroni, while still controlling FWER
Cons: Can still be conservative for very large numbers of tests
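The step-down procedure can be expressed through adjusted p-values, which are then compared with \(\alpha\). The following sketch uses illustrative p-values and checks the result against `p.adjust()`.

```r
p <- c(0.001, 0.005, 0.01, 0.03, 0.045)   # illustrative p-values
m <- length(p)
o <- order(p)                              # indices from smallest to largest p-value

# Multiply the i-th smallest p-value by (m - i + 1), enforce monotonicity, cap at 1
adj_sorted <- pmin(1, cummax((m - seq_len(m) + 1) * p[o]))

# Put the adjusted values back into the original order
holm    <- numeric(m)
holm[o] <- adj_sorted

holm
p.adjust(p, method = "holm")               # built-in reference
```

A null hypothesis is rejected at level \(\alpha\) when its adjusted p-value is at most \(\alpha\).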

False Discovery Rate (FDR) Control

Less conservative approach focused on controlling the proportion of false positives
Benjamini-Hochberg procedure is a common method for FDR control
Pros: More powerful for large-scale testing (e.g., genomics)
Cons: Does not provide strong control of the FWER
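The Benjamini-Hochberg adjustment can be sketched in a few lines and compared with `p.adjust()`. The p-values are illustrative.

```r
p <- c(0.001, 0.005, 0.01, 0.03, 0.045)   # illustrative p-values
m <- length(p)
o <- order(p, decreasing = TRUE)           # indices from largest to smallest p-value

# Multiply the p-value of rank i by m / i, enforce monotonicity from the top, cap at 1
adj_sorted <- pmin(1, cummin(m / (m:1) * p[o]))

# Put the adjusted values back into the original order
bh    <- numeric(m)
bh[o] <- adj_sorted

bh
p.adjust(p, method = "BH")                 # built-in reference
```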

Example: Multiple Testing Corrections in R

# Example p-values from five tests
(p_values <- c(0.001,0.005,0.01,0.03,0.045))

[1] 0.001 0.005 0.010 0.030 0.045

# Apply different corrections
p.adjust(p_values, method = "bonferroni")

[1] 0.005 0.025 0.050 0.150 0.225

p.adjust(p_values, method = "holm")

[1] 0.005 0.020 0.030 0.060 0.060

p.adjust(p_values, method = "fdr")

[1] 0.00500000 0.01250000 0.01666667 0.03750000 0.04500000