2024-06-04

Introduction

Statistics is concerned with gathering, evaluating, presenting, and interpreting data. A statistical technique called hypothesis testing is used to assess statements regarding a population parameter based on sample data.

Types of Hypotheses

Every hypothesis test sets two competing hypotheses against each other:

  • Null Hypothesis (H0): This is the default statement that there is no significant difference between two populations or that a specific parameter has a certain value.
  • Alternative Hypothesis (Ha): This is the opposite of the null hypothesis and proposes that there is a significant difference or that the parameter has a different value.

Process of Hypothesis Testing

  1. State the Hypotheses: Formulate the null (H0) and alternative (Ha) hypotheses.
  2. Set the Significance Level (α): This is the probability of rejecting the null hypothesis when it’s actually true (Type I error). Common significance levels are 0.05 (5%) or 0.01 (1%).
  3. Collect Data: Obtain a random sample from the population of interest.
  4. Calculate the Test Statistic: Use a statistical test to calculate a test statistic from the sample data.
  5. Determine the p-value: The p-value is the probability of observing a test statistic as extreme or more extreme than the one calculated, assuming the null hypothesis is true.
  6. Make a Decision: Reject the null hypothesis if the p-value is less than the significance level (α). Otherwise, fail to reject the null hypothesis.
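
The six steps above can be sketched in a few lines of R. This is a minimal illustration, not a definitive workflow: the sample of 14 heads in 20 flips is made up for the example, and the variable names (p_null, n_heads, and so on) are assumptions introduced here.

```r
# Steps 1-2: hypotheses (H0: p = p_null, Ha: p != p_null) and significance level.
p_null <- 0.5
alpha  <- 0.05

# Step 3: a hypothetical sample, 14 heads in 20 flips (made-up numbers).
n_flips <- 20
n_heads <- 14

# Steps 4-5: the number of heads is the test statistic; binom.test()
# converts it into an exact two-sided p-value.
test_result <- binom.test(n_heads, n_flips, p = p_null, alternative = "two.sided")
p_value <- test_result$p.value

# Step 6: compare the p-value with alpha and state the decision.
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"
cat("p-value =", round(p_value, 4), "->", decision, "\n")
```

Fixing α before looking at the data, as in the sketch, keeps the decision rule from being tuned to the result.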

Example: Coin Flip (Test)

Scenario: We want to test if a coin is fair (H0: p = 0.5) or biased (Ha: p ≠ 0.5), where p is the probability of heads. We flip the coin 100 times and get 60 heads.

Coin Flip Test using R

flips <- rbinom(100, 1, 0.5) # 100 flips with probability of heads = 0.5

binom.test(sum(flips), 100, 0.5, alternative = "two.sided")

## 
##  Exact binomial test
## 
## data:  sum(flips) and 100
## number of successes = 46, number of trials = 100, p-value = 0.4841
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.3598434 0.5625884
## sample estimates:
## probability of success 
##                   0.46

Coin Toss Results

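The simulated sample above contains 46 heads, close to the 50 we expect from a fair coin in 100 flips, and its p-value of 0.4841 is well above 0.05. Therefore, we fail to reject the null hypothesis and conclude there’s not enough evidence to say the coin is biased. Note that this output describes the simulated flips, not the 60 heads observed in the scenario.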
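
The scenario's observed count of 60 heads in 100 flips can also be tested directly, without simulating any flips. The sketch below assumes a significance level of 0.05, which the scenario does not state; observed_heads and n_flips are names chosen for this example.

```r
# Observed data from the scenario: 60 heads in 100 flips.
observed_heads <- 60
n_flips <- 100
alpha <- 0.05  # assumed significance level

# Exact two-sided binomial test of H0: p = 0.5.
result <- binom.test(observed_heads, n_flips, p = 0.5, alternative = "two.sided")

result$p.value           # exact two-sided p-value
result$p.value < alpha   # TRUE would mean "reject H0"
```

The p-value here is roughly 0.057, just above 0.05, so at that level the test still fails to reject the null hypothesis, although the evidence is much closer to the threshold than in the simulated sample.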

Binomial Distribution

The probability of getting k heads in n flips from a fair coin (p = 0.5) follows the binomial distribution:

\[ P(X=k)=\binom{n}{k}p^k(1-p)^{n-k} \]

where:

  • P(X = k) is the probability of getting k heads.
  • n is the number of flips.
  • k is the number of heads.
  • p is the probability of heads (0.5 for a fair coin).
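
As a quick check of the formula, the sketch below evaluates it by hand and compares the result with R's built-in dbinom(). The helper binom_pmf is a name introduced for this example, and the values n = 100 and k = 60 are borrowed from the coin-flip scenario.

```r
# Binomial probability computed directly from the formula.
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

n <- 100  # number of flips
p <- 0.5  # probability of heads for a fair coin
k <- 60   # number of heads

manual  <- binom_pmf(k, n, p)
builtin <- dbinom(k, size = n, prob = p)
all.equal(manual, builtin)  # the formula and dbinom() should agree

# The probabilities over all possible outcomes 0..n sum to 1.
total <- sum(binom_pmf(0:n, n, p))

# Upper-tail probability P(X >= 60), the building block of a p-value.
upper_tail <- sum(binom_pmf(k:n, n, p))
```

Because a fair coin's distribution is symmetric around n/2, doubling upper_tail gives the two-sided p-value for 60 heads.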

3D Binomial Distribution

Decision Rule in Hypothesis Testing

  • H0 represents the null hypothesis.
  • The p-value is the value calculated by the statistical test.
  • α represents the chosen significance level.

\[ \begin{aligned} &\text{Reject } H_0 \text{ if } p\text{-value} < \alpha \\ &\text{Fail to reject } H_0 \text{ if } p\text{-value} \geq \alpha \end{aligned} \]
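
The decision rule translates into a one-line comparison. The function decide below is a hypothetical helper written for this illustration; it is vectorised so that several p-values can be checked at once.

```r
# Apply the decision rule to one or more p-values.
decide <- function(p_value, alpha = 0.05) {
  stopifnot(is.numeric(p_value), all(p_value >= 0 & p_value <= 1))
  ifelse(p_value < alpha, "Reject H0", "Fail to reject H0")
}

single   <- decide(0.4841)         # p-value from the binomial test output above
boundary <- decide(c(0.03, 0.05))  # 0.05 is not below alpha = 0.05
```

A p-value exactly equal to α falls on the "fail to reject" side, because the rule uses a strict inequality.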

Distribution of Simulated P-Values for Coin Flips
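
The original plot is not reproduced here. As a rough sketch of how such a distribution could be generated, the code below repeats the 100-flip experiment many times with a fair coin and collects the p-value from each run. The seed, the number of simulations, and the histogram settings are all assumptions made for this example.

```r
set.seed(42)     # arbitrary seed, only for reproducibility
n_sims  <- 2000  # number of simulated experiments (assumed)
n_flips <- 100
alpha   <- 0.05

# Number of heads in each simulated experiment with a fair coin.
heads <- rbinom(n_sims, size = n_flips, prob = 0.5)

# Exact two-sided p-value for each experiment.
p_values <- vapply(
  heads,
  function(h) binom.test(h, n_flips, p = 0.5)$p.value,
  numeric(1)
)

hist(p_values, breaks = 20,
     main = "Simulated p-values for a fair coin", xlab = "p-value")

# Share of experiments that would (wrongly) reject H0 at level alpha.
rejection_rate <- mean(p_values < alpha)
```

Since the null hypothesis is true in every simulated experiment, rejection_rate estimates the Type I error rate. With a discrete test such as this one, it typically lands at or somewhat below α.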

Graph of Coin Flips by Outcome