2024-10-21

What Is Hypothesis Testing?

Hypothesis testing is a statistical method to decide whether there is enough evidence to support a certain belief (hypothesis) about a population. We start by setting up two hypotheses:

  • Null Hypothesis (H₀): No effect or no difference.
  • Alternative Hypothesis (H₁): There is an effect or a difference.

The Process of Hypothesis Testing

  1. State the null and alternative hypotheses.
  2. Choose a significance level (usually 0.05).
  3. Determine the test statistic.
  4. Compute the p-value based on the test statistic.
  5. Make a decision:
    • If the p-value is less than or equal to the significance level, reject the null hypothesis.
    • Otherwise, fail to reject the null hypothesis.
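
The five steps can be traced by hand in base R. The following is a minimal sketch, assuming a one-sample, two-sided t-test; the sample values and the names `x`, `mu0`, `t_stat`, `p_value`, and `decision` are illustrative choices, not part of the procedure itself.

```r
# Step 1: H0: mu = 5 versus H1: mu != 5 (two-sided)
x   <- c(4.8, 5.2, 5.1, 5.3, 4.9)  # illustrative sample
mu0 <- 5                           # mean hypothesized under H0

# Step 2: choose the significance level
alpha <- 0.05

# Step 3: compute the test statistic (one-sample t)
n      <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Step 4: two-sided p-value from the t distribution with n - 1 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Step 5: decision
decision <- if (p_value <= alpha) "reject H0" else "fail to reject H0"
decision
```

For this sample the p-value is well above 0.05, so the data give no grounds to reject the null hypothesis.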

Significance Level and p-value

  • The significance level (\(\alpha\)) is the probability of rejecting the null hypothesis when it is true. Typically, \(\alpha = 0.05\).
  • The p-value is the probability, assuming the null hypothesis is true, of observing a test statistic at least as extreme as the one actually observed.
    • If \(p \leq \alpha\), we reject the null hypothesis.
    • If \(p > \alpha\), we fail to reject the null hypothesis.
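
One way to see what \(\alpha\) means is to simulate many experiments in which the null hypothesis really is true and count how often it gets rejected. The sketch below assumes normally distributed data with true mean 5; the seed, sample size, and number of simulations are arbitrary choices made for illustration.

```r
set.seed(1)       # arbitrary seed, only for reproducibility
alpha  <- 0.05
n_sims <- 5000    # number of simulated experiments (arbitrary)

# Each experiment draws 20 values from N(5, 1), so H0: mu = 5 is true
p_values <- replicate(
  n_sims,
  t.test(rnorm(20, mean = 5, sd = 1), mu = 5)$p.value
)

# Share of experiments that wrongly reject H0
type1_rate <- mean(p_values <= alpha)
type1_rate
```

The rejection rate should land close to \(\alpha\), since a true null hypothesis is rejected in roughly 5% of experiments by construction.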

Types of Hypothesis Tests

  • One-Tailed Test: Tests whether a parameter differs from a certain value in one specified direction (greater than it, or less than it), with the direction chosen before looking at the data.
    • Example: Testing if a new drug works better than the current one.
  • Two-Tailed Test: Tests if a parameter is simply different from a certain value.
    • Example: Testing if the average height of people in a region is different from the national average.
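
In R, the `alternative` argument of `t.test()` selects between these tests. The sketch below uses an illustrative sample and a hypothesized mean of 5; the variable names are placeholders.

```r
x <- c(4.8, 5.2, 5.1, 5.3, 4.9)   # illustrative sample

two_sided <- t.test(x, mu = 5, alternative = "two.sided")  # H1: mean != 5
greater   <- t.test(x, mu = 5, alternative = "greater")    # H1: mean > 5
less      <- t.test(x, mu = 5, alternative = "less")       # H1: mean < 5

# Collect the three p-values for comparison
c(two_sided = two_sided$p.value,
  greater   = greater$p.value,
  less      = less$p.value)
```

Because the sample mean here is above 5, the "greater" p-value is half the two-sided one, and the "less" p-value is its complement. The test statistic is the same in all three cases; only the tail area counted as evidence changes.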

Formula for t-Statistic

The formula for calculating the t-statistic is:

\[ t = \frac{\bar{X} - \mu}{\frac{s}{\sqrt{n}}} \]

Where:

  • \(\bar{X}\) is the sample mean
  • \(\mu\) is the population mean hypothesized under H₀
  • \(s\) is the sample standard deviation
  • \(n\) is the sample size
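
As a worked instance, take the five values 4.8, 5.2, 5.1, 5.3, 4.9 (the sample used in the t-test example in this document) with a hypothesized mean of \(\mu = 5\). The squared deviations from the sample mean sum to 0.172, so:

\[ \bar{X} = 5.06, \qquad s = \sqrt{\frac{0.172}{4}} \approx 0.2074, \qquad t = \frac{5.06 - 5}{\frac{0.2074}{\sqrt{5}}} \approx \frac{0.06}{0.0927} \approx 0.647 \]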

Example: t-Test

# Hypothesis: Is the mean of the sample data equal to 5?


data <- c(4.8, 5.2, 5.1, 5.3, 4.9)
t.test(data, mu = 5)
## 
##  One Sample t-test
## 
## data:  data
## t = 0.647, df = 4, p-value = 0.5529
## alternative hypothesis: true mean is not equal to 5
## 95 percent confidence interval:
##  4.802523 5.317477
## sample estimates:
## mean of x 
##      5.06
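
The printed summary can also be read programmatically, because `t.test()` returns a list of class `htest`. The sketch below stores the result in a variable (the names `result`, `alpha`, and `reject_h0` are illustrative) and applies the decision rule at an assumed significance level of 0.05.

```r
data   <- c(4.8, 5.2, 5.1, 5.3, 4.9)
result <- t.test(data, mu = 5)

result$statistic   # t statistic
result$parameter   # degrees of freedom
result$p.value     # two-sided p-value
result$conf.int    # 95% confidence interval for the mean
result$estimate    # sample mean

# Decision rule at an assumed significance level of 0.05
alpha     <- 0.05
reject_h0 <- result$p.value <= alpha
reject_h0
```

Here `reject_h0` is `FALSE`: the p-value of about 0.55 is far above 0.05, and the confidence interval contains the hypothesized mean of 5.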

t-Distribution

# t-Distribution


library(plotly)
x <- seq(-4, 4, length.out = 100)
y <- dt(x, df = 10)
plot_ly(x = ~x, y = ~y, type = 'scatter', mode = 'lines') %>%
  layout(title = "t-Distribution")
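
The shape of the curve matters because the t distribution has heavier tails than the standard normal, which pushes the rejection cutoffs further out for small samples. As a rough sketch, the two-sided critical values at an assumed \(\alpha = 0.05\) can be compared with `qt()` and `qnorm()`; the degrees of freedom chosen here are illustrative.

```r
alpha <- 0.05
dfs   <- c(4, 10, 30)   # illustrative degrees of freedom

t_crit <- qt(1 - alpha / 2, df = dfs)   # two-sided critical values of t
z_crit <- qnorm(1 - alpha / 2)          # standard normal counterpart

data.frame(df = dfs, t_crit = t_crit, z_crit = z_crit)
```

The critical value shrinks toward the normal cutoff as the degrees of freedom grow, so small samples need a larger |t| to reject the null hypothesis.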

Density plot

# Density plot

library(ggplot2)
data <- c(4.8, 5.2, 5.1, 5.3, 4.9)
ggplot(data.frame(data), aes(x = data)) +
  geom_density(fill = "lightblue") +
  geom_vline(aes(xintercept = mean(data)), color = "blue", linetype = "dashed") +
  ggtitle("Density Plot of Sample Data")

Boxplot

# Boxplot

ggplot(data.frame(data), aes(x = factor(0), y = data)) +
  geom_boxplot(fill = "lightgreen") +
  labs(x = "Sample", y = "Values") +
  ggtitle("Boxplot of Sample Data")