Overview

Statistical inference allows us to use information from a sample to draw conclusions about a population. A sample statistic is used as a point estimate of a population parameter. For example, the sample proportion (p̂) is used to estimate the population proportion (p). Different samples from the same population produce different estimates, and this variation is known as sampling variability.

The variability of a sample proportion is measured using the standard error (SE). Standard error measures how much the sample statistic would vary if we repeatedly took samples from the same population.
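The idea of "repeatedly taking samples" can be made concrete with a small simulation. This is a minimal sketch under illustrative assumptions: a true proportion p = 0.3, samples of size n = 100, 5,000 repetitions, and a hypothetical helper `sample_proportion`. It draws many samples, computes p̂ for each, and compares the spread of those p̂ values with the standard-error formula for a proportion, √(p(1 − p)/n).

```python
import math
import random
import statistics

def sample_proportion(p, n, rng):
    """Draw n independent observations with success probability p and return p-hat."""
    successes = sum(rng.random() < p for _ in range(n))
    return successes / n

# Illustrative assumptions: true proportion p = 0.3, sample size n = 100.
p, n = 0.3, 100
rng = random.Random(42)  # fixed seed so the run is reproducible

# Repeatedly sample from the same population and record each p-hat.
p_hats = [sample_proportion(p, n, rng) for _ in range(5000)]

mean_p_hat = statistics.mean(p_hats)
simulated_se = statistics.stdev(p_hats)   # spread of p-hat across samples
formula_se = math.sqrt(p * (1 - p) / n)   # standard error from the formula

print(f"mean of simulated p-hats:   {mean_p_hat:.4f}")
print(f"spread of simulated p-hats: {simulated_se:.4f}")
print(f"formula sqrt(p(1-p)/n):     {formula_se:.4f}")
```

The two spreads should agree closely. The standard error is exactly this sample-to-sample spread, computed without having to draw thousands of samples.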

The formula for the standard error of a proportion is:

\[ SE = \sqrt{\frac{p(1-p)}{n}} \]

Since the population proportion is usually unknown, we estimate it using the sample proportion:

\[ SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
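As a worked example of the plug-in formula, the sketch below uses a hypothetical sample of 60 successes out of n = 100, so p̂ = 0.6. The function name `standard_error` is illustrative.

```python
import math

def standard_error(p_hat, n):
    """Estimated standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical sample: 60 successes out of n = 100, so p_hat = 0.6.
p_hat = 60 / 100
se = standard_error(p_hat, 100)
print(round(se, 4))  # 0.049
```

Here SE = √(0.6 × 0.4 / 100) = √0.0024 ≈ 0.049.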

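As the sample size increases, the standard error becomes smaller, so estimates from repeated samples cluster more tightly around the true value. Because n appears under a square root, quadrupling the sample size halves the standard error.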
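To see how the standard error depends on sample size, the short sketch below evaluates it for a hypothetical p̂ = 0.5 at several values of n. The dictionary name `ses` is illustrative.

```python
import math

def standard_error(p_hat, n):
    """Estimated standard error of a sample proportion."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical p_hat = 0.5 at increasing sample sizes (each 4 times the previous).
ses = {n: standard_error(0.5, n) for n in (25, 100, 400, 1600)}
for n, se in ses.items():
    print(f"n = {n:5d}: SE = {se:.4f}")
```

Each fourfold increase in n cuts the standard error in half, from 0.1 at n = 25 down to 0.0125 at n = 1600. Precision therefore improves with sample size, but with diminishing returns.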

Confidence Intervals

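A confidence interval provides a range of plausible values for the population parameter. An approximate 95% confidence interval for a proportion is the point estimate plus or minus about two standard errors. The multiplier 2 is a rounded form of the normal critical value z* ≈ 1.96: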

\[ \hat{p} \pm 2(SE) \]
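The sketch below applies the interval formula to a hypothetical sample of 60 successes in n = 100. The function `confidence_interval` is an illustrative name. Its `z` parameter defaults to 2 to match the formula. Passing `z=1.96` uses the more precise normal critical value instead.

```python
import math

def confidence_interval(p_hat, n, z=2.0):
    """Approximate confidence interval for a proportion: p_hat +/- z * SE."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Hypothetical sample: 60 successes in n = 100.
low, high = confidence_interval(0.6, 100)
print(f"95% CI (z = 2):    ({low:.3f}, {high:.3f})")

low196, high196 = confidence_interval(0.6, 100, z=1.96)
print(f"95% CI (z = 1.96): ({low196:.3f}, {high196:.3f})")
```

With SE ≈ 0.049, the z = 2 interval is roughly (0.502, 0.698). Using 1.96 narrows it only slightly.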

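We say we are 95% confident that the true population proportion lies within this range. The confidence describes the method rather than any single interval: if we took many samples and built an interval from each one, about 95% of those intervals would contain the true proportion p.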
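The repeated-sampling meaning of "95% confident" can be checked by simulation. This sketch rests on illustrative assumptions: a known true p = 0.3, n = 100, and 2,000 repetitions. For each simulated sample it builds the interval p̂ ± 2(SE) and counts how often that interval contains p.

```python
import math
import random

def confidence_interval(p_hat, n, z=2.0):
    """Approximate confidence interval for a proportion: p_hat +/- z * SE."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Illustrative assumptions: true p = 0.3, n = 100, 2,000 repeated samples.
p, n, reps = 0.3, 100, 2000
rng = random.Random(1)  # fixed seed for reproducibility

hits = 0
for _ in range(reps):
    p_hat = sum(rng.random() < p for _ in range(n)) / n
    low, high = confidence_interval(p_hat, n)
    if low <= p <= high:  # did this interval capture the true proportion?
        hits += 1

coverage = hits / reps
print(f"fraction of intervals containing p: {coverage:.3f}")
```

The fraction should come out near 0.95. Any single interval either contains p or it does not. The 95% describes how often the procedure succeeds over many samples.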

Central Limit Theorem

The Central Limit Theorem states that the sampling distribution of the sample proportion will be approximately normal if the following conditions are met:

  1. Observations are independent
  2. Success-failure condition: at least 10 expected successes and 10 expected failures, that is, np ≥ 10 and n(1 − p) ≥ 10
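The two conditions above can be written as a simple check. This is a sketch under stated assumptions. Independence cannot be verified from numbers alone, so the hypothetical `independent` argument stands in for a judgment about how the data were collected. The `p` argument is whatever proportion is being plugged in. A common practice is to use p̂ when building a confidence interval and the null value p₀ when running a hypothesis test.

```python
def conditions_met(n, p, independent=True):
    """Check the conditions for modeling p-hat with a normal distribution.

    independent: a judgment about the study design (e.g. random sampling);
    it cannot be checked from n and p, so it is passed in by the caller.
    """
    expected_successes = n * p
    expected_failures = n * (1 - p)
    return independent and expected_successes >= 10 and expected_failures >= 10

print(conditions_met(100, 0.3))  # True: 30 expected successes, 70 expected failures
print(conditions_met(50, 0.1))   # False: only 5 expected successes
```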

When these conditions are satisfied, the sampling distribution of p̂ can be modeled by a normal distribution centered at the population proportion p, with standard deviation equal to the standard error √(p(1 − p)/n).
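Once the normal model applies, it can be used to approximate probabilities about p̂. This sketch assumes a true p = 0.3 and n = 100, for which both conditions hold. It uses the standard library's `statistics.NormalDist` to estimate how likely a sample proportion of 0.35 or more is.

```python
import math
from statistics import NormalDist

# Illustrative assumptions: true p = 0.3, n = 100 (30 successes, 70 failures expected).
p, n = 0.3, 100
se = math.sqrt(p * (1 - p) / n)

# Normal model for the sampling distribution of p-hat: center p, spread SE.
model = NormalDist(mu=p, sigma=se)

# Approximate probability that a sample proportion is 0.35 or higher.
prob = 1 - model.cdf(0.35)
print(f"P(p-hat >= 0.35) is about {prob:.3f}")
```

The value 0.35 lies about 1.09 standard errors above 0.3. The model therefore gives a probability of roughly 0.14.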

Hypothesis Testing

Hypothesis testing is used to evaluate claims about population parameters.

The null hypothesis (H₀) represents the current assumption or status quo, while the alternative hypothesis (Hₐ) represents the competing claim.

Example:

\[ H_0: p = p_0 \]

\[ H_A: p \neq p_0 \]

P-Value

The p-value is the probability of observing a sample statistic at least as extreme as the one obtained, assuming the null hypothesis is true.
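The definition can be followed literally by simulation: assume the null hypothesis is true, generate many samples, and count how often the result is at least as extreme as the observed one. The data here are hypothetical: 116 successes in n = 200, testing H₀: p = 0.5 against a two-sided alternative. Counts are compared as integers, which avoids floating-point rounding in the "at least as extreme" check.

```python
import random

# Hypothetical data: 116 successes in n = 200; H0: p = 0.5, two-sided alternative.
n, observed, p0 = 200, 116, 0.5
expected = n * p0  # 100 successes expected if H0 is true

rng = random.Random(7)  # fixed seed for reproducibility
reps = 5000
as_extreme = 0
for _ in range(reps):
    # Simulate one sample assuming the null hypothesis is true.
    x = sum(rng.random() < p0 for _ in range(n))
    # Two-sided: count departures from 100 in either direction that are at least as large.
    if abs(x - expected) >= abs(observed - expected):
        as_extreme += 1

p_value = as_extreme / reps
print(f"simulated p-value is about {p_value:.3f}")
```

The simulated p-value should land near 0.03. In other words, a result this far from 100 successes would be fairly rare if p really were 0.5.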

If the p-value is less than the significance level (α), we reject the null hypothesis in favor of the alternative; otherwise, we fail to reject the null hypothesis.
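The decision rule can be combined with the normal approximation into a one-proportion z-test. This is a minimal sketch, not a library routine. The function name `one_proportion_z_test` and the data (116 successes in n = 200, H₀: p = 0.5, α = 0.05) are hypothetical. Because the p-value is computed assuming H₀, the standard error uses p₀ rather than p̂.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0 using the normal approximation.

    Under H0 the standard error is computed from p0, not from p_hat.
    """
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails
    return z, p_value

alpha = 0.05
z, p_value = one_proportion_z_test(116, 200, 0.5)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}: {decision}")
```

Here z ≈ 2.26 and the p-value is about 0.024. That is below α = 0.05, so the null hypothesis is rejected.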

Type 1 and Type 2 Errors

A Type 1 error occurs when we reject a true null hypothesis.

A Type 2 error occurs when we fail to reject a false null hypothesis.
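A Type 2 error rate depends on how far the truth is from the null value, so it can only be computed for a specific alternative. This sketch estimates it by simulation under illustrative assumptions. H₀ states p = 0.5 while the true p is 0.6, with n = 100, α = 0.05, and a two-sided z-test. The helper `rejects` is hypothetical.

```python
import math
import random
from statistics import NormalDist

def rejects(successes, n, p0, alpha=0.05):
    """Decision of a two-sided one-proportion z-test (normal approximation)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha

# Illustrative assumptions: H0 says p = 0.5, but the true p is 0.6; n = 100.
p0, true_p, n, reps = 0.5, 0.6, 100, 2000
rng = random.Random(3)  # fixed seed for reproducibility

misses = 0
for _ in range(reps):
    x = sum(rng.random() < true_p for _ in range(n))
    if not rejects(x, n, p0):
        misses += 1  # H0 is false but we failed to reject it: a Type 2 error

type2_rate = misses / reps
print(f"estimated Type 2 error rate: {type2_rate:.3f}")
```

With these settings, the test misses the real difference a little under half the time. A larger sample, or a truth farther from 0.5, would lower this rate.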

Significance Level

The significance level α is the Type 1 error probability we are willing to accept, and it is often set to 0.05. With α = 0.05, if the null hypothesis is actually true, there is a 5% chance that we will wrongly reject it.
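The link between α and Type 1 errors can be checked by simulating many samples in which the null hypothesis really is true. Illustrative assumptions: p = p₀ = 0.5, n = 100, α = 0.05, 4,000 repetitions, and a hypothetical helper `rejects` implementing a two-sided z-test.

```python
import math
import random
from statistics import NormalDist

def rejects(successes, n, p0, alpha=0.05):
    """Decision of a two-sided one-proportion z-test (normal approximation)."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha

# Illustrative assumptions: H0 (p = 0.5) is actually true; n = 100.
p0, n, reps, alpha = 0.5, 100, 4000, 0.05
rng = random.Random(11)  # fixed seed for reproducibility

false_rejections = 0
for _ in range(reps):
    x = sum(rng.random() < p0 for _ in range(n))  # data generated under H0
    if rejects(x, n, p0, alpha):
        false_rejections += 1  # rejecting a true H0: a Type 1 error

type1_rate = false_rejections / reps
print(f"estimated Type 1 error rate: {type1_rate:.3f}")
```

The estimated rate should be close to α, though not exactly 0.05. Counts of successes are discrete, and the normal curve is only an approximation to their distribution.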

Statistical vs Practical Significance

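Statistical significance refers to whether a result would be unlikely to occur by chance alone if the null hypothesis were true. Practical significance refers to whether the result is large enough to matter in the real world. With a very large sample, even a tiny, unimportant difference can be statistically significant.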
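To illustrate how sample size drives statistical significance, this sketch tests the same small difference at two sample sizes. The setup is hypothetical: an observed p̂ = 0.505 against H₀: p = 0.5. The helper `two_sided_p_value` is also hypothetical and uses the normal approximation.

```python
import math
from statistics import NormalDist

def two_sided_p_value(p_hat, p0, n):
    """Two-sided p-value for H0: p = p0 using the normal approximation."""
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical: the same half-percentage-point difference at two sample sizes.
p_small = two_sided_p_value(0.505, 0.5, 1_000)
p_large = two_sided_p_value(0.505, 0.5, 1_000_000)
print(f"n = 1,000:     p-value = {p_small:.3f}")
print(f"n = 1,000,000: p-value = {p_large:.3g}")
```

At n = 1,000 the p-value is about 0.75, so there is no evidence against H₀. At n = 1,000,000 the p-value is effectively zero. Whether a 0.5-point difference matters in practice is a separate question that no p-value answers.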