Part I: Confidence Intervals - Estimating with Uncertainty


Chapter 1: Introduction to Interval Estimation

1.1 The Limits of a Single Point

In our last lecture, we established that the sample mean, \(\bar{X}\), is an excellent point estimator for the population mean, \(\mu\). It’s unbiased and becomes more precise as our sample size grows.

So, if we take a sample of 25 commuters and find their average travel distance is \(\bar{x} = 34.5\) km, our best single guess for the true average distance of all commuters is 34.5 km.

But we must ask ourselves: what is the probability that our sample mean \(\bar{x}\) is exactly equal to the true population mean \(\mu\)? Since these are continuous variables, that probability is zero! Our point estimate is almost certainly wrong. What we really want to know is: how wrong is it? This is the problem of inferential error.

1.2 The “Fishing Net” Analogy

Think of the true population parameter, \(\mu\), as a single fish swimming in a large lake.
  • A point estimate (\(\bar{x}\)) is like trying to catch the fish with a spear. You can be very skilled, but the chances of hitting the fish exactly are incredibly small. You will almost always miss.
  • An interval estimate is like using a fishing net. Instead of aiming for an exact point, we cast a net over a range of values. We can’t say exactly where the fish is within the net, but we can be very confident that we’ve caught it.

A Confidence Interval is our statistical fishing net. It’s a range of values, calculated from our sample data, that is likely to contain the true, unknown population parameter.

1.3 Formal Definition and the “Confidence Level”

A \(100(1-\alpha)\%\) confidence interval for a parameter \(\theta\) is an interval \((a, b)\) calculated from a sample. The key property is that, if we were to repeat our sampling process many times, \(100(1-\alpha)\%\) of the intervals we construct would contain the true parameter \(\theta\).

  • \((1-\alpha)\) is the confidence level. Common choices are 90%, 95%, or 99%.
  • \(\alpha\) is the significance level, representing the probability that our interval fails to capture the parameter. For a 95% confidence level, \(\alpha = 0.05\).

The Frequentist Interpretation (Crucial!): A 95% confidence interval does not mean there is a 95% probability that the true parameter \(\mu\) is in our specific, calculated interval. The parameter \(\mu\) is a fixed, unknown constant. It’s either in our interval or it’s not. The 95% refers to the reliability of the procedure. It means that 95% of all possible intervals we could have constructed from all possible samples of that size will capture the true mean.

Frequentist Interpretation: Over many samples, 95% of the calculated confidence intervals (black) successfully capture the true population mean μ (blue line). 5% of them (red) will miss.
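The repeated-sampling reading can be checked with a short simulation. The sketch below is illustrative only: the population values (mu = 50, sigma = 10), the sample size, and the number of repetitions are assumptions chosen for the demonstration, not values from the lecture.

```r
# Simulate many samples from a known population and count how often the
# 95% z-interval (known sigma) contains the true mean.
set.seed(1)                      # for reproducibility
mu    <- 50                      # assumed true mean (illustrative)
sigma <- 10                      # assumed known standard deviation
n     <- 25                      # sample size
reps  <- 10000                   # number of repeated samples

z  <- qnorm(0.975)               # reliability factor for 95% confidence
me <- z * sigma / sqrt(n)        # margin of error (identical for every sample)

covered <- replicate(reps, {
  xbar <- mean(rnorm(n, mean = mu, sd = sigma))
  (xbar - me <= mu) && (mu <= xbar + me)   # TRUE if this interval captures mu
})

coverage <- mean(covered)        # share of intervals that captured mu
cat("Empirical coverage:", coverage, "\n")
```

The printed share should land close to 0.95. Each individual interval either contains mu or does not; the 95% describes the long-run behavior of the procedure.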

Chapter 2: Confidence Intervals for a Single Population Mean (\(\mu\))

2.1 Case A1: Normal Population, Known Variance (\(\sigma^2\))

This is the foundational case, though rare in practice.

Assumptions:
  1. The population is normally distributed, \(X \sim \mathcal{N}(\mu, \sigma^2)\).
  2. The population variance \(\sigma^2\) is known.

Derivation: Because the population itself is normal, the sample mean is exactly normally distributed for any sample size (no Central Limit Theorem approximation is needed): \(\bar{X} \sim \mathcal{N}(\mu, \sigma^2/n)\). If we standardize this, we get: \[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim \mathcal{N}(0, 1) \] For a standard normal distribution, we can find two symmetric values, \(-z_{\alpha/2}\) and \(+z_{\alpha/2}\), that contain \((1-\alpha)\) of the probability. \[ P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha \] Substituting our Z formula: \[ P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha \] Now, we rearrange the inequality to isolate \(\mu\) in the middle: \[ P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \] This gives us our formula.

Formula: The \(100(1-\alpha)\%\) confidence interval for \(\mu\) is: \[ CI_{1-\alpha}(\mu) = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \] Where:
  • \(\bar{x}\) is the observed sample mean (the point estimate).
  • \(z_{\alpha/2}\) is the reliability factor - the Z-value that leaves \(\alpha/2\) probability in the upper tail.
  • \(\frac{\sigma}{\sqrt{n}}\) is the standard error of the mean.
  • \(z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\) is the margin of error (ME).

Detailed Numerical Example: Commuter Distance

Let’s use the example from your notes. We want to estimate the average commuter distance.
  • Population is Normal: \(X \sim \mathcal{N}(\mu, \sigma^2=100)\). So, \(\sigma=10\).
  • Sample size \(n = 25\).
  • Observed sample mean \(\bar{x} = 34.5\) km.
  • We want a 95% confidence interval.

Step 1: Determine the confidence and significance levels. Confidence Level = \(1-\alpha = 0.95\). Significance Level = \(\alpha = 0.05\). Area in each tail = \(\alpha/2 = 0.025\).

Step 2: Find the reliability factor, \(z_{\alpha/2}\). We need the Z-value that leaves an area of \(0.025\) in the upper tail. This is the same as the Z-value with a cumulative probability of \(1 - 0.025 = 0.975\). We look this up in a Z-table or use R.

z_alpha_2 <- qnorm(0.975)
cat("The reliability factor z_0.025 is:", z_alpha_2, "\n")
## The reliability factor z_0.025 is: 1.959964

So, \(z_{0.025} \approx 1.96\).

Step 3: Calculate the Margin of Error (ME). \[ ME = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{10}{\sqrt{25}} = 1.96 \cdot \frac{10}{5} = 1.96 \cdot 2 = 3.92 \]

Step 4: Construct the Interval. \[ CI_{95\%}(\mu) = \bar{x} \pm ME = 34.5 \pm 3.92 \] \[ CI_{95\%}(\mu) = [30.58, 38.42] \]

Conclusion: We are 95% confident that the true average commuting distance for the entire population is between 30.58 km and 38.42 km.
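The four steps can also be carried out directly from the summary statistics in base R. This is a minimal sketch that uses only the numbers given in the example; the variable names are chosen here for illustration.

```r
# z-interval for a mean with known sigma, computed from summary statistics.
xbar  <- 34.5    # observed sample mean (km)
sigma <- 10      # known population standard deviation
n     <- 25      # sample size
conf  <- 0.95    # confidence level

alpha <- 1 - conf
z     <- qnorm(1 - alpha / 2)          # reliability factor z_{alpha/2}
me    <- z * sigma / sqrt(n)           # margin of error
ci    <- c(lower = xbar - me, upper = xbar + me)

print(round(ci, 2))
```

Rounded to two decimals, the result agrees with the hand calculation: 30.58 and 38.42.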

R Code Verification

Let’s use the CI.mean function from your class scripts.

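# The function needs a data vector, so we simulate one from N(34.5, 10^2).
# The simulated sample mean (33.544) differs from 34.5, so the interval is
# centered differently, but its half-width (1.96 * 2 = 3.92) matches the hand calculation.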
set.seed(101)
commuter_sample <- rnorm(25, mean = 34.5, sd = 10)

# Now apply the function, specifying the KNOWN sigma.
CI.mean(commuter_sample, sigma = 10, conf.level = 0.95, digits = 3)
##   n   xbar sigma_X SE  Lower  Upper
##  25 33.544      10  2 29.624 37.464

2.2 Case A2: Normal Population, Unknown Variance (\(\sigma^2\))

This is the most common practical scenario for small samples.

The Problem: We can’t use the Z-formula because \(\sigma\) is unknown. We must estimate it using the sample standard deviation, \(s\). But when we substitute \(s\) for \(\sigma\), the distribution \(\frac{\bar{X} - \mu}{S/\sqrt{n}}\) is no longer Normal. It follows a Student’s t-distribution.

The Student’s t-distribution:
  • It is bell-shaped and symmetric like the Normal distribution.
  • It has “fatter” or “heavier” tails, accounting for the extra uncertainty of using \(s\) instead of \(\sigma\).
  • Its shape depends on a single parameter: degrees of freedom (df), which for this case is \(df = n-1\).
  • As \(df \to \infty\) (i.e., as sample size increases), the t-distribution converges to the standard normal distribution.
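The heavier tails and the convergence to the normal distribution can be seen by comparing reliability factors. The sketch below uses an arbitrary, illustrative set of degrees of freedom and the 95% level.

```r
# Compare the 95% t reliability factor with the normal one as df grows.
dfs    <- c(2, 5, 9, 30, 100, 1000)   # illustrative degrees of freedom
t_crit <- qt(0.975, df = dfs)         # t_{df, 0.025} for each df
z_crit <- qnorm(0.975)                # z_{0.025}

comparison <- data.frame(
  df            = dfs,
  t_crit        = round(t_crit, 3),
  excess_over_z = round(t_crit - z_crit, 3)
)
print(comparison)
```

Every t-value exceeds the z-value, which makes t-intervals wider, and the excess shrinks toward zero as the degrees of freedom increase.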

Formula: The \(100(1-\alpha)\%\) confidence interval for \(\mu\) is: \[ CI_{1-\alpha}(\mu) = \bar{x} \pm t_{n-1, \alpha/2} \frac{s}{\sqrt{n}} \] The only change is that we use a t-value instead of a Z-value as our reliability factor.

Detailed Numerical Example: Call Center Response Time

From your notes: A company analyzes call center response times.
  • Population is Normal.
  • Sample size \(n = 10\).
  • Observed sample mean \(\bar{x} = 101\) minutes.
  • Observed sample standard deviation \(s = 32.7\) minutes.
  • We want a 90% confidence interval.

Step 1: Determine levels and degrees of freedom. Confidence Level = \(1-\alpha = 0.90\). Significance Level = \(\alpha = 0.10\). Area in each tail = \(\alpha/2 = 0.05\). Degrees of Freedom = \(df = n-1 = 10-1 = 9\).

Step 2: Find the reliability factor, \(t_{n-1, \alpha/2}\). We need the t-value from a distribution with 9 degrees of freedom that leaves an area of \(0.05\) in the upper tail (cumulative probability of \(0.95\)).

t_alpha_2 <- qt(0.95, df = 9)
cat("The reliability factor t_(9, 0.05) is:", t_alpha_2, "\n")
## The reliability factor t_(9, 0.05) is: 1.833113

So, \(t_{9, 0.05} \approx 1.833\).

Step 3: Calculate the Margin of Error (ME). \[ ME = t_{n-1, \alpha/2} \frac{s}{\sqrt{n}} = 1.833 \cdot \frac{32.7}{\sqrt{10}} \approx 1.833 \cdot \frac{32.7}{3.162} \approx 1.833 \cdot 10.34 \approx 18.96 \]

Step 4: Construct the Interval. \[ CI_{90\%}(\mu) = \bar{x} \pm ME = 101 \pm 18.96 \] \[ CI_{90\%}(\mu) = [82.04, 119.96] \]

Conclusion: We are 90% confident that the true average response time is between 82.04 and 119.96 minutes.
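The same interval can be reproduced from the summary statistics in base R. This is a sketch using only the values stated in the example; the variable names are illustrative.

```r
# t-interval for a mean with unknown sigma, computed from summary statistics.
xbar <- 101      # observed sample mean (minutes)
s    <- 32.7     # observed sample standard deviation
n    <- 10       # sample size
conf <- 0.90     # confidence level

alpha  <- 1 - conf
t_crit <- qt(1 - alpha / 2, df = n - 1)   # t_{n-1, alpha/2}
me     <- t_crit * s / sqrt(n)            # margin of error
ci     <- c(lower = xbar - me, upper = xbar + me)

print(round(ci, 2))
```

Rounded to two decimals, this gives 82.04 and 119.96, in line with the hand calculation.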

R Code Verification

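# Simulate a sample from N(101, 32.7^2). Its sample mean and standard deviation
# (125.974 and 31.493) differ from the summary values above, so the output
# illustrates the function call rather than reproducing the hand calculation.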
set.seed(102)
call_center_sample <- rnorm(10, mean = 101, sd = 32.7)

# Apply the function (this is the default case, with unknown variance)
CI.mean(call_center_sample, conf.level = 0.90, digits = 3)
##                n    xbar    s_X    se   Lower   Upper
## Normal.Approx 10 125.974 31.493 9.959 109.593 142.355
## Student-t     10 125.974 31.493 9.959 107.718  144.23

2.3 Case A3: Large Samples (Any Population, Unknown Variance)

What if the population is not normal? The Central Limit Theorem comes to our rescue! If the sample size is large (typically \(n > 30\)), the sampling distribution of \(\bar{X}\) is approximately normal, even if the population isn’t.

Furthermore, for large \(n\), the t-distribution is nearly identical to the Z-distribution. By convention, we use the Z-distribution for large samples.

Formula: The \(100(1-\alpha)\%\) confidence interval for \(\mu\) is: \[ CI_{1-\alpha}(\mu) \approx \bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}} \] This is identical to the known variance case, but we substitute the sample standard deviation \(s\) for \(\sigma\).
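As a rough illustration of the large-sample formula, the sketch below applies it to skewed simulated data and compares the result with the t-interval from base R's `t.test`. The helper name `ci_mean_large`, the exponential population, and the sample size of 500 are all assumptions made for this example.

```r
# Large-sample z-interval for a mean, with s substituted for sigma.
# ci_mean_large is a hypothetical helper, not part of the class scripts.
ci_mean_large <- function(x, conf.level = 0.95) {
  n  <- length(x)
  se <- sd(x) / sqrt(n)                      # estimated standard error
  z  <- qnorm(1 - (1 - conf.level) / 2)      # reliability factor
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

set.seed(2)
x <- rexp(500, rate = 1 / 20)   # skewed, non-normal data (illustrative)

ci_z <- ci_mean_large(x, conf.level = 0.99)        # z-based interval
ci_t <- t.test(x, conf.level = 0.99)$conf.int      # t-based interval

print(round(ci_z, 3))
print(round(as.numeric(ci_t), 3))
```

With n this large the two intervals differ only in the second or third decimal; the t-interval is always slightly wider.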

R Example: Movie Opening Revenues

Let’s use the movies dataset to compute a 99% confidence interval for the average opening weekend revenue. The sample size is very large (n=2868), so this case applies.

# The CI.mean function handles this automatically.
# For large n, the "Normal.Approx" and "Student-t" rows are nearly identical.
CI.mean(movies$opening, conf.level = 0.99, digits = 3)
##                  n   xbar    s_X    se  Lower  Upper
## Normal.Approx 2868 21.959 18.989 0.355 21.046 22.872
## Student-t     2868 21.959 18.989 0.355 21.045 22.873

Conclusion: We are 99% confident that the true average opening revenue for all movies of this type is between $21.046 million and $22.872 million (the Normal.Approx row; the Student-t row, 21.045 to 22.873, is practically identical).

Chapter 3: Confidence Interval for a Single Population Proportion (\(p\))

Now we shift from means to proportions. We want to estimate the proportion of a population that has a certain characteristic (e.g., the proportion of customers who will subscribe to a new plan).

Assumptions:
  1. The data comes from a Bernoulli population (two outcomes: success/failure).
  2. The sample size is large enough for the normal approximation to the binomial to be valid. The rule of thumb is \(n\hat{p}(1-\hat{p}) > 5\).

Derivation: The point estimator for the population proportion \(p\) is the sample proportion \(\hat{p}\). From our last lecture, we know the sampling distribution of \(\hat{P}\) is approximately normal for large \(n\): \[ \hat{P} \approx \mathcal{N}\left(p, \frac{p(1-p)}{n}\right) \] Standardizing this gives: \[ Z = \frac{\hat{P} - p}{\sqrt{p(1-p)/n}} \approx \mathcal{N}(0, 1) \] Since the true \(p\) is unknown in the standard error term, we substitute its estimate \(\hat{p}\). The rest of the derivation follows the same logic as for the mean.

Formula: The \(100(1-\alpha)\%\) confidence interval for \(p\) is: \[ CI_{1-\alpha}(p) = \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

Detailed Numerical Example: Mobile Plan Subscription

From your notes: A company wants to estimate the proportion of customers who will switch to a new mobile plan.
  • Sample size \(n = 100\).
  • Observed sample proportion \(\hat{p} = 0.25\) (25% of the sample would switch).
  • We want a 99% confidence interval.

Step 1: Check the large sample condition. \(n\hat{p}(1-\hat{p}) = 100 \cdot 0.25 \cdot (0.75) = 18.75\). Since \(18.75 > 5\), the normal approximation is valid.

Step 2: Find the reliability factor, \(z_{\alpha/2}\). Confidence Level = \(1-\alpha = 0.99 \implies \alpha = 0.01 \implies \alpha/2 = 0.005\). We need the Z-value for a cumulative probability of \(1 - 0.005 = 0.995\).

z_alpha_2_99 <- qnorm(0.995)
cat("The reliability factor z_0.005 is:", z_alpha_2_99, "\n")
## The reliability factor z_0.005 is: 2.575829

So, \(z_{0.005} \approx 2.576\).

Step 3: Calculate the Margin of Error (ME). \[ ME = z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2.576 \cdot \sqrt{\frac{0.25(0.75)}{100}} = 2.576 \cdot \sqrt{0.001875} \approx 2.576 \cdot 0.0433 \approx 0.1115 \]

Step 4: Construct the Interval. \[ CI_{99\%}(p) = \hat{p} \pm ME = 0.25 \pm 0.1115 \] \[ CI_{99\%}(p) = [0.1385, 0.3615] \]

Conclusion: We are 99% confident that the true proportion of all customers who would switch to the new plan is between 13.85% and 36.15%.
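The steps above translate into a few lines of base R. This sketch uses only the numbers from the example; the variable names are illustrative.

```r
# Normal-approximation interval for a proportion from summary statistics.
n     <- 100     # sample size
p_hat <- 0.25    # observed sample proportion
conf  <- 0.99    # confidence level

# Rule-of-thumb check for the normal approximation
stopifnot(n * p_hat * (1 - p_hat) > 5)

z  <- qnorm(1 - (1 - conf) / 2)              # reliability factor
se <- sqrt(p_hat * (1 - p_hat) / n)          # estimated standard error
ci <- c(lower = p_hat - z * se, upper = p_hat + z * se)

print(round(ci, 4))
```

Rounded to four decimals, the bounds are 0.1385 and 0.3615, matching the hand calculation.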

R Code Verification

Let’s use the CI.prop function from your class scripts.

# We can simulate the data: 25 successes in 100 trials
mobile_sample <- c(rep("Yes", 25), rep("No", 75))

# Apply the function
CI.prop(mobile_sample, success = "Yes", conf.level = 0.99, digits = 4)
##    n phat   s_X     se  Lower  Upper
##  100 0.25 0.433 0.0433 0.1385 0.3615

Chapter 4: Confidence Intervals for the Difference Between Two Populations

Often, we want to compare two groups. For example, is a new drug more effective than an old one? Do male customers spend more than female customers?

4.1 Case B1: Difference in Means, Dependent (Paired) Samples (\(\mu_D\))

Scenario: We have paired data. Each observation in one sample is naturally linked to an observation in the other.
  • Before-and-After: The same subject is measured before and after a treatment.
  • Matched Pairs: Two different subjects are matched based on similar characteristics (e.g., age, gender), and one is assigned to each group.

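The Strategy: We simplify the problem by creating a single new variable: the difference within each pair, \(d_i = x_i - y_i\). Now, we have a one-sample problem for the mean of the differences, \(\mu_D = \mu_x - \mu_y\). We can simply apply the one-sample t-interval formula to these differences. The order of subtraction is a convention: reversing it (\(d_i = y_i - x_i\)) flips the sign of \(\bar{d}\) and mirrors the interval around zero, so always state which order you are using.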

Formula: The \(100(1-\alpha)\%\) confidence interval for \(\mu_D\) is: \[ CI_{1-\alpha}(\mu_D) = \bar{d} \pm t_{n-1, \alpha/2} \frac{s_d}{\sqrt{n}} \] Where \(\bar{d}\) is the mean of the sample differences and \(s_d\) is the standard deviation of the sample differences.

Detailed Numerical Example: Promotional Video Effectiveness

From your notes: A travel agency measures spending propensity for 5 customers before and after they watch a promotional video.

| Customer | Before (\(x_i\)) | After (\(y_i\)) | Difference (\(d_i = y_i - x_i\)) |
|---|---|---|---|
| 1 | 500 | 600 | 100 |
| 2 | 700 | 900 | 200 |
| 3 | 400 | 400 | 0 |
| 4 | 350 | 300 | -50 |
| 5 | 300 | 550 | 250 |

Step 1: Calculate \(\bar{d}\) and \(s_d\). \[ \bar{d} = \frac{100 + 200 + 0 - 50 + 250}{5} = \frac{500}{5} = 100 \] To find \(s_d\), we first find the variance: \[ s_d^2 = \frac{\sum(d_i - \bar{d})^2}{n-1} \] \[ s_d^2 = \frac{(100-100)^2 + (200-100)^2 + (0-100)^2 + (-50-100)^2 + (250-100)^2}{5-1} \] \[ s_d^2 = \frac{0^2 + 100^2 + (-100)^2 + (-150)^2 + 150^2}{4} = \frac{0 + 10000 + 10000 + 22500 + 22500}{4} = \frac{65000}{4} = 16250 \] \[ s_d = \sqrt{16250} \approx 127.48 \]

Step 2: Find the reliability factor for a 95% CI. \(df = n-1 = 4\). We need \(t_{4, 0.025}\).

qt(0.975, df = 4)
## [1] 2.776445

So, \(t_{4, 0.025} \approx 2.776\).

Step 3: Calculate ME and the Interval. \[ ME = 2.776 \cdot \frac{127.48}{\sqrt{5}} \approx 2.776 \cdot 57.01 \approx 158.26 \] \[ CI_{95\%}(\mu_D) = 100 \pm 158.26 = [-58.26, 258.26] \]

Conclusion: We are 95% confident that the true average change in spending propensity (after minus before) is between -€58.26 and +€258.26. Since this interval contains 0, we do not have strong evidence that the video has any effect on average spending propensity.
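Because the paired problem reduces to a one-sample problem on the differences, base R's `t.test` applied to the vector of differences gives an independent cross-check. The sketch uses the five customers from the table, with differences taken as after minus before.

```r
# Paired interval via a one-sample t-interval on the differences.
before <- c(500, 700, 400, 350, 300)
after  <- c(600, 900, 400, 300, 550)

d <- after - before                 # differences d_i = y_i - x_i
d_bar <- mean(d)                    # mean difference
s_d   <- sd(d)                      # standard deviation of the differences

res <- t.test(d, conf.level = 0.95) # one-sample t procedure on d
ci  <- as.numeric(res$conf.int)

cat("d_bar =", d_bar, " s_d =", round(s_d, 3), "\n")
print(round(ci, 2))
```

With unrounded intermediate values the bounds are about -58.28 and 258.28; small gaps relative to a hand calculation come from rounding the t-value and \(s_d\).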

R Code Verification

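before <- c(500, 700, 400, 350, 300)
after <- c(600, 900, 400, 300, 550)
# Note: the function reports dbar = xbar - ybar (here: before - after), so its
# interval is the mirror image of an interval computed for after - before.
CI.diffmean(y = after, x = before, type = "paired", conf.level = 0.95, digits = 3)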
##               n xbar ybar dbar=xbar-ybar     s_D     se    Lower  Upper
## Normal.Approx 5  450  550           -100 127.475 57.009 -211.735 11.735
## Student-t     5  450  550           -100 127.475 57.009 -258.282 58.282

4.2 Case B2: Difference in Means, Independent Samples (\(\mu_x - \mu_y\))

Scenario: We have two completely separate, unrelated groups (e.g., male vs. female, treatment vs. control).

Assumption: We assume the variances of the two populations are equal (\(\sigma_x^2 = \sigma_y^2\)). This allows us to “pool” the sample variances to get a better estimate of the common population variance.

Formula: The \(100(1-\alpha)\%\) confidence interval for \(\mu_x - \mu_y\) is: \[ CI_{1-\alpha}(\mu_x - \mu_y) = (\bar{x} - \bar{y}) \pm t_{n_x+n_y-2, \alpha/2} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} \] Where \(s_p^2\) is the pooled sample variance: \[ s_p^2 = \frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2} \] The degrees of freedom for the t-distribution are \(df = n_x+n_y-2\).

Detailed Numerical Example: Executive Salaries

From your notes: Comparing executive salaries in the financial vs. utilities industries.

| Financial (x) | Utilities (y) |
|---|---|
| \(n_x = 10\) | \(n_y = 14\) |
| \(\bar{x} = 90\) | \(\bar{y} = 78\) |
| \(s_x = 4\) | \(s_y = 3\) |
| \(s_x^2 = 16\) | \(s_y^2 = 9\) |

We want a 98% confidence interval.

Step 1: Calculate the pooled variance \(s_p^2\). \[ s_p^2 = \frac{(10-1)(16) + (14-1)(9)}{10+14-2} = \frac{9 \cdot 16 + 13 \cdot 9}{22} = \frac{144 + 117}{22} = \frac{261}{22} \approx 11.864 \]

Step 2: Find the reliability factor. \(df = 10+14-2 = 22\). Confidence Level = 98% \(\implies \alpha = 0.02 \implies \alpha/2 = 0.01\). We need \(t_{22, 0.01}\).

qt(0.99, df = 22)
## [1] 2.508325

So, \(t_{22, 0.01} \approx 2.508\).

Step 3: Calculate ME and the Interval. \[ ME = 2.508 \cdot \sqrt{\frac{11.864}{10} + \frac{11.864}{14}} = 2.508 \cdot \sqrt{1.1864 + 0.8474} = 2.508 \cdot \sqrt{2.0338} \approx 2.508 \cdot 1.426 \approx 3.577 \] \[ CI_{98\%}(\mu_x - \mu_y) = (90 - 78) \pm 3.577 = 12 \pm 3.577 \] \[ CI_{98\%}(\mu_x - \mu_y) = [8.423, 15.577] \]

Conclusion: We are 98% confident that the true average salary for financial executives is between $8,423 and $15,577 higher than for utilities executives. Since the interval is entirely positive and does not contain 0, we have strong evidence that financial executives earn more on average.
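The pooled-variance interval can be recomputed from the summary statistics alone. The following base R sketch uses only the values in the table; the variable names are illustrative.

```r
# Pooled-variance t-interval for a difference in means (equal variances assumed).
n_x <- 10; xbar <- 90; s2_x <- 16    # financial executives
n_y <- 14; ybar <- 78; s2_y <- 9     # utilities executives
conf <- 0.98

df   <- n_x + n_y - 2
s2_p <- ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / df   # pooled variance
se   <- sqrt(s2_p / n_x + s2_p / n_y)                # standard error of the difference

t_crit <- qt(1 - (1 - conf) / 2, df = df)            # t_{22, 0.01}
me     <- t_crit * se
ci     <- c(lower = (xbar - ybar) - me, upper = (xbar - ybar) + me)

print(round(ci, 3))
```

Rounded to three decimals, the bounds are 8.423 and 15.577, in agreement with the hand calculation.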

R Code Verification

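# Simulate the data. The simulated sample statistics differ from the summary
# values above, so the intervals below do not reproduce the hand calculation.
set.seed(103)
financial_salaries <- rnorm(10, 90, 4)
utilities_salaries <- rnorm(14, 78, 3)

# Apply the function. Two tables are printed; they differ only in the standard
# error: 1.353 matches the pooled (equal-variance) formula used above, and
# 1.408 matches the unequal-variance formula sqrt(s_X^2/n_x + s_Y^2/n_y).
CI.diffmean(financial_salaries, utilities_salaries, type = "independent", conf.level = 0.98, digits = 3)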
##               n_x n_y   xbar   ybar xbar-ybar   s_X   s_Y    se Lower  Upper
## Normal.Approx  10  14 88.597 78.268     10.33 3.704 2.926 1.353 7.183 13.476
## Student-t      10  14 88.597 78.268     10.33 3.704 2.926 1.353 6.937 13.722
##               n_x n_y   xbar   ybar xbar-ybar   s_X   s_Y    se Lower  Upper
## Normal.Approx  10  14 88.597 78.268     10.33 3.704 2.926 1.408 7.053 13.606
## Student-t      10  14 88.597 78.268     10.33 3.704 2.926 1.408 6.704 13.955

Part II: Hypothesis Testing - Making Decisions from Data


Chapter 5: The Framework of Hypothesis Testing

We now move from estimating parameters to making decisions about them. This is the goal of Hypothesis Testing.

5.1 The “Courtroom Trial” Analogy

The logic of hypothesis testing is very similar to a criminal trial.
  • The Accused is Presumed Innocent: In statistics, we have a Null Hypothesis (\(H_0\)), which represents the “status quo” or a claim of “no effect.” We presume \(H_0\) is true until the evidence convinces us otherwise.
  • The Prosecution Presents Evidence: We collect sample data, which is our evidence.
  • The Standard is “Beyond a Reasonable Doubt”: We don’t need absolute proof, but the evidence must be strong enough to reject the presumption of innocence. In statistics, this standard is our significance level (\(\alpha\)).
  • The Verdict: We either Reject the Null Hypothesis (finding the person guilty) or Fail to Reject the Null Hypothesis (finding the person not guilty). Notice we never “accept” innocence, we just conclude there wasn’t enough evidence to convict.

5.2 Defining the Hypotheses

Every test involves a conflict between two opposing hypotheses:
  • Null Hypothesis (\(H_0\)): The statement we are trying to find evidence against. It always contains a statement of equality (=, ≤, or ≥).
    • Example: The new engine’s average emission is the same as the old one (\(\mu = 130\)).
  • Alternative Hypothesis (\(H_1\) or \(H_A\)): The research hypothesis; what we are trying to prove. It never contains a statement of equality (≠, <, or >).
    • Example: The new engine’s average emission is greater than the old one (\(\mu > 130\)).

The test can be:
  • Two-tailed: \(H_1: \mu \neq \mu_0\) (Is it different?)
  • One-tailed (right): \(H_1: \mu > \mu_0\) (Is it greater?)
  • One-tailed (left): \(H_1: \mu < \mu_0\) (Is it less?)

5.3 The Test Statistic

The Test Statistic is a number calculated from our sample data that measures how far our sample estimate is from the value claimed by the null hypothesis. It’s usually measured in terms of standard errors. \[ \text{Test Statistic} = \frac{\text{Sample Estimate} - \text{Null Hypothesis Value}}{\text{Standard Error of the Estimate}} \]

5.4 The Logic of Rejection

We conduct the test assuming \(H_0\) is true. We then look at our calculated test statistic. We ask: “If the null hypothesis were true, how likely is it that we would get a sample result this extreme just by random chance?”

If this probability (the p-value) is very small, we conclude that our initial assumption (that \(H_0\) is true) was probably wrong. Our sample result is too surprising to be just random luck. Therefore, we reject \(H_0\) in favor of \(H_1\).

Chapter 6: Errors, Significance, and Decision Rules

6.1 Two Types of Errors

When we make a decision, there are four possible outcomes:

| | Truth: \(H_0\) is True | Truth: \(H_0\) is False |
|---|---|---|
| Decision: Fail to Reject \(H_0\) | Correct Decision (Prob = \(1-\alpha\)) | Type II Error (Prob = \(\beta\)) |
| Decision: Reject \(H_0\) | Type I Error (Prob = \(\alpha\)) | Correct Decision (Prob = \(1-\beta\)) |

  • Type I Error: Rejecting a true null hypothesis. The probability of this is \(\alpha\), the significance level of the test. We control this directly by setting \(\alpha\) (e.g., at 0.05).
  • Type II Error: Failing to reject a false null hypothesis. The probability is \(\beta\).
  • Power of the Test: The probability of correctly rejecting a false null hypothesis, which is \(1-\beta\).
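The two error probabilities can be estimated by simulation. The sketch below uses a right-tailed Z-test with \(\mu_0 = 130\), \(\sigma = 10\), and \(n = 12\); the true mean of 135 under the alternative and the number of repetitions are assumptions chosen purely for illustration.

```r
# Estimate the Type I error rate and the power of a right-tailed z-test.
set.seed(3)
mu0    <- 130    # value claimed by H0
sigma  <- 10     # known standard deviation
n      <- 12     # sample size
alpha  <- 0.05   # significance level
mu_alt <- 135    # assumed true mean when H0 is false (illustrative)
reps   <- 10000  # simulated samples per scenario

z_crit <- qnorm(1 - alpha)   # boundary of the rejection region

# Share of simulated samples that lead to rejecting H0 when the true mean is true_mu
reject_rate <- function(true_mu) {
  z <- replicate(reps, {
    x <- rnorm(n, mean = true_mu, sd = sigma)
    (mean(x) - mu0) / (sigma / sqrt(n))
  })
  mean(z > z_crit)
}

type1_rate  <- reject_rate(mu0)      # H0 true: rejections are Type I errors
power_sim   <- reject_rate(mu_alt)   # H0 false: rejections are correct
power_exact <- 1 - pnorm(z_crit - (mu_alt - mu0) / (sigma / sqrt(n)))

cat("Type I rate:", type1_rate, " Power (sim):", power_sim,
    " Power (exact):", round(power_exact, 4), "\n")
```

The simulated Type I rate should sit near \(\alpha\), and the simulated power should sit near the exact value; \(\beta\) is one minus the power.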

6.2 The Decision Rule: Two Approaches

1. The Critical Value Approach

  1. Choose a significance level, \(\alpha\) (e.g., 0.05).
  2. This \(\alpha\) defines a Rejection Region in the tails of the sampling distribution.
  3. The Critical Value is the boundary of this region.
  4. Calculate your Test Statistic from the sample.
  5. Decision: If the Test Statistic falls into the Rejection Region, you reject \(H_0\).
A right-tailed test. The rejection region (red) contains α=5% of the area. The critical value is the boundary.

2. The p-value Approach (More Common)

  1. Calculate your Test Statistic from the sample.
  2. Calculate the p-value: the probability of getting a test statistic as extreme or more extreme than yours, assuming \(H_0\) is true.
  3. Decision: Compare the p-value to your chosen \(\alpha\).
    • If p-value < \(\alpha\), Reject \(H_0\). (The result is “statistically significant”).
    • If p-value \(\ge \alpha\), Fail to Reject \(H_0\). (The result is not statistically significant).

The p-value approach is generally preferred because it tells you how strong the evidence is, not just whether it crossed a threshold.
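The two approaches always lead to the same decision, which a small sketch can make concrete. The test statistics below are hypothetical values for a right-tailed Z-test at \(\alpha = 0.05\).

```r
# Critical-value and p-value decisions for a right-tailed z-test.
alpha   <- 0.05
z_crit  <- qnorm(1 - alpha)                 # critical value, about 1.645
z_stats <- c(0.5, 1.2, 1.6, 1.7, 2.3)       # hypothetical test statistics

p_values <- pnorm(z_stats, lower.tail = FALSE)   # P(Z >= z_stat) under H0

decisions <- data.frame(
  z_stat             = z_stats,
  p_value            = round(p_values, 4),
  reject_by_critical = z_stats > z_crit,    # statistic in the rejection region?
  reject_by_p        = p_values < alpha     # p-value below alpha?
)
print(decisions)
```

The last two columns agree row by row, but the p-value column also shows how far each result is from the threshold.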

Chapter 7: Hypothesis Tests in Practice

7.1 Test for a Single Mean (\(\mu\))

Detailed Numerical Example: CO2 Emissions

Let’s use the full example from your notes. A car manufacturer wants to test if a new engine has increased CO2 emissions.
  • Past data: \(\mu_0 = 130\) g/km.
  • We know emissions are normal and the variance is \(\sigma^2 = 100\) (so \(\sigma = 10\)).
  • We take a sample of \(n=12\) new cars and find their average emission is \(\bar{x} = 135\) g/km.
  • We will test at a significance level of \(\alpha = 0.05\).

Step 1: State the Hypotheses. We want to know if emissions have increased. This is a right-tailed test. \(H_0: \mu \le 130\) (or \(\mu = 130\)) \(H_1: \mu > 130\) (This is our research claim)

Step 2: Calculate the Test Statistic. Since \(\sigma\) is known, we use the Z-statistic. \[ Z_{stat} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{135 - 130}{10/\sqrt{12}} = \frac{5}{10/3.464} = \frac{5}{2.887} \approx 1.732 \]

Step 3: Make a Decision (Critical Value Approach). For a right-tailed test with \(\alpha = 0.05\), the critical value is \(z_{0.05}\), which is the Z-value with 95% of the area to its left.

z_crit <- qnorm(0.95)
cat("The critical Z-value for α=0.05 (right-tailed) is:", z_crit, "\n")
## The critical Z-value for α=0.05 (right-tailed) is: 1.644854

The critical value is 1.645. Our test statistic is \(Z_{stat} = 1.732\). Since \(1.732 > 1.645\), our test statistic falls in the rejection region. Decision: We reject the null hypothesis.

Step 4: Make a Decision (p-value Approach). The p-value is the probability of getting a Z-statistic of 1.732 or greater. p-value = \(P(Z \ge 1.732)\).

p_val <- 1 - pnorm(1.732)
cat("The p-value is:", p_val, "\n")
## The p-value is: 0.04163678

The p-value is 0.0416. Our significance level is \(\alpha = 0.05\). Since \(0.0416 < 0.05\) (p-value < \(\alpha\)), we reject the null hypothesis.

Conclusion (for both approaches): There is statistically significant evidence at the 5% level to conclude that the average CO2 emissions of the new engine have increased above 130 g/km.
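Both decision approaches can be reproduced from the summary statistics in base R. This sketch uses only the values from the example; the variable names are illustrative.

```r
# Right-tailed z-test for a mean with known sigma, from summary statistics.
mu0   <- 130     # value under H0 (g/km)
sigma <- 10      # known standard deviation
n     <- 12      # sample size
xbar  <- 135     # observed sample mean
alpha <- 0.05    # significance level

z_stat  <- (xbar - mu0) / (sigma / sqrt(n))     # test statistic
z_crit  <- qnorm(1 - alpha)                     # critical value
p_value <- pnorm(z_stat, lower.tail = FALSE)    # P(Z >= z_stat)
reject  <- p_value < alpha

cat("z =", round(z_stat, 3), " critical =", round(z_crit, 3),
    " p-value =", round(p_value, 4), " reject H0:", reject, "\n")
```

The statistic is about 1.732, above the critical value of about 1.645, and the p-value is about 0.0416, so both routes reject \(H_0\).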

R Code Verification

Let’s use the TEST.mean function from your class scripts.

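# We can simulate the data. The simulated sample mean (138.4) differs from the
# 135 used above, so the statistic (2.91) and p-value (0.002) below do not
# reproduce the hand calculation; they illustrate the function call.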
set.seed(104)
co2_sample <- rnorm(12, 135, 10)

# Apply the function
TEST.mean(co2_sample, sigma = 10, mu0 = 130, alternative = "greater")
##   n  xbar sigma_X   SE stat p-value
##  12 138.4      10 2.89 2.91   0.002

7.2 Other Common Hypothesis Tests (R Examples)

The logic for all other tests is identical; only the test statistic and its distribution change.

Test for a Single Proportion

Question: Is there evidence that the proportion of family-related movies is different from 10%? (Two-tailed test, \(\alpha=0.05\)) \(H_0: p = 0.10\) \(H_1: p \neq 0.10\)

TEST.prop(movies$plot_topic_family, success = "1", p0 = 0.10, alternative = "two.sided")
##     n phat s_X   se stat p-value
##  2868  0.1 0.3 0.01 0.39     0.7

Conclusion: The p-value (0.7) is greater than 0.05, so we fail to reject \(H_0\). There is no significant evidence that the true proportion of family movies is different from 10%.
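For readers without the movies dataset, the mechanics of a one-sample Z-test for a proportion can be sketched in base R. The helper `prop_z_test` and the counts (30 successes out of 200) are hypothetical, and the standard error is computed under \(H_0\) using \(p_0\), which is the usual convention but an assumption about how the class function works.

```r
# One-sample z-test for a proportion (hypothetical helper, illustrative data).
prop_z_test <- function(successes, n, p0,
                        alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  p_hat <- successes / n
  se0   <- sqrt(p0 * (1 - p0) / n)          # standard error assuming H0 is true
  stat  <- (p_hat - p0) / se0               # z test statistic
  p_value <- switch(alternative,
    two.sided = 2 * pnorm(abs(stat), lower.tail = FALSE),
    greater   = pnorm(stat, lower.tail = FALSE),
    less      = pnorm(stat)
  )
  list(p_hat = p_hat, stat = stat, p_value = p_value)
}

# Made-up example: 30 successes in 200 trials, testing H0: p = 0.10
res <- prop_z_test(successes = 30, n = 200, p0 = 0.10)
cat("phat =", res$p_hat, " z =", round(res$stat, 3),
    " p-value =", round(res$p_value, 4), "\n")
```

In this made-up case the statistic is about 2.357 and the two-sided p-value falls between 0.01 and 0.05, so \(H_0\) would be rejected at the 5% level but not at the 1% level.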

Test for Difference in Means (Paired Samples)

Question: Is there a significant difference between Metacritic and Rotten Tomatoes ratings for movies? (Two-tailed test, \(\alpha=0.05\)) \(H_0: \mu_D = 0\) \(H_1: \mu_D \neq 0\)

TEST.diffmean(movies$metascore_rating, movies$rotting_tomatoes_rating, 
              type = "paired", alternative = "two.sided")
##                  n  xbar  ybar dbar=xbar-ybar   s_D   se stat p-value
## Normal.Approx 2868 60.13 57.93            2.2 24.96 0.47 4.73 <0.0001
## Student-t     2868 60.13 57.93            2.2 24.96 0.47 4.73 <0.0001

Conclusion: The p-value is extremely small (< 0.0001), so we strongly reject \(H_0\). There is a highly significant difference between the average ratings of the two platforms.

Test for Difference in Means (Independent Samples)

Question: Is the average runtime of Action movies greater than that of Comedy movies? (Right-tailed test, \(\alpha=0.01\)) \(H_0: \mu_{Action} - \mu_{Comedy} \le 0\) \(H_1: \mu_{Action} - \mu_{Comedy} > 0\)

runtime_action <- movies$runtime_minutes[movies$main_genre == "Action"]
runtime_comedy <- movies$runtime_minutes[movies$main_genre == "Comedy"]

TEST.diffmean(runtime_action, runtime_comedy, type = "independent", alternative = "greater")
##                n_x n_y   xbar   ybar xbar-ybar   s_X   s_Y   se stat p-value
## Normal.Approx 1001 938 109.84 109.78      0.06 10.56 10.68 0.48 0.12    0.45
## Student-t     1001 938 109.84 109.78      0.06 10.56 10.68 0.48 0.12    0.45
##                n_x n_y   xbar   ybar xbar-ybar   s_X   s_Y   se stat p-value
## Normal.Approx 1001 938 109.84 109.78      0.06 10.56 10.68 0.48 0.12    0.45
## Student-t     1001 938 109.84 109.78      0.06 10.56 10.68 0.48 0.12    0.45

Conclusion: The p-value (0.45) is far greater than \(\alpha = 0.01\), so we fail to reject \(H_0\). There is no significant evidence that Action movies have a longer average runtime than Comedy movies; the observed difference is only 0.06 minutes.

Chapter 8: Final Summary

Today we have journeyed through the core of statistical inference.
  • We started with Confidence Intervals, our tool for estimating population parameters with a measure of our uncertainty. We learned to build them for means and proportions, for one and two populations, and in various scenarios of known or unknown variance.
  • We then moved to Hypothesis Testing, our formal procedure for making decisions about a population based on sample evidence. We learned the “courtroom” logic of the null hypothesis, the critical roles of Type I and Type II errors, and the two main decision approaches: critical values and p-values.

You now possess the foundational toolkit of a modern data analyst. You can move beyond simply describing data to using it to make informed, evidence-based decisions in the face of uncertainty.

🎓 End of Lecture 6 - Congratulations on mastering these critical concepts!

## 📋 Session Information:
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 20.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3;  LAPACK version 3.9.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] UBStats_0.2.2
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.52        
##  [5] cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1 rmarkdown_2.29   
##  [9] lifecycle_1.0.4   cli_3.6.5         sass_0.4.10       jquerylib_0.1.4  
## [13] compiler_4.5.1    rstudioapi_0.17.1 tools_4.5.1       evaluate_1.0.4   
## [17] bslib_0.9.0       yaml_2.3.10       rlang_1.1.6       jsonlite_2.0.0
