In our last lecture, we established that the sample mean, \(\bar{X}\), is an excellent point estimator for the population mean, \(\mu\). It’s unbiased and becomes more precise as our sample size grows.
So, if we take a sample of 25 commuters and find their average travel distance is \(\bar{x} = 34.5\) km, our best single guess for the true average distance of all commuters is 34.5 km.
But we must ask ourselves: what is the probability that our sample mean \(\bar{x}\) is exactly equal to the true population mean \(\mu\)? Since these are continuous variables, that probability is zero! Our point estimate is almost certainly wrong. What we really want to know is: how wrong is it? This is the problem of inferential error.
Think of the true population parameter, \(\mu\), as a single fish swimming in a large lake.

* A point estimate (\(\bar{x}\)) is like trying to catch the fish with a spear. You can be very skilled, but the chances of hitting the fish exactly are incredibly small. You will almost always miss.
* An interval estimate is like using a fishing net. Instead of aiming for an exact point, we cast a net over a range of values. We can’t say exactly where the fish is within the net, but we can be very confident that we’ve caught it.
A Confidence Interval is our statistical fishing net. It’s a range of values, calculated from our sample data, that is likely to contain the true, unknown population parameter.
A \(100(1-\alpha)\%\) confidence interval for a parameter \(\theta\) is an interval \((a, b)\) calculated from a sample. The key property is that, if we were to repeat our sampling process many times, \(100(1-\alpha)\%\) of the intervals we construct would contain the true parameter \(\theta\).
The Frequentist Interpretation (Crucial!): A 95% confidence interval does not mean there is a 95% probability that the true parameter \(\mu\) is in our specific, calculated interval. The parameter \(\mu\) is a fixed, unknown constant. It’s either in our interval or it’s not. The 95% refers to the reliability of the procedure. It means that 95% of all possible intervals we could have constructed from all possible samples of that size will capture the true mean.
Frequentist Interpretation: Over many samples, 95% of the calculated confidence intervals (black) successfully capture the true population mean μ (blue line). 5% of them (red) will miss.
This is the foundational case, though rare in practice.
Assumptions:

1. The population is normally distributed, \(X \sim \mathcal{N}(\mu, \sigma^2)\).
2. The population variance \(\sigma^2\) is known.
Derivation: Because the population is assumed normal, the sampling distribution of the mean is exactly \(\bar{X} \sim \mathcal{N}(\mu, \sigma^2/n)\) (no appeal to the Central Limit Theorem is needed here). If we standardize this, we get: \[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim \mathcal{N}(0, 1) \] For a standard normal distribution, we can find two symmetric values, \(-z_{\alpha/2}\) and \(+z_{\alpha/2}\), that contain \((1-\alpha)\) of the probability. \[ P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha \] Substituting our Z formula: \[ P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha \] Now, we rearrange the inequality to isolate \(\mu\) in the middle: \[ P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \] This gives us our formula.
Formula: The \(100(1-\alpha)\%\) confidence interval for \(\mu\) is: \[ CI_{1-\alpha}(\mu) = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \] Where:

* \(\bar{x}\) is the observed sample mean (the point estimate).
* \(z_{\alpha/2}\) is the reliability factor: the Z-value that leaves \(\alpha/2\) probability in the upper tail.
* \(\frac{\sigma}{\sqrt{n}}\) is the standard error of the mean.
* \(z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\) is the margin of error (ME).
Let’s use the example from your notes. We want to estimate the average commuter distance.

* Population is Normal: \(X \sim \mathcal{N}(\mu, \sigma^2=100)\). So, \(\sigma=10\).
* Sample size \(n = 25\).
* Observed sample mean \(\bar{x} = 34.5\) km.
* We want a 95% confidence interval.
Step 1: Determine the confidence and significance levels. Confidence Level = \(1-\alpha = 0.95\). Significance Level = \(\alpha = 0.05\). Area in each tail = \(\alpha/2 = 0.025\).
Step 2: Find the reliability factor, \(z_{\alpha/2}\). We need the Z-value that leaves an area of \(0.025\) in the upper tail. This is the same as the Z-value with a cumulative probability of \(1 - 0.025 = 0.975\). We look this up in a Z-table or use R.
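In R, this value can be obtained with qnorm(); a minimal sketch of the call that presumably produced the output below:

# Z-value with 0.025 in the upper tail, i.e. cumulative probability 0.975
cat("The reliability factor z_0.025 is:", qnorm(0.975))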
## The reliability factor z_0.025 is: 1.959964
So, \(z_{0.025} \approx 1.96\).
Step 3: Calculate the Margin of Error (ME). \[ ME = z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 1.96 \cdot \frac{10}{\sqrt{25}} = 1.96 \cdot \frac{10}{5} = 1.96 \cdot 2 = 3.92 \]
Step 4: Construct the Interval. \[ CI_{95\%}(\mu) = \bar{x} \pm ME = 34.5 \pm 3.92 \] \[ CI_{95\%}(\mu) = [30.58, 38.42] \]
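As a quick check, the whole interval can be reproduced in R from the summary values alone (a minimal sketch using base functions, not the class scripts):

# Known-variance (Z) interval from the summary statistics
xbar <- 34.5; sigma <- 10; n <- 25
me <- qnorm(0.975) * sigma / sqrt(n)   # margin of error, approx 3.92
xbar + c(-1, 1) * me                   # approx [30.58, 38.42]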
Conclusion: We are 95% confident that the true average commuting distance for the entire population is between 30.58 km and 38.42 km.
Let’s use the CI.mean function from your class
scripts.
# The function needs the data vector. We can simulate one with the given properties.
set.seed(101)
commuter_sample <- rnorm(25, mean = 34.5, sd = 10)
# Now apply the function, specifying the KNOWN sigma.
CI.mean(commuter_sample, sigma = 10, conf.level = 0.95, digits = 3)
## n xbar sigma_X SE Lower Upper
## 25 33.544 10 2 29.624 37.464
This is the most common practical scenario for small samples.
The Problem: We can’t use the Z-formula because \(\sigma\) is unknown. We must estimate it using the sample standard deviation, \(s\). But when we substitute \(s\) for \(\sigma\), the distribution \(\frac{\bar{X} - \mu}{S/\sqrt{n}}\) is no longer Normal. It follows a Student’s t-distribution.
The Student’s t-distribution:

* It is bell-shaped and symmetric like the Normal distribution.
* It has “fatter” or “heavier” tails, accounting for the extra uncertainty of using \(s\) instead of \(\sigma\).
* Its shape depends on a single parameter: the degrees of freedom (df), which for this case is \(df = n-1\).
* As \(df \to \infty\) (i.e., as the sample size increases), the t-distribution converges to the standard normal distribution.
Formula: The \(100(1-\alpha)\%\) confidence interval for \(\mu\) is: \[ CI_{1-\alpha}(\mu) = \bar{x} \pm t_{n-1, \alpha/2} \frac{s}{\sqrt{n}} \] The only change is that we use a t-value instead of a Z-value as our reliability factor.
From your notes: A company analyzes call center response times.

* Population is Normal.
* Sample size \(n = 10\).
* Observed sample mean \(\bar{x} = 101\) minutes.
* Observed sample standard deviation \(s = 32.7\) minutes.
* We want a 90% confidence interval.
Step 1: Determine levels and degrees of freedom. Confidence Level = \(1-\alpha = 0.90\). Significance Level = \(\alpha = 0.10\). Area in each tail = \(\alpha/2 = 0.05\). Degrees of Freedom = \(df = n-1 = 10-1 = 9\).
Step 2: Find the reliability factor, \(t_{n-1, \alpha/2}\). We need the t-value from a distribution with 9 degrees of freedom that leaves an area of \(0.05\) in the upper tail (cumulative probability of \(0.95\)).
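A minimal sketch of the qt() call that presumably produced the output below:

# t-value with 9 degrees of freedom and 0.05 in the upper tail
cat("The reliability factor t_(9, 0.05) is:", qt(0.95, df = 9))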
## The reliability factor t_(9, 0.05) is: 1.833113
So, \(t_{9, 0.05} \approx 1.833\).
Step 3: Calculate the Margin of Error (ME). \[ ME = t_{n-1, \alpha/2} \frac{s}{\sqrt{n}} = 1.833 \cdot \frac{32.7}{\sqrt{10}} \approx 1.833 \cdot \frac{32.7}{3.162} \approx 1.833 \cdot 10.34 \approx 18.96 \]
Step 4: Construct the Interval. \[ CI_{90\%}(\mu) = \bar{x} \pm ME = 101 \pm 18.96 \] \[ CI_{90\%}(\mu) = [82.04, 119.96] \]
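As a quick check, the interval can be reproduced in R from the summary values (a minimal sketch using base functions):

# Unknown-variance (t) interval from the summary statistics
xbar <- 101; s <- 32.7; n <- 10
me <- qt(0.95, df = n - 1) * s / sqrt(n)   # margin of error, approx 18.96
xbar + c(-1, 1) * me                       # approx [82.04, 119.96]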
Conclusion: We are 90% confident that the true average response time is between 82.04 and 119.96 minutes.
# Simulate a sample with the given properties
set.seed(102)
call_center_sample <- rnorm(10, mean = 101, sd = 32.7)
# Apply the function (this is the default case, with unknown variance)
CI.mean(call_center_sample, conf.level = 0.90, digits = 3)
## n xbar s_X se Lower Upper
## Normal.Approx 10 125.974 31.493 9.959 109.593 142.355
## Student-t 10 125.974 31.493 9.959 107.718 144.23
What if the population is not normal? The Central Limit Theorem comes to our rescue! If the sample size is large (typically \(n > 30\)), the sampling distribution of \(\bar{X}\) is approximately normal, even if the population isn’t.
Furthermore, for large \(n\), the t-distribution is nearly identical to the Z-distribution. By convention, we use the Z-distribution for large samples.
Formula: The \(100(1-\alpha)\%\) confidence interval for \(\mu\) is: \[ CI_{1-\alpha}(\mu) \approx \bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}} \] This is identical to the known variance case, but we substitute the sample standard deviation \(s\) for \(\sigma\).
Let’s use the movies dataset to compute a 99% confidence
interval for the average opening weekend revenue. The sample size is
very large (n=2868), so this case applies.
# The CI.mean function handles this automatically.
# For large n, the "Normal.Approx" and "Student-t" rows are nearly identical.
CI.mean(movies$opening, conf.level = 0.99, digits = 3)
## n xbar s_X se Lower Upper
## Normal.Approx 2868 21.959 18.989 0.355 21.046 22.872
## Student-t 2868 21.959 18.989 0.355 21.045 22.873
Conclusion: We are 99% confident that the true average opening weekend revenue for all movies of this type is between $21.05 and $22.87 million.
Now we shift from means to proportions. We want to estimate the proportion of a population that has a certain characteristic (e.g., the proportion of customers who will subscribe to a new plan).
Assumptions:

1. The data comes from a Bernoulli population (two outcomes: success/failure).
2. The sample size is large enough for the normal approximation to the binomial to be valid. The rule of thumb is \(n\hat{p}(1-\hat{p}) > 5\).
Derivation: The point estimator for the population proportion \(p\) is the sample proportion \(\hat{p}\). From our last lecture, we know the sampling distribution of \(\hat{P}\) is approximately normal for large \(n\): \[ \hat{P} \approx \mathcal{N}\left(p, \frac{p(1-p)}{n}\right) \] Standardizing this gives: \[ Z = \frac{\hat{P} - p}{\sqrt{p(1-p)/n}} \approx \mathcal{N}(0, 1) \] Since the true \(p\) is unknown in the standard error term, we substitute its estimate \(\hat{p}\). The rest of the derivation follows the same logic as for the mean.
Formula: The \(100(1-\alpha)\%\) confidence interval for \(p\) is: \[ CI_{1-\alpha}(p) = \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
From your notes: A company wants to estimate the proportion of customers who will switch to a new mobile plan.

* Sample size \(n = 100\).
* Observed sample proportion \(\hat{p} = 0.25\) (25% of the sample would switch).
* We want a 99% confidence interval.
Step 1: Check the large sample condition. \(n\hat{p}(1-\hat{p}) = 100 \cdot 0.25 \cdot (0.75) = 18.75\). Since \(18.75 > 5\), the normal approximation is valid.
Step 2: Find the reliability factor, \(z_{\alpha/2}\). Confidence Level = \(1-\alpha = 0.99 \implies \alpha = 0.01 \implies \alpha/2 = 0.005\). We need the Z-value for a cumulative probability of \(1 - 0.005 = 0.995\).
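A minimal sketch of the qnorm() call that presumably produced the output below:

# Z-value with 0.005 in the upper tail, i.e. cumulative probability 0.995
cat("The reliability factor z_0.005 is:", qnorm(0.995))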
## The reliability factor z_0.005 is: 2.575829
So, \(z_{0.005} \approx 2.576\).
Step 3: Calculate the Margin of Error (ME). \[ ME = z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2.576 \cdot \sqrt{\frac{0.25(0.75)}{100}} = 2.576 \cdot \sqrt{0.001875} \approx 2.576 \cdot 0.0433 \approx 0.1115 \]
Step 4: Construct the Interval. \[ CI_{99\%}(p) = \hat{p} \pm ME = 0.25 \pm 0.1115 \] \[ CI_{99\%}(p) = [0.1385, 0.3615] \]
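As a quick check, the interval can be reproduced in R from the summary values (a minimal sketch using base functions):

# Proportion interval from the summary statistics
phat <- 0.25; n <- 100
me <- qnorm(0.995) * sqrt(phat * (1 - phat) / n)   # margin of error, approx 0.1115
phat + c(-1, 1) * me                               # approx [0.1385, 0.3615]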
Conclusion: We are 99% confident that the true proportion of all customers who would switch to the new plan is between 13.85% and 36.15%.
Let’s use the CI.prop function from your class
scripts.
# We can simulate the data: 25 successes in 100 trials
mobile_sample <- c(rep("Yes", 25), rep("No", 75))
# Apply the function
CI.prop(mobile_sample, success = "Yes", conf.level = 0.99, digits = 4)
## n phat s_X se Lower Upper
## 100 0.25 0.433 0.0433 0.1385 0.3615
Often, we want to compare two groups. For example, is a new drug more effective than an old one? Do male customers spend more than female customers?
Scenario: We have paired data. Each observation in one sample is naturally linked to an observation in the other.

* Before-and-After: The same subject is measured before and after a treatment.
* Matched Pairs: Two different subjects are matched based on similar characteristics (e.g., age, gender), and one is assigned to each group.
The Strategy: We simplify the problem by creating a single new variable: the difference, \(d_i = x_i - y_i\). Now, we have a one-sample problem for the mean of the differences, \(\mu_D = \mu_x - \mu_y\). We can simply apply the one-sample t-interval formula to these differences.
Formula: The \(100(1-\alpha)\%\) confidence interval for \(\mu_D\) is: \[ CI_{1-\alpha}(\mu_D) = \bar{d} \pm t_{n-1, \alpha/2} \frac{s_d}{\sqrt{n}} \] Where \(\bar{d}\) is the mean of the sample differences and \(s_d\) is the standard deviation of the sample differences.
From your notes: A travel agency measures spending propensity for 5 customers before and after they watch a promotional video.
| Customer | Before (\(x_i\)) | After (\(y_i\)) | Difference (\(d_i = y_i - x_i\)) |
|---|---|---|---|
| 1 | 500 | 600 | 100 |
| 2 | 700 | 900 | 200 |
| 3 | 400 | 400 | 0 |
| 4 | 350 | 300 | -50 |
| 5 | 300 | 550 | 250 |
Step 1: Calculate \(\bar{d}\) and \(s_d\). \[ \bar{d} = \frac{100 + 200 + 0 - 50 + 250}{5} = \frac{500}{5} = 100 \] To find \(s_d\), we first find the variance: \[ s_d^2 = \frac{\sum(d_i - \bar{d})^2}{n-1} \] \[ s_d^2 = \frac{(100-100)^2 + (200-100)^2 + (0-100)^2 + (-50-100)^2 + (250-100)^2}{5-1} \] \[ s_d^2 = \frac{0^2 + 100^2 + (-100)^2 + (-150)^2 + 150^2}{4} = \frac{0 + 10000 + 10000 + 22500 + 22500}{4} = \frac{65000}{4} = 16250 \] \[ s_d = \sqrt{16250} \approx 127.48 \]
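These two quantities are easy to verify in R (a minimal sketch using base functions):

# Differences d_i = after - before for the five customers
d <- c(100, 200, 0, -50, 250)
mean(d)   # 100
sd(d)     # approx 127.48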
Step 2: Find the reliability factor for a 95% CI. \(df = n-1 = 4\). We need \(t_{4, 0.025}\).
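A minimal sketch of the qt() call that presumably produced the output below:

qt(0.975, df = 4)   # 95% CI, so 0.025 in the upper tail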
## [1] 2.776445
So, \(t_{4, 0.025} \approx 2.776\).
Step 3: Calculate ME and the Interval. \[ ME = 2.776 \cdot \frac{127.48}{\sqrt{5}} \approx 158.25 \] \[ CI_{95\%}(\mu_D) = 100 \pm 158.25 = [-58.25, 258.25] \]
Conclusion: We are 95% confident that the true average change in spending propensity is between -€58.25 and +€258.25. Since this interval contains 0, we do not have strong evidence that the video has any effect on average spending propensity.
before <- c(500, 700, 400, 350, 300)
after <- c(600, 900, 400, 300, 550)
CI.diffmean(y = after, x = before, type = "paired", conf.level = 0.95, digits = 3)
## n xbar ybar dbar=xbar-ybar s_D se Lower Upper
## Normal.Approx 5 450 550 -100 127.475 57.009 -211.735 11.735
## Student-t 5 450 550 -100 127.475 57.009 -258.282 58.282
Scenario: We have two completely separate, unrelated groups (e.g., male vs. female, treatment vs. control).
Assumption: We assume the variances of the two populations are equal (\(\sigma_x^2 = \sigma_y^2\)). This allows us to “pool” the sample variances to get a better estimate of the common population variance.
Formula: The \(100(1-\alpha)\%\) confidence interval for \(\mu_x - \mu_y\) is: \[ CI_{1-\alpha}(\mu_x - \mu_y) = (\bar{x} - \bar{y}) \pm t_{n_x+n_y-2, \alpha/2} \sqrt{\frac{s_p^2}{n_x} + \frac{s_p^2}{n_y}} \] Where \(s_p^2\) is the pooled sample variance: \[ s_p^2 = \frac{(n_x-1)s_x^2 + (n_y-1)s_y^2}{n_x+n_y-2} \] The degrees of freedom for the t-distribution are \(df = n_x+n_y-2\).
From your notes: Comparing executive salaries in the financial vs. utilities industries.
| Financial (x) | Utilities (y) |
|---|---|
| \(n_x = 10\) | \(n_y = 14\) |
| \(\bar{x} = 90\) | \(\bar{y} = 78\) |
| \(s_x = 4\) | \(s_y = 3\) |
| \(s_x^2 = 16\) | \(s_y^2 = 9\) |
We want a 98% confidence interval.
Step 1: Calculate the pooled variance \(s_p^2\). \[ s_p^2 = \frac{(10-1)(16) + (14-1)(9)}{10+14-2} = \frac{9 \cdot 16 + 13 \cdot 9}{22} = \frac{144 + 117}{22} = \frac{261}{22} \approx 11.864 \]
Step 2: Find the reliability factor. \(df = 10+14-2 = 22\). Confidence Level = 98% \(\implies \alpha = 0.02 \implies \alpha/2 = 0.01\). We need \(t_{22, 0.01}\).
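A minimal sketch of the qt() call that presumably produced the output below:

qt(0.99, df = 22)   # 98% CI, so 0.01 in the upper tail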
## [1] 2.508325
So, \(t_{22, 0.01} \approx 2.508\).
Step 3: Calculate ME and the Interval. \[ ME = 2.508 \cdot \sqrt{\frac{11.864}{10} + \frac{11.864}{14}} = 2.508 \cdot \sqrt{1.1864 + 0.8474} = 2.508 \cdot \sqrt{2.0338} \approx 2.508 \cdot 1.426 \approx 3.577 \] \[ CI_{98\%}(\mu_x - \mu_y) = (90 - 78) \pm 3.577 = 12 \pm 3.577 \] \[ CI_{98\%}(\mu_x - \mu_y) = [8.423, 15.577] \]
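As a quick check, the pooled variance and the interval can be reproduced in R from the summary statistics (a minimal sketch using base functions):

# Pooled-variance t interval from the summary statistics
nx <- 10; ny <- 14; xbar <- 90; ybar <- 78; s2x <- 16; s2y <- 9
sp2 <- ((nx - 1) * s2x + (ny - 1) * s2y) / (nx + ny - 2)      # approx 11.864
me  <- qt(0.99, df = nx + ny - 2) * sqrt(sp2 / nx + sp2 / ny) # approx 3.577
(xbar - ybar) + c(-1, 1) * me                                 # approx [8.42, 15.58]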
Conclusion: We are 98% confident that the true average salary for financial executives is between $8,423 and $15,577 higher than for utilities executives. Since the interval is entirely positive and does not contain 0, we have strong evidence that financial executives earn more on average.
# Simulate the data
set.seed(103)
financial_salaries <- rnorm(10, 90, 4)
utilities_salaries <- rnorm(14, 78, 3)
# Apply the function
CI.diffmean(financial_salaries, utilities_salaries, type = "independent", conf.level = 0.98, digits = 3)
## n_x n_y xbar ybar xbar-ybar s_X s_Y se Lower Upper
## Normal.Approx 10 14 88.597 78.268 10.33 3.704 2.926 1.353 7.183 13.476
## Student-t 10 14 88.597 78.268 10.33 3.704 2.926 1.353 6.937 13.722
## n_x n_y xbar ybar xbar-ybar s_X s_Y se Lower Upper
## Normal.Approx 10 14 88.597 78.268 10.33 3.704 2.926 1.408 7.053 13.606
## Student-t 10 14 88.597 78.268 10.33 3.704 2.926 1.408 6.704 13.955
We now move from estimating parameters to making decisions about them. This is the goal of Hypothesis Testing.
The logic of hypothesis testing is very similar to a criminal trial.

* The Accused is Presumed Innocent: In statistics, we have a Null Hypothesis (\(H_0\)), which represents the “status quo” or a claim of “no effect.” We presume \(H_0\) is true until the evidence convinces us otherwise.
* The Prosecution Presents Evidence: We collect sample data, which is our evidence.
* The Standard is “Beyond a Reasonable Doubt”: We don’t need absolute proof, but the evidence must be strong enough to reject the presumption of innocence. In statistics, this standard is our significance level (\(\alpha\)).
* The Verdict: We either Reject the Null Hypothesis (finding the person guilty) or Fail to Reject the Null Hypothesis (finding the person not guilty). Notice that we never “accept” innocence; we just conclude there wasn’t enough evidence to convict.
Every test involves a conflict between two opposing hypotheses:

* Null Hypothesis (\(H_0\)): The statement we are trying to find evidence against. It always contains a statement of equality (=, ≤, or ≥).
  * Example: The new engine’s average emission is the same as the old one (\(\mu = 130\)).
* Alternative Hypothesis (\(H_1\) or \(H_A\)): The research hypothesis; what we are trying to prove. It never contains a statement of equality (≠, <, or >).
  * Example: The new engine’s average emission is greater than the old one (\(\mu > 130\)).
The test can be:

* Two-tailed: \(H_1: \mu \neq \mu_0\) (Is it different?)
* One-tailed (right): \(H_1: \mu > \mu_0\) (Is it greater?)
* One-tailed (left): \(H_1: \mu < \mu_0\) (Is it less?)
The Test Statistic is a number calculated from our sample data that measures how far our sample estimate is from the value claimed by the null hypothesis. It’s usually measured in terms of standard errors. \[ \text{Test Statistic} = \frac{\text{Sample Estimate} - \text{Null Hypothesis Value}}{\text{Standard Error of the Estimate}} \]
We conduct the test assuming \(H_0\) is true. We then look at our calculated test statistic. We ask: “If the null hypothesis were true, how likely is it that we would get a sample result this extreme just by random chance?”
If this probability (the p-value) is very small, we conclude that our initial assumption (that \(H_0\) is true) was probably wrong. Our sample result is too surprising to be just random luck. Therefore, we reject \(H_0\) in favor of \(H_1\).
When we make a decision, there are four possible outcomes:
| | Truth: \(H_0\) is True | Truth: \(H_0\) is False |
|---|---|---|
| Decision: Fail to Reject \(H_0\) | Correct Decision (Prob = \(1-\alpha\)) | Type II Error (Prob = \(\beta\)) |
| Decision: Reject \(H_0\) | Type I Error (Prob = \(\alpha\)) | Correct Decision (Prob = \(1-\beta\)) |
A right-tailed test. The rejection region (red) contains α=5% of the area. The critical value is the boundary.
The p-value approach is generally preferred because it tells you how strong the evidence is, not just whether it crossed a threshold.
Let’s use the full example from your notes. A car manufacturer wants to test if a new engine has increased CO2 emissions.

* Past data: \(\mu_0 = 130\) g/km.
* We know emissions are normal and the variance is \(\sigma^2 = 100\) (so \(\sigma = 10\)).
* We take a sample of \(n=12\) new cars and find their average emission is \(\bar{x} = 135\) g/km.
* We will test at a significance level of \(\alpha = 0.05\).
Step 1: State the Hypotheses. We want to know if emissions have increased. This is a right-tailed test. \(H_0: \mu \le 130\) (or \(\mu = 130\)) \(H_1: \mu > 130\) (This is our research claim)
Step 2: Calculate the Test Statistic. Since \(\sigma\) is known, we use the Z-statistic. \[ Z_{stat} = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{135 - 130}{10/\sqrt{12}} = \frac{5}{10/3.464} = \frac{5}{2.887} \approx 1.732 \]
Step 3: Make a Decision (Critical Value Approach). For a right-tailed test with \(\alpha = 0.05\), the critical value is \(z_{0.05}\), which is the Z-value with 95% of the area to its left.
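A minimal sketch of the qnorm() call that presumably produced the output below:

# Critical value: Z with 95% of the area to its left
cat("The critical Z-value for α=0.05 (right-tailed) is:", qnorm(0.95))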
## The critical Z-value for α=0.05 (right-tailed) is: 1.644854
The critical value is 1.645. Our test statistic is \(Z_{stat} = 1.732\). Since \(1.732 > 1.645\), our test statistic falls in the rejection region. Decision: We reject the null hypothesis.
Step 4: Make a Decision (p-value Approach). The p-value is the probability of getting a Z-statistic of 1.732 or greater. p-value = \(P(Z \ge 1.732)\).
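A minimal sketch of how this upper-tail probability can be computed in R (presumably the source of the output below):

z_stat <- (135 - 130) / (10 / sqrt(12))            # approx 1.732
cat("The p-value is:", pnorm(z_stat, lower.tail = FALSE))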
## The p-value is: 0.04163678
The p-value is 0.0416. Our significance level is \(\alpha = 0.05\). Since \(0.0416 < 0.05\) (p-value < \(\alpha\)), we reject the null hypothesis.
Conclusion (for both approaches): There is statistically significant evidence at the 5% level to conclude that the average CO2 emissions of the new engine have increased above 130 g/km.
Let’s use the TEST.mean function from your class
scripts.
# We can simulate the data
set.seed(104)
co2_sample <- rnorm(12, 135, 10)
# Apply the function
TEST.mean(co2_sample, sigma = 10, mu0 = 130, alternative = "greater")
## n xbar sigma_X SE stat p-value
## 12 138.4 10 2.89 2.91 0.002
The logic for all other tests is identical; only the test statistic and its distribution change.
Question: Is there evidence that the proportion of family-related movies is different from 10%? (Two-tailed test, \(\alpha=0.05\)) \(H_0: p = 0.10\) \(H_1: p \neq 0.10\)
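No code is shown for this output, so here is a hedged sketch of how it might have been produced. It assumes the class scripts provide a TEST.prop function with arguments analogous to CI.prop and TEST.mean, and it assumes a hypothetical main_genre value "Family" flags family-related movies; check the actual argument names and column values in your scripts and data.

# Hypothetical indicator for family-related movies (column value is assumed)
family_flag <- ifelse(movies$main_genre == "Family", "Yes", "No")
# Assumed signature, analogous to CI.prop / TEST.mean; verify in the class scripts
TEST.prop(family_flag, success = "Yes", p0 = 0.10, alternative = "two.sided")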
## n phat s_X se stat p-value
## 2868 0.1 0.3 0.01 0.39 0.7
Conclusion: The p-value (0.7) is much larger than 0.05, so we fail to reject \(H_0\). There is no significant evidence that the true proportion of family movies differs from 10%.
Question: Is there a significant difference between Metacritic and Rotten Tomatoes ratings for movies? (Two-tailed test, \(\alpha=0.05\)) \(H_0: \mu_D = 0\) \(H_1: \mu_D \neq 0\)
TEST.diffmean(movies$metascore_rating, movies$rotting_tomatoes_rating,
type = "paired", alternative = "two.sided")## n xbar ybar dbar=xbar-ybar s_D se stat p-value
## Normal.Approx 2868 60.13 57.93 2.2 24.96 0.47 4.73 <0.0001
## Student-t 2868 60.13 57.93 2.2 24.96 0.47 4.73 <0.0001
Conclusion: The p-value is extremely small (< 0.0001), so we strongly reject \(H_0\). There is a highly significant difference between the average ratings of the two platforms.
Question: Is the average runtime of Action movies greater than that of Comedy movies? (Right-tailed test, \(\alpha=0.01\)) \(H_0: \mu_{Action} - \mu_{Comedy} \le 0\) \(H_1: \mu_{Action} - \mu_{Comedy} > 0\)
runtime_action <- movies$runtime_minutes[movies$main_genre == "Action"]
runtime_comedy <- movies$runtime_minutes[movies$main_genre == "Comedy"]
TEST.diffmean(runtime_action, runtime_comedy, type = "independent", alternative = "greater")
## n_x n_y xbar ybar xbar-ybar s_X s_Y se stat p-value
## Normal.Approx 1001 938 109.84 109.78 0.06 10.56 10.68 0.48 0.12 0.45
## Student-t 1001 938 109.84 109.78 0.06 10.56 10.68 0.48 0.12 0.45
## n_x n_y xbar ybar xbar-ybar s_X s_Y se stat p-value
## Normal.Approx 1001 938 109.84 109.78 0.06 10.56 10.68 0.48 0.12 0.45
## Student-t 1001 938 109.84 109.78 0.06 10.56 10.68 0.48 0.12 0.45
Conclusion: The p-value (0.45) is far greater than 0.01, so we fail to reject \(H_0\). There is no evidence that Action movies have a longer average runtime than Comedy movies; the observed difference in sample means (0.06 minutes) is entirely consistent with chance.
There is a direct and beautiful duality between a two-tailed hypothesis test and a confidence interval.
The Rule: A two-tailed hypothesis test for \(H_0: \theta = \theta_0\) at a significance level \(\alpha\) will be rejected if and only if the \(100(1-\alpha)\%\) confidence interval for \(\theta\) does not contain the value \(\theta_0\).
Example: Let’s test if the average commuter distance
is different from 30 km (\(H_0: \mu =
30\)) at \(\alpha=0.05\). Our
95% confidence interval was \([30.58,
38.42]\). Since the value 30 is not
inside this interval, we would reject the null
hypothesis. The interval provides the range of “plausible” values for
\(\mu\), and 30 is not among them.
This provides a powerful way to interpret confidence intervals: they are a summary of the results of all possible two-tailed hypothesis tests for that parameter.
Today we have journeyed through the core of statistical inference.

* We started with Confidence Intervals, our tool for estimating population parameters with a measure of our uncertainty. We learned to build them for means and proportions, for one and two populations, and in various scenarios of known or unknown variance.
* We then moved to Hypothesis Testing, our formal procedure for making decisions about a population based on sample evidence. We learned the “courtroom” logic of the null hypothesis, the critical roles of Type I and Type II errors, and the two main decision approaches: critical values and p-values.
You now possess the foundational toolkit of a modern data analyst. You can move beyond simply describing data to using it to make informed, evidence-based decisions in the face of uncertainty.
🎓 End of Lecture 6 - Congratulations on mastering these critical concepts!
## 📋 Session Information:
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 20.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3; LAPACK version 3.9.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] UBStats_0.2.2
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 R6_2.6.1 fastmap_1.2.0 xfun_0.52
## [5] cachem_1.1.0 knitr_1.50 htmltools_0.5.8.1 rmarkdown_2.29
## [9] lifecycle_1.0.4 cli_3.6.5 sass_0.4.10 jquerylib_0.1.4
## [13] compiler_4.5.1 rstudioapi_0.17.1 tools_4.5.1 evaluate_1.0.4
## [17] bslib_0.9.0 yaml_2.3.10 rlang_1.1.6 jsonlite_2.0.0