MATH 343: APPLIED STATISTICS NOTES
1 Introduction to Hypothesis Testing
Imagine a doctor testing a new drug. The existing drug (Drug A) has a 60% success rate. The new drug (Drug B) is more expensive to produce.
The crucial question: Is Drug B significantly better than Drug A, or is its higher observed success rate in a small trial just due to random chance?
Hypothesis testing is the formal, statistical framework we use to answer these kinds of questions. It allows us to make data-driven inferences about a population based on sample data, while quantifying the uncertainty of those inferences.
2 Core Components
2.1 Null and Alternative Hypotheses
Every hypothesis test sets up two competing claims.
- Null Hypothesis (H₀): Represents the status quo or no effect.
Examples:
- “The new drug is no better than the old one.”
- “The mean height of women is 65 inches.”
- “The coin is fair.”
- Alternative Hypothesis (H₁ or Hₐ): Represents the effect we want to detect.
Examples:
- “The new drug is better than the old one.”
- “The mean height of women is not 65 inches.”
- “The coin is biased.”
Formulating Hypotheses:
Two-tailed test:
H₀: μ = k
H₁: μ ≠ k
One-tailed test:
H₀: μ ≤ k, H₁: μ > k
or
H₀: μ ≥ k, H₁: μ < k
The choice between one-tailed and two-tailed must be made before looking at the data.
2.2 Type I and Type II Errors & Power
Because we use samples, we can never be 100% certain.
| Decision | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | Type I Error (α) | Correct Decision (Power = 1 - β) |
| Fail to Reject H₀ | Correct (1 - α) | Type II Error (β) |
Type I Error (α): Rejecting a true null hypothesis.
Consequence: Concluding an effect exists when it doesn’t. (e.g., Convicting an innocent person, adopting a new drug that is no better).
Significance Level (α): The pre-chosen probability of making a Type I error. Common choices are 0.05 (5%), 0.01 (1%), and 0.10 (10%). This is our threshold for “unlikely.”
Type II Error (β): Failing to reject a false null hypothesis.
Consequence: Concluding no effect exists when it actually does. (e.g., Letting a guilty person go free, sticking with an old drug when a new one is better).
β depends on the sample size, the true effect size, and the chosen α.
Power (1 - β): Probability of correctly rejecting a false null. Researchers typically aim for 80% power.
Ways to increase power:
- Increase sample size (n).
- Increase effect size.
- Increase α (but this raises Type I error risk).
NB: There is a trade-off between Type I and Type II errors. Decreasing α makes it harder to reject H₀, which in turn increases β (decreases power), unless compensated for by a larger sample size.
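To make this trade-off concrete, base R’s power.t.test() can be used to see how power responds to n, effect size, and α. This is an illustrative sketch only: the numbers below (a one-sample t-test with an assumed effect of 0.5 standard deviations) are assumptions for demonstration, not values from an example in these notes.
# Illustrative only: assumed effect of 0.5 SD, sd = 1
power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.05,
             type = "one.sample")   # baseline power
power.t.test(n = 60, delta = 0.5, sd = 1, sig.level = 0.05,
             type = "one.sample")   # larger n -> more power
power.t.test(n = 30, delta = 0.8, sd = 1, sig.level = 0.05,
             type = "one.sample")   # larger effect -> more power
power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.10,
             type = "one.sample")   # larger alpha -> more power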
2.3 The p-value: Making the Decision
The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis (H₀) is true.
How to interpret it: A small p-value (typically ≤ α) means that the observed data would be very unlikely if the null hypothesis were true. This provides evidence against H₀.
The Decision Rule:
If p-value ≤ α, we reject the null hypothesis (H₀). The result is “statistically significant.”
If p-value > α, we fail to reject the null hypothesis (H₀). We don’t have enough evidence to support H₁.
Crucial: A p-value > α does not prove H₀ is true. It only means the evidence wasn’t strong enough to reject it. Also, “statistically significant” does not necessarily mean “practically important.”
2.4 Steps in Hypothesis Testing:
State the null and alternative hypotheses.
Choose the level of significance (alpha).
Calculate the test statistic.
Determine the critical value or p-value.
Compare the test statistic to the critical value, or compare the p-value to α.
Make a decision to reject or fail to reject the null hypothesis.
3 One-Sample Tests
3.1 Part A: One-Sample z-test
The Z-test is a statistical hypothesis test used to determine if there is a significant difference between sample and population parameters when the population standard deviation is known.
Key Characteristics:
Uses standard normal distribution (Z-distribution)
Appropriate for large sample sizes (n ≥ 30)
Population standard deviation (σ) must be known
More powerful than t-test when assumptions are met
Test statistic:
\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]
where:
x̄ = sample mean
μ₀ = hypothesized population mean under H₀
σ = known population standard deviation
n = sample size
3.1.1 Example 1: One-Sample Z-Test
A company claims its energy bars have an average of 20 grams of protein. The population standard deviation is known to be 1.5 grams. A random sample of 35 bars is taken, and the sample mean is found to be 20.6 grams. At a 5% significance level, is there evidence that the mean protein content is different from 20 grams?
Step 1: Hypotheses
Null hypothesis: \(H_0: \mu = 20\)
Alternative hypothesis: \(H_1: \mu \neq 20\) (two-tailed test)
Step 2: Significance Level \(\alpha = 0.05\)
Step 3: Test Statistic (Manual Calculation)
Given:
- Sample mean \(\bar{x} = 20.6\)
- Population mean \(\mu_0 = 20\)
- Population standard deviation \(\sigma = 1.5\)
- Sample size \(n = 35\)
\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]
\[ z = \frac{20.6 - 20}{1.5 / \sqrt{35}} \]
\[ z = \frac{0.6}{1.5 / 5.916} = \frac{0.6}{0.2535} \approx 2.37 \]
Step 4: P-value
For a two-tailed test:
\[ p\text{-value} = 2 \times P(Z > |2.37|) \]
From Z-table: \(P(Z > 2.37) \approx 0.0089\)
\[ p\text{-value} = 2 \times 0.0089 = 0.0178 \]
Step 5: Decision
- \(p\)-value (0.0178) < \(\alpha\) (0.05)
- Reject \(H_0\)
Step 6: Conclusion
At the 5% significance level, there is sufficient evidence to conclude that the true mean protein content of the energy bars is different from 20 grams.
Verification in R:
# Given values
xbar <- 20.6
mu0 <- 20
sigma <- 1.5
n <- 35
# Z statistic
z <- (xbar - mu0) / (sigma / sqrt(n))
z
## [1] 2.366432
# Two-tailed p-value
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
## [1] 0.01796048
3.1.2 Example 2: Two-Tailed Test
A company claims its light bulbs last 1200 hours. The population standard deviation is 100 hours. A sample of 50 bulbs has a mean lifespan of 1175 hours. Test at \(\alpha=0.05\).
Step 1: Hypotheses
\(H_0: \mu = 1200\)
\(H_1: \mu \neq 1200\)
Step 2: Given Values
\(\mu_0=1200\), \(\sigma=100\), \(\bar{x}=1175\), \(n=50\), \(\alpha=0.05\)
Step 3: Test Statistic
\(SE = 100/\sqrt{50} = 14.14\)
\(Z = (1175-1200)/14.14 = -25/14.14 = -1.77\)
Step 4: Critical Values
Two-tailed test, \(Z_{0.025}=\pm1.96\)
Step 5: Decision
\(-1.77 > -1.96\) → Fail to reject \(H_0\).
Step 6: Conclusion
No significant evidence that mean lifespan differs from 1200 hours (\(Z=-1.77\), \(p>0.05\)).
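Verification in R (a quick sketch using the values from this example; base R only):
# Given values
z <- (1175 - 1200) / (100 / sqrt(50))
z
# Two-tailed p-value
2 * pnorm(-abs(z))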
3.1.3 Example 3: One-Tailed Test
A cereal company claims boxes contain at least 500 g. Population \(\sigma=15\). A sample of 40 boxes has mean weight 495 g. Test at \(\alpha=0.01\) if boxes are underfilled.
Step 1: Hypotheses
\(H_0: \mu \geq 500\)
\(H_1: \mu < 500\)
Step 2: Given Values
\(\mu_0=500\), \(\sigma=15\), \(\bar{x}=495\), \(n=40\), \(\alpha=0.01\)
Step 3: Test Statistic
\(SE = 15/\sqrt{40} = 2.37\)
\(Z = (495-500)/2.37 = -5/2.37 = -2.11\)
Step 4: Critical Value
Left-tailed test, \(Z_{0.01}=-2.33\)
Step 5: Decision
\(-2.11 > -2.33\) → Fail to reject \(H_0\).
Step 6: Conclusion
No significant evidence that boxes are underfilled (\(Z=-2.11\), \(p>0.01\)).
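Verification in R (a quick sketch using the values from this example):
z <- (495 - 500) / (15 / sqrt(40))
z
# Left-tailed p-value
pnorm(z)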
3.1.4 Example 4: Right-Tailed Test
A school district claims average SAT math score is 520 with \(\sigma=100\). A new teaching method is tested on 64 students, yielding mean score 540. Test at \(\alpha=0.05\) if the new method improves scores.
Step 1: Hypotheses
\(H_0: \mu \leq 520\)
\(H_1: \mu > 520\)
Step 2: Given Values
\(\mu_0=520\), \(\sigma=100\), \(\bar{x}=540\), \(n=64\), \(\alpha=0.05\)
Step 3: Test Statistic
\(SE = 100/\sqrt{64} = 12.5\)
\(Z = (540-520)/12.5 = 20/12.5 = 1.60\)
Step 4: Critical Value
Right-tailed test, \(Z_{0.05}=1.645\)
Step 5: Decision
\(1.60 < 1.645\) → Fail to reject \(H_0\).
Step 6: Conclusion
No significant evidence that the new method improves scores (\(Z=1.60\), \(p>0.05\)).
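Verification in R (a quick sketch using the values from this example):
z <- (540 - 520) / (100 / sqrt(64))
z
# Right-tailed p-value
1 - pnorm(z)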
- When to use:
- Z-test: Population standard deviation (\(\sigma\)) known
- t-test: Population standard deviation unknown (use sample \(s\))
3.2 Part B: One-Sample t-test
When to use it: To test a hypothesis about a population mean (μ) when:
- The population standard deviation (σ) is unknown (which is almost always the case in real life); the sample standard deviation (s) is used as an estimate.
- The population is approximately normal, which matters most when the sample size is small (n < 30).
The t-test is a statistical hypothesis test used to determine whether there is a significant difference between the means of two groups, or between a sample mean and a known population mean. The one-sample t-test is used to test whether the mean of a sample differs significantly from a hypothesized population mean when the population standard deviation is unknown.
Test statistic:
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
where \(s\) is the sample standard deviation, with degrees of freedom \(df = n - 1\).
The t-distribution is slightly wider and more variable than the z-distribution, accounting for the extra uncertainty from estimating σ with s. As df increases, the t-distribution approaches the z-distribution.
3.2.1 Example 1: One-Sample t-test (Practical with R & Python)
Problem: A car manufacturer claims a new model gets at least 40 MPG. A consumer agency tests a random sample of 12 cars, with the following results:
39.2, 40.5, 38.7, 41.0, 39.8, 40.9, 38.5, 39.9, 40.2, 39.3, 41.1, 38.6
Test the manufacturer’s claim at α = 0.05. Assume MPG is approximately normally distributed.
Solution (Manual Steps First):
1. Hypotheses:
H₀: μ ≥ 40 (Manufacturer's claim is true)
H₁: μ < 40 (The mean MPG is less than 40) -> One-tailed (left-tailed) test
2. Significance Level: α = 0.05
3. Calculate Sample Statistics:
Calculate the mean (x̄) and standard deviation (s) of the sample data
\[ \bar{x} = \frac{\text{sum of all values}}{12} = \frac{477.7}{12} \approx 39.808 \]
s ≈ 0.951 (calculated using the sample standard deviation formula)
4. Test Statistic:
\[ t = \frac{\bar{x} - \mu_{0}}{s / \sqrt{n}} = \frac{39.808 - 40}{0.951 / \sqrt{12}} = \frac{-0.192}{0.951 / 3.464} = \frac{-0.192}{0.275} \approx -0.698 \]
df = n - 1 = 11
5. Find p-value (using t-table for df=11):
This is a left-tailed test. We need P(T < -0.698).
From a t-table, for df=11, the value 0.698 is essentially the tabled value 0.697, whose one-tailed probability is 0.25.
We can estimate p-value ≈ 0.25. (Software will give a more precise value.)
6. Decision (Manual):
Our estimated p-value (≈ 0.25) is greater than α (0.05). Therefore, we fail to reject H₀.
Now, let’s solve it precisely with code:
In R:
# Sample data
mpg_data <- c(39.2, 40.5, 38.7, 41.0, 39.8, 40.9, 38.5, 39.9, 40.2, 39.3, 41.1, 38.6)
# Perform one-sample t-test (alternative="less" for H1: mu < 40)
test_result <- t.test(mpg_data, mu = 40, alternative = "less")
# Print the results
print(test_result)
##
## One Sample t-test
##
## data: mpg_data
## t = -0.69814, df = 11, p-value = 0.2498
## alternative hypothesis: true mean is less than 40
## 95 percent confidence interval:
## -Inf 40.30138
## sample estimates:
## mean of x
## 39.80833
Conclusion from R:
The p-value is 0.2498. Since 0.2498 > 0.05, we fail to reject H₀. There is not enough evidence to reject the manufacturer’s claim that the mean MPG is at least 40.
In Python:
import numpy as np
from scipy import stats
# Sample data
mpg_data = np.array([39.2, 40.5, 38.7, 41.0, 39.8, 40.9, 38.5, 39.9, 40.2, 39.3, 41.1, 38.6])
# Perform one-sample t-test; alternative="less" sets H1: mu < 40
t_stat, p_value = stats.ttest_1samp(mpg_data, popmean=40, alternative='less')
# Print the results
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
# With 'alternative' specified, the reported p-value is already one-tailed.
# On older SciPy versions without 'alternative', halve the two-sided p-value
# (when the sign of t matches the direction of H1).
Conclusion is the same as in R.
3.2.2 Example 2: Two-tailed Test (Fail to Reject)
Problem: A machine claims to fill jars with 100 g. A sample of \(n=10\) jars has \(\bar{x}=105\) g and \(s=8\) g. Test \(H_0: \mu=100\) vs \(H_1: \mu \neq 100\) at \(\alpha=0.05\).
Step 1: State hypotheses
\(H_0: \mu=100\)
\(H_1: \mu\neq100\)
Step 2: Compute standard error
\(SE = s/\sqrt{n} = 8/\sqrt{10} = 8/3.1623 = 2.5298\)
Step 3: Compute test statistic
\(t = (\bar{x}-\mu_0)/SE = (105-100)/2.5298 = 5/2.5298 = 1.976\)
Step 4: Degrees of freedom
\(df = 10-1=9\)
Step 5: Critical value
\(t_{0.975,9}=2.262\)
Decision: \(|t|=1.976 < 2.262\), so fail to reject \(H_0\). No evidence mean differs from 100.
In R:
n <- 10
xbar <- 105
mu0 <- 100
s <- 8
se <- s/sqrt(n)
t_stat <- (xbar - mu0)/se
df <- n-1
p_val <- 2*pt(-abs(t_stat), df)
list(t_statistic = t_stat, df = df, p_value = p_val)
## $t_statistic
## [1] 1.976424
##
## $df
## [1] 9
##
## $p_value
## [1] 0.07951604
3.2.3 Example 3: Two-tailed Test with 95% CI (Reject \(H_0\))
Problem: \(n=15\), \(\bar{x}=52\), \(s=3.5\), test \(H_0: \mu=50\) vs \(H_1: \mu\neq50\).
Step 1: SE
\(SE = 3.5/\sqrt{15} = 3.5/3.873 = 0.9035\)
Step 2: Test statistic
\(t = (52-50)/0.9035 = 2/0.9035 = 2.214\)
Step 3: df
\(df=14\)
Step 4: Critical value
\(t_{0.975,14}=2.145\)
Decision: \(t=2.214 > 2.145\), reject \(H_0\).
Step 5: 95% CI
Margin = \(t_{0.975,14}\times SE = 2.145\times 0.9035 = 1.939\)
CI = \(52\pm1.939 = (50.06,53.94)\)
Conclusion: Reject \(H_0\) (p < 0.05). There is evidence that the population mean differs from 50. The 95% CI gives the plausible range for the true mean.
In R:
n <- 15
xbar <- 52
mu0 <- 50
s <- 3.5
se <- s/sqrt(n)
t_stat <- (xbar - mu0)/se
df <- n-1
p_val <- 2*pt(-abs(t_stat), df)
# 95% CI
alpha <- 0.05
t_crit <- qt(1-alpha/2, df)
margin <- t_crit*se
ci <- c(xbar - margin, xbar + margin)
list(t_statistic = t_stat, df = df, p_value = p_val, CI_95 = ci)
## $t_statistic
## [1] 2.213133
##
## $df
## [1] 14
##
## $p_value
## [1] 0.04400273
##
## $CI_95
## [1] 50.06176 53.93824
3.2.4 Example 4: One-sided Test with 99% CI (Reject \(H_0\))
Problem: Training course improvement, \(n=25\), \(\bar{x}=4\), \(s=6\), test \(H_0: \mu=0\) vs \(H_1: \mu>0\) at \(\alpha=0.01\).
Step 1: SE
\(SE = 6/\sqrt{25} = 6/5 = 1.2\)
Step 2: Test statistic
\(t = (4-0)/1.2 = 4/1.2 = 3.333\)
Step 3: df
\(df=24\)
Step 4: Critical value (one-sided)
\(t_{0.99,24}=2.492\)
Decision: \(t=3.333 > 2.492\), reject \(H_0\). Strong evidence of positive improvement.
Step 5: 99% CI
Margin = \(t_{0.995,24}\times SE = 2.797\times1.2=3.356\)
CI = \(4\pm3.356 = (0.64,7.36)\)
Conclusion: Strong evidence (\(p<0.01\)) that the course increases scores. The 99% CI shows the true mean improvement is positive.
In R:
n <- 25
xbar <- 4
mu0 <- 0
s <- 6
se <- s/sqrt(n)
t_stat <- (xbar - mu0)/se
df <- n-1
p_val <- 1 - pt(t_stat, df) # one-sided
# 99% CI
t_crit <- qt(0.995, df)
margin <- t_crit*se
ci <- c(xbar - margin, xbar + margin)
list(t_statistic = t_stat, df = df, p_value = p_val, CI_99 = ci)
## $t_statistic
## [1] 3.333333
##
## $df
## [1] 24
##
## $p_value
## [1] 0.001388157
##
## $CI_99
## [1] 0.6436726 7.3563274
4 Exercises & Assignments
4.1 Part A: One-Sample Z-test Questions
Q1. A manufacturer claims that the mean lifetime of its light bulbs is 1200 hours. A sample of 64 bulbs has a mean of 1180 hours. Assume the population standard deviation is known to be 80 hours. At the 5% level of significance, test whether the mean lifetime is different from 1200 hours.
Sample Data: Mean = 1180, \(n=64\), \(\sigma=80\), \(\mu_0=1200\).
Q2. The average weight of a packaged product is claimed to be 250 g. A quality inspector samples 100 packages and finds the sample mean to be 247 g. The population standard deviation is 10 g. Test at the 1% level whether the average weight is less than 250 g.
Sample Data: Mean = 247, \(n=100\), \(\sigma=10\), \(\mu_0=250\).
Q3. A machine is set to dispense 500 ml of juice. A random sample of 36 bottles has a mean content of 505 ml. The population standard deviation is known to be 12 ml. Test at the 5% significance level whether the machine is overfilling bottles.
Sample Data: Mean = 505, \(n=36\), \(\sigma=12\), \(\mu_0=500\).
Q4. A national survey found that the average American adult works 43.7 hours per week. The population standard deviation is assumed to be 4.6 hours. You survey 50 adults in your state and find they work an average of 45.1 hours per week. At the α = 0.01 level, is there significant evidence to conclude that workers in your state work more than the national average?
4.2 Part B: One-Sample T-test Questions
Q5. A nutritionist wants to test whether the average daily protein intake of adults differs from the recommended 60 g. A random sample of 12 adults has the following intakes (in grams):
protein <- c(62, 65, 59, 64, 60, 66, 61, 63, 62, 67, 64, 61)
mean(protein); sd(protein); length(protein)
## [1] 62.83333
## [1] 2.443296
## [1] 12
Q6. A teacher claims that the average score of her students on a math test is at least 70. A random sample of 20 students has the following scores:
scores <- c(65, 68, 70, 72, 67, 66, 69, 71, 68, 70,
64, 66, 67, 68, 72, 69, 70, 68, 67, 66)
mean(scores); sd(scores); length(scores)
## [1] 68.15
## [1] 2.230766
## [1] 20
Q7. A company believes the average monthly expenditure of households on internet services is 2000 KES. A sample of 18 households reports the following expenditures (in KES):
expenditure <- c(2100, 2050, 2150, 2200, 1900, 2000, 2300, 2250, 2100,
1950, 2000, 2050, 2150, 2200, 2100, 2250, 2050, 2150)
mean(expenditure); sd(expenditure); length(expenditure)
## [1] 2108.333
## [1] 108.8037
## [1] 18
Q8. The recommended daily calcium intake for adults is 1000 mg. A nutritionist believes the intake for women in their 50s is too low. She collects data from a random sample of 15 women:
980, 1005, 1010, 942, 865, 1200, 1105, 978, 1020, 999, 870, 1050, 1055, 955, 907
Test the nutritionist’s belief at the α = 0.05 level. Assume the population is approximately normal.
5 Test for Difference Between the Means of Two Samples
5.1 A. Two-Sample z-test
The two-sample z-test is used to compare the means of two independent groups when the population standard deviations are known.
When to Use Two-Sample Z-Test
Comparing means of two independent groups
Population standard deviations are known
Sample sizes are sufficiently large (typically n ≥ 30)
Data are approximately normally distributed
Steps for a two-sample z-test:
Step 1: State the Hypotheses
- Null Hypothesis (H₀): μ₁ = μ₂ (The population means are equal)
- Alternative Hypothesis (H₁):
- Two-tailed: μ₁ ≠ μ₂
- Right-tailed: μ₁ > μ₂
- Left-tailed: μ₁ < μ₂
Choose the form of H₁ based on the research question.
Step 2: Identify the Given Data
- Sample 1:
- Mean: \(\bar{x}_1\)
- Population standard deviation: \(\sigma_1\)
- Sample size: \(n_1\)
- Sample 2:
- Mean: \(\bar{x}_2\)
- Population standard deviation: \(\sigma_2\)
- Sample size: \(n_2\)
- Significance level: \(\alpha\)
Step 3: Calculate the Z-Statistic
Use the formula:
\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]
Substitute the known values and simplify step-by-step.
Step 4: Determine the Critical Z-Value
- Use the standard normal distribution table.
- For a two-tailed test at \(\alpha = 0.05\), critical values are ±1.96.
- For a right-tailed test at \(\alpha = 0.05\), critical value is 1.645.
- For a left-tailed test at \(\alpha = 0.05\), critical value is -1.645.
Adjust based on the chosen tail and significance level.
Step 5: Compare and Decide
- If the test is two-tailed, reject H₀ if \(|z| > z_{\alpha/2}\)
- If the test is right-tailed, reject H₀ if \(z > z_\alpha\)
- If the test is left-tailed, reject H₀ if \(z < -z_\alpha\)
Step 6: Conclusion
- Reject H₀ if the z-statistic falls in the rejection region.
- Fail to reject H₀ if the z-statistic does not fall in the rejection region.
State the conclusion in context of the problem, including the calculated z-value and comparison to the critical value.
Notes
- This test assumes known population standard deviations.
- If population standard deviations are unknown, consider using a two-sample t-test instead.
- Ensure samples are independent and drawn randomly from normally distributed populations.
Let’s do three examples.
Example 1: Two-tailed test
Example 2: One-tailed test (right-tailed)
Example 3: One-tailed test (left-tailed)
5.1.1 Example 1: Two-Tailed Z-Test
A study compares test scores between two schools.
- School A: \(n_1 = 40, \ \bar{x}_1 = 85, \ \sigma_1 = 8\)
- School B: \(n_2 = 50, \ \bar{x}_2 = 82, \ \sigma_2 = 7.5\)
- Significance level: \(\alpha = 0.05\)
Solution
Step 1: State Hypotheses
\[H_0: \mu_1 = \mu_2 \quad \text{(no difference in means)}\]
\[H_1: \mu_1 \neq \mu_2 \quad \text{(means differ)}\]
Step 2: Compute Test Statistic (Manual Calculation)
The standard error (SE) is:
\[SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{8^2}{40} + \frac{7.5^2}{50}}= \sqrt{\frac{64}{40} + \frac{56.25}{50}}= \sqrt{1.6 + 1.125} = \sqrt{2.725} \approx 1.651\]
The z-statistic is:
\[z = \frac{\bar{x}_1 - \bar{x}_2}{SE} = \frac{85 - 82}{1.651} \approx 1.82\]
Step 3: Critical Value & Decision Rule
For a two-tailed test at \(\alpha = 0.05\):
\[z_{\alpha/2} = \pm 1.96\]
Decision rule:
- If \(|z| > 1.96\), reject \(H_0\).
- Otherwise, fail to reject \(H_0\).
Here, \(|1.82| < 1.96\), so we fail to reject \(H_0\).
Step 4: Conclusion
There is no significant difference between the mean test scores of the two schools.
- Test statistic: \(z = 1.82\)
- Critical values: \(\pm 1.96\)
- Decision: Fail to reject \(H_0\)
- Interpretation: The evidence is insufficient at \(\alpha = 0.05\) to conclude a difference in mean test scores.
Step 5: R Verification
# Given data
x1 <- 85; x2 <- 82
n1 <- 40; n2 <- 50
sigma1 <- 8; sigma2 <- 7.5
# Standard error
SE <- sqrt((sigma1^2/n1) + (sigma2^2/n2))
SE
## [1] 1.650757
# Z statistic
z_value <- (x1 - x2)/SE
z_value
## [1] 1.817348
# Two-tailed p-value
p_value <- 2 * (1 - pnorm(abs(z_value)))
p_value
## [1] 0.06916391
5.1.2 Example 2: One-Tailed Test (Right-Tailed)
A company tests two production methods. Method X (n=35) has mean output=120 units/hour, σ=10. Method Y (n=40) has mean output=115 units/hour, σ=9. Test at α=0.01 if Method X is superior.
Solution
Step 1: State Hypotheses
Null Hypothesis (H₀): μₓ ≤ μᵧ (Method X is not superior)
Alternative Hypothesis (H₁): μₓ > μᵧ (Method X is superior)
Step 2: Given Values
- Method X:
- Sample size (n₁) = 35
- Mean (x̄₁) = 120
- Standard deviation (σ₁) = 10
- Method Y:
- Sample size (n₂) = 40
- Mean (x̄₂) = 115
- Standard deviation (σ₂) = 9
- Significance level (α) = 0.01
Step 3: Calculate Z-Statistic
\[ Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{120 - 115}{\sqrt{\frac{100}{35} + \frac{81}{40}}} = \frac{5}{\sqrt{2.857 + 2.025}} = \frac{5}{\sqrt{4.882}} = \frac{5}{2.210} \approx 2.26 \]
Step 4: Critical Value
- Right-tailed test at α = 0.01
- Critical Z-value: Zₐ = 2.33
Step 5: Compare and Decide
- Since 2.26 < 2.33, the Z-statistic is not in the rejection region
- Fail to reject H₀
Step 6: Conclusion
There is no significant evidence that Method X is superior.
Z = 2.26, p > 0.01
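Verification in R (a sketch using the summary values from this example):
z <- (120 - 115) / sqrt(10^2/35 + 9^2/40)
z
# Right-tailed p-value
1 - pnorm(z)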
5.1.3 Example 3: One-Tailed Test (Left-Tailed)
A nutritionist compares calorie intake between two diets. Diet A (n=60) has mean=1800 calories, σ=150. Diet B (n=55) has mean=1850 calories, σ=140. Test at α=0.05 if Diet A has lower calorie intake.
Solution
Step 1: State Hypotheses
Null Hypothesis (H₀): μₐ ≥ μᵦ (Diet A is not lower in calories)
Alternative Hypothesis (H₁): μₐ < μᵦ (Diet A has lower calorie intake)
Step 2: Given Values
- Diet A:
- Sample size (n₁) = 60
- Mean (x̄₁) = 1800
- Standard deviation (σ₁) = 150
- Diet B:
- Sample size (n₂) = 55
- Mean (x̄₂) = 1850
- Standard deviation (σ₂) = 140
- Significance level (α) = 0.05
Step 3: Calculate Z-Statistic
\[ Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{1800 - 1850}{\sqrt{\frac{22500}{60} + \frac{19600}{55}}} = \frac{-50}{\sqrt{375 + 356.36}} = \frac{-50}{\sqrt{731.36}} = \frac{-50}{27.04} \approx -1.85 \]
Step 4: Critical Value
- Left-tailed test at α = 0.05
- Critical Z-value: Zₐ = -1.645
Step 5: Compare and Decide
- Since -1.85 < -1.645, the Z-statistic is in the rejection region
- Reject H₀
Step 6: Conclusion
There is significant evidence that Diet A has lower calorie intake.
Z = -1.85, p < 0.05
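Verification in R (a sketch using the summary values from this example):
z <- (1800 - 1850) / sqrt(150^2/60 + 140^2/55)
z
# Left-tailed p-value
pnorm(z)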
5.2 B. Two-Sample T-test
5.2.1 Purpose
The two-sample t-test is used to determine whether there is a significant difference between the means of two populations when the population standard deviations are unknown. A two-tailed version checks for any difference (positive or negative), while a one-tailed version tests whether one mean is greater than (or less than) the other.
5.2.2 Types of Two-Sample Tests
Independent Samples: The two samples are unrelated.
Paired Samples: The two samples are related (e.g., before-and-after measurements on the same subjects).
5.2.3 Assumptions
The samples are randomly selected, and observations are independent.
The populations are normally distributed (or sample sizes are large enough for the Central Limit Theorem to apply).
For independent samples: the variances of the two populations may or may not be equal; the pooled t-test assumes equal variances, while Welch’s t-test does not.
For paired samples: the differences between paired observations are normally distributed.
5.2.4 Hypotheses
For independent samples:
Null hypothesis (\(H_0\)): \(\mu_1 = \mu_2\) (no difference in population means)
Alternative hypothesis (\(H_a\)): \(\mu_1 \neq \mu_2\) (there is a difference)
For paired samples:
Null hypothesis (\(H_0\)): \(\mu_d = 0\) (mean difference is zero)
Alternative hypothesis (\(H_a\)): \(\mu_d \neq 0\) (mean difference is not zero)
5.2.5 Test Statistics
For Independent Samples
If population variances are assumed equal: \[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]
where \(S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\) is the pooled variance.
If population variances are not assumed equal (Welch’s t-test): \[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \] Degrees of freedom are calculated using the Welch-Satterthwaite equation.
For Paired Samples \[ t = \frac{\bar{d}}{s_d / \sqrt{n}} \] where \(\bar{d}\) is the mean of the differences, \(s_d\) is the standard deviation of the differences, and \(n\) is the number of pairs.
5.2.6 Decision Rule
Compare the calculated \(t\)-value with the critical \(t\)-value from the \(t\)-distribution table at the given significance level (\(\alpha\)) and degrees of freedom.
Reject \(H_0\) if the absolute value of the calculated \(t\)-value exceeds the critical \(t\)-value.
There are two types: independent (unpaired) and paired t-tests.
Two-Sample Pooled t-Test (Equal Variances): Used to compare the means of two independent samples when population variances are assumed equal.
Test Statistic
\[t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]
Pooled Variance
\[s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\]
Degrees of Freedom
\[df = n_1 + n_2 - 2\]
Welch’s t-Test (Unequal Variances): Used to compare the means of two independent samples when population variances are not assumed equal.
Test Statistic
\[t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]
Degrees of Freedom (Welch-Satterthwaite Approximation)
\[df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left( \frac{s_1^2}{n_1} \right)^2}{n_1 - 1} + \frac{\left( \frac{s_2^2}{n_2} \right)^2}{n_2 - 1}}\]
Paired t-Test: Used to compare means from the same group at different times or under different conditions.
Test Statistic
\[t = \frac{\bar{d}}{s_d / \sqrt{n}}\]
Where:
- \(\bar{d}\) = mean of the differences
- \(s_d\) = standard deviation of the differences
Degrees of Freedom
\[df = n - 1\] (where \(n\) = number of pairs)
Worked examples of each type follow.
Example 1: Independent two-sample t-test (equal variances assumed)
Example 2: Independent two-sample t-test (Welch’s t-test, unequal variances not assumed)
Example 3: Paired two-sample t-test
5.2.7 Solved Example 1: Independent Samples
Problem: A researcher wants to compare the average test scores of students from two different teaching methods. A random sample of 25 students from Method A has a mean score of 78 with a standard deviation of 10. A random sample of 30 students from Method B has a mean score of 75 with a standard deviation of 12. Assume unequal variances. Perform a two-tailed test at \(\alpha = 0.05\).
Solution:
State the hypotheses:
- \(H_0: \mu_A = \mu_B\)
- \(H_a: \mu_A \neq \mu_B\)
Calculate the test statistic:
Using Welch’s \(t\)-test: \[ t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{S_A^2}{n_A} + \frac{S_B^2}{n_B}}} \] Substituting values: \[ t = \frac{78 - 75}{\sqrt{\frac{10^2}{25} + \frac{12^2}{30}}} = \frac{3}{\sqrt{4 + 4.8}} = \frac{3}{\sqrt{8.8}} \approx \frac{3}{2.97} \approx 1.01 \]
Degrees of freedom:
Using the Welch-Satterthwaite formula: \[ df \approx \frac{\left(\frac{S_A^2}{n_A} + \frac{S_B^2}{n_B}\right)^2}{\frac{\left(\frac{S_A^2}{n_A}\right)^2}{n_A - 1} + \frac{\left(\frac{S_B^2}{n_B}\right)^2}{n_B - 1}} \] Substituting values: \[ df \approx \frac{(4 + 4.8)^2}{\frac{4^2}{24} + \frac{4.8^2}{29}} \approx \frac{8.8^2}{\frac{16}{24} + \frac{23.04}{29}} \approx \frac{77.44}{0.667 + 0.795} \approx 52 \]
Critical value:
From the \(t\)-table, \(t_{\text{critical}} = 2.009\) (for \(\alpha = 0.05\) and \(df = 52\)).
Decision:
Since \(|t| = 1.01 < 2.009\), we fail to reject \(H_0\). There is no significant difference in the means.
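Because only summary statistics are given, t.test() cannot be applied directly; the same Welch calculation can be sketched in R from the summaries (variable names are ours):
# Summary statistics
x1 <- 78; s1 <- 10; n1 <- 25
x2 <- 75; s2 <- 12; n2 <- 30
v1 <- s1^2/n1; v2 <- s2^2/n2
t_stat <- (x1 - x2) / sqrt(v1 + v2)
df <- (v1 + v2)^2 / (v1^2/(n1 - 1) + v2^2/(n2 - 1))   # Welch-Satterthwaite
p_val <- 2 * pt(-abs(t_stat), df)                      # two-tailed p-value
c(t = t_stat, df = df, p = p_val)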
5.2.8 Solved Example 2: Paired Samples
Problem: A study measures the blood pressure of 10 patients before and after a new medication. The differences in systolic blood pressure are: \([-5, -3, -8, -6, -4, -7, -2, -5, -6, -4]\). Perform a two-tailed test at \(\alpha = 0.05\).
Solution:
State the hypotheses:
- \(H_0: \mu_d = 0\)
- \(H_a: \mu_d \neq 0\)
Calculate the mean and standard deviation of differences:
- Mean: \(\bar{d} = \frac{-5 - 3 - 8 - 6 - 4 - 7 - 2 - 5 - 6 - 4}{10} = -5\)
- Standard deviation: \(s_d = \sqrt{\frac{\sum(d_i - \bar{d})^2}{n-1}} = \sqrt{\frac{(-5+5)^2 + (-3+5)^2 + ... + (-4+5)^2}{9}} = \sqrt{\frac{30}{9}} \approx 1.83\)
Calculate the test statistic: \[ t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{-5}{1.83 / \sqrt{10}} = \frac{-5}{1.83 / 3.16} = \frac{-5}{0.578} \approx -8.66 \]
Degrees of freedom: \(df = n - 1 = 10 - 1 = 9\)
Critical value: From the \(t\)-table, \(t_{\text{critical}} = 2.262\) (for \(\alpha = 0.05\) and \(df = 9\)).
Decision: Since \(|t| = 8.66 > 2.262\), we reject \(H_0\). There is a significant difference in blood pressure before and after the medication.
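In R, the paired test can be verified directly from the differences (a sketch):
d <- c(-5, -3, -8, -6, -4, -7, -2, -5, -6, -4)
mean(d); sd(d)
t.test(d, mu = 0)   # two-tailed one-sample t-test on the differences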
5.2.9 Solved Example 3: Independent Samples (Equal Variances)
Problem: A researcher wants to compare the average heights of plants grown in two different fertilizers. A random sample of 15 plants from Fertilizer A has a mean height of 20 cm with a standard deviation of 3 cm. A random sample of 18 plants from Fertilizer B has a mean height of 18 cm with a standard deviation of 4 cm. Assume equal variances. Perform a two-tailed test at \(\alpha = 0.05\).
Solution:
State the hypotheses:
- \(H_0: \mu_A = \mu_B\)
- \(H_a: \mu_A \neq \mu_B\)
Calculate the pooled variance: \[ S_p^2 = \frac{(n_A - 1)S_A^2 + (n_B - 1)S_B^2}{n_A + n_B - 2} \] Substituting values: \[ S_p^2 = \frac{(15 - 1)(3^2) + (18 - 1)(4^2)}{15 + 18 - 2} = \frac{14(9) + 17(16)}{31} = \frac{126 + 272}{31} = \frac{398}{31} \approx 12.84 \]
Calculate the test statistic: \[ t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{S_p^2 \left( \frac{1}{n_A} + \frac{1}{n_B} \right)}} \] Substituting values: \[ t = \frac{20 - 18}{\sqrt{12.84 \left( \frac{1}{15} + \frac{1}{18} \right)}} = \frac{2}{\sqrt{12.84 \left( 0.0667 + 0.0556 \right)}} = \frac{2}{\sqrt{12.84 \cdot 0.1223}} = \frac{2}{\sqrt{1.57}} \approx \frac{2}{1.25} \approx 1.6 \]
Degrees of freedom: \[ df = n_A + n_B - 2 = 15 + 18 - 2 = 31 \]
Critical value: From the \(t\)-table, \(t_{\text{critical}} = 2.042\) (for \(\alpha = 0.05\) and \(df = 31\)).
Decision: Since \(|t| = 1.6 < 2.042\), we fail to reject \(H_0\). There is no significant difference in the mean heights of plants grown with the two fertilizers.
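Verification in R from the summary statistics (a sketch; variable names are ours):
n1 <- 15; x1 <- 20; s1 <- 3
n2 <- 18; x2 <- 18; s2 <- 4
sp2 <- ((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / (n1 + n2 - 2)   # pooled variance
t_stat <- (x1 - x2) / sqrt(sp2 * (1/n1 + 1/n2))
df <- n1 + n2 - 2
p_val <- 2 * pt(-abs(t_stat), df)                        # two-tailed p-value
c(t = t_stat, df = df, p = p_val)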
5.2.10 Solved Example 4: Paired Samples
Problem: A study measures the reaction times of 12 drivers before and after consuming alcohol. The differences in reaction times (in seconds) are: \([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]\). Test if alcohol significantly increases reaction time at \(\alpha = 0.01\).
Solution:
State the hypotheses:
- \(H_0: \mu_d = 0\)
- \(H_a: \mu_d \neq 0\)
Calculate the mean and standard deviation of differences:
Mean: \(\bar{d} = \frac{0.2 + 0.3 + 0.4 + ... + 1.3}{12} = \frac{9.0}{12} = 0.75\)
Standard deviation: \(s_d = \sqrt{\frac{\sum(d_i - \bar{d})^2}{n-1}}\) \[ s_d = \sqrt{\frac{(0.2 - 0.75)^2 + (0.3 - 0.75)^2 + ... + (1.3 - 0.75)^2}{11}} = \sqrt{\frac{1.43}{11}} \approx 0.361 \]
Calculate the test statistic: \[ t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{0.75}{0.361 / \sqrt{12}} = \frac{0.75}{0.361 / 3.464} = \frac{0.75}{0.104} \approx 7.20 \]
Degrees of freedom: \[ df = n - 1 = 12 - 1 = 11 \]
Critical value: From the \(t\)-table, \(t_{\text{critical}} = 3.106\) (for \(\alpha = 0.01\) and \(df = 11\)).
Decision: Since \(|t| = 7.20 > 3.106\), we reject \(H_0\). Alcohol significantly increases reaction time.
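In R (a sketch; the differences are reconstructed as the evenly spaced sequence given in the problem):
d <- seq(0.2, 1.3, by = 0.1)   # the 12 differences
mean(d); sd(d)
t.test(d, mu = 0)                            # two-tailed, as in the worked solution
# t.test(d, mu = 0, alternative = "greater") # one-sided form of the question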
5.2.11 Solved Example 5: Independent Samples (Unequal Variances)
Problem: A company compares the productivity of two teams. Team X produces 30 items with a mean output of 50 units and a standard deviation of 8 units. Team Y produces 25 items with a mean output of 45 units and a standard deviation of 10 units. Assume unequal variances. Perform a two-tailed test at \(\alpha = 0.05\).
Solution:
State the hypotheses:
- \(H_0: \mu_X = \mu_Y\)
- \(H_a: \mu_X \neq \mu_Y\)
Calculate the test statistic: Using Welch’s \(t\)-test: \[ t = \frac{\bar{x}_X - \bar{x}_Y}{\sqrt{\frac{S_X^2}{n_X} + \frac{S_Y^2}{n_Y}}} \] Substituting values: \[ t = \frac{50 - 45}{\sqrt{\frac{8^2}{30} + \frac{10^2}{25}}} \] \[ = \frac{5}{\sqrt{\frac{64}{30} + \frac{100}{25}}} = \frac{5}{\sqrt{2.13 + 4}} \]
\[ = \frac{5}{\sqrt{6.13}} \approx \frac{5}{2.47} \approx 2.02 \]
Degrees of freedom: Using the Welch-Satterthwaite formula: \[ df \approx \frac{\left(\frac{S_X^2}{n_X} + \frac{S_Y^2}{n_Y}\right)^2}{\frac{\left(\frac{S_X^2}{n_X}\right)^2}{n_X - 1} + \frac{\left(\frac{S_Y^2}{n_Y}\right)^2}{n_Y - 1}} \] Substituting values: \[ df \approx \frac{(2.13 + 4)^2}{\frac{2.13^2}{29} + \frac{4^2}{24}} \]
\[ = \frac{6.13^2}{\frac{4.54}{29} + \frac{16}{24}}\] \[ = \frac{37.57}{0.157 + 0.667} \approx \frac{37.57}{0.824} \approx 45.6 \]
Critical value: From the \(t\)-table, \(t_{\text{critical}} = 2.014\) (for \(\alpha = 0.05\) and \(df = 45\)).
Decision: Since \(|t| = 2.02 > 2.014\), we reject \(H_0\). There is a significant difference in productivity between the two teams.
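Verification in R from the summary statistics (a sketch of Welch’s test; variable names are ours):
x1 <- 50; s1 <- 8;  n1 <- 30
x2 <- 45; s2 <- 10; n2 <- 25
v1 <- s1^2/n1; v2 <- s2^2/n2
t_stat <- (x1 - x2) / sqrt(v1 + v2)
df <- (v1 + v2)^2 / (v1^2/(n1 - 1) + v2^2/(n2 - 1))   # Welch-Satterthwaite
p_val <- 2 * pt(-abs(t_stat), df)                      # two-tailed p-value
c(t = t_stat, df = df, p = p_val)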
5.2.12 Solved Example 6: Paired Samples
Problem: A study measures the cholesterol levels of 8 patients before and after a new diet. The differences in cholesterol levels are: \([-10, -15, -20, -12, -18, -14, -16, -13]\). Test if the diet significantly reduces cholesterol at \(\alpha = 0.05\).
Solution:
State the hypotheses:
- \(H_0: \mu_d = 0\)
- \(H_a: \mu_d \neq 0\)
Calculate the mean and standard deviation of differences:
- Mean: \(\bar{d} = \frac{-10 - 15 - 20 - 12 - 18 - 14 - 16 - 13}{8} = \frac{-118}{8} = -14.75\)
- Standard deviation: \(s_d = \sqrt{\frac{\sum(d_i - \bar{d})^2}{n-1}}\) \[ s_d = \sqrt{\frac{(-10 + 14.75)^2 + (-15 + 14.75)^2 + ... + (-13 + 14.75)^2}{7}} = \sqrt{\frac{22.56 + 0.06 + ... + 3.06}{7}} = \sqrt{\frac{73.5}{7}} = \sqrt{10.5} \approx 3.24 \]
Calculate the test statistic: \[ t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{-14.75}{3.24 / \sqrt{8}} = \frac{-14.75}{3.24 / 2.83} = \frac{-14.75}{1.145} \approx -12.88 \]
Degrees of freedom: \[ df = n - 1 = 8 - 1 = 7 \]
Critical value: From the \(t\)-table, \(t_{\text{critical}} = 2.365\) (for \(\alpha = 0.05\) and \(df = 7\)).
Decision: Since \(|t| = 12.88 > 2.365\), we reject \(H_0\). The diet significantly reduces cholesterol levels.
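In R, from the differences (a sketch):
d <- c(-10, -15, -20, -12, -18, -14, -16, -13)
mean(d); sd(d)
t.test(d, mu = 0)   # two-tailed; use alternative = "less" for the directional question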
5.2.13 Solved Example 7: Two-Tailed Test (Equal Variances Assumed)
A researcher wants to compare the effectiveness of two teaching methods. Method A is used on 25 students, Method B on 30 students. Test scores are recorded:
Method A: 78, 82, 85, 79, 83, 88, 76, 81, 84, 80, 82, 85, 79, 83, 87, 77, 82, 86, 80, 84, 78, 81, 85, 79, 83
Method B: 75, 79, 82, 76, 80, 84, 74, 78, 81, 77, 79, 83, 75, 80, 82, 76, 79, 81, 77, 80, 75, 78, 82, 76, 79, 81, 75, 78, 82, 76
Test at α = 0.05 if there’s a significant difference between methods.
Solution
Step 1: State Hypotheses
H₀: μ₁ = μ₂ (No difference in mean scores)
H₁: μ₁ ≠ μ₂ (Means differ significantly)
Step 2: Sample Statistics
- Method A: n₁ = 25, x̄₁ = 81.88, s₁ = 3.19
- Method B: n₂ = 30, x̄₂ = 78.67, s₂ = 2.78
Step 3: Pooled Variance
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(24)(10.19) + (29)(7.75)}{53} = \frac{244.6 + 224.7}{53} = 8.86 \]
\[ s_p = \sqrt{8.86} = 2.98 \]
Step 4: t-Statistic
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{81.88 - 78.67}{2.98 \times \sqrt{0.04 + 0.0333}} = \frac{3.21}{0.806} \approx 3.99 \]
Step 5: Critical Value
Degrees of freedom: df = 25 + 30 - 2 = 53
α = 0.05 (two-tailed)
Critical t-value: ±2.006
Step 6: Decision
- Since 3.99 > 2.006, reject H₀
Step 7: Conclusion
There is a significant difference in effectiveness between the two teaching methods.
t(53) = 3.99, p < 0.05
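Verification in R with the raw scores (a sketch using a pooled two-sample t-test):
method_a <- c(78, 82, 85, 79, 83, 88, 76, 81, 84, 80, 82, 85, 79, 83, 87,
              77, 82, 86, 80, 84, 78, 81, 85, 79, 83)
method_b <- c(75, 79, 82, 76, 80, 84, 74, 78, 81, 77, 79, 83, 75, 80, 82,
              76, 79, 81, 77, 80, 75, 78, 82, 76, 79, 81, 75, 78, 82, 76)
mean(method_a); sd(method_a); mean(method_b); sd(method_b)
t.test(method_a, method_b, var.equal = TRUE)   # pooled t-test, df = 53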
5.2.14 Solved Example 8: One-Tailed Test (Welch’s t-test, Unequal Variances)
A company tests two battery types. Type X (n=15) has mean life=120 hours, s=12. Type Y (n=20) has mean life=115 hours, s=8. Test at α=0.05 if Type X lasts longer.
Solution
Step 1: State Hypotheses
- H₀: \(\mu_X \le \mu_Y\)
- H₁: \(\mu_X > \mu_Y\)
Step 2: Given Values
- Type X: \(n_1 = 15, \bar{x}_1 = 120, s_1 = 12\)
- Type Y: \(n_2 = 20, \bar{x}_2 = 115, s_2 = 8\)
Step 3: Welch’s t-Statistic
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{5}{\sqrt{9.6 + 3.2}} = \frac{5}{\sqrt{12.8}} = \frac{5}{3.58} \approx 1.40 \]
Step 4: Degrees of Freedom
\[ df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{(12.8)^2}{\frac{(9.6)^2}{14} + \frac{(3.2)^2}{19}} = \frac{163.84}{6.58 + 0.54} \approx 23.01 \]
Step 5: Critical Value
- df ≈ 23, α = 0.05 (one-tailed)
- Critical t-value: 1.714
Step 6: Decision - Since 1.40 < 1.714, fail to reject H₀
Step 7: Conclusion
No significant evidence that Type X batteries last longer.
t(23) = 1.40, p > 0.05
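Verification in R from the summary statistics (a sketch; variable names are ours):
x1 <- 120; s1 <- 12; n1 <- 15
x2 <- 115; s2 <- 8;  n2 <- 20
v1 <- s1^2/n1; v2 <- s2^2/n2
t_stat <- (x1 - x2) / sqrt(v1 + v2)
df <- (v1 + v2)^2 / (v1^2/(n1 - 1) + v2^2/(n2 - 1))   # Welch-Satterthwaite
p_val <- 1 - pt(t_stat, df)                            # right-tailed p-value
c(t = t_stat, df = df, p = p_val)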
5.2.15 Solved Example 9: Practical Application with Raw Data
A fitness trainer compares weight loss between two diets. Participants are randomly assigned:
Diet Plan A (n=12): 5.2, 4.8, 6.1, 5.5, 4.9, 5.8, 6.2, 5.1, 4.7, 5.9, 5.3, 5.6 kg
Diet Plan B (n=10): 4.1, 4.5, 3.9, 4.8, 4.0, 4.3, 3.7, 4.6, 4.2, 4.4 kg
Test at α=0.05 if Diet A leads to greater weight loss.
Solution
Step 1: State Hypotheses
- H₀: \(\mu_A \le \mu_B\)
- H₁: \(\mu_A > \mu_B\)
Step 2: Sample Statistics
Diet A: n₁ = 12, ∑x = 65.1, x̄₁ = 5.425, ∑x² = 355.99
Diet B: n₂ = 10, ∑x = 42.5, x̄₂ = 4.25, ∑x² = 181.65
Variance Calculations
\[ s_1^2 = \frac{\sum x_1^2 - (\sum x_1)^2 / n_1}{n_1 - 1} = \frac{355.99 - (65.1)^2 / 12}{11} = \frac{355.99 - 353.17}{11} = 0.2566 \]
\[ s_2^2 = \frac{\sum x_2^2 - (\sum x_2)^2 / n_2}{n_2 - 1} = \frac{181.65 - (42.5)^2 / 10}{9} = \frac{181.65 - 180.63}{9} = 0.1139 \]
Step 3: Pooled Variance
\[ s_p^2 = \frac{(11)(0.2566) + (9)(0.1139)}{20} = \frac{2.823 + 1.025}{20} = 0.1924 \quad s_p = \sqrt{0.1924} = 0.4386 \]
Step 4: t-Statistic
\[ t = \frac{5.425 - 4.25}{0.4386 \sqrt{\frac{1}{12} + \frac{1}{10}}} = \frac{1.175}{0.4386 \times \sqrt{0.1833}} = \frac{1.175}{0.1878} \approx 6.26 \]
Step 5: Critical Value
df = 12 + 10 - 2 = 20
α = 0.05 (one-tailed)
Critical t-value: 1.725
Step 6: Decision
- Since 6.26 > 1.725, reject H₀
Step 7: Conclusion
Diet A leads to significantly greater weight loss than Diet B.
t(20) = 6.26, p < 0.001
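Verification in R with the raw data (a sketch):
diet_a <- c(5.2, 4.8, 6.1, 5.5, 4.9, 5.8, 6.2, 5.1, 4.7, 5.9, 5.3, 5.6)
diet_b <- c(4.1, 4.5, 3.9, 4.8, 4.0, 4.3, 3.7, 4.6, 4.2, 4.4)
t.test(diet_a, diet_b, var.equal = TRUE, alternative = "greater")   # pooled, one-tailed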
6 Exercise & Assignment
6.1 Part A: Two-Sample Z-Test Questions
6.1.1 Question 1: Product Quality Comparison
A consumer protection agency wants to compare the average weight of cereal boxes from two different brands. The population standard deviations are known from historical data.
Known Information:
Brand A: Population σ = 15 grams
Brand B: Population σ = 12 grams
Sample sizes: n₁ = 50 boxes of Brand A, n₂ = 45 boxes of Brand B
Sample means: x̄₁ = 498 grams, x̄₂ = 505 grams
Required
Test at α = 0.05 if there is a significant difference in average weights between the two brands.
Calculate the 95% confidence interval for the difference in means.
What sample size would be needed to detect a difference of 5 grams with 90% power?
6.1.2 Question 2: Manufacturing Process Evaluation
A factory has two production lines for making electrical components. The specification requires components to have a resistance of 100 ohms. Historical data shows known population standard deviations.
Data:
Line 1: n = 60, x̄ = 101.5 ohms, σ = 2.5 ohms
Line 2: n = 55, x̄ = 99.8 ohms, σ = 3.0 ohms
Required:
Test at α = 0.01 if Line 1 produces components with significantly higher resistance than Line 2.
Calculate the p-value for the test.
What is the probability of Type II error if the true difference in means is 1 ohm?
6.2 Part B: Two-Sample t-Test Questions
6.2.1 Question 3: Teaching Method Effectiveness
A school district wants to compare the effectiveness of traditional teaching methods versus technology-enhanced methods. Students were randomly assigned to two groups.
Test Scores Data:
Traditional Group (n = 25): 78, 82, 75, 85, 80, 79, 83, 76, 81, 84, 77, 82, 79, 83, 78, 81, 80, 84, 76, 82, 79, 83, 77, 81, 80
Technology Group (n = 28): 85, 88, 82, 90, 86, 84, 87, 83, 89, 85, 81, 88, 84, 86, 83, 87, 85, 89, 84, 87, 82, 88, 86, 85, 87, 84, 88, 86
Required:
Perform a two-sample t-test at α = 0.05 to determine if technology-enhanced methods lead to higher scores.
Check the assumption of equal variances using an F-test.
Calculate Cohen’s d to measure effect size.
Interpret the results in educational context.
6.2.2 Question 4: Drug Efficacy Study
A pharmaceutical company is testing a new drug for cholesterol reduction. Patients are randomly assigned to treatment and control groups.
Cholesterol Reduction (mg/dL):
Treatment Group (n = 20): 25, 28, 32, 35, 29, 31, 27, 34, 30, 33, 26, 29, 32, 36, 28, 31, 34, 27, 30, 33
Control Group (n = 18): 18, 20, 22, 19, 21, 23, 17, 20, 24, 19, 22, 18, 21, 23, 20, 22, 19, 21
Required:
Test at α = 0.01 if the treatment group shows significantly greater cholesterol reduction.
Should you use pooled or Welch’s t-test? Justify your choice.
Calculate the 99% confidence interval for the difference in means.
What are the practical implications of your findings?
6.2.3 Question 5: Independent Samples
A company tests two production methods. Method X produces 100 items with a mean weight of 15 kg and a standard deviation of 2 kg. Method Y produces 120 items with a mean weight of 14.5 kg and a standard deviation of 2.5 kg. Test if the mean weights differ at \(\alpha = 0.01\).
6.2.4 Question 6: Paired Samples
A fitness program measures the weights of 15 participants before and after a 6-month training period. The differences in weights are: \([-3, -2, -4, -1, -5, -3, -2, -4, -3, -2, -1, -3, -4, -2, -3]\). Test if the program significantly reduces weight at \(\alpha = 0.05\).
6.2.5 Question 7: Paired Samples
A psychologist studies the effect of a new therapy on stress levels. The stress scores (on a scale of 0 to 100) of 12 patients before and after the therapy are as follows:
| Patient | Before Therapy | After Therapy |
|---|---|---|
| 1 | 75 | 60 |
| 2 | 80 | 65 |
| 3 | 70 | 55 |
| 4 | 85 | 70 |
| 5 | 90 | 75 |
| 6 | 72 | 60 |
| 7 | 88 | 73 |
| 8 | 78 | 68 |
| 9 | 82 | 67 |
| 10 | 76 | 62 |
| 11 | 84 | 70 |
| 12 | 79 | 65 |
Perform a two-tailed test at \(\alpha = 0.01\) to determine if the therapy significantly reduces stress levels.
6.2.6 Question 8: Paired Samples
A nutritionist evaluates the effectiveness of a new diet plan on blood sugar levels. The blood sugar levels (in mg/dL) of 15 patients before and after the diet are recorded as follows:
| Patient | Before Diet | After Diet |
|---|---|---|
| 1 | 120 | 110 |
| 2 | 130 | 120 |
| 3 | 140 | 130 |
| 4 | 125 | 115 |
| 5 | 135 | 125 |
| 6 | 145 | 135 |
| 7 | 150 | 140 |
| 8 | 160 | 150 |
| 9 | 140 | 130 |
| 10 | 155 | 145 |
| 11 | 165 | 155 |
| 12 | 170 | 160 |
| 13 | 180 | 170 |
| 14 | 165 | 155 |
| 15 | 175 | 165 |
Perform a two-tailed test at \(\alpha = 0.05\) to determine if the diet significantly lowers blood sugar levels.
6.2.7 Question 9: Challenge Problem
Compare the performance of two algorithms on 50 datasets. Algorithm A has a mean accuracy of 85% with a standard deviation of 5%, while Algorithm B has a mean accuracy of 87% with a standard deviation of 6%. Assume unequal variances. Perform a two-tailed test at \(\alpha = 0.05\).
7 F-Tests (Variance Ratio test)
1. Introduction
The F-test is a statistical procedure used to compare the variances of two populations. It is based on the F-distribution and helps test hypotheses about whether two population variances are equal.
2. Types of F-Tests
a. Two-Sample F-Test for Variances
Used to determine if two populations have the same variance.
- H₀: \(\sigma_1^2 = \sigma_2^2\)
- H₁: \(\sigma_1^2 \ne \sigma_2^2\) (two-tailed), or
\(\sigma_1^2 > \sigma_2^2\), \(\sigma_1^2 < \sigma_2^2\) (one-tailed)
b. ANOVA F-Test
Used in Analysis of Variance to compare the means of three or more groups.
- Tests whether at least one group mean differs significantly
- Compares between-group variance to within-group variance
3. Assumptions of the F-Test
- Populations are normally distributed
- Samples are independent
- Data is continuous (for variance comparison)
4. Steps for Conducting a Two-Sample F-Test for Variances
Step 1: State the Hypotheses
- Null Hypothesis (H₀): \(\sigma_1^2 = \sigma_2^2\)
- Alternative Hypothesis (H₁):
- Two-tailed: \(\sigma_1^2 \ne \sigma_2^2\)
- One-tailed: \(\sigma_1^2 > \sigma_2^2\) or \(\sigma_1^2 < \sigma_2^2\)
Step 2: Calculate the Test Statistic
\[F = \frac{s_1^2}{s_2^2} \]
Where:
- \(s_1^2\) and \(s_2^2\) are the sample variances
- By convention, place the larger variance in the numerator so that \(F \ge 1\)
Step 3: Determine the Degrees of Freedom
- Numerator degrees of freedom: \(df_1 = n_1 - 1\)
- Denominator degrees of freedom: \(df_2 = n_2 - 1\)
Step 4: Find the Critical Value
- Use the F-distribution table
- Input: \(df_1\), \(df_2\), and significance level \(\alpha\)
- For two-tailed tests, use \(\alpha/2\) in each tail
Step 5: Make a Decision
Reject H₀ if \(F\) is greater than the critical value
Alternatively, use the p-value approach:
- If \(p < \alpha\), reject H₀
- If \(p \ge \alpha\), fail to reject H₀
7.0.1 Example 1: Two-Tailed F-Test for Equal Variances
A quality control manager wants to compare the consistency of two machines. Samples are taken from each machine:
Machine A (n=16): 102, 105, 98, 100, 103, 99, 101, 104, 97, 102, 100, 103, 99, 101, 105, 98
Machine B (n=13): 100, 98, 102, 97, 99, 101, 96, 100, 98, 103, 97, 99, 101
Test at α=0.05 if the variances differ significantly.
Solution
Step 1: State Hypotheses
- H₀: \(\sigma_1^2 = \sigma_2^2\)
- H₁: \(\sigma_1^2 \ne \sigma_2^2\)
Step 2: Sample Data
Machine A: n₁ = 16, ∑x = 1617, \(\bar{x}_1 = 101.06\), \(s_1^2 = 6.329\)
Machine B: n₂ = 13, ∑x = 1291, \(\bar{x}_2 = 99.31\), \(s_2^2 = 4.397\)
Step 3: Calculate F-Statistic
\[ F = \frac{6.329}{4.397} = 1.44 \]
Step 4: Critical Values
- α = 0.05 (two-tailed)
- df₁ = 15, df₂ = 12
- Upper critical value: \(F_{0.025}(15,12) = 3.18\)
- Lower critical value: \(F_{0.975}(15,12) = \frac{1}{F_{0.025}(12,15)} = \frac{1}{2.96} \approx 0.34\)
Step 5: Decision
- Rejection region: F < 0.34 or F > 3.18
- Since \(0.34 < 1.44 < 3.18\), fail to reject H₀
Step 6: Conclusion
No significant difference in variances between the two machines.
F(15,12) = 1.44, p > 0.05
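Verification in R with var.test() (a sketch using the raw data):
machine_a <- c(102, 105, 98, 100, 103, 99, 101, 104, 97, 102, 100, 103, 99, 101, 105, 98)
machine_b <- c(100, 98, 102, 97, 99, 101, 96, 100, 98, 103, 97, 99, 101)
var(machine_a); var(machine_b)
var.test(machine_a, machine_b)   # two-sided F test for equality of variances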
7.0.2 Example 2: One-Tailed F-Test (Testing if One Variance is Greater)
A pharmaceutical company tests two production methods. They want to know if Method B has less variability than Method A. Samples:
Method A (n=11): 24.8, 25.2, 24.9, 25.5, 24.7, 25.1, 25.3, 24.6, 25.4, 24.8, 25.0 mg
Method B (n=9): 25.1, 25.0, 25.2, 25.1, 25.0, 25.2, 25.1, 25.0, 25.1 mg
Test at α=0.05 if Method B has significantly lower variance.
Solution
Step 1: State Hypotheses
- H₀: \(\sigma_A^2 \le \sigma_B^2\)
- H₁: \(\sigma_A^2 > \sigma_B^2\)
Step 2: Sample Data
- Method A: n₁ = 11, ∑x = 275.3, \(\bar{x}_1 = 25.03\), \(s_A^2 = 0.0882\)
- Method B: n₂ = 9, ∑x = 225.8, \(\bar{x}_2 = 25.09\), \(s_B^2 = 0.00611\)
Step 3: Calculate F-Statistic
\[ F = \frac{0.0882}{0.00611} = 14.43 \]
Step 4: Critical Value
α = 0.05 (one-tailed)
df₁ = 10, df₂ = 8
Critical value: \(F_{0.05}(10,8) = 3.35\)
Step 5: Decision
Rejection region: F > 3.35
Since \(14.43 > 3.35\), reject H₀
Step 6: Conclusion
Method B has significantly lower variance than Method A. F(10,8) = 14.43, p < 0.05
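Verification in R (a sketch; alternative = "greater" corresponds to H₁: σ_A² > σ_B²):
method_a <- c(24.8, 25.2, 24.9, 25.5, 24.7, 25.1, 25.3, 24.6, 25.4, 24.8, 25.0)
method_b <- c(25.1, 25.0, 25.2, 25.1, 25.0, 25.2, 25.1, 25.0, 25.1)
var(method_a); var(method_b)
var.test(method_a, method_b, alternative = "greater")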
7.0.3 Example 3: F-Test as Preliminary Test for t-Test
A researcher wants to compare two teaching methods but first needs to check if equal variances can be assumed.
Group 1 (Traditional, n=21): Variance = 45.2
Group 2 (Experimental, n=18): Variance = 28.7
Test at α=0.10 if equal variances can be assumed for the subsequent t-test.
Solution
Step 1: State Hypotheses
- H₀: \(\sigma_1^2 = \sigma_2^2\)
- H₁: \(\sigma_1^2 \ne \sigma_2^2\)
Step 2: Sample Data
- Group 1 (Traditional): n₁ = 21, variance = 45.2
- Group 2 (Experimental): n₂ = 18, variance = 28.7
Step 3: Calculate F-Statistic
\[F = \frac{45.2}{28.7} = 1.575\]
Step 4: Critical Values
- α = 0.10 (two-tailed)
- df₁ = 20, df₂ = 17
- Upper critical value: \(F_{0.05}(20,17) = 2.23\)
- Lower critical value: \(F_{0.95}(20,17) = \frac{1}{F_{0.05}(17,20)} = \frac{1}{2.16} = 0.463\)
Step 5: Decision
Rejection region: F < 0.463 or F > 2.23
Since \(0.463 < 1.575 < 2.23\), fail to reject H₀
Step 6: Conclusion
Equal variances can be assumed. The researcher can proceed with a pooled t-test.
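With only the summary variances available, the F statistic and a two-sided p-value can be sketched in R using pf() (variable names are ours):
F_stat <- 45.2 / 28.7
df1 <- 20; df2 <- 17
p_val <- 2 * min(pf(F_stat, df1, df2), 1 - pf(F_stat, df1, df2))   # two-sided p-value
c(F = F_stat, p = p_val)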
Key Formulas and Rules
F-Statistic
\[F = \frac{s_1^2}{s_2^2}\]
Always place the larger variance in the numerator
F ≥ 1 by construction
Degrees of Freedom
df₁ = n₁ - 1 (numerator)
df₂ = n₂ - 1 (denominator)
Critical Value Rules
- Two-tailed test: Compare F to \(F_{\alpha/2}(df_1, df_2)\) and \(1/F_{\alpha/2}(df_2, df_1)\)
- One-tailed test:
- If testing \(\sigma_1^2 > \sigma_2^2\): compare F to \(F_\alpha(df_1, df_2)\)
- If testing \(\sigma_1^2 < \sigma_2^2\): use \(F = s_2^2 / s_1^2\) and compare to \(F_\alpha(df_2, df_1)\)
F-Distribution Properties
Right-skewed distribution
\(F(df_1, df_2) = \frac{1}{F(df_2, df_1)}\)
Requires normal population distributions
When to Use the F-Test
Preliminary to t-test: check equal variance assumption
Quality control: compare process variability
Method validation: test precision of different methods
Research studies: compare variability between groups
Assumptions
Independent samples
Normal distribution in both populations
Random sampling
Important Note: The F-test is sensitive to non-normality. If data are not normal, consider using Levene’s test or the Brown-Forsythe test instead.
8 Exercise & Assignment
8.0.1 Question 1: Quality Control Analysis
A manufacturing company produces electrical components using two different machines (Machine X and Machine Y). The quality control department wants to determine if there is a significant difference in the consistency (variance) of component weights between the two machines.
Data Collected
Random samples of components from each machine were weighed (in grams):
Machine X (n = 15): 45.2, 44.8, 45.5, 45.1, 44.9, 45.3, 45.0, 44.7, 45.4, 45.1, 44.8, 45.2, 45.0, 44.9, 45.3
Machine Y (n = 12): 45.1, 45.3, 44.9, 45.4, 45.0, 45.2, 44.8, 45.5, 45.1, 44.7, 45.0, 45.2
Required
Formulate the appropriate null and alternative hypotheses for testing whether the variances of component weights differ significantly between the two machines.
Calculate the sample variances for both machines.
Compute the F-test statistic for comparing the variances.
Determine the critical F-value at α = 0.05 significance level.
Make a statistical decision and state your conclusion in the context of the problem.
What practical implications would your conclusion have for the manufacturing process?
8.0.2 Question 2: Teaching Method Comparison
An educational researcher is investigating the effectiveness of two different teaching methods (Traditional vs. Interactive) on student performance. Before comparing the mean scores, the researcher needs to check if the assumption of equal variances is satisfied for conducting a two-sample t-test.
Data Collected Final exam scores (out of 100) from two randomly assigned student groups:
Traditional Method (n = 20): 78, 82, 75, 85, 80, 79, 83, 76, 81, 84, 77, 82, 79, 83, 78, 81, 80, 84, 76, 82
Interactive Method (n = 18): 85, 88, 82, 90, 86, 84, 87, 83, 89, 85, 81, 88, 84, 86, 83, 87, 85, 89
Required
State the hypotheses for testing the equality of variances between the two teaching methods.
Calculate descriptive statistics (mean, variance, standard deviation) for both groups.
Perform the F-test at α = 0.10 significance level.
Interpret the results in terms of the assumption for the subsequent t-test.
Based on your conclusion, which type of two-sample t-test (pooled or Welch’s) would be appropriate for comparing the mean scores? Justify your answer.
Discuss the limitations of using the F-test for checking equal variances assumption.
Solutions (Step-by-Step)
Assignment Question 1: Quality Control Analysis
Step 1: Hypotheses
- H₀: \(\sigma_X^2 = \sigma_Y^2\) (Variances are equal)
- H₁: \(\sigma_X^2 \ne \sigma_Y^2\) (Variances differ significantly)
Step 2: Sample Data
- Machine X (n₁ = 15):
45.2, 44.8, 45.5, 45.1, 44.9, 45.3, 45.0, 44.7, 45.4, 45.1, 44.8, 45.2, 45.0, 44.9, 45.3
- Mean: \(\bar{x}_1 = 45.08\)
- Variance: \(s_1^2 = 0.0560\)
- Machine Y (n₂ = 12):
45.1, 45.3, 44.9, 45.4, 45.0, 45.2, 44.8, 45.5, 45.1, 44.7, 45.0, 45.2
- Mean: \(\bar{x}_2 = 45.10\)
- Variance: \(s_2^2 = 0.0564\)
Step 3: F-Statistic
\[ F = \frac{s_1^2}{s_2^2} = \frac{0.0560}{0.0564} = 0.99 \]
Step 4: Critical Values
- α = 0.05 (two-tailed)
- df₁ = 14, df₂ = 11
- Upper critical value: \(F_{0.025}(14,11) ≈ 3.29\)
- Lower critical value: \(F_{0.975}(14,11) = \frac{1}{F_{0.025}(11,14)} ≈ \frac{1}{3.42} = 0.292\)
Step 5: Decision
- Since \(0.292 < 0.99 < 3.29\), fail to reject H₀
Step 6: Conclusion
There is no significant difference in the variances of component weights between Machine X and Machine Y.
F(14,11) = 0.99, p > 0.05
Step 7: Practical Implications
Both machines show similar consistency in production. No adjustment is needed based on variance; focus can shift to mean output or other quality metrics.
Assignment Question 2: Teaching Method Comparison
Scenario: An educational researcher compares exam score variances between Traditional and Interactive teaching methods.
Step 1: Hypotheses
- H₀: \(\sigma_T^2 = \sigma_I^2\)
- H₁: \(\sigma_T^2 \ne \sigma_I^2\)
Step 2: Sample Data
Traditional (n₁ = 20):
Mean = 80.25, Variance = 8.62, SD = 2.94
Interactive (n₂ = 18):
Mean = 85.67, Variance = 6.59, SD = 2.57
Step 3: F-Statistic
\[ F = \frac{8.62}{6.59} = 1.31 \]
Step 4: Critical Values
- α = 0.10 (two-tailed)
- df₁ = 19, df₂ = 17
- Upper critical value: \(F_{0.05}(19,17) ≈ 2.12\)
- Lower critical value: \(F_{0.95}(19,17) = \frac{1}{F_{0.05}(17,19)} ≈ \frac{1}{2.17} = 0.461\)
Step 5: Decision
- Since \(0.461 < 1.31 < 2.12\), fail to reject H₀
Step 6: Conclusion
Equal variances can be assumed: F(19,17) = 1.31, p > 0.10
Step 7: Appropriate t-Test
Use pooled two-sample t-test since equal variances assumption holds.
Step 8: Limitations of F-Test
- Sensitive to non-normality
- May mislead if data are skewed or contain outliers
- Alternatives: Levene’s test, Brown-Forsythe test
9 Chi-Square Test
Chi-square tests are powerful tools for analyzing categorical data. Always check assumptions and choose the appropriate test based on study design.
Chi-Square Statistic
The general formula for the chi-square test statistic is:
\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]
Where:
- \(O_i\) = observed frequency in category \(i\)
- \(E_i\) = expected frequency in category \(i\)
Degrees of Freedom
- Goodness of Fit:
\[df = k - 1 \]
Where \(k\) is the number of categories
- Test of Independence / Homogeneity:
\[df = (r - 1)(c - 1)\]
Where \(r\) = number of rows, \(c\) = number of columns
Assumptions
- Observations are independent
- Sample size is adequate (all expected frequencies ≥ 5)
- Data are categorical
- Sampling is random
Expected Frequency Calculation
- Goodness of Fit:
\[E = n \times p\]
Where:
\(n\) = total sample size
\(p\) = expected proportion for each category
- Independence / Homogeneity:
\[E = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} \]
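As a quick illustration of these formulas, here is a minimal NumPy sketch that builds the expected-frequency table from the row and column totals and then evaluates the chi-square statistic; it uses the gender-by-preference table from Example 2 below.

```python
# Minimal sketch (assumes NumPy): expected frequencies and the chi-square statistic
# for a contingency table, using E_ij = (row total x column total) / grand total.
import numpy as np

observed = np.array([[40, 30, 20],    # Male:   Like, Neutral, Dislike
                     [35, 45, 30]])   # Female: Like, Neutral, Dislike

row_totals = observed.sum(axis=1, keepdims=True)   # shape (rows, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, cols)
grand_total = observed.sum()

expected = row_totals @ col_totals / grand_total   # E_ij = (row_i * col_j) / n
chi_sq = ((observed - expected) ** 2 / expected).sum()
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(expected)                                    # expected frequencies
print(f"chi-square = {chi_sq:.3f}, df = {df}")     # about 3.367 with df = 2 for this table
```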
The chi-square test can be used for three purposes:
- Goodness-of-Fit test: to see if sample data fits a population with a specific distribution.
  - Compare an observed distribution to a theoretical distribution
  - Test whether a sample follows specified proportions
  - Example: Testing if candy colors match claimed proportions
- Test of Independence: to determine whether there is a significant association between two categorical variables.
  - Determine if two categorical variables are related
  - Use when both variables are measured on the same subjects
  - Example: Testing if gender is related to product preference
- Test of Homogeneity: to determine whether different populations have the same distribution of a single categorical variable.
  - Compare distributions across different populations
  - Use when samples are drawn from separate groups
  - Example: Comparing pass rates across different teaching methods
We will work through one example of each: a goodness-of-fit test, a test of independence, and a test of homogeneity.
9.0.1 Example 1: Chi-Square Goodness of Fit Test
A candy company claims that their mixed candy bags contain 30% red, 25% green, 20% yellow, 15% blue, and 10% orange candies. A sample of 200 candies is taken:
Observed counts: Red = 70 | Green = 45 | Yellow = 38 | Blue = 30 | Orange = 17
Expected proportions: 0.30 | 0.25 | 0.20 | 0.15 | 0.10
Test at α = 0.05 if the sample matches the claimed distribution.
Solution
Step 1: State Hypotheses
- Null Hypothesis (H₀): The candy distribution matches the claimed proportions
- Alternative Hypothesis (H₁): The candy distribution does not match the claimed proportions
Step 2: Calculate Expected Frequencies
Total sample size: \(n = 200\)
| Color | Claimed Proportion | Expected Frequency |
|---|---|---|
| Red | 0.30 | \(200 \times 0.30 = 60\) |
| Green | 0.25 | \(200 \times 0.25 = 50\) |
| Yellow | 0.20 | \(200 \times 0.20 = 40\) |
| Blue | 0.15 | \(200 \times 0.15 = 30\) |
| Orange | 0.10 | \(200 \times 0.10 = 20\) |
Step 3: Calculate Chi-Square Statistic
Use the formula:
\[ \chi^2 = \sum \frac{(O - E)^2}{E}\]
| Color | Observed (O) | Expected (E) | \(O - E\) | \((O - E)^2\) | \(\frac{(O - E)^2}{E}\) |
|---|---|---|---|---|---|
| Red | 70 | 60 | 10 | 100 | 1.667 |
| Green | 45 | 50 | -5 | 25 | 0.500 |
| Yellow | 38 | 40 | -2 | 4 | 0.100 |
| Blue | 30 | 30 | 0 | 0 | 0.000 |
| Orange | 17 | 20 | -3 | 9 | 0.450 |
\[ \chi^2 = 1.667 + 0.500 + 0.100 + 0.000 + 0.450 = 2.717 \]
Step 4: Determine Critical Value
- Degrees of freedom: \(df = k - 1 = 5 - 1 = 4\)
- Significance level: \(\alpha = 0.05\)
- Critical value from chi-square table: \(\chi^2_{0.05, 4} = 9.488\)
Step 5: Compare and Decide
- Since \(2.717 < 9.488\), the test statistic is not in the rejection region
- Fail to reject H₀
Step 6: Conclusion
There is no significant evidence that the candy distribution differs from the claimed proportions.
Chi-square(4) = 2.717, p > 0.05
Notes
- This is a goodness-of-fit test comparing observed frequencies to expected frequencies under a specified distribution.
- Assumes random sampling and that expected frequencies are all ≥ 5.
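The same goodness-of-fit calculation can be reproduced with SciPy; a minimal sketch using the candy counts above follows.

```python
# Minimal sketch (assumes SciPy): chi-square goodness-of-fit test for the candy data.
from scipy.stats import chisquare

observed = [70, 45, 38, 30, 17]                      # Red, Green, Yellow, Blue, Orange
claimed = [0.30, 0.25, 0.20, 0.15, 0.10]             # claimed proportions
expected = [200 * p for p in claimed]                # E = n * p

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square(4) = {stat:.3f}, p = {p_value:.3f}")   # statistic should be about 2.717
```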
9.0.2 Example 2: Chi-Square Test of Independence
A researcher wants to test if there’s a relationship between gender and preference for a new product. Survey results:
| Gender | Like | Neutral | Dislike | Total |
|---|---|---|---|---|
| Male | 40 | 30 | 20 | 90 |
| Female | 35 | 45 | 30 | 110 |
| Total | 75 | 75 | 50 | 200 |
Test at α = 0.05 if gender and product preference are independent.
Solution
Step 1: State Hypotheses
- Null Hypothesis (H₀): Gender and product preference are independent
- Alternative Hypothesis (H₁): Gender and product preference are not independent
Step 2: Observed Frequencies
| Gender | Like | Neutral | Dislike | Total |
|---|---|---|---|---|
| Male | 40 | 30 | 20 | 90 |
| Female | 35 | 45 | 30 | 110 |
| Total | 75 | 75 | 50 | 200 |
Step 3: Calculate Expected Frequencies
Use the formula:
\[ E_{ij} = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} \]
| Gender | Like | Neutral | Dislike |
|---|---|---|---|
| Male | \(\frac{90 \times 75}{200} = 33.75\) | \(\frac{90 \times 75}{200} = 33.75\) | \(\frac{90 \times 50}{200} = 22.5\) |
| Female | \(\frac{110 \times 75}{200} = 41.25\) | \(\frac{110 \times 75}{200} = 41.25\) | \(\frac{110 \times 50}{200} = 27.5\) |
Step 4: Calculate Chi-Square Statistic
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
| Group | Category | O | E | \(O - E\) | \((O - E)^2\) | \(\frac{(O - E)^2}{E}\) |
|---|---|---|---|---|---|---|
| Male | Like | 40 | 33.75 | 6.25 | 39.06 | 1.157 |
| Male | Neutral | 30 | 33.75 | -3.75 | 14.06 | 0.417 |
| Male | Dislike | 20 | 22.5 | -2.5 | 6.25 | 0.278 |
| Female | Like | 35 | 41.25 | -6.25 | 39.06 | 0.947 |
| Female | Neutral | 45 | 41.25 | 3.75 | 14.06 | 0.341 |
| Female | Dislike | 30 | 27.5 | 2.5 | 6.25 | 0.227 |
\[ \chi^2 = 1.157 + 0.417 + 0.278 + 0.947 + 0.341 + 0.227 = 3.367 \]
Step 5: Determine Critical Value
- Degrees of freedom: \(df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2\)
- Significance level: \(\alpha = 0.05\)
- Critical value from chi-square table: \(\chi^2_{0.05, 2} = 5.991\)
Step 6: Compare and Decide
- Since \(3.367 < 5.991\), the test statistic is not in the rejection region
- Fail to reject H₀
Step 7: Conclusion
There is no significant evidence of a relationship between gender and product preference.
Chi-square(2) = 3.367, p > 0.05
Notes
- This is a test of independence using a contingency table.
- Assumes random sampling and expected frequencies ≥ 5 in all cells.
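A minimal SciPy sketch of the same test of independence is shown below; `chi2_contingency` computes the expected frequencies and degrees of freedom automatically.

```python
# Minimal sketch (assumes SciPy): chi-square test of independence for the
# gender-by-preference table.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[40, 30, 20],    # Male:   Like, Neutral, Dislike
                  [35, 45, 30]])   # Female: Like, Neutral, Dislike

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square({dof}) = {chi2:.3f}, p = {p_value:.3f}")   # about 3.367 with dof = 2
print(expected)                    # should match the expected-frequency table above
```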
9.0.3 Example 3: Chi-Square Test of Homogeneity
Three different teaching methods are used in different classes. Test scores are categorized as Pass/Fail:
| Method | Pass | Fail | Total |
|---|---|---|---|
| Method A | 45 | 15 | 60 |
| Method B | 50 | 20 | 70 |
| Method C | 55 | 15 | 70 |
| Total | 150 | 50 | 200 |
Test at α = 0.05 if the pass rates are the same across all methods.
Solution
Step 1: State Hypotheses
- Null Hypothesis (H₀): The pass rates are the same across all teaching methods
- Alternative Hypothesis (H₁): The pass rates differ across teaching methods
Step 2: Observed Frequencies
| Method | Pass | Fail | Total |
|---|---|---|---|
| Method A | 45 | 15 | 60 |
| Method B | 50 | 20 | 70 |
| Method C | 55 | 15 | 70 |
| Total | 150 | 50 | 200 |
Step 3: Calculate Expected Frequencies
Use the formula:
\[ E_{ij} = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} \]
| Method | Expected Pass | Expected Fail |
|---|---|---|
| Method A | \(\frac{60 \times 150}{200} = 45\) | \(\frac{60 \times 50}{200} = 15\) |
| Method B | \(\frac{70 \times 150}{200} = 52.5\) | \(\frac{70 \times 50}{200} = 17.5\) |
| Method C | \(\frac{70 \times 150}{200} = 52.5\) | \(\frac{70 \times 50}{200} = 17.5\) |
Step 4: Calculate Chi-Square Statistic
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
| Method | Category | O | E | \(O - E\) | \((O - E)^2\) | \(\frac{(O - E)^2}{E}\) |
|---|---|---|---|---|---|---|
| Method A | Pass | 45 | 45 | 0 | 0 | 0.000 |
| Method A | Fail | 15 | 15 | 0 | 0 | 0.000 |
| Method B | Pass | 50 | 52.5 | -2.5 | 6.25 | 0.119 |
| Method B | Fail | 20 | 17.5 | 2.5 | 6.25 | 0.357 |
| Method C | Pass | 55 | 52.5 | 2.5 | 6.25 | 0.119 |
| Method C | Fail | 15 | 17.5 | -2.5 | 6.25 | 0.357 |
\[ \chi^2 = 0.000 + 0.000 + 0.119 + 0.357 + 0.119 + 0.357 = 0.952 \]
Step 5: Determine Critical Value
- Degrees of freedom: \(df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2\)
- Significance level: \(\alpha = 0.05\)
- Critical value from chi-square table: \(\chi^2_{0.05, 2} = 5.991\)
Step 6: Compare and Decide
- Since \(0.952 < 5.991\), the test statistic is not in the rejection region
- Fail to reject H₀
Step 7: Conclusion
There is no significant evidence that pass rates differ across teaching methods.
Chi-square(2) = 0.952, p > 0.05
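Computationally, the test of homogeneity is identical to the test of independence, so the same SciPy call applies; here is a minimal sketch with the pass/fail table above.

```python
# Minimal sketch (assumes SciPy): chi-square test of homogeneity for the
# teaching-method pass/fail table.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[45, 15],    # Method A: Pass, Fail
                  [50, 20],    # Method B: Pass, Fail
                  [55, 15]])   # Method C: Pass, Fail

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi-square({dof}) = {chi2:.3f}, p = {p_value:.3f}")   # about 0.952 with dof = 2
```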