MATH 343: APPLIED STATISTICS NOTES

1 Introduction to Hypothesis Testing

Imagine a doctor testing a new drug. The existing drug (Drug A) has a 60% success rate. The new drug (Drug B) is more expensive to produce.

The crucial question: Is Drug B significantly better than Drug A, or is its higher observed success rate in a small trial just due to random chance?

Hypothesis testing is the formal, statistical framework we use to answer these kinds of questions. It allows us to make data-driven inferences about a population based on sample data, while quantifying the uncertainty of those inferences.


2 Core Components

2.1 Null and Alternative Hypotheses

Every hypothesis test sets up two competing claims.

  • Null Hypothesis (H₀): Represents the status quo or no effect.
    Examples:
    • “The new drug is no better than the old one.”
    • “The mean height of women is 65 inches.”
    • “The coin is fair.”
  • Alternative Hypothesis (H₁ or Hₐ): Represents the effect we want to detect.
    Examples:
    • “The new drug is better than the old one.”
    • “The mean height of women is not 65 inches.”
    • “The coin is biased.”

Formulating Hypotheses:

  • Two-tailed test:
    H₀: μ = k
    H₁: μ ≠ k

  • One-tailed test:
    H₀: μ ≤ k
    H₁: μ > k

    or

    H₀: μ ≥ k
    H₁: μ < k

The choice between one-tailed and two-tailed must be made before looking at the data.

2.2 Type I and Type II Errors & Power

Because we use samples, we can never be 100% certain.

Decision             H₀ True             H₀ False
Reject H₀            Type I Error (α)    Correct Decision (Power = 1 - β)
Fail to Reject H₀    Correct (1 - α)     Type II Error (β)

Type I Error (α): Rejecting a true null hypothesis.

  • Consequence: Concluding an effect exists when it doesn’t. (e.g., Convicting an innocent person, adopting a new drug that is no better).

  • Significance Level (α): The pre-chosen probability of making a Type I error. Common choices are 0.05 (5%), 0.01 (1%), and 0.10 (10%). This is our threshold for “unlikely.”

Type II Error (β): Failing to reject a false null hypothesis.

  • Consequence: Concluding no effect exists when it actually does. (e.g., Letting a guilty person go free, sticking with an old drug when a new one is better).

  • It depends on the sample size, the true effect size, and the chosen α.

Power (1 - β): Probability of correctly rejecting a false null. Researchers typically aim for 80% power.

Ways to increase power:
- Increase sample size (n).
- Increase effect size.
- Increase α (but this raises Type I error risk).

NB: There is a trade-off between Type I and Type II errors. Decreasing α makes it harder to reject H₀, which increases β (decreases power) unless compensated for by a larger sample size.
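
These trade-offs can be explored numerically with R's built-in power.t.test() function. The numbers below are purely illustrative (a minimal sketch, not tied to any example in these notes):

# Power of a one-sample t-test; delta is the true difference from the null mean
power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.05,
             type = "one.sample", alternative = "two.sided")

# Doubling n raises power; tightening alpha to 0.01 lowers it
power.t.test(n = 60, delta = 0.5, sd = 1, sig.level = 0.05,
             type = "one.sample", alternative = "two.sided")
power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.01,
             type = "one.sample", alternative = "two.sided")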

2.3 The p-value: Making the Decision

The p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis (H₀) is true.

How to interpret it: A small p-value (typically ≤ α) means that the observed data would be very unlikely if the null hypothesis were true. This provides evidence against H₀.

The Decision Rule:

  • If p-value ≤ α, we reject the null hypothesis (H₀). The result is “statistically significant.”

  • If p-value > α, we fail to reject the null hypothesis (H₀). We don’t have enough evidence to support H₁.

Crucial: A p-value > α does not prove H₀ is true. It only means the evidence wasn’t strong enough to reject it. Also, “statistically significant” does not necessarily mean “practically important.”

2.4 Steps in Hypothesis Testing:

  • State the null and alternative hypotheses.

  • Choose the level of significance (alpha).

  • Calculate the test statistic.

  • Determine the critical value or p-value.

  • Compare the test statistic to the critical value, or the p-value to α.

  • Make a decision to reject or fail to reject the null hypothesis.

3 One-Sample Tests

3.1 Part A: One-Sample z-test

The Z-test is a statistical hypothesis test used to determine whether a sample mean differs significantly from a hypothesized population mean when the population standard deviation is known.

Key Characteristics:

  • Uses standard normal distribution (Z-distribution)

  • Appropriate for large sample sizes (n ≥ 30)

  • Population standard deviation (σ) must be known

  • More powerful than t-test when assumptions are met

Test statistic:

\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

where:

x̄ = sample mean

μ₀ = hypothesized population mean under H₀

σ = known population standard deviation

n = sample size

3.1.1 Example 1: One-Sample Z-Test

A company claims its energy bars have an average of 20 grams of protein. The population standard deviation is known to be 1.5 grams. A random sample of 35 bars is taken, and the sample mean is found to be 20.6 grams. At a 5% significance level, is there evidence that the mean protein content is different from 20 grams?

Step 1: Hypotheses

  • Null hypothesis:
    \(H_0: \mu = 20\)

  • Alternative hypothesis:
    \(H_1: \mu \neq 20\) (two-tailed test)

Step 2: Significance Level \(\alpha = 0.05\)

Step 3: Test Statistic (Manual Calculation)

Given:

  • Sample mean \(\bar{x} = 20.6\)
  • Population mean \(\mu_0 = 20\)
  • Population standard deviation \(\sigma = 1.5\)
  • Sample size \(n = 35\)

\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

\[ z = \frac{20.6 - 20}{1.5 / \sqrt{35}} \]

\[ z = \frac{0.6}{1.5 / 5.916} = \frac{0.6}{0.2535} \approx 2.37 \]

Step 4: P-value

For a two-tailed test:

\[ p\text{-value} = 2 \times P(Z > |2.37|) \]

From Z-table: \(P(Z > 2.37) \approx 0.0089\)

\[ p\text{-value} = 2 \times 0.0089 = 0.0178 \]

Step 5: Decision

  • \(p\)-value (0.0178) < \(\alpha\) (0.05)
  • Reject \(H_0\)

Step 6: Conclusion

At the 5% significance level, there is sufficient evidence to conclude that the true mean protein content of the energy bars is different from 20 grams.

Verification in R:

# Given values
xbar <- 20.6
mu0 <- 20
sigma <- 1.5
n <- 35

# Z statistic
z <- (xbar - mu0) / (sigma / sqrt(n))
z
## [1] 2.366432
# Two-tailed p-value
p_value <- 2 * (1 - pnorm(abs(z)))
p_value
## [1] 0.01796048

3.1.2 Example 2: Two-Tailed Test

A company claims its light bulbs last 1200 hours. The population standard deviation is 100 hours. A sample of 50 bulbs has a mean lifespan of 1175 hours. Test at \(\alpha=0.05\).

Step 1: Hypotheses
\(H_0: \mu = 1200\)
\(H_1: \mu \neq 1200\)

Step 2: Given Values
\(\mu_0=1200\), \(\sigma=100\), \(\bar{x}=1175\), \(n=50\), \(\alpha=0.05\)

Step 3: Test Statistic
\(SE = 100/\sqrt{50} = 14.14\)
\(Z = (1175-1200)/14.14 = -25/14.14 = -1.77\)

Step 4: Critical Values
Two-tailed test, \(Z_{0.025}=\pm1.96\)

Step 5: Decision
\(-1.77 > -1.96\) → Fail to reject \(H_0\).

Step 6: Conclusion
No significant evidence that mean lifespan differs from 1200 hours (\(Z=-1.77\), \(p>0.05\)).
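
Verification in R (mirroring the manual steps; pnorm() supplies the two-tailed p-value):

# Light-bulb example: two-tailed z-test
xbar <- 1175; mu0 <- 1200; sigma <- 100; n <- 50
z <- (xbar - mu0) / (sigma / sqrt(n))
p_value <- 2 * (1 - pnorm(abs(z)))
z; p_value   # z is about -1.77 and p is about 0.08, so H0 is not rejected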

3.1.3 Example 3: One-Tailed Test

A cereal company claims boxes contain at least 500 g. Population \(\sigma=15\). A sample of 40 boxes has mean weight 495 g. Test at \(\alpha=0.01\) if boxes are underfilled.

Step 1: Hypotheses
\(H_0: \mu \geq 500\)
\(H_1: \mu < 500\)

Step 2: Given Values
\(\mu_0=500\), \(\sigma=15\), \(\bar{x}=495\), \(n=40\), \(\alpha=0.01\)

Step 3: Test Statistic
\(SE = 15/\sqrt{40} = 2.37\)
\(Z = (495-500)/2.37 = -5/2.37 = -2.11\)

Step 4: Critical Value
Left-tailed test, \(Z_{0.01}=-2.33\)

Step 5: Decision
\(-2.11 > -2.33\) → Fail to reject \(H_0\).

Step 6: Conclusion
No significant evidence that boxes are underfilled (\(Z=-2.11\), \(p>0.01\)).
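
Verification in R (the left-tail probability comes directly from pnorm()):

# Cereal example: left-tailed z-test
xbar <- 495; mu0 <- 500; sigma <- 15; n <- 40
z <- (xbar - mu0) / (sigma / sqrt(n))
p_value <- pnorm(z)   # left-tail probability
z; p_value            # z is about -2.11 and p is about 0.017, above alpha = 0.01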

3.1.4 Example 4: Right-Tailed Test

A school district claims average SAT math score is 520 with \(\sigma=100\). A new teaching method is tested on 64 students, yielding mean score 540. Test at \(\alpha=0.05\) if the new method improves scores.

Step 1: Hypotheses
\(H_0: \mu \leq 520\)
\(H_1: \mu > 520\)

Step 2: Given Values
\(\mu_0=520\), \(\sigma=100\), \(\bar{x}=540\), \(n=64\), \(\alpha=0.05\)

Step 3: Test Statistic
\(SE = 100/\sqrt{64} = 12.5\)
\(Z = (540-520)/12.5 = 20/12.5 = 1.60\)

Step 4: Critical Value
Right-tailed test, \(Z_{0.05}=1.645\)

Step 5: Decision
\(1.60 < 1.645\) → Fail to reject \(H_0\).

Step 6: Conclusion
No significant evidence that the new method improves scores (\(Z=1.60\), \(p>0.05\)).
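
Verification in R (right-tailed, so the p-value is the upper-tail probability):

# SAT example: right-tailed z-test
xbar <- 540; mu0 <- 520; sigma <- 100; n <- 64
z <- (xbar - mu0) / (sigma / sqrt(n))
p_value <- 1 - pnorm(z)   # right-tail probability
z; p_value                # z = 1.60 and p is about 0.055, just above alpha = 0.05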

  • When to use:
    • Z-test: Population standard deviation (\(\sigma\)) known
    • t-test: Population standard deviation unknown (use sample \(s\))

3.2 Part B: One-Sample t-test

When to use it: To test a hypothesis about a population mean (μ) when:

  • The population standard deviation (σ) is unknown (which is almost always the case in real life), so the sample standard deviation (s) is used as an estimate.
  • The population is approximately normal; this matters most when the sample size is small (n < 30).

Test statistic:

The t-test is a statistical hypothesis test used to determine if there is a significant difference between the means of two groups or between a sample mean and a known population mean.

The one-sample t-test is used to test whether the mean of a sample differs significantly from a hypothesized population mean when the population standard deviation is unknown.

The test statistic is:

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

with degrees of freedom \(df = n - 1\), where s is the sample standard deviation.

The t-distribution is slightly wider and more variable than the z-distribution, accounting for the extra uncertainty from estimating σ with s. As df increases, the t-distribution approaches the z-distribution.

3.2.1 Example 1: One-Sample t-test (Practical with R & Python)

Problem: A car manufacturer claims a new model gets at least 40 MPG. A consumer agency tests a random sample of 12 cars, with the following results:

39.2, 40.5, 38.7, 41.0, 39.8, 40.9, 38.5, 39.9, 40.2, 39.3, 41.1, 38.6

Test the manufacturer’s claim at α = 0.05. Assume MPG is approximately normally distributed.

Solution (Manual Steps First):

1. Hypotheses:

H₀: μ ≥ 40 (Manufacturer's claim is true)

H₁: μ < 40 (The mean MPG is less than 40) -> One-tailed (left-tailed) test

2. Significance Level: α = 0.05

3. Calculate Sample Statistics:

Calculate the mean (x̄) and standard deviation (s) of the sample data

\[ \bar{x} = \frac{\text{sum of all values}}{12} = \frac{477.7}{12} \approx 39.808 \]

s ≈ 0.951 (calculated using the sample standard deviation formula)

4. Test Statistic:

\[ t = \frac{\bar{x} - \mu_{0}}{s / \sqrt{n}} = \frac{39.808 - 40}{0.951 / \sqrt{12}} = \frac{-0.192}{0.951 / 3.464} = \frac{-0.192}{0.275} \approx -0.698 \]

df = n - 1 = 11

5. Find p-value (using t-table for df = 11):

This is a left-tailed test. We need P(T < -0.698).

From a t-table, for df = 11, the value 0.698 is essentially the tabled value 0.697, whose corresponding one-tailed probability is 0.25.

We can estimate p-value ≈ 0.25. (Software will give a more precise value.)

6. Decision (Manual):

Our estimated p-value (≈ 0.25) is greater than α (0.05). Therefore, we fail to reject H₀.

Now, let’s solve it precisely with code:

In R:

# Sample data
mpg_data <- c(39.2, 40.5, 38.7, 41.0, 39.8, 40.9, 38.5, 39.9, 40.2, 39.3, 41.1, 38.6)

# Perform one-sample t-test (alternative="less" for H1: mu < 40)
test_result <- t.test(mpg_data, mu = 40, alternative = "less")

# Print the results
print(test_result)
## 
##  One Sample t-test
## 
## data:  mpg_data
## t = -0.69814, df = 11, p-value = 0.2498
## alternative hypothesis: true mean is less than 40
## 95 percent confidence interval:
##      -Inf 40.30138
## sample estimates:
## mean of x 
##  39.80833

Conclusion from R:

The p-value is 0.2498. Since 0.2498 > 0.05, we fail to reject H₀. There is not enough evidence to reject the manufacturer’s claim that the mean MPG is at least 40.

In Python:

import numpy as np
from scipy import stats

# Sample data
mpg_data = np.array([39.2, 40.5, 38.7, 41.0, 39.8, 40.9, 38.5, 39.9, 40.2, 39.3, 41.1, 38.6])

# Perform one-sample t-test
# 'alternative="less"' sets H1: mu < 40 (available in SciPy >= 1.6)
t_stat, p_value = stats.ttest_1samp(mpg_data, popmean=40, alternative='less')

# Print the results
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
# With 'alternative' specified, the reported p-value is already one-tailed.
# On older SciPy versions without 'alternative', halve the two-sided p-value
# (checking the sign of t) to obtain the one-tailed value.

Conclusion is the same as in R.

3.2.2 Example 2: Two-tailed Test (Fail to Reject)

Problem: A machine claims to fill jars with 100 g. A sample of \(n=10\) jars has \(\bar{x}=105\) g and \(s=8\) g. Test \(H_0: \mu=100\) vs \(H_1: \mu \neq 100\) at \(\alpha=0.05\).

Step 1: State hypotheses
\(H_0: \mu=100\)
\(H_1: \mu\neq100\)

Step 2: Compute standard error
\(SE = s/\sqrt{n} = 8/\sqrt{10} = 8/3.1623 = 2.5298\)

Step 3: Compute test statistic
\(t = (\bar{x}-\mu_0)/SE = (105-100)/2.5298 = 5/2.5298 = 1.976\)

Step 4: Degrees of freedom
\(df = 10-1=9\)

Step 5: Critical value
\(t_{0.975,9}=2.262\)

Decision: \(|t|=1.976 < 2.262\), so fail to reject \(H_0\). No evidence mean differs from 100.

In R:

n <- 10
xbar <- 105
mu0 <- 100
s <- 8

se <- s/sqrt(n)
t_stat <- (xbar - mu0)/se
df <- n-1
p_val <- 2*pt(-abs(t_stat), df)

list(t_statistic = t_stat, df = df, p_value = p_val)
## $t_statistic
## [1] 1.976424
## 
## $df
## [1] 9
## 
## $p_value
## [1] 0.07951604

3.2.3 Example 3: Two-tailed Test with 95% CI (Reject \(H_0\))

Problem: A product’s sodium content is claimed to be 50 mg per serving. A sample of \(n=15\) servings has \(\bar{x}=52\) mg and \(s=3.5\) mg; test \(H_0: \mu=50\) vs \(H_1: \mu\neq50\) at \(\alpha=0.05\).

Step 1: SE
\(SE = 3.5/\sqrt{15} = 3.5/3.873 = 0.9035\)

Step 2: Test statistic
\(t = (52-50)/0.9035 = 2/0.9035 = 2.214\)

Step 3: df
\(df=14\)

Step 4: Critical value
\(t_{0.975,14}=2.145\)

Decision: \(t=2.214 > 2.145\), reject \(H_0\).

Step 5: 95% CI
Margin = \(t_{0.975,14}\times SE = 2.145\times 0.9035 = 1.939\)
CI = \(52\pm1.939 = (50.06,53.94)\)

Conclusion: Reject \(H_0\) (p < 0.05). The mean sodium content differs from 50 mg. The 95% CI provides the plausible range.

In R:

n <- 15
xbar <- 52
mu0 <- 50
s <- 3.5

se <- s/sqrt(n)
t_stat <- (xbar - mu0)/se
df <- n-1
p_val <- 2*pt(-abs(t_stat), df)

# 95% CI
alpha <- 0.05
t_crit <- qt(1-alpha/2, df)
margin <- t_crit*se
ci <- c(xbar - margin, xbar + margin)

list(t_statistic = t_stat, df = df, p_value = p_val, CI_95 = ci)
## $t_statistic
## [1] 2.213133
## 
## $df
## [1] 14
## 
## $p_value
## [1] 0.04400273
## 
## $CI_95
## [1] 50.06176 53.93824

3.2.4 Example 4: One-sided Test with 99% CI (Reject \(H_0\))

Problem: A training course is claimed to improve test scores. For \(n=25\) participants the mean score improvement is \(\bar{x}=4\) points with \(s=6\); test \(H_0: \mu=0\) vs \(H_1: \mu>0\) at \(\alpha=0.01\).

Step 1: SE
\(SE = 6/\sqrt{25} = 6/5 = 1.2\)

Step 2: Test statistic
\(t = (4-0)/1.2 = 4/1.2 = 3.333\)

Step 3: df
\(df=24\)

Step 4: Critical value (one-sided)
\(t_{0.99,24}=2.492\)

Decision: \(t=3.333 > 2.492\), reject \(H_0\). Strong evidence of positive improvement.

Step 5: 99% CI
Margin = \(t_{0.995,24}\times SE = 2.797\times1.2=3.356\)
CI = \(4\pm3.356 = (0.64,7.36)\)

Conclusion: Strong evidence (\(p<0.01\)) that the course increases scores. The 99% CI shows the true mean improvement is positive.

In R:

n <- 25
xbar <- 4
mu0 <- 0
s <- 6

se <- s/sqrt(n)
t_stat <- (xbar - mu0)/se
df <- n-1
p_val <- 1 - pt(t_stat, df) # one-sided

# 99% CI
t_crit <- qt(0.995, df)
margin <- t_crit*se
ci <- c(xbar - margin, xbar + margin)

list(t_statistic = t_stat, df = df, p_value = p_val, CI_99 = ci)
## $t_statistic
## [1] 3.333333
## 
## $df
## [1] 24
## 
## $p_value
## [1] 0.001388157
## 
## $CI_99
## [1] 0.6436726 7.3563274

4 Exercises & Assignments

4.1 Part A: One-Sample Z-test Questions

Q1. A manufacturer claims that the mean lifetime of its light bulbs is 1200 hours. A sample of 64 bulbs has a mean of 1180 hours. Assume the population standard deviation is known to be 80 hours. At the 5% level of significance, test whether the mean lifetime is different from 1200 hours.

Sample Data: Mean = 1180, \(n=64\), \(\sigma=80\), \(\mu_0=1200\).

Q2. The average weight of a packaged product is claimed to be 250 g. A quality inspector samples 100 packages and finds the sample mean to be 247 g. The population standard deviation is 10 g. Test at the 1% level whether the average weight is less than 250 g.

Sample Data: Mean = 247, \(n=100\), \(\sigma=10\), \(\mu_0=250\).

Q3. A machine is set to dispense 500 ml of juice. A random sample of 36 bottles has a mean content of 505 ml. The population standard deviation is known to be 12 ml. Test at the 5% significance level whether the machine is overfilling bottles.

Sample Data: Mean = 505, \(n=36\), \(\sigma=12\), \(\mu_0=500\).

Q4. A national survey found that the average American adult works 43.7 hours per week. The population standard deviation is assumed to be 4.6 hours. You survey 50 adults in your state and find they work an average of 45.1 hours per week. At the α = 0.01 level, is there significant evidence to conclude that workers in your state work more than the national average?

4.2 Part B: One-Sample T-test Questions

Q5. A nutritionist wants to test whether the average daily protein intake of adults differs from the recommended 60 g. A random sample of 12 adults has the following intakes (in grams):

protein <- c(62, 65, 59, 64, 60, 66, 61, 63, 62, 67, 64, 61)
mean(protein); sd(protein); length(protein)
## [1] 62.83333
## [1] 2.443296
## [1] 12

Q6. A teacher claims that the average score of her students on a math test is at least 70. A random sample of 20 students has the following scores:

scores <- c(65, 68, 70, 72, 67, 66, 69, 71, 68, 70,
            64, 66, 67, 68, 72, 69, 70, 68, 67, 66)
mean(scores); sd(scores); length(scores)
## [1] 68.15
## [1] 2.230766
## [1] 20

Q7. A company believes the average monthly expenditure of households on internet services is 2000 KES. A sample of 18 households reports the following expenditures (in KES):

expenditure <- c(2100, 2050, 2150, 2200, 1900, 2000, 2300, 2250, 2100,
                 1950, 2000, 2050, 2150, 2200, 2100, 2250, 2050, 2150)
mean(expenditure); sd(expenditure); length(expenditure)
## [1] 2108.333
## [1] 108.8037
## [1] 18

Q8.The recommended daily calcium intake for adults is 1000 mg. A nutritionist believes the intake for women in their 50s is too low. She collects data from a random sample of 15 women:

980, 1005, 1010, 942, 865, 1200, 1105, 978, 1020, 999, 870, 1050, 1055, 955, 907

Test the nutritionist’s belief at the α = 0.05 level. Assume the population is approximately normal.

5 Test for Difference Between the Means of Two Samples

5.1 A. Two-Sample z-test

The two-sample z-test is used to compare the means of two independent groups when the population standard deviations are known.

When to Use Two-Sample Z-Test

  1. Comparing means of two independent groups

  2. Population standard deviations are known

  3. Sample sizes are sufficiently large (typically n ≥ 30)

  4. Data are approximately normally distributed

Steps for a two-sample z-test:

Step 1: State the Hypotheses

  • Null Hypothesis (H₀): μ₁ = μ₂ (The population means are equal)
  • Alternative Hypothesis (H₁):
    • Two-tailed: μ₁ ≠ μ₂
    • Right-tailed: μ₁ > μ₂
    • Left-tailed: μ₁ < μ₂

Choose the form of H₁ based on the research question.

Step 2: Identify the Given Data

  • Sample 1:
    • Mean: \(\bar{x}_1\)
    • Population standard deviation: \(\sigma_1\)
    • Sample size: \(n_1\)
  • Sample 2:
    • Mean: \(\bar{x}_2\)
    • Population standard deviation: \(\sigma_2\)
    • Sample size: \(n_2\)
  • Significance level: \(\alpha\)

Step 3: Calculate the Z-Statistic

Use the formula:

\[ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \]

Substitute the known values and simplify step-by-step.

Step 4: Determine the Critical Z-Value

  • Use the standard normal distribution table.
  • For a two-tailed test at \(\alpha = 0.05\), critical values are ±1.96.
  • For a right-tailed test at \(\alpha = 0.05\), critical value is 1.645.
  • For a left-tailed test at \(\alpha = 0.05\), critical value is -1.645.

Adjust based on the chosen tail and significance level.

Step 5: Compare and Decide

  • If the test is two-tailed, reject H₀ if \(|z| > z_{\alpha/2}\)
  • If the test is right-tailed, reject H₀ if \(z > z_\alpha\)
  • If the test is left-tailed, reject H₀ if \(z < -z_\alpha\)

Step 6: Conclusion

  • Reject H₀ if the z-statistic falls in the rejection region.
  • Fail to reject H₀ if the z-statistic does not fall in the rejection region.

State the conclusion in context of the problem, including the calculated z-value and comparison to the critical value.

Notes

  • This test assumes known population standard deviations.
  • If population standard deviations are unknown, consider using a two-sample t-test instead.
  • Ensure samples are independent and drawn randomly from normally distributed populations.
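
Base R has no built-in two-sample z-test, so the steps above can be wrapped in a small helper function. This is only a sketch (the function name and arguments are ours, not a standard R API):

# Two-sample z-test from summary statistics (sketch, not a built-in function)
two_sample_z <- function(xbar1, xbar2, sigma1, sigma2, n1, n2,
                         alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  se <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)   # standard error of the difference
  z  <- (xbar1 - xbar2) / se
  p  <- switch(alternative,
               two.sided = 2 * (1 - pnorm(abs(z))),
               greater   = 1 - pnorm(z),
               less      = pnorm(z))
  list(z = z, p_value = p)
}

# Example 1 below (School A vs School B), two-tailed:
two_sample_z(85, 82, 8, 7.5, 40, 50, alternative = "two.sided")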

Let’s do three examples.

  • Example 1: Two-tailed test

  • Example 2: One-tailed test (right-tailed)

  • Example 3: One-tailed test (left-tailed)

5.1.1 Example 1: Two-Tailed Z-Test

A study compares test scores between two schools.

  • School A: \(n_1 = 40, \ \bar{x}_1 = 85, \ \sigma_1 = 8\)
  • School B: \(n_2 = 50, \ \bar{x}_2 = 82, \ \sigma_2 = 7.5\)
  • Significance level: \(\alpha = 0.05\)

Solution

Step 1: State Hypotheses

\[H_0: \mu_1 = \mu_2 \quad \text{(no difference in means)}\]

\[H_1: \mu_1 \neq \mu_2 \quad \text{(means differ)}\]

Step 2: Compute Test Statistic (Manual Calculation)

The standard error (SE) is:

\[SE = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{8^2}{40} + \frac{7.5^2}{50}}= \sqrt{\frac{64}{40} + \frac{56.25}{50}}= \sqrt{1.6 + 1.125} = \sqrt{2.725} \approx 1.651\]

The z-statistic is:

\[z = \frac{\bar{x}_1 - \bar{x}_2}{SE} = \frac{85 - 82}{1.651} \approx 1.82\]

Step 3: Critical Value & Decision Rule

For a two-tailed test at \(\alpha = 0.05\):

\[z_{\alpha/2} = \pm 1.96\]

Decision rule:
- If \(|z| > 1.96\), reject \(H_0\).
- Otherwise, fail to reject \(H_0\).

Here, \(|1.82| < 1.96\), so we fail to reject \(H_0\).

Step 4: Conclusion

There is no significant difference between the mean test scores of the two schools.

  • Test statistic: \(z = 1.82\)
  • Critical values: \(\pm 1.96\)
  • Decision: Fail to reject \(H_0\)
  • Interpretation: The evidence is insufficient at \(\alpha = 0.05\) to conclude a difference in mean test scores.

Step 5: R Verification

# Given data
x1 <- 85; x2 <- 82
n1 <- 40; n2 <- 50
sigma1 <- 8; sigma2 <- 7.5

# Standard error
SE <- sqrt((sigma1^2/n1) + (sigma2^2/n2))
SE
## [1] 1.650757
# Z statistic
z_value <- (x1 - x2)/SE
z_value
## [1] 1.817348
# Two-tailed p-value
p_value <- 2 * (1 - pnorm(abs(z_value)))
p_value
## [1] 0.06916391

5.1.2 Example 2: One-Tailed Test (Right-Tailed)

A company tests two production methods. Method X (n=35) has mean output=120 units/hour, σ=10. Method Y (n=40) has mean output=115 units/hour, σ=9. Test at α=0.01 if Method X is superior.

Solution

Step 1: State Hypotheses

  • Null Hypothesis (H₀): μₓ ≤ μᵧ (Method X is not superior)

  • Alternative Hypothesis (H₁): μₓ > μᵧ (Method X is superior)

Step 2: Given Values - Method X:
- Sample size (n₁) = 35
- Mean (x̄₁) = 120
- Standard deviation (σ₁) = 10

  • Method Y:
    • Sample size (n₂) = 40
    • Mean (x̄₂) = 115
    • Standard deviation (σ₂) = 9
  • Significance level (α) = 0.01

Step 3: Calculate Z-Statistic

\[Z = \frac{x̄₁ - x̄₂}{\sqrt{\frac{σ₁^2}{n₁} + \frac{σ₂^2}{n₂}}} = \frac{120 - 115}{\sqrt{\frac{100}{35} + \frac{81}{40}}} = \frac{5}{\sqrt{2.857 + 2.025}} = \frac{5}{\sqrt{4.882}} = \frac{5}{2.210} ≈ 2.26 \]

Step 4: Critical Value

  • Right-tailed test at α = 0.01
  • Critical Z-value: Zₐ = 2.33

Step 5: Compare and Decide

  • Since 2.26 < 2.33, the Z-statistic is not in the rejection region
  • Fail to reject H₀

Step 6: Conclusion

There is no significant evidence that Method X is superior.
Z = 2.26, p > 0.01
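
A quick R check (or reuse the two_sample_z() helper sketched earlier):

# Method X vs Method Y, right-tailed
z <- (120 - 115) / sqrt(10^2/35 + 9^2/40)
p_value <- 1 - pnorm(z)
z; p_value   # z is about 2.26 and p is about 0.012, above alpha = 0.01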

5.1.3 Example 3: One-Tailed Test (Left-Tailed)

A nutritionist compares calorie intake between two diets. Diet A (n=60) has mean=1800 calories, σ=150. Diet B (n=55) has mean=1850 calories, σ=140. Test at α=0.05 if Diet A has lower calorie intake.

Solution

Step 1: State Hypotheses

  • Null Hypothesis (H₀): μₐ ≥ μᵦ (Diet A is not lower in calories)

  • Alternative Hypothesis (H₁): μₐ < μᵦ (Diet A has lower calorie intake)

Step 2: Given Values

  • Diet A:
    • Sample size (n₁) = 60
    • Mean (x̄₁) = 1800
    • Standard deviation (σ₁) = 150
  • Diet B:
    • Sample size (n₂) = 55
    • Mean (x̄₂) = 1850
    • Standard deviation (σ₂) = 140
  • Significance level (α) = 0.05

Step 3: Calculate Z-Statistic

\[ Z = \frac{x̄₁ - x̄₂}{\sqrt{\frac{σ₁^2}{n₁} + \frac{σ₂^2}{n₂}}} = \frac{1800 - 1850}{\sqrt{\frac{22500}{60} + \frac{19600}{55}}} = \frac{-50}{\sqrt{375 + 356.36}} = \frac{-50}{\sqrt{731.36}} = \frac{-50}{27.04} ≈ -1.85 \]

Step 4: Critical Value

  • Left-tailed test at α = 0.05
  • Critical Z-value: Zₐ = -1.645

Step 5: Compare and Decide

  • Since -1.85 < -1.645, the Z-statistic is in the rejection region
  • Reject H₀

Step 6: Conclusion

There is significant evidence that Diet A has lower calorie intake.
Z = -1.85, p < 0.05
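
A quick R check:

# Diet A vs Diet B, left-tailed
z <- (1800 - 1850) / sqrt(150^2/60 + 140^2/55)
p_value <- pnorm(z)
z; p_value   # z is about -1.85 and p is about 0.032, below alpha = 0.05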

5.2 B. Two-Sample T-test

5.2.1 Purpose

The two-sample t-test is used to determine whether there is a significant difference between the means of two populations. With a two-tailed alternative it checks for any difference (positive or negative), whereas a one-tailed alternative focuses on whether one mean is greater than or less than the other.

5.2.2 Types of Two-Sample Tests

  • Independent Samples: The two samples are unrelated.

  • Paired Samples: The two samples are related (e.g., before-and-after measurements on the same subjects).

5.2.3 Assumptions

  • The samples are randomly selected.

  • Observations are independent.

  • The populations are normally distributed (or sample sizes are large enough for the Central Limit Theorem to apply).

  • For independent samples: the variances of the two populations may or may not be equal; the pooled t-test additionally assumes equal variances.

  • For paired samples: the differences between paired observations are normally distributed.

5.2.4 Hypotheses

For independent samples:

  • Null hypothesis (\(H_0\)): \(\mu_1 = \mu_2\) (no difference in population means)

  • Alternative hypothesis (\(H_a\)): \(\mu_1 \neq \mu_2\) (there is a difference)

For paired samples:

  • Null hypothesis (\(H_0\)): \(\mu_d = 0\) (mean difference is zero)

  • Alternative hypothesis (\(H_a\)): \(\mu_d \neq 0\) (mean difference is not zero)

5.2.5 Test Statistics

For Independent Samples

  • If population variances are assumed equal: \[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \]

    where \(S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\) is the pooled variance.

  • If population variances are not assumed equal (Welch’s t-test): \[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \] Degrees of freedom are calculated using the Welch-Satterthwaite equation.

For Paired Samples \[ t = \frac{\bar{d}}{s_d / \sqrt{n}} \] where \(\bar{d}\) is the mean of the differences, \(s_d\) is the standard deviation of the differences, and \(n\) is the number of pairs.

5.2.6 Decision Rule

  • Compare the calculated \(t\)-value with the critical \(t\)-value from the \(t\)-distribution table at the given significance level (\(\alpha\)) and degrees of freedom.

  • Reject \(H_0\) if the absolute value of the calculated \(t\)-value exceeds the critical \(t\)-value.

There are two types: independent (unpaired) and paired t-tests.

Two-Sample Pooled t-Test (Equal Variances): Used to compare the means of two independent samples when population variances are assumed equal.

Test Statistic

\[t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\]

Pooled Variance

\[s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\]

Degrees of Freedom

\[df = n_1 + n_2 - 2\]

Welch’s t-Test (Unequal Variances): Used to compare the means of two independent samples when population variances are not assumed equal.

Test Statistic

\[t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

Degrees of Freedom (Welch-Satterthwaite Approximation)

\[df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left( \frac{s_1^2}{n_1} \right)^2}{n_1 - 1} + \frac{\left( \frac{s_2^2}{n_2} \right)^2}{n_2 - 1}}\]

Paired t-Test: Used to compare means from the same group at different times or under different conditions.

Test Statistic

\[t = \frac{\bar{d}}{s_d / \sqrt{n}}\]

Where:

  • \(\bar{d}\) = mean of the differences
  • \(s_d\) = standard deviation of the differences

Degrees of Freedom

\[df = n - 1\] (n = number of pairs)

I will provide examples of each.

  • Example 1: Independent two-sample t-test (equal variances assumed)

  • Example 2: Independent two-sample t-test (Welch’s t-test, unequal variances not assumed)

  • Example 3: Paired two-sample t-test

5.2.7 Solved Example 1: Independent Samples (Unequal Variances)

Problem: A researcher wants to compare the average test scores of students from two different teaching methods. A random sample of 25 students from Method A has a mean score of 78 with a standard deviation of 10. A random sample of 30 students from Method B has a mean score of 75 with a standard deviation of 12. Assume unequal variances. Perform a two-tailed test at \(\alpha = 0.05\).

Solution:

  1. State the hypotheses:

    • \(H_0: \mu_A = \mu_B\)
    • \(H_a: \mu_A \neq \mu_B\)
  2. Calculate the test statistic:

    Using Welch’s \(t\)-test: \[ t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{S_A^2}{n_A} + \frac{S_B^2}{n_B}}} \] Substituting values: \[ t = \frac{78 - 75}{\sqrt{\frac{10^2}{25} + \frac{12^2}{30}}} = \frac{3}{\sqrt{4 + 4.8}} = \frac{3}{\sqrt{8.8}} \approx \frac{3}{2.97} \approx 1.01 \]

  3. Degrees of freedom:

    Using the Welch-Satterthwaite formula: \[ df \approx \frac{\left(\frac{S_A^2}{n_A} + \frac{S_B^2}{n_B}\right)^2}{\frac{\left(\frac{S_A^2}{n_A}\right)^2}{n_A - 1} + \frac{\left(\frac{S_B^2}{n_B}\right)^2}{n_B - 1}} \] Substituting values: \[ df \approx \frac{(4 + 4.8)^2}{\frac{4^2}{24} + \frac{4.8^2}{29}} \approx \frac{8.8^2}{\frac{16}{24} + \frac{23.04}{29}} \approx \frac{77.44}{0.667 + 0.795} \approx 52 \]

  4. Critical value:

    From the \(t\)-table, \(t_{\text{critical}} = 2.009\) (for \(\alpha = 0.05\) and \(df = 52\)).

  5. Decision:

    Since \(|t| = 1.01 < 2.009\), we fail to reject \(H_0\). There is no significant difference in the means.
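
With only summary statistics, t.test() cannot be called directly, but the same calculation is easy to script in R (a sketch mirroring the steps above):

# Welch's t-test from summary statistics (Method A vs Method B)
xbar1 <- 78; s1 <- 10; n1 <- 25
xbar2 <- 75; s2 <- 12; n2 <- 30

v1 <- s1^2 / n1; v2 <- s2^2 / n2          # per-sample variance of the mean
t_stat <- (xbar1 - xbar2) / sqrt(v1 + v2)
df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))   # Welch-Satterthwaite
p_val <- 2 * pt(-abs(t_stat), df)          # two-tailed p-value

list(t = t_stat, df = df, p_value = p_val)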

5.2.8 Solved Example 2: Paired Samples

Problem: A study measures the blood pressure of 10 patients before and after a new medication. The differences in systolic blood pressure are: \([-5, -3, -8, -6, -4, -7, -2, -5, -6, -4]\). Perform a two-tailed test at \(\alpha = 0.05\).

Solution:

  1. State the hypotheses:

    • \(H_0: \mu_d = 0\)
    • \(H_a: \mu_d \neq 0\)
  2. Calculate the mean and standard deviation of differences:

    • Mean: \(\bar{d} = \frac{-5 - 3 - 8 - 6 - 4 - 7 - 2 - 5 - 6 - 4}{10} = -5\)
    • Standard deviation: \(s_d = \sqrt{\frac{\sum(d_i - \bar{d})^2}{n-1}} = \sqrt{\frac{(-5+5)^2 + (-3+5)^2 + ... + (-4+5)^2}{9}} = \sqrt{\frac{30}{9}} \approx 1.826\)
  3. Calculate the test statistic: \[ t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{-5}{1.826 / \sqrt{10}} = \frac{-5}{1.826 / 3.162} = \frac{-5}{0.577} \approx -8.66 \]

  4. Degrees of freedom: \(df = n - 1 = 10 - 1 = 9\)

  5. Critical value: From the \(t\)-table, \(t_{\text{critical}} = 2.262\) (for \(\alpha = 0.05\) and \(df = 9\)).

  6. Decision: Since \(|t| = 8.66 > 2.262\), we reject \(H_0\). There is a significant difference in blood pressure before and after the medication.
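
Because the raw differences are given, t.test() verifies this directly (a paired test is just a one-sample t-test on the differences):

d <- c(-5, -3, -8, -6, -4, -7, -2, -5, -6, -4)
t.test(d, mu = 0)   # t is about -8.66 on 9 df; the p-value is far below 0.05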

5.2.9 Solved Example 3: Independent Samples (Equal Variances)

Problem: A researcher wants to compare the average heights of plants grown in two different fertilizers. A random sample of 15 plants from Fertilizer A has a mean height of 20 cm with a standard deviation of 3 cm. A random sample of 18 plants from Fertilizer B has a mean height of 18 cm with a standard deviation of 4 cm. Assume equal variances. Perform a two-tailed test at \(\alpha = 0.05\).

Solution:

  1. State the hypotheses:

    • \(H_0: \mu_A = \mu_B\)
    • \(H_a: \mu_A \neq \mu_B\)
  2. Calculate the pooled variance: \[ S_p^2 = \frac{(n_A - 1)S_A^2 + (n_B - 1)S_B^2}{n_A + n_B - 2} \] Substituting values: \[ S_p^2 = \frac{(15 - 1)(3^2) + (18 - 1)(4^2)}{15 + 18 - 2} = \frac{14(9) + 17(16)}{31} = \frac{126 + 272}{31} = \frac{398}{31} \approx 12.84 \]

  3. Calculate the test statistic: \[ t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{S_p^2 \left( \frac{1}{n_A} + \frac{1}{n_B} \right)}} \] Substituting values: \[ t = \frac{20 - 18}{\sqrt{12.84 \left( \frac{1}{15} + \frac{1}{18} \right)}} = \frac{2}{\sqrt{12.84 \left( 0.0667 + 0.0556 \right)}} = \frac{2}{\sqrt{12.84 \cdot 0.1223}} = \frac{2}{\sqrt{1.57}} \approx \frac{2}{1.25} \approx 1.6 \]

  4. Degrees of freedom: \[ df = n_A + n_B - 2 = 15 + 18 - 2 = 31 \]

  5. Critical value: From the \(t\)-table, \(t_{\text{critical}} = 2.042\) (for \(\alpha = 0.05\) and \(df = 31\)).

  6. Decision: Since \(|t| = 1.6 < 2.042\), we fail to reject \(H_0\). There is no significant difference in the mean heights of plants grown with the two fertilizers.

5.2.10 Solved Example 4: Paired Samples

Problem: A study measures the reaction times of 12 drivers before and after consuming alcohol. The differences in reaction times (in seconds) are: \([0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3]\). Test if alcohol significantly increases reaction time at \(\alpha = 0.01\).

Solution:

  1. State the hypotheses:

    • \(H_0: \mu_d = 0\)
    • \(H_a: \mu_d \neq 0\)
  2. Calculate the mean and standard deviation of differences:

    • Mean: \(\bar{d} = \frac{0.2 + 0.3 + 0.4 + ... + 1.3}{12} = \frac{9.0}{12} = 0.75\)

    • Standard deviation: \(s_d = \sqrt{\frac{\sum(d_i - \bar{d})^2}{n-1}}\) \[ s_d = \sqrt{\frac{(0.2 - 0.75)^2 + (0.3 - 0.75)^2 + ... + (1.3 - 0.75)^2}{11}} = \sqrt{\frac{1.43}{11}} \approx 0.361 \]

  3. Calculate the test statistic: \[ t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{0.75}{0.361 / \sqrt{12}} = \frac{0.75}{0.361 / 3.464} = \frac{0.75}{0.104} \approx 7.20 \]

  4. Degrees of freedom: \[ df = n - 1 = 12 - 1 = 11 \]

  5. Critical value: From the \(t\)-table, \(t_{\text{critical}} = 3.106\) (for \(\alpha = 0.01\) and \(df = 11\)).

  6. Decision: Since \(|t| = 7.20 > 3.106\), we reject \(H_0\). Alcohol significantly increases reaction time.
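
A quick R check on the raw differences (two-sided, matching the hypotheses above):

d <- c(0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3)
t.test(d, mu = 0)   # t is about 7.2 on 11 df; the p-value is well below 0.01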

5.2.11 Solved Example 5: Independent Samples (Unequal Variances)

Problem: A company compares the productivity of two teams. Team X produces 30 items with a mean output of 50 units and a standard deviation of 8 units. Team Y produces 25 items with a mean output of 45 units and a standard deviation of 10 units. Assume unequal variances. Perform a two-tailed test at \(\alpha = 0.05\).

Solution:

  1. State the hypotheses:

    • \(H_0: \mu_X = \mu_Y\)
    • \(H_a: \mu_X \neq \mu_Y\)
  2. Calculate the test statistic: Using Welch’s \(t\)-test: \[ t = \frac{\bar{x}_X - \bar{x}_Y}{\sqrt{\frac{S_X^2}{n_X} + \frac{S_Y^2}{n_Y}}} \] Substituting values: \[ t = \frac{50 - 45}{\sqrt{\frac{8^2}{30} + \frac{10^2}{25}}} \] \[ = \frac{5}{\sqrt{\frac{64}{30} + \frac{100}{25}}} = \frac{5}{\sqrt{2.13 + 4}} \]

    \[ = \frac{5}{\sqrt{6.13}} \approx \frac{5}{2.47} \approx 2.02 \]

  3. Degrees of freedom: Using the Welch-Satterthwaite formula: \[ df \approx \frac{\left(\frac{S_X^2}{n_X} + \frac{S_Y^2}{n_Y}\right)^2}{\frac{\left(\frac{S_X^2}{n_X}\right)^2}{n_X - 1} + \frac{\left(\frac{S_Y^2}{n_Y}\right)^2}{n_Y - 1}} \] Substituting values: \[ df \approx \frac{(2.13 + 4)^2}{\frac{2.13^2}{29} + \frac{4^2}{24}} \]

    \[ = \frac{6.13^2}{\frac{4.54}{29} + \frac{16}{24}}\] \[ = \frac{37.57}{0.157 + 0.667} \approx \frac{37.57}{0.824} \approx 45.6 \]

  4. Critical value: From the \(t\)-table, \(t_{\text{critical}} = 2.014\) (for \(\alpha = 0.05\) and \(df = 45\)).

  5. Decision: Since \(|t| = 2.02 > 2.014\), we reject \(H_0\). There is a significant difference in productivity between the two teams.

5.2.12 Solved Example 6: Paired Samples

Problem: A study measures the cholesterol levels of 8 patients before and after a new diet. The differences in cholesterol levels are: \([-10, -15, -20, -12, -18, -14, -16, -13]\). Test if the diet significantly reduces cholesterol at \(\alpha = 0.05\).

Solution:

  1. State the hypotheses:

    • \(H_0: \mu_d = 0\)
    • \(H_a: \mu_d \neq 0\)
  2. Calculate the mean and standard deviation of differences:

    • Mean: \(\bar{d} = \frac{-10 - 15 - 20 - 12 - 18 - 14 - 16 - 13}{8} = \frac{-118}{8} = -14.75\)
    • Standard deviation: \(s_d = \sqrt{\frac{\sum(d_i - \bar{d})^2}{n-1}}\) \[ s_d = \sqrt{\frac{(-10 + 14.75)^2 + (-15 + 14.75)^2 + ... + (-13 + 14.75)^2}{7}} \] \[ = \sqrt{\frac{22.56 + 0.06 + ... + 3.06}{7}} = \sqrt{\frac{73.5}{7}} = \sqrt{10.5} \approx 3.24 \]
  3. Calculate the test statistic: \[ t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{-14.75}{3.24 / \sqrt{8}} \] \[ = \frac{-14.75}{3.24 / 2.83} = \frac{-14.75}{1.145} \approx -12.88 \]

  4. Degrees of freedom: \[ df = n - 1 = 8 - 1 = 7 \]

  5. Critical value: From the \(t\)-table, \(t_{\text{critical}} = 2.365\) (for \(\alpha = 0.05\) and \(df = 7\)).

  6. Decision: Since \(|t| = 12.88 > 2.365\), we reject \(H_0\). The diet significantly reduces cholesterol levels.
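
A quick R check on the raw differences:

d <- c(-10, -15, -20, -12, -18, -14, -16, -13)
t.test(d, mu = 0)   # t is about -12.9 on 7 df; the p-value is far below 0.05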

5.2.13 Solved Example 7: Two-Tailed Test (Equal Variances Assumed)

A researcher wants to compare the effectiveness of two teaching methods. Method A is used on 25 students, Method B on 30 students. Test scores are recorded:

Method A: 78, 82, 85, 79, 83, 88, 76, 81, 84, 80, 82, 85, 79, 83, 87, 77, 82, 86, 80, 84, 78, 81, 85, 79, 83

Method B: 75, 79, 82, 76, 80, 84, 74, 78, 81, 77, 79, 83, 75, 80, 82, 76, 79, 81, 77, 80, 75, 78, 82, 76, 79, 81, 75, 78, 82, 76

Test at α = 0.05 if there’s a significant difference between methods.

Solution

Step 1: State Hypotheses

  • H₀: μ₁ = μ₂ (No difference in mean scores)

  • H₁: μ₁ ≠ μ₂ (Means differ significantly)

Step 2: Sample Statistics

  • Method A: n₁ = 25, x̄₁ = 81.6, s₁ = 3.24
  • Method B: n₂ = 30, x̄₂ = 78.9, s₂ = 2.98

Step 3: Pooled Variance

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(24)(10.50) + (29)(8.88)}{53} = \frac{252 + 257.5}{53} = 9.61 \]

\[ s_p = \sqrt{9.61} = 3.10 \]

Step 4: t-Statistic

\[ t = \frac{x̄_1 - x̄_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{2.7}{3.10 \times \sqrt{0.04 + 0.0333}} = \frac{2.7}{0.84} ≈ 3.21 \]

Step 5: Critical Value

  • Degrees of freedom: df = 25 + 30 - 2 = 53

  • α = 0.05 (two-tailed)

  • Critical t-value: ±2.006

Step 6: Decision

  • Since 3.21 > 2.006, reject H₀

Step 7: Conclusion

There is a significant difference in effectiveness between the two teaching methods.
t(53) = 3.21, p < 0.05

5.2.14 Solved Example 8: One-Tailed Test (Welch’s t-test, Unequal Variances)

A company tests two battery types. Type X (n=15) has mean life=120 hours, s=12. Type Y (n=20) has mean life=115 hours, s=8. Test at α=0.05 if Type X lasts longer.

Solution

Step 1: State Hypotheses

  • H₀: \(μ_X \le μ_Y\)

  • H₁: \(μ_X > μ_Y\)

Step 2: Given Values

  • Type X: \(n₁ = 15, x̄₁ = 120, s₁ = 12\)
  • Type Y: \(n₂ = 20, x̄₂ = 115, s₂ = 8\)

Step 3: Welch’s t-Statistic

\[ t = \frac{x̄_1 - x̄_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{5}{\sqrt{9.6 + 3.2}} = \frac{5}{\sqrt{12.8}} = \frac{5}{3.58} ≈ 1.40 \]

Step 4: Degrees of Freedom

\[ df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{(12.8)^2}{\frac{(9.6)^2}{14} + \frac{(3.2)^2}{19}} = \frac{163.84}{6.58 + 0.54} ≈ 23.01 \]

Step 5: Critical Value

  • df ≈ 23, α = 0.05 (one-tailed)
  • Critical t-value: 1.714

Step 6: Decision

  • Since 1.40 < 1.714, fail to reject H₀

Step 7: Conclusion

No significant evidence that Type X batteries last longer.
t(23) = 1.40, p > 0.05

5.2.15 Solved Example 9: Practical Application with Raw Data

A fitness trainer compares weight loss between two diets. Participants are randomly assigned:

Diet Plan A (n=12): 5.2, 4.8, 6.1, 5.5, 4.9, 5.8, 6.2, 5.1, 4.7, 5.9, 5.3, 5.6 kg

Diet Plan B (n=10): 4.1, 4.5, 3.9, 4.8, 4.0, 4.3, 3.7, 4.6, 4.2, 4.4 kg

Test at α=0.05 if Diet A leads to greater weight loss.

Solution

Step 1: State Hypotheses

  • H₀: μ_A ≤ μ_B
  • H₁: μ_A > μ_B

Step 2: Sample Statistics

  • Diet A: n₁ = 12, ∑x = 65.1, x̄ = 5.425, ∑x² = 355.99

  • Diet B: n₂ = 10, ∑x = 42.5, x̄ = 4.25, ∑x² = 181.65

Variance Calculations

\[ s_1^2 = \frac{∑x_1^2 - (∑x_1)^2 / n_1}{n_1 - 1} = \frac{355.99 - (65.1)^2 / 12}{11} = \frac{355.99 - 353.167}{11} = 0.2566 \]

\[ s_2^2 = \frac{∑x_2^2 - (∑x_2)^2 / n_2}{n_2 - 1} = \frac{181.65 - (42.5)^2 / 10}{9} = \frac{181.65 - 180.625}{9} = 0.1139 \]

Step 3: Pooled Variance

\[ s_p^2 = \frac{(11)(0.2566) + (9)(0.1139)}{20} = \frac{2.823 + 1.025}{20} = 0.1924 \quad s_p = \sqrt{0.1924} = 0.439 \]

Step 4: t-Statistic

\[ t = \frac{5.425 - 4.25}{0.439 \sqrt{\frac{1}{12} + \frac{1}{10}}} = \frac{1.175}{0.439 \times \sqrt{0.1833}} = \frac{1.175}{0.188} ≈ 6.26 \]

Step 5: Critical Value

  • df = 12 + 10 - 2 = 20

  • α = 0.05 (one-tailed)

  • Critical t-value: 1.725

Step 6: Decision

  • Since 6.26 > 1.725, reject H₀

Step 7: Conclusion

Diet A leads to significantly greater weight loss than Diet B.
t(20) = 6.26, p < 0.001
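
With the raw data available, the pooled one-tailed test can be verified in R:

dietA <- c(5.2, 4.8, 6.1, 5.5, 4.9, 5.8, 6.2, 5.1, 4.7, 5.9, 5.3, 5.6)
dietB <- c(4.1, 4.5, 3.9, 4.8, 4.0, 4.3, 3.7, 4.6, 4.2, 4.4)
t.test(dietA, dietB, var.equal = TRUE, alternative = "greater")
# t is about 6.26 on 20 df; the p-value is well below 0.001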

6 Exercise & Assignment

6.1 Part A: Two-Sample Z-Test Questions

6.1.1 Question 1: Product Quality Comparison

A consumer protection agency wants to compare the average weight of cereal boxes from two different brands. The population standard deviations are known from historical data.

Known Information:

Brand A: Population σ = 15 grams

Brand B: Population σ = 12 grams

Sample sizes: n₁ = 50 boxes of Brand A, n₂ = 45 boxes of Brand B

Sample means: x̄₁ = 498 grams, x̄₂ = 505 grams

Required

  • Test at α = 0.05 if there is a significant difference in average weights between the two brands.

  • Calculate the 95% confidence interval for the difference in means.

  • What sample size would be needed to detect a difference of 5 grams with 90% power?

6.1.2 Question 2: Manufacturing Process Evaluation

A factory has two production lines for making electrical components. The specification requires components to have a resistance of 100 ohms. Historical data shows known population standard deviations.

Data:

Line 1: n = 60, x̄ = 101.5 ohms, σ = 2.5 ohms

Line 2: n = 55, x̄ = 99.8 ohms, σ = 3.0 ohms

Required:

  • Test at α = 0.01 if Line 1 produces components with significantly higher resistance than Line 2.

  • Calculate the p-value for the test.

  • What is the probability of Type II error if the true difference in means is 1 ohm?

6.2 Part B: Two-Sample t-Test Questions

6.2.1 Question 3: Teaching Method Effectiveness

A school district wants to compare the effectiveness of traditional teaching methods versus technology-enhanced methods. Students were randomly assigned to two groups.

Test Scores Data: Traditional Group (n = 25): 78, 82, 75, 85, 80, 79, 83, 76, 81, 84, 77, 82, 79, 83, 78, 81, 80, 84, 76, 82, 79, 83, 77, 81, 80

Technology Group (n = 28): 85, 88, 82, 90, 86, 84, 87, 83, 89, 85, 81, 88, 84, 86, 83, 87, 85, 89, 84, 87, 82, 88, 86, 85, 87, 84, 88, 86

Required:

  • Perform a two-sample t-test at α = 0.05 to determine if technology-enhanced methods lead to higher scores.

  • Check the assumption of equal variances using an F-test.

  • Calculate Cohen’s d to measure effect size.

  • Interpret the results in educational context.

6.2.2 Question 4: Drug Efficacy Study

A pharmaceutical company is testing a new drug for cholesterol reduction. Patients are randomly assigned to treatment and control groups.

Cholesterol Reduction (mg/dL): Treatment Group (n = 20): 25, 28, 32, 35, 29, 31, 27, 34, 30, 33, 26, 29, 32, 36, 28, 31, 34, 27, 30, 33

Control Group (n = 18): 18, 20, 22, 19, 21, 23, 17, 20, 24, 19, 22, 18, 21, 23, 20, 22, 19, 21

Required:

  • Test at α = 0.01 if the treatment group shows significantly greater cholesterol reduction.

  • Should you use pooled or Welch’s t-test? Justify your choice.

  • Calculate the 99% confidence interval for the difference in means.

  • What are the practical implications of your findings?

6.2.3 Question 5: Independent Samples

A company tests two production methods. Method X produces 100 items with a mean weight of 15 kg and a standard deviation of 2 kg. Method Y produces 120 items with a mean weight of 14.5 kg and a standard deviation of 2.5 kg. Test if the mean weights differ at \(\alpha = 0.01\).

6.2.4 Question 6: Paired Samples

A fitness program measures the weights of 15 participants before and after a 6-month training period. The differences in weights are: \([-3, -2, -4, -1, -5, -3, -2, -4, -3, -2, -1, -3, -4, -2, -3]\). Test if the program significantly reduces weight at \(\alpha = 0.05\).

6.2.5 Question 7: Paired Samples

A psychologist studies the effect of a new therapy on stress levels. The stress scores (on a scale of 0 to 100) of 12 patients before and after the therapy are as follows:

Patient Before Therapy After Therapy
1 75 60
2 80 65
3 70 55
4 85 70
5 90 75
6 72 60
7 88 73
8 78 68
9 82 67
10 76 62
11 84 70
12 79 65

Perform a two-tailed test at \(\alpha = 0.01\) to determine if the therapy significantly reduces stress levels.

6.2.6 Question 8: Paired Samples

A nutritionist evaluates the effectiveness of a new diet plan on blood sugar levels. The blood sugar levels (in mg/dL) of 15 patients before and after the diet are recorded as follows:

Patient Before Diet After Diet
1 120 110
2 130 120
3 140 130
4 125 115
5 135 125
6 145 135
7 150 140
8 160 150
9 140 130
10 155 145
11 165 155
12 170 160
13 180 170
14 165 155
15 175 165

Perform a two-tailed test at \(\alpha = 0.05\) to determine if the diet significantly lowers blood sugar levels.

6.2.7 Question 9: Challenge Problem

Compare the performance of two algorithms on 50 datasets. Algorithm A has a mean accuracy of 85% with a standard deviation of 5%, while Algorithm B has a mean accuracy of 87% with a standard deviation of 6%. Assume unequal variances. Perform a two-tailed test at \(\alpha = 0.05\).

7 F-Tests (Variance Ratio test)

1. Introduction

The F-test is a statistical procedure used to compare the variances of two populations. It is based on the F-distribution and helps test hypotheses about whether two population variances are equal.

2. Types of F-Tests

a. Two-Sample F-Test for Variances

Used to determine if two populations have the same variance.
- H₀: \(\sigma_1^2 = \sigma_2^2\)
- H₁: \(\sigma_1^2 \ne \sigma_2^2\) (two-tailed), or
\(\sigma_1^2 > \sigma_2^2\), \(\sigma_1^2 < \sigma_2^2\) (one-tailed)

b. ANOVA F-Test

Used in Analysis of Variance to compare the means of three or more groups.
- Tests whether at least one group mean differs significantly
- Compares between-group variance to within-group variance

3. Assumptions of the F-Test

  • Populations are normally distributed
  • Samples are independent
  • Data is continuous (for variance comparison)

4. Steps for Conducting a Two-Sample F-Test for Variances

Step 1: State the Hypotheses

  • Null Hypothesis (H₀): \(\sigma_1^2 = \sigma_2^2\)
  • Alternative Hypothesis (H₁):
    • Two-tailed: \(\sigma_1^2 \ne \sigma_2^2\)
    • One-tailed: \(\sigma_1^2 > \sigma_2^2\) or \(\sigma_1^2 < \sigma_2^2\)

Step 2: Calculate the Test Statistic

\[F = \frac{s_1^2}{s_2^2} \]

Where:

  • \(s_1^2\) and \(s_2^2\) are the sample variances
  • By convention, place the larger variance in the numerator so that \(F \ge 1\)

Step 3: Determine the Degrees of Freedom

  • Numerator degrees of freedom: \(df_1 = n_1 - 1\)
  • Denominator degrees of freedom: \(df_2 = n_2 - 1\)

Step 4: Find the Critical Value

  • Use the F-distribution table
  • Input: \(df_1\), \(df_2\), and significance level \(\alpha\)
  • For two-tailed tests, use \(\alpha/2\) in each tail

Step 5: Make a Decision

  • Reject H₀ if \(F\) is greater than the critical value

  • Alternatively, use the p-value approach:

    • If \(p < \alpha\), reject H₀
    • If \(p \ge \alpha\), fail to reject H₀

7.0.1 Example 1: Two-Tailed F-Test for Equal Variances

A quality control manager wants to compare the consistency of two machines. Samples are taken from each machine:

Machine A (n=16): 102, 105, 98, 100, 103, 99, 101, 104, 97, 102, 100, 103, 99, 101, 105, 98

Machine B (n=13): 100, 98, 102, 97, 99, 101, 96, 100, 98, 103, 97, 99, 101

Test at α=0.05 if the variances differ significantly.

Solution

Step 1: State Hypotheses

  • H₀: \(\sigma_1^2 = \sigma_2^2\)
  • H₁: \(\sigma_1^2 \ne \sigma_2^2\)

Step 2: Sample Data

  • Machine A: n₁ = 16, ∑x = 1617, \(\bar{x}_1 = 101.06\), \(s_1^2 = 6.329\)

  • Machine B: n₂ = 13, ∑x = 1291, \(\bar{x}_2 = 99.31\), \(s_2^2 = 4.397\)

Step 3: Calculate F-Statistic

\[ F = \frac{6.329}{4.397} = 1.439 \]

Step 4: Critical Values

  • α = 0.05 (two-tailed)
  • df₁ = 15, df₂ = 12
  • Upper critical value: \(F_{0.025}(15,12) = 3.18\)
  • Lower critical value: \(F_{0.975}(15,12) = \frac{1}{F_{0.025}(12,15)} = \frac{1}{3.67} = 0.272\)

Step 5: Decision

  • Rejection region: F < 0.272 or F > 3.18
  • Since \(0.272 < 1.439 < 3.18\), fail to reject H₀

Step 6: Conclusion

No significant difference in variances between the two machines.
F(15,12) = 1.439, p > 0.05
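
In R, var.test() performs this two-sided F-test directly from the raw data:

machineA <- c(102, 105, 98, 100, 103, 99, 101, 104, 97, 102, 100, 103, 99, 101, 105, 98)
machineB <- c(100, 98, 102, 97, 99, 101, 96, 100, 98, 103, 97, 99, 101)
var.test(machineA, machineB)   # F is about 1.44 on (15, 12) df; p-value above 0.05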

7.0.2 Example 2: One-Tailed F-Test (Testing if One Variance is Greater)

A pharmaceutical company tests two production methods. They want to know if Method B has less variability than Method A. Samples:

Method A (n=11): 24.8, 25.2, 24.9, 25.5, 24.7, 25.1, 25.3, 24.6, 25.4, 24.8, 25.0 mg

Method B (n=9): 25.1, 25.0, 25.2, 25.1, 25.0, 25.2, 25.1, 25.0, 25.1 mg

Test at α=0.05 if Method B has significantly lower variance.

Solution

Step 1: State Hypotheses

  • H₀: \(\sigma_A^2 \le \sigma_B^2\)
  • H₁: \(\sigma_A^2 > \sigma_B^2\)

Step 2: Sample Data

  • Method A: n₁ = 11, ∑x = 275.3, \(\bar{x}_1 = 25.03\), \(s_A^2 = 0.0882\)
  • Method B: n₂ = 9, ∑x = 225.8, \(\bar{x}_2 = 25.09\), \(s_B^2 = 0.0061\)

Step 3: Calculate F-Statistic

\[F = \frac{0.0882}{0.0061} \approx 14.4 \]

Step 4: Critical Value

  • α = 0.05 (one-tailed)

  • df₁ = 10, df₂ = 8

  • Critical value: \(F_{0.05}(10,8) = 3.35\)

Step 5: Decision

  • Rejection region: F > 3.35

  • Since \(14.4 > 3.35\), reject H₀

Step 6: Conclusion

Method B has significantly lower variance than Method A. F(10,8) = 14.4, p < 0.05
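
R check with var.test(), using alternative = "greater" for H₁: σ_A² > σ_B²:

methodA <- c(24.8, 25.2, 24.9, 25.5, 24.7, 25.1, 25.3, 24.6, 25.4, 24.8, 25.0)
methodB <- c(25.1, 25.0, 25.2, 25.1, 25.0, 25.2, 25.1, 25.0, 25.1)
var.test(methodA, methodB, alternative = "greater")
# F is about 14.4 on (10, 8) df; the p-value is far below 0.05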

7.0.3 Example 3: F-Test as Preliminary Test for t-Test

A researcher wants to compare two teaching methods but first needs to check if equal variances can be assumed.

Group 1 (Traditional, n=21): Variance = 45.2

Group 2 (Experimental, n=18): Variance = 28.7

Test at α=0.10 if equal variances can be assumed for the subsequent t-test.

Solution

Step 1: State Hypotheses

  • H₀: \(\sigma_1^2 = \sigma_2^2\)
  • H₁: \(\sigma_1^2 \ne \sigma_2^2\)

Step 2: Sample Data

  • Group 1 (Traditional): n₁ = 21, variance = 45.2
  • Group 2 (Experimental): n₂ = 18, variance = 28.7

Step 3: Calculate F-Statistic

\[F = \frac{45.2}{28.7} = 1.575\]

Step 4: Critical Values

  • α = 0.10 (two-tailed)
  • df₁ = 20, df₂ = 17
  • Upper critical value: \(F_{0.05}(20,17) = 2.23\)
  • Lower critical value: \(F_{0.95}(20,17) = \frac{1}{F_{0.05}(17,20)} = \frac{1}{2.16} = 0.463\)

Step 5: Decision

  • Rejection region: F < 0.463 or F > 2.23

  • Since \(0.463 < 1.575 < 2.23\), fail to reject H₀

Step 6: Conclusion

Equal variances can be assumed. The researcher can proceed with a pooled t-test.

Key Formulas and Rules

F-Statistic

\[F = \frac{s_1^2}{s_2^2}\]

  • Always place the larger variance in the numerator

  • F ≥ 1 by construction

Degrees of Freedom

  • df₁ = n₁ - 1 (numerator)

  • df₂ = n₂ - 1 (denominator)

Critical Value Rules

  • Two-tailed test: Compare F to \(F_{\alpha/2}(df_1, df_2)\) and \(1/F_{\alpha/2}(df_2, df_1)\)
  • One-tailed test:
    • If testing \(\sigma_1^2 > \sigma_2^2\): compare F to \(F_\alpha(df_1, df_2)\)
    • If testing \(\sigma_1^2 < \sigma_2^2\): use \(F = s_2^2 / s_1^2\) and compare to \(F_\alpha(df_2, df_1)\)
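
Critical values can be taken from R's qf() instead of printed tables; for instance, the bounds used in Example 1 above (a quick sketch):

# Two-tailed F-test critical values at alpha = 0.05 with df1 = 15, df2 = 12
alpha <- 0.05
qf(1 - alpha/2, df1 = 15, df2 = 12)   # upper critical value
qf(alpha/2, df1 = 15, df2 = 12)       # lower critical value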

F-Distribution Properties

  • Right-skewed distribution

  • \(F(df_1, df_2) = \frac{1}{F(df_2, df_1)}\)

  • Requires normal population distributions

When to Use the F-Test

  • Preliminary to t-test: check equal variance assumption

  • Quality control: compare process variability

  • Method validation: test precision of different methods

  • Research studies: compare variability between groups

Assumptions

  • Independent samples

  • Normal distribution in both populations

  • Random sampling

Important Note The F-test is sensitive to non-normality. If data are not normal, consider using Levene’s test or Brown-Forsythe test instead.

8 Exercise & Assignment

8.0.1 Question 1: Quality Control Analysis

A manufacturing company produces electrical components using two different machines (Machine X and Machine Y). The quality control department wants to determine if there is a significant difference in the consistency (variance) of component weights between the two machines.

Data Collected

Random samples of components from each machine were weighed (in grams):

Machine X (n = 15): 45.2, 44.8, 45.5, 45.1, 44.9, 45.3, 45.0, 44.7, 45.4, 45.1, 44.8, 45.2, 45.0, 44.9, 45.3

Machine Y (n = 12): 45.1, 45.3, 44.9, 45.4, 45.0, 45.2, 44.8, 45.5, 45.1, 44.7, 45.0, 45.2

A manufacturing company compares the consistency of component weights from Machine X and Machine Y.

Required

  • Formulate the appropriate null and alternative hypotheses for testing whether the variances of component weights differ significantly between the two machines.

  • Calculate the sample variances for both machines.

  • Compute the F-test statistic for comparing the variances.

  • Determine the critical F-value at α = 0.05 significance level.

  • Make a statistical decision and state your conclusion in the context of the problem.

  • What practical implications would your conclusion have for the manufacturing process?

8.0.2 Question 2: Teaching Method Comparison

An educational researcher is investigating the effectiveness of two different teaching methods (Traditional vs. Interactive) on student performance. Before comparing the mean scores, the researcher needs to check if the assumption of equal variances is satisfied for conducting a two-sample t-test.

Data Collected Final exam scores (out of 100) from two randomly assigned student groups:

Traditional Method (n = 20): 78, 82, 75, 85, 80, 79, 83, 76, 81, 84, 77, 82, 79, 83, 78, 81, 80, 84, 76, 82

Interactive Method (n = 18): 85, 88, 82, 90, 86, 84, 87, 83, 89, 85, 81, 88, 84, 86, 83, 87, 85, 89

Required

  • State the hypotheses for testing the equality of variances between the two teaching methods.

  • Calculate descriptive statistics (mean, variance, standard deviation) for both groups.

  • Perform the F-test at α = 0.10 significance level.

  • Interpret the results in terms of the assumption for the subsequent t-test.

  • Based on your conclusion, which type of two-sample t-test (pooled or Welch’s) would be appropriate for comparing the mean scores? Justify your answer.

  • Discuss the limitations of using the F-test for checking equal variances assumption.

Solutions (Step by Step)

Assignment Question 1: Quality Control Analysis

Step 1: Hypotheses

  • H₀: \(\sigma_X^2 = \sigma_Y^2\) (Variances are equal)
  • H₁: \(\sigma_X^2 \ne \sigma_Y^2\) (Variances differ significantly)

Step 2: Sample Data

  • Machine X (n₁ = 15):
    45.2, 44.8, 45.5, 45.1, 44.9, 45.3, 45.0, 44.7, 45.4, 45.1, 44.8, 45.2, 45.0, 44.9, 45.3
    • Mean: \(\bar{x}_1 = 45.06\)
    • Variance: \(s_1^2 = 0.0674\)

  • Machine Y (n₂ = 12):
    45.1, 45.3, 44.9, 45.4, 45.0, 45.2, 44.8, 45.5, 45.1, 44.7, 45.0, 45.2
    • Mean: \(\bar{x}_2 = 45.08\)
    • Variance: \(s_2^2 = 0.0627\)

Step 3: F-Statistic

\[ F = \frac{s_1^2}{s_2^2} = \frac{0.0674}{0.0627} = 1.075 \]

Step 4: Critical Values

  • α = 0.05 (two-tailed)
  • df₁ = 14, df₂ = 11
  • Upper critical value: \(F_{0.025}(14,11) ≈ 3.29\)
  • Lower critical value: \(F_{0.975}(14,11) = \frac{1}{F_{0.025}(11,14)} ≈ \frac{1}{3.42} = 0.292\)

Step 5: Decision

  • Since \(0.292 < 1.075 < 3.29\), fail to reject H₀

Step 6: Conclusion

There is no significant difference in the variances of component weights between Machine X and Machine Y.
F(14,11) = 1.075, p > 0.05

Step 7: Practical Implications

Both machines show similar consistency in production. No adjustment is needed based on variance; focus can shift to mean output or other quality metrics.

Assignment Question 2: Teaching Method Comparison

Scenario: An educational researcher compares exam score variances between Traditional and Interactive teaching methods.

Step 1: Hypotheses

  • H₀: \(\sigma_T^2 = \sigma_I^2\)
  • H₁: \(\sigma_T^2 \ne \sigma_I^2\)

Step 2: Sample Data

  • Traditional (n₁ = 20):
    Mean = 80.05, Variance = 10.71, SD = 3.27

  • Interactive (n₂ = 18):
    Mean = 85.67, Variance = 6.59, SD = 2.57

Step 3: F-Statistic

\[ F = \frac{8.62}{6.59} = 1.308 \]

Step 4: Critical Values

  • α = 0.10 (two-tailed)
  • df₁ = 19, df₂ = 17
  • Upper critical value: \(F_{0.05}(19,17) ≈ 2.12\)
  • Lower critical value: \(F_{0.95}(19,17) = \frac{1}{F_{0.05}(17,19)} ≈ \frac{1}{2.17} = 0.461\)

Step 5: Decision

  • Since \(0.461 < 1.308 < 2.12\), fail to reject H₀

Step 6: Conclusion

Equal variances can be assumed: F(19, 17) = 1.308, p > 0.10

Step 7: Appropriate t-Test

Use the pooled two-sample t-test, since the equal-variances assumption holds (see the sketch below).
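
As a quick companion to this step, the sketch below (assuming SciPy; the score lists are copied from the problem data) shows how the choice is expressed in code: scipy.stats.ttest_ind performs the pooled test with equal_var=True and Welch's test with equal_var=False.

```python
from scipy import stats

traditional = [78, 82, 75, 85, 80, 79, 83, 76, 81, 84,
               77, 82, 79, 83, 78, 81, 80, 84, 76, 82]
interactive = [85, 88, 82, 90, 86, 84, 87, 83, 89, 85,
               81, 88, 84, 86, 83, 87, 85, 89]

# Pooled (Student's) two-sample t-test: assumes equal population variances
t_pooled, p_pooled = stats.ttest_ind(traditional, interactive, equal_var=True)

# Welch's t-test: does not assume equal variances
t_welch, p_welch = stats.ttest_ind(traditional, interactive, equal_var=False)

print(f"Pooled t-test:  t = {t_pooled:.3f}, p = {p_pooled:.4f}")
print(f"Welch's t-test: t = {t_welch:.3f}, p = {p_welch:.4f}")
```

Because the F-test did not reject equal variances, the pooled version is the one reported here; when the sample variances are this close, the two versions typically give very similar results.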

Step 8: Limitations of F-Test

  • Highly sensitive to departures from normality
  • May mislead if the data are skewed or contain outliers
  • More robust alternatives exist: Levene’s test and the Brown-Forsythe test (see the sketch below)
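
To illustrate the alternatives mentioned above, here is a minimal sketch (assuming SciPy; same exam-score data) of Levene's test. With center='median' it becomes the Brown-Forsythe variant, which is more robust to skewness and outliers than the F-test.

```python
from scipy import stats

traditional = [78, 82, 75, 85, 80, 79, 83, 76, 81, 84,
               77, 82, 79, 83, 78, 81, 80, 84, 76, 82]
interactive = [85, 88, 82, 90, 86, 84, 87, 83, 89, 85,
               81, 88, 84, 86, 83, 87, 85, 89]

# Classical Levene's test: deviations are taken from each group's mean
w_levene, p_levene = stats.levene(traditional, interactive, center='mean')

# Brown-Forsythe variant: deviations from the median, robust to outliers/skewness
w_bf, p_bf = stats.levene(traditional, interactive, center='median')

print(f"Levene (mean):           W = {w_levene:.3f}, p = {p_levene:.3f}")
print(f"Brown-Forsythe (median): W = {w_bf:.3f}, p = {p_bf:.3f}")
```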

9 Chi-Square Test

Chi-square tests are powerful tools for analyzing categorical data. The appropriate test depends on the study design, and the test’s assumptions should always be checked before interpreting the results.

Chi-Square Statistic

The general formula for the chi-square test statistic is:

\[ \chi^2 = \sum \frac{(O_i - E_i)^2}{E_i} \]

Where:

  • \(O_i\) = observed frequency in category \(i\)
  • \(E_i\) = expected frequency in category \(i\)

Degrees of Freedom

  • Goodness of Fit:

\[df = k - 1 \]

Where \(k\) is the number of categories

  • Test of Independence / Homogeneity:

\[df = (r - 1)(c - 1)\]

Where \(r\) = number of rows, \(c\) = number of columns

Assumptions

  • Observations are independent
  • Sample size is adequate (all expected frequencies ≥ 5)
  • Data are categorical
  • Sampling is random

Expected Frequency Calculation

  • Goodness of Fit:

\[E = n \times p\]

Where:

  • \(n\) = total sample size

  • \(p\) = expected proportion for each category

  • Independence / Homogeneity:

\[E = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} \]
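
As a numerical companion to these formulas, the sketch below (assuming NumPy; the 2×2 table is an invented illustration, not data from this section) computes the expected frequency for every cell from the row and column totals, then the chi-square statistic and its degrees of freedom.

```python
import numpy as np

# Toy 2x2 table of observed counts (illustrative numbers only)
observed = np.array([[30, 20],
                     [10, 40]])

row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
grand_total = observed.sum()

# E_ij = (row total x column total) / grand total, computed for every cell at once
expected = np.outer(row_totals, col_totals) / grand_total

# Chi-square statistic: sum over all cells of (O - E)^2 / E
chi_sq = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom for an independence/homogeneity test: (r - 1)(c - 1)
df = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print("Expected frequencies:\n", expected)
print(f"chi-square = {chi_sq:.3f}, df = {df}")
```

The same pattern works for any r × c table; only the shape of the observed array changes.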

The chi-square test can be used for three purposes: goodness of fit, independence, and homogeneity.

Goodness-of-Fit test: to see if sample data fit a population with a specific distribution.

  • Compare observed distribution to a theoretical distribution
  • Test if a sample follows specific proportions
  • Example: Testing if candy colors match claimed proportions

Test of Independence: to determine if there is a significant association between two categorical variables.

  • Determine if two categorical variables are related
  • Use when both variables are measured on the same subjects
  • Example: Testing if gender is related to product preference

Test of Homogeneity: to determine if different populations have the same distribution of a single categorical variable.

  • Compare distributions across different populations
  • Use when samples are drawn from separate groups
  • Example: Comparing pass rates across different teaching methods

We’ll provide one example for goodness-of-fit, one for independence, and one for homogeneity.


9.0.1 Example 1: Chi-Square Goodness of Fit Test

A candy company claims that their mixed candy bags contain 30% red, 25% green, 20% yellow, 15% blue, and 10% orange candies. A sample of 200 candies is taken:

Observed counts: Red = 70 | Green = 45 | Yellow = 38 | Blue = 30 | Orange = 17

Expected proportions: 0.30 | 0.25 | 0.20 | 0.15 | 0.10

Test at α = 0.05 whether the sample matches the claimed distribution.

Solution

Step 1: State Hypotheses

  • Null Hypothesis (H₀): The candy distribution matches the claimed proportions
  • Alternative Hypothesis (H₁): The candy distribution does not match the claimed proportions

Step 2: Calculate Expected Frequencies

Total sample size: \(n = 200\)

| Color  | Claimed Proportion | Expected Frequency       |
|--------|--------------------|--------------------------|
| Red    | 0.30               | \(200 \times 0.30 = 60\) |
| Green  | 0.25               | \(200 \times 0.25 = 50\) |
| Yellow | 0.20               | \(200 \times 0.20 = 40\) |
| Blue   | 0.15               | \(200 \times 0.15 = 30\) |
| Orange | 0.10               | \(200 \times 0.10 = 20\) |

Step 3: Calculate Chi-Square Statistic

Use the formula:

\[ \chi^2 = \sum \frac{(O - E)^2}{E}\]

| Color  | Observed (O) | Expected (E) | \(O - E\) | \((O - E)^2\) | \(\frac{(O - E)^2}{E}\) |
|--------|--------------|--------------|-----------|---------------|-------------------------|
| Red    | 70           | 60           | 10        | 100           | 1.667                   |
| Green  | 45           | 50           | -5        | 25            | 0.500                   |
| Yellow | 38           | 40           | -2        | 4             | 0.100                   |
| Blue   | 30           | 30           | 0         | 0             | 0.000                   |
| Orange | 17           | 20           | -3        | 9             | 0.450                   |

\[ \chi^2 = 1.667 + 0.500 + 0.100 + 0.000 + 0.450 = 2.717 \]

Step 4: Determine Critical Value

  • Degrees of freedom: \(df = k - 1 = 5 - 1 = 4\)
  • Significance level: \(\alpha = 0.05\)
  • Critical value from chi-square table: \(\chi^2_{0.05, 4} = 9.488\)

Step 5: Compare and Decide

  • Since \(2.717 < 9.488\), the test statistic is not in the rejection region
  • Fail to reject H₀

Step 6: Conclusion

There is no significant evidence that the candy distribution differs from the claimed proportions.
Chi-square(4) = 2.717, p > 0.05

Notes

  • This is a goodness-of-fit test comparing observed frequencies to expected frequencies under a specified distribution.
  • Assumes random sampling and that expected frequencies are all ≥ 5.
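
The same goodness-of-fit test takes only a few lines in Python; a minimal sketch, assuming SciPy is available:

```python
from scipy import stats

observed = [70, 45, 38, 30, 17]               # Red, Green, Yellow, Blue, Orange
claimed = [0.30, 0.25, 0.20, 0.15, 0.10]      # proportions stated by the company
n = sum(observed)                             # 200 candies

# Expected counts under H0: E = n * p for each color
expected = [n * p for p in claimed]           # [60, 50, 40, 30, 20]

# Goodness-of-fit test; df = k - 1 = 4 by default
chi_sq, p_value = stats.chisquare(f_obs=observed, f_exp=expected)

print(f"chi-square = {chi_sq:.3f}, p-value = {p_value:.3f}")
```

The reported statistic matches the hand calculation (≈ 2.72), and the p-value is well above 0.05, consistent with failing to reject H₀.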

9.0.2 Example 2: Chi-Square Test of Independence

A researcher wants to test if there’s a relationship between gender and preference for a new product. Survey results:

| Gender | Like | Neutral | Dislike | Total |
|--------|------|---------|---------|-------|
| Male   | 40   | 30      | 20      | 90    |
| Female | 35   | 45      | 30      | 110   |
| Total  | 75   | 75      | 50      | 200   |

Test at α = 0.05 whether gender and product preference are independent.

Solution

Step 1: State Hypotheses

  • Null Hypothesis (H₀): Gender and product preference are independent
  • Alternative Hypothesis (H₁): Gender and product preference are not independent

Step 2: Observed Frequencies

| Gender | Like | Neutral | Dislike | Total |
|--------|------|---------|---------|-------|
| Male   | 40   | 30      | 20      | 90    |
| Female | 35   | 45      | 30      | 110   |
| Total  | 75   | 75      | 50      | 200   |

Step 3: Calculate Expected Frequencies

Use the formula:

\[ E_{ij} = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} \]

| Gender | Like | Neutral | Dislike |
|--------|------|---------|---------|
| Male   | \(\frac{90 \times 75}{200} = 33.75\)  | \(\frac{90 \times 75}{200} = 33.75\)  | \(\frac{90 \times 50}{200} = 22.5\)  |
| Female | \(\frac{110 \times 75}{200} = 41.25\) | \(\frac{110 \times 75}{200} = 41.25\) | \(\frac{110 \times 50}{200} = 27.5\) |

Step 4: Calculate Chi-Square Statistic

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

| Group  | Category | O  | E     | \(O - E\) | \((O - E)^2\) | \(\frac{(O - E)^2}{E}\) |
|--------|----------|----|-------|-----------|---------------|-------------------------|
| Male   | Like     | 40 | 33.75 | 6.25      | 39.06         | 1.157                   |
| Male   | Neutral  | 30 | 33.75 | -3.75     | 14.06         | 0.417                   |
| Male   | Dislike  | 20 | 22.5  | -2.5      | 6.25          | 0.278                   |
| Female | Like     | 35 | 41.25 | -6.25     | 39.06         | 0.947                   |
| Female | Neutral  | 45 | 41.25 | 3.75      | 14.06         | 0.341                   |
| Female | Dislike  | 30 | 27.5  | 2.5       | 6.25          | 0.227                   |

\[ \chi^2 = 1.157 + 0.417 + 0.278 + 0.947 + 0.341 + 0.227 = 3.367 \]

Step 5: Determine Critical Value

  • Degrees of freedom: \(df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2\)
  • Significance level: \(\alpha = 0.05\)
  • Critical value from chi-square table: \(\chi^2_{0.05, 2} = 5.991\)

Step 6: Compare and Decide

  • Since \(3.367 < 5.991\), the test statistic is not in the rejection region
  • Fail to reject H₀

Step 7: Conclusion

There is no significant evidence of a relationship between gender and product preference.
Chi-square(2) = 3.367, p > 0.05

Notes

  • This is a test of independence using a contingency table.
  • Assumes random sampling and expected frequencies ≥ 5 in all cells.
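
For comparison with the hand calculation, here is a minimal sketch (assuming SciPy) that runs the same test with scipy.stats.chi2_contingency, which returns the statistic, p-value, degrees of freedom, and the table of expected frequencies.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows = Male, Female; columns = Like, Neutral, Dislike
observed = np.array([[40, 30, 20],
                     [35, 45, 30]])

chi_sq, p_value, df, expected = chi2_contingency(observed)

print("Expected frequencies:\n", expected)
print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.3f}")
```

Because the table is larger than 2×2, no continuity correction is applied, so the output agrees with the hand calculation (\(\chi^2 \approx 3.37\), df = 2) and the p-value exceeds 0.05.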

9.0.3 Example 3: Chi-Square Test of Homogeneity

Three different teaching methods are used in different classes. Test scores are categorized as Pass/Fail:

| Method   | Pass | Fail | Total |
|----------|------|------|-------|
| Method A | 45   | 15   | 60    |
| Method B | 50   | 20   | 70    |
| Method C | 55   | 15   | 70    |
| Total    | 150  | 50   | 200   |

Test at α = 0.05 whether the pass rates are the same across all methods.

Solution

Step 1: State Hypotheses

  • Null Hypothesis (H₀): The pass rates are the same across all teaching methods
  • Alternative Hypothesis (H₁): The pass rates differ across teaching methods

Step 2: Observed Frequencies

| Method   | Pass | Fail | Total |
|----------|------|------|-------|
| Method A | 45   | 15   | 60    |
| Method B | 50   | 20   | 70    |
| Method C | 55   | 15   | 70    |
| Total    | 150  | 50   | 200   |

Step 3: Calculate Expected Frequencies

Use the formula:

\[ E_{ij} = \frac{(\text{Row Total}) \times (\text{Column Total})}{\text{Grand Total}} \]

| Method   | Expected Pass                        | Expected Fail                       |
|----------|--------------------------------------|-------------------------------------|
| Method A | \(\frac{60 \times 150}{200} = 45\)   | \(\frac{60 \times 50}{200} = 15\)   |
| Method B | \(\frac{70 \times 150}{200} = 52.5\) | \(\frac{70 \times 50}{200} = 17.5\) |
| Method C | \(\frac{70 \times 150}{200} = 52.5\) | \(\frac{70 \times 50}{200} = 17.5\) |

Step 4: Calculate Chi-Square Statistic

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

| Method   | Category | O  | E    | \(O - E\) | \((O - E)^2\) | \(\frac{(O - E)^2}{E}\) |
|----------|----------|----|------|-----------|---------------|-------------------------|
| Method A | Pass     | 45 | 45   | 0         | 0             | 0.000                   |
| Method A | Fail     | 15 | 15   | 0         | 0             | 0.000                   |
| Method B | Pass     | 50 | 52.5 | -2.5      | 6.25          | 0.119                   |
| Method B | Fail     | 20 | 17.5 | 2.5       | 6.25          | 0.357                   |
| Method C | Pass     | 55 | 52.5 | 2.5       | 6.25          | 0.119                   |
| Method C | Fail     | 15 | 17.5 | -2.5      | 6.25          | 0.357                   |

\[ \chi^2 = 0.000 + 0.000 + 0.119 + 0.357 + 0.119 + 0.357 = 0.952 \]

Step 5: Determine Critical Value

  • Degrees of freedom: \(df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2\)
  • Significance level: \(\alpha = 0.05\)
  • Critical value from chi-square table: \(\chi^2_{0.05, 2} = 5.991\)

Step 6: Compare and Decide

  • Since \(0.952 < 5.991\), the test statistic is not in the rejection region
  • Fail to reject H₀

Step 7: Conclusion

There is no significant evidence that pass rates differ across teaching methods.
Chi-square(2) = 0.952, p > 0.05
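
Computationally, the homogeneity test uses exactly the same machinery as the independence test; a minimal sketch assuming SciPy:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows = Method A, B, C; columns = Pass, Fail
observed = np.array([[45, 15],
                     [50, 20],
                     [55, 15]])

chi_sq, p_value, df, expected = chi2_contingency(observed)

print("Expected frequencies:\n", expected)
print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.3f}")
```

The output matches the hand calculation (\(\chi^2 \approx 0.95\), df = 2), and the large p-value again supports failing to reject H₀.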