Semester 1, 2025 - University of Sydney
Course: STAT5002 Introduction to Statistics
Lecturer: Tiangang Cui
# Given Data
test.A <- c(5.90, 5.26, 2.97, 7.15, 10.06, 11.87, 1.94, 6.27, 6.81, 4.08,
8.13, 15.18, 8.82, 3.87, 5.23, 11.29, 7.92, 12.82, 7.20, 10.03)
test.B <- c(6.07, 4.89, 2.92, 7.00, 9.99, 11.70, 1.94, 5.86, 6.95, 4.03,
7.76, 15.02, 9.08, 3.73, 4.88, 10.81, 8.05, 12.96, 7.10, 10.07)
# Calculate differences D = test.A - test.B
D_1 <- test.A - test.B
# Number of pairs
n_1 <- length(D_1)
Let μ_D represent the true mean difference in braking deceleration (new tire minus standard tire) for a vehicle. The engineer conducts a paired test in which each vehicle is tested twice: once with the new tire and once with the standard tire.
We are interested in testing whether the new tire provides a statistically significant improvement in braking. Improvement means higher deceleration (vehicles stop faster).
Null Hypothesis (H₀): μ_D = 0
There is no difference in mean braking deceleration between the new and standard tires. Any observed difference is due to random variation.
Alternative Hypothesis (H₁): μ_D > 0
The mean braking deceleration with the new tire is greater than that with the standard tire. This implies that the new tire improves braking performance on average.
A paired t-test is the right choice here because the same vehicle is tested with both the new and standard tires. So, the data is naturally paired. This helps cancel out the differences between vehicles and lets us directly compare how each one performs with the two tire types. It gives more accurate results by reducing random variation that would be missed if we treated the data as independent.
# Boxplot to check for outliers
boxplot(D_1, horizontal = TRUE, main = "Boxplot of Differences")
hist(D_1, breaks = 10, main = "Histogram of Differences (New - Standard)",
xlab = "Difference in Deceleration (m/s^2)", col = "green")
qqnorm(D_1, main = "Q-Q Plot of Differences")
qqline(D_1, col = "brown")
The differences in braking deceleration (New - Standard) appear to follow a roughly normal distribution. The histogram is fairly symmetric and centered near zero with no extreme outliers, and the Q-Q plot shows most points close to the line, with only minor deviations in the tails. The boxplot likewise shows no outliers.
Thus, the normality assumption is reasonably met, and it is appropriate to use the paired t-test.
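As an optional numerical supplement to these graphical checks (a sketch, not required by the question; shapiro.test() is base R's Shapiro-Wilk normality test), a formal normality test can be applied to the differences. A large p-value would be consistent with the normality assumption.
# Shapiro-Wilk normality test on the differences (supplementary check)
shapiro.test(D_1)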
Formula for the paired t-test statistic (here \(\mu_0 = 0\) under \(H_0\)): \[ T = \frac{\bar{X} - \mu_0}{SE_0(\bar{X})} = \frac{\bar{X} - \mu_0}{\frac{\hat{\sigma}}{\sqrt{n}}} \]
\[ \hat{\sigma} = \sqrt{ \frac{1}{n - 1} \sum_{i = 1}^{n} (X_i - \bar{X})^2 } \]
mean_D_1 <- mean(D_1) # Mean difference
sd_D_1 <- sd(D_1) # Std dev of differences
# Standard error of the mean difference
se_D_1 <- sd_D_1 / sqrt(n_1)
# Compute t-statistic (mu_0 = 0 under H0)
t_stat_1 <- mean_D_1 / se_D_1
# Degrees of freedom
df_1 <- n_1 - 1
# Calculate one-sided p-value (right tail)
p_value_1 <- 1 - pt(t_stat_1, df_1)
# 95% confidence interval
alpha <- 0.05
t_crit <- qt(1 - alpha/2, df = df_1)
CI_95 <- mean_D_1 + c(-1, 1) * t_crit * se_D_1
# Print results
cat("Mean difference:", round(mean_D_1, 3), "\n")
## Mean difference: 0.099
cat("Standard deviation:", round(sd_D_1, 3), "\n")
## Standard deviation: 0.214
cat("t-statistic:", round(t_stat_1, 3), "\n")
## t-statistic: 2.082
cat("Degrees of freedom:", df_1, "\n")
## Degrees of freedom: 19
cat("One-sided p-value:", round(p_value_1, 4), "\n")
## One-sided p-value: 0.0255
cat("95% Confidence Interval: (", round(CI_95[1], 4), ",", round(CI_95[2], 4), ")\n")
## 95% Confidence Interval: ( -5e-04 , 0.1995 )
The test statistic for the paired t-test is 2.082. This value follows a t-distribution with 19 degrees of freedom if the null hypothesis is true.
The one-sided p-value for this test statistic is 0.0255.
A large, positive test statistic provides evidence against the null hypothesis and supports the alternative that the new tire has a higher mean braking deceleration.
Since our test statistic (2.082) is quite large and the p-value (0.0255) is less than 0.05, we reject the null hypothesis.
Conclusion: There is enough evidence to say the new tire improves braking deceleration compared to the standard tire.
The p-value is 0.0255. Since this is less than the common 5% significance level, we reject the null hypothesis and conclude that the new tire significantly improves braking deceleration.
However, at a stricter level such as 2%, the p-value (0.0255) would exceed the threshold, so we would not have enough evidence to conclude that the new tire is better.
In short, at the usual 5% level there is enough evidence that the new tire improves braking, and we reject the null hypothesis.
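As a cross-check on the manual working above (a sketch, not part of the original calculation), R's built-in paired t-test with a one-sided alternative should reproduce the same t-statistic, degrees of freedom, and p-value.
# Built-in paired t-test, one-sided alternative (new tire > standard tire)
t.test(test.A, test.B, paired = TRUE, alternative = "greater")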
D_centered_1 <- D_1 - mean(D_1) # Centre the differences so resampling reflects H0 (true mean difference = 0)
set.seed(123) # For reproducibility
B_1 <- 10000 # Number of bootstrap samples
t_boot_1 <- numeric(B_1)
for (i in 1:B_1) {
# Resample differences with replacement
sample_D_1 <- sample(D_centered_1, size = n_1, replace = TRUE)
# Calculate sample mean and sd of bootstrap sample
mean_boot_1 <- mean(sample_D_1)
sd_boot_1 <- sd(sample_D_1)
# Compute bootstrap t-statistic
t_boot_1[i] <- mean_boot_1 / (sd_boot_1 / sqrt(n_1))
}
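Equivalently, the loop above can be written more compactly with replicate(); this is only a sketch of an alternative form, and under the same seed the random draws occur in the same order, so the resulting statistics should match t_boot_1.
# Compact equivalent of the bootstrap loop using replicate()
set.seed(123)
t_boot_alt <- replicate(B_1, {
  s <- sample(D_centered_1, size = n_1, replace = TRUE)
  mean(s) / (sd(s) / sqrt(n_1))
})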
# Plot histogram of bootstrap t-statistics
hist(t_boot_1, freq = FALSE, breaks = 40, col = "blue",
main = "Bootstrap Distribution of t-statistics",
xlab = "Bootstrap t-statistic")
# Add theoretical t-distribution curve
curve(dt(x, df = n_1 - 1), add = TRUE, col = "darkred", lwd = 2, lty = 2)
# Add standard normal distribution curve
curve(dnorm(x),add = TRUE, col = "darkgreen", lwd = 2, lty = 3)
abline(v = t_stat_1, col = "pink", lwd = 2) # observed t-statistic
The histogram of the 10,000 bootstrap t-statistics (using freq = FALSE) displays a bell-shaped distribution centered around zero, consistent with the null hypothesis. The bootstrap was performed using centered differences, ensuring the distribution reflects the null scenario (mean difference = 0).
Overlaid are two theoretical curves: the t-distribution with 19 degrees of freedom (in red) and the standard normal distribution (in green). Both curves align well with the bootstrap histogram, especially the t-distribution.
The observed t-statistic (2.082) is marked in pink and lies in the right tail, indicating moderate evidence against the null. The close match between the curves and the bootstrap distribution supports the use of the paired t-test and its assumptions in this case.
p_boot_1 <- mean(t_boot_1 >= t_stat_1)
cat("Bootstrap p-value:", round(p_boot_1, 4), "\n")
## Bootstrap p-value: 0.0208
Conclusion: Since the bootstrap p-value ≈ 0.0208 < 0.05, we reject H₀ based on the bootstrap method.
The bootstrap p-value of 0.0208 is smaller than the typical significance level of 0.05. This means that under the null hypothesis (that there is no difference in braking performance between the two tires), observing a test statistic as large as or larger than 2.082 is uncommon in the simulated data.
Summary: I performed a paired t-test to determine whether the new tire improves braking deceleration compared to the standard tire, using data from the same set of vehicles tested under both conditions.
Paired t-test Result: t-statistic = 2.082, one-sided p-value = 0.0255.
Since this p-value is less than 0.05, the result is statistically significant under the classical t-test assumption.
This would lead us to reject the null hypothesis and conclude that the new tire significantly improves braking deceleration.
Bootstrap Simulation Result: Bootstrap p-value = 0.0208
This value is also less than 0.05, indicating that the observed t-statistic is unlikely under the null hypothesis when we rely on the empirical distribution of the data.
This suggests we reject the null hypothesis based on the bootstrap method as well.
The close agreement between the theoretical and bootstrap-based conclusions implies that:
The normality assumption required for the t-test appears to be reasonable for the differences in deceleration.
The bootstrap method, which makes fewer assumptions, confirms the reliability of the classical test result in this case.
Final Verdict: Both the paired t-test (p = 0.0255) and the bootstrap method (p = 0.0208) lead to the same conclusion. Since each p-value is below the significance level \(\alpha = 0.05\), we reject the null hypothesis. There is sufficient statistical evidence that the new tire provides greater average braking deceleration than the standard tire, which supports the automotive engineer's claim that the new tire improves vehicle braking performance.
#Given data
group.A_2 <- c(5.54, 4.41, 6.35, 5.04, 7.33, 6.47, 4.08, 6.00, 7.39, 5.53, 1.54, 6.16,
4.23, 2.36, 5.09, 5.10, 5.33, 3.75, 6.49, 2.13, 5.44, 7.74, 3.80)
group.B_2 <- c(4.31, 6.20, 5.25, 2.14, 3.26, 1.47, 2.24, 4.20, 3.56, 3.68, 7.02, 2.94,
5.49, 3.37, 4.59, 3.05, 5.24)
##(a) State the null and alternative hypotheses to test whether the two programs have the same effect on student performance. In answering, introduce appropriate parameters, as well as a null and alternative hypothesis in terms of these parameters.
Let μ_A be the average student performance for the Online Tutoring group (Group A), and μ_B be the average student performance for the In-Person Tutoring group (Group B).
We want to check if the tutoring method actually makes a difference in how well students perform.
Null Hypothesis (H₀): μ_A - μ_B = 0
This means: there is no real difference in the average performance between students who got online tutoring and those who got in-person tutoring. In other words, the type of tutoring doesn’t matter.
Alternative Hypothesis (H₁): μ_A - μ_B ≠ 0
This means: there is a difference in the average performance between the two groups. So, the kind of tutoring does affect how students perform.
##(b) Use appropriate graphical and numerical summaries to assess whether the necessary assumptions for applying the classical two-sample t-test are satisfied.
Assumptions for the classical two-sample t-test: (1) the samples are independent; (2) each group's scores are approximately normally distributed; (3) the variances of the two groups are equal (homoscedasticity).
boxplot(group.A_2, group.B_2, names=c("Online", "In-Person"),
main="Boxplot of Student Performance", col=c("brown", "yellow"), horizontal=TRUE)
hist(group.A_2, main="Histogram: Group A (Online)", xlab="Scores", col="brown")
hist(group.B_2, main="Histogram: Group B (In-Person)", xlab="Scores", col="yellow")
qqnorm(group.A_2, main="Q-Q Plot: Group A")
qqline(group.A_2)
qqnorm(group.B_2, main="Q-Q Plot: Group B")
qqline(group.B_2)
Numerical Summaries
mean.A_2 <- mean(group.A_2)
mean.B_2 <- mean(group.B_2)
sd.A_2 <- sd(group.A_2)
sd.B_2 <- sd(group.B_2)
n.A_2 <- length(group.A_2)
n.B_2 <- length(group.B_2)
cat("Mean A (Online):", round(mean.A_2, 3), "\n")
## Mean A (Online): 5.1
cat("Mean B (In-Person):", round(mean.B_2, 3), "\n")
## Mean B (In-Person): 4.001
cat("SD A:", round(sd.A_2, 3), "\n")
## SD A: 1.649
cat("SD B:", round(sd.B_2, 3), "\n")
## SD B: 1.5
cat("Sample sizes A and B:", n.A_2, ",", n.B_2, "\n")
## Sample sizes A and B: 23 , 17
Interpretation: The boxplots show the medians, spreads, and potential outliers of the two groups; the histograms give a rough idea of each group's distributional shape; and the Q-Q plots compare the sample quantiles against theoretical normal quantiles, with points lying roughly on the line indicating approximate normality.
From these plots, both groups appear roughly symmetric with no severe departures from normality, and the sample standard deviations (1.649 vs. 1.500) are similar, so the independence, normality, and equal-variance assumptions for the classical two-sample t-test appear reasonable.
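As an optional numerical supplement to the equal-variance check (a sketch, not required by the question; var.test() is base R's F test for comparing two variances), the two sample variances can also be compared formally. A large p-value would be consistent with the equal-variance assumption.
# F test comparing the variances of the two groups (supplementary check)
var.test(group.A_2, group.B_2)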
##(c) Write down the formula for the test statistic of the classical two-sample t-test, and calculate the observed test statistic. Show your working step by step, rounding each step to three decimal places.
Two-Sample t-Test Formula:
We use the following test statistic for comparing two means:
\[ T = \frac{\bar{X} - \bar{Y}}{\hat{\sigma}_p \sqrt{\frac{1}{m} + \frac{1}{n}}} \sim t_{m+n-2} \]
Where the pooled standard deviation is:
\[ \hat{\sigma}_p = \sqrt{\frac{\sum_{i=1}^{m}(X_i - \bar{X})^2 + \sum_{j=1}^{n}(Y_j - \bar{Y})^2}{m + n - 2}} = \sqrt{\frac{(m - 1)\hat{\sigma}_X^2 + (n - 1)\hat{\sigma}_Y^2}{m + n - 2}} \]
sp2_2 <- ((n.A_2 - 1) * sd.A_2^2 + (n.B_2 - 1) * sd.B_2^2) / (n.A_2 + n.B_2 - 2)
sp_2 <- sqrt(sp2_2)
t_stat_2 <- (mean.A_2 - mean.B_2) / (sp_2 * sqrt(1/n.A_2 + 1/n.B_2))
df_2 <- n.A_2 + n.B_2 - 2
cat("Pooled variance:", round(sp2_2, 3), "\n")
## Pooled variance: 2.522
#observed t-statistics
cat("Observed t-statistic:", round(t_stat_2, 3), "\n")
## Observed t-statistic: 2.165
# Calculate two-tailed p-value
p_value_2 <- 2 * (1 - pt(abs(t_stat_2), df_2))
# Display p-value
cat("p-value:", round(p_value_2, 4), "\n")
## p-value: 0.0368
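As a cross-check on the manual working (a sketch; the unequal-variance Welch version is run in part (e) below), R's built-in two-sample t-test with var.equal = TRUE should reproduce the same pooled t-statistic and two-sided p-value.
# Built-in classical (pooled-variance) two-sample t-test
t.test(group.A_2, group.B_2, var.equal = TRUE)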
##(d) Construct the critical region of rejection at the 5% level of significance. What is your conclusion of the hypothesis test based on the critical region?
t_crit_2 <- qt(0.975, df_2)
cat("Critical t-value (±):", round(t_crit_2, 3), "\n")
## Critical t-value (±): 2.024
# Critical region
critical_region_lower_2 <- -t_crit_2
critical_region_upper_2 <- t_crit_2
critical_region_2 <- c(-1, 1) * t_crit_2
cat("Critical Region: t <", round(critical_region_lower_2, 3), "or t >", round(critical_region_upper_2, 3), "\n")
## Critical Region: t < -2.024 or t > 2.024
# Confidence interval
mean_diff_2 <- mean.A_2 - mean.B_2
margin_error_2 <- t_crit_2 * sp_2 * sqrt(1/n.A_2 + 1/n.B_2)
CI_95_2 <- c(mean_diff_2 - margin_error_2, mean_diff_2 + margin_error_2)
cat("95% Confidence Interval for Mean Difference: [", round(CI_95_2[1], 3), ",", round(CI_95_2[2], 3), "]\n")
## 95% Confidence Interval for Mean Difference: [ 0.071 , 2.128 ]
# Hypothesis test conclusion
if (t_stat_2 < -t_crit_2 | t_stat_2 > t_crit_2) {
cat("Conclusion: Reject the null hypothesis at the 5% significance level.\n")
} else {
cat("Conclusion: Fail to reject the null hypothesis at the 5% significance level.\n")
}
## Conclusion: Reject the null hypothesis at the 5% significance level.
Decision:
If |t_stat| > t_crit → Reject H₀.
In this case, |t_stat| ≈ 2.165 > 2.024 → Reject H₀. Consistently, the 95% confidence interval for the mean difference, [0.071, 2.128], does not contain 0.
##(e) Now conduct a Welch test using R, what is the computed P-value, how does it compare with the classical two-sample t-test?
welch_test_2 <- t.test(group.A_2, group.B_2, var.equal = FALSE)
welch_test_2
##
## Welch Two Sample t-test
##
## data: group.A_2 and group.B_2
## t = 2.1964, df = 36.294, p-value = 0.03453
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.0845232 2.1143003
## sample estimates:
## mean of x mean of y
## 5.100000 4.000588
cat("\nConclusion:\n")
##
## Conclusion:
if (abs(t_stat_2) > t_crit_2) {
cat("Classical t-test result: Reject H0 → Select H1 (mu_A- mu_B ≠ 0 )\n")
} else {
cat("Classical t-test result: Fail to reject H0 → Do not select H1\n")
}
## Classical t-test result: Reject H0 → Select H1 (mu_A- mu_B ≠ 0 )
if (welch_test_2$p.value < 0.05) {
cat("Welch's t-test result: Reject H0 → Select H1 (mu_A- mu_B ≠ 0 ) \n")
} else {
cat("Welch's t-test result: Fail to reject H0 → Do not select H1\n")
}
## Welch's t-test result: Reject H0 → Select H1 (mu_A- mu_B ≠ 0 )
Welch’s t-test vs. Classical Two-Sample t-test To address the potential inequality of variances between the two groups, a Welch Two Sample t-test was conducted using R. The results of the Welch test are as follows:
Welch test results:
- Welch t-statistic: 2.196
- Degrees of freedom (Welch): 36.29
- p-value (Welch): 0.0345
- 95% Confidence Interval: [0.085, 2.114]
- Mean of Group A (Online): 5.10
- Mean of Group B (In-Person): 4.00

In comparison, the classical two-sample t-test (assuming equal variances) yielded:
- Classical t-statistic: 2.165
- Degrees of freedom: 38
- p-value (Classical): 0.0368
- 95% Confidence Interval: [0.071, 2.128]
Conclusion: Both tests produce similar t-statistics and p-values, leading to the same conclusion: reject the null hypothesis at the 5% significance level. However, the Welch test is more robust when the variances are unequal or the sample sizes differ (as here: 23 vs. 17). It does not pool the variances and adjusts the degrees of freedom downward (from 38 to about 36.29); in this case it gives a slightly larger t-statistic (2.196 vs. 2.165) and a slightly smaller p-value (0.0345 vs. 0.0368).
Device Usage Table
| Age Group | Laptop | Desktop | Tablet | Total |
|---|---|---|---|---|
| Under 18 | 12 | 6 | 12 | 30 |
| 18–29 | 14 | 10 | 6 | 30 |
| 30–49 | 16 | 12 | 12 | 40 |
| 50+ | 8 | 16 | 6 | 30 |
| Total | 50 | 44 | 36 | 130 |
##(a) State the null and alternative hypotheses.
We are testing whether device preference (Laptop, Desktop, Tablet) is independent of age group.
Null Hypothesis (H₀): Device preference is independent of age group. In other words, the distribution of device preferences is the same across all age groups.
Alternative Hypothesis (H₁): Device preference is dependent on age group. That is, the distribution of device preferences differs between age groups.
##(b) Set up the table of expected frequencies.
### Expected Value Formula
The expected value for each cell is calculated using the formula:
\[ E_{ij} = \frac{(\text{Row Total})_i \times (\text{Column Total})_j}{\text{Grand Total}} \]
Using R
observed <- matrix(c(
12, 6, 12,
14, 10, 6,
16, 12, 12,
8, 16, 6
), nrow = 4, byrow = TRUE)
rownames(observed) <- c("Under18", "18to29", "30to49", "50plus")
colnames(observed) <- c("Laptop", "Desktop", "Tablet")
round(observed,3)
## Laptop Desktop Tablet
## Under18 12 6 12
## 18to29 14 10 6
## 30to49 16 12 12
## 50plus 8 16 6
cat("\n")
row_totals <- rowSums(observed)
row_totals
## Under18 18to29 30to49 50plus
## 30 30 40 30
cat("\n")
col_totals <- colSums(observed)
col_totals
## Laptop Desktop Tablet
## 50 44 36
cat("\n")
total <- sum(observed)
expected <- outer(row_totals, col_totals) / total
# Loop to Print Formula for Each Expected Value
for (i in 1:nrow(expected)) {
for (j in 1:ncol(expected)) {
row_name <- rownames(expected)[i]
col_name <- colnames(expected)[j]
row_total <- row_totals[i]
col_total <- col_totals[j]
exp_value <- expected[i, j]
cat(sprintf("E[%s, %s] = (%d * %d) / %d = %.3f\n",
row_name, col_name, row_total, col_total, total, exp_value))
}
}
## E[Under18, Laptop] = (30 * 50) / 130 = 11.538
## E[Under18, Desktop] = (30 * 44) / 130 = 10.154
## E[Under18, Tablet] = (30 * 36) / 130 = 8.308
## E[18to29, Laptop] = (30 * 50) / 130 = 11.538
## E[18to29, Desktop] = (30 * 44) / 130 = 10.154
## E[18to29, Tablet] = (30 * 36) / 130 = 8.308
## E[30to49, Laptop] = (40 * 50) / 130 = 15.385
## E[30to49, Desktop] = (40 * 44) / 130 = 13.538
## E[30to49, Tablet] = (40 * 36) / 130 = 11.077
## E[50plus, Laptop] = (30 * 50) / 130 = 11.538
## E[50plus, Desktop] = (30 * 44) / 130 = 10.154
## E[50plus, Tablet] = (30 * 36) / 130 = 8.308
cat("\n")
round(expected, 3)
## Laptop Desktop Tablet
## Under18 11.538 10.154 8.308
## 18to29 11.538 10.154 8.308
## 30to49 15.385 13.538 11.077
## 50plus 11.538 10.154 8.308
cat("\n")
By hand calculation
\[ E_{11} = \frac{30 \times 50}{130} \approx 11.538 \] \[ E_{12} = \frac{30 \times 44}{130} \approx 10.154 \] \[ E_{13} = \frac{30 \times 36}{130} \approx 8.308 \]
\[ E_{21} = \frac{30 \times 50}{130} \approx 11.538 \] \[ E_{22} = \frac{30 \times 44}{130} \approx 10.154 \] \[ E_{23} = \frac{30 \times 36}{130} \approx 8.308 \]
\[ E_{31} = \frac{40 \times 50}{130} \approx 15.385 \] \[ E_{32} = \frac{40 \times 44}{130} \approx 13.538 \] \[ E_{33} = \frac{40 \times 36}{130} \approx 11.077 \]
\[ E_{41} = \frac{30 \times 50}{130} \approx 11.538 \] \[ E_{42} = \frac{30 \times 44}{130} \approx 10.154 \] \[ E_{43} = \frac{30 \times 36}{130} \approx 8.308 \]
Expected Frequency Table (rounded to 3 decimal places):
| Age Group | Laptop | Desktop | Tablet | Total |
|---|---|---|---|---|
| Under 18 | 11.538 | 10.154 | 8.308 | 30 |
| 18–29 | 11.538 | 10.154 | 8.308 | 30 |
| 30–49 | 15.385 | 13.538 | 11.077 | 40 |
| 50+ | 11.538 | 10.154 | 8.308 | 30 |
| Total | 50 | 44 | 36 | 130 |
##(c) Discuss whether the necessary assumptions for applying the chi-squared test are satisfied.
The assumptions are:
- The variables are categorical: both variables, Age Group and Device Type, are categorical - satisfied.
- The observations are independent: each participant's response is assumed to be independent of the others - satisfied.
- The expected frequency in each cell should be at least 5: all expected cell frequencies are greater than 5 (the lowest expected value is 8.308), as checked below - satisfied.
# Check that no expected cell counts fall below 5
expected < 5
## Laptop Desktop Tablet
## Under18 FALSE FALSE FALSE
## 18to29 FALSE FALSE FALSE
## 30to49 FALSE FALSE FALSE
## 50plus FALSE FALSE FALSE
mean(expected < 5)
## [1] 0
The Chi-Square test statistic is calculated using the formula:
\[ \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
Where:

- \(O_{ij}\) is the observed frequency in row \(i\), column \(j\)
- \(E_{ij}\) is the expected frequency in row \(i\), column \(j\) under the null hypothesis

By using R:
chi_stat <- sum((observed - expected)^2 / expected)
cat("Chi-squared statistic:", round(chi_stat, 4), "\n")
## Chi-squared statistic: 9.8958
By hand calculation
### Calculation for Each Cell
Under 18:
\[ \frac{(12 - 11.538)^2}{11.538} \approx 0.018 \\ \frac{(6 - 10.154)^2}{10.154} \approx 1.699 \\ \frac{(12 - 8.308)^2}{8.308} \approx 1.641 \]
18–29:
\[ \frac{(14 - 11.538)^2}{11.538} \approx 0.525 \\ \frac{(10 - 10.154)^2}{10.154} \approx 0.002 \\ \frac{(6 - 8.308)^2}{8.308} \approx 0.641 \]
30–49:
\[ \frac{(16 - 15.385)^2}{15.385} \approx 0.025 \\ \frac{(12 - 13.538)^2}{13.538} \approx 0.175 \\ \frac{(12 - 11.077)^2}{11.077} \approx 0.077 \]
50+:
\[ \frac{(8 - 11.538)^2}{11.538} \approx 1.085 \\ \frac{(16 - 10.154)^2}{10.154} \approx 3.366 \\ \frac{(6 - 8.308)^2}{8.308} \approx 0.641 \]
Total:
\[ \chi^2 = 0.018 + 1.699 + 1.641 + 0.525 + 0.002 + 0.641 + 0.025 + 0.175 + 0.077 + 1.085 + 3.366 + 0.641 \approx 9.895 \]
*The hand calculation (9.895) agrees with the R value (9.896) up to rounding error; the more precise R output is used for the remaining steps.*
# Degrees of freedom
df_chi <- (nrow(observed) - 1) * (ncol(observed) - 1)
# Significance level
alpha_chi <- 0.05
# Critical chi-square value
crit_value_chi <- qchisq(1 - alpha_chi, df_chi)
# Calculate p-value
p_val_chi <- pchisq(chi_stat, df_chi, lower.tail = FALSE)
# Conclusion
if (chi_stat > crit_value_chi) {
result <- "Reject H0: There is significant association between age group and device preference."
} else {
result <- "Fail to reject H0: No significant association between age group and device preference."
}
# Decision based on p-value
if (p_val_chi < alpha_chi) {
pval_decision <- "Reject H0: The p-value is less than the significance level, indicating a significant association."
} else {
pval_decision <- "Fail to reject H0: The p-value is greater than the significance level, indicating insufficient evidence of association."
}
# Printed all results
cat("Degrees of freedom (df):", df_chi, "\n")
## Degrees of freedom (df): 6
cat("Significance level (alpha):", alpha_chi, "\n")
## Significance level (alpha): 0.05
cat("Critical value (χ²):", round(crit_value_chi, 3), "\n")
## Critical value (χ²): 12.592
cat("Calculated chi-square statistic (χ²):", round(chi_stat, 3), "\n")
## Calculated chi-square statistic (χ²): 9.896
cat("P-value:", round(p_val_chi, 4), "\n")
## P-value: 0.1291
cat("Conclusion:", result, "\n")
## Conclusion: Fail to reject H0: No significant association between age group and device preference.
cat("Conclusion based on p-value:\n", pval_decision, "\n")
## Conclusion based on p-value:
## Fail to reject H0: The p-value is greater than the significance level, indicating insufficient evidence of association.
# Critical region description
cat("\nCritical region: χ² >", round(crit_value_chi, 3), "\n")
##
## Critical region: χ² > 12.592
cat("Significance region corresponds to the upper", alpha_chi*100, "% of the chi-square distribution with", df_chi, "df.\n")
## Significance region corresponds to the upper 5 % of the chi-square distribution with 6 df.
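As a cross-check on the working above (a sketch; chisq.test() is base R's chi-squared test of independence, and no continuity correction is applied for tables larger than 2x2), the built-in test should reproduce the same statistic, degrees of freedom, and p-value.
# Built-in chi-squared test of independence on the observed table
chisq.test(observed)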
Chi-Squared Test Decision: Fail to reject H₀
P-Value Interpretation: Since p = 0.1291 > 0.05, we do not have statistically significant evidence to support that device preference depends on age group.
Conclusion: Therefore, we conclude that the data does not provide sufficient evidence to suggest that age group and device preference are associated. Device preference appears to be independent of age group.