Student ID: 540770313

Semester 1, 2025 - University of Sydney
Course: STAT5002 Introduction to Statistics
Lecturer: Tiangang Cui


Question 1: An automotive engineer wants to determine whether a new type of tire provides better vehicle braking deceleration (in metres per second squared, m/s²) compared to the standard tire. Twenty vehicles are randomly selected from the production line. Each vehicle is tested twice: once with the new tire (test.A) and once with the standard tire (test.B). The paired results (New Tire vs Standard Tire) are shown below:

# Given Data
test.A <- c(5.90, 5.26, 2.97, 7.15, 10.06, 11.87, 1.94, 6.27, 6.81, 4.08,
            8.13, 15.18, 8.82, 3.87, 5.23, 11.29, 7.92, 12.82, 7.20, 10.03)
test.B <- c(6.07, 4.89, 2.92, 7.00, 9.99, 11.70, 1.94, 5.86, 6.95, 4.03,
            7.76, 15.02, 9.08, 3.73, 4.88, 10.81, 8.05, 12.96, 7.10, 10.07)

# Calculate differences D = test.A - test.B
D_1 <- test.A - test.B
# Number of pairs
n_1 <- length(D_1)

(a) Introduce appropriate parameters and state the null and alternative hypotheses.

Let μ_D represent the true mean difference in braking deceleration (new tire minus standard tire) across vehicles. The engineer conducts a paired test in which each vehicle is tested twice: once with the new tire and once with the standard tire.

We are interested in testing whether the new tire provides a statistically significant improvement in braking. Improvement means higher deceleration (vehicles stop faster).

Null Hypothesis (H₀): μ_D = 0

There is no difference in mean braking deceleration between the new and standard tires. Any observed difference is due to random variation.

Alternative Hypothesis (H₁): μ_D > 0

The mean braking deceleration with the new tire is greater than that with the standard tire. This implies that the new tire improves braking performance on average.

(b) What statistical test should be used to test these hypotheses? Justify your choice.

A paired t-test is the right choice here because the same vehicle is tested with both the new and standard tires, so the data are naturally paired. Pairing cancels out vehicle-to-vehicle differences and lets us directly compare how each vehicle performs with the two tire types. This gives a more precise comparison by removing between-vehicle variation that would inflate the standard error if we treated the two samples as independent.

(c) Use appropriate graphical summaries to assess whether the necessary assumptions for applying the chosen test are satisfied.

# Boxplot to check for outliers
boxplot(D_1, horizontal = TRUE, main = "Boxplot of Differences")

# Histogram to check symmetry and shape
hist(D_1, breaks = 10, main = "Histogram of Differences (New - Standard)",
     xlab = "Difference in Deceleration (m/s^2)", col = "green")

# Q-Q plot to check normality
qqnorm(D_1, main = "Q-Q Plot of Differences")
qqline(D_1, col = "brown")

The differences in braking deceleration (New − Standard) appear to be roughly normally distributed. The histogram is fairly symmetric and centered around zero, with no extreme outliers. The Q-Q plot shows most points close to the line, with only minor deviations at the tails, and the boxplot likewise shows no outliers.

Thus, the normality assumption is reasonably met, and it is appropriate to use the paired t-test.
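As an optional numerical complement to these plots (not required by the question, which asks only for graphical summaries), a Shapiro-Wilk test can be applied to the differences; a p-value above 0.05 would be consistent with the normality assumption.

# Optional numerical check of normality on the differences
shapiro.test(D_1)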

(d) Compute the observed test statistic and P-value. In your answer, clearly state the distribution of the test statistic and indicate which values of the statistic argue against the null hypothesis.

Formula for the paired t-statistic, applied to the differences \(D_i\) (under \(H_0\), with \(\mu_0 = 0\), the statistic follows a \(t_{n-1}\) distribution): \[ T = \frac{\bar{D} - \mu_0}{SE_0(\bar{D})} = \frac{\bar{D} - \mu_0}{\frac{\hat{\sigma}}{\sqrt{n}}} \]

\[ \hat{\sigma} = \sqrt{ \frac{1}{n - 1} \sum_{i = 1}^{n} (D_i - \bar{D})^2 } \]

mean_D_1 <- mean(D_1)   # Mean difference
sd_D_1 <- sd(D_1)       # Std dev of differences

# Standard error of the mean difference
se_D_1 <- sd_D_1 / sqrt(n_1)

# Compute t-statistic (mu_0 = 0 under H0)
t_stat_1 <- mean_D_1 / se_D_1

# Degrees of freedom
df_1 <- n_1 - 1

# Calculate one-sided p-value (right tail)
p_value_1 <- 1 - pt(t_stat_1, df_1)

# 95% confidence interval
alpha <- 0.05
t_crit <- qt(1 - alpha/2, df = df_1)
CI_95 <- mean_D_1 + c(-1, 1) * t_crit * se_D_1

# Print results
cat("Mean difference:", round(mean_D_1, 3), "\n")
## Mean difference: 0.099
cat("Standard deviation:", round(sd_D_1, 3), "\n")
## Standard deviation: 0.214
cat("t-statistic:", round(t_stat_1, 3), "\n")
## t-statistic: 2.082
cat("Degrees of freedom:", df_1, "\n")
## Degrees of freedom: 19
cat("One-sided p-value:", round(p_value_1, 4), "\n")
## One-sided p-value: 0.0255
cat("95% Confidence Interval: (", round(CI_95[1], 4), ",", round(CI_95[2], 4), ")\n")
## 95% Confidence Interval: ( -5e-04 , 0.1995 )
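As a sanity check (not part of the original working), the same test can be run with R's built-in t.test(); it should reproduce the statistics above (t = 2.082, one-sided p ≈ 0.0255), though note that with a one-sided alternative it reports a one-sided confidence bound rather than the two-sided interval computed above.

# Cross-check with the built-in paired t-test; this should match the
# manual computation above
t.test(test.A, test.B, paired = TRUE, alternative = "greater")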

The test statistic for the paired t-test is 2.082. This value follows a t-distribution with 19 degrees of freedom if the null hypothesis is true.

The one-sided p-value for this test statistic is 0.0255.

Large positive values of the test statistic argue against the null hypothesis and in favour of the alternative that the new tire has a higher mean braking deceleration.

Since the observed statistic (2.082) lies in the right tail and the p-value (0.0255) is less than 0.05, we reject the null hypothesis.

Conclusion: There is enough evidence to say the new tire improves braking deceleration compared to the standard tire.

(e) What is your conclusion based on the calculated P-value? You can either specify your own significance level or use the default 5% level to draw your conclusion.

The p-value is 0.0255. Since this is less than the common 5% significance level, we reject the null hypothesis and conclude that the new tire significantly improves braking deceleration.

However, at a stricter level such as 2%, the p-value (0.0255 > 0.02) would be too large to reject the null. So, at 2% significance, we would not have enough evidence to say the new tire is better.

In short, at the usual 5% level, there is enough evidence to support that the new tire improves braking and we reject the null hypothesis.

(f) Perform a bootstrap simulation (with 10,000 repetitions) to simulate the test statistic, and plot the histogram of the simulated statistics. Does the histogram of the simulated test statistics agree with the theoretical test distribution used above?

# Centre the differences so the resampling distribution reflects H0 (mean 0)
D_centered_1 <- D_1 - mean(D_1)
set.seed(123)  # For reproducibility
B_1 <- 10000     # Number of bootstrap samples
t_boot_1 <- numeric(B_1)

for (i in 1:B_1) {
  # Resample differences with replacement
  sample_D_1 <- sample(D_centered_1, size = n_1, replace = TRUE)
  
  # Calculate sample mean and sd of bootstrap sample
  mean_boot_1 <- mean(sample_D_1)
  sd_boot_1 <- sd(sample_D_1)
  
  # Compute bootstrap t-statistic
  t_boot_1[i] <- mean_boot_1 / (sd_boot_1 / sqrt(n_1))
}

# Plot histogram of bootstrap t-statistics
hist(t_boot_1, freq = FALSE, breaks = 40, col = "blue",
     main = "Bootstrap Distribution of t-statistics",
     xlab = "Bootstrap t-statistic")
# Add theoretical t-distribution curve
curve(dt(x, df = n_1 - 1), add = TRUE, col = "darkred", lwd = 2, lty = 2)
# Add standard normal distribution curve
curve(dnorm(x), add = TRUE, col = "darkgreen", lwd = 2, lty = 3)
# Mark the observed t-statistic
abline(v = t_stat_1, col = "pink", lwd = 2)

The histogram of the 10,000 bootstrap t-statistics displays a bell-shaped distribution centered around zero, consistent with the null hypothesis. Because the bootstrap resamples the centered differences, the simulated distribution reflects the null scenario (mean difference = 0).

Overlaid are two theoretical curves: the t-distribution with 19 degrees of freedom (in red) and the standard normal distribution (in green). Both curves align well with the bootstrap histogram, especially the t-distribution.

The observed t-statistic (2.082) is marked in pink and lies in the right tail, indicating moderate evidence against the null. The close match between the curves and the bootstrap distribution supports the use of the paired t-test and its assumptions in this case.

(g) What is the P-value based on the simulated test statistics?

p_boot_1 <- mean(t_boot_1 >= t_stat_1)
cat("Bootstrap p-value:", round(p_boot_1, 4), "\n")
## Bootstrap p-value: 0.0208

Conclusion: Since the bootstrap p-value (≈ 0.0208) is less than 0.05, we reject H₀ based on the bootstrap method as well.

The bootstrap p-value agrees closely with the theoretical one-sided p-value (0.0255): under the null hypothesis of no difference in braking performance, a test statistic as large as or larger than 2.082 is uncommon in the simulated data.

Summary: I performed a paired t-test to determine whether the new tire improves braking deceleration compared to the standard tire, using data from the same set of vehicles tested under both conditions.

Paired t-test Result: t-statistic = 2.082; p-value (one-sided) = 0.0255

Since this p-value is less than 0.05, the result is statistically significant under the classical t-test assumption.

This would lead us to reject the null hypothesis and conclude that the new tire significantly improves braking deceleration.

Bootstrap Simulation Result: Bootstrap p-value = 0.0208

This value is also less than 0.05, indicating that the observed t-statistic is unlikely under the null hypothesis when we rely on the empirical distribution of the data.

This suggests we reject the null hypothesis based on the bootstrap method as well.

The close agreement between the theoretical and bootstrap-based conclusions implies that:

The normality assumption required for the t-test appears to be reasonable for the differences in deceleration.

The bootstrap method, which makes fewer assumptions, confirms the reliability of the classical test result in this case.

Final Verdict: Both the paired t-test (p = 0.0255) and the bootstrap method (p = 0.0208) lead to the same conclusion at the 5% significance level: since each p-value is below \(\alpha = 0.05\), we reject the null hypothesis. There is sufficient statistical evidence to conclude that the new tire provides greater average braking deceleration than the standard tire, supporting the automotive engineer's claim that the new tire improves vehicle braking performance.

Question 2: A school administrator is investigating whether a new online tutoring program leads to a difference in student performance compared to a traditional in-person tutoring method. Students are randomly assigned to two groups: Group A (Online Tutoring) and Group B (In-Person Tutoring). The normalised student performance indicators for the two groups are shown below:

#Given data
group.A_2 <- c(5.54, 4.41, 6.35, 5.04, 7.33, 6.47, 4.08, 6.00, 7.39, 5.53, 1.54, 6.16,
             4.23, 2.36, 5.09, 5.10, 5.33, 3.75, 6.49, 2.13, 5.44, 7.74, 3.80)

group.B_2 <- c(4.31, 6.20, 5.25, 2.14, 3.26, 1.47, 2.24, 4.20, 3.56, 3.68, 7.02, 2.94,
             5.49, 3.37, 4.59, 3.05, 5.24)

##(a) State the null and alternative hypotheses to test whether the two programs have the same effect on student performance. In answering, introduce appropriate parameters, as well as a null and alternative hypothesis in terms of these parameters.

Let μ_A be the average student performance for the Online Tutoring group (Group A), and μ_B be the average student performance for the In-Person Tutoring group (Group B).

We want to check if the tutoring method actually makes a difference in how well students perform.

Null Hypothesis (H₀): μ_A − μ_B = 0

This means: there is no real difference in the average performance between students who got online tutoring and those who got in-person tutoring. In other words, the type of tutoring doesn’t matter.

Alternative Hypothesis (H₁): μ_A − μ_B ≠ 0

This means: there is a difference in the average performance between the two groups. So, the kind of tutoring does affect how students perform.

##(b) Use appropriate graphical and numerical summaries to assess whether the necessary assumptions for applying the classical two-sample t-test are satisfied.

Assumptions for the classical two-sample t-test: the samples are independent; each group's scores are approximately normally distributed; and the two groups have equal variances (homoscedasticity).

boxplot(group.A_2, group.B_2, names=c("Online", "In-Person"), 
        main="Boxplot of Student Performance", col=c("brown", "yellow"), horizontal=TRUE)

hist(group.A_2, main="Histogram: Group A (Online)", xlab="Scores", col="brown")

hist(group.B_2, main="Histogram: Group B (In-Person)", xlab="Scores", col="yellow")

qqnorm(group.A_2, main="Q-Q Plot: Group A")
qqline(group.A_2)

qqnorm(group.B_2, main="Q-Q Plot: Group B")
qqline(group.B_2)

Numerical Summaries

mean.A_2 <- mean(group.A_2)
mean.B_2 <- mean(group.B_2)
sd.A_2 <- sd(group.A_2)
sd.B_2 <- sd(group.B_2)
n.A_2 <- length(group.A_2)
n.B_2 <- length(group.B_2)

cat("Mean A (Online):", round(mean.A_2, 3), "\n")
## Mean A (Online): 5.1
cat("Mean B (In-Person):", round(mean.B_2, 3), "\n")
## Mean B (In-Person): 4.001
cat("SD A:", round(sd.A_2, 3), "\n")
## SD A: 1.649
cat("SD B:", round(sd.B_2, 3), "\n")
## SD B: 1.5
cat("Sample sizes A and B:", n.A_2, ",", n.B_2, "\n")
## Sample sizes A and B: 23 , 17

Interpretation: The boxplots show similar spreads for the two groups with no extreme outliers. The histograms are roughly unimodal and not strongly skewed, and in both Q-Q plots the points lie close to the reference line, so approximate normality is reasonable for each group. The sample standard deviations (1.649 vs 1.500) are similar, supporting the equal-variance assumption, and the groups are independent because students were randomly assigned to one program or the other.

On this basis, the assumptions of the classical two-sample t-test appear to be satisfied.
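As a quick numerical check of the equal-variance assumption (a common rule of thumb, not part of the original output), the ratio of the sample standard deviations can be computed; values well below 2 are usually taken as consistent with equal variances.

# Ratio of sample SDs as a rough check of equal variances
sd.A_2 / sd.B_2   # about 1.10, well below the rule-of-thumb cutoff of 2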

##(c) Write down the formula for the test statistic of the classical two-sample t-test, and calculate the observed test statistic. Show your working step by step, rounding each step to three decimal places.

Two-Sample t-Test Formula

We use the following test statistic for comparing two means:

\[ T = \frac{\bar{X} - \bar{Y}}{\hat{\sigma}_p \sqrt{\frac{1}{m} + \frac{1}{n}}} \sim t_{m+n-2} \]

Where the pooled standard deviation is:

\[ \hat{\sigma}_p = \sqrt{\frac{\sum_{i=1}^{m}(X_i - \bar{X})^2 + \sum_{j=1}^{n}(Y_j - \bar{Y})^2}{m + n - 2}} = \sqrt{\frac{(m - 1)\hat{\sigma}_X^2 + (n - 1)\hat{\sigma}_Y^2}{m + n - 2}} \]
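Step-by-step working, using the rounded summaries from part (b) and rounding each step to three decimal places (the small discrepancy from R's unrounded value of 2.165 below is due to this rounding):

\[ \hat{\sigma}_p^2 = \frac{(23 - 1)(1.649)^2 + (17 - 1)(1.500)^2}{23 + 17 - 2} = \frac{22 \times 2.719 + 16 \times 2.250}{38} = \frac{59.818 + 36.000}{38} \approx 2.522 \]

\[ \hat{\sigma}_p = \sqrt{2.522} \approx 1.588 \]

\[ T = \frac{5.100 - 4.001}{1.588 \sqrt{\frac{1}{23} + \frac{1}{17}}} = \frac{1.099}{1.588 \times 0.320} \approx \frac{1.099}{0.508} \approx 2.163 \]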

# Pooled variance and pooled SD
sp2_2 <- ((n.A_2 - 1) * sd.A_2^2 + (n.B_2 - 1) * sd.B_2^2) / (n.A_2 + n.B_2 - 2)
sp_2 <- sqrt(sp2_2)
# Observed t-statistic and degrees of freedom
t_stat_2 <- (mean.A_2 - mean.B_2) / (sp_2 * sqrt(1/n.A_2 + 1/n.B_2))
df_2 <- n.A_2 + n.B_2 - 2

cat("Pooled variance:", round(sp2_2, 3), "\n")
## Pooled variance: 2.522
# Observed t-statistic
cat("Observed t-statistic:", round(t_stat_2, 3), "\n")
## Observed t-statistic: 2.165
# Calculate two-tailed p-value
p_value_2 <- 2 * (1 - pt(abs(t_stat_2), df_2))

# Display p-value
cat("p-value:", round(p_value_2, 4), "\n")
## p-value: 0.0368

##(d) Construct the critical region of rejection at the 5% level of significance. What is your conclusion of the hypothesis test based on the critical region?

t_crit_2 <- qt(0.975, df_2)
cat("Critical t-value (±):", round(t_crit_2, 3), "\n")
## Critical t-value (±): 2.024
# Critical region
critical_region_lower_2 <- -t_crit_2
critical_region_upper_2 <- t_crit_2
critical_region_2 <- c(-1, 1) * t_crit_2
cat("Critical Region: t <", round(critical_region_lower_2, 3), "or t >", round(critical_region_upper_2, 3), "\n")
## Critical Region: t < -2.024 or t > 2.024
# Confidence interval
mean_diff_2 <- mean.A_2 - mean.B_2
margin_error_2 <- t_crit_2 * sp_2 * sqrt(1/n.A_2 + 1/n.B_2)
CI_95_2 <- c(mean_diff_2 - margin_error_2, mean_diff_2 + margin_error_2)
cat("95% Confidence Interval for Mean Difference: [", round(CI_95_2[1], 3), ",", round(CI_95_2[2], 3), "]\n")
## 95% Confidence Interval for Mean Difference: [ 0.071 , 2.128 ]
# Hypothesis test conclusion
if (abs(t_stat_2) > t_crit_2) {
  cat("Conclusion: Reject the null hypothesis at the 5% significance level.\n")
} else {
  cat("Conclusion: Fail to reject the null hypothesis at the 5% significance level.\n")
}
## Conclusion: Reject the null hypothesis at the 5% significance level.

Decision:
If |t_stat| > t_crit → Reject H₀.
In this case, |t_stat| ≈ 2.165 > 2.024 → Reject H₀.
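As a cross-check (not part of the original working), the pooled-variance computation can be reproduced with R's built-in t.test() using var.equal = TRUE; it should give t ≈ 2.165 and p ≈ 0.0368.

# Cross-check: classical (equal-variance) two-sample t-test
t.test(group.A_2, group.B_2, var.equal = TRUE)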

##(e) Now conduct a Welch test using R, what is the computed P-value, how does it compare with the classical two-sample t-test?

welch_test_2 <- t.test(group.A_2, group.B_2, var.equal = FALSE)
welch_test_2
## 
##  Welch Two Sample t-test
## 
## data:  group.A_2 and group.B_2
## t = 2.1964, df = 36.294, p-value = 0.03453
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.0845232 2.1143003
## sample estimates:
## mean of x mean of y 
##  5.100000  4.000588
cat("\nConclusion:\n")
## 
## Conclusion:
if (abs(t_stat_2) > t_crit_2) {
  cat("Classical t-test result: Reject H0 → Select H1 (mu_A- mu_B ≠ 0 )\n")
} else {
  cat("Classical t-test result: Fail to reject H0 → Do not select H1\n")
}
## Classical t-test result: Reject H0 → Select H1 (mu_A- mu_B ≠ 0 )
if (welch_test_2$p.value < 0.05) {
  cat("Welch's t-test result: Reject H0 → Select H1 (mu_A- mu_B ≠ 0 ) \n")
} else {
  cat("Welch's t-test result: Fail to reject H0 → Do not select H1\n")
}
## Welch's t-test result: Reject H0 → Select H1 (mu_A- mu_B ≠ 0 )

Welch’s t-test vs. Classical Two-Sample t-test

To address the potential inequality of variances between the two groups, a Welch Two Sample t-test was conducted using R. The results of the Welch test are as follows:

Welch t-statistic: 2.196
Degrees of freedom (Welch): 36.29
p-value (Welch): 0.0345
95% Confidence Interval: [0.085, 2.114]
Mean of Group A (Online): 5.10
Mean of Group B (In-Person): 4.00

In comparison, the classical two-sample t-test (assuming equal variances) yielded:

Classical t-statistic: 2.165
Degrees of freedom: 38
p-value (Classical): 0.0368
95% Confidence Interval: [0.071, 2.128]

Conclusion: Both tests produce similar t-statistics and p-values, leading to the same decision: reject the null hypothesis at the 5% significance level. The Welch test is more robust when variances are unequal or sample sizes differ (as here: 23 vs. 17). It uses a different standard error and adjusts the degrees of freedom downward (from 38 to about 36.29); in this case the Welch statistic is slightly larger (2.196 vs. 2.165), which is why its p-value is slightly smaller (0.0345 vs. 0.0368).

Question 3: Consider the table below, which shows the number of individuals in each of four age groups (under 18, 18–29, 30–49, 50+) who prefer one of three device types: laptop, desktop, or tablet. The data come from a survey on technology usage preferences:

Device Usage Table

| Age Group | Laptop | Desktop | Tablet | Total |
|-----------|--------|---------|--------|-------|
| Under 18  | 12     | 6       | 12     | 30    |
| 18–29     | 14     | 10      | 6      | 30    |
| 30–49     | 16     | 12      | 12     | 40    |
| 50+       | 8      | 16      | 6      | 30    |
| Total     | 50     | 44      | 36     | 130   |

##(a) State the null and alternative hypotheses.

We are testing whether device preference (Laptop, Desktop, Tablet) is independent of age group.

Null Hypothesis (H₀): Device preference is independent of age group. In other words, the distribution of device preferences is the same across all age groups.

Alternative Hypothesis (H₁): Device preference is dependent on age group. That is, the distribution of device preferences differs between age groups.

##(b) Set up the table of expected frequencies.

### Expected Value Formula

The expected value for each cell is calculated using the formula:

\[ E_{ij} = \frac{(\text{Row Total})_i \times (\text{Column Total})_j}{\text{Grand Total}} \]

Using R

observed <- matrix(c(
  12, 6, 12,
  14, 10, 6,
  16, 12, 12,
  8, 16, 6
), nrow = 4, byrow = TRUE)


rownames(observed) <- c("Under18", "18to29", "30to49", "50plus")
colnames(observed) <- c("Laptop", "Desktop", "Tablet")

round(observed,3)
##         Laptop Desktop Tablet
## Under18     12       6     12
## 18to29      14      10      6
## 30to49      16      12     12
## 50plus       8      16      6
cat("\n")
row_totals <- rowSums(observed)
row_totals
## Under18  18to29  30to49  50plus 
##      30      30      40      30
cat("\n")
col_totals <- colSums(observed)
col_totals
##  Laptop Desktop  Tablet 
##      50      44      36
cat("\n")
total <- sum(observed)
expected <- outer(row_totals, col_totals) / total
# Loop to Print Formula for Each Expected Value
for (i in 1:nrow(expected)) {
  for (j in 1:ncol(expected)) {
    row_name <- rownames(expected)[i]
    col_name <- colnames(expected)[j]
    row_total <- row_totals[i]
    col_total <- col_totals[j]
    exp_value <- expected[i, j]
    
    cat(sprintf("E[%s, %s] = (%d * %d) / %d = %.3f\n",
                row_name, col_name, row_total, col_total, total, exp_value))
  }
}
## E[Under18, Laptop] = (30 * 50) / 130 = 11.538
## E[Under18, Desktop] = (30 * 44) / 130 = 10.154
## E[Under18, Tablet] = (30 * 36) / 130 = 8.308
## E[18to29, Laptop] = (30 * 50) / 130 = 11.538
## E[18to29, Desktop] = (30 * 44) / 130 = 10.154
## E[18to29, Tablet] = (30 * 36) / 130 = 8.308
## E[30to49, Laptop] = (40 * 50) / 130 = 15.385
## E[30to49, Desktop] = (40 * 44) / 130 = 13.538
## E[30to49, Tablet] = (40 * 36) / 130 = 11.077
## E[50plus, Laptop] = (30 * 50) / 130 = 11.538
## E[50plus, Desktop] = (30 * 44) / 130 = 10.154
## E[50plus, Tablet] = (30 * 36) / 130 = 8.308
cat("\n")
round(expected, 3)
##         Laptop Desktop Tablet
## Under18 11.538  10.154  8.308
## 18to29  11.538  10.154  8.308
## 30to49  15.385  13.538 11.077
## 50plus  11.538  10.154  8.308
cat("\n")

By hand calculation

Under 18:

\[ E_{11} = \frac{30 \times 50}{130} \approx 11.538 \] \[ E_{12} = \frac{30 \times 44}{130} \approx 10.154 \] \[ E_{13} = \frac{30 \times 36}{130} \approx 8.308 \]

18–29:

\[ E_{21} = \frac{30 \times 50}{130} \approx 11.538 \] \[ E_{22} = \frac{30 \times 44}{130} \approx 10.154 \] \[ E_{23} = \frac{30 \times 36}{130} \approx 8.308 \]

30–49:

\[ E_{31} = \frac{40 \times 50}{130} \approx 15.385 \] \[ E_{32} = \frac{40 \times 44}{130} \approx 13.538 \] \[ E_{33} = \frac{40 \times 36}{130} \approx 11.077 \]

50+:

\[ E_{41} = \frac{30 \times 50}{130} \approx 11.538 \] \[ E_{42} = \frac{30 \times 44}{130} \approx 10.154 \] \[ E_{43} = \frac{30 \times 36}{130} \approx 8.308 \]

Expected Frequency Table (rounded to 3 decimal places)

| Age Group | Laptop | Desktop | Tablet | Total |
|-----------|--------|---------|--------|-------|
| Under 18  | 11.538 | 10.154  | 8.308  | 30    |
| 18–29     | 11.538 | 10.154  | 8.308  | 30    |
| 30–49     | 15.385 | 13.538  | 11.077 | 40    |
| 50+       | 11.538 | 10.154  | 8.308  | 30    |
| Total     | 50     | 44      | 36     | 130   |

##(c) Discuss whether the necessary assumptions for applying the chi-squared test are satisfied.

The assumptions are:

Variables are categorical: both variables (Age Group and Device Type) are categorical. Satisfied.

Observations are independent: each participant's response is assumed to be independent of the others. Satisfied.

Expected frequency in each cell should be at least 5: all expected cell frequencies are greater than 5 (the smallest expected value is 8.308). Satisfied.

expected < 5
##         Laptop Desktop Tablet
## Under18  FALSE   FALSE  FALSE
## 18to29   FALSE   FALSE  FALSE
## 30to49   FALSE   FALSE  FALSE
## 50plus   FALSE   FALSE  FALSE
mean(expected < 5)
## [1] 0

##(d) Compute the observed test statistic.

The Chi-Square test statistic is calculated using the formula:

\[ \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]

Where:
- \(O_{ij}\) is the observed frequency in row \(i\), column \(j\)
- \(E_{ij}\) is the expected frequency in row \(i\), column \(j\) under the null hypothesis

Using R

chi_stat <- sum((observed - expected)^2 / expected)
cat("Chi-squared statistic:", round(chi_stat, 4), "\n")
## Chi-squared statistic: 9.8958
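As a cross-check (not part of the original working), R's built-in chisq.test() should reproduce this statistic along with the degrees of freedom and p-value computed in part (e); for tables larger than 2×2 it applies no continuity correction.

# Cross-check: built-in chi-squared test of independence
chisq.test(observed)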

By hand calculation

### Calculation for Each Cell

Under 18:

\[ \frac{(12 - 11.538)^2}{11.538} \approx 0.018 \\ \frac{(6 - 10.154)^2}{10.154} \approx 1.699 \\ \frac{(12 - 8.308)^2}{8.308} \approx 1.641 \]

18–29:

\[ \frac{(14 - 11.538)^2}{11.538} \approx 0.525 \\ \frac{(10 - 10.154)^2}{10.154} \approx 0.002 \\ \frac{(6 - 8.308)^2}{8.308} \approx 0.641 \]

30–49:

\[ \frac{(16 - 15.385)^2}{15.385} \approx 0.025 \\ \frac{(12 - 13.538)^2}{13.538} \approx 0.175 \\ \frac{(12 - 11.077)^2}{11.077} \approx 0.077 \]

50+:

\[ \frac{(8 - 11.538)^2}{11.538} \approx 1.085 \\ \frac{(16 - 10.154)^2}{10.154} \approx 3.366 \\ \frac{(6 - 8.308)^2}{8.308} \approx 0.641 \]

Total:

\[ \chi^2 = 0.018 + 1.699 + 1.641 + 0.525 + 0.002 + 0.641 + 0.025 + 0.175 + 0.077 + 1.085 + 3.366 + 0.641 \approx 9.895 \]

The hand calculation (≈ 9.895) agrees with the R output (9.8958) up to rounding of the intermediate steps; the more precise R value is used in the next steps.

##(e) Construct the critical region of rejection at the 5% level of significance. What is your conclusion of the hypothesis test? Justify your answer.

# Degrees of freedom
df_chi <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Significance level
alpha_chi <- 0.05

# Critical chi-square value
crit_value_chi <- qchisq(1 - alpha_chi, df_chi)

# Calculate p-value
p_val_chi <- pchisq(chi_stat, df_chi, lower.tail = FALSE)

# Conclusion
if (chi_stat > crit_value_chi) {
  result <- "Reject H0: There is significant association between age group and device preference."
} else {
  result <- "Fail to reject H0: No significant association between age group and device preference."
}

# Decision based on p-value
if (p_val_chi < alpha_chi) {
  pval_decision <- "Reject H0: The p-value is less than the significance level, indicating a significant association."
} else {
  pval_decision <- "Fail to reject H0: The p-value is greater than the significance level, indicating insufficient evidence of association."
}


# Printed all results
cat("Degrees of freedom (df):", df_chi, "\n")
## Degrees of freedom (df): 6
cat("Significance level (alpha):", alpha_chi, "\n")
## Significance level (alpha): 0.05
cat("Critical value (χ²):", round(crit_value_chi, 3), "\n")
## Critical value (χ²): 12.592
cat("Calculated chi-square statistic (χ²):", round(chi_stat, 3), "\n")
## Calculated chi-square statistic (χ²): 9.896
cat("P-value:", round(p_val_chi, 4), "\n")
## P-value: 0.1291
cat("Conclusion:", result, "\n")
## Conclusion: Fail to reject H0: No significant association between age group and device preference.
cat("Conclusion based on p-value:\n", pval_decision, "\n")
## Conclusion based on p-value:
##  Fail to reject H0: The p-value is greater than the significance level, indicating insufficient evidence of association.
# Critical region description
cat("\nCritical region: χ² >", round(crit_value_chi, 3), "\n")
## 
## Critical region: χ² > 12.592
cat("Significance region corresponds to the upper", alpha_chi*100, "% of the chi-square distribution with", df_chi, "df.\n")
## Significance region corresponds to the upper 5 % of the chi-square distribution with 6 df.

Chi-Squared Test Decision: Fail to reject H₀

P-Value Interpretation: Since p = 0.1291 > 0.05, we do not have statistically significant evidence to support that device preference depends on age group.

Conclusion: Therefore, we conclude that the data does not provide sufficient evidence to suggest that age group and device preference are associated. Device preference appears to be independent of age group.