Statistical Inferences
Assignment ~ Week 14
1 About Statistical Hypothesis Testing
Statistical hypothesis testing is a core inferential method used to evaluate claims about population parameters based on sample data. It enables analysts to make objective decisions under uncertainty by comparing observed sample evidence against predefined statistical assumptions.
This report presents six study cases that illustrate the application of hypothesis testing across different data types and analytical contexts. The key concepts and methods addressed in this report include:
- Formulation of Statistical Hypotheses, including null hypotheses (\(H_0\)) and alternative hypotheses (\(H_1\)), as the foundation of all inferential testing procedures.
- One-Sample Z-Test, applied when the population variance is known and the sample size is sufficiently large, to evaluate claims about a population mean.
- One-Sample and Two-Sample t-Tests, used when the population variance is unknown and/or sample sizes are small, including their application in A/B testing scenarios.
- Chi-Square Test of Independence, employed to assess the relationship between two categorical variables in contingency tables.
- Type I and Type II Errors, examined conceptually to highlight the risks of incorrect statistical decisions and their implications in business and financial contexts.
- P-Value Interpretation and Statistical Decision-Making, emphasizing how significance levels (\(\alpha\)) guide hypothesis testing outcomes and why p-values should not be misinterpreted as measures of effect size.
Each study case follows a structured analytical process consisting of hypothesis formulation, test identification, test statistic and p-value computation, statistical decision-making, and interpretation of results within a practical business or managerial context. Through these cases, the report demonstrates how hypothesis testing supports evidence-based decision-making while balancing uncertainty, risk, and reliability.
2 Case Study 1: One-Sample Z-Test (Statistical Hypotheses)
A digital learning platform claims that the average daily study time of its users is 120 minutes. Based on historical records, the population standard deviation is known to be 15 minutes.
A random sample of 64 users is collected, resulting in an average study time of 116 minutes.
The relevant statistical parameters are summarized as follows:
\[ \begin{aligned} \mu &= 120 \\ \sigma &= 15 \\ n &= 64 \\ \bar{x} &= 116 \end{aligned} \]
Tasks
- Formulate the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
- Identify the appropriate statistical test and justify the choice.
- Compute the test statistic and p-value using \(\alpha = 0.05\).
- State the statistical decision.
- Interpret the result in a business analytics context.
2.1 Formulation of Hypotheses
The hypotheses are defined as follows:
\[ H_0: \mu = 120 \]
\[ H_1: \mu \neq 120 \]
This is a two-tailed hypothesis test, since we are testing whether the true mean differs from the claimed value.
2.2 Identification of the Statistical Test
Because the population standard deviation is known (\(\sigma = 15\)) and the sample size is sufficiently large (\(n = 64\)), the appropriate statistical method is a One-Sample Z-Test for the population mean.
This test is appropriate under the assumption that the sampling distribution of the sample mean follows a normal distribution, supported by the Central Limit Theorem.
2.3 Mathematical Computation
Given parameters:
\[ \mu_0 = 120, \quad \sigma = 15, \quad n = 64, \quad \bar{x} = 116 \]
The Z-test statistic is calculated as:
\[ Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]
Standard error:
\[ \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{64}} = 1.875 \]
Z-statistic:
\[ Z = \frac{116 - 120}{1.875} = -2.133 \]
To obtain the p-value for a two-tailed test:
\[ p\text{-value} = 2P(Z \leq -|Z|) = 2P(Z \leq -2.133) \]
\[ p\text{-value} \approx 2 \times 0.0165 = 0.033 \]
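As a quick check, the same computation can be reproduced with a short Python sketch (assuming SciPy is available; the values are those given in the case):

```python
# Minimal sketch of the one-sample Z-test above (values from the case study).
import math
from scipy.stats import norm

mu_0, sigma, n, x_bar = 120, 15, 64, 116

se = sigma / math.sqrt(n)          # standard error = 1.875
z = (x_bar - mu_0) / se            # test statistic ≈ -2.133
p_value = 2 * norm.cdf(-abs(z))    # two-tailed p-value ≈ 0.033

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```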
2.4 Statistical Decision
At a significance level of \(\alpha = 0.05\), the following results are obtained:

Test statistic:

\[ Z = -2.133 \]

p-value:

\[ p\text{-value} \approx 0.033 \]
Since the p-value is smaller than the significance level (0.05), the null hypothesis (H₀) is rejected.
Therefore, there is sufficient statistical evidence to conclude that the true average daily study time differs significantly from 120 minutes.
2.5 Business Analytics Interpretation
From a business analytics perspective, this finding indicates that users are spending significantly less time studying than the platform’s claimed average of 120 minutes per day. This gap between the expected and observed study time may reflect declining user engagement, suboptimal learning content, or usability issues within the digital learning platform.
For decision-makers, this result serves as an early warning signal that current engagement strategies may not be fully effective. Management should consider conducting deeper behavioral analyses, refining content delivery, or implementing targeted engagement initiatives to increase user study time and improve overall platform performance.
3 Case Study 2: One-Sample T-Test (Sigma Unknown, Small Sample)
A UX Research team investigates whether the average task completion time of a new application differs from 10 minutes.
The study collects task completion times (in minutes) from 10 users, resulting in the following sample:
\[ x = 9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5 \]
The objective is to evaluate whether the observed data provide sufficient statistical evidence to conclude that the true average task completion time is different from the claimed value of 10 minutes.
Tasks
- Formulate the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
- Determine the appropriate hypothesis test and justify the choice.
- Compute the test statistic and p-value using \(\alpha = 0.05\).
- State the statistical decision.
- Explain how sample size affects inferential reliability.
3.1 Formulation of Hypotheses
\[ H_0: \mu = 10 \]
\[ H_1: \mu \neq 10 \]
This is a two-tailed hypothesis test, since we want to test whether the true mean differs from the claimed value in either direction.
3.2 Identification of the Statistical Test
Because the population standard deviation is unknown and the sample size is small (\(n = 10 < 30\)), the appropriate test is a One-Sample t-Test.
The t-test is used instead of a Z-test because, when the population standard deviation is estimated from a small sample, the standardized test statistic follows a Student’s t-distribution, which accounts for the additional uncertainty introduced by estimating \(\sigma\) from the data.
3.3 Mathematical Computation
Sample size:
\[ n = 10 \]
Sample mean:
\[ \bar{x} = \frac{1}{10}\sum_{i=1}^{10} x_i = 9.86 \]
Sample standard deviation:
\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{1.344}{9}} \approx 0.386 \]
Hypothesized mean:
\[ \mu_0 = 10 \]
The t-test statistic is computed as:
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
Substituting the values:
\[ t = \frac{9.86 - 10}{0.386 / \sqrt{10}} \]
\[ t = \frac{-0.14}{0.1221} \approx -1.15 \]
Degrees of freedom:
\[ df = n - 1 = 9 \]
Two-tailed p-value:
\[ p\text{-value} = 2P(T_9 \le -|t|) = 2P(T_9 \le -1.15) \]
\[ p\text{-value} \approx 0.28 \]
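The same result can be reproduced with a short Python sketch (assuming SciPy is available), which computes the test statistic and two-tailed p-value directly from the raw data:

```python
# Minimal sketch of the one-sample t-test above (data from the case study).
from scipy.stats import ttest_1samp

times = [9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5]

# Two-sided test of H0: mu = 10
t_stat, p_value = ttest_1samp(times, popmean=10)
print(f"t = {t_stat:.2f}, p-value = {p_value:.2f}")   # roughly t ≈ -1.15, p ≈ 0.28
```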
3.4 Statistical Decision
At significance level:
\[ \alpha = 0.05 \]
We have:
\[ p\text{-value} \approx 0.28 > 0.05 \]
Since the p-value exceeds α, we fail to reject the null hypothesis (H₀).
Conclusion: There is insufficient statistical evidence to conclude that the true average task completion time differs from 10 minutes.
3.5 Effect of Sample Size on Inferential Reliability
Sample size plays a critical role in the reliability of statistical inference. With a small sample size, estimates of the population mean and standard deviation tend to exhibit greater variability, which reduces the statistical power of hypothesis tests. As a result, the likelihood of committing a Type II error—failing to detect a true difference—increases.
As the sample size grows, the standard error decreases and the sampling distribution of the mean becomes more stable. This leads to more precise estimates, stronger hypothesis testing results, and more reliable inferential conclusions. Consequently, conclusions drawn from small samples should be interpreted with greater caution than those based on larger datasets.
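The effect of sample size on precision follows directly from the standard-error formula \(s / \sqrt{n}\). The short sketch below uses the sample standard deviation from this case; the larger sample sizes are purely illustrative:

```python
# Illustration: the standard error s / sqrt(n) shrinks as the sample size grows.
# s is taken from the case above; the larger n values are illustrative only.
import math

s = 0.386
for n in (10, 30, 100, 400):
    se = s / math.sqrt(n)
    print(f"n = {n:4d}  ->  standard error ≈ {se:.3f}")
```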
4 Case Study 3: Two-Sample T-Test (A/B Testing)
A product analytics team conducts an A/B test to compare the average session duration (minutes) between two versions of a landing page.
| Version | Sample Size (n) | Mean (\(\bar{x}\)) | Standard Deviation (s) |
|---|---|---|---|
| A | 25 | 4.8 | 1.2 |
| B | 25 | 5.4 | 1.4 |
Tasks
- Formulate the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
- Identify the appropriate t-test type and justify the choice.
- Compute the test statistic and p-value.
- Make a statistical decision using \(\alpha = 0.05\).
- Interpret the result in the context of product decision-making.
4.1 Formulation of Hypotheses
The hypotheses for this A/B test are:
\[ H_0: \mu_A = \mu_B \quad \text{(no difference in average session duration)} \]
\[ H_1: \mu_A \neq \mu_B \quad \text{(difference exists in average session duration)} \]
This is a two-tailed hypothesis test, as we are checking for any difference in either direction.
4.2 Identification of the Statistical Test
Because the population variances are unknown and the two groups consist of independent samples of modest size (\(n_A = n_B = 25\)), the appropriate test is a Two-Sample t-Test for independent samples. Since the sample standard deviations differ (1.2 vs. 1.4), equal population variances are not assumed, and Welch’s t-test is used.
Welch’s t-test is preferred when the variances of the two groups may differ, because it adjusts the degrees of freedom (via the Welch-Satterthwaite equation) to account for the unequal variances.
4.3 Mathematical Computation
Sample statistics:
\[ \bar{x}_A = 4.8, \quad s_A = 1.2, \quad n_A = 25 \]
\[ \bar{x}_B = 5.4, \quad s_B = 1.4, \quad n_B = 25 \]
The t-test statistic for two independent samples is computed as:
\[ t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}} \]
Substitute the values:
\[ t = \frac{4.8 - 5.4}{\sqrt{\frac{1.2^2}{25} + \frac{1.4^2}{25}}} \]
\[ t = \frac{-0.6}{\sqrt{\frac{1.44}{25} + \frac{1.96}{25}}} \]
\[ t = \frac{-0.6}{\sqrt{0.0576 + 0.0784}} \]
\[ t = \frac{-0.6}{\sqrt{0.136}} \]
\[ t = \frac{-0.6}{0.3686} \approx -1.63 \]
Degrees of freedom (Welch-Satterthwaite equation):
\[ df = \frac{\left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2}{\frac{(s_A^2/n_A)^2}{n_A-1} + \frac{(s_B^2/n_B)^2}{n_B-1}} \]
\[ df = \frac{(0.0576 + 0.0784)^2}{\frac{0.0576^2}{24} + \frac{0.0784^2}{24}} \approx 47 \]
Two-tailed p-value:
\[ p\text{-value} = 2P(T_{47} \le -1.63) \approx 0.11 \]
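Because only summary statistics are available, the computation can be reproduced with a short Python sketch using scipy.stats.ttest_ind_from_stats (assuming SciPy is available):

```python
# Minimal sketch of Welch's two-sample t-test from the summary statistics above.
from scipy.stats import ttest_ind_from_stats

t_stat, p_value = ttest_ind_from_stats(
    mean1=4.8, std1=1.2, nobs1=25,   # Version A
    mean2=5.4, std2=1.4, nobs2=25,   # Version B
    equal_var=False,                 # Welch's t-test (unequal variances)
)
print(f"t = {t_stat:.2f}, p-value = {p_value:.2f}")   # roughly t ≈ -1.63, p ≈ 0.11
```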
4.4 Statistical Decision
At significance level:
\[ \alpha = 0.05 \]
We have:
\[ p\text{-value} \approx 0.11 > 0.05 \]
Since the p-value exceeds α, we fail to reject the null hypothesis (H₀).
Conclusion: There is insufficient statistical evidence to conclude that the two versions of the landing page differ significantly in average session duration.
4.5 Product Analytics Interpretation
From a product decision-making perspective, this result suggests that the observed difference in session duration between Version A and Version B could be due to random variation rather than a true difference. Therefore, the team may decide that both versions perform similarly in terms of engagement.
Management could consider collecting more data (larger sample size) or testing additional metrics before making major changes to the landing page design. This ensures that any product decisions are based on statistically reliable evidence.
5 Case Study 4: Chi-Square Test of Independence
An e-commerce company examines whether device type is associated with payment method preference.
The observed frequencies are as follows:
| Device | E-Wallet | Credit Card | Cash on Delivery |
|---|---|---|---|
| Mobile | 120 | 80 | 50 |
| Desktop | 60 | 90 | 40 |
The objective is to determine whether device type and payment method preference are statistically independent.
Tasks
- State the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
- Identify the appropriate statistical test.
- Compute the Chi-Square statistic (χ²).
- Determine the p-value at \(\alpha = 0.05\).
- Interpret the results in terms of digital payment strategy.
5.1 Formulation of Hypotheses
\[ H_0: \text{Device type and payment method preference are independent} \]
\[ H_1: \text{Device type and payment method preference are not independent} \]
This is a test of independence in a contingency table.
5.2 Identification of the Statistical Test
Since the data are categorical and presented in a contingency table, the appropriate test is the Chi-Square Test of Independence.
This test examines whether the observed frequencies differ significantly from the expected frequencies under the assumption of independence.
5.3 Mathematical Computation
Let \(O_{ij}\) denote the observed frequency in row \(i\) and column \(j\).
The expected frequency under independence is:
\[ E_{ij} = \frac{(\text{Row Total})_i \times (\text{Column Total})_j}{\text{Grand Total}} \]
5.3.1 Calculate row totals, column totals, and grand total
Row totals:
\[ \text{Mobile} = 120 + 80 + 50 = 250 \]
\[ \text{Desktop} = 60 + 90 + 40 = 190 \]

Column totals:
\[ \text{E-Wallet} = 120 + 60 = 180 \]
\[ \text{Credit Card} = 80 + 90 = 170 \]
\[ \text{Cash on Delivery} = 50 + 40 = 90 \]

Grand Total:
\[ N = 250 + 190 = 440 \]
5.3.2 Compute expected frequencies
\[ E_{\text{Mobile, E-Wallet}} = \frac{250 \times 180}{440} = 102.27 \]
\[ E_{\text{Mobile, Credit Card}} = \frac{250 \times 170}{440} = 96.59 \]
\[ E_{\text{Mobile, Cash on Delivery}} = \frac{250 \times 90}{440} = 51.14 \]
\[ E_{\text{Desktop, E-Wallet}} = \frac{190 \times 180}{440} = 77.73 \]
\[ E_{\text{Desktop, Credit Card}} = \frac{190 \times 170}{440} = 73.41 \]
\[ E_{\text{Desktop, Cash on Delivery}} = \frac{190 \times 90}{440} = 38.86 \]
5.3.3 Compute Chi-Square statistic
\[ \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
\[ \chi^2 = \frac{(120-102.27)^2}{102.27} + \frac{(80-96.59)^2}{96.59} + \frac{(50-51.14)^2}{51.14} + \frac{(60-77.73)^2}{77.73} + \frac{(90-73.41)^2}{73.41} + \frac{(40-38.86)^2}{38.86} \]
\[ \chi^2 \approx \frac{17.73^2}{102.27} + \frac{(-16.59)^2}{96.59} + \frac{(-1.14)^2}{51.14} + \frac{(-17.73)^2}{77.73} + \frac{16.59^2}{73.41} + \frac{1.14^2}{38.86} \]
\[ \chi^2 \approx 3.07 + 2.85 + 0.03 + 4.04 + 3.75 + 0.03 \]
\[ \chi^2 \approx 13.77 \]
5.4 Determine p-value
Degrees of freedom:
\[ df = (\text{Number of Rows} - 1) \times (\text{Number of Columns} - 1) = (2-1)(3-1) = 2 \]
P-value (right-tailed):
\[ p\text{-value} = P(\chi^2_2 \ge 13.77) \approx 0.001 \]
Since \(p\text{-value} < \alpha = 0.05\), we reject the null hypothesis (H₀).
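The full calculation, including the expected frequencies, can be reproduced with a short Python sketch (assuming SciPy is available):

```python
# Minimal sketch of the chi-square test of independence on the observed table above.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([
    [120, 80, 50],   # Mobile:  E-Wallet, Credit Card, Cash on Delivery
    [ 60, 90, 40],   # Desktop: E-Wallet, Credit Card, Cash on Delivery
])

chi2, p_value, df, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
print(expected)   # expected frequencies under independence
```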
5.5 Interpretation for Digital Payment Strategy
The test indicates that device type and payment method preference are not independent. Mobile users show a stronger preference for E-Wallets, while desktop users prefer Credit Cards. This suggests that the e-commerce company can customize payment options by device type to improve conversion rates and user experience. Marketing and UI strategies can be tailored to highlight preferred payment methods on each device type, supporting more effective business decisions.
6 Case Study 5: Type I and Type II Errors (Conceptual)
A fintech startup tests whether a new fraud detection algorithm reduces fraudulent transactions.
The hypotheses are:
\[ H_0: \text{The new algorithm does not reduce fraud} \]
\[ H_1: \text{The new algorithm reduces fraud} \]
Tasks
- Explain a Type I Error (α) in this context.
- Explain a Type II Error (β) in this context.
- Identify which error is more costly from a business perspective.
- Discuss how sample size affects Type II Error.
- Explain the relationship between α, β, and statistical power.
6.1 Type I Error
A Type I Error (α) occurs when the startup concludes that the new fraud detection algorithm reduces fraudulent transactions, even though in reality it does not. This means the null hypothesis is incorrectly rejected, and the company might implement an ineffective algorithm based on misleading evidence. In this context, the error reflects a false positive claim about the algorithm’s effectiveness.
6.2 Type II Error
A Type II Error (β) happens when the startup fails to detect that the new algorithm actually reduces fraud, incorrectly failing to reject the null hypothesis even though it is false. In this case, the algorithm is effective, but the test does not provide sufficient evidence to reject H₀. This error represents a false negative, where the company misses the opportunity to adopt a beneficial fraud detection system.
6.3 More Costly Error
From a business perspective, the more costly error depends on the potential impact of undetected fraud versus wasted resources. In most fintech applications, a Type II Error is usually more serious, because failing to implement a genuinely effective fraud detection algorithm can result in financial losses and increased risk exposure, whereas a Type I Error primarily leads to resource misallocation without immediate financial harm.
6.4 Effect of Sample Size on Type II Error
The sample size directly affects the probability of committing a Type II Error. A larger sample size reduces the standard error of the estimate, increases the test’s precision, and lowers β, making it more likely to detect a true effect. Conversely, a small sample size increases variability and decreases statistical power, which raises the risk of a Type II Error and may prevent the detection of a genuinely effective algorithm.
6.5 Relationship
The significance level (α) is the probability of committing a Type I Error, and β is the probability of committing a Type II Error. Statistical power is defined as \(1 - \beta\), representing the probability of correctly rejecting a false null hypothesis. There is a trade-off between α and β: lowering α (being more conservative) often increases β, and increasing the sample size can reduce β while maintaining α. Understanding this relationship helps in experiment design to achieve sufficient power while controlling the likelihood of false positives and false negatives.
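The trade-off between sample size, β, and power can be illustrated with a small Monte Carlo sketch; the true effect size and noise level used below are illustrative assumptions, not values from the case:

```python
# Monte Carlo sketch: estimated power (1 - beta) of a one-sample t-test versus sample size.
# The true mean shift (0.3) and standard deviation (1.0) are illustrative assumptions.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
mu_null, true_mu, sigma, alpha = 0.0, 0.3, 1.0, 0.05

for n in (10, 30, 100, 300):
    rejections = 0
    for _ in range(2000):
        sample = rng.normal(true_mu, sigma, size=n)
        _, p = ttest_1samp(sample, popmean=mu_null)
        rejections += p < alpha
    print(f"n = {n:4d}  ->  estimated power ≈ {rejections / 2000:.2f}")
```

Larger samples reject the false null hypothesis far more often at the same α, which is exactly the reduction in β described above.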
7 Case Study 6: P-Value and Statistical Decision Making
A churn prediction model evaluation yields the following results:
- Test statistic = 2.31
- p-value = 0.021
- Significance level: \(\alpha = 0.05\)
Tasks
- Explain the meaning of the p-value.
- Make a statistical decision.
- Translate the decision into non-technical language for management.
- Discuss the risk if the sample is not representative.
- Explain why the p-value does not measure effect size.
7.1 Meaning of the P-Value
The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the observed value assuming the null hypothesis is true. In this case, a p-value of 0.021 means there is a 2.1% probability of observing such results if the churn prediction model had no real predictive effect.
7.2 Statistical Decision
Given the significance level:
\[ \alpha = 0.05 \]
we compare the p-value to α:
\[ p\text{-value} = 0.021 < 0.05 \]
Since the p-value is smaller than α, we reject the null hypothesis. This indicates that the model’s predictions are statistically significant.
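The reported p-value is consistent with a two-tailed test whose statistic is approximately standard normal under H₀; under that assumption, the decision step can be sketched as follows:

```python
# Sketch: reproducing the reported p-value and decision, assuming the test statistic
# is approximately standard normal under H0 (an assumption consistent with p ≈ 0.021).
from scipy.stats import norm

test_statistic, alpha = 2.31, 0.05
p_value = 2 * norm.sf(abs(test_statistic))   # two-tailed p-value ≈ 0.021

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.3f} -> {decision}")
```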
7.3 Non-Technical Explanation
In plain terms, the evaluation provides strong evidence that the model genuinely identifies customers who are likely to churn, rather than producing this result by chance. Management can therefore place reasonable confidence in the model’s predictions when planning retention actions.
7.4 Risk of Non-Representative Sample
If the sample used for evaluation does not reflect the actual customer population, the conclusions may be inaccurate. The model may appear effective in the sample but could fail when applied to the overall population, leading to incorrect assessments of performance.
7.5 Why P-Value Does Not Measure Effect Size
The p-value only measures statistical significance, not the magnitude or importance of the effect. A small p-value does not indicate that the model has a large or practically meaningful impact, only that the observed result is unlikely under the null hypothesis. Effect size must be measured using other metrics.