STATISTICAL INFERENCES
Week 14 Assignment
Adam Richie Wijaya
Student Profile
Study Program: Data Science
University: Institut Teknologi Sains Bandung (ITSB)
Course: Basic Statistics
Lecturer: BAKTI SIREGAR, M.Sc., CDS.
1 Case Study 1
1.1 One-Sample Z-Test (Statistical Hypotheses)
A digital learning platform claims that the average daily study time of its users is 120 minutes. Based on historical records, the population standard deviation is known to be 15 minutes.
A random sample of 64 users shows an average study time of 116 minutes.
\[ \begin{aligned} \mu_0 &= 120 \\ \sigma &= 15 \\ n &= 64 \\ \bar{x} &= 116 \end{aligned} \]
1.2 Tasks
- Formulate the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
- Identify the appropriate statistical test and justify your choice.
- Compute the test statistic and p-value using α=0.05.
- State the statistical decision.
- Interpret the result in a business analytics context.
1.2.1 Statistical Hypotheses
Null Hypothesis (\(H_0\)): \(\mu = 120\) (The average daily study time is 120 minutes).
Alternative Hypothesis (\(H_1\)): \(\mu \neq 120\) (The average daily study time is not 120 minutes). Note: This is a two-tailed test.
1.2.2 Appropriate Statistical Test
The appropriate test is the One-Sample Z-Test. Justification:
The sample size is large (\(n = 64 > 30\)).
The population standard deviation (\(\sigma\)) is known.
1.2.3 Compute Test Statistic and P-Value
Test Statistic (Z):
\[Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{116 - 120}{15 / \sqrt{64}} = \frac{-4}{1.875} \approx -2.133\]
P-Value:
Since this is a two-tailed test, \(P\text{-value} = 2 \times P(Z < -2.133)\).
Using the Z-distribution table, \(P(Z < -2.133) \approx 0.0165\).
\(P\text{-value} \approx 2 \times 0.0165 = 0.033\).
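The calculation above can be double-checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative choices, not part of the original assignment.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 120.0    # claimed population mean (minutes)
sigma = 15.0   # known population standard deviation (minutes)
n = 64         # sample size
x_bar = 116.0  # observed sample mean (minutes)

standard_error = sigma / sqrt(n)      # 15 / 8 = 1.875
z = (x_bar - mu0) / standard_error    # test statistic

# Two-tailed p-value: probability of |Z| being at least |z| when H0 is true.
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

Using the exact normal CDF instead of a printed table avoids rounding the tail probability before doubling it.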
1.2.4 Statistical Decision
Compare the \(P\text{-value}\) to \(\alpha\):
\(P\text{-value} (0.033) < \alpha (0.05)\).
Decision: Reject \(H_0\).
1.2.5 Business Analytics Interpretation
There is sufficient statistical evidence at the 5% significance level to conclude that the average daily study time of the platform's users differs from the 120 minutes claimed by the company. Because the sample mean (116 minutes) lies below the claim, the data suggest that users are studying less than the claimed average.
2 Case Study 2
2.1 One-Sample T-Test (σ Unknown, Small Sample)
A UX Research Team investigates whether the average task completion time of a new application differs from 10 minutes.
The following data are collected from 10 users:
\[ 9.2,\; 10.5,\; 9.8,\; 10.1,\; 9.6,\; 10.3,\; 9.9,\; 9.7,\; 10.0,\; 9.5 \]
2.2 Tasks
- Define H₀ and H₁ (two-tailed).
- Determine the appropriate hypothesis test.
- Calculate the t-statistic and p-value at \(\alpha = 0.05\).
- Make a statistical decision.
- Explain how sample size affects inferential reliability.
2.2.1 Define \(H_0\) and \(H_1\) (Two-Tailed)
Since we are testing for any significant difference (either higher or lower) from the target:
\(H_0\) (Null Hypothesis): \(\mu = 10\). The average task completion time is equal to 10 minutes.
\(H_1\) (Alternative Hypothesis): \(\mu \neq 10\). The average task completion time is not equal to 10 minutes.
2.2.2 Appropriate Hypothesis Test
The appropriate test is the One-Sample T-Test. Reasons:
There is only one sample group being compared to a known mean.
The population standard deviation (\(\sigma\)) is unknown.
The sample size is small (\(n = 10\)).
2.2.3 Calculate T-Statistic and P-Value (\(\alpha = 0.05\))
Data: 9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5.
Sample Mean (\(\bar{x}\)): \(98.6 / 10 = 9.86\)
Sample Standard Deviation (\(s\)): \(\sqrt{1.344 / 9} \approx 0.3864\)
Sample Size (\(n\)): 10
Degrees of Freedom (\(df\)): \(n - 1 = 9\)
T-Statistic Calculation:
\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{9.86 - 10}{0.3864 / \sqrt{10}} \approx -1.146\]
P-Value:
Using a T-distribution table with \(df = 9\) and \(t = -1.146\), the two-tailed p-value is approximately 0.28.
2.2.4 Statistical Decision
Significance Level (\(\alpha\)): 0.05.
Decision Rule: If p-value > \(\alpha\), we fail to reject \(H_0\).
Decision: Since \(0.28 > 0.05\), we fail to reject \(H_0\).
Conclusion: At the 5% significance level, there is no statistically significant evidence that the average task completion time differs from 10 minutes.
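The summary statistics, t-statistic, and p-value can be recomputed directly from the ten raw observations. The sketch below is one possible check using only the Python standard library. The helper functions `t_pdf` and `two_tailed_p` are hypothetical names introduced here, and the p-value is obtained by numerically integrating the Student's t density (Simpson's rule) rather than by table lookup.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the density over [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3           # P(0 < T < |t|)
    return 1 - 2 * central_area

times = [9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5]
mu0 = 10.0                                 # hypothesized mean (minutes)

n = len(times)
x_bar = mean(times)
s = stdev(times)                           # sample SD (n - 1 in the denominator)
t_stat = (x_bar - mu0) / (s / sqrt(n))
p_value = two_tailed_p(t_stat, n - 1)

print(f"mean = {x_bar:.2f}, s = {s:.4f}, t = {t_stat:.3f}, p = {p_value:.3f}")
```

Working from the raw data rather than from rounded intermediate values keeps the mean, standard deviation, and test statistic consistent with each other.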
2.2.5 Effect of Sample Size on Inferential Reliability
Sample size (\(n\)) is critical for the reliability of statistical inferences:
Precision: Larger samples reduce the standard error, leading to more precise estimates of the population mean.
Statistical Power: Small samples (like \(n=10\)) have lower power, meaning they are less likely to detect a true difference if one actually exists.
Stability: Larger samples minimize the impact of outliers and random noise, making the results more consistent and reliable.
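The precision point can be illustrated numerically: the standard error is \(s / \sqrt{n}\), so quadrupling the sample size halves it. The snippet below is a small sketch in which the standard deviation is an assumed, rounded value chosen only for illustration.

```python
from math import sqrt

s = 0.39  # illustrative sample standard deviation (assumed, rounded)

# Standard error of the mean for three sample sizes, each 4x the previous one.
standard_errors = {n: s / sqrt(n) for n in (10, 40, 160)}

for n, se in standard_errors.items():
    print(f"n = {n:>3}: standard error = {se:.4f}")
```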
3 Case Study 3
3.1 Two-Sample T-Test (A/B Testing)
A product analytics team conducts an A/B test to compare the average session duration (minutes) between two versions of a landing page.
| Version | Sample Size (n) | Mean | Standard Deviation |
|---|---|---|---|
| A | 25 | 4.8 | 1.2 |
| B | 25 | 5.4 | 1.4 |
3.2 Tasks
- Formulate the null and alternative hypotheses.
- Identify the type of t-test required.
- Compute the test statistic and p-value.
- Draw a statistical conclusion at \(\alpha = 0.05\).
- Interpret the result for product decision-making.
3.2.1 Formulate the Null and Alternative Hypotheses
Null Hypothesis (\(H_0\)): \(\mu_A = \mu_B\). There is no significant difference in the average session duration between Version A and Version B.
Alternative Hypothesis (\(H_1\)): \(\mu_A \neq \mu_B\). There is a significant difference in the average session duration between Version A and Version B.
3.2.2 Identify the Type of T-Test Required
The required test is an Independent Two-Sample T-Test.
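- Reason: We are comparing the means of two distinct and independent groups (Version A and Version B) to determine if there is a statistical difference between them. Because the two sample standard deviations are similar (1.2 and 1.4) and the groups are the same size, the pooled-variance (equal-variance) form of the test is used.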
3.2.3 Compute the Test Statistic and P-Value
Pooled Standard Deviation (\(s_p\)): \[s_p = \sqrt{\frac{(n_A-1)s_A^2 + (n_B-1)s_B^2}{n_A + n_B - 2}} = \sqrt{\frac{24(1.2^2) + 24(1.4^2)}{48}} \approx 1.304\]
T-Statistic (\(t\)): \[t = \frac{\bar{x}_B - \bar{x}_A}{s_p \sqrt{\frac{1}{n_A} + \frac{1}{n_B}}} = \frac{5.4 - 4.8}{1.304 \sqrt{\frac{1}{25} + \frac{1}{25}}} \approx 1.627\]
P-Value: For \(df = 48\) and a two-tailed test, the p-value is approximately 0.110.
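The pooled test can be reproduced from the summary statistics alone. The following is a minimal standard-library sketch. The helpers `t_pdf` and `two_tailed_p` are hypothetical names introduced for this example, and the p-value comes from numerically integrating the Student's t density instead of reading a table.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_stat, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on the density over [0, |t|]."""
    upper = abs(t_stat)
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

# Summary statistics from the A/B test table.
n_a, mean_a, sd_a = 25, 4.8, 1.2
n_b, mean_b, sd_b = 25, 5.4, 1.4

df = n_a + n_b - 2
pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
s_p = sqrt(pooled_var)                     # pooled standard deviation
t_stat = (mean_b - mean_a) / (s_p * sqrt(1 / n_a + 1 / n_b))
p_value = two_tailed_p(t_stat, df)

print(f"s_p = {s_p:.3f}, t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}")
```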
3.2.4 Draw a Statistical Conclusion at \(\alpha = 0.05\)
Comparison: The p-value (\(0.110\)) is greater than the significance level \(\alpha\) (\(0.05\)).
Decision: Fail to reject the null hypothesis (\(H_0\)).
Conclusion: There is no statistically significant evidence to conclude that the session duration differs between the two landing page versions.
3.2.5 Interpret the Result for Product Decision-Making
Insight: Although Version B showed a higher numerical mean (\(5.4\) vs \(4.8\)), this difference is not statistically significant and could be due to random chance.
Recommendation: The team should not rush to implement Version B based on these results alone. It is advisable to either increase the sample size to gain more statistical power or investigate other qualitative user experience factors before making a permanent change.
4 Case Study 4
4.1 Chi-Square Test of Independence
An e-commerce company examines whether device type is associated with payment method preference.
| Device / Payment | E-Wallet | Credit Card | Cash on Delivery |
|---|---|---|---|
| Mobile | 120 | 80 | 50 |
| Desktop | 60 | 90 | 40 |
4.2 Tasks
- State the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
- Identify the appropriate statistical test.
- Compute the Chi-Square statistic (χ²).
- Determine the p-value at \(\alpha = 0.05\).
- Interpret the results in terms of digital payment strategy.
4.2.1 State the Null Hypothesis (\(H_0\)) and Alternative Hypothesis (\(H_1\))
Null Hypothesis (\(H_0\)): Device type and payment method preference are independent (there is no association between them).
Alternative Hypothesis (\(H_1\)): Device type and payment method preference are associated (there is a significant relationship between them).
4.2.2 Identify the Appropriate Statistical Test
The appropriate test is the Chi-Square Test of Independence.
Reason: This test is used to determine if there is a significant relationship between two categorical variables (Device Type and Payment Method) from the same population.
4.2.3 Compute the Chi-Square Statistic (\(\chi^2\))
To calculate \(\chi^2\), we first determine the Expected Frequency (\(E\)) for each cell using the formula: \(E = \frac{(\text{Row Total} \times \text{Column Total})}{\text{Grand Total}}\).
Expected Values:
Mobile & E-Wallet: \((250 \times 180) / 440 = 102.27\)
Mobile & Credit Card: \((250 \times 170) / 440 = 96.59\)
Mobile & COD: \((250 \times 90) / 440 = 51.14\)
Desktop & E-Wallet: \((190 \times 180) / 440 = 77.73\)
Desktop & Credit Card: \((190 \times 170) / 440 = 73.41\)
Desktop & COD: \((190 \times 90) / 440 = 38.86\)
Chi-Square Calculation:
\[\chi^2 = \sum \frac{(O - E)^2}{E}\]
\[\chi^2 = \frac{(120-102.27)^2}{102.27} + \frac{(80-96.59)^2}{96.59} + \dots + \frac{(40-38.86)^2}{38.86}\]
\[\chi^2 \approx 3.07 + 2.85 + 0.03 + 4.04 + 3.75 + 0.03 \approx 13.77\]
4.2.4 Determine the p-value at \(\alpha = 0.05\)
Degrees of Freedom (\(df\)): \((r-1) \times (c-1) = (2-1) \times (3-1) = 2\).
Using the Chi-Square distribution table with \(df = 2\) and \(\chi^2 = 13.77\), the p-value \(\approx 0.0010\).
Statistical Decision:
- Since the p-value (\(0.0010\)) is less than \(\alpha\) (\(0.05\)), we Reject \(H_0\).
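The expected frequencies, the \(\chi^2\) statistic, and the p-value can be verified from the observed table. This is a sketch using only the Python standard library; the dictionary layout is an illustrative choice. It relies on the fact that for \(df = 2\) the chi-square upper-tail probability has the closed form \(e^{-\chi^2/2}\), so it would need a different p-value routine for tables of another size.

```python
from math import exp

observed = {
    "Mobile":  {"E-Wallet": 120, "Credit Card": 80, "Cash on Delivery": 50},
    "Desktop": {"E-Wallet": 60,  "Credit Card": 90, "Cash on Delivery": 40},
}

columns = list(next(iter(observed.values())))
row_totals = {device: sum(counts.values()) for device, counts in observed.items()}
col_totals = {col: sum(observed[device][col] for device in observed) for col in columns}
grand_total = sum(row_totals.values())

chi_square = 0.0
for device in observed:
    for col in columns:
        # Expected count under independence: row total * column total / grand total.
        expected = row_totals[device] * col_totals[col] / grand_total
        chi_square += (observed[device][col] - expected) ** 2 / expected

df = (len(observed) - 1) * (len(columns) - 1)

# Closed form valid ONLY for df = 2: P(X > x) = exp(-x / 2).
assert df == 2
p_value = exp(-chi_square / 2)

print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.4f}")
```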
4.2.5 Interpret the Results in terms of Digital Payment Strategy
The results indicate a significant association between device type and payment preference:
Mobile Strategy: Mobile users choose E-Wallets more often than independence would predict (120 observed vs 102.27 expected). The company should prioritize optimizing E-Wallet checkout flows on the mobile app.
Desktop Strategy: Desktop users choose Credit Cards more often than independence would predict (90 observed vs 73.41 expected). Marketing strategies for desktop could focus on highlighting credit card security features or installment plans to drive conversions.
5 Case Study 5
5.1 Type I and Type II Errors (Conceptual)
A fintech startup tests whether a new fraud detection algorithm reduces fraudulent transactions.
- H₀: The new algorithm does not reduce fraud.
- H₁: The new algorithm reduces fraud.
5.2 Tasks
- Explain a Type I Error (α) in this context.
- Explain a Type II Error (β) in this context.
- Identify which error is more costly from a business perspective.
- Discuss how sample size affects Type II Error.
- Explain the relationship between α, β, and statistical power.
5.2.1 Explain a Type I Error (\(\alpha\)) in this context
A Type I Error occurs when the null hypothesis (\(H_0\)) is rejected when it is actually true (a “False Positive”).
In this context: The startup concludes that the new algorithm is effective at reducing fraud, when in reality, it has no effect.
Consequence: The company wastes resources (money, time, and effort) implementing a system that does not provide any benefit.
5.2.2 Explain a Type II Error (\(\beta\)) in this context
A Type II Error occurs when the test fails to reject the null hypothesis (\(H_0\)) when the alternative hypothesis (\(H_1\)) is actually true (a “False Negative”).
In this context: The startup concludes that the algorithm is ineffective and decides not to use it, even though it actually would have reduced fraud.
Consequence: The company misses out on the opportunity to save money and prevent fraudulent transactions that the algorithm could have stopped.
5.2.3 Identify which error is more costly from a business perspective
From a fintech business perspective, a Type II Error is likely more costly.
- Reason: Failing to detect a working algorithm (Type II) means the company continues to suffer ongoing, large-scale financial losses from fraud. While a Type I error involves a one-time sunk cost for implementation, a Type II error results in continuous operational losses.
5.2.4 Discuss how sample size affects Type II Error
Relationship: As the sample size increases, the probability of committing a Type II Error (\(\beta\)) decreases.
Explanation: A larger sample size provides more data and reduces statistical noise, making the test more sensitive to detecting a true effect (i.e., the reduction in fraud) if it actually exists.
5.2.5 Explain the relationship between \(\alpha\), \(\beta\), and statistical power
These three elements are fundamentally linked in hypothesis testing:
\(\alpha\) and \(\beta\) Trade-off: There is an inverse relationship between \(\alpha\) and \(\beta\). If you decrease the risk of a Type I Error (by making \(\alpha\) smaller), the risk of a Type II Error (\(\beta\)) usually increases, unless the sample size is also increased.
Statistical Power (\(1 - \beta\)): Statistical power is the probability of correctly rejecting the null hypothesis when it is false.
Relationship: Lowering \(\beta\) (Type II Error) directly increases the Statistical Power. Increasing the sample size is the most effective way to decrease \(\beta\) and increase power without needing to increase \(\alpha\).
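The interplay between \(\alpha\), \(\beta\), power, and sample size can be made concrete with a one-sided z-test, for which power equals \(\Phi(\delta\sqrt{n}/\sigma - z_{1-\alpha})\). The sketch below assumes a hypothetical true effect \(\delta = 0.2\) and \(\sigma = 1\); these numbers and the function name `power_one_sided_z` are illustrative assumptions, not values from the fraud-detection case.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(effect, sigma, n, alpha):
    """Power of a one-sided z-test to detect a true shift of size `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)          # rejection threshold
    return std_normal.cdf(effect * sqrt(n) / sigma - z_crit)

effect, sigma = 0.2, 1.0   # hypothetical effect size and standard deviation

for alpha in (0.05, 0.01):
    for n in (25, 100, 400):
        power = power_one_sided_z(effect, sigma, n, alpha)
        print(f"alpha = {alpha:.2f}  n = {n:>3}  "
              f"power = {power:.3f}  beta = {1 - power:.3f}")
```

Reading down the output for a fixed \(\alpha\) shows \(\beta\) shrinking as \(n\) grows, while comparing the two \(\alpha\) values at the same \(n\) shows the trade-off: a stricter \(\alpha\) raises \(\beta\).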
6 Case Study 6
6.1 P-Value and Statistical Decision Making
A churn prediction model evaluation yields the following results:
- Test statistic = 2.31
- p-value = 0.021
- Significance level: \(\alpha = 0.05\)
6.2 Tasks
- Explain the meaning of the p-value.
- Make a statistical decision.
- Translate the decision into non-technical language for management.
- Discuss the risk if the sample is not representative.
- Explain why the p-value does not measure effect size.
6.2.1 Explain the meaning of the p-value
The p-value represents the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis (\(H_0\)) is correct.
- In this case, a p-value of 0.021 means that if the model actually had no predictive power, a test statistic at least as extreme as 2.31 would occur in only about 2.1% of samples. It is not the probability that the null hypothesis is true.
6.2.2 Make a statistical decision
To make a decision, we compare the p-value to the significance level \(\alpha\):
Condition: If p-value \(\leq \alpha\), we reject \(H_0\).
Analysis: \(0.021 \leq 0.05\).
Decision: Reject the Null Hypothesis (\(H_0\)). The result is statistically significant at the 5% significance level.
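The decision rule can be expressed in a few lines of code. The case does not state which distribution the test statistic follows, so the sketch below assumes a two-tailed z-test (standard normal under \(H_0\)); under that assumption the reported p-value of 0.021 is reproduced from the statistic 2.31.

```python
from statistics import NormalDist

test_statistic = 2.31
alpha = 0.05
reported_p = 0.021

# ASSUMPTION: the statistic is standard normal under H0 and the test is two-tailed.
p_value = 2 * (1 - NormalDist().cdf(abs(test_statistic)))

reject_h0 = p_value <= alpha
print(f"p-value = {p_value:.4f} (reported: {reported_p}), reject H0: {reject_h0}")
```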
6.2.3 Translate the decision into non-technical language for management
“Our evaluation shows that the new churn prediction model is effective. We are highly confident that its performance is not due to luck. This model successfully identifies customers at risk of leaving, allowing us to take proactive retention steps.”
6.2.4 Discuss the risk if the sample is not representative
If the sample used for evaluation does not accurately represent the entire customer base, it introduces Selection Bias:
False Generalization: The model might appear accurate on the test data but fail when applied to real-world customers with different characteristics.
Misguided Investment: Management might invest heavily in a model that is invalid for the broader market, leading to wasted budget and missed retention targets.
6.2.5 Explain why the p-value does not measure effect size
The p-value indicates how incompatible the data are with the null hypothesis (how unlikely the result would be if there were no effect), but it does not tell us how large the effect is.
Sample Size Dependency: Large samples can produce a very small (significant) p-value even if the actual improvement is tiny.
Statistical vs. Practical Significance: A model could have a p-value \(< 0.05\) but only improve accuracy by 0.1%. While statistically significant, it might not be practically useful for the business.
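The sample-size dependency can be demonstrated by holding the effect fixed and changing only \(n\). The sketch below uses a two-tailed z-test with hypothetical numbers (a true difference of 0.5 units and \(\sigma = 15\)); the function name `two_tailed_z_p` is introduced only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_p(effect, sigma, n):
    """Two-tailed z-test p-value for an observed mean difference `effect`."""
    z = effect / (sigma / sqrt(n))
    return 2 * NormalDist().cdf(-abs(z))

effect = 0.5    # hypothetical observed difference (identical in both scenarios)
sigma = 15.0    # hypothetical known standard deviation

p_small_sample = two_tailed_z_p(effect, sigma, n=100)
p_large_sample = two_tailed_z_p(effect, sigma, n=10_000)

print(f"n =    100: p-value = {p_small_sample:.4f}")
print(f"n = 10,000: p-value = {p_large_sample:.4f}")
```

The effect size is the same in both runs, yet only the large sample crosses the 0.05 threshold, which is why an effect-size measure should be reported alongside the p-value.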