Statistical Inference ~ Week 14
ITSB
Syafif Azmi Lontoh (52250060)
Data Science Student
1 Case Study 1
One-Sample Z-Test (Statistical Hypotheses)
A digital learning platform claims that the average daily study time of its users is 120 minutes. Based on historical records, the population standard deviation is known to be 15 minutes.
A random sample of 64 users shows an average study time of 116 minutes.
\[ \mu_0 =120 \] \[ \sigma=15 \] \[ n=64 \] \[\bar{x} = 116\]
Tasks
- Formulate the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
- Identify the appropriate statistical test and justify your choice.
- Compute the test statistic and p-value using α = 0.05.
- State the statistical decision.
- Interpret the result in a business analytics context.
1.1 Statistical Hypotheses
Formulate Null Hypothesis & Alternative Hypothesis
- Null Hypothesis: \[H_0: \mu = 120\]
- Alternative Hypothesis: \[H_1: \mu \neq 120\]
1.2 Appropriate Statistical Test
The appropriate statistical test is the One-Sample Z-Test.
Justification:
- The population standard deviation \((\sigma = 15)\) is known.
- The sample size is large \((n = 64 \geq 30)\).
- The analysis concerns a single population mean.
- Under \(H_0\), the standardized sample mean \(Z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})\) follows the standard normal distribution.
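1.3 Test Statistic and P-Value
Formula:
\[Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\]
Test Statistic Calculation:
\[Z = \frac{116 - 120}{15 / \sqrt{64}} = \frac{-4}{1.875} \approx -2.133\]
P-Value Calculation
Because \(H_1\) is two-tailed, the p-value is computed as follows, where \(\Phi\) is the standard normal CDF:
\[p\text{-value} = 2 \cdot P(Z \leq -|z|) = 2\,\Phi(-2.133)\]
Using the standard normal distribution:
\[p\text{-value} \approx 2(0.0165) \approx 0.033\]
Decision:
- Reject \(H_0\) if \(p\text{-value} < \alpha\).
\[0.033 < 0.05\]
The null hypothesis is rejected.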
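The following code cross-checks the hand calculation above. It is a minimal Python sketch that uses only the standard library (`statistics.NormalDist`). The helper name `one_sample_z_test` is hypothetical and was chosen for illustration.

```python
import math
from statistics import NormalDist


def one_sample_z_test(x_bar: float, mu0: float, sigma: float, n: int) -> tuple[float, float]:
    """Two-tailed one-sample Z-test with known population sigma; returns (z, p_value)."""
    se = sigma / math.sqrt(n)                      # standard error of the mean
    z = (x_bar - mu0) / se                         # how many SEs the sample mean lies from mu0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed: area beyond |z| in both tails
    return z, p_value


alpha = 0.05
z, p_value = one_sample_z_test(x_bar=116, mu0=120, sigma=15, n=64)
print(f"z = {z:.3f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The p-value is doubled because \(H_1: \mu \neq 120\) is two-tailed. A one-tailed alternative would use a single tail.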
1.4 Interpretation
The null hypothesis \((H_0: \mu = 120)\) is tested against the alternative hypothesis \((H_1: \mu \neq 120)\) using a one-sample Z-test, since the population standard deviation is known and the sample size is large \((n=64)\).
The test statistic \(Z = -2.133\) indicates that the sample mean (116) is sufficiently far from the hypothesized population mean (120), relative to the population variability and sample size.
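The obtained p-value is approximately 0.033. If \(H_0\) were true, the probability of obtaining a Z-value at least as extreme as the one observed \((|Z| \geq 2.133)\) would be about 3.3%. Since this is below \(α = 0.05\), the data provide evidence that the true average daily study time differs from the claimed 120 minutes. In business analytics terms, the sample average of 116 minutes suggests that the platform's claim may overstate how long users actually study each day, so reporting or marketing that relies on the 120-minute figure may need to be revisited.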
2 Case Study 2
A UX Research team investigates whether the average task completion time of a new application differs from 10 minutes.
The following data are collected from 10 users: \[9.2,10.5,9.8,10.1,9.6,10.3,9.9,9.7,10.0,9.5\]
Tasks:
- Define H₀ and H₁ (two-tailed).
- Determine the appropriate hypothesis test.
- Calculate the t-statistic and p-value at α=0.05.
- Make a statistical decision.
- Explain how sample size affects inferential reliability.
2.1 Statistical Hypotheses
H₀: The average task completion time is equal to 10 minutes \((\mu = 10)\).
H₁: The average task completion time is different from 10 minutes \((\mu \neq 10)\).
2.2 Appropriate Hypothesis Test
The appropriate test is a one-sample t-test for three reasons. We compare the sample mean with a hypothesized value (10 minutes). The population standard deviation is unknown and must be estimated from the sample. The sample size is small \((n = 10)\).
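2.3 t-Statistic and P-Value Calculation
- Sample Mean:
\[\bar{x} = \frac{\sum x_i}{n} = \frac{98.6}{10} = 9.86\]
- Sample Standard Deviation:
\[s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{1.344}{9}} \approx 0.3864\]
- t-Statistic Calculation:
\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{9.86 - 10}{0.3864 / \sqrt{10}} = \frac{-0.14}{0.1222} \approx -1.146\]
Degrees of Freedom:
\[df = n - 1 = 9\]
P-value:
\[p\text{-value} = 2 \cdot P(T_9 \leq -|t|) \approx 0.281\]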
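The following code recomputes the sample mean, sample standard deviation and t-statistic by hand for the 10 recorded times. It then cross-checks them against SciPy's `scipy.stats.ttest_1samp`, which is two-sided by default. This is a sketch that assumes SciPy is installed.

```python
import statistics

from scipy import stats

# Task completion times (minutes) for the 10 users in Case Study 2
times = [9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5]
mu0 = 10.0

x_bar = statistics.mean(times)   # sample mean
s = statistics.stdev(times)      # sample SD with the n - 1 denominator
t_manual = (x_bar - mu0) / (s / len(times) ** 0.5)

# SciPy's one-sample t-test as a cross-check (two-sided by default)
result = stats.ttest_1samp(times, popmean=mu0)
print(f"mean = {x_bar:.2f}, s = {s:.4f}, t = {t_manual:.3f}")
print(f"scipy: t = {result.statistic:.3f}, p = {result.pvalue:.3f}, df = {len(times) - 1}")
```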
2.4 Statistical Decision
Since the p-value (0.281) is greater than α (0.05), we fail to reject H₀. There is insufficient statistical evidence to conclude that the average task completion time differs from 10 minutes.
2.5 Conclusion
The data do not provide sufficient evidence to conclude that the new application’s average task completion time is different from 10 minutes at the 5% significance level, given the small sample (n=10) and a p-value of 0.281 from the one-sample t-test.
Effect of Sample Size on Inferential Reliability:
Sample size strongly affects inferential reliability. Larger samples reduce the standard error of the mean \((s/\sqrt{n})\), which yields more precise estimates of the population mean. With small samples (such as \(n = 10\)), the sample mean varies more from sample to sample and the t-distribution has heavier tails. The test is therefore less sensitive to small differences between the sample mean and the hypothesized mean. As a result, inferences from small samples are more prone to Type II errors (failing to reject H₀ when H₁ is true).
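The following code shows how the standard error shrinks as \(n\) grows. It is an illustrative sketch, not an analysis of new data. It assumes the Case Study 2 sample SD (0.3864) and the 0.14-minute gap (9.86 vs. 10) stay the same at larger hypothetical sample sizes, which real samples would not do exactly.

```python
import math

s = 0.3864  # sample SD from Case Study 2, assumed fixed as n grows (illustrative assumption)


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


for n in (10, 40, 160):
    se = standard_error(s, n)
    # t-statistic the same observed gap (9.86 - 10) would produce at this sample size
    t = (9.86 - 10) / se
    print(f"n = {n:>3}: SE = {se:.4f}, t = {t:.2f}")
```

Quadrupling \(n\) halves the standard error. At \(n = 40\), the same gap would give \(|t| \approx 2.29\), which exceeds the two-tailed 5% critical value of about 2.02 for 39 degrees of freedom. At \(n = 10\), the observed \(|t| \approx 1.15\) stays below the critical value of about 2.26 for 9 degrees of freedom.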
3 Case Study 3
Two-Sample T-Test (A/B Testing)
A product analytics team conducts an A/B test to compare the average session duration (minutes) between two versions of a landing page.
| Version | Sample Size (n) | Mean (minutes) | Standard Deviation (minutes) |
|---|---|---|---|
| A | 25 | 4.8 | 1.2 |
| B | 25 | 5.4 | 1.4 |
Tasks
- Formulate the null and alternative hypotheses.
- Identify the type of t-test required.
- Compute the test statistic and p-value.
- Draw a statistical conclusion at α=0.05.
- Interpret the result for product decision-making.
3.1 Statistical Hypotheses
Null hypothesis (H₀): There is no difference in the average session duration between version A and version B. \(H_0: \mu_A = \mu_B\)
Alternative hypothesis (H₁): There is a difference in the average session duration between version A and version B. \(H_1: \mu_A \neq \mu_B\)
3.2 Appropriate Statistical Test
- Two independent groups (users seeing page A vs. page B).
- The outcome variable is continuous (session duration in minutes).
- Sample sizes: \(n_A = 25\), \(n_B = 25\).
- Only summary statistics (mean and SD) are given. The sample SDs (1.2 and 1.4) are similar and the group sizes are equal, so equal population variances are assumed.
Therefore, the appropriate test is an independent two-sample t-test with equal variances (pooled t-test).
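3.3 Test Statistic and P-Value
Version A: \(n_A = 25,\ \bar{x}_A = 4.8,\ s_A = 1.2\)
Version B: \(n_B = 25,\ \bar{x}_B = 5.4,\ s_B = 1.4\)
Pooled Standard Deviation
\[s_p = \sqrt{\frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}} = \sqrt{\frac{24(1.44) + 24(1.96)}{48}} = \sqrt{1.70} \approx 1.304\]
Standard Error of the Difference
\[SE = s_p\sqrt{\frac{1}{n_A} + \frac{1}{n_B}} = 1.304\sqrt{0.08} \approx 0.369\]
t-Statistic
\[t = \frac{\bar{x}_A - \bar{x}_B}{SE} = \frac{4.8 - 5.4}{0.369} \approx -1.63\]
Calculation:
- \(t \approx -1.63\)
- Degrees of freedom: \(df = n_A + n_B - 2 = 48\)
- Two-sided p-value: \(p \approx 0.11\)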
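The following code computes the pooled SD, the standard error of the difference and the t-statistic from the summary table. It then cross-checks the result with SciPy's `scipy.stats.ttest_ind_from_stats` using `equal_var=True`, which runs the pooled test directly from means, SDs and sample sizes. This is a minimal sketch that assumes SciPy is installed.

```python
import math

from scipy import stats

# Summary statistics from the A/B test table
n_a, mean_a, sd_a = 25, 4.8, 1.2
n_b, mean_b, sd_b = 25, 5.4, 1.4

# Pooled SD, standard error of the difference, t-statistic and degrees of freedom
sp = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))
se = sp * math.sqrt(1 / n_a + 1 / n_b)
t_manual = (mean_a - mean_b) / se
df = n_a + n_b - 2

# Cross-check with SciPy's pooled two-sample t-test computed from summary statistics
result = stats.ttest_ind_from_stats(mean_a, sd_a, n_a, mean_b, sd_b, n_b, equal_var=True)
print(f"sp = {sp:.3f}, SE = {se:.3f}, t = {t_manual:.3f}, df = {df}, p = {result.pvalue:.3f}")
```

If the two SDs differed markedly, passing `equal_var=False` would run Welch's t-test instead.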
3.4 Statistical Decision
- Significance level: \(α=0.05\)
- p-value: \(p≈0.11\)
Decision rule:
- If \(p \leq 0.05\), reject \(H_0\).
- If \(p > 0.05\), fail to reject \(H_0\).
Since \(p \approx 0.11 > 0.05\), we fail to reject the null hypothesis.
3.5 Interpretation for Product Decision-Making
From a product perspective:
Version B shows a numerically higher mean session duration (5.4 vs 4.8 minutes), but the difference is not statistically significant at \(α = 0.05\) \((p ≈ 0.11)\).
This means the observed difference could be due to random variation in the sample, and you cannot confidently claim that Version B truly improves session duration in the population.
4 Case Study 4
Chi-Square Test of Independence
An e-commerce company examines whether device type is associated with payment method preference.
| Device | E-Wallet | Credit Card | Cash on Delivery |
|---|---|---|---|
| Mobile | 120 | 80 | 50 |
| Desktop | 60 | 90 | 40 |
Tasks:
- State the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
- Identify the appropriate statistical test.
- Compute the Chi-Square statistic (χ²).
- Determine the p-value at α = 0.05.
- Interpret the results in terms of digital payment strategy.
4.1 Hypotheses
H₀: Device type (Mobile vs. Desktop) and payment method (E-Wallet, Credit Card, Cash on Delivery) are independent; the payment distribution is the same for both devices.
H₁: Device type and payment method are associated; the payment distribution differs between mobile and desktop.
4.2 Appropriate Statistical Test
Data: two categorical variables (Device with 2 levels, Payment Method with 3 levels).
Appropriate test: Chi-Square test of independence on a 2×3 contingency table.
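Expected Frequencies
The expected frequency for each cell is computed as:
\[E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N}\]
The row totals are Mobile = 250 and Desktop = 190. The column totals are E-Wallet = 180, Credit Card = 170 and Cash on Delivery = 90. The grand total is \(N = 440\). For example, \(E_{\text{Mobile, E-Wallet}} = 250 \times 180 / 440 \approx 102.27\).
Expected Frequency Table
| Device | E-Wallet | Credit Card | Cash on Delivery |
|---|---|---|---|
| Mobile | 102.27 | 96.59 | 51.14 |
| Desktop | 77.73 | 73.41 | 38.86 |
All expected counts exceed 5, so the chi-square approximation is appropriate.
Chi-Square statistic (χ²)
Observed table:
- Mobile: E-Wallet = 120, Credit Card = 80, Cash on Delivery = 50.
- Desktop: E-Wallet = 60, Credit Card = 90, Cash on Delivery = 40.
\[\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \approx 13.77\]
Degrees of freedom: \(df = (2-1)\times(3-1) = 2\)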
4.3 p-value & Statistical Decision
The p-value is the probability of obtaining a Chi-Square statistic at least as large as the observed \(χ^2≈13.77\), if the null hypothesis (no association between device and payment method) is actually true.
\(p \approx 0.0010\) means that, under H₀, there is only about a 0.1% chance of seeing differences between mobile and desktop as large as (or larger than) those in the table.
Statistical decision at α = 0.05
Decision rule:
If p-value ≤ α → Reject H₀.
If p-value > α → Fail to reject H₀.
Here, \(p \approx 0.0010 < 0.05\), so the decision is to reject the null hypothesis of independence between device type and payment method.
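The following code reproduces the expected counts, \(\chi^2\), degrees of freedom and p-value from the observed table. It is a sketch that assumes SciPy is installed and uses `scipy.stats.chi2_contingency`. SciPy's Yates continuity correction applies only when df = 1, so it does not change this 2×3 result.

```python
from scipy.stats import chi2_contingency

# Observed counts: rows = device (Mobile, Desktop); columns = E-Wallet, Credit Card, Cash on Delivery
observed = [
    [120, 80, 50],  # Mobile
    [60, 90, 40],   # Desktop
]

# Returns the statistic, p-value, degrees of freedom and the expected-count table
chi2_stat, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2_stat:.2f}, df = {dof}, p = {p_value:.4f}")
print(expected.round(2))
```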
4.4 Interpretation
Because device type and payment method are associated, payment preferences are not uniform across devices. Implications:
- Emphasize E-Wallet options and promotions on mobile. E-Wallet accounts for 48% of mobile transactions (120/250) versus about 32% on desktop (60/190).
- Strengthen the Credit Card experience on desktop (e.g., smooth card entry, secure card saving). Credit Card accounts for about 47% of desktop transactions (90/190) versus 32% on mobile (80/250). Prominent E-Wallet options should remain available to capture multi-device users and maximize conversions.
5 Case Study 5
Type I and Type II Errors (Conceptual)
A fintech startup tests whether a new fraud detection algorithm reduces fraudulent transactions.
Statistical hypotheses:
H₀: The new algorithm does not reduce fraud.
H₁: The new algorithm reduces fraud.
Tasks:
- Explain a Type I Error (α) in this context.
- Explain a Type II Error (β) in this context.
- Identify which error is more costly from a business perspective.
- Discuss how sample size affects Type II Error.
- Explain the relationship between α, β, and statistical power.
5.1 Type I Error (α)
Occurs when you conclude the new algorithm reduces fraud, but in reality, it does not.
Also known as a “false positive.”
Business consequence: The company adopts an ineffective algorithm, wasting resources and potentially harming customer experience.
5.2 Type II Error (β)
Occurs when you fail to conclude the new algorithm is effective, but in reality, it does reduce fraud.
Also known as a “false negative.”
Business consequence: The company misses the opportunity to reduce fraud losses and continues using a less effective system.
5.3 Business Cost Comparison
Type I Error: Direct costs such as wasted resources and potential service disruptions.
Type II Error: Indirect costs such as financial losses and reputational damage due to undetected fraud.
Type II Error is typically more costly for businesses due to long-term losses.
5.4 Effect of Sample Size on Type II Error (β)
A small sample size increases the risk of Type II Error, meaning you are more likely to fail to reject the null hypothesis when it is actually false (false negative).
Larger sample sizes reduce the probability of Type II Error because they provide more data, making it easier to detect true effects and reducing variability in results.
Increasing the sample size also increases statistical power (1 - β), which is the ability of a test to correctly identify a true effect.
5.5 Relationship Between α, β, and Statistical Power
The relationship between α (alpha), β (beta), and statistical power is fundamental in hypothesis testing:
α (alpha) is the probability of making a Type I Error, which means rejecting the null hypothesis when it is actually true (false positive). It is also known as the significance level, commonly set at 0.05.
β (beta) is the probability of making a Type II Error, which means failing to reject the null hypothesis when it is actually false (false negative).
Statistical Power is the probability of correctly rejecting the null hypothesis when it is false, calculated as \(\text{Power} = 1 - \beta\).
Key Points
Increasing α (for a fixed sample size and effect size) increases the risk of Type I Error but reduces β and increases power.
Increasing the sample size reduces β and increases statistical power, allowing for better detection of true effects.
In practice, researchers aim for a balance: a typical α is 0.05 and a target power of 0.80 (meaning β = 0.20).
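The following code makes the α, β and power trade-off concrete. It uses the standard power formula for a one-sided one-sample Z-test, \(\text{Power} = \Phi(d\sqrt{n} - z_{1-\alpha})\), where \(d\) is the standardized effect in the direction of \(H_1\). This is a minimal sketch that uses the Python standard library. The effect size \(d = 0.3\) and the helper name `power_one_sided_z` are hypothetical assumptions, not values from the case study.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def power_one_sided_z(effect_size: float, n: int, alpha: float) -> float:
    """Power of a one-sided one-sample Z-test; effect_size = true mean shift / sigma, in the H1 direction."""
    z_crit = Z.inv_cdf(1 - alpha)                   # rejection threshold set by alpha
    return Z.cdf(effect_size * n ** 0.5 - z_crit)   # P(reject H0 | the true effect exists)


d = 0.3  # hypothetical standardized fraud reduction; not taken from the case study
for alpha in (0.01, 0.05, 0.10):
    for n in (30, 100):
        power = power_one_sided_z(d, n, alpha)
        print(f"alpha = {alpha:.2f}, n = {n:>3}: power = {power:.3f}, beta = {1 - power:.3f}")
```

With this assumed effect, the output shows two patterns. For a fixed α, raising \(n\) from 30 to 100 increases power (lowers β). For a fixed \(n\), raising α also increases power, at the cost of more Type I errors.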
6 Case Study 6
P-Value and Statistical Decision Making
A churn prediction model evaluation yields the following results:
- Test statistic = 2.31
- p-value = 0.021
- Significance level: α=0.05
Tasks:
- Explain the meaning of the p-value.
- Make a statistical decision.
- Translate the decision into non-technical language for management.
- Discuss the risk if the sample is not representative.
- Explain why the p-value does not measure effect size.
6.1 Meaning of the P-Value
The p-value is the probability of obtaining results as extreme as, or more extreme than, the observed results, assuming the null hypothesis is true.
A smaller p-value indicates stronger evidence against the null hypothesis.
A p-value of 0.021 means that, if there were no real effect, there would be a 2.1% chance of observing a result at least as extreme as this one.
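The case study does not say which test produced the statistic 2.31. The following code shows one plausible reading: it assumes the statistic is approximately standard normal under H₀ and that the test is two-sided. Under that assumption, the reported p-value is recovered. This is a sketch using the Python standard library.

```python
from statistics import NormalDist

test_statistic = 2.31
alpha = 0.05

# Assumption: the statistic is ~N(0, 1) under H0 and the test is two-sided
p_two_sided = 2 * (1 - NormalDist().cdf(abs(test_statistic)))
print(f"p-value = {p_two_sided:.3f}, reject H0: {p_two_sided < alpha}")
```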
6.2 Statistical Decision
If the p-value is less than α (0.05), the result is considered statistically significant.
Here, since 0.021 < 0.05, the null hypothesis is rejected.
This means the result is statistically significant at the 5% level: the churn prediction model's result is unlikely to be explained by random chance alone.
6.3 Explanation for Management
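The results indicate that our churn prediction model's performance is unlikely to be a fluke. If the model had no real ability to identify customers who are likely to leave, results this strong would occur only about 2% of the time. In practice, the model appears to capture a real churn signal and can support customer retention planning. However, this test alone does not tell us how large the improvement is or how accurate the model is for individual customers.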
6.4 Risk if Sample is Not Representative
If the sample is not representative, results may be biased and not reflect the true population.
This can lead to Type I errors (rejecting a true null hypothesis) or Type II errors (failing to reject a false null hypothesis).
Decisions based on such results may be misleading and not generalizable to the entire population.
6.5 Why P-Value Does Not Measure Effect Size
The p-value only measures the strength of evidence against the null hypothesis, not the size of the effect.
A small effect can have a small p-value with a large sample size.
A large effect can have a large p-value with a small sample size.
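The two points above can be illustrated with a one-sample t-test. If the observed effect is expressed as Cohen's d, \((\bar{x} - \mu_0)/s\), the t-statistic equals \(d\sqrt{n}\), so the p-value depends on both effect size and sample size. The following code is a sketch that assumes SciPy is installed. The helper `p_value_for_effect` and both scenarios are hypothetical, not data from the case study.

```python
from scipy import stats


def p_value_for_effect(d: float, n: int) -> float:
    """Two-sided one-sample t-test p-value when the observed Cohen's d is `d` with n observations."""
    t = d * n ** 0.5  # t = (x_bar - mu0) / (s / sqrt(n)) = d * sqrt(n)
    return 2 * stats.t.sf(abs(t), df=n - 1)


# Hypothetical scenarios for illustration only
small_effect_big_n = p_value_for_effect(d=0.1, n=2000)
large_effect_small_n = p_value_for_effect(d=0.8, n=5)
print(f"d = 0.1, n = 2000: p = {small_effect_big_n:.2e}")
print(f"d = 0.8, n = 5:    p = {large_effect_small_n:.3f}")
```

The small effect is "significant" and the large effect is not, purely because of the sample sizes.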
To assess the size of the effect, measures like Cohen’s d or