KAYLA APRILIA
Data Science Student at ITSB
NIM: 52250057
Email: kaylaaprilia2142@gmail.com
1 Case Study 1
One-Sample Z-Test (Statistical Hypotheses)
A digital learning platform claims that the average daily
study time of its users is 120 minutes.
Based on historical data, the population standard deviation is
known.
A random sample of 64 users shows an average study time of 116 minutes.
- μ0 = 120
- σ = 15
- n = 64
- x̄ = 116
Tasks:
- Formulate the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
- Identify the appropriate statistical test and justify your choice.
- Compute the test statistic and p-value using α=0.05.
- State the statistical decision.
- Interpret the result in a business analytics context.
knitr::opts_chunk$set(
echo = TRUE,
warning = FALSE,
message = FALSE
)
# =========================================================
# Load libraries
# =========================================================
library(knitr)
library(kableExtra)
library(htmltools)
library(ggplot2)
1.1 Statistical Hypotheses
The objective is to test whether the true mean study time differs from the claimed value.
Null Hypothesis \[ H_0 : \mu = 120 \]
Alternative Hypothesis \[ H_1 : \mu \neq 120 \]
This is a two-tailed hypothesis test.
1.2 Appropriate Statistical Test
The appropriate statistical test is the One-Sample Z-Test.
Justification:
- The population standard deviation (\(\sigma\)) is known
- The sample size is large (\(n \ge 30\))
- The analysis focuses on a single population mean
Therefore, under \(H_0\) the standardized sample mean follows the standard normal (Z) distribution.
1.3 Compute Statistic Formula
\[ Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]
Test Statistic Calculation
\[ Z = \frac{116 - 120}{15 / \sqrt{64}} \]
\[ Z = \frac{-4}{1.875} \]
\[ Z = -2.13 \]
P-Value Calculation
Since this is a two-tailed test, the p-value is computed as:
\[ p\text{-value} = 2 \times P(Z \le -|z|) \]
\[ p\text{-value} = 2 \times P(Z \le -2.13) \]
Using the standard normal distribution:
\[ p\text{-value} \approx 0.033 \]
1.4 Statistical Decision
Decision rule:
- Reject \(H_0\) if \(p\text{-value} < \alpha\)
Since:
\[ 0.033 < 0.05 \]
The null hypothesis is rejected.
R Computation
# Input values
mu0 <- 120
sigma <- 15
n <- 64
xbar <- 116
alpha <- 0.05
# Z-test statistic
z_value <- (xbar - mu0) / (sigma / sqrt(n))
# Two-tailed p-value
p_value <- 2 * pnorm(abs(z_value), lower.tail = FALSE)
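As a cross-check of the hand calculation, the sketch below repeats the computation, applies the decision rule, and adds a 95% confidence interval for the mean. The names `se`, `z_crit`, `ci`, and `reject_h0` are illustrative and are not part of the original chunk.

```r
# Inputs from the case description
mu0 <- 120
sigma <- 15
n <- 64
xbar <- 116
alpha <- 0.05

# Standard error of the mean and z statistic
se <- sigma / sqrt(n)
z_value <- (xbar - mu0) / se

# Two-tailed p-value: twice the lower-tail area at -|z|
p_value <- 2 * pnorm(-abs(z_value))

# Two-sided critical value and confidence interval for the mean
z_crit <- qnorm(1 - alpha / 2)
ci <- xbar + c(-1, 1) * z_crit * se

# Decision rule: reject H0 when the p-value is below alpha
reject_h0 <- p_value < alpha

print(round(c(z = z_value, p = p_value, lower = ci[1], upper = ci[2]), 3))
print(reject_h0)
```

The interval lies entirely below 120, which agrees with the decision to reject \(H_0\).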
1.5 Interpretation
At the 5% significance level, there is sufficient statistical evidence to conclude that the average daily study time differs significantly from 120 minutes.
The sample mean indicates a lower-than-expected study duration, suggesting that user engagement may not meet the platform’s expectations.
From a business analytics perspective, this result highlights the need to:
- Evaluate learning content effectiveness
- Improve platform usability
- Introduce incentives to increase user engagement
Strategic improvements may help align actual user behavior with business objectives.
2 Case Study 2
A UX Research team investigates whether the average task completion time of a new application differs from 10 minutes.
The following data are collected from 10 users:
\[9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5\]
Tasks:
- Define H₀ and H₁ (two-tailed).
- Determine the appropriate hypothesis test.
- Calculate the t-statistic and p-value at α=0.05.
- Make a statistical decision.
- Explain how sample size affects inferential reliability.
2.1 Statistical Hypotheses
The objective is to test whether the true mean task completion time is different from 10 minutes.
Null Hypothesis \[ H_0 : \mu = 10 \]
Alternative Hypothesis \[ H_1 : \mu \neq 10 \]
2.2 Appropriate Hypothesis Test
The appropriate statistical test is the One-Sample t-Test.
Justification:
- Population standard deviation (\(\sigma\)) is unknown
- Sample size is small (\(n < 30\))
- Data are assumed to be approximately normally distributed
Therefore, the Student’s t-distribution is used.
2.3 Calculate Sample Statistics
Given data:
\[ 9.2,\; 10.5,\; 9.8,\; 10.1,\; 9.6,\; 10.3,\; 9.9,\; 9.7,\; 10.0,\; 9.5 \]
Sample Mean \[ \bar{x} = \frac{\sum x_i}{n} = \frac{98.6}{10} = 9.86 \]
Sample Standard Deviation \[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} \approx 0.39 \]
Test Statistic Formula
The t-statistic is calculated as:
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
Test Statistic Calculation
\[ t = \frac{9.86 - 10}{0.39 / \sqrt{10}} \]
\[ t = \frac{-0.14}{0.123} \]
\[ t \approx -1.14 \]
Degrees of freedom:
\[ df = n - 1 = 9 \]
P-Value Calculation
Since this is a two-tailed test, the p-value is:
\[ p\text{-value} = 2 \times P(T \le -|t|) \]
\[ p\text{-value} = 2 \times P(T \le -1.14) \]
Using the t-distribution with \(df = 9\):
\[ p\text{-value} \approx 0.28 \]
2.4 Statistical Decision
Decision rule:
- Reject \(H_0\) if \(p\text{-value} < \alpha\)
Since:
\[ 0.28 > 0.05 \]
Fail to reject the null hypothesis.
R Computation
# Input data
data <- c(9.2, 10.5, 9.8, 10.1, 9.6,
10.3, 9.9, 9.7, 10.0, 9.5)
mu0 <- 10
alpha <- 0.05
# Sample statistics
xbar <- mean(data)
s <- sd(data)
n <- length(data)
# t statistic
t_value <- (xbar - mu0) / (s / sqrt(n))
# degrees of freedom
df <- n - 1
# two-tailed p-value
p_value <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)
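The manual steps can also be cross-checked with the built-in `t.test()` function from base R. This is a sketch for verification only; the variable names `task_times` and `res` are illustrative.

```r
# Task completion times (minutes) from the case description
task_times <- c(9.2, 10.5, 9.8, 10.1, 9.6,
                10.3, 9.9, 9.7, 10.0, 9.5)

# One-sample, two-sided t-test against mu0 = 10
res <- t.test(task_times, mu = 10,
              alternative = "two.sided", conf.level = 0.95)

# Extract the pieces used in the hand calculation
t_stat <- unname(res$statistic)
df_val <- unname(res$parameter)
p_val <- res$p.value
ci <- res$conf.int

print(round(c(t = t_stat, df = df_val, p = p_val), 3))
print(round(as.numeric(ci), 3))
```

With unrounded inputs the statistic comes out near -1.15 rather than the hand-rounded -1.14. The 95% confidence interval contains 10, so the decision is unchanged.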
2.5 Conclusion
At the 5% significance level, there is insufficient statistical evidence to conclude that the average task completion time differs from 10 minutes.
The observed difference between the sample mean (9.86) and the hypothesized mean (10) is consistent with sampling variability and does not by itself indicate a true performance change.
Effect of Sample Size on Inferential Reliability
- Small sample sizes lead to higher variability and wider confidence intervals
- Estimates are more sensitive to outliers and random fluctuations
- The t-distribution accounts for this uncertainty through heavier tails, which give larger critical values and more conservative results than the normal distribution
As the sample size increases, statistical estimates become more stable, precise, and reliable, improving inferential confidence.
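To illustrate the effect of sample size, the sketch below computes the standard error and the half-width of a 95% confidence interval for several sample sizes. It assumes, purely for illustration, that the sample standard deviation stays fixed at 0.39; the sample sizes other than 10 are hypothetical.

```r
# Assumption: the sample standard deviation stays at the observed 0.39
s <- 0.39
alpha <- 0.05

# n = 10 is the observed sample size; the others are hypothetical
n_values <- c(10, 30, 100, 1000)

# The standard error shrinks with the square root of n
se <- s / sqrt(n_values)

# The t critical value shrinks toward the normal value as df grows
t_crit <- qt(1 - alpha / 2, df = n_values - 1)

# Half-width of the 95% confidence interval
half_width <- t_crit * se

summary_table <- data.frame(
  n = n_values,
  se = round(se, 4),
  t_crit = round(t_crit, 3),
  half_width = round(half_width, 4)
)
print(summary_table)
```

Both the standard error and the critical value fall as \(n\) grows, so the interval narrows for two reasons at once.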
3 Case Study 3
Two-Sample T-Test (A/B Testing)
A product analytics team conducts an A/B test to compare the average session duration (in minutes) between two versions of a landing page.
| Version | Sample Size (n) | Mean | Standard Deviation |
|---|---|---|---|
| A | 25 | 4.8 | 1.2 |
| B | 25 | 5.4 | 1.4 |
Tasks:
- Formulate the null and alternative hypotheses.
- Identify the type of t-test required.
- Compute the test statistic and p-value.
- Draw a statistical conclusion at α=0.05.
- Interpret the result for product decision-making.
3.1 Statistical Hypotheses
The goal is to test whether there is a difference in average session duration between the two versions.
Null Hypothesis \[ H_0 : \mu_A = \mu_B \]
Alternative Hypothesis \[ H_1 : \mu_A \neq \mu_B \]
This is a two-tailed hypothesis test.
3.2 Appropriate Statistical Test
The appropriate test is a Two-Sample t-Test (Independent Samples).
Justification:
- Two independent groups (Version A and Version B)
- Population standard deviations are unknown
- Sample sizes are moderate and equal (\(n_A = n_B = 25\))
- Data are assumed to be approximately normally distributed
Because the sample variances are not guaranteed to be equal, the Welch Two-Sample t-Test is used.
3.3 Test Statistic Formula (Welch’s t-Test)
\[ t = \frac{\bar{x}_A - \bar{x}_B} {\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}} \]
Test Statistic Calculation
Standard Error
\[ SE = \sqrt{\frac{1.2^2}{25} + \frac{1.4^2}{25}} = \sqrt{\frac{1.44}{25} + \frac{1.96}{25}} = \sqrt{0.0576 + 0.0784} = \sqrt{0.136} \approx 0.369 \]
t-Statistic
\[ t = \frac{4.8 - 5.4}{0.369} = \frac{-0.6}{0.369} \approx -1.63 \]
Degrees of Freedom (Welch Approximation)
\[ df \approx \frac{\left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2} {\frac{(\frac{s_A^2}{n_A})^2}{n_A-1} + \frac{(\frac{s_B^2}{n_B})^2}{n_B-1}} \approx 47 \]
P-Value Calculation
Since this is a two-tailed test:
\[ p\text{-value} = 2 \times P(T \le -|t|) \]
Using the t-distribution with \(df \approx 47\):
\[ p\text{-value} \approx 0.11 \]
3.4 Statistical Decision
Decision rule:
- Reject \(H_0\) if \(p\text{-value} < \alpha\)
Since:
\[ 0.11 > 0.05 \]
Fail to reject the null hypothesis.
R Computation
# Given summary statistics
xbar_A <- 4.8
xbar_B <- 5.4
s_A <- 1.2
s_B <- 1.4
n_A <- 25
n_B <- 25
alpha <- 0.05
# Standard error
SE <- sqrt((s_A^2 / n_A) + (s_B^2 / n_B))
# t statistic
t_value <- (xbar_A - xbar_B) / SE
# Degrees of freedom (Welch)
df <- (SE^4) / (
((s_A^2 / n_A)^2 / (n_A - 1)) +
((s_B^2 / n_B)^2 / (n_B - 1))
)
# Two-tailed p-value
p_value <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)
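A confidence interval for the difference in means gives the same information in the original units. The sketch below builds a 95% Welch interval from the summary statistics; the names `v_A`, `v_B`, `se_diff`, `df_welch`, and `ci_diff` are illustrative.

```r
# Summary statistics from the A/B test table
xbar_A <- 4.8
xbar_B <- 5.4
s_A <- 1.2
s_B <- 1.4
n_A <- 25
n_B <- 25
alpha <- 0.05

# Per-group variance of the sample mean
v_A <- s_A^2 / n_A
v_B <- s_B^2 / n_B

# Standard error of the difference and Welch degrees of freedom
se_diff <- sqrt(v_A + v_B)
df_welch <- (v_A + v_B)^2 /
  (v_A^2 / (n_A - 1) + v_B^2 / (n_B - 1))

# Two-sided critical value (qt accepts non-integer df)
t_crit <- qt(1 - alpha / 2, df = df_welch)

# 95% confidence interval for mu_A - mu_B
diff_hat <- xbar_A - xbar_B
ci_diff <- diff_hat + c(-1, 1) * t_crit * se_diff

print(round(c(df = df_welch, lower = ci_diff[1], upper = ci_diff[2]), 3))
```

The interval contains 0, which matches the decision not to reject \(H_0\).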
3.5 Interpretation
Product Decision-Making Context
At the 5% significance level, there is no statistically significant difference in average session duration between Version A and Version B.
Although Version B shows a higher mean session duration (5.4 minutes vs. 4.8 minutes), the observed difference may be due to random variation rather than a true effect of the new landing page.
Product Insight
From a product decision-making perspective:
- The new landing page (Version B) does not yet provide strong statistical evidence of improved user engagement.
The team may consider:
- Running the experiment longer to increase sample size
- Combining session duration with other metrics (conversion rate, bounce rate)
- Conducting further UX improvements before a full rollout
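The suggestion to increase the sample size can be made concrete with a rough power calculation. This sketch uses `power.t.test()` from base R and rests on assumptions that are not in the original analysis: the observed difference of 0.6 minutes is treated as the true effect, the two standard deviations are pooled into a single value, and 80% power is taken as the target. Note that `power.t.test()` assumes equal variances, so it only approximates the Welch setting.

```r
# Assumption: the observed difference is the true effect size
delta <- 0.6

# Assumption: a common standard deviation, pooled from the two groups
sd_pooled <- sqrt((1.2^2 + 1.4^2) / 2)

# Required sample size per group for 80% power at alpha = 0.05
plan <- power.t.test(
  delta = delta,
  sd = sd_pooled,
  sig.level = 0.05,
  power = 0.80,
  type = "two.sample",
  alternative = "two.sided"
)

n_per_group <- ceiling(plan$n)
print(n_per_group)
```

Under these assumptions the required sample size per group is well above the current 25, which supports running the experiment longer before drawing a final conclusion.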
4 Case Study 4
Chi-Square Test of Independence
An e-commerce company examines whether device type is associated with payment method preference.
| Device / Payment | E-Wallet | Credit Card | Cash on Delivery |
|---|---|---|---|
| Mobile | 120 | 80 | 50 |
| Desktop | 60 | 90 | 40 |
Tasks:
- State the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
- Identify the appropriate statistical test.
- Compute the Chi-Square statistic (χ²).
- Determine the p-value at α=0.05.
- Interpret the results in terms of digital payment strategy.
4.1 Statistical Hypotheses
This analysis tests whether device type and payment method are independent.
Null Hypothesis \[ H_0 : \text{Device type and payment method are independent} \]
Alternative Hypothesis \[ H_1 : \text{Device type and payment method are associated} \]
4.2 Appropriate Statistical Test
The appropriate statistical test is the Chi-Square Test of Independence.
Justification:
- Both variables are categorical
- Data are summarized in a contingency table
- The objective is to test for an association between two variables
Expected Frequencies
The expected frequency for each cell is computed as:
\[ E_{ij} = \frac{(\text{Row Total})(\text{Column Total})}{\text{Grand Total}} \]
Totals
- Row totals:
  - Mobile = 250
  - Desktop = 190
- Column totals:
  - E-Wallet = 180
  - Credit Card = 170
  - Cash on Delivery = 90
- Grand total: \[ N = 440 \]
Expected Frequency Table
| Device / Payment | E-Wallet | Credit Card | Cash on Delivery |
|---|---|---|---|
| Mobile | 102.27 | 96.59 | 51.14 |
| Desktop | 77.73 | 73.41 | 38.86 |
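The expected frequencies can be reproduced directly from the row and column totals. In this sketch, `observed` and `expected` are illustrative names; `outer()` forms every row total times column total product in one step.

```r
# Observed counts from the contingency table
observed <- matrix(
  c(120, 80, 50,
     60, 90, 40),
  nrow = 2,
  byrow = TRUE,
  dimnames = list(
    Device = c("Mobile", "Desktop"),
    Payment = c("E-Wallet", "Credit Card", "Cash on Delivery")
  )
)

# E_ij = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

print(round(expected, 2))
```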
4.3 Chi-Square Statistic Formula
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
Chi-Square Statistic Calculation
Summing all cells:
\[ \chi^2 \approx 13.77 \]
Degrees of freedom:
\[ df = (r-1)(c-1) = (2-1)(3-1) = 2 \]
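The sum over cells can be written out explicitly to show where the statistic comes from. This is a manual sketch of the formula above; `contrib` holds each cell's contribution \((O - E)^2 / E\), and the variable names are illustrative.

```r
# Observed counts (rows: Mobile, Desktop)
observed <- matrix(
  c(120, 80, 50,
     60, 90, 40),
  nrow = 2,
  byrow = TRUE
)

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Per-cell contributions and their sum
contrib <- (observed - expected)^2 / expected
chi_sq <- sum(contrib)

# Degrees of freedom and upper-tail p-value
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(round(contrib, 3))
print(round(c(chi_sq = chi_sq, df = df, p_value = p_value), 4))
```

Most of the statistic comes from the E-Wallet and Credit Card columns; the Cash on Delivery cells contribute very little.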
4.4 P-Value and Statistical Decision
Using the Chi-Square distribution with \(df = 2\):
\[ p\text{-value} \approx 0.001 \]
Decision rule:
- Reject \(H_0\) if \(p\text{-value} < \alpha\)
Since:
\[ 0.001 < 0.05 \]
The null hypothesis is rejected.
R Computation
# Observed frequency table
payment_data <- matrix(
c(120, 80, 50,
60, 90, 40),
nrow = 2,
byrow = TRUE
)
rownames(payment_data) <- c("Mobile", "Desktop")
colnames(payment_data) <- c("E-Wallet", "Credit Card", "Cash on Delivery")
# Chi-Square Test of Independence
chi_result <- chisq.test(payment_data)
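The object returned by `chisq.test()` carries the statistic, degrees of freedom, p-value, expected counts, and standardized residuals. The sketch below rebuilds the table so that it runs on its own and then reads those components; standardized residuals with large absolute values (roughly above 2) point to the cells that drive the association.

```r
# Observed frequency table
payment_data <- matrix(
  c(120, 80, 50,
     60, 90, 40),
  nrow = 2,
  byrow = TRUE,
  dimnames = list(
    c("Mobile", "Desktop"),
    c("E-Wallet", "Credit Card", "Cash on Delivery")
  )
)

# Chi-Square Test of Independence
chi_result <- chisq.test(payment_data)

# Components of the result
chi_sq <- unname(chi_result$statistic)
df <- unname(chi_result$parameter)
p_value <- chi_result$p.value
std_res <- chi_result$stdres

print(round(c(chi_sq = chi_sq, df = df, p_value = p_value), 4))
print(round(chi_result$expected, 2))
print(round(std_res, 2))
```

Positive residuals for Mobile with E-Wallet and for Desktop with Credit Card mark the cells where observed counts exceed what independence would predict.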
4.5 Conclusion
Digital Payment Strategy Context
At the 5% significance level, there is strong statistical evidence of an association between device type and payment method preference.
From a digital payment strategy perspective:
- Mobile users tend to prefer E-Wallets more than expected
- Desktop users show relatively higher usage of Credit Cards
- Cash on Delivery usage is relatively similar across devices
This insight suggests that the company should:
- Optimize mobile checkout flows for E-Wallet payments
- Emphasize Credit Card options on desktop platforms
- Customize payment recommendations based on device type to improve conversion rates
5 Case Study 5
Type I and Type II Errors (Conceptual)
A fintech startup tests whether a new fraud detection algorithm reduces fraudulent transactions.
Statistical hypotheses:
\[ H_0 : \text{The new algorithm does not reduce fraud} \]
\[ H_1 : \text{The new algorithm reduces fraud} \]
Tasks:
- Explain a Type I Error (α) in this context.
- Explain a Type II Error (β) in this context.
- Identify which error is more costly from a business perspective.
- Discuss how sample size affects Type II Error.
- Explain the relationship between α, β, and statistical power.
5.1 Type I Error (α)
A Type I Error occurs when the null hypothesis is rejected even though it is true.
In this context: The company concludes that the new fraud detection algorithm reduces fraud when in reality it does not.
Business implication:
- The startup deploys the new algorithm believing it is effective
- Fraud rates remain unchanged
- Resources are wasted on implementation and maintenance
- Customer trust may be affected if fraud incidents persist
The probability of committing a Type I Error is denoted by α (significance level).
5.2 Type II Error (β)
A Type II Error occurs when the null hypothesis is not rejected even though it is false.
In this context: The company concludes that the new algorithm does not reduce fraud when in fact it does.
Business implication:
- A genuinely effective fraud detection system is not adopted
- Fraud losses continue unnecessarily
- The company misses an opportunity to reduce financial risk
- Competitive advantage may be lost
The probability of committing a Type II Error is denoted by β.
5.3 Which Error Is More Costly?
From a business perspective, a Type II Error is generally more costly in this scenario.
Reasons:
- Failing to adopt an effective fraud detection algorithm allows fraudulent transactions to continue
- This can result in significant financial losses, regulatory issues, and reputational damage
While Type I Errors cause inefficiency, Type II Errors expose the company to ongoing fraud risk.
5.4 Effect of Sample Size on Type II Error
Sample size has a direct impact on Type II Error (β).
- Small sample size:
  - Lower ability to detect real fraud reduction
  - Higher probability of Type II Error
- Large sample size:
  - Greater sensitivity to detect true effects
  - Lower probability of Type II Error
Increasing the sample size improves the reliability of the test and reduces the chance of missing a true improvement.
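The effect of sample size on \(\beta\) can be sketched numerically with `power.prop.test()` from base R. The fraud rates below (2.0% under the old algorithm, 1.5% under the new one) and the sample sizes are hypothetical values chosen only for illustration; they do not come from the case description.

```r
# Hypothetical fraud rates (assumptions, not from the case)
p_old <- 0.020
p_new <- 0.015
alpha <- 0.05

# Hypothetical numbers of transactions per group
n_per_group <- c(1000, 5000, 20000)

# Power of a one-sided two-proportion test at each sample size
power_values <- sapply(n_per_group, function(n) {
  power.prop.test(
    n = n, p1 = p_old, p2 = p_new,
    sig.level = alpha, alternative = "one.sided"
  )$power
})

# Type II error probability
beta_values <- 1 - power_values

print(data.frame(
  n_per_group = n_per_group,
  power = round(power_values, 3),
  beta = round(beta_values, 3)
))
```

With the effect held fixed, \(\beta\) falls steadily as the number of transactions per group grows.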
5.5 Relationship Between α, β, and Statistical Power
- α (Type I Error): Probability of concluding that the algorithm reduces fraud when it does not
- β (Type II Error): Probability of failing to detect real fraud reduction
- Statistical Power: \[ \text{Power} = 1 - \beta \]
Key relationships:
- For a fixed sample size and effect size, lowering α (being more conservative) increases β
- Increasing sample size allows both low α and low β
- Higher statistical power means a greater chance of correctly detecting a real reduction in fraud
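The trade-off between α and β at a fixed sample size can be sketched in the same way. The fraud rates and the sample size of 5,000 per group are hypothetical assumptions, as are the variable names.

```r
# Hypothetical setup (assumptions, not from the case)
p_old <- 0.020
p_new <- 0.015
n_fixed <- 5000

# Progressively stricter significance levels
alpha_levels <- c(0.10, 0.05, 0.01)

# Power of a one-sided two-proportion test at each alpha
power_values <- sapply(alpha_levels, function(a) {
  power.prop.test(
    n = n_fixed, p1 = p_old, p2 = p_new,
    sig.level = a, alternative = "one.sided"
  )$power
})

# beta = 1 - power
beta_values <- 1 - power_values

print(data.frame(
  alpha = alpha_levels,
  power = round(power_values, 3),
  beta = round(beta_values, 3)
))
```

Each stricter α yields a larger β, so without more data a lower false-positive risk is paid for with a higher chance of missing a real reduction.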
Business Insight:
For fraud detection systems:
- High statistical power is crucial to avoid missing effective algorithms
- Adequate sample size helps balance false alarms (Type I Error) and missed detections (Type II Error)
- Decisions should consider both statistical risk and financial impact
6 Case Study 6
P-Value and Statistical Decision Making
A churn prediction model evaluation yields the following results:
- Test statistic = 2.31
- p-value = 0.021
- Significance level: \(\alpha = 0.05\)
Tasks:
- Explain the meaning of the p-value.
- Make a statistical decision.
- Translate the decision into non-technical language for management.
- Discuss the risk if the sample is not representative.
- Explain why the p-value does not measure effect size.
6.1 Meaning of the p-value
The p-value represents the probability of obtaining a test statistic at least as extreme as the observed value, assuming the null hypothesis is true.
In this context: A p-value of 0.021 means that, if the churn model had no real predictive improvement, there would be about a 2.1% chance of observing a test statistic at least as extreme as 2.31.
A smaller p-value provides stronger evidence against the null hypothesis.
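The case does not state which reference distribution produced the p-value. As a sketch, assuming the test statistic is a z-score from a two-sided test, the reported value can be reproduced as follows; the one-sided value is shown only for comparison.

```r
# Reported test statistic and significance level
test_stat <- 2.31
alpha <- 0.05

# Assumption: standard normal reference distribution, two-sided test
p_two_sided <- 2 * pnorm(-abs(test_stat))

# One-sided (upper-tail) p-value, for comparison only
p_one_sided <- pnorm(test_stat, lower.tail = FALSE)

print(round(c(two_sided = p_two_sided, one_sided = p_one_sided), 4))
print(p_two_sided < alpha)
```

Under this assumption the two-sided p-value rounds to the reported 0.021.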
6.2 Statistical Decision
Decision rule:
- Reject \(H_0\) if \(p\text{-value} < \alpha\)
Since:
\[ 0.021 < 0.05 \]
The correct statistical decision is to reject the null hypothesis.
6.3 Non-Technical Explanation for Management
In simple business terms:
“The results suggest that the churn prediction model is performing better than what we would expect by random chance. We have enough statistical evidence to believe that the model provides real predictive value.”
This indicates that the model’s improvement is unlikely to be due to randomness alone.
6.4 Risk of a Non-Representative Sample
If the sample used to evaluate the model is not representative of the actual customer population, several risks arise:
- The model may appear effective in testing but fail in real-world deployment
- Certain customer segments may be overrepresented or underrepresented
- The observed statistical significance may not generalize to future data
As a result, business decisions based on the test may lead to incorrect expectations of churn reduction.
6.5 Why the p-value Does Not Measure Effect Size
The p-value indicates statistical significance, not practical importance.
- It does not quantify how large or meaningful the model’s improvement is
- A small p-value can result from a very large sample even if the actual improvement is minimal
- Effect size metrics (e.g., lift, AUC improvement, odds ratios) are required to assess business impact
Therefore, statistical significance should always be evaluated together with effect size and business relevance.
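A short numerical sketch shows why the p-value cannot stand in for effect size. It assumes a hypothetical one-sample z-test with a fixed, very small standardized effect of 0.02 and unit standard deviation, so that \(z = d \sqrt{n}\); none of these numbers come from the case.

```r
# Hypothetical standardized effect (assumption): tiny and held constant
effect_size <- 0.02

# Hypothetical sample sizes
n_values <- c(100, 10000, 1000000)

# For a one-sample z-test with unit SD, z = d * sqrt(n)
z_values <- effect_size * sqrt(n_values)

# Two-sided p-values
p_values <- 2 * pnorm(-abs(z_values))

print(data.frame(
  n = n_values,
  effect_size = effect_size,
  z = round(z_values, 2),
  p_value = signif(p_values, 3)
))
```

The effect is identical in every row, yet the p-value moves from clearly non-significant to extremely small purely because \(n\) grows.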
Business Insight
For churn modeling:
- A statistically significant result indicates that the model’s improvement is unlikely to be due to chance alone, not that the model is reliable in deployment
- Effect size determines whether the improvement is worth acting on
- Representative sampling is essential for trustworthy deployment decisions