Study Cases

Statistical Inferences - Week 14

December 27, 2025

1 Case Study 1

One-Sample Z-Test (Statistical Hypotheses)

A digital learning platform claims that the average daily study time of its users is 120 minutes.
Based on historical data, the population standard deviation is known.

A random sample of 64 users shows an average study time of 116 minutes.

μ0 = 120
σ = 15
n = 64
x̄ = 116

Tasks:

Formulate the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
Identify the appropriate statistical test and justify your choice.
Compute the test statistic and p-value using α=0.05.
State the statistical decision.
Interpret the result in a business analytics context.

knitr::opts_chunk$set(
  echo = TRUE,
  warning = FALSE,
  message = FALSE
)

# =========================================================
# Load libraries
# =========================================================
library(knitr)
library(kableExtra)
library(htmltools)
library(ggplot2)

1.1 Statistical Hypotheses

The objective is to test whether the true mean study time differs from the claimed value.

Null Hypothesis \[ H_0 : \mu = 120 \]

Alternative Hypothesis \[ H_1 : \mu \neq 120 \]

This is a two-tailed hypothesis test.

1.2 Appropriate Statistical Test

The appropriate statistical test is the One-Sample Z-Test.

Justification:

The population standard deviation (\(\sigma\)) is known
The sample size is large (\(n \ge 30\))
The analysis focuses on a single population mean

Therefore, under \(H_0\), the standardized sample mean follows the standard normal (Z) distribution.

1.3 Compute Statistic Formula

\[ Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

Test Statistic Calculation

\[ Z = \frac{116 - 120}{15 / \sqrt{64}} \]

\[ Z = \frac{-4}{1.875} \]

\[ Z = -2.13 \]

P-Value Calculation

Since this is a two-tailed test, the p-value is computed as:

\[ p\text{-value} = 2 \times P(Z \ge |z|) \]

\[ p\text{-value} = 2 \times P(Z \ge 2.13) = 2 \times P(Z \le -2.13) \]

Using the standard normal distribution:

\[ p\text{-value} \approx 0.033 \]

1.4 Statistical Decision

Decision rule:

Reject \(H_0\) if \(p\text{-value} < \alpha\)

Since:

\[ 0.033 < 0.05 \]

The null hypothesis is rejected.

R Computation

# Input values
mu0 <- 120
sigma <- 15
n <- 64
xbar <- 116
alpha <- 0.05

# Z-test statistic
z_value <- (xbar - mu0) / (sigma / sqrt(n))

# Two-tailed p-value
p_value <- 2 * pnorm(abs(z_value), lower.tail = FALSE)
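
As a cross-check on the hand calculation, the following minimal sketch wraps the same formula in a small helper and prints the statistic, the p-value, and the decision. The function name `z_test_one_sample` and the confidence interval it reports are illustrative additions, not part of the original analysis.

```r
# Hypothetical helper: one-sample, two-tailed z-test from summary statistics
z_test_one_sample <- function(xbar, mu0, sigma, n, alpha = 0.05) {
  se <- sigma / sqrt(n)                        # standard error of the mean
  z <- (xbar - mu0) / se                       # test statistic
  p <- 2 * pnorm(abs(z), lower.tail = FALSE)   # two-tailed p-value
  z_crit <- qnorm(1 - alpha / 2)               # two-tailed critical value
  ci <- xbar + c(-1, 1) * z_crit * se          # (1 - alpha) confidence interval
  list(z = z, p_value = p, ci = ci, reject_h0 = p < alpha)
}

res <- z_test_one_sample(xbar = 116, mu0 = 120, sigma = 15, n = 64)
print(round(res$z, 2))        # test statistic
print(round(res$p_value, 3))  # two-tailed p-value
print(round(res$ci, 2))       # interval for the mean study time
print(res$reject_h0)          # TRUE means H0 is rejected at alpha
```

The confidence interval lies entirely below 120 minutes, which agrees with rejecting \(H_0\) at the 5% level.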

1.5 Interpretation

At the 5% significance level, there is sufficient statistical evidence to conclude that the average daily study time differs significantly from 120 minutes.

The sample mean indicates a lower-than-expected study duration, suggesting that user engagement may not meet the platform’s expectations.

From a business analytics perspective, this result highlights the need to:

Evaluate learning content effectiveness
Improve platform usability
Introduce incentives to increase user engagement

Strategic improvements may help align actual user behavior with business objectives.

2 Case Study 2

A UX Research team investigates whether the average task completion time of a new application differs from 10 minutes.

The following data are collected from 10 users:

\[9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5\]

Tasks:

Define H₀ and H₁ (two-tailed).
Determine the appropriate hypothesis test.
Calculate the t-statistic and p-value at α=0.05.
Make a statistical decision.
Explain how sample size affects inferential reliability.

2.1 Statistical Hypotheses

The objective is to test whether the true mean task completion time is different from 10 minutes.

Null Hypothesis \[ H_0 : \mu = 10 \]

Alternative Hypothesis \[ H_1 : \mu \neq 10 \]

2.2 Appropriate Hypothesis Test

The appropriate statistical test is the One-Sample t-Test.

Justification:

Population standard deviation (\(\sigma\)) is unknown
Sample size is small (\(n < 30\))
Data are assumed to be approximately normally distributed

Therefore, the Student’s t-distribution is used.

2.3 Calculate Sample Statistics

Given data:

\[ 9.2,\; 10.5,\; 9.8,\; 10.1,\; 9.6,\; 10.3,\; 9.9,\; 9.7,\; 10.0,\; 9.5 \]

Sample Mean \[ \bar{x} = \frac{\sum x_i}{n} = \frac{98.6}{10} = 9.86 \]

Sample Standard Deviation \[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} \approx 0.39 \]

Test Statistic Formula

The t-statistic is calculated as:

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

Test Statistic Calculation

\[ t = \frac{9.86 - 10}{0.39 / \sqrt{10}} \]

\[ t = \frac{-0.14}{0.123} \]

\[ t \approx -1.14 \]

Degrees of freedom:

\[ df = n - 1 = 9 \]

P-Value Calculation

Since this is a two-tailed test, the p-value is:

\[ p\text{-value} = 2 \times P(T \ge |t|) \]

\[ p\text{-value} = 2 \times P(T \ge 1.14) = 2 \times P(T \le -1.14) \]

Using the t-distribution with \(df = 9\):

\[ p\text{-value} \approx 0.28 \]

2.4 Statistical Decision

Decision rule:

Reject \(H_0\) if \(p\text{-value} < \alpha\)

Since:

\[ 0.28 > 0.05 \]

Fail to reject the null hypothesis.

R Computation

# Input data
data <- c(9.2, 10.5, 9.8, 10.1, 9.6, 
          10.3, 9.9, 9.7, 10.0, 9.5)

mu0 <- 10
alpha <- 0.05

# Sample statistics
xbar <- mean(data)
s <- sd(data)
n <- length(data)

# t statistic
t_value <- (xbar - mu0) / (s / sqrt(n))

# degrees of freedom
df <- n - 1

# two-tailed p-value
p_value <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)
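
The manual computation can be cross-checked against R's built-in `t.test()`. This is a minimal sketch; the vector name `times` is chosen here for illustration.

```r
# Task completion times (minutes) from the case study
times <- c(9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5)

# Built-in one-sample t-test against mu0 = 10 (two-sided)
tt <- t.test(times, mu = 10, alternative = "two.sided", conf.level = 0.95)

print(unname(tt$statistic))  # t statistic
print(unname(tt$parameter))  # degrees of freedom
print(tt$p.value)            # two-tailed p-value
print(tt$conf.int)           # 95% confidence interval for the mean
```

Small differences from the hand calculation come from rounding \(s\) to 0.39 in the intermediate steps. The 95% confidence interval contains 10, which matches the decision not to reject \(H_0\).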

2.5 Conclusion

At the 5% significance level, there is insufficient statistical evidence to conclude that the average task completion time differs from 10 minutes.

The observed difference between the sample mean (9.86 minutes) and the hypothesized mean (10 minutes) is consistent with sampling variability, so it cannot be attributed to a true performance change.

Effect of Sample Size on Inferential Reliability

Small sample sizes lead to higher variability and wider confidence intervals
Estimates are more sensitive to outliers and random fluctuations
The t-distribution accounts for this uncertainty through heavier tails, which yield larger critical values and more conservative conclusions than the normal distribution

As the sample size increases, statistical estimates become more stable, precise, and reliable, improving inferential confidence.
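
To illustrate the point, the sketch below computes the half-width of a 95% t-based confidence interval for several sample sizes. It assumes, purely for illustration, that the sample standard deviation stays near the observed 0.39 minutes; the sample sizes other than 10 are hypothetical.

```r
# Assumption: the sample standard deviation stays at roughly s = 0.39 minutes
s <- 0.39
n_values <- c(10, 30, 100, 1000)   # n = 10 is the case study; others are hypothetical

# Half-width of a 95% t-based confidence interval for the mean at each n
half_width <- qt(0.975, df = n_values - 1) * s / sqrt(n_values)

ci_table <- data.frame(n = n_values, half_width = round(half_width, 3))
print(ci_table)
```

The half-width shrinks as n grows, both because the standard error falls with \(\sqrt{n}\) and because the t critical value approaches the normal one.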

3 Case Study 3

Two-Sample T-Test (A/B Testing)

A product analytics team conducts an A/B test to compare the average session duration (in minutes) between two versions of a landing page.

Version	Sample Size (n)	Mean (minutes)	Standard Deviation (minutes)
A	25	4.8	1.2
B	25	5.4	1.4

Tasks:

Formulate the null and alternative hypotheses.
Identify the type of t-test required.
Compute the test statistic and p-value.
Draw a statistical conclusion at α=0.05.
Interpret the result for product decision-making.

3.1 Statistical Hypotheses

The goal is to test whether there is a difference in average session duration between the two versions.

Null Hypothesis \[ H_0 : \mu_A = \mu_B \]

Alternative Hypothesis \[ H_1 : \mu_A \neq \mu_B \]

This is a two-tailed hypothesis test.

3.2 Appropriate Statistical Test

The appropriate test is a Two-Sample t-Test (Independent Samples).

Justification:

Two independent groups (Version A and Version B)
Population standard deviations are unknown
Sample sizes are moderate and equal (n = 25 per version)
Data are assumed to be approximately normally distributed

Because the population variances are not guaranteed to be equal, the Welch Two-Sample t-Test is used.

3.3 Test Statistic Formula (Welch’s t-Test)

\[ t = \frac{\bar{x}_A - \bar{x}_B} {\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}} \]

Test Statistic Calculation

Standard Error

\[ SE = \sqrt{\frac{1.2^2}{25} + \frac{1.4^2}{25}} = \sqrt{\frac{1.44}{25} + \frac{1.96}{25}} = \sqrt{0.0576 + 0.0784} = \sqrt{0.136} \approx 0.369 \]

t-Statistic

\[ t = \frac{4.8 - 5.4}{0.369} = \frac{-0.6}{0.369} \approx -1.63 \]

Degrees of Freedom (Welch Approximation)

\[ df \approx \frac{\left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2} {\frac{(\frac{s_A^2}{n_A})^2}{n_A-1} + \frac{(\frac{s_B^2}{n_B})^2}{n_B-1}} \approx 47 \]

P-Value Calculation

Since this is a two-tailed test:

\[ p\text{-value} = 2 \times P(T \ge |t|) \]

Using the t-distribution with \(df \approx 47\):

\[ p\text{-value} \approx 0.11 \]

3.4 Statistical Decision

Decision rule:

Reject \(H_0\) if \(p\text{-value} < \alpha\)

Since:

\[ 0.11 > 0.05 \]

Fail to reject the null hypothesis.

R Computation

# Given summary statistics
xbar_A <- 4.8
xbar_B <- 5.4
s_A <- 1.2
s_B <- 1.4
n_A <- 25
n_B <- 25
alpha <- 0.05

# Standard error
SE <- sqrt((s_A^2 / n_A) + (s_B^2 / n_B))

# t statistic
t_value <- (xbar_A - xbar_B) / SE

# Degrees of freedom (Welch)
df <- (SE^4) / (
  ((s_A^2 / n_A)^2 / (n_A - 1)) +
  ((s_B^2 / n_B)^2 / (n_B - 1))
)

# Two-tailed p-value
p_value <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)
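
Because only summary statistics are available, `t.test()` cannot be applied directly. The following sketch packages the Welch calculation in a reusable helper and adds a confidence interval for the difference in means. The function name `welch_from_summary` and the interval are illustrative additions.

```r
# Hypothetical helper: Welch two-sample t-test from summary statistics
welch_from_summary <- function(m1, s1, n1, m2, s2, n2, alpha = 0.05) {
  v1 <- s1^2 / n1                              # variance of mean, group 1
  v2 <- s2^2 / n2                              # variance of mean, group 2
  se <- sqrt(v1 + v2)                          # standard error of the difference
  t_stat <- (m1 - m2) / se
  # Welch-Satterthwaite degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
  ci <- (m1 - m2) + c(-1, 1) * qt(1 - alpha / 2, df = df) * se
  list(t = t_stat, df = df, p_value = p, ci = ci, reject_h0 = p < alpha)
}

res <- welch_from_summary(m1 = 4.8, s1 = 1.2, n1 = 25,
                          m2 = 5.4, s2 = 1.4, n2 = 25)
print(round(res$t, 2))        # t statistic (A minus B)
print(round(res$df, 1))       # Welch degrees of freedom
print(round(res$p_value, 3))  # two-tailed p-value
print(round(res$ci, 2))       # 95% CI for mu_A - mu_B
```

The interval for \(\mu_A - \mu_B\) contains 0, which is consistent with failing to reject \(H_0\).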

3.5 Interpretation

Product Decision-Making Context

At the 5% significance level, there is no statistically significant difference in average session duration between Version A and Version B.

Although Version B shows a higher mean session duration (5.4 minutes vs. 4.8 minutes), the observed difference may be due to random variation rather than a true effect of the new landing page.

Product Insight

From a product decision-making perspective:

The new landing page (Version B) does not yet provide strong statistical evidence of improved user engagement.

The team may consider:

Running the experiment longer to increase sample size
Combining session duration with other metrics (conversion rate, bounce rate)
Conducting further UX improvements before a full rollout
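
The suggestion to increase the sample size can be made concrete with a rough power calculation. The sketch below uses `power.t.test()` from base R's `stats` package, which assumes a pooled (equal-variance) t-test rather than Welch's test, so it is only an approximation. The true difference of 0.6 minutes, the common standard deviation of 1.3 minutes, and the 80% power target are assumptions chosen for illustration.

```r
# Assumptions (illustrative): true difference of 0.6 minutes, common SD of
# about 1.3 minutes (between the observed 1.2 and 1.4), 80% target power
plan <- power.t.test(delta = 0.6, sd = 1.3, sig.level = 0.05, power = 0.80,
                     type = "two.sample", alternative = "two.sided")
n_per_group <- ceiling(plan$n)
print(n_per_group)   # users needed per version under these assumptions

# Approximate power of the current design with 25 users per version
current <- power.t.test(n = 25, delta = 0.6, sd = 1.3, sig.level = 0.05,
                        type = "two.sample", alternative = "two.sided")
print(round(current$power, 2))
```

Under these assumptions the current design is underpowered, so a non-significant result does not rule out a real difference of this size.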

4 Case Study 4

Chi-Square Test of Independence

An e-commerce company examines whether device type is associated with payment method preference.

Device / Payment	E-Wallet	Credit Card	Cash on Delivery
Mobile	120	80	50
Desktop	60	90	40

Tasks:

State the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
Identify the appropriate statistical test.
Compute the Chi-Square statistic (χ²).
Determine the p-value at α=0.05.
Interpret the results in terms of digital payment strategy.

4.1 Statistical Hypotheses

This analysis tests whether device type and payment method are independent.

Null Hypothesis \[ H_0 : \text{Device type and payment method are independent} \]

Alternative Hypothesis \[ H_1 : \text{Device type and payment method are associated} \]

4.2 Appropriate Statistical Test

The appropriate statistical test is the Chi-Square Test of Independence.

Justification:

Both variables are categorical
Data are summarized in a contingency table
The objective is to test for an association between two variables

Expected Frequencies

The expected frequency for each cell is computed as:

\[ E_{ij} = \frac{(\text{Row Total})(\text{Column Total})}{\text{Grand Total}} \]

Totals

Row totals:
- Mobile = 250
- Desktop = 190
Column totals:
- E-Wallet = 180
- Credit Card = 170
- Cash on Delivery = 90
Grand total: \[ N = 440 \]

Expected Frequency Table

Device / Payment	E-Wallet	Credit Card	Cash on Delivery
Mobile	102.27	96.59	51.14
Desktop	77.73	73.41	38.86
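
The expected frequency table can be reproduced from the row and column totals. This is a minimal sketch of the formula above; the object names `observed` and `expected` are chosen here for illustration.

```r
# Observed counts from the case study
observed <- matrix(c(120, 80, 50,
                     60, 90, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Device = c("Mobile", "Desktop"),
                                   Payment = c("E-Wallet", "Credit Card",
                                               "Cash on Delivery")))

# E_ij = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 2))
```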

4.3 Chi-Square Statistic Formula

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

Chi-Square Statistic Calculation

Summing all cells:

\[ \chi^2 \approx 13.77 \]

Degrees of freedom:

\[ df = (r-1)(c-1) = (2-1)(3-1) = 2 \]
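
The summation can be spelled out cell by cell. The sketch below computes each cell's contribution to \(\chi^2\), their total, and the corresponding upper-tail probability; the variable names are illustrative.

```r
# Observed counts (rows: Mobile, Desktop; columns: E-Wallet, Credit Card, COD)
observed <- matrix(c(120, 80, 50,
                     60, 90, 40),
                   nrow = 2, byrow = TRUE)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Contribution of each cell: (O - E)^2 / E
contrib <- (observed - expected)^2 / expected
print(round(contrib, 3))

chi_sq <- sum(contrib)                                   # test statistic
df <- (nrow(observed) - 1) * (ncol(observed) - 1)        # degrees of freedom
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)   # upper-tail p-value

print(round(chi_sq, 2))
print(signif(p_value, 2))
```

The E-Wallet and Credit Card cells account for almost all of the statistic, while the Cash on Delivery cells contribute very little.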

4.4 P-Value and Statistical Decision

Using the Chi-Square distribution with \(df = 2\):

\[ p\text{-value} \approx 0.001 \]

Decision rule:

Reject \(H_0\) if \(p\text{-value} < \alpha\)

Since:

\[ 0.001 < 0.05 \]

The null hypothesis is rejected.

R Computation

# Observed frequency table
payment_data <- matrix(
  c(120, 80, 50,
    60, 90, 40),
  nrow = 2,
  byrow = TRUE
)

rownames(payment_data) <- c("Mobile", "Desktop")
colnames(payment_data) <- c("E-Wallet", "Credit Card", "Cash on Delivery")

# Chi-Square Test of Independence
chi_result <- chisq.test(payment_data)

4.5 Conclusion

Digital Payment Strategy Context

At the 5% significance level, there is strong statistical evidence of an association between device type and payment method preference.

From a digital payment strategy perspective:

Mobile users tend to prefer E-Wallets more than expected
Desktop users show relatively higher usage of Credit Cards
Cash on Delivery usage is relatively similar across devices
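
These three observations can be checked with standardized residuals, which `chisq.test()` returns in its `stdres` component, and with row-wise payment shares. As a rough rule of thumb, residuals beyond about ±2 mark cells that depart noticeably from independence. This is a sketch for illustration, not part of the original analysis.

```r
payment_data <- matrix(c(120, 80, 50,
                         60, 90, 40),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(c("Mobile", "Desktop"),
                                       c("E-Wallet", "Credit Card",
                                         "Cash on Delivery")))

chi_result <- chisq.test(payment_data)

# Standardized residuals: positive means more observations than expected
std_res <- chi_result$stdres
print(round(std_res, 2))

# Share of each payment method within each device type
share <- prop.table(payment_data, margin = 1)
print(round(share, 3))
```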

This insight suggests that the company should:

Optimize mobile checkout flows for E-Wallet payments
Emphasize Credit Card options on desktop platforms
Customize payment recommendations based on device type to improve conversion rates

5 Case Study 5

Type I and Type II Errors (Conceptual)

A fintech startup tests whether a new fraud detection algorithm reduces fraudulent transactions.

Statistical hypotheses:

\[ H_0 : \text{The new algorithm does not reduce fraud} \]

\[ H_1 : \text{The new algorithm reduces fraud} \]

Tasks:

Explain a Type I Error (α) in this context.
Explain a Type II Error (β) in this context.
Identify which error is more costly from a business perspective.
Discuss how sample size affects Type II Error.
Explain the relationship between α, β, and statistical power.

5.1 Type I Error (α)

A Type I Error occurs when the null hypothesis is rejected even though it is true.

In this context: The company concludes that the new fraud detection algorithm reduces fraud,
when in reality it does not.

Business implication:

The startup deploys the new algorithm believing it is effective
Fraud rates remain unchanged
Resources are wasted on implementation and maintenance
Customer trust may be affected if fraud incidents persist

The probability of committing a Type I Error is denoted by α (significance level).

5.2 Type II Error (β)

A Type II Error occurs when the null hypothesis is not rejected even though it is false.

In this context: The company concludes that the new algorithm does not reduce fraud,
when in fact it does.

Business implication:

A genuinely effective fraud detection system is not adopted
Fraud losses continue unnecessarily
The company misses an opportunity to reduce financial risk
Competitive advantage may be lost

The probability of committing a Type II Error is denoted by β.

5.3 Which Error Is More Costly?

From a business perspective, a Type II Error is generally more costly in this scenario.

Reasons:

Failing to adopt an effective fraud detection algorithm allows fraudulent transactions to continue
This can result in significant financial losses, regulatory issues, and reputational damage

While Type I Errors cause inefficiency, Type II Errors expose the company to ongoing fraud risk.

5.4 Effect of Sample Size on Type II Error

Sample size has a direct impact on Type II Error (β).

Small sample size:
- Lower ability to detect real fraud reduction
- Higher probability of Type II Error
Large sample size:
- Greater sensitivity to detect true effects
- Lower probability of Type II Error

Increasing the sample size improves the reliability of the test and reduces the chance of missing a true improvement.
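
A short sketch can show how β falls as the sample grows. It uses `power.prop.test()` from base R's `stats` package with hypothetical fraud rates (2.0% under the current algorithm, 1.5% under the new one); these rates and the sample sizes are assumptions for illustration only and do not come from the case.

```r
# Hypothetical fraud rates (assumed for illustration)
p_old <- 0.020   # current algorithm
p_new <- 0.015   # new algorithm

# Transactions per group (hypothetical)
n_values <- c(1000, 5000, 20000, 50000)

# Power of a one-sided two-proportion test at each sample size
power_at_n <- sapply(n_values, function(n) {
  power.prop.test(n = n, p1 = p_old, p2 = p_new, sig.level = 0.05,
                  alternative = "one.sided")$power
})
beta_at_n <- 1 - power_at_n   # Type II error probability

print(data.frame(n_per_group = n_values,
                 power = round(power_at_n, 3),
                 beta = round(beta_at_n, 3)))
```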

5.5 Relationship Between α, β, and Statistical Power

α (Type I Error): Probability of falsely detecting fraud reduction
β (Type II Error): Probability of failing to detect real fraud reduction
Statistical Power: \[ \text{Power} = 1 - \beta \]

Key relationships:

Lowering α (being more conservative) usually increases β
Increasing sample size allows both low α and low β
Higher statistical power means a greater chance of correctly detecting a real reduction in fraud
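
The trade-off can be sketched numerically with the closed-form power of a one-sided z-test, \( \text{Power} = P(Z > z_{\alpha} - \delta\sqrt{n}) \), where \(\delta\) is the true effect in standard deviation units. The effect size of 0.2 and the sample sizes below are hypothetical values chosen for illustration.

```r
# Hypothetical standardized effect: true reduction of 0.2 standard deviations
effect <- 0.2

# Power of a one-sided z-test: P(Z > z_alpha - effect * sqrt(n))
power_z <- function(n, alpha) {
  pnorm(qnorm(1 - alpha) - effect * sqrt(n), lower.tail = FALSE)
}

# Rows: (n=50, a=0.05), (n=200, a=0.05), (n=50, a=0.01), (n=200, a=0.01)
grid <- expand.grid(n = c(50, 200), alpha = c(0.05, 0.01))
grid$power <- mapply(power_z, grid$n, grid$alpha)
grid$beta <- 1 - grid$power
print(round(grid, 3))
```

At a fixed sample size, moving from α = 0.05 to α = 0.01 raises β; at a fixed α, moving from n = 50 to n = 200 lowers it.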

Business Insight:

For fraud detection systems:

High statistical power is crucial to avoid missing effective algorithms
Adequate sample size helps balance false alarms (Type I Error) and missed detections (Type II Error)
Decisions should consider both statistical risk and financial impact

6 Case Study 6

P-Value and Statistical Decision Making

A churn prediction model evaluation yields the following results:

Test statistic = 2.31
p-value = 0.021
Significance level: \(\alpha = 0.05\)

Tasks:

Explain the meaning of the p-value.
Make a statistical decision.
Translate the decision into non-technical language for management.
Discuss the risk if the sample is not representative.
Explain why the p-value does not measure effect size.

6.1 Meaning of the p-value

The p-value represents the probability of obtaining a test statistic at least as extreme as the observed value, assuming the null hypothesis is true.

In this context: A p-value of 0.021 means that, if the churn model had no real predictive improvement, there would be only a 2.1% chance of observing a result at least as extreme as this one. It is not the probability that the null hypothesis is true.

A smaller p-value provides stronger evidence against the null hypothesis.
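
The case does not state which distribution the test statistic follows. Assuming, for illustration only, that 2.31 is a standard normal z-score from a two-sided test, the sketch below recovers the reported p-value analytically and then approximates the same tail probability by simulating test statistics under the null hypothesis.

```r
# Assumption: the reported statistic is a z-score from a two-sided test
test_stat <- 2.31

# Analytic two-sided p-value
p_two_sided <- 2 * pnorm(abs(test_stat), lower.tail = FALSE)
print(round(p_two_sided, 3))

# Simulation: share of null z-scores at least as extreme as the observed one
set.seed(42)
z_null <- rnorm(100000)
sim_share <- mean(abs(z_null) >= test_stat)
print(round(sim_share, 3))
```

Under this assumption the analytic value agrees with the reported 0.021, and the simulated share is close to it, which is the "at least as extreme, assuming \(H_0\) is true" definition in action.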

6.2 Statistical Decision

Decision rule:

Reject \(H_0\) if \(p\text{-value} < \alpha\)

Since:

\[ 0.021 < 0.05 \]

The correct statistical decision is to reject the null hypothesis.

6.3 Non-Technical Explanation for Management

In simple business terms:

“The results suggest that the churn prediction model is performing better than what we would expect by random chance. We have enough statistical evidence to believe that the model provides real predictive value.”

This indicates that the model’s improvement is unlikely to be due to randomness alone.

6.4 Risk of a Non-Representative Sample

If the sample used to evaluate the model is not representative of the actual customer population, several risks arise:

The model may appear effective in testing but fail in real-world deployment
Certain customer segments may be overrepresented or underrepresented
The observed statistical significance may not generalize to future data

As a result, business decisions based on the test may lead to incorrect expectations of churn reduction.

6.5 Why the p-value Does Not Measure Effect Size

The p-value indicates statistical significance, not practical importance.

It does not quantify how large or meaningful the model’s improvement is
A small p-value can result from a very large sample even if the actual improvement is minimal
Effect size metrics (e.g., lift, AUC improvement, odds ratios) are required to assess business impact

Therefore, statistical significance should always be evaluated together with effect size and business relevance.
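
The dependence of the p-value on sample size can be sketched with a fixed, very small effect. The example assumes a one-sample z-test in which the observed mean sits exactly 0.02 standard deviations from the null value; the effect size and sample sizes are hypothetical.

```r
# Hypothetical tiny effect: 0.02 standard deviations, held fixed
effect <- 0.02
n_values <- c(100, 10000, 1000000)

# z grows with sqrt(n) even though the effect itself does not change
z <- effect * sqrt(n_values)
p_values <- 2 * pnorm(abs(z), lower.tail = FALSE)

print(data.frame(n = n_values, effect = effect,
                 z = round(z, 2), p_value = signif(p_values, 3)))
```

The same negligible effect is non-significant at n = 100 and significant at the larger sample sizes, which is why the p-value alone says nothing about practical importance.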

Business Insight

For churn modeling:

A statistically significant result indicates the observed improvement is unlikely to be due to chance alone
Effect size determines whether the improvement is worth acting on
Representative sampling is essential for trustworthy deployment decisions

KAYLA APRILIA
