Statistical Inference

TASKS WEEK 14

1 Case Study 1: One-Sample Z-Test

1.1 Problem Statement

Study Context

A digital learning platform claims that the average daily study time of its users is 120 minutes. Based on historical records, the population standard deviation is known to be 15 minutes.

A random sample of 64 users shows an average study time of 116 minutes.

Given:

  • μ₀ = 120 minutes (claimed population mean)
  • σ = 15 minutes (population standard deviation)
  • n = 64 users (sample size)
  • x̄ = 116 minutes (sample mean)
  • α = 0.05 (significance level)

1.2 Hypotheses

Formulate the Null Hypothesis (H₀) and Alternative Hypothesis (H₁)

H₀ (Null Hypothesis): μ = 120 minutes

The true average daily study time is 120 minutes as claimed.

H₁ (Alternative Hypothesis): μ ≠ 120 minutes

The true average daily study time differs from 120 minutes.

Type of Test: Two-tailed test

Significance Level: α = 0.05

1.3 Appropriate Statistical Test

Why One-Sample Z-Test?

We use the One-Sample Z-Test for the following reasons:

  1. Population standard deviation (σ) is KNOWN: σ = 15 minutes (from historical records)
  2. Large sample size: n = 64 ≥ 30, so Central Limit Theorem applies
  3. Testing a single population mean: We’re comparing sample mean to a claimed population value
  4. Quantitative data: Study time is a continuous numerical variable

Decision Rule: When σ is known and n ≥ 30, use Z-test instead of t-test.

1.4 Compute Test Statistic and P-Value

# Given data
mu_0 <- 120
sigma <- 15
n <- 64
x_bar <- 116
alpha <- 0.05

# Calculate Standard Error
se <- sigma / sqrt(n)

# Calculate Z-statistic
z_stat <- (x_bar - mu_0) / se

# Calculate p-value (two-tailed)
p_value <- 2 * pnorm(abs(z_stat), lower.tail = FALSE)

# Critical values
z_critical <- qnorm(1 - alpha/2)

# Results
cat("Standard Error (SE):", round(se, 4), "\n")
Standard Error (SE): 1.875 
cat("Z-Statistic:", round(z_stat, 4), "\n")
Z-Statistic: -2.1333 
cat("P-Value:", round(p_value, 4), "\n")
P-Value: 0.0329 
cat("Critical Z-values: ±", round(z_critical, 4), "\n")
Critical Z-values: ± 1.96 

Z-Test Formula:

Z = (x̄ - μ₀) / (σ / √n)

Calculation:
SE = σ / √n = 15 / √64 = 15 / 8 = 1.875

Z = (116 - 120) / 1.875
Z = -4 / 1.875
Z = -2.1333

P-value = 2 × P(Z > |-2.1333|) = 0.0329
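
As a complementary check (not part of the original output), a 95% confidence interval for the mean can be built from the quantities already computed above; because the interval excludes μ₀ = 120, it leads to the same decision as the two-tailed test. A minimal sketch, reusing x_bar, se, and z_critical from the chunk above:

# Hedged sketch: 95% confidence interval for the population mean
ci_lower <- x_bar - z_critical * se   # 116 - 1.96 * 1.875 = 112.325
ci_upper <- x_bar + z_critical * se   # 116 + 1.96 * 1.875 = 119.675
cat("95% CI for mu: [", round(ci_lower, 3), ",", round(ci_upper, 3), "]\n")
# The claimed mean of 120 lies outside the interval, consistent with rejecting H0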

results_df <- data.frame(
  Statistic = c("Standard Error (SE)", "Z-Statistic", "P-Value", 
                "Critical Z (±)", "Decision"),
  Value = c(round(se, 4), round(z_stat, 4), round(p_value, 4),
            round(z_critical, 4), 
            ifelse(abs(z_stat) > z_critical, "Reject H₀", "Fail to Reject H₀"))
)

kable(results_df, caption = "Test Results Summary", align = 'lr') %>%
  kable_styling(bootstrap_options = c("striped", "hover"), 
                full_width = FALSE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#8B1538") %>%
  row_spec(2, bold = TRUE, background = "#FFE4E6") %>%
  row_spec(3, bold = TRUE, background = "#E0F2FE")
Test Results Summary
Statistic Value
Standard Error (SE) 1.875
Z-Statistic -2.1333
P-Value 0.0329
Critical Z (±) 1.96
Decision Reject H₀

1.5 Visualization

x <- seq(-4, 4, length.out = 1000)
y <- dnorm(x)
df_plot <- data.frame(x = x, y = y)

ggplot(df_plot, aes(x = x, y = y)) +
  geom_line(color = "#0EA5E9", size = 1.5) +
  geom_area(data = df_plot %>% filter(x < -z_critical), 
            aes(x = x, y = y), fill = "#F43F5E", alpha = 0.5) +
  geom_area(data = df_plot %>% filter(x > z_critical), 
            aes(x = x, y = y), fill = "#F43F5E", alpha = 0.5) +
  geom_vline(xintercept = z_stat, color = "#8B1538", 
             linetype = "dashed", size = 1.5) +
  geom_vline(xintercept = c(-z_critical, z_critical), 
             color = "#F97316", linetype = "dotted", size = 1.2) +
  annotate("text", x = z_stat, y = max(y) * 0.8, 
           label = paste0("Z = ", round(z_stat, 3)), 
           color = "#8B1538", size = 5, fontface = "bold", hjust = 1.2) +
  annotate("text", x = -2.5, y = 0.05, 
           label = "Rejection\nRegion", 
           color = "#DC2626", size = 4, fontface = "bold") +
  annotate("text", x = 2.5, y = 0.05, 
           label = "Rejection\nRegion", 
           color = "#DC2626", size = 4, fontface = "bold") +
  labs(title = "One-Sample Z-Test: Standard Normal Distribution",
       subtitle = paste0("H₀: μ = ", mu_0, " | α = ", alpha, " | p-value = ", round(p_value, 4)),
       x = "Z-Score", y = "Density") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", color = "#8B1538", hjust = 0.5),
    plot.subtitle = element_text(size = 12, color = "#475569", hjust = 0.5),
    panel.background = element_rect(fill = "#F0F9FF", color = NA)
  )

1.6 State the Conclusion

Statistical Decision

Decision Rule: Reject H₀ if |Z| > 1.96 or if p-value < 0.05

Observed:

  • |Z| = 2.1333 > 1.96
  • p-value = 0.0329 < 0.05

Conclusion: We REJECT H₀ at α = 0.05 significance level.

There is sufficient statistical evidence to conclude that the true average daily study time differs from the claimed 120 minutes.

1.7 Business Analytics Context

Interpretation in Business Context

Current Situation:

  • Platform claims: 120 minutes average study time
  • Observed data: 116 minutes average (4 minutes difference)
  • Statistical significance: YES - The difference is significant

Business Implications:

  1. Platform Claim Challenged: The observed difference is statistically significant and suggests the platform’s claim may need revision.
  2. User Engagement: There may be a decline in user engagement that warrants investigation.
  3. Recommendations: Conduct further analysis to identify causes of reduced study time and implement engagement strategies.

Practical Significance: While statistical significance is important, consider whether a 4-minute difference has meaningful business impact in your context.


2 Case Study 3: Two-Sample T-Test (A/B Testing)

2.1 Problem Statement

A/B Test Scenario

A product analytics team conducts an A/B test to compare the average session duration (minutes) between two versions of a landing page.

Data Collected:

n_A <- 25
mean_A <- 4.8
sd_A <- 1.2

n_B <- 25
mean_B <- 5.4
sd_B <- 1.4

alpha <- 0.05

ab_data <- data.frame(
  Version = c("Version A", "Version B"),
  `Sample Size (n)` = c(n_A, n_B),
  `Mean (minutes)` = c(mean_A, mean_B),
  `Standard Deviation` = c(sd_A, sd_B),
  check.names = FALSE
)

kable(ab_data, caption = "A/B Test Data Summary", align = 'lrrr') %>%
  kable_styling(bootstrap_options = c("striped", "hover"), 
                full_width = FALSE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#8B1538") %>%
  row_spec(1, background = "#DBEAFE") %>%
  row_spec(2, background = "#FFE4E6")
A/B Test Data Summary
Version Sample Size (n) Mean (minutes) Standard Deviation
Version A 25 4.8 1.2
Version B 25 5.4 1.4

2.2 Null and Alternative Hypotheses

Hypotheses Formulation

H₀ (Null Hypothesis): μ_A = μ_B

There is no difference in average session duration between Version A and Version B.

H₁ (Alternative Hypothesis): μ_A ≠ μ_B

There is a significant difference in average session duration between the two versions.

Type of Test: Two-sample independent t-test (two-tailed)

Significance Level: α = 0.05

2.3 Type of T-Test Required

Why Two-Sample Independent T-Test?

We use the Two-Sample Independent T-Test because:

  1. Two separate groups: Version A users vs Version B users
  2. Independent samples: Different users in each group (not paired)
  3. Population standard deviations unknown: Only sample SDs available
  4. Comparing two means: Testing if μ_A ≠ μ_B
  5. Assume equal variances: sd_A = 1.2 and sd_B = 1.4 are reasonably similar

Test Type: Pooled variance t-test (assuming equal population variances)
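
Note on robustness: if the equal-variance assumption were in doubt, a Welch (unequal-variance) t-test could be computed from the same summary statistics. The sketch below is supplementary, reusing the values defined in Section 2.1; with equal group sizes and similar SDs it gives virtually the same answer as the pooled test that follows.

# Hedged sketch: Welch's t-test from summary statistics (no equal-variance assumption)
se_welch <- sqrt(sd_A^2 / n_A + sd_B^2 / n_B)
t_welch  <- (mean_B - mean_A) / se_welch

# Welch-Satterthwaite degrees of freedom
df_welch <- (sd_A^2 / n_A + sd_B^2 / n_B)^2 /
  ((sd_A^2 / n_A)^2 / (n_A - 1) + (sd_B^2 / n_B)^2 / (n_B - 1))

p_welch <- 2 * pt(abs(t_welch), df_welch, lower.tail = FALSE)

cat("Welch t:", round(t_welch, 4),
    "| df:", round(df_welch, 2),
    "| p-value:", round(p_welch, 4), "\n")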

2.4 Compute Test Statistic and P-Value

# Pooled standard deviation
sp <- sqrt(((n_A - 1) * sd_A^2 + (n_B - 1) * sd_B^2) / (n_A + n_B - 2))

# Standard error of difference
se_diff <- sp * sqrt(1/n_A + 1/n_B)

# T-statistic
t_stat <- (mean_B - mean_A) / se_diff

# Degrees of freedom
df <- n_A + n_B - 2

# P-value (two-tailed)
p_value_t <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# Critical value
t_critical <- qt(1 - alpha/2, df)

# Effect size (Cohen's d)
cohens_d <- (mean_B - mean_A) / sp

cat("Pooled Standard Deviation (Sp):", round(sp, 4), "\n")
Pooled Standard Deviation (Sp): 1.3038 
cat("Standard Error (SE):", round(se_diff, 4), "\n")
Standard Error (SE): 0.3688 
cat("T-Statistic:", round(t_stat, 4), "\n")
T-Statistic: 1.627 
cat("Degrees of Freedom:", df, "\n")
Degrees of Freedom: 48 
cat("P-Value:", round(p_value_t, 4), "\n")
P-Value: 0.1103 
cat("Critical T (±):", round(t_critical, 4), "\n")
Critical T (±): 2.0106 
cat("Cohen's d:", round(cohens_d, 4), "\n")
Cohen's d: 0.4602 

Two-Sample T-Test Formula:

Step 1: Pooled Standard Deviation
Sp = √[((n_A-1)×s_A² + (n_B-1)×s_B²) / (n_A + n_B - 2)]
Sp = √[((24)×1.44 + (24)×1.96) / 48]
Sp = 1.3038

Step 2: Standard Error
SE = Sp × √(1/n_A + 1/n_B)
SE = 1.3038 × √(1/25 + 1/25)
SE = 0.3688

Step 3: T-Statistic
T = (x̄_B - x̄_A) / SE
T = (5.4 - 4.8) / 0.3688
T = 1.627

P-value = 2 × P(t > |1.627|) with df = 48
P-value = 0.1103

t_results <- data.frame(
  Statistic = c("Mean Difference (B - A)", "Pooled Std Dev", "Standard Error",
                "T-Statistic", "Degrees of Freedom", "P-Value", "Critical T (±)",
                "Cohen's d"),
  Value = c(round(mean_B - mean_A, 4), round(sp, 4), round(se_diff, 4),
            round(t_stat, 4), df, round(p_value_t, 4), 
            paste("±", round(t_critical, 4)), round(cohens_d, 4))
)

kable(t_results, caption = "T-Test Results Summary", align = 'lr') %>%
  kable_styling(bootstrap_options = c("striped", "hover"), 
                full_width = FALSE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#8B1538") %>%
  row_spec(4, bold = TRUE, background = "#FFE4E6") %>%
  row_spec(6, bold = TRUE, background = "#E0F2FE")
T-Test Results Summary
Statistic Value
Mean Difference (B - A) 0.6
Pooled Std Dev 1.3038
Standard Error 0.3688
T-Statistic 1.627
Degrees of Freedom 48
P-Value 0.1103
Critical T (±) ± 2.0106
Cohen’s d 0.4602

2.5 Visualization

# Simulated data for visualization
set.seed(123)
sim_A <- rnorm(n_A, mean_A, sd_A)
sim_B <- rnorm(n_B, mean_B, sd_B)

df_sim <- data.frame(
  Version = rep(c("Version A", "Version B"), each = 25),
  Duration = c(sim_A, sim_B)
)

# Boxplot
p1 <- ggplot(df_sim, aes(x = Version, y = Duration, fill = Version)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 21) +
  geom_jitter(width = 0.2, alpha = 0.4, size = 2) +
  stat_summary(fun = mean, geom = "point", shape = 23, 
               size = 4, fill = "red", color = "darkred") +
  scale_fill_manual(values = c("#93C5FD", "#FCA5A5")) +
  labs(title = "Session Duration Comparison: A/B Test",
       subtitle = "Boxplot with individual data points (Red diamond = mean)",
       y = "Session Duration (minutes)", x = "") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold", color = "#8B1538"),
    legend.position = "none",
    panel.background = element_rect(fill = "#F0F9FF", color = NA)
  )

# T-distribution
x_t <- seq(-4, 4, length.out = 1000)
y_t <- dt(x_t, df)
df_t <- data.frame(x = x_t, y = y_t)

p2 <- ggplot(df_t, aes(x = x, y = y)) +
  geom_line(color = "#0EA5E9", size = 1.5) +
  geom_area(data = df_t %>% filter(x < -t_critical),
            aes(x = x, y = y), fill = "#F43F5E", alpha = 0.5) +
  geom_area(data = df_t %>% filter(x > t_critical),
            aes(x = x, y = y), fill = "#F43F5E", alpha = 0.5) +
  geom_vline(xintercept = t_stat, color = "#8B1538",
             linetype = "dashed", size = 1.5) +
  geom_vline(xintercept = c(-t_critical, t_critical),
             color = "#F97316", linetype = "dotted", size = 1.2) +
  annotate("text", x = t_stat, y = max(y_t) * 0.8,
           label = paste0("T = ", round(t_stat, 3)),
           color = "#8B1538", size = 5, fontface = "bold", hjust = -0.2) +
  labs(title = "T-Distribution with Test Statistic",
       subtitle = paste0("df = ", df, " | p-value = ", round(p_value_t, 4)),
       x = "T-Score", y = "Density") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold", color = "#8B1538"),
    panel.background = element_rect(fill = "#F0F9FF", color = NA)
  )

grid.arrange(p1, p2, ncol = 1)

2.6 Statistical Conclusion

Draw a Statistical Conclusion at α = 0.05

Decision Rule: Reject H₀ if |T| > 2.0106 or if p-value < 0.05

Observed:

  • |T| = 1.627 < 2.0106
  • p-value = 0.1103 > 0.05

Conclusion: We FAIL TO REJECT H₀ at α = 0.05 significance level.

There is insufficient statistical evidence to conclude that there is a significant difference in average session duration between Version A and Version B.

Effect Size: Cohen’s d = 0.4602 (small to medium effect)

2.7 Product Decision-Making

Interpret the Result for Product Decision-Making

Key Findings:

  • Version B: 5.4 minutes average session duration
  • Version A: 4.8 minutes average session duration
  • Difference: 0.6 minutes (12.5% increase)
  • Statistical significance: NO

Product Recommendation:

  1. No Clear Winner: The difference between versions is not statistically significant.
  2. Recommendation: Either version can be used, or continue testing with larger sample size.
  3. Consider: Test duration, seasonal effects, or segment-specific analysis.
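
To make recommendation 2 concrete, base R's power.t.test() can estimate how many users per version would be needed to detect the observed 0.6-minute difference with 80% power. This is a planning sketch only: it treats the observed means and pooled SD as if they were the true values, which is an optimistic assumption.

# Hedged sketch: required sample size per group to detect a 0.6-minute difference
# (delta and sd taken from the observed data; treat as planning values only)
power.t.test(delta = mean_B - mean_A,   # 0.6 minutes
             sd = sp,                   # pooled SD, about 1.3038
             sig.level = 0.05,
             power = 0.80,
             type = "two.sample",
             alternative = "two.sided")
# Returns n per group (roughly 75-80 users per version under these planning values)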

Risk Assessment: With p-value = 0.1103, there is an 11.03% probability of observing a difference at least this large by chance alone if there is truly no difference between the versions.


3 Case Study 4: Chi-Square Test of Independence

3.1 Problem Statement

E-Commerce Analysis

An e-commerce company examines whether device type is associated with payment method preference.

# Contingency table
payment_matrix <- matrix(
  c(120, 80, 50,
    60, 90, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Device = c("Mobile", "Desktop"),
    Payment = c("E-Wallet", "Credit Card", "Cash on Delivery")
  )
)

kable(payment_matrix, caption = "Observed Frequencies: Device Type vs Payment Method") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), 
                full_width = FALSE) %>%
  add_header_above(c(" " = 1, "Payment Method" = 3)) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#8B1538")
Observed Frequencies: Device Type vs Payment Method
Payment Method
E-Wallet Credit Card Cash on Delivery
Mobile 120 80 50
Desktop 60 90 40

3.2 Null Hypothesis and Alternative Hypothesis

Hypotheses Formulation

H₀ (Null Hypothesis): Device type and payment method preference are independent

There is no association between device type and payment method choice.

H₁ (Alternative Hypothesis): Device type and payment method preference are dependent

There is an association between device type and payment method choice.

Significance Level: α = 0.05

3.3 Appropriate Statistical Test

Why Chi-Square Test of Independence?

We use the Chi-Square (χ²) Test of Independence because:

  1. Two categorical variables: Device type (Mobile/Desktop) and Payment method (3 categories)
  2. Testing association: We want to know if the variables are related
  3. Contingency table format: Data presented as frequency counts
  4. Independent observations: Each transaction is independent
  5. Expected frequencies check: All expected frequencies should be > 5

3.4 Compute Chi-Square Statistic

# Perform Chi-Square test
chi_test <- chisq.test(payment_matrix)

# Extract results
chi_stat <- chi_test$statistic
p_value_chi <- chi_test$p.value
df_chi <- chi_test$parameter
expected_freq <- chi_test$expected

# Critical value
chi_critical <- qchisq(1 - alpha, df_chi)

cat("Chi-Square Statistic (χ²):", round(chi_stat, 4), "\n")
Chi-Square Statistic (χ²): 13.7736 
cat("Degrees of Freedom:", df_chi, "\n")
Degrees of Freedom: 2 
cat("P-Value:", round(p_value_chi, 4), "\n")
P-Value: 0.001 
cat("Critical Chi-Square:", round(chi_critical, 4), "\n")
Critical Chi-Square: 5.9915 

Chi-Square Test Formula:

χ² = Σ [(Observed - Expected)² / Expected]

Degrees of Freedom:
df = (number of rows - 1) × (number of columns - 1)
df = (2 - 1) × (3 - 1) = 2

Chi-Square Statistic: χ² = 13.7736
P-Value: 0.001
Critical Value (α = 0.05): 5.9915

kable(round(expected_freq, 2), 
      caption = "Expected Frequencies (under H₀: Independence)") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), 
                full_width = FALSE) %>%
  add_header_above(c(" " = 1, "Payment Method" = 3)) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#6366F1")
Expected Frequencies (under H₀: Independence)
Payment Method
E-Wallet Credit Card Cash on Delivery
Mobile 102.27 96.59 51.14
Desktop 77.73 73.41 38.86
chi_results <- data.frame(
  Statistic = c("Chi-Square (χ²)", "Degrees of Freedom", "P-Value", 
                "Critical Value", "Decision"),
  Value = c(round(chi_stat, 4), df_chi, round(p_value_chi, 4),
            round(chi_critical, 4),
            ifelse(chi_stat > chi_critical, "Reject H₀", "Fail to Reject H₀"))
)

kable(chi_results, caption = "Chi-Square Test Results", align = 'lr') %>%
  kable_styling(bootstrap_options = c("striped", "hover"), 
                full_width = FALSE) %>%
  row_spec(0, bold = TRUE, color = "white", background = "#8B1538") %>%
  row_spec(1, bold = TRUE, background = "#FFE4E6") %>%
  row_spec(3, bold = TRUE, background = "#E0F2FE")
Chi-Square Test Results
Statistic Value
Chi-Square (χ²) 13.7736
Degrees of Freedom 2
P-Value 0.001
Critical Value 5.9915
Decision Reject H₀

3.5 Determine P-Value at α = 0.05

Statistical Decision

Decision Rule: Reject H₀ if χ² > 5.9915 or if p-value < 0.05

Observed:

  • χ² = 13.7736 > 5.9915
  • p-value = 0.001 < 0.05

Conclusion: We REJECT H₀ at α = 0.05 significance level.

There is sufficient statistical evidence to conclude that device type and payment method preference are dependent (associated).
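
Echoing the effect-size discussion in the later p-value case study, the strength of this association can be summarized with Cramér's V, computed directly from the chi-square statistic already obtained above. This is a supplementary sketch, not part of the original output.

# Hedged sketch: Cramer's V effect size for the 2 x 3 contingency table
n_total   <- sum(payment_matrix)            # 440 transactions
k_min     <- min(dim(payment_matrix)) - 1   # min(rows, cols) - 1 = 1
cramers_v <- sqrt(as.numeric(chi_stat) / (n_total * k_min))
cat("Cramer's V:", round(cramers_v, 4), "\n")
# Roughly 0.18 here: the association is statistically significant but modest in size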

3.6 Visualization

# Convert to long format
df_payment <- as.data.frame(as.table(payment_matrix))
colnames(df_payment) <- c("Device", "Payment", "Frequency")

# Grouped bar chart
p1 <- ggplot(df_payment, aes(x = Device, y = Frequency, fill = Payment)) +
  geom_bar(stat = "identity", position = "dodge", alpha = 0.8) +
  geom_text(aes(label = Frequency), position = position_dodge(0.9),
            vjust = -0.5, size = 4, fontface = "bold") +
  scale_fill_manual(values = c("#93C5FD", "#FCA5A5", "#FDE68A")) +
  labs(title = "Payment Method Distribution by Device Type",
       subtitle = "Observed Frequencies",
       y = "Frequency", x = "Device Type") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold", color = "#8B1538"),
    legend.position = "top",
    panel.background = element_rect(fill = "#F0F9FF", color = NA)
  )

# Proportional stacked bar
df_prop <- df_payment %>%
  group_by(Device) %>%
  mutate(Proportion = Frequency / sum(Frequency) * 100)

p2 <- ggplot(df_prop, aes(x = Device, y = Proportion, fill = Payment)) +
  geom_bar(stat = "identity", alpha = 0.8) +
  geom_text(aes(label = paste0(round(Proportion, 1), "%")),
            position = position_stack(vjust = 0.5),
            size = 4, fontface = "bold", color = "white") +
  scale_fill_manual(values = c("#93C5FD", "#FCA5A5", "#FDE68A")) +
  labs(title = "Payment Method Preference by Device (Proportional)",
       subtitle = "Percentage Distribution",
       y = "Percentage (%)", x = "Device Type") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold", color = "#8B1538"),
    legend.position = "top",
    panel.background = element_rect(fill = "#F0F9FF", color = NA)
  )

grid.arrange(p1, p2, ncol = 1)

3.7 Digital Payment Strategy

Interpret Results for Digital Payment Strategy

Key Findings:

Significant Association Detected! Device type is associated with payment method choice.

Mobile Users (250 total):

  • E-Wallet: 120 (48%) - Highest preference
  • Credit Card: 80 (32%)
  • Cash on Delivery: 50 (20%)

Desktop Users (190 total):

  • E-Wallet: 60 (31.6%)
  • Credit Card: 90 (47.4%) - Highest preference
  • Cash on Delivery: 40 (21.1%)

Strategic Recommendations:

  1. Mobile-First E-Wallet Optimization:
    • Prioritize e-wallet integration on mobile app
    • Implement one-click payments and biometric authentication
    • Partner with popular mobile payment providers (GoPay, OVO, Dana)
    • Offer mobile-exclusive e-wallet cashback deals
  2. Desktop Credit Card Experience:
    • Streamline credit card checkout flow on desktop
    • Display security badges prominently
    • Offer installment options for high-value purchases
    • Enable card saving for returning customers
  3. Device-Specific Marketing:
    • Promote e-wallet deals in mobile push notifications
    • Highlight credit card benefits in desktop banners
    • A/B test payment option ordering by device
  4. Conversion Rate Optimization:
    • Default to preferred payment method based on device
    • Reduce friction by minimizing form fields
    • Test express checkout options per device

Expected Business Impact:

  • Potential 5-15% increase in conversion rates by optimizing payment flow per device
  • Reduced cart abandonment through device-appropriate payment options
  • Improved customer satisfaction and repeat purchase rates

4 Case Study 5: Type I and Type II Errors (Conceptual)

4.1 Problem Statement

Fraud Detection Algorithm Scenario

A fintech startup tests whether a new fraud detection algorithm reduces fraudulent transactions.

Hypotheses:

  • H₀: The new algorithm does NOT reduce fraud
  • H₁: The new algorithm REDUCES fraud

4.2 Type I Error (α)

Explain Type I Error in This Context

Definition: Type I Error occurs when we reject H₀ when H₀ is actually true (False Positive).

In This Context:

We conclude that the algorithm reduces fraud, but in reality, it does NOT reduce fraud.

Consequences:

  1. Wasted Investment: Company deploys an ineffective algorithm, wasting development and implementation costs
  2. False Confidence: Security team operates with false sense of security
  3. Opportunity Cost: Resources diverted from developing truly effective solutions
  4. Reputation Risk: When discovered, damages credibility of data science team
  5. Implementation Costs: Training, deployment, and maintenance of ineffective system

Probability: P(Type I Error) = α (typically 0.05 or 5%)

Financial Impact: Moderate - Limited to implementation costs (~$50K-$200K)
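
The statement P(Type I Error) = α can be illustrated with a quick simulation: if H₀ is true and the test is run repeatedly at α = 0.05, about 5% of runs will wrongly reject. A small self-contained sketch with made-up data, included here only for illustration:

# Hedged sketch: simulating the Type I error rate under a true null hypothesis
set.seed(42)
n_sims <- 10000
reject <- replicate(n_sims, {
  x <- rnorm(30, mean = 0, sd = 1)    # data generated with NO real effect
  t.test(x, mu = 0)$p.value < 0.05    # does the test reject anyway?
})
mean(reject)   # close to 0.05, i.e. the false-positive rate equals alpha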

4.3 Type II Error (β)

Explain Type II Error in This Context

Definition: Type II Error occurs when we fail to reject H₀ when H₁ is actually true (False Negative).

In This Context:

We conclude that the algorithm does NOT reduce fraud, but in reality, it DOES reduce fraud.

Consequences:

  1. Missed Opportunity: Company doesn’t deploy an effective fraud prevention tool
  2. Continued Fraud Losses: Business continues suffering from preventable fraud (potentially millions in losses)
  3. Competitive Disadvantage: Competitors may deploy better fraud detection first
  4. Customer Trust Erosion: Continued fraud affects customer satisfaction and retention
  5. Revenue Impact: Lost sales, increased chargebacks, and customer attrition
  6. Regulatory Risk: Failure to implement adequate fraud controls

Probability: P(Type II Error) = β (depends on sample size and effect size)

Statistical Power: Power = 1 - β (probability of correctly detecting the improvement)

Financial Impact: High - Ongoing fraud losses (potentially $500K-$5M+ annually)

4.4 Which Error is More Costly?

More Costly Error from Business Perspective

Type II Error is MORE COSTLY in this fraud detection context.

Comparative Analysis:

  • Type I (False Positive): immediate cost $50K-$200K (implementation); long-term cost limited (one-time cost); business risk low to moderate (reversible)
  • Type II (False Negative): immediate cost $0 (no action taken); long-term cost $500K-$5M+ (annual fraud losses); business risk high (ongoing damage)

Justification:

  • Magnitude: Ongoing fraud losses far exceed one-time implementation costs
  • Duration: Type II error leads to continuous losses; Type I is correctable
  • Customer Impact: Type II directly harms customers through fraud
  • Competitive Risk: Falling behind in fraud prevention is strategically dangerous

4.5 Sample Size and Type II Error

How Sample Size Affects Type II Error

Relationship: Sample size and Type II Error are inversely related.

As Sample Size Increases:

  • Type II Error probability (β) DECREASES
  • Statistical Power (1 - β) INCREASES
  • Ability to detect true effects IMPROVES
  • Confidence in test results STRENGTHENS

Why This Happens:

  1. Reduced Standard Error: SE = σ/√n decreases as n increases
  2. Narrower Confidence Intervals: More precise estimates of true effect
  3. Better Signal Detection: Easier to distinguish true effect from noise

Practical Example (illustrative figures for a fixed effect size at α = 0.05):

  • Small sample (n = 50): Power = 60%, β = 40% (a real fraud reduction would be missed 40% of the time)
  • Large sample (n = 500): Power = 95%, β = 5% (a real fraud reduction would be missed only 5% of the time)

Business Recommendation: Invest in larger sample sizes when testing critical systems like fraud detection to minimize costly Type II errors.
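
For planning purposes, the sample size needed to detect a given fraud reduction can be estimated with base R's power.prop.test(). The fraud rates below are purely hypothetical placeholders, not figures from the case study.

# Hedged sketch: transactions per group needed to detect a hypothetical
# drop in fraud rate from 2.0% (current) to 1.5% (with the new algorithm)
power.prop.test(p1 = 0.020,        # assumed baseline fraud rate (hypothetical)
                p2 = 0.015,        # assumed fraud rate with the algorithm (hypothetical)
                sig.level = 0.05,
                power = 0.80,
                alternative = "one.sided")
# Returns n per group; small absolute differences in rare events require very large samples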

4.6 Relationship: α, β, and Statistical Power

Explain Relationship Between α, β, and Statistical Power

Key Concepts:

1. Alpha (α) - Significance Level:

  • Probability of Type I Error (False Positive)
  • Typically set at 0.05 (5%)
  • Controls how strict we are about avoiding false positives

2. Beta (β) - Type II Error Probability:

  • Probability of Type II Error (False Negative)
  • Typically 0.20 (20%) or lower
  • Depends on sample size, effect size, and α

3. Statistical Power (1 - β):

  • Probability of correctly rejecting H₀ when H₁ is true
  • Typically aim for 0.80 (80%) or higher
  • Power = 1 - β

Relationships:

  • Decrease α (more strict) → β increases (power decreases): fewer false positives, more false negatives
  • Increase sample size (n) → β decreases (power increases): better at detecting true effects
  • Larger effect size → β decreases (power increases): easier to detect big differences

Optimal Balance:

  • α = 0.05 (5% Type I Error)
  • Power = 0.80 (80% chance to detect true effect)
  • β = 0.20 (20% Type II Error)

Formula: Power = 1 - β

If β = 0.20, then Power = 1 - 0.20 = 0.80 (80%)


5 Case Study 6: P-Value and Statistical Decision Making

5.1 Problem Statement

Churn Prediction Model Evaluation

A churn prediction model evaluation yields the following results:

  • Test statistic = 2.31
  • P-value = 0.021
  • Significance level: α = 0.05

5.2 Meaning of P-Value

Explain the Meaning of the P-Value

P-Value = 0.021

Definition: The p-value is the probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true.

In This Context:

If the churn prediction model has NO real predictive power (H₀ is true), there is a 2.1% chance of observing a test statistic as extreme as 2.31 or more extreme, purely by random chance.

Interpretation:

  • Low p-value (0.021 < 0.05): The observed result is unlikely under H₀
  • Evidence against H₀: The data provides strong evidence that the model has predictive power
  • Not a probability of H₀: p-value is NOT the probability that H₀ is true

Common Misconceptions to Avoid:

  • “There’s a 2.1% chance H₀ is true” → WRONG
  • “There’s a 97.9% chance H₁ is true” → WRONG
  • “If H₀ were true, we’d see results this extreme only 2.1% of the time” → CORRECT
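
Assuming the reported test statistic of 2.31 is referenced against an approximately standard normal distribution (the problem statement does not specify the reference distribution), the p-value of 0.021 can be reproduced in one line:

# Hedged sketch: reproducing the two-tailed p-value from the test statistic,
# assuming a standard normal (large-sample) reference distribution
2 * pnorm(abs(2.31), lower.tail = FALSE)   # about 0.0209, i.e. 0.021 after rounding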

5.3 Make a Statistical Decision

Statistical Decision

Decision Rule: Reject H₀ if p-value < α

Comparison:

  • P-value = 0.021
  • α = 0.05
  • 0.021 < 0.05 ✓

Decision: REJECT H₀ at the 5% significance level.

Conclusion: There is statistically significant evidence that the churn prediction model has meaningful predictive power. The observed performance is unlikely to have occurred by chance alone.

Confidence Level: The decision is made at the 95% confidence level (1 - α = 0.95); if H₀ were true, a rejection like this would occur only 5% of the time.

5.4 Translate to Non-Technical Language

Translate the Decision into Non-Technical Language for Management

Executive Summary:

Our new churn prediction model has been rigorously tested, and the results show it performs significantly better than random guessing.

What We Found:

  • The model’s performance is statistically significant (p-value = 0.021)
  • If the model truly had no predictive power, results this strong would occur only about 2% of the time by chance
  • The evidence clears the 95% confidence threshold we use for calling a result real

What This Means for the Business:

  1. Actionable Insights: The model can reliably identify customers at risk of churning
  2. Proactive Intervention: We can target at-risk customers with retention campaigns before they leave
  3. Resource Optimization: Focus retention efforts on customers most likely to churn
  4. ROI Potential: Expected to reduce churn rate and increase customer lifetime value

Recommendation: Deploy the model into production with proper monitoring and validation protocols.

Next Steps:

  • Integrate model into CRM system
  • Develop automated retention workflows
  • Monitor model performance in production
  • A/B test retention strategies on high-risk segments

5.5 Risk if Sample is Not Representative

Discuss the Risk if the Sample is Not Representative

Critical Assumption: Statistical tests assume the sample is representative of the population.

If Sample is NOT Representative:

1. Selection Bias Risks:

  • Wrong Customer Segments: Model trained on high-value customers won’t work for budget segments
  • Seasonal Bias: Sample from holiday season may not apply to rest of year
  • Geographic Bias: Model trained on urban customers fails in rural markets
  • Temporal Bias: Historical data may not reflect current market conditions

2. Consequences:

  • Model Fails in Production: Predictions don’t work on real customer base
  • Wasted Resources: Investment in ineffective retention campaigns
  • Customer Alienation: Inappropriate messaging to wrong customer segments
  • False Confidence: P-value looks good, but results don’t generalize
  • Business Decisions Based on Flawed Data: Strategy built on unreliable insights

3. Impact on Statistical Validity:

  • External Validity Compromised: Results don’t apply beyond the biased sample
  • Overfitting Risk: Model learns sample-specific patterns, not general trends
  • Confidence Intervals Misleading: Uncertainty estimates are incorrect
  • P-value Unreliable: Statistical significance doesn’t translate to real-world effectiveness

4. How to Mitigate:

  1. Random Sampling: Ensure truly random selection from entire customer base
  2. Stratified Sampling: Include proportional representation from all segments
  3. Sample Size Calculation: Ensure adequate representation of subgroups
  4. Validation Datasets: Test model on separate, representative holdout set
  5. Cross-Validation: Validate across different time periods and segments
  6. A/B Testing: Pilot deployment to verify real-world performance
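
Mitigation item 2 (stratified sampling) can be implemented directly with dplyr, drawing the same proportion of customers from every segment. The customer table and segment labels below are hypothetical placeholders used only to make the sketch runnable.

# Hedged sketch: proportional stratified sample of customers by segment
library(dplyr)
set.seed(123)

# Hypothetical customer table used only for illustration
customers <- data.frame(
  customer_id = 1:1000,
  segment = sample(c("budget", "mid-tier", "premium"), 1000,
                   replace = TRUE, prob = c(0.5, 0.3, 0.2))
)

# Draw 10% within every segment so each segment is represented proportionally
stratified_sample <- customers %>%
  group_by(segment) %>%
  slice_sample(prop = 0.10) %>%
  ungroup()

table(stratified_sample$segment)   # segment mix mirrors the full customer base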

Business Recommendation:

Before full deployment, conduct a pilot test on a representative sample of the current customer base to validate that the model performs as expected. Monitor key metrics like prediction accuracy, false positive rate, and business KPIs (retention rate, CLV).

5.6 P-Value Does Not Measure Effect Size

Explain Why P-Value Does Not Measure Effect Size

Critical Distinction: P-value measures statistical significance, NOT practical significance (effect size).

What P-Value Tells Us:

  • Whether the effect is likely to be real (not due to chance)
  • Strength of evidence against H₀
  • Reliability of the finding

What P-Value Does NOT Tell Us:

  • How big the effect is
  • Whether the effect matters practically
  • Business impact or importance

Example Scenarios:

  • Scenario 1 (large sample, tiny improvement): p-value = 0.001 (very significant); effect size = 0.5% churn reduction; business impact low, not worth implementing
  • Scenario 2 (small sample, large improvement): p-value = 0.08 (not significant); effect size = 15% churn reduction; business impact high, needs a larger study
  • Scenario 3 (optimal, meaningful result): p-value = 0.021 (significant); effect size = 8% churn reduction; business impact high, deploy immediately

Why This Matters:

Scenario 1: With millions of customers, even a tiny improvement such as a 0.5% churn reduction will show p < 0.001, but the business impact is negligible. Statistically significant ≠ Practically important.

Scenario 2: A small pilot shows 15% improvement but p = 0.08. Don’t dismiss it! The effect is large but sample was too small. Not significant ≠ No effect.

Best Practice: Always report BOTH:

  1. P-value: Is the effect real? (Statistical significance)
  2. Effect Size: How big is it? (Practical significance)
    • Cohen’s d for t-tests
    • Odds ratio or risk ratio for categorical outcomes
    • R² for regression models
    • Percentage change in key metrics

For Our Churn Model:

  • P-value = 0.021 → Effect is likely real ✓
  • But we need to know: What’s the actual churn reduction? 2%? 10%? 20%?
  • Business decision depends on: Cost of implementation vs. value of churn reduction

Recommendation: Supplement statistical significance testing with effect size estimates and ROI calculations for management decisions.


6 References and Additional Resources

References

  1. Montgomery, D. C., & Runger, G. C. (2018). Applied Statistics and Probability for Engineers (7th ed.). John Wiley & Sons.
  2. Agresti, A., & Finlay, B. (2018). Statistical Methods for the Social Sciences (5th ed.). Pearson.
  3. Devore, J. L. (2015). Probability and Statistics for Engineering and the Sciences (9th ed.). Cengage Learning.
  4. Walpole, R. E., Myers, R. H., Myers, S. L., & Ye, K. (2016). Probability & Statistics for Engineers & Scientists (9th ed.). Pearson.
  5. Field, A. (2017). Discovering Statistics Using IBM SPSS Statistics (5th ed.). SAGE Publications.
  6. Kohavi, R., Tang, D., & Xu, Y. (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press.
  7. Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Lawrence Erlbaum Associates.
  8. Wasserstein, R. L., & Lazar, N. A. (2016). “The ASA Statement on p-Values: Context, Process, and Purpose.” The American Statistician, 70(2), 129-133.

🔗 Online Resources

Statistical Software & Packages Used

  • R Version: R version 4.5.2 (2025-10-31)
  • ggplot2: Elegant graphics for data visualization
  • dplyr: Data manipulation and transformation
  • knitr & kableExtra: Dynamic report generation and table formatting
  • gridExtra: Multiple plot arrangements

7 Summary and Key Takeaways

Key Learning Points

1. Case Study 1 - One-Sample Z-Test:

  • Use Z-test when population SD is known and sample size is large
  • Two-tailed tests detect differences in either direction
  • Statistical significance doesn’t always equal practical importance

2. Case Study 3 - Two-Sample T-Test (A/B Testing):

  • Independent t-tests compare means between two separate groups
  • Effect size (Cohen’s d) measures practical significance
  • Business decisions should consider both statistical and practical significance

3. Case Study 4 - Chi-Square Test:

  • Chi-square tests association between categorical variables
  • Contingency tables reveal patterns in cross-tabulated data
  • Device-specific optimization can significantly improve conversion rates

4. Case Study 5 - Type I & II Errors:

  • Type I Error: False positive (reject true H₀)
  • Type II Error: False negative (fail to reject false H₀)
  • Sample size directly affects statistical power
  • Business context determines which error is more costly

5. Case Study 6 - P-Value Interpretation:

  • P-value measures evidence against H₀, not probability of H₀
  • Low p-value indicates statistical significance
  • P-value ≠ effect size or practical importance
  • Representative sampling is critical for valid inference

Best Practices for Business Analytics

  1. Context Matters: Always interpret statistical results within business context
  2. Multiple Metrics: Report p-values, effect sizes, confidence intervals, and business KPIs
  3. Sample Quality: Ensure representative sampling before generalizing results
  4. Power Analysis: Calculate required sample sizes before conducting studies
  5. Practical Significance: Consider ROI and implementation costs, not just statistical significance
  6. Transparent Communication: Translate statistical findings into actionable business insights
  7. Validation: Always validate findings with holdout data or A/B tests
  8. Continuous Monitoring: Track performance metrics post-implementation