Probability Distribution

Week 14

INSTITUT TEKNOLOGI SAINS BANDUNG

IDENTITY CARD

Name : Dhefio Alim Muzakki

Student ID : 52250014

Major : Data Science

Lecturer : Mr. Bakti Siregar, M.Sc., CDS.


library(tidyverse)
library(readr)
library(ggplot2)
library(dplyr)
library(ggridges)
library(knitr)
library(DT)


1 Case Study 1

1.1 One-Sample Z-Test (Statistical Hypotheses)

A digital learning platform claims that the average daily study time of its users is 120 minutes.
The population standard deviation is known to be 15 minutes.

A random sample of 64 users shows an average study time of 116 minutes.

\[ \mu_0 = 120,\quad \sigma = 15,\quad n = 64,\quad \bar{x} = 116 \]


1. Hypothesis

\[ \begin{aligned} H_0 &: \mu = 120 \\ H_1 &: \mu \neq 120 \end{aligned} \]


1.2 Statistical Test

The appropriate test is a One-Sample Z-Test because:

  • Population standard deviation is known
  • Sample size is large (\(n \ge 30\))
  • Testing a single population mean

1.3 Test Statistic

\[ Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

\[ Z = \frac{116 - 120}{15 / \sqrt{64}} = \frac{-4}{1.875} \approx -2.13 \]


P-value Calculation (\(\alpha = 0.05\))
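The p-value step is not worked out in the text, so here is a minimal base-R sketch that recomputes it from the summary values above. It uses `pnorm()` for the standard normal tail and a two-sided p-value, because \(H_1: \mu \neq 120\) is two-sided.

```r
# One-sample z-test using the summary values from Section 1.1
mu0   <- 120   # claimed mean daily study time (minutes)
sigma <- 15    # known population standard deviation (minutes)
n     <- 64    # sample size
x_bar <- 116   # observed sample mean (minutes)
alpha <- 0.05  # significance level

se      <- sigma / sqrt(n)        # standard error: 15 / 8 = 1.875
z       <- (x_bar - mu0) / se     # test statistic
p_value <- 2 * pnorm(-abs(z))     # two-sided p-value, matching H1: mu != 120
z_crit  <- qnorm(1 - alpha / 2)   # two-sided critical value

round(c(z = z, p_value = p_value, z_crit = z_crit), 4)
```

This should reproduce the values in the summary table below (Z ≈ -2.133, p ≈ 0.0329). Because |Z| exceeds the critical value of about 1.96, the critical-value rule and the p-value rule give the same decision.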

1.4 Summary of One-Sample Z-Test Results (\(\alpha = 0.05\))

Full Summary of One-Sample Z-Test
Component Value
Sample Mean (x̄) 116
Hypothesized Mean (μ₀) 120
Standard Deviation (σ) 15
Sample Size (n) 64
Z-statistic -2.133
P-value 0.0329
Significance Level (α) 0.05
Statistical Decision Reject H0
Conclusion The population mean study time is significantly different from 120 minutes.

1.5 Statistical Decision

At the significance level of \(\alpha = 0.05\), the decision rule is:

\[ \text{Reject } H_0 \quad \text{if } \text{p-value} < \alpha \]

From the test results:

\[ \text{p-value} = 0.033 < 0.05 \]

Therefore, the null hypothesis is rejected.

\[ \boxed{\text{Reject } H_0} \]

This indicates that there is sufficient statistical evidence to conclude that the population mean daily study time differs from 120 minutes.


1.6 Business Analytics Interpretation

From a business analytics perspective, the observed sample mean of \(\bar{x} = 116\) minutes is significantly lower than the claimed average of 120 minutes. This suggests that users may be spending less time on the platform than expected.

Such a discrepancy may indicate potential issues related to user engagement, content effectiveness, or learning motivation. Management should consider reviewing engagement strategies, improving platform features, or reassessing performance benchmarks to better align user behavior with business objectives.


1.7 Conclusion

Based on the One-Sample Z-Test conducted at the 5% significance level, the claim that the average daily study time of users is 120 minutes is not supported by the sample data.

\[ \boxed{\mu \neq 120} \]

In conclusion, the platform’s stated average daily study time differs significantly from the observed user behavior, warranting further investigation and potential strategic adjustments.

2 Case Study 2

2.1 One-Sample T-Test (\(\sigma\) Unknown, Small Sample)

A UX Research Team investigates whether the average task completion time of a new application differs from 10 minutes.

The following data are collected from 10 users:

\[ 9.2,\; 10.5,\; 9.8,\; 10.1,\; 9.6,\; 10.3,\; 9.9,\; 9.7,\; 10.0,\; 9.5 \]


2.2 Hypothesis

\[ \begin{aligned} H_0 &: \mu = 10 \\ H_1 &: \mu \neq 10 \end{aligned} \]


2.3 Summary of One-Sample T-Test Results (\(\alpha = 0.05\))

Full Summary of One-Sample T-Test
Component Value
Sample Mean (x̄) 9.86
Hypothesized Mean (μ₀) 10
Sample Standard Deviation (s) 0.386
Sample Size (n) 10
Degrees of Freedom 9
T-statistic -1.146
P-value 0.2815
Significance Level (α) 0.05
Statistical Decision Fail to Reject H0
Conclusion There is no significant evidence that the average task completion time differs from 10 minutes.
Effect of Sample Size With a small sample size, estimates are more sensitive to variability, resulting in lower statistical power and wider confidence intervals.
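The summary values in the table can be checked directly from the ten observations. The sketch below uses base R's `t.test()` with `mu = 10` and a two-sided alternative, which matches the hypotheses in Section 2.2. It also prints the 95% confidence interval, which is one way to see the "wider confidence intervals" mentioned in the last table row.

```r
# Task completion times (minutes) for the 10 users in Section 2.1
times <- c(9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5)

# Two-sided one-sample t-test against the 10-minute benchmark
res <- t.test(times, mu = 10, alternative = "two.sided", conf.level = 0.95)

summary_vals <- c(
  mean    = mean(times),
  sd      = sd(times),
  t       = unname(res$statistic),
  df      = unname(res$parameter),
  p_value = res$p.value
)
round(summary_vals, 4)
res$conf.int  # 95% confidence interval for the mean
```

The confidence interval contains 10, which agrees with the decision not to reject \(H_0\).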

2.4 Statistical Decision

At the significance level of \(\alpha = 0.05\), the decision rule is:

\[ \text{Reject } H_0 \quad \text{if } \text{p-value} < \alpha \]

From the analysis:

\[ \text{p-value} = 0.2815 \]

Since \(\text{p-value} > 0.05\), the null hypothesis is not rejected.

\[ \boxed{\text{Fail to Reject } H_0} \]


2.5 Analysis (UX Analytics Perspective)

From a UX analytics perspective, the observed average task completion time (9.86 minutes) does not differ significantly from the 10-minute benchmark. This is consistent with the new application meeting the expected usability standard for task efficiency. However, failing to reject \(H_0\) does not prove that the true mean is exactly 10 minutes.

However, the small sample size increases uncertainty in the estimate and limits inferential reliability. Minor variations in user behavior could substantially affect the results, indicating the need for further testing with a larger sample.


2.6 Conclusion

Based on the One-Sample T-Test conducted at the 5% significance level, there is insufficient statistical evidence to conclude that the average task completion time differs from 10 minutes.

\[ \boxed{\text{No significant evidence that } \mu \neq 10} \]


2.7 Effect of Sample Size on Inferential Reliability

Small sample sizes reduce statistical power and increase sensitivity to random variation, which may obscure true effects. Increasing the sample size would improve estimation accuracy, narrow confidence intervals, and strengthen the reliability of statistical conclusions.
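To make the claim about narrower confidence intervals concrete, the sketch below holds the sample standard deviation fixed at the observed \(s \approx 0.386\). It then computes the 95% CI half-width \(t_{0.975,\,n-1} \cdot s/\sqrt{n}\) for several sample sizes. Keeping \(s\) fixed is an illustrative assumption; in a real follow-up study, \(s\) would be re-estimated from the new data.

```r
# Illustrative only: assume the sample SD stays at the observed value
s <- sd(c(9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5))
n <- c(10, 20, 40, 80)  # hypothetical sample sizes

# Half-width of a 95% t-based confidence interval for the mean
half_width <- qt(0.975, df = n - 1) * s / sqrt(n)
data.frame(n = n, half_width = round(half_width, 3))
```

With \(n = 10\), the half-width is about 0.28 minutes. Each increase in \(n\) shrinks it, through both the \(\sqrt{n}\) term and the smaller t critical value.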

3 Case Study 3

3.1 Two-Sample T-Test (A/B Testing)

A product analytics team conducts an A/B test to compare the average session duration (minutes) between two versions of a landing page.

Version Sample Size (n) Mean Standard Deviation
A 25 4.8 1.2
B 25 5.4 1.4

3.2 Hypotheses Formulation

Let \(\mu_A\) and \(\mu_B\) denote the average session duration for version A and B.

\[ H_0: \mu_A = \mu_B \]

\[ H_1: \mu_A \neq \mu_B \]


3.3 Type of Test

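Since the samples are independent and the population variances are unknown and not assumed equal (the sample standard deviations are 1.2 and 1.4), Welch’s Two-Sample T-Test is used. Unlike the pooled-variance t-test, it does not require equal population variances.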


3.4 Test Statistic and p-value

# Given summary statistics
n_A <- 25
n_B <- 25
mean_A <- 4.8
mean_B <- 5.4
sd_A <- 1.2
sd_B <- 1.4

# Standard error
SE <- sqrt((sd_A^2 / n_A) + (sd_B^2 / n_B))

# Test statistic
t_stat <- (mean_A - mean_B) / SE

# Degrees of freedom (Welch)
df <- (SE^4) / (
  ((sd_A^2 / n_A)^2 / (n_A - 1)) +
  ((sd_B^2 / n_B)^2 / (n_B - 1))
)

# Two-tailed p-value
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# Summary table (controlled decimals)
test_summary <- data.frame(
  Statistic = c("t-statistic", "Degrees of Freedom", "p-value"),
  Value = c(
    sprintf("%.2f", t_stat),
    sprintf("%.1f", df),
    sprintf("%.3f", p_value)
  )
)

test_summary

3.5 Statistical Decision

At the significance level \(\alpha = 0.05\), the decision rule is:

\[ \text{Reject } H_0 \quad \text{if } \text{p-value} < 0.05 \]

From the test results:

\[ \text{p-value} = 0.110 \]

Since \(\text{p-value} > 0.05\), the null hypothesis is not rejected.

\[ \boxed{\text{Fail to Reject } H_0} \]

3.6 Business Analytics Interpretation

The statistical analysis indicates that the observed difference in average session duration between Version A and Version B is not statistically significant at the 5% significance level.

Although Version B shows a higher mean session duration (5.4 minutes) compared to Version A (4.8 minutes), the difference may be attributed to sampling variability rather than a true performance improvement.

From a product analytics perspective, this result suggests that the current evidence is insufficient to justify a decision solely based on session duration. Additional data, larger sample sizes, or complementary metrics (e.g., conversion rate or bounce rate) should be considered before deploying Version B.
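The recommendation to collect more data can be quantified with a power calculation. This is a planning sketch using base R's `power.t.test()`. It assumes equal group sizes and a common standard deviation approximated by the pooled value \(\sqrt{(1.2^2 + 1.4^2)/2} \approx 1.30\). It also treats the observed 0.6-minute gap as the difference worth detecting. Note that `power.t.test()` assumes equal variances, so it only approximates the Welch test used above.

```r
# ASSUMPTIONS: equal group sizes, common SD = pooled value, target difference = 0.6 min
sd_pooled <- sqrt((1.2^2 + 1.4^2) / 2)   # about 1.30 minutes

# Power of the current design (25 users per version) for a 0.6-minute difference
current <- power.t.test(n = 25, delta = 0.6, sd = sd_pooled, sig.level = 0.05,
                        type = "two.sample", alternative = "two.sided")

# Users per version needed to reach 80% power for the same difference
plan <- power.t.test(delta = 0.6, sd = sd_pooled, sig.level = 0.05,
                     power = 0.80, type = "two.sample", alternative = "two.sided")

round(c(current_power = current$power, n_per_group = ceiling(plan$n)), 2)
```

Under these assumptions, the current test has well under 50% power to detect a 0.6-minute difference. That low power is one reason a non-significant result here is weak evidence that the versions perform the same.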

3.7 Conclusion

Based on the Two-Sample T-Test conducted at the 5% significance level, there is insufficient statistical evidence to conclude that the average session duration differs between Version A and Version B.

\[ \boxed{\text{No significant evidence that } \mu_A \neq \mu_B} \]

Therefore, the null hypothesis cannot be rejected, and the A/B test does not provide strong enough evidence to favor one version over the other based on session duration alone.


4 Case Study 4

4.1 Chi-Square Test of Independence

An e-commerce company examines whether device type is associated with payment method preference. The observed data is summarized in the table below:

Device / Payment E-Wallet Credit Card Cash on Delivery Row Totals
Mobile 120 80 50 250
Desktop 60 90 40 190
Col Totals 180 170 90 N = 440

4.2 Tasks

State the Hypotheses

To test the relationship between the two categorical variables, we define:

  • Null Hypothesis (\(H_0\)): Device type and payment method preference are independent. There is no relationship between the device used and how a customer pays.
  • Alternative Hypothesis (\(H_1\)): Device type and payment method preference are dependent. The choice of payment method is associated with the device type.

Identify the Appropriate Statistical Test

The appropriate test is the Chi-Square (\(\chi^2\)) Test of Independence. This test is used because we are evaluating the association between two categorical variables (Device Type and Payment Method) using frequency data.

Compute the Chi-Square Statistic (\(\chi^2\))

Calculate Expected Frequencies (\(E\))

The expected frequency for each cell is calculated as: \[E = \frac{(\text{Row Total} \times \text{Column Total})}{\text{Grand Total}}\]

Mobile / E-Wallet: \(\frac{250 \times 180}{440} \approx 102.27\)

Mobile / Credit Card: \(\frac{250 \times 170}{440} \approx 96.59\)

Mobile / COD: \(\frac{250 \times 90}{440} \approx 51.14\)

Desktop / E-Wallet: \(\frac{190 \times 180}{440} \approx 77.73\)

Desktop / Credit Card: \(\frac{190 \times 170}{440} \approx 73.41\)

Desktop / COD: \(\frac{190 \times 90}{440} \approx 38.86\)
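The six expected frequencies above can be computed in one step. This base-R sketch takes the outer product of the row totals and column totals and divides by the grand total, which applies the formula \(E = (\text{Row Total} \times \text{Column Total}) / \text{Grand Total}\) to every cell at once.

```r
# Observed counts from Section 4.1 (rows = device, columns = payment method)
observed <- matrix(c(120, 80, 50,
                      60, 90, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Mobile", "Desktop"),
                                   c("E-Wallet", "Credit Card", "COD")))

# E = (row total x column total) / grand total, for every cell at once
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 2)
```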

Calculate the \(\chi^2\) Statistic

Use the formula: \[\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}\]

Cell (Device/Pay) Observed (\(O\)) Expected (\(E\)) \((O-E)^2 / E\)
Mobile/E-Wallet 120 102.27 3.074
Mobile/CC 80 96.59 2.849
Mobile/COD 50 51.14 0.025
Desktop/E-Wallet 60 77.73 4.044
Desktop/CC 90 73.41 3.749
Desktop/COD 40 38.86 0.033
Total 13.774

Total \(\chi^2 \approx 13.77\)

Determine the p-value at \(\alpha = 0.05\)

Degrees of Freedom (\(df\)): \((r - 1) \times (c - 1) = (2 - 1) \times (3 - 1) = 2\).

Critical Value: For \(df = 2\) and \(\alpha = 0.05\), the critical value is 5.991.

p-value: For \(\chi^2 = 13.77\) and \(df = 2\), \(p \approx 0.001\).

Since our calculated \(\chi^2\) (13.77) is greater than the critical value (5.991), and \(p < 0.05\), we reject the Null Hypothesis.
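As a cross-check of the hand calculation, base R's `chisq.test()` returns the statistic, degrees of freedom, and p-value directly. `qchisq()` gives the critical value. Yates' continuity correction only applies to 2×2 tables, so it does not affect this 2×3 table.

```r
observed <- matrix(c(120, 80, 50,
                      60, 90, 40),
                   nrow = 2, byrow = TRUE)

res <- chisq.test(observed)           # Pearson chi-square test of independence

x2     <- unname(res$statistic)       # chi-square statistic
df_chi <- unname(res$parameter)       # (r - 1)(c - 1) = 2
crit   <- qchisq(0.95, df = df_chi)   # critical value at alpha = 0.05
round(c(chi_sq = x2, df = df_chi, critical = crit, p_value = res$p.value), 4)
```

The statistic should match the manual total of about 13.77. The critical value should be about 5.991, and the p-value about 0.001.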

4.3 Interpret Results for Digital Payment Strategy

The data suggests a significant association between device and payment choice.

  • E-Wallet Dominance on Mobile: Mobile users choose E-Wallets more often than expected under independence (120 observed vs 102.27 expected). The strategy should focus on biometric authentication (FaceID/Fingerprint) and seamless app-to-app E-Wallet integration to reduce friction.

  • Desktop and Credit Cards: Desktop users choose Credit Cards more often than expected (90 observed vs 73.41 expected). The strategy should focus on enhanced security features (like 3D Secure) and browser auto-fill optimization.

  • Cash on Delivery (COD): Observed COD counts are close to their expected values on both devices (50 vs 51.14 on mobile, 40 vs 38.86 on desktop). This suggests COD may be a baseline preference, for example among security-conscious or unbanked customers.
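The cell-by-cell reading above can be checked with standardized (adjusted) residuals, a common follow-up to a significant chi-square test. The sketch uses the `stdres` component returned by base R's `chisq.test()`. The threshold of about 1.96 is a rule of thumb assumed here, and it applies no multiple-comparison correction.

```r
observed <- matrix(c(120, 80, 50,
                      60, 90, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Mobile", "Desktop"),
                                   c("E-Wallet", "Credit Card", "COD")))

# Adjusted residuals: (O - E) divided by the cell's standard error under independence.
# Rule of thumb (assumption): |residual| > 1.96 flags a notable departure.
res <- chisq.test(observed)
round(res$stdres, 2)
```

Under this rule, the E-Wallet and Credit Card cells stand out on both devices, in opposite directions, while the COD cells do not. This matches the interpretation above.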

4.4 Final Conclusion

Based on the Chi-Square Test of Independence, there is sufficient statistical evidence (\(\chi^2 = 13.77,\ df = 2,\ p \approx 0.001\)) to conclude that a customer’s choice of payment method is significantly associated with the device they use to browse the e-commerce platform.

Key Findings:

  • Mobile Synergy: There is a strong positive deviation for E-Wallets on mobile devices. This suggests that mobile users value speed and “one-tap” convenience, likely due to the integration of mobile payment apps (e.g., Apple Pay, Google Pay) on the same device.
  • Desktop Stability: Desktop users show a higher-than-expected reliance on Credit Cards. This may be attributed to a perception of higher security when entering sensitive card details on a larger screen or a more “traditional” shopping environment.
  • COD Neutrality: Cash on Delivery appears to be the most device-neutral option, indicating it serves a specific demographic (security-conscious or cash-reliant) regardless of hardware preference.

Strategic Recommendation:

The company should move away from a “one-size-fits-all” checkout experience and instead implement a Device-Adaptive Checkout Strategy:

  1. Prioritize E-Wallets as the primary/default option on mobile interfaces to reduce cart abandonment.
  2. Highlight Security and Rewards for Credit Card users on the desktop interface to align with the existing user behavior.
  3. Targeted Marketing: Use device-specific promotions (e.g., “5% cashback for E-Wallet users on Mobile”) to further capitalize on these behavioral trends.

By aligning the digital payment infrastructure with these device-specific preferences, the company can streamline the conversion funnel and improve overall user satisfaction.


5 Case Study 5

5.1 Type I and Type II Errors (Conceptual)

A fintech startup tests whether a new fraud detection algorithm reduces fraudulent transactions.

  • Null Hypothesis (\(H_0\)): The new algorithm does not reduce fraud.

  • Alternative Hypothesis (\(H_1\)): The new algorithm reduces fraud.


5.2 Tasks

Explain a Type I Error (\(\alpha\)) in this context

A Type I Error occurs when we reject a true null hypothesis (a “false positive”).

  • In this context: The startup concludes that the new algorithm reduces fraud when, in reality, it does not.
  • Result: The company invests time and resources into deploying an ineffective system.

Explain a Type II Error (\(\beta\)) in this context

A Type II Error occurs when we fail to reject a false null hypothesis (a “false negative”).

  • In this context: The startup concludes that the new algorithm does not reduce fraud when, in reality, it is effective.
  • Result: The company misses out on a superior security tool and continues to suffer from higher fraud rates.

Identify which error is more costly from a business perspective

In a fintech environment, a Type II Error (\(\beta\)) is often considered more costly.

  • Reasoning: If the startup fails to adopt an effective algorithm (Type II), it remains vulnerable to financial losses from fraud and potential regulatory fines. A Type I error leads to wasted R&D costs, but a Type II error leads to ongoing, direct financial losses and a loss of customer trust.

Discuss how sample size affects Type II Error

There is an inverse relationship between sample size (\(n\)) and the probability of a Type II error (\(\beta\)), holding \(\alpha\) and the true effect size fixed.

  • As the sample size increases, the standard error decreases, making the test more sensitive to small improvements in fraud reduction.
  • Consequently, a larger sample size decreases \(\beta\), which increases the likelihood of detecting a truly effective algorithm.
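A small numeric sketch makes the sample-size effect visible. It assumes a one-sided z-test (matching \(H_1\): the algorithm *reduces* fraud) and a hypothetical standardized effect size \(d = 0.2\), which is not given in the case study. For this setup, power is \(\Phi(d\sqrt{n} - z_{1-\alpha})\) and \(\beta = 1 - \text{power}\).

```r
# Hypothetical setup: one-sided z-test, standardized effect d = 0.2 (assumed), alpha = 0.05
d      <- 0.2
alpha  <- 0.05
z_crit <- qnorm(1 - alpha)        # one-sided critical value (about 1.645)
n      <- c(25, 50, 100, 200, 400)

power <- pnorm(d * sqrt(n) - z_crit)   # P(reject H0 | this effect is real)
beta  <- 1 - power                     # P(Type II error)
data.frame(n = n, beta = round(beta, 3), power = round(power, 3))
```

In this illustrative setup, \(\beta\) falls from roughly 0.74 at \(n = 25\) to about 0.01 at \(n = 400\).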

Explain the relationship between \(\alpha\), \(\beta\), and Statistical Power

These three components are mathematically linked within the framework of hypothesis testing:

  • Inverse Relationship between \(\alpha\) and \(\beta\): For a fixed sample size and effect size, decreasing the significance level (\(\alpha\)) to guard more strictly against false positives increases the chance of missing a real effect (\(\beta\)).
  • Statistical Power: Power is defined as the probability of correctly rejecting a false null hypothesis, calculated as: \[\text{Power} = 1 - \beta\]
  • Optimization: To increase power (and decrease \(\beta\)) without increasing \(\alpha\), a researcher must typically increase the sample size or the effect size.
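The \(\alpha\)–\(\beta\) trade-off can be shown with the same kind of hypothetical setup. This sketch assumes a one-sided z-test, an assumed effect size \(d = 0.2\), and a fixed \(n = 100\), none of which come from the case study. Only \(\alpha\) varies.

```r
# Hypothetical setup: d = 0.2 (assumed), n fixed at 100, one-sided z-test
d     <- 0.2
n     <- 100
alpha <- c(0.10, 0.05, 0.01)

# A stricter alpha raises the critical value, so power drops and beta rises
power <- pnorm(d * sqrt(n) - qnorm(1 - alpha))
data.frame(alpha = alpha, beta = round(1 - power, 3), power = round(power, 3))
```

In this example, moving from \(\alpha = 0.05\) to \(\alpha = 0.01\) cuts power from about 0.64 to about 0.37. Increasing \(n\) is the usual way to recover that power without loosening \(\alpha\).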

5.3 Summary Table

Feature Type I Error (\(\alpha\)) Type II Error (\(\beta\))
Condition of \(H_0\) \(H_0\) is True \(H_0\) is False
Decision Taken Reject \(H_0\) Fail to Reject \(H_0\)
Fintech Context “False Alarm”: Ineffective algorithm adopted “Missed Opportunity”: Effective algorithm rejected
Business Impact Wasted implementation costs Ongoing fraud losses and vulnerability

6 Case Study 6

6.1 P-Value and Statistical Decision Making

A churn prediction model evaluation yields the following results:

  • Test statistic \(= 2.31\)
  • p-value \(= 0.021\)
  • Significance level (\(\alpha\)) \(= 0.05\)

6.2 Tasks

Explain the meaning of the p-value

The p-value is the probability of obtaining a test statistic at least as extreme as the one observed (\(2.31\)), assuming that the null hypothesis (\(H_0\)) is true.

  • In this context, a p-value of \(0.021\) means that if \(H_0\) were true (the model had no real effect), a test statistic at least as extreme as \(2.31\) would occur in only about \(2.1\%\) of repeated samples. It is not the probability that \(H_0\) is true, nor the probability that the observed result is due to chance.

Make a statistical decision

To make a decision, we compare the p-value to the significance level (\(\alpha\)):

  • Comparison: \(0.021 < 0.05\).
  • Decision: Since the p-value is less than \(\alpha\), we reject the null hypothesis (\(H_0\)).
  • Conclusion: The results are statistically significant at the \(5\%\) level.
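The case does not say which test produced the statistic of 2.31. If we *assume* it is a standard normal (z) statistic with a two-sided alternative, the reported p-value can be reproduced in base R. Under that assumption, the decision rule gives the same result as above.

```r
# ASSUMPTION: the statistic is treated as a z statistic with a two-sided alternative
z     <- 2.31
alpha <- 0.05

p_value  <- 2 * pnorm(-abs(z))   # P(|Z| >= 2.31 | H0 true)
decision <- if (p_value < alpha) "Reject H0" else "Fail to reject H0"
cat(sprintf("p-value = %.4f -> %s\n", p_value, decision))
```

Under this assumption, the computed p-value is about 0.0209, consistent with the reported 0.021. A one-sided test or a t-test with few degrees of freedom would give a different value.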

Translate the decision into non-technical language for management

“Our evaluation shows that the churn prediction model is picking up patterns that are unlikely to be explained by random chance alone. If the model had no real effect, results this strong would appear only about 2 times in 100. This gives us solid evidence that the model provides valid insights into customer behavior, so we can proceed with targeted retention strategies.”

Discuss the risk if the sample is not representative

If the sample used for evaluation is not representative of the actual customer base, the evaluation suffers from selection bias.

  • Risk: The statistical significance (p-value) only applies to that specific subset.
  • Impact: Management might implement a strategy that works for the sample group but fails, or even backfires, when applied to the broader population. This can lead to unexpected churn and wasted resources.

6.3 Explain why the p-value does not measure effect size

The p-value measures the strength of evidence against \(H_0\) (statistical significance). It does not measure practical significance, that is, how large or important the effect is.

  • The p-value is highly sensitive to sample size (\(n\)). A very large sample can produce a “significant” p-value even if the actual reduction in churn is tiny (e.g., \(0.01\%\)).
  • Effect size (such as Cohen’s \(d\) or \(R^2\)) is required to quantify the magnitude of the model’s impact on business outcomes.
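The dependence on sample size can be shown with a hypothetical example. Here the same tiny standardized effect (\(d = 0.02\), an assumed value) is tested with a two-sided one-sample z-test at three sample sizes. The effect size never changes, but the p-value moves from clearly non-significant to essentially zero.

```r
# Hypothetical: fixed tiny standardized effect, two-sided one-sample z-test
d <- 0.02
n <- c(100, 10000, 1000000)

z       <- d * sqrt(n)          # test statistic grows with sqrt(n)
p_value <- 2 * pnorm(-abs(z))   # two-sided p-value
data.frame(n = n, effect_size = d, z = z, p_value = signif(p_value, 3))
```

This is why the p-value should be reported together with an effect size and its business meaning.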


6.4 Summary of Findings

Metric Value Threshold Interpretation
Test Statistic \(2.31\) N/A Observed deviation from \(H_0\)
p-value \(0.021\) \(\alpha = 0.05\) Evidence is strong enough to reject \(H_0\)
Decision Reject \(H_0\) \(p < \alpha\) The model is statistically significant
