Statistics Inference ~ Week 14

ITSB

Syafif Azmi Lontoh (52250060)

Student Major in Data Science

1 Case Study 1

One-Sample Z-Test (Statistical Hypotheses)

A digital learning platform claims that the average daily study time of its users is 120 minutes. Based on historical records, the population standard deviation is known to be 15 minutes.

A random sample of 64 users shows an average study time of 116 minutes.

\[ \mu_0 = 120, \quad \sigma = 15, \quad n = 64, \quad \bar{x} = 116 \]

Tasks

Formulate the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
Identify the appropriate statistical test and justify your choice.
Compute the test statistic and p-value using α=0.05
State the statistical decision.
Interpret the result in a business analytics context.

1.1 Statistical Hypotheses

Formulate Null Hypothesis & Alternative Hypothesis

Null Hypothesis: \[H_0: \mu = 120\]
Alternative Hypothesis: \[H_1: \mu \neq 120\]

1.2 Appropriate Statistical Test

The appropriate statistical test is the One-Sample Z-Test.

Justification:

The population standard deviation $(σ)$ is known
The sample size is large $(n≥30)$
The analysis focuses on a single population mean

Under these conditions, by the Central Limit Theorem the standardized sample mean approximately follows the standard normal (Z) distribution when $H_0$ is true.

1.3 Test Statistic and p-Value

Formula: $Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

Test Statistic Calculation:

$Z = \frac{116 - 120}{15 / \sqrt{64}} = \frac{-4}{1.875} = -2.133$

P-Value Calculation

For a two-tailed test, the p-value is computed as:

$p\text{-value} = 2 \times P(Z \geq |z|) = 2 \times P(Z \leq -|z|)$

$p\text{-value} = 2 \times P(Z \leq -2.133)$

Using the standard normal distribution:

\[p\text{-value} \approx 0.033\]

Decision:

Reject $H_0$ if $p\text{-value} < \alpha$

\[0.033 < 0.05\]

The null hypothesis is rejected at $\alpha = 0.05$.
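As a cross-check, the Z-test above can be reproduced in Python. This is a minimal sketch, not part of the original analysis: it assumes only the standard library, using `statistics.NormalDist` for the standard normal CDF, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Case Study 1 inputs (taken from the problem statement)
mu0 = 120      # claimed mean daily study time (minutes)
sigma = 15     # known population standard deviation (minutes)
n = 64         # sample size
x_bar = 116    # observed sample mean (minutes)
alpha = 0.05   # significance level

# Standard error of the mean and the Z statistic
se = sigma / sqrt(n)            # 15 / 8 = 1.875
z = (x_bar - mu0) / se

# Two-tailed p-value: P(|Z| >= |z|) under H0, using the standard normal CDF
p_value = 2 * NormalDist().cdf(-abs(z))

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.4f}, decision: {decision}")
```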

1.4 Interpretation

The null hypothesis $(H_0: \mu = 120)$ is tested against the alternative hypothesis $(H_1: \mu \neq 120)$ using a one-sample Z-test, since the population standard deviation is known and the sample size is large $(n=64)$.
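The test statistic $Z = -2.133$ indicates that the sample mean (116) lies about 2.13 standard errors below the hypothesized population mean (120), which is far enough to be unlikely under $H_0$.
The obtained p-value is approximately 0.033, meaning that if $H_0$ were true, the probability of obtaining a Z-value at least as extreme as -2.133 (in either direction) would be about 3.3%.
In business terms, the sample provides evidence at the 5% level that users' true average daily study time differs from the claimed 120 minutes; since the sample mean is 116 minutes, the platform's claim appears to overstate actual study time.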

2 Case Study 2

A UX Research team investigates whether the average task completion time of a new application differs from 10 minutes.

The following data are collected from 10 users: \[9.2,10.5,9.8,10.1,9.6,10.3,9.9,9.7,10.0,9.5\]

Tasks:

Define H₀ and H₁ (two-tailed).
Determine the appropriate hypothesis test.
Calculate the t-statistic and p-value at α=0.05.
Make a statistical decision.
Explain how sample size affects inferential reliability.

2.1 Statistical Hypotheses

H₀: The average task completion time is equal to 10 minutes ($\mu = 10$).

H₁: The average task completion time is different from 10 minutes ($\mu \neq 10$).

2.2 Appropriate Hypothesis Test

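The appropriate test is a one-sample t-test: we compare the sample mean to a hypothesized value (10 minutes), the population standard deviation is unknown and must be estimated by the sample standard deviation $s$, and the sample is small (n=10). The test statistic therefore follows a t distribution with $n-1$ degrees of freedom (assuming task completion times are approximately normally distributed).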

2.3 t-statistic and p-value Calculation

Sample Mean: $\bar{x} = \frac{\sum_{i = 1}^{n} x_{i}}{n}$

$\bar{x} = \frac{9.2 + 10.5 + 9.8 + 10.1 + 9.6 + 10.3 + 9.9 + 9.7 + 10.0 + 9.5}{10} = \frac{98.6}{10} = 9.86$

Sample Standard Deviation:

$s = \sqrt{\frac{\sum_{i = 1}^{n} (x_{i} - \bar{x})^{2}}{n - 1}} = \sqrt{\frac{1.344}{9}} \approx 0.3864$

t-statistic: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{9.86 - 10}{0.3864 / \sqrt{10}} \approx -1.146$

Degrees of Freedom:

\[df=n-1=9\]

Two-tailed p-value (t distribution with 9 degrees of freedom): $p\text{-value} \approx 0.281$
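The t-test calculation can be sketched in Python as well. In practice one would typically call `scipy.stats.ttest_1samp`; to keep the sketch dependency-free, it instead approximates the two-tailed p-value by numerically integrating the Student's t density with Simpson's rule. The helper names `t_pdf` and `two_sided_t_pvalue` are hypothetical, introduced only for this illustration.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_t_pvalue(t, df, steps=2000):
    """Approximate P(|T| >= |t|) by integrating the density over [0, |t|]
    with composite Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    central_half = total * h / 3      # P(0 <= T <= |t|)
    return 1 - 2 * central_half       # the t density is symmetric around 0


# Task completion times (minutes) from Case Study 2
times = [9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5]
mu0 = 10

n = len(times)
x_bar = mean(times)
s = stdev(times)                      # sample SD, n - 1 in the denominator
t = (x_bar - mu0) / (s / math.sqrt(n))
df = n - 1
p_value = two_sided_t_pvalue(t, df)

print(f"mean = {x_bar:.2f}, s = {s:.4f}, t = {t:.3f}, df = {df}, p = {p_value:.3f}")
```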

2.4 Statistical Decision

Since the p-value (0.281) is greater than α (0.05), we fail to reject H₀. There is insufficient statistical evidence to conclude that the average task completion time differs from 10 minutes.

2.5 Conclusion

The data do not provide sufficient evidence to conclude that the new application’s average task completion time is different from 10 minutes at the 5% significance level, given the small sample (n=10) and a p-value of 0.281 from the one-sample t-test.

Effect of Sample Size on Inferential Reliability:

Sample size strongly affects inferential reliability. Larger samples reduce the standard error of the mean ($s/\sqrt{n}$), which yields more precise estimates of the population mean. With small samples (such as n=10), the sample mean varies more from sample to sample, so the test is less sensitive to small differences between the sample mean and the hypothesized mean. As a result, inferences from small samples are more prone to Type II errors (failing to reject H₀ when H₁ is true).
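To make the sample-size argument concrete, the sketch below assumes (hypothetically) that the sample mean and standard deviation from this case study stay unchanged while the number of users grows, and shows how the standard error and the t-statistic respond.

```python
import math

# Hypothetical scenario: the sample mean and SD from Case Study 2 are held fixed
# (x_bar = 9.86, s ~ 0.386) while the number of users n increases.
x_bar, s, mu0 = 9.86, 0.386, 10

results = {}
for n in (10, 40, 160):
    se = s / math.sqrt(n)          # standard error shrinks in proportion to 1/sqrt(n)
    t = (x_bar - mu0) / se         # same observed difference, but a larger |t|
    results[n] = (se, t)
    print(f"n = {n:>3}: SE = {se:.4f}, t = {t:.2f}")
```

Under this assumption, quadrupling n halves the standard error and doubles |t|, so the same 0.14-minute gap becomes much harder to attribute to chance. Real data would not keep the mean and SD fixed, so this only illustrates the mechanism.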

3 Case Study 3

Two-Sample T-Test (A/B Testing)

A product analytics team conducts an A/B test to compare the average session duration (minutes) between two versions of a landing page.

Version	Sample Size (n)	Mean (minutes)	Standard Deviation (minutes)
A	25	4.8	1.2
B	25	5.4	1.4

Tasks

Formulate the null and alternative hypotheses.
Identify the type of t-test required.
Compute the test statistic and p-value.
Draw a statistical conclusion at α=0.05.
Interpret the result for product decision-making.

3.1 Statistical Hypotheses

Null hypothesis (H₀): There is no difference in the average session duration between version A and version B. $H_0: \mu_A = \mu_B$

Alternative hypothesis (H₁): There is a difference in the average session duration between version A and version B. $H_1: \mu_A \neq \mu_B$

3.2 Appropriate Statistical Test

Two independent groups (users seeing page A vs page B).
Outcome variable is continuous (minutes of session duration).
Sample sizes: $n_A =25$, $n_B=25$.
Only summary statistics (mean and SD) are given, and the two sample SDs are similar (1.2 vs 1.4), so assuming equal population variances is reasonable.

Selected test: independent two-sample t-test with equal variances (pooled t-test).

3.3 Test Statistic and p-Value

Version A : $n_A = 25, \bar{x}_A=4.8,s_A=1.2$

Version B : $n_B = 25, \bar{x}_B=5.4,s_B=1.4$

Pooled Standard Deviation

$s_{p} = \sqrt{\frac{(n_{A} - 1) s_{A}^{2} + (n_{B} - 1) s_{B}^{2}}{n_{A} + n_{B} - 2}}$

Standard Error of the difference

$SE_{\text{diff}} = s_{p} \sqrt{\frac{1}{n_{A}} + \frac{1}{n_{B}}}$

t-statistic

$t = \frac{{\bar{x}}_{A} - {\bar{x}}_{B}}{s_{p} \sqrt{\frac{1}{n_{A}} + \frac{1}{n_{B}}}}$

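Calculation:

$s_p = \sqrt{\frac{24(1.2^2) + 24(1.4^2)}{48}} = \sqrt{1.70} \approx 1.304$

$t = \frac{4.8 - 5.4}{1.304 \sqrt{\frac{1}{25} + \frac{1}{25}}} = \frac{-0.6}{0.369} \approx -1.63$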

Degrees of freedom: $df = n_A + n_B - 2 =48$

Two-sided p-value: $p≈0.11$
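The pooled t-test can be reproduced from the summary statistics alone. SciPy offers `scipy.stats.ttest_ind_from_stats(..., equal_var=True)` for exactly this case; the sketch below avoids external dependencies by using a hypothetical Simpson's-rule helper to integrate the t density, so treat its p-value as a numerical approximation.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_t_pvalue(t, df, steps=2000):
    """Approximate P(|T| >= |t|) with composite Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


# Summary statistics from the A/B test table (session duration in minutes)
n_a, mean_a, sd_a = 25, 4.8, 1.2
n_b, mean_b, sd_b = 25, 5.4, 1.4

df = n_a + n_b - 2
pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
s_p = math.sqrt(pooled_var)                       # pooled standard deviation
se_diff = s_p * math.sqrt(1 / n_a + 1 / n_b)      # SE of the difference in means
t = (mean_a - mean_b) / se_diff
p_value = two_sided_t_pvalue(t, df)

print(f"s_p = {s_p:.3f}, SE = {se_diff:.3f}, t = {t:.3f}, df = {df}, p = {p_value:.3f}")
```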

3.4 Statistical Decision

Significance level: $α=0.05$
p-value: $p≈0.11$

Decision rule:

$p ≤ 0.05$ reject $H_0$

$p > 0.05$ fail to reject $H_0$

Since $p=0.11>0.05$, fail to reject the null hypothesis.

3.5 Interpretation

From a product decision-making perspective:

Version B shows a numerically higher mean session duration (5.4 vs 4.8 minutes), but the difference is not statistically significant at $α = 0.05$ $(p ≈ 0.11)$.

This means the observed difference could be due to random variation in the sample, and you cannot confidently claim that Version B truly improves session duration in the population.

4 Case Study 4

Chi-Square Test of Independence

An e-commerce company examines whether device type is associated with payment method preference.

Distribution of payment methods by device
Device	E-Wallet	Credit Card	Cash on Delivery
Mobile	120	80	50
Desktop	60	90	40

Tasks:

State the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
Identify the appropriate statistical test.
Compute the Chi-Square statistic (χ²).
Determine the p-value at α=0.05.
Interpret the results in terms of digital payment strategy.

4.1 Hypotheses

H₀: Device type (Mobile vs Desktop) and payment method (E-Wallet, Credit Card, Cash on Delivery) are independent; the payment distribution is the same for both devices.

H₁: Device type and payment method are associated; the payment distribution differs between mobile and desktop.

4.2 Appropriate statistical test

Data: two categorical variables (Device with 2 levels, Payment Method with 3 levels).

Appropriate test: Chi-Square test of independence on a 2×3 contingency table.

Expected Frequencies

The expected frequency for each cell is computed as:

$E_{ij} = \frac{(\text{Row Total}_i)(\text{Column Total}_j)}{\text{Grand Total}}$

Chi-Square statistic (χ²)

Observed table:

Mobile: E-Wallet = 120, Credit Card = 80, Cash on Delivery = 50.

Desktop: E-Wallet = 60, Credit Card = 90, Cash on Delivery = 40.

From the Chi-Square test: $\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \approx 13.77$

Degrees of freedom $df=(2−1)×(3−1)=2$

Expected Frequency Table

Device	E-Wallet	Credit Card	Cash on Delivery
Mobile	102.27	96.59	51.14
Desktop	77.73	73.41	38.86
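The expected counts, χ² statistic and p-value can be checked with a short Python sketch. `scipy.stats.chi2_contingency` would be the usual tool; this version uses only the standard library and relies on the fact that, for exactly 2 degrees of freedom, the chi-square upper-tail probability is $e^{-\chi^2/2}$. That shortcut does not apply to tables with other degrees of freedom.

```python
import math

# Observed counts: rows = device, columns = payment method
columns = ["E-Wallet", "Credit Card", "Cash on Delivery"]
observed = {
    "Mobile": [120, 80, 50],
    "Desktop": [60, 90, 40],
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(col) for col in zip(*rows)]
grand_total = sum(row_totals)

# Expected counts under independence: E_ij = (row total * column total) / grand total
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

# Pearson chi-square statistic summed over all cells
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(rows, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(rows) - 1) * (len(columns) - 1)

# Shortcut valid ONLY for df = 2: the chi-square upper tail is exp(-x / 2).
if df != 2:
    raise ValueError("closed-form p-value below assumes df == 2")
p_value = math.exp(-chi2 / 2)

for device, exp_row in zip(observed, expected):
    print(device, [round(e, 2) for e in exp_row])
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}")
```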

4.3 p-value & Statistical Decision

The p-value is the probability of obtaining a Chi-Square statistic at least as large as the observed $χ^2≈13.77$, if the null hypothesis (no association between device and payment method) is actually true.
$p ≈ 0.0010$ means that, under H₀, there is only about a 0.1% chance of seeing differences between mobile and desktop as large as (or larger than) those in the table.

Statistical decision at α = 0.05

Decision rule:

If p-value ≤ α → Reject H₀.
If p-value > α → Fail to reject H₀.

Here, p ≈ 0.0010 < 0.05, so the decision is to reject the null hypothesis of independence between device type and payment method.

4.4 Interpretation

Because device type and payment method are associated, payment preferences are not uniform across devices. Implications:

Emphasize E-Wallet options and promotions on mobile, where E-Wallet usage is higher than expected under independence (120 observed vs about 102 expected).
Strengthen the Credit Card experience on desktop, where Credit Card usage is higher than expected (90 observed vs about 73 expected), e.g., smooth card entry and secure card saving, while still offering prominent E-Wallet options to capture multi-device users and maximize conversions.

5 Case Study 5

Type I and Type II Errors (Conceptual)

A fintech startup tests whether a new fraud detection algorithm reduces fraudulent transactions.

Statistical hypotheses:

H₀: The new algorithm does not reduce fraud

H₁: The new algorithm reduces fraud

Tasks:

Explain a Type I Error (α) in this context.
Explain a Type II Error (β) in this context.
Identify which error is more costly from a business perspective.
Discuss how sample size affects Type II Error.
Explain the relationship between α, β, and statistical power.

5.1 Type I Error (α)

Occurs when you conclude the new algorithm reduces fraud, but in reality, it does not.
Also known as a “false positive.”
Business consequence: The company adopts an ineffective algorithm, wasting resources and potentially harming customer experience.

5.2 Type II Error (β)

Occurs when you fail to conclude the new algorithm is effective, but in reality, it does reduce fraud.
Also known as a “false negative.”
Business consequence: The company misses the opportunity to reduce fraud losses and continues using a less effective system.

5.3 Business Cost Comparison

Type I Error: Direct costs such as wasted resources and potential service disruptions.
Type II Error: Indirect costs such as financial losses and reputational damage due to undetected fraud.
Type II Error is typically more costly for businesses due to long-term losses.

5.4 Sample size has a direct effect on Type II Error (β)

A small sample size increases the risk of Type II Error, meaning you are more likely to fail to reject the null hypothesis when it is actually false (false negative).

Larger sample sizes reduce the probability of Type II Error because they provide more data, making it easier to detect true effects and reducing variability in results.

Increasing the sample size also increases statistical power (1 - β), which is the ability of a test to correctly identify a true effect.
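The effect of sample size on β can be illustrated with a simple power calculation. This sketch assumes, purely for illustration, that the fraud outcome is analysed with a one-sided z-test (H₁: the algorithm reduces fraud) and that the true reduction is 0.3 standard deviations; neither the test nor the effect size comes from the case study.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def type_ii_error(n, effect_size, alpha=0.05):
    """beta of a one-sided z-test (H1: effect > 0) when the true standardized
    effect equals effect_size and n observations are collected."""
    z_crit = std_normal.inv_cdf(1 - alpha)              # reject H0 when Z > z_crit
    # Under H1 the test statistic is centred at effect_size * sqrt(n)
    return std_normal.cdf(z_crit - effect_size * sqrt(n))


effect_size = 0.3          # hypothetical true fraud reduction, in SD units
ns = (20, 50, 100, 200)
betas = {n: type_ii_error(n, effect_size) for n in ns}

for n in ns:
    print(f"n = {n:>3}: beta = {betas[n]:.3f}, power = {1 - betas[n]:.3f}")
```

With these assumed values, β falls steadily as n grows, and power passes the conventional 0.80 target somewhere between 50 and 100 observations.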

5.5 Relationship Between α, β, and Statistical Power

The relationship between α (alpha), β (beta), and statistical power is fundamental in hypothesis testing:
α (alpha) is the probability of making a Type I Error, which means rejecting the null hypothesis when it is actually true (false positive). It is also known as the significance level, commonly set at 0.05.
β (beta) is the probability of making a Type II Error, which means failing to reject the null hypothesis when it is actually false (false negative).
Statistical Power is the probability of correctly rejecting the null hypothesis when it is false, calculated as Power = 1 − β.

Key Points

Increasing α (for a fixed sample size and effect size) increases the risk of Type I Error but reduces β and increases power.
Increasing the sample size reduces β and increases statistical power, allowing for better detection of true effects.
In practice, researchers aim for a balance: a typical α is 0.05 and a target power of 0.80 (meaning β = 0.20).
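The trade-off between α and β, and the usual α = 0.05 / power = 0.80 target, can be sketched with the same kind of one-sided z-test power formula. The effect size (0.3 SD) and sample size (n = 50) below are hypothetical values chosen for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist

std_normal = NormalDist()
effect_size = 0.3   # hypothetical standardized effect (SD units)
n = 50              # hypothetical fixed sample size

# beta for a one-sided z-test at several significance levels, n held fixed
betas_by_alpha = {}
for alpha in (0.01, 0.05, 0.10):
    z_crit = std_normal.inv_cdf(1 - alpha)
    beta = std_normal.cdf(z_crit - effect_size * sqrt(n))
    betas_by_alpha[alpha] = beta
    print(f"alpha = {alpha:.2f}: beta = {beta:.3f}, power = {1 - beta:.3f}")

# Sample size for alpha = 0.05 and power = 0.80 (beta = 0.20), one-sided z-test:
# n = ((z_{1 - alpha} + z_{power}) / effect_size) ** 2, rounded up
z_alpha = std_normal.inv_cdf(0.95)
z_power = std_normal.inv_cdf(0.80)
n_required = ceil(((z_alpha + z_power) / effect_size) ** 2)
print(f"n needed for 80% power at alpha = 0.05: {n_required}")
```

Raising α from 0.01 to 0.10 lowers β at a fixed n, and the sample-size formula shows how many observations (69 under these assumptions) are needed to reach 80% power without relaxing α.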

6 Case Study 6

P-Value and Statistical Decision Making

A churn prediction model evaluation yields the following results:

Test statistic = 2.31
p-value = 0.021
Significance level: α=0.05

Tasks:

Explain the meaning of the p-value.
Make a statistical decision.
Translate the decision into non-technical language for management.
Discuss the risk if the sample is not representative.
Explain why the p-value does not measure effect size.

6.1 Meaning of the P-Value

The p-value is the probability of obtaining results as extreme as, or more extreme than, the observed results, assuming the null hypothesis is true.
A smaller p-value indicates stronger evidence against the null hypothesis.
A p-value of 0.021 means that, if there were no real effect (H₀ true), there would be about a 2.1% chance of observing a result at least as extreme as this one.
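The reported p-value can be linked to the test statistic with a one-line calculation. The case study does not state the reference distribution, so this sketch assumes the statistic 2.31 is a z statistic with a two-sided alternative; under that assumption it reproduces a p-value of about 0.021.

```python
from statistics import NormalDist

# Assumption: the reported statistic (2.31) is a z statistic and H1 is two-sided.
z = 2.31
alpha = 0.05

p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"p = {p_two_sided:.3f}, reject H0: {p_two_sided < alpha}")
```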

6.2 Statistical Decision

If the p-value is less than α (0.05), the result is considered statistically significant.
Here, since 0.021 < 0.05, the null hypothesis is rejected.
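This means the observed result would be unlikely to arise from random chance alone if H₀ were true; it does not by itself show how large or practically important the model's effect is.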

6.3 Explanation for Management

The results suggest that the churn prediction model's performance is unlikely to be due to luck alone: if the model had no real ability to identify customers who are likely to leave, results this strong would occur only about 2 times in 100. This supports using the model as one input for planning customer retention strategies, although this test alone does not tell us how large the benefit is.

6.4 Risk if Sample is Not Representative

If the sample is not representative, results may be biased and not reflect the true population.
This can lead to Type I errors (rejecting a true null hypothesis) or Type II errors (failing to reject a false null hypothesis).
Decisions based on such results may be misleading and not generalizable to the entire population.

6.5 Why P-Value Does Not Measure Effect Size

The p-value only measures the strength of evidence against the null hypothesis, not the size of the effect.
A small effect can have a small p-value with a large sample size.
A large effect can have a large p-value with a small sample size.
To assess the size of the effect, measures like Cohen’s d or