Statistical Inferences ~ Week 14


Nazwa Nur Ramadhani

Undergraduate Student in Data Science at Institut Teknologi Sains Bandung

Case Study 1

One-Sample Z-Test (Statistical Hypotheses)

A digital learning platform claims that the average daily study time of its users is 120 minutes. Based on historical records, the population standard deviation is known to be 15 minutes.

A random sample of 64 users shows an average study time of 116 minutes.

\[ \begin{eqnarray*} \mu_0 &=& 120 \\ \sigma &=& 15 \\ n &=& 64 \\ \bar{x} &=& 116 \end{eqnarray*} \]

Tasks:

Question 1

1. Formulate the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).

Null Hypothesis (H₀): \[H_0:\mu=120\] This hypothesis represents the platform’s claim that the average daily study time of users is exactly 120 minutes.

Alternative Hypothesis (H₁): \[H_1:\mu\neq120\] This hypothesis states that the true average daily study time of users differs from 120 minutes, so the test is two-tailed.

Question 2

2. Identify the appropriate statistical test and justify your choice.

The appropriate statistical test for this problem is a One-Sample Z-Test because:

  • The population standard deviation is known (\(\sigma = 15\)).

  • The sample size is relatively large (\(n = 64\)).

  • The goal is to compare a sample mean with a hypothesized (claimed) population mean.

Question 3

3. Compute the test statistic and p-value using \(\alpha = 0.05\).

\[Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}\]

Where:

  • \(\bar{x}\) = sample mean = 116
  • \(\mu_0\) = population mean (claimed) = 120
  • \(\sigma\) = population standard deviation = 15
  • \(n\) = sample size = 64

Substituting the given values:

Standard Error (SE):

\[SE = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{64}} = \frac{15}{8} = 1.875\]

Z-statistic:

\[Z = \frac{116 - 120}{1.875} = \frac{-4}{1.875} = -2.13\]

P-Value Calculation:

Since this is a two-tailed test, we need to find the probability in both tails.

\[P(Z \leq -2.13) \approx 0.0166\]

\[\text{P-value} = 2 \times 0.0166 = 0.0332\]
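The hand calculation above can be checked with a short Python sketch that uses only the standard library (`statistics.NormalDist` supplies the standard normal CDF). The function name `one_sample_z_test` is illustrative, not from the source. The two-sided p-value corresponds to the calculation above; the left-tailed value is printed alongside for comparison, since that is what a test of \(H_1: \mu < 120\) would use.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x_bar, mu0, sigma, n):
    """One-sample z-test with known sigma; returns SE, z, and two p-values."""
    se = sigma / sqrt(n)                  # standard error of the sample mean
    z = (x_bar - mu0) / se                # distance from mu0 in SE units
    phi = NormalDist().cdf                # standard normal CDF
    p_two_sided = 2 * (1 - phi(abs(z)))   # H1: mu != mu0
    p_left = phi(z)                       # H1: mu < mu0
    return se, z, p_two_sided, p_left


se, z, p_two, p_left = one_sample_z_test(x_bar=116, mu0=120, sigma=15, n=64)
print(f"SE = {se:.3f}, z = {z:.3f}, two-sided p = {p_two:.4f}, left-tailed p = {p_left:.4f}")
```

Because the code does not round z to −2.13 before computing the tail area, its two-sided p-value (about 0.033) differs from the hand value 0.0332 only in the fourth decimal place; the decision at \(\alpha = 0.05\) is the same either way.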

Question 4

4. State the Statistical Decision

Decision Rule:

  • If p-value \(\leq \alpha\), reject \(H_0\)
  • If p-value \(> \alpha\), fail to reject \(H_0\)

Decision:

Given:

  • P-value = 0.0332

  • Significance level \(\alpha = 0.05\)

Since p-value (0.0332) \(< \alpha\) (0.05), REJECT the null hypothesis (\(H_0\)).

Question 5

5. Interpret the Result in a Business Analytics Context

At the 5% significance level, there is sufficient statistical evidence to reject the platform's claim that the average daily study time is 120 minutes. The sample data suggest that the true average study time differs from 120 minutes, and because the sample mean (116 minutes) falls below the claimed value, the evidence points to users studying less than the platform claims.

Business Implications:

1. Platform Performance: The platform does not appear to meet its claimed engagement level. The sample suggests that users study about 4 minutes less per day than advertised.

2. Marketing Concerns: If the 120-minute claim is used in marketing materials, it may mislead potential customers and could lead to reputation issues.

3. User Engagement: The lower study time could indicate:

  • User engagement issues
  • Content quality problems
  • Platform usability concerns
  • Competition from other platforms

The platform should address the gap between claimed and actual study time by improving user engagement and adjusting its communication strategy to maintain credibility with users and stakeholders.

Case Study 2

One-Sample T-Test (σ Unknown, Small Sample)

A UX Research Team investigates whether the average task completion time of a new application differs from 10 minutes.

The following data are collected from 10 users:

\[ 9.2,\; 10.5,\; 9.8,\; 10.1,\; 9.6,\; 10.3,\; 9.9,\; 9.7,\; 10.0,\; 9.5 \]

Tasks:

Question 1

1. Define H₀ and H₁ (two-tailed).

Null Hypothesis (H₀):

\[ \begin{array}{l} H_0: \mu = 10 \text{ minutes} \end{array} \] The null hypothesis states that the average task completion time is exactly 10 minutes.

Alternative Hypothesis (H₁):

\[ \begin{array}{l} H_1: \mu \neq 10 \text{ minutes} \end{array} \] The alternative hypothesis states that the true mean is different from 10 minutes.

Question 2

2. Determine the Appropriate Hypothesis Test

The One-Sample T-Test is the most appropriate method for this analysis because:

  • The population standard deviation is unknown.

  • The sample size is small (\(n = 10\)).

  • The analysis compares a sample mean against a hypothesized population mean.

Question 3

3. Calculate the t-statistic and p-value at \(\alpha = 0.05\).

Sample data:

\[9.2,\; 10.5,\; 9.8,\; 10.1,\; 9.6,\; 10.3,\; 9.9,\; 9.7,\; 10.0,\; 9.5\]

Calculation

Sample Mean (\(\bar{x}\)):

\[ \begin{array}{rcl} \bar{x} &=& \frac{\sum x_i}{n} = \frac{9.2 + 10.5 + 9.8 + 10.1 + 9.6 + 10.3 + 9.9 + 9.7 + 10.0 + 9.5}{10} \\ \bar{x} &=& \frac{98.6}{10} = 9.86 \text{ minutes} \end{array} \]

Standard Deviation (\(s\)):

\[ \begin{array}{rcl} \sum (x_i - \bar{x})^2 &=& (9.2-9.86)^2 + (10.5-9.86)^2 + \ldots + (9.5-9.86)^2 \\ &=& (-0.66)^2 + (0.64)^2 + (-0.06)^2 + (0.24)^2 + (-0.26)^2 \\ & & + (0.44)^2 + (0.04)^2 + (-0.16)^2 + (0.14)^2 + (-0.36)^2 \\ &=& 0.4356 + 0.4096 + 0.0036 + 0.0576 + 0.0676 \\ & & + 0.1936 + 0.0016 + 0.0256 + 0.0196 + 0.1296 \\ \sum (x_i - \bar{x})^2 &=& 1.344 \end{array} \]

Sample Variance:

\[ \begin{array}{rcl} s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{1.344}{9} = 0.1493 \end{array} \]

Sample Standard Deviation:

\[ \begin{array}{rcl} s = \sqrt{0.1493} = 0.3864 \text{ minutes} \end{array} \]

Standard Error (SE):

\[ \begin{array}{rcl} SE = \frac{s}{\sqrt{n}} = \frac{0.3864}{\sqrt{10}} = \frac{0.3864}{3.162} = 0.1222 \end{array} \]

T-statistic:

\[ \begin{array}{rcl} t = \frac{9.86 - 10}{0.1222} = \frac{-0.14}{0.1222} = -1.146 \end{array} \]

Degrees of Freedom: \[df = n - 1 = 10 - 1 = 9\]

  • One-tailed probability (from a t-table or software): \(P(T_9 \leq -1.146) \approx 0.1405\)

  • Two-tailed p-value: \(2 \times 0.1405 = 0.281\)
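The steps above can be reproduced in a minimal, standard-library-only Python sketch. The helper names `t_pdf` and `t_two_sided_p` are illustrative, and the t-distribution tail area is obtained by numerically integrating the density with Simpson's rule rather than from a statistics library; with SciPy installed, `scipy.stats.ttest_1samp(times, 10)` should give the same t and two-sided p.

```python
from math import exp, lgamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density over [0, |t|] with Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # P(-|t| <= T <= |t|), by symmetry
    return 1 - central


times = [9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5]  # minutes
mu0 = 10
n = len(times)
x_bar = mean(times)
s = stdev(times)  # sample standard deviation (n - 1 denominator)
se = s / sqrt(n)
t = (x_bar - mu0) / se
df = n - 1
p = t_two_sided_p(t, df)
print(f"mean = {x_bar:.2f}, s = {s:.4f}, SE = {se:.4f}, t = {t:.3f}, df = {df}, p = {p:.3f}")
```

The printed values should agree with the hand calculation (t ≈ −1.146, two-tailed p ≈ 0.28) up to rounding.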

Question 4

4. Make a Statistical Decision

Decision Rule:

  • If p-value \(\leq \alpha\), reject \(H_0\)
  • If p-value \(> \alpha\), fail to reject \(H_0\)

Decision:

Given:

  • P-value = 0.281

  • Significance level \(\alpha = 0.05\)

Since p-value (0.281) \(> \alpha\) (0.05), FAIL TO REJECT the null hypothesis (\(H_0\)).

Question 5

5. Explain How Sample Size Affects Inferential Reliability

Sample size plays a crucial role in the reliability of statistical inference.

When the sample size is small:

  • The standard error is larger, making estimates less precise.

  • The statistical power is lower, increasing the risk of a Type II error (failing to detect a real difference).

  • Results are more sensitive to outliers and random variation.

As sample size increases:

  • The standard error decreases.

  • Confidence intervals become narrower.

  • Hypothesis tests become more capable of detecting true effects.

In this case, the small sample size (\(n = 10\)) limits the strength of the conclusion. Although the sample mean is slightly below 10 minutes, the data do not provide strong enough evidence to conclude a real difference.
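A hypothetical calculation illustrates the point. Assume, for illustration only (this is not case data), that a sample of \(n = 40\) users produced the same mean of 9.86 minutes and the same \(s = 0.3864\):

\[ \begin{array}{rcl} SE &=& \frac{0.3864}{\sqrt{40}} = \frac{0.3864}{6.325} = 0.0611 \\ t &=& \frac{9.86 - 10}{0.0611} = \frac{-0.14}{0.0611} \approx -2.29 \end{array} \]

With \(df = 39\), the two-tailed critical value at \(\alpha = 0.05\) is about 2.02, so \(|t| = 2.29\) would lead to rejecting \(H_0\). The observed difference is identical; only the precision of the estimate has changed.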

Case Study 3

Two-Sample T-Test (A/B Testing)

A product analytics team conducts an A/B test to compare the average session duration (minutes) between two versions of a landing page.

Version | Sample Size (n) | Mean (minutes) | Standard Deviation (minutes)
A | 25 | 4.8 | 1.2
B | 25 | 5.4 | 1.4

Tasks:

Question 1

1. Formulate the null and alternative hypotheses.

Null Hypothesis (H₀): \[H_0:\mu_A=\mu_B\]

There is no difference in the average session duration between Version A and Version B.

Alternative Hypothesis (H₁): \[H_1:\mu_A\neq\mu_B\]

There is a difference in the average session duration between the two landing page versions.

Question 2

2. Identify the type of t-test required.

The appropriate statistical test for this scenario is an Independent Two-Sample t-Test because:

  • The two samples (Version A and Version B) are independent.

  • The population standard deviations are unknown.

  • The sample sizes are relatively small (n = 25 per group).

  • The sample standard deviations differ (1.2 vs. 1.4), so Welch's two-sample t-test is preferred because it does not assume equal variances.

Question 3

3. Compute the test statistic and p-value.

The test statistic for a two-sample t-test (Welch’s version) is given by: \[t = \frac{\bar{x}_A - \bar{x}_B} {\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}\]

Where:

  • \(\bar{x}_A=4.8,\ s_A=1.2,\ n_A=25\)

  • \(\bar{x}_B=5.4,\ s_B=1.4,\ n_B=25\)

Calculation

Standard Error (SE):

\[\begin{array}{rcl} SE &=& \sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}} \\[6pt] &=& \sqrt{\dfrac{1.2^2}{25} + \dfrac{1.4^2}{25}} \\[6pt] &=& \sqrt{\dfrac{1.44}{25} + \dfrac{1.96}{25}} \\[6pt] &=& \sqrt{0.0576 + 0.0784} \\[6pt] &=& \sqrt{0.136} \\[6pt] &\approx& 0.369 \end{array}\]

t-statistic: \[\begin{array}{rcl} t &=& \dfrac{4.8 - 5.4}{0.369} \\[6pt] &=& \dfrac{-0.6}{0.369} \\[6pt] &\approx& -1.63 \end{array}\]

Degrees of freedom (df) using Welch’s approximation:

\[ \begin{array}{rcl} df = \frac{\left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2}{\frac{\left(\frac{s_A^2}{n_A}\right)^2}{n_A - 1} + \frac{\left(\frac{s_B^2}{n_B}\right)^2}{n_B - 1}} \end{array} \]

Calculate numerator:

\[ \begin{array}{rcl} \left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2 = (0.0576 + 0.0784)^2 = (0.136)^2 = 0.0185 \end{array} \]

Calculate denominator:

\[ \begin{array}{rcl} \frac{(0.0576)^2}{24} + \frac{(0.0784)^2}{24} &=& \frac{0.00332}{24} + \frac{0.00615}{24} \\ &=& 0.000138 + 0.000256 = 0.000394 \end{array} \]

Calculate df:

\[ \begin{array}{rcl} df = \frac{0.0185}{0.000394} = 46.95 \approx 47 \end{array} \]

Calculate P-Value

For a two-tailed test with \(|t| = 1.627\) and \(df = 47\) (from a t-table or software):

  • One-tailed probability \(\approx 0.0553\)

  • Two-tailed p-value = \(2 \times 0.0553 = 0.1106\approx 0.111\)
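The Welch calculation can be sketched in Python from the summary statistics alone. This is a standard-library-only sketch: `welch_t_test`, `t_pdf`, and `t_two_sided_p` are illustrative helper names, and the p-value comes from numerically integrating the t density (Simpson's rule). With SciPy available, `scipy.stats.ttest_ind_from_stats(4.8, 1.2, 25, 5.4, 1.4, 25, equal_var=False)` should give the same result.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t; df may be fractional, as in Welch's test."""
    c = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density over [0, |t|] with Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


def welch_t_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's two-sample t-test from summary statistics."""
    v_a, v_b = sd_a ** 2 / n_a, sd_b ** 2 / n_b  # squared standard error of each mean
    se = sqrt(v_a + v_b)
    t = (mean_a - mean_b) / se
    df = (v_a + v_b) ** 2 / (v_a ** 2 / (n_a - 1) + v_b ** 2 / (n_b - 1))
    return se, t, df, t_two_sided_p(t, df)


se, t, df, p = welch_t_test(4.8, 1.2, 25, 5.4, 1.4, 25)
print(f"SE = {se:.3f}, t = {t:.3f}, df = {df:.2f}, two-sided p = {p:.3f}")
```

The code keeps the fractional Welch degrees of freedom (about 46.9) instead of rounding to 47; this changes the p-value only negligibly.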

Question 4

4. Draw a statistical conclusion at \(\alpha = 0.05\).

Decision Rule:

  • If p-value \(\leq \alpha\), reject \(H_0\)

  • If p-value \(>\alpha\), fail to reject \(H_0\)

Decision:

Given:

  • P-value = 0.111

  • Significance level \(\alpha = 0.05\)

Since p-value (0.111) \(>\alpha\) (0.05), FAIL TO REJECT the null hypothesis (\(H_0\)).

Question 5

5. Interpret the result for product decision-making.

From a product analytics perspective, the results indicate that although Version B shows a higher average session duration (5.4 minutes) compared to Version A (4.8 minutes), this observed difference is not statistically significant.

This implies that:

  • The increase in session duration for Version B may be due to random sampling variation rather than a true improvement.

  • There is insufficient evidence to confidently conclude that Version B outperforms Version A.

  • The product team should be cautious about rolling out Version B solely based on this result.

Overall, while Version B appears promising descriptively, the statistical evidence does not yet justify a definitive product change.
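A 95% confidence interval for the difference \(\mu_B - \mu_A\) shows the range of effects the data are compatible with. This is a sketch using the Welch standard error from Question 3 and an approximate critical value \(t_{0.025,\,47} \approx 2.01\):

\[ \begin{array}{rcl} (\bar{x}_B - \bar{x}_A) \pm t_{0.025,\,47} \cdot SE &=& 0.6 \pm 2.01 \times 0.369 \\ &=& 0.6 \pm 0.74 \\ &\approx& (-0.14,\; 1.34) \text{ minutes} \end{array} \]

The interval contains 0, which matches the decision not to reject \(H_0\), but it also extends to about 1.3 extra minutes per session. A larger follow-up test would narrow this range before committing to a rollout.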

Case Study 4

Chi-Square Test of Independence

An e-commerce company examines whether device type is associated with payment method preference.

Device / Payment | E-Wallet | Credit Card | Cash on Delivery
Mobile | 120 | 80 | 50
Desktop | 60 | 90 | 40

Tasks:

Question 1

1. State the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).

Null Hypothesis (H₀):

\[ \begin{array}{l} H_0: \text{Device type and payment method are independent} \end{array} \]

In other words, the choice of payment method does not depend on the device type used.

Alternative Hypothesis (H₁):

\[ \begin{array}{l} H_1: \text{Device type and payment method are not independent} \end{array} \]

This means there is a relationship between device type and payment method choice.

Question 2

2. Identify the Appropriate Statistical Test

The Chi-Square Test of Independence is the most appropriate method for this analysis because:

  • Both variables (device type and payment method) are categorical.

  • The data are presented in a contingency table.

  • The objective is to test whether two categorical variables are associated.

Question 3

3. Compute the Chi-Square statistic (χ²).

Calculate Expected Frequencies

The expected frequency for each cell is calculated using:

\[ \begin{array}{rcl} E_{ij} = \frac{(\text{Row Total}_i) \times (\text{Column Total}_j)}{\text{Grand Total}} \end{array} \]

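The marginal totals from the observed table are: Mobile = 250 and Desktop = 190 (rows); E-Wallet = 180, Credit Card = 170, and Cash on Delivery = 90 (columns); grand total = 440.

Expected frequency for each cell: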

  • For Mobile row:

\[ \begin{array}{rcl} E_{\text{Mobile, E-Wallet}} &=& \frac{250 \times 180}{440} = \frac{45000}{440} = 102.27 \\ E_{\text{Mobile, Credit Card}} &=& \frac{250 \times 170}{440} = \frac{42500}{440} = 96.59 \\ E_{\text{Mobile, COD}} &=& \frac{250 \times 90}{440} = \frac{22500}{440} = 51.14 \end{array} \]

  • For Desktop row:

\[ \begin{array}{rcl} E_{\text{Desktop, E-Wallet}} &=& \frac{190 \times 180}{440} = \frac{34200}{440} = 77.73 \\ E_{\text{Desktop, Credit Card}} &=& \frac{190 \times 170}{440} = \frac{32300}{440} = 73.41 \\ E_{\text{Desktop, COD}} &=& \frac{190 \times 90}{440} = \frac{17100}{440} = 38.86 \end{array} \]

Expected Frequencies Table:

Device | E-Wallet | Credit Card | Cash on Delivery
Mobile | 102.27 | 96.59 | 51.14
Desktop | 77.73 | 73.41 | 38.86

Verification: All expected frequencies are ≥ 5, so the Chi-Square test is appropriate.

Calculate Chi-Square Statistic

The Chi-Square test statistic is calculated using:

\[ \begin{array}{rcl} \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \end{array} \]

Where:

  • \(O_{ij}\) = Observed frequency in cell \((i,j)\)

  • \(E_{ij}\) = Expected frequency in cell \((i,j)\)

Calculate each cell contribution:

  • Mobile row:

\[ \begin{array}{rcl} \text{Mobile, E-Wallet:} && \frac{(120 - 102.27)^2}{102.27} = \frac{(17.73)^2}{102.27} = \frac{314.35}{102.27} = 3.074 \\ \text{Mobile, Credit Card:} && \frac{(80 - 96.59)^2}{96.59} = \frac{(-16.59)^2}{96.59} = \frac{275.23}{96.59} = 2.850 \\ \text{Mobile, COD:} && \frac{(50 - 51.14)^2}{51.14} = \frac{(-1.14)^2}{51.14} = \frac{1.30}{51.14} = 0.025 \end{array} \]

  • Desktop row:

\[ \begin{array}{rcl} \text{Desktop, E-Wallet:} && \frac{(60 - 77.73)^2}{77.73} = \frac{(-17.73)^2}{77.73} = \frac{314.35}{77.73} = 4.044 \\ \text{Desktop, Credit Card:} && \frac{(90 - 73.41)^2}{73.41} = \frac{(16.59)^2}{73.41} = \frac{275.23}{73.41} = 3.750 \\ \text{Desktop, COD:} && \frac{(40 - 38.86)^2}{38.86} = \frac{(1.14)^2}{38.86} = \frac{1.30}{38.86} = 0.033 \end{array} \]

  • Sum all contributions:

\[ \begin{array}{rcl} \chi^2 &=& 3.074 + 2.850 + 0.025 + 4.044 + 3.750 + 0.033 \\ \chi^2 &=& 13.776 \end{array} \]

  • Chi-Square Statistic: \(\chi^2 = 13.776\)

Calculate Degrees of Freedom

\[ \begin{array}{rcl} df = (r - 1) \times (c - 1) \end{array} \]

Where:

  • \(r\) = number of rows = 2

  • \(c\) = number of columns = 3

\[ \begin{array}{rcl} df = (2 - 1) \times (3 - 1) = 1 \times 2 = 2 \end{array} \]

Question 4

4. Determine the p-value at \(\alpha = 0.05\).

Using a chi-square distribution table or calculator with \(\chi^2 = 13.776\) and \(df = 2\):

\[ \begin{array}{rcl} P(\chi^2 \geq 13.776) \approx 0.001 \end{array} \]

Critical Value Approach (Alternative)

At \(\alpha = 0.05\) and \(df = 2\), the critical value from the chi-square table is:

\[ \begin{array}{rcl} \chi^2_{\text{critical}} = 5.991 \end{array} \]

Since \(\chi^2 = 13.776 > 5.991\), we reject \(H_0\).
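The full chi-square workflow (expected counts, statistic, degrees of freedom, p-value, critical value) can be sketched in standard-library Python. The helper names are illustrative. Instead of a general chi-square CDF, the sketch uses the exact closed form for even degrees of freedom, \(P(X \geq x) = e^{-x/2}\sum_{i=0}^{k-1} (x/2)^i / i!\) with \(df = 2k\), which for \(df = 2\) reduces to \(e^{-x/2}\). The hand calculation rounds intermediate values, so the unrounded statistic comes out at about 13.774 rather than 13.776; the conclusion is unchanged. With SciPy, `scipy.stats.chi2_contingency(observed)` should return the same statistic.

```python
from math import exp, factorial, log

# Observed counts (rows: Mobile, Desktop)
observed = [
    [120, 80, 50],  # Mobile: E-Wallet, Credit Card, Cash on Delivery
    [60, 90, 40],   # Desktop: E-Wallet, Credit Card, Cash on Delivery
]


def chi_square_independence(table):
    """Return expected counts, the chi-square statistic, and df for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df


def chi2_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable; this closed form holds only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


expected, chi2, df = chi_square_independence(observed)
p_value = chi2_upper_tail_even_df(chi2, df)
critical = -2 * log(0.05)  # df = 2 only: P(X >= c) = exp(-c / 2) = alpha

for row in expected:
    print([round(e, 2) for e in row])
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.5f}, critical = {critical:.3f}")
```

The exact p-value is about 0.00102, consistent with the table-based "≈ 0.001", and the critical value \(-2\ln(0.05) \approx 5.991\) matches the chi-square table.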

Statistical Decision

Decision Rule:

  • If p-value \(\leq \alpha\), reject \(H_0\)
  • If p-value \(> \alpha\), fail to reject \(H_0\)

Decision:

Given:

  • P-value = 0.001
  • Significance level \(\alpha = 0.05\)

Since p-value (0.001) \(< \alpha\) (0.05), REJECT the null hypothesis (\(H_0\)).

Question 5

5. Interpret the results in terms of digital payment strategy.

From a digital payment and business strategy perspective:

  • Mobile users lean toward E-Wallet payments (120 of 250, or 48%, versus 60 of 190, or about 32%, on desktop), possibly reflecting convenience and mobile-first payment integrations.

  • Desktop users rely more heavily on credit cards (90 of 190, or about 47%, versus 80 of 250, or 32%, on mobile), which may be influenced by familiarity, perceived security, or easier data entry on larger screens.

  • Cash on Delivery usage is nearly identical across devices (20% on mobile, about 21% on desktop) and is the least-used method on both.

Strategic Implications:

  • Optimize E-Wallet promotions and UI for mobile users.

  • Maintain strong credit card visibility and trust signals for desktop users.

  • Use device-based personalization to improve checkout conversion rates.

  • Align marketing campaigns with dominant payment preferences per device.

Overall, this analysis provides actionable insights that can help the company tailor its payment infrastructure and user experience to better match customer behavior.

Case Study 5

Type I and Type II Errors (Conceptual)

A fintech startup tests whether a new fraud detection algorithm reduces fraudulent transactions.

  • H₀: The new algorithm does not reduce fraud.

  • H₁: The new algorithm reduces fraud.

Tasks:

Question 1

1. Explain a Type I Error (α) in this context.

A Type I Error occurs when the null hypothesis is rejected even though it is true.

In this case, a Type I Error means:

“The fintech startup concludes that the new fraud detection algorithm reduces fraud, when in reality it does not.”

Practical Implications:

  • The company deploys an ineffective algorithm.

  • Fraudulent transactions continue at the same level.

  • The company incurs financial losses due to undetected fraud.

  • There may be reputational damage if customers lose trust in the platform’s security.

Thus, a Type I Error represents a false positive, where the algorithm is believed to work when it actually does not.

Question 2

2. Explain a Type II Error (β) in this context.

A Type II Error occurs when the null hypothesis is not rejected even though the alternative hypothesis is true.

In this context, a Type II Error means:

“The fintech startup concludes that the new fraud detection algorithm does not reduce fraud, when in reality it does reduce fraud.”

Practical Implications:

  • A potentially effective fraud prevention tool is not implemented.

  • The company misses an opportunity to reduce fraud-related losses.

  • Resources invested in developing the algorithm are wasted.

  • Competitors may gain an advantage by adopting better fraud detection methods.

A Type II Error represents a false negative, where a useful improvement is overlooked.

Question 3

3. Identify which error is more costly from a business perspective.

From a business standpoint, Type I Error is generally more costly in this scenario.

Reasoning:

  • Deploying an ineffective fraud detection system creates a false sense of security.

  • Fraud continues while the company assumes it is protected.

  • Financial losses from fraud can accumulate quickly.

  • Reputational damage and regulatory risks may arise.

While Type II Errors also have costs (missed improvements), the direct and immediate financial risks associated with Type I Errors in fraud detection are typically more severe.

Question 4

4. Discuss how sample size affects Type II Error.

Sample size plays a crucial role in determining the probability of a Type II Error.

  • Small sample sizes make it harder to detect a true reduction in fraud.

  • With limited data, even a genuinely effective algorithm may not show a statistically significant result.

  • This increases the likelihood of a Type II Error (β).

As the sample size increases:

  • Estimates become more precise.

  • The test becomes more sensitive to real effects.

  • The probability of a Type II Error decreases.

Therefore, larger sample sizes improve the ability to detect true fraud reduction.
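A minimal Python sketch makes this concrete. It assumes, for illustration only, a one-sided one-sample z-test with known σ and a hypothetical true reduction of 0.2 standard deviations in the fraud metric; neither the effect size nor the helper name `type_ii_error` comes from the case study.

```python
from statistics import NormalDist


def type_ii_error(effect_size, n, alpha=0.05):
    """Beta for a one-sided one-sample z-test of H1: the mean decreased.

    effect_size is the true reduction (mu0 - mu1) in units of sigma;
    it is a hypothetical input, not a value from the case study.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # about 1.645 when alpha = 0.05
    power = std_normal.cdf(effect_size * n ** 0.5 - z_crit)
    return 1 - power


# Same hypothetical effect (0.2 sigma), growing sample sizes
for n in (25, 100, 400):
    print(f"n = {n:>3}: beta = {type_ii_error(0.2, n):.3f}")
```

Under these assumptions β falls from roughly 0.74 at n = 25 to about 0.36 at n = 100 and below 0.01 at n = 400, even though the underlying effect never changes.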

Question 5

5. Explain the relationship between α, β, and statistical power.

The key relationships are:

  • α (Significance Level): Probability of committing a Type I Error.

  • β: Probability of committing a Type II Error.

  • Statistical Power: Probability of correctly rejecting the null hypothesis when the alternative is true.

Mathematically:

\[ \text{Power} = 1 - \beta \]

Trade-offs and Interpretation

  • Reducing α (being more conservative) decreases the chance of Type I Error but increases β, making Type II Errors more likely.

  • For a fixed α, increasing the sample size lowers β, so both error rates can be kept low.

  • High statistical power means the test is effective at detecting real improvements in fraud detection.
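These trade-offs can be made explicit with the standard power formula. As a worked illustration, assume a one-sided one-sample z-test of \(H_0: \mu = \mu_0\) against \(H_1: \mu < \mu_0\) with known \(\sigma\) and a true mean \(\mu_1 < \mu_0\):

\[ \begin{array}{rcl} \text{Power} &=& 1 - \beta \;=\; \Phi\!\left(\dfrac{\mu_0 - \mu_1}{\sigma / \sqrt{n}} - z_{1-\alpha}\right) \end{array} \]

Lowering \(\alpha\) raises the critical value \(z_{1-\alpha}\) (1.645 at \(\alpha = 0.05\), 2.326 at \(\alpha = 0.01\)), which lowers power and raises \(\beta\). Increasing \(n\) enlarges \(\frac{\mu_0 - \mu_1}{\sigma / \sqrt{n}}\), which raises power for any fixed \(\alpha\).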

In practice, fintech companies aim to balance these factors by:

  • Choosing an appropriate α level.

  • Collecting sufficient data.

  • Ensuring the test has adequate power before making deployment decisions.

Case Study 6

P-Value and Statistical Decision Making

A churn prediction model evaluation yields the following results:

  • Test statistic = 2.31
  • p-value = 0.021
  • Significance level: \(\alpha = 0.05\)

Tasks:

Question 1

1. Explain the meaning of the p-value.

The p-value represents the probability of observing a test statistic at least as extreme as the one obtained, assuming that the null hypothesis (H₀) is true.

In this context, a p-value of 0.021 means:

“If the churn model actually provides no real improvement, there would be about a 2.1% chance of observing a test statistic at least as extreme as 2.31 purely through random variation.”

A smaller p-value means the observed result would be less likely if H₀ were true, which provides stronger evidence against the null hypothesis. It is not, however, the probability that H₀ is true.
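The source does not say which test produced the statistic. Assuming, for illustration, that the statistic is approximately standard normal under H₀ and that the test is two-sided, the reported p-value can be reproduced with Python's standard library:

```python
from statistics import NormalDist

z_stat = 2.31
upper_tail = 1 - NormalDist().cdf(z_stat)  # P(Z >= 2.31) under H0
two_sided_p = 2 * upper_tail               # P(|Z| >= 2.31) under H0
print(f"one-sided p = {upper_tail:.4f}, two-sided p = {two_sided_p:.4f}")
```

Under these assumptions the two-sided value is about 0.021, matching the reported p-value; a one-sided test would give roughly half of that.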

Question 2

2. Make a statistical decision.

Decision Rule

The standard decision rule for hypothesis testing:

\[ \begin{array}{ll} \text{If } p\text{-value} \leq \alpha: & \text{Reject } H_0 \\ \text{If } p\text{-value} > \alpha: & \text{Fail to reject } H_0 \end{array} \]

Apply Decision Rule

Given:

  • p-value = 0.021
  • Significance level: \(\alpha = 0.05\)

Comparison:

\[ \begin{array}{rcl} \text{p-value} &=& 0.021 \\ \alpha &=& 0.05 \\ 0.021 &<& 0.05 \end{array} \]

Statistical Decision:

Since p-value (0.021) < α (0.05), we REJECT the null hypothesis (\(H_0\)).

This indicates that the churn prediction model’s performance is statistically significant at the 5% level.

Question 3

3. Translate the decision into non-technical language for management.

In plain, non-technical language:

“The results suggest that the improvement we see in the churn prediction model is unlikely to be due to random chance. We can be reasonably confident that the model is genuinely performing better than a baseline approach.”

This supports moving forward with further validation, controlled deployment, or business integration of the model.

Question 4

4. Discuss the risk if the sample is not representative.

Statistical conclusions rely heavily on the assumption that the sample data accurately represent the broader customer population.

If the sample is not representative:

  • The p-value may give a false sense of confidence.

  • The model might perform well only for a specific subgroup (e.g., long-term users, one region, or one pricing tier).

  • Business decisions based on these results may fail when the model is applied at scale.

  • This can lead to incorrect conclusions about churn reduction effectiveness.

In short, a statistically significant result does not guarantee real-world success if the underlying data are biased.

Question 5

5. Explain why the p-value does not measure effect size.

The p-value measures how compatible the data are with H₀ (that is, whether there is evidence of an effect), not how large or meaningful that effect is.

Key points:

  • A small p-value can result from a very small improvement if the sample size is large enough.

  • Conversely, a large and practically important improvement may produce a non-significant p-value if the sample size is small.

  • The p-value does not communicate the magnitude, business impact, or financial value of the improvement.

To assess effect size, additional metrics are needed, such as:

  • Lift or improvement percentage

  • Cohen’s d

  • Reduction in churn rate

  • Expected revenue impact
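The A/B test in Case Study 3 illustrates the second point above. A short Python sketch computes Cohen's d from its summary statistics, using the simple pooled SD that applies when group sizes are equal, and shows how the Welch t statistic would grow if the same means and SDs had been observed with 100 users per group (a hypothetical scenario, not case data). The helper names `cohens_d` and `welch_t` are illustrative.

```python
from math import sqrt

# Summary statistics from Case Study 3 (session duration, minutes)
mean_a, sd_a = 4.8, 1.2
mean_b, sd_b = 5.4, 1.4


def cohens_d(m1, s1, m2, s2):
    """Cohen's d with a pooled SD; this simple pooling assumes equal group sizes."""
    pooled_sd = sqrt((s1 ** 2 + s2 ** 2) / 2)
    return (m2 - m1) / pooled_sd


def welch_t(m1, s1, m2, s2, n):
    """Welch t statistic for two groups of equal size n."""
    return (m2 - m1) / sqrt(s1 ** 2 / n + s2 ** 2 / n)


d = cohens_d(mean_a, sd_a, mean_b, sd_b)
for n in (25, 100):
    t_stat = welch_t(mean_a, sd_a, mean_b, sd_b, n)
    print(f"n per group = {n:>3}: d = {d:.2f}, t = {t_stat:.2f}")
```

The effect size stays at d ≈ 0.46 (close to the conventional "medium" benchmark of 0.5) in both rows, while t doubles from about 1.63 to about 3.25 when the sample size quadruples, so the same difference that was not significant with 25 users per group would be clearly significant with 100.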
