
Fityanandra Athar Adyaksa (52250059)


Data Science student

Enthusiastic about learning

December 29, 2025




Case Study 1

One-Sample Z-Test (Statistical Hypotheses)

A digital learning platform claims that the average daily study time of its users is 120 minutes. Based on historical records, the population standard deviation is known to be 15 minutes.

A random sample of 64 users shows an average study time of 116 minutes.

\[ \begin{eqnarray*} \mu_0 &=& 120 \\ \sigma &=& 15 \\ n &=& 64 \\ \bar{x} &=& 116 \end{eqnarray*} \]

Tasks

  1. Formulate the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
  2. Identify the appropriate statistical test and justify your choice.
  3. Compute the test statistic and p-value using \(\alpha = 0.05\).
  4. State the statistical decision.
  5. Interpret the result in a business analytics context.



1. Formulating Hypotheses

In inference, we start by defining what we are testing against.

  • Null Hypothesis (\(H_0\)): \(\mu = 120\)

    • Meaning: The average daily study time is exactly what the platform claims (120 minutes). There is no “real” difference; any variation in our sample is just random luck.
  • Alternative Hypothesis (\(H_1\)): \(\mu \neq 120\)

    • Meaning: The average daily study time is significantly different from 120 minutes. This is a two-tailed test because we are checking for a difference in either direction (more or less).


2. Appropriate Statistical Test

The appropriate test is the One-Sample Z-Test.
  • Justification:
    1. Known \(\sigma\): We are explicitly given the population standard deviation (\(\sigma = 15\)).
    2. Normality/Sample Size: The sample size (\(n = 64\)) is greater than 30. According to the Central Limit Theorem, the sampling distribution of the mean will be approximately normal regardless of the underlying population distribution.


3. Computing Test Statistic and P-value

We need to see how many “standard errors” our sample mean (\(\bar{x} = 116\)) sits away from the claimed mean (\(\mu_0 = 120\)).

  • Calculate the Standard Error (\(SE\))\[SE = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{64}} = \frac{15}{8} = 1.875\]
  • Calculate the Z-Statistic \[Z = \frac{\bar{x} - \mu_0}{SE} = \frac{116 - 120}{1.875} = \frac{-4}{1.875} \approx -2.13\]
  • Find the P-Value using a standard normal distribution table for \(Z = -2.13\):

    • The area to the left of \(-2.13\) is approximately 0.0166.
    • Since this is a two-tailed test, we multiply by 2: \(0.0166 \times 2 = \mathbf{0.0332}\).
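
For reference, these steps can be reproduced in a few lines of base R (a minimal sketch; the variable names are our own):

# One-sample z-test for Case Study 1 (sigma known)
mu0 <- 120; sigma <- 15; n <- 64; xbar <- 116

se <- sigma / sqrt(n)        # standard error = 1.875
z  <- (xbar - mu0) / se      # z-statistic ≈ -2.13
p  <- 2 * pnorm(-abs(z))     # two-tailed p-value ≈ 0.033

cat("SE =", se, "| Z =", round(z, 2), "| p-value =", round(p, 4), "\n")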


4. Statistical Decision

We compare our p-value to our significance level (\(\alpha = 0.05\)):

  • Comparison: \(0.0332 < 0.05\)
  • Decision: Reject the Null Hypothesis (\(H_0\)).


5. Business Analytics Interpretation

From a business perspective, the platform’s claim that users study for 120 minutes is statistically unsupported by this data.

The sample mean of 116 minutes is low enough that it is unlikely to have happened by random chance if the true average were 120. As a data analyst, you would advise the marketing or product team that their “120-minute” claim is likely an overestimation and should be revised to reflect actual user behavior more accurately.




Case Study 2

One-Sample T-Test (σ Unknown, Small Sample)

A UX Research Team investigates whether the average task completion time of a new application differs from 10 minutes.

The following data are collected from 10 users:

\[ 9.2,\; 10.5,\; 9.8,\; 10.1,\; 9.6,\; 10.3,\; 9.9,\; 9.7,\; 10.0,\; 9.5 \]

Tasks

  1. Define H₀ and H₁ (two-tailed).
  2. Determine the appropriate hypothesis test.
  3. Calculate the t-statistic and p-value at \(\alpha = 0.05\).
  4. Make a statistical decision.
  5. Explain how sample size affects inferential reliability.



1. Hypotheses Formulation

Null Hypothesis (H₀):

H₀: μ = 10 minutes

  • The average task completion time for the new application is 10 minutes

  • No difference from the target benchmark

Alternative Hypothesis (H₁):

H₁: μ ≠ 10 minutes

  • The average task completion time differs from 10 minutes

  • This is a two-tailed test because we’re checking for any difference (faster or slower)



2. Appropriate Hypothesis Test

Justification:

  1. Parameter of interest: Population mean (μ)

  2. Population standard deviation: Unknown (we only have sample data)

  3. Sample size: Small (n = 10 < 30)

  4. Conditions check:

    • Random sample (assumed from “10 users”)
    • Independence (reasonable to assume one user’s completion time doesn’t affect another’s)
    • Normality assumption needed (with small n, we assume the population of completion times is approximately normal)

Why t-test instead of z-test?

  • σ is unknown
  • Small sample size (n < 30)
  • We must estimate σ using the sample standard deviation (s)


3. Calculation

Data: 9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5
  1. Find the Sample Mean (\(\bar{x}\))

\[\begin{aligned} \bar{x} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{98.6}{10} \\ &= 9.86 \text{ minutes} \end{aligned}\]

  2. Find the Sample Standard Deviation (\(s\)) by calculating the sum of squared differences from the mean:
  • \(\sum(x_i - \bar{x})^2 = 1.344\)
  • \(s = \sqrt{\frac{1.344}{10 - 1}} \approx \mathbf{0.386}\)
  3. Calculate the T-statistic

\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{9.86 - 10}{0.386 / \sqrt{10}} = \frac{-0.14}{0.122} \approx \mathbf{-1.15}\]

  4. Determine the P-Value using the t-distribution table with degrees of freedom \(df = n - 1 = 9\):
  • For \(t = -1.15\) (two-tailed), the p-value is approximately 0.28.
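
The full calculation can be verified with R’s built-in t.test() (a quick sketch):

# One-sample t-test for Case Study 2 (sigma unknown, n = 10)
times <- c(9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5)

t.test(times, mu = 10, alternative = "two.sided")
# Output shows t ≈ -1.15, df = 9, p-value ≈ 0.28, matching the manual work above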


4. Statistical Decision

  • Comparison: \(p \text{-value} (0.28) > \alpha (0.05)\).
  • Decision: Fail to Reject the Null Hypothesis (\(H_0\)).
  • Interpretation: There is not enough evidence to say the average completion time differs significantly from 10 minutes. The difference we observed (9.86 vs 10) is likely due to random sampling variation.


5. Sample Size and Inferential Reliability

Sample size (\(n\)) plays a critical role in hypothesis testing and the reliability of our inferences:

  • Larger sample size:
    • Reduces the standard error (\(s / \sqrt{n}\)), making the t-statistic larger for the same deviation from \(\mu_0\).
    • Increases statistical power (ability to detect real differences if they exist; reduces Type II error risk).
    • Makes the test more sensitive — even small deviations from H₀ can become statistically significant.
    • Improves reliability: Conclusions are more precise and generalizable.
  • Small sample size (like n=10 here):
    • Larger standard error → smaller |t|-statistic → higher p-value → harder to reject H₀.
    • Lower power: Higher chance of missing real effects (Type II error).
    • Wider confidence intervals → less precise estimates.
    • Less reliable inference: Results are more variable and sensitive to outliers.

In this case: The sample mean (9.86) is slightly below 10, but with only n=10 and low variability, the difference is not statistically significant. A larger sample (e.g., n=50) showing the same mean difference would likely yield a much smaller p-value and rejection of H₀.
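
This last claim is easy to demonstrate numerically; the sketch below holds the mean difference and \(s\) fixed and only varies \(n\) (the \(n = 50\) scenario is hypothetical):

# Effect of sample size on the t-statistic and p-value (same mean and s)
xbar <- 9.86; mu0 <- 10; s <- 0.386

for (n in c(10, 50)) {
  t_stat <- (xbar - mu0) / (s / sqrt(n))
  p_val  <- 2 * pt(-abs(t_stat), df = n - 1)
  cat("n =", n, "| t =", round(t_stat, 2), "| p =", round(p_val, 3), "\n")
}
# n = 10 gives p ≈ 0.28 (fail to reject); n = 50 gives p ≈ 0.013 (reject)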




Case Study 3

Two-Sample T-Test (A/B Testing)

A product analytics team conducts an A/B test to compare the average session duration (minutes) between two versions of a landing page.

Version   Sample Size (n)   Mean   Standard Deviation
A         25                4.8    1.2
B         25                5.4    1.4

Tasks

  1. Formulate the null and alternative hypotheses.
  2. Identify the type of t-test required.
  3. Compute the test statistic and p-value.
  4. Draw a statistical conclusion at \(\alpha = 0.05\).
  5. Interpret the result for product decision-making.



1. Hypotheses Formulation

Null Hypothesis (H₀):

\[\begin{aligned} H_0 : \mu_A = \mu_B \end{aligned}\]
  • The average session duration is equal for both landing page versions
  • No difference in user engagement between versions A and B

Alternative Hypothesis (H₁):

\[\begin{aligned} H_1 : \mu_A \neq \mu_B \end{aligned}\]
  • The average session duration differs between the two versions
  • Version A and B have different levels of user engagement
  • Note: This is a two-tailed test as we’re checking for any difference


2. Type of T-Test Required

Test Selection: Two-Sample Independent T-Test (Welch’s t-test)

Justification:

  1. Comparing two independent groups (Version A vs Version B)

  2. Population standard deviations unknown (only sample SDs provided: 1.2 and 1.4)

  3. Sample sizes are equal (both n = 25), but not necessarily large enough for z-test

  4. Equal variances assumption check needed:

  • We need to decide between Student’s t-test (equal variances) and Welch’s t-test (unequal variances)
  • Sample standard deviations: s_A = 1.2, s_B = 1.4
  • Ratio: (1.4²)/(1.2²) = 1.96/1.44 ≈ 1.36 < 2 (often used as a rule of thumb)
  • We’ll use Welch’s t-test as it’s more robust and doesn’t assume equal variances


3. Compute the Test Statistic and P-Value

To compare the two groups, we calculate how much the difference in means (\(5.4 - 4.8 = 0.6\)) stands out against the combined “noise” (standard error) of both groups.

Calculate the Standard Error (\(SE\))

Since the sample sizes are equal (\(n=25\)), we use the formula:

\[SE = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}\] \[SE = \sqrt{\frac{1.2^2}{25} + \frac{1.4^2}{25}} = \sqrt{\frac{1.44}{25} + \frac{1.96}{25}} \\ = \sqrt{0.0576 + 0.0784} \approx \mathbf{0.369}\]

Calculate the T-Statistic

\[t = \frac{\bar{x}_B - \bar{x}_A}{SE}\\ = \frac{5.4 - 4.8}{0.369} = \frac{0.6}{0.369} \approx \mathbf{1.626}\]

Determine the P-Value

Using the degrees of freedom (\(df \approx 48\) from the simplified \(n_A + n_B - 2\); the Welch-adjusted \(df \approx 47\) gives essentially the same p-value):

  • For \(t = 1.626\), the two-tailed p-value is approximately 0.110.
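
Because only summary statistics are available, the test is easiest to reproduce manually in base R (a minimal sketch, including the Welch–Satterthwaite degrees of freedom):

# Welch's two-sample t-test from summary statistics (Case Study 3)
nA <- 25; meanA <- 4.8; sdA <- 1.2
nB <- 25; meanB <- 5.4; sdB <- 1.4

se     <- sqrt(sdA^2 / nA + sdB^2 / nB)   # ≈ 0.369
t_stat <- (meanB - meanA) / se            # ≈ 1.626

# Welch–Satterthwaite degrees of freedom (≈ 47 here)
df    <- se^4 / ((sdA^2 / nA)^2 / (nA - 1) + (sdB^2 / nB)^2 / (nB - 1))
p_val <- 2 * pt(-abs(t_stat), df)         # ≈ 0.11

cat("t =", round(t_stat, 3), "| df =", round(df, 1), "| p =", round(p_val, 3), "\n")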


4. Draw a Statistical Conclusion

  • Comparison: \(p \text{-value} (0.110) > \alpha (0.05)\).

  • Decision: Fail to Reject the Null Hypothesis (\(H_0\)).

  • Reasoning: Although Version B had a higher average (5.4 minutes) than Version A (4.8 minutes), the p-value tells us there is an 11% chance of seeing a difference this large even if the two versions were truly identical. Since 11% is higher than our 5% threshold, we cannot claim the result is “statistically significant.”



5. Product Decision-Making Interpretation

Version B shows a higher sample mean session duration (5.4 vs. 4.8 minutes, a +12.5% increase), but this difference is not statistically significant at \(\alpha = 0.05\).

Product Implications:

  • We cannot confidently claim that Version B improves user engagement over Version A based on this data.
  • The observed improvement could be due to random variation rather than a true effect.
  • Recommendation: Do not roll out Version B universally yet. Options include:
    • Continue the test with a larger sample size to increase statistical power and potentially detect a real difference.
    • Consider a one-tailed test in future if the team has strong prior belief that B should perform better (this would make rejection easier).
    • Explore practical significance: Even if significant, a 0.6-minute increase may not justify development and rollout costs.
    • Segment the data (e.g., by device, traffic source) to check if Version B performs better in specific subgroups.

Key Lesson in A/B Testing: Statistical significance is essential before declaring a “winner.” Many promising variants fail to reach significance due to insufficient sample size or small effect sizes.




Case Study 4

Chi-Square Test of Independence

An e-commerce company examines whether device type is associated with payment method preference.

Device / Payment   E-Wallet   Credit Card   Cash on Delivery
Mobile             120        80            50
Desktop            60         90            40

Tasks

  1. State the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
  2. Identify the appropriate statistical test.
  3. Compute the Chi-Square statistic (χ²).
  4. Determine the p-value at \(\alpha = 0.05\).
  5. Interpret the results in terms of digital payment strategy.



1. Hypotheses Formulation

Null Hypothesis (H₀):

\[H_0 : \text{Device type and payment method are independent}\]

  • There is no association between the type of device used and payment method preference
  • Any observed differences in the contingency table are due to random chance

Alternative Hypothesis (H₁):

\[H_1 : \text{Device type and payment method are dependent}\]

  • There is a statistically significant association between device type and payment method
  • Payment preferences differ between mobile and desktop users


2. Appropriate Statistical Test

Test Selection: Pearson’s Chi-Square Test of Independence

Justification:

  1. Two categorical variables:

    • Device type (Mobile, Desktop) - 2 categories

    • Payment method (E-Wallet, Credit Card, Cash on Delivery) - 3 categories

  2. Independent observations: Each user contributes to only one cell

  3. Expected frequencies ≥ 5: We’ll verify this during calculation

  4. Goal: Test association/independence between two categorical variables

3. Compute the Chi-Square Statistic (\(\chi^2\))

To find the test statistic, we compare the Observed values (the data we have) to the Expected values (what the data would look like if there were no relationship).

  1. Calculate Row and Column Totals
library(knitr)
data_tabel <- data.frame(
  Payment = c("E-Wallet", "Credit Card", "Cash on Delivery", "Column Total"),
  Mobile = c(120, 80, 50, 250),
  Desktop = c(60, 90, 40, 190),
  Row_Total = c(180, 170, 90, 440)
)

colnames(data_tabel) <- c("Payment Method", "Mobile", "Desktop", "Row Total")

kable(data_tabel, 
      caption = "Contingency Table: Device Type vs Payment Method",
      align = "lccc") # l = left, c = center
Contingency Table: Device Type vs Payment Method
Payment Method     Mobile   Desktop   Row Total
E-Wallet           120      60        180
Credit Card        80       90        170
Cash on Delivery   50       40        90
Column Total       250      190       440


  2. Calculate Expected Frequencies (\(E\))

Using the formula

\[E = \frac{(\text{Row Total} \times \text{Column Total})}{\text{Grand Total}}\]

  • Mobile + E-Wallet: \((250 \times 180) / 440 = \mathbf{102.27}\)

  • Mobile + Credit Card: \((250 \times 170) / 440 = \mathbf{96.59}\)

  • Mobile + Cash (COD): \((250 \times 90) / 440 = \mathbf{51.14}\)

  • Desktop + E-Wallet: \((190 \times 180) / 440 = \mathbf{77.73}\)

  • Desktop + Credit Card: \((190 \times 170) / 440 = \mathbf{73.41}\)

  • Desktop + Cash (COD): \((190 \times 90) / 440 = \mathbf{38.86}\)


  3. Calculate Chi-Square Statistic

Formula:

\[\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\]

Calculation:

\[\begin{aligned} \chi^2 &= \frac{(120-102.27)^2}{102.27} + \frac{(80-96.59)^2}{96.59} + \frac{(50-51.14)^2}{51.14} \\ &\quad + \frac{(60-77.73)^2}{77.73} + \frac{(90-73.41)^2}{73.41} + \frac{(40-38.86)^2}{38.86} \\[10pt] &= \frac{17.73^2}{102.27} + \frac{(-16.59)^2}{96.59} + \frac{(-1.14)^2}{51.14} \\ &\quad + \frac{(-17.73)^2}{77.73} + \frac{16.59^2}{73.41} + \frac{1.14^2}{38.86} \\[10pt] &= \frac{314.35}{102.27} + \frac{275.23}{96.59} + \frac{1.30}{51.14} \\ &\quad + \frac{314.35}{77.73} + \frac{275.23}{73.41} + \frac{1.30}{38.86} \\[10pt] &= 3.074 + 2.849 + 0.025 + 4.044 + 3.749 + 0.033 \\ &= 13.774 \end{aligned}\]

\[\chi^2 = 13.774\]

Degrees of freedom:

\[df = (2 - 1) \times (3 - 1) = 1 \times 2 = 2\]
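
The entire table-based calculation can be cross-checked with R’s built-in chisq.test() (a quick sketch):

# Chi-square test of independence for Case Study 4
obs <- matrix(c(120, 80, 50,
                 60, 90, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(Device  = c("Mobile", "Desktop"),
                              Payment = c("E-Wallet", "Credit Card", "COD")))

chisq.test(obs)$expected  # reproduces the expected frequencies above
chisq.test(obs)           # X-squared ≈ 13.77, df = 2, p-value ≈ 0.001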



4. Determine P-value

  • p-value ≈ 0.00102 (or approximately 0.001)

Compare to \(\alpha = 0.05\):

  • p-value (0.001) < 0.05

Decision: Reject the null hypothesis (H₀). There is strong evidence at the 5% significance level of an association between device type and payment method preference. (Alternatively, using critical value: For \(\alpha = 0.05\), df=2, critical \(\chi^2 = 5.991\). Observed 13.77 > 5.991 → reject H₀.)



5. Interpretation for Digital Payment Strategy

  • Statistical Decision: Since \(p = 0.001 < \alpha = 0.05\), we Reject the Null Hypothesis (\(H_0\)).
  • Business Insight: There is a statistically significant link between device and payment choice.
  • Observation: Mobile users use E-Wallets much more than expected (120 observed vs. 102.27 expected), while Desktop users prefer Credit Cards (90 observed vs. 73.41 expected).

Strategy:

  1. Mobile UX: Ensure the E-Wallet checkout flow is “one-click” and frictionless.
  2. Desktop UX: Highlight security features and “saved card” options to cater to Credit Card users.
  3. Marketing: Run E-Wallet promotions specifically targeting mobile app users.




Case Study 5

Type I and Type II Errors (Conceptual)

A fintech startup tests whether a new fraud detection algorithm reduces fraudulent transactions.

  • H₀: The new algorithm does not reduce fraud.
  • H₁: The new algorithm reduces fraud.

Tasks

  1. Explain a Type I Error (α) in this context.
  2. Explain a Type II Error (β) in this context.
  3. Identify which error is more costly from a business perspective.
  4. Discuss how sample size affects Type II Error.
  5. Explain the relationship between α, β, and statistical power.



1. Type I Error (\(\alpha\)) - The “False Alarm”

A Type I error occurs when we reject the null hypothesis when it is actually true. This is a false positive or false alarm.

In the Fraud Detection Context:

  • H₀: The new algorithm does not reduce fraud (no improvement over current system)
  • Type I Error: Concluding that the new algorithm reduces fraud when it actually doesn’t


Practical Example:

The startup conducts a test, analyzes the data, and finds a statistically significant reduction in fraudulent transactions (p < 0.05). They implement the new algorithm company-wide, but in reality, the algorithm is no better than the old one. They’ve wasted resources implementing an ineffective solution.

Mathematical Representation:

\[\alpha = P(\text{Reject } H_0 \mid H_0 \text{ is true})\]

Common α level: 0.05 (5% risk of false positive)



2. Type II Error (\(\beta\)) - The “Failed Detection”

A Type II error occurs when we fail to reject the null hypothesis when it is actually false. This is a false negative or missed detection.

In the Fraud Detection Context:

  • H₁: The new algorithm reduces fraud (it’s genuinely better)
  • Type II Error: Concluding that the new algorithm doesn’t reduce fraud when it actually does

Practical Example: The startup tests the algorithm, finds no statistically significant improvement (p > 0.05), and decides to abandon it. However, the algorithm actually works and could have significantly reduced fraud losses. They’ve missed an opportunity to improve their system.

Mathematical Representation:

\[\beta = P(\text{Fail to reject } H_0 \mid H_1 \text{ is true})\]

Power: 1 - \(\beta\) (probability of correctly detecting a real effect)



3. Which Error is More Costly?

Type I Error
  • Financial Cost: development and implementation costs; maintenance of an unnecessary system; opportunity cost
  • Reputational Cost: loss of trust in results; credibility damage
  • Operational Cost: wasted engineering resources; added system complexity

Type II Error
  • Financial Cost: continued fraud losses; lost revenue from fraud; regulatory fines
  • Reputational Cost: perception of being behind competitors; customer dissatisfaction
  • Operational Cost: missed efficiency gains; continued use of a suboptimal system

Business Perspective Assessment for a Fintech Startup:

  1. Fraud losses are direct financial hits - can threaten company survival
  2. Regulatory requirements mandate effective fraud controls
  3. Customer trust is paramount - fraud incidents damage reputation severely
  4. Competitive landscape demands cutting-edge fraud prevention

Conclusion: In this specific context, Type II error is likely more costly because:

  • Fraud directly impacts the bottom line and can be substantial
  • Missing an effective fraud detection tool leaves the company vulnerable
  • The cost of ongoing fraud likely exceeds the cost of implementing an ineffective algorithm
  • Regulatory penalties for inadequate fraud controls can be severe

However, this depends on:

  • Implementation costs: If the new algorithm is extremely expensive to implement, Type I might be costlier
  • Fraud exposure: If current fraud levels are low, Type I might be relatively worse
  • Business model: For high-volume transactions, even small fraud percentages represent large absolute losses

General Rule: In safety-critical or high-risk domains (medicine, fraud detection, security), Type II errors are often more dangerous.



4. How Sample Size (\(n\)) Affects Type II Error

# R code to demonstrate sample size effect on power
library(pwr)

# Fixed parameters: effect size d = 0.5, alpha = 0.05
effect_size <- 0.5
alpha <- 0.05

# Calculate power for different sample sizes
sample_sizes <- c(10, 20, 50, 100, 200)
powers <- sapply(sample_sizes, function(n) {
  pwr.t.test(n = n, d = effect_size, sig.level = alpha, 
             type = "one.sample")$power
})

data.frame(Sample_Size = sample_sizes, 
           Power = round(powers, 3),
           Type_II_Error = round(1 - powers, 3))

Practical Implications for the Startup:

  1. Small sample sizes (e.g., testing on 100 transactions) → High \(\beta\) → Likely to miss real effects
  2. Large sample sizes (e.g., testing on 10,000 transactions) → Low \(\beta\) → Better chance of detecting real improvements
  3. Balancing act: Larger samples cost more time/money but reduce Type II error risk

Recommendation:

  • Conduct power analysis before testing to determine adequate sample size
  • Consider minimum detectable effect that would be economically meaningful
  • Balance statistical rigor with practical constraints


5. Relationship between \(\alpha\), \(\beta\), and Statistical Power

These three concepts are a “balancing act”:

  • The Trade-off: If you try to be extremely safe and lower your \(\alpha\) (e.g., from 0.05 to 0.01) to avoid a False Alarm, you automatically increase your risk of a Type II Error (\(\beta\)).
  • Statistical Power (\(1 - \beta\)): This is the probability that you will correctly detect an effect if there is one.
  • To increase power (and reduce \(\beta\)), you can (see the sketch after this list):
    1. Increase sample size (\(n\)).
    2. Increase the significance level (\(\alpha\))—though this increases Type I risk.
    3. Reduce “noise” in your data collection.
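
A minimal sketch of this trade-off, reusing the pwr package from Section 4 (the effect size d = 0.5 is an illustrative assumption):

# How alpha and n jointly move power (1 - beta)
library(pwr)

grid <- expand.grid(alpha = c(0.01, 0.05), n = c(20, 100))
grid$power <- mapply(function(a, n) {
  pwr.t.test(n = n, d = 0.5, sig.level = a, type = "one.sample")$power
}, grid$alpha, grid$n)
grid$beta <- round(1 - grid$power, 3)

grid  # lowering alpha raises beta; raising n lowers beta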




Case Study 6

P-Value and Statistical Decision Making

A churn prediction model evaluation yields the following results:

  • Test statistic = 2.31
  • p-value = 0.021
  • Significance level: \(\alpha = 0.05\)

Tasks

  1. Explain the meaning of the p-value.
  2. Make a statistical decision.
  3. Translate the decision into non-technical language for management.
  4. Discuss the risk if the sample is not representative.
  5. Explain why the p-value does not measure effect size.



1. Meaning of the P-Value

Most students memorize that a p-value is a probability, but few understand what it actually measures: the compatibility of the data with the Null Hypothesis.

Imagine the “Null World”—a world where our churn model is a total fraud and has zero predictive power. In that world, any success we see is just “the luck of the draw.”

The p-value of 0.021 tells us that if we lived in that Null World, the chance of seeing a result at least as strong as ours is only 2.1%. Because that probability is so low, we conclude that we likely do not live in the Null World. The model is likely doing something “real.”



2. Statistical Decision

In statistics, we don’t say “The model works.” We say “The evidence is strong enough to reject the idea that it doesn’t work.” This is why we use the Significance Level (\(\alpha\)).

\(\alpha\) is our “tolerance for being wrong.” By setting \(\alpha = 0.05\), we are saying: “I am willing to accept a 5% risk of being a False Positive (accusing the Null of being false when it’s actually true).”

Decision Logic: Since \(0.021 < 0.05\), the “weight” of our evidence is heavier than our “risk threshold.” Official Decision: REJECT \(H_0\). We have moved from the land of “maybe” into the land of “statistically significant.”



3. Translation for Management

When you talk to a CEO, they don’t want to hear about Z-scores. They want to hear about Reliability.

The Refined Sentence: “We have validated the Churn Prediction Model against a 95% confidence standard. If the model had no real predictive power, results this strong would appear by coincidence only about 2% of the time, so we can be confident it is identifying genuine patterns. From a strategic standpoint, this model is now ‘Production-Ready’ for our retention campaigns.”



4. Risk of a Non-Representative Sample

A p-value is only as good as the data it was built on. This is the Representative Sampling Principle.

If our training data only included “Premium Users,” but we are applying the model to “Free Tier Users,” our p-value is a lie. This is known as Selection Bias.

The Danger: If the sample isn’t a “mini-me” of the entire population, your inferential results will not generalize. You might have a “significant” result that only works for a tiny, specific group of people, leading to a massive failure when the model is launched globally.



5. P-Value vs. Effect Size

This is the most common pitfall for junior analysts.

  • P-value (Statistical Significance): This is about Precision. It tells you if the effect is “real” or “noise.”
  • Effect Size (Practical Significance): This is about Magnitude. It tells you the “size of the win.”
[Figure: Significance Comparison]


Always pair your p-value with an Effect Size metric. A result can be statistically significant (real) but practically insignificant (too small to care about).
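
As a concrete illustration, Cohen’s d (one common effect size metric) can be computed in one line from the Case Study 2 numbers:

# Effect size (Cohen's d) for the Case Study 2 data
xbar <- 9.86; mu0 <- 10; s <- 0.386

d <- (xbar - mu0) / s
cat("Cohen's d =", round(d, 2), "\n")  # ≈ -0.36: a small-to-medium effect

A |d| of roughly 0.36 signals a real but modest effect, which is exactly the nuance a p-value alone cannot convey.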



R Script: The “Decision Engine”

Use the code below to turn the raw test statistic into a final decision. This automates the “Z-table” look-up process.

# Input Variables
z_score <- 2.31
alpha_threshold <- 0.05

# 1. Calculate P-value (Two-Tailed)
# pnorm(z) finds the area to the left. 1-pnorm(z) finds the area to the right.
# We multiply by 2 because we are testing for a difference in 'either' direction.
calculated_p <- 2 * (1 - pnorm(abs(z_score)))

# 2. Display the Results with Professional Formatting
cat("--- CHURN MODEL EVALUATION REPORT ---\n")
## --- CHURN MODEL EVALUATION REPORT ---
cat("Test Statistic (Z): ", z_score, "\n")
## Test Statistic (Z):  2.31
cat("P-Value calculated: ", round(calculated_p, 4), "\n")
## P-Value calculated:  0.0209
cat("Alpha Threshold:    ", alpha_threshold, "\n")
## Alpha Threshold:     0.05
cat("-------------------------------------\n")
## -------------------------------------
# 3. The 'Decision Engine'
if (calculated_p <= alpha_threshold) {
  print("CONCLUSION: STATISTICALLY SIGNIFICANT. REJECT NULL.")
  print("STRATEGY: IMPLEMENT MODEL.")
} else {
  print("CONCLUSION: NOT SIGNIFICANT. FAIL TO REJECT NULL.")
  print("STRATEGY: RE-EVALUATE DATA SOURCE.")
}
## [1] "CONCLUSION: STATISTICALLY SIGNIFICANT. REJECT NULL."
## [1] "STRATEGY: IMPLEMENT MODEL."



