A digital learning platform claims that the average daily study time of its users is 120 minutes. Based on historical records, the population standard deviation is known to be 15 minutes.
A random sample of 64 users shows an average study time of 116 minutes.
\[ \begin{eqnarray*} \mu_0 &=& 120 \\ \sigma &=& 15 \\ n &=& 64 \\ \bar{x} &=& 116 \end{eqnarray*} \]
In inference, we start by defining what we are testing against.
Null Hypothesis (\(H_0\)): \(\mu = 120\)
Alternative Hypothesis (\(H_1\)): \(\mu \neq 120\)
We need to see how many “standard errors” our sample mean (\(\bar{x} = 116\)) sits away from the claimed mean (\(\mu = 120\)).
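Plugging the numbers from the problem into the Z-statistic:

\[Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{116 - 120}{15 / \sqrt{64}} = \frac{-4}{1.875} \approx -2.13\]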
Find the p-value: using a standard normal distribution table for \(Z = -2.13\), the two-tailed p-value is \(2 \times P(Z \le -2.13) \approx 2 \times 0.0166 \approx 0.033\).
We compare our p-value to our significance level (\(\alpha = 0.05\)): since \(0.033 < 0.05\), we reject \(H_0\).
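A quick check in base R reproduces this p-value (a minimal sketch; `z_score` simply mirrors the statistic computed above):

```r
# Two-tailed p-value for Z = -2.13 under the standard normal distribution
z_score <- -2.13
p_value <- 2 * pnorm(-abs(z_score))
round(p_value, 4)   # roughly 0.033, below alpha = 0.05
```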
From a business perspective, the platform’s claim that users study for 120 minutes is statistically unsupported by this data.
The sample mean of 116 minutes is low enough that it is unlikely to have happened by random chance if the true average were 120. As a data analyst, you would advise the marketing or product team that their “120-minute” claim is likely an overestimation and should be revised to reflect actual user behavior more accurately.
A UX Research Team investigates whether the average task completion time of a new application differs from 10 minutes.
The following data are collected from 10 users:
\[ 9.2,\; 10.5,\; 9.8,\; 10.1,\; 9.6,\; 10.3,\; 9.9,\; 9.7,\; 10.0,\; 9.5 \]
Null Hypothesis (H₀):
H₀: μ = 10 minutes
The average task completion time for the new application is 10 minutes
No difference from the target benchmark
Alternative Hypothesis (H₁):
H₁: μ ≠ 10 minutes
The average task completion time differs from 10 minutes
This is a two-tailed test because we’re checking for any difference (faster or slower)
Justification:
Parameter of interest: Population mean (μ)
Population standard deviation: Unknown (we only have sample data)
Sample size: Small (n = 10 < 30)
Conditions check: the users were sampled at random, the observations are independent, and the ten values show no extreme skew or outliers, so the normality assumption behind a small-sample t-test is reasonable.
Why a t-test instead of a z-test? The population standard deviation is unknown and the sample is small (n = 10), so we estimate \(\sigma\) with the sample standard deviation \(s\) and use the t-distribution with \(n - 1 = 9\) degrees of freedom.
\[\begin{aligned} \bar{x} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{98.6}{10} \\ &= 9.86 \text{ minutes} \end{aligned}\]
The sample standard deviation of the ten observations works out to \(s \approx 0.386\) minutes, so

\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{9.86 - 10}{0.386 / \sqrt{10}} = \frac{-0.14}{0.122} \approx \mathbf{-1.15}\]
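As a sanity check, the same test can be run directly in R (a minimal sketch; `task_times` is just the ten observations listed above):

```r
# One-sample, two-sided t-test of H0: mu = 10 on the ten observed times
task_times <- c(9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5)
t.test(task_times, mu = 10, alternative = "two.sided")
# Reproduces mean = 9.86, s of about 0.386, t of about -1.15 on 9 degrees of
# freedom; the two-sided p-value comes out around 0.28, well above 0.05
```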
Sample size (\(n\)) plays a critical role in hypothesis testing and the reliability of our inferences:
In this case: The sample mean (9.86) is slightly below 10, but with only n=10 and low variability, the difference is not statistically significant. A larger sample (e.g., n=50) showing the same mean difference would likely yield a much smaller p-value and rejection of H₀.
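To make that point concrete, the sketch below holds the observed mean difference and standard deviation fixed and varies only the (hypothetical) sample size:

```r
# Hypothetical: keep the observed difference (-0.14) and SD (0.386) fixed,
# and watch what happens to the p-value as the sample size grows
s <- 0.386
mean_diff <- 9.86 - 10
for (n in c(10, 50)) {
  t_stat <- mean_diff / (s / sqrt(n))
  p_val  <- 2 * pt(-abs(t_stat), df = n - 1)
  cat(sprintf("n = %3d: t = %5.2f, p = %.3f\n", n, t_stat, p_val))
}
# With n = 10 the p-value stays well above 0.05; with n = 50 the same
# difference would fall below the 0.05 threshold
```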
A product analytics team conducts an A/B test to compare the average session duration (minutes) between two versions of a landing page.
| Version | Sample Size (n) | Mean | Standard Deviation |
|---|---|---|---|
| A | 25 | 4.8 | 1.2 |
| B | 25 | 5.4 | 1.4 |
Null Hypothesis (H₀):
\[H_0 : \mu_A = \mu_B\]
Alternative Hypothesis (H₁):
\[H_1 : \mu_A \neq \mu_B\]
Test Selection: Two-Sample Independent T-Test (Welch’s t-test)
Justification:
Comparing two independent groups (Version A vs Version B)
Population standard deviations unknown (only sample SDs provided: 1.2 and 1.4)
Sample sizes are moderate (both n = 25), so we rely on the t-distribution rather than a large-sample z approximation
Equal-variances check: the sample standard deviations (1.2 and 1.4) are similar, but Welch’s t-test does not assume equal variances, so no pooled-variance check is required
To compare the two groups, we calculate how much the difference in means (\(5.4 - 4.8 = 0.6\)) stands out against the combined “noise” (standard error) of both groups.
Calculate the Standard Error (\(SE\))
Plugging the sample standard deviations and sizes (\(n_A = n_B = 25\)) into the standard-error formula:
\[SE = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}\] \[SE = \sqrt{\frac{1.2^2}{25} + \frac{1.4^2}{25}} = \sqrt{\frac{1.44}{25} + \frac{1.96}{25}} \\ = \sqrt{0.0576 + 0.0784} \approx \mathbf{0.369}\]
Calculate the T-Statistic
\[t = \frac{\bar{x}_B - \bar{x}_A}{SE}\\ = \frac{5.4 - 4.8}{0.369} = \frac{0.6}{0.369} \approx \mathbf{1.626}\]
Determine the P-Value
Using the degrees of freedom (\(df \approx 48\), from the simplified \(n_A + n_B - 2\)), the two-tailed p-value for \(t \approx 1.626\) is approximately 0.110.
Comparison: \(p \text{-value} (0.110) > \alpha (0.05)\).
Decision: Fail to Reject the Null Hypothesis (\(H_0\)).
Reasoning: Although Version B had a higher average (5.4 minutes) than Version A (4.8 minutes), the p-value tells us that if there were truly no difference between the two versions, a gap this large would appear about 11% of the time through random variation alone. Since 11% is higher than our 5% threshold, we cannot claim the result is “statistically significant.”
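The full calculation can be reproduced in R from the summary statistics in the table above (a sketch; the variable names are ours, and the exact Welch degrees of freedom come out close to the simplified 48 used above):

```r
# Welch two-sample t-test computed from summary statistics (no raw data available)
n_a <- 25; mean_a <- 4.8; sd_a <- 1.2   # Version A
n_b <- 25; mean_b <- 5.4; sd_b <- 1.4   # Version B

se     <- sqrt(sd_a^2 / n_a + sd_b^2 / n_b)          # standard error, ~0.369
t_stat <- (mean_b - mean_a) / se                     # t statistic, ~1.63
df_w   <- (sd_a^2 / n_a + sd_b^2 / n_b)^2 /
  ((sd_a^2 / n_a)^2 / (n_a - 1) + (sd_b^2 / n_b)^2 / (n_b - 1))  # Welch df, ~47
p_val  <- 2 * pt(-abs(t_stat), df_w)                 # two-sided p, ~0.11

round(c(SE = se, t = t_stat, df = df_w, p = p_val), 3)
```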
Version B shows a higher sample mean session duration (5.4 vs. 4.8 minutes, a +12.5% increase), but this difference is not statistically significant at \(\alpha = 0.05\).
Product Implications:
Key Lesson in A/B Testing: Statistical significance is essential before declaring a “winner.” Many promising variants fail to reach significance due to insufficient sample size or small effect sizes.
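To quantify the sample-size point, here is a rough power calculation with the pwr package (the effect size d is derived from the summary statistics above; treat this as an illustrative sketch, not a formal sample-size plan):

```r
library(pwr)

# Standardized effect size from the observed means and a pooled SD
d <- (5.4 - 4.8) / sqrt((1.2^2 + 1.4^2) / 2)   # about 0.46

# Sample size per version needed to detect d with 80% power at alpha = 0.05
pwr.t.test(d = d, sig.level = 0.05, power = 0.80,
           type = "two.sample", alternative = "two.sided")
# Suggests roughly 75 users per version, versus the 25 per version collected here
```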
An e-commerce company examines whether device type is associated with payment method preference.
| Device / Payment | E-Wallet | Credit Card | Cash on Delivery |
|---|---|---|---|
| Mobile | 120 | 80 | 50 |
| Desktop | 60 | 90 | 40 |
Null Hypothesis (H₀):
\[H_0 : \text{Device type and payment} \\ \text{method are independent}\]
Alternative Hypothesis (H₁):
\[H_1 : \text{Device type and payment}\\ \text{method are dependent}\]
Test Selection: Pearson’s Chi-Square Test of Independence
Justification:
Two categorical variables:
Device type (Mobile, Desktop) - 2 categories
Payment method (E-Wallet, Credit Card, Cash on Delivery) - 3 categories
Independent observations: Each user contributes to only one cell
Expected frequencies ≥ 5: We’ll verify this during calculation
Goal: Test association/independence between two categorical variables
To find the test statistic, we compare the Observed values (the data we have) to the Expected values (what the data would look like if there were no relationship).
library(knitr)
data_tabel <- data.frame(
  Payment_Method = c("E-Wallet", "Credit Card", "Cash on Delivery", "Column Total"),
  Mobile = c(120, 80, 50, 250),
  Desktop = c(60, 90, 40, 190),
  Row_Total = c(180, 170, 90, 440)
)
colnames(data_tabel) <- c("Payment Method", "Mobile", "Desktop", "Row Total")
kable(data_tabel,
caption = "Contingency Table: Device Type vs Payment Method",
align = "lccc") # l=left, c=center
| Payment Method | Mobile | Desktop | Row Total |
|---|---|---|---|
| E-Wallet | 120 | 60 | 180 |
| Credit Card | 80 | 90 | 170 |
| Cash on Delivery | 50 | 40 | 90 |
| Column Total | 250 | 190 | 440 |
Using the formula
\[E = \frac{(\text{Row Total} \times \text{Column Total})}{\text{Grand Total}}\]
Mobile + E-Wallet: \((250 \times 180) / 440 = \mathbf{102.27}\)
Mobile + Credit Card: \((250 \times 170) / 440 = \mathbf{96.59}\)
Mobile + Cash (COD): \((250 \times 90) / 440 = \mathbf{51.14}\)
Desktop + E-Wallet: \((190 \times 180) / 440 = \mathbf{77.73}\)
Desktop + Credit Card: \((190 \times 170) / 440 = \mathbf{73.41}\)
Desktop + Cash (COD): \((190 \times 90) / 440 = \mathbf{38.86}\)
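These six expected counts can be verified in R with one line of matrix arithmetic (a small sketch; `observed` holds the counts with devices as rows, matching the original contingency table):

```r
# Observed counts with devices as rows (same layout as the original table)
observed <- matrix(c(120, 80, 50,
                     60, 90, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Device  = c("Mobile", "Desktop"),
                                   Payment = c("E-Wallet", "Credit Card", "Cash on Delivery")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 2)   # reproduces the six values above; all are comfortably >= 5
```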
Formula:
\[\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\]
Calculation:
\[\begin{aligned} \chi^2 &= \frac{(120-102.27)^2}{102.27} + \frac{(80-96.59)^2}{96.59} + \frac{(50-51.14)^2}{51.14} \\ &\quad + \frac{(60-77.73)^2}{77.73} + \frac{(90-73.41)^2}{73.41} + \frac{(40-38.86)^2}{38.86} \\[10pt] &= \frac{17.73^2}{102.27} + \frac{(-16.59)^2}{96.59} + \frac{(-1.14)^2}{51.14} \\ &\quad + \frac{(-17.73)^2}{77.73} + \frac{16.59^2}{73.41} + \frac{1.14^2}{38.86} \\[10pt] &= \frac{314.35}{102.27} + \frac{275.23}{96.59} + \frac{1.30}{51.14} \\ &\quad + \frac{314.35}{77.73} + \frac{275.23}{73.41} + \frac{1.30}{38.86} \\[10pt] &= 3.074 + 2.849 + 0.025 + 4.044 + 3.749 + 0.033 \\ &= 13.774 \end{aligned}\]\[\chi^2 = 13.774\]
Degrees of freedom:
\[df = (2 - 1) \times (3 - 1) = 1 \times 2 = 2\]
Compare to \(\alpha = 0.05\): for \(\chi^2 = 13.774\) with \(df = 2\), the p-value is approximately 0.001, far below 0.05.
Decision: Reject the null hypothesis (H₀). There is strong evidence at the 5% significance level of an association between device type and payment method preference. (Alternatively, using critical value: For \(\alpha = 0.05\), df=2, critical \(\chi^2 = 5.991\). Observed 13.77 > 5.991 → reject H₀.)
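The same decision follows from a single call to R’s built-in `chisq.test` (a self-contained sketch that repeats the observed counts):

```r
# Pearson's chi-square test of independence on the 2 x 3 contingency table
observed <- matrix(c(120, 80, 50,
                     60, 90, 40),
                   nrow = 2, byrow = TRUE)
chisq.test(observed)
# Reports X-squared of about 13.77 on 2 degrees of freedom, p-value of about 0.001
```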
Strategy:
A fintech startup tests whether a new fraud detection algorithm reduces fraudulent transactions.
A Type I error occurs when we reject the null hypothesis when it is actually true. This is a false positive or false alarm.
In the Fraud Detection Context: the team concludes that the new algorithm reduces fraudulent transactions when, in reality, it performs no better than the existing system.
Practical Example:
The startup conducts a test, analyzes the data, and finds a statistically significant reduction in fraudulent transactions (p < 0.05). They implement the new algorithm company-wide, but in reality, the algorithm is no better than the old one. They’ve wasted resources implementing an ineffective solution.
Mathematical Representation:
\[\alpha = P(\text{Reject } H_0 \mid H_0 \text{ is true})\]
Common α level: 0.05 (5% risk of false positive)
A Type II error occurs when we fail to reject the null hypothesis when it is actually false. This is a false negative or missed detection.
In the Fraud Detection Context: the team concludes that the new algorithm provides no improvement when, in reality, it genuinely reduces fraud.
Practical Example: The startup tests the algorithm, finds no statistically significant improvement (p > 0.05), and decides to abandon it. However, the algorithm actually works and could have significantly reduced fraud losses. They’ve missed an opportunity to improve their system.
Mathematical Representation:
\[\beta = P(\text{Fail to reject } H_0 \mid H_1 \text{ is true})\]
Power: 1 - \(\beta\) (probability of correctly detecting a real effect)
| Error Type | Financial Cost | Reputational Cost | Operational Cost |
|---|---|---|---|
| Type I | Development & implementation costs; maintenance of an unnecessary system; opportunity cost | Loss of trust in results; credibility damage | Wasted engineering resources; added system complexity |
| Type II | Continued fraud losses; lost revenue from fraud; regulatory fines | Perception of being behind competitors; customer dissatisfaction | Missed efficiency gains; use of a suboptimal system |
Business Perspective Assessment for a Fintech Startup:
Conclusion: In this specific context, a Type II error is likely more costly, because continued fraud losses, lost revenue, regulatory fines, and reputational damage generally outweigh the one-time cost of deploying and maintaining an algorithm that turns out to be ineffective.
However, this depends on factors such as the scale of current fraud losses, the cost of implementing and maintaining the new system, and the regulatory environment the startup operates in.
General Rule: In safety-critical or high-risk domains (medicine, fraud detection, security), Type II errors are often more dangerous.
# R code to demonstrate sample size effect on power
library(pwr)
# Fixed parameters: effect size d = 0.5, alpha = 0.05
effect_size <- 0.5
alpha <- 0.05
# Calculate power for different sample sizes
sample_sizes <- c(10, 20, 50, 100, 200)
powers <- sapply(sample_sizes, function(n) {
pwr.t.test(n = n, d = effect_size, sig.level = alpha,
type = "one.sample")$power
})
data.frame(Sample_Size = sample_sizes,
Power = round(powers, 3),
Type_II_Error = round(1 - powers, 3))
Practical Implications for the Startup:
Recommendation:
Sample size (\(n\)), the significance level (\(\alpha\)), and power (\(1 - \beta\)) are a “balancing act”: increasing \(n\) raises power, while tightening \(\alpha\) (fewer false positives) lowers power for a fixed sample size.
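The trade-off can be seen directly by extending the pwr example above (same assumed effect size d = 0.5; the α values are illustrative):

```r
library(pwr)

# Hold n = 50 and d = 0.5 fixed; tightening alpha reduces power (raises beta)
alphas <- c(0.10, 0.05, 0.01)
powers <- sapply(alphas, function(a)
  pwr.t.test(n = 50, d = 0.5, sig.level = a, type = "one.sample")$power)

data.frame(Alpha = alphas,
           Power = round(powers, 3),
           Type_II_Error = round(1 - powers, 3))
```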
A churn prediction model evaluation yields the following results: a test statistic of \(Z = 2.31\), a corresponding two-tailed p-value of 0.021, and a significance level of \(\alpha = 0.05\).
Most students memorize that a p-value is a probability, but few understand what it actually measures: the compatibility of the data with the Null Hypothesis.
Imagine the “Null World”—a world where our churn model is a total fraud and has zero predictive power. In that world, any success we see is just “the luck of the draw.”
The p-value of 0.021 tells us that if we lived in that Null World, the chance of seeing a result as strong as ours is only 2.1%. Because that probability is so low, we conclude that we likely do not live in the Null World. The model is likely doing something “real.”
In statistics, we don’t say “The model works.” We say “The evidence is strong enough to reject the idea that it doesn’t work.” This is why we use the Significance Level (\(\alpha\)).
\(\alpha\) is our “tolerance for being wrong.” By setting \(\alpha = 0.05\), we are saying: “I am willing to accept a 5% risk of being a False Positive (accusing the Null of being false when it’s actually true).”
Decision Logic: Since \(0.021 < 0.05\), the “weight” of our evidence is heavier than our “risk threshold.” Official Decision: REJECT \(H_0\). We have moved from the land of “maybe” into the land of “statistically significant.”
When you talk to a CEO, they don’t want to hear about Z-scores. They want to hear about Reliability.
The Refined Sentence: “We have validated the Churn Prediction Model against a 95% confidence standard. If the model had no real predictive power, results this strong would appear by chance only about 2% of the time. From a strategic standpoint, this model is now ‘Production-Ready’ for our retention campaigns.”
A p-value is only as good as the data it was built on. This is the Representative Sampling Principle.
If our training data only included “Premium Users,” but we are applying the model to “Free Tier Users,” our p-value is a lie. This is known as Selection Bias.
The Danger: If the sample isn’t a “mini-me” of the entire population, your inferential results will not generalize. You might have a “significant” result that only works for a tiny, specific group of people, leading to a massive failure when the model is launched globally.
This is the most common pitfall for junior analysts.
Always pair your p-value with an Effect Size metric. A result can be statistically significant (real) but practically insignificant (too small to care about).
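A small simulation makes the distinction concrete (all numbers below are hypothetical and unrelated to the churn model): with a large enough sample, even a negligible effect can clear the significance threshold.

```r
# Hypothetical example: a tiny true effect and a very large sample
set.seed(42)
n <- 100000
group_a <- rnorm(n, mean = 50.0, sd = 10)   # baseline metric
group_b <- rnorm(n, mean = 50.2, sd = 10)   # +0.2 points: a trivial lift

result   <- t.test(group_b, group_a)        # Welch two-sample t-test
cohens_d <- (mean(group_b) - mean(group_a)) / 10   # standardized effect size

result$p.value   # typically far below 0.05: "statistically significant"
cohens_d         # around 0.02: far too small to matter in practice
```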
Use the code below to turn the raw test statistic into a final decision. This automates the “Z-table” look-up process.
# Input Variables
z_score <- 2.31
alpha_threshold <- 0.05
# 1. Calculate P-value (Two-Tailed)
# pnorm(z) finds the area to the left. 1-pnorm(z) finds the area to the right.
# We multiply by 2 because we are testing for a difference in 'either' direction.
calculated_p <- 2 * (1 - pnorm(abs(z_score)))
# 2. Display the Results with Professional Formatting
cat("--- CHURN MODEL EVALUATION REPORT ---\n")
## --- CHURN MODEL EVALUATION REPORT ---
cat("Test Statistic (Z): ", z_score, "\n")
## Test Statistic (Z): 2.31
cat("P-Value calculated: ", round(calculated_p, 4), "\n")
## P-Value calculated: 0.0209
cat("Alpha Threshold: ", alpha_threshold, "\n")
## Alpha Threshold: 0.05
cat("-------------------------------------\n")
## -------------------------------------
# 3. The 'Decision Engine'
if (calculated_p <= alpha_threshold) {
print("CONCLUSION: STATISTICALLY SIGNIFICANT. REJECT NULL.")
print("STRATEGY: IMPLEMENT MODEL.")
} else {
print("CONCLUSION: NOT SIGNIFICANT. FAIL TO REJECT NULL.")
print("STRATEGY: RE-EVALUATE DATA SOURCE.")
}
## [1] "CONCLUSION: STATISTICALLY SIGNIFICANT. REJECT NULL."
## [1] "STRATEGY: IMPLEMENT MODEL."