
Fityanandra Athar Adyaksa (52250059)


Data Science students at

Enthusiastic about learning

December 28, 2025




Case Study 1

One-Sample Z-Test (Statistical Hypotheses)

A digital learning platform claims that the average daily study time of its users is 120 minutes. Based on historical records, the population standard deviation is known to be 15 minutes.

A random sample of 64 users shows an average study time of 116 minutes.

\[ \begin{aligned} \mu_0 &= 120 \\ \sigma &= 15 \\ n &= 64 \\ \bar{x} &= 116 \end{aligned} \]

Tasks

  1. Formulate the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
  2. Identify the appropriate statistical test and justify your choice.
  3. Compute the test statistic and p-value using \(\alpha = 0.05\).
  4. State the statistical decision.
  5. Interpret the result in a business analytics context.



1. Formulating Hypotheses

In inference, we start by defining what we are testing against.

  • Null Hypothesis (\(H_0\)): \(\mu = 120\)

    • Meaning: The average daily study time is exactly what the platform claims (120 minutes). There is no “real” difference; any variation in our sample is just random luck.
  • Alternative Hypothesis (\(H_1\)): \(\mu \neq 120\)

    • Meaning: The average daily study time differs from 120 minutes. This is a two-tailed test because we are checking for a difference in either direction (more or less).


2. Appropriate Statistical Test

The appropriate test is the One-Sample Z-Test.
  • Justification:
    1. Known \(\sigma\): We are explicitly given the population standard deviation (\(\sigma = 15\)).
    2. Normality/Sample Size: The sample size (\(n = 64\)) is greater than 30. According to the Central Limit Theorem, the sampling distribution of the mean will be approximately normal regardless of the underlying population distribution.


3. Computing Test Statistic and P-value

We need to see how many “standard errors” our sample mean (\(\bar{x} = 116\)) sits away from the claimed mean (\(\mu_0 = 120\)).

  • Calculate the Standard Error (\(SE\)): \[SE = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{64}} = \frac{15}{8} = 1.875\]
  • Calculate the Z-Statistic: \[Z = \frac{\bar{x} - \mu_0}{SE} = \frac{116 - 120}{1.875} = \frac{-4}{1.875} \approx -2.13\]
  • Find the P-Value using a standard normal distribution table for \(Z = -2.13\):

    • The area to the left of \(-2.13\) is approximately 0.0166.
    • Since this is a two-tailed test, we multiply by 2: \(0.0166 \times 2 = \mathbf{0.0332}\).
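
The calculation above can be reproduced in R as a quick check. This is a minimal sketch using only base R; the variable names are illustrative and not part of the original report. Because it uses the unrounded z-value, the p-value comes out at about 0.033, marginally below the table-based 0.0332.

```r
# Inputs taken from the case study
mu0   <- 120   # claimed population mean (minutes)
sigma <- 15    # known population standard deviation (minutes)
n     <- 64    # sample size
xbar  <- 116   # observed sample mean (minutes)

se <- sigma / sqrt(n)               # standard error of the mean
z  <- (xbar - mu0) / se             # z-statistic
p_two_sided <- 2 * pnorm(-abs(z))   # two-tailed p-value from the standard normal

cat("SE =", se, " z =", round(z, 4), " p =", round(p_two_sided, 4), "\n")
```

Using `-abs(z)` makes the two-tailed calculation work regardless of the sign of the statistic.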


4. Statistical Decision

We compare our p-value to our significance level (\(\alpha = 0.05\)):

  • Comparison: \(0.0332 < 0.05\)
  • Decision: Reject the Null Hypothesis (\(H_0\)).


5. Business Analytics Interpretation

From a business perspective, the platform’s claim that users study for 120 minutes is statistically unsupported by this data.

The sample mean of 116 minutes is low enough that it is unlikely to have happened by random chance if the true average were 120. As a data analyst, you would advise the marketing or product team that their “120-minute” claim is likely an overestimation and should be revised to reflect actual user behavior more accurately.




Case Study 2

One-Sample T-Test (σ Unknown, Small Sample)

A UX Research Team investigates whether the average task completion time of a new application differs from 10 minutes.

The following data are collected from 10 users:

\[ 9.2,\; 10.5,\; 9.8,\; 10.1,\; 9.6,\; 10.3,\; 9.9,\; 9.7,\; 10.0,\; 9.5 \]

Tasks

  1. Define H₀ and H₁ (two-tailed).
  2. Determine the appropriate hypothesis test.
  3. Calculate the t-statistic and p-value at \(\alpha = 0.05\).
  4. Make a statistical decision.
  5. Explain how sample size affects inferential reliability.



1. Hypotheses Formulation

Null Hypothesis (H₀):

H₀: μ = 10 minutes

  • The average task completion time for the new application is 10 minutes

  • No difference from the target benchmark

Alternative Hypothesis (H₁):

H₁: μ ≠ 10 minutes

  • The average task completion time differs from 10 minutes

  • This is a two-tailed test because we’re checking for any difference (faster or slower)



2. Appropriate Hypothesis Test

Test Selection: One-Sample t-Test

Justification:

  1. Parameter of interest: Population mean (μ)

  2. Population standard deviation: Unknown (we only have sample data)

  3. Sample size: Small (n = 10 < 30)

  4. Conditions check:

    • Random sample (assumed from “10 users”)
    • Independence (reasonable to assume one user’s completion time doesn’t affect another’s)
    • Normality assumption needed (with small n, we assume the population of completion times is approximately normal)

Why t-test instead of z-test?

  • σ is unknown
  • Small sample size (n < 30)
  • We must estimate σ using the sample standard deviation (s)


3. Calculation

Data: 9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5
  1. Find the Sample Mean (\(\bar{x}\))

\[\begin{aligned} \bar{x} &= \frac{\sum_{i=1}^{n} x_i}{n} \\ &= \frac{98.6}{10} \\ &= 9.86 \text{ minutes} \end{aligned}\]

  2. Find the Sample Standard Deviation (\(s\)): sum the squared differences from the mean, divide by \(n - 1\), and take the square root:
  • \(\sum(x_i - \bar{x})^2 = 1.344\).
  • \(s = \sqrt{\frac{1.344}{10 - 1}} \approx \mathbf{0.386}\)
  3. Calculate the T-statistic

\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{9.86 - 10}{0.386 / \sqrt{10}} = \frac{-0.14}{0.122} \approx \mathbf{-1.15}\]

  4. Determine the P-Value using the T-distribution table with Degrees of Freedom (\(df\)) = 9 (which is \(n-1\)):
  • For \(t = -1.15\) (two-tailed), the p-value is approximately 0.28.
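
The same steps can be checked in R. The sketch below computes the statistic by hand and then cross-checks it against base R's `t.test()`; the variable names (`times`, `t_stat`, and so on) are illustrative choices, not part of the original report.

```r
# Task completion times (minutes) from the case study
times <- c(9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5)
mu0   <- 10                      # benchmark mean under H0

n      <- length(times)
xbar   <- mean(times)            # sample mean
s      <- sd(times)              # sample SD (divides by n - 1)
t_stat <- (xbar - mu0) / (s / sqrt(n))
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)   # two-tailed p-value

# Cross-check with the built-in one-sample t-test
res <- t.test(times, mu = mu0, alternative = "two.sided")

cat("mean =", xbar, " s =", round(s, 3),
    " t =", round(t_stat, 3), " p =", round(p_val, 3), "\n")
```

Both routes should agree, since `t.test()` applies the same formula with \(df = n - 1\).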


4. Statistical Decision

  • Comparison: \(p \text{-value} (0.28) > \alpha (0.05)\).
  • Decision: Fail to Reject the Null Hypothesis (\(H_0\)).
  • Interpretation: There is not enough evidence to say the average completion time differs significantly from 10 minutes. The difference we observed (9.86 vs 10) is likely due to random sampling variation.


5. Sample Size and Inferential Reliability

Sample size (\(n\)) plays a critical role in hypothesis testing and the reliability of our inferences:

  • Larger sample size:
    • Reduces the standard error (\(s / \sqrt{n}\)), making the t-statistic larger in magnitude for the same deviation from \(\mu_0\).
    • Increases statistical power (ability to detect real differences if they exist; reduces Type II error risk).
    • Makes the test more sensitive — even small deviations from H₀ can become statistically significant.
    • Improves reliability: Conclusions are more precise and generalizable.
  • Small sample size (like n=10 here):
    • Larger standard error → smaller |t|-statistic → higher p-value → harder to reject H₀.
    • Lower power: Higher chance of missing real effects (Type II error).
    • Wider confidence intervals → less precise estimates.
    • Less reliable inference: Results are more variable and sensitive to outliers.

In this case: The sample mean (9.86) is slightly below 10, but with only n=10 the standard error (about 0.122) is large relative to the 0.14-minute gap, so the difference is not statistically significant. A larger sample (e.g., n=50) showing the same mean and standard deviation would yield a much smaller p-value and rejection of H₀.
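
To illustrate the effect of sample size, the sketch below holds the sample mean (9.86) and standard deviation (0.386) fixed and recomputes the two-tailed p-value for several sample sizes. The sizes 30, 50, and 100 are hypothetical, and `p_for_n` is an illustrative helper; real samples of those sizes would not reproduce exactly the same mean and standard deviation.

```r
# Summary values from Case Study 2
xbar <- 9.86
mu0  <- 10
s    <- 0.386

# Two-tailed p-value for a one-sample t-test at a given sample size,
# assuming the same mean and SD were observed
p_for_n <- function(n) {
  t_stat <- (xbar - mu0) / (s / sqrt(n))
  2 * pt(-abs(t_stat), df = n - 1)
}

sizes    <- c(10, 30, 50, 100)         # hypothetical sample sizes
p_values <- sapply(sizes, p_for_n)

print(data.frame(n = sizes, p_value = round(p_values, 4)))
```

Under these assumptions the p-value shrinks steadily as n grows and falls below 0.05 by n = 50.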




Case Study 3

Two-Sample T-Test (A/B Testing)

A product analytics team conducts an A/B test to compare the average session duration (minutes) between two versions of a landing page.

Version   Sample Size (n)   Mean (minutes)   Standard Deviation (minutes)
A         25                4.8              1.2
B         25                5.4              1.4

Tasks

  1. Formulate the null and alternative hypotheses.
  2. Identify the type of t-test required.
  3. Compute the test statistic and p-value.
  4. Draw a statistical conclusion at \(\alpha = 0.05\).
  5. Interpret the result for product decision-making.



1. Hypotheses Formulation

Null Hypothesis (H₀):

\[\begin{aligned} H_0 : \mu_A = \mu_B \end{aligned}\]
  • The average session duration is equal for both landing page versions
  • No difference in user engagement between versions A and B

Alternative Hypothesis (H₁):

\[\begin{aligned} H_1 : \mu_A \neq \mu_B \end{aligned}\]
  • The average session duration differs between the two versions
  • Version A and B have different levels of user engagement
  • Note: This is a two-tailed test as we’re checking for any difference


2. Type of T-Test Required

Test Selection: Two-Sample Independent T-Test (Welch’s t-test)

Justification:

  1. Comparing two independent groups (Version A vs Version B)

  2. Population standard deviations unknown (only sample SDs provided: 1.2 and 1.4)

  3. Sample sizes are equal (both n = 25), but not necessarily large enough for z-test

  4. Equal variances assumption check needed:

  • We need to decide between Student’s t-test (equal variances) and Welch’s t-test (unequal variances)
  • Sample standard deviations: s_A = 1.2, s_B = 1.4
  • Ratio: (1.4²)/(1.2²) = 1.96/1.44 ≈ 1.36 < 2 (often used as a rule of thumb)
  • We’ll use Welch’s t-test as it’s more robust and doesn’t assume equal variances


3. Compute the Test Statistic and P-Value

To compare the two groups, we calculate how much the difference in means (\(5.4 - 4.8 = 0.6\)) stands out against the combined “noise” (standard error) of both groups.

Calculate the Standard Error (\(SE\))

Welch’s standard error combines each group’s variance scaled by its own sample size (here both \(n = 25\)):

\[\begin{aligned} SE &= \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} \\ &= \sqrt{\frac{1.2^2}{25} + \frac{1.4^2}{25}} = \sqrt{\frac{1.44}{25} + \frac{1.96}{25}} \\ &= \sqrt{0.0576 + 0.0784} \approx \mathbf{0.369} \end{aligned}\]

Calculate the T-Statistic

\[\begin{aligned} t &= \frac{\bar{x}_B - \bar{x}_A}{SE} \\ &= \frac{5.4 - 4.8}{0.369} = \frac{0.6}{0.369} \approx \mathbf{1.626} \end{aligned}\]

Determine the P-Value

Welch’s t-test uses the Welch-Satterthwaite degrees of freedom, which come to about 46.9 for these data; the simplified \(n_A + n_B - 2 = 48\) gives practically the same p-value:

  • For \(t = 1.626\), the two-tailed p-value is approximately 0.110.
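
Since only summary statistics are available, `t.test()` cannot be called on raw data; the sketch below instead computes Welch’s statistic directly from the table values. The variable names are illustrative. It also compares the Welch-Satterthwaite degrees of freedom with the simplified \(n_A + n_B - 2\).

```r
# Summary statistics from the A/B test table
n_a <- 25; mean_a <- 4.8; sd_a <- 1.2
n_b <- 25; mean_b <- 5.4; sd_b <- 1.4

v_a <- sd_a^2 / n_a                 # variance of the mean, version A
v_b <- sd_b^2 / n_b                 # variance of the mean, version B
se  <- sqrt(v_a + v_b)              # Welch standard error
t_stat <- (mean_b - mean_a) / se

# Welch-Satterthwaite degrees of freedom
df_welch <- (v_a + v_b)^2 / (v_a^2 / (n_a - 1) + v_b^2 / (n_b - 1))

p_welch  <- 2 * pt(-abs(t_stat), df = df_welch)          # Welch df
p_simple <- 2 * pt(-abs(t_stat), df = n_a + n_b - 2)     # simplified df = 48

cat("SE =", round(se, 3), " t =", round(t_stat, 3),
    " df =", round(df_welch, 1),
    " p (Welch) =", round(p_welch, 3),
    " p (df = 48) =", round(p_simple, 3), "\n")
```

With equal sample sizes and similar standard deviations, the two choices of degrees of freedom lead to the same conclusion.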


4. Draw a Statistical Conclusion

  • Comparison: \(p \text{-value} (0.110) > \alpha (0.05)\).

  • Decision: Fail to Reject the Null Hypothesis (\(H_0\)).

  • Reasoning: Although Version B had a higher average (5.4 minutes) than Version A (4.8 minutes), the p-value tells us that if the two versions truly had the same mean, a difference at least this large would appear in about 11% of samples. Since 11% is higher than our 5% threshold, we cannot claim the result is “statistically significant.”



5. Product Decision-Making Interpretation

Version B shows a higher sample mean session duration (5.4 vs. 4.8 minutes, a +12.5% increase), but this difference is not statistically significant at \(\alpha = 0.05\).

Product Implications:

  • We cannot confidently claim that Version B improves user engagement over Version A based on this data.
  • The observed improvement could be due to random variation rather than a true effect.
  • Recommendation: Do not roll out Version B universally yet. Options include:
    • Continue the test with a larger sample size to increase statistical power and potentially detect a real difference.
    • Consider a one-tailed test in future if the team has a strong prior belief that B should perform better and fixes that direction before collecting data (this would make rejection easier).
    • Explore practical significance: Even if significant, a 0.6-minute increase may not justify development and rollout costs.
    • Segment the data (e.g., by device, traffic source) to check if Version B performs better in specific subgroups.

Key Lesson in A/B Testing: Statistical significance is essential before declaring a “winner.” Many promising variants fail to reach significance due to insufficient sample size or small effect sizes.




Case Study 4

Chi-Square Test of Independence

An e-commerce company examines whether device type is associated with payment method preference.

Device / Payment   E-Wallet   Credit Card   Cash on Delivery
Mobile             120        80            50
Desktop            60         90            40

Tasks

  1. State the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).
  2. Identify the appropriate statistical test.
  3. Compute the Chi-Square statistic (χ²).
  4. Determine the p-value at \(\alpha = 0.05\).
  5. Interpret the results in terms of digital payment strategy.



1. Hypotheses Formulation

Null Hypothesis (H₀):

\[H_0 : \text{Device type and payment method are independent}\]

  • There is no association between the type of device used and payment method preference
  • Any observed differences in the contingency table are due to random chance

Alternative Hypothesis (H₁):

\[H_1 : \text{Device type and payment method are dependent}\]

  • There is an association between device type and payment method
  • Payment preferences differ between mobile and desktop users


2. Appropriate Statistical Test

Test Selection: Pearson’s Chi-Square Test of Independence

Justification:

  1. Two categorical variables:

    • Device type (Mobile, Desktop) - 2 categories

    • Payment method (E-Wallet, Credit Card, Cash on Delivery) - 3 categories

  2. Independent observations: Each user contributes to only one cell

  3. Expected frequencies ≥ 5: We’ll verify this during calculation

  4. Goal: Test association/independence between two categorical variables

3. Compute the Chi-Square Statistic (\(\chi^2\))

To find the test statistic, we compare the Observed values (the data we have) to the Expected values (what the data would look like if there were no relationship).

  1. Calculate Row and Column Totals
library(knitr)

# Rows are payment methods and columns are devices
# (the transpose of the table in the problem statement)
data_tabel <- data.frame(
  Payment = c("E-Wallet", "Credit Card", "Cash on Delivery", "Device Total"),
  Mobile = c(120, 80, 50, 250),
  Desktop = c(60, 90, 40, 190),
  Payment_Total = c(180, 170, 90, 440)
)

# Human-readable headers for the rendered table
colnames(data_tabel) <- c("Payment Method", "Mobile", "Desktop", "Payment Total")

kable(data_tabel,
      caption = "Contingency Table: Device Type vs Payment Method",
      align = "lccc") # l = left, c = center

Contingency Table: Device Type vs Payment Method
Payment Method     Mobile   Desktop   Payment Total
E-Wallet           120      60        180
Credit Card        80       90        170
Cash on Delivery   50       40        90
Device Total       250      190       440


  2. Calculate Expected Frequencies (\(E\))

Using the formula

\(E = \frac{(\text{Row Total} \times \text{Column Total})}{\text{Grand Total}}\)

  • Mobile + E-Wallet: \((250 \times 180) / 440 = \mathbf{102.27}\)

  • Mobile + Credit Card: \((250 \times 170) / 440 = \mathbf{96.59}\)

  • Mobile + Cash (COD): \((250 \times 90) / 440 = \mathbf{51.14}\)

  • Desktop + E-Wallet: \((190 \times 180) / 440 = \mathbf{77.73}\)

  • Desktop + Credit Card: \((190 \times 170) / 440 = \mathbf{73.41}\)

  • Desktop + Cash (COD): \((190 \times 90) / 440 = \mathbf{38.86}\)
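
The six expected counts can be generated in one step with an outer product of the device totals and payment-method totals. This is a minimal sketch in base R; the object names `observed` and `expected` are illustrative. It also checks the "expected frequencies ≥ 5" condition mentioned in the justification.

```r
# Observed counts: rows are devices, columns are payment methods
observed <- matrix(c(120, 80, 50,
                      60, 90, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(
                     Device  = c("Mobile", "Desktop"),
                     Payment = c("E-Wallet", "Credit Card", "Cash on Delivery")))

# E = (row total x column total) / grand total, for every cell at once
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

print(round(expected, 2))
print(all(expected >= 5))   # chi-square rule of thumb: every expected count >= 5
```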


  3. Calculate Chi-Square Statistic

Formula:

\[\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\]

Calculation:

\[\begin{aligned} \chi^2 &= \frac{(120-102.27)^2}{102.27} + \frac{(80-96.59)^2}{96.59} + \frac{(50-51.14)^2}{51.14} \\ &\quad + \frac{(60-77.73)^2}{77.73} + \frac{(90-73.41)^2}{73.41} + \frac{(40-38.86)^2}{38.86} \\[10pt] &= \frac{17.73^2}{102.27} + \frac{(-16.59)^2}{96.59} + \frac{(-1.14)^2}{51.14} \\ &\quad + \frac{(-17.73)^2}{77.73} + \frac{16.59^2}{73.41} + \frac{1.14^2}{38.86} \\[10pt] &= \frac{314.35}{102.27} + \frac{275.23}{96.59} + \frac{1.30}{51.14} \\ &\quad + \frac{314.35}{77.73} + \frac{275.23}{73.41} + \frac{1.30}{38.86} \\[10pt] &= 3.074 + 2.849 + 0.025 + 4.044 + 3.749 + 0.033 \\ &= 13.774 \end{aligned}\]

\[\chi^2 = 13.774\]

Degrees of freedom:

\[df = (2 - 1) \times (3 - 1) = 1 \times 2 = 2\]
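
As a cross-check, base R’s `chisq.test()` reproduces the statistic and degrees of freedom from the observed counts and also reports the p-value that Task 4 asks for. This is a sketch with illustrative object names; `correct = FALSE` is set only to make explicit that no continuity correction is applied (R applies one only to 2×2 tables).

```r
# Observed counts: rows are devices, columns are payment methods
observed <- matrix(c(120, 80, 50,
                      60, 90, 40),
                   nrow = 2, byrow = TRUE)

res <- chisq.test(observed, correct = FALSE)

chi_sq <- unname(res$statistic)   # Pearson chi-square statistic
df     <- unname(res$parameter)   # (rows - 1) * (columns - 1)
p_val  <- res$p.value             # upper-tail probability

# Same p-value computed directly from the chi-square distribution
p_manual <- pchisq(chi_sq, df = df, lower.tail = FALSE)

cat("chi-square =", round(chi_sq, 3), " df =", df,
    " p =", signif(p_val, 3), "\n")
```

The resulting p-value is roughly 0.001, well below \(\alpha = 0.05\).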

Case Study 5

Type I and Type II Errors (Conceptual)

A fintech startup tests whether a new fraud detection algorithm reduces fraudulent transactions.

  • H₀: The new algorithm does not reduce fraud.
  • H₁: The new algorithm reduces fraud.

Tasks

  1. Explain a Type I Error (α) in this context.
  2. Explain a Type II Error (β) in this context.
  3. Identify which error is more costly from a business perspective.
  4. Discuss how sample size affects Type II Error.
  5. Explain the relationship between α, β, and statistical power.

Case Study 6

P-Value and Statistical Decision Making

A churn prediction model evaluation yields the following results:

  • Test statistic = 2.31
  • p-value = 0.021
  • Significance level: \(\alpha = 0.05\)

Tasks

  1. Explain the meaning of the p-value.
  2. Make a statistical decision.
  3. Translate the decision into non-technical language for management.
  4. Discuss the risk if the sample is not representative.
  5. Explain why the p-value does not measure effect size.