CASE STUDIES – STATISTICAL INFERENCE

Basic Statistics – Data Science – Assignment Week 14

INSTITUT TEKNOLOGI SAINS BANDUNG

IDENTITY CARD

Name : Hirose Kawarin Sirait

Student ID : 52250012

Major : Data Science

Lecturer : Mr. Bakti Siregar, M.Sc., CDS.

1 CASE STUDY 1

This analysis aims to evaluate the claim made by a digital learning platform regarding the average daily study time of its users. Using a one-sample statistical hypothesis testing approach, the study examines whether the observed sample data provide sufficient evidence to support or reject the stated population mean. The population standard deviation is assumed to be known, allowing the use of a parametric test under the normality assumption.

1.1 Question

One-Sample Z-Test (Statistical Hypotheses)

Problem Statement

A digital learning platform claims that the average daily study time of its users is 120 minutes.
Based on historical records, the population standard deviation is known to be 15 minutes.

A random sample of 64 users shows an average daily study time of 116 minutes.

Given Information:

\[ \mu_0 = 120 \]

\[ \sigma = 15 \]

\[ n = 64 \]

\[ \bar{x} = 116 \]

Tasks

  1. Formulate the null hypothesis (\(H_0\)) and alternative hypothesis (\(H_1\)).

  2. Identify the appropriate statistical test and justify your choice.

  3. Compute the test statistic and p-value using \(\alpha = 0.05\).

  4. State the statistical decision.

  5. Interpret the result in a business analytics context.

1.2 Answer

1. Formulate the null hypothesis (\(H_0\)) and alternative hypothesis (\(H_1\)).

Let \(\mu\) = the true average daily study time of users (in minutes).

The digital learning platform claims that the average daily study time is 120 minutes.
To test this claim, the hypotheses are formulated as follows:

\[ \begin{aligned} H_0 &: \mu = 120 \\ H_1 &: \mu \neq 120 \end{aligned} \]

This is a two-tailed hypothesis test because the alternative hypothesis tests whether the population mean is different from 120 minutes, without specifying a direction.

2. Appropriate statistical test and explanation.

The parameter of interest is the population mean \(\mu\), and the population standard deviation is known (\(\sigma = 15\) minutes). The sample size is relatively large (\(n = 64\)).

Since the population standard deviation is known and the objective is to test a claim about the population mean, the appropriate statistical test is a one-sample Z-test for the population mean.

This test is suitable because:

  1. The parameter being tested is a population mean.

  2. The population standard deviation is known.

  3. The sample is randomly selected.

  4. The sample size is large (\(n = 64\)), so by the Central Limit Theorem the sampling distribution of the sample mean can be treated as approximately normal.

3. Compute the test statistic and p-value using \(\alpha = 0.05\).

The test statistic for a one-sample Z-test is calculated using the formula:

\[ Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

Given:

  • Sample mean: \(\bar{x} = 116\)

  • Hypothesized mean: \(\mu_0 = 120\)

  • Population standard deviation: \(\sigma = 15\)

  • Sample size: \(n = 64\)

Substituting the values into the formula:

\[ Z = \frac{116 - 120}{15 / \sqrt{64}} \]

\[ Z = \frac{-4}{1.875} \approx -2.13 \]

Since this is a two-tailed test, the p-value is calculated as:

\[ p\text{-value} = 2 \times P(Z \leq -2.13) \]

\[ p\text{-value} \approx 0.033 \]
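The calculation above can be cross-checked with a short script. This is a minimal sketch using only the Python standard library; the function name `one_sample_z_test` is an illustrative choice, not part of the assignment.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-tailed p-value) for a one-sample Z-test with known sigma."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-tailed p-value: twice the lower-tail area below -|z|.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Values given in the problem statement.
z, p_value = one_sample_z_test(sample_mean=116, mu0=120, sigma=15, n=64)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Run as written, it should reproduce \(Z \approx -2.13\) and a p-value of about 0.033.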

4. State the statistical decision

The significance level used in this hypothesis test is \(\alpha = 0.05\).

From the previous calculation, the p-value is approximately 0.033.
Since the p-value is less than the significance level (\(0.033 < 0.05\)), the null hypothesis is rejected.

Therefore, there is sufficient statistical evidence to reject the null hypothesis that the average daily study time of users is 120 minutes.

5. Interpretation of results in the context of business analytics.

Based on the hypothesis testing results, the null hypothesis is rejected at the 5% significance level. This indicates that the true average daily study time differs significantly from the claimed 120 minutes; because the sample mean (116 minutes) falls below the claim, the evidence points to a lower average.

From a business analytics perspective, this suggests that users are not engaging with the platform as long as expected. A lower average study time may negatively impact learning outcomes, user retention, and the overall effectiveness of the platform.

As a result, the company should investigate potential causes such as content quality, platform usability, or user motivation, and implement data-driven strategies to improve user engagement and increase daily study time.

2 CASE STUDY 2

In the field of user experience (UX) research, task completion time is a key metric used to evaluate the efficiency of an application. Understanding whether a new application design meets performance expectations is essential for improving user satisfaction and usability.

In this study, a UX Research Team investigates whether the average task completion time of a newly developed application differs from the expected benchmark of 10 minutes. A sample of task completion times is collected from a small number of users, and statistical inference is required to draw conclusions about the population mean.

Since the population standard deviation is unknown and the sample size is relatively small, a one-sample t-test is applied to test the research hypothesis at a 5% level of significance.

2.1 Question

One-Sample T-Test (σ Unknown, Small Sample)

A UX Research Team investigates whether the average task completion time of a new application differs from 10 minutes.

The following data are collected from 10 users:

9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5

Tasks

  1. Define H₀ and H₁ (two-tailed).

  2. Determine the appropriate hypothesis test.

  3. Calculate the t-statistic and p-value at \(\alpha = 0.05\).

  4. Make a statistical decision.

  5. Explain how sample size affects inferential reliability.

2.2 Answer

1. Define H₀ and H₁ (two-tailed).

Null Hypothesis (H₀):

\[ H_0: \mu = 10 \]

The average task completion time is 10 minutes.

Alternative Hypothesis (H₁):

\[ H_1: \mu \neq 10 \]

The average task completion time differs from 10 minutes.

2. Determine the appropriate hypothesis test.

The appropriate statistical test for this analysis is a one-sample t-test.

This test is chosen because the objective is to test a hypothesis about a population mean, the population standard deviation is unknown, and the sample size is relatively small.

3. Calculate the t-statistic and p-value at \(\alpha = 0.05\).

Sample statistics

| Statistic | Value |
|---|---|
| Sample Size (n) | 10 |
| Sample Mean (x̄) | 9.86 |
| Sample Standard Deviation (s) | 0.39 |

Test Statistic Formula

Since the population standard deviation is unknown and the sample size is small, a one-sample t-test is used. The test statistic is calculated using the formula:

\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

Substitution

Substituting the sample values into the formula:

\[ t = \frac{9.86 - 10}{0.3864 / \sqrt{10}} \approx -1.15 \]

Here \(s \approx 0.3864\) is the unrounded sample standard deviation (0.39 to two decimal places).

p-value Calculation

For a two-tailed test with significance level \(\alpha = 0.05\) and degrees of freedom \(df = n - 1 = 9\), the p-value is calculated as:

\[ p\text{-value} = 2P(T_9 \geq |t|) \approx 0.28 \]

Based on the calculation, the test statistic is \(t \approx -1.15\) with a corresponding two-tailed p-value of approximately 0.28.

4. Statistical decision.

At the significance level \(\alpha = 0.05\), the p-value (0.28) is greater than the significance level. Therefore, we fail to reject the null hypothesis (\(H_0\)).
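The t-statistic and p-value from step 3 can be cross-checked directly from the raw data. The sketch below uses only the Python standard library, which has no built-in t-distribution, so the tail area is approximated by Simpson's rule on the t density; the helper names `t_pdf` and `t_two_tailed_p` are illustrative assumptions.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3  # area between 0 and |t|
    return 1 - 2 * central_area   # both tails beyond |t|


times = [9.2, 10.5, 9.8, 10.1, 9.6, 10.3, 9.9, 9.7, 10.0, 9.5]
n = len(times)
x_bar = mean(times)
s = stdev(times)  # sample standard deviation (n - 1 denominator)
t = (x_bar - 10) / (s / sqrt(n))
p_value = t_two_tailed_p(t, df=n - 1)
print(f"mean = {x_bar:.2f}, s = {s:.4f}, t = {t:.2f}, p-value = {p_value:.2f}")
```

Because the script works with the unrounded standard deviation, its output can differ in the second decimal place from a hand calculation that uses a rounded value of \(s\).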

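5. Explain how sample size affects inferential reliability.

At the 5% significance level, the hypothesis test does not provide sufficient evidence to conclude that the population mean task completion time differs from 10 minutes. The observed deviation (9.86 versus 10 minutes) is consistent with sampling variability.

Sample size determines how much weight this conclusion can bear. The standard error \(s / \sqrt{n}\) shrinks as \(n\) grows, so a larger sample estimates the mean more precisely and gives the test more power to detect a real difference. With only 10 users, the standard error is relatively large and the t-distribution with 9 degrees of freedom has heavy tails, so the test can detect only fairly large departures from 10 minutes. Failing to reject \(H_0\) therefore does not show that the mean equals 10 minutes; it shows only that this small sample cannot distinguish the two.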
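As an illustrative aside (a confidence interval is not one of the assigned tasks), the effect of sample size can be seen in the margin of error of a 95% confidence interval. Using the tabulated critical value \(t_{0.025,\,9} \approx 2.262\) and \(s \approx 0.39\):

\[ \bar{x} \pm t_{0.025,\,9}\,\frac{s}{\sqrt{n}} = 9.86 \pm 2.262 \times \frac{0.39}{\sqrt{10}} \approx 9.86 \pm 0.28 \]

The interval, roughly \((9.58,\ 10.14)\), contains 10, which agrees with failing to reject \(H_0\). Under the hypothetical assumption that a sample of \(n = 40\) users showed the same standard deviation, the critical value would be \(t_{0.025,\,39} \approx 2.023\) and the margin of error would fall to about

\[ 2.023 \times \frac{0.39}{\sqrt{40}} \approx 0.12 \]

so a sample four times larger would give an interval less than half as wide.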

3 CASE STUDY 3

In today’s digital era, user experience on online platforms plays a crucial role in product success. Product analytics teams often conduct A/B testing to compare two versions of a feature or interface to determine which one performs better.

This task focuses on comparing the average session duration between two versions of a landing page, Version A and Version B. Statistical analysis will be used to determine whether the observed difference is significant or simply due to random variation. The results of this analysis will help the product team make data-driven decisions regarding the optimization of the landing page.

3.1 Question

A product analytics team conducts an A/B test to compare the average session duration (minutes) between two versions of a landing page.

| Version | Sample Size (n) | Mean (minutes) | Standard Deviation (minutes) |
|---|---|---|---|
| A | 25 | 4.8 | 1.2 |
| B | 25 | 5.4 | 1.4 |

Task

  1. Formulate the null and alternative hypotheses.

  2. Identify the type of t-test required.

  3. Compute the test statistic and p-value.

  4. Draw a statistical conclusion at α=0.05.

  5. Interpret the result for product decision-making.

3.2 Answer

1. Formulate the null and alternative hypotheses.

We want to test whether there is a significant difference in average session duration between Version A and Version B.

  • Null Hypothesis (H₀): μA = μB
    There is no difference in average session duration between Version A and Version B.

  • Alternative Hypothesis (H₁): μA ≠ μB
    There is a difference in average session duration between Version A and Version B.

2. Identify the type of t-test required.

Since we are comparing the means of two independent samples, and the population variances are unknown and not assumed to be equal (the sample standard deviations are 1.2 and 1.4), the appropriate test is:

  • Two-sample t-test (Welch’s t-test / unequal variance t-test)

3. Compute the test statistic and p-value.

Given:

  • \(n_A = n_B = 25\)
  • \(\bar{X}_A = 4.8\), \(\bar{X}_B = 5.4\)
  • \(s_A = 1.2\), \(s_B = 1.4\)

Step 1: Calculate t-statistic

\[ s_A^2 = 1.2^2 = 1.44, \quad s_B^2 = 1.4^2 = 1.96 \]

\[ SE = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}} = \sqrt{\frac{1.44}{25} + \frac{1.96}{25}} = \sqrt{0.0576 + 0.0784} = \sqrt{0.136} \approx 0.3687 \]

\[ t = \frac{\bar{X}_A - \bar{X}_B}{SE} = \frac{4.8 - 5.4}{0.3687} \approx -1.63 \]

Step 2: Degrees of Freedom (Welch’s method)

\[ df = \frac{(s_A^2/n_A + s_B^2/n_B)^2}{\frac{(s_A^2/n_A)^2}{n_A-1} + \frac{(s_B^2/n_B)^2}{n_B-1}} \approx 47 \]

Step 3: p-value (two-tailed)

\[ p \approx 0.11 \]
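Steps 1 to 3 can be reproduced from the summary statistics alone. The following is a minimal sketch using only the Python standard library; since it has no t-distribution, the tail area is approximated with Simpson's rule, and the function names are illustrative assumptions.

```python
from math import gamma, pi, sqrt


def welch_t_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df) for Welch's unequal-variance t-test from summary statistics."""
    var_a, var_b = sd_a**2 / n_a, sd_b**2 / n_b
    t = (mean_a - mean_b) / sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a**2 / (n_a - 1) + var_b**2 / (n_b - 1))
    return t, df


def t_pdf(x, df):
    """Density of Student's t-distribution (df may be non-integer)."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


# Summary statistics from the A/B test table.
t, df = welch_t_test(4.8, 1.2, 25, 5.4, 1.4, 25)
p_value = t_two_tailed_p(t, df)
print(f"t = {t:.2f}, df = {df:.1f}, p-value = {p_value:.2f}")
```

The script keeps the fractional degrees of freedom (about 46.9) rather than rounding to 47; the difference has no visible effect on the p-value at two decimal places.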

4. Draw a statistical conclusion at α=0.05

  • \(\alpha = 0.05\)
  • \(p \approx 0.11 > 0.05\)

Decision: Fail to reject H₀.

There is not enough statistical evidence to conclude that the average session duration differs between Version A and Version B.

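5. Interpret the result for product decision-making.

  • The analysis does not detect a statistically significant difference in average session duration between Versions A and B; the observed 0.6-minute gap in favor of Version B is within the range that sampling variability could produce.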

  • For product decisions: no strong statistical reason to prefer one version over the other based solely on session duration.

  • The team may consider other metrics (e.g., conversion rates, user satisfaction) before deciding which version to implement.

4 CASE STUDY 4

In the digital commerce environment, understanding customer behavior is crucial for developing effective payment strategies. Different device types, such as mobile and desktop, may influence customers’ preferred payment methods.

Therefore, a statistical analysis is conducted to examine the relationship between device type and payment method preference using the Chi-Square Test of Independence.

4.1 Question

An e-commerce company examines whether device type is associated with payment method preference.

| Device Type | E-Wallet | Credit Card | Cash on Delivery |
|---|---|---|---|
| Mobile | 120 | 80 | 50 |
| Desktop | 60 | 90 | 40 |

Tasks

  1. State the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).

  2. Identify the appropriate statistical test.

  3. Compute the Chi-Square statistic (χ²).

  4. Determine the p-value at α=0.05.

  5. Interpret the results in terms of digital payment strategy.

4.2 Answer

1. State the Null Hypothesis (H₀) and Alternative Hypothesis (H₁).

The hypotheses for this study are formulated as follows:

  • Null Hypothesis (H₀):
    There is no association between device type and payment method preference.

  • Alternative Hypothesis (H₁):
    There is an association between device type and payment method preference.

2. Identify the appropriate statistical test.

The appropriate statistical test for this analysis is the Chi-Square Test of Independence. This test is used to determine whether there is a significant association between two categorical variables, namely device type and payment method preference.

3. Compute the Chi-Square statistic (χ²).

The Chi-Square statistic is calculated using the observed and expected frequencies.

Expected Frequency Formula

\[ E_{ij} = \frac{(\text{Row Total})(\text{Column Total})}{\text{Grand Total}} \]

Expected Frequencies

| Device Type | E-Wallet | Credit Card | Cash on Delivery |
|---|---|---|---|
| Mobile | 102.27 | 96.59 | 51.14 |
| Desktop | 77.73 | 73.41 | 38.86 |

Chi-Square Formula

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

By summing the Chi-Square values from all cells, the Chi-Square statistic is obtained as follows:

\[ \chi^2 \approx 13.77 \]

4. Determine the p-value at α=0.05

The level of significance used in this analysis is \(\alpha = 0.05\).

The degrees of freedom for the Chi-Square Test of Independence are calculated as:

\[ df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2 \]

Based on the Chi-Square distribution with \(df = 2\), the Chi-Square statistic \(\chi^2 = 13.77\) corresponds to a p-value of approximately:

\[ p\text{-value} \approx 0.001 \]

Since the p-value is less than the significance level \(\alpha = 0.05\), the null hypothesis is rejected.
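The expected frequencies, the Chi-Square statistic, and the p-value can be checked with a short script. This is a minimal sketch using only the Python standard library; it relies on the fact that for \(df = 2\) the Chi-Square upper-tail probability has the closed form \(e^{-\chi^2/2}\), so it would need a different p-value routine for tables of other sizes.

```python
from math import exp

# Observed counts; columns are E-Wallet, Credit Card, Cash on Delivery.
observed = {
    "Mobile": [120, 80, 50],
    "Desktop": [60, 90, 40],
}

row_totals = {device: sum(counts) for device, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

chi_square = 0.0
for device, counts in observed.items():
    for obs, col_total in zip(counts, col_totals):
        expected = row_totals[device] * col_total / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)

# Closed form valid only for df = 2: P(X >= x) = exp(-x / 2).
p_value = exp(-chi_square / 2)
print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.4f}")
```

Run as written, it should reproduce \(\chi^2 \approx 13.77\) with \(df = 2\) and a p-value of about 0.001.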

5. Interpret the results in terms of digital payment strategy

The results of the Chi-Square Test of Independence indicate a significant association between device type and payment method preference. This suggests that customers’ choice of payment method varies depending on whether they use a mobile device or a desktop.

From a digital payment strategy perspective, mobile users tend to prefer E-Wallet payments, while desktop users are more inclined to use credit cards. Therefore, e-commerce companies should optimize E-Wallet integration and user experience on mobile platforms, while highlighting credit card payment options on desktop interfaces.

By aligning payment features with device-specific user behavior, companies can improve transaction convenience and potentially increase conversion rates.

5 CASE STUDY 5

Hypothesis testing is a fundamental statistical method used to support data-driven decision making. It allows researchers and practitioners to evaluate whether an observed effect is statistically significant or occurs merely due to random variation.

In the fintech industry, hypothesis testing plays a crucial role in assessing the effectiveness of technological innovations, such as fraud detection algorithms. Incorrect statistical decisions may lead to financial losses and increased business risk. Therefore, understanding Type I Error, Type II Error, and statistical power is essential before drawing conclusions from hypothesis testing.

The following section discusses these concepts using a case study on the evaluation of a new fraud detection algorithm.

5.1 Question

A fintech startup tests whether a new fraud detection algorithm reduces fraudulent transactions.

  • H₀: The new algorithm does not reduce fraud.

  • H₁: The new algorithm reduces fraud.

5.2 Answer

1. Explain a Type I Error (α) in this context.

A Type I Error occurs when the null hypothesis is rejected even though it is actually true. In other words, the statistical test indicates a significant effect when no real effect exists.

In the context of this study, a Type I Error would occur if the fintech company concludes that the new fraud detection algorithm reduces fraudulent transactions, when in reality the algorithm does not provide any actual reduction in fraud. As a result, the company may implement an ineffective algorithm, leading to false confidence in fraud prevention and potential financial losses.

2. Explain a Type II Error (β) in this context.

A Type II Error occurs when the null hypothesis is not rejected even though the alternative hypothesis is true. This means that the statistical test fails to detect an effect that actually exists.

In the context of this study, a Type II Error would occur if the fintech company concludes that the new fraud detection algorithm does not reduce fraudulent transactions, when in fact the algorithm is effective in reducing fraud. Consequently, the company may choose not to implement a beneficial algorithm, resulting in missed opportunities to reduce fraud-related losses.

3. Identify which error is more costly from a business perspective.

From a business perspective, a Type II Error is generally more costly than a Type I Error in this context.

A Type II Error occurs when the company fails to identify an effective fraud detection algorithm and therefore does not implement it. As a result, fraudulent transactions continue to occur, leading to ongoing financial losses and missed opportunities for risk reduction. In contrast, a Type I Error involves implementing an ineffective algorithm, which may be identified and corrected over time. Therefore, failing to detect a truly effective solution poses a greater long-term risk to the business.

4. Discuss how sample size affects Type II Error.

Sample size has a direct impact on the probability of committing a Type II Error. When the sample size is small, the statistical test has limited ability to detect a true effect, which increases the likelihood of failing to reject the null hypothesis when it is false.

As the sample size increases, the estimates become more precise and the test gains greater sensitivity to detect real differences. Consequently, a larger sample size reduces the probability of a Type II Error and increases the reliability of the statistical decision.
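The effect of sample size on \(\beta\) can be illustrated numerically. The sketch below is a simplification: it uses a generic one-sided Z-test on a mean with known standard deviation rather than a model of fraud rates, and the effect size, standard deviation, and sample sizes are hypothetical values chosen for illustration, not data from the case study.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided Z-test to detect a true shift of size `effect`."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(effect * sqrt(n) / sigma - z_crit)


# Hypothetical values, not taken from the case study.
effect, sigma = 0.5, 2.0
powers = {n: one_sided_z_power(effect, sigma, n) for n in (20, 50, 100, 200)}

for n, power in powers.items():
    print(f"n = {n:>3}: power = {power:.2f}, beta = {1 - power:.2f}")
```

With \(\alpha\) held at 0.05, the printed \(\beta\) falls steadily as \(n\) increases, which is the pattern described above.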

5. Explain the relationship between α, β, and statistical power.

In hypothesis testing, α represents the probability of committing a Type I Error, while β represents the probability of committing a Type II Error. Statistical power is defined as the probability of correctly rejecting the null hypothesis when the alternative hypothesis is true, and it is expressed as \(1 - \beta\).

There is a trade-off between α and β. Reducing the significance level α makes the test more conservative, which can increase the likelihood of a Type II Error. However, this trade-off can be mitigated by increasing the sample size, which reduces β and increases statistical power without increasing α.
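The trade-off can be made concrete for one simple case. Assuming a one-sided Z-test with known standard deviation \(\sigma\), sample size \(n\), and a true effect of size \(\delta > 0\) (symbols introduced here for illustration), the power is

\[ 1 - \beta = \Phi\!\left( \frac{\delta \sqrt{n}}{\sigma} - z_{1-\alpha} \right) \]

where \(\Phi\) is the standard normal distribution function and \(z_{1-\alpha}\) is the critical value. Lowering \(\alpha\) raises \(z_{1-\alpha}\) and so lowers the power, while raising \(n\) increases the first term and so raises the power. For a hypothetical setting with \(\delta \sqrt{n} / \sigma = 2.5\):

\[ \alpha = 0.05:\ 1 - \beta = \Phi(2.5 - 1.645) \approx 0.80, \qquad \alpha = 0.01:\ 1 - \beta = \Phi(2.5 - 2.326) \approx 0.57 \]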

Therefore, an appropriate balance between α, β, and sample size is essential to ensure reliable and effective statistical decision making.

6 CASE STUDY 6

Statistical hypothesis testing plays a crucial role in evaluating the performance of predictive models in data-driven decision making. One important concept in hypothesis testing is the p-value, which helps determine whether observed results are statistically significant or likely to occur by chance.

In this task, a churn prediction model is evaluated using hypothesis testing at a given significance level. The analysis focuses on interpreting the p-value, making an appropriate statistical decision, and translating the results into a business context. Additionally, potential risks related to data representativeness and common misconceptions about p-values are discussed.

6.1 Question

A churn prediction model evaluation yields the following results:

  • Test statistic = 2.31

  • p-value = 0.021

  • Significance level: α=0.05

Tasks

  1. Explain the meaning of the p-value.

  2. Make a statistical decision.

  3. Translate the decision into non-technical language for management.

  4. Discuss the risk if the sample is not representative.

  5. Explain why the p-value does not measure effect size.

6.2 Answer

1. Explain the meaning of the p-value.

The p-value is the probability of obtaining a test statistic at least as extreme as the observed value, assuming that the null hypothesis is true. In this analysis, the p-value is 0.021, which means that results at least this extreme would occur about 2.1% of the time if the churn prediction model had no real effect. It is not the probability that the null hypothesis is true.

A smaller p-value provides stronger evidence against the null hypothesis, suggesting that the observed result is unlikely to have occurred due to random variation alone.
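The link between the reported test statistic and the p-value can be illustrated as follows. The question does not say which test produced the statistic, so this sketch assumes, purely for illustration, that the statistic follows a standard normal distribution under the null hypothesis and that the test is two-sided.

```python
from statistics import NormalDist

test_statistic = 2.31  # value reported in the question

# Assumption (not stated in the source): standard normal null distribution,
# two-sided test, so both tails beyond |2.31| count as "at least as extreme".
p_value = 2 * (1 - NormalDist().cdf(abs(test_statistic)))
print(f"two-sided p-value = {p_value:.3f}")
```

Under that assumption the result is about 0.021, which matches the reported p-value.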

2. Make a statistical decision.

The statistical decision is made by comparing the p-value to the significance level. In this analysis, the p-value (0.021) is less than the significance level (α = 0.05). Therefore, the null hypothesis is rejected.

This result indicates that there is sufficient statistical evidence to support the alternative hypothesis at the 5% significance level.

3. Translate the decision into non-technical language for management.

The analysis indicates that the result observed for the churn prediction model is unlikely to be explained by random chance alone. In other words, the effect is statistically reliable at the 5% level, although the test by itself does not show how large the improvement is.

As a result, management can have confidence that the model provides useful insights for identifying customers who are at risk of churn.

4. Discuss the risk if the sample is not representative.

If the sample used to evaluate the churn prediction model is not representative of the overall customer population, the results of the analysis may be misleading. The model may appear to perform well on the sample data but fail to generalize to real-world customers.

This lack of representativeness can lead to biased conclusions, reduced model reliability, and ineffective business decisions. Therefore, statistical significance alone is not sufficient without ensuring that the sample accurately reflects the target population.

5. Explain why the p-value does not measure effect size.

The p-value indicates whether an observed effect is statistically significant, but it does not provide information about the magnitude or practical importance of that effect. A small p-value only says that a result this extreme would be unlikely if the null hypothesis were true.

In addition, the p-value is influenced by the sample size. With a large sample, even a very small and practically unimportant effect can produce a statistically significant p-value. Therefore, effect size measures are needed to assess how large or meaningful the impact of a model is in practice.
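The dependence of the p-value on sample size can be shown with a small numerical experiment. In this sketch the observed effect and standard deviation are hypothetical values (a standardized effect of 0.02, which would normally be considered negligible), and a two-tailed Z-test is assumed for simplicity.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_p(effect, sigma, n):
    """Two-tailed Z-test p-value for an observed mean difference `effect`."""
    z = effect / (sigma / sqrt(n))
    return 2 * NormalDist().cdf(-abs(z))


# Hypothetical values: the same tiny observed effect at every sample size.
effect, sigma = 0.1, 5.0
effect_size = effect / sigma  # standardized effect, unchanged by n

p_values = {n: two_tailed_p(effect, sigma, n) for n in (100, 1_000, 10_000, 100_000)}
for n, p in p_values.items():
    print(f"n = {n:>7}: effect size = {effect_size:.2f}, p-value = {p:.4g}")
```

The effect size is identical in every row, yet the p-value moves from clearly non-significant to far below 0.05 as \(n\) grows, which is why a p-value should be reported together with an effect size measure.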