Detail Profil Mahasiswa

FRIZZY LITHMENTSYAH
Program Studi
Sains Data
Universitas
Institut Teknologi Sains Bandung (ITSB)
Mata Kuliah
Statistik Dasar
Dosen Pengampu
BAKTI SIREGAR, M.Sc., CDS.

1 FIRST CASE

Confidence Interval for Mean (σ Known)
An e-commerce platform aims to estimate the average number of daily transactions per user after launching a new feature. Based on historical large-scale data, the population standard deviation is known.


Given Information:

  • Population standard deviation (σ) = 3.2
  • Sample size (n) = 100
  • Sample mean (x̄) = 12.6

Tasks:

  1. Identify the appropriate statistical method and justify the choice.
  2. Compute confidence intervals at 90%, 95%, and 99% levels.
  3. Visualize the comparison of the confidence intervals.
  4. Interpret the results in a business analytics context.

1.1 Identify the Appropriate Statistical Test

Selected Method:
Z-Confidence Interval for Population Mean

Justification:

  • The population standard deviation (σ = 3.2) is known.
  • The sample size is large (n = 100 ≥ 30), ensuring the sampling distribution of the mean is approximately normal according to the Central Limit Theorem.
  • The objective is estimation of the population mean, not hypothesis testing.

Therefore, the Z Confidence Interval is the most appropriate statistical approach for this analysis.

1.2 Confidence Interval Calculation

Given:

  • Sample mean \(\bar{x} = 12.6\).
  • Population standard deviation \(\sigma = 3.2\).
  • Sample size \(n = 100\).

Confidence Interval (CI) calculation. The general formula for the confidence interval is: \[ \bar{x} \pm Z \left(\frac{\sigma}{\sqrt{n}}\right) \] where the Standard Error (SE) is: \[ SE = \frac{\sigma}{\sqrt{n}} = \frac{3.2}{\sqrt{100}} = \frac{3.2}{10} = 0.32 \]

Table: Confidence Intervals for the Mean Number of Daily Transactions

| Confidence Level | Z-Score | Margin of Error | Lower Bound | Upper Bound |
|------------------|---------|-----------------|-------------|-------------|
| 90%              | 1.645   | 0.526           | 12.074      | 13.126      |
| 95%              | 1.960   | 0.627           | 11.973      | 13.227      |
| 99%              | 2.576   | 0.824           | 11.776      | 13.424      |
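
As a quick check, the table above can be reproduced with a minimal base-R sketch (variable and column names here are illustrative):

```r
# Z confidence intervals for the mean when sigma is known
x_bar <- 12.6    # sample mean
sigma <- 3.2     # known population standard deviation
n     <- 100     # sample size
se    <- sigma / sqrt(n)              # standard error = 0.32

conf   <- c(0.90, 0.95, 0.99)
z      <- qnorm(1 - (1 - conf) / 2)   # two-sided critical values
margin <- z * se

data.frame(
  confidence = conf,
  z_score    = round(z, 3),
  margin     = round(margin, 3),
  lower      = round(x_bar - margin, 3),
  upper      = round(x_bar + margin, 3)
)
```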

1.3 Visualization

In the confidence interval comparison, the estimated mean of 12.6 daily transactions per user carries varying degrees of uncertainty depending on the confidence level: as the confidence level increases, the interval around the estimate becomes wider.
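
A minimal ggplot2 sketch of this comparison, using the interval values computed above, might look like the following:

```r
library(ggplot2)

ci_table <- data.frame(
  confidence = c("90%", "95%", "99%"),
  lower      = c(12.074, 11.973, 11.776),
  upper      = c(13.126, 13.227, 13.424)
)

ggplot(ci_table, aes(x = confidence, y = 12.6)) +
  geom_point(size = 3) +                                          # point estimate (sample mean)
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.15) +  # interval width per level
  labs(title = "Confidence Intervals for Mean Daily Transactions per User",
       x = "Confidence level", y = "Transactions per user")
```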

1.4 Business Analytics Interpretation

Executive Summary: Confidence Interval Analysis

Based on the Confidence Interval (CI) analysis, the estimated mean number of daily transactions per user is 12.6 transactions. The confidence intervals constructed at the 90%, 95%, and 99% confidence levels indicate that the true population mean is highly likely to fall within these respective ranges.

At the 90% confidence level, the interval is relatively narrow, providing a more precise estimate but with a lower degree of certainty. In contrast, the 95% and 99% confidence levels produce wider intervals, reflecting a more conservative estimation approach with increased confidence. This highlights the trade-off between estimation precision and confidence level, which is a key consideration in business decision-making.

Business Analytics Perspective: The results suggest that the newly launched feature is associated with a stable and consistent level of user transaction activity. The 95% confidence interval is particularly recommended for managerial decision-making as it offers an optimal balance between reliability and accuracy.

In practice, this enables the company to:

  • Lower Bound: Used as a conservative estimate for system capacity planning and risk management.
  • Upper Bound: Used as an optimistic scenario for revenue forecasting and growth projections.
  • Benchmarking: Assessing the effectiveness of the new feature against historical data.

Overall, this analysis provides a statistically robust foundation for data-driven strategic decisions, supporting informed planning and sustainable growth.

2 SECOND CASE

Assignment 2: UX Research Analysis

Confidence Interval for Mean (σ Unknown)

Case Scenario:

A UX Research team is analyzing the task completion time (in minutes) for a newly launched mobile application. Data was collected from a sample of 12 users:

8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3

Key Deliverables:

1. Statistical Identification

Select the appropriate test and provide a technical justification.

2. CI Computation

Calculate the intervals for 90%, 95%, and 99% confidence levels.

3. Visual Comparison

Generate a single-plot visualization to compare the three intervals.

4. Sensitivity Analysis

Explain how sample size and confidence levels affect interval width.

Note: Since the population standard deviation (σ) is unknown and the sample size is small (n=12), a Student's t-distribution is required for this analysis.

UX Research Department • Statistical Report 2026

2.1 Identification of the Statistical Test


The appropriate statistical test for this analysis is the t-Interval (Student's t-Distribution).

📉 Population Standard Deviation (σ) is Unknown

The actual population variance is not provided in the data set. Therefore, we must estimate it using the sample standard deviation (s) as a proxy.

📊 Small Sample Size (n < 30)

With a sample of only 12 users (n=12), the Central Limit Theorem's normal approximation is less reliable. The t-distribution provides a more conservative interval to account for this uncertainty.

σ Unknown  +  n < 30  =  t-Interval Selection

2.2 Descriptive Statistics & CI Calculation


First, we calculate the fundamental statistics from the user task completion data:

| Statistic | Value |
|---|---|
| Sample mean ($\bar{x}$) | 8.458 minutes |
| Sample standard deviation ($s$) | 0.421 minutes |
| Degrees of freedom ($df = n - 1$) | 11 |
| Standard error ($SE$) | ≈ 0.122 |

Standard Error Formula:

$$SE = \frac{s}{\sqrt{n}} = \frac{0.421}{\sqrt{12}} \approx 0.122$$
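
These descriptive statistics and the resulting t-intervals can be computed with a short base-R sketch (object names are illustrative):

```r
# Task completion times (minutes) for the 12 sampled users
time  <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)
n     <- length(time)
x_bar <- mean(time)        # ~ 8.458
s     <- sd(time)          # ~ 0.421
se    <- s / sqrt(n)       # ~ 0.122

conf   <- c(0.90, 0.95, 0.99)
t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)   # Student's t critical values, df = 11

data.frame(
  confidence = conf,
  t_critical = round(t_crit, 3),
  lower      = round(x_bar - t_crit * se, 3),
  upper      = round(x_bar + t_crit * se, 3)
)
```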

2.3 Data Visualization Strategy


To effectively communicate the precision vs. confidence trade-off, the following interactive plot is constructed using ggplot2 and plotly. Key features include:

  • Error Bars: Representing the t-interval range for 90%, 95%, and 99% levels.
  • Point Estimates: Indicating the sample mean ($\bar{x} = 8.458$).
  • Interactive Tooltips: Displaying exact range values on hover.
Visual Expectation:

As the confidence level increases, the vertical error bars will visibly expand, reflecting the wider margin of error.
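
A possible ggplot2 + plotly sketch of such a plot (the exact styling of the report's figure may differ):

```r
library(ggplot2)
library(plotly)

time   <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)
se     <- sd(time) / sqrt(length(time))
conf   <- c(0.90, 0.95, 0.99)
t_crit <- qt(1 - (1 - conf) / 2, df = length(time) - 1)

ci_ux <- data.frame(
  level = paste0(conf * 100, "%"),
  mean  = mean(time),
  lower = mean(time) - t_crit * se,
  upper = mean(time) + t_crit * se
)

p <- ggplot(ci_ux,
            aes(x = level, y = mean,
                text = paste0("[", round(lower, 3), ", ", round(upper, 3), "]"))) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = 0.15) +
  labs(title = "t-Intervals for Mean Task Completion Time",
       x = "Confidence level", y = "Minutes")

ggplotly(p, tooltip = "text")   # interactive tooltips showing the exact range on hover
```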

2.4 Sensitivity Analysis: Factors Influencing Interval Width


The width of a Confidence Interval (CI) determines the precision of our estimate. In this UX study, the interval width is influenced by two primary variables:

🎯 Confidence Level

Relationship: Directly Proportional
Mechanism: To be more certain (e.g., 99% vs 90%) that the interval contains the true population mean, we must widen the range. Higher confidence requires a larger "safety net," resulting in a wider interval and lower precision.

👥 Sample Size ($n$)

Relationship: Inversely Proportional
Mechanism: As the number of users tested increases, the Standard Error (SE) decreases ($SE = s / \sqrt{n}$). More data reduces uncertainty, making the interval narrower and our estimate significantly more precise.

UX Research Note: With only 12 users, our intervals are relatively wide. To achieve higher precision for business decisions, increasing the sample size is more effective than lowering the confidence level.
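
A small illustration of this effect, holding the observed sample standard deviation fixed (the larger sample sizes below are hypothetical):

```r
# Width of a 95% t-interval as the number of tested users grows,
# assuming the sample standard deviation stays around 0.421 minutes
s <- 0.421
n <- c(12, 24, 48, 96)   # hypothetical sample sizes

width <- 2 * qt(0.975, df = n - 1) * s / sqrt(n)
data.frame(n = n, interval_width = round(width, 3))
```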

3 THIRD CASE

🎯 Task 3: CTA Button A/B Testing

Confidence Interval for Proportions

Experiment Results:

  • Conversions (x): 85 users
  • 👥 Total Sample (n): 500 users

Sample Proportion ($\hat{p}$):

$\hat{p} = \frac{x}{n} = \frac{85}{500} = \mathbf{0.17}$ (17% Conversion Rate)

3.1 Calculating Sample Proportion


The sample proportion represents the ratio of users who performed the desired action (clicked the CTA button) relative to the total number of participants in the A/B test.

Sample Proportion Formula:

$$\hat{p} = \frac{x}{n} = \frac{156}{400} = 0.39 \;(39\%)$$

| Quantity | Value |
|---|---|
| Conversions ($x$) | 156 users |
| Total sample ($n$) | 400 users |

3.2 Confidence Interval Calculation for Proportion


We apply the Normal Approximation (Z-Interval for Proportions) given that the sample size is sufficiently large:

$$\hat{p} \pm Z \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Standard Error (SE) Calculation:

$$SE = \sqrt{\frac{0.39 \times 0.61}{400}} = \sqrt{0.00059475} \approx \mathbf{0.02439}$$

Estimation Results by Confidence Level:

| Confidence Level | Z-Score | Margin of Error | Lower Bound | Upper Bound |
|------------------|---------|-----------------|-------------|-------------|
| 90%              | 1.645   | 4.01%           | 34.99%      | 43.01%      |
| 95%              | 1.960   | 4.78%           | 34.22%      | 43.78%      |
| 99%              | 2.576   | 6.28%           | 32.72%      | 45.28%      |
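
A base-R sketch that reproduces these proportion intervals from the counts used in the calculation above (x = 156, n = 400):

```r
# Z confidence intervals for the conversion proportion
x <- 156
n <- 400
p_hat <- x / n                           # 0.39
se    <- sqrt(p_hat * (1 - p_hat) / n)   # ~ 0.0244

conf <- c(0.90, 0.95, 0.99)
z    <- qnorm(1 - (1 - conf) / 2)

data.frame(
  confidence = conf,
  lower      = round(p_hat - z * se, 4),
  upper      = round(p_hat + z * se, 4)
)
```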

3.3 Click-Through Rate Visualization


This interactive chart visualizes the Click-Through Rate (CTR) estimation. It allows stakeholders to see how much the conversion range shifts as we increase our demand for certainty.

Error Bars: Represent the range where the true CTR likely exists.
Point Estimate: The white dot indicates our observed 39% conversion rate.

Interpretation: The 99% interval (Green) is the widest, showing that high certainty requires us to accept a broader range of possible conversion outcomes.

3.4 The Influence of Confidence Level in Decision Making


In product development, selecting a Confidence Level is a strategic balance between risk mitigation and operational speed:

99% Confidence (Conservative / Low Risk)

Application: Critical changes like checkout flows or pricing models.
Trade-off: Produces a wider interval. If the lower bound (32.72%) still outperforms the baseline, the decision to launch is statistically "bulletproof."

95% Confidence (Industry Standard)

Application: General UI updates and feature experiments.
Trade-off: Offers the optimal balance between accuracy and certainty for most data-driven organizations.

90% Confidence (Aggressive / Rapid Prototyping)

Application: Early-stage iterations where speed is more valuable than precision.
Trade-off: Narrower interval (higher precision), but carries a 10% risk that the true CTR falls outside this range.

Final Product Recommendation

Assuming the baseline CTR is 30%, the new CTA design consistently outperforms the old one across all confidence levels—even at the most conservative 99% level (where the lowest bound of 32.7% is still > 30%).

Verdict: The new design is statistically superior and is highly recommended for full implementation.

4 FOURTH CASE

4.1 Statistical Methodology Identification

The choice between a Z-test and a t-test depends entirely on the availability of population parameters. Below is the identification for both teams:

🔵 Team A: Z-Interval

Distribution: Standard Normal (Z)
Justification: The population standard deviation (σ) is known. This allows for a more precise calculation using the exact population variance.

🟠 Team B: t-Interval

Distribution: Student's t
Justification: The population standard deviation is unknown. They must rely on the sample standard deviation (s) as an estimate, requiring the t-distribution to account for the added uncertainty.

Key Divider: σ Known (Z) vs. σ Unknown (t)

4.2 Confidence Interval (CI) Calculation


Common Standard Error (SE) for Both Teams:

$$SE = \frac{\sigma \text{ or } s}{\sqrt{n}} = \frac{24}{\sqrt{36}} = \frac{24}{6} = \mathbf{4}$$

Team A (Z-Interval)

  • 90% (Z = 1.645): $210 \pm (1.645 \times 4) = \mathbf{[203.42, 216.58]}$
  • 95% (Z = 1.960): $210 \pm (1.960 \times 4) = \mathbf{[202.16, 217.84]}$
  • 99% (Z = 2.576): $210 \pm (2.576 \times 4) = \mathbf{[199.70, 220.30]}$

Team B (t-Interval, df = 35)

  • 90% (t = 1.690): $210 \pm (1.690 \times 4) = \mathbf{[203.24, 216.76]}$
  • 95% (t = 2.030): $210 \pm (2.030 \times 4) = \mathbf{[201.88, 218.12]}$
  • 99% (t = 2.724): $210 \pm (2.724 \times 4) = \mathbf{[199.10, 220.90]}$

Key Observation: Team B's intervals are consistently wider than Team A's. This is because the $t$-critical values are higher than $Z$-critical values to compensate for the uncertainty of estimating the population standard deviation.
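
A short base-R sketch comparing the two teams' critical values and intervals (object names are illustrative):

```r
# Same sample mean and spread, different critical values (Z vs t)
x_bar <- 210
s     <- 24
n     <- 36
se    <- s / sqrt(n)   # = 4

conf   <- c(0.90, 0.95, 0.99)
z_crit <- qnorm(1 - (1 - conf) / 2)
t_crit <- qt(1 - (1 - conf) / 2, df = n - 1)

data.frame(
  confidence   = conf,
  team_A_lower = round(x_bar - z_crit * se, 2),
  team_A_upper = round(x_bar + z_crit * se, 2),
  team_B_lower = round(x_bar - t_crit * se, 2),
  team_B_upper = round(x_bar + t_crit * se, 2)
)
```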

4.3 Comparison Visualization (Data Representation)


This visualization highlights the margin of safety provided by the $t$-distribution. While both teams observed the same sample data, their differing knowledge of the population parameters leads to distinct estimation widths.

Analytical Summary:
  • Consistency: Both teams centered their intervals around the mean ($\bar{x} = 210$).
  • The "t-Factor": Team B's intervals are consistently wider (e.g., 21.80ms wide at 99% vs 20.60ms for Team A).
  • Risk: Using a Z-interval when the population $\sigma$ is actually unknown leads to underestimating uncertainty—making Team B's approach the standard for real-world scenarios.
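
One way to draw this side-by-side comparison is a grouped error-bar plot in ggplot2, using the interval bounds from the calculation step above (a sketch, not the report's exact figure):

```r
library(ggplot2)

ci_cmp <- data.frame(
  team  = rep(c("Team A (Z)", "Team B (t)"), each = 3),
  level = rep(c("90%", "95%", "99%"), times = 2),
  lower = c(203.42, 202.16, 199.70, 203.24, 201.88, 199.10),
  upper = c(216.58, 217.84, 220.30, 216.76, 218.12, 220.90)
)

ggplot(ci_cmp, aes(x = level, y = 210, colour = team)) +
  geom_point(position = position_dodge(width = 0.4), size = 3) +
  geom_errorbar(aes(ymin = lower, ymax = upper),
                position = position_dodge(width = 0.4), width = 0.2) +
  labs(title = "Z vs t Confidence Intervals for Mean API Latency",
       x = "Confidence level", y = "Latency (ms)")
```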

4.4 Interpretation of Results


Despite sharing identical sample means ($\bar{x} = 210$) and standard deviations ($24$), Team B’s intervals are consistently wider. This phenomenon is driven by three core statistical principles:

1. The "Uncertainty Penalty"

Team A possesses "certain" population information ($\sigma$), whereas Team B only has an estimate ($s$). The $t$-distribution compensates for this estimation risk by providing a wider range—essentially a safety margin for not knowing the true population variance.

2. Heavier Tails (Distribution Geometry)

The $t$-distribution has heavier tails than the $Z$-distribution. Because more probability mass lies in the tails, the critical values ($t^*$) are always larger than $Z^*$ for any finite sample size, directly increasing the Margin of Error.

3. The Sample Size Effect ($n=36$)

As sample size ($n$) increases, the $t$-distribution slowly converges to the $Z$-distribution. At $n=36$, the difference is subtle but significant enough to affect precision. If $n$ were smaller (e.g., $n=10$), the gap between Team A and Team B would be even more dramatic.
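
A quick numerical illustration of this convergence (the sample sizes other than 36 are hypothetical):

```r
# 95% two-sided critical values: t approaches Z as n grows
n <- c(10, 36, 100, 1000)
data.frame(
  n      = n,
  t_crit = round(qt(0.975, df = n - 1), 3),
  z_crit = round(qnorm(0.975), 3)
)
```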

Conclusion for Engineering Teams

"In real-world API testing where population variance is rarely known, Team B’s approach is the more robust and scientifically accurate method for reporting latency reliability."

5 FIFTH CASE

5.1 One-Sided Statistical Identification

For this specific business case, the analytical focus shifts from finding a range to establishing a guaranteed minimum performance level.

Type of Interval: One-sided Lower Confidence Interval

Management is exclusively interested in the minimum threshold. We focus on the "Lower Bound" to verify if we can be 95% confident that the proportion of users is at least 70%.

Statistical Test: Z-test for Proportions

The Z-distribution is used because the sample size (n = 250) is sufficiently large. It satisfies the normality assumption for proportions, since $n\hat{p} = 250 \times 0.74 = 185 \geq 5$ and $n(1-\hat{p}) = 250 \times 0.26 = 65 \geq 5$.

5.2 Calculation of the One-Sided Lower Confidence Interval


| Quantity | Value |
|---|---|
| Sample proportion ($\hat{p}$) | 0.74 (74%) |
| Standard error ($SE$) | ≈ 0.0277 |

Standard Error Formula:

$$SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.74 \times 0.26}{250}} \approx 0.0277$$

Critical Value for a One-Sided 95% Bound: We use $Z = 1.645$, the value that leaves 5% in a single tail of the standard normal distribution.
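
A base-R sketch of the one-sided lower bounds at all three confidence levels, using the given $\hat{p} = 0.74$ and $n = 250$:

```r
# One-sided lower confidence bounds for the proportion of users
p_hat <- 0.74
n     <- 250
se    <- sqrt(p_hat * (1 - p_hat) / n)   # ~ 0.0277

conf <- c(0.90, 0.95, 0.99)
z    <- qnorm(conf)                      # one-sided critical values

data.frame(
  confidence  = conf,
  lower_bound = round(p_hat - z * se, 4)
)
```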

5.3 Lower Bound Visualization


This visualization uses one-sided error bars. Unlike a standard interval with two endpoints, the focus here is on how far performance could fall before crossing the critical 70% threshold.

Business Critical Insight:

Note the green dashed line (the 70% target). At the 95% confidence level, the lower end of the interval sits at 69.44%. Because this endpoint lies below the target, we cannot yet statistically guarantee that the 70% target has been fully met, even though the observed rate is high (74%).

5.4 Is the 70% Target Statistically Achieved?


We evaluated the sample performance ($\hat{p} = 74\%$) against the management's minimum threshold of 70% across three levels of statistical rigor:

| Confidence Level | Lower Bound | Status | Management Interpretation |
|------------------|-------------|--------|---------------------------|
| 90% | 70.45% | PASSED | We are 90% confident the actual rate is at least 70.45%. |
| 95% | 69.44% | MARGINAL | High likelihood of success, but the rate could statistically fall below 70%. |
| 99% | 67.55% | FAILED | Under strict certainty, we cannot guarantee the 70% target. |

📋 Final Conclusion for Management

While the observed sample proportion of 74% looks promising, the statistical validity of the 70% mandate depends on your risk tolerance.

If 90% certainty is sufficient for the business, the feature has passed. However, if the business requires the industry-standard 95% or higher, the current data is not yet strong enough to guarantee that the minimum 70% threshold has been secured.