Confidence Interval for Mean (σ Known)
An e-commerce platform aims to estimate the average number of daily transactions per user after launching a new feature.
Based on historical large-scale data, the population standard deviation is known.
Given Information:
Tasks:
Selected Method:
Z-Confidence Interval for Population Mean
Justification:
Therefore, the Z Confidence Interval is the most appropriate statistical approach for this analysis.
Given:
Confidence Interval (CI) calculation. The general formula for a confidence interval is: \[ \bar{x} \pm Z \left(\frac{\sigma}{\sqrt{n}}\right) \] where the Standard Error (SE) is: \[ SE = \frac{\sigma}{\sqrt{n}} = \frac{3.2}{\sqrt{100}} = \frac{3.2}{10} = 0.32 \]
| Confidence | Z Score | Margin | Lower | Upper |
|---|---|---|---|---|
| 90% | 1.645 | 0.526 | 12.074 | 13.126 |
| 95% | 1.960 | 0.627 | 11.973 | 13.227 |
| 99% | 2.576 | 0.824 | 11.776 | 13.424 |
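The table can be reproduced with a short Python sketch. It uses only the figures quoted in this report (sample mean 12.6, σ = 3.2, n = 100); the function name `z_interval` is illustrative and not part of the original analysis.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence):
    """Two-sided z-interval for a mean when sigma is known."""
    # Two-sided critical value: split alpha equally between both tails.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # critical value times standard error
    return z, margin, mean - margin, mean + margin


# Figures from the report: x-bar = 12.6, sigma = 3.2, n = 100
for level in (0.90, 0.95, 0.99):
    z, margin, lower, upper = z_interval(12.6, 3.2, 100, level)
    print(f"{level:.0%}  z={z:.3f}  margin={margin:.3f}  [{lower:.3f}, {upper:.3f}]")
```

Rounded to three decimals, the printed rows should agree with the table above.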
The sample mean of 12.6 daily transactions per user is estimated with a margin of error that depends on the chosen confidence level: as the confidence level increases, the interval becomes wider.
Based on the Confidence Interval (CI) analysis, the estimated mean number of daily transactions per user is 12.6 transactions. The confidence intervals constructed at the 90%, 95%, and 99% confidence levels indicate that the true population mean is highly likely to fall within these respective ranges.
At the 90% confidence level, the interval is relatively narrow, providing a more precise estimate but with a lower degree of certainty. In contrast, the 95% and 99% confidence levels produce wider intervals, reflecting a more conservative estimation approach with increased confidence. This highlights the trade-off between estimation precision and confidence level, which is a key consideration in business decision-making.
Business Analytics Perspective: The intervals are narrow (within about ±0.8 transactions of the sample mean even at the 99% level), so the average transaction activity under the newly launched feature is estimated with good precision. The 95% confidence interval is recommended for managerial decision-making because it is the conventional compromise between confidence and precision.
Practically, this enables the company to:
Overall, this analysis provides a statistically robust foundation for data-driven strategic decisions, supporting informed planning and sustainable growth.
Case Scenario:
A UX Research team is analyzing the task completion time (in minutes) for a newly launched mobile application. Data was collected from a sample of 12 users:
Select the appropriate test and provide a technical justification.
Calculate the intervals for 90%, 95%, and 99% confidence levels.
Generate a single-plot visualization to compare the three intervals.
Explain how sample size and confidence levels affect interval width.
Note: Since the population standard deviation (σ) is unknown and the sample size is small (n=12), a Student's t-distribution is required for this analysis.
UX Research Department • Statistical Report 2026
The appropriate statistical test for this analysis is the t-Interval (Student's t-Distribution).
The actual population variance is not provided in the data set. Therefore, we must estimate it using the sample standard deviation (s) as a proxy.
With a sample of only 12 users (n=12), the Central Limit Theorem's normal approximation is less reliable. The t-distribution provides a more conservative interval to account for this uncertainty.
σ Unknown + n < 30 = t-Interval Selection
First, we calculate the fundamental statistics from the user task completion data:
Standard Error Formula: $SE = \frac{s}{\sqrt{n}}$
To effectively communicate the precision vs. confidence trade-off, the following interactive plot is constructed using ggplot2 and plotly. Key features include:
As the confidence level increases, the vertical error bars will visibly expand, reflecting the wider margin of error.
The width of a Confidence Interval (CI) determines the precision of our estimate. In this UX study, the interval width is influenced by two primary variables:
Relationship (confidence level): Direct, though not strictly proportional
Mechanism: To be more certain (e.g., 99% vs 90%) that the interval contains the true population mean, we must widen the range. Higher confidence requires a larger critical value, a larger "safety net," resulting in a wider interval and lower precision.
Relationship (sample size): Inverse, proportional to $1/\sqrt{n}$
Mechanism: As the number of users tested increases, the Standard Error (SE) decreases ($SE = s / \sqrt{n}$). More data reduces uncertainty and narrows the interval: quadrupling the sample size roughly halves its width.
UX Research Note: With only 12 users, our intervals are relatively wide. To achieve higher precision for business decisions, increasing the sample size is more effective than lowering the confidence level.
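Both effects can be illustrated with a small Python sketch. The sample standard deviation `s = 1.0` is a hypothetical placeholder, not a value from the study, and the critical values are the usual two-sided t-table entries for 11 degrees of freedom (n = 12).

```python
from math import sqrt

# Two-sided t critical values for df = 11 (n = 12), from a standard t-table.
T_CRIT_DF11 = {0.90: 1.796, 0.95: 2.201, 0.99: 3.106}


def standard_error(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / sqrt(n)


def width_n12(s, confidence):
    """Full width of the two-sided t-interval when n = 12."""
    return 2 * T_CRIT_DF11[confidence] * standard_error(s, 12)


s = 1.0  # HYPOTHETICAL sample standard deviation, for illustration only

# Effect of confidence level: width grows with the critical value.
widths = {level: width_n12(s, level) for level in T_CRIT_DF11}
for level, width in widths.items():
    print(f"{level:.0%} interval width at n=12: {width:.3f}")

# Effect of sample size: quadrupling n (12 -> 48) halves the standard error.
se_ratio = standard_error(s, 48) / standard_error(s, 12)
print(f"SE(n=48) / SE(n=12) = {se_ratio:.2f}")
```

In practice the interval shrinks slightly more than the standard error alone suggests, because the t critical value also decreases as the degrees of freedom grow.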
Experiment Results:
$\hat{p} = \frac{x}{n} = \frac{156}{400} = \mathbf{0.39}$ (39% Conversion Rate)
The sample proportion represents the ratio of users who performed the desired action (clicked the CTA button) relative to the total number of participants in the A/B test.
We apply the Normal Approximation (Z-Interval for Proportions) given that the sample size is sufficiently large:
| Confidence Level | Z-Score | Margin of Error | Lower Bound | Upper Bound |
|---|---|---|---|---|
| 90% | 1.645 | 4.01% | 34.99% | 43.01% |
| 95% | 1.960 | 4.78% | 34.22% | 43.78% |
| 99% | 2.576 | 6.28% | 32.72% | 45.28% |
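The bounds in this table can be checked with a short Python sketch of the normal-approximation (Wald) interval. The inputs `P_HAT = 0.39` and `N = 400` are assumptions back-solved from the tabulated midpoints and margins of error, and the 30% baseline is the figure this report assumes for the old design; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat, n, confidence):
    """Two-sided normal-approximation (Wald) interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return margin, p_hat - margin, p_hat + margin


# ASSUMPTION: p_hat and n are inferred from the table's bounds and margins.
P_HAT, N = 0.39, 400
BASELINE = 0.30  # baseline CTR assumed in the report

for level in (0.90, 0.95, 0.99):
    margin, lower, upper = proportion_interval(P_HAT, N, level)
    print(f"{level:.0%}  margin={margin:.2%}  [{lower:.2%}, {upper:.2%}]  "
          f"lower bound beats baseline: {lower > BASELINE}")
```

Comparing the lower bound against the baseline is the check that drives the launch decision: the new design is favoured only when even the pessimistic end of the interval stays above the old rate.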
This interactive chart visualizes the Click-Through Rate (CTR) estimation. It allows stakeholders to see how much the conversion range shifts as we increase our demand for certainty.
Interpretation: The 99% interval (Green) is the widest, showing that high certainty requires us to accept a broader range of possible conversion outcomes.
In product development, selecting a Confidence Level is a strategic balance between risk mitigation and operational speed:
Application (99%): Critical changes like checkout flows or pricing models.
Trade-off: Produces a wider interval. If the lower bound (32.72%) still outperforms the baseline, the decision to launch is well supported even under the strictest criterion.
Application (95%): General UI updates and feature experiments.
Trade-off: Offers the conventional balance between precision and certainty for most data-driven organizations.
Application (90%): Early-stage iterations where speed is more valuable than certainty.
Trade-off: Narrower interval (higher precision), but carries a 10% risk that the true CTR falls outside this range.
Assuming the baseline CTR is 30%, the new CTA design consistently outperforms the old one across all confidence levels—even at the most conservative 99% level (where the lowest bound of 32.7% is still > 30%).
Verdict: The new design is statistically superior and is highly recommended for full implementation.
The choice between a Z-test and a t-test depends entirely on the availability of population parameters. Below is the identification for both teams:
Distribution (Team A): Standard Normal (Z)
Justification: The population standard deviation (σ) is known, so the standard error is exact and no additional uncertainty from estimating the variance needs to be accounted for.
Distribution (Team B): Student's t
Justification: The population standard deviation is unknown. Team B must rely on the sample standard deviation (s) as an estimate, requiring the t-distribution to account for the added uncertainty.
Key Divider: σ Known (Z) vs. σ Unknown (t)
Common Standard Error (SE) for Both Teams: $SE = \frac{24}{\sqrt{36}} = 4$
Key Observation: Team B's intervals are consistently wider than Team A's. This is because the $t$-critical values are higher than $Z$-critical values to compensate for the uncertainty of estimating the population standard deviation.
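The comparison can be sketched in Python using the figures quoted in this report (mean 210, standard deviation 24, n = 36). The z critical values come from the standard library; the t critical values for 35 degrees of freedom are hard-coded from a standard t-table, since the standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist

MEAN, SD, N = 210, 24, 36  # figures quoted in the report
SE = SD / sqrt(N)          # 24 / 6 = 4.0, shared by both teams

# Two-sided t critical values for df = 35, from a standard t-table.
T_CRIT_DF35 = {0.90: 1.690, 0.95: 2.030, 0.99: 2.724}


def margins(confidence):
    """Return (Team A z-margin, Team B t-margin) at a confidence level."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # sigma known
    t = T_CRIT_DF35[confidence]                         # sigma estimated by s
    return z * SE, t * SE


for level in (0.90, 0.95, 0.99):
    z_margin, t_margin = margins(level)
    print(f"{level:.0%}  Team A: {MEAN} ± {z_margin:.2f}   "
          f"Team B: {MEAN} ± {t_margin:.2f}")
```

Because the standard error is identical, the entire difference between the two teams' intervals comes from the critical value.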
This visualization highlights the margin of safety provided by the $t$-distribution. While both teams observed the same sample data, their differing knowledge of the population parameters leads to distinct estimation widths.
Despite sharing identical sample means ($\bar{x} = 210$) and standard deviations ($24$), Team B’s intervals are consistently wider. This phenomenon is driven by three core statistical principles:
Team A possesses "certain" population information ($\sigma$), whereas Team B only has an estimate ($s$). The $t$-distribution compensates for this estimation risk by providing a wider range—essentially a safety margin for not knowing the true population variance.
The $t$-distribution is more spread out than the $Z$-distribution, with "heavier tails." Because more area is located in the tails, the Critical Values ($t^*$) are always larger than $Z^*$ for any finite sample size, directly increasing the Margin of Error.
As sample size ($n$) increases, the $t$-distribution slowly converges to the $Z$-distribution. At $n=36$, the difference is subtle but significant enough to affect precision. If $n$ were smaller (e.g., $n=10$), the gap between Team A and Team B would be even more dramatic.
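A rough Python sketch of this convergence at the 95% level follows. The t critical values are standard t-table entries for df = n - 1; the sample sizes 10 and 101 are chosen only for illustration, while n = 36 is the case discussed here.

```python
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% z critical value, about 1.960

# Two-sided 95% t critical values keyed by sample size n (df = n - 1).
T_95_BY_N = {10: 2.262, 36: 2.030, 101: 1.984}


def inflation(n):
    """Relative extra margin of error from using t instead of z at size n."""
    return T_95_BY_N[n] / Z_95 - 1


for n in T_95_BY_N:
    print(f"n={n:>3}  t*={T_95_BY_N[n]:.3f}  "
          f"margin is {inflation(n):.1%} wider than with z")
```

Since both teams share the same standard error, the ratio of critical values equals the ratio of their margins of error, so the gap shrinks as n grows.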
"In real-world API testing where population variance is rarely known, Team B’s approach is the more robust and scientifically accurate method for reporting latency reliability."
For this specific business case, the analytical focus shifts from finding a range to establishing a guaranteed minimum performance level.
Management is exclusively interested in the minimum threshold. We focus on the "Lower Bound" to verify if we can be 95% confident that the proportion of users is at least 70%.
The Z-distribution is used because the sample size (n = 250) is sufficiently large. It satisfies the normality assumption for proportions where $n\hat{p} \geq 5$ and $n(1-\hat{p}) \geq 5$.
Standard Error Formula: $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Critical Value for One-Sided 95%: We use $Z = 1.645$ (the standard normal value that places the full 5% in a single tail).
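A minimal Python sketch of the one-sided lower bound, using the figures from this report (sample proportion 74%, n = 250, target 70%). The function name `lower_bound` is illustrative. Note that the one-sided critical value uses the full confidence level as the quantile, rather than splitting alpha across two tails.

```python
from math import sqrt
from statistics import NormalDist


def lower_bound(p_hat, n, confidence):
    """One-sided lower confidence bound for a proportion (normal approximation)."""
    z = NormalDist().inv_cdf(confidence)  # all of alpha sits in one tail
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se


P_HAT, N, TARGET = 0.74, 250, 0.70  # figures from the report

# Large-sample condition: enough expected successes and failures.
assert N * P_HAT >= 5 and N * (1 - P_HAT) >= 5

for level in (0.90, 0.95, 0.99):
    lb = lower_bound(P_HAT, N, level)
    print(f"{level:.0%}  lower bound={lb:.2%}  meets 70% target: {lb >= TARGET}")
```

Exact quantiles can differ in the last digit from hand calculations that round the critical value first (for example, using 1.28 instead of 1.2816 at the 90% level).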
This visualization uses one-sided error bars. Unlike standard intervals, which have two endpoints, here we focus only on how far performance could drop before crossing the critical 70% threshold.
Note the green dashed line (the 70% target). At the 95% confidence level, the end of the interval sits at 69.44%. Because this endpoint lies to the left of the target, we cannot yet statistically guarantee that the 70% target has been met, even though the sample proportion is high (74%).
We evaluated the sample performance ($\hat{p} = 74\%$) against the management's minimum threshold of 70% across three levels of statistical rigor:
| Confidence Level | Lower Bound | Status | Management Interpretation |
|---|---|---|---|
| 90% | 70.45% | PASSED | We are 90% sure the actual rate is at least 70.45%. |
| 95% | 69.44% | MARGINAL | High likelihood of success, but statistically could fall below 70%. |
| 99% | 67.55% | FAILED | Under strict certainty, we cannot guarantee the 70% target. |
While the observed sample proportion of 74% looks promising, the statistical validity of the 70% mandate depends on your risk tolerance.
If 90% certainty is sufficient for the business, the feature has passed. However, if the business requires the industry-standard 95% or higher, the current data is not yet strong enough to guarantee that the minimum 70% threshold has been secured.