Confidence Interval
Assignment ~ Week 13
1 About Confidence Intervals
In practice, data-driven decision-making is often constrained by the absence of complete population information. As a result, the ability to perform statistical inference has become a crucial analytical competency. This report presents an in-depth analysis of five case studies that focus on the application of Confidence Intervals (CI) as a primary tool for quantifying uncertainty and assessing the reliability of statistical estimates.
A confidence interval gives a range of plausible values for the true population parameter. It is constructed so that, over repeated sampling, a stated proportion of such intervals (the confidence level) would contain that parameter. This makes it more informative than a single point estimate, which is subject to sampling variability. Through these case studies, the analysis uses the Z-distribution for data with known population variance, the t-distribution for smaller samples with unknown variance, and proportion-based confidence intervals for categorical data. In addition, comparative analyses evaluate differences between groups under varying statistical assumptions.
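The repeated-sampling meaning of "confidence" can be made concrete with a short simulation. This is a minimal sketch that assumes a hypothetical normal population (mean 12, σ = 3.2, chosen only for illustration, not taken from any case study). It counts how often a 95% Z-interval built from a sample of 100 contains the true mean. Under these assumptions the long-run share should be close to 0.95.

```r
# Hypothetical population, chosen only to illustrate coverage (not case data)
mu    <- 12      # true population mean (assumed)
sigma <- 3.2     # known population standard deviation (assumed)
n     <- 100     # sample size
reps  <- 10000   # number of simulated samples
conf  <- 0.95

set.seed(2024)
z_crit <- qnorm(1 - (1 - conf) / 2)   # two-sided critical value, about 1.96
margin <- z_crit * sigma / sqrt(n)    # same margin for every sample when sigma is known

# For each simulated sample, check whether its Z-interval contains mu
covered <- replicate(reps, {
  x_bar <- mean(rnorm(n, mean = mu, sd = sigma))
  (x_bar - margin <= mu) && (mu <= x_bar + margin)
})

coverage <- mean(covered)
print(coverage)   # long-run share of intervals that capture mu, close to 0.95
```

Any single interval either contains μ or not. The 95% describes the procedure, not one particular interval.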
Each case study in this report is examined through four systematic stages:
- Identification of the Statistical Test: Determining the appropriate inferential method based on data characteristics and available parameters.
- Interval Computation: Calculating confidence intervals at the 90%, 95%, and 99% levels to examine how varying confidence levels influence estimation precision.
- Comparison Visualization: Presenting graphical representations to enhance interpretation of interval width and statistical significance.
- Business Interpretation: Translating statistical results into meaningful insights and recommendations for real-world decision-making contexts.
By understanding the outcomes of each scenario, this report aims to support more measurable and evidence-based decision-making processes, reduce uncertainty, and provide a robust foundation for conclusions grounded in valid statistical inference.
2 Case Study 1: Confidence Interval for Mean (Known Population Variance)
A business analytics team is analyzing the average number of daily transactions processed by an online platform. Based on historical operational records, the population standard deviation of daily transactions is known to be 3.2 transactions. A random sample of 100 days is collected, yielding a sample mean of 12.6 transactions per day.
The management team would like to estimate the true average number of daily transactions with varying levels of confidence in order to support strategic planning and operational decision-making.
Tasks:
- Identify the appropriate statistical test and justify your choice.
- Compute the Confidence Intervals for:
  - \(90\%\)
  - \(95\%\)
  - \(99\%\)
- Create a comparison visualization of the three confidence intervals.
- Interpret the results in a business analytics context.
2.1 Identification of the Statistical Test
The objective of this study case is to estimate the population mean when the population standard deviation is known and the sample size is sufficiently large. Under these conditions, the appropriate statistical method is the Z-Confidence Interval for the population mean.
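This method is justified because σ is known and, with \(n = 100\), the Central Limit Theorem ensures that the sampling distribution of the sample mean is approximately normal, so the standardized mean \(\frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\) can be referred to the standard normal (Z) distribution.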
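The Central Limit Theorem argument can be checked numerically. This sketch assumes a deliberately right-skewed, hypothetical population: an exponential distribution whose mean and standard deviation are both 3.2. This is not the platform's real transaction data. The sketch draws many samples of size 100. It then checks that the spread of the sample means is close to \(\sigma/\sqrt{n} = 0.32\), and that about 95% of the means fall within 1.96 standard errors of the population mean, as a normal distribution would predict.

```r
set.seed(7)
rate   <- 1 / 3.2      # assumed exponential population (right-skewed)
mu_pop <- 1 / rate     # population mean = 3.2
sigma  <- 1 / rate     # population SD = 3.2, treated as known
n      <- 100
reps   <- 5000

# Sampling distribution of the mean from the skewed population
means <- replicate(reps, mean(rexp(n, rate = rate)))

se_theory <- sigma / sqrt(n)   # 0.32, same formula as in the case study
se_sim    <- sd(means)         # simulated spread of the sample means

# Under approximate normality, about 95% of means lie within 1.96 SE of mu_pop
share_within <- mean(abs(means - mu_pop) <= 1.96 * se_theory)

print(round(c(se_theory = se_theory, se_sim = se_sim, share_within = share_within), 4))
```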
2.2 Mathematical Computation of the Confidence Intervals
Given the following parameters:
\[ \bar{x} = 12.6, \quad \sigma = 3.2, \quad n = 100 \]
The general form of the confidence interval for the population mean when the population variance is known is:
\[ \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) \]
First, compute the standard error:
\[ \frac{\sigma}{\sqrt{n}} = \frac{3.2}{\sqrt{100}} = \frac{3.2}{10} = 0.32 \]
2.2.1 Confidence Interval = 90%
For a 90% confidence level: \[ \alpha = 0.10 \quad \Rightarrow \quad z_{\alpha/2} = z_{0.05} = 1.645 \]
\[ \text{Margin of Error} = 1.645 \times 0.32 = 0.5264 \]
\[ \text{Lower Bound} = 12.6 - 0.5264 = 12.0736 \]
\[ \text{Upper Bound} = 12.6 + 0.5264 = 13.1264 \]
\[ \boxed{CI_{90\%} = (12.0736,\; 13.1264)} \]
2.2.2 Confidence Interval = 95%
For a 95% confidence level: \[ \alpha = 0.05 \quad \Rightarrow \quad z_{\alpha/2} = z_{0.025} = 1.96 \]
\[ \text{Margin of Error} = 1.96 \times 0.32 = 0.6272 \]
\[ \text{Lower Bound} = 12.6 - 0.6272 = 11.9728 \]
\[ \text{Upper Bound} = 12.6 + 0.6272 = 13.2272 \]
\[ \boxed{CI_{95\%} = (11.9728,\; 13.2272)} \]
2.2.3 Confidence Interval = 99%
For a 99% confidence level: \[ \alpha = 0.01 \quad \Rightarrow \quad z_{\alpha/2} = z_{0.005} = 2.576 \]
\[ \text{Margin of Error} = 2.576 \times 0.32 = 0.8243 \]
\[ \text{Lower Bound} = 12.6 - 0.8243 = 11.7757 \]
\[ \text{Upper Bound} = 12.6 + 0.8243 = 13.4243 \]
\[ \boxed{CI_{99\%} = (11.7757,\; 13.4243)} \]
These results show that as the confidence level increases, the margin of error grows and the interval widens. The width is 1.0528 transactions at 90%, 1.2544 at 95%, and 1.6486 at 99%. This reflects the trade-off between precision and certainty in statistical estimation.
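The three hand calculations above can be reproduced in a few lines of R. This is a sketch, and the helper name `z_ci` is hypothetical, not part of any package. It uses `qnorm()` for exact critical values instead of the rounded table values 1.645, 1.96, and 2.576, so the fourth decimal place may differ slightly.

```r
# Z confidence interval for a mean when sigma is known (hypothetical helper)
z_ci <- function(x_bar, sigma, n, conf) {
  se <- sigma / sqrt(n)               # standard error
  z  <- qnorm(1 - (1 - conf) / 2)     # two-sided critical value
  me <- z * se                        # margin of error
  data.frame(level = conf, z = z, margin = me,
             lower = x_bar - me, upper = x_bar + me)
}

conf_levels <- c(0.90, 0.95, 0.99)
ci_z <- do.call(rbind, lapply(conf_levels, function(cl) z_ci(12.6, 3.2, 100, cl)))
ci_z$width <- ci_z$upper - ci_z$lower   # grows with the confidence level
print(round(ci_z, 4))
```

Computing the bounds this way avoids hardcoding them in the plotting code that follows.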
2.3 Comparison Visualization of the Confidence Intervals
x_bar <- 12.6
ci_plot <- data.frame(
Confidence_Level = c("90%", "95%", "99%"),
Mean = rep(x_bar, 3),
Lower = c(12.0736, 11.9728, 11.7757),
Upper = c(13.1264, 13.2272, 13.4243)
)
colors <- c("#1E90FF", "#FF8C00", "#DC143C")
par(
mar = c(6, 6, 5, 2),
bg = "#F8F9FA"
)
plot(
x = 1:3,
y = ci_plot$Mean,
ylim = range(ci_plot[, c("Lower", "Upper")]),
pch = 21,
bg = colors,
col = "black",
cex = 3,
xaxt = "n",
xlab = "Confidence Level",
ylab = "Mean Daily Transactions",
main = "Comparison of Confidence Intervals",
cex.lab = 1.8,
cex.main = 2,
font.main = 2
)
# GRID
grid(col = "gray85", lty = "dotted")
# ERROR BAR
arrows(
x0 = 1:3,
y0 = ci_plot$Lower,
x1 = 1:3,
y1 = ci_plot$Upper,
angle = 90,
code = 3,
length = 0.15,
col = colors,
lwd = 4
)
axis(
1,
at = 1:3,
labels = ci_plot$Confidence_Level,
cex.axis = 1.6,
font = 2
)
# MEAN LINE
abline(
h = x_bar,
lty = 2,
col = "#2E8B57",
lwd = 3
)
2.4 Interpretation
An error bar plot was used to visualize the 90%, 95%, and 99% confidence intervals of the mean daily transactions. It clearly shows the sample mean as a central point and the margin of error as vertical lines, making it easy to compare how interval width increases with higher confidence levels. The simplicity of the plot keeps the focus on the relationship between confidence level and interval width.
The 90 percent confidence interval provides a relatively narrow range, indicating a more precise estimate of the average daily transactions. This level of confidence is suitable for short-term operational decisions, such as daily monitoring and routine resource allocation, where higher precision is prioritized and a small level of uncertainty is acceptable.
The 95 percent confidence interval offers a balanced combination of precision and reliability. This level is widely used in business analytics and provides strong confidence that the true average daily transactions fall within the estimated range. It is most appropriate for medium-term planning decisions, including performance evaluation, budgeting, and capacity planning.
The 99 percent confidence interval results in the widest range, reflecting the highest level of certainty but lower precision. This conservative estimate is best suited for long-term and high-risk strategic decisions, such as infrastructure investment and system scalability planning, where decision errors could have significant consequences.
Overall, higher confidence levels provide greater certainty but produce wider intervals. Lower confidence levels give narrower, more precise intervals at the cost of a greater chance that the interval misses the true mean. Therefore, the choice of confidence level should match the risk level and decision horizon of the business context.
3 Case Study 2: Confidence Interval for Mean (σ Unknown)
A UX research team measured task completion time (minutes) from 12 users for a new mobile app:
\[
8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3
\]
Tasks:
- Identify the appropriate statistical test and explain why.
- Compute the Confidence Intervals for:
  - \(90\%\)
  - \(95\%\)
  - \(99\%\)
- Visualize the three intervals on a single plot.
- Explain how sample size and confidence level influence the interval width.
3.1 Identification of the Statistical Test
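Since the population standard deviation is unknown and the sample size is small (\(n = 12 < 30\)), the appropriate method is the t-Confidence Interval. Assuming task completion times are approximately normally distributed, the standardized mean \(\frac{\bar{x} - \mu}{s/\sqrt{n}}\) follows a Student's t-distribution with \(n - 1 = 11\) degrees of freedom.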
3.2 Mathematical Computation of the Confidence Intervals
Given data:
\[ x = [8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3], \quad n = 12 \]
Compute sample mean and standard deviation:
\[ \bar{x} = \frac{101.5}{12} \approx 8.4583, \quad s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{1.94917}{11}} \approx 0.4209 \]
Compute standard error:
\[ SE = \frac{s}{\sqrt{n}} = \frac{0.4209}{\sqrt{12}} \approx 0.1215 \]
3.2.1 Confidence Interval = 90%
\[ \alpha = 0.10 \quad \Rightarrow \quad t_{\alpha/2, 11} = t_{0.05, 11} \approx 1.796 \]
\[ \text{Margin of Error} = 1.796 \times 0.1215 \approx 0.2182 \]
\[ \text{Lower Bound} = 8.4583 - 0.2182 = 8.2401 \]
\[ \text{Upper Bound} = 8.4583 + 0.2182 = 8.6765 \]
\[ \boxed{CI_{90\%} = (8.2401, 8.6765)} \]
3.2.2 Confidence Interval = 95%
\[ \alpha = 0.05 \quad \Rightarrow \quad t_{\alpha/2, 11} = t_{0.025, 11} \approx 2.201 \]
\[ \text{Margin of Error} = 2.201 \times 0.1215 \approx 0.2674 \]
\[ \text{Lower Bound} = 8.4583 - 0.2674 = 8.1909 \]
\[ \text{Upper Bound} = 8.4583 + 0.2674 = 8.7257 \]
\[ \boxed{CI_{95\%} = (8.1909, 8.7257)} \]
3.2.3 Confidence Interval = 99%
\[ \alpha = 0.01 \quad \Rightarrow \quad t_{\alpha/2, 11} = t_{0.005, 11} \approx 3.106 \]
\[ \text{Margin of Error} = 3.106 \times 0.1215 \approx 0.3774 \]
\[ \text{Lower Bound} = 8.4583 - 0.3774 = 8.0809 \]
\[ \text{Upper Bound} = 8.4583 + 0.3774 = 8.8357 \]
\[ \boxed{CI_{99\%} = (8.0809, 8.8357)} \]
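Hand-computed means and standard deviations are easy to get wrong, so the calculation above is worth checking in R. This sketch uses base R's `t.test()`, whose `conf.int` component is the one-sample t-interval. The only assumption is the 12 observations listed in the task.

```r
x <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)

x_bar <- mean(x)              # sample mean
s     <- sd(x)                # sample SD (n - 1 denominator)
se    <- s / sqrt(length(x))  # standard error

# t-interval at each confidence level, taken from t.test()
conf_levels <- c(0.90, 0.95, 0.99)
ci_t <- t(sapply(conf_levels, function(cl) t.test(x, conf.level = cl)$conf.int))
dimnames(ci_t) <- list(paste0(conf_levels * 100, "%"), c("lower", "upper"))

print(round(c(mean = x_bar, sd = s, se = se), 4))
print(round(ci_t, 4))
```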
3.3 Comparison Visualization of the Confidence Intervals
# Data
x <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)
n <- length(x)
x_bar <- mean(x)
s <- sd(x)
SE <- s / sqrt(n)
# Confidence levels and t-values
conf_levels <- c(0.90, 0.95, 0.99)
t_values <- qt(1 - (1 - conf_levels)/2, df = n - 1)
margins <- t_values * SE
# CI results
ci_plot <- data.frame(
Confidence_Level = c("90%", "95%", "99%"),
Mean = rep(x_bar, 3),
Lower = x_bar - margins,
Upper = x_bar + margins
)
colors <- c("#1E90FF", "#FF8C00", "#DC143C")
# Plot
par(mar=c(6,6,5,2), bg="#F8F9FA")
plot(
1:3,
ci_plot$Mean,
ylim = range(ci_plot[, c("Lower", "Upper")]),
pch=21,
bg=colors,
col="black",
cex=3,
xaxt="n",
xlab="Confidence Level",
ylab="Mean Task Completion Time (minutes)",
main="Comparison of Confidence Intervals (t-distribution)",
cex.lab=1.8,
cex.main=2,
font.main=2
)
grid(col="gray85", lty="dotted")
arrows(
x0=1:3,
y0=ci_plot$Lower,
x1=1:3,
y1=ci_plot$Upper,
angle=90,
code=3,
length=0.15,
col=colors,
lwd=4
)
axis(1, at=1:3, labels=ci_plot$Confidence_Level, cex.axis=1.6, font=2)
abline(h=x_bar, lty=2, col="#2E8B57", lwd=3)
3.4 Interpretation
An error bar plot was used to visualize the 90%, 95%, and 99% confidence intervals of the mean task completion time. It displays the sample mean as the central point and each interval as a vertical bar. This makes it easy to compare widths and to see that higher confidence levels produce wider intervals, while keeping the focus on the uncertainty around the sample mean.
The width of a confidence interval is influenced by both the sample size and the confidence level. A larger sample size reduces the standard error \(s/\sqrt{n}\). For the t-distribution it also lowers the critical value, because the degrees of freedom \(n - 1\) increase. Both effects narrow the interval. Conversely, a higher confidence level requires a larger critical t-value, which widens the interval in exchange for greater confidence that it contains the true mean. Therefore, narrower intervals occur with larger samples or lower confidence levels, while wider intervals result from smaller samples or higher confidence levels.
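The two effects can be separated numerically. This sketch holds the sample standard deviation fixed at the value observed in the UX data. As a hypothetical exercise, it varies the sample size (n = 30 and n = 100 are illustrative, not collected data). The full width is \(2\,t_{\alpha/2,\,n-1}\,s/\sqrt{n}\).

```r
x <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)
s <- sd(x)                          # held fixed at the observed value (about 0.42)

n_values    <- c(12, 30, 100)       # hypothetical sample sizes
conf_levels <- c(0.90, 0.95, 0.99)

# Full interval width: 2 * t_{alpha/2, n-1} * s / sqrt(n); vectorized over n and conf
ci_width <- function(n, conf) 2 * qt(1 - (1 - conf) / 2, df = n - 1) * s / sqrt(n)

widths <- outer(n_values, conf_levels, ci_width)
dimnames(widths) <- list(paste0("n=", n_values), paste0(conf_levels * 100, "%"))
print(round(widths, 3))
```

Reading down a column shows the sample-size effect. Reading across a row shows the confidence-level effect.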
4 Case Study 3: Confidence Interval for a Proportion (A/B Testing)
A data science team runs an A/B test on a new Call-To-Action (CTA) button design. The experiment yields:
\[ n = 400 \text{ (total users)}, \quad x = 156 \text{ (users who clicked the CTA)} \]
Tasks:
- Compute the sample proportion \(\hat{p}\).
- Compute the Confidence Intervals for:
  - \(90\%\)
  - \(95\%\)
  - \(99\%\)
- Visualize and compare the three intervals.
- Explain how the confidence level affects decision-making in product experiments.
4.1 Identification of the Statistical Method
The goal is to estimate the true proportion of users who click the CTA. Because the outcome is categorical (binary: click or no click), the appropriate method is a confidence interval for a proportion using the normal approximation. The approximation is reasonable here because \(n\hat{p} = 156\) and \(n(1-\hat{p}) = 244\) are both well above the usual threshold of 10.
4.2 Mathematical Computation
Compute the sample proportion:
\[ \hat{p} = \frac{x}{n} = \frac{156}{400} = 0.39 \]
Standard error:
\[ SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.39 \times 0.61}{400}} \approx 0.0244 \]
4.2.1 Confidence Interval = 90%
\[ z_{0.05} = 1.645 \]
\[ ME = 1.645 \times 0.0244 \approx 0.0401 \]
\[ CI_{90\%} = 0.39 \pm 0.0401 = (0.3499, 0.4301) \]
4.2.2 Confidence Interval = 95%
\[ z_{0.025} = 1.96 \]
\[ ME = 1.96 \times 0.0244 \approx 0.0478 \]
\[ CI_{95\%} = 0.39 \pm 0.0478 = (0.3422, 0.4378) \]
4.2.3 Confidence Interval = 99%
\[ z_{0.005} = 2.576 \]
\[ ME = 2.576 \times 0.0244 \approx 0.0629 \]
\[ CI_{99\%} = 0.39 \pm 0.0629 = (0.3271, 0.4529) \]
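The normal-approximation (Wald) intervals above can be reproduced in R with `qnorm()`. As a cross-check, this sketch also shows the Wilson score interval returned by base R's `prop.test()` with `correct = FALSE`. The Wilson interval is a different method from the one used in this report and is included only for comparison. With n = 400 and \(\hat{p} = 0.39\) the two should agree closely.

```r
x <- 156   # users who clicked the CTA
n <- 400   # total users
p_hat <- x / n
se    <- sqrt(p_hat * (1 - p_hat) / n)

# Rule-of-thumb check for the normal approximation
stopifnot(n * p_hat >= 10, n * (1 - p_hat) >= 10)

conf_levels <- c(0.90, 0.95, 0.99)
z    <- qnorm(1 - (1 - conf_levels) / 2)
wald <- cbind(lower = p_hat - z * se, upper = p_hat + z * se)

# Wilson score interval from base R (no continuity correction), for comparison only
wilson <- t(sapply(conf_levels, function(cl)
  prop.test(x, n, conf.level = cl, correct = FALSE)$conf.int))
colnames(wilson) <- c("lower", "upper")

rownames(wald)   <- paste0(conf_levels * 100, "%")
rownames(wilson) <- paste0(conf_levels * 100, "%")
print(round(wald, 4))
print(round(wilson, 4))
```

Exact `qnorm()` values can shift the last decimal slightly relative to the rounded hand calculations.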
4.3 Visualization of the Three Intervals
# Data
p_hat <- 0.39
ci_levels <- c("90%", "95%", "99%")
z_vals <- c(1.645, 1.96, 2.576)
SE <- sqrt(p_hat * (1 - p_hat) / 400)
Margins <- z_vals * SE
ci_plot <- data.frame(
Level = ci_levels,
Mean = rep(p_hat, 3),
Lower = p_hat - Margins,
Upper = p_hat + Margins
)
colors <- c("#1E90FF", "#FF8C00", "#DC143C")
plot(1:3, ci_plot$Mean, ylim=c(min(ci_plot$Lower)-0.01, max(ci_plot$Upper)+0.01),
xaxt="n", pch=19, col="black", cex=2,
xlab="Confidence Level", ylab="Proportion of Users Clicking CTA",
main="Comparison of Confidence Intervals for CTA Clicks")
axis(1, at=1:3, labels=ci_plot$Level)
for(i in 1:3){
lines(c(i,i), c(ci_plot$Lower[i], ci_plot$Upper[i]), lwd=3, col=colors[i])
}
points(1:3, rep(p_hat,3), pch=19, col="black")
4.4 Interpretation
In this A/B test all three intervals are built from the same sample (\(\hat{p} = 0.39\), \(SE \approx 0.0244\)), so their widths differ only because of the critical value. A higher confidence level requires a larger z-value, which widens the interval to give greater certainty that the true click-through rate lies within it. For product experiments this is a trade-off. At 90% the team can report a click-through rate of roughly 35% to 43%, a tighter estimate produced by a procedure that misses the true rate in about 10% of repeated experiments. At 99% the range widens to roughly 33% to 45%, which is more conservative but less informative. Lower confidence levels therefore suit low-risk, easily reversible product changes, while higher levels suit costly or hard-to-reverse rollouts.
An interval plot was chosen because it shows the sample proportion as a central point and each confidence interval as a vertical line. This allows an immediate comparison of interval widths across confidence levels. The simplicity of the chart keeps the focus on how the interval changes with the confidence level, which is the main objective of the analysis.
5 Case Study 4: Precision Comparison (Z-Test vs t-Test)
Two data teams measured API latency (milliseconds) under different conditions:
Team A (known σ): \(n = 36, \bar{x} = 210, \sigma = 24\)
Team B (unknown σ): \(n = 36, \bar{x} = 210, s = 24\)
Tasks:
- Identify the statistical test used by each team.
- Compute Confidence Intervals for:
  - \(90\%\)
  - \(95\%\)
  - \(99\%\)
- Create a visualization comparing all intervals.
- Explain why the interval widths differ, even with similar data.
5.1 Identification of the Statistical Test
- Team A: Population standard deviation is known → Z-Confidence Interval.
- Team B: Population standard deviation is unknown → t-Confidence Interval with \(df = n-1 = 35\).
5.2 Mathematical Computation
5.2.1 Team A (Z-Test, σ known)
Given:
\[
n = 36, \quad \bar{x} = 210, \quad \sigma = 24
\]
Standard Error (SE):
\[
SE = \frac{\sigma}{\sqrt{n}} = \frac{24}{\sqrt{36}} = \frac{24}{6} = 4
\]
Confidence Intervals:
90% CI
\[ z_{0.05} = 1.645, \quad ME = z \cdot SE = 1.645 \cdot 4 = 6.58 \]
\[ CI_{90\%} = \bar{x} \pm ME = 210 \pm 6.58 = (203.42, 216.58) \]
95% CI
\[ z_{0.025} = 1.96, \quad ME = 1.96 \cdot 4 = 7.84 \]
\[ CI_{95\%} = 210 \pm 7.84 = (202.16, 217.84) \]
99% CI
\[ z_{0.005} = 2.576, \quad ME = 2.576 \cdot 4 = 10.30 \]
\[ CI_{99\%} = 210 \pm 10.30 = (199.70, 220.30) \]
5.2.2 Team B (t-Test, σ unknown)
Given:
\[
n = 36, \quad \bar{x} = 210, \quad s = 24, \quad df = n-1 = 35
\]
Standard Error (SE):
\[
SE = \frac{s}{\sqrt{n}} = \frac{24}{\sqrt{36}} = 4
\]
Confidence Intervals:
90% CI
\[ t_{0.05, 35} \approx 1.690, \quad ME = t \cdot SE = 1.690 \cdot 4 = 6.76 \]
\[ CI_{90\%} = 210 \pm 6.76 = (203.24, 216.76) \]
95% CI
\[ t_{0.025, 35} \approx 2.030, \quad ME = 2.030 \cdot 4 = 8.12 \]
\[ CI_{95\%} = 210 \pm 8.12 = (201.88, 218.12) \]
99% CI
\[ t_{0.005, 35} \approx 2.724, \quad ME = 2.724 \cdot 4 = 10.90 \]
\[ CI_{99\%} = 210 \pm 10.90 = (199.10, 220.90) \]
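Both teams' intervals can be generated side by side. This sketch uses exact `qnorm()` and `qt()` quantiles, so the last decimal may differ slightly from the rounded values above. The only inputs are the parameters given in the task.

```r
n <- 36; x_bar <- 210; sd_val <- 24   # sigma (Team A) and s (Team B) are both 24
se <- sd_val / sqrt(n)                # 4 ms for both teams
conf_levels <- c(0.90, 0.95, 0.99)

z_crit <- qnorm(1 - (1 - conf_levels) / 2)          # Team A critical values
t_crit <- qt(1 - (1 - conf_levels) / 2, df = n - 1) # Team B critical values, df = 35

comparison <- data.frame(
  level   = paste0(conf_levels * 100, "%"),
  z_lower = x_bar - z_crit * se, z_upper = x_bar + z_crit * se,
  t_lower = x_bar - t_crit * se, t_upper = x_bar + t_crit * se,
  width_z = 2 * z_crit * se,
  width_t = 2 * t_crit * se
)
print(comparison, digits = 5)
```

Because the standard errors are identical, the width gap comes entirely from the critical values.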
5.3 Comparison Visualization of the Confidence Intervals
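The following plot compares the approximate sampling distributions of mean API latency for the two teams, with vertical lines marking each team's 90%, 95%, and 99% interval bounds. Team A uses a Z-Test (known σ), and Team B uses a t-Test (unknown σ).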
# Parameters
n <- 36
x_bar <- 210
sigma <- 24 # Team A
s <- 24 # Team B
df <- n-1
# Standard Errors
SE_z <- sigma / sqrt(n)
SE_t <- s / sqrt(n)
# Sequence for plotting
x_vals <- seq(190, 230, length.out=500)
# Density curves
density_z <- dnorm(x_vals, mean=x_bar, sd=SE_z)
density_t <- dt((x_vals - x_bar)/SE_t, df=df)/SE_t # scaled t to x-axis
# Colors
col_blue <- "#1E90FF55" # semi-transparent
col_orange <- "#FF8C0055"
line_blue <- "#1E90FF"
line_orange <- "#FF8C00"
# Plot empty canvas
plot(x_vals, density_z, type="n",
ylim=c(0, max(density_z, density_t)*1.2),
xlab="API Latency (ms)", ylab="Density",
main="Density Plot: Z-Test vs t-Test with Vertical CI Lines")
# Fill area under curves
polygon(c(x_vals, rev(x_vals)), c(density_z, rep(0, length(density_z))),
col=col_blue, border=line_blue, lwd=2)
polygon(c(x_vals, rev(x_vals)), c(density_t, rep(0, length(density_t))),
col=col_orange, border=line_orange, lwd=2)
# Confidence Intervals
CI_levels <- c(0.90, 0.95, 0.99)
z_values <- c(1.645, 1.96, 2.576)
t_values <- qt(1 - (1 - CI_levels)/2, df = df)
# Draw vertical CI lines for Team A (Z-Test)
for(i in 1:3){
CI_z <- c(x_bar - z_values[i]*SE_z, x_bar + z_values[i]*SE_z)
segments(CI_z[1], 0, CI_z[1], dnorm(CI_z[1], mean=x_bar, sd=SE_z), col=line_blue, lwd=3)
segments(CI_z[2], 0, CI_z[2], dnorm(CI_z[2], mean=x_bar, sd=SE_z), col=line_blue, lwd=3)
# Label each Z interval at the top of its upper-bound line (right side)
text(CI_z[2], dnorm(CI_z[2], mean=x_bar, sd=SE_z),
paste0(CI_levels[i]*100, "%"), pos=4, col=line_blue, font=2)
}
# Draw vertical CI lines for Team B (t-Test)
for(i in 1:3){
CI_t <- c(x_bar - t_values[i]*SE_t, x_bar + t_values[i]*SE_t)
segments(CI_t[1], 0, CI_t[1], dt((CI_t[1]-x_bar)/SE_t, df=df)/SE_t, col=line_orange, lwd=3)
segments(CI_t[2], 0, CI_t[2], dt((CI_t[2]-x_bar)/SE_t, df=df)/SE_t, col=line_orange, lwd=3)
# Label each t interval at the top of its lower-bound line (left side) so labels do not overlap
text(CI_t[1], dt((CI_t[1]-x_bar)/SE_t, df=df)/SE_t,
paste0(CI_levels[i]*100, "%"), pos=2, col=line_orange, font=2)
}
# Legend
legend("topright", legend=c("Team A (Z-Test)", "Team B (t-Test)"),
fill=c(col_blue, col_orange), border=c(line_blue, line_orange), bty="n")
5.4 Interpretation
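Even with the same sample mean, sample size, and standard deviation, the interval widths differ. Team A (Z-Test) uses the known σ with standard normal critical values. Team B (t-Test) estimates σ with s and therefore uses t critical values with 35 degrees of freedom, which are larger because the t-distribution has heavier tails. At df = 35 the difference is modest: at 95% the Z-interval is 15.68 ms wide and the t-interval is 16.24 ms wide. Higher confidence levels also widen the intervals for both teams, so 90% is the narrowest, 95% intermediate, and 99% the widest. The Z-interval is more precise only because it uses extra information (the known σ), whereas the t-interval accounts for the additional uncertainty of estimating σ.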
The density plot shows the approximate sampling distribution of the mean API latency for each team. Team A's normal curve is slightly taller and narrower, while Team B's t curve is slightly flatter with heavier tails. With 35 degrees of freedom the two curves nearly overlap, and the gap would grow with smaller samples.
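The heavier-tails argument can be quantified by comparing t critical values with the normal value as the degrees of freedom grow. In this sketch, df = 11 and df = 35 match Case Studies 2 and 4. The other values are chosen only for illustration.

```r
dfs   <- c(5, 11, 35, 100, 1000)   # illustrative degrees of freedom
t_975 <- qt(0.975, df = dfs)       # 95% two-sided t critical values
z_975 <- qnorm(0.975)              # standard normal critical value, about 1.96

print(round(data.frame(df = dfs, t_crit = t_975, ratio_to_z = t_975 / z_975), 3))
```

The ratio approaches 1 as df increases, which is why the Z-interval and t-interval for Team A and Team B differ only slightly at df = 35.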