Confidence Interval ~ Week 13

Nazwa Nur Ramadhani

Undergraduate Student in Data Science at Institut Teknologi Sains Bandung

Case Study 1

Confidence Interval for Mean, \(\sigma\) Known: An e-commerce platform wants to estimate the average number of daily transactions per user after launching a new feature. Based on large-scale historical data, the population standard deviation is known.

\[ \begin{eqnarray*} \sigma &=& 3.2 \quad \text{(population standard deviation)} \\ n &=& 100 \quad \text{(sample size)} \\ \bar{x} &=& 12.6 \quad \text{(sample mean)} \end{eqnarray*} \]

Tasks:

Question 1

1. Identify the appropriate statistical test and justify your choice.

The appropriate method is a Z confidence interval for the mean because the population standard deviation is known and the sample size is large, allowing the sampling distribution of the mean to be approximated by a normal distribution.

Question 2

2. Compute the Confidence Intervals for:

  • \(90\%\)
  • \(95\%\)
  • \(99\%\)

Confidence Intervals for \(90\%\)

Sample size: \[n=100\]

Sample mean: \[\bar{x}=12.6\]

Population standard deviation (known): \[\sigma=3.2\]

Formula for CI using z-distribution: \[CI = \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)\]

Critical value \(z\) for 90% CI:

  • Significance level: \[\alpha = 0.10,\quad \alpha/2 = 0.05\]

  • Standard normal distribution table: \(z_{0.05} = 1.645\)

Standard Error (SE): \[\begin{array}{rl} SE & = \frac{\sigma}{\sqrt{n}} \\ & = \frac{3.2}{\sqrt{100}} \\ & = \frac{3.2}{10} \\ & = 0.32 \end{array}\]

Margin of Error (ME): \[\begin{array}{rl} ME & = z_{\alpha/2} \times SE \\ & = 1.645 \times 0.32 \\ & \approx 0.526 \\ \end{array}\]

Confidence Interval: \[ \begin{array}{rl} CI_{90\%} & = \bar{x} \pm ME \\ & = 12.6 \pm 0.526 \\ & \approx (12.07,\; 13.13) \\ \end{array}\]

Confidence Intervals for \(95\%\)

Formula for CI using z-distribution: \[CI = \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)\]

Critical value \(z\) for 95% CI:

  • Significance level: \[\alpha = 0.05,\quad \alpha/2 = 0.025\]

  • Standard normal distribution table: \(z_{0.025} = 1.96\)

Standard Error (SE): \[\begin{array}{rl} SE & = \frac{\sigma}{\sqrt{n}} \\ & = \frac{3.2}{\sqrt{100}} \\ & = \frac{3.2}{10} \\ & = 0.32 \end{array}\]

Margin of Error (ME): \[\begin{array}{rl} ME & = z_{\alpha/2} \times SE \\ & = 1.96 \times 0.32 \\ & \approx 0.627 \\ \end{array}\]

Confidence Interval: \[ \begin{array}{rl} CI_{95\%} & = \bar{x} \pm ME \\ & = 12.6 \pm 0.627 \\ & \approx (11.97,\; 13.23) \\ \end{array}\]

Confidence Intervals for \(99\%\)

Formula for CI using z-distribution: \[CI = \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)\]

Critical value \(z\) for 99% CI:

  • Significance level: \[\alpha = 0.01,\quad \alpha/2 = 0.005\]

  • Standard normal distribution table: \(z_{0.005} = 2.575\)

Standard Error (SE): \[\begin{array}{rl} SE & = \frac{\sigma}{\sqrt{n}} \\ & = \frac{3.2}{\sqrt{100}} \\ & = \frac{3.2}{10} \\ & = 0.32 \end{array}\]

Margin of Error (ME): \[\begin{array}{rl} ME & = z_{\alpha/2} \times SE \\ & = 2.575 \times 0.32 \\ & = 0.824 \\ \end{array}\]

Confidence Interval: \[ \begin{array}{rl} CI_{99\%} & = \bar{x} \pm ME \\ & = 12.6 \pm 0.824 \\ & \approx (11.78,\; 13.42) \\ \end{array}\]
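As a cross-check on the hand calculations for all three levels, the intervals can be reproduced in a few lines of R. This is a minimal sketch rather than part of the original solution: the names `x_bar` and `ci_z` are illustrative, and `qnorm()` returns unrounded critical values (about 2.576 for 99%, where the table value used here is 2.575), so the 99% margin of error may differ in the fourth decimal place.

```r
# Known quantities from the case study
x_bar <- 12.6   # sample mean
sigma <- 3.2    # known population standard deviation
n     <- 100    # sample size

conf_levels <- c(0.90, 0.95, 0.99)

# Two-sided critical values z_{alpha/2}
z_crit <- qnorm(1 - (1 - conf_levels) / 2)

se <- sigma / sqrt(n)   # standard error of the mean
me <- z_crit * se       # margin of error for each confidence level

ci_z <- data.frame(
  level = c("90%", "95%", "99%"),
  z     = round(z_crit, 3),
  ME    = round(me, 3),
  lower = round(x_bar - me, 2),
  upper = round(x_bar + me, 2)
)
print(ci_z)
```

Computing the bounds from `qnorm()` instead of typing them in also means the table updates automatically if the sample mean, \(\sigma\), or \(n\) changes.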

Question 3

3. Create a comparison visualization of the three confidence intervals.

library(ggplot2)
library(dplyr)
library(plotly)

# Data z-distribution CI
mean_x <- 12.6
sigma <- 3.2
n <- 100
SE <- sigma / sqrt(n)  # SE = 0.32

# Confidence intervals
ci <- data.frame(
  level = c("90%", "95%", "99%"),
  lower = c(12.0736, 11.9728, 11.776),
  upper = c(13.1264, 13.2272, 13.424),
  z_value = c(1.645, 1.96, 2.575),
  ME = c(0.5264, 0.6272, 0.824)
)

# Colors
colors <- c(
  "99%" = "#FFDCE5",
  "95%" = "#D4F1F4",  
  "90%" = "#FFE8D1"   
)

# Create normal distribution curve
x_vals <- seq(mean_x - 4*SE, mean_x + 4*SE, length.out = 400)
density_vals <- dnorm(x_vals, mean = mean_x, sd = SE)
df_curve <- data.frame(x = x_vals, y = density_vals)

# Build the plot with a clean layout
p <- ggplot(df_curve, aes(x, y)) +
  
  # CI ribbons, drawn from the outermost (99%) to the innermost (90%)
  geom_ribbon(data = subset(df_curve, x >= ci$lower[3] & x <= ci$upper[3]),
              aes(ymin = 0, ymax = y),
              fill = colors["99%"], alpha = 0.7) +
  
  geom_ribbon(data = subset(df_curve, x >= ci$lower[2] & x <= ci$upper[2]),
              aes(ymin = 0, ymax = y),
              fill = colors["95%"], alpha = 0.7) +
  
  geom_ribbon(data = subset(df_curve, x >= ci$lower[1] & x <= ci$upper[1]),
              aes(ymin = 0, ymax = y),
              fill = colors["90%"], alpha = 0.7) +
  
  # Base bell curve
  geom_line(linewidth = 1.3, color = "gray20") +
  
  # Vertical mean line (dashed)
  geom_vline(xintercept = mean_x, color = "black", 
             linetype = "dashed", linewidth = 1) +
  
  # Vertical lines for the CI boundaries (dotted) - 90%
  geom_vline(xintercept = ci$lower[1], 
             color = "#E8A860", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  geom_vline(xintercept = ci$upper[1], 
             color = "#E8A860", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  
  # 95%
  geom_vline(xintercept = ci$lower[2], 
             color = "#75B8BD", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  geom_vline(xintercept = ci$upper[2], 
             color = "#75B8BD", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  
  # 99%
  geom_vline(xintercept = ci$lower[3], 
             color = "#F5A3B5", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  geom_vline(xintercept = ci$upper[3], 
             color = "#F5A3B5", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  
  # Label mean
  annotate("text", x = mean_x, y = max(df_curve$y)*1.06, 
           label = "x̄ = 12.6", size = 5, fontface = "bold") +
  
  # CI labels on the right-hand side
  # 90% CI
  annotate("text", x = 13.7, y = max(df_curve$y)*0.30,
           label = "90% CI\n[12.07, 13.13]\nz = 1.645", 
           color = "#C8843D", size = 3.2, fontface = "bold", 
           hjust = 0, lineheight = 0.9) +
  
  # 95% CI
  annotate("text", x = 13.7, y = max(df_curve$y)*0.55,
           label = "95% CI\n[11.97, 13.23]\nz = 1.96", 
           color = "#4A8A8F", size = 3.2, fontface = "bold",
           hjust = 0, lineheight = 0.9) +
  
  # 99% CI
  annotate("text", x = 13.7, y = max(df_curve$y)*0.80,
           label = "99% CI\n[11.78, 13.42]\nz = 2.575", 
           color = "#D67889", size = 3.2, fontface = "bold",
           hjust = 0, lineheight = 0.9) +
  
  # Connector lines from each upper bound to its label
  annotate("segment", 
           x = ci$upper[1] + 0.02, xend = 13.68,
           y = max(df_curve$y)*0.30, yend = max(df_curve$y)*0.30,
           color = "#E8A860", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
  
  annotate("segment", 
           x = ci$upper[2] + 0.02, xend = 13.68,
           y = max(df_curve$y)*0.55, yend = max(df_curve$y)*0.55,
           color = "#75B8BD", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
  
  annotate("segment", 
           x = ci$upper[3] + 0.02, xend = 13.68,
           y = max(df_curve$y)*0.80, yend = max(df_curve$y)*0.80,
           color = "#F5A3B5", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
  
  labs(
    title = "Confidence Intervals (z-distribution)",
    subtitle = "Sample Mean = 12.6  |  σ = 3.2  |  n = 100  |  SE = 0.32",
    x = "Value",
    y = "Probability Density"
  ) +
  
  scale_x_continuous(breaks = seq(11.5, 14, 0.5), 
                     limits = c(11.3, 14.3)) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 15, hjust = 0.5),
    plot.subtitle = element_text(size = 10.5, hjust = 0.5, color = "gray40"),
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_line(color = "gray90"),
    axis.line.x = element_line(color = "gray30"),
    plot.margin = margin(10, 10, 10, 10)
  )

# Convert to interactive plotly
ggplotly(p)

Interpretation:

The plot shows a normal curve centered at the sample mean of 12.6, with the three confidence intervals shaded around it. The 90% CI (12.07–13.13) is the narrowest: it gives the most precise estimate but the lowest confidence. The 95% CI (11.97–13.23) is slightly wider, reflecting higher confidence. The 99% CI (11.78–13.42) is the widest, covering the broadest range of plausible population means. As the confidence level increases, the interval expands symmetrically around the mean: the higher the confidence level, the wider the interval.

Question 4

4. Interpret the results in a business analytics context.

The estimated average number of daily transactions per user is centered at 12.6, and all three intervals fall within a tight range (roughly 11.8 to 13.4). This narrowness reflects the small standard error (0.32) that comes from a sample size of 100: it means the average is estimated precisely, not that individual users behave alike (the population standard deviation is still 3.2). As the confidence level increases, the interval becomes wider, so the business gains certainty at the cost of precision. This trade-off matters for decision-making: planning that needs a cautious, high-confidence estimate can rely on the wider 99% interval, while decisions that tolerate more risk can use the narrower 90% interval. Overall, the average after the feature launch is very likely between about 12 and 13.4 transactions per user per day, which is precise enough to support operational and strategic decisions.

Case Study 2

Confidence Interval for Mean, \(\sigma\) Unknown: A UX Research team analyzes task completion time (in minutes) for a new mobile application. The data are collected from 12 users:

\[ 8.4,\; 7.9,\; 9.1,\; 8.7,\; 8.2,\; 9.0,\; 7.8,\; 8.5,\; 8.9,\; 8.1,\; 8.6,\; 8.3 \]

Tasks:

Question 1

1. Identify the appropriate statistical test and explain why.

The appropriate statistical method is a one-sample t confidence interval because the population standard deviation is unknown and the sample size is small \((n = 12)\). The t-distribution is therefore used to estimate the population mean task completion time.

Question 2

2. Compute the Confidence Intervals for:

  • \(90\%\)
  • \(95\%\)
  • \(99\%\)

Confidence Intervals for \(90\%\)

Sample size: \[n=12\]

Sample mean: \[\begin{array}{rl} \bar{x} &= \frac{1}{n} \sum_{i=1}^{n} x_i \\[2mm] &= \frac{1}{12} (x_1 + x_2 + \cdots + x_{12}) \\[1mm] &= \frac{1}{12} (8.4+7.9+9.1+8.7+8.2+9.0+7.8+8.5+8.9+8.1+8.6+8.3) \\[1mm] &= \frac{101.5}{12} \\[1mm] &= 8.458 \ \text{minutes} \end{array}\]

Sample standard deviation

\[ \begin{array}{rl} s & = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \\[2mm] & = \sqrt{\frac{\sum (x_i - 8.458)^2}{12-1}} \\[2mm] & = \sqrt{\frac{1.949}{11}} \\[1mm] & \approx 0.42\ \text{minutes} \end{array} \]

Degrees of freedom: \[df = n - 1 = 11\]

Critical \(t\) value for 90% CI

  • Significance level: \[\alpha = 0.10,\quad \alpha/2 = 0.05\]

  • From the t-table: \[t_{0.05,\,11} \approx 1.796\]

Standard Error (SE): \[\begin{array}{rl} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{0.42}{\sqrt{12}} \\ & \approx 0.121 \end{array}\]

Margin of Error (ME): \[\begin{array}{rl} ME & = t_{\alpha/2,\,df} \times SE \\ & = 1.796 \times 0.121 \\ & \approx 0.217 \end{array}\]

Confidence Interval: \[\begin{array}{rl} CI_{90\%} & = \bar{x} \pm ME \\ & = 8.458 \pm 0.217 \\ & \approx (8.24,\; 8.68) \end{array}\]

Confidence Intervals for \(95\%\)

Critical \(t\) value for 95% CI

  • Significance level: \[\alpha = 0.05,\quad \alpha/2 = 0.025\]

  • From the t-table: \[t_{0.025,\,11} \approx 2.201\]

Standard Error (SE): \[\begin{array}{rl} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{0.42}{\sqrt{12}} \\ & \approx 0.121 \end{array}\]

Margin of Error (ME): \[\begin{array}{rl} ME & = t_{\alpha/2,\,df} \times SE \\ & = 2.201 \times 0.121 \\ & \approx 0.266 \end{array}\]

Confidence Interval: \[\begin{array}{rl} CI_{95\%} & = \bar{x} \pm ME \\ & = 8.458 \pm 0.266 \\ & \approx (8.19,\; 8.72) \end{array}\]

Confidence Intervals for \(99\%\)

Critical \(t\) value for 99% CI

  • Significance level: \[\alpha = 0.01,\quad \alpha/2 = 0.005\]

  • From the t-table: \[t_{0.005,\,11} \approx 3.106\]

Standard Error (SE): \[\begin{array}{rl} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{0.42}{\sqrt{12}} \\ & \approx 0.121 \end{array}\]

Margin of Error (ME): \[\begin{array}{rl} ME & = t_{\alpha/2,\,df} \times SE \\ & = 3.106 \times 0.121 \\ & \approx 0.376 \end{array}\]

Confidence Interval: \[\begin{array}{rl} CI_{99\%} & = \bar{x} \pm ME \\ & = 8.458 \pm 0.376 \\ & \approx (8.08,\; 8.83) \end{array}\]
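The same three t-intervals can be computed directly from the raw data. The sketch below is an illustrative cross-check, not the original solution: the names `times` and `ci_t` are assumptions, and because it keeps `s` and the standard error unrounded, its bounds can differ from the hand calculation in the second or third decimal place.

```r
# Task completion times (minutes) for the 12 users
times <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)

n     <- length(times)
x_bar <- mean(times)
s     <- sd(times)        # sample standard deviation (divides by n - 1)
se    <- s / sqrt(n)      # estimated standard error of the mean
df_t  <- n - 1            # degrees of freedom

conf_levels <- c(0.90, 0.95, 0.99)

# Two-sided critical values t_{alpha/2, df}
t_crit <- qt(1 - (1 - conf_levels) / 2, df = df_t)
me     <- t_crit * se

ci_t <- data.frame(
  level = c("90%", "95%", "99%"),
  t     = round(t_crit, 3),
  ME    = round(me, 3),
  lower = round(x_bar - me, 2),
  upper = round(x_bar + me, 2)
)
print(ci_t)

# Cross-check the 95% interval with the built-in one-sample t procedure
print(t.test(times, conf.level = 0.95)$conf.int)
```

`t.test()` performs the same calculation internally, so its 95% interval should agree with the second row of `ci_t` up to rounding.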

Question 3

3. Visualize the three intervals on a single plot.

library(ggplot2)
library(dplyr)
library(plotly)

# Data t-distribution CI (small sample)
mean_x <- 8.458
s <- 0.42
n <- 12
df <- n - 1  # df = 11
SE <- s / sqrt(n)  # SE = 0.121

# Confidence intervals
ci <- data.frame(
  level = c("90%", "95%", "99%"),
  lower = c(8.240, 8.190, 8.080),
  upper = c(8.680, 8.720, 8.830),
  t_value = c(1.796, 2.201, 3.106),
  ME = c(0.217, 0.266, 0.376)
)

# Colors
colors <- c(
  "99%" = "#E6D5F5",
  "95%" = "#FFE5CC",  
  "90%" = "#D1F2EB"   
)

# Create t-distribution curve
x_vals <- seq(mean_x - 4.5*SE, mean_x + 4.5*SE, length.out = 400)
# Using t-distribution with df=11
density_vals <- dt((x_vals - mean_x)/SE, df = df) / SE
df_curve <- data.frame(x = x_vals, y = density_vals)

# Build the plot with a clean layout
p <- ggplot(df_curve, aes(x, y)) +
  
  # CI ribbons, drawn from the outermost (99%) to the innermost (90%)
  geom_ribbon(data = subset(df_curve, x >= ci$lower[3] & x <= ci$upper[3]),
              aes(ymin = 0, ymax = y),
              fill = colors["99%"], alpha = 0.7) +
  
  geom_ribbon(data = subset(df_curve, x >= ci$lower[2] & x <= ci$upper[2]),
              aes(ymin = 0, ymax = y),
              fill = colors["95%"], alpha = 0.7) +
  
  geom_ribbon(data = subset(df_curve, x >= ci$lower[1] & x <= ci$upper[1]),
              aes(ymin = 0, ymax = y),
              fill = colors["90%"], alpha = 0.7) +
  
  # Base bell curve
  geom_line(linewidth = 1.3, color = "gray20") +
  
  # Vertical mean line (dashed)
  geom_vline(xintercept = mean_x, color = "black", 
             linetype = "dashed", linewidth = 1) +
  
  # Vertical lines for the CI boundaries (dotted) - 90%
  geom_vline(xintercept = ci$lower[1], 
             color = "#7AC5B0", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  geom_vline(xintercept = ci$upper[1], 
             color = "#7AC5B0", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  
  # 95%
  geom_vline(xintercept = ci$lower[2], 
             color = "#E6A962", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  geom_vline(xintercept = ci$upper[2], 
             color = "#E6A962", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  
  # 99%
  geom_vline(xintercept = ci$lower[3], 
             color = "#B399CC", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  geom_vline(xintercept = ci$upper[3], 
             color = "#B399CC", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  
  # Label mean
  annotate("text", x = mean_x, y = max(df_curve$y)*1.06, 
           label = "x̄ = 8.458", size = 5, fontface = "bold") +
  
  # CI labels on the right-hand side
  # 90% CI
  annotate("text", x = 9.05, y = max(df_curve$y)*0.28,
           label = "90% CI\n[8.240, 8.680]\nt = 1.796", 
           color = "#4A9B85", size = 3.2, fontface = "bold", 
           hjust = 0, lineheight = 0.9) +
  
  # 95% CI
  annotate("text", x = 9.05, y = max(df_curve$y)*0.55,
           label = "95% CI\n[8.190, 8.720]\nt = 2.201", 
           color = "#C88641", size = 3.2, fontface = "bold",
           hjust = 0, lineheight = 0.9) +
  
  # 99% CI
  annotate("text", x = 9.05, y = max(df_curve$y)*0.82,
           label = "99% CI\n[8.080, 8.830]\nt = 3.106", 
           color = "#8B6FA3", size = 3.2, fontface = "bold",
           hjust = 0, lineheight = 0.9) +
  
  # Connector lines from each upper bound to its label
  annotate("segment", 
           x = ci$upper[1] + 0.01, xend = 9.03,
           y = max(df_curve$y)*0.28, yend = max(df_curve$y)*0.28,
           color = "#7AC5B0", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
  
  annotate("segment", 
           x = ci$upper[2] + 0.01, xend = 9.03,
           y = max(df_curve$y)*0.55, yend = max(df_curve$y)*0.55,
           color = "#E6A962", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
  
  annotate("segment", 
           x = ci$upper[3] + 0.01, xend = 9.03,
           y = max(df_curve$y)*0.82, yend = max(df_curve$y)*0.82,
           color = "#B399CC", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
  
  labs(
    title = "Confidence Interval (t-Distribution)",
    subtitle = "Sample Mean = 8.458 min  |  s = 0.42  |  n = 12  |  df = 11  |  SE = 0.121",
    x = "Time (minutes)",
    y = "Probability Density"
  ) +
  
  scale_x_continuous(breaks = seq(7.9, 9.0, 0.2), 
                     limits = c(7.8, 9.5)) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 15, hjust = 0.5),
    plot.subtitle = element_text(size = 10, hjust = 0.5, color = "gray40"),
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_line(color = "gray90"),
    axis.line.x = element_line(color = "gray30"),
    plot.margin = margin(10, 10, 10, 10)
  )

# Convert to interactive plotly
ggplotly(p)

Interpretation:

The plot shows a t-distribution centered at the sample mean of 8.458 minutes, based on a small sample of 12 users. Because the sample size is small and the population standard deviation is unknown, the confidence intervals are wider than they would be under a normal z-distribution.

The three intervals (90%, 95%, and 99%) all stay fairly close to the sample mean, indicating that user task-completion times are consistent. As the confidence level increases, the interval expands (from about 8.24–8.68 at 90% to 8.08–8.83 at 99%), showing the usual trade-off: more certainty leads to a wider possible range. Overall, the visualization suggests that the average task time is stable around 8.5 minutes, and even with higher confidence levels, the estimate remains within a narrow and reliable range.

Question 4

4. Explain how sample size and confidence level influence the interval width.

All confidence intervals are calculated using the same sample size (n = 12), so the differences in width come entirely from the confidence level. As the confidence level increases from 90% to 95% to 99%, the intervals become noticeably wider, because a higher confidence level requires a larger critical value and therefore a larger margin of error to capture the true mean with greater certainty. If the sample size were increased, the intervals in the graph would shrink: a larger sample produces a smaller standard error (\(s/\sqrt{n}\)) and a smaller critical \(t\) value, both of which narrow the confidence interval. In contrast, a smaller sample would make the intervals wider due to increased uncertainty.
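The effect of sample size can be illustrated numerically. The sketch below is hypothetical: it holds the sample standard deviation fixed at the case-study value of 0.42 (an assumption, since a new sample would give a different \(s\)), and the sample sizes in `sizes` and the helper `ci_width()` are invented for illustration.

```r
s          <- 0.42                  # sample SD from the case study, held fixed (assumption)
conf_level <- 0.95
sizes      <- c(12, 30, 100, 400)   # hypothetical sample sizes

# Full width of a two-sided t confidence interval for the mean
ci_width <- function(n, s, conf_level) {
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)
  2 * t_crit * s / sqrt(n)
}

widths <- sapply(sizes, ci_width, s = s, conf_level = conf_level)
print(data.frame(n = sizes, width = round(widths, 3)))
```

Because the width scales with \(1/\sqrt{n}\), quadrupling the sample size roughly halves the interval once \(n\) is large enough for the critical value to stabilize.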

Case Study 3

Confidence Interval for a Proportion, A/B Testing: A data science team runs an A/B test on a new Call-To-Action (CTA) button design. The experiment yields:

\[ \begin{eqnarray*} n &=& 400 \quad \text{(total users)} \\ x &=& 156 \quad \text{(users who clicked the CTA)} \end{eqnarray*} \]

Tasks:

Question 1

1. Compute the sample proportion \(\hat{p}\).

Sample proportion:

The sample proportion \(\hat{p}\) is: \[\hat{p} = \frac{x}{n} = \frac{156}{400} = 0.39\]

So about 39% of sampled users clicked the CTA.

Question 2

2. Compute Confidence Intervals for the proportion at:

  • \(90\%\)
  • \(95\%\)
  • \(99\%\)

Confidence Intervals for the proportion at \(90\%\)

Standard Error (SE): \[\begin{array}{rl} SE &= \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \\ & = \sqrt{\frac{0.39(1 - 0.39)}{400}} \\ & = \sqrt{\frac{0.39 \times 0.61}{400}} \\ & \approx 0.024 \end{array}\]

Critical value:

For 90% confidence level: \[z_{\alpha/2} = 1.645\]

Margin of Error (ME): \[ME = z_{\alpha/2} \times SE = 1.645 \times 0.024 \approx 0.039\]

Confidence Interval: \[\begin{array}{rl} CI_{90\%} & = \hat{p} \pm ME \\ & = 0.39 \pm 0.039\\ &\approx (0.351,\; 0.429) \end{array}\]

Confidence Intervals for the proportion at \(95\%\)

Standard Error (SE): \[\begin{array}{rl} SE &= \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \\ & = \sqrt{\frac{0.39(1 - 0.39)}{400}} \\ & = \sqrt{\frac{0.39 \times 0.61}{400}} \\ & \approx 0.024 \end{array}\]

Critical value:

For 95% confidence level: \[z_{\alpha/2} = 1.96\]

Margin of Error (ME): \[ME = z_{\alpha/2} \times SE = 1.96 \times 0.024 \approx 0.047\]

Confidence Interval: \[\begin{array}{rl} CI_{95\%} & = \hat{p} \pm ME \\ & = 0.39 \pm 0.047\\ &\approx (0.343,\; 0.437) \end{array}\]

Confidence Intervals for the proportion at \(99\%\)

Standard Error (SE): \[\begin{array}{rl} SE &= \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \\ & = \sqrt{\frac{0.39(1 - 0.39)}{400}} \\ & = \sqrt{\frac{0.39 \times 0.61}{400}} \\ & \approx 0.024 \end{array}\]

Critical value:

For 99% confidence level: \[z_{\alpha/2} = 2.575\]

Margin of Error (ME): \[ME = z_{\alpha/2} \times SE = 2.575 \times 0.024 \approx 0.062\]

Confidence Interval: \[\begin{array}{rl} CI_{99\%} & = \hat{p} \pm ME \\ & = 0.39 \pm 0.062\\ &\approx (0.328,\; 0.452) \end{array}\]
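The proportion intervals for all three levels can be checked with a short R sketch of the same normal-approximation formula. The variable names are illustrative assumptions. The code keeps the standard error unrounded (about 0.0244 rather than 0.024), so its bounds can differ from the hand calculation by about 0.001.

```r
x <- 156   # users who clicked the CTA
n <- 400   # total users

p_hat <- x / n                           # sample proportion
se    <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion

conf_levels <- c(0.90, 0.95, 0.99)
z_crit <- qnorm(1 - (1 - conf_levels) / 2)   # two-sided critical values
me     <- z_crit * se                        # margins of error

ci_p <- data.frame(
  level = c("90%", "95%", "99%"),
  ME    = round(me, 3),
  lower = round(p_hat - me, 3),
  upper = round(p_hat + me, 3)
)
print(ci_p)
```

With 156 clicks and 244 non-clicks, both counts are far above the usual rule-of-thumb minimum of 10, which supports using the normal approximation here.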

Question 3

3. Visualize and compare the three intervals.

library(ggplot2)
library(dplyr)
library(plotly)

# Data Proportion CI
p_hat <- 0.39
n <- 400
SE <- 0.024

# Confidence intervals
ci <- data.frame(
  level = c("90%", "95%", "99%"),
  lower = c(0.351, 0.343, 0.328),
  upper = c(0.429, 0.437, 0.452),
  z_value = c(1.645, 1.96, 2.575),
  ME = c(0.039, 0.047, 0.062)
)

# Colors
colors <- c(
  "99%" = "#D4A5A5",  
  "95%" = "#B5D4C6",  
  "90%" = "#C9B8D4"   
)

# Create normal distribution curve for proportion
x_vals <- seq(p_hat - 4.5*SE, p_hat + 4.5*SE, length.out = 400)
density_vals <- dnorm(x_vals, mean = p_hat, sd = SE)
df_curve <- data.frame(x = x_vals, y = density_vals)

# Build the plot with a clean layout
p <- ggplot(df_curve, aes(x, y)) +
  
  # CI ribbons, drawn from the outermost (99%) to the innermost (90%)
  geom_ribbon(data = subset(df_curve, x >= ci$lower[3] & x <= ci$upper[3]),
              aes(ymin = 0, ymax = y),
              fill = colors["99%"], alpha = 0.7) +
  
  geom_ribbon(data = subset(df_curve, x >= ci$lower[2] & x <= ci$upper[2]),
              aes(ymin = 0, ymax = y),
              fill = colors["95%"], alpha = 0.7) +
  
  geom_ribbon(data = subset(df_curve, x >= ci$lower[1] & x <= ci$upper[1]),
              aes(ymin = 0, ymax = y),
              fill = colors["90%"], alpha = 0.7) +
  
  # Base bell curve
  geom_line(linewidth = 1.3, color = "gray25") +
  
  # Vertical mean line (dashed)
  geom_vline(xintercept = p_hat, color = "black", 
             linetype = "dashed", linewidth = 1) +
  
  # Vertical lines for the CI boundaries (dotted) - 90% CI
  geom_vline(xintercept = ci$lower[1], 
             color = "#9B7FB8", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  geom_vline(xintercept = ci$upper[1], 
             color = "#9B7FB8", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  
  # 95% CI
  geom_vline(xintercept = ci$lower[2], 
             color = "#7DA891", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  geom_vline(xintercept = ci$upper[2], 
             color = "#7DA891", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  
  # 99% CI
  geom_vline(xintercept = ci$lower[3], 
             color = "#A67272", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  geom_vline(xintercept = ci$upper[3], 
             color = "#A67272", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
  
  # Label mean
  annotate("text", x = p_hat, y = max(df_curve$y)*1.07, 
           label = "p̂ = 0.39", size = 5.5, fontface = "bold") +
  
  # CI labels on the right-hand side
  # 90% CI
  annotate("text", x = 0.475, y = max(df_curve$y)*0.28,
           label = "90% CI\n[0.351, 0.429]\nz = 1.645", 
           color = "#6F4B8B", size = 3.2, fontface = "bold", 
           hjust = 0, lineheight = 0.9) +
  
  # 95% CI
  annotate("text", x = 0.475, y = max(df_curve$y)*0.55,
           label = "95% CI\n[0.343, 0.437]\nz = 1.96", 
           color = "#4A7C59", size = 3.2, fontface = "bold",
           hjust = 0, lineheight = 0.9) +
  
  # 99% CI
  annotate("text", x = 0.475, y = max(df_curve$y)*0.82,
           label = "99% CI\n[0.328, 0.452]\nz = 2.575", 
           color = "#8B5757", size = 3.2, fontface = "bold",
           hjust = 0, lineheight = 0.9) +
  
  # Connector lines from each upper bound to its label
  annotate("segment", 
           x = ci$upper[1] + 0.003, xend = 0.473,
           y = max(df_curve$y)*0.28, yend = max(df_curve$y)*0.28,
           color = "#9B7FB8", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
  
  annotate("segment", 
           x = ci$upper[2] + 0.003, xend = 0.473,
           y = max(df_curve$y)*0.55, yend = max(df_curve$y)*0.55,
           color = "#7DA891", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
  
  annotate("segment", 
           x = ci$upper[3] + 0.003, xend = 0.473,
           y = max(df_curve$y)*0.82, yend = max(df_curve$y)*0.82,
           color = "#A67272", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
  
  labs(
    title = "Confidence Intervals for Proportion",
    subtitle = "Sample Proportion p̂ = 0.39  |  n = 400  |  SE = 0.024",
    x = "Proportion (p)",
    y = "Probability Density"
  ) +
  
  scale_x_continuous(breaks = seq(0.28, 0.50, 0.02), 
                     limits = c(0.26, 0.54)) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 15, hjust = 0.5),
    plot.subtitle = element_text(size = 10.5, hjust = 0.5, color = "gray40"),
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_line(color = "gray90"),
    axis.line.x = element_line(color = "gray30"),
    plot.margin = margin(10, 10, 10, 10)
  )

# Convert to interactive plotly
ggplotly(p)

Interpretation:

  • Confidence Interval \(90\%\): This interval spans 7.8 percentage points (from 35.1% to 42.9%). It is the narrowest interval and so gives the most precise estimate, but it has the lowest confidence level of the three.

  • Confidence Interval \(95\%\): This interval spans 9.4 percentage points (from 34.3% to 43.7%), slightly wider than the 90% interval. The 95% level is a good compromise between precision and confidence.

  • Confidence Interval \(99\%\): This interval spans 12.4 percentage points (from 32.8% to 45.2%). It is the widest of the three. The estimate is less precise because the interval spreads out more, but in return we gain the highest level of confidence that the true population proportion lies within this range.

Question 4

4. Explain how confidence level affects decision-making in product experiments.

The graph shows that higher confidence levels produce wider intervals. In product experiments, this affects how clearly we can judge the impact of a feature. The 90% interval is narrow, so it gives a more precise estimate, making decisions faster though with lower certainty. The 95% interval offers a balanced view, with good certainty and still-manageable width. At 99%, the interval becomes much wider, making the result harder to interpret because the true effect could fall anywhere in a large range. This means higher confidence gives more assurance, but reduces precision, which can slow down or complicate decision-making.

Case Study 4

Precision Comparison (Z-Test vs t-Test): Two data teams measure API latency (in milliseconds) under different conditions.

\[\begin{eqnarray*} \text{Team A:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ \sigma &=& 24 \quad \text{(known population standard deviation)} \\[6pt] \text{Team B:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ s &=& 24 \quad \text{(sample standard deviation)} \end{eqnarray*}\]

Tasks:

Question 1

1. Identify the statistical test used by each team.

  • Team A uses a Z-test because the population standard deviation \((\sigma = 24)\) is known. The sample size of 36 is large enough for the normal approximation, but the key requirement for a Z-test is a known \(\sigma\). Therefore, a one-sample Z-test or Z-based confidence interval is appropriate.

  • Team B uses a t-test because the population standard deviation is unknown. They only have the sample standard deviation \((s = 24)\), which means the variability of the population must be estimated from the sample. Even with a large sample size \((n = 36)\), the correct procedure is still a one-sample t-test or t-based confidence interval when \(\sigma\) is unknown.
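The practical difference between the two procedures lies in the critical values. As a rough illustration (the data frame name `crit` is an assumption), the following R sketch compares the standard normal critical values with the t critical values at 35 degrees of freedom, the value that applies to Team B's sample of 36.

```r
conf_levels <- c(0.90, 0.95, 0.99)
upper_tail  <- 1 - (1 - conf_levels) / 2   # cumulative probability for a two-sided interval

crit <- data.frame(
  level = c("90%", "95%", "99%"),
  z     = round(qnorm(upper_tail), 3),         # Team A: standard normal
  t_35  = round(qt(upper_tail, df = 35), 3)    # Team B: t with n - 1 = 35 df
)
print(crit)
```

At every level the t critical value is somewhat larger than the z value, which is why Team B's intervals come out wider even though both teams have the same standard error.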

Question 2

2. Compute Confidence Intervals for 90%, 95%, and 99%.

Team A (z-distribution)

\[\begin{eqnarray*} \text{Team A:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ \sigma &=& 24 \quad \text{(known population standard deviation)} \\[6pt] \end{eqnarray*}\]

Confidence intervals for 90%

Formula for CI using z-distribution: \[CI = \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)\]

Critical value \(z\) for 90% CI:

  • Significance level: \[\alpha = 0.10,\quad \alpha/2 = 0.05\]

  • Standard normal distribution table: \(z_{0.05} = 1.645\)

Standard Error (SE): \[\begin{array}{rl} SE & = \frac{\sigma}{\sqrt{n}} \\ & = \frac{24}{\sqrt{36}} \\ & = \frac{24}{6} \\ & = 4 \end{array}\]

Margin of Error (ME): \[\begin{array}{rl} ME & = z_{\alpha/2} \times SE \\ & = 1.645 \times 4 \\ & = 6.58 \\ \end{array}\]

Confidence Interval: \[ \begin{array}{rl} CI_{90\%} & = \bar{x} \pm ME \\ & = 210 \pm 6.58 \\ & \approx (203.4,\; 216.6) \\ \end{array}\]

Confidence intervals for 95%

Critical value \(z\) for 95% CI:

  • Significance level: \[\alpha = 0.05,\quad \alpha/2 = 0.025\]

  • Standard normal distribution table: \(z_{0.025} = 1.96\)

Standard Error (SE): \[\begin{array}{rl} SE & = \frac{\sigma}{\sqrt{n}} \\ & = \frac{24}{\sqrt{36}} \\ & = \frac{24}{6} \\ & = 4 \end{array}\]

Margin of Error (ME): \[\begin{array}{rl} ME & = z_{\alpha/2} \times SE \\ & = 1.96 \times 4 \\ & = 7.84 \\ \end{array}\]

Confidence Interval: \[ \begin{array}{rl} CI_{95\%} & = \bar{x} \pm ME \\ & = 210 \pm 7.84 \\ & \approx (202.2,\; 217.8) \\ \end{array}\]

Confidence intervals for 99%

Critical value \(z\) for 99% CI:

  • Significance level: \[\alpha = 0.01,\quad \alpha/2 = 0.005\]

  • Standard normal distribution table: \(z_{0.005} = 2.575\)

Standard Error (SE): \[\begin{array}{rl} SE & = \frac{\sigma}{\sqrt{n}} \\ & = \frac{24}{\sqrt{36}} \\ & = \frac{24}{6} \\ & = 4 \end{array}\]

Margin of Error (ME): \[\begin{array}{rl} ME & = z_{\alpha/2} \times SE \\ & = 2.575 \times 4 \\ & = 10.3 \\ \end{array}\]

Confidence Interval: \[ \begin{array}{rl} CI_{99\%} & = \bar{x} \pm ME \\ & = 210 \pm 10.3 \\ & = (199.7,\; 220.3) \\ \end{array}\]

Team B (t-distribution)

\[\begin{eqnarray*} \text{Team B:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ s &=& 24 \quad \text{(sample standard deviation)} \end{eqnarray*}\]

Confidence intervals for \(90\%\)

Degrees of freedom: \[df = n - 1 = 35\]

Critical \(t\) value for 90% CI

  • Significance level: \[\alpha = 0.10,\quad \alpha/2 = 0.05\]

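  • From the t-table: \[t_{0.05,\,35} \approx 1.697\] Note that 1.697 is the tabulated value for \(df = 30\), used here as a slightly conservative stand-in; the exact value for \(df = 35\) is about 1.690.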

Standard Error (SE): \[\begin{array}{rl} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{24}{\sqrt{36}} \\ & = 4 \end{array}\]

Margin of Error (ME): \[\begin{array}{rl} ME & = t_{\alpha/2,\,df} \times SE \\ & = 1.697 \times 4 \\ & \approx 6.788 \end{array}\]

Confidence Interval: \[\begin{array}{rl} CI_{90\%} & = \bar{x} \pm ME \\ & = 210 \pm 6.788 \\ & \approx (203.2,\; 216.8) \end{array}\]

Confidence intervals for \(95\%\)

Critical \(t\) value for 95% CI:

  • Significance level: \[\alpha = 0.05,\quad \alpha/2 = 0.025\]

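  • From the t-table: \[t_{0.025,\,35} \approx 2.042\] Note that 2.042 is the tabulated value for \(df = 30\), used here as a slightly conservative stand-in; the exact value for \(df = 35\) is about 2.030.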

Standard Error (SE): \[\begin{array}{rl} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{24}{\sqrt{36}} \\ & = 4 \end{array}\]

Margin of Error (ME): \[\begin{array}{rl} ME & = t_{\alpha/2,\,df} \times SE \\ & = 2.042 \times 4 \\ & = 8.168 \end{array}\]

Confidence Interval: \[\begin{array}{rl} CI_{95\%} & = \bar{x} \pm ME \\ & = 210 \pm 8.168 \\ & \approx (201.80,\; 218.20) \end{array}\]

Confidence intervals for \(99\%\)

Critical \(t\) value for 99% CI:

  • Significance level: \[\alpha = 0.01,\quad \alpha/2 = 0.005\]

  • From the t-table, using the df = 30 row as a slightly conservative approximation for df = 35: \[t_{0.005,\,35} \approx t_{0.005,\,30} = 2.75\]

Standard Error (SE): \[\begin{array}{rl} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{24}{\sqrt{36}} \\ & = 4 \end{array}\]

Margin of Error (ME): \[\begin{array}{rl} ME & = t_{\alpha/2,\,df} \times SE \\ & = 2.75 \times 4 \\ & = 11 \end{array}\]

Confidence Interval: \[\begin{array}{rl} CI_{99\%} & = \bar{x} \pm ME \\ & = 210 \pm 11 \\ & = (199,\; 221) \end{array}\]
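
The same check can be sketched for Team B. Here qt() returns the exact quantiles for df = 35, which are slightly smaller than the critical values used in the hand calculation (1.697, 2.042, 2.75; these match the df = 30 row of a t-table). Names such as team_b, t_df35, and t_df30 are illustrative only.

```r
# Team B: t-based intervals (sigma unknown, estimated by s)
x_bar <- 210   # sample mean
s     <- 24    # sample standard deviation
n     <- 36    # sample size
df    <- n - 1 # 35 degrees of freedom

conf_levels <- c(0.90, 0.95, 0.99)
alpha_half  <- (1 - conf_levels) / 2
se          <- s / sqrt(n)                    # 24 / 6 = 4

t_df35 <- qt(1 - alpha_half, df = df)         # exact quantiles for df = 35
t_df30 <- qt(1 - alpha_half, df = 30)         # df = 30, for comparison with the table values
me     <- t_df35 * se

team_b <- data.frame(
  level  = c("90%", "95%", "99%"),
  t_df35 = round(t_df35, 3),
  t_df30 = round(t_df30, 3),
  lower  = round(x_bar - me, 2),
  upper  = round(x_bar + me, 2)
)
print(team_b)
```

With the exact df = 35 quantiles the intervals come out marginally narrower than the hand-computed ones, roughly (203.24, 216.76), (201.88, 218.12), and (199.10, 220.90).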

Question 3

3. Create a visualization comparing all intervals.

Visualization: Team A (z-distribution)

library(ggplot2)
library(dplyr)
library(plotly)

# Data from the problem
mean_x <- 210
SE <- 4
ci <- data.frame(
  level = c("90%", "95%", "99%"),
  lower = c(203.40, 202.20, 199.70),
  upper = c(216.60, 217.80, 220.30)
)
# Colors
colors <- c(
  "99%" = "#E8C5E5",  
  "95%" = "#FFE5B4",  
  "90%" = "#B4E7CE"   
)
# Create bell curve data
x_vals <- seq(mean_x - 4*SE, mean_x + 4*SE, length.out = 400)
density_vals <- dnorm(x_vals, mean = mean_x, sd = SE)
df_curve <- data.frame(x = x_vals, y = density_vals)
# Plot with neatly positioned labels
p <- ggplot(df_curve, aes(x, y)) +
  
  # CI ribbons, drawn from outermost to innermost so that all remain visible
  geom_ribbon(data = subset(df_curve, x >= ci$lower[3] & x <= ci$upper[3]),
              aes(ymin = 0, ymax = y),
              fill = colors["99%"], alpha = 0.6) +
  
  geom_ribbon(data = subset(df_curve, x >= ci$lower[2] & x <= ci$upper[2]),
              aes(ymin = 0, ymax = y),
              fill = colors["95%"], alpha = 0.6) +
  
  geom_ribbon(data = subset(df_curve, x >= ci$lower[1] & x <= ci$upper[1]),
              aes(ymin = 0, ymax = y),
              fill = colors["90%"], alpha = 0.6) +
  
  # Base bell curve drawn on top of the ribbons
  geom_line(linewidth = 1.2, color = "gray20") +
  
  # Vertical mean line
  geom_vline(xintercept = mean_x, color = "black", linetype = "dashed", linewidth = 0.8) +
  
  # Vertical lines for the CI boundaries
  geom_vline(xintercept = c(ci$lower[1], ci$upper[1]), 
             color = "#FF6B75", linetype = "dotted", alpha = 0.7, linewidth = 0.7) +
  geom_vline(xintercept = c(ci$lower[2], ci$upper[2]), 
             color = "#4A9FE8", linetype = "dotted", alpha = 0.7, linewidth = 0.7) +
  geom_vline(xintercept = c(ci$lower[3], ci$upper[3]), 
             color = "#67C9A8", linetype = "dotted", alpha = 0.7, linewidth = 0.7) +
  
  # Label mean
  annotate("text", x = mean_x, y = max(df_curve$y)*1.05, 
           label = "x\u0304 = 210", size = 5, fontface = "bold") +
  
  # CI labels on the right-hand side, shifted further out
  annotate("text", x = 228, y = max(df_curve$y)*0.25,
           label = "90% CI\n[203.40, 216.60]", 
           color = "#FF6B75", size = 3.5, fontface = "bold", 
           hjust = 0, lineheight = 0.9) +
  
  annotate("text", x = 228, y = max(df_curve$y)*0.50,
           label = "95% CI\n[202.20, 217.80]", 
           color = "#4A9FE8", size = 3.5, fontface = "bold",
           hjust = 0, lineheight = 0.9) +
  
  annotate("text", x = 228, y = max(df_curve$y)*0.75,
           label = "99% CI\n[199.70, 220.30]", 
           color = "#67C9A8", size = 3.5, fontface = "bold",
           hjust = 0, lineheight = 0.9) +
  
  # Connector lines from each CI to its label, extended
  annotate("segment", 
           x = ci$upper[1] + 0.5, xend = 227,
           y = max(df_curve$y)*0.25, yend = max(df_curve$y)*0.25,
           color = "#FF6B75", linetype = "solid", linewidth = 0.3, alpha = 0.5) +
  
  annotate("segment", 
           x = ci$upper[2] + 0.5, xend = 227,
           y = max(df_curve$y)*0.50, yend = max(df_curve$y)*0.50,
           color = "#4A9FE8", linetype = "solid", linewidth = 0.3, alpha = 0.5) +
  
  annotate("segment", 
           x = ci$upper[3] + 0.5, xend = 227,
           y = max(df_curve$y)*0.75, yend = max(df_curve$y)*0.75,
           color = "#67C9A8", linetype = "solid", linewidth = 0.3, alpha = 0.5) +
  
  labs(
    title = "Confidence Intervals for Team A",
    subtitle = "Distribution with Mean = 210, SE = 4",
    x = "Task Time (minutes)",
    y = "Probability Density"
  ) +
  
  scale_x_continuous(breaks = seq(195, 235, 5), limits = c(194, 240)) +  # widened to fit the labels
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0.5),
    plot.subtitle = element_text(size = 12, hjust = 0.5, color = "gray40"),
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_line(color = "gray90"),
    axis.line.x = element_line(color = "gray30"),
    plot.margin = margin(10, 10, 10, 10)
  )

# Convert to interactive plotly
ggplotly(p)

Visualization: Team B (t-distribution)

library(ggplot2)
library(dplyr)
library(plotly)

# Data Team B (t-distribution)
mean_x <- 210
SE <- 4
ci <- data.frame(
  level = c("90%", "95%", "99%"),
  lower = c(203.20, 201.80, 199.00),
  upper = c(216.80, 218.20, 221.00),
  t_value = c(1.697, 2.042, 2.75),
  ME = c(6.79, 8.17, 11)
)

# Colors
colors <- c(
  "99%" = "#E8C5E5",  
  "95%" = "#FFE5B4",  
  "90%" = "#B4E7CE"   
)

# Create bell curve data (using t-distribution)
df <- 35
x_vals <- seq(mean_x - 4.5*SE, mean_x + 4.5*SE, length.out = 400)
# Using t-distribution with df=35
density_vals <- dt((x_vals - mean_x)/SE, df = df) / SE
df_curve <- data.frame(x = x_vals, y = density_vals)

# Plot with neatly positioned labels
p <- ggplot(df_curve, aes(x, y)) +
  
  # CI ribbons, drawn from outermost to innermost so that all remain visible
  geom_ribbon(data = subset(df_curve, x >= ci$lower[3] & x <= ci$upper[3]),
              aes(ymin = 0, ymax = y),
              fill = colors["99%"], alpha = 0.6) +
  
  geom_ribbon(data = subset(df_curve, x >= ci$lower[2] & x <= ci$upper[2]),
              aes(ymin = 0, ymax = y),
              fill = colors["95%"], alpha = 0.6) +
  
  geom_ribbon(data = subset(df_curve, x >= ci$lower[1] & x <= ci$upper[1]),
              aes(ymin = 0, ymax = y),
              fill = colors["90%"], alpha = 0.6) +
  
  # Base bell curve drawn on top of the ribbons
  geom_line(linewidth = 1.2, color = "gray20") +
  
  # Vertical mean line
  geom_vline(xintercept = mean_x, color = "black", linetype = "dashed", linewidth = 0.8) +
  
  # Vertical lines for the CI boundaries
  geom_vline(xintercept = c(ci$lower[1], ci$upper[1]), 
             color = "#6DAA8E", linetype = "dotted", alpha = 0.7, linewidth = 0.7) +
  geom_vline(xintercept = c(ci$lower[2], ci$upper[2]), 
             color = "#E6A853", linetype = "dotted", alpha = 0.7, linewidth = 0.7) +
  geom_vline(xintercept = c(ci$lower[3], ci$upper[3]), 
             color = "#C77DBF", linetype = "dotted", alpha = 0.7, linewidth = 0.7) +
  
  # Label mean
  annotate("text", x = mean_x, y = max(df_curve$y)*1.05, 
           label = "x\u0304 = 210", size = 5, fontface = "bold") +
  
  # CI labels on the right-hand side
  annotate("text", x = 228, y = max(df_curve$y)*0.25,
           label = "90% CI\n[203.20, 216.80]\nt = 1.697", 
           color = "#4F8A6B", size = 3.2, fontface = "bold", 
           hjust = 0, lineheight = 0.9) +
  
  annotate("text", x = 228, y = max(df_curve$y)*0.50,
           label = "95% CI\n[201.80, 218.20]\nt = 2.042", 
           color = "#CC8B2E", size = 3.2, fontface = "bold",
           hjust = 0, lineheight = 0.9) +
  
  annotate("text", x = 228, y = max(df_curve$y)*0.75,
           label = "99% CI\n[199.00, 221.00]\nt = 2.75", 
           color = "#A85B9D", size = 3.2, fontface = "bold",
           hjust = 0, lineheight = 0.9) +
  
  # Connector lines from each CI to its label
  annotate("segment", 
           x = ci$upper[1] + 0.5, xend = 227,
           y = max(df_curve$y)*0.25, yend = max(df_curve$y)*0.25,
           color = "#6DAA8E", linetype = "solid", linewidth = 0.3, alpha = 0.5) +
  
  annotate("segment", 
           x = ci$upper[2] + 0.5, xend = 227,
           y = max(df_curve$y)*0.50, yend = max(df_curve$y)*0.50,
           color = "#E6A853", linetype = "solid", linewidth = 0.3, alpha = 0.5) +
  
  annotate("segment", 
           x = ci$upper[3] + 0.5, xend = 227,
           y = max(df_curve$y)*0.75, yend = max(df_curve$y)*0.75,
           color = "#C77DBF", linetype = "solid", linewidth = 0.3, alpha = 0.5) +
  
  labs(
    title = "Confidence Intervals for Team B",
    subtitle = "Sample Mean = 210 min  |  SE = 4  |  df = 35",
    x = "Task Time (minutes)",
    y = "Probability Density"
  ) +
  
  scale_x_continuous(breaks = seq(195, 235, 5), limits = c(194, 240)) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 16, hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_line(color = "gray90"),
    axis.line.x = element_line(color = "gray30"),
    plot.margin = margin(10, 10, 10, 10)
  )

# Convert to interactive plotly
ggplotly(p)

Question 4

4. Explain why the interval widths differ even with similar data.

The interval widths differ for two reasons. First, Team A treats \(\sigma = 24\) as known and uses the z-distribution, while Team B estimates the spread with \(s = 24\) and uses the t-distribution. The t-distribution has heavier tails to account for the extra uncertainty in \(s\), so its critical values are larger (for example, 2.042 versus 1.96 at 95%) and Team B's intervals are wider even though the mean, standard deviation, and sample size are identical. Second, within each team, a higher confidence level requires a larger critical value, which pushes the interval further into the tails of the distribution: the 90% interval is the narrowest, while the 95% and especially the 99% intervals are wider. In short, the data do not change; the width is set by the distribution used and by the confidence level required.
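
A quick numerical check of the widths, as a sketch: with SE = 4 for both teams, the full width of each interval is 2 × critical value × SE, so it grows with the confidence level and is larger under the t-distribution (df = 35) than under z. The names width_z, width_t, and widths are illustrative.

```r
# Compare interval widths under z and t for the same SE
se          <- 4
conf_levels <- c(0.90, 0.95, 0.99)
alpha_half  <- (1 - conf_levels) / 2

width_z <- 2 * qnorm(1 - alpha_half) * se        # Team A (sigma known)
width_t <- 2 * qt(1 - alpha_half, df = 35) * se  # Team B (sigma estimated)

widths <- data.frame(
  level   = c("90%", "95%", "99%"),
  z_width = round(width_z, 2),
  t_width = round(width_t, 2),
  ratio   = round(width_t / width_z, 3)   # how much wider the t interval is
)
print(widths)
```

The ratio column stays only slightly above 1 because, with 35 degrees of freedom, the t-distribution is already close to the standard normal.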

Case Study 5

One-Sided Confidence Interval: A Software as a Service (SaaS) company wants to ensure that at least 70% of weekly active users utilize a premium feature.

From the experiment:

\[ \begin{eqnarray*} n &=& 250 \quad \text{(total users)} \\ x &=& 185 \quad \text{(active premium users)} \end{eqnarray*} \]

Management is only interested in the lower bound of the estimate.

Tasks:

Question 1

1. Identify the type of Confidence Interval and the appropriate test.

This analysis uses a one-sided (lower-bound) confidence interval for a population proportion. Because management is only concerned with the minimum plausible proportion of active premium users, the interval estimates only the lower limit of the true proportion.

The appropriate statistical method is a Z-based confidence interval for a population proportion. This approach is valid because the sample size is sufficiently large (\(n = 250\)) and both \(n\hat{p} = 185\) and \(n(1-\hat{p}) = 65\) comfortably exceed the common rule-of-thumb threshold of 10, so the sampling distribution of the sample proportion is approximately normal.
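
The normal-approximation condition can be checked in a few lines, as a minimal sketch. The threshold of 10 is an assumption taken from a common rule of thumb (some textbooks use 5), and the variable names are illustrative.

```r
# Check the large-sample condition for a z-based proportion interval
n <- 250   # total users
x <- 185   # active premium users

p_hat     <- x / n             # sample proportion, 0.74
successes <- n * p_hat         # expected count of premium users
failures  <- n * (1 - p_hat)   # expected count of non-premium users

threshold <- 10                # assumed rule-of-thumb cutoff
normal_ok <- successes >= threshold && failures >= threshold

cat("n * p_hat       =", successes, "\n")
cat("n * (1 - p_hat) =", failures, "\n")
cat("Normal approximation reasonable:", normal_ok, "\n")
```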

Question 2

2. Compute the one-sided lower Confidence Interval at:

  • \(90\%\)
  • \(95\%\)
  • \(99\%\)

\[ \begin{eqnarray*} n &=& 250 \quad \text{(total users)} \\ x &=& 185 \quad \text{(active premium users)} \end{eqnarray*} \]

One-sided lower Confidence Interval at \(90\%\)

Sample proportion: \[\hat{p} = \frac{x}{n} = \frac{185}{250} = 0.74\]

Standard Error (SE): \[\begin{array}{rl} SE & = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\[1mm] & = \sqrt{\frac{0.74 \times 0.26}{250}} \\[1mm] & \approx 0.028 \end{array}\]

Critical Value for \(90\%\) one-sided CI:

  • Significance level: \[\alpha = 0.10\]

  • From z-table (one-sided): \[z_{1-\alpha} \approx 1.28\]

Margin of Error (ME): \[\begin{array}{rl} ME & = z_{1-\alpha} \cdot SE \\[1mm] & = 1.28 \cdot 0.028 \\[1mm] & \approx 0.036 \end{array}\]

One-Sided Confidence Interval:

Lower One-Sided CI: \[\begin{array}{rl} CI_{lower} & = \hat{p} - ME \\[1mm] & = 0.74 - 0.036 \\[1mm] & \approx 0.704 \end{array}\]

One-sided lower Confidence Interval at \(95\%\)

Standard Error (SE): \[\begin{array}{rl} SE & = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\[1mm] & = \sqrt{\frac{0.74 \times 0.26}{250}} \\[1mm] & \approx 0.028 \end{array}\]

Critical Value for \(95\%\) one-sided CI:

  • Significance level: \[\alpha = 0.05\]

  • From z-table (one-sided): \[z_{1-\alpha} \approx 1.645\]

Margin of Error (ME): \[\begin{array}{rl} ME & = z_{1-\alpha} \cdot SE \\[1mm] & = 1.645 \cdot 0.028 \\[1mm] & \approx 0.046 \end{array}\]

One-Sided Confidence Interval:

Lower One-Sided CI: \[\begin{array}{rl} CI_{lower} & = \hat{p} - ME \\[1mm] & = 0.74 - 0.046 \\[1mm] & = 0.694 \end{array}\]

One-sided lower Confidence Interval at \(99\%\)

Standard Error (SE): \[\begin{array}{rl} SE & = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\[1mm] & = \sqrt{\frac{0.74 \times 0.26}{250}} \\[1mm] & \approx 0.028 \end{array}\]

Critical Value for \(99\%\) one-sided CI:

  • Significance level: \[\alpha = 0.01\]

  • From z-table (one-sided): \[z_{1-\alpha} \approx 2.33\]

Margin of Error (ME): \[\begin{array}{rl} ME & = z_{1-\alpha} \cdot SE \\[1mm] & = 2.33 \cdot 0.028 \\[1mm] & \approx 0.065 \end{array}\]

One-Sided Confidence Interval:

Lower One-Sided CI: \[\begin{array}{rl} CI_{lower} & = \hat{p} - ME \\[1mm] & = 0.74 - 0.065 \\[1mm] & = 0.675 \end{array}\]
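
The three lower bounds can be reproduced in R as a cross-check. This is a sketch of the same Wald-type calculation used above, with qnorm() supplying the one-sided critical values \(z_{1-\alpha}\); the names lower_bounds and z_one are illustrative.

```r
# One-sided lower confidence bounds for a proportion
n <- 250   # total users
x <- 185   # active premium users

p_hat <- x / n                            # 0.74
se    <- sqrt(p_hat * (1 - p_hat) / n)    # about 0.028

conf_levels <- c(0.90, 0.95, 0.99)
z_one <- qnorm(conf_levels)               # one-sided critical values z_{1 - alpha}
lower <- p_hat - z_one * se               # lower bound at each level

lower_bounds <- data.frame(
  level = c("90%", "95%", "99%"),
  z     = round(z_one, 3),
  lower = round(lower, 3)
)
print(lower_bounds)
```

Because the unrounded standard error and critical values are used, the results agree with the hand calculation to three decimal places.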

Question 3

3. Visualize the lower bounds for all confidence levels.

library(ggplot2)
library(dplyr)
library(plotly)

# Data Proportion CI
p_hat <- 0.74
SE <- 0.028

# One-sided lower confidence intervals
ci <- data.frame(
  level = c("90%", "95%", "99%"),
  lower = c(0.704, 0.694, 0.675),
  upper = rep(p_hat, 3),  
  z_value = c(1.28, 1.645, 2.33),
  ME = c(0.036, 0.046, 0.065)
)

# Colors
colors <- c(
  "99%" = "#FFD6E8",  
  "95%" = "#C9E4DE",  
  "90%" = "#E8DAF2" 
)

# Create normal curve for proportion
x_vals <- seq(p_hat - 4*SE, p_hat + 4*SE, length.out = 400)
density_vals <- dnorm(x_vals, mean = p_hat, sd = SE)
df_curve <- data.frame(x = x_vals, y = density_vals)

# Plot with neatly ordered layers
p <- ggplot(df_curve, aes(x, y)) +
  
  # CI ribbons, from outermost (99%) to innermost (90%)
  geom_ribbon(data = subset(df_curve, x >= ci$lower[3] & x <= ci$upper[3]),
              aes(ymin = 0, ymax = y),
              fill = colors["99%"], alpha = 0.7) +
  
  geom_ribbon(data = subset(df_curve, x >= ci$lower[2] & x <= ci$upper[2]),
              aes(ymin = 0, ymax = y),
              fill = colors["95%"], alpha = 0.7) +
  
  geom_ribbon(data = subset(df_curve, x >= ci$lower[1] & x <= ci$upper[1]),
              aes(ymin = 0, ymax = y),
              fill = colors["90%"], alpha = 0.7) +
  
  # Base bell curve
  geom_line(linewidth = 1.2, color = "gray20") +
  
  # Vertical line for p_hat (sample proportion)
  geom_vline(xintercept = p_hat, color = "black", 
             linetype = "dashed", linewidth = 0.9) +
  
  # Vertical lines for the CI boundaries (lower bounds)
  geom_vline(xintercept = ci$lower[1], 
             color = "#B896D4", linetype = "dotted", alpha = 0.8, linewidth = 0.7) +
  geom_vline(xintercept = ci$lower[2], 
             color = "#7FB8A8", linetype = "dotted", alpha = 0.8, linewidth = 0.7) +
  geom_vline(xintercept = ci$lower[3], 
             color = "#FFB3CE", linetype = "dotted", alpha = 0.8, linewidth = 0.7) +
  
  # Label p_hat
  annotate("text", x = p_hat, y = max(df_curve$y)*1.05, 
           label = "p\u0302 = 0.74", size = 5, fontface = "bold") +
  
  # CI labels on the right-hand side
  # 90% CI
  annotate("text", x = 0.805, y = max(df_curve$y)*0.28,
           label = "90% CI (Lower)\n[0.704, ∞)\nz = 1.28", 
           color = "#8B5FA8", size = 3.2, fontface = "bold", 
           hjust = 0, lineheight = 0.9) +
  
  # 95% CI
  annotate("text", x = 0.805, y = max(df_curve$y)*0.55,
           label = "95% CI (Lower)\n[0.694, ∞)\nz = 1.645", 
           color = "#4A8B78", size = 3.2, fontface = "bold",
           hjust = 0, lineheight = 0.9) +
  
  # 99% CI
  annotate("text", x = 0.805, y = max(df_curve$y)*0.82,
           label = "99% CI (Lower)\n[0.675, ∞)\nz = 2.33", 
           color = "#E8699A", size = 3.2, fontface = "bold",
           hjust = 0, lineheight = 0.9) +
  
  # Connector lines from each lower bound to its label
  annotate("segment", 
           x = ci$lower[1] + 0.002, xend = 0.803,
           y = max(df_curve$y)*0.28, yend = max(df_curve$y)*0.28,
           color = "#B896D4", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
  
  annotate("segment", 
           x = ci$lower[2] + 0.002, xend = 0.803,
           y = max(df_curve$y)*0.55, yend = max(df_curve$y)*0.55,
           color = "#7FB8A8", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
  
  annotate("segment", 
           x = ci$lower[3] + 0.002, xend = 0.803,
           y = max(df_curve$y)*0.82, yend = max(df_curve$y)*0.82,
           color = "#FFB3CE", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
  
  labs(
    title = "One-Sided Lower Confidence Intervals for Population Proportion",
    subtitle = "Sample proportion p\u0302 = 0.74  |  n = 250  |  SE = 0.028",
    x = "Proportion (p)",
    y = "Probability Density"
  ) +
  
  scale_x_continuous(breaks = seq(0.64, 0.84, 0.02), 
                     limits = c(0.63, 0.87)) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 15, hjust = 0.5),
    plot.subtitle = element_text(size = 10.5, hjust = 0.5, color = "gray40"),
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_line(color = "gray90"),
    axis.line.x = element_line(color = "gray30"),
    plot.margin = margin(10, 10, 10, 10)
  )

# Convert to interactive plotly
ggplotly(p)

Question 4

4. Determine whether the 70% target is statistically satisfied.

The 70% target is statistically met only at the 90% confidence level, where the lower bound (0.704) remains above the target. At the 95% and 99% levels, the lower bounds (0.694 and 0.675) fall below 0.70, so the data do not provide enough evidence to claim that the target is reached at those levels.
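
To make the decision explicit, the sketch below compares each lower bound with the 0.70 target and also solves for the break-even confidence level, that is, the highest one-sided level at which the lower bound still reaches 0.70. It assumes the same Wald standard error as the hand calculation; target, meets_target, and conf_break are illustrative names.

```r
# Does the one-sided lower bound clear the 70% target?
n      <- 250
p_hat  <- 185 / n
target <- 0.70
se     <- sqrt(p_hat * (1 - p_hat) / n)

conf_levels  <- c(0.90, 0.95, 0.99)
lower        <- p_hat - qnorm(conf_levels) * se
meets_target <- lower >= target            # one logical value per level

# Break-even level: p_hat - z * se = target  =>  z = (p_hat - target) / se
z_break    <- (p_hat - target) / se
conf_break <- pnorm(z_break)

print(data.frame(level = c("90%", "95%", "99%"),
                 lower = round(lower, 3),
                 meets_target = meets_target))
cat("Break-even one-sided confidence level:", round(conf_break, 3), "\n")
```

The break-even level lands between 90% and 95% (about 92.5%), which is why the target is satisfied at 90% but not at 95% or 99%.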
