Undergraduate Student in Data Science at Institut Teknologi Sains Bandung
Confidence Interval for Mean, \(\sigma\) Known: An e-commerce platform wants to estimate the average number of daily transactions per user after launching a new feature. Based on large-scale historical data, the population standard deviation is known.
\[ \begin{eqnarray*} \sigma &=& 3.2 \quad \text{(population standard deviation)} \\ n &=& 100 \quad \text{(sample size)} \\ \bar{x} &=& 12.6 \quad \text{(sample mean)} \end{eqnarray*} \]
Problem 1
1. Identify the appropriate statistical test and justify your choice.
The appropriate method is a Z confidence interval for the mean because the population standard deviation is known and the sample size is large, allowing the sampling distribution of the mean to be approximated by a normal distribution.
Problem 2
2. Compute the confidence intervals at the 90%, 95%, and 99% confidence levels:
Sample size: \[n=100\]
Sample mean: \[\bar{x}=12.6\]
Population standard deviation (known): \[\sigma=3.2\]
Formula for CI using z-distribution: \[CI = \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)\]
Critical value \(z\) for 90% CI: \(z_{\alpha/2} = 1.645\)
Standard Error (SE): \[\begin{array}{rl} SE & = \frac{\sigma}{\sqrt{n}} \\ & = \frac{3.2}{\sqrt{100}} \\ & = \frac{3.2}{10} \\ & = 0.32 \end{array}\]
Margin of Error (ME): \[\begin{array}{rl} ME & = z_{\alpha/2} \times SE \\ & = 1.645 \times 0.32 \\ & \approx 0.526 \\ \end{array}\]
Confidence Interval: \[ \begin{array}{rl} CI_{90\%} & = \bar{x} \pm ME \\ & = 12.6 \pm 0.526 \\ & \approx (12.07,\; 13.13) \\ \end{array}\]
Formula for CI using z-distribution: \[CI = \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)\]
Critical value \(z\) for 95% CI: \(z_{\alpha/2} = 1.96\)
Standard Error (SE): \[\begin{array}{rl} SE & = \frac{\sigma}{\sqrt{n}} \\ & = \frac{3.2}{\sqrt{100}} \\ & = \frac{3.2}{10} \\ & = 0.32 \end{array}\]
Margin of Error (ME): \[\begin{array}{rl} ME & = z_{\alpha/2} \times SE \\ & = 1.96 \times 0.32 \\ & \approx 0.627 \\ \end{array}\]
Confidence Interval: \[ \begin{array}{rl} CI_{95\%} & = \bar{x} \pm ME \\ & = 12.6 \pm 0.627 \\ & \approx (11.97,\; 13.23) \\ \end{array}\]
Formula for CI using z-distribution: \[CI = \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)\]
Critical value \(z\) for 99% CI: \(z_{\alpha/2} = 2.575\)
Standard Error (SE): \[\begin{array}{rl} SE & = \frac{\sigma}{\sqrt{n}} \\ & = \frac{3.2}{\sqrt{100}} \\ & = \frac{3.2}{10} \\ & = 0.32 \end{array}\]
Margin of Error (ME): \[\begin{array}{rl} ME & = z_{\alpha/2} \times SE \\ & = 2.575 \times 0.32 \\ & = 0.824 \\ \end{array}\]
Confidence Interval: \[ \begin{array}{rl} CI_{99\%} & = \bar{x} \pm ME \\ & = 12.6 \pm 0.824 \\ & \approx (11.78,\; 13.42) \\ \end{array}\]
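As a quick cross-check, the same z-based intervals can be reproduced in R; this is a minimal sketch (variable names are illustrative), and note that qnorm(0.995) returns 2.576, marginally different from the rounded table value 2.575 used above.
# Reproduce the z-based confidence intervals for the mean (sigma known)
xbar  <- 12.6
sigma <- 3.2
n     <- 100
se    <- sigma / sqrt(n)              # 0.32
conf  <- c(0.90, 0.95, 0.99)
z     <- qnorm(1 - (1 - conf) / 2)    # 1.645, 1.960, 2.576
data.frame(level = conf,
           lower = xbar - z * se,
           upper = xbar + z * se)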
Problem 3
3. Create a comparison visualization of the three confidence intervals.
library(ggplot2)
library(dplyr)
library(plotly)
# Data z-distribution CI
mean_x <- 12.6
sigma <- 3.2
n <- 100
SE <- sigma / sqrt(n) # SE = 0.32
# Confidence intervals
ci <- data.frame(
level = c("90%", "95%", "99%"),
lower = c(12.0736, 11.9728, 11.776),
upper = c(13.1264, 13.2272, 13.424),
z_value = c(1.645, 1.96, 2.575),
ME = c(0.5264, 0.6272, 0.824)
)
# Colors
colors <- c(
"99%" = "#FFDCE5",
"95%" = "#D4F1F4",
"90%" = "#FFE8D1"
)
# Create normal distribution curve
x_vals <- seq(mean_x - 4*SE, mean_x + 4*SE, length.out = 400)
density_vals <- dnorm(x_vals, mean = mean_x, sd = SE)
df_curve <- data.frame(x = x_vals, y = density_vals)
# Plot with a clean layout
p <- ggplot(df_curve, aes(x, y)) +
# CI ribbons - from the outermost (99%) to the innermost (90%)
geom_ribbon(data = subset(df_curve, x >= ci$lower[3] & x <= ci$upper[3]),
aes(ymin = 0, ymax = y),
fill = colors["99%"], alpha = 0.7) +
geom_ribbon(data = subset(df_curve, x >= ci$lower[2] & x <= ci$upper[2]),
aes(ymin = 0, ymax = y),
fill = colors["95%"], alpha = 0.7) +
geom_ribbon(data = subset(df_curve, x >= ci$lower[1] & x <= ci$upper[1]),
aes(ymin = 0, ymax = y),
fill = colors["90%"], alpha = 0.7) +
# Base bell curve
geom_line(linewidth = 1.3, color = "gray20") +
# Vertical mean line (dashed)
geom_vline(xintercept = mean_x, color = "black",
linetype = "dashed", linewidth = 1) +
# Vertical lines for CI boundaries (dotted) - 90%
geom_vline(xintercept = ci$lower[1],
color = "#E8A860", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
geom_vline(xintercept = ci$upper[1],
color = "#E8A860", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
# 95%
geom_vline(xintercept = ci$lower[2],
color = "#75B8BD", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
geom_vline(xintercept = ci$upper[2],
color = "#75B8BD", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
# 99%
geom_vline(xintercept = ci$lower[3],
color = "#F5A3B5", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
geom_vline(xintercept = ci$upper[3],
color = "#F5A3B5", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
# Label mean
annotate("text", x = mean_x, y = max(df_curve$y)*1.06,
label = "xĖ = 12.6", size = 5, fontface = "bold") +
# CI labels di sisi kanan
# 90% CI
annotate("text", x = 13.7, y = max(df_curve$y)*0.30,
label = "90% CI\n[12.07, 13.13]\nz = 1.645",
color = "#C8843D", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# 95% CI
annotate("text", x = 13.7, y = max(df_curve$y)*0.55,
label = "95% CI\n[11.97, 13.23]\nz = 1.96",
color = "#4A8A8F", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# 99% CI
annotate("text", x = 13.7, y = max(df_curve$y)*0.80,
label = "99% CI\n[11.78, 13.42]\nz = 2.575",
color = "#D67889", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# Connector lines from the upper bounds to the labels
annotate("segment",
x = ci$upper[1] + 0.02, xend = 13.68,
y = max(df_curve$y)*0.30, yend = max(df_curve$y)*0.30,
color = "#E8A860", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
annotate("segment",
x = ci$upper[2] + 0.02, xend = 13.68,
y = max(df_curve$y)*0.55, yend = max(df_curve$y)*0.55,
color = "#75B8BD", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
annotate("segment",
x = ci$upper[3] + 0.02, xend = 13.68,
y = max(df_curve$y)*0.80, yend = max(df_curve$y)*0.80,
color = "#F5A3B5", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
labs(
title = "Confidence Intervals (z-distribution)",
subtitle = "Sample Mean = 12.6 | Ï = 3.2 | n = 100 | SE = 0.32",
x = "Value",
y = "Probability Density"
) +
scale_x_continuous(breaks = seq(11.5, 14, 0.5),
limits = c(11.3, 14.3)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 15, hjust = 0.5),
plot.subtitle = element_text(size = 10.5, hjust = 0.5, color = "gray40"),
panel.grid.minor = element_blank(),
panel.grid.major.y = element_line(color = "gray90"),
axis.line.x = element_line(color = "gray30"),
plot.margin = margin(10, 10, 10, 10)
)
# Convert to interactive plotly
ggplotly(p)
Interpretation:
The plot shows a normal distribution centered at the sample mean of 12.6, with three confidence intervals drawn around it. The 90% CI (12.07, 13.13) is the narrowest, meaning the estimate is more precise but carries less certainty. The 95% CI (11.97, 13.23) is slightly wider, reflecting higher confidence. The 99% CI (11.78, 13.42) is the widest, showing the broadest range of plausible population means. As the confidence level increases, the interval expands outward symmetrically from the mean, illustrating the standard relationship: the higher the confidence, the wider the interval.
Problem 4
4. Interpret the results in a business analytics context.
The confidence interval analysis shows that the estimated average number of daily transactions per user is centered at 12.6, and all intervals fall within a relatively tight range. This suggests that user behavior is fairly consistent and the average is unlikely to deviate far from this value. As the confidence level increases, the interval becomes wider, meaning the business gains more certainty but with less precision about the exact value. This trade-off is important for decision-making: if the business needs a more confident estimate for planning, they may rely on a wider interval, while more precise decisions can use the narrower ones. Overall, the results indicate stable transaction patterns that can support reliable operational and strategic decisions.
Confidence Interval for Mean, \(\sigma\) Unknown: A UX Research team analyzes task completion time (in minutes) for a new mobile application. The data are collected from 12 users:
\[ 8.4,\; 7.9,\; 9.1,\; 8.7,\; 8.2,\; 9.0,\; 7.8,\; 8.5,\; 8.9,\; 8.1,\; 8.6,\; 8.3 \]
Problem 1
1. Identify the appropriate statistical test and explain why.
The appropriate statistical method is a one-sample t confidence interval because the population standard deviation is unknown and the sample size is small \((n = 12)\). The t-distribution is therefore used to estimate the population mean task completion time.
Problem 2
2. Compute the confidence intervals at the 90%, 95%, and 99% confidence levels:
Sample size: \[n=12\]
Sample mean: \[\begin{array}{rl} \bar{x} &= \frac{1}{n} \sum_{i=1}^{n} x_i \\[2mm] &= \frac{1}{12} (x_1 + x_2 + \cdots + x_{12}) \\[1mm] &= \frac{1}{12} (8.4+7.9+9.1+8.7+8.2+9.0+7.8+8.5+8.9+8.1+8.6+8.3) \\[1mm] &= \frac{101.5}{12} \\[1mm] &= 8.458 \ \text{minutes} \end{array}\]
Sample standard deviation
\[ \begin{array}{rl} s & = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \\[2mm] & = \sqrt{\frac{\sum (x_i - 8.458)^2}{12-1}} \\[2mm] & = \sqrt{\frac{1.947}{11}} \\[1mm] & \approx 0.42\ \text{minutes} \end{array} \]
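The sample statistics above can be verified directly from the raw data in R; a short sketch:
# Sample mean and standard deviation of the task-completion times
x <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)
mean(x)   # approximately 8.458 minutes
sd(x)     # approximately 0.42 minutes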
Degrees of freedom: \[df = n - 1 = 11\]
Critical \(t\) value for 90% CI: \(t_{0.05,\,11} = 1.796\)
Standard Error (SE): \[\begin{array}{rl} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{0.42}{\sqrt{12}} \\ & \approx 0.121 \end{array}\]
Margin of Error (ME): \[\begin{array}{rl} ME & = t_{\alpha/2,\,df} \times SE \\ & = 1.796 \times 0.121 \\ & \approx 0.217 \end{array}\]
Confidence Interval: \[\begin{array}{rl} CI_{90\%} & = \bar{x} \pm ME \\ & = 8.458 \pm 0.217 \\ & \approx (8.240,\; 8.680) \end{array}\]
Critical \(t\) value for 95% CI: \(t_{0.025,\,11} = 2.201\)
Standard Error (SE): \[\begin{array}{rl} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{0.42}{\sqrt{12}} \\ & \approx 0.121 \end{array}\]
Margin of Error (ME): \[\begin{array}{rl} ME & = t_{\alpha/2,\,df} \times SE \\ & = 2.201 \times 0.121 \\ & \approx 0.266 \end{array}\]
Confidence Interval: \[\begin{array}{rl} CI_{95\%} & = \bar{x} \pm ME \\ & = 8.458 \pm 0.266 \\ & \approx (8.190,\; 8.720) \end{array}\]
Critical \(t\) value for 99% CI: \(t_{0.005,\,11} = 3.106\)
Standard Error (SE): \[\begin{array}{rl} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{0.42}{\sqrt{12}} \\ & \approx 0.121 \end{array}\]
Margin of Error (ME): \[\begin{array}{rl} ME & = t_{\alpha/2,\,df} \times SE \\ & = 3.106 \times 0.121 \\ & \approx 0.380 \end{array}\]
Confidence Interval: \[\begin{array}{rl} CI_{99\%} & = \bar{x} \pm ME \\ & = 8.458 \pm 0.380 \\ & \approx (8.080,\; 8.830) \end{array}\]
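The three t-based intervals can be reproduced with qt(), and t.test() reports the 95% interval directly; a minimal sketch reusing the raw data (small rounding differences from the hand calculation are expected):
# Reproduce the t-based confidence intervals (sigma unknown, n = 12)
x     <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)
se    <- sd(x) / sqrt(length(x))                      # approximately 0.121
conf  <- c(0.90, 0.95, 0.99)
tcrit <- qt(1 - (1 - conf) / 2, df = length(x) - 1)   # 1.796, 2.201, 3.106
data.frame(level = conf,
           lower = mean(x) - tcrit * se,
           upper = mean(x) + tcrit * se)
t.test(x)$conf.int   # built-in 95% interval for comparison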
Problem 3
3. Visualize the three intervals on a single plot.
library(ggplot2)
library(dplyr)
library(plotly)
# Data t-distribution CI (small sample)
mean_x <- 8.458
s <- 0.42
n <- 12
df <- n - 1 # df = 11
SE <- s / sqrt(n) # SE = 0.121
# Confidence intervals
ci <- data.frame(
level = c("90%", "95%", "99%"),
lower = c(8.240, 8.190, 8.080),
upper = c(8.680, 8.720, 8.830),
t_value = c(1.796, 2.201, 3.106),
ME = c(0.217, 0.266, 0.380)
)
# Colors
colors <- c(
"99%" = "#E6D5F5",
"95%" = "#FFE5CC",
"90%" = "#D1F2EB"
)
# Create t-distribution curve
x_vals <- seq(mean_x - 4.5*SE, mean_x + 4.5*SE, length.out = 400)
# Using t-distribution with df=11
density_vals <- dt((x_vals - mean_x)/SE, df = df) / SE
df_curve <- data.frame(x = x_vals, y = density_vals)
# Plot with a clean layout
p <- ggplot(df_curve, aes(x, y)) +
# CI ribbons - from the outermost (99%) to the innermost (90%)
geom_ribbon(data = subset(df_curve, x >= ci$lower[3] & x <= ci$upper[3]),
aes(ymin = 0, ymax = y),
fill = colors["99%"], alpha = 0.7) +
geom_ribbon(data = subset(df_curve, x >= ci$lower[2] & x <= ci$upper[2]),
aes(ymin = 0, ymax = y),
fill = colors["95%"], alpha = 0.7) +
geom_ribbon(data = subset(df_curve, x >= ci$lower[1] & x <= ci$upper[1]),
aes(ymin = 0, ymax = y),
fill = colors["90%"], alpha = 0.7) +
# Base bell curve
geom_line(linewidth = 1.3, color = "gray20") +
# Vertical mean line (dashed)
geom_vline(xintercept = mean_x, color = "black",
linetype = "dashed", linewidth = 1) +
# Vertical lines for CI boundaries (dotted) - 90%
geom_vline(xintercept = ci$lower[1],
color = "#7AC5B0", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
geom_vline(xintercept = ci$upper[1],
color = "#7AC5B0", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
# 95%
geom_vline(xintercept = ci$lower[2],
color = "#E6A962", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
geom_vline(xintercept = ci$upper[2],
color = "#E6A962", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
# 99%
geom_vline(xintercept = ci$lower[3],
color = "#B399CC", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
geom_vline(xintercept = ci$upper[3],
color = "#B399CC", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
# Label mean
annotate("text", x = mean_x, y = max(df_curve$y)*1.06,
label = "xĖ = 8.458", size = 5, fontface = "bold") +
# CI labels di sisi kanan
# 90% CI
annotate("text", x = 9.05, y = max(df_curve$y)*0.28,
label = "90% CI\n[8.240, 8.680]\nt = 1.796",
color = "#4A9B85", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# 95% CI
annotate("text", x = 9.05, y = max(df_curve$y)*0.55,
label = "95% CI\n[8.190, 8.720]\nt = 2.201",
color = "#C88641", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# 99% CI
annotate("text", x = 9.05, y = max(df_curve$y)*0.82,
label = "99% CI\n[8.080, 8.830]\nt = 3.106",
color = "#8B6FA3", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# Connector lines from the upper bounds to the labels
annotate("segment",
x = ci$upper[1] + 0.01, xend = 9.03,
y = max(df_curve$y)*0.28, yend = max(df_curve$y)*0.28,
color = "#7AC5B0", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
annotate("segment",
x = ci$upper[2] + 0.01, xend = 9.03,
y = max(df_curve$y)*0.55, yend = max(df_curve$y)*0.55,
color = "#E6A962", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
annotate("segment",
x = ci$upper[3] + 0.01, xend = 9.03,
y = max(df_curve$y)*0.82, yend = max(df_curve$y)*0.82,
color = "#B399CC", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
labs(
title = "Confidence Interval (t-Distribution)",
subtitle = "Sample Mean = 8.458 min | s = 0.42 | n = 12 | df = 11 | SE = 0.121",
x = "Time (minutes)",
y = "Probability Density"
) +
scale_x_continuous(breaks = seq(7.9, 9.0, 0.2),
limits = c(7.8, 9.5)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 15, hjust = 0.5),
plot.subtitle = element_text(size = 10, hjust = 0.5, color = "gray40"),
panel.grid.minor = element_blank(),
panel.grid.major.y = element_line(color = "gray90"),
axis.line.x = element_line(color = "gray30"),
plot.margin = margin(10, 10, 10, 10)
)
# Convert to interactive plotly
ggplotly(p)
Interpretation:
The plot shows a t-distribution centered at the sample mean of 8.458 minutes, based on a small sample of 12 users. Because the sample size is small and the population standard deviation is unknown, the confidence intervals are wider than they would be under a normal z-distribution.
The three intervals (90%, 95%, and 99%) all stay fairly close to the sample mean, indicating that user task-completion times are consistent. As the confidence level increases, the interval expands, from about 8.24 to 8.68 minutes at 90% up to 8.08 to 8.83 minutes at 99%, showing the usual trade-off: more certainty leads to a wider possible range. Overall, the visualization suggests that the average task time is stable around 8.5 minutes, and even at higher confidence levels the estimate remains within a narrow and reliable range.
Problem 4
4. Explain how sample size and confidence level influence the interval width.
All confidence intervals are calculated using the same sample size (n = 12), so the interval width is mainly affected by the confidence level. As the confidence level increases from 90% to 95% to 99%, the intervals become noticeably wider. This happens because a higher confidence level requires a larger margin of error to ensure the true mean is captured with greater certainty. If the sample size were increased, the intervals in the graph would shrink: a larger sample size reduces variability and produces a smaller standard error, which narrows the confidence interval. In contrast, a smaller sample size would make the intervals wider due to increased uncertainty.
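To make the sample-size effect concrete, the short sketch below recomputes the 95% margin of error for larger hypothetical sample sizes while holding s fixed at 0.42; in practice s would also change as new data arrive, so this is illustrative only.
# Illustrative only: how the 95% margin of error shrinks as n grows
s <- 0.42
for (n in c(12, 24, 48, 96)) {
  me <- qt(0.975, df = n - 1) * s / sqrt(n)
  cat("n =", n, "-> ME =", round(me, 3), "\n")
}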
Confidence Interval for a Proportion, A/B Testing: A data science team runs an A/B test on a new Call-To-Action (CTA) button design. The experiment yields:
\[ \begin{eqnarray*} n &=& 400 \quad \text{(total users)} \\ x &=& 156 \quad \text{(users who clicked the CTA)} \end{eqnarray*} \]
Problem 1
1. Compute the sample proportion \(\hat{p}\).
Sample proportion:
The sample proportion \(\hat{p}\) is: \[\hat{p} = \frac{x}{n} = \frac{156}{400} = 0.39\]
So about 39% of sampled users clicked the CTA.
Problem 2
2. Compute the confidence intervals for the proportion at the 90%, 95%, and 99% confidence levels:
Standard Error (SE): \[\begin{array}{rl} SE &= \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \\ & = \sqrt{\frac{0.39(1 - 0.39)}{400}} \\ & = \sqrt{\frac{0.39 \times 0.61}{400}} \\ & \approx 0.024 \end{array}\]
Critical value:
For 90% confidence level: \[z_{\alpha/2} = 1.645\]
Margin of Error (ME): \[ME = z_{\alpha/2} \times SE = 1.645 \times 0.024 \approx 0.039\]
Confidence Interval: \[\begin{array}{rl} CI_{90\%} & = \hat{p} \pm ME \\ & = 0.39 \pm 0.039\\ &\approx (0.351,\; 0.429) \end{array}\]
Standard Error (SE): \[\begin{array}{rl} SE &= \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \\ & = \sqrt{\frac{0.39(1 - 0.39)}{400}} \\ & = \sqrt{\frac{0.39 \times 0.61}{400}} \\ & \approx 0.024 \end{array}\]
Critical value:
For 95% confidence level: \[z_{\alpha/2} = 1.96\]
Margin of Error (ME): \[ME = z_{\alpha/2} \times SE = 1.96 \times 0.024 \approx 0.047\]
Confidence Interval: \[\begin{array}{rl} CI_{95\%} & = \hat{p} \pm ME \\ & = 0.39 \pm 0.047\\ &\approx (0.343,\; 0.437) \end{array}\]
Standard Error (SE): \[\begin{array}{rl} SE &= \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \\ & = \sqrt{\frac{0.39(1 - 0.39)}{400}} \\ & = \sqrt{\frac{0.39 \times 0.61}{400}} \\ & \approx 0.024 \end{array}\]
Critical value:
For 99% confidence level: \[z_{\alpha/2} = 2.575\]
Margin of Error (ME): \[ME = z_{\alpha/2} \times SE = 2.575 \times 0.024 \approx 0.061\]
Confidence Interval: \[\begin{array}{rl} CI_{99\%} & = \hat{p} \pm ME \\ & = 0.39 \pm 0.061\\ &\approx (0.330,\; 0.450) \end{array}\]
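The normal-approximation intervals above can be reproduced in R as a quick check; a minimal sketch (prop.test() would give a slightly different score-based interval, which is a common alternative):
# Reproduce the normal-approximation intervals for the click-through proportion
x     <- 156
n     <- 400
p_hat <- x / n
se    <- sqrt(p_hat * (1 - p_hat) / n)   # approximately 0.024
conf  <- c(0.90, 0.95, 0.99)
z     <- qnorm(1 - (1 - conf) / 2)
data.frame(level = conf,
           lower = p_hat - z * se,
           upper = p_hat + z * se)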
Problem 3
3. Visualize and compare the three intervals.
library(ggplot2)
library(dplyr)
library(plotly)
# Data Proportion CI
p_hat <- 0.39
n <- 400
SE <- 0.024
# Confidence intervals
ci <- data.frame(
level = c("90%", "95%", "99%"),
lower = c(0.350, 0.340, 0.330),
upper = c(0.430, 0.440, 0.450),
z_value = c(1.645, 1.96, 2.575),
ME = c(0.039, 0.047, 0.061)
)
# Colors
colors <- c(
"99%" = "#D4A5A5",
"95%" = "#B5D4C6",
"90%" = "#C9B8D4"
)
# Create normal distribution curve for proportion
x_vals <- seq(p_hat - 4.5*SE, p_hat + 4.5*SE, length.out = 400)
density_vals <- dnorm(x_vals, mean = p_hat, sd = SE)
df_curve <- data.frame(x = x_vals, y = density_vals)
# Plot with a clean layout
p <- ggplot(df_curve, aes(x, y)) +
# CI ribbons - from the outermost (99%) to the innermost (90%)
geom_ribbon(data = subset(df_curve, x >= ci$lower[3] & x <= ci$upper[3]),
aes(ymin = 0, ymax = y),
fill = colors["99%"], alpha = 0.7) +
geom_ribbon(data = subset(df_curve, x >= ci$lower[2] & x <= ci$upper[2]),
aes(ymin = 0, ymax = y),
fill = colors["95%"], alpha = 0.7) +
geom_ribbon(data = subset(df_curve, x >= ci$lower[1] & x <= ci$upper[1]),
aes(ymin = 0, ymax = y),
fill = colors["90%"], alpha = 0.7) +
# Base bell curve
geom_line(linewidth = 1.3, color = "gray25") +
# Vertical mean line (dashed)
geom_vline(xintercept = p_hat, color = "black",
linetype = "dashed", linewidth = 1) +
# Vertical lines for CI boundaries (dotted) - 90% CI
geom_vline(xintercept = ci$lower[1],
color = "#9B7FB8", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
geom_vline(xintercept = ci$upper[1],
color = "#9B7FB8", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
# 95% CI
geom_vline(xintercept = ci$lower[2],
color = "#7DA891", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
geom_vline(xintercept = ci$upper[2],
color = "#7DA891", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
# 99% CI
geom_vline(xintercept = ci$lower[3],
color = "#A67272", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
geom_vline(xintercept = ci$upper[3],
color = "#A67272", linetype = "dotted", alpha = 0.8, linewidth = 0.8) +
# Label mean
annotate("text", x = p_hat, y = max(df_curve$y)*1.07,
label = "pĖ = 0.39", size = 5.5, fontface = "bold") +
# CI labels di sisi kanan
# 90% CI
annotate("text", x = 0.475, y = max(df_curve$y)*0.28,
label = "90% CI\n[0.350, 0.430]\nz = 1.645",
color = "#6F4B8B", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# 95% CI
annotate("text", x = 0.475, y = max(df_curve$y)*0.55,
label = "95% CI\n[0.340, 0.440]\nz = 1.96",
color = "#4A7C59", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# 99% CI
annotate("text", x = 0.475, y = max(df_curve$y)*0.82,
label = "99% CI\n[0.330, 0.450]\nz = 2.575",
color = "#8B5757", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# Connector lines from the upper bounds to the labels
annotate("segment",
x = ci$upper[1] + 0.003, xend = 0.473,
y = max(df_curve$y)*0.28, yend = max(df_curve$y)*0.28,
color = "#9B7FB8", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
annotate("segment",
x = ci$upper[2] + 0.003, xend = 0.473,
y = max(df_curve$y)*0.55, yend = max(df_curve$y)*0.55,
color = "#7DA891", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
annotate("segment",
x = ci$upper[3] + 0.003, xend = 0.473,
y = max(df_curve$y)*0.82, yend = max(df_curve$y)*0.82,
color = "#A67272", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
labs(
title = "Confidence Intervals for Proportion",
subtitle = "Sample Proportion pĖ = 0.39 | n = 400 | SE = 0.024",
x = "Proportion (p)",
y = "Probability Density"
) +
scale_x_continuous(breaks = seq(0.28, 0.50, 0.02),
limits = c(0.26, 0.54)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 15, hjust = 0.5),
plot.subtitle = element_text(size = 10.5, hjust = 0.5, color = "gray40"),
panel.grid.minor = element_blank(),
panel.grid.major.y = element_line(color = "gray90"),
axis.line.x = element_line(color = "gray30"),
plot.margin = margin(10, 10, 10, 10)
)
# Convert to interactive plotly
ggplotly(p)
Interpretation:
The plot shows a normal curve centered at the sample proportion of 0.39. The 90% CI (roughly 0.35 to 0.43) is the narrowest, the 95% CI (roughly 0.34 to 0.44) is wider, and the 99% CI (roughly 0.33 to 0.45) is the widest, repeating the pattern seen earlier: a higher confidence level produces a wider interval around the estimated click-through proportion.
Problem 4
4. Explain how confidence level affects decision-making in product experiments.
The graph shows that higher confidence levels produce wider intervals. In product experiments, this affects how clearly we can judge the impact of a feature. The 90% interval is narrow, so it gives a more precise estimate, making decisions faster though with lower certainty. The 95% interval offers a balanced view, with good certainty and still-manageable width. At 99%, the interval becomes much wider, making the result harder to interpret because the true effect could fall anywhere in a large range. This means higher confidence gives more assurance, but reduces precision, which can slow down or complicate decision-making.
Precision Comparison (Z-Test vs t-Test): Two data teams measure API latency (in milliseconds) under different conditions.
\[\begin{eqnarray*} \text{Team A:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ \sigma &=& 24 \quad \text{(known population standard deviation)} \\[6pt] \text{Team B:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ s &=& 24 \quad \text{(sample standard deviation)} \end{eqnarray*}\]
Problem 1
1. Identify the statistical test used by each team.
Team A knows the population standard deviation (σ = 24), so it uses a z-based confidence interval for the mean. Team B only has the sample standard deviation (s = 24), so it uses a t-based confidence interval with df = n - 1 = 35.
Problem 2
2. Compute the confidence intervals at the 90%, 95%, and 99% confidence levels.
\[\begin{eqnarray*} \text{Team A:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ \sigma &=& 24 \quad \text{(known population standard deviation)} \\[6pt] \end{eqnarray*}\]
Formula for CI using z-distribution: \[CI = \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)\]
Critical value \(z\) for 90% CI: \(z_{\alpha/2} = 1.645\)
Standard Error (SE): \[\begin{array}{rl} SE & = \frac{\sigma}{\sqrt{n}} \\ & = \frac{24}{\sqrt{36}} \\ & = \frac{24}{6} \\ & = 4 \end{array}\]
Margin of Error (ME): \[\begin{array}{rl} ME & = z_{\alpha/2} \times SE \\ & = 1.645 \times 4 \\ & = 6.58 \\ \end{array}\]
Confidence Interval: \[ \begin{array}{rl} CI_{90\%} & = \bar{x} \pm ME \\ & = 210 \pm 6.58 \\ & = (203.4,\; 216.6) \\ \end{array}\]
Critical value \(z\) for 95% CI: \(z_{\alpha/2} = 1.96\)
Standard Error (SE): \[\begin{array}{rl} SE & = \frac{\sigma}{\sqrt{n}} \\ & = \frac{24}{\sqrt{36}} \\ & = \frac{24}{6} \\ & = 4 \end{array}\]
Margin of Error (ME): \[\begin{array}{rl} ME & = z_{\alpha/2} \times SE \\ & = 1.96 \times 4 \\ & = 7.84 \\ \end{array}\]
Confidence Interval: \[ \begin{array}{rl} CI_{95\%} & = \bar{x} \pm ME \\ & = 210 \pm 7.84 \\ & \approx (202.2,\; 217.8) \\ \end{array}\]
Critical value \(z\) for 99% CI: \(z_{\alpha/2} = 2.575\)
Standard Error (SE): \[\begin{array}{rl} SE & = \frac{\sigma}{\sqrt{n}} \\ & = \frac{24}{\sqrt{36}} \\ & = \frac{24}{6} \\ & = 4 \end{array}\]
Margin of Error (ME): \[\begin{array}{rl} ME & = z_{\alpha/2} \times SE \\ & = 2.575 \times 4 \\ & \approx 10.3 \\ \end{array}\]
Confidence Interval: \[ \begin{array}{rl} CI_{99\%} & = \bar{x} \pm ME \\ & = 210 \pm 10.3 \\ & = (199.7,\; 220.3) \\ \end{array}\]
\[\begin{eqnarray*} \text{Team B:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ s &=& 24 \quad \text{(sample standard deviation)} \end{eqnarray*}\]
Degrees of freedom: \[df = n - 1 = 35\]
Critical \(t\) value for 90% CI: \(t_{0.05,\,35} = 1.690\)
Standard Error (SE): \[\begin{array}{rl} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{24}{\sqrt{36}} \\ & = 4 \end{array}\]
Margin of Error (ME): \[\begin{array}{rl} ME & = t_{\alpha/2,\,df} \times SE \\ & = 1.690 \times 4 \\ & = 6.76 \end{array}\]
Confidence Interval: \[\begin{array}{rl} CI_{90\%} & = \bar{x} \pm ME \\ & = 210 \pm 6.76 \\ & = (203.24,\; 216.76) \end{array}\]
Critical \(t\) value for 95% CI: \(t_{0.025,\,35} = 2.030\)
Standard Error (SE): \[\begin{array}{rl} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{24}{\sqrt{36}} \\ & = 4 \end{array}\]
Margin of Error (ME): \[\begin{array}{rl} ME & = t_{\alpha/2,\,df} \times SE \\ & = 2.030 \times 4 \\ & = 8.12 \end{array}\]
Confidence Interval: \[\begin{array}{rl} CI_{95\%} & = \bar{x} \pm ME \\ & = 210 \pm 8.12 \\ & = (201.88,\; 218.12) \end{array}\]
Critical \(t\) value for 99% CI: \(t_{0.005,\,35} = 2.724\)
Standard Error (SE): \[\begin{array}{rl} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{24}{\sqrt{36}} \\ & = 4 \end{array}\]
Margin of Error (ME): \[\begin{array}{rl} ME & = t_{\alpha/2,\,df} \times SE \\ & = 2.724 \times 4 \\ & \approx 10.90 \end{array}\]
Confidence Interval: \[\begin{array}{rl} CI_{99\%} & = \bar{x} \pm ME \\ & = 210 \pm 10.90 \\ & \approx (199.10,\; 220.90) \end{array}\]
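A compact way to see the z-versus-t difference before plotting is to compare the margins of error side by side; a minimal sketch (both teams share SE = 4):
# Compare Team A (z, sigma known) and Team B (t, df = 35) margins of error
SE   <- 24 / sqrt(36)                 # 4 ms for both teams
conf <- c(0.90, 0.95, 0.99)
z    <- qnorm(1 - (1 - conf) / 2)
t35  <- qt(1 - (1 - conf) / 2, df = 35)
data.frame(level = conf,
           ME_z = round(z * SE, 2),   # Team A
           ME_t = round(t35 * SE, 2)) # Team B, slightly wider at every level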
Problem 3
3. Create a visualization comparing all intervals.
library(ggplot2)
library(dplyr)
library(plotly)
# Data from the problem
mean_x <- 210
SE <- 4
ci <- data.frame(
level = c("90%", "95%", "99%"),
lower = c(203.40, 202.20, 199.70),
upper = c(216.60, 217.80, 220.30)
)
# Colors
colors <- c(
"99%" = "#E8C5E5",
"95%" = "#FFE5B4",
"90%" = "#B4E7CE"
)
# Create bell curve data
x_vals <- seq(mean_x - 4*SE, mean_x + 4*SE, length.out = 400)
density_vals <- dnorm(x_vals, mean = mean_x, sd = SE)
df_curve <- data.frame(x = x_vals, y = density_vals)
# Plot with neatly positioned labels
p <- ggplot(df_curve, aes(x, y)) +
# CI ribbons - from the outermost to the innermost so all remain visible
geom_ribbon(data = subset(df_curve, x >= ci$lower[3] & x <= ci$upper[3]),
aes(ymin = 0, ymax = y),
fill = colors["99%"], alpha = 0.6) +
geom_ribbon(data = subset(df_curve, x >= ci$lower[2] & x <= ci$upper[2]),
aes(ymin = 0, ymax = y),
fill = colors["95%"], alpha = 0.6) +
geom_ribbon(data = subset(df_curve, x >= ci$lower[1] & x <= ci$upper[1]),
aes(ymin = 0, ymax = y),
fill = colors["90%"], alpha = 0.6) +
# Base bell curve drawn above the ribbons
geom_line(linewidth = 1.2, color = "gray20") +
# Vertical mean line
geom_vline(xintercept = mean_x, color = "black", linetype = "dashed", linewidth = 0.8) +
# Vertical lines for CI boundaries
geom_vline(xintercept = c(ci$lower[1], ci$upper[1]),
color = "#FF6B75", linetype = "dotted", alpha = 0.7, linewidth = 0.7) +
geom_vline(xintercept = c(ci$lower[2], ci$upper[2]),
color = "#4A9FE8", linetype = "dotted", alpha = 0.7, linewidth = 0.7) +
geom_vline(xintercept = c(ci$lower[3], ci$upper[3]),
color = "#67C9A8", linetype = "dotted", alpha = 0.7, linewidth = 0.7) +
# Label mean
annotate("text", x = mean_x, y = max(df_curve$y)*1.05,
label = "Ξ = 210", size = 5, fontface = "bold") +
# CI labels di sisi kanan - DIGESER LEBIH JAUH
annotate("text", x = 228, y = max(df_curve$y)*0.25,
label = "90% CI\n[203.40, 216.60]",
color = "#FF6B75", size = 3.5, fontface = "bold",
hjust = 0, lineheight = 0.9) +
annotate("text", x = 228, y = max(df_curve$y)*0.50,
label = "95% CI\n[202.20, 217.80]",
color = "#4A9FE8", size = 3.5, fontface = "bold",
hjust = 0, lineheight = 0.9) +
annotate("text", x = 228, y = max(df_curve$y)*0.75,
label = "99% CI\n[199.70, 220.30]",
color = "#67C9A8", size = 3.5, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# Connector lines from the CIs to the labels - extended
annotate("segment",
x = ci$upper[1] + 0.5, xend = 227,
y = max(df_curve$y)*0.25, yend = max(df_curve$y)*0.25,
color = "#FF6B75", linetype = "solid", linewidth = 0.3, alpha = 0.5) +
annotate("segment",
x = ci$upper[2] + 0.5, xend = 227,
y = max(df_curve$y)*0.50, yend = max(df_curve$y)*0.50,
color = "#4A9FE8", linetype = "solid", linewidth = 0.3, alpha = 0.5) +
annotate("segment",
x = ci$upper[3] + 0.5, xend = 227,
y = max(df_curve$y)*0.75, yend = max(df_curve$y)*0.75,
color = "#67C9A8", linetype = "solid", linewidth = 0.3, alpha = 0.5) +
labs(
title = "Confidence Interval Comparison of Team A & Team B ",
subtitle = "Distribution with Mean = 210, SE = 4",
x = "Task Time (minutes)",
y = "Probability Density"
) +
scale_x_continuous(breaks = seq(195, 235, 5), limits = c(194, 240)) + # widened to fit the labels
scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 16, hjust = 0.5),
plot.subtitle = element_text(size = 12, hjust = 0.5, color = "gray40"),
panel.grid.minor = element_blank(),
panel.grid.major.y = element_line(color = "gray90"),
axis.line.x = element_line(color = "gray30"),
plot.margin = margin(10, 10, 10, 10)
)
# Convert to interactive plotly
ggplotly(p)
library(ggplot2)
library(dplyr)
library(plotly)
# Data Team B (t-distribution)
mean_x <- 210
SE <- 4
ci <- data.frame(
level = c("90%", "95%", "99%"),
lower = c(203.24, 201.88, 199.10),
upper = c(216.76, 218.12, 220.90),
t_value = c(1.690, 2.030, 2.724),
ME = c(6.76, 8.12, 10.90)
)
# Colors
colors <- c(
"99%" = "#E8C5E5",
"95%" = "#FFE5B4",
"90%" = "#B4E7CE"
)
# Create bell curve data (using t-distribution)
df <- 35
x_vals <- seq(mean_x - 4.5*SE, mean_x + 4.5*SE, length.out = 400)
# Using t-distribution with df=35
density_vals <- dt((x_vals - mean_x)/SE, df = df) / SE
df_curve <- data.frame(x = x_vals, y = density_vals)
# Plot with neatly positioned labels
p <- ggplot(df_curve, aes(x, y)) +
# CI ribbons - from the outermost to the innermost so all remain visible
geom_ribbon(data = subset(df_curve, x >= ci$lower[3] & x <= ci$upper[3]),
aes(ymin = 0, ymax = y),
fill = colors["99%"], alpha = 0.6) +
geom_ribbon(data = subset(df_curve, x >= ci$lower[2] & x <= ci$upper[2]),
aes(ymin = 0, ymax = y),
fill = colors["95%"], alpha = 0.6) +
geom_ribbon(data = subset(df_curve, x >= ci$lower[1] & x <= ci$upper[1]),
aes(ymin = 0, ymax = y),
fill = colors["90%"], alpha = 0.6) +
# Base bell curve drawn above the ribbons
geom_line(linewidth = 1.2, color = "gray20") +
# Vertical mean line
geom_vline(xintercept = mean_x, color = "black", linetype = "dashed", linewidth = 0.8) +
# Vertical lines for CI boundaries
geom_vline(xintercept = c(ci$lower[1], ci$upper[1]),
color = "#6DAA8E", linetype = "dotted", alpha = 0.7, linewidth = 0.7) +
geom_vline(xintercept = c(ci$lower[2], ci$upper[2]),
color = "#E6A853", linetype = "dotted", alpha = 0.7, linewidth = 0.7) +
geom_vline(xintercept = c(ci$lower[3], ci$upper[3]),
color = "#C77DBF", linetype = "dotted", alpha = 0.7, linewidth = 0.7) +
# Label mean
annotate("text", x = mean_x, y = max(df_curve$y)*1.05,
label = "xĖ = 210", size = 5, fontface = "bold") +
# CI labels di sisi kanan
annotate("text", x = 228, y = max(df_curve$y)*0.25,
label = "90% CI\n[203.20, 216.80]\nt = 1.697",
color = "#4F8A6B", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
annotate("text", x = 228, y = max(df_curve$y)*0.50,
label = "95% CI\n[201.80, 218.20]\nt = 2.042",
color = "#CC8B2E", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
annotate("text", x = 228, y = max(df_curve$y)*0.75,
label = "99% CI\n[199.00, 221.00]\nt = 2.75",
color = "#A85B9D", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# Connector lines from the CIs to the labels
annotate("segment",
x = ci$upper[1] + 0.5, xend = 227,
y = max(df_curve$y)*0.25, yend = max(df_curve$y)*0.25,
color = "#6DAA8E", linetype = "solid", linewidth = 0.3, alpha = 0.5) +
annotate("segment",
x = ci$upper[2] + 0.5, xend = 227,
y = max(df_curve$y)*0.50, yend = max(df_curve$y)*0.50,
color = "#E6A853", linetype = "solid", linewidth = 0.3, alpha = 0.5) +
annotate("segment",
x = ci$upper[3] + 0.5, xend = 227,
y = max(df_curve$y)*0.75, yend = max(df_curve$y)*0.75,
color = "#C77DBF", linetype = "solid", linewidth = 0.3, alpha = 0.5) +
labs(
title = "Confidence Intervals for Team B",
subtitle = "Sample Mean = 210 min | SE = 4 | df = 35",
x = "Task Time (minutes)",
y = "Probability Density"
) +
scale_x_continuous(breaks = seq(195, 235, 5), limits = c(194, 240)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 16, hjust = 0.5),
plot.subtitle = element_text(size = 11, hjust = 0.5, color = "gray40"),
panel.grid.minor = element_blank(),
panel.grid.major.y = element_line(color = "gray90"),
axis.line.x = element_line(color = "gray30"),
plot.margin = margin(10, 10, 10, 10)
)
# Convert to interactive plotly
ggplotly(p)
Problem 4
4. Explain why the interval widths differ even with similar data.
The width of each confidence interval differs because higher confidence levels require capturing more of the bell curve. Even though the mean and standard error stay the same, a larger confidence level pushes the interval further into the tails of the distribution. As a result, the 90% interval is the narrowest because it only needs to cover the central portion of the curve, while the 95% and especially the 99% intervals become wider to guarantee a higher level of certainty. In addition, Team B's t-based intervals are slightly wider than Team A's z-based intervals at every confidence level, because the t-distribution has heavier tails to account for the extra uncertainty of estimating the standard deviation from the sample. In short, the data do not change; only the amount of "certainty" we require changes, and that directly determines how wide the interval must be.
One-Sided Confidence Interval: A Software as a Service (SaaS) company wants to ensure that at least 70% of weekly active users utilize a premium feature.
From the experiment:
\[ \begin{eqnarray*} n &=& 250 \quad \text{(total users)} \\ x &=& 185 \quad \text{(active premium users)} \end{eqnarray*} \]
Management is only interested in the lower bound of the estimate.
Problem 1
1. Identify the type of Confidence Interval and the appropriate test.
This analysis uses a one-sided (lower-bound) confidence interval for a population proportion because management is only concerned with the minimum plausible proportion of active premium users; the interval therefore focuses solely on estimating the lower limit of the true proportion.
The appropriate statistical method is a Z-based confidence interval for a population proportion. This approach is valid because the sample size is sufficiently large (n = 250) and both \(n\hat{p}\) and \(n(1-\hat{p})\) satisfy the normal approximation conditions, allowing the sampling distribution of the sample proportion to be treated as approximately normal.
Problem 2
2. Compute the one-sided lower confidence interval at the 90%, 95%, and 99% confidence levels:
\[ \begin{eqnarray*} n &=& 250 \quad \text{(total users)} \\ x &=& 185 \quad \text{(active premium users)} \end{eqnarray*} \]
Sample proportion: \[\hat{p} = \frac{x}{n} = \frac{185}{250} = 0.74\]
Standard Error (SE): \[\begin{array}{rl} SE & = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\[1mm] & = \sqrt{\frac{0.74 \times 0.26}{250}} \\[1mm] & \approx 0.028 \end{array}\]
Critical value for the \(90\%\) one-sided CI: \(z_{1-\alpha} = 1.28\)
Margin of Error (ME): \[\begin{array}{rl} ME & = z_{1-\alpha} \cdot SE \\[1mm] & = 1.28 \cdot 0.028 \\[1mm] & \approx 0.036 \end{array}\]
One-Sided Confidence Interval:
Lower One-Sided CI: \[\begin{array}{rl} CI_{lower} & = \hat{p} - ME \\[1mm] & = 0.74 - 0.036 \\[1mm] & \approx 0.704 \end{array}\]
Standard Error (SE): \[\begin{array}{rl} SE & = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\[1mm] & = \sqrt{\frac{0.74 \times 0.26}{250}} \\[1mm] & \approx 0.028 \end{array}\]
Critical value for the \(95\%\) one-sided CI: \(z_{1-\alpha} = 1.645\)
Margin of Error (ME): \[\begin{array}{rl} ME & = z_{1-\alpha} \cdot SE \\[1mm] & = 1.645 \cdot 0.028 \\[1mm] & \approx 0.046 \end{array}\]
One-Sided Confidence Interval:
Lower One-Sided CI: \[\begin{array}{rl} CI_{lower} & = \hat{p} - ME \\[1mm] & = 0.74 - 0.046 \\[1mm] & = 0.694 \end{array}\]
Standard Error (SE): \[\begin{array}{rl} SE & = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\[1mm] & = \sqrt{\frac{0.74 \times 0.26}{250}} \\[1mm] & \approx 0.028 \end{array}\]
Critical value for the \(99\%\) one-sided CI: \(z_{1-\alpha} = 2.33\)
Margin of Error (ME): \[\begin{array}{rl} ME & = z_{1-\alpha} \cdot SE \\[1mm] & = 2.33 \cdot 0.028 \\[1mm] & \approx 0.065 \end{array}\]
One-Sided Confidence Interval:
Lower One-Sided CI: \[\begin{array}{rl} CI_{lower} & = \hat{p} - ME \\[1mm] & = 0.74 - 0.065 \\[1mm] & = 0.675 \end{array}\]
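The lower bounds can be reproduced in R; for a one-sided interval the entire \(\alpha\) sits in one tail, so the critical value is qnorm(conf) rather than qnorm(1 - (1 - conf)/2). A minimal sketch that also checks the 70% target:
# One-sided lower confidence bounds for the proportion of premium users
x     <- 185
n     <- 250
p_hat <- x / n
se    <- sqrt(p_hat * (1 - p_hat) / n)   # approximately 0.028
conf  <- c(0.90, 0.95, 0.99)
z     <- qnorm(conf)                     # 1.282, 1.645, 2.326
data.frame(level = conf,
           lower_bound = round(p_hat - z * se, 3),
           meets_70    = (p_hat - z * se) >= 0.70)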
Problem 3
3. Visualize the lower bounds for all confidence levels.
library(ggplot2)
library(dplyr)
library(plotly)
# Data Proportion CI
p_hat <- 0.74
SE <- 0.028
# One-sided lower confidence intervals
ci <- data.frame(
level = c("90%", "95%", "99%"),
lower = c(0.704, 0.694, 0.675),
upper = rep(p_hat, 3),
z_value = c(1.28, 1.645, 2.33),
ME = c(0.036, 0.046, 0.065)
)
# Colors
colors <- c(
"99%" = "#FFD6E8",
"95%" = "#C9E4DE",
"90%" = "#E8DAF2"
)
# Create normal curve for proportion
x_vals <- seq(p_hat - 4*SE, p_hat + 4*SE, length.out = 400)
density_vals <- dnorm(x_vals, mean = p_hat, sd = SE)
df_curve <- data.frame(x = x_vals, y = density_vals)
# Plot with clean layers
p <- ggplot(df_curve, aes(x, y)) +
# CI ribbons - from the outermost (99%) to the innermost (90%)
geom_ribbon(data = subset(df_curve, x >= ci$lower[3] & x <= ci$upper[3]),
aes(ymin = 0, ymax = y),
fill = colors["99%"], alpha = 0.7) +
geom_ribbon(data = subset(df_curve, x >= ci$lower[2] & x <= ci$upper[2]),
aes(ymin = 0, ymax = y),
fill = colors["95%"], alpha = 0.7) +
geom_ribbon(data = subset(df_curve, x >= ci$lower[1] & x <= ci$upper[1]),
aes(ymin = 0, ymax = y),
fill = colors["90%"], alpha = 0.7) +
# Base bell curve
geom_line(linewidth = 1.2, color = "gray20") +
# Vertical line for p_hat (sample proportion)
geom_vline(xintercept = p_hat, color = "black",
linetype = "dashed", linewidth = 0.9) +
# Vertical lines for CI boundaries (lower bounds)
geom_vline(xintercept = ci$lower[1],
color = "#B896D4", linetype = "dotted", alpha = 0.8, linewidth = 0.7) +
geom_vline(xintercept = ci$lower[2],
color = "#7FB8A8", linetype = "dotted", alpha = 0.8, linewidth = 0.7) +
geom_vline(xintercept = ci$lower[3],
color = "#FFB3CE", linetype = "dotted", alpha = 0.8, linewidth = 0.7) +
# Label p_hat
annotate("text", x = p_hat, y = max(df_curve$y)*1.05,
label = "pĖ = 0.74", size = 5, fontface = "bold") +
# CI labels di sisi kanan
# 90% CI
annotate("text", x = 0.805, y = max(df_curve$y)*0.28,
label = "90% CI (Lower)\n[0.706, â)\nz = 1.28",
color = "#8B5FA8", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# 95% CI
annotate("text", x = 0.805, y = max(df_curve$y)*0.55,
label = "95% CI (Lower)\n[0.694, â)\nz = 1.645",
color = "#4A8B78", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# 99% CI
annotate("text", x = 0.805, y = max(df_curve$y)*0.82,
label = "99% CI (Lower)\n[0.675, â)\nz = 2.33",
color = "#E8699A", size = 3.2, fontface = "bold",
hjust = 0, lineheight = 0.9) +
# Connector lines from the lower bounds to the labels
annotate("segment",
x = ci$lower[1] + 0.002, xend = 0.803,
y = max(df_curve$y)*0.28, yend = max(df_curve$y)*0.28,
color = "#B896D4", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
annotate("segment",
x = ci$lower[2] + 0.002, xend = 0.803,
y = max(df_curve$y)*0.55, yend = max(df_curve$y)*0.55,
color = "#7FB8A8", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
annotate("segment",
x = ci$lower[3] + 0.002, xend = 0.803,
y = max(df_curve$y)*0.82, yend = max(df_curve$y)*0.82,
color = "#FFB3CE", linetype = "solid", linewidth = 0.3, alpha = 0.6) +
labs(
title = "One-Sided Lower Confidence Intervals for Population Proportion",
subtitle = "Sample proportion pĖ = 0.74 | n = 250 | SE = 0.028",
x = "Proportion (p)",
y = "Probability Density"
) +
scale_x_continuous(breaks = seq(0.64, 0.84, 0.02),
limits = c(0.63, 0.87)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(face = "bold", size = 15, hjust = 0.5),
plot.subtitle = element_text(size = 10.5, hjust = 0.5, color = "gray40"),
panel.grid.minor = element_blank(),
panel.grid.major.y = element_line(color = "gray90"),
axis.line.x = element_line(color = "gray30"),
plot.margin = margin(10, 10, 10, 10)
)
# Convert to interactive plotly
ggplotly(p)
Problem 4
4. Determine whether the 70% target is statistically satisfied.
The 70% target is statistically met only at the 90% confidence level, because the lower bound (0.704) remains above the target. However, at higher confidence levels (95% and 99%), the lower bounds fall below 70%, which means the data does not provide enough evidence to confidently claim the target is reached.