Week 13 Assignment ~ Confidence Interval Case Studies
CHELSEA TESALONIKA PATRICIA HUTAJULU
DATA SCIENCE UNDERGRADUATE STUDENT AT INSTITUT TEKNOLOGI SAINS BANDUNG
Case Study 1
Confidence Interval for Mean, \(\sigma\) Known: An e-commerce platform wants to estimate the average number of daily transactions per user after launching a new feature. Based on large-scale historical data, the population standard deviation is known.
\[ \begin{eqnarray*} \sigma &=& 3.2 \quad \text{(population standard deviation)} \\ n &=& 100 \quad \text{(sample size)} \\ \bar{x} &=& 12.6 \quad \text{(sample mean)} \end{eqnarray*} \]
Tasks
- Identify the appropriate statistical test and justify your choice.
- Compute the Confidence Intervals for:
- \(90\%\)
- \(95\%\)
- \(99\%\)
- Create a comparison visualization of the three confidence intervals.
- Interpret the results in a business analytics context.
Identify
The appropriate method is the Z-test, or equivalently the Z confidence interval. It applies because the population standard deviation is known (\(\sigma = 3.2\)) and the sample size (\(n = 100\)) is large (\(n \ge 30\)). By the Central Limit Theorem, the sampling distribution of the mean is approximately normal, so critical values can be taken from the standard normal distribution.
Confidence Interval
- Determine the significance level from the confidence level \(1 - \alpha\):
\[ \alpha = 1 - \text{confidence level} \]
- Determine the critical value:
\[ Z_{\alpha/2} \]
- Determine the margin of error:
\[ E = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]
- Determine the confidence interval:
\[ CI = \bar{x} \pm E \]
1. 90% = 0.9
Given:
\[ n = 100,\quad \bar{x} = 12{.}6,\quad \sigma = 3{.}2 \]
- Determine the critical value Z
\[ \alpha = 1 - 0{.}9 = 0{.}1 \]
\[ \frac{\alpha}{2} = \frac{0{.}1}{2} = 0{.}05 \]
\[ Z_{0{.}05} = 1{.}645 \]
- Margin of Error
\[ E = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]
\[ E = 1{.}645 \times \frac{3{.}2}{\sqrt{100}} \]
\[ E = 0{.}5264 \]
- Confidence Interval
\[ CI = \bar{x} \pm E \]
\[ CI = 12{.}6 \pm 0{.}5264 \]
\[ CI = (12{.}07,\; 13{.}13) \]
2. 95% = 0.95
- Determine the critical value Z
\[ \alpha = 1 - 0{.}95 = 0{.}05 \]
\[ \frac{\alpha}{2} = \frac{0{.}05}{2} = 0{.}025 \]
\[ Z_{0{.}025} = 1{.}96 \]
- Margin of Error
\[ E = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]
\[ E = 1{.}96 \times \frac{3{.}2}{\sqrt{100}} \]
\[ E = 0{.}6272 \]
- Confidence Interval
\[ CI = \bar{x} \pm E \]
\[ CI = 12{.}6 \pm 0{.}6272 \]
\[ CI = (11{.}97,\; 13{.}23) \]
3. 99% = 0.99
- Determine the critical value Z
\[ \alpha = 1 - 0{.}99 = 0{.}01 \]
\[ \frac{\alpha}{2} = \frac{0{.}01}{2} = 0{.}005 \]
\[ Z_{0{.}005} = 2{.}57 \]
- Margin of Error
\[ E = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]
\[ E = 2{.}57 \times \frac{3{.}2}{\sqrt{100}} \]
\[ E = 0{.}8224 \]
- Confidence Interval
\[ CI = \bar{x} \pm E \]
\[ CI = 12{.}6 \pm 0{.}8224 \]
\[ CI = (11{.}78,\; 13{.}42) \]
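The hand calculations above can be cross-checked in R. The following is a minimal sketch, assuming the critical values come from `qnorm()` rather than a rounded table, so the bounds may differ from the hand-computed ones in the last decimal place. The helper name `z_ci` is hypothetical and is not part of the original assignment.

```r
# Hypothetical helper (not from the original assignment):
# z-based confidence interval for a mean when sigma is known.
z_ci <- function(xbar, sigma, n, level) {
  alpha <- 1 - level
  z <- qnorm(1 - alpha / 2)   # critical value Z_{alpha/2}
  e <- z * sigma / sqrt(n)    # margin of error E
  c(lower = xbar - e, upper = xbar + e)
}

conf_levels <- c(0.90, 0.95, 0.99)

# One row per confidence level, columns "lower" and "upper"
ci_case1 <- t(sapply(conf_levels, function(l) z_ci(12.6, 3.2, 100, l)))
rownames(ci_case1) <- paste0(conf_levels * 100, "%")
print(round(ci_case1, 2))
```

Computing the bounds this way avoids copying rounded table values by hand, which is where small discrepancies in the last digit tend to come from.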
Visualization
library(ggplot2)
# Data
mean_val <- 12.6
sigma <- 3.2
n <- 100
se <- sigma / sqrt(n) # Standard Error = 0.32
# Confidence Intervals
ci_90 <- c(lower = 12.07, upper = 13.13)
ci_95 <- c(lower = 11.97, upper = 13.23)
ci_99 <- c(lower = 11.78, upper = 13.42)
# Data frame for plotting
x_range <- seq(11.5, 13.7, length.out = 500)
y_norm <- dnorm(x_range, mean = mean_val, sd = se)
df_curve <- data.frame(x = x_range, y = y_norm)
# Blue color theme
color_99 <- "#1e3a8a" # Dark blue
color_95 <- "#3b82f6" # Medium blue
color_90 <- "#60a5fa" # Light blue
color_mean <- "#0ea5e9" # Cyan blue
# Plot
p <- ggplot() +
# Shaded areas (outermost to innermost), no legend yet
geom_ribbon(data = df_curve,
aes(x = x, ymin = 0,
ymax = ifelse(x >= ci_99["lower"] & x <= ci_99["upper"], y, 0)),
fill = color_99, alpha = 0.2) +
geom_ribbon(data = df_curve,
aes(x = x, ymin = 0,
ymax = ifelse(x >= ci_95["lower"] & x <= ci_95["upper"], y, 0)),
fill = color_95, alpha = 0.3) +
geom_ribbon(data = df_curve,
aes(x = x, ymin = 0,
ymax = ifelse(x >= ci_90["lower"] & x <= ci_90["upper"], y, 0)),
fill = color_90, alpha = 0.4) +
# Bell curve
geom_line(data = df_curve, aes(x = x, y = y),
color = "#1e293b", linewidth = 1.2) +
# Vertical lines for CI bounds
# 99% CI
geom_vline(xintercept = ci_99["lower"],
linetype = "dashed", color = color_99, linewidth = 1) +
geom_vline(xintercept = ci_99["upper"],
linetype = "dashed", color = color_99, linewidth = 1) +
# 95% CI
geom_vline(xintercept = ci_95["lower"],
linetype = "dashed", color = color_95, linewidth = 1) +
geom_vline(xintercept = ci_95["upper"],
linetype = "dashed", color = color_95, linewidth = 1) +
# 90% CI
geom_vline(xintercept = ci_90["lower"],
linetype = "dashed", color = color_90, linewidth = 1) +
geom_vline(xintercept = ci_90["upper"],
linetype = "dashed", color = color_90, linewidth = 1) +
# Mean line
geom_vline(xintercept = mean_val,
color = color_mean, linewidth = 1.5) +
geom_point(aes(x = mean_val, y = 0),
color = color_mean, size = 4) +
# Labels
annotate("text", x = mean_val, y = max(y_norm) * 1.1,
label = paste0("x̄ = ", mean_val),
color = color_mean, fontface = "bold", size = 5) +
annotate("text", x = mean_val, y = max(y_norm) * 0.85,
label = "99%", color = color_99, fontface = "bold", size = 4) +
annotate("text", x = mean_val, y = max(y_norm) * 0.65,
label = "95%", color = color_95, fontface = "bold", size = 4) +
annotate("text", x = mean_val, y = max(y_norm) * 0.45,
label = "90%", color = color_90, fontface = "bold", size = 4) +
# Dummy geom for the legend (invisible points)
geom_point(aes(x = 14, y = 0, color = "99% CI [11.78, 13.42]"),
size = 0, alpha = 0) +
geom_point(aes(x = 14, y = 0, color = "95% CI [11.97, 13.23]"),
size = 0, alpha = 0) +
geom_point(aes(x = 14, y = 0, color = "90% CI [12.07, 13.13]"),
size = 0, alpha = 0) +
# Color scale for the legend
scale_color_manual(
name = "Confidence Intervals",
values = c("99% CI [11.78, 13.42]" = color_99,
"95% CI [11.97, 13.23]" = color_95,
"90% CI [12.07, 13.13]" = color_90),
guide = guide_legend(
override.aes = list(
size = 8,
shape = 15, # Square shape
alpha = 0.6
)
)
) +
# Themes and labels
labs(
title = "Confidence Intervals with Normal Curve",
subtitle = paste0("σ = ", sigma, ", n = ", n, ", x̄ = ", mean_val, ", SE = ", round(se, 2)),
x = "Value",
y = "Density"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 18, hjust = 0.5, color = "#1e293b"),
plot.subtitle = element_text(hjust = 0.5, color = "#475569"),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "#f8fafc", color = NA),
plot.background = element_rect(fill = "#f1f5f9", color = NA),
legend.position = "bottom",
legend.title = element_text(face = "bold", size = 12, color = "#1e293b"),
legend.text = element_text(size = 10, color = "#475569"),
legend.background = element_rect(fill = "white", color = "#cbd5e1", linewidth = 0.5),
legend.key = element_rect(fill = "white", color = NA),
legend.key.size = unit(1, "cm"),
legend.margin = margin(10, 10, 10, 10),
legend.box.background = element_rect(fill = "white", color = "#cbd5e1")
) +
scale_x_continuous(breaks = seq(11.5, 13.5, by = 0.25)) +
coord_cartesian(xlim = c(11.5, 13.7), ylim = c(0, max(y_norm) * 1.15))
# Show the plot
print(p)
Interpretation
The 90%, 95%, and 99% confidence intervals all place the average number of daily transactions per user after the new feature launch at roughly 12 to 13. The differences between confidence levels illustrate the trade-off between confidence and precision. In a business analytics context, a confidence interval gives a realistic range of performance rather than a single average. For example, at the 95% confidence level, management can be reasonably confident that the average number of daily transactions per user lies between 11.97 and 13.23. This range can serve as a performance baseline for the new feature.
From a decision-making perspective:
The 90% CI suits short-term operational decisions or quick experiments, because its range is narrower and more precise.
The 95% CI is the most relevant for general business performance evaluation, such as judging whether the new feature should be kept or improved.
The 99% CI fits high-risk strategic decisions, such as large investments or system scaling, because it gives the highest confidence at the cost of a wider range.
Case Study 2
Confidence Interval for Mean, \(\sigma\) Unknown: A UX Research team analyzes task completion time (in minutes) for a new mobile application. The data are collected from 12 users:
\[ 8.4,\; 7.9,\; 9.1,\; 8.7,\; 8.2,\; 9.0,\; 7.8,\; 8.5,\; 8.9,\; 8.1,\; 8.6,\; 8.3 \]
Tasks:
- Identify the appropriate statistical test and explain why.
- Compute the Confidence Intervals for:
- \(90\%\)
- \(95\%\)
- \(99\%\)
- Visualize the three intervals on a single plot.
- Explain how sample size and confidence level influence the interval width.
Identify
The appropriate method is the t-test (t confidence interval), because the population standard deviation (\(\sigma\)) is unknown and only the sample standard deviation (\(s\)) is available, computed from a small sample (\(n=12\)) drawn from a population assumed to be normally distributed (as is implicit in the problem setting). If \(\sigma\) were known, or the sample were large (\(n>30\)), the Z-test would be used instead.
Confidence Interval
1. 90% = 0.90
The sample statistics are:
\[ n = 12,\quad \bar{x} = 8.458,\quad s = 0.421 \]
Degrees of freedom:
\[ df = n - 1 = 11 \]
- Determine the critical value t
\[ \alpha = 1 - 0.90 = 0.10 \]
\[ \frac{\alpha}{2} = 0.05 \]
\[ t_{0.95,\,11} = 1.796 \]
- Standard Error
\[ SE = \frac{s}{\sqrt{n}} = \frac{0.421}{\sqrt{12}} \approx 0.1215 \]
- Margin of Error
\[ ME = t_{\alpha/2,\,df} \times SE \]
\[ ME = 1.796 \times 0.1215 \approx 0.218 \]
- Confidence Interval
\[ CI = \bar{x} \pm ME \]
\[ CI = 8.458 \pm 0.218 \]
\[ CI_{90\%} = (8.240,\; 8.677) \]
2. 95% = 0.95
- Determine the critical value t
\[ \alpha = 1 - 0.95 = 0.05 \]
\[ \frac{\alpha}{2} = 0.025 \]
\[ t_{0.975,\,11} = 2.201 \]
- Standard Error
\[ SE = \frac{0.421}{\sqrt{12}} \approx 0.1215 \]
- Margin of Error
\[ ME = 2.201 \times 0.1215 \approx 0.267 \]
- Confidence Interval
\[ CI = 8.458 \pm 0.267 \]
\[ CI_{95\%} = (8.191,\; 8.726) \]
3. 99% = 0.99
- Determine the critical value t
\[ \alpha = 1 - 0.99 = 0.01 \]
\[ \frac{\alpha}{2} = 0.005 \]
\[ t_{0.995,\,11} = 3.106 \]
- Standard Error
\[ SE = \frac{0.421}{\sqrt{12}} \approx 0.1215 \]
- Margin of Error
\[ ME = 3.106 \times 0.1215 \approx 0.377 \]
- Confidence Interval
\[ CI = 8.458 \pm 0.377 \]
\[ CI_{99\%} = (8.081,\; 8.836) \]
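The three t intervals above can also be computed directly from the raw data. This is a minimal sketch, assuming the 12 observations listed in the case study; the variable names (`times`, `ci_case2`) are illustrative, and results may differ from the hand calculation in the last decimal place because no intermediate rounding is applied.

```r
# Task completion times (minutes) from the case study
times <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)

n    <- length(times)
xbar <- mean(times)
se   <- sd(times) / sqrt(n)   # sd() uses the n - 1 denominator

conf_levels <- c(0.90, 0.95, 0.99)
t_crit <- qt(1 - (1 - conf_levels) / 2, df = n - 1)

ci_case2 <- data.frame(
  level = conf_levels,
  lower = xbar - t_crit * se,
  upper = xbar + t_crit * se
)
print(round(ci_case2, 3))

# Cross-check one level against the built-in t.test()
print(t.test(times, conf.level = 0.95)$conf.int)
```

The built-in `t.test()` reports the same interval as the manual formula, so it is a convenient sanity check for one level at a time.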
Visualization
# Load the required library
library(ggplot2)
# Sample data: task completion times (minutes) for the 12 users
times <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)
n <- length(times)
mean_val <- mean(times)     # sample mean, about 8.458
se <- sd(times) / sqrt(n)   # standard error s / sqrt(n), about 0.1215
df_t <- n - 1               # degrees of freedom = 11
# Confidence intervals from the t distribution
ci_data <- data.frame(
  level = c("90%", "95%", "99%"),
  t_crit = qt(c(0.95, 0.975, 0.995), df = df_t),
  color = c("#60a5fa", "#2563eb", "#1e3a8a"),
  stringsAsFactors = FALSE
)
ci_data$lower <- mean_val - ci_data$t_crit * se
ci_data$upper <- mean_val + ci_data$t_crit * se
ci_data$label <- sprintf("%s CI [%.3f, %.3f]", ci_data$level, ci_data$lower, ci_data$upper)
# Scaled t density centered on the sample mean
x_vals <- seq(7.9, 9.0, length.out = 500)
df_curve <- data.frame(x = x_vals,
                       y = dt((x_vals - mean_val) / se, df = df_t) / se)
# Portion of the curve that lies inside interval i
inside <- function(i) {
  df_curve[df_curve$x >= ci_data$lower[i] & df_curve$x <= ci_data$upper[i], ]
}
# Create plot
ggplot() +
  # Shaded areas, outermost (99%) to innermost (90%)
  geom_area(data = inside(3), aes(x = x, y = y), fill = ci_data$color[3], alpha = 0.2) +
  geom_area(data = inside(2), aes(x = x, y = y), fill = ci_data$color[2], alpha = 0.3) +
  geom_area(data = inside(1), aes(x = x, y = y), fill = ci_data$color[1], alpha = 0.4) +
  # t-distribution curve
  geom_line(data = df_curve, aes(x = x, y = y), color = "#0f172a", linewidth = 1.5) +
  # Dashed vertical lines at the CI bounds, colored by level
  geom_vline(data = ci_data, aes(xintercept = lower, color = label),
             linetype = "dashed", linewidth = 1) +
  geom_vline(data = ci_data, aes(xintercept = upper, color = label),
             linetype = "dashed", linewidth = 1) +
  # Solid vertical line and label for the sample mean
  geom_vline(xintercept = mean_val, color = "#dc2626", linewidth = 1.5) +
  annotate("text", x = mean_val, y = max(df_curve$y) * 1.05,
           label = paste0("x̄ = ", round(mean_val, 3)),
           size = 5, fontface = "bold", color = "#dc2626") +
  scale_color_manual(name = "Confidence Intervals",
                     values = setNames(ci_data$color, ci_data$label)) +
  # Labels and theme
  labs(title = "t Distribution with Confidence Intervals",
       subtitle = sprintf("n = %d, s = %.3f, x̄ = %.3f, SE = %.3f", n, sd(times), mean_val, se),
       x = "Task completion time (minutes)",
       y = "Probability Density") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 18, face = "bold", color = "#0f172a", hjust = 0.5),
    plot.subtitle = element_text(size = 12, color = "#475569", hjust = 0.5),
    panel.grid.minor = element_blank(),
    legend.position = "bottom",
    plot.background = element_rect(fill = "#f8fafc", color = NA),
    panel.background = element_rect(fill = "white", color = NA)
  )
Interpretation
- Effect of Sample Size (n)
Sample size is inversely related to the width of the confidence interval. The larger the sample, the smaller the standard error \(s/\sqrt{n}\), so the margin of error shrinks and the interval narrows. Conversely, with a small sample such as \(n = 12\), the standard error is relatively large and the interval is wider. An estimate of the mean from a small sample therefore carries more uncertainty.
- Effect of Confidence Level
Confidence level is directly related to the width of the confidence interval. The higher the confidence level, the larger the critical value \(t_{1-\alpha/2}\), so the margin of error grows and the interval widens. This is visible in the computed 90%, 95%, and 99% intervals: the 90% interval is the narrowest and the 99% interval is the widest.
In conclusion, a larger sample produces a narrower interval and a more precise estimate, while a higher confidence level produces a wider, more conservative interval. There is therefore a trade-off between the precision of the estimate and the confidence placed in the interval.
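Both effects can be seen numerically. The sketch below is illustrative only: it holds the sample standard deviation fixed at the Case Study 2 value (\(s = 0.421\)) and uses hypothetical sample sizes of 12, 30, and 100, which are assumptions and not part of the original data.

```r
s <- 0.421                    # sample standard deviation from Case Study 2
sizes <- c(12, 30, 100)       # hypothetical sample sizes, for illustration
conf_levels <- c(0.90, 0.95, 0.99)

# Interval width = 2 * t_{alpha/2, n-1} * s / sqrt(n)
width <- outer(sizes, conf_levels, function(n, level) {
  2 * qt(1 - (1 - level) / 2, df = n - 1) * s / sqrt(n)
})
dimnames(width) <- list(paste0("n=", sizes), paste0(conf_levels * 100, "%"))
print(round(width, 3))
```

Reading down a column shows the width shrinking as \(n\) grows; reading across a row shows it growing with the confidence level.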
Case Study 3
Confidence Interval for a Proportion, A/B Testing: A data science team runs an A/B test on a new Call-To-Action (CTA) button design. The experiment yields:
\[ \begin{eqnarray*} n &=& 400 \quad \text{(total users)} \\ x &=& 156 \quad \text{(users who clicked the CTA)} \end{eqnarray*} \]
Tasks:
- Compute the sample proportion \(\hat{p}\).
- Compute Confidence Intervals for the proportion at:
- \(90\%\)
- \(95\%\)
- \(99\%\)
- Visualize and compare the three intervals.
- Explain how confidence level affects decision-making in product experiments.
Compute the sample proportion (p̂)
Given:
- \(n = 400\)
- \(x = 156\)
\[ \hat{p} = \frac{x}{n} = \frac{156}{400} = 0.39 \]
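Confidence Interval
The standard error of the sample proportion is
\[ SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.39 \times 0.61}{400}} \approx 0.0244 \]
and the margin of error at each level is \(ME = z_{\alpha/2} \times SE\).
1. 90%
Critical value \(z = 1.645\)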
\[ ME = 1.645 \times 0.0244 \approx 0.0401 \]
\[ CI_{90\%} = (0.39 - 0.0401,\ 0.39 + 0.0401) \]
\[ (0.3499,\ 0.4301) \]
2. 95%
Critical value \(z = 1.96\)
\[ ME = 1.96 \times 0.0244 \approx 0.0478 \]
\[ CI_{95\%} = (0.3422,\ 0.4378) \]
3. 99%
Critical value \(z = 2.576\)
\[ ME = 2.576 \times 0.0244 \approx 0.0629 \]
\[ CI_{99\%} = (0.3271,\ 0.4529) \]
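The intervals above use the normal approximation (often called the Wald interval) for a proportion. A minimal R sketch of the same calculation follows; the variable names are illustrative, and because the standard error is not rounded to 0.0244 here, the bounds may differ from the hand-computed ones in the fourth decimal place.

```r
x <- 156                      # users who clicked the CTA
n <- 400                      # total users
p_hat <- x / n                # sample proportion

# Standard error of the sample proportion (normal approximation)
se <- sqrt(p_hat * (1 - p_hat) / n)

conf_levels <- c(0.90, 0.95, 0.99)
z <- qnorm(1 - (1 - conf_levels) / 2)

ci_case3 <- data.frame(
  level = conf_levels,
  lower = p_hat - z * se,
  upper = p_hat + z * se
)
print(round(ci_case3, 4))
```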
Visualization
library(ggplot2)
# Data
p_hat <- 0.39 # Sample proportion
se <- 0.0244 # Standard error
# Confidence Intervals
ci_90 <- list(
lower = 0.3499,
upper = 0.4301,
z = 1.645,
me = 0.0401,
level = "90%"
)
ci_95 <- list(
lower = 0.3422,
upper = 0.4378,
z = 1.96,
me = 0.0478,
level = "95%"
)
ci_99 <- list(
lower = 0.3271,
upper = 0.4529,
z = 2.576,
me = 0.0629,
level = "99%"
)
# Data frame for plotting
x_range <- seq(0.30, 0.48, length.out = 500)
y_norm <- dnorm(x_range, mean = p_hat, sd = se)
df_curve <- data.frame(x = x_range, y = y_norm)
# Data frames for the shaded areas
df_ci90 <- data.frame(
x = seq(ci_90$lower, ci_90$upper, length.out = 200)
)
df_ci90$y <- dnorm(df_ci90$x, mean = p_hat, sd = se)
df_ci95 <- data.frame(
x = seq(ci_95$lower, ci_95$upper, length.out = 200)
)
df_ci95$y <- dnorm(df_ci95$x, mean = p_hat, sd = se)
df_ci99 <- data.frame(
x = seq(ci_99$lower, ci_99$upper, length.out = 200)
)
df_ci99$y <- dnorm(df_ci99$x, mean = p_hat, sd = se)
# Blue color theme
color_99 <- "#1e3a8a" # Dark blue
color_95 <- "#3b82f6" # Medium blue
color_90 <- "#60a5fa" # Light blue
color_mean <- "#0ea5e9" # Cyan blue
# Plot
p <- ggplot() +
# Shaded areas (outermost to innermost)
geom_area(data = df_ci99, aes(x = x, y = y),
fill = color_99, alpha = 0.2) +
geom_area(data = df_ci95, aes(x = x, y = y),
fill = color_95, alpha = 0.3) +
geom_area(data = df_ci90, aes(x = x, y = y),
fill = color_90, alpha = 0.4) +
# Bell curve
geom_line(data = df_curve, aes(x = x, y = y),
color = "#1e293b", linewidth = 1.2) +
# Vertical lines for CI bounds
# 99% CI
geom_vline(xintercept = ci_99$lower,
linetype = "dashed", color = color_99, linewidth = 1) +
geom_vline(xintercept = ci_99$upper,
linetype = "dashed", color = color_99, linewidth = 1) +
# 95% CI
geom_vline(xintercept = ci_95$lower,
linetype = "dashed", color = color_95, linewidth = 1) +
geom_vline(xintercept = ci_95$upper,
linetype = "dashed", color = color_95, linewidth = 1) +
# 90% CI
geom_vline(xintercept = ci_90$lower,
linetype = "dashed", color = color_90, linewidth = 1) +
geom_vline(xintercept = ci_90$upper,
linetype = "dashed", color = color_90, linewidth = 1) +
# Mean line (sample proportion)
geom_vline(xintercept = p_hat,
color = color_mean, linewidth = 1.5) +
geom_point(aes(x = p_hat, y = 0),
color = color_mean, size = 4) +
# Labels
annotate("text", x = p_hat, y = max(y_norm) * 1.1,
label = paste0("p̂ = ", p_hat),
color = color_mean, fontface = "bold", size = 5) +
annotate("text", x = mean(c(ci_99$lower, ci_99$upper)),
y = max(y_norm) * 0.9,
label = "99%", color = color_99, fontface = "bold", size = 4.5) +
annotate("text", x = mean(c(ci_95$lower, ci_95$upper)),
y = max(y_norm) * 0.7,
label = "95%", color = color_95, fontface = "bold", size = 4.5) +
annotate("text", x = mean(c(ci_90$lower, ci_90$upper)),
y = max(y_norm) * 0.5,
label = "90%", color = color_90, fontface = "bold", size = 4.5) +
# Themes and labels
labs(
title = "Confidence Intervals for a Proportion with Normal Curve",
subtitle = paste0("p̂ = ", p_hat, ", SE = ", se),
x = "Proportion",
y = "Density"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 18, hjust = 0.5, color = "#1e293b"),
plot.subtitle = element_text(hjust = 0.5, color = "#475569"),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "#f8fafc", color = NA),
plot.background = element_rect(fill = "#f1f5f9", color = NA)
) +
scale_x_continuous(breaks = seq(0.30, 0.48, by = 0.02),
labels = sprintf("%.2f", seq(0.30, 0.48, by = 0.02)))
# Show the plot
print(p)
Conclusion
In product experiments such as A/B testing, the confidence level determines how sure the team must be before making a decision. A lower confidence level, such as 90%, yields a narrower confidence interval, so decisions can be made faster, but the risk is higher because the chance of error remains fairly large. Conversely, a higher confidence level, such as 95% or 99%, yields a wider confidence interval, so the team must be more careful and needs stronger evidence before concluding that a change is truly effective.
In general, the 95% confidence level is widely used because it balances speed and confidence, while 99% is usually chosen for large, high-risk decisions. In other words, the higher the confidence level, the more conservative and cautious the decisions made in product experiments.
Case Study 4
Precision Comparison (Z-Test vs t-Test): Two data teams measure API latency (in milliseconds) under different conditions.
\[\begin{eqnarray*} \text{Team A:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ \sigma &=& 24 \quad \text{(known population standard deviation)} \\[6pt] \text{Team B:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ s &=& 24 \quad \text{(sample standard deviation)} \end{eqnarray*}\]
Tasks
- Identify the statistical test used by each team.
- Compute Confidence Intervals for 90%, 95%, and 99%.
- Create a visualization comparing all intervals.
- Explain why the interval widths differ, even with similar data.
Identify
Team A
Team A uses the Z-test in its API latency analysis, because the population standard deviation (σ) is known to be 24 milliseconds. The sample is also reasonably large (n = 36), so the normal approximation for the sample mean holds well. With σ known, the Z-test is the most suitable method because it yields a confidence interval with less uncertainty.
Team B
Team B uses the t-test because the population standard deviation is unknown and only the sample standard deviation is available (s = 24). Although the sample size is the same as Team A's (n = 36), not knowing the population variation adds uncertainty to the calculation. The t-test is used because it is designed for the situation where the population standard deviation must be estimated from the sample.
Confidence Interval
TEAM A
Given:
\[ n = 36,\quad \bar{x} = 210,\quad \sigma = 24 \]
1. 90% = 0.9
- Determine the critical value Z
\[ \alpha = 1 - 0{.}9 = 0{.}1 \]
\[ \frac{\alpha}{2} = \frac{0{.}1}{2} = 0{.}05 \]
\[ Z_{0{.}05} = 1{.}64 \]
- Margin of Error
\[ E = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]
\[ E = 1{.}64 \times \frac{24}{\sqrt{36}} \]
\[ E = 6{.}56 \]
- Confidence Interval
\[ CI = \bar{x} \pm E \]
\[ CI = 210 \pm 6{.}56 \]
\[ CI = (203{.}44,\; 216{.}56) \]
2. 95% = 0.95
- Determine the critical value Z
\[ \alpha = 1 - 0{.}95 = 0{.}05 \]
\[ \frac{\alpha}{2} = \frac{0{.}05}{2} = 0{.}025 \]
\[ Z_{0{.}025} = 1{.}96 \]
- Margin of Error
\[ E = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]
\[ E = 1{.}96 \times \frac{24}{\sqrt{36}} \]
\[ E = 7{.}84 \]
- Confidence Interval
\[ CI = \bar{x} \pm E \]
\[ CI = 210 \pm 7{.}84 \]
\[ CI = (202{.}16,\; 217{.}84) \]
3. 99% = 0.99
- Determine the critical value Z
\[ \alpha = 1 - 0{.}99 = 0{.}01 \]
\[ \frac{\alpha}{2} = \frac{0{.}01}{2} = 0{.}005 \]
\[ Z_{0{.}005} = 2{.}57 \]
- Margin of Error
\[ E = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \]
\[ E = 2{.}57 \times \frac{24}{\sqrt{36}} \]
\[ E = 10{.}28 \]
- Confidence Interval
\[ CI = \bar{x} \pm E \]
\[ CI = 210 \pm 10{.}28 \]
\[ CI = (199{.}72,\; 220{.}28) \]
TEAM B
The sample statistics are:
\[ n = 36,\quad \bar{x} = 210,\quad s = 24 \]
Degrees of freedom:
\[ df = n - 1 = 35 \]
1. 90%
- Determine the critical value t
\[ \alpha = 1 - 0.90 = 0.10 \]
\[ \frac{\alpha}{2} = 0.05 \]
\[ t_{0.95,\,35} = 1.690 \]
- Standard Error
\[ SE = \frac{s}{\sqrt{n}} = \frac{24}{\sqrt{36}} = 4 \]
- Margin of Error
\[ ME = t_{\alpha/2,\,df} \times SE \]
\[ ME = 1.6896 \times 4 \approx 6.758 \quad \text{(using the unrounded } t = 1.6896\text{)} \]
- Confidence Interval
\[ CI = \bar{x} \pm ME \]
\[ CI = 210 \pm 6.758 \]
\[ CI_{90\%} = (203.242,\; 216.758) \]
2. 95%
- Determine the critical value t
\[ \alpha = 1 - 0.95 = 0.05 \]
\[ \frac{\alpha}{2} = 0.025 \]
\[ t_{0.975,\,35} = 2.03 \]
- Standard Error
\[ SE = \frac{24}{\sqrt{36}} = 4 \]
- Margin of Error
\[ ME = 2.03 \times 4 = 8.12 \]
- Confidence Interval
\[ CI = 210 \pm 8.12 \]
\[ CI_{95\%} = (201.88,\; 218.12) \]
3. 99%
- Determine the critical value t
\[ \alpha = 1 - 0.99 = 0.01 \]
\[ \frac{\alpha}{2} = 0.005 \]
\[ t_{0.995,\,35} = 2.724 \]
- Standard Error
\[ SE = \frac{24}{\sqrt{36}} = 4 \]
- Margin of Error
\[ ME = 2.7238 \times 4 \approx 10.895 \quad \text{(using the unrounded } t = 2.7238\text{)} \]
- Confidence Interval
\[ CI = 210 \pm 10.895 \]
\[ CI_{99\%} = (199.105,\; 220.895) \]
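Because both teams share the same \(n\), \(\bar{x}\), and standard deviation value, the only difference between their intervals is the critical value. The sketch below compares the two margins of error side by side; it assumes exact quantiles from `qnorm()` and `qt()` instead of rounded table values, so the Z margins differ slightly from the hand calculation above. The name `cmp` is illustrative.

```r
n <- 36
sd_val <- 24                  # sigma for Team A, s for Team B
se <- sd_val / sqrt(n)        # standard error = 4 for both teams

conf_levels <- c(0.90, 0.95, 0.99)
p_upper <- 1 - (1 - conf_levels) / 2

cmp <- data.frame(
  level    = conf_levels,
  z_margin = qnorm(p_upper) * se,           # Team A (sigma known)
  t_margin = qt(p_upper, df = n - 1) * se   # Team B (sigma estimated by s)
)
cmp$ratio <- cmp$t_margin / cmp$z_margin
print(round(cmp, 3))
```

The ratio column is above 1 at every level, which reflects the heavier tails of the t distribution with 35 degrees of freedom.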
Visualization
TEAM A
# Load the required library
library(ggplot2)
# Parameter
mean_val <- 210
sd_val <- 24 / sqrt(36) # Standard error = σ/√n = 24/6 = 4
n <- 36
# Data Confidence Intervals
ci_data <- data.frame(
level = c("90%", "95%", "99%"),
lower = c(203.44, 202.16, 199.72),
upper = c(216.56, 217.84, 220.28),
z_score = c(1.64, 1.96, 2.57),
color = c("#60a5fa", "#2563eb", "#1e3a8a")
)
# Build the data for the normal curve
x_vals <- seq(195, 225, length.out = 500)
y_vals <- dnorm(x_vals, mean = mean_val, sd = sd_val)
# Create plot
ggplot() +
# Area for the 99% CI (outermost)
geom_area(data = data.frame(x = x_vals[x_vals >= 199.72 & x_vals <= 220.28],
y = dnorm(x_vals[x_vals >= 199.72 & x_vals <= 220.28],
mean = mean_val, sd = sd_val)),
aes(x = x, y = y), fill = "#1e3a8a", alpha = 0.2) +
# Area for the 95% CI
geom_area(data = data.frame(x = x_vals[x_vals >= 202.16 & x_vals <= 217.84],
y = dnorm(x_vals[x_vals >= 202.16 & x_vals <= 217.84],
mean = mean_val, sd = sd_val)),
aes(x = x, y = y), fill = "#2563eb", alpha = 0.3) +
# Area for the 90% CI (innermost)
geom_area(data = data.frame(x = x_vals[x_vals >= 203.44 & x_vals <= 216.56],
y = dnorm(x_vals[x_vals >= 203.44 & x_vals <= 216.56],
mean = mean_val, sd = sd_val)),
aes(x = x, y = y), fill = "#60a5fa", alpha = 0.4) +
# Main bell curve
geom_line(data = data.frame(x = x_vals, y = y_vals),
aes(x = x, y = y), color = "#0f172a", linewidth = 1.5) +
# Vertical lines for the 99% CI bounds
geom_vline(xintercept = c(199.72, 220.28),
color = "#1e3a8a", linetype = "dashed", linewidth = 1) +
# Vertical lines for the 95% CI bounds
geom_vline(xintercept = c(202.16, 217.84),
color = "#2563eb", linetype = "dashed", linewidth = 1) +
# Vertical lines for the 90% CI bounds
geom_vline(xintercept = c(203.44, 216.56),
color = "#60a5fa", linetype = "dashed", linewidth = 1) +
# Vertical line for the sample mean
geom_vline(xintercept = mean_val,
color = "#dc2626", linetype = "solid", linewidth = 1.5) +
# Label for the sample mean
annotate("text", x = mean_val, y = max(y_vals) * 1.05,
label = paste0("x̄ = ", mean_val),
size = 5, fontface = "bold", color = "#dc2626") +
# Labels for the 90% CI
annotate("text", x = 203.44, y = max(y_vals) * 0.85,
label = "203.44", size = 3.5, hjust = 1.2, color = "#60a5fa", fontface = "bold") +
annotate("text", x = 216.56, y = max(y_vals) * 0.85,
label = "216.56", size = 3.5, hjust = -0.2, color = "#60a5fa", fontface = "bold") +
# Labels for the 95% CI
annotate("text", x = 202.16, y = max(y_vals) * 0.70,
label = "202.16", size = 3.5, hjust = 1.2, color = "#2563eb", fontface = "bold") +
annotate("text", x = 217.84, y = max(y_vals) * 0.70,
label = "217.84", size = 3.5, hjust = -0.2, color = "#2563eb", fontface = "bold") +
# Labels for the 99% CI
annotate("text", x = 199.72, y = max(y_vals) * 0.55,
label = "199.72", size = 3.5, hjust = 1.2, color = "#1e3a8a", fontface = "bold") +
annotate("text", x = 220.28, y = max(y_vals) * 0.55,
label = "220.28", size = 3.5, hjust = -0.2, color = "#1e3a8a", fontface = "bold") +
# Manual legend built with geom_point (points placed at the edge of the x range)
geom_point(data = data.frame(x = c(225, 225, 225),
y = c(0, 0, 0),
CI = factor(c("99% CI [199.72, 220.28]",
"95% CI [202.16, 217.84]",
"90% CI [203.44, 216.56]"),
levels = c("99% CI [199.72, 220.28]",
"95% CI [202.16, 217.84]",
"90% CI [203.44, 216.56]"))),
aes(x = x, y = y, fill = CI),
shape = 22, size = 8, alpha = 0.6, color = "black") +
scale_fill_manual(
name = "Confidence Intervals",
values = c("99% CI [199.72, 220.28]" = "#1e3a8a",
"95% CI [202.16, 217.84]" = "#2563eb",
"90% CI [203.44, 216.56]" = "#60a5fa"),
guide = guide_legend(override.aes = list(size = 6, alpha = 0.6))
) +
# Labels and theme
labs(title = "Normal Distribution with Confidence Intervals",
subtitle = "n = 36, σ = 24, x̄ = 210, SE = 4",
x = "Value",
y = "Probability Density") +
scale_x_continuous(breaks = seq(195, 225, 5)) +
theme_minimal() +
theme(
plot.title = element_text(size = 18, face = "bold", color = "#0f172a", hjust = 0.5),
plot.subtitle = element_text(size = 12, color = "#475569", hjust = 0.5),
axis.title = element_text(size = 12, face = "bold", color = "#0f172a"),
axis.text = element_text(size = 11, color = "#334155"),
panel.grid.major = element_line(color = "#e2e8f0", linewidth = 0.3),
panel.grid.minor = element_blank(),
legend.position = "bottom",
legend.title = element_text(size = 12, face = "bold", color = "#0f172a"),
legend.text = element_text(size = 10, color = "#334155"),
legend.background = element_rect(fill = "#f1f5f9", color = "#cbd5e1", linewidth = 0.5),
legend.key = element_rect(fill = "white", color = NA),
legend.key.size = unit(1, "cm"),
legend.margin = margin(10, 10, 10, 10),
plot.background = element_rect(fill = "#f8fafc", color = NA),
panel.background = element_rect(fill = "white", color = NA),
plot.margin = margin(20, 20, 20, 20)
)
TEAM B
library(ggplot2)
# Data - T-Distribution
mean_val <- 210
s <- 24
n <- 36
df <- 35
se <- s / sqrt(n) # Standard Error = 4
# Confidence Intervals (T-Distribution)
ci_90 <- c(lower = 203.242, upper = 216.758, t = 1.690)
ci_95 <- c(lower = 201.88, upper = 218.12, t = 2.03)
ci_99 <- c(lower = 199.105, upper = 220.895, t = 2.724)
# Data frame for the t-distribution curve
x_range <- seq(195, 225, length.out = 500)
y_t <- dt((x_range - mean_val) / se, df = df) / se
df_curve <- data.frame(x = x_range, y = y_t)
# Blue color theme
color_99 <- "#1e3a8a" # Dark blue
color_95 <- "#3b82f6" # Medium blue
color_90 <- "#60a5fa" # Light blue
color_mean <- "#dc2626" # Red for mean
# Plot
p <- ggplot() +
# Shaded areas (outermost to innermost)
geom_ribbon(data = df_curve,
aes(x = x, ymin = 0,
ymax = ifelse(x >= ci_99["lower"] & x <= ci_99["upper"], y, 0)),
fill = color_99, alpha = 0.2) +
geom_ribbon(data = df_curve,
aes(x = x, ymin = 0,
ymax = ifelse(x >= ci_95["lower"] & x <= ci_95["upper"], y, 0)),
fill = color_95, alpha = 0.3) +
geom_ribbon(data = df_curve,
aes(x = x, ymin = 0,
ymax = ifelse(x >= ci_90["lower"] & x <= ci_90["upper"], y, 0)),
fill = color_90, alpha = 0.4) +
# T-distribution curve
geom_line(data = df_curve, aes(x = x, y = y),
color = "#0f172a", linewidth = 1.5) +
# Vertical lines for the CI bounds
# 99% CI
geom_vline(xintercept = ci_99["lower"],
linetype = "dashed", color = color_99, linewidth = 1) +
geom_vline(xintercept = ci_99["upper"],
linetype = "dashed", color = color_99, linewidth = 1) +
# 95% CI
geom_vline(xintercept = ci_95["lower"],
linetype = "dashed", color = color_95, linewidth = 1) +
geom_vline(xintercept = ci_95["upper"],
linetype = "dashed", color = color_95, linewidth = 1) +
# 90% CI
geom_vline(xintercept = ci_90["lower"],
linetype = "dashed", color = color_90, linewidth = 1) +
geom_vline(xintercept = ci_90["upper"],
linetype = "dashed", color = color_90, linewidth = 1) +
# Mean line
geom_vline(xintercept = mean_val,
color = color_mean, linewidth = 1.5) +
# Label for the mean
annotate("text", x = mean_val, y = max(y_t) * 1.08,
label = paste0("x̄ = ", mean_val),
color = color_mean, fontface = "bold", size = 5) +
# Labels for the 90% CI bounds
annotate("text", x = ci_90["lower"], y = max(y_t) * 0.90,
label = "203.24", size = 3.5, hjust = 1.2,
color = color_90, fontface = "bold") +
annotate("text", x = ci_90["upper"], y = max(y_t) * 0.90,
label = "216.76", size = 3.5, hjust = -0.2,
color = color_90, fontface = "bold") +
# Labels for the 95% CI bounds
annotate("text", x = ci_95["lower"], y = max(y_t) * 0.75,
label = "201.88", size = 3.5, hjust = 1.2,
color = color_95, fontface = "bold") +
annotate("text", x = ci_95["upper"], y = max(y_t) * 0.75,
label = "218.12", size = 3.5, hjust = -0.2,
color = color_95, fontface = "bold") +
# Labels for the 99% CI bounds
annotate("text", x = ci_99["lower"], y = max(y_t) * 0.60,
label = "199.11", size = 3.5, hjust = 1.2,
color = color_99, fontface = "bold") +
annotate("text", x = ci_99["upper"], y = max(y_t) * 0.60,
label = "220.90", size = 3.5, hjust = -0.2,
color = color_99, fontface = "bold") +
# Dummy geoms used only to build the legend
geom_point(aes(x = 230, y = 0, color = "99% CI [199.11, 220.90]"),
size = 0, alpha = 0) +
geom_point(aes(x = 230, y = 0, color = "95% CI [201.88, 218.12]"),
size = 0, alpha = 0) +
geom_point(aes(x = 230, y = 0, color = "90% CI [203.24, 216.76]"),
size = 0, alpha = 0) +
# Color scale for the legend
scale_color_manual(
name = "Confidence Intervals (T-Distribution)",
values = c("99% CI [199.11, 220.90]" = color_99,
"95% CI [201.88, 218.12]" = color_95,
"90% CI [203.24, 216.76]" = color_90),
guide = guide_legend(
override.aes = list(
size = 8,
shape = 15,
alpha = 0.6
)
)
) +
# Themes and labels
labs(
title = "T-Distribution with Confidence Intervals",
subtitle = paste0("n = ", n, ", s = ", s, ", x̄ = ", mean_val,
", SE = ", se, ", df = ", df),
x = "Value",
y = "Probability Density"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 18, hjust = 0.5,
color = "#0f172a"),
plot.subtitle = element_text(hjust = 0.5, color = "#475569", size = 11),
axis.title = element_text(face = "bold", size = 12, color = "#0f172a"),
axis.text = element_text(size = 11, color = "#334155"),
panel.grid.major = element_line(color = "#e2e8f0", linewidth = 0.3),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "white", color = NA),
plot.background = element_rect(fill = "#f8fafc", color = NA),
legend.position = "bottom",
legend.title = element_text(face = "bold", size = 11, color = "#0f172a"),
legend.text = element_text(size = 10, color = "#475569"),
legend.background = element_rect(fill = "white", color = "#cbd5e1", linewidth = 0.5),
legend.key = element_rect(fill = "white", color = NA),
legend.key.size = unit(1, "cm"),
legend.margin = margin(10, 10, 10, 10),
legend.box.background = element_rect(fill = "white", color = "#cbd5e1")
) +
scale_x_continuous(breaks = seq(195, 225, by = 5)) +
coord_cartesian(xlim = c(195, 225), ylim = c(0, max(y_t) * 1.12))
# Show the plot
print(p)
Conclusion
The difference in confidence interval width between Team A and Team B is expected, even though both teams use nearly the same data. The difference comes from the calculation method and the assumption behind it. Team A uses the Z method because the population standard deviation is treated as known. Under this assumption there is less uncertainty, so the lower and upper bounds form a narrower interval.
Team B, in contrast, uses the t method because the population standard deviation is unknown and must be estimated from the sample. Because this is only an estimate, the uncertainty is larger. As a result, the multiplier (the t value) is larger than the corresponding Z value, so the margin of error grows and the interval becomes wider.
In addition, the higher the confidence level (from 90% to 99%), the wider the confidence interval. To be more confident that the true mean lies inside the interval, the range has to be made wider.
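The comparison above can be reproduced with a short R sketch. It assumes the summary values shown in the two plots (n = 36, mean 210, standard deviation 24, so SE = 4) and uses base R's `qnorm()` and `qt()`. The variable names are illustrative and not taken from either team's code.

```r
# Sketch: Z-based (Team A) vs t-based (Team B) intervals from the same summary data.
x_bar  <- 210   # sample mean
sd_val <- 24    # treated as sigma by Team A and as s by Team B
n      <- 36
se     <- sd_val / sqrt(n)               # standard error = 4

conf  <- c(0.90, 0.95, 0.99)
alpha <- 1 - conf

z_crit <- qnorm(1 - alpha / 2)           # Z critical values
t_crit <- qt(1 - alpha / 2, df = n - 1)  # t critical values, df = 35

comparison <- data.frame(
  level   = conf,
  z_lower = x_bar - z_crit * se,
  z_upper = x_bar + z_crit * se,
  t_lower = x_bar - t_crit * se,
  t_upper = x_bar + t_crit * se
)
# Extra width of the t interval relative to the Z interval
comparison$extra_width <- 2 * (t_crit - z_crit) * se

print(round(comparison, 3))
```

At every confidence level the t critical value exceeds the Z critical value, so `extra_width` is positive in each row.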
Case Study 5
One-Sided Confidence Interval: A Software as a Service (SaaS) company wants to ensure that at least 70% of weekly active users utilize a premium feature.
From the experiment:
\[ \begin{eqnarray*} n &=& 250 \quad \text{(total users)} \\ x &=& 185 \quad \text{(active premium users)} \end{eqnarray*} \]
Management is only interested in the lower bound of the estimate.
Tasks:
- Identify the type of Confidence Interval and the appropriate test.
- Compute the one-sided lower Confidence Interval at:
- \(90\%\)
- \(95\%\)
- \(99\%\)
- Visualize the lower bounds for all confidence levels.
- Determine whether the 70% target is statistically satisfied.
Identify
The confidence interval used in this case study is a one-sided (lower bound) confidence interval for a population proportion, because the company is only interested in the lower bound of the estimated proportion of active users who use the premium feature. The data analyzed are a proportion, namely the number of users who use the premium feature relative to the total number of users in the sample. Because the sample size is sufficiently large (n = 250), the normal approximation can be used. The appropriate method is therefore the one-sample Z confidence interval for a proportion.
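A common rule of thumb for the normal approximation is that both the number of successes and the number of failures are at least 10. That threshold is an assumption here, since textbooks vary between 5 and 10, and it is not stated in the case study. A minimal R check using the counts given above:

```r
# Rule-of-thumb check for the normal approximation to a sample proportion.
# The threshold of 10 is a convention assumed for this sketch.
n <- 250          # total users
x <- 185          # active premium users
threshold <- 10

successes <- x
failures  <- n - x
normal_ok <- successes >= threshold && failures >= threshold

cat("successes =", successes,
    ", failures =", failures,
    ", normal approximation reasonable:", normal_ok, "\n")
```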
Confidence Interval
Given:
Sample proportion estimate: \[ \hat{p} = \frac{x}{n} = \frac{185}{250} = 0.74 \]
Standard error: \[ SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.74(0.26)}{250}} \approx 0.0277 \]
One-sided lower bound formula: \[ \text{Lower Bound} = \hat{p} - z_{\alpha} \times SE \]
1. 90%
\[ z_{0.10} = 1.282 \]
\[ LB_{90\%} = 0.74 - (1.282)(0.0277) \approx 0.704 \]
\[ \text{CI } 90\% \text{ (Lower Bound): } \quad p \ge 0.704 \]
2. 95%
\[ z_{0.05} = 1.645 \]
\[ LB_{95\%} = 0.74 - (1.645)(0.0277) \approx 0.694 \]
\[ \text{CI } 95\% \text{ (Lower Bound): } \quad p \ge 0.694 \]
3. 99%
\[ z_{0.01} = 2.326 \]
\[ LB_{99\%} = 0.74 - (2.326)(0.0277) \approx 0.675 \]
\[ \text{CI } 99\% \text{ (Lower Bound): } \quad p \ge 0.675 \]
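The three lower bounds can be checked with a short R sketch. It assumes the same normal (Wald) approximation as the formula above and takes the one-sided critical values from `qnorm()`. The variable names are illustrative.

```r
# Sketch: one-sided lower confidence bounds for a proportion (normal approximation).
n <- 250
x <- 185
p_hat <- x / n                        # 0.74
se <- sqrt(p_hat * (1 - p_hat) / n)   # about 0.0277

conf    <- c(0.90, 0.95, 0.99)
z_alpha <- qnorm(conf)                # one-sided critical values
lower   <- p_hat - z_alpha * se       # lower bounds

result <- data.frame(
  level       = conf,
  z           = round(z_alpha, 3),
  lower_bound = round(lower, 3)
)
print(result)
```

The sketch keeps the unrounded standard error, so the last digit can differ slightly from a hand calculation that uses SE = 0.0277.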
Visualization
library(ggplot2)
# Parameter
p_hat <- 0.74
se <- 0.0277
# Lower bounds for each confidence level
ci_90 <- c(lower = 0.704, z = 1.282)
ci_95 <- c(lower = 0.694, z = 1.645)
ci_99 <- c(lower = 0.675, z = 2.326)
# Data frame for plotting
x_range <- seq(0.65, 0.80, length.out = 500)
y_norm <- dnorm(x_range, mean = p_hat, sd = se)
df_curve <- data.frame(x = x_range, y = y_norm)
# Blue color theme
color_99 <- "#1e3a8a" # Dark blue
color_95 <- "#3b82f6" # Medium blue
color_90 <- "#60a5fa" # Light blue
color_p <- "#dc2626" # Red for p-hat
# Plot
p <- ggplot() +
# Shaded areas for the lower bounds (from outermost to innermost)
# 99% CI (area from 0.675 to the right)
geom_ribbon(data = df_curve,
aes(x = x, ymin = 0,
ymax = ifelse(x >= ci_99["lower"], y, 0)),
fill = color_99, alpha = 0.2) +
# 95% CI (area from 0.694 to the right)
geom_ribbon(data = df_curve,
aes(x = x, ymin = 0,
ymax = ifelse(x >= ci_95["lower"], y, 0)),
fill = color_95, alpha = 0.3) +
# 90% CI (area from 0.704 to the right)
geom_ribbon(data = df_curve,
aes(x = x, ymin = 0,
ymax = ifelse(x >= ci_90["lower"], y, 0)),
fill = color_90, alpha = 0.4) +
# Normal curve
geom_line(data = df_curve, aes(x = x, y = y),
color = "#0f172a", linewidth = 1.5) +
# Vertical lines for the lower bounds
# 99% Lower Bound
geom_vline(xintercept = ci_99["lower"],
linetype = "dashed", color = color_99, linewidth = 1.2) +
# 95% Lower Bound
geom_vline(xintercept = ci_95["lower"],
linetype = "dashed", color = color_95, linewidth = 1.2) +
# 90% Lower Bound
geom_vline(xintercept = ci_90["lower"],
linetype = "dashed", color = color_90, linewidth = 1.2) +
# p-hat line
geom_vline(xintercept = p_hat,
color = color_p, linewidth = 1.5) +
# Label for p-hat
annotate("text", x = p_hat, y = max(y_norm) * 1.08,
label = paste0("p̂ = ", p_hat),
color = color_p, fontface = "bold", size = 5) +
# Labels for the lower bounds
annotate("text", x = ci_90["lower"], y = max(y_norm) * 0.85,
label = "0.704", size = 3.8, hjust = 1.3,
color = color_90, fontface = "bold") +
annotate("segment", x = ci_90["lower"], xend = p_hat,
y = max(y_norm) * 0.78, yend = max(y_norm) * 0.78,
arrow = arrow(length = unit(0.3, "cm"), ends = "both"),
color = color_90, linewidth = 0.8) +
annotate("text", x = (ci_90["lower"] + p_hat) / 2, y = max(y_norm) * 0.82,
label = "90% CI", size = 3.5, color = color_90, fontface = "bold") +
annotate("text", x = ci_95["lower"], y = max(y_norm) * 0.65,
label = "0.694", size = 3.8, hjust = 1.3,
color = color_95, fontface = "bold") +
annotate("segment", x = ci_95["lower"], xend = p_hat,
y = max(y_norm) * 0.58, yend = max(y_norm) * 0.58,
arrow = arrow(length = unit(0.3, "cm"), ends = "both"),
color = color_95, linewidth = 0.8) +
annotate("text", x = (ci_95["lower"] + p_hat) / 2, y = max(y_norm) * 0.62,
label = "95% CI", size = 3.5, color = color_95, fontface = "bold") +
annotate("text", x = ci_99["lower"], y = max(y_norm) * 0.45,
label = "0.675", size = 3.8, hjust = 1.3,
color = color_99, fontface = "bold") +
annotate("segment", x = ci_99["lower"], xend = p_hat,
y = max(y_norm) * 0.38, yend = max(y_norm) * 0.38,
arrow = arrow(length = unit(0.3, "cm"), ends = "both"),
color = color_99, linewidth = 0.8) +
annotate("text", x = (ci_99["lower"] + p_hat) / 2, y = max(y_norm) * 0.42,
label = "99% CI", size = 3.5, color = color_99, fontface = "bold") +
# Dummy geoms used only to build the legend
geom_point(aes(x = 0.82, y = 0, color = "99% CI: p ≥ 0.675"),
size = 0, alpha = 0) +
geom_point(aes(x = 0.82, y = 0, color = "95% CI: p ≥ 0.694"),
size = 0, alpha = 0) +
geom_point(aes(x = 0.82, y = 0, color = "90% CI: p ≥ 0.704"),
size = 0, alpha = 0) +
# Color scale for the legend
scale_color_manual(
name = "Lower Bound Confidence Intervals",
values = c("99% CI: p ≥ 0.675" = color_99,
"95% CI: p ≥ 0.694" = color_95,
"90% CI: p ≥ 0.704" = color_90),
guide = guide_legend(
override.aes = list(
size = 8,
shape = 15,
alpha = 0.6
)
)
) +
# Themes and labels
labs(
title = "Lower Bound Confidence Intervals",
subtitle = paste0("p̂ = ", p_hat, ", SE = ", se),
x = "Proportion (p)",
y = "Probability Density"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 18, hjust = 0.5,
color = "#0f172a"),
plot.subtitle = element_text(hjust = 0.5, color = "#475569", size = 12),
axis.title = element_text(face = "bold", size = 12, color = "#0f172a"),
axis.text = element_text(size = 11, color = "#334155"),
panel.grid.major = element_line(color = "#e2e8f0", linewidth = 0.3),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "white", color = NA),
plot.background = element_rect(fill = "#f8fafc", color = NA),
legend.position = "bottom",
legend.title = element_text(face = "bold", size = 11, color = "#0f172a"),
legend.text = element_text(size = 10, color = "#475569"),
legend.background = element_rect(fill = "white", color = "#cbd5e1", linewidth = 0.5),
legend.key = element_rect(fill = "white", color = NA),
legend.key.size = unit(1, "cm"),
legend.margin = margin(10, 10, 10, 10),
legend.box.background = element_rect(fill = "white", color = "#cbd5e1")
) +
scale_x_continuous(breaks = seq(0.65, 0.80, by = 0.02),
labels = scales::number_format(accuracy = 0.001)) +
coord_cartesian(xlim = c(0.65, 0.80), ylim = c(0, max(y_norm) * 1.12))
# Show the plot
print(p)
Conclusion
| Confidence Level | Lower Bound | Target | Decision |
|---|---|---|---|
| 90% | 0.704 | 0.7 | Satisfied |
| 95% | 0.694 | 0.7 | Not satisfied |
| 99% | 0.675 | 0.7 | Not satisfied |
Based on the one-sided (lower bound) confidence intervals, the lower bound at the 90% confidence level lies above the 70% target. At a 90% confidence level, the company can therefore conclude statistically that at least 70% of weekly active users use the premium feature.
At the 95% and 99% confidence levels, however, the lower bound falls below 70%, so the target cannot be declared statistically satisfied. The 70% premium feature usage target is therefore met only at the 90% confidence level.
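The decision rule can also be turned around to ask at which confidence levels the target would hold. The lower bound is at least 0.70 exactly when \(z_{\alpha} \le (\hat{p} - 0.70)/SE\), so the largest one-sided confidence level that still meets the target is \(\Phi\big((\hat{p} - 0.70)/SE\big)\). A minimal R sketch under the same normal approximation, with illustrative variable names:

```r
# Sketch: largest one-sided confidence level at which the lower bound stays >= target.
n <- 250
x <- 185
target <- 0.70

p_hat <- x / n
se    <- sqrt(p_hat * (1 - p_hat) / n)

z_obs    <- (p_hat - target) / se   # distance from the target in standard errors (about 1.44)
max_conf <- pnorm(z_obs)            # roughly 0.925

conf     <- c(0.90, 0.95, 0.99)
decision <- ifelse(conf <= max_conf, "Satisfied", "Not satisfied")

print(data.frame(level = conf, decision = decision))
cat("Largest one-sided confidence level meeting the target:",
    round(max_conf, 3), "\n")
```

Under these assumptions the target holds at one-sided confidence levels up to roughly 92.5%, which agrees with the table: satisfied at 90%, not satisfied at 95% or 99%.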