ULIN NIKMAH (52250042)
INSTITUT TEKNOLOGI SAINS BANDUNG
Course: Basic Statistics
Study Program: Data Science
Lecturer: Bakti Siregar, M.Sc., CDS.
Confidence Interval for Mean, \(\sigma\) Known: An e-commerce platform wants to estimate the average number of daily transactions per user after launching a new feature. Based on large-scale historical data, the population standard deviation is known.
\[ \begin{eqnarray*} \sigma &=& 3.2 \quad \text{(population standard deviation)} \\ n &=& 100 \quad \text{(sample size)} \\ \bar{x} &=& 12.6 \quad \text{(sample mean)} \end{eqnarray*} \]
Tasks
The statistical method used in this analysis is the Z confidence interval for the mean, because the population standard deviation is known (\(\sigma = 3.2\)) and the sample is large (\(n = 100\)), so the sampling distribution of \(\bar{x}\) is approximately normal.
A confidence interval based on the standard normal (Z) distribution is therefore the appropriate method for this case.
The confidence interval for the population mean when the standard deviation (\(\sigma\)) is known is computed with the following formula:
\[ CI = \bar{x} \pm Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) \]
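where \(\bar{x}\) is the sample mean, \(Z_{\alpha/2}\) is the critical value of the standard normal distribution, \(\sigma\) is the population standard deviation, and \(n\) is the sample size.
The data used in the confidence interval calculation are: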
\[ \begin{aligned} \sigma &= 3.2 \quad \text{(population standard deviation)} \\ n &= 100 \quad \text{(sample size)} \\ \bar{x} &= 12.6 \quad \text{(sample mean)} \end{aligned} \]
Critical Z Value
Significance level for the 90% CI:
\[ \alpha = 0.10 \Rightarrow \frac{\alpha}{2} = 0.05 \]
From the standard normal distribution table:
\[ Z_{0.05} = 1.645 \]
Standard Error (SE)
\[ SE = \frac{\sigma}{\sqrt{n}} = \frac{3.2}{\sqrt{100}} = 0.32 \]
Margin of Error (ME)
\[ ME = Z_{\alpha/2} \times SE \]
\[ ME = 1.645 \times 0.32 = 0.5264 \]
90% Confidence Interval
\[ CI_{90\%} = \bar{x} \pm ME \]
\[ CI_{90\%} = 12.6 \pm 0.5264 \]
\[ CI_{90\%} = (12.0736, 13.1264) \approx (12.07, 13.13) \]
Critical Z Value
Significance level for the 95% CI:
\[ \alpha = 0.05 \Rightarrow \frac{\alpha}{2} = 0.025 \]
From the standard normal distribution table:
\[ Z_{0.025} = 1.96 \]
Standard Error (SE)
\[ SE = \frac{\sigma}{\sqrt{n}} = \frac{3.2}{\sqrt{100}} = 0.32 \]
Margin of Error (ME)
\[ ME = Z_{\alpha/2} \times SE \]
\[ ME = 1.96 \times 0.32 = 0.6272 \]
95% Confidence Interval
\[ CI_{95\%} = \bar{x} \pm ME \]
\[ CI_{95\%} = 12.6 \pm 0.6272 \]
\[ CI_{95\%} = (11.9728, 13.2272) \approx (11.97, 13.23) \]
Critical Z Value
Significance level for the 99% CI:
\[ \alpha = 0.01 \Rightarrow \frac{\alpha}{2} = 0.005 \]
From the standard normal distribution table:
\[ Z_{0.005} = 2.576 \]
Standard Error (SE)
\[ SE = \frac{\sigma}{\sqrt{n}} = \frac{3.2}{\sqrt{100}} = 0.32 \]
Margin of Error (ME)
\[ ME = Z_{\alpha/2} \times SE \]
\[ ME = 2.576 \times 0.32 = 0.8243 \]
99% Confidence Interval
\[ CI_{99\%} = \bar{x} \pm ME \]
\[ CI_{99\%} = 12.6 \pm 0.8243 \]
\[ CI_{99\%} = (11.7757, 13.4243) \approx (11.78, 13.42) \]
| Confidence Level | \(Z_{\alpha/2}\) | SE | Margin of Error (ME) | Confidence Interval (CI) |
|---|---|---|---|---|
| 90% | 1.645 | 0.32 | 0.5264 | (12.07, 13.13) |
| 95% | 1.96 | 0.32 | 0.6272 | (11.97, 13.23) |
| 99% | 2.576 | 0.32 | 0.8243 | (11.78, 13.42) |
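The hand calculations can be cross-checked in R. The following is a minimal sketch, assuming a helper named `z_ci` (a hypothetical name, not part of the original analysis). It takes the critical value from `qnorm()` rather than a printed table, so the bounds may differ from the table in the fourth decimal place.

```r
# Z-based confidence interval for a mean when sigma is known.
# z_ci is an illustrative helper, not a function from any package.
z_ci <- function(x_bar, sigma, n, level) {
  alpha <- 1 - level
  z <- qnorm(1 - alpha / 2)   # two-sided critical value
  se <- sigma / sqrt(n)       # standard error of the mean
  me <- z * se                # margin of error
  c(lower = x_bar - me, upper = x_bar + me)
}

conf_levels <- c(0.90, 0.95, 0.99)
ci_table <- t(sapply(conf_levels, function(lv) z_ci(12.6, 3.2, 100, lv)))
rownames(ci_table) <- paste0(conf_levels * 100, "%")
print(round(ci_table, 2))
```

Each row of `ci_table` holds the lower and upper bound for one confidence level, which makes the widening of the interval easy to see.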
library(plotly)
# =========================
# Basic parameters
# =========================
mean_x <- 12.6
sigma <- 3.2
n <- 100
se <- sigma / sqrt(n)
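# CI levels with fill colors; sigma is known, so the critical values are Z
ci_levels <- data.frame(
CI = c("99%", "95%", "90%"),
t = c(2.576, 1.96, 1.645), # Z critical values (column name t kept for the code below)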
color = c("rgba(186,85,211,0.30)", # 99%
"rgba(100,149,237,0.35)", # 95%
"rgba(144,238,144,0.45)") # 90%
)
# Compute the lower and upper bounds
ci_levels$Lower <- mean_x - ci_levels$t * se
ci_levels$Upper <- mean_x + ci_levels$t * se
# =========================
# Density curve
# =========================
x <- seq(mean_x - 4*se, mean_x + 4*se, length.out = 1000)
y <- dnorm(x, mean = mean_x, sd = se)
# =========================
# Interactive plot
# =========================
p <- plot_ly()
# Density curve
p <- p %>% add_lines(x = x, y = y, line = list(color="black"), name="Density",
hovertemplate = "X: %{x}<br>Density: %{y}<extra></extra>")
for(i in 1:nrow(ci_levels)){
idx <- x >= ci_levels$Lower[i] & x <= ci_levels$Upper[i]
p <- p %>% add_polygons(
x = c(x[idx], rev(x[idx])),
y = c(y[idx], rep(0, sum(idx))),
fillcolor = ci_levels$color[i],
line = list(color = "rgba(0,0,0,0)"),
name = paste("CI", ci_levels$CI[i]),
hovertemplate = paste0(
"<b>CI ", ci_levels$CI[i], "</b><br>",
"LCL = ", round(ci_levels$Lower[i],2), "<br>",
"UCL = ", round(ci_levels$Upper[i],2),
"<extra></extra>"
)
)
}
# Mean line
p <- p %>% add_lines(
x = c(mean_x, mean_x), y = c(0, max(y)),
line = list(dash="dash", color="black"),
name = "Mean",
hovertemplate = paste0("<b>Mean</b><br>Value = ", mean_x, "<extra></extra>")
)
# Add a mean label above the line
p <- p %>% add_annotations(
x = mean_x,
y = max(y) * 1.05, # slightly above the density peak
text = paste0("Mean = ", mean_x),
showarrow = TRUE,
arrowhead = 2,
arrowsize = 1,
arrowcolor = "black",
font = list(color="black", size=12),
ax = 0,
ay = -30
)
# =========================
# Layout
# =========================
p <- p %>% layout(
title = list(
text = "<b>Confidence Intervals</b>",
x = 0.5
),
margin = list(t = 100, b = 60),
xaxis = list(title = "Sample Mean"),
yaxis = list(title = "Density"),
hovermode = "x",
hoverlabel = list(
bgcolor = "white",
font = list(color = "black"),
align = "left"
),
legend = list(
orientation = "h",
x = 0.5,
xanchor = "center",
y = -0.2
)
)
p
The average number of daily transactions per user is estimated at about 12.6. The confidence intervals give the range of plausible values for the population mean: (12.07, 13.13) at 90%, (11.97, 13.23) at 95%, and (11.78, 13.42) at 99%.
The higher the confidence level, the wider the interval. We are more confident that the true mean lies within the range, but the estimate becomes less precise.
In a business context, the confidence interval gives a realistic picture of average daily transactions rather than a single number, so business strategy can be planned more safely and measurably.
Confidence Interval for Mean, \(\sigma\) Unknown: A UX Research team analyzes task completion time (in minutes) for a new mobile application. The data are collected from 12 users:
\[ 8.4,\; 7.9,\; 9.1,\; 8.7,\; 8.2,\; 9.0,\; 7.8,\; 8.5,\; 8.9,\; 8.1,\; 8.6,\; 8.3 \]
Tasks:
The statistical method used in this analysis is the t confidence interval for the mean.
This choice is made because the population standard deviation (\(\sigma\)) is unknown and must be estimated by the sample standard deviation \(s\), and the sample is small (\(n = 12 < 30\)).
A confidence interval based on the Student t distribution is therefore the appropriate method for this case.
The confidence interval for the population mean when the population standard deviation (\(\sigma\)) is unknown is computed with the following formula:
\[ CI = \bar{x} \pm t_{\alpha/2,\, n-1} \left( \frac{s}{\sqrt{n}} \right) \]
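where \(\bar{x}\) is the sample mean, \(t_{\alpha/2,\, n-1}\) is the critical value of the t distribution with \(n - 1\) degrees of freedom, \(s\) is the sample standard deviation, and \(n\) is the sample size.
The data used in the confidence interval calculation are: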
\[ 8.4,\; 7.9,\; 9.1,\; 8.7,\; 8.2,\; 9.0,\; 7.8,\; 8.5,\; 8.9,\; 8.1,\; 8.6,\; 8.3 \]
The sample size is:
\[ n = 12 \]
The sample mean is computed with the formula:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
\[ \bar{x} = \frac{101.5}{12} = 8.4583 \]
The sample standard deviation is computed with the formula:
\[ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \]
With the sample mean:
\[ \bar{x} = 8.4583 \]
The deviation of each observation from the mean is shown in the following table:
| \(x_i\) | \(x_i - \bar{x}\) | \((x_i - \bar{x})^2\) |
|---|---|---|
| 8.4 | -0.0583 | 0.0034 |
| 7.9 | -0.5583 | 0.3117 |
| 9.1 | 0.6417 | 0.4118 |
| 8.7 | 0.2417 | 0.0584 |
| 8.2 | -0.2583 | 0.0667 |
| 9.0 | 0.5417 | 0.2934 |
| 7.8 | -0.6583 | 0.4334 |
| 8.5 | 0.0417 | 0.0017 |
| 8.9 | 0.4417 | 0.1951 |
| 8.1 | -0.3583 | 0.1284 |
| 8.6 | 0.1417 | 0.0201 |
| 8.3 | -0.1583 | 0.0251 |
The sum of squared deviations is:
\[ \sum (x_i - \bar{x})^2 = 1.949 \]
So the sample standard deviation is:
\[ s = \sqrt{\frac{1.949}{11}} = \sqrt{0.1772} = 0.421 \approx 0.42 \]
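The table-based calculation can be reproduced in a few lines of R. This is a minimal sketch; the variable names `times` and `ss` are illustrative choices, not names from the original analysis.

```r
# Task completion times (minutes) for the 12 users
times <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)

n <- length(times)
x_bar <- mean(times)             # sample mean
ss <- sum((times - x_bar)^2)     # sum of squared deviations
s <- sqrt(ss / (n - 1))          # sample standard deviation (n - 1 denominator)

# The built-in sd() uses the same n - 1 denominator
stopifnot(isTRUE(all.equal(s, sd(times))))

cat("n =", n, " mean =", round(x_bar, 4),
    " SS =", round(ss, 4), " s =", round(s, 4), "\n")
```

Working from unrounded values avoids the small rounding drift that builds up when intermediate results are rounded by hand.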
Critical t Value
Significance level for the 90% CI:
\[ \alpha = 0.10 \Rightarrow \frac{\alpha}{2} = 0.05 \]
Degrees of freedom:
\[ df = n - 1 = 11 \]
From the t distribution table:
\[ t_{0.05,11} = 1.796 \]
Standard Error (SE)
\[ SE = \frac{s}{\sqrt{n}} = \frac{0.42}{\sqrt{12}} = 0.1212 \]
Margin of Error (ME)
\[ ME = t_{\alpha/2,\, n-1} \times SE \]
\[ ME = 1.796 \times 0.1212 = 0.2176 \]
90% Confidence Interval
\[ CI_{90\%} = \bar{x} \pm ME \]
\[ CI_{90\%} = 8.4583 \pm 0.2176 \]
\[ CI_{90\%} = (8.2407, 8.6759) \approx (8.24,\; 8.68) \]
Critical t Value (95% CI)
\[ \alpha = 0.05 \Rightarrow \frac{\alpha}{2} = 0.025 \]
\[ t_{0.025,11} = 2.201 \]
Standard Error (SE)
\[ SE = \frac{s}{\sqrt{n}} = \frac{0.42}{\sqrt{12}} = 0.1212 \]
Margin of Error (ME)
\[ ME = 2.201 \times 0.1212 = 0.2667 \]
95% Confidence Interval
\[ CI_{95\%} = 8.4583 \pm 0.2667 \]
\[ CI_{95\%} = (8.1916, 8.7250) \approx (8.19,\; 8.73) \]
Critical t Value (99% CI)
\[ \alpha = 0.01 \Rightarrow \frac{\alpha}{2} = 0.005 \]
\[ t_{0.005,11} = 3.106 \]
Standard Error (SE)
\[ SE = \frac{s}{\sqrt{n}} = \frac{0.42}{\sqrt{12}} = 0.1212 \]
Margin of Error (ME)
\[ ME = 3.106 \times 0.1212 = 0.3764 \]
99% Confidence Interval
\[ CI_{99\%} = 8.4583 \pm 0.3764 \]
\[ CI_{99\%} = (8.0819, 8.8347) \approx (8.08,\; 8.83) \]
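The three t intervals can be verified in R. The sketch below assumes a helper named `t_ci` (hypothetical) and compares its 95% result with base R's `t.test()`. Because it works with the unrounded standard deviation rather than \(s \approx 0.42\), the bounds may differ from the hand calculation in the third decimal place.

```r
times <- c(8.4, 7.9, 9.1, 8.7, 8.2, 9.0, 7.8, 8.5, 8.9, 8.1, 8.6, 8.3)

# t-based confidence interval for a mean when sigma is unknown.
# t_ci is an illustrative helper, not a function from any package.
t_ci <- function(x, level) {
  n <- length(x)
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)  # two-sided critical value
  me <- t_crit * sd(x) / sqrt(n)                 # margin of error
  c(lower = mean(x) - me, upper = mean(x) + me)
}

ci_90 <- t_ci(times, 0.90)
ci_95 <- t_ci(times, 0.95)
ci_99 <- t_ci(times, 0.99)

# Cross-check the 95% interval against the built-in one-sample t procedure
builtin_95 <- t.test(times, conf.level = 0.95)$conf.int

print(round(rbind(`90%` = ci_90, `95%` = ci_95, `99%` = ci_99), 4))
```

Using `t.test()` as an independent check is a cheap way to catch slips in the manual formula.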
library(plotly)
# =========================
# Basic parameters
# =========================
mean_x <- 8.4583
se <- 0.1212
# CI levels (t-test, df = 11)
ci_levels <- data.frame(
CI = c("99%", "95%", "90%"),
t = c(3.106, 2.201, 1.796),
color = c("rgba(186,85,211,0.30)", # 99%
"rgba(100,149,237,0.35)", # 95%
"rgba(144,238,144,0.45)") # 90%
)
# Compute the CI bounds
ci_levels$Lower <- mean_x - ci_levels$t * se
ci_levels$Upper <- mean_x + ci_levels$t * se
# =========================
# Density curve
# =========================
x <- seq(mean_x - 4*se, mean_x + 4*se, length.out = 1000)
y <- dnorm(x, mean = mean_x, sd = se)
# =========================
# Interactive plot
# =========================
p <- plot_ly()
# Density curve
p <- p %>% add_lines(
x = x, y = y,
line = list(color = "black"),
name = "Density",
hovertemplate = "X: %{x}<br>Density: %{y}<extra></extra>"
)
# Area CI
for(i in 1:nrow(ci_levels)){
idx <- x >= ci_levels$Lower[i] & x <= ci_levels$Upper[i]
p <- p %>% add_polygons(
x = c(x[idx], rev(x[idx])),
y = c(y[idx], rep(0, sum(idx))),
fillcolor = ci_levels$color[i],
line = list(color = "rgba(0,0,0,0)"),
name = paste("CI", ci_levels$CI[i]),
hovertemplate = paste0(
"<b>CI ", ci_levels$CI[i], "</b><br>",
"LCL = ", round(ci_levels$Lower[i],4), "<br>",
"UCL = ", round(ci_levels$Upper[i],4),
"<extra></extra>"
)
)
}
# Mean line
p <- p %>% add_lines(
x = c(mean_x, mean_x),
y = c(0, max(y)),
line = list(dash = "dash", color = "black"),
name = "Mean",
hovertemplate = paste0(
"<b>Mean</b><br>Value = ", round(mean_x,4),
"<extra></extra>"
)
)
# Label mean
p <- p %>% add_annotations(
x = mean_x,
y = max(y) * 1.05,
text = paste0("Mean = ", round(mean_x,4)),
showarrow = TRUE,
arrowhead = 2,
ax = 0,
ay = -30
)
# =========================
# Layout
# =========================
p <- p %>% layout(
title = list(
text = "<b>t Confidence Intervals (df = 11)</b>",
x = 0.5
),
margin = list(t = 120, b = 60),
xaxis = list(title = "Sample Mean"),
yaxis = list(title = "Density"),
hovermode = "x",
legend = list(
orientation = "h",
x = 0.5,
xanchor = "center",
y = -0.2
)
)
p
Interpretation
The mean task completion time per user is estimated at about 8.46 minutes. The confidence intervals give the range of plausible values for the population mean: (8.24, 8.68) at 90%, (8.19, 8.73) at 95%, and (8.08, 8.83) at 99%.
The higher the confidence level, the wider the interval. We are more confident that the true mean lies within the range, but the estimate becomes less precise.
In a business context, the confidence interval gives a realistic picture of the mean task completion time rather than a single number, so business strategy can be planned more safely and measurably.
1. Sample Size (n)
The larger the sample size, the narrower the confidence interval. This is because the standard error (SE) decreases as n increases:
\[ SE = \frac{s}{\sqrt{n}} \]
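Example: with \(s = 0.42\), a sample of \(n = 12\) gives \(SE \approx 0.1212\), while \(n = 48\) gives \(SE \approx 0.0606\), half as large.
In short: more data yields a more precise estimate of the population mean.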
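The effect of sample size on interval width can be sketched numerically. The sample sizes 48 and 192 below are hypothetical, and the sketch assumes the sample standard deviation stays at 0.42 as n grows, which real data would not guarantee.

```r
# Width of a 95% t interval as the sample size grows.
# Assumption: s stays fixed at 0.42; only n = 12 comes from the actual study.
s <- 0.42
sample_sizes <- c(12, 48, 192)

se <- s / sqrt(sample_sizes)                  # standard error for each n
t_crit <- qt(0.975, df = sample_sizes - 1)    # critical value for each df
width <- 2 * t_crit * se                      # full interval width

print(data.frame(n = sample_sizes,
                 SE = round(se, 4),
                 t_crit = round(t_crit, 3),
                 width = round(width, 4)))
```

Quadrupling n halves the standard error, and the interval shrinks slightly more than that because the t critical value also falls as the degrees of freedom increase.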
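2. Confidence Level
The higher the confidence level, the wider the CI.
Example: for the task completion data (df = 11), the critical value rises from 1.796 at 90% to 2.201 at 95% and 3.106 at 99%, widening the interval from (8.24, 8.68) to (8.19, 8.73) and (8.08, 8.83).
Reason: a higher confidence level means we want to be more certain that the population mean lies within the interval, so the range must be widened.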
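The link between confidence level and width comes entirely from the critical value. This is a minimal sketch using the task completion figures (\(s \approx 0.42\), \(n = 12\)); the variable names are illustrative.

```r
# Critical values and interval widths at three confidence levels (df = 11)
conf_levels <- c(0.90, 0.95, 0.99)
n <- 12
s <- 0.42                                            # rounded sample SD from the text

t_crit <- qt(1 - (1 - conf_levels) / 2, df = n - 1)  # two-sided critical values
width <- 2 * t_crit * s / sqrt(n)                    # full interval widths

print(data.frame(level = conf_levels,
                 t_crit = round(t_crit, 3),
                 width = round(width, 4)))
```

The standard error is identical in every row, so any change in width is due to the critical value alone.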
3. Summary
| Factor | Effect on CI Width | Explanation |
|---|---|---|
| Sample size (n) | Larger → narrower | More data → more precise estimate of the mean |
| Confidence level | Higher → wider | More certainty → wider estimated range |
4. Conclusion
Sample size and confidence level are the two main factors that affect the width of a CI.
Understanding both lets us estimate the population mean more precisely and make better data-driven decisions.
Confidence Interval for a Proportion, A/B Testing: A data science team runs an A/B test on a new Call-To-Action (CTA) button design. The experiment yields:
\[ \begin{eqnarray*} n &=& 400 \quad \text{(total users)} \\ x &=& 156 \quad \text{(users who clicked the CTA)} \end{eqnarray*} \]
Tasks:
The sample proportion is computed with the formula:
\[ \hat{p} = \frac{x}{n} \]
With the data:
\[ x = 156 \quad \text{(users who clicked the CTA)} \]
\[ n = 400 \quad \text{(total users)} \]
\[ \hat{p} = \frac{156}{400} = 0.39 \]
Conclusion:
About 39% of users clicked the CTA button.
The confidence interval for a proportion (A/B testing) is computed with the following formula:
\[ CI = \hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
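where \(\hat{p}\) is the sample proportion, \(Z_{\alpha/2}\) is the critical value of the standard normal distribution, and \(n\) is the sample size.
Data used: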
\[ n = 400, \quad x = 156, \quad \hat{p} = 0.39 \]
Critical Z Value (90% CI)
\[ \alpha = 0.10 \Rightarrow \frac{\alpha}{2} = 0.05 \]
\[ Z_{0.05} = 1.645 \]
Standard Error (SE)
\[ SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.39(1-0.39)}{400}} \approx 0.0244 \]
Margin of Error (ME)
\[ ME = Z_{\alpha/2} \times SE = 1.645 \times 0.0244 \approx 0.0401 \]
90% Confidence Interval
\[ CI_{90\%} = 0.39 \pm 0.0401 \]
\[ CI_{90\%} = (0.3499, 0.4301) \approx (0.35, 0.43) \]
Critical Z Value (95% CI)
\[ \alpha = 0.05 \Rightarrow \frac{\alpha}{2} = 0.025 \]
\[ Z_{0.025} = 1.96 \]
Standard Error (SE)
\[ SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.39(1-0.39)}{400}} \approx 0.0244 \]
Margin of Error (ME)
\[ ME = 1.96 \times 0.0244 \approx 0.0478 \]
95% Confidence Interval
\[ CI_{95\%} = 0.39 \pm 0.0478 \]
\[ CI_{95\%} = (0.3422, 0.4378) \approx (0.34, 0.44) \]
Critical Z Value (99% CI)
\[ \alpha = 0.01 \Rightarrow \frac{\alpha}{2} = 0.005 \]
\[ Z_{0.005} = 2.576 \]
Standard Error (SE)
\[ SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.39(1-0.39)}{400}} \approx 0.0244 \]
Margin of Error (ME)
\[ ME = 2.576 \times 0.0244 \approx 0.0629 \]
99% Confidence Interval
\[ CI_{99\%} = 0.39 \pm 0.0629 \]
\[ CI_{99\%} = (0.3271, 0.4529) \approx (0.33, 0.45) \]
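The normal-approximation (Wald) intervals can be reproduced in R. The sketch below assumes a helper named `prop_ci` (hypothetical). It implements exactly the formula used in this section, not the Wilson interval that `prop.test()` reports, so the two should not be expected to match.

```r
# Wald (normal-approximation) confidence interval for a proportion.
# prop_ci is an illustrative helper, not a function from any package.
prop_ci <- function(x, n, level) {
  p_hat <- x / n
  z <- qnorm(1 - (1 - level) / 2)        # two-sided critical value
  se <- sqrt(p_hat * (1 - p_hat) / n)    # standard error of p_hat
  c(lower = p_hat - z * se, upper = p_hat + z * se)
}

conf_levels <- c(0.90, 0.95, 0.99)
cta_ci <- t(sapply(conf_levels, function(lv) prop_ci(156, 400, lv)))
rownames(cta_ci) <- paste0(conf_levels * 100, "%")
print(round(cta_ci, 4))
```

For 156 clicks out of 400 users both \(n\hat{p}\) and \(n(1-\hat{p})\) are well above 10, which is the usual rule of thumb for relying on the normal approximation.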
library(plotly)
# =========================
# Basic parameters
# =========================
p_hat <- 0.39
n <- 400
se <- sqrt(p_hat*(1-p_hat)/n)
# CI levels with fill colors
ci_levels <- data.frame(
CI = c("99%", "95%", "90%"),
Z = c(2.576, 1.96, 1.645),
color = c("rgba(186,85,211,0.30)", # 99%
"rgba(100,149,237,0.35)", # 95%
"rgba(144,238,144,0.45)") # 90%
)
# Compute the lower and upper bounds
ci_levels$Lower <- p_hat - ci_levels$Z * se
ci_levels$Upper <- p_hat + ci_levels$Z * se
# =========================
# Density curve (Normal approx)
# =========================
x <- seq(p_hat - 4*se, p_hat + 4*se, length.out = 1000)
y <- dnorm(x, mean = p_hat, sd = se)
# =========================
# Interactive plot
# =========================
p <- plot_ly()
# Density curve
p <- p %>% add_lines(x = x, y = y, line = list(color="black"), name="Density",
hovertemplate = "Proportion: %{x}<br>Density: %{y}<extra></extra>")
# Shaded CI regions
for(i in 1:nrow(ci_levels)){
idx <- x >= ci_levels$Lower[i] & x <= ci_levels$Upper[i]
p <- p %>% add_polygons(
x = c(x[idx], rev(x[idx])),
y = c(y[idx], rep(0, sum(idx))),
fillcolor = ci_levels$color[i],
line = list(color = "rgba(0,0,0,0)"),
name = paste("CI", ci_levels$CI[i]),
hovertemplate = paste0(
"<b>CI ", ci_levels$CI[i], "</b><br>",
"LCL = ", round(ci_levels$Lower[i],3), "<br>",
"UCL = ", round(ci_levels$Upper[i],3),
"<extra></extra>"
)
)
}
# Sample proportion line (p_hat)
p <- p %>% add_lines(
x = c(p_hat, p_hat), y = c(0, max(y)),
line = list(dash="dash", color="black"),
name = "Sample Proportion",
hovertemplate = paste0("<b>Sample Proportion</b><br>Value = ", p_hat, "<extra></extra>")
)
# Label for the sample proportion
p <- p %>% add_annotations(
x = p_hat,
y = max(y) * 1.05,
text = paste0("p̂ = ", p_hat),
showarrow = TRUE,
arrowhead = 2,
arrowsize = 1,
arrowcolor = "black",
font = list(color="black", size=12),
ax = 0,
ay = -30
)
# Layout
p <- p %>% layout(
title = list(
text = "<b>Confidence Intervals for a Proportion</b>",
x = 0.5
),
margin = list(t = 100, b = 60),
xaxis = list(title = "Sample Proportion"),
yaxis = list(title = "Density"),
hovermode = "x",
hoverlabel = list(
bgcolor = "white",
font = list(color = "black"),
align = "left"
),
legend = list(
orientation = "h",
x = 0.5,
xanchor = "center",
y = -0.2
)
)
p
Interpretation
| Confidence Level | Range (LCL – UCL) | Meaning |
|---|---|---|
| 90% | 0.35 – 0.43 | Narrowest → more precise estimate, but slightly lower confidence |
| 95% | 0.34 – 0.44 | Common standard → balance between precision and confidence |
| 99% | 0.33 – 0.45 | Widest → very confident the true proportion lies here, but the range is less precise |
The higher the confidence level, the wider the CI: we are more confident that the true proportion lies inside it, but the estimate is less precise.
The confidence level of an interval expresses how confident we are that the true proportion lies within the computed range. Formally, it is the share of intervals built this way that would capture the true value over repeated sampling.
High confidence level (e.g., 99%): we are very confident in the result, but the confidence interval becomes wider. As a result, the product team tends to be more cautious and may postpone the decision until more data are available.
Moderate confidence level (e.g., 95%): balances confidence against decision speed. This level is the most commonly used in A/B testing because it is convincing enough without slowing decisions too much.
Lower confidence level (e.g., 90%): the confidence interval is narrower, so the result looks clearer, but the risk of a wrong decision is higher. This approach suits small experiments or changes with relatively low risk.
In short: the higher the confidence level, the more confident we are in the result, but decision-making becomes slower and more cautious. Conversely, a lower confidence level allows faster decisions, but at greater risk.
Precision Comparison (Z-Test vs t-Test): Two data teams measure API latency (in milliseconds) under different conditions.
\[\begin{eqnarray*} \text{Team A:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ \sigma &=& 24 \quad \text{(known population standard deviation)} \\[6pt] \text{Team B:} \\ n &=& 36 \quad \text{(sample size)} \\ \bar{x} &=& 210 \quad \text{(sample mean)} \\ s &=& 24 \quad \text{(sample standard deviation)} \end{eqnarray*}\]
Tasks
Team A
\[ CI = \bar{x} \pm Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]
Team B
\[ CI = \bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}} \]
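In short: Team A knows the population standard deviation \(\sigma\), so it uses the Z distribution; Team B only has the sample standard deviation \(s\), so it uses the t distribution with \(n - 1 = 35\) degrees of freedom.
Data: both teams have \(n = 36\), \(\bar{x} = 210\), and a standard deviation of 24, so \(SE = 24 / \sqrt{36} = 4\) in both cases.
Formula (Team A):
\[ CI = \bar{x} \pm Z_{\alpha/2} \cdot SE \]
Formula (Team B):
\[ CI = \bar{x} \pm t_{\alpha/2, n-1} \cdot SE \]
t values obtained by linear interpolation (df = 35):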
| Confidence Level | df = 30 | df = 40 | Interpolated \(t_{35}\) |
|---|---|---|---|
| 90% | 1.697 | 1.684 | 1.691 |
| 95% | 2.042 | 2.021 | 2.032 |
| 99% | 2.750 | 2.704 | 2.727 |
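Table interpolation is only an approximation, and R can give the df = 35 quantiles directly. The sketch below compares the interpolated values with `qt()`; the variable names are illustrative.

```r
# Interpolated critical values: midpoint of the df = 30 and df = 40 table entries
t_interp <- c(`90%` = (1.697 + 1.684) / 2,
              `95%` = (2.042 + 2.021) / 2,
              `99%` = (2.750 + 2.704) / 2)

# Exact two-sided critical values for df = 35
t_exact <- qt(c(0.95, 0.975, 0.995), df = 35)
names(t_exact) <- names(t_interp)

print(round(rbind(interpolated = t_interp,
                  exact = t_exact,
                  difference = t_interp - t_exact), 4))
```

The interpolated values come out slightly larger than the exact quantiles, so intervals built from them are marginally conservative; the effect on the bounds is in the second decimal place at most.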
library(plotly)
# =========================
# Function that builds a CI plot
# =========================
plot_CI <- function(mean_x, se, ci_levels, title_text, show_legend = TRUE){
x <- seq(mean_x - 4*se, mean_x + 4*se, length.out = 1000)
y <- dnorm(x, mean = mean_x, sd = se)
p <- plot_ly()
p <- p %>% add_lines(x = x, y = y, line = list(color="black"), name="Density",
hovertemplate = "X: %{x}<br>Density: %{y}<extra></extra>",
showlegend = show_legend)
for(i in 1:nrow(ci_levels)){
idx <- x >= ci_levels$Lower[i] & x <= ci_levels$Upper[i]
p <- p %>% add_polygons(
x = c(x[idx], rev(x[idx])),
y = c(y[idx], rep(0, sum(idx))),
fillcolor = ci_levels$color[i],
line = list(color = "rgba(0,0,0,0)"),
name = paste("CI", ci_levels$CI[i]),
hovertemplate = paste0(
"<b>CI ", ci_levels$CI[i], "</b><br>",
"LCL = ", round(ci_levels$Lower[i],2), "<br>",
"UCL = ", round(ci_levels$Upper[i],2),
"<extra></extra>"
),
showlegend = show_legend
)
}
p <- p %>% add_lines(
x = c(mean_x, mean_x), y = c(0, max(y)),
line = list(dash="dash", color="black"),
name = "Mean",
hovertemplate = paste0("<b>Mean</b><br>Value = ", mean_x, "<extra></extra>"),
showlegend = show_legend
)
p <- p %>% add_annotations(
x = mean_x,
y = max(y) * 1.05,
text = paste0("Mean = ", mean_x),
showarrow = TRUE,
arrowhead = 2,
arrowsize = 1,
arrowcolor = "black",
font = list(color="black", size=12),
ax = 0,
ay = -30
)
p <- p %>% layout(
title = list(text = title_text, x = 0.5),
xaxis = list(title = "Sample Mean"),
yaxis = list(title = "Density"),
hovermode = "x",
hoverlabel = list(bgcolor = "white", font = list(color="black"), align="left")
)
return(p)
}
# =========================
# Team A (Z interval)
# =========================
mean_A <- 210
se_A <- 4
ci_levels_A <- data.frame(
CI = c("99%", "95%", "90%"),
t = c(2.576, 1.96, 1.645),
color = c("rgba(186,85,211,0.30)",
"rgba(100,149,237,0.35)",
"rgba(144,238,144,0.45)")
)
ci_levels_A$Lower <- mean_A - ci_levels_A$t * se_A
ci_levels_A$Upper <- mean_A + ci_levels_A$t * se_A
pA <- plot_CI(mean_A, se_A, ci_levels_A, "Team A (Z interval, σ known)", show_legend = TRUE)
# =========================
# Team B (t interval)
# =========================
mean_B <- 210
se_B <- 4
ci_levels_B <- data.frame(
CI = c("99%", "95%", "90%"),
t = c(2.727, 2.032, 1.691),
color = c("rgba(186,85,211,0.30)",
"rgba(100,149,237,0.35)",
"rgba(144,238,144,0.45)")
)
ci_levels_B$Lower <- mean_B - ci_levels_B$t * se_B
ci_levels_B$Upper <- mean_B + ci_levels_B$t * se_B
pB <- plot_CI(mean_B, se_B, ci_levels_B, "Team B (t interval, σ unknown)", show_legend = FALSE)
# =========================
# Stack the two plots vertically
# =========================
subplot(pA, pB, nrows = 2, shareX = TRUE, titleY = TRUE) %>%
layout(
title = list(
text = "<b>Confidence Interval Comparison: Team A vs Team B</b>",
x = 0.5,
y = 0.99, # lower the title slightly
xanchor = "center",
yanchor = "top"
)
)
| Confidence Level | Team A (Z interval) | Team B (t interval, df = 35) |
|---|---|---|
| 90% | (203.42, 216.58) | (203.24, 216.76) |
| 95% | (202.16, 217.84) | (201.87, 218.13) |
| 99% | (199.70, 220.30) | (199.09, 220.91) |
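The interval widths differ even though the data are identical because the methods differ: Team A uses the Z critical value with a known \(\sigma\), while Team B uses the larger t critical value (df = 35) to account for the extra uncertainty of estimating \(\sigma\) with \(s\).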
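The width comparison can be checked directly. This is a minimal sketch with illustrative variable names; it uses exact `qt()` quantiles for df = 35 rather than the interpolated table values, so Team B's widths may differ from the table in the second decimal place.

```r
# Same data for both teams: n = 36, standard deviation 24
se <- 24 / sqrt(36)                      # standard error, equal for both teams
conf_levels <- c(0.90, 0.95, 0.99)
upper_tail <- 1 - (1 - conf_levels) / 2  # cumulative probability for the critical value

width_z <- 2 * qnorm(upper_tail) * se         # Team A: sigma known
width_t <- 2 * qt(upper_tail, df = 35) * se   # Team B: sigma estimated by s

print(data.frame(level = conf_levels,
                 width_z = round(width_z, 2),
                 width_t = round(width_t, 2),
                 extra_width = round(width_t - width_z, 2)))
```

The t interval is wider at every level, and the gap grows with the confidence level because the t distribution has heavier tails than the normal.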
One-Sided Confidence Interval: A Software as a Service (SaaS) company wants to ensure that at least 70% of weekly active users utilize a premium feature.
From the experiment:
\[ \begin{eqnarray*} n &=& 250 \quad \text{(total users)} \\ x &=& 185 \quad \text{(active premium users)} \end{eqnarray*} \]
Management is only interested in the lower bound of the estimate.
Tasks:
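The appropriate method is the Z interval for a proportion: each user either uses the premium feature or does not (a binary outcome), the sample is large enough for the normal approximation (\(n\hat{p} = 185\) and \(n(1-\hat{p}) = 65\), both above 10), and management only needs a lower bound, so the interval is one-sided.
The data used are as follows.
Total sample size: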
\[n = 250\]
Number of active premium users:
\[x = 185\]
The sample proportion is:
\[\hat{p} = \frac{x}{n} = \frac{185}{250} = 0.74\]
Standard Error (SE)
The standard error of a proportion is:
\[SE = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\]
\[SE = \sqrt{\frac{0.74 \cdot 0.26}{250}}\]
\[SE = \sqrt{0.0007696} \approx 0.0277\]
Significance level (90% CI):
\[ \alpha = 0.10 \]
From the Z table (one-sided):
\[ Z_{1-\alpha} = Z_{0.90} \approx 1.28 \]
Margin of Error (ME):
\[ ME = Z_{1-\alpha} \cdot SE \]
\[ ME = 1.28 \cdot 0.0277 \approx 0.0355 \]
One-sided confidence interval (lower bound):
\[ CI_{\text{lower}} = 0.74 - 0.0355 \approx 0.705 \]
Significance level (95% CI):
\[ \alpha = 0.05 \]
From the Z table (one-sided):
\[ Z_{1-\alpha} = Z_{0.95} \approx 1.645 \]
Margin of Error (ME):
\[ ME = 1.645 \cdot 0.0277 \approx 0.0456 \]
One-sided confidence interval (lower bound):
\[ CI_{\text{lower}} = 0.74 - 0.0456 \approx 0.694 \]
Significance level (99% CI):
\[ \alpha = 0.01 \]
From the Z table (one-sided):
\[ Z_{1-\alpha} = Z_{0.99} \approx 2.326 \]
Margin of Error (ME):
\[ ME = 2.326 \cdot 0.0277 \approx 0.0644 \]
One-sided confidence interval (lower bound):
\[ CI_{\text{lower}} = 0.74 - 0.0644 \approx 0.675 \]
library(plotly)
# =========================
# Basic parameters
# =========================
p_hat <- 0.74
se <- 0.0277
# One-sided CI (lower bound)
ci_levels <- data.frame(
CI = c("99%", "95%", "90%"),
Z = c(2.326, 1.645, 1.28),
color = c("rgba(186,85,211,0.30)",
"rgba(100,149,237,0.35)",
"rgba(144,238,144,0.45)")
)
ci_levels$Lower <- p_hat - ci_levels$Z * se
ci_levels$Upper <- p_hat
# =========================
# Density curve
x <- seq(p_hat - 4*se, p_hat + 4*se, length.out = 1000)
y <- dnorm(x, mean = p_hat, sd = se)
# =========================
# Interactive plot
p <- plot_ly()
p <- p %>% add_lines(x = x, y = y, line = list(color="black"), name="Density",
hovertemplate = "Proportion: %{x}<br>Density: %{y}<extra></extra>")
for(i in 1:nrow(ci_levels)){
idx <- x >= ci_levels$Lower[i] & x <= ci_levels$Upper[i]
p <- p %>% add_polygons(
x = c(x[idx], rev(x[idx])),
y = c(y[idx], rep(0, sum(idx))),
fillcolor = ci_levels$color[i],
line = list(color = "rgba(0,0,0,0)"),
name = paste("CI", ci_levels$CI[i], "(lower)"),
hovertemplate = paste0(
"<b>CI ", ci_levels$CI[i], " Lower</b><br>",
"LCL = ", round(ci_levels$Lower[i],3), "<br>",
"UCL = ", round(ci_levels$Upper[i],3),
"<extra></extra>"
)
)
}
# Sample proportion line
p <- p %>% add_lines(
x = c(p_hat, p_hat), y = c(0, max(y)),
line = list(dash="dash", color="black"),
name = "Sample Proportion",
hovertemplate = paste0("<b>Sample Proportion</b><br>Value = ", p_hat, "<extra></extra>")
)
# Label for the sample proportion
p <- p %>% add_annotations(
x = p_hat,
y = max(y) * 1.05,
text = paste0("p̂ = ", p_hat),
showarrow = TRUE,
arrowhead = 2,
arrowsize = 1,
arrowcolor = "black",
font = list(color="black", size=12),
ax = 0,
ay = -30
)
# =========================
# Layout with a larger top margin
p <- p %>% layout(
title = "<b>One-Sided (Lower) Confidence Interval for a Proportion</b>",
margin = list(t = 100, b = 60), # t = 100 leaves room for the title
xaxis = list(title = "Sample Proportion"),
yaxis = list(title = "Density"),
hovermode = "x",
hoverlabel = list(
bgcolor = "white",
font = list(color="black"),
align = "left"
),
legend = list(
orientation = "h",
x = 0.5,
xanchor = "center",
y = -0.2
)
)
p
| Confidence Level | \(\alpha\) | \(Z_{1-\alpha}\) | Margin of Error (ME) | CI Lower Bound |
|---|---|---|---|---|
| 90% | 0.10 | 1.28 | 0.0355 | 0.705 |
| 95% | 0.05 | 1.645 | 0.0456 | 0.694 |
| 99% | 0.01 | 2.326 | 0.0644 | 0.675 |
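The one-sided lower bounds and the comparison with the 70% target can be reproduced in R. The sketch below assumes a helper named `lower_bound` (hypothetical) and uses unrounded intermediate values, so the third decimal place may differ slightly from the hand calculation.

```r
# One-sided lower confidence bound for a proportion (normal approximation).
# lower_bound is an illustrative helper, not a function from any package.
lower_bound <- function(x, n, level) {
  p_hat <- x / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
  p_hat - qnorm(level) * se        # one-sided: the full alpha sits in one tail
}

conf_levels <- c(0.90, 0.95, 0.99)
target <- 0.70                     # management's minimum acceptable proportion

bounds <- sapply(conf_levels, function(lv) lower_bound(185, 250, lv))
result <- data.frame(level = conf_levels,
                     lower = round(bounds, 4),
                     meets_target = bounds >= target)
print(result)
```

The `meets_target` column makes the decision rule explicit: the target is supported only at confidence levels where the lower bound stays at or above 0.70.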
90% CI lower bound = 0.705 (70.5%)
At the 90% confidence level, we are confident that the proportion of active premium users is at least 70.5%. Because 70.5% exceeds the 70% target, the company's target is met.
95% CI lower bound = 0.694 (69.4%)
At the 95% confidence level, the lower bound for the proportion of premium users is 69.4%. This is slightly below the 70% target, so at this higher confidence level we cannot be sure the target is reached.
99% CI lower bound = 0.675 (67.5%)
At the 99% confidence level, the lower bound is only 67.5%, well below the target. At this very high confidence level, we cannot be sure that at least 70% of users are active premium users.
Business Conclusion