Jackknife adalah metode resampling yang bekerja dengan cara membuang satu observasi ke-i secara bergantian (leave-one-out), lalu menghitung ulang statistik pada setiap subset.
Tujuan utama:
Formula:
set.seed(123)
x <- rnorm(100)
n <- length(x)
# --- Statistik asli ---
theta.hat <- mean(x)
# --- Leave-one-out estimates ---
theta.jack <- numeric(n)
for (i in 1:n) {
theta.jack[i] <- mean(x[-i]) # hitung mean tanpa obs ke-i
}
# --- Bias ---
bias <- (n - 1) * (mean(theta.jack) - theta.hat)
# --- Standard Error ---
jack.se <- sqrt((n - 1) * mean((theta.jack - mean(theta.jack))^2))
# --- Confidence Interval 95% ---
t.val <- qt(0.025, df = n - 1, lower.tail = FALSE)
lower <- theta.hat - t.val * jack.se
upper <- theta.hat + t.val * jack.se
cat("Estimasi :", round(theta.hat, 6), "\n")## Estimasi : 0.090406
## Bias Jackknife : 0
## Standard Error : 0.091282
## CI 95% : [ -0.090717 , 0.271528 ]
bootstrap# install.packages("bootstrap")
library(bootstrap)
# Definisikan fungsi statistik (harus menerima indeks)
theta_fn <- function(idx, data) mean(data[idx])
hasil_jack <- jackknife(1:n, theta_fn, x)
cat("Bias (package) :", round(hasil_jack$jack.bias, 6), "\n")## Bias (package) : 0
## SE (package) : 0.091282
| Komponen | Scratch (manual loop) | Package bootstrap |
|---|---|---|
| Cara kerja | Loop for, buang obs ke-i satu per satu |
jackknife(indices, fn, data) |
| Fleksibilitas | Bisa untuk statistik apa pun (custom formula) | Perlu wrapper fungsi berindeks |
| Bias formula | (n-1) * (mean(jack) - theta.hat) |
Dihitung otomatis |
| SE formula | sqrt((n-1)/n * sum(...)) |
Dihitung otomatis |
Kesimpulan: Scratch lebih transparan dan cocok untuk ujian. Package lebih praktis di dunia nyata.
# Fungsi jackknife yang bisa menerima statistik apa pun
jknife <- function(x, fxn, ci = 0.95) {
theta.hat <- fxn(x)
n <- length(x)
theta.jack <- numeric(n)
for (i in 1:n) theta.jack[i] <- fxn(x[-i])
bias <- (n - 1) * (mean(theta.jack) - theta.hat)
se <- sqrt((n - 1) * mean((theta.jack - mean(theta.jack))^2))
alpha <- 1 - ci
t.val <- qt(alpha / 2, df = n - 1, lower.tail = FALSE)
list(
est = theta.hat,
bias = bias,
se = se,
ci = c(theta.hat - t.val * se, theta.hat + t.val * se),
replicates = theta.jack
)
}
# Contoh: estimasi mean
hasil <- jknife(x, mean)
cat("Est:", round(hasil$est, 6), "\n")## Est: 0.090406
## Bias: 0
## SE: 0.091282
## CI: [ -0.090717 , 0.271528 ]
##
## Jackknife Median:
cat("Est:", round(hasil_med$est, 6), "| Bias:", round(hasil_med$bias, 6),
"| SE:", round(hasil_med$se, 6), "\n")## Est: 0.061756 | Bias: 0 | SE: 0.087082
Bootstrap adalah metode resampling yang bekerja dengan cara mengambil sampel dengan pengembalian (sampling with replacement) sebanyak \(B\) kali dari data asli.
Tujuan utama:
Formula:
set.seed(42)
x <- c(25, 10, 3, 2, 7, 20, 30, 9, 15)
n <- length(x)
B <- 1000 # jumlah iterasi bootstrap
theta.hat <- mean(x)
theta.boot <- numeric(B)
for (b in 1:B) {
x.star <- sample(x, size = n, replace = TRUE) # resample dengan pengembalian
theta.boot[b] <- mean(x.star)
}
# Standard Error Bootstrap
boot.se <- sd(theta.boot)
# Bias Bootstrap
boot.bias <- mean(theta.boot) - theta.hat
# Confidence Interval — Percentile Method
ci.lower <- quantile(theta.boot, 0.025)
ci.upper <- quantile(theta.boot, 0.975)
cat("Estimasi Asli :", round(theta.hat, 4), "\n")## Estimasi Asli : 13.4444
## Bias Bootstrap : 0.0214
## SE Bootstrap : 3.1541
## CI 95% (percentile): [ 7.8889 , 19.8917 ]
# Visualisasi distribusi bootstrap
hist(theta.boot, breaks = 30, col = "steelblue", border = "white",
main = "Distribusi Bootstrap (Mean)", xlab = expression(hat(theta)^"*"))
abline(v = theta.hat, col = "red", lwd = 2, lty = 2)
abline(v = ci.lower, col = "orange", lwd = 2, lty = 3)
abline(v = ci.upper, col = "orange", lwd = 2, lty = 3)
legend("topright",
legend = c("Estimasi Asli", "CI 95%"),
col = c("red", "orange"), lty = c(2, 3), lwd = 2)bootlibrary(boot)
# Fungsi statistik HARUS menerima (data, indices)
mean_fn <- function(data, indices) mean(data[indices])
hasil_boot <- boot(data = x, statistic = mean_fn, R = 1000)
print(hasil_boot)##
## ORDINARY NONPARAMETRIC BOOTSTRAP
##
##
## Call:
## boot(data = x, statistic = mean_fn, R = 1000)
##
##
## Bootstrap Statistics :
## original bias std. error
## t1* 13.44444 0.019 2.958989
# Confidence Interval (4 metode sekaligus)
boot.ci(hasil_boot, type = c("norm", "basic", "perc", "bca"))## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 1000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = hasil_boot, type = c("norm", "basic", "perc",
## "bca"))
##
## Intervals :
## Level Normal Basic
## 95% ( 7.63, 19.22 ) ( 7.67, 19.00 )
##
## Level Percentile BCa
## 95% ( 7.89, 19.22 ) ( 8.11, 19.44 )
## Calculations and Intervals on Original Scale
| Komponen | Scratch (manual loop) | Package boot |
|---|---|---|
| Cara kerja | sample(..., replace=TRUE) dalam loop
for |
boot(data, statistic, R) |
| Fungsi stat | Tulis bebas di dalam loop | Harus wrapper function(data, indices) |
| CI | Manual quantile() (percentile) |
boot.ci() — 4 metode sekaligus |
| Output | Vektor replikasi manual | Objek boot dengan plot bawaan |
Perbedaan kunci Jackknife vs Bootstrap: - Jackknife: deterministik, leave-one-out, cocok untuk bias & SE - Bootstrap: stokastik (random), sampling ulang \(B\) kali, lebih fleksibel untuk CI
OLS adalah metode estimasi parameter regresi linear yang meminimalkan jumlah kuadrat residual antara nilai aktual \(y_i\) dan nilai prediksi \(\hat{y}_i\).
Model:
\[y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)\]
Estimator OLS (solusi closed-form):
\[\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}\]
Standard Error koefisien:
\[\widehat{Var}(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2 (\mathbf{X}^\top \mathbf{X})^{-1}, \quad \hat{\sigma}^2 = \frac{RSS}{n - p}\]
dengan \(p\) = jumlah parameter (termasuk intercept), \(RSS = \sum(y_i - \hat{y}_i)^2\).
# --- Data contoh ---
set.seed(123)
n <- 50
x1 <- rnorm(n, mean = 5, sd = 2)
x2 <- rnorm(n, mean = 3, sd = 1)
y <- 2 + 3*x1 - 1.5*x2 + rnorm(n, sd = 2) # true: b0=2, b1=3, b2=-1.5
# --- Matriks desain X (dengan kolom 1 untuk intercept) ---
X <- cbind(1, x1, x2) # dimensi: n x 3
p <- ncol(X) # jumlah parameter
# --- Estimasi OLS: beta_hat = (X'X)^{-1} X'y ---
XtX <- t(X) %*% X
Xty <- t(X) %*% y
beta_hat <- solve(XtX) %*% Xty
cat("Koefisien OLS (scratch):\n")## Koefisien OLS (scratch):
## Intercept (b0): 2.432
## b1 : 3.0252
## b2 : -1.8392
# --- Residual dan RSS ---
y_hat <- X %*% beta_hat
resid <- y - y_hat
RSS <- sum(resid^2)
# --- Estimasi varians galat ---
sigma2 <- RSS / (n - p)
# --- Varians-kovarians koefisien ---
Var_beta <- sigma2 * solve(XtX)
SE_beta <- sqrt(diag(Var_beta))
# --- t-statistik dan p-value ---
t_stat <- beta_hat / SE_beta
p_val <- 2 * pt(abs(t_stat), df = n - p, lower.tail = FALSE)
# --- R-squared ---
SST <- sum((y - mean(y))^2)
R2 <- 1 - RSS / SST
R2_adj <- 1 - (1 - R2) * (n - 1) / (n - p)
# --- Tabel ringkasan ---
tabel_ols <- data.frame(
Koefisien = c("Intercept", "x1", "x2"),
Estimate = round(beta_hat, 4),
Std_Error = round(SE_beta, 4),
t_value = round(t_stat, 4),
p_value = round(p_val, 6)
)
print(tabel_ols)## Koefisien Estimate Std_Error t_value p_value
## Intercept 2.4320 1.314 1.8509 0.070481
## x1 x1 3.0252 0.154 19.6427 0.000000
## x2 x2 -1.8392 0.315 -5.8394 0.000000
##
## R-squared : 0.9012
## Adj R-squared: 0.897
## Sigma (galat): 1.995
lm()##
## Call:
## lm(formula = y ~ x1 + x2, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.7627 -1.4811 -0.1275 1.0503 4.5409
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.432 1.314 1.851 0.0705 .
## x1 3.025 0.154 19.643 < 2e-16 ***
## x2 -1.839 0.315 -5.839 4.71e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.995 on 47 degrees of freedom
## Multiple R-squared: 0.9012, Adjusted R-squared: 0.897
## F-statistic: 214.4 on 2 and 47 DF, p-value: < 2.2e-16
## (Intercept) x1 x2
## 2.432023 3.025152 -1.839216
## 2.5 % 97.5 %
## (Intercept) -0.2114061 5.075452
## x1 2.7153258 3.334977
## x2 -2.4728442 -1.205588
lm()| Komponen | Scratch (matrix algebra) | lm() |
|---|---|---|
| Estimasi \(\hat\beta\) | solve(t(X) %*% X) %*% t(X) %*% y |
lm(y ~ x1 + x2) |
| SE koefisien | sqrt(diag(sigma2 * solve(XtX))) |
Otomatis di summary() |
| R² | 1 - RSS/SST |
summary()$r.squared |
| CI | Manual dari qt() × SE |
confint(model) |
| Kemampuan | Transparan, cocok untuk ujian | Ringkas, lengkap, siap pakai |
##
## Shapiro-Wilk normality test
##
## data: residuals(model_lm)
## W = 0.97735, p-value = 0.4463
4 plot diagnostik: 1. Residuals vs Fitted — cek linearitas & homoskedastisitas 2. Q-Q Plot — cek normalitas residual 3. Scale-Location — cek homoskedastisitas 4. Residuals vs Leverage — cek outlier/influential points
MLE mencari nilai parameter \(\theta\) yang memaksimalkan fungsi likelihood \(L(\theta)\), yaitu probabilitas mengamati data yang ada jika parameter adalah \(\theta\).
Ide inti:
\[\hat{\theta}_{MLE} = \arg\max_\theta \, L(\theta \mid \mathbf{x}) = \arg\max_\theta \prod_{i=1}^n f(x_i \mid \theta)\]
Dalam praktik, kita maksimalkan log-likelihood (lebih mudah secara komputasi):
\[\ell(\theta) = \log L(\theta) = \sum_{i=1}^n \log f(x_i \mid \theta)\]
MLE untuk distribusi Normal \(X \sim N(\mu, \sigma^2)\):
\[\hat{\mu}_{MLE} = \bar{x}, \qquad \hat{\sigma}^2_{MLE} = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2\]
(Catatan: \(\hat{\sigma}^2_{MLE}\) adalah estimator biased — pembaginya \(n\), bukan \(n-1\))
set.seed(42)
x <- rnorm(100, mean = 5, sd = 2)
# --- Log-likelihood untuk distribusi Normal ---
loglik_normal <- function(params, x) {
mu <- params[1]
sigma <- params[2]
if (sigma <= 0) return(-Inf) # sigma harus positif
sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
}
# --- Optim: MAKSIMALKAN log-likelihood (minimasi negatif-nya) ---
neg_loglik <- function(params) -loglik_normal(params, x)
hasil_optim <- optim(
par = c(mu = 0, sigma = 1), # nilai awal
fn = neg_loglik,
method = "L-BFGS-B",
lower = c(-Inf, 1e-6), # sigma > 0
hessian = TRUE
)
mu_mle <- hasil_optim$par[1]
sigma_mle <- hasil_optim$par[2]
cat("MLE mu :", round(mu_mle, 4), "| True: 5\n")## MLE mu : 5.065 | True: 5
## MLE sigma : 2.0723 | True: 2
## Rumus cepat mu_mle = 5.065
## Rumus cepat sigma_mle = 2.0723
# --- Standard Error dari Hessian ---
Fisher_info <- hasil_optim$hessian # Hessian dari -loglik ≈ Fisher info
SE_mle <- sqrt(diag(solve(Fisher_info)))
cat("\nSE mu :", round(SE_mle[1], 4), "\n")##
## SE mu : 0.2072
## SE sigma : 0.1465
set.seed(1)
x_pois <- rpois(80, lambda = 3.5)
# Log-likelihood Poisson: sum(x*log(lambda) - lambda - log(x!))
loglik_pois <- function(lambda, x) {
if (lambda <= 0) return(-Inf)
sum(dpois(x, lambda = lambda, log = TRUE))
}
# Grid search sederhana (alternatif optim)
lambdas <- seq(0.1, 10, by = 0.01)
ll_values <- sapply(lambdas, loglik_pois, x = x_pois)
lambda_mle_grid <- lambdas[which.max(ll_values)]
# Solusi analitik: lambda_MLE = mean(x)
lambda_mle_analitik <- mean(x_pois)
cat("MLE lambda (grid search) :", lambda_mle_grid, "\n")## MLE lambda (grid search) : 3.6
## MLE lambda (rumus analitik): 3.6
## True lambda : 3.5
# Plot log-likelihood
plot(lambdas, ll_values, type = "l", col = "steelblue", lwd = 2,
main = "Log-Likelihood Poisson", xlab = expression(lambda), ylab = "log L")
abline(v = lambda_mle_analitik, col = "red", lwd = 2, lty = 2)
legend("topright", legend = "MLE", col = "red", lty = 2, lwd = 2)MASS (fitdistr)## mean sd
## 5.0650296 2.0722742
## (0.2072274) (0.1465319)
## lambda
## 3.600000
## (0.212132)
# Pada regresi linear dengan asumsi normalitas,
# MLE menghasilkan estimator beta yang SAMA dengan OLS
loglik_regresi <- function(params, y, X) {
k <- ncol(X)
beta <- params[1:k]
sigma <- params[k + 1]
if (sigma <= 0) return(-Inf)
mu_hat <- X %*% beta
sum(dnorm(y, mean = mu_hat, sd = sigma, log = TRUE))
}
X_mat <- cbind(1, x1, x2)
inits <- c(rep(0, ncol(X_mat)), 1)
res <- optim(
par = inits,
fn = function(p) -loglik_regresi(p, y, X_mat),
method = "L-BFGS-B",
lower = c(rep(-Inf, ncol(X_mat)), 1e-6)
)
cat("MLE beta (regresi):\n")## MLE beta (regresi):
## Intercept: 2.4321
## b1 : 3.0251
## b2 : -1.8392
## sigma : 1.9342
## Bandingkan dengan OLS (lm()):
## (Intercept) x1 x2
## 2.432023 3.025152 -1.839216
| Komponen | Scratch (optim) |
Package MASS::fitdistr |
|---|---|---|
| Definisi fungsi | Tulis sendiri loglik |
Tinggal sebut nama distribusi |
| Optimisasi | optim() dengan negatif log-lik |
Otomatis |
| SE | sqrt(diag(solve(hessian))) |
Langsung tersedia di output |
| Fleksibilitas | Bisa untuk distribusi/model apa pun | Terbatas pada distribusi standar |
Hubungan OLS & MLE: Ketika \(\varepsilon \sim N(0, \sigma^2)\), estimator MLE untuk \(\beta\) identik dengan estimator OLS. Perbedaannya: MLE menggunakan \(n\) sebagai pembagi \(\hat\sigma^2\), sementara OLS menggunakan \(n-p\) (unbiased).
ANOVA (Analysis of Variance) menguji apakah rata-rata beberapa kelompok berbeda secara signifikan dengan mempartisi total keragaman data menjadi komponen-komponen.
Prinsip umum:
\[F = \frac{\text{Kuadrat Tengah Perlakuan (KTP)}}{\text{Kuadrat Tengah Galat (KTG)}}\]
Tolak \(H_0\) jika \(F_{hitung} > F_{tabel}\) atau \(p\)-value \(< \alpha\).
Kapan digunakan: Satu faktor perlakuan, unit eksperimen homogen.
Model: \(Y_{ij} = \mu + \tau_i + \varepsilon_{ij}\)
Hipotesis: \(H_0: \mu_1 = \mu_2 = \cdots = \mu_t\) vs \(H_1:\) minimal satu berbeda
data_ral <- data.frame(
Pakan = rep(c("A","B","C","D"), each = 5),
Berat = c(42,45,43,44,46, 50,52,49,51,53, 47,48,46,49,47, 58,60,57,59,61)
)
y <- data_ral$Berat
grp <- data_ral$Pakan
N <- length(y)
k <- length(unique(grp))
grand_mean <- mean(y)
mean_grp <- tapply(y, grp, mean)
n_grp <- tapply(y, grp, length)
# Jumlah Kuadrat
JKT <- sum((y - grand_mean)^2)
JKP <- sum(n_grp * (mean_grp - grand_mean)^2)
JKG <- JKT - JKP
# Derajat Bebas
db_P <- k - 1
db_G <- N - k
db_T <- N - 1
# Kuadrat Tengah & F
KTP <- JKP / db_P
KTG <- JKG / db_G
F_hit <- KTP / KTG
p_value <- pf(F_hit, db_P, db_G, lower.tail = FALSE)
tabel_ral <- data.frame(
Sumber = c("Perlakuan","Galat","Total"),
db = c(db_P, db_G, db_T),
JK = round(c(JKP, JKG, JKT), 3),
KT = round(c(KTP, KTG, NA), 3),
F_hit = round(c(F_hit, NA, NA), 3),
p_val = round(c(p_value, NA, NA), 6)
)
print(tabel_ral)## Sumber db JK KT F_hit p_val
## 1 Perlakuan 3 621.35 207.117 94.144 0
## 2 Galat 16 35.20 2.200 NA NA
## 3 Total 19 656.55 NA NA NA
aov()## Df Sum Sq Mean Sq F value Pr(>F)
## Pakan 3 621.4 207.1 94.14 2.22e-10 ***
## Residuals 16 35.2 2.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Uji lanjut (Post Hoc)
library(agricolae)
lsd_ral <- LSD.test(model_ral, "Pakan", alpha = 0.05, console = FALSE)
lsd_ral$groups## Berat groups
## D 59.0 a
## B 51.0 b
## C 47.4 c
## A 44.0 d
## Berat groups
## D 59.0 a
## B 51.0 b
## C 47.4 c
## A 44.0 d
Kapan digunakan: Satu faktor perlakuan + satu sumber keragaman yang dikendalikan (blok). Unit eksperimen tidak homogen.
Model: \(Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}\)
data_rak <- data.frame(
Blok = factor(rep(paste0("K", 1:5), each = 4)),
Metode = factor(rep(paste0("M", 1:4), times = 5)),
Tinggi = c(31,35,33,39, 29,34,32,37, 33,38,35,42, 30,36,34,40, 32,37,34,41)
)
y <- data_rak$Tinggi
grp <- data_rak$Metode
blk <- data_rak$Blok
N <- length(y)
t <- nlevels(grp)
b <- nlevels(blk)
grand_mean <- mean(y)
mean_perlk <- tapply(y, grp, mean)
mean_blok <- tapply(y, blk, mean)
JKT <- sum((y - grand_mean)^2)
JKP <- b * sum((mean_perlk - grand_mean)^2)
JKB <- t * sum((mean_blok - grand_mean)^2)
JKG <- JKT - JKP - JKB
db_P <- t - 1
db_B <- b - 1
db_G <- (t - 1) * (b - 1)
db_T <- N - 1
KTP <- JKP / db_P
KTB <- JKB / db_B
KTG <- JKG / db_G
F_P <- KTP / KTG
F_B <- KTB / KTG
pP <- pf(F_P, db_P, db_G, lower.tail = FALSE)
pB <- pf(F_B, db_B, db_G, lower.tail = FALSE)
tabel_rak <- data.frame(
Sumber = c("Perlakuan","Blok","Galat","Total"),
db = c(db_P, db_B, db_G, db_T),
JK = round(c(JKP, JKB, JKG, JKT), 3),
KT = round(c(KTP, KTB, KTG, NA), 3),
F_hit = round(c(F_P, F_B, NA, NA), 3),
p_val = round(c(pP, pB, NA, NA), 6)
)
print(tabel_rak)## Sumber db JK KT F_hit p_val
## 1 Perlakuan 3 209.8 69.933 262.25 0e+00
## 2 Blok 4 36.8 9.200 34.50 2e-06
## 3 Galat 12 3.2 0.267 NA NA
## 4 Total 19 249.8 NA NA NA
aov()## Df Sum Sq Mean Sq F value Pr(>F)
## Metode 3 209.8 69.93 262.2 3.35e-11 ***
## Blok 4 36.8 9.20 34.5 1.71e-06 ***
## Residuals 12 3.2 0.27
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Tinggi groups
## M4 39.8 a
## M2 36.0 b
## M3 33.6 c
## M1 31.0 d
## Tinggi groups
## M4 39.8 a
## M2 36.0 b
## M3 33.6 c
## M1 31.0 d
Kapan digunakan: Satu faktor perlakuan, dua sumber keragaman yang dikendalikan (baris & kolom).
Model: \(Y_{ijk} = \mu + \alpha_i + \beta_j + \tau_k + \varepsilon_{ijk}\), dengan \(p \times p\) layout.
rbsl <- data.frame(
baris = factor(rep(1:4, each = 4)),
kolom = factor(rep(1:4, times = 4)),
perlakuan = factor(c("A","B","C","D", "B","C","D","A",
"C","D","A","B", "D","A","B","C")),
Y = c(24,22,20,28, 21,23,26,31, 18,30,29,27, 24,31,26,23)
)
p <- nlevels(rbsl$perlakuan)
N <- p^2
G <- sum(rbsl$Y)
sum_baris <- tapply(rbsl$Y, rbsl$baris, sum)
sum_kolom <- tapply(rbsl$Y, rbsl$kolom, sum)
sum_perl <- tapply(rbsl$Y, rbsl$perlakuan, sum)
FK <- G^2 / N
JKT <- sum(rbsl$Y^2) - FK
JKB <- sum(sum_baris^2) / p - FK
JKK <- sum(sum_kolom^2) / p - FK
JKP <- sum(sum_perl^2) / p - FK
JKG <- JKT - JKB - JKK - JKP
db_B <- p - 1; db_K <- p - 1; db_P <- p - 1
db_G <- (p-1)*(p-2); db_T <- N - 1
KTB <- JKB/db_B; KTK <- JKK/db_K; KTP <- JKP/db_P; KTG <- JKG/db_G
F_B <- KTB/KTG; F_K <- KTK/KTG; F_P <- KTP/KTG
tabel_rbsl <- data.frame(
Sumber = c("Baris","Kolom","Perlakuan","Galat","Total"),
db = c(db_B, db_K, db_P, db_G, db_T),
JK = round(c(JKB, JKK, JKP, JKG, JKT), 3),
KT = round(c(KTB, KTK, KTP, KTG, NA), 3),
F_hit = round(c(F_B, F_K, F_P, NA, NA), 3),
p_val = round(c(pf(F_B,db_B,db_G,lower.tail=F),
pf(F_K,db_K,db_G,lower.tail=F),
pf(F_P,db_P,db_G,lower.tail=F), NA, NA), 5)
)
print(tabel_rbsl)## Sumber db JK KT F_hit p_val
## 1 Baris 3 16.688 5.562 3.761 0.07865
## 2 Kolom 3 71.188 23.729 16.042 0.00285
## 3 Perlakuan 3 139.688 46.562 31.479 0.00046
## 4 Galat 6 8.875 1.479 NA NA
## 5 Total 15 236.438 NA NA NA
aov()## Df Sum Sq Mean Sq F value Pr(>F)
## baris 3 16.69 5.56 3.761 0.078652 .
## kolom 3 71.19 23.73 16.042 0.002853 **
## perlakuan 3 139.69 46.56 31.479 0.000456 ***
## Residuals 6 8.87 1.48
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Y groups
## A 28.75 a
## D 27.00 a
## B 24.00 b
## C 21.00 c
## Y groups
## A 28.75 a
## D 27.00 a
## B 24.00 b
## C 21.00 c
Kapan digunakan: Dua atau lebih faktor perlakuan, unit eksperimen homogen.
Model 2 faktor: \(Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}\)
Kunci: Periksa interaksi terlebih dahulu!
fakt <- data.frame(
bahan = factor(rep(1:3, each = 12)),
suhu = factor(rep(rep(c(15,70,125), each = 4), times = 3), levels = c(15,70,125)),
Y = c(130,155,74,180, 34,40,80,75, 20,70,82,58,
150,188,159,126, 136,122,106,115, 25,70,58,45,
138,110,168,160, 174,120,150,139, 96,104,82,60)
)
a <- nlevels(fakt$bahan); b <- nlevels(fakt$suhu); n <- 4
G <- sum(fakt$Y); FK <- G^2 / (a*b*n)
sum_A <- tapply(fakt$Y, fakt$bahan, sum)
sum_B <- tapply(fakt$Y, fakt$suhu, sum)
sum_sel <- tapply(fakt$Y, list(fakt$bahan, fakt$suhu), sum)
JKT <- sum(fakt$Y^2) - FK
JK_A <- sum(sum_A^2)/(b*n) - FK
JK_B <- sum(sum_B^2)/(a*n) - FK
JK_sel <- sum(sum_sel^2)/n - FK
JK_AB <- JK_sel - JK_A - JK_B
JK_G <- JKT - JK_sel
db_A <- a-1; db_B <- b-1; db_AB <- (a-1)*(b-1)
db_G <- a*b*(n-1); db_T <- a*b*n - 1
KTG <- JK_G / db_G
tabel_fakt <- data.frame(
Sumber = c("Bahan (A)","Suhu (B)","Interaksi AB","Galat","Total"),
db = c(db_A, db_B, db_AB, db_G, db_T),
JK = round(c(JK_A, JK_B, JK_AB, JK_G, JKT), 3),
KT = round(c(JK_A/db_A, JK_B/db_B, JK_AB/db_AB, KTG, NA), 3),
F_hit = round(c((JK_A/db_A)/KTG, (JK_B/db_B)/KTG,
(JK_AB/db_AB)/KTG, NA, NA), 3),
p_val = round(c(pf((JK_A/db_A)/KTG, db_A, db_G, lower.tail=F),
pf((JK_B/db_B)/KTG, db_B, db_G, lower.tail=F),
pf((JK_AB/db_AB)/KTG, db_AB, db_G, lower.tail=F), NA, NA), 5)
)
print(tabel_fakt)## Sumber db JK KT F_hit p_val
## 1 Bahan (A) 2 10683.722 5341.861 7.911 0.00198
## 2 Suhu (B) 2 39118.722 19559.361 28.968 0.00000
## 3 Interaksi AB 4 9613.778 2403.444 3.560 0.01861
## 4 Galat 27 18230.750 675.213 NA NA
## 5 Total 35 77646.972 NA NA NA
aov()model_fakt <- aov(Y ~ bahan * suhu, data = fakt) # bahan*suhu = bahan + suhu + bahan:suhu
summary(model_fakt)## Df Sum Sq Mean Sq F value Pr(>F)
## bahan 2 10684 5342 7.911 0.00198 **
## suhu 2 39119 19559 28.968 1.91e-07 ***
## bahan:suhu 4 9614 2403 3.560 0.01861 *
## Residuals 27 18231 675
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# --- Uji lanjut: interaksi SIGNIFIKAN → kombinasi perlakuan ---
lsd_komb <- LSD.test(model_fakt, c("bahan","suhu"), console = FALSE)
lsd_komb$groups## Y groups
## 2:15 155.75 a
## 3:70 145.75 a
## 3:15 144.00 a
## 1:15 134.75 a
## 2:70 119.75 ab
## 3:125 85.50 bc
## 1:125 57.50 c
## 1:70 57.25 c
## 2:125 49.50 c
## Y groups
## 2:15 155.75 a
## 3:70 145.75 ab
## 3:15 144.00 ab
## 1:15 134.75 ab
## 2:70 119.75 ab
## 3:125 85.50 bc
## 1:125 57.50 c
## 1:70 57.25 c
## 2:125 49.50 c
# --- Uji lanjut: Simple Effect (B dalam setiap level A) ---
hasil_simple <- list()
for (level_suhu in levels(fakt$suhu)) {
sub <- subset(fakt, suhu == level_suhu)
mod <- aov(Y ~ bahan, data = sub)
hasil_simple[[level_suhu]] <- LSD.test(mod, "bahan", console = FALSE)$groups
}
print(hasil_simple)## $`15`
## Y groups
## 2 155.75 a
## 3 144.00 a
## 1 134.75 a
##
## $`70`
## Y groups
## 3 145.75 a
## 2 119.75 a
## 1 57.25 b
##
## $`125`
## Y groups
## 3 85.5 a
## 1 57.5 ab
## 2 49.5 b
Kapan digunakan: Dua atau lebih faktor + blok (unit eksperimen tidak homogen).
Model: \(Y_{ijkl} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \rho_k + \varepsilon_{ijkl}\)
rakl <- expand.grid(
filter = factor(c("T1","T2")),
operator = factor(c("O1","O2","O3","O4")),
lokasi = factor(c("Rendah","Sedang","Tinggi"), levels = c("Rendah","Sedang","Tinggi"))
)
rakl$Y <- c(90,86, 96,84, 100,92, 92,81,
102,87, 106,90, 105,97, 96,80,
114,93, 112,91, 108,95, 98,83)
# Blok (operator) sebagai term aditif
model_rakl <- aov(Y ~ operator + lokasi * filter, data = rakl)
summary(model_rakl)## Df Sum Sq Mean Sq F value Pr(>F)
## operator 3 402.2 134.1 12.089 0.000277 ***
## lokasi 2 335.6 167.8 15.132 0.000253 ***
## filter 1 1066.7 1066.7 96.192 6.45e-08 ***
## lokasi:filter 2 77.1 38.5 3.476 0.057507 .
## Residuals 15 166.3 11.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Interaksi TIDAK signifikan → uji lanjut pada main effect
lsd_lok <- LSD.test(model_rakl, "lokasi", console = FALSE)
lsd_lok$groups## Y groups
## Tinggi 99.250 a
## Sedang 95.375 b
## Rendah 90.125 c
## Y groups
## T1 101.5833 a
## T2 88.2500 b
## Y groups
## Tinggi 99.250 a
## Sedang 95.375 a
## Rendah 90.125 b
| Rancangan | Faktor Perlakuan | Sumber Keragaman Dikontrol | Interaksi |
|---|---|---|---|
| RAL | 1 | Tidak ada (unit homogen) | - |
| RAKL | 1 | 1 arah (blok) | - |
| RBSL | 1 | 2 arah (baris & kolom) | - |
| Faktorial RAL | 2+ | Tidak ada (unit homogen) | Ada |
| Faktorial RAKL | 2+ | 1 arah (blok) | Ada |
Setelah ANOVA:
├── RAL / RAKL / RBSL (satu faktor)
│ └── Perlakuan signifikan? → LSD atau Tukey HSD langsung
│
└── Faktorial (dua faktor)
├── Interaksi SIGNIFIKAN?
│ ├── Tujuan: cari perlakuan terbaik → uji lanjut KOMBINASI perlakuan
│ └── Tujuan: jelaskan pola → uji SIMPLE EFFECT
└── Interaksi TIDAK signifikan?
└── Uji lanjut pada MAIN EFFECT yang signifikan
| Metode | Tujuan | Scratch (inti) | Package |
|---|---|---|---|
| Jackknife | Bias & SE suatu statistik | Loop LOO, hitung bias & SE manual | bootstrap::jackknife() |
| Bootstrap | Distribusi sampling, SE, CI | sample(..., replace=TRUE) dalam loop |
boot::boot() |
| OLS | Estimasi koefisien regresi | solve(t(X)%*%X) %*% t(X)%*%y |
lm() |
| MLE | Estimasi parameter distribusi | optim() minimasi -loglik |
MASS::fitdistr() |
| ANOVA | Bandingkan rata-rata ≥2 kelompok | Hitung JK, KT, F manual | aov() + agricolae |