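Support Vector Regression (SVR) is a regression method built on the Support Vector Machine (SVM) framework. Rather than penalizing every residual, SVR ignores errors smaller than a tolerance \(\varepsilon\), the half-width of the so-called epsilon-tube, and penalizes only the points that fall outside it. The goal is a prediction function \(f(x) = w \cdot x + b\) that is as flat as possible while deviating from each target \(y_i\) by at most \(\varepsilon\). The slack variables \(\xi_i\) and \(\xi_i^*\) measure how far a point lies above or below the tube, and \(C\) sets the cost of those violations. The primal problem of SVR is: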
\[ \min_{w,b,\xi,\xi^*} \ \frac{1}{2} \| w \|^2 + C \sum_{i=1}^n (\xi_i + \xi_i^*) \]
subject to the constraints, for \(i = 1, \dots, n\):
\[ \begin{aligned} y_i - (w \cdot x_i + b) &\leq \varepsilon + \xi_i \\ (w \cdot x_i + b) - y_i &\leq \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* &\geq 0 \end{aligned} \]
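At the optimum, each slack variable equals the amount by which a residual exceeds the tube, so the constrained problem is equivalent to minimizing \(\frac{1}{2}\|w\|^2 + C \sum_{i=1}^n \max(0, |y_i - f(x_i)| - \varepsilon)\). A minimal base-R sketch of this epsilon-insensitive loss is given here. The function name `eps_insensitive_loss` and the toy numbers are illustrative assumptions, not part of e1071.

```r
# Epsilon-insensitive loss: residuals inside the tube cost nothing,
# residuals outside it cost their distance to the tube boundary.
# (Hypothetical helper for illustration; not an e1071 function.)
eps_insensitive_loss <- function(actual, predicted, epsilon) {
  pmax(0, abs(actual - predicted) - epsilon)
}

y     <- c(10.0, 12.0, 15.0, 20.0)   # toy targets (assumed values)
y_hat <- c(10.3, 13.5, 14.9, 17.0)   # toy predictions (assumed values)

loss <- eps_insensitive_loss(y, y_hat, epsilon = 0.5)
loss        # per-observation slack
sum(loss)   # the sum that C multiplies in the objective
```

Only the second and fourth observations fall outside the tube, so only they contribute to the penalty term.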
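SVR comes in two main variants, distinguished by the kernel:
- Linear SVR: uses a linear kernel and suits relationships that are approximately linear.
- Non-linear SVR: uses a non-linear kernel, such as the radial basis function (RBF), to capture non-linear relationships.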
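The two variants differ only in the kernel that measures similarity between observations. As a rough sketch, the linear kernel is the dot product \(k(x, z) = x \cdot z\) and the RBF kernel is \(k(x, z) = \exp(-\gamma \|x - z\|^2)\), which is the form e1071 documents for its `radial` option. The helper names and sample vectors here are assumptions for illustration.

```r
# Hypothetical helpers showing what each kernel computes for one pair of points.
linear_kernel <- function(x, z) sum(x * z)
rbf_kernel    <- function(x, z, gamma) exp(-gamma * sum((x - z)^2))

x <- c(1, 2)   # assumed sample points
z <- c(2, 4)

k_lin <- linear_kernel(x, z)            # 1*2 + 2*4
k_rbf <- rbf_kernel(x, z, gamma = 0.1)  # exp(-0.1 * squared distance)
k_lin
k_rbf
```

A larger `gamma` makes the RBF similarity decay faster with distance, which is why the tuning grid later in the analysis varies it.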
For comparison, Ordinary Least Squares (OLS) minimizes the sum of squared residuals, with no tolerance band and no penalty on \(\|w\|\):
\[ \min_{w,b} \sum_{i=1}^n (y_i - (w \cdot x_i + b))^2 \]
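The models are evaluated with the following metrics, where \(\hat{y}_i\) is the prediction for observation \(i\) and \(\bar{y}\) is the mean of the targets:
- RMSE: \(\sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2}\)
- MAE: \(\frac{1}{n} \sum_{i=1}^n |y_i - \hat{y}_i|\)
- R²: \(1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}\)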
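To make the three formulas concrete, the sketch here computes them in base R on a three-point toy example. The function names `rmse_manual`, `mae_manual`, and `r2_manual` and the sample values are hypothetical and independent of the Metrics package used later.

```r
# Direct transcriptions of the metric formulas (hypothetical helper names).
rmse_manual <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae_manual  <- function(actual, predicted) mean(abs(actual - predicted))
r2_manual   <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

actual    <- c(3, 5, 7)   # assumed toy targets
predicted <- c(2, 5, 9)   # assumed toy predictions; residuals are 1, 0, -2

rmse_manual(actual, predicted)   # sqrt((1 + 0 + 4) / 3)
mae_manual(actual, predicted)    # (1 + 0 + 2) / 3
r2_manual(actual, predicted)     # 1 - 5 / 8
```

RMSE weights the larger residual more heavily than MAE does, which is why the two can rank models differently when a few predictions are far off.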
library(MASS)   # provides the Boston housing data
data(Boston)
df <- Boston    # 506 observations, 14 variables; medv is the target
summary(df)
## crim zn indus chas
## Min. : 0.00632 Min. : 0.00 Min. : 0.46 Min. :0.00000
## 1st Qu.: 0.08205 1st Qu.: 0.00 1st Qu.: 5.19 1st Qu.:0.00000
## Median : 0.25651 Median : 0.00 Median : 9.69 Median :0.00000
## Mean : 3.61352 Mean : 11.36 Mean :11.14 Mean :0.06917
## 3rd Qu.: 3.67708 3rd Qu.: 12.50 3rd Qu.:18.10 3rd Qu.:0.00000
## Max. :88.97620 Max. :100.00 Max. :27.74 Max. :1.00000
## nox rm age dis
## Min. :0.3850 Min. :3.561 Min. : 2.90 Min. : 1.130
## 1st Qu.:0.4490 1st Qu.:5.886 1st Qu.: 45.02 1st Qu.: 2.100
## Median :0.5380 Median :6.208 Median : 77.50 Median : 3.207
## Mean :0.5547 Mean :6.285 Mean : 68.57 Mean : 3.795
## 3rd Qu.:0.6240 3rd Qu.:6.623 3rd Qu.: 94.08 3rd Qu.: 5.188
## Max. :0.8710 Max. :8.780 Max. :100.00 Max. :12.127
## rad tax ptratio black
## Min. : 1.000 Min. :187.0 Min. :12.60 Min. : 0.32
## 1st Qu.: 4.000 1st Qu.:279.0 1st Qu.:17.40 1st Qu.:375.38
## Median : 5.000 Median :330.0 Median :19.05 Median :391.44
## Mean : 9.549 Mean :408.2 Mean :18.46 Mean :356.67
## 3rd Qu.:24.000 3rd Qu.:666.0 3rd Qu.:20.20 3rd Qu.:396.23
## Max. :24.000 Max. :711.0 Max. :22.00 Max. :396.90
## lstat medv
## Min. : 1.73 Min. : 5.00
## 1st Qu.: 6.95 1st Qu.:17.02
## Median :11.36 Median :21.20
## Mean :12.65 Mean :22.53
## 3rd Qu.:16.95 3rd Qu.:25.00
## Max. :37.97 Max. :50.00
# Baseline: ordinary least squares on all 13 predictors
model_ols <- lm(medv ~ ., data = df)
pred_ols <- predict(model_ols, df)   # in-sample predictions
library(e1071)
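# SVR with e1071 defaults: eps-regression, cost = 1, epsilon = 0.1,
# gamma = 1 / (number of predictors); inputs and response are scaled by default
model_svr_linear <- svm(medv ~ ., data = df, kernel = "linear")
pred_linear <- predict(model_svr_linear, df)
model_svr_rbf <- svm(medv ~ ., data = df, kernel = "radial")
pred_rbf <- predict(model_svr_rbf, df)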
library(Metrics)
r2 <- function(actual, predicted) 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
results <- data.frame(
  Model = c("OLS", "SVR Linear", "SVR RBF"),
  RMSE  = c(rmse(df$medv, pred_ols), rmse(df$medv, pred_linear), rmse(df$medv, pred_rbf)),
  MAE   = c(mae(df$medv, pred_ols), mae(df$medv, pred_linear), mae(df$medv, pred_rbf)),
  R2    = c(r2(df$medv, pred_ols), r2(df$medv, pred_linear), r2(df$medv, pred_rbf))
)
results
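The `results` data frame scores each model on the same observations it was fitted to, which tends to flatter flexible models such as the RBF SVR. One way to check this is a hold-out split. The sketch here assumes an 80/20 split and an arbitrary seed, neither of which comes from the original analysis, and reports RMSE only. The names `boston`, `train`, `test`, `fits`, and `holdout` are illustrative.

```r
library(MASS)    # Boston data
library(e1071)   # svm()

set.seed(42)     # arbitrary seed (assumption), only to make the split reproducible
boston    <- MASS::Boston
train_idx <- sample(nrow(boston), size = floor(0.8 * nrow(boston)))
train     <- boston[train_idx, ]
test      <- boston[-train_idx, ]

# Fit the same three models as in the analysis, but on the training rows only
fits <- list(
  "OLS"        = lm(medv ~ ., data = train),
  "SVR Linear" = svm(medv ~ ., data = train, kernel = "linear"),
  "SVR RBF"    = svm(medv ~ ., data = train, kernel = "radial")
)

# RMSE on the held-out rows
rmse_of <- function(fit) sqrt(mean((test$medv - predict(fit, test))^2))
holdout <- data.frame(Model = names(fits),
                      RMSE  = sapply(fits, rmse_of),
                      row.names = NULL)
holdout
```

If the ranking of the models changes between the in-sample table and this hold-out table, the in-sample figures were probably overstating the more flexible model.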
plot(df$medv, pred_linear, col = "blue", pch = 16, main = "SVR Linear: Predicted vs Actual", xlab = "Actual House Price (medv)", ylab = "Predicted")
abline(0, 1, col = "red")
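# svm() standardizes the predictors and the response by default (scale = TRUE),
# so epsilon = 1 means one standard deviation of medv, not one unit of medv.
svr_1d <- svm(medv ~ lstat, data = df, epsilon = 1, cost = 10)
# Half-width of the tube converted back to the original units of medv
tube <- svr_1d$epsilon * svr_1d$y.scale$"scaled:scale"
plot(df$lstat, df$medv, main = "1D SVR with Epsilon-tube", xlab = "% Lower Status Population", ylab = "House Price (medv)")
xgrid <- seq(min(df$lstat), max(df$lstat), length.out = 100)
ygrid <- predict(svr_1d, data.frame(lstat = xgrid))
lines(xgrid, ygrid, col = "blue", lwd = 2)
lines(xgrid, ygrid + tube, col = "red", lty = 2)
lines(xgrid, ygrid - tube, col = "red", lty = 2)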
residuals <- df$medv - pred_rbf
plot(pred_rbf, residuals, pch = 16, col = "darkgreen", main = "SVR RBF: Residuals vs Predicted", xlab = "Predicted", ylab = "Residual")
abline(h = 0, col = "red")
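# Grid search over epsilon, cost and gamma with 10-fold cross-validation.
# Folds are drawn at random, so call set.seed() beforehand for reproducible results.
tune_result <- tune(svm, medv ~ ., data = df,
                    kernel = "radial",
                    ranges = list(epsilon = c(0.1, 0.5, 1),
                                  cost = c(1, 10, 100),
                                  gamma = c(0.1, 0.5, 1)))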
summary(tune_result)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## epsilon cost gamma
## 0.1 10 0.1
##
## - best performance: 10.3466
##
## - Detailed performance results:
## epsilon cost gamma error dispersion
## 1 0.1 1 0.1 14.81989 7.810659
## 2 0.5 1 0.1 17.42796 6.265855
## 3 1.0 1 0.1 34.32588 6.852562
## 4 0.1 10 0.1 10.34660 3.302572
## 5 0.5 10 0.1 15.16176 3.418004
## 6 1.0 10 0.1 31.65017 7.756733
## 7 0.1 100 0.1 12.93457 4.704263
## 8 0.5 100 0.1 15.80794 4.139034
## 9 1.0 100 0.1 31.65017 7.756733
## 10 0.1 1 0.5 25.36572 10.283560
## 11 0.5 1 0.5 32.42749 10.348636
## 12 1.0 1 0.5 53.64137 7.641906
## 13 0.1 10 0.5 19.65814 6.117592
## 14 0.5 10 0.5 29.29553 7.089881
## 15 1.0 10 0.5 52.93527 6.506454
## 16 0.1 100 0.5 19.78443 6.185621
## 17 0.5 100 0.5 29.29553 7.089881
## 18 1.0 100 0.5 52.93527 6.506454
## 19 0.1 1 1.0 37.53796 12.999375
## 20 0.5 1 1.0 44.67850 11.422224
## 21 1.0 1 1.0 64.52798 9.289612
## 22 0.1 10 1.0 30.24874 8.108948
## 23 0.5 10 1.0 40.93718 7.962534
## 24 1.0 10 1.0 63.99485 7.828871
## 25 0.1 100 1.0 30.24874 8.108948
## 26 0.5 100 1.0 40.93718 7.962534
## 27 1.0 100 1.0 63.99485 7.828871
tuned <- tune_result$best.model
summary(tuned)
##
## Call:
## best.tune(METHOD = svm, train.x = medv ~ ., data = df, ranges = list(epsilon = c(0.1,
## 0.5, 1), cost = c(1, 10, 100), gamma = c(0.1, 0.5, 1)), kernel = "radial")
##
##
## Parameters:
## SVM-Type: eps-regression
## SVM-Kernel: radial
## cost: 10
## gamma: 0.1
## epsilon: 0.1
##
##
## Number of Support Vectors: 339
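In this comparison the RBF-kernel SVR is highly competitive with OLS, although RMSE, MAE, and R² are all computed on the same data used for fitting, so they describe in-sample fit rather than out-of-sample accuracy. The epsilon-tube plot helps show the tolerance band within which residuals incur no penalty. Parameter tuning has a clear effect: the best combination in the grid (epsilon = 0.1, cost = 10, gamma = 0.1) reaches a 10-fold cross-validated mean squared error of 10.35, versus 14.82 for the same epsilon and gamma with cost = 1, and every configuration with epsilon = 1 performs markedly worse than its epsilon = 0.1 counterpart. This underlines the importance of hyperparameter optimization in non-linear regression.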