Support Vector Regression (SVR) is the regression counterpart of the Support Vector Machine (SVM). It seeks a regression function \(f(x)\) whose deviation from the observed values stays within a given tolerance (epsilon, \(\varepsilon\)) wherever possible, while keeping the model as flat (that is, as simple) as possible.
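SVR minimizes:
\[ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) \]
subject to the constraints:
\[ \begin{aligned} &y_i - \langle w, x_i \rangle - b \leq \varepsilon + \xi_i \\ &\langle w, x_i \rangle + b - y_i \leq \varepsilon + \xi_i^* \\ &\xi_i, \xi_i^* \geq 0 \end{aligned} \]
Here \(f(x) = \langle w, x \rangle + b\), \(C > 0\) weights the penalty on deviations larger than \(\varepsilon\), and \(\xi_i, \xi_i^*\) are slack variables measuring how far observation \(i\) lies above or below the \(\varepsilon\)-tube.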
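The slack variables in this formulation amount to the \(\varepsilon\)-insensitive loss \(\max(0, |y - f(x)| - \varepsilon)\): deviations inside the tolerance band cost nothing, and larger ones are penalized linearly. A minimal sketch in R, where the helper name `eps_insensitive_loss` and the toy numbers are illustrative only and not part of e1071 or of the analysis here:

```r
# Hypothetical helper (not from e1071): epsilon-insensitive loss per observation.
eps_insensitive_loss <- function(y, f, epsilon) {
  pmax(0, abs(y - f) - epsilon)
}

y <- c(1.0, 2.0, 3.0)    # toy observed values
f <- c(1.05, 2.5, 2.0)   # toy fitted values

# The first point lies inside the band (|error| = 0.05 < 0.1), so its loss is 0;
# the other two are charged only for the part of the error beyond epsilon.
loss <- eps_insensitive_loss(y, f, epsilon = 0.1)
print(loss)
```

With these numbers the losses come out as 0, 0.4, and 0.9.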
library(dplyr)  # provides the %>% pipe
data.frame(
  Parameter = c("epsilon", "cost", "kernel", "gamma"),
  Description = c(
    "Width of the error tolerance band; errors inside it are not penalized",
    "Penalty weight for errors outside the band",
    "Type of kernel function",
    "RBF kernel parameter controlling model complexity")
) %>% knitr::kable()

| Parameter | Description |
|---|---|
| epsilon | Width of the error tolerance band; errors inside it are not penalized |
| cost | Penalty weight for errors outside the band |
| kernel | Type of kernel function |
| gamma | RBF kernel parameter controlling model complexity |
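As a rough illustration of where these four parameters go, the sketch below passes them to `svm()` from e1071. The two predictors and all parameter values are arbitrary choices for demonstration, not tuned or recommended settings.

```r
library(e1071)

# Illustrative fit on the built-in mtcars data; predictors and values
# are assumptions made for this example only.
fit <- svm(
  mpg ~ wt + hp,
  data    = mtcars,
  type    = "eps-regression",
  kernel  = "radial",   # kernel function
  epsilon = 0.1,        # width of the no-penalty band
  cost    = 1,          # penalty weight C
  gamma   = 0.5         # RBF kernel parameter
)

# Same model with a wider tolerance band, for comparison.
fit_wide <- svm(
  mpg ~ wt + hp,
  data    = mtcars,
  type    = "eps-regression",
  kernel  = "radial",
  epsilon = 0.5,
  cost    = 1,
  gamma   = 0.5
)

# Number of support vectors retained by each fit.
n_sv <- c(narrow = fit$tot.nSV, wide = fit_wide$tot.nSV)
print(n_sv)
```

A wider band usually leaves fewer observations outside it, so the second fit will typically keep fewer support vectors, although the exact counts depend on the data.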
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
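The evaluation code in this section refers to `mtcars_scaled`, `ols_model`, `svr_linear`, and `svr_rbf`, whose definitions are not shown here. One plausible setup, assuming every column is standardized and all remaining variables serve as predictors with default hyperparameters, might look like the following; the preprocessing and settings actually used may differ.

```r
library(e1071)

# Assumption: z-score every column of mtcars, including the response mpg.
mtcars_scaled <- as.data.frame(scale(mtcars))

# Assumption: all other variables as predictors, default hyperparameters.
ols_model  <- lm(mpg ~ ., data = mtcars_scaled)
svr_linear <- svm(mpg ~ ., data = mtcars_scaled, kernel = "linear")
svr_rbf    <- svm(mpg ~ ., data = mtcars_scaled, kernel = "radial")
```

With a numeric response, `svm()` defaults to epsilon-regression, so no explicit `type` argument is needed in this sketch.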
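predict_and_evaluate <- function(model, data, target) {
  # Predict on `data` and compare against the observed `target` column
  actual <- data[[target]]
  pred <- predict(model, data)
  RMSE <- sqrt(mean((actual - pred)^2))  # root mean squared error
  MAE <- mean(abs(actual - pred))        # mean absolute error
  # Coefficient of determination: 1 - SS_residual / SS_total
  R2 <- 1 - sum((actual - pred)^2) / sum((actual - mean(actual))^2)
  data.frame(RMSE, MAE, R2)
}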
result <- bind_rows(
  predict_and_evaluate(ols_model, mtcars_scaled, "mpg"),
  predict_and_evaluate(svr_linear, mtcars_scaled, "mpg"),
  predict_and_evaluate(svr_rbf, mtcars_scaled, "mpg")
)
rownames(result) <- c("OLS", "SVR Linear", "SVR RBF")
result
## RMSE MAE R2
## OLS 0.3562176 0.2858396 0.8690158
## SVR Linear 0.3704118 0.2804698 0.8583692
## SVR RBF 0.3525334 0.2696326 0.8717112
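To make the three metrics concrete, here is a small hand-checkable example in base R. The numbers are made up for illustration and are unrelated to the mtcars results above.

```r
actual <- c(1, 2, 3, 4)       # toy observed values
pred   <- c(1.5, 2, 2.5, 4)   # toy predictions

# Errors are -0.5, 0, 0.5, 0, so the squared errors sum to 0.5.
rmse <- sqrt(mean((actual - pred)^2))   # sqrt(0.5 / 4)
mae  <- mean(abs(actual - pred))        # (0.5 + 0 + 0.5 + 0) / 4

# Total sum of squares around the mean 2.5 is 5, so R2 = 1 - 0.5 / 5.
r2 <- 1 - sum((actual - pred)^2) / sum((actual - mean(actual))^2)

print(round(c(RMSE = rmse, MAE = mae, R2 = r2), 4))
```

Here RMSE is about 0.3536, MAE is 0.25, and R2 is 0.9. RMSE exceeds MAE because squaring gives the two larger errors more weight.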
library(ggplot2)
# Attach RBF-SVR predictions, then plot predicted against actual mpg
data_pred <- mtcars_scaled %>% mutate(pred_svr = predict(svr_rbf, .))
ggplot(data_pred, aes(x = mpg, y = pred_svr)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, color = 'red', linetype = "dashed") +
  labs(title = "SVR Predictions vs Actual Values", x = "Actual mpg", y = "Predicted mpg")
Interpretation: The dashed red line marks perfect prediction (predicted = actual). Points lying close to the line indicate good predictive performance.
?svm, e1071 package.