Soal
Data untuk tugas dapat diperoleh di package ISLR.Silahkan Install dulu packagenya.
A. Define Data
Gunakan model-model non-linier yang sudah dipelajari pada data Auto dengan peubah respon mpg dan pilih tiga kolom untuk dijadikan peubah prediktor.
library(ISLR)
attach(Auto)
#Memanggil data yang akan dismulasikan
View(Auto)Akan dipilih variabel prediktor displacement, horsepower, dan weight.
B. Visualize Data
Apakah ada bukti untuk hubungan non-linier untuk peubah-peubah yang anda pilih? Buat visualisasi untuk Mendukung jawaban Anda.
library(tidyverse)## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.3 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.1 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
#Membuat plot HP vs MpG
#Cobakan dengan model linier
mod_linear = lm(mpg~horsepower,data=Auto)
summary(mod_linear)##
## Call:
## lm(formula = mpg ~ horsepower, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
#Plot model linier
ggplot(Auto,aes(x=horsepower, y=mpg)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~x,lty = 1,
col = "blue",se = T)+
theme_bw()#Membuat plot HP vs displacement
#Cobakan dengan model linier
mod_linear = lm(mpg~Auto$weight,data=Auto)
summary(mod_linear)##
## Call:
## lm(formula = mpg ~ Auto$weight, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.9736 -2.7556 -0.3358 2.1379 16.5194
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.216524 0.798673 57.87 <2e-16 ***
## Auto$weight -0.007647 0.000258 -29.64 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.333 on 390 degrees of freedom
## Multiple R-squared: 0.6926, Adjusted R-squared: 0.6918
## F-statistic: 878.8 on 1 and 390 DF, p-value: < 2.2e-16
#Plot model linier
ggplot(Auto,aes(x=Auto$weight, y=mpg)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~x,lty = 1,
col = "blue",se = T)+ xlab("Weight") +
theme_bw()## Warning: Use of `Auto$weight` is discouraged. Use `weight` instead.
## Warning: Use of `Auto$weight` is discouraged. Use `weight` instead.
#Membuat plot HP vs Acceleration
#Cobakan dengan model linier
mod_linear = lm(mpg~Auto$acceleration,data=Auto)
summary(mod_linear)##
## Call:
## lm(formula = mpg ~ Auto$acceleration, data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.989 -5.616 -1.199 4.801 23.239
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.8332 2.0485 2.359 0.0188 *
## Auto$acceleration 1.1976 0.1298 9.228 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.08 on 390 degrees of freedom
## Multiple R-squared: 0.1792, Adjusted R-squared: 0.1771
## F-statistic: 85.15 on 1 and 390 DF, p-value: < 2.2e-16
#Plot model linier
ggplot(Auto,aes(x=Auto$acceleration, y=mpg)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~x,lty = 1,
col = "blue",se = T)+xlab("Acceleration") +
theme_bw()## Warning: Use of `Auto$acceleration` is discouraged. Use `acceleration` instead.
## Warning: Use of `Auto$acceleration` is discouraged. Use `acceleration` instead.
Berdasarkan tiga output plot di atas, terlihat bahwa hubungan antara variabel mpg dan acceleration linier, begitu juga hubungan antara variabel mpg dengan weight. Hal yang berbeda ditunjukkan oleh pola hubungan variabel mpg dan horsepower dimana pola hubunganya cenderung tidak linier terlihat pada nilai X yang berada di ujung atas dan bawah.
C. Menentukan Model
Tentukan model non-linear terbaik untuk masing pasangan peubah (Secara visual dan/atau secara empiris)
1. Regresi Polinomial
Pada pendekatan polinomial ini akan diuji dengan pendekatan polinomial berderajat 2,3, dan 4 untuk dilihat secara visual dan juga secara empiris dengan nilai mrse dan mae nya.
library(tidyverse)
library(splines)
library(rsample)
#Regresi polinomial derajat 2
mod_polinomial2 = lm(mpg ~ poly(horsepower,2,raw = T),
data=Auto)
summary(mod_polinomial2)##
## Call:
## lm(formula = mpg ~ poly(horsepower, 2, raw = T), data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.7135 -2.5943 -0.0859 2.2868 15.8961
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 56.9000997 1.8004268 31.60 <2e-16 ***
## poly(horsepower, 2, raw = T)1 -0.4661896 0.0311246 -14.98 <2e-16 ***
## poly(horsepower, 2, raw = T)2 0.0012305 0.0001221 10.08 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.374 on 389 degrees of freedom
## Multiple R-squared: 0.6876, Adjusted R-squared: 0.686
## F-statistic: 428 on 2 and 389 DF, p-value: < 2.2e-16
#Plot model polinom derajat 2
ggplot(Auto,aes(x=horsepower, y=mpg)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~poly(x,2,raw=T),
lty = 1, col = "blue",se = T)+
labs(title= "Polinom d=2")+
theme_bw()#Regresi polinomial derajat 3
mod_polinomial3 = lm(mpg ~ poly(horsepower,3,raw = T),
data=Auto)
summary(mod_polinomial3)##
## Call:
## lm(formula = mpg ~ poly(horsepower, 3, raw = T), data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.7039 -2.4491 -0.1519 2.2035 15.8159
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.068e+01 4.563e+00 13.298 < 2e-16 ***
## poly(horsepower, 3, raw = T)1 -5.689e-01 1.179e-01 -4.824 2.03e-06 ***
## poly(horsepower, 3, raw = T)2 2.079e-03 9.479e-04 2.193 0.0289 *
## poly(horsepower, 3, raw = T)3 -2.147e-06 2.378e-06 -0.903 0.3673
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.375 on 388 degrees of freedom
## Multiple R-squared: 0.6882, Adjusted R-squared: 0.6858
## F-statistic: 285.5 on 3 and 388 DF, p-value: < 2.2e-16
#Plot model polinom derajat 3
ggplot(Auto,aes(x=horsepower, y=mpg)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~poly(x,3,raw=T),
lty = 1, col = "blue",se = T)+labs(title= "Polinom d=3")+
theme_bw()#Regresi polinomial derajat 4
mod_polinomial4 = lm(mpg ~ poly(horsepower,3,raw = T),
data=Auto)
summary(mod_polinomial4)##
## Call:
## lm(formula = mpg ~ poly(horsepower, 3, raw = T), data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.7039 -2.4491 -0.1519 2.2035 15.8159
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.068e+01 4.563e+00 13.298 < 2e-16 ***
## poly(horsepower, 3, raw = T)1 -5.689e-01 1.179e-01 -4.824 2.03e-06 ***
## poly(horsepower, 3, raw = T)2 2.079e-03 9.479e-04 2.193 0.0289 *
## poly(horsepower, 3, raw = T)3 -2.147e-06 2.378e-06 -0.903 0.3673
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.375 on 388 degrees of freedom
## Multiple R-squared: 0.6882, Adjusted R-squared: 0.6858
## F-statistic: 285.5 on 3 and 388 DF, p-value: < 2.2e-16
#Plot model polinom derajat 4
ggplot(Auto,aes(x=horsepower, y=mpg)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~poly(x,4,raw=T),
lty = 1, col = "blue",se = T)+labs(title= "Polinom d=4")+
theme_bw()#Uji Empiris Regresi Linier
set.seed(123)
cross_val <- vfold_cv(Auto,v=10,strata = "mpg")
metric_linear1 <- map_dfr(cross_val$splits,
function(x){
mod <- lm(mpg ~ horsepower,
data=Auto[x$in_id,])
pred <- predict(mod,
newdata=Auto[-x$in_id,])
truth <- Auto[-x$in_id,]$mpg
rmse <- mlr3measures::rmse(truth = truth,
response = pred
)
mae <- mlr3measures::mae(truth = truth,
response = pred
)
metric <- c(rmse,mae)
names(metric) <- c("rmse","mae")
return(metric)
}
)
metric_linear1# menghitung rata-rata untuk 10 folds
mean_metric_linear1 <- colMeans(metric_linear1)
mean_metric_linear1 ## rmse mae
## 4.889086 3.832328
#Perbandingan Model dengan RMSE dan MAE polinom derajat 2
set.seed(123)
cross_val <- vfold_cv(Auto,v=10,strata = "mpg")
metric_poly_d2 <- map_dfr(cross_val$splits,
function(x){
mod <- lm(mpg ~ poly(horsepower,2,raw = T),
data=Auto[x$in_id,])
pred <- predict(mod,
newdata=Auto[-x$in_id,])
truth <- Auto[-x$in_id,]$mpg
rmse <- mlr3measures::rmse(truth = truth,
response = pred
)
mae <- mlr3measures::mae(truth = truth,
response = pred
)
metric <- c(rmse,mae)
names(metric) <- c("rmse","mae")
return(metric)
}
)
metric_poly_d2# menghitung rata-rata untuk 10 folds
mean_metric_poly_d2 <- colMeans(metric_poly_d2)
mean_metric_poly_d2## rmse mae
## 4.351129 3.266460
#Perbandingan Model dengan RMSE dan MAE polinom derajat 3
set.seed(123)
cross_val <- vfold_cv(Auto,v=10,strata = "mpg")
metric_poly_d3 <- map_dfr(cross_val$splits,
function(x){
mod <- lm(mpg ~ poly(horsepower,3,raw = T),
data=Auto[x$in_id,])
pred <- predict(mod,
newdata=Auto[-x$in_id,])
truth <- Auto[-x$in_id,]$mpg
rmse <- mlr3measures::rmse(truth = truth,
response = pred
)
mae <- mlr3measures::mae(truth = truth,
response = pred
)
metric <- c(rmse,mae)
names(metric) <- c("rmse","mae")
return(metric)
}
)
metric_poly_d3# menghitung rata-rata untuk 10 folds
mean_metric_poly_d3 <- colMeans(metric_poly_d3)
mean_metric_poly_d3## rmse mae
## 4.355430 3.266945
#Perbandingan Model dengan RMSE dan MAE polinom derajat 4
set.seed(333)
cross_val <- vfold_cv(Auto,v=10,strata = "mpg")
metric_poly_d4 <- map_dfr(cross_val$splits,
function(x){
mod <- lm(mpg ~ poly(horsepower,4,raw = T),
data=Auto[x$in_id,])
pred <- predict(mod,
newdata=Auto[-x$in_id,])
truth <- Auto[-x$in_id,]$mpg
rmse <- mlr3measures::rmse(truth = truth,
response = pred
)
mae <- mlr3measures::mae(truth = truth,
response = pred
)
metric <- c(rmse,mae)
names(metric) <- c("rmse","mae")
return(metric)
}
)
metric_poly_d4# menghitung rata-rata untuk 10 folds
mean_metric_poly_d4 <- colMeans(metric_poly_d4)
mean_metric_poly_d4## rmse mae
## 4.368638 3.290133
2. Regresi Tangga
Pada pendekatan fungsi tangga ini akan dicari pada knots berapa sehingga memiliki nilai mrse dan mae yang paling kecil dan juga akan ditunjukkan visualisasi datanya.
library(tidyverse)
library(splines)
library(rsample)
#Uji Dengan regresi fungsi tangga
mod_tangga1 = lm(mpg ~ cut(horsepower,5),data=Auto)
summary(mod_tangga1)##
## Call:
## lm(formula = mpg ~ cut(horsepower, 5), data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.3033 -3.0220 -0.5413 2.4074 16.6394
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.3033 0.4272 73.27 <2e-16 ***
## cut(horsepower, 5)(82.8,120] -8.2813 0.5642 -14.68 <2e-16 ***
## cut(horsepower, 5)(120,156] -15.2427 0.7210 -21.14 <2e-16 ***
## cut(horsepower, 5)(156,193] -17.5922 1.0036 -17.53 <2e-16 ***
## cut(horsepower, 5)(193,230] -18.5340 1.3767 -13.46 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.719 on 387 degrees of freedom
## Multiple R-squared: 0.6382, Adjusted R-squared: 0.6345
## F-statistic: 170.7 on 4 and 387 DF, p-value: < 2.2e-16
#Sebaran plotnya
ggplot(Auto,aes(x=horsepower, y=mpg)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~cut(x,5),
lty = 1, col = "blue",se = F)+
theme_bw()mod1 <- lm(mpg ~ horsepower,
data=Auto)
#Jika diuji secara empirin dengan RMSE dan MAE
set.seed(223)
cross_val1 <- vfold_cv(Auto,v=10,strata = "mpg")
az<-cross_val1$splits[[1]]
#Misal ingin dicoba-coba pada interval berapa model terbaiknya, misal dicoba intervalnya 3 -10
breaks <- 3:10
best_tangga1 <- map_dfr(breaks, function(i){
metric_tangga1 <- map_dfr(cross_val1$splits,
function(x){
training1 <- Auto[x$in_id,]
training1$horsepower <- cut(training1$horsepower,i)
mod1 <- lm(mpg ~ horsepower,
data=training1)
labs_x <- levels(mod1$model[,2])
labs_x_breaks <- cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs_x) ),
upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs_x) ))
testing1 <- Auto[-x$in_id,]
horsepower_new <- cut(testing1$horsepower,c(labs_x_breaks[1,1],labs_x_breaks[,2]))
pred1 <- predict(mod1,
newdata=list(horsepower=horsepower_new))
truth <- testing1$mpg
data_eval1 <- na.omit(data.frame(truth,pred1))
rmse <- mlr3measures::rmse(truth = data_eval1$truth,
response = data_eval1$pred1
)
mae <- mlr3measures::mae(truth = data_eval1$truth,
response = data_eval1$pred1
)
metric <- c(rmse,mae)
names(metric) <- c("rmse","mae")
return(metric)
}
)
metric_tangga1
# menghitung rata-rata untuk 10 folds
mean_metric_tangga1 <- colMeans(metric_tangga1)
mean_metric_tangga1
})
best_tangga1 <- cbind(breaks=breaks,best_tangga1)
# menampilkan hasil all breaks
best_tangga1#berdasarkan rmse
best_tangga1 %>% slice_min(rmse)#berdasarkan mae
best_tangga1 %>% slice_min(mae)3. Regresi Spline
Pada pendekatan regresi spline ini akan dicoba dengan menggunakan base spline dan juga naturan spline untuk kemudian dibandingkan mana yang memiliki nilai rmsep dan mae yang paling kecil.
library(tidyverse)
library(splines)
library(rsample)
#b-spline
dim(bs(Auto$horsepower, knots = c(80, 120,180)))## [1] 392 6
mod_spline3 = lm(mpg ~ bs(horsepower, knots = c(80, 120,180)),data=Auto)
summary(mod_spline3)##
## Call:
## lm(formula = mpg ~ bs(horsepower, knots = c(80, 120, 180)), data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.6595 -2.6233 -0.2615 2.2284 15.2843
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 32.838 1.757 18.691 < 2e-16
## bs(horsepower, knots = c(80, 120, 180))1 5.778 2.548 2.268 0.0239
## bs(horsepower, knots = c(80, 120, 180))2 -9.224 1.840 -5.013 8.19e-07
## bs(horsepower, knots = c(80, 120, 180))3 -14.738 2.510 -5.871 9.37e-09
## bs(horsepower, knots = c(80, 120, 180))4 -21.074 2.620 -8.043 1.09e-14
## bs(horsepower, knots = c(80, 120, 180))5 -20.772 3.509 -5.919 7.15e-09
## bs(horsepower, knots = c(80, 120, 180))6 -18.484 3.219 -5.742 1.91e-08
##
## (Intercept) ***
## bs(horsepower, knots = c(80, 120, 180))1 *
## bs(horsepower, knots = c(80, 120, 180))2 ***
## bs(horsepower, knots = c(80, 120, 180))3 ***
## bs(horsepower, knots = c(80, 120, 180))4 ***
## bs(horsepower, knots = c(80, 120, 180))5 ***
## bs(horsepower, knots = c(80, 120, 180))6 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.294 on 385 degrees of freedom
## Multiple R-squared: 0.7019, Adjusted R-squared: 0.6973
## F-statistic: 151.1 on 6 and 385 DF, p-value: < 2.2e-16
#natural spline
mod_spline3ns = lm(mpg ~ ns(horsepower, knots = c(80, 120,180)),data=Auto)
summary(mod_spline3ns)##
## Call:
## lm(formula = mpg ~ ns(horsepower, knots = c(80, 120, 180)), data = Auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.8425 -2.4877 -0.1741 2.2245 15.9857
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.260 1.089 35.12 <2e-16
## ns(horsepower, knots = c(80, 120, 180))1 -22.367 1.262 -17.72 <2e-16
## ns(horsepower, knots = c(80, 120, 180))2 -21.902 1.758 -12.46 <2e-16
## ns(horsepower, knots = c(80, 120, 180))3 -35.347 2.706 -13.06 <2e-16
## ns(horsepower, knots = c(80, 120, 180))4 -18.569 1.852 -10.02 <2e-16
##
## (Intercept) ***
## ns(horsepower, knots = c(80, 120, 180))1 ***
## ns(horsepower, knots = c(80, 120, 180))2 ***
## ns(horsepower, knots = c(80, 120, 180))3 ***
## ns(horsepower, knots = c(80, 120, 180))4 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.369 on 387 degrees of freedom
## Multiple R-squared: 0.6899, Adjusted R-squared: 0.6867
## F-statistic: 215.2 on 4 and 387 DF, p-value: < 2.2e-16
#Buat plot bs
ggplot(Auto,aes(x=horsepower, y=mpg)) +
geom_point(alpha=0.55, color="black") +
stat_smooth(method = "lm",
formula = y~bs(x, knots = c(80, 120,180)),
lty = 1,se = F)#Buat plot ns
ggplot(Auto,aes(x=horsepower, y=mpg)) +
geom_point(alpha=0.55, color="black")+
stat_smooth(method = "lm",
formula = y~ns(x, knots = c(80, 120,180)),
lty = 1,se=F)#Secara empiris
set.seed(123)
cross_val <- vfold_cv(Auto,v=10,strata = "mpg")
df <- 6:12
best_spline3 <- map_dfr(df, function(i){
metric_spline3 <- map_dfr(cross_val$splits,
function(x){
mod <- lm(mpg ~ bs(horsepower,df=i),
data=Auto[x$in_id,])
pred <- predict(mod,
newdata=Auto[-x$in_id,])
truth <- Auto[-x$in_id,]$mpg
rmse <- mlr3measures::rmse(truth = truth,
response = pred
)
mae <- mlr3measures::mae(truth = truth,
response = pred
)
metric <- c(rmse,mae)
names(metric) <- c("rmse","mae")
return(metric)
}
)
metric_spline3
# menghitung rata-rata untuk 10 folds
mean_metric_spline3 <- colMeans(metric_spline3)
mean_metric_spline3
}
)## Warning in bs(horsepower, degree = 3L, knots = c(`25%` = 75, `50%` = 92.5, :
## some 'x' values beyond boundary knots may cause ill-conditioned bases
## Warning in bs(horsepower, degree = 3L, knots = c(`20%` = 72, `40%` = 88, : some
## 'x' values beyond boundary knots may cause ill-conditioned bases
## Warning in bs(horsepower, degree = 3L, knots = c(`16.66667%` = 70, `33.33333%` =
## 84, : some 'x' values beyond boundary knots may cause ill-conditioned bases
## Warning in bs(horsepower, degree = 3L, knots = c(`14.28571%` = 68, `28.57143%`
## = 78.8571428571428, : some 'x' values beyond boundary knots may cause ill-
## conditioned bases
## Warning in bs(horsepower, degree = 3L, knots = c(`12.5%` = 67, `25%` = 75, :
## some 'x' values beyond boundary knots may cause ill-conditioned bases
## Warning in bs(horsepower, degree = 3L, knots = c(`11.11111%` = 67, `22.22222%` =
## 75, : some 'x' values beyond boundary knots may cause ill-conditioned bases
## Warning in bs(horsepower, degree = 3L, knots = c(`10%` = 66.3, `20%` = 72, :
## some 'x' values beyond boundary knots may cause ill-conditioned bases
best_spline3 <- cbind(df=df,best_spline3)
# menampilkan hasil all breaks
best_spline3#berdasarkan rmse
best_spline3 %>% slice_min(rmse)#berdasarkan mae
best_spline3 %>% slice_min(mae)4. Komparasi Model
Setelah didapatkan hasil pendekatan dengan berbagai macam pendekatan, berikut ini adalah hasil komparasi terhadap nilai rmse dan mae nya untuk kemudian ditentukan mana pendekatan yang paling baik untuk mengetahui pola hubungan antara variabel mpg dan horsepower.
library(tidyverse)
library(splines)
library(rsample)
#Perbandingan semua model
nilai_metric <- rbind(mean_metric_linear1,
mean_metric_poly_d2,
mean_metric_poly_d3,
mean_metric_poly_d4,
best_tangga1 %>% select(-1) %>% slice_min(mae),
best_spline3 %>% select(-1)%>% slice_min(mae)
)
nama_model <- c("Linear",
"Poly2",
"Poly3","Poly4","Tangga(breaks=8)",
"spline(df=12)"
)
data.frame(nama_model,nilai_metric)Dari output tersebut dapat disimpulkan bahwa regresi fungsi natural cubic splines adalah pendekatan yang paling baik dalam menjelaskan hubungan antara variabel mpg dan horsepower, hal itu terlihat dari nilai mae nya paling kecil.