Beyond Linearity
Package yang digunakan dalam analisis ini adalah :
Dataset
Dataset yang digunakan adalah Auto dari package ISLR dengan memanfaatkan variabel mpg dan horsepower
## The following object is masked from package:ggplot2:
##
## mpg
| mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin | name |
|---|---|---|---|---|---|---|---|---|
| 18 | 8 | 307 | 130 | 3504 | 12.0 | 70 | 1 | chevrolet chevelle malibu |
| 15 | 8 | 350 | 165 | 3693 | 11.5 | 70 | 1 | buick skylark 320 |
| 18 | 8 | 318 | 150 | 3436 | 11.0 | 70 | 1 | plymouth satellite |
| 16 | 8 | 304 | 150 | 3433 | 12.0 | 70 | 1 | amc rebel sst |
| 17 | 8 | 302 | 140 | 3449 | 10.5 | 70 | 1 | ford torino |
| 15 | 8 | 429 | 198 | 4341 | 10.0 | 70 | 1 | ford galaxie 500 |
Sebelum dilakukan pemodelan, dapat dilihat dari scatter plot nampak hubungan tidak linear antara variabel mpg dan horsepower
Cross Validation Model
Berikut adalah seluruh function yang digunakan dalam pemodelan regresi polynomial, piecewise, dan natural cubic spline. Validasi silang menggukankan 5-Fold untuk mencari nilai metric evaluasi (RMSE) unutk memilih model terbaik secara empiris untuk memrediksi konsumsi bahan bakar dengan tenaga mesin
set.seed(30000)
cv.polyreg = function(hyperpar, dataset){
res = NULL
for (i in hyperpar) {
train.control <- trainControl(method = "cv", number = 5)
stm = paste0("mpg ~ poly(horsepower,",i,")")
model <- train(as.formula(stm), data = dataset, method = "lm",
trControl = train.control)
temp =cbind(model = "polynomial",hyper = i , model$results )
res=rbind(res,temp)
model$finalModel
}
res
}
cv.piecewise = function(hyperpar, dataset){
res = NULL
for (i in hyperpar) {
train.control <- trainControl(method = "cv", number = 5)
stm = paste0("mpg ~ cut(horsepower,",i,")")
model <- train(as.formula(stm), data = dataset, method = "lm",
trControl = train.control)
temp =cbind(model ="piecewise",hyper = i , model$results )
res=rbind(res,temp)
model$finalModel
}
res
}
regspline = lm(mpg ~ ns(horsepower, df =))
cv.ncubicspline = function(hyperpar, dataset){
res = NULL
for (i in hyperpar) {
train.control <- trainControl(method = "cv", number = 5)
stm = paste0("mpg ~ ns(horsepower,df = ",i,")")
model <- train(as.formula(stm), data = dataset, method = "lm",
trControl = train.control)
temp =cbind(model ="ncubicspline",hyper = i , model$results )
res=rbind(res,temp)
model$finalModel
}
res
}
runNonLinear = function(dataset,hypoly=2:20, hypw=2:50,hyncs=2:50){
resultpoly = cv.polyreg(hypoly,dataset)
resultpw = cv.piecewise(hypw,dataset)
resultncs = cv.ncubicspline(hyncs,dataset)
allresult = rbind(resultpoly,resultpw,resultncs)
}
#Fungsi plot
fmPlot = function(model,dataset){
ggplot(dataset,aes(x=horsepower, y=mpg)) +
geom_point(alpha=0.55, color="black")+
stat_smooth(method = "lm",
formula = as.formula(model),
lty = 1,se=F)+labs(title = paste("Final Model Plot :",model))
}Results
All Regions
Dilakukan polynomial regresion untuk degree 2-20, piecewiese dengan cut 2-50, dan natural cubic spline untuk df 2-50
allres = runNonLinear(dataset)
ggplot(allres,aes(x=hyper, y=RMSE,group=model)) +
geom_line(aes( color=model), size=1) +labs(title = "Model Comparison")| model | hyper | intercept | RMSE | Rsquared | MAE | RMSESD | RsquaredSD | MAESD |
|---|---|---|---|---|---|---|---|---|
| ncubicspline | 30 | TRUE | 4.219206 | 0.710297 | 3.177063 | 0.3857166 | 0.0437547 | 0.338039 |
Berdasarkan pemodelan tersebut, dipilih natural cubic spline dengan df=30 yang memiliki RMSE terkecil. Berikut plot model tersebut :
America
Dilakukan polynomial regresion untuk degree 2-20, piecewiese dengan cut 2-50, dan natural cubic spline untuk df 2-50
datamerica = dataset %>% filter(origin ==1)
ameres = runNonLinear(datamerica)
ameres %>% slice(which.min(RMSE)) %>% kbl() %>% kable_styling()| model | hyper | intercept | RMSE | Rsquared | MAE | RMSESD | RsquaredSD | MAESD |
|---|---|---|---|---|---|---|---|---|
| piecewise | 18 | TRUE | 3.639119 | 0.676671 | 2.755722 | 0.4676462 | 0.0822582 | 0.3692884 |
Berdasarkan pemodelan tersebut, dipilih piecewise regression dengan cut =18 yang memiliki RMSE terkecil. Berikut plot model tersebut :
Europe
Dilakukan polynomial regresion untuk degree 2-19, piecewiese dengan cut 2-50, dan natural cubic spline untuk df 2-50
dateropa = dataset %>% filter(origin ==2)
eropares = runNonLinear(dateropa,hypoly = 2:19)
eropares %>% slice(which.min(RMSE)) %>% kbl() %>% kable_styling()| model | hyper | intercept | RMSE | Rsquared | MAE | RMSESD | RsquaredSD | MAESD |
|---|---|---|---|---|---|---|---|---|
| polynomial | 3 | TRUE | 4.900887 | 0.4754153 | 3.840624 | 0.6362768 | 0.163826 | 0.566287 |
Berdasarkan pemodelan tersebut, dipilih polynomial derajat 3 yang memiliki RMSE terkecil. Berikut plot model tersebut :
Japan
Dilakukan polynomial regresion untuk degree 2-19, piecewiese dengan cut 2-50, dan natural cubic spline untuk df 2-50
datjapan = dataset %>% filter(origin ==3)
japanres = runNonLinear(datjapan,hypoly = 1:18)
japanres %>% slice(which.min(RMSE)) %>% kbl() %>% kable_styling()| model | hyper | intercept | RMSE | Rsquared | MAE | RMSESD | RsquaredSD | MAESD |
|---|---|---|---|---|---|---|---|---|
| polynomial | 3 | TRUE | 3.874093 | 0.6218529 | 2.972925 | 1.221668 | 0.149859 | 0.866784 |
Berdasarkan pemodelan tersebut, dipilih polynomial regression dengan derajat 3 yang memiliki RMSE terkecil. Berikut plot model tersebut :