1. Pendahuluan
1.1. Latar Belakang
Pengepasan kurva (fitting curve) merupakan salah satu masalah yang dihadapi dalam bidang statistika yang berakhir pada masalah pendugaan garis regresi, misalnya regresi nonlinear. Oleh karena itu, diperlukan pemodelan yang tepat untuk mengetahui hubungan fungsional antara satu peubah dengan peubah lain.
1.2. Deskripsi Data
- Data yang digunakan terdiri atas 2 jenis data, yaitu:
- Data bangkitan dengan 1000 amatan yang menyebar normal dengan mean sebesar 2 dan ragam sebesar 1.
- Dataset rock Dataset rock terdiri atas 48 amatan dan 4 peubah dengan
rincian tipe peubah:
- area (numeric)
- peri (numeric)
- shape (numeric)
- perm (numeric) Pada kasus kali ini, peubah yang akan digunakan adalah peubah area dan peri.
1.3. Package
- Package yang digunakan dalam analisis:
- tidyverse
- ggplot2
- dplyr
- purrr
- rsample
- splines
2. Pemodelan Nonlinear Data Bangkitan
2.1. Inisiasi Data
set.seed(82)
data.x <- rnorm(1000,2,1)
err <- rnorm(1000)
y <- 5+2*data.x+6*data.x^2+err
plot(data.x,y,xlim=c(-2,5), ylim=c(-10,70))
Berdasarkan plot di atas, dapat dilihat bahwa plot membentuk pola yang
tidak linear. Hal ini dapat dilihat pada pola grafik yang tidak
membentuk garis lurus.
2.2. Pemodelan Data Linier
lin.mod <-lm( y~data.x)
plot( data.x,y,xlim=c( -2,5), ylim=c( -10,70))
lines(data.x,lin.mod$fitted.values,col="red")
summary(lin.mod)
##
## Call:
## lm(formula = y ~ data.x)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.218 -4.913 -2.860 1.899 50.580
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -13.1952 0.5507 -23.96 <2e-16 ***
## data.x 25.9413 0.2476 104.76 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.633 on 998 degrees of freedom
## Multiple R-squared: 0.9166, Adjusted R-squared: 0.9166
## F-statistic: 1.097e+04 on 1 and 998 DF, p-value: < 2.2e-16
Berdasarkan plot hasil pemodelan data linear, garis regresi linear tidak mengikuti pola data.
2.3. Pemodelan Data Polynomial
pol.mod <- lm( y~data.x+I(data.x^2)) #Ordo 2
ix <- sort( data.x,index.return=T)$ix
plot(data.x,y,xlim=c(-2,5), ylim=c(-10,70))
lines(data.x[ix], pol.mod$fitted.values[ix],col="blue", cex=2)
summary(pol.mod)
##
## Call:
## lm(formula = y ~ data.x + I(data.x^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2431 -0.7114 -0.0032 0.7236 3.0194
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.03639 0.10738 46.91 <2e-16 ***
## data.x 2.00131 0.10768 18.59 <2e-16 ***
## I(data.x^2) 5.98938 0.02563 233.69 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.023 on 997 degrees of freedom
## Multiple R-squared: 0.9985, Adjusted R-squared: 0.9985
## F-statistic: 3.33e+05 on 2 and 997 DF, p-value: < 2.2e-16
Berdasarkan plot pemodelan data polynomial, dapat dilihat bahwa garis berwarna biru mengikuti pola data. Hal ini sudah sesuai dengan model yang dibangun, yakni model polynomial ordo 2.
2.4. Pemodelan Data Fungsi Tangga
#regresi fungsi tangga
range(data.x)
## [1] -1.074611 4.694234
c1 <- as.factor(ifelse(data.x<=0,1,0))
c2 <- as.factor(ifelse(data.x<=2 & data.x>0,1,0))
c3 <- as.factor(ifelse(data.x>2,1,0))
step.mod <- lm(y~c1+c2+c3)
plot(data.x,y,xlim=c(-2,5), ylim=c(-10,70))
lines(data.x,lin.mod$fitted.values,col="red")
lines(data.x[ix], step.mod$fitted.values[ix],col="green")
Berdasarkan output di atas, garis hijau merupakan fungsi tangga dan
garis merah merupakan fungsi regresi linear. Dapat dilihat bahwa garis
hijau membentuk 3 selang pada data. Selang pertama adalah data.x dalam
rentang -1.1 sampai 0. Selang kedua adalah data.x dalam rentang 0 sampai
2 dan selang ketiga adalah data.x dalam rentang 2 sampai 4.7. Secata
eksploratif, garis fungsi tangga lebih mengikuti pola data dibandingkan
dengan garis fungsi regresi linear.
summary(step.mod)
##
## Call:
## lm(formula = y ~ c1 + c2 + c3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.217 -10.863 -2.345 7.748 86.626
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 59.0160 0.7495 78.74 <2e-16 ***
## c11 -53.3047 3.9173 -13.61 <2e-16 ***
## c21 -40.2159 1.0704 -37.57 <2e-16 ***
## c31 NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.76 on 997 degrees of freedom
## Multiple R-squared: 0.5985, Adjusted R-squared: 0.5977
## F-statistic: 743.2 on 2 and 997 DF, p-value: < 2.2e-16
2.5. Perbandingan Hasil Pemodelan
MSE = function(pred, actual){
mean((pred-actual)^2)
}
model <- data.frame("Model" = c("Linear", "Polynomial (2)", "Step (6)"), "AIC" = c(AIC(lin.mod), AIC(pol.mod), AIC(step.mod)), "MSE" = c(MSE(predict(lin.mod), y), MSE(predict(pol.mod), y), MSE(predict(step.mod), y)))
knitr::kable(model)
| Model | AIC | MSE |
|---|---|---|
| Linear | 6906.859 | 58.14743 |
| Polynomial (2) | 2887.528 | 1.04253 |
| Step (6) | 8480.800 | 280.03731 |
Model terbaik saat ini berdasarkan nilai AIC dan MSE yang paling rendah. Berdasarkan ketiga hasil pemodelan, didapatkan bahwa model tersebut tidak linier. Model pada data bangkitan ini merupakan model polynomial dengan derajat 2 karena pada model tersebut, nilai AIC dan MSE-nya yang paling kecil (2887,528 dan 1,04253) diantara model lainnya.
2.6. Piecewise Cubic Polynomial Modelling
plot(data.x,y,xlim=c(-2,5),ylim=c(-10,70),pch=16,col="orange")
abline(v=1,col="red",lty=2)
plot(data.x,y,xlim=c(-2,5),ylim=c(-10,70),pch=16,col="orange")
abline(v=1,col="red",lty=2)
#piecewise cubic polynomial
dt.all <- cbind(y,data.x)
##knots = 1
sm.dt <- cbind(data.x, y)
dt1 <- dt.all[data.x <=1,]
dim(dt1)
## [1] 153 2
dt2 <- dt.all[data.x >1,]
dim(dt2)
## [1] 847 2
cub.mod1 <- lm(dt1[,1]~dt1[,2]+I(dt1[,2]^2)+I(dt1[,2]^3))
ix <- sort(dt1[,2],index.return=T)$ix
lines(dt1[ix,2],cub.mod1$fitted.values[ix],col="blue",lwd=2)
cub.mod2 <- lm(dt2[,1]~dt2[,2]+I(dt2[,2]^2)+I(dt2[,2]^3))
ix <- sort(dt2[,2],index.return=T)$ix
lines(dt2[ix,2],cub.mod2$fitted.values[ix],col="blue",lwd=2)
Berdasarkan output di atas dengan knots = 1, jumlah observasi disebelah
kiri garis pembatas ada sebanyak 502, sedangkan disebelah kanan garis
pembatas ada sebanyak 498.
2.7. Truncated Power Basis
# untuk menyambungkan garis regresi
plot(data.x, y, xlim=c(-2,5), ylim=c(-10,70), pch=16, col="orange")
abline(v=1, col="red", lty=2)
hx <- ifelse(data.x > 1, (data.x-1)^3, 0)
cubspline.mod <- lm(y~data.x+I(data.x^2)+I(data.x^3)+hx)
ix <- sort(data.x, index.return=T)$ix
lines(data.x[ix], cubspline.mod$fitted.values[ix],col="green",lwd=2)
2.8. Perbandingan dengan k-fold CV
# Pemisah di 0 dan 2
plot(data.x, y, xlim=c(-2,5), ylim=c(-10,70), pch=16, col="orange")
abline(v=0, col="red", lty=2)
abline(v=2, col="red", lty=2)
hx1 <- ifelse(data.x>0,(data.x-0)^3,0)
hx2 <- ifelse(data.x>2,(data.x-2)^3,0)
cubspline.mod2 <- lm(y~data.x+I(data.x^2)+I(data.x^3)+hx1+hx2)
ix<-sort(data.x, index.return=T)$ix
lines(data.x[ix],cubspline.mod2$fitted.values[ix],col="green", lwd=2)
2.9. Spline Regression
1. Menentukan banyaknya fungsi basis dan knots
dt.all <- cbind(data.x, y)
dt.all <- as.data.frame(dt.all)
knots.val <- attr(splines::bs(dt.all$data.x, df=6), "knots")
knots.val
## 25% 50% 75%
## 1.328286 2.000412 2.664781
2. Pemodelan regresi splines
spline.mod <- lm(y ~ splines::bs(data.x, knots=knots.val), data=dt.all)
summary(spline.mod)
##
## Call:
## lm(formula = y ~ splines::bs(data.x, knots = knots.val), data = dt.all)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2684 -0.7289 0.0055 0.7097 3.0228
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.4995 0.7525 12.624 < 2e-16
## splines::bs(data.x, knots = knots.val)1 -8.0240 1.1311 -7.094 2.47e-12
## splines::bs(data.x, knots = knots.val)2 -5.0592 0.6930 -7.300 5.89e-13
## splines::bs(data.x, knots = knots.val)3 22.6838 0.7859 28.864 < 2e-16
## splines::bs(data.x, knots = knots.val)4 55.8373 0.7559 73.873 < 2e-16
## splines::bs(data.x, knots = knots.val)5 98.0008 0.8802 111.345 < 2e-16
## splines::bs(data.x, knots = knots.val)6 136.5119 0.8856 154.144 < 2e-16
##
## (Intercept) ***
## splines::bs(data.x, knots = knots.val)1 ***
## splines::bs(data.x, knots = knots.val)2 ***
## splines::bs(data.x, knots = knots.val)3 ***
## splines::bs(data.x, knots = knots.val)4 ***
## splines::bs(data.x, knots = knots.val)5 ***
## splines::bs(data.x, knots = knots.val)6 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.024 on 993 degrees of freedom
## Multiple R-squared: 0.9985, Adjusted R-squared: 0.9985
## F-statistic: 1.108e+05 on 6 and 993 DF, p-value: < 2.2e-16
3. Plot Spline Regression
library(ggplot2)
ggplot(dt.all, aes(x=data.x, y=y)) + geom_point(color="black") + stat_smooth(method="lm", formula=y~splines::bs(x, knots=knots.val), lty=1, se=F)
2.10. Smoothing Spline
Smoothing Spline digunakan untuk memperoleh parameter lambda terbaik dengan menggunakan CV dan GCV.
ss1 <- smooth.spline(data.x, y, all.knots=T); ss1
## Call:
## smooth.spline(x = data.x, y = y, all.knots = T)
##
## Smoothing Parameter spar= 1.499948 lambda= 0.0001539026 (26 iterations)
## Equivalent Degrees of Freedom (Df): 17.22686
## Penalized Criterion (RSS): 1020.323
## GCV: 1.056407
plot(data.x, y, xlim=c(-2,5), ylim=c(-10,70), pch=16, col="orange")
lines(ss1,col="purple", lwd=2)
Berdasarkan output di atas, diperoleh parameter lambda terbaik sebesar
0.0001377739 dengan GCV 1.045209.
2.11. LOESS Function
loess.tr <- loess(formula=y~data.x, data=dt.all)
summary(loess.tr)
## Call:
## loess(formula = y ~ data.x, data = dt.all)
##
## Number of Observations: 1000
## Equivalent Number of Parameters: 5.19
## Residual Standard Error: 1.024
## Trace of smoother matrix: 5.68 (exact)
##
## Control settings:
## span : 0.75
## degree : 2
## family : gaussian
## surface : interpolate cell = 0.2
## normalize: TRUE
## parametric: FALSE
## drop.square: FALSE
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.2.1
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
loess.span.tr <- data.frame(span=seq(0.1,2,by=0.1)) %>%
group_by(span) %>%
do(broom::augment(loess(y~data.x, data=dt.all,span=.$span)))
ggplot(loess.span.tr, aes(x=data.x, y=y)) + geom_line(aes(y=.fitted), col="blue", lty=1) + facet_wrap(~span)
2.12. Perbandingan Eksploratif seluruh Pemodelan
ggplot(dt.all, aes(x=data.x, y=y)) + geom_point(color="black") + stat_smooth(method="lm", formula=y~x, col="red", show.legend=T) + stat_smooth(method="lm", formula=y~poly(x,2,raw=T), col="blue", lwd=3.5, show.legend=T) + stat_smooth(method="lm", formula=y~cut(x,6), col="green", show.legend=T) + stat_smooth(method="lm", formula=y~splines::bs(x, knots=knots.val), col="yellow", lwd=2, show.legend=T) + stat_smooth(method="loess", formula=y~x, col="orange", show.legend=T)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
model.tr2 <- data.frame(Model = c("Linear", "Polynomial (2)", "Step-Function (6)", "Spline Regression", "LOESS Function"), MSE = c(MSE(predict(lin.mod), y), MSE(predict(pol.mod), y), MSE(predict(step.mod), y), MSE(predict(spline.mod), y), MSE(predict(loess.tr), y)))
knitr::kable(model.tr2)
| Model | MSE |
|---|---|
| Linear | 58.147427 |
| Polynomial (2) | 1.042530 |
| Step-Function (6) | 280.037308 |
| Spline Regression | 1.040312 |
| LOESS Function | 1.042016 |
Pemodelan terbaik untuk data bangkitan: LOESS Function(Berdasarkan nilai MSE terkecil).
3. Pemodelan Nonlinier Data Rock
3.1. Data Rock
library(dplyr)
data.tambah <- rock %>% select(area, peri)
knitr::kable(head(data.tambah))
| area | peri |
|---|---|
| 4990 | 2791.90 |
| 7002 | 3892.60 |
| 7558 | 3930.66 |
| 7352 | 3869.32 |
| 7943 | 3948.54 |
| 7979 | 4010.15 |
3.2. Pemodelan Data Linear
Model: area~peri
lin.mod1 = lm(area~peri ,data=rock)
summary(lin.mod1)
##
## Call:
## lm(formula = area ~ peri, data = rock)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2511.9 -1282.7 -80.1 790.8 4375.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3052.0181 476.8561 6.400 7.26e-08 ***
## peri 1.5419 0.1572 9.808 7.51e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1543 on 46 degrees of freedom
## Multiple R-squared: 0.6765, Adjusted R-squared: 0.6695
## F-statistic: 96.2 on 1 and 46 DF, p-value: 7.506e-13
library(ggplot2)
ggplot(rock,aes(x=peri, y=area)) +
geom_point(alpha=0.55, color="black") +
theme_bw()+ stat_smooth(method = "lm", formula = y~x,lty = 1, col = "yellow",se = F)
3.4. Pemodelan Data Polynomial
#menggunakan derajat bebas polinomial 3
ggplot(data.tambah, aes(x=peri, y=area)) + geom_point(color="black") + stat_smooth(method="lm", formula=y~poly(x,3, raw=T), col="blue", se=F)
Berdasarkan plot pemodelan data polynomial, dapat dilihat bahwa garis
berwarna biru mengikuti pola data. Hal ini sudah sesuai dengan model
yang dibangun, yakni model polynomial derajat bebas 3.
3.5. Pemodelan Data Fungsi Tangga
#menggunakan derajar bebas 12
ggplot(data.tambah, aes(x=peri, y=area)) + geom_point(color="black") + stat_smooth(method="lm", formula=y~cut(x,12), col="blue", se=F)
3.6. Perbandingan Hasil Pemodelan
pol.mod1 <- lm(area~poly(peri, 3, raw=T), data=data.tambah)
step.mod1 <- lm(area~cut(peri, 12, raw=T), data=data.tambah)
model.rock <- data.frame("Model" = c("Linear (area~peri)", "Polynomial (3) (area~peri)", "Step (12) (area~peri)"), "AIC" = c(AIC(lin.mod1),AIC(pol.mod1), AIC(step.mod1)), "MSE" = c(MSE(predict(lin.mod1),data.tambah$area), MSE(predict(pol.mod1),data.tambah$area), MSE(predict(step.mod1),data.tambah$area)))
knitr::kable(model.rock)
| Model | AIC | MSE |
|---|---|---|
| Linear (area~peri) | 844.9550 | 2281521 |
| Polynomial (3) (area~peri) | 829.8618 | 1532754 |
| Step (12) (area~peri) | 829.3903 | 1087532 |
Berdasarkah hasil pemodelan, diketahui bahwa untuk model hubungan peubah area~peri lebih baik menggunakan model fungsi tangga (12). Model terbaik ini masih bersifat sementara karena masih ada pemodelan lainnya.
3.7. Spline Regression
knots.val.rock <- attr(splines::bs(data.tambah$peri, df=6), "knots")
knots.val.rock
## 25% 50% 75%
## 1414.907 2536.195 3989.523
3.7.1 b-Spline Model
spline.mod.rock <- lm(area ~ splines::bs(peri, knots=knots.val.rock), data=data.tambah)
summary(spline.mod.rock)
##
## Call:
## lm(formula = area ~ splines::bs(peri, knots = knots.val.rock),
## data = data.tambah)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2463.3 -751.4 -95.4 792.9 3444.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 931.8 1156.8 0.806 0.425167
## splines::bs(peri, knots = knots.val.rock)1 1096.0 2267.7 0.483 0.631442
## splines::bs(peri, knots = knots.val.rock)2 6798.8 1449.8 4.689 3.03e-05
## splines::bs(peri, knots = knots.val.rock)3 5303.9 2216.1 2.393 0.021355
## splines::bs(peri, knots = knots.val.rock)4 6966.1 1633.8 4.264 0.000115
## splines::bs(peri, knots = knots.val.rock)5 9348.9 1636.6 5.712 1.11e-06
## splines::bs(peri, knots = knots.val.rock)6 10962.7 1530.3 7.164 9.69e-09
##
## (Intercept)
## splines::bs(peri, knots = knots.val.rock)1
## splines::bs(peri, knots = knots.val.rock)2 ***
## splines::bs(peri, knots = knots.val.rock)3 *
## splines::bs(peri, knots = knots.val.rock)4 ***
## splines::bs(peri, knots = knots.val.rock)5 ***
## splines::bs(peri, knots = knots.val.rock)6 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1324 on 41 degrees of freedom
## Multiple R-squared: 0.7877, Adjusted R-squared: 0.7567
## F-statistic: 25.36 on 6 and 41 DF, p-value: 2.448e-12
3.7.2 Natural Spline
ns.spline.rock <- lm(area ~ splines::ns(peri, knots=knots.val.rock), data=data.tambah)
summary(ns.spline.rock)
##
## Call:
## lm(formula = area ~ splines::ns(peri, knots = knots.val.rock),
## data = data.tambah)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2426.5 -790.3 -79.1 799.2 3523.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 570.7 860.4 0.663 0.511
## splines::ns(peri, knots = knots.val.rock)1 6215.8 1395.9 4.453 5.94e-05
## splines::ns(peri, knots = knots.val.rock)2 5432.5 1019.6 5.328 3.44e-06
## splines::ns(peri, knots = knots.val.rock)3 15507.4 1995.8 7.770 1.01e-09
## splines::ns(peri, knots = knots.val.rock)4 7148.5 829.6 8.617 6.46e-11
##
## (Intercept)
## splines::ns(peri, knots = knots.val.rock)1 ***
## splines::ns(peri, knots = knots.val.rock)2 ***
## splines::ns(peri, knots = knots.val.rock)3 ***
## splines::ns(peri, knots = knots.val.rock)4 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1297 on 43 degrees of freedom
## Multiple R-squared: 0.7862, Adjusted R-squared: 0.7663
## F-statistic: 39.54 on 4 and 43 DF, p-value: 7.031e-14
3.8. Smoothing Spline
ss.rock <- with(data=data.tambah, smooth.spline(peri, area))
ss.rock
## Call:
## smooth.spline(x = peri, y = area)
##
## Smoothing Parameter spar= -1.499562 lambda= 4.4911e-21 (37 iterations)
## Equivalent Degrees of Freedom (Df): 47.00319
## Penalized Criterion (RSS): 6.669407e-14
## GCV: 3.221852e-12
pred.ss1 <- broom::augment(ss.rock)
ggplot(pred.ss1, aes(x=x, y=y)) + geom_point(color="orange") + geom_line(aes(y=.fitted), col="blue", lty=1) + xlab("peri") + ylab("area")
3.9. LOESS Function
loess.rock <- loess(area~peri, data=data.tambah)
summary(loess.rock)
## Call:
## loess(formula = area ~ peri, data = data.tambah)
##
## Number of Observations: 48
## Equivalent Number of Parameters: 4.16
## Residual Standard Error: 1307
## Trace of smoother matrix: 4.53 (exact)
##
## Control settings:
## span : 0.75
## degree : 2
## family : gaussian
## surface : interpolate cell = 0.2
## normalize: TRUE
## parametric: FALSE
## drop.square: FALSE
ggplot(data.tambah, aes(x=peri, y=area)) + geom_point(color="black") + stat_smooth(method="loess", formula=y~x, col="red", show.legend=T)
3.10. Perbandingan Eksploratif seluruh Pemodelan
ggplot(data.tambah, aes(x=peri, y=area)) + geom_point(color="black") + stat_smooth(method="lm", formula=y~poly(x,3,raw=T), col="blue", lwd=1.2, se=F) + stat_smooth(method="lm", formula=y~cut(x,12), col="green", lwd=1.2, se=F) + stat_smooth(method="lm", formula=y~bs(x, knots=knots.val.rock), col="yellow", lwd=1.2, se=F) + stat_smooth(method="lm", formula=y~ns(x, knots=knots.val.rock), col="darkgreen", lwd=1.2, se=F) + stat_smooth(method="loess", formula=y~x, col="orange", lwd=1.2, se=F) + stat_smooth(method="lm", formula=y~x, col="red", lwd=1.2, se=F)
## Warning: Computation failed in `stat_smooth()`
## Caused by error in `bs()`:
## ! could not find function "bs"
## Warning: Computation failed in `stat_smooth()`
## Caused by error in `ns()`:
## ! could not find function "ns"
model.rock2 <- data.frame(Model = c("Linear", "Polynomial (3)", "Step-Function (12)", "Spline Regression", "Natural Spline Regression", "LOESS Function"), MSE = c(MSE(predict(lin.mod1), data.tambah$area), MSE(predict(pol.mod1), data.tambah$area), MSE(predict(step.mod1), data.tambah$area), MSE(predict(spline.mod.rock), data.tambah$area), MSE(predict(ns.spline.rock), data.tambah$area), MSE(predict(loess.rock), data.tambah$area)))
knitr::kable(model.rock2)
| Model | MSE |
|---|---|
| Linear | 2281521 |
| Polynomial (3) | 1532754 |
| Step-Function (12) | 1087532 |
| Spline Regression | 1497187 |
| Natural Spline Regression | 1507754 |
| LOESS Function | 1533233 |
Pemodelan terbaik untuk data rock area~peri: Step-Function with 12 breaks (Berdasarkan nilai MSE terkecil).