Tugas Responsi UAS 1 - STA1381 Pengantar Sains Data

2022-11-05

1. Pendahuluan

1.1. Latar Belakang

Pengepasan kurva (fitting curve) merupakan salah satu masalah yang dihadapi dalam bidang statistika yang berakhir pada masalah pendugaan garis regresi, misalnya regresi nonlinear. Oleh karena itu, diperlukan pemodelan yang tepat untuk mengetahui hubungan fungsional antara satu peubah dengan peubah lain.

1.2. Deskripsi Data

  • Data yang digunakan terdiri atas 2 jenis data, yaitu:
    • Data bangkitan dengan 1000 amatan yang menyebar normal dengan mean sebesar 2 dan ragam sebesar 1.
    • Dataset rock Dataset rock terdiri atas 48 amatan dan 4 peubah dengan rincian tipe peubah:
      • area (numeric)
      • peri (numeric)
      • shape (numeric)
      • perm (numeric) Pada kasus kali ini, peubah yang akan digunakan adalah peubah area dan peri.

1.3. Package

  • Package yang digunakan dalam analisis:
    1. tidyverse
    2. ggplot2
    3. dplyr
    4. purrr
    5. rsample
    6. splines

2. Pemodelan Nonlinear Data Bangkitan

2.1. Inisiasi Data

set.seed(82)
data.x <- rnorm(1000,2,1)
err <- rnorm(1000)
y <- 5+2*data.x+6*data.x^2+err
plot(data.x,y,xlim=c(-2,5), ylim=c(-10,70))

Berdasarkan plot di atas, dapat dilihat bahwa plot membentuk pola yang tidak linear. Hal ini dapat dilihat pada pola grafik yang tidak membentuk garis lurus.

2.2. Pemodelan Data Linier

lin.mod <-lm( y~data.x)
plot( data.x,y,xlim=c( -2,5), ylim=c( -10,70))
lines(data.x,lin.mod$fitted.values,col="red")

summary(lin.mod)
## 
## Call:
## lm(formula = y ~ data.x)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.218 -4.913 -2.860  1.899 50.580 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -13.1952     0.5507  -23.96   <2e-16 ***
## data.x       25.9413     0.2476  104.76   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.633 on 998 degrees of freedom
## Multiple R-squared:  0.9166, Adjusted R-squared:  0.9166 
## F-statistic: 1.097e+04 on 1 and 998 DF,  p-value: < 2.2e-16

Berdasarkan plot hasil pemodelan data linear, garis regresi linear tidak mengikuti pola data.

2.3. Pemodelan Data Polynomial

pol.mod <- lm( y~data.x+I(data.x^2)) #Ordo 2
ix <- sort( data.x,index.return=T)$ix
plot(data.x,y,xlim=c(-2,5), ylim=c(-10,70))
lines(data.x[ix], pol.mod$fitted.values[ix],col="blue", cex=2)

summary(pol.mod)
## 
## Call:
## lm(formula = y ~ data.x + I(data.x^2))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2431 -0.7114 -0.0032  0.7236  3.0194 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.03639    0.10738   46.91   <2e-16 ***
## data.x       2.00131    0.10768   18.59   <2e-16 ***
## I(data.x^2)  5.98938    0.02563  233.69   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.023 on 997 degrees of freedom
## Multiple R-squared:  0.9985, Adjusted R-squared:  0.9985 
## F-statistic: 3.33e+05 on 2 and 997 DF,  p-value: < 2.2e-16

Berdasarkan plot pemodelan data polynomial, dapat dilihat bahwa garis berwarna biru mengikuti pola data. Hal ini sudah sesuai dengan model yang dibangun, yakni model polynomial ordo 2.

2.4. Pemodelan Data Fungsi Tangga

#regresi fungsi tangga
range(data.x)
## [1] -1.074611  4.694234
c1 <- as.factor(ifelse(data.x<=0,1,0))
c2 <- as.factor(ifelse(data.x<=2 & data.x>0,1,0))
c3 <- as.factor(ifelse(data.x>2,1,0))
step.mod <- lm(y~c1+c2+c3)
plot(data.x,y,xlim=c(-2,5), ylim=c(-10,70))
lines(data.x,lin.mod$fitted.values,col="red")
lines(data.x[ix], step.mod$fitted.values[ix],col="green")

Berdasarkan output di atas, garis hijau merupakan fungsi tangga dan garis merah merupakan fungsi regresi linear. Dapat dilihat bahwa garis hijau membentuk 3 selang pada data. Selang pertama adalah data.x dalam rentang -1.1 sampai 0. Selang kedua adalah data.x dalam rentang 0 sampai 2 dan selang ketiga adalah data.x dalam rentang 2 sampai 4.7. Secata eksploratif, garis fungsi tangga lebih mengikuti pola data dibandingkan dengan garis fungsi regresi linear.

summary(step.mod)
## 
## Call:
## lm(formula = y ~ c1 + c2 + c3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -27.217 -10.863  -2.345   7.748  86.626 
## 
## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  59.0160     0.7495   78.74   <2e-16 ***
## c11         -53.3047     3.9173  -13.61   <2e-16 ***
## c21         -40.2159     1.0704  -37.57   <2e-16 ***
## c31               NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.76 on 997 degrees of freedom
## Multiple R-squared:  0.5985, Adjusted R-squared:  0.5977 
## F-statistic: 743.2 on 2 and 997 DF,  p-value: < 2.2e-16

2.5. Perbandingan Hasil Pemodelan

MSE = function(pred, actual){
  mean((pred-actual)^2)
}
model <- data.frame("Model" = c("Linear", "Polynomial (2)", "Step (6)"), "AIC" = c(AIC(lin.mod), AIC(pol.mod), AIC(step.mod)), "MSE" = c(MSE(predict(lin.mod), y), MSE(predict(pol.mod), y), MSE(predict(step.mod), y)))
knitr::kable(model)
Model AIC MSE
Linear 6906.859 58.14743
Polynomial (2) 2887.528 1.04253
Step (6) 8480.800 280.03731

Model terbaik saat ini berdasarkan nilai AIC dan MSE yang paling rendah. Berdasarkan ketiga hasil pemodelan, didapatkan bahwa model tersebut tidak linier. Model pada data bangkitan ini merupakan model polynomial dengan derajat 2 karena pada model tersebut, nilai AIC dan MSE-nya yang paling kecil (2887,528 dan 1,04253) diantara model lainnya.

2.6. Piecewise Cubic Polynomial Modelling

plot(data.x,y,xlim=c(-2,5),ylim=c(-10,70),pch=16,col="orange")
abline(v=1,col="red",lty=2)

plot(data.x,y,xlim=c(-2,5),ylim=c(-10,70),pch=16,col="orange")
abline(v=1,col="red",lty=2)
#piecewise cubic polynomial
dt.all <- cbind(y,data.x)
##knots = 1
sm.dt <- cbind(data.x, y)
dt1 <- dt.all[data.x <=1,]
dim(dt1)
## [1] 153   2
dt2 <- dt.all[data.x >1,]
dim(dt2)
## [1] 847   2
cub.mod1 <- lm(dt1[,1]~dt1[,2]+I(dt1[,2]^2)+I(dt1[,2]^3))
ix <- sort(dt1[,2],index.return=T)$ix
lines(dt1[ix,2],cub.mod1$fitted.values[ix],col="blue",lwd=2)
cub.mod2 <- lm(dt2[,1]~dt2[,2]+I(dt2[,2]^2)+I(dt2[,2]^3))
ix <- sort(dt2[,2],index.return=T)$ix
lines(dt2[ix,2],cub.mod2$fitted.values[ix],col="blue",lwd=2)

Berdasarkan output di atas dengan knots = 1, jumlah observasi disebelah kiri garis pembatas ada sebanyak 502, sedangkan disebelah kanan garis pembatas ada sebanyak 498.

2.7. Truncated Power Basis

# untuk menyambungkan garis regresi
plot(data.x, y, xlim=c(-2,5), ylim=c(-10,70), pch=16, col="orange")
abline(v=1, col="red", lty=2)

hx <- ifelse(data.x > 1, (data.x-1)^3, 0)

cubspline.mod <- lm(y~data.x+I(data.x^2)+I(data.x^3)+hx)
ix <- sort(data.x, index.return=T)$ix
lines(data.x[ix], cubspline.mod$fitted.values[ix],col="green",lwd=2)

2.8. Perbandingan dengan k-fold CV

# Pemisah di 0 dan 2
plot(data.x, y, xlim=c(-2,5), ylim=c(-10,70), pch=16, col="orange")
abline(v=0, col="red", lty=2)
abline(v=2, col="red", lty=2)

hx1 <- ifelse(data.x>0,(data.x-0)^3,0)
hx2 <- ifelse(data.x>2,(data.x-2)^3,0)

cubspline.mod2 <- lm(y~data.x+I(data.x^2)+I(data.x^3)+hx1+hx2)
ix<-sort(data.x, index.return=T)$ix
lines(data.x[ix],cubspline.mod2$fitted.values[ix],col="green", lwd=2)

2.9. Spline Regression

1. Menentukan banyaknya fungsi basis dan knots

dt.all <- cbind(data.x, y)
dt.all <- as.data.frame(dt.all)
knots.val <- attr(splines::bs(dt.all$data.x, df=6), "knots")
knots.val
##      25%      50%      75% 
## 1.328286 2.000412 2.664781

2. Pemodelan regresi splines

spline.mod <- lm(y ~ splines::bs(data.x, knots=knots.val), data=dt.all)
summary(spline.mod)
## 
## Call:
## lm(formula = y ~ splines::bs(data.x, knots = knots.val), data = dt.all)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2684 -0.7289  0.0055  0.7097  3.0228 
## 
## Coefficients:
##                                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)                               9.4995     0.7525  12.624  < 2e-16
## splines::bs(data.x, knots = knots.val)1  -8.0240     1.1311  -7.094 2.47e-12
## splines::bs(data.x, knots = knots.val)2  -5.0592     0.6930  -7.300 5.89e-13
## splines::bs(data.x, knots = knots.val)3  22.6838     0.7859  28.864  < 2e-16
## splines::bs(data.x, knots = knots.val)4  55.8373     0.7559  73.873  < 2e-16
## splines::bs(data.x, knots = knots.val)5  98.0008     0.8802 111.345  < 2e-16
## splines::bs(data.x, knots = knots.val)6 136.5119     0.8856 154.144  < 2e-16
##                                            
## (Intercept)                             ***
## splines::bs(data.x, knots = knots.val)1 ***
## splines::bs(data.x, knots = knots.val)2 ***
## splines::bs(data.x, knots = knots.val)3 ***
## splines::bs(data.x, knots = knots.val)4 ***
## splines::bs(data.x, knots = knots.val)5 ***
## splines::bs(data.x, knots = knots.val)6 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.024 on 993 degrees of freedom
## Multiple R-squared:  0.9985, Adjusted R-squared:  0.9985 
## F-statistic: 1.108e+05 on 6 and 993 DF,  p-value: < 2.2e-16

3. Plot Spline Regression

library(ggplot2)
ggplot(dt.all, aes(x=data.x, y=y)) + geom_point(color="black") + stat_smooth(method="lm", formula=y~splines::bs(x, knots=knots.val), lty=1, se=F)

2.10. Smoothing Spline

Smoothing Spline digunakan untuk memperoleh parameter lambda terbaik dengan menggunakan CV dan GCV.

ss1 <- smooth.spline(data.x, y, all.knots=T); ss1
## Call:
## smooth.spline(x = data.x, y = y, all.knots = T)
## 
## Smoothing Parameter  spar= 1.499948  lambda= 0.0001539026 (26 iterations)
## Equivalent Degrees of Freedom (Df): 17.22686
## Penalized Criterion (RSS): 1020.323
## GCV: 1.056407
plot(data.x, y, xlim=c(-2,5), ylim=c(-10,70), pch=16, col="orange")
lines(ss1,col="purple", lwd=2)

Berdasarkan output di atas, diperoleh parameter lambda terbaik sebesar 0.0001377739 dengan GCV 1.045209.

2.11. LOESS Function

loess.tr <- loess(formula=y~data.x, data=dt.all)
summary(loess.tr)
## Call:
## loess(formula = y ~ data.x, data = dt.all)
## 
## Number of Observations: 1000 
## Equivalent Number of Parameters: 5.19 
## Residual Standard Error: 1.024 
## Trace of smoother matrix: 5.68  (exact)
## 
## Control settings:
##   span     :  0.75 
##   degree   :  2 
##   family   :  gaussian
##   surface  :  interpolate      cell = 0.2
##   normalize:  TRUE
##  parametric:  FALSE
## drop.square:  FALSE
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.2.1
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
loess.span.tr <- data.frame(span=seq(0.1,2,by=0.1)) %>% 
  group_by(span) %>% 
  do(broom::augment(loess(y~data.x, data=dt.all,span=.$span)))
ggplot(loess.span.tr, aes(x=data.x, y=y)) + geom_line(aes(y=.fitted), col="blue", lty=1) + facet_wrap(~span)

2.12. Perbandingan Eksploratif seluruh Pemodelan

ggplot(dt.all, aes(x=data.x, y=y)) + geom_point(color="black") + stat_smooth(method="lm", formula=y~x, col="red", show.legend=T) + stat_smooth(method="lm", formula=y~poly(x,2,raw=T), col="blue", lwd=3.5, show.legend=T) + stat_smooth(method="lm", formula=y~cut(x,6), col="green", show.legend=T) + stat_smooth(method="lm", formula=y~splines::bs(x, knots=knots.val), col="yellow", lwd=2, show.legend=T) + stat_smooth(method="loess", formula=y~x, col="orange", show.legend=T)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.

model.tr2 <- data.frame(Model = c("Linear", "Polynomial (2)", "Step-Function (6)", "Spline Regression", "LOESS Function"), MSE = c(MSE(predict(lin.mod), y), MSE(predict(pol.mod), y), MSE(predict(step.mod), y), MSE(predict(spline.mod), y), MSE(predict(loess.tr), y)))
knitr::kable(model.tr2)
Model MSE
Linear 58.147427
Polynomial (2) 1.042530
Step-Function (6) 280.037308
Spline Regression 1.040312
LOESS Function 1.042016

Pemodelan terbaik untuk data bangkitan: LOESS Function(Berdasarkan nilai MSE terkecil).

3. Pemodelan Nonlinier Data Rock

3.1. Data Rock

library(dplyr)
data.tambah <- rock %>% select(area, peri)
knitr::kable(head(data.tambah))
area peri
4990 2791.90
7002 3892.60
7558 3930.66
7352 3869.32
7943 3948.54
7979 4010.15

3.2. Pemodelan Data Linear

Model: area~peri

lin.mod1 = lm(area~peri ,data=rock)
summary(lin.mod1)
## 
## Call:
## lm(formula = area ~ peri, data = rock)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2511.9 -1282.7   -80.1   790.8  4375.4 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3052.0181   476.8561   6.400 7.26e-08 ***
## peri           1.5419     0.1572   9.808 7.51e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1543 on 46 degrees of freedom
## Multiple R-squared:  0.6765, Adjusted R-squared:  0.6695 
## F-statistic:  96.2 on 1 and 46 DF,  p-value: 7.506e-13
library(ggplot2)
ggplot(rock,aes(x=peri, y=area)) +
   geom_point(alpha=0.55, color="black") +
  theme_bw()+ stat_smooth(method = "lm", formula = y~x,lty = 1, col = "yellow",se = F)

3.4. Pemodelan Data Polynomial

#menggunakan derajat bebas polinomial 3
ggplot(data.tambah, aes(x=peri, y=area)) + geom_point(color="black") + stat_smooth(method="lm", formula=y~poly(x,3, raw=T), col="blue", se=F)

Berdasarkan plot pemodelan data polynomial, dapat dilihat bahwa garis berwarna biru mengikuti pola data. Hal ini sudah sesuai dengan model yang dibangun, yakni model polynomial derajat bebas 3.

3.5. Pemodelan Data Fungsi Tangga

#menggunakan derajar bebas 12
ggplot(data.tambah, aes(x=peri, y=area)) + geom_point(color="black") + stat_smooth(method="lm", formula=y~cut(x,12), col="blue", se=F)

3.6. Perbandingan Hasil Pemodelan

pol.mod1 <- lm(area~poly(peri, 3, raw=T), data=data.tambah)
step.mod1 <- lm(area~cut(peri, 12, raw=T), data=data.tambah)

model.rock <- data.frame("Model" = c("Linear (area~peri)", "Polynomial (3) (area~peri)", "Step (12) (area~peri)"), "AIC" = c(AIC(lin.mod1),AIC(pol.mod1), AIC(step.mod1)), "MSE" = c(MSE(predict(lin.mod1),data.tambah$area), MSE(predict(pol.mod1),data.tambah$area), MSE(predict(step.mod1),data.tambah$area)))
knitr::kable(model.rock)
Model AIC MSE
Linear (area~peri) 844.9550 2281521
Polynomial (3) (area~peri) 829.8618 1532754
Step (12) (area~peri) 829.3903 1087532

Berdasarkah hasil pemodelan, diketahui bahwa untuk model hubungan peubah area~peri lebih baik menggunakan model fungsi tangga (12). Model terbaik ini masih bersifat sementara karena masih ada pemodelan lainnya.

3.7. Spline Regression

knots.val.rock <- attr(splines::bs(data.tambah$peri, df=6), "knots")
knots.val.rock
##      25%      50%      75% 
## 1414.907 2536.195 3989.523

3.7.1 b-Spline Model

spline.mod.rock <- lm(area ~ splines::bs(peri, knots=knots.val.rock), data=data.tambah)
summary(spline.mod.rock)
## 
## Call:
## lm(formula = area ~ splines::bs(peri, knots = knots.val.rock), 
##     data = data.tambah)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2463.3  -751.4   -95.4   792.9  3444.4 
## 
## Coefficients:
##                                            Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                   931.8     1156.8   0.806 0.425167
## splines::bs(peri, knots = knots.val.rock)1   1096.0     2267.7   0.483 0.631442
## splines::bs(peri, knots = knots.val.rock)2   6798.8     1449.8   4.689 3.03e-05
## splines::bs(peri, knots = knots.val.rock)3   5303.9     2216.1   2.393 0.021355
## splines::bs(peri, knots = knots.val.rock)4   6966.1     1633.8   4.264 0.000115
## splines::bs(peri, knots = knots.val.rock)5   9348.9     1636.6   5.712 1.11e-06
## splines::bs(peri, knots = knots.val.rock)6  10962.7     1530.3   7.164 9.69e-09
##                                               
## (Intercept)                                   
## splines::bs(peri, knots = knots.val.rock)1    
## splines::bs(peri, knots = knots.val.rock)2 ***
## splines::bs(peri, knots = knots.val.rock)3 *  
## splines::bs(peri, knots = knots.val.rock)4 ***
## splines::bs(peri, knots = knots.val.rock)5 ***
## splines::bs(peri, knots = knots.val.rock)6 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1324 on 41 degrees of freedom
## Multiple R-squared:  0.7877, Adjusted R-squared:  0.7567 
## F-statistic: 25.36 on 6 and 41 DF,  p-value: 2.448e-12

3.7.2 Natural Spline

ns.spline.rock <- lm(area ~ splines::ns(peri, knots=knots.val.rock), data=data.tambah)
summary(ns.spline.rock)
## 
## Call:
## lm(formula = area ~ splines::ns(peri, knots = knots.val.rock), 
##     data = data.tambah)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2426.5  -790.3   -79.1   799.2  3523.5 
## 
## Coefficients:
##                                            Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                   570.7      860.4   0.663    0.511
## splines::ns(peri, knots = knots.val.rock)1   6215.8     1395.9   4.453 5.94e-05
## splines::ns(peri, knots = knots.val.rock)2   5432.5     1019.6   5.328 3.44e-06
## splines::ns(peri, knots = knots.val.rock)3  15507.4     1995.8   7.770 1.01e-09
## splines::ns(peri, knots = knots.val.rock)4   7148.5      829.6   8.617 6.46e-11
##                                               
## (Intercept)                                   
## splines::ns(peri, knots = knots.val.rock)1 ***
## splines::ns(peri, knots = knots.val.rock)2 ***
## splines::ns(peri, knots = knots.val.rock)3 ***
## splines::ns(peri, knots = knots.val.rock)4 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1297 on 43 degrees of freedom
## Multiple R-squared:  0.7862, Adjusted R-squared:  0.7663 
## F-statistic: 39.54 on 4 and 43 DF,  p-value: 7.031e-14

3.8. Smoothing Spline

ss.rock <- with(data=data.tambah, smooth.spline(peri, area))
ss.rock
## Call:
## smooth.spline(x = peri, y = area)
## 
## Smoothing Parameter  spar= -1.499562  lambda= 4.4911e-21 (37 iterations)
## Equivalent Degrees of Freedom (Df): 47.00319
## Penalized Criterion (RSS): 6.669407e-14
## GCV: 3.221852e-12
pred.ss1 <- broom::augment(ss.rock)
ggplot(pred.ss1, aes(x=x, y=y)) + geom_point(color="orange") + geom_line(aes(y=.fitted), col="blue", lty=1) + xlab("peri") + ylab("area")

3.9. LOESS Function

loess.rock <- loess(area~peri, data=data.tambah)
summary(loess.rock)
## Call:
## loess(formula = area ~ peri, data = data.tambah)
## 
## Number of Observations: 48 
## Equivalent Number of Parameters: 4.16 
## Residual Standard Error: 1307 
## Trace of smoother matrix: 4.53  (exact)
## 
## Control settings:
##   span     :  0.75 
##   degree   :  2 
##   family   :  gaussian
##   surface  :  interpolate      cell = 0.2
##   normalize:  TRUE
##  parametric:  FALSE
## drop.square:  FALSE
ggplot(data.tambah, aes(x=peri, y=area)) + geom_point(color="black") + stat_smooth(method="loess", formula=y~x, col="red", show.legend=T)

3.10. Perbandingan Eksploratif seluruh Pemodelan

ggplot(data.tambah, aes(x=peri, y=area)) + geom_point(color="black") + stat_smooth(method="lm", formula=y~poly(x,3,raw=T), col="blue", lwd=1.2, se=F) + stat_smooth(method="lm", formula=y~cut(x,12), col="green", lwd=1.2, se=F) + stat_smooth(method="lm", formula=y~bs(x, knots=knots.val.rock), col="yellow", lwd=1.2, se=F) + stat_smooth(method="lm", formula=y~ns(x, knots=knots.val.rock), col="darkgreen", lwd=1.2, se=F) + stat_smooth(method="loess", formula=y~x, col="orange", lwd=1.2, se=F) + stat_smooth(method="lm", formula=y~x, col="red", lwd=1.2, se=F)
## Warning: Computation failed in `stat_smooth()`
## Caused by error in `bs()`:
## ! could not find function "bs"
## Warning: Computation failed in `stat_smooth()`
## Caused by error in `ns()`:
## ! could not find function "ns"

model.rock2 <- data.frame(Model = c("Linear", "Polynomial (3)", "Step-Function (12)", "Spline Regression", "Natural Spline Regression", "LOESS Function"), MSE = c(MSE(predict(lin.mod1), data.tambah$area), MSE(predict(pol.mod1), data.tambah$area), MSE(predict(step.mod1), data.tambah$area), MSE(predict(spline.mod.rock), data.tambah$area), MSE(predict(ns.spline.rock), data.tambah$area), MSE(predict(loess.rock), data.tambah$area)))
knitr::kable(model.rock2)
Model MSE
Linear 2281521
Polynomial (3) 1532754
Step-Function (12) 1087532
Spline Regression 1497187
Natural Spline Regression 1507754
LOESS Function 1533233

Pemodelan terbaik untuk data rock area~peri: Step-Function with 12 breaks (Berdasarkan nilai MSE terkecil).