Tugas Responsi 1 STA1333 Pengantar Sains Data: Nonlinear Regression

Tugas akan di update secara berkala pada link berikut “https://rpubs.com/hanung0212safrizal/965994

1. PENDAHULUAN

1.1 Cakupan Materi

  • Nonlinier Regression Part 1 Pemodelan pada data nonlinier yang meliputi:
    1. Regresi Linier
    2. Regresi Polinomial
    3. Regresi Fungsi Tangga
  • Nonlinier Regression Part 2 Pemulusan pada data nonlinier yang meliputi:
    1. Piecewise Cubic Polynomial
    2. Spline Regression
    3. Smoothing Spline
    4. LOESS Function

1.2 Data

Data yang akan digunakan terdiri atas 2 jenis data, yakni: 1. Data bangkitan dengan 1000 amatan yang menyebar normal dengan mean sebesar 2 dan ragam 2. 2. Dataset Auto dari library(ISLR)

Dataset Auto terdiri atas 392 amatan dan 9 peubah dengan rincian tipe peubah:

1.  *mpg (numeric)
2.  cylinders (numeric)*
3.  *displacement (numeric)*
4.  *horsepower (numeric)*
5.  *weight (numeric)*
6.  *acceleration (numeric)*
7.  *year (numeric)*
8.  *origin (numeric)*
9.  *name (categoric (Factor))*

Pada materi analisis regresi nonlinier ini, peubah yang akan dipakai adalah peubah *mpg*, *horsepower*, dan *origin*.
  1. Dataset TRICEPS dari library(MultiKink)

    Data yang digunakan untuk ilustrasi berasal dari studi antropometri terhadap 892 perempuan di bawah 50 tahun di tiga desa Gambia di Afrika Barat. Data terdiri dari 3 Kolom yaitu Age, Intriceps dan tricepts. Berikut adalah penjelasannya pada masing-masing kolom:

    1. age : umur responden
    2. Intriceps : logaritma dari ketebalan lipatan kulit triceps
    3. triceps : ketebalan lipatan kulit triceps

    Lipatan kulit trisep diperlukan untuk menghitung lingkar otot lengan atas. Ketebalannya memberikan informasi tentang cadangan lemak tubuh, sedangkan massa otot yang dihitung memberikan informasi tentang cadangan protein

1.3 Package

Daftar package yang digunakan dalam nonlinear regression:

library(MultiKink)
library(ISLR)
library(tidyverse)
library(ggplot2)
library(dplyr)
library(purrr)
library(rsample)
library(splines)
library(MultiKink)
library(ISLR)
library(tidyverse)
library(ggplot2)
library(dplyr)
library(purrr)
library(rsample)
library(splines)
library(corrplot)
library(PerformanceAnalytics)

2. Regresi Nonlinear Data Bangkitan (Contoh Kuliah)

2.1 Panggil Data

set.seed(123)
datapsd.x <- rnorm(1000,2,2) #jumlah data, miu, standar deviasi
err <- rnorm(1000)
y <- 5+2*datapsd.x+3*datapsd.x^2+err
plot(datapsd.x,y,xlim=c(-2,5), ylim=c(-10,70))

Berdasarkan plot di atas, data bangkitan tersebut termasuk model polinomial ordo 2 karena pola data tidak garis lurus dan terdapat lengkungan di ujung dan tengah.

2.2 Pemodelan Regresi Linier

Regresi linier adalah metode statistika yang digunakan untuk membentuk model atau hubungan antara satu atau lebih variabel bebas X dengan sebuah variabel respon Y.

lin.mod <-lm( y~datapsd.x)
plot( datapsd.x,y,xlim=c( -2,5), ylim=c( -10,70), xlab = "x", main = "Plot Data Bangkitan Fungsi Regresi Linier")
lines(datapsd.x,lin.mod$fitted.values,col="red")

summary(lin.mod)
## 
## Call:
## lm(formula = y ~ datapsd.x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.530 -10.542  -6.695   4.101 110.065 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.5634     0.7417   4.804 1.79e-06 ***
## datapsd.x    14.6259     0.2612  55.985  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.38 on 998 degrees of freedom
## Multiple R-squared:  0.7585, Adjusted R-squared:  0.7582 
## F-statistic:  3134 on 1 and 998 DF,  p-value: < 2.2e-16

Terdapat garis lurus warna merah yang menunjukkan garis fungsi regresi linier yang dibentuk dari fitted value sebagai nilai harapan/nilai duga/nilai prediksi dan data x. Berdasarkan plot di atas, penggunaan fungsi regresi linier biasa tidak tepat karena garis fungsi linier biasa tidak tepat dan menjauhi pola data observasi.

2.3 Pemodelan Polynomial Ordo 2

pol.mod <- lm( y~datapsd.x+I(datapsd.x^2)) # ordo 2
ix <- sort( datapsd.x,index.return=T)$ix
plot(datapsd.x,y,xlim=c(-2,5), ylim=c(-10,70), xlab = "x", main = "Plot Data Bangkitan Fungsi Polinomial")
lines(datapsd.x[ix], pol.mod$fitted.values[ix],col="red", cex=4)

summary(pol.mod)
## 
## Call:
## lm(formula = y ~ datapsd.x + I(datapsd.x^2))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0319 -0.6942  0.0049  0.7116  3.2855 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    4.951933   0.045676  108.41   <2e-16 ***
## datapsd.x      2.053661   0.029305   70.08   <2e-16 ***
## I(datapsd.x^2) 2.997702   0.005845  512.90   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.007 on 997 degrees of freedom
## Multiple R-squared:  0.9991, Adjusted R-squared:  0.9991 
## F-statistic: 5.462e+05 on 2 and 997 DF,  p-value: < 2.2e-16

Terdapat garis warna merah yang menunjukkan garis fungsi polinomial yang dibentuk dari fitted value index i sebagai nilai harapan/nilai duga/nilai prediksi dan data x. Plot di atas merupakan fungsi polinomial berordo 2. Berdasarkan plot di atas, fungsi polinomial tepat digunakan untuk data bangkitan karena garis fungsi polinomial mendekati dan mengikuti pola data amatan sehingga kemungkinan error yang dihasilkan lebih kecil

2.4 Pemodelan Fungsi Tangga

range( datapsd.x)
## [1] -3.619549  8.482080
c1 <- as.factor(ifelse(datapsd.x<=0,1,0))
c2 <- as.factor(ifelse(datapsd.x<=2 & datapsd.x>0,1,0))
c3 <- as.factor(ifelse(datapsd.x>2,1,0))
data.frame(y, c1, c2, c3)
##               y c1 c2 c3
## 1      8.080479  0  1  0
## 2     14.150855  0  1  0
## 3     93.780712  0  0  1
## 4     22.901717  0  0  1
## 5     22.271298  0  0  1
## 6    105.359768  0  0  1
## 7     36.704704  0  0  1
## 8      7.199052  1  0  0
## 9      8.114520  0  1  0
## 10    10.457881  0  1  0
## 11    76.052196  0  0  1
## 12    35.460605  0  0  1
## 13    32.930302  0  0  1
## 14    24.715156  0  0  1
## 15     8.932714  0  1  0
## 16   109.537322  0  0  1
## 17    38.141617  0  0  1
## 18    11.083816  1  0  0
## 19    46.826356  0  0  1
## 20    12.193469  0  1  0
## 21     4.619816  1  0  0
## 22    15.303932  0  1  0
## 23     6.302669  1  0  0
## 24     7.864831  0  1  0
## 25     6.538495  0  1  0
## 26     8.140356  1  0  0
## 27    54.534230  0  0  1
## 28    26.992003  0  0  1
## 29     5.096386  1  0  0
## 30    75.692661  0  0  1
## 31    33.926524  0  0  1
## 32    14.082936  0  1  0
## 33    54.724069  0  0  1
## 34    54.383141  0  0  1
## 35    53.039820  0  0  1
## 36    44.835739  0  0  1
## 37    40.458510  0  0  1
## 38    19.740801  0  1  0
## 39    13.611315  0  1  0
## 40    13.906099  0  1  0
## 41     6.317271  0  1  0
## 42    16.303202  0  1  0
## 43     4.694707  1  0  0
## 44   137.922375  0  0  1
## 45    72.797093  0  0  1
## 46     3.669034  1  0  0
## 47    10.353568  0  1  0
## 48    10.052376  0  1  0
## 49    51.890927  0  0  1
## 50    18.804836  0  1  0
## 51    29.194396  0  0  1
## 52    20.020623  0  1  0
## 53    20.292174  0  1  0
## 54    80.846051  0  0  1
## 55    16.447993  0  1  0
## 56    91.642079  0  0  1
## 57     5.612092  1  0  0
## 58    41.525017  0  0  1
## 59    25.368329  0  0  1
## 60    28.163664  0  0  1
## 61    34.841353  0  0  1
## 62     9.349902  0  1  0
## 63    14.118656  0  1  0
## 64     5.966387  1  0  0
## 65     4.612200  1  0  0
## 66    29.628431  0  0  1
## 67    34.871433  0  0  1
## 68    22.975619  0  0  1
## 69    56.959290  0  0  1
## 70   130.615640  0  0  1
## 71    10.679605  0  1  0
## 72    19.958457  1  0  0
## 73    60.273256  0  0  1
## 74     6.595566  0  1  0
## 75     7.758917  0  1  0
## 76    61.886623  0  0  1
## 77    14.513734  0  1  0
## 78     4.367387  1  0  0
## 79    26.365389  0  0  1
## 80    16.612022  0  1  0
## 81    23.066840  0  0  1
## 82    33.901765  0  0  1
## 83    12.500819  0  1  0
## 84    42.333335  0  0  1
## 85    16.069540  0  1  0
## 86    30.587223  0  0  1
## 87    65.256641  0  0  1
## 88    36.376018  0  0  1
## 89    12.695992  0  1  0
## 90    67.255348  0  0  1
## 91    62.432611  0  0  1
## 92    37.586579  0  0  1
## 93    28.941214  0  0  1
## 94     9.167072  0  1  0
## 95    80.683802  0  0  1
## 96     8.960757  0  1  0
## 97   140.097562  0  0  1
## 98    93.140465  0  0  1
## 99    15.551145  0  1  0
## 100    4.657809  1  0  0
## 101    8.080738  0  1  0
## 102   29.785237  0  0  1
## 103   13.886341  0  1  0
## 104   11.317450  0  1  0
## 105    5.381892  0  1  0
## 106   19.489591  0  1  0
## 107    5.430036  0  1  0
## 108    7.765920  1  0  0
## 109   10.768527  0  1  0
## 110   57.027789  0  0  1
## 111    8.237646  0  1  0
## 112   43.415613  0  0  1
## 113    9.534307  1  0  0
## 114   18.565331  0  1  0
## 115   39.838472  0  0  1
## 116   31.345764  0  0  1
## 117   24.022749  0  0  1
## 118    7.532636  0  1  0
## 119    7.447556  0  1  0
## 120    2.905013  1  0  0
## 121   23.816999  0  0  1
## 122    3.806365  0  1  0
## 123   11.547464  0  1  0
## 124   14.425714  0  1  0
## 125  112.901390  0  0  1
## 126   11.029911  0  1  0
## 127   28.205669  0  0  1
## 128   22.812089  0  0  1
## 129    5.469898  0  1  0
## 130   17.495967  0  1  0
## 131   86.978453  0  0  1
## 132   35.992221  0  0  1
## 133   22.643449  0  0  1
## 134   10.329761  0  1  0
## 135   13.075984  1  0  0
## 136   67.343114  0  0  1
## 137    4.935721  1  0  0
## 138   49.587848  0  0  1
## 139  119.770163  0  0  1
## 140    5.432029  1  0  0
## 141   46.201040  0  0  1
## 142   14.154402  0  1  0
## 143    6.708847  1  0  0
## 144    6.216827  1  0  0
## 145    7.226039  1  0  0
## 146    8.770279  0  1  0
## 147    4.864700  1  0  0
## 148   47.137501  0  0  1
## 149  132.179914  0  0  1
## 150    5.143562  1  0  0
## 151   50.446107  0  0  1
## 152   48.672445  0  0  1
## 153   32.217037  0  0  1
## 154    5.140440  1  0  0
## 155   19.226337  0  1  0
## 156   14.209849  0  1  0
## 157   40.235648  0  0  1
## 158   12.514537  0  1  0
## 159   58.623387  0  0  1
## 160   11.359572  0  1  0
## 161   64.284611  0  0  1
## 162    4.499192  1  0  0
## 163    4.705587  1  0  0
## 164  237.685975  0  0  1
## 165   10.762718  0  1  0
## 166   28.398960  0  0  1
## 167   44.035437  0  0  1
## 168   11.024306  0  1  0
## 169   37.389177  0  0  1
## 170   34.447027  0  0  1
## 171   15.911166  0  1  0
## 172   24.221003  0  0  1
## 173   19.102873  0  1  0
## 174  135.127124  0  0  1
## 175    6.737526  0  1  0
## 176    5.495106  1  0  0
## 177   21.499351  0  0  1
## 178   30.840143  0  0  1
## 179   33.730631  0  0  1
## 180    9.909335  0  1  0
## 181    4.919852  1  0  0
## 182   74.810505  0  0  1
## 183   12.633284  0  1  0
## 184    5.287064  0  1  0
## 185   15.661039  0  1  0
## 186   17.114103  0  1  0
## 187   66.038343  0  0  1
## 188   23.151773  0  0  1
## 189   50.376433  0  0  1
## 190    7.812410  0  1  0
## 191   27.236472  0  0  1
## 192   15.238550  0  1  0
## 193   25.949281  0  0  1
## 194    5.706528  0  1  0
## 195    4.052356  1  0  0
## 196  124.953768  0  0  1
## 197   41.497286  0  0  1
## 198    6.205379  1  0  0
## 199    7.563157  0  1  0
## 200    5.043826  1  0  0
## 201  141.203743  0  0  1
## 202   77.659187  0  0  1
## 203   15.271086  0  1  0
## 204   39.002221  0  0  1
## 205   12.088852  0  1  0
## 206   11.483482  0  1  0
## 207    5.393411  0  1  0
## 208    9.701548  0  1  0
## 209   99.441822  0  0  1
## 210   19.816594  0  1  0
## 211   24.711337  0  0  1
## 212   28.108654  0  0  1
## 213   74.005389  0  0  1
## 214    8.515644  0  1  0
## 215    4.894508  0  1  0
## 216  102.440827  0  0  1
## 217    8.808805  0  1  0
## 218    5.540119  0  1  0
## 219    3.562870  1  0  0
## 220    3.244804  1  0  0
## 221    9.301672  0  1  0
## 222   41.893552  0  0  1
## 223   64.692356  0  0  1
## 224   46.183080  0  0  1
## 225   12.013920  0  1  0
## 226   23.572624  0  0  1
## 227    6.125021  0  1  0
## 228    8.252003  0  1  0
## 229   55.559855  0  0  1
## 230    5.302899  1  0  0
## 231  120.773799  0  0  1
## 232   20.522611  0  1  0
## 233   27.395139  0  0  1
## 234    5.041405  0  1  0
## 235    8.672329  0  1  0
## 236    3.003482  1  0  0
## 237   15.969119  0  1  0
## 238   34.415835  0  0  1
## 239   32.024431  0  0  1
## 240    7.456066  0  1  0
## 241    5.655575  0  1  0
## 242   10.770987  0  1  0
## 243   91.172392  0  0  1
## 244    3.892869  1  0  0
## 245   15.718865  0  1  0
## 246  118.344675  0  0  1
## 247   18.478102  0  1  0
## 248    5.663236  1  0  0
## 249    9.094161  0  1  0
## 250   37.808019  0  0  1
## 251   13.227751  0  1  0
## 252    9.678828  0  1  0
## 253   13.223287  0  1  0
## 254   24.018266  0  0  1
## 255   97.712332  0  0  1
## 256   17.612042  0  1  0
## 257   64.174734  0  0  1
## 258   44.027270  0  0  1
## 259   17.853362  0  1  0
## 260    6.350212  1  0  0
## 261   10.408752  0  1  0
## 262   10.916599  0  1  0
## 263   22.084336  0  0  1
## 264   77.379218  0  0  1
## 265  148.378344  0  0  1
## 266   94.135373  0  0  1
## 267   17.910544  0  1  0
## 268   10.274902  1  0  0
## 269   11.920324  0  1  0
## 270   24.718965  0  0  1
## 271   54.111930  0  0  1
## 272   59.680388  0  0  1
## 273   46.194717  0  0  1
## 274    5.013922  1  0  0
## 275   53.343687  0  0  1
## 276   11.118754  0  1  0
## 277   26.310036  0  0  1
## 278   24.097352  0  0  1
## 279   35.079274  0  0  1
## 280   21.627829  0  0  1
## 281    7.192057  1  0  0
## 282   47.992666  0  0  1
## 283   33.528176  0  0  1
## 284   12.094868  0  1  0
## 285   23.110712  0  0  1
## 286   24.896191  0  0  1
## 287   27.509457  0  0  1
## 288   98.051337  0  0  1
## 289   13.450848  0  1  0
## 290   25.690413  0  0  1
## 291   70.749695  0  0  1
## 292   65.625879  0  0  1
## 293   68.768441  0  0  1
## 294   10.325712  0  1  0
## 295  125.271784  0  0  1
## 296   23.036545  0  0  1
## 297  115.418307  0  0  1
## 298    4.203404  1  0  0
## 299   21.541106  0  0  1
## 300   75.653493  0  0  1
## 301    6.362350  0  1  0
## 302    6.401591  0  1  0
## 303    4.143404  0  1  0
## 304    5.177391  1  0  0
## 305   11.477634  0  1  0
## 306   32.237520  0  0  1
## 307   12.066823  1  0  0
## 308   27.581916  0  0  1
## 309   73.035226  0  0  1
## 310  127.872182  0  0  1
## 311   79.092259  0  0  1
## 312   48.558665  0  0  1
## 313    9.147407  1  0  0
## 314    7.749867  0  1  0
## 315   12.151411  0  1  0
## 316   47.076742  0  0  1
## 317   17.496088  0  1  0
## 318    3.065230  1  0  0
## 319  103.477252  0  0  1
## 320   56.846922  0  0  1
## 321   27.740886  0  0  1
## 322   70.918425  0  0  1
## 323    6.924329  1  0  0
## 324   48.133541  0  0  1
## 325    9.847182  0  1  0
## 326   46.604777  0  0  1
## 327   20.565937  0  1  0
## 328   42.828767  0  0  1
## 329   79.446585  0  0  1
## 330   19.477639  0  0  1
## 331   61.180171  0  0  1
## 332    5.294762  1  0  0
## 333    6.752915  0  1  0
## 334   91.020154  0  0  1
## 335   33.163364  0  0  1
## 336   12.213516  1  0  0
## 337    5.971759  1  0  0
## 338   14.418395  0  1  0
## 339   54.028141  0  0  1
## 340   17.833268  0  1  0
## 341   42.933975  0  0  1
## 342   60.348412  0  0  1
## 343  100.716566  0  0  1
## 344   21.823025  0  0  1
## 345   18.057278  0  1  0
## 346    7.989751  1  0  0
## 347   22.733380  0  0  1
## 348    9.320294  0  1  0
## 349    4.249063  0  1  0
## 350   16.655063  0  1  0
## 351   61.633297  0  0  1
## 352   11.422039  1  0  0
## 353   10.436383  0  1  0
## 354   25.314207  0  0  1
## 355    6.467101  0  1  0
## 356   33.692750  0  0  1
## 357   34.547752  0  0  1
## 358   18.592257  0  1  0
## 359   24.154280  1  0  0
## 360  172.758081  0  0  1
## 361   17.657531  0  1  0
## 362   44.432053  0  0  1
## 363   30.705226  0  0  1
## 364   63.058395  0  0  1
## 365   50.749177  0  0  1
## 366   15.482837  0  1  0
## 367   34.610090  0  0  1
## 368    6.130223  0  1  0
## 369   54.269445  0  0  1
## 370   11.118717  0  1  0
## 371  158.267770  0  0  1
## 372    6.162795  1  0  0
## 373   11.887193  0  1  0
## 374   50.865440  0  0  1
## 375   37.467638  0  0  1
## 376    9.293351  0  1  0
## 377    3.750807  0  1  0
## 378   24.743928  0  0  1
## 379   19.419049  0  1  0
## 380    9.954072  1  0  0
## 381   22.428068  0  0  1
## 382   27.182584  0  0  1
## 383   26.701156  0  0  1
## 384    5.373500  1  0  0
## 385   37.691523  0  0  1
## 386   80.436770  0  0  1
## 387   36.123623  0  0  1
## 388    4.790721  1  0  0
## 389   11.804047  0  1  0
## 390   30.940868  0  0  1
## 391    7.407139  0  1  0
## 392   15.377512  1  0  0
## 393   56.199057  0  0  1
## 394    7.310097  0  1  0
## 395    9.675646  0  1  0
## 396   89.027418  0  0  1
## 397    6.967499  0  1  0
## 398   54.414074  0  0  1
## 399    4.940676  1  0  0
## 400   12.015107  0  1  0
## 401   17.919239  0  1  0
## 402    4.001411  1  0  0
## 403    8.776761  0  1  0
## 404   19.770757  0  1  0
## 405   45.405099  0  0  1
## 406    8.771289  1  0  0
## 407   13.253153  0  1  0
## 408   50.409861  0  0  1
## 409    7.695547  0  1  0
## 410   27.703437  0  0  1
## 411   37.754936  0  0  1
## 412   29.938797  0  0  1
## 413   43.242956  0  0  1
## 414   18.551031  0  1  0
## 415   11.777987  0  1  0
## 416   31.090472  1  0  0
## 417   19.009793  0  1  0
## 418   35.153352  0  0  1
## 419   40.356537  0  0  1
## 420    9.800445  0  1  0
## 421  108.675439  0  0  1
## 422   31.044728  0  0  1
## 423   25.020871  0  0  1
## 424   76.716174  0  0  1
## 425    6.483093  0  1  0
## 426   12.404609  0  1  0
## 427  157.098021  0  0  1
## 428   22.160946  0  0  1
## 429   98.662349  0  0  1
## 430    5.273800  1  0  0
## 431   16.885528  0  1  0
## 432   31.729708  0  0  1
## 433   30.959723  0  0  1
## 434    5.371403  1  0  0
## 435   18.848381  0  0  1
## 436    5.130622  1  0  0
## 437   44.882628  0  0  1
## 438   66.154351  0  0  1
## 439   17.653274  1  0  0
## 440   73.227740  0  0  1
## 441    4.726205  1  0  0
## 442   37.588373  0  0  1
## 443   44.067609  0  0  1
## 444   16.440659  0  1  0
## 445    8.272031  0  1  0
## 446   24.777442  0  0  1
## 447   33.856644  0  0  1
## 448   53.102580  0  0  1
## 449   14.630884  1  0  0
## 450    7.279484  1  0  0
## 451   87.454448  0  0  1
## 452   64.024466  0  0  1
## 453   36.311504  0  0  1
## 454   48.497141  0  0  1
## 455   56.274697  0  0  1
## 456   31.970381  1  0  0
## 457   67.749135  0  0  1
## 458   11.612255  0  1  0
## 459   28.858132  0  0  1
## 460   14.202146  0  1  0
## 461   53.670669  0  0  1
## 462   13.429528  0  1  0
## 463   39.244525  0  0  1
## 464   12.526688  0  1  0
## 465    5.155810  1  0  0
## 466   72.247961  0  0  1
## 467   48.255528  0  0  1
## 468  105.643670  0  0  1
## 469   23.046882  0  0  1
## 470   66.857542  0  0  1
## 471  122.848940  0  0  1
## 472   12.750161  0  1  0
## 473    3.992733  1  0  0
## 474   14.841015  0  1  0
## 475   13.758477  0  1  0
## 476   23.834595  0  0  1
## 477  105.230965  0  0  1
## 478   12.567790  0  1  0
## 479   31.262028  0  0  1
## 480   15.134063  0  1  0
## 481   22.898708  0  0  1
## 482   31.639455  0  0  1
## 483   79.801246  0  0  1
## 484   25.757278  0  0  1
## 485   46.285814  0  0  1
## 486   50.817245  0  0  1
## 487   56.068288  0  0  1
## 488    8.876866  0  1  0
## 489  100.528049  0  0  1
## 490   13.044182  0  1  0
## 491   18.940335  0  1  0
## 492   82.861365  0  0  1
## 493   76.543951  0  0  1
## 494    7.021331  1  0  0
## 495    4.607747  0  1  0
## 496    5.320810  1  0  0
## 497   27.381113  0  0  1
## 498   26.960372  0  0  1
## 499   33.875277  0  0  1
## 500   39.955825  0  0  1
## 501    7.673314  0  1  0
## 502    4.718425  0  1  0
## 503   61.499334  0  0  1
## 504   49.425902  0  0  1
## 505    7.194696  1  0  0
## 506   20.571721  0  1  0
## 507    5.912245  0  1  0
## 508   13.600309  1  0  0
## 509   26.498271  0  0  1
## 510   19.762125  0  1  0
## 511   18.149181  0  1  0
## 512   26.055079  0  0  1
## 513   55.815272  0  0  1
## 514   28.393119  0  0  1
## 515    8.004606  0  1  0
## 516    7.441023  0  1  0
## 517   17.242095  0  1  0
## 518   31.516117  0  0  1
## 519    4.742882  1  0  0
## 520   15.902313  0  1  0
## 521   59.422375  0  0  1
## 522   17.825447  0  1  0
## 523    6.706701  0  1  0
## 524   13.871347  0  1  0
## 525   67.111304  0  0  1
## 526   40.239306  0  0  1
## 527   75.511750  0  0  1
## 528   23.768921  0  0  1
## 529   34.307992  0  0  1
## 530    9.737219  0  1  0
## 531   44.110083  0  0  1
## 532   10.325159  0  1  0
## 533    5.426489  1  0  0
## 534   24.472833  0  0  1
## 535  121.334186  0  0  1
## 536   52.112225  0  0  1
## 537   69.737022  0  0  1
## 538   32.757051  0  0  1
## 539    8.621437  0  1  0
## 540   16.557350  0  1  0
## 541   15.356408  0  1  0
## 542   10.791721  0  1  0
## 543   46.590732  0  0  1
## 544    6.072637  1  0  0
## 545   54.429869  0  0  1
## 546   55.735236  0  0  1
## 547    4.616998  1  0  0
## 548   44.427099  0  0  1
## 549  158.372277  0  0  1
## 550    9.011444  0  1  0
## 551   53.201882  0  0  1
## 552    5.682080  0  1  0
## 553   65.868185  0  0  1
## 554   29.692200  0  0  1
## 555  100.913683  0  0  1
## 556    4.393235  1  0  0
## 557   20.019616  0  1  0
## 558    8.465351  0  1  0
## 559   14.892472  0  1  0
## 560    8.653640  0  1  0
## 561    5.309897  0  1  0
## 562   42.216742  0  0  1
## 563    4.780245  1  0  0
## 564   89.517189  0  0  1
## 565    4.147622  1  0  0
## 566   22.801599  0  0  1
## 567   40.247383  0  0  1
## 568   41.797761  0  0  1
## 569   13.404638  0  1  0
## 570   23.371138  0  0  1
## 571   60.329650  0  0  1
## 572    4.976303  1  0  0
## 573    5.695283  0  1  0
## 574   30.583541  0  0  1
## 575    9.335085  0  1  0
## 576   82.702671  0  0  1
## 577   45.482727  0  0  1
## 578   23.290338  0  0  1
## 579    6.019433  1  0  0
## 580   21.228821  0  0  1
## 581   12.518365  0  1  0
## 582   19.111852  0  1  0
## 583    3.243143  1  0  0
## 584   38.386642  0  0  1
## 585    4.069774  1  0  0
## 586   15.562211  0  1  0
## 587   32.685061  0  0  1
## 588    5.904980  0  1  0
## 589   43.171673  0  0  1
## 590    5.604387  1  0  0
## 591   34.919417  1  0  0
## 592   36.739859  0  0  1
## 593   52.561389  0  0  1
## 594   12.840193  0  1  0
## 595   38.375040  0  0  1
## 596    4.798019  1  0  0
## 597   17.305334  0  1  0
## 598   13.844112  1  0  0
## 599   68.566634  0  0  1
## 600  115.426944  0  0  1
## 601   64.625348  0  0  1
## 602   20.899773  0  1  0
## 603   19.626084  0  1  0
## 604    5.537774  1  0  0
## 605   48.916918  0  0  1
## 606   15.422901  0  1  0
## 607   10.265679  0  1  0
## 608    6.378782  1  0  0
## 609   15.360513  0  1  0
## 610    6.792111  0  1  0
## 611   12.919007  0  1  0
## 612    5.600674  1  0  0
## 613  102.903397  0  0  1
## 614   19.972473  0  1  0
## 615   64.431610  0  0  1
## 616   27.777663  1  0  0
## 617   10.769302  0  1  0
## 618    8.461167  0  1  0
## 619    5.736571  1  0  0
## 620   93.104188  0  0  1
## 621    4.860754  1  0  0
## 622   34.421913  0  0  1
## 623   54.034354  0  0  1
## 624   27.790921  0  0  1
## 625    5.348032  0  1  0
## 626   57.944202  0  0  1
## 627   26.574277  0  0  1
## 628    6.470914  1  0  0
## 629    4.943314  1  0  0
## 630  132.534380  0  0  1
## 631    5.431496  0  1  0
## 632    9.854493  1  0  0
## 633   38.051640  0  0  1
## 634   31.954724  0  0  1
## 635    4.922594  1  0  0
## 636   11.507802  1  0  0
## 637   19.274945  0  1  0
## 638   69.598056  0  0  1
## 639   42.769297  0  0  1
## 640   10.541466  0  1  0
## 641    4.764725  0  1  0
## 642   28.995980  0  0  1
## 643   27.319592  0  0  1
## 644   44.840499  0  0  1
## 645   12.906548  0  1  0
## 646   55.317062  0  0  1
## 647    4.138819  0  1  0
## 648   11.880168  0  1  0
## 649   49.281037  0  0  1
## 650   60.189163  0  0  1
## 651    9.652866  1  0  0
## 652   23.959021  0  0  1
## 653   43.903394  0  0  1
## 654    6.279082  1  0  0
## 655   36.408270  0  0  1
## 656    4.495675  0  1  0
## 657   61.231532  0  0  1
## 658   39.591579  0  0  1
## 659   50.518835  0  0  1
## 660   22.615079  0  0  1
## 661   55.156021  0  0  1
## 662   82.529709  0  0  1
## 663  123.018299  0  0  1
## 664   18.561358  0  1  0
## 665   18.058685  1  0  0
## 666   21.446421  0  0  1
## 667   28.306670  0  0  1
## 668   17.968090  0  1  0
## 669   41.222595  0  0  1
## 670   63.160835  0  0  1
## 671    9.200750  0  1  0
## 672   14.815773  0  1  0
## 673   34.002983  0  0  1
## 674    8.559347  0  1  0
## 675   24.578824  0  0  1
## 676   13.632848  1  0  0
## 677    5.571163  1  0  0
## 678    4.468658  1  0  0
## 679    5.372830  0  1  0
## 680    8.733203  0  1  0
## 681   33.811257  0  0  1
## 682   60.903289  0  0  1
## 683    6.643601  0  1  0
## 684    6.273799  0  1  0
## 685    4.045345  1  0  0
## 686   12.079473  0  1  0
## 687   14.175630  0  1  0
## 688   34.885347  0  0  1
## 689   11.898943  0  1  0
## 690   14.780007  1  0  0
## 691   19.423086  0  1  0
## 692   69.402879  0  0  1
## 693   70.152374  0  0  1
## 694    8.143131  0  1  0
## 695    5.975704  1  0  0
## 696  162.836130  0  0  1
## 697   16.241821  0  1  0
## 698   18.086743  0  1  0
## 699   34.985913  0  0  1
## 700    7.079337  1  0  0
## 701    6.780942  0  1  0
## 702    5.873370  1  0  0
## 703    4.310052  0  1  0
## 704   26.365942  0  0  1
## 705    6.927742  1  0  0
## 706   40.595388  0  0  1
## 707   29.938165  0  0  1
## 708    6.868512  0  1  0
## 709   28.950064  0  0  1
## 710   49.938379  0  0  1
## 711   63.149998  0  0  1
## 712   18.161169  0  1  0
## 713   17.349924  0  1  0
## 714   19.416153  0  1  0
## 715   86.435949  0  0  1
## 716   67.287744  0  0  1
## 717   52.338540  0  0  1
## 718   12.953838  0  1  0
## 719   34.745130  0  0  1
## 720   34.333360  0  0  1
## 721    5.113365  1  0  0
## 722    8.661544  1  0  0
## 723   43.539419  0  0  1
## 724    5.110772  1  0  0
## 725   21.300904  0  0  1
## 726   26.664787  0  0  1
## 727   39.185077  0  0  1
## 728   25.658126  0  0  1
## 729    5.846311  0  1  0
## 730   60.400444  0  0  1
## 731  107.554402  0  0  1
## 732   53.475396  0  0  1
## 733   12.344797  1  0  0
## 734   84.593440  0  0  1
## 735   18.478609  0  1  0
## 736   38.951220  0  0  1
## 737   43.883646  0  0  1
## 738   18.149459  0  1  0
## 739   18.739921  0  1  0
## 740   62.308298  0  0  1
## 741   46.941368  0  0  1
## 742   59.939122  0  0  1
## 743  155.723408  0  0  1
## 744   46.783865  0  0  1
## 745   28.196424  0  0  1
## 746   16.830461  1  0  0
## 747  182.066966  0  0  1
## 748    9.920790  0  1  0
## 749  156.493726  0  0  1
## 750   33.466993  0  0  1
## 751   91.775750  0  0  1
## 752   18.954782  0  1  0
## 753   38.327039  0  0  1
## 754   26.419481  0  0  1
## 755   16.665504  0  1  0
## 756   19.327052  0  1  0
## 757   62.103830  0  0  1
## 758   16.038197  0  1  0
## 759   13.114446  1  0  0
## 760   16.543971  0  1  0
## 761   38.540053  0  0  1
## 762   41.167823  0  0  1
## 763   42.782417  0  0  1
## 764    7.945055  1  0  0
## 765   35.321482  0  0  1
## 766   58.124728  0  0  1
## 767   76.470947  0  0  1
## 768   16.113428  0  1  0
## 769   14.280978  0  1  0
## 770   89.212672  0  0  1
## 771    6.938566  1  0  0
## 772   10.992822  0  1  0
## 773   38.259953  0  0  1
## 774    7.591377  1  0  0
## 775   27.723404  0  0  1
## 776  117.320131  0  0  1
## 777   21.427002  0  1  0
## 778   14.100155  0  1  0
## 779   36.986158  0  0  1
## 780   13.970765  0  1  0
## 781   51.971212  0  0  1
## 782   55.636368  0  0  1
## 783   21.352465  0  0  1
## 784    3.736517  1  0  0
## 785    5.142437  1  0  0
## 786    9.722662  0  1  0
## 787   51.465648  0  0  1
## 788   13.218815  1  0  0
## 789    9.790698  1  0  0
## 790    7.878094  0  1  0
## 791   34.526874  0  0  1
## 792    4.367381  0  1  0
## 793   56.830674  0  0  1
## 794   30.633178  0  0  1
## 795   16.736024  0  1  0
## 796   52.124774  0  0  1
## 797   33.805825  0  0  1
## 798   10.689691  0  1  0
## 799   28.562124  0  0  1
## 800   43.130492  0  0  1
## 801   31.209823  0  0  1
## 802    7.116875  0  1  0
## 803   53.664787  0  0  1
## 804   70.490105  0  0  1
## 805   31.239073  0  0  1
## 806   25.603606  0  0  1
## 807   19.332719  0  1  0
## 808  137.336620  0  0  1
## 809   29.701091  0  0  1
## 810   15.463905  0  1  0
## 811   26.544056  1  0  0
## 812    6.037841  1  0  0
## 813   19.232227  0  1  0
## 814   26.940939  0  0  1
## 815   29.010713  0  0  1
## 816   51.881559  0  0  1
## 817   15.649170  0  1  0
## 818   73.805776  0  0  1
## 819    5.597061  0  1  0
## 820    5.120101  1  0  0
## 821    3.472418  1  0  0
## 822   49.893197  0  0  1
## 823   18.920121  0  1  0
## 824   49.744369  0  0  1
## 825   14.673268  0  1  0
## 826    9.501490  0  1  0
## 827   10.930654  0  1  0
## 828   12.003105  1  0  0
## 829    4.508740  1  0  0
## 830    6.697768  1  0  0
## 831   63.932430  0  0  1
## 832    8.223973  0  1  0
## 833   48.748918  0  0  1
## 834   89.958490  0  0  1
## 835   16.030296  0  1  0
## 836   30.743509  0  0  1
## 837    6.883280  1  0  0
## 838    7.495330  0  1  0
## 839   24.010928  0  0  1
## 840   29.304553  0  0  1
## 841    7.129379  0  1  0
## 842  181.644027  0  0  1
## 843   10.561334  0  1  0
## 844   23.777009  0  0  1
## 845   45.129266  0  0  1
## 846   19.430427  0  1  0
## 847    7.563624  0  1  0
## 848   62.404362  0  0  1
## 849   97.894653  0  0  1
## 850   18.620074  0  1  0
## 851   37.907895  0  0  1
## 852   19.856133  0  1  0
## 853   64.132394  0  0  1
## 854    4.212956  1  0  0
## 855  151.381895  0  0  1
## 856    8.464622  0  1  0
## 857    7.165430  1  0  0
## 858   13.206165  0  1  0
## 859   25.569782  0  0  1
## 860   96.296007  0  0  1
## 861   57.560464  0  0  1
## 862   24.881515  0  0  1
## 863    3.858945  1  0  0
## 864    5.549403  0  1  0
## 865   16.621187  0  1  0
## 866  171.759246  0  0  1
## 867    9.271591  1  0  0
## 868   67.463302  0  0  1
## 869   11.061272  0  1  0
## 870  102.312719  0  0  1
## 871    5.838934  1  0  0
## 872   26.930506  0  0  1
## 873    4.924752  1  0  0
## 874   55.512679  0  0  1
## 875   89.520249  0  0  1
## 876  122.342661  0  0  1
## 877   53.380453  0  0  1
## 878  111.418039  0  0  1
## 879   74.815976  0  0  1
## 880   18.170989  0  1  0
## 881   37.033218  0  0  1
## 882    3.563613  0  1  0
## 883   16.967438  0  1  0
## 884   72.296927  0  0  1
## 885   40.177680  0  0  1
## 886   37.223666  0  0  1
## 887    6.148881  1  0  0
## 888    6.901001  0  1  0
## 889    7.688242  0  1  0
## 890   33.186409  0  0  1
## 891    3.750031  0  1  0
## 892   41.842166  0  0  1
## 893    5.843228  1  0  0
## 894  112.078553  0  0  1
## 895   37.201831  0  0  1
## 896    6.408446  0  1  0
## 897   28.295492  0  0  1
## 898   12.528266  0  1  0
## 899   32.764413  0  0  1
## 900   29.264713  0  0  1
## 901    6.483261  1  0  0
## 902    5.901572  0  1  0
## 903   30.433041  0  0  1
## 904  100.768105  0  0  1
## 905   65.156963  0  0  1
## 906    5.589087  0  1  0
## 907   52.825609  0  0  1
## 908   18.781925  0  1  0
## 909   29.527723  0  0  1
## 910   30.524396  0  0  1
## 911  175.155889  0  0  1
## 912    3.631046  1  0  0
## 913   24.978999  0  0  1
## 914    9.086318  1  0  0
## 915   41.832090  0  0  1
## 916   67.357876  0  0  1
## 917   86.771489  0  0  1
## 918   10.718778  1  0  0
## 919   34.131681  0  0  1
## 920   96.766428  0  0  1
## 921    9.128518  0  1  0
## 922   16.695124  0  1  0
## 923   18.425827  0  1  0
## 924   32.841320  0  0  1
## 925   44.743369  0  0  1
## 926   80.031877  0  0  1
## 927   18.624611  0  1  0
## 928   27.060025  0  0  1
## 929  165.643678  0  0  1
## 930   34.473911  0  0  1
## 931   26.516721  0  0  1
## 932    6.214176  1  0  0
## 933   21.763541  0  0  1
## 934   14.115194  0  1  0
## 935   21.330137  0  1  0
## 936    7.483982  0  1  0
## 937    6.831440  0  1  0
## 938    8.263434  0  1  0
## 939   13.294975  0  1  0
## 940   79.990143  0  0  1
## 941   64.882791  0  0  1
## 942   11.660889  0  1  0
## 943    9.145818  1  0  0
## 944    5.743268  0  1  0
## 945   11.125017  0  1  0
## 946   24.844033  0  0  1
## 947    7.064365  0  1  0
## 948  125.203677  0  0  1
## 949   13.850967  0  1  0
## 950    7.509573  1  0  0
## 951   17.410276  0  1  0
## 952    5.451637  1  0  0
## 953   25.324222  0  0  1
## 954   27.954967  0  0  1
## 955   33.852518  0  0  1
## 956    6.814176  1  0  0
## 957   27.784614  0  0  1
## 958   31.226115  0  0  1
## 959    5.011830  1  0  0
## 960   60.661984  0  0  1
## 961   48.907413  0  0  1
## 962  150.859626  0  0  1
## 963   25.330762  0  0  1
## 964   23.691867  0  0  1
## 965   21.817172  0  0  1
## 966   23.967943  0  0  1
## 967   11.121413  1  0  0
## 968    6.449213  1  0  0
## 969   18.132733  0  1  0
## 970    8.975310  0  1  0
## 971   23.712411  0  0  1
## 972   14.606576  1  0  0
## 973    5.765480  1  0  0
## 974    6.289000  1  0  0
## 975   59.150013  0  0  1
## 976    5.670237  1  0  0
## 977    6.733048  0  1  0
## 978   22.207210  0  0  1
## 979   12.545217  0  1  0
## 980   26.140214  0  0  1
## 981  149.550424  0  0  1
## 982    6.112996  1  0  0
## 983   12.964807  0  1  0
## 984    8.020732  0  1  0
## 985   91.842978  0  0  1
## 986    5.951085  0  1  0
## 987   18.902766  0  1  0
## 988   52.193735  0  0  1
## 989    4.556418  1  0  0
## 990  101.769762  0  0  1
## 991   44.304066  0  0  1
## 992    5.694764  1  0  0
## 993   36.454670  0  0  1
## 994   14.836628  0  1  0
## 995   29.667744  0  0  1
## 996   18.654485  0  1  0
## 997   64.981669  0  0  1
## 998    5.352303  1  0  0
## 999   10.181127  0  1  0
## 1000  14.307327  0  1  0
step.mod <- lm(y~c1+c2+c3)
plot(datapsd.x,y,xlim=c(-2,5), ylim=c(-10,70))
lines(datapsd.x,lin.mod$fitted.values,col="blue", lwd =1)
lines(datapsd.x[ix], step.mod$fitted.values[ix],col="red", lwd =2)

summary(step.mod)
## 
## Call:
## lm(formula = y ~ c1 + c2 + c3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -36.639 -11.091  -2.245   4.847 182.198 
## 
## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   55.488      1.095   50.68   <2e-16 ***
## c11          -47.735      2.206  -21.64   <2e-16 ***
## c21          -43.407      1.741  -24.93   <2e-16 ***
## c31               NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24.6 on 997 degrees of freedom
## Multiple R-squared:  0.4555, Adjusted R-squared:  0.4544 
## F-statistic: 417.1 on 2 and 997 DF,  p-value: < 2.2e-16

Terdapat garis warna merah yang menunjukkan garis fungsi tangga dimana terdapat 3 break. Plot di atas merupakan fungsi tangga dengan break 3. Selang 1 adalah data x dengan rentang >-2.5 (data minimum) sampai 0. Selang 2 dalam rentang 0 sampai 2 dan selang 3 dalam rentang 2 sampai >5 (data maksimum). Secara eksploratif, garis tangga lebih mengikuti pola data dibandingkan dengan garis fungsi regresi linier.

2.5 Perbandingan Hasil Model

Perbandingan model dapat dilihat secara eksploratif dengan plot-plot di atas. Untuk lebih akurat, komparasi model dapat dilakukan dengan melihat nilai AIC (Akaike’s Information Criterion) dan MSE (Mean Squared Error). Semakin baik model maka semakin kecil nilai AIC atau MSE setiap modelnya.

MSE = function(pred,actual){
  mean((pred-actual)^2)
}
AIC <- rbind(AIC(lin.mod),
                   AIC(pol.mod),
                   AIC(step.mod))
MSE <- rbind(MSE(predict(lin.mod),y),
                   MSE(predict(pol.mod),y),
                   MSE(predict(step.mod),y))
Model <- c("Linear","Polynomial (ordo=2)","Tangga (breaks=3)")
model.bangkitan1 <- data.frame(Model,AIC,MSE)

Berdasarkan ketiga hasil pemodelan, didapatkan bahwa model tersebut tidak linier. Secara eksploratif, fungsi polinomial ordo dua paling tepat dan baik. Berdasarkan komparasi model dengan nilai AIC dan MSE, fungsi model polinomial ordo dua juga menjadi model terbaik dikarenakan memiliki nilai AIC dan MSE paling kecil di antara ketiga model lainnya yaitu masing-masing sebesar 2856.470 dan 1.010649.

2.6 Piecewise Cubic Polynomial

# Membagi data menjadi 2 bagian berdasarkan mean (mean = 2)
dt.all <- cbind(y,datapsd.x)
## knots = 1
dt1 <- dt.all[datapsd.x <=2,]
dim(dt1)
## [1] 495   2
## knots = 2
dt2 <- dt.all[datapsd.x >2,]
dim(dt2)
## [1] 505   2
plot(datapsd.x,y,xlim=c(-2,5), ylim=c(-10,70), pch=16, col="orange")

cub.mod1 <- lm(dt1[,1]~dt1[,2]+I(dt1[,2]^2)+I(dt1[,2]^3))
ix <- sort(dt1[,2], index.return=T)$ix
lines(dt1[ix,2],cub.mod1$fitted.values[ix],col="red", lwd=2)

cub.mod2 <- lm(dt2[,1]~dt2[,2]+I(dt2[,2]^2)+I(dt2[,2]^3))
ix <- sort(dt2[,2], index.return=T)$ix
lines(dt2[ix,2],cub.mod2$fitted.values[ix],col="blue", lwd=2)

2.7 Truncated Power Basis

plot(datapsd.x,y,xlim=c( -2,5), ylim=c( -10,70), pch=16,col="orange")
abline(v=1,col="red", lty=2)

hx <- ifelse( datapsd.x>1,(datapsd.x-1)^3,0)

cubspline.mod <- lm( y~datapsd.x+I(datapsd.x^2)+I(datapsd.x^3)+hx)
ix <- sort( datapsd.x,index.return=T)$ix
lines( datapsd.x[ix], cubspline.mod$fitted.values[ix],col="blue", lwd=2)

2.8 Perbandingan dengan K-Fold CV

plot( datapsd.x,y,xlim=c(-2,5), ylim=c( -10,70), pch=16,col="orange")
abline(v=0,col="red", lty=2)
abline(v=2,col="red", lty=2)

hx1 <- ifelse( datapsd.x>0,(datapsd.x-0)^3,0)
hx2 <- ifelse( datapsd.x>2,(datapsd.x-2)^3,0)

cubspline.mod2 <- lm( y~datapsd.x+I(datapsd.x^2)+I(datapsd.x^3)+hx1+hx2)
ix <- sort( datapsd.x,index.return=T)$ix
lines(datapsd.x[ix],cubspline.mod2$fitted.values[ix],col="blue", lwd=2)

2.9 Perbandingan 1 Knot dan 2 Knot dengan 5-Fold CV

##1 knot
data.all <- cbind( y,datapsd.x,hx)
set.seed(456)
ind <- sample(1:5,length( datapsd.x),replace=T)
res <- c()
for( i in 1:5){
  dt.train <- data.all[ ind!=i,]
  x.test <- data.all[ ind==i, -1]
  y.test <- data.all[ ind==i,1]
  mod1 <- lm( dt.train[,1]~dt.train[,2]+I( dt.train[,2]^2)+I( dt.train[,2]^3)+dt.train[,3])
  x.test.olah <- cbind(1,x.test[,1], x.test[,1]^2,x.test[,1]^3,x.test[,2])
  beta <- coefficients(mod1)
  prediksi <- x.test.olah%*%beta
  res <- c( res,mean(( y.test-prediksi)^2))
}
res
## [1] 0.8883211 0.8545532 1.1762354 1.1853072 1.0085237
mean(res)
## [1] 1.022588
##2 knot (knot = 0, 2)
data.all2 <- cbind(y,datapsd.x,hx1,hx2)
set.seed(456)
ind2 <- sample(1:5,length( datapsd.x),replace=T)
res2 <- c()
for( i in 1:5){
  dt.train2 <- data.all2[ind2!=i,]
  x.test2 <- data.all2[ind2==i,-1]
  y.test2 <- data.all2[ind2==i,1]
  mod2 <- lm(dt.train2[,1]~dt.train2[,2]+I(dt.train2[,2]^2)+I(dt.train2[,2]^3)+dt.train2[,3]+dt.train2[,4])
  x.test.olah2 <- cbind(1,x.test2[,1],x.test2[,1]^2,x.test2[,1]^3,x.test2[,2],x.test2[,3])
  beta2 <- coefficients(mod2)
  prediksi2 <- x.test.olah2%*%beta2
  res2 <- c(res2,mean((y.test2-prediksi2)^2))
}
res2
## [1] 0.8926291 0.8563630 1.1799248 1.1941252 1.0096880
mean(res2)
## [1] 1.026546

2.10 Spline Regression

  1. Menentukan banyaknya fungsi basis dan knots (Penentuan via komputerisasi)
dt.all <- as.data.frame(dt.all)
knots.val <- attr(bs(dt.all$datapsd.x, df=6), "knots")
knots.val
##       25%       50%       75% 
## 0.7433515 2.0184193 3.3292037
  1. Pemodelan Regresi Spline (df = banyak bagian yang terbagi + 2)
spline.mod.tr <- lm(y ~ bs(datapsd.x, knots=knots.val), data=dt.all)
summary(spline.mod.tr)
## 
## Call:
## lm(formula = y ~ bs(datapsd.x, knots = knots.val), data = dt.all)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.0346 -0.6809 -0.0013  0.7086  3.3460 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        35.7407     0.5373   66.52   <2e-16 ***
## bs(datapsd.x, knots = knots.val)1 -26.9856     0.8589  -31.42   <2e-16 ***
## bs(datapsd.x, knots = knots.val)2 -39.9744     0.4999  -79.96   <2e-16 ***
## bs(datapsd.x, knots = knots.val)3 -15.8889     0.5700  -27.87   <2e-16 ***
## bs(datapsd.x, knots = knots.val)4  30.4698     0.5613   54.29   <2e-16 ***
## bs(datapsd.x, knots = knots.val)5 111.9672     0.7656  146.26   <2e-16 ***
## bs(datapsd.x, knots = knots.val)6 201.8858     0.9206  219.29   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.006 on 993 degrees of freedom
## Multiple R-squared:  0.9991, Adjusted R-squared:  0.9991 
## F-statistic: 1.823e+05 on 6 and 993 DF,  p-value: < 2.2e-16

2.11 Smoothing Spline

ss1 <- smooth.spline(datapsd.x,y,all.knots = T)
ss.mod.tr <- with(data=dt.all, smooth.spline(datapsd.x, y))
ss.mod.tr
## Call:
## smooth.spline(x = datapsd.x, y = y)
## 
## Smoothing Parameter  spar= 0.8275227  lambda= 0.0001455632 (14 iterations)
## Equivalent Degrees of Freedom (Df): 17.06386
## Penalized Criterion (RSS): 993.8526
## GCV: 2278.857
plot(datapsd.x,y,xlim=c(-2,5), ylim=c(-10,70), pch=16,col="orange")
lines(ss1,col="blue", lwd=2)

2.12 Loess Function

loess.tr <- loess(formula=y~datapsd.x, data=dt.all)
summary(loess.tr)
## Call:
## loess(formula = y ~ datapsd.x, data = dt.all)
## 
## Number of Observations: 1000 
## Equivalent Number of Parameters: 5.25 
## Residual Standard Error: 1.006 
## Trace of smoother matrix: 5.74  (exact)
## 
## Control settings:
##   span     :  0.75 
##   degree   :  2 
##   family   :  gaussian
##   surface  :  interpolate      cell = 0.2
##   normalize:  TRUE
##  parametric:  FALSE
## drop.square:  FALSE
loess.span.tr <- data.frame(span=seq(0.1,2,by=0.1)) %>% 
  group_by(span) %>% 
  do(broom::augment(loess(y~datapsd.x, data=dt.all,span=.$span)))
ggplot(loess.span.tr, aes(x=datapsd.x, y=y)) + geom_line(aes(y=.fitted), col="blue", lty=1) + facet_wrap(~span)

Contoh Data TRICEPS

Penjelasan data

Data yang digunakan untuk ilustrasi berasal dari studi antropometri terhadap 892 perempuan di bawah 50 tahun di tiga desa Gambia di Afrika Barat. Data terdiri dari 3 Kolom yaitu Age, Intriceps dan tricepts. Berikut adalah penjelasannya pada masing-masing kolom:

  • age : umur responden
  • Intriceps : logaritma dari ketebalan lipatan kulit triceps
  • triceps: ketebalan lipatan kulit triceps

Lipatan kulit trisep diperlukan untuk menghitung lingkar otot lengan atas. Ketebalannya memberikan informasi tentang cadangan lemak tubuh, sedangkan massa otot yang dihitung memberikan informasi tentang cadangan protein

data("triceps", package="MultiKink")

Jika digambarkan dalam bentuk scatterplot

library(corrplot)
## Warning: package 'corrplot' was built under R version 4.1.2
## corrplot 0.92 loaded
ggplot(triceps,aes(x=age, y=triceps)) +
                 geom_point(alpha=0.55, color="black") + 
                 theme_bw()

cor(triceps)
##                 age lntriceps   triceps
## age       1.0000000 0.5776443 0.5806972
## lntriceps 0.5776443 1.0000000 0.9612043
## triceps   0.5806972 0.9612043 1.0000000

Scatterplot diatas terlihat memiliki hubungan positif yang lemah dengan nilai korelasi hanya sebesar 0.5806972. Berdasarkan pola hubungan yang terlihat pada scatterplot, akan dicoba untuk mencari model yang bisa merepresentasikan pola hubungan tersebut dengan baik.

Regresi Linear

mod_linear = lm(triceps~age,data=triceps)
summary(mod_linear)
## 
## Call:
## lm(formula = triceps ~ age, data = triceps)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.9512  -2.3965  -0.5154   1.5822  25.1233 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.19717    0.21244   29.17   <2e-16 ***
## age          0.21584    0.01014   21.28   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.007 on 890 degrees of freedom
## Multiple R-squared:  0.3372, Adjusted R-squared:  0.3365 
## F-statistic: 452.8 on 1 and 890 DF,  p-value: < 2.2e-16
ggplot(triceps,aes(x=age, y=triceps)) +
                 geom_point(alpha=0.55, color="black") +
   stat_smooth(method = "lm", 
               formula = y~x,lty = 1,
               col = "blue",se = F)+
  theme_bw()

Berdasarkan scatterplot diatas, fungsi regresi linier belum mengikuti pola hubungan pada scatterplot.

Regresi Polinomial Derajat 2 (Ordo 2)

mod_polinomial2 = lm(triceps ~ poly(age,2,raw = T), # raw = T untuk matriks ortogonal atau tidak
                     data=triceps)
summary(mod_polinomial2)
## 
## Call:
## lm(formula = triceps ~ poly(age, 2, raw = T), data = triceps)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.5677  -2.4401  -0.4587   1.6368  24.9961 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             6.0229191  0.3063806  19.658  < 2e-16 ***
## poly(age, 2, raw = T)1  0.2434733  0.0364403   6.681 4.17e-11 ***
## poly(age, 2, raw = T)2 -0.0006257  0.0007926  -0.789     0.43    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.008 on 889 degrees of freedom
## Multiple R-squared:  0.3377, Adjusted R-squared:  0.3362 
## F-statistic: 226.6 on 2 and 889 DF,  p-value: < 2.2e-16
ggplot(triceps,aes(x=age, y=triceps)) + 
  geom_point(alpha=0.55, color="black") +
  stat_smooth(method = "lm", 
               formula = y~poly(x,2,raw=T), 
               lty = 1, col = "blue",se = F)+
  theme_bw()

ggplot(triceps,aes(x=age, y=triceps)) + 
  geom_point(alpha=0.55, color="black") +
  stat_smooth(method = "lm", 
               formula = y~poly(x,2,raw=T), 
               lty = 1, col = "blue",se = T)+
  theme_bw()

Fungsi polinomial Ordo 2 juga belum mengikuti pola hubungan pada scatterplot.

Regresi Polinomial Derajat 3 (ordo 3)

mod_polinomial3 = lm(triceps ~ poly(age,3,raw = T),data=triceps)
summary(mod_polinomial3)
## 
## Call:
## lm(formula = triceps ~ poly(age, 3, raw = T), data = triceps)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.5832  -1.9284  -0.5415   1.3283  24.4440 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             8.004e+00  3.831e-01  20.893  < 2e-16 ***
## poly(age, 3, raw = T)1 -3.157e-01  7.721e-02  -4.089 4.73e-05 ***
## poly(age, 3, raw = T)2  3.101e-02  3.964e-03   7.824 1.45e-14 ***
## poly(age, 3, raw = T)3 -4.566e-04  5.612e-05  -8.135 1.38e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.868 on 888 degrees of freedom
## Multiple R-squared:  0.3836, Adjusted R-squared:  0.3815 
## F-statistic: 184.2 on 3 and 888 DF,  p-value: < 2.2e-16
ggplot(triceps,aes(x=age, y=triceps)) + 
  geom_point(alpha=0.55, color="black") +
  stat_smooth(method = "lm", 
               formula = y~poly(x,3,raw=T), 
               lty = 1, col = "blue",se = T)+
  theme_bw()

Berdasarkan scatterplot diatas, dapat dilihat bahwa regresi polynomial ordo 3 lebih mengikuti pola hubungan jika dibandingkan dengan regresi linier dan polynomial orodo 2.

Regresi Fungsi Tangga (5)

mod_tangga5 = lm(triceps ~ cut(age,5),data=triceps)
summary(mod_tangga5)
## 
## Call:
## lm(formula = triceps ~ cut(age, 5), data = triceps)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.5474  -2.0318  -0.4465   1.3682  23.3759 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              7.2318     0.1994  36.260  < 2e-16 ***
## cut(age, 5)(10.6,20.9]   1.6294     0.3244   5.023 6.16e-07 ***
## cut(age, 5)(20.9,31.2]   5.9923     0.4222  14.192  < 2e-16 ***
## cut(age, 5)(31.2,41.5]   7.5156     0.4506  16.678  < 2e-16 ***
## cut(age, 5)(41.5,51.8]   7.4561     0.5543  13.452  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.939 on 887 degrees of freedom
## Multiple R-squared:  0.3617, Adjusted R-squared:  0.3588 
## F-statistic: 125.7 on 4 and 887 DF,  p-value: < 2.2e-16
ggplot(triceps,aes(x=age, y=triceps)) +
       geom_point(alpha=0.55, color="black") +
       stat_smooth(method = "lm", 
               formula = y~cut(x,5), 
               lty = 1, col = "blue",se = F)+
  theme_bw()

Scatter plot diatas menggambarkan fungsi tangga dimana data age dibagi menjadi 5 selang yang sama panjang. Dapat dilihat bahwa fungsi tangga cenderung mengikuti pola hubungan pada data.

Regresi Fungsi Tangga (7)

mod_tangga7 = lm(triceps ~ cut(age,7),data=triceps)
summary(mod_tangga7)
## 
## Call:
## lm(formula = triceps ~ cut(age, 7), data = triceps)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.8063  -1.7592  -0.4366   1.2894  23.1461 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              7.5592     0.2219  34.060  < 2e-16 ***
## cut(age, 7)(7.62,15]    -0.6486     0.3326  -1.950   0.0515 .  
## cut(age, 7)(15,22.3]     3.4534     0.4239   8.146 1.27e-15 ***
## cut(age, 7)(22.3,29.7]   5.8947     0.4604  12.804  < 2e-16 ***
## cut(age, 7)(29.7,37]     7.8471     0.5249  14.949  < 2e-16 ***
## cut(age, 7)(37,44.4]     6.9191     0.5391  12.835  < 2e-16 ***
## cut(age, 7)(44.4,51.8]   6.3013     0.6560   9.606  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.805 on 885 degrees of freedom
## Multiple R-squared:  0.4055, Adjusted R-squared:  0.4014 
## F-statistic: 100.6 on 6 and 885 DF,  p-value: < 2.2e-16
ggplot(triceps,aes(x=age, y=triceps)) +
       geom_point(alpha=0.55, color="black") +
       stat_smooth(method = "lm", 
               formula = y~cut(x,7), 
               lty = 1, col = "blue",se = F)+
  theme_bw()

Scatter plot di atas menggambarkan fungsi tangga dengan 7 selang. Garis pada fungsi tangga dengan selang 7 lebih mengikuti pola data dibandingkan dengan selang 5.

Perbandingan Model

MSE = function(pred,actual){
  mean((pred-actual)^2)
}
nilai_AIC_triceps <- rbind(AIC(mod_linear),
                   AIC(mod_polinomial2),
                   AIC(mod_polinomial3),
                   AIC(mod_tangga5),
                   AIC(mod_tangga7))
nilai_MSE_triceps <- rbind(MSE(predict(mod_linear),triceps$triceps),
              MSE(predict(mod_polinomial2),triceps$triceps),
              MSE(predict(mod_polinomial3),triceps$triceps),
              MSE(predict(mod_tangga5),triceps$triceps),
              MSE(predict(mod_tangga7),triceps$triceps))
nama_model_triceps <- c("Linear","Poly (ordo=2)", "Poly (ordo=3)",
                "Tangga (breaks=5)", "Tangga (breaks=7)")
data.frame(nama_model_triceps,nilai_AIC_triceps, nilai_MSE_triceps)
##   nama_model_triceps nilai_AIC_triceps nilai_MSE_triceps
## 1             Linear          5011.515          16.01758
## 2      Poly (ordo=2)          5012.890          16.00636
## 3      Poly (ordo=3)          4950.774          14.89621
## 4  Tangga (breaks=5)          4983.948          15.42602
## 5  Tangga (breaks=7)          4924.556          14.36779

Berdasarkan perbandingan model diatas dilihat dari nilai MSE dan BIC, model fungsi tangga dengan selang 7 memiliki nilai MSE paling kecil yaitu sebesar 14.36779 dan nilai AIC sebesar 4924.556.

Evaluasi Model Menggunakan Cross Validation

Cross validation atau dapat disebut estimasi rotasi adalah sebuah teknik validasi model untuk menilai bagaimana hasil statistik analisis akan menggeneralisasi kumpulan data independen. Teknik ini utamanya digunakan untuk melakukan prediksi model dan memperkirakan seberapa akurat sebuah model prediktif ketika dijalankan dalam praktiknya.

Regresi Linear

set.seed(123)
cross_val <- vfold_cv(triceps,v=10,strata = "triceps")
metric_linear <- map_dfr(cross_val$splits,
    function(x){
    mod <- lm(triceps ~ age,data=triceps[x$in_id,])
    pred <- predict(mod,newdata=triceps[-x$in_id,])
    truth <- triceps[-x$in_id,]$triceps
    rmse <- mlr3measures::rmse(truth = truth,response = pred)
    mae <- mlr3measures::mae(truth = truth,response = pred)
    metric <- c(rmse,mae)
    names(metric) <- c("RMSE","MAE")
    return(metric)
  }
)
metric_linear
## # A tibble: 10 x 2
##     RMSE   MAE
##    <dbl> <dbl>
##  1  3.65  2.82
##  2  4.62  3.22
##  3  4.38  3.00
##  4  3.85  2.80
##  5  3.08  2.36
##  6  3.83  2.81
##  7  3.59  2.78
##  8  4.66  3.06
##  9  3.50  2.56
## 10  4.59  2.93
# menghitung rata-rata untuk 10 folds
mean_metric_linear <- colMeans(metric_linear)
mean_metric_linear
##     RMSE      MAE 
## 3.973249 2.833886

Polynomial Derajat 2 (ordo 2)

set.seed(123)
cross_val <- vfold_cv(triceps,v=10,strata = "triceps")
metric_poly2 <- map_dfr(cross_val$splits,
function(x){
  mod <- lm(triceps ~ poly(age,2,raw = T),data=triceps[x$in_id,])
  pred <- predict(mod,newdata=triceps[-x$in_id,])
  truth <- triceps[-x$in_id,]$triceps
  rmse <- mlr3measures::rmse(truth = truth,response = pred)
  mae <- mlr3measures::mae(truth = truth,response = pred)
  metric <- c(rmse,mae)
  names(metric) <- c("RMSE","MAE")
  return(metric)
  }
)
metric_poly2
## # A tibble: 10 x 2
##     RMSE   MAE
##    <dbl> <dbl>
##  1  3.64  2.82
##  2  4.62  3.26
##  3  4.39  3.02
##  4  3.85  2.81
##  5  3.10  2.38
##  6  3.82  2.81
##  7  3.62  2.83
##  8  4.65  3.07
##  9  3.50  2.57
## 10  4.59  2.94
# menghitung rata-rata untuk 10 folds
mean_metric_poly2 <- colMeans(metric_poly2)
mean_metric_poly2
##     RMSE      MAE 
## 3.977777 2.851787

Polynomial Derajat 3 (ordo 3)

set.seed(123)
cross_val <- vfold_cv(triceps,v=10,strata = "triceps")

metric_poly3 <- map_dfr(cross_val$splits,
function(x){
  mod <- lm(triceps ~ poly(age,3,raw = T),data=triceps[x$in_id,])
  pred <- predict(mod,newdata=triceps[-x$in_id,])
  truth <- triceps[-x$in_id,]$triceps
  rmse <- mlr3measures::rmse(truth = truth,response = pred)
  mae <- mlr3measures::mae(truth = truth,response = pred)
  metric <- c(rmse,mae)
  names(metric) <- c("RMSE","MAE")
  return(metric)
  }
)
metric_poly3
## # A tibble: 10 x 2
##     RMSE   MAE
##    <dbl> <dbl>
##  1  3.49  2.60
##  2  4.48  2.99
##  3  4.21  2.85
##  4  4.02  2.75
##  5  3.03  2.09
##  6  3.63  2.59
##  7  3.53  2.52
##  8  4.54  2.91
##  9  3.27  2.33
## 10  4.27  2.68
# menghitung rata-rata untuk 10 folds
mean_metric_poly3 <- colMeans(metric_poly3)
mean_metric_poly3
##     RMSE      MAE 
## 3.845976 2.632125

Fungsi Tangga

set.seed(123)
cross_val <- vfold_cv(triceps,v=10,strata = "triceps")
breaks <- 3:10
best_tangga <- map_dfr(breaks, function(i){
    metric_tangga <- map_dfr(cross_val$splits,
    function(x){
        training <- triceps[x$in_id,]
        training$age <- cut(training$age,i)
        mod <- lm(triceps ~ age,data=training)
        labs_x <- levels(mod$model[,2])
        labs_x_breaks <- cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs_x) ),
                  upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs_x) ))
        testing <- triceps[-x$in_id,]
        age_new <- cut(testing$age,c(labs_x_breaks[1,1],labs_x_breaks[,2]))
        pred <- predict(mod,newdata=list(age=age_new))
        truth <- testing$triceps
        data_eval <- na.omit(data.frame(truth,pred))
        rmse <- mlr3measures::rmse(truth = data_eval$truth,response = data_eval$pred)
        mae <- mlr3measures::mae(truth = data_eval$truth,response = data_eval$pred)
        metric <- c(rmse,mae)
        names(metric) <- c("RMSE","MAE")
        return(metric)
      }
    )
  metric_tangga
  # menghitung rata-rata untuk 10 folds
  mean_metric_tangga <- colMeans(metric_tangga)
  mean_metric_tangga
  }
)

best_tangga <- cbind(breaks=breaks,best_tangga)
# menampilkan hasil all breaks
best_tangga
##   breaks     RMSE      MAE
## 1      3 3.835357 2.618775
## 2      4 3.882932 2.651911
## 3      5 3.917840 2.724368
## 4      6 3.836068 2.622939
## 5      7 3.789715 2.555062
## 6      8 3.812789 2.555563
## 7      9 3.781720 2.518706
## 8     10 3.795877 2.529479
#berdasarkan rmse
best_tangga %>% slice_min(RMSE)
##   breaks    RMSE      MAE
## 1      9 3.78172 2.518706
#berdasarkan mae
best_tangga %>% slice_min(MAE)
##   breaks    RMSE      MAE
## 1      9 3.78172 2.518706

Perbandingan Model

nilai_metric <- rbind(mean_metric_linear,
                      mean_metric_poly2,
                      mean_metric_poly3,
                      best_tangga %>% select(-1) %>% slice_min(MAE))

nama_model <- c("Linear","Poly2","Poly3","Tangga (breaks=9)")
data.frame(nama_model,nilai_metric)
##          nama_model     RMSE      MAE
## 1            Linear 3.973249 2.833886
## 2             Poly2 3.977777 2.851787
## 3             Poly3 3.845976 2.632125
## 4 Tangga (breaks=9) 3.781720 2.518706

Berdasarkan perbandingan model diatas, diperoleh nilai RMSE dan MAE terkecil yaitu pada model Tangga breaks 9 dengan nilai masing-masing sebesar 3.781720 dan 2.518706.

3. REGRESI NON LINIER PART 2

Data Contoh Kuliah

Data yang digunakan adalah data dari set seed 123. Sebanyak data 1000 dibangkitkan dengan menyebar secara normal yang mempunyai rata-rata dan ragam 1. Data dibagi menjadi 2 bagian, yaitu data.x kurang dari sama dengan 1 dan data.x lebih dari 1

set.seed(123)
datapsd.x <- rnorm(1000,1,1)
err <- rnorm(1000,0,5)
y <- 5+2*datapsd.x+3*datapsd.x^2+err
plot(datapsd.x,y,xlim=c(-2,5), ylim=c(-10,70), pch=16, col="blue")
abline(v=1, col="red", lty=2)

Piecewise cubic polynomial

dt.all <- cbind(y,datapsd.x)

## knots = 1
dt1 <- dt.all[datapsd.x <=1,]
dim(dt1)
## [1] 495   2
## knots = 2
dt2 <- dt.all[datapsd.x >1,]
dim(dt2)
## [1] 505   2
plot(datapsd.x,y,xlim=c(-2,5), ylim=c(-10,70), pch=16, col="dark blue")

cub.mod1 <- lm(dt1[,1]~dt1[,2]+I(dt1[,2]^2)+I(dt1[,2]^3))
ix <- sort(dt1[,2], index.return=T)$ix
lines(dt1[ix,2],cub.mod1$fitted.values[ix],col="red", lwd=2)

cub.mod2 <- lm(dt2[,1]~dt2[,2]+I(dt2[,2]^2)+I(dt2[,2]^3))
ix <- sort(dt2[,2], index.return=T)$ix
lines(dt2[ix,2],cub.mod2$fitted.values[ix],col="red", lwd=2)

Dapat dilihat bahwa terdapat perbedaaan garis pada nilai 1 datapsd.x

Truncated power basis

Untuk menghaluskan data.x sama dengan 1 maka dilakuan penghalusan dengan menggunakan truncates power basis.

plot(datapsd.x,y,xlim=c( -2,5), ylim=c( -10,70), pch=16,col="dark blue")
abline(v=1,col="red", lty=2)

hx <- ifelse( datapsd.x>1,(datapsd.x-1)^3,0)

cubspline.mod <- lm( y~datapsd.x+I(datapsd.x^2)+I(datapsd.x^3)+hx)
ix <- sort( datapsd.x,index.return=T)$ix
lines( datapsd.x[ix], cubspline.mod$fitted.values[ix],col="green", lwd=2)

Perbandingan dengan k-fold CV

Data dibagi menjadi 3 bagian, yaitu datapsd.x kurang dari 0, datapsd.x dari 1 sampai 2, dan datapsd.x lebih dari 2.

plot( datapsd.x,y,xlim=c(-2,5), ylim=c( -10,70), pch=16,col="dark blue")
abline(v=0,col="red", lty=2)
abline(v=2,col="red", lty=2)

hx1 <- ifelse( datapsd.x>0,(datapsd.x-0)^3,0)
hx2 <- ifelse( datapsd.x>2,(datapsd.x-2)^3,0)

cubspline.mod2 <- lm( y~datapsd.x+I(datapsd.x^2)+I(datapsd.x^3)+hx1+hx2)
ix <- sort( datapsd.x,index.return=T)$ix
lines(datapsd.x[ix],cubspline.mod2$fitted.values[ix],col="green", lwd=2)

Perbandingan 1 knot dan 2 knot dengan 5-fold CV

##1 knot
data.all <- cbind( y,datapsd.x,hx)
set.seed(456)
ind <- sample(1:5,length( datapsd.x),replace=T)
res <- c()
for( i in 1:5){
  dt.train <- data.all[ ind!=i,]
  x.test <- data.all[ ind==i, -1]
  y.test <- data.all[ ind==i,1]
  mod1 <- lm( dt.train[,1]~dt.train[,2]+I( dt.train[,2]^2)+I( dt.train[,2]^3)+dt.train[,3])
  x.test.olah <- cbind(1,x.test[,1], x.test[,1]^2,x.test[,1]^3,x.test[,2])
  beta <- coefficients(mod1)
  prediksi <- x.test.olah%*%beta
  res <- c(res,mean(( y.test-prediksi)^2))
}
res
## [1] 22.08767 21.39598 29.47677 29.68683 25.18070
mean(res)
## [1] 25.56559
##2 knot (knot = 0, 2)
data.all2 <- cbind(y,datapsd.x,hx1,hx2)
set.seed(456)
ind2 <- sample(1:5,length( datapsd.x),replace=T)
res2 <- c()
for( i in 1:5){
  dt.train2 <- data.all2[ind2!=i,]
  x.test2 <- data.all2[ind2==i,-1]
  y.test2 <- data.all2[ind2==i,1]
  mod2 <- lm(dt.train2[,1]~dt.train2[,2]+I(dt.train2[,2]^2)+I(dt.train2[,2]^3)+dt.train2[,3]+dt.train2[,4])
  x.test.olah2 <- cbind(1,x.test2[,1],x.test2[,1]^2,x.test2[,1]^3,x.test2[,2],x.test2[,3])
  beta2 <- coefficients(mod2)
  prediksi2 <- x.test.olah2%*%beta2
  res2 <- c(res2,mean((y.test2-prediksi2)^2))
}
res2
## [1] 22.32898 21.33868 29.40459 29.79913 25.22047
mean(res2)
## [1] 25.61837

Smoothing Spline

ss1 <- smooth.spline(datapsd.x,y,all.knots = T)
plot(datapsd.x,y,xlim=c(-2,5), ylim=c(-10,70), pch=16,col="dark blue")
lines(ss1,col="red", lwd=2)

Contoh Data TRICEPS

Data yang digunakan untuk ilustrasi berasal dari studi antropometri terhadap 892 perempuan di bawah 50 tahun di tiga desa Gambia di Afrika Barat. Data terdiri dari 3 Kolom yaitu Age, Intriceps dan tricepts. Berikut adalah penjelasannya pada masing-masing kolom:

age : umur responden

Intriceps : logaritma dari ketebalan lipatan kulit triceps

triceps: ketebalan lipatan kulit triceps

Lipatan kulit trisep diperlukan untuk menghitung lingkar otot lengan atas. Ketebalannya memberikan informasi tentang cadangan lemak tubuh, sedangkan massa otot yang dihitung memberikan informasi tentang cadangan protein.

data("triceps", package="MultiKink")

Jika digambarkan dalam bentuk scatterplot

ggplot(triceps,aes(x=age, y=triceps)) +
                 geom_point(alpha=0.55, color="black") + 
                 theme_bw() 

Regresi Spline

library(splines)
#Menentukan banyaknya fungsi basis dan knots
dim(bs(triceps$age, knots = c(10, 20,40)))
## [1] 892   6
#atau
dim(bs(triceps$age, df=6))
## [1] 892   6
#nilai knots yang ditentukan oleh komputer
attr(bs(triceps$age, df=6),"knots")
##     25%     50%     75% 
##  5.5475 12.2100 24.7275

b-spline : Menggunakan knot yang ditentukan manual

knot_value_manual_3 = c(10, 20,40)
mod_spline_1 = lm(triceps ~ bs(age, knots =knot_value_manual_3 ),data=triceps)
summary(mod_spline_1)
## 
## Call:
## lm(formula = triceps ~ bs(age, knots = knot_value_manual_3), 
##     data = triceps)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -11.385  -1.682  -0.393   1.165  22.855 
## 
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                            8.22533    0.61873  13.294  < 2e-16 ***
## bs(age, knots = knot_value_manual_3)1 -0.06551    1.14972  -0.057    0.955    
## bs(age, knots = knot_value_manual_3)2 -4.30051    0.72301  -5.948 3.90e-09 ***
## bs(age, knots = knot_value_manual_3)3  7.80435    1.17793   6.625 6.00e-11 ***
## bs(age, knots = knot_value_manual_3)4  6.14890    1.27439   4.825 1.65e-06 ***
## bs(age, knots = knot_value_manual_3)5  5.56640    1.42225   3.914 9.78e-05 ***
## bs(age, knots = knot_value_manual_3)6  7.90178    1.54514   5.114 3.87e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.76 on 885 degrees of freedom
## Multiple R-squared:  0.4195, Adjusted R-squared:  0.4155 
## F-statistic: 106.6 on 6 and 885 DF,  p-value: < 2.2e-16
ggplot(triceps,aes(x=age, y=triceps)) +
                 geom_point(alpha=0.55, color="black") +
  stat_smooth(method = "lm", 
               formula = y~bs(x, knots = knot_value_manual_3), 
               lty = 1,se = F)

Menggunakan knot yang ditentukan fungsi komputer

knot_value_pc_df_6 = attr(bs(triceps$age, df=6),"knots")
mod_spline_1 = lm(triceps ~ bs(age, knots =knot_value_pc_df_6 ),data=triceps)
summary(mod_spline_1)
## 
## Call:
## lm(formula = triceps ~ bs(age, knots = knot_value_pc_df_6), data = triceps)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.0056  -1.7556  -0.2944   1.2008  23.0695 
## 
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                            6.5925     0.8770   7.517 1.37e-13 ***
## bs(age, knots = knot_value_pc_df_6)1   3.7961     1.5015   2.528  0.01164 *  
## bs(age, knots = knot_value_pc_df_6)2  -2.0749     0.8884  -2.335  0.01974 *  
## bs(age, knots = knot_value_pc_df_6)3   1.5139     1.1645   1.300  0.19391    
## bs(age, knots = knot_value_pc_df_6)4  11.6394     1.3144   8.855  < 2e-16 ***
## bs(age, knots = knot_value_pc_df_6)5   5.9680     1.5602   3.825  0.00014 ***
## bs(age, knots = knot_value_pc_df_6)6   8.9127     1.4053   6.342 3.60e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.757 on 885 degrees of freedom
## Multiple R-squared:  0.4206, Adjusted R-squared:  0.4167 
## F-statistic: 107.1 on 6 and 885 DF,  p-value: < 2.2e-16
ggplot(triceps,aes(x=age, y=triceps)) +
                 geom_point(alpha=0.55, color="black") +
  stat_smooth(method = "lm", 
               formula = y~bs(x, knots = knot_value_pc_df_6), 
               lty = 1,se = F)

Natural Spline

knot_value_manual_3 = c(10, 20,40)
mod_spline3ns = lm(triceps ~ ns(age, knots = knot_value_manual_3),data=triceps)
summary(mod_spline3ns)
## 
## Call:
## lm(formula = triceps ~ ns(age, knots = knot_value_manual_3), 
##     data = triceps)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.7220  -1.7640  -0.3985   1.1908  22.9684 
## 
## Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                             8.8748     0.3742  23.714  < 2e-16 ***
## ns(age, knots = knot_value_manual_3)1   7.0119     0.6728  10.422  < 2e-16 ***
## ns(age, knots = knot_value_manual_3)2   6.0762     0.8625   7.045 3.72e-12 ***
## ns(age, knots = knot_value_manual_3)3   2.0780     1.0632   1.954    0.051 .  
## ns(age, knots = knot_value_manual_3)4   8.8616     0.9930   8.924  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.762 on 887 degrees of freedom
## Multiple R-squared:  0.4176, Adjusted R-squared:  0.415 
## F-statistic:   159 on 4 and 887 DF,  p-value: < 2.2e-16

Smoothing Splin

model_sms <- with(data = triceps,smooth.spline(age,triceps))
model_sms
## Call:
## smooth.spline(x = age, y = triceps)
## 
## Smoothing Parameter  spar= 0.5162704  lambda= 1.232259e-06 (13 iterations)
## Equivalent Degrees of Freedom (Df): 50.00639
## Penalized Criterion (RSS): 8591.581
## GCV: 13.77835
pred_data <- broom::augment(model_sms)

ggplot(pred_data,aes(x=x,y=y))+
  geom_point(alpha=0.55, color="black")+
  geom_line(aes(y=.fitted),col="blue",
            lty=1)+
  xlab("age")+
  ylab("triceps")+
  theme_bw()

model_sms_lambda <- data.frame(lambda=seq(0,5,by=0.5)) %>% 
  group_by(lambda) %>% 
  do(broom::augment(with(data = triceps,smooth.spline(age,triceps,lambda = .$lambda))))

p <- ggplot(model_sms_lambda,
       aes(x=x,y=y))+
  geom_line(aes(y=.fitted),
            col="blue",
            lty=1
            )+
  facet_wrap(~lambda)
p

model_sms <- with(data = triceps,smooth.spline(age,triceps,df=7))
model_sms 
## Call:
## smooth.spline(x = age, y = triceps, df = 7)
## 
## Smoothing Parameter  spar= 1.049874  lambda= 0.008827582 (11 iterations)
## Equivalent Degrees of Freedom (Df): 7.000747
## Penalized Criterion (RSS): 10112.16
## GCV: 14.20355
pred_data <- broom::augment(model_sms)

ggplot(pred_data,aes(x=x,y=y))+
  geom_point(alpha=0.55, color="black")+
  geom_line(aes(y=.fitted),col="blue",
            lty=1)+
  xlab("age")+
  ylab("triceps")+
  theme_bw()

## LOESS

model_loess <- loess(triceps~age,
                     data = triceps)
summary(model_loess)
## Call:
## loess(formula = triceps ~ age, data = triceps)
## 
## Number of Observations: 892 
## Equivalent Number of Parameters: 4.6 
## Residual Standard Error: 3.777 
## Trace of smoother matrix: 5.01  (exact)
## 
## Control settings:
##   span     :  0.75 
##   degree   :  2 
##   family   :  gaussian
##   surface  :  interpolate      cell = 0.2
##   normalize:  TRUE
##  parametric:  FALSE
## drop.square:  FALSE
model_loess_span <- data.frame(span=seq(0.1,5,by=0.5)) %>% 
  group_by(span) %>% 
  do(broom::augment(loess(triceps~age,
                     data = triceps,span=.$span)))

p2 <- ggplot(model_loess_span,
       aes(x=age,y=triceps))+
  geom_line(aes(y=.fitted),
            col="blue",
            lty=1
            )+
  facet_wrap(~span)
p2

library(ggplot2)
ggplot(triceps, aes(age,triceps)) +
  geom_point(alpha=0.5,color="black") +
  stat_smooth(method='loess',
               formula=y~x,
              span = 0.75,
              col="blue",
              lty=1,
              se=F)

set.seed(123)
cross_val <- vfold_cv(triceps,v=10,strata = "triceps")

span <- seq(0.1,1,length.out=50)

best_loess <- map_dfr(span, function(i){
metric_loess <- map_dfr(cross_val$splits,
    function(x){
mod <- loess(triceps ~ age,span = i,
         data=triceps[x$in_id,])
pred <- predict(mod,
               newdata=triceps[-x$in_id,])
truth <- triceps[-x$in_id,]$triceps

data_eval <- na.omit(data.frame(pred=pred,
                                truth=truth))


rmse <- mlr3measures::rmse(truth = data_eval$truth,
                           response = data_eval$pred
                           )
mae <- mlr3measures::mae(truth = data_eval$truth,
                           response = data_eval$pred
                           )
metric <- c(rmse,mae)
names(metric) <- c("rmse","mae")
return(metric)
}
)

head(metric_loess,20)

# menghitung rata-rata untuk 10 folds
mean_metric_loess <- colMeans(metric_loess)

mean_metric_loess
}
)

best_loess <- cbind(span=span,best_loess)
# menampilkan hasil all breaks
best_loess
##         span     rmse      mae
## 1  0.1000000 3.773822 2.495752
## 2  0.1183673 3.771342 2.493506
## 3  0.1367347 3.766450 2.487557
## 4  0.1551020 3.757385 2.480571
## 5  0.1734694 3.749539 2.473494
## 6  0.1918367 3.746288 2.470915
## 7  0.2102041 3.742741 2.467990
## 8  0.2285714 3.740397 2.465388
## 9  0.2469388 3.737724 2.463311
## 10 0.2653061 3.737105 2.462252
## 11 0.2836735 3.735645 2.460514
## 12 0.3020408 3.734522 2.459225
## 13 0.3204082 3.733298 2.457694
## 14 0.3387755 3.732538 2.456280
## 15 0.3571429 3.732082 2.455452
## 16 0.3755102 3.731261 2.454642
## 17 0.3938776 3.730882 2.454293
## 18 0.4122449 3.730585 2.454291
## 19 0.4306122 3.730351 2.454869
## 20 0.4489796 3.730717 2.456222
## 21 0.4673469 3.730831 2.456768
## 22 0.4857143 3.731387 2.458307
## 23 0.5040816 3.732091 2.460189
## 24 0.5224490 3.732392 2.461811
## 25 0.5408163 3.733609 2.464543
## 26 0.5591837 3.734393 2.467180
## 27 0.5775510 3.735554 2.470517
## 28 0.5959184 3.736679 2.473012
## 29 0.6142857 3.738227 2.476519
## 30 0.6326531 3.741401 2.476887
## 31 0.6510204 3.742660 2.479842
## 32 0.6693878 3.744336 2.483581
## 33 0.6877551 3.746651 2.488792
## 34 0.7061224 3.748708 2.492180
## 35 0.7244898 3.750734 2.495218
## 36 0.7428571 3.752815 2.498359
## 37 0.7612245 3.753998 2.504175
## 38 0.7795918 3.755981 2.511421
## 39 0.7979592 3.756794 2.514375
## 40 0.8163265 3.759456 2.522994
## 41 0.8346939 3.767892 2.543250
## 42 0.8530612 3.777168 2.557863
## 43 0.8714286 3.782407 2.566972
## 44 0.8897959 3.788428 2.578042
## 45 0.9081633 3.797002 2.595698
## 46 0.9265306 3.803259 2.609852
## 47 0.9448980 3.809478 2.623386
## 48 0.9632653 3.824192 2.655691
## 49 0.9816327 3.835596 2.679155
## 50 1.0000000 3.860514 2.718911
library(ggplot2)
ggplot(triceps, aes(age,triceps)) +
  geom_point(alpha=0.5,color="black") +
  stat_smooth(method='loess',
              formula=y~x,
              span = 0.4306122,
              col="blue",
              lty=1,
              se=F)

4. DATA LATIHAN

AutoData = Auto %>% select(mpg, horsepower, origin)
tibble(AutoData)
## # A tibble: 392 x 3
##      mpg horsepower origin
##    <dbl>      <dbl>  <dbl>
##  1    18        130      1
##  2    15        165      1
##  3    18        150      1
##  4    16        150      1
##  5    17        140      1
##  6    15        198      1
##  7    14        220      1
##  8    14        215      1
##  9    14        225      1
## 10    15        190      1
## # ... with 382 more rows

Plot Data

plot(AutoData$mpg, AutoData$horsepower,col="yellow")

Berdasarkan scatterplot diatas dapat dilihat bahwa pola hubungan data tidak linier diindikasikan dengan garis yang cenderung melengkung tidak membentuk garis lurus.

3.1 Regresi Linier

lin.mod2 <-lm(AutoData$horsepower~AutoData$mpg)
plot(AutoData$mpg,AutoData$horsepower)
lines(AutoData$mpg,lin.mod2$fitted.values,col="red")

Garis fungsi linier yang ditunjukkan oleh garis warna merah belum mengikuti pola data.

summary(lin.mod2)
## 
## Call:
## lm(formula = AutoData$horsepower ~ AutoData$mpg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -64.892 -15.716  -2.094  13.108  96.947 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  194.4756     3.8732   50.21   <2e-16 ***
## AutoData$mpg  -3.8389     0.1568  -24.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 24.19 on 390 degrees of freedom
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16

3.2 Polinomial dengan Ordo 2

pol.mod2 <- lm(AutoData$horsepower~AutoData$mpg+I(AutoData$mpg^2)) #ordo 2
ix2 <- sort(AutoData$mpg,index.return=T)$ix
plot(AutoData$mpg,AutoData$horsepower)
lines(AutoData$mpg[ix2], pol.mod2$fitted.values[ix2],col="blue", cex=2)

Garis fungsi polinomial dengan ordo 2 yang ditunjukkan warna biru sudah mengikuti pola data.

summary(pol.mod2)
## 
## Call:
## lm(formula = AutoData$horsepower ~ AutoData$mpg + I(AutoData$mpg^2))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -72.52 -11.24  -0.11   9.47  92.98 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       302.06824    9.25689   32.63   <2e-16 ***
## AutoData$mpg      -13.32406    0.77456  -17.20   <2e-16 ***
## I(AutoData$mpg^2)   0.18804    0.01513   12.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.49 on 389 degrees of freedom
## Multiple R-squared:  0.718,  Adjusted R-squared:  0.7165 
## F-statistic: 495.1 on 2 and 389 DF,  p-value: < 2.2e-16

3.3 Fungsi Tangga

range(AutoData$mpg)
## [1]  9.0 46.6
d1 <- as.factor(ifelse(AutoData$mpg<=20,1,0))
d2 <- as.factor(ifelse(AutoData$mpg<=30 & AutoData$mpg>20,1,0))
d3 <- as.factor(ifelse(AutoData$mpg>30,1,0))
data.frame(AutoData$horsepower,d1,d2,d3)
##     AutoData.horsepower d1 d2 d3
## 1                   130  1  0  0
## 2                   165  1  0  0
## 3                   150  1  0  0
## 4                   150  1  0  0
## 5                   140  1  0  0
## 6                   198  1  0  0
## 7                   220  1  0  0
## 8                   215  1  0  0
## 9                   225  1  0  0
## 10                  190  1  0  0
## 11                  170  1  0  0
## 12                  160  1  0  0
## 13                  150  1  0  0
## 14                  225  1  0  0
## 15                   95  0  1  0
## 16                   95  0  1  0
## 17                   97  1  0  0
## 18                   85  0  1  0
## 19                   88  0  1  0
## 20                   46  0  1  0
## 21                   87  0  1  0
## 22                   90  0  1  0
## 23                   95  0  1  0
## 24                  113  0  1  0
## 25                   90  0  1  0
## 26                  215  1  0  0
## 27                  200  1  0  0
## 28                  210  1  0  0
## 29                  193  1  0  0
## 30                   88  0  1  0
## 31                   90  0  1  0
## 32                   95  0  1  0
## 33                  100  1  0  0
## 34                  105  1  0  0
## 35                  100  1  0  0
## 36                   88  1  0  0
## 37                  100  1  0  0
## 38                  165  1  0  0
## 39                  175  1  0  0
## 40                  153  1  0  0
## 41                  150  1  0  0
## 42                  180  1  0  0
## 43                  170  1  0  0
## 44                  175  1  0  0
## 45                  110  1  0  0
## 46                   72  0  1  0
## 47                  100  1  0  0
## 48                   88  1  0  0
## 49                   86  0  1  0
## 50                   90  0  1  0
## 51                   70  0  1  0
## 52                   76  0  1  0
## 53                   65  0  0  1
## 54                   69  0  0  1
## 55                   60  0  1  0
## 56                   70  0  1  0
## 57                   95  0  1  0
## 58                   80  0  1  0
## 59                   54  0  1  0
## 60                   90  1  0  0
## 61                   86  0  1  0
## 62                  165  1  0  0
## 63                  175  1  0  0
## 64                  150  1  0  0
## 65                  153  1  0  0
## 66                  150  1  0  0
## 67                  208  1  0  0
## 68                  155  1  0  0
## 69                  160  1  0  0
## 70                  190  1  0  0
## 71                   97  1  0  0
## 72                  150  1  0  0
## 73                  130  1  0  0
## 74                  140  1  0  0
## 75                  150  1  0  0
## 76                  112  1  0  0
## 77                   76  0  1  0
## 78                   87  0  1  0
## 79                   69  0  1  0
## 80                   86  0  1  0
## 81                   92  0  1  0
## 82                   97  0  1  0
## 83                   80  0  1  0
## 84                   88  0  1  0
## 85                  175  1  0  0
## 86                  150  1  0  0
## 87                  145  1  0  0
## 88                  137  1  0  0
## 89                  150  1  0  0
## 90                  198  1  0  0
## 91                  150  1  0  0
## 92                  158  1  0  0
## 93                  150  1  0  0
## 94                  215  1  0  0
## 95                  225  1  0  0
## 96                  175  1  0  0
## 97                  105  1  0  0
## 98                  100  1  0  0
## 99                  100  1  0  0
## 100                  88  1  0  0
## 101                  95  0  1  0
## 102                  46  0  1  0
## 103                 150  1  0  0
## 104                 167  1  0  0
## 105                 170  1  0  0
## 106                 180  1  0  0
## 107                 100  1  0  0
## 108                  88  1  0  0
## 109                  72  0  1  0
## 110                  94  0  1  0
## 111                  90  1  0  0
## 112                  85  1  0  0
## 113                 107  0  1  0
## 114                  90  0  1  0
## 115                 145  1  0  0
## 116                 230  1  0  0
## 117                  49  0  1  0
## 118                  75  0  1  0
## 119                  91  1  0  0
## 120                 112  1  0  0
## 121                 150  1  0  0
## 122                 110  0  1  0
## 123                 122  1  0  0
## 124                 180  1  0  0
## 125                  95  1  0  0
## 126                 100  1  0  0
## 127                 100  1  0  0
## 128                  67  0  0  1
## 129                  80  0  1  0
## 130                  65  0  0  1
## 131                  75  0  1  0
## 132                 100  1  0  0
## 133                 110  1  0  0
## 134                 105  1  0  0
## 135                 140  1  0  0
## 136                 150  1  0  0
## 137                 150  1  0  0
## 138                 140  1  0  0
## 139                 150  1  0  0
## 140                  83  0  1  0
## 141                  67  0  1  0
## 142                  78  0  1  0
## 143                  52  0  0  1
## 144                  61  0  0  1
## 145                  75  0  1  0
## 146                  75  0  1  0
## 147                  75  0  1  0
## 148                  97  0  1  0
## 149                  93  0  1  0
## 150                  67  0  0  1
## 151                  95  1  0  0
## 152                 105  1  0  0
## 153                  72  1  0  0
## 154                  72  1  0  0
## 155                 170  1  0  0
## 156                 145  1  0  0
## 157                 150  1  0  0
## 158                 148  1  0  0
## 159                 110  1  0  0
## 160                 105  1  0  0
## 161                 110  1  0  0
## 162                  95  1  0  0
## 163                 110  0  1  0
## 164                 110  1  0  0
## 165                 129  1  0  0
## 166                  75  0  1  0
## 167                  83  0  1  0
## 168                 100  1  0  0
## 169                  78  0  1  0
## 170                  96  0  1  0
## 171                  71  0  1  0
## 172                  97  0  1  0
## 173                  97  1  0  0
## 174                  70  0  1  0
## 175                  90  1  0  0
## 176                  95  0  1  0
## 177                  88  0  1  0
## 178                  98  0  1  0
## 179                 115  0  1  0
## 180                  53  0  0  1
## 181                  86  0  1  0
## 182                  81  0  1  0
## 183                  92  0  1  0
## 184                  79  0  1  0
## 185                  83  0  1  0
## 186                 140  1  0  0
## 187                 150  1  0  0
## 188                 120  1  0  0
## 189                 152  1  0  0
## 190                 100  0  1  0
## 191                 105  0  1  0
## 192                  81  0  1  0
## 193                  90  0  1  0
## 194                  52  0  1  0
## 195                  60  0  1  0
## 196                  70  0  1  0
## 197                  53  0  0  1
## 198                 100  1  0  0
## 199                  78  1  0  0
## 200                 110  1  0  0
## 201                  95  1  0  0
## 202                  71  0  1  0
## 203                  70  0  0  1
## 204                  75  0  1  0
## 205                  72  0  1  0
## 206                 102  1  0  0
## 207                 150  1  0  0
## 208                  88  1  0  0
## 209                 108  1  0  0
## 210                 120  1  0  0
## 211                 180  1  0  0
## 212                 145  1  0  0
## 213                 130  1  0  0
## 214                 150  1  0  0
## 215                  68  0  0  1
## 216                  80  0  1  0
## 217                  58  0  0  1
## 218                  96  0  1  0
## 219                  70  0  0  1
## 220                 145  1  0  0
## 221                 110  1  0  0
## 222                 145  1  0  0
## 223                 130  1  0  0
## 224                 110  1  0  0
## 225                 105  0  1  0
## 226                 100  1  0  0
## 227                  98  1  0  0
## 228                 180  1  0  0
## 229                 170  1  0  0
## 230                 190  1  0  0
## 231                 149  1  0  0
## 232                  78  0  1  0
## 233                  88  0  1  0
## 234                  75  0  1  0
## 235                  89  0  1  0
## 236                  63  0  0  1
## 237                  83  0  0  1
## 238                  67  0  1  0
## 239                  78  0  0  1
## 240                  97  0  1  0
## 241                 110  0  1  0
## 242                 110  0  1  0
## 243                  48  0  0  1
## 244                  66  0  0  1
## 245                  52  0  0  1
## 246                  70  0  0  1
## 247                  60  0  0  1
## 248                 110  1  0  0
## 249                 140  1  0  0
## 250                 139  0  1  0
## 251                 105  1  0  0
## 252                  95  0  1  0
## 253                  85  0  1  0
## 254                  88  0  1  0
## 255                 100  0  1  0
## 256                  90  1  0  0
## 257                 105  0  1  0
## 258                  85  0  1  0
## 259                 110  1  0  0
## 260                 120  1  0  0
## 261                 145  1  0  0
## 262                 165  1  0  0
## 263                 139  1  0  0
## 264                 140  1  0  0
## 265                  68  0  1  0
## 266                  95  0  1  0
## 267                  97  0  1  0
## 268                  75  0  0  1
## 269                  95  0  1  0
## 270                 105  0  1  0
## 271                  85  0  1  0
## 272                  97  0  1  0
## 273                 103  0  1  0
## 274                 125  1  0  0
## 275                 115  0  1  0
## 276                 133  1  0  0
## 277                  71  0  0  1
## 278                  68  0  1  0
## 279                 115  0  1  0
## 280                  85  1  0  0
## 281                  88  0  1  0
## 282                  90  0  1  0
## 283                 110  0  1  0
## 284                 130  1  0  0
## 285                 129  1  0  0
## 286                 138  1  0  0
## 287                 135  1  0  0
## 288                 155  1  0  0
## 289                 142  1  0  0
## 290                 125  1  0  0
## 291                 150  1  0  0
## 292                  71  0  0  1
## 293                  65  0  0  1
## 294                  80  0  0  1
## 295                  80  0  1  0
## 296                  77  0  1  0
## 297                 125  0  1  0
## 298                  71  0  1  0
## 299                  90  0  1  0
## 300                  70  0  0  1
## 301                  70  0  0  1
## 302                  65  0  0  1
## 303                  69  0  0  1
## 304                  90  0  1  0
## 305                 115  0  1  0
## 306                 115  0  1  0
## 307                  90  0  0  1
## 308                  76  0  0  1
## 309                  60  0  0  1
## 310                  70  0  0  1
## 311                  65  0  0  1
## 312                  90  0  1  0
## 313                  88  0  1  0
## 314                  90  0  1  0
## 315                  90  1  0  0
## 316                  78  0  0  1
## 317                  90  0  1  0
## 318                  75  0  0  1
## 319                  92  0  0  1
## 320                  75  0  0  1
## 321                  65  0  0  1
## 322                 105  0  1  0
## 323                  65  0  0  1
## 324                  48  0  0  1
## 325                  48  0  0  1
## 326                  67  0  0  1
## 327                  67  0  1  0
## 328                  67  0  0  1
## 329                  67  0  0  1
## 330                  62  0  1  0
## 331                 132  0  0  1
## 332                 100  0  1  0
## 333                  88  0  0  1
## 334                  72  0  0  1
## 335                  84  0  1  0
## 336                  84  0  1  0
## 337                  92  0  1  0
## 338                 110  0  1  0
## 339                  84  0  1  0
## 340                  58  0  0  1
## 341                  64  0  0  1
## 342                  60  0  0  1
## 343                  67  0  0  1
## 344                  65  0  0  1
## 345                  62  0  0  1
## 346                  68  0  0  1
## 347                  63  0  0  1
## 348                  65  0  0  1
## 349                  65  0  1  0
## 350                  74  0  0  1
## 351                  75  0  0  1
## 352                  75  0  0  1
## 353                 100  0  0  1
## 354                  74  0  0  1
## 355                  80  0  1  0
## 356                  76  0  0  1
## 357                 116  0  1  0
## 358                 120  0  1  0
## 359                 110  0  1  0
## 360                 105  0  1  0
## 361                  88  0  1  0
## 362                  85  1  0  0
## 363                  88  0  1  0
## 364                  88  0  1  0
## 365                  88  0  0  1
## 366                  85  0  0  1
## 367                  84  0  1  0
## 368                  90  0  1  0
## 369                  92  0  1  0
## 370                  74  0  0  1
## 371                  68  0  0  1
## 372                  68  0  0  1
## 373                  63  0  0  1
## 374                  70  0  0  1
## 375                  88  0  0  1
## 376                  75  0  0  1
## 377                  70  0  0  1
## 378                  67  0  0  1
## 379                  67  0  0  1
## 380                  67  0  0  1
## 381                 110  0  1  0
## 382                  85  0  0  1
## 383                  92  0  1  0
## 384                 112  0  1  0
## 385                  96  0  0  1
## 386                  84  0  0  1
## 387                  90  0  1  0
## 388                  86  0  1  0
## 389                  52  0  0  1
## 390                  84  0  0  1
## 391                  79  0  1  0
## 392                  82  0  0  1
step.mod2 <- lm(AutoData$horsepower~d1+d2+d3)
plot(AutoData$mpg,AutoData$horsepower)
lines(AutoData$mpg,lin.mod2$fitted.values,col="red")
lines(AutoData$mpg[ix2], step.mod2$fitted.values[ix2],col="blue")

Garis gungsi dengan breaks 3 yang ditunjukkan oleh warna garis biru cenderung mengikuti pola data.

summary(step.mod2)
## 
## Call:
## lm(formula = AutoData$horsepower ~ d1 + d2 + d3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -65.450 -12.966   0.034  12.550  92.550 
## 
## Coefficients: (1 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   70.518      2.886  24.431  < 2e-16 ***
## d11           66.932      3.557  18.816  < 2e-16 ***
## d21           17.448      3.602   4.844 1.84e-06 ***
## d31               NA         NA      NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 26.3 on 389 degrees of freedom
## Multiple R-squared:  0.5356, Adjusted R-squared:  0.5332 
## F-statistic: 224.3 on 2 and 389 DF,  p-value: < 2.2e-16

3.4 Perbandingan Nilai AIC

nilai_AIC <- rbind(AIC(lin.mod2),
                   AIC(pol.mod2),
                   AIC(step.mod2))

nama_model <- c("Linear","Poly (ordo=2)","Tangga (breaks=3)")
data.frame(nama_model,nilai_AIC)
##          nama_model nilai_AIC
## 1            Linear  3614.324
## 2     Poly (ordo=2)  3485.217
## 3 Tangga (breaks=3)  3680.688

Dapat dlihat berdasarkan output diatas bahwa model Polynomial ordo 2 memiliki nilai AIC yang paling kecil yaitu sebesar 3485.217 yang berarti bahwamodel tersebut yang paling baik untuk dapat merepresentasikan pola hubungan data.

plot(AutoData$mpg,AutoData$horsepower)
lines(AutoData$mpg,lin.mod2$fitted.values,col="red")
lines(AutoData$mpg[ix2], pol.mod2$fitted.values[ix2],col="blue", cex=2)
lines(AutoData$mpg[ix2], step.mod2$fitted.values[ix2],col="yellow")

Secara eksploratif, garis yang paling mengikuti pola data adalah garis yang warna biru yaitu fungsi polynomial dengan ordo 2.

3.5 Perbandingan Nilai MSE

MSE2 = function(pred,actual){
  mean((pred-actual)^2)
}
MSE2(predict(lin.mod2),AutoData$horsepower)
## [1] 582.3257
MSE2(predict(pol.mod2),AutoData$horsepower)
## [1] 416.7871
MSE2(predict(step.mod2),AutoData$horsepower)
## [1] 686.2376
nilai_MSE <- rbind(MSE2(predict(lin.mod2),AutoData$horsepower),MSE2(predict(pol.mod2),AutoData$horsepower),MSE2(predict(step.mod2),AutoData$horsepower))

nama_model <- c("Linear","Poly (ordo=2)","Tangga (breaks=3)")
data.frame(nama_model,nilai_MSE)
##          nama_model nilai_MSE
## 1            Linear  582.3257
## 2     Poly (ordo=2)  416.7871
## 3 Tangga (breaks=3)  686.2376

Berdasarkan output diatas dapat dilihat bahwa model polynomial memiliki nilai MSE paling kecil yaitu sebesar 416.7871. Sehingga dapat disimpulkan bahwa model terbaik adalah polynomial dengan ordo 2.

3.6 Regresi Spline

ggplot(AutoData,aes(x=horsepower, y=mpg)) +
                 geom_point(alpha=0.55, color="black") +
  stat_smooth(method = "lm", 
               formula = y~bs(x, knots = knot_value_manual_auto), 
               lty = 1,se = T)
## Warning: Computation failed in `stat_smooth()`:
## could not find function "bs"

## 3.7 Natural Spline

knot_value_manual_auto = c(10, 20,40)
mod_spline_ns_auto = lm(mpg ~ ns(horsepower, knots = knot_value_manual_auto),data=AutoData)
summary(mod_spline_ns_auto)
## 
## Call:
## lm(formula = mpg ~ ns(horsepower, knots = knot_value_manual_auto), 
##     data = AutoData)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5710  -3.2592  -0.3435   2.7630  16.9240 
## 
## Coefficients: (3 not defined because of singularities)
##                                                 Estimate Std. Error t value
## (Intercept)                                      10.0857     0.5992   16.83
## ns(horsepower, knots = knot_value_manual_auto)1  32.2705     1.3177   24.49
## ns(horsepower, knots = knot_value_manual_auto)2       NA         NA      NA
## ns(horsepower, knots = knot_value_manual_auto)3       NA         NA      NA
## ns(horsepower, knots = knot_value_manual_auto)4       NA         NA      NA
##                                                 Pr(>|t|)    
## (Intercept)                                       <2e-16 ***
## ns(horsepower, knots = knot_value_manual_auto)1   <2e-16 ***
## ns(horsepower, knots = knot_value_manual_auto)2       NA    
## ns(horsepower, knots = knot_value_manual_auto)3       NA    
## ns(horsepower, knots = knot_value_manual_auto)4       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared:  0.6059, Adjusted R-squared:  0.6049 
## F-statistic: 599.7 on 1 and 390 DF,  p-value: < 2.2e-16
ggplot(AutoData,aes(x=horsepower, y=mpg)) +
                 geom_point(alpha=0.55, color="black")+
      stat_smooth(method = "lm", 
                 formula = y~ns(x, knots = knot_value_manual_auto), 
                 lty = 1,se=T)
## Warning in predict.lm(model, newdata = new_data_frame(list(x = xseq)), se.fit =
## se, : prediction from a rank-deficient fit may be misleading

## 3.8 Smoothing Spline

ggplot(AutoData,aes(x=horsepower, y=mpg)) +
                 geom_point(alpha=0.55, color="black")+
      stat_smooth(method = "lm", 
                 formula = y~ns(x, knots = knot_value_manual_auto), 
                 lty = 1,se=T)
## Warning in predict.lm(model, newdata = new_data_frame(list(x = xseq)), se.fit =
## se, : prediction from a rank-deficient fit may be misleading

model_sms_auto <- with(data = AutoData,smooth.spline(horsepower,mpg))
model_sms_auto
## Call:
## smooth.spline(x = horsepower, y = mpg)
## 
## Smoothing Parameter  spar= 0.2648644  lambda= 5.101567e-07 (12 iterations)
## Equivalent Degrees of Freedom (Df): 42.14987
## Penalized Criterion (RSS): 1117.657
## GCV: 18.64132
pred_data <- broom::augment(model_sms_auto)

ggplot(pred_data,aes(x=x,y=y))+
  geom_point(alpha=0.55, color="black")+
  geom_line(aes(y=.fitted),col="blue",
            lty=1)+
  xlab("horsepower")+
  ylab("mpg")+
  theme_bw()

Terlihat pengaruh lambda terhadap tingkat smoothness kurva

model_sms_lambda_auto <- data.frame(lambda=seq(0,5,by=0.5)) %>% 
  group_by(lambda) %>% 
  do(broom::augment(with(data = AutoData,smooth.spline(horsepower,mpg,lambda = .$lambda))))

p <- ggplot(model_sms_lambda_auto,
       aes(x=x,y=y))+
  geom_line(aes(y=.fitted),
            col="blue",
            lty=1
            )+
  facet_wrap(~lambda)
p

Jika ditentukan df=7 maka model kurva akan lebih merepresentasikan data:

model_sms_auto <- with(data = AutoData,smooth.spline(horsepower,mpg,df=7))
model_sms_auto
## Call:
## smooth.spline(x = horsepower, y = mpg, df = 7)
## 
## Smoothing Parameter  spar= 0.7927002  lambda= 0.003320238 (12 iterations)
## Equivalent Degrees of Freedom (Df): 6.999082
## Penalized Criterion (RSS): 2424.819
## GCV: 18.84972
pred_data <- broom::augment(model_sms_auto)

ggplot(pred_data,aes(x=x,y=y))+
  geom_point(alpha=0.55, color="black")+
  geom_line(aes(y=.fitted),col="blue",
            lty=1)+
  xlab("horsepower")+
  ylab("mpg")+
  theme_bw()

## 3.8 LOESS

model_loess_auto <- loess(mpg~horsepower,
                     data = AutoData)
summary(model_loess_auto)
## Call:
## loess(formula = mpg ~ horsepower, data = AutoData)
## 
## Number of Observations: 392 
## Equivalent Number of Parameters: 4.97 
## Residual Standard Error: 4.32 
## Trace of smoother matrix: 5.43  (exact)
## 
## Control settings:
##   span     :  0.75 
##   degree   :  2 
##   family   :  gaussian
##   surface  :  interpolate      cell = 0.2
##   normalize:  TRUE
##  parametric:  FALSE
## drop.square:  FALSE
model_loess_span_auto <- data.frame(span=seq(0.1,5,by=0.5)) %>% 
  group_by(span) %>% 
  do(broom::augment(loess(mpg~horsepower,
                     data = AutoData,span=.$span)))
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 89
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 1
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
p2 <- ggplot(model_loess_span_auto,
       aes(x=horsepower,y=mpg))+
  geom_line(aes(y=.fitted),
            col="blue",
            lty=1
            )+
  facet_wrap(~span)
p2

library(ggplot2)
ggplot(AutoData, aes(horsepower,mpg)) +
  geom_point(alpha=0.5,color="black") +
  stat_smooth(method='loess',
               formula=y~x,
              span = 0.75,
              col="blue",
              lty=1,
              se=T)

## 3.9 Perbandingan Model

ggplot(AutoData, aes(x=horsepower, y=mpg)) + geom_point(color="black") + 
  stat_smooth(method="lm", formula=y~poly(x,7,raw=T), col="blue", lwd=1.2, se=F) +
  stat_smooth(method="lm", formula=y~cut(x,8), col="green", lwd=1.2, se=F) +
  stat_smooth(method="lm", formula=y~bs(x, knots=knot_value_pc_df_auto), col="yellow", lwd=1.2, se=F) +
  stat_smooth(method="lm", formula=y~ns(x, knots=knot_value_pc_df_auto), col="darkgreen", lwd=1.2, se=F) + 
  stat_smooth(method="loess", formula=y~x, col="orange", lwd=1.2, se=F) + 
  stat_smooth(method="lm", formula=y~x, col="red", lwd=1.2, se=F)
## Warning: Computation failed in `stat_smooth()`:
## object 'knot_value_pc_df_auto' not found
## Computation failed in `stat_smooth()`:
## object 'knot_value_pc_df_auto' not found

MSE = function(pred,actual){
  mean((pred-actual)^2)
}
MSE_Autogab<- rbind(MSE(predict(lin.mod2),AutoData$mpg),
              MSE(predict(pol.mod2),AutoData$mpg),
              MSE(predict(step.mod2),AutoData$mpg),
              MSE(predict(mod_spline_ns_auto),AutoData$mpg),
              MSE(predict(model_loess_auto),AutoData$mpg))
Model <- c("Linear","Poly (ordo=2)","Tangga (breaks=3)","Naturak Spline Regression", "Loess Function")
model.autogab <- data.frame(Model, MSE_Autogab)
knitr::kable(model.autogab)
Model MSE_Autogab
Linear 7987.55223
Poly (ordo=2) 8153.09082
Tangga (breaks=3) 7805.23895
Naturak Spline Regression 23.94366
Loess Function 18.37932

Berdasarkan output diatas terlihat bahwa model dengan MSE terkecil adalah Loess Function.