Moving Beyond Linearity

Tugas Praktikum 8 STA581 Sains Data

Nur Andi Setiabudi1

2021-11-03

Pendahuluan

Tujuan

  • Melakukan cross validation untuk menghasilkan pemodelan antara tingkat efisiensi penggunaan bahan bakar (direpresentasikan olehmpg atau miles per gallon) versus tiga peubah prediktor yang pada data ISLR::Auto dengan metode:

    • Polynomial regression
    • Piecewise constant
    • Natural cubic splines
  • Pemodelan dilakukan untuk masing-masing peubah prediktor terpilih.

Output

Mendapatkan model non-linier yang optimal untuk masing-masing peubah prediktor.

Metodologi

Tahapan

Pemodelan dilakukan dengan tahapan sebagai berikut:

Skenario Tuning Parameter

  • Cross validation

    4-fold cross validation dengan meminimumkan RMSE (root mean squared error)

  • Polynomial regression

    Derajat/degree polinomial awal: c(2:10)

  • Piecewise constant

    Jumlah bins/breaks awal: c(2:10)

  • Natural Cubic Spline

    Jumlah dan lokasi knots awal ditentukan perdasarkan quantile:

Jumlah knots Lokasi knots (quantiles)
1 .5,
2 .33 * c(1:2)
3 .25 * c(1:3)
3* c(.10, .5, .90)
4 .20 * c(1:4)
4* c(.05, .35, .65, .95)
5 .167 * c(1:5)
5* c(.05, .275, .5, .725, .95)
6 .143 * c(1:6)
6* c(.05, .23, .41, .59, .77, .95)
7 .125 * c(1:7)
7* c(.025, .1833, .3417, .5, .6583, .8167, .975)

Sumber : Frank E. Harrell, Jr. (2015). Regression Modeling Strategies

Tools

  • R
  • splines dan stats untuk pemodelan
  • caret untuk cross validation
  • ggplot2 dan cowplot untuk grafik/visualisasi

Kumpulan Fungsi

Untuk memudahkan iterasi, dibuat beberapa fungsi:

Modeling dan Cross Validation

library(splines)
library(caret)

beyondLinear <- function(df, y, x, model, param){
  
  #' beyondLinear()
  #' 
  #' Perform k-fold cross validation for Polynomial, Piecewise, Natural Cubic Spline
  #' and select the best model based on RMSE
  #' 
  #' Input:
  #' df: data-frame, ex: Auto
  #' y: quoted y-variable, ex: "mpg"
  #' x: quoted x-variable, ex: "horsepower"
  #' model: model/algorithm: polynomial, piecewise, ncubicspline
  #' param: tuning parameter
  
  cvResult <- mapply(
    function(param, model){
      
      if (model == "polynomial"){
        f <- sprintf("%s ~ poly(%s, degree = %s)", y, x, param)
      } else if (model == "piecewise") {
        f <- sprintf("%s ~ cut(%s, breaks = %s)", y, x, param)
      } else if (model == "ncubicspline") {
        f <- sprintf("%s ~ ns(%s, knots = c(%s))", y, x, paste(param, collapse = ","))
      } else if (model == "smoothspline") {
        f <- smooth.spline(cars[,"speed"], cars[,"dist"], df = param)
      }
      
      trCtl <- trainControl(method='cv', number = 10)
      models <- train(as.formula(f), data = df, trControl = trCtl, method = 'glm')
      
      RMSE <- models$results$RMSE
      fModel <- models$finalModel
      
      return(list(models, RMSE, fModel))
    },  model = model, param = param, USE.NAMES = FALSE, SIMPLIFY = FALSE )
  
  
  if(!is.null(names(param))){
    paramIdx <- names(param)
  } else {
    paramIdx <- param
  }
  
  cvScore <- data.frame(paramIdx = as.character(unlist(paramIdx)),
                        score = unlist(sapply(cvResult, "[", 2, USE.NAMES = F)))
  cvScore$param <- param
  
  bestPos <- which.min(cvScore$score)[1]
  bestParam <- param[[bestPos]]
  bestScore <- cvResult[[bestPos]][[2]]
  bestModel <- cvResult[[bestPos]][[1]]
  
  predValue <- data.frame(df[c(y,x)], 
                          pred = predict(bestModel, df, interval = "prediction"))
  
  return(list(method = model,
              cvScore = cvScore, 
              whichBest = bestPos, 
              bestParamIdx = paramIdx[bestPos],
              bestParam = bestParam, 
              bestScore = bestScore, 
              bestModel = bestModel,
              predValue = predValue))
}

Grafik

library(ggplot2)
library(cowplot)

cvPlot <- function(df, title, xlab){
  
  #' cvPlot()
  #' 
  #' Create cross-validation plot for Polynomial, Piecewise, Natural Cubic Spline
  #' based on given tuning parameters
  
  scaleFUN <- function(x) as.character(round(x,2))
  df$paramIdx <- factor(df$paramIdx, levels = df$paramIdx)
  
  ggplot(df, aes(x=paramIdx, y=score)) + 
    geom_bar(stat = "identity", width=0.1, fill = "black") +
    coord_cartesian(ylim = c(0.99*min(df$score), NA)) +
    geom_hline(yintercept = min(df$score), linetype="dashed", color = "red") +
    ggtitle(title) +
    xlab(xlab) + ylab("CV Root Mean Squared Error") +
    theme_classic()
}

predPlot <- function(df, y, laby, x, labx, title){
  
  #' predPlot()
  #' 
  #' Create plot of best model's fitted value vs response
  #' Along with its observed values
  
  ggplot(df) +
    geom_point(aes_string(x = x, y = y, fill = as.factor(1))) +
    geom_line(aes_string(x = x, y = "pred", color = as.factor(1)), size = 1) +
    scale_fill_manual(labels = c("Observed"), values = c("black")) +
    scale_color_manual(labels = c("Predicted"), values = c("blue"))  +
    theme_classic() +
    ggtitle(title) +
    xlab(labx) + ylab(laby) +
    theme(legend.title = element_blank(), 
          legend.position = "bottom", 
          legend.box = "horizontal")
}

predPlotAll <- function(df, y, x, laby, labx) {
  
  #' predPlotAll()
  #' 
  #' Create plot of all best model's fitted value vs response 
  #' In one frame
  
  ggplot(df) +
    geom_point(aes_string(x = x, y = y, fill = as.factor(1))) +
    geom_line(aes_string(x = x, y = "polynomial", color = as.factor(2)), size = 1) +
    geom_line(aes_string(x = x, y = "piecewise", color = as.factor(3)), size = 1) +
    geom_line(aes_string(x = x, y = "ncubicspline", color = as.factor(4)), size = 1) +
    scale_fill_manual(labels = c("Observed"), values = c("black")) +
    scale_color_manual(labels = c("Polynomial", "Piecewise", "Natural Cubic Splines"), 
                       values = c("blue", "red", "green"))  +
    theme_classic() +
    ggtitle("Comparison of Best Predictions") +
    xlab(labx) + ylab(laby) +
    theme(legend.title = element_blank(), 
          legend.position = "bottom", 
          legend.box = "horizontal")
}

Fungsi wrapper

generateAllREg <- function(df, y, x, laby, labx, degree, cut, knot, best=NULL){
  
  #' generateAllREg()
  #' 
  #' Wrapper function to call 
  #'   beyondLinear()
  #'   cvPlot()
  #'   predPlot()
  #' in one call
  
  methods <- c("polynomial", "piecewise", "ncubicspline")
  method_alias <- c("Polynomial Regression", "Piecewise Constant", "Natural Cubic Splines")
  params <- list(Degree = degree, `N Bins` = cut, `N Knots` = knot)
  
  outModel <- list()
  pltCV <- list()
  pltPred <- list()
  
  for( i in seq(length(methods)) ){
    
    model <- beyondLinear(df = df, y = y, x = x, model = methods[i], param = params[[i]])
    
    outModel[[i]] <- model
    
    
    p <- cvPlot(model$cvScore,
                title = method_alias[i],
                xlab =  names(params)[i])
    
    pltCV[[i]] <- p
    
    p <- predPlot(df = model$predValue, 
                  y = y, laby = laby,
                  x = x, labx = labx,
                  title = sprintf("%s with %s=%s (RMSE=%s)", 
                                  method_alias[i], names(params)[i], 
                                  model$bestParamIdx, round(model$bestScore,2)))
    
    if(methods[i] == "ncubicspline"){
      p <- p + geom_vline(xintercept = model$bestParam, linetype="dashed", color = "gray")
    }
    
    pltPred[[i]] <- p
  }
  
  bestSummary <- data.frame(t(sapply(outModel, "[", c("method", "bestParamIdx", "bestScore"))))
  names(bestSummary) <- c("method", "param", "RMSE")
  
  predAllBest <- data.frame(outModel[[1]]$predValue, 
                            outModel[[2]]$predValue$pred, 
                            outModel[[3]]$predValue$pred)
  names(predAllBest) <- c(y, x, methods)
  
  pltPredAll <- predPlotAll(df = predAllBest, 
                          y = y, x = x, 
                          laby = laby, labx = labx)
    
    if(is.null(best)){
      bIdx <- which.min(bestSummary$RMSE)
    } else {
      bIdx <- best
    }
  
  bestModel <- bestSummary[bIdx, ]
  
  predicted <- outModel[[bIdx]]$predValue
  
  return(list(bestSummary = bestSummary, 
              bestModel = bestModel,
              bestModelDetail = outModel[[bIdx]]$bestModel,
              bestParamDetail = outModel[[bIdx]]$bestParam,
              model = outModel, 
              pltCV = pltCV, 
              pltPred = pltPred,
              pltPredAll = pltPredAll,
              pltCVGrid = plot_grid(pltCV[[1]], pltCV[[2]], pltCV[[3]]),
              pltPredGrid = plot_grid(pltPred[[1]], pltPred[[2]], pltPred[[3]], pltPredAll),
              predBest3 = predAllBest,
              predBestFinal = predicted))
  
}

Penentuan Knots

pctKnots <- function(x){
  
  #' pctKnots()
  #' 
  #' Calculate knots based on quartiles
  
  pct <- list(
    .5,
    .33*c(1:2),
    .25*c(1:3),
    c(.10,.5,.90),
    .20*c(1:4),
    c(.05,.35,.65,.95),
    .167*c(1:5),
    c(.05,.275,.5,.725,.95),
    .143*c(1:6),
    c(.05,.23,.41,.59,.77,.95),
    .125*c(1:7),
    c(.025,.1833,.3417,.5,.6583,.8167,.975))
  
  knt <- sapply(pct, function(p, i){round(quantile(i, p),2)}, i = x)
  names(knt) <- c(1,2,3,"3*",4,"4*",5,"5*",6,"6*",7,"7*")
  return(knt)
  
}

Hasil dan Pembahasan

Preview Dataset

library(ISLR)
head(Auto)
mpg cylinders displacement horsepower weight acceleration year origin name
18 8 307 130 3504 12.0 70 1 chevrolet chevelle malibu
15 8 350 165 3693 11.5 70 1 buick skylark 320
18 8 318 150 3436 11.0 70 1 plymouth satellite
16 8 304 150 3433 12.0 70 1 amc rebel sst
17 8 302 140 3449 10.5 70 1 ford torino
15 8 429 198 4341 10.0 70 1 ford galaxie 500
summary(Auto)
##       mpg          cylinders      displacement     horsepower        weight    
##  Min.   : 9.00   Min.   :3.000   Min.   : 68.0   Min.   : 46.0   Min.   :1613  
##  1st Qu.:17.00   1st Qu.:4.000   1st Qu.:105.0   1st Qu.: 75.0   1st Qu.:2225  
##  Median :22.75   Median :4.000   Median :151.0   Median : 93.5   Median :2804  
##  Mean   :23.45   Mean   :5.472   Mean   :194.4   Mean   :104.5   Mean   :2978  
##  3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:275.8   3rd Qu.:126.0   3rd Qu.:3615  
##  Max.   :46.60   Max.   :8.000   Max.   :455.0   Max.   :230.0   Max.   :5140  
##                                                                                
##   acceleration        year           origin                      name    
##  Min.   : 8.00   Min.   :70.00   Min.   :1.000   amc matador       :  5  
##  1st Qu.:13.78   1st Qu.:73.00   1st Qu.:1.000   ford pinto        :  5  
##  Median :15.50   Median :76.00   Median :1.000   toyota corolla    :  5  
##  Mean   :15.54   Mean   :75.98   Mean   :1.577   amc gremlin       :  4  
##  3rd Qu.:17.02   3rd Qu.:79.00   3rd Qu.:2.000   amc hornet        :  4  
##  Max.   :24.80   Max.   :82.00   Max.   :3.000   chevrolet chevette:  4  
##                                                  (Other)           :365

Hubungan mpg terhadap peubah-peubah lain

Diagram pencar di bawah menunjukkan hubungan peubah mpg terhadap peubah-peubah lain yang ada dalam dataset, yaitu:

  • cylinders
  • displacement
  • horsepower
  • weight
  • year
plotmpg <- function(x){
  ggplot(Auto) +
  geom_point(aes_string(x = tolower(x), y = "mpg"), alpha = 0.8) +
  ggtitle(paste0("Miles per Gallon vs ", x)) +
  xlab(x) + ylab("Miles per Gallon") +
  theme_classic() 
}
p1 <- plotmpg("cylinders")
p1

p2 <- plotmpg("displacement")
p2

p3 <- plotmpg("horsepower")
p3

p4 <- plotmpg("weight")
p4

p5 <- plotmpg("year")
p5

Jika disatukan dalam satu frame, maka dapat dilihat:

plot_grid(p1, p2, p3, p4, p5, ncol = 2)

Berdasarkan diagram pencar di atas, maka terlihat hubungan mpg dengan peubah-peubah lain bersifat tidak linier.

Untuk proses selanjutnya, akan dipilih tiga peubah sebagai prediktor bagi mpg, yaitu displacement, horsepower dan weight.

mpg versus displacement

AutoKnots <- pctKnots(Auto$displacement)

set.seed(123)
mpg.dp <- generateAllREg(df = Auto,
                         y = "mpg", x = "displacement", 
                         laby = "Miles per Gallon", labx = "Displacement", 
                         degree = c(2:10), cut = c(2:10), knot = AutoKnots,
                         best = 2)

Cross validation

mpg.dp$pltCVGrid

Ringkasan tuning parameter terbaik

mpg.dp$bestSummary
method param RMSE
polynomial 10 4.146329
piecewise 9 4.222382
ncubicspline 6* 4.15199
mpg.dp$bestModel
method param RMSE
2 piecewise 9 4.222382

Tunning Parameter terbaik:

mpg.dp$bestParamDetail
## [1] 9

Nilai observasi vs prediksi

mpg.dp$pltPredGrid

Model terbaik yang diperoleh dari cross-validation adalah polynomial derajat 10 yang memberikan RMSE 4.15. Akan tetapi, dari kurva terlihal nilai prediksi polynomial dan natural cubic splines menunjukkan ada belokan di ujung sebelah kiri kurva. Karena itu, dipilih model lainnya yaitu piecewise constant dengan jumlah tangga 9. Konsekuensinya, RMSE meningkat menjadi 4.22.

Ringkasan modeling

summary(mpg.dp$bestModelDetail)
## 
## Call:
## NULL
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -13.5181   -2.2250   -0.2905    2.0339   19.4607  
## 
## Coefficients:
##                                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)                               31.5181     0.3947  79.850  < 2e-16
## `cut(displacement, breaks = 9)(111,154]`  -5.5601     0.6010  -9.252  < 2e-16
## `cut(displacement, breaks = 9)(154,197]`  -8.2848     1.0770  -7.693 1.23e-13
## `cut(displacement, breaks = 9)(197,240]` -11.7022     0.7527 -15.547  < 2e-16
## `cut(displacement, breaks = 9)(240,283]` -12.9788     0.8951 -14.499  < 2e-16
## `cut(displacement, breaks = 9)(283,326]` -16.2276     0.7656 -21.197  < 2e-16
## `cut(displacement, breaks = 9)(326,369]` -16.7923     0.8595 -19.537  < 2e-16
## `cut(displacement, breaks = 9)(369,412]` -17.5494     1.1337 -15.479  < 2e-16
## `cut(displacement, breaks = 9)(412,455]` -18.2959     1.4710 -12.438  < 2e-16
##                                             
## (Intercept)                              ***
## `cut(displacement, breaks = 9)(111,154]` ***
## `cut(displacement, breaks = 9)(154,197]` ***
## `cut(displacement, breaks = 9)(197,240]` ***
## `cut(displacement, breaks = 9)(240,283]` ***
## `cut(displacement, breaks = 9)(283,326]` ***
## `cut(displacement, breaks = 9)(326,369]` ***
## `cut(displacement, breaks = 9)(369,412]` ***
## `cut(displacement, breaks = 9)(412,455]` ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 18.07284)
## 
##     Null deviance: 23819.0  on 391  degrees of freedom
## Residual deviance:  6921.9  on 383  degrees of freedom
## AIC: 2258
## 
## Number of Fisher Scoring iterations: 2

mpg versus horsepower

AutoKnots <- pctKnots(Auto$horsepower)

set.seed(123)
mpg.hp <- generateAllREg(df = Auto,
                         y = "mpg", x = "horsepower", 
                         laby = "Miles per Gallon", labx = "Horsepower", 
                         degree = c(2:10), cut = c(2:10), knot = AutoKnots)

Cross validation

mpg.hp$pltCVGrid

Ringkasan tuning parameter terbaik

mpg.hp$bestSummary
method param RMSE
polynomial 8 4.287447
piecewise 8 4.430233
ncubicspline 7 4.289125
mpg.hp$bestModel
method param RMSE
polynomial 8 4.287447

Tunning Parameter terbaik:

mpg.hp$bestParamDetail
## [1] 8

Nilai observasi vs prediksi

mpg.hp$pltPredGrid

Model terbaik yang diperoleh dari cross-validation adalah polynomial derajat 8.

Ringkasan modeling

summary(mpg.hp$bestModelDetail)
## 
## Call:
## NULL
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -15.5939   -2.5657   -0.2684    2.2188   15.1306  
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       23.4459     0.2172 107.953  < 2e-16 ***
## `poly(horsepower, degree = 8)1` -120.1377     4.3001 -27.939  < 2e-16 ***
## `poly(horsepower, degree = 8)2`   44.0895     4.3001  10.253  < 2e-16 ***
## `poly(horsepower, degree = 8)3`   -3.9488     4.3001  -0.918  0.35903    
## `poly(horsepower, degree = 8)4`   -5.1878     4.3001  -1.206  0.22839    
## `poly(horsepower, degree = 8)5`   13.2722     4.3001   3.086  0.00217 ** 
## `poly(horsepower, degree = 8)6`   -8.5462     4.3001  -1.987  0.04758 *  
## `poly(horsepower, degree = 8)7`    7.9806     4.3001   1.856  0.06423 .  
## `poly(horsepower, degree = 8)8`    2.1727     4.3001   0.505  0.61366    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 18.49066)
## 
##     Null deviance: 23819.0  on 391  degrees of freedom
## Residual deviance:  7081.9  on 383  degrees of freedom
## AIC: 2266.9
## 
## Number of Fisher Scoring iterations: 2

mpg versus weight

AutoKnots <- pctKnots(Auto$weight)

set.seed(123)
mpg.wg <- generateAllREg(df = Auto,
                         y = "mpg", x = "weight", 
                         laby = "Miles per Gallon", labx = "Weight", 
                         degree = c(2:10), cut = c(2:10), knot = AutoKnots)

Cross validation

mpg.wg$pltCVGrid

Ringkasan tuning parameter terbaik

mpg.wg$bestSummary
method param RMSE
polynomial 2 4.126055
piecewise 6 4.197694
ncubicspline 4 4.11046
mpg.wg$bestModel
method param RMSE
3 ncubicspline 4 4.11046

Tunning Parameter terbaik:

mpg.wg$bestParamDetail
##    20%    40%    60%    80% 
## 2155.0 2583.2 3113.4 3820.8

Nilai observasi vs prediksi

mpg.wg$pltPredGrid

Model terbaik yang diperoleh dari cross-validation adalah natural cubic spline dengan 4* knots.

Ringkasan modeling

summary(mpg.wg$bestModelDetail)
## 
## Call:
## NULL
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -12.8856   -2.7740   -0.4045    1.9150   16.1559  
## 
## Coefficients:
##                                                        Estimate Std. Error
## (Intercept)                                              35.686      1.591
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))1`  -10.885      1.553
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))2`  -15.031      1.991
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))3`  -19.698      1.433
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))4`  -26.677      3.648
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))5`  -21.759      1.674
##                                                        t value Pr(>|t|)    
## (Intercept)                                             22.423  < 2e-16 ***
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))1`  -7.010 1.07e-11 ***
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))2`  -7.549 3.20e-13 ***
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))3` -13.745  < 2e-16 ***
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))4`  -7.312 1.53e-12 ***
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))5` -12.997  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 17.53828)
## 
##     Null deviance: 23819.0  on 391  degrees of freedom
## Residual deviance:  6769.8  on 386  degrees of freedom
## AIC: 2243.2
## 
## Number of Fisher Scoring iterations: 2

Kesimpulan

Berdasarkan hasil modeling tidak linier, pemilihan tuning parameter menggunakan 10-fold cross validation, dan evaluasi diperoleh model terbaik untuk masing-masing prediktor sebagai berikut:

summary3 <- rbind(
    cbind(predictor = "displacement", mpg.dp$bestModel, paramlist = paste0(mpg.dp$bestParamDetail, collapse = ",")),
    cbind(predictor = "horsepower",   mpg.hp$bestModel, paramlist = paste0(mpg.hp$bestParamDetail, collapse = ",")),
    cbind(predictor = "weight",       mpg.wg$bestModel, paramlist = paste0(mpg.wg$bestParamDetail, collapse = ",")))

row.names(summary3) <- NULL 
summary3 <- summary3[, c(1,2,3,5,4)]
summary3
predictor method param paramlist RMSE
displacement piecewise 9 9 4.222382
horsepower polynomial 8 8 4.287447
weight ncubicspline 4 2155,2583.2,3113.4,3820.8 4.11046

Hasil modeling dapat disajikan dalam diagram pencar dan kurva prediksi sebagai berikut:

plt.dp <- mpg.dp$pltPred[[2]] 
plt.dp <- plt.dp + ggtitle("Miles per Gallon vs Displacement", subtitle = plt.dp$labels$title)

plt.hp <- mpg.hp$pltPred[[1]] 
plt.hp <- plt.hp + ggtitle("Miles per Gallon vs Horsepower", subtitle = plt.hp$labels$title)

plt.wg <- mpg.wg$pltPred[[3]] 
plt.wg <- plt.wg + ggtitle("Miles per Gallon vs Weight", subtitle = plt.wg$labels$title)

plot_grid(plt.dp, plt.hp, plt.wg)

Meskipun sudah mendapatkan “model terbaik”, tuning model masih perlu dilakukan terutama untuk prediktor horsepower karena masih ada belokan di ujung-ujung kurva.

Daftar Pustaka

Dito, Gerry A. (2021), Regresi Polinomial, Regresi Fungsi Tangga dan Regresi Spline di R, https://gerrydito.github.io/Regresi-Polinomial,-Regresi-Fungsi-Tangga-dan-Regresi-Spline/ (Oct 29, 2021).

Harrell, Frank E., Jr. (2015). Regression Modeling Strategies. Springer Series in Statistics, Springer.

James, G.; Witten, D.; Hastie, T. & Tibshirani, R. (2013), An Introduction to Statistical Learning: with Applications in R, Springer.


  1. Mahasiswa Pascasarjana Statistika dan Sains Data, Institut Pertanian Bogor, NIM: G1501211061, ↩︎