Moving Beyond Linearity

Tugas Praktikum 8 STA581 Sains Data

Nur Andi Setiabudi¹

2021-11-03

Pendahuluan

Tujuan

Melakukan cross validation untuk menghasilkan pemodelan antara tingkat efisiensi penggunaan bahan bakar (direpresentasikan olehmpg atau miles per gallon) versus tiga peubah prediktor yang pada data ISLR::Auto dengan metode:
- Polynomial regression
- Piecewise constant
- Natural cubic splines
Pemodelan dilakukan untuk masing-masing peubah prediktor terpilih.

Output

Mendapatkan model non-linier yang optimal untuk masing-masing peubah prediktor.

Metodologi

Tahapan

Pemodelan dilakukan dengan tahapan sebagai berikut:

Skenario Tuning Parameter

Cross validation

4-fold cross validation dengan meminimumkan RMSE (root mean squared error)
Polynomial regression

Derajat/degree polinomial awal: c(2:10)
Piecewise constant

Jumlah bins/breaks awal: c(2:10)
Natural Cubic Spline

Jumlah dan lokasi knots awal ditentukan perdasarkan quantile:

Jumlah knots	Lokasi knots (quantiles)
1	`.5,`
2	`.33 * c(1:2)`
3	`.25 * c(1:3)`
3*	`c(.10, .5, .90)`
4	`.20 * c(1:4)`
4*	`c(.05, .35, .65, .95)`
5	`.167 * c(1:5)`
5*	`c(.05, .275, .5, .725, .95)`
6	`.143 * c(1:6)`
6*	`c(.05, .23, .41, .59, .77, .95)`
7	`.125 * c(1:7)`
7*	`c(.025, .1833, .3417, .5, .6583, .8167, .975)`

Sumber : Frank E. Harrell, Jr. (2015). Regression Modeling Strategies

Tools

R
splines dan stats untuk pemodelan
caret untuk cross validation
ggplot2 dan cowplot untuk grafik/visualisasi

Kumpulan Fungsi

Untuk memudahkan iterasi, dibuat beberapa fungsi:

Modeling dan Cross Validation

library(splines)
library(caret)

beyondLinear <- function(df, y, x, model, param){
  
  #' beyondLinear()
  #' 
  #' Perform k-fold cross validation for Polynomial, Piecewise, Natural Cubic Spline
  #' and select the best model based on RMSE
  #' 
  #' Input:
  #' df: data-frame, ex: Auto
  #' y: quoted y-variable, ex: "mpg"
  #' x: quoted x-variable, ex: "horsepower"
  #' model: model/algorithm: polynomial, piecewise, ncubicspline
  #' param: tuning parameter
  
  cvResult <- mapply(
    function(param, model){
      
      if (model == "polynomial"){
        f <- sprintf("%s ~ poly(%s, degree = %s)", y, x, param)
      } else if (model == "piecewise") {
        f <- sprintf("%s ~ cut(%s, breaks = %s)", y, x, param)
      } else if (model == "ncubicspline") {
        f <- sprintf("%s ~ ns(%s, knots = c(%s))", y, x, paste(param, collapse = ","))
      } else if (model == "smoothspline") {
        f <- smooth.spline(cars[,"speed"], cars[,"dist"], df = param)
      }
      
      trCtl <- trainControl(method='cv', number = 10)
      models <- train(as.formula(f), data = df, trControl = trCtl, method = 'glm')
      
      RMSE <- models$results$RMSE
      fModel <- models$finalModel
      
      return(list(models, RMSE, fModel))
    },  model = model, param = param, USE.NAMES = FALSE, SIMPLIFY = FALSE )
  
  
  if(!is.null(names(param))){
    paramIdx <- names(param)
  } else {
    paramIdx <- param
  }
  
  cvScore <- data.frame(paramIdx = as.character(unlist(paramIdx)),
                        score = unlist(sapply(cvResult, "[", 2, USE.NAMES = F)))
  cvScore$param <- param
  
  bestPos <- which.min(cvScore$score)[1]
  bestParam <- param[[bestPos]]
  bestScore <- cvResult[[bestPos]][[2]]
  bestModel <- cvResult[[bestPos]][[1]]
  
  predValue <- data.frame(df[c(y,x)], 
                          pred = predict(bestModel, df, interval = "prediction"))
  
  return(list(method = model,
              cvScore = cvScore, 
              whichBest = bestPos, 
              bestParamIdx = paramIdx[bestPos],
              bestParam = bestParam, 
              bestScore = bestScore, 
              bestModel = bestModel,
              predValue = predValue))
}

Grafik

library(ggplot2)
library(cowplot)

cvPlot <- function(df, title, xlab){
  
  #' cvPlot()
  #' 
  #' Create cross-validation plot for Polynomial, Piecewise, Natural Cubic Spline
  #' based on given tuning parameters
  
  scaleFUN <- function(x) as.character(round(x,2))
  df$paramIdx <- factor(df$paramIdx, levels = df$paramIdx)
  
  ggplot(df, aes(x=paramIdx, y=score)) + 
    geom_bar(stat = "identity", width=0.1, fill = "black") +
    coord_cartesian(ylim = c(0.99*min(df$score), NA)) +
    geom_hline(yintercept = min(df$score), linetype="dashed", color = "red") +
    ggtitle(title) +
    xlab(xlab) + ylab("CV Root Mean Squared Error") +
    theme_classic()
}

predPlot <- function(df, y, laby, x, labx, title){
  
  #' predPlot()
  #' 
  #' Create plot of best model's fitted value vs response
  #' Along with its observed values
  
  ggplot(df) +
    geom_point(aes_string(x = x, y = y, fill = as.factor(1))) +
    geom_line(aes_string(x = x, y = "pred", color = as.factor(1)), size = 1) +
    scale_fill_manual(labels = c("Observed"), values = c("black")) +
    scale_color_manual(labels = c("Predicted"), values = c("blue"))  +
    theme_classic() +
    ggtitle(title) +
    xlab(labx) + ylab(laby) +
    theme(legend.title = element_blank(), 
          legend.position = "bottom", 
          legend.box = "horizontal")
}

predPlotAll <- function(df, y, x, laby, labx) {
  
  #' predPlotAll()
  #' 
  #' Create plot of all best model's fitted value vs response 
  #' In one frame
  
  ggplot(df) +
    geom_point(aes_string(x = x, y = y, fill = as.factor(1))) +
    geom_line(aes_string(x = x, y = "polynomial", color = as.factor(2)), size = 1) +
    geom_line(aes_string(x = x, y = "piecewise", color = as.factor(3)), size = 1) +
    geom_line(aes_string(x = x, y = "ncubicspline", color = as.factor(4)), size = 1) +
    scale_fill_manual(labels = c("Observed"), values = c("black")) +
    scale_color_manual(labels = c("Polynomial", "Piecewise", "Natural Cubic Splines"), 
                       values = c("blue", "red", "green"))  +
    theme_classic() +
    ggtitle("Comparison of Best Predictions") +
    xlab(labx) + ylab(laby) +
    theme(legend.title = element_blank(), 
          legend.position = "bottom", 
          legend.box = "horizontal")
}

Fungsi wrapper

generateAllREg <- function(df, y, x, laby, labx, degree, cut, knot, best=NULL){
  
  #' generateAllREg()
  #' 
  #' Wrapper function to call 
  #'   beyondLinear()
  #'   cvPlot()
  #'   predPlot()
  #' in one call
  
  methods <- c("polynomial", "piecewise", "ncubicspline")
  method_alias <- c("Polynomial Regression", "Piecewise Constant", "Natural Cubic Splines")
  params <- list(Degree = degree, `N Bins` = cut, `N Knots` = knot)
  
  outModel <- list()
  pltCV <- list()
  pltPred <- list()
  
  for( i in seq(length(methods)) ){
    
    model <- beyondLinear(df = df, y = y, x = x, model = methods[i], param = params[[i]])
    
    outModel[[i]] <- model
    
    
    p <- cvPlot(model$cvScore,
                title = method_alias[i],
                xlab =  names(params)[i])
    
    pltCV[[i]] <- p
    
    p <- predPlot(df = model$predValue, 
                  y = y, laby = laby,
                  x = x, labx = labx,
                  title = sprintf("%s with %s=%s (RMSE=%s)", 
                                  method_alias[i], names(params)[i], 
                                  model$bestParamIdx, round(model$bestScore,2)))
    
    if(methods[i] == "ncubicspline"){
      p <- p + geom_vline(xintercept = model$bestParam, linetype="dashed", color = "gray")
    }
    
    pltPred[[i]] <- p
  }
  
  bestSummary <- data.frame(t(sapply(outModel, "[", c("method", "bestParamIdx", "bestScore"))))
  names(bestSummary) <- c("method", "param", "RMSE")
  
  predAllBest <- data.frame(outModel[[1]]$predValue, 
                            outModel[[2]]$predValue$pred, 
                            outModel[[3]]$predValue$pred)
  names(predAllBest) <- c(y, x, methods)
  
  pltPredAll <- predPlotAll(df = predAllBest, 
                          y = y, x = x, 
                          laby = laby, labx = labx)
    
    if(is.null(best)){
      bIdx <- which.min(bestSummary$RMSE)
    } else {
      bIdx <- best
    }
  
  bestModel <- bestSummary[bIdx, ]
  
  predicted <- outModel[[bIdx]]$predValue
  
  return(list(bestSummary = bestSummary, 
              bestModel = bestModel,
              bestModelDetail = outModel[[bIdx]]$bestModel,
              bestParamDetail = outModel[[bIdx]]$bestParam,
              model = outModel, 
              pltCV = pltCV, 
              pltPred = pltPred,
              pltPredAll = pltPredAll,
              pltCVGrid = plot_grid(pltCV[[1]], pltCV[[2]], pltCV[[3]]),
              pltPredGrid = plot_grid(pltPred[[1]], pltPred[[2]], pltPred[[3]], pltPredAll),
              predBest3 = predAllBest,
              predBestFinal = predicted))
  
}

Penentuan Knots

pctKnots <- function(x){
  
  #' pctKnots()
  #' 
  #' Calculate knots based on quartiles
  
  pct <- list(
    .5,
    .33*c(1:2),
    .25*c(1:3),
    c(.10,.5,.90),
    .20*c(1:4),
    c(.05,.35,.65,.95),
    .167*c(1:5),
    c(.05,.275,.5,.725,.95),
    .143*c(1:6),
    c(.05,.23,.41,.59,.77,.95),
    .125*c(1:7),
    c(.025,.1833,.3417,.5,.6583,.8167,.975))
  
  knt <- sapply(pct, function(p, i){round(quantile(i, p),2)}, i = x)
  names(knt) <- c(1,2,3,"3*",4,"4*",5,"5*",6,"6*",7,"7*")
  return(knt)
  
}

Hasil dan Pembahasan

Preview Dataset

library(ISLR)
head(Auto)

mpg	cylinders	displacement	horsepower	weight	acceleration	year	origin	name
18	8	307	130	3504	12.0	70	1	chevrolet chevelle malibu
15	8	350	165	3693	11.5	70	1	buick skylark 320
18	8	318	150	3436	11.0	70	1	plymouth satellite
16	8	304	150	3433	12.0	70	1	amc rebel sst
17	8	302	140	3449	10.5	70	1	ford torino
15	8	429	198	4341	10.0	70	1	ford galaxie 500

summary(Auto)

##       mpg          cylinders      displacement     horsepower        weight    
##  Min.   : 9.00   Min.   :3.000   Min.   : 68.0   Min.   : 46.0   Min.   :1613  
##  1st Qu.:17.00   1st Qu.:4.000   1st Qu.:105.0   1st Qu.: 75.0   1st Qu.:2225  
##  Median :22.75   Median :4.000   Median :151.0   Median : 93.5   Median :2804  
##  Mean   :23.45   Mean   :5.472   Mean   :194.4   Mean   :104.5   Mean   :2978  
##  3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:275.8   3rd Qu.:126.0   3rd Qu.:3615  
##  Max.   :46.60   Max.   :8.000   Max.   :455.0   Max.   :230.0   Max.   :5140  
##                                                                                
##   acceleration        year           origin                      name    
##  Min.   : 8.00   Min.   :70.00   Min.   :1.000   amc matador       :  5  
##  1st Qu.:13.78   1st Qu.:73.00   1st Qu.:1.000   ford pinto        :  5  
##  Median :15.50   Median :76.00   Median :1.000   toyota corolla    :  5  
##  Mean   :15.54   Mean   :75.98   Mean   :1.577   amc gremlin       :  4  
##  3rd Qu.:17.02   3rd Qu.:79.00   3rd Qu.:2.000   amc hornet        :  4  
##  Max.   :24.80   Max.   :82.00   Max.   :3.000   chevrolet chevette:  4  
##                                                  (Other)           :365

Hubungan `mpg` terhadap peubah-peubah lain

Diagram pencar di bawah menunjukkan hubungan peubah mpg terhadap peubah-peubah lain yang ada dalam dataset, yaitu:

cylinders
displacement
horsepower
weight
year

plotmpg <- function(x){
  ggplot(Auto) +
  geom_point(aes_string(x = tolower(x), y = "mpg"), alpha = 0.8) +
  ggtitle(paste0("Miles per Gallon vs ", x)) +
  xlab(x) + ylab("Miles per Gallon") +
  theme_classic() 
}

p1 <- plotmpg("cylinders")
p1

p2 <- plotmpg("displacement")
p2

p3 <- plotmpg("horsepower")
p3

p4 <- plotmpg("weight")
p4

p5 <- plotmpg("year")
p5

Jika disatukan dalam satu frame, maka dapat dilihat:

plot_grid(p1, p2, p3, p4, p5, ncol = 2)

Berdasarkan diagram pencar di atas, maka terlihat hubungan mpg dengan peubah-peubah lain bersifat tidak linier.

Untuk proses selanjutnya, akan dipilih tiga peubah sebagai prediktor bagi mpg, yaitu displacement, horsepower dan weight.

`mpg` versus `displacement`

AutoKnots <- pctKnots(Auto$displacement)

set.seed(123)
mpg.dp <- generateAllREg(df = Auto,
                         y = "mpg", x = "displacement", 
                         laby = "Miles per Gallon", labx = "Displacement", 
                         degree = c(2:10), cut = c(2:10), knot = AutoKnots,
                         best = 2)

Cross validation

mpg.dp$pltCVGrid

Ringkasan tuning parameter terbaik

mpg.dp$bestSummary

method	param	RMSE
polynomial	10	4.146329
piecewise	9	4.222382
ncubicspline	6*	4.15199

mpg.dp$bestModel

	method	param	RMSE
2	piecewise	9	4.222382

Tunning Parameter terbaik:

mpg.dp$bestParamDetail

## [1] 9

Nilai observasi vs prediksi

mpg.dp$pltPredGrid

Model terbaik yang diperoleh dari cross-validation adalah polynomial derajat 10 yang memberikan RMSE 4.15. Akan tetapi, dari kurva terlihal nilai prediksi polynomial dan natural cubic splines menunjukkan ada belokan di ujung sebelah kiri kurva. Karena itu, dipilih model lainnya yaitu piecewise constant dengan jumlah tangga 9. Konsekuensinya, RMSE meningkat menjadi 4.22.

Ringkasan modeling

summary(mpg.dp$bestModelDetail)

## 
## Call:
## NULL
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -13.5181   -2.2250   -0.2905    2.0339   19.4607  
## 
## Coefficients:
##                                          Estimate Std. Error t value Pr(>|t|)
## (Intercept)                               31.5181     0.3947  79.850  < 2e-16
## `cut(displacement, breaks = 9)(111,154]`  -5.5601     0.6010  -9.252  < 2e-16
## `cut(displacement, breaks = 9)(154,197]`  -8.2848     1.0770  -7.693 1.23e-13
## `cut(displacement, breaks = 9)(197,240]` -11.7022     0.7527 -15.547  < 2e-16
## `cut(displacement, breaks = 9)(240,283]` -12.9788     0.8951 -14.499  < 2e-16
## `cut(displacement, breaks = 9)(283,326]` -16.2276     0.7656 -21.197  < 2e-16
## `cut(displacement, breaks = 9)(326,369]` -16.7923     0.8595 -19.537  < 2e-16
## `cut(displacement, breaks = 9)(369,412]` -17.5494     1.1337 -15.479  < 2e-16
## `cut(displacement, breaks = 9)(412,455]` -18.2959     1.4710 -12.438  < 2e-16
##                                             
## (Intercept)                              ***
## `cut(displacement, breaks = 9)(111,154]` ***
## `cut(displacement, breaks = 9)(154,197]` ***
## `cut(displacement, breaks = 9)(197,240]` ***
## `cut(displacement, breaks = 9)(240,283]` ***
## `cut(displacement, breaks = 9)(283,326]` ***
## `cut(displacement, breaks = 9)(326,369]` ***
## `cut(displacement, breaks = 9)(369,412]` ***
## `cut(displacement, breaks = 9)(412,455]` ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 18.07284)
## 
##     Null deviance: 23819.0  on 391  degrees of freedom
## Residual deviance:  6921.9  on 383  degrees of freedom
## AIC: 2258
## 
## Number of Fisher Scoring iterations: 2

`mpg` versus `horsepower`

AutoKnots <- pctKnots(Auto$horsepower)

set.seed(123)
mpg.hp <- generateAllREg(df = Auto,
                         y = "mpg", x = "horsepower", 
                         laby = "Miles per Gallon", labx = "Horsepower", 
                         degree = c(2:10), cut = c(2:10), knot = AutoKnots)

Cross validation

mpg.hp$pltCVGrid

Ringkasan tuning parameter terbaik

mpg.hp$bestSummary

method	param	RMSE
polynomial	8	4.287447
piecewise	8	4.430233
ncubicspline	7	4.289125

mpg.hp$bestModel

method	param	RMSE
polynomial	8	4.287447

Tunning Parameter terbaik:

mpg.hp$bestParamDetail

## [1] 8

Nilai observasi vs prediksi

mpg.hp$pltPredGrid

Model terbaik yang diperoleh dari cross-validation adalah polynomial derajat 8.

Ringkasan modeling

summary(mpg.hp$bestModelDetail)

## 
## Call:
## NULL
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -15.5939   -2.5657   -0.2684    2.2188   15.1306  
## 
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       23.4459     0.2172 107.953  < 2e-16 ***
## `poly(horsepower, degree = 8)1` -120.1377     4.3001 -27.939  < 2e-16 ***
## `poly(horsepower, degree = 8)2`   44.0895     4.3001  10.253  < 2e-16 ***
## `poly(horsepower, degree = 8)3`   -3.9488     4.3001  -0.918  0.35903    
## `poly(horsepower, degree = 8)4`   -5.1878     4.3001  -1.206  0.22839    
## `poly(horsepower, degree = 8)5`   13.2722     4.3001   3.086  0.00217 ** 
## `poly(horsepower, degree = 8)6`   -8.5462     4.3001  -1.987  0.04758 *  
## `poly(horsepower, degree = 8)7`    7.9806     4.3001   1.856  0.06423 .  
## `poly(horsepower, degree = 8)8`    2.1727     4.3001   0.505  0.61366    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 18.49066)
## 
##     Null deviance: 23819.0  on 391  degrees of freedom
## Residual deviance:  7081.9  on 383  degrees of freedom
## AIC: 2266.9
## 
## Number of Fisher Scoring iterations: 2

`mpg` versus `weight`

AutoKnots <- pctKnots(Auto$weight)

set.seed(123)
mpg.wg <- generateAllREg(df = Auto,
                         y = "mpg", x = "weight", 
                         laby = "Miles per Gallon", labx = "Weight", 
                         degree = c(2:10), cut = c(2:10), knot = AutoKnots)

Cross validation

mpg.wg$pltCVGrid

Ringkasan tuning parameter terbaik

mpg.wg$bestSummary

method	param	RMSE
polynomial	2	4.126055
piecewise	6	4.197694
ncubicspline	4	4.11046

mpg.wg$bestModel

	method	param	RMSE
3	ncubicspline	4	4.11046

Tunning Parameter terbaik:

mpg.wg$bestParamDetail

##    20%    40%    60%    80% 
## 2155.0 2583.2 3113.4 3820.8

Nilai observasi vs prediksi

mpg.wg$pltPredGrid

Model terbaik yang diperoleh dari cross-validation adalah natural cubic spline dengan 4* knots.

Ringkasan modeling

summary(mpg.wg$bestModelDetail)

## 
## Call:
## NULL
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -12.8856   -2.7740   -0.4045    1.9150   16.1559  
## 
## Coefficients:
##                                                        Estimate Std. Error
## (Intercept)                                              35.686      1.591
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))1`  -10.885      1.553
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))2`  -15.031      1.991
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))3`  -19.698      1.433
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))4`  -26.677      3.648
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))5`  -21.759      1.674
##                                                        t value Pr(>|t|)    
## (Intercept)                                             22.423  < 2e-16 ***
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))1`  -7.010 1.07e-11 ***
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))2`  -7.549 3.20e-13 ***
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))3` -13.745  < 2e-16 ***
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))4`  -7.312 1.53e-12 ***
## `ns(weight, knots = c(2155, 2583.2, 3113.4, 3820.8))5` -12.997  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 17.53828)
## 
##     Null deviance: 23819.0  on 391  degrees of freedom
## Residual deviance:  6769.8  on 386  degrees of freedom
## AIC: 2243.2
## 
## Number of Fisher Scoring iterations: 2

Kesimpulan

Berdasarkan hasil modeling tidak linier, pemilihan tuning parameter menggunakan 10-fold cross validation, dan evaluasi diperoleh model terbaik untuk masing-masing prediktor sebagai berikut:

summary3 <- rbind(
    cbind(predictor = "displacement", mpg.dp$bestModel, paramlist = paste0(mpg.dp$bestParamDetail, collapse = ",")),
    cbind(predictor = "horsepower",   mpg.hp$bestModel, paramlist = paste0(mpg.hp$bestParamDetail, collapse = ",")),
    cbind(predictor = "weight",       mpg.wg$bestModel, paramlist = paste0(mpg.wg$bestParamDetail, collapse = ",")))

row.names(summary3) <- NULL 
summary3 <- summary3[, c(1,2,3,5,4)]
summary3

predictor	method	param	paramlist	RMSE
displacement	piecewise	9	9	4.222382
horsepower	polynomial	8	8	4.287447
weight	ncubicspline	4	2155,2583.2,3113.4,3820.8	4.11046

Hasil modeling dapat disajikan dalam diagram pencar dan kurva prediksi sebagai berikut:

plt.dp <- mpg.dp$pltPred[[2]] 
plt.dp <- plt.dp + ggtitle("Miles per Gallon vs Displacement", subtitle = plt.dp$labels$title)

plt.hp <- mpg.hp$pltPred[[1]] 
plt.hp <- plt.hp + ggtitle("Miles per Gallon vs Horsepower", subtitle = plt.hp$labels$title)

plt.wg <- mpg.wg$pltPred[[3]] 
plt.wg <- plt.wg + ggtitle("Miles per Gallon vs Weight", subtitle = plt.wg$labels$title)

plot_grid(plt.dp, plt.hp, plt.wg)

Meskipun sudah mendapatkan “model terbaik”, tuning model masih perlu dilakukan terutama untuk prediktor horsepower karena masih ada belokan di ujung-ujung kurva.

Daftar Pustaka

Dito, Gerry A. (2021), Regresi Polinomial, Regresi Fungsi Tangga dan Regresi Spline di R, https://gerrydito.github.io/Regresi-Polinomial,-Regresi-Fungsi-Tangga-dan-Regresi-Spline/ (Oct 29, 2021).

Harrell, Frank E., Jr. (2015). Regression Modeling Strategies. Springer Series in Statistics, Springer.

James, G.; Witten, D.; Hastie, T. & Tibshirani, R. (2013), An Introduction to Statistical Learning: with Applications in R, Springer.

Mahasiswa Pascasarjana Statistika dan Sains Data, Institut Pertanian Bogor, NIM: G1501211061, nur.andi@apps.ipb.ac.id ↩︎

Moving Beyond Linearity

Tugas Praktikum 8 STA581 Sains Data

Pendahuluan

Tujuan

Output

Metodologi

Tahapan

Skenario Tuning Parameter

Tools

Kumpulan Fungsi

Modeling dan Cross Validation

Grafik

Fungsi wrapper

Penentuan Knots

Hasil dan Pembahasan

Preview Dataset

Hubungan mpg terhadap peubah-peubah lain

mpg versus displacement

Cross validation

Ringkasan tuning parameter terbaik

Nilai observasi vs prediksi

Ringkasan modeling

mpg versus horsepower

Cross validation

Ringkasan tuning parameter terbaik

Nilai observasi vs prediksi

Ringkasan modeling

mpg versus weight

Cross validation

Ringkasan tuning parameter terbaik

Nilai observasi vs prediksi

Ringkasan modeling

Kesimpulan

Daftar Pustaka

Hubungan `mpg` terhadap peubah-peubah lain

`mpg` versus `displacement`

`mpg` versus `horsepower`

`mpg` versus `weight`