Course Description
Machine learning is the study and application of algorithms that learn from and make predictions on data. From search results to self-driving cars, it has manifested itself in all areas of our lives and is one of the most exciting and fast-growing fields of research in the world of data science. This course teaches the big ideas in machine learning: how to build and evaluate predictive models, how to tune them for optimal performance, how to preprocess data for better results, and much more. The popular caret R package, which provides a consistent interface to all of R’s most powerful machine learning facilities, is used throughout the course.
library(dplyr)              # data manipulation
library(ggplot2)            # plotting; also provides the diamonds dataset
install.packages("mlbench") # only needed once
library(mlbench)            # example datasets (e.g. Sonar)
install.packages("caret")   # only needed once
library(caret)              # unified train()/predict() interface used throughout
library(caTools)            # helpers such as colAUC(), used later in the course
#source('create_datasets.R')
In the first chapter of this course, you’ll fit regression models with train() and evaluate their out-of-sample performance using cross-validation and root-mean-square error (RMSE).
RMSE is commonly calculated in-sample on your training set. What’s a potential drawback to calculating training set error?
Answer the question
50 XP
Possible Answers
There’s no potential drawback to calculating training set error, but you should calculate R2 instead of RMSE.
You have no idea how well your model generalizes to new data (i.e. overfitting). [ans]
You should manually inspect your model to validate its coefficients and calculate RMSE.
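To make the drawback concrete, here is a small illustrative sketch (not part of the exercise): fit an overly flexible model to simulated data and compare the in-sample RMSE with the RMSE on rows the model never saw. The simulated data, the 70/30 split, and the polynomial degree are arbitrary choices for illustration.
set.seed(1)
n <- 100
sim <- data.frame(x = runif(n))
sim$y <- sin(2 * pi * sim$x) + rnorm(n, sd = 0.3)
train_sim <- sim[1:70, ]    # rows used to fit the model
test_sim  <- sim[71:100, ]  # held-out rows
fit <- lm(y ~ poly(x, 15), data = train_sim)  # deliberately over-flexible
sqrt(mean((predict(fit) - train_sim$y)^2))                     # in-sample RMSE (optimistic)
sqrt(mean((predict(fit, newdata = test_sim) - test_sim$y)^2))  # held-out RMSE (usually larger)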
head(diamonds)
# Fit lm model: model
model <- lm(price ~ ., diamonds)
# Predict on full data: p
p <- predict(model)
# Compute errors: error
error <- p - diamonds$price
# Calculate RMSE
RMSE <- sqrt(mean(error^2))
RMSE
[1] 1129.843
What is the advantage of using a train/test split rather than just validating your model in-sample on the training set?
Answer the question
50 XP
Possible Answers
It takes less time to calculate error on the test set, since it is smaller than the training set.
There is no advantage to using a test set. You can just use adjusted R2 on your training set.
It gives you an estimate of how well your model performs on new data. [ans]
One way you can take a train/test split of a dataset is to order the dataset randomly, then divide it into the two sets. This ensures that the training set and test set are both random samples and that any biases in the ordering of the dataset (e.g. if it had originally been ordered by price or size) are not retained in the samples you use for training and testing your models. You can think of this like shuffling a brand new deck of playing cards before dealing hands.
First, you set a random seed so that your work is reproducible and you get the same random split each time you run your script:
set.seed(42)
Next, you use the sample() function to shuffle the row indices of the diamonds dataset. You can later use these indices to reorder the dataset.
rows <- sample(nrow(diamonds))
Finally, you can use this random vector to reorder the diamonds dataset:
diamonds <- diamonds[rows, ]
Instructions
100 XP
Set the random seed to 42.
Make a vector of row indices called rows.
Randomly reorder the diamonds data frame.
# Set seed
set.seed(42)
# Shuffle row indices: rows
rows <- sample(nrow(diamonds))
# Randomly order data
diamonds <- diamonds[rows, ]
Now that your dataset is randomly ordered, you can split the first 80% of it into a training set, and the last 20% into a test set. You can do this by choosing a split point approximately 80% of the way through your data:
split <- round(nrow(mydata) * .80)
You can then use this point to break off the first 80% of the dataset as a training set:
mydata[1:split, ]
And then you can use that same point to determine the test set:
mydata[(split + 1):nrow(mydata), ]
Instructions
100 XP
Choose a row index to split on so that the split point is approximately 80% of the way through the diamonds dataset. Call this index split.
Create a training set called train using that index.
Create a test set called test using that index.
# Determine row to split on: split
split <- round(nrow(diamonds) * .80)
# Create train
train <- diamonds[1:split, ]
# Create test
test <- diamonds[(split + 1):nrow(diamonds), ]
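As an optional alternative to the manual split above, caret also provides createDataPartition(), which draws a split stratified on the outcome rather than a purely random cut. The 0.8 proportion mirrors the split above; in_train, train2, and test2 are just illustrative names.
# Optional alternative: stratified split with caret
in_train <- createDataPartition(diamonds$price, p = 0.8, list = FALSE)[, 1]
train2 <- diamonds[in_train, ]
test2  <- diamonds[-in_train, ]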
Now that you have a randomly split training set and test set, you can use the lm() function as you did in the first exercise to fit a model to your training set, rather than the entire dataset. Recall that you can use the formula interface to the linear regression function to fit a model with a specified target variable using all other variables in the dataset as predictors:
mod <- lm(y ~ ., training_data)
You can use the predict() function to make predictions from that model on new data. The new dataset must have all of the columns from the training data, but they can be in a different order with different values. Here, rather than re-predicting on the training set, you can predict on the test set, which you did not use for training the model. This will allow you to determine the out-of-sample error for the model in the next exercise:
p <- predict(model, new_data)
Instructions
100 XP
Fit an lm() model called model to predict price using all other variables as covariates. Be sure to use the training set, train.
Predict on the test set, test, using predict(). Store these values in a vector called p.
# Fit lm model on train: model
model <- lm(price ~ ., train)
# Predict on test: p
p <- predict(model, newdata = test)
Now that you have predictions on the test set, you can use these predictions to calculate an error metric (in this case RMSE) on the test set and see how the model performs out-of-sample, rather than in-sample as you did in the first exercise. You first do this by calculating the errors between the predicted diamond prices and the actual diamond prices by subtracting the predictions from the actual values.
Once you have an error vector, calculating RMSE is as simple as squaring it, taking the mean, then taking the square root:
sqrt(mean(error^2))
Instructions
100 XP
test, model, and p are loaded in your workspace.
Calculate the error between the predictions on the test set and the actual diamond prices in the test set. Call this error.
Calculate RMSE using this error vector, just printing the result to the console.
# Compute errors: error
error <- p - test$price
# Calculate RMSE
sqrt(mean(error^2))
[1] 1136.596
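If you prefer not to compute the metric by hand, caret ships helpers that do the same arithmetic; this is just an optional alternative to the manual calculation above.
# Optional: caret's built-in metric helpers on the same predictions
postResample(pred = p, obs = test$price)  # RMSE, R-squared and MAE together
RMSE(p, test$price)                       # just the root-mean-square error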
Why is the test set RMSE higher than the training set RMSE?
Answer the question
50 XP
Possible Answers
Because you overfit the training set and the test set contains data the model hasn’t seen before. [ans]
Because you should not use a test set at all and instead just look at error on the training set.
Because the test set has a smaller sample size than the training set and thus the mean error is lower.
Remark: Though the test set has the smaller sample size, the mean error is not necessarily lower.
What is the advantage of cross-validation over a single train/test split?
Answer the question
50 XP
Possible Answers
There is no advantage to cross-validation, just as there is no advantage to a single train/test split. You should be validating your models in-sample with a metric like adjusted R2.
You can pick the best test set to minimize the reported RMSE of your model.
It gives you multiple estimates of out-of-sample error, rather than a single estimate. [ans]
Remark: If all of your estimates give similar outputs, you can be more certain of the model’s accuracy. If your estimates give different outputs, that tells you the model does not perform consistently and suggests a problem with it.
As you saw in the video, a better approach to validating models is to use multiple systematic test sets, rather than a single random train/test split. Fortunately, the caret package makes this very easy to do:
model <- train(y ~ ., my_data)
caret supports many types of cross-validation, and you can specify which type of cross-validation and the number of cross-validation folds with the trainControl() function, which you pass to the trControl argument in train():
model <- train(
  y ~ ., my_data,
  method = "lm",
  trControl = trainControl(
    method = "cv", number = 10,
    verboseIter = TRUE
  )
)
It’s important to note that you pass the method for modeling to the main train() function and the method for cross-validation to the trainControl() function.
Instructions
100 XP
Use the train() function and 10-fold cross-validation. (Note that we’ve taken a subset of the full diamonds dataset to speed up this operation, but it’s still named diamonds.)
Print the model to the console and examine the results.
# Fit lm model using 10-fold CV: model
model <- train(
price ~ ., diamonds,
method = "lm",
trControl = trainControl(
method = "cv", number = 10,
verboseIter = TRUE
)
)
+ Fold01: intercept=TRUE
- Fold01: intercept=TRUE
+ Fold02: intercept=TRUE
- Fold02: intercept=TRUE
+ Fold03: intercept=TRUE
- Fold03: intercept=TRUE
+ Fold04: intercept=TRUE
- Fold04: intercept=TRUE
+ Fold05: intercept=TRUE
- Fold05: intercept=TRUE
+ Fold06: intercept=TRUE
- Fold06: intercept=TRUE
+ Fold07: intercept=TRUE
- Fold07: intercept=TRUE
+ Fold08: intercept=TRUE
- Fold08: intercept=TRUE
+ Fold09: intercept=TRUE
- Fold09: intercept=TRUE
+ Fold10: intercept=TRUE
- Fold10: intercept=TRUE
Aggregating results
Fitting final model on full training set
# Print model to console
model
Linear Regression
53940 samples
9 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 48547, 48546, 48546, 48545, 48545, 48545, ...
Resampling results:
RMSE Rsquared MAE
1130.658 0.9197492 740.4646
Tuning parameter 'intercept' was held constant at a value of TRUE
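The train() object also keeps the fold-by-fold results, so you can check how much the error varies across the 10 folds; this is just an inspection aid, not part of the exercise.
# Per-fold resampling results stored on the model
model$resample
# Spread of the fold-level RMSE values
summary(model$resample$RMSE)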
In this course, you will use a wide variety of datasets to explore the full flexibility of the caret package. Here, you will use the famous Boston housing dataset, where the goal is to predict median home values in various Boston suburbs.
You can use exactly the same code as in the previous exercise, but change the dataset used by the model:
model <- train(
  medv ~ ., Boston,
  method = "lm",
  trControl = trainControl(
    method = "cv", number = 10,
    verboseIter = TRUE
  )
)
Next, you can reduce the number of cross-validation folds from 10 to 5 using the number argument to trainControl():
trControl = trainControl(
  method = "cv", number = 5,
  verboseIter = TRUE
)
Instructions
100 XP
Fit an lm() model to the Boston housing dataset, such that medv is the response variable and all other variables are explanatory variables.
Use 5-fold cross-validation rather than 10-fold cross-validation.
Print the model to the console and inspect the results.
library(MASS) # For loading the Boston dataset
# Fit lm model using 5-fold CV: model
model <- train(
medv ~ ., Boston,
method = "lm",
trControl = trainControl(
method = "cv", number = 5,
verboseIter = TRUE
)
)
+ Fold1: intercept=TRUE
- Fold1: intercept=TRUE
+ Fold2: intercept=TRUE
- Fold2: intercept=TRUE
+ Fold3: intercept=TRUE
- Fold3: intercept=TRUE
+ Fold4: intercept=TRUE
- Fold4: intercept=TRUE
+ Fold5: intercept=TRUE
- Fold5: intercept=TRUE
Aggregating results
Fitting final model on full training set
# Print model to console
model
Linear Regression
506 samples
13 predictor
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 405, 405, 404, 405, 405
Resampling results:
RMSE Rsquared MAE
4.875484 0.7316537 3.425015
Tuning parameter 'intercept' was held constant at a value of TRUE
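If you want the underlying lm fit itself (coefficients, standard errors, and so on), train() stores it in the finalModel slot; a quick way to look at it:
# The final lm model that caret refit on all 506 rows
summary(model$finalModel)
coef(model$finalModel)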
You can do more than just one iteration of cross-validation: repeating the entire cross-validation procedure gives you a better estimate of the test-set error. It takes longer, but it gives you many more out-of-sample datasets to look at and a much more precise assessment of how well the model performs.
One of the awesome things about the train() function in caret is how easy it is to run very different models or methods of cross-validation just by tweaking a few simple arguments to the function call. For example, you could repeat your entire cross-validation procedure 5 times for greater confidence in your estimates of the model’s out-of-sample accuracy, e.g.:
trControl = trainControl(
  method = "cv", number = 5,
  repeats = 5, verboseIter = TRUE
)
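One caveat worth knowing (and the reason for the warning in the output below): with method = "cv", caret ignores the repeats argument. For the repeats to actually take effect, trainControl() needs method = "repeatedcv". A minimal sketch of that control object:
# Control object for genuinely repeated CV: 5 folds, repeated 5 times
trControl = trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 5,
  verboseIter = TRUE
)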
Instructions
100 XP
Re-fit the linear regression model to the Boston housing dataset.
Use 5 repeats of 5-fold cross-validation.
Print the model to the console.
# Fit lm model using 5 x 5-fold CV: model
model <- train(
medv ~ ., Boston,
method = "lm",
trControl = trainControl(
method = "cv", number = 5,
repeats = 5, verboseIter = TRUE
)
)
`repeats` has no meaning for this resampling method.
+ Fold1: intercept=TRUE
- Fold1: intercept=TRUE
+ Fold2: intercept=TRUE
- Fold2: intercept=TRUE
+ Fold3: intercept=TRUE
- Fold3: intercept=TRUE
+ Fold4: intercept=TRUE
- Fold4: intercept=TRUE
+ Fold5: intercept=TRUE
- Fold5: intercept=TRUE
Aggregating results
Fitting final model on full training set
# Print model to console
model
Linear Regression
506 samples
13 predictor
No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 404, 406, 406, 403, 405
Resampling results:
RMSE Rsquared MAE
4.858793 0.7247119 3.39237
Tuning parameter 'intercept' was held constant at a value of TRUE
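Because every train() fit stores its resampling results, you can also line this model up against a second caret model with resamples(). The tree model below is purely illustrative: method = "rpart" is just an example method, and for a strict comparison you would fix the folds, e.g. via the index argument of trainControl() or a common seed before each fit.
# Illustrative comparison of two caret models on Boston
model_tree <- train(
  medv ~ ., Boston,
  method = "rpart",
  trControl = trainControl(method = "cv", number = 5)
)
comparison <- resamples(list(lm = model, tree = model_tree))
summary(comparison)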
Finally, the model you fit with the train() function has the exact same predict() interface as the linear regression models you fit earlier in this chapter.
After fitting a model with train(), you can simply call predict() with new data, e.g.:
predict(my_model, new_data)
Instructions
100 XP
# Predict on full Boston dataset
p <- predict(model, Boston)
p
1 2 3 4 5 6 7 8 9
30.0038434 25.0255624 30.5675967 28.6070365 27.9435242 25.2562845 23.0018083 19.5359884 11.5236369
10 11 12 13 14 15 16 17 18
18.9202621 18.9994965 21.5867957 20.9065215 19.5529028 19.2834821 19.2974832 20.5275098 16.9114013
... (506 fitted values in total; remaining output truncated)
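As a quick sanity check, you can run the chapter's RMSE recipe on these predictions; note that this is effectively an in-sample number, because the final model was refit on all 506 Boston rows.
# In-sample RMSE of the cross-validated model's final fit
error <- p - Boston$medv
sqrt(mean(error^2))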