Qualitative Approach
I consider a number of libraries to conduct this investigation:
library(xgboost)
library(caret)
library(ISLR)
library(plyr)
library(ggplot2)
I focus mainly on the the toy data named Hitters from the ISLR package.
ds <- Hitters
ds <- na.omit(ds)
head(ds)
AtBat Hits HmRun Runs RBI Walks Years CAtBat CHits CHmRun CRuns CRBI CWalks League
-Alan Ashby 315 81 7 24 38 39 14 3449 835 69 321 414 375 N
-Alvin Davis 479 130 18 66 72 76 3 1624 457 63 224 266 263 A
-Andre Dawson 496 141 20 65 78 37 11 5628 1575 225 828 838 354 N
-Andres Galarraga 321 87 10 39 42 30 2 396 101 12 48 46 33 N
-Alfredo Griffin 594 169 4 74 51 35 11 4408 1133 19 501 336 194 A
-Al Newman 185 37 1 23 8 21 2 214 42 1 30 9 24 N
Division PutOuts Assists Errors Salary NewLeague
-Alan Ashby W 632 43 10 475.0 N
-Alvin Davis W 880 82 14 480.0 A
-Andre Dawson E 200 11 3 500.0 N
-Andres Galarraga E 805 40 4 91.5 N
-Alfredo Griffin W 282 421 25 750.0 A
-Al Newman E 76 127 7 70.0 A
Similar to James et al (2013), I try to explain (for now) the salary of the player using two variables, Hits and Years. The response variable is given in thousands of $s. Since salary denotes the price level, we know that this variable is less likely to exhibit a bell-shaped distribution:
hist(ds$Salary)

On the other hand, if we consider the natural logarithm of the salary, we have
ds$Salary <- log(ds$Salary)
hist(ds$Salary)

Perspective on the Data
Before we start torturing the data, let’s take a descriptive approach to identify the relationship between the feature space and the response variable. Since we are dealing with a small dimension, this does not constitute a cumbersome task. To motivate this, let’s plot the the number of hits against the number of years. At the same time, let’s highlight the observations that correspond to the players with the highest and lowest salary quartiles.
plot(Hits~Years,data = ds, pch = 20, cex = 0.75)
points(Hits~Years,data = ds[ds$Salary > quantile(ds$Salary,0.75),],col = 3)
points(Hits~Years,data = ds[ds$Salary < quantile(ds$Salary,0.25),],col = 2)

The green highlighted dots indicate the salaries that are ranked on the top quartile in the sample. On the other hand, the red ones denote those in the lower quartile. Anything else refers to those in between. Intuitively, compensation should consider the performance, approximated by the number of hits, and the number of years as a proxy for experience. We observe that the more (less) compensated Baseball players are located on the top-right (bottom-left) on the plot above. While this is roughly the case, we also observe a few violations of this rationale.
We know that real-life facts do not exhibit linearity per se. If the data were fully linear, then a simple a linear regression model should capture the heterogeneity in the data very well, such that an increase in each dimension should imply an increase in the compensation. However, as I demonstrate later on in this vignette, real-life relationships do not necessarily confirm to such linearity.
Torturing Data and Machine Learning
The previous section provides a simple demonstration of the data and how one can qualitatively assess the classification of the data using simple descriptive statistics. In this section, I refer to more data-driven approach to uncover how moving from linearity to non-linearity improves the predictive power of the model. In particular, I deploy a linear regression with elastic net penalty, support vector machines (SVM) with linear and radial kernel, and extreme gradient boosting (XGB) for decision trees.
Training and Testing
Our investigation deploys a number of machine learning models. We will use the caret package mainly to achieve so. The performance of each model is established using training and testing samples.
To test the predictive power of each model, we split the sample into 80-20 training and testing sub-samples.
set.seed(13)
index <- 1:nrow(ds)
index_train <-sample(index,round(0.8*nrow(ds)) )
index_test <- index[-index_train]
To double check that both sample are mutually exclusive, we refer to the intersect command to confirm no overlapping observations:
intersect(index_test,index_train)
integer(0)
Hence, we have two sub-samples:
ds_train <- ds[index_train,]
ds_test <- ds[index_test,]
For the feature space, we use two variables - the number of hits and years of experience - whereas the response variable is the log-salary
x_var <- c("Hits","Years")
y_var <- "Salary"
The following commands run a ML algorithm by simply defining the name of the model, the training set, the testing set, the name of the regressors, and the name of the response variables. For each defined model, we run a loop to with 10-folds cross validation.
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 1,allowParallel = T)
model_list <- c("glmnet","svmLinear","svmRadial","xgbTree")
predict_y <- numeric()
model_formula <- formula(paste(y_var, " ~ " ,paste(x_var,collapse = " + ")))
for (model_i in model_list) {
set.seed(13)
if(model_i == "xgbTree") {
train_model <- train(model_formula, data = ds_train[,c(y_var,x_var)], method = model_i,
preProcess = c("center", "scale"))
}
else{
train_model <- train(model_formula, data = ds_train[,c(y_var,x_var)], method = model_i,
trControl=trctrl,
preProcess = c("center", "scale"),
tuneLength = 10)
}
y_hat_i <- predict(train_model,ds_test[,x_var])
predict_y <- cbind(predict_y,y_hat_i)
}
Note that the xgbTree does not require cross-validation, as the model is validated using the boosting approach inherit in the algorithm itself.
To test the performance of each, we regress the actual values versus each fitted ones as follows:
lm.list <- lapply(1:length(model_list), function(i) lm(predict_y[,i] ~ predict_y[,"Actual"]))
R_sq <- sapply(lm.list, function(x) summary(x)$adj)
R_sq
[1] 0.5610861 0.5357248 0.5999348 0.6442294
We observe that the non-linear models outperform the linear ones. Comparing between the non-linear SVM and xgbTree, we note that the latter returns a higher \(R^2\). However, to achieve greater confidence in such conclusion, we need to repeat this experiment using a bootstrap approach.
Bootstrap Performance
To gain more confidence in the above evidence, we repeat the same investigation 100 times. For each trial, we split the data randomly for training and testing using a distinct seed and repeat the same for cross-validation for tuning.
Let’s stack all of the above in a single function:
my_ml_function <- function(x_var,y_var,ds_train,ds_test,n) {
index <- 1:nrow(ds)
index_train <-sample(index,round(0.8*nrow(ds)) )
index_test <- index[-index_train]
ds_train <- ds[index_train,]
ds_test <- ds[index_test,]
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 1)
predict_y <- numeric()
model_formula <- formula(paste(y_var, " ~ " ,paste(x_var,collapse = " + ")))
for (model_i in model_list) {
set.seed(n)
if(model_i == "xgbTree") {
train_model <- train(model_formula, data = ds_train[,c(y_var,x_var)], method = model_i,
preProcess = c("center", "scale"))
}
else{
train_model <- train(model_formula, data = ds_train[,c(y_var,x_var)], method = model_i,
trControl=trctrl,
preProcess = c("center", "scale"),
tuneLength = 10)
}
y_hat_i <- predict(train_model,ds_test[,x_var])
predict_y <- cbind(predict_y,y_hat_i)
}
colnames(predict_y) <- model_list
predict_y <- data.frame(predict_y,Actual = ds_test[,y_var])
lm.list <- lapply(1:length(model_list), function(i) lm(predict_y[,i] ~ predict_y[,"Actual"]))
R_sq <- sapply(lm.list, function(x) summary(x)$adj)
names(R_sq) <- model_list
return(R_sq)
}
Given the above function, we can run 100 repetitions to compute the distribution of the \(R^2\) resulting from each model:
library(parallel)
run_f <- function(n) my_ml_function(x_var,y_var,ds_train,ds_test,n)
R_list <- mclapply(1:100,run_f)
To determine the best fit among all models, we stack the results in a single data frame
ds_R <- data.frame(t(sapply(R_list,rbind)))
names(ds_R) <- model_list
summary(ds_R)
glmnet svmLinear svmRadial xgbTree
Min. :0.1336 Min. :0.1212 Min. :0.4029 Min. :0.3361
1st Qu.:0.3974 1st Qu.:0.3859 1st Qu.:0.5866 1st Qu.:0.6071
Median :0.4859 Median :0.4808 Median :0.6548 Median :0.6704
Mean :0.4792 Mean :0.4730 Mean :0.6535 Mean :0.6567
3rd Qu.:0.5729 3rd Qu.:0.5641 3rd Qu.:0.7248 3rd Qu.:0.7313
Max. :0.7388 Max. :0.7377 Max. :0.8317 Max. :0.8274
We observe that the SVM model with the non-linear kernel and the xgbTree perform the best in terms of average/median \(R^2\). These imply low bias in the performance of the non-linear models. At the same time, we also note that there is a lower variance associated with both non-linear models compared to the linear ones. This is can reflected by the standard deviation of the \(R^2\) from each model:
apply(ds_R,2,sd)
glmnet svmLinear svmRadial xgbTree
0.12333198 0.12651178 0.09600006 0.10619243
To visualize these insights, we refer to the violin plot from ggplot2:
ds.plot <- lapply(1:ncol(ds_R), function(i) data.frame(R = ds_R[,i], Model = names(ds_R)[i] ) )
ds.plot <- ldply(ds.plot,data.frame)
p <- ggplot(ds.plot,aes(x = factor(Model), y = R))
p <- p + geom_violin()
p <- p + geom_jitter(aes(colour = Model),size = 1,height = 0, width = 0.1)
p <- p + geom_abline(intercept = median(apply(ds_R,2,median)), slope = 0, linetype ="dashed")
p <- p + labs(x = "Model", y = "R Squared")
print(p)

From the above plot, we observe that the SVM with Radial kernel performs the best, exhibiting the lowest negative skewness.
LS0tDQp0aXRsZTogIkEgSG9yc2UgUmFjZSBvZiBDYXJldHMiDQojb3V0cHV0OiBybWFya2Rvd246OmdpdGh1Yl9kb2N1bWVudA0Kb3V0cHV0Og0KICBodG1sX25vdGVib29rOiBkZWZhdWx0DQogIHBkZl9kb2N1bWVudDogZGVmYXVsdA0KYXV0aG9yOiBNYWplZWQgU2ltYWFuDQpkYXRlOiAnYHIgZm9ybWF0KFN5cy5EYXRlKCksICIlWS0lQi0lZCIpYCcNCmZpZ193aWR0aDogNTANCi0tLQ0KDQoNCiMgT3ZlcnZpZXcNCkluIHRoaXMgdmlnbmV0dGUsIEkgY29tcGFyZSB0aGUgc3RhdGlzdGljYWwgcGVyZm9ybWFuY2Ugb2YgZm91ciBkaWZmZXJlbnQgbWFjaGluZSBsZWFybmluZyAoTUwpIGFsZ29yaXRobXMgdXNpbmcgdGhlIGBjYXJldGAgbGlicmFyeS4gVGhlIHBlcmZvcm1hbmNlIGNvbnNpZGVycyB0aGUgcHJlZGljdGl2ZSBwb3dlciBvZiBlYWNoLiBUbyBkZW1vbnN0cmF0ZSB0aGlzLCBJIHJlZmVyIHRvIHRoZSBgSGl0dGVyc2AgZGF0YSwgd2hpY2ggaXMgY29tbW9ubHkgdXNlZCBpbiBNTCBpbGx1c3RyYXRpb25zIChzZWUgZS5nLiwgSmFtZXMsIEcuLCBXaXR0ZW4sIEQuLCBIYXN0aWUsIFQuLCBcJiBUaWJzaGlyYW5pLCBSLiAoMjAxMykuIEFuIGludHJvZHVjdGlvbiB0byBzdGF0aXN0aWNhbCBsZWFybmluZyAoVm9sLiAxMTIsIHAuIDE4KS4gTmV3IFlvcms6IHNwcmluZ2VyKS4NCg0KDQojIFF1YWxpdGF0aXZlIEFwcHJvYWNoDQpJIGNvbnNpZGVyIGEgbnVtYmVyIG9mIGxpYnJhcmllcyB0byBjb25kdWN0IHRoaXMgaW52ZXN0aWdhdGlvbjoNCmBgYHtyLHdhcm5pbmc9RixtZXNzYWdlPUZ9DQpsaWJyYXJ5KHhnYm9vc3QpDQpsaWJyYXJ5KGNhcmV0KQ0KbGlicmFyeShJU0xSKQ0KbGlicmFyeShwbHlyKQ0KbGlicmFyeShnZ3Bsb3QyKQ0KYGBgDQpJIGZvY3VzIG1haW5seSBvbiB0aGUgdGhlIHRveSBkYXRhIG5hbWVkIGBIaXR0ZXJzYCBmcm9tIHRoZSBgSVNMUmAgcGFja2FnZS4NCmBgYHtyfQ0KZHMgPC0gSGl0dGVycw0KZHMgPC0gbmEub21pdChkcykNCmhlYWQoZHMpDQpgYGANCg0KU2ltaWxhciB0byBKYW1lcyBldCBhbCAoMjAxMyksIEkgdHJ5IHRvIGV4cGxhaW4gKGZvciBub3cpIHRoZSBzYWxhcnkgb2YgdGhlIHBsYXllciB1c2luZyB0d28gdmFyaWFibGVzLCBIaXRzIGFuZCBZZWFycy4gVGhlIHJlc3BvbnNlIHZhcmlhYmxlIGlzIGdpdmVuIGluIHRob3VzYW5kcyBvZiBcJHMuIFNpbmNlIHNhbGFyeSBkZW5vdGVzIHRoZSBwcmljZSBsZXZlbCwgd2Uga25vdyB0aGF0IHRoaXMgdmFyaWFibGUgaXMgbGVzcyBsaWtlbHkgdG8gZXhoaWJpdCBhIGJlbGwtc2hhcGVkIGRpc3RyaWJ1dGlvbjoNCmBgYHtyLGZpZy5hbGlnbiA9ICJjZW50ZXIifQ0KaGlzdChkcyRTYWxhcnkpDQpgYGANCk9uIHRoZSBvdGhlciBoYW5kLCBpZiB3ZSBjb25zaWRlciB0aGUgbmF0dXJhbCBsb2dhcml0aG0gb2YgdGhlIHNhbGFyeSwgd2UgaGF2ZQ0KYGBge3J9DQpkcyRTYWxhcnkgPC0gbG9nKGRzJFNhbGFyeSkNCmhpc3QoZHMkU2FsYXJ5KQ0KYGBgDQoNCiMjIFBlcnNwZWN0aXZlIG9uIHRoZSBEYXRhDQpCZWZvcmUgd2Ugc3RhcnQgdG9ydHVyaW5nIHRoZSBkYXRhLCBsZXQncyB0YWtlIGEgZGVzY3JpcHRpdmUgYXBwcm9hY2ggdG8gaWRlbnRpZnkgdGhlIHJlbGF0aW9uc2hpcCBiZXR3ZWVuIHRoZSBmZWF0dXJlIHNwYWNlIGFuZCB0aGUgcmVzcG9uc2UgdmFyaWFibGUuIFNpbmNlIHdlIGFyZSBkZWFsaW5nIHdpdGggYSBzbWFsbCBkaW1lbnNpb24sIHRoaXMgZG9lcyBub3QgY29uc3RpdHV0ZSBhIGN1bWJlcnNvbWUgdGFzay4gVG8gbW90aXZhdGUgdGhpcywgbGV0J3MgcGxvdCB0aGUgdGhlIG51bWJlciBvZiBoaXRzIGFnYWluc3QgdGhlIG51bWJlciBvZiB5ZWFycy4gQXQgdGhlIHNhbWUgdGltZSwgbGV0J3MgaGlnaGxpZ2h0IHRoZSBvYnNlcnZhdGlvbnMgdGhhdCBjb3JyZXNwb25kIHRvIHRoZSBwbGF5ZXJzIHdpdGggdGhlIGhpZ2hlc3QgYW5kIGxvd2VzdCBzYWxhcnkgcXVhcnRpbGVzLg0KYGBge3IsZmlnLmFsaWduID0gImNlbnRlciJ9DQpwbG90KEhpdHN+WWVhcnMsZGF0YSA9IGRzLCBwY2ggPSAyMCwgY2V4ID0gMC43NSkNCnBvaW50cyhIaXRzflllYXJzLGRhdGEgPSBkc1tkcyRTYWxhcnkgPiBxdWFudGlsZShkcyRTYWxhcnksMC43NSksXSxjb2wgPSAzKQ0KcG9pbnRzKEhpdHN+WWVhcnMsZGF0YSA9IGRzW2RzJFNhbGFyeSA8IHF1YW50aWxlKGRzJFNhbGFyeSwwLjI1KSxdLGNvbCA9IDIpDQoNCmBgYA0KVGhlIGdyZWVuIGhpZ2hsaWdodGVkIGRvdHMgaW5kaWNhdGUgdGhlIHNhbGFyaWVzIHRoYXQgYXJlIHJhbmtlZCBvbiB0aGUgdG9wIHF1YXJ0aWxlIGluIHRoZSBzYW1wbGUuIE9uIHRoZSBvdGhlciBoYW5kLCB0aGUgcmVkIG9uZXMgZGVub3RlIHRob3NlIGluIHRoZSBsb3dlciBxdWFydGlsZS4gQW55dGhpbmcgZWxzZSByZWZlcnMgdG8gdGhvc2UgaW4gYmV0d2Vlbi4gSW50dWl0aXZlbHksIGNvbXBlbnNhdGlvbiBzaG91bGQgY29uc2lkZXIgdGhlIHBlcmZvcm1hbmNlLCBhcHByb3hpbWF0ZWQgYnkgdGhlIG51bWJlciBvZiBoaXRzLCBhbmQgdGhlIG51bWJlciBvZiB5ZWFycyBhcyBhIHByb3h5IGZvciBleHBlcmllbmNlLiBXZSBvYnNlcnZlIHRoYXQgdGhlIG1vcmUgKGxlc3MpIGNvbXBlbnNhdGVkIEJhc2ViYWxsIHBsYXllcnMgYXJlIGxvY2F0ZWQgb24gdGhlIHRvcC1yaWdodCAoYm90dG9tLWxlZnQpIG9uIHRoZSBwbG90IGFib3ZlLiBXaGlsZSB0aGlzIGlzIHJvdWdobHkgdGhlIGNhc2UsIHdlIGFsc28gb2JzZXJ2ZSBhIGZldyB2aW9sYXRpb25zIG9mIHRoaXMgcmF0aW9uYWxlLiANCg0KV2Uga25vdyB0aGF0IHJlYWwtbGlmZSBmYWN0cyBkbyBub3QgZXhoaWJpdCBsaW5lYXJpdHkgcGVyIHNlLiBJZiB0aGUgZGF0YSB3ZXJlIGZ1bGx5IGxpbmVhciwgdGhlbiBhIHNpbXBsZSBhIGxpbmVhciByZWdyZXNzaW9uIG1vZGVsIHNob3VsZCBjYXB0dXJlIHRoZSBoZXRlcm9nZW5laXR5IGluIHRoZSBkYXRhIHZlcnkgd2VsbCwgc3VjaCB0aGF0IGFuIGluY3JlYXNlIGluIGVhY2ggZGltZW5zaW9uIHNob3VsZCBpbXBseSBhbiBpbmNyZWFzZSBpbiB0aGUgY29tcGVuc2F0aW9uLiBIb3dldmVyLCBhcyBJIGRlbW9uc3RyYXRlIGxhdGVyIG9uIGluIHRoaXMgdmlnbmV0dGUsIHJlYWwtbGlmZSByZWxhdGlvbnNoaXBzIGRvIG5vdCBuZWNlc3NhcmlseSBjb25maXJtIHRvIHN1Y2ggbGluZWFyaXR5LiANCg0KDQojIFRvcnR1cmluZyBEYXRhIGFuZCBNYWNoaW5lIExlYXJuaW5nDQpUaGUgcHJldmlvdXMgc2VjdGlvbiBwcm92aWRlcyBhIHNpbXBsZSBkZW1vbnN0cmF0aW9uIG9mIHRoZSBkYXRhIGFuZCBob3cgb25lIGNhbiBxdWFsaXRhdGl2ZWx5IGFzc2VzcyB0aGUgY2xhc3NpZmljYXRpb24gb2YgdGhlIGRhdGEgdXNpbmcgc2ltcGxlIGRlc2NyaXB0aXZlIHN0YXRpc3RpY3MuIEluIHRoaXMgc2VjdGlvbiwgSSByZWZlciB0byBtb3JlIGRhdGEtZHJpdmVuIGFwcHJvYWNoIHRvIHVuY292ZXIgaG93IG1vdmluZyBmcm9tIGxpbmVhcml0eSB0byBub24tbGluZWFyaXR5IGltcHJvdmVzIHRoZSBwcmVkaWN0aXZlIHBvd2VyIG9mIHRoZSBtb2RlbC4gSW4gcGFydGljdWxhciwgSSBkZXBsb3kgYSBsaW5lYXIgcmVncmVzc2lvbiB3aXRoIGVsYXN0aWMgbmV0IHBlbmFsdHksIHN1cHBvcnQgdmVjdG9yIG1hY2hpbmVzIChTVk0pIHdpdGggbGluZWFyIGFuZCByYWRpYWwga2VybmVsLCBhbmQgZXh0cmVtZSBncmFkaWVudCBib29zdGluZyAoWEdCKSBmb3IgZGVjaXNpb24gdHJlZXMuDQoNCiMjIFRyYWluaW5nIGFuZCBUZXN0aW5nDQpPdXIgaW52ZXN0aWdhdGlvbiBkZXBsb3lzIGEgbnVtYmVyIG9mIG1hY2hpbmUgbGVhcm5pbmcgbW9kZWxzLiBXZSB3aWxsIHVzZSB0aGUgYGNhcmV0YCBwYWNrYWdlIG1haW5seSB0byBhY2hpZXZlIHNvLiBUaGUgcGVyZm9ybWFuY2Ugb2YgZWFjaCBtb2RlbCBpcyBlc3RhYmxpc2hlZCB1c2luZyB0cmFpbmluZyBhbmQgdGVzdGluZyBzYW1wbGVzLiANCg0KVG8gdGVzdCB0aGUgcHJlZGljdGl2ZSBwb3dlciBvZiBlYWNoIG1vZGVsLCB3ZSBzcGxpdCB0aGUgc2FtcGxlIGludG8gODAtMjAgdHJhaW5pbmcgYW5kIHRlc3Rpbmcgc3ViLXNhbXBsZXMuDQpgYGB7cn0NCnNldC5zZWVkKDEzKQ0KaW5kZXggPC0gMTpucm93KGRzKQ0KaW5kZXhfdHJhaW4gPC1zYW1wbGUoaW5kZXgscm91bmQoMC44Km5yb3coZHMpKSApDQppbmRleF90ZXN0IDwtIGluZGV4Wy1pbmRleF90cmFpbl0NCmBgYA0KVG8gZG91YmxlIGNoZWNrIHRoYXQgYm90aCBzYW1wbGUgYXJlIG11dHVhbGx5IGV4Y2x1c2l2ZSwgd2UgcmVmZXIgdG8gdGhlIGBpbnRlcnNlY3RgIGNvbW1hbmQgdG8gY29uZmlybSBubyBvdmVybGFwcGluZyBvYnNlcnZhdGlvbnM6DQpgYGB7cn0NCmludGVyc2VjdChpbmRleF90ZXN0LGluZGV4X3RyYWluKQ0KYGBgDQpIZW5jZSwgd2UgaGF2ZSB0d28gc3ViLXNhbXBsZXM6DQpgYGB7cn0NCmRzX3RyYWluIDwtIGRzW2luZGV4X3RyYWluLF0NCmRzX3Rlc3QgPC0gZHNbaW5kZXhfdGVzdCxdDQpgYGANCkZvciB0aGUgZmVhdHVyZSBzcGFjZSwgd2UgdXNlIHR3byB2YXJpYWJsZXMgLSB0aGUgbnVtYmVyIG9mIGhpdHMgYW5kIHllYXJzIG9mIGV4cGVyaWVuY2UgLSB3aGVyZWFzIHRoZSByZXNwb25zZSB2YXJpYWJsZSBpcyB0aGUgbG9nLXNhbGFyeQ0KYGBge3J9DQp4X3ZhciA8LSBjKCJIaXRzIiwiWWVhcnMiKQ0KeV92YXIgPC0gIlNhbGFyeSINCmBgYA0KDQpUaGUgZm9sbG93aW5nIGNvbW1hbmRzIHJ1biBhIE1MIGFsZ29yaXRobSBieSBzaW1wbHkgZGVmaW5pbmcgdGhlIG5hbWUgb2YgdGhlIG1vZGVsLCB0aGUgdHJhaW5pbmcgc2V0LCB0aGUgdGVzdGluZyBzZXQsIHRoZSBuYW1lIG9mIHRoZSByZWdyZXNzb3JzLCBhbmQgdGhlIG5hbWUgb2YgdGhlIHJlc3BvbnNlIHZhcmlhYmxlcy4gRm9yIGVhY2ggZGVmaW5lZCBtb2RlbCwgd2UgcnVuIGEgbG9vcCB0byB3aXRoIDEwLWZvbGRzIGNyb3NzIHZhbGlkYXRpb24uIA0KYGBge3IsbWVzc2FnZT1GLHdhcm5pbmc9Rn0NCnRyY3RybCA8LSB0cmFpbkNvbnRyb2wobWV0aG9kID0gInJlcGVhdGVkY3YiLCBudW1iZXIgPSAxMCwgcmVwZWF0cyA9IDEsYWxsb3dQYXJhbGxlbCA9IFQpDQptb2RlbF9saXN0IDwtIGMoImdsbW5ldCIsInN2bUxpbmVhciIsInN2bVJhZGlhbCIsInhnYlRyZWUiKQ0KcHJlZGljdF95IDwtIG51bWVyaWMoKQ0KbW9kZWxfZm9ybXVsYSA8LSBmb3JtdWxhKHBhc3RlKHlfdmFyLCAiIH4gIiAscGFzdGUoeF92YXIsY29sbGFwc2UgPSAiICsgIikpKQ0KDQpmb3IgKG1vZGVsX2kgaW4gbW9kZWxfbGlzdCkgew0KICBzZXQuc2VlZCgxMykNCiAgDQogIGlmKG1vZGVsX2kgPT0gInhnYlRyZWUiKSB7DQogICAgdHJhaW5fbW9kZWwgPC0gdHJhaW4obW9kZWxfZm9ybXVsYSwgZGF0YSA9IGRzX3RyYWluWyxjKHlfdmFyLHhfdmFyKV0sIG1ldGhvZCA9IG1vZGVsX2ksDQogICAgICAgICAgICAgICAgIHByZVByb2Nlc3MgPSBjKCJjZW50ZXIiLCAic2NhbGUiKSkNCiAgICB9DQogIGVsc2V7DQogIHRyYWluX21vZGVsIDwtIHRyYWluKG1vZGVsX2Zvcm11bGEsIGRhdGEgPSBkc190cmFpblssYyh5X3Zhcix4X3ZhcildLCBtZXRob2QgPSBtb2RlbF9pLA0KICAgICAgICAgICAgICAgICB0ckNvbnRyb2w9dHJjdHJsLA0KICAgICAgICAgICAgICAgICBwcmVQcm9jZXNzID0gYygiY2VudGVyIiwgInNjYWxlIiksDQogICAgICAgICAgICAgICAgIHR1bmVMZW5ndGggPSAxMCkNCiAgfQ0KICB5X2hhdF9pIDwtIHByZWRpY3QodHJhaW5fbW9kZWwsZHNfdGVzdFsseF92YXJdKQ0KICBwcmVkaWN0X3kgPC0gY2JpbmQocHJlZGljdF95LHlfaGF0X2kpDQp9DQoNCmNvbG5hbWVzKHByZWRpY3RfeSkgPC0gbW9kZWxfbGlzdA0KcHJlZGljdF95IDwtIGRhdGEuZnJhbWUocHJlZGljdF95LEFjdHVhbCA9IGRzX3Rlc3RbLHlfdmFyXSkNCmBgYA0KTm90ZSB0aGF0IHRoZSBgeGdiVHJlZWAgZG9lcyBub3QgcmVxdWlyZSBjcm9zcy12YWxpZGF0aW9uLCBhcyB0aGUgbW9kZWwgaXMgdmFsaWRhdGVkIHVzaW5nIHRoZSBib29zdGluZyBhcHByb2FjaCBpbmhlcml0IGluIHRoZSBhbGdvcml0aG0gaXRzZWxmLiANCg0KVG8gdGVzdCB0aGUgcGVyZm9ybWFuY2Ugb2YgZWFjaCwgd2UgcmVncmVzcyB0aGUgYWN0dWFsIHZhbHVlcyB2ZXJzdXMgZWFjaCBmaXR0ZWQgb25lcyBhcyAgZm9sbG93czogDQpgYGB7cn0NCmxtLmxpc3QgPC0gbGFwcGx5KDE6bGVuZ3RoKG1vZGVsX2xpc3QpLCBmdW5jdGlvbihpKSBsbShwcmVkaWN0X3lbLGldIH4gcHJlZGljdF95WywiQWN0dWFsIl0pKQ0KUl9zcSA8LSBzYXBwbHkobG0ubGlzdCwgZnVuY3Rpb24oeCkgc3VtbWFyeSh4KSRhZGopDQpSX3NxDQpgYGANCldlIG9ic2VydmUgdGhhdCB0aGUgbm9uLWxpbmVhciBtb2RlbHMgb3V0cGVyZm9ybSB0aGUgbGluZWFyIG9uZXMuIENvbXBhcmluZyBiZXR3ZWVuIHRoZSBub24tbGluZWFyIFNWTSBhbmQgeGdiVHJlZSwgd2Ugbm90ZSB0aGF0IHRoZSBsYXR0ZXIgcmV0dXJucyBhIGhpZ2hlciAkUl4yJC4gSG93ZXZlciwgdG8gYWNoaWV2ZSBncmVhdGVyIGNvbmZpZGVuY2UgaW4gc3VjaCBjb25jbHVzaW9uLCB3ZSBuZWVkIHRvIHJlcGVhdCB0aGlzIGV4cGVyaW1lbnQgdXNpbmcgYSBib290c3RyYXAgYXBwcm9hY2guIA0KDQojIyBCb290c3RyYXAgUGVyZm9ybWFuY2UNClRvIGdhaW4gbW9yZSBjb25maWRlbmNlIGluIHRoZSBhYm92ZSBldmlkZW5jZSwgd2UgcmVwZWF0IHRoZSBzYW1lIGludmVzdGlnYXRpb24gMTAwIHRpbWVzLiBGb3IgZWFjaCB0cmlhbCwgd2Ugc3BsaXQgdGhlIGRhdGEgcmFuZG9tbHkgZm9yIHRyYWluaW5nIGFuZCB0ZXN0aW5nIHVzaW5nIGEgZGlzdGluY3Qgc2VlZCBhbmQgcmVwZWF0ICB0aGUgc2FtZSBmb3IgY3Jvc3MtdmFsaWRhdGlvbiBmb3IgdHVuaW5nLiAgDQoNCkxldCdzIHN0YWNrIGFsbCBvZiB0aGUgYWJvdmUgaW4gYSBzaW5nbGUgZnVuY3Rpb246DQpgYGB7cn0NCm15X21sX2Z1bmN0aW9uIDwtIGZ1bmN0aW9uKHhfdmFyLHlfdmFyLGRzX3RyYWluLGRzX3Rlc3Qsbikgew0KICBpbmRleCA8LSAxOm5yb3coZHMpDQogIGluZGV4X3RyYWluIDwtc2FtcGxlKGluZGV4LHJvdW5kKDAuOCpucm93KGRzKSkgKQ0KICBpbmRleF90ZXN0IDwtIGluZGV4Wy1pbmRleF90cmFpbl0NCiAgZHNfdHJhaW4gPC0gZHNbaW5kZXhfdHJhaW4sXQ0KICBkc190ZXN0IDwtIGRzW2luZGV4X3Rlc3QsXQ0KICB0cmN0cmwgPC0gdHJhaW5Db250cm9sKG1ldGhvZCA9ICJyZXBlYXRlZGN2IiwgbnVtYmVyID0gMTAsIHJlcGVhdHMgPSAxKQ0KICBwcmVkaWN0X3kgPC0gbnVtZXJpYygpDQogIG1vZGVsX2Zvcm11bGEgPC0gZm9ybXVsYShwYXN0ZSh5X3ZhciwgIiB+ICIgLHBhc3RlKHhfdmFyLGNvbGxhcHNlID0gIiArICIpKSkNCiAgDQogIGZvciAobW9kZWxfaSBpbiBtb2RlbF9saXN0KSB7DQogICAgc2V0LnNlZWQobikNCiAgICBpZihtb2RlbF9pID09ICJ4Z2JUcmVlIikgew0KICAgIHRyYWluX21vZGVsIDwtIHRyYWluKG1vZGVsX2Zvcm11bGEsIGRhdGEgPSBkc190cmFpblssYyh5X3Zhcix4X3ZhcildLCBtZXRob2QgPSBtb2RlbF9pLA0KICAgICAgICAgICAgICAgICBwcmVQcm9jZXNzID0gYygiY2VudGVyIiwgInNjYWxlIikpDQogICAgfQ0KICBlbHNlew0KICB0cmFpbl9tb2RlbCA8LSB0cmFpbihtb2RlbF9mb3JtdWxhLCBkYXRhID0gZHNfdHJhaW5bLGMoeV92YXIseF92YXIpXSwgbWV0aG9kID0gbW9kZWxfaSwNCiAgICAgICAgICAgICAgICAgdHJDb250cm9sPXRyY3RybCwNCiAgICAgICAgICAgICAgICAgcHJlUHJvY2VzcyA9IGMoImNlbnRlciIsICJzY2FsZSIpLA0KICAgICAgICAgICAgICAgICB0dW5lTGVuZ3RoID0gMTApDQogIA0KICAgIH0gIA0KICAgIHlfaGF0X2kgPC0gcHJlZGljdCh0cmFpbl9tb2RlbCxkc190ZXN0Wyx4X3Zhcl0pDQogICAgcHJlZGljdF95IDwtIGNiaW5kKHByZWRpY3RfeSx5X2hhdF9pKQ0KICB9DQogIA0KICBjb2xuYW1lcyhwcmVkaWN0X3kpIDwtIG1vZGVsX2xpc3QNCiAgcHJlZGljdF95IDwtIGRhdGEuZnJhbWUocHJlZGljdF95LEFjdHVhbCA9IGRzX3Rlc3RbLHlfdmFyXSkNCiAgbG0ubGlzdCA8LSBsYXBwbHkoMTpsZW5ndGgobW9kZWxfbGlzdCksIGZ1bmN0aW9uKGkpIGxtKHByZWRpY3RfeVssaV0gfiBwcmVkaWN0X3lbLCJBY3R1YWwiXSkpDQogIFJfc3EgPC0gc2FwcGx5KGxtLmxpc3QsIGZ1bmN0aW9uKHgpIHN1bW1hcnkoeCkkYWRqKQ0KICBuYW1lcyhSX3NxKSA8LSBtb2RlbF9saXN0DQogIHJldHVybihSX3NxKQ0KfQ0KDQpgYGANCg0KDQpHaXZlbiB0aGUgYWJvdmUgZnVuY3Rpb24sIHdlIGNhbiBydW4gMTAwIHJlcGV0aXRpb25zIHRvIGNvbXB1dGUgdGhlIGRpc3RyaWJ1dGlvbiBvZiB0aGUgJFJeMiQgcmVzdWx0aW5nIGZyb20gZWFjaCBtb2RlbDogDQpgYGB7cix3YXJuaW5nPUYsbWVzc2FnZT1GfQ0KbGlicmFyeShwYXJhbGxlbCkNCnJ1bl9mIDwtIGZ1bmN0aW9uKG4pIG15X21sX2Z1bmN0aW9uKHhfdmFyLHlfdmFyLGRzX3RyYWluLGRzX3Rlc3QsbikNClJfbGlzdCA8LSBtY2xhcHBseSgxOjEwMCxydW5fZikNCmBgYA0KDQoNClRvIGRldGVybWluZSB0aGUgYmVzdCBmaXQgYW1vbmcgYWxsIG1vZGVscywgd2Ugc3RhY2sgdGhlIHJlc3VsdHMgaW4gYSBzaW5nbGUgZGF0YSBmcmFtZQ0KYGBge3J9DQpkc19SIDwtIGRhdGEuZnJhbWUodChzYXBwbHkoUl9saXN0LHJiaW5kKSkpDQpuYW1lcyhkc19SKSA8LSBtb2RlbF9saXN0DQpzdW1tYXJ5KGRzX1IpDQpgYGANCg0KV2Ugb2JzZXJ2ZSB0aGF0IHRoZSBTVk0gbW9kZWwgd2l0aCB0aGUgbm9uLWxpbmVhciBrZXJuZWwgYW5kIHRoZSB4Z2JUcmVlICBwZXJmb3JtIHRoZSBiZXN0IGluIHRlcm1zIG9mIGF2ZXJhZ2UvbWVkaWFuICRSXjIkLiBUaGVzZSBpbXBseSBsb3cgYmlhcyBpbiB0aGUgcGVyZm9ybWFuY2Ugb2YgdGhlIG5vbi1saW5lYXIgbW9kZWxzLiBBdCB0aGUgc2FtZSB0aW1lLCB3ZSBhbHNvIG5vdGUgdGhhdCB0aGVyZSBpcyBhIGxvd2VyIHZhcmlhbmNlIGFzc29jaWF0ZWQgd2l0aCBib3RoIG5vbi1saW5lYXIgbW9kZWxzIGNvbXBhcmVkIHRvIHRoZSBsaW5lYXIgb25lcy4NClRoaXMgaXMgY2FuIHJlZmxlY3RlZCBieSB0aGUgc3RhbmRhcmQgZGV2aWF0aW9uIG9mIHRoZSAkUl4yJCBmcm9tIGVhY2ggbW9kZWw6DQpgYGB7cn0NCmFwcGx5KGRzX1IsMixzZCkNCmBgYA0KVG8gdmlzdWFsaXplIHRoZXNlIGluc2lnaHRzLCB3ZSByZWZlciB0byB0aGUgdmlvbGluIHBsb3QgZnJvbSBgZ2dwbG90MmA6DQpgYGB7cixtZXNzYWdlPUYsd2FybmluZz1GLGZpZy5hbGlnbj0iY2VudGVyIn0NCmRzLnBsb3QgPC0gbGFwcGx5KDE6bmNvbChkc19SKSwgZnVuY3Rpb24oaSkgZGF0YS5mcmFtZShSID0gZHNfUlssaV0sIE1vZGVsID0gbmFtZXMoZHNfUilbaV0gKSApDQpkcy5wbG90IDwtIGxkcGx5KGRzLnBsb3QsZGF0YS5mcmFtZSkgIA0KICANCiAgDQpwIDwtIGdncGxvdChkcy5wbG90LGFlcyh4ID0gZmFjdG9yKE1vZGVsKSwgeSA9IFIpKSANCnAgPC0gcCArICBnZW9tX3Zpb2xpbigpIA0KcCA8LSBwICsgZ2VvbV9qaXR0ZXIoYWVzKGNvbG91ciA9IE1vZGVsKSxzaXplID0gMSxoZWlnaHQgPSAwLCB3aWR0aCA9IDAuMSkgDQpwIDwtIHAgKyBnZW9tX2FibGluZShpbnRlcmNlcHQgPSBtZWRpYW4oYXBwbHkoZHNfUiwyLG1lZGlhbikpLCBzbG9wZSA9IDAsIGxpbmV0eXBlID0iZGFzaGVkIikNCnAgPC0gcCArICBsYWJzKHggPSAiTW9kZWwiLCB5ID0gIlIgU3F1YXJlZCIpDQpwcmludChwKQ0KYGBgDQpGcm9tIHRoZSBhYm92ZSBwbG90LCB3ZSBvYnNlcnZlIHRoYXQgdGhlIFNWTSB3aXRoIFJhZGlhbCBrZXJuZWwgcGVyZm9ybXMgdGhlIGJlc3QsIGV4aGliaXRpbmcgdGhlIGxvd2VzdCBuZWdhdGl2ZSBza2V3bmVzcy4gDQoNCiMgQ29uY2x1c2lvbg0KV2UgcmFuIGEgaG9yc2UgcmFjZSBhbW9uZyA0IGRpZmZlcmVudCBhbGdvcml0aG1zIHVzaW5nIHRoZSBgY2FyZXRgIHBhY2thZ2UuIFdoaWxlIHdlIHVzZWQgYSBzaW1wbGUgdG95IGRhdGEsIHdlIG9ic2VydmVkIHNpZ25pZmljYW50IGltcHJvdmVtZW50IGluIHRoZSBwcmVkaWN0aW9uIHBvd2VyIG9mIHRoZSBub24tbGluZWFyIG1vZGVscyBvdmVyIHRoZSBsaW5lYXIgb25lcy4gQ29uc2lzdGVudCB3aXRoIHRoZSBzZW50aW1lbnQgYW1vbmcgTUwgdXNlcnMsIHRoZSB4Z2JUcmVlIHNlZW1zIGEgdmVyeSBzdHJvbmcgY2FuZGlkYXRlIGluIHRlcm1zIG9mIGltcHJvdmVkIGJpYXMgYW5kIHZhcmlhbmNlLiBOb25ldGhlbGVzcywgdGhlIG1ham9yIGNvbmNlcm4gd2l0aCBzdWNoIG1vZGVsIGlzIGNvbXB1dGF0aW9uYWwgZWZmaWNpZW5jeSwgZXNwZWNpYWxseSB3aGVuIGEgcmV0YWlsIGludmVzdG9yIG5lZWRzIHRvIHRyYWluIHN1Y2ggbW9kZWwgb24gYSByZWN1cnJpbmcgYmFzaXMuIEl0IGlzIHVuY2xlYXIgd2hldGhlciBzdWNoIG1vZGVsIHByb3ZpZGVzIHRoZSB1cHBlciBoYW5kIGNvbXBhcmVkIHRvIFNWTSB3aXRoIHJhZGlhbCBrZXJuZWwsIHdoaWNoIHJlc2VtYmxlcyBhbiBhcnRpZmljaWFsIG5ldXJhbCBuZXR3b3JrIHdpdGggYSBzaW5nbGUgaGlkZGVuIGxheWVyLiANCg0KDQo=