In this lesson, we will introduce the caret package. This package provides a common interface for creating supervised learning models using many different algorithms, as well as tools for data preprocessing and for performing cross-validation.
We will begin by loading the package.
library(caret)
As our first example of using caret, we will create a basic multiple linear regression model. For this example, we will use the NYC Restaurant dataset.
nyc <- read.table("data/nyc.txt", sep="\t", header = TRUE)
summary(nyc)
Price Food Decor Service Wait East
Min. :19.0 Min. :16.0 Min. : 6.00 Min. :14.0 Min. : 0.00 Min. :0.000
1st Qu.:36.0 1st Qu.:19.0 1st Qu.:16.00 1st Qu.:18.0 1st Qu.:16.75 1st Qu.:0.000
Median :43.0 Median :20.5 Median :18.00 Median :20.0 Median :23.00 Median :1.000
Mean :42.7 Mean :20.6 Mean :17.69 Mean :19.4 Mean :22.92 Mean :0.631
3rd Qu.:50.0 3rd Qu.:22.0 3rd Qu.:19.00 3rd Qu.:21.0 3rd Qu.:29.00 3rd Qu.:1.000
Max. :65.0 Max. :25.0 Max. :25.00 Max. :24.0 Max. :46.00 Max. :1.000
Supervised learning models are created in caret using the train function. You can specify the response variable and the predictors with the familiar formula notation (using ~), or by providing separate data frames for the features and the response. The train function also requires a method argument, which determines the type of model being fit.
model_1 <- train(Price ~ ., nyc, method="lm")
# Equivalent fit using the x/y interface instead of a formula:
# model_1a <- train(nyc[,-1], nyc$Price, method = "lm")
summary(model_1)
Call:
lm(formula = .outcome ~ ., data = dat)
Residuals:
Min 1Q Median 3Q Max
-14.3315 -3.9098 -0.2242 3.3561 17.7499
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -25.16816 4.78350 -5.261 4.47e-07 ***
Food 1.55401 0.36844 4.218 4.09e-05 ***
Decor 1.91615 0.21663 8.845 1.49e-15 ***
Service -0.04571 0.39688 -0.115 0.9085
Wait 0.06796 0.05311 1.280 0.2025
East 2.04599 0.94505 2.165 0.0319 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.727 on 162 degrees of freedom
Multiple R-squared: 0.6316, Adjusted R-squared: 0.6202
F-statistic: 55.55 on 5 and 162 DF, p-value: < 2.2e-16
We can use the predict function to generate predictions from a model created with train. We will use predict to generate predictions for our training set, which we will then use to recalculate our training r-squared value.
predictions <- predict(model_1, nyc)
SSE <- sum((nyc$Price - predictions)^2)
SST <- sum((nyc$Price - mean(nyc$Price))^2)
r2 <- 1 - SSE / SST
r2
[1] 0.6316051
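caret also provides helper functions for computing common metrics. A quick sketch using caret's R2 and RMSE helpers (R2 with formula = "traditional" computes 1 - SSE/SST, matching the calculation above):
# R2 with formula="traditional" reproduces the value computed manually above
R2(predictions, nyc$Price, formula = "traditional")
# RMSE of the training predictions
RMSE(predictions, nyc$Price)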
We can also instruct train to produce estimates of out-of-sample performance metrics by performing cross-validation. This is done by passing a trainControl object to the trControl argument of train.
set.seed(1)
tc <- trainControl(method="cv", number=10)
model_2 <- train(Price ~ ., nyc, method="lm", trControl=tc)
summary(model_2)
Call:
lm(formula = .outcome ~ ., data = dat)
Residuals:
Min 1Q Median 3Q Max
-14.3315 -3.9098 -0.2242 3.3561 17.7499
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -25.16816 4.78350 -5.261 4.47e-07 ***
Food 1.55401 0.36844 4.218 4.09e-05 ***
Decor 1.91615 0.21663 8.845 1.49e-15 ***
Service -0.04571 0.39688 -0.115 0.9085
Wait 0.06796 0.05311 1.280 0.2025
East 2.04599 0.94505 2.165 0.0319 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 5.727 on 162 degrees of freedom
Multiple R-squared: 0.6316, Adjusted R-squared: 0.6202
F-statistic: 55.55 on 5 and 162 DF, p-value: < 2.2e-16
The summary does not show the estimates generated by cross-validation. However, we can see these by printing the model object itself.
model_2
Linear Regression
168 samples
5 predictor
No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 152, 151, 151, 150, 152, 151, ...
Resampling results:
RMSE Rsquared MAE
5.707805 0.6267498 4.549324
Tuning parameter 'intercept' was held constant at a value of TRUE
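Note that trainControl supports several other resampling schemes besides plain k-fold cross-validation. For example, repeated k-fold cross-validation averages the metrics over multiple independent fold assignments; a minimal sketch (not used below):
# Ten folds, repeated five times with different fold assignments
tc_rep <- trainControl(method="repeatedcv", number=10, repeats=5)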
We can view the metrics for each of the 10 folds through the model’s resample attribute.
model_2$resample
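Each row of this data frame holds the held-out metrics for one fold, with columns RMSE, Rsquared, MAE, and a Resample label. Averaging the fold-level metrics reproduces the summary values printed above; a quick sketch:
# Average the per-fold metrics; this should match the resampling results above
colMeans(model_2$resample[, c("RMSE", "Rsquared", "MAE")])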
We can also use cross-validation to perform hyperparameter tuning and model selection. Before providing an example of this, we will first illustrate the use of the expand.grid() function, which is useful for creating data frames consisting of combinations of hyperparameter values.
expand.grid(c1 = c(1, 2, 3), c2 = c('A', 'B'))
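Each row of the resulting data frame contains one combination of the supplied values, with the first column varying fastest:
  c1 c2
1  1  A
2  2  A
3  3  A
4  1  B
5  2  B
6  3  B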
We will now perform KNN regression on the nyc dataset, using cross-validation to select an appropriate value for K. The candidate hyperparameter values are supplied to the train() function through the tuneGrid parameter. The train function will automatically select the best set of hyperparameters according to the cross-validation scores. By default, train performs this selection based on RMSE for regression problems, but we can instruct it to use r-squared instead by setting the metric parameter.
set.seed(1)
tc <- trainControl(method="cv", number=10)
param_grid <- expand.grid(k = c(1:40))
model_3 <- train(Price ~ ., nyc, method="knn",
preProcess=c("center", "scale"),
tuneGrid=param_grid, trControl=tc,
metric="Rsquared")
model_3
k-Nearest Neighbors
168 samples
5 predictor
Pre-processing: centered (5), scaled (5)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 152, 151, 151, 150, 152, 151, ...
Resampling results across tuning parameters:
k RMSE Rsquared MAE
1 8.133346 0.3651926 6.559035
2 7.630758 0.4065960 6.110262
3 7.188013 0.4411575 5.747229
4 6.868136 0.4695338 5.508320
5 6.620856 0.4967033 5.293771
6 6.520592 0.5078362 5.206962
7 6.449212 0.5163969 5.119928
8 6.382981 0.5259114 5.021135
9 6.379889 0.5266769 5.014644
10 6.420765 0.5221343 5.035897
11 6.427816 0.5226087 5.032147
12 6.435852 0.5222320 5.031862
13 6.459572 0.5193200 5.048249
14 6.444286 0.5232384 5.017959
15 6.443646 0.5244012 5.013767
16 6.434772 0.5272125 5.002048
17 6.433166 0.5293690 5.010073
18 6.426012 0.5323696 5.017149
19 6.405804 0.5378944 5.002702
20 6.362845 0.5470030 4.958886
21 6.347289 0.5516032 4.948972
22 6.356725 0.5517104 4.966295
23 6.368628 0.5523172 4.973502
24 6.344537 0.5591353 4.958955
25 6.334406 0.5626367 4.949254
26 6.320043 0.5670252 4.941686
27 6.328378 0.5686898 4.944207
28 6.344985 0.5686375 4.951599
29 6.347670 0.5693341 4.954045
30 6.363316 0.5685342 4.952608
31 6.365253 0.5698286 4.956516
32 6.360889 0.5711098 4.959845
33 6.369985 0.5708191 4.971895
34 6.384071 0.5705291 4.981946
35 6.400461 0.5693288 4.995635
36 6.425944 0.5673021 5.017832
37 6.434182 0.5677234 5.031035
38 6.446933 0.5673075 5.044687
39 6.466928 0.5658344 5.061033
40 6.476308 0.5664772 5.073236
Rsquared was used to select the optimal model using the largest value.
The final value used for the model was k = 32.
The hyperparameters resulting in the best model are stored in the bestTune attribute of the model object.
model_3$bestTune
The results attribute of the model object is a data frame containing cross-validation results for each combination of hyperparameters being considered.
model_3$results
We can use which.max to select the row of results corresponding to the model with the highest cross-validation r-squared value.
best_ix_3 = which.max(model_3$results$Rsquared)
model_3$results[best_ix_3, ]
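The row selected this way should agree with the contents of bestTune; a quick consistency check:
# Both approaches identify the same value of k
model_3$results[best_ix_3, "k"] == model_3$bestTune$k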
We can extract individual columns of the results data frame to generate plots of the cross-validation r-squared estimates as a function of K.
plot(1:40, model_3$results$Rsquared,
main='Using Cross-Validation for Hyperparameter Tuning',
xlab='K', ylab='Cross-Validation r-squared')
lines(1:40, model_3$results$Rsquared)
As a convenience, we can generate the same plot by simply passing the model object to the plot() function.
plot(model_3)
The results of the final model selected through cross-validation are stored in the finalModel attribute of the model object. However, the final model itself doesn’t take into account any preprocessing that was applied to the training set. Fortunately, the object returned by train also contains a preProcess attribute that contains our feature scaler.
scaled_data = predict(model_3$preProcess, nyc[1:5, 2:6])
predict(model_3$finalModel, scaled_data)
[1] 43.25000 39.78125 36.37500 39.68750 46.34375
To simplify matters, we can provide the object returned by train to the predict function directly. This will perform the necessary preprocessing steps prior to applying the final model.
predict(model_3, nyc[1:5, 2:6])
[1] 43.25000 39.78125 36.37500 39.68750 46.34375
After using cross-validation to select your final model, it is useful to perform an independent round of cross-validation to estimate out-of-sample performance.
set.seed(2)
tc <- trainControl(method="cv", number=10)
best_params_3 <- model_3$bestTune
model_3_best <- train(Price ~ ., nyc, method="knn",
preProcess=c("center", "scale"),
tuneGrid=best_params_3, trControl=tc,
metric="Rsquared")
model_3_best
k-Nearest Neighbors
168 samples
5 predictor
Pre-processing: centered (5), scaled (5)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 152, 152, 150, 150, 151, 151, ...
Resampling results:
RMSE Rsquared MAE
6.641145 0.5460618 5.214257
Tuning parameter 'k' was held constant at a value of 32
We will now see an example of using caret to perform hyperparameter tuning on an elasticnet model.
set.seed(1)
tc <- trainControl(method="cv", number=10)
param_grid <- expand.grid(alpha=seq(0,1, by=0.2),
lambda=seq(0,1,length=100))
model_4 <- train(Price ~ ., nyc, method="glmnet",
preProcess=c("center", "scale"),
tuneGrid=param_grid, trControl=tc,
metric="Rsquared")
best_ix_4 = which.max(model_4$results$Rsquared)
model_4$results[best_ix_4, ]
We will use plot() to see how the cross-validation estimates for r-squared vary with respect to the hyperparameters.
plot(model_4, pch="")
One quirk of using train with method="glmnet" is that the finalModel attribute contains coefficient estimates for multiple values of lambda (at the optimal value of alpha). We can select out the coefficients for the optimal model as follows:
coef(model_4$finalModel,
model_4$finalModel$lambdaOpt)
6 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) 42.6964286
Food 2.9917039
Decor 5.0963233
Service .
Wait 0.5020165
East 0.9210049
The dot shown for Service indicates a coefficient that has been shrunk exactly to zero by the lasso component of the penalty. We will perform an independent round of cross-validation to assess out-of-sample performance.
set.seed(2)
best_params_4 <- model_4$bestTune
tc <- trainControl(method="cv", number=10)
model_4_best <- train(Price ~ ., nyc, method="glmnet",
preProcess=c("center", "scale"),
tuneGrid=best_params_4, trControl=tc,
metric="Rsquared")
model_4_best
glmnet
168 samples
5 predictor
Pre-processing: centered (5), scaled (5)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 152, 152, 150, 150, 151, 151, ...
Resampling results:
RMSE Rsquared MAE
5.653275 0.6451659 4.51807
Tuning parameter 'alpha' was held constant at a value of 1
Tuning parameter 'lambda' was
held constant at a value of 0.07070707
The train function from caret will automatically take care of encoding qualitative features as dummy variables for us. We will illustrate this using the diamonds dataset.
diamonds <- read.table("data/diamonds.txt", sep="\t", header = TRUE)
diamonds <- diamonds[,c("carat", "cut", "color", "clarity", "price")]
summary(diamonds)
carat cut color clarity price
Min. :0.2000 Fair : 1610 D: 6775 SI1 :13065 Min. : 326
1st Qu.:0.4000 Good : 4906 E: 9797 VS2 :12258 1st Qu.: 950
Median :0.7000 Ideal :21551 F: 9542 SI2 : 9194 Median : 2401
Mean :0.7979 Premium :13791 G:11292 VS1 : 8171 Mean : 3933
3rd Qu.:1.0400 Very Good:12082 H: 8304 VVS2 : 5066 3rd Qu.: 5324
Max. :5.0100 I: 5422 VVS1 : 3655 Max. :18823
J: 2808 (Other): 2531
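Under the hood, the formula interface of train expands each factor into treatment-coded dummy variables, dropping one reference level per factor. caret's dummyVars function exposes the same kind of encoding directly; a minimal sketch:
# fullRank = TRUE drops one level per factor, mirroring the encoding used by
# the formula interface
dv <- dummyVars(price ~ ., data = diamonds, fullRank = TRUE)
head(predict(dv, newdata = diamonds))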
We will now use cross-validation to select elasticnet hyperparameters for this dataset.
set.seed(1)
tc <- trainControl(method="cv", number=10)
param_grid <- expand.grid(alpha=seq(0.2, 1, by=0.2),
lambda=exp(seq(-3, 2,length=100)))
model_5 <- train(price ~ ., diamonds, method="glmnet",
preProcess = c("range"),
tuneGrid=param_grid, trControl=tc,
metric="Rsquared")
best_ix <- which.max(model_5$results$Rsquared)
model_5$results[best_ix, ]
We will now plot the cross-validation estimates for out-of-sample r-squared.
plot(model_5, pch="", xTrans=log)
The coefficients in our final model are displayed below.
coef(model_5$finalModel,
model_5$finalModel$lambdaOpt)
19 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -5322.0557
carat 42668.8359
cutGood 622.5196
cutIdeal 969.4578
cutPremium 839.3194
cutVery Good 819.9871
colorE -194.2664
colorF -285.6780
colorG -487.0998
colorH -959.7280
colorI -1418.2183
colorJ -2300.7144
clarityIF 5163.9155
claritySI1 3329.1471
claritySI2 2384.4157
clarityVS1 4287.2468
clarityVS2 3972.9075
clarityVVS1 4820.6038
clarityVVS2 4718.7002
In the code chunk below, we will retrain the model using the best combination of hyperparameters found, in order to estimate its out-of-sample performance.
set.seed(2)
best_params_5 <- model_5$bestTune
tc <- trainControl(method="cv", number=10)
model_5_best <- train(price ~ ., diamonds, method="glmnet",
preProcess = c("range"),
tuneGrid=best_params_5, trControl=tc,
metric="Rsquared")
model_5_best
glmnet
53940 samples
4 predictor
Pre-processing: re-scaling to [0, 1] (18)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 48547, 48546, 48546, 48547, 48545, 48546, ...
Resampling results:
RMSE Rsquared MAE
1157.426 0.9158503 801.4574
Tuning parameter 'alpha' was held constant at a value of 0.8
Tuning parameter 'lambda' was
held constant at a value of 1.140325
We will now see how to use train with cross-validation to select elasticnet hyperparameters for logistic regression.
wbc <- read.table('data/breast_cancer.csv', header=TRUE, sep=',')
wbc$id <- NULL
summary(wbc)
diagnosis radius_mean texture_mean perimeter_mean area_mean smoothness_mean
B:357 Min. : 6.981 Min. : 9.71 Min. : 43.79 Min. : 143.5 Min. :0.05263
M:212 1st Qu.:11.700 1st Qu.:16.17 1st Qu.: 75.17 1st Qu.: 420.3 1st Qu.:0.08637
Median :13.370 Median :18.84 Median : 86.24 Median : 551.1 Median :0.09587
Mean :14.127 Mean :19.29 Mean : 91.97 Mean : 654.9 Mean :0.09636
3rd Qu.:15.780 3rd Qu.:21.80 3rd Qu.:104.10 3rd Qu.: 782.7 3rd Qu.:0.10530
Max. :28.110 Max. :39.28 Max. :188.50 Max. :2501.0 Max. :0.16340
compactness_mean concavity_mean concave.points_mean symmetry_mean
Min. :0.01938 Min. :0.00000 Min. :0.00000 Min. :0.1060
1st Qu.:0.06492 1st Qu.:0.02956 1st Qu.:0.02031 1st Qu.:0.1619
Median :0.09263 Median :0.06154 Median :0.03350 Median :0.1792
Mean :0.10434 Mean :0.08880 Mean :0.04892 Mean :0.1812
3rd Qu.:0.13040 3rd Qu.:0.13070 3rd Qu.:0.07400 3rd Qu.:0.1957
Max. :0.34540 Max. :0.42680 Max. :0.20120 Max. :0.3040
fractal_dimension_mean radius_se texture_se perimeter_se area_se
Min. :0.04996 Min. :0.1115 Min. :0.3602 Min. : 0.757 Min. : 6.802
1st Qu.:0.05770 1st Qu.:0.2324 1st Qu.:0.8339 1st Qu.: 1.606 1st Qu.: 17.850
Median :0.06154 Median :0.3242 Median :1.1080 Median : 2.287 Median : 24.530
Mean :0.06280 Mean :0.4052 Mean :1.2169 Mean : 2.866 Mean : 40.337
3rd Qu.:0.06612 3rd Qu.:0.4789 3rd Qu.:1.4740 3rd Qu.: 3.357 3rd Qu.: 45.190
Max. :0.09744 Max. :2.8730 Max. :4.8850 Max. :21.980 Max. :542.200
smoothness_se compactness_se concavity_se concave.points_se symmetry_se
Min. :0.001713 Min. :0.002252 Min. :0.00000 Min. :0.000000 Min. :0.007882
1st Qu.:0.005169 1st Qu.:0.013080 1st Qu.:0.01509 1st Qu.:0.007638 1st Qu.:0.015160
Median :0.006380 Median :0.020450 Median :0.02589 Median :0.010930 Median :0.018730
Mean :0.007041 Mean :0.025478 Mean :0.03189 Mean :0.011796 Mean :0.020542
3rd Qu.:0.008146 3rd Qu.:0.032450 3rd Qu.:0.04205 3rd Qu.:0.014710 3rd Qu.:0.023480
Max. :0.031130 Max. :0.135400 Max. :0.39600 Max. :0.052790 Max. :0.078950
fractal_dimension_se radius_worst texture_worst perimeter_worst area_worst
Min. :0.0008948 Min. : 7.93 Min. :12.02 Min. : 50.41 Min. : 185.2
1st Qu.:0.0022480 1st Qu.:13.01 1st Qu.:21.08 1st Qu.: 84.11 1st Qu.: 515.3
Median :0.0031870 Median :14.97 Median :25.41 Median : 97.66 Median : 686.5
Mean :0.0037949 Mean :16.27 Mean :25.68 Mean :107.26 Mean : 880.6
3rd Qu.:0.0045580 3rd Qu.:18.79 3rd Qu.:29.72 3rd Qu.:125.40 3rd Qu.:1084.0
Max. :0.0298400 Max. :36.04 Max. :49.54 Max. :251.20 Max. :4254.0
smoothness_worst compactness_worst concavity_worst concave.points_worst symmetry_worst
Min. :0.07117 Min. :0.02729 Min. :0.0000 Min. :0.00000 Min. :0.1565
1st Qu.:0.11660 1st Qu.:0.14720 1st Qu.:0.1145 1st Qu.:0.06493 1st Qu.:0.2504
Median :0.13130 Median :0.21190 Median :0.2267 Median :0.09993 Median :0.2822
Mean :0.13237 Mean :0.25427 Mean :0.2722 Mean :0.11461 Mean :0.2901
3rd Qu.:0.14600 3rd Qu.:0.33910 3rd Qu.:0.3829 3rd Qu.:0.16140 3rd Qu.:0.3179
Max. :0.22260 Max. :1.05800 Max. :1.2520 Max. :0.29100 Max. :0.6638
fractal_dimension_worst
Min. :0.05504
1st Qu.:0.07146
Median :0.08004
Mean :0.08395
3rd Qu.:0.09208
Max. :0.20750
We will now perform cross-validation and display information relating to the best model.
set.seed(1)
tc <- trainControl(method="cv", number=10)
param_grid <- expand.grid(alpha=seq(0, 1, by=0.5),
lambda=exp(seq(-6, -2,length=20)))
model_6 <- train(diagnosis ~ ., wbc, method="glmnet", family="binomial",
preProcess=c("center", "scale"),
tuneGrid=param_grid, trControl=tc,
metric="Accuracy")
best_ix_6 <- which.max(model_6$results$Accuracy)
model_6$results[best_ix_6, ]
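Beyond overall accuracy, caret's confusionMatrix function provides a more detailed picture of classification performance. A sketch using training-set predictions (resubstitution, so the results will be optimistic relative to the cross-validation estimates):
# Confusion matrix for the selected model on the training data
pred_6 <- predict(model_6, wbc)
confusionMatrix(pred_6, factor(wbc$diagnosis))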
We will now plot the cross-validation estimates for out-of-sample accuracy.
plot(model_6, pch="", xTrans=log)
The coefficients for the best model found are shown below.
coef(model_6$finalModel,
model_6$finalModel$lambdaOpt)
31 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -0.315639716
radius_mean 0.285506745
texture_mean 0.371914497
perimeter_mean 0.235269424
area_mean 0.310513202
smoothness_mean .
compactness_mean .
concavity_mean 0.462308660
concave.points_mean 0.734380315
symmetry_mean .
fractal_dimension_mean -0.219396320
radius_se 1.131353911
texture_se -0.074450743
perimeter_se 0.391405864
area_se 0.653019321
smoothness_se 0.104321239
compactness_se -0.535356211
concavity_se .
concave.points_se 0.001014363
symmetry_se -0.142199830
fractal_dimension_se -0.255251353
radius_worst 0.988305173
texture_worst 0.992847852
perimeter_worst 0.763287310
area_worst 0.858604086
smoothness_worst 0.671143708
compactness_worst .
concavity_worst 0.655779729
concave.points_worst 0.941234045
symmetry_worst 0.603554472
fractal_dimension_worst .
We will now train a new model with a slightly higher degree of regularization, but with a similar cross-validation score.
set.seed(2)
tc <- trainControl(method="cv", number=10)
param_grid <- expand.grid(alpha=0.5,
lambda=exp(-4))
model_6_alt <- train(diagnosis ~ ., wbc, method="glmnet", family="binomial",
preProcess=c("center", "scale"),
tuneGrid=param_grid, trControl=tc,
metric="Accuracy")
model_6_alt
glmnet
569 samples
30 predictor
2 classes: 'B', 'M'
Pre-processing: centered (30), scaled (30)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 512, 513, 511, 512, 511, 512, ...
Resampling results:
Accuracy Kappa
0.9736475 0.9426502
Tuning parameter 'alpha' was held constant at a value of 0.5
Tuning parameter 'lambda' was
held constant at a value of 0.01831564
The coefficients of this alternate model are shown below.
coef(model_6_alt$finalModel,
model_6_alt$finalModel$lambdaOpt)
31 x 1 sparse Matrix of class "dgCMatrix"
1
(Intercept) -0.5741890
radius_mean 0.3141484
texture_mean 0.2490084
perimeter_mean 0.2857658
area_mean 0.2378809
smoothness_mean .
compactness_mean .
concavity_mean 0.1064340
concave.points_mean 0.4673911
symmetry_mean .
fractal_dimension_mean .
radius_se 0.4429637
texture_se .
perimeter_se 0.1416516
area_se 0.1355298
smoothness_se .
compactness_se .
concavity_se .
concave.points_se .
symmetry_se .
fractal_dimension_se -0.1075690
radius_worst 0.6555300
texture_worst 0.5875112
perimeter_worst 0.5590332
area_worst 0.4669748
smoothness_worst 0.4270019
compactness_worst .
concavity_worst 0.2660941
concave.points_worst 0.6538142
symmetry_worst 0.3069766
fractal_dimension_worst .
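Comparing the two coefficient vectors, the additional regularization zeroes out several more predictors: 24 of the 30 predictors are non-zero in the earlier model, versus 18 here. We can count the non-zero coefficients directly; a short sketch:
# Count non-zero coefficients, excluding the intercept, for each model
b6 <- as.matrix(coef(model_6$finalModel, model_6$finalModel$lambdaOpt))
b6_alt <- as.matrix(coef(model_6_alt$finalModel, model_6_alt$finalModel$lambdaOpt))
sum(b6[-1, ] != 0)
sum(b6_alt[-1, ] != 0)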
In this final example, we will use train to perform hyperparameter tuning for KNN classification.
set.seed(1)
tc <- trainControl(method="cv", number=10)
param_grid <- expand.grid(k=1:60)
model_7 <- train(diagnosis ~ ., wbc, method="knn",
preProcess=c("center", "scale"),
tuneGrid=param_grid, trControl=tc,
metric="Accuracy")
best_ix_7 <- which.max(model_7$results$Accuracy)
model_7$results[best_ix_7, ]
We will plot the cross-validation estimates as a function of K.
plot(model_7, pch="")
We will now use cross-validation to estimate the best model’s out-of-sample performance.
set.seed(2)
tc <- trainControl(method="cv", number=10)
best_params_7 <- model_7$bestTune
model_7_best <- train(diagnosis ~ ., wbc, method="knn",
preProcess=c("center", "scale"),
tuneGrid=best_params_7, trControl=tc,
metric="Accuracy")
model_7_best
k-Nearest Neighbors
569 samples
30 predictor
2 classes: 'B', 'M'
Pre-processing: centered (30), scaled (30)
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 512, 513, 511, 512, 511, 512, ...
Resampling results:
Accuracy Kappa
0.9719558 0.9392185
Tuning parameter 'k' was held constant at a value of 10
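Finally, note that predict applied to a train classifier returns class labels by default. Class probabilities can be requested instead with the type argument; a sketch, assuming the underlying method can produce probabilities (as KNN can):
# Class probabilities for the first few observations
head(predict(model_7_best, wbc, type = "prob"))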