This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
library(caret)
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).
library(mlbench)
data(Sonar)
str(Sonar[, 1:10])
'data.frame': 208 obs. of 10 variables:
$ V1 : num 0.02 0.0453 0.0262 0.01 0.0762 0.0286 0.0317 0.0519 0.0223 0.0164 ...
$ V2 : num 0.0371 0.0523 0.0582 0.0171 0.0666 0.0453 0.0956 0.0548 0.0375 0.0173 ...
$ V3 : num 0.0428 0.0843 0.1099 0.0623 0.0481 ...
$ V4 : num 0.0207 0.0689 0.1083 0.0205 0.0394 ...
$ V5 : num 0.0954 0.1183 0.0974 0.0205 0.059 ...
$ V6 : num 0.0986 0.2583 0.228 0.0368 0.0649 ...
$ V7 : num 0.154 0.216 0.243 0.11 0.121 ...
$ V8 : num 0.16 0.348 0.377 0.128 0.247 ...
$ V9 : num 0.3109 0.3337 0.5598 0.0598 0.3564 ...
$ V10: num 0.211 0.287 0.619 0.126 0.446 ...
library(caret)
set.seed(998)
inTraining <- createDataPartition(Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTraining,]
testing <- Sonar[-inTraining,]
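createDataPartition() does a stratified split, so the class frequencies in the training set should closely track the full data; a quick sanity check:
prop.table(table(Sonar$Class))
prop.table(table(training$Class))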
fitControl <- trainControl(## 10-fold CV
                           method = "repeatedcv",
                           number = 10,
                           ## repeated ten times
                           ## (10 folds x 10 repeats = 100 resampling iterations)
                           repeats = 10)
set.seed(825)
gbmFit1 <- train(Class ~ ., data = training,
                 method = "gbm",
                 trControl = fitControl,
                 ## This last option is actually one
                 ## for gbm() that passes through
                 verbose = FALSE)
Loading required package: gbm
Loading required package: survival
Attaching package: 'survival'
The following object is masked from 'package:caret':
    cluster
Loading required package: splines
Loading required package: parallel
Loaded gbm 2.1.1
Loading required package: plyr
gbmFit1
Stochastic Gradient Boosting
157 samples
60 predictor
2 classes: 'M', 'R'
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 142, 142, 140, 142, 142, 141, ...
Resampling results across tuning parameters:
interaction.depth n.trees Accuracy Kappa
1 50 0.7609191 0.5163703
1 100 0.7934216 0.5817734
1 150 0.7957647 0.5859190
2 50 0.7863652 0.5676063
2 100 0.8201029 0.6347544
2 150 0.8206863 0.6355834
3 50 0.7894118 0.5727717
3 100 0.8124314 0.6187470
3 150 0.8207647 0.6358243
Tuning parameter 'shrinkage' was held constant at a value of 0.1
Tuning parameter 'n.minobsinnode' was held constant at a value of 10
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were n.trees = 150, interaction.depth
= 3, shrinkage = 0.1 and n.minobsinnode = 10.
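The winning parameter combination does not have to be read off the printed summary; train() stores it in the bestTune component of the fit object:
gbmFit1$bestTune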
## 3 depths x 30 values of n.trees (x 1 shrinkage x 1 n.minobsinnode) = 90 candidate models
gbmGrid <- expand.grid(interaction.depth = c(1, 5, 9),
                       n.trees = (1:30)*50,
                       shrinkage = 0.1,
                       n.minobsinnode = 20)
nrow(gbmGrid)
[1] 90
set.seed(825)
gbmFit2 <- train(Class ~ ., data = training,
                 method = "gbm",
                 trControl = fitControl,
                 verbose = FALSE,
                 ## Now specify the exact models
                 ## to evaluate:
                 tuneGrid = gbmGrid)
gbmFit2
Stochastic Gradient Boosting
157 samples
60 predictor
2 classes: 'M', 'R'
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 142, 142, 140, 142, 142, 141, ...
Resampling results across tuning parameters:
interaction.depth n.trees Accuracy Kappa
1 50 0.7507892 0.4959040
1 100 0.7780686 0.5508905
1 150 0.7922451 0.5795467
1 200 0.8026716 0.6006903
1 250 0.8045049 0.6043980
1 300 0.8020417 0.5992079
1 350 0.8013431 0.5978871
1 400 0.8015417 0.5979258
1 450 0.8020882 0.5989973
1 500 0.7990000 0.5930135
1 550 0.7996667 0.5939355
1 600 0.7963750 0.5873621
1 650 0.7983480 0.5917318
1 700 0.7963064 0.5870914
1 750 0.7951348 0.5851049
1 800 0.7987279 0.5926014
1 850 0.7975931 0.5901765
1 900 0.7956765 0.5865259
1 950 0.7975098 0.5902253
1 1000 0.8013431 0.5977354
1 1050 0.8014314 0.5980265
1 1100 0.8020147 0.5990251
1 1150 0.8045882 0.6040754
1 1200 0.7989265 0.5927789
1 1250 0.7994265 0.5936918
1 1300 0.7962647 0.5870218
1 1350 0.7963431 0.5870629
1 1400 0.7982966 0.5915434
1 1450 0.7976716 0.5902344
1 1500 0.7970049 0.5892220
5 50 0.7800441 0.5546265
5 100 0.8040613 0.6028649
5 150 0.8167598 0.6282865
5 200 0.8179412 0.6301288
5 250 0.8218529 0.6381953
5 300 0.8236078 0.6416049
5 350 0.8237745 0.6418016
5 400 0.8248946 0.6442506
5 450 0.8235294 0.6420833
5 500 0.8224314 0.6397173
5 550 0.8223946 0.6397534
5 600 0.8274828 0.6497662
5 650 0.8263064 0.6475455
5 700 0.8256397 0.6461268
5 750 0.8300613 0.6548908
5 800 0.8268897 0.6484123
5 850 0.8255564 0.6457125
5 900 0.8255564 0.6457101
5 950 0.8268480 0.6482644
5 1000 0.8275098 0.6497362
5 1050 0.8248897 0.6444087
5 1100 0.8255980 0.6460147
5 1150 0.8268064 0.6483050
5 1200 0.8273946 0.6494996
5 1250 0.8255564 0.6458752
5 1300 0.8249314 0.6445397
5 1350 0.8241814 0.6429755
5 1400 0.8262230 0.6471804
5 1450 0.8287647 0.6523344
5 1500 0.8275564 0.6499982
9 50 0.7846838 0.5639250
9 100 0.8130833 0.6209555
9 150 0.8143431 0.6233901
9 200 0.8224363 0.6397375
9 250 0.8166029 0.6280232
9 300 0.8256863 0.6461941
9 350 0.8187598 0.6320002
9 400 0.8211446 0.6372166
9 450 0.8210662 0.6371404
9 500 0.8223529 0.6389759
9 550 0.8231029 0.6407824
9 600 0.8249412 0.6446328
9 650 0.8262328 0.6470735
9 700 0.8275613 0.6500633
9 750 0.8268529 0.6483583
9 800 0.8262696 0.6474343
9 850 0.8275662 0.6496544
9 900 0.8274363 0.6494831
9 950 0.8249779 0.6445986
9 1000 0.8244730 0.6435258
9 1050 0.8225147 0.6396661
9 1100 0.8225147 0.6396644
9 1150 0.8218848 0.6386919
9 1200 0.8218848 0.6386434
9 1250 0.8193382 0.6333047
9 1300 0.8212230 0.6372190
9 1350 0.8200049 0.6346247
9 1400 0.8218848 0.6383332
9 1450 0.8218799 0.6384420
9 1500 0.8218799 0.6383696
Tuning parameter 'shrinkage' was held constant at a value of 0.1
Tuning parameter 'n.minobsinnode' was held constant at a value of 20
Accuracy was used to select the optimal model using the largest value.
The final values used for the model were n.trees = 750, interaction.depth
= 5, shrinkage = 0.1 and n.minobsinnode = 20.
trellis.par.set(caretTheme())
Note: The default device has been opened to honour attempt to modify trellis settings
plot(gbmFit2)
trellis.par.set(caretTheme())
plot(gbmFit2, metric = "Kappa")
trellis.par.set(caretTheme())
plot(gbmFit2, metric = "Kappa", plotType = "level",
scales = list(x = list(rot = 90)))
ggplot(gbmFit2)
Ignoring unknown aesthetics: shape
head(twoClassSummary)
1 function (data, lev = NULL, model = NULL)
2 {
3 lvls <- levels(data$obs)
4 if (length(lvls) > 2)
5 stop(paste("Your outcome has", length(lvls), "levels. The twoClassSummary() function isn't appropriate."))
6 requireNamespaceQuietStop("ModelMetrics")
fitControl <- trainControl(method = "repeatedcv",
number = 10,
repeats = 10,
## Estimate class probabilities
classProbs = TRUE,
## Evaluate performance using
## the following function
summaryFunction = twoClassSummary)
set.seed(825)
gbmFit3 <- train(Class ~ ., data = training,
                 method = "gbm",
                 trControl = fitControl,
                 verbose = FALSE,
                 tuneGrid = gbmGrid,
                 ## Specify which metric to optimize
                 metric = "ROC")
gbmFit3
Stochastic Gradient Boosting
157 samples
60 predictor
2 classes: 'M', 'R'
No pre-processing
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 142, 142, 140, 142, 142, 141, ...
Resampling results across tuning parameters:
interaction.depth n.trees ROC Sens Spec
1 50 0.8584201 0.7968056 0.6976786
1 100 0.8717584 0.8177778 0.7312500
1 150 0.8706944 0.8284722 0.7492857
1 200 0.8739757 0.8393056 0.7596429
1 250 0.8754266 0.8408333 0.7617857
1 300 0.8755580 0.8402778 0.7566071
1 350 0.8735268 0.8379167 0.7578571
1 400 0.8710169 0.8430556 0.7526786
1 450 0.8710317 0.8429167 0.7539286
1 500 0.8710764 0.8409722 0.7498214
1 550 0.8696974 0.8408333 0.7508929
1 600 0.8706275 0.8404167 0.7442857
1 650 0.8722644 0.8356944 0.7537500
1 700 0.8734350 0.8380556 0.7467857
1 750 0.8710293 0.8320833 0.7508929
1 800 0.8716890 0.8368056 0.7537500
1 850 0.8715724 0.8356944 0.7523214
1 900 0.8726835 0.8320833 0.7523214
1 950 0.8725546 0.8345833 0.7535714
1 1000 0.8708557 0.8430556 0.7523214
1 1050 0.8705903 0.8393056 0.7564286
1 1100 0.8696801 0.8416667 0.7550000
1 1150 0.8716047 0.8463889 0.7550000
1 1200 0.8703423 0.8369444 0.7535714
1 1250 0.8710020 0.8368056 0.7546429
1 1300 0.8723289 0.8356944 0.7491071
1 1350 0.8714980 0.8394444 0.7451786
1 1400 0.8723834 0.8395833 0.7496429
1 1450 0.8727009 0.8395833 0.7483929
1 1500 0.8717932 0.8350000 0.7523214
5 50 0.8809673 0.8312500 0.7207143
5 100 0.8935962 0.8425000 0.7580357
5 150 0.8920561 0.8618056 0.7637500
5 200 0.8935888 0.8625000 0.7644643
5 250 0.8920139 0.8698611 0.7648214
5 300 0.8909449 0.8758333 0.7621429
5 350 0.8940823 0.8737500 0.7644643
5 400 0.8952034 0.8748611 0.7660714
5 450 0.8953894 0.8712500 0.7675000
5 500 0.8946131 0.8713889 0.7648214
5 550 0.8942113 0.8725000 0.7637500
5 600 0.8961706 0.8772222 0.7691071
5 650 0.8956920 0.8762500 0.7678571
5 700 0.8950719 0.8762500 0.7664286
5 750 0.8947371 0.8825000 0.7689286
5 800 0.8951290 0.8787500 0.7662500
5 850 0.8940749 0.8788889 0.7633929
5 900 0.8945213 0.8761111 0.7662500
5 950 0.8938343 0.8784722 0.7662500
5 1000 0.8933433 0.8809722 0.7650000
5 1050 0.8958110 0.8772222 0.7635714
5 1100 0.8956870 0.8775000 0.7650000
5 1150 0.8958656 0.8797222 0.7650000
5 1200 0.8944048 0.8808333 0.7650000
5 1250 0.8952108 0.8773611 0.7650000
5 1300 0.8943750 0.8773611 0.7633929
5 1350 0.8940997 0.8773611 0.7619643
5 1400 0.8929291 0.8800000 0.7635714
5 1450 0.8928795 0.8823611 0.7664286
5 1500 0.8928596 0.8800000 0.7664286
9 50 0.8746875 0.8234722 0.7391071
9 100 0.8913269 0.8561111 0.7625000
9 150 0.8917907 0.8590278 0.7621429
9 200 0.8914881 0.8647222 0.7726786
9 250 0.8932341 0.8609722 0.7648214
9 300 0.8910268 0.8765278 0.7662500
9 350 0.8897892 0.8694444 0.7594643
9 400 0.8911533 0.8666667 0.7678571
9 450 0.8916543 0.8652778 0.7691071
9 500 0.8907168 0.8734722 0.7621429
9 550 0.8913046 0.8712500 0.7664286
9 600 0.8925000 0.8736111 0.7678571
9 650 0.8920908 0.8737500 0.7701786
9 700 0.8923388 0.8736111 0.7732143
9 750 0.8932912 0.8747222 0.7703571
9 800 0.8928993 0.8713889 0.7730357
9 850 0.8925719 0.8773611 0.7687500
9 900 0.8914410 0.8772222 0.7687500
9 950 0.8914608 0.8737500 0.7675000
9 1000 0.8901538 0.8740278 0.7662500
9 1050 0.8902927 0.8712500 0.7650000
9 1100 0.8898065 0.8715278 0.7650000
9 1150 0.8895089 0.8690278 0.7666071
9 1200 0.8907118 0.8679167 0.7676786
9 1250 0.8903844 0.8666667 0.7635714
9 1300 0.8897148 0.8691667 0.7650000
9 1350 0.8898934 0.8691667 0.7621429
9 1400 0.8893899 0.8715278 0.7633929
9 1450 0.8889583 0.8690278 0.7662500
9 1500 0.8906944 0.8702778 0.7648214
Tuning parameter 'shrinkage' was held constant at a value of 0.1
Tuning parameter 'n.minobsinnode' was held constant at a value of 20
ROC was used to select the optimal model using the largest value.
The final values used for the model were n.trees = 600, interaction.depth
= 5, shrinkage = 0.1 and n.minobsinnode = 20.
whichTwoPct <- tolerance(gbmFit3$results, metric = "ROC",
tol = 2, maximize = TRUE)
cat("best model within 2 pct of best:\n")
best model within 2 pct of best:
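tolerance() returns a row index into the results table, so the simpler model it picked can be inspected directly (the column range just trims the printout):
gbmFit3$results[whichTwoPct, 1:6]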
predict(gbmFit3, newdata = head(testing))
[1] R R R R M M
Levels: M R
predict(gbmFit3, newdata = head(testing), type = "prob")
set.seed(825)
svmFit <- train(Class ~ ., data = training,
                method = "svmRadial",
                trControl = fitControl,
                preProc = c("center", "scale"),
                tuneLength = 8,
                metric = "ROC")
Loading required package: kernlab
Attaching package: 'kernlab'
The following object is masked from 'package:ggplot2':
    alpha
svmFit
Support Vector Machines with Radial Basis Function Kernel
157 samples
60 predictor
2 classes: 'M', 'R'
Pre-processing: centered (60), scaled (60)
Resampling: Cross-Validated (10 fold, repeated 10 times)
Summary of sample sizes: 142, 142, 140, 142, 142, 141, ...
Resampling results across tuning parameters:
C ROC Sens Spec
0.25 0.8672371 0.7413889 0.7466071
0.50 0.9030134 0.8326389 0.7794643
1.00 0.9221577 0.8700000 0.7748214
2.00 0.9318601 0.8902778 0.7714286
4.00 0.9373735 0.8881944 0.7998214
8.00 0.9442411 0.9061111 0.8125000
16.00 0.9445164 0.9173611 0.8126786
32.00 0.9445164 0.9123611 0.8166071
Tuning parameter 'sigma' was held constant at a value of 0.0115025
ROC was used to select the optimal model using the largest value.
The final values used for the model were sigma = 0.0115025 and C = 16.
set.seed(825)
rdaFit <- train(Class ~ ., data = training,
                method = "rda",
                trControl = fitControl,
                tuneLength = 4,
                metric = "ROC")
1 package is needed for this model and is not installed. (klaR). Would you like to try to install it now?
1: yes
2: no
## Collect the resampling distributions of the three models before plotting
resamps <- resamples(list(GBM = gbmFit3, SVM = svmFit, RDA = rdaFit))
trellis.par.set(caretTheme())
bwplot(resamps, layout = c(3, 1))
trellis.par.set(caretTheme())
dotplot(resamps, metric = "ROC")
trellis.par.set(caretTheme())
xyplot(resamps, what = "BlandAltman")
splom(resamps)
difValues <- diff(resamps)
difValues
summary(difValues)
trellis.par.set(caretTheme())
bwplot(difValues, layout = c(3, 1))
trellis.par.set(caretTheme())
dotplot(difValues)
fitControl <- trainControl(method = "none", classProbs = TRUE)
set.seed(825)
gbmFit4 <- train(Class ~ ., data = training,
                 method = "gbm",
                 trControl = fitControl,
                 verbose = FALSE,
                 ## Only a single model can be passed to the
                 ## function when no resampling is used:
                 tuneGrid = data.frame(interaction.depth = 4,
                                       n.trees = 100,
                                       shrinkage = .1,
                                       n.minobsinnode = 20),
                 metric = "ROC")
gbmFit4
predict(gbmFit4, newdata = head(testing))
predict(gbmFit4, newdata = head(testing), type = "prob")