I’ll take an example from Kuhn’s documentation on GitHub. It uses a support vector machine with a radial kernel to distinguish between mines and rocks using sonar data. The particulars of the example are unimportant; what interests me is how hyperparameter tuning is specified. In this example there are two hyperparameters, C and sigma.
It is always a good idea to look at the data frame of results, which is one of the elements in the list produced by train.
library(mlbench)
data(Sonar)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(tictoc)
set.seed(998)
inTraining <- createDataPartition(Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTraining,]
testing <- Sonar[-inTraining,]
svmControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 10,
                           classProbs = TRUE)
tic()
set.seed(825)
svmFit <- train(Class ~ ., data = training,
                method = "svmRadial",
                trControl = svmControl,
                preProc = c("center", "scale"),
                metric = "Accuracy"
)
toc()
## 9.06 sec elapsed
str(svmFit$results)
## 'data.frame': 3 obs. of 6 variables:
## $ sigma : num 0.0133 0.0133 0.0133
## $ C : num 0.25 0.5 1
## $ Accuracy : num 0.756 0.826 0.833
## $ Kappa : num 0.512 0.648 0.663
## $ AccuracySD: num 0.0947 0.0933 0.0915
## $ KappaSD : num 0.188 0.188 0.186
svmFit$results
##        sigma    C  Accuracy     Kappa AccuracySD   KappaSD
## 1 0.01334808 0.25 0.7560956 0.5115648 0.09470819 0.1879078
## 2 0.01334808 0.50 0.8257279 0.6480699 0.09325645 0.1884650
## 3 0.01334808 1.00 0.8334657 0.6625824 0.09145794 0.1858607
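As a rough guide to where the elapsed time goes, the number of model fits can be estimated from the resampling scheme. The sketch below assumes train fits every candidate once per resample and then does one final fit on the full training set; the helper name count_fits is hypothetical and not part of caret.

```r
# Hypothetical helper (not part of caret): estimate how many models are fit,
# assuming one fit per candidate per resample plus one final fit.
count_fits <- function(n_candidates, folds = 10, repeats = 10) {
  n_candidates * folds * repeats + 1
}

count_fits(3)    # three candidate values, as in the run above
count_fits(30)   # a hypothetical 30-row grid costs about ten times as much
```

Under that assumption the cost grows linearly with the number of candidates, which is worth keeping in mind before widening a grid.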
Looking at the dataframe of results, we can see that only one of the two hyperparameters is being varied in tuning.
Because nothing was specified about tuning, train fell back on its default of three candidates: one value of sigma and three values of C. The highest value of C performed best, which is concerning because even higher values of C might have done better still. Let’s rerun this example with a tuneLength of 4.
tic()
svmFit4 <- train(Class ~ ., data = training,
                 method = "svmRadial",
                 trControl = svmControl,
                 preProc = c("center", "scale"),
                 metric = "Accuracy",
                 tuneLength = 4
)
toc()
## 10.599 sec elapsed
str(svmFit4$results)
## 'data.frame': 4 obs. of 6 variables:
## $ sigma : num 0.0105 0.0105 0.0105 0.0105
## $ C : num 0.25 0.5 1 2
## $ Accuracy : num 0.749 0.807 0.831 0.837
## $ Kappa : num 0.497 0.612 0.659 0.671
## $ AccuracySD: num 0.1013 0.0975 0.1042 0.1036
## $ KappaSD : num 0.203 0.196 0.21 0.209
svmFit4$results
##        sigma    C  Accuracy     Kappa AccuracySD   KappaSD
## 1 0.01045575 0.25 0.7487010 0.4969536 0.10134733 0.2031292
## 2 0.01045575 0.50 0.8070343 0.6117418 0.09750854 0.1958670
## 3 0.01045575 1.00 0.8312892 0.6591425 0.10419507 0.2097231
## 4 0.01045575 2.00 0.8374289 0.6705278 0.10362762 0.2093253
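The C values in both runs double at each step starting from 0.25, so a larger tuneLength only extends the search upward in C. Sigma stays fixed at a single estimated value within each run (it differs slightly between the two runs, 0.0133 versus 0.0105, plausibly because the seed was not reset before the second call). The following sketch reproduces the C sequence under the assumption that the candidates are 2^(k - 3) for k = 1, ..., tuneLength; the function name default_C is hypothetical.

```r
# Hypothetical helper: candidate C values, assuming they follow 2^(k - 3).
default_C <- function(len) 2^(seq_len(len) - 3)

default_C(3)   # 0.25 0.5 1
default_C(4)   # 0.25 0.5 1 2
```

If that pattern holds, reaching C = 128 this way would take a tuneLength of 10, and sigma would still not be varied.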
Let’s do an explicit grid search and look at the results.
svmControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 10,
                           classProbs = TRUE,
                           search = "grid")
myGrid <- expand.grid(sigma = c(.0133*.5, .0133, .0133*2),
                      C = .25*2^(0:9))
set.seed(825)
tic()
svmFitGrid <- train(Class ~ ., data = training,
                    method = "svmRadial",
                    trControl = svmControl,
                    preProc = c("center", "scale"),
                    metric = "Accuracy",
                    tuneGrid = myGrid
)
toc()
## 79.617 sec elapsed
svmFitGrid$bestTune
##     sigma C
## 24 0.0266 2
str(svmFitGrid$results)
## 'data.frame': 30 obs. of 6 variables:
## $ sigma : num 0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 ...
## $ C : num 0.25 0.5 1 2 4 8 16 32 64 128 ...
## $ Accuracy : num 0.743 0.791 0.823 0.835 0.835 ...
## $ Kappa : num 0.488 0.579 0.643 0.666 0.665 ...
## $ AccuracySD: num 0.0992 0.0955 0.0866 0.0921 0.0998 ...
## $ KappaSD : num 0.196 0.191 0.175 0.187 0.203 ...
svmFitGrid$results
## sigma C Accuracy Kappa AccuracySD KappaSD
## 1 0.00665 0.25 0.7427108 0.4880070 0.09915467 0.1963539
## 2 0.00665 0.50 0.7906618 0.5785797 0.09545817 0.1911605
## 3 0.00665 1.00 0.8232574 0.6428664 0.08664743 0.1752172
## 4 0.00665 2.00 0.8352941 0.6659340 0.09208516 0.1870780
## 5 0.00665 4.00 0.8349510 0.6651754 0.09981055 0.2033467
## 6 0.00665 8.00 0.8381127 0.6730638 0.09328164 0.1891359
## 7 0.00665 16.00 0.8554559 0.7078369 0.08801045 0.1792975
## 8 0.00665 32.00 0.8631593 0.7230779 0.08698057 0.1770567
## 9 0.00665 64.00 0.8587010 0.7137582 0.08764836 0.1789525
## 10 0.00665 128.00 0.8605760 0.7176856 0.08581121 0.1749325
## 11 0.01330 0.25 0.7535539 0.5053219 0.09882816 0.1988704
## 12 0.01330 0.50 0.8198162 0.6358223 0.08832488 0.1791638
## 13 0.01330 1.00 0.8366005 0.6693436 0.08855090 0.1789959
## 14 0.01330 2.00 0.8376789 0.6711748 0.09337690 0.1894411
## 15 0.01330 4.00 0.8555025 0.7077146 0.08464235 0.1719470
## 16 0.01330 8.00 0.8669240 0.7302613 0.08390210 0.1710780
## 17 0.01330 16.00 0.8625025 0.7210681 0.08647119 0.1763590
## 18 0.01330 32.00 0.8624289 0.7211191 0.08147900 0.1662476
## 19 0.01330 64.00 0.8638407 0.7238135 0.08327310 0.1698875
## 20 0.01330 128.00 0.8668873 0.7301284 0.08543898 0.1740992
## 21 0.02660 0.25 0.7714951 0.5435882 0.10205034 0.2035074
## 22 0.02660 0.50 0.8265956 0.6480008 0.08903800 0.1810125
## 23 0.02660 1.00 0.8460588 0.6879035 0.09196305 0.1868253
## 24 0.02660 2.00 0.8765270 0.7497404 0.08051664 0.1633974
## 25 0.02660 4.00 0.8673284 0.7307767 0.07873768 0.1605822
## 26 0.02660 8.00 0.8719167 0.7403930 0.07737708 0.1572292
## 27 0.02660 16.00 0.8731201 0.7423757 0.08430993 0.1725965
## 28 0.02660 32.00 0.8737500 0.7439557 0.07949121 0.1616518
## 29 0.02660 64.00 0.8681152 0.7325743 0.07835348 0.1595186
## 30 0.02660 128.00 0.8698750 0.7364092 0.07795720 0.1583606
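With 30 rows it helps to check where the winner sits relative to the edges of the grid. The sketch below copies the best row for each sigma from the table above and tests whether the overall best lies on a grid boundary; on_edge is a hypothetical helper, not a caret function.

```r
# Best row for each sigma, copied from the results table above.
best_by_sigma <- data.frame(
  sigma    = c(0.00665, 0.0133, 0.0266),
  C        = c(32, 8, 2),
  Accuracy = c(0.8631593, 0.8669240, 0.8765270)
)
sigma_grid <- c(0.00665, 0.0133, 0.0266)
C_grid <- 0.25 * 2^(0:9)

# Hypothetical helper: TRUE when a value sits on the boundary of its grid.
on_edge <- function(value, grid) value == min(grid) || value == max(grid)

best <- best_by_sigma[which.max(best_by_sigma$Accuracy), ]
on_edge(best$sigma, sigma_grid)  # TRUE: the best sigma is the largest one tried
on_edge(best$C, C_grid)          # FALSE: C = 2 is interior to the grid
```

In these rows the best C falls as sigma rises (32, 8, 2), which is one reason to be cautious about tuning each hyperparameter separately.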
What we might think we know now is that the best set of hyperparameters has sigma at or above 0.0266 and C somewhere between 1 and 4. That inference would hold with only one hyperparameter, but we have two, and they may interact.
That leads me to do the following grid search.
tic()
myGrid <- expand.grid(sigma = seq(.0266, .0276, length.out = 4),
                      C = seq(1, 4, length.out = 8))
set.seed(825)
svmFitGrid <- train(Class ~ ., data = training,
                    method = "svmRadial",
                    trControl = svmControl,
                    preProc = c("center", "scale"),
                    metric = "Accuracy",
                    tuneGrid = myGrid
)
toc()
## 87.784 sec elapsed
svmFitGrid$bestTune
##         sigma        C
## 13 0.02693333 2.714286
str(svmFitGrid$results)
## 'data.frame': 32 obs. of 6 variables:
## $ sigma : num 0.0266 0.0266 0.0266 0.0266 0.0266 ...
## $ C : num 1 1.43 1.86 2.29 2.71 ...
## $ Accuracy : num 0.852 0.867 0.866 0.871 0.871 ...
## $ Kappa : num 0.701 0.731 0.728 0.738 0.738 ...
## $ AccuracySD: num 0.0914 0.0816 0.0838 0.0792 0.0807 ...
## $ KappaSD : num 0.186 0.165 0.17 0.162 0.164 ...
svmFitGrid$results
## sigma C Accuracy Kappa AccuracySD KappaSD
## 1 0.02660000 1.000000 0.8524118 0.7006881 0.09142106 0.1860509
## 2 0.02660000 1.428571 0.8672181 0.7311064 0.08156032 0.1654614
## 3 0.02660000 1.857143 0.8660784 0.7283507 0.08382064 0.1703764
## 4 0.02660000 2.285714 0.8706936 0.7375466 0.07922409 0.1617124
## 5 0.02660000 2.714286 0.8708235 0.7382262 0.08071728 0.1637955
## 6 0.02660000 3.142857 0.8740221 0.7446576 0.07983947 0.1624798
## 7 0.02660000 3.571429 0.8687353 0.7338184 0.07874802 0.1602544
## 8 0.02660000 4.000000 0.8725319 0.7417149 0.07958232 0.1616581
## 9 0.02693333 1.000000 0.8498701 0.6958943 0.08940623 0.1811193
## 10 0.02693333 1.428571 0.8626520 0.7216902 0.08350391 0.1696351
## 11 0.02693333 1.857143 0.8693235 0.7350776 0.08083980 0.1643426
## 12 0.02693333 2.285714 0.8686985 0.7339020 0.08239036 0.1675798
## 13 0.02693333 2.714286 0.8762647 0.7492734 0.07854936 0.1596843
## 14 0.02693333 3.142857 0.8704951 0.7374441 0.08147722 0.1659853
## 15 0.02693333 3.571429 0.8695319 0.7354223 0.08125846 0.1655598
## 16 0.02693333 4.000000 0.8692819 0.7350513 0.08284989 0.1681833
## 17 0.02726667 1.000000 0.8511618 0.6986248 0.08770649 0.1775845
## 18 0.02726667 1.428571 0.8607402 0.7174493 0.08242213 0.1678406
## 19 0.02726667 1.857143 0.8705735 0.7375037 0.08032132 0.1634444
## 20 0.02726667 2.285714 0.8719902 0.7405382 0.07972295 0.1620289
## 21 0.02726667 2.714286 0.8707770 0.7380641 0.08241108 0.1677820
## 22 0.02726667 3.142857 0.8698750 0.7359527 0.08148406 0.1661733
## 23 0.02726667 3.571429 0.8718064 0.7399931 0.08033326 0.1635704
## 24 0.02726667 4.000000 0.8693701 0.7350354 0.08203403 0.1672314
## 25 0.02760000 1.000000 0.8494338 0.6948183 0.09065024 0.1843251
## 26 0.02760000 1.428571 0.8652868 0.7268443 0.08111100 0.1652686
## 27 0.02760000 1.857143 0.8692451 0.7350654 0.08398078 0.1705804
## 28 0.02760000 2.285714 0.8738235 0.7442553 0.07899395 0.1606439
## 29 0.02760000 2.714286 0.8706250 0.7379844 0.07688631 0.1560262
## 30 0.02760000 3.142857 0.8683137 0.7330160 0.08057737 0.1637872
## 31 0.02760000 3.571429 0.8687770 0.7337728 0.07915739 0.1611466
## 32 0.02760000 4.000000 0.8732451 0.7431499 0.07853615 0.1596658
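The accuracies in this refined grid are very close together, so it is worth asking how many candidates are practically indistinguishable from the winner. The sketch below is a rough check, not a formal test: it copies rows 9 to 16 from the table above and flags candidates within one standard error of the best, taking the standard error as AccuracySD divided by the square root of the number of resamples. That treats the 100 resamples as independent, which is an assumption and probably an optimistic one.

```r
# Rows 9 to 16 of the results table above (sigma = 0.02693333).
refined <- data.frame(
  C          = c(1, 1.428571, 1.857143, 2.285714,
                 2.714286, 3.142857, 3.571429, 4),
  Accuracy   = c(0.8498701, 0.8626520, 0.8693235, 0.8686985,
                 0.8762647, 0.8704951, 0.8695319, 0.8692819),
  AccuracySD = c(0.08940623, 0.08350391, 0.08083980, 0.08239036,
                 0.07854936, 0.08147722, 0.08125846, 0.08284989)
)
n_resamples <- 10 * 10  # number = 10, repeats = 10

best <- refined[which.max(refined$Accuracy), ]
# Rough standard error of the best mean accuracy (independence assumed).
se_best <- best$AccuracySD / sqrt(n_resamples)
threshold <- best$Accuracy - se_best

refined$within_one_se <- refined$Accuracy >= threshold
refined[, c("C", "Accuracy", "within_one_se")]
```

Six of the eight C values fall within that band, so the refined grid separates the candidates far less sharply than bestTune alone suggests.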
Rather than searching by hand, we can let caret do a more systematic search, here using adaptive resampling (method = "adaptive_cv").
tic()
adControl <- trainControl(method = "adaptive_cv",
                          number = 10, repeats = 10)
set.seed(825)
svmFitad <- train(Class ~ ., data = training,
                  method = "svmRadial",
                  trControl = adControl,
                  preProc = c("center", "scale"),
                  metric = "Accuracy",
                  tuneLength = 10
)
##
## Attaching package: 'kernlab'
## The following object is masked from 'package:ggplot2':
##
## alpha
toc()
## 7.287 sec elapsed
svmFitad$bestTune
##         sigma C
## 10 0.01334808 8
str(svmFitad$results)
## 'data.frame': 10 obs. of 7 variables:
## $ sigma : num 0.0133 0.0133 0.0133 0.0133 0.0133 ...
## $ C : num 0.25 0.5 1 2 4 8 16 32 64 128
## $ Accuracy : num 0.758 0.783 0.851 0.855 0.855 ...
## $ Kappa : num 0.502 0.555 0.697 0.703 0.706 ...
## $ AccuracySD: num 0.0867 0.0788 0.0659 0.0587 0.0811 ...
## $ KappaSD : num 0.175 0.166 0.132 0.119 0.166 ...
## $ .B : int 5 5 6 7 63 100 6 6 6 6
svmFitad$results
## sigma C Accuracy Kappa AccuracySD KappaSD .B
## 1 0.01334808 0.25 0.7583333 0.5023541 0.08665264 0.1749826 5
## 2 0.01334808 0.50 0.7833333 0.5549347 0.07878196 0.1659528 5
## 3 0.01334808 1.00 0.8506944 0.6969047 0.06585003 0.1321128 6
## 6 0.01334808 2.00 0.8547619 0.7030778 0.05874993 0.1192128 7
## 8 0.01334808 4.00 0.8554233 0.7063767 0.08107726 0.1663978 63
## 10 0.01334808 8.00 0.8630539 0.7212616 0.07937917 0.1629252 100
## 5 0.01334808 16.00 0.8833333 0.7640228 0.04662200 0.0925252 6
## 7 0.01334808 32.00 0.8833333 0.7640228 0.04662200 0.0925252 6
## 9 0.01334808 64.00 0.8833333 0.7640228 0.04662200 0.0925252 6
## 4 0.01334808 128.00 0.8833333 0.7640228 0.04662200 0.0925252 6
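The .B column deserves a look before trusting the Accuracy column. Assuming .B records how many resamples each candidate received before it was dropped (my reading of the output, not something the results state), the rows with the highest accuracy are not the ones that were evaluated most thoroughly. A small sketch with the values copied from the table above:

```r
# C, Accuracy and .B copied from the adaptive results above
# (the column is named B here to keep the data frame syntax simple).
adaptive <- data.frame(
  C        = c(0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128),
  Accuracy = c(0.7583333, 0.7833333, 0.8506944, 0.8547619, 0.8554233,
               0.8630539, 0.8833333, 0.8833333, 0.8833333, 0.8833333),
  B        = c(5, 5, 6, 7, 63, 100, 6, 6, 6, 6)
)

# Candidates with the highest raw accuracy ...
adaptive[adaptive$Accuracy == max(adaptive$Accuracy), ]
# ... versus the candidate that used the most resamples.
adaptive[which.max(adaptive$B), ]
```

The four largest C values report the highest accuracy, but on only 6 resamples each; C = 8, the value in bestTune, is the only candidate evaluated on all 100.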
Now I want to explore the ability of caret to do parallel processing.
library(doMC)
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
# I have a new MB Pro with 8 cores.
registerDoMC(cores = 8)
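Hard-coding the core count ties the script to one machine. A minimal sketch, assuming the base parallel package is available and that leaving one core free is acceptable (a common habit, not a requirement of doMC); the variable names are illustrative.

```r
library(parallel)  # ships with R

n_cores <- detectCores()  # may be NA on some platforms
# Leave one core free when possible, but never go below one worker.
workers <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)
workers

# The result could then replace the literal 8:
# registerDoMC(cores = workers)
```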
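Parallel execution is controlled by allowParallel, which is an argument of trainControl rather than of train. It defaults to TRUE, so registering the backend is enough; setting it explicitly just makes the intent visible.
adControl <- trainControl(method = "adaptive_cv",
                          number = 10, repeats = 10,
                          allowParallel = TRUE)
set.seed(825)
tic()
svmFitad <- train(Class ~ ., data = training,
                  method = "svmRadial",
                  trControl = adControl,
                  preProc = c("center", "scale"),
                  metric = "Accuracy",
                  tuneLength = 10
)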
toc()
## 9.083 sec elapsed
svmFitad$bestTune
##         sigma C
## 10 0.01334808 8
str(svmFitad$results)
## 'data.frame': 10 obs. of 7 variables:
## $ sigma : num 0.0133 0.0133 0.0133 0.0133 0.0133 ...
## $ C : num 0.25 0.5 1 2 4 8 16 32 64 128
## $ Accuracy : num 0.758 0.783 0.851 0.855 0.855 ...
## $ Kappa : num 0.502 0.555 0.697 0.703 0.706 ...
## $ AccuracySD: num 0.0867 0.0788 0.0659 0.0587 0.0811 ...
## $ KappaSD : num 0.175 0.166 0.132 0.119 0.166 ...
## $ .B : int 5 5 6 7 63 100 6 6 6 6
svmFitad$results
## sigma C Accuracy Kappa AccuracySD KappaSD .B
## 1 0.01334808 0.25 0.7583333 0.5023541 0.08665264 0.1749826 5
## 2 0.01334808 0.50 0.7833333 0.5549347 0.07878196 0.1659528 5
## 3 0.01334808 1.00 0.8506944 0.6969047 0.06585003 0.1321128 6
## 6 0.01334808 2.00 0.8547619 0.7030778 0.05874993 0.1192128 7
## 8 0.01334808 4.00 0.8554233 0.7063767 0.08107726 0.1663978 63
## 10 0.01334808 8.00 0.8630539 0.7212616 0.07937917 0.1629252 100
## 5 0.01334808 16.00 0.8833333 0.7640228 0.04662200 0.0925252 6
## 7 0.01334808 32.00 0.8833333 0.7640228 0.04662200 0.0925252 6
## 9 0.01334808 64.00 0.8833333 0.7640228 0.04662200 0.0925252 6
## 4 0.01334808 128.00 0.8833333 0.7640228 0.04662200 0.0925252 6