Hyperparameter Tuning

I’ll take an example from Kuhn’s documentation on GitHub. It uses a support vector machine with a radial kernel to distinguish between mines and rocks using sonar data. The particulars of the example are unimportant; what interests me is how the hyperparameter tuning is specified. In this example there are two hyperparameters: C and sigma.

It is always a good idea to look at the data frame of results, which is one of the elements of the list returned by train.

library(mlbench)
data(Sonar)
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
library(tictoc)
set.seed(998)
inTraining <- createDataPartition(Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTraining,]
testing  <- Sonar[-inTraining,]


svmControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 10,
                           classProbs = TRUE)

tic()
set.seed(825)
svmFit <- train(Class ~ ., data = training,
                method = "svmRadial", 
                trControl = svmControl, 
                preProc = c("center", "scale"),
                metric = "Accuracy"
                )
toc()
## 9.06 sec elapsed
str(svmFit$results)
## 'data.frame':    3 obs. of  6 variables:
##  $ sigma     : num  0.0133 0.0133 0.0133
##  $ C         : num  0.25 0.5 1
##  $ Accuracy  : num  0.756 0.826 0.833
##  $ Kappa     : num  0.512 0.648 0.663
##  $ AccuracySD: num  0.0947 0.0933 0.0915
##  $ KappaSD   : num  0.188 0.188 0.186
svmFit$results
##        sigma    C  Accuracy     Kappa AccuracySD   KappaSD
## 1 0.01334808 0.25 0.7560956 0.5115648 0.09470819 0.1879078
## 2 0.01334808 0.50 0.8257279 0.6480699 0.09325645 0.1884650
## 3 0.01334808 1.00 0.8334657 0.6625824 0.09145794 0.1858607

Looking at the dataframe of results, we can see that only one of the two hyperparameters is being varied in tuning.
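A quick way to confirm this without reading the table by eye is to count the distinct values in each hyperparameter column. The sketch below re-enters the printed results by hand (so it runs without caret); with a real fit you would use svmFit$results directly.

```r
# Values copied from the printed svmFit$results above
res <- data.frame(
  sigma    = c(0.01334808, 0.01334808, 0.01334808),
  C        = c(0.25, 0.50, 1.00),
  Accuracy = c(0.7560956, 0.8257279, 0.8334657)
)

# Number of distinct values tried for each hyperparameter
n_values <- sapply(res[c("sigma", "C")], function(x) length(unique(x)))
n_values
```

A count of 1 means that hyperparameter was held fixed rather than tuned.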

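Without any tuning specification, train used a single value of sigma and three values of C. Three is the default tuneLength, and for this model caret estimates sigma from the data rather than tuning it. The highest value of C performed best, which is concerning because even higher values of C might have done better. Let’s rerun this example with a tuneLength of 4. The seed is not reset before this run, so the sigma estimate and the resamples differ slightly from the first run.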

tic()
svmFit4 <- train(Class ~ ., data = training,
                method = "svmRadial", 
                trControl = svmControl, 
                preProc = c("center", "scale"),
                metric = "Accuracy",
                tuneLength = 4
                )
toc()
## 10.599 sec elapsed
str(svmFit4$results)
## 'data.frame':    4 obs. of  6 variables:
##  $ sigma     : num  0.0105 0.0105 0.0105 0.0105
##  $ C         : num  0.25 0.5 1 2
##  $ Accuracy  : num  0.749 0.807 0.831 0.837
##  $ Kappa     : num  0.497 0.612 0.659 0.671
##  $ AccuracySD: num  0.1013 0.0975 0.1042 0.1036
##  $ KappaSD   : num  0.203 0.196 0.21 0.209
svmFit4$results
##        sigma    C  Accuracy     Kappa AccuracySD   KappaSD
## 1 0.01045575 0.25 0.7487010 0.4969536 0.10134733 0.2031292
## 2 0.01045575 0.50 0.8070343 0.6117418 0.09750854 0.1958670
## 3 0.01045575 1.00 0.8312892 0.6591425 0.10419507 0.2097231
## 4 0.01045575 2.00 0.8374289 0.6705278 0.10362762 0.2093253
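With tuneLength = 4 the best accuracy is again at the largest C that was tried. The sketch below, which re-enters the printed values by hand, flags that situation. It also shows that the C values seen so far are consistent with a doubling sequence starting at 0.25. The helper name doubling_C is hypothetical, and the pattern is read off the output here, not taken from caret's source.

```r
# Values copied from the printed svmFit4$results above
res4 <- data.frame(
  C        = c(0.25, 0.50, 1.00, 2.00),
  Accuracy = c(0.7487010, 0.8070343, 0.8312892, 0.8374289)
)

# Is the best C sitting on the boundary of the values tried?
best_C  <- res4$C[which.max(res4$Accuracy)]
on_edge <- best_C %in% range(res4$C)
on_edge

# Hypothetical helper: a doubling sequence that matches the C values observed
doubling_C <- function(len) 0.25 * 2^(seq_len(len) - 1)
doubling_C(4)
```

When on_edge is TRUE, the search range probably needs widening before the result can be trusted.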

Let’s do an explicit grid search and look at the results.

svmControl <- trainControl(method = "repeatedcv",
number = 10, 
repeats = 10,
classProbs = TRUE,
search = "grid")

myGrid = expand.grid(sigma = c(.0133*.5,.0133, .0133*2),
                     C = .25*2^(0:9)) 
set.seed(825)

tic()
svmFitGrid <- train(Class ~ ., data = training,
                method = "svmRadial", 
                trControl = svmControl, 
                preProc = c("center", "scale"),
                metric = "Accuracy",
                tuneGrid = myGrid
                )
toc()
## 79.617 sec elapsed
svmFitGrid$bestTune
##     sigma C
## 24 0.0266 2
str(svmFitGrid$results)
## 'data.frame':    30 obs. of  6 variables:
##  $ sigma     : num  0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 ...
##  $ C         : num  0.25 0.5 1 2 4 8 16 32 64 128 ...
##  $ Accuracy  : num  0.743 0.791 0.823 0.835 0.835 ...
##  $ Kappa     : num  0.488 0.579 0.643 0.666 0.665 ...
##  $ AccuracySD: num  0.0992 0.0955 0.0866 0.0921 0.0998 ...
##  $ KappaSD   : num  0.196 0.191 0.175 0.187 0.203 ...
svmFitGrid$results
##      sigma      C  Accuracy     Kappa AccuracySD   KappaSD
## 1  0.00665   0.25 0.7427108 0.4880070 0.09915467 0.1963539
## 2  0.00665   0.50 0.7906618 0.5785797 0.09545817 0.1911605
## 3  0.00665   1.00 0.8232574 0.6428664 0.08664743 0.1752172
## 4  0.00665   2.00 0.8352941 0.6659340 0.09208516 0.1870780
## 5  0.00665   4.00 0.8349510 0.6651754 0.09981055 0.2033467
## 6  0.00665   8.00 0.8381127 0.6730638 0.09328164 0.1891359
## 7  0.00665  16.00 0.8554559 0.7078369 0.08801045 0.1792975
## 8  0.00665  32.00 0.8631593 0.7230779 0.08698057 0.1770567
## 9  0.00665  64.00 0.8587010 0.7137582 0.08764836 0.1789525
## 10 0.00665 128.00 0.8605760 0.7176856 0.08581121 0.1749325
## 11 0.01330   0.25 0.7535539 0.5053219 0.09882816 0.1988704
## 12 0.01330   0.50 0.8198162 0.6358223 0.08832488 0.1791638
## 13 0.01330   1.00 0.8366005 0.6693436 0.08855090 0.1789959
## 14 0.01330   2.00 0.8376789 0.6711748 0.09337690 0.1894411
## 15 0.01330   4.00 0.8555025 0.7077146 0.08464235 0.1719470
## 16 0.01330   8.00 0.8669240 0.7302613 0.08390210 0.1710780
## 17 0.01330  16.00 0.8625025 0.7210681 0.08647119 0.1763590
## 18 0.01330  32.00 0.8624289 0.7211191 0.08147900 0.1662476
## 19 0.01330  64.00 0.8638407 0.7238135 0.08327310 0.1698875
## 20 0.01330 128.00 0.8668873 0.7301284 0.08543898 0.1740992
## 21 0.02660   0.25 0.7714951 0.5435882 0.10205034 0.2035074
## 22 0.02660   0.50 0.8265956 0.6480008 0.08903800 0.1810125
## 23 0.02660   1.00 0.8460588 0.6879035 0.09196305 0.1868253
## 24 0.02660   2.00 0.8765270 0.7497404 0.08051664 0.1633974
## 25 0.02660   4.00 0.8673284 0.7307767 0.07873768 0.1605822
## 26 0.02660   8.00 0.8719167 0.7403930 0.07737708 0.1572292
## 27 0.02660  16.00 0.8731201 0.7423757 0.08430993 0.1725965
## 28 0.02660  32.00 0.8737500 0.7439557 0.07949121 0.1616518
## 29 0.02660  64.00 0.8681152 0.7325743 0.07835348 0.1595186
## 30 0.02660 128.00 0.8698750 0.7364092 0.07795720 0.1583606

What we might think we know now is that the best sigma is at or above 0.0266 and the best C is somewhere between 1 and 4. That inference would be safe with only one hyperparameter, but we have two, and in the table above the best C is different for each value of sigma.
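To make the interaction between the two hyperparameters visible, the accuracies from the 30-row table can be arranged as a sigma by C matrix and the best C read off for each sigma. This is a sketch with the printed accuracies re-entered by hand; the object names are illustrative.

```r
C_vals     <- 0.25 * 2^(0:9)
sigma_vals <- c(0.00665, 0.0133, 0.0266)

# Accuracy values copied from svmFitGrid$results, one row per sigma
acc <- matrix(
  c(0.7427108, 0.7906618, 0.8232574, 0.8352941, 0.8349510,
    0.8381127, 0.8554559, 0.8631593, 0.8587010, 0.8605760,
    0.7535539, 0.8198162, 0.8366005, 0.8376789, 0.8555025,
    0.8669240, 0.8625025, 0.8624289, 0.8638407, 0.8668873,
    0.7714951, 0.8265956, 0.8460588, 0.8765270, 0.8673284,
    0.8719167, 0.8731201, 0.8737500, 0.8681152, 0.8698750),
  nrow = 3, byrow = TRUE,
  dimnames = list(sigma = as.character(sigma_vals),
                  C     = as.character(C_vals))
)

# Best C for each sigma (row-wise maximum)
best_C <- C_vals[apply(acc, 1, which.max)]
names(best_C) <- rownames(acc)
best_C
```

The best C falls from 32 to 8 to 2 as sigma doubles, so a range for C chosen at one sigma need not hold at another.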

That leads me to a finer grid search around those values.

tic()
myGrid = expand.grid(sigma = seq(.0266, .0276, length = 4),
                     C = seq(1, 4, length = 8))
set.seed(825)

svmFitGrid <- train(Class ~ ., data = training,
                method = "svmRadial", 
                trControl = svmControl, 
                preProc = c("center", "scale"),
                metric = "Accuracy",
                tuneGrid = myGrid
                )
toc()
## 87.784 sec elapsed
svmFitGrid$bestTune
##         sigma        C
## 13 0.02693333 2.714286
str(svmFitGrid$results)
## 'data.frame':    32 obs. of  6 variables:
##  $ sigma     : num  0.0266 0.0266 0.0266 0.0266 0.0266 ...
##  $ C         : num  1 1.43 1.86 2.29 2.71 ...
##  $ Accuracy  : num  0.852 0.867 0.866 0.871 0.871 ...
##  $ Kappa     : num  0.701 0.731 0.728 0.738 0.738 ...
##  $ AccuracySD: num  0.0914 0.0816 0.0838 0.0792 0.0807 ...
##  $ KappaSD   : num  0.186 0.165 0.17 0.162 0.164 ...
svmFitGrid$results
##         sigma        C  Accuracy     Kappa AccuracySD   KappaSD
## 1  0.02660000 1.000000 0.8524118 0.7006881 0.09142106 0.1860509
## 2  0.02660000 1.428571 0.8672181 0.7311064 0.08156032 0.1654614
## 3  0.02660000 1.857143 0.8660784 0.7283507 0.08382064 0.1703764
## 4  0.02660000 2.285714 0.8706936 0.7375466 0.07922409 0.1617124
## 5  0.02660000 2.714286 0.8708235 0.7382262 0.08071728 0.1637955
## 6  0.02660000 3.142857 0.8740221 0.7446576 0.07983947 0.1624798
## 7  0.02660000 3.571429 0.8687353 0.7338184 0.07874802 0.1602544
## 8  0.02660000 4.000000 0.8725319 0.7417149 0.07958232 0.1616581
## 9  0.02693333 1.000000 0.8498701 0.6958943 0.08940623 0.1811193
## 10 0.02693333 1.428571 0.8626520 0.7216902 0.08350391 0.1696351
## 11 0.02693333 1.857143 0.8693235 0.7350776 0.08083980 0.1643426
## 12 0.02693333 2.285714 0.8686985 0.7339020 0.08239036 0.1675798
## 13 0.02693333 2.714286 0.8762647 0.7492734 0.07854936 0.1596843
## 14 0.02693333 3.142857 0.8704951 0.7374441 0.08147722 0.1659853
## 15 0.02693333 3.571429 0.8695319 0.7354223 0.08125846 0.1655598
## 16 0.02693333 4.000000 0.8692819 0.7350513 0.08284989 0.1681833
## 17 0.02726667 1.000000 0.8511618 0.6986248 0.08770649 0.1775845
## 18 0.02726667 1.428571 0.8607402 0.7174493 0.08242213 0.1678406
## 19 0.02726667 1.857143 0.8705735 0.7375037 0.08032132 0.1634444
## 20 0.02726667 2.285714 0.8719902 0.7405382 0.07972295 0.1620289
## 21 0.02726667 2.714286 0.8707770 0.7380641 0.08241108 0.1677820
## 22 0.02726667 3.142857 0.8698750 0.7359527 0.08148406 0.1661733
## 23 0.02726667 3.571429 0.8718064 0.7399931 0.08033326 0.1635704
## 24 0.02726667 4.000000 0.8693701 0.7350354 0.08203403 0.1672314
## 25 0.02760000 1.000000 0.8494338 0.6948183 0.09065024 0.1843251
## 26 0.02760000 1.428571 0.8652868 0.7268443 0.08111100 0.1652686
## 27 0.02760000 1.857143 0.8692451 0.7350654 0.08398078 0.1705804
## 28 0.02760000 2.285714 0.8738235 0.7442553 0.07899395 0.1606439
## 29 0.02760000 2.714286 0.8706250 0.7379844 0.07688631 0.1560262
## 30 0.02760000 3.142857 0.8683137 0.7330160 0.08057737 0.1637872
## 31 0.02760000 3.571429 0.8687770 0.7337728 0.07915739 0.1611466
## 32 0.02760000 4.000000 0.8732451 0.7431499 0.07853615 0.1596658
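The 32 rows in the table are the 4 sigma values crossed with the 8 C values. The base R sketch below rebuilds the grid in the row order the results are printed in (sigma first, then C) and looks up row 13, the row reported by bestTune. That ordering is read off the table above, not taken from caret's documentation.

```r
sigma_vals <- seq(0.0266, 0.0276, length.out = 4)
C_vals     <- seq(1, 4, length.out = 8)

# Put C first so that it varies fastest, then restore the column order
grid <- expand.grid(C = C_vals, sigma = sigma_vals)[, c("sigma", "C")]

nrow(grid)   # number of candidate settings
grid[13, ]   # the setting reported by bestTune
```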

Rather than refining the grid by hand, we can use caret's adaptive resampling (method = "adaptive_cv"), which stops spending resamples on settings that are clearly performing poorly.

tic()
adControl <- trainControl(method = "adaptive_cv",
                           number = 10, repeats = 10)
set.seed(825)

svmFitad <- train(Class ~ ., data = training,
                method = "svmRadial", 
                trControl = adControl, 
                preProc = c("center", "scale"),
                metric = "Accuracy",
                tuneLength = 10
                )
## 
## Attaching package: 'kernlab'
## The following object is masked from 'package:ggplot2':
## 
##     alpha
toc()
## 7.287 sec elapsed
svmFitad$bestTune
##         sigma C
## 10 0.01334808 8
str(svmFitad$results)
## 'data.frame':    10 obs. of  7 variables:
##  $ sigma     : num  0.0133 0.0133 0.0133 0.0133 0.0133 ...
##  $ C         : num  0.25 0.5 1 2 4 8 16 32 64 128
##  $ Accuracy  : num  0.758 0.783 0.851 0.855 0.855 ...
##  $ Kappa     : num  0.502 0.555 0.697 0.703 0.706 ...
##  $ AccuracySD: num  0.0867 0.0788 0.0659 0.0587 0.0811 ...
##  $ KappaSD   : num  0.175 0.166 0.132 0.119 0.166 ...
##  $ .B        : int  5 5 6 7 63 100 6 6 6 6
svmFitad$results
##         sigma      C  Accuracy     Kappa AccuracySD   KappaSD  .B
## 1  0.01334808   0.25 0.7583333 0.5023541 0.08665264 0.1749826   5
## 2  0.01334808   0.50 0.7833333 0.5549347 0.07878196 0.1659528   5
## 3  0.01334808   1.00 0.8506944 0.6969047 0.06585003 0.1321128   6
## 6  0.01334808   2.00 0.8547619 0.7030778 0.05874993 0.1192128   7
## 8  0.01334808   4.00 0.8554233 0.7063767 0.08107726 0.1663978  63
## 10 0.01334808   8.00 0.8630539 0.7212616 0.07937917 0.1629252 100
## 5  0.01334808  16.00 0.8833333 0.7640228 0.04662200 0.0925252   6
## 7  0.01334808  32.00 0.8833333 0.7640228 0.04662200 0.0925252   6
## 9  0.01334808  64.00 0.8833333 0.7640228 0.04662200 0.0925252   6
## 4  0.01334808 128.00 0.8833333 0.7640228 0.04662200 0.0925252   6
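The .B column helps explain the short run time. Assuming .B counts the resamples actually evaluated for each setting, the sketch below compares the work done with what a full 10-fold, 10-repeat resampling of all 10 settings would need. The values are copied from the output above.

```r
# .B values copied from svmFitad$results, in the printed order
B <- c(5, 5, 6, 7, 63, 100, 6, 6, 6, 6)

full <- length(B) * 10 * 10   # 10 settings x 10 folds x 10 repeats
used <- sum(B)                # resamples actually evaluated (assumed meaning of .B)

c(used = used, full = full, fraction = used / full)
```

Under that assumption, rows with a small .B rest on only a handful of resamples, so their accuracy is not directly comparable with the setting that received all 100.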

Now I want to explore the ability of caret to do parallel processing.

library(doMC)
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
# I have a new MB Pro with 8 cores.
registerDoMC(cores = 8) 
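The core count above is hard-coded for one machine. A hedged alternative is to ask the parallel package how many cores are available. In this sketch, leaving one core free is my own choice, not something doMC or caret requires, and the registerDoMC call is left commented out so the block runs without doMC installed.

```r
library(parallel)

n_cores <- detectCores()            # may return NA on some platforms
if (is.na(n_cores)) n_cores <- 1L
workers <- max(1L, n_cores - 1L)    # leave one core free (a choice, not a requirement)
workers

# With doMC installed, the registration would then be:
# doMC::registerDoMC(cores = workers)
```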

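Nothing has to change in the call to train. The relevant switch, allowParallel, is an argument of trainControl and defaults to TRUE, so registering the backend is enough; it is set explicitly here only to make that visible.

adControl <- trainControl(method = "adaptive_cv",
                           number = 10, repeats = 10,
                           allowParallel = TRUE)
set.seed(825)

tic()
svmFitad <- train(Class ~ ., data = training,
                method = "svmRadial", 
                trControl = adControl, 
                preProc = c("center", "scale"),
                metric = "Accuracy",
                tuneLength = 10
                )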
toc()
## 9.083 sec elapsed
svmFitad$bestTune
##         sigma C
## 10 0.01334808 8
str(svmFitad$results)
## 'data.frame':    10 obs. of  7 variables:
##  $ sigma     : num  0.0133 0.0133 0.0133 0.0133 0.0133 ...
##  $ C         : num  0.25 0.5 1 2 4 8 16 32 64 128
##  $ Accuracy  : num  0.758 0.783 0.851 0.855 0.855 ...
##  $ Kappa     : num  0.502 0.555 0.697 0.703 0.706 ...
##  $ AccuracySD: num  0.0867 0.0788 0.0659 0.0587 0.0811 ...
##  $ KappaSD   : num  0.175 0.166 0.132 0.119 0.166 ...
##  $ .B        : int  5 5 6 7 63 100 6 6 6 6
svmFitad$results
##         sigma      C  Accuracy     Kappa AccuracySD   KappaSD  .B
## 1  0.01334808   0.25 0.7583333 0.5023541 0.08665264 0.1749826   5
## 2  0.01334808   0.50 0.7833333 0.5549347 0.07878196 0.1659528   5
## 3  0.01334808   1.00 0.8506944 0.6969047 0.06585003 0.1321128   6
## 6  0.01334808   2.00 0.8547619 0.7030778 0.05874993 0.1192128   7
## 8  0.01334808   4.00 0.8554233 0.7063767 0.08107726 0.1663978  63
## 10 0.01334808   8.00 0.8630539 0.7212616 0.07937917 0.1629252 100
## 5  0.01334808  16.00 0.8833333 0.7640228 0.04662200 0.0925252   6
## 7  0.01334808  32.00 0.8833333 0.7640228 0.04662200 0.0925252   6
## 9  0.01334808  64.00 0.8833333 0.7640228 0.04662200 0.0925252   6
## 4  0.01334808 128.00 0.8833333 0.7640228 0.04662200 0.0925252   6
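To put the run times side by side, the sketch below collects the elapsed times reported by toc() above and divides each by the number of candidate settings. The numbers are copied from the output; the run labels are mine.

```r
timings <- data.frame(
  run      = c("default", "tuneLength = 4", "coarse grid",
               "fine grid", "adaptive", "adaptive + doMC"),
  settings = c(3, 4, 30, 32, 10, 10),
  seconds  = c(9.06, 10.599, 79.617, 87.784, 7.287, 9.083)
)

# Rough cost per candidate setting
timings$sec_per_setting <- round(timings$seconds / timings$settings, 2)
timings
```

On this small problem the run with the parallel backend registered was not faster than the sequential adaptive run (9.083 versus 7.287 seconds). Parallel overhead is one plausible explanation, but it was not measured here.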