Notes Nov. 25

Hyperparameter Tuning

I’ll take an example from Kuhn’s documentation on github. This uses a support vector machine with a radial kernel to distinguish between mines and rocks using sonar data. The particulars of the example are unimportant. What I’m interested in is the significance of the way that hyperparameter tuning is specified. In this example, there are two hyperparameters - C and sigma.

It is always a good idea to look at the dataframe of results which is one of the elements in the list produced by train.

library(mlbench)
library(kernlab)
data(Sonar)
library(caret)

## Loading required package: lattice

## Loading required package: ggplot2

## 
## Attaching package: 'ggplot2'

## The following object is masked from 'package:kernlab':
## 
##     alpha

library(tictoc)
set.seed(998)
inTraining <- createDataPartition(Sonar$Class, p = .75, list = FALSE)
training <- Sonar[ inTraining,]
testing  <- Sonar[-inTraining,]


svmControl <- trainControl(method = "repeatedcv",
number = 10, 
repeats = 10,
classProbs = TRUE)

tic()
set.seed(825)
svmFit <- train(Class ~ ., data = training,
                method = "svmRadial", 
                trControl = svmControl, 
                preProc = c("center", "scale"),
                metric = "Accuracy"
                )
toc()

## 9.762 sec elapsed

str(svmFit$results)

## 'data.frame':    3 obs. of  6 variables:
##  $ sigma     : num  0.0118 0.0118 0.0118
##  $ C         : num  0.25 0.5 1
##  $ Accuracy  : num  0.737 0.773 0.787
##  $ Kappa     : num  0.471 0.543 0.571
##  $ AccuracySD: num  0.115 0.11 0.104
##  $ KappaSD   : num  0.231 0.221 0.208

svmFit$results

##        sigma    C  Accuracy     Kappa AccuracySD   KappaSD
## 1 0.01181293 0.25 0.7365466 0.4710495  0.1149270 0.2307519
## 2 0.01181293 0.50 0.7732328 0.5426548  0.1098938 0.2214216
## 3 0.01181293 1.00 0.7865343 0.5712726  0.1036692 0.2084029

Looking at the dataframe of results, we can see that only one of the two hyperparameters is being varied in tuning.

Without specifying anything about tuning, the model used one value of sigma and three values of C. The highest value of C performed best, which is concerning because even higher values of C may have been superior. Let’s rerun this example with a tuneLength of 4.

tic()
svmFit4 <- train(Class ~ ., data = training,
                method = "svmRadial", 
                trControl = svmControl, 
                preProc = c("center", "scale"),
                metric = "Accuracy",
                tuneLength = 4
                )
toc()

## 11.952 sec elapsed

str(svmFit4$results)

## 'data.frame':    4 obs. of  6 variables:
##  $ sigma     : num  0.0136 0.0136 0.0136 0.0136
##  $ C         : num  0.25 0.5 1 2
##  $ Accuracy  : num  0.737 0.769 0.793 0.812
##  $ Kappa     : num  0.471 0.535 0.583 0.621
##  $ AccuracySD: num  0.105 0.115 0.102 0.102
##  $ KappaSD   : num  0.213 0.233 0.206 0.206

svmFit4$results

##        sigma    C  Accuracy     Kappa AccuracySD   KappaSD
## 1 0.01360214 0.25 0.7366740 0.4708012  0.1054291 0.2126107
## 2 0.01360214 0.50 0.7694020 0.5347933  0.1153425 0.2329351
## 3 0.01360214 1.00 0.7925343 0.5830024  0.1024355 0.2055742
## 4 0.01360214 2.00 0.8117279 0.6205051  0.1021609 0.2061736

Let’s do an explicit grid search and look at the results.

svmControl <- trainControl(method = "repeatedcv",
number = 10, 
repeats = 10,
classProbs = TRUE,
search = "grid")

myGrid = expand.grid(sigma = c(.0133*.5,.0133, .0133*2),
                     C = .25*2^(0:9)) 
set.seed(825)

tic()
svmFitGrid <- train(Class ~ ., data = training,
                method = "svmRadial", 
                trControl = svmControl, 
                preProc = c("center", "scale"),
                metric = "Accuracy",
                tuneGrid = myGrid
                )
toc()

## 87.248 sec elapsed

svmFitGrid$bestTune

##     sigma  C
## 17 0.0133 16

str(svmFitGrid$results)

## 'data.frame':    30 obs. of  6 variables:
##  $ sigma     : num  0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 0.00665 ...
##  $ C         : num  0.25 0.5 1 2 4 8 16 32 64 128 ...
##  $ Accuracy  : num  0.711 0.768 0.769 0.787 0.813 ...
##  $ Kappa     : num  0.423 0.532 0.536 0.573 0.626 ...
##  $ AccuracySD: num  0.1113 0.1048 0.1058 0.0947 0.0905 ...
##  $ KappaSD   : num  0.223 0.212 0.212 0.19 0.181 ...

svmFitGrid$results

##      sigma      C  Accuracy     Kappa AccuracySD   KappaSD
## 1  0.00665   0.25 0.7109069 0.4230469 0.11133899 0.2232574
## 2  0.00665   0.50 0.7680245 0.5322562 0.10475012 0.2117049
## 3  0.00665   1.00 0.7689730 0.5359981 0.10583579 0.2122770
## 4  0.00665   2.00 0.7870074 0.5728198 0.09472244 0.1896162
## 5  0.00665   4.00 0.8134583 0.6256687 0.09051800 0.1810306
## 6  0.00665   8.00 0.8370809 0.6722456 0.08215304 0.1644265
## 7  0.00665  16.00 0.8409142 0.6791553 0.08183504 0.1643256
## 8  0.00665  32.00 0.8410392 0.6793486 0.08129754 0.1633458
## 9  0.00665  64.00 0.8403775 0.6777328 0.08009262 0.1617632
## 10 0.00665 128.00 0.8398309 0.6773044 0.08423262 0.1689661
## 11 0.01330   0.25 0.7432598 0.4836304 0.11180238 0.2257788
## 12 0.01330   0.50 0.7794461 0.5553285 0.10813333 0.2175859
## 13 0.01330   1.00 0.7897059 0.5779083 0.09907841 0.1985945
## 14 0.01330   2.00 0.8242230 0.6462055 0.09074958 0.1818238
## 15 0.01330   4.00 0.8404044 0.6785791 0.08195699 0.1643656
## 16 0.01330   8.00 0.8511324 0.6997508 0.07810380 0.1569551
## 17 0.01330  16.00 0.8541422 0.7056982 0.08144626 0.1636939
## 18 0.01330  32.00 0.8480025 0.6935477 0.07913595 0.1589081
## 19 0.01330  64.00 0.8531176 0.7035801 0.08008044 0.1610547
## 20 0.01330 128.00 0.8517157 0.7008115 0.08102466 0.1626427
## 21 0.02660   0.25 0.7551127 0.5098603 0.11684780 0.2349180
## 22 0.02660   0.50 0.7988873 0.5939222 0.09515513 0.1924806
## 23 0.02660   1.00 0.8232034 0.6426913 0.08795617 0.1772456
## 24 0.02660   2.00 0.8295515 0.6558810 0.08479592 0.1699307
## 25 0.02660   4.00 0.8468946 0.6905084 0.08735424 0.1756529
## 26 0.02660   8.00 0.8466544 0.6900878 0.08488057 0.1705708
## 27 0.02660  16.00 0.8391863 0.6755100 0.08304815 0.1665383
## 28 0.02660  32.00 0.8442598 0.6852795 0.08393297 0.1686732
## 29 0.02660  64.00 0.8379681 0.6723321 0.08646423 0.1740743
## 30 0.02660 128.00 0.8403113 0.6776502 0.08646401 0.1733022

What we might think we know now is that the best set of hyperparameters is at or above .0266 and somewhere between 1 and 4 for C. This would be true with only one independent variable, but we have two.

That leads me to do the following grid search.

tic()
myGrid = expand.grid(sigma = seq(.0266,.0276,length = 4),
C= seq(1,4,length=8))
set.seed(825)

svmFitGrid <- train(Class ~ ., data = training,
                method = "svmRadial", 
                trControl = svmControl, 
                preProc = c("center", "scale"),
                metric = "Accuracy",
                tuneGrid = myGrid
                )
toc()

## 94.855 sec elapsed

svmFitGrid$bestTune

##     sigma        C
## 31 0.0276 3.571429

str(svmFitGrid$results)

## 'data.frame':    32 obs. of  6 variables:
##  $ sigma     : num  0.0266 0.0266 0.0266 0.0266 0.0266 ...
##  $ C         : num  1 1.43 1.86 2.29 2.71 ...
##  $ Accuracy  : num  0.826 0.831 0.833 0.838 0.84 ...
##  $ Kappa     : num  0.649 0.659 0.663 0.674 0.677 ...
##  $ AccuracySD: num  0.09 0.0876 0.0866 0.0882 0.0888 ...
##  $ KappaSD   : num  0.182 0.176 0.174 0.176 0.179 ...

svmFitGrid$results

##         sigma        C  Accuracy     Kappa AccuracySD   KappaSD
## 1  0.02660000 1.000000 0.8263799 0.6490607 0.09003923 0.1816266
## 2  0.02660000 1.428571 0.8308799 0.6587729 0.08756742 0.1755309
## 3  0.02660000 1.857143 0.8334265 0.6633080 0.08661472 0.1743122
## 4  0.02660000 2.285714 0.8384363 0.6736964 0.08816600 0.1764460
## 5  0.02660000 2.714286 0.8403480 0.6772519 0.08876830 0.1785556
## 6  0.02660000 3.142857 0.8404314 0.6773341 0.08692482 0.1748227
## 7  0.02660000 3.571429 0.8367966 0.6704482 0.08653823 0.1736352
## 8  0.02660000 4.000000 0.8424314 0.6816488 0.08400518 0.1687550
## 9  0.02693333 1.000000 0.8208554 0.6381496 0.09220819 0.1850470
## 10 0.02693333 1.428571 0.8303750 0.6573057 0.08644883 0.1739429
## 11 0.02693333 1.857143 0.8316299 0.6597775 0.08308980 0.1670940
## 12 0.02693333 2.285714 0.8380417 0.6730233 0.08621315 0.1728872
## 13 0.02693333 2.714286 0.8423848 0.6817877 0.08262121 0.1658771
## 14 0.02693333 3.142857 0.8404314 0.6774919 0.08605374 0.1721686
## 15 0.02693333 3.571429 0.8443113 0.6856300 0.08407102 0.1686504
## 16 0.02693333 4.000000 0.8412132 0.6790216 0.08942519 0.1797663
## 17 0.02726667 1.000000 0.8269632 0.6496067 0.08815440 0.1785573
## 18 0.02726667 1.428571 0.8284583 0.6532718 0.08718159 0.1757422
## 19 0.02726667 1.857143 0.8267402 0.6503130 0.08480416 0.1701740
## 20 0.02726667 2.285714 0.8321299 0.6609488 0.08842804 0.1769413
## 21 0.02726667 2.714286 0.8419167 0.6804144 0.08520826 0.1711493
## 22 0.02726667 3.142857 0.8442279 0.6853908 0.08522277 0.1709135
## 23 0.02726667 3.571429 0.8397598 0.6763245 0.08698414 0.1745599
## 24 0.02726667 4.000000 0.8408578 0.6787131 0.08639852 0.1726964
## 25 0.02760000 1.000000 0.8252917 0.6469534 0.08478907 0.1705061
## 26 0.02760000 1.428571 0.8333480 0.6636296 0.08413660 0.1690398
## 27 0.02760000 1.857143 0.8283431 0.6537053 0.08731862 0.1749327
## 28 0.02760000 2.285714 0.8364828 0.6703201 0.08833746 0.1767534
## 29 0.02760000 2.714286 0.8366348 0.6697244 0.08436073 0.1693820
## 30 0.02760000 3.142857 0.8399265 0.6767220 0.08610751 0.1726456
## 31 0.02760000 3.571429 0.8443529 0.6855205 0.08669758 0.1739750
## 32 0.02760000 4.000000 0.8441863 0.6849635 0.08282918 0.1662078

Rather than doing a manual search, we can do a systematic search using the capabilities of caret.

tic()
adControl <- trainControl(method = "adaptive_cv",
                           number = 10, repeats = 10)
set.seed(825)

svmFitad <- train(Class ~ ., data = training,
                method = "svmRadial", 
                trControl = adControl, 
                preProc = c("center", "scale"),
                metric = "Accuracy",
                tuneLength = 10
                )
toc()

## 5.124 sec elapsed

svmFitad$bestTune

##         sigma C
## 10 0.01181293 8

str(svmFitad$results)

## 'data.frame':    10 obs. of  7 variables:
##  $ sigma     : num  0.0118 0.0118 0.0118 0.0118 0.0118 ...
##  $ C         : num  0.25 0.5 1 2 4 8 16 32 64 128
##  $ Accuracy  : num  0.718 0.756 0.757 0.782 0.838 ...
##  $ Kappa     : num  0.415 0.499 0.502 0.558 0.675 ...
##  $ AccuracySD: num  0.0689 0.0706 0.0509 0.0559 0.0815 ...
##  $ KappaSD   : num  0.154 0.155 0.114 0.114 0.161 ...
##  $ .B        : int  5 5 5 5 32 100 6 6 6 6

svmFitad$results

##         sigma      C  Accuracy     Kappa AccuracySD   KappaSD  .B
## 1  0.01181293   0.25 0.7183333 0.4146668 0.06888245 0.1544630   5
## 2  0.01181293   0.50 0.7558333 0.4991819 0.07056321 0.1553820   5
## 3  0.01181293   1.00 0.7566667 0.5019550 0.05091182 0.1142971   5
## 6  0.01181293   2.00 0.7825000 0.5583937 0.05593275 0.1141370   5
## 8  0.01181293   4.00 0.8383502 0.6745518 0.08152624 0.1614193  32
## 10 0.01181293   8.00 0.8508824 0.6985281 0.07880221 0.1593479 100
## 5  0.01181293  16.00 0.8722222 0.7384152 0.08052202 0.1656383   6
## 7  0.01181293  32.00 0.8722222 0.7384152 0.08052202 0.1656383   6
## 9  0.01181293  64.00 0.8722222 0.7384152 0.08052202 0.1656383   6
## 4  0.01181293 128.00 0.8722222 0.7384152 0.08052202 0.1656383   6

Now I want to explore the ability of caret to do parallel processing.

library(doMC)

## Loading required package: foreach

## Loading required package: iterators

## Loading required package: parallel

# I have a new MB Pro with 8 cores.
registerDoMC(cores = 8)

We need to insert a line in the call to train.

adControl <- trainControl(method = "adaptive_cv",
                           number = 10, repeats = 10)
set.seed(825)

tic()
svmFitad <- train(Class ~ ., data = training,
                method = "svmRadial", 
                trControl = adControl, 
                preProc = c("center", "scale"),
                metric = "Accuracy",
                tuneLength = 10,
                allowParallel=TRUE
                )
toc()

## 5.533 sec elapsed

svmFitad$bestTune

##         sigma C
## 10 0.01181293 8

str(svmFitad$results)

## 'data.frame':    10 obs. of  7 variables:
##  $ sigma     : num  0.0118 0.0118 0.0118 0.0118 0.0118 ...
##  $ C         : num  0.25 0.5 1 2 4 8 16 32 64 128
##  $ Accuracy  : num  0.718 0.756 0.757 0.782 0.838 ...
##  $ Kappa     : num  0.415 0.499 0.502 0.558 0.675 ...
##  $ AccuracySD: num  0.0689 0.0706 0.0509 0.0559 0.0815 ...
##  $ KappaSD   : num  0.154 0.155 0.114 0.114 0.161 ...
##  $ .B        : int  5 5 5 5 32 100 6 6 6 6

svmFitad$results

##         sigma      C  Accuracy     Kappa AccuracySD   KappaSD  .B
## 1  0.01181293   0.25 0.7183333 0.4146668 0.06888245 0.1544630   5
## 2  0.01181293   0.50 0.7558333 0.4991819 0.07056321 0.1553820   5
## 3  0.01181293   1.00 0.7566667 0.5019550 0.05091182 0.1142971   5
## 6  0.01181293   2.00 0.7825000 0.5583937 0.05593275 0.1141370   5
## 8  0.01181293   4.00 0.8383502 0.6745518 0.08152624 0.1614193  32
## 10 0.01181293   8.00 0.8508824 0.6985281 0.07880221 0.1593479 100
## 5  0.01181293  16.00 0.8722222 0.7384152 0.08052202 0.1656383   6
## 7  0.01181293  32.00 0.8722222 0.7384152 0.08052202 0.1656383   6
## 9  0.01181293  64.00 0.8722222 0.7384152 0.08052202 0.1656383   6
## 4  0.01181293 128.00 0.8722222 0.7384152 0.08052202 0.1656383   6

Notes Nov. 25

Harold Nelson

11/25/2019

Hyperparameter Tuning