Grid search
Introduction
Before we explain what grid search is all about, we should understand hyperparameters. So what are hyperparameters?
Hyperparameters
Hyperparameters are model-specific parameters: they are set for the model itself rather than learned from the data. For example:

* The learning rate for training a neural network.
* The C and sigma hyperparameters for support vector machines.
* The k in k-nearest neighbors.
All of these are hyperparameters. They usually cannot be derived analytically, and they interact with each other in ways that are hard to predict.
So how do we find good values for them?
The process of finding optimal hyperparameter values is called tuning or optimization.
If we consider tuning a single hyperparameter, we can easily solve the problem with a loop, testing each candidate value against a measure. For example, we can optimize the number of trees (hyperparameter) in a GBM (model) to find which value gives the highest accuracy (measure), as in the sketch below.
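A minimal sketch of such a loop in R. To keep it runnable end to end it tunes k in k-nearest neighbors (one of the hyperparameters listed above) on a simple holdout split instead of GBM trees; the data set, seed, split size, and candidate values are all assumptions:

```r
# A minimal sketch of single-hyperparameter tuning with a plain loop.
library(class)                       # provides knn()
data(iris)

set.seed(1)                          # assumed seed, for reproducibility
idx     <- sample(nrow(iris), 100)   # assumed 100-sample training split
train_x <- iris[idx, 1:4];  train_y <- iris$Species[idx]
test_x  <- iris[-idx, 1:4]; test_y  <- iris$Species[-idx]

# the "grid" for one hyperparameter is just a vector of candidate values
accs <- sapply(1:15, function(k) {
  pred <- knn(train_x, test_x, train_y, k = k)
  mean(pred == test_y)               # accuracy is the measure we optimize
})
best_k <- which.max(accs)            # candidate with the highest accuracy
```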
For two or more hyperparameters we can use nested loops, in the same manner as for a single hyperparameter. Because the candidate values now form a grid of combinations, this approach is called grid search; see the sketch below.
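Extending the sketch to two hyperparameters, here is a hedged grid search over C and sigma of an RBF-kernel SVM (kernlab's ksvm), reusing the split from the previous sketch; the 3x3 grid of candidate values is an assumption:

```r
library(kernlab)                      # ksvm(): SVM with an RBF kernel

best <- list(acc = 0)
for (C in c(0.1, 1, 10)) {            # outer loop: first hyperparameter
  for (sigma in c(0.01, 0.1, 1)) {    # inner loop: second hyperparameter
    fit <- ksvm(as.matrix(train_x), train_y, kernel = "rbfdot",
                C = C, kpar = list(sigma = sigma))
    acc <- mean(predict(fit, as.matrix(test_x)) == test_y)
    if (acc > best$acc) best <- list(C = C, sigma = sigma, acc = acc)
  }
}
best                                  # best combination found on the grid
```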
Random grid search
Grid search evaluates every combination on a fixed grid, and the number of combinations grows exponentially with the number of hyperparameters. Random search instead samples a fixed number of combinations at random from the hyperparameter space, as in the sketch below.
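A minimal sketch of random search over the same two SVM hyperparameters; the number of draws and the log-uniform sampling ranges are assumptions:

```r
set.seed(2)                                        # assumed seed
n <- 9                                             # same budget as the 3x3 grid above
combos <- data.frame(C     = 10^runif(n, -1, 1),   # log-uniform in [0.1, 10]
                     sigma = 10^runif(n, -2, 0))   # log-uniform in [0.01, 1]

combos$acc <- sapply(seq_len(n), function(i) {
  fit <- ksvm(as.matrix(train_x), train_y, kernel = "rbfdot",
              C = combos$C[i], kpar = list(sigma = combos$sigma[i]))
  mean(predict(fit, as.matrix(test_x)) == test_y)
})
combos[which.max(combos$acc), ]                    # best randomly sampled combination
```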
Example
We will use a Regularized Discriminant Analysis (RDA) model to perform classification on the Sonar data set, letting the search choose RDA's gamma and lambda hyperparameters. The output below comes from R's caret package.
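A sketch of how output like the one below can be produced with caret's built-in random search (search = "random" in trainControl); the seed and the 75% training split are assumptions, and method = "rda" additionally requires the klaR package:

```r
library(caret)
library(mlbench)                     # provides the Sonar data set
data(Sonar)

set.seed(998)                        # assumed seed
in_train <- createDataPartition(Sonar$Class, p = 0.75, list = FALSE)
training <- Sonar[in_train, ]        # 157 of the 208 samples

ctrl <- trainControl(method          = "repeatedcv",    # 3-fold CV, 1 repeat
                     number          = 3,
                     repeats         = 1,
                     classProbs      = TRUE,            # needed for ROC
                     summaryFunction = twoClassSummary, # ROC, Sens, Spec
                     search          = "random")        # random combinations

rda_fit <- train(Class ~ ., data = training,
                 method     = "rda",    # Regularized Discriminant Analysis
                 metric     = "ROC",    # select the model with the largest ROC
                 tuneLength = 30,       # 30 random gamma/lambda pairs
                 trControl  = ctrl)
rda_fit
```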
```
Regularized Discriminant Analysis

157 samples
 60 predictor
  2 classes: 'M', 'R'

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 104, 105, 105
Resampling results across tuning parameters:

  gamma       lambda      ROC        Sens       Spec
  0.06719392  0.15537778  0.8321032  0.7857143  0.8072222
  0.23238068  0.99856887  0.8158333  0.7619048  0.6977778
  0.24555126  0.90358907  0.8350000  0.7976190  0.7255556
  0.24574399  0.24541119  0.8381548  0.8095238  0.7394444
  0.29550294  0.84604358  0.8394246  0.8095238  0.7394444
  0.30900994  0.94205611  0.8325397  0.7976190  0.7250000
  0.33918666  0.11809853  0.8376389  0.8095238  0.6988889
  0.34049204  0.93036798  0.8354960  0.8095238  0.7388889
  0.36486559  0.30040117  0.8500000  0.8333333  0.7255556
  0.40455354  0.06296552  0.8401587  0.8095238  0.6988889
  0.41344444  0.16550610  0.8465675  0.8214286  0.6988889
  0.41752491  0.43418958  0.8538690  0.8333333  0.7250000
  0.45026345  0.91790703  0.8418452  0.7976190  0.7527778
  0.47891791  0.77314461  0.8500992  0.8095238  0.7533333
  0.50422382  0.20511296  0.8510119  0.8333333  0.6988889
  [ reached getOption("max.print") -- omitted 15 rows ]

ROC was used to select the optimal model using the largest value.
The final values used for the model were gamma = 0.637717 and lambda = 0.3650505.
```
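The winning combination reported above is also stored on the fitted object:

```r
rda_fit$bestTune    # data frame with the selected gamma and lambda
```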
Demonstration
Let's check how the search sampled the hyperparameter combinations, using the plot below.
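caret ships a ggplot method for train objects, so a minimal sketch of that check is:

```r
library(ggplot2)
# plot the resampled ROC for each randomly sampled gamma/lambda pair
ggplot(rda_fit) + theme(legend.position = "top")
```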
Which one is better?
In their paper "Random Search for Hyper-Parameter Optimization", Bergstra and Bengio claim that a random search of the parameter space is usually more effective than grid search: for most data sets only a few hyperparameters really matter, and random search tries more distinct values of each one instead of spending the budget on a fixed grid.