Grid search
Introduction
Before we explain what grid search is all about, we should understand hyperparameters. So what are hyperparameters?
Hyperparameters
Hyperparameters are model-specific parameters: they are set for the model itself rather than learned from the data. For example:

* The learning rate for training a neural network.
* The C and sigma hyperparameters for support vector machines.
* The k in k-nearest neighbors.
All of these are hyperparameters. They usually cannot be derived analytically, and they interact with each other in ways that are hard to predict.
So how do we find good values for them?
The process of finding optimal hyperparameter values is called tuning or optimization.
If we consider tuning a single hyperparameter, we can easily solve the problem with a loop, testing each candidate value against a measure. For example, we can optimize the number of trees (hyperparameter) in a GBM (model) to find which value gives the highest accuracy (measure), as in the sketch below.
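A minimal sketch of such a loop in R. To keep it runnable end to end it tunes k in k-nearest neighbors (one of the hyperparameters listed above) on a simple holdout split instead of GBM trees; the data set, seed, split size, and candidate values are all assumptions:

```r
# A minimal sketch of single-hyperparameter tuning with a plain loop.
library(class)                       # provides knn()
data(iris)

set.seed(1)                          # assumed seed, for reproducibility
idx     <- sample(nrow(iris), 100)   # assumed 100-sample training split
train_x <- iris[idx, 1:4];  train_y <- iris$Species[idx]
test_x  <- iris[-idx, 1:4]; test_y  <- iris$Species[-idx]

# the "grid" for one hyperparameter is just a vector of candidate values
accs <- sapply(1:15, function(k) {
  pred <- knn(train_x, test_x, train_y, k = k)
  mean(pred == test_y)               # accuracy is the measure we optimize
})
best_k <- which.max(accs)            # candidate with the highest accuracy
```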
For two or more hyperparameters we can use nested loops, in the same manner as for a single hyperparameter. Because the candidate values now form a grid of combinations, this approach is called grid search; see the sketch below.
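Extending the sketch to two hyperparameters, here is a hedged grid search over C and sigma of an RBF-kernel SVM (kernlab's ksvm), reusing the split from the previous sketch; the 3x3 grid of candidate values is an assumption:

```r
library(kernlab)                      # ksvm(): SVM with an RBF kernel

best <- list(acc = 0)
for (C in c(0.1, 1, 10)) {            # outer loop: first hyperparameter
  for (sigma in c(0.01, 0.1, 1)) {    # inner loop: second hyperparameter
    fit <- ksvm(as.matrix(train_x), train_y, kernel = "rbfdot",
                C = C, kpar = list(sigma = sigma))
    acc <- mean(predict(fit, as.matrix(test_x)) == test_y)
    if (acc > best$acc) best <- list(C = C, sigma = sigma, acc = acc)
  }
}
best                                  # best combination found on the grid
```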
Random grid search
Grid search evaluates every combination on a fixed grid, and the number of combinations grows exponentially with the number of hyperparameters. Random search instead samples a fixed number of combinations at random from the hyperparameter space, as in the sketch below.
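A minimal sketch of random search over the same two SVM hyperparameters; the number of draws and the log-uniform sampling ranges are assumptions:

```r
set.seed(2)                                        # assumed seed
n <- 9                                             # same budget as the 3x3 grid above
combos <- data.frame(C     = 10^runif(n, -1, 1),   # log-uniform in [0.1, 10]
                     sigma = 10^runif(n, -2, 0))   # log-uniform in [0.01, 1]

combos$acc <- sapply(seq_len(n), function(i) {
  fit <- ksvm(as.matrix(train_x), train_y, kernel = "rbfdot",
              C = combos$C[i], kpar = list(sigma = combos$sigma[i]))
  mean(predict(fit, as.matrix(test_x)) == test_y)
})
combos[which.max(combos$acc), ]                    # best randomly sampled combination
```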
Example
We will use a Regularized Discriminant Analysis (RDA) model to perform classification on the Sonar data set, letting the search choose RDA's gamma and lambda hyperparameters. The output below comes from R's caret package.
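A sketch of how output like the one below can be produced with caret's built-in random search (search = "random" in trainControl); the seed and the 75% training split are assumptions, and method = "rda" additionally requires the klaR package:

```r
library(caret)
library(mlbench)                     # provides the Sonar data set
data(Sonar)

set.seed(998)                        # assumed seed
in_train <- createDataPartition(Sonar$Class, p = 0.75, list = FALSE)
training <- Sonar[in_train, ]        # 157 of the 208 samples

ctrl <- trainControl(method          = "repeatedcv",    # 3-fold CV, 1 repeat
                     number          = 3,
                     repeats         = 1,
                     classProbs      = TRUE,            # needed for ROC
                     summaryFunction = twoClassSummary, # ROC, Sens, Spec
                     search          = "random")        # random combinations

rda_fit <- train(Class ~ ., data = training,
                 method     = "rda",    # Regularized Discriminant Analysis
                 metric     = "ROC",    # select the model with the largest ROC
                 tuneLength = 30,       # 30 random gamma/lambda pairs
                 trControl  = ctrl)
rda_fit
```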
```
Regularized Discriminant Analysis

157 samples
 60 predictor
  2 classes: 'M', 'R'

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 104, 105, 105
Resampling results across tuning parameters:

  gamma       lambda      ROC        Sens       Spec
  0.06719392  0.15537778  0.8321032  0.7857143  0.8072222
  0.23238068  0.99856887  0.8158333  0.7619048  0.6977778
  0.24555126  0.90358907  0.8350000  0.7976190  0.7255556
  0.24574399  0.24541119  0.8381548  0.8095238  0.7394444
  0.29550294  0.84604358  0.8394246  0.8095238  0.7394444
  0.30900994  0.94205611  0.8325397  0.7976190  0.7250000
  0.33918666  0.11809853  0.8376389  0.8095238  0.6988889
  0.34049204  0.93036798  0.8354960  0.8095238  0.7388889
  0.36486559  0.30040117  0.8500000  0.8333333  0.7255556
  0.40455354  0.06296552  0.8401587  0.8095238  0.6988889
  0.41344444  0.16550610  0.8465675  0.8214286  0.6988889
  0.41752491  0.43418958  0.8538690  0.8333333  0.7250000
  0.45026345  0.91790703  0.8418452  0.7976190  0.7527778
  0.47891791  0.77314461  0.8500992  0.8095238  0.7533333
  0.50422382  0.20511296  0.8510119  0.8333333  0.6988889
  [ reached getOption("max.print") -- omitted 15 rows ]

ROC was used to select the optimal model using the largest value.
The final values used for the model were gamma = 0.637717 and lambda = 0.3650505.
```
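The winning combination reported above is also stored on the fitted object:

```r
rda_fit$bestTune    # data frame with the selected gamma and lambda
```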
Demonstration
Let's check how the search sampled the hyperparameter combinations, using the plot below.
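caret ships a ggplot method for train objects, so a minimal sketch of that check is:

```r
library(ggplot2)
# plot the resampled ROC for each randomly sampled gamma/lambda pair
ggplot(rda_fit) + theme(legend.position = "top")
```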
Which one is better?
In their paper "Random Search for Hyper-Parameter Optimization", Bergstra and Bengio claim that a random search of the parameter space is usually more effective than grid search: for most data sets only a few hyperparameters really matter, and random search tries more distinct values of each one instead of spending the budget on a fixed grid.