The goal of this tutorial is to learn how to use two tuning arguments of the train() function from the caret package: tuneGrid and tuneLength.
# First we load the libraries
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
# Then we load the dataset
data("iris")
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# We split the data into a 75% training set and a 25% test set
set.seed(123)
my_index <- createDataPartition(iris$Sepal.Length, p = 0.75, list = FALSE)
trainSet <- iris[my_index, ]
testSet <- iris[-my_index, ]
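# Optional sanity check (a minimal sketch): confirm the sizes of the split;
# with this seed we keep 115 rows for training and 35 for testing
nrow(trainSet)
## [1] 115
nrow(testSet)
## [1] 35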
# The tuneLength argument tells train() how many candidate values of the main
# tuning parameter (k, for knn) to evaluate; caret chooses the values itself
set.seed(123)
my_knn_model <- train(Species ~ .,
                      method = "knn",
                      data = trainSet,
                      tuneLength = 4)
# In this case caret evaluates 4 default values of k
my_knn_model
## k-Nearest Neighbors
##
## 115 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 115, 115, 115, 115, 115, 115, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 5 0.9601991 0.9398849
## 7 0.9662143 0.9489685
## 9 0.9600470 0.9397324
## 11 0.9534206 0.9297923
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 7.
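# A quick optional visualization: caret's train objects have a plot method
# that shows the resampled accuracy profile across the candidate values of k
plot(my_knn_model)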
set.seed(123)
my_knn_model <- train(Species ~ .,
                      method = "knn",
                      data = trainSet,
                      tuneLength = 7)
# In this case caret evaluates 7 default values of k
my_knn_model
## k-Nearest Neighbors
##
## 115 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 115, 115, 115, 115, 115, 115, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 5 0.9572333 0.9354362
## 7 0.9652049 0.9474815
## 9 0.9592137 0.9384799
## 11 0.9523950 0.9282180
## 13 0.9536369 0.9301112
## 15 0.9576745 0.9362621
## 17 0.9555304 0.9331583
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 7.
# We see that both runs select the same value, k = 7, as confirmed below
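# The chosen value is stored in the bestTune element of the fitted object, so
# we can read it programmatically instead of scanning the printed summary
my_knn_model$bestTune
##   k
## 1 7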
# The tuneGrid argument, by contrast, lets us specify exactly which values the
# main parameter will take, while tuneLength only sets how many default values to try
set.seed(123)
my_knn_model <- train(Species ~ .,
                      method = "knn",
                      data = trainSet,
                      tuneGrid = expand.grid(k = c(5, 11, 21, 25)))
# In this case we test k = 5, 11, 21 and 25
my_knn_model
## k-Nearest Neighbors
##
## 115 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 115, 115, 115, 115, 115, 115, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 5 0.9601991 0.9398849
## 11 0.9542797 0.9310758
## 21 0.9515068 0.9270186
## 25 0.9440841 0.9160004
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 5.
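# Note that tuneGrid accepts any data frame of candidate values; for example,
# a regular sequence can be built with seq (shown here without fitting a model)
expand.grid(k = seq(5, 13, by = 4))
##    k
## 1  5
## 2  9
## 3 13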
set.seed(123)
my_knn_model <- train(Species ~ .,
                      method = "knn",
                      data = trainSet,
                      tuneGrid = expand.grid(k = c(9, 13, 19)))
# In this case we test k = 9, 13 and 19
my_knn_model
## k-Nearest Neighbors
##
## 115 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 115, 115, 115, 115, 115, 115, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 9 0.9590714 0.9382785
## 13 0.9558370 0.9334525
## 19 0.9519232 0.9276993
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 9.
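# Finally, a minimal sketch of how the held-out testSet could be used to
# estimate out-of-sample performance of the last fitted model; predict() and
# confusionMatrix() are both provided by caret
my_preds <- predict(my_knn_model, newdata = testSet)
confusionMatrix(my_preds, testSet$Species)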
In this tutorial we have learned how to use the tuneLength and tuneGrid arguments when training a model with caret. Remember that only one of them should be supplied in a given call to train().