The goal of this tutorial is to learn how to use two tuning arguments of the train() function from the caret package: tuneGrid and tuneLength.
# First we load the libraries
library(caret)
## Loading required package: lattice
## Loading required package: ggplot2
# Then we load the dataset
data("iris")
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# We split the data into a 75% training set and a 25% test set
set.seed(123)
my_index <- createDataPartition(iris$Sepal.Length, p = 0.75, list = FALSE)
trainSet <- iris[my_index, ]
testSet <- iris[-my_index, ]
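# Optional sanity check (a minimal sketch): confirm the sizes of the split;
# with this seed we keep 115 rows for training and 35 for testing
nrow(trainSet)
## [1] 115
nrow(testSet)
## [1] 35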
# The tuneLength argument tells train() how many candidate values of the main
# tuning parameter (k, for knn) to evaluate; caret chooses the values itself
set.seed(123)
my_knn_model <- train(Species ~ .,
                      method = "knn",
                      data = trainSet,
                      tuneLength = 4)
# In this case caret evaluates 4 default values of k
my_knn_model
## k-Nearest Neighbors
##
## 115 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 115, 115, 115, 115, 115, 115, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 5 0.9601991 0.9398849
## 7 0.9662143 0.9489685
## 9 0.9600470 0.9397324
## 11 0.9534206 0.9297923
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 7.
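# A quick optional visualization: caret's train objects have a plot method
# that shows the resampled accuracy profile across the candidate values of k
plot(my_knn_model)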
set.seed(123)
my_knn_model <- train(Species ~ .,
                      method = "knn",
                      data = trainSet,
                      tuneLength = 7)
# In this case caret evaluates 7 default values of k
my_knn_model
## k-Nearest Neighbors
##
## 115 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 115, 115, 115, 115, 115, 115, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 5 0.9572333 0.9354362
## 7 0.9652049 0.9474815
## 9 0.9592137 0.9384799
## 11 0.9523950 0.9282180
## 13 0.9536369 0.9301112
## 15 0.9576745 0.9362621
## 17 0.9555304 0.9331583
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 7.
# We see that both runs select the same value, k = 7, as confirmed below
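# The chosen value is stored in the bestTune element of the fitted object, so
# we can read it programmatically instead of scanning the printed summary
my_knn_model$bestTune
##   k
## 1 7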
# The tuneGrid argument, by contrast, lets us specify exactly which values the
# main parameter will take, while tuneLength only sets how many default values to try
set.seed(123)
my_knn_model <- train(Species ~ .,
                      method = "knn",
                      data = trainSet,
                      tuneGrid = expand.grid(k = c(5, 11, 21, 25)))
# In this case we test k = 5, 11, 21 and 25
my_knn_model
## k-Nearest Neighbors
##
## 115 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 115, 115, 115, 115, 115, 115, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 5 0.9601991 0.9398849
## 11 0.9542797 0.9310758
## 21 0.9515068 0.9270186
## 25 0.9440841 0.9160004
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 5.
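# Note that tuneGrid accepts any data frame of candidate values; for example,
# a regular sequence can be built with seq (shown here without fitting a model)
expand.grid(k = seq(5, 13, by = 4))
##    k
## 1  5
## 2  9
## 3 13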
set.seed(123)
my_knn_model <- train(Species ~ .,
                      method = "knn",
                      data = trainSet,
                      tuneGrid = expand.grid(k = c(9, 13, 19)))
# In this case we test k = 9, 13 and 19
my_knn_model
## k-Nearest Neighbors
##
## 115 samples
## 4 predictor
## 3 classes: 'setosa', 'versicolor', 'virginica'
##
## No pre-processing
## Resampling: Bootstrapped (25 reps)
## Summary of sample sizes: 115, 115, 115, 115, 115, 115, ...
## Resampling results across tuning parameters:
##
## k Accuracy Kappa
## 9 0.9590714 0.9382785
## 13 0.9558370 0.9334525
## 19 0.9519232 0.9276993
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was k = 9.
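# Finally, a minimal sketch of how the held-out testSet could be used to
# estimate out-of-sample performance of the last fitted model; predict() and
# confusionMatrix() are both provided by caret
my_preds <- predict(my_knn_model, newdata = testSet)
confusionMatrix(my_preds, testSet$Species)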
In this tutorial we have learned how to use the tuneLength and tuneGrid arguments when training a model with caret. Remember that only one of them should be supplied in a given call to train().