Tuning Models with Tune Grid in Tidymodels

Tidymodels provides a powerful framework for building and tuning machine learning models in R. The tune package, a core component of tidymodels, allows you to systematically search for the best combination of hyperparameters for your model using techniques like grid search. This tutorial will guide you through using tune_grid to optimize your model.

1. Installation and Loading

First, install and load the necessary packages. Note that ranger (the engine we’ll use for the random forest) and palmerpenguins (the example data) are not installed with tidymodels itself:

install.packages(c("tidymodels", "tune", "workflows", "recipes", "dplyr", "ggplot2", "ranger", "palmerpenguins"))

library(tidymodels)

2. Data Preparation

We’ll use the penguins dataset from the palmerpenguins package for this tutorial.

data(penguins, package = "palmerpenguins")
penguins <- penguins %>%
  drop_na()

# Split the data into training and testing sets
set.seed(123)
penguins_split <- initial_split(penguins, prop = 0.8, strata = species)
penguins_train <- training(penguins_split)
penguins_test <- testing(penguins_split)
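
Because we stratified the split by species, the class proportions should be similar in the training and test sets. A quick check (a sketch using dplyr, which is loaded with tidymodels) looks like this:

# Compare class proportions across the stratified split
penguins_train %>% count(species) %>% mutate(prop = n / sum(n))
penguins_test %>% count(species) %>% mutate(prop = n / sum(n))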

3. Feature Engineering with Recipes

We’ll create a recipe to preprocess the data.

penguins_recipe <- recipe(species ~ ., data = penguins_train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors())
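
The recipe itself doesn’t compute anything until it is prepped. If you want to peek at what the preprocessing will produce, one option (purely for inspection; the workflow handles this automatically during tuning) is to prep() the recipe and bake() it on the training data:

# Inspect the preprocessed training data
penguins_recipe %>%
  prep() %>%
  bake(new_data = NULL) %>%
  glimpse()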

4. Model Specification

Let’s use a random forest model as an example.

rf_model <- rand_forest(
  mtry = tune(),
  trees = 500,
  min_n = tune()
) %>%
  set_mode("classification") %>%
  set_engine("ranger")

Note that mtry and min_n are set to tune(), indicating that we want to optimize these hyperparameters.
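
To double-check which parameters are flagged for tuning, you can inspect the parameter set attached to the specification. Assuming a reasonably recent tidymodels installation that exports extract_parameter_set_dials(), this looks like:

# List the hyperparameters marked with tune()
rf_model %>% extract_parameter_set_dials()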

5. Workflow Creation

We’ll combine the recipe and model into a workflow.

penguins_workflow <- workflow() %>%
  add_recipe(penguins_recipe) %>%
  add_model(rf_model)

6. Tuning Grid Setup

We’ll create a regular grid of hyperparameter combinations to evaluate. With 5 levels per parameter, this produces 25 candidate combinations.

penguins_grid <- grid_regular(
  mtry(range = c(1, 6)),
  min_n(range = c(2, 10)),
  levels = 5
)

penguins_grid

7. Resampling Strategy

We’ll use 5-fold cross-validation, stratified by species, to evaluate the model’s performance.

penguins_folds <- vfold_cv(penguins_train, v = 5, strata = species)

8. Tuning the Model with tune_grid

Now, we’ll use tune_grid to evaluate the model with the specified hyperparameter grid and resampling strategy.

tune_results <- penguins_workflow %>%
  tune_grid(
    resamples = penguins_folds,
    grid = penguins_grid,
    metrics = metric_set(accuracy, roc_auc, precision, recall)
  )

9. Analyzing Tuning Results

We can visualize the tuning results using autoplot().

autoplot(tune_results)
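
Alongside the plot, collect_metrics() returns the resampling estimates as a tibble, with one averaged row per hyperparameter combination and metric:

# Averaged performance for each grid point and metric
tune_results %>%
  collect_metrics()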

We can also extract the best-performing hyperparameters using select_best().

best_params <- tune_results %>%
  select_best(metric = "roc_auc")

best_params
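
select_best() returns a single combination. If you’d rather compare the top candidates before committing to one, show_best() ranks them by the chosen metric:

# Top 5 hyperparameter combinations ranked by ROC AUC
tune_results %>%
  show_best(metric = "roc_auc", n = 5)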

10. Finalizing the Workflow

We’ll finalize the workflow using the best hyperparameters.

final_workflow <- penguins_workflow %>%
  finalize_workflow(best_params)

11. Fitting the Final Model

We’ll fit the final model with last_fit(), which trains the finalized workflow on the full training set and evaluates it on the held-out test set in one step.

final_fit <- final_workflow %>%
  last_fit(penguins_split)

12. Evaluating the Final Model

We can evaluate the final model’s performance on the test data.

final_fit %>%
  collect_metrics()

We can also collect the test-set predictions and examine the confusion matrix.

final_fit %>%
  collect_predictions() %>%
  conf_mat(truth = species, estimate = .pred_class)
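
If you later need predictions on genuinely new data, you can pull the trained workflow out of the last_fit() result. A minimal sketch, reusing a few test rows as stand-ins for new observations:

# Extract the fitted workflow and predict on new observations
final_model <- extract_workflow(final_fit)
predict(final_model, new_data = penguins_test %>% slice(1:5))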

Key Advantages of tune_grid:

- It evaluates every candidate in the grid across all resamples, giving a systematic and reproducible comparison of hyperparameter combinations.
- Because it operates on a workflow, recipe steps are re-estimated within each resample, helping to avoid information leakage.
- It works with any rsample resampling scheme and any yardstick metric set.
- Its results plug directly into helpers such as collect_metrics(), show_best(), select_best(), and autoplot().

This tutorial provides a basic introduction to tune_grid. Explore the tidymodels documentation for more advanced features and examples, such as custom tuning grids, parallel processing, and more complex model tuning scenarios.
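
As one example of those advanced features, tune_grid() can evaluate resamples in parallel once a backend is registered. The sketch below uses doParallel; exact backend support varies by tune version (newer releases also support the future package), so treat it as a starting point:

# Register a parallel backend so resamples are evaluated concurrently
library(doParallel)

cl <- parallel::makePSOCKcluster(parallel::detectCores(logical = FALSE) - 1)
registerDoParallel(cl)

tune_results_parallel <- penguins_workflow %>%
  tune_grid(
    resamples = penguins_folds,
    grid = penguins_grid,
    metrics = metric_set(accuracy, roc_auc)
  )

parallel::stopCluster(cl)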