Cats or dogs?

Consider the following dataset:

# (assumes the tidyverse and tidymodels packages have been loaded earlier)
image.tbl <- read.csv("Data/cats_dogs.csv") %>% mutate(type = factor(type))

which contains images of 99 cats and 99 dogs, each represented as a 64 by 64 grey-scale image. The first 4096 (\(64 \times 64\)) columns correspond to the pixels of the image, and the column type records whether the image is a cat or a dog.
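As a quick sanity check (a sketch; it assumes the CSV loaded as described above), we can confirm the dimensions and the class balance:

dim(image.tbl)          # 198 rows and 4097 columns (4096 pixels + type)
table(image.tbl$type)   # 99 cats and 99 dogs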

The function plotImage displays an image stored as a matrix. We can use it to plot the images in our dataset by selecting the appropriate row and pixel columns and converting them into a matrix (see the examples below).

plotImage <- function(dat, size = 64){
  # reshape the 4096 pixel values into a 64 x 64 matrix, filling by row
  imag <- matrix(dat, nrow = size, byrow = TRUE)
  image(imag, col = grey.colors(256))
}
# Plot the first image (a dog)
plotImage(as.matrix(image.tbl[1,1:4096]))

# Plot the 150th image (a cat)
plotImage(as.matrix(image.tbl[150,1:4096]))

Finally, let’s create training and testing datasets as well as 10-fold cross-validation folds. Make sure to use these datasets in the following subsections.

set.seed(654321)
image.split <- initial_split(image.tbl, prop = 0.6)
image.train.tbl <- training(image.split)
image.test.tbl <- testing(image.split)
images.folds <- vfold_cv(image.train.tbl, v = 10)
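
We can verify the sizes of the splits (a quick check; with prop = 0.6 on 198 images we expect roughly 118 training and 80 testing rows):

nrow(image.train.tbl)  # training rows
nrow(image.test.tbl)   # testing rows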

LASSO model

Describe what a LASSO model is and what the penalty term (\(\lambda\)) is. Create a LASSO model that distinguishes between cats and dogs, selecting an optimal penalty using 10-fold cross-validation. Establish the accuracy and confusion matrix of the model using your testing dataset. Show two incorrectly classified images and their probabilities of being a dog or a cat assigned by the model.

LASSO models are used for regularization. Instead of minimizing the residual sum of squares only, we minimize the following function:

\[RSS(\beta_0,\dots,\beta_p) + \lambda \sum\limits_{j=1}^p |\beta_j|\]

The penalty term \(\lambda \geq 0\) controls the strength of the regularization: the larger \(\lambda\), the more the coefficients are shrunk towards zero, and many are set to exactly zero, so LASSO also performs variable selection.

image.model <-
  logistic_reg(mixture = 1, penalty = tune()) %>%
  set_mode("classification") %>%
  set_engine("glmnet")

image.recipe <-
  recipe(formula = type ~ ., data = image.train.tbl) %>%
  step_normalize(all_predictors()) 

image.wf <- workflow() %>%
  add_recipe(image.recipe) %>%
  add_model(image.model)

# penalty() is tuned on a log10 scale, so this grid spans 10^-2 to 10^0
penalty.grid <- grid_regular(penalty(range = c(-2, 0)), levels = 20)
tune.res <- tune_grid(image.wf, resamples = images.folds, grid = penalty.grid)

show_best(tune.res, metric = "accuracy")
## # A tibble: 5 × 7
##   penalty .metric  .estimator  mean     n std_err .config              
##     <dbl> <chr>    <chr>      <dbl> <int>   <dbl> <chr>                
## 1  0.0207 accuracy binary     0.848    10  0.0166 Preprocessor1_Model04
## 2  0.0428 accuracy binary     0.840    10  0.0225 Preprocessor1_Model07
## 3  0.0162 accuracy binary     0.831    10  0.0246 Preprocessor1_Model03
## 4  0.0264 accuracy binary     0.831    10  0.0172 Preprocessor1_Model05
## 5  0.0336 accuracy binary     0.831    10  0.0172 Preprocessor1_Model06
# choose the largest penalty whose CV accuracy is within one standard error of the best
best.penalty <- select_by_one_std_err(tune.res, metric = "accuracy", desc(penalty))

image.final.wf <- finalize_workflow(image.wf, best.penalty)
image.final.fit <- fit(image.final.wf, data = image.train.tbl)
augment(image.final.fit, new_data = image.test.tbl) %>%
  conf_mat(truth = type, estimate = .pred_class)
##           Truth
## Prediction cat dog
##        cat  32  15
##        dog   5  28
augment(image.final.fit, new_data = image.test.tbl) %>%
  accuracy(truth = type, estimate = .pred_class )
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary          0.75
test.pred.tbl <- augment(image.final.fit, new_data = image.test.tbl) 

missed_class <- test.pred.tbl %>%
  filter((type == "dog" & .pred_class == "cat") | (type == "cat" & .pred_class == "dog"))

plotImage(as.matrix(missed_class[1,1:4096]))

plotImage(as.matrix(missed_class[2,1:4096]))

missed_class %>% select(.pred_dog, .pred_cat) %>% head(2)
## # A tibble: 2 × 2
##   .pred_dog .pred_cat
##       <dbl>     <dbl>
## 1     0.411     0.589
## 2     0.415     0.585

Both misclassified images are dogs that the model labels as cats. For each of them the predicted probability of being a dog is around 0.41, and the probability of being a cat around 0.59. This makes sense: both probabilities are close to 1/2, so these were borderline cases for the model.
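
As an aside (a sketch using tidymodels' extract_fit_parsnip() and broom's tidy(); the exercise doesn't ask for this), we can see the variable selection at work by counting how many of the 4096 pixel coefficients are non-zero at the tuned penalty:

library(broom)
image.final.fit %>%
  extract_fit_parsnip() %>%   # pull the underlying glmnet fit from the workflow
  tidy() %>%                  # coefficients at the tuned penalty value
  filter(estimate != 0) %>%
  nrow()                      # non-zero coefficients (including the intercept)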

Random forest

Describe what a random forest model is and what the parameter mtry refers to in the ranger implementation of random forests. Create a random forest model that distinguishes between cats and dogs, selecting an optimal value of mtry using 10-fold cross-validation. Establish the accuracy and confusion matrix of the model using your testing dataset. Show two incorrectly classified images and their probabilities of being a dog or a cat assigned by the model.

Random forests are ensembles of decision trees. To de-correlate the trees, each tree is grown on a bootstrap sample of the data and only a random subset of the predictors is considered as split candidates at each split, so the individual trees are less related to each other and their average has lower variance.

ranger is a fast implementation of random forests (bagging is the special case where all predictors are candidates), and mtry refers to the number of predictors randomly sampled as split candidates at each split of each tree.
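
For reference (an aside based on ranger's documented default, not part of the exercise), ranger's default mtry for classification is the square root of the number of predictors, rounded down:

floor(sqrt(4096))  # ranger's default mtry with 4096 pixel predictors: 64

The grid below instead tunes mtry over the range 1 to 40.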

forest.model <-
  rand_forest(mtry = tune()) %>%
  set_mode("classification") %>%
  set_engine("ranger")

# no normalization step: tree-based models don't need normalized predictors
forest.recipe <- recipe(formula = type ~ ., data = image.train.tbl) 

forest.wf <- workflow() %>%
  add_recipe(forest.recipe) %>%
  add_model(forest.model)  # mtry = tune() is already set in the model spec

mtry.grid <- grid_regular(mtry(range = c(1, 40)), levels = 10)
tune.res <-
  tune_grid(forest.wf, resamples = images.folds, grid = mtry.grid, metrics = metric_set(accuracy))

show_best(tune.res, metric = "accuracy")
## # A tibble: 5 × 7
##    mtry .metric  .estimator  mean     n std_err .config              
##   <int> <chr>    <chr>      <dbl> <int>   <dbl> <chr>                
## 1    18 accuracy binary     0.839    10  0.0233 Preprocessor1_Model05
## 2    27 accuracy binary     0.839    10  0.0292 Preprocessor1_Model07
## 3    31 accuracy binary     0.839    10  0.0292 Preprocessor1_Model08
## 4    35 accuracy binary     0.839    10  0.0233 Preprocessor1_Model09
## 5    40 accuracy binary     0.831    10  0.0246 Preprocessor1_Model10
best.mtry <- select_best(tune.res, metric = "accuracy")

forest.final.wf <- finalize_workflow(forest.wf, best.mtry)
forest.final.fit <- fit(forest.final.wf, data = image.train.tbl)

augment(forest.final.fit, image.test.tbl) %>%
  accuracy(truth = type, estimate = .pred_class)
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.825
augment(forest.final.fit, new_data = image.test.tbl) %>%
  conf_mat(truth = type, estimate = .pred_class)
##           Truth
## Prediction cat dog
##        cat  34  11
##        dog   3  32
forest.pred.tbl <- augment(forest.final.fit, new_data = image.test.tbl) 

forest_missed <- forest.pred.tbl %>%
  filter((type == "dog" & .pred_class == "cat") | (type == "cat" & .pred_class == "dog"))

plotImage(as.matrix(forest_missed[1,1:4096]))

plotImage(as.matrix(forest_missed[2,1:4096]))

forest_missed %>% select(.pred_dog, .pred_cat) %>% head(2)
## # A tibble: 2 × 2
##   .pred_dog .pred_cat
##       <dbl>     <dbl>
## 1     0.481     0.519
## 2     0.456     0.544

Again, both misclassified images are dogs misclassified as cats.

In the first image: prob(dog) = 0.481, prob(cat) = 0.519.

In the second image: prob(dog) = 0.456, prob(cat) = 0.544.

These probabilities are even closer to 1/2 than the LASSO ones, so the forest was nearly undecided on these two images.

Boosting

Describe in your own words what a boosting model is and what the learning rate is in a boosting model. Create a boosting model that distinguishes between cats and dogs, selecting an optimal learning rate using 10-fold cross-validation. Establish the accuracy and confusion matrix of the model. Show two incorrectly classified images and their probabilities of being a dog or a cat assigned by the model.

In boosting we fit “weak learners” repeatedly to build a suitable model: we build a large number of small, under-fitted models and obtain the final model by adding them together.

Suppose we have data \(X\), a response variable \(y\) and a learning rate \(\lambda > 0\). We build a sequence of weak learners (small trees) \(\hat f_b\), \(b = 1, \dots, B\), as follows: start with \(\hat f(x) = 0\) and residuals \(r = y\); at each step \(b\), fit a small tree \(\hat f_b\) to the current residuals, update the model \(\hat f(x) \leftarrow \hat f(x) + \lambda \hat f_b(x)\), and update the residuals \(r \leftarrow r - \lambda \hat f_b(x)\). The final model is \(\hat f(x) = \sum_{b=1}^{B} \lambda \hat f_b(x)\). The learning rate \(\lambda\) shrinks the contribution of each tree so the model learns slowly; smaller values of \(\lambda\) generally require more trees \(B\).
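
To make the role of \(\lambda\) concrete, here is a minimal hand-rolled sketch of this loop on a made-up regression toy problem (illustration only: the toy data and the choice of rpart trees are assumptions for the demo, not part of the exercise; the actual classification model below uses xgboost):

library(rpart)
set.seed(1)
toy <- data.frame(x = runif(200, 0, 10))
toy$y <- sin(toy$x) + rnorm(200, sd = 0.2)

lambda <- 0.1         # learning rate: shrinks each tree's contribution
toy$r <- toy$y        # residuals start out equal to the response
f.hat <- rep(0, nrow(toy))
for (b in 1:100) {
  tree <- rpart(r ~ x, data = toy, maxdepth = 2)  # a small, weak tree
  f.hat <- f.hat + lambda * predict(tree)         # add the shrunken tree
  toy$r <- toy$r - lambda * predict(tree)         # next tree targets what is left
}
mean((toy$y - f.hat)^2)  # training MSE after 100 boosted trees

With a smaller \(\lambda\) the same 100 trees would leave larger residuals, which is exactly the slow-learning trade-off the cross-validation below tunes.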

# learn_rate() is also tuned on a log10 scale: this grid spans 10^-2 to 10^0
(boost.grid <- grid_regular(learn_rate(range = c(-2, 0)), levels = 10))
## # A tibble: 10 × 1
##    learn_rate
##         <dbl>
##  1     0.01  
##  2     0.0167
##  3     0.0278
##  4     0.0464
##  5     0.0774
##  6     0.129 
##  7     0.215 
##  8     0.359 
##  9     0.599 
## 10     1
boost_recipe <- 
  recipe(formula = type ~ ., data = image.train.tbl) %>% 
  step_zv(all_predictors())  # drop pixels with zero variance in the training set

boost_spec <- 
  boost_tree(learn_rate = tune()) %>% 
  set_mode("classification") %>% 
  set_engine("xgboost") 

boost_workflow <- 
  workflow() %>% 
  add_recipe(boost_recipe) %>% 
  add_model(boost_spec)
boost_tune <- tune_grid(boost_workflow, resamples = images.folds, grid = boost.grid)

best.param <- select_best(boost_tune, metric = "accuracy")
boost.final.wf <- finalize_workflow(boost_workflow, best.param)
boost.final.fit <- fit(boost.final.wf, image.train.tbl)
## [03:46:32] WARNING: amalgamation/../src/learner.cc:1115: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
augment(boost.final.fit, image.test.tbl) %>%
  conf_mat(truth = type, estimate = .pred_class)
##           Truth
## Prediction cat dog
##        cat  29  15
##        dog   8  28
augment(boost.final.fit, image.test.tbl) %>%
  accuracy(truth = type, estimate = .pred_class)
## # A tibble: 1 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.712
boost.pred.tbl <- augment(boost.final.fit, new_data = image.test.tbl) 

boost_missed <- boost.pred.tbl %>%
  filter((type == "dog" & .pred_class == "cat") | (type == "cat" & .pred_class == "dog"))

plotImage(as.matrix(boost_missed[1,1:4096]))

plotImage(as.matrix(boost_missed[2,1:4096]))

boost_missed %>% select(.pred_dog, .pred_cat) %>% head(2)
## # A tibble: 2 × 2
##   .pred_dog .pred_cat
##       <dbl>     <dbl>
## 1    0.216      0.784
## 2    0.0714     0.929

These are both dogs misclassified as cats.

In the first image: prob(dog) = 0.216, prob(cat) = 0.784.

In the second image: prob(dog) = 0.0714, prob(cat) = 0.929.

Unlike the LASSO and random forest errors, these probabilities are far from 1/2: the boosting model was confidently wrong about these two images, especially the second one.