Tural Sadigov
Hamilton College
Artwork by @allison_horst
Data are available by CC-0 license in accordance with the Palmer Station LTER Data Policy and the LTER Data Access Policy for Type I data.
# A tibble: 5 × 6
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
# A tibble: 3 × 6
species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g year
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Adelie 38.8 18.3 190. 3706. 2008.
2 Chinstrap 48.8 18.4 196. 3733. 2008.
3 Gentoo 47.6 15.0 217. 5092. 2008.
Let's see support vector machines in action by predicting the sex of a penguin.
Drop rows with missing values, remove island, and load tidymodels
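# Drop rows with missing values and remove the island column
penguins_df <-
  penguins %>%
  drop_na() %>%
  select(-island)
library(tidymodels)
# Fix the seed so the random sample of 10 rows is reproducible
set.seed(123)
penguins_df %>%
  sample_n(10)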
# A tibble: 10 × 6
species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
<fct> <dbl> <dbl> <int> <int> <fct>
1 Gentoo 59.6 17 230 6050 male
2 Adelie 34.4 18.4 184 3325 female
3 Gentoo 45.2 15.8 215 5300 male
4 Chinstrap 49 19.5 210 3950 male
5 Adelie 41.4 18.5 202 3875 male
6 Chinstrap 51 18.8 203 4100 male
7 Gentoo 44.9 13.8 212 4750 female
8 Gentoo 51.1 16.5 225 5250 male
9 Chinstrap 50.8 19 210 4100 male
10 Gentoo 45.4 14.6 211 4800 female
# Stratified 70/30 train/test split, balanced on sex
set.seed(2022)
penguin_split <- initial_split(penguins_df,
                               strata = sex,
                               prop = .70)
penguin_train <- training(penguin_split)
penguin_test <- testing(penguin_split)
# 5-fold cross-validation on the training set
set.seed(1234)
penguin_cv <- vfold_cv(penguin_train, v = 5)
penguin_cv
# 5-fold cross-validation
# A tibble: 5 × 2
splits id
<list> <chr>
1 <split [185/47]> Fold1
2 <split [185/47]> Fold2
3 <split [186/46]> Fold3
4 <split [186/46]> Fold4
5 <split [186/46]> Fold5
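The slides do not show the code that turns these folds into the cross-validated metrics that follow. Here is a minimal sketch of one way to do it. Several things are assumptions, not the author's confirmed setup: the radial-basis kernel with default hyperparameters, the kernlab engine, the recipe steps, the names svm_spec, svm_rec, svm_wf and svm_cv_fit, and loading penguins from modeldata.

```r
library(tidymodels)

# Rebuild the data objects from the slides so the sketch runs on its own.
# ASSUMPTION: penguins is the modeldata version (no year column).
data(penguins, package = "modeldata")
penguins_df <- penguins %>%
  drop_na() %>%
  select(-island)

set.seed(2022)
penguin_split <- initial_split(penguins_df, strata = sex, prop = .70)
penguin_train <- training(penguin_split)

set.seed(1234)
penguin_cv <- vfold_cv(penguin_train, v = 5)

# ASSUMPTION: RBF-kernel SVM with default hyperparameters (needs kernlab).
svm_spec <- svm_rbf() %>%
  set_engine("kernlab") %>%
  set_mode("classification")

# ASSUMPTION: dummy-encode species, then center and scale all predictors.
svm_rec <- recipe(sex ~ ., data = penguin_train) %>%
  step_dummy(all_nominal_predictors()) %>%
  step_normalize(all_numeric_predictors())

# Bundle preprocessing and model so both are re-estimated inside each fold.
svm_wf <- workflow() %>%
  add_recipe(svm_rec) %>%
  add_model(svm_spec)

# Fit on each analysis set and evaluate on the matching assessment set.
svm_cv_fit <- fit_resamples(svm_wf, resamples = penguin_cv)

# Average the per-fold metrics (accuracy and roc_auc by default).
collect_metrics(svm_cv_fit)
```

Because the kernel and preprocessing here are guesses, the numbers this sketch produces need not match the metrics reported in the slides.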
# A tibble: 2 × 6
.metric .estimator mean n std_err .config
<chr> <chr> <dbl> <int> <dbl> <chr>
1 accuracy binary 0.914 5 0.0155 Preprocessor1_Model1
2 roc_auc binary 0.972 5 0.00644 Preprocessor1_Model1
# A tibble: 101 × 6
id .pred_female .pred_male .row .pred_class sex
<chr> <dbl> <dbl> <int> <fct> <fct>
1 train/test split 0.990 0.00990 14 female female
2 train/test split 0.784 0.216 16 female female
3 train/test split 0.700 0.300 18 female female
4 train/test split 0.210 0.790 19 male male
5 train/test split 0.00461 0.995 25 male male
6 train/test split 0.0285 0.972 31 male male
7 train/test split 0.0696 0.930 33 male female
8 train/test split 0.996 0.00401 40 female female
9 train/test split 0.0293 0.971 44 male male
10 train/test split 0.00938 0.991 48 male male
# … with 91 more rows
# ℹ Use `print(n = ...)` to see more rows
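The held-out predictions above have the shape returned by collect_predictions() on a last_fit() result. The sketch below shows how an object like penguin_final might be created. The workflow is an assumption (an RBF-kernel SVM via kernlab with a simple recipe), as are the name svm_wf and the use of the modeldata copy of penguins; the slides do not show this step.

```r
library(tidymodels)

# Rebuild the split from the slides so the sketch runs on its own.
# ASSUMPTION: penguins is the modeldata version (no year column).
data(penguins, package = "modeldata")
penguins_df <- penguins %>%
  drop_na() %>%
  select(-island)

set.seed(2022)
penguin_split <- initial_split(penguins_df, strata = sex, prop = .70)

# ASSUMPTION: hypothetical SVM workflow; kernel and recipe steps are guesses.
svm_wf <- workflow() %>%
  add_recipe(
    recipe(sex ~ ., data = training(penguin_split)) %>%
      step_dummy(all_nominal_predictors()) %>%
      step_normalize(all_numeric_predictors())
  ) %>%
  add_model(
    svm_rbf() %>%
      set_engine("kernlab") %>%
      set_mode("classification")
  )

# last_fit() trains once on the full training set and
# evaluates once on the test set defined by the split.
penguin_final <- last_fit(svm_wf, split = penguin_split)

# One row per test-set penguin, with class probabilities,
# the predicted class, and the true sex.
collect_predictions(penguin_final)
```

The test set is touched only once here, after model choices were made with cross-validation, so the resulting estimate is not biased by model selection.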
Manually calculate accuracy
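# Confusion matrix of true sex against predicted class on the test set
M <- collect_predictions(penguin_final) %>%
  conf_mat(sex, .pred_class)
# Accuracy = correct predictions (diagonal) / number of test rows
(M$table[1,1] + M$table[2,2])/nrow(penguin_test)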
[1] 0.9108911
Collect accuracy
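As a small check that the manual calculation and a collected metric agree, this sketch computes accuracy both ways on a hypothetical six-row toy tibble. The toy data and the names toy, manual_acc and collected_acc are made up for illustration; they are not the penguin results. For a last_fit() object such as penguin_final, collect_metrics(penguin_final) would return the test-set accuracy directly.

```r
library(yardstick)
library(tibble)

# Hypothetical toy predictions: 5 of 6 are correct.
lv <- c("female", "male")
toy <- tibble(
  sex         = factor(c("female", "female", "female", "male", "male", "male"), levels = lv),
  .pred_class = factor(c("female", "female", "male",   "male", "male", "male"), levels = lv)
)

# Manual route: diagonal of the confusion matrix over the total count.
M <- conf_mat(toy, truth = sex, estimate = .pred_class)
manual_acc <- sum(diag(M$table)) / sum(M$table)

# Collected route: let yardstick compute the same metric.
collected_acc <- accuracy(toy, truth = sex, estimate = .pred_class)$.estimate

# Both routes should give the same value (5/6 for this toy data).
all.equal(manual_acc, collected_acc)
```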