When we left off the first case study, we saw that our model was reasonably accurate. We can and will do better in terms of accuracy. But the accuracy value we found also raises a broader question: how good was our model at making predictions?
While accuracy is easy to understand and interpret, it provides only a limited answer to that question.
In this learning lab and case study, we explore several ways to understand how good a predictive model ours is. We do this through a variety of means:
Other statistics, such as sensitivity and specificity
Tables–particularly, a confusion matrix
Our driving question for this case study, then, is: How good is our model at making predictions?
We’ll use the Open University Learning Analytics Dataset (OULAD) again–this time, adding another data source, one on students’ performance on assessments.
Like in the first learning lab, we’ll first load several packages –
{tidyverse}, {tidymodels}, and {janitor}. Make sure these are installed
(via install.packages()). Note that if they’re already
installed, you don’t need to install them again.
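If needed, the following one-time command installs all three at once (shown for reference; skip it if the packages are already installed):
# run once per machine, only if the packages are not yet installed
install.packages(c("tidyverse", "tidymodels", "janitor"))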
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidymodels)
## ── Attaching packages ────────────────────────────────────── tidymodels 1.2.0 ──
## ✔ broom 1.0.5 ✔ rsample 1.2.1
## ✔ dials 1.2.1 ✔ tune 1.2.0
## ✔ infer 1.0.7 ✔ workflows 1.1.4
## ✔ modeldata 1.3.0 ✔ workflowsets 1.1.0
## ✔ parsnip 1.2.1 ✔ yardstick 1.3.1
## ✔ recipes 1.0.10
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ scales::discard() masks purrr::discard()
## ✖ dplyr::filter() masks stats::filter()
## ✖ recipes::fixed() masks stringr::fixed()
## ✖ dplyr::lag() masks stats::lag()
## ✖ yardstick::spec() masks readr::spec()
## ✖ recipes::step() masks stats::step()
## • Learn how to get started at https://www.tidymodels.org/start/
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(yardstick) # already attached by {tidymodels}; loading it again is harmless
library(dplyr) # already attached by {tidyverse}; loading it again is harmless
Like in the code-along for the overview presentation, let’s take a look at the data and do some processing of it.
We’ll load the students file together; you’ll write code to read the
assessments file, which is named “oulad-assessments.csv”. Please assign
the name assessments to the loaded assessments file.
assessments <- read.csv("data/oulad-assessments.csv")
In the last lab, we used the count() function. Let’s
practice that again by counting the number of assessments of
each type. If you need a refresher, recall that the data dictionary is here. Note what
the different types of assessments mean.
assessments %>%
count(assessment_type) # number of each assessment type
## assessment_type n
## 1 CMA 70527
## 2 Exam 4959
## 3 TMA 98426
We’ll now use another function from the {tidyverse}, like count():
the distinct() function. This returns the unique (or distinct) values
of a specified variable. Learn more about distinct() here.
Below, find the distinct assessment IDs.
assessments %>%
distinct(id_assessment)
## id_assessment
## 1 1752
## 2 1753
## 3 1754
## 4 1755
## 5 1756
## 6 1758
## 7 1759
## 8 1760
## 9 1761
## 10 1762
## 11 14984
## 12 14985
## 13 14986
## 14 14987
## 15 14988
## 16 14989
## 17 14991
## 18 14992
## 19 14993
## 20 14994
## 21 14995
## 22 14996
## 23 14997
## 24 14998
## 25 14999
## 26 15000
## 27 15001
## 28 15003
## 29 15004
## 30 15005
## 31 15006
## 32 15007
## 33 15008
## 34 15009
## 35 15010
## 36 15011
## 37 15012
## 38 15013
## 39 15015
## 40 15016
## 41 15017
## 42 15018
## 43 15019
## 44 15020
## 45 15021
## 46 15022
## 47 15023
## 48 15024
## 49 24282
## 50 24283
## 51 24284
## 52 24285
## 53 24286
## 54 24287
## 55 24288
## 56 24289
## 57 24290
## 58 24291
## 59 24292
## 60 24293
## 61 24294
## 62 24295
## 63 24296
## 64 24297
## 65 24298
## 66 24299
## 67 25334
## 68 25335
## 69 25336
## 70 25337
## 71 25338
## 72 25339
## 73 25340
## 74 25341
## 75 25342
## 76 25343
## 77 25344
## 78 25345
## 79 25346
## 80 25347
## 81 25348
## 82 25349
## 83 25350
## 84 25351
## 85 25352
## 86 25353
## 87 25354
## 88 25355
## 89 25356
## 90 25357
## 91 25358
## 92 25359
## 93 25360
## 94 25361
## 95 25362
## 96 25363
## 97 25364
## 98 25365
## 99 25366
## 100 25367
## 101 25368
## 102 30709
## 103 30710
## 104 30711
## 105 30712
## 106 30714
## 107 30715
## 108 30716
## 109 30717
## 110 30719
## 111 30720
## 112 30721
## 113 30722
## 114 34860
## 115 34861
## 116 34862
## 117 34863
## 118 34864
## 119 34865
## 120 34866
## 121 34867
## 122 34868
## 123 34869
## 124 34870
## 125 34871
## 126 34873
## 127 34874
## 128 34875
## 129 34876
## 130 34877
## 131 34878
## 132 34879
## 133 34880
## 134 34881
## 135 34882
## 136 34883
## 137 34884
## 138 34886
## 139 34887
## 140 34888
## 141 34889
## 142 34890
## 143 34891
## 144 34892
## 145 34893
## 146 34894
## 147 34895
## 148 34896
## 149 34897
## 150 34899
## 151 34900
## 152 34901
## 153 34902
## 154 34903
## 155 34904
## 156 34905
## 157 34906
## 158 34907
## 159 34908
## 160 34909
## 161 34910
## 162 37415
## 163 37416
## 164 37417
## 165 37418
## 166 37419
## 167 37420
## 168 37421
## 169 37422
## 170 37423
## 171 37425
## 172 37426
## 173 37427
## 174 37428
## 175 37429
## 176 37430
## 177 37431
## 178 37432
## 179 37433
## 180 37435
## 181 37436
## 182 37437
## 183 37438
## 184 37439
## 185 37440
## 186 37441
## 187 37442
## 188 37443
Let’s explore the assessments data a bit.
We might be interested in how many assessments there are per course.
To calculate that, we need to count() several variables at
once; when doing this, count() tabulates the number of
unique combinations of the variables passed to it.
assessments %>%
count(assessment_type, code_module, code_presentation)
## assessment_type code_module code_presentation n
## 1 CMA BBB 2013B 5049
## 2 CMA BBB 2013J 6416
## 3 CMA BBB 2014B 4493
## 4 CMA CCC 2014B 3920
## 5 CMA CCC 2014J 5846
## 6 CMA DDD 2013B 5252
## 7 CMA FFF 2013B 6681
## 8 CMA FFF 2013J 8847
## 9 CMA FFF 2014B 5549
## 10 CMA FFF 2014J 8915
## 11 CMA GGG 2013J 3749
## 12 CMA GGG 2014B 3063
## 13 CMA GGG 2014J 2747
## 14 Exam CCC 2014B 747
## 15 Exam CCC 2014J 1168
## 16 Exam DDD 2013B 602
## 17 Exam DDD 2013J 968
## 18 Exam DDD 2014B 524
## 19 Exam DDD 2014J 950
## 20 TMA AAA 2013J 1633
## 21 TMA AAA 2014J 1516
## 22 TMA BBB 2013B 6207
## 23 TMA BBB 2013J 7959
## 24 TMA BBB 2014B 5500
## 25 TMA BBB 2014J 7408
## 26 TMA CCC 2014B 2822
## 27 TMA CCC 2014J 4437
## 28 TMA DDD 2013B 4519
## 29 TMA DDD 2013J 6968
## 30 TMA DDD 2014B 4018
## 31 TMA DDD 2014J 7063
## 32 TMA EEE 2013J 2884
## 33 TMA EEE 2014B 1780
## 34 TMA EEE 2014J 3229
## 35 TMA FFF 2013B 5514
## 36 TMA FFF 2013J 7393
## 37 TMA FFF 2014B 4647
## 38 TMA FFF 2014J 7269
## 39 TMA GGG 2013J 2201
## 40 TMA GGG 2014B 1833
## 41 TMA GGG 2014J 1626
Let’s explore the dates assignments were due a bit – using the
summarize() function:
assessments %>%
summarize(mean_date = mean(date, na.rm = TRUE), # find the mean date for assignments
median_date = median(date, na.rm = TRUE), # find the median
sd_date = sd(date, na.rm = TRUE), # find the sd
min_date = min(date, na.rm = TRUE), # find the min
max_date = max(date, na.rm = TRUE)) # find the max
## mean_date median_date sd_date min_date max_date
## 1 130.6056 129 78.02517 12 261
What can we take from this? It looks like the average (mean and median) date assignments were due was around day 130 – that is, about 130 days after the start of the course. The first assignment seems to have been due 12 days into the course, and the last 261 days after.
Crucially, though, these dates vary by course. Thus, we need to first
group the data by course. Let’s use a different function this time –
quantile() – and calculate the first quartile (the 25th percentile). Our
reasoning is that we want to be able to act to support students, and if
we wait until after the average assignment is due, then it might be
too late. The first quartile, by contrast, comes approximately one-quarter
of the way through the semester – leaving more time to intervene.
assessments %>%
group_by(code_module, code_presentation) %>% # first, group by course (module: course; presentation: semester)
summarize(mean_date = mean(date, na.rm = TRUE),
median_date = median(date, na.rm = TRUE),
sd_date = sd(date, na.rm = TRUE),
min_date = min(date, na.rm = TRUE),
max_date = max(date, na.rm = TRUE),
first_quantile = quantile(date, probs = .25, na.rm = TRUE)) # find the first quartile (25th percentile)
## `summarise()` has grouped output by 'code_module'. You can override using the
## `.groups` argument.
## # A tibble: 22 × 8
## # Groups: code_module [7]
## code_module code_presentation mean_date median_date sd_date min_date max_date
## <chr> <chr> <dbl> <dbl> <dbl> <int> <int>
## 1 AAA 2013J 109. 117 71.3 19 215
## 2 AAA 2014J 109. 117 71.5 19 215
## 3 BBB 2013B 104. 89 55.5 19 187
## 4 BBB 2013J 112. 96 61.6 19 208
## 5 BBB 2014B 98.9 82 58.6 12 194
## 6 BBB 2014J 99.1 110 65.2 19 201
## 7 CCC 2014B 98.4 102 68.0 18 207
## 8 CCC 2014J 104. 109 70.8 18 214
## 9 DDD 2013B 104. 81 66.0 23 240
## 10 DDD 2013J 118. 88 77.9 25 261
## # ℹ 12 more rows
## # ℹ 1 more variable: first_quantile <dbl>
Alright, this is a bit complicated, but we can actually work with
this data. Let’s use just a portion of the above code, assigning it the
name code_module_dates.
code_module_dates <- assessments %>%
group_by(code_module, code_presentation) %>%
summarize(quantile_cutoff_date = quantile(date, probs = .25, na.rm = TRUE))
## `summarise()` has grouped output by 'code_module'. You can override using the
## `.groups` argument.
Let’s take a look at what we just created; type
code_module_dates below:
code_module_dates
## # A tibble: 22 × 3
## # Groups: code_module [7]
## code_module code_presentation quantile_cutoff_date
## <chr> <chr> <dbl>
## 1 AAA 2013J 54
## 2 AAA 2014J 54
## 3 BBB 2013B 54
## 4 BBB 2013J 54
## 5 BBB 2014B 47
## 6 BBB 2014J 54
## 7 CCC 2014B 32
## 8 CCC 2014J 32
## 9 DDD 2013B 51
## 10 DDD 2013J 53
## # ℹ 12 more rows
What have we created? We found the date that is one-quarter of the way through the semester (in terms of the dates assignments are due).
We can thus use this to group and calculate students’ performance on
assignments through this point. To do this, we need to use a join
function – left_join(), in particular. Please use
left_join() to join together assessments and
code_module_dates, with assessments as the
“left” data frame and code_module_dates as the right. Assign
the output of the join the name assessments_joined.
assessments_joined <- assessments %>%
  left_join(code_module_dates,
            by = c("code_module", "code_presentation")) # add each course's cutoff date to every assessment record
We’re almost there! The next few lines filter the assessments data so it only includes assessments before our cutoff date.
assessments_filtered <- assessments_joined %>%
filter(date < quantile_cutoff_date) # filter the data so only assignments before the cutoff date are included
Finally, we’ll find the average score for each student.
assessments_summarized <- assessments_filtered %>%
mutate(weighted_score = score * weight) %>% # create a new variable that accounts for the "weight" (comparable to points) given each assignment
group_by(id_student) %>%
summarize(mean_weighted_score = mean(weighted_score))
As a point of reflection here, note how much work we’ve done before we’ve gotten to the machine learning steps. Though a challenge, this is typical for both machine learning and more traditional statistical models: a lot of the work is in the preparation and data wrangling stages.
Let’s copy, below, the code we used to process the students data
(creating the pass variable and recoding the imd_band
variable).
students <- read_csv("data/oulad-students.csv")
## Rows: 32593 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (9): code_module, code_presentation, gender, region, highest_education, ...
## dbl (6): id_student, num_of_prev_attempts, studied_credits, module_presentat...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
students <- students %>%
mutate(pass = ifelse(final_result == "Pass", 1, 0)) %>% # creates a dummy code
mutate(pass = as.factor(pass)) # makes the variable a factor, helping later steps
students <- students %>%
mutate(imd_band = factor(imd_band, levels = c("0-10%",
"10-20%",
"20-30%",
"30-40%",
"40-50%",
"50-60%",
"60-70%",
"70-80%",
"80-90%",
"90-100%"))) %>% # this creates a factor with ordered levels
mutate(imd_band = as.integer(imd_band)) # this changes the levels into integers based on the order of the factor levels
Finally, let’s join together students and
assessments_summarized, assigning the joined data the name
students_and_assessments.
# note: this is an inner join; a left join would keep students whose id_student does not appear in assessments_summarized (producing missing values for mean_weighted_score)
students_and_assessments <- merge(x = students, y = assessments_summarized, by="id_student")
Big picture, we’ve added another measure – another feature – that we can use to make predictions: students’ performance on assessments 1/4 of the way through the course.
We’re now ready to proceed to our machine learning steps!
The problem we will be working on - predicting students who pass, based on data from roughly the first quarter of the semester - has an analog in a recent paper by Ryan Baker and colleagues:
In Baker et al. (2020), the authors create a precision-recall graph (recall is also known as sensitivity) - one that demonstrates the trade-off between optimizing these two statistics. Review their paper - especially the results section - to see how they discuss these two statistics.
Baker, R. S., Berning, A. W., Gowda, S. M., Zhang, S., & Hawn, A. (2020). Predicting K-12 dropout. Journal of Education for Students Placed at Risk, 25(1), 28-54.
Please review this paper before proceeding, focusing on how they describe the trade-off between precision and recall.
Next, we’ll split the data into training and testing sets. This is
identical to what we did in the first learning lab, but now using the
students_and_assessments data. We’ll use the testing data set later.
set.seed(20230712)
students_and_assessments <- students_and_assessments %>%
drop_na(mean_weighted_score)
train_test_split <- initial_split(students_and_assessments, prop = .50, strata = "pass")
data_train <- training(train_test_split)
data_test <- testing(train_test_split)
We’ll also specify the recipe, adding the
mean_weighted_score variable we calculated above as well as
variables we used in LL1 (case study and badge). Note how we dummy code
three variables using step_dummy() (described further in the
first learning lab).
my_rec <- recipe(pass ~ disability +
date_registration +
gender +
code_module +
mean_weighted_score,
data = data_train) %>%
step_dummy(disability) %>%
step_dummy(gender) %>%
step_dummy(code_module)
These steps are also the same as in LL1.
# specify model
my_mod <-
logistic_reg() %>%
set_engine("glm") %>% # generalized linear model
set_mode("classification") # since we are predicting a dichotomous outcome, specify classification; for a number, specify regression
# specify workflow
my_wf <-
workflow() %>% # create a workflow
add_model(my_mod) %>% # add the model we wrote above
add_recipe(my_rec) # add our recipe we wrote above
Next, we’ll fit our model to the training data.
fitted_model <- fit(my_wf, data = data_train)
Finally, we’ll use the last_fit() function, but we’ll add
something a little different - a metric set. In addition to the
metrics we have discussed, we’ll add another common metric - Cohen’s Kappa,
which is similar to accuracy but accounts for chance agreement.
To do so, we have to specify which metrics we want to use
with the metric_set() function (see all of the available
options here).
Please specify six metrics in the metric set – accuracy, sensitivity,
specificity, ppv (precision), npv, and Cohen’s Kappa (kap). Assign the name
class_metrics to the output of your use of the
metric_set() function.
class_metrics <- metric_set(yardstick::accuracy, sensitivity, specificity, ppv, npv, kap)
Then, please use last_fit(), looking to how we did this in
the last learning lab as a guide for specifying the arguments. To
those arguments, add metrics = class_metrics.
final_fit <- last_fit(my_wf, train_test_split, metrics = class_metrics)
We’re now ready to move on to interpreting the accuracy of our model.
Let’s start with a simple confusion matrix. The confusion matrix is a 2 x 2 table whose cells represent one of four conditions, elaborated below; a small toy example follows the table. You’ll fill in the last two columns in a few moments.
| Statistic | How to Find | Interpretation | Value | % |
|---|---|---|---|---|
| True positive (TP) | Truth: 1, Prediction: 1 | Proportion of the population that is affected by a condition and correctly tested positive | ||
| True negative (TN) | Truth: 0, Prediction: 0 | Proportion of the population that is not affected by a condition and correctly tested negative | ||
| False positive (FP) | Truth: 0, Prediction: 1 | Proportion of the population that is not affected by a condition and incorrectly tested positive | ||
| False negative (FN) | Truth: 1, Prediction: 0 | Proportion of the population that is affected by a condition and incorrectly tested negative | | |
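To make these four cells concrete, here is a small, illustrative sketch with made-up values (not our OULAD data): four hypothetical students’ true outcomes and a model’s predictions, with one student falling into each cell. Counting the truth–prediction combinations with count() reproduces the four conditions above.
# a toy example: 1 = pass, 0 = did not pass
toy <- tibble(
  truth      = c(1, 1, 0, 0), # students 1 and 2 actually passed; 3 and 4 did not
  prediction = c(1, 0, 0, 1)  # the model predicted 1, 0, 0, and 1, respectively
)

toy %>%
  count(truth, prediction) # one TN (0, 0), one FP (0, 1), one FN (1, 0), and one TP (1, 1)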
To create a confusion matrix, run collect_predictions(),
which does what it suggests - it gathers together the model’s test set
predictions. Pass the last_fit object to this function
below.
predictions <- collect_predictions(final_fit)
predictions
## # A tibble: 12,318 × 5
## .pred_class id .row pass .config
## <fct> <chr> <int> <fct> <chr>
## 1 1 train/test split 1 1 Preprocessor1_Model1
## 2 0 train/test split 3 0 Preprocessor1_Model1
## 3 1 train/test split 4 1 Preprocessor1_Model1
## 4 1 train/test split 5 0 Preprocessor1_Model1
## 5 0 train/test split 6 1 Preprocessor1_Model1
## 6 1 train/test split 7 0 Preprocessor1_Model1
## 7 1 train/test split 11 1 Preprocessor1_Model1
## 8 1 train/test split 12 1 Preprocessor1_Model1
## 9 0 train/test split 14 0 Preprocessor1_Model1
## 10 1 train/test split 17 0 Preprocessor1_Model1
## # ℹ 12,308 more rows
Take a look at the columns. You’ll need to provide the predictions
(created with collect_predictions()) and then pipe that to
conf_mat(), to which you provide the names of a) the known
(truth) values for the test set and b) the predicted values. Some code
to get you started is below.
# Calculate the confusion matrix: the truth (pass) comes first, then the estimate (.pred_class)
confusion_matrix <- predictions %>%
  conf_mat(truth = pass, estimate = .pred_class)
# Total number of data points in the test data set
total <- sum(confusion_matrix$table)
# Extract TP, TN, FP, FN from the confusion matrix (rows are predictions; columns are the truth)
TP <- confusion_matrix$table[2, 2] # True Positives (predicted 1, truth 1)
TN <- confusion_matrix$table[1, 1] # True Negatives (predicted 0, truth 0)
FP <- confusion_matrix$table[2, 1] # False Positives (predicted 1, truth 0)
FN <- confusion_matrix$table[1, 2] # False Negatives (predicted 0, truth 1)
# Calculate percentages
TP_percentage <- (TP / total) * 100
TN_percentage <- (TN / total) * 100
FP_percentage <- (FP / total) * 100
FN_percentage <- (FN / total) * 100
# Calculate accuracy (sum of TP and TN percentages)
accuracy <- (TP_percentage + TN_percentage)
# Fill in the values and percentages
confusion_matrix$Value <- c(TP, TN, FP, FN)
confusion_matrix$Percentage <- c(TP_percentage, TN_percentage, FP_percentage, FN_percentage)
# Confusion matrix with filled values and percentages
str(confusion_matrix)
## List of 3
## $ table : 'table' num [1:2, 1:2] 4589 2012 3346 2366
## ..- attr(*, "dimnames")=List of 2
## .. ..$ Prediction: chr [1:2] "0" "1"
## .. ..$ Truth : chr [1:2] "0" "1"
## $ Value : num [1:4] 2366 4589 2012 3346
## $ Percentage: num [1:4] 19.2 37.3 16.3 27.2
## - attr(*, "class")= chr "conf_mat"
print(confusion_matrix)
## Truth
## Prediction 0 1
## 0 4589 3346
## 1 2012 2366
# Print accuracy
cat("Accuracy:", accuracy, "%\n")
## Accuracy: 56.48502 %
You should see a confusion matrix output. If so, nice work! Please fill in the Value and Percentage columns in the table above (with TP, TN, FP, and FN), entering the appropriate values and then converting those into a percentage based on the total number of data points in the test data set. The accuracy can be interpreted as the sum of the true positive and true negative percentages. So, what’s the accuracy? Add that below as a percentage.
Here’s where things get interesting: there are other statistics that have different denominators. Recall from the overview presentation that we can slice and dice the confusion matrix to calculate statistics that give us insights into the predictive utility of the model. Based on the values for TP, TN, FP, and FN you interpreted a few moments ago, add the Statistic Values for sensitivity, specificity, precision, and negative predictive value below, to three decimal places.
| Statistic | Equation | Interpretation | Question Answered | Statistic Values |
|---|---|---|---|---|
| Sensitivity (AKA recall) | TP / (TP + FN) | Proportion of those who are affected by a condition and correctly tested positive | Out of all the actual positives, how many did we correctly predict? | |
| Specificity | TN / (FP + TN) | Proportion of those who are not affected by a condition and correctly tested negative; | Out of all the actual negatives, how many did we correctly predict? | |
| Precision (AKA Positive Predictive Value) | TP / (TP + FP) | Proportion of those who tested positive who are affected by a condition | Out of all the instances we predicted as positive, how many are actually positive? | |
| Negative Predictive Value | TN / (TN + FN) | Proportion of those who tested negative who are not affected by a condition; the probability that a negative test is correct | Out of all the instances we predicted as negative, how many are actually negative? | |
So, what does this hard-won output tell us? Let’s interpret each statistic carefully in the table below. Please add the value and provide a substantive interpretation; one is provided to get you started.
| Statistic | Substantive Interpretation |
|---|---|
| Accuracy | The model has an accuracy of 56.5%. This measures the overall correctness of the model’s predictions. |
| Sensitivity (AKA recall) | 41.4%. Treating passing (1) as the positive class, the model correctly identifies only about 41% of the students who actually pass (as passing). There is considerable room for improvement here. |
| Specificity | 69.5%. The model correctly identifies about 70% (roughly two-thirds) of the students who do not pass (as not passing). This is pretty good, but it could be better. |
| Precision (AKA Positive Predictive Value) | 54%. Of the students the model predicted to pass, about 54% actually passed; this measures the accuracy of the model’s positive predictions. |
| Negative Predictive Value | 57.8%. Of the students the model predicted not to pass, about 58% indeed did not pass. NPV measures the model’s ability to correctly identify negative instances: a high NPV means the model’s negative predictions can generally be trusted. |
sensitivity_calc <- round(TP / (TP + FN),3)
sensitivity_calc
## [1] 0.414
precision_calc <- round(TP / (TP + FP),3)
precision_calc
## [1] 0.54
npv_calc <- round(TN / (TN + FN), 3)
npv_calc
## [1] 0.578
specificity_calc <- round(TN / (FP + TN),3)
specificity_calc
## [1] 0.695
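Before we wrap up with collect_metrics(), it may help to see what Cohen’s Kappa – the kap metric we added to our metric set – is doing. Below is a minimal sketch that computes Kappa by hand from the TP, TN, FP, and FN values above (the object names n, p_observed, p_chance, and kappa_by_hand are just illustrative): the improvement of the observed agreement over the agreement we would expect by chance. It should come out close to the kap value that collect_metrics() reports below (about 0.11).
n <- TP + TN + FP + FN # total number of test-set predictions

# observed agreement: the proportion of predictions that match the truth (i.e., accuracy)
p_observed <- (TP + TN) / n

# chance agreement: the accuracy we would expect if predictions and outcomes were independent,
# given the marginal totals of predicted and actual passes/non-passes
p_chance <- ((TP + FP) * (TP + FN) + (TN + FN) * (TN + FP)) / n^2

# Cohen's Kappa: how far observed agreement exceeds chance agreement, rescaled so that 1 is perfect
kappa_by_hand <- (p_observed - p_chance) / (1 - p_chance)
round(kappa_by_hand, 3)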
This process might suggest to you how a “good” model isn’t necessarily one that is the most accurate, but instead is one that has good values for statistics that matter given our particular question and context.
Recall that Baker and colleagues sought to balance between precision and recall (sensitivity). Please briefly discuss how well our model does this: is it better suited to correctly identifying “positive” pass cases (sensitivity) or correctly identifying students who do not pass (specificity)?
After all of this work, note that we can calculate the above statistics
much more easily, given how we set up our metrics with metric_set()
earlier – especially helpful when we want to efficiently communicate the
results of our analysis to our audience. Below, use collect_metrics()
with the final_fit object, noting that in addition to the
four metrics we calculated manually just a few moments ago, we are also
provided with the accuracy and Cohen’s Kappa values.
collect_metrics(final_fit)
## # A tibble: 6 × 4
## .metric .estimator .estimate .config
## <chr> <chr> <dbl> <chr>
## 1 accuracy binary 0.565 Preprocessor1_Model1
## 2 sensitivity binary 0.695 Preprocessor1_Model1
## 3 specificity binary 0.414 Preprocessor1_Model1
## 4 ppv binary 0.578 Preprocessor1_Model1
## 5 npv binary 0.540 Preprocessor1_Model1
## 6 kap binary 0.111 Preprocessor1_Model1
Having invested in understanding the terminology of machine learning
metrics, we’ll use this “shortcut” – collect_metrics() – going forward.
One note on the output above: {yardstick} treats the first level of the
outcome factor (“0”, not passing) as the “positive” class by default. That
is why the sensitivity and specificity values (and the ppv and npv values)
appear swapped relative to our manual calculations, which treated passing
(“1”) as the positive class.
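If you would like these convenience functions to line up with our manual calculations, a minimal sketch is below. It re-computes two of the metrics directly on the predictions tibble, using {yardstick}’s event_level argument to treat the second factor level (“1”, passing) as the class of interest; everything else reuses objects created above.
# sensitivity with "1" (pass) treated as the positive class; should match our manual value (~0.414)
predictions %>%
  sensitivity(truth = pass, estimate = .pred_class, event_level = "second")

# precision (ppv) with "1" (pass) treated as the positive class; should match our manual value (~0.540)
predictions %>%
  ppv(truth = pass, estimate = .pred_class, event_level = "second")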
Congratulations - you’ve completed this case study! Head over to the Lab 2 badge activity.