Purpose/Context: Project Everlast staff completed about 25 PAVE screenings during their pilot period. For each screening, they also indicated whether they agreed with PAVE’s risk assessment. The result is a valuable “golden” dataset: fully completed Screens paired with a practitioner’s assessment of risk. We can use this information to train a model that mimics the risk assessment Project Everlast staff would make, based on the decision-making criteria they employ.
Training data includes case attributes alongside Project Everlast staff’s risk assessment. We can think of this as manually labeled data.
| Gender | Black.or.African.American | Limited.independence | staff_assessment |
|---|---|---|---|
| M | 0 | 0 | 0 |
| F | 0 | 0 | 1 |
| F | 1 | 1 | 0 |
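As a rough illustration of how a `staff_assessment` label like the one above could be derived, the sketch below combines PAVE's risk flag with the staff response. The derivation rule is an assumption made for this example, not the documented procedure: agreeing keeps PAVE's label, disagreeing flips it, and any other response leaves the case unlabeled. The rows are toy data, and `screens` is a hypothetical name.

```r
# Toy rows for illustration only; these are not real Project Everlast cases.
screens <- data.frame(
  PAVE_Generated_Risk = c(1, 1, 0, 1),
  User_Agree = c("Agree", "Disagree", "Agree", "Unsure"),
  stringsAsFactors = FALSE
)

# Assumed rule: "Agree" keeps PAVE's label, "Disagree" flips it,
# and any other response leaves the case unlabeled (NA).
screens$staff_assessment <- ifelse(
  screens$User_Agree == "Agree",
  screens$PAVE_Generated_Risk,
  ifelse(screens$User_Agree == "Disagree", 1 - screens$PAVE_Generated_Risk, NA)
)

# Keep only the cases that staff actually labeled.
training_data <- screens[!is.na(screens$staff_assessment), ]
print(training_data)
```

Under this rule, the case marked "Unsure" carries no label and drops out of the training set.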
Train a Naive Bayes model to predict trafficking risk from all case information, using Project Everlast staff’s assessment of each case as the label. Train the model exclusively on cases where Project Everlast staff made an assessment of the case’s risk level.
library(naivebayes) # provides naive_bayes()
risk_model <- naive_bayes(staff_assessment ~ # staff assessment of risk is the outcome of interest
., # use every other column as a predictor ("." refers to all remaining variables)
laplace = 1, # apply Laplace (add-one) smoothing so unseen feature values do not get zero probability
data=training_data)
summary(risk_model)
##
## ================================== Naive Bayes ==================================
##
## - Call: naive_bayes.formula(formula = staff_assessment ~ ., data = training_data, laplace = 1)
## - Laplace: 1
## - Classes: 2
## - Samples: 20
## - Features: 64
## - Conditional distributions:
## - Bernoulli: 35
## - Categorical: 29
## - Prior probabilities:
## - 0: 0.6
## - 1: 0.4
##
## ---------------------------------------------------------------------------------
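To make the role of the `laplace = 1` argument concrete, here is a minimal sketch of standard additive (Laplace) smoothing for one categorical feature within one class. The helper `smoothed_probs` and the counts are hypothetical and written for this example; they are not taken from the fitted model. The class size of 8 follows from the summary above (prior 0.4 × 20 samples).

```r
# Additive (Laplace) smoothing for one categorical feature within one class.
# counts: how often each level of the feature was seen in that class.
smoothed_probs <- function(counts, laplace = 1) {
  (counts + laplace) / (sum(counts) + laplace * length(counts))
}

# Hypothetical counts for a binary feature among the 8 class-1 training cases.
counts_class1 <- c(no = 8, yes = 0)

unsmoothed <- smoothed_probs(counts_class1, laplace = 0) # "yes" gets probability 0
smoothed   <- smoothed_probs(counts_class1, laplace = 1) # "yes" gets 1/10

print(unsmoothed)
print(smoothed)
```

Without smoothing, a single feature value that never co-occurred with a class would force that class's posterior to zero, which is a real concern with only 20 training cases.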
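# add_predictions() is from the modelr package; it appends the model's predicted class as a new column
test_data <- add_predictions(data, risk_model, var = "risk_prediction", type = NULL)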
Overall, the model’s predicted risk values agree with PAVE-generated risk assessments for about 77% of cases. Here are some examples of the output:
| | Case.No. | Gender | PAVE_Generated_Risk | User_Agree | Bayes_Model_Generated_Risk | model_agrees_with_PAVE |
|---|---|---|---|---|---|---|
| 18 | MAFE1226 | F | 1 | Agree | 1 | 1 |
| 19 | tySm0317 | M | 1 | Unsure | 1 | 1 |
| 20 | JaBe0527 | N | 1 | Agree | 1 | 1 |
| 21 | ALMC0103 | F | 0 | Agree | 0 | 1 |
| 22 | DERO1299 | F | 0 | Agree | 0 | 1 |
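An agreement rate like the one reported above can be computed as the share of cases where the two risk columns match. The sketch below uses toy values, not the pilot data, and assumes the prediction lives in a column named `risk_prediction`. Comparing as character is a defensive choice in case the prediction is stored as a factor.

```r
# Toy predictions for illustration only; not the real pilot data.
toy <- data.frame(
  PAVE_Generated_Risk = c(1, 1, 0, 1),
  risk_prediction     = c(1, 1, 0, 0)
)

# Flag each case where the model's prediction matches PAVE's assessment.
toy$model_agrees_with_PAVE <- as.integer(
  as.character(toy$risk_prediction) == as.character(toy$PAVE_Generated_Risk)
)

# Share of cases where the two assessments agree.
agreement_rate <- mean(toy$model_agrees_with_PAVE)
print(agreement_rate)
```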
The newly trained model can also generate risk assessments for cases where staff disagreed with PAVE’s assessment. Because the model learned from the way Project Everlast staff reasoned through risk themselves, its predicted risk level in these cases agrees with the staff’s assessment and disagrees with PAVE’s:
| Case.No. | Gender | PAVE_Generated_Risk | User_Agree | Bayes_Model_Generated_Risk |
|---|---|---|---|---|
| ISBU1008 | F | 1 | Disagree | 0 |
| SiKe1222 | F | 1 | Disagree | 0 |
| LASH0211 | F | 1 | Disagree | 0 |
| COOL0912 | M | 1 | Disagree | 0 |
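A table like this one can be produced by keeping only the cases staff marked "Disagree" and checking whether the model's prediction departs from PAVE's. The sketch below is illustrative: the rows and case IDs are invented, and `model_sides_with_staff` is a hypothetical column name.

```r
# Toy rows for illustration only; case IDs and values are invented.
toy <- data.frame(
  Case.No. = c("A", "B", "C"),
  PAVE_Generated_Risk = c(1, 1, 0),
  User_Agree = c("Disagree", "Agree", "Disagree"),
  Bayes_Model_Generated_Risk = c(0, 1, 1),
  stringsAsFactors = FALSE
)

# Keep only the cases where staff disagreed with PAVE.
disputed <- toy[toy$User_Agree == "Disagree", ]

# With a binary risk flag, departing from PAVE means siding with staff.
disputed$model_sides_with_staff <-
  disputed$Bayes_Model_Generated_Risk != disputed$PAVE_Generated_Risk

print(disputed)
```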