*If you attended my workshop on neural networks in R using TensorFlow via Keras, then parts of the following may seem familiar!*

Now that that’s out of the way, let’s get to it!

R/Pharma 2020 Conference for practitioners of R in the Pharmaceutical Industry • Oct. 15th 2020

Identifying drug candidates is expensive!

Performing biochemical screening assays in the laboratory is time-consuming

In cases where the number of potential candidates is large, predictive modelling can help prioritise candidates for screening

This greatly reduces the search space and thus the costs
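
As a minimal sketch of what "prioritising candidates" can mean in practice, the snippet below ranks candidates by a predicted score and keeps only the top few for screening. The candidate IDs, the scores, and the `prioritise` helper are hypothetical and only serve as illustration; they are not taken from the model discussed here.

```python
def prioritise(scores, k):
    """Return the k candidate IDs with the highest predicted scores."""
    # Sort candidate IDs by their predicted score, best first
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical predicted scores for five candidates (illustrative values only)
predicted = {
    "cand_A": 0.12,
    "cand_B": 0.87,
    "cand_C": 0.45,
    "cand_D": 0.91,
    "cand_E": 0.30,
}

# Only the shortlisted candidates would be sent to the wet lab for screening
shortlist = prioritise(predicted, k=2)
print(shortlist)
```

Screening two candidates instead of five is where the cost reduction comes from, provided the predicted scores actually reflect what matters in the assay.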

Source: Original file on Wikipedia | Author | CC BY-SA 4.0

You work as a data scientist in a pharmaceutical company

You have taken delivery of a predictive model for candidate prioritisation

The documentation says that the final model was created by expanding an initial simple model:

- A simple naive baseline model
- A more sophisticated high-complexity model

The documentation also contains visualisations of performance metrics for the final model

Source: Original file on kissclipart

Metrics:

mse = mean squared error (low = good)

pcc = Pearson’s correlation coefficient (high = good)

scc = Spearman’s correlation coefficient (high = good)
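
To make the three metrics concrete, here is a minimal sketch in plain Python, assuming the standard textbook definitions (Spearman's coefficient is computed as Pearson's coefficient on average ranks). The function names and the toy data are assumptions for illustration and are not taken from the model documentation. In R, the same quantities can be obtained with `mean((y - y_hat)^2)` and `cor()` with `method = "pearson"` or `method = "spearman"`.

```python
from math import sqrt


def mse(y_true, y_pred):
    """Mean squared error: average squared difference (lower is better)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson(x, y):
    """Pearson's correlation coefficient: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's correlation coefficient: Pearson's coefficient on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: predictions are monotone in the truth, but not linear
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.0, 4.0, 9.0, 16.0, 25.0]

print("mse:", mse(y_true, y_pred))
print("pcc:", pearson(y_true, y_pred))
print("scc:", spearman(y_true, y_pred))
```

In this toy example the ordering of the predictions matches the ordering of the true values exactly, so scc is essentially 1 while pcc is below 1 and mse is large. The three metrics answer different questions, so it is worth looking at all of them.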

Conclusion:

Evidently, the complex model captures the more subtle information better than the simple baseline

Therefore, it is decided to put the complex model into production

You create a Shiny app wrapping the model predictions and continue with other tasks

However…

Time goes by…

People using the model for prioritisation in the wet lab start complaining that, despite prioritising targets with the model, only very few of the prioritised candidates turn out to be relevant downstream