## load libraries, installing any that are missing
if (!require("ggplot2")) { install.packages("ggplot2"); library(ggplot2) }
if (!require("gridExtra")) { install.packages("gridExtra"); library(gridExtra) }
if (!require("plotROC")) { install.packages("plotROC"); library(plotROC) }

The Receiver Operating Characteristic (ROC) curve is used to assess the accuracy of a continuous measurement for predicting a binary outcome. In medicine, ROC curves have a long history of use for evaluating diagnostic tests in radiology and general diagnostics. ROC curves have also long been used in signal detection theory. The curve plots the true positive rate against the false positive rate at every possible cutoff of the measurement; these rates are defined as follows.
TPR (True Positive Rate) / Recall / Sensitivity
\[ \begin{aligned} \text{TPR} = \frac{TP}{TP + FN} \end{aligned} \]
Specificity
\[ \begin{aligned} \text{Specificity} = \frac{TN}{TN + FP} \end{aligned} \]
FPR (False Positive Rate)
\[ \begin{aligned} \text{FPR} &= 1 - \text{Specificity}\\ \text{FPR} &= \frac{FP}{TN + FP} \end{aligned} \]
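To make the definitions concrete, here is a small base R sketch that computes the three rates at one cutoff. The toy vectors d and m and the cutoff of 0.45 are made up for illustration and do not come from any real study.

```r
# Hypothetical toy data: d is disease status (1 = diseased, 0 = disease-free),
# m is the continuous marker.
d <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
m <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1, 0.5, 0.35)

# Assumed cutoff: a subject tests positive when the marker exceeds it.
cutoff <- 0.45
pred <- as.integer(m > cutoff)

# Cells of the 2x2 confusion matrix
TP <- sum(pred == 1 & d == 1)  # diseased, test positive
FN <- sum(pred == 0 & d == 1)  # diseased, test negative
TN <- sum(pred == 0 & d == 0)  # disease-free, test negative
FP <- sum(pred == 1 & d == 0)  # disease-free, test positive

tpr <- TP / (TP + FN)          # sensitivity / recall: 3 / 4 = 0.75
specificity <- TN / (TN + FP)  # 4 / 6
fpr <- FP / (TN + FP)          # 2 / 6, which equals 1 - specificity

c(TPR = tpr, Specificity = specificity, FPR = fpr)
```

Each cutoff gives one (FPR, TPR) pair; sweeping the cutoff over all observed marker values traces out the empirical ROC curve.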
Next I use the ggplot function to define the aesthetics, and the geom_roc function to add an ROC curve layer. The geom_roc function requires the aesthetics d for disease status, and m for marker. The disease status need not be coded as 0/1, but if it is not, stat_roc assumes (with a warning) that the lowest value in sort order signifies disease-free status. stat_roc and geom_roc are linked by default, with the stat doing the underlying computation of the empirical ROC curve, and the geom consisting of the ROC curve layer.
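A minimal sketch of this step, assuming a simulated dataset: the data frame test, its columns D and M1, and the simulation parameters are all illustrative choices, not part of plotROC.

```r
library(ggplot2)
library(plotROC)

# Simulated example data (assumption): D is a 0/1 disease status,
# M1 is a marker whose mean is shifted upward in diseased subjects.
set.seed(2529)
n <- 200
D <- rbinom(n, size = 1, prob = 0.5)
M1 <- rnorm(n, mean = D, sd = 0.65)
test <- data.frame(D = D, M1 = M1)

# aes() maps disease status to d and the marker to m;
# geom_roc() adds the empirical ROC curve layer.
basicplot <- ggplot(test, aes(d = D, m = M1)) + geom_roc()
print(basicplot)
```

Because D is already coded 0/1 here, stat_roc should not need to guess which level is disease-free.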
If you have grouping factors in your dataset, or you have multiple markers measured on the same subjects, you may wish to plot multiple ROC curves on the same plot. plotROC fully supports faceting and grouping done by ggplot2. In our example dataset, we have 2 markers measured in a paired manner.
These data are in wide format, with the 2 markers going across 2 columns. ggplot requires long format, with all marker results in a single column and an additional variable identifying the marker. We provide the function melt_roc to perform this transformation. The arguments are the data frame, a name or index identifying the disease status column, and a vector of names or indices identifying the markers. Optionally, the names argument gives a vector of names to assign to the markers, replacing their column names. The result is a data frame in long format.
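The transformation might look like the sketch below. The wide data frame test and its columns D, M1, and M2 are simulated for illustration; the output column names D, M, and name are what I expect melt_roc to return, so check them with head() on your own data.

```r
library(plotROC)

# Simulated wide-format data (assumption): one row per subject,
# two markers measured on the same subjects.
set.seed(2529)
n <- 200
D <- rbinom(n, size = 1, prob = 0.5)
test <- data.frame(
  D  = D,
  M1 = rnorm(n, mean = D, sd = 0.65),
  M2 = rnorm(n, mean = D, sd = 1.5)
)

# Arguments: the data frame, the disease status column,
# and the marker columns to stack.
longtest <- melt_roc(test, "D", c("M1", "M2"))
head(longtest)
```

Each subject now contributes one row per marker, so the long data frame has twice as many rows as the wide one.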
Then, the dataset can be passed to the ggplot function, with the marker name given as a grouping or faceting variable.
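As a sketch of both options, the example below builds a small long-format data frame by hand and plots it once with grouping and once with faceting. The data frame longtest and its columns D, M, and name are illustrative assumptions standing in for your own long-format data.

```r
library(ggplot2)
library(plotROC)

# Simulated long-format data (assumption): D is disease status,
# M is the marker value, name says which marker the row belongs to.
set.seed(2529)
n <- 200
D <- rbinom(n, size = 1, prob = 0.5)
longtest <- data.frame(
  D    = rep(D, times = 2),
  M    = c(rnorm(n, mean = D, sd = 0.65), rnorm(n, mean = D, sd = 1.5)),
  name = rep(c("M1", "M2"), each = n)
)

# Grouping: one curve per marker on shared axes, distinguished by color.
grouped <- ggplot(longtest, aes(d = D, m = M, color = name)) + geom_roc()

# Faceting: one panel per marker.
faceted <- ggplot(longtest, aes(d = D, m = M)) + geom_roc() + facet_wrap(~ name)

print(grouped)
print(faceted)
```

Grouping makes it easier to compare curves directly, while faceting keeps each curve's cutoff labels from overlapping.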
Although ROC curves are often used for evaluating and interpreting logistic regression models, they’re not limited to logistic regression. A common usage in medical studies is to run an ROC analysis to see how much better a single continuous predictor can predict disease status compared to chance.
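One common way to put a number on "better than chance" is the area under the ROC curve (AUC), where 0.5 corresponds to the chance diagonal and 1 to perfect separation. The text above does not name a specific summary, so treat the choice of AUC as my assumption. The base R sketch below uses the rank-sum identity on made-up toy data.

```r
# Hypothetical toy data: d is disease status (1 = diseased), m is the marker.
d <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
m <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.6, 0.2, 0.1, 0.5, 0.35)

n1 <- sum(d == 1)  # number of diseased subjects
n0 <- sum(d == 0)  # number of disease-free subjects

# Rank-sum identity: the AUC equals the proportion of (diseased, disease-free)
# pairs in which the diseased subject has the higher marker value.
# rank() assigns average ranks to ties, so each tied pair counts as half.
r <- rank(m)
auc <- (sum(r[d == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

auc        # 22 / 24, about 0.917
auc - 0.5  # improvement over the chance diagonal
```

Here 22 of the 24 diseased versus disease-free pairs are ordered correctly, so the marker does far better than a coin flip on this toy sample.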