2024-01-17
The earliest manifestation of the receiver operating characteristic (ROC) curve was during World War II for the analysis of radar signals, and it consequently entered the scientific literature in the 1950s in connection with signal detection theory and psychophysics, where assessment of human and animal detection of weak signals was of considerable interest.
The seminal text for the early work was that by Green and Swets [1].
Later, in the 1970s and 1980s, it became evident that the technique was of considerable relevance to medical test evaluation and decision making.
The decades since then have seen much development and use of the technique in areas such as radiology, cardiology, clinical chemistry, and epidemiology.
The name Receiver Operating Characteristic arises from the use of such curves in signal detection theory [1, 2], where the aim is to detect the presence of a particular signal, missing as few genuine occurrences as possible while simultaneously raising as few false alarms as possible.
That is, in signal detection theory, the aim is to assign each event either to the signal class or to the non-signal class, so the abstract situation is the same as in the general two-class framework described below. The word “characteristic” in Receiver Operating Characteristic refers to the characteristics of behavior of the classifier over the potential range of its operation.
A huge number of situations can be described by the following abstract framework:
Each of a set of objects is known to belong to one of two classes.
An assignment procedure assigns each object to a class on the basis of information observed about that object.
Unfortunately, the assignment procedure is not perfect: errors are made, meaning that sometimes an object is assigned to an incorrect class. Because of this imperfection, we need to evaluate the quality of performance of the procedure.
In some cases, of course, more than two classes might be involved, but the case of two classes is by far the most important one in practice (sick/well, yes/no, right/wrong, accept/reject, act/do not act, condition present/absent, and so on).
The information about each object which is used to assign it to a class can be regarded as a vector of descriptive variables, characteristics, or features.
Sometimes the vector of descriptive variables will be univariate, but often it will be multivariate.
The type of information that one obtains depends on the level of measurement of each variable.
In general, multivariate methods are more powerful than univariate methods, if only because each component of the descriptive vector can add extra information about the class of the object.
The multiple measurements taken on each object are then reduced to a single score \(S(\boldsymbol{X})\) for that object by some appropriate function.
The majority of functions \(S\) with which we are typically concerned will convert the raw information into a continuous value (a score on a univariate continuum).
The class assignment or classification is then made by comparing this score with a threshold: if the score is above the threshold the object is assigned to one class, and if the score is at or below the threshold it is assigned to the other.
We denote the characteristics describing objects as \(\boldsymbol{X}\), with \(\boldsymbol{x}\) denoting particular values, and the resulting scores as \(S(\boldsymbol{X})\), taking particular values \(S(\boldsymbol{x})\). The classification threshold \(T\) takes values denoted by \(t\).
In general, we denote the two classes by P (positive) and N (negative). The emphasis is often on identifying P individuals correctly.
symmetry vs asymmetry between two populations
“supervised classification”
A central aspect of developing classification rules is the choice of the function \(S\) that reduces the vector \(\boldsymbol{x}\) to a single score. The aim is to construct a score function \(S(\boldsymbol{X})\) such that members of the two classes have distinctly different sets of scores, thereby enabling the classes to be clearly distinguished.
We will assume that the scores have been orientated in such a way that members of class P tend to have large scores and members of class N tend to have small scores, with a threshold that divides the scores into two groups.
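As a minimal sketch of the score-and-threshold rule, the following uses a weighted sum as the score function \(S\). The function names `score` and `classify`, the weights, and the feature values are all hypothetical choices for illustration; nothing in the text fixes a particular form for \(S\).

```python
import numpy as np

def score(X, weights):
    """Hypothetical score function S(x): a weighted sum of the features."""
    X = np.asarray(X, dtype=float)
    return X @ np.asarray(weights, dtype=float)

def classify(scores, t):
    """Assign class 'P' when the score exceeds the threshold t, else 'N'."""
    return np.where(np.asarray(scores) > t, "P", "N")

# Three objects, each described by two features (illustrative values only).
X = [[1.0, 2.0], [0.2, 0.1], [3.0, 0.5]]
weights = [0.5, 1.0]          # assumed weights, not estimated from any data
s = score(X, weights)         # one score per object
labels = classify(s, t=1.0)   # compare each score with the threshold
```

With these assumed values the first and third objects score above the threshold and are assigned to P, while the second is assigned to N. Moving `t` changes which objects fall on each side, which is the behavior a ROC analysis examines.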
A training set includes both the descriptive vectors \(\boldsymbol{X}\) and the true class, P or N, of each of the objects in the set.
Any proposed function \(S\) will then produce a distribution of scores for the members of P in the training set, and a distribution of scores for the members of N in the training set, and the score for any particular object can then be compared with the classification threshold \(t\).
It is also important to define the criterion used to estimate any parameters in \(S\).
In a weighted sum, for example, we must choose the weights; in a partition of \(\boldsymbol{X}\) we must choose the positions of the cut points; and so on.
Training data are used to construct the classification rule. Then, having finally settled on a rule, we want to know how effective it will be in assigning future objects to classes. To explore this, we actually need to assign some objects to classes and see, in some way, how well the rule does.
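One common way to do this, sketched below, is to hold back part of the labelled data as a test set. Everything specific here is an assumption made for illustration: the data are synthetic, the score is a weighted sum whose weights are the difference between the class means, and the threshold is the midpoint of the two mean scores. It is not a method prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic labelled data (an assumption for illustration): two features,
# class N centred at 0 and class P centred at 2.
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(2.0, 1.0, size=(200, 2))])
y = np.array([0] * 200 + [1] * 200)  # 0 = N, 1 = P

# Random split into a training set and a held-out test set.
idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]

# Estimate the parameters of S on the training set only.
mean_P = X[train][y[train] == 1].mean(axis=0)
mean_N = X[train][y[train] == 0].mean(axis=0)
weights = mean_P - mean_N
t = 0.5 * (mean_P @ weights + mean_N @ weights)  # midpoint of the mean scores

# Apply the settled rule to the held-out objects and count the errors.
predicted = (X[test] @ weights > t).astype(int)
test_error = float(np.mean(predicted != y[test]))
```

Because the test objects played no part in choosing `weights` or `t`, `test_error` gives a less optimistic estimate of future performance than the error measured on the training set itself.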
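Four joint probabilities describe the outcome of applying the rule: \(p(s>t, P)\), \(p(s>t, N)\), \(p(s\leq t, P)\), and \(p(s\leq t, N)\).
One very common measure is the misclassification rate, but it is far from perfect.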
\(p(P|s>t)\), the positive predictive value (ppv)
\(p(N|s\leq t)\), the negative predictive value (npv)
misclassification rate as a weighted sum of \(1 - tp\) and \(fp\): \(p(P)(1 - tp) + p(N)\,fp\), where \(tp = p(s>t|P)\) and \(fp = p(s>t|N)\)
Youden index: \(tp - fp\), the true positive rate minus the false positive rate
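The measures listed above can all be estimated from a set of scores with known classes. The sketch below assumes the rule "assign to P when \(s > t\)", writes \(tp = p(s>t|P)\) and \(fp = p(s>t|N)\) for the true and false positive rates, and uses a hypothetical helper named `summarize` with made-up scores.

```python
import numpy as np

def summarize(scores, is_P, t):
    """Empirical performance measures for the rule 'assign to P when s > t'.

    Assumes both classes are present and that at least one score falls on
    each side of t, so that no denominator is zero.
    """
    scores = np.asarray(scores, dtype=float)
    is_P = np.asarray(is_P, dtype=bool)
    above = scores > t

    # The four joint proportions, estimating p(s>t, P), p(s>t, N), and so on.
    p_above_P = float(np.mean(above & is_P))
    p_above_N = float(np.mean(above & ~is_P))
    p_below_P = float(np.mean(~above & is_P))
    p_below_N = float(np.mean(~above & ~is_P))

    p_P = p_above_P + p_below_P   # class proportion p(P)
    p_N = p_above_N + p_below_N   # class proportion p(N)
    tp = p_above_P / p_P          # true positive rate
    fp = p_above_N / p_N          # false positive rate
    return {
        "tp": tp,
        "fp": fp,
        "ppv": p_above_P / (p_above_P + p_above_N),
        "npv": p_below_N / (p_below_P + p_below_N),
        "misclassification": p_P * (1 - tp) + p_N * fp,
        "youden": tp - fp,
    }

# Illustrative scores: the first four objects are P, the last four are N.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
is_P = [1, 1, 1, 1, 0, 0, 0, 0]
result = summarize(scores, is_P, t=0.5)
```

For these made-up scores, three of the four P objects and one of the four N objects lie above the threshold, so `tp` is 0.75, `fp` is 0.25, the misclassification rate is 0.25, and the Youden index is 0.5.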