## Specificity

Figure 4.18: Sensitivity and specificity trade-offs are defined bythe ROC curve. ROC curves a, b and c describe the trade-offs in sensitivity and specificity for the same data set using three different classification algorithms. c is clearly the best algorithm because the area under the curve is the highest. That is, it provides a better set of sensitivities and specificities for all values of these. In contrast, b and c provide trade-offs inferior to a and for some range of sensitivities, b provides better performance than c, and vice versa.

To calculate the ROC curve for a single continuous variable[12] all that is required is to obtain the sensitivity and specificity at a sufficient number of thresholds. The procedure is somewhat more complex with classification models that involve different thresholds on several variables. For example, for decision trees, ROC curves are determined by first assigning to each tree leaf the probability of being an event for a set of derived values that percolates to that point. These probabilities are based upon the ratio of events to nonevents that fall into each leaf during training. The threshold for considering a case to be event or nonevent is then set at each leaf probability value. The resulting sensitivity-specificity pairs, when plotted on a grid of sensitivity versus (1-specificity), gives the ROC curve.[13]

[10]Many assumptions go into this estimate, including clearly erroneous ones such as the independence of all genes in their expression patterns.

[11]For example, defined classes of two different types of leukemia as judged by an expert panel.

[12]For example, if gene A > threshold, then the prediction is that the sample is from a patient with the disease, and if gene A d threshold, then the prediction is that the sample is from a patient without a disease.

[13]However, because the number of leaves in a decision tree are finite, the number of points that can be ascertained for that ROC curve is correspondingly finite and therefore the curve appears as a segmented trajectory of straight lines.