## Performance metrics

When evaluating genes or features that are meant to distinguish one category from another, other relevant performance metrics include sensitivity, specificity, positive predictive value (PPV), accuracy, and the area under the receiver operating characteristic (ROC) curve [79].

Let us refer to the state of disease or any category of interest as the "disease state" and to all other categories or states as "nondisease states." The sensitivity of a test is the proportion of actual disease cases that are correctly categorized as disease, while specificity is the proportion of actual nondisease cases that are correctly categorized as nondisease. Positive predictive value is the number of cases correctly categorized as disease divided by the number of all cases categorized as disease, whether correctly or not. Accuracy is the ratio of correctly categorized cases, disease and nondisease alike, to all cases.
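As a minimal sketch of these definitions, the following Python function counts the relevant cases directly from paired labels. The function name `classification_metrics`, the convention that label `1` marks the disease state, and the toy labels are all assumptions made for illustration; they do not come from any particular dataset or library.

```python
def classification_metrics(actual, predicted, disease=1):
    """Compute sensitivity, specificity, PPV, and accuracy from paired labels.

    `actual` holds the true labels and `predicted` the categorizations under
    evaluation. Any label other than `disease` is treated as nondisease.
    """
    pairs = list(zip(actual, predicted))
    if len(pairs) != len(actual) or len(pairs) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    # Disease cases that were also categorized as disease.
    correct_disease = sum(1 for a, p in pairs if a == disease and p == disease)
    # Nondisease cases that were also categorized as nondisease.
    correct_nondisease = sum(1 for a, p in pairs if a != disease and p != disease)
    # Totals by true state and by categorization.
    total_disease = sum(1 for a, _ in pairs if a == disease)
    total_nondisease = len(pairs) - total_disease
    called_disease = sum(1 for _, p in pairs if p == disease)

    def ratio(numerator, denominator):
        # A metric is undefined (NaN) when its denominator is zero.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(correct_disease, total_disease),
        "specificity": ratio(correct_nondisease, total_nondisease),
        "ppv": ratio(correct_disease, called_disease),
        "accuracy": ratio(correct_disease + correct_nondisease, len(pairs)),
    }


# Hypothetical toy example: 4 disease cases and 6 nondisease cases.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = classification_metrics(actual, predicted)
```

With these toy labels, three of the four disease cases are found (sensitivity 0.75), but only three of the five cases called disease are truly disease (PPV 0.6), which shows that sensitivity and PPV share a numerator yet differ in their denominators.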

To further illustrate these metrics, Figure 4.17 shows a contingency table in which the two classes are called 0 (nondisease) and 1 (disease). The vertical groupings of cases are gold standard-labeled[11] cases belonging to each class. The horizontal groupings are the cases as categorized automatically into each class, for example by a particular model. Thus, a + c cases were actual class 0 cases, while a + b cases were labeled by the model as class 0. Similarly, b + d cases were actual class 1 cases, while c + d cases were labeled by the model as class 1. Cells a and d therefore count the correctly categorized cases, b counts disease cases labeled as class 0, and c counts nondisease cases labeled as class 1. The relevant performance metrics in equation form are then as follows:

Figure 4.17: Contingency table illustrating performance metrics for classification algorithms.

$$
\text{Sensitivity} = \frac{\text{Number of correctly categorized event cases}}{\text{Total number of event cases}} = \frac{d}{b + d}
$$

$$
\text{Specificity} = \frac{\text{Number of correctly categorized nonevent cases}}{\text{Total number of nonevent cases}} = \frac{a}{a + c}
$$

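$$
\text{PPV} = \frac{\text{Number of correctly categorized event cases}}{\text{Number of categorized event cases (correct and incorrect)}} = \frac{d}{c + d}
$$

$$
\text{Accuracy} = \frac{\text{Number of correctly categorized event and nonevent cases}}{\text{Total number of cases}} = \frac{a + d}{a + b + c + d}
$$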
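The same quantities can be evaluated directly from the four cells of the contingency table. The sketch below assumes the cell layout described for Figure 4.17 (a and c are actual class 0 cases; a and b are cases labeled class 0 by the model). The class name `ContingencyTable` and the counts used are hypothetical and serve only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContingencyTable:
    """2x2 table using the cell letters described for Figure 4.17."""

    a: int  # actual class 0, labeled class 0 by the model
    b: int  # actual class 1, labeled class 0 by the model
    c: int  # actual class 0, labeled class 1 by the model
    d: int  # actual class 1, labeled class 1 by the model

    # Each method raises ZeroDivisionError if its denominator is zero,
    # for example sensitivity when the data contain no class 1 cases.

    def sensitivity(self) -> float:
        # Correctly labeled class 1 cases over all actual class 1 cases.
        return self.d / (self.b + self.d)

    def specificity(self) -> float:
        # Correctly labeled class 0 cases over all actual class 0 cases.
        return self.a / (self.a + self.c)

    def ppv(self) -> float:
        # Correctly labeled class 1 cases over all cases labeled class 1.
        return self.d / (self.c + self.d)

    def accuracy(self) -> float:
        # All correctly labeled cases over all cases.
        return (self.a + self.d) / (self.a + self.b + self.c + self.d)


# Hypothetical counts: 100 actual class 0 cases and 50 actual class 1 cases.
table = ContingencyTable(a=85, b=5, c=15, d=45)
sensitivity = table.sensitivity()  # 45 / 50
specificity = table.specificity()  # 85 / 100
ppv = table.ppv()                  # 45 / 60
accuracy = table.accuracy()        # 130 / 150
```

In this hypothetical table the model finds 90% of the class 1 cases, yet a quarter of its class 1 calls are wrong, because the 15 mislabeled class 0 cases enter the PPV denominator but not the sensitivity denominator.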