Fig. 10 | BioData Mining

Fig. 10

From: The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification

PPV versus ROC AUC plots and NPV versus ROC AUC plots. We developed an R script in which we randomly generated a binary ground truth vector of 10 elements, and then executed a loop that produced a list of synthesized predictions of real values between 0 and 1, 10,000 times. For each prediction, we computed the ROC AUC and its corresponding precision (PPV) and negative predictive value (NPV) with cut-off threshold \(\tau = 0.5\). Negatively imbalanced ground truth (i,l): the ground truth labels are (0, 0, 0, 0, 0, 0, 0, 1, 1, 1), corresponding to 70% negative elements and 30% positive elements. Balanced ground truth (j,m): the ground truth labels are (0, 0, 0, 0, 0, 1, 1, 1, 1, 1), corresponding to 50% negative elements and 50% positive elements. Positively imbalanced ground truth (k,n): the ground truth labels are (0, 0, 0, 1, 1, 1, 1, 1, 1, 1), corresponding to 30% negative elements and 70% positive elements. In each plot, the ground truth is fixed and never changes, while our script generated 10 random real values in the [0, 1] interval 10,000 times: each time, our script calculated the resulting ROC AUC and PPV (or NPV), which together correspond to a single point in the plot. The ground truth values and the predictions are the same as in Fig. 9. PPV: precision, positive predictive value (Eq. 3). NPV: negative predictive value (Eq. 4). ROC AUC: area under the receiver operating characteristic curve. ROC AUC, precision, and NPV range from 0 (minimum and worst value) to 1 (maximum and best value). Blue line: regression line made with smoothed conditional means.
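
The simulation described in the caption can be sketched as follows. This is a minimal Python illustration and not the authors' R script: the function names (`roc_auc`, `ppv_npv`, `simulate`), the seed, the pairwise-comparison computation of ROC AUC, and the convention that a score equal to \(\tau\) counts as a positive prediction are all assumptions made for the example. PPV or NPV is returned as `None` when no element is predicted positive or negative, respectively, since the ratio is then undefined.

```python
import random

# Ground truth vectors as stated in the caption.
GROUND_TRUTHS = {
    "negatively_imbalanced": [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
    "balanced":              [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "positively_imbalanced": [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
}


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ppv_npv(labels, scores, tau=0.5):
    """PPV and NPV after thresholding the scores at tau.
    Assumption: score >= tau is predicted positive."""
    preds = [1 if s >= tau else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    ppv = tp / (tp + fp) if (tp + fp) > 0 else None  # undefined if no positives predicted
    npv = tn / (tn + fn) if (tn + fn) > 0 else None  # undefined if no negatives predicted
    return ppv, npv


def simulate(labels, n_iter=10_000, seed=0):
    """Keep the ground truth fixed and draw uniform random predictions
    n_iter times; each iteration yields one (ROC AUC, PPV, NPV) point."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    points = []
    for _ in range(n_iter):
        scores = [rng.random() for _ in labels]
        ppv, npv = ppv_npv(labels, scores)
        points.append((roc_auc(labels, scores), ppv, npv))
    return points
```

Calling `simulate(GROUND_TRUTHS["balanced"])` would give the points for one pair of panels: plotting the first element of each tuple against the second gives a PPV versus ROC AUC scatter, and against the third an NPV versus ROC AUC scatter. The smoothed regression line in the figure is not reproduced here.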
