Fig. 9 | BioData Mining

Fig. 9

From: The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification

MCC versus ROC AUC plots, TPR versus ROC AUC plots, and TNR versus ROC AUC plots. We developed an R script where we randomly generated a binary ground truth vector of 10 elements, and then executed a loop that produced a synthesized prediction vector of real values between 0 and 1, repeated 10,000 times. For each prediction, we computed the ROC AUC and, with cut-off threshold \(\tau = 0.5\), the corresponding normalized MCC, where \(normMCC = (MCC + 1) / 2\), together with sensitivity and specificity. Negatively imbalanced ground truth (a,c,f): the ground truth labels are (0, 0, 0, 0, 0, 0, 0, 1, 1, 1), corresponding to 70% negative elements and 30% positive elements. Balanced ground truth (b,d,g): the ground truth labels are (0, 0, 0, 0, 0, 1, 1, 1, 1, 1), corresponding to 50% negative elements and 50% positive elements. Positively imbalanced ground truth (c,e,h): the ground truth labels are (0, 0, 0, 1, 1, 1, 1, 1, 1, 1), corresponding to 30% negative elements and 70% positive elements. In each plot, the ground truth is fixed and never changes, while our script generated 10 random real values in the [0, 1] interval 10,000 times: each time, our script calculated the resulting ROC AUC and normMCC, which correspond to a single point in the plot. The ground truth values and the predictions are the same as those of Fig. 10. TPR: true positive rate, sensitivity, recall (Eq. 1). TNR: true negative rate, specificity (Eq. 2). ROC AUC: area under the receiver operating characteristic curve. MCC: Matthews correlation coefficient (Eq. 7). normMCC: normalized MCC (Eq. 8). ROC AUC, normMCC, specificity, and sensitivity range from 0 (minimum and worst value) to 1 (maximum and best value). Blue line: regression line made with smoothed conditional means.
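The simulation described in the caption can be sketched in a few lines. The following is a minimal Python illustration, not the authors' R script: the function names (`roc_auc`, `threshold_metrics`, `simulate`), the fixed random seed, the use of `>=` when applying \(\tau\), and the convention of setting MCC to 0 when its denominator is zero are all assumptions made for this example. The ROC AUC is computed through its pairwise-ranking interpretation (the fraction of positive-negative pairs ranked correctly, with ties counted as one half).

```python
import math
import random


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def threshold_metrics(labels, scores, tau=0.5):
    """Return (normMCC, TPR, TNR) after binarizing the scores at tau."""
    # Assumption: a score equal to tau is predicted positive.
    preds = [1 if s >= tau else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: MCC is set to 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    norm_mcc = (mcc + 1) / 2
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    return norm_mcc, tpr, tnr


def simulate(labels, n_runs=10_000, tau=0.5, seed=0):
    """Draw n_runs random prediction vectors against a fixed ground truth."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    points = []
    for _ in range(n_runs):
        scores = [rng.random() for _ in labels]
        points.append((roc_auc(labels, scores),
                       *threshold_metrics(labels, scores, tau)))
    return points  # each tuple: (ROC AUC, normMCC, TPR, TNR)


if __name__ == "__main__":
    ground_truths = {
        "negatively imbalanced": [0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
        "balanced": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
        "positively imbalanced": [0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    }
    for name, labels in ground_truths.items():
        points = simulate(labels)
        print(name, len(points), "points, first:", points[0])
```

Each tuple returned by `simulate` corresponds to one point in a panel: the first element is the x-coordinate (ROC AUC), and the remaining three are the y-coordinates for the normMCC, TPR, and TNR plots respectively.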
