Fig. 11 | BioData Mining

Fig. 11

From: The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification

Three use cases with results measured through MCC, ROC AUC, and the four basic rates. Positives: data of patients who survived. Negatives: data of deceased patients. MCC: Matthews correlation coefficient, with worst and minimum value \(= -1\) and best and maximum value \(= +1\). TPR: true positive rate, sensitivity, recall. TNR: true negative rate, specificity. PPV: positive predictive value, precision. NPV: negative predictive value. ROC AUC: area under the receiver operating characteristic curve. The Random Forests classifier generated real-valued predictions in the [0, 1] interval. To create the ROC curve, we used all the possible \(\tau\) cut-off thresholds, as per the ROC curve definition. To create the single confusion matrix on which to compute MCC, TPR, TNR, PPV, and NPV, we used the traditional heuristic threshold \(\tau = 0.5\): predicted values lower than 0.5 were mapped into 0s (negatives), while predicted values greater than or equal to 0.5 were mapped into 1s (positives). The resulting positives and negatives were then compared with the ground truth positives and negatives to generate a \(\tau = 0.5\) threshold confusion matrix, which we used to calculate the values of MCC, TPR, TNR, PPV, and NPV reported here. We report these values in a table format in Table S1. UC1: dataset of electronic health records of patients with hepatitis C by Tachi et al. [59]. UC2: dataset of electronic health records of patients with chronic kidney disease by Al-Shamsi and coauthors [60]. UC3: dataset of electronic health records of patients with hepatocellular carcinoma by Santos and colleagues [61].
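
The thresholding procedure described in the caption can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the eight toy predictions below are assumptions made up for illustration, and the ROC AUC is computed through the pairwise-ranking (Mann-Whitney) formulation, which is equivalent to the area under the curve obtained by sweeping all \(\tau\) thresholds.

```python
import math

def confusion_at_threshold(y_true, scores, tau=0.5):
    """Map scores >= tau to 1 (positive), scores < tau to 0 (negative),
    then count the four confusion matrix cells against the ground truth."""
    tp = tn = fp = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= tau else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def rates_and_mcc(tp, tn, fp, fn):
    """Return MCC and the four basic rates; undefined ratios become NaN."""
    def ratio(num, den):
        return num / den if den else float("nan")
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
        "TPR": ratio(tp, tp + fn),  # sensitivity, recall
        "TNR": ratio(tn, tn + fp),  # specificity
        "PPV": ratio(tp, tp + fp),  # precision
        "NPV": ratio(tn, tn + fn),
    }

def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half; threshold-free by construction."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data (not from UC1, UC2, or UC3).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

cells = confusion_at_threshold(y_true, scores, tau=0.5)
metrics = rates_and_mcc(*cells)
auc = roc_auc(y_true, scores)
print(cells, metrics, auc)
```

On this toy input the single \(\tau = 0.5\) confusion matrix gives TP = 3, TN = 3, FP = 1, FN = 1, so all four basic rates equal 0.75 and MCC is 0.5, while the ROC AUC is 0.875. The gap shows how the two kinds of metric summarize different things: MCC and the basic rates describe one concrete confusion matrix, whereas ROC AUC aggregates over every threshold.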
