Table 2 Recap of the correlations between the confusion matrix summarizing metrics and the basic rates. #: integer number. MCC: Matthews correlation coefficient (Eq. 7). BA = balanced accuracy \(= (TPR + TNR) / 2\). BM = bookmaker informedness \(= TPR + TNR - 1\). MK = markedness \(= PPV + NPV - 1\). F\(_1\) score: harmonic mean of TPR and PPV (Eq. 5). Accuracy: ratio between correctly predicted data instances and all data instances (Eq. 6). We call these four statistics “basic rates”: TPR, TNR, PPV, and NPV. We calculate MCC, BA, BM, MK, F\(_1\) score, accuracy, TPR, TNR, PPV, and NPV here with cut-off threshold \(\tau = 0.5\): real-valued predictions greater than or equal to 0.5 are mapped to 1s, and real-valued predictions smaller than 0.5 are mapped to 0s. The ROC AUC, instead, refers to all the possible cut-off thresholds, as per its definition. We published an initial version of this table as Table 4 in [4] under the Creative Commons Attribution 4.0 International License.

From: The Matthews correlation coefficient (MCC) should replace the ROC AUC as the standard metric for assessing binary classification
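As a minimal sketch of the definitions in the caption, the Python snippet below thresholds real-valued predictions at \(\tau = 0.5\) and computes the four basic rates and the summarizing metrics from the resulting confusion matrix. The function names (`confusion_counts`, `summary_metrics`) and the toy data are hypothetical and not taken from the article; the MCC expression is the standard confusion-matrix formula, assumed here to match Eq. 7, which is not reproduced in this section.

```python
from math import sqrt


def confusion_counts(y_true, y_score, tau=0.5):
    """Map scores >= tau to 1 and scores < tau to 0, then count TP, FN, TN, FP."""
    tp = fn = tn = fp = 0
    for truth, score in zip(y_true, y_score):
        pred = 1 if score >= tau else 0
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def summary_metrics(tp, fn, tn, fp):
    """Compute the basic rates and the summarizing metrics named in the caption."""
    tpr = ratio(tp, tp + fn)  # sensitivity / recall
    tnr = ratio(tn, tn + fp)  # specificity
    ppv = ratio(tp, tp + fp)  # precision
    npv = ratio(tn, tn + fn)  # negative predictive value
    # Standard MCC denominator (assumed to correspond to Eq. 7 of the article).
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "TPR": tpr, "TNR": tnr, "PPV": ppv, "NPV": npv,
        "BA": (tpr + tnr) / 2,
        "BM": tpr + tnr - 1,
        "MK": ppv + npv - 1,
        "F1": ratio(2 * tp, 2 * tp + fp + fn),  # harmonic mean of TPR and PPV
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical imbalanced toy data: 90 positives and 10 negatives.
y_true = [1] * 90 + [0] * 10
y_score = [0.9] * 89 + [0.2] * 1 + [0.8] * 9 + [0.1] * 1

tp, fn, tn, fp = confusion_counts(y_true, y_score, tau=0.5)
metrics = summary_metrics(tp, fn, tn, fp)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On this toy example, accuracy is 0.90 and the F\(_1\) score is about 0.95 because TPR and PPV are both high, while TNR is 0.10 and NPV is 0.50, so the MCC is only about 0.19. This matches the pattern the table summarizes: a high accuracy or F\(_1\) score constrains only two basic rates, whereas a high MCC requires all four to be high.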

| scenario | condition of basic rates (with \(\tau = 0.5\)) | # guaranteed high basic rates |
|---|---|---|
| high MCC\(_{\tau = 0.5}\) | high TPR, TNR, PPV, and NPV | 4 |
| high BA\(_{\tau = 0.5}\) | high TPR, TNR, and at least one of PPV and NPV | 3 |
| high BM\(_{\tau = 0.5}\) | high TPR, TNR, and at least one of PPV and NPV | 3 |
| high MK\(_{\tau = 0.5}\) | high PPV, NPV, and at least one of TPR and TNR | 3 |
| high F\(_1\) score\(_{\tau = 0.5}\) | high PPV and TPR | 2 |
| high accuracy\(_{\tau = 0.5}\) | high TPR and PPV, or high TNR and NPV | 2 |

high ROC AUC\(_{\tau = all}\) with all points above half semicircle ROC

high TPR and TNR, or at least one of TPR and TNR