
Table 3 Evaluation of two classifiers A and B on the same two datasets. ϕ denotes the prevalence of the positive class; n+ and n− denote the numbers of positive and negative samples

From: The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation

(a) Relative CM

Classifier  Dataset  TP     FN     TN     FP     ϕ     TPR  TNR  BM   MCC
A           1        0.35   0.15   0.35   0.15   0.5   0.7  0.7  0.4  0.4
A           2        0.035  0.015  0.665  0.285  0.05  0.7  0.7  0.4  0.2
B           1        0.40   0.10   0.40   0.10   0.5   0.8  0.8  0.6  0.6
B           2        0.04   0.01   0.76   0.19   0.05  0.8  0.8  0.6  0.3
(b) Exemplary CM for a sample size of 200

Classifier  Dataset  TP  FN  TN   FP  ϕ     TPR  TNR  BM   MCC  n+   n−
A           1        70  30  70   30  0.5   0.7  0.7  0.4  0.4  100  100
A           2        7   3   133  57  0.05  0.7  0.7  0.4  0.2  10   190
B           1        80  20  80   20  0.5   0.8  0.8  0.6  0.6  100  100
B           2        8   2   152  38  0.05  0.8  0.8  0.6  0.3  10   190
  1. Ideally, both classifiers are evaluated on both datasets, as shown in this table. Otherwise, one should rely on metrics that are independent of the prevalence, such as BM. The Matthews correlation coefficient (MCC) may be unreliable when comparing classification results across datasets
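
To make the table's point concrete, here is a minimal Python sketch (the helper name `metrics` and the row labels are mine, not from the paper) that recomputes the metrics of Table 3(b) from the raw counts using the standard definitions: TPR = TP/(TP+FN), TNR = TN/(TN+FP), BM = TPR + TNR − 1, and MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

```python
import math

def metrics(tp, fn, tn, fp):
    """Compute TPR, TNR, bookmaker informedness (BM), and the Matthews
    correlation coefficient (MCC) from a two-class confusion matrix."""
    tpr = tp / (tp + fn)                # sensitivity (true positive rate)
    tnr = tn / (tn + fp)                # specificity (true negative rate)
    bm = tpr + tnr - 1                  # bookmaker informedness
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom   # Matthews correlation coefficient
    return tpr, tnr, bm, mcc

# Rows of Table 3(b): dataset 1 is balanced (ϕ = 0.5),
# dataset 2 is imbalanced (ϕ = 0.05), sample size 200 in both cases.
rows = {
    "A, dataset 1": (70, 30, 70, 30),
    "A, dataset 2": (7, 3, 133, 57),
    "B, dataset 1": (80, 20, 80, 20),
    "B, dataset 2": (8, 2, 152, 38),
}
for name, cm in rows.items():
    tpr, tnr, bm, mcc = metrics(*cm)
    print(f"{name}: TPR={tpr:.2f} TNR={tnr:.2f} BM={bm:.2f} MCC={mcc:.2f}")
```

Running this reproduces the table (the table rounds MCC to one decimal, e.g. 0.19 → 0.2 and 0.31 → 0.3): for each classifier, TPR, TNR, and therefore BM are identical on both datasets, while MCC roughly halves as the prevalence drops from 0.5 to 0.05, which is exactly why BM, but not MCC, is suitable for comparisons across datasets with different prevalences.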