The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation

Table 2 Evaluation of two classifiers A and B on separate datasets

(a) Relative CM
Classifier	Dataset	TP	FN	TN	FP	ϕ	TPR	TNR	BM	MCC
A	1	0.35	0.15	0.35	0.15	0.5	0.7	0.7	0.4	0.4
B	2	0.04	0.01	0.76	0.19	0.05	0.8	0.8	0.6	0.3

(b) Exemplary CM for a sample size of 200
Classifier	Dataset	TP	FN	TN	FP	ϕ	TPR	TNR	BM	MCC	n+	n–
A	1	70	30	70	30	0.5	0.7	0.7	0.4	0.4	100	100
B	2	8	2	152	38	0.05	0.8	0.8	0.6	0.3	10	190
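
The table does not define its column symbols. Assuming the standard definitions (ϕ as prevalence, i.e. the fraction of positive samples, and BM as bookmaker informedness), the columns can be reproduced as follows. This is a reading aid under that assumption, not a quotation from the article.

```latex
\begin{align*}
\phi &= \frac{TP + FN}{TP + FN + TN + FP}, \\
\mathrm{TPR} &= \frac{TP}{TP + FN}, \qquad
\mathrm{TNR} = \frac{TN}{TN + FP}, \\
\mathrm{BM} &= \mathrm{TPR} + \mathrm{TNR} - 1, \\
\mathrm{MCC} &= \frac{TP \cdot TN - FP \cdot FN}
                    {\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}.
\end{align*}

% Worked check for classifier B, using the counts in part (b):
\begin{align*}
\mathrm{BM}_B &= 0.8 + 0.8 - 1 = 0.6, \\
\mathrm{MCC}_B &= \frac{8 \cdot 152 - 38 \cdot 2}{\sqrt{46 \cdot 10 \cdot 190 \cdot 154}}
               = \frac{1140}{\sqrt{13\,459\,600}} \approx 0.31.
\end{align*}
```

Under these definitions the tabulated MCC of 0.3 for classifier B is the value 0.31 rounded to one decimal place.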

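In the literature, different publications compare classifiers for the same task on separate datasets. This poses a problem for the comparability of metrics that depend on prevalence, because the class balance can differ between datasets, as with the prevalences ϕ = 0.5 and ϕ = 0.05 in Table 2.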
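
The prevalence dependence can be illustrated with a minimal Python sketch. It assumes the standard definitions of TPR, TNR, BM and MCC. The function names `confusion_metrics` and `relative_cm` are hypothetical helpers introduced only for this illustration. The sketch recomputes the rows of Table 2 and then re-evaluates classifier B's rates (TPR = TNR = 0.8) on a hypothetical balanced dataset.

```python
from math import sqrt


def confusion_metrics(tp, fn, tn, fp):
    """Return prevalence, TPR, TNR, BM and MCC for one 2x2 confusion matrix.

    Works for absolute counts and for relative frequencies alike.
    """
    total = tp + fn + tn + fp
    prevalence = (tp + fn) / total
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    bm = tpr + tnr - 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal sum is zero; returning NaN is a
    # convention chosen for this sketch, not something the table prescribes.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else float("nan")
    return {"prevalence": prevalence, "TPR": tpr, "TNR": tnr, "BM": bm, "MCC": mcc}


def relative_cm(tpr, tnr, prevalence):
    """Build a relative confusion matrix (TP, FN, TN, FP) from fixed rates."""
    return (
        tpr * prevalence,
        (1 - tpr) * prevalence,
        tnr * (1 - prevalence),
        (1 - tnr) * (1 - prevalence),
    )


# Rows of Table 2(b): counts for a sample size of 200.
a = confusion_metrics(tp=70, fn=30, tn=70, fp=30)
b = confusion_metrics(tp=8, fn=2, tn=152, fp=38)

# Hypothetical scenario: classifier B's rates on a balanced dataset.
b_balanced = confusion_metrics(*relative_cm(tpr=0.8, tnr=0.8, prevalence=0.5))

for name, m in [("A", a), ("B", b), ("B, balanced", b_balanced)]:
    print(f"{name}: prevalence={m['prevalence']:.2f} BM={m['BM']:.2f} MCC={m['MCC']:.2f}")
```

With TPR and TNR held at 0.8, BM stays at 0.6 in both scenarios, while MCC is about 0.31 at ϕ = 0.05 and 0.6 at ϕ = 0.5. The balanced scenario is an illustration only and does not appear in Table 2.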

ISSN: 1756-0381