Table 3 Evaluation of two classifiers A and B on the same two datasets

From: The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation

(a) Relative CM

| Classifier | Dataset | TP | FN | TN | FP | ϕ | TPR | TNR | BM | MCC |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 1 | 0.35 | 0.15 | 0.35 | 0.15 | 0.5 | 0.7 | 0.7 | 0.4 | 0.4 |
| A | 2 | 0.035 | 0.015 | 0.665 | 0.285 | 0.05 | 0.7 | 0.7 | 0.4 | 0.2 |
| B | 1 | 0.40 | 0.10 | 0.40 | 0.10 | 0.5 | 0.8 | 0.8 | 0.6 | 0.6 |
| B | 2 | 0.04 | 0.01 | 0.76 | 0.19 | 0.05 | 0.8 | 0.8 | 0.6 | 0.3 |

(b) Exemplary CM for a sample size of 200

| Classifier | Dataset | TP | FN | TN | FP | ϕ | TPR | TNR | BM | MCC | n+ | n− |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 1 | 70 | 30 | 70 | 30 | 0.5 | 0.7 | 0.7 | 0.4 | 0.4 | 100 | 100 |
| A | 2 | 7 | 3 | 133 | 57 | 0.05 | 0.7 | 0.7 | 0.4 | 0.2 | 10 | 190 |
| B | 1 | 80 | 20 | 80 | 20 | 0.5 | 0.8 | 0.8 | 0.6 | 0.6 | 100 | 100 |
| B | 2 | 8 | 2 | 152 | 38 | 0.05 | 0.8 | 0.8 | 0.6 | 0.3 | 10 | 190 |

  1. Ideally, both classifiers are evaluated on both datasets, as shown in this table. Otherwise, one should rely on metrics that are independent of the prevalence, such as BM. The Matthews correlation coefficient (MCC) may be unreliable when comparing classification results across datasets.
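
As a cross-check of the tabulated values, the following sketch recomputes the metrics from the confusion-matrix entries of panel (a), using the standard definitions: prevalence ϕ = TP + FN (for relative frequencies), TPR = TP/(TP + FN), TNR = TN/(TN + FP), BM = TPR + TNR − 1, and MCC = (TP·TN − FP·FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The function name `metrics` is my own choosing, not from the source article; the point it illustrates is the footnote's claim that BM is invariant to prevalence while MCC changes with it.

```python
import math

def metrics(tp, fn, tn, fp):
    """Compute prevalence (phi), TPR, TNR, bookmaker informedness (BM),
    and the Matthews correlation coefficient (MCC) from a 2x2 confusion
    matrix. Entries may be relative frequencies (panel a) or raw counts
    (panel b); all five metrics are invariant to the overall scale."""
    phi = (tp + fn) / (tp + fn + tn + fp)   # prevalence of the positive class
    tpr = tp / (tp + fn)                    # true positive rate (sensitivity)
    tnr = tn / (tn + fp)                    # true negative rate (specificity)
    bm = tpr + tnr - 1                      # bookmaker informedness
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return phi, tpr, tnr, bm, mcc

# Rows of panel (a): classifier, dataset, TP, FN, TN, FP
rows = [
    ("A", 1, 0.35, 0.15, 0.35, 0.15),
    ("A", 2, 0.035, 0.015, 0.665, 0.285),
    ("B", 1, 0.40, 0.10, 0.40, 0.10),
    ("B", 2, 0.04, 0.01, 0.76, 0.19),
]
for clf, ds, *cm in rows:
    phi, tpr, tnr, bm, mcc = metrics(*cm)
    print(f"{clf}/{ds}: phi={phi:.2f} TPR={tpr:.1f} TNR={tnr:.1f} "
          f"BM={bm:.1f} MCC={mcc:.2f}")
# A/1: phi=0.50 TPR=0.7 TNR=0.7 BM=0.4 MCC=0.40
# A/2: phi=0.05 TPR=0.7 TNR=0.7 BM=0.4 MCC=0.19  <- BM unchanged, MCC drops
# B/1: phi=0.50 TPR=0.8 TNR=0.8 BM=0.6 MCC=0.60
# B/2: phi=0.05 TPR=0.8 TNR=0.8 BM=0.6 MCC=0.31
# (The table rounds MCC to one decimal: 0.19 -> 0.2, 0.31 -> 0.3.)
```

Running the same rows with the raw counts of panel (b) gives identical metric values, since every term in the formulas scales linearly with the sample size.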