Taxonomy-based data representation for data mining: an example of the magnitude of risk associated with H. pylori infection

Polaka, Inese; Razuka-Ebela, Danute; Park, Jin Young; Leja, Marcis

doi:10.1186/s13040-021-00271-w

BioData Mining

Table 2 Classification results (H. pylori positive vs negative) in clustered and raw data

Classifier	Data set	Area under ROC (95% CI)	Percent correct (95% CI)	True positive rate (95% CI)	True negative rate (95% CI)	Number of rules (95% CI)	Tree size (95% CI)	Serialized model size in bytes (95% CI)
FURIA	Clustered	0.871^a (0.87…0.873)	87.3 (87.1…87.5)	0.799^a (0.795…0.802)	0.930^a (0.928…0.932)	9.5^a (9.2…9.7)		592230^a (588,461…595,998)
FURIA	Raw	0.880^a (0.878…0.882)	86.9 (86.7…87.1)	0.824^a (0.821…0.828)	0.904^a (0.902…0.907)	11.6^a (11.2…11.9)		976093^a (970,497…981,689)
RIPPER	Clustered	0.872^a (0.87…0.874)	87.1 (86.9…87.3)	0.804^a (0.801…0.807)	0.923^a (0.921…0.925)	4.8^a (4.7…4.8)		18746^a (18,706…18,785)
RIPPER	Raw	0.881^a (0.879…0.883)	86.9 (86.7…87.2)	0.833^a (0.83…0.837)	0.897^a (0.895…0.9)	4.2^a (4.2…4.3)		25909^a (25,882…25,936)
RIDOR	Clustered	0.847 (0.845…0.849)	85.8 (85.6…86.0)	0.760^a (0.756…0.765)	0.934^a (0.931…0.936)	7.3^a (7.2…7.5)		7940^a (7796…8083)
RIDOR	Raw	0.85 (0.847…0.852)	85.6 (85.4…85.8)	0.802^a (0.797…0.807)	0.897^a (0.894…0.9)	6.4^a (6.3…6.4)		5412^a (5328…5495)
C4.5	Clustered	0.891^a (0.889…0.894)	86.9^a (86.7…87.1)	0.802^a (0.798…0.806)	0.921^a (0.918…0.923)	23.3^a (22.8…23.8)	28.6^a (28.0…29.1)	17914^a (17,819…18,010)
C4.5	Raw	0.868^a (0.865…0.87)	86.1^a (85.9…86.3)	0.826^a (0.822…0.829)	0.889^a (0.886…0.891)	26.3^a (25.5…27.1)	37.3^a (36.3…38.4)	25801^a (25,608…25,994)
CART	Clustered	0.867^a (0.865…0.869)	86.1^a (85.9…86.3)	0.786^a (0.783…0.79)	0.919^a (0.917…0.921)		6.3^a (6.2…6.5)	603873^a (603,126…604,620)
CART	Raw	0.889^a (0.887…0.891)	87.8^a (87.6…88.0)	0.845^a (0.842…0.848)	0.904^a (0.901…0.906)		5.1^a (5.1…5.2)	909785^a (908,345…911,226)
Random Forest	Clustered	0.887^a (0.885…0.889)	82.2^a (82.0…82.4)	0.736^a (0.732…0.74)	0.888^a (0.886…0.891)			4537297^a (4,532,554…4,542,040)
Random Forest	Raw	0.915^a (0.913…0.916)	85.2^a (85.0…85.4)	0.754^a (0.75…0.758)	0.927^a (0.925…0.929)			4003216^a (3,999,312…4,007,121)

^a Statistically significant difference (p < 0.05, Mann-Whitney U test)
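The footnote names the Mann-Whitney U test as the basis for the significance markers. The following is a minimal sketch of a two-sided version of that test, using the normal approximation with tie and continuity corrections. It assumes the two samples are per-run metric values for the clustered and raw data sets, which the table itself does not spell out. The function name `mann_whitney_u` and the sample values are hypothetical and are not taken from the article.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U, p), where U is the smaller of the two U statistics.
    Illustrative sketch only; not the authors' implementation.
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    # Rank sum of the first sample and the two U statistics.
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    u = min(u1, u2)
    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Variance of U with the standard correction for ties.
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    # Continuity-corrected z score, clamped at zero.
    z = max((abs(u - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    p = math.erfc(z / math.sqrt(2))  # two-sided tail probability
    return u, p


# Synthetic per-run AUC values, made up for illustration only.
clustered_auc = [0.870, 0.872, 0.869, 0.874, 0.871]
raw_auc = [0.880, 0.879, 0.882, 0.881, 0.878]
u_stat, p_value = mann_whitney_u(clustered_auc, raw_auc)
significant = p_value < 0.05
```

For small samples without ties an exact test is more accurate than the normal approximation, so a library routine is preferable for real analyses.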


ISSN: 1756-0381
