Prediction of donor splice sites using random forest with a new sequence encoding approach

Table 4 Performance metrics of the bagging, boosting, logistic regression, kNN and naïve Bayes classifiers for all three encoding procedures under balanced and imbalanced conditions

EP	MD	Balanced							Imbalanced
		TPR	TNR	F (β = 1)	F (β = 2)	G-mean	WA	MCC	TPR	TNR	F (β = 1)	F (β = 2)	G-mean	WA	MCC
P-1	BG	0.944	0.921	0.934	0.940	0.933	0.933	0.866	0.069	0.996	0.127	0.084	0.258	0.533	0.172
	BS	0.952	0.919	0.936	0.945	0.935	0.935	0.872	0.041	0.898	0.079	0.051	0.192	0.470	0.129
	LG	0.895	0.882	0.889	0.892	0.888	0.888	0.777	0.008	0.993	0.016	0.010	0.087	0.502	0.012
	NB	0.835	0.836	0.836	0.835	0.834	0.835	0.674	0.202	0.838	0.297	0.231	0.409	0.520	0.067
	KN	0.856	0.840	0.847	0.852	0.847	0.848	0.697	0.048	0.854	0.087	0.058	0.200	0.451	0.012
P-2	BG	0.927	0.882	0.907	0.919	0.904	0.904	0.810	0.112	0.992	0.198	0.135	0.330	0.552	0.216
	BS	0.934	0.901	0.918	0.928	0.917	0.917	0.835	0.090	0.996	0.163	0.109	0.296	0.543	0.200
	LG	0.742	0.734	0.739	0.741	0.737	0.738	0.478	0.112	0.981	0.198	0.135	0.330	0.547	0.190
	NB	0.772	0.758	0.767	0.770	0.764	0.765	0.532	0.159	0.884	0.250	0.186	0.373	0.521	0.073
	KN	0.813	0.678	0.760	0.790	0.739	0.746	0.502	0.173	0.981	0.290	0.207	0.412	0.577	0.262
P-3	BG	0.924	0.904	0.915	0.920	0.914	0.914	0.828	0.125	0.991	0.220	0.151	0.351	0.558	0.230
	BS	0.941	0.898	0.922	0.933	0.920	0.920	0.841	0.095	0.995	0.171	0.115	0.305	0.545	0.205
	LG	0.813	0.775	0.798	0.807	0.793	0.794	0.589	0.120	0.983	0.210	0.144	0.342	0.551	0.202
	NB	0.784	0.761	0.775	0.780	0.771	0.772	0.547	0.178	0.945	0.289	0.210	0.410	0.562	0.196
	KN	0.795	0.700	0.756	0.778	0.742	0.747	0.501	0.065	0.989	0.120	0.080	0.247	0.527	0.142

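MD methods, EP encoding procedures, BG bagging, BS boosting, LG logistic regression, NB naïve Bayes, KN k-nearest neighbor, TPR true positive rate, TNR true negative rate, F F-measure, G-mean geometric mean of TPR and TNR, WA weighted accuracy, MCC Matthews correlation coefficient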
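The metrics in Table 4 follow standard confusion-matrix definitions. The Python sketch below is an illustration, not the authors' code. It assumes that WA is the unweighted mean of TPR and TNR, that G-mean is the geometric mean of TPR and TNR, and that the two F columns are the usual F-measure on precision and TPR with weights 1 and 2. These assumptions are consistent with the tabulated values. The helper names (`Metrics`, `f_beta`, `metrics_from_counts`) are hypothetical. The counts in the usage example (1000 true and 1000 false donor sites) are also hypothetical. They were chosen only to match the TPR and TNR of the BG / P-1 balanced row.

```python
import math
from dataclasses import dataclass


@dataclass
class Metrics:
    tpr: float     # true positive rate (sensitivity)
    tnr: float     # true negative rate (specificity)
    f1: float      # F-measure, beta = 1
    f2: float      # F-measure, beta = 2 (weights recall more than precision)
    g_mean: float  # geometric mean of TPR and TNR
    wa: float      # weighted accuracy, assumed here to be (TPR + TNR) / 2
    mcc: float     # Matthews correlation coefficient


def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom > 0 else 0.0


def metrics_from_counts(tp: float, fn: float, tn: float, fp: float) -> Metrics:
    """Compute the Table 4 metrics from confusion-matrix counts (hypothetical helper)."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return Metrics(
        tpr=tpr,
        tnr=tnr,
        f1=f_beta(precision, tpr, 1.0),
        f2=f_beta(precision, tpr, 2.0),
        g_mean=math.sqrt(tpr * tnr),
        wa=(tpr + tnr) / 2,
        mcc=mcc,
    )


if __name__ == "__main__":
    # Hypothetical balanced test set: 1000 true and 1000 false donor sites,
    # with TPR = 0.944 and TNR = 0.921 as in the BG / P-1 balanced row.
    m = metrics_from_counts(tp=944, fn=56, tn=921, fp=79)
    for name, value in vars(m).items():
        print(f"{name}: {value:.4f}")
```

With these counts, every computed metric agrees with the BG / P-1 balanced row to within 0.001. Exact agreement is not expected, because the tabulated TPR and TNR are themselves rounded and the actual test-set sizes are not given in this table. Under imbalance, precision and therefore both F-measures and MCC depend on the class ratio. The TPR, TNR, G-mean and WA values do not.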

ISSN: 1756-0381