Table 1 Datasets of ncRNAs and CDS of microalgae

From: mSRFR: a machine learning model using microalgal signature features for ncRNA classification

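| Group of microalgae | Types of sequences | Training dataset | Training dataset after balancing | Test dataset |
|---|---|---|---|---|
| Diatom | ncRNAs | 1234 | 1125^a | 308 |
| Diatom | CDS | 356 | 1125* | 88 |
| Golden algae | ncRNAs | 168 | 1125* | 41 |
| Golden algae | CDS | 60 | 1125* | 15 |
| Green algae | ncRNAs | 1973 | 1125^a | 493 |
| Green algae | CDS | 6818 | 1125^a | 1704 |
| Cyanobacteria | ncRNAs | 13,116 | 3375^a | 3280 |
| Cyanobacteria | CDS | 5448 | 3375^a | 1363 |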

^a Data generated by random selection; * Data generated by SMOTE
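
The balancing column can be read as follows: classes larger than the target size are reduced by random selection, and classes smaller than the target are enlarged with SMOTE (synthetic minority over-sampling technique). The sketch below illustrates that idea only. It is not the authors' implementation: the function names, the neighbour count `k`, the seed, and the assumption that each sequence is already a numeric feature vector are all hypothetical, and the SMOTE step is a simplified interpolation between a sample and one of its nearest neighbours.

```python
import random


def random_select(samples, target, rng):
    """Downsample: draw `target` samples without replacement."""
    return rng.sample(samples, target)


def smote_like(samples, target, k, rng):
    """Oversample to `target` by interpolating between neighbouring samples.

    Simplified SMOTE-style step (assumes at least two samples): pick a base
    sample, pick one of its k nearest neighbours, and create a synthetic
    point somewhere on the segment between them.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    while len(samples) + len(synthetic) < target:
        base = rng.choice(samples)
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: dist2(base, s),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return samples + synthetic


def balance(samples, target, k=5, seed=0):
    """Bring one class to exactly `target` samples (hypothetical helper)."""
    rng = random.Random(seed)
    if len(samples) >= target:
        return random_select(samples, target, rng)
    return smote_like(samples, target, k, rng)
```

Under these assumptions, calling `balance(features, 1125)` on the 1234 diatom ncRNA feature vectors would take the random-selection branch, while the same call on the 356 diatom CDS vectors would take the SMOTE-style branch, so both classes end at 1125 samples.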