Skip to main content


Figure 3 | BioData Mining

Figure 3

From: Caipirini: using gene sets to rank literature

Figure 3

Outline of the implemented document classification (Part III). The outline of the implemented methodology is demonstrated via an imaginary example (continued): following the initial data input and processing (see Figures 1 and 2), the next step (3. Training) is to find the best parameter C for training, based on the vectors from inputs sets A and B - once the optimal SVM configuration for the specific dataset entered by the user has been found (i.e., the value of C for which the highest five-fold cross validated accuracy can be achieved), it is used to create the final model. After the training, the SVM model is used to classify and rank set C, as well as the default test set (4. Classification). Last, result files are produced and abstracts are listed, ranked according to the predicted scores (5. Results).

Back to article page