Fig. 1 | BioData Mining

From: Iteratively refining breast cancer intrinsic subtypes in the METABRIC dataset

Fig. 1

Refinement process. The process is initialized with the labels assigned by the PAM50 method. After the CM1 score is computed, the 10 most discriminative probes are selected for each subtype. This feature set is used to train 24 distinct classifiers under 10-fold cross-validation. A sample is relabelled (possibly with its previous label) if the classifiers agree in at least 50 % of the cases; otherwise it is marked as inconsistent and excluded from subsequent iterations. The iteration stops when there are no further changes in the sample labels or the selected feature set, or when the desired Fleiss’ kappa is reached. After stopping, the final feature set and sample labels are used to classify the samples previously marked as inconsistent, as well as the samples of the validation dataset. These samples are run through the same refinement procedure: inconsistent samples are reclassified and their labels are refined.
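The relabelling vote and the stopping test described in the caption can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation. The function names `relabel`, `fleiss_kappa` and `should_stop` are hypothetical. Each sample is assumed to receive exactly one predicted label from each of the classifiers (for example, from the test fold it falls in). "Agree in at least 50 % of the cases" is read as the most-voted label receiving at least half of the votes, and ties for the top label are treated as inconsistent. Fleiss' kappa is computed with samples as subjects and classifiers as raters. The stopping rule is read as "labels and feature set both unchanged, or kappa reached". CM1-based probe selection and the classifiers themselves are not shown.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional, Sequence

Label = Hashable


def relabel(votes: Sequence[Sequence[Label]], threshold: float = 0.5) -> list[Optional[Label]]:
    """Return the new label per sample, or None if the sample is inconsistent.

    votes[i] holds the labels predicted for sample i by each classifier
    (e.g. 24 votes per sample). The top label is kept only if it receives at
    least `threshold` of the votes. Treating ties as inconsistent is an
    assumption of this sketch, not something stated in the caption.
    """
    new_labels: list[Optional[Label]] = []
    for sample_votes in votes:
        ranked = Counter(sample_votes).most_common()
        top_label, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        if not tied and top_count / len(sample_votes) >= threshold:
            new_labels.append(top_label)
        else:
            new_labels.append(None)  # marked inconsistent
    return new_labels


def fleiss_kappa(votes: Sequence[Sequence[Label]]) -> float:
    """Fleiss' kappa for N samples, each labelled by the same n >= 2 classifiers."""
    n = len(votes[0])
    if n < 2 or any(len(v) != n for v in votes):
        raise ValueError("every sample needs the same number (>= 2) of votes")
    N = len(votes)
    category_totals: Counter = Counter()
    p_bar = 0.0
    for sample_votes in votes:
        counts = Counter(sample_votes)
        category_totals.update(counts)
        # Fraction of classifier pairs that agree on this sample.
        p_bar += (sum(c * c for c in counts.values()) - n) / (n * (n - 1))
    p_bar /= N
    # Expected agreement from the overall category proportions.
    p_e = sum((total / (N * n)) ** 2 for total in category_totals.values())
    if p_e == 1.0:
        # All votes fall in one category: kappa is undefined; this sketch
        # returns 1.0 by convention (an assumption).
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


def should_stop(
    old_labels: Sequence[Optional[Label]],
    new_labels: Sequence[Optional[Label]],
    old_features: Iterable[Hashable],
    new_features: Iterable[Hashable],
    kappa: float,
    target_kappa: float,
) -> bool:
    """Stop when labels and selected probes are unchanged, or kappa is reached."""
    unchanged = list(old_labels) == list(new_labels) and set(old_features) == set(new_features)
    return unchanged or kappa >= target_kappa
```

For example, with 24 votes per sample, a sample receiving 20 LumA and 4 LumB votes keeps LumA, whereas one split 12/12 between Basal and Her2 is marked inconsistent (`None`) and would be set aside until the final reclassification step.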