Fig. 1 | BioData Mining

Fig. 1

From: DIVIS: a semantic DIstance to improve the VISualisation of heterogeneous phenotypic datasets


The dataset processing pipeline. From the list of qualitative variables, we define the list of required ontologies (one per variable). For each ontology, we retrieve relevant data when they are publicly available; otherwise, we rely on expert knowledge to build the ontology graph. We introduce distances between concepts in the ontologies based on either real-life distances or expert knowledge. These ontologies, together with the distances between their concepts, are used to build a distance matrix between variable modalities for each qualitative variable. Based on the vector of variable values that represents each individual in the dataset, we calculate pairwise distances to build a distance matrix between individuals. Individuals are then projected into a coordinate space using dimension reduction methods. The individuals' coordinates are used during the clustering process to build groups. Representative individuals are estimated for each group to define the groups' archetypes, which are used as part of the visualisations.
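
The core of the pipeline can be illustrated with a minimal sketch; it is not the authors' implementation. Several points are assumptions made for illustration only: the toy "region" ontology and its edge weights are hypothetical, the distance between two concepts is taken to be the shortest weighted path in the ontology graph, the distance between two individuals is the plain sum of per-variable distances, and the archetype is approximated by the group medoid. Dimension reduction and clustering are omitted, so the group is supplied by hand.

```python
import heapq

def concept_distances(edges):
    """All-pairs shortest-path distances in a weighted, undirected ontology graph."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    dist = {}
    for source in graph:
        # Dijkstra from each concept; fine for small ontologies.
        best = {source: 0.0}
        queue = [(0.0, source)]
        while queue:
            d, node = heapq.heappop(queue)
            if d > best[node]:
                continue  # stale queue entry
            for nxt, w in graph[node].items():
                nd = d + w
                if nd < best.get(nxt, float("inf")):
                    best[nxt] = nd
                    heapq.heappush(queue, (nd, nxt))
        dist[source] = best
    return dist

def individual_distance(x, y, modality_dist):
    """Sum of per-variable semantic distances (an assumed aggregation rule)."""
    return sum(modality_dist[var][x[var]][y[var]] for var in modality_dist)

def medoid(group, individuals, modality_dist):
    """Member with the smallest total distance to its group (assumed archetype)."""
    return min(
        group,
        key=lambda i: sum(
            individual_distance(individuals[i], individuals[j], modality_dist)
            for j in group
        ),
    )

# Hypothetical ontology for one qualitative variable, with expert-style weights.
region_edges = [
    ("World", "Europe", 2.0),
    ("World", "Asia", 2.0),
    ("Europe", "France", 1.0),
    ("Europe", "Spain", 1.0),
    ("Asia", "Japan", 1.0),
]
modality_dist = {"region": concept_distances(region_edges)}

# Hypothetical individuals, each described by one modality per variable.
individuals = {
    "a": {"region": "France"},
    "b": {"region": "Spain"},
    "c": {"region": "Japan"},
    "d": {"region": "Europe"},
}

# Distance matrix between individuals, stored as nested dictionaries.
matrix = {
    i: {j: individual_distance(individuals[i], individuals[j], modality_dist)
        for j in individuals}
    for i in individuals
}

# Group supplied by hand here; in the pipeline it would come from clustering.
archetype = medoid(["a", "b", "d"], individuals, modality_dist)
```

In this toy setting, two modalities under the same parent concept end up closer to each other than to a modality in another branch, which is the property a semantic distance is meant to add over a simple equal-or-different comparison of qualitative values.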
