Fig. 2 | BioData Mining

Fig. 2

From: Changing word meanings in biomedical literature reveal pandemics and new technologies

A Without alignment, each Word2Vec model has its own coordinate space. This is a UMAP visualization of 5000 randomly sampled tokens from 5 distinct Word2Vec models trained on text published in 2010. Each data point represents a token, and its color indicates the Word2Vec model it came from. B All tokens except ‘probiotics’ are greyed out to highlight that, without alignment, each model places the token in its own separate cluster. C After the alignment step, the ‘probiotics’ vectors lie closer together in vector space, showing that tokens can be compared directly across models. D Without alignment, token distances in the global coordinate space appear vastly different. After alignment, these distances shrink, while tokens keep similar distances to their neighbors regardless of alignment. This boxplot shows the average distance of 100 randomly sampled tokens shared across every year from 2000 to 2021. The x-axis shows the groups being compared (tokens against themselves via intra-year and inter-year distances, and tokens against their corresponding neighbors). The y-axis shows the average distance for every year.
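The effect described in panels A to D can be illustrated with a small sketch. The caption does not name the alignment method, so this example assumes orthogonal Procrustes, one common way to align embedding spaces. The data are synthetic, and the names `orthogonal_procrustes_align`, `cosine_distance`, `model_a`, and `model_b` are hypothetical, not taken from the article.

```python
import numpy as np


def orthogonal_procrustes_align(source, target):
    """Rotate `source` onto `target` using the orthogonal matrix R that
    minimizes ||source @ R - target||_F (assumed alignment method)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    return source @ rotation


def cosine_distance(a, b):
    """Row-wise cosine distance between two equally shaped matrices."""
    num = np.sum(a * b, axis=1)
    denom = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return 1.0 - num / denom


# Synthetic stand-ins for two embedding models over the same vocabulary.
rng = np.random.default_rng(0)
n_tokens, dim = 200, 16
model_a = rng.normal(size=(n_tokens, dim))

# model_b holds the same geometry in a different coordinate space:
# a random orthogonal transform of model_a plus a little noise.
q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
model_b = model_a @ q + 0.01 * rng.normal(size=(n_tokens, dim))

# Distance of each token to itself across models, before and after alignment.
before = cosine_distance(model_a, model_b).mean()
aligned_b = orthogonal_procrustes_align(model_b, model_a)
after = cosine_distance(model_a, aligned_b).mean()

# Distance between two tokens inside one model is unchanged by the rotation.
neighbor_before = np.linalg.norm(model_b[0] - model_b[1])
neighbor_after = np.linalg.norm(aligned_b[0] - aligned_b[1])

print(f"same-token distance before alignment: {before:.3f}")
print(f"same-token distance after alignment:  {after:.3f}")
print(f"neighbor distance preserved: {np.isclose(neighbor_before, neighbor_after)}")
```

Because the fitted transform is orthogonal, it changes only the coordinate system: the same token in two models moves close together, while distances between a token and its neighbors within a model stay the same.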
