Like its name suggests, I’m trying to extend the idea of word2vec into word roots. Since English word roots typically originated from Latin or Greek languages and carry well defined meanings, analyzing word roots would be a great contribution to meaning analysis. If word roots could be vectorized in ways similar to word vectors, semantic models could be improved seamlessly. Furthermore, word roots are already well summarized in areas of linguistics, making it a good data set to work with.
Therefore, I’m developing a wdroot2vec model to represent the meaning of word roots with vectors and make the word vectors a function of these root vectors. It is included in the package Shades of Language, a set of NLP models aimed at enriching the machine understanding of languages.