Jensen Shannon Distance
The Jensen Shannon Distance is the last and most complex algorithm used inside the sugaroid bot. The equation for finding the Jensen Shannon Distance is not evaluated directly. Because this is a complex and CPU-intensive process, it is handled by spaCy, a Natural Language Processing library built for industrial use. spaCy handles this efficiently by loading data from the en_core_web_sm and en_core_web_lg models.
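As an illustration of how such a model might be loaded, the sketch below tries a list of spaCy model names in order and returns the first one that is installed. The helper name load_first_available, its loader parameter, and the choice to try en_core_web_lg before en_core_web_sm are assumptions made for this example, not sugaroid's actual code. The fallback relies on spacy.load raising OSError when a model package is missing.

```python
def load_first_available(names=("en_core_web_lg", "en_core_web_sm"), loader=None):
    """Return the first model in `names` that can be loaded.

    Hypothetical helper: `loader` defaults to spacy.load and is a parameter
    only so the fallback logic can be exercised without spaCy installed.
    """
    if loader is None:
        import spacy  # imported lazily so defining the helper needs no spaCy
        loader = spacy.load

    for name in names:
        try:
            return loader(name)
        except OSError:
            # spacy.load raises OSError when the model package is not installed
            continue

    raise RuntimeError("No model could be loaded; tried: " + ", ".join(names))
```

With spaCy and at least one of the two models installed, `nlp = load_first_available()` returns a ready-to-use pipeline.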
The two models differ in size and content. en_core_web_sm weighs 7.5 MB; it ships without word vectors and includes only context-sensitive tensors. en_core_web_lg weighs 880 MB and also ships a full table of word vectors. The larger model is more accurate because similarity is computed from these word vectors, which helps to measure the Jensen Shannon Distance correctly.
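For reference, the distance itself has a compact textbook definition: it is the square root of the Jensen Shannon divergence, which averages the Kullback-Leibler divergence of each distribution from their midpoint. The sketch below implements that definition for two discrete probability distributions using base-2 logarithms, so the result lies between 0 and 1. The function names are chosen for this example, and this is not the code path sugaroid takes, since sugaroid relies on spaCy instead of evaluating the equation itself.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Terms where p_i is 0 contribute nothing, by the usual convention.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen Shannon divergence between p and q.

    Assumes p and q are equal-length sequences of non-negative numbers
    that each sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")

    # Midpoint distribution; m_i > 0 wherever p_i > 0 or q_i > 0,
    # so the divisions inside kl_divergence are always defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))
```

Identical distributions give a distance of 0, and distributions with no overlap, such as [1, 0] and [0, 1], give a distance of 1.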
The JSD is internally implemented in an nlp object called LanguageProcessor, which handles most of the complex conversations. The class lives at sugaroid.brain.utils.LanguageProcessor and has two methods, tokenize and similarity. The similarity method returns the resultant net vector displacement of the given vectors.
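A class with this shape could be sketched as follows. This is a hypothetical illustration, not the source of sugaroid.brain.utils.LanguageProcessor: the constructor argument nlp is an assumption (any callable that turns text into an object with a similarity method, such as a loaded spaCy pipeline), and the sketch simply delegates to spaCy's Doc.similarity instead of reproducing whatever sugaroid computes internally.

```python
class LanguageProcessor:
    """Hypothetical sketch of a thin wrapper around an NLP pipeline."""

    def __init__(self, nlp):
        # `nlp` is assumed to be a callable such as the object returned by
        # spacy.load("en_core_web_lg"). It is passed in so that the class
        # does not depend on a particular model being installed.
        self.nlp = nlp

    def tokenize(self, text):
        """Run the pipeline on `text` and return the processed document."""
        return self.nlp(text)

    def similarity(self, first, second):
        """Return the similarity score between two pieces of text."""
        return self.tokenize(first).similarity(self.tokenize(second))
```

With spaCy and a model installed, `LanguageProcessor(spacy.load("en_core_web_lg"))` gives an object whose similarity method returns a float score for two input strings.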