Jensen-Shannon Distance
The Jensen-Shannon Distance (JSD) is the last and most complex algorithm used inside the sugaroid bot. The equation for the Jensen-Shannon Distance is not used directly within sugaroid.
Because this is a complex and CPU-intensive computation, sugaroid delegates it to spaCy, a natural language processing library built for industrial-scale processing. spaCy handles it efficiently by loading data from the en_core_web_sm and en_core_web_lg models.
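For reference, the Jensen-Shannon Distance between two probability distributions P and Q is the square root of the Jensen-Shannon divergence, JSD(P || Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M) with M = 0.5 * (P + Q), where KL is the Kullback-Leibler divergence. The sketch below computes it in plain Python with base-2 logarithms, so the result lies between 0 and 1. It only illustrates the formula: sugaroid delegates this work to spaCy, and the function names here are hypothetical, not part of sugaroid.

```python
import math
from typing import List, Sequence


def _normalise(weights: Sequence[float]) -> List[float]:
    """Scale non-negative weights so they sum to 1 (a probability distribution)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def kl_divergence(p: Sequence[float], q: Sequence[float], base: float = 2.0) -> float:
    """Kullback-Leibler divergence KL(p || q); terms where p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi, base) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p: Sequence[float], q: Sequence[float], base: float = 2.0) -> float:
    """Square root of the Jensen-Shannon divergence between p and q (hypothetical helper)."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    p, q = _normalise(p), _normalise(q)
    # M is the pointwise mean; m_i >= p_i / 2, so KL(p || m) never divides by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m, base) + 0.5 * kl_divergence(q, m, base)
    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(divergence, 0.0))


print(jensen_shannon_distance([0.5, 0.5], [0.5, 0.5]))  # prints 0.0 (identical distributions)
print(jensen_shannon_distance([1, 0], [0, 1]))          # prints 1.0 (disjoint distributions)
```

SciPy provides the same quantity as scipy.spatial.distance.jensenshannon; passing base=2 matches the base-2 convention used in this sketch.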
The difference between sm and lg lies in their size and vector data. en_core_web_sm weighs 7.5 MB and does not ship static word vectors, so spaCy falls back on its context-sensitive tensors when comparing texts. en_core_web_lg weighs 880 MB and adds a large table of word vectors on top of those tensors. The large model therefore gives more accurate results, because its vector data helps measure the Jensen-Shannon Distance correctly.
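Because en_core_web_lg is large, an installation may only have the small model available. The sketch below shows one hypothetical way to prefer the large model and fall back to the small one; the helper name load_language_model and the fallback order are assumptions, not documented sugaroid behaviour. It relies only on spacy.load, which raises OSError when a model package is not installed, and the loader parameter exists so the fallback logic can be exercised without spaCy.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def load_language_model(
    names: Sequence[str] = ("en_core_web_lg", "en_core_web_sm"),
    loader: Optional[Callable[[str], Any]] = None,
) -> Tuple[str, Any]:
    """Return (model_name, nlp) for the first model in `names` that loads (hypothetical helper)."""
    if loader is None:
        import spacy  # imported lazily so the fallback logic is testable without spaCy

        loader = spacy.load
    failures: List[str] = []
    for name in names:
        try:
            return name, loader(name)
        except OSError as exc:  # spacy.load raises OSError if the model package is missing
            failures.append(f"{name}: {exc}")
    raise OSError("no spaCy model could be loaded:\n" + "\n".join(failures))
```

With spaCy installed, name, nlp = load_language_model() returns the large model when it is present; nlp.vocab.vectors.shape then reports the size of the loaded model's vector table.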
The JSD is implemented internally in an nlp object called LanguageProcessor, which handles most of the complex conversations in sugaroid. The class, sugaroid.brain.utils.LanguageProcessor, has two methods: tokenize and similarity.
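The similarity method returns a single score that compares the vectors of the two given texts. spaCy's built-in similarity computes this score as the cosine similarity of the texts' averaged word vectors.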
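To make the arithmetic visible, the sketch below mimics spaCy's default similarity (cosine of averaged word vectors) with a tiny hand-written vector table. ToyLanguageProcessor and its vectors are hypothetical; this is not sugaroid's LanguageProcessor, which uses a spaCy model, and its regular-expression tokenize is far cruder than spaCy's tokenizer.

```python
import math
import re
from typing import Dict, List

Vector = List[float]


class ToyLanguageProcessor:
    """Hypothetical stand-in for a spaCy-backed processor, using a hand-written vector table."""

    def __init__(self, vectors: Dict[str, Vector]):
        if not vectors:
            raise ValueError("vector table must not be empty")
        self.vectors = vectors
        self.dim = len(next(iter(vectors.values())))

    def tokenize(self, text: str) -> List[str]:
        # Lower-case, then keep runs of letters, digits and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def _doc_vector(self, text: str) -> Vector:
        # Average the vectors of known tokens. Skipping unknown tokens does not change
        # the cosine, because zero vectors would only rescale the mean.
        known = [self.vectors[t] for t in self.tokenize(text) if t in self.vectors]
        if not known:
            return [0.0] * self.dim
        return [sum(column) / len(known) for column in zip(*known)]

    def similarity(self, a: str, b: str) -> float:
        # Cosine similarity of the averaged vectors; 0.0 when either text has no vector.
        va, vb = self._doc_vector(a), self._doc_vector(b)
        norm_a = math.sqrt(sum(x * x for x in va))
        norm_b = math.sqrt(sum(x * x for x in vb))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        dot = sum(x * y for x, y in zip(va, vb))
        return dot / (norm_a * norm_b)


processor = ToyLanguageProcessor({"hello": [1.0, 0.0], "hi": [0.9, 0.1], "bye": [0.0, 1.0]})
print(processor.tokenize("Hello, world!"))   # prints ['hello', 'world']
print(processor.similarity("hello", "bye"))  # prints 0.0 (orthogonal vectors)
```

With a real spaCy pipeline loaded as nlp, the equivalent comparison is nlp(a).similarity(nlp(b)).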