Databases and Training
======================

Sugaroid uses an ``sqlite3``-type database for portability. All
responses are explicitly saved to it, and sugaroid is trained on them.
Sugaroid has two types of training:

1. Supervised training
2. Unsupervised training

Supervised training
-------------------

Supervised training uses a list of proper responses, most commonly
collected from the Stanford Question Answering Dataset (Natural)
(SQuAD 2.0 from Stanford NLP, attribution to Rajpurkar & Jia et al. ’18).
Other responses are manually trained from interactions during testing.
All the responses are saved to ``~/.config/sugaroid/sugaroid.db``, which
is opened in read-only mode in production to prevent people from
tampering with the dataset. During local testing, it is possible to
teach sugaroid a sequence of responses, and these will be appended to the
SQL database. Training uses the Naive Bayes algorithm.

Unsupervised Training
---------------------

Unsupervised training uses a community-collected dataset. The data comes
from the community, through the hosted sugaroid.srevinsaju.me instance
on Microsoft Azure, with the frontend on AWS. This data is also appended
to the SQL database, like
`Supervised Training <#supervised-training>`__, but it is saved with a
lower confidence (``0.1 * confidence_from_statement``), as data from the
community still needs to undergo refining.

``sqlite3``
-----------

Sugaroid's database backend is ``sqlite3``, rather than the conventional
MySQL or MariaDB adapters. ``sqlite3`` was chosen for its portability
alone. Despite the higher IO operations on ``sqlite3``, community data
collection becomes easier because an ``sqlite3`` database is, more or
less, a single file. Another problem it solves is the different ways in
which operating systems treat case. (On Windows, ``mysql`` is case
insensitive, but on GNU/Linux/UNIX it is case sensitive.) Using
``sqlite3`` keeps this behavior consistent across platforms.

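The storage behavior described above (a single-file database, a lower
confidence for community data, and read-only access in production) can
be sketched with Python's standard ``sqlite3`` module. This is only an
illustration under stated assumptions: the ``statement`` table, its
columns, and the helper names ``save_response`` and ``open_read_only``
are hypothetical and are not taken from Sugaroid's source. Only the
``0.1`` factor comes from the text.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema, for illustration only; Sugaroid's real tables may differ.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS statement "
    "(text TEXT, in_response_to TEXT, confidence REAL)"
)

# From the text: community data is saved as 0.1 * confidence_from_statement.
COMMUNITY_FACTOR = 0.1


def save_response(db_path, text, in_response_to, confidence, from_community=False):
    """Append one response; community data is stored with reduced confidence."""
    if from_community:
        confidence = COMMUNITY_FACTOR * confidence
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(SCHEMA)
            conn.execute(
                "INSERT INTO statement (text, in_response_to, confidence) "
                "VALUES (?, ?, ?)",
                (text, in_response_to, confidence),
            )
    finally:
        conn.close()


def open_read_only(db_path):
    """Open the database file read-only through an SQLite URI (mode=ro)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db = os.path.join(tmp, "example.db")
        save_response(db, "I am fine", "How are you?", 0.9)
        save_response(db, "All good here", "How are you?", 0.9, from_community=True)

        ro = open_read_only(db)
        try:
            print(ro.execute("SELECT text, confidence FROM statement").fetchall())
            ro.execute("DELETE FROM statement")  # writes are rejected
        except sqlite3.OperationalError as exc:
            print("write rejected:", exc)
        finally:
            ro.close()
```

Because the whole database is one file, the same path can be written to
during local testing and reopened with ``mode=ro`` where tampering should
be prevented.
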
Privacy policy
--------------

Sugaroid collects data from its users, which is then used for training.
On the web interface, this is done through cookies, starting with the
first response you provide to sugaroid. On the Discord adapter,
collection starts when the bot is added to your Discord channel.
However, your data is completely safe, and is not collected for training
purposes, if sugaroid is (i) self-hosted or (ii) run as a desktop or
command-line app. All data on the desktop version is still appended to
your own configuration folder, which is, for example,
``~/.config/sugaroid/sugaroid.db`` on Linux and
``C:\Users\foobar\AppData\Local\sugaroid\sugaroid.db`` on Windows.

Note: the ``AppData`` folder is normally hidden on Windows; enable
"Show all hidden folders" to see it.

Investigating data from the database
------------------------------------

There are cases when you would like to analyze the data stored in the
database, or do some debugging. In all such cases, the path to
``sugaroid.db`` is useful. All you need is an ``sqlite3`` binary, which
is available for all platforms. Download the ``sqlite3`` binary for your
platform, and then start investigating with:

.. code:: bash

   $ sqlite3 ~/.config/sugaroid/sugaroid.db

This opens a prompt where you can enter SQL statements and ``sqlite3``
dot-commands.

Including the main database, ``sugaroid`` stores data in the following
files:

* ``~/.config/sugaroid/sugaroid.db``
* ``~/.config/sugaroid/sugaroid.trainer.json``
* ``~/.config/sugaroid/sugaroid_internal.db``
* ``~/.config/sugaroid/data.json``

Along with SQL, JSON files are also used, for configuration alone.
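
The same kind of investigation can be done from Python instead of the
``sqlite3`` binary. The sketch below makes no assumption about
Sugaroid's table names: it only reads SQLite's built-in
``sqlite_master`` catalog and counts the rows in each table it finds.
The helper name ``summarize`` is hypothetical, and the path in the
example is the Linux location mentioned above.

```python
import sqlite3
from pathlib import Path


def summarize(db_path):
    """Return {table_name: row_count} for every table in an SQLite file."""
    # Open read-only so that inspection can never modify the data.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        summary = {}
        for name in names:
            # Quote the identifier, since table names cannot be bound as parameters.
            quoted = '"' + name.replace('"', '""') + '"'
            summary[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return summary
    finally:
        conn.close()


if __name__ == "__main__":
    path = Path("~/.config/sugaroid/sugaroid.db").expanduser()
    if path.exists():
        for table, rows in summarize(path).items():
            print(f"{table}: {rows} rows")
    else:
        print(f"No database found at {path}")
```

The same function can be pointed at ``sugaroid_internal.db`` to compare
the two database files.
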