- DrFAQ is a plug-and-play question-answering chatbot that can be applied to any organisation's text corpora.
- Designed and implemented an NLP question-answering architecture using spaCy, huggingface's BERT language model, Elasticsearch, and the Telegram Bot API, hosted on Heroku.
- 4 Mar 2021 - Transfer learning of language models, alongside an evaluation study, is currently in progress.
- 13 Dec 2019 - Implementation of 4-step question-answering methodology completed.
- Given an organisation's corpus of documents, generate a chatbot to enable natural question-answering capabilities.
When a question is asked, the following processes are performed:
- FAQ Question Matching using spaCy's Similarity - /match
- From a given list of Frequently Asked Questions (FAQs), the chatbot detects similarity to the specified question and selects the best answer from the existing list.
- NLP Question Answering using huggingface's BERT - /nlp
- If the question asked is not similar to any existing FAQ, perform question answering on the knowledge base and return the answer if the model is sufficiently confident in it.
- Answer Search using Elasticsearch - /search
- If the model is not sufficiently confident in its answer, perform a search on the document corpus and return the search results.
- Human Intervention
- If the search results are still not relevant, prompt a human to add the question-answer pair to the existing list of FAQs, or let the user speak to a human.
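
The four steps above form a fallback cascade: each stage is tried in order, and the first one that produces an acceptable result answers the question. The following is a minimal sketch of that control flow only, not DrFAQ's actual code. The names `answer_question`, `Reply`, `MATCH_THRESHOLD`, and `NLP_THRESHOLD` are hypothetical, the threshold values are placeholders, and the three stages are passed in as plain callables so the example runs without spaCy, BERT, or Elasticsearch installed. It also assumes that an empty result list is what "not relevant" means for the search step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical thresholds; the values DrFAQ actually uses are not stated here.
MATCH_THRESHOLD = 0.9
NLP_THRESHOLD = 0.5


@dataclass
class Reply:
    step: str  # which stage produced the reply: match, nlp, search, or human
    text: str


def answer_question(
    question: str,
    match_faq: Callable[[str], Tuple[Optional[str], float]],
    nlp_answer: Callable[[str], Tuple[Optional[str], float]],
    search_corpus: Callable[[str], List[str]],
) -> Reply:
    """Try each stage in order and return the first acceptable result."""
    # Step 1: FAQ question matching (/match)
    answer, score = match_faq(question)
    if answer is not None and score >= MATCH_THRESHOLD:
        return Reply("match", answer)

    # Step 2: NLP question answering on the knowledge base (/nlp)
    answer, score = nlp_answer(question)
    if answer is not None and score >= NLP_THRESHOLD:
        return Reply("nlp", answer)

    # Step 3: document search (/search); assumes no hits means "not relevant"
    hits = search_corpus(question)
    if hits:
        return Reply("search", "\n".join(hits))

    # Step 4: hand over to a human
    return Reply("human", "No relevant answer found; forwarding to a human.")


# Toy stand-ins for the real stages, for illustration only.
faqs = {"what are your opening hours?": "We are open 9am to 5pm."}


def toy_match(q: str) -> Tuple[Optional[str], float]:
    key = q.strip().lower()
    return (faqs[key], 1.0) if key in faqs else (None, 0.0)


def toy_nlp(q: str) -> Tuple[Optional[str], float]:
    return (None, 0.0)


def toy_search(q: str) -> List[str]:
    return []


reply = answer_question("What are your opening hours?", toy_match, toy_nlp, toy_search)
print(reply.step, "-", reply.text)
```

In a real deployment, the first callable could wrap spaCy's `Doc.similarity` and the second a transformers question-answering pipeline, whose result includes a confidence `score`. Injecting the stages as callables keeps the fallback order testable on its own.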
- A benchmark study of transfer learning with language models shows that:
- If a large and clean QA dataset is available, RoBERTa is the best language model.
- If only a small and unclean generated QA dataset is available, MobileBERT is the best language model.
- If the QA dataset contains many 'Who' questions, RoBERTa should be considered.
- Release DrFAQ as a pip package.
- Make an interactive demo available.
- Integrate abstractive question-answering into the methodology.
- Leverage databases and cloud services.
- explosion/spaCy - Industrial-strength Natural Language Processing (NLP) with Python and Cython
- huggingface/transformers - Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch
- elastic/elasticsearch-py - Official Python low-level client for Elasticsearch
- python-telegram-bot/python-telegram-bot - Python Wrapper for Telegram Bots
- google-research/bert - TensorFlow code and pre-trained models for BERT
- BERT - Pre-training of Deep Bidirectional Transformers for Language Understanding