RajkumarGalaxy/Wiki-IR-ChatBot

Wiki-IR-ChatBot

A ChatBot that can converse with humans by retrieving information directly from Wikipedia.

Users are free to choose any topic of interest!

What is in this project?

This project builds an information retrieval (IR) chatbot that scrapes Wikipedia with BeautifulSoup for a topic of the user's interest and answers the user's queries by retrieving text according to a heuristic based on TF-IDF and cosine-similarity scores. The Wiki-IR-ChatBot is user-friendly: it lets users choose any topic and offers either a short, crisp response or a detailed one. It uses the NLTK library for text processing and the scikit-learn library for modeling. More details are available in the supporting blog article published on LinkedIn!
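As a rough illustration of the scraping step, the sketch below pulls paragraph text out of an HTML page with BeautifulSoup. The function name `extract_paragraphs`, the inline sample HTML, and the removal of numeric citation markers such as `[1]` are assumptions made for this example, not code taken from the repository. Fetching the live page is left out so the example runs offline.

```python
import re

from bs4 import BeautifulSoup


def extract_paragraphs(html):
    """Return the cleaned text of every non-empty <p> element in `html`.

    Hypothetical helper for illustration only.
    """
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    for p in soup.find_all("p"):
        text = p.get_text()
        # Assumption: drop numeric citation markers like "[1]".
        text = re.sub(r"\[\d+\]", "", text)
        # Collapse runs of whitespace into single spaces.
        text = " ".join(text.split())
        if text:
            paragraphs.append(text)
    return paragraphs


# Made-up sample standing in for a downloaded Wikipedia page.
sample_html = (
    "<html><body><h1>Tea</h1>"
    "<p>Tea is an aromatic <b>beverage</b>.<sup>[1]</sup></p>"
    "<p>  </p>"
    "<p>It originated in China.</p>"
    "</body></html>"
)

for paragraph in extract_paragraphs(sample_html):
    print(paragraph)
```

The resulting list of paragraphs is the kind of raw text that could then be split into sentences and scored against a user's query.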

This repository contains:

Wiki_IR_ChatBot.py - A Python script version of the ChatBot

wiki-ir-chatbot_1.ipynb - An interactive notebook version of the ChatBot (earlier versions are also available)

requirements.txt - Lists the Python libraries required to run the project

wiki_ir_chatbot_chats_x.jpg - Screenshots of some chats with this ChatBot (replace x with 1, 2, or 3)

What is a chatbot?

A ChatBot (short for chatting robot) is a kind of virtual assistant that can hold conversations with human users. Building a chatbot is one of the popular tasks in Natural Language Processing.

What is Information Retrieval?

Information Retrieval (IR for short) is the task of identifying and collecting the most relevant information from a source based on a predefined heuristic. Text is a good example of unstructured data, and it is abundant everywhere. Finding information manually in a huge collection of text is hard. Since information is usually needed quickly, good IR systems are always in demand.

Are all chatbots the same?

Chatbots fall under three common categories:

1. Rule-based chatbots
2. Retrieval-based chatbots
3. Intelligent AI chatbots

Rule-based chatbots

These bots respond to users' inputs based on pre-specified rules, which can be written, for instance, as if-elif-else statements. When writing the rules, it is important to anticipate all possible user inputs; otherwise the bot may fail to answer properly. Because they only follow fixed rules, rule-based chatbots do not possess any cognitive skills.
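To make the if-elif-else idea concrete, here is a minimal sketch of a rule-based bot. The function name `rule_based_reply`, the rules, and the reply texts are all invented for this example.

```python
def rule_based_reply(user_input):
    """Answer using a handful of hard-coded rules (illustrative only)."""
    text = user_input.lower().strip()
    if text in ("hi", "hello", "hey"):
        return "Hello! Which topic would you like to talk about?"
    elif "thank" in text:
        return "You are welcome!"
    elif text in ("bye", "quit", "exit"):
        return "Goodbye!"
    else:
        # No rule matched: the bot has nothing useful to say.
        return "Sorry, I do not understand."


print(rule_based_reply("Hello"))
print(rule_based_reply("What is a bicycle?"))
```

The second call falls through to the final `else` branch, which shows the weakness described above: any input the rule writer did not anticipate gets a canned fallback.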

Retrieval-based chatbots

These bots respond to users' inputs by retrieving the most relevant information from a given text document. Relevance is determined with Natural Language Processing and a scoring system such as the cosine-similarity score. Though these bots use NLP to converse, they lack the cognitive skills of a real human chat companion. This Wiki-IR-ChatBot falls under this category!
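The scoring idea can be sketched with scikit-learn's `TfidfVectorizer` and `cosine_similarity`. This is a simplified illustration, not the repository's code: the function name `retrieve`, the `top_k` parameter (used here as a loose stand-in for choosing between a short and a more detailed answer), and the sample sentences are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def retrieve(query, sentences, top_k=1):
    """Return up to `top_k` sentences most similar to `query`.

    Hypothetical helper: sentences that share no terms with the
    query (similarity 0) are never returned.
    """
    vectorizer = TfidfVectorizer(stop_words="english")
    # Learn the vocabulary and TF-IDF weights from the document.
    sentence_vectors = vectorizer.fit_transform(sentences)
    # Project the query into the same vector space.
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, sentence_vectors).ravel()
    # Sentence indices ordered from highest to lowest score.
    ranked = scores.argsort()[::-1][:top_k]
    return [sentences[i] for i in ranked if scores[i] > 0]


# Made-up sentences standing in for scraped Wikipedia text.
sentences = [
    "A bicycle is a human-powered vehicle with two wheels.",
    "Tea is an aromatic beverage prepared from cured leaves.",
    "Bicycle frames are commonly made of steel or aluminium.",
]

print(retrieve("What are bicycle frames made of?", sentences))
print(retrieve("What are bicycle frames made of?", sentences, top_k=2))
```

An empty result means no sentence shared any term with the query, which is the point where a bot of this kind would typically apologise instead of answering.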

Intelligent AI chatbots

These bots respond to users' inputs after understanding them, as humans do. They are built by training a machine learning model on a large dataset of human conversations, which gives them the cognitive ability to converse like a human. Popular virtual assistants such as Amazon's Alexa and Apple's Siri fall under this category. Further, most of these bots can carry a conversation based on the preceding chat texts. Conversational AI ChatBot, built by the author, employs Microsoft's DialoGPT to make intelligent conversations!

Some chats with this Wiki-IR-ChatBot

chat3

A chat on topic Bicycle

chat2

A chat on topic Tea

chat1

Happy Chatting!

Acknowledgement: Parul Pandey's article "Building a simple chatbot in python using NLTK" gave good insight into Information Retrieval modelling.

robo_chat

Image by Brett Jordan