WikiBuddy is a chatbot powered by a Large Language Model (LLM) and built on Retrieval Augmented Generation (RAG) techniques. It uses datasets collected from Wikipedia.org and Nytimes.com, combining a retrieval component with a generation model. This approach lets WikiBuddy deliver responses that are more accurate and relevant than those of traditional chatbots. The interaction flow is outlined below; a code sketch of the same flow follows the list.
- Initiation: The interaction begins with the user asking a question or making a query to the chatbot.
- WikiBuddy's Request: Upon receiving the user's query, WikiBuddy asks the Large Language Model (LLM) to provide a better-phrased question, sending it the original question along with the chat history for context.
- Better-Phrased Question: WikiBuddy receives the improved question from the LLM and uses this refined query for the next step.
- Passing to Retriever: WikiBuddy forwards the better-phrased question to the Retriever component, which has access to a repository of documents.
- Similarity Search: The Retriever conducts a similarity search within the document repository based on the provided query to identify relevant documents.
- Combining Query and Documents: WikiBuddy augments the better-phrased question with the retrieved documents, forming a contextualized input for the LLM. This enriched context provides additional information for generating a more accurate response.
- LLM Processing: The augmented input is passed to the LLM, which processes the information and generates a response based on the provided context.
- User Response: The generated answer is returned to the user, addressing their initial query.
- Continued Conversation: Users can continue the conversation by asking follow-up questions, initiating another cycle of interaction.
- Telemetry Integration: Throughout the interaction, LangSmith monitors various aspects of the chatbot's performance, including response times, usage, and traces.
- Analytics: Data collected during interactions are analyzed to gain insights into user behavior, query patterns, and overall chatbot effectiveness.
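The steps above map fairly directly onto LangChain's classic conversational retrieval chain. The sketch below is only an illustration of that flow: the embedding model, the GGUF model file, and the example question are assumptions rather than values read from this repo, and the actual wiring in main.py may differ.

```python
# Minimal sketch of the flow above (question rephrasing + retrieval + generation)
# using LangChain's classic ConversationalRetrievalChain. The embedding model and
# GGUF file name are assumptions, not taken from this repo.
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import CTransformers
from langchain.vectorstores import FAISS

# Local 4-bit Mistral 7B Instruct (GGUF) running on CPU -- assumed model file.
llm = CTransformers(
    model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    model_type="mistral",
)

# Load the FAISS index produced by ingestion.py (see the ingestion step below).
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.load_local("saved-index-faiss", embeddings)

# The chain rephrases the question using the chat history, retrieves similar
# documents, and asks the LLM to answer with that retrieved context.
chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
)

chat_history = []
result = chain({"question": "Who founded Wikipedia?", "chat_history": chat_history})
print(result["answer"])
chat_history.append(("Who founded Wikipedia?", result["answer"]))
```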
This project is intended to run a 4-bit quantized version of the open-source LLM Mistral 7B Instruct locally on CPU. To run it locally, first clone the repo and install the dependencies in requirements.txt:
pip3 install --no-cache-dir --upgrade -r requirements.txt
To create the embeddings and vector store, first run the ingestion.py
script. This script creates the embeddings and stores them in a vector store. Once the indexes are created, they are saved locally in the saved-index-faiss
folder.
Note: This script MUST be run before running main.py
and takes quite a lot of time. Alternatively, to speed things up, you can download the already-created indexes from here and paste them into the root directory. This way you do not need to run the script first.
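For orientation, ingestion.py roughly follows the usual LangChain indexing pattern sketched below. The data directory, loader, chunk sizes, and embedding model shown here are assumptions for illustration; only the saved-index-faiss output folder comes from this README.

```python
# Rough sketch of what ingestion.py does; the loaders, chunk sizes, and
# embedding model are assumptions, not read from the repo.
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

# Hypothetical location of the collected Wikipedia / NYTimes text files.
loader = DirectoryLoader("data/", glob="**/*.txt", loader_cls=TextLoader)
documents = loader.load()

# Split long articles into overlapping chunks for retrieval.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(documents)

# Embed the chunks and build a FAISS index.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(chunks, embeddings)

# Persist the index so main.py can load it without re-embedding.
vectorstore.save_local("saved-index-faiss")
```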
Once the indexes are ready, run
python3 main.py
On the first run this will download the model and store it locally; on subsequent runs the cached copy is used. Once the server is up and running it will be available at
http://127.0.0.1:8000
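Conceptually, the server side can be pictured as a FastAPI app with the Gradio UI mounted at /gradio, roughly as in the sketch below. The endpoint shape matches the /chat route described next, but the function names and internals here are illustrative assumptions, not the actual contents of main.py.

```python
# Sketch of a FastAPI app with a Gradio UI mounted at /gradio; the answer()
# helper is a placeholder for the real RAG chain built in main.py.
import gradio as gr
import uvicorn
from fastapi import FastAPI

app = FastAPI()

def answer(query: str, isAPI: str = "true") -> dict:
    # Placeholder: in the real app this would call the conversational RAG chain.
    return {"answer": f"echo: {query}"}

@app.post("/chat")
def chat(query: str, isAPI: str = "true"):
    # Simple string parameters are read from the query string by FastAPI.
    return answer(query, isAPI)

# Simple Gradio front end reusing the same answer function.
demo = gr.Interface(fn=lambda q: answer(q)["answer"], inputs="text", outputs="text")
app = gr.mount_gradio_app(app, demo, path="/gradio")

if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8000)
```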
You can test the inference by making a POST request to http://127.0.0.1:8000/chat
Query params:
- query: "input question"
- isAPI: "true"
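For example, a quick test from Python (assuming the requests package is installed):

```python
# Example request to the /chat endpoint, passing the query parameters above.
import requests

resp = requests.post(
    "http://127.0.0.1:8000/chat",
    params={"query": "Who founded Wikipedia?", "isAPI": "true"},
)
print(resp.json())
```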
The UI is created using Gradio and can be accessed at http://127.0.0.1:8000/gradio
The Swagger UI is available at http://127.0.0.1:8000/docs
LangSmith is used for logging, analytics, and monitoring. Setting up LangSmith for monitoring is straightforward:
- Create a free account
- Create a project
- Get API Key
In the project root directory, create a new file named .env,
add the following details, and re-run the application:
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
export LANGCHAIN_API_KEY="<LANGSMITH_API_KEY>"
export LANGCHAIN_PROJECT="<PROJECT NAME>"
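If the application does not pick up the .env file automatically, one common approach is to load it with python-dotenv before the LangChain code runs. This is a sketch under that assumption, not something confirmed by the repo:

```python
# Load the LangSmith variables from .env into the environment; python-dotenv
# handles the "export "-prefixed lines shown above.
import os
from dotenv import load_dotenv

load_dotenv()
print(os.getenv("LANGCHAIN_PROJECT"))  # should print your project name
```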