langchain-cohere-qdrant-doc-retrieval

This Flask backend API accepts a document in any of several formats (.txt, .docx, .pptx, .jpg, .png, .eml, .html, and .pdf) and lets you run semantic search over it in the 100+ languages supported by the Cohere multilingual API. The embeddings are stored in a Qdrant vector database.
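
To make the idea of semantic search over stored embeddings concrete, here is a minimal sketch that ranks text chunks by cosine similarity to a query vector. It is an illustration only: the three-dimensional vectors and chunk texts are made up, the function names are hypothetical, and whether this project's Qdrant collection uses cosine distance is an assumption. In the real application the vectors come from Cohere and the nearest-neighbour lookup is done by Qdrant.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_chunks(query_vector, stored):
    """Return (text, score) pairs sorted from most to least similar."""
    scored = [
        (text, cosine_similarity(query_vector, vector))
        for text, vector in stored.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (invented for illustration).
stored_chunks = {
    "invoice totals for March": [0.9, 0.1, 0.0],
    "holiday schedule": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

best_text, best_score = rank_chunks(query, stored_chunks)[0]
print(best_text)
```

The chunk whose vector points in nearly the same direction as the query is returned first, regardless of the language the text was written in, which is what makes a multilingual embedding model useful here.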

Setup

The following steps show how to run the application on macOS or Linux.

Prerequisites

Python 3
Git
virtualenv
Homebrew

Installation

Clone the repository

git clone https://github.com/menloparklab/langchain-cohere-qdrant-doc-retrieval docQA

Change into the directory

cd docQA

Create and activate a virtual environment

python3 -m venv env
source env/bin/activate

Install the required packages

pip install -r requirements.txt

Unstructured depends on detectron2, which is installed as follows:

pip install "detectron2@git+https://github.com/facebookresearch/[email protected]#egg=detectron2"

Install Homebrew

Follow the installation guide on the Homebrew website.

Install the following brew packages

brew install libmagic poppler tesseract libxml2 libxslt

Create a .env file and set the following environment variables:

cohere_api_key="insert here"
openai_api_key="insert here"
qdrant_url="insert here"
qdrant_api_key="insert here"

Replace the values with your own API keys and Qdrant URL.
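
Before starting the server, it can help to check that the .env file is complete. The following is a minimal, standard-library-only sketch; it is not how the application itself loads the file (that is not shown here), and the function names parse_env_file and missing_keys are hypothetical. It assumes the simple KEY="value" layout shown above.

```python
REQUIRED_KEYS = ("cohere_api_key", "openai_api_key", "qdrant_url", "qdrant_api_key")


def parse_env_file(path):
    """Parse KEY="value" lines from a .env file into a dict."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return required keys that are absent or still hold the placeholder."""
    return [
        key for key in REQUIRED_KEYS
        if values.get(key, "") in ("", "insert here")
    ]
```

Calling missing_keys(parse_env_file(".env")) returns an empty list when all four variables have been filled in, and otherwise lists the ones that still need a value.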

Qdrant URL and API keys

Sign up for a free Qdrant cloud account and create a new cluster. You can then obtain the qdrant_url and qdrant_api_key values used in the .env file above.

Run the application using the following command:

gunicorn app:app

Access the API endpoints

The API endpoints will be live at the following routes:

/embed
/retrieve

Conclusion

You have successfully installed and run the DocQA system on your local machine. Feel free to explore the code and make changes as per your requirements.

Connecting to a frontend

The deployed API endpoints, /embed and /retrieve, can now be called from any frontend application. Bubble users can watch this video for detailed instructions.

Include the following header in every API request: "Content-Type": "application/json"

JSON body for /embed:
{ "collection_name": "{collection_name}", "file_url": "{file_url}" }

JSON body for /retrieve:
{ "collection_name": "{collection_name}", "query": "{query}" }
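
The header and JSON bodies above can be exercised from Python with the standard library alone. This is a sketch under stated assumptions: the endpoints are assumed to accept POST requests (implied by the JSON body, not stated explicitly), the base URL is gunicorn's default bind address of 127.0.0.1:8000, and build_request and call_api are hypothetical helper names. The response format is not documented here, so the body is returned as raw text.

```python
import json
import urllib.request

# Assumption: the server was started with "gunicorn app:app" and no bind option,
# so it listens on gunicorn's default address.
BASE_URL = "http://127.0.0.1:8000"


def build_request(route, payload, base_url=BASE_URL):
    """Build a POST request carrying a JSON body and the Content-Type header."""
    return urllib.request.Request(
        url=base_url + route,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_api(route, payload, base_url=BASE_URL):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_request(route, payload, base_url)) as response:
        return response.read().decode("utf-8")
```

With the server running, call_api("/embed", {"collection_name": "my_docs", "file_url": "https://example.com/sample.pdf"}) would submit a document, and call_api("/retrieve", {"collection_name": "my_docs", "query": "What is this document about?"}) would query it. The collection name and file URL in these calls are placeholders.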

For Bubble users

Embed JSON for Bubble:
{ "collection_name": "<collection_name>", "file_url": "<file_url>" }

Retrieve JSON for Bubble:
{ "collection_name": "<collection_name>", "query": "<query>" }

Feel free to reach out on Twitter if you have any questions.
