This project uses LLaVA (Large Language-and-Vision Assistant), an end-to-end trained large multimodal model that understands visual inputs (images) together with textual instructions.

The LLaVA Image Description Generator produces descriptive content for uploaded images. It feeds the image and a textual prompt to the LLaVA model, which returns an accurate, context-aware description.
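Under the hood, the app forwards the uploaded image and the prompt to a LLaVA model served locally by Ollama. The sketch below is one plausible way to make that call, assuming Ollama's default REST endpoint at http://localhost:11434/api/generate; the `describe_image` helper is hypothetical and not taken from the repository:

```python
# Hypothetical helper illustrating the core interaction with Ollama;
# not the actual code from app.py.
import base64

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def describe_image(image_path: str, prompt: str = "Describe this image") -> str:
    """Send an image and a prompt to the local LLaVA model and return its reply."""
    # Ollama's generate API expects images as base64-encoded strings.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    payload = {
        "model": "llava",
        "prompt": prompt,
        "images": [image_b64],
        "stream": False,  # one complete JSON response instead of a token stream
    }
    resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]
```

The Streamlit app described below exposes this same flow through a browser UI.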
Follow these instructions to set up and run the project locally.
Before starting, ensure you have the following installed:
- Python 3.8.19
- pip (used below to install the required Python libraries)
- Clone the repository:

      git clone https://github.com/yugeshsivakumar/LLaVA-Image-Description-Generator.git
      cd LLaVA-Image-Description-Generator

- Install the Python dependencies:

      pip install -r requirements.txt
- Install the Ollama application from https://ollama.com/, following the instructions on their website.
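Before launching the app, you can verify that the Ollama server is up (it listens on port 11434 by default). This quick check is illustrative only and not part of the repository:

```python
# Illustrative sanity check: confirms the Python version and that the
# local Ollama server responds. Assumes Ollama's default port 11434.
import sys
import urllib.request

assert sys.version_info >= (3, 8), "Python 3.8+ is required"

try:
    # Ollama's root endpoint replies "Ollama is running" when the server is up.
    with urllib.request.urlopen("http://localhost:11434", timeout=3) as resp:
        print(f"Ollama server reachable (HTTP {resp.status})")
except OSError as exc:
    print(f"Ollama server not reachable: {exc}")
```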
To use the LLaVA Image Description Generator:
- Pull and start the LLaVA model from a terminal (the first run downloads the model):

      ollama run llava

- Run the Streamlit app:

      streamlit run app.py
- Access the app in your browser at http://localhost:8501.
- Upload an image file (jpg, jpeg, or png).
- Enter a prompt, or keep the default prompt "Describe this image".
- Click "Get Description" to see the generated description.
The project structure includes:
- `app.py`: Streamlit application for interacting with the LLaVA model.
- `requirements.txt`: List of Python dependencies.
- `README.md`: Project documentation.
After generating a description, the result appears in the app as shown in the sample screenshots:

(Result Image 1 | Result Image 2: sample output screenshots)
Contributions are welcome! Please fork the repository and submit a pull request with your changes.
This project is licensed under the MIT License. See the LICENSE file for details.