LLaVA-Image-Description-Generator

This project uses LLaVA, an end-to-end trained large multimodal model that understands images together with textual instructions, to generate descriptions of images.

Overview

The LLaVA Image Description Generator generates a text description of an uploaded image. A Streamlit app passes the image and a text prompt to the LLaVA model, run locally through Ollama, and displays the description the model returns.

Getting Started

Follow these instructions to set up and run the project locally.

Prerequisites

Before starting, ensure you have the following installed:

  • Python 3.8.19
  • pip, to install the Python libraries listed in requirements.txt

Installation

  1. Clone the repository:
git clone https://github.com/yugeshsivakumar/LLaVA-Image-Description-Generator.git
cd LLaVA-Image-Description-Generator
  2. Install dependencies:
pip install -r requirements.txt
  3. Install the Ollama application from https://ollama.com/, following the instructions on that site.
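Once the installation steps are done, a quick check can confirm the environment before moving on. The following is a minimal sketch and not part of the repository: the function names are hypothetical, the minimum version of 3.8 is taken from the prerequisites, and it assumes the Ollama installer places an `ollama` executable on your PATH.

```python
import shutil
import sys


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def ollama_on_path():
    """Return the path of the `ollama` executable, or None if it is not found.

    Assumption: the Ollama installer adds `ollama` to PATH.
    """
    return shutil.which("ollama")


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("ollama executable:", ollama_on_path() or "not found on PATH")
```

If the second line reports "not found on PATH", reinstalling Ollama or opening a new terminal is usually the first thing to try.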

Usage

To use the LLaVA Image Description Generator:

  1. Download (on first run) and start the LLaVA model from a command prompt:
ollama run llava
  2. Run the Streamlit app:
streamlit run app.py
  3. Open the app in your browser at http://localhost:8501.
  4. Upload an image file (jpg, jpeg, or png).
  5. Enter a prompt, or keep the default prompt "Describe this image".
  6. Click "Get Description" to see the generated description.
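The same image-plus-prompt request can also be made without the web interface. The sketch below is an illustration only and may differ from how app.py talks to the model (which is not shown here). It assumes Ollama is serving on its default address http://localhost:11434 and uses Ollama's `/api/generate` endpoint, which accepts base64-encoded images. The function names `build_payload` and `describe_image` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, prompt="Describe this image", model="llava"):
    """Build the JSON body for a single, non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(image_path, prompt="Describe this image"):
    """Send an image file and a prompt to the model and return its text reply."""
    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Supplying `data` makes this a POST request.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the model pulled and Ollama running, calling `describe_image("photo.jpg")` on an image of your own (the file name is a placeholder) should return the description as a string.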

Project Structure

The project structure includes:

  • app.py: Streamlit application for interacting with the LLaVA model.
  • requirements.txt: List of Python dependencies.
  • README.md: Project documentation.

Results

Example descriptions generated by the app:

[Image: Result Image 1]
[Image: Result Image 2]

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your changes.

License

This project is licensed under the MIT License. See the LICENSE file for details.