LLaVA-Image-Description-Generator

This project utilizes the LLaVA model, an end-to-end trained large multimodal model designed to understand and generate content based on both visual inputs (images) and textual instructions.

Table of Contents

  • Overview
  • Getting Started
  • Usage
  • Project Structure
  • Results
  • Contributing
  • License

Overview

The LLaVA Image Description Generator project aims to generate descriptive content for uploaded images using advanced AI techniques. It leverages the LLaVA model to interpret visual inputs and textual prompts, providing accurate and context-aware descriptions.
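Concretely, the app sends the uploaded image together with a text prompt to a LLaVA model served locally by Ollama and displays the returned description. The sketch below illustrates that round trip through Ollama's HTTP generate endpoint; the describe_image helper is a hypothetical name for illustration, not necessarily the code in app.py:

    import base64
    import requests

    def describe_image(image_path, prompt="Describe this image"):
        # Ollama expects images as base64-encoded strings
        with open(image_path, "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("utf-8")

        # Query the local Ollama server (default port 11434)
        response = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": "llava",
                "prompt": prompt,
                "images": [image_b64],
                "stream": False,  # ask for one complete response
            },
            timeout=120,
        )
        response.raise_for_status()
        return response.json()["response"]

    print(describe_image("example.jpg"))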

Getting Started

Follow these instructions to set up and run the project locally.

Prerequisites

Before starting, ensure you have the following installed:

  • Python 3.8.19
  • The Python libraries listed in requirements.txt (installed in the next step)

Installation

  1. Clone the repository:
git clone https://github.com/yugeshsivakumar/LLaVA-Image-Description-Generator.git
cd LLaVA-Image-Description-Generator
  2. Install dependencies:
pip install -r requirements.txt
  3. Install the Ollama application from https://ollama.com/ following the instructions on their website; a quick connectivity check is sketched after this list.
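Once Ollama is installed and running, you can confirm the local server is reachable before moving on. A minimal check, assuming Ollama's default port of 11434:

    import requests

    try:
        # The Ollama root endpoint answers "Ollama is running" when the server is up
        response = requests.get("http://localhost:11434", timeout=5)
        print(response.text)
    except requests.exceptions.ConnectionError:
        print("Ollama is not reachable; start the Ollama application first.")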

Usage

To use the LLaVA Image Description Generator:

  1. Pull and start the LLaVA model from the command prompt (the first run downloads the model):
ollama run llava
  2. Run the Streamlit app:
streamlit run app.py
  3. Access the app in your browser at http://localhost:8501.

  4. Upload an image file (jpg, jpeg, png).

  5. Enter a prompt or use the default prompt "Describe this image".

  6. Click "Get Description" to see the generated description. A sketch of the app logic behind these steps follows below.
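Internally, these steps map to a straightforward Streamlit flow: an upload widget, a prompt field, and a button that forwards both to the model. Below is a minimal sketch of what app.py plausibly does, using the ollama Python client; the exact wiring is an illustrative assumption rather than the repository's verbatim code:

    import ollama
    import streamlit as st

    st.title("LLaVA Image Description Generator")

    # Match the usage steps: upload an image, enter a prompt, click a button
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    prompt = st.text_input("Prompt", value="Describe this image")

    if uploaded is not None:
        st.image(uploaded)
        if st.button("Get Description"):
            # Forward the raw image bytes and the prompt to the local LLaVA model
            result = ollama.generate(
                model="llava",
                prompt=prompt,
                images=[uploaded.getvalue()],  # the client accepts raw bytes
            )
            st.write(result["response"])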

Project Structure

The project structure includes:

  • app.py: Streamlit application for interacting with the LLaVA model.
  • requirements.txt: List of Python dependencies.
  • README.md: Project documentation.

Results

After generating descriptions, results can be visualized as follows:

Result Image 1

Result Image 2

Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your changes.

License

This project is licensed under the MIT License. See the LICENSE file for details.
