
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models, arxiv 2023 / CVPR 2024


SHI-Labs/Prompt-Free-Diffusion


Prompt-Free Diffusion

HuggingFace space | Framework: PyTorch | License: MIT

This repo hosts the official implementation of:

Xingqian Xu, Jiayi Guo, Zhangyang Wang, Gao Huang, Irfan Essa, and Humphrey Shi, Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models, arXiv:2305.16223.

News

  • [2023.06.20]: An SDWebUI plugin has been created; its repo is at this link
  • [2023.05.25]: Our demo is running on HuggingFace🤗
  • [2023.05.25]: Repo created

Introduction

Prompt-Free Diffusion is a diffusion model that relies only on visual inputs to generate new images. These inputs are handled by the Semantic Context Encoder (SeeCoder), which replaces the commonly used CLIP-based text encoder. SeeCoder can be reused with most public T2I models as well as with adaptive layers such as ControlNet, LoRA, and T2I-Adapter. Just drop in and play!
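Why a "drop-in" swap is possible: the denoising network only consumes a sequence of context tokens through cross-attention, so it does not care whether those tokens came from a text encoder or an image encoder. The sketch below illustrates that interface in plain Python. `ToyTextEncoder` and `ToyImageEncoder` are hypothetical stand-ins (not CLIP and not SeeCoder), and the attention omits the learned Q/K/V projections and multiple heads that real Stable Diffusion layers use.

```python
import math
from typing import Protocol, Sequence

Vector = list[float]


class ContextEncoder(Protocol):
    """Anything that turns a conditioning input into a sequence of context tokens."""

    def encode(self, source: object) -> list[Vector]: ...


def softmax(scores: Sequence[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: list[Vector], context: list[Vector]) -> list[Vector]:
    """Single-head scaled dot-product attention; keys and values are the context tokens."""
    dim = len(context[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)])
    return out


class ToyTextEncoder:
    """Hypothetical stand-in for a text encoder: one token per word."""

    def __init__(self, dim: int) -> None:
        self.dim = dim

    def encode(self, source: object) -> list[Vector]:
        return [[float(len(w) + j) for j in range(self.dim)] for w in str(source).split()]


class ToyImageEncoder:
    """Hypothetical stand-in for SeeCoder: one token per row of pixel intensities."""

    def __init__(self, dim: int) -> None:
        self.dim = dim

    def encode(self, source: object) -> list[Vector]:
        return [[sum(row) / len(row) + j for j in range(self.dim)] for row in source]


def denoise_step(latent_tokens: list[Vector], encoder: ContextEncoder, condition: object) -> list[Vector]:
    # The denoiser only sees context tokens, so swapping encoders needs no change here.
    context = encoder.encode(condition)
    return cross_attention(latent_tokens, context)
```

Usage: `denoise_step(latents, ToyTextEncoder(4), "a cat on a mat")` and `denoise_step(latents, ToyImageEncoder(4), [[0.0, 1.0], [0.5, 0.5]])` both return one 4-dimensional vector per latent token, with `denoise_step` itself untouched.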

Performance

Network

Setup

conda create -n prompt-free-diffusion python=3.10
conda activate prompt-free-diffusion
pip install torch==2.0.0+cu117 torchvision==0.15.1 --extra-index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
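After running the setup commands, a quick sanity check can confirm that the interpreter and the PyTorch build match what was requested. This is a sketch, not part of the repo; the file name (e.g. a hypothetical `check_env.py`) is up to you. It assumes the CUDA 11.7 wheel reports its version as `2.0.0+cu117`, which is how PyTorch labels builds from that index.

```python
import sys
from importlib import metadata

EXPECTED_PYTHON = (3, 10)
EXPECTED_TORCH = "2.0.0"
EXPECTED_CUDA_TAG = "cu117"


def split_local_version(version: str) -> tuple[str, str]:
    """Split a PEP 440 version like '2.0.0+cu117' into ('2.0.0', 'cu117')."""
    public, _, local = version.partition("+")
    return public, local


def check_environment() -> list[str]:
    """Return a list of human-readable mismatches; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] != EXPECTED_PYTHON:
        problems.append(f"Python {sys.version_info.major}.{sys.version_info.minor} found, expected 3.10")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        return problems + ["torch is not installed"]
    public, local = split_local_version(torch_version)
    if public != EXPECTED_TORCH:
        problems.append(f"torch {public} found, expected {EXPECTED_TORCH}")
    if local != EXPECTED_CUDA_TAG:
        problems.append(f"torch build tag '{local}' found, expected '{EXPECTED_CUDA_TAG}'")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment matches the Setup section.")
```

Reading the version through `importlib.metadata` avoids importing torch itself, so the check still works when a broken CUDA install makes `import torch` fail.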

Demo

We provide a WebUI powered by Gradio. Start the WebUI with the following command:

python app.py
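When scripting around the demo (for example, launching `python app.py` in the background and then opening a browser), it can help to wait until the server accepts connections. The sketch below assumes the WebUI listens on Gradio's default port, 7860; if `app.py` is configured differently, pass the actual host and port.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 7860, timeout: float = 60.0) -> bool:
    """Poll until a TCP server accepts connections on host:port, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)


if __name__ == "__main__":
    # 7860 is Gradio's default port; this is an assumption about how app.py launches.
    print("WebUI is up" if wait_for_port() else "WebUI did not come up in time")
```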

Pretrained models

To support the full functionality of our demo, you need the following models placed at these paths:

└── pretrained
    ├── pfd
    │   ├── vae
    │   │   └── sd-v2-0-base-autokl.pth
    │   ├── diffuser
    │   │   ├── AbyssOrangeMix-v2.safetensors
    │   │   ├── AbyssOrangeMix-v3.safetensors
    │   │   ├── Anything-v4.safetensors
    │   │   ├── Deliberate-v2-0.safetensors
    │   │   ├── OpenJouney-v4.safetensors
    │   │   ├── RealisticVision-v2-0.safetensors
    │   │   └── SD-v1-5.safetensors
    │   └── seecoder
    │       ├── seecoder-v1-0.safetensors
    │       ├── seecoder-pa-v1-0.safetensors
    │       └── seecoder-anime-v1-0.safetensors
    └── controlnet
        ├── control_sd15_canny_slimmed.safetensors
        ├── control_sd15_depth_slimmed.safetensors
        ├── control_sd15_hed_slimmed.safetensors
        ├── control_sd15_mlsd_slimmed.safetensors
        ├── control_sd15_normal_slimmed.safetensors
        ├── control_sd15_openpose_slimmed.safetensors
        ├── control_sd15_scribble_slimmed.safetensors
        ├── control_sd15_seg_slimmed.safetensors
        ├── control_v11p_sd15_canny_slimmed.safetensors
        ├── control_v11p_sd15_lineart_slimmed.safetensors
        ├── control_v11p_sd15_mlsd_slimmed.safetensors
        ├── control_v11p_sd15_openpose_slimmed.safetensors
        ├── control_v11p_sd15s2_lineart_anime_slimmed.safetensors
        ├── control_v11p_sd15_softedge_slimmed.safetensors
        └── preprocess
            ├── hed
            │   └── ControlNetHED.pth
            ├── midas
            │   └── dpt_hybrid-midas-501f0c75.pt
            ├── mlsd
            │   └── mlsd_large_512_fp32.pth
            ├── openpose
            │   ├── body_pose_model.pth
            │   ├── facenet.pth
            │   └── hand_pose_model.pth
            └── pidinet
                └── table5_pidinet.pth

All models can be downloaded from our HuggingFace link.
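Before starting the demo, a small script can report which of the files in the tree above are still missing. This is a sketch, not a repo utility; the relative paths are copied from the tree, and the default root `pretrained` assumes you run it from the repository root.

```python
import sys
from pathlib import Path

# Directory (relative to the pretrained/ root) -> expected file names, copied from the tree above.
REQUIRED = {
    "pfd/vae": ["sd-v2-0-base-autokl.pth"],
    "pfd/diffuser": [
        "AbyssOrangeMix-v2.safetensors", "AbyssOrangeMix-v3.safetensors", "Anything-v4.safetensors",
        "Deliberate-v2-0.safetensors", "OpenJouney-v4.safetensors", "RealisticVision-v2-0.safetensors",
        "SD-v1-5.safetensors",
    ],
    "pfd/seecoder": [
        "seecoder-v1-0.safetensors", "seecoder-pa-v1-0.safetensors", "seecoder-anime-v1-0.safetensors",
    ],
    "controlnet": (
        [f"control_sd15_{n}_slimmed.safetensors"
         for n in ("canny", "depth", "hed", "mlsd", "normal", "openpose", "scribble", "seg")]
        + [f"control_v11p_sd15_{n}_slimmed.safetensors"
           for n in ("canny", "lineart", "mlsd", "openpose", "softedge")]
        + ["control_v11p_sd15s2_lineart_anime_slimmed.safetensors"]
    ),
    "controlnet/preprocess/hed": ["ControlNetHED.pth"],
    "controlnet/preprocess/midas": ["dpt_hybrid-midas-501f0c75.pt"],
    "controlnet/preprocess/mlsd": ["mlsd_large_512_fp32.pth"],
    "controlnet/preprocess/openpose": ["body_pose_model.pth", "facenet.pth", "hand_pose_model.pth"],
    "controlnet/preprocess/pidinet": ["table5_pidinet.pth"],
}


def missing_models(root: str = "pretrained") -> list[str]:
    """Return the relative paths of expected model files that do not exist under root."""
    base = Path(root)
    return [f"{d}/{f}" for d, files in REQUIRED.items() for f in files if not (base / d / f).is_file()]


if __name__ == "__main__":
    missing = missing_models(sys.argv[1] if len(sys.argv) > 1 else "pretrained")
    print("\n".join(missing) if missing else "All pretrained models found.")
```

Not every file is needed for every feature; for instance, the ControlNet preprocessors only matter when you use the corresponding ControlNet.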

Tools

We also provide tools to convert pretrained models from sdwebui and the diffusers library to this codebase. To use them, modify the following files:

└── tools
    ├── get_controlnet.py
    └── model_conversion.pth

You are expected to do some custom coding to make them work (e.g., changing the hardcoded input and output file paths).
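One way to avoid editing hardcoded paths is to take them from the command line. The sketch below is not the repo's conversion script: it shows the general pattern of loading a checkpoint with `safetensors.torch.load_file`, renaming state-dict key prefixes, and saving with `safetensors.torch.save_file`. The source prefixes `model.diffusion_model.` and `first_stage_model.` are the UNet and VAE prefixes used in sdwebui/LDM-style checkpoints; the target prefixes (`unet.`, `vae.`) are hypothetical placeholders, not this codebase's real key names.

```python
import argparse

# Hypothetical mapping: left side is the sdwebui/LDM checkpoint prefix,
# right side is a placeholder for whatever prefix the target codebase expects.
PREFIX_MAP = {
    "model.diffusion_model.": "unet.",
    "first_stage_model.": "vae.",
}


def remap_keys(state_dict: dict, prefix_map: dict[str, str]) -> dict:
    """Rename keys whose prefix appears in prefix_map; drop keys that match no prefix."""
    out = {}
    for key, value in state_dict.items():
        for old, new in prefix_map.items():
            if key.startswith(old):
                out[new + key[len(old):]] = value
                break
    return out


def main() -> None:
    parser = argparse.ArgumentParser(description="Remap checkpoint keys (sketch).")
    parser.add_argument("input", help="source .safetensors checkpoint")
    parser.add_argument("output", help="destination .safetensors file")
    args = parser.parse_args()

    # Imported here so remap_keys can be used without torch/safetensors installed.
    from safetensors.torch import load_file, save_file

    state_dict = load_file(args.input)
    save_file(remap_keys(state_dict, PREFIX_MAP), args.output)


if __name__ == "__main__":
    main()
```

Dropping unmatched keys is a deliberate choice in this sketch: for example, text-encoder weights under `cond_stage_model.` would simply be left out rather than copied under a guessed name.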

Performance Anime

Citation

@article{xu2023prompt,
  title={Prompt-Free Diffusion: Taking ``Text'' out of Text-to-Image Diffusion Models},
  author={Xu, Xingqian and Guo, Jiayi and Wang, Zhangyang and Huang, Gao and Essa, Irfan and Shi, Humphrey},
  journal={arXiv preprint arXiv:2305.16223},
  year={2023}
}

Acknowledgement

Part of the code reorganizes or reimplements code from the following repositories: the official Versatile Diffusion GitHub and the ControlNet sdwebui GitHub, which were in turn greatly influenced by the official LDM GitHub and the official DDPM GitHub.
