
Optimized Stable Diffusion modified to run on lower GPU VRAM for image-to-image translation across large domain gaps


Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation

Project Page | ACM MM | arXiv | Dataset

This repo is a fork of a modified version of Stable Diffusion, optimized to use less VRAM than the original. The original repo can be found here.

Installation

git clone https://github.com/alexmartin1722/Revive-2I.git
cd Revive-2I
conda env create -f environment.yaml
conda activate ldm
pip install transformers==4.19.2 diffusers invisible-watermark
pip install timm scikit-learn 

Weights

The weights used in this paper are the stable-diffusion-v1-4 downloadable from HuggingFace.

Once the weights are downloaded they can be placed in models/ldm/stable-diffusion-v1/model.ckpt or you can specify the path to the weights using the --ckpt argument.
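As a minimal sketch of the step above (assuming the default path the scripts expect), the checkpoint location can be prepared like this; the file name model.ckpt is the default the code looks for:

```shell
# Create the default checkpoint directory expected by the scripts
mkdir -p models/ldm/stable-diffusion-v1

# After downloading the sd-v1-4 checkpoint from HuggingFace (access may
# require accepting the model license on the model page), move it into
# place under the expected name, e.g.:
#   mv sd-v1-4.ckpt models/ldm/stable-diffusion-v1/model.ckpt
```

Alternatively, keep the checkpoint wherever you downloaded it and point the scripts at it with --ckpt.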

Dataset

Once access to the dataset is granted, it can be downloaded from Google Drive. After downloading, place it inside this directory under data/. For example, the dog dataset would be in data/skull2dog.

Usage

All code can be run out of optimized_txt_guid_i2i.py. There are two options for running the code:

  1. Single image translation
python optimizedSD/optimized_txt_guid_i2i.py "prompt" --source-img <IMG> 
  2. Batch image translation
python optimizedSD/optimized_txt_guid_i2i.py "prompt" --source-img-dir <DIR>

The code used to generate the results in the paper is:

python optimizedSD/optimized_txt_guid_i2i.py "class" --source-img-dir data/skull2dog/testA/ --ddim_steps 100 --strength 0.95 --seed 42

Other built-in prompting options are:

  1. "short class" for just the ImageNet class name
  2. "no photo" for the prompt "class name head"

Evaluation

To evaluate the code, first classify the images in the output folder:

python classification/classifier.py --image-dir <DIR> --output <DIR>/labels.csv

Then run the evaluation script on that directory:

python eval/scores.py --generated-dir <DIR> --target-dir data/skull2dog/testB --class-csv <DIR>/labels.csv

You can also skip the classification step by providing your own HuggingFace API key, but if you hit rate limits, use the classification script instead.

python eval/scores.py --generated-dir <DIR> --target-dir data/skull2dog/testB --api-key XXX

Citation

When citing this dataset, please cite the following papers:

The living animals:

@misc{choi2018stargan,
      title={StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation}, 
      author={Yunjey Choi and Minje Choi and Munyoung Kim and Jung-Woo Ha and Sunghun Kim and Jaegul Choo},
      year={2018},
      eprint={1711.09020},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

The skulls and anything else:

@inproceedings{Martin_2023,
	doi = {10.1145/3581783.3612708},
	url = {https://doi.org/10.1145%2F3581783.3612708},
	year = {2023},
	month = {oct},
	publisher = {{ACM}},
	author = {Alexander Martin and Haitian Zheng and Jie An and Jiebo Luo},
	title = {Jurassic World Remake: Bringing Ancient Fossils Back to Life via Zero-Shot Long Image-to-Image Translation},
	booktitle = {Proceedings of the 31st {ACM} International Conference on Multimedia}
}
