Option for initializing the generation with an image file? #133
Comments
I'd like to say here that this sounds incredibly useful. It would allow someone to pass an image back and forth between a VQGAN+CLIP setup and Big Sleep. For this I think it would need an encoder, and this one might work: https://github.com/disanda/MTV-TSA
@illtellyoulater Yeah, sure. If you can give me a sample image and a prompt you'd like me to test on, I can build it.
@illtellyoulater Don't you think Big Sleep is a little outdated these days, with the awesome work being done by @crowsonkb?
@illtellyoulater Or maybe there's some surreal quality to Big Sleep generations that is still useful for your work? Do let me know, since I am curious!
@lucidrains I think the VQGAN+CLIP implementation from @crowsonkb generally yields higher-quality results, but the results from Big Sleep have more variety. This feature would allow the two to be combined (and I could add Big Sleep as an alternative engine for https://github.com/wolfgangmeyers/aibrush-2).
@wolfgangmeyers Ohh, got it, that is interesting (re: variety). OK, maybe I'll hack this out when I have some spare time; it shouldn't take too long.
There's actually also an improved version of Big Sleep out there, called FuseDream (https://arxiv.org/abs/2112.01573), in case people don't know.
Oops, closed by accident.
FuseDream looks awesome. Do you think the MTV-TSA project would work for this, and/or FuseDream for encoding an init image to latents?
@wolfgangmeyers Haha, I'm not familiar with MTV-TSA at all. By the way, I'm not quite sure I understand the need for an encoder in your previous comment. @illtellyoulater, was your desired feature basically this? https://github.com/lucidrains/deep-daze#priming
Let me clarify: Big Sleep selects a random starting image (mostly dog images?) and can then use a priming image that guides the image generation. That does seem to help, but I've had odd results. The priming image is similar to the image_prompts option in https://github.com/nerdyrodent/VQGAN-CLIP/blob/main/generate.py#L60. In this case, I think the ask is to be able to take an image and convert it into the BigGAN latent space, similar to the init_image option in https://github.com/nerdyrodent/VQGAN-CLIP/blob/main/generate.py#L64. The expectation would be that you could reconstruct an input image: take an input image, convert it to latents, and then convert those back into an image that is similar enough to the original to be recognizable. Artbreeder does something similar with StyleGAN, I think.

The big benefit of this is that you can provide a concrete starting point for image generation, and it becomes possible to "resume" image generation from a completely different stack like VQGAN+CLIP, GLIDE, DALL-E, etc. I have a PR up that allows resuming from another Big Sleep-generated image; it was very handy for trying many variations and picking the best one (looks like I need to fix merge conflicts).
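For concreteness, a minimal sketch of what such a projection step could look like, assuming the pytorch-pretrained-biggan package for the 256x256 generator. The stand-in target tensor, learning rate, and step count are illustrative assumptions, and plain pixel-wise MSE is known to stall in poor minima, which is why projection papers add perceptual and other losses:

```python
import torch
import torch.nn.functional as F
from pytorch_pretrained_biggan import BigGAN

# Freeze the generator; only the latent inputs get optimized.
model = BigGAN.from_pretrained('biggan-deep-256').eval()
for p in model.parameters():
    p.requires_grad = False

# Stand-in for a real init image; in practice, load your image as a
# (1, 3, 256, 256) tensor scaled to [-1, 1] to match BigGAN's output range.
target = torch.randn(1, 3, 256, 256)

latent = torch.randn(1, 128, requires_grad=True)         # noise vector z
class_logits = torch.zeros(1, 1000, requires_grad=True)  # relaxed class vector
opt = torch.optim.Adam([latent, class_logits], lr=0.05)

for step in range(500):
    opt.zero_grad()
    out = model(latent, torch.softmax(class_logits, dim=-1), truncation=1.0)
    loss = F.mse_loss(out, target)  # pixel loss only, for brevity
    loss.backward()
    opt.step()
```

The recovered latent and class vector would then stand in for the random initialization that Big Sleep's CLIP-guided optimization normally starts from.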
Looks like MTV-TSA only supports 256x256 for BigGAN, though.
Ohh, got it! Sorry, my bad. This won't be easy then; it'll require what the GAN literature calls projection, something like http://people.csail.mit.edu/minhuh/papers/pix2latent/arxiv_v2.pdf
@wolfgangmeyers And yes, you are right, an encoder would help here, but it would need to be trained concurrently with the GAN, which is something BigGAN does not have.
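As a footnote to the encoder point: the GAN-inversion literature does sometimes fit an encoder after the fact against a frozen generator, by regressing latents from the generator's own samples. A toy sketch of that idea, again assuming pytorch-pretrained-biggan, with a deliberately undersized encoder and a single fixed class purely for illustration (nothing like this ships with Big Sleep or BigGAN):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorch_pretrained_biggan import BigGAN, one_hot_from_int

# Frozen generator to invert.
G = BigGAN.from_pretrained('biggan-deep-256').eval()
for p in G.parameters():
    p.requires_grad = False

# Toy encoder; a real one would be a proper conv net (e.g. a ResNet trunk).
E = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=4), nn.ReLU(),   # 256 -> 64
    nn.Conv2d(32, 64, 4, stride=4), nn.ReLU(),  # 64 -> 16
    nn.Flatten(),
    nn.Linear(64 * 16 * 16, 128),               # predict the 128-dim z
)
opt = torch.optim.Adam(E.parameters(), lr=1e-4)

for step in range(1000):
    z = torch.randn(8, 128)
    y = torch.from_numpy(one_hot_from_int(207, batch_size=8)).float()  # fixed class
    with torch.no_grad():
        imgs = G(z, y, truncation=1.0)   # generator's own samples as training data
    loss = F.mse_loss(E(imgs), z)        # regress the latent back from the image
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Even then, the encoder's guess typically only gives a rough starting point that still needs refining with an optimization loop like the projection sketch above.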
@lucidrains: I confirm my request is exactly what @wolfgangmeyers described in the comment above.
@wolfgangmeyers this would be very cool.
@lucidrains This is getting a bit too technical for me, so from now on I will leave the space to you guys. I just hope that one way or another it can be done :) I hope that, when needed, I will still be able to help with feedback and the like.
Agreeing with @wolfgangmeyers here: VQGAN+CLIP results look better, but Big Sleep has more variety. Also, as pointed out by @lucidrains, there's a certain surreal quality to Big Sleep that keeps me fascinated.
@lucidrains I didn't know this, I will try it out!
Hi @lucidrains, @afiaka87, @DrJKL, @enricoros, and all other contributors to this amazing project.
I am trying to figure out the following:
I saw there is an "--img" flag for using an image as a prompt, but is there a way to use an image as the initializer for the generation?
If not, do you guys have any plans to implement it? I'd say this is probably one of the most important features still missing from Big Sleep, and... I would be happy to contribute to it myself, but this is not my field (yet!), so I honestly have no idea where to start...
But! If you think this could be similar to how other projects implemented it (for example deep-daze), then I could take a look at it and try to find my way around it... and I could probably get somewhere...
So yeah, let me know what you think!