
Sanity Check - Looking for a basic CIFAR10 hyperparameter set #331

Open
samuelemarro opened this issue Jun 30, 2024 · 6 comments
@samuelemarro
I'm running the denoising_diffusion_pytorch.py script as-is on the CIFAR10 dataset; however, the FID quickly plateaus at ~90, a far cry from both the scores reported in the DDPM/DDIM papers and those reported in other open issues (e.g. #326). Here are my hyperparameters:

from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8),
    flash_attn = True,
    dropout = 0.1
)

diffusion = GaussianDiffusion(
    model,
    image_size = 32,
    timesteps = 1000,
    sampling_timesteps = 250
)

trainer = Trainer(
    diffusion,
    './data/cifar10',
    train_batch_size = 32,
    train_lr = 8e-5,
    train_num_steps = 100000,
    gradient_accumulate_every = 2,
    ema_decay = 0.995,
    num_fid_samples = 500,
    save_and_sample_every = 1000,
    amp = False,
    calculate_fid = True
)

No matter how I tune it, I can't get the FID below ~70. Am I going crazy? I feel like I'm missing something obvious, but I can't see what.

@samuelemarro
Author

Pinging in particular previous issue openers who reported FID scores/losses on CIFAR10 (@zzz313 @DavidXie03 @chengyiqiu1121): I'd be really grateful if you could take a look and see whether there's anything obviously wrong. Thank you!

@samuelemarro
Author

Pinging @lucidrains as well: if there's something I obviously shouldn't be doing, you probably have the best shot at noticing it. Thanks!

@chengyiqiu1121

chengyiqiu1121 commented Jul 5, 2024

Hi, train_num_steps = 100000 is not enough. In my code I set train_num_steps = 700000 and got an FID of around 20. Another thing: by default the Unet in this package does not use the dropout = 0.1 that the original paper, Denoising Diffusion Probabilistic Models, calls for in Appendix B (experimental details).

@chengyiqiu1121
Copy link

chengyiqiu1121 commented Jul 5, 2024

Here is the Unet config from my code; after training, the model reaches an FID of 10.88 with the DDIM sampler:

dataset_name: cifar10
lr: 2e-4
device: cuda:0
batch: 128
epoch: 700000
unet:
  dim: 128
  dim_mults: (1, 2, 2, 2)
  dropout: 0.1
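
In case it helps, here is roughly how that YAML maps onto this repo's API (a sketch, not my exact script: I'm treating `epoch: 700000` as training steps, and the data path and sampling settings are placeholders):

```python
# Rough translation of the YAML config above into denoising_diffusion_pytorch
# calls. Argument names follow the repo's README; treat values as a sketch.
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer

model = Unet(
    dim = 128,
    dim_mults = (1, 2, 2, 2),
    dropout = 0.1
)

diffusion = GaussianDiffusion(
    model,
    image_size = 32,            # CIFAR10 resolution
    timesteps = 1000,
    sampling_timesteps = 250    # < timesteps, so sampling uses DDIM
)

trainer = Trainer(
    diffusion,
    './data/cifar10',           # placeholder path
    train_batch_size = 128,
    train_lr = 2e-4,
    train_num_steps = 700000,   # 'epoch' in the YAML is really training steps
    calculate_fid = True
)
trainer.train()
```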

@samuelemarro
Author

Thank you, I'll test it!

@Maryeon

Maryeon commented Aug 2, 2024

@samuelemarro I encountered the same problem as you, but I have found that this codebase's implementation differs from the official implementation in several ways, such as the UNet structure (channel dims, multi-head vs. single-head attention) and learning-rate warmup. I am following this repo to reproduce the results on CIFAR10. Hope it helps.
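
For the warmup part, the linear schedule can be sketched in plain Python like this (warmup_steps = 5000 is my recollection of the official DDPM code's default, so treat both that value and the helper name as assumptions):

```python
# Minimal sketch of linear learning-rate warmup. warmup_lr is an illustrative
# helper, not part of this repo; base_lr matches the CIFAR10 config above.
def warmup_lr(step, base_lr=2e-4, warmup_steps=5000):
    """Linearly ramp the learning rate from 0 to base_lr over warmup_steps,
    then hold it constant."""
    return base_lr * min(step / warmup_steps, 1.0)
```

With this repo's PyTorch optimizer you could wire the same ramp in via `torch.optim.lr_scheduler.LambdaLR(opt, lambda s: min(s / 5000, 1.0))`.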
