
DreamBooth LoRA: Can I get similar model performance in shorter time for a multi gpu setup (train_network.py)? #2926

Open
agarwalml opened this issue Oct 26, 2024 · 2 comments


agarwalml commented Oct 26, 2024

Hi, I'm trying to train a DreamBooth LoRA for SD1.5 faster on 4 GPUs compared to 1 GPU. Would love any and all help!

I'm using code from the commit https://github.com/agarwalml/kohya_ss/commit/6c69b893e131dc428a21411f6212ee58a0819d30

With one GPU and around 1600 steps, I get a good LoRA trained with the parameters below (taken from the output json). However, this takes around 20 minutes, so I wanted to cut the time down as much as possible using multiple GPUs. I thought simply increasing the number of GPUs in the accelerate command (num_processes 4) would do the trick, but from multiple sources online it doesn't seem to work that way, and training takes a similar amount of time.

I saw a source ( https://www.pugetsystems.com/labs/hpc/multi-gpu-sd-training/ ) which claimed that if I set max_train_epochs = 1 in the toml together with num_processes 4 (4 GPUs) in the accelerate command, I could achieve the effect I wanted, but all this does is reduce the number of train_steps from 1600 to 275, and the resulting model doesn't reach my desired level of LoRA quality (checked with sample images).

Is it possible to cut down LoRA training time while maintaining quality using multiple GPUs? If so, what settings should I change to make this possible?

Note: Sorry for the long post, I wanted to provide all the required context.

Parameters:

{
  "LoRA_type": "Standard",
  "LyCORIS_preset": "full",
  "adaptive_noise_scale": 0,
  "additional_parameters": "",
  "async_upload": false,
  "block_alphas": "",
  "block_dims": "",
  "block_lr_zero_threshold": "",
  "bucket_no_upscale": true,
  "bucket_reso_steps": 64,
  "bypass_mode": false,
  "cache_latents": false,
  "cache_latents_to_disk": false,
  "caption_dropout_every_n_epochs": 0,
  "caption_dropout_rate": 0.1,
  "caption_extension": ".txt",
  "clip_skip": 1,
  "color_aug": false,
  "constrain": 0,
  "conv_alpha": 1,
  "conv_block_alphas": "",
  "conv_block_dims": "",
  "conv_dim": 1,
  "dataset_config": "",
  "debiased_estimation_loss": false,
  "decompose_both": false,
  "dim_from_weights": false,
  "dora_wd": false,
  "down_lr_weight": "",
  "dynamo_backend": "no",
  "dynamo_mode": "default",
  "dynamo_use_dynamic": false,
  "dynamo_use_fullgraph": false,
  "enable_bucket": true,
  "epoch": 1,
  "extra_accelerate_launch_args": "",
  "factor": -1,
  "flip_aug": false,
  "fp8_base": false,
  "full_bf16": false,
  "full_fp16": false,
  "gpu_ids": "",
  "gradient_accumulation_steps": 1,
  "gradient_checkpointing": false,
  "huber_c": 0.1,
  "huber_schedule": "snr",
  "huggingface_path_in_repo": "",
  "huggingface_repo_id": "",
  "huggingface_repo_type": "",
  "huggingface_repo_visibility": "",
  "huggingface_token": "",
  "ip_noise_gamma": 0,
  "ip_noise_gamma_random_strength": false,
  "keep_tokens": 0,
  "learning_rate": 0.0001,
  "log_tracker_config": "",
  "log_tracker_name": "",
  "log_with": "",
  "logging_dir": "",
  "lora_network_weights": "",
  "loss_type": "l2",
  "lr_scheduler": "cosine",
  "lr_scheduler_args": "",
  "lr_scheduler_num_cycles": 1,
  "lr_scheduler_power": 1,
  "lr_warmup": 10,
  "main_process_port": 0,
  "masked_loss": false,
  "max_bucket_reso": 2048,
  "max_data_loader_n_workers": 0,
  "max_grad_norm": 1,
  "max_resolution": "512,512",
  "max_timestep": 1000,
  "max_token_length": 75,
  "max_train_epochs": 0,
  "max_train_steps": 1600,
  "mem_eff_attn": false,
  "metadata_author": "",
  "metadata_description": "",
  "metadata_license": "",
  "metadata_tags": "",
  "metadata_title": "",
  "mid_lr_weight": "",
  "min_bucket_reso": 256,
  "min_snr_gamma": 0,
  "min_timestep": 0,
  "mixed_precision": "fp16",
  "model_list": "custom",
  "module_dropout": 0,
  "multi_gpu": false,
  "multires_noise_discount": 0.3,
  "multires_noise_iterations": 0,
  "network_alpha": 48,
  "network_dim": 96,
  "network_dropout": 0,
  "noise_offset": 0.1,
  "noise_offset_random_strength": false,
  "noise_offset_type": "Original",
  "num_cpu_threads_per_process": 2,
  "num_machines": 1,
  "num_processes": 1,
  "optimizer": "AdamW8bit",
  "optimizer_args": "",
  "output_dir": "/home/ubuntu/work/kohya_ss/outputs",
  "output_name": "redactedname",
  "persistent_data_loader_workers": false,
  "pretrained_model_name_or_path": "runwayml/stable-diffusion-v1-5",
  "prior_loss_weight": 1,
  "random_crop": false,
  "rank_dropout": 0,
  "rank_dropout_scale": false,
  "reg_data_dir": "/home/ubuntu/work/redactedname_lora/reg",
  "rescaled": false,
  "resume": "",
  "resume_from_huggingface": "",
  "sample_every_n_epochs": 0,
  "sample_every_n_steps": 50,
  "sample_prompts": "redactedname, a photo of a man",
  "sample_sampler": "euler_a",
  "save_every_n_epochs": 1,
  "save_every_n_steps": 500,
  "save_last_n_steps": 0,
  "save_last_n_steps_state": 0,
  "save_model_as": "safetensors",
  "save_precision": "fp16",
  "save_state": false,
  "save_state_on_train_end": false,
  "save_state_to_huggingface": false,
  "scale_v_pred_loss_like_noise_pred": false,
  "scale_weight_norms": 0,
  "sdxl": false,
  "sdxl_cache_text_encoder_outputs": false,
  "sdxl_no_half_vae": false,
  "seed": 0,
  "shuffle_caption": true,
  "stop_text_encoder_training_pct": 0,
  "text_encoder_lr": 5e-05,
  "train_batch_size": 2,
  "train_data_dir": "/home/ubuntu/work/redactedname_lora/img",
  "train_norm": false,
  "train_on_input": true,
  "training_comment": "",
  "unet_lr": 0.0001,
  "unit": 1,
  "up_lr_weight": "",
  "use_cp": false,
  "use_scalar": false,
  "use_tucker": false,
  "v2": false,
  "v_parameterization": false,
  "v_pred_like_loss": 0,
  "vae": "",
  "vae_batch_size": 0,
  "wandb_api_key": "",
  "wandb_run_name": "",
  "weighted_captions": false,
  "xformers": "xformers"
}

Note the img folder contains the folder "50_redactedname man" with 22 captioned images and the regularization folder contains the folder "1_man" with 1100 images.
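
If I'm counting right, that works out to 22 × 50 + 1100 = 2200 images per epoch, i.e. 1100 steps per epoch at batch size 2 on one GPU. Splitting the same epoch across 4 processes gives 2200 / (2 × 4) = 275, which matches the step count I mentioned above, so max_train_epochs = 1 seems to just divide one epoch among the GPUs rather than give each GPU more work.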

When I run with the standard single-GPU command

kohya_ss/venv/bin/accelerate launch --dynamo_backend "no" --dynamo_mode "default" --mixed_precision "fp16" --num_processes 1 --num_machines 1 --num_cpu_threads_per_process 2 kohya_ss/sd-scripts/train_network.py --config_file redactedname.toml

I get a LoRA trained on my person (redactedname).
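
The multi-GPU attempt is just the same launch with the process count bumped to 4 (I believe accelerate also wants the --multi_gpu flag to enable DDP, so this is roughly what I ran; the toml is unchanged):

kohya_ss/venv/bin/accelerate launch --multi_gpu --dynamo_backend "no" --dynamo_mode "default" --mixed_precision "fp16" --num_processes 4 --num_machines 1 --num_cpu_threads_per_process 2 kohya_ss/sd-scripts/train_network.py --config_file redactedname.toml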

Note: as mentioned above, I'm using code from the commit https://github.com/agarwalml/kohya_ss/commit/6c69b893e131dc428a21411f6212ee58a0819d30. I haven't upgraded yet since it took some effort to get this running on my local machine and I didn't want to break things unnecessarily. Judging by the latest version of train_network.py, I don't see much difference, but I may be wrong.
Thanks for all your help!

@agarwalml
Author

Just wanted to bump this up! I think it's a really small/dumb issue I might be missing, so any help would be appreciated!

@agarwalml
Author

agarwalml commented Oct 31, 2024

Any updates @bmaltais ?

Even a short message directing me to the right resources would be much appreciated.

Thanks!
