feat: latest torch/comfyui; perf improvements; fix: SSL cert issues #309
base: main
Conversation
@CodiumAI-Agent /describe
Dockerfiles/README.md (Outdated)
If your system is set up properly (see [Prerequisites](#prerequisites)) you can just [setup](https://github.com/Haidra-Org/horde-worker-reGen?tab=readme-ov-file#configure) your bridgeData.yaml file and then run

```bash
docker compose -f Dockerfiles/compse.[cuda|rocm].yaml build --pull
```
Typo here, it should be `compose.[` instead of `compse.[`
Good catch
The docker instructions aren't working for me (Arch Linux / NVIDIA GPU):

```bash
git clone --sparse --branch raw-png https://github.com/Haidra-Org/horde-worker-reGen.git horde-worker-reGen-png
cd horde-worker-reGen-png/
git sparse-checkout set --no-cone Dockerfiles /bridgeData_template.yaml
docker compose -f Dockerfiles/compose.cuda.yaml build --pull
docker compose -f Dockerfiles/compose.cuda.yaml up -dV
```

```
reGen | [notice] A new release of pip is available: 24.0 -> 24.3.1
reGen | [notice] To update, run: pip install --upgrade pip
reGen | 2024-10-30 18:40:57.711 | DEBUG | horde_worker_regen.load_env_vars:load_env_vars_from_config:68 - Using default AI Horde URL.
reGen | 2024-10-30 18:40:57.740 | DEBUG | horde_sdk:_dev_env_var_warnings:42 - AIWORKER_CACHE_HOME is ./models/.
reGen | 2024-10-30 18:40:59.707 | DEBUG | horde_model_reference.legacy.classes.legacy_converters:write_out_records:554 - Converted database written to: /horde-worker-reGen/models/horde_model_reference/stable_diffusion.json
reGen | 2024-10-30 18:41:00.050 | DEBUG | horde_model_reference.legacy.classes.legacy_converters:write_out_records:554 - Converted database written to: /horde-worker-reGen/models/horde_model_reference/stable_diffusion.json
reGen | 2024-10-30 18:41:00.061 | WARNING | horde_worker_regen.bridge_data.data_model:validate_performance_modes:162 - High memory mode is enabled. You may experience performance issues with more than one thread.
reGen | 2024-10-30 18:41:00.061 | WARNING | horde_worker_regen.bridge_data.data_model:validate_performance_modes:167 - Please let us know if `unload_models_from_vram_often` improves or degrades performance with `high_memory_mode` enabled.
reGen | 2024-10-30 18:41:01.056 | WARNING | horde_model_reference.model_reference_records:validator_is_style_known:132 - Unknown style control_qr for model control_qr
reGen | 2024-10-30 18:41:01.056 | WARNING | horde_model_reference.model_reference_records:validator_is_style_known:132 - Unknown style control_qr_xl for model control_qr_xl
reGen | 2024-10-30 18:41:01.061 | DEBUG | horde_sdk.ai_horde_worker.model_meta:remove_large_models:155 - Removing cascade models: {'Stable Cascade 1.0'}
reGen | 2024-10-30 18:41:01.061 | DEBUG | horde_sdk.ai_horde_worker.model_meta:remove_large_models:156 - Removing flux models: {'Flux.1-Schnell fp16 (Compact)', 'Flux.1-Schnell fp8 (Compact)'}
reGen | /horde-worker-reGen/venv/lib/python3.11/site-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
reGen |   warnings.warn(
reGen | 2024-10-30 18:41:02.834 | INFO | horde_safety.deep_danbooru_model:download_deep_danbooru_model:53 - Downloading DeepDanbooru model (~614 mb) to models/clip_blip/model-resnet_custom_v3.pt.
models/clip_blip/model-resnet_custom_v3.pt: 0% 0.00/644M [00:00<?, ?iB/s]2024-10-30 18:41:03.458 | INFO | horde_safety.deep_danbooru_model:download_deep_danbooru_model:63 - Model already downloaded.
reGen | 2024-10-30 18:41:03.458 | INFO | horde_safety.deep_danbooru_model:verify_deep_danbooru_model_hash:30 - Verifying SHA256 hash of downloaded file.
models/clip_blip/model-resnet_custom_v3.pt: 0% 0.00/644M [00:00<?, ?iB/s]
reGen | Loading CLIP model ViT-L-14/openai...
reGen | /horde-worker-reGen/venv/lib/python3.11/site-packages/open_clip/factory.py:372: UserWarning: These pretrained weights were trained with QuickGELU activation but the model config does not have that enabled. Consider using a model config with a "-quickgelu" suffix or enable with a flag.
reGen |   warnings.warn(
reGen | Loaded CLIP model and data in 2.94 seconds.
reGen | 2024-10-30 18:41:06.832 | INFO | hordelib.comfy_horde:do_comfy_import:215 - Forcing normal vram mode
reGen | Traceback (most recent call last):
reGen |   File "/horde-worker-reGen/download_models.py", line 25, in <module>
reGen |     download_all_models(
reGen |   File "/horde-worker-reGen/horde_worker_regen/download_models.py", line 58, in download_all_models
reGen |     hordelib.initialise()
reGen |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/initialisation.py", line 81, in initialise
reGen |     hordelib.comfy_horde.do_comfy_import(
reGen |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/comfy_horde.py", line 229, in do_comfy_import
reGen |     import execution
reGen |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/execution.py", line 13, in <module>
reGen |     import nodes
reGen |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/nodes.py", line 21, in <module>
reGen |     import comfy.diffusers_load
reGen |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/diffusers_load.py", line 3, in <module>
reGen |     import comfy.sd
reGen |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/sd.py", line 5, in <module>
reGen |     from comfy import model_management
reGen |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/model_management.py", line 143, in <module>
reGen |     total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
reGen |                  ^^^^^^^^^^^^^^^^^^
reGen |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/model_management.py", line 112, in get_torch_device
reGen |     return torch.device(torch.cuda.current_device())
reGen |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
reGen |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 778, in current_device
reGen |     _lazy_init()
reGen |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
reGen |     torch._C._cuda_init()
reGen | RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
```
Do you have your system set up to make CUDA work at all? Ironically, getting NVIDIA to work inside Docker is not as painless as AMD, due to their custom kernel stuff.
I'm not sure what is and isn't required, since I haven't tested NVIDIA GPUs on Linux for a while, but you might need (some portion of) the CUDA tooling installed locally.
Yes. In fact, I created my own Dockerfile before I knew this branch existed, and it's running fine on my system as we speak. So I'm also stumped. I can dive a bit more into comparing the two containers to figure out what's going on.

```bash
docker run --rm --gpus all ubuntu nvidia-smi --query-gpu=name --format=csv,noheader
```

```
NVIDIA GeForce RTX 4090
```
Please do. I haven't had much to do with the creation of the
Good call. I compared the two:

```yaml
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          capabilities: [gpu]
          count: all
```
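For context, a full compose service using that GPU reservation might look like the following sketch (the service and image names are assumptions, not necessarily what this repo uses):

```yaml
services:
  reGen:                        # service name taken from the logs above
    image: horde-worker-regen   # hypothetical image name
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              count: all        # without count/device_ids, CUDA may be unavailable
```

Note that `count: all` (or an explicit `device_ids` list) is what actually exposes the GPUs; a `driver`/`capabilities` entry alone is not always enough.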
I think you can just create a small separate PR to be merged into
Done. #334
This already functionally happens in hordelib. We can use the bytestream passed from it directly.
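A minimal sketch of the idea (the helper below is hypothetical, assuming the bytestream hordelib hands back is already a fully encoded PNG):

```python
def save_png(image_bytes: bytes, path: str) -> None:
    """Write an already-encoded PNG bytestream straight to disk.

    Since the bytes are a complete PNG, there is no need to decode them
    with PIL.Image.open and re-encode; that round trip is pure overhead.
    """
    with open(path, "wb") as f:
        f.write(image_bytes)
```

The same applies when uploading: the bytes can be sent as-is rather than being re-serialized through an image library.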
* minor fixes: USER root, venv creation after clone (otherwise git complains), and COPY the setup scripts
* ninja for faster flash_attn build
* minimize diff
* rocm version via index url: `--extra-index-url https://download.pytorch.org/whl/rocmX.Y` is sufficient to install the right package. This also makes A/B version testing easier.
* build/fix: bring rocm reqs.txt in line with main reqs.txt
* include wheel: if this is not included, the flash_attn build fails on ROCm cards
* Turn requirements.rocm.txt into a symlink. This is a temporary measure for backwards compatibility; once all scripts and Dockerfiles are updated it can be removed.
* remove requirements.rocm.txt from scripts

Co-authored-by: tazlin <[email protected]>
* Sparse checkout
* Create compose.rocm.yaml
* Create compose.cuda.yaml
* Docker Compose documentation
* Syntax, fixes and clarifications
* Update ROCm version
* fix compose up
* build/fix: use env vars to control mounts w/ docker compose
* docs: warn about compose config/mount behavior
* docs: move old docker env config info to new location

Co-authored-by: tazlin <[email protected]>
scaled_dot_product_attention had its function signature changed to add the `enable_gqa` argument. The hijack was therefore incompatible: if a caller passed the new argument to the hijack, the parameter would not be found and an exception would be raised.
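The failure mode, and a hijack written to tolerate future signature changes, can be sketched in plain Python. The function below is a stand-in for illustration, not torch's real `scaled_dot_product_attention`:

```python
# Stand-in for the upstream function after it grew a new keyword argument.
def scaled_dot_product_attention(q, k, v, attn_mask=None, enable_gqa=False):
    return (q, k, v, attn_mask, enable_gqa)


# Fragile hijack: hard-codes the old signature, so a call that passes the
# newly added `enable_gqa` keyword raises TypeError.
def fragile_hijack(q, k, v, attn_mask=None):
    return scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)


# Robust hijack: forwards unknown positional and keyword arguments, so
# arguments added upstream later keep working unchanged.
def robust_hijack(q, k, v, *args, **kwargs):
    return scaled_dot_product_attention(q, k, v, *args, **kwargs)
```

Forwarding `*args`/`**kwargs` is the usual way to keep a wrapper from breaking every time the wrapped function's signature is extended.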
These are already bad situations; I would like to give the worker a fighting chance at staying alive. I have seen exceptions thrown here turn a potentially recoverable situation into an irrecoverable one, because the exceptions bubble up to the main process with no catch.
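A minimal sketch of that defensive pattern (the helper name is hypothetical, not the worker's actual code): log the failure and return a sentinel instead of letting the exception reach the main process.

```python
import logging


def safe_step(step, *args, **kwargs):
    """Run one worker step; swallow and log exceptions instead of
    letting them bubble up and kill the main process."""
    try:
        return step(*args, **kwargs)
    except Exception:
        logging.exception(
            "Step %s failed; continuing", getattr(step, "__name__", repr(step))
        )
        return None
```

The trade-off is deliberate: in an already-bad situation, a logged failure and a retry on the next loop iteration beats an unhandled crash.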
This reverts commit fb918d2.
* Fixed gpu configuration for compose.cuda.yaml: the `count: all` setting was missing, which was causing CUDA to be unavailable in some cases.
* Fix typos in README

Co-authored-by: Christian Bielert <[email protected]>
* Use SIGINT to stop the docker container. This should allow the `docker stop` command to shut down the python process with SIGINT, allowing the process manager to stop the processes gracefully.
* update docker README
* Increase stop grace period: give the running processes more time to finish when stopping the docker container.
* fix: give 2 minutes for worker shutdown w/ docker
* docs: explain docker stop timeout + maint reasoning

Co-authored-by: Christian Bielert <[email protected]>
Co-authored-by: tazlin <[email protected]>
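In compose terms, the stop behavior described above maps onto two service keys; a sketch (service name assumed, timeout taken from the "2 minutes" note):

```yaml
services:
  reGen:
    stop_signal: SIGINT       # docker stop sends SIGINT instead of SIGTERM
    stop_grace_period: 2m     # wait this long before escalating to SIGKILL
```

`stop_grace_period` matters here because model unloading and in-flight jobs can take well over the 10-second Docker default.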
* triton branch
* install pytest
* even newer version
* set env variable container wide: this needs to be set during build and in the individual worker threads' context as well, otherwise a cuda version is used
* check env variable
* PyTorch 2.5.0
* tuning options
* ROCm 6.2
* make sure the build works in conda env
* try 256 head dimensions
* enable flash_attn
* check whether we want to build flash_attn
* check is in install_amd_go_fast.sh
* ROCm 6.2
* update optimizations: `--amd` (basically setting `--use-pytorch-cross-attention`) degrades performance; `MIOPEN_FIND_MODE="FAST"` is required for ROCm 6.2 to work as expected
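Since the commit message notes the variable must be set both at build time and in each worker thread's context, a compose sketch for the runtime half might look like this (service name is an assumption; only `MIOPEN_FIND_MODE` is taken from the notes above):

```yaml
services:
  reGen:
    environment:
      - MIOPEN_FIND_MODE=FAST   # required for ROCm 6.2 to work as expected
```

The build-time half would be a matching `ENV` line in the ROCm Dockerfile so the value is also present for any compilation steps.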
* fix/style: hadolint Dockerfile lint fixes/recommendations
* fix: re-add intended uninstall statement to ROCM image
There is some sort of incompatibility with the hadolint pre-commit hook in a github workflow. I am just going to stick to the github action for now.
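For reference, a GitHub Actions workflow step using the official hadolint action might look like the following sketch (the action version and Dockerfile path are assumptions):

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hadolint/hadolint-action@v3.1.0
        with:
          dockerfile: Dockerfiles/Dockerfile.cuda   # hypothetical path
```

Running hadolint via its dedicated action sidesteps the pre-commit hook's environment issues while keeping the same lint coverage in CI.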
docs: second revision of readme rewrite
New Features/Updates

- New `very_fast_disk_mode` configuration option for concurrent model loading. Defaults to `false`. Set `very_fast_disk_mode: false` to only load one model at a time when it is being explicitly preloaded. There are some cases where it still might attempt to load more than one, but it should be far less often. (3bb4dec)
- torch >=2.4.1 and cu124 by default (hordelib#343)

Fixes and Improvements

- Improvements to `high_performance_mode` and to stability with `max_threads` values greater than one. Many setups should be able to run `max_threads: 2` with a bit of tuning; consider `high_performance_mode` if you have a xx90 card. `high_memory_mode` can still lead to additional instability with threads at 2, so `max_threads: 2` is best suited to SD1.5-only setups without controlnets/post-processing or to other conservative configurations.
- The worker now uses the `rawpng` bytestream directly, reducing redundant operations; `PIL.Image.open(...)` was highly inefficient, especially for very large images.
- Use `certifi` to resolve certificate resolution issues.
- Fixed: `download_models.py` would not exit if the compvis models failed to download. This would cause the worker to crash unexpectedly, as it expects the models to be available on worker start.
- See `Dockerfiles/README.md` for information on configuring these images.

Developer changes
- feat: add ROCm and CUDA Dockerfiles with entrypoint and setup scripts
- `compose` support (Support for docker compose #328)
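A minimal sketch of the `certifi` fix mentioned above (assuming an `ssl`-based client; this is the general technique, not necessarily how the worker wires it up):

```python
import ssl

import certifi


def default_ssl_context() -> ssl.SSLContext:
    """Build an SSL context that trusts certifi's bundled CA roots,
    working around hosts with a missing or outdated system cert store."""
    return ssl.create_default_context(cafile=certifi.where())
```

The resulting context can be passed to `urllib.request.urlopen(..., context=ctx)` or to an HTTP client's SSL/verify parameter, making certificate verification independent of the container's base-image certificates.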