feat: latest torch/comfyui; perf improvements; fix: SSL cert issues #309

Open
wants to merge 63 commits into main

Conversation

@tazlin (Member) commented Oct 4, 2024

New Features/Updates

Fixes and Improvements

  • Improved the stability and performance of high_performance_mode.
    • Jobs which are expected to be brief no longer block job pops. Additionally, less time is spent waiting overall when this mode is on.
  • Improved the stability and performance of max_threads values greater than one (a configuration sketch follows this list).
    • xx90 series cards will likely see a large improvement with max_threads: 2 and a bit of tuning.
      • Important: You almost certainly will want high_performance_mode if you have an xx90 card.
    • Note that cascade and flux, as well as high_memory_mode, can still lead to additional instability with max_threads: 2.
    • xx80 series cards may benefit from max_threads: 2 in SD1.5-only setups without controlnets/post-processing, or in other conservative configurations.
  • Improved process management with enhanced deadlock detection and handling.
    • In particular, hang-ups where all of the processes were available and waiting should be more readily detected and corrected.
  • Optimized image processing by using the raw PNG directly, reducing redundant operations.
    • The repeated call to PIL.Image.open(...) was highly inefficient, especially for very large images.
    • The already-encoded PNG sent from ComfyUI is used instead.
  • Added an SSL context using certifi to resolve certificate verification issues (a minimal sketch follows this list).
  • Updated documentation to reflect changes in CUDA version and new configuration options.
  • Fixed a bug where download_models.py would not exit if the compvis models failed to download. This could cause the worker to crash unexpectedly, since it expects the models to be available at worker start.
  • The docker image scheme has been substantially reworked. See the developer changes below for more information.
    • As a reminder, cloud systems such as runpod.io and vast.ai have good support for deploying docker images. See the new Dockerfiles/README.md for information on configuring these images.
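
As a rough illustration of the options above, a bridgeData.yaml might combine them like this. This is only a sketch: the key names are taken from the options mentioned in this PR, the values (aimed at an xx90-class card) are a starting point rather than a recommendation, and bridgeData_template.yaml remains the authoritative reference.

```yaml
# Hypothetical excerpt from bridgeData.yaml; values are illustrative only.
max_threads: 2                        # more than one thread mainly benefits xx90-class cards
high_performance_mode: true           # strongly recommended alongside max_threads: 2 on xx90 cards
high_memory_mode: false               # can add instability when combined with two threads
unload_models_from_vram_often: false  # feedback is requested on this when high_memory_mode is enabled
```

For the certifi change, the general pattern (not necessarily the worker's exact implementation) is to build an SSL context from certifi's CA bundle and hand it to whatever HTTP client performs the request:

```python
import ssl
import urllib.request

import certifi

# Build an SSL context backed by certifi's CA bundle so certificate
# verification works even when the system store is missing or stale.
ssl_context = ssl.create_default_context(cafile=certifi.where())

# Example only: pass the context to the client making the request.
with urllib.request.urlopen("https://aihorde.net", context=ssl_context) as response:
    print(response.status)
```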

Developer changes

feat: add ROCm and CUDA Dockerfiles with entrypoint and setup scripts


@tazlin (Member, Author) commented Oct 4, 2024

@CodiumAI-Agent /describe

@CodiumAI-Agent: This comment was marked as outdated.

If your system is set up properly (see [Prerequisites](#prerequisites))
you can just [setup](https://github.com/Haidra-Org/horde-worker-reGen?tab=readme-ov-file#configure) your bridgeData.yaml file and then run
```bash
docker compose -f Dockerfiles/compse.[cuda|rocm].yaml build --pull
```

Typo here, it should be compose.[ instead of compse.[

Good catch

@CIB (Contributor) commented Oct 30, 2024

The docker instructions aren't working for me (Arch Linux / nvidia GPU)

git clone --sparse --branch raw-png https://github.com/Haidra-Org/horde-worker-reGen.git horde-worker-reGen-png
cd horde-worker-reGen-png/
git sparse-checkout set --no-cone Dockerfiles /bridgeData_template.yaml
docker compose -f Dockerfiles/compose.cuda.yaml build --pull
docker compose -f Dockerfiles/compose.cuda.yaml up -dV
reGen  | [notice] A new release of pip is available: 24.0 -> 24.3.1
reGen  | [notice] To update, run: pip install --upgrade pip
reGen  | 2024-10-30 18:40:57.711 | DEBUG    | horde_worker_regen.load_env_vars:load_env_vars_from_config:68 - Using default AI Horde URL.
reGen  | 2024-10-30 18:40:57.740 | DEBUG    | horde_sdk:_dev_env_var_warnings:42 - AIWORKER_CACHE_HOME is ./models/.
reGen  | 2024-10-30 18:40:59.707 | DEBUG    | horde_model_reference.legacy.classes.legacy_converters:write_out_records:554 - Converted database written to: /horde-worker-reGen/models/horde_model_reference/stable_diffusion.json
reGen  | 2024-10-30 18:41:00.050 | DEBUG    | horde_model_reference.legacy.classes.legacy_converters:write_out_records:554 - Converted database written to: /horde-worker-reGen/models/horde_model_reference/stable_diffusion.json
reGen  | 2024-10-30 18:41:00.061 | WARNING  | horde_worker_regen.bridge_data.data_model:validate_performance_modes:162 - High memory mode is enabled. You may experience performance issues with more than one thread.
reGen  | 2024-10-30 18:41:00.061 | WARNING  | horde_worker_regen.bridge_data.data_model:validate_performance_modes:167 - Please let us know if `unload_models_from_vram_often` improves or degrades performance with `high_memory_mode` enabled.
reGen  | 2024-10-30 18:41:01.056 | WARNING  | horde_model_reference.model_reference_records:validator_is_style_known:132 - Unknown style control_qr for model control_qr
reGen  | 2024-10-30 18:41:01.056 | WARNING  | horde_model_reference.model_reference_records:validator_is_style_known:132 - Unknown style control_qr_xl for model control_qr_xl
reGen  | 2024-10-30 18:41:01.061 | DEBUG    | horde_sdk.ai_horde_worker.model_meta:remove_large_models:155 - Removing cascade models: {'Stable Cascade 1.0'}
reGen  | 2024-10-30 18:41:01.061 | DEBUG    | horde_sdk.ai_horde_worker.model_meta:remove_large_models:156 - Removing flux models: {'Flux.1-Schnell fp16 (Compact)', 'Flux.1-Schnell fp8 (Compact)'}
reGen  | /horde-worker-reGen/venv/lib/python3.11/site-packages/transformers/utils/hub.py:128: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
reGen  |   warnings.warn(
reGen  | 2024-10-30 18:41:02.834 | INFO     | horde_safety.deep_danbooru_model:download_deep_danbooru_model:53 - Downloading DeepDanbooru model (~614 mb) to models/clip_blip/model-resnet_custom_v3.pt.
models/clip_blip/model-resnet_custom_v3.pt:   0% 0.00/644M [00:00<?, ?iB/s]2024-10-30 18:41:03.458 | INFO     | horde_safety.deep_danbooru_model:download_deep_danbooru_model:63 - Model already downloaded.
reGen  | 2024-10-30 18:41:03.458 | INFO     | horde_safety.deep_danbooru_model:verify_deep_danbooru_model_hash:30 - Verifying SHA256 hash of downloaded file.
models/clip_blip/model-resnet_custom_v3.pt:   0% 0.00/644M [00:00<?, ?iB/s]
reGen  | Loading CLIP model ViT-L-14/openai...
reGen  | /horde-worker-reGen/venv/lib/python3.11/site-packages/open_clip/factory.py:372: UserWarning: These pretrained weights were trained with QuickGELU activation but the model config does not have that enabled. Consider using a model config with a "-quickgelu" suffix or enable with a flag.
reGen  |   warnings.warn(
reGen  | Loaded CLIP model and data in 2.94 seconds.
reGen  | 2024-10-30 18:41:06.832 | INFO     | hordelib.comfy_horde:do_comfy_import:215 - Forcing normal vram mode
reGen  | Traceback (most recent call last):
reGen  |   File "/horde-worker-reGen/download_models.py", line 25, in <module>
reGen  |     download_all_models(
reGen  |   File "/horde-worker-reGen/horde_worker_regen/download_models.py", line 58, in download_all_models
reGen  |     hordelib.initialise()
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/initialisation.py", line 81, in initialise
reGen  |     hordelib.comfy_horde.do_comfy_import(
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/comfy_horde.py", line 229, in do_comfy_import
reGen  |     import execution
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/execution.py", line 13, in <module>
reGen  |     import nodes
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/nodes.py", line 21, in <module>
reGen  |     import comfy.diffusers_load
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/diffusers_load.py", line 3, in <module>
reGen  |     import comfy.sd
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/sd.py", line 5, in <module>
reGen  |     from comfy import model_management
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/model_management.py", line 143, in <module>
reGen  |     total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
reGen  |                                   ^^^^^^^^^^^^^^^^^^
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/hordelib/_comfyui/comfy/model_management.py", line 112, in get_torch_device
reGen  |     return torch.device(torch.cuda.current_device())
reGen  |                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 778, in current_device
reGen  |     _lazy_init()
reGen  |   File "/horde-worker-reGen/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
reGen  |     torch._C._cuda_init()
reGen  | RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

@HPPinata

The docker instructions aren't working for me (Arch Linux / nvidia GPU)

git clone --sparse --branch raw-png https://github.com/Haidra-Org/horde-worker-reGen.git horde-worker-reGen-png
cd horde-worker-reGen-png/
git sparse-checkout set --no-cone Dockerfiles /bridgeData_template.yaml
docker compose -f Dockerfiles/compose.cuda.yaml build --pull
docker compose -f Dockerfiles/compose.cuda.yaml up -dV

Do you have your system set up to make cuda work at all?
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Ironically getting nvidia to work inside docker is not as painless as AMD, due to their custom Kernel stuff
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html

@HPPinata

I'm not sure what is and isn't required since I haven't tested NVIDIA GPUs on Linux for a while, but you might need (some portion of) the CUDA tooling installed locally.

@CIB (Contributor) commented Oct 30, 2024

The docker instructions aren't working for me (Arch Linux / nvidia GPU)

git clone --sparse --branch raw-png https://github.com/Haidra-Org/horde-worker-reGen.git horde-worker-reGen-png
cd horde-worker-reGen-png/
git sparse-checkout set --no-cone Dockerfiles /bridgeData_template.yaml
docker compose -f Dockerfiles/compose.cuda.yaml build --pull
docker compose -f Dockerfiles/compose.cuda.yaml up -dV

Do you have your system set up to make cuda work at all? sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Ironically getting nvidia to work inside docker is not as painless as AMD, due to their custom Kernel stuff https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html

Yes. In fact, I created my own Dockerfile before I knew this branch existed, and it's running fine on my system as we speak. So I'm also stumped. I can dive a bit more into comparing the two containers to figure out what's going on.

docker run --rm --gpus all ubuntu nvidia-smi --query-gpu=name --format=csv,noheader
NVIDIA GeForce RTX 4090

@HPPinata

Yes. In fact, I created my own Dockerfile before I knew this branch existed, and it's running fine on my system as we speak. So I'm also stumped. I can dive a bit more into comparing the two containers to figure out what's going on.

Please do. I haven't had much to do with the creation of the Dockerfile.cuda, and @tazlin found it to be working IIRC, but the compose.cuda.yaml is a complete blind shot based on what worked for AMD and what I found online.
There might very well be a few issues with that, especially around exposing the GPU to the container.

@CIB (Contributor) commented Oct 30, 2024

There might very well be a few issues with that, especially around exposing the GPU to the container.

Good call. I compared the two docker-compose.yml files and found that the GPU configurations were ever so slightly different. With `count: all` added here, the error is now gone.

    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            capabilities: [gpu]
            count: all

@HPPinata commented Oct 30, 2024

I think you can just create a small separate PR to be merged into raw-png (not main).
This wouldn't fit anything I have open, shouldn't conflict with much else either and you should be the one credited for fixing what was broken.

@CIB (Contributor) commented Oct 30, 2024

I think you can just create a small separate PR to be merged into raw-png (not main). This wouldn't fit anything I have open, shouldn't conflict with much else either and you should be the one credited for fixing what was broken.

Done. #334

@tazlin: This comment was marked as resolved.

@CodiumAI-Agent: This comment was marked as resolved.

tazlin and others added 28 commits November 4, 2024 09:38
* minor fixes

USER root, venv creation after clone (otherwise git complains) and COPY the setup scripts

* ninja for faster flash_attn build

* minimize diff
* rocm version via index url

--extra-index-url https://download.pytorch.org/whl/rocmX.Y is sufficient to install the right package.
This also makes A/B version testing easier.

* build/fix: bring rocm reqs.txt in line with main reqs.txt

* include wheel

If this is not included, the flash_attn build fails on ROCm cards

* Turn requirements.rocm.txt into a symlink

This is a temporary measure for backwards compatibility. Once all scripts and Dockerfiles are updated it can be removed.

* remove requirements.rocm.txt from scripts

---------

Co-authored-by: tazlin <[email protected]>
* Sparse checkout

* Create compose.rocm.yaml

* Create compose.cuda.yaml

* Docker Compose documentation

* Syntax, fixes and clarifications

* Update ROCm Version

* fix compose up

* build/fix: use env vars to control mounts w/ docker compose

* docs: warn about compose config/mount behavior

* docs: move old docker env config info to new location

---------

Co-authored-by: tazlin <[email protected]>
scaled_dot_product_attention had its function signature changed to add the `enable_gqa` argument. The hijack was therefore incompatible: if the hijack was called with the new argument, the argument would not be recognized and an exception would be raised.
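
A minimal sketch of the general fix pattern (hypothetical; not the actual hordelib hijack code): make the wrapper accept and forward arbitrary keyword arguments so newly added parameters such as `enable_gqa` pass straight through to the original function.

```python
import torch.nn.functional as F

# Keep a reference to the original implementation before patching it.
_original_sdpa = F.scaled_dot_product_attention

def _hijacked_sdpa(query, key, value, *args, **kwargs):
    # Forward all positional and keyword arguments (including newly added
    # ones such as `enable_gqa`) so the wrapper survives signature changes.
    return _original_sdpa(query, key, value, *args, **kwargs)

F.scaled_dot_product_attention = _hijacked_sdpa
```
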
These are already bad situations, and I would like to give the worker a fighting chance at staying alive. I have seen exceptions thrown here turn a potentially recoverable situation into an irrecoverable one, because they bubble up to the main process with no catch.
* Fixed gpu configuration for compose.cuda.yaml

The `count: all` setting was missing, which was causing CUDA to be unavailable in some cases.

* Fix typos in README

---------

Co-authored-by: Christian Bielert <[email protected]>
* Use SIGINT to stop the docker container

This should let the `docker stop` command shut down the Python process with SIGINT, allowing the process manager to stop the worker processes gracefully.

* update docker README

* Increase stop grace period

Give the running processes more time to finish when stopping the docker container.

* fix: give 2 minutes for worker shutdown w/ docker

* docs: explain docker stop timeout + maint reasoning

---------

Co-authored-by: Christian Bielert <[email protected]>
Co-authored-by: tazlin <[email protected]>
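
In Compose terms, the shutdown behaviour described above maps onto the standard stop_signal and stop_grace_period service keys. A hedged sketch only: the service name is hypothetical, and the real compose.cuda.yaml/compose.rocm.yaml may structure this differently.

```yaml
services:
  regen:                    # hypothetical service name
    stop_signal: SIGINT     # let `docker stop` deliver SIGINT so the process manager can exit gracefully
    stop_grace_period: 2m   # matches the two minutes given for worker shutdown before a forced kill
```
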
* triton branch

* install pytest

* even newer version

* set env variable container wide

This needs to be set during build and in the individual worker threads' context as well; otherwise a CUDA version is used.

* check env variable

* PyTorch 2.5.0

* tuning options

* ROCm 6.2

* make sure the build works in conda env

* try 256 head dimensions

* enable flash_attn

* check whether we want to build flash_attn

* check is in install_amd_go_fast.sh

* ROCm 6.2

* update optimizations

--amd (basically setting --use-pytorch-cross-attention) degrades performance

MIOPEN_FIND_MODE="FAST" is required for ROCm 6.2 to work as expected
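
Since the variable needs to be visible both at build time and to every worker process, the simplest container-wide mechanism is an ENV instruction in the ROCm image. A hypothetical sketch; the actual Dockerfile.rocm may set this elsewhere or alongside other tuning options.

```dockerfile
# Hypothetical placement: ENV makes the setting visible during the image
# build and to all processes started in the container, including worker
# subprocesses spawned by the process manager.
ENV MIOPEN_FIND_MODE=FAST
```
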
* fix/style: hadolint Dockerfile lint fixes/recommendations

* fix: re-add intended uninstall statement to ROCM image
There is some sort of incompatibility with the hadolint pre-commit hook in a GitHub workflow. I am just going to stick to the GitHub action for now.
docs: second revision of readme rewrite