[BUG]: When testing locally and looping runsync, it eventually stalls #40

Open
vesper8 opened this issue Jun 16, 2024 · 9 comments
@vesper8

vesper8 commented Jun 16, 2024

Describe the bug

I'm testing the API locally before deploying to Runpod, on a 4070 Super. A single call to /runsync completes without fail every time and does a really nice job of it. But if I loop, say, 10 requests, it always eventually stalls: there is no more output in the terminal, the fans keep spinning, and it just gets stuck. My guess is some kind of memory leak, or that it tries to load the same models again and again until the memory runs out, but it's rather hard to debug. This doesn't seem to happen when I'm generating small images; with larger images that take longer to generate, it happens without fail.

I should add that I'm using the same checkpoint, and doing the same operations in my loop, so it's not like I'm requesting it to load a different model repeatedly.

Is there a way to force a memory cleanup between generations, or to run with a higher level of verbosity?
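One thing that might be worth trying between generations, as a rough sketch only: as far as I know, recent ComfyUI versions expose a `/free` route that accepts `unload_models` and `free_memory` flags. Treat that route, the payload, and the `127.0.0.1:8188` address as assumptions to verify against the ComfyUI version in the image; `build_free_request` and `free_comfy_memory` are hypothetical helper names, not part of the worker.

```python
import json
import urllib.request

# Assumption: ComfyUI listens on its default address inside the container.
COMFY_HOST = "127.0.0.1:8188"


def build_free_request(host: str = COMFY_HOST) -> urllib.request.Request:
    """Build a POST to ComfyUI's /free route (assumed to exist in this version)."""
    body = json.dumps({"unload_models": True, "free_memory": True}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}/free",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def free_comfy_memory(host: str = COMFY_HOST, timeout: float = 10.0) -> bool:
    """Send the request; True on HTTP 200, False if the server cannot be reached."""
    try:
        with urllib.request.urlopen(build_free_request(host), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and HTTPError are both subclasses of OSError
        return False
```

Calling `free_comfy_memory()` after each finished job would not fix a crash in ComfyUI itself, but it could help tell a slow memory build-up apart from something else.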

@vesper8 vesper8 added the bug Something isn't working label Jun 16, 2024
@vesper8
Author

vesper8 commented Jun 16, 2024

It seems to be related to the /upload/image somehow.

I'm passing the same image, converted to base64, for each iteration of my loop. This isn't a particularly big image, only about 300 KB.

It works fine the first couple of times but then I start seeing this error over and over again:

comfyui-worker | DEBUG  | test-09828fb9-5b04-46cb-bae2-205ca2680559 | run_job return: {'error': '{"error_type": "<class \'requests.exceptions.ConnectionError\'>", "error_message": "HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd5cead31f0>: Failed to establish a new connection: [Errno 111] Connection refused\'))", "error_traceback": "Traceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\\", line 174, in _new_conn\\n    conn = connection.create_connection(\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\\", line 95, in create_connection\\n    raise err\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\\", line 85, in create_connection\\n    sock.connect(sa)\\nConnectionRefusedError: [Errno 111] Connection refused\\n\\nDuring handling of the above exception, another exception occurred:\\n\\nTraceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\\", line 715, in urlopen\\n    httplib_response = self._make_request(\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\\", line 416, in _make_request\\n    conn.request(method, url, **httplib_request_kw)\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\\", line 244, in request\\n    super(HTTPConnection, self).request(method, url, body=body, headers=headers)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1283, in request\\n    self._send_request(method, url, body, headers, encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1329, in _send_request\\n    self.endheaders(body, encode_chunked=encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1278, in endheaders\\n    self._send_output(message_body, 
encode_chunked=encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1038, in _send_output\\n    self.send(msg)\\n  File \\"/usr/lib/

No idea why it works at first and then stops working.

@TimPietrusky
Member

@vesper8 thanks for reporting this.

Do you maybe have a repo or script that we can use to reproduce exactly what you are doing? It would help us a lot to get straight into testing.

At first glance it sounds like a problem in ComfyUI itself, but to be sure, we will also test this.

@vesper8
Author

vesper8 commented Jun 17, 2024

I don't have a repo that I can share, but let me explain in greater detail what I'm doing and maybe that will help.

I have a Windows machine on my private home network that has a powerful GPU. I set up runpod-worker-comfy there following the setup instructions and forwarded the ports so that I can access the UI, and the API, from any other machine on my home network.

Then, from my MacBook, I have a very basic Laravel command that sends the workflow and base64 image to the API running on my Windows machine.

This works great for a few images, but if I repeatedly send more images it always ends up stalling with the error message above.

It's as if the API is not in a ready state at some point and craps out. This isn't a problem when generating images that don't have an input image.

I think the input image logic introduced in 2.0 could be improved so that we could pass an absolute URL, such as an S3 URL, or pass an image file directly instead of having to base64-encode it. Or maybe there could be a way to say "use this one image for all of these generations". I'm not sure, just throwing out ideas. Maybe the first step is to understand why exactly the image upload works initially and then stops working when the load is too heavy.
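To make the URL idea concrete, here is a minimal sketch of an input helper that accepts either an http(s) URL or a base64 string. This is purely hypothetical: `load_image_bytes` is not part of runpod-worker-comfy, and the worker's current input format only takes base64 as far as this thread shows.

```python
import base64
import urllib.request


def load_image_bytes(image: str, timeout: float = 30.0) -> bytes:
    """Return raw image bytes from an http(s) URL or a base64 string.

    Hypothetical helper for illustration; not an existing worker function.
    """
    if image.startswith(("http://", "https://")):
        # Remote image, e.g. a pre-signed S3 URL.
        with urllib.request.urlopen(image, timeout=timeout) as resp:
            return resp.read()
    # Tolerate a data URI prefix such as "data:image/png;base64,...".
    if image.startswith("data:") and "," in image:
        image = image.split(",", 1)[1]
    # validate=True rejects strings containing non-base64 characters.
    return base64.b64decode(image, validate=True)
```

With something like this, the same image could be referenced by URL in every request instead of re-sending roughly 300 KB of base64 each time.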

@TimPietrusky
Member

@vesper8 thanks for the detailed explanation.

Do you wait between requests until the previous one has been handled? Or do you send multiple requests at once?

@vesper8
Author

vesper8 commented Jun 17, 2024

I use the /runsync endpoint and I don't send another HTTP request to the API until the previous one has completed and I've gotten the image back from it. I even added a 1 second wait between requests.
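For anyone trying to reproduce this without the Laravel command, the loop is roughly equivalent to the Python sketch below. The function names are hypothetical, the `/runsync` URL and the request body depend on your local setup, and checking for a top-level `error` key is an assumption based on the `run_job return` lines in the worker log.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable


def post_json(url: str, payload: dict, timeout: float = 600.0) -> dict:
    """POST a JSON body and parse the JSON response (blocks until the job is done)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def run_sequentially(
    payloads: Iterable[dict],
    send: Callable[[dict], dict],
    pause: float = 1.0,
) -> list:
    """Send one request at a time and stop at the first failure."""
    results = []
    for payload in payloads:
        try:
            result = send(payload)
        except OSError as exc:  # connection refused, reset, timeout, HTTP error
            result = {"error": str(exc)}
        results.append(result)
        if "error" in result:
            break  # keep the failing response for inspection
        time.sleep(pause)  # the 1 second wait between requests
    return results
```

A call such as `run_sequentially(payloads, lambda p: post_json(runsync_url, p))` then mirrors the behaviour described above: strictly one request in flight, with a pause before the next.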

@TimPietrusky
Member

TimPietrusky commented Jun 17, 2024

@vesper8 ok thank you, this is enough information to actually start debugging.

@TimPietrusky TimPietrusky self-assigned this Jun 17, 2024
@vesper8
Author

vesper8 commented Jun 18, 2024

Thank you! I hope you can at least reproduce it easily. I've been working with it today and it continues to happen a lot; I'm never able to do more than 3 images at a time. When it stalls, the UI at http://192.168.2.179:8188/ becomes unreachable, and it seems the only thing to do is CTRL-C the docker instance and bring it back up.

I tried enabling REFRESH_WORKER in the docker-compose.yml, but that doesn't seem to have any effect. Is it only for running on Runpod, with no effect locally?

It would be nice to have a similar flag for local testing. A way to start with a clean slate before processing the next image.

Right now it's unclear whether my setup is running out of memory or something else is going on. The log doesn't say much. Is there a way to enable more logging?
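On the Python side of the worker, one option might be to raise the log level of the HTTP libraries, as in the sketch below. It only uses the standard `logging` module; where to call it (for example near the top of `rp_handler.py`) is an assumption, `enable_verbose_logging` is a hypothetical name, and it would not make ComfyUI itself more talkative.

```python
import logging


def enable_verbose_logging(level: int = logging.DEBUG) -> None:
    """Turn on detailed logging for the handler's HTTP calls."""
    # force=True replaces any handlers that were configured earlier.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,
    )
    # urllib3 logs each new connection and request at DEBUG level.
    for name in ("urllib3", "requests"):
        logging.getLogger(name).setLevel(level)
```

That would at least show each connection attempt to port 8188 with a timestamp, which helps pin down the moment ComfyUI stops answering.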

Here are some more logs from a failure that just happened:


comfyui-worker | INFO   | test-978b2b1c-ce09-4849-8b7d-3fd318f016c2 | Started.
comfyui-worker | runpod-worker-comfy - API is reachable
comfyui-worker | runpod-worker-comfy - image(s) upload
comfyui-worker | runpod-worker-comfy - image(s) upload complete
comfyui-worker | got prompt
comfyui-worker | runpod-worker-comfy - queued workflow with ID da9468ab-e4bc-4513-9cb1-6cb2ad9b398c
comfyui-worker | runpod-worker-comfy - wait until image generation is complete
comfyui-worker | Requested to load SDXLClipModel
comfyui-worker | Loading 1 new model
comfyui-worker | /usr/local/lib/python3.10/dist-packages/insightface/utils/transform.py:68: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
comfyui-worker | To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
comfyui-worker |   P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4
comfyui-worker | /usr/local/lib/python3.10/dist-packages/torchvision/transforms/v2/_deprecated.py:41: UserWarning: The transform `ToTensor()` is deprecated and will be removed in a future release. Instead, please use `transforms.Compose([transforms.ToImageTensor(), transforms.ConvertImageDtype()])`.
comfyui-worker |   warnings.warn(
comfyui-worker | Requested to load SDXL
comfyui-worker | Loading 1 new model
100%|██████████| 20/20 [00:07<00:00,  2.72it/s]
comfyui-worker | Prompt executed in 16.02 seconds
comfyui-worker | runpod-worker-comfy - image generation is done
comfyui-worker | runpod-worker-comfy - /comfyui/output/ComfyUI/test_00053_.png
comfyui-worker | runpod-worker-comfy - the image was generated and converted to base64
comfyui-worker | DEBUG  | test-978b2b1c-ce09-4849-8b7d-3fd318f016c2 | Handler output: {'status': 'success', 'message': 'iVBORw0KGgoAAAANSUhEUgAAAvgAAAL4CAIAAACBQBS0...TRUNCATED (base64 PNG data)...pIhQAAAABJRU5ErkJggg==', 'refresh_worker': True}
comfyui-worker | DEBUG  | test-978b2b1c-ce09-4849-8b7d-3fd318f016c2 | run_job return: {'output': {'status': 'success', 'message': 'iVBORw0KGgoAAAANSUhEUgAAAvgAAAL4CAIAAACBQBS0...TRUNCATED (base64 PNG data)...pIhQAAAABJRU5ErkJggg=='}, 'stopPod': True}
comfyui-worker | INFO   | test-cc0930c0-8dcd-450d-b963-863fa5b290ba | Started.
comfyui-worker | runpod-worker-comfy - API is reachable
comfyui-worker | runpod-worker-comfy - image(s) upload
comfyui-worker | runpod-worker-comfy - image(s) upload complete
comfyui-worker | got prompt
comfyui-worker | runpod-worker-comfy - queued workflow with ID 22e95fe4-daae-4bf0-99dc-cb713b62306d
comfyui-worker | runpod-worker-comfy - wait until image generation is complete
comfyui-worker | Requested to load SDXLClipModel
comfyui-worker | Loading 1 new model
comfyui-worker | DEBUG  | test-cc0930c0-8dcd-450d-b963-863fa5b290ba | Handler output: {'error': 'Error waiting for image generation: [Errno 104] Connection reset by peer'}
comfyui-worker | DEBUG  | test-cc0930c0-8dcd-450d-b963-863fa5b290ba | run_job return: {'error': 'Error waiting for image generation: [Errno 104] Connection reset by peer'}
comfyui-worker | INFO   | test-8e0087b3-08b0-4b94-a012-f7b9aa3f3964 | Started.
comfyui-worker | runpod-worker-comfy - Failed to connect to server at http://127.0.0.1:8188 after 500 attempts.
comfyui-worker | runpod-worker-comfy - image(s) upload
comfyui-worker | ERROR  | test-8e0087b3-08b0-4b94-a012-f7b9aa3f3964 | Captured Handler Exception
comfyui-worker | ERROR  | {
comfyui-worker |     "error_type": "<class 'requests.exceptions.ConnectionError'>",
comfyui-worker |     "error_message": "HTTPConnectionPool(host='127.0.0.1', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd3df1b3a60>: Failed to establish a new connection: [Errno 111] Connection refused'))",
comfyui-worker |     "error_traceback": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\", line 174, in _new_conn\n    conn = connection.create_connection(\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\", line 95, in create_connection\n    raise err\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\", line 85, in create_connection\n    sock.connect(sa)\nConnectionRefusedError: [Errno 111] Connection refused\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\", line 715, in urlopen\n    httplib_response = self._make_request(\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\", line 416, in _make_request\n    conn.request(method, url, **httplib_request_kw)\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\", line 244, in request\n    super(HTTPConnection, self).request(method, url, body=body, headers=headers)\n  File \"/usr/lib/python3.10/http/client.py\", line 1283, in request\n    self._send_request(method, url, body, headers, encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1329, in _send_request\n    self.endheaders(body, encode_chunked=encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1278, in endheaders\n    self._send_output(message_body, encode_chunked=encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1038, in _send_output\n    self.send(msg)\n  File \"/usr/lib/python3.10/http/client.py\", line 976, in send\n    self.connect()\
comfyui-worker | ...TRUNCATED 783 CHARACTERS...
comfyui-worker | t(\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py\", line 594, in increment\n    raise MaxRetryError(_pool, url, error or ResponseError(cause))\nurllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd3df1b3a60>: Failed to establish a new connection: [Errno 111] Connection refused'))\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\", line 134, in run_job\n    handler_return = handler(job)\n  File \"/rp_handler.py\", line 308, in handler\n    upload_result = upload_images(images)\n  File \"/rp_handler.py\", line 134, in upload_images\n    response = requests.post(f\"http://{COMFY_HOST}/upload/image\", files=files)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/api.py\", line 115, in post\n    return request(\"post\", url, data=data, json=json, **kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/api.py\", line 59, in request\n    return session.request(method=method, url=url, **kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\", line 589, in request\n    resp = self.send(prep, **send_kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\", line 703, in send\n    r = adapter.send(request, **kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/adapters.py\", line 700, in send\n    raise ConnectionError(e, request=request)\nrequests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd3df1b3a60>: Failed to establish a new connection: [Errno 111] Connection refused'))\n",
comfyui-worker |     "hostname": "unknown",
comfyui-worker |     "worker_id": "unknown",
comfyui-worker |     "runpod_version": "1.6.2"
comfyui-worker | }
comfyui-worker | DEBUG  | test-8e0087b3-08b0-4b94-a012-f7b9aa3f3964 | run_job return: {'error': '{"error_type": "<class \'requests.exceptions.ConnectionError\'>", "error_message": "HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd3df1b3a60>: Failed to establish a new connection: [Errno 111] Connection refused\'))", "error_traceback": "Traceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\\", line 174, in _new_conn\\n    conn = connection.create_connection(\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\\", line 95, in create_connection\\n    raise err\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\\", line 85, in create_connection\\n    sock.connect(sa)\\nConnectionRefusedError: [Errno 111] Connection refused\\n\\nDuring handling of the above exception, another exception occurred:\\n\\nTraceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\\", line 715, in urlopen\\n    httplib_response = self._make_request(\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\\", line 416, in _make_request\\n    conn.request(method, url, **httplib_request_kw)\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\\", line 244, in request\\n    super(HTTPConnection, self).request(method, url, body=body, headers=headers)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1283, in request\\n    self._send_request(method, url, body, headers, encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1329, in _send_request\\n    self.endheaders(body, encode_chunked=encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1278, in endheaders\\n    self._send_output(message_body, 
encode_chunked=encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1038, in _send_output\\n    self.send(msg)\\n  File \\"/usr/lib/
comfyui-worker | ...TRUNCATED 917 CHARACTERS...
comfyui-worker | t-packages/urllib3/util/retry.py\\", line 594, in increment\\n    raise MaxRetryError(_pool, url, error or ResponseError(cause))\\nurllib3.exceptions.MaxRetryError: HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd3df1b3a60>: Failed to establish a new connection: [Errno 111] Connection refused\'))\\n\\nDuring handling of the above exception, another exception occurred:\\n\\nTraceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\\", line 134, in run_job\\n    handler_return = handler(job)\\n  File \\"/rp_handler.py\\", line 308, in handler\\n    upload_result = upload_images(images)\\n  File \\"/rp_handler.py\\", line 134, in upload_images\\n    response = requests.post(f\\"http://{COMFY_HOST}/upload/image\\", files=files)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/api.py\\", line 115, in post\\n    return request(\\"post\\", url, data=data, json=json, **kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/api.py\\", line 59, in request\\n    return session.request(method=method, url=url, **kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\\", line 589, in request\\n    resp = self.send(prep, **send_kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\\", line 703, in send\\n    r = adapter.send(request, **kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/adapters.py\\", line 700, in send\\n    raise ConnectionError(e, request=request)\\nrequests.exceptions.ConnectionError: HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd3df1b3a60>: Failed to establish a new connection: [Errno 111] Connection refused\'))\\n", 
"hostname": "unknown", "worker_id": "unknown", "runpod_version": "1.6.2"}'}
comfyui-worker | INFO   | test-d75801dc-f1ca-45e4-91bc-60fd399c1c61 | Started.
comfyui-worker | runpod-worker-comfy - Failed to connect to server at http://127.0.0.1:8188 after 500 attempts.
comfyui-worker | runpod-worker-comfy - image(s) upload
comfyui-worker | ERROR  | test-d75801dc-f1ca-45e4-91bc-60fd399c1c61 | Captured Handler Exception
comfyui-worker | ERROR  | {
comfyui-worker |     "error_type": "<class 'requests.exceptions.ConnectionError'>",
comfyui-worker |     "error_message": "HTTPConnectionPool(host='127.0.0.1', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd3dedc6d40>: Failed to establish a new connection: [Errno 111] Connection refused'))",
comfyui-worker |     "error_traceback": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\", line 174, in _new_conn\n    conn = connection.create_connection(\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\", line 95, in create_connection\n    raise err\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\", line 85, in create_connection\n    sock.connect(sa)\nConnectionRefusedError: [Errno 111] Connection refused\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\", line 715, in urlopen\n    httplib_response = self._make_request(\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\", line 416, in _make_request\n    conn.request(method, url, **httplib_request_kw)\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\", line 244, in request\n    super(HTTPConnection, self).request(method, url, body=body, headers=headers)\n  File \"/usr/lib/python3.10/http/client.py\", line 1283, in request\n    self._send_request(method, url, body, headers, encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1329, in _send_request\n    self.endheaders(body, encode_chunked=encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1278, in endheaders\n    self._send_output(message_body, encode_chunked=encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1038, in _send_output\n    self.send(msg)\n  File \"/usr/lib/python3.10/http/client.py\", line 976, in send\n    self.connect()\
comfyui-worker | ...TRUNCATED 783 CHARACTERS...
comfyui-worker | t(\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py\", line 594, in increment\n    raise MaxRetryError(_pool, url, error or ResponseError(cause))\nurllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd3dedc6d40>: Failed to establish a new connection: [Errno 111] Connection refused'))\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\", line 134, in run_job\n    handler_return = handler(job)\n  File \"/rp_handler.py\", line 308, in handler\n    upload_result = upload_images(images)\n  File \"/rp_handler.py\", line 134, in upload_images\n    response = requests.post(f\"http://{COMFY_HOST}/upload/image\", files=files)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/api.py\", line 115, in post\n    return request(\"post\", url, data=data, json=json, **kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/api.py\", line 59, in request\n    return session.request(method=method, url=url, **kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\", line 589, in request\n    resp = self.send(prep, **send_kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\", line 703, in send\n    r = adapter.send(request, **kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/adapters.py\", line 700, in send\n    raise ConnectionError(e, request=request)\nrequests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd3dedc6d40>: Failed to establish a new connection: [Errno 111] Connection refused'))\n",
comfyui-worker |     "hostname": "unknown",
comfyui-worker |     "worker_id": "unknown",
comfyui-worker |     "runpod_version": "1.6.2"
comfyui-worker | }
comfyui-worker | DEBUG  | test-d75801dc-f1ca-45e4-91bc-60fd399c1c61 | run_job return: {'error': '{"error_type": "<class \'requests.exceptions.ConnectionError\'>", "error_message": "HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd3dedc6d40>: Failed to establish a new connection: [Errno 111] Connection refused\'))", "error_traceback": "Traceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\\", line 174, in _new_conn\\n    conn = connection.create_connection(\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\\", line 95, in create_connection\\n    raise err\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\\", line 85, in create_connection\\n    sock.connect(sa)\\nConnectionRefusedError: [Errno 111] Connection refused\\n\\nDuring handling of the above exception, another exception occurred:\\n\\nTraceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\\", line 715, in urlopen\\n    httplib_response = self._make_request(\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\\", line 416, in _make_request\\n    conn.request(method, url, **httplib_request_kw)\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\\", line 244, in request\\n    super(HTTPConnection, self).request(method, url, body=body, headers=headers)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1283, in request\\n    self._send_request(method, url, body, headers, encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1329, in _send_request\\n    self.endheaders(body, encode_chunked=encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1278, in endheaders\\n    self._send_output(message_body, 
encode_chunked=encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1038, in _send_output\\n    self.send(msg)\\n  File \\"/usr/lib/
comfyui-worker | ...TRUNCATED 917 CHARACTERS...
comfyui-worker | t-packages/urllib3/util/retry.py\\", line 594, in increment\\n    raise MaxRetryError(_pool, url, error or ResponseError(cause))\\nurllib3.exceptions.MaxRetryError: HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd3dedc6d40>: Failed to establish a new connection: [Errno 111] Connection refused\'))\\n\\nDuring handling of the above exception, another exception occurred:\\n\\nTraceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\\", line 134, in run_job\\n    handler_return = handler(job)\\n  File \\"/rp_handler.py\\", line 308, in handler\\n    upload_result = upload_images(images)\\n  File \\"/rp_handler.py\\", line 134, in upload_images\\n    response = requests.post(f\\"http://{COMFY_HOST}/upload/image\\", files=files)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/api.py\\", line 115, in post\\n    return request(\\"post\\", url, data=data, json=json, **kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/api.py\\", line 59, in request\\n    return session.request(method=method, url=url, **kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\\", line 589, in request\\n    resp = self.send(prep, **send_kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\\", line 703, in send\\n    r = adapter.send(request, **kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/adapters.py\\", line 700, in send\\n    raise ConnectionError(e, request=request)\\nrequests.exceptions.ConnectionError: HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd3dedc6d40>: Failed to establish a new connection: [Errno 111] Connection refused\'))\\n", 
"hostname": "unknown", "worker_id": "unknown", "runpod_version": "1.6.2"}'}
comfyui-worker | INFO   | test-065b19f8-752e-45ec-8d7f-56a9794683d2 | Started.
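
The repeated `[Errno 111] Connection refused` for `127.0.0.1:8188` in the log above means nothing was accepting TCP connections on that port when the handler posted to `/upload/image`. One possible explanation is that the ComfyUI process had exited or was restarting, but the log does not confirm this. As a rough diagnostic sketch, the handler could probe the port before uploading. The helper name `wait_for_server` and its retry parameters are hypothetical and are not part of the worker or the runpod library:

```python
import socket
import time


def wait_for_server(host="127.0.0.1", port=8188, retries=5, delay_s=0.5, timeout_s=1.0):
    """Return True once a TCP connection to host:port succeeds.

    Returns False if every attempt fails. This is a hypothetical helper
    for illustration; the defaults mirror the host and port in the log.
    """
    for attempt in range(retries):
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=timeout_s):
                return True
        except OSError:
            # ConnectionRefusedError and timeouts are both OSError subclasses.
            if attempt < retries - 1:
                time.sleep(delay_s)
    return False


if __name__ == "__main__":
    if wait_for_server():
        print("Port 8188 is accepting connections; safe to attempt the upload.")
    else:
        print("Nothing is listening on port 8188; the server may have stopped.")
```

If the probe fails between iterations of the loop, that would point at the server process going away rather than at the upload itself.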

@vesper8
Author

vesper8 commented Jun 18, 2024

It seems to be constantly loading models in and out of memory, even though I'm using the same nodes and the same models for the entire batch. Is it possible to keep the same models loaded for the whole batch, and is that something we can control?

@Fiedroz

Fiedroz commented Oct 10, 2024

I'm having the same issue. Does anyone have a solution for this?
