hendrikmakait changed the title from "Scheduler deadlocked with after stealing failed in move_task_confirm" to "Scheduler deadlocked after stealing failed in move_task_confirm" on Jul 23, 2024
I've investigated a cluster that deadlocked after work stealing failed in `move_task_confirm` with the following traceback:

From what I understand, stealing had progressed far into confirmation: it had checked that the request was up to date, that the worker had indeed confirmed the request (by checking the worker status), and that the task was currently stealable.

After looking into this for a while, I have not been able to identify the root cause, so I'm leaving this here in case it ever comes up again.
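
For readers unfamiliar with the confirmation step, the three checks mentioned above can be sketched roughly as follows. This is a simplified illustration only, not the actual `move_task_confirm` code from distributed: `InFlightSteal`, `confirm_steal`, `CONFIRMING_STATES`, the return strings, and the plain dictionaries standing in for scheduler state are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass

# Hypothetical worker-side task states in which the victim has not started
# the task yet and therefore agrees to hand it over.
CONFIRMING_STATES = {"ready", "waiting"}


@dataclass
class InFlightSteal:
    """Bookkeeping for one pending steal request (illustrative only)."""
    stimulus_id: str  # identifies the request the scheduler is waiting on
    victim: str       # worker the task is being taken from
    thief: str        # worker the task should move to


def confirm_steal(in_flight, processing_on, key, stimulus_id, worker_state):
    """Apply the three checks in order and move the task if all pass."""
    info = in_flight.get(key)
    if info is None:
        return "unknown-request"

    # 1. Is the response for the request we are currently waiting on?
    if info.stimulus_id != stimulus_id:
        return "stale-response"  # keep waiting for the matching response
    del in_flight[key]

    # 2. Did the victim worker confirm that it can give the task up?
    if worker_state not in CONFIRMING_STATES:
        return "rejected-by-worker"

    # 3. Is the task still stealable, i.e. still assigned to the victim?
    if processing_on.get(key) != info.victim:
        return "not-stealable"

    # All checks passed: reassign the task to the thief.
    processing_on[key] = info.thief
    return "confirmed"
```

In this sketch, a failure after the third check would happen while the scheduler's bookkeeping is being mutated, which is the kind of place where an unhandled exception could leave state inconsistent. Whether that is what happened in the reported deadlock is not known.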
Environment:
Dask version: 2024.7.1
Python version: 3.10.12