
More graceful job cancellation #640

Open
AlecThomson opened this issue May 22, 2024 · 4 comments · May be fixed by #641
Labels: enhancement, help wanted

Comments

@AlecThomson

Hey all,

This is just a thought for the SLURMCluster for now (since that's what I'm familiar with), but similar options may be available in other clusters too. Currently, the cancel_command in the SLURMJob class is a bare "scancel".

cancel_command = "scancel"

This means that, even when workers are shut down completely gracefully, the Slurm job is marked as CANCELLED. If the command were instead scancel --signal=SIGTERM, the job would be marked as COMPLETED. It's possible there could be cases where we would want a job to be cancelled, which complicates this somewhat.

In the simple case, however, I think this could be implemented with a one-line change to cancel_command:

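from dask_jobqueue.core import Job


class SLURMJob(Job):
    # Override class variables
    submit_command = "sbatch"
    # Send SIGTERM instead of a plain cancel so workers can exit cleanly
    cancel_command = "scancel --signal=SIGTERM"
    config_name = "slurm"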
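Putting the flag directly inside the cancel_command string should be enough on its own, assuming that (as in dask-jobqueue's `Job` base class at the time of writing) the string is split with `shlex.split` and the job ID is appended before the command is run. The helper `build_cancel_argv` below is hypothetical and only mirrors that assumed behaviour, to show the argument list scancel would receive:

```python
import shlex


def build_cancel_argv(cancel_command: str, job_id: str) -> list[str]:
    """Hypothetical helper: split a cancel_command string and append the job ID.

    Assumes the library runs shlex.split(cancel_command) + [job_id].
    """
    return shlex.split(cancel_command) + [job_id]


# Current behaviour: a bare cancel
print(build_cancel_argv("scancel", "12345"))
# Proposed behaviour: only deliver SIGTERM to the job
print(build_cancel_argv("scancel --signal=SIGTERM", "12345"))
```

Because the flag travels inside the existing string, no new parameter is needed for the simple case; users who do want a hard cancel could still override cancel_command.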

It'd be great to hear more thoughts on the implications of this.
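The COMPLETED versus CANCELLED distinction depends on how the worker process ends once it receives SIGTERM. As a minimal, Slurm-free sketch on a POSIX system, the snippet below shows that a process which handles SIGTERM can exit with status 0, while one that does not is terminated by the signal. The child script and the `exit_status_after_sigterm` helper are hypothetical; the sketch assumes a gracefully stopped dask worker likewise exits with status 0, and it does not model Slurm's own accounting.

```python
import signal
import subprocess
import sys

# Hypothetical child program: optionally install a SIGTERM handler that exits
# with status 0, announce readiness, then idle until signalled.
CHILD = """
import signal, sys, time
if {graceful}:
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def exit_status_after_sigterm(graceful: bool) -> int:
    """Start a child process, send it SIGTERM, and return its exit status."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD.format(graceful=graceful)],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # wait until the handler (if any) is installed
    proc.send_signal(signal.SIGTERM)
    proc.communicate(timeout=10)  # reap the child and close the pipe
    # On POSIX, a negative return code means "killed by that signal number".
    return proc.returncode


print(exit_status_after_sigterm(graceful=True))
print(exit_status_after_sigterm(graceful=False))
```

The graceful child reports 0 (a normal exit), while the other reports the negated SIGTERM number, which is the kind of difference a scheduler can use when deciding on a final job state.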

@jacobtomlinson
Member

This sounds like a great improvement. Do you have any interest in making a PR to add this option?

jacobtomlinson added the enhancement and help wanted labels on May 24, 2024
@AlecThomson
Author

Happy to! I just wanted to check in to make sure there wouldn't be any more hidden gotchas.

AlecThomson linked a pull request on May 24, 2024 that will close this issue
@guillaumeeb
Member

Hi! This also sounds perfectly acceptable to me. I don't think there is any case in which we would really want a CANCELLED status! Thanks for proposing this; I think this might be possible with other schedulers too!

@AlecThomson
Author

I see something like this was added for the HTCondor class in #411 and #514. I'll attempt to generalise it.
