Add the option to rerun failed tasks #62
Minimum allowed coverage is Generated by 🐒 cobertura-action against a941448 |
Thanks, Natalie! This is great! Just added a few minor comments
with fs.open(f"{self.results_dir}/missing_tasks.txt", "w") as f:
    for task_id in range(expected):
        if task_id not in done_tasks:
            f.write(f"{task_id}\n")
I think it would be useful to add `print(f"Missing task ID: {task_id}")` so that we can keep track of them.
Also, if the job count becomes 0, should we print something like "All expected task results are present"?
Also, if `job_count` becomes zero, we should skip postprocessing, right?
Added logging of the list of tasks. In `gcp.py` (where this is called), we raise an error and quit if there's nothing to retry.
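The behavior discussed in this thread can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: it assumes a local filesystem instead of the project's `fs` object, and the function name `write_missing_tasks` and its parameters are hypothetical.

```python
import logging
import os

logger = logging.getLogger(__name__)


def write_missing_tasks(results_dir, expected, done_tasks):
    """Write IDs of tasks without results to missing_tasks.txt and return the count.

    Hypothetical helper: `results_dir` is a local directory here, whereas the
    PR writes through a filesystem abstraction.
    """
    done = set(done_tasks)  # set membership checks are O(1)
    missing = [task_id for task_id in range(expected) if task_id not in done]

    with open(os.path.join(results_dir, "missing_tasks.txt"), "w") as f:
        for task_id in missing:
            f.write(f"{task_id}\n")

    if missing:
        logger.info("Missing task IDs: %s", missing)
    else:
        logger.info("All expected task results are present.")
    return len(missing)
```

A caller could then raise an error and quit when the returned count is 0, which is how the reply above describes the handling in `gcp.py`.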
buildstockbatch/cloud/docker_base.py
Outdated
done_tasks = []
for f in fs.ls(f"{self.results_dir}/simulation_output/"):
    if m := re.match(".*results_job(\\d*).json.gz$", f):
        done_tasks.append(int(m.group(1)))
Not sure how long this takes right now (probably not that long), but what do you think about using a set comprehension for faster lookups? Something like this:
fp = re.compile(".*results_job(\\d*).json.gz$")
done_tasks = {int(m.group(1)) for f in fs.ls(f"{self.results_dir}/simulation_output/")
              if (m := fp.match(f))}
I think a loop is more readable than a comprehension, plus it avoids evaluating the regex twice (e.g. `{int(fp.match(f).group(1)) for f in files if fp.match(f)}`), but I will switch to a set instead of a list.
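For illustration, the loop-plus-set variant agreed on here might look like the sketch below. It is an assumption-laden example rather than the merged code: `find_done_tasks` is a hypothetical name, the input is a plain list of paths instead of `fs.ls(...)`, and the pattern is slightly stricter than the one in the diff (escaped dots, and `\d+` so that an empty task ID never reaches `int()`).

```python
import re

# Compiled once. The dots are escaped so they match literally, and \d+
# requires at least one digit. This differs from the pattern in the diff.
RESULTS_RE = re.compile(r".*results_job(\d+)\.json\.gz$")


def find_done_tasks(paths):
    """Return the set of task IDs that already have a results file."""
    done_tasks = set()
    for path in paths:
        # The walrus operator (Python 3.8+) keeps the match object, so the
        # regex is evaluated only once per path.
        if m := RESULTS_RE.match(path):
            done_tasks.add(int(m.group(1)))
    return done_tasks
```

With a set, each `task_id not in done_tasks` check in the later loop over `range(expected)` is a constant-time lookup rather than a scan of a list.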
If this happens, you can rerun the same job with the ``--missingonly`` flag. This will rerun only the
tasks that didn't produce output files, then run postprocessing. Note: This flag assumes that your
project config file has not changed since the previous run, other than the job identifier.
This means the job identifier needs to be changed for the `--missingonly` flag, right?
Thanks, Natalie! Looks good to me!
Adds the option to use the `--missingonly` flag to only run the tasks that don't already have results present.

Notes:
- Looks for the `results_job{TASK_ID}.json.gz` files, and runs the tasks for which that file is missing. This means you can also trigger reruns by deleting those files.

Testing:
- Ran with `--missingonly` and confirmed that only the missing tasks were rerun.