
Add the option to rerun failed tasks #62

Merged: 7 commits, Jan 31, 2024

Conversation

nweires
Collaborator

@nweires nweires commented Jan 24, 2024

Adds the option to use the --missingonly flag to only run the tasks that don't already have results present.

Notes:

  • This checks the output directory for results_job{TASK_ID}.json.gz files, and runs the tasks for which that file is missing. This means you can also trigger reruns by deleting those files.
  • This assumes that you're rerunning with the same project file. If you change it, the behavior is undefined. (Some types of changes would be ignored, others could cause wrong results.)
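The detection described in the first note could be sketched like this (assuming an fsspec-style filesystem object `fs`; the function name and signature are illustrative, not the actual buildstockbatch API):

```python
import re


def find_missing_tasks(fs, results_dir, expected):
    """Illustrative sketch: return task IDs whose results_job{ID}.json.gz
    file is absent from the simulation output directory.

    `fs` is assumed to be an fsspec-style filesystem with an `ls` method.
    """
    pattern = re.compile(r".*results_job(\d+)\.json\.gz$")
    # Collect IDs of tasks that already produced a results file.
    done = {
        int(m.group(1))
        for f in fs.ls(f"{results_dir}/simulation_output/")
        if (m := pattern.match(f))
    }
    # Everything else in [0, expected) still needs to run.
    return [task_id for task_id in range(expected) if task_id not in done]
```

Deleting a `results_job{TASK_ID}.json.gz` file would then make that task ID reappear in the returned list, which is why reruns can also be triggered manually.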

Testing:

  • I deleted some results files from a previous job, then ran with --missingonly and confirmed that only the missing tasks were rerun.
  • I also compared the new results file to the old one, to ensure that the correct set of simulations were run.


github-actions bot commented Jan 24, 2024

File Coverage
All files 86%
base.py 91%
exc.py 57%
hpc.py 78%
local.py 70%
postprocessing.py 84%
utils.py 91%
cloud/docker_base.py 79%
sampler/base.py 79%
sampler/downselect.py 33%
sampler/precomputed.py 93%
sampler/residential_quota.py 61%
test/shared_testing_stuff.py 85%
test/test_docker.py 33%
test/test_local.py 97%
test/test_validation.py 97%
workflow_generator/base.py 90%
workflow_generator/commercial.py 53%
workflow_generator/residential_hpxml.py 86%

Minimum allowed coverage is 33%

Generated by 🐒 cobertura-action against a941448

@nweires nweires marked this pull request as ready for review January 24, 2024 21:41

@mfathollahzadeh mfathollahzadeh left a comment


Thanks, Natalie! This is great! Just added a few minor comments

buildstockbatch/cloud/docker_base.py
with fs.open(f"{self.results_dir}/missing_tasks.txt", "w") as f:
    for task_id in range(expected):
        if task_id not in done_tasks:
            f.write(f"{task_id}\n")


I think it would be useful to add print(f"Missing task ID: {task_id}") so that we can keep track of them
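Folding that suggestion into the hunk above might look like this (an illustrative sketch; the helper name and the use of `logging` rather than `print` are assumptions, not the PR's actual code):

```python
import logging

logger = logging.getLogger(__name__)


def write_missing_tasks(fs, results_dir, expected, done_tasks):
    """Illustrative sketch: write missing task IDs to a file and log each one."""
    missing = []
    with fs.open(f"{results_dir}/missing_tasks.txt", "w") as f:
        for task_id in range(expected):
            if task_id not in done_tasks:
                # Surface each rerun candidate in the log, per the review comment.
                logger.info("Missing task ID: %s", task_id)
                f.write(f"{task_id}\n")
                missing.append(task_id)
    return missing
```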


Also, if the job count becomes 0, should we print something like "all expected task results are present"?


Also, if the job_count becomes zero, we should skip postprocessing, right?

Collaborator Author


Added logging of the list of tasks. In gcp.py (where this is called), we raise an error and quit if there's nothing to retry.
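That guard in gcp.py could look roughly like this (a sketch; the exception class and message text here are assumptions, not the actual gcp.py code):

```python
class NoTasksToRetryError(RuntimeError):
    """Illustrative exception; the real code may use a different type."""


def check_retry_needed(missing_tasks):
    """Abort a --missingonly run when every expected result is already present."""
    if not missing_tasks:
        raise NoTasksToRetryError(
            "All expected task results are already present; nothing to retry."
        )
```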

done_tasks = []
for f in fs.ls(f"{self.results_dir}/simulation_output/"):
    if m := re.match(".*results_job(\\d*).json.gz$", f):
        done_tasks.append(int(m.group(1)))


Not sure how long this is taking right now (probably not that long), but what do you think about using a compiled pattern and a set comprehension for faster lookups? Something like this, or something similar:

fp = re.compile(".*results_job(\\d*).json.gz$")
done_tasks = {int(m.group(1))
              for f in fs.ls(f"{self.results_dir}/simulation_output/")
              if (m := fp.match(f))}

Collaborator Author


I think a loop is more readable than a comprehension, plus it avoids evaluating the regex twice (e.g. {int(fp.match(f).group(1)) for f in files if fp.match(f)}), but I will switch to a set instead of a list.
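The loop-plus-set version the author describes might look like this (a sketch reusing the regex from the diff, wrapped in an illustrative helper so it is self-contained):

```python
import re

RESULTS_PATTERN = re.compile(r".*results_job(\d+)\.json\.gz$")


def collect_done_tasks(paths):
    """Loop form of the matcher: one regex evaluation per path, set storage
    for O(1) membership checks."""
    done_tasks = set()
    for path in paths:
        if m := RESULTS_PATTERN.match(path):
            done_tasks.add(int(m.group(1)))
    return done_tasks
```

The explicit loop keeps the walrus assignment on its own line, so each path is matched exactly once and the control flow stays easy to scan.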


If this happens, you can rerun the same job with the ``--missingonly`` flag. This will rerun only the
tasks that didn't produce output files, then run postprocessing. Note: This flag assumes that your
project config file has not changed since the previous run, other than the job identifier.


This means the job identifier needs to be changed for the --missingonly flag, right?


@mfathollahzadeh mfathollahzadeh left a comment


Thanks, Natalie! Looks good to me!

@mfathollahzadeh mfathollahzadeh merged commit feb1607 into gcp Jan 31, 2024
6 checks passed