Add option to run multiple jobs in a row #71
base: gcp
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1102,12 +1102,13 @@ def main(): | |
GcpBatch.run_combine_results_on_cloud(gcs_bucket, gcs_prefix, results_dir, do_timeseries) | ||
else: | ||
parser = argparse.ArgumentParser() | ||
parser.add_argument("project_filename") | ||
parser.add_argument("project_filenames", help="Comma-separated list of project YAML files to run.") | ||
Comment: Maybe remove the help string from here, since it is in the argument section as well?
Reply: I'm not sure what you mean... But I like the help string here because it shows up when you run
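As a small illustration of the point in the reply: help strings passed to `add_argument` are included in the help text that argparse generates. The sketch below assumes a standalone parser that only mirrors the two positional arguments from this diff; the `prog` name `gcp_example` and the `build_parser` helper are hypothetical and not part of the PR.

```python
import argparse


def build_parser():
    # Hypothetical standalone parser; it mirrors only the two positional
    # arguments changed in this diff, not the full CLI.
    parser = argparse.ArgumentParser(prog="gcp_example")
    parser.add_argument(
        "project_filenames",
        help="Comma-separated list of project YAML files to run.",
    )
    parser.add_argument(
        "job_identifiers",
        nargs="?",
        default=None,
        help="Comma-separated list of job IDs to use. "
        "Optional override of gcp.job_identifier in your project file. Max 48 characters.",
    )
    return parser


# format_help() returns the same text that argparse prints for -h/--help.
help_text = build_parser().format_help()
print(help_text)
```

Because `job_identifiers` uses `nargs="?"`, it can be omitted entirely, in which case `args.job_identifiers` is `None`.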
|
||
parser.add_argument( | ||
"job_identifier", | ||
"job_identifiers", | ||
nargs="?", | ||
default=None, | ||
help="Optional override of gcp.job_identifier in your project file. Max 48 characters.", | ||
help="Comma-separated list of job IDs to use. " | ||
"Optional override of gcp.job_identifier in your project file. Max 48 characters.", | ||
) | ||
group = parser.add_mutually_exclusive_group() | ||
group.add_argument( | ||
|
@@ -1148,34 +1149,50 @@ def main(): | |
else: | ||
logger.setLevel(logging.INFO) | ||
|
||
# validate the project, and if --validateonly flag is set, return True if validation passes | ||
GcpBatch.validate_project(os.path.abspath(args.project_filename)) | ||
if args.validateonly: | ||
return True | ||
project_filenames = args.project_filenames.split(",") | ||
n_projects = len(project_filenames) | ||
job_IDs = len(project_filenames) * [None] | ||
if args.job_identifiers: | ||
job_IDs = args.job_identifiers.split(",") | ||
if len(job_IDs) != n_projects: | ||
Comment: Is this capturing the project_id issue? I'm trying to see how likely this error is to happen.
Reply: This just checks that if you give a list of project files and a list of IDs, they're the same length.
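The length check described in the reply can be sketched in isolation as follows. This is a minimal version under stated assumptions: `pair_projects_with_ids` is a hypothetical helper name, and the local `ValidationError` class is a stand-in for the project's own exception, whose definition is not shown in this diff.

```python
class ValidationError(Exception):
    """Stand-in for the project's ValidationError (assumed, not shown in the diff)."""


def pair_projects_with_ids(project_filenames_arg, job_identifiers_arg=None):
    """Split the two comma-separated arguments and pair them up.

    Hypothetical helper: the PR does this inline in main().
    """
    project_filenames = project_filenames_arg.split(",")
    # Default: no override, so every project keeps the ID from its own file.
    job_ids = [None] * len(project_filenames)
    if job_identifiers_arg:
        job_ids = job_identifiers_arg.split(",")
        if len(job_ids) != len(project_filenames):
            raise ValidationError(
                f"job_identifiers contains {len(job_ids)} IDs, "
                f"but project_filenames contains {len(project_filenames)} files"
            )
    return list(zip(project_filenames, job_ids))
```

Checking the lengths explicitly matters because `zip` stops at the shorter input, so without the check a mismatched list would silently drop projects or IDs.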
||
raise ValidationError( | ||
f"job_identifiers contains {len(args.job_identifiers.split(','))} IDs, " | ||
f"but project_filenames contains {n_projects} files" | ||
) | ||
|
||
batch = GcpBatch(args.project_filename, args.job_identifier, missing_only=args.missingonly) | ||
if args.clean: | ||
batch.clean() | ||
return | ||
if args.show_jobs: | ||
batch.show_jobs() | ||
for project_filename, job_ID in zip(project_filenames, job_IDs): | ||
logger.info(f"----------Validating {project_filename}{f' ({job_ID})' if job_ID else ''}----------") | ||
# validate the project, and if --validateonly flag is set, return True if validation passes | ||
GcpBatch.validate_project(os.path.abspath(project_filename)) | ||
|
||
if args.validateonly: | ||
return | ||
elif args.postprocessonly: | ||
if batch.check_for_existing_jobs(pp_only=True): | ||
return | ||
batch.build_image("gcp") | ||
batch.push_image() | ||
batch.process_results() | ||
else: | ||
if batch.check_for_existing_jobs(): | ||
return | ||
if not args.missingonly: | ||
batch.check_output_dir() | ||
|
||
batch.build_image("gcp") | ||
batch.push_image() | ||
batch.run_batch() | ||
batch.process_results() | ||
for project_filename, job_ID in zip(project_filenames, job_IDs): | ||
Comment: Does processing of each project-job pair continue all the way through post-processing before the next pair is picked up, or does the next job start as soon as the results are available in GCS?
Reply: Right now, this just runs one job completely (waiting for post-processing to finish), then starts the next one. We could potentially start the second job while the first is in post-processing, but I'm not doing that here.
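To make the ordering described in the reply concrete, here is a minimal sketch of the strictly sequential loop. `FakeBatch` and `run_sequentially` are hypothetical stand-ins that only record the order of calls; they do not touch GCP and are not the real `GcpBatch` API.

```python
class FakeBatch:
    """Hypothetical stand-in for GcpBatch that records calls instead of running jobs."""

    def __init__(self, project_filename, job_id, log):
        self.project_filename = project_filename
        self.job_id = job_id
        self.log = log

    def run_batch(self):
        self.log.append(("run_batch", self.project_filename))

    def process_results(self):
        self.log.append(("process_results", self.project_filename))


def run_sequentially(pairs):
    """Run each (project_filename, job_id) pair to completion before the next one."""
    log = []
    for project_filename, job_id in pairs:
        batch = FakeBatch(project_filename, job_id, log)
        batch.run_batch()
        # Post-processing of this job finishes before the next job starts.
        batch.process_results()
    return log


log = run_sequentially([("a.yml", "id1"), ("b.yml", "id2")])
```

In this sketch, `log` lists both steps for `a.yml` before either step for `b.yml`. Overlapping the second job's batch run with the first job's post-processing would need some form of concurrency, which this PR deliberately leaves out.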
||
logger.info(f"----------Starting {project_filename}{f' ({job_ID})' if job_ID else ''}----------") | ||
batch = GcpBatch(project_filename, job_ID, missing_only=args.missingonly) | ||
if args.clean: | ||
batch.clean() | ||
continue | ||
if args.show_jobs: | ||
batch.show_jobs() | ||
continue | ||
elif args.postprocessonly: | ||
if batch.check_for_existing_jobs(pp_only=True): | ||
continue | ||
batch.build_image("gcp") | ||
batch.push_image() | ||
batch.process_results() | ||
else: | ||
if batch.check_for_existing_jobs(): | ||
continue | ||
if not args.missingonly: | ||
batch.check_output_dir() | ||
|
||
batch.build_image("gcp") | ||
batch.push_image() | ||
batch.run_batch() | ||
batch.process_results() | ||
|
||
|
||
if __name__ == "__main__": | ||
|
Maybe we move this into develop at this point instead of gcp?