[Jobs] Parallel execution for DAG #4055

Open
cblmemo opened this issue Oct 10, 2024 · 2 comments · May be fixed by #4128

@cblmemo
Collaborator

cblmemo commented Oct 10, 2024

Blocked by #4054.

As a first step, the jobs controller should support parallel execution for basic DAGs. In the following example, the two finetune tasks should run in parallel.

image
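To make the goal concrete, here is a minimal sketch of dependency-aware parallel execution using only the Python standard library. It is not the jobs controller's code or SkyPilot's API: `run_dag_in_parallel`, `EXAMPLE_DEPS`, and the task names are hypothetical, and the example graph (one upstream task, two finetune tasks, one downstream task) is only an assumption about the shape shown in the image.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical DAG: maps each task name to the set of tasks it depends on.
EXAMPLE_DEPS = {
    "finetune_a": {"preprocess"},
    "finetune_b": {"preprocess"},
    "evaluate": {"finetune_a", "finetune_b"},
}


def run_dag_in_parallel(deps, run_task, max_workers=4):
    """Run every task in `deps`, starting each one as soon as all of its
    predecessors have finished. Returns {task name: run_task(name) result}."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    in_flight = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies are all satisfied.
            for name in sorter.get_ready():
                in_flight[pool.submit(run_task, name)] = name
            # Block until at least one running task finishes.
            done, _ = wait(in_flight, return_when=FIRST_COMPLETED)
            for future in done:
                name = in_flight.pop(future)
                results[name] = future.result()  # re-raises task errors
                sorter.done(name)  # may unblock downstream tasks
    return results


if __name__ == "__main__":
    # Stand-in for launching a real task; here it only echoes the name.
    print(run_dag_in_parallel(EXAMPLE_DEPS, lambda name: f"ran {name}"))
```

In this sketch both finetune tasks are submitted in the same loop iteration once `preprocess` completes, so they run concurrently, and `evaluate` starts only after both are done. A real controller would also need failure handling and recovery per task, which this sketch leaves out.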
@cblmemo cblmemo self-assigned this Oct 10, 2024
@cblmemo
Collaborator Author

cblmemo commented Oct 10, 2024

Assigning @andylizf

@andylizf
Contributor

@Michaelvll Question about self.dag in StrategyExecutor:

  1. Is self.dag intended as a future-proof design? If so, what scenarios were considered?

  2. Is it correct to assume that self.dag is unrelated to parallel execution of independent tasks at the same level?

  3. Or is it simply for convenience in passing arguments to self.launch, with no special significance?

    def __init__(self, cluster_name: str, backend: 'backends.Backend',
                 task: 'task_lib.Task', retry_until_up: bool) -> None:
        """Initialize the strategy executor.

        Args:
            cluster_name: The name of the cluster.
            backend: The backend to use. Only CloudVMRayBackend is supported.
            task: The task to execute.
            retry_until_up: Whether to retry until the cluster is up.
        """
        assert isinstance(backend, backends.CloudVmRayBackend), (
            'Only CloudVMRayBackend is supported.')
        self.dag = sky.Dag()
        self.dag.add(task)

Understanding this would help us implement parallel execution effectively. Thanks!

@andylizf andylizf linked a pull request Oct 19, 2024 that will close this issue