Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add shard execution workflow #1557

Open
wants to merge 11 commits into
base: pv-feat/experimental-sharding-backend
Choose a base branch
from

Conversation

polvalente
Copy link
Contributor

Adds the initial version of the process communication structure for sharded execution.

Does not handle container outputs for the sharded function yet,
and also does not yet bring everything together into the compiler jit function.

@polvalente polvalente self-assigned this Nov 8, 2024
@polvalente polvalente changed the base branch from main to pv-feat/experimental-sharding-backend November 8, 2024 01:20
@@ -4,6 +4,7 @@ defmodule Nx.Application do

def start(_type, _args) do
children = [
Nx.Defn.ShardingCompiler.ShardRegistry,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want to have this here, actually.
In fact, I think we might actually want to go with gen_stage for the execution, since the whole "chain of processes producing data to one another" smells a lot like gen_stage.

Thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should talk about it but I doubt GenStage will be helpful here. One of the biggest pitfalls in GenStage is that people move data around too much, when they should not. It is cheaper to move computations than to move data.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus the whole demand approach is unnecessary here. Here is either pending or done (like a promise), no?

@@ -0,0 +1,21 @@
defmodule Nx.Defn.ShardingCompiler.ShardExecution.ArgumentProvider do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need several providers instead of one that receives the index and returns the relevant one?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, this may actually be the simplest, so nevermind.

@josevalim
Copy link
Collaborator

Could we fully decouple the workflow definition and execution from Nx? Ideally we would have a workflow like this:

workflow = %{
  0 => %{
    code: &foo(&1, &2, ...),
    args: [1, 2]
  },
  1 =>  %{
    code: &bar(&1),
    args: [2]
  },
  2 => %{
    code: &baz/0,
    args: []
  }
}

And then we pass this to a ProcessExecutor which is completely independent of Nx and tensors. You could also have a Nx executor, but the overall idea is that the Executor should worry about resources and not necessarily tensors (except the resources the tensors are located).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants