-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add shard execution workflow #1557
base: pv-feat/experimental-sharding-backend
Are you sure you want to change the base?
feat: add shard execution workflow #1557
Conversation
@@ -4,6 +4,7 @@ defmodule Nx.Application do | |||
|
|||
def start(_type, _args) do | |||
children = [ | |||
Nx.Defn.ShardingCompiler.ShardRegistry, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we want to have this here, actually.
In fact, I think we might actually want to go with gen_stage for the execution, since the whole "chain of processes producing data to one another" smells a lot like gen_stage.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should talk about it but I doubt GenStage will be helpful here. One of the biggest pitfalls in GenStage is that people move data around too much, when they should not. It is cheaper to move computations than to move data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plus the whole demand approach is unnecessary here. Here is either pending or done (like a promise), no?
@@ -0,0 +1,21 @@ | |||
defmodule Nx.Defn.ShardingCompiler.ShardExecution.ArgumentProvider do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need several providers instead of one that receives the index and returns the relevant one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, this may actually be the simplest, so nevermind.
Could we fully decouple the workflow definition and execution from Nx? Ideally we would have a workflow like this:
And then we pass this to a ProcessExecutor which is completely independent of Nx and tensors. You could also have a Nx executor, but the overall idea is that the Executor should worry about resources and not necessarily tensors (except the resources the tensors are located). |
Adds the initial version of the process communication structure for sharded execution.
Does not handle container outputs for the sharded function yet,
and also does not yet bring everything together into the compiler jit function.