feat: add shard execution workflow #1557

polvalente · 2024-11-08T01:19:52Z

Adds the initial version of the process communication structure for sharded execution.

Does not handle container outputs for the sharded function yet,
and also does not yet bring everything together into the compiler jit function.

polvalente · 2024-11-08T01:38:25Z

nx/lib/nx/application.ex

@@ -4,6 +4,7 @@ defmodule Nx.Application do

  def start(_type, _args) do
    children = [
+      Nx.Defn.ShardingCompiler.ShardRegistry,


I'm not sure we want to have this here, actually.
In fact, I think we might actually want to go with gen_stage for the execution, since the whole "chain of processes producing data to one another" smells a lot like gen_stage.

Thoughts?

We should talk about it but I doubt GenStage will be helpful here. One of the biggest pitfalls in GenStage is that people move data around too much, when they should not. It is cheaper to move computations than to move data.

Plus the whole demand approach is unnecessary here. Here, each result is either pending or done (like a promise), no?
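To make the "pending or done" point concrete, here is a minimal sketch using only the standard library `Task` module. It is not code from this PR: the two computations are placeholder sums standing in for shard work, and the variable names are hypothetical. Each task behaves like a promise, so the caller waits for completion and there is no demand signalling between stages.

```elixir
# Hypothetical stand-ins for two independent shard computations.
# Each Task is either still pending or done, like a promise.
shard_a = Task.async(fn -> Enum.sum(1..10) end)
shard_b = Task.async(fn -> Enum.sum(11..20) end)

# The caller (the process that started the tasks) blocks until both are done.
[sum_a, sum_b] = Task.await_many([shard_a, shard_b])

# Combine the shard results once they are available.
total = sum_a + sum_b
```

Note that `Task.await/2` and `Task.await_many/2` must be called from the process that started the task, so chaining tasks across processes would need a different mechanism (for example, plain message passing).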

josevalim · 2024-11-11T11:20:38Z

nx/lib/nx/defn/sharding_compiler/shard_execution/argument_provider.ex

@@ -0,0 +1,21 @@
+defmodule Nx.Defn.ShardingCompiler.ShardExecution.ArgumentProvider do


Why do you need several providers instead of one that receives the index and returns the relevant one?

Hrm, this may actually be the simplest, so never mind.

josevalim · 2024-11-11T11:27:01Z

Could we fully decouple the workflow definition and execution from Nx? Ideally we would have a workflow like this:

workflow = %{
  0 => %{
    code: &foo(&1, &2, ...),
    args: [1, 2]
  },
  1 => %{
    code: &bar(&1),
    args: [2]
  },
  2 => %{
    code: &baz/0,
    args: []
  }
}

And then we pass this to a ProcessExecutor which is completely independent of Nx and tensors. You could also have an Nx executor, but the overall idea is that the Executor should worry about resources and not necessarily tensors (except for the resources where the tensors are located).
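As a rough illustration of what such an Nx-independent executor could look like, here is a minimal sketch. It assumes that each entry in `args` is the id of another workflow node whose result is passed as an argument, which is one reading of the example above and not something the PR confirms. The module name `WorkflowSketch` and the anonymous functions are hypothetical stand-ins for `foo`, `bar`, and `baz`.

```elixir
defmodule WorkflowSketch do
  # Runs every node of `workflow`, a map of id => %{code: fun, args: [dep_id]},
  # and returns a map of id => result. Knows nothing about Nx or tensors.
  def run(workflow), do: run(workflow, %{})

  # Done once every node has a result.
  defp run(workflow, results) when map_size(workflow) == map_size(results), do: results

  defp run(workflow, results) do
    # A node is ready when it has not run yet and all of its dependencies have.
    ready =
      for {id, %{args: deps} = node} <- workflow,
          not is_map_key(results, id),
          Enum.all?(deps, &is_map_key(results, &1)),
          do: {id, node}

    if ready == [] do
      raise ArgumentError, "workflow has a cycle or a missing dependency"
    end

    # Run all ready nodes concurrently, one process per node.
    tasks =
      for {id, %{code: code, args: deps}} <- ready do
        inputs = Enum.map(deps, &Map.fetch!(results, &1))
        {id, Task.async(fn -> apply(code, inputs) end)}
      end

    new_results = Map.new(tasks, fn {id, task} -> {id, Task.await(task)} end)
    run(workflow, Map.merge(results, new_results))
  end
end

# Placeholder functions standing in for foo/2, bar/1 and baz/0.
workflow = %{
  0 => %{code: fn a, b -> a + b end, args: [1, 2]},
  1 => %{code: fn c -> c * 10 end, args: [2]},
  2 => %{code: fn -> 4 end, args: []}
}

results = WorkflowSketch.run(workflow)
```

This sketch moves results back through the calling process, which is the simplest thing to show but not necessarily what a resource-aware executor would do. A real executor would more likely place each node on the resource that already holds its inputs.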

polvalente added 11 commits October 24, 2024 00:15

refactor: bring changes from nx_iree

427a434

fix: optional function support

3a24374

test: test anonymous functions

f49daab

feat: start producing shard execution processes

aa2ffe7

wip: make code compile

f6eed5a

wip: shard execution implementation

6315546

refactor: use Registry and :erpc.multicall for process registration

5a5fbaf

fix: use local get function

eecbb28

refactor: use uniform sharding only

553a5b8

feat: sharding concurrent execution proof of concept working

14e558f

feat: minimal e2e example of shard execution working

fafaa0d

polvalente self-assigned this Nov 8, 2024

polvalente changed the base branch from main to pv-feat/experimental-sharding-backend November 8, 2024 01:20

polvalente requested a review from josevalim November 8, 2024 01:20

polvalente commented Nov 8, 2024


josevalim reviewed Nov 11, 2024
