manipulation of acyclic directed graphs for hpc orchestrations #1955
-
I am aware of a Python/Vue solution to a problem statement shaped like this: 'The execution of simulation models requires orchestrating a series of jobs (steps) that depend on each other. In the context of High-Performance Computing, resources are usually scarce as the users' demand rises as their simulations compete in the scheduling system.'

Assume the HPC behemoth in question was paid for by multiple countries, serves global stakeholders (including you, whoever you are), offers a heterogeneous Python-accessible environment, and runs workloads where failure of the orchestrator is probably very expensive. Based on your own experience with this HPC orchestration problem, how would you advise the boss of the maintainer of the Python piece in question, in terms of the Durable Functions option bullet point? Let's say they're proud of the behemoth and the behemoth's data center, and the physical location of executing code is a key measure. Please advise on a better forum if this is not that.
-
As you mentioned, the compute our orchestrations/activities/entities run on consists of relatively cheap, expendable VMs, especially in comparison to an HPC environment. That makes the normal workflow of orchestrations + activities not a great fit.

However, we do support calling HTTP endpoints, either via the callHttp APIs on the orchestration context, or by scheduling an activity that makes an external HTTP request. These HTTP requests can target the compute living on those limited resources, with logic around what to do if a resource is not currently available. Note that you can use any external communication mechanism to signal to your HPC environment to execute work when the resources are available, as long as you have a way to indicate back to the orchestration when that work has completed. External events work well for this purpose. This lets you keep your orchestration logic lightweight and resilient to crashes, while still delegating the real work to your HPC environment.

The main caveat I would give is to keep large results from your HPC jobs local to that environment, so you don't pay the cost of shipping huge pieces of data around. Instead, pass around references to those results, along with the metadata your orchestration needs to make decisions. This is good both from an ingress/egress perspective and for the memory consumption of your orchestration.
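To make that concrete, here is a minimal sketch assuming the Python Durable Functions programming model (`azure-functions-durable`). The submission URL, the activity names (`SubmitHpcJob`, `RecordResultReference`), and the event name (`HpcJobCompleted`) are hypothetical placeholders for whatever your HPC side actually exposes; the Durable calls themselves (`call_activity`, `wait_for_external_event`) are the standard context APIs.

```python
import azure.durable_functions as df


def orchestrator_function(context: df.DurableOrchestrationContext):
    job_spec = context.get_input()

    # Delegate the heavy work: an activity makes an HTTP call to a thin
    # endpoint fronting the HPC scheduler. The activity returns only a small
    # job handle, never the simulation data itself.
    job_handle = yield context.call_activity("SubmitHpcJob", job_spec)

    # Wait for the HPC side to signal completion via an external event.
    # The orchestration is idle and replay-safe while the job runs, so a
    # crash of the (cheap) orchestrator compute does not lose progress.
    completion = yield context.wait_for_external_event("HpcJobCompleted")

    # The event payload carries a *reference* to the results (a path or URI
    # local to the HPC environment) plus the metadata the orchestration
    # needs to make decisions, not the large result set.
    if completion["status"] == "succeeded":
        yield context.call_activity("RecordResultReference", {
            "job_id": job_handle["job_id"],
            "result_uri": completion["result_uri"],
            "metadata": completion["metadata"],
        })
    return completion["status"]


main = df.Orchestrator.create(orchestrator_function)


# --- SubmitHpcJob activity (a separate function in the same app) ---
# Hypothetical example: POST the job spec to an HTTP wrapper around the
# scheduler and return the small handle it responds with.
import requests


def submit_hpc_job(job_spec: dict) -> dict:
    resp = requests.post("https://hpc.example.org/api/jobs",
                         json=job_spec, timeout=30)
    resp.raise_for_status()
    return resp.json()  # e.g. {"job_id": "1234567"}
```

The HPC side (or a small agent sitting next to it) completes the handshake by raising the `HpcJobCompleted` event for that orchestration instance, either through the raise-event HTTP webhook or a `DurableOrchestrationClient`, with a payload containing only the result reference and metadata.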