Documentation for HA - Compute Isolation (#1104)
Added information for the Compute Isolation development.

Note that the example is just my best effort, and will require extra
work, along with all the other updates ;-)
rebekah-lawrence authored Jun 13, 2024
1 parent 697d994 commit af73dcb
Showing 7 changed files with 132 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/modules/ROOT/pages/phone-homes.adoc
@@ -78,6 +78,7 @@ The following information is sent in a phone home:
** Whether HD memory is enabled
** Whether Tiered Storage is enabled
** Whether User Code Namespaces is enabled; if so, count of registered user code namespaces
** Count of submitted placement controlled jobs
**Disabling Phone Homes**

49 changes: 49 additions & 0 deletions docs/modules/architecture/pages/distributed-computing.adoc
@@ -121,6 +121,55 @@ introduce a network connection between tasks, sending the data from one
cluster node to the other. This is the basic principle behind
auto-parallelization and distribution.

== Word Count with Job Placement Control

Now we'll take the Word Count task above, but this time we'll define which members run the Jet processing job.

NOTE: Your license key must include `Advanced Compute` to activate this feature.

We'll use the same `ArrayList` that we used in the previous example, but we'll run the Jet processing job on lite members only.
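
As a reminder, lite members join the cluster and run computation but own no data partitions. For reference, a member can be started as a lite member with the following declarative configuration (a minimal sketch; the programmatic equivalent is `Config.setLiteMember(true)`):

```xml
<hazelcast>
    <!-- Lite members participate in computation but store no data -->
    <lite-member enabled="true"/>
</hazelcast>
```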

Create your `JobBuilder` for the pipeline. The builder is provided by the `JetService.newJobBuilder` method:

```java
/**
 * Creates a JobBuilder for a new Jet job with a Pipeline definition.
 */
default JobBuilder newJobBuilder(Pipeline p) {
    return new JobBuilder(this, p);
}
```

Define your pipeline and any member selection override to submit your Jet job from your Hazelcast Java client:

```java
HazelcastInstance hz = HazelcastClient.newHazelcastClient();
// ...
IMap<String, String> map = hz.getMap(MAP_NAME); // MAP_NAME: the name of your map

Pipeline p = Pipeline.create()
        .readFrom(Sources.map(map))
        .map(Entry::getValue)
        .writeTo(sink) // sink: any Sink you have defined, for example Sinks.logger()
        .getPipeline();

Job job = hz.getJet()
        .newJobBuilder(p)
        .withMemberSelector(JetMemberSelector.ALL_LITE_MEMBERS)
        .start();
```

In this form, we can clearly identify the individual steps taken by the computation:

. Get the map.
. Read entries from the map source.
. Extract the value from each entry.
. Write the values to the sink.
. Define the job using the `JobBuilder` API.
. Override the default job placement of all cluster members, selecting lite members only.
. Start the processing job.

== Core DAG Planner

As you write a pipeline, you form the pipeline DAG and when you submit it for execution, the planner converts it to the core DAG.
14 changes: 13 additions & 1 deletion docs/modules/configuration/pages/jet-configuration.adoc
@@ -136,7 +136,7 @@ joins the cluster. It has no effect on jobs with auto-scaling disabled.
With this feature, you can restart the whole cluster without losing the
jobs and their state. It is implemented on top of Hazelcast's Persistence
feature, which persists the data to disk. You need to have
the Hazelcast {enterprise-product-name} edition and configure Hazelcast's Persistence to
the Hazelcast {enterprise-product-name} and configure Hazelcast's Persistence to
use this feature. The default value is `false`, i.e., disabled.

|`max-processor-accumulated-records`
@@ -297,6 +297,18 @@ The most important properties are listed here:
Each job has job-specific configuration options. These are covered
in detail in xref:pipelines:configuring-jobs.adoc[].

=== Job Placement Control

To activate job placement control, your license key must include `Advanced Compute`.

Job placement control allows you to define the members to use for Jet job processing. For example, you can manage your workload without worrying that Jet processing jobs will starve your storage components of resources.

NOTE: Your storage components still need to serve the data and this has some impact on their resources. Before using job placement control to manage the workload, ensure that the processing element of the job is substantially more resource-intensive than the data retrieval element.

You can control the placement of the job using the `JetMemberSelector` parameter of the `JobBuilder` API. For further information on `JobBuilder`, refer to the link:https://docs.hazelcast.org/docs/latest/javadoc/com/hazelcast/jet/JetService.JobBuilder.html[API Reference, window=_blank].
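
For example, a minimal sketch of selecting lite members only (assuming an existing pipeline `p` and a client instance `hz`):

```java
Job job = hz.getJet()
        .newJobBuilder(p)
        .withMemberSelector(JetMemberSelector.ALL_LITE_MEMBERS)
        .start();
```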

You pass the selector configuration when you submit your job from the Hazelcast client. For more information on submitting a job to specific members, see xref:pipelines:submitting-jobs.adoc#isolated-jobs[Submitting Jobs].

== Client Configuration

When using a Hazelcast client to access Jet engine services, the easiest way to
25 changes: 25 additions & 0 deletions docs/modules/pipelines/pages/job-placement-control.adoc
@@ -0,0 +1,25 @@
= Jet Job Placement Control
:description: Your Jet processing jobs can be distributed across a defined subset of the cluster. This approach provides finer control of your Jet processing, which means that you can distribute your workload to meet your requirements.
:page-enterprise: true

{description}

NOTE: Your license key must include `Advanced Compute` to activate this feature.

For example, you can configure Jet processing jobs so that they run on lite members only, allowing you to split your computational and storage requirements without the need to configure each job separately. You control the members to use for your Jet job processing on a job-by-job basis.

Distributing the processing job in this way allows you to find the best balance for your processing and storage requirements. Separating the processing from the data serving requirement means that less stress is put on your storage component's resources, as they only need to serve the data and not carry out any of the processing. This can help you to spread the load on your cluster across the members.

Your storage components still use their resources to serve data. You must be sure that the processing element of the job uses considerably more resources than the data retrieval element before using job placement control in this manner.

To use job placement control, create and submit a job using the `JobBuilder` API, which you can use to configure the following:

* The job configuration
* The pipeline or DAG
* The member selection criteria for the processing
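
A minimal sketch combining all three (assuming a running cluster with lite members; the map name `"books"` and the job name are illustrative, and method names follow the `JobBuilder` Javadoc):

```java
HazelcastInstance hz = HazelcastClient.newHazelcastClient();

// The pipeline or DAG
Pipeline p = Pipeline.create()
        .readFrom(Sources.map("books"))
        .map(Entry::getValue)
        .writeTo(Sinks.logger())
        .getPipeline();

// The job configuration
JobConfig config = new JobConfig().setName("isolated-word-count");

// The member selection criteria
Job job = hz.getJet()
        .newJobBuilder(p)
        .withConfig(config)
        .withMemberSelector(JetMemberSelector.ALL_LITE_MEMBERS)
        .start();
```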

For further information on the `JobBuilder` API, refer to the link:https://docs.hazelcast.org/docs/latest/javadoc/com/hazelcast/jet/JetService.JobBuilder.html[API Reference, window=_blank].

For further information on submitting an isolated job, see xref:pipelines:submitting-jobs.adoc#isolated-jobs[Submitting Jobs].

For an example of an isolated job, see xref:architecture:distributed-computing.adoc[], or use the link:https://github.com/hazelcast/hazelcast-code-samples/tree/master/jet/wordcount-compute-isolation[provided code sample]. For further information on using our code samples, refer to the link:https://github.com/hazelcast/hazelcast-code-samples/blob/master/README.md[code samples ReadMe].
3 changes: 3 additions & 0 deletions docs/modules/pipelines/pages/overview.adoc
@@ -32,6 +32,9 @@ Some features of the Jet engine include:
- Process infinite out-of-order data streams using event time-based windows.
- Fork data stream to reuse the same intermediate result in more than one way.
- Distribute the processing across all available CPU cores.
- Specify job placement across a defined subset of the cluster.
+
NOTE: Your license key must include `Advanced Compute` to activate this feature.

== Pipeline Workflow

40 changes: 40 additions & 0 deletions docs/modules/pipelines/pages/submitting-jobs.adoc
@@ -121,6 +121,46 @@ You cannot upload the following classes using the Jet API or the CLI. These clas
* Map features such as EntryProcessor or MapLoader and MapStore
====

=== Job Placement Control

To activate job placement, your license key must include `Advanced Compute`.

You can define which members an individual Jet job runs on. This is known as job placement.

This approach is particularly useful in the following situations:

* When you want to run the job on a lite member to isolate computation from storage
+
NOTE: Before isolating your computation, ensure that the processing element of the job is substantially more resource-intensive than the data retrieval element. Although isolating computation from storage can mean that your storage components benefit from the reduced resource workload, serving the data still has some impact and you must ensure that isolated jobs provide the right balance for your needs.

* When you want to run the job on an edge node to take advantage of edge computing

Job placement supports the following:

* Auto-scaling
* `AT_LEAST_ONCE` and `EXACTLY_ONCE` fault tolerance
* Split-brain protection
* Metrics

You can use the Hazelcast Java client to submit your job to specific members as follows:

```java
HazelcastInstance hz = HazelcastClient.newHazelcastClient();
// ...
IMap<String, String> map = hz.getMap(MAP_NAME); // MAP_NAME: the name of your map

Pipeline p = Pipeline.create()
        .readFrom(Sources.map(map))
        .map(Entry::getValue)
        .writeTo(sink) // sink: any Sink you have defined, for example Sinks.logger()
        .getPipeline();

Job job = hz.getJet()
        .newJobBuilder(p)
        .withMemberSelector(JetMemberSelector.ALL_LITE_MEMBERS)
        .start();
```

For further information on job placement control, see xref:pipelines:job-placement-control.adoc[].

== Submitting a Job using SQL

To submit a job to the cluster with SQL, use the xref:sql:create-job.adoc[`CREATE JOB` statement].
1 change: 1 addition & 0 deletions docs/modules/pipelines/partials/nav.adoc
@@ -29,6 +29,7 @@
**** xref:pipelines:kinesis.adoc[]
**** xref:pipelines:pulsar.adoc[]
** xref:pipelines:serialization.adoc[]
** xref:pipelines:job-placement-control.adoc[]
** xref:pipelines:configuring-jobs.adoc[]
** xref:pipelines:job-security.adoc[]
** xref:pipelines:submitting-jobs.adoc[]
