Documentation for HA - Compute Isolation (#1104)
Added information for the Compute Isolation development.

Note that the example is just my best effort, and will require extra
work, along with all the other updates ;-)
rebekah-lawrence authored Jun 13, 2024
1 parent 697d994 commit af73dcb
Showing 7 changed files with 132 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/modules/ROOT/pages/phone-homes.adoc
@@ -78,6 +78,7 @@ The following information is sent in a phone home:
** Whether HD memory is enabled
** Whether Tiered Storage is enabled
** Whether User Code Namespaces is enabled; if so, count of registered user code namespaces
** Count of submitted placement controlled jobs
**Disabling Phone Homes**

49 changes: 49 additions & 0 deletions docs/modules/architecture/pages/distributed-computing.adoc
@@ -121,6 +121,55 @@ introduce a network connection between tasks, sending the data from one
cluster node to the other. This is the basic principle behind
auto-parallelization and distribution.

== Word Count with Job Placement Control

Now we'll take the Word Count task above, but this time we'll define which members run the Jet processing job.

NOTE: Your license key must include `Advanced Compute` to activate this feature.

We'll use the same `ArrayList` that we used in the previous example, but we'll run the Jet processing job on lite members only.
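
As a reminder, lite members join the cluster and run computation but own no data partitions. For reference, a member can be started as a lite member with the following declarative configuration (a minimal sketch; the programmatic equivalent is `Config.setLiteMember(true)`):

```xml
<hazelcast>
    <!-- Lite members participate in computation but store no data -->
    <lite-member enabled="true"/>
</hazelcast>
```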

Create your `JobBuilder` for the pipeline. The builder is provided by the `JetService.newJobBuilder` method:

```java
/**
 * Creates a JobBuilder for a new Jet job with a Pipeline definition.
 */
default JobBuilder newJobBuilder(Pipeline p) {
    return new JobBuilder(this, p);
}
```

Define your pipeline and any member selection override to submit your Jet job from your Hazelcast Java client:

```java
HazelcastInstance hz = HazelcastClient.newHazelcastClient();
// ...
IMap<String, String> map = hz.getMap(MAP_NAME); // MAP_NAME: the name of your map

Pipeline p = Pipeline.create()
        .readFrom(Sources.map(map))
        .map(Entry::getValue)
        .writeTo(sink) // sink: any Sink you have defined, for example Sinks.logger()
        .getPipeline();

Job job = hz.getJet()
        .newJobBuilder(p)
        .withMemberSelector(JetMemberSelector.ALL_LITE_MEMBERS)
        .start();
```

In this form, we can clearly identify the individual steps taken by the computation:

. Get the map.
. Read entries from the map source.
. Extract the value from each entry.
. Write the values to the sink.
. Define the job using the `JobBuilder` API.
. Override the default job placement of all cluster members, selecting lite members only.
. Start the processing job.

== Core DAG Planner

As you write a pipeline, you form the pipeline DAG and when you submit it for execution, the planner converts it to the core DAG.
14 changes: 13 additions & 1 deletion docs/modules/configuration/pages/jet-configuration.adoc
@@ -136,7 +136,7 @@ joins the cluster. It has no effect on jobs with auto-scaling disabled.
With this feature, you can restart the whole cluster without losing the
jobs and their state. It is implemented on top of Hazelcast's Persistence
feature, which persists the data to disk. You need to have
the Hazelcast {enterprise-product-name} edition and configure Hazelcast's Persistence to
the Hazelcast {enterprise-product-name} and configure Hazelcast's Persistence to
use this feature. The default value is `false`, i.e., disabled.

|`max-processor-accumulated-records`
@@ -297,6 +297,18 @@ The most important properties are listed here:
Each job has job-specific configuration options. These are covered
in detail in xref:pipelines:configuring-jobs.adoc[].

=== Job Placement Control

To activate job placement control, your license key must include `Advanced Compute`.

Job placement control allows you to define the members to use for Jet job processing. For example, you can manage your workload without worrying that Jet processing jobs will starve your storage components of resources.

NOTE: Your storage components still need to serve the data and this has some impact on their resources. Before using job placement control to manage the workload, ensure that the processing element of the job is substantially more resource-intensive than the data retrieval element.

You can control the placement of the job using the `JetMemberSelector` parameter of the `JobBuilder` API. For further information on `JobBuilder`, refer to the link:https://docs.hazelcast.org/docs/latest/javadoc/com/hazelcast/jet/JetService.JobBuilder.html[API Reference, window=_blank].
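
For example, a minimal sketch of selecting lite members only (assuming an existing pipeline `p` and a client instance `hz`):

```java
Job job = hz.getJet()
        .newJobBuilder(p)
        .withMemberSelector(JetMemberSelector.ALL_LITE_MEMBERS)
        .start();
```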

You pass the selector configuration when you submit your job from the Hazelcast client. For more information on submitting a job to specific members, see xref:pipelines:submitting-jobs.adoc#isolated-jobs[Submitting Jobs].

== Client Configuration

When using a Hazelcast client to access Jet engine services, the easiest way to
25 changes: 25 additions & 0 deletions docs/modules/pipelines/pages/job-placement-control.adoc
@@ -0,0 +1,25 @@
= Jet Job Placement Control
:description: Your Jet processing jobs can be distributed across a defined subset of the cluster. This approach provides finer control of your Jet processing, which means that you can distribute your workload to meet your requirements.
:page-enterprise: true

{description}

NOTE: Your license key must include `Advanced Compute` to activate this feature.

For example, you can configure Jet processing jobs so that they run on lite members only, allowing you to split your computational and storage requirements without the need to configure each job separately. You control the members to use for your Jet job processing on a job-by-job basis.

Distributing the processing job in this way allows you to find the best balance for your processing and storage requirements. Separating the processing from the data serving requirement means that less stress is put on your storage component's resources, as they only need to serve the data and not carry out any of the processing. This can help you to spread the load on your cluster across the members.

Your storage components still use their resources to serve data. You must be sure that the processing element of the job uses considerably more resources than the data retrieval element before using job placement control in this manner.

To use job placement control, create and submit a job using the `JobBuilder` API, which you can use to configure the following:

* The job configuration
* The pipeline or DAG
* The member selection criteria for the processing
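
A minimal sketch combining all three (assuming a running cluster with lite members; the map name `"books"` and the job name are illustrative, and method names follow the `JobBuilder` Javadoc):

```java
HazelcastInstance hz = HazelcastClient.newHazelcastClient();

// The pipeline or DAG
Pipeline p = Pipeline.create()
        .readFrom(Sources.map("books"))
        .map(Entry::getValue)
        .writeTo(Sinks.logger())
        .getPipeline();

// The job configuration
JobConfig config = new JobConfig().setName("isolated-word-count");

// The member selection criteria
Job job = hz.getJet()
        .newJobBuilder(p)
        .withConfig(config)
        .withMemberSelector(JetMemberSelector.ALL_LITE_MEMBERS)
        .start();
```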

For further information on the `JobBuilder` API, refer to the link:https://docs.hazelcast.org/docs/latest/javadoc/com/hazelcast/jet/JetService.JobBuilder.html[API Reference, window=_blank].

For further information on submitting an isolated job, see xref:pipelines:submitting-jobs.adoc#isolated-jobs[Submitting Jobs].

For an example of an isolated job, see xref:architecture:distributed-computing.adoc[], or use the link:https://github.com/hazelcast/hazelcast-code-samples/tree/master/jet/wordcount-compute-isolation[provided code sample]. For further information on using our code samples, refer to the link:https://github.com/hazelcast/hazelcast-code-samples/blob/master/README.md[code samples ReadMe].
3 changes: 3 additions & 0 deletions docs/modules/pipelines/pages/overview.adoc
@@ -32,6 +32,9 @@ Some features of the Jet engine include:
- Process infinite out-of-order data streams using event time-based windows.
- Fork data stream to reuse the same intermediate result in more than one way.
- Distribute the processing across all available CPU cores.
- Specify job placement across a defined subset of the cluster.
+
NOTE: Your license key must include `Advanced Compute` to activate this feature.

== Pipeline Workflow

40 changes: 40 additions & 0 deletions docs/modules/pipelines/pages/submitting-jobs.adoc
@@ -121,6 +121,46 @@ You cannot upload the following classes using the Jet API or the CLI. These clas
* Map features such as EntryProcessor or MapLoader and MapStore
====

=== Job Placement Control

To activate job placement, your license key must include `Advanced Compute`.

You can define which members an individual Jet job runs on. This is known as job placement.

This approach is particularly useful in the following situations:

* When you want to run the job on a lite member to isolate computation from storage
+
NOTE: Before isolating your computation, ensure that the processing element of the job is substantially more resource-intensive than the data retrieval element. Although isolating computation from storage can mean that your storage components benefit from the reduced resource workload, serving the data still has some impact and you must ensure that isolated jobs provide the right balance for your needs.

* When you want to run the job on an edge node to take advantage of edge computing

Job placement supports the following:

* Auto-scaling
* `AT_LEAST_ONCE` and `EXACTLY_ONCE` fault tolerance
* Split-brain protection
* Metrics

You can use the Hazelcast Java client to submit your job to specific members as follows:

```java
HazelcastInstance hz = HazelcastClient.newHazelcastClient();
// ...
IMap<String, String> map = hz.getMap(MAP_NAME); // MAP_NAME: the name of your map

Pipeline p = Pipeline.create()
        .readFrom(Sources.map(map))
        .map(Entry::getValue)
        .writeTo(sink) // sink: any Sink you have defined, for example Sinks.logger()
        .getPipeline();

Job job = hz.getJet()
        .newJobBuilder(p)
        .withMemberSelector(JetMemberSelector.ALL_LITE_MEMBERS)
        .start();
```

For further information on job placement control, see xref:pipelines:job-placement-control.adoc[].

== Submitting a Job using SQL

To submit a job to the cluster with SQL, use the xref:sql:create-job.adoc[`CREATE JOB` statement].
1 change: 1 addition & 0 deletions docs/modules/pipelines/partials/nav.adoc
@@ -29,6 +29,7 @@
**** xref:pipelines:kinesis.adoc[]
**** xref:pipelines:pulsar.adoc[]
** xref:pipelines:serialization.adoc[]
** xref:pipelines:job-placement-control.adoc[]
** xref:pipelines:configuring-jobs.adoc[]
** xref:pipelines:job-security.adoc[]
** xref:pipelines:submitting-jobs.adoc[]
