Skip to content

Commit

Permalink
CP Subsystem Content Improvements (#798)
Browse files Browse the repository at this point in the history
* Highlighting OOM risk for all CP structures.

* Adding recommendation about having minimal number of CP groups.

* Restructuring the overview sections and fixing link formatting.

* Adding more clarification to CP groups and starting the best practices content.

* Drafting the best practices content and adding to nav.

* Addressing review comments.
  • Loading branch information
Serdaro authored Sep 7, 2023
1 parent d70237c commit 1a64a5e
Show file tree
Hide file tree
Showing 15 changed files with 151 additions and 96 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ public static void main(String[] args){
IAtomicReference<String> ref = hz.getCPSubsystem().getAtomicReference("reference");
ref.set("foo");
System.out.println(ref.get());
ref.destroy();
System.exit(0);
//end::iar[]
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ public static void main( String[] args ) {
counter.incrementAndGet();
}
System.out.printf( "Count is %s\n", counter.get() );
counter.destroy();
//end::ial[]
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ public static void main( String[] args ) {
result = atomicLong.getAndAlter( new Add2Function() );
System.out.println( "getAndAlter.result: " + result );
System.out.println( "getAndAlter.value: " + atomicLong.get() );
atomicLong.destroy();
//end::ialef[]
}
//tag::add2func[]
Expand Down
2 changes: 2 additions & 0 deletions docs/modules/ROOT/examples/dds/semaphore/SemaphoreMember.java
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@ public static void main( String[] args ) throws Exception{
}
}
System.out.println("Finished");
semaphore.destroy();
resource.destroy();
//end::sm[]
}
}
11 changes: 11 additions & 0 deletions docs/modules/cp-subsystem/pages/best-practices.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
= Best Practices
:description: Consider the following, a wrap-up of the aforementioned recommendations, to get the best throughput out of your CP Subsystem use case.

{description}

* Try to reuse already created or existing xref:cp-subsystem:cp-subsystem.adoc#cp-members[CP objects] (AtomicReference, FencedLock, etc.) and try to avoid creating new ones since those will not automatically be garbage collected.
* Since each Raft based data structure is bounded by a single xref:cp-subsystem:cp-subsystem.adoc#cp-groups[Raft group], a client will create a separate session on each Raft group which holds the session-based data structure and the client interacts with. And when idle, clients send separate heartbeat messages to each Raft group to keep the sessions alive. For instance, if five clients try to acquire four locks, where each lock is placed into a different Raft group, then there will be 5 x 4 = 20 heartbeat operations committed in total. However, if all locks are put into the same Raft group, there will be only five heartbeats in each period. Therefore, it is strongly recommended that you store all session-based CP data structures (locks and semaphores) in a single Raft group to avoid that cost.
* Create the minimum number of xref:cp-subsystem:configuration.adoc#choosing-a-group-size[CP groups] that you think are sufficient for your CP subsystem use case.
39 changes: 25 additions & 14 deletions docs/modules/cp-subsystem/pages/configuration.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ NOTE: Before going into production, make sure to read the <<production-checklist

By default, CP Subsystem runs in unsafe mode, which means it is disabled. To enable the CP Subsystem, you must first configure a non-zero value for the <<cp-member-count, `cp-member-count`>> option:

[[cp-member-count]]
[tabs]
====
XML::
Expand Down Expand Up @@ -87,22 +88,32 @@ Hazelcast cluster, be aware that member or cluster restarts can fail because of

== Choosing a Group Size

For all xref:cp-subsystem.adoc#cp-groups[CP groups], you can set the number of CP members that should participate in each one, using the <<group-size, `group-size`>> option.
For all xref:cp-subsystem.adoc#cp-groups[CP groups], you can set the number of CP members
that should participate in each one, using the <<group-size, `group-size`>> option.

To scale out throughput and memory capacity, you can choose a CP group size that is smaller
than the CP member count configuration to distribute your CP data structures to
multiple CP groups.
To scale out throughput and memory capacity, you can choose a CP group size that is
smaller than the <<cp-member-count, CP member count>> to distribute your CP data structures to multiple CP groups.

CP groups usually
consist of an odd number of CP members between three and seven. An odd number of CP members is more
advantageous to an even number because of the quorum or majority calculations.
For a CP group of `N` members, the majority is calculated as `N / 2 + 1`. For
example, in a CP group of five CP members, operations are committed when they are
CP groups usually consist of an odd number of CP members between three and seven, i.e., it should be either three, five, or seven.
An odd number of CP members in a group is more advantageous to an even number because of the quorum or majority calculations.

For a CP group of `N` members:

* the majority of members is calculated as `(N + 1) / 2`.
* the number of failing members you wish to tolerate is calculated as `(N - 1) / 2`.

For example, in a CP group of five CP members, operations are committed when they are
replicated to at least three CP members. This CP group can tolerate the failure of two CP
members and remain available. However, if you run a CP group with six CP members,
it can still tolerate the failure of two CP members because the majority of six is four.
Therefore, it does not improve the degree of fault tolerance compared to five CP
members.
members and remain available.

[NOTE]
====
We recommend that you create the minimum number of CP groups that you think are sufficient for your CP subsystem use case.
Creating separate CP groups allows clients to distribute the load among the leaders of several CP groups.
However, a large number of CP groups running on the same members may slow down the system.
If you have accidentally or intentionally created a large number of CP groups, and don't have plans to use them anymore,
you should xref:cp-subsystem:management.adoc#destroying-a-cp-group-by-force[destroy them].
====

[[configuring-leadership-priority]]
== Configuring Leadership Priority
Expand Down Expand Up @@ -247,7 +258,7 @@ XML::
----
<hazelcast>
<cp-subsystem>
<group-size>7</grou-size>
<group-size>7</group-size>
</cp-subsystem>
</hazelcast>
----
Expand Down
4 changes: 2 additions & 2 deletions docs/modules/cp-subsystem/pages/cp-subsystem.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ When a member becomes a CP member, it generates an additional UUID that other CP
- Member logs
- Management Center

== Unsafe Mode
=== Unsafe Mode

When a cluster is not configured with <<cp-members, CP members>>, the CP Subsystem is disabled and any CP data structures operate in
_unsafe mode_.
Expand Down Expand Up @@ -99,7 +99,7 @@ when 6 accessible CP members are available and the configured CP member count is
To configure the number of CP members that participate in a group, use the xref:configuration.adoc#group-size[`group-size`] option.

[[consensus]]
== Leaders and Followers
=== Leaders and Followers

Each CP group elects its
own Raft leader, which runs the link:http://thesecretlivesofdata.com/raft/[Raft consensus algorithm]. All other CP members in the group then become followers.
Expand Down
129 changes: 72 additions & 57 deletions docs/modules/cp-subsystem/pages/management.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ member alive, and proceeds its shutdown without attempting a Raft commit on
_the METADATA CP group_.

See <<cp-membership-listeners, CP Membership Listeners>> to get notified
about the CP member additions and removals. See also xref:maintain-cluster:shutdown.adoc#shutting-down-a-hazelcast-member[] on how to shut down a member.
about the CP member additions and removals. See also xref:maintain-cluster:shutdown.adoc#shutting-down-a-hazelcast-member[Shutting Down a Hazelcast Member] for a general information on shutting down specific members and considerations to take into account.

== Handling a Lost Majority

Expand Down Expand Up @@ -361,6 +361,77 @@ hz-cluster-cp-admin -o get-local-member --address 127.0.0.1 --port 5701
--
====

== Getting CP Members

Get a list of all active CP members in the cluster.

[tabs]
====
Management Center::
+
--
See xref:{page-latest-supported-mc}@management-center:cp-subsystem:dashboard.adoc[] in the Management Center documentation.
--
Java member API::
+
--
[source,java]
----
include::ROOT:example$/cp/CpSubsystemAPI.java[tag=cpmembers]
----
--
REST API::
+
--
[source,sh]
----
curl http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/members
# OR
hz-cluster-cp-admin -o get-members --address 127.0.0.1 --port 5701
----
.Sample response
[source,json]
----
[{
"uuid": "33f84b0f-46ba-4a41-9e0a-29ee284c1c2a",
"address": "[127.0.0.1]:5703"
}, {
"uuid": "59ca804c-312c-4cd6-95ff-906b2db13acb",
"address": "[127.0.0.1]:5704"
}, {
"uuid": "777ff6ea-b8a3-478d-9642-47d1db019b37",
"address": "[127.0.0.1]:5705"
}, {
"uuid": "c6229b44-8976-4602-bb57-d13cf743ccef",
"address": "[127.0.0.1]:5701"
}, {
"uuid": "c7856e0f-25d2-4717-9919-88fb3ecb3384",
"address": "[127.0.0.1]:5702"
}]
----
--
====


== Creating CP Groups

To create a custom CP group, add an `@` symbol to the end of the data structures name, followed by the name of the group that you want to create and store the data structure in. For example, here, a new CP group is created with the name `myGroup` and then an atomic long called `myAtomicLong` is initialized in this custom CP group: `.getAtomicLong("myAtomicLong@myGroup")`.

If you use fenced locks or semaphores, we recommend using a minimal number of CP groups. For these data structures, a Hazelcast member or client starts
a new xref:cp-subsystem:cp-subsystem.adoc#sessions[session] on the corresponding CP group when it makes its very first acquisition request, and then periodically commits session heartbeats
to this CP group in order to indicate its liveliness. If fenced
locks and semaphores are distributed to multiple CP groups, there will be
a session management overhead on each CP group. Therefore, for most use
cases, the `DEFAULT` CP group should be sufficient to maintain all CP data
structure instances. Custom CP groups is recommended only when you benchmark
your deployment and decide that performance of the `DEFAULT` CP group is not
sufficient for your workload.

== Getting CP Groups

To see the list of active CP groups:
Expand Down Expand Up @@ -472,62 +543,6 @@ hz-cluster-cp-admin -o get-group --group ${CPGROUP_NAME} --address 127.0.0.1 --p
--
====

== Getting CP Members

Get a list of all active CP members in the cluster.

[tabs]
====
Management Center::
+
--
See xref:{page-latest-supported-mc}@management-center:cp-subsystem:dashboard.adoc[] in the Management Center documentation.
--
Java member API::
+
--
[source,java]
----
include::ROOT:example$/cp/CpSubsystemAPI.java[tag=cpmembers]
----
--
REST API::
+
--
[source,sh]
----
curl http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/members
# OR
hz-cluster-cp-admin -o get-members --address 127.0.0.1 --port 5701
----
.Sample response
[source,json]
----
[{
"uuid": "33f84b0f-46ba-4a41-9e0a-29ee284c1c2a",
"address": "[127.0.0.1]:5703"
}, {
"uuid": "59ca804c-312c-4cd6-95ff-906b2db13acb",
"address": "[127.0.0.1]:5704"
}, {
"uuid": "777ff6ea-b8a3-478d-9642-47d1db019b37",
"address": "[127.0.0.1]:5705"
}, {
"uuid": "c6229b44-8976-4602-bb57-d13cf743ccef",
"address": "[127.0.0.1]:5701"
}, {
"uuid": "c7856e0f-25d2-4717-9919-88fb3ecb3384",
"address": "[127.0.0.1]:5702"
}]
----
--
====

== Getting CP Group Sessions

Get all CP sessions that are currently active in a given CP group.
Expand Down
1 change: 1 addition & 0 deletions docs/modules/cp-subsystem/partials/nav.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -2,3 +2,4 @@
** xref:cp-subsystem:cp-subsystem.adoc[Overview]
** xref:cp-subsystem:configuration.adoc[Configuration]
** xref:cp-subsystem:management.adoc[Management]
** xref:cp-subsystem:best-practices.adoc[Best Practices]
Original file line number Diff line number Diff line change
Expand Up @@ -263,20 +263,6 @@ objects.
* If you call the `destroy()` method on a CP data structure object, that data structure is terminated in the underlying CP group and cannot be reinitialized until the CP group is force-destroyed. For this reason,
please make sure that you are completely done with a CP data structure before destroying it.

=== Creating CP Groups

To create a custom CP group, add an `@` symbol to the end of the data structures name, followed by the name of the group that you want to create and store the data structure in. For example, here, a new CP group is created with the name `myGroup` and then an atomic long called `myAtomicLong` is initialized in this custom CP group: `.getAtomicLong("myAtomicLong@myGroup")`.

If you use fenced locks or semaphores, we recommend using a minimal number of CP groups. For these data structures, a Hazelcast member or client starts
a new xref:cp-subsystem:cp-subsystem.adoc#sessions[session] on the corresponding CP group when it makes its very first acquisition request, and then periodically commits session heartbeats
to this CP group in order to indicate its liveliness. If fenced
locks and semaphores are distributed to multiple CP groups, there will be
a session management overhead on each CP group. Therefore, for most use
cases, the `DEFAULT` CP group should be sufficient to maintain all CP data
structure instances. Custom CP groups is recommended only when you benchmark
your deployment and decide that performance of the `DEFAULT` CP group is not
sufficient for your workload.

=== Example Code

[source,java]
Expand Down
10 changes: 6 additions & 4 deletions docs/modules/data-structures/pages/fencedlock.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -98,15 +98,17 @@ if ( lock.tryLock ( 10, TimeUnit.SECONDS ) ) {
[[understanding-lock-behavior]]
== Understanding Lock Behavior

WARNING: Locks are not automatically removed. If a lock is not used anymore, Hazelcast
does not automatically perform garbage collection in it.
This can lead to an `OutOfMemoryError`. If you create locks on the fly,
make sure they are destroyed. See xref:data-structures:distributed-data-structures.adoc#destroying-objects[Destroying Objects]
and xref:data-structures:distributed-data-structures.adoc#cp-data[CP Data Structures].

* Locks are fail-safe. If a member holds a lock and some other members go down,
the cluster will keep your locks safe and available.
Moreover, when a member leaves the cluster, all the locks acquired by that
dead member will be removed so that those
locks are immediately available for live members.
* Locks are not automatically removed. If a lock is not used anymore, Hazelcast
does not automatically perform garbage collection in the lock.
This can lead to an `OutOfMemoryError`. If you create locks on the fly,
make sure they are destroyed.
* Locks are re-entrant. The same thread can lock multiple times on the same lock.
Note that for other threads to be able to require this lock, the owner of the lock
must call `unlock` as many times as the owner called `lock`.
6 changes: 6 additions & 0 deletions docs/modules/data-structures/pages/iatomiclong.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,12 @@ include::ROOT:example$/dds/iatomiclong/ExampleIAtomicLong.java[tag=ial]
When you start other instances with the code above,
you will see the count as *member count* times *a million*.

WARNING: ``IAtomicLong``s are not automatically removed. If an instance is not used anymore, Hazelcast
does not automatically perform garbage collection in it.
This can lead to an `OutOfMemoryError`. If you create ``IAtomicLong``s on the fly,
make sure they are destroyed. See xref:data-structures:distributed-data-structures.adoc#destroying-objects[Destroying Objects]
and xref:data-structures:distributed-data-structures.adoc#cp-data[CP Data Structures].

[[sending-functions-to-iatomiclong]]
== Sending Functions to IAtomicLong

Expand Down
6 changes: 6 additions & 0 deletions docs/modules/data-structures/pages/iatomicreference.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,12 @@ When you execute the above example, the output is as follows:

`foo`

WARNING: ``IAtomicReference``s are not automatically removed. If a reference is not used anymore, Hazelcast
does not automatically perform garbage collection in it.
This can lead to an `OutOfMemoryError`. If you create ``IAtomicReference``s on the fly,
make sure they are destroyed. See xref:data-structures:distributed-data-structures.adoc#destroying-objects[Destroying Objects]
and xref:data-structures:distributed-data-structures.adoc#cp-data[CP Data Structures].

[[sending-functions-to-iatomicreference]]
== Sending Functions to IAtomicReference

Expand Down
8 changes: 7 additions & 1 deletion docs/modules/data-structures/pages/icountdownlatch.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -36,4 +36,10 @@ The follower class above first retrieves `ICountDownLatch` and then calls the `a
method to enable the thread to listen for the latch. The method `await` has a timeout
value as a parameter. This is useful when the `countDown` method fails. To see `ICountDownLatch`
in action, start the leader first and then start one or more followers. You will see that the
followers wait until the leader completes.
followers wait until the leader completes.

WARNING: ``ICountDownLatch ``s are not automatically removed. If a latch is not used anymore, Hazelcast
does not automatically perform garbage collection in it.
This can lead to an `OutOfMemoryError`. If you create ``ICountDownLatch``s on the fly,
make sure they are destroyed. See xref:data-structures:distributed-data-structures.adoc#destroying-objects[Destroying Objects]
and xref:data-structures:distributed-data-structures.adoc#cp-data[CP Data Structures].
Loading

0 comments on commit 1a64a5e

Please sign in to comment.