CP Subsystem Content Improvements (#798)

* Highlighting OOM risk for all CP structures. * Adding recommendation about having minimal number of CP groups. * Restructuring the overview sections and fixing link formatting. * Adding more clarification to CP groups and starting the best practices content. * Drafting the best practices content and adding to nav. * Addressing review comments.
hazelcast · Sep 7, 2023 · 1a64a5e · 1a64a5e
1 parent d70237c
commit 1a64a5e
Show file tree

Hide file tree

Showing 15 changed files with 151 additions and 96 deletions.
diff --git a/docs/modules/ROOT/examples/dds/ExampleIAtomicReference.java b/docs/modules/ROOT/examples/dds/ExampleIAtomicReference.java
@@ -13,6 +13,7 @@ public static void main(String[] args){
         IAtomicReference<String> ref = hz.getCPSubsystem().getAtomicReference("reference");
         ref.set("foo");
         System.out.println(ref.get());
+        ref.destroy();
         System.exit(0);
         //end::iar[]
     }

diff --git a/docs/modules/ROOT/examples/dds/iatomiclong/ExampleIAtomicLong.java b/docs/modules/ROOT/examples/dds/iatomiclong/ExampleIAtomicLong.java
@@ -15,6 +15,7 @@ public static void main( String[] args ) {
             counter.incrementAndGet();
         }
         System.out.printf( "Count is %s\n", counter.get() );
+        counter.destroy();
         //end::ial[]
     }
 }
diff --git a/docs/modules/ROOT/examples/dds/iatomiclong/IAtomicLongExecuteFuncs.java b/docs/modules/ROOT/examples/dds/iatomiclong/IAtomicLongExecuteFuncs.java
@@ -27,6 +27,7 @@ public static void main( String[] args ) {
         result = atomicLong.getAndAlter( new Add2Function() );
         System.out.println( "getAndAlter.result: " + result );
         System.out.println( "getAndAlter.value: " + atomicLong.get() );
+        atomicLong.destroy();
         //end::ialef[]
     }
     //tag::add2func[]

diff --git a/docs/modules/ROOT/examples/dds/semaphore/SemaphoreMember.java b/docs/modules/ROOT/examples/dds/semaphore/SemaphoreMember.java
@@ -23,6 +23,8 @@ public static void main( String[] args ) throws Exception{
             }
         }
         System.out.println("Finished");
+        semaphore.destroy();
+        resource.destroy();
         //end::sm[]
     }
 }
diff --git a/docs/modules/cp-subsystem/pages/best-practices.adoc b/docs/modules/cp-subsystem/pages/best-practices.adoc
@@ -0,0 +1,11 @@
+= Best Practices
+:description: Consider the following, a wrap-up of the aforementioned recommendations, to get the best throughput out of your CP Subsystem use case.
+
+{description}
+
+* Try to reuse already created or existing xref:cp-subsystem:cp-subsystem.adoc#cp-members[CP objects] (AtomicReference, FencedLock, etc.) and try to avoid creating new ones since those will not automatically be garbage collected.
+* Since each Raft based data structure is bounded by a single xref:cp-subsystem:cp-subsystem.adoc#cp-groups[Raft group], a client will create a separate session on each Raft group which holds the session-based data structure and the client interacts with. And when idle, clients send separate heartbeat messages to each Raft group to keep the sessions alive. For instance, if five clients try to acquire four locks, where each lock is placed into a different Raft group, then there will be 5 x 4 = 20 heartbeat operations committed in total. However, if all locks are put into the same Raft group, there will be only five heartbeats in each period. Therefore, it is strongly recommended that you store all session-based CP data structures (locks and semaphores) in a single Raft group to avoid that cost.
+* Create the minimum number of xref:cp-subsystem:configuration.adoc#choosing-a-group-size[CP groups] that you think are sufficient for your CP subsystem use case.
+
+
+
diff --git a/docs/modules/cp-subsystem/pages/configuration.adoc b/docs/modules/cp-subsystem/pages/configuration.adoc
@@ -12,6 +12,7 @@ NOTE: Before going into production, make sure to read the <<production-checklist
 
 By default, CP Subsystem runs in unsafe mode, which means it is disabled. To enable the CP Subsystem, you must first configure a non-zero value for the <<cp-member-count, `cp-member-count`>> option:
 
+[[cp-member-count]]
 [tabs] 
 ==== 
 XML:: 
@@ -87,22 +88,32 @@ Hazelcast cluster, be aware that member or cluster restarts can fail because of
 
 == Choosing a Group Size
 
-For all xref:cp-subsystem.adoc#cp-groups[CP groups], you can set the number of CP members that should participate in each one, using the <<group-size, `group-size`>> option.
+For all xref:cp-subsystem.adoc#cp-groups[CP groups], you can set the number of CP members
+that should participate in each one, using the <<group-size, `group-size`>> option.
 
-To scale out throughput and memory capacity, you can choose a CP group size that is smaller
-than the CP member count configuration to distribute your CP data structures to
-multiple CP groups.
+To scale out throughput and memory capacity, you can choose a CP group size that is
+smaller than the <<cp-member-count, CP member count>> to distribute your CP data structures to multiple CP groups.
 
-CP groups usually
-consist of an odd number of CP members between three and seven. An odd number of CP members is more
-advantageous to an even number because of the quorum or majority calculations.
-For a CP group of `N` members, the majority is calculated as `N / 2 + 1`. For
-example, in a CP group of five CP members, operations are committed when they are
+CP groups usually consist of an odd number of CP members between three and seven, i.e., it should be either three, five, or seven.
+An odd number of CP members in a group is more advantageous to an even number because of the quorum or majority calculations.
+
+For a CP group of `N` members:
+
+* the majority of members is calculated as `(N + 1) / 2`.
+* the number of failing members you wish to tolerate is calculated as `(N - 1) / 2`.
+
+For example, in a CP group of five CP members, operations are committed when they are
 replicated to at least three CP members. This CP group can tolerate the failure of two CP
-members and remain available. However, if you run a CP group with six CP members,
-it can still tolerate the failure of two CP members because the majority of six is four.
-Therefore, it does not improve the degree of fault tolerance compared to five CP
-members.
+members and remain available.
+
+[NOTE]
+====
+We recommend that you create the minimum number of CP groups that you think are sufficient for your CP subsystem use case.
+Creating separate CP groups allows clients to distribute the load among the leaders of several CP groups.
+However, a large number of CP groups running on the same members may slow down the system.
+If you have accidentally or intentionally created a large number of CP groups, and don't have plans to use them anymore,
+you should xref:cp-subsystem:management.adoc#destroying-a-cp-group-by-force[destroy them].
+====
 
 [[configuring-leadership-priority]]
 == Configuring Leadership Priority
@@ -247,7 +258,7 @@ XML::
 ----
 <hazelcast>
   <cp-subsystem>
-    <group-size>7</grou-size>
+    <group-size>7</group-size>
   </cp-subsystem>
 </hazelcast>
 ----

diff --git a/docs/modules/cp-subsystem/pages/cp-subsystem.adoc b/docs/modules/cp-subsystem/pages/cp-subsystem.adoc
@@ -57,7 +57,7 @@ When a member becomes a CP member, it generates an additional UUID that other CP
 - Member logs
 - Management Center
 
-== Unsafe Mode
+=== Unsafe Mode
 
 When a cluster is not configured with <<cp-members, CP members>>, the CP Subsystem is disabled and any CP data structures operate in
 _unsafe mode_.
@@ -99,7 +99,7 @@ when 6 accessible CP members are available and the configured CP member count is
 To configure the number of CP members that participate in a group, use the xref:configuration.adoc#group-size[`group-size`] option.
 
 [[consensus]]
-== Leaders and Followers
+=== Leaders and Followers
 
 Each CP group elects its
 own Raft leader, which runs the link:http://thesecretlivesofdata.com/raft/[Raft consensus algorithm]. All other CP members in the group then become followers.

diff --git a/docs/modules/cp-subsystem/pages/management.adoc b/docs/modules/cp-subsystem/pages/management.adoc
@@ -82,7 +82,7 @@ member alive, and proceeds its shutdown without attempting a Raft commit on
 _the METADATA CP group_.
 
 See <<cp-membership-listeners, CP Membership Listeners>> to get notified
-about the CP member additions and removals. See also xref:maintain-cluster:shutdown.adoc#shutting-down-a-hazelcast-member[] on how to shut down a member.
+about the CP member additions and removals. See also xref:maintain-cluster:shutdown.adoc#shutting-down-a-hazelcast-member[Shutting Down a Hazelcast Member] for a general information on shutting down specific members and considerations to take into account.
 
 == Handling a Lost Majority
 
@@ -361,6 +361,77 @@ hz-cluster-cp-admin -o get-local-member --address 127.0.0.1 --port 5701
 --
 ====
 
+== Getting CP Members
+
+Get a list of all active CP members in the cluster.
+
+[tabs] 
+====
+Management Center::
++ 
+--
+See xref:{page-latest-supported-mc}@management-center:cp-subsystem:dashboard.adoc[] in the Management Center documentation.
+--
+
+Java member API::
++
+--
+[source,java]
+----
+include::ROOT:example$/cp/CpSubsystemAPI.java[tag=cpmembers]
+----
+--
+
+REST API::
++
+--
+[source,sh]
+----
+curl http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/members
+
+# OR
+
+hz-cluster-cp-admin -o get-members --address 127.0.0.1 --port 5701
+----
+
+.Sample response
+[source,json]
+----
+[{
+    "uuid": "33f84b0f-46ba-4a41-9e0a-29ee284c1c2a",
+    "address": "[127.0.0.1]:5703"
+}, {
+    "uuid": "59ca804c-312c-4cd6-95ff-906b2db13acb",
+    "address": "[127.0.0.1]:5704"
+}, {
+    "uuid": "777ff6ea-b8a3-478d-9642-47d1db019b37",
+    "address": "[127.0.0.1]:5705"
+}, {
+    "uuid": "c6229b44-8976-4602-bb57-d13cf743ccef",
+    "address": "[127.0.0.1]:5701"
+}, {
+    "uuid": "c7856e0f-25d2-4717-9919-88fb3ecb3384",
+    "address": "[127.0.0.1]:5702"
+}]
+----
+--
+====
+
+
+== Creating CP Groups
+
+To create a custom CP group, add an `@` symbol to the end of the data structures name, followed by the name of the group that you want to create and store the data structure in. For example, here, a new CP group is created with the name `myGroup` and then an atomic long called `myAtomicLong` is initialized in this custom CP group: `.getAtomicLong("myAtomicLong@myGroup")`.
+
+If you use fenced locks or semaphores, we recommend using a minimal number of CP groups. For these data structures, a Hazelcast member or client starts
+a new xref:cp-subsystem:cp-subsystem.adoc#sessions[session] on the corresponding CP group when it makes its very first acquisition request, and then periodically commits session heartbeats
+to this CP group in order to indicate its liveliness. If fenced
+locks and semaphores are distributed to multiple CP groups, there will be
+a session management overhead on each CP group. Therefore, for most use
+cases, the `DEFAULT` CP group should be sufficient to maintain all CP data
+structure instances. Custom CP groups is recommended only when you benchmark
+your deployment and decide that performance of the `DEFAULT` CP group is not
+sufficient for your workload.
+
 == Getting CP Groups
 
 To see the list of active CP groups:
@@ -472,62 +543,6 @@ hz-cluster-cp-admin -o get-group --group ${CPGROUP_NAME} --address 127.0.0.1 --p
 --
 ====
 
-== Getting CP Members
-
-Get a list of all active CP members in the cluster.
-
-[tabs] 
-====
-Management Center::
-+ 
---
-See xref:{page-latest-supported-mc}@management-center:cp-subsystem:dashboard.adoc[] in the Management Center documentation.
---
-
-Java member API::
-+
---
-[source,java]
-----
-include::ROOT:example$/cp/CpSubsystemAPI.java[tag=cpmembers]
-----
---
-
-REST API::
-+
---
-[source,sh]
-----
-curl http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/members
-
-# OR
-
-hz-cluster-cp-admin -o get-members --address 127.0.0.1 --port 5701
-----
-
-.Sample response
-[source,json]
-----
-[{
-    "uuid": "33f84b0f-46ba-4a41-9e0a-29ee284c1c2a",
-    "address": "[127.0.0.1]:5703"
-}, {
-    "uuid": "59ca804c-312c-4cd6-95ff-906b2db13acb",
-    "address": "[127.0.0.1]:5704"
-}, {
-    "uuid": "777ff6ea-b8a3-478d-9642-47d1db019b37",
-    "address": "[127.0.0.1]:5705"
-}, {
-    "uuid": "c6229b44-8976-4602-bb57-d13cf743ccef",
-    "address": "[127.0.0.1]:5701"
-}, {
-    "uuid": "c7856e0f-25d2-4717-9919-88fb3ecb3384",
-    "address": "[127.0.0.1]:5702"
-}]
-----
---
-====
-
 == Getting CP Group Sessions
 
 Get all CP sessions that are currently active in a given CP group.

diff --git a/docs/modules/cp-subsystem/partials/nav.adoc b/docs/modules/cp-subsystem/partials/nav.adoc
@@ -2,3 +2,4 @@
 ** xref:cp-subsystem:cp-subsystem.adoc[Overview]
 ** xref:cp-subsystem:configuration.adoc[Configuration]
 ** xref:cp-subsystem:management.adoc[Management]
+** xref:cp-subsystem:best-practices.adoc[Best Practices]
diff --git a/docs/modules/data-structures/pages/distributed-data-structures.adoc b/docs/modules/data-structures/pages/distributed-data-structures.adoc
@@ -263,20 +263,6 @@ objects.
 * If you call the `destroy()` method on a CP data structure object, that data structure is terminated in the underlying CP group and cannot be reinitialized until the CP group is force-destroyed. For this reason,
 please make sure that you are completely done with a CP data structure before destroying it.
 
-=== Creating CP Groups
-
-To create a custom CP group, add an `@` symbol to the end of the data structures name, followed by the name of the group that you want to create and store the data structure in. For example, here, a new CP group is created with the name `myGroup` and then an atomic long called `myAtomicLong` is initialized in this custom CP group: `.getAtomicLong("myAtomicLong@myGroup")`.
-
-If you use fenced locks or semaphores, we recommend using a minimal number of CP groups. For these data structures, a Hazelcast member or client starts
-a new xref:cp-subsystem:cp-subsystem.adoc#sessions[session] on the corresponding CP group when it makes its very first acquisition request, and then periodically commits session heartbeats
-to this CP group in order to indicate its liveliness. If fenced
-locks and semaphores are distributed to multiple CP groups, there will be
-a session management overhead on each CP group. Therefore, for most use
-cases, the `DEFAULT` CP group should be sufficient to maintain all CP data
-structure instances. Custom CP groups is recommended only when you benchmark
-your deployment and decide that performance of the `DEFAULT` CP group is not
-sufficient for your workload.
-
 === Example Code
 
 [source,java]

diff --git a/docs/modules/data-structures/pages/fencedlock.adoc b/docs/modules/data-structures/pages/fencedlock.adoc
@@ -98,15 +98,17 @@ if ( lock.tryLock ( 10, TimeUnit.SECONDS ) ) {
 [[understanding-lock-behavior]]
 == Understanding Lock Behavior
 
+WARNING: Locks are not automatically removed. If a lock is not used anymore, Hazelcast
+does not automatically perform garbage collection in it.
+This can lead to an `OutOfMemoryError`. If you create locks on the fly,
+make sure they are destroyed. See xref:data-structures:distributed-data-structures.adoc#destroying-objects[Destroying Objects]
+and xref:data-structures:distributed-data-structures.adoc#cp-data[CP Data Structures].
+
 * Locks are fail-safe. If a member holds a lock and some other members go down,
 the cluster will keep your locks safe and available.
 Moreover, when a member leaves the cluster, all the locks acquired by that
 dead member will be removed so that those
 locks are immediately available for live members.
-* Locks are not automatically removed. If a lock is not used anymore, Hazelcast
-does not automatically perform garbage collection in the lock.
-This can lead to an `OutOfMemoryError`. If you create locks on the fly,
-make sure they are destroyed.
 * Locks are re-entrant. The same thread can lock multiple times on the same lock.
 Note that for other threads to be able to require this lock, the owner of the lock
 must call `unlock` as many times as the owner called `lock`.
diff --git a/docs/modules/data-structures/pages/iatomiclong.adoc b/docs/modules/data-structures/pages/iatomiclong.adoc
@@ -20,6 +20,12 @@ include::ROOT:example$/dds/iatomiclong/ExampleIAtomicLong.java[tag=ial]
 When you start other instances with the code above,
 you will see the count as *member count* times *a million*.
 
+WARNING: ``IAtomicLong``s are not automatically removed. If an instance is not used anymore, Hazelcast
+does not automatically perform garbage collection in it.
+This can lead to an `OutOfMemoryError`. If you create ``IAtomicLong``s on the fly,
+make sure they are destroyed. See xref:data-structures:distributed-data-structures.adoc#destroying-objects[Destroying Objects]
+and xref:data-structures:distributed-data-structures.adoc#cp-data[CP Data Structures].
+
 [[sending-functions-to-iatomiclong]]
 == Sending Functions to IAtomicLong
 

diff --git a/docs/modules/data-structures/pages/iatomicreference.adoc b/docs/modules/data-structures/pages/iatomicreference.adoc
@@ -19,6 +19,12 @@ When you execute the above example, the output is as follows:
 
 `foo`
 
+WARNING: ``IAtomicReference``s are not automatically removed. If a reference is not used anymore, Hazelcast
+does not automatically perform garbage collection in it.
+This can lead to an `OutOfMemoryError`. If you create ``IAtomicReference``s on the fly,
+make sure they are destroyed. See xref:data-structures:distributed-data-structures.adoc#destroying-objects[Destroying Objects]
+and xref:data-structures:distributed-data-structures.adoc#cp-data[CP Data Structures].
+
 [[sending-functions-to-iatomicreference]]
 == Sending Functions to IAtomicReference
 

diff --git a/docs/modules/data-structures/pages/icountdownlatch.adoc b/docs/modules/data-structures/pages/icountdownlatch.adoc
@@ -36,4 +36,10 @@ The follower class above first retrieves `ICountDownLatch` and then calls the `a
 method to enable the thread to listen for the latch. The method `await` has a timeout
 value as a parameter. This is useful when the `countDown` method fails. To see `ICountDownLatch`
 in action, start the leader first and then start one or more followers. You will see that the
-followers wait until the leader completes.
+followers wait until the leader completes.
+
+WARNING: ``ICountDownLatch ``s are not automatically removed. If a latch is not used anymore, Hazelcast
+does not automatically perform garbage collection in it.
+This can lead to an `OutOfMemoryError`. If you create ``ICountDownLatch``s on the fly,
+make sure they are destroyed. See xref:data-structures:distributed-data-structures.adoc#destroying-objects[Destroying Objects]
+and xref:data-structures:distributed-data-structures.adoc#cp-data[CP Data Structures].