From 697d99405149fa3d873210d6686781b0f110fbe5 Mon Sep 17 00:00:00 2001
From: Granville Barnett <140408555+gbarnett-hz@users.noreply.github.com>
Date: Wed, 12 Jun 2024 13:22:53 +0100
Subject: [PATCH] Considerations for Repartitioning and Migration [HZG-27] (#1135)

Provides some considerations and guidance when determining the partition count and cluster size for production data. Specifically this is related to repartitioning and migrations upon a Hazelcast member shutdown.

---------

Co-authored-by: rebekah-lawrence <142301480+rebekah-lawrence@users.noreply.github.com>
---
 docs/modules/ROOT/pages/production-checklist.adoc | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/docs/modules/ROOT/pages/production-checklist.adoc b/docs/modules/ROOT/pages/production-checklist.adoc
index 5a21dcce1..92f4963cc 100644
--- a/docs/modules/ROOT/pages/production-checklist.adoc
+++ b/docs/modules/ROOT/pages/production-checklist.adoc
@@ -96,3 +96,11 @@ If you are a Hazelcast {enterprise-product-name} customer using the High-Density
 we recommend a large increase in partition count, starting with 5009 or higher.

 The partition count cannot be easily changed after a cluster is created, so if you have a large cluster be sure to test and set an optimum partition count prior to deployment. If you need to change the partition count after a cluster is already running, you will need to schedule a maintenance window to entirely bring the cluster down. If your cluster uses the xref:storage:persistence.adoc[Persistence] or xref:cp-subsystem:persistence.adoc[CP Persistence] features, those persistent files will need to be removed after the cluster is shut down, as they contain references to the previous partition count. Once all member configurations are updated, and any persistent data structure files are removed, the cluster can be safely restarted.
+
+The partition count also affects other areas of the system, such as the duration of repartitioning and migration, which occur when a Hazelcast member is shut down (gracefully or non-gracefully). For your production data, it is recommended to analyse the following during a repartitioning and migration to ensure that they meet your requirements:
+
+* CPU Utilisation. Repartitioning and migration use all partition operation threads. See xref:cluster-performance:best-practices.adoc#partition-aware-operations[Partition-aware Operations] for information on how to configure the number of threads used.
+* Memory. Repartitioning and migration can result in additional memory pressure on cluster members. Ensure you have sufficient memory headroom to service your production requirements.
+* Repartitioning and Migration Duration. The duration of a repartitioning and migration is determined by the amount of data in the cluster, the number of partitions, and the cluster size.
+
+Generally, for the same amount of data, a larger cluster entails lower CPU utilisation, less memory pressure, and shorter repartitioning and migration durations than a smaller cluster. To monitor the duration of repartitioning and migration in your downstream log system, you can filter for the `INFO` level message emitted by `com.hazelcast.internal.partition.impl.MigrationManager` that matches the pattern `All migration tasks have been completed.` Use the recommendations on this page to determine a partition count and cluster size that meet your requirements.
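
Note (not part of the patch): the log filter described in the last added paragraph could be sketched as below. This is a minimal illustration, not Hazelcast tooling. It assumes each log line carries the level, the logger name, and the message as plain text, which depends on your logging configuration. It matches on the simple class name `MigrationManager` because some log patterns abbreviate the package. The function name and the sample lines are hypothetical.

```python
# Message text and logger class taken from the documentation change above.
LOGGER_SIMPLE_NAME = "MigrationManager"
MESSAGE = "All migration tasks have been completed."


def migration_completed_lines(lines):
    """Return the INFO lines from MigrationManager that report completed migrations.

    Assumption: level, logger name, and message all appear as plain text
    on the same line. Adjust the checks to match your own log pattern.
    """
    matches = []
    for line in lines:
        if "INFO" in line and LOGGER_SIMPLE_NAME in line and MESSAGE in line:
            matches.append(line.rstrip("\n"))
    return matches


# Hypothetical sample lines, for illustration only.
sample = [
    "INFO [com.hazelcast.internal.partition.impl.MigrationManager] "
    "All migration tasks have been completed. (details elided)",
    "INFO [com.hazelcast.core.LifecycleService] unrelated message",
    "WARN [com.hazelcast.internal.partition.impl.MigrationManager] unrelated message",
]

completed = migration_completed_lines(sample)
print(len(completed))
```

Only the first sample line passes all three checks. In practice the same three conditions would usually be expressed as a query in the log system itself rather than in a script.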