Considerations for Repartitioning and Migration [HZG-27] (#1135)
Provides considerations and guidance for determining the partition
count and cluster size for production data. Specifically, this relates
to repartitioning and migration when a Hazelcast member shuts down.

---------

Co-authored-by: rebekah-lawrence <[email protected]>
gbarnett-hz and rebekah-lawrence authored Jun 12, 2024
1 parent f3a9b48 commit 697d994
Showing 1 changed file with 8 additions and 0 deletions.
8 changes: 8 additions & 0 deletions docs/modules/ROOT/pages/production-checklist.adoc
@@ -96,3 +96,11 @@ If you are a Hazelcast {enterprise-product-name} customer using the High-Density
we recommend a large increase in partition count, starting with 5009 or higher.

The partition count cannot be easily changed after a cluster is created, so if you have a large cluster be sure to test and set an optimum partition count prior to deployment. If you need to change the partition count after a cluster is already running, you will need to schedule a maintenance window to entirely bring the cluster down. If your cluster uses the xref:storage:persistence.adoc[Persistence] or xref:cp-subsystem:persistence.adoc[CP Persistence] features, those persistent files will need to be removed after the cluster is shut down, as they contain references to the previous partition count. Once all member configurations are updated, and any persistent data structure files are removed, the cluster can be safely restarted.

The partition count also impacts other areas of the system, such as the duration of repartitioning and migration, which occur when a Hazelcast member is shut down (gracefully or non-gracefully). For your production data, it is recommended to analyse the following during repartitioning and migration to ensure that they meet your requirements:

* CPU Utilisation. Repartitioning and migration use all partition operation threads. See xref:cluster-performance:best-practices.adoc#partition-aware-operations[Partition-aware Operations] for information on how to configure the number of threads used.
* Memory. Repartitioning and migration can result in additional memory pressure on cluster members. Ensure you have sufficient memory headroom to service your production requirements.
* Repartitioning and Migration Duration. The duration of repartitioning and migration is determined by the amount of data in the cluster, the number of partitions, and the cluster size.
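
As a rough illustration of how cluster size affects the work involved when a member leaves, the following Java sketch estimates the average number of primary partitions and the share of data held by each member. It assumes partitions and data are spread evenly across members and ignores backups. The class name, method names, and the 300 GB figure are hypothetical and are not part of Hazelcast.

```java
public class MigrationEstimate {

    // Average number of primary partitions per member, assuming an even spread.
    static double partitionsPerMember(int partitionCount, int clusterSize) {
        return (double) partitionCount / clusterSize;
    }

    // Approximate primary data held by one member, assuming an even spread.
    // This is a rough proxy for the data that must be re-homed if that member leaves.
    static double dataPerMemberGb(double totalDataGb, int clusterSize) {
        return totalDataGb / clusterSize;
    }

    public static void main(String[] args) {
        int partitionCount = 5009;   // partition count used as an example on this page
        double totalDataGb = 300.0;  // hypothetical total data size

        // Same data and partition count, increasing cluster size.
        for (int clusterSize : new int[] {3, 6, 12}) {
            System.out.printf(
                "members=%d partitions/member=%.1f data/member=%.1f GB%n",
                clusterSize,
                partitionsPerMember(partitionCount, clusterSize),
                dataPerMemberGb(totalDataGb, clusterSize));
        }
    }
}
```

The estimate only shows the trend: with more members, each member owns fewer partitions and less data, so less has to move when one member shuts down. Actual durations should be measured on your own cluster.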

Generally, for the same amount of data, a larger cluster has lower CPU utilisation, less memory pressure, and shorter repartitioning and migration durations than a smaller cluster. To monitor the duration of repartitioning and migration in your downstream log system, you can filter for the `INFO` level message emitted by `com.hazelcast.internal.partition.impl.MigrationManager` that matches the pattern `All migration tasks have been completed.` Use the recommendations on this page to determine a partition count and cluster size that meet your requirements.
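
If your log system does not offer a built-in filter, the same match can be sketched in a few lines of Java. This example assumes that your log layout prints the level and the full logger name on the same line as the message. The class name and the sample log lines are hypothetical, and your real log format may differ.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MigrationLogFilter {

    // Logger name and message pattern taken from the text above.
    static final String LOGGER = "com.hazelcast.internal.partition.impl.MigrationManager";
    static final String MESSAGE = "All migration tasks have been completed.";

    // True if the line is an INFO message from MigrationManager reporting completion.
    static boolean isMigrationCompleted(String line) {
        return line.contains("INFO") && line.contains(LOGGER) && line.contains(MESSAGE);
    }

    // Keeps only the lines that report completed migrations.
    static List<String> filter(List<String> lines) {
        return lines.stream()
                .filter(MigrationLogFilter::isMigrationCompleted)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; the timestamp and layout are assumptions.
        List<String> sample = List.of(
            "2024-06-12 10:15:30 INFO [" + LOGGER + "] " + MESSAGE,
            "2024-06-12 10:15:31 INFO [com.example.OtherLogger] Something unrelated happened",
            "2024-06-12 10:15:32 WARN [" + LOGGER + "] Some other message");

        filter(sample).forEach(System.out::println);
    }
}
```

If your layout abbreviates logger names, match on the message text and the abbreviated name instead.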
