Add python script for zookeeper benchmark #27
Conversation
scripts/zk_benchmark.py
Outdated
configs = self.__get_chunked_configs(child_node_config)
with concurrent.futures.ProcessPoolExecutor(max_workers=self.config.parallelism) as exec:
    futures_to_id = {exec.submit(self._process, parent, config): id for (id, config) in enumerate(configs)}
    for future in concurrent.futures.as_completed(futures_to_id):
We want an upper time bound so we can terminate it deterministically.
The API for the multiprocess executor does not provide a way to pass a timeout.
I'm trying to find a way to send SIGTERM to the subprocesses.
Maybe kill the process group via its pgid?
So, I read that using the with statement causes executor.shutdown() to be called, which terminates the processes. My assumption is that if I send a SIGINT via Ctrl+C, it should cause the program to terminate and the with block to get cleaned up.
But while running, I see that when ZooKeeper becomes unreachable, the multiprocess program behaves badly. The program gets stuck, and upon SIGTERM the main process exits but some of the worker processes continue to exist as zombies (still sending logs to the terminal).
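For reference, a minimal sketch of one way to get the deterministic upper time bound discussed in this thread: although submit() has no timeout parameter, concurrent.futures.as_completed() does accept one. The square task, the two-worker pool, and the 5-second deadline are illustrative stand-ins, and the fork start method assumes a POSIX host:

```python
import concurrent.futures
import multiprocessing
import time

def square(x):
    time.sleep(0.01)  # stand-in for a ZooKeeper round trip
    return x * x

def run_with_deadline(values, deadline_s=5.0):
    # Give the whole batch an upper time bound: as_completed() raises
    # TimeoutError once the deadline passes, at which point we stop
    # handing out queued work via shutdown(cancel_futures=True).
    ctx = multiprocessing.get_context("fork")  # assumes a POSIX host
    results = {}
    with concurrent.futures.ProcessPoolExecutor(max_workers=2, mp_context=ctx) as ex:
        futures = {ex.submit(square, v): v for v in values}
        try:
            for fut in concurrent.futures.as_completed(futures, timeout=deadline_s):
                results[futures[fut]] = fut.result()
        except concurrent.futures.TimeoutError:
            # Deadline hit: drop queued tasks; workers still running could
            # additionally be signalled by pid (e.g. os.kill / os.killpg).
            ex.shutdown(wait=False, cancel_futures=True)
    return results
```

Note that shutdown(cancel_futures=True) only cancels tasks that have not started; killing in-flight workers would still need the pid/pgid approach suggested above.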
scripts/zk_benchmark.py
Outdated
print("process: {} generated an exception: {}".format(id, exc))

def _process(self, parent: str, child_node_config: ChildNodeConfig):
    with DataLoader(self.config, zk_config=self.zk_config) as loader:
Is this creating a thread pool inside the process pool? Is that expected?
Yes, this is a wrapper over the thread-pool data loader, and we ensure the chunks are divided such that the wrapped dataloader only works on a single chunk.
It is written this way because I initially wrote the multithreaded dataloader, but its performance was insufficient due to the limitations imposed by the GIL. I then wrote a multiprocess implementation that tried to share the current object state between the processes. That approach failed because the arguments (and hence the object state) must be pickleable, and the ZooKeeper client is not. Therefore, I wrapped the existing DataLoader instance, which creates a separate ZooKeeper client in the new process instead of sharing one.
I'm thinking of writing a single-threaded data loader that the multiprocess data loader can then wrap.
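The per-process pattern described above could be sketched roughly like this. FakeClient is a hypothetical stand-in for the real (unpickleable) ZooKeeper client; only the pickleable connection string crosses the process boundary, and the client is constructed inside each worker:

```python
from concurrent.futures import ProcessPoolExecutor

class FakeClient:
    """Hypothetical stand-in for a ZooKeeper client. Real clients hold
    open sockets and threads, which is why they cannot be pickled and
    shipped to worker processes."""
    def __init__(self, hosts):
        self.hosts = hosts

    def create(self, path):
        return path  # pretend the node was created

def load_chunk(hosts, chunk):
    # Only the pickleable connection string is passed in; the client
    # itself is built fresh inside the worker process.
    client = FakeClient(hosts)
    return [client.create(p) for p in chunk]

def load_all(hosts, chunks, workers=2):
    with ProcessPoolExecutor(max_workers=workers) as ex:
        futures = [ex.submit(load_chunk, hosts, c) for c in chunks]
        return [f.result() for f in futures]
```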
scripts/zk_benchmark.py
Outdated
def __exit__(self, exc_type, exc_val, exc_tb):
    return True

def __get_chunked_configs(self, child_node_config: ChildNodeConfig) -> list[ChildNodeConfig]:
With the default chunk_size of 10K and a target range of 10K, this will effectively generate a single block?
Should the default chunk_size be a smaller number, say 1K?
Yes, but the intention is that users of this script will benchmark node counts > 10K.
If they need to create fewer nodes, they can reduce the chunk size.
But I think the default chunk size can be reduced to 1K.
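As an illustration of the chunking being discussed (a hypothetical helper, not the script's actual __get_chunked_configs), splitting a node-count target into chunk sizes with a 1K default would look like:

```python
def chunk_counts(total, chunk_size=1000):
    # Split a target node count into per-worker chunk sizes. With the
    # old 10K default and a 10K target this yields a single block,
    # which is why a 1K default gives better parallelism.
    full, rem = divmod(total, chunk_size)
    return [chunk_size] * full + ([rem] if rem else [])
```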
scripts/zk_benchmark.py
Outdated
print("loader: {} generated an exception: {}".format(id, exc))

def _fill(self, path: str, child_generator: NodeGenerator, gen_size: int):
    for node in tqdm(child_generator, total=gen_size):
Will tqdm take care of displaying a progress bar per chunk under parallel execution?
Yes, it works well with the multithreaded executor, but not with the multiprocess executor, where the progress bars start to overlap.
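One hedge against the overlapping bars, assuming tqdm's documented position keyword: pin each worker's bar to its own terminal line. This is a sketch with a stand-in task, not the script's _fill:

```python
import time
from tqdm import tqdm

def fill_chunk(worker_id, n):
    # position pins each bar to its own terminal line, so bars from
    # different workers do not overwrite one another; leave=False
    # clears the bar once the chunk finishes.
    for _ in tqdm(range(n), total=n, position=worker_id,
                  desc="worker {}".format(worker_id), leave=False):
        time.sleep(0)  # stand-in for a create() call
    return n
```

For true multiprocess use, tqdm's documentation additionally suggests sharing a write lock across workers via tqdm.set_lock in the pool initializer.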
scripts/zk_benchmark.py
Outdated
while gen < num:
    name_res = ""
    if len(name) > 0:
        name_res = "{}_{}".format(name, str(gen + 1))
This wouldn't take care of the name-length requirements, but that should be fine when a fixed name is needed. Just add a comment to that effect.
Actually, the intention is that if a name is given, then min_len/max_len will not be used. I'll add comments describing this.
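The intended precedence (a fixed name wins over min_len/max_len) could be sketched as follows; this is an illustrative helper, not the script's actual generator:

```python
import random
import string

def node_name(gen, name="", min_len=5, max_len=10):
    # If a fixed name is supplied, min_len/max_len are deliberately
    # ignored and the sequence number is appended instead.
    if name:
        return "{}_{}".format(name, gen + 1)
    # Otherwise generate a random name within the length bounds.
    length = random.randint(min_len, max_len)
    return "".join(random.choices(string.ascii_lowercase, k=length))
```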
scripts/zk_benchmark.py
Outdated
with catchtime() as t:
    for _ in range(measure_samples):
        client.ls(path)
print("Latency get_children on path {}: {} ms".format(path, t.time / measure_samples))
In this case it wouldn't matter much, but it would be good to report min, max, and avg over the sample range.
Agreed, I'll add this change.
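A minimal sketch of the suggested min/max/avg reporting, timing each call individually instead of one aggregate catchtime block (fn here stands in for the client.ls call):

```python
import statistics
import time

def measure(fn, samples=10):
    # Time each call individually so min/max/avg over the sample
    # range can be reported, rather than a single aggregate figure.
    times_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return min(times_ms), max(times_ms), statistics.fmean(times_ms)
```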
Move this to the scripts/benchmarks directory, and also add a small README under the same folder explaining how to install dependencies and run the benchmark.
Force-pushed from 0cc7d88 to c0e2ed9 (Compare): …ess loader impl (Signed-off-by: aayustark007-fk <[email protected]>)
### Currently holds the script for benchmarking read performance of Zookeeper

Requires: `Python==3.9.7+`
Is a `pip install -r requirements.txt` needed here?
scripts/benchmark/zk_benchmark.py
Outdated
@@ -312,6 +315,7 @@ def __run(self, run_config: BenchmarkRunConfig, skip_measure: bool, dataloading_

# Measure step
self.__measure(path, measure_samples, skip_measure)
nit: __measure_list()
LGTM.
Please do run the benchmark on a standard ZK setup and publish the results.
This script connects to a ZooKeeper ensemble, loads data under a parent node, and then measures the response time of the get_children() call.
Various parameters are configurable and can be listed by running
python zk_benchmark.py -h
To speed up data loading there are two options: a multi-threaded dataloader and a multi-process dataloader. The multi-process dataloader has some bugs, so it's not enabled by default.
Requirements: python 3.10+, tqdm, kazoo