Add python script for zookeeper benchmark #27

Merged · 8 commits merged into master on Jul 13, 2023

Conversation

aayustark007-fk (Collaborator)

This script connects to a ZooKeeper ensemble, loads data under a parent node, and then measures the response time of the get_children() call.

Various parameters are configurable; the full list can be fetched by running python zk_benchmark.py -h.
To speed up data loading there are two options: a multi-threaded dataloader and a multi-process dataloader. The multi-process dataloader has some bugs, so it is not enabled by default.

Requirements: Python 3.10+, tqdm, kazoo
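
A minimal sketch of the core idea, assuming a local ensemble reachable at 127.0.0.1:2181 (HOSTS and PARENT are illustrative names, not the script's actual flags):

import time

from kazoo.client import KazooClient

HOSTS = "127.0.0.1:2181"
PARENT = "/bench_parent"

zk = KazooClient(hosts=HOSTS)
zk.start()
zk.ensure_path(PARENT)

# Load step: create child nodes under the parent.
for i in range(1000):
    zk.create("{}/child_{}".format(PARENT, i), b"")

# Measure step: time get_children() on the parent.
start = time.perf_counter()
children = zk.get_children(PARENT)
elapsed_ms = (time.perf_counter() - start) * 1000
print("get_children({}) -> {} nodes in {:.2f} ms".format(PARENT, len(children), elapsed_ms))

zk.stop()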

github-actions bot commented Jun 27, 2023

Test Results

21 tests   21 ✔️   11s ⏱️
 9 suites   0 💤
 9 files    0 ❌

Results for commit 1c48164.

♻️ This comment has been updated with latest results.

configs = self.__get_chunked_configs(child_node_config)
with concurrent.futures.ProcessPoolExecutor(max_workers=self.config.parallelism) as exec:
    futures_to_id = {exec.submit(self._process, parent, config): id for (id, config) in enumerate(configs)}
    for future in concurrent.futures.as_completed(futures_to_id):
Collaborator

We want an upper time bound here so the run terminates deterministically.

Collaborator (Author)

the API for the multiprocess executor does not provide a way to pass a timeout
trying to find a way to send SIGTERM to the subprocesses

Collaborator

maybe kill the process group via its pgid?

aayustark007-fk (Collaborator, Author) · Jul 5, 2023

So, I read that using the with statement causes executor.shutdown() to be called, which will terminate the processes. My assumption is that if I send a SIGINT via Ctrl+C, it should cause the program to terminate and the with block to get cleaned up.

But while running, I see that when ZooKeeper becomes unreachable, the multiprocess program behaves badly. The program gets stuck, and upon SIGTERM the main process exits but some of the processes continue to exist as zombies (still sending logs to the terminal).
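
A sketch of the process-group kill suggested above, assuming a POSIX system where the workers share the main process's group (not part of the PR):

import os
import signal

def _terminate_group(signum, frame):
    # Restore the default handler first so this handler is not re-entered
    # when the group-wide signal reaches the main process itself.
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    os.killpg(os.getpgid(0), signal.SIGTERM)

signal.signal(signal.SIGTERM, _terminate_group)
signal.signal(signal.SIGINT, _terminate_group)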

print("process: {} generated an exception: {}".format(id, exc))

def _process(self, parent: str, child_node_config: ChildNodeConfig):
    with DataLoader(self.config, zk_config=self.zk_config) as loader:
Collaborator

is this creating a thread pool inside the process pool? is that expected?

Collaborator (Author)

Yes, this is a wrapper over the thread-pool data loader, and we ensure that the chunks are divided such that the wrapped dataloader only works on a single chunk.

The reason it is written this way: I initially wrote the multi-threaded dataloader, but its performance was not sufficient due to the limitations imposed by the GIL. Then I wrote the multi-process implementation, which tries to share the current object state between the processes. This approach failed because the arguments (and hence the object state) must be pickleable, and the ZooKeeper client is not. Therefore, I wrapped the existing DataLoader instance, which creates its own ZooKeeper client in the new process instead of sharing one.

I'm thinking of writing a single-threaded data loader; the multiprocess data loader can then wrap over it.
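
A sketch of the pickling constraint described above, with hypothetical names (load_chunk is not the PR's API): only pickleable arguments cross the process boundary, and each worker constructs its own KazooClient.

import concurrent.futures

from kazoo.client import KazooClient

def load_chunk(hosts: str, parent: str, start: int, count: int) -> int:
    # Created inside the worker process; a live client would not pickle.
    zk = KazooClient(hosts=hosts)
    zk.start()
    try:
        for i in range(start, start + count):
            zk.create("{}/node_{}".format(parent, i), b"", makepath=True)
    finally:
        zk.stop()
    return count

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(load_chunk, "127.0.0.1:2181", "/bench", s, 1000)
                   for s in range(0, 4000, 1000)]
        print("created", sum(f.result() for f in futures), "nodes")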

def __exit__(self, exc_type, exc_val, exc_tb):
    return True  # returning True suppresses any exception raised inside the with block

def __get_chunked_configs(self, child_node_config: ChildNodeConfig) -> list[ChildNodeConfig]:
kmrdhruv (Collaborator) · Jun 28, 2023

with the default chunk_size of 10K and a target range of 10K, this will effectively generate a single block?
should the default chunk_size be a smaller number, say 1K?

Collaborator (Author)

yes, but the intention is that users of this script will benchmark node counts > 10k
if they need to create fewer, they can reduce the chunk size
but I think the default chunk size can be reduced to 1k
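
The chunking arithmetic in question, as a hypothetical helper (not the PR's __get_chunked_configs):

def chunk_counts(total: int, chunk_size: int) -> list[int]:
    full, rem = divmod(total, chunk_size)
    return [chunk_size] * full + ([rem] if rem else [])

print(chunk_counts(10_000, 10_000))  # [10000] -- one block, so no parallelism
print(chunk_counts(10_000, 1_000))   # ten chunks of 1000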

print("loader: {} generated an exception: {}".format(id, exc))

def _fill(self, path: str, child_generator: NodeGenerator, gen_size: int):
    for node in tqdm(child_generator, total=gen_size):
Collaborator

will tqdm take care of parallel execution, in terms of displaying a progress bar for each chunk?

Collaborator (Author)

yes, it works well with the multi-threaded executor, but not with the multiprocess executor, where the progress bars start to overlap
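
One common workaround, not used in this PR: pin each worker's bar to its own row via tqdm's position argument (worker_idx here is a hypothetical index); across processes, tqdm's set_lock() can additionally synchronize writes.

from tqdm import tqdm

def fill_with_bar(worker_idx: int, items, total: int):
    # Each worker renders on its own terminal row instead of overlapping.
    for node in tqdm(items, total=total, position=worker_idx,
                     desc="loader-{}".format(worker_idx), leave=False):
        pass  # create the node here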

while gen < num:
    name_res = ""
    if len(name) > 0:
        name_res = "{}_{}".format(name, str(gen + 1))
Collaborator

this wouldn't take care of name length requirements, but that should be fine in case a fixed name is needed. just add a comment to that effect.

Collaborator (Author)

Actually, the intention is that if a name is given, then min_len/max_len will not be used. Will add comments describing the same.

with catchtime() as t:
    for _ in range(measure_samples):
        client.ls(path)
print("Latency get_children on path {}: {} ms".format(path, t.time / measure_samples))
Collaborator

in this case it wouldn't matter much, but it would be good to report min, max, and avg over the sample range.

Collaborator (Author)

agreed, I'll add this change
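
A sketch of the suggested change, reusing the script's client.ls() call (the surrounding function is illustrative): time each sample individually, then report min/max/avg.

import time

def measure_stats(client, path: str, measure_samples: int):
    latencies_ms = []
    for _ in range(measure_samples):
        start = time.perf_counter()
        client.ls(path)
        latencies_ms.append((time.perf_counter() - start) * 1000)
    print("Latency get_children on path {}: min {:.2f} / max {:.2f} / avg {:.2f} ms".format(
        path, min(latencies_ms), max(latencies_ms),
        sum(latencies_ms) / len(latencies_ms)))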

kmrdhruv (Collaborator)

move this to the scripts/benchmarks directory, and also add a small README under the same folder on how to install the dependencies and run the benchmark.


Signed-off-by: aayustark007-fk <[email protected]>
### Currently holds the script for benchmarking the read performance of ZooKeeper

Requires: `Python==3.9.7+`

Collaborator

`pip install -r requirements.txt`? needed?

@@ -312,6 +315,7 @@ def __run(self, run_config: BenchmarkRunConfig, skip_measure: bool, dataloading_

## Measure step
self.__measure(path, measure_samples, skip_measure)
Collaborator

nit: __measure_list()

kmrdhruv (Collaborator) left a comment

LGTM.
Do run the benchmark on a standard ZooKeeper setup and publish the results.

@kmrdhruv kmrdhruv merged commit 612dda2 into master Jul 13, 2023
2 checks passed