Apply suggestions from code review
Co-authored-by: Dalton Bohning <[email protected]>
Signed-off-by: 0xE0F <[email protected]>
0xE0F and daltonbohning authored Nov 15, 2024
1 parent 92e30a3 commit 6064df1
Showing 3 changed files with 10 additions and 10 deletions.
16 changes: 8 additions & 8 deletions src/client/pydaos/torch/Readme.md
@@ -12,17 +12,17 @@ To implement map style dataset only two methods are required: `__len__()` and `__getitem__()`

During dataset creation the connection to the container will be established and its namespace will be scanned to build
a list of files in the container with their sizes. The number of items in that list will be used to implement the `__len__()` method.
-`__getitem__()` implementation consist of looking up the object by its absolute path and reading its content into the buffer
-create on the python side.
+`__getitem__()` implementation consists of looking up the object by its absolute path and reading its content into the buffer
+created on the python side.

-The `__getitems__()` method allow to request multiple samples at once making this a good case to use DAOS event queue to send and wait on batch items.
+The `__getitems__()` method allows requesting multiple samples at once, making this a good case to use DAOS event queues to send and wait on batch items.
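The map-style protocol this hunk documents can be sketched without importing torch, since `DataLoader` only duck-types `__len__()`, `__getitem__()`, and the optional `__getitems__()` batch hook. The class and the in-memory dict standing in for the scanned container namespace are hypothetical illustrations, not the actual pydaos.torch implementation:

```python
# Hypothetical sketch of a map-style dataset; a dict stands in for the DAOS
# container namespace that the real module builds by scanning the container.

class ContainerDataset:
    def __init__(self, namespace):
        # List of absolute paths built at dataset-creation time; its length
        # backs __len__().
        self._paths = sorted(namespace)
        self._objects = namespace

    def __len__(self):
        return len(self._paths)

    def __getitem__(self, index):
        # Look up the object by its absolute path and read its content
        # into a buffer created on the python side.
        return self._objects[self._paths[index]]

    def __getitems__(self, indices):
        # Batched variant; the real module would submit these reads on a
        # DAOS event queue and wait for the whole batch at once.
        return [self._objects[self._paths[i]] for i in indices]

ds = ContainerDataset({"/a.bin": b"aa", "/b.bin": b"bb"})
print(len(ds), ds[0], ds.__getitems__([0, 1]))
```

Because the protocol is purely duck-typed, the same object can be handed to `torch.utils.data.DataLoader` unchanged.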

-By default Dataset is single threaded (more like single process in python), `__getitem__()` and `__getitems__()` are regular blocking calls.
-If multiprocessing is enabled, Dataset provides the `worker_init` method, which worker processes are calling upon their startup,
-during this setup the global connection should be reused and the new event queue should be created for calling worker process.
+By default `Dataset` is single threaded (more like single process in python), `__getitem__()` and `__getitems__()` are regular blocking calls.
+If multiprocessing is enabled, `Dataset` provides the `worker_init` method, which worker processes are calling upon their startup.
+During this setup the global connection should be reused and the new event queue should be created for calling worker processes.

There's no internal multithreading inside the shim module - it's driven from the outside by `torch.utils.data.DataLoader`.
-If DataLoader is configured to have 8 readers then 8 event queues are going to be created per each worker process so the performance of individual worker should not be affected by others.
+If `DataLoader` is configured to have 8 readers then 8 event queues are going to be created, one per worker process, so the performance of an individual worker should not be affected by others.


Implementation of `torch.utils.data.IterableDataset` requires implementing the `__iter__()` protocol, which can be fully implemented on the python side,
@@ -35,7 +35,7 @@ Configured and running DAOS agent on the node(s) and correctly set ACLs - the us



-### Example of usage Map style Dataset
+### Example usage of Map style Dataset

```python
import numpy as np
2 changes: 1 addition & 1 deletion src/client/pydaos/torch/__init__.py
@@ -1,4 +1,4 @@
-# (C) Copyright 2019-2024 Intel Corporation.
+# (C) Copyright 2024 Intel Corporation.
# (C) Copyright 2024 Google LLC
# (C) Copyright 2024 Enakta Labs Ltd
#
2 changes: 1 addition & 1 deletion src/client/pydaos/torch/torch_api.py
@@ -316,7 +316,7 @@ def parallel_list(self, path=None,
readdir_batch_size=READDIR_BATCH_SIZE,
workers=PARALLEL_SCAN_WORKERS):
"""
-Parallel list tires to leverage DAOS ability to read dir in parallel
+Parallel list tries to leverage DAOS ability to read dir in parallel
by splitting across multiple engines.
To fully use this feature the container should be configured with directory object classes
