Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an example to demonstrate multithreaded read_parquet pipelines #16828

Merged

Conversation

mhaseeb123
Copy link
Member

@mhaseeb123 mhaseeb123 commented Sep 18, 2024

Description

Closes #16717. This PR adds a new example to read multiple parquet files using multiple threads.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@mhaseeb123 mhaseeb123 self-assigned this Sep 18, 2024
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Sep 18, 2024
@mhaseeb123 mhaseeb123 added the 2 - In Progress Currently a work in progress label Sep 18, 2024
@github-actions github-actions bot added the CMake CMake build issue label Sep 18, 2024
@mhaseeb123 mhaseeb123 added non-breaking Non-breaking change feature request New feature or request labels Sep 18, 2024
@mhaseeb123 mhaseeb123 changed the title Add an example to demostrate read/write parquet using multiple threads. Add an example to demonstrate read/write parquet using multiple threads. Sep 19, 2024
@mhaseeb123 mhaseeb123 changed the base branch from branch-24.10 to branch-24.12 September 23, 2024 21:41
@GregoryKimball
Copy link
Contributor

GregoryKimball commented Sep 26, 2024

Thanks @mhaseeb123 for putting this together. After a little hacking I was able to generate a nice pipelined profile. I removed the concats and the writes and only kept the first read. I also had to add an initialization for thread_count to prevent garbage thread values.

Here is nice pipelining
image

From this command:
/nfs/nsight-systems-2022.5.1/bin/nsys profile -t nvtx,cuda,osrt -f true --cuda-memory-usage=true --cuda-um-cpu-page-faults=true --cuda-um-gpu-page-faults=true --gpu-metrics-device=4 --output=/nfs/20240924_iothreads/prof2_4 --env-var KVIKIO_COMPAT_MODE=on,KVIKIO_NTHREADS=8 ./parquet_io_multithreaded /raid/tpch/gqe-dbgen-1/lineitem/lineitem_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/part/part_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/orders/orders_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/lineitem/lineitem_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/part/part_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/orders/orders_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/lineitem/lineitem_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/part/part_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/orders/orders_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/lineitem/lineitem_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/part/part_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/orders/orders_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/lineitem/lineitem_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/part/part_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/orders/orders_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/lineitem/lineitem_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/part/part_1.snappy.parquet,/raid/tpch/gqe-dbgen-1/orders/orders_1.snappy.parquet /raid/tmp DEFAULT SNAPPY 4 YES

Based on this command, you can see why I would like a "repetitions" CLI argument 😆

@GregoryKimball
Copy link
Contributor

GregoryKimball commented Sep 26, 2024

Thanks @mhaseeb123 for the continued development and discussions around this example.

  • I would also like to consider introducing an io_type CLI argument similar to the way the nvbenchmarks work. We could take the file information and copy it to pageable host buffers, pinned host buffers, or device buffers, based on the io_type requested by the runner.
  • Also I'm seeing poor pipelining in the first set of read_parquet calls, as each thread launches its IO at the same time, and we end up starting compute later than if each thread completed IO one at a time. I'll open a separate issue about this topic. ([FEA] Add synchronization for IO between read_parquet calls on different threads #16936)

@mhaseeb123
Copy link
Member Author

Based on this command, you can see why I would like a "repetitions" CLI argument 😆

OMG that command line is such a monstrosity. All done in the update though. Still working on the io_type

Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the final batch of nits :)

cpp/examples/parquet_io/io_source.cpp Outdated Show resolved Hide resolved
cpp/examples/parquet_io/io_source.hpp Show resolved Hide resolved
cpp/examples/parquet_io/common_utils.cpp Outdated Show resolved Hide resolved
cpp/examples/parquet_io/parquet_io_multithreaded.cpp Outdated Show resolved Hide resolved
cpp/examples/parquet_io/parquet_io_multithreaded.cpp Outdated Show resolved Hide resolved
cpp/examples/parquet_io/parquet_io_multithreaded.cpp Outdated Show resolved Hide resolved
cpp/examples/parquet_io/parquet_io_multithreaded.cpp Outdated Show resolved Hide resolved
Copy link

copy-pr-bot bot commented Oct 4, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mhaseeb123 mhaseeb123 requested a review from vuule October 7, 2024 18:25
@mhaseeb123 mhaseeb123 added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Oct 7, 2024
@GregoryKimball
Copy link
Contributor

@lamarrr Would you please share your review?

Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI changes look fine to me.

@mhaseeb123
Copy link
Member Author

/merge

@mhaseeb123 mhaseeb123 added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Oct 11, 2024
@rapids-bot rapids-bot bot merged commit be1dd32 into rapidsai:branch-24.12 Oct 11, 2024
101 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge CMake CMake build issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

[FEA] Add multi-threaded Parquet read example
6 participants