-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend address stability detection for thrust::transform
#2404
base: main
Are you sure you want to change the base?
Conversation
* Introduces address stability detection and opt-in in libcu++ * Mark lambdas in Thrust BabelStream benchmark address oblivious Fixes: NVIDIA#2263
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments about more efficient implementation for the trait.
Should we also mark our reference wrappers as explicitly not supporting allows_copied_arguments
?
It iterator; | ||
|
||
_CCCL_HOST_DEVICE kernel_arg() {} // in case It is not default-constructible | ||
_CCCL_HOST_DEVICE kernel_arg() {} // needed in case It is not default-constructible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this explicitly construct aligned_ptr
?
// since we switch the active member of the union, we have to use placement new. This also uses the copy constructor | ||
// of It, which works in more cases than assignment (e.g. thrust::transform_iterator with non-copy-assignable functor, | ||
// e.g. in merge sort tests) | ||
new (&arg.iterator) It(it); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might want to use cuda::std::__construct_at
#ifndef _LIBCUDACXX___TYPE_TRAITS_ADDRESS_STABILITY_H | ||
#define _LIBCUDACXX___TYPE_TRAITS_ADDRESS_STABILITY_H |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not in cuda::std
so that should rather be:
#ifndef _LIBCUDACXX___TYPE_TRAITS_ADDRESS_STABILITY_H | |
#define _LIBCUDACXX___TYPE_TRAITS_ADDRESS_STABILITY_H | |
#ifndef _CUDA___FUNCTIONAL_ADDRESS_STABILITY_H | |
#define _CUDA___FUNCTIONAL_ADDRESS_STABILITY_H |
template <typename R, typename... Args> | ||
struct __all_parameters_by_value_fptr<R (*)(Args...)> | ||
: _CUDA_VSTD::conjunction<_CUDA_VSTD::_Not<_CUDA_VSTD::is_reference<Args>>...> | ||
{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be implemented more efficiently with
template <typename R, typename... Args> | |
struct __all_parameters_by_value_fptr<R (*)(Args...)> | |
: _CUDA_VSTD::conjunction<_CUDA_VSTD::_Not<_CUDA_VSTD::is_reference<Args>>...> | |
{}; | |
template <typename R, typename... Args> | |
struct __all_parameters_by_value_fptr<R (*)(Args...)> | |
: _CUDA_VSTD::__all_of<!_CCCL_TRAIT(is_reference, _Args)...> | |
{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that that is C++14 only so we could use the equivalent thing from sfinae_helpers
template <bool... _Preds>
struct __all_dummy;
template <bool... _Pred>
using __all = is_same<__all_dummy<_Pred...>, __all_dummy<((void) _Pred, true)...>>;
#include <cuda/__functional/address_stability.h> | ||
#include <cuda/functional> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In benchmaks we conventionally only include the public header
#include <cuda/__functional/address_stability.h> | |
#include <cuda/functional> | |
#include <cuda/functional> |
@@ -0,0 +1,198 @@ | |||
#include <cuda/__functional/address_stability.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#include <cuda/__functional/address_stability.h> | |
#include <cuda/functional> |
// The following test cannot be compiled because of a bug in the conversion of thrust::tuple on MSVC 2017 | ||
#ifndef _CCCL_COMPILER_MSVC_2017 | ||
// we specialize zip_function for sum_five, but do nothing in the call operator so the test below would fail if the | ||
// zip_function is actually called (and not unwrapped) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please recheck now that we completely replaced that with cuda::std::tuple
This PR extendes the automatic detection of function objects for which arguments can savely be copied before passing them to the function object.
Fixes: #2403
Merge before:
thrust::transform
to usecub::DeviceTransform
#2263