Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend address stability detection for thrust::transform #2404

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

bernhardmgruber
Copy link
Contributor

This PR extendes the automatic detection of function objects for which arguments can savely be copied before passing them to the function object.

Fixes: #2403

Merge before:

@bernhardmgruber bernhardmgruber added the thrust For all items related to Thrust. label Sep 11, 2024
Copy link
Collaborator

@miscco miscco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments about more efficient implementation for the trait.

Should we also mark our reference wrappers as explicitly not supporting allows_copied_arguments?

It iterator;

_CCCL_HOST_DEVICE kernel_arg() {} // in case It is not default-constructible
_CCCL_HOST_DEVICE kernel_arg() {} // needed in case It is not default-constructible
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this explicitly construct aligned_ptr?

// since we switch the active member of the union, we have to use placement new. This also uses the copy constructor
// of It, which works in more cases than assignment (e.g. thrust::transform_iterator with non-copy-assignable functor,
// e.g. in merge sort tests)
new (&arg.iterator) It(it);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might want to use cuda::std::__construct_at

Comment on lines +10 to +11
#ifndef _LIBCUDACXX___TYPE_TRAITS_ADDRESS_STABILITY_H
#define _LIBCUDACXX___TYPE_TRAITS_ADDRESS_STABILITY_H
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not in cuda::std so that should rather be:

Suggested change
#ifndef _LIBCUDACXX___TYPE_TRAITS_ADDRESS_STABILITY_H
#define _LIBCUDACXX___TYPE_TRAITS_ADDRESS_STABILITY_H
#ifndef _CUDA___FUNCTIONAL_ADDRESS_STABILITY_H
#define _CUDA___FUNCTIONAL_ADDRESS_STABILITY_H

Comment on lines +42 to +45
template <typename R, typename... Args>
struct __all_parameters_by_value_fptr<R (*)(Args...)>
: _CUDA_VSTD::conjunction<_CUDA_VSTD::_Not<_CUDA_VSTD::is_reference<Args>>...>
{};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be implemented more efficiently with

Suggested change
template <typename R, typename... Args>
struct __all_parameters_by_value_fptr<R (*)(Args...)>
: _CUDA_VSTD::conjunction<_CUDA_VSTD::_Not<_CUDA_VSTD::is_reference<Args>>...>
{};
template <typename R, typename... Args>
struct __all_parameters_by_value_fptr<R (*)(Args...)>
: _CUDA_VSTD::__all_of<!_CCCL_TRAIT(is_reference, _Args)...>
{};

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that that is C++14 only so we could use the equivalent thing from sfinae_helpers

template <bool... _Preds>
struct __all_dummy;

template <bool... _Pred>
using __all = is_same<__all_dummy<_Pred...>, __all_dummy<((void) _Pred, true)...>>;

Comment on lines +36 to +37
#include <cuda/__functional/address_stability.h>
#include <cuda/functional>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In benchmaks we conventionally only include the public header

Suggested change
#include <cuda/__functional/address_stability.h>
#include <cuda/functional>
#include <cuda/functional>

@@ -0,0 +1,198 @@
#include <cuda/__functional/address_stability.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#include <cuda/__functional/address_stability.h>
#include <cuda/functional>

Comment on lines +426 to +429
// The following test cannot be compiled because of a bug in the conversion of thrust::tuple on MSVC 2017
#ifndef _CCCL_COMPILER_MSVC_2017
// we specialize zip_function for sum_five, but do nothing in the call operator so the test below would fail if the
// zip_function is actually called (and not unwrapped)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please recheck now that we completely replaced that with cuda::std::tuple

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
thrust For all items related to Thrust.
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Automatic address stability detection
2 participants