Make vectors trivially relocatable. #337

sukeya · 2023-07-18T12:29:43Z

Hello,

I'm sorry for making the same issue three times.
I made Vec2, Vec3 and Vec4 trivially relocatable.
I didn't change the behavior of them.
Could you please merge this PR?

Signed-off-by: Yuya Asano <[email protected]>

lgritz · 2023-07-20T01:09:18Z

@AlexMWells Sorry to drag you into this, but I know that in the past you have gone super deep on how tweaks to these classes can help or inhibit vectorization in loops. Is there anything here that would concern you?

lgritz

These look fine to me, though I pinged Alex Wells for an opinion as well, I want to make sure that these changes aren't going to disrupt any SIMD vectorization that we depend on in OSL.

meshula

lgtm.

AlexMWells · 2023-07-20T16:56:36Z

Historically with vectorizing compilers, a data type can have its members turned into scalars with a "scalar replacement of aggregates" optimization pass. Once a data type has become scalars, its members can live in registers vs. being forced to standard data layout and being kept on the stack. This is important to minimize the movement between stack and registers when vectorizing as that is when an AOS/SOA transformation might be required and generate extra instructions. One thing that can disqualify SROA is passing the address of the structure to a function call, as that would require the standard data layout and the object exist on the stack vs registers.

Now the wrinkle is the "default" copy constructor and possibly other default operators is that a compiler could implement a memcpy(*this, &src, sizeof(T). And the taking the address of "src" could disqualify SROA and end up forcing it to exist on the stack in standard data layout.

Perhaps another optimization pass would convert the memcpy to assignments before SROA is applied, but those can often be specialized (just for 2 or 3 data members in a specific pattern). So defensive programming technique was to explicitly provide those constructors and operators with member-wise assignment/copies vs. letting the default possibly use memcpy/memset.

Now fast forward many years and std::is_trivially_copyable was introduced in c++11 and then compilers/libraries start adding it to memcpy and other places with warnings, etc.

In short I think you might want to keep both ways of having those default or not controllable by a define.

sukeya · 2023-07-21T12:30:35Z

Thank you for your advices.
I'll fix to keep both ways.

Signed-off-by: Yuya Asano <[email protected]>

lgritz · 2023-07-21T16:18:03Z

config/ImathConfig.h.in

@@ -174,4 +174,7 @@
 #    endif
 #endif

+// Whether the copy and move constructors of Vec2, Vec3 and Vec4 are default or not.
+#cmakedefine IMATH_VEC_USE_DEFAULT_CONSTRUCTOR


I think instead we want

Suggested change

#cmakedefine IMATH_VEC_USE_DEFAULT_CONSTRUCTOR

#ifndef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR

#define IMATH_VEC_USE_DEFAULT_CONSTRUCTOR 1

#endif

Because it's easy to imagine that a particular downstream app needs to make sure it's the other way, and this would allow it to be overridden if the downstream modules defines the symbol as 0 before including any imath headers.

Then the various places where you have

#ifdef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR

should be

#if IMATH_VEC_USE_DEFAULT_CONSTRUCTOR

fly in the ointment, if we go both ways, are we inviting an recurring support burden along these lines?

I can't make my build work with MegaDCC",

oh right, did you match IMATH_VEC_USE_DEFAULT_CONSTRUCTOR to what they picked?

but that's in conflict to our internal build

yeah but did you namespace?

we don't want to have to have two copies

The OPENEXR_DLL thing was sort of a harbinger, I'm worried about a matrix of macros people have to solve for compatibility

or maybe I'm misunderstanding completely, and it's all inlined and completely doesn't matter which you pick

I think because it's inlined it's all ok? There shouldn't be any link issues.

If that's the case, great. Then my only request is that we document why we have it both ways, even if it is a copy and paste of @AlexMWells thorough notes above. This is strongly in the realm of "gosh, this is rather esoteric, why is this a choice, and how do I make it?" I'd prefer the doc is inline in the CMakeLists.txt where you are faced with making a choice.

An example, well with an older compiler, but just wanted to show the behavior.
https://godbolt.org/z/rEY4Ef5GM
With #define VEC_ENABLE_DEFAULT_CONSTRUCTOR 1
The optimization report shows:

remark #15518: Structure assignment was serialized [ <source>(145,14) ] remark #15518: Structure assignment was serialized [ <source>(147,14) ] remark #15518: Structure assignment was serialized [ <source>(149,14) ] remark #15301: OpenMP SIMD LOOP WAS VECTORIZED remark #15450: unmasked unaligned unit stride loads: 11 remark #15451: unmasked unaligned unit stride stores: 4 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 367 remark #15477: vector cost: 125.870 remark #15478: estimated potential speedup: 2.490 remark #15488: --- end vector cost summary --- ... <source>(145,14):remark #34000: call to memcpy implemented inline with loads and stores with proven source

Structure assignment was serialized really kills performance, and we can see the compiler by "default" decided to generate a memcpy.

By providing the copy constructor and assignment operator explicitly we avoid the memcpy or memset (for initializations) and with #define VEC_ENABLE_DEFAULT_CONSTRUCTOR 0
The optimization report shows:

remark #15309: vectorization support: normalized vectorization overhead 0.352 remark #15301: OpenMP SIMD LOOP WAS VECTORIZED remark #15450: unmasked unaligned unit stride loads: 11 remark #15451: unmasked unaligned unit stride stores: 4 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 201 remark #15477: vector cost: 44.370 remark #15478: estimated potential speedup: 3.350

Now this is all very dependent on compiler and order of optimizations, and perhaps isn't as required on newer compilers. But this is just a simple example, real world kernels could have much larger state being kept on the stack and any nested aggregate that happens to use a memcpy or memset by default could prevent the SROA pass from fully succeeding.

Thus it would be nice for projects like OSL to choose to disable the use of default constructors. And if the supported compiler matrix has moved on to where it is not a problem then it could chose to enable them. Problem is examining optimization reports before and after is required.

So that sounds like explicitly defining copy and assignment, and not using default, we get the behavior that we want, which is that things optimize as we wish. "we" includes OSL. I can't think of a reason to provide this as one option, and default constructors and copy another option.

I am rereading this argument which suggests maybe we need the option,

#337 (comment)

but I'm not understanding a case to preserve the default option. Sorry if I'm being dense....

The reason why I made this PR was because I wanted to copy Vec3 sequences between host and device using thrust::async::copy. It requires Vec3 to be trivially relocatable, so default constructors and assignment operators for copy and move are needed for me.
I'm sorry for not telling it.

My apologies, I didn't mean to shut down your effort. Maybe I am misunderstanding? I thought that adding the explicit copy and move gives us the trivial relocatability you are searching for, and also satisfies the SROA concern that @AlexMWells raises? I thought we can solve both your need, and also the SIMD concern at the same time, without needing to have a special conditional. My intent wasn't to stop your effort, I was hoping we can find one solution that solves both problems.

At first, I apologize for my quick but rough decision.
I misunderstood I couldn't solve the assertion error, but I noticed that it was caused by ODR violation. Though I'm running CI in my forked repository, it will be OK.

I think that the trivial relocatability and the SROA are incompatible because the trivial relocatability requires Vec3 not to have user-provided copy constructor and assignment operator(even though Vec3::Vec3(const Vec3& v) : x(v.x), y(v.y), z(v.z) {}), but on the other hand, providing the copy constructor and assignment operator increases the performance as @AlexMWells showed earlier.
So, we have to switch default or not in some way.

Thank you :)

Signed-off-by: Yuya Asano <[email protected]>

This reverts commit abc7c5e. Signed-off-by: Yuya Asano <[email protected]>

Signed-off-by: Yuya Asano <[email protected]>

sukeya · 2023-07-22T04:51:27Z

I fixed codes following an advice, but an assertion error occured. One of the causes seems that return Vec2 (x / l, y / l) in Vec2::normalized behaves strangely. Unfortunately, I couldn't fix it.

Signed-off-by: Yuya Asano <[email protected]>

This reverts commit 5d4ed11. Signed-off-by: Yuya Asano <[email protected]>

Signed-off-by: Yuya Asano <[email protected]>

sukeya · 2023-07-23T02:53:39Z

I found that only changing copy constructor from user-defined to default caused an assertion error, so I give up making Vec3 trivially relocatable.
Thank you very much!

Signed-off-by: Yuya Asano <[email protected]>

lgritz · 2023-07-24T05:12:30Z

Here is the summary as I understand it:

Using =default makes it trivially moveable, and that might have a number of advantages for the vast majority of users.

But there are a few specialized situations where it could sabotage certain auto-vectorization cases. Those situations can be locally fixed if there's a way to include the header files in such a way that it doesn't use the =default versions in a module-local way (say, by defining a symbol before including the Imath header). OSL is one of these. If Imath itself does not allow this, we might have to do something even uglier like replicate the Imath headers inside OSL and modify them to allow it.

sukeya · 2023-07-24T06:25:48Z

I'm sorry, but what is OSL?

lgritz · 2023-07-24T16:07:13Z

Open Shading Language

sukeya · 2023-07-24T22:24:45Z

Thank you!

cary-ilm · 2023-07-28T00:06:14Z

We discussed this in today's technical steering committee meeting. We support this addition in principle, the ultimate objective seems sound, but we're concerned about adding additional complexity through another batch of #if's in the code without a thorough analysis of the benefits. As @AlexMWells points out, the actual performance impact is complicated to assess for all cases.

@sukeya, can you describe more about the motivation for the change? Was this motivated by a specific use case? Did you do a performance analysis that quantifies the impact?

We also discussed briefly whether the same objective could be achieved by adding the trivially_relocatable attribute to the class definition itself, rather than adding new class methods.

We are planning a v3.2 release in mid-August, which will be cut from the main branch, so we'd like to wait until after that before merging the PR. That will provide more time for evaluation before official release.

Thanks for the contribution and thanks for your patience.

sukeya · 2023-07-30T08:20:51Z

I see. I will do a performance analysis.

I talk about the motivation for the change. I have been writing a snow simulator which reads polygon meshes from an Alembic file exported by Blender. I think that the exported file generally stores a pair of a transform matrix (Imath::M44f) and a polygon mesh (each type of a point and a normal is std::vector<Imath::V3f>) at each time, so I have to calculate the transformed polygon mesh from them. However, my simulator will heavily use CPU, so I would like to assign this task to GPU asynchronously.

Thank you.

sukeya · 2023-07-30T14:48:30Z

Adding the trivially_relocatable attribute is good for me.

sukeya · 2023-08-23T03:24:39Z

I have done a performance analysis. The result is the following:

GPU
Test case	Avg time [ms]	Branch name in my forked repository
user defined	17.5	make_vector_trivially_relocatable_performance_test
default	12.0	defalut_ver
default and async	11.8	async_ver

CPU
Test case	Avg time [ms]	Branch name
user defined	2.4	tbb_ver
default	2.25	tbb_defalut_ver

The environment is the following:

OS	Ubuntu 22.04
CPU	Intel Core i7 8700
GPU	NVIDIA GeForce RTX 2060 (12GB)
Memory	DDR4-2666 32GB
Compiler	GCC 12.1.0

The test data is a pair of 1 million 3 dimension vector generated random and a 4 dimension matrix. Let p be a 3 dimension vector and M be a 4 dimension matrix, the test performs p * M for all vector.

The programs needs oneTBB(2021.5.0 or higher) and CUDA(12.2), and must be cloned recursively.

I didn't calculate the deviation of each test case, but felt that each sample in the test cases using CPU and default and async test case using GPU is in a range [0.95 * avg, 1.05 * avg].

lgritz · 2023-08-23T19:08:29Z

Can you clarify what exactly is the difference between "default" and "user defined"?

sukeya · 2023-08-23T22:52:24Z

I see.
"defalut" means that the following functions of vectors are declared as default:

move constructors and move assignment operators,
copy constructors and copy assignment operators.

"user defined" means that vectors use current implementation of said functions.

meshula · 2023-08-24T03:35:56Z

The performance difference on GPU is striking.

Quuxplusone · 2024-03-13T02:16:08Z

config/ImathConfig.h.in

+// Whether the copy and move constructors of Vec2, Vec3 and Vec4 are default or not.
+#ifndef IMATH_VEC_BE_TRIVIALLY_RELOCATABLE
+#     define IMATH_VEC_BE_TRIVIALLY_RELOCATABLE 0
+#endif


Author of P1144 here. :) I think it's great you're adopting the terminology — and orthogonally great that you're =default'ing all the things — but, looking at this PR, I think the focus on the word "relocatable" is actually a distraction. You should just say IMATH_VEC_RULE_OF_ZERO or something like that. After your changes, it is true that a Vec2<T> will be trivially relocatable if T is trivially relocatable; but it will also be trivially copyable if T is trivially copyable; trivially copy-assignable if T is trivially copy-assignable; and so on. There's nothing relocation-specific about this PR as far as I can tell. (Am I missing something?)

Followup: Why is this gated on a macro at all? The =default changes look like a strict improvement to me. If the non-defaulted implementations were in a .cpp file, so they were only compiled once, then there might be a reason to keep them; but nope, they're inline constexpr in the header! I think you should consider very hard whether there's any reason to ever set the macro to anything but 1. (...Is it that you need to have an opt-in way to avoid the ABI break that comes with Vec2 suddenly being passed in registers?)

FYI @sukeya @cary-ilm, there is right now this week a PR to add fully-P1144-compliant __is_trivially_relocatable(T) to Clang (fixing a long-standing false-positive bug on Windows and also making it respect user-defined special members including operator= so as to avoid false positives on all platforms). If you're interested (professionally, financially, or just plain interested) in P1144 trivial relocatability, I'd really appreciate it if you'd take a look at llvm/llvm-project#84621 and weigh in with your thoughts. (Full disclosure: my impression is that the false-positives being fixed don't actually matter to @sukeya's use-case of thrust::async::copy. But I'd still really appreciate your taking a look, if you have interest in moving P1144 forward at all. It [and I mean that PR, specifically!] needs to see some customer demand in order to progress.)

Thank you for your advice! I commited your patches.
I'm busy now, but I'll see that PR when I have a time.

Thank you for your advice! I committed your patches.
I'm busy now, but I'll see that PR when I have a time.

@sukeya Thank you! I'd really appreciate it if you leave a 👍 reaction now, so you'll receive notifications that remind you to add a fuller comment when you have the time.

src/Imath/ImathVec.h

meshula · 2024-03-13T05:25:02Z

@lgritz Since you & I debated it earlier in this thread, a half year ago, and pursuant to @Quuxplusone's comments, I'm still of the opinion that we shouldn't guard default with IMATH_VEC_USE_DEFAULT_CONSTRUCTOR. Since we last discussed this PR we've mostly collectively moved forward to C++17, and I just point that out in a spirit of moving forward with language features, and not keeping the old business around.

Co-authored-by: Arthur O'Dwyer <[email protected]> Signed-off-by: Yuya Asano <[email protected]>

rm IMATH_CONSTEXPR14 and IMATH_NOEXCEPT from default constructor and assignment. Co-authored-by: Arthur O'Dwyer <[email protected]> Signed-off-by: Yuya Asano <[email protected]>

rm IMATH_NOEXCEPT from default destructor. Co-authored-by: Arthur O'Dwyer <[email protected]> Signed-off-by: Yuya Asano <[email protected]>

lgritz · 2024-03-13T23:13:51Z

@meshula Do we not still have the problems that @AlexMWells detailed, where the = default might derail certain vectorization that we rely on? Making sure we could selectively have it the way that's better for vectorization was the point of suggesting the guard in the first place. That's unrelated to C++17.

cary-ilm · 2024-03-14T15:56:10Z

The website check failed because of a change in required versions at readthedocs. This is fixed in the main branch. @sukeya , could you possibly merge the main branch with yours to pick up those fixes?

cary-ilm · 2024-03-14T20:42:32Z

@Quuxplusone and @sukeya, I need some more time to refresh my understanding of the proposal, but I think the consensus is it's headed in the right direction.

We'd ideally like to have an analysis like @AlexMWells describes above. Some of the current constructs were set in place after such a careful expert analysis, although that was a long time ago with different compilers. It's definitely worth modernizing, but also good to confirm there aren't unintended consequences.

Quuxplusone · 2024-03-15T13:35:52Z

@Quuxplusone and @sukeya, I need some more time to refresh my understanding of the proposal, but I think the consensus is it's headed in the right direction.

We'd ideally like to have an analysis like @AlexMWells describes above. Some of the current constructs were set in place after such a careful expert analysis, although that was a long time ago with different compilers. It's definitely worth modernizing, but also good to confirm there aren't unintended consequences.

FWIW, I believe I see now the scenario that @AlexMWells is worried about: Imagine a compiler that does escape analysis after inlining, and also optimizes a trivial constructor into memcpy, and also its escape analysis does not understand the semantics of memcpy (i.e. that its pointer arguments do not escape). Then we'd have this: https://godbolt.org/z/YjKcd1Pf1 Notice that Clang optimizes the return; but if you change memcpy to not_memcpy, it stops optimizing. This indicates that Clang's escape analysis does understand that memcpy is special and doesn't escape its arguments. Of the compilers on Godbolt, it looks to me like GCC 4.1.2 doesn't understand memcpy but GCC 4.4.7+ do. All Godbolt-available versions of Clang (3+) and MSVC (19.14+) understand memcpy.

I don't understand precisely how this kind of thing would interact with vectorization, but based on these observations I'd be surprised if memcpy interacted poorly with anything (in a recent enough compiler).

I know another surprising pitfall where if a library skips running a trivial (pseudo-)destructor, lifetime analysis might not realize the object is dead; but my test case for that got fixed in GCC 11, and only ever happened on GCC AFAIK anyway. I bring it up merely as another case where making something more trivial can cause worse performance — but I'm quite confident that I wouldn't let that case stop me from making anything more trivial.

Anyway, my main contribution to this PR is to say that "trivially relocatable" should just be "trivial" everywhere it appears in this PR; and to invite people interested in relocatability to comment in support of the Clang implementation llvm/llvm-project#84621 . :)

…elocatable

Make vectors trivially relocatable.

69b4559

Signed-off-by: Yuya Asano <[email protected]>

lgritz approved these changes Jul 20, 2023

View reviewed changes

meshula reviewed Jul 20, 2023

View reviewed changes

Keep both ways of having default ctor or not.

56f83c3

Signed-off-by: Yuya Asano <[email protected]>

lgritz reviewed Jul 21, 2023

View reviewed changes

sukeya added 4 commits July 22, 2023 01:32

Fix using default ctor whatever value the flag is.

abc7c5e

Signed-off-by: Yuya Asano <[email protected]>

Revert "Fix using default ctor whatever value the flag is."

04950f4

This reverts commit abc7c5e. Signed-off-by: Yuya Asano <[email protected]>

Fix using a build flag to use a macro.

9db4019

Signed-off-by: Yuya Asano <[email protected]>

[wip] add a test for ImathVec when using default.

c49dc99

Signed-off-by: Yuya Asano <[email protected]>

sukeya added 3 commits July 22, 2023 13:55

Remove Imath::Config from ImathTestVecUseDefaultConstructor.

5d4ed11

Signed-off-by: Yuya Asano <[email protected]>

Revert "Remove Imath::Config from ImathTestVecUseDefaultConstructor."

63478fc

This reverts commit 5d4ed11. Signed-off-by: Yuya Asano <[email protected]>

Remove Imath::Config from ImathTestVecUseDefaultConstructor.

536b430

Signed-off-by: Yuya Asano <[email protected]>

sukeya closed this Jul 23, 2023

Fix assertion errors and clean up.

961ac87

Signed-off-by: Yuya Asano <[email protected]>

sukeya reopened this Jul 24, 2023

Quuxplusone reviewed Mar 13, 2024

View reviewed changes

sukeya and others added 3 commits March 13, 2024 16:49

Update src/Imath/ImathVec.h

03f055f

Co-authored-by: Arthur O'Dwyer <[email protected]> Signed-off-by: Yuya Asano <[email protected]>

Updata src/Imath/ImathVec.h

64674d1

rm IMATH_CONSTEXPR14 and IMATH_NOEXCEPT from default constructor and assignment. Co-authored-by: Arthur O'Dwyer <[email protected]> Signed-off-by: Yuya Asano <[email protected]>

Update src/Imath/ImathVec.h

7ad5437

rm IMATH_NOEXCEPT from default destructor. Co-authored-by: Arthur O'Dwyer <[email protected]> Signed-off-by: Yuya Asano <[email protected]>

Merge remote-tracking branch 'origin/main' into make_vecs_trivially_r…

386faf1

…elocatable

-#cmakedefine IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
+#ifndef IMATH_VEC_USE_DEFAULT_CONSTRUCTOR
+#define IMATH_VEC_USE_DEFAULT_CONSTRUCTOR 1
+#endif

Make vectors trivially relocatable. #337

Are you sure you want to change the base?

Make vectors trivially relocatable. #337

Conversation

sukeya commented Jul 18, 2023

lgritz commented Jul 20, 2023

lgritz left a comment

Choose a reason for hiding this comment

meshula left a comment

Choose a reason for hiding this comment

AlexMWells commented Jul 20, 2023

sukeya commented Jul 21, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sukeya Jul 23, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sukeya Jul 24, 2023 • edited Loading

Choose a reason for hiding this comment

sukeya commented Jul 22, 2023 • edited Loading

sukeya commented Jul 23, 2023 • edited Loading

lgritz commented Jul 24, 2023

sukeya commented Jul 24, 2023

lgritz commented Jul 24, 2023

sukeya commented Jul 24, 2023

cary-ilm commented Jul 28, 2023

sukeya commented Jul 30, 2023 • edited Loading

sukeya commented Jul 30, 2023 • edited Loading

sukeya commented Aug 23, 2023 • edited Loading

lgritz commented Aug 23, 2023

sukeya commented Aug 23, 2023 • edited Loading

meshula commented Aug 24, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

meshula commented Mar 13, 2024

lgritz commented Mar 13, 2024

cary-ilm commented Mar 14, 2024

cary-ilm commented Mar 14, 2024

Quuxplusone commented Mar 15, 2024

sukeya Jul 23, 2023 •

edited

Loading

sukeya Jul 24, 2023 •

edited

Loading

sukeya commented Jul 22, 2023 •

edited

Loading

sukeya commented Jul 23, 2023 •

edited

Loading

sukeya commented Jul 30, 2023 •

edited

Loading

sukeya commented Jul 30, 2023 •

edited

Loading

sukeya commented Aug 23, 2023 •

edited

Loading

sukeya commented Aug 23, 2023 •

edited

Loading