[Optimization] Remove memory allocation from stack_access implementation
#2758
Checklist
Description
I have revised the `stack_access` implementation to fix a TODO in the snippet below from `xbuilder.hpp`:

The allocation remains in the compile-time implementation, as I doubt that it's a typical hot path through the library. I'm sure it can be optimized as well, but it's probably not worth the effort.
The main change is that I have moved from individual lookups with `xindex` to a lookup into flat memory space using an iterator provided by the underlying container. The iterator should correctly traverse linear memory in either row-major or column-major order, based on the iterator type. This change from allocated `xindex` objects to access through iterators results in better-optimized code with inlining (on top of the cycles saved from not having to allocate an `svector`). The following code results in nearly identical execution times for both cases (I've removed the instrumentation code for conciseness):

Output:
When run on master the results are similar to:
The same fix can be applied to the concatenation access as well and I will add that in another PR.
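To illustrate the difference between building an index object per access and walking flat memory with an iterator, here is a minimal sketch. It does not use xtensor: `NdArray`, `access_with_index_copy` and `access_with_iterator` are hypothetical names invented for this example, a `std::vector` stands in for `svector`/`xindex`, and row-major layout is assumed throughout. It is not the code in this PR, only the general idea.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an N-d container (NOT xtensor's API).
// Elements are stored contiguously in row-major order.
struct NdArray
{
    std::vector<std::size_t> shape;
    std::vector<double> data;  // size == product of shape
};

// Allocating variant: copies the index (minus the stacking axis) into a
// freshly allocated vector on every access, then resolves that copy.
double access_with_index_copy(const std::vector<NdArray>& arrays,
                              std::size_t axis,
                              const std::vector<std::size_t>& index)
{
    assert(axis < index.size());
    std::vector<std::size_t> inner;  // heap allocation on each call
    inner.reserve(index.size() - 1);
    for (std::size_t i = 0; i < index.size(); ++i)
    {
        if (i != axis)
        {
            inner.push_back(index[i]);
        }
    }

    const NdArray& arr = arrays[index[axis]];  // which stacked array
    std::size_t offset = 0;
    for (std::size_t d = 0; d < inner.size(); ++d)
    {
        offset = offset * arr.shape[d] + inner[d];  // row-major flattening
    }
    return arr.data[offset];
}

// Allocation-free variant: skips the stacking axis while accumulating a
// flat offset, then advances the container's own iterator by that offset.
double access_with_iterator(const std::vector<NdArray>& arrays,
                            std::size_t axis,
                            const std::vector<std::size_t>& index)
{
    assert(axis < index.size());
    const NdArray& arr = arrays[index[axis]];
    std::size_t offset = 0;
    std::size_t d = 0;  // dimension counter inside the selected array
    for (std::size_t i = 0; i < index.size(); ++i)
    {
        if (i == axis)
        {
            continue;  // this entry selects the array, not an element
        }
        offset = offset * arr.shape[d] + index[i];
        ++d;
    }
    auto it = arr.data.begin();
    it += static_cast<std::ptrdiff_t>(offset);
    return *it;
}
```

Both functions return the same element for any valid index. The second one keeps everything in registers and on the stack, which is what gives the compiler a chance to inline the whole access.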
Finally, C++17 would be nice as it has `std::apply` :)

Thanks for your review!!!