Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative merge algorithm without the builder #78

Merged
merged 2 commits into from
Sep 13, 2024

Conversation

martindurant
Copy link
Member

No description provided.

@martindurant
Copy link
Member Author

@jpivarski , the code in akimbo.io includes, for now, both the previous builder version and this new version of the JOIN algorithm, for you to look at. While the builder is very dynamic and ought to be slower, it also features a growable buffer for the index data, whereas the new version is allocated as big as it can possibly be (it could be resized when finished). Is it possible to use a growable buffer here without the builder?

The merge function takes a trivial amount of time compared to the other things going on in the join function, so I am not sure if this is actually any faster at all. Am I missing something?

@jpivarski
Copy link
Collaborator

That _merge function is going to be Numba'd, right?

There is an ak.numba.GrowableBuffer that you can use instead of over-allocating and trimming. (Does trim really release the memory for other objects, or is it just a slice that holds a reference to the original allocation as its base?) This was part of a LayoutBuilder implementation in Numba (scikit-hep/awkward#2408) that never ended up being faster than ArrayBuilder in Numba, and we don't know why.

In principle, a numba.typed.List of numbers is the same kind of thing as a GrowableBuffer, and presumably this is better optimized than our GrowableBuffer. It could be because our GrowableBuffer passes around two array objects, which are separately reference-counted, while Numba's typed list would have only one reference count.

In both the Numba-LayoutBuilder project and yours, we're seeing the dynamic ArrayBuilder being faster than it's supposed to be. It's even accessed through external pointers (no possibility for LLVM to optimize), so I really don't know what's going on there. It shouldn't be too hard to beat, if there aren't other bottlenecks elsewhere, hiding the performance of this part.

@martindurant
Copy link
Member Author

Does trim really release the memory for other objects

Yes it does: it must own the memory and the memory must be contiguous. You would pass a flag asserting that no other reference points to the same array (although python references would be fine and updated in place).

@martindurant
Copy link
Member Author

we're seeing the dynamic ArrayBuilder being faster than it's supposed to be

do you think there's any point in this PR, then? At scale, the over-allocation would be costly, so I would need one of the growable implementations anyway.

@jpivarski
Copy link
Collaborator

I don't think there's anything wrong with the PR. In principle, one should be able to do better than ArrayBuilder. I've always taken it as some issue hiding in the implementation of ak.numba.LayoutBuilder, but your project is independent of that. (And I'm assuming that you'll use some growable buffer, eventually.)

As an alternative to the alternative, can the merge be implemented by creating "tags" and "index" arrays of integers using Numba and then pass them into ak.contents.UnionArray.simplified to mix them appropriately? The return value of this function is not necessarily a union array; it tries to homogenize types.

@martindurant
Copy link
Member Author

As an alternative to the alternative

Maybe next time! I think this is nice for now.

@martindurant martindurant merged commit 0b533d8 into intake:main Sep 13, 2024
12 checks passed
@martindurant martindurant deleted the merge_ind branch September 13, 2024 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants