Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stdlib] Move StringRef find() implementation to Span #3548

Open
wants to merge 23 commits into
base: nightly
Choose a base branch
from

Conversation

martinvuyk
Copy link
Contributor

@martinvuyk martinvuyk commented Sep 25, 2024

Move StringRef find() implementation to Span

Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
@martinvuyk martinvuyk marked this pull request as ready for review September 25, 2024 03:21
@martinvuyk martinvuyk requested a review from a team as a code owner September 25, 2024 03:21
@martinvuyk martinvuyk changed the title [stdlib] Move StrRef find() implementation to Span [stdlib] Move StringRef find() implementation to Span Sep 25, 2024
@@ -335,3 +338,194 @@ struct Span[
return Span[T, _lit_mut_cast[lifetime, False].result](
unsafe_ptr=self._data, len=self._len
)

fn find[
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question It seems a bit weird from an API design perspective to have this "find" function in span, can you help me understand why we'd want this? It feels more coupled to string algorithms, and Span isn't just for a view over string data.

Copy link
Contributor Author

@martinvuyk martinvuyk Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be used if you want to find occurrences of a specific scalar value sequence. My main thought was that we'll lose the ability to do that once StringSlice.find() is fixed to use unicode codepoints and it introduces quite some overhead for algorithms that are faster using raw bytes. We could also make this private WDYT?

Also, once PR's #3577 DType.get_dtype() gets merged, List.index() can also delegate to this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoeLoser gentle ping and also to add something, since it seems like we are moving toward unifying Python's bytes into Span[UInt8]in PR #3636, request #3634 and many other places. Python's bytes.find is actually a method, and many other things that we put in StringSlice and StringRef are actually operations from bytes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great change for overall performance when manipulating slices. It opens a lot of possibilities.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants