Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] [stdlib] [proposal] Have all fn __getitem__(self, span: Slice) -> Self return Iterator[Self] instead #3653

Open
1 task done
martinvuyk opened this issue Oct 12, 2024 · 0 comments
Labels
enhancement New feature or request mojo-repo Tag all issues with this label

Comments

@martinvuyk
Copy link
Contributor

martinvuyk commented Oct 12, 2024

Review Mojo's priorities

What is your request?

Once we have Iterators designed or right now using each type's specialized iterator, change all fn __getitem__(self, span: Slice) -> Self to return the iterator and add constructors that take that iterator so that implicit conversion is possible.

What is your motivation for this change?

Te idea came to me while reading PR #3650 which allocates a buffer to return a negatively stepped slice over a readonly view of data that the Span is built upon which doesn't make much sense but there is really no alternative since the function should return an instance of Self.

Iterating with a particular step and in any particular direction is much simpler and doable without copying buffers. There is also the added benefit that when people do:

var string = String("some string")
print(string[4:]) # no copy since it would just be written to the file stream
var amount_of_s = 0
for s in string[4:]: # no allocation needed here
    if s == "s":
        amount_of_s += 1

when building an instance of the type with that iterator it would only allocate when the type is an owning type or when step != 1 for non-owning types

var string = String("some string")
var s2 = string[4:] # no copy
var s3 = String(s2) # copies all values, since abs(step) == 1 copies directly
var s4 = StringSlice(s2) # no copy
# can use s4 methods which are mostly the same as the String API
# but without paying the cost of allocation
var s5 = String(string[4::2]) # copies all values by iterating
var s6 = StringSlice(string[4::-1]) # copies by iterating (because of unicode)

var items = List[UInt8](1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
var i1 = items[1::2] # iterator over even numbers
var i2 = List[UInt8](i1) # should use SIMD strided load
var i3 = List[UInt8](items[1::-1]) # should reverse with SIMD
var i4 = List[UInt8](items[1::-2]) # SIMD strided load + reverse

This will also (I think) benefit performance in the case of String slicing which currently does not do it by unicode codepoints but will in the future, which needs to count all previous codepoints and then from then count until end - start of the slice is reached, allocate a buffer for the new string + new null terminator, and return. This iterator approach would mean just counting until the beginning of the slice and returning an iterator with the given end - start length and the given step.

Any other details?

The only downside this would have is that when wanting to do stuff with the type after slicing is that the API from the iterator is nonexistent, so you'd need to build an instance explicitly.

@martinvuyk martinvuyk added enhancement New feature or request mojo-repo Tag all issues with this label labels Oct 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request mojo-repo Tag all issues with this label
Projects
None yet
Development

No branches or pull requests

1 participant