Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intern property and class names, use UstrMap for properties #462

Merged
merged 15 commits into from
Oct 29, 2024

Conversation

kennethloeffler
Copy link
Member

@kennethloeffler kennethloeffler commented Oct 28, 2024

This PR closes #112 and (partially?) #457 by adding ustr as a dependency, using interned strings for property and class names, and using UstrMap for property maps.

On my M1 MacBook, this improves rbx_binary's "Serialize 10,000 Parts" and "Deserialize 10,000 Parts" benchmarks by ~46% and ~49%, respectively, and reduces the number of heap allocations by ~84% (but I only profiled allocations when deserializing, maybe allocations during serialization is something a reviewer can check).

This change is breaking because it changes many public method signatures on InstanceBuilder, and changes the type of Instance.properties (which is public) from HashMap<String, Variant> to UstrMap<Variant>.

For convenience, I had rbx_dom_weak re-export some ustr members. Please let me know if these should be more or different!

Revert "Add property name interning"

This reverts commit e82c3dbc8dc0269c399d69dbc2a7a9d0d7ff554b.
Copy link
Member

@Dekkonot Dekkonot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also use Ustr in rbx_xml for SharedString and Ref rewrites. It'll be a small change but I expect it'll add up with Ref rewrites in particular.

Changes look good to me overall though. Thanks for taking this on.

rbx_binary/src/serializer/state.rs Outdated Show resolved Hide resolved
rbx_dom_weak/src/instance.rs Show resolved Hide resolved
rbx_xml/src/lib.rs Outdated Show resolved Hide resolved
rbx_binary/src/deserializer/state.rs Outdated Show resolved Hide resolved
@Dekkonot
Copy link
Member

UstrMap is lacking a lot of convenience features such as ::new and ::with_capacity. Looking at the source, I'm not sure why this is the case because it's just a wrapper around std::collections::HashMap.

Any ideas?

@Dekkonot
Copy link
Member

Okay I have an answer not very much time later.

Turns out that "being generic over the hasher" is a bit of a misnomer since some of the functionality with hashmaps in the standard library literally isn't. This is for a good reason, but I hate it.

If you are curious: the type of HashMap is HashMap<K, V, S> where S is the hasher. If e.g. HashMap::new() were generic across S, it would need to be inferred when using it. Since ideally people don't need to think about this, it just doesn't.

The consequence is that HashMap::new simply doesn't exist for UStrMap though, since it's actually an aliased HashMap<Ustr, V, BuiltInHasher>. Neither does From<[(K, V); N]> which is something of a bummer since it's a good convenience method.

We may consider exporting a helper trait to implement this stuff for people. It's particularly annoying to lose with_capacity.

@kennethloeffler
Copy link
Member Author

UstrMap is lacking a lot of convenience features such as ::new and ::with_capacity. Looking at the source, I'm not sure why this is the case because it's just a wrapper around std::collections::HashMap.

The consequence is that HashMap::new simply doesn't exist for UStrMap though, since it's actually an aliased HashMap<Ustr, V, BuiltInHasher>. Neither does From<[(K, V); N]> which is something of a bummer since it's a good convenience method.

We may consider exporting a helper trait to implement this stuff for people. It's particularly annoying to lose with_capacity.

Yeah, it isn't perfect. We should determine the feasibility of these and try contributing some patches upstream to ustr. In the meantime, do you have any ideas for the helper traits? Not too sure what to do there

@kennethloeffler
Copy link
Member Author

We'll need to decide what to do for change logs too, I'd appreciate any input on that

@Dekkonot
Copy link
Member

Yeah, it isn't perfect. We should determine the feasibility of these and try contributing some patches upstream to ustr. In the meantime, do you have any ideas for the helper traits? Not too sure what to do there

I'm not sure how feasible it actually is to contribute a patch for this upstream; they'd probably need to just implement the helper traits themselves, which feels like it might be considered bloat since it's technically not necessary.

A helper trait I have in mind is something like this:

pub trait UstrMapExt {
    fn new() -> Self;
    fn with_capacity(capacity: usize) -> Self;
}

impl<V> UstrMapExt for UstrMap<V> {
    fn new() -> Self {
        UstrMap::default()
    }

    fn with_capacity(capacity: usize) -> Self {
        UstrMap::with_capacity_and_hasher(capacity, Default::default())
    }
}

For new it's mostly redundant but with_capacity_and_hasher is kind of a mouthful, which is why with_capacity exists to begin with.

I'm not sure what we can do about From<[(Ustr, V); N]> though. It might not matter, since FromIter is implemented for HashMap<K, V, S>. If we write it down, it might not be too bad?

@Dekkonot
Copy link
Member

We'll need to decide what to do for change logs too, I'd appreciate any input on that

I'm thinking we need to include 3 things in the changelog:

  • Breaking change and why
  • What interfaces changed
  • Mention Ustr and how to use it properly

Otherwise, I don't think there's really that much to get into. Everything should work the same, it's just that the public-facing API changed.

@kennethloeffler
Copy link
Member Author

I think we're all ready to go here - if you have no other objections, let's merge, and then I'll be able to submit a few other performance related changes

Copy link
Member

@Dekkonot Dekkonot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Let's roll.

@kennethloeffler kennethloeffler merged commit 942531f into rojo-rbx:master Oct 29, 2024
3 checks passed
@kennethloeffler kennethloeffler deleted the the-internering branch October 29, 2024 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Interned strings for property names
2 participants