Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

General performance improvements #14

Closed
wants to merge 11 commits into from

Conversation

freya022
Copy link
Contributor

@freya022 freya022 commented Nov 9, 2023

Getting an emoji by its alias is now a lot faster, mainly due to avoiding an O(N) loop over all the emojis in the worst case (i.e., when the alias is invalid)

Granted this is only a 5 second warmup/iteration, on 5 warmups/iterations, on one fork:

Old code:
image

New code:
image

The alias maps are lazily created, as I believe most people would either not use it, or stick to one alias group.

Duplicate refactoring

Most of the code finding unicode emojis is duplicated in each method, each with small behavior differences, I moved the parsing logic away in a single method, calling an interface on certain points

Future improvements

Some parts of the library use List#contains when using Set#contains would be faster in most cases, I saw that Java 8 does not provide an easy, nor performant, way to transform an array into a Set.

Since Java 8 is already EOL since March 2022, I would strongly consider updating JEmoji to Java 17 (which also aligns with Javacord)

This would enable usages of Set.of (replacing Arrays.asList) and Set.copyOf (replacing constructs such as Collections.unmodifiableList(new ArrayList<>(aliases)))

This would also change List<Alias> into Set<Alias> in Emoji

Comment on lines +516 to +517
final int[] textCodePointsArray = stringToCodePoints(text);
final long textCodePointsLength = textCodePointsArray.length;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hot spot, there's a way to get the code point from the string, which would save us from getting the entire array (except in worst case of course), but i couldn't figure out a way to get the correct index

Might be interesting though: https://stackoverflow.com/a/1527891 and String#offsetByCodePoints

@felldo
Copy link
Owner

felldo commented Nov 11, 2023

I really appreciate your work on this. Do you mind creating separate PR's for every change you do?
For example the jitpack changes, code refactoring (maybe even more, haven't really had a look on everything yet) and from your other PR changing the project name can all be their own PR.
This would allow me to individually merge them and even more important, test their perofrmance effect on this library one by one without other changes maybe blurrying the impact of other changes.

@felldo
Copy link
Owner

felldo commented Nov 11, 2023

Since Java 8 is already EOL since March 2022, I would strongly consider updating JEmoji to Java 17

I'm all in for upgrading, but the corporate world is how it is and I want to offer a replacement for the many unmaintained libraries. The ideal solution is to get the library into a solid state, and then if possible swap the EmojiManager (more recent java version) with the old java 8 version if they are running on a lower version that does not support the current version. The file would not get any updates but it still exists if someone is still running on 8. And there are still quite some applications that are running on 8

This reverts commit 2612965.
@freya022
Copy link
Contributor Author

Replacing with smaller PRs

@freya022 freya022 closed this Nov 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants