Using simdjson as an SAX tokenizer #64

Open
michaeleisel opened this issue Jul 23, 2020 · 1 comment

Comments

@michaeleisel
Contributor

simdjson seems to be the gold standard for JSON-parsing performance. It's regularly updated with state-of-the-art parsing algorithms, makes excellent use of intrinsics, and supports both ARM and x86_64. It's also in use by many different organizations and is extensively tested, including via fuzzing. I don't know what the performance needs are for JSON parsing here at Spotify, but if there's any desire for more speed, simdjson would be a great choice. It could be used as an SAX tokenizer, or simply forked to have spotify-json's high-level API built on top of it.

@punchfox
Contributor

It's an interesting direction to explore. In terms of performance, we do alright on x86 platforms, but we have no optimizations for ARM platforms, which turn out to account for the majority of our uses. It would be interesting to see if we could use some or all of simdjson in our parser. I don't know if the SAX parser would slot easily into our code, but just replacing the string and number parsers with the ones from simdjson might be an easy performance win, and would allow us to remove some of our own code.
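
For anyone unfamiliar with the term, here is a rough sketch of what a SAX-style boundary between a tokenizer and a higher-level API could look like. Everything in it is hypothetical: `sax_handler`, `tokenize` and `summary_handler` are names made up for illustration and are not part of simdjson or spotify-json, and the toy `tokenize` function only stands in for wherever a fast tokenizer would plug in. It skips validation and escape sequences entirely.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical event sink. A high-level decoding API would implement this
// and never look at the raw bytes itself.
struct sax_handler {
  virtual ~sax_handler() = default;
  virtual void on_begin_object() = 0;
  virtual void on_end_object() = 0;
  virtual void on_begin_array() = 0;
  virtual void on_end_array() = 0;
  virtual void on_string(const std::string &value) = 0;
  virtual void on_number(double value) = 0;
  virtual void on_literal(const std::string &value) = 0;  // true, false, null
};

// Toy stand-in for the tokenizer: one pass over the input, one event per
// token. No structural validation, no escape handling in strings.
inline bool tokenize(const std::string &json, sax_handler &handler) {
  std::size_t i = 0;
  while (i < json.size()) {
    const char c = json[i];
    if (std::isspace(static_cast<unsigned char>(c)) || c == ',' || c == ':') {
      ++i;  // separators carry no event in this sketch
    } else if (c == '{') {
      handler.on_begin_object(); ++i;
    } else if (c == '}') {
      handler.on_end_object(); ++i;
    } else if (c == '[') {
      handler.on_begin_array(); ++i;
    } else if (c == ']') {
      handler.on_end_array(); ++i;
    } else if (c == '"') {
      const std::size_t end = json.find('"', i + 1);
      if (end == std::string::npos) return false;  // unterminated string
      handler.on_string(json.substr(i + 1, end - i - 1));
      i = end + 1;
    } else if (c == '-' || std::isdigit(static_cast<unsigned char>(c))) {
      const char *begin = json.c_str() + i;
      char *end = nullptr;
      const double value = std::strtod(begin, &end);
      if (end == begin) return false;  // not a number after all
      handler.on_number(value);
      i += static_cast<std::size_t>(end - begin);
    } else if (json.compare(i, 4, "true") == 0) {
      handler.on_literal("true"); i += 4;
    } else if (json.compare(i, 5, "false") == 0) {
      handler.on_literal("false"); i += 5;
    } else if (json.compare(i, 4, "null") == 0) {
      handler.on_literal("null"); i += 4;
    } else {
      return false;  // unexpected character
    }
  }
  return true;
}

// Example consumer: collects a few statistics from the event stream.
struct summary_handler : sax_handler {
  int depth = 0;
  int max_depth = 0;
  int literal_count = 0;
  double number_sum = 0.0;
  std::vector<std::string> strings;

  void on_begin_object() override { enter(); }
  void on_end_object() override { --depth; }
  void on_begin_array() override { enter(); }
  void on_end_array() override { --depth; }
  void on_string(const std::string &value) override { strings.push_back(value); }
  void on_number(double value) override { number_sum += value; }
  void on_literal(const std::string &) override { ++literal_count; }

 private:
  void enter() {
    ++depth;
    if (depth > max_depth) max_depth = depth;
  }
};
```

Usage would be along the lines of `summary_handler h; tokenize(input, h);`. The point of the split is that swapping the tokenizer, or only its string and number handling, would not have to touch the code that consumes the events.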
