Streaming to validate very large JSON files #229

Open
DavidFarago opened this issue Jun 15, 2021 · 3 comments

@DavidFarago

Is there (or will there be) an option to validate a very large JSON file (up to 5GB) in chunks, e.g. via streaming, so that the whole JSON file never has to be held in memory?

This would be awesome, since I haven't found any other JSON schema validator in Python that can do this. For other languages, there is e.g. https://github.com/worldturner/medeia-validator.

@Stranger6667
Owner

In principle, I think it is possible: serde supports deserialization without buffering, and the file could be accessed through mmap. I am not sure how much effort it will require, but I'd be happy to have support for this feature here :)
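
To make the idea of validating without holding the whole file in memory concrete, here is a minimal sketch in Python using only the standard library. It is not the jsonschema-rs API and not the serde/mmap approach mentioned above; it swaps in a simpler technique that only works under two assumptions: the document is a top-level JSON array, and the schema can be checked one element at a time. The names `iter_array_items`, `validate_stream` and the `is_valid` callback are hypothetical and exist only for illustration.

```python
import io
import json
from typing import Any, Callable, Iterator, List, TextIO

WHITESPACE = " \t\r\n"


def iter_array_items(stream: TextIO, chunk_size: int = 1 << 16) -> Iterator[Any]:
    """Yield the elements of a top-level JSON array one at a time.

    Only the current chunk plus any partially read element is buffered.
    """
    decoder = json.JSONDecoder()
    buf, pos, eof = "", 0, False

    def fill() -> None:
        # Drop the consumed prefix and append the next chunk.
        nonlocal buf, pos, eof
        chunk = stream.read(chunk_size)
        if chunk:
            buf, pos = buf[pos:] + chunk, 0
        else:
            eof = True

    def skip_ws() -> None:
        # Advance past JSON whitespace, reading more input when needed.
        nonlocal pos
        while True:
            while pos < len(buf) and buf[pos] in WHITESPACE:
                pos += 1
            if pos < len(buf) or eof:
                return
            fill()

    skip_ws()
    if pos >= len(buf) or buf[pos] != "[":
        raise ValueError("expected a top-level JSON array")
    pos += 1
    first = True
    while True:
        skip_ws()
        if pos >= len(buf):
            raise ValueError("unexpected end of input inside array")
        if buf[pos] == "]":
            return
        if not first:
            if buf[pos] != ",":
                raise ValueError("expected ',' between array elements")
            pos += 1
            skip_ws()
        while True:
            try:
                value, end = decoder.raw_decode(buf, pos)
                # A following ',' or ']' must already be buffered; otherwise
                # a number split across two chunks could be cut short.
                complete = end < len(buf)
            except json.JSONDecodeError:
                complete = False
            if complete:
                break
            if eof:
                raise ValueError("truncated or malformed JSON element")
            fill()  # Element is incomplete so far: read more and retry.
        yield value
        pos, first = end, False


def validate_stream(
    stream: TextIO,
    is_valid: Callable[[Any], bool],
    chunk_size: int = 1 << 16,
) -> List[int]:
    """Return the indices of array elements rejected by `is_valid`."""
    return [
        index
        for index, item in enumerate(iter_array_items(stream, chunk_size))
        if not is_valid(item)
    ]
```

A call could look like `validate_stream(open("big.json", encoding="utf-8"), is_valid)`, where `is_valid` wraps whatever per-item check is available. Two caveats of this sketch: a single element that is itself very large is re-parsed each time a new chunk arrives, and a malformed element is only reported after the rest of the input has been read.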

@DavidFarago
Author

Very cool, and thanks for the fast reply.

Can you give a first guess about when it might be available? I would love to use jsonschema-rs for our microservice, but since we need the service for very large JSON files within 2 weeks, I wonder if I have to port it to a JVM language to be able to use https://github.com/worldturner/medeia-validator...

@Stranger6667
Owner

My guess would be a few months. At the moment, I don't have the bandwidth to work on this, unfortunately :(
