MVP of RDF/Turtle canonization/file formatting for generated dictionaries #46
Changes made here at EticaAI/lexicographi-sine-finibus (which is used by EticaAI/MDCIII-boostrapper, the automated agent). And, oh my: this commit triggered a massive update in all repos at @MDCIII! Around this time, the batch of 1/10 per hour starts. Eventually it could make sense to self-test the tools for the way the files are formatted (the files would pass the validation tests, but still generate noisy changes on GitHub).

About the changes: I am still looking into how the "longturtle" format proposed for rdflib actually does its formatting (the main reference is this email thread https://groups.google.com/g/rdflib-dev/c/EUW2fawv4mw). But at the moment, the changes seem reasonable. It is slower than rapper (sudo apt install raptor2-utils), but beyond being written in Python (not a compiled language) the

Some points:
As we prepare more data to be shared, RDF triples may be easier to compare than the same data in 250+ column CSVs (where one change updates the entire row). However, tools vary in how they handle whitespace, line breaks, etc., so we need to think about this to reduce noise.
There is the RDF Dataset Canonicalization spec (https://json-ld.github.io/rdf-dataset-canonicalization/spec/) and several of the works or papers it mentions, which discuss this in depth. Some of them even go as far as using digital signatures to assert that one or a group of RDF triples really comes from a given source. This is out of scope for now, not only because of the lack of tooling, but because we really need to fix the file diffs first.
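Short of full canonicalization, one low-tech way to cut whitespace and line-break noise is to compare (or commit alongside the Turtle) a normalized N-Triples dump: one triple per line, sorted, duplicates removed. Below is a minimal Python sketch that assumes the input is already N-Triples (for example produced with `rapper -i turtle -o ntriples`, or rdflib's `nt` serializer). `normalize_ntriples` is a hypothetical helper, not part of either tool. It only gives stable output for ground triples: blank node labels are kept as emitted, and relabelling them deterministically is exactly what the canonicalization spec above is about.

```python
import re

# Subject: IRI or blank node label; predicate: IRI; object: the rest up to the final "."
_TRIPLE = re.compile(r'^(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+(.+?)\s*\.$')


def normalize_ntriples(text: str) -> str:
    """Normalize N-Triples text so that equal ground graphs produce equal files.

    - drops blank lines and full-line comments
    - collapses whitespace *between* terms (literal contents are untouched)
    - removes duplicate triples and sorts the remaining lines
    Blank node labels are NOT relabelled (that needs URDNA2015 / RDFC-1.0).
    Trailing comments after a statement are not supported and raise ValueError.
    """
    triples = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {raw!r}")
        subject, predicate, obj = match.groups()
        # Single space between terms, " ." terminator.
        triples.add(f"{subject} {predicate} {obj} .")
    # One statement per line, sorted, each line ending in a newline.
    return "".join(line + "\n" for line in sorted(triples))
```

Sorting is safe here because N-Triples has no prefixes or nesting, so every line stands on its own. The trade-off is that such files are larger and less readable than Turtle, so they work better as a diff/check artifact than as the published format.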
The MVP
The idea here (before we start generating very, very large RDF files, which will naturally evolve over time) is to create some tool or documentation that defines conventions for the Turtle output, so that every generated file follows them.
This could be improved later, but if we do not do it now, the repositories that receive updates will keep growing in size because of a problem that has a simple solution if addressed early.
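One way to make sure every generated file actually follows the convention is a check, similar in spirit to `black --check`, that fails when re-formatting a committed file would change it. It also fails when the formatter is not idempotent (its output changes on a second pass), since that alone would cause churn on every run. The sketch below assumes a pluggable formatter function; `basic_text_convention` is a hypothetical stand-in that only fixes line endings, trailing whitespace and the final newline. A real setup would wrap whichever Turtle serializer is chosen (for example rdflib's "longturtle" output), which this sketch does not call.

```python
from pathlib import Path
from typing import Callable, Iterable, List

Formatter = Callable[[str], str]


def basic_text_convention(text: str) -> str:
    """Hypothetical minimal convention: LF line endings, no trailing
    whitespace, exactly one newline at end of file (empty stays empty)."""
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    body = "\n".join(line.rstrip() for line in lines).rstrip("\n")
    return body + "\n" if body else ""


def files_needing_reformat(paths: Iterable[Path], fmt: Formatter) -> List[Path]:
    """Return the files whose content differs from fmt(content).

    Raises RuntimeError if fmt is not idempotent on a file, because a
    formatter that changes its own output would produce noisy commits."""
    changed = []
    for path in paths:
        # Read bytes so CRLF line endings are not silently translated.
        original = path.read_bytes().decode("utf-8")
        once = fmt(original)
        if fmt(once) != once:
            raise RuntimeError(f"formatter is not idempotent on {path}")
        if once != original:
            changed.append(path)
    return changed
```

In a CI job, a non-empty result would fail the run and list the files that should be regenerated before committing.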