Use namespace prefixes in output / preserve input prefixes #6

Open
clange opened this issue Jul 21, 2016 · 0 comments

In any output format that supports prefixes as a means of abbreviation (e.g. Turtle or RDF/XML), Krextor should be able to make use of such prefixes: either the prefixes defined by the extraction module, or the prefixes defined by the input (if any), or a combination of both.

This is not supported so far, as prefixes do not affect the semantics; they are merely convenient for human authors. On the other hand, if the RDF extracted by Krextor is ever used in, e.g., an RDF editor, it would be valuable to know the prefixes.

Here's why this is hard to implement:

  1. The krextor:* modes/templates/functions only take URIs as input; they don't know what namespaces are. (This is, by the way, consistent with the RDF data model; namespace prefixes are syntactic sugar in some, but not all, RDF serializations.)
  2. Krextor generates some of the more advanced RDF serializations (such as RDF/XML and Turtle, which group triples by common subject) by first creating, internally, a simple XML-based encoding called RXR, and then post-processing that. RXR only knows URIs, but no namespace prefixes. (The alternative format TriX would actually support prefixes.)
  3. Of course namespace prefixes are convenient for human users. Therefore it is good practice to use them when implementing extraction modules. However, the way they are encoded in extraction modules is itself just XML syntactic sugar: &prefix; is an XML entity, which the XML parser resolves, so the extraction code only ever sees the expanded URI.
  4. For some combinations of input format and output format there is not even a general solution. If you have a hard-coded prefix→URI mapping in your extraction module, it could in principle be passed through to the output module. For some input formats, however, it's not the extraction module that defines namespaces (i.e. by mapping the whole schema of the input language to one output ontology); instead, the input documents themselves declare namespaces. In RDFa input, e.g., the namespaces are even locally scoped. When creating, e.g., Turtle output from that, you need global namespace prefixes (if you want namespace prefixes at all), so in the end you need to make an arbitrary choice, or to generate artificial ones.
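To make point 4 concrete, here is a minimal sketch of how locally scoped prefix declarations might be flattened into one global table. It is written in Python rather than XSLT purely for illustration; `globalize_prefixes`, the `ns1`, `ns2`, ... naming scheme and the example namespaces are all hypothetical and not part of Krextor.

```python
def globalize_prefixes(scoped_declarations):
    """Merge (prefix, namespace URI) pairs from nested scopes into one table.

    If a prefix is already bound to a different URI, an artificial prefix
    (ns1, ns2, ...) is generated instead. This is an arbitrary choice; other
    policies (e.g. "last declaration wins") are equally defensible.
    """
    by_uri = {}    # namespace URI -> chosen global prefix
    taken = set()  # global prefixes already in use
    counter = 0
    for prefix, uri in scoped_declarations:
        if uri in by_uri:
            continue  # this namespace already has a global prefix
        candidate = prefix
        while candidate in taken:
            counter += 1
            candidate = f"ns{counter}"
        by_uri[uri] = candidate
        taken.add(candidate)
    return {prefix: uri for uri, prefix in by_uri.items()}


# Two RDFa-style scopes bind the same prefix "ex" to different namespaces.
declarations = [
    ("dc", "http://purl.org/dc/terms/"),
    ("ex", "http://example.org/a#"),
    ("ex", "http://example.org/b#"),
]
table = globalize_prefixes(declarations)
for prefix, uri in table.items():
    print(f"@prefix {prefix}: <{uri}> .")
```

The second `ex` binding cannot keep its name globally, so it ends up under a generated prefix. This loses the author's original choice, which is exactly the trade-off described above.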

Let me very briefly outline some possibilities for enabling the desired functionality:

  • Very pragmatic: post-process your RDF/XML output by non-Krextor means (e.g. another XSLT, or regular expressions), and wait for this feature to eventually become available in Krextor.
  • Semi-pragmatic: somehow hack it into the output module you are interested in. One way of doing that is making the respective output module call a certain template by name. In the output module you provide an empty implementation of that template, but in your extraction module you implement that template. That template could return a fixed prefix→URI mapping (if your namespaces are that simple), which the output module would then output in the respective RDF serialization.
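The semi-pragmatic idea can be sketched as follows. This is a Python analogue, not Krextor code: in Krextor the hook would be a named XSLT template with an empty default implementation, whereas here it is a callable that returns an empty mapping by default. The names `abbreviate`, `to_turtle` and `prefix_hook` are hypothetical, and the sketch assumes all triple components are URIs (no literals or blank nodes).

```python
import re

# Conservative subset of what Turtle allows as the local part of a prefixed name.
SIMPLE_LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def abbreviate(uri, prefixes):
    """Return prefix:local if a namespace in `prefixes` matches, else <uri>."""
    best = None
    for prefix, namespace in prefixes.items():
        # Prefer the longest matching namespace.
        if uri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is not None:
        local = uri[len(best[1]):]
        if SIMPLE_LOCAL_NAME.fullmatch(local):
            return f"{best[0]}:{local}"
    return f"<{uri}>"


def to_turtle(triples, prefix_hook=lambda: {}):
    """Serialize URI-only triples; the default hook supplies no prefixes."""
    prefixes = prefix_hook()
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
    for triple in triples:
        lines.append(" ".join(abbreviate(term, prefixes) for term in triple) + " .")
    return "\n".join(lines)


def my_prefixes():
    """What an extraction module would implement: a fixed prefix -> URI mapping."""
    return {
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "o": "http://example.org/onto#",
    }


triples = [(
    "http://example.org/doc#thm1",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
    "http://example.org/onto#Theorem",
)]
plain = to_turtle(triples)
prefixed = to_turtle(triples, prefix_hook=my_prefixes)
print(prefixed)
```

With the default hook the output contains only full URIs, which matches the current behaviour described in this issue; supplying `my_prefixes` adds the `@prefix` declarations and abbreviates matching URIs. The same `abbreviate` step could also serve the "very pragmatic" route as an external post-processor.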

This issue was restored/updated/extended from http://trac.kwarc.info/krextor/ticket/59.
