Use namespace prefixes in output / preserve input prefixes #6

clange · 2016-07-21T06:25:08Z

In any output format that supports prefixes as a means of abbreviation (e.g. Turtle or RDF/XML), Krextor should be able to make use of such prefixes: either the prefixes defined by the extraction module, or the prefixes defined by the input (if any), or a combination of both.

This is not supported so far, as prefixes do not affect the semantics; they are merely a convenience for human authors. On the other hand, if the RDF extracted by Krextor is ever used, e.g., in an RDF editor, it would be valuable to know the prefixes.

Here's why this is hard to implement:

The krextor:* modes/templates/functions only take URIs as input; they don't know what namespaces are. (This is, by the way, consistent with the RDF data model: namespace prefixes are merely syntactic sugar of some, but not all, RDF serializations. N-Triples, for example, has none.)
Krextor generates some of the more advanced RDF serializations (such as RDF/XML and Turtle, which group triples by common subject) by first creating, internally, a simple XML-based encoding called RXR, and then post-processing that. RXR only knows URIs, not namespace prefixes. (The alternative format TriX would actually support prefixes.)
Of course, namespace prefixes are convenient for human users, so it is good practice to use them when implementing extraction modules. However, the way they are encoded in extraction modules is itself just XML syntactic sugar: &prefix; is an XML entity reference, which the XML parser resolves, so the prefix is no longer visible by the time the stylesheet is processed.
For some combinations of input and output format there is not even a general solution. If your extraction module has a hard-coded prefix→URI mapping, this could in principle be passed through to the output module. For some input formats, however, it is not the extraction module that defines namespaces (i.e. by mapping the whole schema of the input language to one output ontology); instead, the input documents themselves declare namespaces. In RDFa input, for example, namespaces are even locally scoped. When creating, e.g., Turtle output from that, you need global namespace prefixes (if you want namespace prefixes at all), so in the end you have to make an arbitrary choice or generate artificial ones.
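To illustrate the "arbitrary choice, or artificial prefixes" option for locally scoped input namespaces, here is a minimal Python sketch; it is not Krextor code. It assumes the scoped declarations (e.g. from RDFa `prefix` attributes) have already been collected as (prefix, namespace) pairs in document order. The arbitrary choice made here is "first binding of a prefix wins"; a later, conflicting binding gets a generated `ns1`, `ns2`, … prefix. The helper names `globalize_prefixes` and `abbreviate` are hypothetical, and the local-name check is deliberately more conservative than Turtle's grammar.

```python
import re

# Conservative subset of Turtle local names; the real PN_LOCAL grammar allows more.
_SAFE_LOCAL = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def globalize_prefixes(scoped_bindings):
    """Merge locally scoped (prefix, namespace) declarations into one global map.

    The first binding of a prefix wins; a later binding of the same prefix to a
    different namespace gets an artificial prefix ns1, ns2, ...
    """
    global_map = {}    # prefix -> namespace URI
    by_namespace = {}  # namespace URI -> prefix already assigned to it
    counter = 0
    for prefix, namespace in scoped_bindings:
        if namespace in by_namespace:
            continue  # this namespace already has a global prefix
        if prefix in global_map:
            # Prefix is taken by another namespace: generate a fresh one.
            while True:
                counter += 1
                prefix = f"ns{counter}"
                if prefix not in global_map:
                    break
        global_map[prefix] = namespace
        by_namespace[namespace] = prefix
    return global_map


def abbreviate(uri, prefix_map):
    """Return a prefixed name for uri, or the full <IRI> if no safe abbreviation exists."""
    best = None
    for prefix, namespace in prefix_map.items():
        # Prefer the longest matching namespace.
        if uri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is not None:
        local = uri[len(best[1]):]
        if _SAFE_LOCAL.fullmatch(local):
            return f"{best[0]}:{local}"
    return f"<{uri}>"


bindings = [
    ("dc", "http://purl.org/dc/terms/"),
    ("ex", "http://example.org/a#"),   # e.g. declared on one RDFa element
    ("ex", "http://example.org/b#"),   # same prefix, other scope, other namespace
    ("dc", "http://purl.org/dc/terms/"),
]
prefixes = globalize_prefixes(bindings)
print(abbreviate("http://example.org/b#thing", prefixes))  # prints ns1:thing
```

Any other policy (e.g. "innermost scope wins", or always generating prefixes) would be equally valid; the point is only that some such choice is unavoidable.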

Let me very briefly outline some possibilities for enabling the desired functionality:

Very pragmatic: post-process your RDF/XML output by non-Krextor means (e.g. another XSLT, or regular expressions), and wait for this feature to eventually become available in Krextor.
Semi-pragmatic: somehow hack it into the output module you are interested in. One way of doing that is making the respective output module call a certain template by name. In the output module you provide an empty implementation of that template, but in your extraction module you implement that template. That template could return a fixed prefix→URI mapping (if your namespaces are that simple), which the output module would then output in the respective RDF serialization.
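The semi-pragmatic hook pattern can be illustrated outside XSLT. In Krextor it would be a named template that the output module calls and the extraction module implements; the Python sketch below is only a hypothetical analogue, using a method with an empty default in a made-up `TurtleOutput` class that a made-up `MyExtractionOutput` subclass overrides with a fixed prefix→URI mapping. It assumes triples of full IRIs only (no literals or blank nodes) and groups them by subject, as Krextor's Turtle output does.

```python
import re

# Conservative subset of Turtle local names (the real grammar allows more).
_SAFE_LOCAL = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


class TurtleOutput:
    """Hypothetical generic output module: serializes (s, p, o) triples of full IRIs."""

    def prefix_mapping(self):
        # Hook with an empty default implementation; extraction-specific
        # code may override it to supply a fixed prefix -> namespace mapping.
        return {}

    def _term(self, iri, prefixes):
        for prefix, namespace in prefixes.items():
            if iri.startswith(namespace):
                local = iri[len(namespace):]
                if _SAFE_LOCAL.fullmatch(local):
                    return f"{prefix}:{local}"
        return f"<{iri}>"

    def serialize(self, triples):
        prefixes = self.prefix_mapping()
        lines = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
        if lines:
            lines.append("")  # blank line between prefix header and triples
        # Group triples by subject, keeping first-seen order (dicts are ordered).
        by_subject = {}
        for s, p, o in triples:
            by_subject.setdefault(s, []).append((p, o))
        for s, pairs in by_subject.items():
            body = " ;\n    ".join(
                f"{self._term(p, prefixes)} {self._term(o, prefixes)}" for p, o in pairs
            )
            lines.append(f"{self._term(s, prefixes)} {body} .")
        return "\n".join(lines)


class MyExtractionOutput(TurtleOutput):
    """Hypothetical extraction-module override of the hook."""

    def prefix_mapping(self):
        return {"foaf": "http://xmlns.com/foaf/0.1/"}


triples = [
    ("http://example.org/alice",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://xmlns.com/foaf/0.1/Person"),
    ("http://example.org/alice",
     "http://xmlns.com/foaf/0.1/knows",
     "http://example.org/bob"),
]
print(MyExtractionOutput().serialize(triples))
```

Without the override, the same serializer still works and simply writes every term as a full `<IRI>`, which mirrors the current behaviour described above.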

This issue was restored/updated/extended from http://trac.kwarc.info/krextor/ticket/59.
