RDF URIs for language tags and / or language subtags #13

Open
fsasaki opened this issue Apr 16, 2020 · 6 comments

fsasaki commented Apr 16, 2020

Over the years, the RDF community has developed several concrete sets of URIs for identifying languages. Examples:

The URIs in these sets are based on ISO 639, often extended with further URIs, e.g. to identify languages (or variants) that are not part of ISO 639, such as under-resourced or historic languages.

There are various groups that provide such URIs or the underlying values, e.g. the two efforts mentioned above, or the Library of Congress.

Some arguments for providing URIs for language (sub) tags, taken from this thread:
https://lists.w3.org/Archives/Public/public-ontolex/2020Apr/0006.html

  • URIs make it possible to identify the aforementioned languages that are not part of the BCP 47 subtag registry.
  • URIs make it easy to add information to each (sub)tag in a decentralized manner. This is not possible with the subtag registry.

Some open questions:

  • If URIs are provided, is there a need to validate subtag combinations?

The above is just a summary of what I read from the thread. Below is an observation.

The RDF community "likes" to provide information as URIs; that is a "selling point" of RDF itself. At the moment, the URI "providers" for language information are scattered across organizations and research groups. Also, there are open questions like the validation of language tags, which is solved in BCP 47 but not in the URI version(s) of language tags.
A lot of this discussion has to do with understanding:

  • what can be done with BCP 47 already
  • how to get "your subtag" into the BCP 47 registry
  • what use cases cannot be covered by BCP 47, see the "adding information" requirement mentioned above

Since the RDF community does not have one accepted provider of URIs, it is hard to get the right stakeholders at the table.

A next step for the BCP 47 community could be to fill a gap: provide URIs for the entries of the language subtag registry. That would bring more understanding of BCP 47 to the RDF community, and W3C and/or IETF could be recognized as the proper stakeholder for this task.
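To make the proposal concrete, here is a minimal sketch of what minting one URI per registry entry could look like. The base URI `https://example.org/bcp47/` and the `type/subtag` path layout are purely hypothetical; no such namespace has been defined by IANA, IETF, or W3C.

```python
from urllib.parse import quote

# Hypothetical base URI, used only for illustration.
# No organization has defined a namespace like this.
HYPOTHETICAL_BASE = "https://example.org/bcp47/"


def subtag_uri(record_type: str, subtag: str) -> str:
    """Build an illustrative URI for one registry entry.

    record_type is the registry's "Type" field (e.g. "language",
    "script", "region", "variant"); subtag is the "Subtag" field.
    """
    return f"{HYPOTHETICAL_BASE}{quote(record_type.lower())}/{quote(subtag)}"


print(subtag_uri("language", "pt"))
print(subtag_uri("script", "Hans"))
print(subtag_uri("region", "BR"))
```

Keeping the record type in the path is one possible design choice: it avoids collisions between, say, a language subtag and a region subtag that happen to share the same letters.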

niklasl commented Oct 27, 2021

This ought to be coordinated with the i18n namespace defined in JSON-LD 1.1.
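For context, a small sketch of how that namespace is used, as far as I understand the JSON-LD 1.1 `rdfDirection: "i18n-datatype"` option: the datatype IRI of a literal is formed from `https://www.w3.org/ns/i18n#`, the lowercased language tag, an underscore, and the base direction. Note that this identifies a language/direction pair for a literal, not an individual subtag. The helper name below is an assumption for illustration.

```python
I18N_NS = "https://www.w3.org/ns/i18n#"


def i18n_datatype_iri(language: str, direction: str) -> str:
    """Form an i18n-namespace datatype IRI for a literal.

    language may be an empty string when the literal has a base
    direction but no language tag.
    """
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    # The language tag is normalized to lower case before appending.
    return f"{I18N_NS}{language.lower()}_{direction}"


print(i18n_datatype_iri("ar-EG", "rtl"))
```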

fsasaki commented Oct 27, 2021

@aphillips, the latest comment from @niklasl is an interesting input to our discussion with John Klensin.

jonquet commented Apr 1, 2024

Can I get a reference / link to the Library of Congress set of URIs for ISO 639?

We are facing an issue with the Lexvo ones:

agroportal/project-management#507

@aphillips

@jonquet I think you might be confused by the distinction between what 639 does and how language tags are composed.

The Library of Congress is a reference for ISO 639-1. That is only one part of ISO 639: the 2-letter codes. The registration authority (RA) for ISO 639-3 is SIL (the Summer Institute of Linguistics); parts -1 and -2 are derived from this (note that I'm simplifying a lot). However...

Language tags are defined by IETF BCP 47. These tags draw on multiple standards, including ISO 639 for languages, ISO 15924 for scripts, and ISO 3166 for countries/regions. These codes (called "subtags") can be composed to form complete language tags such as pt-BR, zh-Hans-CN, etc. Our WG maintains an introductory article here.
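As a rough illustration of that composition, the sketch below splits a tag into language, script, region, and variant subtags purely by their shape (2 to 8 letters for the language, 4 letters for a script, 2 letters or 3 digits for a region). It is a simplification under stated assumptions: it ignores extlang, extensions, private-use and grandfathered tags, and it does not check subtags against the registry. The function name is hypothetical.

```python
import re


def split_language_tag(tag: str) -> dict:
    """Split a simple BCP 47 tag into its subtags by shape only."""
    parts = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,8}", parts[0]):
        raise ValueError(f"unexpected primary language subtag: {parts[0]!r}")
    result = {"language": parts[0].lower(), "script": None,
              "region": None, "variants": []}
    rest = parts[1:]
    # A 4-letter subtag right after the language is a script (e.g. Hans).
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        result["script"] = rest.pop(0).title()
    # A 2-letter or 3-digit subtag is a region (e.g. BR, 419).
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        result["region"] = rest.pop(0).upper()
    # Whatever remains is treated as variants, without validation.
    result["variants"] = [p.lower() for p in rest]
    return result


print(split_language_tag("pt-BR"))
print(split_language_tag("zh-Hans-CN"))
```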

There is a registry of valid subtags maintained by IANA: https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
This registry tracks all of the parts of ISO 639 as well as the other standards that are used in language tags. However, it is one large "record-jar" format file with all of the subtags in it.
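Because the registry is a single text file, anything that wants per-subtag data has to parse it first. The following is a minimal sketch of a parser for the layout the registry uses: records separated by `%%` lines, `Field: value` pairs, and continuation lines that start with whitespace. The `SAMPLE` text is a hand-written excerpt in that layout for illustration, not a copy of the live file, and its field values should be treated as assumptions.

```python
# Hand-written excerpt in the registry's layout (illustrative values).
SAMPLE = """\
File-Date: 2024-01-01
%%
Type: language
Subtag: pt
Description: Portuguese
%%
Type: region
Subtag: BR
Description: Brazil
"""


def parse_record_jar(text: str) -> list[dict[str, list[str]]]:
    """Parse record-jar text into a list of records.

    Each record maps a field name to a list of values, because fields
    such as Description may appear more than once in one record.
    """
    records: list[dict[str, list[str]]] = []
    current: dict[str, list[str]] = {}
    last_key = None
    for line in text.splitlines():
        if line.strip() == "%%":
            # Record separator: store the finished record.
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[:1].isspace() and last_key is not None:
            # Folded continuation of the previous field's value.
            current[last_key][-1] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current.setdefault(last_key, []).append(value.strip())
    if current:
        records.append(current)
    return records


for record in parse_record_jar(SAMPLE):
    print(record)
```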

This issue, where we're discussing this, reflects a known gap for RDF: there is no URL reference for composed language tags. This WG investigated what would be required to create one in 2020/2021. It would be possible to do this at IETF/IANA, but no one wrote the Internet-Draft to carry the work forward. cf. action result

andjc commented Apr 1, 2024

There are also the T and U extensions to BCP47.

The T extension would at a minimum have to be covered. The Library of Congress' increasing use of BIBFRAME and their current preference for romanised data mean that most of their linked data will require T extensions as part of the language tag.
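To show what that means for any URI scheme, here is a small sketch that separates the extension sequences from the rest of a tag. It relies on the BCP 47 rule that an extension is introduced by a single-character subtag (a "singleton" such as `t` or `u`), and that `x` starts a private-use sequence running to the end of the tag. It does not validate the extension contents, and the example tags are illustrative.

```python
def split_extensions(tag: str) -> tuple[str, dict[str, list[str]]]:
    """Split a tag into its base part and its extension sequences."""
    base: list[str] = []
    extensions: dict[str, list[str]] = {}
    current = None
    for i, part in enumerate(tag.split("-")):
        if current == "x":
            # Private use: everything after "x" belongs to it.
            extensions["x"].append(part.lower())
        elif len(part) == 1 and i > 0:
            # A singleton starts a new extension sequence.
            current = part.lower()
            extensions[current] = []
        elif current is None:
            base.append(part)
        else:
            extensions[current].append(part.lower())
    return "-".join(base), extensions


print(split_extensions("ru-Latn-t-ru-m0-alaloc"))
print(split_extensions("de-DE-u-co-phonebk"))
```

A URI scheme that only covers individual subtags from the registry would identify the base part here, but would need additional rules for the extension sequences.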

jonquet commented Apr 22, 2024

Thanks @aphillips for the detailed info. Indeed, this confirms the way the 'pt-BR' code is built ... and that there is no URI yet to identify those subtags.
It's a pity, as a machine cannot automatically know the semantics of the code without a semantic representation of it (for which there would be a URI).
On our side (agroportal/project-management#507), we will make the use of URIs optional, to handle the cases where ontologies use subtags.
