PMID vs. PUBMED prefix for CURIE form of identifiers from https://pubmed.ncbi.nlm.nih.gov/ #2373

Open
sierra-moxon opened this issue Oct 14, 2024 · 1 comment

Comments

@sierra-moxon
Member

Bioregistry issue: the NCBI PubMed resource would like its prefix to be 'pubmed' rather than PMID (the issue has a lot of discussion to review). More votes are encouraged, and if anyone wants to take up the issue at Bioregistry or with NCBI, that would be helpful.

This leads to ambiguous namespace/identifier resolution for publication identifiers when GO software engineers use standard libraries (like curies or prefixmaps) to expand or contract PubMed URIs and CURIEs.

We do have a couple of controls that help us work around situations like this:

  1. bioregistry has the concept of a "preferred" prefix and a set of synonyms. However, for the PubMed resource, bioregistry has chosen "pubmed" as the preferred prefix, so this won't help us.

  2. In the prefixmaps library, we create simple maps that represent prefix expansion rules, inclusive of bioregistry synonyms. This does help us, for now.

For example, this is prefixmaps' merged representation of all the possible expansions of the PubMed prefix. The fourth column of this CSV designates whether an expansion is an alternative ("prefix_alias") or the "canonical" one. Alternative expansions are useful when ingesting data that might use URIs instead of CURIEs and we want to contract the URI to the correct CURIE. The canonical expansion is what we use when expanding CURIEs to their URI forms for use in RDF stores, where the persistent identifier is a URI.

merged,PUBMED,http://bio2rdf.org/pubmed:,prefix_alias,bioregistry
merged,PUBMED,http://bioregistry.io/MEDLINE:,prefix_alias,bioregistry
merged,PUBMED,http://bioregistry.io/PMID:,prefix_alias,bioregistry
merged,PUBMED,http://bioregistry.io/PubMed:,prefix_alias,bioregistry
merged,PUBMED,http://europepmc.org/abstract/MED/,prefix_alias,bioregistry
merged,PUBMED,http://identifiers.org/pubmed:,prefix_alias,bioregistry
merged,PUBMED,http://linkedlifedata.com/resource/pubmed/id/,prefix_alias,bioregistry
merged,PUBMED,http://n2t.net/pubmed:,prefix_alias,bioregistry
merged,PUBMED,http://pubmed.ncbi.nlm.nih.gov/,prefix_alias,bioregistry
merged,PUBMED,http://purl.uniprot.org/citations/,prefix_alias,bioregistry
merged,PUBMED,http://purl.uniprot.org/pubmed/,prefix_alias,bioregistry
merged,PUBMED,http://rdf.ncbi.nlm.nih.gov/pubchem/reference/,prefix_alias,bioregistry
merged,PUBMED,http://scholia.toolforge.org/pubmed/,prefix_alias,bioregistry
merged,PUBMED,http://www.hubmed.org/display.cgi?uids=,prefix_alias,bioregistry
merged,PUBMED,http://www.ncbi.nlm.nih.gov/pubmed/,prefix_alias,bioregistry
merged,PUBMED,https://bio2rdf.org/pubmed:,prefix_alias,bioregistry
merged,PUBMED,https://bioregistry.io/MEDLINE:,prefix_alias,bioregistry
merged,PUBMED,https://bioregistry.io/PMID:,prefix_alias,bioregistry
merged,PUBMED,https://bioregistry.io/PubMed:,prefix_alias,bioregistry
merged,PUBMED,https://europepmc.org/abstract/MED/,prefix_alias,bioregistry
merged,PUBMED,https://identifiers.org/pubmed/,prefix_alias,bioregistry
merged,PUBMED,https://identifiers.org/pubmed:,prefix_alias,bioregistry
merged,PUBMED,https://linkedlifedata.com/resource/pubmed/id/,prefix_alias,bioregistry
merged,PUBMED,https://n2t.net/pubmed:,prefix_alias,bioregistry
merged,PUBMED,https://pubmed.ncbi.nlm.nih.gov/,prefix_alias,bioregistry
merged,PUBMED,https://purl.uniprot.org/citations/,prefix_alias,bioregistry
merged,PUBMED,https://purl.uniprot.org/pubmed/,prefix_alias,bioregistry
merged,PUBMED,https://rdf.ncbi.nlm.nih.gov/pubchem/reference/,prefix_alias,bioregistry
merged,PUBMED,https://scholia.toolforge.org/pubmed/,prefix_alias,bioregistry
merged,PUBMED,https://www.hubmed.org/display.cgi?uids=,prefix_alias,bioregistry
merged,PUBMED,https://www.ncbi.nlm.nih.gov/pubmed/,prefix_alias,bioregistry
merged,pubmed,http://bio2rdf.org/pubmed_vocabulary:,prefix_alias,prefixcc
merged,PUBMED,http://identifiers.org/pubmed/,namespace_alias,bioregistry
merged,PMID,http://identifiers.org/pubmed/,canonical,go
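
To make the role of the fourth column concrete, here is a minimal sketch that reads a few of the rows above and builds one map for expansion (canonical rows only) and one for contraction (all rows, with the canonical row winning when two rows share a namespace). It is not how prefixmaps or curies are implemented; the function names and the "canonical wins" rule are assumptions made for illustration, and the fifth column is assumed to be the source of the row.

```python
import csv
import io

# A few rows copied from the merged listing above.
ROWS = """\
merged,PUBMED,http://www.ncbi.nlm.nih.gov/pubmed/,prefix_alias,bioregistry
merged,PUBMED,https://pubmed.ncbi.nlm.nih.gov/,prefix_alias,bioregistry
merged,PUBMED,http://identifiers.org/pubmed/,namespace_alias,bioregistry
merged,PMID,http://identifiers.org/pubmed/,canonical,go
"""


def build_maps(text):
    """Return (expand, contract) dicts built from prefix-map CSV rows."""
    expand = {}    # prefix -> namespace, from canonical rows only
    contract = {}  # namespace -> prefix, from every row
    for _context, prefix, namespace, status, _source in csv.reader(io.StringIO(text)):
        if status == "canonical":
            expand[prefix] = namespace
            contract[namespace] = prefix  # assumption: canonical overrides aliases
        else:
            contract.setdefault(namespace, prefix)
    return expand, contract


def expand_curie(curie, expand):
    """Expand a CURIE to a URI, or return None if the prefix is not canonical."""
    prefix, _, local_id = curie.partition(":")
    namespace = expand.get(prefix)
    return None if namespace is None else namespace + local_id


def contract_uri(uri, contract):
    """Contract a URI to a CURIE using the longest matching namespace."""
    matches = [ns for ns in contract if uri.startswith(ns)]
    if not matches:
        return None
    namespace = max(matches, key=len)
    return contract[namespace] + ":" + uri[len(namespace):]


EXPAND, CONTRACT = build_maps(ROWS)
print(expand_curie("PMID:1234", EXPAND))
print(contract_uri("https://pubmed.ncbi.nlm.nih.gov/1234", CONTRACT))
```

With these rows, a PMID: CURIE expands to the identifiers.org URI, while a URI under one of the alias namespaces contracts to a PUBMED: CURIE. That is the ambiguity described above: the same resource ends up under two different prefixes depending on which rows are consulted.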

For GO software that uses prefixmaps to do URI/CURIE expansion/contraction, we should always instantiate prefixmaps with the "go" context, which simply reflects the db-xrefs.yaml annotations:

go,PMID,http://identifiers.org/pubmed/,canonical

I'm opening this ticket because bioregistry has a lot of uptake in our community, and we need to keep an eye out for data coming into GO with one of these alternate forms (e.g. a PUBMED: CURIE, or a URI like http://identifiers.org/pubmed/PUBMED:1234). These should fail our QC checks, but they could become more common as resources move further towards bioregistry. (An analogous example that does not impact GO: Alliance just moved all their instances of OMIM: prefixes to MIM: prefixes based on bioregistry discussions with OMIM.)
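
As a rough illustration of the kind of QC check meant here, the sketch below accepts only the canonical GO forms and reports why anything else is rejected. It is not the existing GO QC code: the function name, the reason strings, the set of flagged prefixes, and the assumption that PubMed local identifiers are purely numeric are all hypothetical.

```python
import re

# Canonical GO expansion for PubMed, taken from the "go" context row above.
GO_PUBMED_NAMESPACE = "http://identifiers.org/pubmed/"

# Assumption: PubMed local identifiers are plain ASCII digits.
LOCAL_ID = re.compile(r"[0-9]+")

# Assumption: prefixes treated as "PubMed, but not the GO spelling".
KNOWN_PUBMED_PREFIXES = {"PMID", "PUBMED", "MEDLINE"}


def check_publication_id(value):
    """Return (ok, reason) for an incoming PubMed identifier string."""
    prefix, sep, local_id = value.partition(":")
    # Case-sensitive match: only the exact "PMID" spelling is canonical.
    if sep and prefix == "PMID" and LOCAL_ID.fullmatch(local_id):
        return True, "canonical PMID CURIE"
    if value.startswith(GO_PUBMED_NAMESPACE):
        local_id = value[len(GO_PUBMED_NAMESPACE):]
        if LOCAL_ID.fullmatch(local_id):
            return True, "canonical GO URI"
        return False, "non-numeric local identifier in URI: " + local_id
    if sep and prefix.upper() in KNOWN_PUBMED_PREFIXES:
        return False, "non-canonical prefix: " + prefix
    return False, "unrecognized identifier form"


for candidate in (
    "PMID:1234",
    "PUBMED:1234",
    "http://identifiers.org/pubmed/PUBMED:1234",
):
    print(candidate, check_publication_id(candidate))
```

Both of the alternate forms mentioned above are rejected with a reason, which would make it easier to see in QC reports whether PUBMED: usage is actually increasing.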

It may be that we want to have a discussion at some point about how to ask for PubMed identifiers in our ingest files.

@kltm
Member

kltm commented Oct 14, 2024

Also, as always, noting the difference between linking (mostly what db-xrefs.yaml is concerned with) and identifiers, which are not always the same thing.

Tagging @pgaudet to make sure this is on your radar as well, but no concrete action at this point.
