A collection of metadata files used by the GOC.

In general we follow the pattern:

  • metadata source in YAML (or YAML embedded inside Markdown)
  • schema for each file also specified in YAML
  • metadata can be edited via github web interface, followed by Pull Request
  • Travis-CI checks each file against its schema (see ../.travis.yml); if the check passes, the Pull Request can be merged
  • Jenkins jobs publish metadata files (e.g. http://current.geneontology.org/metadata)

users.yaml

Content:

Each entry holds metadata about a single user. This drives a lot of behavior, such as who can do what in Noctua or TermGenie (TG), and is also used for provenance purposes. We want to track all contributions made to any GO content (ontology, annotations or models), so we need a way of uniquely identifying users across their different aliases and accounts.

Note: for historical reasons, some entries in users.yaml are actually transient groups of users. These will be migrated to groups.yaml. The main blocker is that TG reads users.yaml but not groups.yaml.

Fields:

  • nickname (REQUIRED) - typically first plus last name (not actually nickname in the usual sense)
  • uri (RECOMMENDED, UNIQUE) - A Uniform Resource Identifier or Compact URI that uniquely identifies a person.
    • Typically an ORCID http URL
    • If no ORCID available then a GOC Compact URI is used, e.g. GOC:cjm
    • Noctua - uses this field for auto-assigning dc:creator to instances
  • xref (OPTIONAL, UNIQUE) - a compact URI that uniquely identifies the person, e.g. GOC:cjm
    • this is partly historical. The ontology definition xrefs field uses these
    • TermGenie - uses this as a lookup for ontology definition xrefs
  • organization (RECOMMENDED) - the primary organization to which a person belongs
    • although a person may be involved in more than one, typically their GO role will be through one
    • this field is primarily for informational purposes
  • groups (ZERO TO MANY) - the groups a person belongs to (see below for more on groups)
    • Noctua uses this information to allow a person to attribute pav:providedBy annotations
  • accounts (DICT) - a dictionary mapping account type to username
    • Noctua uses this information for login/authentication
    • TermGenie uses this information for login/authentication
  • authorizations (DICT)
    • Noctua uses this information for authorization (determining if your account is allowed to edit)
    • TermGenie uses this information for authorization (determining if your account is allowed to edit)
  • email-md5 (DEPRECATED)
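
The field constraints above can be sketched as a small check. This is only an illustration, not the schema validation that the CI actually runs: it assumes the YAML has already been parsed into a list of dictionaries, and every name and value in `USERS` is invented.

```python
from collections import Counter

# Hypothetical entries, shaped like parsed users.yaml records.
# All values are invented for illustration.
USERS = [
    {
        "nickname": "Jane Doe",
        "uri": "http://orcid.org/0000-0000-0000-0000",
        "xref": "GOC:jd",
        "organization": "Example University",
        "groups": ["https://example.org/lab"],
        "accounts": {"github": "janedoe"},
    },
    {"nickname": "John Roe", "uri": "GOC:jr", "xref": "GOC:jr"},
    {"uri": "GOC:jr"},  # missing nickname, duplicate uri
]


def check_users(users):
    """Return a list of human-readable problems found in the entries."""
    problems = []
    # nickname is the only REQUIRED field.
    for index, user in enumerate(users):
        if not user.get("nickname"):
            problems.append(f"entry {index}: missing required field 'nickname'")
    # uri and xref are optional, but must be UNIQUE when present.
    for field in ("uri", "xref"):
        counts = Counter(u[field] for u in users if field in u)
        for value, count in counts.items():
            if count > 1:
                problems.append(f"{field} {value!r} is used by {count} entries")
    return problems


for problem in check_users(USERS):
    print(problem)
```

Collecting all problems in a list, rather than stopping at the first one, lets a contributor fix every issue in a single Pull Request.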

Tracking contributions to GO:

In the GO graphstore, we typically have triples:

<instance> dc:author <user-uri>
<instance> dc:contributor <user-uri>

These are auto-generated by Noctua.

Additionally, where provenance is added directly in the ontology, the information is stored as a dbxref "axiom annotation" on top of the association between the term URI and the definition string. See section 5.6 of the obo-syntax spec for full details.

groups.yaml

Groups encompass organizations, projects, working groups, content meetings, grants, etc. We call these "groups" because they typically consist of groups of users. Some groups may be transient (e.g. projects or working groups). Others may be permanent institutions, such as Cambridge University.

Fields:

  • id (REQUIRED, UNIQUE) - a URI uniquely identifying the group. Typically the official URL.
  • label (REQUIRED, UNIQUE) - e.g. university name, grant name. Should be unique but this is not actually tracked

TODO: each group should have a point of contact, and that POC should be in users

SOP for adding new groups

Click on groups.yaml and add a new entry. This assumes you are familiar with making pull requests via the github web interface. If you can't do that, file a ticket in this tracker.

The group must have a stable URL that directs to a page about the group. See existing entries for details.

Tracking contributions to GO using groups.yaml

In the GO graphstore, we typically have triples:

<instance> pav:providedBy <group-uri>

These are added by Noctua. Note the user must select one or more group roles (multiple roles OK).
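
As a rough sketch of how such triples relate to the metadata files, the function below builds one pav:providedBy triple per selected group. It is not Noctua's implementation. The function name, the instance identifier, and the assumption that a user's `groups` field holds group URIs are all hypothetical, as is the choice to reject groups the user does not belong to.

```python
def provided_by_triples(instance, user, selected_groups):
    """Build (subject, predicate, object) tuples for one instance.

    Assumes user["groups"] lists the URIs of the groups the user
    belongs to, and that only those groups may be attributed.
    """
    if not selected_groups:
        raise ValueError("at least one group must be selected")
    allowed = set(user.get("groups", []))
    unknown = [g for g in selected_groups if g not in allowed]
    if unknown:
        raise ValueError(f"user does not belong to: {unknown}")
    return [(instance, "pav:providedBy", g) for g in selected_groups]


# Invented example values.
user = {
    "uri": "GOC:jd",
    "groups": ["https://example.org/lab", "https://example.org/project"],
}
for triple in provided_by_triples("ex:model1", user, ["https://example.org/lab"]):
    print(" ".join(triple))
```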

db-xrefs.yaml

Registry of database prefixes

datasets

Metadata about locations and contents of GAFs and GPADs contributed to GO Central

See the datasets/ directory for more details

gorules

Enumerated rules used for QC within the GO

See the gorules/ directory for more details

gorefs.yaml

Ad-hoc references and publications referenced within GO, where no PMID or DOI is available.

Fields:

  • id (REQUIRED, UNIQUE) - A URI uniquely identifying the reference. Follows the pattern GO_REF:NNNNNNN where N is a digit. Typically the number should be the next available number (e.g. GO_REF:0000119)
  • title (REQUIRED) - The title of the reference.
  • description (REQUIRED) - A description or abstract for the reference.
  • comments (ZERO TO MANY) - Comments on the reference. These will be displayed separately from the description. Rarely used, except by some old references.
  • alt_id (ZERO TO MANY) - Alternative IDs for the reference. Must follow the same pattern as the id field.
  • authors (REQUIRED) - Authors of the reference.
  • citation (OPTIONAL) - PMID of a published citation for this reference (e.g. PMID:30272209)
  • evidence_codes (ZERO TO MANY) - Evidence codes that are used in the reference. Must be an ECO term ID (e.g. ECO:0000501)
  • external_accession (ZERO TO MANY) - Cross references to other databases for the reference. Must be of the form PFX:ID where PFX is the database prefix and ID is the accession number (e.g. SGD_REF:S000148669)
  • is_obsolete (BOOLEAN, OPTIONAL) - Whether the reference is obsolete. If true, the title should also begin with "OBSOLETE".
  • url (OPTIONAL) - a URL to get more information about the reference.
  • year (OPTIONAL, INTEGER) - The year the reference was created.
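
The rules in this field list can be sketched as a checker plus a helper that suggests the next identifier. This is a minimal illustration rather than the real schema: it assumes each reference is already parsed into a dictionary, that ECO IDs use seven digits as in the example above, and that "next available" means one more than the highest existing number. The function names are hypothetical.

```python
import re

GO_REF_ID = re.compile(r"^GO_REF:\d{7}$")   # GO_REF:NNNNNNN
ECO_ID = re.compile(r"^ECO:\d{7}$")         # assumed seven-digit form
PREFIXED_ID = re.compile(r"^[^:\s]+:\S+$")  # PFX:ID


def check_goref(ref):
    """Return a list of problems found in one parsed reference entry."""
    problems = []
    for field in ("id", "title", "description", "authors"):
        if not ref.get(field):
            problems.append(f"missing required field {field!r}")
    # alt_id values must follow the same pattern as id.
    for value in [ref.get("id")] + list(ref.get("alt_id", [])):
        if value and not GO_REF_ID.match(value):
            problems.append(f"{value!r} does not match GO_REF:NNNNNNN")
    for code in ref.get("evidence_codes", []):
        if not ECO_ID.match(code):
            problems.append(f"{code!r} is not an ECO term ID")
    for accession in ref.get("external_accession", []):
        if not PREFIXED_ID.match(accession):
            problems.append(f"{accession!r} is not of the form PFX:ID")
    if ref.get("is_obsolete") and not ref.get("title", "").startswith("OBSOLETE"):
        problems.append("obsolete reference title should begin with 'OBSOLETE'")
    return problems


def next_goref_id(existing_ids):
    """Suggest the ID that follows the highest number in existing_ids."""
    highest = max(int(i.split(":")[1]) for i in existing_ids)
    return f"GO_REF:{highest + 1:07d}"
```

For example, `next_goref_id(["GO_REF:0000118", "GO_REF:0000002"])` yields `GO_REF:0000119`. In practice, numbers already used as an alt_id would presumably need to be included in `existing_ids` as well.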

retracted-publications.txt

  • PMIDs of retracted publications. Some entries have an associated PMCID, separated from the PMID by a comma
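
A file in this shape could be read with a few lines of code. The sketch below assumes one entry per line, with the PMID first and an optional PMCID after a comma; the exact identifier formatting in the real file may differ, and the sample identifiers are invented.

```python
def parse_retracted(text):
    """Map each PMID to its PMCID, or to None when no PMCID is listed."""
    retracted = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # partition() always returns three parts, even without a comma.
        pmid, _, pmcid = (part.strip() for part in line.partition(","))
        retracted[pmid] = pmcid or None
    return retracted


# Invented sample content.
sample = "PMID:1111111\nPMID:2222222, PMC3333333\n"
print(parse_retracted(sample))
```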