Collection of metadata files used by the GOC.
In general we follow the pattern:
- metadata source in YAML (or YAML embedded inside Markdown)
- schema for each file also specified in YAML
- metadata can be edited via the GitHub web interface, followed by a pull request
- Travis-CI checks each file against its schema (see ../.travis.yml); if the check passes, the change can be merged
- Jenkins jobs publish metadata files (e.g. http://current.geneontology.org/metadata)
- users.yaml - metadata on GOC members and contributors
- users.schema.yaml - schema
Content:
Each entry is for metadata about a single user. This drives a lot of behavior such as who can do what in Noctua or TermGenie (TG), and is also used for provenance purposes. We want to track all contributions made to any GO content (ontology, annotations or models) and so we want to be sure we have a way of uniquely identifying users through their different aliases and accounts.
Note: for historical reasons, some entries in users.yaml are actually transient groups of users. These will be migrated to groups.yaml. The main blocker for this is that TG reads users.yaml but not groups.yaml.
Fields:
- nickname (REQUIRED) - typically first plus last name (not actually a nickname in the usual sense)
- uri (RECOMMENDED, UNIQUE) - a Uniform Resource Identifier or Compact URI that uniquely identifies a person
  - typically an ORCID http URL
  - if no ORCID is available, a GOC Compact URI is used, e.g. GOC:cjm
  - Noctua uses this field for auto-assigning dc:creator to instances
- xref (OPTIONAL, UNIQUE) - a Compact URI that uniquely identifies the person, e.g. GOC:cjm
  - this is partly historical; the ontology definition xrefs field uses these
  - TermGenie uses this as a lookup for ontology definition xrefs
- organization (RECOMMENDED) - the primary organization to which a person belongs
  - although a person may be involved in more than one, typically their GO role will be through one
  - this field is primarily for informational purposes
- groups (ZERO TO MANY) - the groups a person belongs to (see below for more on groups)
  - Noctua uses this information to allow a person to attribute pav:providedBy annotations
- accounts (DICT) - a dictionary mapping account type to username
  - Noctua uses this information for login/authentication
  - TermGenie uses this information for login/authentication
- authorizations (DICT)
  - Noctua uses this information for authorization (determining whether your account is allowed to edit)
  - TermGenie uses this information for authorization (determining whether your account is allowed to edit)
- email-md5 - deprecated
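The REQUIRED and UNIQUE constraints in the field list above can be illustrated with a small check. This is only a sketch, not the schema validation that Travis-CI runs: the entries are invented, the function name check_users is hypothetical, and the data is written as plain Python dictionaries shaped the way a YAML loader might return users.yaml.

```python
from collections import Counter

# Hypothetical entries (invented names and identifiers), shaped as a YAML
# loader might return them from users.yaml.
SAMPLE_USERS = [
    {"nickname": "Jane Example", "uri": "http://orcid.org/0000-0000-0000-0000", "xref": "GOC:ex1"},
    {"nickname": "John Sample", "uri": "GOC:ex2", "xref": "GOC:ex2"},
    {"uri": "GOC:ex3"},  # missing the required nickname
    {"nickname": "Jane Duplicate", "uri": "GOC:ex2"},  # reuses an existing uri
]


def check_users(users):
    """Return a list of human-readable problems found in the entries."""
    problems = []
    # nickname is the only field marked REQUIRED.
    for position, user in enumerate(users):
        if not user.get("nickname"):
            problems.append(f"entry {position}: missing required field 'nickname'")
    # uri and xref are marked UNIQUE; each is checked independently.
    for field in ("uri", "xref"):
        counts = Counter(user[field] for user in users if field in user)
        for value, count in sorted(counts.items()):
            if count > 1:
                problems.append(f"{field} '{value}' is used by {count} entries")
    return problems


PROBLEMS = check_users(SAMPLE_USERS)
for problem in PROBLEMS:
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a contributor see every issue in a single pass before opening a pull request.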
Tracking contributions to GO:
In the GO graphstore, we typically have triples:
<instance> dc:author <user-uri>
<instance> dc:contributor <user-uri>
These are auto-generated by Noctua.
Additionally, where provenance is added directly in the ontology, the information is stored as a dbxref "axiom annotation" on top of the association between the term URI and the definition string. See section 5.6 of the obo-syntax spec for full details.
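The goal of identifying a user through their different aliases and accounts can be sketched as a lookup table from each alias to the user's uri. This is an illustration under stated assumptions, not how Noctua or TermGenie resolve users: the helper names, the "type:username" key format for accounts, the sample entries and the instance IRI are all hypothetical.

```python
# Hypothetical entries; identifiers and account names are invented.
SAMPLE_ENTRIES = [
    {
        "nickname": "Jane Example",
        "uri": "http://orcid.org/0000-0000-0000-0000",
        "xref": "GOC:ex1",
        "accounts": {"github": "jexample"},
    },
    {"nickname": "Legacy Entry", "xref": "GOC:ex2"},  # no uri recorded
]


def build_alias_index(users):
    """Map each known alias (uri, xref, account) to the user's uri."""
    index = {}
    for user in users:
        uri = user.get("uri")
        if uri is None:
            continue  # uri is only RECOMMENDED, so an entry may lack one
        aliases = {uri}
        if "xref" in user:
            aliases.add(user["xref"])
        # Assumption: accounts are keyed here as "<account type>:<username>".
        for account_type, username in user.get("accounts", {}).items():
            aliases.add(f"{account_type}:{username}")
        for alias in aliases:
            index[alias] = uri
    return index


def contributor_triple(instance, alias, index):
    """Format a dc:contributor triple for whichever alias was supplied."""
    return f"<{instance}> dc:contributor <{index[alias]}>"


ALIAS_INDEX = build_alias_index(SAMPLE_ENTRIES)
TRIPLE = contributor_triple("http://example.org/model/1", "github:jexample", ALIAS_INDEX)
print(TRIPLE)
```

Entries without a uri are skipped in this sketch because there is nothing canonical to map their aliases to.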
- groups.yaml - metadata on GOC groups
- groups.schema.yaml - schema
Groups encompass organizations, projects, working groups, content meetings, grants, etc. We call these "groups" as these typically consist of groups of users. Some groups may be transient (e.g. projects or working groups). Others may be permanent institutions, such as Cambridge University.
Fields:
- id (REQUIRED, UNIQUE) - a URI uniquely identifying the group. Typically the official URL.
- label (REQUIRED, UNIQUE) - e.g. university name, grant name. Should be unique but this is not actually tracked
TODO: each group should have a point of contact, and that POC should be in users
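Because label uniqueness is not tracked, a quick consistency check can be useful before submitting changes. The sketch below is illustrative only: it assumes that the groups field of a user entry holds group id values, which this document does not state, and all data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical data; URLs, labels and names are invented.
SAMPLE_GROUPS = [
    {"id": "http://example.org/group-a", "label": "Example Group A"},
    {"id": "http://example.org/group-b", "label": "Example Group B"},
]
SAMPLE_MEMBERS = [
    {"nickname": "Jane Example", "groups": ["http://example.org/group-a"]},
    {"nickname": "John Sample", "groups": ["http://example.org/unknown"]},
    {"nickname": "No Groups"},  # groups is ZERO TO MANY, so it may be absent
]


def find_unknown_groups(users, groups):
    """List (nickname, group id) pairs whose group id is not defined."""
    known_ids = {group["id"] for group in groups}
    unknown = []
    for user in users:
        for group_id in user.get("groups", []):
            if group_id not in known_ids:
                unknown.append((user.get("nickname"), group_id))
    return unknown


def duplicate_labels(groups):
    """Return the labels that appear on more than one group, sorted."""
    counts = Counter(group["label"] for group in groups)
    return sorted(label for label, count in counts.items() if count > 1)


UNKNOWN = find_unknown_groups(SAMPLE_MEMBERS, SAMPLE_GROUPS)
print(UNKNOWN)
print(duplicate_labels(SAMPLE_GROUPS))
```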
To add a new group, click on groups.yaml and add a new entry. This assumes you are familiar with making pull requests via the GitHub web interface. If you can't do that, file a ticket in this tracker.
The group must have a stable URL that directs to a page about the group. See existing entries for details.
In the GO graphstore, we typically have triples:
<instance> pav:providedBy <group-uri>
These are added by Noctua. Note that the user must select one or more group roles.
Registry of database prefixes
- db-xrefs.yaml - prefix registry
- db-xrefs.schema.yaml - schema
Metadata about locations and contents of GAFs and GPADs contributed to GO Central
See the datasets/ directory for more details
- datasets.schema.yaml - schema
Enumerated rules used for QC within the GO
See the gorules/ directory for more details
- gorefs.yaml - metadata on GO references (GO_REFs)
- gorefs.schema.yaml - schema (uses LinkML)
Ad-hoc references and publications referenced within GO, where no PMID or DOI is available.
Fields:
- id (REQUIRED, UNIQUE) - A URI uniquely identifying the reference. Follows the pattern GO_REF:NNNNNNN where N is a digit. Typically the number should be the next available number (e.g. GO_REF:0000119)
- title (REQUIRED) - The title of the reference.
- description (REQUIRED) - A description or abstract for the reference.
- comments (ZERO TO MANY) - Comments on the reference. These will be displayed separately from the description. Rarely used except by some old references.
- alt_id (ZERO TO MANY) - Alternative IDs for the reference. Must follow the same pattern as the id field.
- authors (REQUIRED) - Authors of the reference.
- citation (OPTIONAL) - PMID of a published citation for this reference (e.g. PMID:30272209)
- evidence_codes (ZERO TO MANY) - Evidence codes that are used in the reference. Must be an ECO term ID (e.g. ECO:0000501)
- external_accession (ZERO TO MANY) - Cross references to other databases for the reference. Must be of the form PFX:ID where PFX is the database prefix and ID is the accession number (e.g. SGD_REF:S000148669)
- is_obsolete (BOOLEAN, OPTIONAL) - Whether the reference is obsolete. If true, the title should also begin with "OBSOLETE".
- url (OPTIONAL) - a URL to get more information about the reference.
- year (OPTIONAL, INTEGER) - The year the reference was created.
- PMIDs from retracted publications. Some entries have an associated PMCID, delimited by a comma.
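The identifier patterns given for gorefs.yaml (GO_REF:NNNNNNN for id and alt_id, PFX:ID for external_accession) can be expressed as regular expressions. This is a sketch rather than the LinkML schema itself: the function names are hypothetical, and the characters allowed in a prefix or accession are an assumption, since the text only gives the overall PFX:ID shape.

```python
import re

# Seven digits after the prefix, as in GO_REF:0000119.
GO_REF_PATTERN = re.compile(r"GO_REF:\d{7}")
# Assumption: a prefix starts with a letter and the accession has no spaces.
XREF_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*:\S+")


def is_valid_go_ref(identifier):
    """True if the identifier matches GO_REF:NNNNNNN exactly."""
    return GO_REF_PATTERN.fullmatch(identifier) is not None


def is_valid_xref(identifier):
    """True if the identifier has the PFX:ID shape."""
    return XREF_PATTERN.fullmatch(identifier) is not None


def next_go_ref(existing_ids):
    """Suggest the next available id from the ids already in use."""
    numbers = [int(i.split(":")[1]) for i in existing_ids if is_valid_go_ref(i)]
    next_number = max(numbers, default=0) + 1
    return f"GO_REF:{next_number:07d}"


print(is_valid_go_ref("GO_REF:0000119"))
print(is_valid_xref("SGD_REF:S000148669"))
print(next_go_ref(["GO_REF:0000118", "GO_REF:0000119"]))
```

When choosing a new number in practice, the list passed to next_go_ref would presumably need to include alt_id values as well, so that a retired identifier is not reused.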