ISA RO-Crate Profile

Overview

A significant part of the previous work on this RO-Crate profile for ISA was produced as part of the Annotated Research Context (ARC) project, through arc-to-rocrate.

During the ELIXIR Biohackathon 2023, as part of Project 14: Enabling continuous RDM using Annotated Research Contexts with RO-Crate profiles for ISA, the profile was further fine-tuned and defined, and the remaining unresolved mappings were resolved.

The aim of the profile is to fully represent ISA-JSON as RO-Crate, capturing the metadata and files in a lossless form, so that it is possible to convert from one to the other, in either direction, without loss of information.

The ISA RO-Crate profile has led to a few changes to Bioschemas types:

LabProtocol - Has been redefined as a child of HowTo, to make clear that it specifically describes the planned instructions for a lab process.

LabProcess - A new type, defined as a child of Action, that specifically describes the details and outcomes of an executed LabProtocol. This separates "what was planned" (LabProtocol) from "what happened" (LabProcess). A working group is developing the new type and the adaptations of existing types.

An important change to the Bioschemas specification that is still pending is the following:

Dataset - A new property processSequence to describe how the Dataset was created.

The following graph summarizes the ISA model in terms of Bioschemas/Schema.org vocabulary:

```mermaid
flowchart TD

    dataset[Investigation/Study/Assay=Dataset]
    Process[LabProcess]
    Protocol[Protocol=LabProtocol]
    BioSample[Source/Sample/Material=Sample]
    DataFile[Data=File]
    ont[OntologyAnnotation=DefinedTerm]
    prop[ParameterValue=PropertyValue]

    dataset --hasPart--> dataset
    dataset --hasPart----> DataFile
    dataset --processSequence--> Process

    Process --"result"---> DataFile
    Process --"result"--> BioSample
    Process --"object"--> BioSample
    Process --executesLabProtocol--> Protocol
    Process --parameterValue---> prop

    BioSample --derivesFrom--> BioSample
    BioSample --additionalProperty--> prop

    Protocol --purpose---> ont
    Protocol --labEquipment---> ont
    Protocol --reagent---> ont
```
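Because RO-Crate serialises this graph as flattened JSON-LD, every node above becomes its own entry in `@graph` and every edge becomes an `{"@id": ...}` reference. The minimal Python sketch below, using hypothetical entity ids and a single study, indexes the graph by `@id` and follows the diagram's `processSequence` and `result` edges to list what a study's processes produced. Schema.org types are written in their compact RO-Crate form ("Dataset"); the Bioschemas types are spelled as in the tables below, which is an assumption about how a given crate's `@context` names them.

```python
from typing import Any

# A tiny flattened @graph shaped like the diagram above; every @id is hypothetical.
graph: list[dict[str, Any]] = [
    {"@id": "./", "@type": "Dataset", "hasPart": [{"@id": "studies/s1/"}]},
    {"@id": "studies/s1/", "@type": "Dataset",
     "processSequence": [{"@id": "#process/extraction"}]},
    {"@id": "#process/extraction", "@type": "bioschemas.org/LabProcess",
     "object": [{"@id": "#sample/leaf"}], "result": [{"@id": "#sample/extract"}]},
    {"@id": "#sample/leaf", "@type": "bioschemas.org/Sample", "name": "leaf"},
    {"@id": "#sample/extract", "@type": "bioschemas.org/Sample", "name": "extract",
     "derivesFrom": [{"@id": "#sample/leaf"}]},
]

# Index entities by @id so that {"@id": ...} references can be resolved.
by_id = {entity["@id"]: entity for entity in graph}


def follow(entity: dict[str, Any], prop: str) -> list[dict[str, Any]]:
    """Resolve the entity references stored under `prop` (single value or list)."""
    value = entity.get(prop, [])
    refs = value if isinstance(value, list) else [value]
    return [by_id[ref["@id"]] for ref in refs]


def process_outputs(dataset_id: str) -> list[str]:
    """Collect the @ids of everything produced by a dataset's processSequence."""
    outputs: list[str] = []
    for process in follow(by_id[dataset_id], "processSequence"):
        outputs.extend(result["@id"] for result in follow(process, "result"))
    return outputs


print(process_outputs("studies/s1/"))  # ['#sample/extract']
```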


Requirements

Investigation

Is based upon schema.org/Dataset and maps to the ISA-JSON Investigation

| Property | Required | Expected Type | Description |
| --- | --- | --- | --- |
| @id | MUST | Text or URL | Should be "./", since the Investigation object represents the root data entity. |
| @type | MUST | Text | Must be 'schema.org/Dataset'. |
| additionalType | MUST | Text or URL | 'Investigation', or an ontology term identifying it as an Investigation. |
| identifier | MUST | Text or URL | Identifying descriptor of the investigation (e.g. repository name). |
| creator | SHOULD | schema.org/Person | The creator(s)/author(s)/owner(s)/PI(s) of the investigation. |
| dateCreated | SHOULD | DateTime | When the Investigation was created. |
| datePublished | SHOULD | DateTime | When the Investigation was published. |
| description | SHOULD | Text | A description of the investigation (e.g. an abstract). |
| hasPart | SHOULD | schema.org/Dataset (Study) | The datasets representing the studies of the investigation. These must follow the Study profile. |
| headline | SHOULD | Text | A title of the investigation (e.g. a paper title). |
| citation | COULD | schema.org/ScholarlyArticle | Publications corresponding to this investigation. |
| comment | COULD | schema.org/Comment | Comment. |
| dateModified | COULD | DateTime | When the Investigation was last modified. |
| mentions | COULD | schema.org/DefinedTermSet | Ontologies referenced in this investigation. |
| url | COULD | URL | The filename or path of the metadata file describing the investigation. Optional, since in some contexts, such as an ARC, the filename is implicit. |
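A minimal Python sketch of checking the MUST rows above, assuming the Investigation entity has already been parsed from ro-crate-metadata.json into a dict. It only checks that the MUST properties are present and the `@id`/`@type` conventions from the table; it is not a complete profile validator, and the identifier value is hypothetical. "Dataset" is used as the compact form of schema.org/Dataset under the RO-Crate JSON-LD context.

```python
from typing import Any

# MUST properties of the Investigation entity, copied from the table above.
INVESTIGATION_MUST = ("@id", "@type", "additionalType", "identifier")


def check_investigation(entity: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = [f"MUST: missing {prop}" for prop in INVESTIGATION_MUST if prop not in entity]
    types = entity.get("@type", [])
    if "Dataset" not in (types if isinstance(types, list) else [types]):
        problems.append("MUST: @type must include Dataset")
    if entity.get("@id") != "./":
        problems.append('SHOULD: @id should be "./" (the root data entity)')
    return problems


root = {
    "@id": "./",
    "@type": "Dataset",
    "additionalType": "Investigation",
    "identifier": "my-investigation",  # hypothetical identifier
    "headline": "An example investigation",
}
print(check_investigation(root))  # []
```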

Study

Is based upon schema.org/Dataset and maps to the ISA-JSON Study

| Property | Required | Expected Type | Description |
| --- | --- | --- | --- |
| @id | MUST | Text or URL | Should be a subdirectory corresponding to this study. |
| @type | MUST | Text | Must be 'schema.org/Dataset'. |
| additionalType | MUST | Text or URL | 'Study', or an ontology term identifying it as a Study. |
| identifier | MUST | Text or URL | Identifying descriptor of the study. |
| about | SHOULD | bioschemas.org/LabProcess | The experimental processes performed in this study. |
| creator | SHOULD | schema.org/Person | The performer of the study. |
| dateCreated | SHOULD | DateTime | When the Study was created. |
| datePublished | SHOULD | DateTime | When the Study was published. |
| description | SHOULD | Text | A short description of the study (e.g. an abstract). |
| hasPart | SHOULD | schema.org/Dataset (Assay) or File | Assays contained in this study, or data files resulting from the process sequence. |
| headline | SHOULD | Text | A title of the study. |
| citation | COULD | schema.org/ScholarlyArticle | A publication corresponding to the study. |
| comment | COULD | schema.org/Comment | Comment. |
| dateModified | COULD | DateTime | When the Study was last modified. |
| url | COULD | URL | The filename or path of the metadata file describing the study. Optional, since in some contexts, such as an ARC, the filename is implicit. |
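Since each Study is a subdirectory linked from the Investigation through `hasPart`, adding a study amounts to creating a new Dataset entity and appending a reference to it. This Python sketch assumes a hypothetical `studies/<identifier>/` directory layout; the profile itself only says the `@id` should be a subdirectory.

```python
from typing import Any


def add_study(root: dict[str, Any], graph: list[dict[str, Any]], identifier: str) -> dict[str, Any]:
    """Create a Study dataset in its own subdirectory and link it from the Investigation."""
    study: dict[str, Any] = {
        # RO-Crate directory entities conventionally end with "/"; the "studies/"
        # prefix is a hypothetical layout chosen for this example.
        "@id": f"studies/{identifier}/",
        "@type": "Dataset",
        "additionalType": "Study",
        "identifier": identifier,
    }
    graph.append(study)
    # The Investigation points at the Study through hasPart, by reference only.
    root.setdefault("hasPart", []).append({"@id": study["@id"]})
    return study


root: dict[str, Any] = {"@id": "./", "@type": "Dataset",
                        "additionalType": "Investigation", "identifier": "inv"}
graph = [root]
study = add_study(root, graph, "s1")
print(root["hasPart"])  # [{'@id': 'studies/s1/'}]
```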

Assay

Is based upon schema.org/Dataset and maps to the ISA-JSON Assay

| Property | Required | Expected Type | Description |
| --- | --- | --- | --- |
| @id | MUST | Text or URL | Should be a subdirectory corresponding to this assay. |
| @type | MUST | Text | Must be 'schema.org/Dataset'. |
| additionalType | MUST | Text or URL | 'Assay', or an ontology term identifying it as an Assay. |
| identifier | MUST | Text or URL | Identifying descriptor of the assay. |
| about | SHOULD | bioschemas.org/LabProcess | The experimental processes performed in this assay. |
| creator | SHOULD | schema.org/Person | The performer of the experiments. |
| hasPart | SHOULD | File | The data files resulting from the process sequence. |
| measurementMethod | SHOULD | URL or schema.org/DefinedTerm | The type of measurement as an ontology term, e.g. complexomics or transcriptomics. |
| measurementTechnique | SHOULD | URL or schema.org/DefinedTerm | The type of technology used to take the measurement, e.g. mass spectrometry or deep sequencing. |
| comment | COULD | schema.org/Comment | Comment. |
| url | COULD | URL | The filename or path of the metadata file describing the assay. Optional, since in some contexts, such as an ARC, the filename is implicit. |
| variableMeasured | COULD | Text or schema.org/PropertyValue | The target variable being measured, e.g. protein concentration. |
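Ontology-valued properties such as `measurementMethod` and `measurementTechnique` can point to separate DefinedTerm entities. The Python sketch below builds an Assay that way; the term IRIs and the `assays/a1/` path are hypothetical placeholders, not terms prescribed by this profile.

```python
from typing import Any


def defined_term(name: str, term_iri: str) -> dict[str, Any]:
    """Build a DefinedTerm entity whose @id is the ontology term IRI."""
    return {"@id": term_iri, "@type": "DefinedTerm", "name": name}


# Hypothetical IRIs: substitute the ontology terms used in your own ISA metadata.
method = defined_term("transcriptomics", "https://example.org/terms/transcriptomics")
technique = defined_term("deep sequencing", "https://example.org/terms/deep-sequencing")

assay: dict[str, Any] = {
    "@id": "assays/a1/",  # hypothetical subdirectory for this assay
    "@type": "Dataset",
    "additionalType": "Assay",
    "identifier": "a1",
    # Both properties reference the DefinedTerm entities by @id.
    "measurementMethod": {"@id": method["@id"]},
    "measurementTechnique": {"@id": technique["@id"]},
}
graph = [assay, method, technique]
```

Referencing the terms by `@id` rather than inlining them lets several assays share a single DefinedTerm entity.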

LabProcess

Is based on the new Bioschemas DRAFT type bioschemas.org/LabProcess and maps to the ISA-JSON Process

| Property | Required | Expected Type | Description |
| --- | --- | --- | --- |
| @type | MUST | Text | Must be 'bioschemas.org/LabProcess'. |
| @id | MUST | Text or URL | Could identify the process using the ISA metadata filename and the protocol reference or process name. |
| name | MUST | Text | |
| agent | MUST | schema.org/Person | The performer. |
| object | MUST | bioschemas.org/Sample or File | The input. |
| result | MUST | bioschemas.org/Sample or File | The output. |
| executesLabProtocol | SHOULD | bioschemas.org/LabProtocol | The protocol executed. |
| parameterValue | SHOULD | schema.org/PropertyValue | A parameter value of the experimental process, usually a key-value pair using ontology terms. |
| endTime | SHOULD | DateTime | |
| disambiguatingDescription | COULD | Text | Comments. |
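The LabProcess rows line up with fields of an ISA-JSON Process, so a conversion is mostly a renaming. This Python sketch assumes the ISA-JSON field names `performer`, `date`, `inputs`, `outputs` and `executesProtocol` from the ISA-JSON process schema (check them against the schema version you use), maps `date` to `endTime` as an assumption, and uses a hypothetical `#person-...` id scheme for the performer. Parameter values are left out here; they would become PropertyValue entities.

```python
from typing import Any
from urllib.parse import quote


def isa_process_to_labprocess(process: dict[str, Any]) -> dict[str, Any]:
    """Map an ISA-JSON Process (already parsed into a dict) onto a LabProcess entity."""
    performer = process.get("performer") or "unknown"
    lab_process: dict[str, Any] = {
        "@id": process["@id"],
        "@type": "bioschemas.org/LabProcess",  # spelled as in the table above
        "name": process.get("name", ""),
        # ISA-JSON stores the performer as a plain string, so wrap it in a Person.
        # Inlined for brevity; a flattened crate would list it as its own entity.
        "agent": {"@id": "#person-" + quote(performer), "@type": "Person",
                  "givenName": performer},
        "object": [{"@id": item["@id"]} for item in process.get("inputs", [])],
        "result": [{"@id": item["@id"]} for item in process.get("outputs", [])],
    }
    if "executesProtocol" in process:
        lab_process["executesLabProtocol"] = {"@id": process["executesProtocol"]["@id"]}
    if process.get("date"):
        lab_process["endTime"] = process["date"]  # assumed mapping of the ISA date
    return lab_process


isa_process = {
    "@id": "#process/extraction",
    "name": "extraction",
    "executesProtocol": {"@id": "#protocol/extraction"},
    "performer": "Jane Doe",
    "date": "2023-11-07",
    "inputs": [{"@id": "#source/leaf"}],
    "outputs": [{"@id": "#sample/extract"}],
}
lab_process = isa_process_to_labprocess(isa_process)
```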

LabProtocol

Is based on the Bioschemas bioschemas.org/LabProtocol type and maps to the ISA-JSON Protocol

| Property | Required | Expected Type | Description |
| --- | --- | --- | --- |
| @id | MUST | Text or URL | Could be the URL pointing to the protocol resource. |
| @type | MUST | Text | Must be 'bioschemas.org/LabProtocol'. |
| description | SHOULD | Text | A short description of the protocol (e.g. an abstract). |
| intendedUse | SHOULD | schema.org/DefinedTerm or Text or URL | The protocol type, as an ontology term. |
| name | SHOULD | Text | Main title of the LabProtocol. |
| comment | COULD | schema.org/Comment | Comment. |
| computationalTool | COULD | schema.org/DefinedTerm or schema.org/PropertyValue or schema.org/SoftwareApplication | Software or tool used to complete part of the lab protocol. |
| labEquipment | COULD | schema.org/DefinedTerm or schema.org/PropertyValue or Text or URL | Laboratory equipment used by a person to follow one or more steps described in this LabProtocol. |
| reagent | COULD | schema.org/BioChemEntity or schema.org/DefinedTerm or schema.org/PropertyValue or Text or URL | Reagents used in the protocol. |
| url | COULD | URL | Pointer to protocol resources external to the ISA-Tab that can be accessed by their Uniform Resource Identifier (URI). |
| version | COULD | Number or Text | An identifier for the version, to ensure protocol tracking. |
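Likewise, an ISA-JSON Protocol can be mapped onto a LabProtocol, with its protocol type (an ISA OntologyAnnotation) becoming a DefinedTerm under `intendedUse`. In this Python sketch the ISA-JSON keys (`protocolType`, `uri`, `version`, `annotationValue`, `termAccession`) are assumed from the ISA-JSON schemas, and the IRIs and URL are hypothetical.

```python
from typing import Any


def ontology_annotation_to_term(annotation: dict[str, Any]) -> dict[str, Any]:
    """Turn an ISA-JSON OntologyAnnotation into a DefinedTerm dict."""
    term: dict[str, Any] = {"@type": "DefinedTerm", "name": annotation.get("annotationValue", "")}
    if annotation.get("termAccession"):
        # Use the accession IRI as the term's @id when one is available.
        term["@id"] = annotation["termAccession"]
    return term


def isa_protocol_to_labprotocol(protocol: dict[str, Any]) -> dict[str, Any]:
    """Map an ISA-JSON Protocol dict onto a LabProtocol entity."""
    lab_protocol: dict[str, Any] = {
        "@id": protocol["@id"],
        "@type": "bioschemas.org/LabProtocol",  # spelled as in the table above
        "name": protocol.get("name", ""),
    }
    if protocol.get("protocolType"):
        lab_protocol["intendedUse"] = ontology_annotation_to_term(protocol["protocolType"])
    # Plain fields that carry over with, at most, a rename.
    for isa_key, crate_key in (("description", "description"), ("uri", "url"), ("version", "version")):
        if protocol.get(isa_key):
            lab_protocol[crate_key] = protocol[isa_key]
    return lab_protocol


isa_protocol = {
    "@id": "#protocol/extraction",
    "name": "RNA extraction",
    "protocolType": {"annotationValue": "extraction",
                     "termAccession": "https://example.org/terms/extraction"},  # hypothetical IRI
    "uri": "https://example.org/protocols/rna-extraction",  # hypothetical URL
}
lab_protocol = isa_protocol_to_labprotocol(isa_protocol)
```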

Sample

Is based on the Bioschemas bioschemas.org/Sample type, and represents the ISA-JSON Sample, Source and Material

| Property | Required | Expected Type | Description |
| --- | --- | --- | --- |
| @id | MUST | Text or URL | Could be the unique sample name. |
| @type | MUST | Text | Must be 'bioschemas.org/Sample'. |
| name | MUST | Text | A name identifying the sample. |
| additionalProperty | SHOULD | schema.org/PropertyValue | Characteristics or factors. |
| derivesFrom | COULD | bioschemas.org/Sample | A source from which the sample is derived through processes. |
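Because sources, samples and materials all share the Sample type, provenance is recovered by following `derivesFrom` links back to the original source. A minimal Python sketch, with hypothetical sample ids, that walks the chain depth-first and skips already-visited ids so that a malformed cycle cannot loop forever:

```python
from typing import Any

# Hypothetical Source -> Sample -> Sample chain, indexed by @id.
samples: dict[str, dict[str, Any]] = {
    "#source/plant": {"@id": "#source/plant", "@type": "bioschemas.org/Sample", "name": "plant"},
    "#sample/leaf": {"@id": "#sample/leaf", "@type": "bioschemas.org/Sample", "name": "leaf",
                     "derivesFrom": [{"@id": "#source/plant"}]},
    "#sample/extract": {"@id": "#sample/extract", "@type": "bioschemas.org/Sample",
                        "name": "extract", "derivesFrom": [{"@id": "#sample/leaf"}]},
}


def lineage(sample_id: str, index: dict[str, dict[str, Any]]) -> list[str]:
    """Follow derivesFrom links depth-first, returning each @id once (cycle-safe)."""
    visited: list[str] = []
    stack = [sample_id]
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.append(current)
        for parent in index[current].get("derivesFrom", []):
            stack.append(parent["@id"])
    return visited


print(lineage("#sample/extract", samples))  # ['#sample/extract', '#sample/leaf', '#source/plant']
```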

Data

Describes and points to a Data file, and maps to the ISA-JSON Data

| Property | Required | Expected Type | Description |
| --- | --- | --- | --- |
| @id | MUST | Text or URL | Should be the path pointing to the file. |
| @type | MUST | Text | Must be 'File' or 'MediaObject'. |
| name | MUST | Text or URL | The name of the file. |
| comment | COULD | schema.org/Comment | Comment. |
| disambiguatingDescription | COULD | Text | The type of the data file ("Raw Data File", "Derived Data File" or "Image File"). |
| encodingFormat | COULD | Text or URL | Media format as a MIME type. |
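A File entity can often be described from its path alone. This Python sketch uses the standard-library `mimetypes` module to guess `encodingFormat`; the default data type label and the example path are assumptions for illustration.

```python
import mimetypes
from pathlib import PurePosixPath
from typing import Any


def data_file(path: str, data_type: str = "Raw Data File") -> dict[str, Any]:
    """Describe a data file as a File entity, guessing encodingFormat from its extension."""
    entity: dict[str, Any] = {
        "@id": path,  # the path of the file relative to the crate root
        "@type": "File",
        "name": PurePosixPath(path).name,
        "disambiguatingDescription": data_type,
    }
    mime_type, _encoding = mimetypes.guess_type(path)
    if mime_type is not None:
        entity["encodingFormat"] = mime_type
    return entity


print(data_file("assays/a1/dataset/results.json"))
```

For compressed files such as `reads.fastq.gz`, `guess_type` reports the compression (`gzip`) separately and may return no media type at all, so `encodingFormat` may need to be set by hand.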

PropertyValue

It is based on schema.org/PropertyValue and maps to the ISA-JSON Process Parameter Value

| Property | Required | Expected Type | Description |
| --- | --- | --- | --- |
| @id | MUST | Text or URL | |
| @type | MUST | Text | Must be 'schema.org/PropertyValue'. |
| name | MUST | Text | Key name. |
| value | MUST | Text | The value, as text or a number. |
| propertyID | SHOULD | URL | Key ontology reference. |
| additionalType | COULD | Text | Can be used to describe whether the value is a factor, characteristic or parameter. |
| unitCode | COULD | URL | Unit ontology reference. |
| unitText | COULD | Text | Unit name. |
| valueReference | COULD | URL | Value ontology reference. |
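The PropertyValue rows describe a key/value pair in which the key and, optionally, the unit and the value carry ontology references. A minimal Python builder that only emits the optional properties that are set; the `@id` scheme, the term IRIs and the "ParameterValue" label for `additionalType` are hypothetical, since the profile does not fix these values.

```python
from typing import Any, Optional, Union
from urllib.parse import quote


def property_value(
    name: str,
    value: Union[str, int, float],
    property_id: Optional[str] = None,
    unit_text: Optional[str] = None,
    unit_code: Optional[str] = None,
    value_reference: Optional[str] = None,
    additional_type: Optional[str] = None,
) -> dict[str, Any]:
    """Build a PropertyValue key/value pair, omitting optional fields that are unset."""
    entity: dict[str, Any] = {
        # Hypothetical @id scheme; any identifier unique within the crate would do.
        "@id": "#pv/" + quote(f"{name}={value}{unit_text or ''}"),
        "@type": "PropertyValue",
        "name": name,
        "value": value,
    }
    optional = {
        "propertyID": property_id,
        "unitText": unit_text,
        "unitCode": unit_code,
        "valueReference": value_reference,
        "additionalType": additional_type,
    }
    entity.update({key: val for key, val in optional.items() if val is not None})
    return entity


temperature = property_value(
    "temperature", 25,
    property_id="https://example.org/terms/temperature",  # hypothetical key term
    unit_text="degree Celsius",
    unit_code="https://example.org/units/degree-celsius",  # hypothetical unit term
    additional_type="ParameterValue",  # hypothetical label for "parameter"
)
```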

Person

It is based on schema.org/Person, and maps to the ISA-JSON Person

| Property | Required | Expected Type | Description |
| --- | --- | --- | --- |
| @id | MUST | Text or URL | |
| @type | MUST | Text | Must be 'schema.org/Person'. |
| givenName | MUST | Text | Given name of a person. Can be used for any type of name. |
| affiliation | SHOULD | schema.org/Organization | |
| email | SHOULD | Text | |
| familyName | SHOULD | Text | Family name of a person. |
| identifier | SHOULD | Text or URL or schema.org/PropertyValue | One or more identifiers for this person, e.g. an ORCID. Can be of type PropertyValue to indicate the kind of reference. |
| jobTitle | SHOULD | schema.org/DefinedTerm | |
| additionalName | COULD | Text | |
| address | COULD | PostalAddress or Text | |
| disambiguatingDescription | COULD | Text | |
| faxNumber | COULD | Text | |
| telephone | COULD | Text | |
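Most Person properties correspond directly to ISA-JSON Person fields. The Python sketch below assumes the ISA-JSON keys `firstName`, `lastName`, `midInitials`, `email`, `phone`, `fax`, `address`, `affiliation` and `roles`; the person id is a hypothetical placeholder, and only the first role is mapped to `jobTitle`.

```python
from typing import Any

# ISA-JSON Person fields that map one-to-one onto schema.org Person properties.
FIELD_MAP = {
    "firstName": "givenName",
    "lastName": "familyName",
    "midInitials": "additionalName",
    "email": "email",
    "phone": "telephone",
    "fax": "faxNumber",
    "address": "address",
}


def isa_person_to_person(person: dict[str, Any], person_id: str) -> dict[str, Any]:
    """Map an ISA-JSON Person dict onto a schema.org Person entity."""
    entity: dict[str, Any] = {"@id": person_id, "@type": "Person"}
    for isa_key, crate_key in FIELD_MAP.items():
        if person.get(isa_key):
            entity[crate_key] = person[isa_key]
    if person.get("affiliation"):
        # Inlined for brevity; a flattened crate would give the Organization its own @id.
        entity["affiliation"] = {"@type": "Organization", "name": person["affiliation"]}
    roles = person.get("roles") or []
    if roles:
        # Only the first role is kept; it becomes a DefinedTerm job title.
        entity["jobTitle"] = {"@type": "DefinedTerm", "name": roles[0].get("annotationValue", "")}
    return entity


jane = isa_person_to_person(
    {"firstName": "Jane", "lastName": "Doe", "email": "jane.doe@example.org",
     "affiliation": "Example University",
     "roles": [{"annotationValue": "principal investigator"}]},
    person_id="#person-jane-doe",  # hypothetical; an ORCID IRI is a common choice
)
```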

ScholarlyArticle

It is based on schema.org/ScholarlyArticle and maps to the ISA-JSON Publication

| Property | Required | Expected Type | Description |
| --- | --- | --- | --- |
| @id | MUST | Text or URL | |
| @type | MUST | Text | Must be 'schema.org/ScholarlyArticle'. |
| headline | MUST | Text | |
| identifier | MUST | Text or URL or schema.org/PropertyValue | One or more identifiers for this article, such as a DOI or PubMed ID. Can be of type PropertyValue to indicate the kind of reference. |
| author | SHOULD | schema.org/Person | |
| url | SHOULD | URL | |
| creativeWorkStatus | COULD | schema.org/DefinedTerm | The status of the publication in terms of its stage in a lifecycle. |
| disambiguatingDescription | COULD | Text | |
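An ISA-JSON Publication can be mapped onto a ScholarlyArticle in a similar way, with its DOI and PubMed ID wrapped as PropertyValue identifiers so the kind of reference is explicit. In this Python sketch the ISA-JSON keys (`title`, `doi`, `pubMedID`, `status`), the `propertyID` labels and the DOI are assumptions or placeholders. If neither a DOI nor a PubMed ID is present, the MUST `identifier` rule is not met; the sketch leaves that check to the caller.

```python
from typing import Any
from urllib.parse import quote


def isa_publication_to_article(pub: dict[str, Any]) -> dict[str, Any]:
    """Map an ISA-JSON Publication dict onto a ScholarlyArticle entity."""
    identifiers = []
    # Hypothetical propertyID labels; each identifier is a PropertyValue so that
    # its kind (DOI or PubMed ID) is explicit.
    for isa_key, kind in (("doi", "doi"), ("pubMedID", "pubmed")):
        if pub.get(isa_key):
            identifiers.append({
                "@id": f"#{kind}-" + quote(pub[isa_key], safe=""),
                "@type": "PropertyValue",
                "propertyID": kind,
                "value": pub[isa_key],
            })
    article: dict[str, Any] = {
        # Prefer the DOI resolver IRI as @id when a DOI is known.
        "@id": f"https://doi.org/{pub['doi']}" if pub.get("doi") else "#" + quote(pub["title"], safe=""),
        "@type": "ScholarlyArticle",
        "headline": pub["title"],
        "identifier": identifiers,
    }
    status = pub.get("status")
    if status:
        article["creativeWorkStatus"] = {"@type": "DefinedTerm",
                                         "name": status.get("annotationValue", "")}
    return article


article = isa_publication_to_article({
    "title": "An example article",
    "doi": "10.1234/example",  # placeholder DOI, not a real publication
    "status": {"annotationValue": "published"},
})
```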

Example ro-crate-metadata.json

TODO: simple example and a link to a more complete example
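Until that example is written, the Python sketch below shows the general shape of the file: the RO-Crate 1.1 metadata descriptor, a root Investigation and one Study, all with hypothetical identifiers. It also checks that every local `{"@id": ...}` reference points at an entity defined in `@graph`, treating absolute URLs as external.

```python
import json
from typing import Any

# Minimal crate: the RO-Crate 1.1 metadata descriptor plus a hypothetical
# Investigation (root data entity) containing one Study.
crate: dict[str, Any] = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {"@id": "./", "@type": "Dataset", "additionalType": "Investigation",
         "identifier": "my-investigation", "hasPart": [{"@id": "studies/s1/"}]},
        {"@id": "studies/s1/", "@type": "Dataset", "additionalType": "Study",
         "identifier": "s1"},
    ],
}


def dangling_refs(crate: dict[str, Any]) -> set[str]:
    """Return local {"@id": ...} references that no @graph entity defines."""
    known = {entity["@id"] for entity in crate["@graph"]}
    missing: set[str] = set()

    def walk(value: Any) -> None:
        if isinstance(value, dict):
            ref = value.get("@id")
            # A dict holding only "@id" is a reference; absolute URLs are external.
            if set(value) == {"@id"} and ref not in known and not ref.startswith(("http://", "https://")):
                missing.add(ref)
            for item in value.values():
                walk(item)
        elif isinstance(value, list):
            for item in value:
                walk(item)

    for entity in crate["@graph"]:
        for key, value in entity.items():
            if key != "@id":
                walk(value)
    return missing


print(dangling_refs(crate))  # set()
text = json.dumps(crate, indent=2)  # contents for ro-crate-metadata.json
```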