Create your own ARC

Date: 2024-04-02

💡 This is just to give you an idea of where to start when creating a new ARC.

Sketch your ARC

Try to sketch a little map of what your project could look like as an ARC.

Create a new ARC

  • Create a new ARC on your computer using ARCitect or ARC Commander
  • Upload the ARC to the DataHUB
  • Save intermediate steps and commit them with a message

Add a README.md

  • The README.md is the first thing everyone sees when opening your ARC in the DataHUB
  • This is an easy-to-use place for notes or info about your ARC
  • You could add your sketch / map here

Add a license to the ARC

Adding a LICENSE file is easy via the DataHUB:

  1. Go to the ARC
  2. Click the button "Add LICENSE"
  3. Apply a license template
  4. Commit the changes

💡 There is no template for CC-BY 4.0. You can add the license text from https://choosealicense.com/licenses/cc-by-4.0/.

Add metadata to ISA Investigation

  • Add a Title: e.g. the project or publication title
  • Add a Description: e.g. the project description or publication abstract
  • Add Contacts: e.g. collaborators or publication authors
    1. Add First Name, Last Name, Affiliation
    2. Add ORCID
  • Add Publication(s)
    1. DOI, Title, Authors, Status = Published
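Before entering ORCID iDs for contacts, it can help to catch typos early. The following is a minimal sketch, not part of any ARC tool: the function names `orcid_check_digit` and `is_valid_orcid` are made up for illustration. It relies on the check digit that ORCID iDs carry as their last character (ISO 7064 MOD 11-2).

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters (hyphens ignored) and a matching check digit."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the iD ORCID uses for its fictitious example researcher.
    print(is_valid_orcid("0000-0002-1825-0097"))
```

This only checks the format and checksum; it does not confirm that the iD belongs to the person you are adding.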

Add a study

  • if the study is internal (i.e. a dataset from this project or publication)

    • Think of a suitable study identifier and title
    • Add contacts (e.g. transfer contacts from investigation-level to study)
    • Description: write a short summary / bullet points
  • if the study is external (i.e. a dataset from another project or publication)

    • Add Title: publication title
    • Add Description: publication abstract
    • Add Public Release Date: publication online date
    • Add People: authors in same order as on publication
    • Add Publication(s): DOI, Title, Authors, Status = Published

Add an assay

  • add Measurement Type, Technology Type, Technology Platform for every assay (ideally backed by ontology terms)
  • if applicable, add people (assay performers)

Add assay data

  • add the measurement files / raw data to dataset

💡 If your data is very large, take only a subset or use dummy files during the hands-on session.

Annotate ISA studies and assays

Add samples to your ARC and try to describe the sample-to-data flow using ISA

  • the ./assays/<assayName>/isa.assay.xlsx should reference the protocols stored in ./assays/<assayName>/protocols

    • Add a sheet with the same name as the file for each protocol file; e.g. ./assays/<assayName>/protocols/plant-growth.md --> ./assays/<assayName>/isa.assay.xlsx:plant-growth
    • Link the protocol file name (e.g. plant-growth.md) in the respective Protocol REF column
  • all files stored in a folder ./assays/<assayName>/dataset should be linked in an Output building block of the ./assays/<assayName>/isa.assay.xlsx

    • Use Output [Raw Data File] to link raw data generated by a machine, measuring device, etc.
    • Use Output [Derived Data File] to link data produced by a computational workflow, script, software
    • 💡 Note that an assay can produce
      • samples from samples: e.g. Input [Sample Name] leaf samples -> Output [Sample Name] RNA extract samples
      • data from samples: e.g. Input [Sample Name] cDNA libraries -> Output [Raw Data File] qRT-PCR results
      • data from data: e.g. Input [Raw Data File] qRT-PCR results -> Output [Derived Data File] Plot of relative gene expression
  • use Sample/Material/Data nodes (Input [ ] / Output [ ]) ...

    • ... to link between processes (sheets) within one study/assay
    • ... to link across multiple studies and / or assays
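The two conventions above (one sheet per protocol file, every dataset file linked in an Output column) can be checked with a few set operations. This is a rough sketch under assumptions: the helper names `missing_protocol_sheets` and `unlinked_dataset_files` are hypothetical, and the sheet names and Output values are passed in as plain lists, however you choose to read them from isa.assay.xlsx. The file names other than plant-growth.md are invented for the example.

```python
from pathlib import PurePosixPath


def missing_protocol_sheets(protocol_files, sheet_names):
    """Protocol files whose name (without extension) has no sheet of the same name."""
    expected = {PurePosixPath(p).stem for p in protocol_files}
    return sorted(expected - set(sheet_names))


def unlinked_dataset_files(dataset_files, output_values):
    """Dataset files that do not appear in any Output [...] column."""
    return sorted(set(dataset_files) - set(output_values))


if __name__ == "__main__":
    # Hypothetical assay content, for illustration only.
    protocols = ["plant-growth.md", "rna-extraction.md"]
    sheets = ["plant-growth"]
    dataset = ["run1.csv", "run2.csv"]
    outputs = ["run1.csv"]

    print(missing_protocol_sheets(protocols, sheets))
    print(unlinked_dataset_files(dataset, outputs))
```

An empty list from both helpers would suggest that the assay follows the two conventions; anything listed still needs a sheet or an Output entry.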

The final result (across all isa.*.xlsx sheets) should be a gapless chain from the isa.study.xlsx sheets through the isa.assay.xlsx sheets: Input/Output nodes of sample/material --> processes/protocols --> Input/Output nodes of sample/material/raw data/derived data. This way, any file stored in ./assays/<assayName>/dataset can be traced back along the chain of processes to the original sample in the lab.
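To make the idea of a gapless chain concrete, here is a small sketch that walks from a data file back to its original sample. It is an illustration only: `trace_to_source` is a hypothetical helper, each Input/Output pair is modelled as a plain `(input, output)` tuple, and each output is assumed to have exactly one input. The link from RNA extract samples to cDNA libraries is an assumption added to connect the examples given earlier.

```python
def trace_to_source(node, links):
    """Follow Output -> Input links backwards until a node has no producer."""
    produced_by = {output: source for source, output in links}
    chain = [node]
    while chain[-1] in produced_by:
        parent = produced_by[chain[-1]]
        if parent in chain:
            raise ValueError(f"circular link detected at {parent!r}")
        chain.append(parent)
    return chain


if __name__ == "__main__":
    links = [
        ("leaf samples", "RNA extract samples"),
        ("RNA extract samples", "cDNA libraries"),  # assumed intermediate step
        ("cDNA libraries", "qRT-PCR results"),
        ("qRT-PCR results", "Plot of relative gene expression"),
    ]
    for step in trace_to_source("Plot of relative gene expression", links):
        print(step)
```

If the chain stops at something other than the original sample, an Input/Output link is missing somewhere between the sheets.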

File names

  • Avoid spaces in file names. We recommend using camelCase or PascalCase for file names.
  • However, to keep track of links and data origin, we recommend keeping the original names of data files (e.g. if a publisher or repository stores files with spaces in their names).
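As a small aid for the first recommendation, the sketch below flags file names that contain spaces and suggests a camelCase alternative. The helper names `has_spaces` and `to_camel_case` are hypothetical, and the function expects a bare file name rather than a full path. In line with the second recommendation, treat the result as a suggestion for files you name yourself, not as a reason to rename data files that came from a publisher or repository.

```python
from pathlib import PurePosixPath


def has_spaces(file_name: str) -> bool:
    """Return True if the file name contains at least one space."""
    return " " in file_name


def to_camel_case(file_name: str) -> str:
    """Join whitespace-separated words of the name into camelCase, keeping the extension."""
    path = PurePosixPath(file_name)
    words = path.stem.split()
    if not words:
        return file_name
    first = words[0][:1].lower() + words[0][1:]
    rest = "".join(w[:1].upper() + w[1:] for w in words[1:])
    return first + rest + path.suffix


if __name__ == "__main__":
    name = "plant growth data.csv"  # made-up example name
    if has_spaces(name):
        print(to_camel_case(name))
```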