marp | license | title | date | author
---|---|---|---|---
false | [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) | Create your own ARC | 2024-04-02 |
💡 This is just to give an idea of where to start when creating a new ARC.
Try to sketch a little map of what your project could look like as an ARC.
- Create a new ARC on your computer using ARCitect or ARC commander
- Upload the ARC to the DataHUB
- Save intermediate steps and commit them with a message
- The README.md is the first thing everyone sees when opening your ARC in the DataHUB
- This is an easy-to-use place for notes or info about your ARC
- You could add your sketch / map here
Adding a `LICENSE` file is easy via the DataHUB:
- Go to the ARC
- Click the button "Add LICENSE"
- Apply a license template
- Commit the changes
💡 There is no template for CC-BY 4.0. You can add the license text from https://choosealicense.com/licenses/cc-by-4.0/.
- Add a Title: e.g. the project or publication title
- Add a Description: e.g. the project description or publication abstract
- Add Contacts: e.g. collaborators or publication authors
- Add First Name, Last Name, Affiliation
- Add ORCID
- Add Publication(s)
- DOI, Title, Authors, Status = Published
- If the study is internal (i.e. a dataset from this project or publication)
  - Think of a suitable study identifier and title
  - Add contacts (e.g. transfer contacts from investigation-level to study)
  - Description: write a short summary / bullet points
- If the study is external (i.e. a dataset from another project or publication)
  - Add Title: publication title
  - Add Description: publication abstract
  - Add Public Release Date: publication online date
  - Add People: authors in same order as on publication
  - Add Publication(s): DOI, Title, Authors, Status = Published
- Add Measurement Type, Technology Type, Technology Platform for every assay (ideally backed by ontology terms)
- If applicable, add people (assay performers)
- Add the measurement files / raw data to the dataset
💡 If your data is very large, take only a subset or use dummy files during the hands-on session.
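One way to prepare dummy files is to create empty placeholders that carry the names of the real measurement files. The following is only a minimal sketch: the function name `create_dummy_files`, the file names, and the `assays/myAssay/dataset` folder are hypothetical and chosen for illustration.

```python
import tempfile
from pathlib import Path


def create_dummy_files(file_names, dataset_dir):
    """Create empty placeholder files named like the real data files."""
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name in file_names:
        # Keep only the file name, so the placeholders land directly in dataset_dir.
        target = dataset_dir / Path(name).name
        # touch() does not overwrite the content of an existing file.
        target.touch(exist_ok=True)
        created.append(target)
    return created


# Hypothetical example: practice in a temporary folder instead of a real ARC.
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "assays" / "myAssay" / "dataset"
    files = create_dummy_files(["run1.fastq", "run2.fastq"], dataset)
    print([f.name for f in files])
```

The placeholders can later be replaced by the real files under the same names, so the links in the annotation tables stay valid.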
Add samples to your ARC and try to describe the sample-to-data flow using ISA
- The `./assays/<assayName>/isa.assay.xlsx` should relate to the `./assays/<assayName>/protocols`
  - Add a sheet with the same name as the file for each protocol file; e.g. `./assays/<assayName>/protocols/plant-growth.md` --> `./assays/<assayName>/isa.assay.xlsx`:`plant-growth`
  - Link the protocol file name (e.g. `plant-growth.md`) in the respective `Protocol REF` column
- All files stored in a folder `./assays/<assayName>/dataset` should be linked in an Output building block of the `./assays/<assayName>/isa.assay.xlsx`
  - Use `Output [Raw Data File]` to link raw data generated by a machine, measuring device, etc.
  - Use `Output [Derived Data File]` to link data produced by a computational workflow, script, software
  - 💡 Note that an assay can produce
    - samples from samples: e.g. `Input [Sample Name]` leaf samples -> `Output [Sample Name]` RNA extract samples
    - data from samples: e.g. `Input [Sample Name]` cDNA libraries -> `Output [Raw Data File]` qRT-PCR results
    - data from data: e.g. `Input [Raw Data File]` qRT-PCR results -> `Output [Derived Data File]` Plot of relative gene expression
- Use Sample/Material/Data nodes (`Input [ ]` / `Output [ ]`) ...
  - ... to link between processes (sheets) within one study/assay
  - ... to link across multiple studies and / or assays
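The two conventions above (one sheet per protocol file, one Output entry per dataset file) can be checked mechanically. The sketch below is only an illustration: the function names and example values are hypothetical, the lists are typed in by hand rather than read from an actual `isa.assay.xlsx`, and files are matched by file name only, which is an assumption.

```python
from pathlib import PurePosixPath


def missing_protocol_sheets(protocol_files, sheet_names):
    """Return protocol files that have no sheet named after them."""
    sheets = set(sheet_names)
    return sorted(
        f for f in protocol_files
        if PurePosixPath(f).stem not in sheets
    )


def unlinked_dataset_files(dataset_files, output_values):
    """Return dataset files that no Output column value refers to."""
    # Assumption: an Output value links a file if the file names are equal.
    linked = {PurePosixPath(v).name for v in output_values}
    return sorted(
        f for f in dataset_files
        if PurePosixPath(f).name not in linked
    )


# Hypothetical example data for one assay.
protocol_files = ["protocols/plant-growth.md", "protocols/rna-extraction.md"]
sheet_names = ["plant-growth"]
dataset_files = ["dataset/run1.fastq", "dataset/run2.fastq"]
output_values = ["run1.fastq"]

print(missing_protocol_sheets(protocol_files, sheet_names))
print(unlinked_dataset_files(dataset_files, output_values))
```

In this example, `rna-extraction.md` still needs its own sheet and `run2.fastq` still needs to be listed in an Output column.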
The final result (across all `isa.*.xlsx` sheets) should be a gapless connection from `isa.study.xlsx` sheets through `isa.assay.xlsx` sheets, representing the flow through the various Input/Output nodes of sample/material --> through processes/protocols --> to Input/Output nodes of sample/material/raw data/derived data, so that any file stored in a `./assays/<assayName>/dataset` can be traced back along the chain of processes to the original sample in the lab.
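To illustrate what "gapless" means, the sketch below walks backwards from a data file to its original sample. It is a simplified model, not how any ARC tool works internally: each process is reduced to a single hypothetical (input, output) pair, every output is assumed to have exactly one input, and all node names are made up.

```python
def trace_to_source(node, produced_from):
    """Follow Output -> Input links backwards until no producer is known."""
    chain = [node]
    seen = {node}
    while chain[-1] in produced_from:
        parent = produced_from[chain[-1]]
        if parent in seen:
            raise ValueError(f"cycle detected at {parent!r}")
        chain.append(parent)
        seen.add(parent)
    return chain


# Hypothetical processes as (input, output) pairs, one per annotation row.
processes = [
    ("leaf sample 1", "RNA extract 1"),
    ("RNA extract 1", "cDNA library 1"),
    ("cDNA library 1", "qpcr_results.csv"),
    ("qpcr_results.csv", "relative_expression.png"),
]
# Map every output to the input it was produced from.
produced_from = {output: input_ for input_, output in processes}

print(" <- ".join(trace_to_source("relative_expression.png", produced_from)))
```

If the chain stops at anything other than the original sample, an Input/Output link is missing somewhere between the sheets.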
- Avoid spaces in file names. We recommend using camelCase or PascalCase for file names.
- However, in order to keep track of links and data origin, it is recommended to keep the original name of data files (e.g. if a publisher or repository stores files with spaces).
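A small helper can flag file names that contain spaces and suggest a camelCase or PascalCase alternative. This is a minimal sketch: the function names `has_spaces` and `suggest_name` are hypothetical, and names without spaces are deliberately left unchanged.

```python
import re


def has_spaces(file_name):
    """Return True if the file name contains any whitespace."""
    return re.search(r"\s", file_name) is not None


def suggest_name(file_name, pascal=False):
    """Suggest a camelCase (or PascalCase) name for a file name with spaces."""
    if not has_spaces(file_name):
        return file_name
    # Split off the last extension so it is not changed.
    stem, dot, ext = file_name.rpartition(".")
    if not dot:
        stem, ext = file_name, ""
    words = stem.split()
    if not words:
        return file_name
    first = words[0]
    if pascal:
        first = first[:1].upper() + first[1:]
    else:
        first = first[:1].lower() + first[1:]
    rest = [w[:1].upper() + w[1:] for w in words[1:]]
    return first + "".join(rest) + dot + ext


print(suggest_name("my raw data.csv"))
print(suggest_name("my raw data.csv", pascal=True))
print(suggest_name("plant-growth.md"))
```

In line with the note above, such a suggestion is meant for files you name yourself, not for data files whose original name should be preserved.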