This repository contains the JSON schema files for importing data into RNAcentral. The goal is create a generic, simple method of importing data into RNAcentral. Currently the schema here are under review are are not ready for real world use case. If you are interested in providing feedback please open an issue or pull request.
The work here is based off the
agr_schemas version 0.6.2.
The top level schema is rnacentral-schema.json
with subsections stored in
sections
. Examples will be found in examples
and a validation script
is provided in validate.py
.
We support referencing coordinates in two ways, either 0-start, half open
or
1-start, fully-closed
. For details on what these mean and why there are two
coordinate systems please refer to:
- http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/
- https://genome.ucsc.edu/FAQ/FAQtracks#tracks1
By default we assume that the coordinates for genomic locations are 0-based, half open
, but this can be changed by setting the genomicCoordinateSystem
property of the metaData
. This does not change the behavior of annotations about
any sequence features like, modifications or the coordinates in related
sequences.
The validation script requires the python packages found in requirements.txt
.
Install them with pip install -r requirements.txt
. Validation is not yet
provided as a library, but may be in the future.
The validator script in ./validate.py
can be used to validate files. Below
are some example usages:
./validate.py examples/flybase.json
If you have moved schema files, or the directory of sections around you can use
the --schema
or --sections
options. Like:
./validate.py --schema different-schema.json --sections new-sections.json examples/flybase.json