
A reference implementation for the Splitter Interface that splits FASTA files in distinct chunks.


KubeITerator/biosplitter

Biosplitter

A reference implementation of the splitting logic proposed for the KubeIT project. This splitter divides FASTA files either by number of records or by size in bytes.

Behavior

Biosplitter uses the environment variables

  • DATASOURCE: A URL to a file that should be distributed
  • PARAMS: A JSON object with the parameters that control the splitting

to determine suitable locations for splitting the input file. These locations are written to stdout as a JSON-formatted list that looks like this:

  [
    { "index": 0, "range": "Range:bytes=0-1000" },
    { "index": 1, "range": "Range:bytes=1001-2000" }
  ]

The index value is an incrementing number. range is a string of the form Range:bytes=START-STOP that specifies an HTTP Range header, which can be passed to curl as a request header.
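As an illustration of how a consumer might read this output, the following sketch decodes the JSON list and splits each range string into an HTTP header name and value. The names Chunk, parseChunks and headerParts are hypothetical helpers introduced here; they are not part of Biosplitter.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Chunk mirrors one element of the JSON list printed to stdout.
// The type name is an assumption made for this sketch.
type Chunk struct {
	Index int    `json:"index"`
	Range string `json:"range"`
}

// parseChunks decodes the JSON list of split locations.
func parseChunks(out []byte) ([]Chunk, error) {
	var chunks []Chunk
	if err := json.Unmarshal(out, &chunks); err != nil {
		return nil, err
	}
	return chunks, nil
}

// headerParts splits "Range:bytes=0-1000" into the header name ("Range")
// and the header value ("bytes=0-1000").
func headerParts(r string) (string, string, error) {
	parts := strings.SplitN(r, ":", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("malformed range string %q", r)
	}
	return strings.TrimSpace(parts[0]), strings.TrimSpace(parts[1]), nil
}

func main() {
	// Sample output copied from the example list above.
	out := []byte(`[
  { "index": 0, "range": "Range:bytes=0-1000" },
  { "index": 1, "range": "Range:bytes=1001-2000" }
]`)

	chunks, err := parseChunks(out)
	if err != nil {
		panic(err)
	}
	for _, c := range chunks {
		name, value, err := headerParts(c.Range)
		if err != nil {
			panic(err)
		}
		// With net/http, the pair could be applied via req.Header.Set(name, value).
		fmt.Printf("chunk %d -> %s: %s\n", c.Index, name, value)
	}
}
```

With curl, the same string can be passed unchanged through the -H option. HTTP byte ranges are inclusive at both ends, so the two example chunks do not overlap.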

Biosplitter is distributed as a Docker container.

PARAMS

Biosplitter currently accepts two parameters to determine suitable splitting positions: maxrecord and bytesize. maxrecord defines an upper limit for the number of records per chunk, while bytesize defines an approximate size in bytes per chunk. The PARAMS environment variable must be JSON formatted, for example:

{
  "maxrecord": 1,
  "bytesize": 100000
}

Only one parameter needs to be specified. If both are specified, Biosplitter splits by maxrecord.
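The precedence rule can be sketched as follows. This is a minimal illustration, not Biosplitter's actual code: the Params struct and the chooseMode function are hypothetical, the key names are taken from the JSON example above, and treating a missing or zero value as "not specified" is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// Params mirrors the JSON object in PARAMS. Key names follow the example
// above; the struct itself is an assumption made for this sketch.
type Params struct {
	MaxRecord int `json:"maxrecord"`
	ByteSize  int `json:"bytesize"`
}

// chooseMode returns the splitting mode and its limit.
// maxrecord takes precedence when both parameters are set.
func chooseMode(raw string) (string, int, error) {
	var p Params
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		return "", 0, fmt.Errorf("PARAMS is not valid JSON: %w", err)
	}
	switch {
	case p.MaxRecord > 0:
		return "maxrecord", p.MaxRecord, nil
	case p.ByteSize > 0:
		return "bytesize", p.ByteSize, nil
	}
	return "", 0, errors.New("PARAMS must set maxrecord or bytesize to a positive number")
}

func main() {
	raw := os.Getenv("PARAMS")
	if raw == "" {
		// Fall back to the example object so the sketch runs on its own.
		raw = `{"maxrecord": 1, "bytesize": 100000}`
	}
	mode, limit, err := chooseMode(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("splitting by %s with limit %d\n", mode, limit)
}
```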

SplitterInterface

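To create your own splitting logic, your container must reproduce the behaviour described above: read DATASOURCE and PARAMS from the environment and write the JSON list of ranges to stdout. If your container is written in Go, you can use the SplitterInterface as the interface for your implementation.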
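As a rough sketch of what a custom splitter might look like, the example below cuts an in-memory FASTA buffer after a fixed number of records and prints the ranges in the output format described above. The Splitter interface shown here is hypothetical and only stands in for illustration; it is not the SplitterInterface of the KubeIT project, whose actual method set should be taken from its source. A real container would also read the file from DATASOURCE instead of a literal buffer.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Chunk is one entry of the JSON list expected on stdout.
type Chunk struct {
	Index int    `json:"index"`
	Range string `json:"range"`
}

// Splitter is a hypothetical interface used only in this sketch.
type Splitter interface {
	Split(data []byte) []Chunk
}

// RecordSplitter starts a new chunk after MaxRecords FASTA records.
type RecordSplitter struct{ MaxRecords int }

// newChunk formats an inclusive byte range in the Range:bytes=START-STOP form.
func newChunk(index, start, stop int) Chunk {
	return Chunk{Index: index, Range: fmt.Sprintf("Range:bytes=%d-%d", start, stop)}
}

// Split scans for record headers ('>' at the start of a line) and closes
// the current chunk whenever it already holds MaxRecords records.
func (s RecordSplitter) Split(data []byte) []Chunk {
	limit := s.MaxRecords
	if limit < 1 {
		limit = 1 // assumption: treat non-positive limits as one record per chunk
	}
	var chunks []Chunk
	start, records := 0, 0
	for i, b := range data {
		if b != '>' || (i > 0 && data[i-1] != '\n') {
			continue
		}
		if records == limit {
			chunks = append(chunks, newChunk(len(chunks), start, i-1))
			start, records = i, 0
		}
		records++
	}
	if len(data) > start {
		chunks = append(chunks, newChunk(len(chunks), start, len(data)-1))
	}
	return chunks
}

func main() {
	fasta := []byte(">a\nAC\n>b\nGT\n>c\nTT\n")
	var s Splitter = RecordSplitter{MaxRecords: 2}
	out, err := json.MarshalIndent(s.Split(fasta), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Cutting only at record headers keeps every record whole inside a single chunk, which is why the last chunk may contain fewer records than the limit.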
