Creates an "extracted" genome assembly from a fasta file and a list of regions in "bed" format.
extractome regions.bed <options>
or
python extractome/extract.py regions.bed <options>
Required inputs
- Region file in "bed" format. Only the first 3 columns (chrom, start, end) are required. Note that the bed format uses the "0-start, half-open" coordinate convention; for example, the first base in a sequence is represented by start=0, end=1.
- Either a fasta file or an IGV genome identifier (see Options below)
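The "0-start, half-open" convention lines up exactly with Python slicing, which is why `sequence[start:end]` returns the bases a bed interval covers and `end - start` is its length. The helper names below (`parse_bed_line`, `extract`, `to_one_based_closed`) are hypothetical illustrations, not functions from extractome:

```python
def parse_bed_line(line):
    """Return (chrom, start, end) from the first 3 columns of a bed line."""
    # Whitespace split tolerates tab- or space-separated columns.
    fields = line.split()
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid bed interval: {line!r}")
    return chrom, start, end


def extract(sequence, start, end):
    """Bed intervals are 0-start, half-open, the same as Python slices."""
    return sequence[start:end]


def to_one_based_closed(start, end):
    """Convert to 1-based, fully closed coordinates (e.g. 'chr1:1-1')."""
    return start + 1, end


seq = "ACGTACGTAC"
chrom, start, end = parse_bed_line("chr1\t0\t1\n")
print(extract(seq, start, end))        # A  (the first base)
print(end - start)                     # 1  (interval length)
print(to_one_based_closed(start, end)) # (1, 1)
```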
Options
- --fasta reference fasta file, required if --genome is not specified
- --genome igv.js genome id (e.g. hg38), required if --fasta is not specified
- --name base name for output files, default=Xome
- --output output directory name, default=output
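The only cross-option rule above is that one of `--fasta` or `--genome` must be given. A minimal sketch of how that rule and the documented defaults could be expressed with `argparse`, assuming nothing about how `extract.py` actually parses its arguments (`build_parser` and `parse_args` are hypothetical names):

```python
import argparse


def build_parser():
    # Hypothetical re-creation of the documented options; extract.py may differ.
    p = argparse.ArgumentParser(prog="extractome")
    p.add_argument("regions", help='regions file in "bed" format')
    p.add_argument("--fasta", help="reference fasta file")
    p.add_argument("--genome", help="igv.js genome id (e.g. hg38)")
    p.add_argument("--name", default="Xome", help="base name for output files")
    p.add_argument("--output", default="output", help="output directory name")
    return p


def parse_args(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # One of --fasta / --genome is required; parser.error exits with status 2.
    if args.fasta is None and args.genome is None:
        parser.error("either --fasta or --genome is required")
    return args


args = parse_args(["test/data/cpgIsland_mm10.bed", "--genome", "mm10"])
print(args.name, args.output)  # Xome output
```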
The script creates 3 output files in the output directory, where base_name is the value of --name:
- base_name.fa - the extracted fasta
- base_name.regions.bed - the input regions file, lifted over to the extracted fasta
- base_name.chain - a UCSC "chain" file, which can be used to lift over other files to the extracted fasta with tools such as CrossMap
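To illustrate what the lifted regions and the chain file encode, here is a sketch under a loud assumption: that each region becomes its own sequence in the extracted fasta, named with a hypothetical `chrom:start-end` scheme. extractome's real layout and naming may differ. Each region then maps to `0..(end - start)` on its new sequence, and one ungapped UCSC chain record maps the original reference (target) onto the extracted sequence (query). The chromosome sizes are placeholder values, and using the aligned length as the score is an arbitrary choice:

```python
def region_name(chrom, start, end):
    # Hypothetical naming scheme for an extracted sequence.
    return f"{chrom}:{start}-{end}"


def lift_region(chrom, start, end):
    """Under the one-sequence-per-region assumption, a region starts at 0."""
    return region_name(chrom, start, end), 0, end - start


def chain_record(chrom, chrom_size, start, end, chain_id):
    """Single ungapped UCSC chain: original reference -> extracted sequence."""
    length = end - start
    header = " ".join(map(str, [
        "chain", length,                     # score: aligned length (arbitrary)
        chrom, chrom_size, "+", start, end,  # target = original reference
        region_name(chrom, start, end), length, "+", 0, length,  # query
        chain_id,
    ]))
    # One alignment block; the last block line holds only its size.
    return f"{header}\n{length}\n"


regions = [("chr1", 100, 250), ("chr2", 0, 50)]
chrom_sizes = {"chr1": 1000, "chr2": 500}  # placeholder sizes
# Chain records are separated by a blank line.
print("\n".join(chain_record(c, chrom_sizes[c], s, e, i + 1)
                for i, (c, s, e) in enumerate(regions)))
print(lift_region("chr1", 100, 250))  # ('chr1:100-250', 0, 150)
```

If extractome instead concatenates regions into fewer sequences, the query offsets would no longer start at 0 for every region, but the chain records would keep the same shape.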
Example
extractome test/data/cpgIsland_mm10.bed --genome mm10 --name CPG_mm10 --output output