FALCON-formatter

The program FALCON-formatter takes fastq and fasta files from a Pacific Biosciences sequencer and formats them for de novo assembly with FALCON.

Description

Storing all reads in a single FASTA or FASTQ file is convenient, but Dazzler (and therefore FALCON) does not accept this kind of input. All inputs MUST be in FASTA format, with files split by barcode, set, and part number. In practice, every read in a file must share the same values for fields 1-6 in the example below, and that combination must be unique to each input file.

m140415_143853_42175_c100635972550000001823121909121417_s1_p0/553/3100_11230
1yymmdd_hhmmss 33333 4444444444444444444444444444444444 55 66 777 8888888888

1. “m” = movie
2. Time of Run Start (yymmdd_hhmmss)
3. Instrument Serial Number
4. SMRT Cell Barcode
5. Set Number
6. Part Number
7. ZMW hole number*
8. Subread Region (start_stop using polymerase read coordinates)*

* These fields are only used in fasta/q headers
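The field layout above can be sketched as a regular expression. This is a minimal illustration, not code taken from FALCON-formatter: the pattern, the group names, and the `parse_header` helper are assumptions based only on the example header shown here.

```python
import re

# Hypothetical pattern for the eight fields listed above.
# Group names are illustrative and not part of FALCON-formatter.
HEADER_RE = re.compile(
    r"^>?(?P<movie>m)"                 # 1. "m" = movie
    r"(?P<run_start>\d{6}_\d{6})_"     # 2. yymmdd_hhmmss
    r"(?P<instrument>[^_/]+)_"         # 3. instrument serial number
    r"(?P<barcode>c\d+)_"              # 4. SMRT Cell barcode
    r"(?P<set_number>s\d+)_"           # 5. set number
    r"(?P<part_number>p\d+)"           # 6. part number
    r"/(?P<zmw>\d+)"                   # 7. ZMW hole number
    r"/(?P<start>\d+)_(?P<stop>\d+)$"  # 8. subread region
)


def parse_header(header):
    """Return the header fields as a dict, or raise ValueError."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError("unrecognized header: {!r}".format(header))
    return match.groupdict()


fields = parse_header(
    "m140415_143853_42175_c100635972550000001823121909121417_s1_p0/553/3100_11230"
)
print(fields["barcode"], fields["set_number"], fields["part_number"])
```

Headers from other instruments or software versions may not follow this exact shape, so the pattern would need adjusting for them.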

More information about file formats can be found at the SMRT-Analysis wiki.

The example below demonstrates this requirement by correctly splitting the file Example.fasta.

Example.fasta

>m140415_143853_42175_c100635972550000001823121909121417_s1_p0/553/3100_11230
>m140415_143853_42175_c324508543089230982134098587348034_s1_p0/553/103_725
>m140415_143853_42175_c324508543089230982134098587348034_s1_p0/553/973_13390
>m140415_143853_42175_c100635972550000001823121909121417_s1_p0/553/15030_17394

The four headers contain two unique combinations of fields 1-6:

>m140415_143853_42175_c100635972550000001823121909121417_s1_p0
>m140415_143853_42175_c324508543089230982134098587348034_s1_p0

All subreads corresponding to these headers need to be in their own files, so Example.fasta would be split accordingly:

m140415_143853_42175_c100635972550000001823121909121417_s1_p0.fasta

>m140415_143853_42175_c100635972550000001823121909121417_s1_p0/553/3100_11230
>m140415_143853_42175_c100635972550000001823121909121417_s1_p0/553/15030_17394

m140415_143853_42175_c324508543089230982134098587348034_s1_p0.fasta

>m140415_143853_42175_c324508543089230982134098587348034_s1_p0/553/103_725
>m140415_143853_42175_c324508543089230982134098587348034_s1_p0/553/973_13390
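The grouping in this example can be reproduced with a few lines of Python. This is a sketch of the idea only, assuming that fields 1-6 are everything before the first "/" in a header; `movie_prefix` and `group_by_movie` are hypothetical helper names, not functions from FALCON-formatter.

```python
from collections import defaultdict


def movie_prefix(header):
    """Return fields 1-6, i.e. everything before the first '/'."""
    return header.lstrip(">").split("/", 1)[0]


def group_by_movie(headers):
    """Map each output file name to the headers that belong in it."""
    groups = defaultdict(list)
    for header in headers:
        groups[movie_prefix(header) + ".fasta"].append(header)
    return dict(groups)


example = [
    ">m140415_143853_42175_c100635972550000001823121909121417_s1_p0/553/3100_11230",
    ">m140415_143853_42175_c324508543089230982134098587348034_s1_p0/553/103_725",
    ">m140415_143853_42175_c324508543089230982134098587348034_s1_p0/553/973_13390",
    ">m140415_143853_42175_c100635972550000001823121909121417_s1_p0/553/15030_17394",
]

for file_name, members in group_by_movie(example).items():
    print(file_name, len(members))
```

Input order is preserved within each group, so reads appear in their output file in the same order as in Example.fasta.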

FALCON-formatter takes FASTA/Q files or folders of files as input, converts any FASTQ input to FASTA, and writes each read to a file named after its fields 1 through 6.
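The FASTQ-to-FASTA step amounts to keeping the name and sequence of each record and dropping the quality lines. The sketch below assumes plain four-line FASTQ records with no blank lines. The functions `fastq_records` and `to_fasta` are hypothetical and do not reflect how FALCON-formatter is implemented.

```python
from itertools import islice


def fastq_records(handle):
    """Yield (name, sequence) pairs from four-line FASTQ records."""
    handle = iter(handle)  # allow lists as well as open files
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not chunk:
            return  # end of input
        if (len(chunk) < 4
                or not chunk[0].startswith("@")
                or not chunk[2].startswith("+")):
            raise ValueError("malformed FASTQ record: {!r}".format(chunk))
        # Drop the leading '@'; the '+' and quality lines are discarded.
        yield chunk[0][1:], chunk[1]


def to_fasta(name, sequence):
    """Format one record as unwrapped FASTA text."""
    return ">{}\n{}\n".format(name, sequence)


demo = ["@read/1/0_4\n", "ACGT\n", "+\n", "!!!!\n"]
for name, sequence in fastq_records(demo):
    print(to_fasta(name, sequence), end="")
```

Because the generator reads four lines at a time, it works on an open file handle without loading the whole file into memory.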

Installation

Using setuptools

git clone https://github.com/zyndagj/FALCON-formatter
cd FALCON-formatter
python setup.py install --user

Using pip

pip install --user git+https://github.com/zyndagj/FALCON-formatter

CLI Usage

The program FALCON-formatter (installed in $HOME/.local/bin) takes fastq and fasta files from a Pacific Biosciences sequencer and formats them for de novo assembly with FALCON.

usage:

FALCON-formatter [-h] [-w INT] [-o STR] F [F ...]

positional arguments:

Argument	Description
F	Fastq/a files or folders for formatting

optional arguments:

Flag	Option	Description
-h		show this help message and exit
-w	INT	hard-wrap fasta output at [80] base-pairs
-o	DIR	output path [.]
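Hard-wrapping, as controlled by -w, means breaking each sequence into fixed-width lines. The helper below is a hypothetical sketch of that behavior, not the program's code, and uses the documented default of 80.

```python
def wrap_sequence(sequence, width=80):
    """Split a sequence into lines of at most `width` characters."""
    if width < 1:
        raise ValueError("width must be a positive integer")
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


# A 170 bp sequence becomes lines of 80, 80, and 10 bases.
for line in wrap_sequence("A" * 170):
    print(len(line))
```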

Example

$ FALCON-formatter ecoli.fasta
Processing: ecoli.fasta

CyVerse Usage

If you’re coming from CyVerse, first find the FALCON-formatter app in the HPC app catalog and launch it. Then, click the “Inputs” drop-down arrow to designate your inputs.

Then, click the browse button to open up a file explorer to choose your input.

Select either a single fastq/fasta file or a whole folder to process.

Click “Launch Analysis” to start your job. You’ll get notifications when the program starts and when it finishes.
