ENH: new type + format for BLASTDB v5 #275

Open
nbokulich opened this issue May 17, 2022 · 0 comments

nbokulich commented May 17, 2022

Addition Description
Using indexed BLAST databases would allow faster searching as well as parallelization (e.g., see the use of blastn in q2-feature-classifier; it is also used in q2-quality-control and some other plugins). This would be in addition to the current use of FASTA inputs, in which case the indexed db is built on the fly.

Current Behavior
No blastdb formats are supported.

Proposed Behavior

  1. Create a new type (proposed name: BLASTDB)
  2. Create a new format (proposed: BLASTDBv5Format). BLAST database formats are versioned (v4 is deprecated; v5 has been the current version for some time).
  3. Create transformers (to/from the FASTA format(s)) (EDIT: these should probably be actions, not transformers, and could be placed in q2-feature-classifier re: Wrap makeblastdb to generate indexed blast database q2-feature-classifier#158)
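
To make item 3 concrete, here is a minimal sketch of what an action wrapping makeblastdb might do internally. The function names (`makeblastdb_args`, `build_blastdb`) are hypothetical and not part of any existing plugin; the flags `-in`, `-dbtype`, `-blastdb_version`, and `-out` are the ones makeblastdb from BLAST+ accepts. It assumes makeblastdb is on the PATH when actually run.

```python
import subprocess


def makeblastdb_args(fasta_path, out_prefix, dbtype="nucl"):
    """Build a makeblastdb command line that writes a v5 database.

    Hypothetical helper for illustration; not an existing QIIME 2 function.
    """
    if dbtype not in ("nucl", "prot"):
        raise ValueError(f"dbtype must be 'nucl' or 'prot', got {dbtype!r}")
    return [
        "makeblastdb",
        "-in", str(fasta_path),
        "-dbtype", dbtype,
        "-blastdb_version", "5",
        "-out", str(out_prefix),
    ]


def build_blastdb(fasta_path, out_prefix, dbtype="nucl", run=subprocess.run):
    """Run makeblastdb and return the database prefix.

    `run` is injectable so the external call can be stubbed out in tests.
    """
    run(makeblastdb_args(fasta_path, out_prefix, dbtype), check=True)
    return out_prefix
```

Keeping the argument construction separate from the subprocess call means the command can be checked without BLAST+ installed.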

One problem is that the format specification does not appear to be described anywhere that I can find, so I am not sure that we can write a detailed format validator. However, format validation could use blastdbcmd (ships with blast+) to get db info, like:
blastdbcmd -db <db_path> -info
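
A validator built on that idea could be sketched as below. This is only an illustration under stated assumptions: `BlastDBValidationError`, `blastdbcmd_info_args`, and `validate_blastdb` are hypothetical names, and a non-zero exit code from blastdbcmd is treated as "database unreadable". The `-db`, `-dbtype`, and `-info` flags are blastdbcmd's own.

```python
import subprocess


class BlastDBValidationError(Exception):
    """Raised when blastdbcmd cannot read the database (hypothetical name)."""


def blastdbcmd_info_args(db_prefix, dbtype="guess"):
    """Build the argument list for `blastdbcmd -db <prefix> -info`."""
    if dbtype not in ("guess", "nucl", "prot"):
        raise ValueError(f"unsupported dbtype: {dbtype!r}")
    return ["blastdbcmd", "-db", str(db_prefix), "-dbtype", dbtype, "-info"]


def validate_blastdb(db_prefix, dbtype="guess", run=subprocess.run):
    """Return blastdbcmd's info text, or raise if the db cannot be read.

    `run` is injectable so the external call can be stubbed out in tests.
    """
    args = blastdbcmd_info_args(db_prefix, dbtype)
    try:
        result = run(args, capture_output=True, text=True)
    except FileNotFoundError as err:
        raise BlastDBValidationError("blastdbcmd not found on PATH") from err
    # Assumption: a non-zero exit status means the database is invalid.
    if result.returncode != 0:
        message = result.stderr.strip() or "blastdbcmd failed"
        raise BlastDBValidationError(message)
    return result.stdout
```

This only confirms that BLAST+ itself can open the database; it does not check individual files against a specification.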

Questions

  1. Do we need separate types for protein vs. nucleotide dbs? The format would be the same but linked to different types.
  2. Is blastdbcmd sufficient for validation?
  3. Is this even a type that we want in q2-types? A few plugins would wind up using this (q2-feature-classifier, q2-moshpit, external plugins?) so I personally feel certain enough to open the issue here first.
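
On question 1, if a single format were linked to separate protein and nucleotide types, the validator would need some way to tell the two apart. One rough sketch, assuming single-volume databases store their core index, header, and sequence files as `<prefix>.nin/.nhr/.nsq` (nucleotide) or `<prefix>.pin/.phr/.psq` (protein); `guess_dbtype` is a hypothetical helper, and multi-volume or aliased databases are not handled:

```python
from pathlib import Path

# Assumption: core files of a single-volume BLAST database, keyed by molecule type.
CORE_SUFFIXES = {
    "nucl": (".nin", ".nhr", ".nsq"),
    "prot": (".pin", ".phr", ".psq"),
}


def guess_dbtype(db_prefix):
    """Return 'nucl' or 'prot' if all core files for that type exist, else None."""
    prefix = str(db_prefix)
    for dbtype, suffixes in CORE_SUFFIXES.items():
        if all(Path(prefix + suffix).is_file() for suffix in suffixes):
            return dbtype
    return None
```

The result could then be passed as `-dbtype` to blastdbcmd rather than relying on its default guess.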