NCBI BLAST+ blastn overflow error with NCBI NT 2023-09-01 Nucleotide BLAST database #156

Closed
kysrpex opened this issue Sep 11, 2023 · 6 comments


kysrpex commented Sep 11, 2023

The latest version of NCBI BLAST+ blastn available in this repository seems to be incompatible with the NCBI NT database from September 1, 2023. Below you may find the outputs of a job I launched myself on UseGalaxy.eu to reproduce the issue.

Command Line

blastn  -query '/data/dnb09/galaxy_db/files/0/5/1/dataset_051aeaa5-7cb3-4776-a46b-9ab01c6d3f8e.dat'   -db '"/data/db/databases/blast/nt/2023-09-01/nt"'  -task 'blastn' -evalue '0.001' -out '/data/jwd05e/main/062/440/62440929/outputs/dataset_ba855f06-d5b2-4810-9b63-71b033951036.dat' -outfmt '6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen salltitles'  -num_threads "${GALAXY_SLOTS:-8}"
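The `-outfmt` string above requests tab-separated output with the listed columns. The Python sketch below splits each output line into named fields. It makes two assumptions:

- `std` expands to the 12 standard columns `qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore`. Older BLAST+ releases may label the first two `qseqid`/`sseqid` instead.
- The helper names `column_names` and `parse_row` are hypothetical, not part of BLAST+ or Galaxy.

```python
from __future__ import annotations

# Assumed expansion of "std" (12 columns); older BLAST+ releases may name the
# first two qseqid/sseqid instead of qaccver/saccver.
STD_COLUMNS = [
    "qaccver", "saccver", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# The format string used by the Galaxy job in this issue.
OUTFMT = ("6 std sallseqid score nident positive gaps ppos qframe sframe "
          "qseq sseq qlen slen salltitles")


def column_names(outfmt: str) -> list[str]:
    """Expand an '-outfmt 6 ...' specification into a flat list of column names."""
    fmt, *fields = outfmt.split()
    if fmt != "6":
        raise ValueError(f"expected tabular format 6, got {fmt!r}")
    names: list[str] = []
    for field in fields or ["std"]:  # bare "6" means the standard columns
        names.extend(STD_COLUMNS if field == "std" else [field])
    return names


def parse_row(line: str, names: list[str]) -> dict[str, str]:
    """Split one tab-separated output line into a {column: value} mapping."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} columns, got {len(values)}")
    return dict(zip(names, values))
```

With the format string from this job, `column_names(OUTFMT)` yields 25 names: the 12 from `std` plus 13 explicit columns. Every output line should therefore contain 25 tab-separated fields.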

Tool Standard Error

Error: NCBI C++ Exception:
    T0 "/opt/conda/conda-bld/blast_1595737360567/work/blast/c++/src/serial/objistrasnb.cpp", line 499: Error: (CSerialException::eOverflow) byte 83: overflow error ( at [].[].gi)
    T0 "/opt/conda/conda-bld/blast_1595737360567/work/blast/c++/src/serial/member.cpp", line 768: Error: (CSerialException::eOverflow) ncbi::CMemberInfoFunctions::ReadWithSetFlagMember() - error while reading seqid ( at Blast-def-line-set.[].[].seqid.[].[].gi)

Tool Exit Code

255
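The trace shows the binary ASN.1 reader (`objistrasnb.cpp`) overflowing while decoding a `gi` field inside a `Blast-def-line-set`. One plausible explanation, which this thread does not confirm, is this: the 2023-09-01 NT release contains GI numbers above 2,147,483,647, the largest value a signed 32-bit integer can hold, and BLAST+ 2.10.1 reads `gi` into such a field.

The Python sketch below illustrates that hypothesis only. It decodes BER INTEGER content octets (big-endian two's complement) and raises an error on values outside the signed 32-bit range. `read_gi_int32` is a hypothetical name, not NCBI code.

```python
# Illustration of the assumed failure mode, not NCBI's implementation.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def decode_ber_integer(content: bytes) -> int:
    """Decode BER INTEGER content octets (big-endian two's complement)."""
    if not content:
        raise ValueError("a BER INTEGER needs at least one content octet")
    return int.from_bytes(content, byteorder="big", signed=True)


def read_gi_int32(content: bytes) -> int:
    """Decode a gi value, raising OverflowError if it exceeds signed 32 bits."""
    value = decode_ber_integer(content)
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"gi {value} does not fit in a signed 32-bit integer")
    return value
```

For example, the content octets `7f ff ff ff` decode to 2147483647 and pass. The five octets `00 80 00 00 00` decode to 2147483648 and raise `OverflowError`. The leading zero octet keeps the value positive in two's complement.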

The bug can be reproduced on UseGalaxy.eu using the following input,

ATGAAAAAGATAAAAATTGTTCCACTTATTTTAATAGTTGTAGTTGTCGGGTTTGGTATATATTTTTATGCTTCCAAAGATAAAGAAATTAATAATACTATTGATGCAATTGAAGATAAAAATTTCAAACAAGTTTATAAAGATAGCAGTTATATTTCTAAAAGCGATAATGGTGAAGTAGAAATGACTGAACGTCCGATAAAAATATATAATAGTTTAGGCGTTAAAGATATAAACATTCAGGATCGTAAAATAAAAAAAGTATCTAAAAATAAAAAACGAGTAGATGCTCAATATAAAATTAAAACAAACTACGGTAACATTGATCGCAACGTTCAATTTAATTTTGTTAAAGAAGATGGTATGTGGAAGTTAGATTGGGATCATAGCGTCATTATTCCAGGAATGCAGAAAGACCAAAGCATACATATTGAAAATTTAAAATCAGAACGTGGTAAAATTTTAGACCGAAACAATGTGGAATTGGCCAATACAGGAACAGCATATGAGATAGGCATCGTTCCAAAGAATGTATCTAAAAAAGATTATAAAGCAATCGCTAAAGAACTAAGTATTTCTGAAGACTATATCAAACAACAAATGGATCAAAATTGGGTACAAGATGATACCTTCGTTCCACTTAAAACCGTTAAAAAAATGGATGAATATTTAAGTGATTTCGCAAAAAAATTTCATCTTACAACTAATGAAACAAAAAGTCGTAACTATCCTCTAGGAAAAGCGACTTCACATCTATTAGGTTATGTTGGTCCCATTAACTCTGAAGAATTAAAACAAAAAGAATATAAAGGCTATAAAGATGATGCAGTTATTGGTAAAAAGGGACTCGAAAAACTTTACGATAAAAAGCTCCAACATGAAGATGGCTATCGTGTCACAATCGTTGACGATAATAGCAATACAATCGCACATACATTAATAGAGAAAAAGAAAAAAGATGGCAAAGATATTCAACTAACTATTGATGCTAAAGTTCAAAAGAGTATTTATAACAACATGAAAAATGATTATGGCTCAGGTACTGCTATCCACCCTCAAACAGGTGAATTATTAGCACTTGTAAGCACACCTTCATATGACGTCTATCCATTTATGTATGGCATGAGTAACGAAGAATATAATAAATTAACCGAAGATAAAAAAGAACCTCTGCTCAACAAGTTCCAGATTACAACTTCACCAGGTTCAACTCAAAAAATATTAACAGCAATGATTGGGTTAAATAACAAAACATTAGACGATAAAACAAGTTATAAAATCGATGGTAAAGGTTGGCAAAAAGATAAATCTTGGGGTGGTTACAACGTTACAAGAAATAAAGTGGTAAATGGTAATATCGACTTAAAACAAGCAATAGAATCATCAGATAACATTTTCTTTGCTAGAGTAGCACTCGAATTAGGCAGTAAGAAATTTGAAAAAGGCATGAAAAAACTAGGTGTTGGTGAAGATATACCAAGTGATTATCCATTTTATAATGCTCAAATTTCAAACAAAAATTTAGATAATGAAATATTATTAGCTGATTCAGGTTACGGACAAGGTGAAATACTGATTAACCCAGTACAGATCCTTTCAATCTATAGCGCATTAGAAAATAATGGCAATATTAACGCACCTCACTTATTAAAAGACACGAAAAACAAAGTTTGGAAGAAAAATATTATTTCCAAAGAAAATATCAATCTATTAACTGATGGTATGCAACAAGTCGTAAATAAAACACATAAAGAAGATATTTATAGATCTTATGCAAACTTAATTGGCAAATCCGGTACTGCAGAACTCAAAATGAAACAAGGAGAAACTGGCAGACAAATTGGGTGGTTTATATCATATGATAAAGATAATCCAAACATGATGATGGCTATTAATGTTAAAGATGTACAAGATAAAGGAATGGCTAGCTACAATGCCAAAATCTCAGGTAAAGTGTATGATGAGCTATATGAGAACGGTAATAAAAAATACGATATAGA
TGAATAA

and choosing "blastn" as "Type of BLAST".

You may import NCBI-BLAST-blastn-overflow-error-with-NCBI-NT-2023-09-01-Nucleotide-BLAST-database.rocrate.zip to save yourself the hassle of setting up the job.

According to a Stack Overflow post mentioning the same issue [1], the solution may be to update NCBI BLAST+ blastn.

[1] - https://stackoverflow.com/questions/70370949/local-blast-ncbi-c-exception

kysrpex (Author) commented Sep 11, 2023

@bgruening This is the BLAST issue I mentioned this morning.

kysrpex (Author) commented Sep 11, 2023

I assume #146 is related.

peterjc (Owner) commented Sep 11, 2023

Currently the wrapper specifies BLAST+ version 2.10.1 here: https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/ncbi_macros.xml

If we have reason to believe that this version of BLAST+ can't cope with the latest NCBI DB, then updating ought to solve this, and, touch wood, ought not to be too complicated (assuming no changes to the command line, etc.), i.e. issue #146.
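To see which BLAST+ version a wrapper pins, one can read the package requirement out of the macros file. The sketch below assumes the common Galaxy pattern:

- a `<requirement type="package" version="...">blast</requirement>` element;
- whose version may be a whole `@TOKEN@` defined by a `<token>` macro.

The token name and XML layout in `EXAMPLE_MACROS` are illustrative and may not match the actual `ncbi_macros.xml`.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical macros content; the real ncbi_macros.xml may be laid out differently.
EXAMPLE_MACROS = """\
<macros>
    <token name="@TOOL_VERSION@">2.10.1</token>
    <xml name="requirements">
        <requirements>
            <requirement type="package" version="@TOOL_VERSION@">blast</requirement>
        </requirements>
    </xml>
</macros>
"""


def blast_requirement_version(macros_xml: str, package: str = "blast") -> Optional[str]:
    """Return the version pinned for `package`, expanding whole @TOKEN@ values."""
    root = ET.fromstring(macros_xml)
    # Map token names (e.g. "@TOOL_VERSION@") to their literal values.
    tokens = {t.get("name"): (t.text or "").strip() for t in root.iter("token")}
    for req in root.iter("requirement"):
        if req.get("type") == "package" and (req.text or "").strip() == package:
            version = req.get("version", "")
            # Only an attribute that is exactly one token is expanded.
            return tokens.get(version, version)
    return None
```

To check the real file, read its text first, e.g. `blast_requirement_version(open("ncbi_macros.xml").read())`. Tokens embedded in longer strings, such as `@TOOL_VERSION@+galaxy0`, are not expanded by this sketch.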

Has anyone tried to reproduce this at the command line outside of Galaxy? I can probably do that locally with a recent copy of NCBI NT from August/September 2023...

peterjc (Owner) commented Sep 11, 2023

Confirmed with our local copy of NT on Linux: BLAST 2.14.1 (the current latest on Bioconda) worked fine with the above command, giving 500 hits (the default limit), but after downgrading to BLAST 2.10.1 it crashes:

Error: NCBI C++ Exception:
    T0 "/opt/conda/conda-bld/blast_1607337341665/work/blast/c++/src/serial/objistrasnb.cpp", line 499: Error: (CSerialException::eOverflow) byte 83: overflow error ( at [].[].gi)
    T0 "/opt/conda/conda-bld/blast_1607337341665/work/blast/c++/src/serial/member.cpp", line 768: Error: (CSerialException::eOverflow) ncbi::CMemberInfoFunctions::ReadWithSetFlagMember() - error while reading seqid ( at Blast-def-line-set.[].[].seqid.[].[].gi)

Either version takes nearly an hour with 8 cores and 100GB allocated on our cluster:

time blastn -db $BLASTDB/nt -query query.fasta -task 'blastn' -evalue '0.001' -outfmt '6 std sallseqid score nident positive gaps ppos qframe sframe qseq sseq qlen slen salltitles' -out query_shared.tsv -num_threads 8

This is a strong reason to push a BLAST update for the wrappers.
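A quick guard on the BLAST+ version can avoid an hour-long run against a recent NT snapshot that ends in this crash. Below is a minimal Python sketch, assuming `blastn -version` prints a first line of the form `blastn: 2.14.1+`.

The thread only establishes that 2.10.1 fails and 2.14.1 works. `KNOWN_GOOD` is therefore set to 2.14.1, not to the true first fixed release, which is not identified here.

```python
from __future__ import annotations

import re

# Assumption: 2.14.1 is simply the version confirmed to work in this thread.
KNOWN_GOOD = (2, 14, 1)


def parse_blast_version(version_output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from text such as 'blastn: 2.14.1+'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)\+", version_output)
    if match is None:
        raise ValueError("no BLAST+ version string found")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def is_known_good(version_output: str) -> bool:
    """True if the reported version is at least the known-good release."""
    # Tuples compare element by element, so (2, 10, 1) < (2, 14, 1).
    return parse_blast_version(version_output) >= KNOWN_GOOD
```

The text can be captured with `subprocess.run(["blastn", "-version"], capture_output=True, text=True).stdout` and passed to `is_known_good`.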

peterjc (Owner) commented Nov 19, 2023

Updated wrappers released via #157, this should be resolved now - closing issue.

peterjc closed this as completed Nov 19, 2023

kysrpex (Author) commented Nov 20, 2023

> Updated wrappers released via #157, this should be resolved now - closing issue.

Thanks!
