How does "merge_kingdoms" mentioned in the "build_db_and_run.sh" script work? #31

Open
ecalfapietra opened this issue Mar 16, 2023 · 1 comment
@ecalfapietra

Hello,

I'm trying to use the STAT tools to build a database from FASTA sequences, and then use it for metagenomic/taxonomic analyses.
So I'm following the tutorial in the build_db_and_run.sh script.
It says that the identify_tax_ids step can be run in multiple instances, but if we do that, we have to use the merge_kingdoms tool to combine the results into a single file.
My problem is that there is no information on how to use this tool.
The tool's help output is: need <tax.parents>
I don't understand what I should pass for each argument (except for tax.parents).

Also, I'm using the default parameters:
KMER_LEN=32
DENSE_WINDOW=4 # 1 kmer of 4 for dense db (just for example)
SPARSE_WINDOW=128 # 1 kmer of 128 for sparse db (just for example)
But I don't know whether I really should.

Same question for MAX_KMER_DICTIONARY_SIZE=5000000 # This number should be roughly as max kmers expected * 2.
I don't know how to estimate the maximum number of kmers expected.
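
One rough way to approach the estimate, as a sketch only: a sequence of length L contains at most L - KMER_LEN + 1 kmers, and if about 1 kmer in every WINDOW is kept (as the comments on DENSE_WINDOW and SPARSE_WINDOW suggest), that count is divided by the window. The helper names `read_fasta` and `kmer_upper_bound` below are hypothetical and are not part of STAT. The code also assumes that "max kmers expected" refers to the largest single input sequence; if it actually means a total per tax id or per database, the per-sequence bounds would need to be summed instead.

```python
import math
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (header, sequence_length) for each record in a FASTA stream."""
    header, length = None, 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, length
            header, length = line[1:], 0
        else:
            length += len(line)
    if header is not None:
        yield header, length


def kmer_upper_bound(seq_len: int, kmer_len: int = 32, window: int = 1) -> int:
    """Upper bound on kmers kept from one sequence.

    Assumption: at most seq_len - kmer_len + 1 kmers exist, and roughly
    one kmer per `window` of them is kept.
    """
    total = max(seq_len - kmer_len + 1, 0)
    return math.ceil(total / window)


# Toy input: two records of length 100 and 40.
fasta = [">seq1", "ACGT" * 25, ">seq2", "ACGT" * 10]
bounds = [kmer_upper_bound(n, kmer_len=32, window=4) for _, n in read_fasta(fasta)]
print(bounds)           # [18, 3]
print(2 * max(bounds))  # 36, following the "max kmers expected * 2" hint
```

On a real FASTA file, the same functions can be applied to an open file handle instead of the toy list. The result is only an upper bound, since repeated kmers would be counted more than once.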

Thank you in advance!


tolot27 commented May 13, 2024

@ecalfapietra Did you build your db successfully? I'm looking for the same parameters to build a RefSeq k-mer db.

@multikengineer self-assigned this May 13, 2024