Making it go fast for high volume queries #668

Open
FlimFlamm opened this issue Aug 19, 2024 · 0 comments

FlimFlamm commented Aug 19, 2024

Looking for any pointers, advice, or best practices for my use case:

Large Annoy index (100 GB+), high-frequency lookups, best or near-best accuracy required (wherever diminishing returns start to show, I guess, which right now seems to be around search_k=30_000 for 10M items, each with 3,500 components).

Essentially, I need to look up every item in the index sequentially, non-stop, as fast as possible. At my desired search_k value, the performance hit is starting to hurt.
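To pin down where diminishing returns start rather than eyeballing it, one option is to measure recall for a sample of query items at several search_k values and pick the point where the curve flattens. The sketch below is only that, a sketch: the helper names and the `min_gain` threshold are hypothetical, and it assumes you already have reference neighbor lists for the sample (for example from a brute-force scan, or from a run at a much larger search_k used as a proxy).

```python
def recall(approx_ids, exact_ids):
    """Fraction of the reference neighbors that the approximate search returned."""
    exact = set(exact_ids)
    if not exact:
        return 1.0
    return len(exact.intersection(approx_ids)) / len(exact)


def mean_recall(approx_lists, exact_lists):
    """Average recall over a sample of queries (lists are paired by position)."""
    pairs = list(zip(approx_lists, exact_lists))
    return sum(recall(a, e) for a, e in pairs) / len(pairs)


def smallest_sufficient_search_k(recall_by_search_k, min_gain=0.005):
    """Return the first search_k whose next larger setting gains < min_gain recall.

    recall_by_search_k maps each tested search_k to its measured mean recall.
    min_gain is an arbitrary illustrative threshold, not an Annoy parameter.
    """
    ks = sorted(recall_by_search_k)
    for k, next_k in zip(ks, ks[1:]):
        if recall_by_search_k[next_k] - recall_by_search_k[k] < min_gain:
            return k
    return ks[-1]
```

Feeding it the mean recall measured at each tested search_k gives a repeatable cutoff that can be re-checked whenever the index is rebuilt with a different number of trees or a different metric.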

Side question: If I were to build another Annoy index with as many trees as I can fit in memory/disk, would this significantly reduce the search_k I need to get similar results? Edit: answer: possibly yes, at least a bit.

NOTE: currently, multiprocessing across two-thirds of my cores appears to be fastest, which I suspect is due to I/O wait times...

Tertiary question: Is a shared-memory approach, with one index in memory and many processes accessing it, achievable or useful?
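For what it's worth, Annoy's README describes index files as memory-mapped on `load()`, so separate processes that open the same file should end up sharing its pages through the OS page cache rather than each holding a private copy. A minimal sketch of that pattern follows, assuming a hypothetical file name `index.ann` and the dimension, metric, and search_k mentioned in this post; the helper names are made up for illustration, and whether it beats the current setup would need measuring.

```python
from multiprocessing import Pool

# Hypothetical values for illustration; adjust to the real index.
INDEX_PATH = "index.ann"  # assumed file name
DIM = 3500                # components per item, from the post
METRIC = "hamming"        # metric the post found fastest
SEARCH_K = 30_000         # search_k from the post

_index = None  # one AnnoyIndex handle per worker process


def init_worker(path=INDEX_PATH, dim=DIM, metric=METRIC):
    """Open the index once per worker; load() memory-maps the file."""
    global _index
    from annoy import AnnoyIndex  # third-party; imported inside the worker
    _index = AnnoyIndex(dim, metric)
    _index.load(path)


def chunk_ranges(n_items, n_chunks):
    """Split range(n_items) into at most n_chunks contiguous (start, stop) pairs."""
    size, extra = divmod(n_items, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)
        if stop > start:
            ranges.append((start, stop))
        start = stop
    return ranges


def query_range(bounds, n_neighbors=10, search_k=SEARCH_K):
    """Look up neighbors for every item id in [start, stop)."""
    start, stop = bounds
    return [(i, _index.get_nns_by_item(i, n_neighbors, search_k=search_k))
            for i in range(start, stop)]


def query_all(n_items, n_workers, chunks_per_worker=8):
    """Yield (item_id, neighbor_ids) for all items, in arbitrary chunk order."""
    ranges = chunk_ranges(n_items, n_workers * chunks_per_worker)
    with Pool(n_workers, initializer=init_worker) as pool:
        for chunk in pool.imap_unordered(query_range, ranges):
            yield from chunk
```

Calling `query_all(n_items, n_workers)` from under an `if __name__ == "__main__":` guard iterates over every item. Contiguous id ranges are used on the assumption that neighboring ids touch nearby parts of the file, which may or may not help with the I/O waits; `n_neighbors=10` and `chunks_per_worker=8` are arbitrary placeholders.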

Quaternary question: Is there a fastest metric? Edit: answer: yes. In my case, Hamming turned out to be fastest and, counterintuitively, the most accurate by a keyword-based metric (although if I normalize my continuous vectors beforehand, which should break Hamming, the build time itself seems to increase drastically).
