Save the search results like groundtruth_ivecs #9
Hi,
once a query has been performed, all search results are stored on the CPU side for evaluating the accuracy:
You can see the layout of the data here:
To store the results, simply store the data pointed to by
Best regards,
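Once the result IDs are on the host, writing them in the same .ivecs layout as the groundtruth file takes only a few lines. This is a minimal C++ sketch. It assumes the results form a row-major array of num_queries × k 32-bit IDs on a little-endian host. `write_ivecs` and `ids` are hypothetical names that stand in for the result buffer mentioned above, not GGNN identifiers:

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: writes a row-major (num_queries x k) array of result IDs
// in the .ivecs layout used by groundtruth files. Each row is a 32-bit int
// holding k, followed by k 32-bit IDs, all in native (little-endian) byte order.
void write_ivecs(const std::string& path, const std::vector<std::int32_t>& ids,
                 std::int32_t num_queries, std::int32_t k) {
  if (num_queries < 0 || k <= 0 ||
      static_cast<std::size_t>(num_queries) * static_cast<std::size_t>(k) != ids.size())
    throw std::invalid_argument("ids.size() must equal num_queries * k");
  std::ofstream out(path, std::ios::binary);
  if (!out) throw std::runtime_error("cannot open " + path);
  for (std::int32_t q = 0; q < num_queries; ++q) {
    // Row header: the number of IDs that follow.
    out.write(reinterpret_cast<const char*>(&k), sizeof(std::int32_t));
    // Row payload: the k IDs returned for query q.
    const std::int32_t* row = ids.data() + static_cast<std::size_t>(q) * static_cast<std::size_t>(k);
    out.write(reinterpret_cast<const char*>(row),
              static_cast<std::streamsize>(k) * static_cast<std::streamsize>(sizeof(std::int32_t)));
  }
  if (!out) throw std::runtime_error("write failed for " + path);
}
```

If the IDs are stored in a different integer type, convert them to 32-bit ints first so the file stays compatible with tools that read groundtruth .ivecs files.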
Thanks for your reply!
Well, there are tau_build and tau_query. Otherwise, increasing tau_build can give you somewhat better query performance, in both speed and accuracy, at the cost of longer construction time. Increasing k can increase accuracy, but will slow everything down. Multi-GPU can help a bit: because it shards the dataset, fewer important neighbors need to be kept within each shard, so it has an effect similar to increasing k. But it's only a small improvement, I would guess. Another thing you can try is changing the query parameters: If changing tau_query does not get you to 99% R@1,
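To check whether a particular tau_query setting reaches the 99% R@1 target, you can score the saved results against the ground truth on the host. This is a minimal C++ sketch. It assumes R@1 means the fraction of queries whose first returned ID equals the first ground-truth ID, although GGNN's own evaluation may handle ties or duplicate distances differently. `recall_at_1` is a hypothetical helper:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: R@1 as the fraction of queries whose top-1 result
// matches the top-1 ground-truth ID. Both arrays are row-major, with
// k_res IDs per query in `results` and k_gt IDs per query in `groundtruth`.
double recall_at_1(const std::vector<std::int32_t>& results, std::size_t k_res,
                   const std::vector<std::int32_t>& groundtruth, std::size_t k_gt,
                   std::size_t num_queries) {
  if (k_res == 0 || k_gt == 0 || results.size() < num_queries * k_res ||
      groundtruth.size() < num_queries * k_gt)
    throw std::invalid_argument("inconsistent sizes");
  if (num_queries == 0) return 0.0;
  std::size_t hits = 0;
  for (std::size_t q = 0; q < num_queries; ++q)
    if (results[q * k_res] == groundtruth[q * k_gt]) ++hits;  // compare first column only
  return static_cast<double>(hits) / static_cast<double>(num_queries);
}
```

Running the same query set with increasing tau_query values and keeping the smallest one that reaches 0.99 is one way to trade speed for accuracy without rebuilding the graph.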
Thanks a lot!
Hi,
Hi, that functionality exists:
There is just currently no code that prints it out. An alternative way to get statistics is to run the "stats" query, which runs the same query but also collects a lot of statistics, so it is a bit slower (see ggnn/include/ggnn/cuda_knn_ggnn.cuh, line 332 at commit 51f3056).
You can find an example of using it here (line 136 at commit 51f3056).
But again, you would have to add your own code to actually write out the number of distances per query. You should add that code at some point before this line: ggnn/include/ggnn/cuda_knn_ggnn.cuh, line 469 at commit 51f3056.
Since it is "managed" memory, it can be accessed by both the GPU and the CPU, so you can use it directly (unlike device memory, which is only accessible by the GPU). In general, comparing the performance of different algorithms is always a bit tricky. We also have brute-force search modes available and can even import an HNSW graph for searching, if you want to test that (line 132 at commit 51f3056).
When using the "no slack" query, it should be quite similar to what HNSW does on the CPU side. Anyway, I hope this helps.
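Because managed memory is readable on the host once the GPU work has finished (for example after `cudaDeviceSynchronize()`), writing out the per-query distance counts can be plain host code. This is a minimal C++ sketch. It assumes the counts end up in a hypothetical array of one 32-bit counter per query. `DistStats` and `summarize_distance_counts` are illustrative names, not part of GGNN:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical summary of per-query distance counts.
struct DistStats {
  std::uint64_t min = 0;
  std::uint64_t max = 0;
  double mean = 0.0;
  std::string csv;  // one "query,distances" line per query
};

// `counts` stands for a (hypothetical) host-readable array with one counter per
// query, e.g. in managed memory after the kernels have completed.
DistStats summarize_distance_counts(const std::uint32_t* counts, std::size_t num_queries) {
  if (counts == nullptr || num_queries == 0)
    throw std::invalid_argument("need at least one query");
  DistStats s;
  s.min = s.max = counts[0];
  std::uint64_t total = 0;
  s.csv = "query,distances\n";
  for (std::size_t q = 0; q < num_queries; ++q) {
    s.csv += std::to_string(q) + ',' + std::to_string(counts[q]) + '\n';
    s.min = std::min<std::uint64_t>(s.min, counts[q]);
    s.max = std::max<std::uint64_t>(s.max, counts[q]);
    total += counts[q];  // 64-bit total avoids overflow across many queries
  }
  s.mean = static_cast<double>(total) / static_cast<double>(num_queries);
  return s;
}
```

The CSV string can then be written to a file next to the saved results, so distance counts and recall can be compared per query.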
Hello.
Ah, I see. The counter is initialized here:
but never incremented. All distances are computed here:
So you have to add something like `if (DIST_STATS && !threadIdx.x) ++dist_calc_counter;` after that line to actually count the distances.
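The `!threadIdx.x` guard matters if, as it suggests, all threads of a block cooperate on one distance computation. In that case every thread passes the same line, and an unguarded increment would count each distance once per thread. The following is a sequential C++ emulation (not CUDA) of that idea, under that assumption. GGNN's actual kernel may split the work differently. `cooperative_distance` and its `count_all_threads` flag are hypothetical, added only to show the difference:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Sequential emulation of one block of `block_size` threads computing a single
// squared-L2 distance together: each "thread" sums a strided slice of the
// dimensions, then the partial sums are reduced. Returns the distance.
float cooperative_distance(const std::vector<float>& a, const std::vector<float>& b,
                           int block_size, unsigned long long& dist_calc_counter,
                           bool count_all_threads = false) {
  if (block_size <= 0 || a.size() != b.size())
    throw std::invalid_argument("bad block size or dimension mismatch");
  std::vector<float> partial(static_cast<std::size_t>(block_size), 0.0f);
  for (int tid = 0; tid < block_size; ++tid) {
    for (std::size_t d = static_cast<std::size_t>(tid); d < a.size(); d += static_cast<std::size_t>(block_size)) {
      float diff = a[d] - b[d];
      partial[static_cast<std::size_t>(tid)] += diff * diff;
    }
  }
  float dist = 0.0f;
  for (float p : partial) dist += p;  // stands in for the block-wide reduction
  // Every "thread" reaches this point; mirror `if (DIST_STATS && !threadIdx.x)`
  // so the distance is counted once, unless the guard is deliberately dropped.
  for (int tid = 0; tid < block_size; ++tid)
    if (count_all_threads || tid == 0) ++dist_calc_counter;
  return dist;
}
```

With a block of 4 threads, the guarded counter advances by 1 per distance, while the unguarded variant advances by 4.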
I have a confusing problem: CHECK_CUDA(cudaPeekAtLastError()) fails on line 391 when building and querying with mydataset_stats (built from mydataset_stats.cu), while mydataset (built from mydataset.cu) works fine.
The "stats" and "non-stats" versions run completely independent code. The main reason for seeing different performance is probably different parameters during construction and query.
Best regards,
Thanks for your reply!
I am not familiar with CUDA programming.
How can I minimally modify the code to save the search results like groundtruth_ivecs?
Any help would be greatly appreciated!
Thanks!
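The ground truth already comes as .ivecs, so a small reader lets you load a results file written in the same format and compare it row by row. This is a minimal C++ sketch. It assumes native little-endian 32-bit storage and the same row length for every row. `IvecsData` and `read_ivecs` are hypothetical names, not part of GGNN:

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for an .ivecs file: all rows share one length `dim`.
struct IvecsData {
  std::int32_t dim = 0;           // entries per row (e.g. k)
  std::vector<std::int32_t> ids;  // row-major, rows() * dim entries
  std::size_t rows() const { return dim > 0 ? ids.size() / static_cast<std::size_t>(dim) : 0; }
};

IvecsData read_ivecs(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path);
  IvecsData data;
  std::int32_t d = 0;
  // Each row starts with its length, followed by that many 32-bit ints.
  while (in.read(reinterpret_cast<char*>(&d), sizeof d)) {
    if (d <= 0 || (data.dim != 0 && d != data.dim))
      throw std::runtime_error("inconsistent row length in " + path);
    data.dim = d;
    std::size_t old = data.ids.size();
    data.ids.resize(old + static_cast<std::size_t>(d));
    if (!in.read(reinterpret_cast<char*>(data.ids.data() + old),
                 static_cast<std::streamsize>(d) * static_cast<std::streamsize>(sizeof(std::int32_t))))
      throw std::runtime_error("truncated row in " + path);
  }
  return data;
}
```

Loading both the groundtruth file and your saved results this way gives two row-major ID arrays that can be compared directly.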