Save the search results like groundtruth_ivecs #9

zzzmm1 opened this issue Dec 7, 2022 · 11 comments

zzzmm1 commented Dec 7, 2022

I am not familiar with CUDA programming.
How can I minimally modify the code to save the search results like groundtruth_ivecs?
Any help would be greatly appreciated!
Thanks!

LukasRuppert (Collaborator)

Hi,

Once a query has been performed, all search results are stored on the CPU side for evaluating the accuracy.

You can see the layout of the data in the code.

To store the results, simply write out the data pointed to by ggnn_results.h_sorted_ids.
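
For illustration, here is a minimal sketch of dumping those ids in the same .ivecs layout as a ground-truth file (one int32 length followed by that many int32 ids per query). The names num_queries and KQuery are placeholders for your configuration, and the assumption that h_sorted_ids holds num_queries * KQuery consecutive int32 ids should be checked against the actual layout:

// Hedged sketch: write the sorted result ids in .ivecs format.
// Assumes h_sorted_ids holds num_queries * KQuery consecutive int32 ids.
#include <cstdint>
#include <cstdio>

void write_results_ivecs(const char* path, const int32_t* h_sorted_ids,
                         int32_t num_queries, int32_t KQuery) {
  FILE* f = std::fopen(path, "wb");
  if (!f) return;
  for (int32_t n = 0; n < num_queries; ++n) {
    std::fwrite(&KQuery, sizeof(int32_t), 1, f);                         // vector length
    std::fwrite(h_sorted_ids + n * KQuery, sizeof(int32_t), KQuery, f);  // ids
  }
  std::fclose(f);
}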

Best regards,
Lukas

zzzmm1 commented Dec 7, 2022

Thanks for your reply!
One more question.
Applying the default GIST-like search graph configuration (k=96, s=64, l=4, r=2, tau=0.4) to a dataset of 1 million points in 1024 dimensions with the cosine similarity metric, the search performance is not good enough (it cannot reach 99% R@1).
How should I tune the parameters, and would multiple GPUs help?

LukasRuppert (Collaborator) commented Dec 7, 2022

Well, there is tau_build and tau_query.
The higher tau_query, the better your search results should become, but it also increases the cost and there are diminishing returns on quality.
Try using tau_query somewhere between 1 and 2 for high accuracy.

Otherwise, increasing tau_build can get you somewhat better query performance in both speed and accuracy at the cost of construction time, and increasing k can increase accuracy but will slow everything down.

Multi-GPU can help a bit: because it shards the dataset, it reduces the number of important neighbors within each shard that we need to keep, so it has a similar effect to increasing k. But it's only a small improvement, I would guess.
Essentially, it just performs the search on all shards and then merges the search results at the end (so this also has the effect of having an increased k_query).
You can even run the multi-GPU mode on a single GPU, if you find that sharding produces better results for you.

Another thing you can try is changing the query parameters:
You can change the size of the k-best list of the search (k_query), which should have a similar effect to changing tau_query but at higher cost,
the size of the priority queue (which did not matter too much in our tests; see Table 6 in the arXiv paper),
and the number of search iterations (set the size of the visited list to a similar value, or the search might run in cycles).

If changing tau_query does not get you to 99% R@1,
the main parameters to boost performance are really k (the number of neighbors per point)
and the number of search iterations (with a matching visited-list size).
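
If you dump the result ids as described above, a quick way to check whether a parameter change gets you closer to 99% R@1 is to compare the first returned id of each query against the ground truth. This is only an illustrative helper, not GGNN's own evaluation code, and it treats R@1 as "first returned id equals first ground-truth id", ignoring ties by distance:

// Hedged sketch: approximate R@1 from result ids and ground-truth ids.
// results and groundtruth are row-major: one row of k ids per query.
#include <cstdint>

double recall_at_1(const int32_t* results, int32_t k_query,
                   const int32_t* groundtruth, int32_t k_gt,
                   int32_t num_queries) {
  int32_t hits = 0;
  for (int32_t n = 0; n < num_queries; ++n)
    if (results[n * k_query] == groundtruth[n * k_gt]) ++hits;
  return static_cast<double>(hits) / num_queries;
}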

zzzmm1 commented Dec 7, 2022

Thanks a lot!
I will try.

zzzmm1 commented Dec 12, 2022

Hi,
Is there any way to get the number of distances computed during the search to compare the acceleration ratio between GGNN and other graph search algorithms (like HNSW)?
Since one is GPU-based and one is CPU-based, it's not fair to compare the speed directly.
Or is there another metric to measure the acceleration ratio between GGNN and brute-force search?

LukasRuppert (Collaborator)

Hi,

That functionality exists:

d_dist_stats[n] = cache.get_dist_stats();

query_kernel.d_dist_stats = m_dist_statistics;

There is just currently no code that would print it out.
Also, it looks to me like the memory containing the number of distances computed is being freed before the computation is actually done (CUDA kernels run asynchronously in the background), so better not to rely on this one.
In any case, it is only computed if explicitly requested via the template parameter DIST_STATS: counting the number of distances computed is extra work that is not needed to actually execute the query, and resources are scarce on the GPU, so even adding a few bytes of state can in some cases slow things down considerably.

An alternative way to get statistics is running the "stats" query, which runs the same query but collects a lot of statistics as well, so it will be a bit slower:

typedef StatsQueryKernel<measure, ValueT, KeyT, D, KBuild, KF, KQuery, S, BLOCK_DIM_X, BaseT,

you can find an example of using it here:

m_ggnn.queryLayerDebug<32, 400, 448, 64>();

But you would again have to add your code to actually write out the number of distances per query.
It must have gotten lost during some round of refactoring we did on the code.

You should add that code at some point before this line:

cudaFree(m_dist_statistics);

Since it is "managed" memory, it can be accessed by both the GPU and CPU, so you can just use it (as opposed to device memory, which is only accessible by the GPU).
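
As a minimal sketch of what that could look like (num_queries is a placeholder for your actual query count, and m_dist_statistics is assumed to hold one integer counter per query, as above), something along these lines placed just before the cudaFree should work:

// Hedged sketch: print the per-query distance counters before freeing them.
// Synchronize first, since the query kernel runs asynchronously and may still
// be writing to the managed memory.
cudaDeviceSynchronize();
for (int n = 0; n < num_queries; ++n)
  printf("query %d: %lld distance computations\n", n,
         static_cast<long long>(m_dist_statistics[n]));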

In general, comparing the performance of different algorithms is always a bit tricky.
I would argue that the best way to compare would be to run both algorithms on the same hardware,
but, of course, one cannot do that when comparing CPU and GPU-based implementations.

We also have brute-force search modes available and can even import an HNSW graph for searching, if you want to test that:

HNSWLoader m_hnsw_loader(FLAGS_graph_filename);

When using the "no slack" query, the search should be quite similar to what HNSW does on the CPU side,
but that query is not as efficient on the GPU as our regular query implementation with tau_query and a significantly shorter k-best list.

Anyway, I hope this helps.
Best regards,
Lukas

zzzmm1 commented Dec 12, 2022

Hello.
I added code before line 469, both printing dist_stats and writing dist_stats to a binary file.
The results are all 0.
Also, the result_distance printed on line 459 is all 0, while the printed m_query_results are correct.
Is there something wrong? Where exactly is the code that counts the number of distance computations?

LukasRuppert (Collaborator)

Ah, I see.

The counter is initialized here:

if (DIST_STATS && !threadIdx.x) dist_calc_counter = 0;

but never incremented.

All distances are computed here:

const ValueT dist = rs_dist_calc.distance_synced(other_m);

So you have to add something like

if (DIST_STATS && !threadIdx.x) ++dist_calc_counter;

after that line to actually count the distances.
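
Pieced together, the DIST_STATS pattern then looks roughly like this (a sketch based on the lines quoted above, not a verbatim copy of the kernel):

// 1. initialize the counter once per block (already present):
if (DIST_STATS && !threadIdx.x) dist_calc_counter = 0;

// 2. increment it right after every distance computation (the missing line):
const ValueT dist = rs_dist_calc.distance_synced(other_m);
if (DIST_STATS && !threadIdx.x) ++dist_calc_counter;

// 3. write it out at the end of the query so it lands in the managed
//    statistics buffer (already present):
d_dist_stats[n] = cache.get_dist_stats();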

@zzzmm1
Copy link
Author

zzzmm1 commented Dec 13, 2022

I have a confusing problem: CHECK_CUDA(cudaPeekAtLastError()) fails on line 391 when building and querying with mydataset_stats (generated from mydataset_stats.cu), but mydataset (generated from mydataset.cu) works fine.
I have tried sift1m and sift1m_stats and both return similar results.
When using mydataset_stats with a pre-built graph file, it works, but returns worse results than the normal mydataset (c@1 0.70 vs. 0.93).
In addition, the average number of distance computations per query on a 1M dataset is only 20k; is that expected?
This is much less than HNSW.

LukasRuppert (Collaborator) commented Dec 13, 2022

the "stats" and "non-stats" versions run completely independent code.
when there is a CUDA error, that means one has to debug a bit more to figure out what kind of error this was and what might be the cause.
with just this information, I can hardly even begin to speculate what might be the issue.
I can also sadly not spend too much time on this.
But if you can give me the parameters for GGNN and your dataset, I might be able to try and reproduce it here.

The main reason for seeing different performance is probably different parameters during construction and query.
One has to be quite careful there, since there are a lot of parameters.
E.g., the stats version seems to have a different query configuration and also does not run the query with the same parameters (simply look at the diff between the two).
For the number of distance computations per query, I don't have any numbers right now, but we can compute the worst case:
For every visited point (one per iteration), we compute the distance to all of its neighbors (k); cached distances are skipped.
So if we have the query configured to at most 400 iterations and the search graph configured with k=24 for SIFT1M, then there should be no more than 400*24 = 9600 distances computed.
In addition to these iterations, we also compute the distances to all s = 32 top-layer points once, but that is almost negligible in comparison.
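
For a quick sanity check against your own configuration, the same bound can be written down directly (illustrative constants only; plug in your own iteration count, k, and s):

// Worst-case bound on distance computations per query, as derived above:
// k neighbor distances per visited point (one visited point per iteration),
// plus one pass over the s top-layer points.
constexpr int max_iterations = 400;  // query iterations (SIFT1M example)
constexpr int k_build = 24;          // neighbors per point in the search graph
constexpr int s = 32;                // top-layer points
constexpr int worst_case_distances = max_iterations * k_build + s;  // 9632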

Best regards,
Lukas

@zzzmm1
Copy link
Author

zzzmm1 commented Dec 15, 2022

Thanks for your reply!
I am using the same build and query parameters in both the "stats" and "non-stats" versions: the "stats" version gives CUDA errors when both use the <32, 400, 448, 64> parameters, and both work fine when both use the <128, 2000, 2048, 64> parameters.
I adjusted each of these parameters for comparison and found that it was BLOCK_DIM_X that matters, probably because my dataset has 1024 dimensions, closer to the GIST dataset, but uses the cosine distance metric.
Also, does each iteration refer to visiting one node? In my experiments, MAX_ITERATIONS had the most noticeable effect on search performance when searching on the same built graph.
How should I set "base, factor and shard" when using multiple GPUs if the dataset size N_base is not divisible by 2?
