This repository has been archived by the owner on May 1, 2020. It is now read-only.

Got wrong result by using simple python code in sift1m dataset #18

Open
FartFang opened this issue Jul 10, 2018 · 2 comments

@FartFang

I rewrote example.py, replacing the input dataset with 'sift1m', and I got a result that looks like a wrong evaluation:

Recall (V=16, M=8, subquants=256): [0.2018 0.4247 0.5168 0.5218]
Recall (V=16, M=16, subquants=256): [0.3124 0.5057 0.5218 0.5218]
Recall (V=16, M=8, subquants=512): [0.2219 0.4477 0.5198 0.5218]
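(For reference, I read these four numbers as recall@R at increasing cutoffs R, i.e. the fraction of queries whose true nearest neighbor shows up in the top R retrieved ids, as in the example script. A minimal pure-Python sketch of that metric, independent of lopq; the function name and toy data are my own:)

```python
def recall_at_r(true_nn, retrieved, rs=(1, 10, 100, 1000)):
    """For each cutoff R, return the fraction of queries whose true
    nearest-neighbor id appears among the top-R retrieved ids."""
    recalls = []
    for r in rs:
        hits = sum(1 for t, ids in zip(true_nn, retrieved) if t in ids[:r])
        recalls.append(hits / float(len(true_nn)))
    return recalls

# toy example: 3 queries, each with a ranked list of retrieved ids
true_nn = [5, 7, 9]
retrieved = [[5, 2, 3], [1, 7, 4], [8, 6, 0]]
print(recall_at_r(true_nn, retrieved, rs=(1, 2, 3)))  # [0.333..., 0.666..., 0.666...]
```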

And I also got an error when I tried to use the GIST1M dataset:

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 266, in _feed
    send(obj)
SystemError: NULL result without error in PyObject_Call

Is the code in the python folder not intended for large datasets like SIFT?
Or is there some mistake in my data-import process?

Looking forward to your reply.
Thanks a lot!

@pumpikano
Collaborator

pumpikano commented Jul 11, 2018

Regarding the sift1m dataset, I can't tell if anything is wrong from those numbers, but I think it is probably simply that the quantization is too coarse for a dataset with the size/complexity of sift1m. From my notes, I found that my recall was around [~0.40 ~0.85 ~0.98 ~0.98] with V=1024, M=8, subquants=256. Fitting the quantizers in this case might take a while (an hour or two) on a single machine.

I have not tried gist1m, but it could be a memory issue (cf. https://bugs.python.org/issue17560). I can't tell from the info that you shared where in the program this is happening, though. If it is during index building, an easy thing to try is to parallelize index building more by increasing num_procs: https://github.com/yahoo/lopq/blob/master/python/lopq/search.py#L85
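(If the failure really is that Python 2 bug, it is triggered by pickling a single object larger than ~2 GB in the queue's feeder thread. One generic workaround, sketched here and not part of lopq, is to enqueue large data in smaller chunks so no single pickle approaches that limit; the helper names below are my own:)

```python
import multiprocessing

def put_in_chunks(queue, items, chunk_size=100000):
    """Enqueue a large list in smaller pieces so no single pickled
    object approaches the ~2 GB multiprocessing send limit
    (https://bugs.python.org/issue17560 in Python 2)."""
    for start in range(0, len(items), chunk_size):
        queue.put(items[start:start + chunk_size])
    queue.put(None)  # sentinel marking end of stream

def drain(queue):
    """Reassemble the chunks on the consumer side."""
    items = []
    while True:
        chunk = queue.get()
        if chunk is None:
            break
        items.extend(chunk)
    return items
```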

In any case, the code in python/ assumes that the full dataset fits in memory. This assumption would need to be changed if that is not the case for you (or try the Spark code instead).

@FartFang
Author

Thanks a lot! I changed V=16 to V=1024, and the result looks more correct than last time.
BTW, where is the API for changing the value of w from the paper?
And what is the default value of w in your implementation?
@pumpikano
