This repository has been archived by the owner on May 1, 2020. It is now read-only.

Got wrong result by using simple python code in sift1m dataset #18

Open
FartFang opened this issue Jul 10, 2018 · 2 comments

@FartFang

I rewrote example.py, replacing the input dataset with 'sift1m', and I got a result that looks like a wrong evaluation:

Recall (V=16, M=8, subquants=256): [0.2018 0.4247 0.5168 0.5218]
Recall (V=16, M=16, subquants=256): [0.3124 0.5057 0.5218 0.5218]
Recall (V=16, M=8, subquants=512): [0.2219 0.4477 0.5198 0.5218]
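(For reference, I read these four numbers as recall@R at increasing cutoffs R, i.e. the fraction of queries whose true nearest neighbor shows up in the top R retrieved ids, as in the example script. A minimal pure-Python sketch of that metric, independent of lopq; the function name and toy data are my own:)

```python
def recall_at_r(true_nn, retrieved, rs=(1, 10, 100, 1000)):
    """For each cutoff R, return the fraction of queries whose true
    nearest-neighbor id appears among the top-R retrieved ids."""
    recalls = []
    for r in rs:
        hits = sum(1 for t, ids in zip(true_nn, retrieved) if t in ids[:r])
        recalls.append(hits / float(len(true_nn)))
    return recalls

# toy example: 3 queries, each with a ranked list of retrieved ids
true_nn = [5, 7, 9]
retrieved = [[5, 2, 3], [1, 7, 4], [8, 6, 0]]
print(recall_at_r(true_nn, retrieved, rs=(1, 2, 3)))  # [0.333..., 0.666..., 0.666...]
```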

And I also got an error when I tried to use the GIST1M dataset:

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 266, in _feed
    send(obj)
SystemError: NULL result without error in PyObject_Call

Is the code in the python folder not intended for large datasets like SIFT?
Or is there some mistake in my data-import process?

Looking forward to your reply.
Thanks a lot!

@pumpikano
Collaborator

pumpikano commented Jul 11, 2018

Regarding the sift1m dataset, I can't tell if anything is wrong from those numbers, but I think it is probably simply that the quantization is too coarse for a dataset with the size/complexity of sift1m. From my notes, I found that my recall was around [~0.40 ~0.85 ~0.98 ~0.98] with V=1024, M=8, subquants=256. Fitting the quantizers in this case might take a while (an hour or two) on a single machine.

I have not tried gist1m, but it could be a memory issue (cf. https://bugs.python.org/issue17560). I can't tell from the info that you shared where in the program this is happening, though. If it is during index building, an easy thing to try is to parallelize index building more by increasing num_procs: https://github.com/yahoo/lopq/blob/master/python/lopq/search.py#L85
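(If the failure really is that Python 2 bug, it is triggered by pickling a single object larger than ~2 GB in the queue's feeder thread. One generic workaround, sketched here and not part of lopq, is to enqueue large data in smaller chunks so no single pickle approaches that limit; the helper names below are my own:)

```python
import multiprocessing

def put_in_chunks(queue, items, chunk_size=100000):
    """Enqueue a large list in smaller pieces so no single pickled
    object approaches the ~2 GB multiprocessing send limit
    (https://bugs.python.org/issue17560 in Python 2)."""
    for start in range(0, len(items), chunk_size):
        queue.put(items[start:start + chunk_size])
    queue.put(None)  # sentinel marking end of stream

def drain(queue):
    """Reassemble the chunks on the consumer side."""
    items = []
    while True:
        chunk = queue.get()
        if chunk is None:
            break
        items.extend(chunk)
    return items
```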

In any case, the code in python/ assumes that the full dataset fits in memory. This assumption would need to be changed if that is not the case for you (or try the Spark code instead).

@FartFang
Author

Thanks a lot! I changed V=16 to V=1024, and the result looks more correct than last time.
BTW, where is the API for changing the value of w from the paper?
And what is the default value of w in your implementation?
@pumpikano
