[Retrieval] request to release the idx (idx in corpus) of each candidate code. #10

Open
izhx opened this issue Jan 8, 2024 · 2 comments


izhx commented Jan 8, 2024

Currently, each entry in positive_code and negative_code contains only the code string. Could you also release the corpus idx of each candidate and, if possible, the corresponding corpus filename?

This would make the data much easier to use in tools like beir, which reference documents by id.

Thanks.


Current format (e.g. retrieval_code_code/validation/Java_code_code_dev_file.jsonl):

    "positive_code": [
        {"source_code": "xxxx"}, 
        ....
    ],
    "negative_code": [
        {"source_code": "yyyy"}, 
        ....
    ],

Expected format:

    "positive_code": [
        {"idx": "x", "file_name": "java.jsonl", "source_code": "xxxx"}, 
        ....
    ],
    "negative_code": [
        {"idx": "y", "file_name": "java.jsonl", "source_code": "yyyy"}, 
        ....
    ],
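Until the ids are available, one possible workaround is to recover them by exact string match against the corpus. The sketch below is only an illustration: it assumes the corpus is a JSONL file whose rows carry `idx` and `source_code` fields (field names taken from the expected format above, not confirmed against the released files), and the helper names `read_jsonl`, `build_code_index` and `annotate_candidates` are hypothetical. Candidates with no exact match get `None`.

```python
import json
from typing import Dict, Iterable, Iterator, Optional, Tuple


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def build_code_index(
    corpus_rows: Iterable[dict], file_name: str
) -> Dict[str, Tuple[str, str]]:
    """Map each code string to (idx, file_name).

    Assumes each row has "idx" and "source_code" keys (an assumption).
    If the same code string appears twice, the first occurrence wins.
    """
    index: Dict[str, Tuple[str, str]] = {}
    for row in corpus_rows:
        index.setdefault(row["source_code"], (str(row["idx"]), file_name))
    return index


def annotate_candidates(
    record: dict, index: Dict[str, Tuple[str, str]]
) -> dict:
    """Return a copy of record whose candidates also carry idx and file_name."""
    missing: Tuple[Optional[str], Optional[str]] = (None, None)
    out = dict(record)
    for key in ("positive_code", "negative_code"):
        annotated = []
        for cand in record.get(key, []):
            idx, file_name = index.get(cand["source_code"], missing)
            annotated.append(
                {
                    "idx": idx,
                    "file_name": file_name,
                    "source_code": cand["source_code"],
                }
            )
        out[key] = annotated
    return out
```

Exact matching is fragile if the candidate strings were normalized differently from the corpus (whitespace, encoding), and duplicate code strings in the corpus make the mapping ambiguous, which is why ids released with the data would still be preferable.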
Author

izhx commented Jan 8, 2024

BTW, this benchmark is excellent work.

@Jackal1586
Collaborator

Thanks for the suggestion. We are looking into the possibility of making these changes. Please bear with us for a few days.
