Vicuna-13B results #24

Open · yzc111 opened this issue Mar 8, 2024 · 12 comments

yzc111 commented Mar 8, 2024

Hello, when I reproduce the results on Vicuna-13B and Llama-2-7B, I cannot get any model output, and the code prints the warning: "Prompt exceeds max length and return an empty string as answer. If this happens too many times, it is suggested to make the prompt shorter". How should I deal with this? Thank you~

gaotianyu1350 (Member) commented

Hi,

Which config are you using? The Vicuna and Llama-2 models have a 4k context window, which limits how many passages you can include in the prompt.
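For reference, here is a minimal sketch (not part of the ALCE code; the model id and file name are illustrative assumptions) of how to check whether a rendered prompt fits in that window:

```python
# Minimal sketch, not ALCE code: count a rendered prompt's tokens
# against the ~4k context window. The model id and file name are
# illustrative assumptions.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-13b-v1.5")

with open("rendered_prompt.txt") as f:
    prompt = f.read()

n_tokens = len(tokenizer(prompt)["input_ids"])
max_len = 4096  # Vicuna v1.5 / Llama-2 context window

if n_tokens >= max_len:
    print(f"{n_tokens} tokens: exceeds the {max_len}-token window; "
          "reduce shot/ndoc or use a shorter (light) instruction.")
else:
    print(f"{n_tokens} tokens: fits, with {max_len - n_tokens} tokens to spare.")
```

If the count is near the limit, lowering `ndoc` or switching to the light-instruction prompt (shorter instructions) is usually enough.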


yzc111 commented Mar 12, 2024

Hi, thank you for your reply. The config is 2-shot with ndoc: 3.

gaotianyu1350 (Member) commented

Did you use the "light instruction" version as well?


yzc111 commented Mar 12, 2024

No, I just used the default setting.

gaotianyu1350 (Member) commented

Can you try this config (but change the model name): https://github.com/princeton-nlp/ALCE/blob/main/configs/asqa_alpaca-7b_shot2_ndoc3_gtr_light_inst.yaml


yzc111 commented Mar 12, 2024

OK, thanks~


yzc111 commented Mar 13, 2024

Another question: when I use this setting to reproduce the result

```yaml
prompt_file: prompts/asqa_light_inst.json
eval_file: data/asqa_eval_gtr_top100.json
shot: 2
ndoc: 3
dataset_name: asqa
tag: gtr_light_inst
model: vicuna-13b
temperature: 1.0
top_p: 0.95
```

I get QA-EM = 19.7 and MAUVE = 70.7, while the paper reports EM = 31.9 and MAUVE = 82.6. Are there any different settings in the config file?

howard-yen (Collaborator) commented

Note that there is a difference between EM and QA-EM; we report EM in the paper. Can you post the full output or the .score file? Can you also post a link to the Vicuna model you are using? There are a couple of different versions with different performance.


yzc111 commented Mar 18, 2024

Hi, this is the config we used to reproduce the result on Vicuna-13B:

```yaml
prompt_file: prompts/asqa_light_inst.json
eval_file: data/asqa_eval_gtr_top100.json
shot: 2
ndoc: 3
dataset_name: asqa
tag: gtr_light_inst
model: /work/models/vicuna-13b
temperature: 1.0
top_p: 0.95
```


yzc111 commented Apr 2, 2024

So, how can I get the EM score reported in your paper?

gaotianyu1350 (Member) commented

That is "str_em".
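If it helps, you can read it out of the `.score` file directly (a minimal sketch, assuming the `.score` file is the JSON the eval script writes; the file name below is an illustrative assumption):

```python
# Minimal sketch: read the paper's EM ("str_em") from the .score
# file, assuming it is JSON written by the eval script. The file
# name here is an illustrative assumption.
import json

with open("result/asqa-vicuna-13b-gtr_light_inst-shot2-ndoc3.json.score") as f:
    scores = json.load(f)

print("str_em:", scores["str_em"])  # the EM column reported in the paper
```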


yzc111 commented Apr 2, 2024

Fine, thanks!
