Is the code for the CritiqueLLM evaluation service at https://llmbench.ai/align/submit available? #9

Open
IcyFeather233 opened this issue Jul 8, 2024 · 0 comments

I'd like to know how to take a CSV file of LLM inference results and have CritiqueLLM score and evaluate it, the same way the service at https://llmbench.ai/align/submit does.
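
For concreteness, the workflow I have in mind is roughly the following sketch (the CSV columns, prompt wording, and generation call are my own placeholders, not the actual service code):

import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical CSV of inference results; column names are assumptions.
df = pd.read_csv("model_outputs.csv")  # columns: question, reference, answer

MODEL_PATH = "/xxx/models/CritiqueLLM"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, trust_remote_code=True, device_map="auto")

critiques = []
for row in df.itertuples():
    # Placeholder judge prompt -- the exact template used by the website is what I am asking about.
    prompt = (
        f"[Question]\n{row.question}\n\n[Reference Answer]\n{row.reference}\n\n"
        f"[Assistant Answer]\n{row.answer}\n\nPlease give a critique and a final score."
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1, top_p=0.9)
    critiques.append(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

df["critique"] = critiques
df.to_csv("critiquellm_scored.csv", index=False)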

Also, when I ran inference evaluation with CritiqueLLM through OpenCompass, only 616 of the 683 items were parsed successfully, so it seems CritiqueLLM's instruction following for the output format is not very strong. What prompt and parsing method does https://llmbench.ai/align/submit use?
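
My current workaround for the parse failures is a more tolerant post-processing step along these lines (a sketch, assuming the judge is prompted to end its reply with a dict of dimension scores; the real prompt and output format on the website may differ):

import ast
import re

def extract_scores(judge_output: str):
    """Pull the trailing {...} score dict out of a CritiqueLLM reply, if any."""
    # Look for dict-like spans and try the last one first.
    for candidate in reversed(re.findall(r"\{[^{}]*\}", judge_output, flags=re.S)):
        try:
            parsed = ast.literal_eval(candidate)
        except (ValueError, SyntaxError):
            continue
        if isinstance(parsed, dict) and parsed:
            return parsed
    # Fall back to a bare trailing number such as "score: 7".
    number = re.search(r"(\d+(?:\.\d+)?)\s*$", judge_output.strip())
    return {"overall": float(number.group(1))} if number else None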

My OpenCompass config is attached below:

from mmengine.config import read_base

with read_base():
    from .datasets.subjective.alignbench.alignbench_judgeby_critiquellm import alignbench_datasets

from opencompass.models import HuggingFaceCausalLM, HuggingFace, HuggingFaceChatGLM3, OpenAI
from opencompass.models.openai_api import OpenAIAllesAPIN
from opencompass.partitioners import NaivePartitioner, SizePartitioner
from opencompass.partitioners.sub_naive import SubjectiveNaivePartitioner
from opencompass.partitioners.sub_size import SubjectiveSizePartitioner
from opencompass.runners import LocalRunner
from opencompass.runners import SlurmSequentialRunner
from opencompass.tasks import OpenICLInferTask
from opencompass.tasks.subjective_eval import SubjectiveEvalTask
from opencompass.summarizers import AlignmentBenchSummarizer

# -------------Inference Stage ----------------------------------------
# For subjective evaluation, we usually enable sampling (do_sample) for the models
from opencompass.models import VLLM

_meta_template = dict(
    round=[
        dict(role="HUMAN", begin='<|im_start|>user\n', end='<|im_end|>\n'),
        dict(role="BOT", begin="<|im_start|>assistant\n", end='<|im_end|>\n', generate=True),
    ],
    eos_token_id=151645,
)

GPU_NUMS = 4


stop_list = ['<|im_end|>', '</s>', '<|endoftext|>']

models = [
    dict(
        type=VLLM,
        abbr='xxx',
        path='xxx',
        model_kwargs=dict(tensor_parallel_size=GPU_NUMS, disable_custom_all_reduce=True, enforce_eager=True),
        meta_template=_meta_template,
        max_out_len=1024,
        max_seq_len=2048,
        batch_size=GPU_NUMS * 8,
        generation_kwargs=dict(temperature=0.1, top_p=0.9, skip_special_tokens=False, stop=stop_list),
        stop_words=stop_list,
        run_cfg=dict(num_gpus=GPU_NUMS, num_procs=1),
    )
]

datasets = [*alignbench_datasets]

# -------------Evaluation Stage ----------------------------------------

## ------------- JudgeLLM Configuration


api_meta_template = dict(
    round=[
            dict(role='HUMAN', api_role='HUMAN'),
            dict(role='BOT', api_role='BOT', generate=True),
    ],
)

judge_models = [
    dict(
        type=VLLM,
        abbr='CritiqueLLM',
        path='/xxx/models/CritiqueLLM',
        model_kwargs=dict(tensor_parallel_size=GPU_NUMS, disable_custom_all_reduce=True, enforce_eager=True),
        meta_template=_meta_template,
        max_out_len=1024,
        max_seq_len=2048,
        batch_size=GPU_NUMS * 8,
        generation_kwargs=dict(temperature=0.1, top_p=0.9, skip_special_tokens=False, stop=stop_list),
        run_cfg=dict(num_gpus=GPU_NUMS, num_procs=1),
    )
]

## ------------- Evaluation Configuration
eval = dict(
    partitioner=dict(type=SubjectiveNaivePartitioner, models=models, judge_models=judge_models),
    runner=dict(type=LocalRunner, max_num_workers=16, task=dict(type=SubjectiveEvalTask)),
)

summarizer = dict(type=AlignmentBenchSummarizer)

work_dir = 'outputs/alignment_bench/'
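
I save this as configs/eval_alignbench_critiquellm.py (filename is my own) and launch it with OpenCompass's usual entry point:

python run.py configs/eval_alignbench_critiquellm.py

The 616/683 parse rate mentioned above comes from the evaluation stage of this run.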