Is the code for the CritiqueLLM evaluation service at https://llmbench.ai/align/submit available? #9

Open
IcyFeather233 opened this issue Jul 8, 2024 · 0 comments

I'd like to know how to take a CSV file of LLM inference results and have CritiqueLLM score and evaluate it, the same way the service at https://llmbench.ai/align/submit does.
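
For concreteness, the workflow I have in mind is roughly the following sketch (the CSV columns, prompt wording, and generation call are my own placeholders, not the actual service code):

import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM

# Hypothetical CSV of inference results; column names are assumptions.
df = pd.read_csv("model_outputs.csv")  # columns: question, reference, answer

MODEL_PATH = "/xxx/models/CritiqueLLM"
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, trust_remote_code=True, device_map="auto")

critiques = []
for row in df.itertuples():
    # Placeholder judge prompt -- the exact template used by the website is what I am asking about.
    prompt = (
        f"[Question]\n{row.question}\n\n[Reference Answer]\n{row.reference}\n\n"
        f"[Assistant Answer]\n{row.answer}\n\nPlease give a critique and a final score."
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1, top_p=0.9)
    critiques.append(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

df["critique"] = critiques
df.to_csv("critiquellm_scored.csv", index=False)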

Also, when I ran inference evaluation with CritiqueLLM through OpenCompass, only 616 of the 683 items were parsed successfully, so it seems CritiqueLLM's instruction following for the output format is not very strong. What prompt and parsing method does https://llmbench.ai/align/submit use?
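
My current workaround for the parse failures is a more tolerant post-processing step along these lines (a sketch, assuming the judge is prompted to end its reply with a dict of dimension scores; the real prompt and output format on the website may differ):

import ast
import re

def extract_scores(judge_output: str):
    """Pull the trailing {...} score dict out of a CritiqueLLM reply, if any."""
    # Look for dict-like spans and try the last one first.
    for candidate in reversed(re.findall(r"\{[^{}]*\}", judge_output, flags=re.S)):
        try:
            parsed = ast.literal_eval(candidate)
        except (ValueError, SyntaxError):
            continue
        if isinstance(parsed, dict) and parsed:
            return parsed
    # Fall back to a bare trailing number such as "score: 7".
    number = re.search(r"(\d+(?:\.\d+)?)\s*$", judge_output.strip())
    return {"overall": float(number.group(1))} if number else None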

My OpenCompass config is attached below:

from mmengine.config import read_base

with read_base():
    from .datasets.subjective.alignbench.alignbench_judgeby_critiquellm import alignbench_datasets

from opencompass.models import HuggingFaceCausalLM, HuggingFace, HuggingFaceChatGLM3, OpenAI
from opencompass.models.openai_api import OpenAIAllesAPIN
from opencompass.partitioners import NaivePartitioner, SizePartitioner
from opencompass.partitioners.sub_naive import SubjectiveNaivePartitioner
from opencompass.partitioners.sub_size import SubjectiveSizePartitioner
from opencompass.runners import LocalRunner
from opencompass.runners import SlurmSequentialRunner
from opencompass.tasks import OpenICLInferTask
from opencompass.tasks.subjective_eval import SubjectiveEvalTask
from opencompass.summarizers import AlignmentBenchSummarizer

# -------------Inference Stage ----------------------------------------
# For subjective evaluation, we usually enable sampling (do_sample) for the models
from opencompass.models import VLLM

_meta_template = dict(
    round=[
        dict(role="HUMAN", begin='<|im_start|>user\n', end='<|im_end|>\n'),
        dict(role="BOT", begin="<|im_start|>assistant\n", end='<|im_end|>\n', generate=True),
    ],
    eos_token_id=151645,
)

GPU_NUMS = 4


stop_list = ['<|im_end|>', '</s>', '<|endoftext|>']

models = [
    dict(
        type=VLLM,
        abbr='xxx',
        path='xxx',
        model_kwargs=dict(tensor_parallel_size=GPU_NUMS, disable_custom_all_reduce=True, enforce_eager=True),
        meta_template=_meta_template,
        max_out_len=1024,
        max_seq_len=2048,
        batch_size=GPU_NUMS * 8,
        generation_kwargs=dict(temperature=0.1, top_p=0.9, skip_special_tokens=False, stop=stop_list),
        stop_words=stop_list,
        run_cfg=dict(num_gpus=GPU_NUMS, num_procs=1),
    )
]

datasets = [*alignbench_datasets]

# -------------Evaluation Stage ----------------------------------------

## ------------- JudgeLLM Configuration


api_meta_template = dict(
    round=[
            dict(role='HUMAN', api_role='HUMAN'),
            dict(role='BOT', api_role='BOT', generate=True),
    ],
)

judge_models = [
    dict(
        type=VLLM,
        abbr='CritiqueLLM',
        path='/xxx/models/CritiqueLLM',
        model_kwargs=dict(tensor_parallel_size=GPU_NUMS, disable_custom_all_reduce=True, enforce_eager=True),
        meta_template=_meta_template,
        max_out_len=1024,
        max_seq_len=2048,
        batch_size=GPU_NUMS * 8,
        generation_kwargs=dict(temperature=0.1, top_p=0.9, skip_special_tokens=False, stop=stop_list),
        run_cfg=dict(num_gpus=GPU_NUMS, num_procs=1),
    )
]

## ------------- Evaluation Configuration
eval = dict(
    partitioner=dict(type=SubjectiveNaivePartitioner, models=models, judge_models=judge_models),
    runner=dict(type=LocalRunner, max_num_workers=16, task=dict(type=SubjectiveEvalTask)),
)

summarizer = dict(type=AlignmentBenchSummarizer)

work_dir = 'outputs/alignment_bench/'
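
I save this as configs/eval_alignbench_critiquellm.py (filename is my own) and launch it with OpenCompass's usual entry point:

python run.py configs/eval_alignbench_critiquellm.py

The 616/683 parse rate mentioned above comes from the evaluation stage of this run.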