A multiple-choice benchmark for evaluating LLM performance on PyChrono API usage

uwsbel/PyChronoBench

PyChronoBench

PyChronoBench is a multiple-choice benchmark designed to evaluate the performance of large language models (LLMs) in using the PyChrono API. The benchmark consists of 280 multiple-choice problems that cover various aspects of PyChrono API usage.

[Figure: LLM performance plot]

Features

  • A collection of real-world API-related questions to assess LLM capabilities in understanding and applying the PyChrono library.
  • Each question is a self-contained record with a fixed answer format, so model responses can be scored automatically.

Format

Each question in PyChronoBench follows a simple structure:

  • Instruction: The question presented as a prompt, with multiple-choice options.
  • Input: Left as an empty string; no additional input is required.
  • Output: The correct answer(s) enclosed in double brackets.

Here are some examples:

{
    "instruction": "What method is used to set the friction coefficient for a contact material in PyChrono? 'A. material.SetFriction(value)', 'B. material.SetFrictionCoefficient(value)', 'C. material.SetFrictionValue(value)', 'D. material.SetFrictionFactor(value)'",
    "input": "",
    "output": "[[A]]"
},
{
    "instruction": "How do you add a body to the simulation in PyChrono? 'A. sys.AddBody(body)', 'B. sys.Add(body)', 'C. sys.Insert(body)', 'D. sys.AddObject(body)'",
    "input": "",
    "output": "[[B]]"
}
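
As a minimal sketch of how a record in this format might be read, the snippet below pulls the option letters out of the double brackets and splits the quoted options out of the instruction. The helper names `parse_answer` and `parse_options` are hypothetical and not part of the repository, and the regular expressions assume that options are always quoted as 'X. text' with letters A through D, as in the examples above.

```python
import re

# One record copied from the examples above.
record = {
    "instruction": "How do you add a body to the simulation in PyChrono? "
                   "'A. sys.AddBody(body)', 'B. sys.Add(body)', "
                   "'C. sys.Insert(body)', 'D. sys.AddObject(body)'",
    "input": "",
    "output": "[[B]]",
}


def parse_answer(text):
    """Return the set of option letters found inside [[...]] (hypothetical helper)."""
    match = re.search(r"\[\[(.*?)\]\]", text)
    if match is None:
        return set()
    return set(re.findall(r"[A-D]", match.group(1)))


def parse_options(instruction):
    """Map each option letter to its text.

    Assumes every option is written as 'X. text' and that options are
    separated by commas, as in the example records.
    """
    pattern = r"'([A-D])\. (.*?)'(?=,\s*'[A-D]\.|\s*$)"
    return dict(re.findall(pattern, instruction))


answer = parse_answer(record["output"])
options = parse_options(record["instruction"])
for letter in sorted(answer):
    print(letter, options[letter])
```

Returning a set of letters rather than a single character leaves room for questions with more than one correct answer, which the "answer(s)" wording in the format description suggests may exist.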

Usage

PyChronoBench is designed to be easy to integrate into your LLM evaluation pipeline, offering standardized and consistent questions to measure PyChrono API knowledge and decision-making.
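
One possible way to wire the benchmark into an evaluation loop is sketched below. It is an illustration under stated assumptions, not the repository's own scoring script: `extract_letters`, `accuracy`, and the `always_a` stand-in model are hypothetical names, the model is assumed to reply with its choice in double brackets, and a response counts as correct only when its letters match the reference exactly.

```python
import re


def extract_letters(text):
    """Return the set of option letters inside [[...]], or an empty set."""
    match = re.search(r"\[\[(.*?)\]\]", text)
    return set(re.findall(r"[A-D]", match.group(1))) if match else set()


def accuracy(records, ask_model):
    """Fraction of records whose predicted letters exactly match the reference.

    `ask_model` is any callable that takes the instruction string and
    returns the model's raw text response (hypothetical interface).
    """
    if not records:
        return 0.0
    correct = sum(
        extract_letters(ask_model(r["instruction"])) == extract_letters(r["output"])
        for r in records
    )
    return correct / len(records)


def always_a(prompt):
    """Stand-in for a real LLM call: always answers A."""
    return "[[A]]"


# The two example records from the Format section.
records = [
    {
        "instruction": "What method is used to set the friction coefficient for a contact material in PyChrono? "
                       "'A. material.SetFriction(value)', 'B. material.SetFrictionCoefficient(value)', "
                       "'C. material.SetFrictionValue(value)', 'D. material.SetFrictionFactor(value)'",
        "input": "",
        "output": "[[A]]",
    },
    {
        "instruction": "How do you add a body to the simulation in PyChrono? "
                       "'A. sys.AddBody(body)', 'B. sys.Add(body)', "
                       "'C. sys.Insert(body)', 'D. sys.AddObject(body)'",
        "input": "",
        "output": "[[B]]",
    },
]

score = accuracy(records, always_a)
print(f"accuracy: {score:.2f}")
```

Replacing `always_a` with a function that calls an actual model, and `records` with the full question set loaded from the repository's data file, would give a benchmark score. A response with no double brackets yields an empty set and is therefore scored as incorrect.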
