TRI-ML/japanese-llm-ranking

jrank: Ranking Japanese LLMs

| Ranking | Blog | Discord |

This repository supports YuzuAI's Rakuda leaderboard of Japanese LLMs, a Japanese-focused analogue of LMSYS's Vicuna evaluation.

Adding a model to Rakuda

To add a model to the Rakuda leaderboard, first have the model answer the Rakuda questions. These questions are stored in jrank/questions/ and on Hugging Face.
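As a rough illustration of this step, the sketch below reads a JSONL question file, passes each question to a model, and writes one JSON answer per line. It assumes each question line has "question_id" and "text" fields and that answers are stored with "question_id", "model_id" and "answer" fields; these field names are hypothetical, so check the actual files in jrank/questions/ before relying on them. `generate` stands in for whatever function calls your model.

```python
import json
from pathlib import Path
from typing import Callable


def answer_questions(question_file, answer_file, model_id, generate: Callable[[str], str]):
    """Answer every question in a JSONL file and write the answers as JSONL.

    Hypothetical field names: questions carry "question_id" and "text";
    answers are written with "question_id", "model_id" and "answer".
    """
    answers = []
    with Path(question_file).open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            q = json.loads(line)
            answers.append({
                "question_id": q["question_id"],
                "model_id": model_id,
                "answer": generate(q["text"]),
            })
    with Path(answer_file).open("w", encoding="utf-8") as f:
        for a in answers:
            # ensure_ascii=False keeps Japanese text readable in the output file
            f.write(json.dumps(a, ensure_ascii=False) + "\n")
    return answers
```

Passing the model as a plain callable keeps the loop independent of how the model is served, whether it runs locally or sits behind an API.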

You can optionally use the jrank/get_model_qa.py script to generate these answers. This script loads and runs models using FastChat's model adapters. Custom adapters can be implemented in jrank/adapters.py, and scripts containing the exact commands used to run the models already on the leaderboard are stored in jrank/jobs/. If your model is only accessible through an API, see jrank/get_gpt_qa.py instead.
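The idea behind an adapter is that each one recognises certain model paths and knows how to prompt those models, and the first registered adapter that matches is used. The sketch below imitates that dispatch pattern in plain Python. It does not import FastChat: real FastChat adapters subclass FastChat's BaseModelAdapter and have a richer interface, so every class and function name here (ModelAdapter, ExampleJapaneseChatAdapter, register_adapter, get_adapter) and the prompt format are hypothetical.

```python
class ModelAdapter:
    """Hypothetical base class: decides which models it handles and how to prompt them."""

    def match(self, model_path: str) -> bool:
        raise NotImplementedError

    def build_prompt(self, question: str) -> str:
        raise NotImplementedError


class ExampleJapaneseChatAdapter(ModelAdapter):
    # Hypothetical adapter for models whose path contains "example-ja-chat".
    def match(self, model_path):
        return "example-ja-chat" in model_path.lower()

    def build_prompt(self, question):
        # Hypothetical chat template; real models each have their own format.
        return f"ユーザー: {question}\nシステム: "


class FallbackAdapter(ModelAdapter):
    # Matches everything, so it must be registered last.
    def match(self, model_path):
        return True

    def build_prompt(self, question):
        return question


ADAPTERS = []


def register_adapter(adapter):
    ADAPTERS.append(adapter)


def get_adapter(model_path):
    """Return the first registered adapter whose match() accepts model_path."""
    for adapter in ADAPTERS:
        if adapter.match(model_path):
            return adapter
    raise ValueError(f"no adapter for {model_path!r}")


register_adapter(ExampleJapaneseChatAdapter())
register_adapter(FallbackAdapter())
```

Because matching is first-come, specific adapters have to be registered before generic ones; otherwise the catch-all would shadow them.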

Once your model has answered the Rakuda questions, use jrank/matchmaker.py to send pairs of answers, one from your model and one from another ranked model, to an external reviewer (by default GPT-4, via jrank/reviewer_gpt.py). The reviewer judges which answer is better and stores its results in jrank/reviews/.
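The pairing step can be sketched as follows: for each already-ranked opponent, the new model's answer to a question is paired with the opponent's answer to the same question. This is a minimal sketch, not jrank/matchmaker.py itself. The in-memory `answers` layout and the output dict keys are assumptions, and shuffling which answer appears first (a common guard against the reviewer favouring one position) is an assumption about good practice rather than a documented feature of the script.

```python
import random


def make_matches(answers, new_model, ranked_models, seed=0):
    """Pair the new model's answers with each ranked model's answers.

    answers: {model_id: {question_id: answer_text}} (hypothetical layout).
    Only questions answered by both models are paired. The order of the two
    answers is shuffled so the new model is not always shown first.
    """
    rng = random.Random(seed)  # seeded so the pairing is reproducible
    matches = []
    for opponent in ranked_models:
        if opponent == new_model:
            continue
        shared = sorted(answers[new_model].keys() & answers[opponent].keys())
        for qid in shared:
            pair = [new_model, opponent]
            rng.shuffle(pair)
            a, b = pair
            matches.append({
                "question_id": qid,
                "model_a": a,
                "model_b": b,
                "answer_a": answers[a][qid],
                "answer_b": answers[b][qid],
            })
    return matches
```

Each resulting dict would then be formatted into a prompt for the reviewer, whose verdict refers to "model_a" or "model_b".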

Finally, run the analysis notebook jrank/bradley-terry.ipynb, which performs a Bayesian analysis of the reviews and infers the strength of each model. The resulting ranking is written to jrank/rankings/.
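The notebook's Bayesian model is not reproduced here. For intuition about what "strength" means, the sketch below fits a plain maximum-likelihood Bradley-Terry model, in which model i beats model j with probability p_i / (p_i + p_j), using the classic minorization-maximization (Zermelo) iteration. This is a simpler, swapped-in technique, not the notebook's method. The (model_a, model_b, result) input format and counting a tie as half a win for each side are assumptions, not the format of jrank/reviews.

```python
from collections import defaultdict


def fit_bradley_terry(outcomes, iters=200):
    """Maximum-likelihood Bradley-Terry strengths via the MM iteration.

    outcomes: iterable of (model_a, model_b, result), where result is
    "a" (model_a won), "b" (model_b won) or "tie" (half a win each).
    Assumes every model has at least one win or tie.
    Returns {model: strength}, normalised so the strengths sum to 1.
    """
    wins = defaultdict(float)   # W_i: wins per model (a tie counts 0.5)
    games = defaultdict(float)  # n_ij: number of comparisons between i and j
    for a, b, result in outcomes:
        games[(a, b)] += 1
        games[(b, a)] += 1
        if result == "a":
            wins[a] += 1
        elif result == "b":
            wins[b] += 1
        elif result == "tie":
            wins[a] += 0.5
            wins[b] += 0.5
        else:
            raise ValueError(f"unknown result: {result!r}")

    models = sorted({i for i, _ in games})
    p = {m: 1.0 for m in models}
    for _ in range(iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
        new_p = {
            i: wins[i] / sum(n / (p[i] + p[j]) for (k, j), n in games.items() if k == i)
            for i in models
        }
        total = sum(new_p.values())
        p = {m: v / total for m, v in new_p.items()}
    return p
```

For example, sorted(p, key=p.get, reverse=True) lists models from strongest to weakest. A Bayesian fit additionally yields a posterior over the strengths, and hence a measure of uncertainty, which this point estimate does not provide.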

Languages

  • Jupyter Notebook 95.2%
  • Python 3.4%
  • Shell 1.4%