Releases: nlpxucan/WizardLM
Release v1.6
🚀Major Update: Introducing WizardCoder 34B trained from Code Llama
WizardCoder-34B surpasses GPT-4, ChatGPT-3.5, and Claude-2 on HumanEval with 73.2% pass@1
- 🖥️ Demo: http://47.103.63.15:50085/
- 🏇 Model Weights: https://huggingface.co/WizardLM/WizardCoder-Python-34B-V1.0
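As a sketch of how the released weights might be queried with the standard Hugging Face `transformers` API (the Alpaca-style prompt template follows the WizardCoder model card; the generation settings are illustrative assumptions, not official values):

```python
# Sketch of querying the released weights via Hugging Face transformers.
# The Alpaca-style prompt template below follows the WizardCoder model card;
# the generation settings are illustrative assumptions, not official values.

MODEL_ID = "WizardLM/WizardCoder-Python-34B-V1.0"

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

def build_prompt(instruction: str) -> str:
    """Wrap a coding instruction in the model's expected prompt format."""
    return PROMPT_TEMPLATE.format(instruction=instruction)

def generate(instruction: str, max_new_tokens: int = 256) -> str:
    """Load the model and generate a completion (requires a large GPU)."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Loading a 34B-parameter model is heavyweight, so `generate` defers the `transformers` import until called; `build_prompt` can be reused on its own.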
Release v1.5
🚀Major Update: Introducing WizardMath, the third member of the Wizard family
WizardMath-70B achieves:
- Surpasses ChatGPT-3.5, Claude Instant-1, PaLM-2, and Chinchilla on GSM8k with 81.6 pass@1
- Surpasses Text-davinci-002, GAL, PaLM, and GPT-3 on MATH with 22.7 pass@1
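pass@1 here is the fraction of problems solved by the first sampled answer. For reference, a minimal sketch of the standard unbiased pass@k estimator used in such evaluations (this is the metric definition, not code from this repository):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n samples per problem, c correct.

    pass@k = 1 - C(n - c, k) / C(n, k): the probability that at least
    one of k randomly chosen samples (out of n) is correct.
    """
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(results, k):
    """Average pass@k over problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k=1 and one sample per problem, this reduces to the plain solve rate reported above.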
Release v1.4
🚀Major Update: Introducing WizardLM-70B-V1.0 trained from Llama-2
Compared with Llama-2-70b-chat, there are the following updates:
- AlpacaEval Leaderboard: 92.66% -> 92.91%
- MT-Bench Leaderboard: 6.86 -> 7.78
- GSM8k: 56.8% -> 77.6%
- HumanEval: 32.3 -> 50.6
Release v1.3
🚀Major Update: Introducing WizardLM-13B-V1.2 trained from Llama-2
Compared with WizardLM-13B-V1.1, there are the following updates:
- Context Window: 2048 -> 4096 tokens
- AlpacaEval Leaderboard: 86.32% -> 89.17%
- MT-Bench Leaderboard: 6.76 -> 7.06
- WizardLM Eval: 99.3% -> 101.4%
Release v1.2
🚀Major Update: Introducing WizardLM 30B Version.
- On the difficulty-balanced Evol-Instruct test set, evaluated by GPT-4: WizardLM-30B achieves 97.8% of ChatGPT's performance, Guanaco-65B achieves 96.6%, and WizardLM-13B achieves 89.1%.
- We compare the performance of WizardLM-30B and ChatGPT across different skills to set reasonable expectations for WizardLM's capabilities.
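The "% of ChatGPT" figures are relative scores: the model's total GPT-4-assigned score on the test set divided by ChatGPT's total on the same questions, which is why a value above 100% is possible. A minimal sketch of that computation (the per-question scores below are illustrative, not the paper's actual data):

```python
def relative_score(model_scores, chatgpt_scores):
    """Relative performance: total model score / total ChatGPT score.

    Both lists hold per-question judge scores on the same test set.
    A value above 1.0 (i.e. above 100%) means the model outscored ChatGPT.
    """
    return sum(model_scores) / sum(chatgpt_scores)

# Illustrative numbers only (not actual evaluation scores):
ratio = relative_score([9, 8, 7.5], [9, 8.5, 7.5])
print(f"{ratio:.1%}")  # prints "98.0%"
```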
Release v1.1
🚀Major Update: Introducing WizardLM 13B Version.
- On the difficulty-balanced Evol-Instruct test set, evaluated by GPT-4: WizardLM-13B achieves 89.1% of ChatGPT's performance, Vicuna-13B achieves 86.9%, and WizardLM-7B achieves 78%.
- The 13B version is trained on instruction data evolved from real-world human conversations (ShareGPT), while the 7B version is trained on instruction data evolved from machine-generated data (Alpaca).
- We compare the performance of WizardLM-13B and ChatGPT across different skills to set reasonable expectations for WizardLM's capabilities.