
Releases: OpenNMT/OpenNMT-py

OpenNMT-py v3.5.1

18 Mar 14:01
3f0c5f7
  • Further fixes
  • Added wikitext runs (perplexity sketch below)
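
A minimal sketch of the wikitext perplexity measurement these runs refer to: PPL is the exponential of the average per-token negative log-likelihood. Illustrative only, not OpenNMT-py's evaluation code; the tensors are random stand-ins.

```python
import math
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, targets: torch.Tensor) -> float:
    # logits: (tokens, vocab); targets: (tokens,) of gold token ids.
    nll = F.cross_entropy(logits, targets, reduction="mean")
    return math.exp(nll.item())

logits = torch.randn(50, 32000)              # fake model outputs
targets = torch.randint(0, 32000, (50,))     # fake reference tokens
print(perplexity(logits, targets))
```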

OpenNMT-py v3.5.0

22 Feb 17:42
b9a60d6

  • Further improvements and fixes
  • Support for AWQ models
  • Add n_best for top-p/top-k generation (sketch below)
  • Support MoE (Mixtral) inference
  • Extend the HF models converter
  • Use flash_attn_with_kvcache for faster inference
  • Add wikitext2 PPL computation
  • Support for Phi-2 models
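
A minimal sketch of the idea behind n_best with top-k/top-p (nucleus) sampling for a single decoding step. This is illustrative only, not OpenNMT-py's actual implementation; it assumes the nucleus retains at least n_best tokens.

```python
import torch

def sample_n_best(logits: torch.Tensor, n_best: int = 3,
                  top_k: int = 50, top_p: float = 0.9) -> torch.Tensor:
    """Return n_best distinct token ids sampled from the filtered distribution."""
    # Keep only the top_k highest-scoring tokens.
    topk_vals, topk_idx = logits.topk(top_k)
    probs = torch.softmax(topk_vals, dim=-1)
    # Nucleus filter: keep the smallest prefix whose mass reaches top_p.
    sorted_probs, sorted_idx = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    keep = cumulative - sorted_probs < top_p   # the first token is always kept
    filtered = torch.where(keep, sorted_probs, torch.zeros_like(sorted_probs))
    filtered = filtered / filtered.sum()
    # Draw n_best distinct candidates from the remaining mass.
    picks = torch.multinomial(filtered, n_best, replacement=False)
    return topk_idx[sorted_idx[picks]]

logits = torch.randn(32000)          # fake vocabulary-sized logits
print(sample_n_best(logits))         # e.g. tensor([ 113, 2045,  871])
```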

OpenNMT-py v3.4.3

02 Nov 12:53
7171a31
  • Further improvements to beam search and decoding
  • New "in bucket" indexing for faster inference (cf. #2496)
  • Code cleanup
  • Fix int8 dynamic quantization on CPU (still slow...; sketch below)
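
For reference, this is the PyTorch mechanism behind the int8 bullet above: dynamic quantization stores nn.Linear weights as int8 and quantizes activations on the fly at each forward pass, CPU only. A minimal sketch, not OpenNMT-py's exact code.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))

# Rewrite all nn.Linear modules in the model with int8 dynamic quantization.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(4, 512)
print(qmodel(x).shape)   # torch.Size([4, 512])
```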

OpenNMT-py v3.4.2

20 Oct 13:37
3e63fcc
  • Torch 2.1 (scaled_dot_product_attention improvements; sketch below)
  • Mistral 7B sliding window
  • Inference speed-ups
  • Flash Attention 2 (with sliding window), requires flash-attn >= v2.3.1
  • Use FusedRMSNorm from Apex when available
  • Fixed attn_debug
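
A minimal sketch of torch 2.1's fused scaled_dot_product_attention, which the first bullet refers to. Shapes are illustrative, not OpenNMT-py code; torch dispatches to a fused kernel (flash / memory-efficient / math) depending on device and dtype.

```python
import torch
import torch.nn.functional as F

batch, heads, seq, dim = 2, 8, 128, 64
q = torch.randn(batch, heads, seq, dim)
k = torch.randn(batch, heads, seq, dim)
v = torch.randn(batch, heads, seq, dim)

# is_causal=True applies the usual autoregressive mask internally.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)   # torch.Size([2, 8, 128, 64])
```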

OpenNMT-py v3.4.1

26 Sep 09:02
9abeed4
  • Bug fixes
  • Torch 2.x requirement (flash attention needs it)
  • Zero out the prompt loss in LM finetuning (sketch below)
  • Batching sorted on src then tgt length instead of max length
  • six dependency
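
A minimal sketch of zeroing out the prompt loss in LM finetuning: prompt positions are set to the ignore_index so only response tokens contribute to the loss. Illustrative only; prompt_len and the random tensors are assumptions, not OpenNMT-py's code.

```python
import torch
import torch.nn.functional as F

vocab, seq = 100, 10
logits = torch.randn(1, seq, vocab)          # fake model output
tokens = torch.randint(0, vocab, (1, seq))   # prompt + response ids
prompt_len = 6                               # assumed prompt length

labels = tokens.clone()
labels[:, :prompt_len] = -100                # -100 is skipped by cross_entropy

loss = F.cross_entropy(logits.view(-1, vocab), labels.view(-1),
                       ignore_index=-100)
print(loss)
```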

OpenNMT-py v3.4.0

06 Sep 12:56
eb24258
  • bitsandbytes 4/8-bit quantization at inference (sketch below)
  • MMLU-FR results and scoring
  • Flan-T5 support
  • Flash attention
  • Terminology transform
  • Tensor parallelism (inference, training)
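
A hedged sketch of 4-bit bitsandbytes inference: a linear layer is built as bnb.nn.Linear4bit (NF4) and quantized when moved to the GPU. This requires CUDA and the bitsandbytes package, the exact constructor arguments may vary across bitsandbytes versions, and it is not OpenNMT-py's loading code.

```python
import torch
import bitsandbytes as bnb

# NF4-quantized linear layer; weights are quantized on the move to GPU.
layer = bnb.nn.Linear4bit(4096, 4096, bias=False,
                          compute_dtype=torch.float16,
                          quant_type="nf4").cuda()

x = torch.randn(1, 4096, dtype=torch.float16, device="cuda")
print(layer(x).shape)   # torch.Size([1, 4096])
```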

OpenNMT-py v3.3.0

22 Jun 11:33
3afced5
  • Switch to PyTorch 2.0.1
  • Eval LLM with MMLU benchmark
  • Fix Falcon 40B conversion / finetuning / inference
  • Plugin encoder/decoder, thanks @kleag / @n2oblife
  • Optional safetensors for model storage (beta; sketch below)
  • Finetuning config templates for supported LLMs
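
A minimal sketch of the safetensors storage option mentioned above: tensors are saved to a flat, mmap-friendly file instead of a pickle. "model.safetensors" is a placeholder path, and this is not OpenNMT-py's checkpoint code.

```python
import torch
from safetensors.torch import save_file, load_file

state = {"encoder.weight": torch.randn(4, 4),
         "decoder.weight": torch.randn(4, 4)}

save_file(state, "model.safetensors")        # write the tensor dict to disk
restored = load_file("model.safetensors")    # zero-copy friendly load
print(restored["encoder.weight"].shape)      # torch.Size([4, 4])
```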

OpenNMT-py v3.2.0

07 Jun 20:15
c858395

Lots of new stuff in this release:

  • Skip init during model build (much faster building)
  • Enable quantization of LoRA layers
  • Enable 4-bit quantization from bitsandbytes (NF4 / FP4)
  • Enable some bnb.optim optimizers for benchmarking purposes
  • Refactor model state_dict loading to enable pseudo-lazy loading, moving tensors to GPU as they load
  • Enable gradient checkpointing for FFN, MHA, LoRA modules (sketch below)
  • Make FFN bias optional (same as QKV): llama, mpt, redpajama, openllama converters changed accordingly.
    Convertv2_v3 sets add_qkvbias=True, add_ffnbias=True.
    load_checkpoint: if w1_bias is detected in the checkpoint then add_ffnbias=True
  • Add Multi-Query Attention
  • Add Parallel Residual attention
  • Add Falcon 7B converter
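
A minimal sketch of gradient checkpointing on an FFN block, the technique enabled above: activations inside the block are recomputed during the backward pass instead of stored, trading compute for memory. The FFN module here is an illustrative stand-in, not OpenNMT-py's class.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class FFN(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 2048):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff)
        self.w2 = nn.Linear(d_ff, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Inner activations are recomputed in backward instead of cached.
        return checkpoint(lambda t: self.w2(torch.relu(self.w1(t))), x,
                          use_reentrant=False)

x = torch.randn(8, 512, requires_grad=True)
FFN()(x).sum().backward()
print(x.grad.shape)   # torch.Size([8, 512])
```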

OpenNMT-py v3.1.3

24 May 10:40
f71b62c
  • Step-by-step tutorial for Vicuna replication, thanks Lina
  • MosaicML MPT7B converter and support (ALiBi embeddings; sketch below)
  • Open Llama converter / RedPajama converter
  • Switch from GCLD3 to fastText, thanks ArtanieTheOne
  • Fix coverage attention in beam decoding
  • Fix ct2 keys for "Llama / MPT7B based" OpenNMT-py models
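
A minimal sketch of the ALiBi positional biases used by MPT7B: instead of position embeddings, each attention head adds a linear distance penalty to its scores, with per-head slopes forming a geometric sequence as in the ALiBi paper. Illustrative only; a causal mask would still be applied on top in practice.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Head-specific slopes: 2^(-8/n_heads * i) for i = 1..n_heads.
    slopes = 2.0 ** (-8.0 / n_heads * torch.arange(1, n_heads + 1))
    # Relative distance j - i between key j and query i.
    pos = torch.arange(seq_len)
    dist = pos[None, :] - pos[:, None]          # (seq, seq)
    bias = slopes[:, None, None] * dist[None]   # (heads, seq, seq)
    return bias                                  # added to attention scores

print(alibi_bias(8, 4)[0])  # bias matrix for the first head
```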

OpenNMT-py v3.1.2

10 May 17:38
e7b2cf4
  • Fixes: transforms (normalize, clean, inlinetags)
  • Llama support (rotary embeddings, RMSNorm, SiLU activation; sketch below)
  • 8-bit loading for specific layers (along with LoRA for other layers)
  • Subword learner added to build_vocab
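
A minimal sketch of the RMSNorm used by Llama, as referenced above: activations are normalized by their root mean square rather than mean and variance, with a learned scale. Illustrative module, not OpenNMT-py's exact class.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale by the reciprocal root mean square of the last dimension.
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return x * rms * self.weight

x = torch.randn(2, 5, 512)
print(RMSNorm(512)(x).shape)   # torch.Size([2, 5, 512])
```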