Releases · OpenNMT/OpenNMT-py
OpenNMT-py v3.5.1
OpenNMT-py v3.5.0
3.5.0 (2024-02-22)
- Further improvements and fixes
- Support for AWQ models
- Add n_best for top-p/top-k generation
- Support MoE (Mixtral) inference
- Extend HF models converter
- Use flash_attn_with_kvcache for faster inference (sketched below)
- Add wikitext2 PPL computation
- Support for Phi-2 models
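The kv-cache kernel avoids re-concatenating past keys/values at every decoding step. A minimal sketch of one decoding step with it, assuming flash-attn >= 2.x on a CUDA GPU; the shapes and tensor names here are illustrative, not OpenNMT-py's actual decoder code:

```python
import torch
from flash_attn import flash_attn_with_kvcache

batch, heads, head_dim, max_len = 2, 8, 64, 256

# Pre-allocated KV cache, updated in place as decoding proceeds.
k_cache = torch.zeros(batch, max_len, heads, head_dim,
                      dtype=torch.float16, device="cuda")
v_cache = torch.zeros_like(k_cache)
cache_seqlens = torch.zeros(batch, dtype=torch.int32, device="cuda")

# One decoding step: a single new token per sequence.
q = torch.randn(batch, 1, heads, head_dim, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

# Writes k/v into the cache at position cache_seqlens and attends over
# the whole cache, avoiding a torch.cat of past and current tensors.
out = flash_attn_with_kvcache(q, k_cache, v_cache, k=k, v=v,
                              cache_seqlens=cache_seqlens, causal=True)
cache_seqlens += 1  # advance the per-sequence cache lengths
```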
OpenNMT-py v3.4.3
- Further improvements to beam search and decoding
- New "in bucket" indexing for faster inference, cf. #2496
- Code cleanup
- Fix int8 CPU dynamic quantization (still slow; see the sketch below)
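For context, dynamic quantization here is PyTorch's quantize_dynamic: Linear weights are converted to int8 ahead of time and activations are quantized on the fly. A minimal sketch on an illustrative module (CPU only):

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU(),
                            torch.nn.Linear(512, 512))
# Replace Linear layers with int8 dynamically quantized versions.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)
out = qmodel(torch.randn(1, 512))
```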
OpenNMT-py v3.4.2
- torch 2.1 (scaled_dot_product_attention improvements; sketched below)
- Mistral 7B sliding window
- Speed up inference
- Flash Attention 2 with sliding window (requires flash-attn >= 2.3.1)
- Use FusedRMSNorm from Apex if available
- Fixed attn_debug
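The scaled_dot_product improvements refer to torch.nn.functional.scaled_dot_product_attention, which dispatches to a fused kernel (flash, memory-efficient, or plain math) when one is available. A minimal sketch with illustrative shapes:

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim) layout expected by SDPA
q = torch.randn(2, 8, 128, 64)
k, v = torch.randn_like(q), torch.randn_like(q)

# is_causal=True applies the lower-triangular mask inside the kernel.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```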
OpenNMT-py v3.4.1
- Bug fixes
- torch 2.x requirement (flash attention requires it)
- Zero out the prompt loss in LM finetuning (sketched below)
- Batching sorted on src then tgt instead of max length
- six dependency
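Zeroing the prompt loss means only response tokens drive the gradient while prompt tokens are ignored. A minimal sketch of the idea using cross-entropy's ignore_index; the shapes and the prompt_len variable are illustrative:

```python
import torch
import torch.nn.functional as F

vocab, ignore_index = 32000, -100
logits = torch.randn(1, 10, vocab)           # (batch, seq, vocab)
labels = torch.randint(0, vocab, (1, 10))
prompt_len = 6
labels[:, :prompt_len] = ignore_index        # mask out the prompt positions

# Only the remaining response tokens contribute to the loss.
loss = F.cross_entropy(logits.view(-1, vocab), labels.view(-1),
                       ignore_index=ignore_index)
```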
OpenNMT-py v3.4.0
- bitsandbytes 4-bit/8-bit quantization at inference (sketched below)
- MMLU-FR results and scoring
- Flan-T5 support
- Flash attention
- Terminology transform
- Tensor parallelism (inference and training)
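As a rough illustration of what 4-bit inference looks like at the layer level, here is a sketch using bitsandbytes' Linear4bit directly, assuming a CUDA GPU; in practice the weights would come from a checkpoint rather than random init, and this is not OpenNMT-py's internal replacement code:

```python
import torch
import bitsandbytes as bnb

# A 4-bit NF4 linear layer; weights are quantized when moved to the GPU.
layer = bnb.nn.Linear4bit(4096, 4096, bias=False,
                          compute_dtype=torch.float16,
                          quant_type="nf4").cuda()
x = torch.randn(1, 4096, dtype=torch.float16, device="cuda")
out = layer(x)  # dequantizes blockwise on the fly, computes in fp16
```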
OpenNMT-py v3.3.0
OpenNMT-py v3.2.0
Lots of new stuff in this release:
- Skip init during model build (way faster building; see the sketch after this list)
- Enable quantization of LoRA layers
- Enable 4bit quantization from bitsandbytes (NF4 / FP4)
- Enable "some" bnb.optim Optimizers for benchmarking purpose
- Refactor model state_dict loading to enable pseudo lazy loading with move on GPU as it loads
- Enable Gradient checkpointing for FFN, MHA, LoRA modules
- Make FFN bias optional (same as QKV): llama, mpt, redpajama, and openllama converters changed accordingly. Convertv2_v3 sets add_qkvbias=True and add_ffnbias=True; load_checkpoint sets add_ffnbias=True if w1_bias is detected in the checkpoint
- Add Multi Query attention
- Add Parallel Residual attention
- Add Falcon 7B converter
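One way to skip init and lazily materialize weights, in the spirit of the skip-init and lazy state_dict loading items above, is to build the model on the meta device and assign tensors as the checkpoint loads. A minimal sketch, assuming torch >= 2.1 for assign=True; the module and state dict are fabricated for illustration:

```python
import torch

# Build on the meta device: no memory is allocated and no init code runs.
with torch.device("meta"):
    model = torch.nn.Linear(8192, 8192)

# Materialize parameters directly from the state dict; in practice each
# tensor could be moved to the GPU as it is assigned.
state_dict = {"weight": torch.randn(8192, 8192), "bias": torch.zeros(8192)}
model.load_state_dict(state_dict, assign=True)
```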
OpenNMT-py v3.1.3
- Step-by-step tutorial for Vicuna replication, thanks to Lina
- MosaicML MPT7B converter and support (ALiBi embeddings; sketched below)
- Open Llama converter / Redpajama converter
- Switch from GCLD3 to fastText, thanks to ArtanieTheOne
- Fix coverage attention in beam decoding
- Fix ct2 keys for Llama / MPT7B-based OpenNMT-py models
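ALiBi replaces position embeddings with a distance-proportional penalty added to the attention scores, one slope per head. A minimal sketch following the ALiBi paper; head count and sequence length are illustrative:

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Geometric head slopes 2^(-8/n), 2^(-16/n), ... as in the ALiBi paper.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads)
                           for h in range(n_heads)])
    # distance[i, j] = j - i: zero on the diagonal, negative for past tokens.
    pos = torch.arange(seq_len)
    distance = (pos[None, :] - pos[:, None]).clamp(max=0).float()
    return slopes[:, None, None] * distance  # (heads, seq, seq)

bias = alibi_bias(8, 16)  # added to q @ k.T / sqrt(d) before the softmax
```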
OpenNMT-py v3.1.2
- Fixes: transforms (normalize, clean, inlinetags)
- Llama support (rotary embeddings, RMSNorm, SiLU activation; RMSNorm sketched below)
- 8-bit loading for specific layers (along with LoRA for other layers)
- Subword learner added to build_vocab
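RMSNorm, as used in Llama-style models, rescales by the root mean square of the activations with a learned gain and no mean subtraction or bias. A minimal sketch:

```python
import torch

class RMSNorm(torch.nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = torch.nn.Parameter(torch.ones(dim))  # learned gain

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the RMS over the last dimension, then apply the gain.
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

y = RMSNorm(512)(torch.randn(2, 5, 512))
```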