Llama2 is a partly open-source, commercially usable model family released by Meta, including 7B/13B/70B base and chat models with a 4096-token context window.
- [Original Model] 202307 Meta released Llama2
- Github
- Meta's llama-recipes: provides examples for fine-tuning on a single GPU or multiple GPUs, and a recipe for converting the model to HuggingFace transformers' Llama2 model definition
- Paper: Llama 2: Open Foundation and Fine-Tuned Chat Models
- Download Applications
- [Together AI] 202307 TogetherAI released a Llama2-7B model with a 32k context window, based on Meta's research Extending Context Window of Large Language Models via Positional Interpolation
- Codellama: Meta fine-tuned Llama2 for code generation. Supports C++/Java/PHP/TypeScript/C#/Bash/Python generation. Includes 7B/13B/34B models in 3 variants (general/Python/instruct). Extends the maximum context window from 4,096 tokens to 100k (like Claude 2).
- Llama2 70B Chatbot at HuggingFace
- A16z's Llama2-chatbot: provides a Streamlit chatbot app for LLaMA2
- Finetune with PEFT
- Finetune together.ai's 32k context window model: scripts to fine-tune on the booksum/mqa datasets
- Llama-2-7B-32K-Instruct — and fine-tuning for Llama-2 models with Together API: Together AI show their 32k context instruct 7b model.
- Finetune the 13B model with QLoRA: a Colab notebook on fine-tuning Llama2
- HuggingFace SFT training script
- PyTorch Lightning's script to fine-tune Llama2 on a custom dataset
- Instruction-tune Llama2: HuggingFace's Tech Lead Philschmid explains how to instruction-tune Llama2
- Finetune LLaMA2 7-70B on Amazon SageMaker: Philschmid covers preparing datasets, using QLoRA, and deploying the model on Amazon SageMaker
- Finetune LLaMa2 with QLoRA in Colab
- Fine-tune Llama 2 with DPO by huggingface
- Fine-tune Llama2 for specific use cases like SQL generation/functional representation: an Anyscale team member used their library ray to demo fine-tuning Llama2 70B. Their scripts
- Karpathy's Llama2.c: Karpathy's weekend project to build Llama2 in C
- web-llm: Bringing large-language models and chat to web browsers
- HuggingFace released Swift Transformers to help run LLMs on Apple devices: provides the Swift-based Swift Transformers library, a Swift chat app, and exporters for converting models to Core ML.
- pyllama: LLaMA: Open and Efficient Foundation Language Models
- Meta's getting started guide for Llama
- Llama2.c for dummies: a line-by-line walkthrough of Karpathy's Llama2.c
- NeurIPS 2023 LLM Efficiency Challenge Quickstart Guide: A competition focused on training 1 LLM for 24 hours on 1 GPU – the team with the best LLM gets to present their results at NeurIPS 2023.
- HuggingFace shares how to train and deploy an open-source LLM
- Huggingface trend about llama2
- Chinese-Llama-2-7b: fine-tuned on a Chinese and English instruction dataset of size 10 million
- Chinese-LLaMA-Alpaca
- Finetuned on code with QLoRA
- ToolLLaMA: An open source project to train LLaMA on ToolBench so that it supports function calling
- Llama2-Code-Interpreter: enables Llama2 to execute code, debug it, save and reuse it, and access the internet
- Llama2-Medical-Chatbot: A medical bot built using Llama2 and Sentence Transformers
- Finetune LLaMA 7B with Traditional Chinese instruction datasets
- Taiwan-LLaMa: NTU's MiuLab fine-tuned the 13B Llama2 on 5B Traditional Chinese tokens and a 490k instruction dataset.
- Finetuning LLaMa + Text-to-SQL: LlamaIndex shows how to fine-tune LLaMa 2 7B on a Text-to-SQL dataset
- LLaSM: Large Language and Speech Model: a Chinese/English voice chat model based on Whisper features
- LLaVA: Large Language-and-Vision Assistant
- Chinese-LLaVA: supports vision input and Chinese text input/output
- [TogetherAI] OpenChatKit: Together.ai's open toolkit for LLM finetune/moderation
- LLaMA2-Accessory: An Open-source Toolkit for LLM Development
- LLaMA-Adapter: Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
- text-generation-webui: A Gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, OPT, and GALACTICA.
- text-generation-inference: HuggingFace's Large Language Model Text Generation Inference.
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving: An open-source compiler and distributed system for low latency, high performance LLM serving.
- LLM-As-Chatbot: Use many open-source instruction-following fine-tuned LLMs as a chatbot service.
- Optimizing LLM latency: A blog post exploring inference tools for open source LLMs
- A series of quantized Llama2 models from TheBloke in GPTQ/GGML formats
- Quantization
- GPTQ: Accurate Post Training Quantization for generative pre-trained transformers
- AutoGPTQ: An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
- Together AI's Medusa to accelerate decoding
- NVIDIA TensorRT-LLM Supercharges Large Language Model Inference on NVIDIA H100 GPUs: TensorRT-LLM is an open-source library that accelerates and optimizes inference performance on the latest LLMs on NVIDIA Tensor Core GPUs.
- 20231130 The PyTorch team uses native PyTorch tools to accelerate LLM inference
- LLM Reasoners: LLM Reasoners is a library to enable LLMs to conduct complex reasoning, with advanced reasoning algorithms.
- DeepMind's Large Language Models as Optimizers
- Run Llama 2 on your own Mac using LLM and Homebrew
- Deploy Llama2 7B/13B/70B models on AWS SageMaker: based on the Hugging Face LLM DLC (Deep Learning Container), which is powered by HuggingFace's text generation inference, a Rust, Python and gRPC server for text generation. It is used in production at HuggingFace to power Hugging Chat, the Inference API and Inference Endpoints.
- LLaMA-efficient-tuning: Easy-to-use fine-tuning framework using PEFT (PT+SFT+RLHF with QLoRA) (LLaMA-2, BLOOM, Falcon, Baichuan)
- Finetune Falcon-7B on Your GPU with TRL and QLoRA: A blog post about tuning Falcon-7B on a consumer GPU
- A Definitive Guide to QLoRA: Fine-tuning Falcon-7b with PEFT
- Amazon sagemaker generativeai: Fine-tune Falcon-40B with QLoRA
- Llama with FlashAttention2: Reduces VRAM usage, especially during training. Full fine-tune of Llama 2 7B: 51.3 -> 40.3 GiB
- Anti-hype LLM reading list: A reading list about LLMs.
- Patterns for Building LLM-based Systems & Products: a blog post by Amazon's LLM engineer Eugene Yan about patterns for LLM-based systems
- Finetuning an LLM: RLHF and alternatives
- GitHub: A developer's guide to prompt engineering and LLMs: GitHub engineers share their experience with prompt engineering for their Copilot product.
- The Rise and Potential of Large Language Model Based Agents: A Survey: A survey from the Fudan NLP Group about LLM-based agents. Their GitHub repo: https://github.com/WooooDyy/LLM-Agent-Paper-List
- 🤗Open LLM Leaderboard: A HuggingFace Space that tracks, ranks and evaluates LLMs and chatbots as they are released.
- How is LLaMa.cpp possible: The post shows, with some calculations on the transformer's parameters, why LLaMA inference is memory-bound.
- Why we should train smaller LLMs on more tokens
- Harm's law on HuggingFace: a tool for calculating the compute overhead of a given model size/dataset size
- LLMSurvey: A Survey of Large Language Models
- Open challenges in LLM research: Chip Huyen's post about open challenges for LLMs
- Stanford CS324 - Large Language Models: The fundamentals of the modeling, theory, ethics, and systems aspects of large language models.
- Why You (Probably) Don't Need to Fine-tune an LLM: Fine-tuning may not reduce hallucination. You could use few-shot prompting or Retrieval Augmented Generation (RAG) instead.
- Some Intuition on Attention and the Transformer: A post explaining why attention matters and what the query, key and value are
- Intro to transformers
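
A minimal sketch of the Positional Interpolation idea behind the 32k-context Llama2-7B entry above: instead of feeding rotary position embeddings (RoPE) position indices beyond the trained 4096-token window, the indices are scaled down so the extended window maps back into the trained range. The function name `rope_angles` and the defaults `head_dim=128` and `base=10000.0` are illustrative assumptions, not taken from Meta's or Together AI's code, and a real model would still need fine-tuning at the new length.

```python
import math


def rope_angles(position, head_dim=128, base=10000.0, scale=1.0):
    """Return the RoPE rotation angles for one token position.

    scale < 1 applies positional interpolation: the position index is
    shrunk before the angles are computed. All defaults are assumptions
    chosen for illustration.
    """
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even")
    scaled_position = position * scale
    # One angle per pair of dimensions: position * base^(-2i / head_dim)
    return [
        scaled_position * base ** (-2.0 * i / head_dim)
        for i in range(head_dim // 2)
    ]


TRAINED_CONTEXT = 4096    # Llama2 context window
EXTENDED_CONTEXT = 32768  # 32k target window
SCALE = TRAINED_CONTEXT / EXTENDED_CONTEXT  # 1/8

# The last position of the extended window lands inside the trained range.
last_extended = rope_angles(EXTENDED_CONTEXT - 1, scale=SCALE)
print(max(last_extended) < TRAINED_CONTEXT)

# Rotation that would be applied to the first two dimensions at that position.
print(math.cos(last_extended[0]), math.sin(last_extended[0]))
```

With a scale of 1/8, position 8 in the extended window gets the same angles as position 1 in the original window, so the model only sees rotation angles it was trained on, at a finer spacing.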