You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Unleash LLMs' potential through curated tutorials, best practices, and ready-to-use code for custom training and inferencing.
Overview
Welcome to LLM-PowerHouse, your ultimate resource for unleashing the full potential of Large Language Models (LLMs) with custom training and inferencing. This GitHub repository is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of LLMs and build intelligent applications that push the boundaries of natural language understanding.
This section offers fundamental insights into mathematics, Python, and neural networks. It may not be the ideal starting point, but you can consult it whenever necessary.
⬇️ Ready to Embrace Foundations of LLMs? ⬇️
graph LR
Foundations["📚 Foundations of Large Language Models (LLMs)"] --> ML["1️⃣ Mathematics for Machine Learning"]
Foundations["📚 Foundations of Large Language Models (LLMs)"] --> Python["2️⃣ Python for Machine Learning"]
Foundations["📚 Foundations of Large Language Models (LLMs)"] --> NN["3️⃣ Neural Networks"]
Foundations["📚 Foundations of Large Language Models (LLMs)"] --> NLP["4️⃣ Natural Language Processing (NLP)"]
ML["1️⃣ Mathematics for Machine Learning"] --> LA["📐 Linear Algebra"]
ML["1️⃣ Mathematics for Machine Learning"] --> Calculus["📏 Calculus"]
ML["1️⃣ Mathematics for Machine Learning"] --> Probability["📊 Probability & Statistics"]
Python["2️⃣ Python for Machine Learning"] --> PB["🐍 Python Basics"]
Python["2️⃣ Python for Machine Learning"] --> DS["📊 Data Science Libraries"]
Python["2️⃣ Python for Machine Learning"] --> DP["🔄 Data Preprocessing"]
Python["2️⃣ Python for Machine Learning"] --> MLL["🤖 Machine Learning Libraries"]
NN["3️⃣ Neural Networks"] --> Fundamentals["🔧 Fundamentals"]
NN["3️⃣ Neural Networks"] --> TO["⚙️ Training & Optimization"]
NN["3️⃣ Neural Networks"] --> Overfitting["📉 Overfitting"]
NN["3️⃣ Neural Networks"] --> MLP["🧠 Implementation of MLP"]
NLP["4️⃣ Natural Language Processing (NLP)"] --> TP["📝 Text Preprocessing"]
NLP["4️⃣ Natural Language Processing (NLP)"] --> FET["🔍 Feature Extraction Techniques"]
NLP["4️⃣ Natural Language Processing (NLP)"] --> WE["🌐 Word Embedding"]
NLP["4️⃣ Natural Language Processing (NLP)"] --> RNN["🔄 Recurrent Neural Network"]
Loading
1. Mathematics for Machine Learning
Before mastering machine learning, it's essential to grasp the fundamental mathematical concepts that underpin these algorithms.
Concept
Description
Linear Algebra
Crucial for understanding many algorithms, especially in deep learning. Key concepts include vectors, matrices, determinants, eigenvalues, eigenvectors, vector spaces, and linear transformations.
Calculus
Important for optimizing continuous functions in many machine learning algorithms. Essential topics include derivatives, integrals, limits, series, multivariable calculus, and gradients.
Probability and Statistics
Vital for understanding how models learn from data and make predictions. Key concepts encompass probability theory, random variables, probability distributions, expectations, variance, covariance, correlation, hypothesis testing, confidence intervals, maximum likelihood estimation, and Bayesian inference.
Further Exploration
Reference
Description
Link
3Blue1Brown - The Essence of Linear Algebra
Offers a series of videos providing geometric intuition to fundamental linear algebra concepts.
Mastery of Python programming entails understanding its basic syntax, data types, error handling, and object-oriented programming principles.
Data Science Libraries
Familiarity with essential libraries such as NumPy for numerical operations, Pandas for data manipulation, and Matplotlib and Seaborn for data visualization is crucial for effective data analysis.
Data Preprocessing
This phase involves crucial tasks such as feature scaling, handling missing data, outlier detection, categorical data encoding, and data partitioning into training, validation, and test sets to ensure data quality and model performance.
Machine Learning Libraries
Proficiency with Scikit-learn, a comprehensive library for machine learning, is indispensable. Understanding and implementing algorithms like linear regression, logistic regression, decision trees, random forests, k-nearest neighbors (K-NN), and K-means clustering are essential for building predictive models. Additionally, familiarity with dimensionality reduction techniques like PCA and t-SNE aids in visualizing complex data structures effectively.
Further Exploration
Reference
Description
Link
Real Python
A comprehensive resource offering articles and tutorials for both beginner and advanced Python concepts.
Understand the basic structure of a neural network, including layers, weights, biases, and activation functions like sigmoid, tanh, and ReLU.
Training and Optimization
Learn about backpropagation and various loss functions such as Mean Squared Error (MSE) and Cross-Entropy. Become familiar with optimization algorithms like Gradient Descent, Stochastic Gradient Descent, RMSprop, and Adam.
Overfitting
Grasp the concept of overfitting, where a model performs well on training data but poorly on unseen data, and explore regularization techniques like dropout, L1/L2 regularization, early stopping, and data augmentation to mitigate it.
Implement a Multilayer Perceptron (MLP)
Build a Multilayer Perceptron (MLP), also known as a fully connected network, using PyTorch.
Further Exploration
Reference
Description
Link
3Blue1Brown - But what is a Neural Network?
This video provides an intuitive explanation of neural networks and their inner workings.
Learn various text preprocessing steps such as tokenization (splitting text into words or sentences), stemming (reducing words to their root form), lemmatization (similar to stemming but considers the context), and stop word removal.
Feature Extraction Techniques
Become familiar with techniques to convert text data into a format understandable by machine learning algorithms. Key methods include Bag-of-Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and n-grams.
Word Embeddings
Understand word embeddings, a type of word representation that allows words with similar meanings to have similar representations. Key methods include Word2Vec, GloVe, and FastText.
Recurrent Neural Networks (RNNs)
Learn about RNNs, a type of neural network designed to work with sequence data, and explore LSTMs and GRUs, two RNN variants capable of learning long-term dependencies.
Further Exploration
Reference
Description
Link
RealPython - NLP with spaCy in Python
An exhaustive guide on using the spaCy library for NLP tasks in Python.
An overview of the Transformer architecture, with emphasis on inputs (tokens) and outputs (logits), and the importance of understanding the vanilla attention mechanism and its improved versions.
Concept
Description
Transformer Architecture (High-Level)
Review encoder-decoder Transformers, specifically the decoder-only GPT architecture used in modern LLMs.
Tokenization
Understand how raw text is converted into tokens (words or subwords) for the model to process.
Attention Mechanisms
Grasp the theory behind attention, including self-attention and scaled dot-product attention, which allows the model to focus on relevant parts of the input during output generation.
Text Generation
Learn different methods the model uses to generate output sequences. Common strategies include greedy decoding, beam search, top-k sampling, and nucleus sampling.
Further Exploration
Reference
Description
Link
The Illustrated Transformer by Jay Alammar
A visual and intuitive explanation of the Transformer model
While it's easy to find raw data from Wikipedia and other websites, it's difficult to collect pairs of instructions and answers in the wild. Like in traditional machine learning, the quality of the dataset will directly influence the quality of the model, which is why it might be the most important component in the fine-tuning process.
This dataset generation method utilizes the OpenAI API (GPT) to synthesize data from scratch, allowing for the specification of seeds and system prompts to foster diversity within the dataset.
Advanced techniques
Delve into methods for enhancing existing datasets with Evol-Instruct, and explore approaches for generating top-tier synthetic data akin to those outlined in the Orca and phi-1 research papers.
Filtering data
Employ traditional techniques such as regex, near-duplicate removal, and prioritizing answers with substantial token counts to refine datasets.
Prompt templates
Recognize the absence of a definitive standard for structuring instructions and responses, underscoring the importance of familiarity with various chat templates like ChatML and Alpaca.
Further Exploration
Reference
Description
Link
Preparing a Dataset for Instruction tuning by Thomas Capelle
Explores the Alpaca and Alpaca-GPT4 datasets and discusses formatting methods.
Pre-training, being both lengthy and expensive, is not the primary focus of this course. While it's beneficial to grasp the fundamentals of pre-training, practical experience in this area is not mandatory.
Concept
Description
Data pipeline
Pre-training involves handling vast datasets, such as the 2 trillion tokens used in Llama 2, which necessitates tasks like filtering, tokenization, and vocabulary preparation.
Causal language modeling
Understand the distinction between causal and masked language modeling, including insights into the corresponding loss functions. Explore efficient pre-training techniques through resources like Megatron-LM or gpt-neox.
Scaling laws
Delve into the scaling laws, which elucidate the anticipated model performance based on factors like model size, dataset size, and computational resources utilized during training.
High-Performance Computing
While beyond the scope of this discussion, a deeper understanding of HPC becomes essential for those considering building their own LLMs from scratch, encompassing aspects like hardware selection and distributed workload management.
Further Exploration
Reference
Description
Link
LLMDataHub by Junhao Zhao
Offers a carefully curated collection of datasets tailored for pre-training, fine-tuning, and RLHF.
Provides a comprehensive overview of the BLOOM model's construction, offering valuable insights into its engineering aspects and encountered challenges.
Pre-trained models are trained to predict the next word, so they're not great as assistants. But with SFT, you can adjust them to follow instructions. Plus, you can fine-tune them on different data, even private stuff GPT-4 hasn't seen, and use them without needing paid APIs like OpenAI's.
Concept
Description
Full fine-tuning
Full fine-tuning involves training all parameters in the model, though it's not the most efficient approach, it can yield slightly improved results.
DeepSpeed facilitates efficient pre-training and fine-tuning of large language models across multi-GPU and multi-node settings, often integrated within Axolotl for enhanced performance.
Further Exploration
Reference
Description
Link
The Novice's LLM Training Guide by Alpin
Provides an overview of essential concepts and parameters for fine-tuning LLMs.
Following supervised fine-tuning, RLHF serves as a crucial step in harmonizing the LLM's responses with human expectations. This entails acquiring preferences from human or artificial feedback, thereby mitigating biases, implementing model censorship, or fostering more utilitarian behavior. RLHF is notably more intricate than SFT and is frequently regarded as discretionary.
Concept
Description
Preference datasets
Typically containing several answers with some form of ranking, these datasets are more challenging to produce than instruction datasets.
This algorithm utilizes a reward model to predict whether a given text is highly ranked by humans. It then optimizes the SFT model using a penalty based on KL divergence.
DPO simplifies the process by framing it as a classification problem. It employs a reference model instead of a reward model (requiring no training) and only necessitates one hyperparameter, rendering it more stable and efficient.
Further Exploration
Reference
Description
Link
An Introduction to Training LLMs using RLHF by Ayush Thakur
Explain why RLHF is desirable to reduce bias and increase performance in LLMs.
Assessing LLMs is an often overlooked aspect of the pipeline, characterized by its time-consuming nature and moderate reliability. Your evaluation criteria should be tailored to your downstream task, while bearing in mind Goodhart's law: "When a measure becomes a target, it ceases to be a good measure."
Concept
Description
Traditional metrics
Metrics like perplexity and BLEU score, while less favored now due to their contextual limitations, remain crucial for comprehension and determining their applicable contexts.
Tasks like summarization, translation, and question answering boast dedicated benchmarks, metrics, and even subdomains (e.g., medical, financial), exemplified by PubMedQA for biomedical question answering.
Human evaluation
The most dependable evaluation method entails user acceptance rates or human-comparison metrics. Additionally, logging user feedback alongside chat traces, facilitated by tools like LangSmith, aids in pinpointing potential areas for enhancement.
Further Evaluation
Reference
Description
Link
Perplexity of fixed-length models by Hugging Face
Provides an overview of perplexity along with code to implement it using the transformers library.
Quantization involves converting the weights (and activations) of a model to lower precision. For instance, weights initially stored using 16 bits may be transformed into a 4-bit representation. This technique has gained significance in mitigating the computational and memory expenses linked with LLMs
Concept
Description
Base techniques
Explore various levels of precision (FP32, FP16, INT8, etc.) and learn how to conduct naïve quantization using techniques like absmax and zero-point.
GGUF and llama.cpp
Originally intended for CPU execution, llama.cpp and the GGUF format have emerged as popular tools for running LLMs on consumer-grade hardware.
GPTQ and EXL2
GPTQ and its variant, the EXL2 format, offer remarkable speed but are limited to GPU execution. However, quantizing models using these formats can be time-consuming.
AWQ
This newer format boasts higher accuracy compared to GPTQ, as indicated by lower perplexity, but demands significantly more VRAM and may not necessarily exhibit faster performance.
Further Exploration
Reference
Description
Link
Introduction to quantization
Offers an overview of quantization, including absmax and zero-point quantization, and demonstrates LLM.int8() with accompanying code.
Presents a guide on quantizing a Mistral model using the EXL2 format and running it with the ExLlamaV2 library, touted as the fastest library for LLMs.
Explore how LLMs encode positions, focusing on relative positional encoding schemes like RoPE. Implement extensions to context length using techniques such as YaRN (which multiplies the attention matrix by a temperature factor) or ALiBi (applying attention penalty based on token distance).
Model merging
Model merging has gained popularity as a method for creating high-performance models without additional fine-tuning. The widely-used mergekit library incorporates various merging methods including SLERP, DARE, and TIES.
Mixture of Experts
The resurgence of the MoE architecture, exemplified by Mixtral, has led to the emergence of alternative approaches like frankenMoE, seen in community-developed models such as Phixtral, offering cost-effective and high-performance alternatives.
Multimodal models
These models, such as CLIP, Stable Diffusion, or LLaVA, process diverse inputs (text, images, audio, etc.) within a unified embedding space, enabling versatile applications like text-to-image generation.
Learn to create and deploy robust LLM-powered applications, focusing on model augmentation and practical deployment strategies for production environments.
⬇️ Ready to Build Production-Ready LLM Applications?⬇️
Running LLMs can be demanding due to significant hardware requirements. Based on your use case, you might opt to use a model through an API (like GPT-4) or run it locally. In either scenario, employing additional prompting and guidance techniques can improve and constrain the output for your applications.
Techniques such as zero-shot prompting, few-shot prompting, chain of thought, and ReAct are commonly used in prompt engineering. These methods are more effective with larger models but can also be adapted for smaller ones.
Structuring Outputs
Many tasks require outputs to be in a specific format, such as a strict template or JSON. Libraries like LMQL, Outlines, and Guidance can help guide the generation process to meet these structural requirements.
Further Exploration
Reference
Description
Link
Run an LLM locally with LM Studio by Nisha Arya
A brief guide on how to use LM Studio for running a local LLM.
Creating a vector storage is the first step in building a Retrieval Augmented Generation (RAG) pipeline. This involves loading and splitting documents, and then using the relevant chunks to produce vector representations (embeddings) that are stored for future use during inference.
Category
Details
Ingesting Documents
Document loaders are convenient wrappers that handle various formats such as PDF, JSON, HTML, Markdown, etc. They can also retrieve data directly from some databases and APIs (e.g., GitHub, Reddit, Google Drive).
Splitting Documents
Text splitters break down documents into smaller, semantically meaningful chunks. Instead of splitting text after a certain number of characters, it's often better to split by header or recursively, with some additional metadata.
Embedding Models
Embedding models convert text into vector representations, providing a deeper and more nuanced understanding of language, which is essential for performing semantic search.
Vector Databases
Vector databases (like Chroma, Pinecone, Milvus, FAISS, Annoy, etc.) store embedding vectors and enable efficient retrieval of data based on vector similarity.
Further Exploration
Reference
Description
Link
LangChain - Text splitters
A list of different text splitters implemented in LangChain.
Using RAG, LLMs access relevant documents from a database to enhance the precision of their responses. This method is widely used to expand the model's knowledge base without the need for fine-tuning.
Category
Details
Orchestrators
Orchestrators (like LangChain, LlamaIndex, FastRAG, etc.) are popular frameworks to connect your LLMs with tools, databases, memories, etc. and augment their abilities.
Retrievers
User instructions are not optimized for retrieval. Different techniques (e.g., multi-query retriever, HyDE, etc.) can be applied to rephrase/expand them and improve performance.
Memory
To remember previous instructions and answers, LLMs and chatbots like ChatGPT add this history to their context window. This buffer can be improved with summarization (e.g., using a smaller LLM), a vector store + RAG, etc.
Evaluation
We need to evaluate both the document retrieval (context precision and recall) and generation stages (faithfulness and answer relevancy). It can be simplified with tools Ragas and DeepEval.
Further Exploration
Reference
Description
Link
Llamaindex - High-level concepts
Main concepts to know when building RAG pipelines.
Real-world applications often demand intricate pipelines that utilize SQL or graph databases and dynamically choose the appropriate tools and APIs. These sophisticated methods can improve a basic solution and offer extra capabilities.
Category
Details
Query construction
Structured data stored in traditional databases requires a specific query language like SQL, Cypher, metadata, etc. We can directly translate the user instruction into a query to access the data with query construction.
Agents and tools
Agents augment LLMs by automatically selecting the most relevant tools to provide an answer. These tools can be as simple as using Google or Wikipedia, or more complex like a Python interpreter or Jira.
Post-processing
The final step processes the inputs that are fed to the LLM. It enhances the relevance and diversity of documents retrieved with re-ranking, RAG-fusion, and classification.
Program LLMs
Frameworks like DSPy allow you to optimize prompts and weights based on automated evaluations in a programmatic way.
Further Exploration
Reference
Description
Link
LangChain - Query Construction
Blog post about different types of query construction.
Text generation is an expensive process that requires powerful hardware. Besides quantization, various techniques have been proposed to increase throughput and lower inference costs.
Category
Details
Flash Attention
Optimization of the attention mechanism to transform its complexity from quadratic to linear, speeding up both training and inference.
Deploying LLMs at scale is a complex engineering task that may require multiple GPU clusters. However, demos and local applications can often be achieved with significantly less complexity.
Category
Details
Local deployment
Privacy is an important advantage that open-source LLMs have over private ones. Local LLM servers (LM Studio, Ollama, oobabooga, kobold.cpp, etc.) capitalize on this advantage to power local apps.
Demo deployment
Frameworks like Gradio and Streamlit are helpful to prototype applications and share demos. You can also easily host them online, for example using Hugging Face Spaces.
Server deployment
Deploying LLMs at scale requires cloud infrastructure (see also SkyPilot) or on-prem infrastructure and often leverages optimized text generation frameworks like TGI, vLLM, etc.
Edge deployment
In constrained environments, high-performance frameworks like MLC LLM and mnn-llm can deploy LLMs in web browsers, Android, and iOS.
Further Exploration
Reference
Description
Link
Streamlit - Build a basic LLM app
Tutorial to make a basic ChatGPT-like app using Streamlit.
Along with the usual security concerns of software, LLMs face distinct vulnerabilities arising from their training and prompting methods.
Category
Details
Prompt hacking
Techniques related to prompt engineering, including prompt injection (adding instructions to alter the model’s responses), data/prompt leaking (accessing original data or prompts), and jailbreaking (crafting prompts to bypass safety features).
Backdoors
Attack vectors targeting the training data itself, such as poisoning the training data with false information or creating backdoors (hidden triggers to alter the model’s behavior during inference).
Defensive measures
Protecting LLM applications involves testing them for vulnerabilities (e.g., using red teaming and tools like garak) and monitoring them in production (using a framework like langfuse).
Further Exploration
Reference
Description
Link
OWASP LLM Top 10 by HEGO Wiki
List of the 10 most critical vulnerabilities found in LLM applications.
Generated by Databricks employees, prompt/response pairs in eight different instruction categories, including the seven outlined in the InstructGPT paper.
High-quality dataset with pairs of instructions and answers in different languages. See Locutusque/function-calling-chatml for a variant without conversation tags.
Mix of AgentInstruct, ToolBench, and ShareGPT datasets.
Agent & Function calling
LLM Alligmment
Alignment is an emerging field of study where you ensure that an AI system performs exactly what you want it to perform. In the context of LLMs specifically, alignment is a process that trains an LLM to ensure that the generated outputs align with human values and goals.
What are the current methods for LLM alignment?
You will find many alignment methods in research literature, we will only stick to 3 alignment methods for the sake of discussion
📌 RLHF:
Step 1 & 2: Train an LLM (pre-training for the base model + supervised/instruction fine-tuning for chat model)
Step 3: RLHF uses an ancillary language model (it could be much smaller than the main LLM) to learn human preferences. This can be done using a preference dataset - it contains a prompt, and a response/set of responses graded by expert human labelers. This is called a “reward model”.
Step 4: Use a reinforcement learning algorithm (eg: PPO - proximal policy optimization), where the LLM is the agent, the reward model provides a positive or negative reward to the LLM based on how well it’s responses align with the “human preferred responses”.
In theory, it is as simple as that. However, implementation isn’t that easy - requiring lot of human experts and compute resources. To overcome the “expense” of RLHF, researchers developed DPO.
Step 4: DPO eliminates the need for the training of a reward model (i.e step 3). How? DPO defines an additional preference loss as a function of it’s policy and uses the language model directly as the reward model. The idea is simple, If you are already training such a powerful LLM, why not train itself to distinguish between good and bad responses, instead of using another model?
DPO is shown to be more computationally efficient (in case of RLHF you also need to constantly monitor the behavior of the reward model) and has better performance than RLHF in several settings.
The newest method out of all 3, ORPO combines Step 2, 3 & 4 into a single step - so the dataset required for this method is a combination of a fine-tuning + preference dataset.
The supervised fine-tuning and alignment/preference optimization is performed in a single step. This is because the fine-tuning step, while allowing the model to specialize to tasks and domains, can also increase the probability of undesired responses from the model.
ORPO combines the steps using a single objective function by incorporating an odds ratio (OR) term - reward preferred responses & penalizing rejected responses.
After immersing myself in the recent GenAI text-based language model hype for nearly a month, I have made several observations about its performance on my specific tasks.
Please note that these observations are subjective and specific to my own experiences, and your conclusions may differ.
We need a minimum of 7B parameter models (<7B) for optimal natural language understanding performance. Models with fewer parameters result in a significant decrease in performance. However, using models with more than 7 billion parameters requires a GPU with greater than 24GB VRAM (>24GB).
Benchmarks can be tricky as different LLMs perform better or worse depending on the task. It is crucial to find the model that works best for your specific use case. In my experience, MPT-7B is still the superior choice compared to Falcon-7B.
Prompts change with each model iteration. Therefore, multiple reworks are necessary to adapt to these changes. While there are potential solutions, their effectiveness is still being evaluated.
For fine-tuning, you need at least one GPU with greater than 24GB VRAM (>24GB). A GPU with 32GB or 40GB VRAM is recommended.
Fine-tuning only the last few layers to speed up LLM training/finetuning may not yield satisfactory results. I have tried this approach, but it didn't work well.
Loading 8-bit or 4-bit models can save VRAM. For a 7B model, instead of requiring 16GB, it takes approximately 10GB or less than 6GB, respectively. However, this reduction in VRAM usage comes at the cost of significantly decreased inference speed. It may also result in lower performance in text understanding tasks.
Those who are exploring LLM applications for their companies should be aware of licensing considerations. Training a model with another model as a reference and requiring original weights is not advisable for commercial settings.
There are three major types of LLMs: basic (like GPT-2/3), chat-enabled, and instruction-enabled. Most of the time, basic models are not usable as they are and require fine-tuning. Chat versions tend to be the best, but they are often not open-source.
Not every problem needs to be solved with LLMs. Avoid forcing a solution around LLMs. Similar to the situation with deep reinforcement learning in the past, it is important to find the most appropriate approach.
I have tried but didn't use langchains and vector-dbs. I never needed them. Simple Python, embeddings, and efficient dot product operations worked well for me.
LLMs do not need to have complete world knowledge. Humans also don't possess comprehensive knowledge but can adapt. LLMs only need to know how to utilize the available knowledge. It might be possible to create smaller models by separating the knowledge component.
The next wave of innovation might involve simulating "thoughts" before answering, rather than simply predicting one word after another. This approach could lead to significant advancements.
The overparameterization of LLMs presents a significant challenge: they tend to memorize extensive amounts of training data. This becomes particularly problematic in RAG scenarios when the context conflicts with this "implicit" knowledge. However, the situation escalates further when the context itself contains contradictory information. A recent survey paper comprehensively analyzes these "knowledge conflicts" in LLMs, categorizing them into three distinct types:
Context-Memory Conflicts: Arise when external context contradicts the LLM's internal knowledge.
Solution
Fine-tune on counterfactual contexts to prioritize external information.
Utilize specialized prompts to reinforce adherence to context
Apply decoding techniques to amplify context probabilities.
Pre-train on diverse contexts across documents.
Inter-Context Conflicts: Contradictions between multiple external sources.
Solution:
Employ specialized models for contradiction detection.
Utilize fact-checking frameworks integrated with external tools.
Fine-tune discriminators to identify reliable sources.
Aggregate high-confidence answers from augmented queries.
Intra-Memory Conflicts: The LLM gives inconsistent outputs for similar inputs due to conflicting internal knowledge.
Solution:
Fine-tune with consistency loss functions.
Implement plug-in methods, retraining on word definitions.
Ensemble one model's outputs with another's coherence scoring.
Apply contrastive decoding, focusing on truthful layers/heads.
The difference between PPO and DPOs: in DPO you don’t need to train a reward model anymore. Having good and bad data would be sufficient!
ORPO: “A straightforward and innovative reference model-free monolithic odds ratio preference optimization algorithm, ORPO, eliminating the necessity for an additional preference alignment phase. “ Hong, Lee, Thorne (2024)
KTO: “KTO does not need preferences -- only a binary signal of whether an output is desirable or undesirable for a given input. This makes it far easier to use in the real world, where preference data is scarce and expensive.” Ethayarajh et al (2024)
Contributing
Contributions are welcome! If you'd like to contribute to this project, feel free to open an issue or submit a pull request.