From f603353e2057808f46c86395334ef507fd2bb351 Mon Sep 17 00:00:00 2001
From: Artur Fierka
Date: Fri, 25 Oct 2024 08:46:30 +0200
Subject: [PATCH] Update README_GAUDI about fp8 calibration procedure (#423)

---
 README_GAUDI.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/README_GAUDI.md b/README_GAUDI.md
index b9c744bd9e23f..6dd7837116d52 100644
--- a/README_GAUDI.md
+++ b/README_GAUDI.md
@@ -282,6 +282,10 @@ Additionally, there are HPU PyTorch Bridge environment variables impacting vLLM
 - `PT_HPU_LAZY_MODE`: if `0`, PyTorch Eager backend for Gaudi will be used, if `1` PyTorch Lazy backend for Gaudi will be used, `1` is default
 - `PT_HPU_ENABLE_LAZY_COLLECTIVES`: required to be `true` for tensor parallel inference with HPU Graphs
 
+# Quantization and FP8 model calibration process
+
+The FP8 model calibration procedure is described as part of the [vllm-hpu-extension](https://github.com/HabanaAI/vllm-hpu-extension/tree/main/calibration/README.md) package.
+
 # Troubleshooting: Tweaking HPU Graphs
 
 If you experience device out-of-memory issues or want to attempt inference at higher batch sizes, try tweaking HPU Graphs by following the below:
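For context, a minimal launch sketch using the HPU PyTorch Bridge variables documented in the hunk above; the server entrypoint, model name, and tensor-parallel size shown here are illustrative assumptions and are not part of the patch:

```bash
# Keep the default Lazy backend for Gaudi (PT_HPU_LAZY_MODE=1).
export PT_HPU_LAZY_MODE=1
# Required to be `true` for tensor parallel inference with HPU Graphs.
export PT_HPU_ENABLE_LAZY_COLLECTIVES=true

# Illustrative only: serve a model with tensor parallelism across 2 HPUs.
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --tensor-parallel-size 2
```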