Update README_GAUDI about fp8 calibration procedure (#423)
afierka-intel authored Oct 25, 2024
1 parent 7f58ad1 commit f603353
Showing 1 changed file with 4 additions and 0 deletions.
README_GAUDI.md
@@ -282,6 +282,10 @@ Additionally, there are HPU PyTorch Bridge environment variables impacting vLLM
- `PT_HPU_LAZY_MODE`: if `0`, the PyTorch Eager backend for Gaudi is used; if `1`, the PyTorch Lazy backend for Gaudi is used. The default is `1`.
- `PT_HPU_ENABLE_LAZY_COLLECTIVES`: required to be `true` for tensor parallel inference with HPU Graphs (see the sketch after this list)
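
A minimal sketch of how these two variables might be combined for tensor parallel inference with HPU Graphs; the model name and parallelism degree below are placeholders for illustration, not values taken from this README:

```python
import os

# Both variables must be set before vLLM (and the HPU PyTorch bridge) are imported.
os.environ["PT_HPU_LAZY_MODE"] = "1"                   # PyTorch Lazy backend (the default)
os.environ["PT_HPU_ENABLE_LAZY_COLLECTIVES"] = "true"  # required for TP with HPU Graphs

from vllm import LLM, SamplingParams

# Hypothetical model and tensor parallel degree, chosen only for the example.
llm = LLM(model="meta-llama/Llama-2-7b-hf", tensor_parallel_size=2)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```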

# Quantization and FP8 model calibration process

The FP8 model calibration procedure is described as part of the [vllm-hpu-extension](https://github.com/HabanaAI/vllm-hpu-extension/tree/main/calibration/README.md) package.
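
The authoritative steps live in the linked calibration README. As a hedged sketch only: on Gaudi, vLLM is typically pointed at the calibration output through an Intel Neural Compressor (INC) configuration file, and the `QUANT_CONFIG` path, `quantization="inc"`, and `kv_cache_dtype="fp8_inc"` values below are assumptions drawn from that procedure, not from this README:

```python
import os

# Assumption: QUANT_CONFIG points at a config file produced by the calibration
# procedure in vllm-hpu-extension; the path here is a placeholder.
os.environ["QUANT_CONFIG"] = "./inc_quant_config.json"

from vllm import LLM

# Assumption: "inc" and "fp8_inc" are the Intel Neural Compressor backend
# values used by vLLM on Gaudi; check the linked calibration README for the
# exact flags supported by your version.
llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder model
    quantization="inc",
    kv_cache_dtype="fp8_inc",
)
print(llm.generate(["The capital of France is"])[0].outputs[0].text)
```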

# Troubleshooting: Tweaking HPU Graphs

If you experience device out-of-memory issues or want to attempt inference at higher batch sizes, try tweaking HPU Graphs by following the steps below:
