
[Observer Restructure]: Add Observers; Add calibration and frozen steps to QuantizationModifier #837

Merged: 30 commits merged into main from update-foward on Oct 31, 2024

Conversation

@dsikka (Collaborator) commented Oct 10, 2024

SUMMARY:

  • PR to add observers to llm-compressor
  • Adds the hooks required to run calibration as part of the QuantizationModifier. All required calibration lifecycle steps can now be found in calibration.py
  • Also adds the KV Cache object so that calibration can update k_scale and v_scale for kv_cache quantization (a small sketch of the scale-update idea follows this list)
  • Requires the following PR to land in compressed-tensors: Observer Restructure: Remove Observers, calibration, and applying frozen steps from lifecycle neuralmagic/compressed-tensors#189
  • Updated Calibration lifecycle (also shown in the docstrings). This will run as part of the calibration step within the QuantizationModifier
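For the KV cache piece, the intent is that calibration refreshes the per-layer k_scale and v_scale from the key/value tensors observed during the forward pass. Below is a minimal plain-PyTorch sketch of that scale-update idea only; the helper name `update_kv_scale`, the tensor shapes, and the symmetric int8 range are illustrative assumptions, not the actual QuantizedKVParameterCache implementation.

```python
import torch

def update_kv_scale(observed: torch.Tensor, quant_max: float = 127.0) -> torch.Tensor:
    # symmetric scale derived from the absolute max of the observed tensor
    return observed.detach().abs().amax() / quant_max

# inside a calibration forward pass, per attention layer:
k_states = torch.randn(1, 8, 128, 64)  # stand-in for observed key states
v_states = torch.randn(1, 8, 128, 64)  # stand-in for observed value states
k_scale = update_kv_scale(k_states)
v_scale = update_kv_scale(v_states)
```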

Calibration runs if input/output activation quantization or kv_cache quantization is enabled.

Calibration Lifecycle for a single torch.nn.Module:

      1. initialize_observer():
          if input/output activation:
              - observer = Observer.load_from_registry(...)
              - module.register_module(f"{base_name}_observer", observer)
              
      2. register_calibration_hooks():
          if input activation and not dynamic quant (used to call observers before input QDQ):
              - pre_hook_handle = module.register_forward_pre_hook(calibrate_input_hook())
          if output activation and not dynamic quant (used to call observers before output QDQ):
              - post_hook_handle = module.register_forward_hook(calibrate_output_hook())
          if kv_cache quantization (used to set kv_cache to QuantizedKVParameterCache and update k_scale/v_scale)
              - pre_hook_handle = module.register_forward_pre_hook(calibrate_kv_cache_input_hook(), with_kwargs=True)
              - post_hook_handle = module.register_forward_hook(calibrate_kv_cache_output_hook())
          self.calibration_hooks.append(pre_hook_handle)
          self.calibration_hooks.append(post_hook_handle)

      3. self._calibrate(module) # run forward pass through model using calibration data
      4. set_unset_kv_cache() # remove kv_cache objects attached to attention layers, initially set in _apply_modifier_to_model
      5. remove calibration hooks in self.calibration_hooks
      6. remove observers
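To make the lifecycle above concrete, here is a plain-PyTorch sketch of the same pattern: attach observers as submodules, register forward hooks that feed them during calibration, run calibration data through, then tear hooks and observers back down. This is illustrative only; `MinMaxObserver`, the hook functions, and the `input_observer`/`output_observer` attribute names are simplified stand-ins, not the implementations in calibration.py or the llm-compressor Observer registry.

```python
import torch

class MinMaxObserver(torch.nn.Module):
    """Tracks running min/max of observed tensors; enough to derive qparams later."""
    def __init__(self):
        super().__init__()
        self.register_buffer("min_val", torch.tensor(float("inf")))
        self.register_buffer("max_val", torch.tensor(float("-inf")))

    def forward(self, observed: torch.Tensor):
        self.min_val = torch.minimum(self.min_val, observed.detach().min())
        self.max_val = torch.maximum(self.max_val, observed.detach().max())

def calibrate_input_hook(module, args):
    module.input_observer(args[0])    # observe inputs before any input QDQ

def calibrate_output_hook(module, args, output):
    module.output_observer(output)    # observe outputs before any output QDQ

linear = torch.nn.Linear(16, 16)

# 1. initialize_observer(): attach observers as submodules
linear.register_module("input_observer", MinMaxObserver())
linear.register_module("output_observer", MinMaxObserver())

# 2. register_calibration_hooks()
handles = [
    linear.register_forward_pre_hook(calibrate_input_hook),
    linear.register_forward_hook(calibrate_output_hook),
]

# 3. run the calibration data through the module
for _ in range(4):
    linear(torch.randn(2, 16))

# 5./6. remove the calibration hooks and the observers
for handle in handles:
    handle.remove()
del linear.input_observer, linear.output_observer
```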

Testing:

  • Tested w4a16, quantized kv_cache, and w8a8 int8 workflows
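For reference, roughly how one of these workflows would be driven end to end, assuming the existing oneshot entrypoint and the W8A8 preset scheme; the model, dataset, and calibration sizes below are placeholders rather than the exact commands used for testing, and the maintained recipes live under examples/.

```python
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

# int8 W8A8 scheme; the calibration pass feeds the observers added in this PR
recipe = QuantizationModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"])

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    dataset="open_platypus",                      # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=512,
)
```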

@dsikka marked this pull request as draft October 10, 2024 17:03

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

@dsikka changed the title from "[Observer Restructure]: Update function call" to "[Observer Restructure]: Add Observers" Oct 14, 2024
@dsikka changed the title from "[Observer Restructure]: Add Observers" to "[Observer Restructure]: Add Observers, calibration, and frozen steps to lifecycle" Oct 22, 2024
@dsikka changed the title from "[Observer Restructure]: Add Observers, calibration, and frozen steps to lifecycle" to "[Observer Restructure]: Add Observers; Add calibration, and frozen steps to QuantizationModifier" Oct 22, 2024
@dsikka changed the title from "[Observer Restructure]: Add Observers; Add calibration, and frozen steps to QuantizationModifier" to "[Observer Restructure]: Add Observers; Add calibration and frozen steps to QuantizationModifier" Oct 22, 2024
@dsikka (Collaborator, Author) commented Oct 24, 2024

With the corresponding remove-observers branch checked out:

    python3 examples/quantization_w4a16/llama3_example.py
    Traceback (most recent call last):
      File "/home/ksayers/llm-compressor/examples/quantization_w4a16/llama3_example.py", line 4, in <module>
        from llmcompressor.modifiers.quantization import GPTQModifier
      File "/home/ksayers/llm-compressor/src/llmcompressor/modifiers/quantization/__init__.py", line 3, in <module>
        from .cache import *
      File "/home/ksayers/llm-compressor/src/llmcompressor/modifiers/quantization/cache.py", line 18, in <module>
        from compressed_tensors.quantization.lifecycle import KVCacheScaleType

Can you confirm you're using the most recent commit for both? I don't get this error, and in general you shouldn't, since the scales are defined under src/compressed_tensors/quantization/lifecycle/initialize.py.
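As a quick way to check which compressed-tensors is installed, a sketch of an import check, assuming the remove-observers branch from neuralmagic/compressed-tensors#189 is checked out and that KVCacheScaleType carries the k_scale/v_scale values:

```python
# expected to succeed and print "k_scale v_scale" on the branch; an ImportError
# here points at a stale compressed-tensors install
from compressed_tensors.quantization.lifecycle import KVCacheScaleType
print(KVCacheScaleType.KEY.value, KVCacheScaleType.VALUE.value)
```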

@kylesayrs previously approved these changes Oct 30, 2024
@rahul-tuli previously approved these changes Oct 30, 2024
@rahul-tuli (Collaborator) left a comment

I really like the new structure, great work!

Left a few nits; I'd recommend revisiting the docstrings and updating them for consistency:
-> Start docstrings with a capital letter
-> Include param info in :param: fields rather than only describing parameters in the main docstring body

Otherwise no big red flags! Good tests as well.

Review threads on src/llmcompressor/observers/base.py (resolved)
@dsikka dismissed stale reviews from rahul-tuli and kylesayrs via 9fc10a9 October 30, 2024 21:49
@dsikka merged commit 18e9a9f into main Oct 31, 2024
6 of 7 checks passed
@dsikka deleted the update-foward branch October 31, 2024 14:41