## Quantization and Dequantization

As mentioned earlier, the `from_pretrained` method in the huggingface transformers library can read GGUF files directly; during loading, the GGUF weights are dequantized into floating-point numbers. This dequantization is in fact provided by the `gguf` package. Let's first verify this with an example that compares:

- the model weights obtained via the `from_pretrained` method
- the weights dequantized manually with `GGUFReader` and `dequantize` from the `gguf` package

Environment requirements: `gguf==0.10.0`, `transformers==4.45.2`

```python
from transformers import AutoModelForCausalLM
from gguf import GGUFReader, dequantize
import os

path = "/content/qwen2.5-1.5b-instruct-q5_k_m.gguf"

# Method 1: let transformers load the GGUF file (dequantizing on load)
model = AutoModelForCausalLM.from_pretrained(os.path.dirname(path), gguf_file=path)
model_tensor = model.state_dict()["lm_head.weight"]
print(model_tensor.shape)  # torch.Size([151936, 1536])
print(model_tensor[:2, :3].numpy())
# array([[ 0.00715065,  0.01251364, -0.01072598],
#        [ 0.00555754,  0.0155611 ,  0.02334166]], dtype=float32)

# Method 2: read the raw quantized data and dequantize it manually
reader = GGUFReader(path)
gguf_tensor = reader.tensors[0]
print(gguf_tensor.name)         # "output.weight", llama.cpp's name for lm_head.weight
print(gguf_tensor.tensor_type)  # <GGMLQuantizationType.Q6_K: 14>
print(gguf_tensor.data.shape)   # (151936, 1260), shape of the quantized (raw byte) data
print(gguf_tensor.shape)        # (151936, 1536), shape of the original float weights

gguf_float_tensor = dequantize(gguf_tensor.data, gguf_tensor.tensor_type)
print(gguf_float_tensor[:2, :3])
# array([[ 0.00715065,  0.01251364, -0.01072598],
#        [ 0.00555754,  0.0155611 ,  0.02334166]], dtype=float32)
```

As we can see, the two approaches yield identical floating-point weights. We can therefore dig into the `dequantize` function next; the relevant code is in [llama.cpp/gguf-py/gguf/quants.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/quants.py).
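
Incidentally, note that although the file is named `q5_k_m`, the tensor above is stored as `Q6_K`: the `_M` presets in llama.cpp mix several quantization types across tensors. A quick way to see the mix is to walk over `reader.tensors` (a minimal sketch reusing the same `GGUFReader` attributes as in the example above):

```python
from collections import Counter

from gguf import GGUFReader

reader = GGUFReader("/content/qwen2.5-1.5b-instruct-q5_k_m.gguf")

# Count how many tensors in the file use each quantization type
type_counts = Counter(t.tensor_type.name for t in reader.tensors)
print(type_counts)  # expect a mix for a q5_k_m file, e.g. Q5_K / Q6_K / F32 entries
```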


Let's start with the `GGMLQuantizationType.Q6_K` type used in this example.
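
In llama.cpp, Q6_K stores weights in super-blocks of `QK_K = 256` values, each occupying 210 bytes: 128 bytes `ql` (the lower 4 bits of each 6-bit quant), 64 bytes `qh` (the upper 2 bits), 16 int8 sub-block scales (one per 16 weights), and one fp16 super-block scale `d`. A weight is recovered as `d * scale * (q - 32)`, where `q` is the reconstructed 6-bit value in `[0, 63]`. This also explains the `(151936, 1260)` data shape above: each row of 1536 weights is 1536 / 256 = 6 super-blocks, and 6 × 210 = 1260 bytes.

Below is a minimal per-block dequantizer in plain NumPy, ported from the reference C implementation (`dequantize_row_q6_K` in llama.cpp's `ggml-quants.c`); the actual `gguf-py` code in `quants.py` is fully vectorized but computes the same thing:

```python
import numpy as np

QK_K = 256             # weights per Q6_K super-block
Q6_K_BLOCK_SIZE = 210  # bytes per super-block: 128 (ql) + 64 (qh) + 16 (scales) + 2 (d)

def dequantize_q6_k_block(block: bytes) -> np.ndarray:
    """Dequantize one 210-byte Q6_K super-block into 256 float32 values."""
    ql = np.frombuffer(block[0:128], dtype=np.uint8).astype(np.int32)     # lower 4 bits of each quant
    qh = np.frombuffer(block[128:192], dtype=np.uint8).astype(np.int32)   # upper 2 bits of each quant
    sc = np.frombuffer(block[192:208], dtype=np.int8).astype(np.float32)  # 16 sub-block scales
    d = np.float32(np.frombuffer(block[208:210], dtype=np.float16)[0])    # fp16 super-block scale

    y = np.empty(QK_K, dtype=np.float32)
    for half in range(2):  # each iteration reconstructs 128 weights
        yo, qlo, qho, so = 128 * half, 64 * half, 32 * half, 8 * half
        for l in range(32):
            g = l // 16  # each scale covers 16 consecutive weights
            q1 = ((ql[qlo + l]      & 0xF) | (((qh[qho + l] >> 0) & 3) << 4)) - 32
            q2 = ((ql[qlo + l + 32] & 0xF) | (((qh[qho + l] >> 2) & 3) << 4)) - 32
            q3 = ((ql[qlo + l]      >> 4)  | (((qh[qho + l] >> 4) & 3) << 4)) - 32
            q4 = ((ql[qlo + l + 32] >> 4)  | (((qh[qho + l] >> 6) & 3) << 4)) - 32
            y[yo + l +  0] = d * sc[so + g + 0] * q1
            y[yo + l + 32] = d * sc[so + g + 2] * q2
            y[yo + l + 64] = d * sc[so + g + 4] * q3
            y[yo + l + 96] = d * sc[so + g + 6] * q4
    return y

# Sanity check: the first super-block of the first row should reproduce
# the first values computed by gguf's own dequantize above
raw_row = reader.tensors[0].data[0]  # 1260 bytes = 6 super-blocks
print(dequantize_q6_k_block(raw_row[:Q6_K_BLOCK_SIZE].tobytes())[:3])
# expected to match gguf_float_tensor[0, :3]
```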

## Quantized Inference

TODO
