## Quantization and Dequantization

As mentioned earlier, the `from_pretrained` method in the huggingface transformers library can read GGUF files directly; during loading, the GGUF weights are dequantized into floating-point numbers. This dequantization is in fact provided by the `gguf` package. Let's first verify this with an example that compares:

- the model weights obtained via the `from_pretrained` method
- the weights dequantized manually with `GGUFReader` and `dequantize` from the `gguf` package

Environment requirements: `gguf==0.10.0`, `transformers==4.45.2`

```python
from transformers import AutoModelForCausalLM
from gguf import GGUFReader, dequantize
import os

path = "/content/qwen2.5-1.5b-instruct-q5_k_m.gguf"

# Method 1: let transformers load the GGUF file (dequantizing on load)
model = AutoModelForCausalLM.from_pretrained(os.path.dirname(path), gguf_file=path)
model_tensor = model.state_dict()["lm_head.weight"]
print(model_tensor.shape)  # torch.Size([151936, 1536])
print(model_tensor[:2, :3].numpy())
# array([[ 0.00715065,  0.01251364, -0.01072598],
#        [ 0.00555754,  0.0155611 ,  0.02334166]], dtype=float32)

# Method 2: read the raw quantized data and dequantize it manually
reader = GGUFReader(path)
gguf_tensor = reader.tensors[0]
print(gguf_tensor.name)         # "output.weight", llama.cpp's name for lm_head.weight
print(gguf_tensor.tensor_type)  # <GGMLQuantizationType.Q6_K: 14>
print(gguf_tensor.data.shape)   # (151936, 1260), shape of the quantized (raw byte) data
print(gguf_tensor.shape)        # (151936, 1536), shape of the original float weights

gguf_float_tensor = dequantize(gguf_tensor.data, gguf_tensor.tensor_type)
print(gguf_float_tensor[:2, :3])
# array([[ 0.00715065,  0.01251364, -0.01072598],
#        [ 0.00555754,  0.0155611 ,  0.02334166]], dtype=float32)
```

As we can see, the two approaches yield identical floating-point weights. We can therefore dig into the `dequantize` function next; the relevant code is in [llama.cpp/gguf-py/gguf/quants.py](https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/gguf/quants.py).
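
Incidentally, note that although the file is named `q5_k_m`, the tensor above is stored as `Q6_K`: the `_M` presets in llama.cpp mix several quantization types across tensors. A quick way to see the mix is to walk over `reader.tensors` (a minimal sketch reusing the same `GGUFReader` attributes as in the example above):

```python
from collections import Counter

from gguf import GGUFReader

reader = GGUFReader("/content/qwen2.5-1.5b-instruct-q5_k_m.gguf")

# Count how many tensors in the file use each quantization type
type_counts = Counter(t.tensor_type.name for t in reader.tensors)
print(type_counts)  # expect a mix for a q5_k_m file, e.g. Q5_K / Q6_K / F32 entries
```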


Let's start with the `GGMLQuantizationType.Q6_K` type used in this example.
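
In llama.cpp, Q6_K stores weights in super-blocks of `QK_K = 256` values, each occupying 210 bytes: 128 bytes `ql` (the lower 4 bits of each 6-bit quant), 64 bytes `qh` (the upper 2 bits), 16 int8 sub-block scales (one per 16 weights), and one fp16 super-block scale `d`. A weight is recovered as `d * scale * (q - 32)`, where `q` is the reconstructed 6-bit value in `[0, 63]`. This also explains the `(151936, 1260)` data shape above: each row of 1536 weights is 1536 / 256 = 6 super-blocks, and 6 × 210 = 1260 bytes.

Below is a minimal per-block dequantizer in plain NumPy, ported from the reference C implementation (`dequantize_row_q6_K` in llama.cpp's `ggml-quants.c`); the actual `gguf-py` code in `quants.py` is fully vectorized but computes the same thing:

```python
import numpy as np

QK_K = 256             # weights per Q6_K super-block
Q6_K_BLOCK_SIZE = 210  # bytes per super-block: 128 (ql) + 64 (qh) + 16 (scales) + 2 (d)

def dequantize_q6_k_block(block: bytes) -> np.ndarray:
    """Dequantize one 210-byte Q6_K super-block into 256 float32 values."""
    ql = np.frombuffer(block[0:128], dtype=np.uint8).astype(np.int32)     # lower 4 bits of each quant
    qh = np.frombuffer(block[128:192], dtype=np.uint8).astype(np.int32)   # upper 2 bits of each quant
    sc = np.frombuffer(block[192:208], dtype=np.int8).astype(np.float32)  # 16 sub-block scales
    d = np.float32(np.frombuffer(block[208:210], dtype=np.float16)[0])    # fp16 super-block scale

    y = np.empty(QK_K, dtype=np.float32)
    for half in range(2):  # each iteration reconstructs 128 weights
        yo, qlo, qho, so = 128 * half, 64 * half, 32 * half, 8 * half
        for l in range(32):
            g = l // 16  # each scale covers 16 consecutive weights
            q1 = ((ql[qlo + l]      & 0xF) | (((qh[qho + l] >> 0) & 3) << 4)) - 32
            q2 = ((ql[qlo + l + 32] & 0xF) | (((qh[qho + l] >> 2) & 3) << 4)) - 32
            q3 = ((ql[qlo + l]      >> 4)  | (((qh[qho + l] >> 4) & 3) << 4)) - 32
            q4 = ((ql[qlo + l + 32] >> 4)  | (((qh[qho + l] >> 6) & 3) << 4)) - 32
            y[yo + l +  0] = d * sc[so + g + 0] * q1
            y[yo + l + 32] = d * sc[so + g + 2] * q2
            y[yo + l + 64] = d * sc[so + g + 4] * q3
            y[yo + l + 96] = d * sc[so + g + 6] * q4
    return y

# Sanity check: the first super-block of the first row should reproduce
# the first values computed by gguf's own dequantize above
raw_row = reader.tensors[0].data[0]  # 1260 bytes = 6 super-blocks
print(dequantize_q6_k_block(raw_row[:Q6_K_BLOCK_SIZE].tobytes())[:3])
# expected to match gguf_float_tensor[0, :3]
```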

## Quantized Inference

TODO
