Error when doing 4-bit quantization on Jetson AGX Orin #1290
Unanswered
FanZhang91 asked this question in Q&A (2 comments)

I load the quantized model in a way similar to the following, but it raises an error: RuntimeError: no kernel image available for execution on the device.

model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).quantize(bits=4, device="cuda").cuda().eval()
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).cuda().quantize(bits=4, device="cuda").eval()

How can I fix this?
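A minimal diagnostic sketch (an editorial addition, not from the thread): "no kernel image available for execution on the device" usually means the installed PyTorch or bitsandbytes wheel was not compiled for this GPU's compute capability (the Jetson AGX Orin's GPU is sm_87), so a first step is to check what the installed build actually targets:

import torch

# Compare the device's compute capability with the architectures
# the installed PyTorch wheel was compiled for.
print(torch.__version__, torch.version.cuda)
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))  # Jetson AGX Orin reports (8, 7)
print(torch.cuda.get_arch_list())           # e.g. ['sm_70', 'sm_80', ...];
                                            # 'sm_87' must be present for Orin

If sm_87 is absent from that list, the usual remedy is a JetPack-specific PyTorch build (and, for 4-bit, a bitsandbytes compiled for that architecture) rather than any change to the loading code.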
-
Load it this way (imports added for completeness):

from peft import AutoPeftModelForCausalLM
from transformers import BitsAndBytesConfig

model = AutoPeftModelForCausalLM.from_pretrained(
    model_dir, trust_remote_code=trust_remote_code, device_map='auto',
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
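A slightly fuller sketch of the same approach; the settings beyond load_in_4bit are illustrative choices, not something specified in this thread:

import torch
from peft import AutoPeftModelForCausalLM
from transformers import BitsAndBytesConfig

# Illustrative 4-bit settings; the bare load_in_4bit=True default also works.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # matmuls run in fp16, weights stay 4-bit
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
)

model = AutoPeftModelForCausalLM.from_pretrained(
    model_dir,                 # same model_dir as in the comment above
    trust_remote_code=True,
    device_map="auto",
    quantization_config=bnb_config,
)

Note that this path still dispatches to bitsandbytes CUDA kernels, so on the Orin it is subject to the same sm_87 requirement as the quantize() path.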
-
On a 4090 server, quantization runs fine whether I use quantize() or BitsAndBytesConfig; I did notice that int4 and int8 quantization are actually slower than half precision, though. Is that normal? But on a device like the Orin, int4 quantization throws an error. How should I deal with that?
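On the speed question: int4/int8 running slower than half precision is common, since the 4-/8-bit kernels dequantize weights on the fly during matmul; quantization here mainly buys memory headroom, not throughput. A minimal timing sketch to compare the two (the model and inputs objects are assumed to already exist):

import time
import torch

def avg_generate_seconds(model, inputs, n_runs=3, max_new_tokens=64):
    # One warm-up call so one-time compilation/caching does not skew the numbers.
    model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_runs):
        model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    return (time.time() - start) / n_runs

# Usage idea: load the same checkpoint once in half precision and once in 4-bit,
# then compare avg_generate_seconds(model_fp16, inputs) with
# avg_generate_seconds(model_4bit, inputs).

The Orin error itself is the missing-kernel-image problem checked above, not a property of int4 as such.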