Error when doing 4-bit quantization on Jetson AGX Orin #1290
Unanswered
FanZhang91 asked this question in Q&A (2 comments)

I load the quantized model in a way similar to the following, but it raises an error: RuntimeError: no kernel image available for execution on the device.

model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).quantize(bits=4, device="cuda").cuda().eval()
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).cuda().quantize(bits=4, device="cuda").eval()

How can I fix this?
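A minimal diagnostic sketch (an editorial addition, not from the thread): "no kernel image available for execution on the device" usually means the installed PyTorch or bitsandbytes wheel was not compiled for this GPU's compute capability (the Jetson AGX Orin's GPU is sm_87), so a first step is to check what the installed build actually targets:

import torch

# Compare the device's compute capability with the architectures
# the installed PyTorch wheel was compiled for.
print(torch.__version__, torch.version.cuda)
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))  # Jetson AGX Orin reports (8, 7)
print(torch.cuda.get_arch_list())           # e.g. ['sm_70', 'sm_80', ...];
                                            # 'sm_87' must be present for Orin

If sm_87 is absent from that list, the usual remedy is a JetPack-specific PyTorch build (and, for 4-bit, a bitsandbytes compiled for that architecture) rather than any change to the loading code.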
-
Load it this way (imports added for completeness):

from peft import AutoPeftModelForCausalLM
from transformers import BitsAndBytesConfig

model = AutoPeftModelForCausalLM.from_pretrained(
    model_dir, trust_remote_code=trust_remote_code, device_map='auto',
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
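A slightly fuller sketch of the same approach; the settings beyond load_in_4bit are illustrative choices, not something specified in this thread:

import torch
from peft import AutoPeftModelForCausalLM
from transformers import BitsAndBytesConfig

# Illustrative 4-bit settings; the bare load_in_4bit=True default also works.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # matmuls run in fp16, weights stay 4-bit
    bnb_4bit_quant_type="nf4",             # NormalFloat4 quantization
)

model = AutoPeftModelForCausalLM.from_pretrained(
    model_dir,                 # same model_dir as in the comment above
    trust_remote_code=True,
    device_map="auto",
    quantization_config=bnb_config,
)

Note that this path still dispatches to bitsandbytes CUDA kernels, so on the Orin it is subject to the same sm_87 requirement as the quantize() path.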
-
On a 4090 server, quantization runs fine whether I use quantize() or BitsAndBytesConfig; I did notice that int4 and int8 quantization are actually slower than half precision, though. Is that normal? But on a device like the Orin, int4 quantization throws an error. How should I deal with that?
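On the speed question: int4/int8 running slower than half precision is common, since the 4-/8-bit kernels dequantize weights on the fly during matmul; quantization here mainly buys memory headroom, not throughput. A minimal timing sketch to compare the two (the model and inputs objects are assumed to already exist):

import time
import torch

def avg_generate_seconds(model, inputs, n_runs=3, max_new_tokens=64):
    # One warm-up call so one-time compilation/caching does not skew the numbers.
    model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_runs):
        model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    return (time.time() - start) / n_runs

# Usage idea: load the same checkpoint once in half precision and once in 4-bit,
# then compare avg_generate_seconds(model_fp16, inputs) with
# avg_generate_seconds(model_4bit, inputs).

The Orin error itself is the missing-kernel-image problem checked above, not a property of int4 as such.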