Error when trying to run with a Quantized base model #138
The error you're encountering, `ValueError: Target modules ['q_proj', 'k_proj', 'v_proj'] not found in the base model`, suggests that the PEFT configuration is looking for layers that are named differently or do not exist in the model. Given your model structure, the quantized model has replaced the standard projection layers (`q_proj`, `k_proj`, `v_proj`) with a single fused `qkv_proj` inside `FusedLlamaAttentionForQuantizedModel`. A possible solution is to point the LoRA config's `target_modules` at the fused layer name instead.
Example snippet (a minimal sketch, assuming you are configuring a fresh LoRA adapter with `peft`; the hyperparameters are illustrative placeholders, not taken from your setup):
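```python
from peft import LoraConfig, get_peft_model

# Target the fused projection layer exposed by the quantized model
# instead of the separate q_proj/k_proj/v_proj layers it no longer has.
# The r/alpha/dropout values below are illustrative placeholders.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["qkv_proj"],  # fused layer from FusedLlamaAttentionForQuantizedModel
)

# `model` is the already-loaded quantized base model.
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
```

One caveat: an adapter that was already trained against the separate `q_proj`/`k_proj`/`v_proj` layers can't simply be re-pointed at `qkv_proj`, since the fused layer has a different shape and the saved adapter weights won't match. Retargeting works when training a fresh adapter; to reuse an existing one, loading the quantized base without fused attention (in auto-gptq, I believe this is `inject_fused_attention=False` in `from_quantized`) keeps the original layer names so the adapter can load as-is.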
Hope this helps!
Hello. I have been trying to run the multi-task llama7b models with TheBloke's Llama 2 7B GPTQ (https://huggingface.co/TheBloke/Llama-2-7B-GPTQ) as the base.
While running this in Google Colab, I get the following error when loading the PEFT adapter with `PeftModel.from_pretrained`: `ValueError: Target modules ['q_proj', 'k_proj', 'v_proj'] not found in the base model`.
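For reference, this is roughly how I'm loading things (a minimal sketch; the adapter path is a placeholder and the exact arguments may differ from my notebook):

```python
from auto_gptq import AutoGPTQForCausalLM
from peft import PeftModel

# Load TheBloke's GPTQ-quantized Llama-2-7B as the base model.
base_model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-7B-GPTQ",
    device="cuda:0",
    use_safetensors=True,
)

# Attaching the pre-trained multi-task adapter is the step that raises
# the ValueError about the missing target modules.
model = PeftModel.from_pretrained(base_model, "path/to/multitask-llama7b-adapter")
```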
After a bit of searching, it seems I would have to retrain the PEFT model with a different config. Is there anything I can do other than retraining?
For debugging purposes, the value of `model` before PEFT is applied shows (abridged) that the attention blocks use `FusedLlamaAttentionForQuantizedModel`, which exposes a single fused `qkv_proj` layer in place of separate `q_proj`, `k_proj`, and `v_proj` layers.