
Request for more Details on Training the Reward Model #2

Open
brightest66 opened this issue Nov 7, 2024 · 3 comments


@brightest66

Hi,

I am currently trying to reproduce the experiments in the section "Process Rewards Annotating (Taking LogiQA-v2 as an Example)" of your README.md. However, at Step 5 I found that there is no script for training the reward model (RM). Could you provide more details about the RM training process? In particular, it would help a lot if you could share the files used for RM training, or a script similar to the ones already provided.

I have attempted to run the following command:

```shell
deepspeed trainer_base_ds_mul.py -cp conf/exp/reward/logiqav2 -cn llama2_7b_70bdistil_prm_v2_0
```

but it failed to execute. Perhaps there is an issue with my Python environment (even though I installed it according to requirements.txt), or I may have misunderstood the code structure and run the wrong script.

Thank you very much for your assistance. Please excuse my limited coding experience.

@SparkJiao (Owner)

Please show me the error message.

@brightest66 (Author)

Ubuntu 20.04.1
NVIDIA RTX 4090 × 2
CUDA 12.4

flash-attn: 2.3.3
vllm: 0.2.5
transformers: 4.36.1
deepspeed: 0.12.2

Partial output below. Thanks for your help!

```
[2024-11-07 20:47:17,327][FK][WARNING] - Error locating target 'models.llama.LlamaModelForSequenceClassification.from_pretrained', see chained exception above.
full_key: model

Traceback (most recent call last):
  File "/home/work/anaconda3/envs/reasoning_dpo/lib/python3.9/site-packages/hydra/_internal/utils.py", line 639, in _locate
    obj = getattr(obj, part)
AttributeError: module 'models' has no attribute 'llama'

During handling of the above exception, another exception occurred:

···
Traceback (most recent call last):
  File "/home/data/daiqun/dpo-trajectory-reasoning-main/trainer_base_ds_mul.py", line 426, in <module>
    raise ImportError(
ImportError: Error loading 'models.llama.LlamaModelForSequenceClassification.from_pretrained':
ImportError('/home/work/anaconda3/envs/reasoning_dpo/lib/python3.9/site-packages/flash_attn_2_cuda.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv')

The above exception was the direct cause of the following exception:

···
Traceback (most recent call last):
  File "/home/data/daiqun/dpo-trajectory-reasoning-main/trainer_base_ds_mul.py", line 328, in main
    model = hydra.utils.call(cfg.model, cfg.model_name_or_path, state_dict=pretrain_state_dict)
  File "/home/work/anaconda3/envs/reasoning_dpo/lib/python3.9/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 325, in instantiate_node
    return instantiate_node(
  File "/home/work/anaconda3/envs/reasoning_dpo/lib/python3.9/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 325, in instantiate_node
    _target_ = _resolve_target(node.get(_Keys.TARGET), full_key)
  File "/home/work/anaconda3/envs/reasoning_dpo/lib/python3.9/site-packages/hydra/_internal/instantiate/_instantiate2.py", line 139, in _resolve_target
    raise InstantiationException(msg) from e
hydra.errors.InstantiationException: Error locating target 'models.llama.LlamaModelForSequenceClassification.from_pretrained', see chained exception above.
full_key: model
```
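Note that the root cause in the chain above is the flash-attn CUDA extension failing to import; Hydra's `_locate` then misreports it as `module 'models' has no attribute 'llama'`. A minimal diagnostic sketch (hypothetical, not code from this repo) that surfaces the underlying error directly, without going through Hydra:

```python
# Hypothetical diagnostic: importing the compiled extension named in the
# traceback reproduces the root ImportError in isolation.
def check_flash_attn():
    """Return 'ok' if the flash-attn CUDA extension loads, else the error text."""
    try:
        import flash_attn_2_cuda  # the .so cited in the undefined-symbol error
        return "ok"
    except ImportError as exc:
        return f"broken: {exc}"

print(check_flash_attn())
```

An `undefined symbol: _ZNK3c105Error4whatEv` message (the mangled name of `c10::Error::what()`) typically means the flash-attn wheel was compiled against a different PyTorch build than the one installed, i.e. an ABI mismatch rather than a bug in the training code.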

@SparkJiao (Owner)

Maybe you can first reinstall flash-attention following the instructions at https://github.com/Dao-AILab/flash-attention.

If it still does not work, change the `attn_implementation` here from `flash_attention_2` to `sdpa` or `eager`.
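The fallback suggested above can also be chosen at runtime instead of hard-coded in the config. A sketch (an illustration, not this repo's code; the commented `from_pretrained` call assumes the standard transformers-style entry point):

```python
# Hypothetical sketch: probe for a healthy flash-attn install and fall back to
# PyTorch's built-in scaled-dot-product attention ("sdpa") otherwise.
def pick_attn_implementation():
    try:
        import flash_attn  # noqa: F401 -- succeeds only if the wheel loads cleanly
        return "flash_attention_2"
    except ImportError:
        return "sdpa"

attn_impl = pick_attn_implementation()
# The chosen value would then be passed through, e.g.:
# model = LlamaModelForSequenceClassification.from_pretrained(
#     model_name_or_path, attn_implementation=attn_impl)
print(attn_impl)
```

`sdpa` is slower than flash-attention but numerically equivalent for inference and training, so it is a reasonable workaround while the flash-attn build is broken.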
