-
Never mind, I had success with full fine-tuning for this particular use case, which was just a test as I learn more about fine-tuning.
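For anyone who lands here later: torchtune ships a stock single-device recipe and config for a full fine-tune of Llama3 8B (`tune run full_finetune_single_device --config llama3/8B_full_single_device`; check `tune ls` for the exact names in your version). The main difference from the LoRA config is the model section, sketched below; this is torchtune's stock setup, not necessarily my exact config.

```yaml
# Sketch of the model section in the full fine-tune config.
# The LoRA-specific fields are simply absent; the builder name is
# torchtune's stock Llama3 8B builder.
model:
  _component_: torchtune.models.llama3.llama3_8b
```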
-
I fine-tuned llama3:8b on a small text corpus in Parquet format. It's mostly source code with a few text files, and it's purposely small at the moment while I get the process down, but it will ultimately be much larger. I used the vanilla `torchtune.datasets.text_completion_dataset` as the dataset class. The documentation does mention that this class is meant to be customized, but that didn't seem necessary for my data.
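For context, the dataset section of my config is essentially the stock builder pointed at a local Parquet file, roughly like the sketch below. The file path and column name are placeholders; `source: parquet` and `data_files` are just Hugging Face `load_dataset` arguments that torchtune passes through, so check the `text_completion_dataset` docs for the exact parameter names in your torchtune version.

```yaml
# Dataset section pointed at a local Parquet corpus (path and column are placeholders).
dataset:
  _component_: torchtune.datasets.text_completion_dataset
  source: parquet                  # forwarded to Hugging Face load_dataset
  data_files: data/corpus.parquet  # placeholder path to the local corpus
  column: text                     # name of the text column in the Parquet file
```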
I trained with `lora_finetune_single_device --config llama3/8B_lora_single_device`, which ran successfully to completion and generated several `hf_model_*` checkpoint files. I then used `tune run generate` with an appropriately modified config that points to those checkpoint files to do some interactive testing.

However, the model doesn't seem to know anything about the data. For example, one of the dataset elements is some fabricated text that I had an LLM generate about a fictional file system. Multiple queries about this file system just produced hallucinations.
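For reference, the modification to the generation config is basically just the checkpointer section, pointed at the fine-tuned checkpoints instead of the base model, roughly as below. The directory and shard file names are placeholders, and the checkpointer's module path differs between torchtune versions.

```yaml
# Checkpointer section of the generation config, pointed at the fine-tuned
# hf_model_* files (directory and file names are placeholders).
checkpointer:
  _component_: torchtune.training.FullModelHFCheckpointer  # torchtune.utils.FullModelHFCheckpointer on older versions
  checkpoint_dir: ./lora-output/
  checkpoint_files:
    - hf_model_0001_0.pt
    - hf_model_0002_0.pt
    - hf_model_0003_0.pt
    - hf_model_0004_0.pt
  output_dir: ./lora-output/
  model_type: LLAMA3
```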
Any guidance would be much appreciated, thank you!