Issue with Dolly Dataloader: context key not found! #1760

Open
pytholic opened this issue Sep 28, 2024 · 5 comments
Labels
bug Something isn't working


@pytholic
Contributor

Bug description

I ran into the following issue while running LoRA fine-tuning.

Stack Trace

KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/litgpt/data/base.py", line 80, in __getitem__
    example = self.transform(example)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/litgpt/data/dolly.py", line 74, in _transform
    item["input"] = item.pop("context")
                    ^^^^^^^^^^^^^^^^^^^
KeyError: 'context'

Command

litgpt finetune_lora checkpoints/EleutherAI/pythia-70m --data Dolly --precision 16-mixed --data.num_workers 4 --train.global_batch_size 1 --train.max_seq_length 512 --data.val_split_fraction 0.0

I spent some time debugging it. It seems that the _transform method is called twice at the beginning for some reason. On the second call, the keys are no longer there because pop already removed them on the first call. Using get instead of pop does work, though.
In src/litgpt/litgpt/data/dolly.py (the commented-out lines are for debugging):

# import sys
# from pprint import pprint

def _transform(idx: int, item: dict) -> dict:
    # if "context" not in item.keys():
        # print(f"{idx}: Missing Key!")
        # pprint(item)
        # sys.exit()
    item["input"] = item.pop("context")
    item["output"] = item.pop("response")
    return item
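A minimal, self-contained illustration of the failure mode described above, with no litgpt or torch dependency: dict.pop removes the key, so a second call on the same dict object raises KeyError, while dict.get leaves the key in place. The function names and the sample record are hypothetical.

```python
def transform_with_pop(item: dict) -> dict:
    # Same pattern as the Dolly transform in the issue: rename keys by popping them.
    item["input"] = item.pop("context")
    item["output"] = item.pop("response")
    return item


def transform_with_get(item: dict) -> dict:
    # get() reads the keys without removing them, so a repeat call still finds them.
    item["input"] = item.get("context")
    item["output"] = item.get("response")
    return item


record = {"instruction": "Say hi", "context": "", "response": "Hi!"}
transform_with_pop(record)  # first call: keys are renamed in place
print(sorted(record))  # ['input', 'instruction', 'output']

try:
    transform_with_pop(record)  # second call on the same dict object
except KeyError as err:
    print("second call failed:", repr(err))  # second call failed: KeyError('context')

record = {"instruction": "Say hi", "context": "", "response": "Hi!"}
transform_with_get(transform_with_get(record))  # both calls succeed
print(sorted(record))  # ['context', 'input', 'instruction', 'output', 'response']
```

The get variant avoids the error but leaves the original "context" and "response" keys in every record alongside the new ones.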

I couldn't figure out why it is being called twice, though.
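The thread does not pin down why _transform runs twice. One plausible explanation, which is an assumption and not confirmed here: if the dataset keeps its records in a plain Python list of dicts, indexing returns the same dict object every time, so a pop-based transform permanently rewrites the stored record. Any earlier pass over the dataset (for example, a scan over all samples before training starts) would then leave that record without "context" for the next access. In this sketch, ListDataset, mutating_transform, copying_transform and two_passes are hypothetical names; ListDataset mimics only the __getitem__ pattern visible in the traceback (fetch the stored example, then call self.transform on it).

```python
class ListDataset:
    """Hypothetical stand-in for a map-style dataset backed by a plain list of dicts."""

    def __init__(self, data: list, transform):
        self.data = data
        self.transform = transform

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: int) -> dict:
        example = self.data[idx]  # the stored dict itself, not a copy
        return self.transform(example)


def mutating_transform(item: dict) -> dict:
    # pop() rewrites the stored record, so the next access sees no "context"
    item["input"] = item.pop("context")
    item["output"] = item.pop("response")
    return item


def copying_transform(item: dict) -> dict:
    item = dict(item)  # shallow copy: the stored record stays untouched
    item["input"] = item.pop("context")
    item["output"] = item.pop("response")
    return item


def two_passes(transform) -> str:
    data = [{"instruction": "Say hi", "context": "", "response": "Hi!"}]
    dataset = ListDataset(data, transform)
    try:
        # Pass 1 stands in for an assumed pre-training scan, pass 2 for the training loop.
        for _pass in range(2):
            for i in range(len(dataset)):
                dataset[i]  # triggers the transform on the stored record
    except KeyError as err:
        return f"failed: {err!r}"
    return "ok"


print(two_passes(mutating_transform))  # failed: KeyError('context')
print(two_passes(copying_transform))  # ok
```

Copying the record before renaming keys keeps the transform safe to call any number of times on the same index, at the cost of one shallow dict copy per access.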

What operating system are you using?

macOS

LitGPT Version

Tested on two versions and on two platforms (macOS and Linux):

litgpt                                   0.4.13
litgpt                                   0.4.14.dev1
pytholic added the bug (Something isn't working) label on Sep 28, 2024
@pytholic
Contributor Author

@rasbt Maybe you could take a look when you have some time. If I'm not mistaken, you did the original implementation.

@rasbt
Collaborator

rasbt commented Sep 29, 2024

Thanks for the note. Not sure what happened there. Maybe I forgot to adjust the dataset when we updated the data loader. I will try to take a look next week. (In the meantime, if you get it to work, I'd appreciate a PR.)

@pytholic
Contributor Author

@rasbt I will fix it within one to two days and create a PR.

@pytholic
Contributor Author

pytholic commented Oct 2, 2024

@rasbt could you assign this issue to me before I begin?

@rasbt
Collaborator

rasbt commented Oct 2, 2024

Of course, happy to assign you (I just see that @Andrei-Aksionov already beat me to it though 😅)
