Issue with Dolly Dataloader: `context` key not found! #1760

pytholic · 2024-09-28T18:58:45Z

Bug description

I ran into the following issue while running LoRA fine-tuning.

Stack Trace

KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 309, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
            ~~~~~~~~~~~~^^^^^
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/litgpt/data/base.py", line 80, in __getitem__
    example = self.transform(example)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/lunit_haseebraja/miniconda3/envs/lora_tests/lib/python3.11/site-packages/litgpt/data/dolly.py", line 74, in _transform
    item["input"] = item.pop("context")
                    ^^^^^^^^^^^^^^^^^^^
KeyError: 'context'

Command

litgpt finetune_lora checkpoints/EleutherAI/pythia-70m --data Dolly --precision 16-mixed --data.num_workers 4 --train.global_batch_size 1 --train.max_seq_length 512 --data.val_split_fraction 0.0

I spent some time debugging it. It seems that the `_transform` method is called twice at the beginning for some reason. During the second call, the keys are no longer there because we use `pop`. It does work with `get`, though.
In src/litgpt/litgpt/data/dolly.py (the commented-out parts are for debugging):

# import sys
# from pprint import pprint

def _transform(idx: int, item: dict) -> dict:
    # if "context" not in item.keys():
        # print(f"{idx}: Missing Key!")
        # pprint(item)
        # sys.exit()
    item["input"] = item.pop("context")
    item["output"] = item.pop("response")
    return item

I couldn't figure out why it is being called twice though.
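To illustrate the failure mode described above, here is a minimal standalone sketch. It does not use litgpt itself and is not necessarily the fix that was eventually merged. The function names `transform_in_place` and `transform_copy` and the sample record are hypothetical, chosen only for this illustration. It shows that `pop` mutates the dict it is given, so a second call on the same object raises `KeyError`, while working on a shallow copy leaves the original record intact.

```python
def transform_in_place(item: dict) -> dict:
    # Same pattern as in the traceback: pop() removes the keys from the dict passed in.
    item["input"] = item.pop("context")
    item["output"] = item.pop("response")
    return item


def transform_copy(item: dict) -> dict:
    # Hypothetical alternative: rename the keys on a shallow copy,
    # so the caller's dict is left untouched and repeated calls are safe.
    item = dict(item)
    item["input"] = item.pop("context")
    item["output"] = item.pop("response")
    return item


# Made-up record with the Dolly-style keys mentioned in the issue.
record = {"instruction": "Summarize.", "context": "Some text.", "response": "A summary."}

# Calling the copying version twice on the same record works.
first = transform_copy(record)
second = transform_copy(record)

# Calling the in-place version twice on the same dict fails on the second call.
shared = dict(record)
transform_in_place(shared)
try:
    transform_in_place(shared)
    second_call_failed = False
except KeyError:
    second_call_failed = True
```

Whether copying, `get`, or avoiding the repeated call is the right fix depends on why the transform runs twice, which this sketch does not address.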

What operating system are you using?

macOS

LitGPT Version

Tested on two versions and on two platforms (macOS and Linux).

litgpt                                   0.4.13
litgpt                                   0.4.14.dev1

pytholic · 2024-09-28T18:59:31Z

@rasbt Maybe you can take a look if you have some time. I think the original implementation was done by you (if I am not mistaken).

rasbt · 2024-09-29T14:11:33Z

Thanks for the note. Not sure what happened there. Maybe I forgot to adjust the dataset as we updated the data loader. I will try to take a look next week. (In the meantime, if you got it to work, I'd appreciate a PR)

pytholic · 2024-09-30T03:30:35Z

@rasbt I will fix it within one to two days and create a PR.

pytholic · 2024-10-02T02:14:58Z

@rasbt could you assign this issue to me before I begin?

rasbt · 2024-10-02T12:46:18Z

Of course, happy to assign you (I just see that @Andrei-Aksionov already beat me to it though 😅)

pytholic added the bug Something isn't working label Sep 28, 2024

Andrei-Aksionov assigned pytholic Oct 2, 2024

pytholic mentioned this issue Oct 2, 2024

[fix][1760] Added fix for the missing context key issue in dolly! #1766

Merged

17 tasks
