
In-context Learning task #4

Open
ldqvinh opened this issue Jan 22, 2024 · 5 comments
Comments

@ldqvinh

ldqvinh commented Jan 22, 2024

Hello,
Thanks for your work. I attempted the in-context learning training command from the experiment details, but encountered a 'loss is NaN' error. Could you share the command you used? Appreciate it.

python src/run.py \
     --dataset "induction" \
     --vocab_size 10 \
     --dataset_size 20000 \
     --min_length 1 \
     --max_length 10 \
     --n_epochs 250 \
     --batch_size 512 \
     --lr "5e-2" \
     --n_layers 2 \
     --n_heads_cat 1 \
     --n_heads_num 0 \
     --n_cat_mlps 1 \
     --n_num_mlps 0 \
     --one_hot_embed \
     --count_only \
     --seed 0 \
     --save \
     --save_code \
     --output_dir "output/induction"
@danfriedman0
Collaborator

Thanks for taking an interest in the code! I'm not immediately sure what the issue could be, but some things to try are:

  1. Set `--min_length 10` and `--max_length 10`.
  2. Set the `--autoregressive` flag.
  3. Set `--n_cat_mlps 0` (no MLPs).
  4. Set `--n_epochs 500`.

Note that you likely need to try a number of random seeds to get a model that successfully learns the task. To save time, we also used a "patience" of 25 (this is a possible argument to the run_training function, although you would need to modify src/run.py to make it a command-line flag).
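
For reference, here is a minimal sketch of what that modification could look like (this assumes `src/run.py` parses its flags with argparse and forwards keyword arguments to `run_training`; the names below are illustrative, so adapt them to the actual code):

```python
# Hypothetical sketch: exposing the early-stopping patience as a command-line flag.
# Assumes src/run.py builds its arguments with argparse and passes them on to
# run_training; adjust the wiring to match the real script.
import argparse

parser = argparse.ArgumentParser()
# ... the existing flags (--dataset, --lr, --n_layers, ...) ...
parser.add_argument(
    "--patience",
    type=int,
    default=25,
    help="stop a run early if the best loss has not improved for this many epochs",
)
args = parser.parse_args()

# Where training is launched, forward the new value, e.g.:
# run_training(..., patience=args.patience)
```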

Could you also share any more details about what you observe? Do you get "loss is NaN" right away, or only after some training?

@Wangcheng-Xu

Wangcheng-Xu commented Jan 29, 2024

Hi @danfriedman0,

I also had trouble replicating the induction experiment. The command was as suggested above and is copied below. I used a modified script, "experiment_run_n.py", which takes an additional "patience" argument and moves on to the next seed whenever a run exhausts its patience (a rough sketch of the wrapper follows the command). Training returned a constant loss of 5.81e+29 for every seed from 0 through 100. By the way, some other experiments, such as "sort" and "reverse", seemed to work.

CUDA_VISIBLE_DEVICES=0 python experiment_run_n.py \
     --dataset "induction" \
     --vocab_size 10 \
     --dataset_size 20000 \
     --min_length 10 \
     --max_length 10 \
     --n_epochs 500 \
     --batch_size 512 \
     --patience 25 \
     --lr "5e-2" \
     --n_layers 2 \
     --n_heads_cat 1 \
     --n_heads_num 0 \
     --n_cat_mlps 0 \
     --n_num_mlps 0 \
     --one_hot_embed \
     --count_only \
     --autoregressive \
     --seed 0 \
     --save \
     --save_code \
     --output_dir "output/induction"
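
For context, the wrapper is roughly equivalent to the sketch below (illustrative only; the actual experiment_run_n.py differs in details and also passes the patience value through to training):

```python
# Rough sketch of a seed-retry wrapper (not the actual experiment_run_n.py).
# It relaunches the training command with successive seeds and stops at the
# first run that exits cleanly, assuming a run that exhausts its patience
# (or hits a NaN loss) exits with a non-zero return code.
import subprocess
import sys

BASE_CMD = [
    sys.executable, "src/run.py",
    "--dataset", "induction",
    "--min_length", "10", "--max_length", "10",
    # ... the remaining flags exactly as in the command above ...
]

for seed in range(101):
    cmd = BASE_CMD + ["--seed", str(seed),
                      "--output_dir", f"output/induction/seed_{seed}"]
    if subprocess.run(cmd).returncode == 0:
        break
```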

Also, it would be great if you could share the configurations for replicating all of the experiments in the paper, like the ones given for "sort" and "conll_ner" in the README.md. Thanks!

danfriedman0 added a commit that referenced this issue Jan 30, 2024
Add an example training script to reproduce the in-context learning experiment from the paper (see issue #4).

An important detail is to set `--unembed_mask 0` (otherwise the model is prevented from predicting the `unk` token, which is used for this task). You may need to run the script with multiple seeds (e.g. 10) to get an initialization that learns to solve the task.
@danfriedman0
Collaborator

Hi all, sorry for the trouble, and thanks for the additional detail.

I think I found the main problem: you need to set --unembed_mask 0. This flag is set to 1 by default, which prevents the model from predicting pad or unk as the output token, but the unk token is a valid prediction for this task. I have uploaded a script with a command that works for me (on around 20% of seeds).
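
To illustrate what the flag does (a conceptual sketch only, not the repo's actual unembedding code): with the mask enabled, the logits of the special tokens are suppressed before the output token is chosen, so the model can never emit them.

```python
# Conceptual illustration of the unembedding mask (not the actual implementation).
# With the mask enabled (--unembed_mask 1, the default), the special tokens are
# removed from consideration before the output token is picked.
import torch

vocab = ["pad", "unk", "a", "b", "c"]
logits = torch.randn(len(vocab))

unembed_mask = True  # corresponds to --unembed_mask 1
if unembed_mask:
    logits[vocab.index("pad")] = float("-inf")
    logits[vocab.index("unk")] = float("-inf")

prediction = vocab[int(torch.argmax(logits))]
# With the mask on, `prediction` can never be "pad" or "unk"; for the induction
# task, "unk" is a valid target, so the mask has to be disabled (--unembed_mask 0).
print(prediction)
```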

@Wangcheng-Xu: The `scripts` directory contains configurations used for the other experiments in the paper. Please let me know if you have any more questions.

@Wangcheng-Xu

Thank you! I have tested the fixed configuration for the induction task, which works for me.

@ldqvinh
Author

ldqvinh commented Feb 1, 2024

Thank you to everyone involved for identifying and resolving the issue.
The updated configuration for the induction task is now functioning perfectly on my end as well.
