
In-context Learning task #4

Open
ldqvinh opened this issue Jan 22, 2024 · 5 comments
Comments

@ldqvinh

ldqvinh commented Jan 22, 2024

Hello,
Thanks for your work. I attempted the in-context learning training command from the experiment details, but encountered a 'loss is NaN' error. Could you share the command you used? Appreciate it.

python src/run.py \
     --dataset "induction" \
     --vocab_size 10 \
     --dataset_size 20000 \
     --min_length 1 \
     --max_length 10 \
     --n_epochs 250 \
     --batch_size 512 \
     --lr "5e-2" \
     --n_layers 2 \
     --n_heads_cat 1 \
     --n_heads_num 0 \
     --n_cat_mlps 1 \
     --n_num_mlps 0 \
     --one_hot_embed \
     --count_only \
     --seed 0 \
     --save \
     --save_code \
     --output_dir "output/induction"
@danfriedman0
Collaborator

Thanks for taking an interest in the code! I'm not immediately sure what the issue could be, but some things to try are:

  1. Set `--min_length 10` and `--max_length 10`.
  2. Set the `--autoregressive` flag.
  3. Set `--n_cat_mlps 0` (no MLPs).
  4. Set `--n_epochs 500`.

Note that you likely need to try a number of random seeds to get a model that successfully learns the task. To save time, we also used a "patience" of 25 (this is a possible argument to the run_training function, although you would need to modify src/run.py to make it a command-line flag).
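
For reference, here is a minimal sketch of what that modification could look like (this assumes `src/run.py` parses its flags with argparse and forwards keyword arguments to `run_training`; the names below are illustrative, so adapt them to the actual code):

```python
# Hypothetical sketch: exposing the early-stopping patience as a command-line flag.
# Assumes src/run.py builds its arguments with argparse and passes them on to
# run_training; adjust the wiring to match the real script.
import argparse

parser = argparse.ArgumentParser()
# ... the existing flags (--dataset, --lr, --n_layers, ...) ...
parser.add_argument(
    "--patience",
    type=int,
    default=25,
    help="stop a run early if the best loss has not improved for this many epochs",
)
args = parser.parse_args()

# Where training is launched, forward the new value, e.g.:
# run_training(..., patience=args.patience)
```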

Could you also share any more details about what you observe? Do you get "loss is NaN" right away, or only after some training?

@Wangcheng-Xu

Wangcheng-Xu commented Jan 29, 2024

Hi @danfriedman0,

I also had trouble replicating the induction experiment. The command was as suggested above and is copied below. I used a modified script, "experiment_run_n.py", which takes an additional "patience" argument and moves on to the next seed whenever a run exhausts its patience (a rough sketch of the wrapper follows the command). Training returned a constant loss of 5.81e+29 for every seed from 0 through 100. By the way, some other experiments, such as "sort" and "reverse", seemed to work.

CUDA_VISIBLE_DEVICES=0 python experiment_run_n.py \
     --dataset "induction" \
     --vocab_size 10 \
     --dataset_size 20000 \
     --min_length 10 \
     --max_length 10 \
     --n_epochs 500 \
     --batch_size 512 \
     --patience 25 \
     --lr "5e-2" \
     --n_layers 2 \
     --n_heads_cat 1 \
     --n_heads_num 0 \
     --n_cat_mlps 0 \
     --n_num_mlps 0 \
     --one_hot_embed \
     --count_only \
     --autoregressive \
     --seed 0 \
     --save \
     --save_code \
     --output_dir "output/induction"
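
For context, the wrapper is roughly equivalent to the sketch below (illustrative only; the actual experiment_run_n.py differs in details and also passes the patience value through to training):

```python
# Rough sketch of a seed-retry wrapper (not the actual experiment_run_n.py).
# It relaunches the training command with successive seeds and stops at the
# first run that exits cleanly, assuming a run that exhausts its patience
# (or hits a NaN loss) exits with a non-zero return code.
import subprocess
import sys

BASE_CMD = [
    sys.executable, "src/run.py",
    "--dataset", "induction",
    "--min_length", "10", "--max_length", "10",
    # ... the remaining flags exactly as in the command above ...
]

for seed in range(101):
    cmd = BASE_CMD + ["--seed", str(seed),
                      "--output_dir", f"output/induction/seed_{seed}"]
    if subprocess.run(cmd).returncode == 0:
        break
```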

Also, it would be great if you could share the configurations for replicating all of the experiments in the paper, like the ones given for "sort" and "conll_ner" in the README.md. Thanks!

danfriedman0 added a commit that referenced this issue Jan 30, 2024
Add an example training script to reproduce the in-context learning experiment from the paper (see issue #4).

An important detail is to set `--unembed_mask 0` (otherwise the model is prevented from predicting the `unk` token, which is used for this task). You may need to run the script with multiple seeds (e.g. 10) to get an initialization that learns to solve the task.
@danfriedman0
Collaborator

Hi all, sorry for the trouble, and thanks for the additional detail.

I think I found the main problem: you need to set --unembed_mask 0. This flag is set to 1 by default, which prevents the model from predicting pad or unk as the output token, but the unk token is a valid prediction for this task. I have uploaded a script with a command that works for me (on around 20% of seeds).
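
To illustrate what the flag does (a conceptual sketch only, not the repo's actual unembedding code): with the mask enabled, the logits of the special tokens are suppressed before the output token is chosen, so the model can never emit them.

```python
# Conceptual illustration of the unembedding mask (not the actual implementation).
# With the mask enabled (--unembed_mask 1, the default), the special tokens are
# removed from consideration before the output token is picked.
import torch

vocab = ["pad", "unk", "a", "b", "c"]
logits = torch.randn(len(vocab))

unembed_mask = True  # corresponds to --unembed_mask 1
if unembed_mask:
    logits[vocab.index("pad")] = float("-inf")
    logits[vocab.index("unk")] = float("-inf")

prediction = vocab[int(torch.argmax(logits))]
# With the mask on, `prediction` can never be "pad" or "unk"; for the induction
# task, "unk" is a valid target, so the mask has to be disabled (--unembed_mask 0).
print(prediction)
```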

@Wangcheng-Xu: The `scripts` directory contains configurations used for the other experiments in the paper. Please let me know if you have any more questions.

@Wangcheng-Xu

Thank you! I have tested the fixed configuration for the induction task, which works for me.

@ldqvinh
Author

ldqvinh commented Feb 1, 2024

Thank you to everyone involved for identifying and resolving the issue.
The updated configuration for the induction task is now functioning perfectly on my end as well.
