Create an eval-only script for existing ckpts #736

Open: wants to merge 39 commits into main from backfill

Changes from all commits (39)
f879166
Eval-only script
liujch1998 Oct 15, 2024
7619ad7
Fix env
liujch1998 Oct 15, 2024
2b87757
Disable saving data indices
liujch1998 Oct 15, 2024
aeabd02
Restore train dataloader
liujch1998 Oct 15, 2024
414277b
Do not load train state
liujch1998 Oct 15, 2024
b27a822
Bypass trainer state
liujch1998 Oct 15, 2024
ea0cf07
Fix save folder
liujch1998 Oct 16, 2024
746c674
Switch to loading sharded ckpt
liujch1998 Oct 17, 2024
cb54f80
Eval peteish1
liujch1998 Oct 17, 2024
7b40310
Switch to 1 node
liujch1998 Oct 17, 2024
7f994fe
Make things work for single node
liujch1998 Oct 17, 2024
9a5f076
Make things work for single node
liujch1998 Oct 17, 2024
d1e05fd
Make things work for single node
liujch1998 Oct 17, 2024
b455b99
Make things work for single node
liujch1998 Oct 17, 2024
2f4d252
Load train_dataloader
liujch1998 Oct 17, 2024
331b0ad
Change to another ckpt
liujch1998 Oct 17, 2024
6acabf3
Do not load train_dataloader and trainer_state
liujch1998 Oct 17, 2024
2415719
run for annealed model
AkshitaB Oct 18, 2024
788b397
Backfill does not seem possible; Evaluating multiple ckpts
liujch1998 Oct 20, 2024
ed074da
Fix import
liujch1998 Oct 20, 2024
eb628a8
Fix glob
liujch1998 Oct 20, 2024
ee6d55f
Fix glob
liujch1998 Oct 20, 2024
901ed16
Fix load
liujch1998 Oct 20, 2024
90e1f93
Switch back to the real peteish1
liujch1998 Oct 20, 2024
003cd29
Fix ckpt loading
liujch1998 Oct 20, 2024
e67812c
Print sum of params
liujch1998 Oct 21, 2024
c27037b
Skip step0
liujch1998 Oct 21, 2024
1996d04
Print param sum of dist_model
liujch1998 Oct 21, 2024
d1c528d
Print per-batch ce loss
liujch1998 Oct 21, 2024
01b1dc4
Update
liujch1998 Oct 22, 2024
35f2186
Reconstruct models when iterating ckpts
liujch1998 Oct 23, 2024
504fb2a
Do not quit wandb; Do not create train_loader
liujch1998 Oct 23, 2024
a51a63b
Switch to the real peteish1
liujch1998 Oct 24, 2024
c362030
Massage the group
liujch1998 Oct 24, 2024
a36b9e0
Fix bug
liujch1998 Oct 24, 2024
97d78ed
Revert Peteish7 changes
liujch1998 Oct 24, 2024
9097624
Fix lint
liujch1998 Oct 24, 2024
752b9cc
Update CHANGELOG
liujch1998 Oct 24, 2024
ff41d5b
Merge branch 'main' into backfill
dirkgr Oct 25, 2024
CHANGELOG.md: 4 changes (3 additions & 1 deletion)
@@ -13,8 +13,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- Added ability to try loading latest checkpoint from save folder using `--try_load_latest_save`.
- Added support for flash attention and gradient checkpointing to `hf_olmo`.
- Added an eval-only script that evaluates existing checkpoints on specified tasks.
- Added `effective_n_kv_heads` to OLMoConfig for hacky VLLM support.


## [v0.5.0](https://github.com/allenai/OLMo/releases/tag/v0.5.0) - 2024-08-26

- Fixed conversion to HuggingFace model for DDP-trained models.
@@ -45,7 +47,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Swapped in correct flan data mix.
- Fix bug where the attention norm, when applied before the attention block, was modifying the residual stream.
- Fixed `OLMo.from_checkpoint()` so that it correctly loads `olmo_core` and `torch_new` style checkpoints.
- Fixed `preserve_rng_state` being incorrectly set to False when doing gradient checkpointing with dropout
- Fixed `preserve_rng_state` being incorrectly set to False when doing gradient checkpointing with dropout


## [v0.4.0](https://github.com/allenai/OLMo/releases/tag/v0.4.0) - 2024-07-11
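The eval-only workflow this PR adds is easiest to see in miniature. Below is a minimal, hedged sketch of the loop the commit history describes ("Evaluating multiple ckpts", "Skip step0", "Print sum of params", "Print per-batch ce loss"): iterate over saved checkpoints, rebuild the model for each, and report cross-entropy loss. `OLMo.from_checkpoint()` is named in the CHANGELOG above; the save-folder layout, glob pattern, and `eval_batches` argument are illustrative assumptions, not the PR's actual code.

```python
import glob

import torch
import torch.nn.functional as F

from olmo.model import OLMo  # from_checkpoint() is referenced in the CHANGELOG


def eval_checkpoints(save_folder: str, eval_batches) -> None:
    """Evaluate every checkpoint under save_folder, skipping step 0."""
    for ckpt_dir in sorted(glob.glob(f"{save_folder}/step*")):
        if ckpt_dir.endswith("step0"):
            continue  # per the "Skip step0" commit
        # Rebuild the model per checkpoint, as in
        # "Reconstruct models when iterating ckpts".
        model = OLMo.from_checkpoint(ckpt_dir, device="cuda")
        model.eval()
        # Quick integrity check, as in "Print sum of params".
        print(ckpt_dir, sum(p.sum().item() for p in model.parameters()))
        with torch.no_grad():
            for batch in eval_batches:
                input_ids = batch["input_ids"].cuda()
                logits = model(input_ids).logits
                # Next-token cross-entropy, as in "Print per-batch ce loss".
                ce = F.cross_entropy(
                    logits[:, :-1].flatten(0, 1),
                    input_ids[:, 1:].flatten().long(),
                )
                print(f"{ckpt_dir}: per-batch CE loss = {ce.item():.4f}")
```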
configs/peteish1-weka.yaml: 282 changes (252 additions & 30 deletions)
@@ -108,35 +108,35 @@ eval_interval: 1000
eval_subset_num_batches: -1
device_eval_batch_size: ${device_train_microbatch_size}
evaluators:
  # - label: all-small-ppl-validation
  #   data:
  #     num_workers: 0
  #     drop_last: true
  #     # generate_doc_lengths: true
  #     memmap_dtype: uint32
  #     datasets:
  #       c4_en-validation:
  #         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/c4_en/val/part-0-00000.npy
  #       dolma_books-validation:
  #         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_books/val/part-0-00000.npy
  #       dolma_common-crawl-validation:
  #         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_common-crawl/val/part-0-00000.npy
  #       dolma_pes2o-validation:
  #         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_pes2o/val/part-0-00000.npy
  #       dolma_reddit-validation:
  #         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_reddit/val/part-0-00000.npy
  #       dolma_stack-validation:
  #         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_stack/val/part-0-00000.npy
  #       dolma_wiki-validation:
  #         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_wiki/val/part-0-00000.npy
  #       ice-validation:
  #         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/ice/val/part-0-00000.npy
  #       m2d2_s2orc-validation:
  #         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/m2d2_s2orc/val/part-0-00000.npy
  #       pile-validation:
  #         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/pile/val/part-0-00000.npy
  #       wikitext_103-validation:
  #         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/wikitext_103/val/part-0-00000.npy
  - label: all-small-ppl-validation
    data:
      num_workers: 0
      drop_last: true
      # generate_doc_lengths: true
      memmap_dtype: uint32
      datasets:
        c4_en-validation:
          - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/c4_en/val/part-0-00000.npy
        dolma_books-validation:
          - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_books/val/part-0-00000.npy
        dolma_common-crawl-validation:
          - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_common-crawl/val/part-0-00000.npy
        dolma_pes2o-validation:
          - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_pes2o/val/part-0-00000.npy
        dolma_reddit-validation:
          - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_reddit/val/part-0-00000.npy
        dolma_stack-validation:
          - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_stack/val/part-0-00000.npy
        dolma_wiki-validation:
          - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_wiki/val/part-0-00000.npy
        ice-validation:
          - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/ice/val/part-0-00000.npy
        m2d2_s2orc-validation:
          - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/m2d2_s2orc/val/part-0-00000.npy
        pile-validation:
          - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/pile/val/part-0-00000.npy
        wikitext_103-validation:
          - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/wikitext_103/val/part-0-00000.npy
Comment on lines -111 to +139 (Member): Did you mean to commit this change?
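As context for the `memmap_dtype: uint32` setting being re-enabled above: each validation file is a flat, memory-mapped dump of token IDs. A minimal sketch of reading one follows, assuming (as OLMo's memmap datasets do) that the file is raw token data despite the `.npy` extension; the sequence length and batch shape are illustrative.

```python
import numpy as np

# Raw token dump, not a standard .npy archive (an assumption consistent
# with the memmap_dtype: uint32 setting in the config above).
tokens = np.memmap(
    "/weka/oe-training-default/ai2-llm/eval-data/perplexity/"
    "v3_small_dolma2-tokenizer/c4_en/val/part-0-00000.npy",
    dtype=np.uint32,
    mode="r",
)
seq_len = 4096  # illustrative context length
batch = tokens[: 8 * seq_len].reshape(8, seq_len)  # 8 eval sequences
```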


##########################
# Downstream evaluations #
@@ -155,7 +155,7 @@ evaluators:

  - label: boolq
    type: downstream

  - label: sciq
    type: downstream

@@ -231,6 +231,228 @@ evaluators:
  - label: arc_easy_ppl
    type: downstream

  - label: piqa_rc_0shot
    type: downstream

  - label: piqa_rc_0shot_bpb
    type: downstream

  - label: piqa_rc_5shot
    type: downstream

  - label: piqa_rc_5shot_bpb
    type: downstream

  - label: piqa_mc_5shot
    type: downstream

  - label: piqa_mc_5shot_bpb
    type: downstream

  - label: hellaswag_rc_0shot
    type: downstream

  - label: hellaswag_rc_0shot_bpb
    type: downstream

  - label: hellaswag_rc_5shot
    type: downstream

  - label: hellaswag_rc_5shot_bpb
    type: downstream

  - label: hellaswag_mc_5shot
    type: downstream

  - label: hellaswag_mc_5shot_bpb
    type: downstream

  - label: winogrande_rc_0shot
    type: downstream

  - label: winogrande_rc_0shot_bpb
    type: downstream

  - label: winogrande_rc_5shot
    type: downstream

  - label: winogrande_rc_5shot_bpb
    type: downstream

  - label: winogrande_mc_5shot
    type: downstream

  - label: winogrande_mc_5shot_bpb
    type: downstream

  - label: openbookqa_rc_0shot
    type: downstream

  - label: openbookqa_rc_0shot_bpb
    type: downstream

  - label: openbookqa_rc_5shot
    type: downstream

  - label: openbookqa_rc_5shot_bpb
    type: downstream

  - label: openbookqa_mc_5shot
    type: downstream

  - label: openbookqa_mc_5shot_bpb
    type: downstream

  - label: boolq_rc_0shot
    type: downstream

  - label: boolq_rc_0shot_bpb
    type: downstream

  - label: boolq_rc_5shot
    type: downstream

  - label: boolq_rc_5shot_bpb
    type: downstream

  - label: boolq_mc_5shot
    type: downstream

  - label: boolq_mc_5shot_bpb
    type: downstream

  - label: sciq_rc_0shot
    type: downstream

  - label: sciq_rc_0shot_bpb
    type: downstream

  # - label: sciq_rc_5shot
  #   type: downstream

  # - label: sciq_rc_5shot_bpb
  #   type: downstream

  # - label: sciq_mc_5shot
  #   type: downstream

  # - label: sciq_mc_5shot_bpb
  #   type: downstream

  - label: arc_easy_rc_0shot
    type: downstream

  - label: arc_easy_rc_0shot_bpb
    type: downstream

  - label: arc_easy_rc_5shot
    type: downstream

  - label: arc_easy_rc_5shot_bpb
    type: downstream

  - label: arc_easy_mc_5shot
    type: downstream

  - label: arc_easy_mc_5shot_bpb
    type: downstream

  - label: arc_challenge_rc_0shot
    type: downstream

  - label: arc_challenge_rc_0shot_bpb
    type: downstream

  - label: arc_challenge_rc_5shot
    type: downstream

  - label: arc_challenge_rc_5shot_bpb
    type: downstream

  - label: arc_challenge_mc_5shot
    type: downstream

  - label: arc_challenge_mc_5shot_bpb
    type: downstream

  - label: copa_rc_0shot
    type: downstream
Comment on lines +378 to +379 (Member): Do we care about any of the 0-shots?


  - label: copa_rc_0shot_bpb
    type: downstream

  # - label: copa_rc_5shot
  #   type: downstream

  # - label: copa_rc_5shot_bpb
  #   type: downstream

  # - label: copa_mc_5shot
  #   type: downstream

  # - label: copa_mc_5shot_bpb
  #   type: downstream

  - label: csqa_rc_0shot
    type: downstream

  - label: csqa_rc_0shot_bpb
    type: downstream

  - label: csqa_rc_5shot
    type: downstream

  - label: csqa_rc_5shot_bpb
    type: downstream

  - label: csqa_mc_5shot
    type: downstream

  - label: csqa_mc_5shot_bpb
    type: downstream

  - label: socialiqa_rc_0shot
    type: downstream

  - label: socialiqa_rc_0shot_bpb
    type: downstream

  - label: socialiqa_rc_5shot
    type: downstream

  - label: socialiqa_rc_5shot_bpb
    type: downstream

  - label: socialiqa_mc_5shot
    type: downstream

  - label: socialiqa_mc_5shot_bpb
    type: downstream

  - label: mmlu_stem_var_bpb
    type: downstream

  - label: mmlu_humanities_var_bpb
    type: downstream

  - label: mmlu_social_sciences_var_bpb
    type: downstream

  - label: mmlu_other_var_bpb
    type: downstream

  - label: mmlu_stem_bpb
    type: downstream

  - label: mmlu_humanities_bpb
    type: downstream

  - label: mmlu_social_sciences_bpb
    type: downstream

  - label: mmlu_other_bpb
    type: downstream

data:
  pad_direction: right
  # generate_doc_lengths: true
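Many of the new evaluator labels above come in `_bpb` variants, and the `rc`/`mc` pairs appear to distinguish ranked-completion scoring from multiple-choice prompting. Bits-per-byte normalizes the model's total negative log-likelihood by the UTF-8 byte length of the text, which makes scores comparable across tokenizers. Below is a minimal sketch of that standard definition; whether OLMo's downstream evaluators compute it exactly this way is an assumption.

```python
import math


def bits_per_byte(total_nll_nats: float, text: str) -> float:
    """Convert a total negative log-likelihood (in nats) to bits per byte."""
    num_bytes = len(text.encode("utf-8"))
    return total_nll_nats / (math.log(2) * num_bytes)


# Example: a 40-byte continuation scored at 20.0 nats total NLL.
print(bits_per_byte(20.0, "x" * 40))  # ~0.72 bits per byte
```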
configs/peteish7-weka.yaml: 2 changes (1 addition & 1 deletion)
@@ -154,7 +154,7 @@ evaluators:

  - label: boolq
    type: downstream

  - label: sciq
    type: downstream

olmo/train.py: 4 changes (2 additions & 2 deletions)
@@ -1368,8 +1368,8 @@ def close(self, exit_code: int = 0) -> None:
            gc.enable()
        else:
            gc.disable()
        if wandb.run is not None:
            wandb.finish(exit_code=exit_code, quiet=True)
        # if wandb.run is not None:
        #     wandb.finish(exit_code=exit_code, quiet=True)
Comment on lines -1371 to +1372 (Member): Debug code?


    def __enter__(self) -> Trainer:
        return self
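If the commented-out `wandb.finish()` above is meant to survive review (the eval script reuses one wandb run across checkpoints, per the "Do not quit wandb" commit), a flag is one way to express that without dead code. A sketch only; `finish_wandb_on_close` is a hypothetical attribute, not part of this PR.

```python
import wandb


class Trainer:
    # Hypothetical flag: an eval-only run sets this to False so close()
    # leaves the shared wandb run open; training keeps the old behavior.
    finish_wandb_on_close: bool = True

    def close(self, exit_code: int = 0) -> None:
        ...  # rest of the shutdown logic elided
        if self.finish_wandb_on_close and wandb.run is not None:
            wandb.finish(exit_code=exit_code, quiet=True)
```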
scripts/beaker/peteish/peteish1-eval-launch.sh: 41 changes (41 additions & 0 deletions)
@@ -0,0 +1,41 @@
#!/usr/bin/env bash

set -ex

NUM_NODES=16

gantry run \
  --allow-dirty \
  --workspace ai2/OLMo-pretraining-stability \
  --task-name peteish1-eval \
  --description "Pete-ish 1B eval" \
  --priority high \
  --preemptible \
  --beaker-image petew/olmo-torch23-gantry \
  --cluster ai2/jupiter-cirrascale-2 \
  --gpus 8 \
  --replicas "${NUM_NODES}" \
  --leader-selection \
  --host-networking \
  --propagate-failure \
  --propagate-preemption \
  --synchronized-start-timeout 90m \
  --budget ai2/oe-training \
  --no-nfs \
  --weka oe-training-default:/weka/oe-training-default \
  --no-python \
  --env LOG_FILTER_TYPE=local_rank0_only \
  --env OMP_NUM_THREADS=8 \
  --env OLMO_TASK=model \
  --env R2_PROFILE=R2 \
  --env S3_PROFILE=S3 \
  --env WEKA_PROFILE=WEKA \
  --env-secret AWS_CONFIG=PETEW_AWS_CONFIG \
  --env-secret AWS_CREDENTIALS=PETEW_AWS_CREDENTIALS \
  --env-secret R2_ENDPOINT_URL=R2_ENDPOINT_URL \
  --env-secret WEKA_ENDPOINT_URL=WEKA_ENDPOINT_URL \
  --env-secret WANDB_API_KEY=JIACHENGL_WANDB_API_KEY \
  --shared-memory 10GiB \
  --yes \
  --timeout=-1 \
  -- /bin/bash -c "scripts/beaker/peteish/peteish1-eval.sh \$BEAKER_LEADER_REPLICA_HOSTNAME ${NUM_NODES} \$BEAKER_REPLICA_RANK"
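The launch script hands `peteish1-eval.sh` the leader hostname, the node count, and the replica rank. That inner script is not part of this diff, but those three values are exactly what a torchrun-style rendezvous needs; below is a sketch of the usual wiring, with the port, `NUM_NODES`/`NODE_RANK`/`LOCAL_RANK` variable names, and process-group math assumed rather than taken from the PR.

```python
import os

import torch.distributed as dist

# Values the launch script forwards to each replica; BEAKER_* variables
# appear in the script above, the rest are illustrative.
leader_host = os.environ["BEAKER_LEADER_REPLICA_HOSTNAME"]
num_nodes = int(os.environ.get("NUM_NODES", "16"))
node_rank = int(os.environ["BEAKER_REPLICA_RANK"])
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
gpus_per_node = 8  # matches --gpus 8 above

os.environ.setdefault("MASTER_ADDR", leader_host)
os.environ.setdefault("MASTER_PORT", "29400")  # assumed port

dist.init_process_group(
    backend="nccl",
    world_size=num_nodes * gpus_per_node,
    rank=node_rank * gpus_per_node + local_rank,
)
```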