Create an eval-only script for existing ckpts #736

Open · liujch1998 wants to merge 39 commits into main
Conversation

@liujch1998 commented Oct 20, 2024

This PR adds scripts/eval.py, which evaluates one or more existing ckpts without running any training steps.

It does not seem possible to backfill evals into the original wandb run, because "step" must always increase, and rewinding the run would truncate its history, which we don't want. Therefore, this script logs results to a new wandb run.

Starting from a training setup:

  • You can keep using the same YAML config file.
  • Make a copy of XXX.sh as XXX-eval.sh, point it at scripts/eval.py instead of the training script, add the flag --wandb.group=XXX so the eval results log to the same wandb group, and set --load_path to either a single ckpt or a directory containing multiple ckpts.
  • Make a copy of XXX-launch.sh as XXX-eval-launch.sh, change --task-name to XXX-eval, and change the command so it runs XXX-eval.sh.

See an example in peteish1-eval.sh and peteish1-eval-launch.sh.
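
For illustration, backfilled evals can go into one fresh run in the same wandb group roughly like this (a minimal sketch using the public wandb API; the project name, the helper name log_evals_to_new_run, and the shape of the metrics dict are assumptions, not necessarily what scripts/eval.py does):

    import wandb

    def log_evals_to_new_run(group: str, results: dict[int, dict]) -> None:
        """Log eval metrics for several ckpts to a single new wandb run.

        wandb requires "step" to increase monotonically within a run, so the
        finished training run cannot be backfilled without rewinding it; a new
        run in the same group keeps the eval curves next to the training run.
        """
        run = wandb.init(project="olmo", group=group, name=f"{group}-eval")
        for step in sorted(results):  # e.g. {1000: {"eval/c4_en-validation/CrossEntropyLoss": ...}}
            run.log(results[step], step=step)
        run.finish()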

@liujch1998 marked this pull request as ready for review on October 24, 2024 18:18
Comment on lines -111 to +139
# - label: all-small-ppl-validation
#   data:
#     num_workers: 0
#     drop_last: true
#     # generate_doc_lengths: true
#     memmap_dtype: uint32
#     datasets:
#       c4_en-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/c4_en/val/part-0-00000.npy
#       dolma_books-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_books/val/part-0-00000.npy
#       dolma_common-crawl-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_common-crawl/val/part-0-00000.npy
#       dolma_pes2o-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_pes2o/val/part-0-00000.npy
#       dolma_reddit-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_reddit/val/part-0-00000.npy
#       dolma_stack-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_stack/val/part-0-00000.npy
#       dolma_wiki-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_wiki/val/part-0-00000.npy
#       ice-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/ice/val/part-0-00000.npy
#       m2d2_s2orc-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/m2d2_s2orc/val/part-0-00000.npy
#       pile-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/pile/val/part-0-00000.npy
#       wikitext_103-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/wikitext_103/val/part-0-00000.npy
- label: all-small-ppl-validation
  data:
    num_workers: 0
    drop_last: true
    # generate_doc_lengths: true
    memmap_dtype: uint32
    datasets:
      c4_en-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/c4_en/val/part-0-00000.npy
      dolma_books-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_books/val/part-0-00000.npy
      dolma_common-crawl-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_common-crawl/val/part-0-00000.npy
      dolma_pes2o-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_pes2o/val/part-0-00000.npy
      dolma_reddit-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_reddit/val/part-0-00000.npy
      dolma_stack-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_stack/val/part-0-00000.npy
      dolma_wiki-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_wiki/val/part-0-00000.npy
      ice-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/ice/val/part-0-00000.npy
      m2d2_s2orc-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/m2d2_s2orc/val/part-0-00000.npy
      pile-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/pile/val/part-0-00000.npy
      wikitext_103-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/wikitext_103/val/part-0-00000.npy
Member
Did you mean to commit this change?

Comment on lines +378 to +379
- label: copa_rc_0shot
  type: downstream
Member
Do we care about any of the 0-shots?

Comment on lines -1371 to +1372
if wandb.run is not None:
    wandb.finish(exit_code=exit_code, quiet=True)
# if wandb.run is not None:
#     wandb.finish(exit_code=exit_code, quiet=True)
Member
Debug code?

Comment on lines +119 to +120
# train_loader = build_train_dataloader(cfg)
train_loader = None
Member
Is this always going to be None? If so, we don't need it.

if 'step' in cfg.load_path.split('/')[-1]:
    load_paths = [cfg.load_path]
else:
    # This globbing does not work with remote paths.
Member
How is that problem handled then?
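
For reference, a minimal sketch of what the local-path case can look like (the pathlib glob, the helper name, and the step* naming convention are assumptions for illustration, not the PR's implementation; remote paths would still need separate handling, e.g. listing objects through the storage client):

    from pathlib import Path

    def find_local_checkpoints(load_path: str) -> list[str]:
        """Return the checkpoint directories to evaluate, local paths only."""
        path = Path(load_path)
        if "step" in path.name:
            # load_path already points at a single ckpt, e.g. .../step1000-unsharded
            return [str(path)]
        # Otherwise treat load_path as a run directory holding many step* ckpts.
        return sorted(str(d) for d in path.glob("step*") if d.is_dir())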

log.info(f"Number of non-embedding parameters: {olmo_model.num_params(include_embedding=False):,d}")
log.info(f"Peak GPU Memory (MB) before {cfg.distributed_strategy}: {int(peak_gpu_memory() or 0)}")

olmo_model.set_activation_checkpointing(cfg.activation_checkpointing)
Member
If we only ever eval, we don't need this.

Comment on lines +225 to +226
optim = build_optimizer(cfg, dist_model)
scheduler = build_scheduler(cfg)
Member
We don't need optimizers and schedulers if we're just evaluating.

Member
So you're creating these only so that you can produce a Trainer object?

How hard is it to pull the stuff you need out of the Trainer object, so we don't have to do so many things we don't need? It makes me particularly uncomfortable that you're creating a trainer with a None data loader, which isn't supposed to work. It just happens to work.
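
As a rough illustration of that suggestion, an eval-only loop does not strictly need a Trainer, optimizer, or scheduler; plain PyTorch is enough to compute validation loss for a loaded model (the .logits attribute and the batch layout below are assumptions, and this is not the code in this PR):

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def evaluate_ce_loss(model, dataloader, device="cuda"):
        """Average next-token cross-entropy over a validation dataloader."""
        model.eval()
        total_loss, total_tokens = 0.0, 0
        for batch in dataloader:
            input_ids = batch["input_ids"].to(device)
            logits = model(input_ids).logits  # assumes a causal LM output with .logits
            # Each position predicts the next token, so shift labels by one.
            loss = F.cross_entropy(
                logits[:, :-1].flatten(0, 1),
                input_ids[:, 1:].flatten(),
                reduction="sum",
            )
            total_loss += loss.item()
            total_tokens += input_ids[:, 1:].numel()
        return total_loss / total_tokens

Something along these lines could reuse the evaluator dataloaders the script already builds, without constructing a Trainer around a None train loader.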
