Create an eval-only script for existing ckpts #736

Open · liujch1998 wants to merge 39 commits into main
Conversation

@liujch1998 commented Oct 20, 2024

This PR adds scripts/eval.py, which evaluates one or more existing ckpts without running any training steps.

It does not seem possible to backfill evals into the original wandb run, because "step" must always increase, and rewinding the run would truncate its history, which we don't want. Therefore, this script logs results to a new wandb run.

Starting from a training setup:

  • You can keep using the same YAML config file.
  • Make a copy of XXX.sh as XXX-eval.sh, point it at scripts/eval.py instead of the training script, add the flag --wandb.group=XXX so the eval results log to the same wandb group, and set --load_path to either a single ckpt or a directory containing multiple ckpts.
  • Make a copy of XXX-launch.sh as XXX-eval-launch.sh, change --task-name to XXX-eval, and change the command so it runs XXX-eval.sh.

See an example in peteish1-eval.sh and peteish1-eval-launch.sh.
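
For illustration, backfilled evals can go into one fresh run in the same wandb group roughly like this (a minimal sketch using the public wandb API; the project name, the helper name log_evals_to_new_run, and the shape of the metrics dict are assumptions, not necessarily what scripts/eval.py does):

    import wandb

    def log_evals_to_new_run(group: str, results: dict[int, dict]) -> None:
        """Log eval metrics for several ckpts to a single new wandb run.

        wandb requires "step" to increase monotonically within a run, so the
        finished training run cannot be backfilled without rewinding it; a new
        run in the same group keeps the eval curves next to the training run.
        """
        run = wandb.init(project="olmo", group=group, name=f"{group}-eval")
        for step in sorted(results):  # e.g. {1000: {"eval/c4_en-validation/CrossEntropyLoss": ...}}
            run.log(results[step], step=step)
        run.finish()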

@liujch1998 marked this pull request as ready for review on October 24, 2024 18:18
Comment on lines -111 to +139
# - label: all-small-ppl-validation
#   data:
#     num_workers: 0
#     drop_last: true
#     # generate_doc_lengths: true
#     memmap_dtype: uint32
#     datasets:
#       c4_en-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/c4_en/val/part-0-00000.npy
#       dolma_books-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_books/val/part-0-00000.npy
#       dolma_common-crawl-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_common-crawl/val/part-0-00000.npy
#       dolma_pes2o-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_pes2o/val/part-0-00000.npy
#       dolma_reddit-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_reddit/val/part-0-00000.npy
#       dolma_stack-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_stack/val/part-0-00000.npy
#       dolma_wiki-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_wiki/val/part-0-00000.npy
#       ice-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/ice/val/part-0-00000.npy
#       m2d2_s2orc-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/m2d2_s2orc/val/part-0-00000.npy
#       pile-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/pile/val/part-0-00000.npy
#       wikitext_103-validation:
#         - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/wikitext_103/val/part-0-00000.npy
- label: all-small-ppl-validation
  data:
    num_workers: 0
    drop_last: true
    # generate_doc_lengths: true
    memmap_dtype: uint32
    datasets:
      c4_en-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/c4_en/val/part-0-00000.npy
      dolma_books-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_books/val/part-0-00000.npy
      dolma_common-crawl-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_common-crawl/val/part-0-00000.npy
      dolma_pes2o-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_pes2o/val/part-0-00000.npy
      dolma_reddit-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_reddit/val/part-0-00000.npy
      dolma_stack-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_stack/val/part-0-00000.npy
      dolma_wiki-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/dolma_wiki/val/part-0-00000.npy
      ice-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/ice/val/part-0-00000.npy
      m2d2_s2orc-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/m2d2_s2orc/val/part-0-00000.npy
      pile-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/pile/val/part-0-00000.npy
      wikitext_103-validation:
        - /weka/oe-training-default/ai2-llm/eval-data/perplexity/v3_small_dolma2-tokenizer/wikitext_103/val/part-0-00000.npy
Member
Did you mean to commit this change?

Comment on lines +378 to +379
- label: copa_rc_0shot
  type: downstream
Member
Do we care about any of the 0-shots?

Comment on lines -1371 to +1372
if wandb.run is not None:
    wandb.finish(exit_code=exit_code, quiet=True)
# if wandb.run is not None:
#     wandb.finish(exit_code=exit_code, quiet=True)
Member
Debug code?

Comment on lines +119 to +120
# train_loader = build_train_dataloader(cfg)
train_loader = None
Member
Is this always going to be None? If so, we don't need it.

if 'step' in cfg.load_path.split('/')[-1]:
    load_paths = [cfg.load_path]
else:
    # This globbing does not work with remote paths.
Member
How is that problem handled then?
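
For reference, a minimal sketch of what the local-path case can look like (the pathlib glob, the helper name, and the step* naming convention are assumptions for illustration, not the PR's implementation; remote paths would still need separate handling, e.g. listing objects through the storage client):

    from pathlib import Path

    def find_local_checkpoints(load_path: str) -> list[str]:
        """Return the checkpoint directories to evaluate, local paths only."""
        path = Path(load_path)
        if "step" in path.name:
            # load_path already points at a single ckpt, e.g. .../step1000-unsharded
            return [str(path)]
        # Otherwise treat load_path as a run directory holding many step* ckpts.
        return sorted(str(d) for d in path.glob("step*") if d.is_dir())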

log.info(f"Number of non-embedding parameters: {olmo_model.num_params(include_embedding=False):,d}")
log.info(f"Peak GPU Memory (MB) before {cfg.distributed_strategy}: {int(peak_gpu_memory() or 0)}")

olmo_model.set_activation_checkpointing(cfg.activation_checkpointing)
Member
If we only ever eval, we don't need this.

Comment on lines +225 to +226
optim = build_optimizer(cfg, dist_model)
scheduler = build_scheduler(cfg)
Member
We don't need optimizers and schedulers if we're just evaluating.

Member
So you're creating these only so that you can produce a Trainer object?

How hard is it to pull the stuff you need out of the Trainer object, so we don't have to do so many things we don't need? It makes me particularly uncomfortable that you're creating a trainer with a None data loader, which isn't supposed to work. It just happens to work.
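
As a rough illustration of that suggestion, an eval-only loop does not strictly need a Trainer, optimizer, or scheduler; plain PyTorch is enough to compute validation loss for a loaded model (the .logits attribute and the batch layout below are assumptions, and this is not the code in this PR):

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def evaluate_ce_loss(model, dataloader, device="cuda"):
        """Average next-token cross-entropy over a validation dataloader."""
        model.eval()
        total_loss, total_tokens = 0.0, 0
        for batch in dataloader:
            input_ids = batch["input_ids"].to(device)
            logits = model(input_ids).logits  # assumes a causal LM output with .logits
            # Each position predicts the next token, so shift labels by one.
            loss = F.cross_entropy(
                logits[:, :-1].flatten(0, 1),
                input_ids[:, 1:].flatten(),
                reduction="sum",
            )
            total_loss += loss.item()
            total_tokens += input_ids[:, 1:].numel()
        return total_loss / total_tokens

Something along these lines could reuse the evaluator dataloaders the script already builds, without constructing a Trainer around a None train loader.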
