Release v0.25.0 · mosaicml/composer

What's New

1. Torch 2.4.1 Compatibility (#3609)

We've added support for torch 2.4.1, including necessary patches to Torch.

Deprecations and breaking changes

1. Microbatch device movement (#3567)

Instead of moving the entire batch to device at once, we now move each microbatch to device. This saves memory for large inputs, e.g. multimodal data, when training with many microbatches.

This change may affect callbacks that operate on the batch and require it to already be on an accelerator, such as the two changed in this PR. We expect few such callbacks, so we anticipate this change will be relatively safe.
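To illustrate the idea, here is a minimal sketch of per-microbatch device movement in plain Python. It is not Composer's actual implementation: the names `split_into_microbatches`, `train_on_batch`, `to_device`, and `step` are hypothetical, and `to_device` stands in for whatever moves data to the accelerator (for example, a tensor's `.to(device)`).

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def split_into_microbatches(batch: Sequence[T], microbatch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `batch`, each at most `microbatch_size` long."""
    if microbatch_size <= 0:
        raise ValueError("microbatch_size must be positive")
    for start in range(0, len(batch), microbatch_size):
        yield batch[start:start + microbatch_size]


def train_on_batch(
    batch: Sequence[T],
    microbatch_size: int,
    to_device: Callable[[Sequence[T]], Sequence[T]],  # hypothetical device-transfer hook
    step: Callable[[Sequence[T]], None],              # hypothetical forward/backward hook
) -> int:
    """Move one microbatch at a time to the device and run `step` on it.

    Returns the largest number of samples resident on the device at once.
    """
    peak_on_device = 0
    for microbatch in split_into_microbatches(batch, microbatch_size):
        # Only this slice is moved; the rest of the batch stays on the host.
        on_device = to_device(microbatch)
        peak_on_device = max(peak_on_device, len(on_device))
        step(on_device)
    return peak_on_device
```

With a batch of 10 samples and a microbatch size of 4, at most 4 samples are on the device at any time, rather than all 10. A callback that reads the full batch before this loop runs would see host-side data, which is why such callbacks may need to move the batch themselves.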

2. DeepSpeed deprecation version (#3634)

We have updated the Composer version in which we will remove support for DeepSpeed to 0.27.0. Please reach out on GitHub if you have any concerns about this.

3. PyTorch legacy sharded checkpoint format

PyTorch briefly used a sharded checkpoint format different from the current one, and quickly deprecated it. We have continued to support loading legacy-format checkpoints for a while, but we will likely remove support for this format entirely in an upcoming release. The original feature was added in #1902, and we removed support for saving in this format in #2262. Please reach out if you have concerns or need help converting your checkpoints to the new format.

What's Changed

Set dev version back to 0.25.0.dev0 by @snarayan21 in #3582
Microbatch Device Movement by @mvpatel2000 in #3567
Init Dist Default None by @mvpatel2000 in #3585
Explicit None Check in get_device by @mvpatel2000 in #3586
Update protobuf requirement from <5.28 to <5.29 by @dependabot in #3591
Bump databricks-sdk from 0.30.0 to 0.31.1 by @dependabot in #3592
Update ci-testing to 0.2.2 by @dakinggg in #3590
Bump Mellanox Tools by @mvpatel2000 in #3597
Roll back ci-testing for daillies by @mvpatel2000 in #3598
Revert driver changes by @mvpatel2000 in #3599
Remove step in log_image for MLFlow by @mvpatel2000 in #3601
Reduce system metrics logging frequency by @chenmoneygithub in #3604
Bump databricks-sdk from 0.31.1 to 0.32.0 by @dependabot in #3608
torch2.4.1 by @bigning in #3609
Test with torch2.4.1 image by @bigning in #3610
fix 2.4.1 test by @bigning in #3612
Remove tensor option for _global_exception_occured by @irenedea in #3611
Update error message for overwrite to be more user friendly by @mvpatel2000 in #3619
Update wandb requirement from <0.18,>=0.13.2 to >=0.13.2,<0.19 by @dependabot in #3615
Fix RNG key checking by @dakinggg in #3623
Update datasets requirement from <3,>=2.4 to >=2.4,<4 by @dependabot in #3626
Disable exceptions for MosaicML Logger by @mvpatel2000 in #3627
Fix CPU dailies by @mvpatel2000 in #3628
fix 2.4.1ckpt by @bigning in #3629
More checkpoint debug logs by @mvpatel2000 in #3632
Lower DeepSpeed deprecation version by @mvpatel2000 in #3634
Bump version 25 by @dakinggg in #3633

Full Changelog: v0.24.1...v0.25.0
