
Adding configs related to DCLM #663

Open · wants to merge 101 commits into base branch fineweb_data

Conversation

abhinavg4 (Contributor)

DCLM 7B related configs

abhinavg4 requested review from dlwh and Ivan-Zhou on July 18, 2024 at 08:35
Ivan-Zhou (Contributor)

LGTM. Can you run the pre-commit checks to fix the pre-commit failure? I think it's just a formatting issue:

```bash
pre-commit run --all-files
```

```diff
@@ -64,6 +65,7 @@ class LlamaConfig(HFCompatConfig):
     activation_function: str = "silu"
     initializer_range: float = 0.02
     layer_norm_epsilon: float = 1e-5
+    z_loss_weight: float = 0.0
```
Member

i would rather this not be a property of the model's config but an option on TrainLmConfig, and define a loss function in train_lm and pass it into trainer
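
For illustration, a minimal sketch of that suggestion: an option on TrainLmConfig plus a loss function built in train_lm and handed to the trainer. Only the TrainLmConfig and train_lm names come from the comment above; every field and helper below (make_loss_function, model.compute_loss, model.compute_z_loss) is a hypothetical stand-in, not Levanter's actual API.

```python
# Hypothetical sketch only: TrainLmConfig / train_lm are named in the review
# comment; everything else (field names, make_loss_function, the model's
# compute_loss / compute_z_loss helpers) is a stand-in, not Levanter's API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class TrainLmConfig:
    z_loss_weight: float = 0.0  # training-time option, kept off the model config


def make_loss_function(config: TrainLmConfig) -> Callable:
    def loss_fn(model, example):
        loss = model.compute_loss(example)  # plain cross-entropy (hypothetical helper)
        if config.z_loss_weight > 0:
            loss = loss + config.z_loss_weight * model.compute_z_loss(example)
        return loss

    return loss_fn


# in train_lm: build the function once and pass it into the trainer, e.g.
# trainer = Trainer(..., loss_fn=make_loss_function(config))
```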

```python
loss = cross_entropy_loss(
    logits, self.Vocab, target_y, reduction, reduction_axis=reduction_axis, where=example.loss_mask
)
if hasattr(self.config, "z_loss_weight") and self.config.z_loss_weight > 0:
```
Member

really don't like using this here. much cleaner to just pull out the loss function
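
For reference, the z-loss term that z_loss_weight scales is the PaLM-style auxiliary loss on the log-partition function. A standalone sketch in plain jax.numpy (not Levanter's named-axis API):

```python
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def z_loss(logits: jnp.ndarray, weight: float) -> jnp.ndarray:
    """PaLM-style z-loss: penalize the squared log-partition function so that
    logsumexp(logits) stays near zero and the softmax stays well-conditioned."""
    log_z = logsumexp(logits, axis=-1)  # log-partition per token: [batch, seq]
    return weight * jnp.mean(log_z**2)
```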

abhinavg4 requested a review from dlwh on July 19, 2024 at 19:27
```python
from levanter.utils.jax_utils import parameter_count


logger = logging.getLogger(__name__)


class ModuleComputeZLoss(ComputeLossFunction[M, X]):
```
Member

i still don't like this but I think I can't really articulate what I want. i'm gonna push a change to my fork
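
For context, the design being pushed back on wraps the z-loss decision inside a loss-function class rather than a plain function. A rough, self-contained sketch of that pattern; the base class, type variables, and model helpers below are stand-ins guessed from the excerpt, not Levanter's actual interface:

```python
# Rough sketch of the class-based pattern; the base class, type variables,
# and model helpers are stand-ins guessed from the excerpt above.
from typing import Generic, TypeVar

M = TypeVar("M")  # model type
X = TypeVar("X")  # example/batch type


class ComputeLossFunction(Generic[M, X]):  # stand-in for Levanter's base class
    def __call__(self, model: M, example: X, **kwargs):
        raise NotImplementedError


class ModuleComputeZLoss(ComputeLossFunction[M, X]):
    def __init__(self, z_loss_weight: float = 0.0):
        self.z_loss_weight = z_loss_weight

    def __call__(self, model: M, example: X, **kwargs):
        loss = model.compute_loss(example, **kwargs)  # hypothetical base loss
        if self.z_loss_weight > 0:
            loss = loss + self.z_loss_weight * model.compute_z_loss(example)
        return loss
```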

blahBlahhhJ and others added 16 commits July 25, 2024 17:53
* refactor queued-resources

* fix multislice

* add auto tear down

* reuse docker image

* tiny fix

* switch to concurrent executor for parallel subprocesses & small fix & logs
* Add llama 1b with fineweb txt

* replace with 50 fineweb urls

* wip

* revert many of the changes, which seems to fix the crashing

* revert many of the changes, which seems to fix the crashing

* remove now-unused option

* cleanup

* cleanup

* sigh

* Adding changes for dclm

---------

Co-authored-by: Ivan Zhou <[email protected]>
Co-authored-by: Abhinav Garg <[email protected]>
Bumps [ray[default]](https://github.com/ray-project/ray) from 2.32.0 to 2.34.0.
- [Release notes](https://github.com/ray-project/ray/releases)
- [Commits](ray-project/ray@ray-2.32.0...ray-2.34.0)

---
updated-dependencies:
- dependency-name: ray[default]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* wandb seems to be broken in latest release

* oops

* what?
* add mounting dir

* minor fix

* support abs and rel path

* add docs

* refactor to extra context

* minor fix docs

* minor fix

* modify docs
TheQuantumFractal and others added 30 commits September 25, 2024 16:20
1. num_tpus=1 is actually a bad idea because Ray will mask out the other tpus
2. force non-docker workloads to run in a separate process for stability
…ards again, (re)add configuration metadata to cache (#752)

Co-authored-by: Ahmed Ahmed <[email protected]>
Pulls in the New Mixture Features Into Audio Space!

Tested that this fixes the previous epoching errors in the whisper_tiny
config.
…g batches instead of a ray actor/task (#757)

About a 5x speedup. Memory usage isn't super well controlled in mixtures
and that needs some work
… head node, add code to change max size of actor pool (#762)

This is marginally slower, but pile now builds fine on a v4-32, which is
an improvement.
This PR creates a `ParquetDataSource` class to support loading
`.parquet` files.
Closes #763
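
As a rough illustration of what such a class might look like, here is a sketch using pyarrow; the constructor and iteration interface are assumptions, not Levanter's actual data-source API:

```python
import pyarrow.parquet as pq


class ParquetDataSource:
    """Sketch: stream one text column out of a .parquet file batch by batch,
    so the whole file never has to fit in memory."""

    def __init__(self, path: str, text_column: str = "text"):
        self.path = path
        self.text_column = text_column

    def __iter__(self):
        pf = pq.ParquetFile(self.path)
        for batch in pf.iter_batches(columns=[self.text_column]):
            # only one column was requested, so it is column 0 of each batch
            for value in batch.column(0).to_pylist():
                yield value
```

Usage would then be as simple as `for doc in ParquetDataSource("data.parquet"): ...`.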