Deterministic ML model execution using Python frameworks


This project has two parts:

  • documentation describing how to ensure deterministic execution of ML models across different frameworks
  • a Python package providing utilities that help ensure deterministic execution of ML models across different frameworks and versions

Currently supported frameworks and inference engines: CUDA-based, PyTorch, vLLM.
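
For illustration, the standard knobs for deterministic execution in plain PyTorch look like this (a general sketch of the technique, not this package's API):

    import os

    # Required for deterministic cuBLAS GEMMs on CUDA >= 10.2; must be set
    # before cuBLAS is first initialized, hence before importing torch.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    import torch

    torch.manual_seed(0)                      # fix the RNG state for CPU and CUDA
    torch.use_deterministic_algorithms(True)  # raise an error on nondeterministic ops
    torch.backends.cudnn.benchmark = False    # disable autotuning, which can vary between runs

Flags like these make single-GPU runs repeatable, but as the experiment results below show, they do not make outputs identical across different GPU models.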

The goal is to reproduce exactly the same results on another machine running the same software. This means finding a balance between performance and hardware restrictions without compromising reproducibility. For example, if limiting execution to a single GPU model and vRAM size is required to achieve reproducibility, that is also an acceptable solution, especially if the alternative would be "dumbing down" other cards just to produce identical results.

Experiment results so far

Integration testing shows that model output can be obtained in a deterministic way.

Here is a summary of the results for vLLM running a Llama 3 model:

  • each GPU model (combined with its vRAM configuration) produces a different output, but that output is consistent across runs
  • the GPU interface (SXM4, PCIe) does not affect the output
  • A100 80GB and A100X 80GB produce the same output
  • 2x A100 40GB do not produce the same output as 1x A100 80GB
  • the driver and CUDA versions may influence results, especially on cards with a higher "Compute Capability" feature set, e.g. H100, whereas A100 seems to produce the same results across a wider range of versions. More in-depth investigation is required here; the effect likely depends on the exact ML model, or more precisely, on the features used to execute that model.

To learn more about this particular example, please refer to the Integration testing documentation and the tests/integration/experiments/vllm_llama_3_70b_instruct_awq experiment code.
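
One simple way to compare runs across machines (a sketch, not part of this package) is to hash the generated outputs and compare digests:

    import hashlib

    def output_digest(outputs: list[str]) -> str:
        # Hash a sequence of generated texts so that runs on
        # different machines can be compared cheaply.
        h = hashlib.sha256()
        for text in outputs:
            h.update(text.encode("utf-8"))
        return h.hexdigest()

Two machines agree on a given experiment when their digests match exactly; a single differing token changes the digest.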

Usage

Important

This package uses ApiVer; make sure to import deterministic_ml.v1.

pip install deterministic_ml[vllm]  # pick the right extra for your use case, e.g. [vllm] or [torch]
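
Then, in application code, import the versioned API rather than the top-level package:

    import deterministic_ml.v1  # the v1 public API is the stable, ApiVer-guaranteed surface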

Versioning

This package uses Semantic Versioning. TL;DR: you are safe to use a compatible release version specifier, ~=MAJOR.MINOR, in your pyproject.toml or requirements.txt.
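
For example, in requirements.txt (the 1.0 below is illustrative; pin to the MAJOR.MINOR you actually depend on):

    deterministic_ml[vllm]~=1.0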

Additionally, this package uses ApiVer to further reduce the risk of breaking changes. This means the public API of this package is explicitly versioned, e.g. deterministic_ml.v1, and will not change in a backwards-incompatible way even when deterministic_ml.v2 is released.

Internal packages, i.e. those prefixed with deterministic_ml._, do not share these guarantees and may change in a backwards-incompatible way at any time, even in patch releases.

Development

Pre-requisites: pdm and nox.

Ideally, you should run nox -t format lint before every commit to ensure that the code is properly formatted and linted. Before submitting a PR, make sure that tests pass as well; you can do so using:

nox -t check # equivalent to `nox -t format lint test`

If you wish to install dependencies into .venv so your IDE can pick them up, you can do so using:

pdm install --dev

Contributing

Contributions are welcome, especially ones that expand docs/IMPROVING_CONSITENCY.md, the list of recommendations for improving the consistency of inference results across various Python frameworks.
