Initial commit for instructlab standalone script #252
+3,798
−0
Description
This PR adds the initial `standalone.py` script for running the InstructLab process as a set of Kubernetes Jobs with distributed training (Kubeflow Training Operator). The script currently assumes that valid SDG data exists in an S3 bucket and that there is an appropriate judge model endpoint available. It downloads the SDG data from S3, runs phase 1 training, runs phase 2 training, runs the mt_bench eval, runs the final eval, and then pushes the trained model back to an S3 bucket. This PR also includes the current README.md as well as a helper script for pushing the correct data into S3.
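For reviewers, the ordering of the stages described above can be pictured with a minimal sketch. This is not the code in `standalone.py`: the step names, `PIPELINE_STEPS`, and `run_pipeline` are hypothetical and only illustrate a strictly sequential flow in which a failed stage stops everything after it. The `runner` callback stands in for whatever actually launches and waits on each Kubernetes Job.

```python
from typing import Callable, List

# Hypothetical step names mirroring the stages listed in the description;
# the real script may name or group them differently.
PIPELINE_STEPS: List[str] = [
    "download_sdg_data_from_s3",
    "phase_1_training",
    "phase_2_training",
    "mt_bench_eval",
    "final_eval",
    "upload_model_to_s3",
]


def run_pipeline(steps: List[str], runner: Callable[[str], bool]) -> List[str]:
    """Run each step in order and stop at the first failure.

    `runner` is an assumed callback that executes one step and returns
    True on success. Returns the list of steps that completed.
    """
    completed: List[str] = []
    for step in steps:
        if not runner(step):
            raise RuntimeError(f"step {step!r} failed; stopping pipeline")
        completed.append(step)
    return completed


if __name__ == "__main__":
    # Dry run: every step "succeeds" without touching a cluster.
    for name in run_pipeline(PIPELINE_STEPS, lambda step: True):
        print(f"completed: {name}")
```

Failing fast matters here because each stage consumes the output of the previous one: phase 2 training needs the phase 1 checkpoint, and the evals need the trained model.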