- Install Olive

  ```bash
  pip install git+https://github.com/microsoft/olive
  ```
- Build and install ONNX Runtime generate()

  TODO: replace this with 1.20 when it is released

  ```bash
  git clone https://github.com/microsoft/onnxruntime-genai.git
  cd onnxruntime-genai
  python build.py
  cd build\Windows\RelWithDebInfo\wheel
  pip install *.whl
  ```
- Install ONNX Runtime nightly

  TODO: remove this step when 1.20 is released

  ```bash
  pip uninstall onnxruntime
  pip install --pre --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/ ort-nightly
  ```
- Install other dependencies

  ```bash
  pip install optimum peft
  ```
- Downgrade torch and transformers

  TODO: there is an export bug with torch 2.5.0 and an incompatibility with transformers>=4.45.0

  ```bash
  pip uninstall torch
  pip install torch==2.4
  pip uninstall transformers
  pip install transformers==4.44
  ```
- Choose a model

  In this example we'll use Llama-3-8b. You need to register with Meta for a license to use this model. You can do this by accessing the above page, signing in, and registering for access. Access should be granted quickly. Ensure that the huggingface-cli is installed (`pip install huggingface-hub[cli]`) and that you are logged in via `huggingface-cli login`.

- Locate datasets and/or existing adapters

  In this example, we will use two pre-tuned adapters.

  Note that the output path cannot contain any period (`.`) characters.

  Note also that this step requires 63GB of memory on the machine on which it is running.
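Because the output path must not contain period characters, it can be worth validating the path before launching a long-running command. This is a hypothetical helper (the name `check_output_path` is ours, not part of Olive) that rejects offending paths:

```python
from pathlib import PurePath

def check_output_path(path: str) -> str:
    """Raise ValueError if any component of the output path contains a period.

    For example, 'models\\Llama-3.1-8B' is rejected, while the
    'models\\Llama-3-1-8B-Instruct-LoRA' style used in this guide is accepted.
    """
    # Normalize Windows separators so the same check works on any platform.
    for part in PurePath(path.replace("\\", "/")).parts:
        if "." in part:
            raise ValueError(f"output path component {part!r} contains a period")
    return path

check_output_path("models/Llama-3-1-8B-Instruct-LoRA")  # passes silently
```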
- Export the model to ONNX format

  Note: add --use_model_builder when this is ready

  ```bash
  olive capture-onnx-graph -m meta-llama/Llama-3.1-8B-Instruct --adapter_path Coldstart/Llama-3.1-8B-Instruct-Surfer-Dude-Personality -o models\Llama-3-1-8B-Instruct-LoRA --torch_dtype float32 --use_ort_genai
  ```
- (Optional) Quantize the model

  ```bash
  olive quantize -m models\Llama-3-1-8B-Instruct-LoRA\model --algorithm rtn --implementation matmul4 -o models\Llama-3-1-8B-Instruct-LoRA-int4
  ```
- Adapt the model

  ```bash
  olive generate-adapter -m models\Llama-3-1-8B-Instruct-LoRA-int4\model -o models\Llama-3-1-8B-Instruct-LoRA-int4\adapted --log_level 1
  ```
- Convert adapters to ONNX

  ```bash
  olive convert-adapters --adapter_path Coldstart/Llama-3.1-8B-Instruct-Surfer-Dude-Personality --output_path adapters\Llama-1-8B-Instruct-Surfer-Dude-Personality --dtype float32 --quantize_int4
  olive convert-adapters --adapter_path Coldstart/Llama-3.1-8B-Instruct-Hillbilly-Personality --output_path adapters\Llama-1-8B-Instruct-Hillbilly-Personality --dtype float32 --quantize_int4
  ```
- Run the app

  See `app.py`.
- Fine-tune an adapter

  TODO: this requires CUDA

  ```bash
  olive finetune --method qlora -m meta-llama/Meta-Llama-3-8B -d nampdn-ai/tiny-codes --train_split "train[:4096]" --eval_split "train[4096:4224]" --text_template "### Language: {programming_language} \n### Question: {prompt} \n### Answer: {response}" --per_device_train_batch_size 16 --per_device_eval_batch_size 16 --max_steps 150 --logging_steps 50 -o adapters\tiny-codes
  ```
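The `--text_template` argument is a Python-style format string applied to each dataset row to build a training example. A minimal sketch of that rendering, with made-up field values (the field names `programming_language`, `prompt`, and `response` come from the command above; the values are illustrative only):

```python
# The same template string passed to --text_template above, with real newlines.
template = "### Language: {programming_language} \n### Question: {prompt} \n### Answer: {response}"

# A hypothetical dataset row; real rows come from nampdn-ai/tiny-codes.
row = {
    "programming_language": "Python",
    "prompt": "Write a function that doubles a number.",
    "response": "def double(n):\n    return 2 * n",
}

# Each row is rendered by substituting its fields into the template.
print(template.format(**row))
```

Each rendered string becomes one training example for the QLoRA fine-tuning run.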