Reducing model size by converting to ORT format #416
-
I tried reducing the size of Piper models by converting them to ORT format, but the resulting file is about the same size, if not slightly larger. Is it true that Piper models are already optimized? I'm deploying to the web and looking to minimize model size; I'm also considering a minimal build of onnxruntime. Currently the ort-wasm-simd-threaded.wasm file that needs to be downloaded to the browser is about 10MB. Synthesizing in the browser works great, and is fast enough for real-time use even on the CPU of old machines, but users will need to download roughly 10MB (onnxruntime) + 60MB (model). Thank you for any suggestions.
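For reference, I did the conversion with onnxruntime's bundled tool, along these lines (the model path is a placeholder):

```
python -m onnxruntime.tools.convert_onnx_models_to_ort model.onnx
```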
-
Piper has been integrated into Read Aloud, and also released as a separate extension. The source code is here. Please help out if you can with some of the open issues:

Issue 1 is the topic of this thread: I'm wondering whether Piper models are already optimized, or whether their size can be reduced further. Since we're performing inference at the edge, we'd like the download size to be as small as possible.

Issue 2 is that we need help compiling a JS/WASM version of piper-phonemize. It's the last piece of the puzzle needed for fully offline speech synthesis.
-
The ORT model format is mostly not about model file size; it's about loading more efficiently and requiring a smaller runtime binary, because it is based on FlatBuffers (a format Google designed for efficient loading of resources in mobile games) rather than Protocol Buffers. The optimizations mentioned in the doc you linked to are mainly about improving execution performance, usually by combining ("fusing") multiple steps of the model into a single step.

To reduce the model size you'd need to shrink the weights themselves, either by converting them to 16-bit floats or by quantizing them to 8-bit integers. Depending on how this is done it can affect output quality. See https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html. I haven't tried quantizing Piper TTS models specifically, so I can't say how easy this will be.
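If you want to experiment, here's a minimal sketch using onnxruntime's dynamic quantization API (file names are placeholders; I haven't verified how a quantized Piper voice actually sounds):

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Quantize float32 weights to int8, roughly a 4x reduction in weight
# storage. "Dynamic" means activation scales are computed at inference
# time, so no calibration dataset is needed.
quantize_dynamic(
    model_input="voice.onnx",        # placeholder: the original Piper model
    model_output="voice.int8.onnx",  # quantized output
    weight_type=QuantType.QInt8,
)
```

For the 16-bit float route, the onnxconverter-common package provides float16.convert_float_to_float16, which roughly halves the file size and typically costs less quality than int8 quantization. Either way, listen to the synthesized output before shipping.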