New PR: Allow users to add their own customised model without editing existing faster-whisper code #1054
Conversation
Added a function to allow users to add their own Hugging Face ct2 models. It provides a user-friendly way to test other models.
Hello. You can already do this very easily with the current stable version. For example: `WhisperModel("deepdml/faster-whisper-large-v3-turbo-ct2", device="cuda", compute_type="float16")`
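For completeness, a runnable version of that call (the audio path is a placeholder for illustration):

```python
from faster_whisper import WhisperModel

# Load a community CT2 conversion straight from the Hugging Face Hub
# by passing the repo ID in place of a built-in model size name.
model = WhisperModel(
    "deepdml/faster-whisper-large-v3-turbo-ct2",
    device="cuda",
    compute_type="float16",
)

# "audio.wav" is a placeholder path.
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```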
Hi jordimas, I do understand that it can be done easily with the current stable version, but I am looking at it from the perspective of someone creating projects based on faster-whisper.
I don't understand what this pull request does. Are you essentially referring to adding a list of custom links to ct2 models, above and beyond the Systran ones on Hugging Face?
Yes. Basically, this PR allows users to directly modify the _MODELS mapping within utils.py.
Users can add to the existing set of models simply by calling the function, and can then use the custom model.
This essentially allows users to use any custom model under any custom name, so long as it is hosted on Hugging Face. It also allows users to overwrite any of the existing Systran models if necessary.
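A minimal sketch of what such a registration helper could look like (`register_model` is a hypothetical name, not necessarily what the PR calls it; `_MODELS` is the name-to-repo mapping in `faster_whisper/utils.py`):

```python
from faster_whisper import WhisperModel
from faster_whisper.utils import _MODELS

def register_model(name: str, repo_id: str) -> None:
    """Map a custom model name to a Hugging Face CT2 repo.

    Overwrites any existing entry, including the Systran defaults.
    """
    _MODELS[name] = repo_id

# Register a community conversion under a short alias...
register_model("large-v3-turbo", "deepdml/faster-whisper-large-v3-turbo-ct2")

# ...then load it by that alias, exactly like a built-in size name.
model = WhisperModel("large-v3-turbo", device="cuda", compute_type="float16")
```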
I like the general proposition. I think it's pure laziness for Systran to only provide select model precisions such as float16... yet ctranslate2 supports float32, bfloat16, int8, int8_float16, and so on. To obviate this laziness, I had to create and upload all my own precisions and/or quantizations here: https://huggingface.co/ctranslate2-4you The way I handle it in my program is to simply use a dictionary: each option within a GUI pulldown menu maps to a dictionary entry.
In other words:

1. The user selects an option from the pulldown menu.
2. The user clicks the load button (or whatever it's called).
3. When the button is clicked, the script takes whatever item is selected in the pulldown menu and gets the relevant dictionary entry.
4. Once it gets the relevant dictionary entry, it specifically returns the "repo_id" child key, which is the Hugging Face repo ID.
5. This Hugging Face repo ID is what's actually used in your logic that runs the whisper model.

Then you can simply add new items to the dictionary! HOWEVER, you'll also need to dynamically change the compute type as well. Again, in my dictionary you see a "precision" child key; simply return this child key value (just like you returned the one for "repo_id") and use it for the compute_type argument. Here's the general shape of such a dictionary, for example:
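A minimal illustrative sketch (the menu labels and repo IDs below are placeholders, not the actual entries):

```python
from faster_whisper import WhisperModel

# Each pulldown option maps to the repo to download and the precision
# to load it with. Labels and repo IDs are illustrative placeholders.
WHISPER_MODELS = {
    "Whisper large-v3 (float32)": {
        "repo_id": "ctranslate2-4you/whisper-large-v3-ct2-float32",
        "precision": "float32",
    },
    "Whisper large-v3 (bfloat16)": {
        "repo_id": "ctranslate2-4you/whisper-large-v3-ct2-bfloat16",
        "precision": "bfloat16",
    },
    "Whisper medium.en (int8)": {
        "repo_id": "ctranslate2-4you/whisper-medium.en-ct2-int8",
        "precision": "int8",
    },
}

def load_selected_model(menu_selection: str) -> WhisperModel:
    """Look up the pulldown selection and load the matching model."""
    entry = WHISPER_MODELS[menu_selection]
    return WhisperModel(
        entry["repo_id"],                 # the "repo_id" child key
        device="cuda",
        compute_type=entry["precision"],  # the "precision" child key
    )
```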
Hope it helps!
@blackpolarz So technically @jordimas is correct, but you need to instead implement a solution somewhat like mine if you want to allow a user to dynamically select the size or precision of the whisper model to use. You don't want a crap ton of if/elif/else stuff, one branch for each permutation of model size and precision; that's ridiculous, and I used to do that. You can also use string manipulation or mappings, but I like the dictionary approach because:

- It allows me to put the dictionary into its own module, separate from the transcription logic.
- It reduces the code in the actual script that performs the transcription.
- It is damn reliable, and you don't have to scrounge for errors in string manipulation.
- You can easily add/remove models, comment out portions of the dictionary, etc.
@BBC-Esq Thanks for the recommendation, and I do agree that your method works. Either way, this is beyond the scope of the PR, which only aims to build on the existing faster-whisper structure, providing a simple method for using other custom models.
@blackpolarz You're technically correct. It's basically a tradeoff, and I had a brief conversation with the @guillaumekln dude a while ago.
Thus, it's a tradeoff... Yes, CT2 can convert at runtime; heck, you can even go from int8 to float16 (with a significant loss of quality compared to a native float16 conversion). So, 1) to save compute time, and 2) to improve accuracy a little bit, I've chosen to upload the various permutations of quantizations that ct2 supports. A personal choice totally, but it's good to make an informed decision after you fully understand what I've described, is all I'm saying.
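To make the runtime conversion concrete, a small sketch (CTranslate2 converts the stored weights to whatever compute_type you request at load time; the Systran repo here stores float16 weights):

```python
from faster_whisper import WhisperModel

# float16 weights on disk, quantized to int8 while loading.
model_int8 = WhisperModel(
    "Systran/faster-whisper-large-v3",
    device="cuda",
    compute_type="int8",
)

# The reverse (an int8 repo loaded as float16) also works, but it cannot
# recover the precision that was lost when the repo was quantized to int8.
```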
To give another example: if you take the Systran float16 model and want to use bfloat16, it will not be as accurate as taking the float32 version and doing a single conversion to bfloat16. I prefer the quality, and I want to know exactly which quantizations I'm running. I'm surprised that they only distribute float16 versions; the last two generations of Nvidia cards support bfloat16, so why not? Also, if they're going to put defaults into their code, why not include all quantizations?
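For anyone who wants to do that single conversion themselves, a sketch using CTranslate2's Transformers converter (the output directory name is arbitrary):

```python
import ctranslate2

# Convert the original Hugging Face checkpoint directly to bfloat16,
# rather than re-quantizing an already-quantized CT2 model.
converter = ctranslate2.converters.TransformersConverter(
    "openai/whisper-large-v3",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert(
    "whisper-large-v3-ct2-bfloat16",  # arbitrary output directory
    quantization="bfloat16",
)
```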
Added a user-friendly function to allow users to add their own Hugging Face ct2 models. This allows others to directly use any updated faster-whisper model without having to wait for the built-in model list in faster-whisper to be updated.