Silero VAD support #888
    onset: float = 0.5,
    offset: Optional[float] = None,
):
    assert chunk_size > 0
Keep binarization separate from the parent class function `merge_chunks` (i.e. `Vad.merge_chunks`). Binarization for other VAD methods (e.g. `silero`) may happen at an earlier stage, so keeping it out of `Vad.merge_chunks` makes that function easier to reuse. Specifically, in the case of `silero`, binarization happens during model invocation.
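The separation described in this comment can be pictured with a minimal sketch. Everything here is an illustrative assumption rather than the actual WhisperX code: the `Segment` type, the `ThresholdVad` backend, and the simplified `merge_chunks` signature are hypothetical. The point is only that the base class merges segments that are already binarized, while each backend decides where binarization happens.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds


class Vad:
    """Hypothetical base class: merges speech segments, never binarizes."""

    @staticmethod
    def merge_chunks(segments: List[Segment], chunk_size: float) -> List[Dict]:
        assert chunk_size > 0
        if not segments:
            return []
        merged: List[Dict] = []
        curr_start = segments[0].start
        curr_end = curr_start
        for seg in segments:
            # Close the current chunk if adding this segment would exceed chunk_size.
            if seg.end - curr_start > chunk_size and curr_end > curr_start:
                merged.append({"start": curr_start, "end": curr_end})
                curr_start = seg.start
            curr_end = seg.end
        merged.append({"start": curr_start, "end": curr_end})
        return merged


class ThresholdVad(Vad):
    """Hypothetical backend that binarizes per-frame scores at call time."""

    def __init__(self, onset: float = 0.5, frame_s: float = 1.0):
        self.onset = onset
        self.frame_s = frame_s

    def __call__(self, scores: Sequence[float]) -> List[Segment]:
        segments: List[Segment] = []
        start = None
        for i, score in enumerate(scores):
            if score >= self.onset and start is None:
                start = i * self.frame_s          # speech begins
            elif score < self.onset and start is not None:
                segments.append(Segment(start, i * self.frame_s))  # speech ends
                start = None
        if start is not None:
            segments.append(Segment(start, len(scores) * self.frame_s))
        return segments
```

A backend whose model already returns speech timestamps would skip the thresholding step and pass its segments straight to `merge_chunks`.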
How do I use Silero VAD with WhisperX!!
From the pull request description:
An error occurred: whisperx: error: unrecognized arguments: --vad_method silero
You have to checkout
I have * main
You can run
i showed up!!
When will a parameter for threshold adjustment be added?
Description
Implementation includes:
`pyannote-audio` toolkit.
`whisperx\__init__.py` imports.
The implementation aims to respect the current structure and to keep the existing functionality intact. Note that a manually assigned `vad_model` still works as expected (see `load_model` for details).
See the relevant issue for further details. Resolves #889.
Tests
`vad_model` has higher priority than `vad_method` (see the `load_model` function for details).
Example command line (also applies to `--vad_method pyannote`):
    python3 -m whisperx.transcribe audio.wav --language en --device cuda --diarize --hf_token xxx --vad_method silero
    python3 -m whisperx.transcribe audio.wav --language en --device cpu --diarize --hf_token xxx --compute_type int8 --vad_method silero
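The priority rule mentioned in this section (an explicitly passed `vad_model` wins over `vad_method`) can be sketched as follows. This is not the real `load_model`; the helper `resolve_vad` and the placeholder factory strings are hypothetical and only illustrate the precedence.

```python
from typing import Any, Callable, Dict, Optional

# Placeholder factories standing in for real model loaders (assumption).
_VAD_FACTORIES: Dict[str, Callable[[], Any]] = {
    "pyannote": lambda: "pyannote-default",
    "silero": lambda: "silero-default",
}


def resolve_vad(vad_model: Optional[Any] = None, vad_method: str = "pyannote") -> Any:
    """Return the VAD to use; a manually assigned vad_model takes priority."""
    if vad_model is not None:
        return vad_model  # vad_method is ignored in this case
    if vad_method not in _VAD_FACTORIES:
        raise ValueError(f"unknown vad_method: {vad_method!r}")
    return _VAD_FACTORIES[vad_method]()
```

With this shape, existing callers that pass their own model keep working unchanged, and `vad_method` only matters when no model is supplied.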
Example Python script usage:
Future work
`min_silence_duration_ms` (`silero`) and `min_duration_off` (`pyannote`)
`min_speech_duration_ms` (`silero`) and `min_duration_on` (`pyannote`)
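One possible way to expose the paired parameters listed above through a single interface is sketched below. The helper `to_backend_options` is hypothetical, and the sketch assumes that the `silero` parameters are expressed in milliseconds while the `pyannote` ones are expressed in seconds.

```python
from typing import Dict, Union


def to_backend_options(
    min_speech_s: float, min_silence_s: float, vad_method: str
) -> Dict[str, Union[int, float]]:
    """Translate shared durations (seconds) into backend-specific option names."""
    if vad_method == "silero":
        # Assumed unit: milliseconds.
        return {
            "min_speech_duration_ms": int(round(min_speech_s * 1000)),
            "min_silence_duration_ms": int(round(min_silence_s * 1000)),
        }
    if vad_method == "pyannote":
        # Assumed unit: seconds.
        return {
            "min_duration_on": min_speech_s,
            "min_duration_off": min_silence_s,
        }
    raise ValueError(f"unknown vad_method: {vad_method!r}")
```

Keeping the shared values in seconds means a caller could switch `vad_method` without rewriting its own settings.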