[Feature] Silero VAD support #889

3manifold · 2024-09-26T08:33:30Z

The VAD model plays a crucial role in the WhisperX pipeline and can significantly affect both speech recognition performance and inference time. It is therefore important to extend the application to accept alternative VAD methods, which do not necessarily have to come from the pyannote-audio toolkit (as the default VAD model does). Silero VAD is an ideal candidate for an alternative VAD option: it achieves excellent results on speech detection tasks while running only on CPUs. In addition, it is considered a high-priority TODO item in the WhisperX repository.

This feature includes:

- Implementation of Silero VAD as an alternative VAD option.
- Extension of WhisperX to accept VAD alternatives that do not necessarily come from the pyannote-audio toolkit.
- A fix to the imports in whisperx\__init__.py.

The implementation, a description of the tests, and future work can be found in pull request #888.
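
To illustrate what "accepting VAD alternatives" could look like, here is a minimal sketch of a pluggable VAD interface. It is not the code from pull request #888 and not WhisperX's actual API: the names `VadBackend`, `SpeechSegment`, `EnergyVad`, `register_vad` and `load_vad` are all hypothetical, and the energy-threshold backend is only a toy stand-in so the example runs without any model. A pyannote-audio or Silero backend would presumably wrap its own model behind the same `detect` method.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class SpeechSegment:
    """A detected speech region, in seconds (hypothetical type)."""
    start: float
    end: float


class VadBackend(ABC):
    """Hypothetical interface that every VAD option would implement."""

    @abstractmethod
    def detect(self, samples: Sequence[float], sample_rate: int) -> List[SpeechSegment]:
        """Return the speech segments found in mono audio samples."""


class EnergyVad(VadBackend):
    """Toy stand-in backend: a frame counts as speech when its mean
    absolute amplitude exceeds a threshold. Not a real VAD model."""

    def __init__(self, threshold: float = 0.1, frame_seconds: float = 0.5) -> None:
        self.threshold = threshold
        self.frame_seconds = frame_seconds

    def detect(self, samples: Sequence[float], sample_rate: int) -> List[SpeechSegment]:
        frame_len = max(1, int(self.frame_seconds * sample_rate))
        segments: List[SpeechSegment] = []
        start = None  # start time of the speech run currently open, if any
        for offset in range(0, len(samples), frame_len):
            frame = samples[offset:offset + frame_len]
            energy = sum(abs(x) for x in frame) / len(frame)
            if energy > self.threshold:
                if start is None:
                    start = offset / sample_rate
            elif start is not None:
                # Silence after speech closes the open segment.
                segments.append(SpeechSegment(start, offset / sample_rate))
                start = None
        if start is not None:
            segments.append(SpeechSegment(start, len(samples) / sample_rate))
        return segments


# Registry mapping a method name to a backend factory (hypothetical).
_VAD_REGISTRY: Dict[str, Callable[..., VadBackend]] = {"energy": EnergyVad}


def register_vad(name: str, factory: Callable[..., VadBackend]) -> None:
    """Make a new backend selectable by name."""
    _VAD_REGISTRY[name] = factory


def load_vad(method: str, **options) -> VadBackend:
    """Build the backend registered under `method`."""
    if method not in _VAD_REGISTRY:
        raise ValueError(f"Unknown VAD method: {method!r}")
    return _VAD_REGISTRY[method](**options)
```

With a design along these lines, the pipeline only depends on `detect`, so selecting a backend reduces to something like `load_vad("energy").detect(samples, sample_rate)`, and a new backend becomes available after a single `register_vad` call.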

3manifold linked a pull request Sep 26, 2024 that will close this issue

Silero VAD support #888

Open
