In my tests whisperX is "padding" the duration of certain words with silence / pauses.
With Faster-Whisper ("large-v2", standard parameters) I get word-level timestamps with a noticeable pause of about 1.5 seconds between the first and the second word.
With whisperX ("large-v2") this pause is instead added to the previous word, which now has a duration of about 2 seconds (instead of 0.5 seconds with Faster-Whisper). I am running whisperX with these parameters:
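import whisperx

# Load the whisperX ASR pipeline (CPU, int8 quantization, German)
model = whisperx.load_model(
    whisper_arch='large-v2',
    device='cpu',
    compute_type='int8',
    language='de',
)
# filepath points to the audio file under test
audio = whisperx.load_audio(str(filepath), sr=16000)
result = model.transcribe(audio, batch_size=16, task='transcribe')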
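To make the difference measurable, here is a minimal sketch that computes per-word durations and the gaps between consecutive words. It assumes the word timestamps from both tools have been converted to plain dicts with `word`, `start` and `end` keys. The helper names, the 1.0 second threshold, and the sample values are hypothetical. The values only mirror the rough numbers described above and are not actual output from either library.

```python
from typing import Dict, List, Tuple, Union

# Assumed shape: {"word": str, "start": float, "end": float}
Word = Dict[str, Union[str, float]]


def word_durations(words: List[Word]) -> List[Tuple[str, float]]:
    """Return (word, duration in seconds) for every word."""
    return [(w["word"], round(w["end"] - w["start"], 3)) for w in words]


def gaps_between_words(words: List[Word]) -> List[float]:
    """Return the silence between each word's end and the next word's start."""
    return [
        round(nxt["start"] - cur["end"], 3)
        for cur, nxt in zip(words, words[1:])
    ]


def flag_padded_words(words: List[Word], max_duration: float = 1.0) -> List[str]:
    """Return words whose duration exceeds max_duration (hypothetical threshold)."""
    return [w["word"] for w in words if w["end"] - w["start"] > max_duration]


# Hypothetical timestamps illustrating the behaviour described above:
# a short first word followed by a 1.5 s pause ...
faster_whisper_words = [
    {"word": "Hallo", "start": 0.0, "end": 0.5},
    {"word": "Welt", "start": 2.0, "end": 2.4},
]
# ... versus the same pause absorbed into the first word.
whisperx_words = [
    {"word": "Hallo", "start": 0.0, "end": 2.0},
    {"word": "Welt", "start": 2.0, "end": 2.4},
]

print(gaps_between_words(faster_whisper_words))
print(flag_padded_words(whisperx_words))
```

A check like this only detects words that look padded. It does not change how either tool assigns timestamps.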
Is there any way to control how whisperX handles silence / pauses? I tried adjusting no_speech_threshold in asr_options, but that did not change the result.
Any help would be much appreciated!