whisperX removing silence / pauses #896

tsmdt · 2024-10-09T12:46:54Z

In my tests whisperX is "padding" the duration of certain words with silence / pauses.

With Faster-Whisper ("large-v2") I get the following word level timestamps with a noticeable pause of about 1.5 seconds between the first and the second word.

"words": [
    {
        "word": "Ja,",
        "start": 0.0,
        "end": 0.54,
        "score": 0.76
    },
    {
        "word": "so",
        "start": 2.04,
        "end": 2.3,
        "score": 0.86
    },
    {
        "word": "ist",
        "start": 2.3,
        "end": 2.52,
        "score": 0.8
    }, 
...]

I am running Faster-Whisper with standard parameters:

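from faster_whisper import WhisperModel

# num_workers and filepath are defined elsewhere in my script
model = WhisperModel('large-v2', device='cpu', num_workers=num_workers, compute_type='int8')

segments, _ = model.transcribe(
    str(filepath),
    beam_size=5,
    language='de',
    word_timestamps=True,
)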

With whisperX ("large-v2") this pause is somehow added to the first word, which now has a duration of about 1.8 seconds (instead of about 0.5 seconds with Faster-Whisper):

"words": [
  {
      "word": "Ja.",
      "start": 0.45,
      "end": 2.232,
      "score": 0.649,
      "speaker": "SPEAKER_00"
  },
  {
      "word": "So",
      "start": 2.252,
      "end": 2.393,
      "score": 0.964,
      "speaker": "SPEAKER_00"
  },
  {
      "word": "ist",
      "start": 2.433,
      "end": 2.573,
      "score": 0.768,
      "speaker": "SPEAKER_00"
  },
...]
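
The difference can be quantified by computing each word's duration and the gap to the preceding word from the two outputs above. The following is a minimal sketch and not part of either library; the helper name durations_and_gaps is made up for illustration, and the word lists are copied from the outputs shown above.

```python
# Word timings copied from the two outputs above (only the fields needed here).
faster_whisper_words = [
    {"word": "Ja,", "start": 0.0, "end": 0.54},
    {"word": "so", "start": 2.04, "end": 2.3},
    {"word": "ist", "start": 2.3, "end": 2.52},
]
whisperx_words = [
    {"word": "Ja.", "start": 0.45, "end": 2.232},
    {"word": "So", "start": 2.252, "end": 2.393},
    {"word": "ist", "start": 2.433, "end": 2.573},
]


def durations_and_gaps(words):
    """Return (word, duration, gap_before) tuples, rounded to milliseconds.

    Hypothetical helper for illustration: gap_before is the silence between
    the previous word's end and this word's start (0.0 for the first word).
    """
    rows = []
    prev_end = None
    for w in words:
        duration = round(w["end"] - w["start"], 3)
        gap = 0.0 if prev_end is None else round(w["start"] - prev_end, 3)
        rows.append((w["word"], duration, gap))
        prev_end = w["end"]
    return rows


for name, words in [("Faster-Whisper", faster_whisper_words),
                    ("whisperX", whisperx_words)]:
    print(name)
    for word, duration, gap in durations_and_gaps(words):
        print(f"  {word:<5} duration={duration:.3f}s gap_before={gap:.3f}s")
```

For these values, Faster-Whisper gives the first word a duration of 0.54 s followed by a 1.5 s gap, while whisperX gives it a duration of 1.782 s followed by a 0.02 s gap.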

I am running whisperX with these parameters:

import whisperx

model = whisperx.load_model(
    whisper_arch='large-v2',
    device='cpu',
    compute_type='int8',
    language='de',
)
audio = whisperx.load_audio(str(filepath), sr=16000)
result = model.transcribe(audio, batch_size=16, task='transcribe')

Is there any way to control how whisperX handles silence / pauses? I tried adjusting no_speech_threshold in asr_options, but that did not work.

Any help would be much appreciated!
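
Independent of any whisperX setting, one possible stopgap is to post-process the word list and cap implausibly long word durations. The sketch below is an illustration under stated assumptions, not a whisperX feature: the helper cap_word_durations and the 1.0 s threshold are hypothetical, and shortening "end" assumes the speech sits at the start of the padded span, which may not hold (in the example above the whisperX start of 0.45 also differs from the Faster-Whisper start of 0.0).

```python
def cap_word_durations(words, max_duration=1.0):
    """Return a copy of words with each word's end clipped to start + max_duration.

    Hypothetical post-processing helper; max_duration is an arbitrary
    threshold in seconds and would need tuning per language and speaker.
    """
    capped = []
    for w in words:
        w = dict(w)  # copy so the input list is left untouched
        if w["end"] - w["start"] > max_duration:
            w["end"] = round(w["start"] + max_duration, 3)
        capped.append(w)
    return capped


# First two words of the whisperX output from this issue.
words = [
    {"word": "Ja.", "start": 0.45, "end": 2.232},
    {"word": "So", "start": 2.252, "end": 2.393},
]
print(cap_word_durations(words))
```

This only hides the symptom in the output; it does not change how the timestamps are produced.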
