In-memory audio input mode #65

amdrozdov · 2024-07-09T12:30:24Z

Hello Whisper S2T team!

In our project we need to work with pre-loaded audio chunks, so I opened a small PR that adds a file_io flag to the Whisper S2T model. With file_io=False, transcribe() accepts NumPy arrays directly, with no file I/O involved. Usage example:

import numpy as np
import whisper_s2t

model = whisper_s2t.load_model(
    model_identifier="./models/faster-whisper-large-v3",
    backend='CTranslate2',
    n_mels=128,
    file_io=False  # new flag from this PR: pass arrays instead of file paths
)

# some audio chunks: my_data holds raw 16-bit PCM bytes, scaled here to float32 in [-1, 1)
audio_chunks = [np.frombuffer(my_data, np.int16).astype(np.float32) / 32768.0]

result = model.transcribe(audio_chunks,
  lang_codes=lang_codes,
  tasks=tasks,
  initial_prompts=initial_prompts,
  batch_size=32
)
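
For anyone trying this mode, here is a minimal sketch of how the audio_chunks list above could be prepared from raw bytes. The helper name pcm16_to_float32 is hypothetical and not part of this PR, and the synthetic tone only stands in for real pre-loaded audio. It assumes mono, little-endian 16-bit PCM. Whisper models generally expect 16 kHz input, and whether this mode resamples is not stated here, so the sketch uses 16 kHz.

```python
import numpy as np


def pcm16_to_float32(pcm_bytes: bytes) -> np.ndarray:
    """Convert raw little-endian 16-bit mono PCM bytes to float32 in [-1.0, 1.0).

    Hypothetical helper for illustration; it is not part of whisper_s2t.
    """
    if len(pcm_bytes) % 2 != 0:
        raise ValueError("PCM16 data must contain an even number of bytes")
    samples = np.frombuffer(pcm_bytes, dtype="<i2")
    # Dividing by 32768 maps the int16 range [-32768, 32767] onto [-1.0, 1.0).
    return samples.astype(np.float32) / 32768.0


# A synthetic one-second 440 Hz tone at 16 kHz stands in for real audio bytes.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = (0.5 * np.sin(2 * np.pi * 440 * t) * 32767).astype("<i2")

# One array per chunk, in the same shape as the audio_chunks list above.
audio_chunks = [pcm16_to_float32(tone.tobytes())]
```

The resulting list can then be passed to transcribe() exactly as in the example above.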

Please let me know if I need to change the tests or benchmarks as well before the PR can be merged.

P.S. This PR could be a first step toward ticket #25, provided the external VAD and the hypothesis buffer are managed outside of Whisper S2T.

Best regards,
Andrei

Added in-memory audio input mode

2779d05
