In-memory audio input mode #65

Open
wants to merge 1 commit into
base: main

amdrozdov

Hello Whisper S2T team!

In our project we need to work with pre-loaded audio chunks, so I made a small PR that adds a file_io flag to the WhisperS2T model. With file_io=False, transcribe() can be called with NumPy arrays directly, without any file I/O. Usage example:

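import numpy as np
import whisper_s2t

model = whisper_s2t.load_model(
    model_identifier="./models/faster-whisper-large-v3",
    backend='CTranslate2',
    n_mels=128,
    file_io=False  # flag added by this PR: pass in-memory arrays instead of file paths
)

# some audio chunks: my_data is a buffer of 16-bit PCM samples, scaled to float32 in [-1, 1)
audio_chunks = [np.frombuffer(my_data, np.int16).flatten().astype(np.float32)/32768.0]

# lang_codes, tasks and initial_prompts are defined by the caller
result = model.transcribe(audio_chunks,
  lang_codes=lang_codes,
  tasks=tasks,
  initial_prompts=initial_prompts,
  batch_size=32
)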
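
For readers who want to try the conversion step on its own, here is a minimal sketch of turning raw 16-bit PCM bytes into the float32 array used in audio_chunks above. The helper name pcm16_to_float32 is hypothetical and not part of whisper_s2t, and the little-endian byte order is an assumption about the input data. The PR description does not say whether any resampling happens in this mode, so the sketch does none. Since Whisper models work on 16 kHz mono audio, the chunks presumably need to be in that format already.

```python
import numpy as np


def pcm16_to_float32(raw: bytes) -> np.ndarray:
    """Convert 16-bit PCM bytes to a 1-D float32 array in [-1.0, 1.0).

    Hypothetical helper for illustration; assumes little-endian samples.
    """
    samples = np.frombuffer(raw, dtype="<i2")  # read-only view over the bytes
    # astype() copies, so the result is writable and independent of `raw`
    return samples.astype(np.float32) / 32768.0


# Three samples: silence, the largest positive value, the largest negative value
raw = np.array([0, 32767, -32768], dtype="<i2").tobytes()
chunk = pcm16_to_float32(raw)
audio_chunks = [chunk]  # same shape of input as in the usage example above
```

Dividing by 32768 rather than 32767 maps the most negative sample to exactly -1.0 and keeps every positive sample strictly below 1.0.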

Please let me know if I need to change the tests or benchmarks as well in order to merge the PR.

P.S. There is a ticket, #25, and this PR could be a first step toward it (if we handle the external VAD and the hypothesis buffer outside of WhisperS2T).

Best regards,
Andrei
