Suggestion to deal with omission of periods #9
Comments
Possibly these are some of the failure modes of Whisper's LLM-based decoder.
Interesting, do you have any detailed evaluation of it? For example, how much does punctuation accuracy improve after adding this, and does it affect the WER? One issue I see with this approach is that it will unnecessarily increase inference time. Can you try this and check if it helps?

# Assumes `model` has already been loaded.
files = ['audio.wav']
lang_codes = ['en']
tasks = ['transcribe']
# One prompt per file; a normally punctuated prompt may nudge the output format.
initial_prompts = ['This is a documentary about Meghan Elizabeth.']
out = model.transcribe_with_vad(files,
                                lang_codes=lang_codes,
                                tasks=tasks,
                                initial_prompts=initial_prompts,
                                batch_size=32)

PS: Using prompting with the Whisper model to align the transcription format is also on my roadmap.
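One way to approach the evaluation asked about above is to compute WER twice, once on the raw text and once with punctuation and case stripped, so that punctuation changes can be separated from word-level changes. The following is only a minimal sketch using the standard library; the helper names `wer` and `strip_punct` are hypothetical and not part of any Whisper tooling.

```python
import string


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the processed ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


def strip_punct(text):
    """Lowercase the text and drop ASCII punctuation before scoring."""
    return text.translate(str.maketrans("", "", string.punctuation)).lower()
```

If the raw WER changes between two runs while the stripped WER stays the same, the difference comes from punctuation or casing rather than from the recognized words.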
If you can provide one sample file, I can look into whether VAD margins can somehow be used to mitigate these issues.
I can supply an MP3 file in which this issue happens predictably. How can I share it with you privately? |
You can email me: [email protected] |
Done. Thank you!
(Replying to Shashi Kant's comment of Thu, Feb 1, 2024, 12:10 PM.)
Hi, I got your email. I will get back to you by the coming weekend.
All right, thank you!
(Replying to Shashi Kant's comment of Tue, Feb 6, 2024, 5:15 AM.)
There is a frequent hallucination in Whisper in which segments of the transcript are stripped of periods (full stops). Example (not a real transcription, just to illustrate the issue):
I have found that adding about 5 seconds of white noise to the beginning of the affected excerpt and retranscribing it usually corrects the punctuation.
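The white-noise workaround could be sketched roughly as follows. This assumes the excerpt is already available as 16-bit mono PCM samples in an `array('h')`; the function name `prepend_white_noise` and the default noise level are hypothetical choices for illustration, not values tested against Whisper.

```python
import random
from array import array


def prepend_white_noise(pcm, sample_rate, seconds=5.0, amplitude=200, seed=0):
    """Return a new array with `seconds` of Gaussian white noise before `pcm`.

    pcm: array('h') of 16-bit mono samples (assumed input format).
    amplitude: noise standard deviation in int16 units (arbitrary low level).
    """
    rng = random.Random(seed)
    n_noise = int(round(seconds * sample_rate))
    # Clamp every sample to the int16 range so array('h') accepts it.
    noise = array("h", (max(-32768, min(32767, int(rng.gauss(0, amplitude))))
                        for _ in range(n_noise)))
    return noise + pcm
```

Timestamps from the retranscribed excerpt would then be offset by the noise duration, so `seconds` has to be subtracted before merging the result back into the full transcript.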
Perhaps this could be incorporated into the code. Alternatively, if there were a way to isolate the affected region (e.g. with information from the VAD), a separate function could check for this hallucination, export the WAV for the affected region, and retranscribe it.
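A check for this hallucination might look something like the sketch below. It assumes each transcript segment is a dict with `text`, `start_time` and `end_time` keys (an assumed layout, adjust to the actual output), and it flags runs of consecutive segments that do not end in terminal punctuation. The function name `find_unpunctuated_regions` and the `min_run` threshold are hypothetical.

```python
TERMINAL = (".", "?", "!", "…")


def find_unpunctuated_regions(segments, min_run=2):
    """Return (start_time, end_time) spans for runs of at least `min_run`
    consecutive segments whose text lacks terminal punctuation."""
    regions, run = [], []
    for seg in segments:
        # Ignore trailing quotes or brackets when looking at the last character.
        text = seg["text"].strip().rstrip("\"')”’")
        if text and not text.endswith(TERMINAL):
            run.append(seg)
            continue
        # A punctuated (or empty) segment ends the current run.
        if len(run) >= min_run:
            regions.append((run[0]["start_time"], run[-1]["end_time"]))
        run = []
    if len(run) >= min_run:
        regions.append((run[0]["start_time"], run[-1]["end_time"]))
    return regions
```

This is only a heuristic: a segment can legitimately end mid-sentence, which is why a single unpunctuated segment is ignored by default. The returned time spans could then be used to cut the affected audio and retranscribe it.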