Confidence-score per word #18

Open
raffiniert opened this issue Sep 14, 2024 · 2 comments

Comments

@raffiniert

Very nice library, thank you so much for all the efforts!

For my project, I need a confidence score not only for the final result (e.g. a complete sentence), but on a per-word basis. Currently, the confidence score per word is always returned as 0.

Is there any way you could implement this in an upcoming release? I would buy you coffee/beer in large amounts for it :-D

@jamsch
Owner

jamsch commented Sep 14, 2024

Hi @raffiniert, you can get the confidence scores per word segment through event.results[x].segments (there's a short sketch of reading these after the platform notes below), however there are a few things to note:

On Android, your mileage may vary depending on the recognition service you're using. This seems to be the only public GitHub repo that implements this functionality for RecognizerIntent.EXTRA_REQUEST_WORD_CONFIDENCE -- https://github.com/search?q=EXTRA_REQUEST_WORD_CONFIDENCE&type=code

Also of note for Android:

  • Segment confidences seem to only come back as -1 (RecognitionPart.CONFIDENCE_LEVEL_UNKNOWN), at least on my Samsung Galaxy S23 Ultra -- this will likely have to be fixed upstream on Google's end.
  • This feature is only available on SDK 34+ (Android 14+).
  • I've only verified that segments work with the com.google.android.as recognition service (using on-device speech recognition). I couldn't verify it working with com.google.android.tts or com.samsung.android.bixby.agent.
  • Segments are only available on the final result.
  • The segments are split up by word.
  • Segments are only available for the first transcript.
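
For reference, this is roughly how you'd target that on-device recognizer when starting recognition on Android 14+. It's only a sketch: the option names (requiresOnDeviceRecognition, androidRecognitionServicePackage) are my reading of the current docs, so double-check the README before relying on them.

```ts
import { ExpoSpeechRecognitionModule } from "expo-speech-recognition";

// Sketch: prefer the on-device recognizer on Android 14+, since that's the only
// service where segments were verified. Option names are assumptions based on
// the current docs -- verify against the README before use.
ExpoSpeechRecognitionModule.start({
  lang: "en-US",
  interimResults: true,
  requiresOnDeviceRecognition: true,
  androidRecognitionServicePackage: "com.google.android.as",
});
```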

For iOS:

  • The confidence score per segment will be 0 on partial results
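
To illustrate reading the per-word confidences on either platform, here's a minimal sketch using the library's result event. The exact field names on each segment (e.g. segment, confidence) are assumptions on my part, so check the library's TypeScript types for the real shape.

```ts
import { useSpeechRecognitionEvent } from "expo-speech-recognition";

function WordConfidenceLogger() {
  useSpeechRecognitionEvent("result", (event) => {
    // Segments are only populated on the final result (and only for the first transcript)
    if (!event.isFinal) {
      return;
    }
    for (const segment of event.results[0]?.segments ?? []) {
      // Assumed field names: `segment` (the word) and `confidence`
      // (0..1 on iOS; may be -1 / CONFIDENCE_LEVEL_UNKNOWN on some Android devices)
      console.log(segment.segment, segment.confidence);
    }
  });

  return null;
}
```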

So to wrap it up: not really working on Android, but it does work on iOS. Android will need upstream fixes for this feature to work. In the meantime, if you need this to be consistent across devices, there are two options you could explore:

  • You can record the recognized audio by calling ExpoSpeechRecognitionModule.start() with recordingOptions: { persist: true } and then send the recorded audio to a transcription service of your choice once speech recognition has ended (see the sketch after this list).
  • Otherwise, if app size isn't a concern, you may want to consider bundling your app with an on-device speech recognition model like whisper.rn.
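
For the first option, a rough sketch of persisting the recording is below. The start() call with recordingOptions: { persist: true } is as described above; where and how the file URI is reported back (I'm using an "audioend" event with a uri field here) is an assumption, so check the docs for the exact event and field names.

```ts
import {
  ExpoSpeechRecognitionModule,
  useSpeechRecognitionEvent,
} from "expo-speech-recognition";

// Start recognition and keep the recorded audio on disk for later re-transcription
function startWithPersistedRecording() {
  ExpoSpeechRecognitionModule.start({
    lang: "en-US",
    recordingOptions: {
      persist: true,
    },
  });
}

// Inside a component: (assumption) the persisted file's URI is reported on an
// audio event once recording ends -- verify the event/field names in the docs.
function RecordedAudioHandler() {
  useSpeechRecognitionEvent("audioend", (event) => {
    console.log("Recorded audio at:", event.uri);
    // ...upload this file to a transcription service that returns per-word confidences
  });
  return null;
}
```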

@raffiniert
Author

Man, you're awesome. I don't have time to deep-dive right now, but I just wanted to thank you for your reply straight away.

Hope the Android side gets fixed upstream soon; I'll look into the iOS solution ASAP. Thanks again!
