Turning on Diarize with Streaming always returns speaker as 0 #108
-
I'm using Deepgram with a Twilio stream, with the following config:
I'm trying to test diarization in the hope that I can filter out all speakers other than the primary one, but the speaker is always returned as 0.
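For reference, here is a minimal Python sketch of how the per-word speaker labels can be read from one streaming result. It assumes the live-transcription JSON shape where, with diarize=true, each entry in channel.alternatives[0].words carries an integer "speaker" field. The function name words_by_speaker and the sample message are hypothetical.

```python
import json
from collections import defaultdict


def words_by_speaker(message: str) -> dict:
    """Group the words of one streaming result by diarized speaker label.

    Assumes each word in channel.alternatives[0].words carries a "speaker"
    integer when diarize=true is set. Words without a label go under -1.
    """
    data = json.loads(message)
    alternatives = data.get("channel", {}).get("alternatives", [])
    groups = defaultdict(list)
    if not alternatives:
        return dict(groups)
    for word in alternatives[0].get("words", []):
        # -1 marks words that arrived without any speaker label
        groups[word.get("speaker", -1)].append(word["word"])
    return dict(groups)


# Hypothetical sample message in the assumed shape
sample = json.dumps({
    "channel": {"alternatives": [{
        "transcript": "hello there hi",
        "words": [
            {"word": "hello", "start": 0.0, "end": 0.4, "speaker": 0},
            {"word": "there", "start": 0.4, "end": 0.7, "speaker": 0},
            {"word": "hi", "start": 1.1, "end": 1.3, "speaker": 1},
        ],
    }]}
})
print(words_by_speaker(sample))  # {0: ['hello', 'there'], 1: ['hi']}
```

In the situation described here, every message would come back as a single group keyed by 0.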
Replies: 6 comments 1 reply
-
Hi @TechyChan, I just tried this out to see whether I could reproduce the problem. I did it slightly differently to you: I streamed the audio from a TV show with various characters having conversations, using the following WebSocket URI and parameters:

wss://api.deepgram.com/v1/listen?language=en&tier=enhanced&model=meeting&diarize=true&interim_results=true&smart_format=true&encoding=linear16&sample_rate=18000&profanity_filter=true

I found that different speakers were identified, but not consistently. The same character was not assigned the same speaker ID throughout the show, and even within a scene the ID sometimes switched. I have raised a similar issue, with a potential solution, here: https://github.com/orgs/deepgram/discussions/104

Another way of solving your issue might be to send more than one channel from your calls (one for each person) and use the multichannel functionality: https://developers.deepgram.com/documentation/guides/multichannel-vs-diarization/

This is something I have yet to play around with, but it would be my next attempt at solving this if I were in your position.
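To make the parameters in the URI above easier to read and vary, here is a small Python sketch that rebuilds it from a dict using the standard library's urlencode. The helper name listen_uri is hypothetical; the parameter values are copied verbatim from the URI quoted in this reply.

```python
from urllib.parse import urlencode

# Query parameters copied from the WebSocket URI quoted above
params = {
    "language": "en",
    "tier": "enhanced",
    "model": "meeting",
    "diarize": "true",
    "interim_results": "true",
    "smart_format": "true",
    "encoding": "linear16",
    "sample_rate": 18000,
    "profanity_filter": "true",
}


def listen_uri(query: dict, base: str = "wss://api.deepgram.com/v1/listen") -> str:
    """Build the streaming endpoint URI from a dict of query parameters."""
    return f"{base}?{urlencode(query)}"


uri = listen_uri(params)
print(uri)
```

For the multichannel route, the same helper could take multichannel="true" (and, for raw audio such as linear16, a channel count) in place of diarize; check Deepgram's documentation for the exact parameter names before relying on this.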
-
Hi @rilhia, thanks for the response! Unfortunately I only have one audio channel from Twilio, so multichannel is not an option for me. The idea is that I want to separate out the primary speaker during a phone call when the phone is in speaker mode and there are background voices. I think I'll probably need to look into other options, such as Azure Speech to Text, for diarization.
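One way to express "keep only the primary speaker" is sketched below in Python. It is a hypothetical heuristic, not a Deepgram feature: it assumes the primary speaker on a speakerphone call is the one with the most total speaking time, and that each word dict carries "speaker", "start" and "end" (in seconds) as in diarized results. It is only as good as the labels it receives, so given the inconsistent IDs reported in this thread it would be safest to apply it per short window rather than across a whole call.

```python
from collections import Counter
from typing import List, Optional


def primary_speaker(words: List[dict]) -> Optional[int]:
    """Return the speaker label with the most total speaking time.

    Hypothetical heuristic: assumes the main talker talks the most.
    Words without a "speaker" field are ignored.
    """
    talk_time = Counter()
    for w in words:
        if "speaker" in w:
            talk_time[w["speaker"]] += w["end"] - w["start"]
    if not talk_time:
        return None
    return talk_time.most_common(1)[0][0]


def keep_primary(words: List[dict]) -> List[dict]:
    """Drop every word not attributed to the primary speaker."""
    target = primary_speaker(words)
    return [w for w in words if w.get("speaker") == target]


words = [
    {"word": "hi", "start": 0.0, "end": 0.5, "speaker": 0},
    {"word": "yes", "start": 0.6, "end": 1.6, "speaker": 1},
    {"word": "okay", "start": 1.7, "end": 2.5, "speaker": 1},
]
print([w["word"] for w in keep_primary(words)])  # ['yes', 'okay']
```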
-
This is slightly different from audio channels, but Twilio stream channels can be separated with a Twilio feature called Single Party Call Recordings. That should allow you to send through only the caller's audio, which would be much more resilient. Even with the best diarization in the world, it will never be 100% accurate if you send both speakers through.
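A related approach, sketched below in Python, is to filter at the stream level. Note that this is not the Single Party Call Recordings feature mentioned above; it assumes Twilio Media Streams configured to send both legs (e.g. track="both_tracks" on the TwiML Stream), where each "media" message carries a media.track of "inbound" or "outbound" and a base64 payload of 8 kHz mu-law audio. The function name caller_audio is hypothetical.

```python
import base64
import json
from typing import Optional


def caller_audio(message: str) -> Optional[bytes]:
    """Return decoded audio from a Media Streams message only if it is
    on the inbound (caller) track; otherwise return None.

    Assumes Twilio's Media Streams JSON: event == "media",
    media.track in {"inbound", "outbound"}, media.payload base64 mu-law.
    """
    data = json.loads(message)
    if data.get("event") != "media":
        # Control events such as "connected", "start" and "stop" carry no audio
        return None
    media = data.get("media", {})
    if media.get("track") != "inbound":
        # Skip the outbound track (audio Twilio plays to the caller)
        return None
    return base64.b64decode(media["payload"])


chunk = base64.b64encode(b"\xff\x7f\xff").decode("ascii")
inbound = json.dumps({"event": "media", "media": {"track": "inbound", "payload": chunk}})
outbound = json.dumps({"event": "media", "media": {"track": "outbound", "payload": chunk}})
print(caller_audio(inbound))   # b'\xff\x7f\xff'
print(caller_audio(outbound))  # None
```

The returned bytes would then be forwarded to a Deepgram socket opened with encoding=mulaw and sample_rate=8000, matching Twilio's audio format.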
-
Closing due to the age of this issue. If this is still a problem, just let us know and we can reopen it.
-
I'm still running into this exact same problem. It makes diarization with live streaming almost unusable for me. Is the intended behavior to maintain speaker IDs across the whole stream, or is it meant to separate out the speakers within a single transcription result?
-
I am still running into the same issue with diarization on streaming audio: it always returns speaker = 0. I am using a lightly modified version of the code for streaming audio from a microphone.
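When every word comes back as speaker 0, it can help to tell two cases apart: words with no "speaker" field at all (suggesting diarize=true never reached the request) versus words all labelled 0 (diarization on, but only one voice detected). The Python sketch below tallies both across collected messages. It assumes the live-result JSON shape (channel.alternatives[0].words, plus an is_final flag when interim_results=true); the function name speaker_summary is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def speaker_summary(messages: Iterable[str], final_only: bool = True) -> dict:
    """Tally speaker labels across streaming result messages.

    Returns counts per speaker label plus the number of words that
    arrived with no "speaker" field at all.
    """
    counts = Counter()
    unlabelled = 0
    for raw in messages:
        data = json.loads(raw)
        # With interim_results=true, skip non-final results so interim
        # hypotheses are not counted alongside the finalized words.
        if final_only and not data.get("is_final", True):
            continue
        alternatives = data.get("channel", {}).get("alternatives", [])
        words = alternatives[0].get("words", []) if alternatives else []
        for word in words:
            if "speaker" in word:
                counts[word["speaker"]] += 1
            else:
                unlabelled += 1
    return {"speakers": dict(counts), "unlabelled_words": unlabelled}


msgs = [
    json.dumps({"is_final": True, "channel": {"alternatives": [{"words": [
        {"word": "a", "speaker": 0}, {"word": "b", "speaker": 0}]}]}}),
    json.dumps({"is_final": False, "channel": {"alternatives": [{"words": [
        {"word": "c", "speaker": 1}]}]}}),
    json.dumps({"is_final": True, "channel": {"alternatives": [{"words": [
        {"word": "d"}]}]}}),
]
print(speaker_summary(msgs))  # {'speakers': {0: 2}, 'unlabelled_words': 1}
```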
Closing due to the age of this issue. If this is still a problem, just let us know and we can reopen it.