How to know the offsets in output audio buffer that correspond to word boundaries? #600

siddhsql · 2024-09-10T18:34:59Z

In my application I want to display a stream of words on the screen as they are being spoken. To do this, I first tried calling textToAudio with the individual words of a sentence, roughly like this:

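// HasNext() and GetNext() stand in for my own word iterator
while (HasNext()) {
    std::string word = GetNext();
    std::cout << word << ' ' << std::flush;  // show the word, then speak it
    textToAudio(word);
}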

However, this does not produce pleasant speech. textToAudio generates better audio when it is passed full sentences, but then I don't know which positions in the audio buffer correspond to word boundaries. Is there any way to revise textToAudio so that it also outputs a std::vector<S>, where:

struct S {
    // the word being spoken
    std::string word;
    // offset in the audio buffer where the word starts
    size_t offset;
    // number of bytes in the audio buffer that correspond to the utterance of the word
    size_t nBytes;
};
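
Until something like this exists, one crude workaround is to estimate the boundaries on the caller's side. The sketch below is only an approximation and not part of the library: it splits the buffer among the words in proportion to their character counts, which ignores pauses and actual pronunciation length. The names WordSpan and estimateWordSpans are hypothetical, and the default of 2 bytes per sample assumes 16-bit mono PCM output.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Same shape as the struct S proposed above (hypothetical name).
struct WordSpan {
    std::string word;    // the word being spoken
    std::size_t offset;  // estimated byte offset of the word in the audio buffer
    std::size_t nBytes;  // estimated number of bytes covering the word
};

// Hypothetical helper: estimates word spans by dividing the buffer in
// proportion to each word's character count. This is a heuristic, not a
// real alignment. bytesPerSample = 2 assumes 16-bit mono PCM.
std::vector<WordSpan> estimateWordSpans(const std::string& sentence,
                                        std::size_t totalBytes,
                                        std::size_t bytesPerSample = 2) {
    // Split the sentence on whitespace and count the characters.
    std::vector<std::string> words;
    std::size_t totalChars = 0;
    std::istringstream in(sentence);
    for (std::string w; in >> w;) {
        totalChars += w.size();
        words.push_back(w);
    }

    std::vector<WordSpan> spans;
    if (words.empty() || bytesPerSample == 0) {
        return spans;
    }

    const std::size_t totalSamples = totalBytes / bytesPerSample;
    std::size_t charsSoFar = 0;
    std::size_t startSample = 0;
    for (const std::string& w : words) {
        charsSoFar += w.size();
        // Computing the end from the cumulative count keeps the spans
        // contiguous, sample-aligned, and ending exactly at totalSamples.
        const std::size_t endSample = totalSamples * charsSoFar / totalChars;
        spans.push_back({w, startSample * bytesPerSample,
                         (endSample - startSample) * bytesPerSample});
        startSample = endSample;
    }
    return spans;
}
```

The caller would synthesize the whole sentence once, pass the sentence and the buffer size in bytes to estimateWordSpans, and print each word when playback reaches its offset. Exact boundaries would still need timing information from the synthesizer itself, which is what this request is about.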

Replies: 0 comments

Select a reply

How to know the offsets in output audio buffer that correspond to word boundaries? #600

siddhsql Sep 10, 2024

Replies: 0 comments

siddhsql
Sep 10, 2024