In my application I want to display words on the screen as they are being spoken. To do this, I first tried calling textToAudio on the individual words of a sentence, roughly like this:
while (HasNext()) {
    std::string word = GetNext();
    std::cout << word;
    textToAudio(word);
}
However, this does not produce pleasant-sounding speech; textToAudio generates better audio when it is passed full sentences. But then I don't know which positions in the audio buffer correspond to word boundaries. Is there any way to revise textToAudio so that it also outputs a std::vector<S>, where:
struct S {
    // the word being spoken
    std::string word;
    // offset in the audio buffer
    size_t offset;
    // number of bytes in the audio buffer that correspond to the utterance of the word
    size_t nBytes;
};
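For context, here is a minimal sketch of how I would consume such a vector on the display side. It only illustrates the idea: wordIndexAt is a hypothetical helper of my own, not an existing function in the library, and it assumes the entries are sorted by offset, do not overlap, and that the playback position is measured in bytes from the start of the same audio buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Same layout as the struct S proposed in the question.
struct S {
    std::string word;    // the word being spoken
    std::size_t offset;  // offset in the audio buffer
    std::size_t nBytes;  // bytes of audio covering this word
};

// Hypothetical helper (not part of any library): returns the index of the
// word whose byte range [offset, offset + nBytes) contains playbackPos, or
// -1 if playbackPos falls in a gap between words or outside every range.
// Assumes `words` is sorted by offset and the ranges do not overlap.
long wordIndexAt(const std::vector<S>& words, std::size_t playbackPos) {
    // Find the first entry that starts strictly after playbackPos.
    auto it = std::upper_bound(
        words.begin(), words.end(), playbackPos,
        [](std::size_t pos, const S& s) { return pos < s.offset; });
    if (it == words.begin()) {
        return -1;  // playbackPos is before the first word (or no words)
    }
    --it;  // last entry that starts at or before playbackPos
    if (playbackPos < it->offset + it->nBytes) {
        return static_cast<long>(it - words.begin());
    }
    return -1;  // between two words, or past the last one
}
```

The playback loop could call wordIndexAt with the current byte position and print words[i].word whenever the returned index changes. The binary search keeps each lookup at O(log n), although a simple running index would also work since playback only moves forward.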