Is there a way to make an additional float[] output showing individual phoneme times/lengths from the native DLL? #425

Answered by synesthesiam
JasonBlain asked this question in Q&A
The w_ceil variable has the phoneme lengths:

w_ceil = torch.ceil(w)

Each entry of w_ceil is a phoneme's length in frames. Multiplying the tensor by 256 (the number of audio samples per frame) gives the number of audio samples per phoneme.
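To make the arithmetic concrete, here is a minimal sketch that turns per-phoneme frame counts into start times and lengths in seconds. The function name `phoneme_timings` and the 22050 Hz sample rate are assumptions for illustration and are not taken from the project; only the factor of 256 comes from the answer above. The input is a plain list, such as what `w_ceil.flatten().tolist()` would give for a batch of one.

```python
from itertools import accumulate

HOP_LENGTH = 256     # samples per frame, the factor quoted in the answer
SAMPLE_RATE = 22050  # assumption: use the sample rate from your voice's config


def phoneme_timings(frame_counts, hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Return a (start_seconds, length_seconds) pair for each phoneme.

    frame_counts: per-phoneme lengths in frames (hypothetical input, e.g.
    the flattened values of w_ceil for a single utterance).
    """
    # Frames -> audio samples per phoneme.
    samples = [int(n) * hop_length for n in frame_counts]
    # Running total of samples gives the end offset of each phoneme.
    ends = list(accumulate(samples))
    # Each phoneme starts where the previous one ended.
    starts = [0] + ends[:-1]
    # Samples -> seconds.
    return [(s / sample_rate, n / sample_rate) for s, n in zip(starts, samples)]


if __name__ == "__main__":
    for start, length in phoneme_timings([4, 7, 3]):
        print(f"start={start:.4f}s length={length:.4f}s")
```

The same running-sum logic carries over to the C++ side once the per-phoneme tensor is available there; doing it on the sample counts rather than on rounded seconds avoids accumulating floating-point drift.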

That w_ceil tensor needs to be returned from the infer function and then also returned with the audio here:

audio = model_g.infer(

On the C++ side, you then need to pick apart the multiple output tensors (one audio, one phoneme samples):

auto outputTensors = session.onnx.Run(

Replies: 3 comments 8 replies

Answer selected by JasonBlain
Category
Q&A
Labels
None yet
5 participants