API endpoint #14

Open
nathan-skynet opened this issue Apr 26, 2024 · 5 comments

Comments

@nathan-skynet

I'm looking for documentation on using Wyoming Whisper/Piper with a curl request, but I can't find which endpoint to use or what data to send to it.

I even looked in the code, but either I don't understand it or I'm really stupid.

@Tomywang999

Same here. I hope some kind of reference document can be published.

@GRbit

GRbit commented Jul 16, 2024

Same here.

What can be understood from the README.md is that the body of the request should be encoded as JSONL and the payload should be in PCM audio format. We can also see descriptions of some requests, but no real examples.

I assumed that https://github.com/rhasspy/wyoming-faster-whisper runs over the tcp:// protocol and tried to send a describe request to it. As I understood it, the request should look like this: {"type":"describe"}\n\n. Unfortunately, the server never replied.

I would really appreciate any help with sending an example request to any Wyoming server. It's a bit hard for me to understand how it works reading the code.
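For anyone trying the same thing, here is a minimal Python sketch of that describe attempt over a plain TCP socket. It is only an illustration: `send_describe` is a hypothetical helper name, the host and port are whatever your server listens on, and it sends a single JSON line terminated by one newline. It returns only the first line of the reply and does not read anything that may follow it.

```python
import json
import socket


def send_describe(host: str, port: int, timeout: float = 5.0) -> dict:
    """Send a describe event and return the parsed first line of the reply.

    Hypothetical helper for illustration; not part of any Wyoming package.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # One JSON object terminated by a single newline.
        message = json.dumps({"type": "describe"}).encode("utf-8") + b"\n"
        sock.sendall(message)
        # Read the reply as a byte stream and take only its first line.
        with sock.makefile("rb") as reader:
            header_line = reader.readline()
    if not header_line:
        raise ConnectionError("server closed the connection without replying")
    return json.loads(header_line)


# Example call (host and port are assumptions, adjust to your setup):
# print(send_describe("127.0.0.1", 10300))
```

The timeout is there so a silent server produces an error instead of hanging forever.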

@GRbit

GRbit commented Jul 16, 2024

I ended up installing Wireshark and capturing packets from HA to piper/whisper.

It's indeed TCP (my code had some errors, I didn't send FIN after the write), but the packets look a bit different from what I assumed from the README.md.

For example, the synthesize request is described as:

  • synthesize - request to generate audio from text
    • text - text to speak (string, required)
    • voice - use a specific voice (optional)
      • name - name of voice (string, optional)
      • language - language of voice (string, optional)
      • speaker - speaker of voice (string, optional)

And from the Format description I would assume that I should set type to "synthesize" and put the text and voice in "data".
But it turned out that they should go on the next line as "Additional data". That's a surprise.

So, I guess for any request you can try putting the fields under the data JSON key or sending them as "Additional data", and one of the two will work. Then try to handle what you get from the server.
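To make the two placements concrete, here is a rough Python sketch that builds the same synthesize request both ways. The helper names `frame_inline` and `frame_additional` are made up for this example, the voice name is a placeholder, and the `data_length` header field is taken from my reading of the README's Format section, so please check it against the README before relying on it.

```python
import json


def frame_inline(event_type: str, data: dict) -> bytes:
    """Variant 1: put the fields under the header's "data" key."""
    header = {"type": event_type, "data": data}
    return json.dumps(header).encode("utf-8") + b"\n"


def frame_additional(event_type: str, data: dict) -> bytes:
    """Variant 2: send the fields as a separate JSON block after the header.

    The header line announces the block's size in bytes ("data_length" is
    an assumed field name, see the note above).
    """
    body = json.dumps(data).encode("utf-8")
    header = {"type": event_type, "data_length": len(body)}
    return json.dumps(header).encode("utf-8") + b"\n" + body


# Placeholder request; "example-voice" is not a real voice name.
request = {"text": "Hello from Wyoming", "voice": {"name": "example-voice"}}

inline_bytes = frame_inline("synthesize", request)
additional_bytes = frame_additional("synthesize", request)

print(inline_bytes)
print(additional_bytes)
```

In the second variant the size is counted in bytes after UTF-8 encoding, not in characters, which matters as soon as the text contains non-ASCII characters.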

@synesthesiam
Contributor

Wyoming author here. The first line of the TCP message is JSON with the event type and the size of the "additional data" (JSON) plus the size of the binary payload (PCM audio).

The "additional data" was needed because Python has a limit to how many characters can be on a single line, so large JSON messages would be cut off.
My solution was to add another section for additional data and merge it with the (small) JSON data from the header.
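As a rough illustration of the layout described above (not the actual Wyoming implementation), a reader could parse the header line, read the announced number of additional-data bytes, merge them into the header's small `data` object, and then read the binary payload. The field names `data_length` and `payload_length`, the helper name `read_event`, and the sample event fields are assumptions made for this sketch.

```python
import io
import json


def read_event(stream):
    """Read one event from a binary stream.

    Returns (type, merged_data, payload), or None at end of stream.
    Hypothetical helper; field names are assumptions, see the note above.
    """
    header_line = stream.readline()
    if not header_line:
        return None
    header = json.loads(header_line)

    # Start with the (small) data carried in the header itself.
    data = dict(header.get("data") or {})

    # Merge in the additional-data section, if the header announces one.
    data_length = header.get("data_length") or 0
    if data_length:
        data.update(json.loads(stream.read(data_length)))

    # Whatever remains of this event is the raw binary payload (PCM audio).
    payload_length = header.get("payload_length") or 0
    payload = stream.read(payload_length) if payload_length else b""
    return header["type"], data, payload


# Hand-built sample message; the event fields are illustrative only.
extra = json.dumps({"rate": 16000, "width": 2, "channels": 1}).encode("utf-8")
pcm = b"\x00\x01" * 4
raw = (
    json.dumps(
        {
            "type": "audio-chunk",
            "data_length": len(extra),
            "payload_length": len(pcm),
        }
    ).encode("utf-8")
    + b"\n"
    + extra
    + pcm
)

event = read_event(io.BytesIO(raw))
print(event)
```

The same function should work on a socket wrapped with `sock.makefile("rb")`, since it only needs `readline()` and `read(n)`.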

@GRbit

GRbit commented Jul 18, 2024

Greetings @synesthesiam! Thank you for your work on the protocol and for taking the time to respond to our request.

If you have some more time, I would like to ask one more question. Is there somewhere I can read about the API? I mean, I got the idea of a line of JSON, a line of additional JSON, and PCM audio; the real question for me is the spec of the messages. Is there a place where I can see examples of each request? Otherwise I cannot predict whether the data should come in the first line under the data key or in the second line as "additional data". Do messages always come as "additional data"? If so, in which cases is the data key from the first line used?
I have experience with some HTTP APIs where there is a specification like OpenAPI or JSON:API. It would be great to have something similar for Wyoming, even if it's not HTTP but TCP.

Also, I'm very curious about the limitation you mentioned: "Python has a limit to how many characters can be on a single line". Can you tell me more about this? I've never heard of it before.
