Coefontuber, which is named after CoeFont + YouTuber, is an interactive and cross-platform CUI client for CoeFont.
Just as a VTuber live-streams with a virtual appearance, Coefontuber lets you speak with a virtual voice. A dictionary and sound effects such as echo are also supported.
It is written in Go.
In addition to the CoeFont API, the VOICEVOX API is supported. This support is currently experimental and undocumented.
Demo |
- Sign up for CoeFont.
- Visit https://coefont.cloud/account/api to generate a key for CoeFont API.
- SoX (We call the `play` command bundled with SoX.)
Coefontuber reads `./config.json` as the configuration file.
You don't need to write `./config.json` from scratch; just run
cp ./template_config.json ./config.json
Here is an example configuration:
{
  "coefont": { //Normally, this is the only section you might want to edit.
    "access_key": "vWuJOxgUZcJGNV4aCZA0dXHlK",
    "client_secret": "y5Fsd7GDfspJnFaqszqOsxSF729je6SecIkevyC6",
    "font_uuid": "6c0540f7-9639-4d2b-ae8c-43572d9d7f79",
    "speed": 1.0
  },
  "readline": { //Never edit this section unless you know what GNU Readline is.
    "vim_mode": false,
    "history_file": "./.history"
  },
  "output_dir": "./wav",
  "timeout_sec": 10,
  "custom_prefix_list": [
    {
      "prefix": "echo",
      "args": [
        "echo",
        "0.8",
        "0.88",
        "60",
        "0.4"
      ]
    }
  ]
}
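For readers who want to see how these fields could map onto Go types, here is a minimal sketch. The `Config` and `CustomPrefix` types and the `ParseConfig` function are hypothetical names chosen for illustration, not necessarily what Coefontuber uses internally. Note that Go's `encoding/json` rejects `//` comments, so the sketch decodes a comment-free document with placeholder credentials; how Coefontuber itself handles the comments is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CustomPrefix mirrors one entry of "custom_prefix_list" (hypothetical type name).
type CustomPrefix struct {
	Prefix string   `json:"prefix"`
	Args   []string `json:"args"`
}

// Config mirrors the example configuration (hypothetical type name).
type Config struct {
	Coefont struct {
		AccessKey    string  `json:"access_key"`
		ClientSecret string  `json:"client_secret"`
		FontUUID     string  `json:"font_uuid"`
		Speed        float64 `json:"speed"`
	} `json:"coefont"`
	Readline struct {
		VimMode     bool   `json:"vim_mode"`
		HistoryFile string `json:"history_file"`
	} `json:"readline"`
	OutputDir        string         `json:"output_dir"`
	TimeoutSec       int            `json:"timeout_sec"`
	CustomPrefixList []CustomPrefix `json:"custom_prefix_list"`
}

// ParseConfig decodes a comment-free JSON document into a Config.
func ParseConfig(data []byte) (Config, error) {
	var c Config
	err := json.Unmarshal(data, &c)
	return c, err
}

func main() {
	// Placeholder values; the real file holds your own credentials.
	data := []byte(`{
  "coefont": {"access_key": "KEY", "client_secret": "SECRET", "font_uuid": "UUID", "speed": 1.0},
  "readline": {"vim_mode": false, "history_file": "./.history"},
  "output_dir": "./wav",
  "timeout_sec": 10,
  "custom_prefix_list": [{"prefix": "echo", "args": ["echo", "0.8", "0.88", "60", "0.4"]}]
}`)
	c, err := ParseConfig(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.OutputDir, c.TimeoutSec, c.CustomPrefixList[0].Prefix)
}
```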
See also: 3.3 Special Commands
The configuration field `custom_prefix_list` registers additional arguments to the `play` command used to play generated WAV files.
With the example configuration above, if you input `!echo hello world` in an interactive session, then `play hello_world.wav echo 0.8 0.88 60 0.4` is executed under the hood.
Using this, you can apply any effect supported by SoX.
You can type `!list` to list all of the prefixes you've registered.
go build
./coefontuber #starts an interactive session
In an interactive session, each input of the form `!<word>[ <arg(s)>]` is interpreted as a special command.
Syntax | Description |
---|---|
`!help` | Shows help about all of the built-in special commands. |
`!list` | Shows the list of all of the user-defined special commands. |
`!dict` | Shows the list of the words registered to CoeFont dictionary. |
`!dict <word> <reading>` | Registers the word `<word>` with its pronunciation `<reading>` to CoeFont dictionary. |
`!dict del <word>` | Removes the word `<word>` from CoeFont dictionary. |
User-defined special commands are supported in a limited manner. See 2.2 `custom_prefix_list` (Sound Effects) for details.
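As an illustration of the `!<word>[ <arg(s)>]` syntax, the sketch below splits an input line into a command word and its arguments. `ParseSpecial` is a hypothetical function, and details such as how whitespace is treated are assumptions rather than documented behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseSpecial splits an input of the form "!<word>[ <arg(s)>]".
// ok is false when the input is not a special command, i.e. it is ordinary
// text to be spoken. (Hypothetical function, for illustration only.)
func ParseSpecial(input string) (word string, args []string, ok bool) {
	if !strings.HasPrefix(input, "!") {
		return "", nil, false
	}
	rest := input[1:]
	fields := strings.Fields(rest)
	// Require the word to follow "!" immediately, so "! help" is not a command.
	if len(fields) == 0 || !strings.HasPrefix(rest, fields[0]) {
		return "", nil, false
	}
	return fields[0], fields[1:], true
}

func main() {
	for _, in := range []string{"!help", "!dict del foo", "!echo hello world", "hello world"} {
		word, args, ok := ParseSpecial(in)
		fmt.Printf("%q: special=%v word=%q args=%q\n", in, ok, word, args)
	}
}
```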
Run `./lint.sh` before you commit.
Say you want to make Coefontuber speak "hello", "world" and "bye". Here is what happens under the hood:
- `main()` reads your inputs (i.e. "hello", "world" and "bye") using GNU Readline.
- `main()` spawns a pair of `APICall()` and `Play()` for each input. All of these functions start running right away, and they are all asynchronous.
- `APICall()` sends a string to CoeFont and passes the resulting `.wav` file to the corresponding `Play()` using a channel. CoeFont works as a text-to-speech converter. Note that the order in which the `APICall()`s return is undefined. It is possible that `(E)` or `(F)` in the figure below returns before `(D)` returns, even though `(D)` is spawned first.
- `Play()` waits for the corresponding `APICall()` to give it a `.wav` file. In addition, it waits for the adjacent `Play()` that was spawned right before it to return. This implies that, before a `Play()` returns, it tells the adjacent `Play()` spawned right after it "I'm done." As a result, even if sequentially spawned `APICall()`s don't return in that order, it is guaranteed that the `Play()`s speak in that order. These "I'm done." notifications are also sent via channels, named `batonIn` (for listening) and `batonOut` (for notifying). Each instance of `batonIn` or `batonOut` is shared only by two adjacent `Play()`s. For example, if a `batonOut` is passed to `(B)` in the figure below as an argument, it is copied only once and passed to `(C)` as `batonIn`.
Figure: How Coefontuber Works
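The baton mechanism described above can be sketched in a few lines of Go. This is a simplified illustration, not the actual source: `apiCall`, `play` and `SpeakInOrder` are hypothetical stand-ins, the CoeFont request is replaced by a random delay, and "playing" a file just records its name.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// apiCall stands in for the CoeFont request: after a random delay it hands a
// fake WAV file name to the corresponding play via a channel.
func apiCall(text string, wav chan<- string) {
	time.Sleep(time.Duration(rand.Intn(30)) * time.Millisecond)
	wav <- text + ".wav"
}

// play waits for its own WAV file and for the baton from the play spawned
// right before it, "plays" the file, then passes the baton on.
func play(wav <-chan string, batonIn <-chan struct{}, batonOut chan<- struct{}, played *[]string) {
	file := <-wav
	<-batonIn // wait until the previous play has finished
	*played = append(*played, file)
	batonOut <- struct{}{} // "I'm done."
}

// SpeakInOrder spawns one apiCall/play pair per input and returns the file
// names in the order they were played.
func SpeakInOrder(inputs []string) []string {
	var played []string
	batonIn := make(chan struct{}, 1)
	batonIn <- struct{}{} // the first play has no predecessor to wait for
	for _, text := range inputs {
		wav := make(chan string, 1)
		batonOut := make(chan struct{}, 1)
		go apiCall(text, wav)
		go play(wav, batonIn, batonOut, &played)
		batonIn = batonOut // this play's batonOut is the next play's batonIn
	}
	<-batonIn // wait for the last play to finish
	return played
}

func main() {
	fmt.Println(SpeakInOrder([]string{"hello", "world", "bye"}))
	// prints [hello.wav world.wav bye.wav] regardless of the random delays
}
```

Because each baton channel is shared by exactly two adjacent `play` goroutines, the appends to `played` are serialized by the channel operations, so no additional locking is needed in this sketch.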