Goruut

A tokenizer, text cleaner, and IPA phonemizer for several human languages.

Try it online

You can try goruut live at hashtron.cloud.

Installation

go install github.com/neurlang/goruut/cmd/goruut@latest

Docker Compose installation

Clone the repository, then run this command in its root directory:

sudo docker compose up -d --force-recreate --build

Supported Languages

Afrikaans
Amharic
Arabic
Azerbaijani
Belarusian
Bengali
Bengali Dhaka
Bengali Rahr
Burmese
Cebuano
Chechen
Chinese Mandarin
Czech
Danish
Dutch
Dzongkha
English
Esperanto
Farsi
Finnish
French
German
Greek
Gujarati
Hausa
Hebrew
Hindi
Hungarian
Icelandic
Indonesian
Isan
Italian
Jamaican
Japanese
Javanese
Kazakh
Korean
Luxembourgish
Macedonian
Malayalam
Malay Arab
Malay Latin
Maltese
Marathi
Mongolian
Nepali
Norwegian
Pashto
Polish
Portuguese
Punjabi
Romanian
Russian
Slovak
Spanish
Swahili
Swedish
Tamil
Telugu
Thai
Tibetan
Turkish
Ukrainian
Urdu
Uyghur
Vietnamese Central
Vietnamese Northern
Vietnamese Southern
Zulu

The goal is to support all of voice2json's languages. Please add a language if you have the necessary data.

Listening to the generated speech

There are currently 3 target languages (IPA flavors). They are:

IPA - Copy the output into ipa-reader.xyz and pick the voice for the matching language
Espeak - Copy the output into espeak. For example, for Czech: espeak -v cs "[[ru:Zovi: ku:n^]]"
Antvaset - Copy the output into antvaset.com and pick the voice for the matching language

Dependencies

See the go.mod file for an up-to-date list of dependencies. The minimum supported Go version is 1.18, because the project uses type parameters.

Numbers, Dates, and More

Unsupported. Please write them using words.
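Because digits are not expanded, it can help to check input before sending it to goruut. The following is a minimal sketch of such a pre-check, not part of goruut itself; the function name hasDigits is hypothetical, and the check only flags Unicode decimal digits, so other unsupported notations would need their own handling.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// hasDigits reports whether s contains any Unicode decimal digit.
// Hypothetical helper: goruut does not provide this function.
func hasDigits(s string) bool {
	return strings.IndexFunc(s, unicode.IsDigit) >= 0
}

func main() {
	// Sentences with digits should be rewritten in words first.
	for _, s := range []string{"mám 3 psy", "mám tři psy"} {
		fmt.Printf("%q contains digits: %v\n", s, hasDigits(s))
	}
}
```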

Command-Line Usage

To start, launch the server using the example config (in the configs directory):

./goruut -configfile configs/config.json

This launches the server on the HTTP port specified in the config file. The port is printed at startup:

INFO[0000] Binding port: 18080

Then you can run queries:

POST http://127.0.0.1:18080/tts/phonemize/sentence

{
	"Language": "Czech",
	"Sentence": "jsem supr"	
}

Output should be:

{
	"Words": [
		{
			"Linguistic": "jsem",
			"Phonetic": "jsɛm"
		},
		{
			"Linguistic": "supr",
			"Phonetic": "supr"
		}
	]
}

Intended Audience

goruut is useful for transforming raw text into phonetic pronunciations, similar to phonemizer. Unlike phonemizer, goruut looks up words in a pre-built lexicon (pronunciation dictionary) or guesses word pronunciations with a pre-trained grapheme-to-phoneme model.

