Exceptions for foreign words

Vaclav Hanzl edited this page Oct 31, 2022 · 7 revisions

The Prak pronunciation generator covers just the "regular" part of Czech pronunciation. It is meticulous about possible assimilations of various kinds, the presence of glottal stops, and so on; in short, it mostly gets right whatever is "logical" in Czech pronunciation, even where that logic is rather involved. But Prak has very limited lexical knowledge. Where such knowledge is needed to get the pronunciation right, Prak needs your help.

The file prak/exceptions.txt is where you can provide this help. It is consulted when Prak is invoked from Praat. To use a different file, change it in prak/prak_align_phrase.praat, the script you attached to the Align by Prak button. Just edit the line

exceptions$ = prak$ + "\exceptions.txt"

to something else, like

exceptions$ = "C:\Users\Ferda\Ferduv_slovnik_vyjimek.txt"

In the exceptions.txt file (or other file you configured), you can use simple rules for foreign words. For example:

python pajtn
praat prát

and so on. Another area where your help is needed is compound words. For example:

elektroinstalace elektro=instalace

Here we marked the seam between the two parts of the compound with "=". The seam matters because it makes a glottal stop possible (the second part starts with a vowel). With this rule in exceptions.txt, Prak will generate two pronunciation variants, one with a glottal stop at the seam and one without, and will hopefully choose the right one based on the audio.
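To make the rule format concrete, here is a minimal Python sketch that reads rules of the shape shown in the examples above. It is not Prak's own parser: the function names parse_exception_rules and seam_parts are hypothetical, and the sketch assumes each rule is exactly one word, whitespace, and one replacement, which is all the examples show.

```python
def parse_exception_rules(text):
    """Parse 'word replacement' lines into a dict.

    Assumption: one rule per line, two whitespace-separated fields,
    as in the examples on this page. Other lines are skipped.
    """
    rules = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # blank line or a shape this sketch does not handle
        word, replacement = parts
        rules[word] = replacement
    return rules


def seam_parts(replacement):
    """Split a replacement at '=' seam markers (hypothetical helper)."""
    return replacement.split("=")


sample = (
    "python pajtn\n"
    "praat pr\u00e1t\n"
    "elektroinstalace elektro=instalace\n"
)
rules = parse_exception_rules(sample)
print(rules["python"])
print(seam_parts(rules["elektroinstalace"]))
```

A check like this can be handy for spotting malformed lines in a large exceptions file before handing it to Prak.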

We would love to detect these seams automatically, but the automated approach we tried produced far more nonsense proposals like "petr=olej" than real seams, so we have left this to humans for now. One area where we did succeed in automating things is (semi)foreign words with "ditini". Prak contains a table of automatically learned rules and gets about 98% of these words right. See prak/prongen/README for additional details.

You can use any text editor to add your rules to exceptions.txt; even Praat's script editor will do (but do not run the file as a Praat script :-) ). Just remember to save the file as UTF-8 in NFC normalization form.
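If you are not sure whether your editor saved the file in NFC, a short Python sketch like the following can check and fix it. It uses only the standard unicodedata module; the helper names and the idea of rewriting the file in place are assumptions for illustration, not part of Prak.

```python
import unicodedata


def to_nfc(text):
    """Return text in Unicode NFC (composed) form."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text):
    """True if text is already in NFC form."""
    return to_nfc(text) == text


def normalize_file(path):
    """Rewrite a UTF-8 file in NFC form; return True if it changed.

    newline="" keeps the file's existing line endings untouched.
    """
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    fixed = to_nfc(text)
    if fixed != text:
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(fixed)
    return fixed != text


# "a" followed by a combining acute accent is the decomposed form of "á".
decomposed = "pra\u0301t"
print(is_nfc(decomposed))
print(to_nfc(decomposed) == "pr\u00e1t")
```

To fix a real file, call normalize_file with its path, for example normalize_file("prak/exceptions.txt"), adjusting the path to wherever your exceptions file lives.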