Documentation for covering grammars #490
Hi, what is a covering grammar? I have a language-to-IPA longest-prefix-match grammar for a number of languages:
It can generate a new grammar from TSV files like the ones you have.
A covering grammar is essentially a listing, for each character, of all the pronunciations it can take on. For this to work with our system, it also has to be at the right level of transcription (broad or narrow) and actually match what's in the Wiktionary data. Maybe you could take a look at what we have here, which overlaps with a few of your languages: https://github.com/CUNY-CL/wikipron/tree/master/data/covering_grammar/tsv and see how they differ, if at all. If they are broadly similar, it might make sense to incorporate your data.
Checking Japanese in your dataset, I see that you have a mapping "あ" -> "a̠" but also "ああ" -> "a̠ː". Consider the source-language word "ああ" and the target-language string "a̠a̠". Is this a valid covering-grammar production in your case?
Yes, it would be in that case. You can think of each pair in the mapping as a substitution; the "grammar" is then simply the closure over the union of all the substitutions. It is a "covering" grammar because we know it's overly permissive, but it's simple enough to be specified as substitution pairs. It's useful for debugging, quality assurance, and the like.
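To make "closure over the union of all the substitutions" concrete, here is a minimal Python sketch, not WikiPron's implementation. It assumes a covering grammar is just a list of (grapheme, pronunciation) pairs; `PAIRS` holds only the two Japanese entries quoted in this thread, and `covering_accepts` is a hypothetical helper name. A (word, pronunciation) pair is accepted if some segmentation of both strings lines up with a sequence of substitutions, so "ああ" accepts both "a̠a̠" and "a̠ː".

```python
from functools import lru_cache

# Hypothetical substitution pairs: only the two Japanese entries
# quoted in this thread; a real covering grammar has many more.
PAIRS = [("あ", "a̠"), ("ああ", "a̠ː")]


def covering_accepts(source: str, target: str, pairs=PAIRS) -> bool:
    """Return True if `target` can be produced from `source` by
    concatenating any sequence of substitution pairs, i.e. if
    (source, target) is in the closure of the union of the pairs."""

    @lru_cache(maxsize=None)
    def accepts(i: int, j: int) -> bool:
        # Both strings fully consumed: a valid segmentation exists.
        if i == len(source) and j == len(target):
            return True
        # Try every pair whose left side matches source[i:] and whose
        # right side matches target[j:]; any successful path counts.
        for src, tgt in pairs:
            if not src:
                continue  # empty left sides would never advance i
            if source.startswith(src, i) and target.startswith(tgt, j):
                if accepts(i + len(src), j + len(tgt)):
                    return True
        return False

    return accepts(0, 0)
```

Because every matching pair is tried, the check is deliberately permissive; that over-generation is exactly what makes it a "covering" grammar rather than a transcriber.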
If any choice like this is valid, then your notion of a covering grammar is different from the one I use (a longest-prefix-match grammar). In my scenario I always take the longest match, so given "ああ", the target "a̠a̠" would not be accepted by the grammar. Since our ideas are distinct, you can't simply copy-paste my grammars and call it a day.
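For contrast, a longest-prefix-match grammar is deterministic. The sketch below is an assumed illustration, not the commenter's actual tool: `MAPPING` again holds only the two Japanese entries from this thread, and `longest_match_transcribe` / `longest_match_accepts` are hypothetical names. At each position it consumes the longest matching key, so "ああ" always yields "a̠ː" and "a̠a̠" is rejected.

```python
# Hypothetical mapping: only the two Japanese entries quoted in this thread.
MAPPING = {"あ": "a̠", "ああ": "a̠ː"}


def longest_match_transcribe(word: str, mapping=MAPPING) -> str:
    """Greedy longest-prefix-match transcription: at each position,
    consume the longest key in `mapping` that matches there."""
    max_len = max(len(k) for k in mapping)
    out = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in mapping:
                out.append(mapping[chunk])
                i += n
                break
        else:
            raise ValueError(f"no mapping for {word[i]!r} at position {i}")
    return "".join(out)


def longest_match_accepts(word: str, target: str, mapping=MAPPING) -> bool:
    """A pair is accepted only if it equals the single greedy output."""
    return longest_match_transcribe(word, mapping) == target
```

Each input has exactly one output here, which is why such a grammar cannot be copied over as a covering grammar without changing what it accepts.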
We have no effective documentation for the covering grammars data library.