Documentation for covering grammars #490

kylebgorman · 2023-03-22T15:14:57Z

We have no effective documentation for the covering grammars data library.

We should probably add a short description to the data README.
We should give the exact instructions in the covering grammars README.

neurlang · 2024-09-28T10:48:37Z

Hi, what is a covering grammar? I have language-to-IPA longest-prefix-match grammars for a number of languages:

English
Spanish
German
French
Italian
Arabic
Farsi
Luxembourgish
Dutch
Portuguese
Russian
Swedish
Czech
Slovak
Romanian
Finnish
Isan
Swahili
Esperanto
Icelandic
Norwegian
Jamaican
Japanese

It can generate a new grammar based on TSV files like the ones that you have.

kylebgorman · 2024-09-28T13:10:42Z

A covering grammar is essentially a listing of, for each character, all the pronunciations it can take on. For this to work with our system it also has to be at the right level (broad or narrow) and actually match what's in the Wiktionary data. Maybe you could take a look at what we have here, which overlaps a few of your languages:

https://github.com/CUNY-CL/wikipron/tree/master/data/covering_grammar/tsv

and see how they differ, if at all. If they are broadly similar it might make sense to incorporate your data.
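One rough way to run that comparison is to load both mapping tables as sets of (grapheme, phoneme) pairs and diff them. This is only a sketch: it assumes each TSV line holds a grapheme string and a phoneme string separated by a tab, the helper names `read_pairs` and `compare` are hypothetical, and the sample rows are invented for illustration rather than taken from either dataset.

```python
import csv
import io


def read_pairs(handle):
    """Read (grapheme, phoneme) pairs from an open TSV handle.

    Assumption: column 1 is the grapheme string and column 2 the phoneme
    string; rows with fewer than two columns are skipped.
    """
    pairs = set()
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) >= 2:
            pairs.add((row[0], row[1]))
    return pairs


def compare(ours, theirs):
    """Split two sets of pairs into shared and one-sided mappings."""
    return {
        "shared": ours & theirs,
        "only_ours": ours - theirs,
        "only_theirs": theirs - ours,
    }


# Invented sample data; real input would come from open(path, encoding="utf-8").
ours = read_pairs(io.StringIO("a\ta\nch\tt͡ʃ\n"))
theirs = read_pairs(io.StringIO("a\ta\nch\tʃ\n"))
report = compare(ours, theirs)
for name, items in report.items():
    print(name, sorted(items))
```

A large "shared" set with small one-sided sets would suggest the two resources are broadly similar; many one-sided pairs may point to a difference in transcription level (broad versus narrow).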

neurlang · 2024-09-28T13:47:37Z

Checking Japanese in your dataset, I see that you have the mapping "あ"->"a̠" but also "ああ"->"a̠ː". Consider the source-language word "ああ" and the target-language word "a̠a̠": is this a valid covering grammar production in your case?

kylebgorman · 2024-09-28T17:11:30Z

Yes, it would be in that case. You can think of all the pairs in the mapping each as a substitution, then the "grammar" is simply the closure over the union of all the substitutions.

It is a "covering" grammar because we know it's overly permissive, but it's simple enough to be specified as substitution pairs. It's useful in debugging, quality assurance, and the like.
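To make the "closure over the union of substitutions" idea concrete, here is a minimal sketch of an acceptance check. It is not the project's implementation: the function name `covers` is hypothetical, the two pairs are the ones quoted in this thread, and matching is done on raw code points, which is a simplification for strings with combining diacritics.

```python
from functools import lru_cache

# The two Japanese pairs mentioned in this thread.
PAIRS = [("あ", "a̠"), ("ああ", "a̠ː")]


def covers(pairs, graphemes, phonemes):
    """Return True if the two strings can be segmented in parallel so that
    every (grapheme chunk, phoneme chunk) is one of the substitution pairs.
    """
    pairs = list(pairs)

    @lru_cache(maxsize=None)
    def match(i, j):
        # Both strings fully consumed: a valid path through the closure.
        if i == len(graphemes) and j == len(phonemes):
            return True
        for g, p in pairs:
            # Skip empty grapheme sides so every step makes progress.
            if g and graphemes.startswith(g, i) and phonemes.startswith(p, j):
                if match(i + len(g), j + len(p)):
                    return True
        return False

    return match(0, 0)


print(covers(PAIRS, "ああ", PAIRS[0][1] * 2))  # two applications of the first pair
print(covers(PAIRS, "ああ", PAIRS[1][1]))      # one application of the second pair
print(covers(PAIRS, "ああ", PAIRS[0][1]))      # one grapheme left unconsumed
```

Under this reading any sequence of pairs that consumes both strings is accepted, which is what makes the grammar overly permissive by design.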

neurlang · 2024-09-29T05:17:22Z

If it's the case that any choice like this is valid, then the idea of covering grammars is different from the one I use (longest prefix match grammar).

In my scenario I always take the longest match, so "ああ" with the target-language word "a̠a̠" wouldn't be accepted by the grammar machine.

Since our ideas are distinct, you can't simply copy-paste my grammars and call it a day.
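For contrast, a longest-prefix-match transducer can be sketched as below. This is an illustration of the general technique, not the actual grammar machine: the name `longest_prefix_transcribe` is hypothetical, and it assumes each grapheme string has exactly one pronunciation.

```python
# The two Japanese pairs mentioned in this thread.
PAIRS = [("あ", "a̠"), ("ああ", "a̠ː")]


def longest_prefix_transcribe(pairs, graphemes):
    """Greedily rewrite the input, always taking the longest matching key.

    Returns the transcription, or None if some position matches no key.
    Assumption: one pronunciation per grapheme string (later duplicates
    in `pairs` would overwrite earlier ones).
    """
    table = dict(pairs)
    max_len = max((len(g) for g in table), default=0)
    out = []
    i = 0
    while i < len(graphemes):
        # Try the longest candidate chunk first, then shorter ones.
        for n in range(min(max_len, len(graphemes) - i), 0, -1):
            chunk = graphemes[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += n
                break
        else:
            return None  # no key matches at position i
    return "".join(out)


result = longest_prefix_transcribe(PAIRS, "ああ")
print(result == PAIRS[1][1])      # the long-vowel reading is produced
print(result == PAIRS[0][1] * 2)  # the two-short-vowel reading is never produced
```

Because the output is deterministic, "ああ" can only come out as the long vowel, so the pair ("ああ", "a̠a̠") is rejected here even though a covering grammar built from the same two pairs would accept it.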

kylebgorman added the "documentation" and "good first issue" labels · Mar 22, 2023

kylebgorman assigned wenzhang0222 · Jul 25, 2023
