A new pattern for translating metadata in tabular data packages #687
Unanswered
augusto-herrmann asked this question in Ideas
Replies: 2 comments · 3 replies
-
I really like this idea @augusto-herrmann!
-
I have a similar problem with my datapackages, and I solved it in another way:

```json
{
  "resources": [{
    "name": "ogd10_energieforschungstatistik_ch.csv",
    "languages": ["de", "fr"],
    [...]
      {
        "name": "finanzquelle",
        "type": "string",
        "format": "default",
        "title@de": "Finanzquelle",
        "title@fr": "Source de financement",
        "description@de": "In- oder ausländische Stelle, welche die Forschung finanziert",
        "description@fr": "Entité nationale ou étrangère qui finance la recherche"
      }
    [...]
}
```

The Data Package Validator on DataHub tells me it's valid, but I am not really confident that I am using the datapackage the way it's meant to be used.
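Reading those per-field keys back is simple enough. A minimal sketch, assuming the field descriptors sit under `schema.fields` as in a standard Table Schema; the file path, fallback language and printed property are just assumptions for the example:

```python
import json

# Path is an assumption for this sketch.
with open("datapackage.json", encoding="utf-8") as f:
    package = json.load(f)

def localized(descriptor: dict, prop: str, lang: str, fallback: str = "de") -> str:
    """Return a property in the requested language, falling back to the
    default-language suffix and then to the plain property."""
    return (
        descriptor.get(f"{prop}@{lang}")
        or descriptor.get(f"{prop}@{fallback}")
        or descriptor.get(prop, "")
    )

resource = package["resources"][0]
for field in resource["schema"]["fields"]:
    print(field["name"], "->", localized(field, "title", "fr"))
```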
-
Frictionless already has a couple of patterns for translating tabular data. The first involves creating additional columns with a language tag at the end, e.g. `name@en`. The other one suggests creating different files in the same path, one CSV file for each language.
Both of these require duplicating a lot of data. Sometimes you have a lot of numeric columns, so it makes no sense to translate the data (which is just numbers, after all), but you still want to translate the metadata (i.e. column titles and descriptions). Other times the data could be translated, but the volume is so great that it isn't feasible in the short term, and you still want to release the untranslated data with translated metadata.
With that in consideration, I would like to suggest a third pattern for translating just the metadata in a tabular data package: create one `datapackage.xx.json` file per language, replacing `xx` with the language tag for that file. The strings in those files should all be in the respective language. All other definitions in the tabular data package (e.g. column type, missing values, restrictions, etc.) should be exactly the same in every file.
We are already using this approach in a data package we publish as open data. A challenge of this pattern is keeping the non-textual information in sync among all of those files. To tackle this challenge, we create the `datapackage.xx.json` files with a Python script that defines all of the data validation properties, while all text strings (titles and descriptions) are read from a separate YAML file (a rough sketch of such a script is included below). If we want to add a new language, we just copy the `datapackage-strings.xx.yaml` file to a new one, translate it, run the Python script and get the new respective `datapackage.xx.json` file. If we want to change any of the data validation properties, we edit the Python script and it regenerates all of the `datapackage.xx.json` files, keeping data validation in sync.
Any comments on alternative ways to translate just the metadata (titles and descriptions) in tabular data packages would be appreciated. If this pattern I describe really turns out to be the best way to do it, perhaps we should add it to the translation support patterns.
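For illustration, here is a minimal sketch of such a generation script. The base descriptor, field names and YAML layout below are placeholder assumptions for the example, not our actual schema; only the `datapackage-strings.xx.yaml` / `datapackage.xx.json` naming follows the pattern described above:

```python
import copy
import glob
import json
import re

import yaml  # PyYAML

# Language-independent definitions: everything except titles and descriptions.
# (The resource and fields below are placeholders, not a real schema.)
BASE_DESCRIPTOR = {
    "name": "example-package",
    "resources": [{
        "name": "data",
        "path": "data.csv",
        "schema": {
            "fields": [
                {"name": "ano", "type": "year"},
                {"name": "valor", "type": "number"},
            ],
            "missingValues": ["", "NA"],
        },
    }],
}

# Expected layout of datapackage-strings.xx.yaml (an assumption for this sketch):
#
#   title: "..."
#   description: "..."
#   fields:
#     ano:
#       title: "..."
#       description: "..."

def build_descriptor(strings: dict) -> dict:
    """Merge translated titles and descriptions into a copy of the base descriptor."""
    descriptor = copy.deepcopy(BASE_DESCRIPTOR)
    descriptor["title"] = strings.get("title", "")
    descriptor["description"] = strings.get("description", "")
    for resource in descriptor["resources"]:
        for field in resource["schema"]["fields"]:
            field.update(strings.get("fields", {}).get(field["name"], {}))
    return descriptor

# Write one datapackage.xx.json for every datapackage-strings.xx.yaml found.
for path in glob.glob("datapackage-strings.*.yaml"):
    lang = re.search(r"datapackage-strings\.(.+)\.yaml", path).group(1)
    with open(path, encoding="utf-8") as f:
        strings = yaml.safe_load(f)
    with open(f"datapackage.{lang}.json", "w", encoding="utf-8") as f:
        json.dump(build_descriptor(strings), f, ensure_ascii=False, indent=2)
```

Since every language file is regenerated from the same base descriptor, the validation rules cannot drift apart between translations.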