[praeparātiō ex automatīs] MVP of idea of automation to pre-process external data to LSF internal file format #42
Okay. Doing some dogfooding with the previous step on #41.
Both cases are what would be considered referential data.

On the issues to convert files: not a problem at all

I think the file converters (and later the pipeline with GitHub Actions or equivalent) are quite feasible. This is not what is taking more time (at least, considering everything already done with the HXLTM and Numerodinatio). However, most of the time it was tested from data already added to HDX, not from scratch (which has far more data to potentially document).

The new challenge: the MAPPINGS to full-blown linked data

The number of identifiers which already have a Wikidata P is quite limited (see https://www.wikidata.org/wiki/Wikidata:Database_reports/List_of_properties/all). The IBGE municipality code is a perfect example. From this post https://dadosabertos.social/t/lista-de-codigos-de-referencia-para-intercambio-de-dados/1138/2, we're starting to get more data which would need to be mapped. In short: things that already have a Wikidata P code are much simpler to deal with, but we need to be smarter about how to make the rest viable. I'm not saying that it is not possible. But at this point, the level of abstraction of the converters (which now means even RDF, but at the same time it should work if the target is a relational database) is such that it would not only allow converting the reference data to pretty much any format out there, but also make automated discovery of the final datasets easier.

(For what already is referential data) moving the logic to a non-code file

The screenshot is very early stage, but to generalize better: since it is "easier" to write software to convert whatever the source was into something in tabular format, the strategy to explain what that tabular format means (think raw CSV headers) is what will take time once the tabular format is done. Since at some point this will take far more thinking than just creating more crawlers, the idea of moving the logic to a YAML configuration makes sense (a hypothetical sketch of what this could look like follows below).
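To make the "logic in a non-code file" idea more concrete, here is a minimal sketch of what such a YAML mapping plus a tiny Python loader could look like. Everything in it (the file layout, the column names, the use of P1585 for the IBGE municipality code, the relabel_header helper) is a hypothetical illustration, not something already implemented in the repository.

```python
# Minimal sketch, assuming PyYAML is available. The crawler would only produce
# a raw tabular file; this separate, non-code YAML explains what each column
# means (Wikidata P, HXL hashtag, ...), so new sources need no new Python code.
import yaml  # PyYAML

MAPPING_YAML = """
# hypothetical mapping for a Brazilian administrative table
columns:
  codigo_municipio:
    wikidata_property: P1585   # assumed: Brazilian municipality code
    hxl: "#adm2 +code"
  nome_municipio:
    hxl: "#adm2 +name"
"""


def relabel_header(raw_header, mapping):
    """Return the HXL hashtag for a raw CSV column, or None if unmapped."""
    column = mapping["columns"].get(raw_header)
    return column["hxl"] if column else None


if __name__ == "__main__":
    mapping = yaml.safe_load(MAPPING_YAML)
    for raw in ("codigo_municipio", "nome_municipio", "unknown_column"):
        print(raw, "->", relabel_header(raw, mapping))
```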
- …belecimentos de saude; teste inicial
- …t better explanation (like relation with places)
- …ines from (until now) fully standalone 999999999_*.py scripts
- …235.py already is able to create the CSV, HXL and HXL+tm
- …dex (use case: add columns to datasets)
This issue is a minimal viable product of one or more "crawlers" or "scripts" or "conversors" that transform external dictionaries (aka the ones we would label origo_per_automata, origin through automation, vs origo_per_amanuenses, origin through amanuenses, the way we are mostly optimized for now) into the working format; see the sketch below.
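To illustrate the overall shape such an origo_per_automata conversor could take, here is a minimal sketch using only the standard library. The sample CSV and the hard-coded header mapping are hypothetical; the actual 999999999_*.py scripts are more generic, and the point of this issue is precisely to move that mapping out of the code.

```python
# Minimal sketch of a conversor: take an external CSV and emit the same table
# with an HXL hashtag row added after the header. Sample data and mapping are
# hypothetical; in practice the mapping would come from configuration.
import csv
import io

EXTERNAL_CSV = """codigo_municipio,nome_municipio
3550308,São Paulo
3304557,Rio de Janeiro
"""

# Raw header -> HXL hashtag (hand-written here only for the example).
HXL_HASHTAGS = {
    "codigo_municipio": "#adm2 +code",
    "nome_municipio": "#adm2 +name",
}


def convert(source_text):
    """Return the table with an extra HXL hashtag row after the header row."""
    rows = list(csv.reader(io.StringIO(source_text)))
    header, data = rows[0], rows[1:]
    hxl_row = [HXL_HASHTAGS.get(name, "") for name in header]
    output = io.StringIO()
    csv.writer(output).writerows([header, hxl_row] + data)
    return output.getvalue()


if __name__ == "__main__":
    print(convert(EXTERNAL_CSV))
```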
Focuses

External examples of types of reference data
- International Federation of Red Cross and Red Crescent Societies | IFRC Data initiatives
- Common Operational Datasets (overview)
- 2007 reference (somewhat outdated), from https://interagencystandingcommittee.org/system/files/legacy_files/Country%20Level%20OCHA%20and%20HIC%20Minimum%20Common%20Operational%20Datasets%20v1.1.pdf
  - Table One: Minimum Common Operational Datasets
  - Table Two: Optional Datasets