MVP of `[1603.45.16]` /"Ontologia"."United Nations"."P"/@eng-Latn #2

fititnt · 2022-01-04T23:58:07Z

Quick links

As part of Dedicated repository to store examples of pre-compiled datasets MVP of [1603:??:1603] /HXL/; focus on pre-compile replacement maps #5
What are P-Codes
- https://wiki.openstreetmap.org/wiki/P-Codes
- https://en.wikipedia.org/wiki/Place_code
Data source
- https://data.humdata.org/dataset?ext_cod=1&res_format=XLSX
  - https://drive.google.com/file/d/1jRshR0Mywd_w8r6W2njUFWv7oDVLgKQi/view?usp=sharing
Related
- HXL-CPLP/forum
  - HXL-CPLP-Dictionarium_Vaccinum: Lexicografia de conceitos usados na troca de dados relacionada a vacinação (inclusive COVID-19) HXL-CPLP/forum#59 (comment)
- augusto-herrmann/transparencia-dados-abertos-brasil: Incluir mapeamento com códigos COD P-Code 26
  - Incluir mapeamento com códigos COD P-Code augusto-herrmann/transparencia-dados-abertos-brasil#26

This issue is about minimal viable product of encode the entire public available P-Codes on numerordinatio. The scripts may need to get some cron job or manual upgrade over time, but this issue is mostly about at least have first version.

Replacing ISO 3661-1 alpha 2 with UN M49

P-codes are prefixed with 2 letter codes, which have advantage of deal with leading zeros. So, for P-Codes, this make sense leading letters, which also allow use pure P-Codes as programming variables. However the numerordinatio works, we can go fully numeric.

`[1603.45.16]` vs `[1603.45.49]`

In theory, [1603.45.16] could be a more specific version of [1603.45.49] (https://unstats.un.org/unsd/methodology/m49/) instead of have own base namespace. This may change later.

Another point is that depending of how numerordinatio would be done, the codes could have aliases.

Changes

[1603.45.15] renamed to [1603.45.16] (US-ASCII alphabet with K makes P as 16, not as 15).

The text was updated successfully, but these errors were encountered:

…ets names

fititnt · 2022-01-05T00:37:08Z

While the https://drive.google.com/file/d/1jRshR0Mywd_w8r6W2njUFWv7oDVLgKQi/view?usp=sharing is not a new version (was from around 8 months ago, so things are likely be more consistent) sheet names already are not as consistent (so is not possible to just pipepile zip output as some files would replace other. However, they is likely they still have patterns.

Since each dataset metadata can (and often would) be upgraded over time, then ingestion of more centralized version would need to be able to normalize more than one format at same time, like "Admin" "Adm" "adm", and the combinations with country prefix.

I will copy here the preview, since the https://github.com/EticaAI/ndata is likely to have history wiped several times to save space.

___.csv

#meta,#meta+archivum,#meta+iso3,#meta+sheets+original,#meta+sheets+new
1603.45.16:,afg.xlsx,afg,afg_adm2 afg_adm1 afg_adm0 ,
1603.45.16:,ago.xlsx,ago,ago_adm3 ago_adm2 ago_adm1 ago_adm0 ,
1603.45.16:,arg.xlsx,arg,arg_adm2 arg_adm1 arg_adm0 ,
1603.45.16:,arm.xlsx,arm,arm_adm2 arm_adm1 arm_adm0 ,
1603.45.16:,aze.xlsx,aze,adm1 adm0 ,
1603.45.16:,bdi.xlsx,bdi,bdi_adm2 bdi_adm1 bdi_adm0 ,
1603.45.16:,ben.xlsx,ben,ben_adm2 ben_adm1 ben_adm0 ,
1603.45.16:,bfa.xlsx,bfa,bfa_adm3 bfa_adm2 bfa_adm1 bfa_adm0 ,
1603.45.16:,bgd.xlsx,bgd,bgd_adm4 bgd_adm3 bgd_adm2 bgd_adm1 bgd_adm0 ,
1603.45.16:,bgr.xlsx,bgr,bgr_adm2 bgr_adm1 bgr_adm0 ,
1603.45.16:,blr.xlsx,blr,blr_adm2 blr_adm1 blr_adm0 ,
1603.45.16:,bmu.xlsx,bmu,bmu_adm2 bmu_adm1 bmu_adm0 ,
1603.45.16:,bol.xlsx,bol,bol_adm3 bol_adm2 bol_adm1 bol_adm0 ,
1603.45.16:,bra.xlsx,bra,adm2 adm1 adm0 ,
1603.45.16:,btn.xlsx,btn,adm2 adm1 adm0 ,
1603.45.16:,caf.xlsx,caf,caf_adm4 caf_adm3 caf_adm2 caf_adm1 caf_adm0 ,
1603.45.16:,chl.xlsx,chl,adm3 adm2 adm1 adm0 ,
1603.45.16:,chn.xlsx,chn,adm2 adm1 adm0 ,
1603.45.16:,civ.xlsx,civ,civ_adm3 civ_adm2 civ_adm1 civ_adm0 ,
1603.45.16:,cmr.xlsx,cmr,cmr_adm3 cmr_adm2 cmr_adm1 cmr_adm0 ,
1603.45.16:,cod.xlsx,cod,cod_adm2 cod_adm1 cod_adm0 ,
1603.45.16:,cog.xlsx,cog,cog_adm2 cog_adm1 cog_adm0 ,
1603.45.16:,col.xlsx,col,col_adm2 col_adm1 col_adm0 ,
1603.45.16:,com.xlsx,com,com_adm3 com_adm2 com_adm1 com_adm0 ,
1603.45.16:,cpv.xlsx,cpv,cpv_adm2 cpv_adm1 cpv_adm0 ,
1603.45.16:,cri.xlsx,cri,adm2 adm1 adm0 ,
1603.45.16:,dji.xlsx,dji,dji_adm2 dji_adm1 dji_adm0 ,
1603.45.16:,dma.xlsx,dma,dma_adm1 dma_adm0 ,
1603.45.16:,dom.xlsx,dom,dom_adm4 dom_adm3 dom_adm2 dom_adm1 dom_adm0 ,
1603.45.16:,dza.xlsx,dza,dza_adm2 dza_adm1 dza_adm0 ,
1603.45.16:,ecu.xlsx,ecu,ecu_adm3 ecu_adm2 ecu_adm1 ecu_adm0 ,
1603.45.16:,egy.xlsx,egy,egy_adm3 egy_adm2 egy_adm1 egy_adm0 ,
1603.45.16:,eri.xlsx,eri,eri_adm2 eri_adm1 eri_adm0 ,
1603.45.16:,eth.xlsx,eth,adm3 adm2 adm1 adm0 ,
1603.45.16:,fji.xlsx,fji,fji_adm3 fji_adm2 fji_adm1 fji_adm0 ,
1603.45.16:,fsm.xlsx,fsm,fsm_adm2 fsm_adm1 fsm_adm0 ,
1603.45.16:,gab.xlsx,gab,gab_adm2 gab_adm1 gab_adm0 ,
1603.45.16:,geo.xlsx,geo,geo_adm2 geo_adm1 geo_adm0 ,
1603.45.16:,gha.xlsx,gha,gha_adm2 gha_adm1 gha_adm0 ,
1603.45.16:,gin.xlsx,gin,gin_adm3 gin_adm2 gin_adm1 gin_adm0 ,
1603.45.16:,gtm.xlsx,gtm,gtm_adm2 gtm_adm1 gtm_adm0 ,
1603.45.16:,guf.xlsx,guf,guf_adm2 guf_adm1 guf_adm0 ,
1603.45.16:,hnd.xlsx,hnd,adm2 adm1 adm0 ,
1603.45.16:,hti.xlsx,hti,hti_adm3 hti_adm2 hti_adm1 hti_adm0 ,
1603.45.16:,idn.xlsx,idn,idn_adm4 idn_adm3 idn_adm2 idn_adm1 idn_adm0 ,
1603.45.16:,irn.xlsx,irn,irn_adm2 irn_adm1 irn_adm0 ,
1603.45.16:,irq.xlsx,irq,irq_adm3 irq_adm2 irq_adm1 irq_adm0 ,
1603.45.16:,kaz.xlsx,kaz,kaz_adm2 kaz_adm1 kaz_adm0 ,
1603.45.16:,ken.xlsx,ken,ken_adm2 ken_adm1 ken_adm0 ,
1603.45.16:,kgz.xlsx,kgz,kgz_adm3 kgz_adm2 kgz_adm1 kgz_adm0 ,
1603.45.16:,khm.xlsx,khm,khm_adm3 khm_adm2 khm_adm1 khm_adm0 ,
1603.45.16:,kir.xlsx,kir,kir_adm2 kir_adm1 kir_adm0 ,
1603.45.16:,lao.xlsx,lao,lao_adm2 lao_adm1 lao_adm0 ,
1603.45.16:,lbn.xlsx,lbn,adm3 adm2 adm1 adm0 ,
1603.45.16:,lbr.xlsx,lbr,lbr_adm2 lbr_adm1 lbr_adm0 ,
1603.45.16:,lby.xlsx,lby,lby_adm2 lby_adm1 lby_adm0 ,
1603.45.16:,lca.xlsx,lca,lca_adm2 lca_adm1 lca_adm0 ,
1603.45.16:,lka.xlsx,lka,lka_adm4 lka_adm3 lka_adm2 lka_adm1 lka_adm0 ,
1603.45.16:,lso.xlsx,lso,lso_adm2 lso_adm1 lso_adm0 ,
1603.45.16:,mar.xlsx,mar,adm2 adm1 adm0 ,
1603.45.16:,mda.xlsx,mda,mda_adm1 mda_adm0 ,
1603.45.16:,mdg.xlsx,mdg,mdg_adm4 mdg_adm3 mdg_adm2 mdg_adm1 mdg_adm0 ,
1603.45.16:,mex.xlsx,mex,adm2 adm1 adm0 ,
1603.45.16:,mkd.xlsx,mkd,mkd_adm4 mkd_adm3 mkd_adm2 mkd_adm1 mkd_adm0 ,
1603.45.16:,mli.xlsx,mli,mli_adm3 mli_adm2 mli_adm1 mli_adm0 ,
1603.45.16:,mng.xlsx,mng,adm2 adm1 adm0 ,
1603.45.16:,moz.xlsx,moz,moz_adm3 moz_adm2 moz_adm1 moz_adm0 ,
1603.45.16:,mrt.xlsx,mrt,adm2 adm1 adm0 ,
1603.45.16:,mtq.xlsx,mtq,mtq_adm2 mtq_adm1 mtq_adm0 ,
1603.45.16:,mus.xlsx,mus,adm1 adm0 ,
1603.45.16:,mwi.xlsx,mwi,mwi_adm3 mwi_adm2 mwi_adm1 mwi_adm0 ,
1603.45.16:,nam.xlsx,nam,nam_adm2 nam_adm1 nam_adm0 ,
1603.45.16:,ner.xlsx,ner,ner_adm3 ner_adm2 ner_adm1 ner_adm0 ,
1603.45.16:,nga.xlsx,nga,nga_adm3 nga_adm2 nga_adm1 nga_adm0 ,
1603.45.16:,nic.xlsx,nic,adm2 adm1 adm0 ,
1603.45.16:,npl.xlsx,npl,adm2 adm1 adm0 ,
1603.45.16:,pak.xlsx,pak,pak_adm3 pak_adm2 pak_adm1 pak_adm0 ,
1603.45.16:,pan.xlsx,pan,adm3 adm2 adm1 adm0 ,
1603.45.16:,per.xlsx,per,adm3 adm2 adm1 adm0 ,
1603.45.16:,phl.xlsx,phl,adm3 adm2 adm1 adm0 ,
1603.45.16:,png.xlsx,png,png_adm3 png_adm2 png_adm1 png_adm0 ,
1603.45.16:,pry.xlsx,pry,adm2 adm1 adm0 ,
1603.45.16:,pse.xlsx,pse,adm2 adm1 adm0 ,
1603.45.16:,rwa.xlsx,rwa,rwa_adm4 rwa_adm3 rwa_adm2 rwa_adm1 rwa_adm0 ,
1603.45.16:,sdn.xlsx,sdn,adm2 adm1 adm0 ,
1603.45.16:,sen.xlsx,sen,sen_adm3 sen_adm2 sen_adm1 sen_adm0 ,
1603.45.16:,slb.xlsx,slb,slb_adm3 slb_adm2 slb_adm1 slb_adm0 ,
1603.45.16:,sle.xlsx,sle,sle_adm4 sle_adm3 sle_adm2 sle_adm1 sle_adm0 ,
1603.45.16:,slv.xlsx,slv,adm2 adm1 adm0 ,
1603.45.16:,som.xlsx,som,som_adm2 som_adm1 som_adm0 ,
1603.45.16:,ssd.xlsx,ssd,adm2 adm1 adm0 ,
1603.45.16:,stp.xlsx,stp,adm2 adm1 adm0 ,
1603.45.16:,swz.xlsx,swz,swz_adm2 swz_adm1 swz_adm0 ,
1603.45.16:,sxm.xlsx,sxm,sxm_adm2 sxm_adm1 sxm_adm0 ,
1603.45.16:,syc.xlsx,syc,adm3 adm2 adm1 adm0 ,
1603.45.16:,syr.xlsx,syr,Admin3 Admin2 Admin1 Admin0 ,
1603.45.16:,tcd.xlsx,tcd,adm3 adm2 adm1 adm0 ,
1603.45.16:,tgo.xlsx,tgo,Admin3 Admin2 Admin1 Admin0 ,
1603.45.16:,tha.xlsx,tha,tha_adm3 tha_adm2 tha_adm1 tha_adm0 ,
1603.45.16:,tls.xlsx,tls,adm3 adm2 adm1 adm0 ,
1603.45.16:,ton.xlsx,ton,ton_adm3 ton_adm2 ton_adm1 ton_adm0 ,
1603.45.16:,tur.xlsx,tur,tur_adm4 tur_adm3 tur_adm2 tur_adm1 tur_adm0 ,
1603.45.16:,tza.xlsx,tza,tza_adm3 tza_adm2 tza_adm1 tza_adm0 ,
1603.45.16:,uga.xlsx,uga,adm4 adm3 adm2 adm1 adm0 ,
1603.45.16:,ukr.xlsx,ukr,ukr_adm4 ukr_adm3 ukr_adm2 ukr_adm1 ukr_adm0 ,
1603.45.16:,ury.xlsx,ury,adm2 adm1 adm0 ,
1603.45.16:,uzb.xlsx,uzb,adm2 adm1 adm0 ,
1603.45.16:,ven.xlsx,ven,ven_adm3 ven_adm2 ven_adm1 ven_adm0 ,
1603.45.16:,vnm.xlsx,vnm,adm2 adm1 adm0 ,
1603.45.16:,vut.xlsx,vut,vut_adm2 vut_adm1 vut_adm0 ,
1603.45.16:,yem.xlsx,yem,yem_adm3 yem_adm2 yem_adm1 yem_adm0 ,
1603.45.16:,zaf.xlsx,zaf,adm4 adm3 adm2 adm1 adm0 ,
1603.45.16:,zmb.xlsx,zmb,adm2 adm1 adm0 ,
1603.45.16:,zwe.xlsx,zwe,zwe_adm3 zwe_adm2 zwe_adm1 zwe_adm0 ,

… CSV

fititnt · 2022-01-05T02:20:31Z

I think a pure POSIX-shell function to make quick-n-dirty conversion from these headings to HXL could could, without need to more complex features.

meta-de-caput.uniq.txt
meta-de-caput.csv
meta-de-archivum.csv

Maybe will not need full table of languages to generate the terms. So worst case scenario they can be hardcoded

… the way

fititnt · 2022-01-05T07:31:34Z

An MVP of the HXLated result already exist.

Notes

I'm not 100% sure about the HXL hashtags for raw headers validOn and validTo. The https://tools.humdata.org/examples/hxl/ do have examples, but for data that uses PCodes, not the PCode tables themselves.

… everything from outside sources

…umeric UN m49 key

…ploy()

…beta()

fititnt · 2022-01-10T10:23:17Z

$ wc -l 1603/45/16/999/1603_45_16_1_15828996298662.hxl.csv 
432262 1603/45/16/999/1603_45_16_1_15828996298662.hxl.csv

m$ ls -lha  999999/1603/45/16/hxl/ | wc -l
408
$ ls -lha  999999/1603/45/16/hxl/*_0* | wc -l
114
$ ls -lha  999999/1603/45/16/hxl/* | wc -l
404
$ ls -lha  999999/1603/45/16/hxl/*0* | wc -l
114
$ ls -lha  999999/1603/45/16/hxl/*1* | wc -l
114
$ ls -lha  999999/1603/45/16/hxl/*2* | wc -l
110
$ ls -lha  999999/1603/45/16/hxl/*3* | wc -l
53
$ ls -lha  999999/1603/45/16/hxl/*4* | wc -l
13
$ ls -lha  999999/1603/45/16/hxl/*5* | wc -l
0

Trivia: do exist at least 432.262 published Place codes worldwide. (from 0 to 4, not attested admin level 5 and 6).

The minimum an non-compressed CSV with every code would be around 13MB. Also, how they are flattened would make difference on the space. But the good thing is we're far lower than GitHub ideal maximum of 50mb (hardlimit is 100MB)

Conventions on how to use UN M49 private namespaces as reference for compiled results

On this topic

from wikipedia
https://en.wikipedia.org/wiki/UN_M49#Private-use_codes_and_reserved_codes
Private-use codes and reserved codes
Beside the codes standardized above, the numeric codes 900 to 999 are reserved for private-use in ISO 3166-1 (under agreement by the UNSD) and in the UN M.49 standard. They may be used for any other groupings or subdivision of countries, territories and regions.

Some of these private-use codes may be found in some UN statistics reports and databases, for their own specific purpose. They are not portable across databases from third parties (except through private agreement), and may be changed without notice.

Note that the code 000 is reserved and not used for defining any region. It is used in absence of data, or for data in which no region (not even the World as a whole) is applicable. For unknown or unencoded regions, private-use codes should preferably be used.

For aggregated datasets related to world places, I believe we should start using private namespaces for it and document the logic. This saves a lot of upfront drama with scripting.

On logic about "population statistics"

I think aggregate population statistics is a different issue, but the sole major reason for the classical 70's UN m49 (https://unstats.un.org/unsd/publication/SeriesM/Series_M49_(1970)_en-fr.pdf) was this type of statistics. Wikipedia says this is not more used, but makes total sense for us here.

However I think population statistics is not a priority. But I know there is more than one datasets (and they are more automated) so at least at adm0 (country level) this would not be hard to automate. But we're already going for more detailed data, at least for countries such as Brazil which we may have additional sources.

Other priorities would be start mapping the P-Codes with Wikidata. Then things are going to be relevant.

…strative Level 0 (country/terrotiries)

…ly numeric

…1603_1 dictionaries

…DATA / HXL_ATTRIBUTES_AD_WIKIDATA mappings draft

…506 to use for adm0~adm6

…DATA mappings improved

…(was HXL_HASHTAGS_AD_WIKIDATA)

… improved

…lementation (based on dictionary) of COD-AB like data to RDF+HXL

…rk the original CSV/HXL/HXLTM exporter also save upper levels, so it make easier for make RDF relationship from the most detailed administrative region availible

…atio_identitas_numerodinatio() started

…pand all ABs

…ries, local only (time: 30m28,338s); before RDF relations

…A_AD_RDF started

…coded list will make it work for common cases at sort term

…around for duplicated items

…9999_54872.py --objectivum-formato=_temp_hxl_meta_in_json

…45_16__item_rdf() draft

…36,514s real 37m35,515s)

fititnt · 2022-06-18T16:41:42Z

TL;DR: the way graph databases works means, in humanitarian jargon, a single 10mb to 200mb RDF file(*) can have the entire country-level data, but instead of Excel or SQL, users could use high level interfaces, such as Protege, to even have semantic reasoning. While some humanitarians too focused would like the AI part of this, from our point of view this actually 1) helps to *allow the numeric codes we use be on local language of the persons who actually work on that at they own country" and 2) **the initial very, very hard work to automate documentation eventually will use reasoning to validate itself"*.

_{(*): however. the "single file" with all relevant to a region does not meant public data should be already such huge file because people would likely have own local data or want different public data. Either users adding each reference data or tools could merge the final result, but sensitive data do not need to leave users network.}

Rationale behind `[1603:16:{unm49}]` prefix

While we could start generating all the works weeks ago (and not going further on RDF SKOS) the after being viable load every dictionary we have to SQLite/PostgreSQL using well formed CSVs (topic #37), my time trying to "organize things" also in graph format (topic #41) made rethink the entire organization.

under graph format, some entry points would have hunger amount of connected data

While there's different ways to partition data, unless we employ pure algorithmic (aka use some way to divide earth in equal blocks) most users will tend to organize or share data by administrative region. Humanitarian sector (at international level) tends to use more high levels, but places like Brazil would go much more specialized by region.

Side comment. Ok. After this, there's also cases were data would be shared by region (for example, coverage area of an hospital), which likely be more dynamic (and so, would need to have the area shared too), but for sake of both national and international interoperability, makes sense to make as easier as possible to everyone at bare minimum have ways to use the closest to standard codes.

One implication of the "way to organize in graph format" is that while something such as [1603:45:16:76:2:3106200] today we're using to represent the Belo Horizonte, both removing "45" is a number shorter and the amount of data attached to the [1603:16:{unm49}] very, very high. Actually, if we ignore translations for places names we can get from Wikidata (places such as Rio de Janeiro can have over 200), pretty much every time someone would use the namespaces with key by P codes, they will share final data, not data about CODs.

On the idea that we might have other [1603:16:{unm49}] where "16" is a different nomenclature

After such simplification of remove the 45, as the entire idea is document/automate data interoperability (so, different from HXL Standard which only have vocabulary for aggregated, we're optimizing for > 90% use cases, e.g. sensitive data that should be processed offline) we will start to have namespaces for things that are not strictly places. For example:

Natural persons
Organizations

However, the way the world is organized, means we would also plan ahead how to partition the data which is not about place at all. Even if we consider Brazil alone, several government instances can have different entry points for same concept (like a person) and even at admin0 ("country" level in case of Brazil) for cases such as COVID-19 vaccines the entry points could up with public data go over 170 millions natural persons (>177.550.128). It's so much data that RDF wouldn't scale well compared to SQLite.

Ok. I'm aware it might sound "lazy" to recommend people to, unless someone else's in their country proposes better numeric namespace, as soon as we later create a base namespace to suggest as reference to store disaggregated data (at least Persons and Organizations) when in doubt, we suggest similar key to what we do to administrative boundaries.

This "lazy" way somewhat also helps to simplify a lot tooling that could allow (like by user additional parameter" to infer that such organization or person have its data coordinated by respectively administrative region. This might not be the case, but at least users have some default suggestion that tools would work perfectly. In any case, it would still be possible to (at country level) different organizations have mappings of what one person or organization code inside them means on another level.

A different "1603" on something like `[1603:16:{unm49}]` might be used for data that is not 100% factual (like anonymized, test data, simulated data, etc)

This makes far more sense when dealing with data for natural persons or organizations, but we might have a totally different global prefix for data that is an entirely different class of data.

In a context of data directly associated with places [1603:16:{unm49}], this would means (at worst case to NOT use for real)

The place country and it's internal divisions are a fake country; this might be relevant on simulations about epidemics we're people would complain if using real regions (less relevant if is country level preparedness; but not if other regions are invited to discuss)
The places are real, but associated data is fully simulated. Since root 1603 would be different, tools would need humans to explicitly ask what data from "production data / reference data" can be applied to this simulated place.

This topic alone would require an open issue here about how we will handle this type of data. However there's some edge cases where what humanitarians use as production data already are "simulations" or statistical inferences based on real data (use case: country person-to-person Census is too old) so even something like persons living in an area might get quite complicated. For example, the same way we would "get data from more specific namespace" at [1603:45:16] (Humanitarian P-Codes) to publish on [1603:16], since we're already preparing to allow ingest data on Tabular and Graph databases, we need somewhat assume users might not like our default choices, so things likely to have more versions (not just statistical Inference about population, which the entire methodology is different) makes sense have different options.

fititnt · 2022-06-29T02:33:08Z

Humm... we will need some documented way to

Encode PCodes by administrative level which would also allow compatibility with RDF.
1. However, there is Wikidata P for UN M49, but not for P-Codes. (see https://www.wikidata.org/wiki/Wikidata:Database_reports/List_of_properties/all)
2. We can still use https://en.wikipedia.org/wiki/Place_code which have the Q code https://www.wikidata.org/wiki/Q7200235 as something like part_of
For #admX+i_nnn+i_Nnnn without +altN, we can start encoding as skos:prefLabel
For #admX+i_nnn+i_Nnnn with +altN, we can start encoding as skos:altLabel, but since we can have several, it would be a good idea merge the fields with some separator

the current drafts already work for HXL / HXLTM, but without this change, it would need extra hardcoded logic. Another issue is that the current use of +altN allow pass validation of not having same fields (which frictionless validate file.csv and databases complain) but "is not semantic".

Edit:

a reason to encode PCodes by administrative level is for performance reasons when on graph databases. with over 450.000 P-Codes, if everyone become directly linked to a single upper concept, it would be not scale. Even dumping a Turtle file, it would get those > 450.000 potentially on a single line. Not good.
We could still "merge" (maybe with RML) some way to allow a global search without administrative level, but at least this issue would be restricted to less queries. If the user or api generating the query at least know either the level OR that the prefix means a specific country, it could rewrite the query to much, much less search space. Either by administrative level OR prefix already would allow reduce drastically the performance issues (which makes sense only for cases where entire world is loaded, not really a issue for regional users)

…refs #2 (comment))

…databases

…local numeric identifiers (brute force creation of IDs based on P-Codes may fail since some places have letters in the middle of P-Codes)

… local numeric identifiers (brute force creation of IDs based on P-Codes may fail since some places have letters in the middle of P-Codes)

…evel and is_zxxx added

fititnt changed the title Jan 4, 2022

fititnt referenced this issue in EticaAI/lsf-cache Jan 5, 2022

EticaAI/numerordinatio#6: 1603.45.16.sh

c6dd5a4

fititnt referenced this issue in EticaAI/lsf-cache Jan 5, 2022

EticaAI/numerordinatio#6: 1603.45.16.sh; csv with general idea of she…

1a59de5

…ets names

fititnt referenced this issue in EticaAI/lsf-cache Jan 5, 2022

EticaAI/numerordinatio#6: 1603.45.16.sh; P-Codes already extracted as…

7c82dd0

… CSV

fititnt referenced this issue in EticaAI/lsf-cache Jan 5, 2022

EticaAI/numerordinatio#6: 999999/1603/45/16/csv data files

6bcffc7

fititnt referenced this issue in EticaAI/lsf-cache Jan 5, 2022

EticaAI/numerordinatio#6: 999999/1603/45/16/csv; more medatada

ff2abca

fititnt referenced this issue in EticaAI/lsf-cache Jan 5, 2022

EticaAI/numerordinatio#6: un_pcode_rawheader_admin_level(); regex all…

a786382

… the way

fititnt referenced this issue in EticaAI/lsf-cache Jan 5, 2022

EticaAI/numerordinatio#6: MVP of HXLHashtag generation

76b8285

fititnt referenced this issue in EticaAI/lsf-cache Jan 5, 2022

EticaAI/numerordinatio#6: 1603.45.16 uppercase ISO 3661p1

87019dd

fititnt referenced this issue in EticaAI/lsf-cache Jan 5, 2022

EticaAI/numerordinatio#6: 1603.45.16 HXLated CSVs

dd38be6

fititnt referenced this issue in EticaAI/lsf-cache Jan 5, 2022

EticaAI/numerordinatio#6: ./999999999/0.sh The happy path to initialy…

cada00a

… everything from outside sources

fititnt referenced this issue in EticaAI/lsf-cache Jan 6, 2022

EticaAI/numerordinatio#6: NOW were starting to be be able PCodes as n…

b74cbcc

…umeric UN m49 key

fititnt referenced this issue in EticaAI/lsf-cache Jan 6, 2022

EticaAI/numerordinatio#6: bootstrap_999999_1603_45_16_metadata_pre_de…

dca4356

…ploy()

fititnt referenced this issue in EticaAI/lsf-cache Jan 6, 2022

EticaAI/numerordinatio#6: deploy_1603_45_16_prepare_directories()

8aaf230

fititnt referenced this issue in EticaAI/lsf-cache Jan 6, 2022

EticaAI/numerordinatio#6: numerordinatio_translatio_alpha_in_digito__…

fdc585d

…beta()

fititnt referenced this issue in EticaAI/lsf-cache Jan 6, 2022

EticaAI/numerordinatio#6: ./999999999/1603.47.639.3.sh reorganization

81d6316

fititnt transferred this issue from EticaAI/numerordinatio Jan 10, 2022

fititnt added a commit that referenced this issue Jan 10, 2022

1603:45:16 #2: draft of compiled globa codes

3b87c2b

fititnt added a commit that referenced this issue Jan 10, 2022

1603:3:12 #2: started MVP of read-only access to Wikidata

fb326e7

fititnt added a commit that referenced this issue Jan 10, 2022

1603:3:12 #2: download Wikipedia/Wikidata languages maping and Admini…

75264ef

…strative Level 0 (country/terrotiries)

fititnt added a commit that referenced this issue Jan 19, 2022

1603:45:16 (#2): improved documentation on P-Codes semantics when ful…

2c20a66

…ly numeric

fititnt added the dictionaria-specificis dictiōnāria specificīs; /specific group of dictionaries/@eng-Latn label Feb 4, 2022

fititnt mentioned this issue Apr 21, 2022

[1603:??] Geographia (create base numerospace) #31

Open

fititnt mentioned this issue Apr 29, 2022

New data warehouse strategy [tabular]: SQL database populated with dictionaries data (experimental feature) #37

Open

fititnt mentioned this issue May 8, 2022

Organization strategy to deal with numeric namespace of dictionaries which are handled at administrative level #39

Open

fititnt added a commit that referenced this issue May 15, 2022

1603_45_49 (#2): The 1603_45_49 now become git tracked like selected …

c8100b1

…1603_1 dictionaries

fititnt added a commit that referenced this issue Jun 10, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): HXL_HASHTAGS_AD_WIKI…

7a5ea50

…DATA / HXL_ATTRIBUTES_AD_WIKIDATA mappings draft

fititnt added a commit that referenced this issue Jun 10, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): convention of s500-s…

ea8226a

…506 to use for adm0~adm6

fititnt added a commit that referenced this issue Jun 11, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): HXL_HASHTAGS_AD_WIKI…

abe808e

…DATA mappings improved

fititnt added a commit that referenced this issue Jun 11, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): HXL_HASHTAGS_AD_RDF …

678e753

…(was HXL_HASHTAGS_AD_WIKIDATA)

fititnt added a commit that referenced this issue Jun 11, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): HXLHashtagSimplici()…

1122fda

… improved

fititnt added a commit that referenced this issue Jun 11, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): not yet finished imp…

e4f9eb6

…lementation (based on dictionary) of COD-AB like data to RDF+HXL

fititnt added a commit that referenced this issue Jun 12, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): CodAbTabulae.praepar…

8ea1066

…atio_identitas_numerodinatio() started

fititnt added a commit that referenced this issue Jun 12, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): HXLTM and No1 now ex…

cfc8bfe

…pand all ABs

fititnt added a commit that referenced this issue Jun 12, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): full drill 140 count…

731b579

…ries, local only (time: 30m28,338s); before RDF relations

fititnt added a commit that referenced this issue Jun 12, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): HXL_HASH_ET_ATTRIBUT…

a4a3508

…A_AD_RDF started

fititnt added a commit that referenced this issue Jun 12, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): not perfect but hard…

21a9f6a

…coded list will make it work for common cases at sort term

fititnt added a commit that referenced this issue Jun 12, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): CodAbTabulae(); work…

4e00998

…around for duplicated items

fititnt added a commit that referenced this issue Jun 12, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): added draft of 99999…

ff6d2e8

…9999_54872.py --objectivum-formato=_temp_hxl_meta_in_json

fititnt added a commit that referenced this issue Jun 13, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): bash bootstrap_1603_…

1877474

…45_16__item_rdf() draft

fititnt added a commit that referenced this issue Jun 13, 2022

rdf+bcp47+hxl (#41), admin-l (#39), pcodes (#2): full drill (user 53m…

c0c954c

…36,514s real 37m35,515s)

fititnt added a commit that referenced this issue Jun 29, 2022

999999999_7200235.py (#2): +v_pcode attempt to better encode w/ RDF (…

ca86756

…refs #2 (comment))

fititnt added a commit that referenced this issue Jun 29, 2022

999999999_7200235.py (#2): #country+code+v_iso3166p1a2

092d259

fititnt added a commit that referenced this issue Jun 29, 2022

999999999_7200235.py (#2): trying compress even more headings to fit …

1430a33

…databases

fititnt added a commit that referenced this issue Jun 29, 2022

999999999_54872.py (#2): numerordinatio_data__hxltm_to_bcp47()

28045bc

fititnt added a commit that referenced this issue Jun 30, 2022

1603_1.py (#2): draft; also generate datapackage/csvw for bcp47

cc1c8af

fititnt added a commit that referenced this issue Jun 30, 2022

1603_1.py (#2): draft; more ajusts of datapackage/csvw for bcp47

81c03b1

fititnt added a commit that referenced this issue Jun 30, 2022

999999999_54872.py (#2): bugfixes

369ca99

fititnt added a commit that referenced this issue Jun 30, 2022

(#2): doing another world run (now with bcp47 + RDF+Turtle validation)

bab3041

fititnt added a commit that referenced this issue Jul 12, 2022

1603_45_16.lib.sh (#2): 104 MMR (Myanmar), skip validation of unique …

7f4cdff

…local numeric identifiers (brute force creation of IDs based on P-Codes may fail since some places have letters in the middle of P-Codes)

fititnt added a commit that referenced this issue Jul 12, 2022

1603_45_16.lib.sh (#2): 834 TZA (Tanzania), skip validation of unique…

0c06957

… local numeric identifiers (brute force creation of IDs based on P-Codes may fail since some places have letters in the middle of P-Codes)

fititnt added a commit that referenced this issue Jul 23, 2022

#2, #31, #39, #45: 1603/16/1/0/1603_16_1_0.no1.tm.hxl.csv now core table

83f2d5f

fititnt added a commit that referenced this issue Jul 24, 2022

#2, #31, #39, #45: 1603/16/1/0/1603_16_1_0.no1.tm.hxl.csv ix_zzcodabl…

904fca9

…evel and is_zxxx added

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MVP of `[1603.45.16]` /"Ontologia"."United Nations"."P"/@eng-Latn #2

MVP of `[1603.45.16]` /"Ontologia"."United Nations"."P"/@eng-Latn #2

fititnt commented Jan 4, 2022

fititnt commented Jan 5, 2022

fititnt commented Jan 5, 2022

fititnt commented Jan 5, 2022

fititnt commented Jan 10, 2022

fititnt commented Jun 18, 2022

fititnt commented Jun 29, 2022 •

edited

Loading

MVP of [1603.45.16] /"Ontologia"."United Nations"."P"/@eng-Latn #2

MVP of [1603.45.16] /"Ontologia"."United Nations"."P"/@eng-Latn #2

Comments

fititnt commented Jan 4, 2022

Replacing ISO 3661-1 alpha 2 with UN M49

[1603.45.16] vs [1603.45.49]

fititnt commented Jan 5, 2022

fititnt commented Jan 5, 2022

fititnt commented Jan 5, 2022

Notes

fititnt commented Jan 10, 2022

Conventions on how to use UN M49 private namespaces as reference for compiled results

On logic about "population statistics"

fititnt commented Jun 18, 2022

Rationale behind [1603:16:{unm49}] prefix

under graph format, some entry points would have hunger amount of connected data

On the idea that we might have other [1603:16:{unm49}] where "16" is a different nomenclature

A different "1603" on something like [1603:16:{unm49}] might be used for data that is not 100% factual (like anonymized, test data, simulated data, etc)

fititnt commented Jun 29, 2022 • edited Loading

MVP of `[1603.45.16]` /"Ontologia"."United Nations"."P"/@eng-Latn #2

MVP of `[1603.45.16]` /"Ontologia"."United Nations"."P"/@eng-Latn #2

`[1603.45.16]` vs `[1603.45.49]`

Rationale behind `[1603:16:{unm49}]` prefix

A different "1603" on something like `[1603:16:{unm49}]` might be used for data that is not 100% factual (like anonymized, test data, simulated data, etc)

fititnt commented Jun 29, 2022 •

edited

Loading