MVP of read access to Wikidata #3

fititnt · 2022-01-10T02:53:30Z

This issue is about a Minimum Viable Product (MVP) with read-only access to Wikidata. One of the main advantages is that its content is already in the public domain, so this would allow generating external datasets for some vocabularies even while the original copyright holders still need a long formal process to allow any type of re-publishable license.

Trivia: Wikidata actually allows extracting label translations for terms related to Wikipedia articles, and its content is explicitly public domain. This makes it very relevant to take care to keep consistent mappings between our codes and Wikidata Q codes.

fititnt · 2022-01-22T00:01:50Z

Fantastic!

We're using the curated table from #9 to know the language mappings. Wikipedia has over 300 languages (some obviously have more content than others), but given the way we use the data, it is not viable to avoid building such intermediate tables.

The current version uses only some hardcoded Q codes (so not all the ones compiled previously from https://docs.google.com/spreadsheets/d/1ih3ouvx_n8W5ntNcYBqoyZ2NRMdaA0LRg5F9mGriZm4/edit#gid=1894917893). But it already gives an idea:

fititnt@bravo:/workspace/git/EticaAI/multilingual-lexicography-automation/officinam$ ./999999999/0/1603_3_12.py --actionem-sparql

SELECT ?item ?item__rem__i_ara__is_arab ?item__rem__i_ben__is_beng ?item__rem__i_grc__is_grek ?item__rem__i_lat__is_latn ?item__rem__i_rus__is_cyrl ?item__rem__i_san__is_zzzz ?item__rem__i_por__is_latn ?item__rem__i_eng__is_latn ?item__rem__i_fra__is_latn ?item__rem__i_nld__is_latn ?item__rem__i_deu__is_latn ?item__rem__i_spa__is_latn ?item__rem__i_ita__is_latn ?item__rem__i_gle__is_latn
WHERE
{
  VALUES ?item { wd:Q1065 wd:Q82151 wd:Q125761 wd:Q7809 wd:Q386120 wd:Q61923 wd:Q7164 }
  OPTIONAL { ?item rdfs:label ?item__rem__i_ara__is_arab filter (lang(?item__rem__i_ara__is_arab) = "ar"). }
  OPTIONAL { ?item rdfs:label ?item__rem__i_ben__is_beng filter (lang(?item__rem__i_ben__is_beng) = "bn"). }
  OPTIONAL { ?item rdfs:label ?item__rem__i_grc__is_grek filter (lang(?item__rem__i_grc__is_grek) = "grc"). }
  OPTIONAL { ?item rdfs:label ?item__rem__i_lat__is_latn filter (lang(?item__rem__i_lat__is_latn) = "la"). }
  OPTIONAL { ?item rdfs:label ?item__rem__i_rus__is_cyrl filter (lang(?item__rem__i_rus__is_cyrl) = "ru"). }
  OPTIONAL { ?item rdfs:label ?item__rem__i_san__is_zzzz filter (lang(?item__rem__i_san__is_zzzz) = "sa"). }
  OPTIONAL { ?item rdfs:label ?item__rem__i_por__is_latn filter (lang(?item__rem__i_por__is_latn) = "pt"). }
  OPTIONAL { ?item rdfs:label ?item__rem__i_eng__is_latn filter (lang(?item__rem__i_eng__is_latn) = "en"). }
  OPTIONAL { ?item rdfs:label ?item__rem__i_fra__is_latn filter (lang(?item__rem__i_fra__is_latn) = "fr"). }
  OPTIONAL { ?item rdfs:label ?item__rem__i_nld__is_latn filter (lang(?item__rem__i_nld__is_latn) = "nl"). }
  OPTIONAL { ?item rdfs:label ?item__rem__i_deu__is_latn filter (lang(?item__rem__i_deu__is_latn) = "de"). }
  OPTIONAL { ?item rdfs:label ?item__rem__i_spa__is_latn filter (lang(?item__rem__i_spa__is_latn) = "es"). }
  OPTIONAL { ?item rdfs:label ?item__rem__i_ita__is_latn filter (lang(?item__rem__i_ita__is_latn) = "it"). }
  OPTIONAL { ?item rdfs:label ?item__rem__i_gle__is_latn filter (lang(?item__rem__i_gle__is_latn) = "ga"). }
}
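
The query above follows a regular pattern: one `VALUES` list of Q codes plus one `OPTIONAL` label lookup per language. A minimal sketch of how such a query could be generated is shown below. This is not the code of `1603_3_12.py`; the names `LANGUAGES`, `build_query` and `build_csv_request` are hypothetical, the mapping is a small subset copied from the query above, and the endpoint URL and `Accept: text/csv` header are assumptions about the public Wikidata Query Service.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical subset of the language mapping: column suffix as used in
# this thread -> language code used by Wikidata labels.
LANGUAGES = {
    "i_ara__is_arab": "ar",
    "i_lat__is_latn": "la",
    "i_por__is_latn": "pt",
}


def build_query(qids, languages):
    """Return a SPARQL query with one OPTIONAL label lookup per language."""
    variables = [f"?item__rem__{suffix}" for suffix in languages]
    lines = [
        "SELECT ?item " + " ".join(variables),
        "WHERE",
        "{",
        "  VALUES ?item { " + " ".join(f"wd:{qid}" for qid in qids) + " }",
    ]
    for suffix, code in languages.items():
        var = f"?item__rem__{suffix}"
        lines.append(
            f'  OPTIONAL {{ ?item rdfs:label {var} '
            f'filter (lang({var}) = "{code}"). }}'
        )
    lines.append("}")
    return "\n".join(lines)


def build_csv_request(query):
    """Prepare (but do not send) a GET request that asks for CSV output.

    Assumption: the public endpoint and the text/csv content type.
    """
    url = "https://query.wikidata.org/sparql?" + urlencode({"query": query})
    headers = {"Accept": "text/csv", "User-Agent": "example-sketch/0.1"}
    return Request(url, headers=headers)


query = build_query(["Q1065", "Q82151"], LANGUAGES)
print(query)
```

Passing the prepared request to `urllib.request.urlopen` would run the query. Keeping the mapping in a plain dictionary means adding a language is a one-line change, which is the role the curated table from #9 plays in the thread.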

fititnt · 2022-02-04T04:46:31Z

For a Minimum Viable Product, the read access to Wikidata is already working nicely. This issue can be closed.

fititnt mentioned this issue Jan 11, 2022

MVP of [1603:??:1603] /HXL/; focus on pre-compile replacement maps #5

Open

fititnt added a commit that referenced this issue Jan 21, 2022

Wikidata (#3): starting (again) strategy to get translations (now by previously crafted list of QIDs)

d5db292

fititnt added a commit that referenced this issue Jan 21, 2022

Wikidata (#3): not there... yet

78add70

fititnt added a commit that referenced this issue Jan 21, 2022

Wikidata (#3): 999999999/0/1603_3_12.py started

bc6f8f9

fititnt added a commit that referenced this issue Jan 21, 2022

Wikidata (#3), 1603:1:51 (#9): download scripts now allow select only a dataset

e881ec4

fititnt added a commit that referenced this issue Jan 21, 2022

Wikidata (#3), 1603:1:51 (#9): namespace for HXL and CSV-like equivalents of language codes from other sources

0c8975d

fititnt added a commit that referenced this issue Jan 21, 2022

Wikidata (#3), 1603:1:51 (#9): great! Now we have a (not 100% automated yet) query generation to extract translations

ec86661

fititnt added a commit that referenced this issue Jan 21, 2022

Wikidata (#3), 1603:1:51 (#9): query somewhat automated; need optimizations

2faf531

fititnt added a commit that referenced this issue Jan 22, 2022

Wikidata (#3), 1603:1:51 (#9): 1603_3_12.py command line both allow generate raw SPARQL query and generate CSV/TSV directly

afd9ffc

fititnt added a commit that referenced this issue Jan 22, 2022

Wikidata (#3), 1603:1:51 (#9): 1603/45/1/1603_45_1.wikiq.tm.csv

7f7f309

fititnt added a commit that referenced this issue Jan 22, 2022

1603:3:12 (#3): Q codes extraction (used to get translations) now a bit more flexible; still need solve cases where multiple columns have Q codes

40c9ce0

fititnt added a commit that referenced this issue Jan 30, 2022

#3, #9: MVP of merging Wikidata translations on Numerordinatio dictionaries

9e43101

fititnt added the reconciliatio-erga-verba reconciliātiō ergā verba; Lit. /reconciliation with respect to words/@eng-Latn; Term reconciliation label Feb 4, 2022

fititnt closed this as completed Feb 4, 2022

fititnt mentioned this issue May 16, 2022

[praeparātiō ex automatīs] MVP of idea of automation to pre-process external data to LSF internal file format #42

Open
