MVP of read access to Wikidata #3
…ents of language codes from other sources
…ed yet) query generation to extract translations
Fantastic! We're using the curated table from #9 to know the language mappings. Wikipedia has over 300 languages (some obviously have more content than others), but given the way we use the data, it is not viable to skip building such intermediate tables. The current version uses only some hardcoded Q codes (so not all the ones compiled previously from https://docs.google.com/spreadsheets/d/1ih3ouvx_n8W5ntNcYBqoyZ2NRMdaA0LRg5F9mGriZm4/edit#gid=1894917893), but it already gives an idea:
fititnt@bravo:/workspace/git/EticaAI/multilingual-lexicography-automation/officinam$ ./999999999/0/1603_3_12.py --actionem-sparql
SELECT ?item ?item__rem__i_ara__is_arab ?item__rem__i_ben__is_beng ?item__rem__i_grc__is_grek ?item__rem__i_lat__is_latn ?item__rem__i_rus__is_cyrl ?item__rem__i_san__is_zzzz ?item__rem__i_por__is_latn ?item__rem__i_eng__is_latn ?item__rem__i_fra__is_latn ?item__rem__i_nld__is_latn ?item__rem__i_deu__is_latn ?item__rem__i_spa__is_latn ?item__rem__i_ita__is_latn ?item__rem__i_gle__is_latn
WHERE
{
VALUES ?item { wd:Q1065 wd:Q82151 wd:Q125761 wd:Q7809 wd:Q386120 wd:Q61923 wd:Q7164 }
OPTIONAL { ?item rdfs:label ?item__rem__i_ara__is_arab filter (lang(?item__rem__i_ara__is_arab) = "ar"). }
OPTIONAL { ?item rdfs:label ?item__rem__i_ben__is_beng filter (lang(?item__rem__i_ben__is_beng) = "bn"). }
OPTIONAL { ?item rdfs:label ?item__rem__i_grc__is_grek filter (lang(?item__rem__i_grc__is_grek) = "grc"). }
OPTIONAL { ?item rdfs:label ?item__rem__i_lat__is_latn filter (lang(?item__rem__i_lat__is_latn) = "la"). }
OPTIONAL { ?item rdfs:label ?item__rem__i_rus__is_cyrl filter (lang(?item__rem__i_rus__is_cyrl) = "ru"). }
OPTIONAL { ?item rdfs:label ?item__rem__i_san__is_zzzz filter (lang(?item__rem__i_san__is_zzzz) = "sa"). }
OPTIONAL { ?item rdfs:label ?item__rem__i_por__is_latn filter (lang(?item__rem__i_por__is_latn) = "pt"). }
OPTIONAL { ?item rdfs:label ?item__rem__i_eng__is_latn filter (lang(?item__rem__i_eng__is_latn) = "en"). }
OPTIONAL { ?item rdfs:label ?item__rem__i_fra__is_latn filter (lang(?item__rem__i_fra__is_latn) = "fr"). }
OPTIONAL { ?item rdfs:label ?item__rem__i_nld__is_latn filter (lang(?item__rem__i_nld__is_latn) = "nl"). }
OPTIONAL { ?item rdfs:label ?item__rem__i_deu__is_latn filter (lang(?item__rem__i_deu__is_latn) = "de"). }
OPTIONAL { ?item rdfs:label ?item__rem__i_spa__is_latn filter (lang(?item__rem__i_spa__is_latn) = "es"). }
OPTIONAL { ?item rdfs:label ?item__rem__i_ita__is_latn filter (lang(?item__rem__i_ita__is_latn) = "it"). }
OPTIONAL { ?item rdfs:label ?item__rem__i_gle__is_latn filter (lang(?item__rem__i_gle__is_latn) = "ga"). }
}
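The query above follows a regular pattern: one OPTIONAL clause per language, with the column name derived from the language table. As a rough illustration only (this is not the actual code of 1603_3_12.py), a generator could look like the sketch below. The `LANGUAGES` dictionary and the `build_label_query` function are hypothetical names, and the three entries are just a sample of the pairs visible in the output above.

```python
from typing import Iterable, Mapping

# Hypothetical sample of the language table: column suffix -> Wikidata language code.
# In the real project this mapping would come from the curated table, not be hardcoded.
LANGUAGES = {
    "i_ara__is_arab": "ar",
    "i_lat__is_latn": "la",
    "i_por__is_latn": "pt",
}


def build_label_query(qcodes: Iterable[str], languages: Mapping[str, str]) -> str:
    """Build a SPARQL query returning one label column per language."""
    variables = [f"?item__rem__{suffix}" for suffix in languages]
    lines = [
        "SELECT ?item " + " ".join(variables),
        "WHERE",
        "{",
        "  VALUES ?item { " + " ".join(f"wd:{q}" for q in qcodes) + " }",
    ]
    for suffix, lang in languages.items():
        var = f"?item__rem__{suffix}"
        # OPTIONAL keeps the row even when the item has no label in this language.
        lines.append(
            f'  OPTIONAL {{ ?item rdfs:label {var} filter (lang({var}) = "{lang}"). }}'
        )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_label_query(["Q1065", "Q82151"], LANGUAGES))
```

Adding a language then only means adding one row to the mapping, which is the reason the intermediate table matters.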
…enerate raw SPARQL query and generate CSV/TSV directly
…it more flexible; still need solve cases where multiple columns have Q codes
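For reference, a raw SPARQL query can be turned into CSV or TSV by asking the Wikidata Query Service for that format through the `Accept` header. The sketch below is an assumption about how this could be done with only the Python standard library, not the project's actual implementation. The function names and the User-Agent string are hypothetical.

```python
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Media types requested for each tabular output format.
ACCEPT_BY_FORMAT = {
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
}


def build_request(query: str, fmt: str = "csv") -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the given SPARQL query."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": ACCEPT_BY_FORMAT[fmt],
        # Hypothetical value; replace it with a descriptive agent and contact.
        "User-Agent": "example-wikidata-reader/0.1",
    }
    return urllib.request.Request(url, headers=headers)


def fetch_table(query: str, fmt: str = "csv") -> str:
    """Send the query and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_request(query, fmt), timeout=60) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_table(query, "tsv")` with the generated query would return the table as text, ready to be written to a file.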
For a Minimum Viable Product, read access to Wikidata is already working nicely. This issue can be closed.
This issue is about a Minimum Viable Product with read-only access to Wikidata. One of the main advantages is that its content is already in the public domain, so this would allow generating external datasets for some vocabularies even when the original copyright holders still need a long formal process before allowing any type of re-publishable license.
Trivia: Wikidata actually allows extraction of label translations from Wikipedia's related terms, and it is explicitly public domain. This makes any care taken to keep very consistent mappings between our codes and Wikidata Q codes very relevant.