RDF IRI(-generation) & Data Location/Storage #4

Open
hoijui opened this issue Dec 13, 2023 · 1 comment

@hoijui
Collaborator

hoijui commented Dec 13, 2023

Current Data Aggregation Process

  1. Projects are found on different platforms by different means.
  2. Their meta-data is extracted, either by:
    1. copying the okh.toml file out of their storage/repo, or
    2. assembling an okh.toml file using the hosting platform's API.
  3. The TOML data gets converted to RDF.

The problem

The idea of Linked Data, and very much ours for OKH too, is to support a distributed data system.
Furthermore, all RDF data - more specifically each subject - is uniquely identifiable by its IRI.
An IRI is essentially a URI (in practice, usually a URL) that may contain Unicode characters.
This pushes two requirements onto us:

  1. It is strongly recommended - and we should ensure this is the case -
    that a subject is available (dereferenceable) under its IRI.
  2. If we generate the RDF on our server (be it centralized or decentralized)
    and make it publicly available, we necessarily have to do so under a domain
    that we control and that the original project does not control.
    That domain thus becomes an essential part of the RDF's IRI.
    This means the data would no longer be distributed.
    It also means that each data collector would host each project's RDF
    under their own URL, using that URL as the IRI.
    We would end up with the same project/data available under different IRIs,
    which are supposed to be unique identifiers; in other words, duplicates.
    The requirement is therefore that a project's IRI must not depend on
    who collected or hosts its data.
    -> very bad!
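
The duplication problem can be shown in a few lines. This is only a sketch of the problematic scheme, assuming each collector mints IRIs under its own domain; the collector domains, the project URL, and the `mint_iri` helper are all hypothetical.

```python
from urllib.parse import quote


def mint_iri(aggregator_base: str, project_source_url: str) -> str:
    """Mint an IRI under the aggregator's own domain (the problematic scheme)."""
    encoded = quote(project_source_url, safe="")
    return f"{aggregator_base.rstrip('/')}/projects/{encoded}"


# One and the same (hypothetical) project, crawled by two independent collectors.
source = "https://github.com/example-org/example-device"
iri_a = mint_iri("https://collector-a.example", source)
iri_b = mint_iri("https://collector-b.example", source)

# Both IRIs are dereferenceable by their collector, but they differ,
# so an RDF consumer sees two distinct subjects for one project.
print(iri_a)
print(iri_b)
print(iri_a == iri_b)
```

Unless someone additionally publishes `owl:sameAs`-style links between the two, nothing in the data tells a consumer that both subjects describe the same project.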

We could choose to do one of two things:

  1. use an IRI on a domain under the control of the original project
    (e.g. its GitHub Pages URL or a perma-URL they registered for this purpose),
    but actually host the RDF under our own domain, violating the first requirement above, or
  2. host it on our domain and use that actual hosting location as its IRI,
    which satisfies the first requirement above, but violates the second.

In theory, there is a third option:
Each project generates its RDF itself in a CI job and then hosts it permanently (at least each release version, plus the latest development one). That, though, is very unlikely to happen, unstable, difficult to maintain and update, and only possible for git-hosted (or other SCM-hosted) projects.
-> not really an option.

@hoijui hoijui self-assigned this Dec 13, 2023
@hoijui hoijui added the idea status may be bullshit, may be the next top feature label Dec 13, 2023
@hoijui
Collaborator Author

hoijui commented Dec 17, 2023

DING, DING, DING, DING, ...

:O
Now, writing the above, I got an idea!
There is actually a fourth option:
We could use an approach similar to W3ID's to host the data.
There would be one git repo (or optionally a few, for redundancy) that contains/hosts all the RDF data.
Multiple parties that aggregate the data have push access to it and push to it regularly, in an automated fashion, whenever they crawl/generate the data.
This means both data-gatherers and individual projects could push data.
This allows for somewhat distributed-ish, or at the very least decentralized/federated, control over the RDF data,
and as a hugely beneficial side effect, it would allow the data-gathering load to be distributed efficiently.
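
For the shared repo to avoid duplicates, every party pushing to it would have to arrive at the same file (and thus the same IRI) for the same project. One way this might work, purely as a sketch: derive the location deterministically from the project's source URL. The shared base URL, the normalization rules, and the path layout below are assumptions of mine, not something decided in this issue.

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical shared domain under which the common git repo is published.
SHARED_BASE = "https://data.example.org/okh/"


def canonical_source(url: str) -> str:
    """Normalize a project source URL so trivial variants map to one key."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{host}{path}"


def shared_location(source_url: str) -> tuple[str, str]:
    """Return (path inside the shared repo, public IRI) for a project."""
    digest = hashlib.sha256(canonical_source(source_url).encode("utf-8")).hexdigest()
    key = digest[:16]
    repo_path = f"projects/{key[:2]}/{key}.ttl"
    return repo_path, SHARED_BASE + repo_path


# Two collectors that found the same (hypothetical) project via different URL variants.
loc_a = shared_location("https://github.com/Example-Org/device.git")
loc_b = shared_location("https://GitHub.com/Example-Org/device/")
print(loc_a == loc_b)
```

With such a scheme, whoever crawls a project first creates the file, and every later push by any party updates that same file instead of introducing a second IRI.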

@hoijui hoijui transferred this issue from OPEN-NEXT/OKH-LOSH May 30, 2024