Materialisation does not necessarily select the latest version if multiple versions exist within the same page #701

Open
StijnDenisSirus opened this issue Oct 21, 2024 · 2 comments
Labels
needs triage Issue needs to be evaluated by team

Comments

@StijnDenisSirus

Describe the bug

When running an Ldio:LdesClient (ldes/ldi-orchestrator:2.9.0-SNAPSHOT) with materialisation enabled (all other materialisation properties left at their defaults), the state object that is returned does not necessarily reflect the latest version when multiple versions of the same object exist within the same page.

To Reproduce
Steps to reproduce the behavior:

  1. Go to https://ca-westtoer-ldes.bluesea-b3dcdb70.westeurope.azurecontainerapps.io/touristattractions/latestView?pageNumber=452
  2. We are interested in the object whose URI is: https://westtoer.be/id/productlist/b4215ccb-a14e-45b0-9956-9449607412fa
  3. The latest version of this object is: https://westtoer.be/id/productlist/b4215ccb-a14e-45b0-9956-9449607412fa/2024-10-17T10:44:13.6224407Z
  4. The state object returned by the client is based on https://westtoer.be/id/productlist/b4215ccb-a14e-45b0-9956-9449607412fa/2024-10-16T09:47:34.6835981Z. It may not be a coincidence that this is the version listed first on the page.
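
For clarity, the behaviour I would expect can be sketched as follows. This is only an illustration, not the client's actual implementation: it assumes that every version URI has the form `<object URI>/<UTC timestamp>Z` (as the URIs above do), and the names `split_version` and `latest_versions` are made up for this example.

```python
import re
from datetime import datetime

# Assumption: a version URI is "<object URI>/<UTC timestamp ending in Z>".
_VERSION = re.compile(
    r"^(?P<obj>.+)/(?P<sec>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(?P<frac>\d+))?Z$"
)


def split_version(uri):
    """Split a version URI into (object URI, sortable timestamp key)."""
    match = _VERSION.match(uri)
    if match is None:
        raise ValueError(f"not a version URI of the assumed form: {uri}")
    seconds = datetime.strptime(match.group("sec"), "%Y-%m-%dT%H:%M:%S")
    # The timestamps carry 7 fractional digits, so the fraction is kept as an
    # integer number of nanoseconds instead of relying on datetime precision.
    fraction = int((match.group("frac") or "").ljust(9, "0")[:9])
    return match.group("obj"), (seconds, fraction)


def latest_versions(version_uris):
    """Return {object URI: newest version URI}, independent of input order."""
    latest = {}
    for uri in version_uris:
        obj, key = split_version(uri)
        if obj not in latest or key > latest[obj][0]:
            latest[obj] = (key, uri)
    return {obj: uri for obj, (_, uri) in latest.items()}


BASE = "https://westtoer.be/id/productlist/b4215ccb-a14e-45b0-9956-9449607412fa"
page_versions = [
    BASE + "/2024-10-16T09:47:34.6835981Z",  # listed first on the page
    BASE + "/2024-10-17T10:44:13.6224407Z",  # the actual latest version
]
print(latest_versions(page_versions)[BASE])
```

With the two versions from the steps above, the sketch selects the 2024-10-17 version regardless of the order in which they are listed.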
@StijnDenisSirus StijnDenisSirus added the needs triage Issue needs to be evaluated by team label Oct 21, 2024
@jobulcke
Collaborator

Hi @StijnDenisSirus
I have tried to reproduce this issue with the following pipeline, based on the description above:

name: client-pipeline
input:
  name: Ldio:LdesClient
  config:
    materialisation.enabled: true
    urls: https://ca-westtoer-ldes.bluesea-b3dcdb70.westeurope.azurecontainerapps.io/touristattractions/latestView?pageNumber=452
outputs:
  - name: Ldio:ConsoleOut

With this pipeline, two state objects pass through the pipeline:

  1. At line 1305 of the provided logs
<https://westtoer.be/id/productlist/b4215ccb-a14e-45b0-9956-9449607412fa>
        rdf:type                      dcmitype:Collection;
        prov:generatedAtTime          "2024-10-16T16:27:32.8812519Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
        generiek:lokaleIdentificator  "b4215ccb-a14e-45b0-9956-9449607412fa"^^<http://www.w3.org/2000/01/rdf-schema#string>;
        generiek:naamruimte           "https://westtoer.be/id/productlist"^^<http://www.w3.org/2000/01/rdf-schema#string>;
        generiek:versieIdentificator  "2024-10-16T16:27:32.8812519Z";
        <https://schema.org/description>
                "Nieuwe testlijst"@nl;
        <https://schema.org/name>     "Testlijst Stijn 15/10"@nl;
        ns:uitsluitenVanPublicatie    false .
  2. At line 3657 of the provided logs
<https://westtoer.be/id/productlist/b4215ccb-a14e-45b0-9956-9449607412fa>
        rdf:type                      dcmitype:Collection;
        prov:generatedAtTime          "2024-10-17T10:44:13.6224407Z"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
        generiek:lokaleIdentificator  "b4215ccb-a14e-45b0-9956-9449607412fa"^^<http://www.w3.org/2000/01/rdf-schema#string>;
        generiek:naamruimte           "https://westtoer.be/id/productlist"^^<http://www.w3.org/2000/01/rdf-schema#string>;
        generiek:versieIdentificator  "2024-10-17T10:44:13.6224407Z";
        <https://schema.org/description>
                "Nieuwe testlijst"@nl;
        <https://schema.org/name>     "Testlijst Stijn 15/10"@nl;
        ns:uitsluitenVanPublicatie    false .

@StijnDenisSirus
Author

StijnDenisSirus commented Nov 7, 2024

Hi @jobulcke

Thank you for looking into this. When using the simple client-pipeline you provided, I obtained the same results, with those exact state objects passing through the pipeline in the correct order (see lines 1384 and 3733 in the attached logs).

orchestrator:
  pipelines:
    - name: client-pipeline
      input:
        name: Ldio:LdesClient
        config:
          materialisation.enabled: true
          urls: https://ca-westtoer-ldes.bluesea-b3dcdb70.westeurope.azurecontainerapps.io/touristattractions/latestView?pageNumber=452
      outputs:
        - name: Ldio:ConsoleOut

When comparing this simple client-pipeline with the actual pipeline we use in our application, my colleague and I discovered after some testing that the issue appears to be related to the keep-state property, more specifically the sqlite persistence strategy. With the following client I can replicate the original issue: multiple state objects pass through the pipeline, and the last one shown in the logs has a generatedAtTime of "2024-10-16T09:47:34.6835981Z", which is the version that is listed first on the page (see lines 1456, 1512, 3985, 7431, 7557 and finally 8770 in the attached logs for a keep-state sqlite run).

orchestrator:
  pipelines:
    - name: client-pipeline
      input:
        name: Ldio:LdesClient
        config:
          materialisation.enabled: true
          urls: https://ca-westtoer-ldes.bluesea-b3dcdb70.westeurope.azurecontainerapps.io/touristattractions/latestView?pageNumber=452
          keep-state: true
          state: sqlite
      outputs:
        - name: Ldio:ConsoleOut
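
To compare runs without scrolling through the logs by hand, a small script along these lines could list the generatedAtTime values emitted for one subject and check whether the last one is also the newest. It is a rough sketch that assumes the console output is Turtle-like, with the subject on its own line and the statement closed by " ." as in the snippets earlier in this thread; `emitted_times` and `last_is_latest` are hypothetical helper names, not part of LDIO.

```python
import re
from datetime import datetime

_TIME = re.compile(r'prov:generatedAtTime\s+"([^"]+)"')
_STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")


def stamp_key(stamp):
    """Turn a UTC timestamp such as 2024-10-17T10:44:13.6224407Z into a sortable key."""
    match = _STAMP.match(stamp)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {stamp}")
    seconds = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    fraction = int((match.group(2) or "").ljust(9, "0")[:9])
    return seconds, fraction


def emitted_times(log_text, subject):
    """Collect generatedAtTime values for `subject`, in the order they were logged."""
    times = []
    inside = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped == f"<{subject}>":
            inside = True  # start of a statement block about the subject
            continue
        if inside:
            match = _TIME.search(stripped)
            if match:
                times.append(match.group(1))
            if stripped.endswith(" ."):
                inside = False  # assumed end of the Turtle statement block
    return times


def last_is_latest(times):
    """True if the last logged state object carries the newest timestamp."""
    return bool(times) and stamp_key(times[-1]) == max(map(stamp_key, times))
```

Running `last_is_latest(emitted_times(open("run.log").read(), subject))` on the logs of both runs (where `run.log` is a placeholder file name) should make the difference between the in-memory and sqlite runs visible at a glance.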

Projects
Status: 📋 Backlog