Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up entities query for dataset CSV export and entity OData #1271

Merged
merged 3 commits into from
Nov 5, 2024

Conversation

ktuite
Copy link
Member

@ktuite ktuite commented Nov 4, 2024

We (myself, QA, LN, etc.) have been noticing lately that entity lists, both in frontend and in forms in enketo, have been taking an uncomfortably long time to load.

Notes:

  • 1.6M entities on staging
  • 240K entities on QA
  • 4M entities on Kathleen's local dev database (also experiencing multi-second loading times)

This PR removes a piece of the bulk entity query that computed the # of updates per entity. I did some historical sleuthing to figure out why we return updates count as well as version when updates always seems like it will be version - 1.

  • May 2023: updates count was added along with updatedAt, which could be selected and filtered on in odata.
    • we didn't have version yet
    • we allowed API updates to entities but not updates via submission
  • Sept 2023: entity def versions added to support entity updates via submission
    • did we hold off on versions because we weren't sure how the versioning system was going to work? probably...
    • there's a version for each entity def within an entity, so it's more informative than a simple updates count

Anyway, updates is baked into a lot of code, e.g. used by frontend, probably expected by people working with entities via odata, and we can't/shouldn't remove it. However, we always increment the version of the def by 1 when we update it, both through an API PATCH request and through a submission, so it's a good proxy for counting updates. I tried to make minimal code changes to remove the slow part of the query and calculate updates directly from version.

What has been done to verify that this works as intended?

Tests have mostly stayed the same (except for some minor changes to some unit tests) and I've seen queries get much faster locally.

Why is this the best possible solution? Were any other approaches considered?

See explanation above.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

Shouldn't change things for users except speed things up.

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

No.

Before submitting this PR, please make sure you have:

  • run make test and confirmed all checks still pass OR confirm CircleCI build passes
  • verified that any code from external sources are properly credited in comments or that everything is internally sourced

@ktuite ktuite added the entities Multiple Encounter workflows label Nov 4, 2024
@ktuite ktuite self-assigned this Nov 4, 2024
@ktuite ktuite added the needs testing Needs manual testing label Nov 4, 2024
lib/data/entity.js Outdated Show resolved Hide resolved
lib/model/frames/entity.js Outdated Show resolved Hide resolved
@@ -744,16 +744,12 @@ resolveConflict.audit = (entity, dataset) => (log) => log('entity.update.resolve
////////////////////////////////////////////////////////////////////////////////
// SERVING ENTITIES

const _exportUnjoiner = unjoiner(Entity, Entity.Def, Entity.Extended.into('stats'), Actor.alias('actors', 'creator'));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the Entity.Extended frame altogether in lib/model/frames/entity.js? I think we should at least modify it to remove its updates field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to be the only place Entity.Extended was used, so I will remove it.

@ktuite ktuite merged commit 5949128 into master Nov 5, 2024
7 checks passed
@ktuite ktuite deleted the ktuite/entities_query branch November 5, 2024 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
entities Multiple Encounter workflows needs testing Needs manual testing
Projects
Status: ✅ done
Development

Successfully merging this pull request may close these issues.

2 participants