Speed up entities query for dataset CSV export and entity OData #1271

ktuite · 2024-11-04T22:45:32Z

We (myself, QA, LN, etc.) have noticed lately that entity lists, both in Frontend and in forms in Enketo, are taking an uncomfortably long time to load.

staging: dataset with 1 entity taking 10 seconds to load
staging (same dataset as above): form taking 10 seconds to load
staging dataset with 20 entities taking >10 seconds
staging dataset with 60K entities, loading 250 of them taking 10 seconds
QA: dataset with 2 entities taking 1 second to load
QA: dataset with 800 entities and no updates taking ~1 sec to load
K dev: 5 entities (no updates) loading in 2 seconds
K dev: 5 entities with updates loading in 2 seconds
K dev: 1.6M entity dataset (no updates) loading in 10 seconds

Notes:

1.6M entities on staging
240K entities on QA
4M entities on Kathleen's local dev database (also experiencing multi-second loading times)

This PR removes the piece of the bulk entity query that computed the number of updates per entity. I did some historical sleuthing to figure out why we return an updates count as well as a version, when updates always seems to equal version - 1.

May 2023: updates count was added along with updatedAt, which could be selected and filtered on in OData.
- we didn't have version yet
- we allowed API updates to entities but not updates via submission
Sept 2023: entity def versions added to support entity updates via submission
- did we hold off on versions because we weren't sure how the versioning system was going to work? probably...
- there's a version for each entity def within an entity, so it's more informative than a simple updates count

Anyway, updates is baked into a lot of code: it is used by Frontend and is probably expected by people working with entities via OData, so we can't and shouldn't remove it. However, we always increment the version of the def by 1 when we update an entity, both through an API PATCH request and through a submission, so version is a good proxy for counting updates. I tried to make minimal code changes to remove the slow part of the query and calculate updates directly from version.
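As a rough illustration of the idea (not the code in this PR), the derivation could be sketched like this. The helper names `updatesFromVersion` and `withUpdates` and the row shape are hypothetical, and the sketch assumes a newly created entity starts at version 1, which is what "updates = version - 1" implies.

```javascript
// Hypothetical helper, not the code from this PR: derives the updates count
// from the version of an entity's current def.
// Assumption: a newly created entity has version 1, so it has 0 updates.
const updatesFromVersion = (version) => {
  if (!Number.isInteger(version) || version < 1)
    throw new RangeError(`expected an integer version >= 1, got ${version}`);
  return version - 1;
};

// Hypothetical row shape standing in for a row returned by the entities query.
// The updates field is computed in memory instead of by a counting subquery.
const withUpdates = (row) => ({ ...row, updates: updatesFromVersion(row.version) });

const rows = [
  { uuid: 'a', version: 1 }, // never updated
  { uuid: 'b', version: 4 }  // updated three times
];
const result = rows.map(withUpdates);
console.log(result.map((r) => r.updates)); // [ 0, 3 ]
```

The point of the sketch is that updates becomes a constant-time calculation per row, so the query no longer has to count defs for every entity it returns.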

What has been done to verify that this works as intended?

Tests have mostly stayed the same (except for some minor changes to some unit tests) and I've seen queries get much faster locally.

Why is this the best possible solution? Were any other approaches considered?

See explanation above.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

This shouldn't change anything for users except to speed things up.

Does this change require updates to the API documentation? If so, please update docs/api.yaml as part of this PR.

No.

Before submitting this PR, please make sure you have:

run make test and confirmed all checks still pass OR confirm CircleCI build passes
verified that any code from external sources are properly credited in comments or that everything is internally sourced

lib/data/entity.js

lib/model/frames/entity.js

matthew-white · 2024-11-04T23:46:10Z

lib/model/query/entities.js

@@ -744,16 +744,12 @@ resolveConflict.audit = (entity, dataset) => (log) => log('entity.update.resolve
 ////////////////////////////////////////////////////////////////////////////////
 // SERVING ENTITIES

-const _exportUnjoiner = unjoiner(Entity, Entity.Def, Entity.Extended.into('stats'), Actor.alias('actors', 'creator'));


Can we remove the Entity.Extended frame altogether in lib/model/frames/entity.js? I think we should at least modify it to remove its updates field.

That seems to be the only place Entity.Extended was used, so I will remove it.

ktuite added 2 commits November 4, 2024 13:43

Speeding up entities query by removing updates count subquery

a85968a

Change entity data structure in unit test

522f524

ktuite added the entities Multiple Encounter workflows label Nov 4, 2024

ktuite self-assigned this Nov 4, 2024

ktuite requested review from sadiqkhoja and matthew-white November 4, 2024 22:47

ktuite added the needs testing Needs manual testing label Nov 4, 2024

matthew-white approved these changes Nov 4, 2024

small changes in response to code review

9bb028d

ktuite merged commit 5949128 into master Nov 5, 2024
7 checks passed

ktuite deleted the ktuite/entities_query branch November 5, 2024 19:27
