From ea36f1d7d380d0269183307edb92c92a8c2ea98d Mon Sep 17 00:00:00 2001
From: jakedoublev
Date: Wed, 18 Sep 2024 13:38:08 -0700
Subject: [PATCH 1/3] feat(docs): add policy ADR for LIST limit and pagination

---
 .../policy/adr/0002-pagination-list-rpcs.md | 164 ++++++++++++++++++
 1 file changed, 164 insertions(+)
 create mode 100644 service/policy/adr/0002-pagination-list-rpcs.md

diff --git a/service/policy/adr/0002-pagination-list-rpcs.md b/service/policy/adr/0002-pagination-list-rpcs.md
new file mode 100644
index 000000000..50032cf1c
--- /dev/null
+++ b/service/policy/adr/0002-pagination-list-rpcs.md
@@ -0,0 +1,164 @@
+# Pagination in policy LIST RPCs
+
+## Table of Contents
+- [Background](#background)
+- [Chosen Option](#chosen-option)
+- [Considered Options](#considered-options)
+  - [LIMIT + OFFSET](#limit--offset)
+  - [Keyset Pagination](#keyset-pagination)
+  - [Cursor Pagination](#cursor-pagination)
+
+## Background
+
+At present, policy LIST RPCs are completely open-ended.
+
+Attribute Namespaces, Definitions, and Values LIST calls may be filtered by _active_ state.
+
+All Policy objects may be retrieved without quantity limits. This presents a challenge at scale: if there
+is a very large number of any policy object in the platform database, responses can become overwhelmingly
+large.
+
+Introducing a `limit` on the items returned in LIST responses necessitates introducing
+pagination at the same time. This ADR defines the unified approach we will take to this pagination
+within policy service LIST RPCs and at the database level.
+
+## Chosen Option
+
+[LIMIT + OFFSET](#limit--offset)
+
+Because we do not know how likely it is that a platform running Policy will have enough rows of any
+individual object to hit the at-scale performance concerns of `offset` pagination, we will prefer
+this simple implementation for now, leaving the door open for cursor-based pagination to solve the performance
+constraint should it become a real problem in the future.
+
+## Considered Options
+
+### LIMIT + OFFSET
+
+The simplest approach is to update the proto for LIST RPCs and the database queries to accept `limit` and `offset`, each with a default value.
+
+```proto
+message ListRequest {
+  // ...existing fields omitted
+  int32 limit = 3; // default depends on type of policy object
+  int32 offset = 4; // default: 0
+}
+```
+
+```sql
+-- subject-mappings example request:
+-- 'limit' 100
+-- 'offset' 100
+SELECT * FROM opentdf_policy.subject_mappings
+ORDER BY created_at
+LIMIT 100 OFFSET 100;
+```
+
+#### Pros & Cons
+
+- :green_circle: Simple - supported across any SQL database (just slightly different syntax)
+- :green_circle: Stateless - each request can independently paginate by specifying LIMIT / OFFSET
+- :green_circle: Flexible - random-access pagination supported
+- :green_circle: Familiar - standard across LIST-type APIs
+- :yellow_circle: Create/Update/Delete of data between requests may shift pages, but this is a relatively small concern since reads are far more frequent than writes in Policy
+- :red_circle: Performance - a large number of objects _or_ a high offset means many rows must be scanned and discarded (skipped). However, (:yellow_circle:) we do not know how often policy objects will reach a scale where this becomes a problem
+
+> [!NOTE]
+> Pagination cost grows roughly linearly, O(n), as the offset increases, since every skipped row is still scanned
+
+### Keyset Pagination
+
+We would index a column (the most obvious would be `created_at`) to use as the pagination key for
+querying, and facilitate pagination before/after any arbitrary timestamp.
+
+```proto
+message ListRequest {
+  // ...existing fields omitted
+  int32 limit = 3; // default depends on type of policy object
+  google.protobuf.Timestamp after = 4; // default: start_of_time
+}
+```
+
+```sql
+-- subject-mappings example request:
+-- 'after' 2023-01-01
+-- 'limit' 100
+SELECT * FROM opentdf_policy.subject_mappings
+WHERE created_at > '2023-01-01' ORDER BY created_at LIMIT 100;
+```
+
+#### Pros & Cons
+
+- :green_circle: Support - supported across any SQL database (just slightly different syntax)
+- :green_circle: Speed - much faster than OFFSET in deep pages due to the reduced count of scanned rows
+- :yellow_circle: Reliability - provisioned policy may contain rows with the same `created_at` timestamp, which can be skipped at page boundaries
+- :red_circle: Flexibility - pagination only moves forward from the `created_at` timestamp
+- :red_circle: Complexity - the client must maintain state, since timestamps from each response drive the next request, and backward pagination is not supported
+- :red_circle: Complexity - reliance on timestamps introduces timezone confusion unless an additional parameter localizes the query
+
+### Cursor Pagination
+
+We would index a column (the most obvious would be `created_at`) to use as the pagination key for
+querying, but expose it to clients as an encoded cursor.
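+
+As a minimal sketch of what an encoded cursor could look like, the Go below assumes the cursor is URL-safe base64 over an RFC 3339 `created_at` timestamp normalized to UTC. `encodeCursor` and `decodeCursor` are hypothetical helpers, and treating an empty cursor as the Unix epoch is only an assumed stand-in for `start_of_time`:
+
+```go
+package main
+
+import (
+	"encoding/base64"
+	"fmt"
+	"time"
+)
+
+// encodeCursor (hypothetical) turns a row's created_at into an opaque cursor.
+// Normalizing to UTC keeps client timezones out of the cursor entirely.
+func encodeCursor(createdAt time.Time) string {
+	ts := createdAt.UTC().Format(time.RFC3339Nano)
+	return base64.URLEncoding.EncodeToString([]byte(ts))
+}
+
+// decodeCursor (hypothetical) reverses encodeCursor. An empty cursor stands in
+// for start_of_time, assumed here to be the Unix epoch.
+func decodeCursor(cursor string) (time.Time, error) {
+	if cursor == "" {
+		return time.Unix(0, 0).UTC(), nil
+	}
+	raw, err := base64.URLEncoding.DecodeString(cursor)
+	if err != nil {
+		return time.Time{}, fmt.Errorf("invalid cursor: %w", err)
+	}
+	return time.Parse(time.RFC3339Nano, string(raw))
+}
+
+func main() {
+	// Microsecond precision, matching a Postgres timestamptz column.
+	createdAt := time.Date(2023, 1, 1, 0, 0, 0, 123456000, time.UTC)
+	cursor := encodeCursor(createdAt)
+	decoded, err := decodeCursor(cursor)
+	if err != nil {
+		panic(err)
+	}
+	fmt.Println(cursor, decoded.Equal(createdAt))
+}
+```
+
+Because the server both builds and parses the cursor, clients can treat it as an opaque string and never supply a timezone themselves.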
+
+```proto
+message ListRequest {
+  // ...existing fields omitted
+  int32 limit = 3; // default depends on type of policy object
+  string cursor = 4; // defaulted in API layer to cursor for encoded start_of_time
+}
+
+message ListResponse {
+  // ...existing fields and response data omitted
+  // cursors are encoded by the server as base64'd 'created_at' timestamps
+  string previous_cursor = 4;
+  string next_cursor = 5;
+}
+```
+
+```sql
+-- subject-mappings example request:
+-- 'cursor' (decoded) 2023-01-01 00:00:00.000000+00
+-- 'limit' 100
+WITH Data AS (
+  SELECT *
+  FROM opentdf_policy.subject_mappings
+  WHERE created_at >= '2023-01-01 00:00:00.000000+00'
+  ORDER BY created_at
+  LIMIT 101
+),
+NextPage AS (
+  SELECT *
+  FROM Data
+  ORDER BY created_at
+  LIMIT 100
+),
+PreviousPage AS (
+  SELECT *
+  FROM opentdf_policy.subject_mappings
+  WHERE created_at < (SELECT MIN(created_at) FROM Data)
+  ORDER BY created_at DESC
+  LIMIT 100
+),
+CursorData AS (
+  SELECT
+    (SELECT MIN(created_at) FROM Data) AS first_item_created_at,
+    (SELECT created_at FROM Data ORDER BY created_at OFFSET 100 LIMIT 1) AS next_cursor_created_at, -- 101st row, if any, starts the next page
+    (SELECT MIN(created_at) FROM PreviousPage) AS previous_cursor_created_at
+)
+SELECT
+  (SELECT json_agg(row_to_json(NextPage) ORDER BY created_at) FROM NextPage) AS data,
+  (SELECT json_build_object('created_at', next_cursor_created_at) FROM CursorData) AS next_cursor,
+  (SELECT json_build_object('created_at', previous_cursor_created_at) FROM CursorData) AS previous_cursor
+FROM CursorData;
+```
+
+#### Pros & Cons
+
+- :green_circle: Support - supported across any SQL database (just slightly different syntax)
+- :green_circle: Speed - much faster than OFFSET in deep pages due to the reduced count of scanned rows
+- :green_circle: Flexibility - paginating _a single page_ backward is made possible by the response `previous_cursor` value
+- :green_circle: Complexity - timestamp timezone differences are not a problem, since cursors are server-determined and an API concern
+- :yellow_circle:/:red_circle: Reliability - provisioned policy will sometimes contain the same `created_at` timestamp, making page boundaries less than 100% reliable
+- :red_circle: Complexity - SQL queries become significantly more complex to build and to read into responses
+- :red_circle: Flexibility - random access is still not supported without client state management and prior knowledge of forward pagination's historical cursors

From dce6f8be1766a7a189ac7935024073f078a222af Mon Sep 17 00:00:00 2001
From: jakedoublev
Date: Wed, 18 Sep 2024 13:47:54 -0700
Subject: [PATCH 2/3] improvements

---
 service/policy/adr/0002-pagination-list-rpcs.md | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/service/policy/adr/0002-pagination-list-rpcs.md b/service/policy/adr/0002-pagination-list-rpcs.md
index 50032cf1c..189c6329d 100644
--- a/service/policy/adr/0002-pagination-list-rpcs.md
+++ b/service/policy/adr/0002-pagination-list-rpcs.md
@@ -1,12 +1,13 @@
 # Pagination in policy LIST RPCs
 
 ## Table of Contents
+
 - [Background](#background)
 - [Chosen Option](#chosen-option)
 - [Considered Options](#considered-options)
-  - [LIMIT + OFFSET](#limit--offset)
-  - [Keyset Pagination](#keyset-pagination)
-  - [Cursor Pagination](#cursor-pagination)
+    - [LIMIT + OFFSET](#limit--offset)
+    - [Keyset Pagination](#keyset-pagination)
+    - [Cursor Pagination](#cursor-pagination)
 
 ## Background
 
@@ -43,6 +44,10 @@ message ListRequest {
   int32 limit = 3; // default depends on type of policy object
   int32 offset = 4; // default: 0
 }
+message ListResponse {
+  // ...existing fields omitted
+  int32 total = 5; // indication of total available for pagination
+}
 ```
 
 ```sql
@@ -76,6 +81,11 @@ message ListRequest {
   // ...existing fields omitted
   int32 limit = 3; // default depends on type of policy object
   google.protobuf.Timestamp after = 4; // default: start_of_time
+  int32 total = 5; // indication of total that can be paginated through
+}
+message ListResponse {
+  // ...existing fields omitted
+  int32 total = 5; // indication of total available for pagination
 }
 ```
 
@@ -113,6 +123,7 @@ message ListResponse {
   // cursors are encoded by the server as base64'd 'created_at' timestamps
   string previous_cursor = 4;
   string next_cursor = 5;
+  int32 total = 6; // indication of total available for pagination
 }
 ```
 

From db84dc38e032b06f3385c1e292f7cff448570125 Mon Sep 17 00:00:00 2001
From: Jake Van Vorhis <83739412+jakedoublev@users.noreply.github.com>
Date: Tue, 15 Oct 2024 07:52:36 -0700
Subject: [PATCH 3/3] Update service/policy/adr/0002-pagination-list-rpcs.md

---
 service/policy/adr/0002-pagination-list-rpcs.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/service/policy/adr/0002-pagination-list-rpcs.md b/service/policy/adr/0002-pagination-list-rpcs.md
index 189c6329d..b26a7b0b0 100644
--- a/service/policy/adr/0002-pagination-list-rpcs.md
+++ b/service/policy/adr/0002-pagination-list-rpcs.md
@@ -171,5 +171,7 @@ FROM CursorData;
 - :green_circle: Flexibility - paginating _a single page_ backward is made possible by the response `previous_cursor` value
 - :green_circle: Complexity - timestamp timezone differences are not a problem, since cursors are server-determined and an API concern
 - :yellow_circle:/:red_circle: Reliability - provisioned policy will sometimes contain the same `created_at` timestamp, making page boundaries less than 100% reliable
+- :red_circle: Overhead - a new index on the `created_at` timestamp is required, which adds overhead but offers little management value,
+since time is largely irrelevant to attributes except when needed for sorting
 - :red_circle: Complexity - SQL queries become significantly more complex to build and to read into responses
 - :red_circle: Flexibility - random access is still not supported without client state management and prior knowledge of forward pagination's historical cursors
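+
+For contrast with the random-access con above, the chosen LIMIT + OFFSET option can jump to any page using only `limit` and the `total` count from its `ListResponse`. A minimal Go sketch, assuming 1-based page numbers; `offsetForPage` and `pageCount` are hypothetical helpers:
+
+```go
+package main
+
+import "fmt"
+
+// offsetForPage (hypothetical) maps a 1-based page number to its OFFSET.
+// Pages below 1 are clamped to the first page.
+func offsetForPage(page, limit int32) int32 {
+	if page < 1 {
+		page = 1
+	}
+	return (page - 1) * limit
+}
+
+// pageCount (hypothetical) returns how many pages of size limit cover total rows,
+// rounding up so a partial final page still counts.
+func pageCount(total, limit int32) int32 {
+	if limit <= 0 {
+		return 0
+	}
+	return (total + limit - 1) / limit
+}
+
+func main() {
+	// Jump straight to page 3 of 250 rows at 100 per page: no cursor history needed.
+	fmt.Println(offsetForPage(3, 100), pageCount(250, 100)) // 200 3
+}
+```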