Auto-generate dataloaders from sqlc queries #1233
Merged

Commits (15, all by radazen):
- 1190b31 The rest of the owl
- 5b6f669 Merge remote-tracking branch 'origin/main' into ezra/dataloaders
- 01f5909 Replace old dataloaders with new ones
- a59b27c Better lock contention handling within batches
- 6d1d44e Add a comment
- bb5936f Better handling for media lookups
- 6681569 MediaByTokenID -> MediaByMediaID
- e2e3ae9 More TokenID -> TokenMediaID updates
- 4db784a Better not found error handling, v1
- a5de6fd Rename numAssigned to numCallers
- 34f5301 Add getNotFoundError implementations for existing pgx.ErrNoRows cases
- 5a810b1 Update the README with docs for pgx.ErrNoRows
- 56f5f23 Merge remote-tracking branch 'origin/main' into ezra/dataloaders
- c10e2fd make sqlc-generate after merging main
- 7f0be23 Fix tests
## Dataloader Generator
_Automatically generates dataloaders based on sqlc queries_
___
### Requirements
`sqlc.yaml` must be set up to use sqlc's `sqlc-gen-json` example plugin to generate a JSON manifest file with information about generated queries.
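
The exact configuration depends on the project, but a minimal sketch of a sqlc v2 config that registers the plugin as a process-based plugin might look like this (the paths and output directory here are hypothetical, not taken from the repo):
```yaml
version: "2"
plugins:
  # Hypothetical registration of the sqlc-gen-json example plugin;
  # the sqlc-gen-json binary is assumed to be on your PATH.
  - name: json
    process:
      cmd: sqlc-gen-json
sql:
  - engine: postgresql
    queries: db/queries/core.sql   # hypothetical path
    schema: db/migrations          # hypothetical path
    codegen:
      - plugin: json
        out: graphql/dataloaders   # manifest.json is written here
```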

### Quickstart
From the go-gallery root directory, run:
```bash
make sqlc-generate
```

### Overview
This tool reads the manifest created by `sqlc-gen-json` and uses the `go/types` package to figure out which SQL statements can be turned into dataloaders.
- By default, all `:batchone` and `:batchmany` statements will create dataloaders (a hypothetical example follows this list)
- Dataloaders can also be generated for SQL queries that don't use sqlc's `:batch` syntax. See **[Custom Batching](#custom-batching)**.
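
For instance, a hypothetical `:batchmany` query (the name and schema below are illustrative, not from the actual queries file) needs nothing beyond its standard sqlc annotation to get a generated dataloader:
```sql
-- name: GetTokensByOwnerID :batchmany
select * from tokens
    where owner_id = @owner_id
    and not deleted;
```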

A dataloader can receive and cache results from other dataloaders. This happens automatically for dataloaders that appear to look up objects by their IDs, and can be set up for other dataloaders with minimal effort. See **[Caching Results](#caching-results)**.

Configuration options for individual dataloaders can be set with a `-- dataloader-config:` comment in the sqlc queries file. For example:
```
-- name: GetUserByID :batchone
-- dataloader-config: maxBatchSize=10 batchTimeout=2ms publishResults=false
```
See **[Configuring Dataloaders](#configuring-dataloaders)** for a full list of available options.

Generated dataloaders are aware of `sqlc.embed` syntax, which can be used to return multiple generated types from a single query (e.g. a `coredb.Token` and a `coredb.Contract`). Each embedded type will be sent to dataloaders that can cache objects of that type (e.g. the `coredb.Token` in the example above will be sent to dataloaders that can cache `coredb.Token` results).
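
To illustrate (with a hypothetical query name and an assumed schema), a single `:batchone` statement can embed both types:
```sql
-- name: GetTokenWithContractByID :batchone
select sqlc.embed(t), sqlc.embed(c) from tokens t
    join contracts c on c.id = t.contract_id
    where t.id = @token_id
    and not t.deleted;
```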

It's possible for `sqlc` to generate parameter types that Go doesn't consider `comparable`. For example, a query might accept a list of Chains as a parameter, but a Go struct with a slice field (e.g. `chains []Chain`) is not comparable. Generated dataloaders support these non-comparable keys by converting them to JSON internally and using their JSON strings as comparable cache keys.
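
A minimal, self-contained sketch of the JSON-keying idea (the struct and function names here are hypothetical; the real generated code may differ):
```go
package main

import (
	"encoding/json"
	"fmt"
)

type Chain int

// GetTokensParams stands in for a sqlc-generated parameter struct.
// Its slice field makes it non-comparable, so it can't be a map key directly.
type GetTokensParams struct {
	OwnerID string
	Chains  []Chain
}

// jsonCacheKey derives a comparable string key by JSON-encoding the value.
func jsonCacheKey(v any) (string, error) {
	b, err := json.Marshal(v)
	return string(b), err
}

func main() {
	key, _ := jsonCacheKey(GetTokensParams{OwnerID: "abc123", Chains: []Chain{0, 4}})
	cache := map[string]string{} // string keys are comparable, so this works
	cache[key] = "cached result"
	fmt.Println(key, "=>", cache[key])
}
```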

Running `make sqlc-generate` creates three files: `manifest.json`, `dataloaders_gen.go`, and `api_gen.go`
- `manifest.json` is the JSON manifest generated by the `sqlc-gen-json` plugin
- `dataloaders_gen.go` contains definitions for all the generated dataloaders
- `api_gen.go` contains a `Loaders` struct with fields for all the generated dataloaders, and sets up connections between them to cache results from one dataloader in another

### Caching Results
Dataloaders will attempt to publish their results for other dataloaders to cache. A dataloader can opt in to caching by implementing one of these interfaces (where `TKey` and `TResult` are the key and result types of the dataloader itself):

```go
// Given a TResult to cache, return the TKey value to use as its cache key
type autoCacheWithKey[TKey any, TResult any] interface {
	getKeyForResult(TResult) TKey
}

// Given a TResult to cache, return multiple TKey values to use as cache keys.
// The TResult value will be cached once for each provided cache key.
// Useful for things like GetGalleryByCollectionID, where the same Gallery result
// should be cached with each of its child collection IDs as keys.
type autoCacheWithKeys[TKey any, TResult any] interface {
	getKeysForResult(TResult) []TKey
}
```

If a sqlc query appears to look up an object by its ID, the generated dataloader will automatically implement `autoCacheWithKey` for that object type. This happens if the dataloader has:
- a `persist.DBID` key type, and
- a sqlc-generated result type (e.g. a `coredb.Xyz`) with a `persist.DBID` field named `ID`

Because ID-based lookups are the most common caching need, it's rare to need to implement one of the autoCache interfaces manually. If the need arises, add an entry to `autocache.go` (see the sketch below).
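
A hypothetical manual implementation, assuming a loader whose `coredb.Gallery` result carries its child collection IDs in a `Collections` field (the loader and field names are illustrative, not the real generated definitions):
```go
// getKeysForResult caches the same Gallery under each of its child
// collection IDs, satisfying the autoCacheWithKeys interface above.
func (l *GetGalleryByCollectionIDBatch) getKeysForResult(g coredb.Gallery) []persist.DBID {
	return g.Collections
}
```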

### Configuring Dataloaders
Configuration options for individual dataloaders can be set with a `-- dataloader-config:` comment in the sqlc queries file. For example:
```
-- name: GetUserByID :batchone
-- dataloader-config: maxBatchSize=10 batchTimeout=2ms publishResults=false
```

Available options:
- **maxBatchSize**: the maximum number of keys to fetch in a single batched query. Defaults to 100.
- **batchTimeout**: the duration to wait before sending a batch (unless it reaches maxBatchSize first, at which point it will be sent immediately). Defaults to 2ms.
- **publishResults**: whether to publish results for other dataloaders to cache. Defaults to true.
- **skip**: whether to skip generating a dataloader for this query. Defaults to false. An example of `skip` follows this list.
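
For instance, to exclude a query from dataloader generation entirely (the query name here is hypothetical):
```
-- name: CountActiveUsers :one
-- dataloader-config: skip=true
```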

### Custom Batching
The easiest and most common way to generate dataloaders is to use sqlc's `:batch` syntax, which uses the Postgres batching API to send many queries to the database in a single round trip. The batching API reduces round-trip overhead, but it still executes one SQL query for each provided key. In some performance-critical circumstances (e.g. routinely looking up thousands of objects by their IDs), it's better to perform a single query that returns an entire batch of results.

A dataloader will be generated for SQL statements that don't use sqlc's `:batch` syntax, if:
- the query uses the sqlc `:many` keyword
- the query returns an `int` column named `batch_key_index`

`batch_key_index` should be a 1-based index that maps keys to results, and is typically created via the `generate_subscripts` function. For example, to create a dataloader that looks up contracts by their IDs:

```sql
with keys as (
    select unnest (@contract_ids::varchar[]) as id
         , generate_subscripts(@contract_ids::varchar[], 1) as batch_key_index
)
select k.batch_key_index, sqlc.embed(c) from keys k
    join contracts c on c.id = k.id
    where not c.deleted;
```

This example is a good template for looking up objects by IDs via custom batching, and can be reused for other types.

**Note**: because the SQL query above does not have a `persist.DBID` key type (it uses a `[]varchar`), the generated dataloader will not automatically implement `autoCacheWithKey` for the result type. `autoCacheWithKey` will need to be implemented manually.
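
A hypothetical sketch of that manual implementation, assuming the generated loader is named `GetContractsByIDs` and that its string keys are contract IDs (both names are illustrative):
```go
// getKeyForResult publishes each Contract under its ID, rendered as the
// loader's string key type, satisfying autoCacheWithKey manually.
// Assumes persist.DBID's underlying type is a string.
func (l *GetContractsByIDs) getKeyForResult(c coredb.Contract) string {
	return string(c.ID)
}
```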
This is so cool