Auto-generate dataloaders from sqlc queries #1233
Conversation
db/gen/coredb/manifest.json
This is a gigantic (5MB) generated file that will change every time we add new sqlc queries. We could add it to .gitignore, or we could just commit it and not pay attention to it. I think both approaches are fine! I kept it around for the time being.
This is so cool, I'm excited to see this in action and start using it. Also appreciate all the comments and documentation, it made it a lot easier to follow 👍
See **[Configuring Dataloaders](#configuring-dataloaders)** for a full list of available options.

Generated dataloaders are aware of `sqlc.embed` syntax, which can be used to return multiple generated types from a single query (e.g. a `coredb.Token` and a `coredb.Contract`). Each embedded type will be sent to dataloaders that can cache objects of that type (e.g. the `coredb.Token` in the example above will be sent to dataloaders that can cache `coredb.Token` results).
This is so cool
```go
// Prevent lock contention within a batch by allowing only the first maxBatchSize callers
// to obtain the lock.
numAssigned := atomic.AddInt32(&b.numAssigned, 1)
```
nit: I got a bit confused with the name `numAssigned`, since it refers to the number of callers so far in the batch. Maybe something like numCallers, currentCallerCount, callerSlot, etc.?
Good call! This is the last thing I added, and I kind of threw it in there haphazardly. I'll call it `numCallers`!
```go
func NewDataloader[TKey comparable, TResult any](ctx context.Context, maxBatchSize int, batchTimeout time.Duration, cacheResults bool, publishResults bool,
	fetchFunc func(context.Context, []TKey) ([]TResult, []error)) *Dataloader[TKey, TResult] {
	return newDataloader(ctx, maxBatchSize, batchTimeout, cacheResults, publishResults, fetchFunc, indexOf[TKey])
```
Could searching through keys linearly become an issue, or is the batch size never very large in practice?
I think the linear search should be okay in practice. The existing dataloaders do it, and I've never noticed a bottleneck there. I was debating whether we should create a map per batch to make these lookups faster, but I'm honestly not sure if speed would improve, and memory usage would definitely go up a bit.
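For context, the linear key lookup being discussed would look something like the sketch below; this is an illustration, not the PR's actual `indexOf`:

```go
// indexOf scans a batch's keys for a match, so each lookup is O(n) in the
// batch size; fine for small batches, which is the tradeoff discussed above.
func indexOf[TKey comparable](keys []TKey, key TKey) int {
	for i, k := range keys {
		if k == key {
			return i
		}
	}
	return -1
}
```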
```go
	return d
}
```
```go
func loadCountAdmiresByFeedEventIDBatch(q *coredb.Queries) func(context.Context, []persist.DBID) ([]int64, []error) {
```
Wow, this is unreal!!
graphql/dataloader/api_gen.go
🔥 🔥 🔥
The README included in this PR serves as a decent PR description and is pasted below. Also worth noting: this PR introduces the sqlc code generator, but it doesn't replace our existing dataloaders with the new generated code. I'll handle that in a follow-up PR.
Dataloader Generator
Automatically generates dataloaders based on sqlc queries
Requirements
`sqlc.yaml` must be set up to use sqlc's `sqlc-gen-json` example plugin to generate a JSON manifest file with information about generated queries.
Quickstart
From the go-gallery root directory, run:
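```
make sqlc-generate
```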
Overview
This tool will read the manifest created by `sqlc-gen-json` and use the `go/types` package to figure out which SQL statements can be turned into dataloaders.

- `:batchone` and `:batchmany` statements will create dataloaders
- dataloaders can also be generated for statements that don't use sqlc's `:batch` syntax. See Custom Batching.

A dataloader can receive and cache results from other dataloaders. This happens automatically for dataloaders that appear to look up objects by their IDs, and can be set up for other dataloaders with minimal effort. See Caching Results.
Configuration options for individual dataloaders can be set with a `-- dataloader-config:` comment in the sqlc queries file; a sketch of one follows below. See Configuring Dataloaders for a full list of available options.
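The original example isn't shown here, so this is a hedged sketch of what such a comment might look like. The option names (`maxBatchSize`, `batchTimeout`) are assumptions borrowed from the `NewDataloader` parameters seen elsewhere in this PR, and the query itself is hypothetical:

```sql
-- name: GetTokenById :batchone
-- dataloader-config: maxBatchSize=100 batchTimeout=2ms
SELECT * FROM tokens WHERE id = $1;
```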
Generated dataloaders are aware of `sqlc.embed` syntax, which can be used to return multiple generated types from a single query (e.g. a `coredb.Token` and a `coredb.Contract`). Each embedded type will be sent to dataloaders that can cache objects of that type (e.g. the `coredb.Token` in the example above will be sent to dataloaders that can cache `coredb.Token` results).

It's possible for sqlc to generate parameter types that Go doesn't consider `comparable`. For example, a query might accept a list of Chains as a parameter, but a Go struct with a slice field (e.g. `chains []Chain`) is not comparable. Generated dataloaders support these non-comparable keys by converting them to JSON internally, and using their JSON strings as comparable cache keys.

Running `make sqlc-generate` creates three files: `manifest.json`, `dataloaders_gen.go`, and `api_gen.go`.

- `manifest.json` is the JSON manifest generated by the `sqlc-gen-json` plugin
- `dataloaders_gen.go` contains definitions for all the generated dataloaders
- `api_gen.go` contains a `Loaders` struct with fields for all the generated dataloaders, and sets up connections between them to cache results from one dataloader in another
Caching Results
Dataloaders will attempt to publish their results for other dataloaders to cache. A dataloader can opt in for caching by implementing one of these interfaces (where `TKey` and `TResult` are the key and result types of the dataloader itself); a sketch of their general shape follows below.
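The interface definitions themselves aren't shown here; the following is a hedged guess at their shape based on the description of `autoCacheWithKey`, and the method name is an assumption:

```go
// Hypothetical sketch: a dataloader opting in to key-based caching might
// implement something like this, where getKeyForResult reports the cache
// key under which a given result should be stored.
type autoCacheWithKey[TKey comparable, TResult any] interface {
	getKeyForResult(TResult) TKey
}
```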
If a sqlc query appears to look up an object by its ID, the generated dataloader will automatically implement `autoCacheWithKey` for that object type. This happens if the dataloader has:

- a `persist.DBID` key type, and
- a result type (e.g. `coredb.Xyz`) with a `persist.DBID` field named `ID`

Because ID-based lookups are the most common caching need, it's rare to need to implement one of the autoCache interfaces manually. If the need arises, add an entry to `autocache.go`.

Configuring Dataloaders
Configuration options for individual dataloaders can be set with a `-- dataloader-config:` comment in the sqlc queries file, as in the sketch earlier in this README.

Available options:
Custom Batching
The easiest and most common way to generate dataloaders is to use sqlc's `:batch` syntax, which uses the Postgres batching API to send many queries to the database in a single round trip. The batching API reduces round-trip overhead, but it still executes one SQL query for each provided key. In some performance-critical circumstances (e.g. routinely looking up thousands of objects by their IDs), it's better to perform a single query that returns an entire batch of results.

A dataloader will be generated for SQL statements that don't use sqlc's `:batch` syntax, if:

- the statement uses the `:many` keyword, and
- the statement returns an `int` column named `batch_key_index`
`batch_key_index` should be a 1-based index that maps keys to results, and is typically created via the `generate_subscripts` function. For example, to create a dataloader that looks up contracts by their IDs:
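The original query isn't shown here; the following is a hedged reconstruction based on the surrounding description, with assumed table and parameter names:

```sql
-- name: GetContractsByIDs :many
-- A sketch: generate_subscripts pairs each key in @contract_ids with its
-- 1-based position, which becomes the batch_key_index column.
WITH keys AS (
    SELECT unnest(@contract_ids::varchar[]) AS id,
           generate_subscripts(@contract_ids::varchar[], 1) AS batch_key_index
)
SELECT contracts.*, keys.batch_key_index
FROM keys
JOIN contracts ON contracts.id = keys.id;
```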
This example is a good template for looking up objects by IDs via custom batching, and can be reused for other types.

Note: because the SQL query above does not have a `persist.DBID` key type (it uses a `[]varchar`), the generated dataloader will not automatically implement `autoCacheWithKey` for the result type. `autoCacheWithKey` will need to be implemented manually.
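Finally, a rough sketch of what calling a generated dataloader might look like; the `Loaders` field name and `Load` method are assumptions based on the generated function seen earlier in the conversation, not confirmed by this page:

```go
// Hypothetical call site: loaders is the generated Loaders struct from
// api_gen.go, and CountAdmiresByFeedEventID is an assumed field name.
count, err := loaders.CountAdmiresByFeedEventID.Load(feedEventID)
if err != nil {
	return err
}
fmt.Printf("feed event has %d admires\n", count)
```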