[Cosmos] Add APIs to perform single-partition queries against a container #1814

Merged
analogrelay merged 17 commits into feature/track2 from ashleyst/cosmos-query-items
Sep 26, 2024

Conversation

analogrelay
Member

@analogrelay analogrelay commented Sep 19, 2024

Closes #1809

This adds the initial rudimentary APIs to perform a single-partition query against items in a container.

Some example code (see the cosmos_query example for a larger example):

let cosmos_client = CosmosClient::new(...);
let db_client = cosmos_client.database_client("SampleDB");
let container_client = db_client.container_client("SampleContainer");
let mut items_pager =
    container_client.query_items::<serde_json::Value>("SELECT * FROM c", "some_partition_key", None)?;

// ".next()" requires importing futures::StreamExt
while let Some(page) = items_pager.next().await {
    for item in page?.items {
        println!("    * {:?}", item);
    }
}

A few pieces of note:

  1. A query can be parameterized. Instead of passing the raw string, call Query::from("...") and then call .with_parameter("@name", "value") on the resulting Query to build up the query and its parameters.

  2. We only support single-partition queries. This is aligned with the support we currently provide in other SDKs that only support Gateway mode. Over time we can expand this to cross-partition queries, but there are significant limitations to those.

  3. For serializing Partition Keys we have to do something a little custom and can't just use serde_json as-is. The partition key for a query is encoded as an HTTP header, which means we have to escape any non-ASCII characters, because HTTP headers may not contain them. serde_json's default serialization does not perform that escaping; it lets non-ASCII UTF-8 characters pass through unescaped. Even Rust's built-in escaping is unsuitable because it uses the format \u{XXXX}, when JSON requires \uXXXX (sans-{}). So we do some custom stuff. Importantly, we still use the standard library's logic to encode characters as UTF-16, which handles surrogate pairs properly.

  4. UPDATE: I reverted this change and decided to leave Pageable as-is. The original note follows. I changed Pageable to include a lifetime parameter. Previously, it expected the provided closure to return a future with the 'static lifetime, which requires the future to own all the data it needs. However, the implementation I used for query_items required the future to borrow self.pipeline, so it could use the same HTTP pipeline to make the requests. That isn't strictly necessary. I could clone the pipeline and give the clone to the closure, but that seemed heavier to me, so I hesitated to do that. After the conversations we've had about lifetimes, though, I'm waffling on that and considering rolling that change back and requiring that the Pageable hold its own HTTP pipeline and avoid being coupled to the lifetime of the client that created it. I'll continue to look into that.
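For readers who want to see the escaping from item 3 concretely, here is a minimal, std-only sketch. It is not the code in this PR: the function name `escape_non_ascii` is hypothetical, and it assumes its input is already valid JSON text (for example, the output of a JSON serializer), so any non-ASCII character can only appear inside a JSON string.

```rust
/// Hypothetical helper (not the PR's implementation): rewrites every
/// non-ASCII character in already-serialized JSON text as `\uXXXX` escapes,
/// so the result is safe to place in an HTTP header value.
fn escape_non_ascii(json: &str) -> String {
    let mut out = String::with_capacity(json.len());
    // A single `char` needs at most two UTF-16 code units (a surrogate pair).
    let mut units = [0u16; 2];
    for ch in json.chars() {
        if ch.is_ascii() {
            out.push(ch);
        } else {
            // The standard library does the UTF-16 encoding, including
            // splitting characters outside the BMP into surrogate pairs.
            for unit in ch.encode_utf16(&mut units).iter() {
                out.push_str(&format!("\\u{:04X}", unit));
            }
        }
    }
    out
}
```

ASCII input passes through unchanged, while a character outside the Basic Multilingual Plane becomes two `\uXXXX` escapes, one per surrogate.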

There are also fairly substantial API docs in place. We don't have a great way to preview those docs (maybe we should? publish updated docs to an Azure Static Web App on PR build?) but you can run cargo doc --package azure_data_cosmos --open, if you're on a local machine or RDP, or use #1808, if on a codespace, to view them.

@analogrelay
Member Author

analogrelay commented Sep 19, 2024

UPDATE: I decided to just leave things as-is in Pageable and do the extra clones for now


> I'm waffling on that and considering rolling that change back and requiring that the Pageable hold its own HTTP pipeline and avoid being coupled to the lifetime of the client that created it. I'll continue to look into that.

I was just reminded why I went with the lifetime approach when I looked at this earlier in the week. The problem is that if we continue to require that Pageable hold only 'static lifetime values, we have to clone the pipeline in each invocation. So you end up having to write this:

let pipeline = self.pipeline.clone(); // A clone so we can move into the closure.
Pageable::new(move |continuation| {
  // Another clone so we can move in to the `async` block.
  let pipeline = pipeline.clone();
  async move {
    pipeline.send(...).await...
  }
});

Currently async closures are unstable, which forces us to use an async block inside a regular closure. I think that may be part of the requirement for the clone.
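To make the double clone concrete, here is a self-contained sketch using only the standard library. `Pipeline`, `make_fetcher`, `PageFuture`, and the tiny `block_on` executor are all hypothetical stand-ins, not azure_core types; the point is only that a regular closure returning a `'static` future has to own one clone itself and hand a second clone to each future it creates.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for an HTTP pipeline (not azure_core's type).
#[derive(Clone)]
struct Pipeline {
    policies: Vec<Arc<String>>,
}

impl Pipeline {
    /// Pretends to fetch one page of results.
    async fn send(&self, continuation: Option<u32>) -> String {
        format!("page {} via {} policies", continuation.unwrap_or(0), self.policies.len())
    }
}

/// A boxed future with no borrowed data (`Box<dyn ...>` defaults to 'static).
type PageFuture = Pin<Box<dyn Future<Output = String>>>;

/// Builds a page-fetching closure that does not borrow from `pipeline`.
fn make_fetcher(pipeline: &Pipeline) -> impl Fn(Option<u32>) -> PageFuture {
    // Clone 1: the closure must own its pipeline.
    let pipeline = pipeline.clone();
    move |continuation: Option<u32>| -> PageFuture {
        // Clone 2: each future must own its pipeline too, because the
        // closure can be called again while an earlier future is alive.
        let pipeline = pipeline.clone();
        Box::pin(async move { pipeline.send(continuation).await })
    }
}

/// Waker that does nothing; enough for futures that never wait on I/O.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor so the sketch runs without an async runtime.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Because the closure owns its data, it keeps working after the original `Pipeline` value is dropped, which is the property the `'static` bound buys.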

Another option might be to allow a Pageable to hold on to an azure_core::Pipeline itself (perhaps using an Option so that it's up to the caller if they want to do that). I don't mind cloning the pipeline to pass it in to the Pageable, but I'm concerned about doing that on every request.

Having said that, it looks like maybe that's what the existing unofficial SDK code does? Maybe I'm fussing about a non-issue?

Would be interested to hear any thoughts you have here @heaths. Putting a lifetime parameter on Pageable allows it to borrow the client itself, in effect you'd write client methods like this:

fn query_items<'a>(&'a self, query: &str) -> Pageable<'a, ...>;

In practice, the lifetimes are all elided, so you don't see them. The main impacts of putting a lifetime parameter on Pageable that I can see are:

  1. A Pageable cannot outlive the client that generated it. But I can't really think of a viable scenario where you'd need it to...
  2. Writing out the Pageable type can sometimes be more verbose (requires lifetime parameters), though as I said above, lifetime elision often has you covered there.
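The trade-off in the list above can be sketched with a simplified, synchronous model. Everything here is hypothetical: `Pageable` is modelled as a plain `Iterator` rather than an async stream, and `ContainerClient` and `Pipeline` are toy structs. It only illustrates how an elided lifetime ties the pager to the client that created it.

```rust
/// Toy stand-ins, not the SDK's types.
struct Pipeline {
    name: String,
}

struct ContainerClient {
    pipeline: Pipeline,
}

/// A pager that borrows the client's pipeline instead of cloning it.
struct Pageable<'a> {
    pipeline: &'a Pipeline,
    query: String,
    next_page: u32,
}

impl ContainerClient {
    /// The elided `'_` takes the lifetime of `&self`, so the returned
    /// pager cannot outlive the client.
    fn query_items(&self, query: &str) -> Pageable<'_> {
        Pageable {
            pipeline: &self.pipeline,
            query: query.to_owned(),
            next_page: 0,
        }
    }
}

impl<'a> Iterator for Pageable<'a> {
    type Item = String;

    /// Pretends the query has exactly two pages.
    fn next(&mut self) -> Option<String> {
        if self.next_page >= 2 {
            return None;
        }
        let page = format!(
            "{}: page {} of `{}`",
            self.pipeline.name, self.next_page, self.query
        );
        self.next_page += 1;
        Some(page)
    }
}
```

In this model, dropping the client while a `Pageable` is still in use is a compile error, and a function that wants to return a pager has to spell out or propagate the lifetime.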

@heaths
Member

heaths commented Sep 24, 2024

Cloning the pipeline is designed to be cheap. It's a Vec<Arc<dyn Policy>>. For this reason, we already have a guideline about cloning it for subclients. I don't see why pageables - subclients, really - shouldn't do the same. Introducing lifetimes isn't out of the question, but it can introduce problems for callers. What if they want to return a pageable to complete later? It's infectious.

For now until we get substantial feedback otherwise, I think we should just clone the pipeline.
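As a rough illustration of why that clone is cheap, here is a std-only sketch. The `Policy` trait, the two policy structs, and the helper functions are hypothetical stand-ins; only the `Vec<Arc<dyn Policy>>` shape comes from the comment above. Cloning copies the vector of pointers and bumps reference counts, while the policies themselves stay shared.

```rust
use std::sync::Arc;

/// Hypothetical policy trait, standing in for the real one.
trait Policy {
    fn name(&self) -> &str;
}

struct RetryPolicy;
impl Policy for RetryPolicy {
    fn name(&self) -> &str {
        "retry"
    }
}

struct LoggingPolicy;
impl Policy for LoggingPolicy {
    fn name(&self) -> &str {
        "logging"
    }
}

/// Same shape as described above: a vector of shared policies.
#[derive(Clone)]
struct Pipeline {
    policies: Vec<Arc<dyn Policy>>,
}

/// Builds a two-policy pipeline for the example.
fn sample_pipeline() -> Pipeline {
    let retry: Arc<dyn Policy> = Arc::new(RetryPolicy);
    let logging: Arc<dyn Policy> = Arc::new(LoggingPolicy);
    Pipeline { policies: vec![retry, logging] }
}

/// True when both pipelines point at the very same policy allocations.
fn shares_policies(a: &Pipeline, b: &Pipeline) -> bool {
    a.policies.len() == b.policies.len()
        && a.policies
            .iter()
            .zip(&b.policies)
            .all(|(x, y)| Arc::ptr_eq(x, y))
}
```

After `let cloned = original.clone();`, `shares_policies(&original, &cloned)` holds and each policy's strong count is 2, so the cost per clone is one small allocation for the vector plus one atomic increment per policy.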

@heaths
Member

heaths commented Sep 24, 2024

> For serializing Partition Keys we have to do something a little custom and can't just use serde_json as-is. The partition key for a query is encoded as an HTTP header, which means we have to escape any non-ASCII characters because HTTP headers may not contain non-ASCII characters. Serde's default JSON serialization does not perform that escaping, it allows non-ASCII UTF-8 characters to pass through unescaped. Even Rust's built-in escaping is unsuitable because it uses the format \u{XXXX}, when JSON requires \uXXXX (sans-{}). So we do some custom stuff. Importantly, we still use the standard library's logic to encode characters as UTF-16, which handles surrogate pairs properly.

IIRC, we have some helper functions already that might be close or even do that. If they don't, I suggest actually adding some globals. You definitely won't be the only ones writing "params" as headers. That's another reason why I'm pretty sure we already have serde de/serialization helpers for that.

Update: Never mind. I'm conflating https://github.com/Azure/azure-sdk-for-rust/blob/feature/track2/sdk/typespec/typespec_client_core/src/date/mod.rs but the idea could be the same, along with implementing AsHeader if appropriate.

All that said, I haven't reviewed the code just yet; I'm going through your ample PR comments above first.

Member

@heaths heaths left a comment


Overall, I love it! A few nits and questions, but nothing blocking.

@analogrelay
Member Author

> Update: Never mind. I'm conflating https://github.com/Azure/azure-sdk-for-rust/blob/feature/track2/sdk/typespec/typespec_client_core/src/date/mod.rs but the idea could be the same, along with implementing AsHeader if appropriate.

I initially started by implementing AsHeader, but AsHeader is infallible and serializing a partition key to a header is fallible (serializing to JSON can fail). I could refactor later, but this is largely internal code (converting PartitionKey to a header is internal, that is).

@analogrelay
Member Author

The next iteration is live, and I think I've covered all the feedback. I'm going to try to land this soon so that I can move on to item CRUD (#1811).

@analogrelay analogrelay merged commit d97b0d1 into feature/track2 Sep 26, 2024
43 checks passed
@analogrelay analogrelay deleted the ashleyst/cosmos-query-items branch September 26, 2024 17:43
4 participants