Implement Pagination Features for FHIRClient and Bundle Classes #174

LanaNYC · 2024-09-03T19:54:19Z

Overview

This pull request aims to address pagination functionalities as outlined in issue #172. The implementation includes the following key changes:

Refactoring FHIRSearch perform() and perform_resources with iterations.
Adding __iter__ method to the Bundle class to handle pagination.
Moving pagination logic to _utils.py for better modularity and reuse.
Adjusted tests to accommodate the new iterable-based approach.

mikix · 2024-09-04T18:21:28Z

I would greatly appreciate any guidance from the maintainers or community members on resolving the Superfluous entry error. This will allow me to complete the testing phase and ensure the robustness of the new pagination features.

I believe this is because of some bad indenting in fhirclient/models/bundle.py - def elementProperties lost its indenting and is now a toplevel function instead of a class method. Adding back the indenting seems to surface more reasonable errors.

Note that manually editing bundle.py like that is fraught - most model files like that will get overwritten the next time ./generate_models.sh is run. But that's fine! We have options:

Copy/paste a copy of bundle.py and put it in ./fhir-parser-resources/ and edit the generation machinery to copy it over (this has downsides of code-drift over time)
Make some of these changes in fhir-parser first, then update this repo to consume the changes (forward-port the changes)
Make the changes here, where it's easier/faster, then go back and back-port the changes to fhir-parser too (and then probably also update this repo on top of that, removing the custom changes to get fully in sync again)

I'm down for whatever - that last option is probably easiest for just getting something that works. And maybe the solution also depends on whether we think this feature "belongs" in fhir-parser. Just wanted to flag the close relationship these two repos have.

LanaNYC · 2024-09-09T13:38:51Z

@mikix Thank you. I'll have more time this week and hope to finish this PR.

mikix

Looks great! Thank you! Comments below

fhirclient/_utils.py

fhirclient/models/bundle.py

mikix · 2024-09-23T13:51:35Z

fhirclient/models/fhirsearch.py

+        if not first_bundle:
+            return iter([])
+        yield first_bundle
+
    def perform_resources(self, server):


nit: while we're here, maybe add -> list['Resource']: on this method, which should help differentiate how this is different than the _iter version.

(And probably add -> 'Bundle': to perform() too.)

mikix · 2024-09-23T13:55:04Z

fhirclient/models/fhirsearch.py

-
+
+    # Use forward references to avoid circular imports
+    def perform_iter(self, server) -> Iterator['Resource']:


The way this is written, it will iterate on BundleEntry objects, yeah? But we actually want Bundles.

Should this be re-written as return iter_pages(self.perform(server)) (and change the return hint to Iterator['Bundle']?

Yes, thank you for catching this. Will be fixed with next pushed code.

fhirclient/models/fhirsearch.py

mikix · 2024-09-23T14:01:47Z

fhirclient/models/fhirsearch.py

        bundle = self.perform(server)
        resources = []
        if bundle is not None and bundle.entry is not None:


nit: this code could just be return list(self.perform_resources_iter()) yeah? (just to reduce duplicated code paths, and to emphasize that the method is now just a "worse" version of the _iter one)

Actually, that change would be a behavior change - but I think an important one? Previously, this did no pagination. But that just means that we'd throw resources on the floor with no indication we had done so or way to work around that.

So I think switching this method to use pagination under the hood is a valuable fix.

Since you'd prefer to change a behavior of perfom_resources(), my current implementation of perform_iter() depend on perform(). I can refactor perform_iter() to remove this dependency while keeping both perform() and perfom_resources() with DeprecationWarning to maintain backward compatibility. Please, let me know what direction you'd prefer here.

Yeah it sounds reasonable to have the deprecation warning in both perform & perform_resources. Refactoring could maybe be as simple as moving the guts out to a helper method like _perform_perform that both call (don't name it that 😄)

mikix · 2024-09-23T14:13:10Z

fhirclient/models/fhirsearch.py

+        if not first_bundle or not first_bundle.entry:
+            return iter([])


This check isn't needed, is it? perform can't return None and the not .entry check is handled by __iter__.

This method could be written as the following, I think?

for bundle in self.perform_iter(server): for entry in bundle: yield entry.resource

mikix · 2024-09-23T14:17:20Z

tests/models/bundle_iterator_test.py

Thank you for adding these tests, love to see it ❤️

LanaNYC · 2024-09-25T15:22:40Z

@mikix Thank you for detailed feedback. I'll look at everything this weekend.

LanaNYC · 2024-10-07T15:34:37Z

Hi @mikix,

Sorry for the delay. I've got carried away by work and life. I'm ready to finish this PR now.

First of all, thank you for your feedback on the PRs! I really appreciate the thoughtful consideration about the iteration and pagination logic in Bundle.

Second, to make sure we're aligned before I proceed, here's my understanding of your comments and the next steps I should take:
Summary of your feedback:

Bundle should stay as an Iterator vs. Iterable;
Direct Iteration: You're okay with dropping that feature for now, especially since the primary iteration should handle pagination, which is already being covered in FHIRSearch.
It sounds like you are suggesting we drop the changes to fhir-parser for now and focus on keeping the iteration logic for entries within the scope of FHIRSearch or _util.py. This would simplify both PRs and reduce the complexity while still delivering value.

Subtasks:

Remove the direct iteration support in the Bundle class:

Revert the iteration logic added to the Bundle class to reduce complexity and unnecessary features.

Move iteration logic to _util.py or FHIRSearch:

Ensure that the logic for iterating over entries in Bundle is handled within _util.py or FHIRSearch, making it specific to where it is needed (e.g., handling pagination and entry iteration).

Close the open PR for fhir-parser:

Since the iteration changes in fhir-parser are no longer needed, I will close the PR for fhir-parser (Backport: Add Generator Methods to Bundle Class for Iteration fhir-parser#53) to simplify the contribution.

Update and test client-py:

After making the changes, update the client-py tests to ensure everything works as expected, especially focusing on the behavior of iter_pages.

Before I start on these changes, I’d like to confirm with you:

Does this approach sound good, or would you prefer I keep the iteration logic within the Bundle class in some capacity?
Should I proceed with closing the fhir-parser PR and refocusing everything on the client-py side?
Are there any other changes or considerations you’d like me to make before moving forward with the adjustments?

Once I hear back from you, I’ll begin working on these changes.

Thank you again for your guidance!

mikix · 2024-10-07T18:04:17Z

Welcome back to calmer times, @LanaNYC!

First off -- and I'm not saying you meant this -- but you would be entirely within your rights to be typing "thank you for your feedback on the PRs" a little sarcastically. Sorry for being so nit-picky, but I have a hard time turning that part of my brain off. 😄 And likewise, you're wise to get pre-approval in case I blow up your plan again. But hopefully the back and forth is worth it. 🤞

I think that plan looks good. 👍

I think leaving Bundle alone for now is a reasonable boundary (in hindsight).
So with this approach the only public entry point to iteration/pagination would be perform_iter and perform_resources_iter (and perform_resources behind the scenes)? That feels reasonable.

LanaNYC · 2024-10-07T22:06:43Z

@mikix Truly love your nit-picking, —please keep it up, especially with my PRs :) . It's the best way to grow and learn. Diving into fhir-parser has been a great learning experience for me, and I have no regrets.

Since you've pre-approved my plan, I’ll move forward with it. I’ll switch this PR from draft to 'Ready for review' once all the tests pass.

mikix · 2024-10-22T15:39:00Z

Oh I missed that this was marked ready-for-review! I'll review this again?

I also wanted to drop a line that I just switched the Pythons we run in CI (in #182), so you'll want to rebase so that the CI runs the now-required 3.13 tests.

LanaNYC · 2024-10-22T17:39:06Z

No problem. Rebase is done. All checks have passed. PR ready for a review. As discussed earlier, please feel free to dive into the details as much as you'd like - your thorough feedback is always appreciated!

mikix

Looks great! Got a few comments, but this is in really good shape I think!

mikix · 2024-10-22T18:25:34Z

fhir-parser

Looks like this submodule is still pointing at your old branch?

mikix · 2024-10-22T18:28:10Z

fhirclient/models/bundle.py

+
+    def __iter__(self):
+        return iter(self.entry or [])
+


I think we talked about removing this change too, yeah? It will be overridden the next time we regenerate models and it's not a huge help for folks anyway - they could just do self.entry or [] themselves.

mikix · 2024-10-22T18:30:01Z

fhirclient/_utils.py

+    next_link = _get_next_link(bundle)
+    if next_link:


nit: you absolutely don't need to use these, but python 3.8 introduced walrus operators, so this could be slightly condensed, if you liked:

if next_link := _get_next_link(bundle):

Beauty & readability is in the eye of the beholder, though.

mikix · 2024-10-22T18:33:26Z

fhirclient/_utils.py

+
+def _sanitize_next_link(next_link: str) -> str:
+    """
+    Sanitize the `next` link to ensure it is safe to use.


I asked you in this PR before, what the worries were with this - I forget the exact response - but could you add a sentence or two in this docstring about it, for posterity?

I've been testing _sanitize_next_link() more today and noticed a few potential issues. The initial idea behind sanitizing links was to implement a safeguard to protect against URL injection attacks by ensuring the link is safe to use. However, to maintain the library's flexibility, I can't simply rely on blacklisting or whitelisting parameters.

The current approach is just the first step toward safely parsing URLs, but it isn't sufficient for full validation. Users will still need to apply additional validation measures on their end.

On another note, do you think that _sanitize_next_link() is too rigid for an open-source library?
Am I overthinking it?

It might be unnecessary here, especially if it requires some extra client side work to be fully valid.

What are the risks / attack vectors to consider here? The EHR vendor you are connecting to would have to be malicious, yeah? (Like, this URL isn't coming from a human entering data at the hospital - the EHR software directly generates it as part of its search code.)

In which case, the malicious EHR doesn't need to send you to a separate server or mess with URLs. They already have you talking securely to them, and they can leak your credentials or feed you bogus data or whatever it is they want.

(I'm assuming that one would be connecting over https and other man-in-the-middle attacks are not in scope?)

If man-in-the-middle attacks are in scope - maybe it would be enough to simply validate that the hostnames (netlocs) match between origin_server and the new URL.

mikix · 2024-10-22T18:38:37Z

fhirclient/_utils.py

+
+    for link in bundle.link:
+        if link.relation == "next":
+            return link.url


I might sanitize the URL here?

If there's not a use case for getting the raw URL, we are just leaving a gap where a caller could introduce a bug by not knowing they need to sanitize it after getting it. I've seen that kind of "external context" in an API lead to bugs, once it's handed over to other devs who don't have that context in their heads.

mikix · 2024-10-22T18:55:08Z

fhirclient/_utils.py

+    return sanitized_url
+
+
+def _execute_pagination_request(sanitized_url: str) -> Optional['Bundle']:


I think the return type hint can just be 'Bundle' - this method will never return None (it may raise an exception... but otherwise will return a Bundle). (and can update the docstring below too)

mikix · 2024-10-22T18:56:47Z

fhirclient/_utils.py

+    except requests.exceptions.HTTPError as e:
+        # Handle specific HTTP errors as needed, possibly including retry logic
+        raise e


nit: small trick when you don't actually need the exception object is to just do a plain raise which re-raises the current exception.

except requests.exceptions.HTTPError: # In future: handle specific HTTP errors as needed, possibly including retry logic raise

mikix · 2024-10-22T19:06:26Z

fhirclient/models/fhirsearch.py

        res = server.request_json(self.construct())
-        bundle = bundle.Bundle(res)
+        bundle = bundle_module.Bundle(res)
        bundle.origin_server = server
        return bundle


This is not your doing, but this looks like a re-implementation of Bundle.read_from.

You might be able to reduce this whole method (post-deprecation-warning) to:

from .bundle import Bundle return Bundle.read_from(self.construct(), server)

(since it even checks if server is None for you)

mikix · 2024-10-22T19:09:10Z

tests/models/fhirsearch_perform_iter_test.py

+
+class TestFHIRSearchIter(unittest.TestCase):
+    def setUp(self):
+        # self.mock_server = MockServer(tmpdir=os.path.join(os.path.dirname(__file__), '..', 'data', 'examples'))


mikix · 2024-10-22T19:23:38Z

tests/models/fhirsearch_perform_iter_test.py

+        mock_perform.return_value = self.mock_bundle
+        mock_iter_pages.side_effect = iter([self.mock_bundle])
+        mock_fetch_next_page.side_effect = [None]


Your tests are fine - but I will say that all of this mocking can end up being brittle. Like here, you are using knowledge of the internals of perform_iter (that it calls perform()) and then mocking out a lot of the detailed utility code - where there might be bugs! But it's generally easy to get lost in the sauce when doing all this mocking. If another dev breaks this test in six month, they might not have the understanding of which micro-targeted line of code you are meant to be testing.

I know that it's a bit philosophical territory about how small to make unit tests vs functional tests vs whatever. But just practically, it can be tough to mock all the little interactions like this.

This is entirely optional for this PR, but if I were writing tests for this, I might:

Add a testing dependency on the responses mocking library. Like in pyproject.toml's optional-dependencies section, add responses to the tests list of dependencies.

Then in tests like this, you can mock things at the network boundary:

@responses.activate def test_perform_iter_single_bundle(self): responses.add(responses.GET, "https://example.com/bundle-url/", json={bundle content...})

But again, what you have here is fine - just it can be annoying when writing tests to do all the mocking like this, trying to avoid the network. But responses lets you mock further back.

LanaNYC · 2024-10-28T18:18:31Z

@mikix The latest version is ready for review. Thank you for steering me towards responses mocking library - It made my life so much easier :)

mikix

Looks fabulous - I think my only blocking concern is the authentication header comment.

mikix · 2024-10-29T14:20:23Z

fhir-parser

Interesting - this is harmless, but it does point at a different commit, one from your branch. Since the content is the same, it's not necessary to figure out how to revert this - but maybe see if it's simple to do.

PR is ready for review. I've reverted fhir-parser to the original commit, hopefully that should solve it.

mikix · 2024-10-29T14:21:41Z

fhirclient/_utils.py

+        Optional[Bundle]: The next page of results as a FHIR Bundle, or None if no "next" link is found.
+    """
+    if next_link := _get_next_link(bundle):
+        sanitized_next_link = _sanitize_next_link(next_link)


nit: this call could be dropped now that _get_next_link does the sanitizing.

mikix · 2024-10-29T14:26:03Z

fhirclient/_utils.py

+        response = requests.get(sanitized_url)
+        response.raise_for_status()
+        next_bundle_data = response.json()
+        next_bundle = Bundle(next_bundle_data)


Could/should this also use Bundle.read_from? That would set a few more headers like Accept: application/json and would also send authentication headers - which... we probably need here yeah?

It would mean sending down the FHIRServer instance, but that should be available to the callers of this function.

mikix · 2024-10-29T14:30:17Z

tests/models/fhirsearch_perform_iter_test.py

+    @responses.activate
+    def test_perform_iter_single_bundle(self):


Ah yeah this looks much cleaner with responses

…to Bundle, move pagination logic to _utils.py, add tests Simplify url sanitization, adjust testse Replace redundant code on perform() Add tests perform-iter and perform_resources_iter via Response Revert fhir-parser submodule to original commit

mikix

This looks great to me!

Thank you so much for this work!

LanaNYC force-pushed the feat/pagination-Bundle branch from 9a69be0 to 8f4a32b Compare September 20, 2024 16:15

LanaNYC changed the title ~~WIP: Implement Pagination Features for FHIRClient and Bundle Classes~~ Implement Pagination Features for FHIRClient and Bundle Classes Sep 20, 2024

LanaNYC force-pushed the feat/pagination-Bundle branch from 8f4a32b to 08a8535 Compare September 20, 2024 16:28

mikix reviewed Sep 23, 2024

View reviewed changes

LanaNYC marked this pull request as draft October 7, 2024 15:55

LanaNYC force-pushed the feat/pagination-Bundle branch 3 times, most recently from 9b5ad9b to bebe4a6 Compare October 14, 2024 19:50

LanaNYC marked this pull request as ready for review October 14, 2024 19:51

LanaNYC force-pushed the feat/pagination-Bundle branch from bebe4a6 to 2232f3a Compare October 22, 2024 17:29

mikix reviewed Oct 22, 2024

View reviewed changes

LanaNYC force-pushed the feat/pagination-Bundle branch from 2232f3a to bd60dc3 Compare October 28, 2024 18:14

mikix reviewed Oct 29, 2024

View reviewed changes

LanaNYC force-pushed the feat/pagination-Bundle branch from bd60dc3 to 3ec3ef3 Compare November 4, 2024 17:49

mikix approved these changes Nov 6, 2024

View reviewed changes

mikix merged commit ff41695 into smart-on-fhir:main Nov 6, 2024
5 checks passed

mikix mentioned this pull request Nov 6, 2024

Iterable support in FHIRSearch #172

Closed



		# Use forward references to avoid circular imports
		def perform_iter(self, server) -> Iterator['Resource']:

		if not first_bundle or not first_bundle.entry:
		return iter([])

		return sanitized_url


		def _execute_pagination_request(sanitized_url: str) -> Optional['Bundle']:

		@responses.activate
		def test_perform_iter_single_bundle(self):

Implement Pagination Features for FHIRClient and Bundle Classes #174

Implement Pagination Features for FHIRClient and Bundle Classes #174

Conversation

LanaNYC commented Sep 3, 2024 • edited Loading

Overview

mikix commented Sep 4, 2024 • edited Loading

LanaNYC commented Sep 9, 2024

mikix left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LanaNYC Oct 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LanaNYC commented Sep 25, 2024

LanaNYC commented Oct 7, 2024

mikix commented Oct 7, 2024

LanaNYC commented Oct 7, 2024

mikix commented Oct 22, 2024

LanaNYC commented Oct 22, 2024

mikix left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikix Oct 22, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikix Oct 22, 2024 • edited Loading

Choose a reason for hiding this comment

LanaNYC commented Oct 28, 2024

mikix left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mikix left a comment

Choose a reason for hiding this comment

LanaNYC commented Sep 3, 2024 •

edited

Loading

mikix commented Sep 4, 2024 •

edited

Loading

LanaNYC Oct 8, 2024 •

edited

Loading

mikix Oct 22, 2024 •

edited

Loading

mikix Oct 22, 2024 •

edited

Loading