Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Reuse GeoArrowArrayReader in ReaderImpl #29

Merged
merged 1 commit into from
Oct 9, 2024

Conversation

jorisvandenbossche
Copy link
Collaborator

I am not entirely sure this is desired (it gives a bit of overhead), but while trying to understand the reader/writer/builder/visitor hierarchy in geoarrow-c, I noticed that here we were not using GeoArrowArrayReader, but the separate readers for WKT/WKB (GeoArrowWKTReader/GeoArrowWKBReader) and GeoArrowArrayViewVisit for native coords, duplicating the iteration function for WKT/WKB.
While the GeoArrowArrayReader already abstracts away those three options behind a single interface.

This gives a bit more overhead for the scalar WKTReader (I assume because this constructs a geoarrow::Reader every time, and this now not just initializes the GeoArrowWKTReader, but also the other structs for WKB and native).
Given this is the scalar version (and you can always ensure to start from arrow input for optimal performance), I think that overhead is acceptable. But alternatively, we could also update GeoArrowArrayReader to only initialize the WKT/WKB readers if needed (when it gets data passed with that type), instead of upfront. Or another option is to expose the GeoArrowArrayViewVisitWKT/WKB functions in geoarrow.h, so that we can at least avoid reimplementing those.

Also added some brief doc comments.

Copy link
Owner

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

I had totally forgotten about that 😬

@paleolimbot
Copy link
Owner

This gives a bit more overhead for the scalar WKTReader

Did you measure this? I believe the Reader is reused for all calls to WriteGeography() and I would be surprised if the two malloc()s involved in initializing those C things makes a difference (unless you are creating a new WKTReader() for every feature)

@paleolimbot paleolimbot merged commit 9ebd7b1 into paleolimbot:main Oct 9, 2024
4 checks passed
@jorisvandenbossche jorisvandenbossche deleted the reader-cleanup branch October 18, 2024 09:26
@jorisvandenbossche
Copy link
Collaborator Author

I measured it, but indeed with:

unless you are creating a new WKTReader() for every feature

See https://github.com/benbovy/spherely/pull/50/files, I should move the creation of the WKTReader outside of the function that is automatically vectorized, so a single instance is reused in the inner function (but the current way was just the easiest to get this working, it's a small optimization that can be done later as well)

@paleolimbot
Copy link
Owner

Yes, definitely! I remember an early conversation with you and Ben about ensuring that the S2GeographyOptions class was reused and you should consider that as well. Eventually I'd like to refactor the functions as something like:

class XOperation : public Operation<double, S2Geography> {
public:
  Init(Options options);
  double Execute(const S2Geography& geog);
};

...to reduce replication of looping code in the R package (you can template on a class, but you can't template on a function name). This also gives the Operation an opportunity to allocate some scratch space once and reuse it (a frequent pattern is to loop over geometries that are all approximately the same size, so often this is helpful).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants