Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid const generics for nD #822

Open
b4l opened this issue Oct 9, 2024 · 14 comments
Open

Avoid const generics for nD #822

b4l opened this issue Oct 9, 2024 · 14 comments

Comments

@b4l
Copy link
Contributor

b4l commented Oct 9, 2024

It's very hard up to impossible to write something generic/dynamic around a NativeArray.

@kylebarron
Copy link
Member

kylebarron commented Oct 9, 2024

Can you provide any more detail? Can you list what you've tried? There are couple dozen examples of writing generic code on a NativeArray in the algorithm mod.

It's imperative that geoarrow-rs is able to represent dimensions more than just XY, and so the generic is unlikely to be removed.

I've also been experimenting with a NativeGeometryAccessor trait, so you can implement a function like

pub fn algorithm(array: impl NativeGeometryAccessor<2>) {
    for i in 0..array.len() {
        let geom = array.value_as_geometry(0);
        let geo_geom = geom.to_geo();
        // Do something with your geometry
    }
}

And it'll automatically work on any of the native arrays with dimension 2.

@b4l
Copy link
Contributor Author

b4l commented Oct 10, 2024

In the end, it didn't matter anymore. So wrapping into a dyn NativeArray is the way to go. I was at some point trying to return a typed generic array which is obviously impossible.

Still believe there are better ways to model multiple dimensions. Where do you see the benefit, and why is it unlikely to go away?

@kylebarron
Copy link
Member

Right you can't return a typed generic array.

Still believe there are better ways to model multiple dimensions.

Like what? Would love some examples

Where do you see the benefit, and why is it unlikely to go away?

We need to be able to statically know the memory layout of the array. Otherwise we'll pay a dynamic dispatch penalty when accessing every row of every child of every array. That means we need to know "This is a polygon array with 2D geometries" or "this is a MultiPoint array with 3D XYZ geometries".

@b4l
Copy link
Contributor Author

b4l commented Oct 11, 2024

Like what? Would love some examples

For example rstar and nalgebra.

I don't get why we need to know the memory layout statically. Arrow is not using const generics, and we hide the const generics behind a dyn NativeArray. We could just add a field on the type instead which would provide the same guarantees/functionality, no?

@kylebarron
Copy link
Member

kylebarron commented Oct 11, 2024

I'm less familiar with nalgebra, but the numpy rust binding has both generic and untyped array constructs. The generic in geoarrow arrays is equivalent to the generics on PyArray, which needs to statically know type and dimension. The dyn NativeArray is similar to dyn Array in arrow or to the numpy PyUntypedArray, which makes you downcast to a concrete array to do more operations.

@b4l
Copy link
Contributor Author

b4l commented Oct 22, 2024

As far as I can tell, const generics are only used for the dimensions, not for the actual data. This is basically what I try to suggest here.

@kylebarron
Copy link
Member

The const generics are used in the underlying coordinate buffers:

#[derive(Debug, Clone, PartialEq)]
pub struct SeparatedCoordBuffer<const D: usize> {
pub(crate) buffers: [ScalarBuffer<f64>; D],
}

@b4l
Copy link
Contributor Author

b4l commented Oct 22, 2024

That would need to change of course as well.

@kylebarron
Copy link
Member

If we don't have a const generic there, we would need to either pay a runtime penalty of inferring the dimension for every coordinate, or we'd need to always store 4 dimensions even if we're only using two.

I still don't understand what you're proposing

@kylebarron
Copy link
Member

const generics are only used for the dimensions, not for the actual data

I don't understand this statement. 3D coordinates are actual data

@b4l
Copy link
Contributor Author

b4l commented Oct 22, 2024

Well dimensions in ndarray are always 2 for geometries, only the shape changes which are not const generics. Geoarrow is using const generics for the shape. There would be a negligible overhead since the value is valid for the whole array and hence only needs to be checked once.

@kylebarron
Copy link
Member

kylebarron commented Oct 22, 2024

Well dimensions in ndarray are always 2 for geometries, only the shape changes which are not const generics.

That is relevant for geoarrow for the interleaved coordinate buffer case, but not for the separated coordinate buffer case. To handle separated coordinates we need to have one separate underlying Arrow buffer per coordinate dimension. So we either need to have a Vec<Buffer>, which then has an extra heap indirection on every coordinate access, or we can have an array of buffers ([ScalarBuffer<f64>; D]), but that needs a const generic

@b4l
Copy link
Contributor Author

b4l commented Oct 22, 2024

Also it would still be typed just not directly with a const generic.

@kylebarron
Copy link
Member

I still have no clue what you're proposing.

You're welcome to put up a prototype PR. To be clear, I would love to not have the const generic, but I think it's the only way to satisfy the goals of compile-time safety with libraries that don't support more than 2 dimensions (e.g. geo) and avoiding performance overhead on coordinate access.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants