-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++ <-> Python conversion performance #3
Comments
Tracked this down in pybind11's The bottleneck seems that pybind11 keeps a map of all registered instances and first tries to find and return the corresponding instance from the map, creating a new one if not found. In the example above (more generally in geography creation functions), we know that no instance already exists, we could skip this step. EDIT: Other potential optimizations:
Note also: the benchmark above may be biased as pybind11 probably optimizes the empty |
Now a quick benchmark to evaluate the overhead of Python -> C++ conversion: int dummy(PyObjectGeography obj) {
// equivalent to `obj.cast<Geography*>()` where obj is a `py::object`
obj.as_geog_ptr();
return 0;
}
int dummy_no_cast(PyObjectGeography obj) {
return 0;
}
int get_dimensions(PyObjectGeography obj) {
return obj.as_geog_ptr()->dimension();
}
...
m.def("dummy", py::vectorize(&dummy));
m.def("dummy_no_cast", py::vectorize(&dummy_no_cast));
m.def("get_dimensions", py::vectorize(&get_dimensions)); Results: x = np.random.rand(100_000)
y = np.random.rand(100_000)
points = s2shapely.create(x, y)
%timeit s2shapely.dummy_no_cast(points)
# 542 µs ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit s2shapely.dummy(points)
# 8.39 ms ± 29.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit s2shapely.get_dimensions(points)
# 8.93 ms ± 36.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) Significant overhead here too. |
After more investigation, it looks like most of the conversion overhead is coming from calls to In our case (at least for the Python -> C++ conversion within a vectorized function operating on geographies), we don't need to perform this lookup for each array element. It is enough (and almost free!) to just extract the pointer to the C++ object from the Python object and cast it to the Geography base class: auto inst = reinterpret_cast<py::detail::instance *>(py_obj.ptr());
return reinterpret_cast<Geography*>(inst->simple_value_holder[0]); "Just extract" is not really right, we also want to check first if we can safely cast the Python object to a Geography object (or raise), and check that in an efficient way. Unfortunately, this doesn't seem to be supported in pybind11. Maybe we could wrap This still uses pybind11's internals, though (pybind/pybind11#1572). |
Nanobind has a low-level instance interface that could make things much easier: https://github.com/wjakob/nanobind/blob/master/docs/lowlevel.md. However, it doesn't support automatic vectorization (yet?). |
I did a quick benchmark to evaluate the overhead of C++ -> Python conversion:
Here are the results:
The cast version is almost 3x slower than the no cast version. That's a lot for "just" wrapping the C++ geography in a Python object. Not sure what's happening (the Geography (sub)classes only implement move semantics, so there shouldn't be any data copy at this level), but there must be something wrong.
The text was updated successfully, but these errors were encountered: