-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tf_remap very CPU-intensive #175
Comments
This sounds promising. I'll wait for other to chime in, but if there are some sort of tests or other method to prove that the C++ node performs equivalent with the same usage to the python node, I would be in support of adding a C++ target with the name tf_remap and moving the python script ot tf_remap_legacy or similar. Could add a ROS out warning for one distro to make users aware of the change. The only breaking usage case I can think of off the top of my head is if a user launches the script with python /opt/ros/.../tf_remap. rosrun, roslaunch, and ./ should all work. I would prefer bringing the C++ version upstream for wider distribution in either case. I should be available to help this process. |
Certainly if we can get better performance from the C++ version it's definitely worth considering switching. |
As this kind of timed-out, I released the package into melodic separately. However, I'm still open for the possibility of adding it to ros/geometry. |
Should I still start a PR replacing the Python version? |
I was wondering what's eating my CPU when replaying bagfiles, and I found out it's tf_remap. In the bagfile, I have TF messages at about 1-2 kHz, and the code I'm running spawns like 10 TF listeners. This is enough to saturate 2 CPU cores on my machine just by running the tf_remap node.
I verified the remapping itself is not the bottleneck - the bottleneck really is the
rospy.Publisher.publish()
, probably not able to publish at such a high rate from python to that many subscribers.So I've written an experimental C++ tf_remap, and the CPU load is much lower (about 30-40% of one CPU core). This package also solves a few more pain points of tf_remap, like the inability to work with tf_static, or the inability to remove TF frames.
Now the question is: do we want to merge this package as a tool to geometry2 (not to geometry, since it's TF2-native)? Or should I rather release it as a standalone package? I assume that completely substituting the current Python node with the C++ one is not an option for many reasons...
Another thing after the previous question is solved: should there be a warning added to
tf/tf_remap
that it's known to be inefficient, and with a link to the C++ node if the user observes performance problems?Here's the code that can easily achieve the high-load situation I'm describing (note that
tf/static_transform_publisher
does not use/tf_static
, but it periodically publishes to/tf
, in this case at 1 kHz):For completeness, I attach profiler outputs from profiling the original Python node and the C++ node: profilers.zip . In Python,
thread.lock.acquire
takes almost all the self time, seconded bywrite_data
with 1.5% . In C++,pthread_mutex_lock
is also the most time-consuming call, but still taking only 17% of the self time.The text was updated successfully, but these errors were encountered: