[render_vtk] Remove X server requirement #21700
Comments
See #21050 (comment) for a WIP idea. |
What we're looking for from Kitware is this part. It's OK to hard-code the rendering to EGL during that initial prototyping.
If this part is too high of a learning curve for Kitware people, TRI people can collaborate on adding the params flag and wiring it up. |
Thank you for clarifying @jwnimmer-tri. I hope to have the initial build available for prototyping soon. We can then iterate over the flags required for switching between engines. |
With EGL, setting Then, get all the tests passing and open a draft PR. For the RenderEngineVtkParams config flag for which backend to use, we should allow it to be blank, in which case Drake gets to choose a default (for now, GLX). On macOS, the new param must be blank because neither GLX nor EGL is wired up for macOS within Drake. (We won't have a "Cocoa" string option.) |
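The blank-means-default rule described above could be sketched roughly as follows. This is only an illustration of the selection logic, not Drake's code: the names resolve_backend, SUPPORTED_BACKENDS, and DEFAULT_LINUX_BACKEND are hypothetical, and the platform strings follow Python's sys.platform convention.

```python
import sys

# Hypothetical names throughout: this mirrors the rule described in the
# comment above and is not Drake's actual implementation.
SUPPORTED_BACKENDS = {"GLX", "EGL"}
DEFAULT_LINUX_BACKEND = "GLX"  # "for now", per the discussion


def resolve_backend(backend: str, platform: str = sys.platform) -> str:
    """Return the backend to use, or "" when the platform has no choice."""
    if platform == "darwin":
        # Neither GLX nor EGL is wired up on macOS, so only blank is valid.
        if backend != "":
            raise ValueError("backend must be blank on macOS")
        return ""
    if backend == "":
        # Blank lets the library pick its default.
        return DEFAULT_LINUX_BACKEND
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported backend: {backend!r}")
    return backend
```

Rejecting a non-blank value on macOS, rather than silently ignoring it, keeps a misconfigured params struct from appearing to work.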
This issue is becoming ever-more important to TRI. Now that we're back after the holiday, anything we can do to help move this faster would be appreciated. |
@jwnimmer-tri Trying to track down why the test However, things fall apart after Any insight as to why there might be a difference in the clone renderer? |
I think TRI would be happy to help give ideas and/or debug it (and I'll CC @SeanCurtis-TRI who will probably have better answers than me), but surely we can't do that in the absence of any code? Please open a (draft) pull request with the code, so we have somewhere to start from. It's fine if the code is unfinished / unclean for now. |
Without having seen the code, the most probable cause is that there is some property in an actor (or mapper or some such thing) that isn't currently being copied. The VTK artifacts don't have clean copy semantics (as far as I know). As such, the cloning mechanism inside Drake explicitly constructs new instances, mapping over a subset of values into the clone. If some property, important for EGL, is omitted in that explicit creation/copy act, then the cloned version will have insufficient data. So, you should look at the cloning code and see if there is some obviously missing property. |
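To make the "explicit copy drops a property" failure mode concrete, here is a toy sketch in plain Python. It does not use VTK or Drake; ToyActor, clone_explicit, and differing_fields are hypothetical names, and backface_culling merely stands in for whatever property might matter to EGL.

```python
from dataclasses import dataclass, fields


@dataclass
class ToyActor:
    """Hypothetical stand-in for a VTK actor; not a real VTK type."""
    color: tuple
    opacity: float
    backface_culling: bool  # placeholder for "some property important for EGL"


def clone_explicit(src: ToyActor) -> ToyActor:
    # Explicit construction that copies only a subset of values.
    # backface_culling is forgotten and silently falls back to a default.
    return ToyActor(color=src.color, opacity=src.opacity, backface_culling=False)


def differing_fields(src: ToyActor, dst: ToyActor) -> list:
    """List the field names whose values differ between original and clone."""
    return [f.name for f in fields(src) if getattr(src, f.name) != getattr(dst, f.name)]


original = ToyActor(color=(1.0, 0.0, 0.0), opacity=1.0, backface_culling=True)
clone = clone_explicit(original)
print(differing_fields(original, clone))  # prints ['backface_culling']
```

A field-by-field comparison like differing_fields is one cheap way to audit a hand-written clone for omissions, assuming the properties of interest can be enumerated.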
I've pushed my debug branch here: master...sankhesh:drake:egl-rebased Yes, I also think that there's something weird going on with the The branch above also writes out the rendered views as png files so I could introspect. |
Let me know if you definitively want me to explore. I'll wait til I hear from you to avoid duplication. |
Yes please take a look. I'm trying to decipher the vtk usage as I debug but it would be much faster for you. If you don't see anything major, I can try reproducing it with a minimal example outside of drake. |
I typically do the "reproduce outside of Drake" when wrestling with VTK. I find it very helpful to convince myself that my problems are born of Drake and not VTK (it usually is, but I've been able to submit enough fixes to VTK to remain convinced it's a worthwhile endeavor). I'll take a look at your branch later today. |
BTW My first guess is that this isn't a material or geometry problem. I'm guessing it's a light problem. I'll let you know how it goes. |
And yet another perturbation: if I displace the camera a bit (so the sphere is no longer centered), I get: The mystery deepens.
|
The face culling we're seeing in the clone happens in the depth image as well. (Presumably it happens in the label image too, but because we can see through to the other side, it doesn't appear in the rendered label image.) The curious thing is that as the faces get "removed", what is revealed is the correct rendering. (Same for the depth image.) So, it's like the mapper is getting something backwards...all normals flipped? But then as the relationship between camera and geometry changes, some faces get culled away and we see through to the correct faces. (Note: I tried toggling two-sided shading in the renderers, and it made no difference to the rendered output.) |
If I delete the original engine prior to rendering, the cloned rendering comes out exactly right. The clone shares some data with the original, either it's too much or not enough. |
FYI This feels like it might be related to #20002. Clones fighting each other. The fight with the EGL renderer may be different than the fight with the OpenGL renderer, but fights are still happening? It may simply be that the test that is failing now was previously insufficiently sophisticated to reveal the failure. Furthermore, our usage may also be contributing to not falling into the trap even though Drake has an original |
Ok, it's definitely interference between the two render engines. I changed the test to do the following:
The first rendering is correct, the second is messed up, the third is correct again. Surprisingly, the fourth is also correct. The same thing happens if I flip the render engines in steps 3-6. So, it seems like one RenderEngine is updating the VTK state for itself, and the other accepts it (in some sense) even though it isn't quite appropriate for it. But after one rendering, it gets a chance to correct for itself. But those corrections appear to stick? I'm not sure if that's completely true or if there's some sequence of further renderings that would show further corruption of the image. I'm going to try this same test on the original OpenGL implementation to see if we get a similar outcome. Stay tuned... |
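For anyone who wants to repeat this kind of interleaving experiment, a minimal harness might look like the sketch below. Everything here is hypothetical: check_interleaved and make_toy_renderer are illustration names, the "engines" are toy callables that share a dict, and the toy's failure pattern is deliberately simple and does not reproduce the correct/wrong/correct/correct sequence reported above.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[int]]


def check_interleaved(
    renderers: Dict[str, Callable[[], Image]],
    order: Sequence[str],
    reference: Image,
) -> List[Tuple[str, bool]]:
    """Render in the given order; record whether each image matches the reference."""
    results = []
    for name in order:
        image = renderers[name]()
        results.append((name, image == reference))
    return results


def make_toy_renderer(name: str, shared: dict) -> Callable[[], Image]:
    """Toy engine: output is corrupted if another engine touched shared state last."""
    def render() -> Image:
        corrupted = shared["last"] not in (None, name)
        shared["last"] = name
        return [[0]] if corrupted else [[255]]
    return render


shared_state = {"last": None}  # stands in for data that two engines share
toy_renderers = {
    "original": make_toy_renderer("original", shared_state),
    "clone": make_toy_renderer("clone", shared_state),
}
report = check_interleaved(
    toy_renderers, ["original", "clone", "original", "clone"], reference=[[255]]
)
```

With real engines, the callables would wrap each engine's render call and the reference would be a known-good image; recording a pass/fail per step makes it easy to see whether a bad render depends on which engine rendered previously.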
Nope. It doesn't happen with the OpenGL-backed RenderEngineGl. So, while we know there's some form of interference of the other kind (#20002), it's not the same source as what's affecting us with the EGL-backed implementation. Let me know if you'd like me to push my current experiment to your branch. |
Hi @SeanCurtis-TRI Thank you for your help investigating this. I am still trying to wrap my head around the rendering pipeline in drake. My assumption was that each run is a single render engine - so when I run the vtk tests, vtk is solely responsible for rendering. Based on your last comment about interference, that understanding seems wrong. How do two render engines play with each other? |
Note that in the latest change, I trick the logic to add the cloned sphere after the cloned plane to the renderer. Without this trick, the sphere doesn't show up. I did a poor job explaining this in #21700 (comment) but essentially, there is something fundamentally flawed with the cloned renderer itself. The order of actor addition shouldn't determine depth/visibility/occlusion - point coordinates and opacity should. |
Yes, please feel free to. |
! [remote rejected] egl-rebased -> egl-rebased (permission denied)
error: failed to push some refs to 'github.com:sankhesh/drake.git'
So, the two independent pipelines (which can be configured independently), nevertheless share data. We've got two possible problems:
I would recommend taking the operations on the VTK code out of Drake and running them directly against VTK. Build the original pipeline, clone it in the same way, perform some renderings and see if they interfere. It'll be tedious. The best case scenario is that you observe interference outside of Drake (then we know it's something in VTK). If we don't observe interference, there are two possibilities:
I'd propose we do the "extraction experiment" first and see if we're lucky. If it doesn't reveal a corresponding issue, then we can look at the test and persuade all parties that it captured everything Drake is doing with VTK. After that, we can look at what Drake is doing on top of it. |
@SeanCurtis-TRI Just a quick update that I am still looking into this issue and I think it is just a manifestation of #20002. With EGL it seems to happen all the time. |
The PR #22025 mostly finishes this work. After that, only a few loose ends remain:
|
(This was split out of #21050.)
Is your feature request related to a problem? Please describe.
There's no real reason for an X server to be a strict requirement for this renderer. It makes it harder to deploy to headless environments such as containers.
Describe the solution you'd like
Build VTK and RenderEngineVtk with EGL support, and provide a RenderEngineVtkParams config option to select EGL instead of GLX.
Describe alternatives you've considered
Users can do xvfb but that does non-HW-accelerated rendering and would be a bottleneck.