
Convention of the World Frame for Rerun Visualization #9

Closed
Nik-V9 opened this issue Jul 15, 2024 · 5 comments
Nik-V9 commented Jul 15, 2024

Hi, thanks for releasing the code!

I find that the world-frame coordinate convention changes across different sets of input images.

In DUSt3R, using an RDF convention (X right, Y down, Z forward) seemed to always give a consistent visualization, with the point cloud oriented correctly in the viewer at initialization. This is exactly what the Rerun visualizer in Mini-DUSt3R does: https://github.com/pablovela5620/mini-dust3r/blob/b3f2ec7c829f4ae2ba46f603a19fd2f9107f47cb/mini_dust3r/api/inference.py#L36

However, in MASt3R, to get the point cloud oriented correctly in the viewer at initialization, I observe that the convention has to change for different sets of input images. Meanwhile, the local camera convention is unchanged (still RDF), and I don't have this issue when I transform everything relative to the first frame (the point cloud and other visualizations come out oriented correctly).

I was wondering whether this behavior is related to the changed global BA procedure between DUSt3R and MASt3R. Is it caused by the avg-angle canonical-view transforms performed in sparse_ga?

yocabon (Contributor) commented Jul 15, 2024

Hi,
it's probably because poses are not initialized to a good guess in the sparse GA, unlike the global alignment used in DUSt3R. All poses start from identity, so the scene may end up in an arbitrary orientation. As you say, it's fine if you transform everything relative to the first frame.
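Not part of the original reply, but a quick NumPy sketch of why re-expressing everything in cam0 space fixes the arbitrary orientation: any global rigid transform G applied to all camera-to-world poses cancels out of the cam0-relative poses, since inv(G @ c2w0) @ (G @ c2wi) = inv(c2w0) @ c2wi. The helper names here are illustrative, not from the MASt3R codebase.

```python
import numpy as np

def random_pose(rng):
    # Random rigid 4x4 camera-to-world matrix (QR gives an orthonormal basis).
    q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    q *= np.linalg.det(q)  # force a proper rotation (det = +1)
    pose = np.eye(4)
    pose[:3, :3] = q
    pose[:3, 3] = rng.standard_normal(3)
    return pose

rng = np.random.default_rng(0)
cams2world = [random_pose(rng) for _ in range(4)]
G = random_pose(rng)  # an arbitrary global orientation of the world frame

# cam0-relative poses, with and without the global transform applied
rel = [np.linalg.inv(cams2world[0]) @ c for c in cams2world]
rel_g = [np.linalg.inv(G @ cams2world[0]) @ (G @ c) for c in cams2world]

assert np.allclose(rel, rel_g)         # G cancels out of the relative poses
assert np.allclose(rel[0], np.eye(4))  # and cam0 itself lands at identity
```

So whatever orientation the sparse GA converges to, the cam0-relative poses are the same, which is why the visualization becomes consistent.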

Nik-V9 (Author) commented Jul 15, 2024

Ah, I see. Thanks for the clarification! The relative transform works fine; I'll just use that instead.

Nik-V9 closed this as completed Jul 15, 2024
yt2639 commented Jul 24, 2024

Hi @yocabon and @Nik-V9, could you elaborate on how to "transform everything relative to the first frame"?

I noticed that the demo has this line:

scene.apply_transform(np.linalg.inv(cams2world[0] @ OPENGL @ rot))

I assume you are talking about this? If so, how exactly do we do it? I did look at trimesh's documentation and code, but it's too complicated for me to parse what exactly this line does.

I am now trying to export poses estimated by MASt3R to a NeRFStudio-compatible format. If I directly export the poses following the discussion in this issue in DUSt3R, the poses are completely off (I also created a new issue in DUSt3R). So I am wondering whether I should transform the poses of the other images to be relative to the first one before saving/exporting the poses to NeRFStudio.

I appreciate very much your help. Thanks!

yocabon (Contributor) commented Jul 24, 2024

"Transform everything relative to the first frame" means expressing all poses in cam0 space instead of world space:

import numpy as np

cams2world = scene.get_im_poses().cpu().numpy()  # (N, 4, 4) cam_i-to-world matrices
world_to_c0 = np.linalg.inv(cams2world[0])  # world to cam0

cams_to_c0 = []
for i in range(len(cams2world)):
    ci_to_world = cams2world[i]  # cam_i to world
    ci_to_c0 = world_to_c0 @ ci_to_world  # cam_i to cam0
    cams_to_c0.append(ci_to_c0)
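Not part of the original reply, but for the NeRFStudio export question above: a hedged sketch assuming DUSt3R/MASt3R poses use the OpenCV/RDF camera convention (X right, Y down, Z forward), while NeRFStudio's transforms.json expects OpenGL-style camera-to-world matrices (X right, Y up, Z back). The usual conversion negates the y and z camera axes; `opencv_to_opengl` is an illustrative helper name, not a MASt3R function.

```python
import numpy as np

def opencv_to_opengl(c2w):
    """Flip the y and z camera axes of an OpenCV-convention
    camera-to-world matrix to get the OpenGL convention."""
    c2w = np.array(c2w, dtype=np.float64, copy=True)
    c2w[:3, 1:3] *= -1.0  # negate the y and z basis vectors
    return c2w

# Example: the RDF identity pose becomes an RUB pose with flipped y/z columns.
c2w_gl = opencv_to_opengl(np.eye(4))
assert np.allclose(c2w_gl[:3, 1], [0, -1, 0])  # y now points up
assert np.allclose(c2w_gl[:3, 2], [0, 0, -1])  # z now points backward
```

Applying this to each matrix in `cams_to_c0` before writing them out should give poses in a convention NeRFStudio can consume, though double-check against the NeRFStudio data-format docs for your version.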

yt2639 commented Jul 26, 2024

Thanks a lot for your explanation!
