Trying to train SynSin on SceneNet database #24

Open

NagabhushanSN95 opened this issue Aug 13, 2020 · 11 comments

@NagabhushanSN95

Hi,
I'm trying to train the SynSin model on the SceneNet database, but I'm not able to get it to train. I would really appreciate it if you could give me some tips.

  1. I'm using only 2000 pairs of frames. To be specific, I'm using frames 0, 25, 3750 and 3775 from each scene of the first part of the training set, which contains 1000 scenes. So, I believe there is a considerable amount of diversity.
  2. Also, since SceneNet has ground-truth depths, I'm using them and bypassing the depth regressor network. For this, I've enabled the --use_gt_depth flag.
  3. In issue #23, it was suggested to use square images only. Since SceneNet has rectangular images (320x240), I'm cropping the frames and depth maps to 240x240, and I've modified the camera intrinsic matrix (K) accordingly (see the intrinsics sketch at the end of this comment).
  4. Even after training for 100000 epochs, the model doesn't train at all, i.e. I only get some red/blue images and nothing else. I could understand if prediction failed on test images, but it is failing even on the training images. PSNR starts from -5, increases to 1 or 2 and then goes negative again. SSIM doesn't increase beyond 0.05. What do you think could be the problem here?
  5. I tried both learning rates, the ones mentioned in the paper and the defaults in the code. Neither worked.
  6. I noticed that the l1 loss and perceptual loss (content loss) are around 0.7 or 0.8, but the GAN loss is an order of magnitude higher (around 7). So, I set the lambda values for the l1 and perceptual losses to 10. That didn't help either.
  7. The GAN loss starts low (around 5) and keeps increasing for around 80000 iterations, up to about 12, and then it almost flattens.
  8. I would assume the reason for (7) is that the discriminator is training faster than the generator. But D_Real and D_Fake have similar values in each batch (around 0.1 to 0.3), so the discriminator isn't training well either.

I don't know what else to try. Can you kindly help me out here?
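
For point 3, here is a minimal sketch of how the intrinsics could be adjusted for the 40-pixel centre crop: the focal lengths in pixels stay fixed and only the principal point shifts by the crop offset. The fov values are the SceneNet ones used above; the helper names are illustrative, not code from the repository.

import math
import numpy

def scenenet_intrinsics(hfov=60, vfov=45, width=320, height=240):
    # Pinhole intrinsics (3x3, in pixels) for the full 320x240 SceneNet frame.
    K = numpy.eye(3, dtype=numpy.float32)
    K[0, 0] = (width / 2.0) / math.tan(math.radians(hfov / 2.0))   # fx
    K[1, 1] = (height / 2.0) / math.tan(math.radians(vfov / 2.0))  # fy
    K[0, 2] = width / 2.0                                          # cx
    K[1, 2] = height / 2.0                                         # cy
    return K

def crop_intrinsics(K, x0, y0):
    # A crop only translates the principal point; fx and fy are unchanged.
    K_crop = K.copy()
    K_crop[0, 2] -= x0
    K_crop[1, 2] -= y0
    return K_crop

K_full = scenenet_intrinsics()
K_240 = crop_intrinsics(K_full, x0=40, y0=0)   # matches the image[:, 40:280] crop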

@NagabhushanSN95
Author

This is the command I'm using to start the training:

python snb/train.py --batch-size 4 --folder temp --num_workers 4 --resume --dataset scenenet --use_inv_z --accumulation alphacomposite --model_type zbuffer_pts --refine_model_type resnet_256W8UpDown64  --norm_G sync:spectral_batch --render_ids 1 --suffix '' --normalize_image --lr 0.0001 --use_gt_depth --W 240 --log-dir ../Runs/Training/Train01/%s

I wrote a DataLoader for SceneNet based on KittiDataLoader. The code is as follows:

import math
from pathlib import Path

import numpy
import skimage.io
import skimage.transform
import torch
import torch.utils.data as data


class SceneNetDataLoader(data.Dataset):

    def __init__(self, split_name, opts=None):
        super(SceneNetDataLoader, self).__init__()
        self.opt = opts
        self.dataroot = Path(opts.dataset_path) / split_name
        self.scenes = []
        for scene_num in sorted(self.dataroot.iterdir()):
            self.scenes.append((scene_num.stem, 0))
            self.scenes.append((scene_num.stem, 3750))

    @staticmethod
    def get_image(path: Path):
        image = skimage.io.imread(path.as_posix()).astype(numpy.float32) / 255 * 2 - 1
        image = image[:, 40:280]                # Crop (240,320,3) to (240,240,3)
        image_tr = torch.from_numpy(image).permute((2, 0, 1))
        return image_tr

    @staticmethod
    def get_depth(path: Path):
        depth = skimage.io.imread(path.as_posix()) * 0.001
        depth = depth[:, 40:280]                # Crop (240,320) to (240,240)
        depth = depth[None]
        depth = depth.astype(numpy.float32)
        return depth

    def get_transformation(self, scene_num, view_num: int):
        transformation_matrix_path = self.dataroot / scene_num / 'TransformationMatrix.txt'
        transformation_matrices = numpy.genfromtxt(transformation_matrix_path.as_posix(), delimiter=',')
        pose_index = view_num // 25
        pose1 = transformation_matrices[pose_index].reshape(4, 4)
        pose2 = transformation_matrices[pose_index + 1].reshape(4, 4)
        trans = numpy.matmul(pose2, numpy.linalg.inv(pose1)).astype(numpy.float32)
        return trans

    @staticmethod
    def camera_intrinsic_transform(vfov=45, hfov=60, pixel_width=320, pixel_height=240):
        """
        Copied from SceneNet
        """
        camera_intrinsics = numpy.zeros((3, 4))
        camera_intrinsics[2, 2] = 1
        camera_intrinsics[0, 0] = (pixel_width / 2.0) / math.tan(math.radians(hfov / 2.0))
        camera_intrinsics[0, 2] = pixel_width / 2.0
        camera_intrinsics[1, 1] = (pixel_height / 2.0) / math.tan(math.radians(vfov / 2.0))
        camera_intrinsics[1, 2] = pixel_height / 2.0
        return camera_intrinsics

    def __getitem__(self, index):
        scene_id = self.scenes[index]
        scene_num, view_num = scene_id

        frame1_path = self.dataroot / scene_num / f'photo/{view_num:04}.jpg'
        frame2_path = self.dataroot / scene_num / f'photo/{view_num + 25:04}.jpg'
        frame1 = self.get_image(frame1_path)
        frame2 = self.get_image(frame2_path)

        frame1_depth_path = self.dataroot / scene_num / f'depth/{view_num:04}.png'
        frame2_depth_path = self.dataroot / scene_num / f'depth/{view_num + 25:04}.png'
        frame1_depth = self.get_depth(frame1_depth_path)
        frame2_depth = self.get_depth(frame2_depth_path)

        trans = self.get_transformation(scene_num, view_num)
        trans_inv = numpy.linalg.inv(trans)
        identity = torch.eye(4)
        intrinsic = self.camera_intrinsic_transform(pixel_height=frame1.shape[1], pixel_width=frame1.shape[2])
        K = numpy.eye(4, dtype=numpy.float32)
        K[:3, :4] = intrinsic
        K_inv = numpy.linalg.inv(K)

        return {'images': [frame1, frame2],
                'depths': [frame1_depth, frame2_depth],
                'cameras': [{'Pinv': identity, 'P': identity, 'K': K, 'Kinv': K_inv},
                            {'Pinv': trans_inv, 'P': trans, 'K': K, 'Kinv': K_inv}]
                }

    def __len__(self):
        return len(self.scenes)

    def toval(self, epoch):
        pass

    def totrain(self, epoch):
        pass
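
For completeness, a minimal sketch of how this loader might be exercised on its own. The dataset path, the split folder name and the use of argparse.Namespace for opts are illustrative assumptions:

from argparse import Namespace
from torch.utils.data import DataLoader

# Hypothetical options object; only dataset_path is read by the loader above.
opts = Namespace(dataset_path='/path/to/SceneNet')
dataset = SceneNetDataLoader('train', opts=opts)   # 'train' split folder name is an assumption

loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=4)
batch = next(iter(loader))
print(batch['images'][0].shape)          # expected: torch.Size([4, 3, 240, 240])
print(batch['cameras'][1]['P'].shape)    # expected: torch.Size([4, 4, 4])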

@oawiles

oawiles commented Aug 18, 2020

I think it's probably something with the camera setup -- when it first projects points, you should see that the noisy results somewhat align with the true images. You can try using the true depths in the code to check whether the cameras are right (here: https://github.com/facebookresearch/synsin/blob/master/models/z_buffermodel.py#L89).

@NagabhushanSN95
Author

Thanks @oawiles. I'm already using the true depth. I'll check whether the warping of features is correct.

@oawiles

oawiles commented Aug 18, 2020

You can also try warping the RGB -- e.g. pass the RGB colours as features. This should be easier to check, since the warped result should then precisely match the other image.
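
A minimal numpy sketch of that RGB-warp check, written independently of the SynSin splatter. It assumes depth in metres, a 3x3 pixel intrinsic matrix and a 4x4 pose taking view-1 camera coordinates to view-2 camera coordinates, and it does no z-buffering, so it is only a rough visual check:

import numpy

def warp_frame(rgb1, depth1, K, T_1to2):
    # Nearest-pixel forward warp of rgb1 into the second view.
    # rgb1: (H, W, 3), depth1: (H, W), K: (3, 3), T_1to2: (4, 4).
    H, W = depth1.shape
    ys, xs = numpy.meshgrid(numpy.arange(H), numpy.arange(W), indexing='ij')
    pix = numpy.stack([xs, ys, numpy.ones_like(xs)], axis=0).reshape(3, -1)  # homogeneous pixels
    pts1 = (numpy.linalg.inv(K) @ pix) * depth1.reshape(1, -1)               # back-project to 3D (view 1)
    pts1_h = numpy.vstack([pts1, numpy.ones((1, pts1.shape[1]))])
    pts2 = (T_1to2 @ pts1_h)[:3]                                             # 3D points in view 2
    proj = K @ pts2
    u = numpy.round(proj[0] / proj[2]).astype(int)
    v = numpy.round(proj[1] / proj[2]).astype(int)
    ok = (proj[2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    warped = numpy.zeros_like(rgb1)
    warped[v[ok], u[ok]] = rgb1.reshape(-1, 3)[ok]   # no z-buffer: occluded pixels may overwrite
    return warped

If the cameras and depth are consistent, the output should line up with the true second frame up to small holes and occlusion artefacts.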


@NagabhushanSN95
Author

@oawiles, you were right; the error is in the warping. The output of the splatter is just an array of zeros, and the problem is the format of my camera matrix. By writing my own transformation code, I'm able to train the SynSin model, but I'm not able to get your transformation (warping) code to work correctly. I had the camera matrix in the form

[image: camera matrix]

With this camera matrix, the splatter output was all zeros. I changed the camera matrix and removed the dependence on the height and width of the frame, as follows:

[image: modified camera matrix]

With this, the splatter output is a warped frame, but the transformation doesn't match the ground truth. Can you suggest what changes I have to make to my camera matrix? In other words, what format does your code expect the camera matrix to be in?

Thanks a lot

@oawiles

oawiles commented Aug 26, 2020

What is the error? Sometimes comparing how the splattered image looks against the true image makes the problem make sense. One thing I notice is that you should use K to map the values to between -1 and 1, which I believe is not what you're doing. Another thing is that sometimes you have to flip the Y. Without being able to see the visual results, it's hard to guess at the precise problem.
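
A small sketch of what "use K to map the values to between -1 and 1" and "flip the Y" could look like: compose the pixel intrinsics with a normalising matrix, optionally flipping the y axis. The exact convention SynSin expects is not spelled out here, so treat this as an illustration of the idea rather than the required format:

import numpy

def normalise_intrinsics(K_pix, width, height, flip_y=False):
    # Maps pixel coordinates [0, W) x [0, H) to normalised coordinates [-1, 1].
    N = numpy.array([[2.0 / width, 0.0, -1.0],
                     [0.0, 2.0 / height, -1.0],
                     [0.0, 0.0, 1.0]], dtype=numpy.float32)
    if flip_y:
        # Some conventions have +y up in normalised coordinates but down in pixels.
        N = numpy.diag([1.0, -1.0, 1.0]).astype(numpy.float32) @ N
    return N @ K_pix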

@NagabhushanSN95
Author

NagabhushanSN95 commented Aug 28, 2020

Hi,
I've attached the images below.
This is the first frame (true):
[image: frame1]

This is the second frame (true):
[image: frame2]

This is the first frame warped to the view of the second frame (splattered):
[image: frame2_warped]

As you can notice, in the splattered image the green beam has moved down compared to the true second frame.

My camera matrix is as below:
[image: camera matrix]
where hfov=60 and vfov=45.

Also, I had to crop the images from 320x240 to 240x240. Would it make any difference?

@oawiles

oawiles commented Sep 8, 2020

It could make a difference. I would recommend you first try resizing instead; otherwise I think the cropping would mess up the intrinsics. It looks like it's zoomed in, which could be from the cropping. I'd recommend first resizing and then using a matrix to transform the intrinsics to [-1,1] for x/y, using an offset matrix O such that you have a new intrinsic matrix I = O K, where K was your old intrinsic matrix.
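
A sketch of that recipe under stated assumptions: rescale the pixel intrinsics for a resize from 320x240 to 240x240, then left-multiply by an offset matrix O that maps pixel coordinates to [-1, 1], giving I = O K. The matrices below are illustrative; the exact offset convention SynSin uses may differ:

import math
import numpy

def resize_intrinsics(K_pix, old_wh, new_wh):
    # A resize scales fx, cx by the width ratio and fy, cy by the height ratio.
    sx = new_wh[0] / old_wh[0]
    sy = new_wh[1] / old_wh[1]
    return numpy.diag([sx, sy, 1.0]).astype(numpy.float32) @ K_pix

def offset_matrix(width, height):
    # O maps pixel coordinates [0, W) x [0, H) to [-1, 1] x [-1, 1].
    return numpy.array([[2.0 / width, 0.0, -1.0],
                        [0.0, 2.0 / height, -1.0],
                        [0.0, 0.0, 1.0]], dtype=numpy.float32)

# Old 3x3 pixel intrinsics for the full 320x240 frame (hfov=60, vfov=45).
K_old = numpy.eye(3, dtype=numpy.float32)
K_old[0, 0] = 160.0 / math.tan(math.radians(30.0))    # fx
K_old[1, 1] = 120.0 / math.tan(math.radians(22.5))    # fy
K_old[0, 2], K_old[1, 2] = 160.0, 120.0               # principal point

K_resized = resize_intrinsics(K_old, (320, 240), (240, 240))
I_new = offset_matrix(240, 240) @ K_resized           # I = O K, projecting into [-1, 1]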

@NagabhushanSN95
Author

OK. I'll try that. Thanks!

@duyguceylan

Hi, I have similar issues to those described in the first message of this thread. I'm trying to train the code on my own dataset. I do save out the warped images using GT depth with the 'use_rgb_features' option set to True, and they look good. However, the model doesn't really train, and I continue to get images that are mostly a single color. I tried debugging with only the L1 loss, etc., but I observe the same pattern. Do you have any other pointers to what the issue could be?
