Trying to train SynSin on SceneNet database #24

Open

NagabhushanSN95 opened this issue Aug 13, 2020 · 11 comments

@NagabhushanSN95

Hi,
I'm trying to train the SynSin model on the SceneNet database, but I'm not able to get it to train. I would really appreciate it if you could give me some tips.

  1. I'm using only 2000 pairs of frames. To be specific, I'm using frames 0, 25, 3750 and 3775 from each scene of the first part of the training set, which contains 1000 scenes. So, I believe there is a considerable amount of diversity.
  2. Also, since SceneNet has ground-truth depths, I'm using them and bypassing the depth regressor network. For this, I've enabled the --use_gt_depth flag.
  3. In issue #23, it was suggested to use square images only. Since SceneNet has rectangular images (320x240), I'm cropping the frames and depth maps to 240x240, and I've modified the camera intrinsic matrix (K) accordingly (see the intrinsics sketch at the end of this comment).
  4. Even after training for 100000 epochs, the model doesn't train at all, i.e. I only get some red/blue images and nothing else. I could understand if prediction failed on test images, but it is failing even on the training images. PSNR starts from -5, increases to 1 or 2 and then goes negative again. SSIM doesn't increase beyond 0.05. What do you think could be the problem here?
  5. I tried both learning rates, the ones mentioned in the paper and the defaults in the code. Neither worked.
  6. I noticed that the l1 loss and perceptual loss (content loss) are around 0.7 or 0.8, but the GAN loss is an order of magnitude higher (around 7). So, I set the lambda values for the l1 and perceptual losses to 10. That didn't help either.
  7. The GAN loss starts low (around 5) and keeps increasing for around 80000 iterations, up to about 12, and then it almost flattens.
  8. I would assume the reason for (7) is that the discriminator is training faster than the generator. But D_Real and D_Fake have similar values in each batch (around 0.1 to 0.3), so the discriminator isn't training well either.

I don't know what else to try. Can you kindly help me out here?
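
For point 3, here is a minimal sketch of how the intrinsics could be adjusted for the 40-pixel centre crop: the focal lengths in pixels stay fixed and only the principal point shifts by the crop offset. The fov values are the SceneNet ones used above; the helper names are illustrative, not code from the repository.

import math
import numpy

def scenenet_intrinsics(hfov=60, vfov=45, width=320, height=240):
    # Pinhole intrinsics (3x3, in pixels) for the full 320x240 SceneNet frame.
    K = numpy.eye(3, dtype=numpy.float32)
    K[0, 0] = (width / 2.0) / math.tan(math.radians(hfov / 2.0))   # fx
    K[1, 1] = (height / 2.0) / math.tan(math.radians(vfov / 2.0))  # fy
    K[0, 2] = width / 2.0                                          # cx
    K[1, 2] = height / 2.0                                         # cy
    return K

def crop_intrinsics(K, x0, y0):
    # A crop only translates the principal point; fx and fy are unchanged.
    K_crop = K.copy()
    K_crop[0, 2] -= x0
    K_crop[1, 2] -= y0
    return K_crop

K_full = scenenet_intrinsics()
K_240 = crop_intrinsics(K_full, x0=40, y0=0)   # matches the image[:, 40:280] crop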

@NagabhushanSN95
Author

This is the command I'm using to start the training:

python snb/train.py --batch-size 4 --folder temp --num_workers 4 --resume --dataset scenenet --use_inv_z --accumulation alphacomposite --model_type zbuffer_pts --refine_model_type resnet_256W8UpDown64  --norm_G sync:spectral_batch --render_ids 1 --suffix '' --normalize_image --lr 0.0001 --use_gt_depth --W 240 --log-dir ../Runs/Training/Train01/%s

I wrote a DataLoader for SceneNet based on KittiDataLoader. The code is as follows:

import math
from pathlib import Path

import numpy
import skimage.io
import skimage.transform
import torch
import torch.utils.data as data


class SceneNetDataLoader(data.Dataset):

    def __init__(self, split_name, opts=None):
        super(SceneNetDataLoader, self).__init__()
        self.opt = opts
        self.dataroot = Path(opts.dataset_path) / split_name
        self.scenes = []
        for scene_num in sorted(self.dataroot.iterdir()):
            self.scenes.append((scene_num.stem, 0))
            self.scenes.append((scene_num.stem, 3750))

    @staticmethod
    def get_image(path: Path):
        image = skimage.io.imread(path.as_posix()).astype(numpy.float32) / 255 * 2 - 1
        image = image[:, 40:280]                # Crop (240,320,3) to (240,240,3)
        image_tr = torch.from_numpy(image).permute((2, 0, 1))
        return image_tr

    @staticmethod
    def get_depth(path: Path):
        depth = skimage.io.imread(path.as_posix()) * 0.001
        depth = depth[:, 40:280]                # Crop (240,320) to (240,240)
        depth = depth[None]
        depth = depth.astype(numpy.float32)
        return depth

    def get_transformation(self, scene_num, view_num: int):
        transformation_matrix_path = self.dataroot / scene_num / 'TransformationMatrix.txt'
        transformation_matrices = numpy.genfromtxt(transformation_matrix_path.as_posix(), delimiter=',')
        pose_index = view_num // 25
        pose1 = transformation_matrices[pose_index].reshape(4, 4)
        pose2 = transformation_matrices[pose_index + 1].reshape(4, 4)
        trans = numpy.matmul(pose2, numpy.linalg.inv(pose1)).astype(numpy.float32)
        return trans

    @staticmethod
    def camera_intrinsic_transform(vfov=45, hfov=60, pixel_width=320, pixel_height=240):
        """
        Copied from SceneNet
        """
        camera_intrinsics = numpy.zeros((3, 4))
        camera_intrinsics[2, 2] = 1
        camera_intrinsics[0, 0] = (pixel_width / 2.0) / math.tan(math.radians(hfov / 2.0))
        camera_intrinsics[0, 2] = pixel_width / 2.0
        camera_intrinsics[1, 1] = (pixel_height / 2.0) / math.tan(math.radians(vfov / 2.0))
        camera_intrinsics[1, 2] = pixel_height / 2.0
        return camera_intrinsics

    def __getitem__(self, index):
        scene_id = self.scenes[index]
        scene_num, view_num = scene_id

        frame1_path = self.dataroot / scene_num / f'photo/{view_num:04}.jpg'
        frame2_path = self.dataroot / scene_num / f'photo/{view_num + 25:04}.jpg'
        frame1 = self.get_image(frame1_path)
        frame2 = self.get_image(frame2_path)

        frame1_depth_path = self.dataroot / scene_num / f'depth/{view_num:04}.png'
        frame2_depth_path = self.dataroot / scene_num / f'depth/{view_num + 25:04}.png'
        frame1_depth = self.get_depth(frame1_depth_path)
        frame2_depth = self.get_depth(frame2_depth_path)

        trans = self.get_transformation(scene_num, view_num)
        trans_inv = numpy.linalg.inv(trans)
        identity = torch.eye(4)
        intrinsic = self.camera_intrinsic_transform(pixel_height=frame1.shape[1], pixel_width=frame1.shape[2])
        K = numpy.eye(4, dtype=numpy.float32)
        K[:3, :4] = intrinsic
        K_inv = numpy.linalg.inv(K)

        return {'images': [frame1, frame2],
                'depths': [frame1_depth, frame2_depth],
                'cameras': [{'Pinv': identity, 'P': identity, 'K': K, 'Kinv': K_inv},
                            {'Pinv': trans_inv, 'P': trans, 'K': K, 'Kinv': K_inv}]
                }

    def __len__(self):
        return len(self.scenes)

    def toval(self, epoch):
        pass

    def totrain(self, epoch):
        pass
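
For completeness, a minimal sketch of how this loader might be exercised on its own. The dataset path, the split folder name and the use of argparse.Namespace for opts are illustrative assumptions:

from argparse import Namespace
from torch.utils.data import DataLoader

# Hypothetical options object; only dataset_path is read by the loader above.
opts = Namespace(dataset_path='/path/to/SceneNet')
dataset = SceneNetDataLoader('train', opts=opts)   # 'train' split folder name is an assumption

loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=4)
batch = next(iter(loader))
print(batch['images'][0].shape)          # expected: torch.Size([4, 3, 240, 240])
print(batch['cameras'][1]['P'].shape)    # expected: torch.Size([4, 4, 4])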

@oawiles

oawiles commented Aug 18, 2020

I think it's probably something with the camera setup -- when it first projects points, you should see that the noisy results somewhat align with the true images. You can try using the true depths in the code to check whether the cameras are right (here: https://github.com/facebookresearch/synsin/blob/master/models/z_buffermodel.py#L89).

@NagabhushanSN95
Author

Thanks @oawiles. I'm already using the true depth. I'll check whether the warping of features is correct.

@oawiles

oawiles commented Aug 18, 2020

You can also try warping the RGB -- e.g. pass the RGB colours as features. This should be easier to check, since the warped result should then precisely match the other image.
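
A minimal numpy sketch of that RGB-warp check, written independently of the SynSin splatter. It assumes depth in metres, a 3x3 pixel intrinsic matrix and a 4x4 pose taking view-1 camera coordinates to view-2 camera coordinates, and it does no z-buffering, so it is only a rough visual check:

import numpy

def warp_frame(rgb1, depth1, K, T_1to2):
    # Nearest-pixel forward warp of rgb1 into the second view.
    # rgb1: (H, W, 3), depth1: (H, W), K: (3, 3), T_1to2: (4, 4).
    H, W = depth1.shape
    ys, xs = numpy.meshgrid(numpy.arange(H), numpy.arange(W), indexing='ij')
    pix = numpy.stack([xs, ys, numpy.ones_like(xs)], axis=0).reshape(3, -1)  # homogeneous pixels
    pts1 = (numpy.linalg.inv(K) @ pix) * depth1.reshape(1, -1)               # back-project to 3D (view 1)
    pts1_h = numpy.vstack([pts1, numpy.ones((1, pts1.shape[1]))])
    pts2 = (T_1to2 @ pts1_h)[:3]                                             # 3D points in view 2
    proj = K @ pts2
    u = numpy.round(proj[0] / proj[2]).astype(int)
    v = numpy.round(proj[1] / proj[2]).astype(int)
    ok = (proj[2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    warped = numpy.zeros_like(rgb1)
    warped[v[ok], u[ok]] = rgb1.reshape(-1, 3)[ok]   # no z-buffer: occluded pixels may overwrite
    return warped

If the cameras and depth are consistent, the output should line up with the true second frame up to small holes and occlusion artefacts.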


@NagabhushanSN95
Author

@oawiles, you were right; the error is in the warping. The output of the splatter is just an array of zeros, and the problem is the format of my camera matrix. By writing my own transformation code, I'm able to train the SynSin model, but I'm not able to get your transformation (warping) code to work correctly. I had the camera matrix in the form

[image: camera matrix]

With this camera matrix, the splatter output was all zeros. I changed the camera matrix and removed the dependence on the height and width of the frame, as follows:

[image: modified camera matrix]

With this, the splatter output is a warped frame, but the transformation doesn't match the ground truth. Can you suggest what changes I have to make to my camera matrix? In other words, what format does your code expect the camera matrix to be in?

Thanks a lot

@oawiles

oawiles commented Aug 26, 2020

What is the error? Sometimes comparing how the splattered image looks against the true image makes the problem make sense. One thing I notice is that you should use K to map the values to between -1 and 1, which I believe is not what you're doing. Another thing is that sometimes you have to flip the Y. Without being able to see the visual results, it's hard to guess at the precise problem.
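
A small sketch of what "use K to map the values to between -1 and 1" and "flip the Y" could look like: compose the pixel intrinsics with a normalising matrix, optionally flipping the y axis. The exact convention SynSin expects is not spelled out here, so treat this as an illustration of the idea rather than the required format:

import numpy

def normalise_intrinsics(K_pix, width, height, flip_y=False):
    # Maps pixel coordinates [0, W) x [0, H) to normalised coordinates [-1, 1].
    N = numpy.array([[2.0 / width, 0.0, -1.0],
                     [0.0, 2.0 / height, -1.0],
                     [0.0, 0.0, 1.0]], dtype=numpy.float32)
    if flip_y:
        # Some conventions have +y up in normalised coordinates but down in pixels.
        N = numpy.diag([1.0, -1.0, 1.0]).astype(numpy.float32) @ N
    return N @ K_pix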

@NagabhushanSN95
Author

NagabhushanSN95 commented Aug 28, 2020

Hi,
I've attached the images below.
This is the first frame (true):
[image: frame1]

This is the second frame (true):
[image: frame2]

This is the first frame warped to the view of the second frame (splattered):
[image: frame2_warped]

As you can notice, in the splattered image the green beam has moved down compared to the true second frame.

My camera matrix is as below:
[image: camera matrix]
where hfov=60 and vfov=45.

Also, I had to crop the images from 320x240 to 240x240. Would it make any difference?

@oawiles

oawiles commented Sep 8, 2020

It could make a difference. I would recommend you first try resizing instead; otherwise I think the cropping would mess up the intrinsics. It looks like it's zoomed in, which could be from the cropping. I'd recommend first resizing and then using a matrix to transform the intrinsics to [-1,1] for x/y, using an offset matrix O such that you have a new intrinsic matrix I = O K, where K was your old intrinsic matrix.
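
A sketch of that recipe under stated assumptions: rescale the pixel intrinsics for a resize from 320x240 to 240x240, then left-multiply by an offset matrix O that maps pixel coordinates to [-1, 1], giving I = O K. The matrices below are illustrative; the exact offset convention SynSin uses may differ:

import math
import numpy

def resize_intrinsics(K_pix, old_wh, new_wh):
    # A resize scales fx, cx by the width ratio and fy, cy by the height ratio.
    sx = new_wh[0] / old_wh[0]
    sy = new_wh[1] / old_wh[1]
    return numpy.diag([sx, sy, 1.0]).astype(numpy.float32) @ K_pix

def offset_matrix(width, height):
    # O maps pixel coordinates [0, W) x [0, H) to [-1, 1] x [-1, 1].
    return numpy.array([[2.0 / width, 0.0, -1.0],
                        [0.0, 2.0 / height, -1.0],
                        [0.0, 0.0, 1.0]], dtype=numpy.float32)

# Old 3x3 pixel intrinsics for the full 320x240 frame (hfov=60, vfov=45).
K_old = numpy.eye(3, dtype=numpy.float32)
K_old[0, 0] = 160.0 / math.tan(math.radians(30.0))    # fx
K_old[1, 1] = 120.0 / math.tan(math.radians(22.5))    # fy
K_old[0, 2], K_old[1, 2] = 160.0, 120.0               # principal point

K_resized = resize_intrinsics(K_old, (320, 240), (240, 240))
I_new = offset_matrix(240, 240) @ K_resized           # I = O K, projecting into [-1, 1]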

@NagabhushanSN95
Author

OK. I'll try that. Thanks!

@duyguceylan

Hi, I have similar issues to those described in the first message of this thread. I'm trying to train the code on my own dataset. I do save out the warped images using GT depth with the 'use_rgb_features' option set to True, and they look good. However, the model doesn't really train, and I continue to get images that are mostly a single color. I tried debugging with only the L1 loss, etc., but I observe the same pattern. Do you have any other pointers to what the issue could be?
