SparseOcc v1.1 release (#37)
* add usage of ray_metrics.py

* topk bug fix

* add scal and lovasz loss

* add focal loss

* add bda flip rot

* code clean

* add config for 60ep

* update panoptic config

* Cleaning

---------

Co-authored-by: YANG-CY-163 <[email protected]>
afterthat97 and YANG-CY-163 authored Jun 27, 2024
1 parent 550b12a commit fe9a966
Showing 17 changed files with 401 additions and 497 deletions.
51 changes: 34 additions & 17 deletions README.md
@@ -2,12 +2,24 @@

This is the official PyTorch implementation for our paper:

> [**SparseOcc: Fully Sparse 3D Occupancy Prediction**](https://arxiv.org/abs/2312.17118)<br>
> [**Fully Sparse 3D Panoptic Occupancy Prediction**](https://arxiv.org/abs/2312.17118)<br>
> :school: Presented by Nanjing University and Shanghai AI Lab<br>
> :email: Primary contact: Haisong Liu ([email protected])<br>
> :trophy: [CVPR 2024 Autonomous Driving Challenge - Occupancy and Flow](https://opendrivelab.com/challenge2024/#occupancy_and_flow)<br>
> :book: Third-party Chinese interpretations: [自动驾驶之心](https://zhuanlan.zhihu.com/p/675811281), [AIming](https://zhuanlan.zhihu.com/p/691549750). Thank you!
## :warning: Important Notes
There is a concurrent work titled ‘SparseOcc: Rethinking Sparse Latent Representation’ by Tang et al. that shares the name SparseOcc with our work. If you cite our research, please make sure you reference the correct paper (arXiv **2312.17118**, by **Liu et al.**):

```
@article{liu2023fully,
title={Fully sparse 3d panoptic occupancy prediction},
author={Liu, Haisong and Wang, Haiguang and Chen, Yang and Yang, Zetong and Zeng, Jia and Chen, Li and Wang, Limin},
journal={arXiv preprint arXiv:2312.17118},
year={2023}
}
```

## Highlights

**New model**:1st_place_medal:: SparseOcc initially reconstructs a sparse 3D representation from visual inputs and subsequently predicts semantic/instance occupancy from the 3D sparse representation by sparse queries.
@@ -18,21 +30,34 @@ This is the official PyTorch implementation for our paper:

![](asserts/rayiou.jpg)

Some FAQs from the community about the evaluation metrics:

1. **Why does training with visible masks result in significant improvements in the old mIoU metric, but not in the new RayIoU metric?** As mentioned in the paper, when the visible mask is used during training, the area behind the surface is not supervised, so the model tends to fill this area with duplicated predictions, producing a thicker surface. The old metric penalizes errors inconsistently along the depth axis when the prediction has a thick surface. Thus, this "improvement" is mainly due to a vulnerability of the old metric.
2. **Why can't SparseOcc exploit the vulnerability of the old metric?** Because SparseOcc employs a fully sparse architecture, it always predicts a thin surface. Thus, there are two ways to make a fair comparison: (a) use the old metric, but require all methods to predict a thin surface, which implies they cannot use the visible mask during training; or (b) use RayIoU, which is more reasonable and can fairly compare thick and thin surfaces. Our method achieves SOTA performance in both cases.
3. **Does RayIoU overlook interior reconstruction?** First, we are unable to obtain interior occupancy ground truth, because the ground truth is derived by voxelizing LiDAR point clouds, and LiDAR can only scan the thin surface of an object. Second, the query rays in RayIoU can originate from any position within the scene (see the figure above), which allows RayIoU to evaluate overall reconstruction performance, unlike depth estimation. We emphasize that the evaluation logic of RayIoU aligns with the process of ground-truth generation.
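A deliberately simplified 1D illustration of FAQ items 1 and 3 follows. It is not the repository's `ray_metrics.py`: a single ray through a column of voxels is scored once by voxel IoU restricted to an assumed visibility mask (voxels up to and including the first occupied ground-truth voxel), and once by a simplified RayIoU-style first-hit depth check. The function names, the visibility rule, and the 1-voxel threshold are assumptions made for illustration only.

```python
import numpy as np

def visible_mask_1d(gt_occ):
    """Assumed visibility rule: voxels up to and including the first occupied GT voxel."""
    hits = np.flatnonzero(gt_occ)
    mask = np.zeros_like(gt_occ, dtype=bool)
    end = hits[0] + 1 if hits.size else gt_occ.size
    mask[:end] = True
    return mask

def voxel_iou(pred_occ, gt_occ, mask):
    """Binary occupancy IoU computed only over voxels where mask is True."""
    p, g = pred_occ[mask], gt_occ[mask]
    union = np.logical_or(p, g).sum()
    return np.logical_and(p, g).sum() / union if union else 1.0

def first_hit(occ):
    """Index of the first occupied voxel along the ray, or None if the ray escapes."""
    hits = np.flatnonzero(occ)
    return int(hits[0]) if hits.size else None

def ray_hit(pred_occ, gt_occ, threshold=1):
    """Simplified RayIoU-style check: do the first-hit depths agree within `threshold` voxels?"""
    p, g = first_hit(pred_occ), first_hit(gt_occ)
    if p is None or g is None:
        return p is None and g is None
    return abs(p - g) <= threshold

gt = np.zeros(20, dtype=bool)
gt[10] = True                      # thin GT surface at depth index 10
behind = np.zeros(20, dtype=bool)
behind[10:16] = True               # thick prediction extending behind the surface
front = np.zeros(20, dtype=bool)
front[5:11] = True                 # same thickness, extending in front of the surface
mask = visible_mask_1d(gt)
print(voxel_iou(behind, gt, mask), voxel_iou(front, gt, mask))  # 1.0 0.16666666666666666
print(ray_hit(behind, gt), ray_hit(front, gt))                  # True False
```

In this toy, thickening the surface behind the GT costs nothing under the masked voxel IoU, while the same thickening in front is penalized; the first-hit check instead only asks whether the first surface along the ray lands at the right depth.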

If you have other questions, feel free to contact me (Haisong Liu, [email protected]).

## News

* 2024-05-29: We add support for [OpenOcc v2](configs/r50_nuimg_704x256_8f_openocc.py) dataset (without occupancy flow).
* 2024-04-11: The panoptic version of SparseOcc ([configs/r50_nuimg_704x256_8f_pano.py](configs/r50_nuimg_704x256_8f_pano.py)) is released.
* 2024-04-09: An updated arXiv version [https://arxiv.org/abs/2312.17118v3](https://arxiv.org/abs/2312.17118v3) has been released.
* 2024-03-31: We release the code and pretrained weights.
* 2023-12-30: We release the paper.
* **2024-06-27**: SparseOcc v1.1 is released. In this change, we introduce BEV data augmentation (BDA) and the Lovasz-Softmax loss to further improve performance. Compared with [v1.0](https://github.com/MCG-NJU/SparseOcc/tree/v1.0) (35.0 RayIoU with 48 epochs), SparseOcc v1.1 achieves 36.8 RayIoU with 24 epochs!
* **2024-05-29**: We add support for [OpenOcc v2](configs/r50_nuimg_704x256_8f_openocc.py) dataset (without occupancy flow).
* **2024-04-11**: The panoptic version of SparseOcc ([configs/r50_nuimg_704x256_8f_pano.py](configs/r50_nuimg_704x256_8f_pano.py)) is released.
* **2024-04-09**: An updated arXiv version [https://arxiv.org/abs/2312.17118v3](https://arxiv.org/abs/2312.17118v3) has been released.
* **2024-03-31**: We release the code and pretrained weights.
* **2023-12-30**: We release the paper.
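The Lovasz-Softmax loss mentioned in the v1.1 entry is not part of the diff shown on this page. For reference, here is a minimal NumPy sketch of the published flat Lovász-Softmax formulation (Berman et al., CVPR 2018), averaged over the classes present in the labels. It has no autograd, so it only illustrates the computation; the function names and the `ignore_index` handling are assumptions, not the repository's API.

```python
import numpy as np

def lovasz_grad(gt_sorted):
    """Increments of the Jaccard loss along foreground labels sorted by decreasing error."""
    gts = gt_sorted.sum()
    intersection = gts - np.cumsum(gt_sorted)
    union = gts + np.cumsum(1.0 - gt_sorted)
    jaccard = 1.0 - intersection / union
    if gt_sorted.size > 1:
        jaccard[1:] = jaccard[1:] - jaccard[:-1]
    return jaccard

def lovasz_softmax_flat(probs, labels, ignore_index=None):
    """probs: [N, C] softmax probabilities; labels: [N] integer class ids."""
    if ignore_index is not None:
        keep = labels != ignore_index
        probs, labels = probs[keep], labels[keep]
    losses = []
    for c in range(probs.shape[1]):
        fg = (labels == c).astype(np.float64)
        if fg.sum() == 0:
            continue  # only average over classes present in the labels
        errors = np.abs(fg - probs[:, c])
        order = np.argsort(-errors)  # sort errors in decreasing order
        losses.append(float(np.dot(errors[order], lovasz_grad(fg[order]))))
    return float(np.mean(losses)) if losses else 0.0

labels = np.array([0, 1])
print(lovasz_softmax_flat(np.array([[1., 0.], [0., 1.]]), labels))  # 0.0 (perfect)
print(lovasz_softmax_flat(np.array([[0., 1.], [1., 0.]]), labels))  # 1.0 (all wrong)
```

Weighting the sorted per-voxel errors by increments of the Jaccard loss is what makes this a convex surrogate that targets per-class IoU directly, rather than per-voxel accuracy as cross-entropy does.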

## Model Zoo

| Setting | Pretrain | Training Cost | RayIoU | RayPQ | FPS | Weights |
These results are from our latest version, v1.1, which outperforms the results reported in the paper. Additionally, our implementation differs slightly from the original paper. If you wish to reproduce the paper exactly, please refer to the [v1.0](https://github.com/MCG-NJU/SparseOcc/tree/v1.0) tag.

| Setting | Epochs | Training Cost | RayIoU | RayPQ | FPS | Weights |
|----------|:--------:|:-------------:|:------:|:-----:|:---:|:-------:|
| [r50_nuimg_704x256_8f](configs/r50_nuimg_704x256_8f.py) | [nuImg](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth) | 1d4h, ~12GB Memory | 35.0 | - | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.0/sparseocc_r50_nuimg_704x256_8f.pth) |
| [r50_nuimg_704x256_8f_pano](configs/r50_nuimg_704x256_8f_pano.py) | [nuImg](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth) | 1d4h, ~12GB Memory | 34.5 | 14.0 | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.0/sparseocc_r50_nuimg_704x256_8f_pano.pth) |
| [r50_nuimg_704x256_8f](configs/r50_nuimg_704x256_8f.py) | 24 | 15h, ~12GB | 36.8 | - | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.1/sparseocc_r50_nuimg_704x256_8f_24e_v1.1.pth) |
| [r50_nuimg_704x256_8f_60e](configs/r50_nuimg_704x256_8f_60e.py) | 60 | 37h, ~12GB | 37.7 | - | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.1/sparseocc_r50_nuimg_704x256_8f_60e_v1.1.pth) |
| [r50_nuimg_704x256_8f_pano](configs/r50_nuimg_704x256_8f_pano.py) | 24 | 15h, ~12GB | 35.9 | 14.0 | 17.3 | [github](https://github.com/MCG-NJU/SparseOcc/releases/download/v1.1/sparseocc_r50_nuimg_704x256_8f_pano_24e_v1.1.pth) |

* The backbone is pretrained on [nuImages](https://download.openmmlab.com/mmdetection3d/v0.1.0_models/nuimages_semseg/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim/cascade_mask_rcnn_r50_fpn_coco-20e_20e_nuim_20201009_124951-40963960.pth). Download the weights to `pretrain/xxx.pth` before you start training.
* FPS is measured with Intel(R) Xeon(R) Platinum 8369B CPU and NVIDIA A100-SXM4-80GB GPU (PyTorch `fp32` backend, including data loading).
* We will release more settings in the future.

@@ -48,14 +73,6 @@ conda activate sparseocc
conda install pytorch==2.0.0 torchvision==0.15.0 pytorch-cuda=11.8 -c pytorch -c nvidia
```

or PyTorch 1.10.2 + CUDA 10.2 for older GPUs:

```
conda create -n sparseocc python=3.8
conda activate sparseocc
conda install pytorch==1.10.2 torchvision==0.11.3 cudatoolkit=10.2 -c pytorch
```

Install other dependencies:

```
45 changes: 33 additions & 12 deletions configs/r50_nuimg_704x256_8f.py
@@ -37,21 +37,18 @@
_dim_ = 256
_num_points_ = 4
_num_groups_ = 4
_num_layers_ = 4
_num_layers_ = 2
_num_frames_ = 8
_num_queries_ = 100
_topk_training_ = [4000, 16000, 64000]
_topk_testing_ = [2000, 8000, 32000]
_topk_training_ = _topk_testing_


model = dict(
type='SparseOcc',
data_aug=dict(
img_color_aug=True, # Move some augmentations to GPU
img_norm_cfg=img_norm_cfg,
img_pad_cfg=dict(size_divisor=32)),
use_grid_mask=False,
use_mask_camera=False,
img_backbone=dict(
type='ResNet',
@@ -97,6 +94,16 @@
loss_mask_weight=5.0,
loss_dice_weight=5.0,
),
loss_geo_scal=dict(
type='GeoScalLoss',
num_classes=len(occ_class_names),
loss_weight=1.0
),
loss_sem_scal=dict(
type='SemScalLoss',
num_classes=len(occ_class_names),
loss_weight=1.0
)
),
),
)
@@ -107,12 +114,20 @@
'bot_pct_lim': (0.0, 0.0),
'rot_lim': (0.0, 0.0),
'H': 900, 'W': 1600,
'rand_flip': False,
'rand_flip': True,
}

bda_aug_conf = dict(
rot_lim=(-22.5, 22.5),
scale_lim=(1., 1.),
flip_dx_ratio=0.5,
flip_dy_ratio=0.5
)

train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1),
dict(type='BEVAug', bda_aug_conf=bda_aug_conf, classes=det_class_names, is_train=True),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names)),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=True),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
@@ -123,6 +138,7 @@
test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1, test_mode=True),
dict(type='BEVAug', bda_aug_conf=bda_aug_conf, classes=det_class_names, is_train=False),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names)),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=False),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
@@ -140,7 +156,8 @@
pipeline=train_pipeline,
classes=det_class_names,
modality=input_modality,
test_mode=False),
test_mode=False
),
val=dict(
type=dataset_type,
data_root=dataset_root,
@@ -149,7 +166,8 @@
pipeline=test_pipeline,
classes=det_class_names,
modality=input_modality,
test_mode=True),
test_mode=True
),
test=dict(
type=dataset_type,
data_root=dataset_root,
@@ -158,12 +176,13 @@
pipeline=test_pipeline,
classes=det_class_names,
modality=input_modality,
test_mode=True),
test_mode=True
),
)

optimizer = dict(
type='AdamW',
lr=2e-4,
lr=5e-4,
paramwise_cfg=dict(
custom_keys={
'img_backbone': dict(lr_mult=0.1),
@@ -174,13 +193,15 @@
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

lr_config = dict(
policy='CosineAnnealing',
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
min_lr_ratio=1e-3
by_epoch=True,
step=[22, 24],
gamma=0.2
)
total_epochs = 48
total_epochs = 24
batch_size = 8

# load pretrained weights
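The main config now adds `GeoScalLoss` and `SemScalLoss` terms (each with `loss_weight=1.0`), matching the "add scal and lovasz loss" commit message. Their implementations are not shown on this page. As an assumption, the sketch below reproduces the geometry scene-class affinity loss as formulated in MonoScene, in NumPy and without autograd; the explicit `empty_class` argument (the free-space index depends on the dataset), the `ignore_index` default, and the `eps` clipping are illustrative choices, and the semantic variant is not sketched.

```python
import numpy as np

def geo_scal_loss(probs, target, empty_class, ignore_index=255, eps=1e-6):
    """Geometry scene-class affinity loss, MonoScene-style (illustrative sketch).

    probs:  [C, N] softmax probabilities for N voxels.
    target: [N] integer labels; `empty_class` marks free space.
    Returns the sum of BCE(ratio, 1) = -log(ratio) for the precision, recall
    and specificity of the binary occupied-vs-empty split.
    """
    keep = target != ignore_index                      # drop unlabeled voxels
    nonempty_prob = 1.0 - probs[empty_class, keep]     # P(voxel is occupied)
    nonempty_gt = (target[keep] != empty_class).astype(np.float64)
    intersection = (nonempty_gt * nonempty_prob).sum()
    precision = intersection / (nonempty_prob.sum() + eps)
    recall = intersection / (nonempty_gt.sum() + eps)
    specificity = ((1.0 - nonempty_gt) * (1.0 - nonempty_prob)).sum() / (
        (1.0 - nonempty_gt).sum() + eps)
    ratios = np.clip([precision, recall, specificity], eps, 1.0)  # keep log finite
    return float(-np.log(ratios).sum())

target = np.array([0, 1, 1, 0])                        # class 1 plays the role of "free" here
perfect = np.array([[1., 0., 0., 1.], [0., 1., 1., 0.]])
print(geo_scal_loss(perfect, target, empty_class=1))   # close to 0
```

Unlike per-voxel cross-entropy, this term is computed from scene-level ratios, so it penalizes a model that predicts too much or too little occupied space overall.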
15 changes: 15 additions & 0 deletions configs/r50_nuimg_704x256_8f_60e.py
@@ -0,0 +1,15 @@
_base_ = ['./r50_nuimg_704x256_8f.py']

lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=1.0 / 3,
by_epoch=True,
step=[48, 60],
gamma=0.2
)
total_epochs = 60

# evaluation
eval_config = dict(interval=total_epochs)
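The v1.1 schedules switch from cosine annealing to a `step` policy. Assuming mmcv's `StepLrUpdaterHook` semantics (0-indexed epochs, a decay exponent equal to the number of milestones already passed, and linear warmup over the first `warmup_iters` iterations starting from `warmup_ratio` times the regular rate), the learning rate can be worked out as below. The helper `step_lr` is hypothetical, and it ignores `paramwise_cfg`, under which `img_backbone` would run at 0.1× these values.

```python
def step_lr(epoch, it, base_lr=5e-4, steps=(22, 24), gamma=0.2,
            warmup_iters=500, warmup_ratio=1.0 / 3):
    """LR at 0-indexed `epoch` and global iteration `it`, assuming an
    mmcv-style 'step' policy with linear warmup."""
    exp = len(steps)
    for i, s in enumerate(steps):
        if epoch < s:
            exp = i  # number of milestones already passed
            break
    regular_lr = base_lr * gamma ** exp
    if it < warmup_iters:
        k = (1 - it / warmup_iters) * (1 - warmup_ratio)
        return regular_lr * (1 - k)
    return regular_lr

print(step_lr(0, 0))                                    # warmup start: 5e-4 / 3
print(step_lr(21, 10**6), step_lr(22, 10**6))           # 24e config: before / after decay
print(step_lr(48, 10**6, steps=(48, 60)))               # 60e config after its decay
```

Under these assumptions, `step=[22, 24]` with `total_epochs = 24` gives a single 0.2× decay for the final two epochs (the milestone at 24 is never reached), and the 60-epoch config likewise decays once, for its last 12 epochs.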
9 changes: 6 additions & 3 deletions configs/r50_nuimg_704x256_8f_openocc.py
@@ -62,11 +62,14 @@
workers_per_gpu=8,
train=dict(
pipeline=train_pipeline,
occ_gt_root=occ_gt_root),
occ_gt_root=occ_gt_root
),
val=dict(
pipeline=test_pipeline,
occ_gt_root=occ_gt_root),
occ_gt_root=occ_gt_root
),
test=dict(
pipeline=test_pipeline,
occ_gt_root=occ_gt_root),
occ_gt_root=occ_gt_root
),
)
20 changes: 16 additions & 4 deletions configs/r50_nuimg_704x256_8f_pano.py
@@ -29,12 +29,20 @@
'bot_pct_lim': (0.0, 0.0),
'rot_lim': (0.0, 0.0),
'H': 900, 'W': 1600,
'rand_flip': False,
'rand_flip': True,
}

bda_aug_conf = dict(
rot_lim=(-22.5, 22.5),
scale_lim=(1., 1.),
flip_dx_ratio=0.5,
flip_dy_ratio=0.5
)

train_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1),
dict(type='BEVAug', bda_aug_conf=bda_aug_conf, classes=det_class_names, is_train=True),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names), inst_class_ids=[2, 3, 4, 5, 6, 7, 9, 10]),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=True),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
@@ -45,6 +53,7 @@
test_pipeline = [
dict(type='LoadMultiViewImageFromFiles', to_float32=False, color_type='color'),
dict(type='LoadMultiViewImageFromMultiSweeps', sweeps_num=_num_frames_ - 1, test_mode=True),
dict(type='BEVAug', bda_aug_conf=bda_aug_conf, classes=det_class_names, is_train=False),
dict(type='LoadOccGTFromFile', num_classes=len(occ_class_names), inst_class_ids=[2, 3, 4, 5, 6, 7, 9, 10]),
dict(type='RandomTransformImage', ida_aug_conf=ida_aug_conf, training=False),
dict(type='DefaultFormatBundle3D', class_names=det_class_names),
@@ -56,11 +65,14 @@
workers_per_gpu=8,
train=dict(
pipeline=train_pipeline,
occ_gt_root=occ_gt_root),
occ_gt_root=occ_gt_root
),
val=dict(
pipeline=test_pipeline,
occ_gt_root=occ_gt_root),
occ_gt_root=occ_gt_root
),
test=dict(
pipeline=test_pipeline,
occ_gt_root=occ_gt_root),
occ_gt_root=occ_gt_root
),
)
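Both the base and panoptic configs now insert `BEVAug` with `bda_aug_conf` (rotation within ±22.5°, no scaling, independent 50% flips along x and y) into the train and test pipelines, passing `is_train=False` at test time. The transform's code is not on this page. Assuming a BEVDet-style formulation, the sketch below shows how such a config could be turned into one 3×3 BEV matrix; `sample_bda_matrix` is a hypothetical helper.

```python
import numpy as np

def sample_bda_matrix(conf, is_train, rng):
    """Return a 3x3 BEV augmentation matrix (hypothetical helper).
    Eval mode returns the identity, mirroring is_train=False in test_pipeline."""
    if not is_train:
        return np.eye(3)
    angle = np.deg2rad(rng.uniform(*conf['rot_lim']))   # rotation about the z axis
    scale = rng.uniform(*conf['scale_lim'])              # (1., 1.) -> always 1.0
    flip_dx = rng.uniform() < conf['flip_dx_ratio']
    flip_dy = rng.uniform() < conf['flip_dy_ratio']
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    flip = np.diag([-1.0 if flip_dx else 1.0, -1.0 if flip_dy else 1.0, 1.0])
    return flip @ (scale * rot)   # rotate, scale, then mirror x and/or y

bda_aug_conf = dict(rot_lim=(-22.5, 22.5), scale_lim=(1., 1.),
                    flip_dx_ratio=0.5, flip_dy_ratio=0.5)
print(sample_bda_matrix(bda_aug_conf, True, np.random.default_rng(0)))
```

Whatever the exact implementation, the same matrix would need to be applied to the occupancy ground truth and folded into the camera-to-ego transforms so that image features and labels stay aligned; how `BEVAug` does this is not visible here.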
1 change: 1 addition & 0 deletions loaders/ego_pose_dataset.py
@@ -12,6 +12,7 @@ def trans_matrix(T, R):
return tm


# A helper dataset for RayIoU. It is NOT used during training.
class EgoPoseDataset(Dataset):
def __init__(self, data_infos):
super(EgoPoseDataset, self).__init__()
2 changes: 1 addition & 1 deletion loaders/nuscenes_occ_dataset.py
@@ -137,7 +137,7 @@ def evaluate(self, occ_results, runner=None, show_dir=None, **eval_kwargs):
occ_loc = torch.from_numpy(occ_pred['occ_loc'].astype(np.int64)) # [B, N, 3]

data_type = self.occ_gt_root.split('/')[-1]
if data_type == 'occ3d':
if data_type == 'occ3d' or data_type == 'occ3d_panoptic':
occ_class_names = occ3d_class_names
elif data_type == 'openocc_v2':
occ_class_names = openocc_class_names
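The updated branch in `evaluate` maps both `occ3d` and `occ3d_panoptic` to the Occ3D class names by comparing the last component of `occ_gt_root`. Note that `split('/')[-1]` returns an empty string when the path ends with a slash, so none of the branches shown would match. A hedged alternative sketch normalizes the path first and uses a lookup table; the helper name, the table values (stand-ins for the real class-name lists), and the example paths are hypothetical.

```python
import os

# Hypothetical mapping from the GT folder name to the class-name set it uses.
CLASS_SET_BY_DATA_TYPE = {
    'occ3d': 'occ3d',
    'occ3d_panoptic': 'occ3d',   # panoptic Occ3D GT shares the Occ3D semantic classes
    'openocc_v2': 'openocc',
}

def class_set_for(occ_gt_root):
    """Resolve the class set from occ_gt_root, tolerating trailing slashes."""
    data_type = os.path.basename(os.path.normpath(occ_gt_root))
    if data_type not in CLASS_SET_BY_DATA_TYPE:
        raise ValueError(f'unsupported occupancy GT type {data_type!r} from {occ_gt_root!r}')
    return CLASS_SET_BY_DATA_TYPE[data_type]

print('data/nuscenes/occ3d/'.split('/')[-1] == '')   # True: the original split yields ''
print(class_set_for('data/nuscenes/occ3d/'))         # occ3d
```

Raising on an unknown folder name also makes a misconfigured `occ_gt_root` fail loudly instead of silently skipping the class-name assignment.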