code and model for the paper Neural Video Compression with Feature Modulation in CVPR 2024.
ustclibin committed Feb 28, 2024
1 parent d3f0bd2 commit 78ac026
Showing 43 changed files with 5,708 additions and 0 deletions.
131 changes: 131 additions & 0 deletions DCVC-FM/README.md
@@ -0,0 +1,131 @@
# Introduction

Official PyTorch implementation for DCVC-FM: [Neural Video Compression with **F**eature **M**odulation](https://arxiv.org/abs/2402.17414), in CVPR 2024.

# Prerequisites
* Python 3.10 and conda; get [Conda](https://www.anaconda.com/)
* CUDA, if you want to use a GPU
* Environment
```
conda create -n $YOUR_PY_ENV_NAME python=3.10
conda activate $YOUR_PY_ENV_NAME
conda install pytorch==2.0.0 torchvision==0.15.0 torchaudio==2.0.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```

# Test dataset

Arbitrary input resolutions are supported. The input video is padded automatically, the reconstructed video is cropped back to the original size, and the distortion (PSNR) is calculated at the original resolution.
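
The pad-then-crop behaviour can be pictured with the minimal sketch below. It is an illustration only, not the code used in this repository: the alignment value `multiple=16`, the edge-replication padding on the bottom and right, and the helper names `padded_size`, `pad_frame` and `crop_frame` are all assumptions.

```python
def padded_size(height, width, multiple=16):
    """Round both dimensions up to the next multiple (assumed alignment)."""
    padded_h = -(-height // multiple) * multiple  # ceiling division
    padded_w = -(-width // multiple) * multiple
    return padded_h, padded_w


def pad_frame(frame, multiple=16):
    """Pad a 2-D list of samples on the right and bottom by edge replication."""
    height, width = len(frame), len(frame[0])
    padded_h, padded_w = padded_size(height, width, multiple)
    # Repeat the last sample of every row, then repeat the last row.
    rows = [row + [row[-1]] * (padded_w - width) for row in frame]
    rows += [list(rows[-1]) for _ in range(padded_h - height)]
    return rows


def crop_frame(frame, height, width):
    """Crop a padded frame back to the original size."""
    return [row[:width] for row in frame[:height]]


print(padded_size(1080, 1920))  # (1088, 1920) with the assumed multiple of 16
```

Because the crop restores the original size, a distortion measure such as PSNR can be computed on exactly the same pixels as the source frame.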

## YUV 420 content

Put the *.yuv files in a folder structure similar to the following.

    /media/data/HEVC_B/
        - BQTerrace_1920x1080_60.yuv
        - BasketballDrive_1920x1080_50.yuv
        - ...
    /media/data/HEVC_D/
    /media/data/HEVC_C/
    ...

The dataset structure can be seen in dataset_config_example_yuv420.json.
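
As a sanity check on the `width`, `height` and `frames` values in the config, the expected size of a raw file can be computed as in the sketch below. It assumes 8-bit planar YUV 4:2:0 with even dimensions (the UVG file names state 8 bit; for the other classes this is an assumption), and the helper names are hypothetical.

```python
def yuv420_frame_bytes(width, height, bytes_per_sample=1):
    """Bytes in one planar YUV 4:2:0 frame (assumes even width and height)."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # U and V planes, each quarter size
    return (luma + chroma) * bytes_per_sample


def expected_file_bytes(width, height, frames):
    """Expected size of a raw 8-bit .yuv file holding `frames` frames."""
    return yuv420_frame_bytes(width, height) * frames


# A 1920x1080 sequence with 600 frames, as listed for BQTerrace in the config.
print(expected_file_bytes(1920, 1080, 600))
```

If the size on disk is smaller than this value, the `frames` entry is probably larger than the number of frames the file actually contains.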

## RGB content

Please convert YUV 420 data to RGB data using the BT.709 conversion matrix.

For example, one video of HEVC Class B can be prepared as:
* Make the video path:
```
mkdir BasketballDrive_1920x1080_50
```
* Convert YUV to PNG:

We use the BT.709 conversion matrix to generate PNG data for testing RGB sequences. Please refer to ./test_data_to_png.py for more details.
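
For orientation, the per-pixel BT.709 conversion can be sketched as follows. This is not a copy of ./test_data_to_png.py: the value ranges (Y in [0, 1], Cb and Cr in [-0.5, 0.5]) are an assumption, chroma upsampling from 4:2:0 is left out, and the function names are hypothetical. Only the luma coefficients 0.2126 and 0.0722 come from the BT.709 definition.

```python
# BT.709 luma coefficients.
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB


def yuv_to_rgb_bt709(y, cb, cr):
    """Convert one normalized YCbCr sample to RGB (assumed ranges, no clipping)."""
    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG  # solve the luma equation for green
    return r, g, b


def rgb_to_yuv_bt709(r, g, b):
    """Inverse of yuv_to_rgb_bt709, useful for round-trip checks."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB))
    cr = (r - y) / (2.0 * (1.0 - KR))
    return y, cb, cr


# A neutral sample (zero chroma) maps to a gray RGB value.
print(yuv_to_rgb_bt709(0.5, 0.0, 0.0))
```

The script in the repository remains the reference for range handling and rounding when the PNG files are produced.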

Finally, the dataset folder structure looks like this:

    /media/data/HEVC_B/
        * BQTerrace_1920x1080_60/
            - im00001.png
            - im00002.png
            - im00003.png
            - ...
        * BasketballDrive_1920x1080_50/
            - im00001.png
            - im00002.png
            - im00003.png
            - ...
        * ...
    /media/data/HEVC_D/
    /media/data/HEVC_C/
    ...

The dataset structure can be seen in dataset_config_example_rgb.json.
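
To show how the keys in the example configs relate to paths on disk, here is a small sketch that walks a config and lists the sequences. The key names (`root_path`, `test_classes`, `test`, `base_path`, `src_type`, `sequences`) are taken from the example JSON files; the function name `list_sequences` and the rule that a class with `"test": 0` is skipped are assumptions, and the test script may read the config differently.

```python
import json
import posixpath

EXAMPLE_CONFIG = json.loads("""
{
  "root_path": "/media/data/",
  "test_classes": {
    "HEVC_B": {
      "test": 1,
      "base_path": "HEVC_B",
      "src_type": "yuv420",
      "sequences": {
        "BQTerrace_1920x1080_60.yuv":
          {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1}
      }
    }
  }
}
""")


def list_sequences(config):
    """Yield (class name, path, source type, sequence info) for enabled classes."""
    root = config["root_path"]
    for class_name, cls in config["test_classes"].items():
        if cls["test"] == 0:  # assumed meaning: class disabled
            continue
        for seq_name, info in cls["sequences"].items():
            path = posixpath.join(root, cls["base_path"], seq_name)
            yield class_name, path, cls["src_type"], info


for entry in list_sequences(EXAMPLE_CONFIG):
    print(entry)
```

For `"src_type": "png"` the joined path is a folder of `im*.png` frames; for `"src_type": "yuv420"` it is a single raw file.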

# Build the project
Please build the C++ code if you want to test with actual bitstream writing. The bit count estimated from entropy (the method used in the paper to report numbers) differs slightly from the size of the actually written bitstream. Writing the bitstream to a file adds overhead, and the percentage difference depends on the bitstream size.
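
The gap between the two numbers can be illustrated with a toy calculation. The sketch below is not the repository's entropy coder: the symbol probabilities and the written size are made-up inputs, and the function names are hypothetical. It only shows the general relation between an entropy estimate and a file that is written in whole bytes.

```python
import math


def estimated_bits(probabilities):
    """Entropy estimate: sum of -log2(p) over the coded symbols."""
    return sum(-math.log2(p) for p in probabilities)


def overhead_percent(estimated, written_bytes):
    """Relative difference between written size and estimate, in percent."""
    actual_bits = written_bytes * 8
    return 100.0 * (actual_bits - estimated) / estimated


# Three symbols with model probabilities 1/2, 1/4 and 1/4 cost 5 bits in theory.
bits = estimated_bits([0.5, 0.25, 0.25])
# Writing them into a single byte already costs 8 bits.
print(bits, overhead_percent(bits, written_bytes=1))
```

A fixed cost such as a final partial byte weighs less as the bitstream grows, which is consistent with the overhead percentage depending on the bitstream size.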

## Build the entropy encoding/decoding module
```bash
sudo apt-get install cmake g++
cd src
mkdir build
cd build
conda activate $YOUR_PY_ENV_NAME
cmake ../cpp -DCMAKE_BUILD_TYPE=Release
make -j
```

## Build customized flow warp implementation (especially if you want to test fp16 inference)
```
sudo apt install ninja-build
cd ./src/models/extensions/
python setup.py build_ext --inplace
```

# Pretrained models

* Download [our pretrained models](https://1drv.ms/f/s!AozfVVwtWWYoi1QkAhlIE-7aAaKV?e=OoemTr) and put them into ./checkpoints folder.
* Or run the script in ./checkpoints directly to download the model.
* There are 2 models, one for image coding and the other for video coding.

# Test the models

Example of testing the pretrained models with four rate points:
```bash
python test_video.py --model_path_i ./checkpoints/cvpr2024_image.pth.tar --model_path_p ./checkpoints/cvpr2024_video.pth.tar --rate_num 4 --test_config ./dataset_config_example_yuv420.json --cuda 1 --worker 1 --write_stream 0 --output_path output.json --force_intra_period 9999 --force_frame_num 96
```

It is recommended to set ```--worker``` to the number of GPUs you have.

You can also specify a different ```--rate_num``` value (from 2 to 64) to test finer bitrate adjustment.
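
One plausible way to picture `--rate_num` is as a number of points spread evenly over 64 quantization indices. The sketch below rests on that assumption; the count of 64 is inferred only from the upper bound stated above, and the function name `spread_rate_points` and the rounding rule are hypothetical, so the actual selection in test_video.py may differ.

```python
def spread_rate_points(rate_num, num_indices=64):
    """Pick `rate_num` evenly spaced indices in [0, num_indices - 1]."""
    if not 2 <= rate_num <= num_indices:
        raise ValueError("rate_num must be between 2 and num_indices")
    step = (num_indices - 1) / (rate_num - 1)
    # Round to the nearest index; the endpoints are always included.
    return [int(i * step + 0.5) for i in range(rate_num)]


print(spread_rate_points(4))  # [0, 21, 42, 63]
```

Under this assumption, a larger `--rate_num` fills in more indices between the same two endpoints.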

# Comparison with other methods
Bit savings over VTM-17.0, measured on HEVC E (600 frames) with the single intra-frame setting (i.e. intra-period = –1) and the YUV420 colorspace.

<img src="assets/bitsaving.png" width="600">

RD curve of YUV420 PSNR

<img src="assets/rd_yuv420_psnr.png" width="750">
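
The single intra-frame setting used for this comparison (intra-period = -1) can be sketched as a frame-type rule. The handling of -1 follows the description above; the behaviour for positive periods and the function name `is_intra_frame` are assumptions made for illustration.

```python
def is_intra_frame(frame_index, intra_period):
    """Return True if the frame is coded as an intra frame."""
    if frame_index == 0:
        return True  # the first frame has no reference
    if intra_period <= 0:
        return False  # -1: single intra frame, all others are inter frames
    return frame_index % intra_period == 0  # assumed periodic refresh


# With intra-period = -1, only frame 0 of a 600-frame sequence is intra coded.
print(sum(is_intra_frame(i, -1) for i in range(600)))
```

With a period equal to the number of tested frames, as in the RGB example config (96 frames, period 96), this rule also yields a single intra frame.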

# Acknowledgement
The implementation is based on [CompressAI](https://github.com/InterDigitalInc/CompressAI) and [PyTorchVideoCompression](https://github.com/ZhihaoHu/PyTorchVideoCompression).

# Citation
If you find this work useful for your research, please cite:

```
@inproceedings{li2024neural,
title={Neural Video Compression with Feature Modulation},
author={Li, Jiahao and Li, Bin and Lu, Yan},
booktitle={{IEEE/CVF} Conference on Computer Vision and Pattern Recognition,
{CVPR} 2024, Seattle, WA, USA, June 17-21, 2024},
year={2024}
}
```

# Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow [Microsoft’s Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general). Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.
Binary file added DCVC-FM/assets/bitsaving.png
Binary file added DCVC-FM/assets/rd_yuv420_psnr.png
21 changes: 21 additions & 0 deletions DCVC-FM/checkpoints/download.py
@@ -0,0 +1,21 @@
import urllib.request


def download_one(url, target):
    urllib.request.urlretrieve(url, target)


def main():
    urls = {
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211494&authkey=!AOxzcrEFT_h-iCk': 'cvpr2024_image.pth.tar',
        'https://onedrive.live.com/download?cid=2866592D5C55DF8C&resid=2866592D5C55DF8C%211493&authkey=!AFxYv6oK1o6GrZc': 'cvpr2024_video.pth.tar',
    }
    for url in urls:
        target = urls[url]
        print("downloading", target)
        download_one(url, target)
        print("downloaded", target)


if __name__ == "__main__":
    main()
100 changes: 100 additions & 0 deletions DCVC-FM/dataset_config_example_rgb.json
@@ -0,0 +1,100 @@
{
"root_path": "/media/data/",
"test_classes": {
"UVG": {
"test": 1,
"base_path": "UVG",
"src_type": "png",
"sequences": {
"Beauty_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"Bosphorus_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"HoneyBee_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"Jockey_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"ReadySteadyGo_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"ShakeNDry_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"YachtRide_1920x1080_120fps_420_8bit_YUV": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96}
}
},
"MCL-JCV": {
"test": 1,
"base_path": "MCL-JCV",
"src_type": "png",
"sequences": {
"videoSRC01_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC02_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC03_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC04_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC05_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC06_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC07_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC08_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC09_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC10_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC11_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC12_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC13_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC14_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC15_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC16_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC17_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC18_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC19_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC20_1920x1080_25": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC21_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC22_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC23_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC24_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC25_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC26_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC27_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC28_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC29_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"videoSRC30_1920x1080_30": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96}
}
},
"HEVC_B": {
"test": 1,
"base_path": "HEVC_B",
"src_type": "png",
"sequences": {
"BQTerrace_1920x1080_60": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"BasketballDrive_1920x1080_50": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"Cactus_1920x1080_50": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"Kimono1_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96},
"ParkScene_1920x1080_24": {"width": 1920, "height": 1080, "frames": 96, "intra_period": 96}
}
},
"HEVC_E": {
"test": 1,
"base_path": "HEVC_E",
"src_type": "png",
"sequences": {
"FourPeople_1280x720_60": {"width": 1280, "height": 720, "frames": 96, "intra_period": 96},
"Johnny_1280x720_60": {"width": 1280, "height": 720, "frames": 96, "intra_period": 96},
"KristenAndSara_1280x720_60": {"width": 1280, "height": 720, "frames": 96, "intra_period": 96}
}
},
"HEVC_C": {
"test": 1,
"base_path": "HEVC_C",
"src_type": "png",
"sequences": {
"BQMall_832x480_60": {"width": 832, "height": 480, "frames": 96, "intra_period": 96},
"BasketballDrill_832x480_50": {"width": 832, "height": 480, "frames": 96, "intra_period": 96},
"PartyScene_832x480_50": {"width": 832, "height": 480, "frames": 96, "intra_period": 96},
"RaceHorses_832x480_30": {"width": 832, "height": 480, "frames": 96, "intra_period": 96}
}
},
"HEVC_D": {
"test": 1,
"base_path": "HEVC_D",
"src_type": "png",
"sequences": {
"BasketballPass_416x240_50": {"width": 416, "height": 240, "frames": 96, "intra_period": 96},
"BlowingBubbles_416x240_50": {"width": 416, "height": 240, "frames": 96, "intra_period": 96},
"BQSquare_416x240_60": {"width": 416, "height": 240, "frames": 96, "intra_period": 96},
"RaceHorses_416x240_30": {"width": 416, "height": 240, "frames": 96, "intra_period": 96}
}
}
}
}
100 changes: 100 additions & 0 deletions DCVC-FM/dataset_config_example_yuv420.json
@@ -0,0 +1,100 @@
{
"root_path": "/media/data/",
"test_classes": {
"UVG": {
"test": 1,
"base_path": "UVG",
"src_type": "yuv420",
"sequences": {
"Beauty_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
"Bosphorus_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
"HoneyBee_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
"Jockey_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
"ReadySteadyGo_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
"ShakeNDry_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 300, "intra_period": -1},
"YachtRide_1920x1080_120fps_420_8bit_YUV.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1}
}
},
"MCL-JCV": {
"test": 1,
"base_path": "MCL-JCV",
"src_type": "yuv420",
"sequences": {
"videoSRC01_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC02_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC03_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC04_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC05_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC06_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC07_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC08_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC09_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC10_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC11_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC12_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC13_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC14_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC15_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC16_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC17_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC18_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC19_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC20_1920x1080_25.yuv": {"width": 1920, "height": 1080, "frames": 125, "intra_period": -1},
"videoSRC21_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC22_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC23_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC24_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC25_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC26_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC27_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC28_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1},
"videoSRC29_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 120, "intra_period": -1},
"videoSRC30_1920x1080_30.yuv": {"width": 1920, "height": 1080, "frames": 150, "intra_period": -1}
}
},
"HEVC_B": {
"test": 1,
"base_path": "HEVC_B",
"src_type": "yuv420",
"sequences": {
"BQTerrace_1920x1080_60.yuv": {"width": 1920, "height": 1080, "frames": 600, "intra_period": -1},
"BasketballDrive_1920x1080_50.yuv": {"width": 1920, "height": 1080, "frames": 500, "intra_period": -1},
"Cactus_1920x1080_50.yuv": {"width": 1920, "height": 1080, "frames": 500, "intra_period": -1},
"Kimono1_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 240, "intra_period": -1},
"ParkScene_1920x1080_24.yuv": {"width": 1920, "height": 1080, "frames": 240, "intra_period": -1}
}
},
"HEVC_E": {
"test": 1,
"base_path": "HEVC_E",
"src_type": "yuv420",
"sequences": {
"FourPeople_1280x720_60.yuv": {"width": 1280, "height": 720, "frames": 600, "intra_period": -1},
"Johnny_1280x720_60.yuv": {"width": 1280, "height": 720, "frames": 600, "intra_period": -1},
"KristenAndSara_1280x720_60.yuv": {"width": 1280, "height": 720, "frames": 600, "intra_period": -1}
}
},
"HEVC_C": {
"test": 1,
"base_path": "HEVC_C",
"src_type": "yuv420",
"sequences": {
"BQMall_832x480_60.yuv": {"width": 832, "height": 480, "frames": 600, "intra_period": -1},
"BasketballDrill_832x480_50.yuv": {"width": 832, "height": 480, "frames": 500, "intra_period": -1},
"PartyScene_832x480_50.yuv": {"width": 832, "height": 480, "frames": 500, "intra_period": -1},
"RaceHorses_832x480_30.yuv": {"width": 832, "height": 480, "frames": 300, "intra_period": -1}
}
},
"HEVC_D": {
"test": 1,
"base_path": "HEVC_D",
"src_type": "yuv420",
"sequences": {
"BasketballPass_416x240_50.yuv": {"width": 416, "height": 240, "frames": 500, "intra_period": -1},
"BlowingBubbles_416x240_50.yuv": {"width": 416, "height": 240, "frames": 500, "intra_period": -1},
"BQSquare_416x240_60.yuv": {"width": 416, "height": 240, "frames": 600, "intra_period": -1},
"RaceHorses_416x240_30.yuv": {"width": 416, "height": 240, "frames": 300, "intra_period": -1}
}
}
}
}