DCVC-FM model is only applicable to datasets converted using the BT.709 standard under RGB test conditions? #54

Open
tianyuan168326 opened this issue May 6, 2024 · 6 comments

Comments

@tianyuan168326

Thank you for releasing the code and models; they have significantly helped my research! However, I have run into some confusion during evaluation.

Most previous approaches use PNG datasets produced by converting YUV420P to PNG with ffmpeg. I tested both the DCVC-DC and DCVC-FM models on such ffmpeg-converted datasets. Under RGB test conditions with the same Group of Pictures (GOP) length of 32, the DCVC-FM model performed significantly worse on every dataset except HEVC Class E.

Has anyone else encountered this issue?

My conjecture is that neural networks easily overfit to the data processing used during training, given that the training color conversion follows the BT.709 standard. Traditional codecs, by contrast, perform consistently across different color conversion approaches.

@yaohualibin
Contributor

Thanks for the comments. We focus on the YUV420 color space, as most traditional video codecs do. We find that the same model also works for RGB content (at least when using BT.709 conversion). We did not test other RGB content. Could you please check whether the following two lines use the conversion matrix you expect (if you are not using the BT.709 matrix)?

https://github.com/microsoft/DCVC/blob/4df94295c8dbe0a26456582d1a0eddb3465f1597/DCVC-FM/src/utils/test_helper.py#L88C9-L88C37
https://github.com/microsoft/DCVC/blob/4df94295c8dbe0a26456582d1a0eddb3465f1597/DCVC-FM/src/utils/test_helper.py#L123C1-L123C72

@tianyuan168326
Author

tianyuan168326 commented May 6, 2024

Yes, I used the right matrix. My test pipeline is: [1. read PNG files into RGB] -> [2. convert RGB into YCbCr using the function "rgb_to_ycbcr444" (src.transforms.functional)] -> [3. compress the YCbCr frames with the DCVC-FM model] -> [4. convert YCbCr back into RGB using the function "ycbcr444_to_rgb" (src.transforms.functional)].

I surmise that the probable cause of the issue is the amplification of information loss during the transitions from RGB to YCbCr444 and back to RGB in steps 2 and 4, respectively. This loss is likely exacerbated for source frames or datasets that have not undergone conversion using the BT.709 standard.
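One way to check this hypothesis is to run steps 2 and 4 back to back with no codec in between and see how much error the conversion alone introduces. The sketch below is a per-pixel, standard-library stand-in, not the repository's rgb_to_ycbcr444 / ycbcr444_to_rgb: it assumes full-range BT.709 coefficients and values in [0, 1], and the helper names (quantize8, round_trip_error) are hypothetical.

```python
# Assumption: full-range BT.709 luma coefficients (Kr, Kb); Kg = 1 - Kr - Kb.
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB


def rgb_to_ycbcr(r, g, b):
    """RGB in [0, 1] -> YCbCr in [0, 1] (chroma shifted by 0.5)."""
    y = KR * r + KG * g + KB * b
    cb = (b - y) / (2.0 * (1.0 - KB)) + 0.5
    cr = (r - y) / (2.0 * (1.0 - KR)) + 0.5
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse of rgb_to_ycbcr above."""
    r = y + 2.0 * (1.0 - KR) * (cr - 0.5)
    b = y + 2.0 * (1.0 - KB) * (cb - 0.5)
    g = (y - KR * r - KB * b) / KG
    return r, g, b


def quantize8(v):
    """Round a [0, 1] value to the nearest 8-bit level (hypothetical helper)."""
    return min(255, max(0, round(v * 255.0))) / 255.0


def round_trip_error(pixels, quantize=False):
    """Max absolute RGB error, in 8-bit levels, after steps 2 and 4 only."""
    worst = 0.0
    for rgb in pixels:
        ycc = rgb_to_ycbcr(*rgb)
        if quantize:
            # Emulate storing the intermediate YCbCr frame at 8 bits.
            ycc = tuple(quantize8(v) for v in ycc)
        back = ycbcr_to_rgb(*ycc)
        worst = max(worst, max(abs(a - b) for a, b in zip(rgb, back)) * 255.0)
    return worst


# A coarse grid of 8-bit RGB colours as stand-in test data.
pixels = [(r / 255.0, g / 255.0, b / 255.0)
          for r in range(0, 256, 15)
          for g in range(0, 256, 15)
          for b in range(0, 256, 15)]

float_err = round_trip_error(pixels)
quant_err = round_trip_error(pixels, quantize=True)
print(f"float round trip:       {float_err:.2e} levels")
print(f"8-bit YCbCr round trip: {quant_err:.2f} levels")
```

If the conversion-only error turns out to be small compared with the observed quality gap, the round trip by itself is unlikely to be the whole explanation, and a mismatch between the matrix used to create the PNGs and the one assumed at test time becomes the more plausible suspect.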

However, traditional codecs and previous RGB-space neural codecs appear to be more robust to variations in the color space conversion used for the source frames.

Could you please fine-tune a model for RGB input/output (I mean a model specifically for RGB frames, without YUV conversion) and release it? I think just a few steps would be enough to obtain such a model. But since I don't have the training strategy, I have to trouble you to run this fine-tuning. It would help identify the problem. Thanks!

@yaohualibin
Contributor

As you mentioned, rgb_to_ycbcr444 and ycbcr444_to_rgb in src.transforms.functional were used for color space conversion. However, BT.709 is assumed in these two functions.
If your RGB was not converted using BT.709, I would suggest modifying rgb_to_ycbcr444 and ycbcr444_to_rgb (or adding new functions) to use the correct conversion matrix.
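To illustrate the suggestion, here is a minimal sketch of conversion functions parameterized by standard. It is not the repository's code: the COEFFS table, the function signatures, and the per-pixel tuples are assumptions made for illustration, and full-range values in [0, 1] are assumed. The example supposes the PNGs were produced with BT.601 coefficients, which may or may not match a given ffmpeg setup.

```python
# Luma coefficients (Kr, Kb) for two common standards; Kg = 1 - Kr - Kb.
COEFFS = {
    "bt601": (0.299, 0.114),
    "bt709": (0.2126, 0.0722),
}


def rgb_to_ycbcr(rgb, standard="bt709"):
    """Full-range RGB in [0, 1] -> YCbCr in [0, 1] for the given standard."""
    kr, kb = COEFFS[standard]
    kg = 1.0 - kr - kb
    r, g, b = rgb
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb)) + 0.5
    cr = (r - y) / (2.0 * (1.0 - kr)) + 0.5
    return y, cb, cr


def ycbcr_to_rgb(ycc, standard="bt709"):
    """Inverse of rgb_to_ycbcr for the same standard."""
    kr, kb = COEFFS[standard]
    kg = 1.0 - kr - kb
    y, cb, cr = ycc
    r = y + 2.0 * (1.0 - kr) * (cr - 0.5)
    b = y + 2.0 * (1.0 - kb) * (cb - 0.5)
    g = (y - kr * r - kb * b) / kg
    return r, g, b


# Suppose a source YUV sample was turned into RGB with BT.601 ...
source_ycc = (0.5, 0.3, 0.7)
rgb = ycbcr_to_rgb(source_ycc, "bt601")

# ... and is then converted back by a front end that assumes BT.709.
mismatched = rgb_to_ycbcr(rgb, "bt709")
matched = rgb_to_ycbcr(rgb, "bt601")

print("source    :", source_ycc)
print("matched   :", tuple(round(v, 4) for v in matched))
print("mismatched:", tuple(round(v, 4) for v in mismatched))
```

With a matching standard the original YCbCr sample is recovered, while the mismatched path hands the codec a shifted sample. Selecting the standard through a parameter keeps the existing BT.709 behaviour as the default.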

@tianyuan168326
Author

Thanks, I will try that.

@james20181013

Could you release the training code of DCVC-FM? We would like to fine-tune it on our own dataset. Thanks.

@ZLN12

ZLN12 commented Aug 4, 2024

Did you receive the training code? If so, could you send me a copy? Thank you very much!