DCVC-FM model is only applicable to datasets converted using the BT.709 standard under RGB test conditions? #54
Comments
Thanks for the comments. We focus on the YUV420 color space, as most traditional video codecs do. We find that the same model also works for RGB content (at least when BT.709 conversion is used). We did not test other RGB content. Could you please check whether the following two lines use the conversion matrix you expect (if you are not using the BT.709 matrix)? https://github.com/microsoft/DCVC/blob/4df94295c8dbe0a26456582d1a0eddb3465f1597/DCVC-FM/src/utils/test_helper.py#L88C9-L88C37
Yes, I used the right matrix. My test pipeline is [1. read PNG files into RGB]->[2. convert RGB into YCbCr using the function "rgb_to_ycbcr444" (src.transforms.functional)]->[3. compress the YCbCr frames with the DCVC-FM model]->[4. convert YCbCr back into RGB using the function "ycbcr444_to_rgb" (src.transforms.functional)]. I suspect the issue is caused by information loss being amplified in the conversions from RGB to YCbCr444 and back to RGB in steps 2 and 4. This loss is likely worse for source frames or datasets that were not converted using the BT.709 standard. Traditional codecs and previous RGB-space neural codecs, however, appear to be more robust to variations in the color space conversion used for the source frames. Could you please fine-tune a model for RGB input/output (I mean a model specifically for RGB frames, without YUV conversion) and release it? I think a few fine-tuning steps would be enough to get this model. Since I don't have a training strategy, I may need to trouble you to run the fine-tuning. This would help identify the problem. Thanks!
As you mentioned, rgb_to_ycbcr444 and ycbcr444_to_rgb in src.transforms.functional were used for color space conversion. However, both functions assume BT.709.
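To make the matrix-mismatch concern concrete, here is a minimal NumPy sketch. It is not the repository's implementation: the helper names, the full-range [0, 1] scaling, and the sample values are assumptions chosen for illustration, and clipping and 8-bit rounding are ignored. It also assumes, for the sake of the example, that the PNGs were produced from the YUV source with BT.601 coefficients; whether a given ffmpeg command actually does that should be checked. The sketch converts YCbCr to RGB with one matrix, converts back with BT.709, and compares the result with the original YCbCr values.

```python
import numpy as np

# Luma coefficients (Kr, Kb) from ITU-R BT.709 and BT.601.
BT709 = (0.2126, 0.0722)
BT601 = (0.299, 0.114)


def ycbcr_to_rgb(ycbcr, coeffs):
    """Full-range YCbCr (chroma centred on 0.5) -> RGB, without clipping."""
    kr, kb = coeffs
    y = ycbcr[..., 0]
    cb = ycbcr[..., 1] - 0.5
    cr = ycbcr[..., 2] - 0.5
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / (1.0 - kr - kb)
    return np.stack([r, g, b], axis=-1)


def rgb_to_ycbcr(rgb, coeffs):
    """RGB -> full-range YCbCr (chroma centred on 0.5)."""
    kr, kb = coeffs
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = kr * r + (1.0 - kr - kb) * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb)) + 0.5
    cr = (r - y) / (2.0 * (1.0 - kr)) + 0.5
    return np.stack([y, cb, cr], axis=-1)


# Hypothetical source pixels in YCbCr (illustrative values only).
yuv_src = np.array([
    [0.5, 0.3, 0.7],
    [0.4, 0.6, 0.45],
    [0.5, 0.5, 0.5],   # neutral grey: unaffected by the matrix choice
])

# Case A: PNG created with BT.709, then converted back with BT.709.
yuv_matched = rgb_to_ycbcr(ycbcr_to_rgb(yuv_src, BT709), BT709)

# Case B: PNG created with BT.601 (assumed), then converted back with BT.709.
yuv_mismatched = rgb_to_ycbcr(ycbcr_to_rgb(yuv_src, BT601), BT709)

matched_error = np.abs(yuv_matched - yuv_src).max()
mismatch_error = np.abs(yuv_mismatched - yuv_src).max()

print(f"max abs error, matched matrices:    {matched_error:.2e}")
print(f"max abs error, mismatched matrices: {mismatch_error:.2e}")
```

With matched matrices the round trip is exact up to floating-point error, while the mismatched case shifts both luma and chroma for any non-grey pixel. If the model was trained on BT.709-derived YCbCr, that shift would change the statistics of its input, which is consistent with the conjecture in this thread but does not prove it.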
Thanks, I will try that.
Could you release the training code for DCVC-FM? We would like to fine-tune the model on our own dataset. Thanks.
Thank you for releasing the code and models; they have significantly helped my research! However, I ran into something confusing during evaluation.
Most previous approaches use PNG datasets obtained by converting the YUV420P sequences to PNG with ffmpeg. I tested both the DCVC-DC and DCVC-FM models on datasets converted this way. Under RGB test conditions with the same Group of Pictures (GOP) length of 32, the DCVC-FM model performed significantly worse, with the exception of the HEVC Class E dataset.
Has anyone else encountered this issue?
I conjecture that the reason may be that neural networks easily fit to the data processing used during training, given that the training color conversion follows the BT.709 standard. Traditional codecs, by contrast, perform consistently across different color conversion approaches.