I've released another implementation of LLaVA-UHD here, which I believe is more stable and elegant. The new repo's code originates from this one, but its overall quality is improved, and the training pipeline has been tested to run without errors.
When I reviewed this old repo and tried to fix this RuntimeError, I found that it contains many hidden bugs and calculations with incorrect logic (violating the spirit of the original paper), and that it omits some necessary processing steps (e.g., image normalization). So I decided to rewrite the code and do my best to fix all of these issues. I have now open-sourced my rewritten version.
You are very welcome to use it, and I look forward to your feedback.
In image processing, both the original image and the sliced images use the same `resized_patch_height` and `resized_patch_width`. However, in image encoding, the original image uses `abstract_h_num` and `abstract_w_num`, while the sliced images use `slice_h_num` and `slice_w_num`, respectively. There appears to be an inconsistency between the two approaches.

image processing:
`LLaVA-UHD/llava_uhd/train/llava-uhd/slice_logic.py`, lines 189 to 202 (commit 302301b)
image encoding:
`LLaVA-UHD/llava_uhd/train/llava-uhd/adapt_clip.py`, lines 322 to 340 (commit 302301b)
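To make the suspected mismatch concrete, here is a minimal sketch (not the repo's actual code; the variable names follow the issue, the patch size and grid values are illustrative assumptions). If processing resizes both the original image and each slice to the same target and derives one patch grid, but encoding interpolates positional embeddings using a differently derived grid, the token counts disagree and a shape error follows:

```python
PATCH_SIZE = 14  # assumed CLIP ViT-L/14 patch size


def patch_grid(pixel_h: int, pixel_w: int, patch_size: int = PATCH_SIZE):
    """Patches per axis after the vision backbone splits a resized image."""
    return pixel_h // patch_size, pixel_w // patch_size


# Processing: original image and slices are resized to the same target,
# so both produce the same grid.
resized_h, resized_w = 336, 336
resized_patch_height, resized_patch_width = patch_grid(resized_h, resized_w)

# Encoding: positional embeddings are interpolated to a grid derived
# separately (abstract_h_num/abstract_w_num for the original image,
# slice_h_num/slice_w_num for slices). Hypothetical values:
abstract_h_num, abstract_w_num = 20, 28

num_tokens_processing = resized_patch_height * resized_patch_width
num_tokens_encoding = abstract_h_num * abstract_w_num

# If the two grids disagree, the patch-token count from processing no
# longer matches the positional-embedding count used in encoding.
print(num_tokens_processing == num_tokens_encoding)  # False -> shape mismatch
```

This only illustrates why the two code paths should derive their grids from the same quantities; the actual fix belongs in `slice_logic.py` / `adapt_clip.py`.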