
Question: The code to generate Vision Features #35

Open
roapple10 opened this issue Mar 2, 2023 · 5 comments

Comments

@roapple10

I would like to study the Vision Features in more detail. Would it be possible to share the code used to generate the .npy file?
Much appreciated; thanks for the hard work here.

@gianfrancodemarco

Bump

@Francesco-Ranieri

Hi,
I wrote this code snippet for visual feature extraction. Unfortunately, the results it produces on the ScienceQA dataset differ slightly from those in this repository. Despite this, the extracted features have the expected shape and work for both classification and rationale generation.
Hope it can be useful.

from transformers import AutoImageProcessor, DetrForObjectDetection
from PIL import Image
import torch

pretrained_model = "facebook/detr-resnet-101-dc5"
image_processor = AutoImageProcessor.from_pretrained(pretrained_model)
model = DetrForObjectDetection.from_pretrained(pretrained_model)
model.eval()  # inference mode: disables dropout for deterministic features

image_path = "img.jpg"
image = Image.open(image_path).convert("RGB")  # DETR expects a 3-channel image
inputs = image_processor(images=image, return_tensors="pt")

# run without gradient tracking so the output can be converted to NumPy directly
with torch.no_grad():
    outputs = model(**inputs)

# the last hidden state holds the final query embeddings of the Transformer decoder,
# shape (1, num_queries, hidden_dim) = (1, 100, 256) for this checkpoint
vision_features = outputs.last_hidden_state.numpy()
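
To produce a dataset-level .npy file like the one shipped with the repo, a loop over the per-question images along these lines might work (continuing from the snippet above; the image paths and the stacked layout are assumptions for illustration, not the authors' confirmed pipeline):

import numpy as np

# hypothetical list of image paths, one per question with an image (assumed layout)
image_paths = ["images/1/image.png", "images/2/image.png"]

all_features = []
for path in image_paths:
    image = Image.open(path).convert("RGB")
    inputs = image_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # drop the batch dimension -> (num_queries, hidden_dim)
    all_features.append(outputs.last_hidden_state.squeeze(0).numpy())

# stack to (num_questions, num_queries, hidden_dim) and save
np.save("vision_features.npy", np.stack(all_features))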

@aiPenguin

Thanks to the author for this awesome work!

Some questions in the dataset have both an image for the question and images for the choices. I was wondering how the authors get the visual features in this case. Is some pooling function applied?

How do you deal with this case, Francesco-Ranieri?

@Francesco-Ranieri

As far as I understood from their implementation, exactly one image feature vector is used for each question. Since the code for vision feature generation is not available, we need an answer from the authors to know whether any pooling function was applied.
However, I honestly think that only one image was taken into consideration.
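
For what it's worth, if one did want to combine several images for a single question, mean pooling the per-image query embeddings would keep the output at the same (num_queries, hidden_dim) shape. This is just a sketch of one possibility, not what this repo does:

import torch

def pool_vision_features(feature_list):
    # feature_list: one (num_queries, hidden_dim) tensor per image,
    # e.g. the DETR decoder outputs from the snippet above
    stacked = torch.stack(feature_list)  # (num_images, num_queries, hidden_dim)
    return stacked.mean(dim=0)           # mean pool -> (num_queries, hidden_dim)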

@aiPenguin

Same opinion here. But I found that there are more features in the .npy file than there are questions with image contexts, so I opened another issue about it: #46
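
A quick sanity check is to compare the number of feature rows against the number of problems that actually have an image (the file names below are assumptions). If the feature count matches the total number of problems instead, a placeholder such as all-zeros features for image-less questions would explain the surplus, but that is only a guess pending the authors' answer:

import json
import numpy as np

features = np.load("vision_features.npy")    # assumed file name
with open("problems.json") as f:             # assumed file name
    problems = json.load(f)
n_with_image = sum(1 for p in problems.values() if p.get("image"))

print(f"feature rows: {features.shape[0]}, questions with images: {n_with_image}")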
