
Real image input example #4

Closed
JamshedAlamQaderi opened this issue Mar 30, 2024 · 6 comments

Comments


JamshedAlamQaderi commented Mar 30, 2024

Hello @kyegomez ,

Thank you so much for this awesome repo. I'm very excited to test this project, so I tried the example code, but it gives me the error below:

SyntaxError: Non-UTF-8 code starting with '\xff' in file C:\Users\alamj\Downloads\screenai.py on line 1, but no encoding declared; see https://peps.python.org/pep-0263/ for details
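For what it's worth, a leading `\xff` byte usually means the file was saved in a non-UTF-8 encoding: UTF-16 files begin with the BOM bytes `0xFF 0xFE` (or `0xFE 0xFF`), and Python source must be UTF-8 unless an encoding is declared. A minimal sketch of how to check (the `demo_utf16.py` filename is just for illustration):

```python
# Simulate a file saved as UTF-16 (as some editors or downloads produce)
# and inspect its first bytes.
path = "demo_utf16.py"
with open(path, "w", encoding="utf-16") as f:
    f.write("print('hello')\n")

with open(path, "rb") as f:
    head = f.read(2)

print(head)  # a UTF-16 BOM: b'\xff\xfe' or b'\xfe\xff'
# Fix: re-save the file as UTF-8, e.g. open(path, "w", encoding="utf-8")
```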

Could you provide a real example of giving an input image and text, converting them to tensors, and feeding them to the model? I really want to try it out.

Thank you!


@Yingrjimsch

Hello,

I could run it with an actual image using the following code:

import torch
from torchvision.io import read_image
from screenai.main import ScreenAI

# Create a tensor for the image
image = read_image('test.png').unsqueeze(0).to(torch.float32)
# Create a tensor for the text
text = torch.randint(0, 20000, (1, 1028))

# Create an instance of the ScreenAI model with specified parameters
model = ScreenAI(
    num_tokens = 20000,
    max_seq_len = 1028,
    patch_size=16,
    image_size=224,
    dim=512,
    depth=6,
    heads=8,
    vit_depth=4,
    multi_modal_encoder_depth=4,
    llm_decoder_depth=4,
    mm_encoder_ff_mult=4,
)

# Perform forward pass of the model with the given text and image tensors
out = model(text, image)

# Print the output tensor
print(out)

and a test image, which needs to be 224 × 224 pixels. [attached: test image]

Maybe this helps.
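If an image is not already 224 × 224, it can be resized before the forward pass. A minimal sketch using plain `torch.nn.functional.interpolate` (a random tensor stands in here for the tensor returned by `read_image(...).unsqueeze(0)`):

```python
import torch
import torch.nn.functional as F

# Stand-in for a loaded image: batch of one 3-channel 480x640 image
image = torch.rand(1, 3, 480, 640)

# Resize to the model's expected image_size of 224 x 224
image = F.interpolate(
    image, size=(224, 224), mode="bilinear", align_corners=False
)

print(image.shape)  # torch.Size([1, 3, 224, 224])
```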


JamshedAlamQaderi commented Apr 6, 2024

@Yingrjimsch thank you so much for the help. Can you also tell me how to encode a prompt text to a tensor, and how to decode the output tensor?

@Yingrjimsch

Hi @JamshedAlamQaderi, I haven't had time to try that yet, but I would suggest using the Hugging Face transformers library to find a tokenizer. Use the tokenizer on your input text and set num_tokens and max_seq_len to the tokenizer's specs. If I have time, I'll try it as well and keep you updated.
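To illustrate the encode/decode round trip without pinning down a specific tokenizer, here is a toy whitespace-tokenizer sketch. In practice you would swap in a real Hugging Face tokenizer and its `encode`/`decode` methods, sizing `num_tokens` from its vocabulary size; the `ToyTokenizer` class and the sample phrases below are purely hypothetical:

```python
# Toy whitespace tokenizer: maps words to ids and back, illustrating the
# encode -> model -> argmax -> decode round trip at a small scale.
class ToyTokenizer:
    def __init__(self, corpus):
        # Build a fixed vocabulary from the corpus
        words = sorted({w for text in corpus for w in text.split()})
        self.word_to_id = {w: i for i, w in enumerate(words)}
        self.id_to_word = {i: w for w, i in self.word_to_id.items()}

    def encode(self, text):
        # Text -> list of token ids (feedable into a LongTensor)
        return [self.word_to_id[w] for w in text.split()]

    def decode(self, ids):
        # Token ids -> text
        return " ".join(self.id_to_word[i] for i in ids)

tok = ToyTokenizer(["click the submit button", "open the menu"])
ids = tok.encode("click the submit button")
print(ids)
print(tok.decode(ids))  # "click the submit button"
# num_tokens would be len(tok.word_to_id); a model's output logits of shape
# (batch, seq_len, num_tokens) are decoded by taking argmax over the last
# dimension and passing the resulting ids to tok.decode.
```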

@Barney-Steven

Barney-Steven commented Apr 9, 2024

Hi @JamshedAlamQaderi, this repo is not the official implementation; you can see the definition in `from screenai.main import ScreenAI`, and it is a very simple structure. ScreenAI is not open source for now. I found something similar on Hugging Face; try moondream2.

@JamshedAlamQaderi

Thank you guys for helping me

@JamshedAlamQaderi closed this as not planned on Apr 9, 2024.