How to upgrade to gpt-4o #219

The-Erf · 2024-05-19T23:16:27Z

Hello,
What I have in mind is for GPT to analyze the images from the cameras connected to Home Assistant.
For example:
How many people do you see on the camera?
What color are their clothes?
Do they look suspicious?

As a first step, I set the language model to gpt-4o in the Extended OpenAI Conversation settings.
As a result, the response speed is somewhat better.
But when I asked it to analyze the camera images, it replied that it does not have access to cameras or that it cannot process images.

After a little searching, I found this: https://community.home-assistant.io/t/gpt-4o-vision-capabilities-in-home-assistant/729241
I installed it, and after one day I succeeded!
Now, when I ask Extended OpenAI Conversation "What do you see?", the following happens:
1- My automation or script is executed
2- A photo is taken from the camera I specified
3- The photo is sent to ha-gpt4vision
4- The response from ha-gpt4vision is converted to speech with TTS
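
The four steps above can be sketched as a single Home Assistant script. This is only a rough outline: the service name `gpt4vision.image_analyzer` and its fields are taken from the spec posted later in this thread, while the entity IDs, the snapshot path, the `gpt-4o` model value, and the `response_text` key are assumptions that may need adjusting for your installation.

```yaml
# Hypothetical script; all entity IDs and paths are placeholders.
script:
  describe_camera:
    alias: Describe camera image
    sequence:
      # Steps 1-2: take a snapshot from the chosen camera.
      - service: camera.snapshot
        target:
          entity_id: camera.front_door              # placeholder camera
        data:
          filename: /config/www/tmp/snapshot.jpg    # placeholder path
      # Step 3: send the snapshot to the vision integration.
      - service: gpt4vision.image_analyzer
        data:
          provider: OpenAI
          model: gpt-4o                             # assumed model name
          message: "What do you see?"
          image_file: /config/www/tmp/snapshot.jpg
          max_tokens: 400
        response_variable: analysis
      # Step 4: speak the answer. The response variable is a dict;
      # the key name response_text is an assumption.
      - service: tts.speak
        target:
          entity_id: tts.home_assistant_cloud       # placeholder TTS entity
        data:
          media_player_entity_id: media_player.living_room   # placeholder
          message: "{{ analysis.response_text }}"
```

Because the speech output is produced by the script itself, it runs independently of whatever the conversation agent says, which is one likely source of overlapping TTS audio.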

To be honest, the result is good. lol :)
But it has many problems.
For example, it is very limited.
Sometimes its TTS audio also overlaps with the audio from Extended OpenAI Conversation (both TTS outputs play at the same time).

I also have to write a lot of scripts to run ha-gpt4vision. For example:
If the word x is said, take a picture and analyze it.
If the word b is said, take a picture and say what the object in it is used for.
If the word c is said, take a picture and tell whether the person in it is suspicious.
This way, I have to write a separate script for each kind of photo analysis.

I'm looking for a way to avoid writing scripts.
For example, Extended OpenAI Conversation could access the cameras directly, so that when we ask "What do you see in the camera?", it analyzes the camera image in real time with GPT-4o.

I hope I have explained this correctly and that you understand, because I used Google Translate ♥

jleinenbach · 2024-05-20T15:26:20Z

So you have issues with your workflow.
ha-gpt4vision provides a service you can call, but its input needs to be an image.
Your goal needs to be that the response of ha-gpt4vision gets back to your conversation.

So I'd recommend that you write an Extended OpenAI Conversation script that includes the complete workflow.
Note: All response variables are dicts!

Here's a template I generated for you with ChatGPT. I haven't modified it, but it should give you an idea of how to start:

- spec:
    name: capture_and_analyze_photo
    description: >
      Captures a photo, sends the URL to the ha-gpt4vision service, and retrieves the description of the photo.
    parameters:
      type: object
      properties:
        camera_entity_id:
          type: string
          description: The entity_id of the camera to capture the photo from.
      required:
        - camera_entity_id
  function:
    type: composite
    sequence:
      - type: script
        sequence:
          - service: camera.snapshot
            target:
              entity_id: "{{ camera_entity_id }}"
            data:
              filename: "/config/www/tmp/photo.jpg"
          - service: homeassistant.update_entity
            target:
              entity_id: "{{ camera_entity_id }}"
        response_variable: photo_url
      - type: template
        value_template: >
          {{ "http://your-home-assistant-url:8123/local/tmp/photo.jpg" }}
        response_variable: photo_url
      - type: script
        sequence:
          - service: ha-gpt4vision.analyze_photo
            data:
              photo_url: "{{ photo_url }}"
        response_variable: photo_description
      - type: template
        value_template: >
          {{ photo_description }}
        response_variable: final_description

mkammes · 2024-05-20T20:57:32Z

Uploading a picture to a vision-compliant OpenAI model was added to Extended OpenAI Conversation several months ago, @jleinenbach @The-Erf.

#43

You can use a sentence trigger through the HA GUI with keywords to trigger what kind of image analysis you want or simply ask in your native language and let ChatGPT "understand" what you say.
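
A sentence trigger of this kind might look roughly like the following in YAML. The phrases and the script name `script.describe_camera` are hypothetical placeholders; the called script would contain whatever image analysis you want to run for those keywords.

```yaml
# Hypothetical automation; phrases and the called script are placeholders.
automation:
  - alias: Ask about the camera by voice
    trigger:
      # Sentence trigger: fires when one of these phrases is spoken or typed.
      - platform: conversation
        command:
          - "what do you see"
          - "who is at the door"
    action:
      # Run a script that captures and analyzes the camera image.
      - service: script.describe_camera
```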

The-Erf · 2024-05-21T04:00:35Z

> Uploading a picture to a vision-compliant OpenAI model was added to Extended OpenAI Conversation several months ago, @jleinenbach @The-Erf.
>
> #43
>
> You can use a sentence trigger through the HA GUI with keywords to trigger what kind of image analysis you want or simply ask in your native language and let ChatGPT "understand" what you say.

Thanks, can you explain more?

mkammes · 2024-05-21T15:33:05Z

The developer outlines it here: #60

valentinfrlch · 2024-05-22T11:48:23Z

This spec, as taken from this post, allows you to chat with Extended OpenAI about an image (multiple images if you want):

- spec:
    name: vision
    description: Analyze images
    parameters:
      type: object
      properties:
        request:
          type: string
          description: Analyze the images as requested by the user
      required:
      - request
  function:
    type: script
    sequence:
    - service: gpt4vision.image_analyzer
      data:
        max_tokens: 400
        message: "{{request}}"
        image_file: |-
          /media/Allarme_Camera.jpg
          /media/Allarme_Sala1.jpg
          /media/Snapshot_Giardino1_20240425-090813.jpg
        provider: OpenAI
        model: gpt-4-vision-preview
        target_width: 1280
        temperature: 0.5
      response_variable: _function_result
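
The spec above reads fixed image files. A variant that first takes a fresh snapshot from a camera chosen by the model might look like the sketch below. It reuses the `gpt4vision.image_analyzer` fields from the spec above; the function name `analyze_camera`, the snapshot path, and the `gpt-4o` model value are assumptions, and the snapshot directory has to be one that Home Assistant is allowed to write to.

```yaml
- spec:
    name: analyze_camera            # hypothetical function name
    description: Take a snapshot from a camera and analyze it as requested by the user
    parameters:
      type: object
      properties:
        camera_entity_id:
          type: string
          description: The entity_id of the camera to capture the image from
        request:
          type: string
          description: What the user wants to know about the image
      required:
      - camera_entity_id
      - request
  function:
    type: script
    sequence:
    # Save a fresh snapshot first (placeholder path).
    - service: camera.snapshot
      target:
        entity_id: "{{ camera_entity_id }}"
      data:
        filename: /media/snapshot.jpg
    # Then pass the snapshot to the image analyzer service.
    - service: gpt4vision.image_analyzer
      data:
        max_tokens: 400
        message: "{{ request }}"
        image_file: /media/snapshot.jpg
        provider: OpenAI
        model: gpt-4o               # assumed; the spec above uses gpt-4-vision-preview
        target_width: 1280
        temperature: 0.5
      response_variable: _function_result
```

With a spec like this, one function can cover different questions about different cameras, instead of one script per keyword.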

ChristianEvc · 2024-06-20T09:24:32Z

@valentinfrlch
Super beginner question here, but where do I add the spec for this to work, please? I'm just getting started with Home Assistant, so still figuring things out. Grateful for any guidance anyone can give on this!

valentinfrlch · 2024-06-20T12:07:48Z

So I assume you have installed Extended OpenAI Conversation.

Go to Settings > Devices & services > Extended OpenAI Conversation
You should see your entry here, if not, you need to add the integration first.
Click the configure button. There should be a "Functions" text field in the dialog. This is where the specs go.

The spec posted here has also been updated. You can find the updated version in the wiki of gpt4vision

Also note that you need to install gpt4vision (a separate integration) for this spec to work. You can do so through HACS, just follow the instructions here.

ChristianEvc · 2024-06-20T16:35:35Z

Thanks very much. I had an existing spec, so wasn't sure whether to replace or append, but seems like append is the way to go. Thank you @valentinfrlch !
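
For reference, appending here means adding the new spec as another item in the same YAML list in the "Functions" field. A minimal sketch, in which `get_current_time` is a hypothetical stand-in for whatever spec is already there and the image path is a placeholder:

```yaml
# Existing spec (hypothetical example) stays in place.
- spec:
    name: get_current_time
    description: Get the current local time
    parameters:
      type: object
      properties: {}
  function:
    type: template
    value_template: "{{ now().strftime('%H:%M') }}"
# New spec appended as another list item at the same indentation.
- spec:
    name: vision
    description: Analyze images
    parameters:
      type: object
      properties:
        request:
          type: string
          description: Analyze the images as requested by the user
      required:
      - request
  function:
    type: script
    sequence:
    - service: gpt4vision.image_analyzer
      data:
        max_tokens: 400
        message: "{{ request }}"
        image_file: /media/snapshot.jpg   # placeholder path
        provider: OpenAI
        model: gpt-4-vision-preview
      response_variable: _function_result
```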

The-Erf mentioned this issue May 19, 2024

Help to connect extended open ai conversion to ha-gpt4vision valentinfrlch/ha-llmvision#2

Closed
