Having 1 proxy in each room my presence keeps jumping between 3 rooms #329

Open
hajar97 opened this issue Oct 20, 2024 · 20 comments

Comments

@hajar97

hajar97 commented Oct 20, 2024

Version of the custom_component

Configuration

Describe the bug

I assumed that more proxies in the house are better and that the ideal is 1 proxy per room, so I fitted a proxy in every room. Unfortunately, the result is very unreliable presence detection in every single room.

Let me give a concrete example:
I have my phone 40cm away from a proxy in the Office. A bedroom shares a wall with the office, and a second proxy is about 3 meters away through that wall. A third proxy is in the Bathroom, about 5 meters away through a door and a wall.

I calibrated Reference Power to give me 1m. I tried all kinds of combinations of settings. Unfortunately, no matter what I do, my reading keeps jumping between Office, Bedroom and Bathroom at least once a minute, and it goes on like this. Is there something very obvious I am missing, or do I just need to wait until Reference Power can be configured individually for each device to solve my problem? I would hugely appreciate any guidance.
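For context on what calibrating Reference Power to 1m means, here is a minimal sketch of the common log-distance path-loss model that RSSI-based distance estimates are generally built on. It is an illustration, not a copy of Bermuda's code; the function name `rssi_to_distance`, the -55 dBm reference power, the attenuation factor of 3.0 and the 10 dB wall loss are all assumed values.

```python
def rssi_to_distance(rssi_dbm: float, ref_power_dbm: float, attenuation: float = 3.0) -> float:
    """Estimate distance in metres using the log-distance path-loss model.

    ref_power_dbm is the RSSI expected at exactly 1 m (assumed value here).
    attenuation describes how quickly the signal fades in the environment.
    """
    return 10 ** ((ref_power_dbm - rssi_dbm) / (10 * attenuation))


# When the measured RSSI equals the reference power, the estimate is 1 m.
one_metre = rssi_to_distance(-55.0, ref_power_dbm=-55.0)

# An extra 10 dB of loss (an assumed figure for a wall) roughly doubles the estimate.
behind_wall = rssi_to_distance(-65.0, ref_power_dbm=-55.0)

print(f"at reference power: {one_metre:.2f} m, with 10 dB extra loss: {behind_wall:.2f} m")
```

Because distance depends exponentially on RSSI in this model, a few dB of noise shifts the estimate noticeably, which is one reason raw readings bounce around.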

Debug log

Here are my latest settings:
image

@agittins
Owner

Can you please attach the results of a "download diagnostics" from Bermuda?
image

(If your system has been running a few days this might take a long time to run - it will usually complete OK but might take a few minutes, possibly. You can instead reload Bermuda, leave it for a few minutes, then do a download-diagnostics, which should only take a short time to complete).

The "Max Radius" setting should be set fairly high in order to effectively disable that feature, as it doesn't tend to work very well. I'd suggest 70m or something, rather than the 10m you have currently.

If you can upload a diagnostics I'll have a better idea of what's going on. My guess is that your proxies might not be reporting in the advertisements often enough, so Bermuda assumes that if another proxy has a more recent report, you must have moved there. The diags will show that though.

If you can also add which hardware you are using for your proxies and the yaml you're using on them that will help as well.

Something else that helps with visualising the issue is to use the "History" button in the HA sidebar, add the device you want to troubleshoot, and reduce the timeframe down to a few minutes. The Area and Distance sensors on the graph might give some hints, too.

But the main thing I need is the diagnostics.

@hajar97
Author

hajar97 commented Oct 20, 2024

config_entry-bermuda-01JAK4SM6B21MSEDGAAYAAEY6Q.json

Thank you for the prompt reply. Your theory of what it could be might be right. But I also noticed that the reported distances to the proxies keep changing, and my phone is shown as being closer to the proxy in the bathroom, which is 5 metres, a wall and a door away, than to a proxy that is 30cm away from it in direct line of sight.

I use different ESPHome devices as proxies throughout the house. But to keep things simpler for you, for this particular example all 3 are based on M5Atom S3 Lite.

@hajar97
Author

hajar97 commented Oct 20, 2024

Sorry, forgot to add my YAML for all 3 devices. I tried both with and without scan_parameters, but it didn't seem to have any impact.

esphome:
  name: bathroom-2-atom
  friendly_name: Bathroom 2 Atom

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "..."

ota:
  - platform: esphome
    password: "..."

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Bathroom-2-Atom Fallback Hotspot"
    password: "..."

captive_portal:

bluetooth_proxy:

esp32_ble_tracker:
  scan_parameters:
    interval: 1000ms # default 320ms. Time spent per adv channel
    window: 900ms # default 30ms. Time spent listening during interval.

@agittins
Owner

> Thank you for the prompt reply.

And thanks for the quick and comprehensive debug response! :-)

So taking a look at the diags, it looks like you have four devices configured via IRK and no other manually-added devices. I'm looking at "Moth BLE" since it looks to be located in the office at the time of the diags.

In the diags the first thing I'm looking at is the hist_interval data. This tells me how many seconds elapsed between each update we noticed from a given proxy for a given device. Since Bermuda checks every second, and most devices transmit advertisements every 200ms or so (this varies widely), we ideally like to see a listing of values around 1 second - or for Shelly devices, around 3 seconds due to how their firmware is set up.

The office proxy reports:

"hist_interval": [
    33.28667011899999,
    47.205457025,
    68.156908738
],

Which is pretty alarming :-) Your system looks like it's been up for 150 seconds (2.5 minutes), but the office proxy has only reported seeing Moth BLE 3 times, at intervals of over 30 seconds. It's possible that the esphome is rebooting or maybe restarting the BLE part of its firmware. The distances each time are around 70cm though, so the device is definitely "close" to this proxy.

Looking at the epl-living-room proxy, we see:

           "hist_interval": [
              0.913007623999988,
              1.341010194000006,
              0.17200125599998728,
              0.9320061900000098,
              1.1240066869999907,
              0.897004585000019,
              1.6870069149999836,
              0.9010027719999982,
              0.9380021610000142,
              0.9260013850000064
            ],

with fairly stable distance readings of about 4 metres. So the living room proxy looks really healthy.

The bathroom-2-atom proxy looks unhealthy: intervals of 4, 48 and 67, but at distances of 1m, .46m and 1.3m.

ir-obstacle-sensor looks healthy, pretty solid intervals between 1 and 2 seconds, distance about 3.5m.

epl-kitchen seems too far away to get anything useful (one advert at 18m).

So from that it looks like even though Moth BLE is quite close to the office and bathroom proxies, they are reporting in so intermittently that Bermuda is switching to the more timely reports coming from ir-obstacle-sensor and epl-living-room, since they keep reporting readings when the office and bathroom proxies are not.
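The manual check described above can be sketched in a few lines. This is only an illustration of the reasoning, not part of Bermuda: the function name `report_health` and the 3-second threshold are assumptions (the threshold is borrowed from the "around 1 second, or around 3 seconds for Shelly devices" guideline), and the living room values are rounded from the diagnostics quoted earlier.

```python
from statistics import median


def report_health(intervals, healthy_max_s=3.0):
    """Classify a proxy by the median gap (in seconds) between its reports."""
    if not intervals:
        return "no data"
    return "healthy" if median(intervals) <= healthy_max_s else "unhealthy"


# hist_interval values quoted from the diagnostics in this thread.
office = [33.28667011899999, 47.205457025, 68.156908738]
living_room = [0.913, 1.341, 0.172, 0.932, 1.124, 0.897, 1.687, 0.901, 0.938, 0.926]

office_status = report_health(office)
living_status = report_health(living_room)

print("office:", office_status)
print("epl-living-room:", living_status)
```

The median is used rather than the mean so that a single long gap does not condemn an otherwise regular proxy.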

A few things with your atom configs:

  • the arduino platform is definitely not recommended, apparently the esp-idf platform works a lot better for ble stuff.
  • My personal current preference for interval and window are 320ms and 300ms (or 290ms). I originally liked the 1-second values but I think it leads to the device having too many adverts waiting to send and running out of memory, possibly. And the 320/300 timing seems to capture most adverts OK.
  • I really like setting baud_rate: 0 in the logger, so that it doesn't try to do serial logging, only system logging.
  • captive_portal might cause extra memory usage, as I think it might pull in the web component.
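To compare the scan settings mentioned in this thread, it can help to look at the share of each interval that the radio actually spends listening (window divided by interval). A small sketch, where the helper name `scan_duty_cycle` is made up for illustration and the default values are taken from the comments in the YAML above:

```python
def scan_duty_cycle(interval_ms: float, window_ms: float) -> float:
    """Return the fraction of each scan interval spent listening."""
    if window_ms > interval_ms:
        raise ValueError("window cannot be longer than interval")
    return window_ms / interval_ms


settings = {
    "defaults per the YAML comments (320ms / 30ms)": (320, 30),
    "original config (1000ms / 900ms)": (1000, 900),
    "suggested (320ms / 300ms)": (320, 300),
}

for label, (interval, window) in settings.items():
    print(f"{label}: listening {scan_duty_cycle(interval, window):.0%} of the time")
```

The original and suggested settings listen for a similar share of the time; the difference is how long each stretch of listening lasts before the radio moves on.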

I'd suggest taking out the bluetooth stuff, and altering it to just pull in this package which does pretty much the same stuff, and also makes some other changes like some SDK flags and an automation to disable BLE scanning until the proxy has established its connection to HA:

packages:
  Bermuda.c3: github://agittins/bermuda-proxies/packages/bermuda-proxy-c3.yaml

You can view the config it's pulling in here if you'd rather copy them in directly: https://github.com/agittins/bermuda-proxies/blob/main/packages/bermuda-proxy-c3.yaml

I'll be pushing more configs to that repo soon for other boards as well, since I think this is a common issue.

Do you want to try updating your office (and ultimately bathroom) proxies with that and seeing if it improves things? If you do another diagnostics after that I can take a look and verify if the intervals are improved. Once we have those locked in you should find the area sensors a lot more stable, but we can see how it goes from there and keep digging if it's still not right.

I just checked the stats for K iPhone, and it looks similar:

  • spotty 3.6m from office
  • reliable 5m from kitchen
  • reliable 11m to living room

So again it's closest to office, but because the office proxy is working poorly, it will bounce to kitchen most of the time.
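The bouncing described here can be illustrated with a toy model. This is a simplified sketch and not Bermuda's actual area logic: the `Reading` class, the `pick_area` function, the 10-second freshness cutoff and the report ages are all assumptions made up for the example. Only the distances come from the K iPhone stats above.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    scanner: str
    distance_m: float
    age_s: float  # seconds since this scanner last reported the device


def pick_area(readings, max_age_s=10.0):
    """Pick the nearest scanner, ignoring scanners whose last report is stale."""
    fresh = [r for r in readings if r.age_s <= max_age_s]
    if not fresh:
        return None
    return min(fresh, key=lambda r: r.distance_m).scanner


readings = [
    Reading("office", 3.6, age_s=35.0),      # closest, but reporting rarely
    Reading("kitchen", 5.0, age_s=0.9),      # further away, reporting every second
    Reading("living room", 11.0, age_s=1.0),
]

winner = pick_area(readings)
print("selected area:", winner)
```

With these assumed ages the kitchen wins even though the office proxy is closer; if the office proxy reported as often as the others, it would be selected instead.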

Hopefully the firmware changes to office and bathroom will improve things a lot!

@hajar97
Author

hajar97 commented Oct 21, 2024 via email

@agittins
Owner

> Wow. Really appreciate such a detailed analysis. This helps hugely.

No worries! There are so many moving parts and so little visibility into what's going on that I just accept that I'll have to build tools to help people debug it, and until then... debug it myself! 😅

> There is 1 thing I cannot understand. Both Bathroom 2 and Office are exactly the same M5 Atom S3 Lite with exactly the same yaml configuration. The only difference is that Bathroom 2 was located quite a bit further away from Moth than Office. How can it be that office is reporting so rarely, while Bathroom 2 more frequently? Could it be due to USB port they are plugged in that somehow yields too little power?

So the Office and the Bathroom proxies both look equally unhealthy, it's the Living room that looks good, did you mean the living room one?

If you mean that living room and office have the same config, I can only think of two things off the top of my head:

  • Variations in hardware. These are (relatively) cheap units, and it's likely that minor differences exist between different boards even from the same production run. These may usually be invisible (they sort of have to be, for a digital processor) but perhaps when at the edge of their performance capabilities the "bad copies" drop their bundle in sudden ways.
  • Difference in environment, such as power supply (as you already surmised), or RF environment. It might be that the psu on the office one might not deliver as clean a voltage, perhaps putting noise on the voltage rail that causes instability, or perhaps the living room one is under less load because it has fewer BLE devices within its hearing range, so only has a few advertisements to handle per second, while the office one might be getting so many adverts per second that it keeps dropping the whole bundle. This can be especially problematic during start-up, if the unit is too busy with BLE to sort out a solid wifi and api connection. But I'm assuming, it's really hard to say.

You could try swapping the two units temporarily, swapping their power supplies etc, and seeing if the problem moves with something (or stays behind with something else).

I would lean toward it being on the edge of the performance these chips can manage, and for whatever reason the office one is tripping over that edge and the living room one, for now, isn't. Not a very satisfying answer, I know!

I'd definitely make the firmware changes though, and see what difference it makes to them both.

Oh... have you had the office and bathroom units for very long? There was a change made to the flash layout in esphome 2022.12, and only a serial flash via usb can apply that change (OTA updates just left the flash in the old format, which I think leaves less space for BLE-relevant things, as I understand it). So if you haven't done a usb serial flash on the unit since Dec 2022, definitely give that a go, too.

Ah, one other possibility - do you have any bluetooth integrations that might be making outbound connections (thermometers, window sensors etc)? If so, it's possible that the office or bathroom proxies might be getting tangled up doing outbound proxy connections to devices, stopping them from reliably reporting advertisements.

@hajar97
Author

hajar97 commented Oct 21, 2024 via email

@hajar97
Author

hajar97 commented Oct 21, 2024 via email

@agittins
Owner

> How long should I leave it running before sending you the next batch of diagnostics to check?

Just three minutes should be plenty of time for things to settle and have a good history to show (longer is fine too, of course).

> Living Room and Kitchen are both Everything Presence Lite sensors, so are probably more powerful ESP32 devices altogether which explains their more regular signal.

Yes, looks like he's using normal ESP32's for those rather than C3's. But, interestingly, no fancy firmware settings.

> any other ideas or suggestions

Taking a look at the hist_interval sets after your firmware changes, and possibly just a copy of the yaml for completeness, should be enough to see where we're at now 👍🏼 (I'm probably heading off to sleep pretty soon though, so expect some lag on the next round!)
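For anyone who wants to inspect the hist_interval sets themselves, a small script can pull every such list out of a diagnostics file without needing to know the file's layout. This is a sketch only: the function name `find_hist_intervals` is made up, and the `sample` structure below is a hypothetical stand-in, not the real layout of a Bermuda diagnostics download.

```python
import json


def find_hist_intervals(node, path=()):
    """Yield (path, values) for every 'hist_interval' list found anywhere in the data."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "hist_interval" and isinstance(value, list):
                yield path, value
            else:
                yield from find_hist_intervals(value, path + (str(key),))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_hist_intervals(item, path + (str(index),))


# Hypothetical, heavily trimmed stand-in for a downloaded diagnostics file.
sample = json.loads(
    '{"devices": {"moth": {"scanners": {"office-atom": '
    '{"hist_interval": [1.127, 0.207, 1.127]}}}}}'
)

for path, values in find_hist_intervals(sample):
    print("/".join(path), "-> longest gap:", max(values), "s")
```

To run it against a real download, replace `sample` with the result of `json.load()` on the opened diagnostics file.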

@hajar97
Author

hajar97 commented Oct 21, 2024

Ok, so attached is the new diagnostics file. The issue is the same. My phone is right next to the Office proxy, but in HA my location keeps jumping between Office, Bathroom 2 and Kids Room all the time non-stop.

config_entry-bermuda-01JAK4SM6B21MSEDGAAYAAEY6Q (1).json

Here is the modified YAML config file based on your recommendations. Please note that I was unable to use esp-idf: when I tried it, my proxy went into a permanent reboot loop with the error message that I shared earlier.

esphome:
  name: office-atom
  friendly_name: Office Atom

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: arduino

# Enable logging
logger:
  baud_rate: 0

# Enable Home Assistant API
api:
  encryption:
    key: "..."
  # Only enable BLE tracking when wifi is up and api is connected
  # Gives single-core ESP32-C3 devices time to manage wifi and authenticate with api
  on_client_connected:
     - esp32_ble_tracker.start_scan:
        continuous: true
  # Disable BLE tracking when there are no api connections live
  on_client_disconnected:
    if:
      condition:
        not:
          api.connected:
      then:
        - esp32_ble_tracker.stop_scan:

ota:
  - platform: esphome
    password: "..."

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  use_address: 192.x.x.x

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Office-Atom Fallback Hotspot"
    password: "..."

# captive_portal:

esp32_ble_tracker:
  scan_parameters:
    # Don't auto start BLE scanning, we control it in the `api` block's automation.
    continuous: False
    
    active: True  # send scan-request packets to gather more info, like device name for some devices.

    interval: 320ms  # default 320ms - how long to spend on each advert channel
    window:   300ms  # default 30ms - how long to actually "listen" in each interval. Reduce this if device is unstable.
    # If the device cannot keep up or becomes unstable, reduce the "window" setting. This may be
    # required if your device is controlling other sensors or doing PWM for lights etc.

bluetooth_proxy:
  active: True  # allows outbound connections from HA to devices.

@jsheheane

jsheheane commented Oct 21, 2024 via email

@hajar97
Author

hajar97 commented Oct 21, 2024 via email

@hajar97
Author

hajar97 commented Oct 23, 2024 via email

@agittins
Owner

Howdy, just taking a look now, sorry.

Looking at Moth again...

  • office-atom looks really good, intervals from 0.2 to 1.127 seconds - very consistent!
"hist_interval": [
              1.1270099860048504,
              0.20700183499866398,
              1.1270099870016566,
              0.9220081699968432,
              1.1270099870016566,
              1.12500996900053,
              0.9220081710009254,
              0.9220081709936494,
              0.9200081540038809,
              1.1260099799983436
            ],
  • kids room atom looks to be out of range (for about 40s)
  • epl-living-room looks great, 1s +/- 0.3s, even at 5-7m away.
  • ir-obstacle-sensor is a bit more variable, from 0.7s to 3.38s - but at 5-7m away that's pretty reasonable.
  • bedroom-atom is out of range (for about 30s)
  • bathroom-2-atom is out of range (for about 40s)

Looking at Dani-iPad:

  • ir-obstacle-sensor looks v good, 1-2s intervals, even at 11m!
  • office-atom is pretty good, mostly 2s, at 3m
  • epl-living-room is solid at 1s intervals, 5m
  • kids-room-atom is out of range for ~ 40s
  • bathroom-2-atom is out of range for 30s, but intervals look a bit variable before that.
  • bedroom-atom is out of range.

So across those two devices:

  • office-atom working well
  • epl-living-room working well
  • ir-obstacle-sensor working well

While kids-room, bedroom and bathroom-2-atom were all basically out-of-range (or failing to report).

If those stats match how things actually were (i.e., bathroom-2 was out of range at that time, or maybe rebooting), then it looks like things are OK as far as the bluetooth backend goes, at least for the living-room, office and terrace proxies, anyway.

After re-reading your notes on that last diag:

> My phone is right next to the Office proxy, but in HA my location keeps jumping between Office, Bathroom 2 and Kids Room all the time non-stop.

When I mention "being out of range" above I am assuming that based on them not reporting a signal for 30s or more. But if you are getting flips every 20 to 40 seconds or so, maybe that's what's doing it, and the problem is that the bedroom and bath proxies are doing well at receiving signals, but failing to stay up and report them. This might mean they are failing their ble stack internally or something.

Hmmm... can I ask you to:

  • create 60 seconds of esphome debug log (config, devices and services, esphome, enable debug logging, wait 30s, disable debug logging - then send me the result)
  • immediately after that, do another diagnostics from bermuda

Note that the debug logging will have IP addresses and full mac addresses in it, I'd suggest either emailing it to me [email protected] or uploading it to my nextcloud drop box https://cloud.ajg.net.au/index.php/s/JpeXDnZQGeXqqHB

I think it's worth trying to get esp-idf working, it really should be possible, but I have seen other people having similar errors when googling it.

> Turns out esp-idf is not really working for M5 Atom Lite. I was getting the device constantly rebooting and this error in the log:

[13:40:48]Saved PC:0x400454d5
[13:40:48]SPIWP:0xee
[13:40:48]mode:QIO, clock div:1
[13:40:48]load:0x3fce3808,len:0x16c4
[13:40:48]ets_loader.c 78
[13:40:49]ESP-ROM:esp32s3-20210327
[13:40:49]Build:Mar 27 2021
[13:40:49]rst:0x7 (TG0WDT_SYS_RST),boot:0x28 (SPI_FAST_FLASH_BOOT)
[13:40:49]Saved PC:0x400454d5

You could try:

esp32:
  board: m5stack-atoms3
  variant: esp32s3
  framework:
    type: esp-idf

But I think it's the same as the generic devkit board spec you already tried. It is probably worth trying again, but first doing a "clean build files" in esphome, and flashing it via USB instead of OTA, in case the partitioning needs to be altered - which might (maybe?) have caused the boot loop you were getting.

@agittins
Owner

> I tried swapping Office and Bathroom 2 devices as you suggested because from first analysis of diagnostics it seemed that Bathroom 2 is sending data more regularly than Office, but that didn’t seem to have any effect. I still get my phone circulating evenly between Office, Bathroom 2 and Kids Room

For the swapping thing, I'd need a diagnostics for each "set-up". So swap the office and bath proxies, have the phone in the office (next to the bath proxy) for a minute, then grab a diagnostics (and note what the conditions were - which psu on which proxy in which room, with which device).

Another thing you can try which will be a lot more enjoyable and might help visualise the issue, is to enable the extra sensors for your phone named "distance to ...", "unfiltered distance to..." and "nearest scanner". Then you can go to the "history" view in HA, and add your phone (click "+ choose Device"). Set the "from" time to the most recent 5 minutes. This will give you a reasonably "realtime" comparative view of things. Note that the newly-enabled sensors only start gathering data after you enable them, so you might need to wait a bit (like, a minute).

Here's what my watch looks like:
image

I have two proxies in my "studio", one is about 50cm from my wrist, the other about 2m. Even though they are both quite close, you can see that it hasn't flipped the "nearest scanner" sensor (they certainly do occasionally, but given the noise in the unfiltered signal it's surprising how stable it is). You can see the unfiltered distances bounce around a fair bit, and the filtered distance smooths along the "bottom" of the unfiltered curve.
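One simple way to get a curve that hugs the "bottom" of a noisy signal is a rolling minimum. The sketch below only illustrates that idea and is not Bermuda's actual filter; the function name `rolling_min`, the window size and the sample values are all made up.

```python
def rolling_min(samples, window=5):
    """For each sample, return the smallest value among the last `window` samples."""
    return [
        min(samples[max(0, i - window + 1): i + 1])
        for i in range(len(samples))
    ]


# Made-up unfiltered distances in metres, with occasional spikes.
unfiltered = [0.5, 0.9, 0.6, 1.8, 0.5, 0.7, 2.4, 0.6]
filtered = rolling_min(unfiltered, window=3)

print("unfiltered:", unfiltered)
print("filtered:  ", filtered)
```

The rationale for following the minimum is that obstructions and interference tend to weaken a signal rather than strengthen it, so the shortest recent estimate is usually the most believable one.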

I'm guessing we'll see long gaps in the problematic proxies with occasional, very "short" distances reported from them. But it will be interesting to see at both a zoomed-in (sub-5-minute) and a wider (1hr) view.

@agittins
Owner

Oh, and just found the DIO vs QIO thing (at a post about C3 but probably worth trying):

esphome:
  # ...
  platformio_options:
    board_build.flash_mode: dio

Might be worth a shot.

@hajar97
Author

hajar97 commented Oct 24, 2024

Hi there, thanks a lot for getting back to me again. Here is the latest update from me:

  1. There has been a new ESPHome version and I tried to compile with esp-idf (instead of arduino) again. For whatever reason this now seems to be working without reboots. At least I don't notice them. I wonder what you think this means for the Office, Kids Room and Bathroom 2 proxies.

  2. Unfortunately for me there is no change. My phone is located next to Office proxy, yet Area field in HA keeps jumping between Office, Kids Room and occasionally Bathroom 2. See the screenshots of just 5 mins of my phone sitting 20cm away from Office proxy:

image

  3. I have attached latest diagnostics file. I am not sure how far the data goes back, but I would suggest you really only look at the last 10-15 mins, this is when I was doing the check for which I sent the above screenshot.

config_entry-bermuda-01JAK4SM6B21MSEDGAAYAAEY6Q (2).json

  4. It is getting late here now. I will try to collect those ESPHome debug details you asked for tomorrow. So far I have had no luck getting the presence detection to work, unfortunately. Things are not stable at all, no matter which part of the house I go to. The reported area keeps cycling through multiple locations non-stop. I think we must be missing something really obvious here, given that most people get it working quite stably without any additional configuration, while I have tried so many things and it is anything but stable.

@hajar97
Author

hajar97 commented Oct 24, 2024

Hm, not sure if this means anything important, but when I look at the log of each proxy, I get different info despite using exactly the same kind of device and exactly the same YAML file.

Office Proxy:

image

Kids Room Proxy:

image

Bathroom 2 Proxy:

image

Note how Bathroom 2 has Hardware UART different from Office and Kids Room.
Note how Office has a bunch of additional Bluetooth and BLE configurations which are missing for Bathroom 2 and Kids Room.

Any clues with that perhaps?

@hajar97
Author

hajar97 commented Oct 24, 2024

Hm, turns out if I go to Log for Kids room proxy a few times, eventually I get presented with a view similar to Office proxy, which includes those additional BLE configurations:

image

Maybe it is just the way Log information is displayed in ESPHome, but maybe it is an indicator of something being wrong...

@hajar97
Author

hajar97 commented Oct 29, 2024 via email
