Some errors (deadlocks/OOMs) do not seem to be reported to Meadow.Cloud #782

Open
duduita opened this issue Sep 29, 2024 · 7 comments

duduita commented Sep 29, 2024

Describe the bug
Some errors (deadlocks/OOMs) from crash/mono_error.txt do not seem to be reported to Meadow.Cloud.

To Reproduce
Steps to reproduce an OOM (out-of-memory error):

  1. Call the AllocateMemory() method during the app initialization:
        static void AllocateMemory()
        {
            List<byte[]> allocations = new List<byte[]>(); // Store the allocated memory
            int iteration = 0;

            // Loop to keep allocating 3MB chunks
            while (true)
            {
                byte[] memoryBlock = new byte[3 * 1024 * 1024]; // Allocate 3MB
                allocations.Add(memoryBlock); // Keep reference to prevent GC from collecting
                iteration++;

                Console.WriteLine($"Iteration: {iteration}, Allocated: {allocations.Count * 3} MB");

                // Sleep a little to simulate some delay between allocations
                Thread.Sleep(100);
            }
        }
  2. After the app restarts, run the meadow file list CLI command to check whether a mono_error.txt file exists on the device, then check Meadow.Cloud for an event containing this error.

Expected behavior
The mono_error.txt should have been created, as well as sent to the Meadow.Cloud.

Meadow (please complete the following information as best as you can):
Board Information
Model: F7Micro
Hardware version: F7CoreComputeV2
Device name: CellBasics

Hardware Information
Processor type: STM32F777IIK6
ID: 3A-00-21-00-0D-50-4B-55-30-38-31-20
Serial number: 20523874554B
Coprocessor type: ESP32
MAC Address -
WiFi: 4C:75:25:D5:78:A0

Firmware Versions
OS: 1.14.0.0
Mono: 1.14.0.0
Coprocessor: 1.14.0.0
Protocol: 7

duduita self-assigned this Sep 29, 2024

NevynUK commented Sep 30, 2024

I am not sure that the code above actually represents an error in the OS. Mono does actually generate an exception which the application can catch.

Doesn't Core take on responsibility for unhandled exceptions if you have the right configuration settings in app.config.yaml?

duduita commented Oct 1, 2024

I'm assuming that it represents an error in the OS, given that induce_reset() was called and the device rebooted, but I could be wrong @NevynUK.

duduita changed the title from "Some errors (deadlocks/OOMs) do not seem to be reported to Meadow.Cloud" to "Some errors (deadlocks/OOMs) do not seem to be generated/reported to Meadow.Cloud" on Oct 1, 2024

NevynUK commented Oct 1, 2024

I think I tried your code in a try/catch block and the device did not reset, it just looped. So I did this:

List<byte[]> allocations = new List<byte[]>();
int iteration = 0;

while (true)
{
    try
    {
        byte[] memoryBlock = new byte[3 * 1024 * 1024];
        allocations.Add(memoryBlock);
        iteration++;

        Console.WriteLine($"Iteration: {iteration}, Allocated: {allocations.Count * 3} MB");

        Thread.Sleep(100);
    }
    catch (Exception e)
    {
        Console.WriteLine($"OutOfMemory failed: {e.Message}");
    }
}

This suggests that the application is still running, and that any reset is due to Core detecting an unhandled exception and rebooting the board. If this is the case, then I would expect Core, not the OS, to detect the unhandled exception and generate an error report file.
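
To make the distinction concrete, here is a minimal sketch that uses only standard .NET APIs. The names OomProbe, TryAllocate and FillUntilFailure are hypothetical helpers, not part of Meadow or Core. It assumes the runtime raises a managed OutOfMemoryException when an allocation fails; if the runtime instead aborts natively, as an "Unrecoverable .NET Runtime error" would suggest, neither the catch block nor the handler would run.

```csharp
using System;
using System.Collections.Generic;

static class OomProbe
{
    // Hypothetical helper (not a Meadow API): returns false instead of throwing
    // when the managed runtime reports that it cannot satisfy the allocation.
    public static bool TryAllocate(int bytes, out byte[] block)
    {
        try
        {
            block = new byte[bytes];
            return true;
        }
        catch (OutOfMemoryException)
        {
            block = Array.Empty<byte>();
            return false;
        }
    }

    // Allocates fixed-size chunks until one fails or maxChunks is reached,
    // and returns how many chunks were obtained.
    public static int FillUntilFailure(int chunkBytes, int maxChunks)
    {
        var held = new List<byte[]>(); // keep references so the GC cannot reclaim them
        for (int i = 0; i < maxChunks; i++)
        {
            if (!TryAllocate(chunkBytes, out var block))
            {
                break;
            }
            held.Add(block);
        }
        return held.Count;
    }

    static void Main()
    {
        // Standard .NET hook: invoked for managed exceptions that nothing caught.
        AppDomain.CurrentDomain.UnhandledException += (sender, e) =>
            Console.WriteLine($"Unhandled: {e.ExceptionObject}");

        // Capped at 4 chunks so the sketch is safe to run on a desktop machine.
        int chunks = FillUntilFailure(3 * 1024 * 1024, 4);
        Console.WriteLine($"Held {chunks} chunk(s) of 3 MB");
    }
}
```

If the catch block is reached, the failure is one the application can handle itself. If only the UnhandledException handler fires, the exception escaped the app and whatever hosts it has to do the reporting. If neither runs before the reset, the failure happened below the managed layer.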

duduita commented Oct 1, 2024

I think that my reproduction sample is not the best one, since the OOMs we usually see happen at the OS level, where we can't use a try/catch. I'll find another way to reproduce this issue, by triggering an OOM using only OS calls, and I'll think more about it. But, at first glance, if we have an OOM at the OS level, the OS should generate an error report file, right?

NevynUK commented Oct 3, 2024

@duduita I have run the code in the sample on a board with the latest build from main and I get the following:

Meadow successfully started MONO
Initializing OS...
Parsing app.config.yaml...
[core] Log level: Information
[core] MeadowApp
[core] Device is configured to use WiFi for the network interface
[core] All cloud features are disabled.
Iteration: 1, Allocated: 3 MB
Iteration: 2, Allocated: 6 MB
Iteration: 3, Allocated: 9 MB
Iteration: 4, Allocated: 12 MB
Iteration: 5, Allocated: 15 MB
Could not allocate 3145752 bytes
Unrecoverable .NET Runtime error. Meadow will restart in 5 seconds

I don't have logging to the cloud turned on, so I'm not sure if this is working.

I can also confirm that the mono_error.txt file is being generated and it contains this (which looks right):

Reading file 'crash/mono_error.txt' from device...

Could not allocate 3145752 bytes

duduita commented Oct 6, 2024

Hmm, I was looking for mono_error.txt, but the right path is crash/mono_error.txt. Now I can find it, thanks @NevynUK.

Anyway, it doesn't seem to be reported to the cloud, so I'll adjust the issue description, and I'll enhance the OOM error message to make it more descriptive.

duduita changed the title from "Some errors (deadlocks/OOMs) do not seem to be generated/reported to Meadow.Cloud" to "Some errors (deadlocks/OOMs) do not seem to be reported to Meadow.Cloud" on Oct 6, 2024

duduita commented Oct 6, 2024

Given that I have the following crash files:

Getting file list from '/meadow0/crash/'...
mono_error.txt
oscrash_report.txt
        2 file(s)

And I'm currently sending metrics to Meadow.Cloud. Do I need to set something else in my app.config.yaml to send those files, or is it something that still needs to be implemented, @ctacke?
