Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] "Test Result: FAIL!" is missing when firmware path is not found from journalctl #979

Open
miRoox opened this issue Nov 9, 2022 · 2 comments
Labels
area:logs Log and results collection, storage, etc. P2 Critical bugs or normal features type:bug Something doesn't work as expected

Comments

@miRoox
Copy link
Contributor

miRoox commented Nov 9, 2022

Describe the bug
When firmware path is not found from journalctl on an IPC4 test, it will immediately exit without report "Test Result: FAIL!". So our CI will treat it as TIMEOUT.

To Reproduce
It's hard to reproduce, but we can mock the case like #978.

Expected behavior
Report "Test Result: FAIL!" at the end of the log.

Detail Info

  1. Branch name and commit hash of the 3 repositories: sof (firmware/topology), linux (kernel driver) and sof-test (test case)
  2. Test report ID (if you find it from test report)
    • ID: 17476
  3. Test DUT Model (or a brief discribtion about the device)
    • MODEL: TGLU_RVP_NOCODEC_IPC4ZPH

Screenshots or console output

First occurrence:

2022-11-08 22:01:29 UTC N ===== Testing: (Round: 50/50) (PCM: DMIC [hw:0,10]) (Loop: 1/1) =====
2022-11-08 22:01:29 UTC N no file prefix, use /dev/null as dummy capture output
2022-11-08 22:01:29 UTC N arecord   -Dhw:0,10 -r 48000 -c 4 -f S32_LE -d 1 /dev/null -v -q
Hardware PCM card 0 'sof-nocodec' device 10 subdevice 0
Its setup is:
  stream       : CAPTURE
  access       : RW_INTERLEAVED
  format       : S32_LE
  subformat    : STD
  channels     : 4
  rate         : 48000
  exact rate   : 48000 (48000/1)
  msbits       : 32
  buffer_size  : 4096
  period_size  : 1024
  period_time  : 21333
  tstamp_mode  : NONE
  tstamp_type  : MONOTONIC
  period_step  : 1
  avail_min    : 1024
  period_event : 0
  start_threshold  : 1
  stop_threshold   : 4096
  silence_threshold: 0
  silence_size : 0
  boundary     : 4611686018427387904
  appl_ptr     : 0
  hw_ptr       : 0
declare -- cmd="journalctl_cmd --since=@1667944672"
2022-11-08 22:01:31 UTC [REMOTE_ERROR] firmware path not found from journalctl, no firmware loaded or debug option disabled?

Second occurrence (after TIMEOUT):

ln: failed to create hard link '/tmp/sof-test-card0.lock': File exists
2022-11-08 22:08:09 UTC [REMOTE_ERROR] ktime=2377 sof-test PID=12284: /tmp/sof-test-card0.lock already taken by PID 9180! Stealing it...
2022-11-08 22:08:09 UTC N ktime=2377 sof-test PID=12284: starting
2022-11-08 22:08:09 UTC [REMOTE_ERROR] firmware path not found from journalctl, no firmware loaded or debug option disabled?
2022-11-08 22:08:09 UTC [REMOTE_ERROR] Starting func_exit_handler(), exit status=1, FUNCNAME stack:
2022-11-08 22:08:09 UTC [REMOTE_ERROR]  die()  @  /home/ubuntu/sof-test/test-case/../case-lib/lib.sh
2022-11-08 22:08:09 UTC [REMOTE_ERROR]  is_firmware_file_zephyr()  @  /home/ubuntu/sof-test/test-case/../case-lib/lib.sh:697
2022-11-08 22:08:09 UTC [REMOTE_ERROR]  logger_disabled()  @  /home/ubuntu/sof-test/test-case/../case-lib/lib.sh:746
2022-11-08 22:08:09 UTC [REMOTE_ERROR]  main()  @  /home/ubuntu/sof-test/test-case/check-playback.sh:57
2022-11-08 22:08:10 UTC [REMOTE_ERROR] firmware path not found from journalctl, no firmware loaded or debug option disabled?
@marc-hb
Copy link
Collaborator

marc-hb commented Nov 10, 2022

@miRoox

As you said in #977, func_exit_handler will not be called again when exit inside it.

Originally posted in #973 (comment)

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 11, 2022

Thanks @miRoox for the complex analysis!

First I want to make clear this situation should almost never happen. In #973 we made the decision to scan the logs to find the firmware file name so now we must make absolutely sure the firmware file name can be found in the logs by any means necessary. If it's not then maybe a TIMEOUT is not so bad - because it should almost never happen.

Now if we really wanted to fix this then maybe something like the UNTESTED patch below would help. But I think I'd prefer to wait and completely re-organize logging in sof-test because at the moment it's a big and complicated mess and the patch below would not make things simpler.

diff --git a/case-lib/hijack.sh b/case-lib/hijack.sh
index c1728b28c368..643d670f6cd8 100644
--- a/case-lib/hijack.sh
+++ b/case-lib/hijack.sh
@@ -101,7 +101,7 @@ function func_exit_handler()
 
     fi
 
-    if is_ipc4 && is_firmware_file_zephyr; then
+    if ! logger_disabled && is_ipc4 && is_firmware_file_zephyr; then
        local mtraceBin; mtraceBin=mtrace-reader.py
         dlogi "pkill -TERM -f $mtraceBin"
         sudo pkill -TERM -f "$mtraceBin" || {
diff --git a/case-lib/lib.sh b/case-lib/lib.sh
index 2f6444253a81..4993e44310db 100644
--- a/case-lib/lib.sh
+++ b/case-lib/lib.sh
@@ -731,6 +731,22 @@ is_ipc4()
 
 logger_disabled()
 {
+    # Cache the result for performance and to solve the "recursivity"
+    # issue #979 where the func_exit_handler() is calling one more time
+    # a function that just died.
+    if test -n "${_sof_logging}"; then
+        if $_sof_logging; then
+            # horrible double-negation. This function should have been
+            # "logger_enabled()" but too late now.
+            return 1
+        else # 0 = true = logging is disabled
+            return 0
+        fi
+    fi
+
+    # Default to false in case we die here
+    _sof_logging=false
+
     local ldcFile
     # Some firmware/OS configurations do not support logging.
     ldcFile=$(find_ldc_file) || {
@@ -738,7 +754,7 @@ logger_disabled()
         return 0 # 0 is 'true'
     }
 
-    # Disable logging when available...
+    # Disable logging when the -s option is available
     if [ ${OPT_VAL['s']} -eq 0 ]; then
         return 0
     fi
@@ -757,7 +773,8 @@ logger_disabled()
         return 0
     fi
 
-    return 1
+    _sof_logging=true
+    return 1 # 1 = false = not disabled
 }
 
 print_module_params()

@fredoh9 fredoh9 added the P2 Critical bugs or normal features label Mar 13, 2023
@marc-hb marc-hb added type:bug Something doesn't work as expected and removed bug labels May 17, 2023
@marc-hb marc-hb added the area:logs Log and results collection, storage, etc. label Jun 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logs Log and results collection, storage, etc. P2 Critical bugs or normal features type:bug Something doesn't work as expected
Projects
None yet
Development

No branches or pull requests

4 participants