Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set_sync_source() fails on N310 #449

Closed
andrepuschmann opened this issue Jun 8, 2021 · 11 comments
Closed

set_sync_source() fails on N310 #449

andrepuschmann opened this issue Jun 8, 2021 · 11 comments
Labels
Fixed Internally This issue has been fixed but needs to clear our internal release pipeline.

Comments

@andrepuschmann
Copy link
Contributor

Issue Description

The issue comes up when trying to bring up a N310 two-channel streamer and setting the sync source. For some reason, the set_sync_source() call doesn't work with the N310. The same code works well with the N310 and UHD 3.15 LTS or with UHD4 on the X310.

srs@edge:~/srsRAN_public/build_release$ ./srsue/src/srsue ~/ue_nsa.conf
Reading configuration file /home/srs/ue_nsa.conf...
WARNING: cpu0 scaling governor is not set to performance mode. Realtime processing could be compromised. Consider setting it to performance mode before running the application.

Built in Release mode using commit 89f16eed2 on branch master.

Opening 2 channels in RF device=default with args=type=n3xx,clock=external,sampling_rate=15.36e6,lo_freq_offset_hz=15.36e6
[INFO] [UHD] linux; GNU C++ version 7.5.0; Boost_106501; UHD_4.0.0.0-231-gb54fb54b
[INFO] [LOGGING] Fastpath logging disabled at runtime.
Opening USRP channels=2, args: type=n3xx,lo_freq_offset_hz=15.36e6,master_clock_rate=122.88e6
[INFO] [UHD RF] RF UHD Generic instance constructed
[INFO] [MPMD] Initializing 1 device(s) in parallel with args: mgmt_addr=192.168.10.189,type=n3xx,product=n310,serial=3195684,claimed=False,addr=192.168.10.189,master_clock_rate=122.88e6
[WARNING] [MPM.RPCServer] A timeout event occured!
[INFO] [MPM.PeriphManager] init() called with device args `master_clock_rate=122.88e6,mgmt_addr=192.168.10.189,product=n310,clock_source=internal,time_source=internal'.
[WARNING] [RFNOC::GRAPH] One or more blocks timed out during flush!
[INFO] [MPM.Magnesium-0] Re-initializing daughter board. This may take some time.
/home/srs/srsRAN_public/lib/src/phy/rf/rf_uhd_imp.cc.822: USRP reported the following error: RuntimeError: Error during RPC call to `set_sync_source'. Error message: At least one PLL did not lock! Check the logs for details.
[ERROR] [RPC] At least one PLL did not lock! Check the logs for details.
/home/srs/srsRAN_public/lib/src/phy/rf/rf_uhd_imp.cc.946: uhd_init failed, freeing...
[ERROR] [MPM.RPCServer] Uncaught exception in method set_sync_source :At least one PLL did not lock! Check the logs for details.
 Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/usrp_mpm/rpc_server.py", line 186, in new_claimed_function
    return function(*args)
  File "/usr/lib/python3.7/site-packages/usrp_mpm/periph_manager/n3xx.py", line 696, in set_sync_source
    skip_rfic=args.get('skip_rfic', None)
  File "/usr/lib/python3.7/site-packages/usrp_mpm/dboard_manager/magnesium.py", line 400, in update_ref_clock_freq
    self._reinit(self.master_clock_rate)
  File "/usr/lib/python3.7/site-packages/usrp_mpm/dboard_manager/magnesium.py", line 359, in _reinit
    self.init(args)
  File "/usr/lib/python3.7/site-packages/usrp_mpm/dboard_manager/magnesium.py", line 295, in init
    args, self._init_args, fast_reinit)
  File "/usr/lib/python3.7/site-packages/usrp_mpm/dboard_manager/mg_init.py", line 615, in init
    args
  File "/usr/lib/python3.7/site-packages/usrp_mpm/dboard_manager/mg_init.py", line 547, in _full_init
    self.PHASE_DAC_SPI_ADDR,
  File "/usr/lib/python3.7/site-packages/usrp_mpm/dboard_manager/mg_init.py", line 181, in _init_lmk
    self.log
  File "/usr/lib/python3.7/site-packages/usrp_mpm/dboard_manager/lmk_mg.py", line 52, in __init__
    self.config()
  File "/usr/lib/python3.7/site-packages/usrp_mpm/dboard_manager/lmk_mg.py", line 218, in config
    raise RuntimeError("At least one PLL did not lock! Check the logs for details.")
RuntimeError: At least one PLL did not lock! Check the logs for details.

/home/srs/srsRAN_public/lib/src/phy/rf/rf_imp.c.129: No compatible RF frontend found
Error initializing radio.

Setup Details

  • N310, latest UHD 4 rootfs and FPGA
  • UHD4 built from source UHD_4.0.0.0-231-gb54fb54b
  • Ubuntu 18.04.5 LTS

Expected Behavior

UHD (3.15 and 4) to be able to set sync source reliably on all supported HW.

Actual Behaviour

The call fails with UHD4 and the N310. It works on the N310 with UHD 3.15 and the X310, even with UHD4.

Steps to reproduce the problem

The above output is with srsUE public version (https://github.com/srsran/srsRAN). Normal build from source with following UE config file.
ue_nsa.conf.zip

Additional Information

I've asked the same question a while ago on the usrp-users ML without any response.

@andrepuschmann
Copy link
Contributor Author

Quick update. It actually turned out that by commenting the call to set_sync_source(), the UHD init goes can pass and it seems to start up.

I've used this trivial patch:

diff --git a/lib/src/phy/rf/rf_uhd_imp.cc b/lib/src/phy/rf/rf_uhd_imp.cc
index e2da0e144..6ad25ced8 100644
--- a/lib/src/phy/rf/rf_uhd_imp.cc
+++ b/lib/src/phy/rf/rf_uhd_imp.cc
@@ -817,11 +817,13 @@ static int uhd_init(rf_uhd_handler_t* handler, char* args, uint32_t nof_channels
   bool        is_locked = false;
   std::string sensor_name;

+#if 0
   // Set sync source
   if (handler->uhd->set_sync_source(sync_src, clock_src) != UHD_ERROR_NONE) {
     print_usrp_error(handler);
     return SRSRAN_ERROR;
   }
+#endif

   if (clock_src == "gpsdo") {
     set_time_to_gps_time(handler);

But then the receive hangs. The issue can actually be reproduced with the much "simpler" benchmark_radio example that just uses the srsRAN RF library to receive and transmit samples. There is no DSP happening beyond that, similar to UHDs benchmark_rate.

The call just hangs:

srs@edge:~/srsRAN_public/build_release$ ./lib/src/radio/test/benchmark_radio
Instantiating objects and allocating memory...
mlockall: Cannot allocate memory
Initialising instances...
Opening 1 channels in RF device= with args=default
[INFO] [UHD] linux; GNU C++ version 7.5.0; Boost_106501; UHD_4.0.0.0-231-gb54fb54b
[INFO] [LOGGING] Fastpath logging disabled at runtime.
Opening USRP channels=1, args: type=n3xx,master_clock_rate=122.88e6
[INFO] [UHD RF] RF UHD Generic instance constructed
[INFO] [MPMD] Initializing 1 device(s) in parallel with args: mgmt_addr=192.168.10.189,type=n3xx,product=n310,serial=3195684,claimed=False,addr=192.168.10.189,master_clock_rate=122.88e6
[WARNING] [MPM.RPCServer] A timeout event occured!
[INFO] [MPM.PeriphManager] init() called with device args `master_clock_rate=122.88e6,mgmt_addr=192.168.10.189,product=n310,clock_source=internal,time_source=internal'.
[WARNING] [0/Radio#0] Channel 0: Coercing TX gain from -1 dB to 0

Warning: TX gain was not set. Using open-loop power control (not working properly)

Start capturing 10 sub-frames of 1920 samples (approx. 0s) ...

@andrepuschmann
Copy link
Contributor Author

Hey, just checking here if anyone else has seen this issue or knows something about it? Is it worth checking the UHD 4.1?

@michaelld
Copy link
Collaborator

@andrepuschmann It's worth checking UHD 4.1 release ... certainly! If this is a UHD issue, we want to track it down and deal with it, and that means fixing it in 4.1 & if it exists in prior UHD then backporting the fix as reasonably possible to 4.0 and 3.15.

Question: Can you use any of the UHD-provided examples or utilities to show this issue? That's the ideal way to do so; using a significant 3rd party UHD-based application -- no matter how robust or well tested -- can introduce all sorts of strange behaviors without much ado. The goal here is to help track down the most likely place where the issue is being created: the 3rd party UHD-based application or UHD itself. If you can demonstrate the issue using UHD itself, then we know the answer and the issue is likely to be addressed by R&D ... otherwise ... well, who knows how the issue might proceed or not ... I'm sure you understand!

@andrepuschmann
Copy link
Contributor Author

Thanks for the feedback @michaelld

Unless I am doing something completely wrong, it's very easy to reproduce with any UHD example I believe. See below, I am taking the first line from the srsRAN code base and the seconds is just copy paste from the UHD manual here

diff --git a/host/examples/benchmark_rate.cpp b/host/examples/benchmark_rate.cpp
index d975c4c0c..1aa701112 100644
--- a/host/examples/benchmark_rate.cpp
+++ b/host/examples/benchmark_rate.cpp
@@ -540,6 +540,10 @@ int UHD_SAFE_MAIN(int argc, char* argv[])
             std::cout << boost::format("Setting RX spp to %u\n") % rx_spp;
             usrp->set_rx_spp(rx_spp);
         }
+       // set sync source (neither of them works)
+       usrp->set_sync_source("internal","internal");
+       usrp->set_sync_source(uhd::device_addr_t("clock_source=external,time_source=external"));
+
         // create a receive streamer
         uhd::stream_args_t stream_args(rx_cpu, rx_otw);
         stream_args.channels             = rx_channel_nums;

Both give:

d$ ./examples/benchmark_rate --rx_rate=15.36e6 --args "type=n3xx"

[INFO] [UHD] linux; GNU C++ version 9.3.0; Boost_107100; UHD_4.0.0.0-154-gb061af4f
[00:00:00.000429] Creating the usrp device with: type=n3xx...
[INFO] [MPMD] Initializing 1 device(s) in parallel with args: mgmt_addr=192.168.10.2,type=n3xx,product=n310,serial=317F537,claimed=False,addr=192.168.10.2
[INFO] [MPM.PeriphManager] init() called with device args `mgmt_addr=192.168.10.2,product=n310,clock_source=internal,time_source=internal'.
Using Device: Single USRP:
  Device: N300-Series Device
  Mboard 0: n310
  RX Channel: 0
    RX DSP: 0
    RX Dboard: A
    RX Subdev: Magnesium
  RX Channel: 1
    RX DSP: 1
    RX Dboard: A
    RX Subdev: Magnesium
  RX Channel: 2
    RX DSP: 2
    RX Dboard: B
    RX Subdev: Magnesium
  RX Channel: 3
    RX DSP: 3
    RX Dboard: B
    RX Subdev: Magnesium
  TX Channel: 0
    TX DSP: 0
    TX Dboard: A
    TX Subdev: Magnesium
  TX Channel: 1
    TX DSP: 1
    TX Dboard: A
    TX Subdev: Magnesium
  TX Channel: 2
    TX DSP: 2
    TX Dboard: B
    TX Subdev: Magnesium
  TX Channel: 3
    TX DSP: 3
    TX Dboard: B
    TX Subdev: Magnesium

[00:00:05.457587724] Setting device timestamp to 0...
[WARNING] [MULTI_USRP] Could not set RX rate to 15.360 MHz. Actual rate is 15.625 MHz
[WARNING] [MULTI_USRP] Could not set RX rate to 15.360 MHz. Actual rate is 15.625 MHz
[WARNING] [MULTI_USRP] Could not set RX rate to 15.360 MHz. Actual rate is 15.625 MHz
[WARNING] [MULTI_USRP] Could not set RX rate to 15.360 MHz. Actual rate is 15.625 MHz
[ERROR] [RPC] unhashable type: 'dict'
[ERROR] [MPM.RPCServer] Uncaught exception in method set_clock_source :unhashable type: 'dict'
 Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/usrp_mpm/rpc_server.py", line 186, in new_claimed_function
    return function(*args)
  File "/usr/lib/python3.7/site-packages/usrp_mpm/periph_manager/n3xx.py", line 557, in set_clock_source
    if (clock_source, time_source) not in self.valid_sync_sources:
TypeError: unhashable type: 'dict'

Error: RuntimeError: Error during RPC call to `set_clock_source'. Error message: unhashable type: 'dict'

@manderseck
Copy link
Contributor

@andrepuschmann Sorry for the delay. Thanks for providing a way to reproduce this without 3rd party applications. I have applied your changes to the benchmark_rate example and can confirm getting errors with this in UHD 4.1. I still need to investigate this further but will get back as soon as I will have found something.

@manderseck
Copy link
Contributor

manderseck commented Aug 27, 2021

@andrepuschmann Update: As I wrote yesterday I could reproduce the original error RuntimeError: At least one PLL did not lock! Check the logs for details. with the 4.1 release. This however was only the case because I didn't have the external clock connected. Now that I connected both the 10 MHz reference clock and the PPS to the N310 it does exactly what we expect it to do and doesn't return any error or exception with the adapted benchmark_rate example. I was not able to reproduce the Error during RPC call to 'set_clock_source'. Error message: unhashable type: 'dict'.

@manderseck
Copy link
Contributor

manderseck commented Aug 31, 2021

In UHD-4.0 I could also reproduce the "unhashable type" issue which is caused by calling the wrong function - set_clock_source() instead of set_sync_source() - on the USRP side from mpmd_mb_controller.cpp (diff between 4.0 and current):

 void mpmd_mb_controller::set_sync_source(
@@ -123,23 +181,25 @@ void mpmd_mb_controller::set_sync_source(const device_addr_t& sync_source)
       for (const auto& key : sync_source.keys()) {
            sync_source_map[key] = sync_source.get(key);
       }                                                                                                                  
-    _rpc->notify_with_token(
-        MPMD_DEFAULT_LONG_TIMEOUT, "set_clock_source", sync_source_map);
+    _rpc->get_raw_rpc_client()->notify_with_token(
+        MPMD_DEFAULT_LONG_TIMEOUT, "set_sync_source", sync_source_map);
+    if (!_sync_source_updaters.empty()) {
+        for (const auto& updater : _sync_source_updaters) {
+            updater(sync_source);
+        }
+    }
 }

For a quick fix I can only recommend upgrading to 4.1. We are still in discussion if we can backport this fix to 4.0.

@manderseck
Copy link
Contributor

The internal discussion about the backport is still ongoing, but the issue is found and already resolved in 4.1. Therefore I'll close the issue. Please re-open if you think that any aspect of your issue report was not covered properly. Thanks for reporting!

@mbr0wn mbr0wn added the Fixed Internally This issue has been fixed but needs to clear our internal release pipeline. label Sep 9, 2021
@andrepuschmann
Copy link
Contributor Author

Confirmed to work fine with the N310 with UHD 4.1.

@mbr0wn
Copy link
Contributor

mbr0wn commented Sep 16, 2021

Thanks for confirming!

@houshangaziziaccelleran

Hi
I am experiencing similar issue:


ad@bm-super11-intel:/usr/local/lib/uhd/examples$ sudo ./benchmark_rate --rx_rate 125e6 --rx_subdev "A:0 B:0" --rx_channels 0,1 --tx_rate 125e6 --tx_subdev "A:0 B:0" --tx_channels 0,1 --args "mgmt_addr=10.10.0.100,master_clock_rate=125e6"

[INFO] [UHD] linux; GNU C++ version 11.4.0; Boost_107400; DPDK_23.11; UHD_4.7.0.HEAD-0-ga5ed1872
[00:00:00.000065] Creating the usrp device with: mgmt_addr=10.10.0.100,master_clock_rate=125e6...
[INFO] [MPMD] Initializing 1 device(s) in parallel with args: mgmt_addr=10.10.0.100,type=n3xx,product=n310,serial=32000F1,name=ni-n3xx-32000F1,fpga=XG,claimed=False,master_clock_rate=125e6
[WARNING] [MPM.RPCServer] A timeout event occured!
[INFO] [MPM.PeriphManager] init() called with device args `fpga=XG,master_clock_rate=125e6,mgmt_addr=10.10.0.100,name=ni-n3xx-32000F1,product=n310,clock_source=internal,time_source=internal'.
Using Device: Single USRP:
  Device: N300-Series Device
  Mboard 0: n310
  RX Channel: 0
    RX DSP: 0
    RX Dboard: A
    RX Subdev: Magnesium
  RX Channel: 1
    RX DSP: 1
    RX Dboard: B
    RX Subdev: Magnesium
  TX Channel: 0
    TX DSP: 0
    TX Dboard: A
    TX Subdev: Magnesium
  TX Channel: 1
    TX DSP: 1
    TX Dboard: B
    TX Subdev: Magnesium

[00:00:23.92320232] Setting device timestamp to 0...
[INFO] [MULTI_USRP]     1) catch time transition at pps edge
[INFO] [MULTI_USRP]     2) set times next pps (synchronously)
[WARNING] [0/Radio#0] Attempting to set tick rate to 0. Skipping.
[WARNING] [0/Radio#1] Attempting to set tick rate to 0. Skipping.
[00:00:24.532138819] Testing receive rate 125.000000 Msps on 2 channels
[WARNING] [0/Radio#1] Attempting to set tick rate to 0. Skipping.
[WARNING] [0/Radio#0] Attempting to set tick rate to 0. Skipping.
Setting TX spp to 364
[00:00:24.542294075] Testing transmit rate 125.000000 Msps on 2 channels
OOOOOOOOterminate called after throwing an instance of 'uhd::op_timeout'
  what():  RfnocError: OpTimeout: Control operation timed out waiting for ACK
Aborted
ad@bm-super11-intel:/usr/local/lib/uhd/examples$ 

Here are my n310 and my host server version of UHD:

ad@bm-super11-intel:/usr/local/lib/uhd/examples$ uhd_config_info --version
UHD 4.7.0.HEAD-0-ga5ed1872
ad@bm-super11-intel:/usr/local/lib/uhd/examples$ ssh [email protected]
root@ni-n3xx-32000F1:~# uhd_config_info --version
UHD 4.7.0.0-0-ga5ed1872

root@ni-n3xx-32000F1:~#

Any way out of this?
Thanks
Houshang

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Fixed Internally This issue has been fixed but needs to clear our internal release pipeline.
Projects
None yet
Development

No branches or pull requests

5 participants