root user and all network connections to the Raspberry "freeze" after POSTing a file to express server while ws281x-animation is running. #87

Open
HerrZatacke opened this issue Jul 9, 2018 · 14 comments

Comments

@HerrZatacke

Hi there,

First of all I have to say I appreciate all the work that is being put into this module.

I have a strange issue with this library in combination with express (the node http server): after POSTing a file to an express server running on the Raspberry, the root user seems to get "locked" or "frozen", and all network connections appear to die until the next restart.

I'll try to give a very detailed description even if some of the following information might be irrelevant.
If there's anything I can do to dig further into this issue, let me know (I think I have good knowledge of the node.js universe, but bindings totally scare me ;-)

My Setup

The error

  • All network connections from/to the Raspberry are dead.
  • All sudo commands (used via a local console) do not work.
  • When pinging I get this (the error message starts ~30s after starting the ping): ping: sendmsg: Kein Hauptspeicher für den Puffer verfügbar (German for "No buffer space available"; see more console output below)
  • The tty remains unusable even after Ctrl+C
  • ps -A still returns output when run as pi (sudo ps -A does not work); see the console output below
  • I don't get any direct error message (there might still be something in the log files; I don't know)
  • After all this, the actual process running the animation (with root privileges) still seems to be running, yet there is no more response on the TTY.
  • Also, the desktop of the Raspberry is still usable for the pi user (mouse moves, keyboard works; that's how I could run ping and ps -A), except for "reboot" or similar root tasks (which forces me to kill the Raspberry by cutting the power)
Detailed output for "ping 8.8.8.8"
pi@raspberrypi:~ $ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
ping: sendmsg: Kein Hauptspeicher für den Puffer verfügbar
ping: sendmsg: Kein Hauptspeicher für den Puffer verfügbar
ping: sendmsg: Kein Hauptspeicher für den Puffer verfügbar
ping: sendmsg: Kein Hauptspeicher für den Puffer verfügbar
ping: sendmsg: Kein Hauptspeicher für den Puffer verfügbar
ping: sendmsg: Kein Hauptspeicher für den Puffer verfügbar
ping: sendmsg: Kein Hauptspeicher für den Puffer verfügbar
ping: sendmsg: Kein Hauptspeicher für den Puffer verfügbar
ping: sendmsg: Kein Hauptspeicher für den Puffer verfügbar
^C
--- 8.8.8.8 ping statistics ---
22 packets transmitted, 0 received, 100% packet loss, time 40901ms
Detailed output for "ps -A"
  PID TTY          TIME CMD
    1 ?        00:00:02 systemd
    2 ?        00:00:00 kthreadd
    3 ?        00:00:00 kworker/0:0
    4 ?        00:00:00 kworker/0:0H
    5 ?        00:00:00 kworker/u8:0
    6 ?        00:00:00 mm_percpu_wq
    7 ?        00:00:00 ksoftirqd/0
    8 ?        00:00:00 rcu_sched
    9 ?        00:00:00 rcu_bh
   10 ?        00:00:00 migration/0
   11 ?        00:00:00 cpuhp/0
   12 ?        00:00:00 cpuhp/1
   13 ?        00:00:00 migration/1
   14 ?        00:00:00 ksoftirqd/1
   15 ?        00:00:00 kworker/1:0
   16 ?        00:00:00 kworker/1:0H
   17 ?        00:00:00 cpuhp/2
   18 ?        00:00:00 migration/2
   19 ?        00:00:00 ksoftirqd/2
   20 ?        00:00:00 kworker/2:0
   21 ?        00:00:00 kworker/2:0H
   22 ?        00:00:00 cpuhp/3
   23 ?        00:00:00 migration/3
   24 ?        00:00:00 ksoftirqd/3
   25 ?        00:00:00 kworker/3:0
   26 ?        00:00:00 kworker/3:0H
   27 ?        00:00:00 kdevtmpfs
   28 ?        00:00:00 netns
   29 ?        00:00:00 kworker/0:1
   30 ?        00:00:00 kworker/1:1
   31 ?        00:00:00 kworker/2:1
   32 ?        00:00:00 kworker/3:1
   33 ?        00:00:00 khungtaskd
   34 ?        00:00:00 oom_reaper
   35 ?        00:00:00 writeback
   36 ?        00:00:00 kcompactd0
   37 ?        00:00:00 crypto
   38 ?        00:00:00 kblockd
   39 ?        00:00:00 watchdogd
   40 ?        00:00:00 rpciod
   41 ?        00:00:00 xprtiod
   42 ?        00:00:00 kworker/u8:1
   44 ?        00:00:00 kswapd0
   45 ?        00:00:00 nfsiod
   55 ?        00:00:00 kthrotld
   56 ?        00:00:00 iscsi_eh
   57 ?        00:00:00 dwc_otg
   58 ?        00:00:00 DWC Notificatio
   59 ?        00:00:00 vchiq-slot/0
   60 ?        00:00:00 vchiq-recy/0
   61 ?        00:00:00 vchiq-sync/0
   62 ?        00:00:00 vchiq-keep/0
   63 ?        00:00:00 SMIO
   64 ?        00:00:00 kworker/0:2
   65 ?        00:00:00 irq/92-mmc1
   68 ?        00:00:00 mmcqd/0
   69 ?        00:00:00 jbd2/mmcblk0p7-
   70 ?        00:00:00 ext4-rsv-conver
   71 ?        00:00:00 ipv6_addrconf
   83 ?        00:00:00 irq/169-usb-001
   87 ?        00:00:00 kworker/1:1H
  101 ?        00:00:00 systemd-journal
  126 ?        00:00:00 systemd-udevd
  153 ?        00:00:00 kworker/2:2
  159 ?        00:00:00 kworker/u8:2
  161 ?        00:00:00 kworker/3:2
  173 ?        00:00:00 kworker/1:2
  217 ?        00:00:00 cfg80211
  219 ?        00:00:00 brcmf_wq/mmc1:0
  220 ?        00:00:00 brcmf_wdog/mmc1
  221 ?        00:00:00 kworker/3:3
  275 ?        00:00:00 systemd-timesyn
  304 ?        00:00:00 dbus-daemon
  326 ?        00:00:00 rsyslogd
  329 ?        00:00:00 thd
  330 ?        00:00:00 systemd-logind
  338 ?        00:00:00 cron
  340 ?        00:00:00 avahi-daemon
  343 ?        00:00:00 dhcpcd
  352 ?        00:00:00 avahi-daemon
  384 ?        00:00:00 wpa_supplicant
  391 ?        00:00:00 kworker/3:4
  428 ?        00:00:00 lightdm
  438 tty1     00:00:00 login
  445 ?        00:00:00 sshd
  447 tty7     00:00:00 Xorg
  464 ?        00:00:00 lightdm
  472 ?        00:00:00 systemd
  475 ?        00:00:00 (sd-pam)
  480 ?        00:00:00 lxsession
  489 ?        00:00:00 dbus-daemon
  532 ?        00:00:00 ssh-agent
  538 ?        00:00:00 gvfsd
  543 ?        00:00:00 gvfsd-fuse
  567 ?        00:00:00 kworker/u9:0
  568 ?        00:00:00 hciattach
  574 ?        00:00:00 openbox
  575 ?        00:00:00 bluetoothd
  576 ?        00:00:00 kworker/u9:1
  577 ?        00:00:00 kworker/u9:2
  578 ?        00:00:00 lxpolkit
  584 ?        00:00:01 lxpanel
  585 ?        00:00:00 pcmanfm
  594 ?        00:00:00 ssh-agent
  599 ?        00:00:00 polkitd
  602 ?        00:00:00 bluealsa
  631 ?        00:00:00 krfcommd
  650 tty1     00:00:00 bash
  670 ?        00:00:00 menu-cached
  674 ?        00:00:00 gvfs-udisks2-vo
  677 ?        00:00:00 udisksd
  695 ?        00:00:00 gvfs-afc-volume
  700 ?        00:00:00 gvfs-goa-volume
  704 ?        00:00:00 gvfs-gphoto2-vo
  708 ?        00:00:00 gvfs-mtp-volume
  744 ?        00:00:00 gvfsd-trash
  751 ?        00:00:00 jbd2/mmcblk0p5-
  752 ?        00:00:00 ext4-rsv-conver
 1140 ?        00:00:00 kworker/2:1H
 1141 ?        00:00:00 kworker/0:1H
 1142 ?        00:00:00 kworker/3:1H
 1160 ?        00:00:00 kworker/1:3
 1161 ?        00:00:00 sshd
 1173 ?        00:00:00 sshd
 1176 pts/0    00:00:00 bash
 1193 ?        00:00:00 sshd
 1203 ?        00:00:00 sshd
 1206 pts/1    00:00:00 bash
 1230 pts/0    00:00:01 npm
 1240 pts/0    00:00:00 sh
 1241 pts/0    00:00:00 node
 1251 pts/1    00:00:00 sudo
 1255 pts/1    00:00:01 npm
 1265 pts/1    00:00:00 sh
 1266 pts/1    00:00:00 node
 1276 ?        00:00:00 lxterminal
 1277 ?        00:00:00 gnome-pty-helpe
 1278 pts/2    00:00:00 bash
 1286 pts/2    00:00:00 ps

How to reproduce the error

  • download this gist (there are two separate scripts I combined into one example gist for easier testing - the error also happens when running as two completely different packages in different folders)
  • unzip and run npm i
  • sudo npm run ws281x starts the animation
  • npm run express starts the very simple webserver
  • navigate to http://[IP]:3000
  • select a file (image ~2MB works for me) and click the !-button (this starts the upload)
  • there should be no more response from the server - the raspberry should be "dead"

What else I have tried:

  • I used the new v1.x-branch and incorporated the changes noted here - Same error occurred
  • I used the python examples from here in combination with the npm run express - No freezes here.
  • Running the test animation script without sudo (npm run ws281x) - No freezes (and no animation, duh ;))
  • killing all 'node' processes - Could not kill the node process started by root.
  • Sending the file while the animation loop is not running (even while the "root" process is running). - No freezes
  • My current workaround is to query if the animation is running before sending a file, but validation purely on the client side is not a good solution I think.
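
The client-side check from the last bullet could also be enforced on the server. The following is only a minimal sketch, not code from the gist: it assumes the express process has some way to learn whether the animation is running (the `isAnimationRunning` callback and the `/upload` route name are hypothetical), and it rejects uploads with a 503 while the animation is active.

```javascript
// Hypothetical guard: refuse uploads while the LED animation is active.
// `isAnimationRunning` is an assumed callback; how the web server learns the
// animation state (IPC, a status file, an HTTP call) is left open here.
function createUploadGuard(isAnimationRunning) {
  return function uploadGuard(req, res, next) {
    if (isAnimationRunning()) {
      res.statusCode = 503;
      res.setHeader('Retry-After', '5');
      res.end('Animation is running, upload rejected');
      return;
    }
    next();
  };
}

// Possible usage with express (route name is an assumption):
// app.post('/upload', createUploadGuard(checkAnimation), uploadHandler);
```

This only avoids triggering the freeze; it does not address whatever causes it.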

What I have not tried:

  • running a different type of webserver (Apache, nginx, etc)
  • running a different node version

If you think any of this might help, I'll be happy to try these.

@usefulthink
Member

usefulthink commented Jul 10, 2018

Hey there, you absolutely deserve a 🥇🏆for the most detailed bug reported here yet!

The error message (ping: sendmsg: Kein Hauptspeicher für den Puffer verfügbar; original English text: No buffer space available) sounds suspiciously like the Pi ran out of some sort of memory. Further searching leads to this Ask Ubuntu question: https://askubuntu.com/questions/210451/what-does-ping-sendmsg-no-buffer-space-available-mean - there are some things to try in the answers, like resetting the network interface.

Some more things you can try:

  • you can use sudo su - to get a full shell with the root user logged in; that might help with issuing root commands after the error occurred.
  • if kill/killall alone doesn't help, try kill -9 $PID/killall -9 $NAME

Would you mind joining me in our slack (https://livejs-slackin.herokuapp.com/) so we can chat about this? I'm happy to help debug this issue, but I'm a bit out of ideas right now. Having a real-time conversation would probably help.

@HerrZatacke
Author

HerrZatacke commented Jul 10, 2018

Hi,

Sure! I'll try both of your suggestions and will also join your channel to report back :-)
(after I'm home)

@xoblite

xoblite commented Sep 30, 2018

Hi,

First of all I have to say I appreciate all the work that is being put into this module.

I can only join the choir here - much appreciated indeed! As for the issue, I have been experiencing symptoms similar to the ones described by the OP, but haven't been able to track down the cause yet either (btw, updating to Node.js 8.12 doesn't seem to improve things). May I ask though, how fast is your animation running? (i.e. interval in msecs between frames, or frames per second). @usefulthink, any recommendations here? (nb. this said, I tried slowing one of my animations from 20 to 100 msec between updates, but that didn't seem to help; the Zero becomes unresponsive but the animation continues. EDIT: If you wait longer, the animation eventually freezes as well.)
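
For reference, the two ways of stating animation speed mentioned above convert as follows. This is a small illustrative sketch; the helper names are made up for this example.

```javascript
// Convert a frame interval in milliseconds to frames per second.
function intervalToFps(intervalMs) {
  return 1000 / intervalMs;
}

// Convert frames per second to a frame interval in milliseconds.
function fpsToInterval(fps) {
  return 1000 / fps;
}

console.log(intervalToFps(20));  // 50 frames per second
console.log(intervalToFps(100)); // 10 frames per second
```

So the change from 20 to 100 msec between updates corresponds to going from 50 to 10 frames per second.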

Work-in-progress in case you want to check it out yourselves: https://github.com/xoblite/HAP-NodeJS-accessories (see Unicorn_pHAT_accessory.js)

Thanks again,

BR//Karl-Henrik

@xoblite

xoblite commented Oct 5, 2018

After some more testing (including running the same code via Pimoroni's python library and managing heavy load at the same time) I can pretty much confirm the issue is somewhere in the library; possibly in the default (i.e. unspecified / passed on) init settings. What's interesting is that a low rate of network traffic (a few packets per second) is OK, but with more than that it freezes. Could it be a DMA conflict again perhaps? For reference, Pimoroni uses the following init options in their Python library for Unicorn HAT/pHAT; note e.g. the use of DMA channel 10:

# LED strip configuration:
LED_COUNT      = 64      # Number of LED pixels.
LED_PIN        = 18      # GPIO pin connected to the pixels (must support PWM!).
LED_FREQ_HZ    = 800000  # LED signal frequency in hertz (usually 800khz)
LED_DMA        = 10      # DMA channel to use for generating signal
LED_BRIGHTNESS = 128     # Set to 0 for darkest and 255 for brightest
LED_CHANNEL    = 0       # PWM channel
LED_INVERT     = False   # True to invert the signal (when using NPN transistor level shift)
LED_GAMMA = ...

Will dig around a bit more and see if I can pin down the issue, though the (old branched upstream?) selection of DMA channel looks like a candidate at this point (cf. other reports on DMA vs RPi3B+?). Still on the master branch btw, tested on both Node.js 8.11.x and 8.12.0, Raspberry Pi Zero W (i.e. BCM2835 based) with Pimoroni Unicorn pHAT.

Any thoughts? By the way, how could/should other init params be passed to the library? (haven't had time to digest the upstream yet ;) ) Any related aspects of possibly switching to the 1.0 branch?

BR//Karl-Henrik

@xoblite

xoblite commented Oct 5, 2018

Update: Tried changing the default DMA channel (i.e. DEFAULT_DMANUM) in rpi-ws281x.cc to 10 and recompiling, and everything seems to be working like a charm now (running Node.js, Prometheus, Redis server+benchmark, sudo apt-get update and frequently hitting my own HTTP server code at the same time, seemingly no issues, so far at least ;D ).

@HerrZatacke, would it be possible for you to confirm on your side as well? (nb. you need to install node-gyp to recompile the rpi_ws281x.node binding)

Update Oct 6th: After quite extensive testing and really pushing the limits (hehe), I can confirm that the system is still rock solid following the change of DMA channel as per above. @usefulthink, given that this can be considered quite a serious issue on both the RPi 3B+ and 0W - which is also the current model of each product - I propose a change to DMA channel 10 also on master (i.e. rpi-ws281x.cc row 19), do you agree? I can of course pull a one-liner myself, but figured you may want to put something in e.g. the readme as well; let me know otherwise. Thanks!

@KDKHD

KDKHD commented Oct 6, 2018

default DMA channel

Hello, I am having the exact same problem. Could you please give a detailed description on how to change the DMA channel, how to recompile and all the steps needed to get this working?

@xoblite

xoblite commented Oct 6, 2018

@Profe66er:

Hello, I am having the exact same problem. Could you please give a detailed description on how to change the DMA channel, how to recompile and all the steps needed to get this working?

I'll try... (I had never done it before either, but I think this is the correct workflow)

  • Stop your related node.js processes.
  • Install node-gyp (nb. the instructions on that page match my steps below btw).
  • Change (e.g. using nano) row 19 of rpi-ws281x.cc in the <ParentFolder>/node_modules/rpi-ws281x-native/src folder to read #define DEFAULT_DMANUM 10, and save.
  • Change directory back to <ParentFolder>/node_modules/rpi-ws281x-native and perform a node-gyp rebuild (this performs a combined node-gyp clean+configure+build operation, just in case). After it's finished, cd into <ParentFolder>/node_modules/rpi-ws281x-native/lib/binding and check the date on rpi_ws281x.node. If all went well, it should now have the date/time the rebuild was finished.
  • Restart your related node.js processes (rebooting never hurts after serious issues either though IMO).
  • Push the system limits that previously got you into problems, and let us know if this solves the issue for you as well. (Are you on RPi 3B+ or 0W by the way? Raspbian Stretch? Which LED HW btw?)
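
The two checks in the steps above (the edited define and the timestamp of the rebuilt binding) can also be scripted. The following is a rough sketch using only Node's built-in `fs` module; the helper names are made up for this example, and the relative paths in the usage comment assume the folder layout described in the steps.

```javascript
const fs = require('fs');

// Extract the value of `#define DEFAULT_DMANUM <n>` from C/C++ source text.
// Returns the number, or null if the define is not found.
function readDefaultDmaNum(sourceText) {
  const match = /^\s*#define\s+DEFAULT_DMANUM\s+(\d+)/m.exec(sourceText);
  return match ? Number(match[1]) : null;
}

// True if the file at `filePath` was modified within the last `maxAgeMs` ms.
function wasRecentlyModified(filePath, maxAgeMs) {
  const { mtimeMs } = fs.statSync(filePath);
  return Date.now() - mtimeMs <= maxAgeMs;
}

// Possible usage, run from <ParentFolder>/node_modules/rpi-ws281x-native:
// const src = fs.readFileSync('src/rpi-ws281x.cc', 'utf8');
// console.log('DEFAULT_DMANUM =', readDefaultDmaNum(src));
// console.log('rebuilt in the last 10 min:',
//   wasRecentlyModified('lib/binding/rpi_ws281x.node', 10 * 60 * 1000));
```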

Clear as mud? ;) ...let me know otherwise.

BR//Karl-Henrik

@KDKHD

KDKHD commented Oct 6, 2018

Thanks a lot. Very helpful. I'm on an RPi 3B+ and I was trying to run a Spotify API program while running the NeoPixel server, but the NeoPixel server kept crashing. I was using a WS2812B too. It all seems to work now, but I'll do some more testing.
Thanks a lot!!!

@xoblite

xoblite commented Oct 10, 2018

@HerrZatacke @Profe66er Any updates?
@usefulthink Any thoughts?

@jiristanglica

Duh, I wish I had found this issue last week when I ran into the same problem. My scenario was that I was using my Pi as a WiFi AP (using the built-in WiFi chip on wlan0), connecting my phone to it to control the LED strip. It froze every time I tried to serve a static HTML file from Express. However, I solved the issue by reversing the connection logic: my phone was set to AP mode and the Pi was connected to it (although I have to point out that it was connected using a USB WiFi dongle on wlan1). Then everything worked just fine.
This suggests that DMA channel 5 is somehow used by the internal WiFi chip too.
Anyhow, thank you very much for providing this solution, might come in handy later! 👍

xoblite added a commit to xoblite/HAP-NodeJS-accessories that referenced this issue Jan 2, 2019
Added the possibility to configure driver initialization parameters, as the default DMA channel (5) used by upstream will cause serious networking issues on (at least) RPi Zero W, 3B+ and 3A+, see e.g. beyondscreen/node-rpi-ws281x-native#87 .
@xoblite

xoblite commented Jan 2, 2019

After putting the same type of pHAT on my Raspberry Pi 3A+ I can confirm that the DMA 5 issue applies to this model as well, and that it's solved by changing the DMA channel to 10 as per above.

FYI, another way (read: perhaps useful beyond waiting for the DMA change specifically to happen upstream) to address this, that I now use myself in my code, is to pass the modified input parameters to the driver as part of the init call:

// LED DRIVER INIT PARAMETERS - DON'T CHANGE THESE UNLESS YOU HAVE PROBLEMS AND KNOW WHAT YOU'RE DOING...
const DRIVER_FREQUENCY = 800000;
const DRIVER_DMA_NUMBER = 10; // Note: The sometimes used DMA 5 causes networking issues on (at least) RPi Zero W, 3B+ and 3A+!
const DRIVER_GPIO_PIN = 18;
const DRIVER_INVERT = 0;
const DRIVER_BRIGHTNESS = 255;

// ...

var driverInitObject = {
    frequency: DRIVER_FREQUENCY,
    dmaNum: DRIVER_DMA_NUMBER,
    gpioPin: DRIVER_GPIO_PIN,
    invert: DRIVER_INVERT,
    brightness: DRIVER_BRIGHTNESS };
driver.init(numLeds, driverInitObject);
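
One way to keep such settings in one place and avoid falling back to DMA channel 5 by accident is a small options helper. This is only a sketch: `buildDriverOptions` is a hypothetical helper that is not part of the library, and the defaults simply mirror the constants in the snippet above.

```javascript
// Defaults mirroring the constants above, with DMA channel 10 instead of 5.
const DEFAULT_DRIVER_OPTIONS = Object.freeze({
  frequency: 800000,
  dmaNum: 10,
  gpioPin: 18,
  invert: 0,
  brightness: 255,
});

// Hypothetical helper: merge caller overrides into the defaults and refuse
// DMA channel 5, which this thread reports as causing network freezes.
function buildDriverOptions(overrides = {}) {
  const options = { ...DEFAULT_DRIVER_OPTIONS, ...overrides };
  if (options.dmaNum === 5) {
    throw new RangeError('DMA channel 5 is reported to cause network freezes; use 10 instead');
  }
  return options;
}

// Possible usage on a Pi (not executed here):
// const driver = require('rpi-ws281x-native');
// driver.init(numLeds, buildDriverOptions({ brightness: 128 }));
```

Throwing on channel 5 is a deliberate choice for this sketch; logging a warning instead would work equally well.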

@usefulthink Any update on the subject? Will you pull this one yourself, or do you want me to do it? (nb. the change to DMA channel 10 was committed to upstream by jgarff exactly a year ago today)

@NoahRoseLedesma

NoahRoseLedesma commented Jan 29, 2019

I'm experiencing similar 'freezing' issues. I'm using my RPi as a network bridge. Running an animation for an extended period of time kills hostapd and seems to freeze up the system.

EDIT: Changing dmaNum to 10 seems to have solved this issue for me 😄

@nicklasfrahm
Contributor

I think we should at least update the README.md to document this. I suspected my SD cards until I found this issue.

@xoblite

xoblite commented May 30, 2019

@usefulthink Any update on the subject? Will you pull this one yourself, or do you want me to do it? (nb. the change to DMA channel 10 was committed to upstream by jgarff a very long time ago)
