Minitube locks up the GPU with nouveau kernel module on Debian GNU/Linux #232

Open
computer-enthusiastic opened this issue Feb 18, 2023 · 0 comments


computer-enthusiastic commented Feb 18, 2023

Hello,

My computer freezes during Minitube playback because the GPU locks up under the nouveau kernel module on Debian GNU/Linux (with an NVIDIA graphics card).

I'm currently using Minitube version 3.9.3 on Debian Stable (11.6) for the AMD64 architecture, with the following graphics configuration:

$ inxi -Gx
Graphics:  Device-1: NVIDIA G96CM [GeForce 9600M GT] vendor: Acer Incorporated ALI driver: nouveau v: kernel 
           bus ID: 01:00.0 
           Display: x11 server: X.Org 1.20.11 driver: loaded: modesetting unloaded: fbdev,vesa resolution: 1280x800~60Hz 
           OpenGL: renderer: NV96 v: 3.3 Mesa 20.3.5 direct render: Yes 

Minitube starts normally and videos are searched and played correctly, but the screen freezes completely (without artifacts) after a variable amount of time. When the computer freezes, the mouse pointer stops moving, the keyboard no longer interacts with the graphical environment, audio and video stop playing, and I cannot switch to a text console (e.g. with CTRL+ALT+F1). Fortunately, the Magic SysRq key [1] still works, so I can get kernel stack traces and/or shut down the crippled computer.

This is an example of a kernel call trace generated during a lock-up:

feb 14 09:11:18 kernel: INFO: task Xorg:715 blocked for more than 120 seconds.
feb 14 09:11:18 kernel:       Not tainted 6.1.0-3-amd64 #1 Debian 6.1.8-1
feb 14 09:11:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
feb 14 09:11:18 kernel: task:Xorg            state:D stack:0     pid:715   ppid:678    flags:0x00400004
feb 14 09:11:18 kernel: Call Trace:
feb 14 09:11:18 kernel:  <TASK>
feb 14 09:11:18 kernel:  __schedule+0x351/0xa20
feb 14 09:11:18 kernel:  schedule+0x5d/0xe0
feb 14 09:11:18 kernel:  schedule_preempt_disabled+0x14/0x30
feb 14 09:11:18 kernel:  __ww_mutex_lock.constprop.0+0x577/0x9e0
feb 14 09:11:18 kernel:  ? _raw_spin_unlock+0x15/0x30
feb 14 09:11:18 kernel:  drm_modeset_lock+0x8d/0xd0 [drm]
feb 14 09:11:18 kernel:  drm_crtc_get_sequence_ioctl+0xe8/0x1a0 [drm]
feb 14 09:11:18 kernel:  ? drm_wait_vblank_ioctl+0x770/0x770 [drm]
feb 14 09:11:18 kernel:  drm_ioctl_kernel+0xc9/0x170 [drm]
feb 14 09:11:18 kernel:  drm_ioctl+0x1e7/0x450 [drm]
feb 14 09:11:18 kernel:  ? drm_wait_vblank_ioctl+0x770/0x770 [drm]
feb 14 09:11:18 kernel:  nouveau_drm_ioctl+0x56/0xb0 [nouveau]
feb 14 09:11:18 kernel:  __x64_sys_ioctl+0x90/0xd0
feb 14 09:11:18 kernel:  do_syscall_64+0x5b/0xc0
feb 14 09:11:18 kernel:  ? fpregs_assert_state_consistent+0x22/0x50
feb 14 09:11:18 kernel:  ? exit_to_user_mode_prepare+0x171/0x1c0
feb 14 09:11:18 kernel:  ? syscall_exit_to_user_mode+0x17/0x40
feb 14 09:11:18 kernel:  ? do_syscall_64+0x67/0xc0
feb 14 09:11:18 kernel:  ? syscall_exit_to_user_mode+0x17/0x40
feb 14 09:11:18 kernel:  ? do_syscall_64+0x67/0xc0
feb 14 09:11:18 kernel:  ? fpregs_assert_state_consistent+0x22/0x50
feb 14 09:11:18 kernel:  ? exit_to_user_mode_prepare+0x171/0x1c0
feb 14 09:11:18 kernel:  ? syscall_exit_to_user_mode+0x17/0x40
feb 14 09:11:18 kernel:  ? do_syscall_64+0x67/0xc0
feb 14 09:11:18 kernel:  ? do_syscall_64+0x67/0xc0
feb 14 09:11:18 kernel:  entry_SYSCALL_64_after_hwframe+0x63/0xcd
feb 14 09:11:18 kernel: RIP: 0033:0x7fe1f211d5f7
feb 14 09:11:18 kernel: RSP: 002b:00007ffc4a5898f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
feb 14 09:11:18 kernel: RAX: ffffffffffffffda RBX: 00007ffc4a589930 RCX: 00007fe1f211d5f7
feb 14 09:11:18 kernel: RDX: 00007ffc4a589930 RSI: 00000000c018643b RDI: 0000000000000010
feb 14 09:11:18 kernel: RBP: 00000000c018643b R08: 000055c05be81990 R09: 0000000000000000
feb 14 09:11:18 kernel: R10: 000055c05c5213c0 R11: 0000000000000246 R12: 00007ffc4a589a50
feb 14 09:11:18 kernel: R13: 0000000000000010 R14: 000055c05bd92500 R15: 00007ffc4a589990
feb 14 09:11:18 kernel:  </TASK>
feb 14 09:11:18 kernel: INFO: task kworker/u4:3:8083 blocked for more than 120 seconds.
feb 14 09:11:18 kernel:       Not tainted 6.1.0-3-amd64 #1 Debian 6.1.8-1
feb 14 09:11:18 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
feb 14 09:11:18 kernel: task:kworker/u4:3    state:D stack:0     pid:8083  ppid:2      flags:0x00004000
feb 14 09:11:18 kernel: Workqueue: events_unbound nv50_disp_atomic_commit_work [nouveau]
feb 14 09:11:18 kernel: Call Trace:
feb 14 09:11:18 kernel:  <TASK>
feb 14 09:11:18 kernel:  __schedule+0x351/0xa20
feb 14 09:11:18 kernel:  schedule+0x5d/0xe0
feb 14 09:11:18 kernel:  schedule_timeout+0x118/0x150
feb 14 09:11:18 kernel:  dma_fence_default_wait+0x1a5/0x260
feb 14 09:11:18 kernel:  ? __bpf_trace_dma_fence+0x10/0x10
feb 14 09:11:18 kernel:  dma_fence_wait_timeout+0x108/0x130
feb 14 09:11:18 kernel:  drm_atomic_helper_wait_for_fences+0x82/0xe0 [drm_kms_helper]
feb 14 09:11:18 kernel:  nv50_disp_atomic_commit_tail+0x8d/0x8e0 [nouveau]
feb 14 09:11:18 kernel:  ? _raw_spin_unlock+0x15/0x30
feb 14 09:11:18 kernel:  ? finish_task_switch.isra.0+0x9b/0x300
feb 14 09:11:18 kernel:  process_one_work+0x1c7/0x380
feb 14 09:11:18 kernel:  worker_thread+0x4d/0x380
feb 14 09:11:18 kernel:  ? rescuer_thread+0x3a0/0x3a0
feb 14 09:11:18 kernel:  kthread+0xe9/0x110
feb 14 09:11:18 kernel:  ? kthread_complete_and_exit+0x20/0x20
feb 14 09:11:18 kernel:  ret_from_fork+0x22/0x30
feb 14 09:11:18 kernel:  </TASK>

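As the traces show, the X server (Xorg) is blocked in a DRM ioctl (drm_crtc_get_sequence_ioctl) waiting for a modeset lock, while a nouveau worker (nv50_disp_atomic_commit_work) is itself stuck waiting for a DMA fence. In other words, Xorg hangs because the nouveau kernel module hangs.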
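To compare lock-ups across several attempts, the hung-task reports can be pulled out of a saved kernel log automatically. The following is only a minimal sketch: the function name `find_hung_tasks` and the sample text are illustrative, and the pattern is based solely on the "blocked for more than N seconds" header lines shown above.

```python
import re

# Header line printed for each hung task. The task name may itself
# contain colons (e.g. "kworker/u4:3"), so the pid is taken as the
# last ":<digits>" before " blocked".
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (task name, pid, seconds) for every hung-task report."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK.finditer(log_text)
    ]


sample = """\
feb 14 09:11:18 kernel: INFO: task Xorg:715 blocked for more than 120 seconds.
feb 14 09:11:18 kernel: INFO: task kworker/u4:3:8083 blocked for more than 120 seconds.
"""

for name, pid, secs in find_hung_tasks(sample):
    print(f"{name} (pid {pid}) blocked for more than {secs}s")
```

Running it on a full log saved after each freeze would show whether the same two tasks (Xorg and the nouveau worker) are always the ones reported.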

The issue is always reproducible. Every time I start playing a video with Minitube, the kernel immediately starts logging errors trapped by the GPU, for example:

Feb 14 09:07:10 kernel: nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00001000 [RT_LINEAR_MISMATCH] - Address 0000000000
feb 14 09:07:10 kernel: nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 05500010, e20: 00001100, e24: 00020070
feb 14 09:07:10 kernel: nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - 00001000 [RT_LINEAR_MISMATCH] - Address 0000000000
feb 14 09:07:10 kernel: nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - e0c: 00000000, e18: 00000000, e1c: 05500000, e20: 00001100, e24: 00020070
feb 14 09:07:10 kernel: nouveau 0000:01:00.0: gr: 00200000 [] ch 8 [001f0f6000 Xorg[715]] subc 3 class 8297 mthd 1b0c data 0000f010
feb 14 09:07:10 kernel: nouveau 0000:01:00.0: fb: trapped write at 0020870000 on channel 8 [1f0f6000 Xorg[715]] engine 00 [PGRAPH] client 0b [PROP] subclient 00 [RT0] reason 00000002 [PAGE_NOT_PRESENT]
feb 14 09:07:31 kernel: nouveau 0000:01:00.0: firmware: direct-loading firmware nouveau/nv84_xuc103
feb 14 09:07:31 kernel: nouveau 0000:01:00.0: firmware: direct-loading firmware nouveau/nv84_xuc00f
feb 14 09:07:31 kernel: nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00000040 [RT_FAULT] - Address 0021e70080
feb 14 09:07:31 kernel: nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 06ac0020, e20: 00001100, e24: 00030000
feb 14 09:07:31 kernel: nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - 00000040 [RT_FAULT] - Address 0021e700c0
feb 14 09:07:31 kernel: nouveau 0000:01:00.0: gr: TRAP_PROP - TP 1 - e0c: 00000000, e18: 00000000, e1c: 06ac0030, e20: 00001100, e24: 00030000
feb 14 09:07:31 kernel: nouveau 0000:01:00.0: gr: 00200000 [] ch 8 [001f0f6000 Xorg[715]] subc 3 class 8297 mthd 1b0c data 0000f010
feb 14 09:07:31 kernel: nouveau 0000:01:00.0: fb: trapped write at 0021e700c0 on channel 8 [1f0f6000 Xorg[715]] engine 00 [PGRAPH] client 0b [PROP] subclient 00 [RT0] reason 00000002 [PAGE_NOT_PRESENT]

These errors fill the system logs as soon as Minitube starts playing. As the log above shows, the GPU traps errors and repeatedly tries to recover (apparently reloading the GPU firmware, too), but after a certain number of recoveries it hangs, and the X server hangs shortly afterwards.
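Because the traps arrive in bursts, a per-reason count makes it easier to compare runs (for example, before and after changing an mpv option). The sketch below is an assumption-laden helper, not part of Minitube or nouveau: the name `summarize_traps` is hypothetical and the two patterns only cover the "gr: TRAP_..." and "fb: trapped ..." line shapes that appear in the log above.

```python
import re
from collections import Counter

# "gr: TRAP_PROP - TP 0 - 00001000 [RT_LINEAR_MISMATCH] ..."
# The 8-hex-digit status word excludes the register dump lines
# ("e0c: 00000000, ...") that follow each trap.
GR_TRAP = re.compile(
    r"nouveau \S+ gr: (TRAP_\w+) - TP \d+ - [0-9a-f]{8} \[(\w+)\]"
)
# "fb: trapped write at ... reason 00000002 [PAGE_NOT_PRESENT]"
FB_FAULT = re.compile(
    r"nouveau \S+ fb: trapped (read|write) at [0-9a-f]+ "
    r".*reason [0-9a-f]+ \[(\w+)\]"
)


def summarize_traps(log_text):
    """Count nouveau trap/fault reasons found in a kernel log."""
    counts = Counter()
    for line in log_text.splitlines():
        m = GR_TRAP.search(line)
        if m:
            counts[f"gr {m.group(1)} {m.group(2)}"] += 1
            continue
        m = FB_FAULT.search(line)
        if m:
            counts[f"fb {m.group(1)} {m.group(2)}"] += 1
    return counts


sample = """\
Feb 14 09:07:10 kernel: nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00001000 [RT_LINEAR_MISMATCH] - Address 0000000000
feb 14 09:07:10 kernel: nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - e0c: 00000000, e18: 00000000, e1c: 05500010, e20: 00001100, e24: 00020070
feb 14 09:07:10 kernel: nouveau 0000:01:00.0: fb: trapped write at 0020870000 on channel 8 [1f0f6000 Xorg[715]] engine 00 [PGRAPH] client 0b [PROP] subclient 00 [RT0] reason 00000002 [PAGE_NOT_PRESENT]
feb 14 09:07:31 kernel: nouveau 0000:01:00.0: gr: TRAP_PROP - TP 0 - 00000040 [RT_FAULT] - Address 0021e70080
"""

for reason, count in summarize_traps(sample).items():
    print(f"{count:4d}  {reason}")
```

The sample text is copied from the log excerpt above; in practice the input would be the full kernel log for one playback session.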

I reproduced the issue with both Linux kernel 5.10.165 and 6.1.8, so newer kernels are affected too.

Minitube clearly triggers a malfunction in the nouveau kernel module (the open-source kernel module that the Linux kernel provides for NVIDIA graphics cards).

Minitube uses mpv as its backend for audio/video playback. I tried changing some of the parameters used to initialize mpv in lib/media/src/mpv/mediampv.cpp (the "vo" option, for example), but without luck: it still triggers the nouveau malfunctions.
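One way to narrow this down is to run the standalone mpv player with explicit options and check whether the same traps appear. The sketch below only builds the command line; it does not reflect what Minitube actually passes to mpv, which is exactly what is unknown here. The helper name `mpv_command`, the file name `sample.mp4`, the log file name, and the chosen option values are all assumptions for illustration; `--vo`, `--hwdec` and `--log-file` are standard mpv options.

```python
import shlex


def mpv_command(media, vo="x11", hwdec="no", log_file="mpv-nouveau.log"):
    """Build an mpv argv list for testing playback outside Minitube.

    The defaults are guesses meant to isolate variables: a plain X11
    video output, no hardware decoding, and a log file for later
    comparison with the kernel log.
    """
    return [
        "mpv",
        f"--vo={vo}",
        f"--hwdec={hwdec}",
        f"--log-file={log_file}",
        media,
    ]


# Print a shell-ready command line (hypothetical local test file).
cmd = mpv_command("sample.mp4")
print(shlex.join(cmd))
```

Varying one option at a time (for example `vo="gpu"` versus `vo="x11"`) while watching the kernel log would indicate whether the video output driver or the decoding path is involved.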

I would like to identify and reproduce the commands Minitube uses to drive the backend player. Can you help me?

If we identify the offending commands sent to the video player backend, Minitube could be configured not to use them. The issue could also be reported upstream to the Linux kernel developers with reference to the backend program.

Let me know if you need more information.


[1] https://it.wikipedia.org/wiki/Magic_Sys_Req

@computer-enthusiastic changed the title from "Minitube locks ups the GPU with nouveau kernel module on Debian GNU/Linux" to "Minitube locks up the GPU with nouveau kernel module on Debian GNU/Linux" on Feb 18, 2023