-
-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sync drm/panthor and drm/sched with 6.12-rc2 #264
Commits on Oct 14, 2024
-
drm/panthor: Sync commit from drm-tip with Linux tree
Minor change: in drm-tip one comment was in a different place than in the mainline branch. Fixing this to potentially simplify merge.
Configuration menu - View commit details
-
Copy full SHA for 2b67c59 - Browse repository at this point
Copy the full SHA 2b67c59View commit details -
drm/sched: Re-queue run job worker when drm_sched_entity_pop_job() re…
…turns NULL Rather then loop over entities until one with a ready job is found, re-queue the run job worker when drm_sched_entity_pop_job() returns NULL. Signed-off-by: Matthew Brost <[email protected]> Reviewed-by: Christian König <[email protected]> Fixes: 66dbd90 ("drm/sched: Drain all entities in DRM sched run job worker") Reviewed-by: Luben Tuikov <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Configuration menu - View commit details
-
Copy full SHA for 13a04e9 - Browse repository at this point
Copy the full SHA 13a04e9View commit details -
drm/scheduler: Simplify the allocation of slab caches in drm_sched_fe…
…nce_slab_init Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Signed-off-by: Kunwu Chan <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Configuration menu - View commit details
-
Copy full SHA for e305ff9 - Browse repository at this point
Copy the full SHA e305ff9View commit details -
drm/sched: fix null-ptr-deref in init entity
The bug can be triggered by sending an amdgpu_cs_wait_ioctl to the AMDGPU DRM driver on any ASICs with valid context. The bug was reported by Joonkyo Jung <[email protected]>. For example the following code: static void Syzkaller2(int fd) { union drm_amdgpu_ctx arg1; union drm_amdgpu_wait_cs arg2; arg1.in.op = AMDGPU_CTX_OP_ALLOC_CTX; ret = drmIoctl(fd, 0x140106442 /* amdgpu_ctx_ioctl */, &arg1); arg2.in.handle = 0x0; arg2.in.timeout = 0x2000000000000; arg2.in.ip_type = AMD_IP_VPE /* 0x9 */; arg2->in.ip_instance = 0x0; arg2.in.ring = 0x0; arg2.in.ctx_id = arg1.out.alloc.ctx_id; drmIoctl(fd, 0xc0206449 /* AMDGPU_WAIT_CS * /, &arg2); } The ioctl AMDGPU_WAIT_CS without previously submitted job could be assumed that the error should be returned, but the following commit 1decbf6 modified the logic and allowed to have sched_rq equal to NULL. As a result when there is no job the ioctl AMDGPU_WAIT_CS returns success. The change fixes null-ptr-deref in init entity and the stack below demonstrates the error condition: [ +0.000007] BUG: kernel NULL pointer dereference, address: 0000000000000028 [ +0.007086] #PF: supervisor read access in kernel mode [ +0.005234] #PF: error_code(0x0000) - not-present page [ +0.005232] PGD 0 P4D 0 [ +0.002501] Oops: 0000 [armbian#1] PREEMPT SMP KASAN NOPTI [ +0.005034] CPU: 10 PID: 9229 Comm: amd_basic Tainted: G B W L 6.7.0+ armbian#4 [ +0.007797] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020 [ +0.009798] RIP: 0010:drm_sched_entity_init+0x2d3/0x420 [gpu_sched] [ +0.006426] Code: 80 00 00 00 00 00 00 00 e8 1a 81 82 e0 49 89 9c 24 c0 00 00 00 4c 89 ef e8 4a 80 82 e0 49 8b 5d 00 48 8d 7b 28 e8 3d 80 82 e0 <48> 83 7b 28 00 0f 84 28 01 00 00 4d 8d ac 24 98 00 00 00 49 8d 5c [ +0.019094] RSP: 0018:ffffc90014c1fa40 EFLAGS: 00010282 [ +0.005237] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff8113f3fa [ +0.007326] RDX: fffffbfff0a7889d RSI: 0000000000000008 RDI: ffffffff853c44e0 [ +0.007264] RBP: ffffc90014c1fa80 R08: 0000000000000001 R09: fffffbfff0a7889c [ +0.007266] R10: ffffffff853c44e7 R11: 0000000000000001 R12: ffff8881a719b010 [ +0.007263] R13: ffff88810d412748 R14: 0000000000000002 R15: 0000000000000000 [ +0.007264] FS: 00007ffff7045540(0000) GS:ffff8883cc900000(0000) knlGS:0000000000000000 [ +0.008236] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.005851] CR2: 0000000000000028 CR3: 000000011912e000 CR4: 0000000000350ef0 [ +0.007175] Call Trace: [ +0.002561] <TASK> [ +0.002141] ? show_regs+0x6a/0x80 [ +0.003473] ? __die+0x25/0x70 [ +0.003124] ? page_fault_oops+0x214/0x720 [ +0.004179] ? preempt_count_sub+0x18/0xc0 [ +0.004093] ? __pfx_page_fault_oops+0x10/0x10 [ +0.004590] ? srso_return_thunk+0x5/0x5f [ +0.004000] ? vprintk_default+0x1d/0x30 [ +0.004063] ? srso_return_thunk+0x5/0x5f [ +0.004087] ? vprintk+0x5c/0x90 [ +0.003296] ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched] [ +0.005807] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? _printk+0xb3/0xe0 [ +0.003293] ? __pfx__printk+0x10/0x10 [ +0.003735] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ +0.005482] ? do_user_addr_fault+0x345/0x770 [ +0.004361] ? exc_page_fault+0x64/0xf0 [ +0.003972] ? asm_exc_page_fault+0x27/0x30 [ +0.004271] ? add_taint+0x2a/0xa0 [ +0.003476] ? drm_sched_entity_init+0x2d3/0x420 [gpu_sched] [ +0.005812] amdgpu_ctx_get_entity+0x3f9/0x770 [amdgpu] [ +0.009530] ? finish_task_switch.isra.0+0x129/0x470 [ +0.005068] ? __pfx_amdgpu_ctx_get_entity+0x10/0x10 [amdgpu] [ +0.010063] ? __kasan_check_write+0x14/0x20 [ +0.004356] ? srso_return_thunk+0x5/0x5f [ +0.004001] ? mutex_unlock+0x81/0xd0 [ +0.003802] ? srso_return_thunk+0x5/0x5f [ +0.004096] amdgpu_cs_wait_ioctl+0xf6/0x270 [amdgpu] [ +0.009355] ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu] [ +0.009981] ? srso_return_thunk+0x5/0x5f [ +0.004089] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? __srcu_read_lock+0x20/0x50 [ +0.004096] drm_ioctl_kernel+0x140/0x1f0 [drm] [ +0.005080] ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu] [ +0.009974] ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm] [ +0.005618] ? srso_return_thunk+0x5/0x5f [ +0.004088] ? __kasan_check_write+0x14/0x20 [ +0.004357] drm_ioctl+0x3da/0x730 [drm] [ +0.004461] ? __pfx_amdgpu_cs_wait_ioctl+0x10/0x10 [amdgpu] [ +0.009979] ? __pfx_drm_ioctl+0x10/0x10 [drm] [ +0.004993] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? __kasan_check_write+0x14/0x20 [ +0.004356] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? _raw_spin_lock_irqsave+0x99/0x100 [ +0.004712] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ +0.005063] ? __pfx_arch_do_signal_or_restart+0x10/0x10 [ +0.005477] ? srso_return_thunk+0x5/0x5f [ +0.004000] ? preempt_count_sub+0x18/0xc0 [ +0.004237] ? srso_return_thunk+0x5/0x5f [ +0.004090] ? _raw_spin_unlock_irqrestore+0x27/0x50 [ +0.005069] amdgpu_drm_ioctl+0x7e/0xe0 [amdgpu] [ +0.008912] __x64_sys_ioctl+0xcd/0x110 [ +0.003918] do_syscall_64+0x5f/0xe0 [ +0.003649] ? noist_exc_debug+0xe6/0x120 [ +0.004095] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ +0.005150] RIP: 0033:0x7ffff7b1a94f [ +0.003647] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ +0.019097] RSP: 002b:00007fffffffe0a0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.007708] RAX: ffffffffffffffda RBX: 000055555558b360 RCX: 00007ffff7b1a94f [ +0.007176] RDX: 000055555558b360 RSI: 00000000c0206449 RDI: 0000000000000003 [ +0.007326] RBP: 00000000c0206449 R08: 000055555556ded0 R09: 000000007fffffff [ +0.007176] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5d8 [ +0.007238] R13: 0000000000000003 R14: 000055555555cba8 R15: 00007ffff7ffd040 [ +0.007250] </TASK> v2: Reworked check to guard against null ptr deref and added helpful comments (Christian) Cc: Christian Koenig <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Luben Tuikov <[email protected]> Cc: Bas Nieuwenhuizen <[email protected]> Cc: Joonkyo Jung <[email protected]> Cc: Dokyung Song <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Signed-off-by: Vitaly Prosyak <[email protected]> Reviewed-by: Christian König <[email protected]> Fixes: 56e4496 ("drm/sched: Convert the GPU scheduler to variable number of run-queues") Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for e33a0f8 - Browse repository at this point
Copy the full SHA e33a0f8View commit details -
drm/scheduler: remove full_recover from drm_sched_start
This was basically just another one of amdgpus hacks. The parameter allowed to restart the scheduler without turning fence signaling on again. That this is absolutely not a good idea should be obvious by now since the fences will then just sit there and never signal. While at it cleanup the code a bit. Signed-off-by: Christian König <[email protected]> Reviewed-by: Matthew Brost <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Configuration menu - View commit details
-
Copy full SHA for 2abb6f4 - Browse repository at this point
Copy the full SHA 2abb6f4View commit details -
drm/panthor: Fix race when converting group handle to group object
XArray provides it's own internal lock which protects the internal array when entries are being simultaneously added and removed. However there is still a race between retrieving the pointer from the XArray and incrementing the reference count. To avoid this race simply hold the internal XArray lock when incrementing the reference count, this ensures there cannot be a racing call to xa_erase(). Fixes: de85488 ("drm/panthor: Add the scheduler logical block") Signed-off-by: Steven Price <[email protected]> Reviewed-by: Boris Brezillon <[email protected]> Reviewed-by: Liviu Dudau <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Configuration menu - View commit details
-
Copy full SHA for 90e0e0b - Browse repository at this point
Copy the full SHA 90e0e0bView commit details -
drm/sched: Fix dynamic job-flow control race
Fixes a race condition reported here: AsahiLinux/linux#309 (comment) The whole premise of lockless access to a single-producer-single- consumer queue is that there is just a single producer and single consumer. That means we can't call drm_sched_can_queue() (which is about queueing more work to the hw, not to the spsc queue) from anywhere other than the consumer (wq). This call in the producer is just an optimization to avoid scheduling the consuming worker if it cannot yet queue more work to the hw. It is safe to drop this optimization to avoid the race condition. Suggested-by: Asahi Lina <[email protected]> Fixes: a78422e ("drm/sched: implement dynamic job-flow control") Closes: AsahiLinux/linux#309 Cc: [email protected] Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Danilo Krummrich <[email protected]> Tested-by: Janne Grunau <[email protected]> Signed-off-by: Danilo Krummrich <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Configuration menu - View commit details
-
Copy full SHA for 830795c - Browse repository at this point
Copy the full SHA 830795cView commit details -
drm/sched: Add locking to drm_sched_entity_modify_sched
Without the locking amdgpu currently can race between amdgpu_ctx_set_entity_priority() (via drm_sched_entity_modify_sched()) and drm_sched_job_arm(), leading to the latter accesing potentially inconsitent entity->sched_list and entity->num_sched_list pair. v2: * Improve commit message. (Philipp) Signed-off-by: Tvrtko Ursulin <[email protected]> Fixes: b37aced ("drm/scheduler: implement a function to modify sched list") Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Luben Tuikov <[email protected]> Cc: Matthew Brost <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: [email protected] Cc: Philipp Stanner <[email protected]> Cc: <[email protected]> # v5.7+ Reviewed-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 9820361 - Browse repository at this point
Copy the full SHA 9820361View commit details -
drm/sched: Always wake up correct scheduler in drm_sched_entity_push_job
Since drm_sched_entity_modify_sched() can modify the entities run queue, lets make sure to only dereference the pointer once so both adding and waking up are guaranteed to be consistent. Alternative of moving the spin_unlock to after the wake up would for now be more problematic since the same lock is taken inside drm_sched_rq_update_fifo(). v2: * Improve commit message. (Philipp) * Cache the scheduler pointer directly. (Christian) Signed-off-by: Tvrtko Ursulin <[email protected]> Fixes: b37aced ("drm/scheduler: implement a function to modify sched list") Cc: Christian König <[email protected]> Cc: Alex Deucher <[email protected]> Cc: Luben Tuikov <[email protected]> Cc: Matthew Brost <[email protected]> Cc: David Airlie <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Philipp Stanner <[email protected]> Cc: [email protected] Cc: <[email protected]> # v5.7+ Reviewed-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Christian König <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 6af15ae - Browse repository at this point
Copy the full SHA 6af15aeView commit details -
drm/panthor: Lock the VM resv before calling drm_gpuvm_bo_obtain_prea…
…lloc() drm_gpuvm_bo_obtain_prealloc() will call drm_gpuvm_bo_put() on our pre-allocated BO if the <BO,VM> association exists. Given we only have one ref on preallocated_vm_bo, drm_gpuvm_bo_destroy() will be called immediately, and we have to hold the VM resv lock when calling this function. Fixes: 647810e ("drm/panthor: Add the MMU/VM logical block") Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Liviu Dudau <[email protected]> Reviewed-by: Steven Price <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Configuration menu - View commit details
-
Copy full SHA for 2096c77 - Browse repository at this point
Copy the full SHA 2096c77View commit details -
drm/panthor: Fix access to uninitialized variable in tick_ctx_cleanup()
The group variable can't be used to retrieve ptdev in our second loop, because it points to the previously iterated list_head, not a valid group. Get the ptdev object from the scheduler instead. Cc: <[email protected]> Fixes: d72f049 ("drm/panthor: Allow driver compilation") Reported-by: kernel test robot <[email protected]> Reported-by: Julia Lawall <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Liviu Dudau <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Configuration menu - View commit details
-
Copy full SHA for 0eaab07 - Browse repository at this point
Copy the full SHA 0eaab07View commit details -
drm/panthor: Don't declare a queue blocked if deferred operations are…
… pending If deferred operations are pending, we want to wait for those to land before declaring the queue blocked on a SYNC_WAIT. We need this to deal with the case where the sync object is signalled through a deferred SYNC_{ADD,SET} from the same queue. If we don't do that and the group gets scheduled out before the deferred SYNC_{SET,ADD} is executed, we'll end up with a timeout, because no external SYNC_{SET,ADD} will make the scheduler reconsider the group for execution. Fixes: de85488 ("drm/panthor: Add the scheduler logical block") Cc: <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]> Reviewed-by: Liviu Dudau <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Configuration menu - View commit details
-
Copy full SHA for 62884fe - Browse repository at this point
Copy the full SHA 62884feView commit details -
drm/panthor: Don't add write fences to the shared BOs
The only user (the mesa gallium driver) is already assuming explicit synchronization and doing the export/import dance on shared BOs. The only reason we were registering ourselves as writers on external BOs is because Xe, which was the reference back when we developed Panthor, was doing so. Turns out Xe was wrong, and we really want bookkeep on all registered fences, so userspace can explicitly upgrade those to read/write when needed. Fixes: 4bdca11 ("drm/panthor: Add the driver frontend block") Cc: Matthew Brost <[email protected]> Cc: Simona Vetter <[email protected]> Cc: <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Reviewed-by: Steven Price <[email protected]> Reviewed-by: Liviu Dudau <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
Configuration menu - View commit details
-
Copy full SHA for 4a47ff1 - Browse repository at this point
Copy the full SHA 4a47ff1View commit details -
drm/sched: Use drm sched lockdep map for submit_wq
Avoid leaking a lockdep map on each drm sched creation and destruction by using a single lockdep map for all drm sched allocated submit_wq. v2: - Use alloc_ordered_workqueue_lockdep_map (Tejun) Cc: Luben Tuikov <[email protected]> Cc: Christian König <[email protected]> Signed-off-by: Matthew Brost <[email protected]> Reviewed-by: Nirmoy Das <[email protected]> Acked-by: Danilo Krummrich <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Maarten Lankhorst <[email protected]>
Configuration menu - View commit details
-
Copy full SHA for 3fe3d46 - Browse repository at this point
Copy the full SHA 3fe3d46View commit details