public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] drm/amdgpu: fix locking issues in PASID IDR management
@ 2026-03-30  5:30 Mikhail Gavrilov
  2026-03-30  5:30 ` [PATCH v2 1/2] drm/amdgpu: fix sleeping allocation under spinlock in PASID IDR Mikhail Gavrilov
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Mikhail Gavrilov @ 2026-03-30  5:30 UTC (permalink / raw)
  To: Alex Deucher, Christian König
  Cc: Eric Huang, David Airlie, Simona Vetter, amd-gfx, dri-devel,
	stable, Mikhail Gavrilov

Commit 8f1de51f49be ("drm/amdgpu: prevent immediate PASID reuse case")
converted the global PASID allocator from IDA to IDR with a spinlock
for cyclic allocation.  This introduced two locking bugs:
 
1) idr_alloc_cyclic() is called with GFP_KERNEL under spin_lock(),
   which can sleep.
 
2) amdgpu_pasid_free() can be called from hardirq context via the
   fence signal path (amdgpu_pasid_free_cb), but the lock is taken
   with plain spin_lock() in process context, creating a potential
   deadlock:
 
     CPU0
     ----
     spin_lock(&amdgpu_pasid_idr_lock)   // process context, IRQs on
     <Interrupt>
       spin_lock(&amdgpu_pasid_idr_lock) // deadlock
 
   The hardirq call chain is:
 
     sdma_v6_0_process_trap_irq
      -> amdgpu_fence_process
       -> dma_fence_signal
        -> drm_sched_job_done
         -> dma_fence_signal
          -> amdgpu_pasid_free_cb
           -> amdgpu_pasid_free
 
   This was observed on an RX 7900 XTX when exiting a Vulkan game
   running under Proton/Wine, which triggers the fence callback path
   during VM teardown.
 
Patch 1 fixes the sleeping-under-spinlock by using idr_preload() with
GFP_KERNEL before taking the lock, then GFP_NOWAIT for the actual
allocation.
 
Patch 2 converts all three spin_lock/spin_unlock call sites to
spin_lock_irqsave/spin_unlock_irqrestore.
 
Tested on ASUS ROG STRIX B650E-I / Ryzen 9 7950X / RX 7900 XTX with
CONFIG_PROVE_LOCKING=y.  The lockdep warning is no longer triggered
after applying both patches.

Mikhail Gavrilov (2):
  drm/amdgpu: fix sleeping allocation under spinlock in PASID IDR
  drm/amdgpu: use spin_lock_irqsave for PASID IDR lock

 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/2] drm/amdgpu: fix sleeping allocation under spinlock in PASID IDR
  2026-03-30  5:30 [PATCH v2 0/2] drm/amdgpu: fix locking issues in PASID IDR management Mikhail Gavrilov
@ 2026-03-30  5:30 ` Mikhail Gavrilov
  2026-03-31  7:36   ` Claude review: " Claude Code Review Bot
  2026-03-30  5:30 ` [PATCH v2 2/2] drm/amdgpu: use spin_lock_irqsave for PASID IDR lock Mikhail Gavrilov
  2026-03-31  7:36 ` Claude review: drm/amdgpu: fix locking issues in PASID IDR management Claude Code Review Bot
  2 siblings, 1 reply; 6+ messages in thread
From: Mikhail Gavrilov @ 2026-03-30  5:30 UTC (permalink / raw)
  To: Alex Deucher, Christian König
  Cc: Eric Huang, David Airlie, Simona Vetter, amd-gfx, dri-devel,
	stable, Mikhail Gavrilov

Commit 14b81abe7bdc ("drm/amdgpu: prevent immediate PASID reuse case")
switched from ida to idr_alloc_cyclic() protected by a spinlock, but
passes GFP_KERNEL to the allocator.  idr_alloc_cyclic() may need to
allocate radix-tree nodes, which with GFP_KERNEL can sleep — illegal
under a spinlock that disables preemption.  With CONFIG_PREEMPT or
lockdep enabled this triggers:

  BUG: sleeping function called from invalid context at
       ./include/linux/sched/mm.h:323
  in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 570
  ...
  #1: ffffffffc2cd24f8 (amdgpu_pasid_idr_lock){+.+.}-{3:3},
      at: amdgpu_pasid_alloc+0x24/0x210 [amdgpu]
  ...
  kmem_cache_alloc_noprof+0x41d/0x780
  radix_tree_node_alloc.constprop.0+0x56/0x3a0
  idr_get_free+0x330/0x830
  idr_alloc_u32+0x14a/0x2e0
  idr_alloc_cyclic+0xd3/0x1d0
  amdgpu_pasid_alloc+0x51/0x210 [amdgpu]

A mutex is not an option because amdgpu_pasid_free() is reachable from
dma-fence callbacks (amdgpu_pasid_free_cb) which may run in IRQ context.

Use idr_preload(GFP_KERNEL) before taking the spinlock to pre-allocate
radix-tree nodes, then pass GFP_NOWAIT inside the critical section so
the allocator draws from the preloaded pool and never sleeps.  This is
the standard kernel pattern for IDR allocation under a spinlock.

Fixes: 14b81abe7bdc ("drm/amdgpu: prevent immediate PASID reuse case")
Cc: stable@vger.kernel.org
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index d88523568b62..515775eab2ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -67,10 +67,12 @@ int amdgpu_pasid_alloc(unsigned int bits)
 	if (bits == 0)
 		return -EINVAL;
 
+	idr_preload(GFP_KERNEL);
 	spin_lock(&amdgpu_pasid_idr_lock);
 	pasid = idr_alloc_cyclic(&amdgpu_pasid_idr, NULL, 1,
-				 1U << bits, GFP_KERNEL);
+				 1U << bits, GFP_NOWAIT);
 	spin_unlock(&amdgpu_pasid_idr_lock);
+	idr_preload_end();
 
 	if (pasid >= 0)
 		trace_amdgpu_pasid_allocated(pasid);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] drm/amdgpu: use spin_lock_irqsave for PASID IDR lock
  2026-03-30  5:30 [PATCH v2 0/2] drm/amdgpu: fix locking issues in PASID IDR management Mikhail Gavrilov
  2026-03-30  5:30 ` [PATCH v2 1/2] drm/amdgpu: fix sleeping allocation under spinlock in PASID IDR Mikhail Gavrilov
@ 2026-03-30  5:30 ` Mikhail Gavrilov
  2026-03-31  7:36   ` Claude review: " Claude Code Review Bot
  2026-03-31  7:36 ` Claude review: drm/amdgpu: fix locking issues in PASID IDR management Claude Code Review Bot
  2 siblings, 1 reply; 6+ messages in thread
From: Mikhail Gavrilov @ 2026-03-30  5:30 UTC (permalink / raw)
  To: Alex Deucher, Christian König
  Cc: Eric Huang, David Airlie, Simona Vetter, amd-gfx, dri-devel,
	stable, Mikhail Gavrilov

amdgpu_pasid_free() can be called from hardirq context via the fence
signal path:

  sdma_v6_0_process_trap_irq
   -> amdgpu_fence_process
    -> dma_fence_signal
     -> drm_sched_job_done
      -> dma_fence_signal
       -> amdgpu_pasid_free_cb
        -> amdgpu_pasid_free
         -> spin_lock(&amdgpu_pasid_idr_lock)  <- hardirq context

But the lock was originally taken with plain spin_lock() in process
context (amdgpu_pasid_alloc), creating an inconsistent
{HARDIRQ-ON-W} -> {IN-HARDIRQ-W} lock state that can deadlock if an
interrupt arrives while the lock is held on the same CPU.

Use spin_lock_irqsave/spin_unlock_irqrestore for all call sites of
amdgpu_pasid_idr_lock to prevent the deadlock.

This patch applies on top of "drm/amdgpu: fix sleeping allocation
under spinlock in PASID IDR".

Fixes: 8f1de51f49be ("drm/amdgpu: prevent immediate PASID reuse case")
Cc: stable@vger.kernel.org
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index 515775eab2ef..762ceb3c708a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -62,16 +62,17 @@ struct amdgpu_pasid_cb {
  */
 int amdgpu_pasid_alloc(unsigned int bits)
 {
+	unsigned long flags;
 	int pasid;
 
 	if (bits == 0)
 		return -EINVAL;
 
 	idr_preload(GFP_KERNEL);
-	spin_lock(&amdgpu_pasid_idr_lock);
+	spin_lock_irqsave(&amdgpu_pasid_idr_lock, flags);
 	pasid = idr_alloc_cyclic(&amdgpu_pasid_idr, NULL, 1,
 				 1U << bits, GFP_NOWAIT);
-	spin_unlock(&amdgpu_pasid_idr_lock);
+	spin_unlock_irqrestore(&amdgpu_pasid_idr_lock, flags);
 	idr_preload_end();
 
 	if (pasid >= 0)
@@ -86,11 +87,12 @@ int amdgpu_pasid_alloc(unsigned int bits)
  */
 void amdgpu_pasid_free(u32 pasid)
 {
+	unsigned long flags;
 	trace_amdgpu_pasid_freed(pasid);
 
-	spin_lock(&amdgpu_pasid_idr_lock);
+	spin_lock_irqsave(&amdgpu_pasid_idr_lock, flags);
 	idr_remove(&amdgpu_pasid_idr, pasid);
-	spin_unlock(&amdgpu_pasid_idr_lock);
+	spin_unlock_irqrestore(&amdgpu_pasid_idr_lock, flags);
 }
 
 static void amdgpu_pasid_free_cb(struct dma_fence *fence,
@@ -633,7 +635,9 @@ void amdgpu_vmid_mgr_fini(struct amdgpu_device *adev)
  */
 void amdgpu_pasid_mgr_cleanup(void)
 {
-	spin_lock(&amdgpu_pasid_idr_lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&amdgpu_pasid_idr_lock, flags);
 	idr_destroy(&amdgpu_pasid_idr);
-	spin_unlock(&amdgpu_pasid_idr_lock);
+	spin_unlock_irqrestore(&amdgpu_pasid_idr_lock, flags);
 }
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Claude review: drm/amdgpu: fix locking issues in PASID IDR management
  2026-03-30  5:30 [PATCH v2 0/2] drm/amdgpu: fix locking issues in PASID IDR management Mikhail Gavrilov
  2026-03-30  5:30 ` [PATCH v2 1/2] drm/amdgpu: fix sleeping allocation under spinlock in PASID IDR Mikhail Gavrilov
  2026-03-30  5:30 ` [PATCH v2 2/2] drm/amdgpu: use spin_lock_irqsave for PASID IDR lock Mikhail Gavrilov
@ 2026-03-31  7:36 ` Claude Code Review Bot
  2 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-03-31  7:36 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: drm/amdgpu: fix locking issues in PASID IDR management
Author: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Patches: 3
Reviewed: 2026-03-31T17:36:41.530736

---

This is a well-structured 2-patch series fixing two real locking bugs introduced by commit `14b81abe7bdc` / `8f1de51f49be` ("drm/amdgpu: prevent immediate PASID reuse case"), which converted PASID allocation from IDA to IDR with a spinlock.

The two bugs are:
1. **Sleeping under spinlock**: `idr_alloc_cyclic()` called with `GFP_KERNEL` inside `spin_lock()`.
2. **IRQ deadlock**: `amdgpu_pasid_free()` reachable from hardirq context (fence signal path) but the lock taken with plain `spin_lock()` in process context.

Both fixes use well-established kernel patterns. The patches are correctly ordered (patch 1 is a prerequisite for patch 2, as patch 2 builds on the code introduced by patch 1). The cover letter is excellent with clear problem description, call chain, and test details.

**Verdict: Series looks correct and ready to apply.** One minor nit on patch 2 below.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Claude review: drm/amdgpu: fix sleeping allocation under spinlock in PASID IDR
  2026-03-30  5:30 ` [PATCH v2 1/2] drm/amdgpu: fix sleeping allocation under spinlock in PASID IDR Mikhail Gavrilov
@ 2026-03-31  7:36   ` Claude Code Review Bot
  0 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-03-31  7:36 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Status: Good**

The fix uses the standard `idr_preload()` / `GFP_NOWAIT` pattern:

```c
+	idr_preload(GFP_KERNEL);
 	spin_lock(&amdgpu_pasid_idr_lock);
 	pasid = idr_alloc_cyclic(&amdgpu_pasid_idr, NULL, 1,
-				 1U << bits, GFP_KERNEL);
+				 1U << bits, GFP_NOWAIT);
 	spin_unlock(&amdgpu_pasid_idr_lock);
+	idr_preload_end();
```

This is the textbook kernel approach: `idr_preload(GFP_KERNEL)` pre-allocates radix-tree nodes in sleepable context with preemption disabled, then `GFP_NOWAIT` inside the spinlock draws from the preloaded pool. `idr_preload_end()` re-enables preemption after the lock is released.

The ordering is correct: `idr_preload()` before `spin_lock()`, and `idr_preload_end()` after `spin_unlock()`. No issues.

Note: The Fixes: tag references `14b81abe7bdc` while the cover letter mentions `8f1de51f49be`. Both should refer to the same commit that introduced the IDR conversion. The cover letter and patch 2 use `8f1de51f49be` — the discrepancy in the Fixes: tag on patch 1 (`14b81abe7bdc`) should be verified. If these are different commits, the correct one to cite is whichever actually introduced `idr_alloc_cyclic()` with `GFP_KERNEL` under the spinlock.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Claude review: drm/amdgpu: use spin_lock_irqsave for PASID IDR lock
  2026-03-30  5:30 ` [PATCH v2 2/2] drm/amdgpu: use spin_lock_irqsave for PASID IDR lock Mikhail Gavrilov
@ 2026-03-31  7:36   ` Claude Code Review Bot
  0 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-03-31  7:36 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Status: Good**

Converts all three lock sites to `spin_lock_irqsave`/`spin_unlock_irqrestore`:

1. **`amdgpu_pasid_alloc()`** — process context allocator
2. **`amdgpu_pasid_free()`** — called from both process and hardirq context (via `amdgpu_pasid_free_cb`)
3. **`amdgpu_pasid_mgr_cleanup()`** — module cleanup

All three conversions are correct and follow the same pattern:
```c
+	unsigned long flags;
 ...
-	spin_lock(&amdgpu_pasid_idr_lock);
+	spin_lock_irqsave(&amdgpu_pasid_idr_lock, flags);
 ...
-	spin_unlock(&amdgpu_pasid_idr_lock);
+	spin_unlock_irqrestore(&amdgpu_pasid_idr_lock, flags);
```

**Minor nit**: In `amdgpu_pasid_free()`, the `flags` variable declaration is placed before the `trace_amdgpu_pasid_freed()` call with no blank line separating declarations from statements:

```c
void amdgpu_pasid_free(u32 pasid)
{
+	unsigned long flags;
 	trace_amdgpu_pasid_freed(pasid);
```

Kernel coding style typically expects a blank line between variable declarations and the first statement. Looking at the applied result, it does have the declaration immediately followed by the trace call with no blank line. This is a cosmetic nit — some maintainers care, some don't.

**Correctness note on `amdgpu_pasid_mgr_cleanup()`**: Using `spin_lock_irqsave` here is technically unnecessary since this runs during module unload where no concurrent IRQ-context users should exist, but it's the right thing to do for consistency — it ensures the lock annotation is uniform across all call sites, preventing future lockdep warnings if the call ordering ever changes.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-03-31  7:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-30  5:30 [PATCH v2 0/2] drm/amdgpu: fix locking issues in PASID IDR management Mikhail Gavrilov
2026-03-30  5:30 ` [PATCH v2 1/2] drm/amdgpu: fix sleeping allocation under spinlock in PASID IDR Mikhail Gavrilov
2026-03-31  7:36   ` Claude review: " Claude Code Review Bot
2026-03-30  5:30 ` [PATCH v2 2/2] drm/amdgpu: use spin_lock_irqsave for PASID IDR lock Mikhail Gavrilov
2026-03-31  7:36   ` Claude review: " Claude Code Review Bot
2026-03-31  7:36 ` Claude review: drm/amdgpu: fix locking issues in PASID IDR management Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox