public inbox for drm-ai-reviews@public-inbox.freedesktop.org
* [PATCH] drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format
@ 2026-04-29 14:37 Mikhail Gavrilov
  2026-05-05  1:32 ` Claude review: " Claude Code Review Bot
  2026-05-05  1:32 ` Claude Code Review Bot
  0 siblings, 2 replies; 3+ messages in thread
From: Mikhail Gavrilov @ 2026-04-29 14:37 UTC (permalink / raw)
  To: amd-gfx, dri-devel, linux-kernel
  Cc: Mikhail Gavrilov, stable, Alex Deucher, Christian König,
	David Airlie, Simona Vetter, Sumit Semwal,
	Pierre-Eric Pelloux-Prayer, linux-media, linaro-mm-sig

When dumping IB contents from a hung job, amdgpu_devcoredump_format()
acquires the VM root PD's reservation lock via amdgpu_vm_lock_by_pasid()
and then, for each IB referenced by the job, calls amdgpu_bo_reserve()
on the BO that backs the IB.  Both reservations are taken on
reservation_ww_class_mutex objects but neither uses a ww_acquire_ctx,
which trips lockdep:

  WARNING: possible recursive locking detected
  --------------------------------------------
  kworker/u128:0 is trying to acquire lock:
  ffff88838b16e1f0 (reservation_ww_class_mutex){+.+.}-{4:4},
    at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]

  but task is already holding lock:
  ffff8882f82681f0 (reservation_ww_class_mutex){+.+.}-{4:4},
    at: amdgpu_devcoredump_format+0x1594/0x23f0 [amdgpu]

   Possible unsafe locking scenario:
         CPU0
         ----
    lock(reservation_ww_class_mutex);
    lock(reservation_ww_class_mutex);

   *** DEADLOCK ***
   May be due to missing lock nesting notation

  Workqueue: events_unbound amdgpu_devcoredump_deferred_work [amdgpu]
  Call Trace:
   __ww_mutex_lock.constprop.0
   ww_mutex_lock
   amdgpu_bo_reserve
   amdgpu_devcoredump_format+0x1594 [amdgpu]
   amdgpu_devcoredump_deferred_work+0xea [amdgpu]
   process_one_work
   worker_thread
   kthread

The two reservations are on different BOs in the captured trace, so the
splat is a lockdep-correctness warning, not an observed deadlock.  It
becomes a real self-deadlock whenever the IB BO shares its dma_resv
with the root PD (the always-valid case, see
amdgpu_vm_is_bo_always_valid()): amdgpu_bo_reserve(abo) re-acquires the
same ww_mutex without a ticket and blocks forever.

With amdgpu.gpu_recovery=0 the timeout handler refires every ~2 s and
each invocation produces this splat, drowning the kernel ring buffer.

Fix it by collecting the per-IB BO references under the root PD's
reservation, then releasing the root before reserving each IB BO
individually.  The walk over the VM mapping tree must remain under the
root lock (mappings can be torn down without it), but the actual
content copies do not need to nest inside it.  Each per-IB reservation
is now an independent top-level acquire, eliminating the nested
ww_mutex.

The collect/release logic is factored out into two small helpers
(amdgpu_devcoredump_collect_ib_refs / amdgpu_devcoredump_release_ib_refs)
to keep the main function's indentation reasonable.

This also fixes a BO refcount leak in the original code: when
amdgpu_bo_reserve() failed, control jumped to free_ib_content without
running amdgpu_bo_unref().  In the new structure the per-IB BO refs
are released unconditionally in the cleanup helper.

Reproducer (~150 LoC libdrm_amdgpu): submit a single GFX IB containing
PACKET3_INDIRECT_BUFFER chained at GPU VA 0 and wait for the fence.
The TDR fires within ~10 s and the deferred coredump worker produces
the splat above on every invocation.

Fixes: 7b15fc2d1f1a ("drm/amdgpu: dump job ibs in the devcoredump")
Cc: stable@vger.kernel.org # 7.1
Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
---
 .../gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c  | 147 +++++++++++++-----
 1 file changed, 110 insertions(+), 37 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
index d386bc775d03..f6bb968de756 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c
@@ -207,6 +207,72 @@ static void amdgpu_devcoredump_fw_info(struct amdgpu_device *adev,
 	}
 }
 
+struct amdgpu_devcoredump_ib_ref {
+	struct amdgpu_bo	*bo;
+	u64			offset;
+};
+
+/*
+ * Walk the VM's mapping tree under the root PD's reservation to obtain the BO
+ * that backs each IB and pin it with a refcount. The root PD reservation is
+ * dropped before this function returns; the caller can then reserve each IB
+ * BO individually without nesting ww_mutex acquires on
+ * reservation_ww_class_mutex.
+ *
+ * Returns an array of num_ibs entries (each ib_refs[i].bo may be NULL if its
+ * mapping was not found), or NULL on allocation failure / VM lookup failure.
+ * The caller must release the BO refs and free the array.
+ */
+static struct amdgpu_devcoredump_ib_ref *
+amdgpu_devcoredump_collect_ib_refs(struct amdgpu_device *adev,
+				   struct amdgpu_coredump_info *coredump)
+{
+	struct amdgpu_devcoredump_ib_ref *ib_refs;
+	struct amdgpu_bo_va_mapping *mapping;
+	struct amdgpu_bo *root;
+	struct amdgpu_vm *vm;
+	u64 va_start;
+
+	ib_refs = kcalloc(coredump->num_ibs, sizeof(*ib_refs), GFP_KERNEL);
+	if (!ib_refs)
+		return NULL;
+
+	vm = amdgpu_vm_lock_by_pasid(adev, &root, coredump->pasid);
+	if (!vm) {
+		kfree(ib_refs);
+		return NULL;
+	}
+
+	for (int i = 0; i < coredump->num_ibs; i++) {
+		va_start = coredump->ibs[i].gpu_addr & AMDGPU_GMC_HOLE_MASK;
+		mapping = amdgpu_vm_bo_lookup_mapping(vm, va_start / AMDGPU_GPU_PAGE_SIZE);
+		if (!mapping)
+			continue;
+
+		ib_refs[i].bo = amdgpu_bo_ref(mapping->bo_va->base.bo);
+		ib_refs[i].offset = va_start -
+				    mapping->start * AMDGPU_GPU_PAGE_SIZE;
+	}
+
+	amdgpu_bo_unreserve(root);
+	amdgpu_bo_unref(&root);
+
+	return ib_refs;
+}
+
+static void
+amdgpu_devcoredump_release_ib_refs(struct amdgpu_devcoredump_ib_ref *ib_refs,
+				   int num_ibs)
+{
+	if (!ib_refs)
+		return;
+
+	for (int i = 0; i < num_ibs; i++)
+		if (ib_refs[i].bo)
+			amdgpu_bo_unref(&ib_refs[i].bo);
+	kfree(ib_refs);
+}
+
 static ssize_t
 amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_info *coredump)
 {
@@ -214,13 +280,11 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
 	struct drm_printer p;
 	struct drm_print_iterator iter;
 	struct amdgpu_vm_fault_info *fault_info;
-	struct amdgpu_bo_va_mapping *mapping;
 	struct amdgpu_ip_block *ip_block;
 	struct amdgpu_res_cursor cursor;
-	struct amdgpu_bo *abo, *root;
-	uint64_t va_start, offset;
+	struct amdgpu_bo *abo;
+	uint64_t offset;
 	struct amdgpu_ring *ring;
-	struct amdgpu_vm *vm;
 	u32 *ib_content;
 	uint8_t *kptr;
 	int ver, i, j, r;
@@ -343,43 +407,52 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
 		drm_printf(&p, "VRAM is lost due to GPU reset!\n");
 
 	if (coredump->num_ibs) {
-		/* Don't try to lookup the VM or map the BOs when calculating the
-		 * size required to store the devcoredump.
+		struct amdgpu_devcoredump_ib_ref *ib_refs = NULL;
+
+		/*
+		 * Snapshot per-IB BO references under the root PD's reservation,
+		 * then release the root before reserving each IB BO individually
+		 * to copy its contents.
+		 *
+		 * Reserving an IB BO while the root PD is still reserved would
+		 * be a nested ww_mutex acquire on reservation_ww_class_mutex
+		 * without a ww_acquire_ctx, which trips lockdep's recursive-
+		 * locking check and self-deadlocks for IB BOs that share their
+		 * dma_resv with the root PD (always-valid BOs).
+		 *
+		 * Skip lookup/reservation entirely on the sizing pass: it does
+		 * not write IB content, and the size estimate doesn't depend on
+		 * whether the BOs are reachable.
 		 */
-		if (sizing_pass)
-			vm = NULL;
-		else
-			vm = amdgpu_vm_lock_by_pasid(adev, &root, coredump->pasid);
+		if (!sizing_pass)
+			ib_refs = amdgpu_devcoredump_collect_ib_refs(adev, coredump);
 
-		for (int i = 0; i < coredump->num_ibs && (sizing_pass || vm); i++) {
+		for (int i = 0; i < coredump->num_ibs; i++) {
 			ib_content = kvmalloc_array(coredump->ibs[i].ib_size_dw, 4,
 						    GFP_KERNEL);
 			if (!ib_content)
 				continue;
 
-			/* vm=NULL can only happen when 'sizing_pass' is true. Skip to the
-			 * drm_printf() calls (ib_content doesn't need to be initialized
-			 * as its content won't be written anywhere).
-			 */
-			if (!vm)
+			if (sizing_pass)
 				goto output_ib_content;
 
-			va_start = coredump->ibs[i].gpu_addr & AMDGPU_GMC_HOLE_MASK;
-			mapping = amdgpu_vm_bo_lookup_mapping(vm, va_start / AMDGPU_GPU_PAGE_SIZE);
-			if (!mapping)
-				goto free_ib_content;
+			if (!ib_refs || !ib_refs[i].bo)
+				goto output_ib_content;
+
+			abo = ib_refs[i].bo;
+			offset = ib_refs[i].offset;
 
-			offset = va_start - (mapping->start * AMDGPU_GPU_PAGE_SIZE);
-			abo = amdgpu_bo_ref(mapping->bo_va->base.bo);
 			r = amdgpu_bo_reserve(abo, false);
 			if (r)
-				goto free_ib_content;
+				goto output_ib_content;
 
 			if (abo->flags & AMDGPU_GEM_CREATE_NO_CPU_ACCESS) {
 				off = 0;
 
-				if (abo->tbo.resource->mem_type != TTM_PL_VRAM)
-					goto unreserve_abo;
+				if (abo->tbo.resource->mem_type != TTM_PL_VRAM) {
+					amdgpu_bo_unreserve(abo);
+					goto output_ib_content;
+				}
 
 				amdgpu_res_first(abo->tbo.resource, offset,
 						 coredump->ibs[i].ib_size_dw * 4,
@@ -395,8 +468,10 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
 				r = ttm_bo_kmap(&abo->tbo, 0,
 						PFN_UP(abo->tbo.base.size),
 						&abo->kmap);
-				if (r)
-					goto unreserve_abo;
+				if (r) {
+					amdgpu_bo_unreserve(abo);
+					goto output_ib_content;
+				}
 
 				kptr = amdgpu_bo_kptr(abo);
 				kptr += offset;
@@ -406,21 +481,19 @@ amdgpu_devcoredump_format(char *buffer, size_t count, struct amdgpu_coredump_inf
 				amdgpu_bo_kunmap(abo);
 			}
 
+			amdgpu_bo_unreserve(abo);
+
 output_ib_content:
 			drm_printf(&p, "\nIB #%d 0x%llx %d dw\n",
 				   i, coredump->ibs[i].gpu_addr, coredump->ibs[i].ib_size_dw);
-			for (int j = 0; j < coredump->ibs[i].ib_size_dw; j++)
-				drm_printf(&p, "0x%08x\n", ib_content[j]);
-unreserve_abo:
-			if (vm)
-				amdgpu_bo_unreserve(abo);
-free_ib_content:
+			if (!sizing_pass && ib_refs && ib_refs[i].bo) {
+				for (int j = 0; j < coredump->ibs[i].ib_size_dw; j++)
+					drm_printf(&p, "0x%08x\n", ib_content[j]);
+			}
 			kvfree(ib_content);
 		}
-		if (vm) {
-			amdgpu_bo_unreserve(root);
-			amdgpu_bo_unref(&root);
-		}
+
+		amdgpu_devcoredump_release_ib_refs(ib_refs, coredump->num_ibs);
 	}
 
 	return count - iter.remain;
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Claude review: drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format
  2026-04-29 14:37 [PATCH] drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format Mikhail Gavrilov
@ 2026-05-05  1:32 ` Claude Code Review Bot
  2026-05-05  1:32 ` Claude Code Review Bot
  1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-05-05  1:32 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format
Author: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Patches: 1
Reviewed: 2026-05-05T11:32:39.969284

---

This is a single patch fixing a real and well-diagnosed locking problem: `amdgpu_devcoredump_format()` acquires two `reservation_ww_class_mutex` locks in a nested fashion without a `ww_acquire_ctx`, triggering lockdep warnings and causing a real self-deadlock when an IB BO shares its `dma_resv` with the root page directory (always-valid BOs).

The approach -- collect BO references under the root PD lock, release the root, then reserve each IB BO independently -- is **architecturally correct** and the helper factoring is clean. The patch also fixes a pre-existing BO refcount leak. However, the patch introduces **two new bugs** that need to be addressed before it can be merged.

---
Generated by Claude Code Patch Reviewer


* Claude review: drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format
  2026-04-29 14:37 [PATCH] drm/amdgpu: fix recursive ww_mutex acquire in amdgpu_devcoredump_format Mikhail Gavrilov
  2026-05-05  1:32 ` Claude review: " Claude Code Review Bot
@ 2026-05-05  1:32 ` Claude Code Review Bot
  1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-05-05  1:32 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Positive aspects:**

- The lockdep analysis in the commit message is excellent, clearly documenting the root cause and the real deadlock scenario (always-valid BOs sharing `dma_resv` with the root PD).
- The collect-then-release pattern in `amdgpu_devcoredump_collect_ib_refs()` is correct. `amdgpu_vm_bo_lookup_mapping()` requires the root PD reservation (confirmed by `dma_resv_assert_held()` assertions elsewhere in the tree), and `amdgpu_bo_ref()` is a lockless atomic refcount bump, so the collection phase is safe.
- After releasing the root PD, each `amdgpu_bo_reserve()` is an independent top-level ww_mutex acquire -- no nesting, no lockdep issue.
- `amdgpu_devcoredump_release_ib_refs()` properly unrefs all BOs, fixing the pre-existing leak the commit message describes.
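
One aside on the alternative design: if both reservations ever did need to be held at the same time, the standard route is a shared `ww_acquire_ctx`, under which locking an always-valid BO's shared resv returns `-EALREADY` instead of deadlocking. A rough kernel-side sketch of that pattern (not part of this patch; error paths elided):

```c
struct ww_acquire_ctx ctx;
bool shared;
int r;

ww_acquire_init(&ctx, &reservation_ww_class);
r = dma_resv_lock(root->tbo.base.resv, &ctx);
/* error handling elided */

r = dma_resv_lock(abo->tbo.base.resv, &ctx);
shared = (r == -EALREADY); /* abo shares the root PD's resv */
/* -EDEADLK would need the dma_resv_lock_slow() backoff/retry, elided */
ww_acquire_done(&ctx);

/* ... copy IB contents ... */

if (!shared)
	dma_resv_unlock(abo->tbo.base.resv);
dma_resv_unlock(root->tbo.base.resv);
ww_acquire_fini(&ctx);
```

The patch's collect-then-release approach sidesteps the context machinery entirely, which is simpler for this read-only dump path.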

**Bug 1 -- Sizing pass underestimates buffer size (truncation):**

The original code ran the IB content printing loop during the sizing pass:

```c
// Original -- sizing pass prints content (with uninitialized data, harmless for sizing)
for (int j = 0; j < coredump->ibs[i].ib_size_dw; j++)
    drm_printf(&p, "0x%08x\n", ib_content[j]);
```

The new code guards the content printing with:

```c
if (!sizing_pass && ib_refs && ib_refs[i].bo) {
    for (int j = 0; j < coredump->ibs[i].ib_size_dw; j++)
        drm_printf(&p, "0x%08x\n", ib_content[j]);
}
```

During the sizing pass, `sizing_pass` is true, so the content lines are **never counted**. But `amdgpu_devcoredump_deferred_work()` at line 546-548 allocates the buffer based on the sizing pass return value:

```c
coredump->formatted_size = amdgpu_devcoredump_format(NULL, ...);
coredump->formatted = kvzalloc(coredump->formatted_size, GFP_KERNEL);
```

The second (real) pass then tries to write IB header + content into a buffer that was only sized for headers. `drm_coredump_printer` respects `iter.remain` so there's no overflow, but IB content at the end of the dump will be **silently truncated** -- which defeats the purpose of the devcoredump.

**Fix:** The sizing pass should still account for the content lines. For example, always print the content loop during sizing (as the original did), or compute the size arithmetically without accessing `ib_content`.

**Bug 2 -- Uninitialized `ib_content` printed on error paths:**

When `amdgpu_bo_reserve()` fails, `ttm_bo_kmap()` fails, or the BO is `NO_CPU_ACCESS` but not in VRAM, the code jumps to `output_ib_content` **without having populated `ib_content`** (allocated via `kvmalloc_array` -- not zeroed). The guard at `output_ib_content`:

```c
if (!sizing_pass && ib_refs && ib_refs[i].bo) {
```

is still true in these cases because the BO was found (it's the *access* that failed), so uninitialized heap data gets formatted as hex into the devcoredump. In the original code, these error paths jumped to `free_ib_content`, skipping the output entirely.

This is both a correctness issue (misleading garbage in the dump) and technically an info leak to `/sys/class/devcoredump/` (root-readable, so low severity).

**Fix:** Track whether `ib_content` was successfully filled (e.g., a `bool ib_valid` flag, or NULL out the `ib_content` pointer on error paths), and only print content when it's valid. Alternatively, use `kvzalloc_array`/add `__GFP_ZERO` so failed reads produce zeroes instead of heap contents.

**Minor observations:**

- The `Cc: stable@vger.kernel.org # 7.1` tag looks incorrect -- this should probably be `# 6.x` since Linux is on 6.x kernels. Verify the version where the Fixes commit `7b15fc2d1f1a` first appeared.
- The behavioral change when VM lookup fails (original: skip loop entirely; new: print IB headers without content) is acceptable but worth noting in the commit message.
- Comments in the main function body (lines 412-426) are a bit verbose for kernel style, but the locking rationale is valuable to preserve.

---
Generated by Claude Code Patch Reviewer

