* [PATCH v2] drm/amdgpu: fix root reservation in amdgpu_vm_handle_fault
From: Pierre-Eric Pelloux-Prayer @ 2026-04-20 12:14 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Pierre-Eric Pelloux-Prayer
Cc: amd-gfx, dri-devel, linux-kernel
svm_range_restore_pages might reserve the root bo so it must
be called after unreserving it.
---
v2:
- don't modify amdgpu_vm_lock_by_pasid
- add a TODO
---
Fixes: 32b486e8541c ("drm/amdgpu: extract amdgpu_vm_lock_by_pasid from amdgpu_vm_handle_fault")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 63156289ae7f..799a1803d941 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -3026,11 +3026,22 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
 
 	is_compute_context = vm->is_compute_context;
 
-	if (is_compute_context && !svm_range_restore_pages(adev, pasid, vmid,
-	    node_id, addr >> PAGE_SHIFT, ts, write_fault)) {
+	if (is_compute_context) {
+		/* Unreserve root since svm_range_restore_pages might try to reserve it. */
+		/* TODO: rework svm_range_restore_pages so that this isn't necessary. */
 		amdgpu_bo_unreserve(root);
+
+		if (!svm_range_restore_pages(adev, pasid, vmid,
+		    node_id, addr >> PAGE_SHIFT, ts, write_fault)) {
+			amdgpu_bo_unref(&root);
+			return true;
+		}
 		amdgpu_bo_unref(&root);
-		return true;
+
+		/* Double check that the VM still exists. */
+		vm = amdgpu_vm_lock_by_pasid(adev, &root, pasid);
+		if (!vm)
+			return false;
 	}
 
 	addr /= AMDGPU_GPU_PAGE_SIZE;
--
2.43.0
* Re: [PATCH v2] drm/amdgpu: fix root reservation in amdgpu_vm_handle_fault
From: Christian König @ 2026-04-20 12:29 UTC (permalink / raw)
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: amd-gfx, dri-devel, linux-kernel
On 4/20/26 14:14, Pierre-Eric Pelloux-Prayer wrote:
> svm_range_restore_pages might reserve the root bo so it must
> be called after unreserving it.
>
> ---
> v2:
> - don't modify amdgpu_vm_lock_by_pasid
> - add a TODO
> ---
>
> Fixes: 32b486e8541c ("drm/amdgpu: extract amdgpu_vm_lock_by_pasid from amdgpu_vm_handle_fault")
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 17 ++++++++++++++---
> 1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 63156289ae7f..799a1803d941 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3026,11 +3026,22 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
>
>  	is_compute_context = vm->is_compute_context;
>
> -	if (is_compute_context && !svm_range_restore_pages(adev, pasid, vmid,
> -	    node_id, addr >> PAGE_SHIFT, ts, write_fault)) {
> +	if (is_compute_context) {
> +		/* Unreserve root since svm_range_restore_pages might try to reserve it. */
> +		/* TODO: rework svm_range_restore_pages so that this isn't necessary. */
>  		amdgpu_bo_unreserve(root);
> +
> +		if (!svm_range_restore_pages(adev, pasid, vmid,
> +		    node_id, addr >> PAGE_SHIFT, ts, write_fault)) {
> +			amdgpu_bo_unref(&root);
> +			return true;
> +		}
>  		amdgpu_bo_unref(&root);
> -		return true;
> +
> +		/* Double check that the VM still exists. */
Probably better to write "Re-acquire the VM lock, could be that the VM was freed in between.".
With that done Reviewed-by: Christian König <christian.koenig@amd.com>.
> +		vm = amdgpu_vm_lock_by_pasid(adev, &root, pasid);
> +		if (!vm)
> +			return false;
>  	}
>
>  	addr /= AMDGPU_GPU_PAGE_SIZE;
* Claude review: drm/amdgpu: fix root reservation in amdgpu_vm_handle_fault
From: Claude Code Review Bot @ 2026-04-22 23:58 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**The deadlock path (confirmed by code reading):**
1. `amdgpu_vm_handle_fault` → `amdgpu_vm_lock_by_pasid` → `amdgpu_bo_reserve(root, true)` — acquires `dma_resv` lock with NULL `ww_acquire_ctx`
2. `amdgpu_vm_handle_fault` → `svm_range_restore_pages` → `svm_range_validate_and_map` → `svm_range_reserve_bos` → `amdgpu_vm_lock_pd(vm, &ctx->exec, 2)`: tries to lock the same root BO via `drm_exec`, this time with a non-NULL `ww_acquire_ctx`
3. Same thread, same `ww_mutex`: the second attempt blocks waiting for a reservation that the same thread already holds, so it never returns → deadlock
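The path above can be made concrete with a userspace analogy. This is not kernel code. In the sketch, an error-checking POSIX mutex (`PTHREAD_MUTEX_ERRORCHECK`) stands in for the root BO's reservation lock. The names `simulate_fault()` and `restore_pages_locks_root()` are hypothetical. The analogy is loose: in the kernel, the second lock attempt would simply block forever. An error-checking mutex instead detects the same-thread relock and returns `EDEADLK`, which makes the problem visible without hanging the process.

```c
#define _XOPEN_SOURCE 700	/* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/*
 * Userspace analogy only: an error-checking pthread mutex stands in for the
 * root BO's reservation lock. Both function names are hypothetical.
 */

/* Stands in for svm_range_restore_pages() -> ... -> amdgpu_vm_lock_pd(). */
static int restore_pages_locks_root(pthread_mutex_t *root_resv)
{
	int r = pthread_mutex_lock(root_resv);

	if (r)
		return r;	/* EDEADLK: this thread already holds the lock */
	pthread_mutex_unlock(root_resv);
	return 0;
}

/*
 * Models the fault path: take the root lock (amdgpu_vm_lock_by_pasid()),
 * optionally drop it first (the v2 fix), then run the restore step.
 * Returns the result of the nested lock attempt.
 */
int simulate_fault(int drop_lock_first)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t root_resv;
	int r;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&root_resv, &attr);
	pthread_mutexattr_destroy(&attr);

	r = pthread_mutex_lock(&root_resv);
	assert(r == 0);
	if (drop_lock_first)
		pthread_mutex_unlock(&root_resv);

	r = restore_pages_locks_root(&root_resv);

	if (!drop_lock_first)
		pthread_mutex_unlock(&root_resv);
	pthread_mutex_destroy(&root_resv);
	return r;
}
```

`simulate_fault(0)` keeps the lock across the call, as the pre-patch code did, and gets `EDEADLK`. `simulate_fault(1)` drops the lock first, as v2 does, and gets 0. Build with `-pthread`.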
**Correctness of the fix:**
The new code structure:
```c
if (is_compute_context) {
	amdgpu_bo_unreserve(root);
	if (!svm_range_restore_pages(...)) {
		amdgpu_bo_unref(&root);
		return true;
	}
	amdgpu_bo_unref(&root);
	vm = amdgpu_vm_lock_by_pasid(adev, &root, pasid);
	if (!vm)
		return false;
}
```
- **Success path** (`svm_range_restore_pages` returns 0): unreserve, unref, return true — correctly matches original behavior.
- **Failure path** (non-zero return): unreserve + unref the old root, then re-acquire a fresh lock+ref via `amdgpu_vm_lock_by_pasid`. Falls through to the NORETRY PTE installation path at line 3037 (`flags = AMDGPU_VM_NORETRY_FLAGS`), which needs root reserved — satisfied by the fresh lock.
- **VM destruction race**: Between the unreserve/unref and the re-lock call, the VM could be torn down. `amdgpu_vm_lock_by_pasid` handles this correctly. It looks up the pasid again, takes a fresh reference on the root BO, reserves it, and double-checks that `vm->root.bo` still matches (lines 2979–2983 in the existing code).
- **`amdgpu_bo_unref(&root)`** drops the reference and sets `root` to NULL, so no stale pointer survives into the re-lock. `amdgpu_vm_lock_by_pasid` then stores a freshly referenced root BO in it. No use-after-free risk.
**Minor observations:**
1. **Missing `Cc: stable@vger.kernel.org`**: The patch has a `Fixes:` tag pointing to commit `32b486e8541c` but no `Cc: stable`. If this deadlock is reproducible in practice (e.g., SVM/XNACK workloads on compute contexts), it should probably carry a stable tag.
2. **Two-line comment block**: The `/* Unreserve root ... */` and `/* TODO: ... */` are two separate single-line comments. A single multi-line comment would be more conventional kernel style, but this is purely cosmetic.
3. **The TODO is well-placed**: The unlock/relock pattern is inherently fragile — if future code adds state between the initial lock and the `is_compute_context` check, it could be invalidated by the relock. The TODO acknowledging that `svm_range_restore_pages` should be reworked to not need the root reservation (or to accept it already held) is appropriate guidance for a future cleanup.
4. **Behavioral change is intentional and correct**: In the original code, for compute contexts where `svm_range_restore_pages` fails, the function falls through with the original lock still held. In the new code, it drops and re-acquires the lock. For the code that follows, this is equivalent: the re-acquired lock protects the same critical section, and the `vm` pointer is refreshed.
Overall this is a clean, minimal fix for a real deadlock. Reviewed-by worthy.
---
Generated by Claude Code Patch Reviewer
* Claude review: drm/amdgpu: fix root reservation in amdgpu_vm_handle_fault
From: Claude Code Review Bot @ 2026-04-22 23:58 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: drm/amdgpu: fix root reservation in amdgpu_vm_handle_fault
Author: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Patches: 1
Reviewed: 2026-04-23T09:58:50.258731
---
This is a single-patch fix for a real deadlock in `amdgpu_vm_handle_fault`. The bug: `amdgpu_vm_handle_fault` takes the root BO reservation through `amdgpu_vm_lock_by_pasid` (which calls `amdgpu_bo_reserve`). It then calls `svm_range_restore_pages`, which eventually reaches `svm_range_reserve_bos` → `amdgpu_vm_lock_pd` and tries to lock the same root BO via `drm_exec`. The first lock is held without a `ww_acquire_ctx` (NULL), while the second uses the `drm_exec` context, so the thread deadlocks on its own `ww_mutex`.
The fix — unreserve root before calling `svm_range_restore_pages`, then re-acquire via `amdgpu_vm_lock_by_pasid` if needed — is the correct minimal approach. The unlock/relock window is properly guarded by the re-validation in `amdgpu_vm_lock_by_pasid` (which re-checks the pasid→vm mapping and verifies `vm->root.bo` still matches).
**Verdict: The patch is correct and should be safe to apply.** One minor concern, a missing `Cc: stable` tag, is noted in the per-patch review.
---
Generated by Claude Code Patch Reviewer