From mboxrd@z Thu Jan 1 00:00:00 1970 From: Claude Code Review Bot To: dri-devel-reviews@example.com Subject: Claude review: drm/amdgpu: fix root reservation in amdgpu_vm_handle_fault Date: Thu, 23 Apr 2026 09:58:50 +1000 Message-ID: In-Reply-To: <20260420121438.1612-1-pierre-eric.pelloux-prayer@amd.com> References: <20260420121438.1612-1-pierre-eric.pelloux-prayer@amd.com> <20260420121438.1612-1-pierre-eric.pelloux-prayer@amd.com> X-Mailer: Claude Code Patch Reviewer Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Patch Review **The deadlock path (confirmed by code reading):** 1. `amdgpu_vm_handle_fault` =E2=86=92 `amdgpu_vm_lock_by_pasid` =E2=86=92 `= amdgpu_bo_reserve(root, true)` =E2=80=94 acquires `dma_resv` lock with NULL= `ww_acquire_ctx` 2. =E2=86=92 `svm_range_restore_pages` =E2=86=92 `svm_range_validate_and_ma= p` =E2=86=92 `svm_range_reserve_bos` =E2=86=92 `amdgpu_vm_lock_pd(vm, &ctx-= >exec, 2)` =E2=80=94 tries to lock the same root BO via `drm_exec` with a n= on-NULL `ww_acquire_ctx` 3. Same thread, same mutex, two different acquire contexts =E2=86=92 deadlo= ck **Correctness of the fix:** The new code structure: ```c if (is_compute_context) { amdgpu_bo_unreserve(root); if (!svm_range_restore_pages(...)) { amdgpu_bo_unref(&root); return true; } amdgpu_bo_unref(&root); vm =3D amdgpu_vm_lock_by_pasid(adev, &root, pasid); if (!vm) return false; } ``` - **Success path** (`svm_range_restore_pages` returns 0): unreserve, unref,= return true =E2=80=94 correctly matches original behavior. - **Failure path** (non-zero return): unreserve + unref the old root, then = re-acquire a fresh lock+ref via `amdgpu_vm_lock_by_pasid`. Falls through to= the NORETRY PTE installation path at line 3037 (`flags =3D AMDGPU_VM_NORET= RY_FLAGS`), which needs root reserved =E2=80=94 satisfied by the fresh lock. - **VM destruction race**: Between unreserve/unref and the re-lock call, th= e VM could be torn down. `amdgpu_vm_lock_by_pasid` handles this correctly = =E2=80=94 it re-looks up the pasid, re-refs the root BO, re-reserves it, an= d double-checks that `vm->root.bo` still matches (lines 2979=E2=80=932983 i= n the existing code). - **`amdgpu_bo_unref(&root)`** nulls out `root`, so the subsequent `amdgpu_= vm_lock_by_pasid` writes to a clean pointer. No use-after-free risk. **Minor observations:** 1. **Missing `Cc: stable@vger.kernel.org`**: The patch has a `Fixes:` tag p= ointing to commit `32b486e8541c` but no `Cc: stable`. If this deadlock is r= eproducible in practice (e.g., SVM/XNACK workloads on compute contexts), it= should probably carry a stable tag. 2. **Two-line comment block**: The `/* Unreserve root ... */` and `/* TODO:= ... */` are two separate single-line comments. A single multi-line comment= would be more conventional kernel style, but this is purely cosmetic. 3. **The TODO is well-placed**: The unlock/relock pattern is inherently fra= gile =E2=80=94 if future code adds state between the initial lock and the `= is_compute_context` check, it could be invalidated by the relock. The TODO = acknowledging that `svm_range_restore_pages` should be reworked to not need= the root reservation (or to accept it already held) is appropriate guidanc= e for a future cleanup. 4. **Behavioral change is intentional and correct**: In the original code, = for compute contexts where `svm_range_restore_pages` fails, the function fa= lls through with the original lock still held. In the new code, it drops an= d re-acquires the lock. This is equivalent =E2=80=94 the re-acquired lock p= rotects the same critical section, and the `vm` pointer is refreshed. Overall this is a clean, minimal fix for a real deadlock. Reviewed-by worth= y. --- Generated by Claude Code Patch Reviewer