* Re: [PATCH] drm/amdgpu: fix PASID task_info lookup race [not found] <20260309160403.599472-1-fanwu01@zju.edu.cn> @ 2026-03-09 16:54 ` Christian König 2026-03-10 0:58 ` [PATCH v2] " Fan Wu ` (2 more replies) 0 siblings, 3 replies; 4+ messages in thread From: Christian König @ 2026-03-09 16:54 UTC (permalink / raw) To: Fan Wu, Alex Deucher; +Cc: amd-gfx, dri-devel, linux-kernel On 3/9/26 17:04, Fan Wu wrote: > The amdgpu_vm_get_task_info_pasid() function previously called > amdgpu_vm_get_vm_from_pasid() which returns a raw VM pointer after > releasing the pasids xarray lock. The caller then dereferences > vm->task_info without any lifetime protection. > > Race condition: > > CPU 0 (lookup) CPU 1 (release) > ------------------ ------------------ > amdgpu_vm_get_task_info_pasid() > xa_lock() > vm = xa_load(pasids) > xa_unlock() > amdgpu_vm_fini() > xa_erase_irq(pasids) > // teardown continues > kfree(fpriv) > // VM freed (embedded in fpriv) > vm->task_info // potential UAF > > This can leave the VM pointer dangling because struct amdgpu_vm is > embedded in struct amdgpu_fpriv which is freed via kfree(fpriv) in > amdgpu_file_release_kms() after amdgpu_vm_fini() returns. > > Fix this by acquiring the task_info reference while holding the > xarray lock. This avoids the window where the VM could be freed > between the lookup and the dereference. > > Cache vm->task_info in a local variable before attempting to take a > reference, which keeps the lookup straightforward inside the locked > section. Use kref_get_unless_zero() to safely handle the case where > task_info's refcount is already being decremented to zero by another > thread in the teardown path. > > Note: An RCU-based approach was considered but is not currently > feasible because: (1) the pasids xarray is initialized without > XA_FLAGS_RCU, and (2) struct amdgpu_fpriv is freed with kfree() > rather than kfree_rcu(). A future refactoring could enable RCU > if needed for performance. > > Also remove the unsafe helper function amdgpu_vm_get_vm_from_pasid() > to prevent future misuse. > > Fixes: b8f67b9ddf4f ("drm/amdgpu: change vm->task_info handling") > Signed-off-by: Fan Wu <fanwu01@zju.edu.cn> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 40 ++++++++++++++++---------- > 1 file changed, 25 insertions(+), 15 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > index f2beb980e3c3..7e8621c9b661 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > @@ -2468,19 +2468,6 @@ static void amdgpu_vm_destroy_task_info(struct kref *kref) > kfree(ti); > } > > -static inline struct amdgpu_vm * > -amdgpu_vm_get_vm_from_pasid(struct amdgpu_device *adev, u32 pasid) > -{ > - struct amdgpu_vm *vm; > - unsigned long flags; > - > - xa_lock_irqsave(&adev->vm_manager.pasids, flags); > - vm = xa_load(&adev->vm_manager.pasids, pasid); > - xa_unlock_irqrestore(&adev->vm_manager.pasids, flags); > - > - return vm; > -} > - > /** > * amdgpu_vm_put_task_info - reference down the vm task_info ptr > * > @@ -2527,8 +2514,31 @@ amdgpu_vm_get_task_info_vm(struct amdgpu_vm *vm) > struct amdgpu_task_info * > amdgpu_vm_get_task_info_pasid(struct amdgpu_device *adev, u32 pasid) > { > - return amdgpu_vm_get_task_info_vm( > - amdgpu_vm_get_vm_from_pasid(adev, pasid)); > + struct amdgpu_vm *vm; > + unsigned long flags; > + struct amdgpu_task_info *ti = NULL; > + > + /* > + * Acquire the task_info reference while holding the pasids xarray > + * lock to prevent a race with amdgpu_vm_fini() which removes the > + * PASID mapping before freeing the VM (embedded in struct amdgpu_fpriv). > + * Without this, the VM could be freed between xa_load() return and > + * the task_info dereference. That the VM is freed is irrelevant, the point is that we need to grab the reference to the task info before we drop that one. > + */ > + xa_lock_irqsave(&adev->vm_manager.pasids, flags); > + vm = xa_load(&adev->vm_manager.pasids, pasid); > + if (vm) { > + /* > + * Cache vm->task_info in a local variable before > + * attempting to take a reference. > + */ Please drop that comment, taking the task info into a local variable is actually superflous. > + ti = vm->task_info; > + if (ti && !kref_get_unless_zero(&ti->refcount)) That is unecessary as wel, the task info is dropped after the VM is removed from pasid mapping. So just using kref_get() is sufficient. Regards, Christian. > + ti = NULL; > + } > + xa_unlock_irqrestore(&adev->vm_manager.pasids, flags); > + > + return ti; > } > > static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm) > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2] drm/amdgpu: fix PASID task_info lookup race 2026-03-09 16:54 ` [PATCH] drm/amdgpu: fix PASID task_info lookup race Christian König @ 2026-03-10 0:58 ` Fan Wu 2026-03-11 3:51 ` Claude review: Re: [PATCH] " Claude Code Review Bot 2026-03-11 3:51 ` Claude Code Review Bot 2 siblings, 0 replies; 4+ messages in thread From: Fan Wu @ 2026-03-10 0:58 UTC (permalink / raw) To: Alex Deucher, Christian König Cc: David Airlie, Simona Vetter, amd-gfx, dri-devel, linux-kernel, Fan Wu amdgpu_vm_get_task_info_pasid() currently looks up the VM from the PASID xarray, drops the xarray lock, and only then grabs the task_info reference through amdgpu_vm_get_task_info_vm(). Take the task_info reference directly while holding the PASID xarray lock instead. This keeps the lookup and reference acquisition in the same critical section. The task_info is dropped only after the VM is removed from the PASID mapping, so a regular kref_get() is sufficient here. Also remove the now unnecessary amdgpu_vm_get_vm_from_pasid() helper. Fixes: b8f67b9ddf4f ("drm/amdgpu: change vm->task_info handling") Signed-off-by: Fan Wu <fanwu01@zju.edu.cn> --- v2: - grab the task_info reference directly under the PASID xarray lock - drop the extra local-variable comment - use kref_get() instead of kref_get_unless_zero() - simplify the changelog drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 35 +++++++++++++++----------- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index f2beb980e3c3..706ca8dd65d2 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2468,19 +2468,6 @@ static void amdgpu_vm_destroy_task_info(struct kref *kref) kfree(ti); } -static inline struct amdgpu_vm * -amdgpu_vm_get_vm_from_pasid(struct amdgpu_device *adev, u32 pasid) -{ - struct amdgpu_vm *vm; - unsigned long flags; - - xa_lock_irqsave(&adev->vm_manager.pasids, flags); - vm = xa_load(&adev->vm_manager.pasids, pasid); - xa_unlock_irqrestore(&adev->vm_manager.pasids, flags); - - return vm; -} - /** * amdgpu_vm_put_task_info - reference down the vm task_info ptr * @@ -2527,8 +2514,26 @@ amdgpu_vm_get_task_info_vm(struct amdgpu_vm *vm) struct amdgpu_task_info * amdgpu_vm_get_task_info_pasid(struct amdgpu_device *adev, u32 pasid) { - return amdgpu_vm_get_task_info_vm( - amdgpu_vm_get_vm_from_pasid(adev, pasid)); + struct amdgpu_vm *vm; + unsigned long flags; + struct amdgpu_task_info *ti = NULL; + + /* + * Acquire the task_info reference while holding the pasids xarray + * lock to prevent a race with amdgpu_vm_fini() which removes the + * PASID mapping before freeing the VM (embedded in struct amdgpu_fpriv). + * Without this, the VM could be freed between xa_load() return and + * the task_info dereference. + */ + xa_lock_irqsave(&adev->vm_manager.pasids, flags); + vm = xa_load(&adev->vm_manager.pasids, pasid); + if (vm && vm->task_info) { + ti = vm->task_info; + kref_get(&ti->refcount); + } + xa_unlock_irqrestore(&adev->vm_manager.pasids, flags); + + return ti; } static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm) -- 2.34.1 ^ permalink raw reply related [flat|nested] 4+ messages in thread
* Claude review: Re: [PATCH] drm/amdgpu: fix PASID task_info lookup race 2026-03-09 16:54 ` [PATCH] drm/amdgpu: fix PASID task_info lookup race Christian König 2026-03-10 0:58 ` [PATCH v2] " Fan Wu @ 2026-03-11 3:51 ` Claude Code Review Bot 2026-03-11 3:51 ` Claude Code Review Bot 2 siblings, 0 replies; 4+ messages in thread From: Claude Code Review Bot @ 2026-03-11 3:51 UTC (permalink / raw) To: dri-devel-reviews Overall Series Review Subject: Re: [PATCH] drm/amdgpu: fix PASID task_info lookup race Author: =?UTF-8?Q?Christian_K=C3=B6nig?= <christian.koenig@amd.com> Patches: 2 Reviewed: 2026-03-11T13:51:08.979104 --- This is a single-patch fix for a real use-after-free race condition in `amdgpu_vm_get_task_info_pasid()`. The race is between the PASID lookup path and `amdgpu_vm_fini()`: 1. Thread A calls `amdgpu_vm_get_vm_from_pasid()`, gets a pointer to the VM, releases the xarray lock. 2. Thread B runs `amdgpu_vm_fini()`, calls `xa_erase_irq()` to remove the PASID mapping, then eventually calls `amdgpu_vm_put_task_info()` which may free `task_info`. 3. Thread A dereferences `vm->task_info` — use-after-free. The fix correctly collapses the lookup and reference acquisition into a single critical section under the xarray lock. The approach is sound and well-motivated. **Verdict: The patch is correct and should be accepted, with minor observations below.** --- Generated by Claude Code Patch Reviewer ^ permalink raw reply [flat|nested] 4+ messages in thread
* Claude review: Re: [PATCH] drm/amdgpu: fix PASID task_info lookup race 2026-03-09 16:54 ` [PATCH] drm/amdgpu: fix PASID task_info lookup race Christian König 2026-03-10 0:58 ` [PATCH v2] " Fan Wu 2026-03-11 3:51 ` Claude review: Re: [PATCH] " Claude Code Review Bot @ 2026-03-11 3:51 ` Claude Code Review Bot 2 siblings, 0 replies; 4+ messages in thread From: Claude Code Review Bot @ 2026-03-11 3:51 UTC (permalink / raw) To: dri-devel-reviews Patch Review **Correctness of the fix:** The core change is correct. By holding `xa_lock_irqsave` across both the `xa_load()` and the `kref_get()`, the race with `amdgpu_vm_fini()` is closed. In `amdgpu_vm_fini()` (line 2784-2786), `xa_erase_irq()` takes the same xarray lock internally, so the PASID entry cannot be removed while we hold the lock, guaranteeing the VM (and its `task_info`) remains valid. **Use of `kref_get()` vs `kref_get_unless_zero()`:** The commit message states: > The task_info is dropped only after the VM is removed from the PASID mapping, so a regular kref_get() is sufficient here. This is correct. Looking at `amdgpu_vm_fini()`, `xa_erase_irq()` happens at line 2785, while `amdgpu_vm_put_task_info(vm->task_info)` happens at line 2841 — well after the PASID is erased. Since the xarray lock serializes against the erase, if `xa_load()` returns a non-NULL VM, the task_info refcount is guaranteed to be >= 1. `kref_get()` is the right choice. **NULL check improvement:** The new code: ```c if (vm && vm->task_info) { ti = vm->task_info; kref_get(&ti->refcount); } ``` This adds a `vm->task_info` NULL check that the old `amdgpu_vm_get_task_info_vm()` lacked — the old code would crash if `vm->task_info` were NULL. This is a subtle improvement. **Comment quality:** The inline comment explaining the race is well-written and helpful for future maintainers. **Minor observation — `amdgpu_vm_get_task_info_vm` left with same bug:** The remaining `amdgpu_vm_get_task_info_vm()` function (lines 2510-2520) still unconditionally dereferences `vm->task_info` without a NULL check: ```c if (vm) { ti = vm->task_info; kref_get(&vm->task_info->refcount); /* crashes if task_info is NULL */ } ``` If `task_info` can be NULL (which this patch implicitly acknowledges by checking it), then `amdgpu_vm_get_task_info_vm()` has the same potential NULL dereference. This is a pre-existing issue and not introduced by this patch, but it might be worth fixing in a follow-up. The callers of `amdgpu_vm_get_task_info_vm()` likely always have a valid `task_info`, so it may be benign in practice. **No other concerns.** The patch is clean, minimal, and fixes a real race condition. --- Generated by Claude Code Patch Reviewer ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-03-11 3:51 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260309160403.599472-1-fanwu01@zju.edu.cn>
2026-03-09 16:54 ` [PATCH] drm/amdgpu: fix PASID task_info lookup race Christian König
2026-03-10 0:58 ` [PATCH v2] " Fan Wu
2026-03-11 3:51 ` Claude review: Re: [PATCH] " Claude Code Review Bot
2026-03-11 3:51 ` Claude Code Review Bot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox