* Re: [PATCH] drm/amdgpu: fix PASID task_info lookup race
[not found] <20260309160403.599472-1-fanwu01@zju.edu.cn>
@ 2026-03-09 16:54 ` Christian König
2026-03-10 0:58 ` [PATCH v2] " Fan Wu
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Christian König @ 2026-03-09 16:54 UTC (permalink / raw)
To: Fan Wu, Alex Deucher; +Cc: amd-gfx, dri-devel, linux-kernel
On 3/9/26 17:04, Fan Wu wrote:
> The amdgpu_vm_get_task_info_pasid() function previously called
> amdgpu_vm_get_vm_from_pasid() which returns a raw VM pointer after
> releasing the pasids xarray lock. The caller then dereferences
> vm->task_info without any lifetime protection.
>
> Race condition:
>
> CPU 0 (lookup) CPU 1 (release)
> ------------------ ------------------
> amdgpu_vm_get_task_info_pasid()
> xa_lock()
> vm = xa_load(pasids)
> xa_unlock()
> amdgpu_vm_fini()
> xa_erase_irq(pasids)
> // teardown continues
> kfree(fpriv)
> // VM freed (embedded in fpriv)
> vm->task_info // potential UAF
>
> This can leave the VM pointer dangling because struct amdgpu_vm is
> embedded in struct amdgpu_fpriv which is freed via kfree(fpriv) in
> amdgpu_file_release_kms() after amdgpu_vm_fini() returns.
>
> Fix this by acquiring the task_info reference while holding the
> xarray lock. This avoids the window where the VM could be freed
> between the lookup and the dereference.
>
> Cache vm->task_info in a local variable before attempting to take a
> reference, which keeps the lookup straightforward inside the locked
> section. Use kref_get_unless_zero() to safely handle the case where
> task_info's refcount is already being decremented to zero by another
> thread in the teardown path.
>
> Note: An RCU-based approach was considered but is not currently
> feasible because: (1) the pasids xarray is initialized without
> XA_FLAGS_RCU, and (2) struct amdgpu_fpriv is freed with kfree()
> rather than kfree_rcu(). A future refactoring could enable RCU
> if needed for performance.
>
> Also remove the unsafe helper function amdgpu_vm_get_vm_from_pasid()
> to prevent future misuse.
>
> Fixes: b8f67b9ddf4f ("drm/amdgpu: change vm->task_info handling")
> Signed-off-by: Fan Wu <fanwu01@zju.edu.cn>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 40 ++++++++++++++++----------
> 1 file changed, 25 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index f2beb980e3c3..7e8621c9b661 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2468,19 +2468,6 @@ static void amdgpu_vm_destroy_task_info(struct kref *kref)
> kfree(ti);
> }
>
> -static inline struct amdgpu_vm *
> -amdgpu_vm_get_vm_from_pasid(struct amdgpu_device *adev, u32 pasid)
> -{
> - struct amdgpu_vm *vm;
> - unsigned long flags;
> -
> - xa_lock_irqsave(&adev->vm_manager.pasids, flags);
> - vm = xa_load(&adev->vm_manager.pasids, pasid);
> - xa_unlock_irqrestore(&adev->vm_manager.pasids, flags);
> -
> - return vm;
> -}
> -
> /**
> * amdgpu_vm_put_task_info - reference down the vm task_info ptr
> *
> @@ -2527,8 +2514,31 @@ amdgpu_vm_get_task_info_vm(struct amdgpu_vm *vm)
> struct amdgpu_task_info *
> amdgpu_vm_get_task_info_pasid(struct amdgpu_device *adev, u32 pasid)
> {
> - return amdgpu_vm_get_task_info_vm(
> - amdgpu_vm_get_vm_from_pasid(adev, pasid));
> + struct amdgpu_vm *vm;
> + unsigned long flags;
> + struct amdgpu_task_info *ti = NULL;
> +
> + /*
> + * Acquire the task_info reference while holding the pasids xarray
> + * lock to prevent a race with amdgpu_vm_fini() which removes the
> + * PASID mapping before freeing the VM (embedded in struct amdgpu_fpriv).
> + * Without this, the VM could be freed between xa_load() return and
> + * the task_info dereference.
That the VM is freed is irrelevant, the point is that we need to grab the reference to the task info before we drop that one.
> + */
> + xa_lock_irqsave(&adev->vm_manager.pasids, flags);
> + vm = xa_load(&adev->vm_manager.pasids, pasid);
> + if (vm) {
> + /*
> + * Cache vm->task_info in a local variable before
> + * attempting to take a reference.
> + */
Please drop that comment, taking the task info into a local variable is actually superflous.
> + ti = vm->task_info;
> + if (ti && !kref_get_unless_zero(&ti->refcount))
That is unecessary as wel, the task info is dropped after the VM is removed from pasid mapping.
So just using kref_get() is sufficient.
Regards,
Christian.
> + ti = NULL;
> + }
> + xa_unlock_irqrestore(&adev->vm_manager.pasids, flags);
> +
> + return ti;
> }
>
> static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm)
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2] drm/amdgpu: fix PASID task_info lookup race
2026-03-09 16:54 ` [PATCH] drm/amdgpu: fix PASID task_info lookup race Christian König
@ 2026-03-10 0:58 ` Fan Wu
2026-03-11 3:51 ` Claude review: Re: [PATCH] " Claude Code Review Bot
2026-03-11 3:51 ` Claude Code Review Bot
2 siblings, 0 replies; 4+ messages in thread
From: Fan Wu @ 2026-03-10 0:58 UTC (permalink / raw)
To: Alex Deucher, Christian König
Cc: David Airlie, Simona Vetter, amd-gfx, dri-devel, linux-kernel,
Fan Wu
amdgpu_vm_get_task_info_pasid() currently looks up the VM from the
PASID xarray, drops the xarray lock, and only then grabs the task_info
reference through amdgpu_vm_get_task_info_vm().
Take the task_info reference directly while holding the PASID xarray
lock instead. This keeps the lookup and reference acquisition in the
same critical section.
The task_info is dropped only after the VM is removed from the PASID
mapping, so a regular kref_get() is sufficient here.
Also remove the now unnecessary amdgpu_vm_get_vm_from_pasid() helper.
Fixes: b8f67b9ddf4f ("drm/amdgpu: change vm->task_info handling")
Signed-off-by: Fan Wu <fanwu01@zju.edu.cn>
---
v2:
- grab the task_info reference directly under the PASID xarray lock
- drop the extra local-variable comment
- use kref_get() instead of kref_get_unless_zero()
- simplify the changelog
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 35 +++++++++++++++-----------
1 file changed, 20 insertions(+), 15 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index f2beb980e3c3..706ca8dd65d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2468,19 +2468,6 @@ static void amdgpu_vm_destroy_task_info(struct kref *kref)
kfree(ti);
}
-static inline struct amdgpu_vm *
-amdgpu_vm_get_vm_from_pasid(struct amdgpu_device *adev, u32 pasid)
-{
- struct amdgpu_vm *vm;
- unsigned long flags;
-
- xa_lock_irqsave(&adev->vm_manager.pasids, flags);
- vm = xa_load(&adev->vm_manager.pasids, pasid);
- xa_unlock_irqrestore(&adev->vm_manager.pasids, flags);
-
- return vm;
-}
-
/**
* amdgpu_vm_put_task_info - reference down the vm task_info ptr
*
@@ -2527,8 +2514,26 @@ amdgpu_vm_get_task_info_vm(struct amdgpu_vm *vm)
struct amdgpu_task_info *
amdgpu_vm_get_task_info_pasid(struct amdgpu_device *adev, u32 pasid)
{
- return amdgpu_vm_get_task_info_vm(
- amdgpu_vm_get_vm_from_pasid(adev, pasid));
+ struct amdgpu_vm *vm;
+ unsigned long flags;
+ struct amdgpu_task_info *ti = NULL;
+
+ /*
+ * Acquire the task_info reference while holding the pasids xarray
+ * lock to prevent a race with amdgpu_vm_fini() which removes the
+ * PASID mapping before freeing the VM (embedded in struct amdgpu_fpriv).
+ * Without this, the VM could be freed between xa_load() return and
+ * the task_info dereference.
+ */
+ xa_lock_irqsave(&adev->vm_manager.pasids, flags);
+ vm = xa_load(&adev->vm_manager.pasids, pasid);
+ if (vm && vm->task_info) {
+ ti = vm->task_info;
+ kref_get(&ti->refcount);
+ }
+ xa_unlock_irqrestore(&adev->vm_manager.pasids, flags);
+
+ return ti;
}
static int amdgpu_vm_create_task_info(struct amdgpu_vm *vm)
--
2.34.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Claude review: Re: [PATCH] drm/amdgpu: fix PASID task_info lookup race
2026-03-09 16:54 ` [PATCH] drm/amdgpu: fix PASID task_info lookup race Christian König
2026-03-10 0:58 ` [PATCH v2] " Fan Wu
@ 2026-03-11 3:51 ` Claude Code Review Bot
2026-03-11 3:51 ` Claude Code Review Bot
2 siblings, 0 replies; 4+ messages in thread
From: Claude Code Review Bot @ 2026-03-11 3:51 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: Re: [PATCH] drm/amdgpu: fix PASID task_info lookup race
Author: =?UTF-8?Q?Christian_K=C3=B6nig?= <christian.koenig@amd.com>
Patches: 2
Reviewed: 2026-03-11T13:51:08.979104
---
This is a single-patch fix for a real use-after-free race condition in `amdgpu_vm_get_task_info_pasid()`. The race is between the PASID lookup path and `amdgpu_vm_fini()`:
1. Thread A calls `amdgpu_vm_get_vm_from_pasid()`, gets a pointer to the VM, releases the xarray lock.
2. Thread B runs `amdgpu_vm_fini()`, calls `xa_erase_irq()` to remove the PASID mapping, then eventually calls `amdgpu_vm_put_task_info()` which may free `task_info`.
3. Thread A dereferences `vm->task_info` — use-after-free.
The fix correctly collapses the lookup and reference acquisition into a single critical section under the xarray lock. The approach is sound and well-motivated.
**Verdict: The patch is correct and should be accepted, with minor observations below.**
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 4+ messages in thread
* Claude review: Re: [PATCH] drm/amdgpu: fix PASID task_info lookup race
2026-03-09 16:54 ` [PATCH] drm/amdgpu: fix PASID task_info lookup race Christian König
2026-03-10 0:58 ` [PATCH v2] " Fan Wu
2026-03-11 3:51 ` Claude review: Re: [PATCH] " Claude Code Review Bot
@ 2026-03-11 3:51 ` Claude Code Review Bot
2 siblings, 0 replies; 4+ messages in thread
From: Claude Code Review Bot @ 2026-03-11 3:51 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Correctness of the fix:**
The core change is correct. By holding `xa_lock_irqsave` across both the `xa_load()` and the `kref_get()`, the race with `amdgpu_vm_fini()` is closed. In `amdgpu_vm_fini()` (line 2784-2786), `xa_erase_irq()` takes the same xarray lock internally, so the PASID entry cannot be removed while we hold the lock, guaranteeing the VM (and its `task_info`) remains valid.
**Use of `kref_get()` vs `kref_get_unless_zero()`:**
The commit message states:
> The task_info is dropped only after the VM is removed from the PASID mapping, so a regular kref_get() is sufficient here.
This is correct. Looking at `amdgpu_vm_fini()`, `xa_erase_irq()` happens at line 2785, while `amdgpu_vm_put_task_info(vm->task_info)` happens at line 2841 — well after the PASID is erased. Since the xarray lock serializes against the erase, if `xa_load()` returns a non-NULL VM, the task_info refcount is guaranteed to be >= 1. `kref_get()` is the right choice.
**NULL check improvement:**
The new code:
```c
if (vm && vm->task_info) {
ti = vm->task_info;
kref_get(&ti->refcount);
}
```
This adds a `vm->task_info` NULL check that the old `amdgpu_vm_get_task_info_vm()` lacked — the old code would crash if `vm->task_info` were NULL. This is a subtle improvement.
**Comment quality:**
The inline comment explaining the race is well-written and helpful for future maintainers.
**Minor observation — `amdgpu_vm_get_task_info_vm` left with same bug:**
The remaining `amdgpu_vm_get_task_info_vm()` function (lines 2510-2520) still unconditionally dereferences `vm->task_info` without a NULL check:
```c
if (vm) {
ti = vm->task_info;
kref_get(&vm->task_info->refcount); /* crashes if task_info is NULL */
}
```
If `task_info` can be NULL (which this patch implicitly acknowledges by checking it), then `amdgpu_vm_get_task_info_vm()` has the same potential NULL dereference. This is a pre-existing issue and not introduced by this patch, but it might be worth fixing in a follow-up. The callers of `amdgpu_vm_get_task_info_vm()` likely always have a valid `task_info`, so it may be benign in practice.
**No other concerns.** The patch is clean, minimal, and fixes a real race condition.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-03-11 3:51 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20260309160403.599472-1-fanwu01@zju.edu.cn>
2026-03-09 16:54 ` [PATCH] drm/amdgpu: fix PASID task_info lookup race Christian König
2026-03-10 0:58 ` [PATCH v2] " Fan Wu
2026-03-11 3:51 ` Claude review: Re: [PATCH] " Claude Code Review Bot
2026-03-11 3:51 ` Claude Code Review Bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox