* [PATCH 0/2] drm/amdgpu: fix use-after-free in userq signal/wait IOCTLs
@ 2026-03-09 2:22 Chenyuan Mi
From: Chenyuan Mi @ 2026-03-09 2:22 UTC (permalink / raw)
To: alexander.deucher, christian.koenig
Cc: Arunpravin.PaneerSelvam, airlied, simona, amd-gfx, dri-devel,
linux-kernel
Both amdgpu_userq_wait_ioctl() and amdgpu_userq_signal_ioctl()
access user queue objects obtained from xa_load() without holding
userq_mutex. A concurrent AMDGPU_USERQ_OP_FREE can destroy and
kfree the queue in this window, leading to use-after-free.
The two bugs have different origins:
- Patch 1 fixes a wait-path regression introduced by commit
4b27406380b0 ("drm/amdgpu: Add queue id support to the user queue
wait IOCTL"), which removed the indirect fence_drv_xa_ptr model
and its NULL-check safety net from commit ed5fdc1fc282
("drm/amdgpu: Fix the use-after-free issue in wait IOCTL").
- Patch 2 fixes a similar pre-existing lifetime bug in the signal
path, present since commit a292fdecd728 ("drm/amdgpu: Implement
userqueue signal/wait IOCTL").
Patch 1 adds explicit userq_mutex coverage around the xa_load and
subsequent fence_drv_xa operations in the wait path.
Patch 2 moves the ensure_ev_fence call (which acquires
userq_mutex) before xa_load in the signal path, so that the queue
lookup and all subsequent accesses are covered by the same lock.
Chenyuan Mi (2):
drm/amdgpu: protect waitq access with userq_mutex in wait IOCTL
drm/amdgpu: protect queue access in signal IOCTL
.../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 25 +++++++++++++------
1 file changed, 18 insertions(+), 7 deletions(-)
--
2.53.0
* [PATCH 1/2] drm/amdgpu: protect waitq access with userq_mutex in wait IOCTL
From: Chenyuan Mi @ 2026-03-09 2:22 UTC (permalink / raw)
To: alexander.deucher, christian.koenig
Cc: Arunpravin.PaneerSelvam, airlied, simona, amd-gfx, dri-devel,
linux-kernel, stable

amdgpu_userq_wait_ioctl() accesses the wait queue object obtained
from xa_load() without holding userq_mutex or taking a reference on
the queue. A concurrent AMDGPU_USERQ_OP_FREE call can destroy and
free the queue between the xa_load() and the subsequent
xa_alloc(&waitq->fence_drv_xa, ...), resulting in a use-after-free.

This is a regression introduced by commit 4b27406380b0
("drm/amdgpu: Add queue id support to the user queue wait IOCTL"),
which removed the indirect fence_drv_xa_ptr model and its NULL
check safety net from commit ed5fdc1fc282 ("drm/amdgpu: Fix the
use-after-free issue in wait IOCTL") and replaced it with a direct
waitq->fence_drv_xa access, but did not add any lifetime protection
around the new waitq pointer.

Fix this by holding userq_mutex across the xa_load() and the
subsequent fence_drv_xa operations, matching the locking used by
the destroy path.

Fixes: 4b27406380b0 ("drm/amdgpu: Add queue id support to the user queue wait IOCTL")
Cc: stable@vger.kernel.org
Signed-off-by: Chenyuan Mi <chenyuan_mi@163.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 8013260e29dc..1785ea7c18fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -912,8 +912,10 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
 		 */
 		num_fences = dma_fence_dedup_array(fences, num_fences);
 
+		mutex_lock(&userq_mgr->userq_mutex);
 		waitq = xa_load(&userq_mgr->userq_xa, wait_info->waitq_id);
 		if (!waitq) {
+			mutex_unlock(&userq_mgr->userq_mutex);
 			r = -EINVAL;
 			goto free_fences;
 		}
@@ -932,6 +934,7 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
 				r = dma_fence_wait(fences[i], true);
 				if (r) {
 					dma_fence_put(fences[i]);
+					mutex_unlock(&userq_mgr->userq_mutex);
 					goto free_fences;
 				}
 
@@ -948,8 +951,10 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
 			 */
 			r = xa_alloc(&waitq->fence_drv_xa, &index, fence_drv,
 				     xa_limit_32b, GFP_KERNEL);
-			if (r)
+			if (r) {
+				mutex_unlock(&userq_mgr->userq_mutex);
 				goto free_fences;
+			}
 
 			amdgpu_userq_fence_driver_get(fence_drv);
 
@@ -961,6 +966,7 @@ int amdgpu_userq_wait_ioctl(struct drm_device *dev, void *data,
 			/* Increment the actual userq fence count */
 			cnt++;
 		}
+		mutex_unlock(&userq_mgr->userq_mutex);
 
 		wait_info->num_fences = cnt;
 		/* Copy userq fence info to user space */
-- 
2.53.0

* Re: [PATCH 1/2] drm/amdgpu: protect waitq access with userq_mutex in wait IOCTL
From: Christian König @ 2026-03-09 10:07 UTC (permalink / raw)
To: Chenyuan Mi, alexander.deucher
Cc: Arunpravin.PaneerSelvam, airlied, simona, amd-gfx, dri-devel,
linux-kernel, stable

On 3/9/26 03:22, Chenyuan Mi wrote:
> amdgpu_userq_wait_ioctl() accesses the wait queue object obtained
> from xa_load() without holding userq_mutex or taking a reference on
> the queue. A concurrent AMDGPU_USERQ_OP_FREE call can destroy and
> free the queue between the xa_load() and the subsequent
> xa_alloc(&waitq->fence_drv_xa, ...), resulting in a use-after-free.
>
> This is a regression introduced by commit 4b27406380b0
> ("drm/amdgpu: Add queue id support to the user queue wait IOCTL"),
> which removed the indirect fence_drv_xa_ptr model and its NULL
> check safety net from commit ed5fdc1fc282 ("drm/amdgpu: Fix the
> use-after-free issue in wait IOCTL") and replaced it with a direct
> waitq->fence_drv_xa access, but did not add any lifetime protection
> around the new waitq pointer.
>
> Fix this by holding userq_mutex across the xa_load() and the
> subsequent fence_drv_xa operations, matching the locking used by
> the destroy path.
>
> Fixes: 4b27406380b0 ("drm/amdgpu: Add queue id support to the user queue wait IOCTL")
> Cc: stable@vger.kernel.org
> Signed-off-by: Chenyuan Mi <chenyuan_mi@163.com>

Well this trivially causes a deadlock.

The correct fix has already been published by Sunil quite a while ago.

Regards,
Christian.

* Claude review: Re: [PATCH 1/2] drm/amdgpu: protect waitq access with userq_mutex in wait IOCTL
From: Claude Code Review Bot @ 2026-03-10 2:44 UTC (permalink / raw)
To: dri-devel-reviews

Patch Review

**Already fixed upstream via refcounting.** The current tree uses:

```c
waitq = amdgpu_userq_get(userq_mgr, wait_info->waitq_id);
```

(line 866) instead of the raw `xa_load()` this patch targets. The `amdgpu_userq_get()` function takes a kref under xa_lock, and `amdgpu_userq_put()` is called at the end (line 951), providing proper lifetime protection without holding the mutex across the entire loop.

**Even evaluated against the older base, there is a concern:**

- **Sleeping under mutex**: The patch holds `userq_mutex` across `dma_fence_wait(fences[i], true)` (line 909 in the diff context). `dma_fence_wait` with `intr=true` can sleep for an extended/unbounded period waiting for GPU work to complete. Holding `userq_mutex` during this time blocks all other userqueue operations (create, destroy, signal) for this file descriptor. This is a significant liveness concern: a slow or stalled fence would effectively deadlock the entire userqueue subsystem for the process. The refcounting approach in drm-next avoids this problem entirely.

---
Generated by Claude Code Patch Reviewer

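The get/put lifetime pattern described in the review above can be illustrated with a minimal userspace sketch. This is not the amdgpu code: `struct queue`, `struct queue_table`, `queue_get()`, `queue_put()`, `queue_destroy()` and `refcount_demo()` are hypothetical names, a pthread mutex stands in for the lock guarding the lookup structure, and an atomic counter stands in for a kref. It only shows the ordering that is assumed to matter: the reference is taken before the lookup lock is dropped, so a concurrent destroy can unpublish the object but cannot free it under an active user.

```c
/* Userspace sketch with hypothetical names; not the amdgpu implementation. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_QUEUES 16

struct queue {
	atomic_int refcount;	/* stands in for a kernel kref */
	int id;
};

struct queue_table {
	pthread_mutex_t lock;	/* stands in for the lookup structure's lock */
	struct queue *slots[MAX_QUEUES];
};

/* Look up a queue and take a reference before the table lock is dropped. */
static struct queue *queue_get(struct queue_table *t, int id)
{
	struct queue *q;

	if (id < 0 || id >= MAX_QUEUES)
		return NULL;

	pthread_mutex_lock(&t->lock);
	q = t->slots[id];
	if (q)
		atomic_fetch_add(&q->refcount, 1);
	pthread_mutex_unlock(&t->lock);
	return q;
}

/* Drop one reference; the last holder frees the queue. Returns 1 if freed. */
static int queue_put(struct queue *q)
{
	if (atomic_fetch_sub(&q->refcount, 1) == 1) {
		free(q);
		return 1;
	}
	return 0;
}

/* Unpublish the queue, then drop the reference the table was holding. */
static int queue_destroy(struct queue_table *t, int id)
{
	struct queue *q;

	if (id < 0 || id >= MAX_QUEUES)
		return 0;

	pthread_mutex_lock(&t->lock);
	q = t->slots[id];
	t->slots[id] = NULL;
	pthread_mutex_unlock(&t->lock);
	return q ? queue_put(q) : 0;
}

/* Single-threaded walk-through of the ordering; returns 0 on success. */
static int refcount_demo(void)
{
	struct queue_table t = { .lock = PTHREAD_MUTEX_INITIALIZER };
	struct queue *q = calloc(1, sizeof(*q));
	struct queue *user, *again;
	int found, freed_early, id_seen, freed_late;

	if (!q)
		return -1;
	atomic_init(&q->refcount, 1);	/* the table's own reference */
	q->id = 3;
	t.slots[q->id] = q;

	user = queue_get(&t, 3);		/* an ioctl-like user looks it up */
	found = (user == q);
	freed_early = queue_destroy(&t, 3);	/* a "free" request arrives */
	id_seen = user->id;			/* still safe to dereference */
	again = queue_get(&t, 3);		/* new lookups no longer find it */
	freed_late = queue_put(user);		/* last reference frees it */

	assert(found);
	assert(freed_early == 0);
	assert(id_seen == 3);
	assert(again == NULL);
	assert(freed_late == 1);
	return 0;
}
```

Calling `refcount_demo()` from a `main()` walks through the sequence once. No lock is held while the user works with the object, which is the property the review contrasts with holding a mutex across a sleeping wait.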
* [PATCH 2/2] drm/amdgpu: protect queue access in signal IOCTL
From: Chenyuan Mi @ 2026-03-09 2:22 UTC (permalink / raw)
To: alexander.deucher, christian.koenig
Cc: Arunpravin.PaneerSelvam, airlied, simona, amd-gfx, dri-devel,
linux-kernel, stable

amdgpu_userq_signal_ioctl() retrieves the user queue via xa_load()
and then dereferences it in amdgpu_userq_fence_read_wptr(),
amdgpu_userq_fence_create(), and direct queue->last_fence accesses,
all before userq_mutex is acquired by amdgpu_userq_ensure_ev_fence().

A concurrent AMDGPU_USERQ_OP_FREE can destroy and free the queue
in this window, leading to a use-after-free.

This bug predates the queue-id wait ioctl changes and has been
present since the original signal/wait ioctl implementation.

Fix this by moving amdgpu_userq_ensure_ev_fence() before xa_load()
so that the queue lookup and all subsequent accesses are performed
under the userq_mutex that ensure_ev_fence acquires. Add the
necessary mutex_unlock() calls to the error paths between the moved
ensure_ev_fence and the existing unlock.

Fixes: a292fdecd728 ("drm/amdgpu: Implement userqueue signal/wait IOCTL")
Cc: stable@vger.kernel.org
Signed-off-by: Chenyuan Mi <chenyuan_mi@163.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
index 1785ea7c18fe..7866f583eea4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c
@@ -545,23 +545,28 @@ int amdgpu_userq_signal_ioctl(struct drm_device *dev, void *data,
 		}
 	}
 
-	/* Retrieve the user queue */
+	/* We are here means UQ is active, make sure the eviction fence is valid */
+	amdgpu_userq_ensure_ev_fence(&fpriv->userq_mgr, &fpriv->evf_mgr);
+
+	/* Retrieve the user queue under userq_mutex (held by ensure_ev_fence) */
 	queue = xa_load(&userq_mgr->userq_xa, args->queue_id);
 	if (!queue) {
+		mutex_unlock(&userq_mgr->userq_mutex);
 		r = -ENOENT;
 		goto put_gobj_write;
 	}
 
 	r = amdgpu_userq_fence_read_wptr(adev, queue, &wptr);
-	if (r)
+	if (r) {
+		mutex_unlock(&userq_mgr->userq_mutex);
 		goto put_gobj_write;
+	}
 
 	r = amdgpu_userq_fence_alloc(&userq_fence);
-	if (r)
+	if (r) {
+		mutex_unlock(&userq_mgr->userq_mutex);
 		goto put_gobj_write;
-
-	/* We are here means UQ is active, make sure the eviction fence is valid */
-	amdgpu_userq_ensure_ev_fence(&fpriv->userq_mgr, &fpriv->evf_mgr);
+	}
 
 	/* Create a new fence */
 	r = amdgpu_userq_fence_create(queue, userq_fence, wptr, &fence);
-- 
2.53.0

* Re: [PATCH 2/2] drm/amdgpu: protect queue access in signal IOCTL
From: Christian König @ 2026-03-09 10:09 UTC (permalink / raw)
To: Chenyuan Mi, alexander.deucher
Cc: Arunpravin.PaneerSelvam, airlied, simona, amd-gfx, dri-devel,
linux-kernel, stable

On 3/9/26 03:22, Chenyuan Mi wrote:
> amdgpu_userq_signal_ioctl() retrieves the user queue via xa_load()
> and then dereferences it in amdgpu_userq_fence_read_wptr(),
> amdgpu_userq_fence_create(), and direct queue->last_fence accesses,
> all before userq_mutex is acquired by amdgpu_userq_ensure_ev_fence().
>
> A concurrent AMDGPU_USERQ_OP_FREE can destroy and free the queue
> in this window, leading to a use-after-free.
>
> This bug predates the queue-id wait ioctl changes and has been
> present since the original signal/wait ioctl implementation.
>
> Fix this by moving amdgpu_userq_ensure_ev_fence() before xa_load()

Again that trivially causes a deadlock.

So the patch is just not working at all.

Regards,
Christian.

> so that the queue lookup and all subsequent accesses are performed
> under the userq_mutex that ensure_ev_fence acquires. Add the
> necessary mutex_unlock() calls to the error paths between the moved
> ensure_ev_fence and the existing unlock.

* Claude review: Re: [PATCH 2/2] drm/amdgpu: protect queue access in signal IOCTL
From: Claude Code Review Bot @ 2026-03-10 2:44 UTC (permalink / raw)
To: dri-devel-reviews

Patch Review

**Already fixed upstream via refcounting.** The current tree uses:

```c
queue = amdgpu_userq_get(userq_mgr, args->queue_id);
```

(line 522) instead of raw `xa_load()`. Combined with `amdgpu_userq_put(queue)` at the cleanup (line 614), this provides proper lifetime protection.

**Additional concerns with this patch's approach:**

- **Reliance on implicit locking side-effect**: The patch relies on the fact that `amdgpu_userq_ensure_ev_fence()` acquires `userq_mutex` as a side-effect, and then adds `mutex_unlock()` calls in error paths between it and the existing unlock. This coupling is fragile: the comment `/* Retrieve the user queue under userq_mutex (held by ensure_ev_fence) */` documents a non-obvious invariant. If `ensure_ev_fence`'s locking behavior ever changes, this code silently breaks. It would be cleaner to acquire the mutex explicitly and separately.

- **`amdgpu_userq_fence_alloc` under mutex**: Moving `ensure_ev_fence` earlier means `amdgpu_userq_fence_alloc()` (a `GFP_KERNEL` allocation from a slab cache) is now called under `userq_mutex`. While this works, it needlessly extends the critical section. The allocation doesn't access queue state and doesn't need mutex protection.

- **Fixes tag**: The `Fixes:` tag references `a292fdecd728`, the original signal/wait IOCTL commit. The `Cc: stable` tag is appropriate for the severity, but given the patch doesn't apply to current trees, backporting would need to be done carefully against the actual stable tree base.

---
Generated by Claude Code Patch Reviewer

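Two of the structural points in the review above (taking the lock explicitly rather than through a helper's side effect, and keeping the allocation outside the critical section) can be sketched in userspace C. This is an illustration under stated assumptions, not a proposed kernel change: `struct queue_mgr`, `queue_signal()` and `locking_demo()` are hypothetical names, a pthread mutex stands in for `userq_mutex`, and the sketch says nothing about the deadlock raised earlier in the thread. It also shows a single `unlock` label, so error paths do not each need their own unlock call.

```c
/* Userspace sketch with hypothetical names; not the amdgpu implementation. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

#define NQUEUES 8

struct queue_mgr {
	pthread_mutex_t lock;		/* stands in for userq_mutex */
	int *last_value[NQUEUES];	/* NULL means "no such queue" */
};

/* Store a value on a queue, locking and unlocking in exactly one place each. */
static int queue_signal(struct queue_mgr *m, int id, int value)
{
	int *scratch;
	int r = 0;

	/* The allocation does not touch queue state, so it stays unlocked. */
	scratch = malloc(sizeof(*scratch));
	if (!scratch)
		return -ENOMEM;
	*scratch = value;

	pthread_mutex_lock(&m->lock);	/* explicit, not a helper's side effect */
	if (id < 0 || id >= NQUEUES || !m->last_value[id]) {
		r = -ENOENT;
		goto unlock;		/* every locked path leaves through here */
	}
	*m->last_value[id] = *scratch;

unlock:
	pthread_mutex_unlock(&m->lock);
	free(scratch);
	return r;
}

/* Exercises the success and error paths; returns 0 on success. */
static int locking_demo(void)
{
	struct queue_mgr m = { .lock = PTHREAD_MUTEX_INITIALIZER };
	int slot = 0;
	int ok, missing, relock;

	m.last_value[2] = &slot;
	ok = queue_signal(&m, 2, 7);
	missing = queue_signal(&m, 5, 1);
	relock = pthread_mutex_trylock(&m.lock);	/* 0 only if unlocked */
	if (relock == 0)
		pthread_mutex_unlock(&m.lock);

	assert(ok == 0 && slot == 7);
	assert(missing == -ENOENT);
	assert(relock == 0);
	return 0;
}
```

The `pthread_mutex_trylock()` check in `locking_demo()` is only there to confirm that both the success path and the error path released the lock.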
* Claude review: drm/amdgpu: fix use-after-free in userq signal/wait IOCTLs
From: Claude Code Review Bot @ 2026-03-10 2:44 UTC (permalink / raw)
To: dri-devel-reviews

Overall Series Review

Subject: drm/amdgpu: fix use-after-free in userq signal/wait IOCTLs
Author: Chenyuan Mi <chenyuan_mi@163.com>
Patches: 5
Reviewed: 2026-03-10T12:44:42.922557

---

This 2-patch series aims to fix use-after-free bugs in the amdgpu userqueue signal and wait IOCTLs by adding mutex protection around queue lookups. The identified race conditions are real: without lifetime protection, a concurrent `AMDGPU_USERQ_OP_FREE` can destroy a queue between `xa_load()` and subsequent dereferences.

**However, these patches are based on an older version of the code and are already superseded by changes in the current drm-next tree.** The current tree (as seen at `amdgpu_userq_fence.c:522` and `:866`) has replaced the raw `xa_load()` calls with `amdgpu_userq_get()`/`amdgpu_userq_put()`, which use `kref` refcounting to protect the queue lifetime. This refcounting approach is a better fix: it takes a reference under the xa_lock, preventing the queue from being freed while in use, without needing to hold `userq_mutex` across the entire operation.

**Recommendation: NAK. These patches are not needed against current drm-next.**

---
Generated by Claude Code Patch Reviewer