* [PATCH] drm/amdgpu/userq: clean up VA state on create failure
@ 2026-06-04 6:39 Guangshuo Li
2026-06-04 20:43 ` Claude review: " Claude Code Review Bot
2026-06-04 20:43 ` Claude Code Review Bot
0 siblings, 2 replies; 3+ messages in thread
From: Guangshuo Li @ 2026-06-04 6:39 UTC (permalink / raw)
To: Alex Deucher, Christian König, David Airlie, Simona Vetter,
Prike Liang, Sunil Khatri, Jesse.Zhang, Lijo Lazar, amd-gfx,
dri-devel, linux-kernel
Cc: Guangshuo Li
amdgpu_userq_input_va_validate() is not a side-effect-free validator.
When it succeeds, it allocates a VA cursor, links it on
queue->userq_va_list and marks the corresponding bo_va as userq mapped.
The user queue create path validates queue_va, rptr_va and wptr_va with a
short-circuit OR expression. If an earlier validation succeeds and a
later validation fails, the error path frees the queue directly. The VA
cursor added by the successful validation is leaked and
bo_va->userq_va_mapped remains set even though no user queue was created.
The same stale VA tracking state can also survive later create failures
after all VA validations have succeeded, because those paths also free
the queue without unwinding queue->userq_va_list.
Route the create error paths through common unwind labels and call
amdgpu_userq_buffer_vas_list_cleanup() before freeing the queue. This
releases any VA cursors added during validation and clears the stale
userq VA mapping state.
Fixes: 9e46b8bb0539 ("drm/amdgpu: validate userq buffer virtual address and size")
Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 32 +++++++++++------------
1 file changed, 15 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
index 0a1b93259887..dba0f786ae4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
@@ -826,17 +826,15 @@ amdgpu_userq_create(struct drm_file *filp, union drm_amdgpu_userq *args)
amdgpu_userq_input_va_validate(adev, queue, args->in.rptr_va, AMDGPU_GPU_PAGE_SIZE) ||
amdgpu_userq_input_va_validate(adev, queue, args->in.wptr_va, AMDGPU_GPU_PAGE_SIZE)) {
r = -EINVAL;
- kfree(queue);
- goto unlock;
+ goto free_queue;
}
/* Convert relative doorbell offset into absolute doorbell index */
index = amdgpu_userq_get_doorbell_index(uq_mgr, &db_info, filp);
if (index == (uint64_t)-EINVAL) {
drm_file_err(uq_mgr->file, "Failed to get doorbell for queue\n");
- kfree(queue);
r = -EINVAL;
- goto unlock;
+ goto free_queue;
}
queue->doorbell_index = index;
@@ -844,15 +842,14 @@ amdgpu_userq_create(struct drm_file *filp, union drm_amdgpu_userq *args)
r = amdgpu_userq_fence_driver_alloc(adev, queue);
if (r) {
drm_file_err(uq_mgr->file, "Failed to alloc fence driver\n");
- goto unlock;
+ goto free_queue;
}
r = uq_funcs->mqd_create(queue, &args->in);
if (r) {
drm_file_err(uq_mgr->file, "Failed to create Queue\n");
amdgpu_userq_fence_driver_free(queue);
- kfree(queue);
- goto unlock;
+ goto free_queue;
}
/* drop this refcount during queue destroy */
@@ -862,21 +859,17 @@ amdgpu_userq_create(struct drm_file *filp, union drm_amdgpu_userq *args)
down_read(&adev->reset_domain->sem);
r = xa_err(xa_store_irq(&adev->userq_doorbell_xa, index, queue, GFP_KERNEL));
if (r) {
- kfree(queue);
up_read(&adev->reset_domain->sem);
- goto unlock;
+ goto free_queue;
}
r = xa_alloc(&uq_mgr->userq_xa, &qid, queue,
XA_LIMIT(1, AMDGPU_MAX_USERQ_COUNT), GFP_KERNEL);
if (r) {
drm_file_err(uq_mgr->file, "Failed to allocate a queue id\n");
- amdgpu_userq_fence_driver_free(queue);
- uq_funcs->mqd_destroy(queue);
- kfree(queue);
r = -ENOMEM;
up_read(&adev->reset_domain->sem);
- goto unlock;
+ goto free_queue;
}
up_read(&adev->reset_domain->sem);
@@ -892,10 +885,7 @@ amdgpu_userq_create(struct drm_file *filp, union drm_amdgpu_userq *args)
if (r) {
drm_file_err(uq_mgr->file, "Failed to map Queue\n");
xa_erase(&uq_mgr->userq_xa, qid);
- amdgpu_userq_fence_driver_free(queue);
- uq_funcs->mqd_destroy(queue);
- kfree(queue);
- goto unlock;
+ goto free_queue;
}
}
@@ -915,7 +905,15 @@ amdgpu_userq_create(struct drm_file *filp, union drm_amdgpu_userq *args)
args->out.queue_id = qid;
atomic_inc(&uq_mgr->userq_count[queue->queue_type]);
+ goto unlock;
+free_mqd:
+ uq_funcs->mqd_destroy(queue);
+free_fence_driver:
+ amdgpu_userq_fence_driver_free(queue);
+free_queue:
+ amdgpu_userq_buffer_vas_list_cleanup(adev, queue);
+ kfree(queue);
unlock:
mutex_unlock(&uq_mgr->userq_mutex);
--
2.43.0
^ permalink raw reply related [flat|nested] 3+ messages in thread* Claude review: drm/amdgpu/userq: clean up VA state on create failure
2026-06-04 6:39 [PATCH] drm/amdgpu/userq: clean up VA state on create failure Guangshuo Li
@ 2026-06-04 20:43 ` Claude Code Review Bot
2026-06-04 20:43 ` Claude Code Review Bot
1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 20:43 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: drm/amdgpu/userq: clean up VA state on create failure
Author: Guangshuo Li <lgs201920130244@gmail.com>
Patches: 1
Reviewed: 2026-06-05T06:43:51.545614
---
**Verdict: The bug this patch tries to fix has already been fixed in drm-next through a larger refactor of `amdgpu_userq_create`.**
This is a single patch addressing a real bug in an older version of the code: when `amdgpu_userq_input_va_validate()` succeeds for one VA but a subsequent validation (or later create step) fails, the VA cursor linked onto `queue->userq_va_list` leaks and `bo_va->userq_va_mapped` remains stale. The analysis in the commit message is correct for the code it targets.
However, the patch is based on an older codebase that has since been substantially reworked on drm-next. The current tree (as of drm-next) shows:
1. `amdgpu_userq_create()` has already been restructured with proper unwind labels (`free_queue`, `free_fence_drv`, `clean_doorbell_bo`, `clean_mqd`, `erase_doorbell`).
2. The `userq_va_list` mechanism referenced by the patch no longer exists. VA tracking now uses a `userq_vas.va_array` union and the `userq_va_mapped` flag directly on `bo_va`.
3. The function `amdgpu_userq_buffer_vas_list_cleanup()` that the patch calls does not exist in the current tree.
4. The VA validation failure path already goes to `free_fence_drv` (line 662), which unwinds properly.
5. The short-circuit OR expression for VA validation now stores results into `&queue->userq_vas.va.*` output parameters rather than linking a list.
The patch cannot be applied to drm-next and is not needed there — the underlying issue has been addressed as part of the refactoring.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 3+ messages in thread
* Claude review: drm/amdgpu/userq: clean up VA state on create failure
2026-06-04 6:39 [PATCH] drm/amdgpu/userq: clean up VA state on create failure Guangshuo Li
2026-06-04 20:43 ` Claude review: " Claude Code Review Bot
@ 2026-06-04 20:43 ` Claude Code Review Bot
1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 20:43 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Commit message quality:** Good. The commit message clearly explains the bug: the short-circuit OR in `amdgpu_userq_input_va_validate()` calls means earlier successful validations leak state when later ones fail. The Fixes tag references the right commit.
**Correctness issues in the patch itself (against its target base):**
1. **Missing unwind for `xa_store_irq` into `userq_doorbell_xa`:** After the `xa_store_irq` at the original line ~865, if `xa_alloc` fails, the patch's `free_queue` label skips straight to `amdgpu_userq_buffer_vas_list_cleanup` + `kfree` without erasing the doorbell xarray entry. The old code also didn't do this, but since the patch is reorganizing error handling, it should have been caught.
2. **Missing unwind for `mqd_create`:** The patch adds `free_mqd` and `free_fence_driver` labels but **never jumps to them**. Looking at the diff:
- After `amdgpu_userq_fence_driver_alloc` fails (line 847), it goes to `free_queue` — correct, fence driver wasn't allocated.
- After `mqd_create` fails (line 854), it calls `amdgpu_userq_fence_driver_free` inline then goes to `free_queue` — this bypasses the new `free_fence_driver` label but works. However it does NOT go to `free_mqd`, which is correct since mqd_create failed.
- After `xa_store_irq` fails (line 866), it goes to `free_queue` — but at this point both the fence driver AND the MQD have been successfully created, so `free_mqd` and `free_fence_driver` should be invoked. **This is a resource leak bug introduced by the patch.**
- After `xa_alloc` fails (line 876), same problem — goes to `free_queue` without cleaning up MQD or fence driver.
- After `amdgpu_userq_map_helper` fails (line 891), goes to `free_queue` — same issue, plus the doorbell xarray entry isn't erased.
The `free_mqd` and `free_fence_driver` labels are dead code — nothing jumps to them.
3. **Inconsistent fence driver cleanup:** In the `mqd_create` failure path (line 856), the patch keeps the inline `amdgpu_userq_fence_driver_free(queue)` call before `goto free_queue`. But later failure paths that also need fence driver cleanup (post xa_store_irq, post xa_alloc, post map_helper) go directly to `free_queue` and skip it.
**Summary:** The patch correctly identifies a real bug and the VA cleanup addition is the right idea, but the consolidated error handling has **regression bugs** — several failure paths that previously cleaned up the MQD and fence driver now skip that cleanup by jumping to `free_queue` instead of `free_mqd`. The `free_mqd` and `free_fence_driver` labels are dead code. This would need a v2 that properly threads the later error paths through the full unwind chain.
The point is moot for drm-next since the code has been completely refactored and the bug no longer exists in the current tree.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-06-04 20:43 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-04 6:39 [PATCH] drm/amdgpu/userq: clean up VA state on create failure Guangshuo Li
2026-06-04 20:43 ` Claude review: " Claude Code Review Bot
2026-06-04 20:43 ` Claude Code Review Bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox