public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH V1] accel/amdxdna: Remove drv_cmd tracing from job free callback
@ 2026-05-29 15:28 Lizhi Hou
  2026-05-31 14:18 ` Mario Limonciello
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Lizhi Hou @ 2026-05-29 15:28 UTC (permalink / raw)
  To: ogabbay, quic_jhugo, dri-devel, mario.limonciello,
	karol.wachowski
  Cc: Lizhi Hou, linux-kernel, max.zhen, sonal.santan

aie2_sched_job_free() accesses job->drv_cmd for tracing purposes. However,
job->drv_cmd is owned by the caller and may already have been freed when
the job free callback runs, leading to a potential use-after-free.

Remove the job->drv_cmd access from aie2_sched_job_free().

Fixes: 8711eb2dde2e ("accel/amdxdna: Improve tracing for job lifecycle and mailbox RX worker")
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
---
 drivers/accel/amdxdna/aie2_ctx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c
index 658a5fb1fda6..2ad343728782 100644
--- a/drivers/accel/amdxdna/aie2_ctx.c
+++ b/drivers/accel/amdxdna/aie2_ctx.c
@@ -437,8 +437,9 @@ static void aie2_sched_job_free(struct drm_sched_job *sched_job)
 	struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job);
 	struct amdxdna_hwctx *hwctx = job->hwctx;
 
+	/* job->drv_cmd could be freed, so use DEFAULT_IO */
 	trace_xdna_job(sched_job, hwctx->name, "job free",
-		       job->seq, job->drv_cmd ? job->drv_cmd->opcode : DEFAULT_IO);
+		       job->seq, DEFAULT_IO);
 	if (!job->job_done)
 		up(&hwctx->priv->job_sem);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH V1] accel/amdxdna: Remove drv_cmd tracing from job free callback
  2026-05-29 15:28 [PATCH V1] accel/amdxdna: Remove drv_cmd tracing from job free callback Lizhi Hou
@ 2026-05-31 14:18 ` Mario Limonciello
  2026-06-01 15:18   ` Lizhi Hou
  2026-06-04  6:19 ` Claude review: " Claude Code Review Bot
  2026-06-04  6:19 ` Claude Code Review Bot
  2 siblings, 1 reply; 7+ messages in thread
From: Mario Limonciello @ 2026-05-31 14:18 UTC (permalink / raw)
  To: Lizhi Hou, ogabbay, quic_jhugo, dri-devel, karol.wachowski
  Cc: linux-kernel, max.zhen, sonal.santan



On 5/29/26 17:28, Lizhi Hou wrote:
> aie2_sched_job_free() accesses job->drv_cmd for tracing purposes. However,
> job->drv_cmd is owned by the caller and may already have been freed when
> the job free callback runs, leading to a potential use-after-free.
> 
> Remove the job->drv_cmd access from aie2_sched_job_free().
> 
> Fixes: 8711eb2dde2e ("accel/amdxdna: Improve tracing for job lifecycle and mailbox RX worker")
> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
> ---
>   drivers/accel/amdxdna/aie2_ctx.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c
> index 658a5fb1fda6..2ad343728782 100644
> --- a/drivers/accel/amdxdna/aie2_ctx.c
> +++ b/drivers/accel/amdxdna/aie2_ctx.c
> @@ -437,8 +437,9 @@ static void aie2_sched_job_free(struct drm_sched_job *sched_job)
>   	struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job);
>   	struct amdxdna_hwctx *hwctx = job->hwctx;
>   
> +	/* job->drv_cmd could be freed, so use DEFAULT_IO */
>   	trace_xdna_job(sched_job, hwctx->name, "job free",
> -		       job->seq, job->drv_cmd ? job->drv_cmd->opcode : DEFAULT_IO);
> +		       job->seq, DEFAULT_IO);

Could this still be a race with dov->drv_cmd being valid when the first 
part of the expression is evaluated (job->drv_cmd) but invalid when 
job->drv_cmd->opcode is accessed?

>   	if (!job->job_done)
>   		up(&hwctx->priv->job_sem);
>   


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V1] accel/amdxdna: Remove drv_cmd tracing from job free callback
  2026-05-31 14:18 ` Mario Limonciello
@ 2026-06-01 15:18   ` Lizhi Hou
  2026-06-03 18:34     ` Mario Limonciello
  0 siblings, 1 reply; 7+ messages in thread
From: Lizhi Hou @ 2026-06-01 15:18 UTC (permalink / raw)
  To: Mario Limonciello, ogabbay, quic_jhugo, dri-devel,
	karol.wachowski
  Cc: linux-kernel, max.zhen, sonal.santan


On 5/31/26 07:18, Mario Limonciello wrote:
>
>
> On 5/29/26 17:28, Lizhi Hou wrote:
>> aie2_sched_job_free() accesses job->drv_cmd for tracing purposes. 
>> However,
>> job->drv_cmd is owned by the caller and may already have been freed when
>> the job free callback runs, leading to a potential use-after-free.
>>
>> Remove the job->drv_cmd access from aie2_sched_job_free().
>>
>> Fixes: 8711eb2dde2e ("accel/amdxdna: Improve tracing for job 
>> lifecycle and mailbox RX worker")
>> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
>> ---
>>   drivers/accel/amdxdna/aie2_ctx.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/accel/amdxdna/aie2_ctx.c 
>> b/drivers/accel/amdxdna/aie2_ctx.c
>> index 658a5fb1fda6..2ad343728782 100644
>> --- a/drivers/accel/amdxdna/aie2_ctx.c
>> +++ b/drivers/accel/amdxdna/aie2_ctx.c
>> @@ -437,8 +437,9 @@ static void aie2_sched_job_free(struct 
>> drm_sched_job *sched_job)
>>       struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job);
>>       struct amdxdna_hwctx *hwctx = job->hwctx;
>>   +    /* job->drv_cmd could be freed, so use DEFAULT_IO */
>>       trace_xdna_job(sched_job, hwctx->name, "job free",
>> -               job->seq, job->drv_cmd ? job->drv_cmd->opcode : 
>> DEFAULT_IO);
>> +               job->seq, DEFAULT_IO);
>
> Could this still be a race with dov->drv_cmd being valid when the 
> first part of the expression is evaluated (job->drv_cmd) but invalid 
> when job->drv_cmd->opcode is accessed?

When aie2_sched_job_free() is called, the job->drv_cmd could already be 
freed. So it should never access job->drv_cmd at all. The entire 
expression "job->drv_cmd ? job->drv_cmd->opcode :  DEFAULT_IO" is removed.

Lizhi

>
>>       if (!job->job_done)
>>           up(&hwctx->priv->job_sem);
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V1] accel/amdxdna: Remove drv_cmd tracing from job free callback
  2026-06-01 15:18   ` Lizhi Hou
@ 2026-06-03 18:34     ` Mario Limonciello
  2026-06-03 20:17       ` Lizhi Hou
  0 siblings, 1 reply; 7+ messages in thread
From: Mario Limonciello @ 2026-06-03 18:34 UTC (permalink / raw)
  To: Lizhi Hou, ogabbay, quic_jhugo, dri-devel, karol.wachowski
  Cc: linux-kernel, max.zhen, sonal.santan



On 6/1/26 10:18, Lizhi Hou wrote:
> 
> On 5/31/26 07:18, Mario Limonciello wrote:
>>
>>
>> On 5/29/26 17:28, Lizhi Hou wrote:
>>> aie2_sched_job_free() accesses job->drv_cmd for tracing purposes. 
>>> However,
>>> job->drv_cmd is owned by the caller and may already have been freed when
>>> the job free callback runs, leading to a potential use-after-free.
>>>
>>> Remove the job->drv_cmd access from aie2_sched_job_free().
>>>
>>> Fixes: 8711eb2dde2e ("accel/amdxdna: Improve tracing for job 
>>> lifecycle and mailbox RX worker")
>>> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
>>> ---
>>>   drivers/accel/amdxdna/aie2_ctx.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/ 
>>> amdxdna/aie2_ctx.c
>>> index 658a5fb1fda6..2ad343728782 100644
>>> --- a/drivers/accel/amdxdna/aie2_ctx.c
>>> +++ b/drivers/accel/amdxdna/aie2_ctx.c
>>> @@ -437,8 +437,9 @@ static void aie2_sched_job_free(struct 
>>> drm_sched_job *sched_job)
>>>       struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job);
>>>       struct amdxdna_hwctx *hwctx = job->hwctx;
>>>   +    /* job->drv_cmd could be freed, so use DEFAULT_IO */
>>>       trace_xdna_job(sched_job, hwctx->name, "job free",
>>> -               job->seq, job->drv_cmd ? job->drv_cmd->opcode : 
>>> DEFAULT_IO);
>>> +               job->seq, DEFAULT_IO);
>>
>> Could this still be a race with dov->drv_cmd being valid when the 
>> first part of the expression is evaluated (job->drv_cmd) but invalid 
>> when job->drv_cmd->opcode is accessed?
> 
> When aie2_sched_job_free() is called, the job->drv_cmd could already be 
> freed. So it should never access job->drv_cmd at all. The entire 
> expression "job->drv_cmd ? job->drv_cmd->opcode :  DEFAULT_IO" is removed.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
> 
> Lizhi
> 
>>
>>>       if (!job->job_done)
>>>           up(&hwctx->priv->job_sem);
>>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH V1] accel/amdxdna: Remove drv_cmd tracing from job free callback
  2026-06-03 18:34     ` Mario Limonciello
@ 2026-06-03 20:17       ` Lizhi Hou
  0 siblings, 0 replies; 7+ messages in thread
From: Lizhi Hou @ 2026-06-03 20:17 UTC (permalink / raw)
  To: Mario Limonciello, ogabbay, quic_jhugo, dri-devel,
	karol.wachowski
  Cc: linux-kernel, max.zhen, sonal.santan

Applied to drm-misc-next-fixes

On 6/3/26 11:34, Mario Limonciello wrote:
>
>
> On 6/1/26 10:18, Lizhi Hou wrote:
>>
>> On 5/31/26 07:18, Mario Limonciello wrote:
>>>
>>>
>>> On 5/29/26 17:28, Lizhi Hou wrote:
>>>> aie2_sched_job_free() accesses job->drv_cmd for tracing purposes. 
>>>> However,
>>>> job->drv_cmd is owned by the caller and may already have been freed 
>>>> when
>>>> the job free callback runs, leading to a potential use-after-free.
>>>>
>>>> Remove the job->drv_cmd access from aie2_sched_job_free().
>>>>
>>>> Fixes: 8711eb2dde2e ("accel/amdxdna: Improve tracing for job 
>>>> lifecycle and mailbox RX worker")
>>>> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
>>>> ---
>>>>   drivers/accel/amdxdna/aie2_ctx.c | 3 ++-
>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/ 
>>>> amdxdna/aie2_ctx.c
>>>> index 658a5fb1fda6..2ad343728782 100644
>>>> --- a/drivers/accel/amdxdna/aie2_ctx.c
>>>> +++ b/drivers/accel/amdxdna/aie2_ctx.c
>>>> @@ -437,8 +437,9 @@ static void aie2_sched_job_free(struct 
>>>> drm_sched_job *sched_job)
>>>>       struct amdxdna_sched_job *job = drm_job_to_xdna_job(sched_job);
>>>>       struct amdxdna_hwctx *hwctx = job->hwctx;
>>>>   +    /* job->drv_cmd could be freed, so use DEFAULT_IO */
>>>>       trace_xdna_job(sched_job, hwctx->name, "job free",
>>>> -               job->seq, job->drv_cmd ? job->drv_cmd->opcode : 
>>>> DEFAULT_IO);
>>>> +               job->seq, DEFAULT_IO);
>>>
>>> Could this still be a race with dov->drv_cmd being valid when the 
>>> first part of the expression is evaluated (job->drv_cmd) but invalid 
>>> when job->drv_cmd->opcode is accessed?
>>
>> When aie2_sched_job_free() is called, the job->drv_cmd could already 
>> be freed. So it should never access job->drv_cmd at all. The entire 
>> expression "job->drv_cmd ? job->drv_cmd->opcode :  DEFAULT_IO" is 
>> removed.
> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
>>
>> Lizhi
>>
>>>
>>>>       if (!job->job_done)
>>>>           up(&hwctx->priv->job_sem);
>>>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Claude review: accel/amdxdna: Remove drv_cmd tracing from job free callback
  2026-05-29 15:28 [PATCH V1] accel/amdxdna: Remove drv_cmd tracing from job free callback Lizhi Hou
  2026-05-31 14:18 ` Mario Limonciello
@ 2026-06-04  6:19 ` Claude Code Review Bot
  2026-06-04  6:19 ` Claude Code Review Bot
  2 siblings, 0 replies; 7+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  6:19 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: accel/amdxdna: Remove drv_cmd tracing from job free callback
Author: Lizhi Hou <lizhi.hou@amd.com>
Patches: 5
Reviewed: 2026-06-04T16:19:38.908940

---

This is a single-patch fix for a real use-after-free bug in `accel/amdxdna`. The analysis is correct: `job->drv_cmd` points to a caller-owned stack-local `struct amdxdna_drv_cmd` (e.g., the `cmd` variable in `aie2_hwctx_cfg_debug_bo()` at line 907 and `aie2_hwctx_sync_debug_bo()` at line 979). After `amdxdna_cmd_submit()` returns and the calling function finishes, that stack frame is gone — but the job may still be alive in the scheduler and `aie2_sched_job_free()` can run later via the `.free_job` callback, at which point `job->drv_cmd` is a dangling pointer.

The fix is correct and minimal. The patch is straightforward and appropriate for a `Fixes:` tag.

**Verdict: Looks good.** One minor observation below.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Claude review: accel/amdxdna: Remove drv_cmd tracing from job free callback
  2026-05-29 15:28 [PATCH V1] accel/amdxdna: Remove drv_cmd tracing from job free callback Lizhi Hou
  2026-05-31 14:18 ` Mario Limonciello
  2026-06-04  6:19 ` Claude review: " Claude Code Review Bot
@ 2026-06-04  6:19 ` Claude Code Review Bot
  2 siblings, 0 replies; 7+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  6:19 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Correctness: Good.** The use-after-free is real. The callers in `aie2_hwctx_cfg_debug_bo()` and `aie2_hwctx_sync_debug_bo()` pass a pointer to a stack-local `struct amdxdna_drv_cmd cmd`:

```c
struct amdxdna_drv_cmd cmd = { 0 };
...
ret = amdxdna_cmd_submit(client, &cmd, ...);
```

And in `amdxdna_cmd_submit()` this is stored as a bare pointer:

```c
job->drv_cmd = drv_cmd;
```

Since `aie2_sched_job_free` runs asynchronously via `drm_sched_backend_ops.free_job`, it can execute after the caller's stack frame is gone, making `job->drv_cmd` a dangling pointer. The fix correctly removes the access.

**Minor observation:** The added comment says `/* job->drv_cmd could be freed, so use DEFAULT_IO */`, but strictly speaking `drv_cmd` points to a stack-local — it isn't "freed" in the heap sense, the stack frame just goes out of scope. A more precise comment might say "could be invalid" or "may point to freed/stale memory." This is very minor and doesn't affect the fix.

**Other `drv_cmd` accesses:** The other uses of `job->drv_cmd` in `aie2_ctx.c` (the "signaling fence" trace at line 212, "job run" at line 384, the switch at line 395, and "sent to device" at line 429) all happen within `aie2_sched_job_run()` or its signal callback path, which runs while the job is actively submitted — the caller is blocked in `aie2_cmd_wait()` at that point, so the stack-local `cmd` is still valid. Those accesses are fine.

**Reviewed-by criteria:** This is a clean, minimal fix for a real bug. No concerns.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-04  6:19 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-29 15:28 [PATCH V1] accel/amdxdna: Remove drv_cmd tracing from job free callback Lizhi Hou
2026-05-31 14:18 ` Mario Limonciello
2026-06-01 15:18   ` Lizhi Hou
2026-06-03 18:34     ` Mario Limonciello
2026-06-03 20:17       ` Lizhi Hou
2026-06-04  6:19 ` Claude review: " Claude Code Review Bot
2026-06-04  6:19 ` Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox