* [PATCH V1] accel/amdxdna: Fix NULL pointer dereference of mgmt_chann
@ 2026-02-26 21:38 Lizhi Hou
2026-02-27 1:41 ` Claude review: " Claude Code Review Bot
2026-02-27 1:41 ` Claude Code Review Bot
0 siblings, 2 replies; 3+ messages in thread
From: Lizhi Hou @ 2026-02-26 21:38 UTC
To: ogabbay, quic_jhugo, dri-devel, maciej.falkowski
Cc: Lizhi Hou, linux-kernel, max.zhen, sonal.santan,
mario.limonciello
mgmt_chann may be set to NULL if the firmware returns an unexpected
error in aie2_send_mgmt_msg_wait(). This can later lead to a NULL
pointer dereference in aie2_hw_stop().
Fix this by introducing a dedicated helper to destroy mgmt_chann
and by adding proper NULL checks before accessing it.
Fixes: b87f920b9344 ("accel/amdxdna: Support hardware mailbox")
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
---
drivers/accel/amdxdna/aie2_message.c | 21 ++++++++++++++++-----
drivers/accel/amdxdna/aie2_pci.c | 7 ++-----
drivers/accel/amdxdna/aie2_pci.h | 1 +
3 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c
index 277a27bce850..22e1a85a7ae0 100644
--- a/drivers/accel/amdxdna/aie2_message.c
+++ b/drivers/accel/amdxdna/aie2_message.c
@@ -40,11 +40,8 @@ static int aie2_send_mgmt_msg_wait(struct amdxdna_dev_hdl *ndev,
return -ENODEV;
ret = xdna_send_msg_wait(xdna, ndev->mgmt_chann, msg);
- if (ret == -ETIME) {
- xdna_mailbox_stop_channel(ndev->mgmt_chann);
- xdna_mailbox_destroy_channel(ndev->mgmt_chann);
- ndev->mgmt_chann = NULL;
- }
+ if (ret == -ETIME)
+ aie2_destroy_mgmt_chann(ndev);
if (!ret && *hdl->status != AIE2_STATUS_SUCCESS) {
XDNA_ERR(xdna, "command opcode 0x%x failed, status 0x%x",
@@ -914,6 +911,20 @@ void aie2_msg_init(struct amdxdna_dev_hdl *ndev)
ndev->exec_msg_ops = &legacy_exec_message_ops;
}
+void aie2_destroy_mgmt_chann(struct amdxdna_dev_hdl *ndev)
+{
+ struct amdxdna_dev *xdna = ndev->xdna;
+
+ drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock));
+
+ if (!ndev->mgmt_chann)
+ return;
+
+ xdna_mailbox_stop_channel(ndev->mgmt_chann);
+ xdna_mailbox_destroy_channel(ndev->mgmt_chann);
+ ndev->mgmt_chann = NULL;
+}
+
static inline struct amdxdna_gem_obj *
aie2_cmdlist_get_cmd_buf(struct amdxdna_sched_job *job)
{
diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
index 85079b6fc5d9..977ce21eaf9f 100644
--- a/drivers/accel/amdxdna/aie2_pci.c
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -330,9 +330,7 @@ static void aie2_hw_stop(struct amdxdna_dev *xdna)
aie2_runtime_cfg(ndev, AIE2_RT_CFG_CLK_GATING, NULL);
aie2_mgmt_fw_fini(ndev);
- xdna_mailbox_stop_channel(ndev->mgmt_chann);
- xdna_mailbox_destroy_channel(ndev->mgmt_chann);
- ndev->mgmt_chann = NULL;
+ aie2_destroy_mgmt_chann(ndev);
drmm_kfree(&xdna->ddev, ndev->mbox);
ndev->mbox = NULL;
aie2_psp_stop(ndev->psp_hdl);
@@ -441,8 +439,7 @@ static int aie2_hw_start(struct amdxdna_dev *xdna)
return 0;
destroy_mgmt_chann:
- xdna_mailbox_stop_channel(ndev->mgmt_chann);
- xdna_mailbox_destroy_channel(ndev->mgmt_chann);
+ aie2_destroy_mgmt_chann(ndev);
stop_psp:
aie2_psp_stop(ndev->psp_hdl);
fini_smu:
diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_pci.h
index b20a3661078c..e72311c77996 100644
--- a/drivers/accel/amdxdna/aie2_pci.h
+++ b/drivers/accel/amdxdna/aie2_pci.h
@@ -303,6 +303,7 @@ int aie2_get_array_async_error(struct amdxdna_dev_hdl *ndev,
/* aie2_message.c */
void aie2_msg_init(struct amdxdna_dev_hdl *ndev);
+void aie2_destroy_mgmt_chann(struct amdxdna_dev_hdl *ndev);
int aie2_suspend_fw(struct amdxdna_dev_hdl *ndev);
int aie2_resume_fw(struct amdxdna_dev_hdl *ndev);
int aie2_set_runtime_cfg(struct amdxdna_dev_hdl *ndev, u32 type, u64 value);
--
2.34.1
* Claude review: accel/amdxdna: Fix NULL pointer dereference of mgmt_chann
2026-02-26 21:38 [PATCH V1] accel/amdxdna: Fix NULL pointer dereference of mgmt_chann Lizhi Hou
@ 2026-02-27 1:41 ` Claude Code Review Bot
2026-02-27 1:41 ` Claude Code Review Bot
1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-02-27 1:41 UTC
To: dri-devel-reviews
Overall Series Review
Subject: accel/amdxdna: Fix NULL pointer dereference of mgmt_chann
Author: Lizhi Hou <lizhi.hou@amd.com>
Patches: 1
Reviewed: 2026-02-27T11:41:38.362578
---
This is a single-patch fix for a NULL pointer dereference in the `accel/amdxdna` driver. The bug is real: if `aie2_send_mgmt_msg_wait()` times out (`-ETIME`), it destroys the management channel and sets `ndev->mgmt_chann = NULL`. A subsequent call to `aie2_hw_stop()` then passes that NULL pointer to `xdna_mailbox_stop_channel()` and `xdna_mailbox_destroy_channel()`, causing a crash. Extracting a helper with a NULL guard is a sound approach, but the `drm_WARN_ON` lock assertion introduced in the helper may not hold at every call site.
---
Generated by Claude Code Patch Reviewer
* Claude review: accel/amdxdna: Fix NULL pointer dereference of mgmt_chann
2026-02-26 21:38 [PATCH V1] accel/amdxdna: Fix NULL pointer dereference of mgmt_chann Lizhi Hou
2026-02-27 1:41 ` Claude review: " Claude Code Review Bot
@ 2026-02-27 1:41 ` Claude Code Review Bot
1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-02-27 1:41 UTC
To: dri-devel-reviews
Patch Review
**The fix itself is correct.** The core problem is clear:
1. `aie2_send_mgmt_msg_wait()` sets `ndev->mgmt_chann = NULL` on `-ETIME`
2. `aie2_hw_stop()` unconditionally calls `xdna_mailbox_stop_channel(ndev->mgmt_chann)` without a NULL check
The new `aie2_destroy_mgmt_chann()` helper with its early-return NULL check solves this.
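To make the guard's effect concrete, here is a minimal userspace sketch of the same pattern. Everything prefixed `mock_` is a hypothetical stand-in invented for illustration, not the real amdxdna types or mailbox API; the only thing carried over from the patch is the shape of the helper (NULL check, stop, destroy, clear the pointer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures (not the real ones). */
struct mock_chann {
	int stopped;
};

struct mock_ndev {
	struct mock_chann *mgmt_chann;
	int destroy_calls;	/* how many times a channel was really torn down */
};

/* Like the real stop call in the original code, this dereferences its argument. */
static void mock_stop_channel(struct mock_chann *chann)
{
	chann->stopped = 1;
}

static void mock_destroy_channel(struct mock_chann *chann)
{
	free(chann);
}

/* Same shape as the helper in the patch: safe to call more than once. */
void mock_destroy_mgmt_chann(struct mock_ndev *ndev)
{
	assert(ndev != NULL);

	if (!ndev->mgmt_chann)
		return;

	mock_stop_channel(ndev->mgmt_chann);
	mock_destroy_channel(ndev->mgmt_chann);
	ndev->mgmt_chann = NULL;
	ndev->destroy_calls++;
}

/* Builds a device whose management channel is alive. */
struct mock_ndev mock_ndev_with_channel(void)
{
	struct mock_ndev ndev = { .mgmt_chann = NULL, .destroy_calls = 0 };

	ndev.mgmt_chann = calloc(1, sizeof(*ndev.mgmt_chann));
	return ndev;
}
```

In this model, the first call stands for the `-ETIME` teardown and a second call stands for the later `aie2_hw_stop()`. The second call returns early instead of dereferencing NULL, which is the behavior the patch relies on.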
**Concern — lock assertion may fire on the resume error path:**
The new helper contains:
```c
drm_WARN_ON(&xdna->ddev, !mutex_is_locked(&xdna->dev_lock));
```
`aie2_hw_suspend()` takes `dev_lock` via `guard(mutex)(&xdna->dev_lock)` before calling `aie2_hw_stop()`, so the stop path is fine. `aie2_init()` and `aie2_fini()` are also called with `dev_lock` held (confirmed in `amdxdna_pci_drv.c` lines 285–287, 310–312, 328–339).
However, `aie2_hw_resume()` does **not** take `dev_lock`:
```c
static int aie2_hw_resume(struct amdxdna_dev *xdna)
{
struct amdxdna_client *client;
int ret;
ret = aie2_hw_start(xdna);
```
It is called from `amdxdna_pm_resume()`, which also does not acquire `dev_lock`. If `aie2_hw_start()` fails and jumps to the `destroy_mgmt_chann:` error label, or if any `aie2_send_mgmt_msg_wait()` call during startup times out, the `drm_WARN_ON` will fire. This is a legitimate (if unlikely) error scenario. Either:
- `aie2_hw_resume()` should acquire `dev_lock` (which might be the right fix anyway for consistency), or
- The lock assertion should be dropped/relaxed, or
- A comment should document why the resume path is safe without the lock.
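The first option can be illustrated with a small userspace model. This is only a sketch under stated assumptions: a boolean flag stands in for `dev_lock`, a counter stands in for `drm_WARN_ON` hits, and all `mock_` names are hypothetical. It does not show whether taking `dev_lock` in the real resume path is safe with respect to lock ordering; that would need to be checked against the driver.

```c
#include <stdbool.h>

/* Hypothetical model: a flag stands in for xdna->dev_lock. */
struct mock_dev {
	bool dev_lock_held;
	bool chann_alive;
	int warnings;	/* counts simulated drm_WARN_ON hits */
};

/* Models the helper: assert the lock, then tear down if needed. */
static void mock_destroy_mgmt_chann(struct mock_dev *dev)
{
	if (!dev->dev_lock_held)
		dev->warnings++;

	if (!dev->chann_alive)
		return;

	dev->chann_alive = false;
}

/* Models a hardware start that fails after the channel was created. */
static int mock_hw_start_failing(struct mock_dev *dev)
{
	dev->chann_alive = true;
	mock_destroy_mgmt_chann(dev);	/* error-path unwind */
	return -1;
}

/* Resume as described in the review: no lock taken around start. */
int mock_resume_unlocked(struct mock_dev *dev)
{
	return mock_hw_start_failing(dev);
}

/* Resume with the lock held across start, as the first option suggests. */
int mock_resume_locked(struct mock_dev *dev)
{
	int ret;

	dev->dev_lock_held = true;
	ret = mock_hw_start_failing(dev);
	dev->dev_lock_held = false;
	return ret;
}
```

Both variants return the same error and leave the channel destroyed; only the unlocked one trips the simulated assertion. In the model, the lock changes whether the warning fires, not whether the teardown itself works.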
**Minor — error path NULL assignment:** In `aie2_hw_start()`'s error path, the original code did not set `ndev->mgmt_chann = NULL` after destroying the channel:
```c
destroy_mgmt_chann:
xdna_mailbox_stop_channel(ndev->mgmt_chann);
xdna_mailbox_destroy_channel(ndev->mgmt_chann);
```
The new helper does set it to NULL, which is a slight improvement in defensive coding for the error path — no issue here, just noting the behavioral change.
**Minor — helper placement:** `aie2_destroy_mgmt_chann()` is placed immediately after `aie2_msg_init()`, ahead of a static inline for command list buffers. Moving it next to `aie2_send_mgmt_msg_wait()`, its first caller, would read more naturally, but this is cosmetic.
**Summary:** The NULL deref fix is correct and the helper consolidation is good. The main actionable concern is the `drm_WARN_ON` lock assertion vs. the `aie2_hw_resume` → `aie2_hw_start` error path which doesn't hold `dev_lock`. Please verify this won't produce a warning splat on resume failure, or add locking to the resume path.
---
Generated by Claude Code Patch Reviewer