* [PATCH 0/2] amdxdna: fixes for closing a process
@ 2026-02-10 16:42 Mario Limonciello
2026-02-10 16:42 ` [PATCH 1/2] accel/amdxdna: Fix NULL pointer dereference in mailbox channel cleanup Mario Limonciello
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Mario Limonciello @ 2026-02-10 16:42 UTC (permalink / raw)
To: mario.limonciello, lizhi.hou, mamin506, ogabbay, superm1; +Cc: dri-devel
I found that with drm-next when I close a process using amdxdna that
I hit a GPF. After fixing the GPF I found that it was really noisy
for standard cleanup from a closed process.
Mario Limonciello (2):
accel/amdxdna: Fix NULL pointer dereference in mailbox channel cleanup
accel/amdxdna: Reduce log noise during process termination
drivers/accel/amdxdna/aie2_ctx.c | 6 ++++--
drivers/accel/amdxdna/aie2_message.c | 18 ++++++++++++------
drivers/accel/amdxdna/aie2_pci.c | 14 +++++++++-----
3 files changed, 25 insertions(+), 13 deletions(-)
--
2.53.0
^ permalink raw reply [flat|nested] 8+ messages in thread* [PATCH 1/2] accel/amdxdna: Fix NULL pointer dereference in mailbox channel cleanup 2026-02-10 16:42 [PATCH 0/2] amdxdna: fixes for closing a process Mario Limonciello @ 2026-02-10 16:42 ` Mario Limonciello 2026-02-10 17:17 ` Lizhi Hou 2026-02-11 6:21 ` Claude review: " Claude Code Review Bot 2026-02-10 16:42 ` [PATCH 2/2] accel/amdxdna: Reduce log noise during process termination Mario Limonciello 2026-02-11 6:21 ` Claude review: amdxdna: fixes for closing a process Claude Code Review Bot 2 siblings, 2 replies; 8+ messages in thread From: Mario Limonciello @ 2026-02-10 16:42 UTC (permalink / raw) To: mario.limonciello, lizhi.hou, mamin506, ogabbay, superm1; +Cc: dri-devel aie2_destroy_context() is called during various cleanup paths, including when context creation fails partially. If xdna_mailbox_create_channel() fails during aie2_create_context(), the hwctx->priv->mbox_chann pointer remains NULL. When cleanup occurs (e.g., during process termination via amdxdna_hwctx_remove_all), aie2_destroy_context() is invoked and attempts to stop and destroy the NULL mailbox channel, leading to a NULL pointer dereference. The issue was observed in the following call path: amdxdna_drm_close amdxdna_hwctx_remove_all aie2_hwctx_fini aie2_release_resource aie2_destroy_context xdna_mailbox_stop_channel <- NULL dereference Add NULL checks in aie2_destroy_context() before calling mailbox channel operations. Also add defensive NULL checks in aie2_hw_stop() for both mgmt_chann and mbox to prevent similar issues during device shutdown. Fixes: 97f27573837e ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> --- drivers/accel/amdxdna/aie2_message.c | 14 +++++++++----- drivers/accel/amdxdna/aie2_pci.c | 14 +++++++++----- 2 files changed, 18 insertions(+), 10 deletions(-) diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c index 7d7dcfeaf7942..77e3cdf18658b 100644 --- a/drivers/accel/amdxdna/aie2_message.c +++ b/drivers/accel/amdxdna/aie2_message.c @@ -318,11 +318,15 @@ int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwc struct amdxdna_dev *xdna = ndev->xdna; int ret; - xdna_mailbox_stop_channel(hwctx->priv->mbox_chann); - ret = aie2_destroy_context_req(ndev, hwctx->fw_ctx_id); - xdna_mailbox_destroy_channel(hwctx->priv->mbox_chann); - XDNA_DBG(xdna, "Destroyed fw ctx %d", hwctx->fw_ctx_id); - hwctx->priv->mbox_chann = NULL; + if (hwctx->priv->mbox_chann) { + xdna_mailbox_stop_channel(hwctx->priv->mbox_chann); + ret = aie2_destroy_context_req(ndev, hwctx->fw_ctx_id); + xdna_mailbox_destroy_channel(hwctx->priv->mbox_chann); + XDNA_DBG(xdna, "Destroyed fw ctx %d", hwctx->fw_ctx_id); + hwctx->priv->mbox_chann = NULL; + } else { + ret = aie2_destroy_context_req(ndev, hwctx->fw_ctx_id); + } hwctx->fw_ctx_id = -1; ndev->hwctx_num--; diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c index f70ccf0f3c019..9c2572706bf53 100644 --- a/drivers/accel/amdxdna/aie2_pci.c +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -324,11 +324,15 @@ static void aie2_hw_stop(struct amdxdna_dev *xdna) } aie2_mgmt_fw_fini(ndev); - xdna_mailbox_stop_channel(ndev->mgmt_chann); - xdna_mailbox_destroy_channel(ndev->mgmt_chann); - ndev->mgmt_chann = NULL; - drmm_kfree(&xdna->ddev, ndev->mbox); - ndev->mbox = NULL; + if (ndev->mgmt_chann) { + xdna_mailbox_stop_channel(ndev->mgmt_chann); + xdna_mailbox_destroy_channel(ndev->mgmt_chann); + ndev->mgmt_chann = NULL; + } + if (ndev->mbox) { + drmm_kfree(&xdna->ddev, ndev->mbox); + ndev->mbox = NULL; + } aie2_psp_stop(ndev->psp_hdl); aie2_smu_fini(ndev); aie2_error_async_events_free(ndev); -- 2.53.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 1/2] accel/amdxdna: Fix NULL pointer dereference in mailbox channel cleanup 2026-02-10 16:42 ` [PATCH 1/2] accel/amdxdna: Fix NULL pointer dereference in mailbox channel cleanup Mario Limonciello @ 2026-02-10 17:17 ` Lizhi Hou 2026-02-11 6:21 ` Claude review: " Claude Code Review Bot 1 sibling, 0 replies; 8+ messages in thread From: Lizhi Hou @ 2026-02-10 17:17 UTC (permalink / raw) To: Mario Limonciello, mamin506, ogabbay, superm1; +Cc: dri-devel Hi Mario, I posted a fix for this: https://lore.kernel.org/dri-devel/20260206060306.4050531-1-lizhi.hou@amd.com/ I am not sure if it still a good time to merge to drm-misc-next-fixes for 6.20 kernel. And I plan to merge to drm-misc-fixes during 6.20 rc1 time. Thanks, Lizhi On 2/10/26 08:42, Mario Limonciello wrote: > aie2_destroy_context() is called during various cleanup paths, including > when context creation fails partially. If xdna_mailbox_create_channel() > fails during aie2_create_context(), the hwctx->priv->mbox_chann pointer > remains NULL. When cleanup occurs (e.g., during process termination via > amdxdna_hwctx_remove_all), aie2_destroy_context() is invoked and attempts > to stop and destroy the NULL mailbox channel, leading to a NULL pointer > dereference. > > The issue was observed in the following call path: > amdxdna_drm_close > amdxdna_hwctx_remove_all > aie2_hwctx_fini > aie2_release_resource > aie2_destroy_context > xdna_mailbox_stop_channel <- NULL dereference > > Add NULL checks in aie2_destroy_context() before calling mailbox channel > operations. Also add defensive NULL checks in aie2_hw_stop() for both > mgmt_chann and mbox to prevent similar issues during device shutdown. > > Fixes: 97f27573837e ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup") > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> > --- > drivers/accel/amdxdna/aie2_message.c | 14 +++++++++----- > drivers/accel/amdxdna/aie2_pci.c | 14 +++++++++----- > 2 files changed, 18 insertions(+), 10 deletions(-) > > diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c > index 7d7dcfeaf7942..77e3cdf18658b 100644 > --- a/drivers/accel/amdxdna/aie2_message.c > +++ b/drivers/accel/amdxdna/aie2_message.c > @@ -318,11 +318,15 @@ int aie2_destroy_context(struct amdxdna_dev_hdl *ndev, struct amdxdna_hwctx *hwc > struct amdxdna_dev *xdna = ndev->xdna; > int ret; > > - xdna_mailbox_stop_channel(hwctx->priv->mbox_chann); > - ret = aie2_destroy_context_req(ndev, hwctx->fw_ctx_id); > - xdna_mailbox_destroy_channel(hwctx->priv->mbox_chann); > - XDNA_DBG(xdna, "Destroyed fw ctx %d", hwctx->fw_ctx_id); > - hwctx->priv->mbox_chann = NULL; > + if (hwctx->priv->mbox_chann) { > + xdna_mailbox_stop_channel(hwctx->priv->mbox_chann); > + ret = aie2_destroy_context_req(ndev, hwctx->fw_ctx_id); > + xdna_mailbox_destroy_channel(hwctx->priv->mbox_chann); > + XDNA_DBG(xdna, "Destroyed fw ctx %d", hwctx->fw_ctx_id); > + hwctx->priv->mbox_chann = NULL; > + } else { > + ret = aie2_destroy_context_req(ndev, hwctx->fw_ctx_id); > + } > hwctx->fw_ctx_id = -1; > ndev->hwctx_num--; > > diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c > index f70ccf0f3c019..9c2572706bf53 100644 > --- a/drivers/accel/amdxdna/aie2_pci.c > +++ b/drivers/accel/amdxdna/aie2_pci.c > @@ -324,11 +324,15 @@ static void aie2_hw_stop(struct amdxdna_dev *xdna) > } > > aie2_mgmt_fw_fini(ndev); > - xdna_mailbox_stop_channel(ndev->mgmt_chann); > - xdna_mailbox_destroy_channel(ndev->mgmt_chann); > - ndev->mgmt_chann = NULL; > - drmm_kfree(&xdna->ddev, ndev->mbox); > - ndev->mbox = NULL; > + if (ndev->mgmt_chann) { > + xdna_mailbox_stop_channel(ndev->mgmt_chann); > + xdna_mailbox_destroy_channel(ndev->mgmt_chann); > + ndev->mgmt_chann = NULL; > + } > + if (ndev->mbox) { > + drmm_kfree(&xdna->ddev, ndev->mbox); > + ndev->mbox = NULL; > + } > aie2_psp_stop(ndev->psp_hdl); > aie2_smu_fini(ndev); > aie2_error_async_events_free(ndev); ^ permalink raw reply [flat|nested] 8+ messages in thread
* Claude review: accel/amdxdna: Fix NULL pointer dereference in mailbox channel cleanup 2026-02-10 16:42 ` [PATCH 1/2] accel/amdxdna: Fix NULL pointer dereference in mailbox channel cleanup Mario Limonciello 2026-02-10 17:17 ` Lizhi Hou @ 2026-02-11 6:21 ` Claude Code Review Bot 1 sibling, 0 replies; 8+ messages in thread From: Claude Code Review Bot @ 2026-02-11 6:21 UTC (permalink / raw) To: dri-devel-reviews Patch Review **Summary:** This patch is **redundant** - an equivalent fix already exists upstream per reviewer feedback. **Critical Issues:** 1. **Code Duplication:** ```c + if (hwctx->priv->mbox_chann) { + xdna_mailbox_stop_channel(hwctx->priv->mbox_chann); + ret = aie2_destroy_context_req(ndev, hwctx->fw_ctx_id); + xdna_mailbox_destroy_channel(hwctx->priv->mbox_chann); + XDNA_DBG(xdna, "Destroyed fw ctx %d", hwctx->fw_ctx_id); + hwctx->priv->mbox_chann = NULL; + } else { + ret = aie2_destroy_context_req(ndev, hwctx->fw_ctx_id); + } ``` The `aie2_destroy_context_req()` call is duplicated in both branches. This should be refactored: ```c if (hwctx->priv->mbox_chann) { xdna_mailbox_stop_channel(hwctx->priv->mbox_chann); xdna_mailbox_destroy_channel(hwctx->priv->mbox_chann); XDNA_DBG(xdna, "Destroyed fw ctx %d", hwctx->fw_ctx_id); hwctx->priv->mbox_chann = NULL; } ret = aie2_destroy_context_req(ndev, hwctx->fw_ctx_id); ``` 2. **Missing variable initialization:** ```c int ret; ``` In the original code, if `mbox_chann` is NULL and we enter the else branch, `ret` may be used uninitialized if we had taken the error path earlier. The refactored version above fixes this. 3. **Questionable NULL check in aie2_hw_stop():** ```c + if (ndev->mbox) { + drmm_kfree(&xdna->ddev, ndev->mbox); + ndev->mbox = NULL; + } ``` `drmm_kfree()` should already handle NULL pointers gracefully (this is standard for free-style functions). If it doesn't, that's a bug in the DRM managed memory API. This check may be unnecessary. 4. **Root cause not addressed:** The patch doesn't explain *why* `mbox_chann` can be NULL. Looking at the description: > If xdna_mailbox_create_channel() fails during aie2_create_context(), the hwctx->priv->mbox_chann pointer remains NULL. This suggests incomplete error handling in the creation path. The proper fix might be to ensure `hwctx->priv` is properly initialized or to prevent `aie2_destroy_context()` from being called if creation failed. **Minor Issues:** - The Fixes tag references commit `97f27573837e` which appears to be the fix itself based on the commit message. This is circular and incorrect. --- Generated by Claude Code Patch Reviewer ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH 2/2] accel/amdxdna: Reduce log noise during process termination 2026-02-10 16:42 [PATCH 0/2] amdxdna: fixes for closing a process Mario Limonciello 2026-02-10 16:42 ` [PATCH 1/2] accel/amdxdna: Fix NULL pointer dereference in mailbox channel cleanup Mario Limonciello @ 2026-02-10 16:42 ` Mario Limonciello 2026-02-10 17:20 ` Lizhi Hou 2026-02-11 6:21 ` Claude review: " Claude Code Review Bot 2026-02-11 6:21 ` Claude review: amdxdna: fixes for closing a process Claude Code Review Bot 2 siblings, 2 replies; 8+ messages in thread From: Mario Limonciello @ 2026-02-10 16:42 UTC (permalink / raw) To: mario.limonciello, lizhi.hou, mamin506, ogabbay, superm1; +Cc: dri-devel During process termination, several error messages are logged that are not actual errors but expected conditions when a process is killed or interrupted. This creates unnecessary noise in the kernel log. The specific scenarios are: 1. HMM invalidation returns -ERESTARTSYS when the wait is interrupted by a signal during process cleanup. This is expected when a process is being terminated and should not be logged as an error. 2. Context destruction returns -ENODEV when the firmware or device has already stopped, which commonly occurs during cleanup if the device was already torn down. This is also an expected condition during orderly shutdown. Downgrade these expected error conditions from error level to debug level to reduce log noise while still keeping genuine errors visible. Fixes: 97f27573837e ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> --- drivers/accel/amdxdna/aie2_ctx.c | 6 ++++-- drivers/accel/amdxdna/aie2_message.c | 4 +++- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c index 37d05f2e986f9..79f6316655e73 100644 --- a/drivers/accel/amdxdna/aie2_ctx.c +++ b/drivers/accel/amdxdna/aie2_ctx.c @@ -497,7 +497,7 @@ static void aie2_release_resource(struct amdxdna_hwctx *hwctx) if (AIE2_FEATURE_ON(xdna->dev_handle, AIE2_TEMPORAL_ONLY)) { ret = aie2_destroy_context(xdna->dev_handle, hwctx); - if (ret) + if (ret && ret != -ENODEV) XDNA_ERR(xdna, "Destroy temporal only context failed, ret %d", ret); } else { ret = xrs_release_resource(xdna->xrs_hdl, (uintptr_t)hwctx); @@ -1070,6 +1070,8 @@ void aie2_hmm_invalidate(struct amdxdna_gem_obj *abo, ret = dma_resv_wait_timeout(gobj->resv, DMA_RESV_USAGE_BOOKKEEP, true, MAX_SCHEDULE_TIMEOUT); - if (!ret || ret == -ERESTARTSYS) + if (!ret) XDNA_ERR(xdna, "Failed to wait for bo, ret %ld", ret); + else if (ret == -ERESTARTSYS) + XDNA_DBG(xdna, "Wait for bo interrupted by signal"); } diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c index 77e3cdf18658b..5697c0c2dd43f 100644 --- a/drivers/accel/amdxdna/aie2_message.c +++ b/drivers/accel/amdxdna/aie2_message.c @@ -216,8 +216,10 @@ static int aie2_destroy_context_req(struct amdxdna_dev_hdl *ndev, u32 id) req.context_id = id; ret = aie2_send_mgmt_msg_wait(ndev, &msg); - if (ret) + if (ret && ret != -ENODEV) XDNA_WARN(xdna, "Destroy context failed, ret %d", ret); + else if (ret == -ENODEV) + XDNA_DBG(xdna, "Destroy context: device already stopped"); return ret; } -- 2.53.0 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH 2/2] accel/amdxdna: Reduce log noise during process termination 2026-02-10 16:42 ` [PATCH 2/2] accel/amdxdna: Reduce log noise during process termination Mario Limonciello @ 2026-02-10 17:20 ` Lizhi Hou 2026-02-11 6:21 ` Claude review: " Claude Code Review Bot 1 sibling, 0 replies; 8+ messages in thread From: Lizhi Hou @ 2026-02-10 17:20 UTC (permalink / raw) To: Mario Limonciello, mamin506, ogabbay, superm1; +Cc: dri-devel On 2/10/26 08:42, Mario Limonciello wrote: > During process termination, several error messages are logged that are > not actual errors but expected conditions when a process is killed or > interrupted. This creates unnecessary noise in the kernel log. > > The specific scenarios are: > > 1. HMM invalidation returns -ERESTARTSYS when the wait is interrupted by > a signal during process cleanup. This is expected when a process is > being terminated and should not be logged as an error. > > 2. Context destruction returns -ENODEV when the firmware or device has > already stopped, which commonly occurs during cleanup if the device > was already torn down. This is also an expected condition during > orderly shutdown. > > Downgrade these expected error conditions from error level to debug level > to reduce log noise while still keeping genuine errors visible. > > Fixes: 97f27573837e ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup") > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> > --- > drivers/accel/amdxdna/aie2_ctx.c | 6 ++++-- > drivers/accel/amdxdna/aie2_message.c | 4 +++- > 2 files changed, 7 insertions(+), 3 deletions(-) > > diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c > index 37d05f2e986f9..79f6316655e73 100644 > --- a/drivers/accel/amdxdna/aie2_ctx.c > +++ b/drivers/accel/amdxdna/aie2_ctx.c > @@ -497,7 +497,7 @@ static void aie2_release_resource(struct amdxdna_hwctx *hwctx) > > if (AIE2_FEATURE_ON(xdna->dev_handle, AIE2_TEMPORAL_ONLY)) { > ret = aie2_destroy_context(xdna->dev_handle, hwctx); > - if (ret) > + if (ret && ret != -ENODEV) > XDNA_ERR(xdna, "Destroy temporal only context failed, ret %d", ret); > } else { > ret = xrs_release_resource(xdna->xrs_hdl, (uintptr_t)hwctx); > @@ -1070,6 +1070,8 @@ void aie2_hmm_invalidate(struct amdxdna_gem_obj *abo, > > ret = dma_resv_wait_timeout(gobj->resv, DMA_RESV_USAGE_BOOKKEEP, > true, MAX_SCHEDULE_TIMEOUT); > - if (!ret || ret == -ERESTARTSYS) > + if (!ret) > XDNA_ERR(xdna, "Failed to wait for bo, ret %ld", ret); > + else if (ret == -ERESTARTSYS) > + XDNA_DBG(xdna, "Wait for bo interrupted by signal"); > } > diff --git a/drivers/accel/amdxdna/aie2_message.c b/drivers/accel/amdxdna/aie2_message.c > index 77e3cdf18658b..5697c0c2dd43f 100644 > --- a/drivers/accel/amdxdna/aie2_message.c > +++ b/drivers/accel/amdxdna/aie2_message.c > @@ -216,8 +216,10 @@ static int aie2_destroy_context_req(struct amdxdna_dev_hdl *ndev, u32 id) > > req.context_id = id; > ret = aie2_send_mgmt_msg_wait(ndev, &msg); > - if (ret) > + if (ret && ret != -ENODEV) > XDNA_WARN(xdna, "Destroy context failed, ret %d", ret);Reviewed-by: > + else if (ret == -ENODEV) > + XDNA_DBG(xdna, "Destroy context: device already stopped"); Reviewed-by: Lizhi Hou <lizhi.hou@amd.com> > > return ret; > } ^ permalink raw reply [flat|nested] 8+ messages in thread
* Claude review: accel/amdxdna: Reduce log noise during process termination 2026-02-10 16:42 ` [PATCH 2/2] accel/amdxdna: Reduce log noise during process termination Mario Limonciello 2026-02-10 17:20 ` Lizhi Hou @ 2026-02-11 6:21 ` Claude Code Review Bot 1 sibling, 0 replies; 8+ messages in thread From: Claude Code Review Bot @ 2026-02-11 6:21 UTC (permalink / raw) To: dri-devel-reviews Patch Review **Summary:** Reasonable intent but implementation could be improved. **Issues:** 1. **Inconsistent error code checking pattern:** In `aie2_ctx.c`: ```c - if (ret) + if (ret && ret != -ENODEV) XDNA_ERR(xdna, "Destroy temporal only context failed, ret %d", ret); ``` This silently swallows `-ENODEV` errors even though the function description says they should be downgraded to debug level. There's no debug message added here unlike in the other locations. 2. **Logic error in dma_resv_wait_timeout handling:** ```c ret = dma_resv_wait_timeout(gobj->resv, DMA_RESV_USAGE_BOOKKEEP, true, MAX_SCHEDULE_TIMEOUT); - if (!ret || ret == -ERESTARTSYS) + if (!ret) XDNA_ERR(xdna, "Failed to wait for bo, ret %ld", ret); + else if (ret == -ERESTARTSYS) + XDNA_DBG(xdna, "Wait for bo interrupted by signal"); ``` The `dma_resv_wait_timeout()` function returns: - `> 0`: Success (remaining timeout in jiffies) - `0`: Timeout - `< 0`: Error code The original code treated both timeout (`!ret`) and signal interruption (`-ERESTARTSYS`) as errors. The new code only prints an error for timeout but adds debug for signals. However, there's a subtle issue: what if there are other error codes besides `-ERESTARTSYS`? They would be silently ignored. A better pattern would be: ```c if (ret > 0) { // Success - do nothing } else if (ret == 0) { XDNA_ERR(xdna, "Timeout waiting for bo"); } else if (ret == -ERESTARTSYS) { XDNA_DBG(xdna, "Wait for bo interrupted by signal"); } else { XDNA_ERR(xdna, "Failed to wait for bo, ret %ld", ret); } ``` 3. **Same Fixes tag issue:** ``` Fixes: 97f27573837e ("accel/amdxdna: Fix potential NULL pointer dereference in context cleanup") ``` This tag references a commit that doesn't seem to be the root cause of log noise - it's the NULL pointer fix. The Fixes tag should reference the commit that introduced the noisy logging, not the previous patch in this series. 4. **Missing context in commit message:** The commit message doesn't explain *how* the submitter determined these are "expected" conditions. For example: - How often does `-ERESTARTSYS` occur during normal termination? - Are there cases where `-ERESTARTSYS` indicates a real problem? - Same questions for `-ENODEV` **Positive aspects:** - The intent is correct - reducing log noise for expected conditions - The debug messages are informative - Reviewed-by tag from Lizhi Hou shows maintainer approval (for this patch specifically) **Recommendation:** This patch should be reworked to: 1. Add consistent debug logging for all downgraded errors 2. Fix the error handling logic in `aie2_hmm_invalidate()` 3. Correct the Fixes tag 4. Provide more context about when these conditions occur --- Generated by Claude Code Patch Reviewer ^ permalink raw reply [flat|nested] 8+ messages in thread
* Claude review: amdxdna: fixes for closing a process 2026-02-10 16:42 [PATCH 0/2] amdxdna: fixes for closing a process Mario Limonciello 2026-02-10 16:42 ` [PATCH 1/2] accel/amdxdna: Fix NULL pointer dereference in mailbox channel cleanup Mario Limonciello 2026-02-10 16:42 ` [PATCH 2/2] accel/amdxdna: Reduce log noise during process termination Mario Limonciello @ 2026-02-11 6:21 ` Claude Code Review Bot 2 siblings, 0 replies; 8+ messages in thread From: Claude Code Review Bot @ 2026-02-11 6:21 UTC (permalink / raw) To: dri-devel-reviews Overall Series Review Subject: amdxdna: fixes for closing a process Author: Mario Limonciello <mario.limonciello@amd.com> Patches: 5 Reviewed: 2026-02-11T16:21:01.694223 --- This 2-patch series addresses critical bugs in the amdxdna accelerator driver related to process cleanup and termination. The series fixes a NULL pointer dereference during mailbox channel cleanup and reduces unnecessary log noise during normal process termination. **Positive Aspects:** - Addresses a real crash (GPF) found in testing - Clear problem description with call path documentation - Defensive programming approach with NULL checks **Critical Issues:** 1. **Patch 1 has already been fixed upstream** - Reviewer Lizhi Hou points out that an equivalent fix was already posted at https://lore.kernel.org/dri-devel/20260206060306.4050531-1-lizhi.hou@amd.com/ 2. **Code duplication** - The else branch in patch 1 duplicates the `aie2_destroy_context_req()` call unnecessarily 3. **Inconsistent error handling** - Both patches modify error handling but don't follow a consistent pattern 4. **Missing initialization checks** - The patches add defensive NULL checks but don't address why these pointers can be NULL in the first place **Recommendation:** This series should not be merged as-is. Patch 1 is redundant with an existing fix. Patch 2 may be useful but needs to be rebased and potentially reworked. --- --- Generated by Claude Code Patch Reviewer ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-02-11 6:21 UTC | newest] Thread overview: 8+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-02-10 16:42 [PATCH 0/2] amdxdna: fixes for closing a process Mario Limonciello 2026-02-10 16:42 ` [PATCH 1/2] accel/amdxdna: Fix NULL pointer dereference in mailbox channel cleanup Mario Limonciello 2026-02-10 17:17 ` Lizhi Hou 2026-02-11 6:21 ` Claude review: " Claude Code Review Bot 2026-02-10 16:42 ` [PATCH 2/2] accel/amdxdna: Reduce log noise during process termination Mario Limonciello 2026-02-10 17:20 ` Lizhi Hou 2026-02-11 6:21 ` Claude review: " Claude Code Review Bot 2026-02-11 6:21 ` Claude review: amdxdna: fixes for closing a process Claude Code Review Bot
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox