From mboxrd@z Thu Jan 1 00:00:00 1970 From: Claude Code Review Bot To: dri-devel-reviews@example.com Subject: Claude review: accel/amdxdna: Fix suspend failure after enabling turbo mode Date: Wed, 11 Feb 2026 16:15:00 +1000 Message-ID: In-Reply-To: <20260210191652.312000-1-lizhi.hou@amd.com> References: <20260210191652.312000-1-lizhi.hou@amd.com> <20260210191652.312000-1-lizhi.hou@amd.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-Mailer: Claude Code Patch Reviewer Patch Review **Commit Message Review:** The commit message provides a clear explanation of the problem and the solution. However, it could be improved: - Should include a "Fixes:" tag referencing the commit that introduced turbo mode - Should mention if this is user-visible (e.g., "Users experience suspend failures...") - Could benefit from describing the failure mode (does the system hang? return error?) **Code Review:** ```c @@ -323,6 +323,7 @@ static void aie2_hw_stop(struct amdxdna_dev *xdna) return; } + aie2_runtime_cfg(ndev, AIE2_RT_CFG_CLK_GATING, NULL); aie2_mgmt_fw_fini(ndev); xdna_mailbox_stop_channel(ndev->mgmt_chann); xdna_mailbox_destroy_channel(ndev->mgmt_chann); ``` **Issues at drivers/accel/amdxdna/aie2_pci.c:326:** 1. **Missing error handling**: `aie2_runtime_cfg()` likely returns an error code, but it's not checked here. What happens if re-enabling clock gating fails? Should the function proceed with firmware shutdown or bail out? 2. **Missing comment**: This non-obvious call deserves a comment explaining why clock gating must be re-enabled before firmware shutdown, something like: ```c /* Re-enable clock gating before suspend, as turbo mode disables it */ ``` 3. **Ordering question**: Should this be done *after* firmware shutdown rather than before? If firmware operations require certain clock states, the current order might cause issues. ```c @@ -406,15 +407,15 @@ static int aie2_hw_start(struct amdxdna_dev *xdna) goto stop_psp; } - ret = aie2_pm_init(ndev); + ret = aie2_mgmt_fw_init(ndev); if (ret) { - XDNA_ERR(xdna, "failed to init pm, ret %d", ret); + XDNA_ERR(xdna, "initial mgmt firmware failed, ret %d", ret); goto destroy_mgmt_chann; } - ret = aie2_mgmt_fw_init(ndev); + ret = aie2_pm_init(ndev); if (ret) { - XDNA_ERR(xdna, "initial mgmt firmware failed, ret %d", ret); + XDNA_ERR(xdna, "failed to init pm, ret %d", ret); goto destroy_mgmt_chann; } ``` **Issues at drivers/accel/amdxdna/aie2_pci.c:410-419:** 1. **Inadequate justification**: The commit message says "ensure that firmware is initialized in aie2_hw_start() before modifying clock-gating settings" but doesn't explain *why* this is necessary. Does `aie2_pm_init()` modify clock gating? If so, this should be explicit in the commit message. 2. **Broken error path**: Both error cases now `goto destroy_mgmt_chann`, but with the new ordering: - If `aie2_mgmt_fw_init()` fails, we skip destroying it (correct) - If `aie2_pm_init()` fails, we `goto destroy_mgmt_chann` but we've already initialized firmware - shouldn't we call `aie2_mgmt_fw_fini()` first? This looks like a bug in the error handling path. You need different labels: ```c ret = aie2_mgmt_fw_init(ndev); if (ret) { XDNA_ERR(xdna, "initial mgmt firmware failed, ret %d", ret); goto destroy_mgmt_chann; } ret = aie2_pm_init(ndev); if (ret) { XDNA_ERR(xdna, "failed to init pm, ret %d", ret); goto fini_mgmt_fw; /* Need to cleanup firmware */ } ``` 3. **Typo in error message**: "initial mgmt firmware failed" should be "initialize mgmt firmware failed" or "mgmt firmware initialization failed" for consistency with the other message format. 4. **Testing concern**: Has this been tested with error injection to verify the error paths work correctly? The changed ordering could mask issues. **Additional Concerns:** 1. **Missing symmetric cleanup**: In `aie2_hw_stop()`, the patch adds `aie2_runtime_cfg()` before `aie2_mgmt_fw_fini()`, but there's no corresponding operation in `aie2_hw_start()` after `aie2_pm_init()`. Is this asymmetry intentional? 2. **Race conditions**: What happens if suspend is called while turbo mode is being enabled? Is there locking to prevent races between runtime PM and mode changes? 3. **Power consumption**: If re-enabling clock gating fails during suspend, does the device stay in a high-power state? **Suggested Fix:** ```c diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c index 5b326e4610e6..XXXXXXXX 100644 --- a/drivers/accel/amdxdna/aie2_pci.c +++ b/drivers/accel/amdxdna/aie2_pci.c @@ -323,6 +323,12 @@ static void aie2_hw_stop(struct amdxdna_dev *xdna) return; } + /* + * Re-enable clock gating for suspend. Turbo mode disables it, + * but suspend requires it to be enabled. + */ + ret = aie2_runtime_cfg(ndev, AIE2_RT_CFG_CLK_GATING, NULL); + WARN_ON(ret); /* Log but continue - shutdown is critical path */ + aie2_mgmt_fw_fini(ndev); xdna_mailbox_stop_channel(ndev->mgmt_chann); xdna_mailbox_destroy_channel(ndev->mgmt_chann); @@ -406,15 +412,20 @@ static int aie2_hw_start(struct amdxdna_dev *xdna) goto stop_psp; } - ret = aie2_pm_init(ndev); + /* + * Initialize firmware before PM to ensure clock gating + * configuration happens after firmware is ready. + */ + ret = aie2_mgmt_fw_init(ndev); if (ret) { - XDNA_ERR(xdna, "failed to init pm, ret %d", ret); + XDNA_ERR(xdna, "mgmt firmware init failed, ret %d", ret); goto destroy_mgmt_chann; } - ret = aie2_mgmt_fw_init(ndev); + ret = aie2_pm_init(ndev); if (ret) { - XDNA_ERR(xdna, "initial mgmt firmware failed, ret %d", ret); + XDNA_ERR(xdna, "failed to init pm, ret %d", ret); + aie2_mgmt_fw_fini(ndev); goto destroy_mgmt_chann; } ``` **Summary:** The patch addresses a real bug but needs revision for: - Proper error handling in the suspend path - Fixed error cleanup in the resume path - Better documentation of ordering requirements - Clearer commit message with Fixes: tag --- Generated by Claude Code Patch Reviewer