public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
From: Claude Code Review Bot <claude-review@example.com>
To: dri-devel-reviews@example.com
Subject: Claude review: accel/amdxdna: Fix suspend failure after enabling turbo mode
Date: Wed, 11 Feb 2026 16:15:00 +1000	[thread overview]
Message-ID: <review-patch1-20260210191652.312000-1-lizhi.hou@amd.com> (raw)
In-Reply-To: <20260210191652.312000-1-lizhi.hou@amd.com>

Patch Review

**Commit Message Review:**

The commit message provides a clear explanation of the problem and the solution. However, it could be improved:

- Should include a "Fixes:" tag referencing the commit that introduced turbo mode
- Should mention if this is user-visible (e.g., "Users experience suspend failures...")
- Could benefit from describing the failure mode (does the system hang? return error?)

**Code Review:**

```c
@@ -323,6 +323,7 @@ static void aie2_hw_stop(struct amdxdna_dev *xdna)
 		return;
 	}
 
+	aie2_runtime_cfg(ndev, AIE2_RT_CFG_CLK_GATING, NULL);
 	aie2_mgmt_fw_fini(ndev);
 	xdna_mailbox_stop_channel(ndev->mgmt_chann);
 	xdna_mailbox_destroy_channel(ndev->mgmt_chann);
```

**Issues at drivers/accel/amdxdna/aie2_pci.c:326:**

1. **Missing error handling**: `aie2_runtime_cfg()` likely returns an error code, but it's not checked here. What happens if re-enabling clock gating fails? Should the function proceed with firmware shutdown or bail out?

2. **Missing comment**: This non-obvious call deserves a comment explaining why clock gating must be re-enabled before firmware shutdown, something like:
   ```c
   /* Re-enable clock gating before suspend, as turbo mode disables it */
   ```

3. **Ordering question**: Should this be done *after* firmware shutdown rather than before? If firmware operations require certain clock states, the current order might cause issues.

```c
@@ -406,15 +407,15 @@ static int aie2_hw_start(struct amdxdna_dev *xdna)
 		goto stop_psp;
 	}
 
-	ret = aie2_pm_init(ndev);
+	ret = aie2_mgmt_fw_init(ndev);
 	if (ret) {
-		XDNA_ERR(xdna, "failed to init pm, ret %d", ret);
+		XDNA_ERR(xdna, "initial mgmt firmware failed, ret %d", ret);
 		goto destroy_mgmt_chann;
 	}
 
-	ret = aie2_mgmt_fw_init(ndev);
+	ret = aie2_pm_init(ndev);
 	if (ret) {
-		XDNA_ERR(xdna, "initial mgmt firmware failed, ret %d", ret);
+		XDNA_ERR(xdna, "failed to init pm, ret %d", ret);
 		goto destroy_mgmt_chann;
 	}
```

**Issues at drivers/accel/amdxdna/aie2_pci.c:410-419:**

1. **Inadequate justification**: The commit message says "ensure that firmware is initialized in aie2_hw_start() before modifying clock-gating settings" but doesn't explain *why* this is necessary. Does `aie2_pm_init()` modify clock gating? If so, this should be explicit in the commit message.

2. **Broken error path**: Both error cases now `goto destroy_mgmt_chann`, but with the new ordering:
   - If `aie2_mgmt_fw_init()` fails, we skip destroying it (correct)
   - If `aie2_pm_init()` fails, we `goto destroy_mgmt_chann` but we've already initialized firmware - shouldn't we call `aie2_mgmt_fw_fini()` first?

   This looks like a bug in the error handling path. You need different labels:
   ```c
   ret = aie2_mgmt_fw_init(ndev);
   if (ret) {
       XDNA_ERR(xdna, "initial mgmt firmware failed, ret %d", ret);
       goto destroy_mgmt_chann;
   }

   ret = aie2_pm_init(ndev);
   if (ret) {
       XDNA_ERR(xdna, "failed to init pm, ret %d", ret);
       goto fini_mgmt_fw;  /* Need to cleanup firmware */
   }
   ```

3. **Typo in error message**: "initial mgmt firmware failed" should be "initialize mgmt firmware failed" or "mgmt firmware initialization failed" for consistency with the other message format.

4. **Testing concern**: Has this been tested with error injection to verify the error paths work correctly? The changed ordering could mask issues.

**Additional Concerns:**

1. **Missing symmetric cleanup**: In `aie2_hw_stop()`, the patch adds `aie2_runtime_cfg()` before `aie2_mgmt_fw_fini()`, but there's no corresponding operation in `aie2_hw_start()` after `aie2_pm_init()`. Is this asymmetry intentional?

2. **Race conditions**: What happens if suspend is called while turbo mode is being enabled? Is there locking to prevent races between runtime PM and mode changes?

3. **Power consumption**: If re-enabling clock gating fails during suspend, does the device stay in a high-power state?

**Suggested Fix:**

```c
diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
index 5b326e4610e6..XXXXXXXX 100644
--- a/drivers/accel/amdxdna/aie2_pci.c
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -323,6 +323,12 @@ static void aie2_hw_stop(struct amdxdna_dev *xdna)
 		return;
 	}
 
+	/*
+	 * Re-enable clock gating for suspend. Turbo mode disables it,
+	 * but suspend requires it to be enabled.
+	 */
+	ret = aie2_runtime_cfg(ndev, AIE2_RT_CFG_CLK_GATING, NULL);
+	WARN_ON(ret); /* Log but continue - shutdown is critical path */
+
 	aie2_mgmt_fw_fini(ndev);
 	xdna_mailbox_stop_channel(ndev->mgmt_chann);
 	xdna_mailbox_destroy_channel(ndev->mgmt_chann);
@@ -406,15 +412,20 @@ static int aie2_hw_start(struct amdxdna_dev *xdna)
 		goto stop_psp;
 	}
 
-	ret = aie2_pm_init(ndev);
+	/*
+	 * Initialize firmware before PM to ensure clock gating
+	 * configuration happens after firmware is ready.
+	 */
+	ret = aie2_mgmt_fw_init(ndev);
 	if (ret) {
-		XDNA_ERR(xdna, "failed to init pm, ret %d", ret);
+		XDNA_ERR(xdna, "mgmt firmware init failed, ret %d", ret);
 		goto destroy_mgmt_chann;
 	}
 
-	ret = aie2_mgmt_fw_init(ndev);
+	ret = aie2_pm_init(ndev);
 	if (ret) {
-		XDNA_ERR(xdna, "initial mgmt firmware failed, ret %d", ret);
+		XDNA_ERR(xdna, "failed to init pm, ret %d", ret);
+		aie2_mgmt_fw_fini(ndev);
 		goto destroy_mgmt_chann;
 	}
```

**Summary:**

The patch addresses a real bug but needs revision for:
- Proper error handling in the suspend path
- Fixed error cleanup in the resume path  
- Better documentation of ordering requirements
- Clearer commit message with Fixes: tag

---
Generated by Claude Code Patch Reviewer

  parent reply	other threads:[~2026-02-11  6:15 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-10 19:16 [PATCH V1] accel/amdxdna: Fix suspend failure after enabling turbo mode Lizhi Hou
2026-02-11  6:14 ` Claude review: " Claude Code Review Bot
2026-02-11  6:15 ` Claude Code Review Bot [this message]
2026-02-11 19:23 ` [PATCH V1] " Mario Limonciello

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=review-patch1-20260210191652.312000-1-lizhi.hou@amd.com \
    --to=claude-review@example.com \
    --cc=dri-devel-reviews@example.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox