public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
* Claude review: drm/imagination: Synchronize interrupts before suspending the GPU
  2026-03-10 11:41 ` [PATCH 1/2] drm/imagination: Synchronize " Alessio Belle
@ 2026-03-11  3:25   ` Claude Code Review Bot
  0 siblings, 0 replies; 7+ messages in thread
From: Claude Code Review Bot @ 2026-03-11  3:25 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Correctness: Good**

The core fix is adding `synchronize_irq()` in `pvr_power_fw_disable()` when called from the runtime PM suspend path (`rpm_suspend=true`). This ensures any in-progress threaded IRQ handler completes before `pvr_fw_stop()` shuts down the firmware and the GPU is powered off.

The removal of `pvr_power_get()`/`pvr_power_put()` from the safety event handling in the threaded IRQ handler is correct and necessary. The commit message explains this well:

1. The threaded IRQ handler is already running because an actual hardware interrupt occurred, so the GPU must be powered on at that point.
2. The `synchronize_irq()` in the suspend path blocks the suspend callback until the threaded handler finishes, so the GPU stays powered throughout the handler's execution.
3. Keeping `pvr_power_get()` in the handler would deadlock: `synchronize_irq()` is called from the runtime PM suspend callback (which holds the PM lock), and `pvr_power_get()` → `pm_runtime_resume_and_get()` requires that same lock.

**Minor observation**: The `rpm_suspend` boolean parameter added to `pvr_power_fw_disable` is a reasonable approach, though an alternative would have been to call `synchronize_irq()` directly in `pvr_power_device_suspend()`. The chosen approach keeps the synchronization close to the firmware stop, which is arguably the right abstraction level.

**Note on the hardirq handler**: The primary hardirq handler (`pvr_device_irq_handler`) also accesses GPU registers via `pvr_fw_irq_pending()` and `pvr_device_safety_irq_pending()`. There is a small theoretical window between `synchronize_irq()` returning and `pvr_fw_stop()` completing where a new hardirq could fire and access registers. Patch 2 addresses this properly.

**Stable backport note**: The Fixes/Cc-stable tags and the below-the-fold explanation about which commit affects which kernel version are well done, making backporting easier for the stable maintainers.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Claude review: drm/imagination: Synchronize interrupts before suspending the GPU
  2026-05-22 13:15 [PATCH 6.12.y] " Alessio Belle
@ 2026-05-25  8:34 ` Claude Code Review Bot
  2026-05-25  8:34 ` Claude Code Review Bot
  1 sibling, 0 replies; 7+ messages in thread
From: Claude Code Review Bot @ 2026-05-25  8:34 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: drm/imagination: Synchronize interrupts before suspending the GPU
Author: Alessio Belle <alessio.belle@imgtec.com>
Patches: 2
Reviewed: 2026-05-25T18:34:26.620059

---

This is a single-patch backport targeting the 6.12 stable kernel. It fixes a real race condition where the runtime PM suspend callback can power off the GPU while the threaded IRQ handler is still running on another CPU, causing a synchronous external abort when the handler tries to read GPU registers. The fix is straightforward: call `synchronize_irq()` after requesting firmware power-off but before calling `pvr_fw_stop()`, ensuring any in-flight IRQ handler has completed.

The approach is a sensible minimal backport. The upstream commit (2d7f05cddf4c) on drm-next went further — it uses `disable_irq()` / `enable_irq()` pairs in the suspend/resume paths (and adds a corresponding `pvr_power_fw_enable` signature change with `rpm_resume`). This 6.12 backport intentionally takes only the synchronization part without the full disable/enable refactoring, which is a reasonable choice for a stable backport: it fixes the race with minimal code churn and avoids the need to also modify the resume path.

**Verdict: The patch looks correct and suitable for stable.**

One minor observation and one potential concern are noted below.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Claude review: drm/imagination: Synchronize interrupts before suspending the GPU
  2026-05-22 13:15 [PATCH 6.12.y] " Alessio Belle
  2026-05-25  8:34 ` Claude review: " Claude Code Review Bot
@ 2026-05-25  8:34 ` Claude Code Review Bot
  1 sibling, 0 replies; 7+ messages in thread
From: Claude Code Review Bot @ 2026-05-25  8:34 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Correctness of the fix:**

The core change is sound. Adding `synchronize_irq()` between the firmware power-off request and `pvr_fw_stop()` ensures any in-flight threaded IRQ handler completes before the GPU is actually stopped:

```c
+	if (rpm_suspend) {
+		/* Wait for late processing of GPU or firmware IRQs in other cores */
+		synchronize_irq(pvr_dev->irq);
+	}
+
 	return pvr_fw_stop(pvr_dev);
```

`synchronize_irq()` waits for both hardirq and threaded IRQ handlers to complete, which is exactly what's needed given the crash trace shows `pvr_device_irq_thread_handler` as the offending function.

**Placement is correct:** The call is placed after `pvr_power_request_pwr_off()` (which tells the firmware to stop generating new interrupts) but before `pvr_fw_stop()` (which actually powers off the GPU). This ordering ensures: (1) no new IRQs will be generated by the firmware, and (2) any already-in-flight handler finishes before hardware access is revoked.

**Boolean parameter addition:**

```c
-pvr_power_fw_disable(struct pvr_device *pvr_dev, bool hard_reset)
+pvr_power_fw_disable(struct pvr_device *pvr_dev, bool hard_reset, bool rpm_suspend)
```

The two call sites are updated correctly:
- `pvr_power_device_suspend()` passes `true` — this is the runtime PM path that needs synchronization
- `pvr_power_reset()` passes `false` — this path already handles IRQs differently (via `disable_irq`/`disable_irq_nosync` earlier in the reset sequence, which on 6.12 would be just `disable_irq`)

**Minor observation — `synchronize_irq()` vs upstream's `disable_irq()`:**

The upstream drm-next version uses `disable_irq()` instead of `synchronize_irq()`, which is strictly stronger — it both synchronizes *and* prevents new IRQs from being delivered while the GPU is off. This backport's use of `synchronize_irq()` alone leaves a small theoretical window: a new IRQ could arrive (e.g., a spurious interrupt or shared IRQ line) after `synchronize_irq()` returns but before `pvr_fw_stop()` completes. In practice this is unlikely to be a problem since: (1) the firmware has already been told to power off, so it shouldn't generate new IRQs, and (2) the IRQ handler should handle spurious interrupts gracefully. The comment in the patch description acknowledges that this is a subset of the upstream fix, which is acceptable for a minimal stable backport.

**Commit message quality:** Good. Clearly explains the race condition, why it's different from the upstream commit, and provides a concrete crash signature. The `Fixes:` tag and `Cc: stable` are appropriate.

**No issues found.** The patch is a clean, minimal, and correct stable backport.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 6.12.y v2] drm/imagination: Synchronize interrupts before suspending the GPU
@ 2026-05-26  8:13 Alessio Belle
  2026-05-26 13:38 ` Sasha Levin
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Alessio Belle @ 2026-05-26  8:13 UTC (permalink / raw)
  To: stable
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Frank Binns, Matt Coster, Brajesh Gupta, dri-devel,
	linux-kernel, Alessio Belle

commit 2d7f05cddf4c268cc36256a2476946041dbdd36d upstream.

The runtime PM suspend callback doesn't know whether the IRQ handler is
in progress on a different CPU core and doesn't wait for it to finish.

Depending on timing, the IRQ handler could be running while the GPU is
suspended, leading to it being killed when trying to access GPU
registers. See example signature below.

In a power off sequence initiated by the runtime PM suspend callback,
wait for any IRQ handlers in progress on other CPU cores to finish, by
calling synchronize_irq().

This version of the patch contains only the part of the upstream commit
that applies to 6.12; the rest was a revert of code added in 6.16.
The second paragraph above is different because on 6.12 this kind of bug
doesn't seem to crash the entire kernel, only the IRQ handler, leaving
the driver unusable in practice.

The crash signature below is also different, both because of the above,
and because there was no support for TI AM68 SK in 6.12.

Example signature on a TI AM62 SK platform:

  [ 7827.189088] Internal error: synchronous external abort: 0000000096000010 [#1] PREEMPT SMP
  [ 7827.197311] Modules linked in:
  [ 7827.222015] CPU: 0 UID: 0 PID: 461 Comm: irq/405-gpu Tainted: G   M               6.12.90 #5
  [ 7827.230461] Tainted: [M]=MACHINE_CHECK
  [ 7827.234203] Hardware name: Texas Instruments AM625 SK (DT)
  [ 7827.239682] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  [ 7827.246637] pc : pvr_device_irq_thread_handler+0x64/0x180 [powervr]
  [ 7827.252941] lr : irq_thread_fn+0x2c/0xa8
  [ 7827.256872] sp : ffff800082d8bd50
  [ 7827.260179] x29: ffff800082d8bd70 x28: ffff8000800ce374 x27: ffff800081829cc0
  [ 7827.267328] x26: ffff000004701e80 x25: ffff000005b884ac x24: ffff000005bd5780
  [ 7827.274472] x23: ffff00000da40bc0 x22: ffff00000da40ba0 x21: ffff800082d8bd58
  [ 7827.281614] x20: ffff00000da40000 x19: ffff000004701e80 x18: 08000000c6af9003
  [ 7827.288750] x17: 0000000000000010 x16: 0000000000000068 x15: 0df234008df66400
  [ 7827.295886] x14: 0000000000000000 x13: 000005c68f6e7191 x12: 000000000000025e
  [ 7827.303020] x11: 00000000000000c0 x10: 0000000000000ac0 x9 : ffff800082d8bd00
  [ 7827.310157] x8 : ffff000005bd62a0 x7 : ffff000077261380 x6 : 00000000000005c6
  [ 7827.317292] x5 : 000000000000425e x4 : 0000000000000000 x3 : 0000000000000000
  [ 7827.324428] x2 : 00000000000008a8 x1 : ffff800082d608a8 x0 : ffff000005bd5780
  [ 7827.331568] Call trace:
  [ 7827.334011]  pvr_device_irq_thread_handler+0x64/0x180 [powervr]
  [ 7827.339954]  irq_thread_fn+0x2c/0xa8
  [ 7827.343530]  irq_thread+0x16c/0x2f4
  [ 7827.347019]  kthread+0x110/0x114
  [ 7827.350248]  ret_from_fork+0x10/0x20
  [ 7827.353834] Code: f9446682 f943c281 b9404442 8b020021 (b9400021)
  [ 7827.359921] ---[ end trace 0000000000000000 ]---
  [ 7827.364820] genirq: exiting task "irq/405-gpu" (461) is an active IRQ thread (irq 405)
  [ 8011.230278] powervr fd00000.gpu: Job timeout
  [ 8011.230350] powervr fd00000.gpu: Job timeout
  [ 8011.230426] powervr fd00000.gpu: Job timeout

Fixes: cc1aeedb98ad ("drm/imagination: Implement firmware infrastructure and META FW support")
Fixes: 96822d38ff57 ("drm/imagination: Handle Rogue safety event IRQs")
Cc: stable@vger.kernel.org
Signed-off-by: Alessio Belle <alessio.belle@imgtec.com>
Reviewed-by: Matt Coster <matt.coster@imgtec.com>
Link: https://patch.msgid.link/20260310-drain-irqs-before-suspend-v1-1-bf4f9ed68e75@imgtec.com
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
---
This is a backport of the parts relevant to 6.12 of [1].

[1] https://lore.kernel.org/all/20260310-drain-irqs-before-suspend-v1-0-bf4f9ed68e75@imgtec.com/
---
Changes in v2:
- Restored missing upstream trailers
- Link to v1: https://lore.kernel.org/r/20260522-sync-irqs-6-12-v1-1-b0ecc9675078@imgtec.com
---
 drivers/gpu/drm/imagination/pvr_power.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/imagination/pvr_power.c b/drivers/gpu/drm/imagination/pvr_power.c
index bf4cf8426f91..077d2651798c 100644
--- a/drivers/gpu/drm/imagination/pvr_power.c
+++ b/drivers/gpu/drm/imagination/pvr_power.c
@@ -84,7 +84,7 @@ pvr_power_request_pwr_off(struct pvr_device *pvr_dev)
 }
 
 static int
-pvr_power_fw_disable(struct pvr_device *pvr_dev, bool hard_reset)
+pvr_power_fw_disable(struct pvr_device *pvr_dev, bool hard_reset, bool rpm_suspend)
 {
 	if (!hard_reset) {
 		int err;
@@ -100,6 +100,11 @@ pvr_power_fw_disable(struct pvr_device *pvr_dev, bool hard_reset)
 			return err;
 	}
 
+	if (rpm_suspend) {
+		/* Wait for late processing of GPU or firmware IRQs in other cores */
+		synchronize_irq(pvr_dev->irq);
+	}
+
 	return pvr_fw_stop(pvr_dev);
 }
 
@@ -243,7 +248,7 @@ pvr_power_device_suspend(struct device *dev)
 		return -EIO;
 
 	if (pvr_dev->fw_dev.booted) {
-		err = pvr_power_fw_disable(pvr_dev, false);
+		err = pvr_power_fw_disable(pvr_dev, false, true);
 		if (err)
 			goto err_drm_dev_exit;
 	}
@@ -425,7 +430,7 @@ pvr_power_reset(struct pvr_device *pvr_dev, bool hard_reset)
 			queues_disabled = true;
 		}
 
-		err = pvr_power_fw_disable(pvr_dev, hard_reset);
+		err = pvr_power_fw_disable(pvr_dev, hard_reset, false);
 		if (!err) {
 			if (hard_reset) {
 				pvr_dev->fw_dev.booted = false;

---
base-commit: 2538fbeff8a94ee2b54eb09d92209e24a1e650d4
change-id: 20260522-sync-irqs-6-12-f14d5b26a84e

Best regards,
-- 
Alessio Belle <alessio.belle@imgtec.com>


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 6.12.y v2] drm/imagination: Synchronize interrupts before suspending the GPU
  2026-05-26  8:13 [PATCH 6.12.y v2] drm/imagination: Synchronize interrupts before suspending the GPU Alessio Belle
@ 2026-05-26 13:38 ` Sasha Levin
  2026-05-27  5:07 ` Claude review: " Claude Code Review Bot
  2026-05-27  5:07 ` Claude Code Review Bot
  2 siblings, 0 replies; 7+ messages in thread
From: Sasha Levin @ 2026-05-26 13:38 UTC (permalink / raw)
  To: Alessio Belle
  Cc: Sasha Levin, stable, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Frank Binns,
	Matt Coster, Brajesh Gupta, dri-devel, linux-kernel

On Tue, May 26, 2026 at 09:13:07AM +0100, Alessio Belle wrote:
> This version of the patch contains only the part of the upstream commit
> that applies to 6.12; the rest was a revert of code added in 6.16.

Queued for 6.12, thanks.

-- 
Thanks,
Sasha

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Claude review: drm/imagination: Synchronize interrupts before suspending the GPU
  2026-05-26  8:13 [PATCH 6.12.y v2] drm/imagination: Synchronize interrupts before suspending the GPU Alessio Belle
  2026-05-26 13:38 ` Sasha Levin
@ 2026-05-27  5:07 ` Claude Code Review Bot
  2026-05-27  5:07 ` Claude Code Review Bot
  2 siblings, 0 replies; 7+ messages in thread
From: Claude Code Review Bot @ 2026-05-27  5:07 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: drm/imagination: Synchronize interrupts before suspending the GPU
Author: Alessio Belle <alessio.belle@imgtec.com>
Patches: 2
Reviewed: 2026-05-27T15:07:57.136689

---

This is a single-patch stable backport for kernel 6.12 that addresses a real race condition in the `drm/imagination` (PowerVR) driver. During runtime PM suspend, the GPU can be powered off while an IRQ handler is still executing on another CPU core, causing a synchronous external abort when the handler tries to access GPU registers.

The upstream fix (commit `2d7f05cddf4c`) uses `disable_irq()`/`enable_irq()` pairs in `pvr_power_fw_disable`/`pvr_power_fw_enable`, which is the complete solution. This backport correctly identifies that 6.12 lacks the `pvr_power_fw_enable` counterpart infrastructure (and other code added in 6.16) and instead uses the simpler `synchronize_irq()` — a minimal, appropriate fix for the stable branch.

The patch is well-motivated with a clear crash signature, properly tagged with `Fixes:` and `Cc: stable`, carries the upstream `Reviewed-by` and `Signed-off-by` trailers, and is v2 addressing prior review feedback about missing trailers.

**Verdict: This patch looks correct and appropriate for 6.12 stable.**

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Claude review: drm/imagination: Synchronize interrupts before suspending the GPU
  2026-05-26  8:13 [PATCH 6.12.y v2] drm/imagination: Synchronize interrupts before suspending the GPU Alessio Belle
  2026-05-26 13:38 ` Sasha Levin
  2026-05-27  5:07 ` Claude review: " Claude Code Review Bot
@ 2026-05-27  5:07 ` Claude Code Review Bot
  2 siblings, 0 replies; 7+ messages in thread
From: Claude Code Review Bot @ 2026-05-27  5:07 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Correctness of the approach:**

The patch adds a `synchronize_irq()` call in `pvr_power_fw_disable()` when called from the runtime PM suspend path (`rpm_suspend == true`), placed after the firmware has been told to idle and power off but before `pvr_fw_stop()` actually shuts down hardware:

```c
+	if (rpm_suspend) {
+		/* Wait for late processing of GPU or firmware IRQs in other cores */
+		synchronize_irq(pvr_dev->irq);
+	}
+
 	return pvr_fw_stop(pvr_dev);
```

This placement is correct — by the time we reach this point, the FW has acknowledged the idle and power-off requests, so no new IRQs should be generated, and `synchronize_irq()` ensures any in-flight handler completes before proceeding with `pvr_fw_stop()`.

**API parameter addition:**

The `bool rpm_suspend` parameter is added to `pvr_power_fw_disable()`:

```c
-pvr_power_fw_disable(struct pvr_device *pvr_dev, bool hard_reset)
+pvr_power_fw_disable(struct pvr_device *pvr_dev, bool hard_reset, bool rpm_suspend)
```

Both call sites are updated correctly:
- `pvr_power_device_suspend()` passes `true` — this is the runtime PM path that needs the synchronization
- `pvr_power_reset()` passes `false` — this path already handles IRQs separately (it calls `disable_irq()`/`disable_irq_nosync()` before entering the reset loop)

**Comparison with upstream:**

The upstream version on drm-next uses `disable_irq()`/`enable_irq()` instead of `synchronize_irq()`, which is stronger — it not only waits for in-flight handlers but also prevents new IRQs from being delivered. However, for the 6.12 stable context, `synchronize_irq()` is sufficient because:
1. By the time it's called, the FW has acknowledged power-off, so no new IRQs should fire
2. It avoids the need for a matching `enable_irq()` in `pvr_power_fw_enable()`, which would require additional changes to a function that doesn't exist in the same form on 6.12

This is a deliberate and reasonable simplification for the stable backport, and the commit message documents this explicitly.

**Minor observations (not blocking):**

- The comment `/* Wait for late processing of GPU or firmware IRQs in other cores */` is clear and explains the "why."
- The `v2` changelog properly notes the only changes were restoring missing upstream trailers, so the core logic is unchanged from v1 which was already reviewed.

**No issues found.** The patch is a clean, minimal stable backport that correctly addresses the race condition.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-05-27  5:07 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-26  8:13 [PATCH 6.12.y v2] drm/imagination: Synchronize interrupts before suspending the GPU Alessio Belle
2026-05-26 13:38 ` Sasha Levin
2026-05-27  5:07 ` Claude review: " Claude Code Review Bot
2026-05-27  5:07 ` Claude Code Review Bot
  -- strict thread matches above, loose matches on Subject: below --
2026-05-22 13:15 [PATCH 6.12.y] " Alessio Belle
2026-05-25  8:34 ` Claude review: " Claude Code Review Bot
2026-05-25  8:34 ` Claude Code Review Bot
2026-03-10 11:41 [PATCH 0/2] drm/imagination: Drain " Alessio Belle
2026-03-10 11:41 ` [PATCH 1/2] drm/imagination: Synchronize " Alessio Belle
2026-03-11  3:25   ` Claude review: " Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox