public inbox for drm-ai-reviews@public-inbox.freedesktop.org
* [PATCH v2] drm/xe/ggtt: hold FORCEWAKE while updating GGTT PTEs on LNL
@ 2026-05-20 19:44 Nikolay Mikhaylov
  2026-05-25 10:52 ` Claude review: " Claude Code Review Bot
  2026-05-25 10:52 ` Claude Code Review Bot
  0 siblings, 2 replies; 3+ messages in thread
From: Nikolay Mikhaylov @ 2026-05-20 19:44 UTC
  To: intel-xe
  Cc: Matthew Brost, Thomas Hellström, Rodrigo Vivi, David Airlie,
	Simona Vetter, dri-devel, Nikolay Mikhaylov

GGTT PTEs are written through GSM using MMIO writes. On Lunar Lake
systems affected by the referenced issue, hangs have been observed around
GGTT update paths while the GT may be entering RC6 under GuC control.

The GGTT modify paths currently rely on xe_pm_runtime_get_noresume() for
power management protection. That prevents the device from entering D3,
but does not keep the GT out of RC6 while the device is otherwise runtime
PM active.

Hold GT FORCEWAKE across the observed GGTT PTE write batches:

  - xe_ggtt_insert_node_transform() and __xe_ggtt_insert_bo_at(),
    covering the display framebuffer pin path observed as the primary
    trigger on LNL/Wayland systems
  - xe_ggtt_clear(), covering the GGTT node removal/unpin path

This keeps the change limited to the paths where the hang has been
observed. The exact code shape submitted here has been tested by multiple
LNL users, including several weeks of uptime without reproducing the hang.

Boot-time xe_ggtt_initial_clear() is also covered by the xe_ggtt_clear()
wrap. That is incidental and not the primary load-bearing path for the
reported issue.

The insert-path FORCEWAKE wrapping was originally proposed by Márton Vigh
(@mrtnvgh):
https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7513#note_3418761

Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7513
Tested-by: Nikolay Mikhaylov <sonny@milton.pro>
Signed-off-by: Nikolay Mikhaylov <sonny@milton.pro>
---
 drivers/gpu/drm/xe/xe_ggtt.c | 39 ++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ggtt.c b/drivers/gpu/drm/xe/xe_ggtt.c
index a351c578b170..c048bad70ebf 100644
--- a/drivers/gpu/drm/xe/xe_ggtt.c
+++ b/drivers/gpu/drm/xe/xe_ggtt.c
@@ -20,6 +20,8 @@
 #include "regs/xe_regs.h"
 #include "xe_assert.h"
 #include "xe_bo.h"
+#include "xe_device.h"
+#include "xe_force_wake.h"
 #include "xe_gt_printk.h"
 #include "xe_gt_types.h"
 #include "xe_map.h"
@@ -272,9 +274,18 @@ static void xe_ggtt_clear(struct xe_ggtt *ggtt, u64 start, u64 size)
 	else
 		scratch_pte = 0;
 
-	while (start < end) {
-		ggtt->pt_ops->ggtt_set_pte(ggtt, start, scratch_pte);
-		start += XE_PAGE_SIZE;
+	/*
+	 * GSM (mapped at tile->mmio.regs + SZ_8M) is not in an always-on
+	 * power domain. Hold FORCEWAKE for the PTE write batch to keep
+	 * the GT awake; on LNL GuC autonomously enters RC6 via
+	 * GUCRC_FIRMWARE_CONTROL and writeq() to GSM hangs if the GT
+	 * is asleep.
+	 */
+	xe_with_force_wake(fw_ref, gt_to_fw(ggtt->tile->primary_gt), XE_FW_GT) {
+		while (start < end) {
+			ggtt->pt_ops->ggtt_set_pte(ggtt, start, scratch_pte);
+			start += XE_PAGE_SIZE;
+		}
 	}
 }
 
@@ -769,10 +780,19 @@ struct xe_ggtt_node *xe_ggtt_insert_node_transform(struct xe_ggtt *ggtt,
 	if (ret)
 		goto err_unlock;
 
-	if (transform)
-		transform(ggtt, node, pte_flags, ggtt->pt_ops->ggtt_set_pte, arg);
-	else
-		xe_ggtt_map_bo(ggtt, node, bo, pte_flags);
+	/*
+	 * Hold FORCEWAKE for the PTE write batch. xe_pm_runtime_get_noresume()
+	 * upstack only prevents D3, not RC6: GuC may have placed the GT into
+	 * RC6 autonomously (GUCRC_FIRMWARE_CONTROL on LNL), and writeq() to
+	 * GSM hangs if the GT is asleep. Triggers most often from the display
+	 * framebuffer pin path on LNL/Wayland.
+	 */
+	xe_with_force_wake(fw_ref, gt_to_fw(ggtt->tile->primary_gt), XE_FW_GT) {
+		if (transform)
+			transform(ggtt, node, pte_flags, ggtt->pt_ops->ggtt_set_pte, arg);
+		else
+			xe_ggtt_map_bo(ggtt, node, bo, pte_flags);
+	}
 
 	mutex_unlock(&ggtt->lock);
 	return node;
@@ -844,7 +864,10 @@ static int __xe_ggtt_insert_bo_at(struct xe_ggtt *ggtt, struct xe_bo *bo,
 		u16 pat_index = xe_cache_pat_idx(tile_to_xe(ggtt->tile), cache_mode);
 		u64 pte = ggtt->pt_ops->pte_encode_flags(bo, pat_index);
 
-		xe_ggtt_map_bo(ggtt, bo->ggtt_node[tile_id], bo, pte);
+		/* See xe_ggtt_insert_node_transform()/xe_ggtt_clear() */
+		xe_with_force_wake(fw_ref, gt_to_fw(ggtt->tile->primary_gt), XE_FW_GT) {
+			xe_ggtt_map_bo(ggtt, bo->ggtt_node[tile_id], bo, pte);
+		}
 	}
 	mutex_unlock(&ggtt->lock);
 
-- 
2.53.0



* Claude review: drm/xe/ggtt: hold FORCEWAKE while updating GGTT PTEs on LNL
  2026-05-20 19:44 [PATCH v2] drm/xe/ggtt: hold FORCEWAKE while updating GGTT PTEs on LNL Nikolay Mikhaylov
@ 2026-05-25 10:52 ` Claude Code Review Bot
  2026-05-25 10:52 ` Claude Code Review Bot
  1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 10:52 UTC
  To: dri-devel-reviews

Overall Series Review

Subject: drm/xe/ggtt: hold FORCEWAKE while updating GGTT PTEs on LNL
Author: Nikolay Mikhaylov <sonny@milton.pro>
Patches: 1
Reviewed: 2026-05-25T20:52:12.654181

---

This single-patch series adds FORCEWAKE protection around GGTT PTE writes in the xe driver to prevent hangs on Lunar Lake (LNL) systems. The commit message identifies the gap clearly: `xe_pm_runtime_get_noresume()` prevents D3 entry but does not prevent RC6, and on LNL the GuC can enter RC6 autonomously via `GUCRC_FIRMWARE_CONTROL`. According to the patch, a `writeq()` to GSM hangs when the GT is in RC6.
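
To make the D3 versus RC6 distinction concrete, here is a toy userspace model. It is not driver code: `struct mock_gpu` and every `mock_*` function are hypothetical names invented for illustration, and the "write is safe" rule simply restates the patch's claim rather than documented hardware behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not driver code: two independent reference counts. */
struct mock_gpu {
	int rpm_refs;	/* stands in for runtime PM references (block D3) */
	int fw_refs;	/* stands in for GT FORCEWAKE references (block RC6) */
};

/* The device may enter D3 only when no runtime PM reference is held. */
static bool mock_may_enter_d3(const struct mock_gpu *gpu)
{
	return gpu->rpm_refs == 0;
}

/* The GT may enter RC6 whenever no FORCEWAKE reference is held. */
static bool mock_may_enter_rc6(const struct mock_gpu *gpu)
{
	return gpu->fw_refs == 0;
}

/* A GSM write is modeled as safe only when neither state is reachable. */
static bool mock_gsm_write_is_safe(const struct mock_gpu *gpu)
{
	return !mock_may_enter_d3(gpu) && !mock_may_enter_rc6(gpu);
}

/* Models one PTE write; asserts instead of hanging when unprotected. */
static void mock_gsm_write(const struct mock_gpu *gpu, unsigned long *pte,
			   unsigned long val)
{
	assert(mock_gsm_write_is_safe(gpu));
	*pte = val;
}
```

In this model a runtime PM reference alone leaves `mock_may_enter_rc6()` true, which is the gap the patch closes by also holding FORCEWAKE.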

The patch is well-motivated with real user testing backing it. The approach of wrapping PTE write batches in `xe_with_force_wake()` is reasonable and narrowly targeted.

**Concern:** There are additional GGTT PTE write paths in the same file that are **not** wrapped with FORCEWAKE:

1. **`xe_ggtt_map_bo_unlocked()`** (line 735–744) — called from `xe_bo_evict.c` to restore GGTT mappings after suspend. This calls `xe_ggtt_map_bo()` which writes PTEs via `ggtt_set_pte`. This path is not protected.

2. **`xe_ggtt_assign_locked()`** (line 971–987) — SR-IOV PF path that writes PTEs in a loop. Not protected.

3. **`xe_ggtt_node_load()`** (line 1053–1080) — SR-IOV node load path that writes PTEs in a loop. Not protected.

The commit message acknowledges the change is limited to "paths where the hang has been observed," which is a defensible position for a targeted fix. However, `xe_ggtt_map_bo_unlocked()` is called during resume from suspend and seems like it could plausibly hit the same issue on LNL if the GT re-enters RC6 quickly after resume.

---
Generated by Claude Code Patch Reviewer


* Claude review: drm/xe/ggtt: hold FORCEWAKE while updating GGTT PTEs on LNL
  2026-05-20 19:44 [PATCH v2] drm/xe/ggtt: hold FORCEWAKE while updating GGTT PTEs on LNL Nikolay Mikhaylov
  2026-05-25 10:52 ` Claude review: " Claude Code Review Bot
@ 2026-05-25 10:52 ` Claude Code Review Bot
  1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 10:52 UTC
  To: dri-devel-reviews

Patch Review

**Code correctness:** The usage of `xe_with_force_wake()` is correct. The macro declares `fw_ref` as a scoped local variable, so no pre-declaration is needed. `gt_to_fw(ggtt->tile->primary_gt)` correctly gets the force wake handle from the primary GT, and `XE_FW_GT` is the right domain.
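
For readers unfamiliar with scope-based helpers, the following userspace sketch shows one way a macro can declare its own reference variable and release it automatically. It is an assumption-laden illustration, not the kernel's definition of `xe_with_force_wake()`: all `mock_*` names are hypothetical, and it relies on the GCC/Clang `cleanup` attribute.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the force wake handle and its reference. */
struct mock_fw {
	int refcount;
};

struct mock_fw_ref {
	struct mock_fw *fw;
	bool done;	/* lets the for-loop body run exactly once */
};

static struct mock_fw_ref mock_fw_get(struct mock_fw *fw)
{
	fw->refcount++;
	return (struct mock_fw_ref){ .fw = fw, .done = false };
}

/* Cleanup handler: runs when the reference variable leaves scope. */
static void mock_fw_put(struct mock_fw_ref *ref)
{
	ref->fw->refcount--;
}

/* The macro itself declares `ref`, so callers need no pre-declaration. */
#define mock_with_force_wake(ref, fw)					\
	for (struct mock_fw_ref ref					\
	     __attribute__((cleanup(mock_fw_put))) = mock_fw_get(fw);	\
	     !ref.done; ref.done = true)

/* Writes n mock PTEs while the mock reference is held. */
static int mock_write_ptes(struct mock_fw *fw, unsigned long *ptes, int n,
			   unsigned long val)
{
	int written = 0;

	mock_with_force_wake(fw_ref, fw) {
		for (int i = 0; i < n; i++) {
			assert(fw->refcount > 0);	/* held awake here */
			ptes[i] = val;
			written++;
		}
	}
	return written;	/* reference already dropped at this point */
}
```

The point of the pattern is that the reference is dropped on every exit from the block, so the get and put cannot become unbalanced by a later edit to the loop body.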

**`xe_ggtt_clear()` wrapping (lines 277–289):**
```c
xe_with_force_wake(fw_ref, gt_to_fw(ggtt->tile->primary_gt), XE_FW_GT) {
    while (start < end) {
        ggtt->pt_ops->ggtt_set_pte(ggtt, start, scratch_pte);
        start += XE_PAGE_SIZE;
    }
}
```
Looks correct. The comment is informative and explains the why clearly.

**`xe_ggtt_insert_node_transform()` wrapping (lines 783–795):**
```c
xe_with_force_wake(fw_ref, gt_to_fw(ggtt->tile->primary_gt), XE_FW_GT) {
    if (transform)
        transform(ggtt, node, pte_flags, ggtt->pt_ops->ggtt_set_pte, arg);
    else
        xe_ggtt_map_bo(ggtt, node, bo, pte_flags);
}
```
Correct. Note that the `transform` callback receives `ggtt_set_pte` and will call it inside the FORCEWAKE scope, which is the right thing.

**`__xe_ggtt_insert_bo_at()` wrapping (lines 867–870):**
```c
xe_with_force_wake(fw_ref, gt_to_fw(ggtt->tile->primary_gt), XE_FW_GT) {
    xe_ggtt_map_bo(ggtt, bo->ggtt_node[tile_id], bo, pte);
}
```
Correct.

**Nit (platform scoping):** The FORCEWAKE is taken unconditionally on all platforms, not just LNL, although the commit message frames this as an LNL issue. If taking FORCEWAKE is cheap on platforms that do not need it, this is fine and keeps the code simpler. If the maintainers want it gated to affected platforms, they may request a workaround check or platform conditional.
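
The two options in this nit can be compared with a small userspace sketch. Everything here is hypothetical: `mock_needs_ggtt_fw()` stands in for whatever workaround or platform check the maintainers might prefer, and the counters replace real FORCEWAKE accounting.

```c
#include <assert.h>
#include <stdbool.h>

enum mock_platform { MOCK_PLATFORM_OTHER, MOCK_PLATFORM_LNL };

struct mock_dev {
	enum mock_platform platform;
	int fw_refs;	/* mock FORCEWAKE references currently held */
	int fw_gets;	/* number of times mock FORCEWAKE was taken */
};

/* Hypothetical gate: only LNL is treated as affected in this sketch. */
static bool mock_needs_ggtt_fw(const struct mock_dev *dev)
{
	return dev->platform == MOCK_PLATFORM_LNL;
}

/*
 * Writes n mock PTEs. With gate == false the reference is always taken
 * (the shape of the submitted patch); with gate == true it is taken
 * only on platforms the hypothetical check flags.
 */
static void mock_update_ptes(struct mock_dev *dev, unsigned long *ptes,
			     int n, unsigned long val, bool gate)
{
	bool take = !gate || mock_needs_ggtt_fw(dev);

	if (take) {
		dev->fw_refs++;
		dev->fw_gets++;
	}

	for (int i = 0; i < n; i++)
		ptes[i] = val;

	if (take)
		dev->fw_refs--;

	assert(dev->fw_refs >= 0);	/* gets and puts stay balanced */
}
```

The gated variant saves a reference on unaffected platforms at the cost of a condition that must be kept in sync with the list of affected hardware.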

**Missing coverage of `xe_ggtt_map_bo_unlocked()`:** This function at line 735 calls `xe_ggtt_map_bo()` without FORCEWAKE protection:
```c
void xe_ggtt_map_bo_unlocked(struct xe_ggtt *ggtt, struct xe_bo *bo)
{
    ...
    xe_ggtt_map_bo(ggtt, bo->ggtt_node[ggtt->tile->id], bo, pte);
    ...
}
```
This is called from `xe_bo_evict.c:208` during post-suspend BO restore. If the rationale is that GSM writes can hang when the GT is in RC6, this path has the same exposure. It may warrant the same wrapping, or at minimum a comment explaining why it's safe without FORCEWAKE (e.g., if the resume path guarantees GT is awake at that point).

**Missing coverage of SR-IOV paths (`xe_ggtt_assign_locked`, `xe_ggtt_node_load`):** These write PTEs in loops without FORCEWAKE. They are SR-IOV specific and may be less likely to trigger in practice, but are theoretically vulnerable to the same issue. Lower priority since SR-IOV on LNL may not be a common configuration.

**Overall:** The patch is a reasonable, narrowly-scoped fix for a real hang. The code changes are correct. The main question for the maintainer is whether `xe_ggtt_map_bo_unlocked()` should also be covered, and whether the unconditional FORCEWAKE (rather than LNL-gated) is acceptable.

---
Generated by Claude Code Patch Reviewer
