public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH] drm/xe: Do not blindly copy system CCS pages when shrinking
@ 2026-04-30 19:09 Matthew Brost
  2026-05-05  0:06 ` Claude review: " Claude Code Review Bot
  2026-05-05  0:06 ` Claude Code Review Bot
  0 siblings, 2 replies; 3+ messages in thread
From: Matthew Brost @ 2026-04-30 19:09 UTC (permalink / raw)
  To: intel-xe, dri-devel

If the PAT index for a BO is configured without compression, copying CCS
pages during BO shrinking provides no value. Skip CCS copies for BOs that
have never been configured with compressed PAT indexes.

Once a BO is configured with a compressed PAT index, this becomes a
one-way transition that enables CCS copies. This heuristic is imprecise,
but simple and functional.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>

---

When running 10 tabs of the WebGL aquarium demo, only about 1 in 6.5 BOs
that could enable compression (xe_bo_needs_ccs_pages returns true)
actually enabled compression via the PAT index, and overall CCS copies
were reduced by roughly half when the shrinker was triggered.
---
 drivers/gpu/drm/xe/xe_bo.c       | 2 ++
 drivers/gpu/drm/xe/xe_bo_types.h | 3 +++
 drivers/gpu/drm/xe/xe_vm.c       | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 5ce60d161e09..3fbb82ece4de 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1105,6 +1105,8 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
 			flags |= XE_MIGRATE_CLEAR_FLAG_CCS_DATA;
 
 		fence = xe_migrate_clear(migrate, bo, new_mem, flags);
+	} else if (handle_system_ccs && !bo->ccs_used) {
+		fence = dma_fence_get_stub();
 	} else {
 		fence = xe_migrate_copy(migrate, bo, bo, old_mem, new_mem,
 					handle_system_ccs);
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index 9c199badd9b2..76f73a3bf2ca 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -88,6 +88,9 @@ struct xe_bo {
 	/** @ccs_cleared: true means that CCS region of BO is already cleared */
 	bool ccs_cleared;
 
+	/** @ccs_used: true means that CCS region of BO is used */
+	bool ccs_used;
+
 	/** @bb_ccs: BB instructions of CCS read/write. Valid only for VF */
 	struct xe_mem_pool_node *bb_ccs[XE_SRIOV_VF_CCS_CTX_COUNT];
 
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index c3836f6eab35..638037e71e2d 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3054,6 +3054,9 @@ static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
 
 		if (flags.request_decompress)
 			err = xe_bo_decompress(bo);
+
+		if (xe_pat_index_get_comp_en(vm->xe, vma->attr.pat_index))
+			bo->ccs_used = true;
 	}
 
 	return err;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Claude review: drm/xe: Do not blindly copy system CCS pages when shrinking
  2026-04-30 19:09 [PATCH] drm/xe: Do not blindly copy system CCS pages when shrinking Matthew Brost
  2026-05-05  0:06 ` Claude review: " Claude Code Review Bot
@ 2026-05-05  0:06 ` Claude Code Review Bot
  1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-05-05  0:06 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: drm/xe: Do not blindly copy system CCS pages when shrinking
Author: Matthew Brost <matthew.brost@intel.com>
Patches: 1
Reviewed: 2026-05-05T10:06:13.444811

---

This is a single-patch optimization for the Xe driver's memory shrinker path on integrated (non-DGFX) GPUs. It avoids CCS (Compression Control Surface) page copies during BO shrinking when the BO has never been bound with a compressed PAT index. The commit message cites a ~6.5:1 ratio of eligible-but-unused CCS BOs and roughly 50% reduction in CCS copies — a meaningful optimization for memory pressure scenarios.

The approach is fundamentally sound: add a one-way `ccs_used` flag that transitions false-to-true when a VMA with a compressed PAT binds the BO, and skip the `xe_migrate_copy` in the move path when CCS was never used. However, there are a few issues worth addressing around locking documentation, error-path interaction, and the correctness of skipping the copy entirely (stub fence) versus just skipping the CCS portion.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Claude review: drm/xe: Do not blindly copy system CCS pages when shrinking
  2026-04-30 19:09 [PATCH] drm/xe: Do not blindly copy system CCS pages when shrinking Matthew Brost
@ 2026-05-05  0:06 ` Claude Code Review Bot
  2026-05-05  0:06 ` Claude Code Review Bot
  1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-05-05  0:06 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**xe_bo_c — stub fence branch (the core optimization)**

```c
+	} else if (handle_system_ccs && !bo->ccs_used) {
+		fence = dma_fence_get_stub();
 	} else {
 		fence = xe_migrate_copy(migrate, bo, bo, old_mem, new_mem,
 					handle_system_ccs);
```

When this branch fires, `move_lacks_source` is false (meaning `ccs_cleared` is already true from a prior clear and old_mem exists). The post-fence path at line 1130 then enters `ttm_bo_move_accel_cleanup()` with an already-signaled fence. This is functionally correct as a no-op move.

However, the key question is: **when we later move the BO back (SYSTEM -> TT) and `ccs_used` has since transitioned to true, what state are the CCS pages in?** If CCS data was never saved to the ttm_tt pages (because we returned a stub fence instead of calling `xe_migrate_copy` to read AUX CCS into the ttm_tt), then restoring from SYSTEM -> TT with `handle_system_ccs=true` would write back whatever garbage is in the ttm_tt CCS pages. The safety of this relies on the initial `xe_migrate_clear` having zeroed the CCS data in the ttm_tt pages, and the CCS pages not being reallocated with garbage across swap cycles. It would be worth adding a comment explaining why this is safe, or alternatively clearing the CCS pages in the ttm_tt when taking this shortcut to make the invariant explicit.

**xe_bo_types.h — new field**

```c
+	/** @ccs_used: true means that CCS region of BO is used */
+	bool ccs_used;
```

The field is placed next to `ccs_cleared`, which is good for locality. However, there is **no documentation of the locking requirements**. Compare with the nearby `atomic_access` field which documents "Protected by BO dma-resv lock." Since `ccs_used` is read in `xe_bo_move` (under resv lock via TTM) and written in `vma_lock_and_validate` (under resv lock via `drm_exec_lock_obj`), it should document this:

```c
/** @ccs_used: true means that CCS region of BO is used. Protected by dma-resv lock. */
```

The kdoc should also mention the one-way nature of the flag ("one-way transition: once set, never cleared"), since this is a key invariant that callers rely on.

**xe_vm.c — setting the flag**

```c
 		if (flags.request_decompress)
 			err = xe_bo_decompress(bo);
+
+		if (xe_pat_index_get_comp_en(vm->xe, vma->attr.pat_index))
+			bo->ccs_used = true;
 	}
 
 	return err;
```

Three observations:

1. **Error path interaction**: If `xe_bo_decompress(bo)` fails, `err` is non-zero but `ccs_used` is still set to true. This is conservative (we'd rather do unnecessary CCS copies than skip needed ones), so it's correct for safety. But it means the flag is set even though the bind may effectively fail. This is fine but worth a brief comment or at least acknowledgment.

2. **Placement**: `ccs_used` is set regardless of `flags.validate` — it sits outside the `if (!err && flags.validate)` block, so it fires whenever a compressed PAT is used, even when validation was never requested. This is also conservative: since the flag is a one-way transition to true, the worst case is premature CCS copying, not data corruption, so this is acceptable.

3. **Locking**: As discussed, `bo` is locked via `drm_exec_lock_obj` earlier in the function (for non-VM BOs) or implicitly (for VM-internal BOs). The `bo->ccs_used = true` write should be properly serialized. The external BO path (`!bo->vm`) takes the lock explicitly; the VM-internal BO path doesn't call `drm_exec_lock_obj` but those BOs shouldn't be shrinkable, so it shouldn't matter. Worth double-checking.

**VF CCS interaction**: When the stub fence path fires for a SYSTEM -> TT move with `handle_system_ccs=true`, the VF CCS attach code at lines 1160-1164 still executes:

```c
if (IS_VF_CCS_READY(xe) &&
    ((move_lacks_source && new_mem->mem_type == XE_PL_TT) ||
     (old_mem_type == XE_PL_SYSTEM && new_mem->mem_type == XE_PL_TT)) &&
    handle_system_ccs)
    ret = xe_sriov_vf_ccs_attach_bo(bo);
```

This would attach VF CCS to a BO where CCS is not actually used. If VF CCS attachment has a performance or memory cost, this branch should also check `bo->ccs_used`.

**Missing initialization**: `ccs_used` is not explicitly initialized. `xe_bo` is allocated via TTM, and `ccs_cleared` is explicitly set to `false` at line 2354 in `__xe_bo_create_locked`. `ccs_used` should similarly be explicitly initialized rather than relying on zeroed allocation, for consistency and clarity.

**Minor nit**: The field name `ccs_used` could be confused with "CCS is currently in use." A name like `ccs_ever_compressed` or `ccs_comp_bound` would more clearly convey the one-way "was ever bound with compression" semantics. This is just a suggestion.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-05-05  0:06 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-30 19:09 [PATCH] drm/xe: Do not blindly copy system CCS pages when shrinking Matthew Brost
2026-05-05  0:06 ` Claude review: " Claude Code Review Bot
2026-05-05  0:06 ` Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox