* [PATCH] drm/xe: Do not blindly copy system CCS pages when shrinking
@ 2026-04-30 19:09 Matthew Brost
2026-05-05 0:06 ` Claude review: " Claude Code Review Bot
2026-05-05 0:06 ` Claude Code Review Bot
0 siblings, 2 replies; 5+ messages in thread
From: Matthew Brost @ 2026-04-30 19:09 UTC (permalink / raw)
To: intel-xe, dri-devel
If the PAT index for a BO is configured without compression, copying CCS
pages during BO shrinking provides no value. Skip CCS copies for BOs that
have never been configured with compressed PAT indexes.
Once a BO is configured with a compressed PAT index, this becomes a
one-way transition that enables CCS copies. This heuristic is imprecise,
but simple and functional.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
When running 10 tabs of the WebGL aquarium demo, it was observed that only
about 1 in ~6.5 BOs that could possibly enable compression
(xe_bo_needs_ccs_pages() returns true) actually enable compression in the
PAT index, and overall CCS copies are reduced by roughly half when the
shrinker is triggered.
---
drivers/gpu/drm/xe/xe_bo.c | 2 ++
drivers/gpu/drm/xe/xe_bo_types.h | 3 +++
drivers/gpu/drm/xe/xe_vm.c | 3 +++
3 files changed, 8 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
index 5ce60d161e09..3fbb82ece4de 100644
--- a/drivers/gpu/drm/xe/xe_bo.c
+++ b/drivers/gpu/drm/xe/xe_bo.c
@@ -1105,6 +1105,8 @@ static int xe_bo_move(struct ttm_buffer_object *ttm_bo, bool evict,
flags |= XE_MIGRATE_CLEAR_FLAG_CCS_DATA;
fence = xe_migrate_clear(migrate, bo, new_mem, flags);
+ } else if (handle_system_ccs && !bo->ccs_used) {
+ fence = dma_fence_get_stub();
} else {
fence = xe_migrate_copy(migrate, bo, bo, old_mem, new_mem,
handle_system_ccs);
diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h
index 9c199badd9b2..76f73a3bf2ca 100644
--- a/drivers/gpu/drm/xe/xe_bo_types.h
+++ b/drivers/gpu/drm/xe/xe_bo_types.h
@@ -88,6 +88,9 @@ struct xe_bo {
/** @ccs_cleared: true means that CCS region of BO is already cleared */
bool ccs_cleared;
+ /** @ccs_used: true means that CCS region of BO is used */
+ bool ccs_used;
+
/** @bb_ccs: BB instructions of CCS read/write. Valid only for VF */
struct xe_mem_pool_node *bb_ccs[XE_SRIOV_VF_CCS_CTX_COUNT];
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index c3836f6eab35..638037e71e2d 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -3054,6 +3054,9 @@ static int vma_lock_and_validate(struct drm_exec *exec, struct xe_vma *vma,
if (flags.request_decompress)
err = xe_bo_decompress(bo);
+
+ if (xe_pat_index_get_comp_en(vm->xe, vma->attr.pat_index))
+ bo->ccs_used = true;
}
return err;
--
2.34.1
* Claude review: drm/xe: Do not blindly copy system CCS pages when shrinking
2026-05-01 1:19 [PATCH v2] " Matthew Brost
2026-05-04 23:41 ` Claude review: " Claude Code Review Bot
@ 2026-05-04 23:41 ` Claude Code Review Bot
1 sibling, 0 replies; 5+ messages in thread
From: Claude Code Review Bot @ 2026-05-04 23:41 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: drm/xe: Do not blindly copy system CCS pages when shrinking
Author: Matthew Brost <matthew.brost@intel.com>
Patches: 1
Reviewed: 2026-05-05T09:41:41.461905
---
This is a single patch (v2) that optimizes the xe driver's shrinker path by skipping CCS (Compression Control Surface) page copies for buffer objects that have never been mapped with a compressed PAT index. The optimization is well-motivated — the commit message reports that ~6.5x more BOs *could* have compression than actually *do*, and this cuts CCS copies roughly in half during shrinking.
The approach is sound: a new `ccs_used` boolean tracks whether a BO has ever been bound with compression enabled, and when it hasn't, CCS migration to system memory is skipped by returning a stub fence. The one-way latch (once set, never cleared) is a safe conservative design.
There are a few issues, mostly around the `vma_lock_and_validate` integration.
---
Generated by Claude Code Patch Reviewer
* Claude review: drm/xe: Do not blindly copy system CCS pages when shrinking
2026-05-01 1:19 [PATCH v2] " Matthew Brost
@ 2026-05-04 23:41 ` Claude Code Review Bot
2026-05-04 23:41 ` Claude Code Review Bot
1 sibling, 0 replies; 5+ messages in thread
From: Claude Code Review Bot @ 2026-05-04 23:41 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**xe_bo_types.h — new `ccs_used` field**
The field placement and documentation look fine. Minor note: the existing tree has `struct xe_bb *bb_ccs[]` while the patch context shows `struct xe_mem_pool_node *bb_ccs[]` — this is likely just a base mismatch (the patch description notes v2 changes), not a real problem.
**xe_bo.h — new helpers**
The `xe_bo_set_ccs_used()` and `xe_bo_is_ccs_used()` helpers are clean. Both assert the reservation lock is held. The kernel-doc is thorough (perhaps more than needed for simple one-line helpers, but that's a style preference and the commit message says Claude assisted with docs).
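The helpers themselves aren't quoted in this review, but a minimal version could look roughly like the sketch below (illustrative only: the names come from the review text, an `xe_bo_assert_held()`-style lock assertion is assumed, and the actual v2 code may differ):
```c
/* Sketch: mark the BO as having been bound with a compressed PAT index. */
static inline void xe_bo_set_ccs_used(struct xe_bo *bo)
{
	xe_bo_assert_held(bo);	/* assumed: caller holds the BO dma-resv lock */
	bo->ccs_used = true;
}

/* Sketch: query whether the BO has ever been bound with compression. */
static inline bool xe_bo_is_ccs_used(struct xe_bo *bo)
{
	xe_bo_assert_held(bo);
	return bo->ccs_used;
}
```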
**xe_bo.c — the core optimization**
```c
} else if (handle_system_ccs && new_mem->mem_type == XE_PL_SYSTEM &&
!xe_bo_is_ccs_used(bo)) {
fence = dma_fence_get_stub();
```
This is the key change. When `handle_system_ccs` is true but the BO was never used with compression and the destination is system memory, a stub fence is returned instead of calling `xe_migrate_copy()`. This is correct — if CCS was never written, there's nothing to preserve.
One concern: the code falls through to the `if (!move_lacks_source)` block at line 1130, which calls `ttm_bo_move_accel_cleanup()` with the stub fence. That should be fine since the stub fence is already signaled, but worth confirming that `ttm_bo_move_accel_cleanup()` with an already-signaled fence and `evict=true, pipeline=true` works correctly and actually moves the data pages. The stub fence means no *CCS* copy happens, but the main BO data also doesn't get copied here. Looking more carefully at the flow: the original path calls `xe_migrate_copy()` which handles *both* main data and CCS data together. **By returning a stub fence, the patch skips not only CCS copies but also the main data copy.** This seems like a bug if the BO has actual data pages that need migrating.
However, re-reading the condition: `handle_system_ccs` is true only for integrated GPUs (`!IS_DGFX`) where CCS pages live in system memory alongside the main data. In that case the "move" to system memory when the data is already in a system-memory-like TT pool may indeed not need a copy — but this deserves careful analysis. If the BO is moving from TT to SYSTEM placement and there *is* data but no CCS, the normal path would still do `xe_migrate_copy()` to handle the move. The stub fence would skip that entirely. **I'd want the author to confirm that this is safe for all TT-to-SYSTEM transitions on integrated GPUs**, or whether the condition should also check `old_mem_type`.
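If the author does decide to narrow the branch along those lines, one possible shape is sketched below (purely illustrative; the `old_mem_type == XE_PL_TT` check is the reviewer's suggestion, not code from the patch):
```c
	} else if (handle_system_ccs && new_mem->mem_type == XE_PL_SYSTEM &&
		   old_mem_type == XE_PL_TT && !xe_bo_is_ccs_used(bo)) {
		/*
		 * Only short-circuit TT -> SYSTEM transitions, where the
		 * assumption is that the BO data stays in place and the
		 * copy would only have preserved never-written CCS state.
		 */
		fence = dma_fence_get_stub();
```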
**xe_vm.c — xe_vm_ops_add_rebind**
```c
struct xe_bo *bo = xe_vma_bo(vma);
...
if (bo && xe_pat_index_get_comp_en(xe_vma_vm(vma)->xe,
vma->attr.pat_index))
xe_bo_set_ccs_used(bo);
```
This looks correct. On rebind, if the PAT index has compression enabled, mark the BO. The BO reservation lock should be held at this point since the rebind operation requires it.
**xe_vm.c — vma_lock_and_validate**
```c
if (flags.request_decompress)
err = xe_bo_decompress(bo);
if (xe_pat_index_get_comp_en(vm->xe, vma->attr.pat_index))
xe_bo_set_ccs_used(bo);
```
Two issues here:
1. **Missing error check**: `xe_bo_decompress()` can fail and set `err` to a non-zero value. The new code calls `xe_bo_set_ccs_used()` unconditionally, even when `err` is non-zero from the decompress call. While setting the flag on a failed path is relatively harmless (it's a one-way latch and a conservative overapproximation), it would be cleaner to gate it with `if (!err && ...)` for consistency with the function's error-checking pattern (see the sketch after this list).
2. **Placement relative to the `if (bo)` block**: Looking at the patched code, the new lines are added after `xe_bo_decompress()` but before the closing `}` of the `if (bo)` block, so they are inside it and `bo` is guaranteed non-NULL when they run. That is fine; only the blank line separating the new check from the rest of the block makes the diff context slightly harder to read.
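For illustration, the gated form the review has in mind might look roughly like this (sketch only; everything except the added `!err &&` and the comment is taken from the quoted hunk):
```c
	if (flags.request_decompress)
		err = xe_bo_decompress(bo);

	/* Only latch ccs_used once decompression (if requested) succeeded. */
	if (!err && xe_pat_index_get_comp_en(vm->xe, vma->attr.pat_index))
		xe_bo_set_ccs_used(bo);
```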
**tests/xe_bo.c**
```c
xe_bo_set_ccs_used(bo);
```
Correctly sets `ccs_used` before CCS tests so the existing test behavior is preserved. The BO lock is held at this point.
**Overall assessment**: The optimization concept is good and the implementation is mostly clean. The main concern is whether returning `dma_fence_get_stub()` in `xe_bo_move()` is safe for all cases — since this replaces a call to `xe_migrate_copy()` which handles both data and CCS pages, not just CCS. The author should clarify that the main data doesn't need migration in these paths, or narrow the condition to only skip the CCS portion of the copy. The `vma_lock_and_validate` change should also add an `!err` guard for consistency.
---
Generated by Claude Code Patch Reviewer
* Claude review: drm/xe: Do not blindly copy system CCS pages when shrinking
2026-04-30 19:09 [PATCH] drm/xe: Do not blindly copy system CCS pages when shrinking Matthew Brost
2026-05-05 0:06 ` Claude review: " Claude Code Review Bot
@ 2026-05-05 0:06 ` Claude Code Review Bot
1 sibling, 0 replies; 5+ messages in thread
From: Claude Code Review Bot @ 2026-05-05 0:06 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: drm/xe: Do not blindly copy system CCS pages when shrinking
Author: Matthew Brost <matthew.brost@intel.com>
Patches: 1
Reviewed: 2026-05-05T10:06:13.444811
---
This is a single-patch optimization for the Xe driver's memory shrinker path on integrated (non-DGFX) GPUs. It avoids CCS (Compression Control Surface) page copies during BO shrinking when the BO has never been bound with a compressed PAT index. The commit message cites a ~6.5:1 ratio of eligible-but-unused CCS BOs and roughly 50% reduction in CCS copies — a meaningful optimization for memory pressure scenarios.
The approach is fundamentally sound: add a one-way `ccs_used` flag that transitions false-to-true when a VMA with a compressed PAT binds the BO, and skip the `xe_migrate_copy` in the move path when CCS was never used. However, there are a few issues worth addressing around locking documentation, error-path interaction, and the correctness of skipping the copy entirely (stub fence) versus just skipping the CCS portion.
---
Generated by Claude Code Patch Reviewer
* Claude review: drm/xe: Do not blindly copy system CCS pages when shrinking
2026-04-30 19:09 [PATCH] drm/xe: Do not blindly copy system CCS pages when shrinking Matthew Brost
@ 2026-05-05 0:06 ` Claude Code Review Bot
2026-05-05 0:06 ` Claude Code Review Bot
1 sibling, 0 replies; 5+ messages in thread
From: Claude Code Review Bot @ 2026-05-05 0:06 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**xe_bo.c — stub fence branch (the core optimization)**
```c
+ } else if (handle_system_ccs && !bo->ccs_used) {
+ fence = dma_fence_get_stub();
} else {
fence = xe_migrate_copy(migrate, bo, bo, old_mem, new_mem,
handle_system_ccs);
```
When this branch fires, `move_lacks_source` is false (meaning `ccs_cleared` is already true from a prior clear and old_mem exists). The post-fence path at line 1130 then enters `ttm_bo_move_accel_cleanup()` with an already-signaled fence. This is functionally correct as a no-op move.
However, the key question is: **when we later move the BO back (SYSTEM -> TT) and `ccs_used` has since transitioned to true, what state are the CCS pages in?** If CCS data was never saved to the ttm_tt pages (because we returned a stub fence instead of calling `xe_migrate_copy` to read AUX CCS into the ttm_tt), then restoring from SYSTEM -> TT with `handle_system_ccs=true` would write back whatever garbage is in the ttm_tt CCS pages. The safety of this relies on the initial `xe_migrate_clear` having zeroed the CCS data in the ttm_tt pages, and the CCS pages not being reallocated with garbage across swap cycles. It would be worth adding a comment explaining why this is safe, or alternatively clearing the CCS pages in the ttm_tt when taking this shortcut to make the invariant explicit.
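If the author takes the comment route, something along these lines at the shortcut would make the assumed invariant visible (wording is only a suggestion):
```c
	} else if (handle_system_ccs && !bo->ccs_used) {
		/*
		 * The BO has never been bound with a compressed PAT index,
		 * so no CCS state has ever been written for it; the initial
		 * xe_migrate_clear() is assumed to have left the CCS region
		 * zeroed, so there is nothing to save into the ttm_tt pages.
		 */
		fence = dma_fence_get_stub();
```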
**xe_bo_types.h — new field**
```c
+ /** @ccs_used: true means that CCS region of BO is used */
+ bool ccs_used;
```
The field is placed next to `ccs_cleared`, which is good for locality. However, there is **no documentation of the locking requirements**. Compare with the nearby `atomic_access` field which documents "Protected by BO dma-resv lock." Since `ccs_used` is read in `xe_bo_move` (under resv lock via TTM) and written in `vma_lock_and_validate` (under resv lock via `drm_exec_lock_obj`), it should document this:
```c
/** @ccs_used: true means that CCS region of BO is used. Protected by dma-resv lock. */
```
The kdoc should also mention the one-way nature of the flag ("one-way transition: once set, never cleared"), since this is a key invariant that callers rely on.
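Putting both points together, the kdoc could read roughly as follows (phrasing is only a suggestion):
```c
	/**
	 * @ccs_used: True once the BO has been bound with a compressed PAT
	 * index; one-way transition, never cleared. Protected by the BO
	 * dma-resv lock.
	 */
	bool ccs_used;
```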
**xe_vm.c — setting the flag**
```c
if (flags.request_decompress)
err = xe_bo_decompress(bo);
+
+ if (xe_pat_index_get_comp_en(vm->xe, vma->attr.pat_index))
+ bo->ccs_used = true;
}
return err;
```
Two observations:
1. **Error path interaction**: If `xe_bo_decompress(bo)` fails, `err` is non-zero but `ccs_used` is still set to true. This is conservative (we'd rather do unnecessary CCS copies than skip needed ones), so it's correct for safety. But it means the flag is set even though the bind may effectively fail. This is fine, but worth a brief comment or at least an acknowledgment (see the sketch after this list).
2. **Placement**: `ccs_used` is set regardless of `flags.validate` — it fires whenever a compressed PAT index is used, even if the BO is never actually validated. Setting it outside the `if (!err && flags.validate)` block is conservative: since this is a one-way transition to true, the worst case is premature CCS copying, not data corruption, so this is acceptable.
3. **Locking**: As discussed, `bo` is locked via `drm_exec_lock_obj` earlier in the function (for non-VM BOs) or implicitly (for VM-internal BOs). The `bo->ccs_used = true` write should be properly serialized. The external BO path (`!bo->vm`) takes the lock explicitly; the VM-internal BO path doesn't call `drm_exec_lock_obj` but those BOs shouldn't be shrinkable, so it shouldn't matter. Worth double-checking.
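A brief comment in the spirit of observation 1 could be as small as the following (illustrative; the code itself is unchanged from the patch):
```c
	/*
	 * Conservative: latch ccs_used even if xe_bo_decompress() failed
	 * above. Worst case is an unnecessary CCS copy later, never a
	 * skipped one.
	 */
	if (xe_pat_index_get_comp_en(vm->xe, vma->attr.pat_index))
		bo->ccs_used = true;
```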
**VF CCS interaction**: When the stub fence path fires for a SYSTEM -> TT move with `handle_system_ccs=true`, the VF CCS attach code at lines 1160-1164 still executes:
```c
if (IS_VF_CCS_READY(xe) &&
((move_lacks_source && new_mem->mem_type == XE_PL_TT) ||
(old_mem_type == XE_PL_SYSTEM && new_mem->mem_type == XE_PL_TT)) &&
handle_system_ccs)
ret = xe_sriov_vf_ccs_attach_bo(bo);
```
This would attach VF CCS to a BO where CCS is not actually used. If VF CCS attachment has a performance or memory cost, this branch should also check `bo->ccs_used`.
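If attachment does carry a cost, the extra guard the review suggests amounts to one more condition on the quoted check (sketch only):
```c
	if (IS_VF_CCS_READY(xe) &&
	    ((move_lacks_source && new_mem->mem_type == XE_PL_TT) ||
	     (old_mem_type == XE_PL_SYSTEM && new_mem->mem_type == XE_PL_TT)) &&
	    handle_system_ccs && bo->ccs_used)
		ret = xe_sriov_vf_ccs_attach_bo(bo);
```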
**Missing initialization**: `ccs_used` is not explicitly initialized. `xe_bo` is allocated via TTM, and `ccs_cleared` is explicitly set to `false` at line 2354 in `__xe_bo_create_locked`. `ccs_used` should similarly be explicitly initialized rather than relying on zeroed allocation, for consistency and clarity.
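Concretely, mirroring the existing `ccs_cleared` initialization would be a one-line addition (sketch, placed wherever `ccs_cleared` is set in `__xe_bo_create_locked`):
```c
	bo->ccs_cleared = false;
	bo->ccs_used = false;
```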
**Minor nit**: The field name `ccs_used` could be confused with "CCS is currently in use." A name like `ccs_ever_compressed` or `ccs_comp_bound` would more clearly convey the one-way "was ever bound with compression" semantics. This is just a suggestion.
---
Generated by Claude Code Patch Reviewer