[PATCH v4 0/5] Add reclaim to the dmem cgroup controller

* [PATCH v4 0/5] Add reclaim to the dmem cgroup controller
@ 2026-05-12  8:24 Thomas Hellström
  2026-05-12  8:24 ` [PATCH v4 1/5] drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init() Thomas Hellström
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Thomas Hellström @ 2026-05-12  8:24 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Natalie Vock, Johannes Weiner, Tejun Heo,
	Michal Koutný, cgroups, Huang Rui, Matthew Brost,
	Matthew Auld, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, David Airlie, Christian König, Alex Deucher,
	Rodrigo Vivi, dri-devel, amd-gfx, linux-kernel

When writing a "max" limit lower than the current usage, the
existing code silently failed. This series improves on that
by applying the new limit immediately and attempting to
synchronously reclaim device memory to push the usage under
the new max limit. As of v2, no error is returned if usage
remains above the limit after reclaim.

Patch 1 fixes a pre-existing amdgpu_vram_mgr_init() error path bug.
Patch 2 implements and documents a reclaim callback interface
      for the dmem controller.
Patch 3 implements a TTM reclaim callback.
Patches 4-5 hook up the reclaim callback to the dmem
      cgroup-aware drivers xe and amdgpu.

v2:
- Remove the error propagation that was in a previous series (Maarten)
- A number of updates in patch 1. See its commit message for
  details (Maarten)

v3:
- Add patch 1 fixing a pre-existing amdgpu_vram_mgr_init() error path
  bug where drmm_cgroup_register_region() was called before
  INIT_LIST_HEAD() and gpu_buddy_init(), causing a kernel panic on
  failure. (Sashiko-bot)
- Use an rwsem to protect reclaim callback registration and region
  unregister against concurrent reclaim invocations. (Sashiko-bot)
- Fix ttm_resource_manager_set_dmem_region() storing an error pointer
  in man->cg unconditionally. (Sashiko-bot)
- Fix kernel-doc function name format for ttm_bo_evict_cgroup() and
  ttm_resource_manager_set_dmem_region().

v4:
- Rebased on drm-tip; dropped the XE_PL_STOLEN guard in the xe patch
  as stolen memory uses a separate TTM manager.

User-space tests are at
https://patchwork.freedesktop.org/series/163935/

Test-with: 20260428065411.4222-1-thomas.hellstrom@linux.intel.com

Thomas Hellström (5):
  drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init()
  cgroup/dmem: Add reclaim callback for lowering max below current usage
  drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem
    controller
  drm/xe: Wire up dmem cgroup reclaim for VRAM manager
  drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager

 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c      |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c |  10 +-
 drivers/gpu/drm/ttm/ttm_bo.c                 |  95 ++++++++++++++++-
 drivers/gpu/drm/ttm/ttm_bo_util.c            |   3 +-
 drivers/gpu/drm/ttm/ttm_resource.c           |  37 +++++++
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c         |  14 ++-
 include/drm/ttm/ttm_bo.h                     |  10 ++
 include/drm/ttm/ttm_resource.h               |   4 +
 include/linux/cgroup_dmem.h                  |  24 +++++
 kernel/cgroup/dmem.c                         | 106 +++++++++++++++++--
 10 files changed, 283 insertions(+), 22 deletions(-)

-- 
2.54.0



* [PATCH v4 1/5] drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init()
  2026-05-12  8:24 [PATCH v4 0/5] Add reclaim to the dmem cgroup controller Thomas Hellström
@ 2026-05-12  8:24 ` Thomas Hellström
  2026-05-16  4:03   ` Claude review: " Claude Code Review Bot
  2026-05-12  8:24 ` [PATCH v4 2/5] cgroup/dmem: Add reclaim callback for lowering max below current usage Thomas Hellström
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Thomas Hellström @ 2026-05-12  8:24 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Sashiko-bot, Friedrich Vock,
	Maarten Lankhorst, Tejun Heo, Maxime Ripard, Christian König,
	Alex Deucher, amd-gfx, dri-devel, stable, Natalie Vock,
	Johannes Weiner, Michal Koutný, cgroups, Huang Rui,
	Matthew Brost, Matthew Auld, Maarten Lankhorst, Thomas Zimmermann,
	Simona Vetter, David Airlie, Rodrigo Vivi, linux-kernel

drmm_cgroup_register_region() is called before INIT_LIST_HEAD() and
gpu_buddy_init() in amdgpu_vram_mgr_init(). If it fails, the function
returns early and bypasses those initializations.

Since adev->mman.initialized is set to true before amdgpu_vram_mgr_init()
is called, a failure triggers amdgpu_ttm_fini(), which calls
amdgpu_vram_mgr_fini(), which then:

 - Calls list_for_each_entry_safe() on reservations_pending and
   reserved_pages, whose list_head::next pointers are zero-initialized
   (NULL). The loop does not recognize them as empty and dereferences NULL.

 - Calls gpu_buddy_fini(), which iterates free_trees[] unconditionally
   via for_each_free_tree(). Since mm->free_trees is NULL
   (never allocated), this dereferences NULL.

Both result in a kernel panic on the module load error path.

Fix by moving drmm_cgroup_register_region() to after the list and buddy
allocator are fully initialized, so the teardown path is safe to run.
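
The list half of the failure can be illustrated with a minimal userspace
sketch. The struct list_head, INIT_LIST_HEAD() and list_empty() below are
simplified stand-ins modeled on the kernel's definitions, not the kernel
headers, and the two *_looks_empty() helpers are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialized empty list points back at itself. */
static void INIT_LIST_HEAD(struct list_head *head)
{
	assert(head);
	head->next = head;
	head->prev = head;
}

/* Emptiness test: the head's next pointer is the head itself. */
static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical helper: does a zero-filled head pass the emptiness test? */
static int zeroed_head_looks_empty(void)
{
	struct list_head head;

	memset(&head, 0, sizeof(head));	/* models a zero-initialized struct */
	return list_empty(&head);	/* next is NULL, which is not &head */
}

/* Hypothetical helper: the same test after proper initialization. */
static int initialized_head_looks_empty(void)
{
	struct list_head head;

	INIT_LIST_HEAD(&head);
	return list_empty(&head);
}
```

A walker that relies on the emptiness test would therefore follow the NULL
next pointer of a zero-filled head, which is why the registration has to
come after the list heads are initialized.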

Reported-by: Sashiko-bot <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260428073116.15687-1-thomas.hellstrom@linux.intel.com?part=4
Fixes: 2b624a2c1865 ("drm/ttm: Handle cgroup based eviction in TTM")
Cc: Friedrich Vock <friedrich.vock@gmx.de>
Cc: Maarten Lankhorst <dev@lankhorst.se>
Cc: Tejun Heo <tj@kernel.org>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v6.14+
Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 2a241a5b12c4..ac3f71d77140 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -918,9 +918,6 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
 	struct ttm_resource_manager *man = &mgr->manager;
 	int err;
 
-	man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram", adev->gmc.real_vram_size);
-	if (IS_ERR(man->cg))
-		return PTR_ERR(man->cg);
 	ttm_resource_manager_init(man, &adev->mman.bdev,
 				  adev->gmc.real_vram_size);
 
@@ -935,6 +932,10 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
 	if (err)
 		return err;
 
+	man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram", adev->gmc.real_vram_size);
+	if (IS_ERR(man->cg))
+		return PTR_ERR(man->cg);
+
 	ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr->manager);
 	ttm_resource_manager_set_used(man, true);
 	return 0;
-- 
2.54.0



* [PATCH v4 2/5] cgroup/dmem: Add reclaim callback for lowering max below current usage
  2026-05-12  8:24 [PATCH v4 0/5] Add reclaim to the dmem cgroup controller Thomas Hellström
  2026-05-12  8:24 ` [PATCH v4 1/5] drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init() Thomas Hellström
@ 2026-05-12  8:24 ` Thomas Hellström
  2026-05-16  4:04   ` Claude review: " Claude Code Review Bot
  2026-05-12  8:24 ` [PATCH v4 3/5] drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem controller Thomas Hellström
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Thomas Hellström @ 2026-05-12  8:24 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Natalie Vock, Johannes Weiner, Tejun Heo,
	Michal Koutný, cgroups, Huang Rui, Matthew Brost,
	Matthew Auld, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, David Airlie, Christian König, Alex Deucher,
	Rodrigo Vivi, dri-devel, amd-gfx, linux-kernel

Add an optional reclaim callback to struct dmem_cgroup_region. When
dmem.max is set below the current usage of a cgroup pool, the new limit
is applied immediately (so that concurrent allocations are throttled
while reclaim is in progress) and then the driver is asked to evict
memory to bring usage back below the limit.

Reclaim is attempted up to a bounded number of times (five in this
patch). No error is returned to userspace if usage remains above the
limit after reclaim, and a pending signal will abort the reclaim loop
early. This matches the behavior of memory.max in the memory cgroup
controller.
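
The control flow described above can be sketched in userspace as follows.
This is a simplified model, not the kernel code: struct mock_pool,
mock_set_max() and mock_reclaim_chunk() are hypothetical names, and the
signal check and the rwsem protecting the callback are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cgroup pool: a usage counter and a limit. */
struct mock_pool {
	uint64_t usage;
	uint64_t max;
	int reclaim_calls;
};

/* Callback shape mirroring (pool, target_bytes, priv). */
typedef int (*mock_reclaim_fn)(struct mock_pool *pool, uint64_t target_bytes,
			       void *priv);

/* Apply the limit first, then retry reclaim a bounded number of times. */
static void mock_set_max(struct mock_pool *pool, uint64_t val,
			 mock_reclaim_fn reclaim, void *priv)
{
	assert(pool);
	pool->max = val;	/* the limit takes effect before any reclaim */

	if (!reclaim)
		return;

	for (int retries = 5; retries > 0; retries--) {
		if (pool->usage <= val)
			break;	/* usage is at or under the limit */
		pool->reclaim_calls++;
		if (reclaim(pool, pool->usage - val, priv))
			break;	/* no progress: stop, report no error */
	}
}

/* Example callback: frees at most *priv bytes per call. */
static int mock_reclaim_chunk(struct mock_pool *pool, uint64_t target_bytes,
			      void *priv)
{
	uint64_t chunk = *(uint64_t *)priv;
	uint64_t freed = target_bytes < chunk ? target_bytes : chunk;

	pool->usage -= freed;
	return 0;
}
```

With a usage of 100, a new max of 40 and a callback that frees 25 bytes per
call, this model calls the callback three times and stops once usage reaches
the limit. With a callback that frees only 1 byte per call it gives up after
five calls and leaves usage above the limit, without reporting an error.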

Also honor O_NONBLOCK so that if that flag is set during the
max value write, no reclaim is initiated. The idea is to avoid
charging the reclaim cost to the writer of the max value.
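
From userspace, opting out of reclaim could look roughly like the sketch
below. write_dmem_max() is a hypothetical helper, and the "<region> <bytes>"
line format is an assumption based on how the dmem interface files are
documented. Any cgroup path and region name passed to it (for example a
region such as "drm/0000:03:00.0/vram0") are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write "<region> <bytes>" to a dmem.max-style file. If @nonblock is set,
 * the file is opened with O_NONBLOCK, which per this patch asks the
 * controller to apply the limit without initiating reclaim.
 * Returns 0 on success or a negative errno value.
 */
static int write_dmem_max(const char *path, const char *region,
			  unsigned long long bytes, int nonblock)
{
	char line[256];
	int flags = O_WRONLY | (nonblock ? O_NONBLOCK : 0);
	ssize_t written;
	int fd, len, ret = 0;

	assert(path && region);

	len = snprintf(line, sizeof(line), "%s %llu\n", region, bytes);
	if (len < 0 || (size_t)len >= sizeof(line))
		return -EINVAL;

	fd = open(path, flags);
	if (fd < 0)
		return -errno;

	written = write(fd, line, (size_t)len);
	if (written < 0)
		ret = -errno;
	else if (written != len)
		ret = -EIO;

	close(fd);
	return ret;
}
```

A management daemon that must not stall on eviction would pass a non-zero
nonblock argument, while a caller that wants usage pushed down before the
write returns would pass 0.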

v2:
- Write max before reclaim is attempted (Maarten)
- Let signals abort the reclaim without error (Maarten)
- If a new max value is written with the O_NONBLOCK flag,
  reclaim is not attempted (Maarten)
- Extract region from the pool parameter rather than
  passing it explicitly to set_resource_xxx().
v3:
- Use an rwsem to protect reclaim callback registration and
  region unregister against concurrent reclaim invocations,
  ensuring reclaim_priv is visible when the callback is
  invoked. (Sashiko-bot)

Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 include/linux/cgroup_dmem.h |  24 ++++++++
 kernel/cgroup/dmem.c        | 106 +++++++++++++++++++++++++++++++++---
 2 files changed, 121 insertions(+), 9 deletions(-)

diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
index dd4869f1d736..c3bce21cbe80 100644
--- a/include/linux/cgroup_dmem.h
+++ b/include/linux/cgroup_dmem.h
@@ -14,6 +14,21 @@ struct dmem_cgroup_pool_state;
 /* Opaque definition of a cgroup region, used internally */
 struct dmem_cgroup_region;
 
+/**
+ * typedef dmem_cgroup_reclaim_fn_t - Reclaim callback for a dmem cgroup region.
+ * @pool: The cgroup pool that needs memory reclaimed.
+ * @target_bytes: Minimum number of bytes the driver should attempt to free.
+ * @priv: Private data registered with dmem_cgroup_region_set_reclaim().
+ *
+ * Called by the dmem cgroup controller when dmem.max is set below the current
+ * usage of @pool. The driver should evict at least @target_bytes of memory
+ * from @pool. May be called multiple times if usage remains above the limit.
+ *
+ * Return: 0 if progress was made, negative error code otherwise.
+ */
+typedef int (*dmem_cgroup_reclaim_fn_t)(struct dmem_cgroup_pool_state *pool,
+					u64 target_bytes, void *priv);
+
 #if IS_ENABLED(CONFIG_CGROUP_DMEM)
 struct dmem_cgroup_region *dmem_cgroup_register_region(u64 size, const char *name_fmt, ...) __printf(2,3);
 void dmem_cgroup_unregister_region(struct dmem_cgroup_region *region);
@@ -26,6 +41,9 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
 				      bool ignore_low, bool *ret_hit_low);
 
 void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
+void dmem_cgroup_region_set_reclaim(struct dmem_cgroup_region *region,
+				    dmem_cgroup_reclaim_fn_t reclaim,
+				    void *priv);
 #else
 static inline __printf(2,3) struct dmem_cgroup_region *
 dmem_cgroup_register_region(u64 size, const char *name_fmt, ...)
@@ -62,5 +80,11 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
 static inline void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
 { }
 
+static inline void
+dmem_cgroup_region_set_reclaim(struct dmem_cgroup_region *region,
+			       dmem_cgroup_reclaim_fn_t reclaim,
+			       void *priv)
+{ }
+
 #endif
 #endif	/* _CGROUP_DMEM_H */
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 1ab1fb47f271..5fd5a1634d21 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -51,6 +51,20 @@ struct dmem_cgroup_region {
 	 * No new pools should be added to the region afterwards.
 	 */
 	bool unregistered;
+
+	/**
+	 * @reclaim: Optional callback invoked when dmem.max is set below the
+	 * current usage of a pool. The driver should attempt to free at least
+	 * @target_bytes from @pool. May be called multiple times if usage
+	 * remains above the limit after returning.
+	 */
+	dmem_cgroup_reclaim_fn_t reclaim;
+
+	/** @reclaim_priv: Private data passed to @reclaim. */
+	void *reclaim_priv;
+
+	/** @unregister_sem: Protect @reclaim while it is running. */
+	struct rw_semaphore unregister_sem;
 };
 
 struct dmemcg_state {
@@ -145,21 +159,58 @@ static void free_cg_pool(struct dmem_cgroup_pool_state *pool)
 }
 
 static void
-set_resource_min(struct dmem_cgroup_pool_state *pool, u64 val)
+set_resource_min(struct dmem_cgroup_pool_state *pool, u64 val, bool nonblock)
 {
 	page_counter_set_min(&pool->cnt, val);
 }
 
 static void
-set_resource_low(struct dmem_cgroup_pool_state *pool, u64 val)
+set_resource_low(struct dmem_cgroup_pool_state *pool, u64 val, bool nonblock)
 {
 	page_counter_set_low(&pool->cnt, val);
 }
 
 static void
-set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val)
+set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val, bool nonblock)
 {
-	page_counter_set_max(&pool->cnt, val);
+	struct dmem_cgroup_region *region = pool->region;
+
+	/*
+	 * Always update the limit, even if usage currently exceeds it.
+	 * Concurrent allocations will be throttled against the new limit
+	 * while reclaim is in progress.
+	 */
+	xchg(&pool->cnt.max, (unsigned long)val);
+
+	if (nonblock || !READ_ONCE(region->reclaim))
+		return;
+
+	for (int retries = 5; retries > 0; retries--) {
+		u64 usage = page_counter_read(&pool->cnt);
+		int ret;
+
+		if (usage <= val)
+			break;
+
+		if (signal_pending(current))
+			break;
+
+		/* Block unregister until the reclaim callback completes. */
+		if (down_read_interruptible(&region->unregister_sem))
+			break;
+
+		if (!region->reclaim) {
+			up_read(&region->unregister_sem);
+			break;
+		}
+
+		ret = region->reclaim(pool, usage - val, region->reclaim_priv);
+		up_read(&region->unregister_sem);
+		if (ret)
+			break;
+
+		cond_resched();
+	}
 }
 
 static u64 get_resource_low(struct dmem_cgroup_pool_state *pool)
@@ -184,9 +235,9 @@ static u64 get_resource_current(struct dmem_cgroup_pool_state *pool)
 
 static void reset_all_resource_limits(struct dmem_cgroup_pool_state *rpool)
 {
-	set_resource_min(rpool, 0);
-	set_resource_low(rpool, 0);
-	set_resource_max(rpool, PAGE_COUNTER_MAX);
+	set_resource_min(rpool, 0, false);
+	set_resource_low(rpool, 0, false);
+	set_resource_max(rpool, PAGE_COUNTER_MAX, false);
 }
 
 static void dmemcs_offline(struct cgroup_subsys_state *css)
@@ -491,6 +542,12 @@ void dmem_cgroup_unregister_region(struct dmem_cgroup_region *region)
 	region->unregistered = true;
 	spin_unlock(&dmemcg_lock);
 
+	/* Ensure all reclaim() callbacks have finished. */
+	down_write(&region->unregister_sem);
+	/* Pairs with READ_ONCE() in set_resource_max() */
+	WRITE_ONCE(region->reclaim, NULL);
+	up_write(&region->unregister_sem);
+
 	kref_put(&region->ref, dmemcg_free_region);
 }
 EXPORT_SYMBOL_GPL(dmem_cgroup_unregister_region);
@@ -530,6 +587,7 @@ struct dmem_cgroup_region *dmem_cgroup_register_region(u64 size, const char *fmt
 	INIT_LIST_HEAD(&ret->pools);
 	ret->name = region_name;
 	ret->size = size;
+	init_rwsem(&ret->unregister_sem);
 	kref_init(&ret->ref);
 
 	spin_lock(&dmemcg_lock);
@@ -568,6 +626,34 @@ void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool)
 }
 EXPORT_SYMBOL_GPL(dmem_cgroup_pool_state_put);
 
+/**
+ * dmem_cgroup_region_set_reclaim() - Register a reclaim callback on a region.
+ * @region: The region to register the callback for.
+ * @reclaim: Callback to invoke when dmem.max is set below current usage.
+ *           Called with the pool that needs reclaiming and the number of
+ *           bytes to free. Returns 0 on progress, negative on failure.
+ * @priv: Opaque pointer passed back to @reclaim.
+ *
+ * When dmem.max is lowered below the current usage of a cgroup pool, the
+ * dmem controller applies the new limit and calls @reclaim with a target
+ * number of bytes to free. If usage is still above the limit after @reclaim
+ * returns, @reclaim is called again, up to a bounded retry count.
+ */
+void dmem_cgroup_region_set_reclaim(struct dmem_cgroup_region *region,
+				    dmem_cgroup_reclaim_fn_t reclaim,
+				    void *priv)
+{
+	if (!region)
+		return;
+
+	down_write(&region->unregister_sem);
+	region->reclaim_priv = priv;
+	/* Pairs with READ_ONCE() in set_resource_max() */
+	WRITE_ONCE(region->reclaim, reclaim);
+	up_write(&region->unregister_sem);
+}
+EXPORT_SYMBOL_GPL(dmem_cgroup_region_set_reclaim);
+
 static struct dmem_cgroup_pool_state *
 get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
 {
@@ -725,9 +811,10 @@ static int dmemcg_parse_limit(char *options, u64 *new_limit)
 
 static ssize_t dmemcg_limit_write(struct kernfs_open_file *of,
 				 char *buf, size_t nbytes, loff_t off,
-				 void (*apply)(struct dmem_cgroup_pool_state *, u64))
+				 void (*apply)(struct dmem_cgroup_pool_state *, u64, bool))
 {
 	struct dmemcg_state *dmemcs = css_to_dmemcs(of_css(of));
+	bool nonblock = of->file->f_flags & O_NONBLOCK;
 	int err = 0;
 
 	while (buf && !err) {
@@ -772,7 +859,8 @@ static ssize_t dmemcg_limit_write(struct kernfs_open_file *of,
 		}
 
 		/* And commit */
-		apply(pool, new_limit);
+		apply(pool, new_limit, nonblock);
+
 		dmemcg_pool_put(pool);
 
 out_put:
-- 
2.54.0



* [PATCH v4 3/5] drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem controller
  2026-05-12  8:24 [PATCH v4 0/5] Add reclaim to the dmem cgroup controller Thomas Hellström
  2026-05-12  8:24 ` [PATCH v4 1/5] drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init() Thomas Hellström
  2026-05-12  8:24 ` [PATCH v4 2/5] cgroup/dmem: Add reclaim callback for lowering max below current usage Thomas Hellström
@ 2026-05-12  8:24 ` Thomas Hellström
  2026-05-16  4:04   ` Claude review: " Claude Code Review Bot
  2026-05-12  8:24 ` [PATCH v4 4/5] drm/xe: Wire up dmem cgroup reclaim for VRAM manager Thomas Hellström
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Thomas Hellström @ 2026-05-12  8:24 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Natalie Vock, Johannes Weiner, Tejun Heo,
	Michal Koutný, cgroups, Huang Rui, Matthew Brost,
	Matthew Auld, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, David Airlie, Christian König, Alex Deucher,
	Rodrigo Vivi, dri-devel, amd-gfx, linux-kernel

Add ttm_bo_evict_cgroup() to evict buffer objects charged to a specific
dmem cgroup pool from a resource manager's LRU until a byte target is
met.  Add ttm_resource_manager_set_dmem_region() to register the TTM
eviction path as the reclaim callback for a dmem cgroup region.
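
How the byte count from the eviction walk maps onto the reclaim callback's
return value can be sketched as below. reclaim_result() is a hypothetical
standalone function that mirrors the mapping used by the callback in this
patch. It is not part of TTM.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Map the number of bytes freed (or a negative error code) to the value
 * the reclaim callback reports to the dmem controller.
 */
static int reclaim_result(int64_t freed, uint64_t target_bytes)
{
	assert(target_bytes <= (uint64_t)INT64_MAX);

	if (freed < 0)
		return (int)freed;	/* fatal error from the eviction walk */

	/* Only a fully met target counts as progress. */
	return freed >= (int64_t)target_bytes ? 0 : -ENOSPC;
}
```

One consequence of this mapping is that a partial eviction is reported as
-ENOSPC, so the controller's retry loop stops after the first call that
falls short of the target.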

The eviction context is interruptible; a pending signal aborts the
eviction, and the dmem controller then stops its reclaim loop without
reporting an error to the writer.

Introduce a new mode for the bo LRU walker so that sleeping locks
can be taken. This can be used when the caller doesn't hold any
previous dma_resv locks, and where it intends to hold at most
one lock at a time.
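
The effect of the new mode on the walker's decision to skip a contended
buffer object can be sketched as a plain predicate. struct walk_arg and
skip_contended_bo() are hypothetical names; in the patch the no_wait_gpu
flag lives in the operation context rather than in the walk argument, and
the ticket is a struct ww_acquire_ctx pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flattened stand-in for the fields the walker consults. */
struct walk_arg {
	const void *ticket;	/* stand-in for the WW acquire context */
	bool no_wait_gpu;
	bool trylock_only;
	bool sleeping_lock;
};

/*
 * After a failed trylock: true if the walker should skip the buffer
 * object, false if it may go on to take a sleeping lock.
 */
static bool skip_contended_bo(const struct walk_arg *arg)
{
	assert(arg);
	return (!arg->ticket && !arg->sleeping_lock) ||
	       arg->no_wait_gpu || arg->trylock_only;
}
```

Without a ticket, the walker previously always skipped contended objects.
Setting sleeping_lock lifts that restriction, while trylock_only and
no_wait_gpu still take precedence.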

Like the rest of the TTM eviction code, this should sooner rather
than later be converted to full WW transactions.

v3:
- Fix ttm_resource_manager_set_dmem_region() storing an error pointer
  in man->cg unconditionally. (Sashiko-bot)
- Fix kernel-doc function name format for ttm_bo_evict_cgroup() and
  ttm_resource_manager_set_dmem_region().

Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c       | 95 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/ttm/ttm_bo_util.c  |  3 +-
 drivers/gpu/drm/ttm/ttm_resource.c | 37 ++++++++++++
 include/drm/ttm/ttm_bo.h           | 10 ++++
 include/drm/ttm/ttm_resource.h     |  4 ++
 5 files changed, 145 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 293401705542..3d0872f2f14d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -515,12 +515,20 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
 {
 	struct ttm_bo_evict_walk *evict_walk =
 		container_of(walk, typeof(*evict_walk), walk);
+	/* Capture size before eviction in case res is cleared. */
+	s64 bo_size = bo->base.size;
 	s64 lret;
 
 	if (!dmem_cgroup_state_evict_valuable(evict_walk->limit_pool, bo->resource->css,
 					      evict_walk->try_low, &evict_walk->hit_low))
 		return 0;
 
+	/*
+	 * evict_walk->place is NULL in cgroup drain mode.  Drivers'
+	 * eviction_valuable() callbacks must handle a NULL place, treating it
+	 * as "any placement": the TTM base implementation already does so via
+	 * ttm_resource_intersects().
+	 */
 	if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, evict_walk->place))
 		return 0;
 
@@ -536,11 +544,15 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
 		goto out;
 
 	evict_walk->evicted++;
-	if (evict_walk->res)
+	if (evict_walk->res) {
 		lret = ttm_resource_alloc(evict_walk->evictor, evict_walk->place,
 					  evict_walk->res, NULL);
-	if (lret == 0)
-		return 1;
+		if (lret == 0)
+			return 1;
+	} else {
+		/* Cgroup drain: return bytes freed for byte-denominated progress. */
+		return bo_size;
+	}
 out:
 	/* Errors that should terminate the walk. */
 	if (lret == -ENOSPC)
@@ -614,6 +626,83 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
 	return 0;
 }
 
+/**
+ * ttm_bo_evict_cgroup() - Evict buffer objects charged to a specific cgroup.
+ * @bdev: The TTM device.
+ * @man: The resource manager whose LRU to walk.
+ * @limit_pool: The cgroup pool state whose members should be evicted.
+ * @target_bytes: Number of bytes to free.
+ * @ctx: The TTM operation context.
+ *
+ * Walk the LRU of @man and evict buffer objects that are charged to the
+ * cgroup identified by @limit_pool, until at least @target_bytes have been
+ * freed.  Mirrors the two-pass (trylock -> sleeping-lock, low-watermark)
+ * strategy used by ttm_bo_evict_alloc().
+ *
+ * Return: >= @target_bytes on full success, 0..target_bytes-1 if partial,
+ *         negative error code on fatal error.
+ */
+s64 ttm_bo_evict_cgroup(struct ttm_device *bdev,
+			struct ttm_resource_manager *man,
+			struct dmem_cgroup_pool_state *limit_pool,
+			s64 target_bytes,
+			struct ttm_operation_ctx *ctx)
+{
+	struct ttm_bo_evict_walk evict_walk = {
+		.walk = {
+			.ops = &ttm_evict_walk_ops,
+			.arg = { .ctx = ctx },
+		},
+		.limit_pool = limit_pool,
+		/* place, evictor, res left NULL: selects cgroup drain mode */
+	};
+	s64 lret, pass;
+
+	evict_walk.walk.arg.trylock_only = true;
+	lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, target_bytes);
+	if (lret < 0 || lret >= target_bytes)
+		return lret;
+
+	/* Second pass: also evict BOs at the low watermark. */
+	if (evict_walk.hit_low) {
+		evict_walk.try_low = true;
+		pass = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man,
+					      target_bytes - lret);
+		if (pass < 0)
+			return pass;
+		lret += pass;
+		if (lret >= target_bytes)
+			return lret;
+	}
+
+	/* Full sleeping-lock pass for remaining target. */
+	evict_walk.try_low = evict_walk.hit_low = false;
+	evict_walk.walk.arg.trylock_only = false;
+
+retry:
+	evict_walk.walk.arg.sleeping_lock = true;
+	do {
+		evict_walk.evicted = 0;
+		pass = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man,
+					      target_bytes - lret);
+		if (pass < 0) {
+			lret = pass;
+			goto out;
+		}
+		lret += pass;
+	} while (lret < target_bytes && evict_walk.evicted);
+
+	/* One more attempt if we hit the low limit during sleeping-lock pass. */
+	if (lret < target_bytes && evict_walk.hit_low && !evict_walk.try_low) {
+		evict_walk.try_low = true;
+		goto retry;
+	}
+
+out:
+	return lret;
+}
+EXPORT_SYMBOL(ttm_bo_evict_cgroup);
+
 /**
  * ttm_bo_pin - Pin the buffer object.
  * @bo: The buffer object to pin
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c
index f83b7d5ec6c6..81c6a674c462 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -999,7 +999,8 @@ __ttm_bo_lru_cursor_next(struct ttm_bo_lru_cursor *curs)
 		bo = res->bo;
 		if (ttm_lru_walk_trylock(curs, bo))
 			bo_locked = true;
-		else if (!arg->ticket || arg->ctx->no_wait_gpu || arg->trylock_only)
+		else if ((!arg->ticket && !arg->sleeping_lock) || arg->ctx->no_wait_gpu ||
+			 arg->trylock_only)
 			continue;
 
 		if (!ttm_bo_get_unless_zero(bo)) {
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 0e5f1582f13d..6d24f3320892 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -950,3 +950,40 @@ void ttm_resource_manager_create_debugfs(struct ttm_resource_manager *man,
 #endif
 }
 EXPORT_SYMBOL(ttm_resource_manager_create_debugfs);
+
+static int ttm_resource_manager_dmem_reclaim(struct dmem_cgroup_pool_state *pool,
+					     u64 target_bytes, void *priv)
+{
+	struct ttm_resource_manager *man = priv;
+	struct ttm_operation_ctx ctx = { .interruptible = true };
+	s64 freed;
+
+	freed = ttm_bo_evict_cgroup(man->bdev, man, pool, target_bytes, &ctx);
+	if (freed < 0)
+		return freed;
+
+	return freed >= (s64)target_bytes ? 0 : -ENOSPC;
+}
+
+/**
+ * ttm_resource_manager_set_dmem_region() - Associate a dmem cgroup region with a
+ *                                        resource manager and register a reclaim
+ *                                        callback.
+ * @man: The resource manager.
+ * @region: The dmem cgroup region to associate, may be NULL or IS_ERR().
+ *
+ * Sets @man->cg and registers ttm_resource_manager_dmem_reclaim() so that
+ * writing to dmem.max below current usage triggers TTM eviction to bring
+ * usage back under the new limit.
+ */
+void ttm_resource_manager_set_dmem_region(struct ttm_resource_manager *man,
+					  struct dmem_cgroup_region *region)
+{
+	if (!IS_ERR_OR_NULL(region)) {
+		man->cg = region;
+		dmem_cgroup_region_set_reclaim(region,
+					       ttm_resource_manager_dmem_reclaim,
+					       man);
+	}
+}
+EXPORT_SYMBOL(ttm_resource_manager_set_dmem_region);
diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 8310bc3d55f9..32791c4db2a9 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -226,6 +226,11 @@ struct ttm_lru_walk_arg {
 	struct ww_acquire_ctx *ticket;
 	/** @trylock_only: Only use trylock for locking. */
 	bool trylock_only;
+	/**
+	 * @sleeping_lock: Use sleeping locks even with %NULL @ticket.
+	 * @trylock_only has precedence over this field.
+	 */
+	bool sleeping_lock;
 };
 
 /**
@@ -431,6 +436,11 @@ void ttm_bo_unpin(struct ttm_buffer_object *bo);
 int ttm_bo_evict_first(struct ttm_device *bdev,
 		       struct ttm_resource_manager *man,
 		       struct ttm_operation_ctx *ctx);
+s64 ttm_bo_evict_cgroup(struct ttm_device *bdev,
+			struct ttm_resource_manager *man,
+			struct dmem_cgroup_pool_state *limit_pool,
+			s64 target_bytes,
+			struct ttm_operation_ctx *ctx);
 int ttm_bo_access(struct ttm_buffer_object *bo, unsigned long offset,
 		  void *buf, int len, int write);
 vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index a5d386583fb6..1ba3ae39763c 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -39,6 +39,7 @@
 
 struct dentry;
 struct dmem_cgroup_device;
+struct dmem_cgroup_region;
 struct drm_printer;
 struct ttm_device;
 struct ttm_resource_manager;
@@ -477,6 +478,9 @@ void ttm_resource_manager_init(struct ttm_resource_manager *man,
 			       struct ttm_device *bdev,
 			       uint64_t size);
 
+void ttm_resource_manager_set_dmem_region(struct ttm_resource_manager *man,
+					  struct dmem_cgroup_region *region);
+
 int ttm_resource_manager_evict_all(struct ttm_device *bdev,
 				   struct ttm_resource_manager *man);
 
-- 
2.54.0



* [PATCH v4 4/5] drm/xe: Wire up dmem cgroup reclaim for VRAM manager
  2026-05-12  8:24 [PATCH v4 0/5] Add reclaim to the dmem cgroup controller Thomas Hellström
                   ` (2 preceding siblings ...)
  2026-05-12  8:24 ` [PATCH v4 3/5] drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem controller Thomas Hellström
@ 2026-05-12  8:24 ` Thomas Hellström
  2026-05-16  4:04   ` Claude review: " Claude Code Review Bot
  2026-05-12  8:24 ` [PATCH v4 5/5] drm/amdgpu: " Thomas Hellström
  2026-05-16  4:03 ` Claude review: Add reclaim to the dmem cgroup controller Claude Code Review Bot
  5 siblings, 1 reply; 13+ messages in thread
From: Thomas Hellström @ 2026-05-12  8:24 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Natalie Vock, Johannes Weiner, Tejun Heo,
	Michal Koutný, cgroups, Huang Rui, Matthew Brost,
	Matthew Auld, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, David Airlie, Christian König, Alex Deucher,
	Rodrigo Vivi, dri-devel, amd-gfx, linux-kernel

Register the VRAM manager with the dmem cgroup reclaim infrastructure
so that lowering dmem.max below current VRAM usage triggers TTM
eviction rather than failing with -EBUSY.

v4:
- Rebased on drm-tip; dropped the XE_PL_STOLEN guard as stolen memory
  uses a separate TTM manager and never calls __xe_ttm_vram_mgr_init().

Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
index b518f7dec680..d8d596d8575b 100644
--- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c
@@ -299,14 +299,10 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
 			   u64 default_page_size)
 {
 	struct ttm_resource_manager *man = &mgr->manager;
+	struct dmem_cgroup_region *cg;
 	const char *name;
 	int err;
 
-	name = mem_type == XE_PL_VRAM0 ? "vram0" : "vram1";
-	man->cg = drmm_cgroup_register_region(&xe->drm, name, size);
-	if (IS_ERR(man->cg))
-		return PTR_ERR(man->cg);
-
 	man->func = &xe_ttm_vram_mgr_func;
 	mgr->mem_type = mem_type;
 	err = drmm_mutex_init(&xe->drm, &mgr->lock);
@@ -317,6 +313,14 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, struct xe_ttm_vram_mgr *mgr,
 	mgr->visible_avail = io_size;
 
 	ttm_resource_manager_init(man, &xe->ttm, size);
+
+	name = mem_type == XE_PL_VRAM0 ? "vram0" : "vram1";
+	cg = drmm_cgroup_register_region(&xe->drm, name, size);
+	if (IS_ERR(cg))
+		return PTR_ERR(cg);
+
+	ttm_resource_manager_set_dmem_region(man, cg);
+
 	err = gpu_buddy_init(&mgr->mm, man->size, default_page_size);
 	if (err)
 		return err;
-- 
2.54.0



* [PATCH v4 5/5] drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager
  2026-05-12  8:24 [PATCH v4 0/5] Add reclaim to the dmem cgroup controller Thomas Hellström
                   ` (3 preceding siblings ...)
  2026-05-12  8:24 ` [PATCH v4 4/5] drm/xe: Wire up dmem cgroup reclaim for VRAM manager Thomas Hellström
@ 2026-05-12  8:24 ` Thomas Hellström
  2026-05-16  4:04   ` Claude review: " Claude Code Review Bot
  2026-05-16  4:03 ` Claude review: Add reclaim to the dmem cgroup controller Claude Code Review Bot
  5 siblings, 1 reply; 13+ messages in thread
From: Thomas Hellström @ 2026-05-12  8:24 UTC (permalink / raw)
  To: intel-xe
  Cc: Thomas Hellström, Natalie Vock, Johannes Weiner, Tejun Heo,
	Michal Koutný, cgroups, Huang Rui, Matthew Brost,
	Matthew Auld, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter, David Airlie, Christian König, Alex Deucher,
	Rodrigo Vivi, dri-devel, amd-gfx, linux-kernel

Register the VRAM manager with the dmem cgroup reclaim infrastructure
so that lowering dmem.max below current VRAM usage triggers TTM
eviction rather than failing with -EBUSY.

Guard place->flags in amdgpu_ttm_bo_eviction_valuable() against NULL,
as the TTM reclaim path passes a NULL place in cgroup drain mode.

v3:
- Rebased on fix for uninitialized list and buddy allocator on the
  drmm_cgroup_register_region() error path.

Assisted-by: GitHub_Copilot:claude-sonnet-4.6
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c      | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 9 ++++++---
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 2740de94e93c..8cbcd33f51a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1488,7 +1488,7 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 	dma_resv_for_each_fence(&resv_cursor, bo->base.resv,
 				DMA_RESV_USAGE_BOOKKEEP, f) {
 		if (amdkfd_fence_check_mm(f, current->mm) &&
-		    !(place->flags & TTM_PL_FLAG_CONTIGUOUS))
+		    !(place && (place->flags & TTM_PL_FLAG_CONTIGUOUS)))
 			return false;
 	}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index ac3f71d77140..a1f1ae264a40 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -916,6 +916,7 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
 {
 	struct amdgpu_vram_mgr *mgr = &adev->mman.vram_mgr;
 	struct ttm_resource_manager *man = &mgr->manager;
+	struct dmem_cgroup_region *cg;
 	int err;
 
 	ttm_resource_manager_init(man, &adev->mman.bdev,
@@ -932,9 +933,11 @@ int amdgpu_vram_mgr_init(struct amdgpu_device *adev)
 	if (err)
 		return err;
 
-	man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram", adev->gmc.real_vram_size);
-	if (IS_ERR(man->cg))
-		return PTR_ERR(man->cg);
+	cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram",
+					 adev->gmc.real_vram_size);
+	if (IS_ERR(cg))
+		return PTR_ERR(cg);
+	ttm_resource_manager_set_dmem_region(man, cg);
 
 	ttm_set_driver_manager(&adev->mman.bdev, TTM_PL_VRAM, &mgr->manager);
 	ttm_resource_manager_set_used(man, true);
-- 
2.54.0


* Claude review: Add reclaim to the dmem cgroup controller
  2026-05-12  8:24 [PATCH v4 0/5] Add reclaim to the dmem cgroup controller Thomas Hellström
                   ` (4 preceding siblings ...)
  2026-05-12  8:24 ` [PATCH v4 5/5] drm/amdgpu: " Thomas Hellström
@ 2026-05-16  4:03 ` Claude Code Review Bot
  5 siblings, 0 replies; 13+ messages in thread
From: Claude Code Review Bot @ 2026-05-16  4:03 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: Add reclaim to the dmem cgroup controller
Author: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Patches: 6
Reviewed: 2026-05-16T14:03:59.614912

---

This is a well-structured v4 series that adds reclaim support to the dmem cgroup controller, enabling `dmem.max` writes below current usage to trigger TTM eviction rather than silently failing. The layering is clean: patch 1 is a standalone bug fix, patch 2 adds the core cgroup infrastructure, patch 3 provides the TTM eviction machinery, and patches 4-5 wire up xe and amdgpu respectively.

The locking design (rwsem protecting reclaim callbacks against concurrent unregister) is sound, signal handling is proper, and the O_NONBLOCK support is a nice touch. The multi-pass eviction strategy in `ttm_bo_evict_cgroup()` correctly mirrors the existing `ttm_bo_evict_alloc()` approach. The series is generally in good shape for v4.

One design consideration worth flagging: partial reclaim progress currently terminates the retry loop rather than triggering a retry with a smaller target. That is conservative, but it may leave reclaimable memory on the table. This is discussed further under patch 2 below (the callback itself comes from patch 3).

---
Generated by Claude Code Patch Reviewer

* Claude review: drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init()
  2026-05-12  8:24 ` [PATCH v4 1/5] drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init() Thomas Hellström
@ 2026-05-16  4:03   ` Claude Code Review Bot
  0 siblings, 0 replies; 13+ messages in thread
From: Claude Code Review Bot @ 2026-05-16  4:03 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Verdict: Good. Straightforward correctness fix.**

The commit message clearly explains the crash: `drmm_cgroup_register_region()` was called before `INIT_LIST_HEAD()` and `gpu_buddy_init()`, so a registration failure left the lists zero-initialized (not empty) and the buddy allocator uninitialized, causing NULL dereferences in `amdgpu_vram_mgr_fini()`.

The fix correctly moves the cgroup registration after the list and buddy allocator initialization:

```c
+	man->cg = drmm_cgroup_register_region(adev_to_drm(adev), "vram", adev->gmc.real_vram_size);
+	if (IS_ERR(man->cg))
+		return PTR_ERR(man->cg);
```

The registration now runs only after `gpu_buddy_init()` succeeds, so the teardown path can safely iterate the lists and call `gpu_buddy_fini()`. The Fixes/Cc-stable tags are appropriate. No issues.

Note: Patch 5 replaces this exact code with the new `ttm_resource_manager_set_dmem_region()` helper, so patch 1 exists as a standalone backportable fix for stable kernels. This split is correct.
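To illustrate why the ordering matters, here is a minimal userspace sketch. It is not the amdgpu code: `struct mock_mgr`, `mock_mgr_init()` and `mock_mgr_safe_to_walk()` are hypothetical names, and `init_list_head()` only imitates the kernel's `INIT_LIST_HEAD()`. It shows that a zero-filled list head is not an empty list, and that doing the fallible step last keeps teardown safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Imitates INIT_LIST_HEAD(): an empty list points at itself. */
static inline void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Hypothetical manager with one list, loosely modelled on a VRAM manager. */
struct mock_mgr {
	struct list_head reservations;
};

/* A teardown walk may only follow ->next if the head was initialized. */
bool mock_mgr_safe_to_walk(const struct mock_mgr *mgr)
{
	return mgr->reservations.next != NULL;
}

/* Init with the fallible step last, as in the fixed ordering. */
int mock_mgr_init(struct mock_mgr *mgr, bool register_fails)
{
	memset(mgr, 0, sizeof(*mgr));
	init_list_head(&mgr->reservations);
	if (register_fails)
		return -12; /* stands in for -ENOMEM */
	return 0;
}

void example_usage(void)
{
	struct mock_mgr mgr;

	/* Zero-filled head: next is NULL, so a walk would dereference NULL. */
	memset(&mgr, 0, sizeof(mgr));
	assert(!mock_mgr_safe_to_walk(&mgr));

	/* Fixed ordering: registration fails, yet teardown remains safe. */
	assert(mock_mgr_init(&mgr, true) != 0);
	assert(mock_mgr_safe_to_walk(&mgr));
}
```

The same reasoning applies to any state that the teardown path touches unconditionally: initialize it before the first step that can fail.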

---
Generated by Claude Code Patch Reviewer

* Claude review: cgroup/dmem: Add reclaim callback for lowering max below current usage
  2026-05-12  8:24 ` [PATCH v4 2/5] cgroup/dmem: Add reclaim callback for lowering max below current usage Thomas Hellström
@ 2026-05-16  4:04   ` Claude Code Review Bot
  0 siblings, 0 replies; 13+ messages in thread
From: Claude Code Review Bot @ 2026-05-16  4:04 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Verdict: Good with minor observations.**

**Core design change — `xchg` replacing `page_counter_set_max`:**

```c
+	xchg(&pool->cnt.max, (unsigned long)val);
```

The original `page_counter_set_max()` refuses to set max below current usage (returns `-EBUSY`). The new code unconditionally sets max via `xchg`, which preserves the full memory barrier semantics while allowing max < usage. This is the right approach: it throttles concurrent allocations immediately while reclaim runs. Well motivated.
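The difference between the two approaches can be sketched in userspace with C11 atomics. This is a simplified model, not the kernel's `page_counter` implementation: `struct mock_counter` and the `mock_*` functions are hypothetical, and `atomic_exchange()` stands in for `xchg()`.

```c
#include <assert.h>
#include <stdatomic.h>

#define MOCK_EBUSY 16 /* mirrors the usual EBUSY value, for illustration */

/* Hypothetical stand-in for a usage counter with a hard limit. */
struct mock_counter {
	atomic_ulong usage;
	atomic_ulong max;
};

/* Strict variant: refuse a limit below current usage. */
int mock_set_max_strict(struct mock_counter *c, unsigned long new_max)
{
	if (atomic_load(&c->usage) > new_max)
		return -MOCK_EBUSY;
	atomic_exchange(&c->max, new_max);
	return 0;
}

/* Unconditional variant: the limit may end up below current usage. */
void mock_set_max_force(struct mock_counter *c, unsigned long new_max)
{
	atomic_exchange(&c->max, new_max);
}

/* A charge succeeds only if the resulting usage stays within max. */
int mock_try_charge(struct mock_counter *c, unsigned long n)
{
	unsigned long new_usage = atomic_fetch_add(&c->usage, n) + n;

	if (new_usage > atomic_load(&c->max)) {
		atomic_fetch_sub(&c->usage, n); /* roll back the failed charge */
		return -MOCK_EBUSY;
	}
	return 0;
}

void example_usage(void)
{
	struct mock_counter c;

	atomic_init(&c.usage, 100);
	atomic_init(&c.max, 200);

	/* Strict: lowering below usage is rejected and max is unchanged. */
	assert(mock_set_max_strict(&c, 50) == -MOCK_EBUSY);
	assert(atomic_load(&c.max) == 200);

	/* Forced: max drops at once, so new charges fail while usage > max. */
	mock_set_max_force(&c, 50);
	assert(mock_try_charge(&c, 1) == -MOCK_EBUSY);
	assert(atomic_load(&c.usage) == 100);
}
```

In the forced variant, new charges are rejected from the moment the limit is lowered, which is the throttling effect described above.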

**Retry loop termination on partial progress:**

```c
+	for (int retries = 5; retries > 0; retries--) {
+		...
+		ret = region->reclaim(pool, usage - val, region->reclaim_priv);
+		...
+		if (ret)
+			break;
```

The reclaim callback (in patch 3) returns `-ENOSPC` when partial progress is made. This causes the retry loop to break even if some memory was freed. A more aggressive approach would be to retry on partial progress (similar to how the memory cgroup controller distinguishes `-EAGAIN` for "retry" from hard errors). However, since `page_counter_read()` reflects the actual usage and the max is already set to throttle new allocations, this best-effort approach is defensible and matches the stated design intent.
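The termination behaviour can be modelled as follows. This is a sketch under assumptions: the parts of the loop elided with `...` in the quoted hunk are not reproduced, so the usage check and the final `-EBUSY` (taken from the cover letter's description) are guesses, and `struct mock_pool`, `mock_reclaim()` and `mock_lower_max()` are hypothetical names.

```c
#include <assert.h>

#define MOCK_ENOSPC 28 /* mirrors the usual ENOSPC value */
#define MOCK_EBUSY  16 /* mirrors the usual EBUSY value */

/* Hypothetical pool: current usage and how much of it can be reclaimed. */
struct mock_pool {
	unsigned long usage;
	unsigned long reclaimable;
	int calls; /* number of reclaim invocations, for inspection */
};

/* Frees up to `target`; returns 0 if the target was met, else -ENOSPC. */
int mock_reclaim(struct mock_pool *pool, unsigned long target)
{
	unsigned long freed =
		target < pool->reclaimable ? target : pool->reclaimable;

	pool->usage -= freed;
	pool->reclaimable -= freed;
	pool->calls++;
	return freed == target ? 0 : -MOCK_ENOSPC;
}

/* Bounded retry loop: any error, including partial progress, ends it. */
int mock_lower_max(struct mock_pool *pool, unsigned long new_max)
{
	for (int retries = 5; retries > 0; retries--) {
		int ret;

		if (pool->usage <= new_max)
			return 0;
		ret = mock_reclaim(pool, pool->usage - new_max);
		if (ret)
			break;
	}
	return pool->usage <= new_max ? 0 : -MOCK_EBUSY;
}

void example_usage(void)
{
	struct mock_pool partial = { .usage = 100, .reclaimable = 30 };
	struct mock_pool full = { .usage = 100, .reclaimable = 80 };

	/* Partial progress: 30 units freed, loop stops after one call. */
	assert(mock_lower_max(&partial, 50) == -MOCK_EBUSY);
	assert(partial.usage == 70 && partial.calls == 1);

	/* Target met on the first call: the next iteration returns 0. */
	assert(mock_lower_max(&full, 50) == 0);
	assert(full.usage == 50 && full.calls == 1);
}
```

In the partial case the remaining four retries go unused even though usage dropped, which is the behaviour flagged above.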

**rwsem protection looks correct:**

- `down_write` in `dmem_cgroup_unregister_region()` ensures all in-flight reclaim callbacks complete before clearing the callback pointer.
- `down_read_interruptible` in `set_resource_max()` allows concurrent reclaim and handles signals gracefully.
- `WRITE_ONCE`/`READ_ONCE` pairs on `region->reclaim` are properly placed.

**Unused `nonblock` parameter in `set_resource_min`/`set_resource_low`:**

```c
 static void
-set_resource_min(struct dmem_cgroup_pool_state *pool, u64 val)
+set_resource_min(struct dmem_cgroup_pool_state *pool, u64 val, bool nonblock)
```

These functions accept `nonblock` but ignore it, keeping the function pointer signature uniform for `dmemcg_limit_write()`. This is clean.

---
Generated by Claude Code Patch Reviewer

* Claude review: drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem controller
  2026-05-12  8:24 ` [PATCH v4 3/5] drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem controller Thomas Hellström
@ 2026-05-16  4:04   ` Claude Code Review Bot
  0 siblings, 0 replies; 13+ messages in thread
From: Claude Code Review Bot @ 2026-05-16  4:04 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Verdict: Good. The TTM integration is solid.**

**NULL place handling in `ttm_bo_evict_cb`:**

```c
+	/*
+	 * evict_walk->place is NULL in cgroup drain mode.  Drivers'
+	 * eviction_valuable() callbacks must handle a NULL place, treating it
+	 * as "any placement": the TTM base implementation already does so via
+	 * ttm_resource_intersects().
+	 */
 	if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, evict_walk->place))
```

I verified that `ttm_resource_intersects()` returns `true` for NULL place, so the base `ttm_bo_eviction_valuable()` correctly treats NULL as "any placement." The xe driver's `xe_bo_eviction_valuable()` calls the base first, so it's also safe. The amdgpu driver is handled in patch 5.

However, the comment says "Drivers' eviction_valuable() callbacks **must** handle a NULL place" but there's no compile-time or runtime enforcement. Any out-of-tree or future driver that dereferences `place` without a NULL check would crash. A comment in the `eviction_valuable` callback documentation (`struct ttm_device_funcs`) would be valuable to prevent future regressions.

**`bo_size` capture:**

```c
+	s64 bo_size = bo->base.size;
```

This correctly captures the GEM object size before eviction may clear the resource. Note that `bo->base.size` is the GEM size, not the VRAM allocation size (which could differ for padded allocations). For progress tracking this is fine since the actual cgroup accounting uses `page_counter_read()`.

**`sleeping_lock` flag:**

```c
+	/**
+	 * @sleeping_lock: Use sleeping locks even with %NULL @ticket.
+	 * @trylock_only has precedence over this field.
+	 */
+	bool sleeping_lock;
```

The change in `ttm_bo_util.c` correctly allows the `ttm_lru_walk_ticketlock` path when `sleeping_lock` is true and no WW ticket is present:

```c
-		else if (!arg->ticket || arg->ctx->no_wait_gpu || arg->trylock_only)
+		else if ((!arg->ticket && !arg->sleeping_lock) || arg->ctx->no_wait_gpu ||
+			 arg->trylock_only)
```

This is correct: without this, the cgroup reclaim path (which has no WW ticket) would only trylock and skip contended BOs, potentially making no progress. The comment in the patch about "sooner than later be converted to full WW transactions" is noted.
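The effect of the condition change can be checked with a small truth-table sketch. It assumes, as described above, that a true condition means "skip this contended BO" after a failed trylock. `struct mock_walk_arg` and both predicates are hypothetical and flatten `arg->ticket` and `arg->ctx->no_wait_gpu` into booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of the LRU walk arguments. */
struct mock_walk_arg {
	bool has_ticket;    /* stands in for arg->ticket != NULL */
	bool sleeping_lock;
	bool trylock_only;
	bool no_wait_gpu;   /* stands in for arg->ctx->no_wait_gpu */
};

/* Old condition: without a ticket, contended BOs are always skipped. */
bool mock_skip_contended_old(const struct mock_walk_arg *arg)
{
	return !arg->has_ticket || arg->no_wait_gpu || arg->trylock_only;
}

/* New condition: sleeping_lock permits a blocking lock without a ticket. */
bool mock_skip_contended_new(const struct mock_walk_arg *arg)
{
	return (!arg->has_ticket && !arg->sleeping_lock) ||
	       arg->no_wait_gpu || arg->trylock_only;
}

void example_usage(void)
{
	/* Cgroup reclaim case: no ticket, but sleeping locks requested. */
	struct mock_walk_arg reclaim = { .sleeping_lock = true };
	/* trylock_only takes precedence over sleeping_lock. */
	struct mock_walk_arg try_only = {
		.sleeping_lock = true, .trylock_only = true,
	};

	assert(mock_skip_contended_old(&reclaim));
	assert(!mock_skip_contended_new(&reclaim));
	assert(mock_skip_contended_new(&try_only));
}
```

For every combination with `sleeping_lock` false, the two predicates agree, so existing callers should see no behaviour change.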

**`ttm_resource_manager_set_dmem_region` defensive check:**

```c
+	if (!IS_ERR_OR_NULL(region)) {
+		man->cg = region;
+		dmem_cgroup_region_set_reclaim(region, ...);
+	}
```

The `IS_ERR_OR_NULL` check is defensive since callers already validate the pointer, but it prevents error pointers from being stored in `man->cg` (the bug fixed in v3). Reasonable.
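For readers unfamiliar with the error-pointer convention, a userspace sketch of the guard follows. The `mock_*` helpers only imitate the kernel's `ERR_PTR()`, `IS_ERR()` and `IS_ERR_OR_NULL()`. `struct mock_region`, `struct mock_manager` and `mock_set_region()` are hypothetical, and setting `has_reclaim` stands in for the callback registration that is elided in the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_ERRNO 4095

/* Encode a negative errno in a pointer, as the kernel's ERR_PTR() does. */
static inline void *mock_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Error pointers occupy the top MOCK_MAX_ERRNO addresses. */
static inline bool mock_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MOCK_MAX_ERRNO;
}

static inline bool mock_is_err_or_null(const void *ptr)
{
	return !ptr || mock_is_err(ptr);
}

struct mock_region {
	bool has_reclaim;
};

struct mock_manager {
	struct mock_region *cg;
};

/* Store the region and hook up reclaim only for a valid pointer. */
void mock_set_region(struct mock_manager *man, struct mock_region *region)
{
	if (!mock_is_err_or_null(region)) {
		man->cg = region;
		region->has_reclaim = true;
	}
}

void example_usage(void)
{
	struct mock_manager man = { .cg = NULL };
	struct mock_region region = { .has_reclaim = false };

	/* An error pointer or NULL leaves man->cg untouched. */
	mock_set_region(&man, mock_err_ptr(-12));
	assert(man.cg == NULL);
	mock_set_region(&man, NULL);
	assert(man.cg == NULL);

	/* A valid region is stored and gets its reclaim hook. */
	mock_set_region(&man, &region);
	assert(man.cg == &region && region.has_reclaim);
}
```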

---
Generated by Claude Code Patch Reviewer

* Claude review: drm/xe: Wire up dmem cgroup reclaim for VRAM manager
  2026-05-12  8:24 ` [PATCH v4 4/5] drm/xe: Wire up dmem cgroup reclaim for VRAM manager Thomas Hellström
@ 2026-05-16  4:04   ` Claude Code Review Bot
  0 siblings, 0 replies; 13+ messages in thread
From: Claude Code Review Bot @ 2026-05-16  4:04 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Verdict: Good. Clean integration.**

The patch reorders cgroup registration to after `ttm_resource_manager_init()` and uses the new helper:

```c
+	cg = drmm_cgroup_register_region(&xe->drm, name, size);
+	if (IS_ERR(cg))
+		return PTR_ERR(cg);
+	ttm_resource_manager_set_dmem_region(man, cg);
```

The `drmm_` (managed) allocation ensures cleanup on failure paths. No issues.

---
Generated by Claude Code Patch Reviewer

* Claude review: drm/amdgpu: Wire up dmem cgroup reclaim for VRAM manager
  2026-05-12  8:24 ` [PATCH v4 5/5] drm/amdgpu: " Thomas Hellström
@ 2026-05-16  4:04   ` Claude Code Review Bot
  0 siblings, 0 replies; 13+ messages in thread
From: Claude Code Review Bot @ 2026-05-16  4:04 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Verdict: Good, with one observation on the NULL place guard.**

**NULL place guard in `amdgpu_ttm_bo_eviction_valuable`:**

```c
 		if (amdkfd_fence_check_mm(f, current->mm) &&
-		    !(place->flags & TTM_PL_FLAG_CONTIGUOUS))
+		    !(place && (place->flags & TTM_PL_FLAG_CONTIGUOUS)))
 			return false;
```

When `place` is NULL (cgroup drain mode), `!(place && ...)` evaluates to `true`, so the condition reduces to `amdkfd_fence_check_mm(f, current->mm)`. If that check matches, the function returns `false` (BO not evictable). This means KFD BOs belonging to `current->mm` are protected from cgroup reclaim.

In practice this is likely fine: the cgroup `max` write is typically done by a cgroup management process (systemd, container runtime), not by the KFD process itself, so `amdkfd_fence_check_mm(f, current->mm)` would be false and the BO would remain evictable. The conservative behavior (protect KFD BOs from the writing process) is the safe choice here.
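The guard's behaviour can be enumerated with a small sketch. It models only the boolean expression from the hunk: `struct mock_place`, `MOCK_PL_FLAG_CONTIGUOUS` and `mock_kfd_vetoes_eviction()` are hypothetical stand-ins, and the result of `amdkfd_fence_check_mm()` is passed in as a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_PL_FLAG_CONTIGUOUS (1u << 0) /* illustrative flag value */

/* Hypothetical stand-in for struct ttm_place. */
struct mock_place {
	unsigned int flags;
};

/*
 * Returns true when the fence check vetoes eviction, i.e. when the
 * patched condition would make the callback return false.
 */
bool mock_kfd_vetoes_eviction(bool fence_matches_mm,
			      const struct mock_place *place)
{
	return fence_matches_mm &&
	       !(place && (place->flags & MOCK_PL_FLAG_CONTIGUOUS));
}

void example_usage(void)
{
	struct mock_place contiguous = { .flags = MOCK_PL_FLAG_CONTIGUOUS };
	struct mock_place plain = { .flags = 0 };

	/* NULL place (cgroup drain): veto depends only on the fence check. */
	assert(mock_kfd_vetoes_eviction(true, NULL));
	assert(!mock_kfd_vetoes_eviction(false, NULL));

	/* Non-NULL place: same results as the original expression. */
	assert(!mock_kfd_vetoes_eviction(true, &contiguous));
	assert(mock_kfd_vetoes_eviction(true, &plain));
	assert(!mock_kfd_vetoes_eviction(false, &plain));
}
```

For a non-NULL `place` the patched expression is equivalent to the original one, so only the cgroup drain path gains new behaviour.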

**Cgroup registration wiring** mirrors patch 4 and uses the same `ttm_resource_manager_set_dmem_region` helper. Clean.

---
Generated by Claude Code Patch Reviewer

* Claude review: drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init()
  2026-05-11 17:30 ` [PATCH v3 1/5] drm/amdgpu: Fix init ordering in amdgpu_vram_mgr_init() Thomas Hellström
@ 2026-05-16  4:48   ` Claude Code Review Bot
  0 siblings, 0 replies; 13+ messages in thread
From: Claude Code Review Bot @ 2026-05-16  4:48 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Verdict: Good, correct fix.**

The bug is real and well described. In the current code at `amdgpu_vram_mgr.c:921`, `drmm_cgroup_register_region()` is called before `INIT_LIST_HEAD()` (lines 928-930) and `gpu_buddy_init()` (line 934). If registration fails, `adev->mman.initialized` is already true, so teardown calls `amdgpu_vram_mgr_fini()` which dereferences NULL list head pointers and uninitialized buddy allocator state.

The fix correctly moves registration to after all initialization. If registration fails at the new location, the buddy allocator and lists are properly initialized, so `amdgpu_vram_mgr_fini()` can safely clean them up.

The `Cc: stable` tag for v6.14+ is appropriate since the bug was introduced by commit `2b624a2c1865`.

No issues.

---
Generated by Claude Code Patch Reviewer
