public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH v5] cgroup/dmem: implement dmem.high soft limit via prioritized eviction
@ 2026-05-31  8:45 Qiliang Yuan
  2026-06-04  5:03 ` Claude review: " Claude Code Review Bot
  2026-06-04  5:03 ` Claude Code Review Bot
  0 siblings, 2 replies; 3+ messages in thread
From: Qiliang Yuan @ 2026-05-31  8:45 UTC (permalink / raw)
  To: Christian Koenig, Huang Rui, Matthew Auld, Matthew Brost,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Tejun Heo, Johannes Weiner, Michal Koutný,
	Natalie Vock
  Cc: dri-devel, linux-kernel, cgroups, Qiliang Yuan

The dmem cgroup v2 controller currently only provides a hard "max"
limit, which causes immediate allocation failures when a cgroup's
device memory usage reaches its quota.  GPU-bound AI workloads need
smoother over-subscription support: a soft limit that temporarily
allows excess usage while applying backpressure through reclaim
rather than outright failure.

Add dmem.high, a soft limit that penalizes over-limit cgroups by
evicting their buffer objects first when eviction is triggered (e.g.
due to a "max" limit hit).  Unlike the rejected v1 approach which
used sleep-on-allocation throttling, this version provides a
meaningful recovery action through prioritized reclaim.

Expose "high" as a new cgroupfs control file per region via
set_resource_high() and get_resource_high(), and initialize it to
PAGE_COUNTER_MAX in reset_all_resource_limits().  Like get_resource_max(),
get_resource_high() returns PAGE_COUNTER_MAX when the pool is NULL.

Extend dmem_cgroup_state_evict_valuable() with a "try_high"
parameter.  When set, the function walks the page_counter parent
chain to check whether any ancestor exceeds its high limit, and
verifies that the pool is above its effective minimum to respect
dmem.min protection.  For the limit-hitting cgroup's own BOs, the
ancestry check is skipped but the high threshold still applies.

Refactor ttm_bo_evict_alloc() into a 3-pass eviction strategy.
Pass 1 targets only BOs whose cgroup exceeds dmem.high, using a
blocking lock when a ticket is available or trylock otherwise.
Pass 2 falls back to the standard above-elow trylock eviction.
Pass 3+ uses proper locking and repeats while making progress
with the existing low-watermark fallback.

Signed-off-by: Qiliang Yuan <realwujing@gmail.com>
---
Introduce a "high" soft limit for the dmem cgroup v2 controller.
When a "max" limit is hit and eviction is triggered, buffer objects
belonging to cgroups that exceed their dmem.high limit are targeted
first, providing a meaningful recovery action through reclaim.

The dmem cgroup currently only supports hard "max" limits, which
cause immediate allocation failures for GPU-bound workloads. A soft
limit enables smoother over-subscription by penalizing over-limit
cgroups via prioritized eviction rather than outright rejection.

The implementation adds a "high" cgroupfs control file per region,
a try_high parameter to dmem_cgroup_state_evict_valuable() for
tier-1 eviction, and a 3-pass strategy in ttm_bo_evict_alloc().
---
V4 -> V5:
- Restore the original control flow in dmem_cgroup_state_evict_valuable():
  test_pool is no longer dereferenced before the ancestry checks, fixing
  a NULL pointer dereference on BOs without a cgroup.  The limit_pool
  NULL-to-root-cgroup resolution is now performed before the try_high
  block, fixing a panic during global memory pressure eviction.
- Keep the try_high check for limit_pool == test_pool inside the existing
  early-return branch to avoid bypassing the hierarchy constraint check
  that prevents cross-cgroup eviction.
- Use a blocking lock in Pass 1 only when a ticket is available
  (trylock otherwise), addressing the deadlock risk of blocking without
  a valid ww_acquire_ctx.
- Explicitly reset trylock_only to true before Pass 2 so it does not
  inherit Pass 1's blocking behavior.

V3 -> V4:
- Use a blocking lock in Pass 1 instead of trylock to ensure
  over-limit cgroups are penalized even when their BOs are actively
  in use, as requested by Maarten Lankhorst.
- Evaluate the try_high condition before the limit_pool == test_pool
  early-return so that the limit-hitting cgroup's own BOs are also
  filtered by dmem.high.
- Remove the high-priority compensation retry at the start of Pass 3,
  which is no longer needed now that Pass 1 uses a blocking lock.

V2 -> V3:
- Walk the page_counter parent chain in the try_high pass to prevent
  child cgroups from evading the penalty when a parent cgroup exceeds
  its dmem.high limit.
- Check dmem.min protection in the try_high pass to avoid evicting
  BOs below the effective minimum.
- Add a properly-locked high-priority retry at the beginning of Pass 3
  so that actively-used over-limit BOs (which failed trylock in Pass 1)
  are not skipped while innocent cgroups are evicted.
- Fix get_resource_high(NULL) returning 0 instead of PAGE_COUNTER_MAX
  to match the behavior of get_resource_max().

V1 -> V2:
- Replace sleep-on-allocation throttling with prioritized eviction.
  When a "max" limit is hit, BOs from cgroups exceeding dmem.high are
  evicted first in a dedicated pass. No throttling or sleeping is
  performed in the charge path.
- Remove task throttling (schedule_timeout_killable, TIF_NOTIFY_RESUME,
  resume_user_mode_work() integration) entirely.
- Add dmem.high cgroupfs control file per region.
- Extend dmem_cgroup_state_evict_valuable() with try_high parameter
  to target over-limit cgroups as tier-1 eviction.
- Refactor ttm_bo_evict_alloc() into a 3-pass eviction strategy:
  (1) trylock: evict only BOs exceeding dmem.high
  (2) trylock: above-elow
  (3) proper-lock: repeat with low fallback.
- Initialize high to PAGE_COUNTER_MAX in reset_all_resource_limits().

v4: https://lore.kernel.org/r/20260530-feature-dmem-high-v4-1-ee7c6ec1c8da@gmail.com
v3: https://lore.kernel.org/r/20260528-feature-dmem-high-v3-1-c642b34bcb2f@gmail.com
v2: https://lore.kernel.org/r/20260522-feature-dmem-high-v2-1-1d7d4a0fa5da@gmail.com
v1: https://lore.kernel.org/all/20260520-feature-dmem-high-v1-1-97ca0cb7f95a@gmail.com
---
 drivers/gpu/drm/ttm/ttm_bo.c | 33 ++++++++++++++----
 include/linux/cgroup_dmem.h  |  4 +--
 kernel/cgroup/dmem.c         | 79 +++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 103 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bcd76f6bb7f02..21fe34fd43eec 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -505,6 +505,8 @@ struct ttm_bo_evict_walk {
 
 	/** @limit_pool: Which pool limit we should test against */
 	struct dmem_cgroup_pool_state *limit_pool;
+	/** @try_high: Whether to only evict BO's above the high watermark (first pass) */
+	bool try_high;
 	/** @try_low: Whether we should attempt to evict BO's with low watermark threshold */
 	bool try_low;
 	/** @hit_low: If we cannot evict a bo when @try_low is false (first pass) */
@@ -518,7 +520,8 @@ static s64 ttm_bo_evict_cb(struct ttm_lru_walk *walk, struct ttm_buffer_object *
 	s64 lret;
 
 	if (!dmem_cgroup_state_evict_valuable(evict_walk->limit_pool, bo->resource->css,
-					      evict_walk->try_low, &evict_walk->hit_low))
+					      evict_walk->try_high, evict_walk->try_low,
+					      &evict_walk->hit_low))
 		return 0;
 
 	if (bo->pin_count || !bo->bdev->funcs->eviction_valuable(bo, evict_walk->place))
@@ -577,31 +580,47 @@ static int ttm_bo_evict_alloc(struct ttm_device *bdev,
 	};
 	s64 lret;
 
-	evict_walk.walk.arg.trylock_only = true;
+	/*
+	 * Pass 1 (high-priority): Evict only BOs whose cgroup exceeds its
+	 * dmem.high soft limit.  A blocking lock is used when a ticket is
+	 * available to ensure over-limit cgroups are penalized even when
+	 * their BOs are actively in use; trylock otherwise.
+	 */
+	evict_walk.walk.arg.trylock_only = !ticket;
+	evict_walk.try_high = true;
 	lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
+	evict_walk.try_high = false;
+	if (lret)
+		goto out;
 
-	/* One more attempt if we hit low limit? */
+	/*
+	 * Pass 2 (trylock): Evict BOs above the effective low watermark.
+	 * Falls back to low-priority eviction if needed.
+	 */
+	evict_walk.walk.arg.trylock_only = true;
+	lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
 	if (!lret && evict_walk.hit_low) {
 		evict_walk.try_low = true;
 		lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
 	}
+
 	if (lret || !ticket)
 		goto out;
 
-	/* Reset low limit */
+	/*
+	 * Pass 3+ (properly locked): Evict while making progress.
+	 * Reset flags and retry with try_low if we hit the low watermark.
+	 */
 	evict_walk.try_low = evict_walk.hit_low = false;
-	/* If ticket-locking, repeat while making progress. */
 	evict_walk.walk.arg.trylock_only = false;
 
 retry:
 	do {
-		/* The walk may clear the evict_walk.walk.ticket field */
 		evict_walk.walk.arg.ticket = ticket;
 		evict_walk.evicted = 0;
 		lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
 	} while (!lret && evict_walk.evicted);
 
-	/* We hit the low limit? Try once more */
 	if (!lret && evict_walk.hit_low && !evict_walk.try_low) {
 		evict_walk.try_low = true;
 		goto retry;
diff --git a/include/linux/cgroup_dmem.h b/include/linux/cgroup_dmem.h
index dd4869f1d736e..06115d35509b1 100644
--- a/include/linux/cgroup_dmem.h
+++ b/include/linux/cgroup_dmem.h
@@ -23,7 +23,7 @@ int dmem_cgroup_try_charge(struct dmem_cgroup_region *region, u64 size,
 void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64 size);
 bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
 				      struct dmem_cgroup_pool_state *test_pool,
-				      bool ignore_low, bool *ret_hit_low);
+				      bool try_high, bool ignore_low, bool *ret_hit_low);
 
 void dmem_cgroup_pool_state_put(struct dmem_cgroup_pool_state *pool);
 #else
@@ -54,7 +54,7 @@ static inline void dmem_cgroup_uncharge(struct dmem_cgroup_pool_state *pool, u64
 static inline
 bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
 				      struct dmem_cgroup_pool_state *test_pool,
-				      bool ignore_low, bool *ret_hit_low)
+				      bool try_high, bool ignore_low, bool *ret_hit_low)
 {
 	return true;
 }
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 4753a67d0f0f2..29f8c68e92f7f 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -156,6 +156,12 @@ set_resource_low(struct dmem_cgroup_pool_state *pool, u64 val)
 	page_counter_set_low(&pool->cnt, val);
 }
 
+static void
+set_resource_high(struct dmem_cgroup_pool_state *pool, u64 val)
+{
+	page_counter_set_high(&pool->cnt, val);
+}
+
 static void
 set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val)
 {
@@ -167,6 +173,11 @@ static u64 get_resource_low(struct dmem_cgroup_pool_state *pool)
 	return pool ? READ_ONCE(pool->cnt.low) : 0;
 }
 
+static u64 get_resource_high(struct dmem_cgroup_pool_state *pool)
+{
+	return pool ? READ_ONCE(pool->cnt.high) : PAGE_COUNTER_MAX;
+}
+
 static u64 get_resource_min(struct dmem_cgroup_pool_state *pool)
 {
 	return pool ? READ_ONCE(pool->cnt.min) : 0;
@@ -186,6 +197,7 @@ static void reset_all_resource_limits(struct dmem_cgroup_pool_state *rpool)
 {
 	set_resource_min(rpool, 0);
 	set_resource_low(rpool, 0);
+	set_resource_high(rpool, PAGE_COUNTER_MAX);
 	set_resource_max(rpool, PAGE_COUNTER_MAX);
 }
 
@@ -289,10 +301,13 @@ dmem_cgroup_calculate_protection(struct dmem_cgroup_pool_state *limit_pool,
  * dmem_cgroup_state_evict_valuable() - Check if we should evict from test_pool
  * @limit_pool: The pool for which we hit limits
  * @test_pool: The pool for which to test
+ * @try_high: Only evict BOs whose usage exceeds the high limit (first pass)
  * @ignore_low: Whether we have to respect low watermarks.
  * @ret_hit_low: Pointer to whether it makes sense to consider low watermark.
  *
  * This function returns true if we can evict from @test_pool, false if not.
+ * When @try_high is set, only pools with usage above their high limit are
+ * evictable, enabling prioritized eviction of over-limit cgroups.
  * When returning false and @ignore_low is false, @ret_hit_low may
  * be set to true to indicate this function can be retried with @ignore_low
  * set to true.
@@ -301,15 +316,26 @@ dmem_cgroup_calculate_protection(struct dmem_cgroup_pool_state *limit_pool,
  */
 bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
 				      struct dmem_cgroup_pool_state *test_pool,
-				      bool ignore_low, bool *ret_hit_low)
+				      bool try_high, bool ignore_low, bool *ret_hit_low)
 {
 	struct dmem_cgroup_pool_state *pool = test_pool;
 	struct page_counter *ctest;
 	u64 used, min, low;
 
-	/* Can always evict from current pool, despite limits */
-	if (limit_pool == test_pool)
+	/*
+	 * When the limit-hitting cgroup's own BOs are being considered
+	 * in try_high mode, only evict them if their pool exceeds its
+	 * own dmem.high limit.  For non-try_high mode, maintain the
+	 * existing behavior: always evict from the limit-hitting pool.
+	 */
+	if (limit_pool == test_pool) {
+		if (try_high) {
+			ctest = &test_pool->cnt;
+			used = page_counter_read(ctest);
+			return used > READ_ONCE(ctest->high);
+		}
 		return true;
+	}
 
 	if (limit_pool) {
 		if (!parent_dmemcs(limit_pool->cs))
@@ -330,10 +356,38 @@ bool dmem_cgroup_state_evict_valuable(struct dmem_cgroup_pool_state *limit_pool,
 	}
 
 	ctest = &test_pool->cnt;
+	used = page_counter_read(ctest);
+
+	if (try_high) {
+		struct page_counter *c;
+
+		/*
+		 * Walk the page_counter parent chain to check whether any
+		 * ancestor cgroup exceeds its dmem.high limit.  This prevents
+		 * child cgroups from evading the penalty when a parent cgroup
+		 * is over its high limit.
+		 */
+		if (used <= READ_ONCE(ctest->high)) {
+			for (c = ctest->parent; c; c = c->parent) {
+				if (page_counter_read(c) > READ_ONCE(c->high))
+					break;
+			}
+			if (!c)
+				return false;
+		}
+
+		/*
+		 * Respect dmem.min protection: do not evict BOs below the
+		 * effective minimum even during the high-priority pass.
+		 */
+		dmem_cgroup_calculate_protection(limit_pool, test_pool);
+		min = READ_ONCE(ctest->emin);
+
+		return used > min;
+	}
 
 	dmem_cgroup_calculate_protection(limit_pool, test_pool);
 
-	used = page_counter_read(ctest);
 	min = READ_ONCE(ctest->emin);
 
 	if (used <= min)
@@ -835,6 +889,17 @@ static ssize_t dmem_cgroup_region_low_write(struct kernfs_open_file *of,
 	return dmemcg_limit_write(of, buf, nbytes, off, set_resource_low);
 }
 
+static int dmem_cgroup_region_high_show(struct seq_file *sf, void *v)
+{
+	return dmemcg_limit_show(sf, v, get_resource_high);
+}
+
+static ssize_t dmem_cgroup_region_high_write(struct kernfs_open_file *of,
+					  char *buf, size_t nbytes, loff_t off)
+{
+	return dmemcg_limit_write(of, buf, nbytes, off, set_resource_high);
+}
+
 static int dmem_cgroup_region_max_show(struct seq_file *sf, void *v)
 {
 	return dmemcg_limit_show(sf, v, get_resource_max);
@@ -868,6 +933,12 @@ static struct cftype files[] = {
 		.seq_show = dmem_cgroup_region_low_show,
 		.flags = CFTYPE_NOT_ON_ROOT,
 	},
+	{
+		.name = "high",
+		.write = dmem_cgroup_region_high_write,
+		.seq_show = dmem_cgroup_region_high_show,
+		.flags = CFTYPE_NOT_ON_ROOT,
+	},
 	{
 		.name = "max",
 		.write = dmem_cgroup_region_max_write,

---
base-commit: ab5fce87a778cb780a05984a2ca448f2b41aafbf
change-id: 20260519-feature-dmem-high-16997148dc38

Best regards,
-- 
Qiliang Yuan <realwujing@gmail.com>


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Claude review: cgroup/dmem: implement dmem.high soft limit via prioritized eviction
  2026-05-31  8:45 [PATCH v5] cgroup/dmem: implement dmem.high soft limit via prioritized eviction Qiliang Yuan
  2026-06-04  5:03 ` Claude review: " Claude Code Review Bot
@ 2026-06-04  5:03 ` Claude Code Review Bot
  1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  5:03 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: cgroup/dmem: implement dmem.high soft limit via prioritized eviction
Author: Qiliang Yuan <realwujing@gmail.com>
Patches: 1
Reviewed: 2026-06-04T15:03:19.411766

---

This is a single-patch v5 series adding a `dmem.high` soft limit to the dmem cgroup v2 controller. The concept is sound: when a hard `max` limit is hit and eviction is triggered, BOs from cgroups exceeding their `dmem.high` are targeted first via a new "Pass 1" in `ttm_bo_evict_alloc()`. This provides a useful middle ground between no limit and hard failure.

The patch has gone through significant iteration (v1→v5) addressing NULL-pointer issues, deadlock risks, and control flow problems. The v5 changes are clearly motivated by real bugs found in v4. However, several issues remain:

**Key concerns:**
1. **Missing `elow` check in the `try_high` path** — The try_high block respects `emin` but silently ignores `elow`, meaning BOs protected by `dmem.low` can be evicted during the high-priority pass.
2. **No documentation** — The patch adds a new cgroupfs control file but provides no documentation update (e.g., `Documentation/admin-guide/cgroup-v2.rst` or similar).
3. **The `try_high` path for `limit_pool == test_pool` skips `emin` protection** — When a cgroup hits its own limit and the same pool is being tested, the code only checks `used > high` but does not check against `emin`, unlike the cross-cgroup path.
4. **The 3-pass strategy has a subtle interaction** — Pass 1 uses blocking locks when a ticket is available, but Pass 2 always uses trylock. If Pass 1 fails to evict anything (all over-high BOs were locked and no ticket), and Pass 2 also fails to evict, the code falls through to Pass 3 which is the existing properly-locked path. This seems correct but could benefit from a comment about the fallback chain.

**Minor issues:**
- The cover letter text in the commit message body is duplicative with the commit message itself.
- Typo in the v2 changelog: "above-elow" (line 199).

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Claude review: cgroup/dmem: implement dmem.high soft limit via prioritized eviction
  2026-05-31  8:45 [PATCH v5] cgroup/dmem: implement dmem.high soft limit via prioritized eviction Qiliang Yuan
@ 2026-06-04  5:03 ` Claude Code Review Bot
  2026-06-04  5:03 ` Claude Code Review Bot
  1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  5:03 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

#### kernel/cgroup/dmem.c — `dmem_cgroup_state_evict_valuable()`

**Issue 1 (medium): `limit_pool == test_pool` path in try_high mode skips emin protection**

```c
	if (limit_pool == test_pool) {
		if (try_high) {
			ctest = &test_pool->cnt;
			used = page_counter_read(ctest);
			return used > READ_ONCE(ctest->high);
		}
		return true;
	}
```

When a cgroup hits its own limit and `try_high` is set, this returns true solely based on `used > high` without checking `emin`. In contrast, the cross-cgroup `try_high` path (added later in the function) explicitly calls `dmem_cgroup_calculate_protection()` and checks `used > min`. This inconsistency means that in the self-eviction case during Pass 1, BOs below `emin` protection could be evicted. While `limit_pool == test_pool` implies this is the cgroup evicting its own BOs (so protection may be less relevant since the cgroup is over its own limits), it is still an inconsistency with the later code path and should at minimum be documented as intentional.

**Issue 2 (medium): try_high path does not respect elow**

The try_high cross-cgroup path checks `emin` but not `elow`:

```c
	if (try_high) {
		...
		dmem_cgroup_calculate_protection(limit_pool, test_pool);
		min = READ_ONCE(ctest->emin);

		return used > min;
	}
```

The normal (non-try_high) path checks both `emin` and `elow` (with `ret_hit_low` signaling). The try_high path ignores `elow` entirely. This means during Pass 1, a BO from a cgroup that is over its parent's `dmem.high` but protected by `dmem.low` could be evicted. If this is intentional (i.e., "high overrides low for the penalization pass"), it should be documented. If not, it's a correctness issue.

**Issue 3 (minor): `used` variable read twice unnecessarily**

```c
	ctest = &test_pool->cnt;
	used = page_counter_read(ctest);

	if (try_high) {
		...
		if (used <= READ_ONCE(ctest->high)) {
```

The `used = page_counter_read(ctest)` is hoisted before the `try_high` block. This read was previously placed after `dmem_cgroup_calculate_protection()`. Moving it earlier means the value is read before protection calculation in the non-try_high path. Looking at the diff more carefully, the original code had:

```c
	ctest = &test_pool->cnt;
	dmem_cgroup_calculate_protection(limit_pool, test_pool);
	used = page_counter_read(ctest);
```

The patch changes this to:

```c
	ctest = &test_pool->cnt;
	used = page_counter_read(ctest);

	if (try_high) { ... }

	dmem_cgroup_calculate_protection(limit_pool, test_pool);
	min = READ_ONCE(ctest->emin);
```

This reordering means `used` is now read before `dmem_cgroup_calculate_protection()` in the normal path too. In practice, `page_counter_read()` just reads an atomic counter and `dmem_cgroup_calculate_protection()` only writes `emin`/`elow`, so this reordering is harmless. But it's a subtle behavioral change that's not called out.

#### kernel/cgroup/dmem.c — New accessors and cgroupfs file

The `set_resource_high()`, `get_resource_high()`, `reset_all_resource_limits()` changes, and the new cgroupfs entries all look correct and follow the existing patterns exactly. `get_resource_high()` correctly returns `PAGE_COUNTER_MAX` for `NULL` pool, matching `get_resource_max()`.

The new cftype entry is placed between `low` and `max` in the files array, which is the natural ordering (min < low < high < max). Good.

#### include/linux/cgroup_dmem.h

The stub for `!CONFIG_CGROUP_DMEM` correctly adds the `try_high` parameter and ignores it (always returns `true`). This is correct — when dmem cgroups are disabled, all BOs are evictable.

#### drivers/gpu/drm/ttm/ttm_bo.c — 3-pass eviction strategy

**Issue 4 (low): Pass 1 blocking semantics could still cause latency issues**

```c
	evict_walk.walk.arg.trylock_only = !ticket;
	evict_walk.try_high = true;
	lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
```

When a ticket is available, Pass 1 uses blocking locks. This is intentional (v4 changelog says "as requested by Maarten Lankhorst") to ensure over-limit cgroups are actually penalized. However, this means Pass 1 could block waiting for a BO that ultimately turns out not to be in an over-high cgroup. The `dmem_cgroup_state_evict_valuable()` check happens inside `ttm_bo_evict_cb()`, which is called after the BO lock is acquired. So the blocking lock is acquired before knowing whether this BO is even a candidate for try_high eviction. This is a pre-existing pattern in the TTM LRU walk (the low/min checks also happen after locking), so it's not a new issue introduced by this patch, but it's worth noting for the commit message.

**Issue 5 (low): hit_low not reset between Pass 1 and Pass 2**

```c
	evict_walk.try_high = false;
	if (lret)
		goto out;

	/* Pass 2 */
	evict_walk.walk.arg.trylock_only = true;
	lret = ttm_lru_walk_for_evict(&evict_walk.walk, bdev, man, 1);
	if (!lret && evict_walk.hit_low) {
```

`hit_low` is not reset between Pass 1 and Pass 2. If Pass 1 happened to set `hit_low` (which can't actually happen since the try_high path never sets `*ret_hit_low`), this would cause Pass 2 to immediately try the low fallback. Since the try_high path in `dmem_cgroup_state_evict_valuable()` returns before reaching the `*ret_hit_low = true` line, this is not a bug in practice. But it relies on the invariant that try_high never touches hit_low — fragile if the code evolves.

**Overall comment on the ttm_bo.c changes:** The 3-pass structure is clean and readable. The comments added are appropriate and helpful. The removal of several now-redundant comments from the original code is fine.

#### Commit message

The commit message is well-written and clearly explains the motivation and approach. The below-the-`---` section duplicates much of the above-the-`---` content, which is unusual — typically the section below `---` is for changelog and notes, not a re-summary. The v4→v5 changelog is valuable and clearly describes the fixes.

#### Missing: Documentation

There is no update to any documentation file. The `dmem` cgroup controller should have its `high` knob documented alongside `min`, `low`, and `max`. This is typically expected for new cgroupfs control files.

---

**Summary:** The patch is well-motivated and has clearly improved through the v1→v5 iterations. The main substantive concerns are (1) inconsistent `emin`/`elow` protection in the try_high paths and (2) missing documentation for the new cgroupfs control file. The TTM eviction refactor is clean and correct. I'd suggest addressing the protection consistency issues and adding documentation before merging.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-06-04  5:03 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-31  8:45 [PATCH v5] cgroup/dmem: implement dmem.high soft limit via prioritized eviction Qiliang Yuan
2026-06-04  5:03 ` Claude review: " Claude Code Review Bot
2026-06-04  5:03 ` Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox