* [PATCH v3] cgroup/dmem: introduce a peak file
@ 2026-05-14 17:36 Thadeu Lima de Souza Cascardo
2026-05-16 0:32 ` Claude review: " Claude Code Review Bot
2026-05-16 0:32 ` Claude Code Review Bot
0 siblings, 2 replies; 7+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2026-05-14 17:36 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
Jonathan Corbet, Shuah Khan, Maarten Lankhorst, Maxime Ripard,
Natalie Vock, Tvrtko Ursulin
Cc: cgroups, linux-kernel, linux-mm, linux-doc, dri-devel, kernel-dev,
Thadeu Lima de Souza Cascardo
Just like we have memory.peak, introduce a dmem.peak, which uses the
page_counter support for that.
For now, make it read-only.
This allows for memory usage monitoring without polling dmem.current when
the information needed is the maximum device memory used. That can be used
for capacity planning, such that dmem.max can be properly setup for a given
workload. It can also be used for debugging to determine whether a given
workload would have caused eviction or system memory use.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
Changes in v3:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v2: https://patch.msgid.link/20260513-dmem_peak-v2-1-dac06999db9e@igalia.com
Changes in v2:
- Make it read-only for now and adjust documentation accordingly.
- Link to v1: https://patch.msgid.link/20260506-dmem_peak-v1-0-8d803eb3449c@igalia.com
---
Documentation/admin-guide/cgroup-v2.rst | 6 ++++++
kernel/cgroup/dmem.c | 15 +++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 6efd0095ed99..d103623b2be4 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2808,6 +2808,12 @@ DMEM Interface Files
The semantics are the same as for the memory cgroup controller, and are
calculated in the same way.
+ dmem.peak
+ A read-only nested-keyed file that exists on non-root cgroups.
+
+ The max device memory usage recorded for the cgroup and its
+ descendants since the creation of the cgroup for each region.
+
dmem.capacity
A read-only file that describes maximum region capacity.
It only exists on the root cgroup. Not all memory can be
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 4753a67d0f0f..6430c7ce1e03 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -182,6 +182,11 @@ static u64 get_resource_current(struct dmem_cgroup_pool_state *pool)
return pool ? page_counter_read(&pool->cnt) : 0;
}
+static u64 get_resource_peak(struct dmem_cgroup_pool_state *pool)
+{
+ return pool ? READ_ONCE(pool->cnt.watermark) : 0;
+}
+
static void reset_all_resource_limits(struct dmem_cgroup_pool_state *rpool)
{
set_resource_min(rpool, 0);
@@ -808,6 +813,11 @@ static int dmemcg_limit_show(struct seq_file *sf, void *v,
return 0;
}
+static int dmem_cgroup_region_peak_show(struct seq_file *sf, void *v)
+{
+ return dmemcg_limit_show(sf, v, get_resource_peak);
+}
+
static int dmem_cgroup_region_current_show(struct seq_file *sf, void *v)
{
return dmemcg_limit_show(sf, v, get_resource_current);
@@ -856,6 +866,11 @@ static struct cftype files[] = {
.name = "current",
.seq_show = dmem_cgroup_region_current_show,
},
+ {
+ .name = "peak",
+ .seq_show = dmem_cgroup_region_peak_show,
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
{
.name = "min",
.write = dmem_cgroup_region_min_write,
---
base-commit: d3b0a7f21119f5a66cb76aa28fb8cc13206aaf7d
change-id: 20260409-dmem_peak-3abc1be95072
Best regards,
--
Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
^ permalink raw reply related [flat|nested] 7+ messages in thread* Claude review: cgroup/dmem: introduce a peak file
2026-05-14 17:36 [PATCH v3] cgroup/dmem: introduce a peak file Thadeu Lima de Souza Cascardo
@ 2026-05-16 0:32 ` Claude Code Review Bot
2026-05-16 0:32 ` Claude Code Review Bot
1 sibling, 0 replies; 7+ messages in thread
From: Claude Code Review Bot @ 2026-05-16 0:32 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Correctness: Good**
The core implementation is sound:
```c
static u64 get_resource_peak(struct dmem_cgroup_pool_state *pool)
{
return pool ? READ_ONCE(pool->cnt.watermark) : 0;
}
```
This correctly handles the NULL pool case (returns 0, matching the `get_resource_current` pattern) and uses `READ_ONCE` to read the racy watermark, which is the established pattern in `mm/page_counter.c` (line 103: `if (new > READ_ONCE(c->watermark))`).
The watermark is updated hierarchically in `page_counter_try_charge()` and `page_counter_charge()`, walking the parent chain, so the "cgroup and its descendants" claim in the documentation is correct — charging to a leaf updates the watermark on all ancestors.
**Flag choice: Correct**
```c
{
.name = "peak",
.seq_show = dmem_cgroup_region_peak_show,
.flags = CFTYPE_NOT_ON_ROOT,
},
```
Using `CFTYPE_NOT_ON_ROOT` is consistent with `min`, `low`, and `max` in this controller. The root cgroup's peak is not meaningful (it would just track total system device memory usage, which `capacity` already covers from a different angle).
**Nit — v3 changelog has EDITME placeholders:**
```
Changes in v3:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
```
This is a b4 template artifact that wasn't filled in. It doesn't affect the code, but it's sloppy and should be fixed before the patch is applied. The maintainer will likely ask for a resend.
**Nit — Documentation placement:**
The `dmem.peak` documentation is placed between `dmem.current` and `dmem.capacity`, which is a good logical grouping (current/peak are related usage metrics). The text is clear and mirrors the `memory.peak` description style.
**Observation — No selftest:**
The patch doesn't include a selftest. While there's no existing dmem selftest infrastructure to extend, a simple test (create cgroup, allocate device memory, read peak, verify peak >= current) would be valuable, especially since peak tracking is the kind of feature that could silently regress.
**Observation — Future write-to-reset:**
The commit message mentions "for now, make it read-only." When write-to-reset is added later, it will likely need to use `local_watermark` (like `memory.peak` does for per-FD resets via `page_counter_reset_watermark()`). The current implementation reading `watermark` (the global one) is correct for a read-only peak file, so no changes are needed now.
**Overall: The patch is a clean, minimal, and correct addition. Fix the EDITME changelog and consider adding a selftest, otherwise this is ready.**
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 7+ messages in thread* Claude review: cgroup/dmem: introduce a peak file
2026-05-14 17:36 [PATCH v3] cgroup/dmem: introduce a peak file Thadeu Lima de Souza Cascardo
2026-05-16 0:32 ` Claude review: " Claude Code Review Bot
@ 2026-05-16 0:32 ` Claude Code Review Bot
1 sibling, 0 replies; 7+ messages in thread
From: Claude Code Review Bot @ 2026-05-16 0:32 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: cgroup/dmem: introduce a peak file
Author: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Patches: 1
Reviewed: 2026-05-16T10:32:53.178730
---
This is a single-patch series (v3) that adds a read-only `dmem.peak` file to the dmem cgroup controller, mirroring the existing `memory.peak` concept. The implementation is clean and minimal — it reuses the existing `page_counter` watermark infrastructure and the `dmemcg_limit_show` callback pattern already established for `dmem.current`.
The patch is **correct and safe to merge** with minor nits. The watermark is already maintained hierarchically by `page_counter_try_charge()` / `page_counter_charge()`, so no additional charge-path changes are needed. The `READ_ONCE` usage is appropriate and matches how the memory controller reads the watermark.
**Issues worth noting:**
1. **Incomplete changelog**: The v3 changelog still contains `EDITME` placeholders — this should be fixed before merge, though it's metadata only.
2. **No selftest coverage**: There's no selftest for this new file, and no existing dmem selftest infrastructure was found. A basic test would strengthen the patch.
3. **Read-only vs. read-write**: The commit message says "for now, make it read-only." The memory controller's `memory.peak` is read-write (writing resets the peak per-FD). This is fine as stated — reset support can come later — but the documentation should probably mention this difference more explicitly or note that reset support is planned.
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v2] cgroup/dmem: introduce a peak file
@ 2026-05-13 18:58 Thadeu Lima de Souza Cascardo
2026-05-16 1:20 ` Claude review: " Claude Code Review Bot
2026-05-16 1:20 ` Claude Code Review Bot
0 siblings, 2 replies; 7+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2026-05-13 18:58 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
Jonathan Corbet, Shuah Khan, Maarten Lankhorst, Maxime Ripard,
Natalie Vock, Tvrtko Ursulin
Cc: cgroups, linux-kernel, linux-mm, linux-doc, dri-devel, kernel-dev,
Thadeu Lima de Souza Cascardo
Just like we have memory.peak, introduce a dmem.peak, which uses the
page_counter support for that.
For now, make it read-only.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
Changes in v2:
- Make it read-only for now and adjust documentation accordingly.
- Link to v1: https://patch.msgid.link/20260506-dmem_peak-v1-0-8d803eb3449c@igalia.com
---
Documentation/admin-guide/cgroup-v2.rst | 6 ++++++
kernel/cgroup/dmem.c | 15 +++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 6efd0095ed99..d103623b2be4 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2808,6 +2808,12 @@ DMEM Interface Files
The semantics are the same as for the memory cgroup controller, and are
calculated in the same way.
+ dmem.peak
+ A read-only nested-keyed file that exists on non-root cgroups.
+
+ The max device memory usage recorded for the cgroup and its
+ descendants since the creation of the cgroup for each region.
+
dmem.capacity
A read-only file that describes maximum region capacity.
It only exists on the root cgroup. Not all memory can be
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 4753a67d0f0f..6430c7ce1e03 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -182,6 +182,11 @@ static u64 get_resource_current(struct dmem_cgroup_pool_state *pool)
return pool ? page_counter_read(&pool->cnt) : 0;
}
+static u64 get_resource_peak(struct dmem_cgroup_pool_state *pool)
+{
+ return pool ? READ_ONCE(pool->cnt.watermark) : 0;
+}
+
static void reset_all_resource_limits(struct dmem_cgroup_pool_state *rpool)
{
set_resource_min(rpool, 0);
@@ -808,6 +813,11 @@ static int dmemcg_limit_show(struct seq_file *sf, void *v,
return 0;
}
+static int dmem_cgroup_region_peak_show(struct seq_file *sf, void *v)
+{
+ return dmemcg_limit_show(sf, v, get_resource_peak);
+}
+
static int dmem_cgroup_region_current_show(struct seq_file *sf, void *v)
{
return dmemcg_limit_show(sf, v, get_resource_current);
@@ -856,6 +866,11 @@ static struct cftype files[] = {
.name = "current",
.seq_show = dmem_cgroup_region_current_show,
},
+ {
+ .name = "peak",
+ .seq_show = dmem_cgroup_region_peak_show,
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
{
.name = "min",
.write = dmem_cgroup_region_min_write,
---
base-commit: d3b0a7f21119f5a66cb76aa28fb8cc13206aaf7d
change-id: 20260409-dmem_peak-3abc1be95072
Best regards,
--
Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
^ permalink raw reply related [flat|nested] 7+ messages in thread* Claude review: cgroup/dmem: introduce a peak file
2026-05-13 18:58 [PATCH v2] " Thadeu Lima de Souza Cascardo
@ 2026-05-16 1:20 ` Claude Code Review Bot
2026-05-16 1:20 ` Claude Code Review Bot
1 sibling, 0 replies; 7+ messages in thread
From: Claude Code Review Bot @ 2026-05-16 1:20 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Approach:** The patch adds a `get_resource_peak()` getter that reads `pool->cnt.watermark`, a `dmem_cgroup_region_peak_show()` display function, and a cftype entry — all following the exact same pattern as `get_resource_current()` / `dmem_cgroup_region_current_show()`.
**Positive:**
1. The `READ_ONCE` usage in `get_resource_peak()` is correct and consistent with how `page_counter_charge()` in `mm/page_counter.c` uses `WRITE_ONCE` to update the watermark. This is the same deliberate-racy-but-acceptable pattern documented in the page counter code.
2. The null check (`pool ? ... : 0`) mirrors `get_resource_current()` exactly.
3. `CFTYPE_NOT_ON_ROOT` is the correct flag — this matches `memory.peak` in the memory controller and is consistent with how `dmem.min`, `dmem.low`, and `dmem.max` are configured.
**Observations:**
1. **`current` lacks `CFTYPE_NOT_ON_ROOT` but `peak` has it.** Looking at the existing tree, `dmem.current` does *not* set `CFTYPE_NOT_ON_ROOT`, so it appears on root. The new `dmem.peak` does set `CFTYPE_NOT_ON_ROOT`, so it won't appear on root. This is consistent with how `memory.current` (no flag) and `memory.peak` (has `CFTYPE_NOT_ON_ROOT`) work in `mm/memcontrol.c`, so the asymmetry is intentional and correct.
2. **No write/reset support.** The commit message and documentation say "read-only for now," which is fine for an initial version. The `memory.peak` controller has grown `peak_open`/`peak_release`/`memory_peak_write` for per-fd local watermark tracking and reset support. If that's planned as a follow-up, it might be worth mentioning in the commit message for future reference, but this is a nit — the v1→v2 changelog already documents the deliberate decision to start read-only.
3. **Documentation placement is correct.** The `dmem.peak` entry is inserted between `dmem.max/min/low` and `dmem.capacity`, which is a natural location grouping the per-cgroup files together before the root-only `dmem.capacity`.
4. **Watermark initialization.** The `page_counter_init()` function zeroes `usage` and sets `max` but does not explicitly initialize `watermark` or `local_watermark`. Since `page_counter` structs in `dmem_cgroup_pool_state` are allocated with `kzalloc` (which zero-initializes), the watermark starts at 0, which is correct for a "max seen so far" semantic. This is fine but worth noting for anyone reading the patch.
**No functional issues found.** The patch is a straightforward, correct extension of the existing dmem cgroup interface.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 7+ messages in thread
* Claude review: cgroup/dmem: introduce a peak file
2026-05-13 18:58 [PATCH v2] " Thadeu Lima de Souza Cascardo
2026-05-16 1:20 ` Claude review: " Claude Code Review Bot
@ 2026-05-16 1:20 ` Claude Code Review Bot
1 sibling, 0 replies; 7+ messages in thread
From: Claude Code Review Bot @ 2026-05-16 1:20 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: cgroup/dmem: introduce a peak file
Author: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Patches: 2
Reviewed: 2026-05-16T11:20:49.360654
---
This is a clean, well-scoped single patch that adds a `dmem.peak` cgroup file to the device memory (dmem) cgroup controller, mirroring the existing `memory.peak` pattern. The implementation correctly reuses the existing `dmemcg_limit_show` callback infrastructure and the `page_counter` watermark tracking that is already in place. The patch is minimal and follows established patterns in the codebase.
**Verdict: Looks good overall.** A couple of minor observations below but nothing blocking.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 0/2] cgroup/dmem: introduce a peak file
@ 2026-05-06 11:58 Thadeu Lima de Souza Cascardo
2026-05-06 11:58 ` [PATCH 2/2] " Thadeu Lima de Souza Cascardo
2026-05-07 3:37 ` Claude Code Review Bot
0 siblings, 2 replies; 7+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2026-05-06 11:58 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
Jonathan Corbet, Shuah Khan, Maarten Lankhorst, Maxime Ripard,
Natalie Vock, Tvrtko Ursulin
Cc: cgroups, linux-kernel, linux-mm, linux-doc, dri-devel,
Thadeu Lima de Souza Cascardo, kernel-dev
Just like we have memory.peak, introduce a dmem.peak, which uses the
page_counter support for that.
It can be written to in order to reset the peak, but different from
memory.peak, which expects any write, dmem.peak expects the region name to
be written to it. That region peak is the one that is reset.
That requires ofp_peak to carry a pointer to the pool that was reset.
Writing a different region name will reset the different region and make
the original region peak get back to its non-reset value.
While at it, we reuse a helper from memcontrol, which we moved to
kernel/cgroup/cgroup.c.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
Thadeu Lima de Souza Cascardo (2):
mm/page_counter: decouple peak_reset from peak_write
cgroup/dmem: introduce a peak file
Documentation/admin-guide/cgroup-v2.rst | 10 +++
include/linux/cgroup-defs.h | 7 ++
kernel/cgroup/cgroup.c | 32 ++++++++
kernel/cgroup/dmem.c | 132 ++++++++++++++++++++++++++++++--
mm/memcontrol.c | 42 ++--------
5 files changed, 183 insertions(+), 40 deletions(-)
---
base-commit: 7fd2df204f342fc17d1a0bfcd474b24232fb0f32
change-id: 20260409-dmem_peak-3abc1be95072
Best regards,
--
Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH 2/2] cgroup/dmem: introduce a peak file
2026-05-06 11:58 [PATCH 0/2] " Thadeu Lima de Souza Cascardo
@ 2026-05-06 11:58 ` Thadeu Lima de Souza Cascardo
2026-05-07 3:37 ` Claude review: " Claude Code Review Bot
2026-05-07 3:37 ` Claude Code Review Bot
1 sibling, 1 reply; 7+ messages in thread
From: Thadeu Lima de Souza Cascardo @ 2026-05-06 11:58 UTC (permalink / raw)
To: Tejun Heo, Johannes Weiner, Michal Koutný, Michal Hocko,
Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
Jonathan Corbet, Shuah Khan, Maarten Lankhorst, Maxime Ripard,
Natalie Vock, Tvrtko Ursulin
Cc: cgroups, linux-kernel, linux-mm, linux-doc, dri-devel,
Thadeu Lima de Souza Cascardo, kernel-dev
Just like we have memory.peak, introduce a dmem.peak, which uses the
page_counter support for that.
It can be written to in order to reset the peak, but different from
memory.peak, which expects any write, dmem.peak expects the region name to
be written to it. That region peak is the one that is reset.
That requires ofp_peak to carry a pointer to the pool that was reset.
Writing a different region name will reset the different region and make
the original region peak get back to its non-reset value.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
---
Documentation/admin-guide/cgroup-v2.rst | 10 +++
include/linux/cgroup-defs.h | 1 +
kernel/cgroup/dmem.c | 132 ++++++++++++++++++++++++++++++--
3 files changed, 137 insertions(+), 6 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 6efd0095ed99..3ba7ab3a36b3 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2808,6 +2808,16 @@ DMEM Interface Files
The semantics are the same as for the memory cgroup controller, and are
calculated in the same way.
+ dmem.peak
+ A readwrite nested-keyed file that exists on non-root cgroups.
+
+ The max memory usage recorded for the cgroup and its descendants since
+ either the creation of the cgroup or the most recent reset for that FD.
+
+ A write of a region name to this file resets it to the current memory
+ usage for subsequent reads through the same file descriptor for that
+ region.
+
dmem.capacity
A read-only file that describes maximum region capacity.
It only exists on the root cgroup. Not all memory can be
diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index a85044cb0553..b536054bd916 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -874,6 +874,7 @@ extern bool cgroup_enable_per_threadgroup_rwsem;
struct cgroup_of_peak {
unsigned long value;
struct list_head list;
+ struct dmem_cgroup_pool_state *pool;
};
void of_peak_reset(struct cgroup_of_peak *ofp, struct page_counter *pc,
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 1ab1fb47f271..afa380c9839b 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -57,6 +57,9 @@ struct dmemcg_state {
struct cgroup_subsys_state css;
struct list_head pools;
+
+ /** @peaks_lock: Protects access to the pools' peaks lists */
+ spinlock_t peaks_lock;
};
struct dmem_cgroup_pool_state {
@@ -72,6 +75,10 @@ struct dmem_cgroup_pool_state {
struct rcu_head rcu;
struct page_counter cnt;
+
+ /* Protected by the dmemcg_state peaks_lock */
+ struct list_head peaks;
+
struct dmem_cgroup_pool_state *parent;
refcount_t ref;
@@ -162,26 +169,45 @@ set_resource_max(struct dmem_cgroup_pool_state *pool, u64 val)
page_counter_set_max(&pool->cnt, val);
}
-static u64 get_resource_low(struct dmem_cgroup_pool_state *pool)
+static u64 get_resource_low(struct seq_file *sf, struct dmem_cgroup_pool_state *pool)
{
return pool ? READ_ONCE(pool->cnt.low) : 0;
}
-static u64 get_resource_min(struct dmem_cgroup_pool_state *pool)
+static u64 get_resource_min(struct seq_file *sf, struct dmem_cgroup_pool_state *pool)
{
return pool ? READ_ONCE(pool->cnt.min) : 0;
}
-static u64 get_resource_max(struct dmem_cgroup_pool_state *pool)
+static u64 get_resource_max(struct seq_file *sf, struct dmem_cgroup_pool_state *pool)
{
return pool ? READ_ONCE(pool->cnt.max) : PAGE_COUNTER_MAX;
}
-static u64 get_resource_current(struct dmem_cgroup_pool_state *pool)
+static u64 get_resource_current(struct seq_file *sf, struct dmem_cgroup_pool_state *pool)
{
return pool ? page_counter_read(&pool->cnt) : 0;
}
+static u64 get_resource_peak(struct seq_file *sf, struct dmem_cgroup_pool_state *pool)
+{
+ struct cgroup_of_peak *ofp = of_peak(sf->private);
+ u64 fd_peak, peak;
+ struct dmem_cgroup_pool_state *of_pool;
+
+ if (!pool)
+ return 0;
+
+ of_pool = READ_ONCE(ofp->pool);
+
+ fd_peak = READ_ONCE(ofp->value);
+ if (of_pool != pool || fd_peak == OFP_PEAK_UNSET)
+ peak = pool->cnt.watermark;
+ else
+ peak = max(fd_peak, READ_ONCE(pool->cnt.local_watermark));
+ return peak;
+}
+
static void reset_all_resource_limits(struct dmem_cgroup_pool_state *rpool)
{
set_resource_min(rpool, 0);
@@ -227,6 +253,7 @@ dmemcs_alloc(struct cgroup_subsys_state *parent_css)
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&dmemcs->pools);
+ spin_lock_init(&dmemcs->peaks_lock);
return &dmemcs->css;
}
@@ -377,6 +404,7 @@ alloc_pool_single(struct dmemcg_state *dmemcs, struct dmem_cgroup_region *region
ppool ? &ppool->cnt : NULL, true);
reset_all_resource_limits(pool);
refcount_set(&pool->ref, 1);
+ INIT_LIST_HEAD(&pool->peaks);
kref_get(®ion->ref);
if (ppool && !pool->parent) {
pool->parent = ppool;
@@ -784,7 +812,7 @@ static ssize_t dmemcg_limit_write(struct kernfs_open_file *of,
}
static int dmemcg_limit_show(struct seq_file *sf, void *v,
- u64 (*fn)(struct dmem_cgroup_pool_state *))
+ u64 (*fn)(struct seq_file *, struct dmem_cgroup_pool_state *))
{
struct dmemcg_state *dmemcs = css_to_dmemcs(seq_css(sf));
struct dmem_cgroup_region *region;
@@ -796,7 +824,7 @@ static int dmemcg_limit_show(struct seq_file *sf, void *v,
seq_puts(sf, region->name);
- val = fn(pool);
+ val = fn(sf, pool);
if (val < PAGE_COUNTER_MAX)
seq_printf(sf, " %lld\n", val);
else
@@ -807,6 +835,90 @@ static int dmemcg_limit_show(struct seq_file *sf, void *v,
return 0;
}
+static int dmem_cgroup_region_peak_open(struct kernfs_open_file *of)
+{
+ struct cgroup_of_peak *ofp = of_peak(of);
+
+ ofp->value = OFP_PEAK_UNSET;
+
+ return 0;
+}
+
+static void dmem_cgroup_region_peak_remove(struct cgroup_of_peak *ofp)
+{
+ struct dmem_cgroup_pool_state *pool;
+ struct dmemcg_state *dmemcs;
+
+ pool = xchg(&ofp->pool, NULL);
+ if (!pool)
+ return;
+
+ dmemcs = pool->cs;
+
+ spin_lock(&dmemcs->peaks_lock);
+ list_del(&ofp->list);
+ spin_unlock(&dmemcs->peaks_lock);
+
+ WRITE_ONCE(ofp->value, OFP_PEAK_UNSET);
+
+ dmemcg_pool_put(pool);
+}
+
+static void dmem_cgroup_region_peak_release(struct kernfs_open_file *of)
+{
+ struct cgroup_of_peak *ofp = of_peak(of);
+
+ if (ofp->value == OFP_PEAK_UNSET) {
+ /* fast path (no writes on this fd) */
+ return;
+ }
+
+ dmem_cgroup_region_peak_remove(ofp);
+}
+
+static ssize_t dmem_cgroup_region_peak_write(struct kernfs_open_file *of,
+ char *buf, size_t nbytes, loff_t off)
+{
+ struct dmemcg_state *dmemcs = css_to_dmemcs(of_css(of));
+ struct cgroup_of_peak *ofp = of_peak(of);
+ struct dmem_cgroup_pool_state *pool = NULL;
+ struct dmem_cgroup_region *region;
+ int err = 0;
+
+ buf = strstrip(buf);
+ if (!buf[0])
+ return -EINVAL;
+
+ rcu_read_lock();
+ region = dmemcg_get_region_by_name(buf);
+ rcu_read_unlock();
+
+ if (!region)
+ return -EINVAL;
+
+ pool = get_cg_pool_unlocked(dmemcs, region);
+ if (IS_ERR(pool)) {
+ err = PTR_ERR(pool);
+ goto out_put;
+ }
+
+ dmem_cgroup_region_peak_remove(ofp);
+
+ xchg(&ofp->pool, pool);
+ spin_lock(&dmemcs->peaks_lock);
+ of_peak_reset(ofp, &pool->cnt, &pool->peaks);
+ spin_unlock(&dmemcs->peaks_lock);
+
+out_put:
+ kref_put(®ion->ref, dmemcg_free_region);
+ return err ?: nbytes;
+}
+
+static int dmem_cgroup_region_peak_show(struct seq_file *sf, void *v)
+{
+ return dmemcg_limit_show(sf, v, get_resource_peak);
+}
+
static int dmem_cgroup_region_current_show(struct seq_file *sf, void *v)
{
return dmemcg_limit_show(sf, v, get_resource_current);
@@ -855,6 +967,14 @@ static struct cftype files[] = {
.name = "current",
.seq_show = dmem_cgroup_region_current_show,
},
+ {
+ .name = "peak",
+ .open = dmem_cgroup_region_peak_open,
+ .release = dmem_cgroup_region_peak_release,
+ .write = dmem_cgroup_region_peak_write,
+ .seq_show = dmem_cgroup_region_peak_show,
+ .flags = CFTYPE_NOT_ON_ROOT,
+ },
{
.name = "min",
.write = dmem_cgroup_region_min_write,
--
2.47.3
^ permalink raw reply related [flat|nested] 7+ messages in thread* Claude review: cgroup/dmem: introduce a peak file
2026-05-06 11:58 ` [PATCH 2/2] " Thadeu Lima de Souza Cascardo
@ 2026-05-07 3:37 ` Claude Code Review Bot
0 siblings, 0 replies; 7+ messages in thread
From: Claude Code Review Bot @ 2026-05-07 3:37 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
This is the main patch, adding `dmem.peak` file support with per-region, per-fd peak tracking.
**Correctness: Good.** The peak tracking semantics are correct:
- Global watermark (`pool->cnt.watermark`) is reported for regions the fd has not reset.
- Per-fd peak (`max(ofp->value, pool->cnt.local_watermark)`) is reported for the region the fd has reset.
- Writing a different region name correctly "detaches" from the old region (reverting to global watermark) and "attaches" to the new one.
**Reference counting is correct:** `get_cg_pool_unlocked()` acquires a pool ref that is transferred to `ofp->pool`. `dmem_cgroup_region_peak_remove()` drops it. The region ref from `dmemcg_get_region_by_name()` is dropped at `out_put`. The release path handles both the fast path (no writes) and the watcher-cleanup path.
**Design concerns:**
1. **`pool` pointer in generic `cgroup_of_peak` struct:**
```c
struct cgroup_of_peak {
unsigned long value;
struct list_head list;
+ struct dmem_cgroup_pool_state *pool;
};
```
This adds a DRM-subsystem-specific pointer to a struct used by the generic cgroup infrastructure and the memory controller. Every `memory.peak` open file now carries an unused `struct dmem_cgroup_pool_state *pool`. While harmless (it's zero-initialized via `kzalloc` of the containing `cgroup_file_ctx`), this is a layering violation that cgroup/mm maintainers may object to. Consider using a wrapper struct in dmem.c or adding a `void *private` to `cgroup_of_peak` instead.
2. **Unused `struct seq_file *sf` parameter added to four functions:**
```c
-static u64 get_resource_low(struct dmem_cgroup_pool_state *pool)
+static u64 get_resource_low(struct seq_file *sf, struct dmem_cgroup_pool_state *pool)
```
`get_resource_low()`, `get_resource_min()`, `get_resource_max()`, and `get_resource_current()` all gain an `sf` parameter they never use, solely to match the callback signature needed by `get_resource_peak()`. This is a code smell. An alternative would be to have the show function pass a context struct or to handle the peak case with a separate callback type. At minimum, marking the parameter `__always_unused` would signal the intent.
3. **`get_resource_peak()` reads `ofp` without any lock:**
```c
static u64 get_resource_peak(struct seq_file *sf, struct dmem_cgroup_pool_state *pool)
{
struct cgroup_of_peak *ofp = of_peak(sf->private);
...
of_pool = READ_ONCE(ofp->pool);
fd_peak = READ_ONCE(ofp->value);
if (of_pool != pool || fd_peak == OFP_PEAK_UNSET)
peak = pool->cnt.watermark;
else
peak = max(fd_peak, READ_ONCE(pool->cnt.local_watermark));
```
This is intentionally lockless for the read path, relying on `READ_ONCE` for individual fields. There is a window where `of_pool` could point to a freed pool (between `dmem_cgroup_region_peak_remove()` setting it to NULL and a concurrent read having already loaded the old value). However, since the pointer is only used for an equality comparison against the valid `pool` from the iterator — never dereferenced — this is safe. The fallback to `pool->cnt.watermark` on mismatch makes this robust. This is fine but worth a brief comment in the code explaining why the lockless read is safe.
4. **The `dmem_cgroup_region_peak_remove()` function does not check `ofp->value == OFP_PEAK_UNSET`:**
```c
static void dmem_cgroup_region_peak_remove(struct cgroup_of_peak *ofp)
{
struct dmem_cgroup_pool_state *pool;
...
pool = xchg(&ofp->pool, NULL);
if (!pool)
return;
...
```
It relies on `ofp->pool == NULL` for the "no prior write" case, which is correct since `pool` is only set in the write path. But `dmem_cgroup_region_peak_release()` additionally checks `ofp->value == OFP_PEAK_UNSET` as a fast path before calling `remove`. This is slightly redundant but harmless — the two checks are equivalent for the no-write case.
5. **Documentation is good:**
```
+ dmem.peak
+ A readwrite nested-keyed file that exists on non-root cgroups.
+ ...
```
Clear and follows the existing documentation style. The semantics are well-described.
6. **Initialization in `alloc_pool_single()`:**
```c
+ INIT_LIST_HEAD(&pool->peaks);
```
Correctly initializes the peaks watcher list for new pools. The `spin_lock_init(&dmemcs->peaks_lock)` in `dmemcs_alloc()` is also correct.
**One question for the author:** When multiple fds reset the same region, the `local_watermark` is lowered to current usage on each reset. This means fd1's peak could temporarily appear lower than its actual peak-since-reset if fd2 resets afterward at a lower usage point — until new allocations push `local_watermark` back up. This is the same behavior as `memory.peak` so it's consistent, but worth confirming that this is the intended semantic for dmem as well, since device memory allocation patterns may differ from system memory.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 7+ messages in thread
* Claude review: cgroup/dmem: introduce a peak file
2026-05-06 11:58 [PATCH 0/2] " Thadeu Lima de Souza Cascardo
2026-05-06 11:58 ` [PATCH 2/2] " Thadeu Lima de Souza Cascardo
@ 2026-05-07 3:37 ` Claude Code Review Bot
1 sibling, 0 replies; 7+ messages in thread
From: Claude Code Review Bot @ 2026-05-07 3:37 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: cgroup/dmem: introduce a peak file
Author: Thadeu Lima de Souza Cascardo <cascardo@igalia.com>
Patches: 6
Reviewed: 2026-05-07T13:37:55.340313
---
This 2-patch series adds a `dmem.peak` cgroup file, mirroring the existing `memory.peak` functionality for the DRM device-memory cgroup controller. Patch 1 extracts the peak-reset logic from memcontrol into a shared helper; Patch 2 uses it to implement per-region peak tracking in dmem.
**The overall approach is sound and the core logic is correct.** The reference counting is carefully handled, concurrent read/write races are safe due to appropriate use of `READ_ONCE`/`WRITE_ONCE` and fallback to global watermark, and the per-fd peak semantics correctly mirror the existing memory.peak behavior.
There are a few design concerns worth discussing, mainly around the addition of a dmem-specific `pool` pointer into the generic `cgroup_of_peak` struct, and the unused `seq_file *` parameter added to four getter functions. Neither is a correctness issue, but both are layering concerns that reviewers from the cgroup/mm side may push back on.
No bugs found. The series is in good shape for a v1.
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2026-05-16 1:20 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-14 17:36 [PATCH v3] cgroup/dmem: introduce a peak file Thadeu Lima de Souza Cascardo
2026-05-16 0:32 ` Claude review: " Claude Code Review Bot
2026-05-16 0:32 ` Claude Code Review Bot
-- strict thread matches above, loose matches on Subject: below --
2026-05-13 18:58 [PATCH v2] " Thadeu Lima de Souza Cascardo
2026-05-16 1:20 ` Claude review: " Claude Code Review Bot
2026-05-16 1:20 ` Claude Code Review Bot
2026-05-06 11:58 [PATCH 0/2] " Thadeu Lima de Souza Cascardo
2026-05-06 11:58 ` [PATCH 2/2] " Thadeu Lima de Souza Cascardo
2026-05-07 3:37 ` Claude review: " Claude Code Review Bot
2026-05-07 3:37 ` Claude Code Review Bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox