From mboxrd@z Thu Jan 1 00:00:00 1970 From: Claude Code Review Bot To: dri-devel-reviews@example.com Subject: Claude review: cgroup/dmem: implement dmem.high soft limit and throttling Date: Mon, 25 May 2026 22:14:45 +1000 Message-ID: In-Reply-To: <20260520-feature-dmem-high-v1-1-97ca0cb7f95a@gmail.com> References: <20260520-feature-dmem-high-v1-1-97ca0cb7f95a@gmail.com> <20260520-feature-dmem-high-v1-1-97ca0cb7f95a@gmail.com> X-Mailer: Claude Code Patch Reviewer Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Patch Review **Critical: Sleeping inside RCU read-side critical section** The `__dmem_cgroup_handle_over_high()` function calls `schedule_timeout_kil= lable()` while holding `rcu_read_lock()`. This is illegal =E2=80=94 RCU rea= d-side critical sections must not sleep, and this will trigger a `BUG: slee= ping function called from invalid context` or similar splat on any kernel w= ith debug options enabled, and cause RCU stalls on production kernels. ```c rcu_read_lock(); list_for_each_entry_rcu(pool, &dmemcs->pools, css_node) { unsigned long usage, high; usage =3D page_counter_read(&pool->cnt); high =3D READ_ONCE(pool->cnt.high); if (usage > high) schedule_timeout_killable(HZ / 10); } rcu_read_unlock(); ``` You need to either collect the over-high state under RCU and sleep after `r= cu_read_unlock()`, or use a different locking strategy. Look at how `mem_cg= roup_handle_over_high` structures its work =E2=80=94 it does not hold RCU a= cross the sleep. **Bug: `get_resource_high` returns 0 for NULL pool** ```c static u64 get_resource_high(struct dmem_cgroup_pool_state *pool) { return pool ? READ_ONCE(pool->cnt.high) : 0; } ``` When a cgroup has no pool for a particular region, the default should be `P= AGE_COUNTER_MAX` (no limit), not `0`. Returning 0 means `dmem.high` will sh= ow `0` for regions where no pool exists, which is semantically wrong. Compa= re with `get_resource_max()` which correctly returns `PAGE_COUNTER_MAX` for= NULL pools. **Design issue: No fast-path guard in the inline wrapper** The memcg version guards the expensive path with a per-task flag: ```c static inline void mem_cgroup_handle_over_high(gfp_t gfp_mask) { if (unlikely(current->memcg_nr_pages_over_high)) __mem_cgroup_handle_over_high(gfp_mask); } ``` This patch's inline wrapper is unconditional: ```c static inline void dmem_cgroup_handle_over_high(void) { __dmem_cgroup_handle_over_high(); } ``` This means **every** return-to-userspace path will call `task_get_css()` + = iterate pools + `css_put()`, even for tasks that have never touched device = memory. This is a hot path =E2=80=94 it should have a cheap early-exit. Con= sider adding a per-task flag (similar to `memcg_nr_pages_over_high`) or at = minimum a static key that is only enabled when any dmem region is registere= d. **Design issue: High limit check does not walk the hierarchy** In `dmem_cgroup_try_charge`, the high check only looks at the leaf pool: ```c if (page_counter_read(&pool->cnt) > READ_ONCE(pool->cnt.high)) set_notify_resume(current); ``` If a parent cgroup sets a high limit but the child does not, the child's `p= ool->cnt.high` remains at `PAGE_COUNTER_MAX` and the parent's limit is neve= r checked. The memcg implementation walks the hierarchy for high limit enfo= rcement. You should walk from the leaf to the root, checking each ancestor'= s `high` against its own `usage`. **Nit: Gratuitous whitespace change** The patch re-aligns `dmem_cgroup_region_max_write` parameter indentation wi= thout any functional change: ```c static ssize_t dmem_cgroup_region_max_write(struct kernfs_open_file *of, - char *buf, size_t nbytes, loff_t off) + char *buf, size_t nbytes, loff_t off) ``` This should be dropped =E2=80=94 unrelated formatting changes in functional= patches add review noise. **Minor: EXPORT_SYMBOL_GPL may be unnecessary** `__dmem_cgroup_handle_over_high` is called only from the inline in the head= er, which is called from `resume_user_mode_work()` in arch entry code =E2= =80=94 always built-in, never a module. The export appears unnecessary. **Missing: No event counter or statistics** The memcg high limit has associated event counters (`MEMCG_HIGH`) that can = be observed via `memory.events`. Without analogous accounting, users have n= o way to observe how often throttling fires, making the feature difficult t= o tune in practice. This isn't a blocker for v1, but worth considering. --- Generated by Claude Code Patch Reviewer