[PATCH] cgroup/dmem: return -ENOMEM on failed pool preallocation

public inbox for drm-ai-reviews@public-inbox.freedesktop.org

* [PATCH] cgroup/dmem: return -ENOMEM on failed pool preallocation
@ 2026-05-11  1:31 Guopeng Zhang
  2026-05-11  1:46 ` Tejun Heo
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Guopeng Zhang @ 2026-05-11  1:31 UTC (permalink / raw)
  To: Maarten Lankhorst, Maxime Ripard, Natalie Vock, Tejun Heo
  Cc: Johannes Weiner, Michal Koutný, cgroups, dri-devel,
	linux-kernel, Guopeng Zhang

get_cg_pool_unlocked() handles allocation failures under dmemcg_lock by
dropping the lock, preallocating a pool with GFP_KERNEL, and retrying the
locked lookup and creation path.

If the fallback allocation fails too, pool remains NULL. Since the loop
condition is while (!pool), the function can keep retrying instead of
propagating the allocation failure to the caller.

Set pool to ERR_PTR(-ENOMEM) when the fallback allocation fails so the
loop exits through the existing common return path. The callers already
handle ERR_PTR() from get_cg_pool_unlocked(), so this restores the
expected error path.

Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup")
Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
---
 kernel/cgroup/dmem.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 1ab1fb47f271..4753a67d0f0f 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -602,6 +602,7 @@ get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
 				pool = NULL;
 				continue;
 			}
+			pool = ERR_PTR(-ENOMEM);
 		}
 	}

-- 
2.43.0


* Re: [PATCH] cgroup/dmem: return -ENOMEM on failed pool preallocation
  2026-05-11  1:31 [PATCH] cgroup/dmem: return -ENOMEM on failed pool preallocation Guopeng Zhang
@ 2026-05-11  1:46 ` Tejun Heo
  2026-05-16  5:56   ` Claude review: " Claude Code Review Bot
  2026-05-16  5:56   ` Claude Code Review Bot
  2026-05-11 13:03 ` Michal Koutný
       [not found] ` <1778555389652035.138.seg@mailgw.kylinos.cn>
  2 siblings, 2 replies; 6+ messages in thread
From: Tejun Heo @ 2026-05-11  1:46 UTC (permalink / raw)
  To: Guopeng Zhang

Hello,

Applied to cgroup/for-7.1-fixes with Cc: stable@vger.kernel.org # v6.14+
added, as the bug is in released kernels. Also capitalized the subject.

Thanks.

--
tejun


* Claude review: Re: [PATCH] cgroup/dmem: return -ENOMEM on failed pool preallocation
  2026-05-11  1:46 ` Tejun Heo
@ 2026-05-16  5:56   ` Claude Code Review Bot
  2026-05-16  5:56   ` Claude Code Review Bot
  1 sibling, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-05-16  5:56 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Correctness: Correct.**

Looking at the function flow (lines 585–606 in the current tree):

```c
while (!pool) {
    spin_lock(&dmemcg_lock);
    if (!region->unregistered)
        pool = get_cg_pool_locked(cg, region, &allocpool);
    else
        pool = ERR_PTR(-ENODEV);
    ...
    spin_unlock(&dmemcg_lock);

    if (pool == ERR_PTR(-ENOMEM)) {
        pool = NULL;
        if (WARN_ON(allocpool))
            continue;

        allocpool = kzalloc_obj(*allocpool);
        if (allocpool) {
            pool = NULL;
            continue;
        }
        // <--- BUG: pool is NULL here, loop retries forever
    }
}
```

When `kzalloc_obj()` returns NULL (allocation failure), execution falls through the `if (allocpool)` block without setting `pool`. Since `pool` was set to NULL at line 596 (`pool = NULL`), the `while (!pool)` condition is still true, and the loop retries. But `allocpool` is also still NULL, so the next iteration calls `get_cg_pool_locked()` without a preallocated pool. While memory stays exhausted, that call returns `ERR_PTR(-ENOMEM)` again and the cycle repeats: a loop that can spin indefinitely under memory pressure, exactly when you'd least want it.

The fix:

```c
        allocpool = kzalloc_obj(*allocpool);
        if (allocpool) {
            pool = NULL;
            continue;
        }
+       pool = ERR_PTR(-ENOMEM);
    }
```

This causes the `while (!pool)` loop to terminate (since `ERR_PTR(-ENOMEM)` is non-NULL), and `pool` is returned to the caller. Since callers already check for `IS_ERR()` returns (the `-ENODEV` path at line 590 already sets an ERR_PTR), this is consistent with the existing error contract.
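
As a rough userspace illustration of why the added assignment ends the loop, the sketch below re-creates simplified versions of the kernel's `ERR_PTR()`, `PTR_ERR()`, and `IS_ERR()` helpers. These are approximations written for this example, not the kernel's own definitions, and `check_enomem_exits_loop()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace approximation: errors are encoded in the top MAX_ERRNO addresses. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the negative errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True when the pointer value lies in the reserved error range. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical self-check mirroring the state after the one-line fix. */
void check_enomem_exits_loop(void)
{
	void *pool = ERR_PTR(-ENOMEM);

	assert(pool != NULL);             /* so `while (!pool)` stops */
	assert(IS_ERR(pool));             /* callers can detect the failure */
	assert(PTR_ERR(pool) == -ENOMEM); /* and recover the errno */
	assert(!IS_ERR(NULL));            /* NULL is not an error pointer */
}
```

The point of the sketch is only that an encoded error is a non-NULL value, so it satisfies the loop's exit condition while still being distinguishable from a valid pool by the caller.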

**Minor observations:**

1. The `Fixes:` tag references the correct commit that introduced the dmem cgroup code.
2. The `kfree(allocpool)` at line 608 is safe since `allocpool` is NULL in this error path.
3. One could argue that `GFP_KERNEL` allocation failure should perhaps use `__GFP_NOFAIL` or retry with `GFP_NOIO` depending on context, but simply propagating the error is the safest and most standard approach, and matches how `get_cg_pool_locked` already signals `-ENOMEM`.

**No issues found. Clean, minimal, correct fix.**

---
Generated by Claude Code Patch Reviewer


* Claude review: Re: [PATCH] cgroup/dmem: return -ENOMEM on failed pool preallocation
  2026-05-11  1:46 ` Tejun Heo
  2026-05-16  5:56   ` Claude review: " Claude Code Review Bot
@ 2026-05-16  5:56   ` Claude Code Review Bot
  1 sibling, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-05-16  5:56 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: Re: [PATCH] cgroup/dmem: return -ENOMEM on failed pool preallocation
Author: Tejun Heo <tj@kernel.org>
Patches: 4
Reviewed: 2026-05-16T15:56:05.178465

---

This is a single-patch fix for a real retry-loop bug in `kernel/cgroup/dmem.c`. The analysis in the commit message is correct: when `get_cg_pool_locked()` returns `ERR_PTR(-ENOMEM)`, the code clears `pool` to NULL, then attempts a fallback `kzalloc_obj()`. If that fallback allocation also fails, `pool` remains NULL and the `while (!pool)` loop retries with nothing changed, so it can spin for as long as memory stays exhausted instead of reporting the failure. The fix is minimal, correct, and well-placed.
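
The retry flow described above can be modelled in userspace as a minimal sketch. Everything here is an assumption made for illustration: `struct sim`, `model_get_pool_locked()`, and `model_get_pool_unlocked()` are hypothetical stand-ins, allocation outcomes are simulated with flags rather than real memory pressure, locking is omitted, and an iteration cap exists only so the unfixed variant cannot hang the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ERR_PTR(err) ((void *)(intptr_t)(err))

struct pool { int unused; };

/* Simulation knobs; all fields are hypothetical. */
struct sim {
	bool nowait_ok;   /* does the non-sleeping allocation under the lock succeed? */
	bool fallback_ok; /* does the unlocked fallback allocation succeed? */
	bool with_fix;    /* apply the one-line fix from the patch? */
	int max_iters;    /* safety cap for the unfixed variant */
	int iters;        /* iterations actually executed */
};

/* Stand-in for the locked path: consume a preallocated pool if one exists. */
static struct pool *model_get_pool_locked(struct sim *s, struct pool **allocpool)
{
	struct pool *p;

	if (*allocpool) {
		p = *allocpool;
		*allocpool = NULL;
		return p;
	}
	if (!s->nowait_ok)
		return ERR_PTR(-ENOMEM);
	p = calloc(1, sizeof(*p));
	return p ? p : ERR_PTR(-ENOMEM);
}

/* Stand-in for the unlocked wrapper and its retry loop. */
struct pool *model_get_pool_unlocked(struct sim *s)
{
	struct pool *pool = NULL, *allocpool = NULL;

	while (!pool && s->iters < s->max_iters) {
		s->iters++;
		pool = model_get_pool_locked(s, &allocpool);

		if (pool == ERR_PTR(-ENOMEM)) {
			pool = NULL;
			/* The kernel uses WARN_ON(allocpool) here; the model asserts. */
			assert(!allocpool);
			allocpool = s->fallback_ok ? calloc(1, sizeof(*allocpool)) : NULL;
			if (allocpool)
				continue; /* next pass consumes the preallocated pool */
			if (s->with_fix)
				pool = ERR_PTR(-ENOMEM); /* propagate instead of retrying */
		}
	}
	free(allocpool); /* NULL on the error path, so this is a no-op there */
	return pool;
}
```

In this model, a failing fallback with `with_fix` set returns `ERR_PTR(-ENOMEM)` after a single iteration, while the same inputs without the fix run until `max_iters` is reached and return NULL. A successful fallback takes two iterations: one to fail and preallocate, and one to consume the preallocated pool.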

**Verdict: Good patch, should be accepted.**

---
Generated by Claude Code Patch Reviewer


* Re: [PATCH] cgroup/dmem: return -ENOMEM on failed pool preallocation
  2026-05-11  1:31 [PATCH] cgroup/dmem: return -ENOMEM on failed pool preallocation Guopeng Zhang
  2026-05-11  1:46 ` Tejun Heo
@ 2026-05-11 13:03 ` Michal Koutný
       [not found] ` <1778555389652035.138.seg@mailgw.kylinos.cn>
  2 siblings, 0 replies; 6+ messages in thread
From: Michal Koutný @ 2026-05-11 13:03 UTC (permalink / raw)
  To: Guopeng Zhang
  Cc: Maarten Lankhorst, Maxime Ripard, Natalie Vock, Tejun Heo,
	Johannes Weiner, cgroups, dri-devel, linux-kernel


On Mon, May 11, 2026 at 09:31:50AM +0800, Guopeng Zhang <zhangguopeng@kylinos.cn> wrote:
> get_cg_pool_unlocked() handles allocation failures under dmemcg_lock by
> dropping the lock, preallocating a pool with GFP_KERNEL, and retrying the
> locked lookup and creation path.
> 
> If the fallback allocation fails too, pool remains NULL. Since the loop
> condition is while (!pool), the function can keep retrying instead of
> propagating the allocation failure to the caller.

This implies that it's OK when the function keeps retrying with
allocpool != NULL (and repeated WARN_ON()s)?

> Set pool to ERR_PTR(-ENOMEM) when the fallback allocation fails so the
> loop exits through the existing common return path. The callers already
> handle ERR_PTR() from get_cg_pool_unlocked(), so this restores the
> expected error path.

If the callers can handle it, maybe there's no need to retry at all.
Perhaps dmem fellows can step in here.

> 
> Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup")
> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
> ---
>  kernel/cgroup/dmem.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
> index 1ab1fb47f271..4753a67d0f0f 100644
> --- a/kernel/cgroup/dmem.c
> +++ b/kernel/cgroup/dmem.c
> @@ -602,6 +602,7 @@ get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
>  				pool = NULL;

This 2nd pool zeroing seems pointless.

>  				continue;
>  			}
> +			pool = ERR_PTR(-ENOMEM);
>  		}
>  	}


HTH,
Michal



[parent not found: <1778555389652035.138.seg@mailgw.kylinos.cn>]

* Re: [PATCH] cgroup/dmem: return -ENOMEM on failed pool preallocation
       [not found] ` <1778555389652035.138.seg@mailgw.kylinos.cn>
@ 2026-05-12  7:04   ` Guopeng Zhang
  0 siblings, 0 replies; 6+ messages in thread
From: Guopeng Zhang @ 2026-05-12  7:04 UTC (permalink / raw)
  To: Michal Koutný
  Cc: Maarten Lankhorst, Maxime Ripard, Natalie Vock, Tejun Heo,
	Johannes Weiner, cgroups, dri-devel, linux-kernel

On 2026/5/11 21:03, Michal Koutný wrote:
> On Mon, May 11, 2026 at 09:31:50AM +0800, Guopeng Zhang <zhangguopeng@kylinos.cn> wrote:
>> get_cg_pool_unlocked() handles allocation failures under dmemcg_lock by
>> dropping the lock, preallocating a pool with GFP_KERNEL, and retrying the
>> locked lookup and creation path.
>>
>> If the fallback allocation fails too, pool remains NULL. Since the loop
>> condition is while (!pool), the function can keep retrying instead of
>> propagating the allocation failure to the caller.
> 
> This implies that it's OK when the function keeps retrying with
> allocpool != NULL (and repeated WARN_ON()s)?

Hi Michal,

Thanks for taking a look.

No, that was not what I meant to imply. The commit message was not precise
enough there.

The intended normal retry is only for the case where the GFP_NOWAIT
allocation under dmemcg_lock fails. In that case, get_cg_pool_unlocked()
drops the lock, preallocates one pool with GFP_KERNEL, and the next locked
retry consumes that preallocated pool and clears allocpool.

So allocpool != NULL together with another -ENOMEM return is not expected to
be a normal retry path. The WARN_ON(allocpool) branch looks defensive, and I
agree that repeatedly continuing from there would not be useful if it ever
fired.

>> Set pool to ERR_PTR(-ENOMEM) when the fallback allocation fails so the
>> loop exits through the existing common return path. The callers already
>> handle ERR_PTR() from get_cg_pool_unlocked(), so this restores the
>> expected error path.
> 
> If the callers can handle it, maybe there's no need to retry at all.
> Perhaps dmem fellows can step in here.

My understanding is that the retry still has a purpose independent of the
callers' ability to handle ERR_PTR().

The first allocation attempt happens in alloc_pool_single() while
dmemcg_lock is held, so it uses GFP_NOWAIT. If that fails,
get_cg_pool_unlocked() drops the lock and preallocates one pool with the
default GFP_KERNEL context. The next locked retry then consumes that
preallocated pool instead of trying another GFP_NOWAIT allocation for that
pool.

So callers can handle the final ERR_PTR() result, but the fallback
preallocation gives the allocation a chance to succeed in a less
constrained context before reporting -ENOMEM. That said, whether this
retry policy is desirable is a dmem design question, so input from dmem
folks would be helpful.

>>
>> Fixes: b168ed458dde ("kernel/cgroup: Add "dmem" memory accounting cgroup")
>> Signed-off-by: Guopeng Zhang <zhangguopeng@kylinos.cn>
>> ---
>>  kernel/cgroup/dmem.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
>> index 1ab1fb47f271..4753a67d0f0f 100644
>> --- a/kernel/cgroup/dmem.c
>> +++ b/kernel/cgroup/dmem.c
>> @@ -602,6 +602,7 @@ get_cg_pool_unlocked(struct dmemcg_state *cg, struct dmem_cgroup_region *region)
>>  				pool = NULL;
> 
> This 2nd pool zeroing seems pointless.

Agreed.

Since Tejun has already applied the fix, I will wait for the discussion
before sending any follow-up. If we keep the current retry scheme, a small
cleanup can make this path clearer.

Thanks,
Guopeng

