* [PATCH] drm/ttm: Fix GPU MM stats during pool shrinking
@ 2026-05-01 22:30 Matthew Brost
[not found] ` <18c48e7f-a2bd-4e74-9262-266cd5c2fde8@panix.com>
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Matthew Brost @ 2026-05-01 22:30 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: Kenneth Crudup, Christian Koenig, Huang Rui, Matthew Auld,
David Airlie
TTM pool shrinking frees pages by calling __free_pages() directly,
which bypasses updates to NR_GPU_ACTIVE and leaves GPU MM accounting
out of sync.
Introduce a helper, __free_pages_gpu_account(), and use it for all page
frees in ttm_pool.c so GPU MM statistics are updated consistently.
Reported-by: Kenneth Crudup <kenny@panix.com>
Fixes: ae80122f3896 ("drm/ttm: use gpu mm stats to track gpu memory allocations. (v4)")
Cc: Christian Koenig <christian.koenig@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
---
drivers/gpu/drm/ttm/ttm_pool.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 26a3689e5fd9..95bbd9328072 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -206,6 +206,14 @@ static struct page *ttm_pool_alloc_page(struct ttm_pool *pool, gfp_t gfp_flags,
return NULL;
}
+static void __free_pages_gpu_account(struct page *p, unsigned int order,
+ bool reclaim)
+{
+ mod_lruvec_page_state(p, reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE,
+ -(1 << order));
+ __free_pages(p, order);
+}
+
/* Reset the caching and pages of size 1 << order */
static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
unsigned int order, struct page *p, bool reclaim)
@@ -223,9 +231,7 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
#endif
if (!pool || !ttm_pool_uses_dma_alloc(pool)) {
- mod_lruvec_page_state(p, reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE,
- -(1 << order));
- __free_pages(p, order);
+ __free_pages_gpu_account(p, order, reclaim);
return;
}
@@ -606,7 +612,7 @@ static int ttm_pool_restore_commit(struct ttm_pool_tt_restore *restore,
*/
ttm_pool_split_for_swap(restore->pool, p);
copy_highpage(restore->alloced_page + i, p);
- __free_pages(p, 0);
+ __free_pages_gpu_account(p, 0, false);
}
restore->restored_pages++;
@@ -1068,7 +1074,7 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt,
if (flags->purge) {
shrunken += num_pages;
page->private = 0;
- __free_pages(page, order);
+ __free_pages_gpu_account(page, order, false);
memset(tt->pages + i, 0,
num_pages * sizeof(*tt->pages));
}
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] drm/ttm: Fix GPU MM stats during pool shrinking
[not found] ` <18c48e7f-a2bd-4e74-9262-266cd5c2fde8@panix.com>
@ 2026-05-02 4:12 ` Matthew Brost
2026-05-02 4:13 ` Matthew Brost
0 siblings, 1 reply; 5+ messages in thread
From: Matthew Brost @ 2026-05-02 4:12 UTC (permalink / raw)
To: Kenneth Crudup
Cc: intel-xe, dri-devel, Christian Koenig, Huang Rui, Matthew Auld,
David Airlie
On Fri, May 01, 2026 at 05:05:14PM -0700, Kenneth Crudup wrote:
>
> On 5/1/26 15:30, Matthew Brost wrote:
>
> > TTM pool shrinking frees pages by calling __free_pages() directly,
> > which bypasses updates to NR_GPU_ACTIVE and leaves GPU MM accounting
> > out of sync.
> >
> > Introduce a helper, __free_pages_gpu_account(), and use it for all page
> > frees in ttm_pool.c so GPU MM statistics are updated consistently.
>
> OK, so why/how does "bonnie++" increase the GPU Memory size?
>
Well, it shouldn’t. What bonnie++ does is basically consume system
memory, triggering reclaim, which in turn will evict GPU BOs that exist
when the display is open. Thanks for pointing me to bonnie++ - this,
plus running something like the WebGL Aquarium in Chrome, is a very nice
test case for Xe/DRM shrinkers.
> ----
> SwapTotal: 33554428 kB
> MemTotal: 32345672 kB
> GPUReclaim: 621568 kB
> GPUActive: 453592 kB
>
> SwapTotal: 33554428 kB
> MemTotal: 32345672 kB
> GPUActive: 5554976 kB
> GPUReclaim: 12716 kB
>
> SwapTotal: 33554428 kB
> MemTotal: 32345672 kB
> GPUActive: 18407272 kB
> GPUReclaim: 884 kB
>
> SwapTotal: 33554428 kB
> MemTotal: 32345672 kB
> GPUActive: 24022916 kB
> GPUReclaim: 716 kB
>
> SwapTotal: 33554428 kB
> MemTotal: 32345672 kB
> GPUActive: 25258248 kB
> GPUReclaim: 16032 kB
>
> SwapTotal: 33554428 kB
> MemTotal: 32345672 kB
> GPUActive: 28207188 kB
> GPUReclaim: 3684 kB
> ----
>
> ... and I'm now not so sure the patch is working ... this after a 2nd bonnie
> run:
It doesn’t appear that it is. I can recreate what you’re seeing with
this patch alone on drm-tip. I originally coded this patch on top of a
local fix to avoid the TTM shrinker allocating higher-order folios when
reclaiming memory—this is working there. I falsely assumed it would work
on drm-tip as well. Let me ensure I have a standalone fix for GPUActive
accounting first, then apply my TTM shrinker fix on top.
Matt
>
> ----
> GPUActive: 44357100 kB
> SwapTotal: 33554428 kB
> MemTotal: 32345672 kB
> GPUReclaim: 94864 kB
>
> GPUActive: 44373904 kB
> SwapTotal: 33554428 kB
> MemTotal: 32345672 kB
> GPUReclaim: 94996 kB
>
> GPUActive: 44354940 kB
> SwapTotal: 33554428 kB
> MemTotal: 32345672 kB
> GPUReclaim: 98048 kB
>
> GPUActive: 44769340 kB
> SwapTotal: 33554428 kB
> MemTotal: 32345672 kB
> GPUReclaim: 122996 kB
> ----
>
> -Kenny
>
> --
> Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County
> CA
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] drm/ttm: Fix GPU MM stats during pool shrinking
2026-05-02 4:12 ` Matthew Brost
@ 2026-05-02 4:13 ` Matthew Brost
0 siblings, 0 replies; 5+ messages in thread
From: Matthew Brost @ 2026-05-02 4:13 UTC (permalink / raw)
To: Kenneth Crudup
Cc: intel-xe, dri-devel, Christian Koenig, Huang Rui, Matthew Auld,
David Airlie
On Fri, May 01, 2026 at 09:12:31PM -0700, Matthew Brost wrote:
> On Fri, May 01, 2026 at 05:05:14PM -0700, Kenneth Crudup wrote:
> >
> > On 5/1/26 15:30, Matthew Brost wrote:
> >
> > > TTM pool shrinking frees pages by calling __free_pages() directly,
> > > which bypasses updates to NR_GPU_ACTIVE and leaves GPU MM accounting
> > > out of sync.
> > >
> > > Introduce a helper, __free_pages_gpu_account(), and use it for all page
> > > frees in ttm_pool.c so GPU MM statistics are updated consistently.
> >
> > OK, so why/how does "bonnie++" increase the GPU Memory size?
> >
>
> Well, it shouldn’t. What bonnie++ does is basically consume system
> memory, triggering reclaim, which in turn will evict GPU BOs that exist
> when the display is open. Thanks for pointing me to bonnie++ - this,
> plus running something like the WebGL Aquarium in Chrome, is a very nice
> test case for Xe/DRM shrinkers.
>
> > ----
> > SwapTotal: 33554428 kB
> > MemTotal: 32345672 kB
> > GPUReclaim: 621568 kB
> > GPUActive: 453592 kB
> >
> > SwapTotal: 33554428 kB
> > MemTotal: 32345672 kB
> > GPUActive: 5554976 kB
> > GPUReclaim: 12716 kB
> >
> > SwapTotal: 33554428 kB
> > MemTotal: 32345672 kB
> > GPUActive: 18407272 kB
> > GPUReclaim: 884 kB
> >
> > SwapTotal: 33554428 kB
> > MemTotal: 32345672 kB
> > GPUActive: 24022916 kB
> > GPUReclaim: 716 kB
> >
> > SwapTotal: 33554428 kB
> > MemTotal: 32345672 kB
> > GPUActive: 25258248 kB
> > GPUReclaim: 16032 kB
> >
> > SwapTotal: 33554428 kB
> > MemTotal: 32345672 kB
> > GPUActive: 28207188 kB
> > GPUReclaim: 3684 kB
> > ----
> >
> > ... and I'm now not so sure the patch is working ... this after a 2nd bonnie
> > run:
>
> It doesn’t appear that it is. I can recreate what you’re seeing with
> this patch alone on drm-tip. I originally coded this patch on top of a
> local fix to avoid the TTM shrinker allocating higher-order folios when
s/allocating/spliting
Matt
> reclaiming memory—this is working there. I falsely assumed it would work
> on drm-tip as well. Let me ensure I have a standalone fix for GPUActive
> accounting first, then apply my TTM shrinker fix on top.
>
> Matt
>
> >
> > ----
> > GPUActive: 44357100 kB
> > SwapTotal: 33554428 kB
> > MemTotal: 32345672 kB
> > GPUReclaim: 94864 kB
> >
> > GPUActive: 44373904 kB
> > SwapTotal: 33554428 kB
> > MemTotal: 32345672 kB
> > GPUReclaim: 94996 kB
> >
> > GPUActive: 44354940 kB
> > SwapTotal: 33554428 kB
> > MemTotal: 32345672 kB
> > GPUReclaim: 98048 kB
> >
> > GPUActive: 44769340 kB
> > SwapTotal: 33554428 kB
> > MemTotal: 32345672 kB
> > GPUReclaim: 122996 kB
> > ----
> >
> > -Kenny
> >
> > --
> > Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County
> > CA
> >
^ permalink raw reply [flat|nested] 5+ messages in thread
* Claude review: drm/ttm: Fix GPU MM stats during pool shrinking
2026-05-01 22:30 [PATCH] drm/ttm: Fix GPU MM stats during pool shrinking Matthew Brost
[not found] ` <18c48e7f-a2bd-4e74-9262-266cd5c2fde8@panix.com>
2026-05-04 23:02 ` Claude review: " Claude Code Review Bot
@ 2026-05-04 23:02 ` Claude Code Review Bot
2 siblings, 0 replies; 5+ messages in thread
From: Claude Code Review Bot @ 2026-05-04 23:02 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: drm/ttm: Fix GPU MM stats during pool shrinking
Author: Matthew Brost <matthew.brost@intel.com>
Patches: 3
Reviewed: 2026-05-05T09:02:45.099837
---
This is a single-patch fix for a GPU memory accounting bug in TTM pool management. The issue is real and the fix is correct: two call sites in `ttm_pool.c` were calling `__free_pages()` directly without decrementing `NR_GPU_ACTIVE`, causing GPU MM stats to drift upward over time as pages are freed through the backup/restore and purge paths. The approach of extracting a helper (`__free_pages_gpu_account()`) is clean and ensures consistency across all three `__free_pages` call sites in the file.
The patch is minimal, well-scoped, and the Fixes tag correctly identifies the commit that introduced the GPU MM accounting.
**Recommendation: Approve.** One minor question worth confirming below, but it shouldn't block the patch.
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 5+ messages in thread
* Claude review: drm/ttm: Fix GPU MM stats during pool shrinking
2026-05-01 22:30 [PATCH] drm/ttm: Fix GPU MM stats during pool shrinking Matthew Brost
[not found] ` <18c48e7f-a2bd-4e74-9262-266cd5c2fde8@panix.com>
@ 2026-05-04 23:02 ` Claude Code Review Bot
2026-05-04 23:02 ` Claude Code Review Bot
2 siblings, 0 replies; 5+ messages in thread
From: Claude Code Review Bot @ 2026-05-04 23:02 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Correctness: Good.**
The patch introduces:
```c
static void __free_pages_gpu_account(struct page *p, unsigned int order,
bool reclaim)
{
mod_lruvec_page_state(p, reclaim ? NR_GPU_RECLAIM : NR_GPU_ACTIVE,
-(1 << order));
__free_pages(p, order);
}
```
And applies it to all three `__free_pages` call sites:
1. **`ttm_pool_free_page()` (line 225)** -- Pure refactor, replaces identical inline code. Already guarded by `if (!pool || !ttm_pool_uses_dma_alloc(pool))`, so it only runs for non-DMA pages that were accounted at allocation. Correct.
2. **`ttm_pool_restore_commit()` (line 609)** -- This frees an old retained page after splitting and copying to a new allocation. These pages were previously allocated through `ttm_pool_alloc_page()` which accounts them as `NR_GPU_ACTIVE` (line 175 in the current tree). Passing `reclaim=false` correctly decrements `NR_GPU_ACTIVE`. This was a real leak.
3. **`ttm_pool_backup()` (line 1071)** -- The purge path frees pages directly from `tt->pages`. These are actively-used pages counted in `NR_GPU_ACTIVE` (either from initial allocation or from `ttm_pool_type_take()` which moves pages back to ACTIVE state). Passing `reclaim=false` is correct. This was the main leak reported by the user.
**Coverage check:** I verified that these are the only three `__free_pages` call sites in `ttm_pool.c`:
- Line 228: handled (refactored)
- Line 609: handled (bug fix)
- Line 1071: handled (bug fix)
The shrinker dispose path (`ttm_pool_dispose_list` -> `ttm_pool_free_page` with `reclaim=true`) was already correct since it went through `ttm_pool_free_page()`.
**One question worth confirming:** At lines 609 and 1071, the `__free_pages()` call is not guarded by a `!ttm_pool_uses_dma_alloc()` check (unlike `ttm_pool_free_page` which has that guard). If pages from a DMA-alloc pool could reach these paths, calling `__free_pages_gpu_account()` would undercount `NR_GPU_ACTIVE` (since DMA-alloc pages never increment it at allocation). However, this would also mean the pre-existing `__free_pages()` call was itself wrong for DMA pages (leaking the DMA allocation), so this patch doesn't make it worse -- it's a separate pre-existing concern. It's worth confirming that these paths are indeed only reachable for non-DMA pools, or if not, whether there's a separate fix needed for the DMA-alloc case.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-05-04 23:02 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-01 22:30 [PATCH] drm/ttm: Fix GPU MM stats during pool shrinking Matthew Brost
[not found] ` <18c48e7f-a2bd-4e74-9262-266cd5c2fde8@panix.com>
2026-05-02 4:12 ` Matthew Brost
2026-05-02 4:13 ` Matthew Brost
2026-05-04 23:02 ` Claude review: " Claude Code Review Bot
2026-05-04 23:02 ` Claude Code Review Bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox