* Claude review: gpu/buddy: Per-order free and used block scoreboards
2026-05-04 13:52 [PATCH 0/2] " Francois Dugast
@ 2026-05-04 22:19 ` Claude Code Review Bot
0 siblings, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-04 22:19 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: gpu/buddy: Per-order free and used block scoreboards
Author: Francois Dugast <francois.dugast@intel.com>
Patches: 3
Reviewed: 2026-05-05T08:19:36.860161
---
This is a clean, well-motivated 2-patch series that replaces O(n) rbtree walks in `drm_buddy_print()` / `gpu_buddy_print()` with O(1) scoreboard array lookups. The approach is sound: maintain per-order counters at the points where block state transitions happen, removing the need to traverse the free trees at debugfs read time.
The free scoreboard (patch 1) is correctly maintained at all five state-transition sites. The init/fini and error-handling paths are properly updated. The series is low-risk and the resulting print functions are much simpler.
However, the used scoreboard (patch 2) has a correctness issue in the coalescing loop, and there are a couple of naming/style nits worth noting.
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 17+ messages in thread
* Claude review: gpu/buddy: Per-order free and used block scoreboards
2026-05-11 16:41 [PATCH v2 0/3] " Francois Dugast
@ 2026-05-16 4:58 ` Claude Code Review Bot
0 siblings, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-16 4:58 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: gpu/buddy: Per-order free and used block scoreboards
Author: Francois Dugast <francois.dugast@intel.com>
Patches: 5
Reviewed: 2026-05-16T14:58:07.515061
---
This is a well-structured 3-patch series against the `gpu_buddy` allocator. Patch 1 fixes a real use-after-free / rbtree corruption bug in four error-undo paths, patches 2 and 3 add per-order free and used block scoreboards to make `drm_buddy_print()` / `gpu_buddy_print()` O(1) instead of O(n).
The bug fix in patch 1 is correct and follows the established pattern from `__force_merge()`. The scoreboard accounting in patches 2 and 3 is carefully placed at every state-transition point and I was unable to find a path that would cause a missed increment, double-decrement, or underflow.
**Overall: series looks good, with minor nits below.**
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v4 0/5] gpu/buddy: Per-order free and used block scoreboards
@ 2026-05-22 9:25 Francois Dugast
2026-05-22 9:25 ` [PATCH v4 1/5] gpu/buddy: Fix use-after-free in split_block() call sites Francois Dugast
` (5 more replies)
0 siblings, 6 replies; 17+ messages in thread
From: Francois Dugast @ 2026-05-22 9:25 UTC (permalink / raw)
To: intel-xe
Cc: dri-devel, matthew.auld, Arunpravin.PaneerSelvam, Francois Dugast
drm_buddy_print() currently reports per-order free block counts by
walking all rbtrees, which is O(n) in the total number of free blocks
and holds the allocator lock for the duration. On large VRAM heaps with
many small fragments this becomes expensive.
This series replaces the rbtree walk with two lightweight scoreboard
arrays — free_scoreboard and used_scoreboard — indexed by order and
maintained incrementally at the points where block state transitions
occur. The print functions become simple array lookups, and drivers
reading debugfs (/sys/kernel/debug/dri/0/tile0/vram_mm) now get both
free and used counts per order at O(1) cost.
v4: Assert used_scoreboard is empty at fini()
v3: Add __gpu_buddy_undo_splits() helper as suggested by Matt
v2: Add first patch to fix bug reported by Sashiko [1] then update
following patches accordingly
Francois Dugast (5):
gpu/buddy: Fix use-after-free in split_block() call sites
gpu/buddy: Remove redundant condition in alloc_from_freetree() error
path
gpu/buddy: Introduce __gpu_buddy_undo_splits() helper
gpu/buddy: Track per-order free blocks with a scoreboard
gpu/buddy: Track per-order used blocks with a scoreboard
drivers/gpu/buddy.c | 107 ++++++++++++++++++++++--------------
drivers/gpu/drm/drm_buddy.c | 30 ++++------
include/linux/gpu_buddy.h | 15 +++++
3 files changed, 93 insertions(+), 59 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH v4 1/5] gpu/buddy: Fix use-after-free in split_block() call sites
2026-05-22 9:25 [PATCH v4 0/5] gpu/buddy: Per-order free and used block scoreboards Francois Dugast
@ 2026-05-22 9:25 ` Francois Dugast
2026-05-25 9:06 ` Claude review: " Claude Code Review Bot
2026-05-22 9:25 ` [PATCH v4 2/5] gpu/buddy: Remove redundant condition in alloc_from_freetree() error path Francois Dugast
` (4 subsequent siblings)
5 siblings, 1 reply; 17+ messages in thread
From: Francois Dugast @ 2026-05-22 9:25 UTC (permalink / raw)
To: intel-xe
Cc: dri-devel, matthew.auld, Arunpravin.PaneerSelvam, Francois Dugast,
Sashiko
When split_block() fails it returns before calling mark_split(), leaving
the block in the FREE state and still linked in the rbtree. The four
err_undo paths then call __gpu_buddy_free() without first removing the
block from the tree, which leads to two distinct bugs:
- If the buddy is also free, __gpu_buddy_free() merges the two siblings
by calling gpu_block_free(mm, block) while block->rb is still linked
in the tree. Any subsequent rbtree traversal will follow the now-
dangling pointer, causing a use-after-free.
- In alloc_from_freetree(), where there is no buddy guard,
__gpu_buddy_free() always reaches mark_free() -> rbtree_insert() with
block still in the tree, corrupting the rbtree.
The same pattern is already used correctly in __force_merge(): call
rbtree_remove() to unlink the block before handing it to
__gpu_buddy_free(). Apply the same fix to all four err_undo sites.
Reported-by: Sashiko <sashiko-bot@kernel.org>
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Assisted-by: GitHub Copilot:claude-sonnet-4.6
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
drivers/gpu/buddy.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
index eb1457376307..dac2027bb64a 100644
--- a/drivers/gpu/buddy.c
+++ b/drivers/gpu/buddy.c
@@ -737,8 +737,10 @@ __alloc_range_bias(struct gpu_buddy *mm,
buddy = __get_buddy(block);
if (buddy &&
(gpu_buddy_block_is_free(block) &&
- gpu_buddy_block_is_free(buddy)))
+ gpu_buddy_block_is_free(buddy))) {
+ rbtree_remove(mm, block);
__gpu_buddy_free(mm, block, false);
+ }
return ERR_PTR(err);
}
@@ -847,8 +849,10 @@ alloc_from_freetree(struct gpu_buddy *mm,
return block;
err_undo:
- if (tmp != order)
+ if (tmp != order) {
+ rbtree_remove(mm, block);
__gpu_buddy_free(mm, block, false);
+ }
return ERR_PTR(err);
}
@@ -968,8 +972,10 @@ gpu_buddy_offset_aligned_allocation(struct gpu_buddy *mm,
buddy = __get_buddy(block);
if (buddy &&
(gpu_buddy_block_is_free(block) &&
- gpu_buddy_block_is_free(buddy)))
+ gpu_buddy_block_is_free(buddy))) {
+ rbtree_remove(mm, block);
__gpu_buddy_free(mm, block, false);
+ }
return ERR_PTR(err);
}
@@ -1054,8 +1060,10 @@ static int __alloc_range(struct gpu_buddy *mm,
buddy = __get_buddy(block);
if (buddy &&
(gpu_buddy_block_is_free(block) &&
- gpu_buddy_block_is_free(buddy)))
+ gpu_buddy_block_is_free(buddy))) {
+ rbtree_remove(mm, block);
__gpu_buddy_free(mm, block, false);
+ }
err_free:
if (err == -ENOSPC && total_allocated_on_err) {
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 2/5] gpu/buddy: Remove redundant condition in alloc_from_freetree() error path
2026-05-22 9:25 [PATCH v4 0/5] gpu/buddy: Per-order free and used block scoreboards Francois Dugast
2026-05-22 9:25 ` [PATCH v4 1/5] gpu/buddy: Fix use-after-free in split_block() call sites Francois Dugast
@ 2026-05-22 9:25 ` Francois Dugast
2026-05-22 10:10 ` Matthew Auld
2026-05-25 9:06 ` Claude review: " Claude Code Review Bot
2026-05-22 9:25 ` [PATCH v4 3/5] gpu/buddy: Introduce __gpu_buddy_undo_splits() helper Francois Dugast
` (3 subsequent siblings)
5 siblings, 2 replies; 17+ messages in thread
From: Francois Dugast @ 2026-05-22 9:25 UTC (permalink / raw)
To: intel-xe
Cc: dri-devel, matthew.auld, Arunpravin.PaneerSelvam, Francois Dugast
The err_undo label in alloc_from_freetree() is only reachable via a
goto from inside the `while (tmp != order)` loop, which means tmp is
guaranteed to differ from order at that point. The surrounding
`if (tmp != order)` guard was therefore always true and can be dropped
without any behavioral change.
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
---
drivers/gpu/buddy.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
index dac2027bb64a..9f6696f47f89 100644
--- a/drivers/gpu/buddy.c
+++ b/drivers/gpu/buddy.c
@@ -849,10 +849,8 @@ alloc_from_freetree(struct gpu_buddy *mm,
return block;
err_undo:
- if (tmp != order) {
- rbtree_remove(mm, block);
- __gpu_buddy_free(mm, block, false);
- }
+ rbtree_remove(mm, block);
+ __gpu_buddy_free(mm, block, false);
return ERR_PTR(err);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 3/5] gpu/buddy: Introduce __gpu_buddy_undo_splits() helper
2026-05-22 9:25 [PATCH v4 0/5] gpu/buddy: Per-order free and used block scoreboards Francois Dugast
2026-05-22 9:25 ` [PATCH v4 1/5] gpu/buddy: Fix use-after-free in split_block() call sites Francois Dugast
2026-05-22 9:25 ` [PATCH v4 2/5] gpu/buddy: Remove redundant condition in alloc_from_freetree() error path Francois Dugast
@ 2026-05-22 9:25 ` Francois Dugast
2026-05-25 9:06 ` Claude review: " Claude Code Review Bot
2026-05-22 9:25 ` [PATCH v4 4/5] gpu/buddy: Track per-order free blocks with a scoreboard Francois Dugast
` (2 subsequent siblings)
5 siblings, 1 reply; 17+ messages in thread
From: Francois Dugast @ 2026-05-22 9:25 UTC (permalink / raw)
To: intel-xe
Cc: dri-devel, matthew.auld, Arunpravin.PaneerSelvam, Francois Dugast
The pattern of merging a block back with its buddy on error paths is
duplicated across multiple locations. Extract it into a
__gpu_buddy_undo_splits() helper to avoid repetition and prepare for
future changes.
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
drivers/gpu/buddy.c | 43 +++++++++++++++++--------------------------
1 file changed, 17 insertions(+), 26 deletions(-)
diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
index 9f6696f47f89..8654604b87a4 100644
--- a/drivers/gpu/buddy.c
+++ b/drivers/gpu/buddy.c
@@ -650,6 +650,19 @@ static bool block_incompatible(struct gpu_buddy_block *block, unsigned int flags
return needs_clear != gpu_buddy_block_is_clear(block);
}
+static void __gpu_buddy_undo_splits(struct gpu_buddy *mm,
+ struct gpu_buddy_block *block)
+{
+ struct gpu_buddy_block *buddy = __get_buddy(block);
+
+ if (buddy &&
+ (gpu_buddy_block_is_free(block) &&
+ gpu_buddy_block_is_free(buddy))) {
+ rbtree_remove(mm, block);
+ __gpu_buddy_free(mm, block, false);
+ }
+}
+
static struct gpu_buddy_block *
__alloc_range_bias(struct gpu_buddy *mm,
u64 start, u64 end,
@@ -659,7 +672,6 @@ __alloc_range_bias(struct gpu_buddy *mm,
{
u64 req_size = mm->chunk_size << order;
struct gpu_buddy_block *block;
- struct gpu_buddy_block *buddy;
LIST_HEAD(dfs);
int err;
int i;
@@ -734,13 +746,7 @@ __alloc_range_bias(struct gpu_buddy *mm,
* bigger is better, so make sure we merge everything back before we
* free the allocated blocks.
*/
- buddy = __get_buddy(block);
- if (buddy &&
- (gpu_buddy_block_is_free(block) &&
- gpu_buddy_block_is_free(buddy))) {
- rbtree_remove(mm, block);
- __gpu_buddy_free(mm, block, false);
- }
+ __gpu_buddy_undo_splits(mm, block);
return ERR_PTR(err);
}
@@ -849,8 +855,7 @@ alloc_from_freetree(struct gpu_buddy *mm,
return block;
err_undo:
- rbtree_remove(mm, block);
- __gpu_buddy_free(mm, block, false);
+ __gpu_buddy_undo_splits(mm, block);
return ERR_PTR(err);
}
@@ -914,7 +919,6 @@ gpu_buddy_offset_aligned_allocation(struct gpu_buddy *mm,
{
struct gpu_buddy_block *block = NULL;
unsigned int order, tmp, alignment;
- struct gpu_buddy_block *buddy;
enum gpu_buddy_free_tree tree;
unsigned long pages;
int err;
@@ -967,13 +971,7 @@ gpu_buddy_offset_aligned_allocation(struct gpu_buddy *mm,
* bigger is better, so make sure we merge everything back before we
* free the allocated blocks.
*/
- buddy = __get_buddy(block);
- if (buddy &&
- (gpu_buddy_block_is_free(block) &&
- gpu_buddy_block_is_free(buddy))) {
- rbtree_remove(mm, block);
- __gpu_buddy_free(mm, block, false);
- }
+ __gpu_buddy_undo_splits(mm, block);
return ERR_PTR(err);
}
@@ -984,7 +982,6 @@ static int __alloc_range(struct gpu_buddy *mm,
u64 *total_allocated_on_err)
{
struct gpu_buddy_block *block;
- struct gpu_buddy_block *buddy;
u64 total_allocated = 0;
LIST_HEAD(allocated);
u64 end;
@@ -1055,13 +1052,7 @@ static int __alloc_range(struct gpu_buddy *mm,
* bigger is better, so make sure we merge everything back before we
* free the allocated blocks.
*/
- buddy = __get_buddy(block);
- if (buddy &&
- (gpu_buddy_block_is_free(block) &&
- gpu_buddy_block_is_free(buddy))) {
- rbtree_remove(mm, block);
- __gpu_buddy_free(mm, block, false);
- }
+ __gpu_buddy_undo_splits(mm, block);
err_free:
if (err == -ENOSPC && total_allocated_on_err) {
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 4/5] gpu/buddy: Track per-order free blocks with a scoreboard
2026-05-22 9:25 [PATCH v4 0/5] gpu/buddy: Per-order free and used block scoreboards Francois Dugast
` (2 preceding siblings ...)
2026-05-22 9:25 ` [PATCH v4 3/5] gpu/buddy: Introduce __gpu_buddy_undo_splits() helper Francois Dugast
@ 2026-05-22 9:25 ` Francois Dugast
2026-05-25 9:06 ` Claude review: " Claude Code Review Bot
2026-05-22 9:25 ` [PATCH v4 5/5] gpu/buddy: Track per-order used " Francois Dugast
2026-05-25 9:06 ` Claude review: gpu/buddy: Per-order free and used block scoreboards Claude Code Review Bot
5 siblings, 1 reply; 17+ messages in thread
From: Francois Dugast @ 2026-05-22 9:25 UTC (permalink / raw)
To: intel-xe
Cc: dri-devel, matthew.auld, Arunpravin.PaneerSelvam, Francois Dugast
Reporting per-order free block counts in drm_buddy_print() currently
requires walking all rbtrees, which is O(n) over the total number of
free blocks and holds the allocator lock for the duration. This becomes
expensive on large VRAM heaps with many small free fragments.
Maintain a free_scoreboard[] array indexed by order instead, so that
the count for any order is always available in O(1). The scoreboard is
kept accurate by hooking into the four places where a block's free state
changes: mark_free(), mark_allocated(), mark_split(), and the sites in
__gpu_buddy_free(), __force_merge(), and the four err_undo paths that
call rbtree_remove() directly on free blocks without going through
mark_*().
The print functions are simplified as a result: the rbtree traversal
is replaced by a direct array lookup.
v3: Update after introducing __gpu_buddy_undo_splits() helper
v2: Update after fix for use-after-free in split_block() call sites
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Assisted-by: GitHub Copilot:claude-sonnet-4.6
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
drivers/gpu/buddy.c | 36 +++++++++++++++++++++---------------
drivers/gpu/drm/drm_buddy.c | 16 ++--------------
include/linux/gpu_buddy.h | 7 +++++++
3 files changed, 30 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
index 8654604b87a4..de18b63fef0a 100644
--- a/drivers/gpu/buddy.c
+++ b/drivers/gpu/buddy.c
@@ -193,6 +193,8 @@ static void mark_allocated(struct gpu_buddy *mm,
block->header &= ~GPU_BUDDY_HEADER_STATE;
block->header |= GPU_BUDDY_ALLOCATED;
+ mm->free_scoreboard[gpu_buddy_block_order(block)]--;
+
rbtree_remove(mm, block);
}
@@ -204,6 +206,8 @@ static void mark_free(struct gpu_buddy *mm,
block->header &= ~GPU_BUDDY_HEADER_STATE;
block->header |= GPU_BUDDY_FREE;
+ mm->free_scoreboard[gpu_buddy_block_order(block)]++;
+
tree = get_block_tree(block);
rbtree_insert(mm, block, tree);
}
@@ -214,6 +218,8 @@ static void mark_split(struct gpu_buddy *mm,
block->header &= ~GPU_BUDDY_HEADER_STATE;
block->header |= GPU_BUDDY_SPLIT;
+ mm->free_scoreboard[gpu_buddy_block_order(block)]--;
+
rbtree_remove(mm, block);
}
@@ -271,6 +277,7 @@ static unsigned int __gpu_buddy_free(struct gpu_buddy *mm,
}
rbtree_remove(mm, buddy);
+ mm->free_scoreboard[gpu_buddy_block_order(buddy)]--;
if (force_merge && gpu_buddy_block_is_clear(buddy))
mm->clear_avail -= gpu_buddy_block_size(mm, buddy);
@@ -335,6 +342,7 @@ static int __force_merge(struct gpu_buddy *mm,
iter = rb_prev(iter);
rbtree_remove(mm, block);
+ mm->free_scoreboard[gpu_buddy_block_order(block)]--;
if (gpu_buddy_block_is_clear(block))
mm->clear_avail -= gpu_buddy_block_size(mm, block);
@@ -384,11 +392,17 @@ int gpu_buddy_init(struct gpu_buddy *mm, u64 size, u64 chunk_size)
BUG_ON(mm->max_order > GPU_BUDDY_MAX_ORDER);
+ mm->free_scoreboard = kcalloc(mm->max_order + 1,
+ sizeof(*mm->free_scoreboard),
+ GFP_KERNEL);
+ if (!mm->free_scoreboard)
+ return -ENOMEM;
+
mm->free_trees = kmalloc_array(GPU_BUDDY_MAX_FREE_TREES,
sizeof(*mm->free_trees),
GFP_KERNEL);
if (!mm->free_trees)
- return -ENOMEM;
+ goto out_free_scoreboard;
for_each_free_tree(i) {
mm->free_trees[i] = kmalloc_array(mm->max_order + 1,
@@ -450,6 +464,8 @@ int gpu_buddy_init(struct gpu_buddy *mm, u64 size, u64 chunk_size)
while (i--)
kfree(mm->free_trees[i]);
kfree(mm->free_trees);
+out_free_scoreboard:
+ kfree(mm->free_scoreboard);
return -ENOMEM;
}
EXPORT_SYMBOL(gpu_buddy_init);
@@ -488,6 +504,7 @@ void gpu_buddy_fini(struct gpu_buddy *mm)
kfree(mm->free_trees[i]);
kfree(mm->free_trees);
kfree(mm->roots);
+ kfree(mm->free_scoreboard);
}
EXPORT_SYMBOL(gpu_buddy_fini);
@@ -659,6 +676,7 @@ static void __gpu_buddy_undo_splits(struct gpu_buddy *mm,
(gpu_buddy_block_is_free(block) &&
gpu_buddy_block_is_free(buddy))) {
rbtree_remove(mm, block);
+ mm->free_scoreboard[gpu_buddy_block_order(block)]--;
__gpu_buddy_free(mm, block, false);
}
}
@@ -1487,21 +1505,9 @@ void gpu_buddy_print(struct gpu_buddy *mm)
mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, mm->clear_avail >> 20);
for (order = mm->max_order; order >= 0; order--) {
- struct gpu_buddy_block *block, *tmp;
- struct rb_root *root;
- u64 count = 0, free;
- unsigned int tree;
-
- for_each_free_tree(tree) {
- root = &mm->free_trees[tree][order];
-
- rbtree_postorder_for_each_entry_safe(block, tmp, root, rb) {
- BUG_ON(!gpu_buddy_block_is_free(block));
- count++;
- }
- }
+ u64 count = mm->free_scoreboard[order];
+ u64 free = count * (mm->chunk_size << order);
- free = count * (mm->chunk_size << order);
if (free < SZ_1M)
pr_info("order-%2d free: %8llu KiB, blocks: %llu\n",
order, free >> 10, count);
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index faa025498de4..eef995e08a37 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -47,23 +47,11 @@ void drm_buddy_print(struct gpu_buddy *mm, struct drm_printer *p)
mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, mm->clear_avail >> 20);
for (order = mm->max_order; order >= 0; order--) {
- struct gpu_buddy_block *block, *tmp;
- struct rb_root *root;
- u64 count = 0, free;
- unsigned int tree;
-
- for_each_free_tree(tree) {
- root = &mm->free_trees[tree][order];
-
- rbtree_postorder_for_each_entry_safe(block, tmp, root, rb) {
- BUG_ON(!gpu_buddy_block_is_free(block));
- count++;
- }
- }
+ u64 count = mm->free_scoreboard[order];
+ u64 free = count * (mm->chunk_size << order);
drm_printf(p, "order-%2d ", order);
- free = count * (mm->chunk_size << order);
if (free < SZ_1M)
drm_printf(p, "free: %8llu KiB", free >> 10);
else
diff --git a/include/linux/gpu_buddy.h b/include/linux/gpu_buddy.h
index 71941a039648..a28f7d7637ca 100644
--- a/include/linux/gpu_buddy.h
+++ b/include/linux/gpu_buddy.h
@@ -173,6 +173,13 @@ struct gpu_buddy {
* that fits in the remaining space.
*/
struct gpu_buddy_block **roots;
+ /*
+ * Per-order free block scoreboard: free_scoreboard[order] holds the
+ * number of blocks of that order currently in the free state.
+ * Incremented in mark_free(), decremented wherever rbtree_remove() is
+ * called on a free block.
+ */
+ u64 *free_scoreboard;
/* public: */
unsigned int n_roots;
unsigned int max_order;
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH v4 5/5] gpu/buddy: Track per-order used blocks with a scoreboard
2026-05-22 9:25 [PATCH v4 0/5] gpu/buddy: Per-order free and used block scoreboards Francois Dugast
` (3 preceding siblings ...)
2026-05-22 9:25 ` [PATCH v4 4/5] gpu/buddy: Track per-order free blocks with a scoreboard Francois Dugast
@ 2026-05-22 9:25 ` Francois Dugast
2026-05-22 10:25 ` Arunpravin Paneer Selvam
2026-05-25 9:06 ` Claude review: " Claude Code Review Bot
2026-05-25 9:06 ` Claude review: gpu/buddy: Per-order free and used block scoreboards Claude Code Review Bot
5 siblings, 2 replies; 17+ messages in thread
From: Francois Dugast @ 2026-05-22 9:25 UTC (permalink / raw)
To: intel-xe
Cc: dri-devel, matthew.auld, Arunpravin.PaneerSelvam, Francois Dugast
Extend the scoreboard approach from the previous commit to used blocks,
so drm_buddy_print() can report per-order allocation pressure in O(1).
Unlike free blocks, an allocated block can leave the allocated state
through mark_free() (normal free and gpu_buddy_block_trim()) or be
consumed directly by gpu_block_free() during coalescing. Both sites are
guarded by gpu_buddy_block_is_allocated() and paired with the increment
in mark_allocated().
v3:
- Assert scoreboard is empty at fini(), as sanity check (Matthew Auld)
v2:
- Update after fix for use-after-free in split_block() call sites
- Change goto label to out_free_used_scoreboard for clarity
- Make drm_buddy_print() and gpu_buddy_print() symmetric for used and
free
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Assisted-by: GitHub Copilot:claude-sonnet-4.6
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
---
drivers/gpu/buddy.c | 42 ++++++++++++++++++++++++++++---------
drivers/gpu/drm/drm_buddy.c | 18 ++++++++++------
include/linux/gpu_buddy.h | 8 +++++++
3 files changed, 52 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
index de18b63fef0a..dc81fe0301ce 100644
--- a/drivers/gpu/buddy.c
+++ b/drivers/gpu/buddy.c
@@ -194,6 +194,7 @@ static void mark_allocated(struct gpu_buddy *mm,
block->header |= GPU_BUDDY_ALLOCATED;
mm->free_scoreboard[gpu_buddy_block_order(block)]--;
+ mm->used_scoreboard[gpu_buddy_block_order(block)]++;
rbtree_remove(mm, block);
}
@@ -203,6 +204,9 @@ static void mark_free(struct gpu_buddy *mm,
{
enum gpu_buddy_free_tree tree;
+ if (gpu_buddy_block_is_allocated(block))
+ mm->used_scoreboard[gpu_buddy_block_order(block)]--;
+
block->header &= ~GPU_BUDDY_HEADER_STATE;
block->header |= GPU_BUDDY_FREE;
@@ -281,6 +285,9 @@ static unsigned int __gpu_buddy_free(struct gpu_buddy *mm,
if (force_merge && gpu_buddy_block_is_clear(buddy))
mm->clear_avail -= gpu_buddy_block_size(mm, buddy);
+ if (gpu_buddy_block_is_allocated(block))
+ mm->used_scoreboard[gpu_buddy_block_order(block)]--;
+
gpu_block_free(mm, block);
gpu_block_free(mm, buddy);
@@ -398,11 +405,17 @@ int gpu_buddy_init(struct gpu_buddy *mm, u64 size, u64 chunk_size)
if (!mm->free_scoreboard)
return -ENOMEM;
+ mm->used_scoreboard = kcalloc(mm->max_order + 1,
+ sizeof(*mm->used_scoreboard),
+ GFP_KERNEL);
+ if (!mm->used_scoreboard)
+ goto out_free_free_scoreboard;
+
mm->free_trees = kmalloc_array(GPU_BUDDY_MAX_FREE_TREES,
sizeof(*mm->free_trees),
GFP_KERNEL);
if (!mm->free_trees)
- goto out_free_scoreboard;
+ goto out_free_used_scoreboard;
for_each_free_tree(i) {
mm->free_trees[i] = kmalloc_array(mm->max_order + 1,
@@ -464,7 +477,9 @@ int gpu_buddy_init(struct gpu_buddy *mm, u64 size, u64 chunk_size)
while (i--)
kfree(mm->free_trees[i]);
kfree(mm->free_trees);
-out_free_scoreboard:
+out_free_used_scoreboard:
+ kfree(mm->used_scoreboard);
+out_free_free_scoreboard:
kfree(mm->free_scoreboard);
return -ENOMEM;
}
@@ -500,11 +515,15 @@ void gpu_buddy_fini(struct gpu_buddy *mm)
gpu_buddy_assert(mm->avail == mm->size);
+ for (i = 0; i <= mm->max_order; ++i)
+ gpu_buddy_assert(!mm->used_scoreboard[i]);
+
for_each_free_tree(i)
kfree(mm->free_trees[i]);
kfree(mm->free_trees);
kfree(mm->roots);
kfree(mm->free_scoreboard);
+ kfree(mm->used_scoreboard);
}
EXPORT_SYMBOL(gpu_buddy_fini);
@@ -1505,15 +1524,18 @@ void gpu_buddy_print(struct gpu_buddy *mm)
mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, mm->clear_avail >> 20);
for (order = mm->max_order; order >= 0; order--) {
- u64 count = mm->free_scoreboard[order];
- u64 free = count * (mm->chunk_size << order);
-
- if (free < SZ_1M)
- pr_info("order-%2d free: %8llu KiB, blocks: %llu\n",
- order, free >> 10, count);
+ u64 free_count = mm->free_scoreboard[order];
+ u64 used_count = mm->used_scoreboard[order];
+ u64 block_size = mm->chunk_size << order;
+ u64 free = free_count * block_size;
+ u64 used = used_count * block_size;
+
+ if (block_size < SZ_1M)
+ pr_info("order-%2d free: %8llu KiB, used: %8llu KiB, free_blocks: %llu, used_blocks: %llu\n",
+ order, free >> 10, used >> 10, free_count, used_count);
else
- pr_info("order-%2d free: %8llu MiB, blocks: %llu\n",
- order, free >> 20, count);
+ pr_info("order-%2d free: %8llu MiB, used: %8llu MiB, free_blocks: %llu, used_blocks: %llu\n",
+ order, free >> 20, used >> 20, free_count, used_count);
}
}
EXPORT_SYMBOL(gpu_buddy_print);
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index eef995e08a37..1536e59c6fe7 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -47,17 +47,23 @@ void drm_buddy_print(struct gpu_buddy *mm, struct drm_printer *p)
mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, mm->clear_avail >> 20);
for (order = mm->max_order; order >= 0; order--) {
- u64 count = mm->free_scoreboard[order];
- u64 free = count * (mm->chunk_size << order);
+ u64 free_count = mm->free_scoreboard[order];
+ u64 used_count = mm->used_scoreboard[order];
+ u64 block_size = mm->chunk_size << order;
+ u64 free = free_count * block_size;
+ u64 used = used_count * block_size;
drm_printf(p, "order-%2d ", order);
- if (free < SZ_1M)
- drm_printf(p, "free: %8llu KiB", free >> 10);
+ if (block_size < SZ_1M)
+ drm_printf(p, "free: %8llu KiB, used: %8llu KiB",
+ free >> 10, used >> 10);
else
- drm_printf(p, "free: %8llu MiB", free >> 20);
+ drm_printf(p, "free: %8llu MiB, used: %8llu MiB",
+ free >> 20, used >> 20);
- drm_printf(p, ", blocks: %llu\n", count);
+ drm_printf(p, ", free_blocks: %llu, used_blocks: %llu\n",
+ free_count, used_count);
}
}
EXPORT_SYMBOL(drm_buddy_print);
diff --git a/include/linux/gpu_buddy.h b/include/linux/gpu_buddy.h
index a28f7d7637ca..e037714563d8 100644
--- a/include/linux/gpu_buddy.h
+++ b/include/linux/gpu_buddy.h
@@ -180,6 +180,14 @@ struct gpu_buddy {
* called on a free block.
*/
u64 *free_scoreboard;
+ /*
+ * Per-order used block scoreboard: used_scoreboard[order] holds the
+ * number of blocks of that order currently in the allocated state.
+ * Incremented in mark_allocated(), decremented in mark_free() (guarded
+ * by gpu_buddy_block_is_allocated()) and in __gpu_buddy_free() when an
+ * allocated block is consumed directly during buddy coalescing.
+ */
+ u64 *used_scoreboard;
/* public: */
unsigned int n_roots;
unsigned int max_order;
--
2.43.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH v4 2/5] gpu/buddy: Remove redundant condition in alloc_from_freetree() error path
2026-05-22 9:25 ` [PATCH v4 2/5] gpu/buddy: Remove redundant condition in alloc_from_freetree() error path Francois Dugast
@ 2026-05-22 10:10 ` Matthew Auld
2026-05-25 9:06 ` Claude review: " Claude Code Review Bot
1 sibling, 0 replies; 17+ messages in thread
From: Matthew Auld @ 2026-05-22 10:10 UTC (permalink / raw)
To: Francois Dugast, intel-xe; +Cc: dri-devel, Arunpravin.PaneerSelvam
On 22/05/2026 10:25, Francois Dugast wrote:
> The err_undo label in alloc_from_freetree() is only reachable via a
> goto from inside the `while (tmp != order)` loop, which means tmp is
> guaranteed to differ from order at that point. The surrounding
> `if (tmp != order)` guard was therefore always true and can be dropped
> without any behavioral change.
>
> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> ---
> drivers/gpu/buddy.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
> index dac2027bb64a..9f6696f47f89 100644
> --- a/drivers/gpu/buddy.c
> +++ b/drivers/gpu/buddy.c
> @@ -849,10 +849,8 @@ alloc_from_freetree(struct gpu_buddy *mm,
> return block;
>
> err_undo:
> - if (tmp != order) {
> - rbtree_remove(mm, block);
> - __gpu_buddy_free(mm, block, false);
> - }
> + rbtree_remove(mm, block);
> + __gpu_buddy_free(mm, block, false);
> return ERR_PTR(err);
> }
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 5/5] gpu/buddy: Track per-order used blocks with a scoreboard
2026-05-22 9:25 ` [PATCH v4 5/5] gpu/buddy: Track per-order used " Francois Dugast
@ 2026-05-22 10:25 ` Arunpravin Paneer Selvam
2026-05-22 10:56 ` Matthew Auld
2026-05-25 9:06 ` Claude review: " Claude Code Review Bot
1 sibling, 1 reply; 17+ messages in thread
From: Arunpravin Paneer Selvam @ 2026-05-22 10:25 UTC (permalink / raw)
To: Francois Dugast, intel-xe; +Cc: dri-devel, matthew.auld
On 5/22/2026 2:55 PM, Francois Dugast wrote:
> Extend the scoreboard approach from the previous commit to used blocks,
> so drm_buddy_print() can report per-order allocation pressure in O(1).
>
> Unlike free blocks, an allocated block can leave the allocated state
> through mark_free() (normal free and gpu_buddy_block_trim()) or be
> consumed directly by gpu_block_free() during coalescing. Both sites are
> guarded by gpu_buddy_block_is_allocated() and paired with the increment
> in mark_allocated().
>
> v3:
> - Assert scoreboard is empty at fini(), as sanity check (Matthew Auld)
>
> v2:
> - Update after fix for use-after-free in split_block() call sites
> - Change goto label to out_free_used_scoreboard for clarity
> - Make drm_buddy_print() and gpu_buddy_print() symmetric for used and
> free
>
> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
> Assisted-by: GitHub Copilot:claude-sonnet-4.6
> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
> ---
> drivers/gpu/buddy.c | 42 ++++++++++++++++++++++++++++---------
> drivers/gpu/drm/drm_buddy.c | 18 ++++++++++------
> include/linux/gpu_buddy.h | 8 +++++++
> 3 files changed, 52 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
> index de18b63fef0a..dc81fe0301ce 100644
> --- a/drivers/gpu/buddy.c
> +++ b/drivers/gpu/buddy.c
> @@ -194,6 +194,7 @@ static void mark_allocated(struct gpu_buddy *mm,
> block->header |= GPU_BUDDY_ALLOCATED;
>
> mm->free_scoreboard[gpu_buddy_block_order(block)]--;
> + mm->used_scoreboard[gpu_buddy_block_order(block)]++;
The main consumer of free_scoreboard and used_scoreboard is
gpu_buddy_print(), which is a debug/diagnostic path - it only gets
called during debugfs reads or error state dumps, not during regular
operation.
However, the cost of maintaining these scoreboards is paid on every
alloc and free, which is the hot path of the allocator. Even though each
individual update is small, do we really need to add extra work to the
hot path for something that only benefits a rarely-called debug function?
Am I missing a use case that justifies the hot-path overhead?
Regards,
Arun.
>
> rbtree_remove(mm, block);
> }
> @@ -203,6 +204,9 @@ static void mark_free(struct gpu_buddy *mm,
> {
> enum gpu_buddy_free_tree tree;
>
> + if (gpu_buddy_block_is_allocated(block))
> + mm->used_scoreboard[gpu_buddy_block_order(block)]--;
> +
> block->header &= ~GPU_BUDDY_HEADER_STATE;
> block->header |= GPU_BUDDY_FREE;
>
> @@ -281,6 +285,9 @@ static unsigned int __gpu_buddy_free(struct gpu_buddy *mm,
> if (force_merge && gpu_buddy_block_is_clear(buddy))
> mm->clear_avail -= gpu_buddy_block_size(mm, buddy);
>
> + if (gpu_buddy_block_is_allocated(block))
> + mm->used_scoreboard[gpu_buddy_block_order(block)]--;
> +
> gpu_block_free(mm, block);
> gpu_block_free(mm, buddy);
>
> @@ -398,11 +405,17 @@ int gpu_buddy_init(struct gpu_buddy *mm, u64 size, u64 chunk_size)
> if (!mm->free_scoreboard)
> return -ENOMEM;
>
> + mm->used_scoreboard = kcalloc(mm->max_order + 1,
> + sizeof(*mm->used_scoreboard),
> + GFP_KERNEL);
> + if (!mm->used_scoreboard)
> + goto out_free_free_scoreboard;
> +
> mm->free_trees = kmalloc_array(GPU_BUDDY_MAX_FREE_TREES,
> sizeof(*mm->free_trees),
> GFP_KERNEL);
> if (!mm->free_trees)
> - goto out_free_scoreboard;
> + goto out_free_used_scoreboard;
>
> for_each_free_tree(i) {
> mm->free_trees[i] = kmalloc_array(mm->max_order + 1,
> @@ -464,7 +477,9 @@ int gpu_buddy_init(struct gpu_buddy *mm, u64 size, u64 chunk_size)
> while (i--)
> kfree(mm->free_trees[i]);
> kfree(mm->free_trees);
> -out_free_scoreboard:
> +out_free_used_scoreboard:
> + kfree(mm->used_scoreboard);
> +out_free_free_scoreboard:
> kfree(mm->free_scoreboard);
> return -ENOMEM;
> }
> @@ -500,11 +515,15 @@ void gpu_buddy_fini(struct gpu_buddy *mm)
>
> gpu_buddy_assert(mm->avail == mm->size);
>
> + for (i = 0; i <= mm->max_order; ++i)
> + gpu_buddy_assert(!mm->used_scoreboard[i]);
> +
> for_each_free_tree(i)
> kfree(mm->free_trees[i]);
> kfree(mm->free_trees);
> kfree(mm->roots);
> kfree(mm->free_scoreboard);
> + kfree(mm->used_scoreboard);
> }
> EXPORT_SYMBOL(gpu_buddy_fini);
>
> @@ -1505,15 +1524,18 @@ void gpu_buddy_print(struct gpu_buddy *mm)
> mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, mm->clear_avail >> 20);
>
> for (order = mm->max_order; order >= 0; order--) {
> - u64 count = mm->free_scoreboard[order];
> - u64 free = count * (mm->chunk_size << order);
> -
> - if (free < SZ_1M)
> - pr_info("order-%2d free: %8llu KiB, blocks: %llu\n",
> - order, free >> 10, count);
> + u64 free_count = mm->free_scoreboard[order];
> + u64 used_count = mm->used_scoreboard[order];
> + u64 block_size = mm->chunk_size << order;
> + u64 free = free_count * block_size;
> + u64 used = used_count * block_size;
> +
> + if (block_size < SZ_1M)
> + pr_info("order-%2d free: %8llu KiB, used: %8llu KiB, free_blocks: %llu, used_blocks: %llu\n",
> + order, free >> 10, used >> 10, free_count, used_count);
> else
> - pr_info("order-%2d free: %8llu MiB, blocks: %llu\n",
> - order, free >> 20, count);
> + pr_info("order-%2d free: %8llu MiB, used: %8llu MiB, free_blocks: %llu, used_blocks: %llu\n",
> + order, free >> 20, used >> 20, free_count, used_count);
> }
> }
> EXPORT_SYMBOL(gpu_buddy_print);
> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
> index eef995e08a37..1536e59c6fe7 100644
> --- a/drivers/gpu/drm/drm_buddy.c
> +++ b/drivers/gpu/drm/drm_buddy.c
> @@ -47,17 +47,23 @@ void drm_buddy_print(struct gpu_buddy *mm, struct drm_printer *p)
> mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, mm->clear_avail >> 20);
>
> for (order = mm->max_order; order >= 0; order--) {
> - u64 count = mm->free_scoreboard[order];
> - u64 free = count * (mm->chunk_size << order);
> + u64 free_count = mm->free_scoreboard[order];
> + u64 used_count = mm->used_scoreboard[order];
> + u64 block_size = mm->chunk_size << order;
> + u64 free = free_count * block_size;
> + u64 used = used_count * block_size;
>
> drm_printf(p, "order-%2d ", order);
>
> - if (free < SZ_1M)
> - drm_printf(p, "free: %8llu KiB", free >> 10);
> + if (block_size < SZ_1M)
> + drm_printf(p, "free: %8llu KiB, used: %8llu KiB",
> + free >> 10, used >> 10);
> else
> - drm_printf(p, "free: %8llu MiB", free >> 20);
> + drm_printf(p, "free: %8llu MiB, used: %8llu MiB",
> + free >> 20, used >> 20);
>
> - drm_printf(p, ", blocks: %llu\n", count);
> + drm_printf(p, ", free_blocks: %llu, used_blocks: %llu\n",
> + free_count, used_count);
> }
> }
> EXPORT_SYMBOL(drm_buddy_print);
> diff --git a/include/linux/gpu_buddy.h b/include/linux/gpu_buddy.h
> index a28f7d7637ca..e037714563d8 100644
> --- a/include/linux/gpu_buddy.h
> +++ b/include/linux/gpu_buddy.h
> @@ -180,6 +180,14 @@ struct gpu_buddy {
> * called on a free block.
> */
> u64 *free_scoreboard;
> + /*
> + * Per-order used block scoreboard: used_scoreboard[order] holds the
> + * number of blocks of that order currently in the allocated state.
> + * Incremented in mark_allocated(), decremented in mark_free() (guarded
> + * by gpu_buddy_block_is_allocated()) and in __gpu_buddy_free() when an
> + * allocated block is consumed directly during buddy coalescing.
> + */
> + u64 *used_scoreboard;
> /* public: */
> unsigned int n_roots;
> unsigned int max_order;
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH v4 5/5] gpu/buddy: Track per-order used blocks with a scoreboard
2026-05-22 10:25 ` Arunpravin Paneer Selvam
@ 2026-05-22 10:56 ` Matthew Auld
0 siblings, 0 replies; 17+ messages in thread
From: Matthew Auld @ 2026-05-22 10:56 UTC (permalink / raw)
To: Arunpravin Paneer Selvam, Francois Dugast, intel-xe; +Cc: dri-devel
On 22/05/2026 11:25, Arunpravin Paneer Selvam wrote:
>
>
> On 5/22/2026 2:55 PM, Francois Dugast wrote:
>> Extend the scoreboard approach from the previous commit to used blocks,
>> so drm_buddy_print() can report per-order allocation pressure in O(1).
>>
>> Unlike free blocks, an allocated block can leave the allocated state
>> through mark_free() (normal free and gpu_buddy_block_trim()) or be
>> consumed directly by gpu_block_free() during coalescing. Both sites are
>> guarded by gpu_buddy_block_is_allocated() and paired with the increment
>> in mark_allocated().
>>
>> v3:
>> - Assert scoreboard is empty at fini(), as sanity check (Matthew Auld)
>>
>> v2:
>> - Update after fix for use-after-free in split_block() call sites
>> - Change goto label to out_free_used_scoreboard for clarity
>> - Make drm_buddy_print() and gpu_buddy_print() symmetric for used and
>> free
>>
>> Signed-off-by: Francois Dugast <francois.dugast@intel.com>
>> Assisted-by: GitHub Copilot:claude-sonnet-4.6
>> Reviewed-by: Matthew Auld <matthew.auld@intel.com>
>> ---
>> drivers/gpu/buddy.c | 42 ++++++++++++++++++++++++++++---------
>> drivers/gpu/drm/drm_buddy.c | 18 ++++++++++------
>> include/linux/gpu_buddy.h | 8 +++++++
>> 3 files changed, 52 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
>> index de18b63fef0a..dc81fe0301ce 100644
>> --- a/drivers/gpu/buddy.c
>> +++ b/drivers/gpu/buddy.c
>> @@ -194,6 +194,7 @@ static void mark_allocated(struct gpu_buddy *mm,
>> block->header |= GPU_BUDDY_ALLOCATED;
>> mm->free_scoreboard[gpu_buddy_block_order(block)]--;
>> + mm->used_scoreboard[gpu_buddy_block_order(block)]++;
> The main consumer of free_scoreboard and used_scoreboard is
> gpu_buddy_print(), which is a debug/diagnostic path - it only gets
> called during debugfs reads or error state dumps, not during regular
> operation.
> However, the cost of maintaining these scoreboards is paid on every
> alloc and free, which is the hot path of the allocator. Even though each
> individual update is small, do we really need to add extra work to the
> hot path for something that only benefits a rarely-called debug function?
> Am I missing a use case that justifies the hot-path overhead?
The usecase I originally saw involved sampling this frequently during a
running workload to help collect some running stats. For example to
generate a graph of the "page usage" over time. So is there lots of
smaller order usage (bad), or do we get mostly larger page orders
(good), do we see spikes we all of sudden we start getting smaller and
smaller orders, perhaps due to fragmentation? In general we just wanted
some more visibility to get a better sense of the "page usage" at
runtime. At the moment we don't have much visibility into what page
sizes we are actually getting from the allocator, over time.
We originally had this at this at the driver level, but it's not quite
as nice implementation wise, plus having it in the allocator could then
be beneficial to other drivers? Also the added cost did not seem too
bad, it's just incrementing/decrementing a counter. What do you think here?
>
> Regards,
> Arun.
>> rbtree_remove(mm, block);
>> }
>> @@ -203,6 +204,9 @@ static void mark_free(struct gpu_buddy *mm,
>> {
>> enum gpu_buddy_free_tree tree;
>> + if (gpu_buddy_block_is_allocated(block))
>> + mm->used_scoreboard[gpu_buddy_block_order(block)]--;
>> +
>> block->header &= ~GPU_BUDDY_HEADER_STATE;
>> block->header |= GPU_BUDDY_FREE;
>> @@ -281,6 +285,9 @@ static unsigned int __gpu_buddy_free(struct
>> gpu_buddy *mm,
>> if (force_merge && gpu_buddy_block_is_clear(buddy))
>> mm->clear_avail -= gpu_buddy_block_size(mm, buddy);
>> + if (gpu_buddy_block_is_allocated(block))
>> + mm->used_scoreboard[gpu_buddy_block_order(block)]--;
>> +
>> gpu_block_free(mm, block);
>> gpu_block_free(mm, buddy);
>> @@ -398,11 +405,17 @@ int gpu_buddy_init(struct gpu_buddy *mm, u64
>> size, u64 chunk_size)
>> if (!mm->free_scoreboard)
>> return -ENOMEM;
>> + mm->used_scoreboard = kcalloc(mm->max_order + 1,
>> + sizeof(*mm->used_scoreboard),
>> + GFP_KERNEL);
>> + if (!mm->used_scoreboard)
>> + goto out_free_free_scoreboard;
>> +
>> mm->free_trees = kmalloc_array(GPU_BUDDY_MAX_FREE_TREES,
>> sizeof(*mm->free_trees),
>> GFP_KERNEL);
>> if (!mm->free_trees)
>> - goto out_free_scoreboard;
>> + goto out_free_used_scoreboard;
>> for_each_free_tree(i) {
>> mm->free_trees[i] = kmalloc_array(mm->max_order + 1,
>> @@ -464,7 +477,9 @@ int gpu_buddy_init(struct gpu_buddy *mm, u64 size,
>> u64 chunk_size)
>> while (i--)
>> kfree(mm->free_trees[i]);
>> kfree(mm->free_trees);
>> -out_free_scoreboard:
>> +out_free_used_scoreboard:
>> + kfree(mm->used_scoreboard);
>> +out_free_free_scoreboard:
>> kfree(mm->free_scoreboard);
>> return -ENOMEM;
>> }
>> @@ -500,11 +515,15 @@ void gpu_buddy_fini(struct gpu_buddy *mm)
>> gpu_buddy_assert(mm->avail == mm->size);
>> + for (i = 0; i <= mm->max_order; ++i)
>> + gpu_buddy_assert(!mm->used_scoreboard[i]);
>> +
>> for_each_free_tree(i)
>> kfree(mm->free_trees[i]);
>> kfree(mm->free_trees);
>> kfree(mm->roots);
>> kfree(mm->free_scoreboard);
>> + kfree(mm->used_scoreboard);
>> }
>> EXPORT_SYMBOL(gpu_buddy_fini);
>> @@ -1505,15 +1524,18 @@ void gpu_buddy_print(struct gpu_buddy *mm)
>> mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, mm-
>> >clear_avail >> 20);
>> for (order = mm->max_order; order >= 0; order--) {
>> - u64 count = mm->free_scoreboard[order];
>> - u64 free = count * (mm->chunk_size << order);
>> -
>> - if (free < SZ_1M)
>> - pr_info("order-%2d free: %8llu KiB, blocks: %llu\n",
>> - order, free >> 10, count);
>> + u64 free_count = mm->free_scoreboard[order];
>> + u64 used_count = mm->used_scoreboard[order];
>> + u64 block_size = mm->chunk_size << order;
>> + u64 free = free_count * block_size;
>> + u64 used = used_count * block_size;
>> +
>> + if (block_size < SZ_1M)
>> + pr_info("order-%2d free: %8llu KiB, used: %8llu KiB,
>> free_blocks: %llu, used_blocks: %llu\n",
>> + order, free >> 10, used >> 10, free_count, used_count);
>> else
>> - pr_info("order-%2d free: %8llu MiB, blocks: %llu\n",
>> - order, free >> 20, count);
>> + pr_info("order-%2d free: %8llu MiB, used: %8llu MiB,
>> free_blocks: %llu, used_blocks: %llu\n",
>> + order, free >> 20, used >> 20, free_count, used_count);
>> }
>> }
>> EXPORT_SYMBOL(gpu_buddy_print);
>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
>> index eef995e08a37..1536e59c6fe7 100644
>> --- a/drivers/gpu/drm/drm_buddy.c
>> +++ b/drivers/gpu/drm/drm_buddy.c
>> @@ -47,17 +47,23 @@ void drm_buddy_print(struct gpu_buddy *mm, struct
>> drm_printer *p)
>> mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20,
>> mm->clear_avail >> 20);
>> for (order = mm->max_order; order >= 0; order--) {
>> - u64 count = mm->free_scoreboard[order];
>> - u64 free = count * (mm->chunk_size << order);
>> + u64 free_count = mm->free_scoreboard[order];
>> + u64 used_count = mm->used_scoreboard[order];
>> + u64 block_size = mm->chunk_size << order;
>> + u64 free = free_count * block_size;
>> + u64 used = used_count * block_size;
>> drm_printf(p, "order-%2d ", order);
>> - if (free < SZ_1M)
>> - drm_printf(p, "free: %8llu KiB", free >> 10);
>> + if (block_size < SZ_1M)
>> + drm_printf(p, "free: %8llu KiB, used: %8llu KiB",
>> + free >> 10, used >> 10);
>> else
>> - drm_printf(p, "free: %8llu MiB", free >> 20);
>> + drm_printf(p, "free: %8llu MiB, used: %8llu MiB",
>> + free >> 20, used >> 20);
>> - drm_printf(p, ", blocks: %llu\n", count);
>> + drm_printf(p, ", free_blocks: %llu, used_blocks: %llu\n",
>> + free_count, used_count);
>> }
>> }
>> EXPORT_SYMBOL(drm_buddy_print);
>> diff --git a/include/linux/gpu_buddy.h b/include/linux/gpu_buddy.h
>> index a28f7d7637ca..e037714563d8 100644
>> --- a/include/linux/gpu_buddy.h
>> +++ b/include/linux/gpu_buddy.h
>> @@ -180,6 +180,14 @@ struct gpu_buddy {
>> * called on a free block.
>> */
>> u64 *free_scoreboard;
>> + /*
>> + * Per-order used block scoreboard: used_scoreboard[order] holds the
>> + * number of blocks of that order currently in the allocated state.
>> + * Incremented in mark_allocated(), decremented in mark_free()
>> (guarded
>> + * by gpu_buddy_block_is_allocated()) and in __gpu_buddy_free()
>> when an
>> + * allocated block is consumed directly during buddy coalescing.
>> + */
>> + u64 *used_scoreboard;
>> /* public: */
>> unsigned int n_roots;
>> unsigned int max_order;
>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Claude review: gpu/buddy: Per-order free and used block scoreboards
2026-05-22 9:25 [PATCH v4 0/5] gpu/buddy: Per-order free and used block scoreboards Francois Dugast
` (4 preceding siblings ...)
2026-05-22 9:25 ` [PATCH v4 5/5] gpu/buddy: Track per-order used " Francois Dugast
@ 2026-05-25 9:06 ` Claude Code Review Bot
5 siblings, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 9:06 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: gpu/buddy: Per-order free and used block scoreboards
Author: Francois Dugast <francois.dugast@intel.com>
Patches: 9
Reviewed: 2026-05-25T19:06:37.067104
---
This is a well-structured 5-patch series that adds O(1) per-order free and used block counting to the gpu_buddy allocator. The series has good logical decomposition: bug fix first (patch 1), dead-code cleanup (patch 2), refactoring (patch 3), then the two feature patches (patches 4 and 5). Patches 1, 3, and 5 already carry `Reviewed-by: Matthew Auld`.
**Patch 1 is a genuine bug fix** — the existing error paths call `__gpu_buddy_free()` on a block that is still linked in an rbtree, which can corrupt the tree or cause use-after-free during coalescing. This is the most important patch in the series and should be considered for stable.
**The scoreboard approach is sound.** The incremental maintenance at the four state-transition points (mark_free, mark_allocated, mark_split, and the direct rbtree_remove sites) correctly mirrors the existing tree operations. The print functions become trivial O(1) lookups, which is a real win on large VRAM heaps.
**One concern worth raising:** the `free` vs `block_size` threshold change in patch 5 (from `free < SZ_1M` to `block_size < SZ_1M`) changes the KiB/MiB display logic — this is intentional and correct (avoids showing "0 MiB" for orders with zero blocks), but it changes debugfs output format which could break any userspace parsing.
Overall the series looks good. A few observations per patch below.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 17+ messages in thread
* Claude review: gpu/buddy: Fix use-after-free in split_block() call sites
2026-05-22 9:25 ` [PATCH v4 1/5] gpu/buddy: Fix use-after-free in split_block() call sites Francois Dugast
@ 2026-05-25 9:06 ` Claude Code Review Bot
0 siblings, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 9:06 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Correctness: Good fix, addresses a real bug.**
The existing code at all four `err_undo` paths calls `__gpu_buddy_free(mm, block, false)` without first removing `block` from the rbtree. Since a failed `split_block()` returns before `mark_split()`, the block is still FREE and linked in the tree. `__gpu_buddy_free()` will then try to coalesce with the buddy, calling `gpu_block_free(mm, block)` which frees the block's memory while its `rb` node is still linked. This is confirmed by examining the existing code at `buddy.c:737-741` where the rbtree_remove was missing.
The fix correctly adds `rbtree_remove(mm, block)` before each `__gpu_buddy_free()` call, matching the pattern already used in `__force_merge()` at line 337.
**One observation on the `alloc_from_freetree` error path:**
```c
err_undo:
if (tmp != order) {
rbtree_remove(mm, block);
__gpu_buddy_free(mm, block, false);
}
```
The `if (tmp != order)` guard: after a failed `split_block()`, `block` points to the block that failed to split, which is still at order `tmp`. Since we entered the `while (tmp != order)` loop, `tmp > order` at that point. However, if `split_block()` succeeds but something updates `block` to point to a child (at `block = block->right; tmp--;`), by the time a *subsequent* iteration of the loop fails, `block` points to the right child which is at order `tmp-1` (the decremented value). So `tmp != order` is indeed always true here, as patch 2 correctly observes.
No issues. This should probably carry a `Cc: stable@vger.kernel.org` tag and a `Fixes:` tag.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 17+ messages in thread
* Claude review: gpu/buddy: Remove redundant condition in alloc_from_freetree() error path
2026-05-22 9:25 ` [PATCH v4 2/5] gpu/buddy: Remove redundant condition in alloc_from_freetree() error path Francois Dugast
2026-05-22 10:10 ` Matthew Auld
@ 2026-05-25 9:06 ` Claude Code Review Bot
1 sibling, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 9:06 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Correctness: Correct cleanup.**
The `err_undo` label is only reachable from inside the `while (tmp != order)` loop (via `goto err_undo` at line 842). At that point `tmp` is guaranteed to differ from `order` — if they were equal, the loop wouldn't have been entered. Removing the guard is safe.
```c
err_undo:
rbtree_remove(mm, block);
__gpu_buddy_free(mm, block, false);
```
This is a straightforward simplification. No issues.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 17+ messages in thread
* Claude review: gpu/buddy: Introduce __gpu_buddy_undo_splits() helper
2026-05-22 9:25 ` [PATCH v4 3/5] gpu/buddy: Introduce __gpu_buddy_undo_splits() helper Francois Dugast
@ 2026-05-25 9:06 ` Claude Code Review Bot
0 siblings, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 9:06 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Correctness: Correct refactor, but one subtle behavioral change in `alloc_from_freetree`.**
The helper extracts the repeated pattern:
```c
static void __gpu_buddy_undo_splits(struct gpu_buddy *mm,
struct gpu_buddy_block *block)
{
struct gpu_buddy_block *buddy = __get_buddy(block);
if (buddy &&
(gpu_buddy_block_is_free(block) &&
gpu_buddy_block_is_free(buddy))) {
rbtree_remove(mm, block);
__gpu_buddy_free(mm, block, false);
}
}
```
For the three sites in `__alloc_range_bias`, `gpu_buddy_offset_aligned_allocation`, and `__alloc_range`, this is a direct extraction — those all had the `if (buddy && ...)` guard already.
**However**, in `alloc_from_freetree` (after patch 2's cleanup), the error path was:
```c
err_undo:
rbtree_remove(mm, block);
__gpu_buddy_free(mm, block, false);
```
— i.e., **unconditional** `rbtree_remove` + `__gpu_buddy_free`. Now it becomes:
```c
err_undo:
__gpu_buddy_undo_splits(mm, block);
```
which adds a **new guard** (`if buddy && both-free`). This is actually a tightening — if the buddy is not free or doesn't exist, the function now does nothing instead of unconditionally freeing.
**Is this correct?** When `split_block()` fails in `alloc_from_freetree`, the block is still FREE (split didn't complete). After the first successful split, the left child is marked free and the right child becomes `block` (via `block = block->right`). The right child is also marked free by `split_block()`. So after a failed split:
- `block` is FREE (the split that failed didn't change its state)
- `buddy` (left sibling) is FREE (created by the *previous* successful split)
So the guard `buddy && both-free` should be true in the normal case. But what about the *first* iteration failure? Then `block` is the original block from the tree — it has no buddy from splitting (the `__get_buddy` might return a pre-existing buddy). This seems safe: if there's no buddy or the buddy isn't free, we just skip the undo, and `__gpu_buddy_free` would handle the block via the `mark_free` path at the bottom anyway.
This is acceptable but the commit message could mention that `alloc_from_freetree` gains the buddy guard it previously lacked.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 17+ messages in thread
* Claude review: gpu/buddy: Track per-order free blocks with a scoreboard
2026-05-22 9:25 ` [PATCH v4 4/5] gpu/buddy: Track per-order free blocks with a scoreboard Francois Dugast
@ 2026-05-25 9:06 ` Claude Code Review Bot
0 siblings, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 9:06 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Correctness: The scoreboard maintenance looks correct.**
The increment/decrement sites are:
1. `mark_free()` — increment (block enters free state)
2. `mark_allocated()` — decrement (block leaves free state)
3. `mark_split()` — decrement (block leaves free state)
4. `__gpu_buddy_free()` coalescing loop — decrement for buddy being removed from tree
5. `__force_merge()` — decrement for block being removed from tree
6. `__gpu_buddy_undo_splits()` — decrement for block being removed from tree
These cover all the places where `rbtree_remove` is called on a free block (decrements) and where `rbtree_insert` is called on a free block (increments, via `mark_free`).
**Initialization and cleanup look correct** — `kcalloc` zeroes the array, `kfree` in both the init error path and `fini`.
**One minor note on the print threshold change:**
```c
- if (free < SZ_1M)
+ if (free < SZ_1M)
```
In this patch the threshold is still based on `free` (the total free bytes). This gets changed to `block_size` in patch 5. That's fine — incremental development.
No issues.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 17+ messages in thread
* Claude review: gpu/buddy: Track per-order used blocks with a scoreboard
2026-05-22 9:25 ` [PATCH v4 5/5] gpu/buddy: Track per-order used " Francois Dugast
2026-05-22 10:25 ` Arunpravin Paneer Selvam
@ 2026-05-25 9:06 ` Claude Code Review Bot
1 sibling, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 9:06 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Correctness: The used scoreboard maintenance is mostly correct, with one item worth double-checking.**
Increment site:
- `mark_allocated()` — block enters allocated state
Decrement sites:
- `mark_free()` — guarded by `gpu_buddy_block_is_allocated(block)`, which is correct because `mark_free` is also called on SPLIT blocks (during the coalescing path in `__gpu_buddy_free` at the bottom: `mark_free(mm, block)`), and those should not decrement the used scoreboard.
- `__gpu_buddy_free()` coalescing loop — guarded by `gpu_buddy_block_is_allocated(block)`, handling the case where an allocated block is consumed during coalescing.
```c
+ if (gpu_buddy_block_is_allocated(block))
+ mm->used_scoreboard[gpu_buddy_block_order(block)]--;
+
gpu_block_free(mm, block);
```
**Question about the coalescing path in `__gpu_buddy_free()`:** The coalescing loop at line ~277 walks up the tree merging block+buddy pairs. The `block` variable in the loop is the block being freed. On entry to `__gpu_buddy_free`, is `block` ever in the ALLOCATED state? Looking at the callers:
- `mark_free` is called *after* `__gpu_buddy_free` returns (at the bottom of the function)
- `__gpu_buddy_undo_splits` calls it on a FREE block
- `gpu_buddy_free_list_internal` calls `mark_free` then `__gpu_buddy_free` — wait, no, it calls `__gpu_buddy_free` which internally calls `mark_free` at the end.
Actually, looking more carefully at `__gpu_buddy_free`, the block enters as whatever state it is (the callers should have already ensured appropriate state), and the loop checks if `buddy` is free. The `block` at each iteration is the current node walking up the tree. On the first iteration, the block could indeed be allocated (if called from a path that hasn't marked it free yet). But looking at all callers of `__gpu_buddy_free`:
1. `__gpu_buddy_undo_splits` — block is FREE
2. `gpu_buddy_free_list_internal` via `__gpu_buddy_free` — the block state depends on the caller
Actually, re-reading `__gpu_buddy_free` more carefully: it first does a coalescing loop, then calls `mark_free(mm, block)`. The block entering the function may be in ALLOCATED state (from a direct free path). So the guard `if (gpu_buddy_block_is_allocated(block))` in the coalescing loop is correct — it handles the case where an allocated block is being freed and its buddy is also free, so they merge without going through `mark_free` first.
The `gpu_buddy_fini` assertion is a good addition:
```c
for (i = 0; i <= mm->max_order; ++i)
gpu_buddy_assert(!mm->used_scoreboard[i]);
```
**The print format change** from `free < SZ_1M` to `block_size < SZ_1M` is a sensible improvement — it makes the KiB/MiB display consistent per order rather than per count, so zero-block orders still display in the right unit. This changes output format for all consumers of `drm_buddy_print` / `gpu_buddy_print`, but that's debugfs-only so it should be fine.
**Minor style observation:** The used_scoreboard comment block in `gpu_buddy.h` is quite verbose (6 lines), which is fine for kernel code documenting a non-obvious invariant.
No blocking issues found. The series is well-constructed and the scoreboard bookkeeping appears correct across all state transitions.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2026-05-25 9:06 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-22 9:25 [PATCH v4 0/5] gpu/buddy: Per-order free and used block scoreboards Francois Dugast
2026-05-22 9:25 ` [PATCH v4 1/5] gpu/buddy: Fix use-after-free in split_block() call sites Francois Dugast
2026-05-25 9:06 ` Claude review: " Claude Code Review Bot
2026-05-22 9:25 ` [PATCH v4 2/5] gpu/buddy: Remove redundant condition in alloc_from_freetree() error path Francois Dugast
2026-05-22 10:10 ` Matthew Auld
2026-05-25 9:06 ` Claude review: " Claude Code Review Bot
2026-05-22 9:25 ` [PATCH v4 3/5] gpu/buddy: Introduce __gpu_buddy_undo_splits() helper Francois Dugast
2026-05-25 9:06 ` Claude review: " Claude Code Review Bot
2026-05-22 9:25 ` [PATCH v4 4/5] gpu/buddy: Track per-order free blocks with a scoreboard Francois Dugast
2026-05-25 9:06 ` Claude review: " Claude Code Review Bot
2026-05-22 9:25 ` [PATCH v4 5/5] gpu/buddy: Track per-order used " Francois Dugast
2026-05-22 10:25 ` Arunpravin Paneer Selvam
2026-05-22 10:56 ` Matthew Auld
2026-05-25 9:06 ` Claude review: " Claude Code Review Bot
2026-05-25 9:06 ` Claude review: gpu/buddy: Per-order free and used block scoreboards Claude Code Review Bot
-- strict thread matches above, loose matches on Subject: below --
2026-05-11 16:41 [PATCH v2 0/3] " Francois Dugast
2026-05-16 4:58 ` Claude review: " Claude Code Review Bot
2026-05-04 13:52 [PATCH 0/2] " Francois Dugast
2026-05-04 22:19 ` Claude review: " Claude Code Review Bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox