* [PATCH] drm/amdkfd: simplify kgd_gfx_v9_get_cu_occupancy()
@ 2026-04-28 2:26 Yury Norov
2026-04-28 3:44 ` Claude review: " Claude Code Review Bot
2026-04-28 3:44 ` Claude Code Review Bot
0 siblings, 2 replies; 3+ messages in thread
From: Yury Norov @ 2026-04-28 2:26 UTC (permalink / raw)
To: Felix Kuehling, Alex Deucher, Christian König, David Airlie,
Simona Vetter, amd-gfx, dri-devel, linux-kernel
Cc: Yury Norov
Switch inner loop in kgd_gfx_v9_get_cu_occupancy() to the dedicated
for_each(), and drop most of housekeeping code.
Signed-off-by: Yury Norov <ynorov@nvidia.com>
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 32 +++++++------------
1 file changed, 11 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 2e116c06d5be..08f4c4bf20c3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1026,9 +1026,8 @@ void kgd_gfx_v9_get_cu_occupancy(struct amdgpu_device *adev,
int qidx;
int se_idx;
int se_cnt;
- int queue_map;
- int max_queue_cnt;
- DECLARE_BITMAP(cp_queue_bitmap, AMDGPU_MAX_QUEUES);
+ DECLARE_BITMAP(queue_map, 32);
+ int max_queue_cnt, queue_reg;
lock_spi_csq_mutexes(adev);
soc15_grbm_select(adev, 1, 0, 0, 0, GET_INST(GC, inst));
@@ -1037,35 +1036,26 @@ void kgd_gfx_v9_get_cu_occupancy(struct amdgpu_device *adev,
* Iterate through the shader engines and arrays of the device
* to get number of waves in flight
*/
- bitmap_complement(cp_queue_bitmap, adev->gfx.mec_bitmap[0].queue_bitmap,
- AMDGPU_MAX_QUEUES);
max_queue_cnt = adev->gfx.mec.num_pipe_per_mec *
adev->gfx.mec.num_queue_per_pipe;
se_cnt = adev->gfx.config.max_shader_engines;
for (se_idx = 0; se_idx < se_cnt; se_idx++) {
amdgpu_gfx_select_se_sh(adev, se_idx, 0, 0xffffffff, inst);
- queue_map = RREG32_SOC15(GC, GET_INST(GC, inst), mmSPI_CSQ_WF_ACTIVE_STATUS);
+ queue_reg = RREG32_SOC15(GC, GET_INST(GC, inst), mmSPI_CSQ_WF_ACTIVE_STATUS);
+ bitmap_from_arr32(queue_map, &queue_reg, 32);
/*
* Assumption: queue map encodes following schema: four
* pipes per each micro-engine, with each pipe mapping
* eight queues. This schema is true for GFX9 devices
- * and must be verified for newer device families
+ * and must be verified for newer device families.
+ *
+ * Get number of waves in flight and aggregate them, and skip
+ * queues that are not associated with compute functions.
*/
- for (qidx = 0; qidx < max_queue_cnt; qidx++) {
- /* Skip qeueus that are not associated with
- * compute functions
- */
- if (!test_bit(qidx, cp_queue_bitmap))
- continue;
-
- if (!(queue_map & (1 << qidx)))
- continue;
-
- /* Get number of waves in flight and aggregate them */
- get_wave_count(adev, qidx, &cu_occupancy[qidx],
- inst);
- }
+ for_each_andnot_bit(qidx, adev->gfx.mec_bitmap[0].queue_bitmap,
+ queue_map, max_queue_cnt)
+ get_wave_count(adev, qidx, &cu_occupancy[qidx], inst);
}
amdgpu_gfx_select_se_sh(adev, 0xffffffff, 0xffffffff, 0xffffffff, inst);
--
2.51.0
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Claude review: drm/amdkfd: simplify kgd_gfx_v9_get_cu_occupancy()
2026-04-28 2:26 [PATCH] drm/amdkfd: simplify kgd_gfx_v9_get_cu_occupancy() Yury Norov
@ 2026-04-28 3:44 ` Claude Code Review Bot
2026-04-28 3:44 ` Claude Code Review Bot
1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-04-28 3:44 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: drm/amdkfd: simplify kgd_gfx_v9_get_cu_occupancy()
Author: Yury Norov <ynorov@nvidia.com>
Patches: 1
Reviewed: 2026-04-28T13:44:59.288137
---
This is a single patch that attempts to simplify `kgd_gfx_v9_get_cu_occupancy()` by replacing a manual bitmap-complement-then-loop pattern with the kernel's `for_each_andnot_bit()` iterator. The intent is good and the cleanup would be welcome, but **the arguments to `for_each_andnot_bit()` are swapped**, completely inverting which queues are processed. This is a correctness bug that would cause the function to report wave occupancy for the wrong set of queues.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 3+ messages in thread
* Claude review: drm/amdkfd: simplify kgd_gfx_v9_get_cu_occupancy()
2026-04-28 2:26 [PATCH] drm/amdkfd: simplify kgd_gfx_v9_get_cu_occupancy() Yury Norov
2026-04-28 3:44 ` Claude review: " Claude Code Review Bot
@ 2026-04-28 3:44 ` Claude Code Review Bot
1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-04-28 3:44 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Critical bug: inverted bitmap logic**
The original code does this:
```c
bitmap_complement(cp_queue_bitmap, adev->gfx.mec_bitmap[0].queue_bitmap,
AMDGPU_MAX_QUEUES);
...
for (qidx = 0; qidx < max_queue_cnt; qidx++) {
if (!test_bit(qidx, cp_queue_bitmap))
continue;
if (!(queue_map & (1 << qidx)))
continue;
get_wave_count(adev, qidx, &cu_occupancy[qidx], inst);
}
```
This iterates over bits that are **set in `~queue_bitmap` AND set in `queue_map`**, i.e. `~queue_bitmap & queue_map`.
The replacement:
```c
for_each_andnot_bit(qidx, adev->gfx.mec_bitmap[0].queue_bitmap,
queue_map, max_queue_cnt)
```
`for_each_andnot_bit(bit, addr1, addr2, size)` iterates bits set in `addr1 & ~addr2` (per `find_next_andnot_bit` in `include/linux/find.h:128`: `val = *addr1 & ~*addr2`). So this iterates `queue_bitmap & ~queue_map`, which is the **opposite** set of bits.
The correct fix would swap the arguments:
```c
for_each_andnot_bit(qidx, queue_map,
adev->gfx.mec_bitmap[0].queue_bitmap, max_queue_cnt)
```
This would give `queue_map & ~queue_bitmap`, which matches the original `~queue_bitmap & queue_map`.
**Minor: type mismatch for `bitmap_from_arr32`**
```c
int max_queue_cnt, queue_reg;
...
bitmap_from_arr32(queue_map, &queue_reg, 32);
```
`bitmap_from_arr32()` takes `const u32 *buf`. Passing `&queue_reg` where `queue_reg` is `int` is technically a type mismatch. While it works in practice on Linux (int is 32-bit, same representation), `queue_reg` should be declared `u32` for correctness and to match the register read semantics (`RREG32_SOC15` returns `uint32_t`).
**Nit: `DECLARE_BITMAP(queue_map, 32)` sizing**
This is fine — 32 bits matches the hardware register width of `mmSPI_CSQ_WF_ACTIVE_STATUS` and `max_queue_cnt` should be <= 32 for GFX9 (4 pipes * 8 queues = 32). However, there's no compile-time or runtime assertion that `max_queue_cnt <= 32`. If it ever exceeded 32, `for_each_andnot_bit` would read past the end of `queue_map`. The original code used `AMDGPU_MAX_QUEUES` (128) for `cp_queue_bitmap`, which was more defensive. A `WARN_ON(max_queue_cnt > 32)` or a comment acknowledging this constraint would be prudent.
**Summary**: NAK as-is due to the swapped `for_each_andnot_bit` arguments — this would cause `get_wave_count()` to be called for the complement of the intended queue set, producing wrong CU occupancy data.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2026-04-28 3:44 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-28 2:26 [PATCH] drm/amdkfd: simplify kgd_gfx_v9_get_cu_occupancy() Yury Norov
2026-04-28 3:44 ` Claude review: " Claude Code Review Bot
2026-04-28 3:44 ` Claude Code Review Bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox