From mboxrd@z Thu Jan 1 00:00:00 1970 From: Claude Code Review Bot To: dri-devel-reviews@example.com Subject: Claude review: drm/amdkfd: simplify kgd_gfx_v9_get_cu_occupancy() Date: Tue, 28 Apr 2026 13:44:59 +1000 Message-ID: In-Reply-To: <20260428022605.447901-1-ynorov@nvidia.com> References: <20260428022605.447901-1-ynorov@nvidia.com> <20260428022605.447901-1-ynorov@nvidia.com> X-Mailer: Claude Code Patch Reviewer Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Patch Review **Critical bug: inverted bitmap logic** The original code does this: ```c bitmap_complement(cp_queue_bitmap, adev->gfx.mec_bitmap[0].queue_bitmap, AMDGPU_MAX_QUEUES); ... for (qidx =3D 0; qidx < max_queue_cnt; qidx++) { if (!test_bit(qidx, cp_queue_bitmap)) continue; if (!(queue_map & (1 << qidx))) continue; get_wave_count(adev, qidx, &cu_occupancy[qidx], inst); } ``` This iterates over bits that are **set in `~queue_bitmap` AND set in `queue= _map`**, i.e. `~queue_bitmap & queue_map`. The replacement: ```c for_each_andnot_bit(qidx, adev->gfx.mec_bitmap[0].queue_bitmap, queue_map, max_queue_cnt) ``` `for_each_andnot_bit(bit, addr1, addr2, size)` iterates bits set in `addr1 = & ~addr2` (per `find_next_andnot_bit` in `include/linux/find.h:128`: `val = =3D *addr1 & ~*addr2`). So this iterates `queue_bitmap & ~queue_map`, which= is the **opposite** set of bits. The correct fix would swap the arguments: ```c for_each_andnot_bit(qidx, queue_map, adev->gfx.mec_bitmap[0].queue_bitmap, max_queue_cnt) ``` This would give `queue_map & ~queue_bitmap`, which matches the original `~q= ueue_bitmap & queue_map`. **Minor: type mismatch for `bitmap_from_arr32`** ```c int max_queue_cnt, queue_reg; ... bitmap_from_arr32(queue_map, &queue_reg, 32); ``` `bitmap_from_arr32()` takes `const u32 *buf`. Passing `&queue_reg` where `q= ueue_reg` is `int` is technically a type mismatch. While it works in practi= ce on Linux (int is 32-bit, same representation), `queue_reg` should be dec= lared `u32` for correctness and to match the register read semantics (`RREG= 32_SOC15` returns `uint32_t`). **Nit: `DECLARE_BITMAP(queue_map, 32)` sizing** This is fine =E2=80=94 32 bits matches the hardware register width of `mmSP= I_CSQ_WF_ACTIVE_STATUS` and `max_queue_cnt` should be <=3D 32 for GFX9 (4 p= ipes * 8 queues =3D 32). However, there's no compile-time or runtime assert= ion that `max_queue_cnt <=3D 32`. If it ever exceeded 32, `for_each_andnot_= bit` would read past the end of `queue_map`. The original code used `AMDGPU= _MAX_QUEUES` (128) for `cp_queue_bitmap`, which was more defensive. A `WARN= _ON(max_queue_cnt > 32)` or a comment acknowledging this constraint would b= e prudent. **Summary**: NAK as-is due to the swapped `for_each_andnot_bit` arguments = =E2=80=94 this would cause `get_wave_count()` to be called for the compleme= nt of the intended queue set, producing wrong CU occupancy data. --- Generated by Claude Code Patch Reviewer