From mboxrd@z Thu Jan 1 00:00:00 1970 From: Claude Code Review Bot To: dri-devel-reviews@example.com Subject: Claude review: accel: ethosu: Add performance counter support Date: Sat, 16 May 2026 10:08:39 +1000 Message-ID: In-Reply-To: <20260515032625.1880618-1-robh@kernel.org> References: <20260515032625.1880618-1-robh@kernel.org> <20260515032625.1880618-1-robh@kernel.org> X-Mailer: Claude Code Patch Reviewer Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Patch Review **Bug: Uninitialized `pmu_regs` offset for non-U65** In `ethosu_drv.c`, the probe function sets: ```c ethosudev->pmu_regs =3D ethosudev->regs; ``` Then in `ethosu_init()`: ```c if (!ethosu_is_u65(ethosudev)) ethosudev->pmu_regs +=3D 0x1000; ``` This `+=3D 0x1000` is applied to the `void __iomem *` pointer. However, at = this point `pmu_regs` was already set to `ethosudev->regs` in `ethosu_probe= ()`. If `ethosu_init()` were ever called a second time (e.g. via a resume p= ath or error-retry), the offset would be applied twice. More importantly, t= he `devm_platform_ioremap_resource()` mapping might not cover the extra 0x1= 000 offset =E2=80=94 the resource size should be validated. Suggest using a= n explicit assignment `ethosudev->pmu_regs =3D ethosudev->regs + 0x1000;` i= nstead of `+=3D` to make it idempotent and clearer. **UAPI: `drm_ethosu_npu_info` struct packing changed without padding** Adding `pmu_counters` (__u32) to `drm_ethosu_npu_info`: ```c struct drm_ethosu_npu_info { __u32 id; __u32 config; __u32 sram_size; __u32 pmu_counters; /* new */ }; ``` This changes the struct size from 12 to 16 bytes. Since this struct is used= via `DRM_ETHOSU_DEV_QUERY` with a size-based copy mechanism (`min(size, ac= tual_structure_size)`), this is designed to be extensible and should be fin= e for forward/backward compatibility. However, the original 12-byte struct = had no padding for natural alignment =E2=80=94 was 12 bytes intentional, or= should there have been an explicit pad field already? This is worth verify= ing that existing userspace won't break, since `sizeof(struct drm_ethosu_np= u_info)` changes. **UAPI: `drm_ethosu_perfmon_create` alignment/padding issues** ```c struct drm_ethosu_perfmon_create { __u32 id; __u32 ncounters; __u16 counters[DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS]; /* 8 * 2 =3D 16 byt= es */ }; ``` Total: 4 + 4 + 16 =3D 24 bytes. The struct has no trailing padding issue pe= r se, but the `__u16` array following `__u32` fields is aligned fine. Howev= er, it would be good practice to add a `__u32 pad` for future extensibility= since this is ioctl-level ABI. **UAPI: `drm_ethosu_perfmon_destroy` is only 4 bytes** ```c struct drm_ethosu_perfmon_destroy { __u32 id; }; ``` A 4-byte UAPI struct is unusual. Consider adding a `__u32 pad;` for future = extensibility and to ensure 8-byte alignment for the ioctl struct. **UAPI: `drm_ethosu_submit.pad` field repurposed** The existing `pad` field (which was required to be zero) is repurposed as `= perfmon_id`: ```c - /** Reserved, must be zero. */ - __u32 pad; + /** Input: Id returned by DRM_ETHOSU_PERFMON_CREATE */ + __u32 perfmon_id; ``` The validation of `pad =3D=3D 0` is removed: ```c - if (args->pad) { - drm_dbg(dev, "Reserved field in drm_ethosu_submit struct should be 0.\n"= ); - return -EINVAL; - } ``` This is the correct way to reuse a reserved field. Since `XA_FLAGS_ALLOC1` = is used, valid perfmon IDs start at 1 and `perfmon_id =3D=3D 0` means "no p= erfmon," which is backward-compatible. Good. **Locking: `ethosu_perfmon_start` called under spinlock but not annotated** In `ethosu_switch_perfmon()`: ```c guard(spinlock)(ðosu->perfmon_state.lock); ... ethosu_perfmon_stop_locked(ethosu, ethosu->perfmon_state.active, true); if (perfmon && ethosu->perfmon_state.active !=3D perfmon) ethosu_perfmon_start(ethosu, perfmon); ``` `ethosu_perfmon_start()` is called under `perfmon_state.lock` (a spinlock),= and it does `writel_relaxed()` which is fine from a sleeping context. But = this function does MMIO writes =E2=80=94 the spinlock ensures atomicity wit= h respect to the stop path, which is correct. However, `ethosu_perfmon_star= t()` has a `WARN_ON_ONCE(!perfmon || ethosu->perfmon_state.active)` check = =E2=80=94 by the time we get there after the stop, `active` should be NULL.= The second condition in the `if` check `ethosu->perfmon_state.active !=3D = perfmon` is redundant after the early return for `perfmon =3D=3D active` ca= se, since `stop_locked` sets `active =3D NULL`. Not a bug, just slightly co= nfusing logic. **Potential data race: `global_perfmon` read without lock** In `ethosu_switch_perfmon()`: ```c guard(spinlock)(ðosu->perfmon_state.lock); perfmon =3D READ_ONCE(ethosu->global_perfmon); ``` The `READ_ONCE` is good, but `global_perfmon` is also written in `ethosu_pe= rfmon_set_global()` under the same spinlock, and in `ethosu_perfmon_delete(= )` under `spinlock_irqsave`. The lock coverage is consistent, which is corr= ect. However, the `global_perfmon` pointer is set with a refcount from `eth= osu_perfmon_find()` in `set_global`, but there's no corresponding `ethosu_p= erfmon_get()` when it's used in `switch_perfmon`. If the global perfmon is = concurrently destroyed (via `ethosu_perfmon_delete` which drops the ref), t= he pointer used in `switch_perfmon` could reference freed memory after the = spinlock is dropped. In practice, since `switch_perfmon` only uses the poin= ter within the spinlock scope and `delete` also takes the spinlock, this is= safe =E2=80=94 but the design is fragile. **`ethosu_perfmon_stop_locked`: `pm_runtime_get_if_active` returns int, not= bool** ```c if (!pm_runtime_get_if_active(ethosu->base.dev)) { ``` `pm_runtime_get_if_active()` can return negative error codes, 0 (device not= active), or positive (success). The `!` check treats negative errors the s= ame as "not active" =E2=80=94 this is probably fine (if PM errors out, bail= ing is reasonable), but it's worth noting. **`ethosu_perfmon_stop_locked`: called from spinlock but calls `pm_runtime_= put`** ```c void ethosu_perfmon_stop_locked(...) { ... pm_runtime_put(ethosu->base.dev); } ``` `pm_runtime_put()` may sleep (it can trigger `runtime_suspend` callbacks sy= nchronously). This is called under a spinlock context (from `ethosu_switch_= perfmon` via `guard(spinlock)`). This is a **bug** =E2=80=94 sleeping in at= omic context. Should use `pm_runtime_put_autosuspend()` or `pm_runtime_mark= _last_busy()` + `pm_runtime_put_autosuspend()` instead, or better yet, defe= r the pm_runtime_put to after the spinlock is released. Similarly, `pm_runtime_get_if_active()` should also be checked for sleeping= behavior, though that one is documented as safe in atomic context. **`ethosu_ioctl_perfmon_get_values`: stopping a perfmon that may not be act= ive** ```c ethosu_perfmon_stop(ethosu, perfmon, true); ``` This stops the perfmon to capture current values, but doesn't restart it af= terward. If this was a global perfmon that was running, it will be stopped = permanently after a get_values call. The docstring says "no implicit synchr= onization" but the stop side-effect is implicit. This seems intentional bas= ed on the V3D model, but worth documenting. **`ethosu_ioctl_perfmon_set_global`: no check that perfmon_id matches the p= erfmon being cleared** When `DRM_ETHOSU_PERFMON_CLEAR_GLOBAL` is set: ```c perfmon =3D ethosu_perfmon_find(ethosu_priv, req->id); if (!perfmon) return -EINVAL; ... if (req->flags & DRM_ETHOSU_PERFMON_CLEAR_GLOBAL) { old =3D ethosu->global_perfmon; if (!old) { ... return -EINVAL; } ethosu->global_perfmon =3D NULL; ``` The `req->id` is used to find a perfmon, but then the code doesn't verify t= hat the found perfmon matches the current `global_perfmon`. Any valid perfm= on ID will clear any global perfmon. The found `perfmon` is only used for t= he `find` validity check, then immediately put. This is potentially confusi= ng API =E2=80=94 either the `id` should be validated against the current gl= obal, or the clear operation shouldn't require a valid `id` at all. **Missing `#include` for flex array / `kzalloc_flex` in ethosu_perfmon.c** The code uses `kzalloc_flex()` but I don't see an explicit include for what= ever header provides it. It likely comes transitively through `slab.h` or o= ne of the DRM headers. Should be fine but worth verifying builds cleanly. **`ethosu_job_err_cleanup`: perfmon_put before BO cleanup** ```c static void ethosu_job_err_cleanup(struct ethosu_job *job) { ethosu_perfmon_put(job->perfmon); for (i =3D 0; i < job->region_cnt; i++) drm_gem_object_put(job->region_bo[i]); ``` This is fine =E2=80=94 just noting that `perfmon_put` is added at the top. = If `job->perfmon` is NULL (no perfmon attached), the `ethosu_perfmon_put` N= ULL-checks internally, which is correct. **Missing error check for `ethosu_perfmon_find` in submit path** ```c if (perfmon_id) ejob->perfmon =3D ethosu_perfmon_find(file_priv, perfmon_id); ``` If the user passes a nonzero `perfmon_id` that doesn't exist in the xarray,= `ethosu_perfmon_find` returns NULL (xa_load returns NULL for missing entri= es). The job proceeds with `perfmon =3D=3D NULL`, silently ignoring the inv= alid perfmon ID. This should return `-EINVAL` if the user specified a perfm= on_id but it wasn't found: ```c if (perfmon_id) { ejob->perfmon =3D ethosu_perfmon_find(file_priv, perfmon_id); if (!ejob->perfmon) return -EINVAL; } ``` **`counters` array in `ethosu_perfmon` struct is fixed-size but `values` is= flex** ```c struct ethosu_perfmon { u8 ncounters; u16 counters[DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS]; u64 values[] __counted_by(ncounters); }; ``` `ncounters` includes the cycle counter (+1), so it can be up to 9 (DRM_ETHO= SU_MAX_PERF_COUNTERS). The `counters` array is sized `DRM_ETHOSU_MAX_PERF_E= VENT_COUNTERS` (8), which stores only the user-specified event counters (no= t cycles). `values` is `ncounters` entries which includes cycles. This mism= atch between `counters` array size (8 event types) and `values` array size = (up to 9 values) is intentional but the mixed use of `ncounters` and `ncoun= ters - 1` throughout the code is error-prone and hard to follow. Consider u= sing a separate field for the event counter count vs total counter count. **Minor: SPDX license tag inconsistency** `ethosu_perfmon.c` header: ```c // SPDX-License-Identifier: GPL-2.0-only or MIT ``` Other files use: ```c // SPDX-License-Identifier: GPL-2.0-only OR MIT ``` The SPDX spec requires uppercase `OR`. This should be `GPL-2.0-only OR MIT`. **Summary of issues by severity:** 1. **Bug**: `pm_runtime_put()` called under spinlock in `ethosu_perfmon_sto= p_locked` =E2=80=94 sleeping in atomic context 2. **Bug**: Missing error check when `ethosu_perfmon_find` returns NULL in = the submit path (invalid perfmon_id silently ignored) 3. **Issue**: `pmu_regs +=3D 0x1000` is not idempotent =E2=80=94 should be = explicit assignment 4. **UAPI**: Several structs could benefit from pad fields for future ABI e= xtensibility 5. **Minor**: SPDX license tag uses lowercase `or` instead of `OR` 6. **Style**: Pervasive `ncounters - 1` pattern throughout the code is frag= ile; a named field would be clearer --- Generated by Claude Code Patch Reviewer