* Claude review: accel: ethosu: Add performance counter support
2026-03-06 16:36 [PATCH] " Rob Herring (Arm)
@ 2026-03-08 22:34 ` Claude Code Review Bot
2026-03-08 22:34 ` Claude Code Review Bot
1 sibling, 0 replies; 5+ messages in thread
From: Claude Code Review Bot @ 2026-03-08 22:34 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: accel: ethosu: Add performance counter support
Author: "Rob Herring (Arm)" <robh@kernel.org>
Patches: 1
Reviewed: 2026-03-09T08:34:13.070401
---
This is a single patch adding PMU (Performance Monitoring Unit) performance counter support to the Arm Ethos-U NPU accel driver. The design is modeled after V3D's perfmon implementation. The patch adds new ioctls for creating, destroying, querying, and setting global performance monitors, along with hardware register definitions and job integration.
Overall the patch is functional but has several issues ranging from potential bugs, UAPI concerns, and concurrency/locking problems that should be addressed before merging.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 5+ messages in thread
* Claude review: accel: ethosu: Add performance counter support
2026-03-06 16:36 [PATCH] " Rob Herring (Arm)
2026-03-08 22:34 ` Claude review: " Claude Code Review Bot
@ 2026-03-08 22:34 ` Claude Code Review Bot
1 sibling, 0 replies; 5+ messages in thread
From: Claude Code Review Bot @ 2026-03-08 22:34 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**UAPI Issues:**
1. **Breaking ABI change by repurposing `pad` field in `drm_ethosu_submit`**: The patch renames the `pad` field to `perfmon_id` and removes the validation that `pad == 0`. This is an ABI break - existing userspace that happens to pass non-zero padding would now silently trigger perfmon behavior. The existing kernel rejects `pad != 0`, so technically any value in that field from old userspace would be 0, but the removal of the check (lines 480-484 of the diff) is concerning. It would be cleaner to keep checking `perfmon_id` for validity (i.e., return error if non-zero and no perfmon found) rather than silently ignoring invalid IDs.
2. **`drm_ethosu_npu_info` struct extension without versioning**: Adding `pmu_counters` to `drm_ethosu_npu_info` changes its size. The existing `drm_ethosu_dev_query` mechanism with its `size` field should handle this gracefully, but it's worth confirming that the query path uses `min(size, actual_size)` correctly so old userspace doesn't read garbage.
3. **`drm_ethosu_perfmon_create` has no padding**: The struct has `__u32 id`, `__u32 ncounters`, `__u16 counters[8]` = 4+4+16 = 24 bytes. This is fine for alignment, but there's no reserved/flags field for future extensibility.
4. **`drm_ethosu_perfmon_destroy` is only 4 bytes**: A `__u32 id` with no padding. This is quite small and has no room for future flags. Consider adding a `__u32 pad` that must be zero.
**Concurrency and Locking Issues:**
5. **`ethosu_switch_perfmon` has redundant checks**: Lines 337-341 in the diff:
```c
if (perfmon == ethosu->active_perfmon)
return;
if (perfmon != ethosu->active_perfmon)
ethosu_perfmon_stop(ethosu, ethosu->active_perfmon, true);
```
The second `if` is always true after the first check passes - it's redundant. The function also has a third check `if (perfmon && ethosu->active_perfmon != perfmon)` which is also always true at that point. This should be simplified.
6. **`active_perfmon` and `global_perfmon` lack proper locking**: These fields on `ethosu_device` are accessed from multiple paths (ioctl handlers, job run, close_file, etc.) without consistent locking. `cmpxchg` and `xchg` are used in some places for `global_perfmon` but not for `active_perfmon`. The `ethosu_switch_perfmon` is called from `ethosu_job_run` under no device-level lock. There's a `perfmon->lock` mutex but `active_perfmon` itself is set and read without synchronization.
7. **`ethosu_perfmon_set_global` leaks refcount**: `ethosu_perfmon_find` takes a reference, but both the set and clear paths in `ethosu_ioctl_perfmon_set_global` return without calling `ethosu_perfmon_put`:
```c
perfmon = ethosu_perfmon_find(ethosu_priv, req->id); // takes ref
if (!perfmon)
return -EINVAL;
if (req->flags & DRM_ETHOSU_PERFMON_CLEAR_GLOBAL) {
...
return 0; // leaked ref
}
if (cmpxchg(ðosu->global_perfmon, NULL, perfmon))
return -EBUSY; // leaked ref
return 0; // leaked ref (though the stored pointer "owns" it conceptually?)
```
The intent may be that the `global_perfmon` pointer holds the reference, but then the clear path also needs to put the old reference, and the `-EBUSY` path is a definite leak.
8. **`ethosu_ioctl_perfmon_get_values` stops the perfmon but doesn't restart it**: Calling `ethosu_perfmon_stop` captures the counters and sets `active_perfmon = NULL`. But it never restarts counting, so a `get_values` call will stop the perfmon permanently. If the intent is to read intermediate results, this is broken. If the intent is that get_values is a one-shot terminal operation, it's not documented as such.
**Logic Bugs:**
9. **PMU register base adjustment is order-dependent and fragile**: In `ethosu_init`:
```c
ethosudev->npu_info.pmu_counters = FIELD_GET(PMCR_NUM_EVENT_CNT_MASK,
readl_relaxed(ethosudev->pmu_regs + NPU_REG_PMCR));
...
if (!ethosu_is_u65(ethosudev))
ethosudev->pmu_regs += 0x1000;
```
The PMCR is read *before* the U85 offset adjustment. This means for U85, the PMCR read uses the wrong register address. On U85 the PMU registers are offset by 0x1000, so the initial read at the U65 offset would read garbage on U85 hardware.
10. **`ncounters` semantics are confusing**: The `ncounters` in the UAPI struct is the number of event counters, but after `perfmon_create` it gets incremented to include the cycle counter. Then in `perfmon_start`/`perfmon_stop`, the code does `ncounters = perfmon->ncounters - 1` to get back to event counters. This is error-prone. The `__counted_by(ncounters)` on `values[]` means the flex array is sized for the incremented count which is correct, but the back-and-forth is fragile and should be documented clearly.
11. **`perfmon_create` modifies the input struct**: `req->ncounters++` modifies the ioctl input/output struct, which means the incremented value gets copied back to userspace. Userspace sees ncounters = N+1 rather than the N it passed in. This is surprising behavior and may confuse userspace.
12. **Missing `ethosu_perfmon_put` in `ethosu_job_cleanup`**: The `ethosu_job_err_cleanup` path calls `ethosu_perfmon_put(job->perfmon)`, but the normal `ethosu_job_cleanup` (called via kref_put) does not. This means the perfmon reference taken in `ethosu_perfmon_find` during submit is leaked for successfully completed jobs.
13. **Magic constant `0x000011`**: In `ethosu_perfmon_start`:
```c
writel_relaxed(0x000011, ethosu->pmu_regs + NPU_REG_PMCCNTR_CFG);
```
This magic value should have a define or at least a comment explaining what bits are being set.
14. **Magic constant `0x80000000` for cycle counter enable**: Used in both start and stop:
```c
mask = 0x80000000;
```
Should use `BIT(31)` for clarity.
**Minor Issues:**
15. The comment "The perform lifetime is controlled by userspace" should be "The perfmon lifetime...".
16. The SPDX identifier on `ethosu_perfmon.c` says `GPL-2.0-only or MIT` (lowercase "or") while other files in the driver use `GPL-2.0-only OR MIT`. Should be consistent with the SPDX standard (uppercase `OR`).
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2] accel: ethosu: Add performance counter support
@ 2026-05-15 3:26 Rob Herring (Arm)
2026-05-16 0:08 ` Claude review: " Claude Code Review Bot
2026-05-16 0:08 ` Claude Code Review Bot
0 siblings, 2 replies; 5+ messages in thread
From: Rob Herring (Arm) @ 2026-05-15 3:26 UTC (permalink / raw)
To: Maíra Canal, Tomeu Vizoso, Oded Gabbay, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Kees Cook, Gustavo A. R. Silva
Cc: linux-kernel, dri-devel, linux-hardening
The Arm Ethos-U NPUs have a PMU with performance counters. The PMU h/w
supports up to 4 (U65) or 8 (U85) counters which can be programmed for
different events. There is also a dedicated cycle counter.
The ABI and implementation are copied from the V3D driver. The main
difference in the ABI is there is no query API for the the event list.
The events differ between the U65 and U85, so the events lists are
maintained in userspace along with other differences between the U65 and
U85.
The cycle counter is always enabled when the PMU is enabled. When the
user requests N events, reading the counters will return the N events
plus the cycle counter.
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v2:
- Use XArray instead of idr
- Rework locking to use per device spinlock to protect modifying active
perfmon. Based on pending V3D changes:
https://lore.kernel.org/all/20260508-v3d-perfmon-lifetime-v1-1-f5b5642c085f@igalia.com/
- Add missing perfmon puts in ethosu_ioctl_perfmon_set_global() and
ethosu_ioctl_perfmon_get_values() error paths.
- Fix reading number of counters on U85.
- Add defines NPU_REG_PMCCNTR_CFG
---
drivers/accel/ethosu/Makefile | 2 +-
drivers/accel/ethosu/ethosu_device.h | 32 +++
drivers/accel/ethosu/ethosu_drv.c | 21 +-
drivers/accel/ethosu/ethosu_drv.h | 62 +++++-
drivers/accel/ethosu/ethosu_job.c | 41 +++-
drivers/accel/ethosu/ethosu_job.h | 2 +
drivers/accel/ethosu/ethosu_perfmon.c | 296 ++++++++++++++++++++++++++
include/uapi/drm/ethosu_accel.h | 59 ++++-
8 files changed, 500 insertions(+), 15 deletions(-)
create mode 100644 drivers/accel/ethosu/ethosu_perfmon.c
diff --git a/drivers/accel/ethosu/Makefile b/drivers/accel/ethosu/Makefile
index 17db5a600416..598a388b7179 100644
--- a/drivers/accel/ethosu/Makefile
+++ b/drivers/accel/ethosu/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_DRM_ACCEL_ARM_ETHOSU) := ethosu.o
-ethosu-y += ethosu_drv.o ethosu_gem.o ethosu_job.o
+ethosu-y += ethosu_drv.o ethosu_gem.o ethosu_job.o ethosu_perfmon.o
diff --git a/drivers/accel/ethosu/ethosu_device.h b/drivers/accel/ethosu/ethosu_device.h
index b189fa783d6a..92b6f2c69254 100644
--- a/drivers/accel/ethosu/ethosu_device.h
+++ b/drivers/accel/ethosu/ethosu_device.h
@@ -43,6 +43,15 @@ struct gen_pool;
#define NPU_REG_BASEP_HI(x) (0x0084 + (x) * 8)
#define NPU_BASEP_REGION_MAX 8
+#define NPU_REG_PMCR 0x0180
+#define NPU_REG_PMCNTENSET 0x0184
+#define NPU_REG_PMCNTENCLR 0x0188
+#define NPU_REG_PMCCNTR_LO 0x01A0
+#define NPU_REG_PMCCNTR_HI 0x01A4
+#define NPU_REG_PMCCNTR_CFG 0x01A8
+#define NPU_REG_PMU_EVCNTR(x) (0x0300 + (x) * 4)
+#define NPU_REG_PMU_EVTYPER(x) (0x0380 + (x) * 4)
+
#define ID_ARCH_MAJOR_MASK GENMASK(31, 28)
#define ID_ARCH_MINOR_MASK GENMASK(27, 20)
#define ID_ARCH_PATCH_MASK GENMASK(19, 16)
@@ -67,6 +76,15 @@ struct gen_pool;
#define PROT_ACTIVE_CSL BIT(1)
+#define PMCR_NUM_EVENT_CNT_MASK GENMASK(15, 11)
+#define PMCR_CYCLE_CNT_RST BIT(2)
+#define PMCR_EVENT_CNT_RST BIT(1)
+#define PMCR_CNT_EN BIT(0)
+
+#define PMU_EV_TYPE_NONE 0
+#define PMU_EV_TYPE_CYCLES 0x11
+#define PMU_EV_TYPE_IDLE 0x20
+
enum ethosu_cmds {
NPU_OP_CONV = 0x2,
NPU_OP_DEPTHWISE = 0x3,
@@ -152,6 +170,8 @@ enum ethosu_cmds {
#define ETHOSU_SRAM_REGION 2 /* Matching Vela compiler */
+struct ethosu_perfmon;
+
/**
* struct ethosu_device - Ethosu device
*/
@@ -161,6 +181,7 @@ struct ethosu_device {
/** @iomem: CPU mapping of the registers. */
void __iomem *regs;
+ void __iomem *pmu_regs;
void __iomem *sram;
struct gen_pool *srampool;
@@ -184,6 +205,17 @@ struct ethosu_device {
struct mutex sched_lock;
u64 fence_context;
u64 emit_seqno;
+
+ /* Tracks the performance monitor state. */
+ struct {
+ /* Protects @active. */
+ spinlock_t lock;
+
+ /* Perfmon currently programmed in HW (or NULL if none). */
+ struct ethosu_perfmon *active;
+ } perfmon_state;
+
+ struct ethosu_perfmon *global_perfmon;
};
#define to_ethosu_device(drm_dev) \
diff --git a/drivers/accel/ethosu/ethosu_drv.c b/drivers/accel/ethosu/ethosu_drv.c
index 9992193d7338..67fc971d7f60 100644
--- a/drivers/accel/ethosu/ethosu_drv.c
+++ b/drivers/accel/ethosu/ethosu_drv.c
@@ -155,6 +155,7 @@ static int ethosu_open(struct drm_device *ddev, struct drm_file *file)
if (ret)
goto err_put_mod;
+ ethosu_perfmon_open_file(priv);
file->driver_priv = no_free_ptr(priv);
return 0;
@@ -166,6 +167,7 @@ static int ethosu_open(struct drm_device *ddev, struct drm_file *file)
static void ethosu_postclose(struct drm_device *ddev, struct drm_file *file)
{
ethosu_job_close(file->driver_priv);
+ ethosu_perfmon_close_file(file->driver_priv);
kfree(file->driver_priv);
module_put(THIS_MODULE);
}
@@ -180,6 +182,10 @@ static const struct drm_ioctl_desc ethosu_drm_driver_ioctls[] = {
ETHOSU_IOCTL(BO_MMAP_OFFSET, bo_mmap_offset, 0),
ETHOSU_IOCTL(CMDSTREAM_BO_CREATE, cmdstream_bo_create, 0),
ETHOSU_IOCTL(SUBMIT, submit, 0),
+ ETHOSU_IOCTL(PERFMON_CREATE, perfmon_create, 0),
+ ETHOSU_IOCTL(PERFMON_DESTROY, perfmon_destroy, 0),
+ ETHOSU_IOCTL(PERFMON_GET_VALUES, perfmon_get_values, 0),
+ ETHOSU_IOCTL(PERFMON_SET_GLOBAL, perfmon_set_global, 0),
};
DEFINE_DRM_ACCEL_FOPS(ethosu_drm_driver_fops);
@@ -312,11 +318,16 @@ static int ethosu_init(struct ethosu_device *ethosudev)
ethosudev->npu_info.id = id = readl_relaxed(ethosudev->regs + NPU_REG_ID);
ethosudev->npu_info.config = config = readl_relaxed(ethosudev->regs + NPU_REG_CONFIG);
-
ethosu_sram_init(ethosudev);
+ if (!ethosu_is_u65(ethosudev))
+ ethosudev->pmu_regs += 0x1000;
+
+ ethosudev->npu_info.pmu_counters = FIELD_GET(PMCR_NUM_EVENT_CNT_MASK,
+ readl_relaxed(ethosudev->pmu_regs + NPU_REG_PMCR));
+
dev_info(ethosudev->base.dev,
- "Ethos-U NPU, arch v%ld.%ld.%ld, rev r%ldp%ld, cmd stream ver%ld, %d MACs, %dKB SRAM\n",
+ "Ethos-U NPU, arch v%ld.%ld.%ld, rev r%ldp%ld, cmd stream ver%ld, %d MACs, %dKB SRAM, %d PMU cntrs\n",
FIELD_GET(ID_ARCH_MAJOR_MASK, id),
FIELD_GET(ID_ARCH_MINOR_MASK, id),
FIELD_GET(ID_ARCH_PATCH_MASK, id),
@@ -324,7 +335,8 @@ static int ethosu_init(struct ethosu_device *ethosudev)
FIELD_GET(ID_VER_MINOR_MASK, id),
FIELD_GET(CONFIG_CMD_STREAM_VER_MASK, config),
1 << FIELD_GET(CONFIG_MACS_PER_CC_MASK, config),
- ethosudev->npu_info.sram_size / 1024);
+ ethosudev->npu_info.sram_size / 1024,
+ ethosudev->npu_info.pmu_counters);
return 0;
}
@@ -343,11 +355,14 @@ static int ethosu_probe(struct platform_device *pdev)
dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40));
ethosudev->regs = devm_platform_ioremap_resource(pdev, 0);
+ ethosudev->pmu_regs = ethosudev->regs;
ethosudev->num_clks = devm_clk_bulk_get_all(&pdev->dev, ðosudev->clks);
if (ethosudev->num_clks < 0)
return ethosudev->num_clks;
+ spin_lock_init(ðosudev->perfmon_state.lock);
+
ret = ethosu_job_init(ethosudev);
if (ret)
return ret;
diff --git a/drivers/accel/ethosu/ethosu_drv.h b/drivers/accel/ethosu/ethosu_drv.h
index 9e21dfe94184..8fed43c2a7af 100644
--- a/drivers/accel/ethosu/ethosu_drv.h
+++ b/drivers/accel/ethosu/ethosu_drv.h
@@ -1,15 +1,75 @@
/* SPDX-License-Identifier: GPL-2.0-only OR MIT */
-/* Copyright 2025 Arm, Ltd. */
+/* Copyright 2025-2026 Arm, Ltd. */
#ifndef __ETHOSU_DRV_H__
#define __ETHOSU_DRV_H__
+#include <linux/mutex.h>
+#include <linux/xarray.h>
#include <drm/gpu_scheduler.h>
struct ethosu_device;
+struct drm_device;
+struct drm_file;
struct ethosu_file_priv {
struct ethosu_device *edev;
struct drm_sched_entity sched_entity;
+ struct xarray perfmons;
};
+/* Performance monitor object. The perform lifetime is controlled by userspace
+ * using perfmon related ioctls. A perfmon can be attached to a submit_cl
+ * request, and when this is the case, HW perf counters will be activated just
+ * before the submit_cl is submitted to the GPU and disabled when the job is
+ * done. This way, only events related to a specific job will be counted.
+ */
+struct ethosu_perfmon {
+ /* Tracks the number of users of the perfmon, when this counter reaches
+ * zero the perfmon is destroyed.
+ */
+ refcount_t refcnt;
+
+ /* Number of counters activated in this perfmon instance
+ * (should be less than or equal to DRM_ETHOSU_MAX_PERF_COUNTERS).
+ */
+ u8 ncounters;
+
+ /* Events counted by the HW perf counters. */
+ u16 counters[DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS];
+
+ /*
+ * Storage for counter values. Counters are incremented by the HW
+ * perf counter values every time the perfmon is attached to a
+ * NPU job. This way, perfmon users don't have to retrieve the
+ * results after each job if they want to track events covering
+ * several submissions. Note that counter values can't be reset,
+ * but you can fake a reset by destroying the perfmon and
+ * creating a new one.
+ */
+ u64 values[] __counted_by(ncounters);
+};
+
+/* ethosu_perfmon.c */
+void ethosu_perfmon_init(struct ethosu_device *ethosu);
+void ethosu_perfmon_get(struct ethosu_perfmon *perfmon);
+void ethosu_perfmon_put(struct ethosu_perfmon *perfmon);
+void ethosu_perfmon_start(struct ethosu_device *ethosu,
+ struct ethosu_perfmon *perfmon);
+void ethosu_perfmon_stop(struct ethosu_device *ethosu,
+ struct ethosu_perfmon *perfmon, bool capture);
+void ethosu_perfmon_stop_locked(struct ethosu_device *ethosu, struct ethosu_perfmon *perfmon,
+ bool capture);
+struct ethosu_perfmon *ethosu_perfmon_find(struct ethosu_file_priv *ethosu_priv,
+ int id);
+void ethosu_perfmon_open_file(struct ethosu_file_priv *ethosu_priv);
+void ethosu_perfmon_close_file(struct ethosu_file_priv *ethosu_priv);
+int ethosu_ioctl_perfmon_create(struct drm_device *dev, void *data,
+ struct drm_file *file_priv);
+int ethosu_ioctl_perfmon_destroy(struct drm_device *dev, void *data,
+ struct drm_file *file_priv);
+int ethosu_ioctl_perfmon_get_values(struct drm_device *dev, void *data,
+ struct drm_file *file_priv);
+int ethosu_ioctl_perfmon_set_global(struct drm_device *dev, void *data,
+ struct drm_file *file_priv);
+
#endif
diff --git a/drivers/accel/ethosu/ethosu_job.c b/drivers/accel/ethosu/ethosu_job.c
index e7b07cdbcced..5712848236af 100644
--- a/drivers/accel/ethosu/ethosu_job.c
+++ b/drivers/accel/ethosu/ethosu_job.c
@@ -1,6 +1,6 @@
// SPDX-License-Identifier: GPL-2.0-only OR MIT
/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
-/* Copyright 2025 Arm, Ltd. */
+/* Copyright 2025-2026 Arm, Ltd. */
#include <linux/bitfield.h>
#include <linux/genalloc.h>
@@ -147,6 +147,8 @@ static void ethosu_job_err_cleanup(struct ethosu_job *job)
{
unsigned int i;
+ ethosu_perfmon_put(job->perfmon);
+
for (i = 0; i < job->region_cnt; i++)
drm_gem_object_put(job->region_bo[i]);
@@ -181,6 +183,26 @@ static void ethosu_job_free(struct drm_sched_job *sched_job)
ethosu_job_put(job);
}
+static void
+ethosu_switch_perfmon(struct ethosu_device *ethosu, struct ethosu_job *job)
+{
+ struct ethosu_perfmon *perfmon;
+
+ guard(spinlock)(ðosu->perfmon_state.lock);
+
+ perfmon = READ_ONCE(ethosu->global_perfmon);
+ if (!perfmon)
+ perfmon = job->perfmon;
+
+ if (perfmon == ethosu->perfmon_state.active)
+ return;
+
+ ethosu_perfmon_stop_locked(ethosu, ethosu->perfmon_state.active, true);
+
+ if (perfmon && ethosu->perfmon_state.active != perfmon)
+ ethosu_perfmon_start(ethosu, perfmon);
+}
+
static struct dma_fence *ethosu_job_run(struct drm_sched_job *sched_job)
{
struct ethosu_job *job = to_ethosu_job(sched_job);
@@ -194,6 +216,8 @@ static struct dma_fence *ethosu_job_run(struct drm_sched_job *sched_job)
dev->fence_context, ++dev->emit_seqno);
dma_fence_get(fence);
+ ethosu_switch_perfmon(dev, job);
+
scoped_guard(mutex, &dev->job_lock) {
dev->in_flight_job = job;
ethosu_job_hw_submit(dev, job);
@@ -366,7 +390,8 @@ void ethosu_job_close(struct ethosu_file_priv *ethosu_priv)
}
static int ethosu_ioctl_submit_job(struct drm_device *dev, struct drm_file *file,
- struct drm_ethosu_job *job)
+ struct drm_ethosu_job *job,
+ int perfmon_id)
{
struct ethosu_device *edev = to_ethosu_device(dev);
struct ethosu_file_priv *file_priv = file->driver_priv;
@@ -390,6 +415,9 @@ static int ethosu_ioctl_submit_job(struct drm_device *dev, struct drm_file *file
ejob->dev = edev;
ejob->sram_size = job->sram_size;
+ if (perfmon_id)
+ ejob->perfmon = ethosu_perfmon_find(file_priv, perfmon_id);
+
ejob->done_fence = kzalloc_obj(*ejob->done_fence);
if (!ejob->done_fence) {
ret = -ENOMEM;
@@ -426,7 +454,7 @@ static int ethosu_ioctl_submit_job(struct drm_device *dev, struct drm_file *file
dev_err(dev->dev,
"cmd stream region %d size greater than SRAM size (%llu > %u)\n",
- i, cmd_info->region_size[i],
+ i, cmd_info->region_size[i],
edev->npu_info.sram_size);
ret = -EINVAL;
goto out_cleanup_job;
@@ -492,11 +520,6 @@ int ethosu_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *fil
int ret = 0;
unsigned int i = 0;
- if (args->pad) {
- drm_dbg(dev, "Reserved field in drm_ethosu_submit struct should be 0.\n");
- return -EINVAL;
- }
-
struct drm_ethosu_job __free(kvfree) *jobs =
kvmalloc_objs(*jobs, args->job_count);
if (!jobs)
@@ -510,7 +533,7 @@ int ethosu_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *fil
}
for (i = 0; i < args->job_count; i++) {
- ret = ethosu_ioctl_submit_job(dev, file, &jobs[i]);
+ ret = ethosu_ioctl_submit_job(dev, file, &jobs[i], args->perfmon_id);
if (ret)
return ret;
}
diff --git a/drivers/accel/ethosu/ethosu_job.h b/drivers/accel/ethosu/ethosu_job.h
index ff1cf448d094..8988edd00eed 100644
--- a/drivers/accel/ethosu/ethosu_job.h
+++ b/drivers/accel/ethosu/ethosu_job.h
@@ -21,6 +21,8 @@ struct ethosu_job {
u8 region_cnt;
u32 sram_size;
+ struct ethosu_perfmon *perfmon;
+
/* Fence to be signaled by drm-sched once its done with the job */
struct dma_fence *inference_done_fence;
diff --git a/drivers/accel/ethosu/ethosu_perfmon.c b/drivers/accel/ethosu/ethosu_perfmon.c
new file mode 100644
index 000000000000..c9860a777e65
--- /dev/null
+++ b/drivers/accel/ethosu/ethosu_perfmon.c
@@ -0,0 +1,296 @@
+// SPDX-License-Identifier: GPL-2.0-only or MIT
+/* Copyright 2026 Arm, Ltd. */
+/* Based on v3d_perfmon.c, Copyright (C) 2021 Raspberry Pi */
+
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/pm_runtime.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include <drm/drm_file.h>
+#include <drm/drm_ioctl.h>
+
+#include <uapi/drm/ethosu_accel.h>
+
+#include "ethosu_drv.h"
+#include "ethosu_device.h"
+
+void ethosu_perfmon_get(struct ethosu_perfmon *perfmon)
+{
+ if (perfmon)
+ refcount_inc(&perfmon->refcnt);
+}
+
+void ethosu_perfmon_put(struct ethosu_perfmon *perfmon)
+{
+ if (perfmon && refcount_dec_and_test(&perfmon->refcnt)) {
+ kfree(perfmon);
+ }
+}
+
+void ethosu_perfmon_start(struct ethosu_device *ethosu, struct ethosu_perfmon *perfmon)
+{
+ unsigned int i;
+ u8 ncounters;
+ u32 mask;
+
+ if (WARN_ON_ONCE(!perfmon || ethosu->perfmon_state.active))
+ return;
+
+ writel_relaxed(PMCR_CNT_EN, ethosu->pmu_regs + NPU_REG_PMCR);
+ writel_relaxed(PMU_EV_TYPE_CYCLES, ethosu->pmu_regs + NPU_REG_PMCCNTR_CFG);
+
+ mask = 0x80000000;
+ ncounters = perfmon->ncounters - 1;
+ if (ncounters)
+ mask |= GENMASK(ncounters - 1, 0);
+
+ for (i = 0; i < ncounters; i++)
+ writel_relaxed(perfmon->counters[i], ethosu->pmu_regs + NPU_REG_PMU_EVTYPER(i));
+
+ writel_relaxed(mask, ethosu->pmu_regs + NPU_REG_PMCNTENSET);
+ writel_relaxed(PMCR_CNT_EN | PMCR_EVENT_CNT_RST | PMCR_CYCLE_CNT_RST,
+ ethosu->pmu_regs + NPU_REG_PMCR);
+ ethosu->perfmon_state.active = perfmon;
+}
+
+void ethosu_perfmon_stop_locked(struct ethosu_device *ethosu, struct ethosu_perfmon *perfmon,
+ bool capture)
+{
+ unsigned int i;
+ u8 ncounters;
+ u32 mask;
+
+ if (!perfmon || perfmon != ethosu->perfmon_state.active)
+ return;
+
+ ncounters = perfmon->ncounters - 1;
+
+ if (!pm_runtime_get_if_active(ethosu->base.dev)) {
+ ethosu->perfmon_state.active = NULL;
+ return;
+ }
+
+ if (capture) {
+ for (i = 0; i < ncounters; i++)
+ perfmon->values[i] += readl_relaxed(ethosu->pmu_regs + NPU_REG_PMU_EVCNTR(i));
+ perfmon->values[ncounters] +=
+ readl_relaxed(ethosu->pmu_regs + NPU_REG_PMCCNTR_LO) |
+ (u64)readl_relaxed(ethosu->pmu_regs + NPU_REG_PMCCNTR_HI) << 32;
+ }
+
+ mask = 0x80000000;
+ if (ncounters)
+ mask |= GENMASK(ncounters - 1, 0);
+ writel_relaxed(mask, ethosu->pmu_regs + NPU_REG_PMCNTENCLR);
+
+ writel_relaxed(0, ethosu->pmu_regs + NPU_REG_PMCR);
+ ethosu->perfmon_state.active = NULL;
+
+ pm_runtime_put(ethosu->base.dev);
+}
+
+void ethosu_perfmon_stop(struct ethosu_device *ethosu, struct ethosu_perfmon *perfmon,
+ bool capture)
+{
+ if (!perfmon)
+ return;
+
+ guard(spinlock)(ðosu->perfmon_state.lock);
+ ethosu_perfmon_stop_locked(ethosu, perfmon, capture);
+}
+
+struct ethosu_perfmon *ethosu_perfmon_find(struct ethosu_file_priv *ethosu_priv, int id)
+{
+ struct ethosu_perfmon *perfmon;
+
+ xa_lock(ðosu_priv->perfmons);
+ perfmon = xa_load(ðosu_priv->perfmons, id);
+ ethosu_perfmon_get(perfmon);
+ xa_unlock(ðosu_priv->perfmons);
+
+ return perfmon;
+}
+
+void ethosu_perfmon_open_file(struct ethosu_file_priv *ethosu_priv)
+{
+ xa_init_flags(ðosu_priv->perfmons, XA_FLAGS_ALLOC1);
+}
+
+static void ethosu_perfmon_delete(struct ethosu_file_priv *ethosu_priv,
+ struct ethosu_perfmon *perfmon)
+{
+ struct ethosu_device *ethosu = ethosu_priv->edev;
+
+ /* If the active perfmon is being destroyed, stop it first */
+ scoped_guard(spinlock_irqsave, ðosu->perfmon_state.lock) {
+ /* If the global perfmon is being destroyed, set it to NULL */
+ if (ethosu->global_perfmon == perfmon) {
+ ethosu->global_perfmon = NULL;
+ ethosu_perfmon_put(perfmon);
+ }
+
+ ethosu_perfmon_stop_locked(ethosu, perfmon, false);
+ }
+
+ ethosu_perfmon_put(perfmon);
+}
+
+void ethosu_perfmon_close_file(struct ethosu_file_priv *ethosu_priv)
+{
+ struct ethosu_perfmon *perfmon;
+ unsigned long id;
+
+ xa_for_each(ðosu_priv->perfmons, id, perfmon)
+ ethosu_perfmon_delete(ethosu_priv, perfmon);
+
+ xa_destroy(ðosu_priv->perfmons);
+}
+
+int ethosu_ioctl_perfmon_create(struct drm_device *dev, void *data,
+ struct drm_file *file_priv)
+{
+ struct ethosu_file_priv *ethosu_priv = file_priv->driver_priv;
+ struct drm_ethosu_perfmon_create *req = data;
+ struct ethosu_device *ethosu = to_ethosu_device(dev);
+ struct ethosu_perfmon *perfmon;
+ unsigned int i, event_max;
+ int ret;
+ u32 id;
+
+ /* Number of monitored counters cannot exceed HW limits. */
+ if (req->ncounters > ethosu->npu_info.pmu_counters)
+ return -EINVAL;
+
+ /* Make sure all counters are valid. */
+ event_max = ethosu_is_u65(ethosu) ? 433 : 671;
+ for (i = 0; i < req->ncounters; i++) {
+ if (req->counters[i] > event_max)
+ return -EINVAL;
+ }
+
+ /* Add 1 more counter for cycle counter */
+ req->ncounters++;
+
+ perfmon = kzalloc_flex(*perfmon, values, req->ncounters);
+ if (!perfmon)
+ return -ENOMEM;
+
+ for (i = 0; i < req->ncounters - 1; i++)
+ perfmon->counters[i] = req->counters[i];
+
+ perfmon->ncounters = req->ncounters;
+
+ refcount_set(&perfmon->refcnt, 1);
+
+ ret = xa_alloc(ðosu_priv->perfmons, &id, perfmon, xa_limit_32b,
+ GFP_KERNEL);
+
+ if (ret < 0) {
+ kfree(perfmon);
+ return ret;
+ }
+
+ req->id = id;
+
+ return 0;
+}
+
+int ethosu_ioctl_perfmon_destroy(struct drm_device *dev, void *data,
+ struct drm_file *file_priv)
+{
+ struct ethosu_file_priv *ethosu_priv = file_priv->driver_priv;
+ struct drm_ethosu_perfmon_destroy *req = data;
+ struct ethosu_perfmon *perfmon;
+
+ perfmon = xa_erase(ðosu_priv->perfmons, req->id);
+ if (!perfmon)
+ return -EINVAL;
+
+ ethosu_perfmon_delete(ethosu_priv, perfmon);
+
+ return 0;
+}
+
+int ethosu_ioctl_perfmon_get_values(struct drm_device *dev, void *data,
+ struct drm_file *file_priv)
+{
+ struct ethosu_device *ethosu = to_ethosu_device(dev);
+ struct ethosu_file_priv *ethosu_priv = file_priv->driver_priv;
+ struct drm_ethosu_perfmon_get_values *req = data;
+ struct ethosu_perfmon *perfmon;
+ int ret = 0;
+
+ if (req->pad != 0)
+ return -EINVAL;
+
+ perfmon = ethosu_perfmon_find(ethosu_priv, req->id);
+ if (!perfmon)
+ return -EINVAL;
+
+ ret = pm_runtime_resume_and_get(dev->dev);
+ if (ret) {
+ ethosu_perfmon_put(perfmon);
+ return ret;
+ }
+ ethosu_perfmon_stop(ethosu, perfmon, true);
+
+ pm_runtime_put_autosuspend(dev->dev);
+
+ if (copy_to_user(u64_to_user_ptr(req->values_ptr), perfmon->values,
+ perfmon->ncounters * sizeof(u64)))
+ ret = -EFAULT;
+
+ ethosu_perfmon_put(perfmon);
+
+ return ret;
+}
+
+int ethosu_ioctl_perfmon_set_global(struct drm_device *dev, void *data,
+ struct drm_file *file_priv)
+{
+ struct ethosu_file_priv *ethosu_priv = file_priv->driver_priv;
+ struct drm_ethosu_perfmon_set_global *req = data;
+ struct ethosu_device *ethosu = to_ethosu_device(dev);
+ struct ethosu_perfmon *perfmon;
+
+ if (req->flags & ~DRM_ETHOSU_PERFMON_CLEAR_GLOBAL)
+ return -EINVAL;
+
+ perfmon = ethosu_perfmon_find(ethosu_priv, req->id);
+ if (!perfmon)
+ return -EINVAL;
+
+ /* If the request is to clear the global performance monitor */
+ if (req->flags & DRM_ETHOSU_PERFMON_CLEAR_GLOBAL) {
+ struct ethosu_perfmon *old;
+ scoped_guard(spinlock, ðosu->perfmon_state.lock) {
+ old = ethosu->global_perfmon;
+ if (!old) {
+ ethosu_perfmon_put(perfmon);
+ return -EINVAL;
+ }
+
+ ethosu->global_perfmon = NULL;
+ ethosu_perfmon_stop_locked(ethosu, old, true);
+ }
+
+ ethosu_perfmon_put(old);
+ ethosu_perfmon_put(perfmon);
+
+ return 0;
+ }
+
+ scoped_guard(spinlock, ðosu->perfmon_state.lock) {
+ if (ethosu->perfmon_state.active || ethosu->global_perfmon) {
+ ethosu_perfmon_put(perfmon);
+ return -EBUSY;
+ }
+
+ ethosu->global_perfmon = perfmon;
+ }
+
+ return 0;
+}
diff --git a/include/uapi/drm/ethosu_accel.h b/include/uapi/drm/ethosu_accel.h
index af78bb4686d7..dde6756642ea 100644
--- a/include/uapi/drm/ethosu_accel.h
+++ b/include/uapi/drm/ethosu_accel.h
@@ -43,6 +43,11 @@ enum drm_ethosu_ioctl_id {
/** @DRM_ETHOSU_SUBMIT: Submit a job and BOs to run. */
DRM_ETHOSU_SUBMIT,
+
+ DRM_ETHOSU_PERFMON_CREATE,
+ DRM_ETHOSU_PERFMON_DESTROY,
+ DRM_ETHOSU_PERFMON_GET_VALUES,
+ DRM_ETHOSU_PERFMON_SET_GLOBAL,
};
/**
@@ -79,6 +84,7 @@ struct drm_ethosu_npu_info {
__u32 config;
__u32 sram_size;
+ __u32 pmu_counters;
};
/**
@@ -220,8 +226,51 @@ struct drm_ethosu_submit {
/** Input: Number of jobs passed in. */
__u32 job_count;
- /** Reserved, must be zero. */
+ /** Input: Id returned by DRM_ETHOSU_PERFMON_CREATE */
+ __u32 perfmon_id;
+};
+
+#define DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS 8
+#define DRM_ETHOSU_MAX_PERF_COUNTERS \
+ (DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS + 1)
+
+struct drm_ethosu_perfmon_create {
+ __u32 id;
+ __u32 ncounters;
+ __u16 counters[DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS];
+};
+
+struct drm_ethosu_perfmon_destroy {
+ __u32 id;
+};
+
+/*
+ * Returns the values of the performance counters tracked by this
+ * perfmon (as an array of (ncounters + 1) u64 values).
+ *
+ * No implicit synchronization is performed, so the user has to
+ * guarantee that any jobs using this perfmon have already been
+ * completed.
+ */
+struct drm_ethosu_perfmon_get_values {
+ __u32 id;
__u32 pad;
+ __u64 values_ptr;
+};
+
+#define DRM_ETHOSU_PERFMON_CLEAR_GLOBAL 0x0001
+
+/**
+ * struct drm_ethosu_perfmon_set_global - ioctl to define a global performance
+ * monitor
+ *
+ * The global performance monitor will be used for all jobs. If a global
+ * performance monitor is defined, jobs with a self-defined performance
+ * monitor won't be allowed.
+ */
+struct drm_ethosu_perfmon_set_global {
+ __u32 flags;
+ __u32 id;
};
/**
@@ -252,6 +301,14 @@ enum {
DRM_IOCTL_ETHOSU(WR, CMDSTREAM_BO_CREATE, cmdstream_bo_create),
DRM_IOCTL_ETHOSU_SUBMIT =
DRM_IOCTL_ETHOSU(WR, SUBMIT, submit),
+ DRM_IOCTL_ETHOSU_PERFMON_CREATE =
+ DRM_IOCTL_ETHOSU(WR, PERFMON_CREATE, perfmon_create),
+ DRM_IOCTL_ETHOSU_PERFMON_DESTROY =
+ DRM_IOCTL_ETHOSU(WR, PERFMON_DESTROY, perfmon_destroy),
+ DRM_IOCTL_ETHOSU_PERFMON_GET_VALUES =
+ DRM_IOCTL_ETHOSU(WR, PERFMON_GET_VALUES, perfmon_get_values),
+ DRM_IOCTL_ETHOSU_PERFMON_SET_GLOBAL =
+ DRM_IOCTL_ETHOSU(WR, PERFMON_SET_GLOBAL, perfmon_set_global),
};
#if defined(__cplusplus)
--
2.53.0
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Claude review: accel: ethosu: Add performance counter support
2026-05-15 3:26 [PATCH v2] accel: ethosu: Add performance counter support Rob Herring (Arm)
@ 2026-05-16 0:08 ` Claude Code Review Bot
2026-05-16 0:08 ` Claude Code Review Bot
1 sibling, 0 replies; 5+ messages in thread
From: Claude Code Review Bot @ 2026-05-16 0:08 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: accel: ethosu: Add performance counter support
Author: "Rob Herring (Arm)" <robh@kernel.org>
Patches: 1
Reviewed: 2026-05-16T10:08:39.258984
---
This is a single-patch series (v2) adding PMU performance counter support to the Arm Ethos-U NPU accelerator driver. The implementation is modeled after V3D's perfmon subsystem. The design is straightforward: userspace creates a perfmon object with desired event counters, attaches it to job submissions (or sets it as a global monitor), and reads back accumulated counter values. The code uses XArray for ID allocation, refcounting for perfmon lifetime, and a per-device spinlock for protecting active perfmon state.
Overall the patch is in reasonable shape for a v2 but has several issues that should be addressed before merging, ranging from a likely crash bug in `pmu_regs` initialization, to UAPI struct padding/alignment concerns, to locking and data-race issues.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 5+ messages in thread
* Claude review: accel: ethosu: Add performance counter support
2026-05-15 3:26 [PATCH v2] accel: ethosu: Add performance counter support Rob Herring (Arm)
2026-05-16 0:08 ` Claude review: " Claude Code Review Bot
@ 2026-05-16 0:08 ` Claude Code Review Bot
1 sibling, 0 replies; 5+ messages in thread
From: Claude Code Review Bot @ 2026-05-16 0:08 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Bug: Uninitialized `pmu_regs` offset for non-U65**
In `ethosu_drv.c`, the probe function sets:
```c
ethosudev->pmu_regs = ethosudev->regs;
```
Then in `ethosu_init()`:
```c
if (!ethosu_is_u65(ethosudev))
ethosudev->pmu_regs += 0x1000;
```
This `+= 0x1000` is applied to the `void __iomem *` pointer. However, at this point `pmu_regs` was already set to `ethosudev->regs` in `ethosu_probe()`. If `ethosu_init()` were ever called a second time (e.g. via a resume path or error-retry), the offset would be applied twice. More importantly, the `devm_platform_ioremap_resource()` mapping might not cover the extra 0x1000 offset — the resource size should be validated. Suggest using an explicit assignment `ethosudev->pmu_regs = ethosudev->regs + 0x1000;` instead of `+=` to make it idempotent and clearer.
**UAPI: `drm_ethosu_npu_info` struct packing changed without padding**
Adding `pmu_counters` (__u32) to `drm_ethosu_npu_info`:
```c
struct drm_ethosu_npu_info {
__u32 id;
__u32 config;
__u32 sram_size;
__u32 pmu_counters; /* new */
};
```
This changes the struct size from 12 to 16 bytes. Since this struct is used via `DRM_ETHOSU_DEV_QUERY` with a size-based copy mechanism (`min(size, actual_structure_size)`), this is designed to be extensible and should be fine for forward/backward compatibility. However, the original 12-byte struct had no padding for natural alignment — was 12 bytes intentional, or should there have been an explicit pad field already? This is worth verifying that existing userspace won't break, since `sizeof(struct drm_ethosu_npu_info)` changes.
**UAPI: `drm_ethosu_perfmon_create` alignment/padding issues**
```c
struct drm_ethosu_perfmon_create {
__u32 id;
__u32 ncounters;
__u16 counters[DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS]; /* 8 * 2 = 16 bytes */
};
```
Total: 4 + 4 + 16 = 24 bytes. The struct has no trailing padding issue per se, but the `__u16` array following `__u32` fields is aligned fine. However, it would be good practice to add a `__u32 pad` for future extensibility since this is ioctl-level ABI.
**UAPI: `drm_ethosu_perfmon_destroy` is only 4 bytes**
```c
struct drm_ethosu_perfmon_destroy {
__u32 id;
};
```
A 4-byte UAPI struct is unusual. Consider adding a `__u32 pad;` for future extensibility and to ensure 8-byte alignment for the ioctl struct.
**UAPI: `drm_ethosu_submit.pad` field repurposed**
The existing `pad` field (which was required to be zero) is repurposed as `perfmon_id`:
```c
- /** Reserved, must be zero. */
- __u32 pad;
+ /** Input: Id returned by DRM_ETHOSU_PERFMON_CREATE */
+ __u32 perfmon_id;
```
The validation of `pad == 0` is removed:
```c
- if (args->pad) {
- drm_dbg(dev, "Reserved field in drm_ethosu_submit struct should be 0.\n");
- return -EINVAL;
- }
```
This is the correct way to reuse a reserved field. Since `XA_FLAGS_ALLOC1` is used, valid perfmon IDs start at 1 and `perfmon_id == 0` means "no perfmon," which is backward-compatible. Good.
**Locking: `ethosu_perfmon_start` called under spinlock but not annotated**
In `ethosu_switch_perfmon()`:
```c
guard(spinlock)(ðosu->perfmon_state.lock);
...
ethosu_perfmon_stop_locked(ethosu, ethosu->perfmon_state.active, true);
if (perfmon && ethosu->perfmon_state.active != perfmon)
ethosu_perfmon_start(ethosu, perfmon);
```
`ethosu_perfmon_start()` is called under `perfmon_state.lock` (a spinlock), and it does `writel_relaxed()` which is fine from a sleeping context. But this function does MMIO writes — the spinlock ensures atomicity with respect to the stop path, which is correct. However, `ethosu_perfmon_start()` has a `WARN_ON_ONCE(!perfmon || ethosu->perfmon_state.active)` check — by the time we get there after the stop, `active` should be NULL. The second condition in the `if` check `ethosu->perfmon_state.active != perfmon` is redundant after the early return for `perfmon == active` case, since `stop_locked` sets `active = NULL`. Not a bug, just slightly confusing logic.
**Potential data race: `global_perfmon` read without lock**
In `ethosu_switch_perfmon()`:
```c
guard(spinlock)(ðosu->perfmon_state.lock);
perfmon = READ_ONCE(ethosu->global_perfmon);
```
The `READ_ONCE` is good, but `global_perfmon` is also written in `ethosu_perfmon_set_global()` under the same spinlock, and in `ethosu_perfmon_delete()` under `spinlock_irqsave`. The lock coverage is consistent, which is correct. However, the `global_perfmon` pointer is set with a refcount from `ethosu_perfmon_find()` in `set_global`, but there's no corresponding `ethosu_perfmon_get()` when it's used in `switch_perfmon`. If the global perfmon is concurrently destroyed (via `ethosu_perfmon_delete` which drops the ref), the pointer used in `switch_perfmon` could reference freed memory after the spinlock is dropped. In practice, since `switch_perfmon` only uses the pointer within the spinlock scope and `delete` also takes the spinlock, this is safe — but the design is fragile.
**`ethosu_perfmon_stop_locked`: `pm_runtime_get_if_active` returns int, not bool**
```c
if (!pm_runtime_get_if_active(ethosu->base.dev)) {
```
`pm_runtime_get_if_active()` can return negative error codes, 0 (device not active), or positive (success). The `!` check treats negative errors the same as "not active" — this is probably fine (if PM errors out, bailing is reasonable), but it's worth noting.
**`ethosu_perfmon_stop_locked`: called from spinlock but calls `pm_runtime_put`**
```c
void ethosu_perfmon_stop_locked(...)
{
...
pm_runtime_put(ethosu->base.dev);
}
```
`pm_runtime_put()` may sleep (it can trigger `runtime_suspend` callbacks synchronously). This is called under a spinlock context (from `ethosu_switch_perfmon` via `guard(spinlock)`). This is a **bug** — sleeping in atomic context. Should use `pm_runtime_put_autosuspend()` or `pm_runtime_mark_last_busy()` + `pm_runtime_put_autosuspend()` instead, or better yet, defer the pm_runtime_put to after the spinlock is released.
Similarly, `pm_runtime_get_if_active()` should also be checked for sleeping behavior, though that one is documented as safe in atomic context.
**`ethosu_ioctl_perfmon_get_values`: stopping a perfmon that may not be active**
```c
ethosu_perfmon_stop(ethosu, perfmon, true);
```
This stops the perfmon to capture current values, but doesn't restart it afterward. If this was a global perfmon that was running, it will be stopped permanently after a get_values call. The docstring says "no implicit synchronization" but the stop side-effect is implicit. This seems intentional based on the V3D model, but worth documenting.
**`ethosu_ioctl_perfmon_set_global`: no check that perfmon_id matches the perfmon being cleared**
When `DRM_ETHOSU_PERFMON_CLEAR_GLOBAL` is set:
```c
perfmon = ethosu_perfmon_find(ethosu_priv, req->id);
if (!perfmon)
return -EINVAL;
...
if (req->flags & DRM_ETHOSU_PERFMON_CLEAR_GLOBAL) {
old = ethosu->global_perfmon;
if (!old) { ... return -EINVAL; }
ethosu->global_perfmon = NULL;
```
The `req->id` is used to find a perfmon, but then the code doesn't verify that the found perfmon matches the current `global_perfmon`. Any valid perfmon ID will clear any global perfmon. The found `perfmon` is only used for the `find` validity check, then immediately put. This is potentially confusing API — either the `id` should be validated against the current global, or the clear operation shouldn't require a valid `id` at all.
**Missing `#include` for flex array / `kzalloc_flex` in ethosu_perfmon.c**
The code uses `kzalloc_flex()` but I don't see an explicit include for whatever header provides it. It likely comes transitively through `slab.h` or one of the DRM headers. Should be fine but worth verifying builds cleanly.
**`ethosu_job_err_cleanup`: perfmon_put before BO cleanup**
```c
static void ethosu_job_err_cleanup(struct ethosu_job *job)
{
ethosu_perfmon_put(job->perfmon);
for (i = 0; i < job->region_cnt; i++)
drm_gem_object_put(job->region_bo[i]);
```
This is fine — just noting that `perfmon_put` is added at the top. If `job->perfmon` is NULL (no perfmon attached), the `ethosu_perfmon_put` NULL-checks internally, which is correct.
**Missing error check for `ethosu_perfmon_find` in submit path**
```c
if (perfmon_id)
ejob->perfmon = ethosu_perfmon_find(file_priv, perfmon_id);
```
If the user passes a nonzero `perfmon_id` that doesn't exist in the xarray, `ethosu_perfmon_find` returns NULL (xa_load returns NULL for missing entries). The job proceeds with `perfmon == NULL`, silently ignoring the invalid perfmon ID. This should return `-EINVAL` if the user specified a perfmon_id but it wasn't found:
```c
if (perfmon_id) {
ejob->perfmon = ethosu_perfmon_find(file_priv, perfmon_id);
if (!ejob->perfmon)
return -EINVAL;
}
```
**`counters` array in `ethosu_perfmon` struct is fixed-size but `values` is flex**
```c
struct ethosu_perfmon {
u8 ncounters;
u16 counters[DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS];
u64 values[] __counted_by(ncounters);
};
```
`ncounters` includes the cycle counter (+1), so it can be up to 9 (DRM_ETHOSU_MAX_PERF_COUNTERS). The `counters` array is sized `DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS` (8), which stores only the user-specified event counters (not cycles). `values` is `ncounters` entries which includes cycles. This mismatch between `counters` array size (8 event types) and `values` array size (up to 9 values) is intentional but the mixed use of `ncounters` and `ncounters - 1` throughout the code is error-prone and hard to follow. Consider using a separate field for the event counter count vs total counter count.
**Minor: SPDX license tag inconsistency**
`ethosu_perfmon.c` header:
```c
// SPDX-License-Identifier: GPL-2.0-only or MIT
```
Other files use:
```c
// SPDX-License-Identifier: GPL-2.0-only OR MIT
```
The SPDX spec requires uppercase `OR`. This should be `GPL-2.0-only OR MIT`.
**Summary of issues by severity:**
1. **Bug**: `pm_runtime_put()` called under spinlock in `ethosu_perfmon_stop_locked` — sleeping in atomic context
2. **Bug**: Missing error check when `ethosu_perfmon_find` returns NULL in the submit path (invalid perfmon_id silently ignored)
3. **Issue**: `pmu_regs += 0x1000` is not idempotent — should be explicit assignment
4. **UAPI**: Several structs could benefit from pad fields for future ABI extensibility
5. **Minor**: SPDX license tag uses lowercase `or` instead of `OR`
6. **Style**: Pervasive `ncounters - 1` pattern throughout the code is fragile; a named field would be clearer
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-05-16 0:08 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-15 3:26 [PATCH v2] accel: ethosu: Add performance counter support Rob Herring (Arm)
2026-05-16 0:08 ` Claude review: " Claude Code Review Bot
2026-05-16 0:08 ` Claude Code Review Bot
-- strict thread matches above, loose matches on Subject: below --
2026-03-06 16:36 [PATCH] " Rob Herring (Arm)
2026-03-08 22:34 ` Claude review: " Claude Code Review Bot
2026-03-08 22:34 ` Claude Code Review Bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox