public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH v2] accel: ethosu: Add performance counter support
@ 2026-05-15  3:26 Rob Herring (Arm)
  2026-05-16  0:08 ` Claude review: " Claude Code Review Bot
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Rob Herring (Arm) @ 2026-05-15  3:26 UTC (permalink / raw)
  To: Maíra Canal, Tomeu Vizoso, Oded Gabbay, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Kees Cook, Gustavo A. R. Silva
  Cc: linux-kernel, dri-devel, linux-hardening

The Arm Ethos-U NPUs have a PMU with performance counters. The PMU h/w
supports up to 4 (U65) or 8 (U85) counters which can be programmed for
different events. There is also a dedicated cycle counter.

The ABI and implementation are copied from the V3D driver. The main
difference in the ABI is there is no query API for the the event list.
The events differ between the U65 and U85, so the events lists are
maintained in userspace along with other differences between the U65 and
U85.

The cycle counter is always enabled when the PMU is enabled. When the
user requests N events, reading the counters will return the N events
plus the cycle counter.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v2:
 - Use XArray instead of idr
 - Rework locking to use per device spinlock to protect modifying active
   perfmon. Based on pending V3D changes:
   https://lore.kernel.org/all/20260508-v3d-perfmon-lifetime-v1-1-f5b5642c085f@igalia.com/
 - Add missing perfmon puts in ethosu_ioctl_perfmon_set_global() and
   ethosu_ioctl_perfmon_get_values() error paths.
 - Fix reading number of counters on U85.
 - Add defines NPU_REG_PMCCNTR_CFG
---
 drivers/accel/ethosu/Makefile         |   2 +-
 drivers/accel/ethosu/ethosu_device.h  |  32 +++
 drivers/accel/ethosu/ethosu_drv.c     |  21 +-
 drivers/accel/ethosu/ethosu_drv.h     |  62 +++++-
 drivers/accel/ethosu/ethosu_job.c     |  41 +++-
 drivers/accel/ethosu/ethosu_job.h     |   2 +
 drivers/accel/ethosu/ethosu_perfmon.c | 296 ++++++++++++++++++++++++++
 include/uapi/drm/ethosu_accel.h       |  59 ++++-
 8 files changed, 500 insertions(+), 15 deletions(-)
 create mode 100644 drivers/accel/ethosu/ethosu_perfmon.c

diff --git a/drivers/accel/ethosu/Makefile b/drivers/accel/ethosu/Makefile
index 17db5a600416..598a388b7179 100644
--- a/drivers/accel/ethosu/Makefile
+++ b/drivers/accel/ethosu/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 obj-$(CONFIG_DRM_ACCEL_ARM_ETHOSU) := ethosu.o
-ethosu-y += ethosu_drv.o ethosu_gem.o ethosu_job.o
+ethosu-y += ethosu_drv.o ethosu_gem.o ethosu_job.o ethosu_perfmon.o
diff --git a/drivers/accel/ethosu/ethosu_device.h b/drivers/accel/ethosu/ethosu_device.h
index b189fa783d6a..92b6f2c69254 100644
--- a/drivers/accel/ethosu/ethosu_device.h
+++ b/drivers/accel/ethosu/ethosu_device.h
@@ -43,6 +43,15 @@ struct gen_pool;
 #define NPU_REG_BASEP_HI(x)	(0x0084 + (x) * 8)
 #define NPU_BASEP_REGION_MAX	8
 
+#define NPU_REG_PMCR		0x0180
+#define NPU_REG_PMCNTENSET	0x0184
+#define NPU_REG_PMCNTENCLR	0x0188
+#define NPU_REG_PMCCNTR_LO	0x01A0
+#define NPU_REG_PMCCNTR_HI	0x01A4
+#define NPU_REG_PMCCNTR_CFG	0x01A8
+#define NPU_REG_PMU_EVCNTR(x)	(0x0300 + (x) * 4)
+#define NPU_REG_PMU_EVTYPER(x)	(0x0380 + (x) * 4)
+
 #define ID_ARCH_MAJOR_MASK	GENMASK(31, 28)
 #define ID_ARCH_MINOR_MASK	GENMASK(27, 20)
 #define ID_ARCH_PATCH_MASK	GENMASK(19, 16)
@@ -67,6 +76,15 @@ struct gen_pool;
 
 #define PROT_ACTIVE_CSL		BIT(1)
 
+#define PMCR_NUM_EVENT_CNT_MASK	GENMASK(15, 11)
+#define PMCR_CYCLE_CNT_RST	BIT(2)
+#define PMCR_EVENT_CNT_RST	BIT(1)
+#define PMCR_CNT_EN		BIT(0)
+
+#define PMU_EV_TYPE_NONE	0
+#define PMU_EV_TYPE_CYCLES	0x11
+#define PMU_EV_TYPE_IDLE	0x20
+
 enum ethosu_cmds {
 	NPU_OP_CONV = 0x2,
 	NPU_OP_DEPTHWISE = 0x3,
@@ -152,6 +170,8 @@ enum ethosu_cmds {
 
 #define ETHOSU_SRAM_REGION	2	/* Matching Vela compiler */
 
+struct ethosu_perfmon;
+
 /**
  * struct ethosu_device - Ethosu device
  */
@@ -161,6 +181,7 @@ struct ethosu_device {
 
 	/** @iomem: CPU mapping of the registers. */
 	void __iomem *regs;
+	void __iomem *pmu_regs;
 
 	void __iomem *sram;
 	struct gen_pool *srampool;
@@ -184,6 +205,17 @@ struct ethosu_device {
 	struct mutex sched_lock;
 	u64 fence_context;
 	u64 emit_seqno;
+
+	/* Tracks the performance monitor state. */
+	struct {
+		/* Protects @active. */
+		spinlock_t lock;
+
+		/* Perfmon currently programmed in HW (or NULL if none). */
+		struct ethosu_perfmon *active;
+	} perfmon_state;
+
+	struct ethosu_perfmon *global_perfmon;
 };
 
 #define to_ethosu_device(drm_dev) \
diff --git a/drivers/accel/ethosu/ethosu_drv.c b/drivers/accel/ethosu/ethosu_drv.c
index 9992193d7338..67fc971d7f60 100644
--- a/drivers/accel/ethosu/ethosu_drv.c
+++ b/drivers/accel/ethosu/ethosu_drv.c
@@ -155,6 +155,7 @@ static int ethosu_open(struct drm_device *ddev, struct drm_file *file)
 	if (ret)
 		goto err_put_mod;
 
+	ethosu_perfmon_open_file(priv);
 	file->driver_priv = no_free_ptr(priv);
 	return 0;
 
@@ -166,6 +167,7 @@ static int ethosu_open(struct drm_device *ddev, struct drm_file *file)
 static void ethosu_postclose(struct drm_device *ddev, struct drm_file *file)
 {
 	ethosu_job_close(file->driver_priv);
+	ethosu_perfmon_close_file(file->driver_priv);
 	kfree(file->driver_priv);
 	module_put(THIS_MODULE);
 }
@@ -180,6 +182,10 @@ static const struct drm_ioctl_desc ethosu_drm_driver_ioctls[] = {
 	ETHOSU_IOCTL(BO_MMAP_OFFSET, bo_mmap_offset, 0),
 	ETHOSU_IOCTL(CMDSTREAM_BO_CREATE, cmdstream_bo_create, 0),
 	ETHOSU_IOCTL(SUBMIT, submit, 0),
+	ETHOSU_IOCTL(PERFMON_CREATE, perfmon_create, 0),
+	ETHOSU_IOCTL(PERFMON_DESTROY, perfmon_destroy, 0),
+	ETHOSU_IOCTL(PERFMON_GET_VALUES, perfmon_get_values, 0),
+	ETHOSU_IOCTL(PERFMON_SET_GLOBAL, perfmon_set_global, 0),
 };
 
 DEFINE_DRM_ACCEL_FOPS(ethosu_drm_driver_fops);
@@ -312,11 +318,16 @@ static int ethosu_init(struct ethosu_device *ethosudev)
 
 	ethosudev->npu_info.id = id = readl_relaxed(ethosudev->regs + NPU_REG_ID);
 	ethosudev->npu_info.config = config = readl_relaxed(ethosudev->regs + NPU_REG_CONFIG);
-
 	ethosu_sram_init(ethosudev);
 
+	if (!ethosu_is_u65(ethosudev))
+		ethosudev->pmu_regs += 0x1000;
+
+	ethosudev->npu_info.pmu_counters = FIELD_GET(PMCR_NUM_EVENT_CNT_MASK,
+		readl_relaxed(ethosudev->pmu_regs + NPU_REG_PMCR));
+
 	dev_info(ethosudev->base.dev,
-		 "Ethos-U NPU, arch v%ld.%ld.%ld, rev r%ldp%ld, cmd stream ver%ld, %d MACs, %dKB SRAM\n",
+		 "Ethos-U NPU, arch v%ld.%ld.%ld, rev r%ldp%ld, cmd stream ver%ld, %d MACs, %dKB SRAM, %d PMU cntrs\n",
 		 FIELD_GET(ID_ARCH_MAJOR_MASK, id),
 		 FIELD_GET(ID_ARCH_MINOR_MASK, id),
 		 FIELD_GET(ID_ARCH_PATCH_MASK, id),
@@ -324,7 +335,8 @@ static int ethosu_init(struct ethosu_device *ethosudev)
 		 FIELD_GET(ID_VER_MINOR_MASK, id),
 		 FIELD_GET(CONFIG_CMD_STREAM_VER_MASK, config),
 		 1 << FIELD_GET(CONFIG_MACS_PER_CC_MASK, config),
-		 ethosudev->npu_info.sram_size / 1024);
+		 ethosudev->npu_info.sram_size / 1024,
+		 ethosudev->npu_info.pmu_counters);
 
 	return 0;
 }
@@ -343,11 +355,14 @@ static int ethosu_probe(struct platform_device *pdev)
 	dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40));
 
 	ethosudev->regs = devm_platform_ioremap_resource(pdev, 0);
+	ethosudev->pmu_regs = ethosudev->regs;
 
 	ethosudev->num_clks = devm_clk_bulk_get_all(&pdev->dev, &ethosudev->clks);
 	if (ethosudev->num_clks < 0)
 		return ethosudev->num_clks;
 
+	spin_lock_init(&ethosudev->perfmon_state.lock);
+
 	ret = ethosu_job_init(ethosudev);
 	if (ret)
 		return ret;
diff --git a/drivers/accel/ethosu/ethosu_drv.h b/drivers/accel/ethosu/ethosu_drv.h
index 9e21dfe94184..8fed43c2a7af 100644
--- a/drivers/accel/ethosu/ethosu_drv.h
+++ b/drivers/accel/ethosu/ethosu_drv.h
@@ -1,15 +1,75 @@
 /* SPDX-License-Identifier: GPL-2.0-only OR MIT */
-/* Copyright 2025 Arm, Ltd. */
+/* Copyright 2025-2026 Arm, Ltd. */
 #ifndef __ETHOSU_DRV_H__
 #define __ETHOSU_DRV_H__
 
+#include <linux/mutex.h>
+#include <linux/xarray.h>
 #include <drm/gpu_scheduler.h>
 
 struct ethosu_device;
+struct drm_device;
+struct drm_file;
 
 struct ethosu_file_priv {
 	struct ethosu_device *edev;
 	struct drm_sched_entity sched_entity;
+	struct xarray perfmons;
 };
 
+/* Performance monitor object. The perform lifetime is controlled by userspace
+ * using perfmon related ioctls. A perfmon can be attached to a submit_cl
+ * request, and when this is the case, HW perf counters will be activated just
+ * before the submit_cl is submitted to the GPU and disabled when the job is
+ * done. This way, only events related to a specific job will be counted.
+ */
+struct ethosu_perfmon {
+	/* Tracks the number of users of the perfmon, when this counter reaches
+	 * zero the perfmon is destroyed.
+	 */
+	refcount_t refcnt;
+
+	/* Number of counters activated in this perfmon instance
+	 * (should be less than or equal to DRM_ETHOSU_MAX_PERF_COUNTERS).
+	 */
+	u8 ncounters;
+
+	/* Events counted by the HW perf counters. */
+	u16 counters[DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS];
+
+	/*
+	 * Storage for counter values. Counters are incremented by the HW
+	 * perf counter values every time the perfmon is attached to a
+	 * NPU job. This way, perfmon users don't have to retrieve the
+	 * results after each job if they want to track events covering
+	 * several submissions. Note that counter values can't be reset,
+	 * but you can fake a reset by destroying the perfmon and
+	 * creating a new one.
+	 */
+	u64 values[] __counted_by(ncounters);
+};
+
+/* ethosu_perfmon.c */
+void ethosu_perfmon_init(struct ethosu_device *ethosu);
+void ethosu_perfmon_get(struct ethosu_perfmon *perfmon);
+void ethosu_perfmon_put(struct ethosu_perfmon *perfmon);
+void ethosu_perfmon_start(struct ethosu_device *ethosu,
+			  struct ethosu_perfmon *perfmon);
+void ethosu_perfmon_stop(struct ethosu_device *ethosu,
+			 struct ethosu_perfmon *perfmon, bool capture);
+void ethosu_perfmon_stop_locked(struct ethosu_device *ethosu, struct ethosu_perfmon *perfmon,
+				bool capture);
+struct ethosu_perfmon *ethosu_perfmon_find(struct ethosu_file_priv *ethosu_priv,
+					   int id);
+void ethosu_perfmon_open_file(struct ethosu_file_priv *ethosu_priv);
+void ethosu_perfmon_close_file(struct ethosu_file_priv *ethosu_priv);
+int ethosu_ioctl_perfmon_create(struct drm_device *dev, void *data,
+				struct drm_file *file_priv);
+int ethosu_ioctl_perfmon_destroy(struct drm_device *dev, void *data,
+				 struct drm_file *file_priv);
+int ethosu_ioctl_perfmon_get_values(struct drm_device *dev, void *data,
+				    struct drm_file *file_priv);
+int ethosu_ioctl_perfmon_set_global(struct drm_device *dev, void *data,
+				    struct drm_file *file_priv);
+
 #endif
diff --git a/drivers/accel/ethosu/ethosu_job.c b/drivers/accel/ethosu/ethosu_job.c
index e7b07cdbcced..5712848236af 100644
--- a/drivers/accel/ethosu/ethosu_job.c
+++ b/drivers/accel/ethosu/ethosu_job.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only OR MIT
 /* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
-/* Copyright 2025 Arm, Ltd. */
+/* Copyright 2025-2026 Arm, Ltd. */
 
 #include <linux/bitfield.h>
 #include <linux/genalloc.h>
@@ -147,6 +147,8 @@ static void ethosu_job_err_cleanup(struct ethosu_job *job)
 {
 	unsigned int i;
 
+	ethosu_perfmon_put(job->perfmon);
+
 	for (i = 0; i < job->region_cnt; i++)
 		drm_gem_object_put(job->region_bo[i]);
 
@@ -181,6 +183,26 @@ static void ethosu_job_free(struct drm_sched_job *sched_job)
 	ethosu_job_put(job);
 }
 
+static void
+ethosu_switch_perfmon(struct ethosu_device *ethosu, struct ethosu_job *job)
+{
+	struct ethosu_perfmon *perfmon;
+
+	guard(spinlock)(&ethosu->perfmon_state.lock);
+
+	perfmon = READ_ONCE(ethosu->global_perfmon);
+	if (!perfmon)
+		perfmon = job->perfmon;
+
+	if (perfmon == ethosu->perfmon_state.active)
+		return;
+
+	ethosu_perfmon_stop_locked(ethosu, ethosu->perfmon_state.active, true);
+
+	if (perfmon && ethosu->perfmon_state.active != perfmon)
+		ethosu_perfmon_start(ethosu, perfmon);
+}
+
 static struct dma_fence *ethosu_job_run(struct drm_sched_job *sched_job)
 {
 	struct ethosu_job *job = to_ethosu_job(sched_job);
@@ -194,6 +216,8 @@ static struct dma_fence *ethosu_job_run(struct drm_sched_job *sched_job)
 		       dev->fence_context, ++dev->emit_seqno);
 	dma_fence_get(fence);
 
+	ethosu_switch_perfmon(dev, job);
+
 	scoped_guard(mutex, &dev->job_lock) {
 		dev->in_flight_job = job;
 		ethosu_job_hw_submit(dev, job);
@@ -366,7 +390,8 @@ void ethosu_job_close(struct ethosu_file_priv *ethosu_priv)
 }
 
 static int ethosu_ioctl_submit_job(struct drm_device *dev, struct drm_file *file,
-				   struct drm_ethosu_job *job)
+				   struct drm_ethosu_job *job,
+				   int perfmon_id)
 {
 	struct ethosu_device *edev = to_ethosu_device(dev);
 	struct ethosu_file_priv *file_priv = file->driver_priv;
@@ -390,6 +415,9 @@ static int ethosu_ioctl_submit_job(struct drm_device *dev, struct drm_file *file
 	ejob->dev = edev;
 	ejob->sram_size = job->sram_size;
 
+	if (perfmon_id)
+		ejob->perfmon = ethosu_perfmon_find(file_priv, perfmon_id);
+
 	ejob->done_fence = kzalloc_obj(*ejob->done_fence);
 	if (!ejob->done_fence) {
 		ret = -ENOMEM;
@@ -426,7 +454,7 @@ static int ethosu_ioctl_submit_job(struct drm_device *dev, struct drm_file *file
 
 				dev_err(dev->dev,
 					"cmd stream region %d size greater than SRAM size (%llu > %u)\n",
-					i, cmd_info->region_size[i], 
+					i, cmd_info->region_size[i],
 					edev->npu_info.sram_size);
 				ret = -EINVAL;
 				goto out_cleanup_job;
@@ -492,11 +520,6 @@ int ethosu_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *fil
 	int ret = 0;
 	unsigned int i = 0;
 
-	if (args->pad) {
-		drm_dbg(dev, "Reserved field in drm_ethosu_submit struct should be 0.\n");
-		return -EINVAL;
-	}
-
 	struct drm_ethosu_job __free(kvfree) *jobs =
 		kvmalloc_objs(*jobs, args->job_count);
 	if (!jobs)
@@ -510,7 +533,7 @@ int ethosu_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *fil
 	}
 
 	for (i = 0; i < args->job_count; i++) {
-		ret = ethosu_ioctl_submit_job(dev, file, &jobs[i]);
+		ret = ethosu_ioctl_submit_job(dev, file, &jobs[i], args->perfmon_id);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/accel/ethosu/ethosu_job.h b/drivers/accel/ethosu/ethosu_job.h
index ff1cf448d094..8988edd00eed 100644
--- a/drivers/accel/ethosu/ethosu_job.h
+++ b/drivers/accel/ethosu/ethosu_job.h
@@ -21,6 +21,8 @@ struct ethosu_job {
 	u8 region_cnt;
 	u32 sram_size;
 
+	struct ethosu_perfmon *perfmon;
+
 	/* Fence to be signaled by drm-sched once its done with the job */
 	struct dma_fence *inference_done_fence;
 
diff --git a/drivers/accel/ethosu/ethosu_perfmon.c b/drivers/accel/ethosu/ethosu_perfmon.c
new file mode 100644
index 000000000000..c9860a777e65
--- /dev/null
+++ b/drivers/accel/ethosu/ethosu_perfmon.c
@@ -0,0 +1,296 @@
+// SPDX-License-Identifier: GPL-2.0-only or MIT
+/* Copyright 2026 Arm, Ltd. */
+/* Based on v3d_perfmon.c, Copyright (C) 2021 Raspberry Pi */
+
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/pm_runtime.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include <drm/drm_file.h>
+#include <drm/drm_ioctl.h>
+
+#include <uapi/drm/ethosu_accel.h>
+
+#include "ethosu_drv.h"
+#include "ethosu_device.h"
+
+void ethosu_perfmon_get(struct ethosu_perfmon *perfmon)
+{
+	if (perfmon)
+		refcount_inc(&perfmon->refcnt);
+}
+
+void ethosu_perfmon_put(struct ethosu_perfmon *perfmon)
+{
+	if (perfmon && refcount_dec_and_test(&perfmon->refcnt)) {
+		kfree(perfmon);
+	}
+}
+
+void ethosu_perfmon_start(struct ethosu_device *ethosu, struct ethosu_perfmon *perfmon)
+{
+	unsigned int i;
+	u8 ncounters;
+	u32 mask;
+
+	if (WARN_ON_ONCE(!perfmon || ethosu->perfmon_state.active))
+		return;
+
+	writel_relaxed(PMCR_CNT_EN, ethosu->pmu_regs + NPU_REG_PMCR);
+	writel_relaxed(PMU_EV_TYPE_CYCLES, ethosu->pmu_regs + NPU_REG_PMCCNTR_CFG);
+
+	mask = 0x80000000;
+	ncounters = perfmon->ncounters - 1;
+	if (ncounters)
+		mask |= GENMASK(ncounters - 1, 0);
+
+	for (i = 0; i < ncounters; i++)
+		writel_relaxed(perfmon->counters[i], ethosu->pmu_regs + NPU_REG_PMU_EVTYPER(i));
+
+	writel_relaxed(mask, ethosu->pmu_regs + NPU_REG_PMCNTENSET);
+	writel_relaxed(PMCR_CNT_EN | PMCR_EVENT_CNT_RST | PMCR_CYCLE_CNT_RST,
+		ethosu->pmu_regs + NPU_REG_PMCR);
+	ethosu->perfmon_state.active = perfmon;
+}
+
+void ethosu_perfmon_stop_locked(struct ethosu_device *ethosu, struct ethosu_perfmon *perfmon,
+				bool capture)
+{
+	unsigned int i;
+	u8 ncounters;
+	u32 mask;
+
+	if (!perfmon || perfmon != ethosu->perfmon_state.active)
+		return;
+
+	ncounters = perfmon->ncounters - 1;
+
+	if (!pm_runtime_get_if_active(ethosu->base.dev)) {
+		ethosu->perfmon_state.active = NULL;
+		return;
+	}
+
+	if (capture) {
+		for (i = 0; i < ncounters; i++)
+			perfmon->values[i] += readl_relaxed(ethosu->pmu_regs + NPU_REG_PMU_EVCNTR(i));
+		perfmon->values[ncounters] +=
+			readl_relaxed(ethosu->pmu_regs + NPU_REG_PMCCNTR_LO) |
+			(u64)readl_relaxed(ethosu->pmu_regs + NPU_REG_PMCCNTR_HI) << 32;
+	}
+
+	mask = 0x80000000;
+	if (ncounters)
+		mask |= GENMASK(ncounters - 1, 0);
+	writel_relaxed(mask, ethosu->pmu_regs + NPU_REG_PMCNTENCLR);
+
+	writel_relaxed(0, ethosu->pmu_regs + NPU_REG_PMCR);
+	ethosu->perfmon_state.active = NULL;
+
+	pm_runtime_put(ethosu->base.dev);
+}
+
+void ethosu_perfmon_stop(struct ethosu_device *ethosu, struct ethosu_perfmon *perfmon,
+			 bool capture)
+{
+	if (!perfmon)
+		return;
+
+	guard(spinlock)(&ethosu->perfmon_state.lock);
+	ethosu_perfmon_stop_locked(ethosu, perfmon, capture);
+}
+
+struct ethosu_perfmon *ethosu_perfmon_find(struct ethosu_file_priv *ethosu_priv, int id)
+{
+	struct ethosu_perfmon *perfmon;
+
+	xa_lock(&ethosu_priv->perfmons);
+	perfmon = xa_load(&ethosu_priv->perfmons, id);
+	ethosu_perfmon_get(perfmon);
+	xa_unlock(&ethosu_priv->perfmons);
+
+	return perfmon;
+}
+
+void ethosu_perfmon_open_file(struct ethosu_file_priv *ethosu_priv)
+{
+	xa_init_flags(&ethosu_priv->perfmons, XA_FLAGS_ALLOC1);
+}
+
+static void ethosu_perfmon_delete(struct ethosu_file_priv *ethosu_priv,
+				 struct ethosu_perfmon *perfmon)
+{
+	struct ethosu_device *ethosu = ethosu_priv->edev;
+
+	/* If the active perfmon is being destroyed, stop it first */
+	scoped_guard(spinlock_irqsave, &ethosu->perfmon_state.lock) {
+		/* If the global perfmon is being destroyed, set it to NULL */
+		if (ethosu->global_perfmon == perfmon) {
+			ethosu->global_perfmon = NULL;
+			ethosu_perfmon_put(perfmon);
+		}
+
+		ethosu_perfmon_stop_locked(ethosu, perfmon, false);
+	}
+
+	ethosu_perfmon_put(perfmon);
+}
+
+void ethosu_perfmon_close_file(struct ethosu_file_priv *ethosu_priv)
+{
+	struct ethosu_perfmon *perfmon;
+	unsigned long id;
+
+	xa_for_each(&ethosu_priv->perfmons, id, perfmon)
+		ethosu_perfmon_delete(ethosu_priv, perfmon);
+
+	xa_destroy(&ethosu_priv->perfmons);
+}
+
+int ethosu_ioctl_perfmon_create(struct drm_device *dev, void *data,
+			     struct drm_file *file_priv)
+{
+	struct ethosu_file_priv *ethosu_priv = file_priv->driver_priv;
+	struct drm_ethosu_perfmon_create *req = data;
+	struct ethosu_device *ethosu = to_ethosu_device(dev);
+	struct ethosu_perfmon *perfmon;
+	unsigned int i, event_max;
+	int ret;
+	u32 id;
+
+	/* Number of monitored counters cannot exceed HW limits. */
+	if (req->ncounters > ethosu->npu_info.pmu_counters)
+		return -EINVAL;
+
+	/* Make sure all counters are valid. */
+	event_max = ethosu_is_u65(ethosu) ? 433 : 671;
+	for (i = 0; i < req->ncounters; i++) {
+		if (req->counters[i] > event_max)
+			return -EINVAL;
+	}
+
+	/* Add 1 more counter for cycle counter  */
+	req->ncounters++;
+
+	perfmon = kzalloc_flex(*perfmon, values, req->ncounters);
+	if (!perfmon)
+		return -ENOMEM;
+
+	for (i = 0; i < req->ncounters - 1; i++)
+		perfmon->counters[i] = req->counters[i];
+
+	perfmon->ncounters = req->ncounters;
+
+	refcount_set(&perfmon->refcnt, 1);
+
+	ret = xa_alloc(&ethosu_priv->perfmons, &id, perfmon, xa_limit_32b,
+		       GFP_KERNEL);
+
+	if (ret < 0) {
+		kfree(perfmon);
+		return ret;
+	}
+
+	req->id = id;
+
+	return 0;
+}
+
+int ethosu_ioctl_perfmon_destroy(struct drm_device *dev, void *data,
+				 struct drm_file *file_priv)
+{
+	struct ethosu_file_priv *ethosu_priv = file_priv->driver_priv;
+	struct drm_ethosu_perfmon_destroy *req = data;
+	struct ethosu_perfmon *perfmon;
+
+	perfmon = xa_erase(&ethosu_priv->perfmons, req->id);
+	if (!perfmon)
+		return -EINVAL;
+
+	ethosu_perfmon_delete(ethosu_priv, perfmon);
+
+	return 0;
+}
+
+int ethosu_ioctl_perfmon_get_values(struct drm_device *dev, void *data,
+				 struct drm_file *file_priv)
+{
+	struct ethosu_device *ethosu = to_ethosu_device(dev);
+	struct ethosu_file_priv *ethosu_priv = file_priv->driver_priv;
+	struct drm_ethosu_perfmon_get_values *req = data;
+	struct ethosu_perfmon *perfmon;
+	int ret = 0;
+
+	if (req->pad != 0)
+		return -EINVAL;
+
+	perfmon = ethosu_perfmon_find(ethosu_priv, req->id);
+	if (!perfmon)
+		return -EINVAL;
+
+	ret = pm_runtime_resume_and_get(dev->dev);
+	if (ret) {
+		ethosu_perfmon_put(perfmon);
+		return ret;
+	}
+	ethosu_perfmon_stop(ethosu, perfmon, true);
+
+	pm_runtime_put_autosuspend(dev->dev);
+
+	if (copy_to_user(u64_to_user_ptr(req->values_ptr), perfmon->values,
+			 perfmon->ncounters * sizeof(u64)))
+		ret = -EFAULT;
+
+	ethosu_perfmon_put(perfmon);
+
+	return ret;
+}
+
+int ethosu_ioctl_perfmon_set_global(struct drm_device *dev, void *data,
+				 struct drm_file *file_priv)
+{
+	struct ethosu_file_priv *ethosu_priv = file_priv->driver_priv;
+	struct drm_ethosu_perfmon_set_global *req = data;
+	struct ethosu_device *ethosu = to_ethosu_device(dev);
+	struct ethosu_perfmon *perfmon;
+
+	if (req->flags & ~DRM_ETHOSU_PERFMON_CLEAR_GLOBAL)
+		return -EINVAL;
+
+	perfmon = ethosu_perfmon_find(ethosu_priv, req->id);
+	if (!perfmon)
+		return -EINVAL;
+
+	/* If the request is to clear the global performance monitor */
+	if (req->flags & DRM_ETHOSU_PERFMON_CLEAR_GLOBAL) {
+		struct ethosu_perfmon *old;
+		scoped_guard(spinlock, &ethosu->perfmon_state.lock) {
+			old = ethosu->global_perfmon;
+			if (!old) {
+				ethosu_perfmon_put(perfmon);
+				return -EINVAL;
+			}
+
+			ethosu->global_perfmon = NULL;
+			ethosu_perfmon_stop_locked(ethosu, old, true);
+		}
+
+		ethosu_perfmon_put(old);
+		ethosu_perfmon_put(perfmon);
+
+		return 0;
+	}
+
+	scoped_guard(spinlock, &ethosu->perfmon_state.lock) {
+		if (ethosu->perfmon_state.active || ethosu->global_perfmon) {
+			ethosu_perfmon_put(perfmon);
+			return -EBUSY;
+		}
+
+		ethosu->global_perfmon = perfmon;
+	}
+
+	return 0;
+}
diff --git a/include/uapi/drm/ethosu_accel.h b/include/uapi/drm/ethosu_accel.h
index af78bb4686d7..dde6756642ea 100644
--- a/include/uapi/drm/ethosu_accel.h
+++ b/include/uapi/drm/ethosu_accel.h
@@ -43,6 +43,11 @@ enum drm_ethosu_ioctl_id {
 
 	/** @DRM_ETHOSU_SUBMIT: Submit a job and BOs to run. */
 	DRM_ETHOSU_SUBMIT,
+
+	DRM_ETHOSU_PERFMON_CREATE,
+	DRM_ETHOSU_PERFMON_DESTROY,
+	DRM_ETHOSU_PERFMON_GET_VALUES,
+	DRM_ETHOSU_PERFMON_SET_GLOBAL,
 };
 
 /**
@@ -79,6 +84,7 @@ struct drm_ethosu_npu_info {
 	__u32 config;
 
 	__u32 sram_size;
+	__u32 pmu_counters;
 };
 
 /**
@@ -220,8 +226,51 @@ struct drm_ethosu_submit {
 	/** Input: Number of jobs passed in. */
 	__u32 job_count;
 
-	/** Reserved, must be zero. */
+	/** Input: Id returned by DRM_ETHOSU_PERFMON_CREATE */
+	__u32 perfmon_id;
+};
+
+#define DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS	8
+#define DRM_ETHOSU_MAX_PERF_COUNTERS \
+	(DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS + 1)
+
+struct drm_ethosu_perfmon_create {
+	__u32 id;
+	__u32 ncounters;
+	__u16 counters[DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS];
+};
+
+struct drm_ethosu_perfmon_destroy {
+	__u32 id;
+};
+
+/*
+ * Returns the values of the performance counters tracked by this
+ * perfmon (as an array of (ncounters + 1) u64 values).
+ *
+ * No implicit synchronization is performed, so the user has to
+ * guarantee that any jobs using this perfmon have already been
+ * completed.
+ */
+struct drm_ethosu_perfmon_get_values {
+	__u32 id;
 	__u32 pad;
+	__u64 values_ptr;
+};
+
+#define DRM_ETHOSU_PERFMON_CLEAR_GLOBAL    0x0001
+
+/**
+ * struct drm_ethosu_perfmon_set_global - ioctl to define a global performance
+ * monitor
+ *
+ * The global performance monitor will be used for all jobs. If a global
+ * performance monitor is defined, jobs with a self-defined performance
+ * monitor won't be allowed.
+ */
+struct drm_ethosu_perfmon_set_global {
+	__u32 flags;
+	__u32 id;
 };
 
 /**
@@ -252,6 +301,14 @@ enum {
 		DRM_IOCTL_ETHOSU(WR, CMDSTREAM_BO_CREATE, cmdstream_bo_create),
 	DRM_IOCTL_ETHOSU_SUBMIT =
 		DRM_IOCTL_ETHOSU(WR, SUBMIT, submit),
+	DRM_IOCTL_ETHOSU_PERFMON_CREATE =
+		DRM_IOCTL_ETHOSU(WR, PERFMON_CREATE, perfmon_create),
+	DRM_IOCTL_ETHOSU_PERFMON_DESTROY =
+		DRM_IOCTL_ETHOSU(WR, PERFMON_DESTROY, perfmon_destroy),
+	DRM_IOCTL_ETHOSU_PERFMON_GET_VALUES =
+		DRM_IOCTL_ETHOSU(WR, PERFMON_GET_VALUES, perfmon_get_values),
+	DRM_IOCTL_ETHOSU_PERFMON_SET_GLOBAL =
+		DRM_IOCTL_ETHOSU(WR, PERFMON_SET_GLOBAL, perfmon_set_global),
 };
 
 #if defined(__cplusplus)
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Claude review: accel: ethosu: Add performance counter support
  2026-05-15  3:26 [PATCH v2] accel: ethosu: Add performance counter support Rob Herring (Arm)
@ 2026-05-16  0:08 ` Claude Code Review Bot
  2026-05-16  0:08 ` Claude Code Review Bot
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-05-16  0:08 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: accel: ethosu: Add performance counter support
Author: "Rob Herring (Arm)" <robh@kernel.org>
Patches: 1
Reviewed: 2026-05-16T10:08:39.258984

---

This is a single-patch series (v2) adding PMU performance counter support to the Arm Ethos-U NPU accelerator driver. The implementation is modeled after V3D's perfmon subsystem. The design is straightforward: userspace creates a perfmon object with desired event counters, attaches it to job submissions (or sets it as a global monitor), and reads back accumulated counter values. The code uses XArray for ID allocation, refcounting for perfmon lifetime, and a per-device spinlock for protecting active perfmon state.

Overall the patch is in reasonable shape for a v2 but has several issues that should be addressed before merging, ranging from a likely crash bug in `pmu_regs` initialization, to UAPI struct padding/alignment concerns, to locking and data-race issues.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Claude review: accel: ethosu: Add performance counter support
  2026-05-15  3:26 [PATCH v2] accel: ethosu: Add performance counter support Rob Herring (Arm)
  2026-05-16  0:08 ` Claude review: " Claude Code Review Bot
@ 2026-05-16  0:08 ` Claude Code Review Bot
  2026-05-16 13:23 ` [PATCH v2] " Maíra Canal
  2026-05-23  8:37 ` [PATCH v3] " Tomeu Vizoso
  3 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-05-16  0:08 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Bug: Uninitialized `pmu_regs` offset for non-U65**

In `ethosu_drv.c`, the probe function sets:
```c
ethosudev->pmu_regs = ethosudev->regs;
```
Then in `ethosu_init()`:
```c
if (!ethosu_is_u65(ethosudev))
    ethosudev->pmu_regs += 0x1000;
```

This `+= 0x1000` is applied to the `void __iomem *` pointer. However, at this point `pmu_regs` was already set to `ethosudev->regs` in `ethosu_probe()`. If `ethosu_init()` were ever called a second time (e.g. via a resume path or error-retry), the offset would be applied twice. More importantly, the `devm_platform_ioremap_resource()` mapping might not cover the extra 0x1000 offset — the resource size should be validated. Suggest using an explicit assignment `ethosudev->pmu_regs = ethosudev->regs + 0x1000;` instead of `+=` to make it idempotent and clearer.

**UAPI: `drm_ethosu_npu_info` struct packing changed without padding**

Adding `pmu_counters` (__u32) to `drm_ethosu_npu_info`:
```c
struct drm_ethosu_npu_info {
    __u32 id;
    __u32 config;
    __u32 sram_size;
    __u32 pmu_counters;  /* new */
};
```
This changes the struct size from 12 to 16 bytes. Since this struct is used via `DRM_ETHOSU_DEV_QUERY` with a size-based copy mechanism (`min(size, actual_structure_size)`), this is designed to be extensible and should be fine for forward/backward compatibility. However, the original 12-byte struct had no padding for natural alignment — was 12 bytes intentional, or should there have been an explicit pad field already? This is worth verifying that existing userspace won't break, since `sizeof(struct drm_ethosu_npu_info)` changes.

**UAPI: `drm_ethosu_perfmon_create` alignment/padding issues**

```c
struct drm_ethosu_perfmon_create {
    __u32 id;
    __u32 ncounters;
    __u16 counters[DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS]; /* 8 * 2 = 16 bytes */
};
```
Total: 4 + 4 + 16 = 24 bytes. The struct has no trailing padding issue per se, but the `__u16` array following `__u32` fields is aligned fine. However, it would be good practice to add a `__u32 pad` for future extensibility since this is ioctl-level ABI.

**UAPI: `drm_ethosu_perfmon_destroy` is only 4 bytes**

```c
struct drm_ethosu_perfmon_destroy {
    __u32 id;
};
```
A 4-byte UAPI struct is unusual. Consider adding a `__u32 pad;` for future extensibility and to ensure 8-byte alignment for the ioctl struct.

**UAPI: `drm_ethosu_submit.pad` field repurposed**

The existing `pad` field (which was required to be zero) is repurposed as `perfmon_id`:
```c
-	/** Reserved, must be zero. */
-	__u32 pad;
+	/** Input: Id returned by DRM_ETHOSU_PERFMON_CREATE */
+	__u32 perfmon_id;
```
The validation of `pad == 0` is removed:
```c
-	if (args->pad) {
-		drm_dbg(dev, "Reserved field in drm_ethosu_submit struct should be 0.\n");
-		return -EINVAL;
-	}
```
This is the correct way to reuse a reserved field. Since `XA_FLAGS_ALLOC1` is used, valid perfmon IDs start at 1 and `perfmon_id == 0` means "no perfmon," which is backward-compatible. Good.

**Locking: `ethosu_perfmon_start` called under spinlock but not annotated**

In `ethosu_switch_perfmon()`:
```c
guard(spinlock)(&ethosu->perfmon_state.lock);
...
ethosu_perfmon_stop_locked(ethosu, ethosu->perfmon_state.active, true);
if (perfmon && ethosu->perfmon_state.active != perfmon)
    ethosu_perfmon_start(ethosu, perfmon);
```
`ethosu_perfmon_start()` is called under `perfmon_state.lock` (a spinlock), and it does `writel_relaxed()` which is fine from a sleeping context. But this function does MMIO writes — the spinlock ensures atomicity with respect to the stop path, which is correct. However, `ethosu_perfmon_start()` has a `WARN_ON_ONCE(!perfmon || ethosu->perfmon_state.active)` check — by the time we get there after the stop, `active` should be NULL. The second condition in the `if` check `ethosu->perfmon_state.active != perfmon` is redundant after the early return for `perfmon == active` case, since `stop_locked` sets `active = NULL`. Not a bug, just slightly confusing logic.

**Potential data race: `global_perfmon` read without lock**

In `ethosu_switch_perfmon()`:
```c
guard(spinlock)(&ethosu->perfmon_state.lock);
perfmon = READ_ONCE(ethosu->global_perfmon);
```
The `READ_ONCE` is good, but `global_perfmon` is also written in `ethosu_perfmon_set_global()` under the same spinlock, and in `ethosu_perfmon_delete()` under `spinlock_irqsave`. The lock coverage is consistent, which is correct. However, the `global_perfmon` pointer is set with a refcount from `ethosu_perfmon_find()` in `set_global`, but there's no corresponding `ethosu_perfmon_get()` when it's used in `switch_perfmon`. If the global perfmon is concurrently destroyed (via `ethosu_perfmon_delete` which drops the ref), the pointer used in `switch_perfmon` could reference freed memory after the spinlock is dropped. In practice, since `switch_perfmon` only uses the pointer within the spinlock scope and `delete` also takes the spinlock, this is safe — but the design is fragile.

**`ethosu_perfmon_stop_locked`: `pm_runtime_get_if_active` returns int, not bool**

```c
if (!pm_runtime_get_if_active(ethosu->base.dev)) {
```
`pm_runtime_get_if_active()` can return negative error codes, 0 (device not active), or positive (success). The `!` check treats negative errors the same as "not active" — this is probably fine (if PM errors out, bailing is reasonable), but it's worth noting.

**`ethosu_perfmon_stop_locked`: called from spinlock but calls `pm_runtime_put`**

```c
void ethosu_perfmon_stop_locked(...)
{
    ...
    pm_runtime_put(ethosu->base.dev);
}
```
`pm_runtime_put()` may sleep (it can trigger `runtime_suspend` callbacks synchronously). This is called under a spinlock context (from `ethosu_switch_perfmon` via `guard(spinlock)`). This is a **bug** — sleeping in atomic context. Should use `pm_runtime_put_autosuspend()` or `pm_runtime_mark_last_busy()` + `pm_runtime_put_autosuspend()` instead, or better yet, defer the pm_runtime_put to after the spinlock is released.

Similarly, `pm_runtime_get_if_active()` should also be checked for sleeping behavior, though that one is documented as safe in atomic context.

**`ethosu_ioctl_perfmon_get_values`: stopping a perfmon that may not be active**

```c
ethosu_perfmon_stop(ethosu, perfmon, true);
```
This stops the perfmon to capture current values, but doesn't restart it afterward. If this was a global perfmon that was running, it will be stopped permanently after a get_values call. The docstring says "no implicit synchronization" but the stop side-effect is implicit. This seems intentional based on the V3D model, but worth documenting.

**`ethosu_ioctl_perfmon_set_global`: no check that perfmon_id matches the perfmon being cleared**

When `DRM_ETHOSU_PERFMON_CLEAR_GLOBAL` is set:
```c
perfmon = ethosu_perfmon_find(ethosu_priv, req->id);
if (!perfmon)
    return -EINVAL;
...
if (req->flags & DRM_ETHOSU_PERFMON_CLEAR_GLOBAL) {
    old = ethosu->global_perfmon;
    if (!old) { ... return -EINVAL; }
    ethosu->global_perfmon = NULL;
```
The `req->id` is used to find a perfmon, but then the code doesn't verify that the found perfmon matches the current `global_perfmon`. Any valid perfmon ID will clear any global perfmon. The found `perfmon` is only used for the `find` validity check, then immediately put. This is potentially confusing API — either the `id` should be validated against the current global, or the clear operation shouldn't require a valid `id` at all.

**Missing `#include` for flex array / `kzalloc_flex` in ethosu_perfmon.c**

The code uses `kzalloc_flex()` but I don't see an explicit include for whatever header provides it. It likely comes transitively through `slab.h` or one of the DRM headers. Should be fine but worth verifying builds cleanly.

**`ethosu_job_err_cleanup`: perfmon_put before BO cleanup**

```c
static void ethosu_job_err_cleanup(struct ethosu_job *job)
{
    ethosu_perfmon_put(job->perfmon);
    for (i = 0; i < job->region_cnt; i++)
        drm_gem_object_put(job->region_bo[i]);
```
This is fine — just noting that `perfmon_put` is added at the top. If `job->perfmon` is NULL (no perfmon attached), the `ethosu_perfmon_put` NULL-checks internally, which is correct.

**Missing error check for `ethosu_perfmon_find` in submit path**

```c
if (perfmon_id)
    ejob->perfmon = ethosu_perfmon_find(file_priv, perfmon_id);
```
If the user passes a nonzero `perfmon_id` that doesn't exist in the xarray, `ethosu_perfmon_find` returns NULL (xa_load returns NULL for missing entries). The job proceeds with `perfmon == NULL`, silently ignoring the invalid perfmon ID. This should return `-EINVAL` if the user specified a perfmon_id but it wasn't found:
```c
if (perfmon_id) {
    ejob->perfmon = ethosu_perfmon_find(file_priv, perfmon_id);
    if (!ejob->perfmon)
        return -EINVAL;
}
```

**`counters` array in `ethosu_perfmon` struct is fixed-size but `values` is flex**

```c
struct ethosu_perfmon {
    u8 ncounters;
    u16 counters[DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS];
    u64 values[] __counted_by(ncounters);
};
```
`ncounters` includes the cycle counter (+1), so it can be up to 9 (DRM_ETHOSU_MAX_PERF_COUNTERS). The `counters` array is sized `DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS` (8), which stores only the user-specified event counters (not cycles). `values` is `ncounters` entries which includes cycles. This mismatch between `counters` array size (8 event types) and `values` array size (up to 9 values) is intentional but the mixed use of `ncounters` and `ncounters - 1` throughout the code is error-prone and hard to follow. Consider using a separate field for the event counter count vs total counter count.

**Minor: SPDX license tag inconsistency**

`ethosu_perfmon.c` header:
```c
// SPDX-License-Identifier: GPL-2.0-only or MIT
```
Other files use:
```c
// SPDX-License-Identifier: GPL-2.0-only OR MIT
```
The SPDX spec requires uppercase `OR`. This should be `GPL-2.0-only OR MIT`.

**Summary of issues by severity:**

1. **Bug**: `pm_runtime_put()` called under spinlock in `ethosu_perfmon_stop_locked` — sleeping in atomic context
2. **Bug**: Missing error check when `ethosu_perfmon_find` returns NULL in the submit path (invalid perfmon_id silently ignored)
3. **Issue**: `pmu_regs += 0x1000` is not idempotent — should be explicit assignment
4. **UAPI**: Several structs could benefit from pad fields for future ABI extensibility
5. **Minor**: SPDX license tag uses lowercase `or` instead of `OR`
6. **Style**: Pervasive `ncounters - 1` pattern throughout the code is fragile; a named field would be clearer

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] accel: ethosu: Add performance counter support
  2026-05-15  3:26 [PATCH v2] accel: ethosu: Add performance counter support Rob Herring (Arm)
  2026-05-16  0:08 ` Claude review: " Claude Code Review Bot
  2026-05-16  0:08 ` Claude Code Review Bot
@ 2026-05-16 13:23 ` Maíra Canal
  2026-05-25  7:53   ` Claude review: " Claude Code Review Bot
  2026-05-23  8:37 ` [PATCH v3] " Tomeu Vizoso
  3 siblings, 1 reply; 6+ messages in thread
From: Maíra Canal @ 2026-05-16 13:23 UTC (permalink / raw)
  To: Rob Herring (Arm), Tomeu Vizoso, Oded Gabbay, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Kees Cook, Gustavo A. R. Silva
  Cc: linux-kernel, dri-devel, linux-hardening

Hi Rob,

On 15/05/26 00:26, Rob Herring (Arm) wrote:
> The Arm Ethos-U NPUs have a PMU with performance counters. The PMU h/w
> supports up to 4 (U65) or 8 (U85) counters which can be programmed for
> different events. There is also a dedicated cycle counter.
> 
> The ABI and implementation are copied from the V3D driver. The main
> difference in the ABI is there is no query API for the the event list.
> The events differ between the U65 and U85, so the events lists are
> maintained in userspace along with other differences between the U65 and
> U85.
> 
> The cycle counter is always enabled when the PMU is enabled. When the
> user requests N events, reading the counters will return the N events
> plus the cycle counter.
> 
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
> ---
> v2:
>   - Use XArray instead of idr
>   - Rework locking to use per device spinlock to protect modifying active
>     perfmon. Based on pending V3D changes:
>     https://lore.kernel.org/all/20260508-v3d-perfmon-lifetime-v1-1-f5b5642c085f@igalia.com/
>   - Add missing perfmon puts in ethosu_ioctl_perfmon_set_global() and
>     ethosu_ioctl_perfmon_get_values() error paths.
>   - Fix reading number of counters on U85.
>   - Add defines NPU_REG_PMCCNTR_CFG
> ---
>   drivers/accel/ethosu/Makefile         |   2 +-
>   drivers/accel/ethosu/ethosu_device.h  |  32 +++
>   drivers/accel/ethosu/ethosu_drv.c     |  21 +-
>   drivers/accel/ethosu/ethosu_drv.h     |  62 +++++-
>   drivers/accel/ethosu/ethosu_job.c     |  41 +++-
>   drivers/accel/ethosu/ethosu_job.h     |   2 +
>   drivers/accel/ethosu/ethosu_perfmon.c | 296 ++++++++++++++++++++++++++
>   include/uapi/drm/ethosu_accel.h       |  59 ++++-
>   8 files changed, 500 insertions(+), 15 deletions(-)
>   create mode 100644 drivers/accel/ethosu/ethosu_perfmon.c
> 

[...]

> diff --git a/drivers/accel/ethosu/ethosu_drv.h b/drivers/accel/ethosu/ethosu_drv.h
> index 9e21dfe94184..8fed43c2a7af 100644
> --- a/drivers/accel/ethosu/ethosu_drv.h
> +++ b/drivers/accel/ethosu/ethosu_drv.h
> @@ -1,15 +1,75 @@
>   /* SPDX-License-Identifier: GPL-2.0-only OR MIT */
> -/* Copyright 2025 Arm, Ltd. */
> +/* Copyright 2025-2026 Arm, Ltd. */
>   #ifndef __ETHOSU_DRV_H__
>   #define __ETHOSU_DRV_H__
>   
> +#include <linux/mutex.h>
> +#include <linux/xarray.h>
>   #include <drm/gpu_scheduler.h>
>   
>   struct ethosu_device;
> +struct drm_device;
> +struct drm_file;
>   
>   struct ethosu_file_priv {
>   	struct ethosu_device *edev;
>   	struct drm_sched_entity sched_entity;
> +	struct xarray perfmons;
>   };
>   
> +/* Performance monitor object. The perform lifetime is controlled by userspace
> + * using perfmon related ioctls. A perfmon can be attached to a submit_cl

submit_cl?

> + * request, and when this is the case, HW perf counters will be activated just
> + * before the submit_cl is submitted to the GPU and disabled when the job is

Maybe replace the "submit_cl is submitted to the GPU" with NPU related
wording?

> + * done. This way, only events related to a specific job will be counted.
> + */
> +struct ethosu_perfmon {
> +	/* Tracks the number of users of the perfmon, when this counter reaches
> +	 * zero the perfmon is destroyed.
> +	 */
> +	refcount_t refcnt;
> +
> +	/* Number of counters activated in this perfmon instance
> +	 * (should be less than or equal to DRM_ETHOSU_MAX_PERF_COUNTERS).
> +	 */
> +	u8 ncounters;
> +
> +	/* Events counted by the HW perf counters. */
> +	u16 counters[DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS];
> +
> +	/*
> +	 * Storage for counter values. Counters are incremented by the HW
> +	 * perf counter values every time the perfmon is attached to a
> +	 * NPU job. This way, perfmon users don't have to retrieve the
> +	 * results after each job if they want to track events covering
> +	 * several submissions. Note that counter values can't be reset,
> +	 * but you can fake a reset by destroying the perfmon and
> +	 * creating a new one.
> +	 */
> +	u64 values[] __counted_by(ncounters);
> +};
> +
> +/* ethosu_perfmon.c */
> +void ethosu_perfmon_init(struct ethosu_device *ethosu);
> +void ethosu_perfmon_get(struct ethosu_perfmon *perfmon);
> +void ethosu_perfmon_put(struct ethosu_perfmon *perfmon);
> +void ethosu_perfmon_start(struct ethosu_device *ethosu,
> +			  struct ethosu_perfmon *perfmon);
> +void ethosu_perfmon_stop(struct ethosu_device *ethosu,
> +			 struct ethosu_perfmon *perfmon, bool capture);
> +void ethosu_perfmon_stop_locked(struct ethosu_device *ethosu, struct ethosu_perfmon *perfmon,
> +				bool capture);
> +struct ethosu_perfmon *ethosu_perfmon_find(struct ethosu_file_priv *ethosu_priv,
> +					   int id);
> +void ethosu_perfmon_open_file(struct ethosu_file_priv *ethosu_priv);
> +void ethosu_perfmon_close_file(struct ethosu_file_priv *ethosu_priv);
> +int ethosu_ioctl_perfmon_create(struct drm_device *dev, void *data,
> +				struct drm_file *file_priv);
> +int ethosu_ioctl_perfmon_destroy(struct drm_device *dev, void *data,
> +				 struct drm_file *file_priv);
> +int ethosu_ioctl_perfmon_get_values(struct drm_device *dev, void *data,
> +				    struct drm_file *file_priv);
> +int ethosu_ioctl_perfmon_set_global(struct drm_device *dev, void *data,
> +				    struct drm_file *file_priv);
> +
>   #endif
> diff --git a/drivers/accel/ethosu/ethosu_job.c b/drivers/accel/ethosu/ethosu_job.c
> index e7b07cdbcced..5712848236af 100644
> --- a/drivers/accel/ethosu/ethosu_job.c
> +++ b/drivers/accel/ethosu/ethosu_job.c
> @@ -1,6 +1,6 @@
>   // SPDX-License-Identifier: GPL-2.0-only OR MIT
>   /* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> -/* Copyright 2025 Arm, Ltd. */
> +/* Copyright 2025-2026 Arm, Ltd. */
>   
>   #include <linux/bitfield.h>
>   #include <linux/genalloc.h>
> @@ -147,6 +147,8 @@ static void ethosu_job_err_cleanup(struct ethosu_job *job)
>   {
>   	unsigned int i;
>   
> +	ethosu_perfmon_put(job->perfmon);
> +
>   	for (i = 0; i < job->region_cnt; i++)
>   		drm_gem_object_put(job->region_bo[i]);
>   
> @@ -181,6 +183,26 @@ static void ethosu_job_free(struct drm_sched_job *sched_job)
>   	ethosu_job_put(job);
>   }
>   
> +static void
> +ethosu_switch_perfmon(struct ethosu_device *ethosu, struct ethosu_job *job)
> +{
> +	struct ethosu_perfmon *perfmon;
> +
> +	guard(spinlock)(&ethosu->perfmon_state.lock);
> +
> +	perfmon = READ_ONCE(ethosu->global_perfmon);

Considering that you are under the spinlock, I believe that using
READ_ONCE is unneeded.

> +	if (!perfmon)
> +		perfmon = job->perfmon;
> +
> +	if (perfmon == ethosu->perfmon_state.active)
> +		return;
> +
> +	ethosu_perfmon_stop_locked(ethosu, ethosu->perfmon_state.active, true);
> +
> +	if (perfmon && ethosu->perfmon_state.active != perfmon)
> +		ethosu_perfmon_start(ethosu, perfmon);

Also, if you are going to keep ethosu_switch_perfmon(), you don't need
to use a spinlock. In V3D, I'm only using a spinlock because I'm
stopping the perfmon in a IRQ context. But considering that you are
stopping the perfmon in run_job(), you can use a sleeping mutex.

Moreover, I believe that if (perfmon != NULL) then automatically
(ethosu->perfmon_state.active != perfmon) as we already checked if
(perfmon == ethosu->perfmon_state.active).

Best regards,
- Maíra

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v3] accel: ethosu: Add performance counter support
  2026-05-15  3:26 [PATCH v2] accel: ethosu: Add performance counter support Rob Herring (Arm)
                   ` (2 preceding siblings ...)
  2026-05-16 13:23 ` [PATCH v2] " Maíra Canal
@ 2026-05-23  8:37 ` Tomeu Vizoso
  3 siblings, 0 replies; 6+ messages in thread
From: Tomeu Vizoso @ 2026-05-23  8:37 UTC (permalink / raw)
  To: Rob Herring (Arm), Tomeu Vizoso, Oded Gabbay, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Kees Cook, Gustavo A. R. Silva, open list
  Cc: open list

From: "Rob Herring (Arm)" <robh@kernel.org>

The Arm Ethos-U NPUs have a PMU with performance counters. The PMU h/w
supports up to 4 (U65) or 8 (U85) counters which can be programmed for
different events. There is also a dedicated cycle counter.

The ABI and implementation are copied from the V3D driver. The main
difference in the ABI is there is no query API for the the event list.
The events differ between the U65 and U85, so the events lists are
maintained in userspace along with other differences between the U65 and
U85.

The cycle counter is always enabled when the PMU is enabled. When the
user requests N events, reading the counters will return the N events
plus the cycle counter.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
v2:
 - Use XArray instead of idr
 - Rework locking to use per device spinlock to protect modifying active
   perfmon. Based on pending V3D changes:
   https://lore.kernel.org/all/20260508-v3d-perfmon-lifetime-v1-1-f5b5642c085f@igalia.com/
 - Add missing perfmon puts in ethosu_ioctl_perfmon_set_global() and
   ethosu_ioctl_perfmon_get_values() error paths.
 - Fix reading number of counters on U85.
 - Add defines NPU_REG_PMCCNTR_CFG

v3:
 - Add explicit padding to drm_ethosu_perfmon_destroy
 - Fix SPDX license expression
 - Fix comment typos
 - Convert perfmon lock from spinlock to mutex
 - Simplify switch_perfmon condition check
 - Remove unused ethosu_perfmon_init
 - Add lockdep_assert_held to ethosu_perfmon_stop_locked
---
 drivers/accel/ethosu/Makefile         |   2 +-
 drivers/accel/ethosu/ethosu_device.h  |  33 +++
 drivers/accel/ethosu/ethosu_drv.c     |  23 +-
 drivers/accel/ethosu/ethosu_drv.h     |  61 +++++-
 drivers/accel/ethosu/ethosu_job.c     |  39 +++-
 drivers/accel/ethosu/ethosu_job.h     |   2 +
 drivers/accel/ethosu/ethosu_perfmon.c | 298 ++++++++++++++++++++++++++
 include/uapi/drm/ethosu_accel.h       |  60 +++++-
 8 files changed, 504 insertions(+), 14 deletions(-)
 create mode 100644 drivers/accel/ethosu/ethosu_perfmon.c

diff --git a/drivers/accel/ethosu/Makefile b/drivers/accel/ethosu/Makefile
index 17db5a600416..598a388b7179 100644
--- a/drivers/accel/ethosu/Makefile
+++ b/drivers/accel/ethosu/Makefile
@@ -1,4 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 
 obj-$(CONFIG_DRM_ACCEL_ARM_ETHOSU) := ethosu.o
-ethosu-y += ethosu_drv.o ethosu_gem.o ethosu_job.o
+ethosu-y += ethosu_drv.o ethosu_gem.o ethosu_job.o ethosu_perfmon.o
diff --git a/drivers/accel/ethosu/ethosu_device.h b/drivers/accel/ethosu/ethosu_device.h
index b189fa783d6a..3a1d07d94785 100644
--- a/drivers/accel/ethosu/ethosu_device.h
+++ b/drivers/accel/ethosu/ethosu_device.h
@@ -6,6 +6,7 @@
 
 #include <linux/bitfield.h>
 #include <linux/bits.h>
+#include <linux/mutex.h>
 #include <linux/types.h>
 
 #include <drm/drm_device.h>
@@ -43,6 +44,15 @@ struct gen_pool;
 #define NPU_REG_BASEP_HI(x)	(0x0084 + (x) * 8)
 #define NPU_BASEP_REGION_MAX	8
 
+#define NPU_REG_PMCR		0x0180
+#define NPU_REG_PMCNTENSET	0x0184
+#define NPU_REG_PMCNTENCLR	0x0188
+#define NPU_REG_PMCCNTR_LO	0x01A0
+#define NPU_REG_PMCCNTR_HI	0x01A4
+#define NPU_REG_PMCCNTR_CFG	0x01A8
+#define NPU_REG_PMU_EVCNTR(x)	(0x0300 + (x) * 4)
+#define NPU_REG_PMU_EVTYPER(x)	(0x0380 + (x) * 4)
+
 #define ID_ARCH_MAJOR_MASK	GENMASK(31, 28)
 #define ID_ARCH_MINOR_MASK	GENMASK(27, 20)
 #define ID_ARCH_PATCH_MASK	GENMASK(19, 16)
@@ -67,6 +77,15 @@ struct gen_pool;
 
 #define PROT_ACTIVE_CSL		BIT(1)
 
+#define PMCR_NUM_EVENT_CNT_MASK	GENMASK(15, 11)
+#define PMCR_CYCLE_CNT_RST	BIT(2)
+#define PMCR_EVENT_CNT_RST	BIT(1)
+#define PMCR_CNT_EN		BIT(0)
+
+#define PMU_EV_TYPE_NONE	0
+#define PMU_EV_TYPE_CYCLES	0x11
+#define PMU_EV_TYPE_IDLE	0x20
+
 enum ethosu_cmds {
 	NPU_OP_CONV = 0x2,
 	NPU_OP_DEPTHWISE = 0x3,
@@ -152,6 +171,8 @@ enum ethosu_cmds {
 
 #define ETHOSU_SRAM_REGION	2	/* Matching Vela compiler */
 
+struct ethosu_perfmon;
+
 /**
  * struct ethosu_device - Ethosu device
  */
@@ -161,6 +182,7 @@ struct ethosu_device {
 
 	/** @iomem: CPU mapping of the registers. */
 	void __iomem *regs;
+	void __iomem *pmu_regs;
 
 	void __iomem *sram;
 	struct gen_pool *srampool;
@@ -184,6 +206,17 @@ struct ethosu_device {
 	struct mutex sched_lock;
 	u64 fence_context;
 	u64 emit_seqno;
+
+	/* Tracks the performance monitor state. */
+	struct {
+		/* Protects @active. */
+		struct mutex lock;
+
+		/* Perfmon currently programmed in HW (or NULL if none). */
+		struct ethosu_perfmon *active;
+	} perfmon_state;
+
+	struct ethosu_perfmon *global_perfmon;
 };
 
 #define to_ethosu_device(drm_dev) \
diff --git a/drivers/accel/ethosu/ethosu_drv.c b/drivers/accel/ethosu/ethosu_drv.c
index 9992193d7338..105517e700e2 100644
--- a/drivers/accel/ethosu/ethosu_drv.c
+++ b/drivers/accel/ethosu/ethosu_drv.c
@@ -155,6 +155,7 @@ static int ethosu_open(struct drm_device *ddev, struct drm_file *file)
 	if (ret)
 		goto err_put_mod;
 
+	ethosu_perfmon_open_file(priv);
 	file->driver_priv = no_free_ptr(priv);
 	return 0;
 
@@ -166,6 +167,7 @@ static int ethosu_open(struct drm_device *ddev, struct drm_file *file)
 static void ethosu_postclose(struct drm_device *ddev, struct drm_file *file)
 {
 	ethosu_job_close(file->driver_priv);
+	ethosu_perfmon_close_file(file->driver_priv);
 	kfree(file->driver_priv);
 	module_put(THIS_MODULE);
 }
@@ -180,6 +182,10 @@ static const struct drm_ioctl_desc ethosu_drm_driver_ioctls[] = {
 	ETHOSU_IOCTL(BO_MMAP_OFFSET, bo_mmap_offset, 0),
 	ETHOSU_IOCTL(CMDSTREAM_BO_CREATE, cmdstream_bo_create, 0),
 	ETHOSU_IOCTL(SUBMIT, submit, 0),
+	ETHOSU_IOCTL(PERFMON_CREATE, perfmon_create, 0),
+	ETHOSU_IOCTL(PERFMON_DESTROY, perfmon_destroy, 0),
+	ETHOSU_IOCTL(PERFMON_GET_VALUES, perfmon_get_values, 0),
+	ETHOSU_IOCTL(PERFMON_SET_GLOBAL, perfmon_set_global, 0),
 };
 
 DEFINE_DRM_ACCEL_FOPS(ethosu_drm_driver_fops);
@@ -312,11 +318,16 @@ static int ethosu_init(struct ethosu_device *ethosudev)
 
 	ethosudev->npu_info.id = id = readl_relaxed(ethosudev->regs + NPU_REG_ID);
 	ethosudev->npu_info.config = config = readl_relaxed(ethosudev->regs + NPU_REG_CONFIG);
-
 	ethosu_sram_init(ethosudev);
 
+	if (!ethosu_is_u65(ethosudev))
+		ethosudev->pmu_regs += 0x1000;
+
+	ethosudev->npu_info.pmu_counters = FIELD_GET(PMCR_NUM_EVENT_CNT_MASK,
+		readl_relaxed(ethosudev->pmu_regs + NPU_REG_PMCR));
+
 	dev_info(ethosudev->base.dev,
-		 "Ethos-U NPU, arch v%ld.%ld.%ld, rev r%ldp%ld, cmd stream ver%ld, %d MACs, %dKB SRAM\n",
+		 "Ethos-U NPU, arch v%ld.%ld.%ld, rev r%ldp%ld, cmd stream ver%ld, %d MACs, %dKB SRAM, %d PMU cntrs\n",
 		 FIELD_GET(ID_ARCH_MAJOR_MASK, id),
 		 FIELD_GET(ID_ARCH_MINOR_MASK, id),
 		 FIELD_GET(ID_ARCH_PATCH_MASK, id),
@@ -324,7 +335,8 @@ static int ethosu_init(struct ethosu_device *ethosudev)
 		 FIELD_GET(ID_VER_MINOR_MASK, id),
 		 FIELD_GET(CONFIG_CMD_STREAM_VER_MASK, config),
 		 1 << FIELD_GET(CONFIG_MACS_PER_CC_MASK, config),
-		 ethosudev->npu_info.sram_size / 1024);
+		 ethosudev->npu_info.sram_size / 1024,
+		 ethosudev->npu_info.pmu_counters);
 
 	return 0;
 }
@@ -343,11 +355,16 @@ static int ethosu_probe(struct platform_device *pdev)
 	dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40));
 
 	ethosudev->regs = devm_platform_ioremap_resource(pdev, 0);
+	ethosudev->pmu_regs = ethosudev->regs;
 
 	ethosudev->num_clks = devm_clk_bulk_get_all(&pdev->dev, &ethosudev->clks);
 	if (ethosudev->num_clks < 0)
 		return ethosudev->num_clks;
 
+	ret = devm_mutex_init(&pdev->dev, &ethosudev->perfmon_state.lock);
+	if (ret)
+		return ret;
+
 	ret = ethosu_job_init(ethosudev);
 	if (ret)
 		return ret;
diff --git a/drivers/accel/ethosu/ethosu_drv.h b/drivers/accel/ethosu/ethosu_drv.h
index 9e21dfe94184..2193bc51d425 100644
--- a/drivers/accel/ethosu/ethosu_drv.h
+++ b/drivers/accel/ethosu/ethosu_drv.h
@@ -1,15 +1,74 @@
 /* SPDX-License-Identifier: GPL-2.0-only OR MIT */
-/* Copyright 2025 Arm, Ltd. */
+/* Copyright 2025-2026 Arm, Ltd. */
 #ifndef __ETHOSU_DRV_H__
 #define __ETHOSU_DRV_H__
 
+#include <linux/mutex.h>
+#include <linux/xarray.h>
 #include <drm/gpu_scheduler.h>
 
 struct ethosu_device;
+struct drm_device;
+struct drm_file;
 
 struct ethosu_file_priv {
 	struct ethosu_device *edev;
 	struct drm_sched_entity sched_entity;
+	struct xarray perfmons;
 };
 
+/* Performance monitor object. The perfmon lifetime is controlled by userspace
+ * using perfmon related ioctls. A perfmon can be attached to a DRM_ETHOSU_SUBMIT
+ * request, and when this is the case, HW perf counters will be activated just
+ * before the job is submitted to the NPU and disabled when the job is
+ * done. This way, only events related to a specific job will be counted.
+ */
+struct ethosu_perfmon {
+	/* Tracks the number of users of the perfmon, when this counter reaches
+	 * zero the perfmon is destroyed.
+	 */
+	refcount_t refcnt;
+
+	/* Number of counters activated in this perfmon instance
+	 * (should be less than or equal to DRM_ETHOSU_MAX_PERF_COUNTERS).
+	 */
+	u8 ncounters;
+
+	/* Events counted by the HW perf counters. */
+	u16 counters[DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS];
+
+	/*
+	 * Storage for counter values. Counters are incremented by the HW
+	 * perf counter values every time the perfmon is attached to an
+	 * NPU job. This way, perfmon users don't have to retrieve the
+	 * results after each job if they want to track events covering
+	 * several submissions. Note that counter values can't be reset,
+	 * but you can fake a reset by destroying the perfmon and
+	 * creating a new one.
+	 */
+	u64 values[] __counted_by(ncounters);
+};
+
+/* ethosu_perfmon.c */
+void ethosu_perfmon_get(struct ethosu_perfmon *perfmon);
+void ethosu_perfmon_put(struct ethosu_perfmon *perfmon);
+void ethosu_perfmon_start(struct ethosu_device *ethosu,
+			  struct ethosu_perfmon *perfmon);
+void ethosu_perfmon_stop(struct ethosu_device *ethosu,
+			 struct ethosu_perfmon *perfmon, bool capture);
+void ethosu_perfmon_stop_locked(struct ethosu_device *ethosu, struct ethosu_perfmon *perfmon,
+				bool capture);
+struct ethosu_perfmon *ethosu_perfmon_find(struct ethosu_file_priv *ethosu_priv,
+					   int id);
+void ethosu_perfmon_open_file(struct ethosu_file_priv *ethosu_priv);
+void ethosu_perfmon_close_file(struct ethosu_file_priv *ethosu_priv);
+int ethosu_ioctl_perfmon_create(struct drm_device *dev, void *data,
+				struct drm_file *file_priv);
+int ethosu_ioctl_perfmon_destroy(struct drm_device *dev, void *data,
+				 struct drm_file *file_priv);
+int ethosu_ioctl_perfmon_get_values(struct drm_device *dev, void *data,
+				    struct drm_file *file_priv);
+int ethosu_ioctl_perfmon_set_global(struct drm_device *dev, void *data,
+				    struct drm_file *file_priv);
+
 #endif
diff --git a/drivers/accel/ethosu/ethosu_job.c b/drivers/accel/ethosu/ethosu_job.c
index b76924645aaa..99dec33f526b 100644
--- a/drivers/accel/ethosu/ethosu_job.c
+++ b/drivers/accel/ethosu/ethosu_job.c
@@ -1,6 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0-only OR MIT
 /* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
-/* Copyright 2025 Arm, Ltd. */
+/* Copyright 2025-2026 Arm, Ltd. */
 
 #include <linux/bitfield.h>
 #include <linux/genalloc.h>
@@ -147,6 +147,8 @@ static void ethosu_job_err_cleanup(struct ethosu_job *job)
 {
 	unsigned int i;
 
+	ethosu_perfmon_put(job->perfmon);
+
 	for (i = 0; i < job->region_cnt; i++)
 		drm_gem_object_put(job->region_bo[i]);
 
@@ -181,6 +183,26 @@ static void ethosu_job_free(struct drm_sched_job *sched_job)
 	ethosu_job_put(job);
 }
 
+static void
+ethosu_switch_perfmon(struct ethosu_device *ethosu, struct ethosu_job *job)
+{
+	struct ethosu_perfmon *perfmon;
+
+	guard(mutex)(&ethosu->perfmon_state.lock);
+
+	perfmon = ethosu->global_perfmon;
+	if (!perfmon)
+		perfmon = job->perfmon;
+
+	if (perfmon == ethosu->perfmon_state.active)
+		return;
+
+	ethosu_perfmon_stop_locked(ethosu, ethosu->perfmon_state.active, true);
+
+	if (perfmon)
+		ethosu_perfmon_start(ethosu, perfmon);
+}
+
 static struct dma_fence *ethosu_job_run(struct drm_sched_job *sched_job)
 {
 	struct ethosu_job *job = to_ethosu_job(sched_job);
@@ -194,6 +216,8 @@ static struct dma_fence *ethosu_job_run(struct drm_sched_job *sched_job)
 		       dev->fence_context, ++dev->emit_seqno);
 	dma_fence_get(fence);
 
+	ethosu_switch_perfmon(dev, job);
+
 	scoped_guard(mutex, &dev->job_lock) {
 		dev->in_flight_job = job;
 		ethosu_job_hw_submit(dev, job);
@@ -365,7 +389,8 @@ void ethosu_job_close(struct ethosu_file_priv *ethosu_priv)
 }
 
 static int ethosu_ioctl_submit_job(struct drm_device *dev, struct drm_file *file,
-				   struct drm_ethosu_job *job)
+				   struct drm_ethosu_job *job,
+				   int perfmon_id)
 {
 	struct ethosu_device *edev = to_ethosu_device(dev);
 	struct ethosu_file_priv *file_priv = file->driver_priv;
@@ -389,6 +414,9 @@ static int ethosu_ioctl_submit_job(struct drm_device *dev, struct drm_file *file
 	ejob->dev = edev;
 	ejob->sram_size = job->sram_size;
 
+	if (perfmon_id)
+		ejob->perfmon = ethosu_perfmon_find(file_priv, perfmon_id);
+
 	ejob->done_fence = kzalloc_obj(*ejob->done_fence);
 	if (!ejob->done_fence) {
 		ret = -ENOMEM;
@@ -491,11 +519,6 @@ int ethosu_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *fil
 	int ret = 0;
 	unsigned int i = 0;
 
-	if (args->pad) {
-		drm_dbg(dev, "Reserved field in drm_ethosu_submit struct should be 0.\n");
-		return -EINVAL;
-	}
-
 	struct drm_ethosu_job __free(kvfree) *jobs =
 		kvmalloc_objs(*jobs, args->job_count);
 	if (!jobs)
@@ -509,7 +532,7 @@ int ethosu_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *fil
 	}
 
 	for (i = 0; i < args->job_count; i++) {
-		ret = ethosu_ioctl_submit_job(dev, file, &jobs[i]);
+		ret = ethosu_ioctl_submit_job(dev, file, &jobs[i], args->perfmon_id);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/accel/ethosu/ethosu_job.h b/drivers/accel/ethosu/ethosu_job.h
index ff1cf448d094..8988edd00eed 100644
--- a/drivers/accel/ethosu/ethosu_job.h
+++ b/drivers/accel/ethosu/ethosu_job.h
@@ -21,6 +21,8 @@ struct ethosu_job {
 	u8 region_cnt;
 	u32 sram_size;
 
+	struct ethosu_perfmon *perfmon;
+
 	/* Fence to be signaled by drm-sched once its done with the job */
 	struct dma_fence *inference_done_fence;
 
diff --git a/drivers/accel/ethosu/ethosu_perfmon.c b/drivers/accel/ethosu/ethosu_perfmon.c
new file mode 100644
index 000000000000..d1380e3f2ea3
--- /dev/null
+++ b/drivers/accel/ethosu/ethosu_perfmon.c
@@ -0,0 +1,298 @@
+// SPDX-License-Identifier: GPL-2.0-only OR MIT
+/* Copyright 2026 Arm, Ltd. */
+/* Based on v3d_perfmon.c, Copyright (C) 2021 Raspberry Pi */
+
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/pm_runtime.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+#include <drm/drm_file.h>
+#include <drm/drm_ioctl.h>
+
+#include <uapi/drm/ethosu_accel.h>
+
+#include "ethosu_drv.h"
+#include "ethosu_device.h"
+
+void ethosu_perfmon_get(struct ethosu_perfmon *perfmon)
+{
+	if (perfmon)
+		refcount_inc(&perfmon->refcnt);
+}
+
+void ethosu_perfmon_put(struct ethosu_perfmon *perfmon)
+{
+	if (perfmon && refcount_dec_and_test(&perfmon->refcnt)) {
+		kfree(perfmon);
+	}
+}
+
+void ethosu_perfmon_start(struct ethosu_device *ethosu, struct ethosu_perfmon *perfmon)
+{
+	unsigned int i;
+	u8 ncounters;
+	u32 mask;
+
+	if (WARN_ON_ONCE(!perfmon || ethosu->perfmon_state.active))
+		return;
+
+	writel_relaxed(PMCR_CNT_EN, ethosu->pmu_regs + NPU_REG_PMCR);
+	writel_relaxed(PMU_EV_TYPE_CYCLES, ethosu->pmu_regs + NPU_REG_PMCCNTR_CFG);
+
+	mask = 0x80000000;
+	ncounters = perfmon->ncounters - 1;
+	if (ncounters)
+		mask |= GENMASK(ncounters - 1, 0);
+
+	for (i = 0; i < ncounters; i++)
+		writel_relaxed(perfmon->counters[i], ethosu->pmu_regs + NPU_REG_PMU_EVTYPER(i));
+
+	writel_relaxed(mask, ethosu->pmu_regs + NPU_REG_PMCNTENSET);
+	writel_relaxed(PMCR_CNT_EN | PMCR_EVENT_CNT_RST | PMCR_CYCLE_CNT_RST,
+		ethosu->pmu_regs + NPU_REG_PMCR);
+	ethosu->perfmon_state.active = perfmon;
+}
+
+void ethosu_perfmon_stop_locked(struct ethosu_device *ethosu, struct ethosu_perfmon *perfmon,
+				bool capture)
+{
+	unsigned int i;
+	u8 ncounters;
+	u32 mask;
+
+	lockdep_assert_held(&ethosu->perfmon_state.lock);
+
+	if (!perfmon || perfmon != ethosu->perfmon_state.active)
+		return;
+
+	ncounters = perfmon->ncounters - 1;
+
+	if (!pm_runtime_get_if_active(ethosu->base.dev)) {
+		ethosu->perfmon_state.active = NULL;
+		return;
+	}
+
+	if (capture) {
+		for (i = 0; i < ncounters; i++)
+			perfmon->values[i] += readl_relaxed(ethosu->pmu_regs + NPU_REG_PMU_EVCNTR(i));
+		perfmon->values[ncounters] +=
+			readl_relaxed(ethosu->pmu_regs + NPU_REG_PMCCNTR_LO) |
+			(u64)readl_relaxed(ethosu->pmu_regs + NPU_REG_PMCCNTR_HI) << 32;
+	}
+
+	mask = 0x80000000;
+	if (ncounters)
+		mask |= GENMASK(ncounters - 1, 0);
+	writel_relaxed(mask, ethosu->pmu_regs + NPU_REG_PMCNTENCLR);
+
+	writel_relaxed(0, ethosu->pmu_regs + NPU_REG_PMCR);
+	ethosu->perfmon_state.active = NULL;
+
+	pm_runtime_put(ethosu->base.dev);
+}
+
+void ethosu_perfmon_stop(struct ethosu_device *ethosu, struct ethosu_perfmon *perfmon,
+			 bool capture)
+{
+	if (!perfmon)
+		return;
+
+	guard(mutex)(&ethosu->perfmon_state.lock);
+	ethosu_perfmon_stop_locked(ethosu, perfmon, capture);
+}
+
+struct ethosu_perfmon *ethosu_perfmon_find(struct ethosu_file_priv *ethosu_priv, int id)
+{
+	struct ethosu_perfmon *perfmon;
+
+	xa_lock(&ethosu_priv->perfmons);
+	perfmon = xa_load(&ethosu_priv->perfmons, id);
+	ethosu_perfmon_get(perfmon);
+	xa_unlock(&ethosu_priv->perfmons);
+
+	return perfmon;
+}
+
+void ethosu_perfmon_open_file(struct ethosu_file_priv *ethosu_priv)
+{
+	xa_init_flags(&ethosu_priv->perfmons, XA_FLAGS_ALLOC1);
+}
+
+static void ethosu_perfmon_delete(struct ethosu_file_priv *ethosu_priv,
+				 struct ethosu_perfmon *perfmon)
+{
+	struct ethosu_device *ethosu = ethosu_priv->edev;
+
+	/* If the active perfmon is being destroyed, stop it first */
+	scoped_guard(mutex, &ethosu->perfmon_state.lock) {
+		/* If the global perfmon is being destroyed, set it to NULL */
+		if (ethosu->global_perfmon == perfmon) {
+			ethosu->global_perfmon = NULL;
+			ethosu_perfmon_put(perfmon);
+		}
+
+		ethosu_perfmon_stop_locked(ethosu, perfmon, false);
+	}
+
+	ethosu_perfmon_put(perfmon);
+}
+
+void ethosu_perfmon_close_file(struct ethosu_file_priv *ethosu_priv)
+{
+	struct ethosu_perfmon *perfmon;
+	unsigned long id;
+
+	xa_for_each(&ethosu_priv->perfmons, id, perfmon)
+		ethosu_perfmon_delete(ethosu_priv, perfmon);
+
+	xa_destroy(&ethosu_priv->perfmons);
+}
+
+int ethosu_ioctl_perfmon_create(struct drm_device *dev, void *data,
+			     struct drm_file *file_priv)
+{
+	struct ethosu_file_priv *ethosu_priv = file_priv->driver_priv;
+	struct drm_ethosu_perfmon_create *req = data;
+	struct ethosu_device *ethosu = to_ethosu_device(dev);
+	struct ethosu_perfmon *perfmon;
+	unsigned int i, event_max;
+	int ret;
+	u32 id;
+
+	/* Number of monitored counters cannot exceed HW limits. */
+	if (req->ncounters > ethosu->npu_info.pmu_counters)
+		return -EINVAL;
+
+	/* Make sure all counters are valid. */
+	event_max = ethosu_is_u65(ethosu) ? 433 : 671;
+	for (i = 0; i < req->ncounters; i++) {
+		if (req->counters[i] > event_max)
+			return -EINVAL;
+	}
+
+	/* Add 1 more counter for cycle counter */
+	req->ncounters++;
+
+	perfmon = kzalloc_flex(*perfmon, values, req->ncounters);
+	if (!perfmon)
+		return -ENOMEM;
+
+	for (i = 0; i < req->ncounters - 1; i++)
+		perfmon->counters[i] = req->counters[i];
+
+	perfmon->ncounters = req->ncounters;
+
+	refcount_set(&perfmon->refcnt, 1);
+
+	ret = xa_alloc(&ethosu_priv->perfmons, &id, perfmon, xa_limit_32b,
+		       GFP_KERNEL);
+
+	if (ret < 0) {
+		kfree(perfmon);
+		return ret;
+	}
+
+	req->id = id;
+
+	return 0;
+}
+
+int ethosu_ioctl_perfmon_destroy(struct drm_device *dev, void *data,
+				 struct drm_file *file_priv)
+{
+	struct ethosu_file_priv *ethosu_priv = file_priv->driver_priv;
+	struct drm_ethosu_perfmon_destroy *req = data;
+	struct ethosu_perfmon *perfmon;
+
+	perfmon = xa_erase(&ethosu_priv->perfmons, req->id);
+	if (!perfmon)
+		return -EINVAL;
+
+	ethosu_perfmon_delete(ethosu_priv, perfmon);
+
+	return 0;
+}
+
+int ethosu_ioctl_perfmon_get_values(struct drm_device *dev, void *data,
+				 struct drm_file *file_priv)
+{
+	struct ethosu_device *ethosu = to_ethosu_device(dev);
+	struct ethosu_file_priv *ethosu_priv = file_priv->driver_priv;
+	struct drm_ethosu_perfmon_get_values *req = data;
+	struct ethosu_perfmon *perfmon;
+	int ret = 0;
+
+	if (req->pad != 0)
+		return -EINVAL;
+
+	perfmon = ethosu_perfmon_find(ethosu_priv, req->id);
+	if (!perfmon)
+		return -EINVAL;
+
+	ret = pm_runtime_resume_and_get(dev->dev);
+	if (ret) {
+		ethosu_perfmon_put(perfmon);
+		return ret;
+	}
+	ethosu_perfmon_stop(ethosu, perfmon, true);
+
+	pm_runtime_put_autosuspend(dev->dev);
+
+	if (copy_to_user(u64_to_user_ptr(req->values_ptr), perfmon->values,
+			 perfmon->ncounters * sizeof(u64)))
+		ret = -EFAULT;
+
+	ethosu_perfmon_put(perfmon);
+
+	return ret;
+}
+
+int ethosu_ioctl_perfmon_set_global(struct drm_device *dev, void *data,
+				 struct drm_file *file_priv)
+{
+	struct ethosu_file_priv *ethosu_priv = file_priv->driver_priv;
+	struct drm_ethosu_perfmon_set_global *req = data;
+	struct ethosu_device *ethosu = to_ethosu_device(dev);
+	struct ethosu_perfmon *perfmon;
+
+	if (req->flags & ~DRM_ETHOSU_PERFMON_CLEAR_GLOBAL)
+		return -EINVAL;
+
+	perfmon = ethosu_perfmon_find(ethosu_priv, req->id);
+	if (!perfmon)
+		return -EINVAL;
+
+	/* If the request is to clear the global performance monitor */
+	if (req->flags & DRM_ETHOSU_PERFMON_CLEAR_GLOBAL) {
+		struct ethosu_perfmon *old;
+		scoped_guard(mutex, &ethosu->perfmon_state.lock) {
+			old = ethosu->global_perfmon;
+			if (!old) {
+				ethosu_perfmon_put(perfmon);
+				return -EINVAL;
+			}
+
+			ethosu->global_perfmon = NULL;
+			ethosu_perfmon_stop_locked(ethosu, old, true);
+		}
+
+		ethosu_perfmon_put(old);
+		ethosu_perfmon_put(perfmon);
+
+		return 0;
+	}
+
+	scoped_guard(mutex, &ethosu->perfmon_state.lock) {
+		if (ethosu->perfmon_state.active || ethosu->global_perfmon) {
+			ethosu_perfmon_put(perfmon);
+			return -EBUSY;
+		}
+
+		ethosu->global_perfmon = perfmon;
+	}
+
+	return 0;
+}
diff --git a/include/uapi/drm/ethosu_accel.h b/include/uapi/drm/ethosu_accel.h
index af78bb4686d7..5b97d59a7806 100644
--- a/include/uapi/drm/ethosu_accel.h
+++ b/include/uapi/drm/ethosu_accel.h
@@ -43,6 +43,11 @@ enum drm_ethosu_ioctl_id {
 
 	/** @DRM_ETHOSU_SUBMIT: Submit a job and BOs to run. */
 	DRM_ETHOSU_SUBMIT,
+
+	DRM_ETHOSU_PERFMON_CREATE,
+	DRM_ETHOSU_PERFMON_DESTROY,
+	DRM_ETHOSU_PERFMON_GET_VALUES,
+	DRM_ETHOSU_PERFMON_SET_GLOBAL,
 };
 
 /**
@@ -79,6 +84,7 @@ struct drm_ethosu_npu_info {
 	__u32 config;
 
 	__u32 sram_size;
+	__u32 pmu_counters;
 };
 
 /**
@@ -220,10 +226,54 @@ struct drm_ethosu_submit {
 	/** Input: Number of jobs passed in. */
 	__u32 job_count;
 
-	/** Reserved, must be zero. */
+	/** Input: Id returned by DRM_ETHOSU_PERFMON_CREATE */
+	__u32 perfmon_id;
+};
+
+#define DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS	8
+#define DRM_ETHOSU_MAX_PERF_COUNTERS \
+	(DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS + 1)
+
+struct drm_ethosu_perfmon_create {
+	__u32 id;
+	__u32 ncounters;
+	__u16 counters[DRM_ETHOSU_MAX_PERF_EVENT_COUNTERS];
+};
+
+struct drm_ethosu_perfmon_destroy {
+	__u32 id;
 	__u32 pad;
 };
 
+/*
+ * Returns the values of the performance counters tracked by this
+ * perfmon (as an array of (ncounters + 1) u64 values).
+ *
+ * No implicit synchronization is performed, so the user has to
+ * guarantee that any jobs using this perfmon have already been
+ * completed.
+ */
+struct drm_ethosu_perfmon_get_values {
+	__u32 id;
+	__u32 pad;
+	__u64 values_ptr;
+};
+
+#define DRM_ETHOSU_PERFMON_CLEAR_GLOBAL    0x0001
+
+/**
+ * struct drm_ethosu_perfmon_set_global - ioctl to define a global performance
+ * monitor
+ *
+ * The global performance monitor will be used for all jobs. If a global
+ * performance monitor is defined, jobs with a self-defined performance
+ * monitor won't be allowed.
+ */
+struct drm_ethosu_perfmon_set_global {
+	__u32 flags;
+	__u32 id;
+};
+
 /**
  * DRM_IOCTL_ETHOSU() - Build a ethosu IOCTL number
  * @__access: Access type. Must be R, W or RW.
@@ -252,6 +302,14 @@ enum {
 		DRM_IOCTL_ETHOSU(WR, CMDSTREAM_BO_CREATE, cmdstream_bo_create),
 	DRM_IOCTL_ETHOSU_SUBMIT =
 		DRM_IOCTL_ETHOSU(WR, SUBMIT, submit),
+	DRM_IOCTL_ETHOSU_PERFMON_CREATE =
+		DRM_IOCTL_ETHOSU(WR, PERFMON_CREATE, perfmon_create),
+	DRM_IOCTL_ETHOSU_PERFMON_DESTROY =
+		DRM_IOCTL_ETHOSU(WR, PERFMON_DESTROY, perfmon_destroy),
+	DRM_IOCTL_ETHOSU_PERFMON_GET_VALUES =
+		DRM_IOCTL_ETHOSU(WR, PERFMON_GET_VALUES, perfmon_get_values),
+	DRM_IOCTL_ETHOSU_PERFMON_SET_GLOBAL =
+		DRM_IOCTL_ETHOSU(WR, PERFMON_SET_GLOBAL, perfmon_set_global),
 };
 
 #if defined(__cplusplus)
-- 
2.54.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Claude review: Re: [PATCH v2] accel: ethosu: Add performance counter support
  2026-05-16 13:23 ` [PATCH v2] " Maíra Canal
@ 2026-05-25  7:53   ` Claude Code Review Bot
  0 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-05-25  7:53 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**UAPI issues:**

1. **`drm_ethosu_npu_info` breaks existing ABI** (ethosu_accel.h). Adding `pmu_counters` after `sram_size` changes the struct size. Since this struct is used with `DRM_IOCTL_ETHOSU(R, ...)`, the ioctl number encodes the struct size. Old userspace calling `GET_NPU_INFO` will use a different ioctl number and get `-ENOTTY`. This is likely acceptable for a new driver that hasn't shipped a stable ABI yet, but should be explicitly called out in the commit message.

2. **`drm_ethosu_perfmon_create` has no padding field** — The struct is 24 bytes (`__u32 id` + `__u32 ncounters` + `__u16 counters[8]`), which is 8-byte aligned. This is fine, but `ncounters` is modified by the kernel (incremented to include cycle counter at line 1553: `req->ncounters++`). The DRM ioctl infrastructure copies data back to userspace for WR ioctls, so this modified value will be visible to userspace. This is a UAPI concern — userspace passes N but gets N+1 back. This should either be documented explicitly or handled differently (e.g., don't modify `req->ncounters` in place).

3. **`drm_ethosu_perfmon_destroy` pad field is never checked** — v3 added a `__u32 pad` field for alignment, but the destroy ioctl handler never validates `req->pad == 0`. This is inconsistent with `perfmon_get_values` which does check its pad field.

**Locking / concurrency issues:**

4. **`global_perfmon` accessed without lock protection in `ethosu_switch_perfmon`** (ethosu_job.c, around line 1290):
```c
perfmon = ethosu->global_perfmon;
```
This reads `global_perfmon` under `perfmon_state.lock`, which is correct. However, `ethosu_ioctl_perfmon_set_global` also accesses `global_perfmon` under the same lock, so this is actually fine. Good.

5. **`ethosu_perfmon_start` called under mutex in `ethosu_switch_perfmon` but accesses MMIO registers** — `ethosu_perfmon_start` does `writel_relaxed` to program counters. This is called from `ethosu_job_run` which is the scheduler's `run_job` callback. The function holds `perfmon_state.lock` (mutex). Since the NPU should already be powered at this point (job is being submitted), this should be okay, but there's no `pm_runtime_get` protecting these register writes, unlike `ethosu_perfmon_stop_locked` which does call `pm_runtime_get_if_active`. If the device gets suspended between the fence init and the MMIO writes, the register accesses would hit powered-down hardware. The `job_lock` mutex below should serialize with the IRQ handler, but pm_runtime is the concern.

6. **Race in `ethosu_perfmon_find`** (ethosu_perfmon.c, around line 1483-1493):
```c
xa_lock(&ethosu_priv->perfmons);
perfmon = xa_load(&ethosu_priv->perfmons, id);
ethosu_perfmon_get(perfmon);
xa_unlock(&ethosu_priv->perfmons);
```
Using `xa_lock`/`xa_unlock` manually here works for serializing the load+refcount_inc, but `xa_erase` in `perfmon_destroy` does not hold this lock — it uses its own internal locking. There is a TOCTOU window: `perfmon_find` could load and start incrementing the refcount on a perfmon that `perfmon_destroy` just erased and is about to `perfmon_delete`. The `refcount_inc` in `ethosu_perfmon_get` will WARN if the refcount is already 0. This needs the xa_lock to be held across both the erase and the first put, or use `xa_cmpxchg`.

7. **`ethosu_perfmon_get` uses `refcount_inc` which will WARN/BUG if refcount is 0** — If `ethosu_perfmon_find` races with destroy (see above), this could trigger a refcount saturation warning. Using `refcount_inc_not_zero` and returning NULL on failure would be safer.

**Correctness issues:**

8. **`ethosu_perfmon_find` can return NULL perfmon but `ethosu_ioctl_submit_job` doesn't check** (ethosu_job.c, around line 1331-1332):
```c
if (perfmon_id)
    ejob->perfmon = ethosu_perfmon_find(file_priv, perfmon_id);
```
If `perfmon_id` is nonzero but the perfmon doesn't exist (or was destroyed), `ethosu_perfmon_find` returns NULL and the job proceeds without a perfmon rather than returning an error. This should return `-ENOENT` or `-EINVAL` if the perfmon_id was specified but not found.

9. **`perfmon_get_values` stops the perfmon then doesn't restart it** (ethosu_perfmon.c, around line 1616):
```c
ethosu_perfmon_stop(ethosu, perfmon, true);
```
After reading values, the perfmon is stopped and `perfmon_state.active` is set to NULL. If this was a global perfmon actively counting, it won't be restarted until the next job switches perfmon. The comment says "No implicit synchronization is performed, so the user has to guarantee that any jobs using this perfmon have already been completed" — but for a *global* perfmon this semantic is awkward, since the user may want to read accumulated values mid-stream. The behavior should at least be documented.

10. **Magic number `0x80000000` for cycle counter enable bit** (ethosu_perfmon.c, lines ~1421, 1462, 1549):
```c
mask = 0x80000000;
```
This appears to be bit 31 of the PMCNTENSET/PMCNTENCLR register, presumably the cycle counter enable. It should be a named define (e.g., `PMCNTEN_CCNT_EN BIT(31)`) for readability, consistent with the other PMU defines.

11. **v2 patch is included in the same mbox as v3** — This will confuse automated tooling that tries to apply patches from the mbox. Only the latest version should typically be in a single mbox for review/application purposes. This appears to be an artifact of the filtering though, not the submitter's fault.

**Minor issues:**

12. **Missing error path for `IS_ERR(ethosudev->regs)` propagation** — The existing code at `ethosu_probe` line 345 already has this bug (no error check on `devm_platform_ioremap_resource`), but the patch adds `ethosudev->pmu_regs = ethosudev->regs` right after, inheriting the issue. The `pmu_regs += 0x1000` adjustment in `ethosu_init` would then corrupt an ERR_PTR. Not introduced by this patch, but worth noting.

13. **`drm_ethosu_perfmon_create.ncounters` should validate `ncounters > 0`** — If userspace passes `ncounters = 0`, the code increments to 1 (for the cycle counter), allocates a perfmon with 1 value slot, and sets `ncounters - 1 = 0` event counters. This technically works (only the cycle counter is used), but the intent is unclear — should a perfmon with zero event counters be allowed?

14. **Comment says "perform" instead of "perfmon"** — In `ethosu_drv.h` (v2 only), the comment reads "The perform lifetime is controlled by userspace". v3 fixes this to "The perfmon lifetime...". Good.

**Summary:** The overall design is sound and follows the V3D pattern well. The most critical issues are the race between find and destroy (#6/#7), the missing NULL-check after `perfmon_find` in the submit path (#8), and the UAPI semantics around `ncounters` being modified in-place (#2). The pm_runtime concern in `perfmon_start` (#5) deserves investigation. The rest are style/documentation issues.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-05-25  7:53 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-15  3:26 [PATCH v2] accel: ethosu: Add performance counter support Rob Herring (Arm)
2026-05-16  0:08 ` Claude review: " Claude Code Review Bot
2026-05-16  0:08 ` Claude Code Review Bot
2026-05-16 13:23 ` [PATCH v2] " Maíra Canal
2026-05-25  7:53   ` Claude review: " Claude Code Review Bot
2026-05-23  8:37 ` [PATCH v3] " Tomeu Vizoso

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox