[PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation

public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed

* [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation
@ 2026-05-31 20:18 Maíra Canal
  2026-05-31 20:18 ` [PATCH v2 1/4] drm/v3d: Fix global performance monitor reference counting Maíra Canal
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Maíra Canal @ 2026-05-31 20:18 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, David Airlie,
	Simona Vetter
  Cc: kernel-dev, dri-devel, stable, Maíra Canal

A v3d core is able to expose a single set of HW performance counters, so at
any moment at most one perfmon can be programmed in HW. Currently, the
driver tracks the active perfmon with v3d_dev->active_perfmon, but three
long-standing issues makes perfmon handling unreliable:

1. The active_perfmon pointer is accessed lock-free from scheduler
   callbacks, the GPU-reset path and the perfmon ioctls. Note that the
   v3d_perfmon->lock mutex serialized start/stop of one perfmon object
   against itself, but the invariant that needs protection is device-wide.

2. perfmon start/stop is hooked exclusively to run_job() callbacks via
   v3d_switch_perfmon(). If nothing is queued behind a perfmon-carrying
   job, the perfmon is never actually stopped.

3. A non-global perfmon should count events generated by a specific
   submission, but the scheduler can run jobs from different queues
   concurrently. Without explicit cross-queue serialization, an unrelated
   job running in parallel pollutes the counters and produces unusable
   results.

This series aims to address all three issues.

PATCH 1 is a minimal, stable-targeted fix for a separate problem: the
SET_GLOBAL ioctl leaks the perfmon reference on several paths. It is kept
self-contained so it can be backported on its own.

PATCH 2 moves the locking to where the invariant actually lives (fixing
issue #1) and replaces the sleeping mutex with a spinlock, which allows us
to stop the perfmon from the IRQ handler at job-completion time (the
natural boundary for "active perfmon follows the active job") and fixes
issue #2.

PATCH 3 addresses issue #3 by building on the new locking to enforce
cross-queue serialization when a non-global perfmon is attached, by adding
scheduler fence dependencies during submission. The fence dependencies
allow us to enforce two rules:

1. A job that carries a non-global perfmon waits for every job currently
   in-flight across all HW queues to finish.

2. While such a job is in-flight, any subsequently submitted job waits
   for it.

This allows us to ensure cross-queue isolation and the reliability of
the performance counter values.

PATCH 4 is a cleanup that drops the now-redundant queue argument from
v3d_job_add_syncobjs(), as struct v3d_job carries its submission queue
after PATCH 3.

To make sure that this series actually produces the expected results and
improves the overall reliability of v3d's performance monitors, this
series is accompanied by a IGT series [1], which was already merged.

This series depends on [2].

[1] https://lore.kernel.org/igt-dev/20260514201637.1811428-1-mcanal@igalia.com/T/
[2] https://lore.kernel.org/dri-devel/20260510-v3d-sched-misc-fixes-v2-0-ca4aba343ef6@igalia.com/T/

Best regards,
- Maíra

---
v1 -> v2: https://lore.kernel.org/r/20260508-v3d-perfmon-lifetime-v1-0-f5b5642c085f@igalia.com

- Rebased on top of "[PATCH v2 00/14] drm/v3d: Scheduler and submission
  fixes and refactoring"
- [1/4] NEW PATCH: "drm/v3d: Fix global performance monitor reference counting"
        - Minimal patch for stable branches only fixing the reference leaks
          in global perfmons.
- [2/4] Start/stop the global perfmon inside the set_global_perfmon ioctl and
        simplify global perfmon management across the helpers (Iago Toral)
        - In the reset path, before stopping the perfmon for the HW reset,
          v3d_reset() now re-arms the global perfmon with v3d_perfmon_resume(),
          as the global perfmon's start/stop points live only in the IOCTL.
        - v3d_perfmon_get_values_ioctl() no longer stops the perfmon, it
          only captures the values. Lifecycle management is left to the job
          (per-job perfmons) or the SET_GLOBAL ioctl (global perfmon).
- [2/4] In v3d_perfmon_delete(), first, stop the perfmon and then, check if
        it's a global perfmon (Iago Toral)
- [2/4] Add some comments to explain the refcount logic for global perfmons
        (Iago Toral)
- [3/4] Move the job->queue introduction to this patch instead of the
        previous one.
- [4/4] NEW PATCH: "drm/v3d: Drop the queue argument from v3d_job_add_syncobjs()"

---
Maíra Canal (4):
      drm/v3d: Fix global performance monitor reference counting
      drm/v3d: Refactor perfmon locking
      drm/v3d: Serialize jobs across queues when a perfmon is attached
      drm/v3d: Drop the queue argument from v3d_job_add_syncobjs()

 drivers/gpu/drm/v3d/v3d_drv.h     |  50 ++++++++--
 drivers/gpu/drm/v3d/v3d_gem.c     |   7 +-
 drivers/gpu/drm/v3d/v3d_irq.c     |   7 +-
 drivers/gpu/drm/v3d/v3d_perfmon.c | 195 +++++++++++++++++++++++++++++---------
 drivers/gpu/drm/v3d/v3d_power.c   |   4 +
 drivers/gpu/drm/v3d/v3d_sched.c   |  26 +----
 drivers/gpu/drm/v3d/v3d_submit.c  |  85 ++++++++++++++---
 7 files changed, 282 insertions(+), 92 deletions(-)
---
base-commit: 4c26e162947f91aa78ba57dd4fddd38fc80e7d60
change-id: 20260505-v3d-perfmon-lifetime-48c9ded1091b
prerequisite-change-id: 20260407-v3d-sched-misc-fixes-623739017e53:v2
prerequisite-patch-id: 01823e165a822ddec72b0b18e49c096d35149e9a
prerequisite-patch-id: 1df1c11ec62617336e2ed5445c24bbb912570035
prerequisite-patch-id: 738852cd3115283b43cef336ba8fe88616f28a88
prerequisite-patch-id: e1fa04bb45b0c1eb1478b2893b0dedcb6a825255
prerequisite-patch-id: 32c804d9921bcf259b236b3f1d74f7972aec02f2
prerequisite-patch-id: b1b437650405dd43ed324f9c02a0e591a793aec5
prerequisite-patch-id: f13898126dac8b6f14d8e1fba8804123f012889e
prerequisite-patch-id: e03f525b4491a3475ed0efa68bc8049f92be0bd0
prerequisite-patch-id: 971ac9100d4958aa2c0da2d734533eb9ea9d80b3
prerequisite-patch-id: 9d3d40ef80c032c0c05b058f6c397d04d1879236
prerequisite-patch-id: fb2e7a320a7ec3df75b74cfa6683a558f87135a2
prerequisite-patch-id: 29537da4c8f3c2a97f1350638e9da035cf3e7057
prerequisite-patch-id: 7303a43285dabde083921fe380a294c9026ef6e9
prerequisite-patch-id: bc42b0633cf33b2a693cc17534a6090f3813cc60

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/4] drm/v3d: Fix global performance monitor reference counting
  2026-05-31 20:18 [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation Maíra Canal
@ 2026-05-31 20:18 ` Maíra Canal
  2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
  2026-05-31 20:18 ` [PATCH v2 2/4] drm/v3d: Refactor perfmon locking Maíra Canal
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Maíra Canal @ 2026-05-31 20:18 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, David Airlie,
	Simona Vetter
  Cc: kernel-dev, dri-devel, stable, Maíra Canal

In the SET_GLOBAL ioctl, v3d_perfmon_find() bumps the reference count on
the perfmon it returns, but v3d_perfmon_set_global_ioctl() and
v3d_perfmon_delete() fail to release that reference on several paths:

  1. v3d_perfmon_set_global_ioctl() leaks the reference on its error
     paths.

  2. CLEAR_GLOBAL leaks both the find reference and the reference
     previously stashed in v3d->global_perfmon by the SET_GLOBAL ioctl
     that configured it.

  3. Destroying a perfmon that is the current global perfmon leaks the
     reference stashed by the SET_GLOBAL ioctl.

Release each of these references explicitly.

Cc: stable@vger.kernel.org
Fixes: c6eabbab359c ("drm/v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_perfmon.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c
index 02451fc09dbb..48ae748247be 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -319,8 +319,11 @@ static void v3d_perfmon_delete(struct v3d_file_priv *v3d_priv,
 	if (perfmon == v3d->active_perfmon)
 		v3d_perfmon_stop(v3d, perfmon, false);
 
-	/* If the global perfmon is being destroyed, set it to NULL */
-	cmpxchg(&v3d->global_perfmon, perfmon, NULL);
+	/* If the global perfmon is being destroyed, clean it and release
+	 * the reference stashed in v3d_perfmon_set_global_ioctl().
+	 */
+	if (cmpxchg(&v3d->global_perfmon, perfmon, NULL) == perfmon)
+		v3d_perfmon_put(perfmon);
 
 	v3d_perfmon_put(perfmon);
 }
@@ -471,16 +474,27 @@ int v3d_perfmon_set_global_ioctl(struct drm_device *dev, void *data,
 
 	/* If the request is to clear the global performance monitor */
 	if (req->flags & DRM_V3D_PERFMON_CLEAR_GLOBAL) {
-		if (!v3d->global_perfmon)
+		struct v3d_perfmon *old;
+
+		/* DRM_V3D_PERFMON_CLEAR_GLOBAL doesn't check if
+		 * v3d->global_perfmon == perfmon. Therefore, there
+		 * is no need to keep perfmon's reference.
+		 */
+		v3d_perfmon_put(perfmon);
+
+		old = xchg(&v3d->global_perfmon, NULL);
+		if (!old)
 			return -EINVAL;
 
-		xchg(&v3d->global_perfmon, NULL);
+		v3d_perfmon_put(old);
 
 		return 0;
 	}
 
-	if (cmpxchg(&v3d->global_perfmon, NULL, perfmon))
+	if (cmpxchg(&v3d->global_perfmon, NULL, perfmon)) {
+		v3d_perfmon_put(perfmon);
 		return -EBUSY;
+	}
 
 	return 0;
 }

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/4] drm/v3d: Refactor perfmon locking
  2026-05-31 20:18 [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation Maíra Canal
  2026-05-31 20:18 ` [PATCH v2 1/4] drm/v3d: Fix global performance monitor reference counting Maíra Canal
@ 2026-05-31 20:18 ` Maíra Canal
  2026-06-01 11:52   ` Iago Toral
  2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
  2026-05-31 20:18 ` [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached Maíra Canal
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Maíra Canal @ 2026-05-31 20:18 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, David Airlie,
	Simona Vetter
  Cc: kernel-dev, dri-devel, Maíra Canal

v3d exposes a single set of performance counters per core, so at any
moment at most one performance monitor can be programmed in HW. In
software, this singleton is represented by v3d_dev->active_perfmon, but
until now nothing actually serialized access to it: scheduler callbacks,
the GPU-reset path, and perfmon ioctls all read and wrote that field
lock-free.

The existence of v3d_perfmon->lock mutex did not close the gap. It
serialized start/stop of *one* perfmon object against itself, but the
invariant that needs protection is device-wide: there can be exactly one
active perfmon at any moment in HW. Two threads acting on different
perfmon objects could race through v3d_dev->active_perfmon and the
counter registers, leaving software and HW out of sync.

This commit moves the locking to where the invariant actually lives. Group
the active perfmon pointer with a device-wide spinlock and route every
state transition (job start, job completion, set global, reset,
suspend/resume, destruction) through a small set of locked entry points
that are the only mutators of the HW counters.

Some design improvements needed to be made for the refactor:

  1. Stop the perfmon from the IRQ handler at job-completion time (the
     natural boundary for "active perfmon follows the active job"). This
     required a change from a mutex to a spinlock. This solves another
     issue of the existing design: perfmon start/stop was exclusively
     attached to run_job() callbacks, which means that if nothing was
     further queued up, a perfmon would never actually be stopped.

  2. Pause/resume the HW counters across runtime-PM transitions without
     dropping the software reference. This preserves the perfmon state
     while the device is idle.

  3. Move the global perfmon lifecycle management to the set_global
     IOCTL. This simplifies the logic in v3d_perfmon_start() and
     v3d_perfmon_stop(), as there is no need to always check if the
     global perfmon is enabled.

  4. v3d_perfmon_get_values_ioctl() doesn't stop the perfmon when
     capturing the values. All lifecycle management is handled by the
     job (for per-job perfmons) or the set_global IOCTL (for global
     perfmons).

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.h     |  17 ++--
 drivers/gpu/drm/v3d/v3d_gem.c     |   4 +-
 drivers/gpu/drm/v3d/v3d_irq.c     |   7 +-
 drivers/gpu/drm/v3d/v3d_perfmon.c | 183 +++++++++++++++++++++++++++-----------
 drivers/gpu/drm/v3d/v3d_power.c   |   4 +
 drivers/gpu/drm/v3d/v3d_sched.c   |  26 ++----
 drivers/gpu/drm/v3d/v3d_submit.c  |   6 +-
 7 files changed, 165 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 071d919fe860..51486af68cf4 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -86,9 +86,6 @@ struct v3d_perfmon {
 	 */
 	refcount_t refcnt;
 
-	/* Protects perfmon stop, as it can be invoked from multiple places. */
-	struct mutex lock;
-
 	/* Number of counters activated in this perfmon instance
 	 * (should be less than DRM_V3D_MAX_PERF_COUNTERS).
 	 */
@@ -170,8 +167,14 @@ struct v3d_dev {
 
 	struct v3d_queue_state queue[V3D_MAX_QUEUES];
 
-	/* Used to track the active perfmon if any. */
-	struct v3d_perfmon *active_perfmon;
+	/* Tracks the performance monitor state. */
+	struct {
+		/* Protects @active. */
+		spinlock_t lock;
+
+		/* Perfmon currently programmed in HW (or NULL if none). */
+		struct v3d_perfmon *active;
+	} perfmon_state;
 
 	/* Protects bo_stats */
 	struct mutex bo_lock;
@@ -663,6 +666,10 @@ void v3d_perfmon_put(struct v3d_perfmon *perfmon);
 void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon);
 void v3d_perfmon_stop(struct v3d_dev *v3d, struct v3d_perfmon *perfmon,
 		      bool capture);
+void v3d_perfmon_stop_locked(struct v3d_dev *v3d, struct v3d_perfmon *perfmon,
+			     bool capture);
+void v3d_perfmon_suspend(struct v3d_dev *v3d);
+void v3d_perfmon_resume(struct v3d_dev *v3d);
 struct v3d_perfmon *v3d_perfmon_find(struct v3d_file_priv *v3d_priv, int id);
 void v3d_perfmon_open_file(struct v3d_file_priv *v3d_priv);
 void v3d_perfmon_close_file(struct v3d_file_priv *v3d_priv);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 1ee3c038d5f6..9487ab7acd03 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -137,7 +137,8 @@ v3d_reset(struct v3d_dev *v3d)
 	v3d_mmu_set_page_table(v3d);
 	v3d_irq_reset(v3d);
 
-	v3d_perfmon_stop(v3d, v3d->active_perfmon, false);
+	/* Re-arm the global perfmon HW counters that the reset zeroed. */
+	v3d_perfmon_resume(v3d);
 
 	trace_v3d_reset_end(dev);
 }
@@ -299,6 +300,7 @@ v3d_gem_init(struct drm_device *dev)
 	}
 
 	spin_lock_init(&v3d->mm_lock);
+	spin_lock_init(&v3d->perfmon_state.lock);
 	ret = drmm_mutex_init(dev, &v3d->bo_lock);
 	if (ret)
 		goto err_stats;
diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
index 754a969b862b..41fce1f8f96c 100644
--- a/drivers/gpu/drm/v3d/v3d_irq.c
+++ b/drivers/gpu/drm/v3d/v3d_irq.c
@@ -87,9 +87,12 @@ v3d_irq_signal_fence(struct v3d_dev *v3d, enum v3d_queue q,
 		     void (*trace_irq)(struct drm_device *, uint64_t))
 {
 	struct v3d_queue_state *queue = &v3d->queue[q];
-	struct v3d_fence *fence = to_v3d_fence(queue->active_job->irq_fence);
+	struct v3d_job *job = queue->active_job;
+	struct v3d_fence *fence = to_v3d_fence(job->irq_fence);
 
-	v3d_job_update_stats(queue->active_job);
+	v3d_perfmon_stop(v3d, job->perfmon, true);
+
+	v3d_job_update_stats(job);
 	trace_irq(&v3d->drm, fence->seqno);
 
 	queue->active_job = NULL;
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c
index 48ae748247be..3ad0f022753c 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -217,26 +217,15 @@ void v3d_perfmon_get(struct v3d_perfmon *perfmon)
 
 void v3d_perfmon_put(struct v3d_perfmon *perfmon)
 {
-	if (perfmon && refcount_dec_and_test(&perfmon->refcnt)) {
-		mutex_destroy(&perfmon->lock);
+	if (perfmon && refcount_dec_and_test(&perfmon->refcnt))
 		kfree(perfmon);
-	}
 }
 
-void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
+static void v3d_perfmon_hw_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
 {
+	u8 ncounters = perfmon->ncounters;
+	u32 mask = GENMASK(ncounters - 1, 0);
 	unsigned int i;
-	u32 mask;
-	u8 ncounters;
-
-	if (WARN_ON_ONCE(!perfmon || v3d->active_perfmon))
-		return;
-
-	if (!pm_runtime_get_if_active(v3d->drm.dev))
-		return;
-
-	ncounters = perfmon->ncounters;
-	mask = GENMASK(ncounters - 1, 0);
 
 	for (i = 0; i < ncounters; i++) {
 		u32 source = i / 4;
@@ -258,39 +247,106 @@ void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
 	V3D_CORE_WRITE(0, V3D_V4_PCTR_0_EN, mask);
 	V3D_CORE_WRITE(0, V3D_V4_PCTR_0_CLR, mask);
 	V3D_CORE_WRITE(0, V3D_PCTR_0_OVERFLOW, mask);
+}
 
-	v3d->active_perfmon = perfmon;
+static void v3d_perfmon_hw_capture(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
+{
+	u32 mask = GENMASK(perfmon->ncounters - 1, 0);
 
+	for (int i = 0; i < perfmon->ncounters; i++)
+		perfmon->values[i] += V3D_CORE_READ(0, V3D_PCTR_0_PCTRX(i));
+
+	V3D_CORE_WRITE(0, V3D_V4_PCTR_0_CLR, mask);
+}
+
+static void v3d_perfmon_hw_stop(struct v3d_dev *v3d, struct v3d_perfmon *perfmon,
+				bool capture)
+{
+	if (capture)
+		v3d_perfmon_hw_capture(v3d, perfmon);
+
+	V3D_CORE_WRITE(0, V3D_V4_PCTR_0_EN, 0);
+}
+
+void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
+{
+	guard(spinlock_irqsave)(&v3d->perfmon_state.lock);
+
+	if (!perfmon || v3d->global_perfmon)
+		return;
+
+	if (!pm_runtime_get_if_active(v3d->drm.dev))
+		return;
+
+	v3d_perfmon_hw_start(v3d, perfmon);
+	v3d->perfmon_state.active = perfmon;
+
+	v3d_pm_runtime_put(v3d);
+}
+
+static void v3d_perfmon_capture_locked(struct v3d_dev *v3d,
+				       struct v3d_perfmon *perfmon)
+{
+	lockdep_assert_held(&v3d->perfmon_state.lock);
+
+	if (!perfmon || perfmon != v3d->perfmon_state.active)
+		return;
+
+	if (!pm_runtime_get_if_active(v3d->drm.dev))
+		return;
+
+	v3d_perfmon_hw_capture(v3d, perfmon);
+	v3d_pm_runtime_put(v3d);
+}
+
+void v3d_perfmon_stop_locked(struct v3d_dev *v3d, struct v3d_perfmon *perfmon,
+			     bool capture)
+{
+	lockdep_assert_held(&v3d->perfmon_state.lock);
+
+	if (!perfmon || perfmon != v3d->perfmon_state.active)
+		return;
+
+	v3d->perfmon_state.active = NULL;
+
+	/* If the device is suspended, the HW has already stopped counting. */
+	if (!pm_runtime_get_if_active(v3d->drm.dev))
+		return;
+
+	v3d_perfmon_hw_stop(v3d, perfmon, capture);
 	v3d_pm_runtime_put(v3d);
 }
 
 void v3d_perfmon_stop(struct v3d_dev *v3d, struct v3d_perfmon *perfmon,
 		      bool capture)
 {
-	unsigned int i;
-
-	if (!perfmon || !v3d->active_perfmon)
+	if (!perfmon)
 		return;
 
-	mutex_lock(&perfmon->lock);
-	if (perfmon != v3d->active_perfmon)
-		goto out;
+	guard(spinlock_irqsave)(&v3d->perfmon_state.lock);
+	v3d_perfmon_stop_locked(v3d, perfmon, capture);
+}
 
-	if (!pm_runtime_get_if_active(v3d->drm.dev))
-		goto out_clear;
+void
+v3d_perfmon_suspend(struct v3d_dev *v3d)
+{
+	guard(spinlock_irqsave)(&v3d->perfmon_state.lock);
 
-	if (capture)
-		for (i = 0; i < perfmon->ncounters; i++)
-			perfmon->values[i] += V3D_CORE_READ(0, V3D_PCTR_0_PCTRX(i));
+	if (!v3d->perfmon_state.active)
+		return;
 
-	V3D_CORE_WRITE(0, V3D_V4_PCTR_0_EN, 0);
+	v3d_perfmon_hw_stop(v3d, v3d->perfmon_state.active, true);
+}
 
-	v3d_pm_runtime_put(v3d);
+void
+v3d_perfmon_resume(struct v3d_dev *v3d)
+{
+	guard(spinlock_irqsave)(&v3d->perfmon_state.lock);
 
-out_clear:
-	v3d->active_perfmon = NULL;
-out:
-	mutex_unlock(&perfmon->lock);
+	if (!v3d->perfmon_state.active)
+		return;
+
+	v3d_perfmon_hw_start(v3d, v3d->perfmon_state.active);
 }
 
 struct v3d_perfmon *v3d_perfmon_find(struct v3d_file_priv *v3d_priv, int id)
@@ -316,14 +372,17 @@ static void v3d_perfmon_delete(struct v3d_file_priv *v3d_priv,
 	struct v3d_dev *v3d = v3d_priv->v3d;
 
 	/* If the active perfmon is being destroyed, stop it first */
-	if (perfmon == v3d->active_perfmon)
-		v3d_perfmon_stop(v3d, perfmon, false);
+	scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock) {
+		v3d_perfmon_stop_locked(v3d, perfmon, false);
 
-	/* If the global perfmon is being destroyed, clean it and release
-	 * the reference stashed in v3d_perfmon_set_global_ioctl().
-	 */
-	if (cmpxchg(&v3d->global_perfmon, perfmon, NULL) == perfmon)
-		v3d_perfmon_put(perfmon);
+		/* If the global perfmon is being destroyed, clean it and release
+		 * the reference stashed in v3d_perfmon_set_global_ioctl().
+		 */
+		if (v3d->global_perfmon == perfmon) {
+			v3d_perfmon_put(v3d->global_perfmon);
+			v3d->global_perfmon = NULL;
+		}
+	}
 
 	v3d_perfmon_put(perfmon);
 }
@@ -371,12 +430,10 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void *data,
 	perfmon->ncounters = req->ncounters;
 
 	refcount_set(&perfmon->refcnt, 1);
-	mutex_init(&perfmon->lock);
 
 	ret = xa_alloc(&v3d_priv->perfmons, &id, perfmon, xa_limit_32b,
 		       GFP_KERNEL);
 	if (ret < 0) {
-		mutex_destroy(&perfmon->lock);
 		kfree(perfmon);
 		return ret;
 	}
@@ -408,7 +465,9 @@ int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data,
 	struct v3d_dev *v3d = to_v3d_dev(dev);
 	struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
 	struct drm_v3d_perfmon_get_values *req = data;
+	u64 values[DRM_V3D_MAX_PERF_COUNTERS];
 	struct v3d_perfmon *perfmon;
+	size_t size;
 	int ret = 0;
 
 	if (req->pad != 0)
@@ -418,10 +477,14 @@ int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data,
 	if (!perfmon)
 		return -EINVAL;
 
-	v3d_perfmon_stop(v3d, perfmon, true);
+	size = perfmon->ncounters * sizeof(u64);
 
-	if (copy_to_user(u64_to_user_ptr(req->values_ptr), perfmon->values,
-			 perfmon->ncounters * sizeof(u64)))
+	scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock) {
+		v3d_perfmon_capture_locked(v3d, perfmon);
+		memcpy(values, perfmon->values, size);
+	}
+
+	if (copy_to_user(u64_to_user_ptr(req->values_ptr), values, size))
 		ret = -EFAULT;
 
 	v3d_perfmon_put(perfmon);
@@ -482,18 +545,36 @@ int v3d_perfmon_set_global_ioctl(struct drm_device *dev, void *data,
 		 */
 		v3d_perfmon_put(perfmon);
 
-		old = xchg(&v3d->global_perfmon, NULL);
-		if (!old)
-			return -EINVAL;
+		scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock) {
+			old = v3d->global_perfmon;
+			if (!old)
+				return -EINVAL;
+
+			v3d_perfmon_stop_locked(v3d, old, true);
+			v3d->global_perfmon = NULL;
+		}
 
 		v3d_perfmon_put(old);
 
 		return 0;
 	}
 
-	if (cmpxchg(&v3d->global_perfmon, NULL, perfmon)) {
-		v3d_perfmon_put(perfmon);
-		return -EBUSY;
+	scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock) {
+		if (v3d->perfmon_state.active || v3d->global_perfmon) {
+			v3d_perfmon_put(perfmon);
+			return -EBUSY;
+		}
+
+		v3d->global_perfmon = perfmon;
+		v3d->perfmon_state.active = perfmon;
+
+		/* If the device is suspended, v3d_perfmon_resume() will
+		 * program the HW on the next resume.
+		 */
+		if (pm_runtime_get_if_active(v3d->drm.dev)) {
+			v3d_perfmon_hw_start(v3d, perfmon);
+			v3d_pm_runtime_put(v3d);
+		}
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/v3d/v3d_power.c b/drivers/gpu/drm/v3d/v3d_power.c
index 769e90032b04..1a4b651a2c5f 100644
--- a/drivers/gpu/drm/v3d/v3d_power.c
+++ b/drivers/gpu/drm/v3d/v3d_power.c
@@ -50,6 +50,8 @@ int v3d_power_suspend(struct device *dev)
 	struct v3d_dev *v3d = to_v3d_dev(drm);
 	int ret;
 
+	v3d_perfmon_suspend(v3d);
+
 	v3d_irq_disable(v3d);
 
 	ret = v3d_suspend_sms(v3d);
@@ -83,5 +85,7 @@ int v3d_power_resume(struct device *dev)
 	v3d_mmu_set_page_table(v3d);
 	v3d_irq_enable(v3d);
 
+	v3d_perfmon_resume(v3d);
+
 	return 0;
 }
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index c16a9d4d41e6..4d2b91d49542 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -125,24 +125,6 @@ v3d_performance_query_info_free(struct v3d_performance_query_info *query_info,
 	}
 }
 
-static void
-v3d_switch_perfmon(struct v3d_dev *v3d, struct v3d_job *job)
-{
-	struct v3d_perfmon *perfmon = v3d->global_perfmon;
-
-	if (!perfmon)
-		perfmon = job->perfmon;
-
-	if (perfmon == v3d->active_perfmon)
-		return;
-
-	if (perfmon != v3d->active_perfmon)
-		v3d_perfmon_stop(v3d, v3d->active_perfmon, true);
-
-	if (perfmon && v3d->active_perfmon != perfmon)
-		v3d_perfmon_start(v3d, perfmon);
-}
-
 static void
 v3d_stats_start(struct v3d_stats *stats, u64 now)
 {
@@ -219,7 +201,7 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
 			    job->start, job->end);
 
 	v3d_job_start_stats(&job->base);
-	v3d_switch_perfmon(v3d, &job->base);
+	v3d_perfmon_start(v3d, job->base.perfmon);
 
 	/* Set the current and end address of the control list.
 	 * Writing the end register is what starts the job.
@@ -277,7 +259,7 @@ static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
 			    job->start, job->end);
 
 	v3d_job_start_stats(&job->base);
-	v3d_switch_perfmon(v3d, &job->base);
+	v3d_perfmon_start(v3d, job->base.perfmon);
 
 	/* XXX: Set the QCFG */
 
@@ -370,7 +352,7 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
 	trace_v3d_submit_csd(dev, to_v3d_fence(fence)->seqno);
 
 	v3d_job_start_stats(&job->base);
-	v3d_switch_perfmon(v3d, &job->base);
+	v3d_perfmon_start(v3d, job->base.perfmon);
 
 	csd_cfg0_reg = V3D_CSD_QUEUED_CFG0(v3d->ver);
 	for (i = 1; i <= 6; i++)
@@ -711,6 +693,8 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job,
 	if (sched_job)
 		drm_sched_increase_karma(sched_job);
 
+	v3d_perfmon_stop(v3d, job->perfmon, false);
+
 	/* get the GPU back into the init state */
 	v3d_reset(v3d);
 
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 02441d4f495d..4c526aafc4e0 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -275,8 +275,10 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit *submit, u32 perfmon_id)
 	if (!perfmon_id)
 		return 0;
 
-	if (v3d->global_perfmon)
-		return -EAGAIN;
+	scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock) {
+		if (v3d->global_perfmon)
+			return -EAGAIN;
+	}
 
 	perfmon = v3d_perfmon_find(v3d_priv, perfmon_id);
 	if (!perfmon)

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached
  2026-05-31 20:18 [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation Maíra Canal
  2026-05-31 20:18 ` [PATCH v2 1/4] drm/v3d: Fix global performance monitor reference counting Maíra Canal
  2026-05-31 20:18 ` [PATCH v2 2/4] drm/v3d: Refactor perfmon locking Maíra Canal
@ 2026-05-31 20:18 ` Maíra Canal
  2026-06-02  7:35   ` Iago Toral
  2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
  2026-05-31 20:18 ` [PATCH v2 4/4] drm/v3d: Drop the queue argument from v3d_job_add_syncobjs() Maíra Canal
  2026-06-04  4:44 ` Claude review: drm/v3d: Fix perfmon locking and cross-queue isolation Claude Code Review Bot
  4 siblings, 2 replies; 15+ messages in thread
From: Maíra Canal @ 2026-05-31 20:18 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, David Airlie,
	Simona Vetter
  Cc: kernel-dev, dri-devel, Maíra Canal

A non-global perfmon is meant to count events generated by a specific
submission, but the scheduler can run jobs from different queues
concurrently on the same V3D core. Without explicit serialization, an
unrelated job running in parallel with a perfmon-carrying job pollutes
the counters and generates unusable results.

To address such issue, we must enforce cross-queue serialization when we
detect a perfmon-carrying submission. It's possible to implement
serialization by enforcing two rules:

  1. A job that carries a non-global perfmon must wait for every job
    currently in-flight across all HW queues to finish.

  2. While a perfmon-carrying job is still in-flight, all subsequently
    submitted jobs must wait for it.

Note that serialization is not needed in the global perfmon case, as the
global perfmon tracks activity from all jobs, so concurrency is desirable.

Therefore, check if serialization is needed during job submission and if
so, attach fence dependences to enforce cross-queue serialization.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_drv.h     | 35 ++++++++++++++++++----
 drivers/gpu/drm/v3d/v3d_gem.c     |  3 ++
 drivers/gpu/drm/v3d/v3d_perfmon.c |  6 ++++
 drivers/gpu/drm/v3d/v3d_submit.c  | 63 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 101 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 51486af68cf4..cdf4926d51f2 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -74,11 +74,13 @@ struct v3d_queue_state {
 	spinlock_t queue_lock;
 };
 
-/* Performance monitor object. The perform lifetime is controlled by userspace
- * using perfmon related ioctls. A perfmon can be attached to a submit_cl
- * request, and when this is the case, HW perf counters will be activated just
- * before the submit_cl is submitted to the GPU and disabled when the job is
- * done. This way, only events related to a specific job will be counted.
+/* Performance monitor object
+ *
+ * The performance monitor (perfmon) lifetime is controlled by userspace using
+ * perfmon related ioctls. A perfmon can be attached to a CL or CSD submission
+ * request, and when it is, HW performance counters will be activated just
+ * before the job is submitted to the GPU and disabled when the job is done.
+ * This way, only events related to a specific submission will be counted.
  */
 struct v3d_perfmon {
 	/* Tracks the number of users of the perfmon, when this counter reaches
@@ -167,13 +169,31 @@ struct v3d_dev {
 
 	struct v3d_queue_state queue[V3D_MAX_QUEUES];
 
-	/* Tracks the performance monitor state. */
+	/*
+	 * Tracks the performance monitor state and consistency.
+	 *
+	 * When a non-global perfmon is attached to a job, the scheduler must
+	 * not run any other job on the HW concurrently (otherwise, the
+	 * counters would be polluted by unrelated work).
+	 */
 	struct {
 		/* Protects @active. */
 		spinlock_t lock;
 
 		/* Perfmon currently programmed in HW (or NULL if none). */
 		struct v3d_perfmon *active;
+
+		/* Finished fence of the most recently submitted job that
+		 * opened a serialization window (i.e. a job with a non-global
+		 * perfmon attached).
+		 */
+		struct dma_fence *fence;
+
+		/* Finished fence of the most recently submitted job on each HW
+		 * queue. Used so that a new perfmon-carrying job can depend on
+		 * every job currently in-flight across all queues.
+		 */
+		struct dma_fence *last_hw_fence[V3D_MAX_QUEUES];
 	} perfmon_state;
 
 	/* Protects bo_stats */
@@ -319,6 +339,9 @@ struct v3d_job {
 
 	struct v3d_dev *v3d;
 
+	/* The queue that the job was submitted on. */
+	enum v3d_queue queue;
+
 	/* This is the array of BOs that were looked up at the start
 	 * of submission.
 	 */
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 9487ab7acd03..0b415486f5b4 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -363,8 +363,11 @@ v3d_gem_destroy(struct drm_device *dev)
 	for (q = 0; q < V3D_MAX_QUEUES; q++) {
 		WARN_ON(v3d->queue[q].active_job);
 		v3d_stats_put(v3d->queue[q].stats);
+		dma_fence_put(v3d->perfmon_state.last_hw_fence[q]);
 	}
 
+	dma_fence_put(v3d->perfmon_state.fence);
+
 	drm_mm_takedown(&v3d->mm);
 
 	dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d->pt,
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c
index 3ad0f022753c..07dab7fb3060 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -275,6 +275,12 @@ void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
 	if (!perfmon || v3d->global_perfmon)
 		return;
 
+	/* Cross-queue serialization should have drained any previous perfmon
+	 * job before this one runs.
+	 */
+	if (WARN_ON_ONCE(v3d->perfmon_state.active))
+		return;
+
 	if (!pm_runtime_get_if_active(v3d->drm.dev))
 		return;
 
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 4c526aafc4e0..46fc88bacc95 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -234,6 +234,7 @@ v3d_submit_add_job(struct v3d_submit *submit, void **container, size_t size,
 	job->v3d = v3d;
 	job->free = free;
 	job->file_priv = v3d_priv;
+	job->queue = queue;
 
 	ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
 				 1, v3d_priv, submit->file_priv->client_id);
@@ -293,6 +294,62 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit *submit, u32 perfmon_id)
 	return 0;
 }
 
+/*
+ * Prepare fences to enforce job serialization when a perfmon is active. A job
+ * that carries a non-global perfmon must wait for every job currently in-flight
+ * across all HW queues to finish, otherwise concurrent unrelated work on the
+ * same core would pollute the performance counters. Symmetrically, while such a
+ * job is still in-flight, all subsequently submitted jobs must wait for it.
+ *
+ * We don't serialize the jobs when using a global perfmon as it's expected to
+ * track concurrent activity from all jobs.
+ */
+static int
+v3d_serialize_for_perfmon(struct v3d_job *job)
+{
+	struct v3d_dev *v3d = job->v3d;
+	bool is_global_perfmon;
+	int ret;
+
+	lockdep_assert_held(&v3d->sched_lock);
+
+	scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock)
+		is_global_perfmon = !!v3d->global_perfmon;
+
+	if (is_global_perfmon)
+		goto publish;
+
+	if (job->perfmon) {
+		for (enum v3d_queue q = 0; q < V3D_MAX_QUEUES; q++) {
+			struct dma_fence *f = v3d->perfmon_state.last_hw_fence[q];
+
+			if (!f || dma_fence_is_signaled(f))
+				continue;
+
+			ret = drm_sched_job_add_dependency(&job->base, dma_fence_get(f));
+			if (ret)
+				return ret;
+		}
+	} else if (v3d->perfmon_state.fence &&
+		   !dma_fence_is_signaled(v3d->perfmon_state.fence)) {
+		ret = drm_sched_job_add_dependency(&job->base,
+						   dma_fence_get(v3d->perfmon_state.fence));
+		if (ret)
+			return ret;
+	}
+
+publish:
+	dma_fence_put(v3d->perfmon_state.last_hw_fence[job->queue]);
+	v3d->perfmon_state.last_hw_fence[job->queue] = dma_fence_get(job->done_fence);
+
+	if (job->perfmon && !is_global_perfmon) {
+		dma_fence_put(v3d->perfmon_state.fence);
+		v3d->perfmon_state.fence = dma_fence_get(job->done_fence);
+	}
+
+	return 0;
+}
+
 static void
 v3d_submit_attach_object_fences(struct v3d_submit *submit)
 {
@@ -337,6 +394,12 @@ v3d_submit_jobs(struct v3d_submit *submit)
 			goto err;
 	}
 
+	for (int i = 0; i < submit->job_count; i++) {
+		ret = v3d_serialize_for_perfmon(submit->jobs[i]);
+		if (ret)
+			goto err;
+	}
+
 	for (int i = 0; i < submit->job_count; i++)
 		drm_sched_entity_push_job(&submit->jobs[i]->base);
 

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/4] drm/v3d: Drop the queue argument from v3d_job_add_syncobjs()
  2026-05-31 20:18 [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation Maíra Canal
                   ` (2 preceding siblings ...)
  2026-05-31 20:18 ` [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached Maíra Canal
@ 2026-05-31 20:18 ` Maíra Canal
  2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
  2026-06-04  4:44 ` Claude review: drm/v3d: Fix perfmon locking and cross-queue isolation Claude Code Review Bot
  4 siblings, 1 reply; 15+ messages in thread
From: Maíra Canal @ 2026-05-31 20:18 UTC (permalink / raw)
  To: Melissa Wen, Iago Toral, Tvrtko Ursulin, David Airlie,
	Simona Vetter
  Cc: kernel-dev, dri-devel, Maíra Canal

The struct v3d_job now carries its submission queue. The explicit queue
parameter passed alongside the job is therefore redundant, as it can be
read through job->queue directly.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_submit.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 46fc88bacc95..bfd99935c2ef 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -179,7 +179,7 @@ void v3d_job_put(struct v3d_job *job)
 
 static int
 v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
-		     u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
+		     u32 in_sync, struct v3d_submit_ext *se)
 {
 	bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
 	struct v3d_dev *v3d = job->v3d;
@@ -193,7 +193,7 @@ v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
 		return ret;
 	}
 
-	if (se->in_sync_count && se->wait_stage == queue) {
+	if (se->in_sync_count && se->wait_stage == job->queue) {
 		struct drm_v3d_sem __user *handle = u64_to_user_ptr(se->in_syncs);
 
 		for (int i = 0; i < se->in_sync_count; i++) {
@@ -481,8 +481,7 @@ v3d_setup_csd_jobs_and_bos(struct v3d_submit *submit,
 	if (ret)
 		return ret;
 
-	ret = v3d_job_add_syncobjs(&job->base, submit->file_priv, args->in_sync,
-				   se, V3D_CSD);
+	ret = v3d_job_add_syncobjs(&job->base, submit->file_priv, args->in_sync, se);
 	if (ret)
 		return ret;
 
@@ -1112,7 +1111,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 		bin->qts = args->qts;
 
 		ret = v3d_job_add_syncobjs(&bin->base, file_priv,
-					   args->in_sync_bcl, &se, V3D_BIN);
+					   args->in_sync_bcl, &se);
 		if (ret)
 			goto fail;
 	}
@@ -1129,8 +1128,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
 	if (bin)
 		bin->render = render;
 
-	ret = v3d_job_add_syncobjs(&render->base, file_priv, args->in_sync_rcl,
-				   &se, V3D_RENDER);
+	ret = v3d_job_add_syncobjs(&render->base, file_priv, args->in_sync_rcl, &se);
 	if (ret)
 		goto fail;
 
@@ -1221,7 +1219,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		goto fail;
 
-	ret = v3d_job_add_syncobjs(&job->base, file_priv, args->in_sync, &se, V3D_TFU);
+	ret = v3d_job_add_syncobjs(&job->base, file_priv, args->in_sync, &se);
 	if (ret)
 		goto fail;
 
@@ -1413,7 +1411,7 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
 
 	trace_v3d_submit_cpu_ioctl(dev, cpu_job->job_type);
 
-	ret = v3d_job_add_syncobjs(&cpu_job->base, file_priv, 0, &se, V3D_CPU);
+	ret = v3d_job_add_syncobjs(&cpu_job->base, file_priv, 0, &se);
 	if (ret)
 		goto fail;
 

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/4] drm/v3d: Refactor perfmon locking
  2026-05-31 20:18 ` [PATCH v2 2/4] drm/v3d: Refactor perfmon locking Maíra Canal
@ 2026-06-01 11:52   ` Iago Toral
  2026-06-01 12:03     ` Maíra Canal
  2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
  1 sibling, 1 reply; 15+ messages in thread
From: Iago Toral @ 2026-06-01 11:52 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, Tvrtko Ursulin, David Airlie,
	Simona Vetter
  Cc: kernel-dev, dri-devel

El dom, 31-05-2026 a las 17:18 -0300, Maíra Canal escribió:

(...)
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 1ee3c038d5f6..9487ab7acd03 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -137,7 +137,8 @@ v3d_reset(struct v3d_dev *v3d)
>  	v3d_mmu_set_page_table(v3d);
>  	v3d_irq_reset(v3d);
>  
> -	v3d_perfmon_stop(v3d, v3d->active_perfmon, false);
> +	/* Re-arm the global perfmon HW counters that the reset
> zeroed. */
> +	v3d_perfmon_resume(v3d);

What would happen if the reset happens when a non-global perfmon is
active?

Iago


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/4] drm/v3d: Refactor perfmon locking
  2026-06-01 11:52   ` Iago Toral
@ 2026-06-01 12:03     ` Maíra Canal
  0 siblings, 0 replies; 15+ messages in thread
From: Maíra Canal @ 2026-06-01 12:03 UTC (permalink / raw)
  To: Iago Toral, Melissa Wen, Tvrtko Ursulin, David Airlie,
	Simona Vetter
  Cc: kernel-dev, dri-devel

Hi Iago,

Thanks for your review!

On 6/1/26 08:52, Iago Toral wrote:
> El dom, 31-05-2026 a las 17:18 -0300, Maíra Canal escribió:
> 
> (...)
>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
>> b/drivers/gpu/drm/v3d/v3d_gem.c
>> index 1ee3c038d5f6..9487ab7acd03 100644
>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>> @@ -137,7 +137,8 @@ v3d_reset(struct v3d_dev *v3d)
>>   	v3d_mmu_set_page_table(v3d);
>>   	v3d_irq_reset(v3d);
>>   
>> -	v3d_perfmon_stop(v3d, v3d->active_perfmon, false);
>> +	/* Re-arm the global perfmon HW counters that the reset
>> zeroed. */
>> +	v3d_perfmon_resume(v3d);
> 
> What would happen if the reset happens when a non-global perfmon is
> active?
> 

Nothing. In v3d_gpu_reset_for_timeout(), before calling v3d_reset(), we
stop job->perfmon and set v3d->perfmon_state.active = NULL.

@@ -711,6 +693,8 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, 
struct drm_sched_job *sched_job,
  	if (sched_job)
  		drm_sched_increase_karma(sched_job);

+	v3d_perfmon_stop(v3d, job->perfmon, false);
+
  	/* get the GPU back into the init state */
  	v3d_reset(v3d);

Best regards,
- Maíra

> Iago
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached
  2026-05-31 20:18 ` [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached Maíra Canal
@ 2026-06-02  7:35   ` Iago Toral
  2026-06-02 10:49     ` Maíra Canal
  2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
  1 sibling, 1 reply; 15+ messages in thread
From: Iago Toral @ 2026-06-02  7:35 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, Tvrtko Ursulin, David Airlie,
	Simona Vetter
  Cc: kernel-dev, dri-devel

Hi Maíra, this pactch looks good to me but I am wondering if we should
do anything to clean up the perfmon state when a gpu reset happens:
i.e. what would happen if a perfmon job resets? Would we continue to
schedule after-reset jobs to wait on the fence of the reset job?

Iago

El dom, 31-05-2026 a las 17:18 -0300, Maíra Canal escribió:
> A non-global perfmon is meant to count events generated by a specific
> submission, but the scheduler can run jobs from different queues
> concurrently on the same V3D core. Without explicit serialization, an
> unrelated job running in parallel with a perfmon-carrying job
> pollutes
> the counters and generates unusable results.
> 
> To address such issue, we must enforce cross-queue serialization when
> we
> detect a perfmon-carrying submission. It's possible to implement
> serialization by enforcing two rules:
> 
>   1. A job that carries a non-global perfmon must wait for every job
>     currently in-flight across all HW queues to finish.
> 
>   2. While a perfmon-carrying job is still in-flight, all
> subsequently
>     submitted jobs must wait for it.
> 
> Note that serialization is not needed in the global perfmon case, as
> the
> global perfmon tracks activity from all jobs, so concurrency is
> desirable.
> 
> Therefore, check if serialization is needed during job submission and
> if
> so, attach fence dependences to enforce cross-queue serialization.
> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>  drivers/gpu/drm/v3d/v3d_drv.h     | 35 ++++++++++++++++++----
>  drivers/gpu/drm/v3d/v3d_gem.c     |  3 ++
>  drivers/gpu/drm/v3d/v3d_perfmon.c |  6 ++++
>  drivers/gpu/drm/v3d/v3d_submit.c  | 63
> +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 101 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index 51486af68cf4..cdf4926d51f2 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -74,11 +74,13 @@ struct v3d_queue_state {
>  	spinlock_t queue_lock;
>  };
>  
> -/* Performance monitor object. The perform lifetime is controlled by
> userspace
> - * using perfmon related ioctls. A perfmon can be attached to a
> submit_cl
> - * request, and when this is the case, HW perf counters will be
> activated just
> - * before the submit_cl is submitted to the GPU and disabled when
> the job is
> - * done. This way, only events related to a specific job will be
> counted.
> +/* Performance monitor object
> + *
> + * The performance monitor (perfmon) lifetime is controlled by
> userspace using
> + * perfmon related ioctls. A perfmon can be attached to a CL or CSD
> submission
> + * request, and when it is, HW performance counters will be
> activated just
> + * before the job is submitted to the GPU and disabled when the job
> is done.
> + * This way, only events related to a specific submission will be
> counted.
>   */
>  struct v3d_perfmon {
>  	/* Tracks the number of users of the perfmon, when this
> counter reaches
> @@ -167,13 +169,31 @@ struct v3d_dev {
>  
>  	struct v3d_queue_state queue[V3D_MAX_QUEUES];
>  
> -	/* Tracks the performance monitor state. */
> +	/*
> +	 * Tracks the performance monitor state and consistency.
> +	 *
> +	 * When a non-global perfmon is attached to a job, the
> scheduler must
> +	 * not run any other job on the HW concurrently (otherwise,
> the
> +	 * counters would be polluted by unrelated work).
> +	 */
>  	struct {
>  		/* Protects @active. */
>  		spinlock_t lock;
>  
>  		/* Perfmon currently programmed in HW (or NULL if
> none). */
>  		struct v3d_perfmon *active;
> +
> +		/* Finished fence of the most recently submitted job
> that
> +		 * opened a serialization window (i.e. a job with a
> non-global
> +		 * perfmon attached).
> +		 */
> +		struct dma_fence *fence;
> +
> +		/* Finished fence of the most recently submitted job
> on each HW
> +		 * queue. Used so that a new perfmon-carrying job
> can depend on
> +		 * every job currently in-flight across all queues.
> +		 */
> +		struct dma_fence *last_hw_fence[V3D_MAX_QUEUES];
>  	} perfmon_state;
>  
>  	/* Protects bo_stats */
> @@ -319,6 +339,9 @@ struct v3d_job {
>  
>  	struct v3d_dev *v3d;
>  
> +	/* The queue that the job was submitted on. */
> +	enum v3d_queue queue;
> +
>  	/* This is the array of BOs that were looked up at the start
>  	 * of submission.
>  	 */
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 9487ab7acd03..0b415486f5b4 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -363,8 +363,11 @@ v3d_gem_destroy(struct drm_device *dev)
>  	for (q = 0; q < V3D_MAX_QUEUES; q++) {
>  		WARN_ON(v3d->queue[q].active_job);
>  		v3d_stats_put(v3d->queue[q].stats);
> +		dma_fence_put(v3d->perfmon_state.last_hw_fence[q]);
>  	}
>  
> +	dma_fence_put(v3d->perfmon_state.fence);
> +
>  	drm_mm_takedown(&v3d->mm);
>  
>  	dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d-
> >pt,
> diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c
> b/drivers/gpu/drm/v3d/v3d_perfmon.c
> index 3ad0f022753c..07dab7fb3060 100644
> --- a/drivers/gpu/drm/v3d/v3d_perfmon.c
> +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
> @@ -275,6 +275,12 @@ void v3d_perfmon_start(struct v3d_dev *v3d,
> struct v3d_perfmon *perfmon)
>  	if (!perfmon || v3d->global_perfmon)
>  		return;
>  
> +	/* Cross-queue serialization should have drained any
> previous perfmon
> +	 * job before this one runs.
> +	 */
> +	if (WARN_ON_ONCE(v3d->perfmon_state.active))
> +		return;
> +
>  	if (!pm_runtime_get_if_active(v3d->drm.dev))
>  		return;
>  
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> b/drivers/gpu/drm/v3d/v3d_submit.c
> index 4c526aafc4e0..46fc88bacc95 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -234,6 +234,7 @@ v3d_submit_add_job(struct v3d_submit *submit,
> void **container, size_t size,
>  	job->v3d = v3d;
>  	job->free = free;
>  	job->file_priv = v3d_priv;
> +	job->queue = queue;
>  
>  	ret = drm_sched_job_init(&job->base, &v3d_priv-
> >sched_entity[queue],
>  				 1, v3d_priv, submit->file_priv-
> >client_id);
> @@ -293,6 +294,62 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit
> *submit, u32 perfmon_id)
>  	return 0;
>  }
>  
> +/*
> + * Prepare fences to enforce job serialization when a perfmon is
> active. A job
> + * that carries a non-global perfmon must wait for every job
> currently in-flight
> + * across all HW queues to finish, otherwise concurrent unrelated
> work on the
> + * same core would pollute the performance counters. Symmetrically,
> while such a
> + * job is still in-flight, all subsequently submitted jobs must wait
> for it.
> + *
> + * We don't serialize the jobs when using a global perfmon as it's
> expected to
> + * track concurrent activity from all jobs.
> + */
> +static int
> +v3d_serialize_for_perfmon(struct v3d_job *job)
> +{
> +	struct v3d_dev *v3d = job->v3d;
> +	bool is_global_perfmon;
> +	int ret;
> +
> +	lockdep_assert_held(&v3d->sched_lock);
> +
> +	scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock)
> +		is_global_perfmon = !!v3d->global_perfmon;
> +
> +	if (is_global_perfmon)
> +		goto publish;
> +
> +	if (job->perfmon) {
> +		for (enum v3d_queue q = 0; q < V3D_MAX_QUEUES; q++)
> {
> +			struct dma_fence *f = v3d-
> >perfmon_state.last_hw_fence[q];
> +
> +			if (!f || dma_fence_is_signaled(f))
> +				continue;
> +
> +			ret = drm_sched_job_add_dependency(&job-
> >base, dma_fence_get(f));
> +			if (ret)
> +				return ret;
> +		}
> +	} else if (v3d->perfmon_state.fence &&
> +		   !dma_fence_is_signaled(v3d->perfmon_state.fence))
> {
> +		ret = drm_sched_job_add_dependency(&job->base,
> +						  
> dma_fence_get(v3d->perfmon_state.fence));
> +		if (ret)
> +			return ret;
> +	}
> +
> +publish:
> +	dma_fence_put(v3d->perfmon_state.last_hw_fence[job->queue]);
> +	v3d->perfmon_state.last_hw_fence[job->queue] =
> dma_fence_get(job->done_fence);
> +
> +	if (job->perfmon && !is_global_perfmon) {
> +		dma_fence_put(v3d->perfmon_state.fence);
> +		v3d->perfmon_state.fence = dma_fence_get(job-
> >done_fence);
> +	}
> +
> +	return 0;
> +}
> +
>  static void
>  v3d_submit_attach_object_fences(struct v3d_submit *submit)
>  {
> @@ -337,6 +394,12 @@ v3d_submit_jobs(struct v3d_submit *submit)
>  			goto err;
>  	}
>  
> +	for (int i = 0; i < submit->job_count; i++) {
> +		ret = v3d_serialize_for_perfmon(submit->jobs[i]);
> +		if (ret)
> +			goto err;
> +	}
> +
>  	for (int i = 0; i < submit->job_count; i++)
>  		drm_sched_entity_push_job(&submit->jobs[i]->base);
>  
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached
  2026-06-02  7:35   ` Iago Toral
@ 2026-06-02 10:49     ` Maíra Canal
  2026-06-02 11:10       ` Iago Toral
  0 siblings, 1 reply; 15+ messages in thread
From: Maíra Canal @ 2026-06-02 10:49 UTC (permalink / raw)
  To: Iago Toral, Melissa Wen, Tvrtko Ursulin, David Airlie,
	Simona Vetter
  Cc: kernel-dev, dri-devel

Hi Iago,

On 02/06/26 04:35, Iago Toral wrote:
> Hi Maíra, this pactch looks good to me but I am wondering if we should
> do anything to clean up the perfmon state when a gpu reset happens:
> i.e. what would happen if a perfmon job resets? Would we continue to
> schedule after-reset jobs to wait on the fence of the reset job?

If the job resets, the design around the DRM scheduler will protect us.

The DRM scheduler guarantees that a finished fence is always signaled
eventually. On a reset, drm_sched_start() signals the pending jobs'
finished fences (with -ECANCELED for the guilty job). So if a
perfmon-carrying job is the one that times out, the done_fence we
stored for serialization still gets signaled when resubmitted (check the
first if-clause in the run_job() callbacks).

drm_sched_job_add_dependency() releases the dependent job as soon as the
fence signals, regardless of error, so any job submitted behind the
reset perfmon job is released and runs. The serialization window simply
closes when the job dies.

Best regards,
- Maíra

> 
> Iago
> 
> El dom, 31-05-2026 a las 17:18 -0300, Maíra Canal escribió:
>> A non-global perfmon is meant to count events generated by a specific
>> submission, but the scheduler can run jobs from different queues
>> concurrently on the same V3D core. Without explicit serialization, an
>> unrelated job running in parallel with a perfmon-carrying job
>> pollutes
>> the counters and generates unusable results.
>>
>> To address such issue, we must enforce cross-queue serialization when
>> we
>> detect a perfmon-carrying submission. It's possible to implement
>> serialization by enforcing two rules:
>>
>>    1. A job that carries a non-global perfmon must wait for every job
>>      currently in-flight across all HW queues to finish.
>>
>>    2. While a perfmon-carrying job is still in-flight, all
>> subsequently
>>      submitted jobs must wait for it.
>>
>> Note that serialization is not needed in the global perfmon case, as
>> the
>> global perfmon tracks activity from all jobs, so concurrency is
>> desirable.
>>
>> Therefore, check if serialization is needed during job submission and
>> if
>> so, attach fence dependences to enforce cross-queue serialization.
>>
>> Signed-off-by: Maíra Canal <mcanal@igalia.com>
>> ---
>>   drivers/gpu/drm/v3d/v3d_drv.h     | 35 ++++++++++++++++++----
>>   drivers/gpu/drm/v3d/v3d_gem.c     |  3 ++
>>   drivers/gpu/drm/v3d/v3d_perfmon.c |  6 ++++
>>   drivers/gpu/drm/v3d/v3d_submit.c  | 63
>> +++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 101 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
>> b/drivers/gpu/drm/v3d/v3d_drv.h
>> index 51486af68cf4..cdf4926d51f2 100644
>> --- a/drivers/gpu/drm/v3d/v3d_drv.h
>> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
>> @@ -74,11 +74,13 @@ struct v3d_queue_state {
>>   	spinlock_t queue_lock;
>>   };
>>   
>> -/* Performance monitor object. The perform lifetime is controlled by
>> userspace
>> - * using perfmon related ioctls. A perfmon can be attached to a
>> submit_cl
>> - * request, and when this is the case, HW perf counters will be
>> activated just
>> - * before the submit_cl is submitted to the GPU and disabled when
>> the job is
>> - * done. This way, only events related to a specific job will be
>> counted.
>> +/* Performance monitor object
>> + *
>> + * The performance monitor (perfmon) lifetime is controlled by
>> userspace using
>> + * perfmon related ioctls. A perfmon can be attached to a CL or CSD
>> submission
>> + * request, and when it is, HW performance counters will be
>> activated just
>> + * before the job is submitted to the GPU and disabled when the job
>> is done.
>> + * This way, only events related to a specific submission will be
>> counted.
>>    */
>>   struct v3d_perfmon {
>>   	/* Tracks the number of users of the perfmon, when this
>> counter reaches
>> @@ -167,13 +169,31 @@ struct v3d_dev {
>>   
>>   	struct v3d_queue_state queue[V3D_MAX_QUEUES];
>>   
>> -	/* Tracks the performance monitor state. */
>> +	/*
>> +	 * Tracks the performance monitor state and consistency.
>> +	 *
>> +	 * When a non-global perfmon is attached to a job, the
>> scheduler must
>> +	 * not run any other job on the HW concurrently (otherwise,
>> the
>> +	 * counters would be polluted by unrelated work).
>> +	 */
>>   	struct {
>>   		/* Protects @active. */
>>   		spinlock_t lock;
>>   
>>   		/* Perfmon currently programmed in HW (or NULL if
>> none). */
>>   		struct v3d_perfmon *active;
>> +
>> +		/* Finished fence of the most recently submitted job
>> that
>> +		 * opened a serialization window (i.e. a job with a
>> non-global
>> +		 * perfmon attached).
>> +		 */
>> +		struct dma_fence *fence;
>> +
>> +		/* Finished fence of the most recently submitted job
>> on each HW
>> +		 * queue. Used so that a new perfmon-carrying job
>> can depend on
>> +		 * every job currently in-flight across all queues.
>> +		 */
>> +		struct dma_fence *last_hw_fence[V3D_MAX_QUEUES];
>>   	} perfmon_state;
>>   
>>   	/* Protects bo_stats */
>> @@ -319,6 +339,9 @@ struct v3d_job {
>>   
>>   	struct v3d_dev *v3d;
>>   
>> +	/* The queue that the job was submitted on. */
>> +	enum v3d_queue queue;
>> +
>>   	/* This is the array of BOs that were looked up at the start
>>   	 * of submission.
>>   	 */
>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
>> b/drivers/gpu/drm/v3d/v3d_gem.c
>> index 9487ab7acd03..0b415486f5b4 100644
>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>> @@ -363,8 +363,11 @@ v3d_gem_destroy(struct drm_device *dev)
>>   	for (q = 0; q < V3D_MAX_QUEUES; q++) {
>>   		WARN_ON(v3d->queue[q].active_job);
>>   		v3d_stats_put(v3d->queue[q].stats);
>> +		dma_fence_put(v3d->perfmon_state.last_hw_fence[q]);
>>   	}
>>   
>> +	dma_fence_put(v3d->perfmon_state.fence);
>> +
>>   	drm_mm_takedown(&v3d->mm);
>>   
>>   	dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d-
>>> pt,
>> diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c
>> b/drivers/gpu/drm/v3d/v3d_perfmon.c
>> index 3ad0f022753c..07dab7fb3060 100644
>> --- a/drivers/gpu/drm/v3d/v3d_perfmon.c
>> +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
>> @@ -275,6 +275,12 @@ void v3d_perfmon_start(struct v3d_dev *v3d,
>> struct v3d_perfmon *perfmon)
>>   	if (!perfmon || v3d->global_perfmon)
>>   		return;
>>   
>> +	/* Cross-queue serialization should have drained any
>> previous perfmon
>> +	 * job before this one runs.
>> +	 */
>> +	if (WARN_ON_ONCE(v3d->perfmon_state.active))
>> +		return;
>> +
>>   	if (!pm_runtime_get_if_active(v3d->drm.dev))
>>   		return;
>>   
>> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
>> b/drivers/gpu/drm/v3d/v3d_submit.c
>> index 4c526aafc4e0..46fc88bacc95 100644
>> --- a/drivers/gpu/drm/v3d/v3d_submit.c
>> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
>> @@ -234,6 +234,7 @@ v3d_submit_add_job(struct v3d_submit *submit,
>> void **container, size_t size,
>>   	job->v3d = v3d;
>>   	job->free = free;
>>   	job->file_priv = v3d_priv;
>> +	job->queue = queue;
>>   
>>   	ret = drm_sched_job_init(&job->base, &v3d_priv-
>>> sched_entity[queue],
>>   				 1, v3d_priv, submit->file_priv-
>>> client_id);
>> @@ -293,6 +294,62 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit
>> *submit, u32 perfmon_id)
>>   	return 0;
>>   }
>>   
>> +/*
>> + * Prepare fences to enforce job serialization when a perfmon is
>> active. A job
>> + * that carries a non-global perfmon must wait for every job
>> currently in-flight
>> + * across all HW queues to finish, otherwise concurrent unrelated
>> work on the
>> + * same core would pollute the performance counters. Symmetrically,
>> while such a
>> + * job is still in-flight, all subsequently submitted jobs must wait
>> for it.
>> + *
>> + * We don't serialize the jobs when using a global perfmon as it's
>> expected to
>> + * track concurrent activity from all jobs.
>> + */
>> +static int
>> +v3d_serialize_for_perfmon(struct v3d_job *job)
>> +{
>> +	struct v3d_dev *v3d = job->v3d;
>> +	bool is_global_perfmon;
>> +	int ret;
>> +
>> +	lockdep_assert_held(&v3d->sched_lock);
>> +
>> +	scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock)
>> +		is_global_perfmon = !!v3d->global_perfmon;
>> +
>> +	if (is_global_perfmon)
>> +		goto publish;
>> +
>> +	if (job->perfmon) {
>> +		for (enum v3d_queue q = 0; q < V3D_MAX_QUEUES; q++)
>> {
>> +			struct dma_fence *f = v3d-
>>> perfmon_state.last_hw_fence[q];
>> +
>> +			if (!f || dma_fence_is_signaled(f))
>> +				continue;
>> +
>> +			ret = drm_sched_job_add_dependency(&job-
>>> base, dma_fence_get(f));
>> +			if (ret)
>> +				return ret;
>> +		}
>> +	} else if (v3d->perfmon_state.fence &&
>> +		   !dma_fence_is_signaled(v3d->perfmon_state.fence))
>> {
>> +		ret = drm_sched_job_add_dependency(&job->base,
>> +						
>> dma_fence_get(v3d->perfmon_state.fence));
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>> +publish:
>> +	dma_fence_put(v3d->perfmon_state.last_hw_fence[job->queue]);
>> +	v3d->perfmon_state.last_hw_fence[job->queue] =
>> dma_fence_get(job->done_fence);
>> +
>> +	if (job->perfmon && !is_global_perfmon) {
>> +		dma_fence_put(v3d->perfmon_state.fence);
>> +		v3d->perfmon_state.fence = dma_fence_get(job-
>>> done_fence);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   static void
>>   v3d_submit_attach_object_fences(struct v3d_submit *submit)
>>   {
>> @@ -337,6 +394,12 @@ v3d_submit_jobs(struct v3d_submit *submit)
>>   			goto err;
>>   	}
>>   
>> +	for (int i = 0; i < submit->job_count; i++) {
>> +		ret = v3d_serialize_for_perfmon(submit->jobs[i]);
>> +		if (ret)
>> +			goto err;
>> +	}
>> +
>>   	for (int i = 0; i < submit->job_count; i++)
>>   		drm_sched_entity_push_job(&submit->jobs[i]->base);
>>   
>>
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached
  2026-06-02 10:49     ` Maíra Canal
@ 2026-06-02 11:10       ` Iago Toral
  0 siblings, 0 replies; 15+ messages in thread
From: Iago Toral @ 2026-06-02 11:10 UTC (permalink / raw)
  To: Maíra Canal, Melissa Wen, Tvrtko Ursulin, David Airlie,
	Simona Vetter
  Cc: kernel-dev, dri-devel

Hi Maíra,

thanks for clarifying this. For the series:

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

Iago

El mar, 02-06-2026 a las 07:49 -0300, Maíra Canal escribió:
> Hi Iago,
> 
> On 02/06/26 04:35, Iago Toral wrote:
> > Hi Maíra, this pactch looks good to me but I am wondering if we
> > should
> > do anything to clean up the perfmon state when a gpu reset happens:
> > i.e. what would happen if a perfmon job resets? Would we continue
> > to
> > schedule after-reset jobs to wait on the fence of the reset job?
> 
> If the job resets, the design around the DRM scheduler will protect
> us.
> 
> The DRM scheduler guarantees that a finished fence is always signaled
> eventually. On a reset, drm_sched_start() signals the pending jobs'
> finished fences (with -ECANCELED for the guilty job). So if a
> perfmon-carrying job is the one that times out, the done_fence we
> stored for serialization still gets signaled when resubmitted (check
> the
> first if-clause in the run_job() callbacks).
> 
> drm_sched_job_add_dependency() releases the dependent job as soon as
> the
> fence signals, regardless of error, so any job submitted behind the
> reset perfmon job is released and runs. The serialization window
> simply
> closes when the job dies.
> 
> Best regards,
> - Maíra
> 
> > 
> > Iago
> > 
> > El dom, 31-05-2026 a las 17:18 -0300, Maíra Canal escribió:
> > > A non-global perfmon is meant to count events generated by a
> > > specific
> > > submission, but the scheduler can run jobs from different queues
> > > concurrently on the same V3D core. Without explicit
> > > serialization, an
> > > unrelated job running in parallel with a perfmon-carrying job
> > > pollutes
> > > the counters and generates unusable results.
> > > 
> > > To address such issue, we must enforce cross-queue serialization
> > > when
> > > we
> > > detect a perfmon-carrying submission. It's possible to implement
> > > serialization by enforcing two rules:
> > > 
> > >    1. A job that carries a non-global perfmon must wait for every
> > > job
> > >      currently in-flight across all HW queues to finish.
> > > 
> > >    2. While a perfmon-carrying job is still in-flight, all
> > > subsequently
> > >      submitted jobs must wait for it.
> > > 
> > > Note that serialization is not needed in the global perfmon case,
> > > as
> > > the
> > > global perfmon tracks activity from all jobs, so concurrency is
> > > desirable.
> > > 
> > > Therefore, check if serialization is needed during job submission
> > > and
> > > if
> > > so, attach fence dependences to enforce cross-queue
> > > serialization.
> > > 
> > > Signed-off-by: Maíra Canal <mcanal@igalia.com>
> > > ---
> > >   drivers/gpu/drm/v3d/v3d_drv.h     | 35 ++++++++++++++++++----
> > >   drivers/gpu/drm/v3d/v3d_gem.c     |  3 ++
> > >   drivers/gpu/drm/v3d/v3d_perfmon.c |  6 ++++
> > >   drivers/gpu/drm/v3d/v3d_submit.c  | 63
> > > +++++++++++++++++++++++++++++++++++++++
> > >   4 files changed, 101 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> > > b/drivers/gpu/drm/v3d/v3d_drv.h
> > > index 51486af68cf4..cdf4926d51f2 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_drv.h
> > > +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> > > @@ -74,11 +74,13 @@ struct v3d_queue_state {
> > >   	spinlock_t queue_lock;
> > >   };
> > >   
> > > -/* Performance monitor object. The perform lifetime is
> > > controlled by
> > > userspace
> > > - * using perfmon related ioctls. A perfmon can be attached to a
> > > submit_cl
> > > - * request, and when this is the case, HW perf counters will be
> > > activated just
> > > - * before the submit_cl is submitted to the GPU and disabled
> > > when
> > > the job is
> > > - * done. This way, only events related to a specific job will be
> > > counted.
> > > +/* Performance monitor object
> > > + *
> > > + * The performance monitor (perfmon) lifetime is controlled by
> > > userspace using
> > > + * perfmon related ioctls. A perfmon can be attached to a CL or
> > > CSD
> > > submission
> > > + * request, and when it is, HW performance counters will be
> > > activated just
> > > + * before the job is submitted to the GPU and disabled when the
> > > job
> > > is done.
> > > + * This way, only events related to a specific submission will
> > > be
> > > counted.
> > >    */
> > >   struct v3d_perfmon {
> > >   	/* Tracks the number of users of the perfmon, when this
> > > counter reaches
> > > @@ -167,13 +169,31 @@ struct v3d_dev {
> > >   
> > >   	struct v3d_queue_state queue[V3D_MAX_QUEUES];
> > >   
> > > -	/* Tracks the performance monitor state. */
> > > +	/*
> > > +	 * Tracks the performance monitor state and consistency.
> > > +	 *
> > > +	 * When a non-global perfmon is attached to a job, the
> > > scheduler must
> > > +	 * not run any other job on the HW concurrently
> > > (otherwise,
> > > the
> > > +	 * counters would be polluted by unrelated work).
> > > +	 */
> > >   	struct {
> > >   		/* Protects @active. */
> > >   		spinlock_t lock;
> > >   
> > >   		/* Perfmon currently programmed in HW (or NULL
> > > if
> > > none). */
> > >   		struct v3d_perfmon *active;
> > > +
> > > +		/* Finished fence of the most recently submitted
> > > job
> > > that
> > > +		 * opened a serialization window (i.e. a job
> > > with a
> > > non-global
> > > +		 * perfmon attached).
> > > +		 */
> > > +		struct dma_fence *fence;
> > > +
> > > +		/* Finished fence of the most recently submitted
> > > job
> > > on each HW
> > > +		 * queue. Used so that a new perfmon-carrying
> > > job
> > > can depend on
> > > +		 * every job currently in-flight across all
> > > queues.
> > > +		 */
> > > +		struct dma_fence *last_hw_fence[V3D_MAX_QUEUES];
> > >   	} perfmon_state;
> > >   
> > >   	/* Protects bo_stats */
> > > @@ -319,6 +339,9 @@ struct v3d_job {
> > >   
> > >   	struct v3d_dev *v3d;
> > >   
> > > +	/* The queue that the job was submitted on. */
> > > +	enum v3d_queue queue;
> > > +
> > >   	/* This is the array of BOs that were looked up at the
> > > start
> > >   	 * of submission.
> > >   	 */
> > > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> > > b/drivers/gpu/drm/v3d/v3d_gem.c
> > > index 9487ab7acd03..0b415486f5b4 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > > @@ -363,8 +363,11 @@ v3d_gem_destroy(struct drm_device *dev)
> > >   	for (q = 0; q < V3D_MAX_QUEUES; q++) {
> > >   		WARN_ON(v3d->queue[q].active_job);
> > >   		v3d_stats_put(v3d->queue[q].stats);
> > > +		dma_fence_put(v3d-
> > > >perfmon_state.last_hw_fence[q]);
> > >   	}
> > >   
> > > +	dma_fence_put(v3d->perfmon_state.fence);
> > > +
> > >   	drm_mm_takedown(&v3d->mm);
> > >   
> > >   	dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void
> > > *)v3d-
> > > > pt,
> > > diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c
> > > b/drivers/gpu/drm/v3d/v3d_perfmon.c
> > > index 3ad0f022753c..07dab7fb3060 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_perfmon.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
> > > @@ -275,6 +275,12 @@ void v3d_perfmon_start(struct v3d_dev *v3d,
> > > struct v3d_perfmon *perfmon)
> > >   	if (!perfmon || v3d->global_perfmon)
> > >   		return;
> > >   
> > > +	/* Cross-queue serialization should have drained any
> > > previous perfmon
> > > +	 * job before this one runs.
> > > +	 */
> > > +	if (WARN_ON_ONCE(v3d->perfmon_state.active))
> > > +		return;
> > > +
> > >   	if (!pm_runtime_get_if_active(v3d->drm.dev))
> > >   		return;
> > >   
> > > diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> > > b/drivers/gpu/drm/v3d/v3d_submit.c
> > > index 4c526aafc4e0..46fc88bacc95 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_submit.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> > > @@ -234,6 +234,7 @@ v3d_submit_add_job(struct v3d_submit *submit,
> > > void **container, size_t size,
> > >   	job->v3d = v3d;
> > >   	job->free = free;
> > >   	job->file_priv = v3d_priv;
> > > +	job->queue = queue;
> > >   
> > >   	ret = drm_sched_job_init(&job->base, &v3d_priv-
> > > > sched_entity[queue],
> > >   				 1, v3d_priv, submit->file_priv-
> > > > client_id);
> > > @@ -293,6 +294,62 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit
> > > *submit, u32 perfmon_id)
> > >   	return 0;
> > >   }
> > >   
> > > +/*
> > > + * Prepare fences to enforce job serialization when a perfmon is
> > > active. A job
> > > + * that carries a non-global perfmon must wait for every job
> > > currently in-flight
> > > + * across all HW queues to finish, otherwise concurrent
> > > unrelated
> > > work on the
> > > + * same core would pollute the performance counters.
> > > Symmetrically,
> > > while such a
> > > + * job is still in-flight, all subsequently submitted jobs must
> > > wait
> > > for it.
> > > + *
> > > + * We don't serialize the jobs when using a global perfmon as
> > > it's
> > > expected to
> > > + * track concurrent activity from all jobs.
> > > + */
> > > +static int
> > > +v3d_serialize_for_perfmon(struct v3d_job *job)
> > > +{
> > > +	struct v3d_dev *v3d = job->v3d;
> > > +	bool is_global_perfmon;
> > > +	int ret;
> > > +
> > > +	lockdep_assert_held(&v3d->sched_lock);
> > > +
> > > +	scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock)
> > > +		is_global_perfmon = !!v3d->global_perfmon;
> > > +
> > > +	if (is_global_perfmon)
> > > +		goto publish;
> > > +
> > > +	if (job->perfmon) {
> > > +		for (enum v3d_queue q = 0; q < V3D_MAX_QUEUES;
> > > q++)
> > > {
> > > +			struct dma_fence *f = v3d-
> > > > perfmon_state.last_hw_fence[q];
> > > +
> > > +			if (!f || dma_fence_is_signaled(f))
> > > +				continue;
> > > +
> > > +			ret = drm_sched_job_add_dependency(&job-
> > > > base, dma_fence_get(f));
> > > +			if (ret)
> > > +				return ret;
> > > +		}
> > > +	} else if (v3d->perfmon_state.fence &&
> > > +		   !dma_fence_is_signaled(v3d-
> > > >perfmon_state.fence))
> > > {
> > > +		ret = drm_sched_job_add_dependency(&job->base,
> > > +						
> > > dma_fence_get(v3d->perfmon_state.fence));
> > > +		if (ret)
> > > +			return ret;
> > > +	}
> > > +
> > > +publish:
> > > +	dma_fence_put(v3d->perfmon_state.last_hw_fence[job-
> > > >queue]);
> > > +	v3d->perfmon_state.last_hw_fence[job->queue] =
> > > dma_fence_get(job->done_fence);
> > > +
> > > +	if (job->perfmon && !is_global_perfmon) {
> > > +		dma_fence_put(v3d->perfmon_state.fence);
> > > +		v3d->perfmon_state.fence = dma_fence_get(job-
> > > > done_fence);
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >   static void
> > >   v3d_submit_attach_object_fences(struct v3d_submit *submit)
> > >   {
> > > @@ -337,6 +394,12 @@ v3d_submit_jobs(struct v3d_submit *submit)
> > >   			goto err;
> > >   	}
> > >   
> > > +	for (int i = 0; i < submit->job_count; i++) {
> > > +		ret = v3d_serialize_for_perfmon(submit-
> > > >jobs[i]);
> > > +		if (ret)
> > > +			goto err;
> > > +	}
> > > +
> > >   	for (int i = 0; i < submit->job_count; i++)
> > >   		drm_sched_entity_push_job(&submit->jobs[i]-
> > > >base);
> > >   
> > > 
> > 
> 
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Claude review: drm/v3d: Fix perfmon locking and cross-queue isolation
  2026-05-31 20:18 [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation Maíra Canal
                   ` (3 preceding siblings ...)
  2026-05-31 20:18 ` [PATCH v2 4/4] drm/v3d: Drop the queue argument from v3d_job_add_syncobjs() Maíra Canal
@ 2026-06-04  4:44 ` Claude Code Review Bot
  4 siblings, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  4:44 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: drm/v3d: Fix perfmon locking and cross-queue isolation
Author: =?utf-8?q?Ma=C3=ADra_Canal?= <mcanal@igalia.com>
Patches: 10
Reviewed: 2026-06-04T14:44:42.323803

---

This is a well-structured 4-patch series from Maíra Canal that addresses three real, long-standing correctness issues in v3d's performance monitor subsystem: (1) lock-free access to `active_perfmon`, (2) perfmons never being stopped after job completion, and (3) cross-queue counter pollution. The cover letter clearly describes the problems and the approach.

The series is logically decomposed: patch 1 is a self-contained stable fix for refcount leaks, patch 2 is the major locking refactor, patch 3 adds cross-queue serialization via fence dependencies, and patch 4 is a trivial cleanup. The dependency on a prerequisite series (the submission refactoring) introduces a `v3d_submit` abstraction used in patches 3-4 that isn't present in drm-next today, but the patches are internally consistent.

**Overall verdict**: Good series. A few points worth discussing, mainly around locking subtleties in patch 2 and a potential race window in patch 3, but nothing blocking.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Claude review: drm/v3d: Fix global performance monitor reference counting
  2026-05-31 20:18 ` [PATCH v2 1/4] drm/v3d: Fix global performance monitor reference counting Maíra Canal
@ 2026-06-04  4:44   ` Claude Code Review Bot
  0 siblings, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  4:44 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Status**: Looks correct. Good stable candidate.

This patch fixes three distinct refcount leak paths in `v3d_perfmon_set_global_ioctl()` and `v3d_perfmon_delete()`. The analysis in the commit message is precise.

The fix for `v3d_perfmon_delete()`:
```c
if (cmpxchg(&v3d->global_perfmon, perfmon, NULL) == perfmon)
    v3d_perfmon_put(perfmon);
```
Correctly releases the extra reference stashed by SET_GLOBAL only when the cmpxchg succeeds (i.e., this perfmon was actually the global one).

The CLEAR_GLOBAL path fix:
```c
v3d_perfmon_put(perfmon);

old = xchg(&v3d->global_perfmon, NULL);
if (!old)
    return -EINVAL;

v3d_perfmon_put(old);
```
Correctly drops the find reference on `perfmon` early (since CLEAR_GLOBAL doesn't use it), then drops the stashed reference on whatever was actually the global perfmon. The semantics are a bit surprising — CLEAR_GLOBAL takes a perfmon ID but ignores it for the actual clear — but the comment documents this, and matching the existing behavior is the right call for a stable fix.

The SET_GLOBAL error path:
```c
if (cmpxchg(&v3d->global_perfmon, NULL, perfmon)) {
    v3d_perfmon_put(perfmon);
    return -EBUSY;
}
```
Straightforward — drops the find reference when we can't install the perfmon.

**No issues found.**

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Claude review: drm/v3d: Refactor perfmon locking
  2026-05-31 20:18 ` [PATCH v2 2/4] drm/v3d: Refactor perfmon locking Maíra Canal
  2026-06-01 11:52   ` Iago Toral
@ 2026-06-04  4:44   ` Claude Code Review Bot
  1 sibling, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  4:44 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Status**: Generally good, a couple of observations.

This is the core patch. It replaces the per-perfmon mutex with a device-wide `perfmon_state.lock` spinlock, which is the correct granularity for protecting a device-wide singleton (`active_perfmon`). The spinlock also enables stopping the perfmon from IRQ context (the job-completion IRQ handler), which is the natural place to do it.

**Positive design decisions:**

1. Moving perfmon stop into `v3d_irq_signal_fence()` is correct — this is the natural job-completion boundary and solves the "perfmon never stopped if nothing queued behind" problem.

2. The suspend/resume split (`v3d_perfmon_suspend` / `v3d_perfmon_resume`) that captures HW counters without clearing `perfmon_state.active` is clean. This means the global perfmon survives power transitions correctly.

3. The `v3d_perfmon_get_values_ioctl()` change to snapshot values under the spinlock into a stack-local `values[]` array before `copy_to_user()` is correct — you can't hold a spinlock across `copy_to_user()`.

**Observations:**

1. **`v3d_perfmon_start()` calls `pm_runtime_get_if_active()` under spinlock_irqsave**:

```c
void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
{
    guard(spinlock_irqsave)(&v3d->perfmon_state.lock);
    ...
    if (!pm_runtime_get_if_active(v3d->drm.dev))
        return;

    v3d_perfmon_hw_start(v3d, perfmon);
    v3d->perfmon_state.active = perfmon;

    v3d_pm_runtime_put(v3d);
}
```

`pm_runtime_get_if_active()` is documented as safe to call from atomic context (it uses `spin_lock_irqsave` internally on the power domain lock), so this should be fine. `v3d_pm_runtime_put()` likely wraps `pm_runtime_put_autosuspend()` which is also atomic-safe. Just worth confirming the `v3d_pm_runtime_put` wrapper doesn't do anything sleepable.

2. **`v3d_perfmon_start()` silently returns when `v3d->global_perfmon` is set**:

```c
if (!perfmon || v3d->global_perfmon)
    return;
```

This is correct for per-job perfmons (they shouldn't override the global one), but it means the `v3d_perfmon_start()` call from `run_job()` in the scheduler becomes a no-op when a global perfmon is active. The global perfmon is started in `v3d_perfmon_set_global_ioctl()` instead. This is a cleaner separation of concerns.

3. **`v3d_reset()` now calls `v3d_perfmon_resume()` instead of `v3d_perfmon_stop()`**:

```c
/* Re-arm the global perfmon HW counters that the reset zeroed. */
v3d_perfmon_resume(v3d);
```

This makes sense for the global perfmon case — a reset zeros the HW, so you need to reprogram the counters. For a per-job perfmon, the stop in `v3d_gpu_reset_for_timeout()` (added just above `v3d_reset()`) already cleared `perfmon_state.active`, so `v3d_perfmon_resume()` will find `active == NULL` and be a no-op. The ordering is correct: stop perfmon -> reset HW -> resume (reprograms if global active).

4. **Return from inside `scoped_guard` in `v3d_perfmon_set_global_ioctl()`**:

```c
scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock) {
    old = v3d->global_perfmon;
    if (!old)
        return -EINVAL;
    ...
}
```

The `scoped_guard` macro correctly handles early returns (the lock is released via the cleanup attribute), so this is fine. Just noting it since it's a pattern that can confuse readers.

5. **`v3d_perfmon_start()` sets `active` and calls `v3d_pm_runtime_put()` in the same spinlock section**. After `v3d_pm_runtime_put()`, a suspend could race in and call `v3d_perfmon_suspend()`, which also takes the spinlock. Since we're still holding the lock, this is safe — the suspend will spin until we release. But it means `v3d_pm_runtime_put()` must not trigger a synchronous suspend while holding the spinlock. `pm_runtime_put_autosuspend()` is deferred, so this should be fine.

**No blocking issues.**

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Claude review: drm/v3d: Serialize jobs across queues when a perfmon is attached
  2026-05-31 20:18 ` [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached Maíra Canal
  2026-06-02  7:35   ` Iago Toral
@ 2026-06-04  4:44   ` Claude Code Review Bot
  1 sibling, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  4:44 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Status**: The design is sound but there's a subtle concern worth discussing.

The approach is straightforward: when a non-global perfmon-carrying job is submitted, it depends on all in-flight jobs across all queues (via `last_hw_fence[]`). Conversely, when a non-perfmon job is submitted while a perfmon job is in-flight, it depends on `perfmon_state.fence`.

**The `v3d_serialize_for_perfmon()` function:**

```c
static int
v3d_serialize_for_perfmon(struct v3d_job *job)
{
    ...
    lockdep_assert_held(&v3d->sched_lock);

    scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock)
        is_global_perfmon = !!v3d->global_perfmon;

    if (is_global_perfmon)
        goto publish;
    ...
```

**Observation 1 — TOCTOU on `global_perfmon` check**: The `is_global_perfmon` flag is read under the perfmon spinlock, but after dropping it, the global perfmon state could change before we act on it. However, `sched_lock` is held, which serializes all submissions. The global perfmon can only be set/cleared by the SET_GLOBAL ioctl, which doesn't hold `sched_lock`. So there's a theoretical window where:

- Thread A: holds `sched_lock`, reads `is_global_perfmon = false`, proceeds to add fence dependencies
- Thread B: concurrently calls SET_GLOBAL, installs a global perfmon

This seems low-risk in practice (the global perfmon is a privileged operation and the window is tiny), and the `sched_lock` assertion confirms submission serialization. But it's worth considering whether the SET_GLOBAL ioctl should also acquire `sched_lock` to close this gap. Given that the global perfmon case doesn't need serialization (concurrency is desired), the worst case is unnecessary serialization of one submission, not a correctness bug.

**Observation 2 — `last_hw_fence` update always happens (the `publish` label)**:

```c
publish:
    dma_fence_put(v3d->perfmon_state.last_hw_fence[job->queue]);
    v3d->perfmon_state.last_hw_fence[job->queue] = dma_fence_get(job->done_fence);

    if (job->perfmon && !is_global_perfmon) {
        dma_fence_put(v3d->perfmon_state.fence);
        v3d->perfmon_state.fence = dma_fence_get(job->done_fence);
    }
```

This correctly updates `last_hw_fence` for *every* job (needed so future perfmon jobs can wait on all in-flight work). The `perfmon_state.fence` is only set for perfmon-carrying jobs. This is sound.

**Observation 3 — `done_fence` vs `irq_fence`**: The code uses `job->done_fence` for the serialization fences. This is the scheduler's finished fence, which signals when the job completes. This is the correct fence to use (rather than `irq_fence` which is the HW interrupt fence).

**Observation 4 — WARN_ON_ONCE in `v3d_perfmon_start()`**:

```c
if (WARN_ON_ONCE(v3d->perfmon_state.active))
    return;
```

Good defensive check. With cross-queue serialization working correctly, there should never be an active perfmon when a new perfmon-carrying job starts. If this fires, it means serialization failed.

**Observation 5 — Cleanup in `v3d_gem_destroy()`**:

```c
for (q = 0; q < V3D_MAX_QUEUES; q++) {
    WARN_ON(v3d->queue[q].active_job);
    v3d_stats_put(v3d->queue[q].stats);
    dma_fence_put(v3d->perfmon_state.last_hw_fence[q]);
}

dma_fence_put(v3d->perfmon_state.fence);
```

Correct cleanup of the new fence references.

**No blocking issues.**

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Claude review: drm/v3d: Drop the queue argument from v3d_job_add_syncobjs()
  2026-05-31 20:18 ` [PATCH v2 4/4] drm/v3d: Drop the queue argument from v3d_job_add_syncobjs() Maíra Canal
@ 2026-06-04  4:44   ` Claude Code Review Bot
  0 siblings, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  4:44 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Status**: Clean mechanical cleanup, no issues.

Now that `job->queue` exists (introduced in patch 3), the explicit `enum v3d_queue queue` parameter to `v3d_job_add_syncobjs()` is redundant. The patch simply removes it and uses `job->queue` instead.

```c
-v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
-		     u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
+v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
+		     u32 in_sync, struct v3d_submit_ext *se)
```

All call sites updated consistently. **No issues found.**

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2026-06-04  4:44 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-31 20:18 [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation Maíra Canal
2026-05-31 20:18 ` [PATCH v2 1/4] drm/v3d: Fix global performance monitor reference counting Maíra Canal
2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
2026-05-31 20:18 ` [PATCH v2 2/4] drm/v3d: Refactor perfmon locking Maíra Canal
2026-06-01 11:52   ` Iago Toral
2026-06-01 12:03     ` Maíra Canal
2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
2026-05-31 20:18 ` [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached Maíra Canal
2026-06-02  7:35   ` Iago Toral
2026-06-02 10:49     ` Maíra Canal
2026-06-02 11:10       ` Iago Toral
2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
2026-05-31 20:18 ` [PATCH v2 4/4] drm/v3d: Drop the queue argument from v3d_job_add_syncobjs() Maíra Canal
2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
2026-06-04  4:44 ` Claude review: drm/v3d: Fix perfmon locking and cross-queue isolation Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox