* [PATCH v2 1/4] drm/v3d: Fix global performance monitor reference counting
2026-05-31 20:18 [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation Maíra Canal
@ 2026-05-31 20:18 ` Maíra Canal
2026-06-04 4:44 ` Claude review: " Claude Code Review Bot
2026-05-31 20:18 ` [PATCH v2 2/4] drm/v3d: Refactor perfmon locking Maíra Canal
` (3 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: Maíra Canal @ 2026-05-31 20:18 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, David Airlie,
Simona Vetter
Cc: kernel-dev, dri-devel, stable, Maíra Canal
In the SET_GLOBAL ioctl, v3d_perfmon_find() bumps the reference count on
the perfmon it returns, but v3d_perfmon_set_global_ioctl() and
v3d_perfmon_delete() fail to release that reference on several paths:
1. v3d_perfmon_set_global_ioctl() leaks the reference on its error
paths.
2. CLEAR_GLOBAL leaks both the find reference and the reference
previously stashed in v3d->global_perfmon by the SET_GLOBAL ioctl
that configured it.
3. Destroying a perfmon that is the current global perfmon leaks the
reference stashed by the SET_GLOBAL ioctl.
Release each of these references explicitly.
Cc: stable@vger.kernel.org
Fixes: c6eabbab359c ("drm/v3d: Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/v3d_perfmon.c | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c
index 02451fc09dbb..48ae748247be 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -319,8 +319,11 @@ static void v3d_perfmon_delete(struct v3d_file_priv *v3d_priv,
if (perfmon == v3d->active_perfmon)
v3d_perfmon_stop(v3d, perfmon, false);
- /* If the global perfmon is being destroyed, set it to NULL */
- cmpxchg(&v3d->global_perfmon, perfmon, NULL);
+ /* If the global perfmon is being destroyed, clean it and release
+ * the reference stashed in v3d_perfmon_set_global_ioctl().
+ */
+ if (cmpxchg(&v3d->global_perfmon, perfmon, NULL) == perfmon)
+ v3d_perfmon_put(perfmon);
v3d_perfmon_put(perfmon);
}
@@ -471,16 +474,27 @@ int v3d_perfmon_set_global_ioctl(struct drm_device *dev, void *data,
/* If the request is to clear the global performance monitor */
if (req->flags & DRM_V3D_PERFMON_CLEAR_GLOBAL) {
- if (!v3d->global_perfmon)
+ struct v3d_perfmon *old;
+
+ /* DRM_V3D_PERFMON_CLEAR_GLOBAL doesn't check if
+ * v3d->global_perfmon == perfmon. Therefore, there
+ * is no need to keep perfmon's reference.
+ */
+ v3d_perfmon_put(perfmon);
+
+ old = xchg(&v3d->global_perfmon, NULL);
+ if (!old)
return -EINVAL;
- xchg(&v3d->global_perfmon, NULL);
+ v3d_perfmon_put(old);
return 0;
}
- if (cmpxchg(&v3d->global_perfmon, NULL, perfmon))
+ if (cmpxchg(&v3d->global_perfmon, NULL, perfmon)) {
+ v3d_perfmon_put(perfmon);
return -EBUSY;
+ }
return 0;
}
--
2.54.0
^ permalink raw reply related [flat|nested] 15+ messages in thread* Claude review: drm/v3d: Fix global performance monitor reference counting
2026-05-31 20:18 ` [PATCH v2 1/4] drm/v3d: Fix global performance monitor reference counting Maíra Canal
@ 2026-06-04 4:44 ` Claude Code Review Bot
0 siblings, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 4:44 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Status**: Looks correct. Good stable candidate.
This patch fixes three distinct refcount leak paths in `v3d_perfmon_set_global_ioctl()` and `v3d_perfmon_delete()`. The analysis in the commit message is precise.
The fix for `v3d_perfmon_delete()`:
```c
if (cmpxchg(&v3d->global_perfmon, perfmon, NULL) == perfmon)
v3d_perfmon_put(perfmon);
```
Correctly releases the extra reference stashed by SET_GLOBAL only when the cmpxchg succeeds (i.e., this perfmon was actually the global one).
The CLEAR_GLOBAL path fix:
```c
v3d_perfmon_put(perfmon);
old = xchg(&v3d->global_perfmon, NULL);
if (!old)
return -EINVAL;
v3d_perfmon_put(old);
```
Correctly drops the find reference on `perfmon` early (since CLEAR_GLOBAL doesn't use it), then drops the stashed reference on whatever was actually the global perfmon. The semantics are a bit surprising — CLEAR_GLOBAL takes a perfmon ID but ignores it for the actual clear — but the comment documents this, and matching the existing behavior is the right call for a stable fix.
The SET_GLOBAL error path:
```c
if (cmpxchg(&v3d->global_perfmon, NULL, perfmon)) {
v3d_perfmon_put(perfmon);
return -EBUSY;
}
```
Straightforward — drops the find reference when we can't install the perfmon.
**No issues found.**
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 2/4] drm/v3d: Refactor perfmon locking
2026-05-31 20:18 [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation Maíra Canal
2026-05-31 20:18 ` [PATCH v2 1/4] drm/v3d: Fix global performance monitor reference counting Maíra Canal
@ 2026-05-31 20:18 ` Maíra Canal
2026-06-01 11:52 ` Iago Toral
2026-06-04 4:44 ` Claude review: " Claude Code Review Bot
2026-05-31 20:18 ` [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached Maíra Canal
` (2 subsequent siblings)
4 siblings, 2 replies; 15+ messages in thread
From: Maíra Canal @ 2026-05-31 20:18 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, David Airlie,
Simona Vetter
Cc: kernel-dev, dri-devel, Maíra Canal
v3d exposes a single set of performance counters per core, so at any
moment at most one performance monitor can be programmed in HW. In
software, this singleton is represented by v3d_dev->active_perfmon, but
until now nothing actually serialized access to it: scheduler callbacks,
the GPU-reset path, and perfmon ioctls all read and wrote that field
lock-free.
The existence of v3d_perfmon->lock mutex did not close the gap. It
serialized start/stop of *one* perfmon object against itself, but the
invariant that needs protection is device-wide: there can be exactly one
active perfmon at any moment in HW. Two threads acting on different
perfmon objects could race through v3d_dev->active_perfmon and the
counter registers, leaving software and HW out of sync.
This commit moves the locking to where the invariant actually lives. Group
the active perfmon pointer with a device-wide spinlock and route every
state transition (job start, job completion, set global, reset,
suspend/resume, destruction) through a small set of locked entry points
that are the only mutators of the HW counters.
Some design improvements needed to be made for the refactor:
1. Stop the perfmon from the IRQ handler at job-completion time (the
natural boundary for "active perfmon follows the active job"). This
required a change from a mutex to a spinlock. This solves another
issue of the existing design: perfmon start/stop was exclusively
attached to run_job() callbacks, which means that if nothing was
further queued up, a perfmon would never actually be stopped.
2. Pause/resume the HW counters across runtime-PM transitions without
dropping the software reference. This preserves the perfmon state
while the device is idle.
3. Move the global perfmon lifecycle management to the set_global
IOCTL. This simplifies the logic in v3d_perfmon_start() and
v3d_perfmon_stop(), as there is no need to always check if the
global perfmon is enabled.
4. v3d_perfmon_get_values_ioctl() doesn't stop the perfmon when
capturing the values. All lifecycle management is handled by the
job (for per-job perfmons) or the set_global IOCTL (for global
perfmons).
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/v3d_drv.h | 17 ++--
drivers/gpu/drm/v3d/v3d_gem.c | 4 +-
drivers/gpu/drm/v3d/v3d_irq.c | 7 +-
drivers/gpu/drm/v3d/v3d_perfmon.c | 183 +++++++++++++++++++++++++++-----------
drivers/gpu/drm/v3d/v3d_power.c | 4 +
drivers/gpu/drm/v3d/v3d_sched.c | 26 ++----
drivers/gpu/drm/v3d/v3d_submit.c | 6 +-
7 files changed, 165 insertions(+), 82 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 071d919fe860..51486af68cf4 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -86,9 +86,6 @@ struct v3d_perfmon {
*/
refcount_t refcnt;
- /* Protects perfmon stop, as it can be invoked from multiple places. */
- struct mutex lock;
-
/* Number of counters activated in this perfmon instance
* (should be less than DRM_V3D_MAX_PERF_COUNTERS).
*/
@@ -170,8 +167,14 @@ struct v3d_dev {
struct v3d_queue_state queue[V3D_MAX_QUEUES];
- /* Used to track the active perfmon if any. */
- struct v3d_perfmon *active_perfmon;
+ /* Tracks the performance monitor state. */
+ struct {
+ /* Protects @active. */
+ spinlock_t lock;
+
+ /* Perfmon currently programmed in HW (or NULL if none). */
+ struct v3d_perfmon *active;
+ } perfmon_state;
/* Protects bo_stats */
struct mutex bo_lock;
@@ -663,6 +666,10 @@ void v3d_perfmon_put(struct v3d_perfmon *perfmon);
void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon);
void v3d_perfmon_stop(struct v3d_dev *v3d, struct v3d_perfmon *perfmon,
bool capture);
+void v3d_perfmon_stop_locked(struct v3d_dev *v3d, struct v3d_perfmon *perfmon,
+ bool capture);
+void v3d_perfmon_suspend(struct v3d_dev *v3d);
+void v3d_perfmon_resume(struct v3d_dev *v3d);
struct v3d_perfmon *v3d_perfmon_find(struct v3d_file_priv *v3d_priv, int id);
void v3d_perfmon_open_file(struct v3d_file_priv *v3d_priv);
void v3d_perfmon_close_file(struct v3d_file_priv *v3d_priv);
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 1ee3c038d5f6..9487ab7acd03 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -137,7 +137,8 @@ v3d_reset(struct v3d_dev *v3d)
v3d_mmu_set_page_table(v3d);
v3d_irq_reset(v3d);
- v3d_perfmon_stop(v3d, v3d->active_perfmon, false);
+ /* Re-arm the global perfmon HW counters that the reset zeroed. */
+ v3d_perfmon_resume(v3d);
trace_v3d_reset_end(dev);
}
@@ -299,6 +300,7 @@ v3d_gem_init(struct drm_device *dev)
}
spin_lock_init(&v3d->mm_lock);
+ spin_lock_init(&v3d->perfmon_state.lock);
ret = drmm_mutex_init(dev, &v3d->bo_lock);
if (ret)
goto err_stats;
diff --git a/drivers/gpu/drm/v3d/v3d_irq.c b/drivers/gpu/drm/v3d/v3d_irq.c
index 754a969b862b..41fce1f8f96c 100644
--- a/drivers/gpu/drm/v3d/v3d_irq.c
+++ b/drivers/gpu/drm/v3d/v3d_irq.c
@@ -87,9 +87,12 @@ v3d_irq_signal_fence(struct v3d_dev *v3d, enum v3d_queue q,
void (*trace_irq)(struct drm_device *, uint64_t))
{
struct v3d_queue_state *queue = &v3d->queue[q];
- struct v3d_fence *fence = to_v3d_fence(queue->active_job->irq_fence);
+ struct v3d_job *job = queue->active_job;
+ struct v3d_fence *fence = to_v3d_fence(job->irq_fence);
- v3d_job_update_stats(queue->active_job);
+ v3d_perfmon_stop(v3d, job->perfmon, true);
+
+ v3d_job_update_stats(job);
trace_irq(&v3d->drm, fence->seqno);
queue->active_job = NULL;
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c
index 48ae748247be..3ad0f022753c 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -217,26 +217,15 @@ void v3d_perfmon_get(struct v3d_perfmon *perfmon)
void v3d_perfmon_put(struct v3d_perfmon *perfmon)
{
- if (perfmon && refcount_dec_and_test(&perfmon->refcnt)) {
- mutex_destroy(&perfmon->lock);
+ if (perfmon && refcount_dec_and_test(&perfmon->refcnt))
kfree(perfmon);
- }
}
-void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
+static void v3d_perfmon_hw_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
{
+ u8 ncounters = perfmon->ncounters;
+ u32 mask = GENMASK(ncounters - 1, 0);
unsigned int i;
- u32 mask;
- u8 ncounters;
-
- if (WARN_ON_ONCE(!perfmon || v3d->active_perfmon))
- return;
-
- if (!pm_runtime_get_if_active(v3d->drm.dev))
- return;
-
- ncounters = perfmon->ncounters;
- mask = GENMASK(ncounters - 1, 0);
for (i = 0; i < ncounters; i++) {
u32 source = i / 4;
@@ -258,39 +247,106 @@ void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
V3D_CORE_WRITE(0, V3D_V4_PCTR_0_EN, mask);
V3D_CORE_WRITE(0, V3D_V4_PCTR_0_CLR, mask);
V3D_CORE_WRITE(0, V3D_PCTR_0_OVERFLOW, mask);
+}
- v3d->active_perfmon = perfmon;
+static void v3d_perfmon_hw_capture(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
+{
+ u32 mask = GENMASK(perfmon->ncounters - 1, 0);
+ for (int i = 0; i < perfmon->ncounters; i++)
+ perfmon->values[i] += V3D_CORE_READ(0, V3D_PCTR_0_PCTRX(i));
+
+ V3D_CORE_WRITE(0, V3D_V4_PCTR_0_CLR, mask);
+}
+
+static void v3d_perfmon_hw_stop(struct v3d_dev *v3d, struct v3d_perfmon *perfmon,
+ bool capture)
+{
+ if (capture)
+ v3d_perfmon_hw_capture(v3d, perfmon);
+
+ V3D_CORE_WRITE(0, V3D_V4_PCTR_0_EN, 0);
+}
+
+void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
+{
+ guard(spinlock_irqsave)(&v3d->perfmon_state.lock);
+
+ if (!perfmon || v3d->global_perfmon)
+ return;
+
+ if (!pm_runtime_get_if_active(v3d->drm.dev))
+ return;
+
+ v3d_perfmon_hw_start(v3d, perfmon);
+ v3d->perfmon_state.active = perfmon;
+
+ v3d_pm_runtime_put(v3d);
+}
+
+static void v3d_perfmon_capture_locked(struct v3d_dev *v3d,
+ struct v3d_perfmon *perfmon)
+{
+ lockdep_assert_held(&v3d->perfmon_state.lock);
+
+ if (!perfmon || perfmon != v3d->perfmon_state.active)
+ return;
+
+ if (!pm_runtime_get_if_active(v3d->drm.dev))
+ return;
+
+ v3d_perfmon_hw_capture(v3d, perfmon);
+ v3d_pm_runtime_put(v3d);
+}
+
+void v3d_perfmon_stop_locked(struct v3d_dev *v3d, struct v3d_perfmon *perfmon,
+ bool capture)
+{
+ lockdep_assert_held(&v3d->perfmon_state.lock);
+
+ if (!perfmon || perfmon != v3d->perfmon_state.active)
+ return;
+
+ v3d->perfmon_state.active = NULL;
+
+ /* If the device is suspended, the HW has already stopped counting. */
+ if (!pm_runtime_get_if_active(v3d->drm.dev))
+ return;
+
+ v3d_perfmon_hw_stop(v3d, perfmon, capture);
v3d_pm_runtime_put(v3d);
}
void v3d_perfmon_stop(struct v3d_dev *v3d, struct v3d_perfmon *perfmon,
bool capture)
{
- unsigned int i;
-
- if (!perfmon || !v3d->active_perfmon)
+ if (!perfmon)
return;
- mutex_lock(&perfmon->lock);
- if (perfmon != v3d->active_perfmon)
- goto out;
+ guard(spinlock_irqsave)(&v3d->perfmon_state.lock);
+ v3d_perfmon_stop_locked(v3d, perfmon, capture);
+}
- if (!pm_runtime_get_if_active(v3d->drm.dev))
- goto out_clear;
+void
+v3d_perfmon_suspend(struct v3d_dev *v3d)
+{
+ guard(spinlock_irqsave)(&v3d->perfmon_state.lock);
- if (capture)
- for (i = 0; i < perfmon->ncounters; i++)
- perfmon->values[i] += V3D_CORE_READ(0, V3D_PCTR_0_PCTRX(i));
+ if (!v3d->perfmon_state.active)
+ return;
- V3D_CORE_WRITE(0, V3D_V4_PCTR_0_EN, 0);
+ v3d_perfmon_hw_stop(v3d, v3d->perfmon_state.active, true);
+}
- v3d_pm_runtime_put(v3d);
+void
+v3d_perfmon_resume(struct v3d_dev *v3d)
+{
+ guard(spinlock_irqsave)(&v3d->perfmon_state.lock);
-out_clear:
- v3d->active_perfmon = NULL;
-out:
- mutex_unlock(&perfmon->lock);
+ if (!v3d->perfmon_state.active)
+ return;
+
+ v3d_perfmon_hw_start(v3d, v3d->perfmon_state.active);
}
struct v3d_perfmon *v3d_perfmon_find(struct v3d_file_priv *v3d_priv, int id)
@@ -316,14 +372,17 @@ static void v3d_perfmon_delete(struct v3d_file_priv *v3d_priv,
struct v3d_dev *v3d = v3d_priv->v3d;
/* If the active perfmon is being destroyed, stop it first */
- if (perfmon == v3d->active_perfmon)
- v3d_perfmon_stop(v3d, perfmon, false);
+ scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock) {
+ v3d_perfmon_stop_locked(v3d, perfmon, false);
- /* If the global perfmon is being destroyed, clean it and release
- * the reference stashed in v3d_perfmon_set_global_ioctl().
- */
- if (cmpxchg(&v3d->global_perfmon, perfmon, NULL) == perfmon)
- v3d_perfmon_put(perfmon);
+ /* If the global perfmon is being destroyed, clean it and release
+ * the reference stashed in v3d_perfmon_set_global_ioctl().
+ */
+ if (v3d->global_perfmon == perfmon) {
+ v3d_perfmon_put(v3d->global_perfmon);
+ v3d->global_perfmon = NULL;
+ }
+ }
v3d_perfmon_put(perfmon);
}
@@ -371,12 +430,10 @@ int v3d_perfmon_create_ioctl(struct drm_device *dev, void *data,
perfmon->ncounters = req->ncounters;
refcount_set(&perfmon->refcnt, 1);
- mutex_init(&perfmon->lock);
ret = xa_alloc(&v3d_priv->perfmons, &id, perfmon, xa_limit_32b,
GFP_KERNEL);
if (ret < 0) {
- mutex_destroy(&perfmon->lock);
kfree(perfmon);
return ret;
}
@@ -408,7 +465,9 @@ int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data,
struct v3d_dev *v3d = to_v3d_dev(dev);
struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
struct drm_v3d_perfmon_get_values *req = data;
+ u64 values[DRM_V3D_MAX_PERF_COUNTERS];
struct v3d_perfmon *perfmon;
+ size_t size;
int ret = 0;
if (req->pad != 0)
@@ -418,10 +477,14 @@ int v3d_perfmon_get_values_ioctl(struct drm_device *dev, void *data,
if (!perfmon)
return -EINVAL;
- v3d_perfmon_stop(v3d, perfmon, true);
+ size = perfmon->ncounters * sizeof(u64);
- if (copy_to_user(u64_to_user_ptr(req->values_ptr), perfmon->values,
- perfmon->ncounters * sizeof(u64)))
+ scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock) {
+ v3d_perfmon_capture_locked(v3d, perfmon);
+ memcpy(values, perfmon->values, size);
+ }
+
+ if (copy_to_user(u64_to_user_ptr(req->values_ptr), values, size))
ret = -EFAULT;
v3d_perfmon_put(perfmon);
@@ -482,18 +545,36 @@ int v3d_perfmon_set_global_ioctl(struct drm_device *dev, void *data,
*/
v3d_perfmon_put(perfmon);
- old = xchg(&v3d->global_perfmon, NULL);
- if (!old)
- return -EINVAL;
+ scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock) {
+ old = v3d->global_perfmon;
+ if (!old)
+ return -EINVAL;
+
+ v3d_perfmon_stop_locked(v3d, old, true);
+ v3d->global_perfmon = NULL;
+ }
v3d_perfmon_put(old);
return 0;
}
- if (cmpxchg(&v3d->global_perfmon, NULL, perfmon)) {
- v3d_perfmon_put(perfmon);
- return -EBUSY;
+ scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock) {
+ if (v3d->perfmon_state.active || v3d->global_perfmon) {
+ v3d_perfmon_put(perfmon);
+ return -EBUSY;
+ }
+
+ v3d->global_perfmon = perfmon;
+ v3d->perfmon_state.active = perfmon;
+
+ /* If the device is suspended, v3d_perfmon_resume() will
+ * program the HW on the next resume.
+ */
+ if (pm_runtime_get_if_active(v3d->drm.dev)) {
+ v3d_perfmon_hw_start(v3d, perfmon);
+ v3d_pm_runtime_put(v3d);
+ }
}
return 0;
diff --git a/drivers/gpu/drm/v3d/v3d_power.c b/drivers/gpu/drm/v3d/v3d_power.c
index 769e90032b04..1a4b651a2c5f 100644
--- a/drivers/gpu/drm/v3d/v3d_power.c
+++ b/drivers/gpu/drm/v3d/v3d_power.c
@@ -50,6 +50,8 @@ int v3d_power_suspend(struct device *dev)
struct v3d_dev *v3d = to_v3d_dev(drm);
int ret;
+ v3d_perfmon_suspend(v3d);
+
v3d_irq_disable(v3d);
ret = v3d_suspend_sms(v3d);
@@ -83,5 +85,7 @@ int v3d_power_resume(struct device *dev)
v3d_mmu_set_page_table(v3d);
v3d_irq_enable(v3d);
+ v3d_perfmon_resume(v3d);
+
return 0;
}
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index c16a9d4d41e6..4d2b91d49542 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -125,24 +125,6 @@ v3d_performance_query_info_free(struct v3d_performance_query_info *query_info,
}
}
-static void
-v3d_switch_perfmon(struct v3d_dev *v3d, struct v3d_job *job)
-{
- struct v3d_perfmon *perfmon = v3d->global_perfmon;
-
- if (!perfmon)
- perfmon = job->perfmon;
-
- if (perfmon == v3d->active_perfmon)
- return;
-
- if (perfmon != v3d->active_perfmon)
- v3d_perfmon_stop(v3d, v3d->active_perfmon, true);
-
- if (perfmon && v3d->active_perfmon != perfmon)
- v3d_perfmon_start(v3d, perfmon);
-}
-
static void
v3d_stats_start(struct v3d_stats *stats, u64 now)
{
@@ -219,7 +201,7 @@ static struct dma_fence *v3d_bin_job_run(struct drm_sched_job *sched_job)
job->start, job->end);
v3d_job_start_stats(&job->base);
- v3d_switch_perfmon(v3d, &job->base);
+ v3d_perfmon_start(v3d, job->base.perfmon);
/* Set the current and end address of the control list.
* Writing the end register is what starts the job.
@@ -277,7 +259,7 @@ static struct dma_fence *v3d_render_job_run(struct drm_sched_job *sched_job)
job->start, job->end);
v3d_job_start_stats(&job->base);
- v3d_switch_perfmon(v3d, &job->base);
+ v3d_perfmon_start(v3d, job->base.perfmon);
/* XXX: Set the QCFG */
@@ -370,7 +352,7 @@ v3d_csd_job_run(struct drm_sched_job *sched_job)
trace_v3d_submit_csd(dev, to_v3d_fence(fence)->seqno);
v3d_job_start_stats(&job->base);
- v3d_switch_perfmon(v3d, &job->base);
+ v3d_perfmon_start(v3d, job->base.perfmon);
csd_cfg0_reg = V3D_CSD_QUEUED_CFG0(v3d->ver);
for (i = 1; i <= 6; i++)
@@ -711,6 +693,8 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d, struct drm_sched_job *sched_job,
if (sched_job)
drm_sched_increase_karma(sched_job);
+ v3d_perfmon_stop(v3d, job->perfmon, false);
+
/* get the GPU back into the init state */
v3d_reset(v3d);
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 02441d4f495d..4c526aafc4e0 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -275,8 +275,10 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit *submit, u32 perfmon_id)
if (!perfmon_id)
return 0;
- if (v3d->global_perfmon)
- return -EAGAIN;
+ scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock) {
+ if (v3d->global_perfmon)
+ return -EAGAIN;
+ }
perfmon = v3d_perfmon_find(v3d_priv, perfmon_id);
if (!perfmon)
--
2.54.0
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH v2 2/4] drm/v3d: Refactor perfmon locking
2026-05-31 20:18 ` [PATCH v2 2/4] drm/v3d: Refactor perfmon locking Maíra Canal
@ 2026-06-01 11:52 ` Iago Toral
2026-06-01 12:03 ` Maíra Canal
2026-06-04 4:44 ` Claude review: " Claude Code Review Bot
1 sibling, 1 reply; 15+ messages in thread
From: Iago Toral @ 2026-06-01 11:52 UTC (permalink / raw)
To: Maíra Canal, Melissa Wen, Tvrtko Ursulin, David Airlie,
Simona Vetter
Cc: kernel-dev, dri-devel
El dom, 31-05-2026 a las 17:18 -0300, Maíra Canal escribió:
(...)
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 1ee3c038d5f6..9487ab7acd03 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -137,7 +137,8 @@ v3d_reset(struct v3d_dev *v3d)
> v3d_mmu_set_page_table(v3d);
> v3d_irq_reset(v3d);
>
> - v3d_perfmon_stop(v3d, v3d->active_perfmon, false);
> + /* Re-arm the global perfmon HW counters that the reset
> zeroed. */
> + v3d_perfmon_resume(v3d);
What would happen if the reset happens when a non-global perfmon is
active?
Iago
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v2 2/4] drm/v3d: Refactor perfmon locking
2026-06-01 11:52 ` Iago Toral
@ 2026-06-01 12:03 ` Maíra Canal
0 siblings, 0 replies; 15+ messages in thread
From: Maíra Canal @ 2026-06-01 12:03 UTC (permalink / raw)
To: Iago Toral, Melissa Wen, Tvrtko Ursulin, David Airlie,
Simona Vetter
Cc: kernel-dev, dri-devel
Hi Iago,
Thanks for your review!
On 6/1/26 08:52, Iago Toral wrote:
> El dom, 31-05-2026 a las 17:18 -0300, Maíra Canal escribió:
>
> (...)
>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
>> b/drivers/gpu/drm/v3d/v3d_gem.c
>> index 1ee3c038d5f6..9487ab7acd03 100644
>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>> @@ -137,7 +137,8 @@ v3d_reset(struct v3d_dev *v3d)
>> v3d_mmu_set_page_table(v3d);
>> v3d_irq_reset(v3d);
>>
>> - v3d_perfmon_stop(v3d, v3d->active_perfmon, false);
>> + /* Re-arm the global perfmon HW counters that the reset
>> zeroed. */
>> + v3d_perfmon_resume(v3d);
>
> What would happen if the reset happens when a non-global perfmon is
> active?
>
Nothing. In v3d_gpu_reset_for_timeout(), before calling v3d_reset(), we
stop job->perfmon and set v3d->perfmon_state.active = NULL.
@@ -711,6 +693,8 @@ v3d_gpu_reset_for_timeout(struct v3d_dev *v3d,
struct drm_sched_job *sched_job,
if (sched_job)
drm_sched_increase_karma(sched_job);
+ v3d_perfmon_stop(v3d, job->perfmon, false);
+
/* get the GPU back into the init state */
v3d_reset(v3d);
Best regards,
- Maíra
> Iago
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Claude review: drm/v3d: Refactor perfmon locking
2026-05-31 20:18 ` [PATCH v2 2/4] drm/v3d: Refactor perfmon locking Maíra Canal
2026-06-01 11:52 ` Iago Toral
@ 2026-06-04 4:44 ` Claude Code Review Bot
1 sibling, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 4:44 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Status**: Generally good, a couple of observations.
This is the core patch. It replaces the per-perfmon mutex with a device-wide `perfmon_state.lock` spinlock, which is the correct granularity for protecting a device-wide singleton (`active_perfmon`). The spinlock also enables stopping the perfmon from IRQ context (the job-completion IRQ handler), which is the natural place to do it.
**Positive design decisions:**
1. Moving perfmon stop into `v3d_irq_signal_fence()` is correct — this is the natural job-completion boundary and solves the "perfmon never stopped if nothing queued behind" problem.
2. The suspend/resume split (`v3d_perfmon_suspend` / `v3d_perfmon_resume`) that captures HW counters without clearing `perfmon_state.active` is clean. This means the global perfmon survives power transitions correctly.
3. The `v3d_perfmon_get_values_ioctl()` change to snapshot values under the spinlock into a stack-local `values[]` array before `copy_to_user()` is correct — you can't hold a spinlock across `copy_to_user()`.
**Observations:**
1. **`v3d_perfmon_start()` calls `pm_runtime_get_if_active()` under spinlock_irqsave**:
```c
void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
{
guard(spinlock_irqsave)(&v3d->perfmon_state.lock);
...
if (!pm_runtime_get_if_active(v3d->drm.dev))
return;
v3d_perfmon_hw_start(v3d, perfmon);
v3d->perfmon_state.active = perfmon;
v3d_pm_runtime_put(v3d);
}
```
`pm_runtime_get_if_active()` is documented as safe to call from atomic context (it uses `spin_lock_irqsave` internally on the power domain lock), so this should be fine. `v3d_pm_runtime_put()` likely wraps `pm_runtime_put_autosuspend()` which is also atomic-safe. Just worth confirming the `v3d_pm_runtime_put` wrapper doesn't do anything sleepable.
2. **`v3d_perfmon_start()` silently returns when `v3d->global_perfmon` is set**:
```c
if (!perfmon || v3d->global_perfmon)
return;
```
This is correct for per-job perfmons (they shouldn't override the global one), but it means the `v3d_perfmon_start()` call from `run_job()` in the scheduler becomes a no-op when a global perfmon is active. The global perfmon is started in `v3d_perfmon_set_global_ioctl()` instead. This is a cleaner separation of concerns.
3. **`v3d_reset()` now calls `v3d_perfmon_resume()` instead of `v3d_perfmon_stop()`**:
```c
/* Re-arm the global perfmon HW counters that the reset zeroed. */
v3d_perfmon_resume(v3d);
```
This makes sense for the global perfmon case — a reset zeros the HW, so you need to reprogram the counters. For a per-job perfmon, the stop in `v3d_gpu_reset_for_timeout()` (added just above `v3d_reset()`) already cleared `perfmon_state.active`, so `v3d_perfmon_resume()` will find `active == NULL` and be a no-op. The ordering is correct: stop perfmon -> reset HW -> resume (reprograms if global active).
4. **Return from inside `scoped_guard` in `v3d_perfmon_set_global_ioctl()`**:
```c
scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock) {
old = v3d->global_perfmon;
if (!old)
return -EINVAL;
...
}
```
The `scoped_guard` macro correctly handles early returns (the lock is released via the cleanup attribute), so this is fine. Just noting it since it's a pattern that can confuse readers.
5. **`v3d_perfmon_start()` sets `active` and calls `v3d_pm_runtime_put()` in the same spinlock section**. After `v3d_pm_runtime_put()`, a suspend could race in and call `v3d_perfmon_suspend()`, which also takes the spinlock. Since we're still holding the lock, this is safe — the suspend will spin until we release. But it means `v3d_pm_runtime_put()` must not trigger a synchronous suspend while holding the spinlock. `pm_runtime_put_autosuspend()` is deferred, so this should be fine.
**No blocking issues.**
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached
2026-05-31 20:18 [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation Maíra Canal
2026-05-31 20:18 ` [PATCH v2 1/4] drm/v3d: Fix global performance monitor reference counting Maíra Canal
2026-05-31 20:18 ` [PATCH v2 2/4] drm/v3d: Refactor perfmon locking Maíra Canal
@ 2026-05-31 20:18 ` Maíra Canal
2026-06-02 7:35 ` Iago Toral
2026-06-04 4:44 ` Claude review: " Claude Code Review Bot
2026-05-31 20:18 ` [PATCH v2 4/4] drm/v3d: Drop the queue argument from v3d_job_add_syncobjs() Maíra Canal
2026-06-04 4:44 ` Claude review: drm/v3d: Fix perfmon locking and cross-queue isolation Claude Code Review Bot
4 siblings, 2 replies; 15+ messages in thread
From: Maíra Canal @ 2026-05-31 20:18 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, David Airlie,
Simona Vetter
Cc: kernel-dev, dri-devel, Maíra Canal
A non-global perfmon is meant to count events generated by a specific
submission, but the scheduler can run jobs from different queues
concurrently on the same V3D core. Without explicit serialization, an
unrelated job running in parallel with a perfmon-carrying job pollutes
the counters and generates unusable results.
To address such issue, we must enforce cross-queue serialization when we
detect a perfmon-carrying submission. It's possible to implement
serialization by enforcing two rules:
1. A job that carries a non-global perfmon must wait for every job
currently in-flight across all HW queues to finish.
2. While a perfmon-carrying job is still in-flight, all subsequently
submitted jobs must wait for it.
Note that serialization is not needed in the global perfmon case, as the
global perfmon tracks activity from all jobs, so concurrency is desirable.
Therefore, check if serialization is needed during job submission and if
so, attach fence dependences to enforce cross-queue serialization.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/v3d_drv.h | 35 ++++++++++++++++++----
drivers/gpu/drm/v3d/v3d_gem.c | 3 ++
drivers/gpu/drm/v3d/v3d_perfmon.c | 6 ++++
drivers/gpu/drm/v3d/v3d_submit.c | 63 +++++++++++++++++++++++++++++++++++++++
4 files changed, 101 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
index 51486af68cf4..cdf4926d51f2 100644
--- a/drivers/gpu/drm/v3d/v3d_drv.h
+++ b/drivers/gpu/drm/v3d/v3d_drv.h
@@ -74,11 +74,13 @@ struct v3d_queue_state {
spinlock_t queue_lock;
};
-/* Performance monitor object. The perform lifetime is controlled by userspace
- * using perfmon related ioctls. A perfmon can be attached to a submit_cl
- * request, and when this is the case, HW perf counters will be activated just
- * before the submit_cl is submitted to the GPU and disabled when the job is
- * done. This way, only events related to a specific job will be counted.
+/* Performance monitor object
+ *
+ * The performance monitor (perfmon) lifetime is controlled by userspace using
+ * perfmon related ioctls. A perfmon can be attached to a CL or CSD submission
+ * request, and when it is, HW performance counters will be activated just
+ * before the job is submitted to the GPU and disabled when the job is done.
+ * This way, only events related to a specific submission will be counted.
*/
struct v3d_perfmon {
/* Tracks the number of users of the perfmon, when this counter reaches
@@ -167,13 +169,31 @@ struct v3d_dev {
struct v3d_queue_state queue[V3D_MAX_QUEUES];
- /* Tracks the performance monitor state. */
+ /*
+ * Tracks the performance monitor state and consistency.
+ *
+ * When a non-global perfmon is attached to a job, the scheduler must
+ * not run any other job on the HW concurrently (otherwise, the
+ * counters would be polluted by unrelated work).
+ */
struct {
/* Protects @active. */
spinlock_t lock;
/* Perfmon currently programmed in HW (or NULL if none). */
struct v3d_perfmon *active;
+
+ /* Finished fence of the most recently submitted job that
+ * opened a serialization window (i.e. a job with a non-global
+ * perfmon attached).
+ */
+ struct dma_fence *fence;
+
+ /* Finished fence of the most recently submitted job on each HW
+ * queue. Used so that a new perfmon-carrying job can depend on
+ * every job currently in-flight across all queues.
+ */
+ struct dma_fence *last_hw_fence[V3D_MAX_QUEUES];
} perfmon_state;
/* Protects bo_stats */
@@ -319,6 +339,9 @@ struct v3d_job {
struct v3d_dev *v3d;
+ /* The queue that the job was submitted on. */
+ enum v3d_queue queue;
+
/* This is the array of BOs that were looked up at the start
* of submission.
*/
diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 9487ab7acd03..0b415486f5b4 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -363,8 +363,11 @@ v3d_gem_destroy(struct drm_device *dev)
for (q = 0; q < V3D_MAX_QUEUES; q++) {
WARN_ON(v3d->queue[q].active_job);
v3d_stats_put(v3d->queue[q].stats);
+ dma_fence_put(v3d->perfmon_state.last_hw_fence[q]);
}
+ dma_fence_put(v3d->perfmon_state.fence);
+
drm_mm_takedown(&v3d->mm);
dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d->pt,
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c
index 3ad0f022753c..07dab7fb3060 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -275,6 +275,12 @@ void v3d_perfmon_start(struct v3d_dev *v3d, struct v3d_perfmon *perfmon)
if (!perfmon || v3d->global_perfmon)
return;
+ /* Cross-queue serialization should have drained any previous perfmon
+ * job before this one runs.
+ */
+ if (WARN_ON_ONCE(v3d->perfmon_state.active))
+ return;
+
if (!pm_runtime_get_if_active(v3d->drm.dev))
return;
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 4c526aafc4e0..46fc88bacc95 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -234,6 +234,7 @@ v3d_submit_add_job(struct v3d_submit *submit, void **container, size_t size,
job->v3d = v3d;
job->free = free;
job->file_priv = v3d_priv;
+ job->queue = queue;
ret = drm_sched_job_init(&job->base, &v3d_priv->sched_entity[queue],
1, v3d_priv, submit->file_priv->client_id);
@@ -293,6 +294,62 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit *submit, u32 perfmon_id)
return 0;
}
+/*
+ * Prepare fences to enforce job serialization when a perfmon is active. A job
+ * that carries a non-global perfmon must wait for every job currently in-flight
+ * across all HW queues to finish, otherwise concurrent unrelated work on the
+ * same core would pollute the performance counters. Symmetrically, while such a
+ * job is still in-flight, all subsequently submitted jobs must wait for it.
+ *
+ * We don't serialize the jobs when using a global perfmon as it's expected to
+ * track concurrent activity from all jobs.
+ */
+static int
+v3d_serialize_for_perfmon(struct v3d_job *job)
+{
+ struct v3d_dev *v3d = job->v3d;
+ bool is_global_perfmon;
+ int ret;
+
+ lockdep_assert_held(&v3d->sched_lock);
+
+ scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock)
+ is_global_perfmon = !!v3d->global_perfmon;
+
+ if (is_global_perfmon)
+ goto publish;
+
+ if (job->perfmon) {
+ for (enum v3d_queue q = 0; q < V3D_MAX_QUEUES; q++) {
+ struct dma_fence *f = v3d->perfmon_state.last_hw_fence[q];
+
+ if (!f || dma_fence_is_signaled(f))
+ continue;
+
+ ret = drm_sched_job_add_dependency(&job->base, dma_fence_get(f));
+ if (ret)
+ return ret;
+ }
+ } else if (v3d->perfmon_state.fence &&
+ !dma_fence_is_signaled(v3d->perfmon_state.fence)) {
+ ret = drm_sched_job_add_dependency(&job->base,
+ dma_fence_get(v3d->perfmon_state.fence));
+ if (ret)
+ return ret;
+ }
+
+publish:
+ dma_fence_put(v3d->perfmon_state.last_hw_fence[job->queue]);
+ v3d->perfmon_state.last_hw_fence[job->queue] = dma_fence_get(job->done_fence);
+
+ if (job->perfmon && !is_global_perfmon) {
+ dma_fence_put(v3d->perfmon_state.fence);
+ v3d->perfmon_state.fence = dma_fence_get(job->done_fence);
+ }
+
+ return 0;
+}
+
static void
v3d_submit_attach_object_fences(struct v3d_submit *submit)
{
@@ -337,6 +394,12 @@ v3d_submit_jobs(struct v3d_submit *submit)
goto err;
}
+ for (int i = 0; i < submit->job_count; i++) {
+ ret = v3d_serialize_for_perfmon(submit->jobs[i]);
+ if (ret)
+ goto err;
+ }
+
for (int i = 0; i < submit->job_count; i++)
drm_sched_entity_push_job(&submit->jobs[i]->base);
--
2.54.0
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached
2026-05-31 20:18 ` [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached Maíra Canal
@ 2026-06-02 7:35 ` Iago Toral
2026-06-02 10:49 ` Maíra Canal
2026-06-04 4:44 ` Claude review: " Claude Code Review Bot
1 sibling, 1 reply; 15+ messages in thread
From: Iago Toral @ 2026-06-02 7:35 UTC (permalink / raw)
To: Maíra Canal, Melissa Wen, Tvrtko Ursulin, David Airlie,
Simona Vetter
Cc: kernel-dev, dri-devel
Hi Maíra, this pactch looks good to me but I am wondering if we should
do anything to clean up the perfmon state when a gpu reset happens:
i.e. what would happen if a perfmon job resets? Would we continue to
schedule after-reset jobs to wait on the fence of the reset job?
Iago
El dom, 31-05-2026 a las 17:18 -0300, Maíra Canal escribió:
> A non-global perfmon is meant to count events generated by a specific
> submission, but the scheduler can run jobs from different queues
> concurrently on the same V3D core. Without explicit serialization, an
> unrelated job running in parallel with a perfmon-carrying job
> pollutes
> the counters and generates unusable results.
>
> To address such issue, we must enforce cross-queue serialization when
> we
> detect a perfmon-carrying submission. It's possible to implement
> serialization by enforcing two rules:
>
> 1. A job that carries a non-global perfmon must wait for every job
> currently in-flight across all HW queues to finish.
>
> 2. While a perfmon-carrying job is still in-flight, all
> subsequently
> submitted jobs must wait for it.
>
> Note that serialization is not needed in the global perfmon case, as
> the
> global perfmon tracks activity from all jobs, so concurrency is
> desirable.
>
> Therefore, check if serialization is needed during job submission and
> if
> so, attach fence dependences to enforce cross-queue serialization.
>
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
> drivers/gpu/drm/v3d/v3d_drv.h | 35 ++++++++++++++++++----
> drivers/gpu/drm/v3d/v3d_gem.c | 3 ++
> drivers/gpu/drm/v3d/v3d_perfmon.c | 6 ++++
> drivers/gpu/drm/v3d/v3d_submit.c | 63
> +++++++++++++++++++++++++++++++++++++++
> 4 files changed, 101 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index 51486af68cf4..cdf4926d51f2 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -74,11 +74,13 @@ struct v3d_queue_state {
> spinlock_t queue_lock;
> };
>
> -/* Performance monitor object. The perform lifetime is controlled by
> userspace
> - * using perfmon related ioctls. A perfmon can be attached to a
> submit_cl
> - * request, and when this is the case, HW perf counters will be
> activated just
> - * before the submit_cl is submitted to the GPU and disabled when
> the job is
> - * done. This way, only events related to a specific job will be
> counted.
> +/* Performance monitor object
> + *
> + * The performance monitor (perfmon) lifetime is controlled by
> userspace using
> + * perfmon related ioctls. A perfmon can be attached to a CL or CSD
> submission
> + * request, and when it is, HW performance counters will be
> activated just
> + * before the job is submitted to the GPU and disabled when the job
> is done.
> + * This way, only events related to a specific submission will be
> counted.
> */
> struct v3d_perfmon {
> /* Tracks the number of users of the perfmon, when this
> counter reaches
> @@ -167,13 +169,31 @@ struct v3d_dev {
>
> struct v3d_queue_state queue[V3D_MAX_QUEUES];
>
> - /* Tracks the performance monitor state. */
> + /*
> + * Tracks the performance monitor state and consistency.
> + *
> + * When a non-global perfmon is attached to a job, the
> scheduler must
> + * not run any other job on the HW concurrently (otherwise,
> the
> + * counters would be polluted by unrelated work).
> + */
> struct {
> /* Protects @active. */
> spinlock_t lock;
>
> /* Perfmon currently programmed in HW (or NULL if
> none). */
> struct v3d_perfmon *active;
> +
> + /* Finished fence of the most recently submitted job
> that
> + * opened a serialization window (i.e. a job with a
> non-global
> + * perfmon attached).
> + */
> + struct dma_fence *fence;
> +
> + /* Finished fence of the most recently submitted job
> on each HW
> + * queue. Used so that a new perfmon-carrying job
> can depend on
> + * every job currently in-flight across all queues.
> + */
> + struct dma_fence *last_hw_fence[V3D_MAX_QUEUES];
> } perfmon_state;
>
> /* Protects bo_stats */
> @@ -319,6 +339,9 @@ struct v3d_job {
>
> struct v3d_dev *v3d;
>
> + /* The queue that the job was submitted on. */
> + enum v3d_queue queue;
> +
> /* This is the array of BOs that were looked up at the start
> * of submission.
> */
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 9487ab7acd03..0b415486f5b4 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -363,8 +363,11 @@ v3d_gem_destroy(struct drm_device *dev)
> for (q = 0; q < V3D_MAX_QUEUES; q++) {
> WARN_ON(v3d->queue[q].active_job);
> v3d_stats_put(v3d->queue[q].stats);
> + dma_fence_put(v3d->perfmon_state.last_hw_fence[q]);
> }
>
> + dma_fence_put(v3d->perfmon_state.fence);
> +
> drm_mm_takedown(&v3d->mm);
>
> dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d-
> >pt,
> diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c
> b/drivers/gpu/drm/v3d/v3d_perfmon.c
> index 3ad0f022753c..07dab7fb3060 100644
> --- a/drivers/gpu/drm/v3d/v3d_perfmon.c
> +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
> @@ -275,6 +275,12 @@ void v3d_perfmon_start(struct v3d_dev *v3d,
> struct v3d_perfmon *perfmon)
> if (!perfmon || v3d->global_perfmon)
> return;
>
> + /* Cross-queue serialization should have drained any
> previous perfmon
> + * job before this one runs.
> + */
> + if (WARN_ON_ONCE(v3d->perfmon_state.active))
> + return;
> +
> if (!pm_runtime_get_if_active(v3d->drm.dev))
> return;
>
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> b/drivers/gpu/drm/v3d/v3d_submit.c
> index 4c526aafc4e0..46fc88bacc95 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -234,6 +234,7 @@ v3d_submit_add_job(struct v3d_submit *submit,
> void **container, size_t size,
> job->v3d = v3d;
> job->free = free;
> job->file_priv = v3d_priv;
> + job->queue = queue;
>
> ret = drm_sched_job_init(&job->base, &v3d_priv-
> >sched_entity[queue],
> 1, v3d_priv, submit->file_priv-
> >client_id);
> @@ -293,6 +294,62 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit
> *submit, u32 perfmon_id)
> return 0;
> }
>
> +/*
> + * Prepare fences to enforce job serialization when a perfmon is
> active. A job
> + * that carries a non-global perfmon must wait for every job
> currently in-flight
> + * across all HW queues to finish, otherwise concurrent unrelated
> work on the
> + * same core would pollute the performance counters. Symmetrically,
> while such a
> + * job is still in-flight, all subsequently submitted jobs must wait
> for it.
> + *
> + * We don't serialize the jobs when using a global perfmon as it's
> expected to
> + * track concurrent activity from all jobs.
> + */
> +static int
> +v3d_serialize_for_perfmon(struct v3d_job *job)
> +{
> + struct v3d_dev *v3d = job->v3d;
> + bool is_global_perfmon;
> + int ret;
> +
> + lockdep_assert_held(&v3d->sched_lock);
> +
> + scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock)
> + is_global_perfmon = !!v3d->global_perfmon;
> +
> + if (is_global_perfmon)
> + goto publish;
> +
> + if (job->perfmon) {
> + for (enum v3d_queue q = 0; q < V3D_MAX_QUEUES; q++)
> {
> + struct dma_fence *f = v3d-
> >perfmon_state.last_hw_fence[q];
> +
> + if (!f || dma_fence_is_signaled(f))
> + continue;
> +
> + ret = drm_sched_job_add_dependency(&job-
> >base, dma_fence_get(f));
> + if (ret)
> + return ret;
> + }
> + } else if (v3d->perfmon_state.fence &&
> + !dma_fence_is_signaled(v3d->perfmon_state.fence))
> {
> + ret = drm_sched_job_add_dependency(&job->base,
> +
> dma_fence_get(v3d->perfmon_state.fence));
> + if (ret)
> + return ret;
> + }
> +
> +publish:
> + dma_fence_put(v3d->perfmon_state.last_hw_fence[job->queue]);
> + v3d->perfmon_state.last_hw_fence[job->queue] =
> dma_fence_get(job->done_fence);
> +
> + if (job->perfmon && !is_global_perfmon) {
> + dma_fence_put(v3d->perfmon_state.fence);
> + v3d->perfmon_state.fence = dma_fence_get(job-
> >done_fence);
> + }
> +
> + return 0;
> +}
> +
> static void
> v3d_submit_attach_object_fences(struct v3d_submit *submit)
> {
> @@ -337,6 +394,12 @@ v3d_submit_jobs(struct v3d_submit *submit)
> goto err;
> }
>
> + for (int i = 0; i < submit->job_count; i++) {
> + ret = v3d_serialize_for_perfmon(submit->jobs[i]);
> + if (ret)
> + goto err;
> + }
> +
> for (int i = 0; i < submit->job_count; i++)
> drm_sched_entity_push_job(&submit->jobs[i]->base);
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached
2026-06-02 7:35 ` Iago Toral
@ 2026-06-02 10:49 ` Maíra Canal
2026-06-02 11:10 ` Iago Toral
0 siblings, 1 reply; 15+ messages in thread
From: Maíra Canal @ 2026-06-02 10:49 UTC (permalink / raw)
To: Iago Toral, Melissa Wen, Tvrtko Ursulin, David Airlie,
Simona Vetter
Cc: kernel-dev, dri-devel
Hi Iago,
On 02/06/26 04:35, Iago Toral wrote:
> Hi Maíra, this pactch looks good to me but I am wondering if we should
> do anything to clean up the perfmon state when a gpu reset happens:
> i.e. what would happen if a perfmon job resets? Would we continue to
> schedule after-reset jobs to wait on the fence of the reset job?
If the job resets, the design around the DRM scheduler will protect us.
The DRM scheduler guarantees that a finished fence is always signaled
eventually. On a reset, drm_sched_start() signals the pending jobs'
finished fences (with -ECANCELED for the guilty job). So if a
perfmon-carrying job is the one that times out, the done_fence we
stored for serialization still gets signaled when resubmitted (check the
first if-clause in the run_job() callbacks).
drm_sched_job_add_dependency() releases the dependent job as soon as the
fence signals, regardless of error, so any job submitted behind the
reset perfmon job is released and runs. The serialization window simply
closes when the job dies.
Best regards,
- Maíra
>
> Iago
>
> El dom, 31-05-2026 a las 17:18 -0300, Maíra Canal escribió:
>> A non-global perfmon is meant to count events generated by a specific
>> submission, but the scheduler can run jobs from different queues
>> concurrently on the same V3D core. Without explicit serialization, an
>> unrelated job running in parallel with a perfmon-carrying job
>> pollutes
>> the counters and generates unusable results.
>>
>> To address such issue, we must enforce cross-queue serialization when
>> we
>> detect a perfmon-carrying submission. It's possible to implement
>> serialization by enforcing two rules:
>>
>> 1. A job that carries a non-global perfmon must wait for every job
>> currently in-flight across all HW queues to finish.
>>
>> 2. While a perfmon-carrying job is still in-flight, all
>> subsequently
>> submitted jobs must wait for it.
>>
>> Note that serialization is not needed in the global perfmon case, as
>> the
>> global perfmon tracks activity from all jobs, so concurrency is
>> desirable.
>>
>> Therefore, check if serialization is needed during job submission and
>> if
>> so, attach fence dependences to enforce cross-queue serialization.
>>
>> Signed-off-by: Maíra Canal <mcanal@igalia.com>
>> ---
>> drivers/gpu/drm/v3d/v3d_drv.h | 35 ++++++++++++++++++----
>> drivers/gpu/drm/v3d/v3d_gem.c | 3 ++
>> drivers/gpu/drm/v3d/v3d_perfmon.c | 6 ++++
>> drivers/gpu/drm/v3d/v3d_submit.c | 63
>> +++++++++++++++++++++++++++++++++++++++
>> 4 files changed, 101 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
>> b/drivers/gpu/drm/v3d/v3d_drv.h
>> index 51486af68cf4..cdf4926d51f2 100644
>> --- a/drivers/gpu/drm/v3d/v3d_drv.h
>> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
>> @@ -74,11 +74,13 @@ struct v3d_queue_state {
>> spinlock_t queue_lock;
>> };
>>
>> -/* Performance monitor object. The perform lifetime is controlled by
>> userspace
>> - * using perfmon related ioctls. A perfmon can be attached to a
>> submit_cl
>> - * request, and when this is the case, HW perf counters will be
>> activated just
>> - * before the submit_cl is submitted to the GPU and disabled when
>> the job is
>> - * done. This way, only events related to a specific job will be
>> counted.
>> +/* Performance monitor object
>> + *
>> + * The performance monitor (perfmon) lifetime is controlled by
>> userspace using
>> + * perfmon related ioctls. A perfmon can be attached to a CL or CSD
>> submission
>> + * request, and when it is, HW performance counters will be
>> activated just
>> + * before the job is submitted to the GPU and disabled when the job
>> is done.
>> + * This way, only events related to a specific submission will be
>> counted.
>> */
>> struct v3d_perfmon {
>> /* Tracks the number of users of the perfmon, when this
>> counter reaches
>> @@ -167,13 +169,31 @@ struct v3d_dev {
>>
>> struct v3d_queue_state queue[V3D_MAX_QUEUES];
>>
>> - /* Tracks the performance monitor state. */
>> + /*
>> + * Tracks the performance monitor state and consistency.
>> + *
>> + * When a non-global perfmon is attached to a job, the
>> scheduler must
>> + * not run any other job on the HW concurrently (otherwise,
>> the
>> + * counters would be polluted by unrelated work).
>> + */
>> struct {
>> /* Protects @active. */
>> spinlock_t lock;
>>
>> /* Perfmon currently programmed in HW (or NULL if
>> none). */
>> struct v3d_perfmon *active;
>> +
>> + /* Finished fence of the most recently submitted job
>> that
>> + * opened a serialization window (i.e. a job with a
>> non-global
>> + * perfmon attached).
>> + */
>> + struct dma_fence *fence;
>> +
>> + /* Finished fence of the most recently submitted job
>> on each HW
>> + * queue. Used so that a new perfmon-carrying job
>> can depend on
>> + * every job currently in-flight across all queues.
>> + */
>> + struct dma_fence *last_hw_fence[V3D_MAX_QUEUES];
>> } perfmon_state;
>>
>> /* Protects bo_stats */
>> @@ -319,6 +339,9 @@ struct v3d_job {
>>
>> struct v3d_dev *v3d;
>>
>> + /* The queue that the job was submitted on. */
>> + enum v3d_queue queue;
>> +
>> /* This is the array of BOs that were looked up at the start
>> * of submission.
>> */
>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
>> b/drivers/gpu/drm/v3d/v3d_gem.c
>> index 9487ab7acd03..0b415486f5b4 100644
>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>> @@ -363,8 +363,11 @@ v3d_gem_destroy(struct drm_device *dev)
>> for (q = 0; q < V3D_MAX_QUEUES; q++) {
>> WARN_ON(v3d->queue[q].active_job);
>> v3d_stats_put(v3d->queue[q].stats);
>> + dma_fence_put(v3d->perfmon_state.last_hw_fence[q]);
>> }
>>
>> + dma_fence_put(v3d->perfmon_state.fence);
>> +
>> drm_mm_takedown(&v3d->mm);
>>
>> dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d-
>>> pt,
>> diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c
>> b/drivers/gpu/drm/v3d/v3d_perfmon.c
>> index 3ad0f022753c..07dab7fb3060 100644
>> --- a/drivers/gpu/drm/v3d/v3d_perfmon.c
>> +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
>> @@ -275,6 +275,12 @@ void v3d_perfmon_start(struct v3d_dev *v3d,
>> struct v3d_perfmon *perfmon)
>> if (!perfmon || v3d->global_perfmon)
>> return;
>>
>> + /* Cross-queue serialization should have drained any
>> previous perfmon
>> + * job before this one runs.
>> + */
>> + if (WARN_ON_ONCE(v3d->perfmon_state.active))
>> + return;
>> +
>> if (!pm_runtime_get_if_active(v3d->drm.dev))
>> return;
>>
>> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
>> b/drivers/gpu/drm/v3d/v3d_submit.c
>> index 4c526aafc4e0..46fc88bacc95 100644
>> --- a/drivers/gpu/drm/v3d/v3d_submit.c
>> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
>> @@ -234,6 +234,7 @@ v3d_submit_add_job(struct v3d_submit *submit,
>> void **container, size_t size,
>> job->v3d = v3d;
>> job->free = free;
>> job->file_priv = v3d_priv;
>> + job->queue = queue;
>>
>> ret = drm_sched_job_init(&job->base, &v3d_priv-
>>> sched_entity[queue],
>> 1, v3d_priv, submit->file_priv-
>>> client_id);
>> @@ -293,6 +294,62 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit
>> *submit, u32 perfmon_id)
>> return 0;
>> }
>>
>> +/*
>> + * Prepare fences to enforce job serialization when a perfmon is
>> active. A job
>> + * that carries a non-global perfmon must wait for every job
>> currently in-flight
>> + * across all HW queues to finish, otherwise concurrent unrelated
>> work on the
>> + * same core would pollute the performance counters. Symmetrically,
>> while such a
>> + * job is still in-flight, all subsequently submitted jobs must wait
>> for it.
>> + *
>> + * We don't serialize the jobs when using a global perfmon as it's
>> expected to
>> + * track concurrent activity from all jobs.
>> + */
>> +static int
>> +v3d_serialize_for_perfmon(struct v3d_job *job)
>> +{
>> + struct v3d_dev *v3d = job->v3d;
>> + bool is_global_perfmon;
>> + int ret;
>> +
>> + lockdep_assert_held(&v3d->sched_lock);
>> +
>> + scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock)
>> + is_global_perfmon = !!v3d->global_perfmon;
>> +
>> + if (is_global_perfmon)
>> + goto publish;
>> +
>> + if (job->perfmon) {
>> + for (enum v3d_queue q = 0; q < V3D_MAX_QUEUES; q++)
>> {
>> + struct dma_fence *f = v3d-
>>> perfmon_state.last_hw_fence[q];
>> +
>> + if (!f || dma_fence_is_signaled(f))
>> + continue;
>> +
>> + ret = drm_sched_job_add_dependency(&job-
>>> base, dma_fence_get(f));
>> + if (ret)
>> + return ret;
>> + }
>> + } else if (v3d->perfmon_state.fence &&
>> + !dma_fence_is_signaled(v3d->perfmon_state.fence))
>> {
>> + ret = drm_sched_job_add_dependency(&job->base,
>> +
>> dma_fence_get(v3d->perfmon_state.fence));
>> + if (ret)
>> + return ret;
>> + }
>> +
>> +publish:
>> + dma_fence_put(v3d->perfmon_state.last_hw_fence[job->queue]);
>> + v3d->perfmon_state.last_hw_fence[job->queue] =
>> dma_fence_get(job->done_fence);
>> +
>> + if (job->perfmon && !is_global_perfmon) {
>> + dma_fence_put(v3d->perfmon_state.fence);
>> + v3d->perfmon_state.fence = dma_fence_get(job-
>>> done_fence);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static void
>> v3d_submit_attach_object_fences(struct v3d_submit *submit)
>> {
>> @@ -337,6 +394,12 @@ v3d_submit_jobs(struct v3d_submit *submit)
>> goto err;
>> }
>>
>> + for (int i = 0; i < submit->job_count; i++) {
>> + ret = v3d_serialize_for_perfmon(submit->jobs[i]);
>> + if (ret)
>> + goto err;
>> + }
>> +
>> for (int i = 0; i < submit->job_count; i++)
>> drm_sched_entity_push_job(&submit->jobs[i]->base);
>>
>>
>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached
2026-06-02 10:49 ` Maíra Canal
@ 2026-06-02 11:10 ` Iago Toral
0 siblings, 0 replies; 15+ messages in thread
From: Iago Toral @ 2026-06-02 11:10 UTC (permalink / raw)
To: Maíra Canal, Melissa Wen, Tvrtko Ursulin, David Airlie,
Simona Vetter
Cc: kernel-dev, dri-devel
Hi Maíra,
thanks for clarifying this. For the series:
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Iago
El mar, 02-06-2026 a las 07:49 -0300, Maíra Canal escribió:
> Hi Iago,
>
> On 02/06/26 04:35, Iago Toral wrote:
> > Hi Maíra, this pactch looks good to me but I am wondering if we
> > should
> > do anything to clean up the perfmon state when a gpu reset happens:
> > i.e. what would happen if a perfmon job resets? Would we continue
> > to
> > schedule after-reset jobs to wait on the fence of the reset job?
>
> If the job resets, the design around the DRM scheduler will protect
> us.
>
> The DRM scheduler guarantees that a finished fence is always signaled
> eventually. On a reset, drm_sched_start() signals the pending jobs'
> finished fences (with -ECANCELED for the guilty job). So if a
> perfmon-carrying job is the one that times out, the done_fence we
> stored for serialization still gets signaled when resubmitted (check
> the
> first if-clause in the run_job() callbacks).
>
> drm_sched_job_add_dependency() releases the dependent job as soon as
> the
> fence signals, regardless of error, so any job submitted behind the
> reset perfmon job is released and runs. The serialization window
> simply
> closes when the job dies.
>
> Best regards,
> - Maíra
>
> >
> > Iago
> >
> > El dom, 31-05-2026 a las 17:18 -0300, Maíra Canal escribió:
> > > A non-global perfmon is meant to count events generated by a
> > > specific
> > > submission, but the scheduler can run jobs from different queues
> > > concurrently on the same V3D core. Without explicit
> > > serialization, an
> > > unrelated job running in parallel with a perfmon-carrying job
> > > pollutes
> > > the counters and generates unusable results.
> > >
> > > To address such issue, we must enforce cross-queue serialization
> > > when
> > > we
> > > detect a perfmon-carrying submission. It's possible to implement
> > > serialization by enforcing two rules:
> > >
> > > 1. A job that carries a non-global perfmon must wait for every
> > > job
> > > currently in-flight across all HW queues to finish.
> > >
> > > 2. While a perfmon-carrying job is still in-flight, all
> > > subsequently
> > > submitted jobs must wait for it.
> > >
> > > Note that serialization is not needed in the global perfmon case,
> > > as
> > > the
> > > global perfmon tracks activity from all jobs, so concurrency is
> > > desirable.
> > >
> > > Therefore, check if serialization is needed during job submission
> > > and
> > > if
> > > so, attach fence dependences to enforce cross-queue
> > > serialization.
> > >
> > > Signed-off-by: Maíra Canal <mcanal@igalia.com>
> > > ---
> > > drivers/gpu/drm/v3d/v3d_drv.h | 35 ++++++++++++++++++----
> > > drivers/gpu/drm/v3d/v3d_gem.c | 3 ++
> > > drivers/gpu/drm/v3d/v3d_perfmon.c | 6 ++++
> > > drivers/gpu/drm/v3d/v3d_submit.c | 63
> > > +++++++++++++++++++++++++++++++++++++++
> > > 4 files changed, 101 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> > > b/drivers/gpu/drm/v3d/v3d_drv.h
> > > index 51486af68cf4..cdf4926d51f2 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_drv.h
> > > +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> > > @@ -74,11 +74,13 @@ struct v3d_queue_state {
> > > spinlock_t queue_lock;
> > > };
> > >
> > > -/* Performance monitor object. The perform lifetime is
> > > controlled by
> > > userspace
> > > - * using perfmon related ioctls. A perfmon can be attached to a
> > > submit_cl
> > > - * request, and when this is the case, HW perf counters will be
> > > activated just
> > > - * before the submit_cl is submitted to the GPU and disabled
> > > when
> > > the job is
> > > - * done. This way, only events related to a specific job will be
> > > counted.
> > > +/* Performance monitor object
> > > + *
> > > + * The performance monitor (perfmon) lifetime is controlled by
> > > userspace using
> > > + * perfmon related ioctls. A perfmon can be attached to a CL or
> > > CSD
> > > submission
> > > + * request, and when it is, HW performance counters will be
> > > activated just
> > > + * before the job is submitted to the GPU and disabled when the
> > > job
> > > is done.
> > > + * This way, only events related to a specific submission will
> > > be
> > > counted.
> > > */
> > > struct v3d_perfmon {
> > > /* Tracks the number of users of the perfmon, when this
> > > counter reaches
> > > @@ -167,13 +169,31 @@ struct v3d_dev {
> > >
> > > struct v3d_queue_state queue[V3D_MAX_QUEUES];
> > >
> > > - /* Tracks the performance monitor state. */
> > > + /*
> > > + * Tracks the performance monitor state and consistency.
> > > + *
> > > + * When a non-global perfmon is attached to a job, the
> > > scheduler must
> > > + * not run any other job on the HW concurrently
> > > (otherwise,
> > > the
> > > + * counters would be polluted by unrelated work).
> > > + */
> > > struct {
> > > /* Protects @active. */
> > > spinlock_t lock;
> > >
> > > /* Perfmon currently programmed in HW (or NULL
> > > if
> > > none). */
> > > struct v3d_perfmon *active;
> > > +
> > > + /* Finished fence of the most recently submitted
> > > job
> > > that
> > > + * opened a serialization window (i.e. a job
> > > with a
> > > non-global
> > > + * perfmon attached).
> > > + */
> > > + struct dma_fence *fence;
> > > +
> > > + /* Finished fence of the most recently submitted
> > > job
> > > on each HW
> > > + * queue. Used so that a new perfmon-carrying
> > > job
> > > can depend on
> > > + * every job currently in-flight across all
> > > queues.
> > > + */
> > > + struct dma_fence *last_hw_fence[V3D_MAX_QUEUES];
> > > } perfmon_state;
> > >
> > > /* Protects bo_stats */
> > > @@ -319,6 +339,9 @@ struct v3d_job {
> > >
> > > struct v3d_dev *v3d;
> > >
> > > + /* The queue that the job was submitted on. */
> > > + enum v3d_queue queue;
> > > +
> > > /* This is the array of BOs that were looked up at the
> > > start
> > > * of submission.
> > > */
> > > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> > > b/drivers/gpu/drm/v3d/v3d_gem.c
> > > index 9487ab7acd03..0b415486f5b4 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_gem.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> > > @@ -363,8 +363,11 @@ v3d_gem_destroy(struct drm_device *dev)
> > > for (q = 0; q < V3D_MAX_QUEUES; q++) {
> > > WARN_ON(v3d->queue[q].active_job);
> > > v3d_stats_put(v3d->queue[q].stats);
> > > + dma_fence_put(v3d-
> > > >perfmon_state.last_hw_fence[q]);
> > > }
> > >
> > > + dma_fence_put(v3d->perfmon_state.fence);
> > > +
> > > drm_mm_takedown(&v3d->mm);
> > >
> > > dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void
> > > *)v3d-
> > > > pt,
> > > diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c
> > > b/drivers/gpu/drm/v3d/v3d_perfmon.c
> > > index 3ad0f022753c..07dab7fb3060 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_perfmon.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
> > > @@ -275,6 +275,12 @@ void v3d_perfmon_start(struct v3d_dev *v3d,
> > > struct v3d_perfmon *perfmon)
> > > if (!perfmon || v3d->global_perfmon)
> > > return;
> > >
> > > + /* Cross-queue serialization should have drained any
> > > previous perfmon
> > > + * job before this one runs.
> > > + */
> > > + if (WARN_ON_ONCE(v3d->perfmon_state.active))
> > > + return;
> > > +
> > > if (!pm_runtime_get_if_active(v3d->drm.dev))
> > > return;
> > >
> > > diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> > > b/drivers/gpu/drm/v3d/v3d_submit.c
> > > index 4c526aafc4e0..46fc88bacc95 100644
> > > --- a/drivers/gpu/drm/v3d/v3d_submit.c
> > > +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> > > @@ -234,6 +234,7 @@ v3d_submit_add_job(struct v3d_submit *submit,
> > > void **container, size_t size,
> > > job->v3d = v3d;
> > > job->free = free;
> > > job->file_priv = v3d_priv;
> > > + job->queue = queue;
> > >
> > > ret = drm_sched_job_init(&job->base, &v3d_priv-
> > > > sched_entity[queue],
> > > 1, v3d_priv, submit->file_priv-
> > > > client_id);
> > > @@ -293,6 +294,62 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit
> > > *submit, u32 perfmon_id)
> > > return 0;
> > > }
> > >
> > > +/*
> > > + * Prepare fences to enforce job serialization when a perfmon is
> > > active. A job
> > > + * that carries a non-global perfmon must wait for every job
> > > currently in-flight
> > > + * across all HW queues to finish, otherwise concurrent
> > > unrelated
> > > work on the
> > > + * same core would pollute the performance counters.
> > > Symmetrically,
> > > while such a
> > > + * job is still in-flight, all subsequently submitted jobs must
> > > wait
> > > for it.
> > > + *
> > > + * We don't serialize the jobs when using a global perfmon as
> > > it's
> > > expected to
> > > + * track concurrent activity from all jobs.
> > > + */
> > > +static int
> > > +v3d_serialize_for_perfmon(struct v3d_job *job)
> > > +{
> > > + struct v3d_dev *v3d = job->v3d;
> > > + bool is_global_perfmon;
> > > + int ret;
> > > +
> > > + lockdep_assert_held(&v3d->sched_lock);
> > > +
> > > + scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock)
> > > + is_global_perfmon = !!v3d->global_perfmon;
> > > +
> > > + if (is_global_perfmon)
> > > + goto publish;
> > > +
> > > + if (job->perfmon) {
> > > + for (enum v3d_queue q = 0; q < V3D_MAX_QUEUES;
> > > q++)
> > > {
> > > + struct dma_fence *f = v3d-
> > > > perfmon_state.last_hw_fence[q];
> > > +
> > > + if (!f || dma_fence_is_signaled(f))
> > > + continue;
> > > +
> > > + ret = drm_sched_job_add_dependency(&job-
> > > > base, dma_fence_get(f));
> > > + if (ret)
> > > + return ret;
> > > + }
> > > + } else if (v3d->perfmon_state.fence &&
> > > + !dma_fence_is_signaled(v3d-
> > > >perfmon_state.fence))
> > > {
> > > + ret = drm_sched_job_add_dependency(&job->base,
> > > +
> > > dma_fence_get(v3d->perfmon_state.fence));
> > > + if (ret)
> > > + return ret;
> > > + }
> > > +
> > > +publish:
> > > + dma_fence_put(v3d->perfmon_state.last_hw_fence[job-
> > > >queue]);
> > > + v3d->perfmon_state.last_hw_fence[job->queue] =
> > > dma_fence_get(job->done_fence);
> > > +
> > > + if (job->perfmon && !is_global_perfmon) {
> > > + dma_fence_put(v3d->perfmon_state.fence);
> > > + v3d->perfmon_state.fence = dma_fence_get(job-
> > > > done_fence);
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > static void
> > > v3d_submit_attach_object_fences(struct v3d_submit *submit)
> > > {
> > > @@ -337,6 +394,12 @@ v3d_submit_jobs(struct v3d_submit *submit)
> > > goto err;
> > > }
> > >
> > > + for (int i = 0; i < submit->job_count; i++) {
> > > + ret = v3d_serialize_for_perfmon(submit-
> > > >jobs[i]);
> > > + if (ret)
> > > + goto err;
> > > + }
> > > +
> > > for (int i = 0; i < submit->job_count; i++)
> > > drm_sched_entity_push_job(&submit->jobs[i]-
> > > >base);
> > >
> > >
> >
>
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Claude review: drm/v3d: Serialize jobs across queues when a perfmon is attached
2026-05-31 20:18 ` [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached Maíra Canal
2026-06-02 7:35 ` Iago Toral
@ 2026-06-04 4:44 ` Claude Code Review Bot
1 sibling, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 4:44 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Status**: The design is sound but there's a subtle concern worth discussing.
The approach is straightforward: when a non-global perfmon-carrying job is submitted, it depends on all in-flight jobs across all queues (via `last_hw_fence[]`). Conversely, when a non-perfmon job is submitted while a perfmon job is in-flight, it depends on `perfmon_state.fence`.
**The `v3d_serialize_for_perfmon()` function:**
```c
static int
v3d_serialize_for_perfmon(struct v3d_job *job)
{
...
lockdep_assert_held(&v3d->sched_lock);
scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock)
is_global_perfmon = !!v3d->global_perfmon;
if (is_global_perfmon)
goto publish;
...
```
**Observation 1 — TOCTOU on `global_perfmon` check**: The `is_global_perfmon` flag is read under the perfmon spinlock, but after dropping it, the global perfmon state could change before we act on it. However, `sched_lock` is held, which serializes all submissions. The global perfmon can only be set/cleared by the SET_GLOBAL ioctl, which doesn't hold `sched_lock`. So there's a theoretical window where:
- Thread A: holds `sched_lock`, reads `is_global_perfmon = false`, proceeds to add fence dependencies
- Thread B: concurrently calls SET_GLOBAL, installs a global perfmon
This seems low-risk in practice (the global perfmon is a privileged operation and the window is tiny), and the `sched_lock` assertion confirms submission serialization. But it's worth considering whether the SET_GLOBAL ioctl should also acquire `sched_lock` to close this gap. Given that the global perfmon case doesn't need serialization (concurrency is desired), the worst case is unnecessary serialization of one submission, not a correctness bug.
**Observation 2 — `last_hw_fence` update always happens (the `publish` label)**:
```c
publish:
dma_fence_put(v3d->perfmon_state.last_hw_fence[job->queue]);
v3d->perfmon_state.last_hw_fence[job->queue] = dma_fence_get(job->done_fence);
if (job->perfmon && !is_global_perfmon) {
dma_fence_put(v3d->perfmon_state.fence);
v3d->perfmon_state.fence = dma_fence_get(job->done_fence);
}
```
This correctly updates `last_hw_fence` for *every* job (needed so future perfmon jobs can wait on all in-flight work). The `perfmon_state.fence` is only set for perfmon-carrying jobs. This is sound.
**Observation 3 — `done_fence` vs `irq_fence`**: The code uses `job->done_fence` for the serialization fences. This is the scheduler's finished fence, which signals when the job completes. This is the correct fence to use (rather than `irq_fence` which is the HW interrupt fence).
**Observation 4 — WARN_ON_ONCE in `v3d_perfmon_start()`**:
```c
if (WARN_ON_ONCE(v3d->perfmon_state.active))
return;
```
Good defensive check. With cross-queue serialization working correctly, there should never be an active perfmon when a new perfmon-carrying job starts. If this fires, it means serialization failed.
**Observation 5 — Cleanup in `v3d_gem_destroy()`**:
```c
for (q = 0; q < V3D_MAX_QUEUES; q++) {
WARN_ON(v3d->queue[q].active_job);
v3d_stats_put(v3d->queue[q].stats);
dma_fence_put(v3d->perfmon_state.last_hw_fence[q]);
}
dma_fence_put(v3d->perfmon_state.fence);
```
Correct cleanup of the new fence references.
**No blocking issues.**
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2 4/4] drm/v3d: Drop the queue argument from v3d_job_add_syncobjs()
2026-05-31 20:18 [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation Maíra Canal
` (2 preceding siblings ...)
2026-05-31 20:18 ` [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached Maíra Canal
@ 2026-05-31 20:18 ` Maíra Canal
2026-06-04 4:44 ` Claude review: " Claude Code Review Bot
2026-06-04 4:44 ` Claude review: drm/v3d: Fix perfmon locking and cross-queue isolation Claude Code Review Bot
4 siblings, 1 reply; 15+ messages in thread
From: Maíra Canal @ 2026-05-31 20:18 UTC (permalink / raw)
To: Melissa Wen, Iago Toral, Tvrtko Ursulin, David Airlie,
Simona Vetter
Cc: kernel-dev, dri-devel, Maíra Canal
The struct v3d_job now carries its submission queue. The explicit queue
parameter passed alongside the job is therefore redundant, as it can be
read through job->queue directly.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
drivers/gpu/drm/v3d/v3d_submit.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_submit.c b/drivers/gpu/drm/v3d/v3d_submit.c
index 46fc88bacc95..bfd99935c2ef 100644
--- a/drivers/gpu/drm/v3d/v3d_submit.c
+++ b/drivers/gpu/drm/v3d/v3d_submit.c
@@ -179,7 +179,7 @@ void v3d_job_put(struct v3d_job *job)
static int
v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
- u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
+ u32 in_sync, struct v3d_submit_ext *se)
{
bool has_multisync = se && (se->flags & DRM_V3D_EXT_ID_MULTI_SYNC);
struct v3d_dev *v3d = job->v3d;
@@ -193,7 +193,7 @@ v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
return ret;
}
- if (se->in_sync_count && se->wait_stage == queue) {
+ if (se->in_sync_count && se->wait_stage == job->queue) {
struct drm_v3d_sem __user *handle = u64_to_user_ptr(se->in_syncs);
for (int i = 0; i < se->in_sync_count; i++) {
@@ -481,8 +481,7 @@ v3d_setup_csd_jobs_and_bos(struct v3d_submit *submit,
if (ret)
return ret;
- ret = v3d_job_add_syncobjs(&job->base, submit->file_priv, args->in_sync,
- se, V3D_CSD);
+ ret = v3d_job_add_syncobjs(&job->base, submit->file_priv, args->in_sync, se);
if (ret)
return ret;
@@ -1112,7 +1111,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
bin->qts = args->qts;
ret = v3d_job_add_syncobjs(&bin->base, file_priv,
- args->in_sync_bcl, &se, V3D_BIN);
+ args->in_sync_bcl, &se);
if (ret)
goto fail;
}
@@ -1129,8 +1128,7 @@ v3d_submit_cl_ioctl(struct drm_device *dev, void *data,
if (bin)
bin->render = render;
- ret = v3d_job_add_syncobjs(&render->base, file_priv, args->in_sync_rcl,
- &se, V3D_RENDER);
+ ret = v3d_job_add_syncobjs(&render->base, file_priv, args->in_sync_rcl, &se);
if (ret)
goto fail;
@@ -1221,7 +1219,7 @@ v3d_submit_tfu_ioctl(struct drm_device *dev, void *data,
if (ret)
goto fail;
- ret = v3d_job_add_syncobjs(&job->base, file_priv, args->in_sync, &se, V3D_TFU);
+ ret = v3d_job_add_syncobjs(&job->base, file_priv, args->in_sync, &se);
if (ret)
goto fail;
@@ -1413,7 +1411,7 @@ v3d_submit_cpu_ioctl(struct drm_device *dev, void *data,
trace_v3d_submit_cpu_ioctl(dev, cpu_job->job_type);
- ret = v3d_job_add_syncobjs(&cpu_job->base, file_priv, 0, &se, V3D_CPU);
+ ret = v3d_job_add_syncobjs(&cpu_job->base, file_priv, 0, &se);
if (ret)
goto fail;
--
2.54.0
^ permalink raw reply related [flat|nested] 15+ messages in thread* Claude review: drm/v3d: Drop the queue argument from v3d_job_add_syncobjs()
2026-05-31 20:18 ` [PATCH v2 4/4] drm/v3d: Drop the queue argument from v3d_job_add_syncobjs() Maíra Canal
@ 2026-06-04 4:44 ` Claude Code Review Bot
0 siblings, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 4:44 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Status**: Clean mechanical cleanup, no issues.
Now that `job->queue` exists (introduced in patch 3), the explicit `enum v3d_queue queue` parameter to `v3d_job_add_syncobjs()` is redundant. The patch simply removes it and uses `job->queue` instead.
```c
-v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
- u32 in_sync, struct v3d_submit_ext *se, enum v3d_queue queue)
+v3d_job_add_syncobjs(struct v3d_job *job, struct drm_file *file_priv,
+ u32 in_sync, struct v3d_submit_ext *se)
```
All call sites updated consistently. **No issues found.**
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 15+ messages in thread
* Claude review: drm/v3d: Fix perfmon locking and cross-queue isolation
2026-05-31 20:18 [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation Maíra Canal
` (3 preceding siblings ...)
2026-05-31 20:18 ` [PATCH v2 4/4] drm/v3d: Drop the queue argument from v3d_job_add_syncobjs() Maíra Canal
@ 2026-06-04 4:44 ` Claude Code Review Bot
4 siblings, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 4:44 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: drm/v3d: Fix perfmon locking and cross-queue isolation
Author: =?utf-8?q?Ma=C3=ADra_Canal?= <mcanal@igalia.com>
Patches: 10
Reviewed: 2026-06-04T14:44:42.323803
---
This is a well-structured 4-patch series from Maíra Canal that addresses three real, long-standing correctness issues in v3d's performance monitor subsystem: (1) lock-free access to `active_perfmon`, (2) perfmons never being stopped after job completion, and (3) cross-queue counter pollution. The cover letter clearly describes the problems and the approach.
The series is logically decomposed: patch 1 is a self-contained stable fix for refcount leaks, patch 2 is the major locking refactor, patch 3 adds cross-queue serialization via fence dependencies, and patch 4 is a trivial cleanup. The dependency on a prerequisite series (the submission refactoring) introduces a `v3d_submit` abstraction used in patches 3-4 that isn't present in drm-next today, but the patches are internally consistent.
**Overall verdict**: Good series. A few points worth discussing, mainly around locking subtleties in patch 2 and a potential race window in patch 3, but nothing blocking.
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 15+ messages in thread