From: Maíra Canal <mcanal@igalia.com>
To: Iago Toral <itoral@igalia.com>, Melissa Wen <mwen@igalia.com>,
Tvrtko Ursulin <tvrtko.ursulin@igalia.com>,
David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>
Cc: kernel-dev@igalia.com, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached
Date: Tue, 2 Jun 2026 07:49:12 -0300 [thread overview]
Message-ID: <44cbf6ec-3134-44d5-b37c-0c99ac55f678@igalia.com> (raw)
In-Reply-To: <7a9d503de81b403585962f1c25dd0b506a6eb3f2.camel@igalia.com>
Hi Iago,
On 02/06/26 04:35, Iago Toral wrote:
> Hi Maíra, this pactch looks good to me but I am wondering if we should
> do anything to clean up the perfmon state when a gpu reset happens:
> i.e. what would happen if a perfmon job resets? Would we continue to
> schedule after-reset jobs to wait on the fence of the reset job?
If the job resets, the design around the DRM scheduler will protect us.
The DRM scheduler guarantees that a finished fence is always signaled
eventually. On a reset, drm_sched_start() signals the pending jobs'
finished fences (with -ECANCELED for the guilty job). So if a
perfmon-carrying job is the one that times out, the done_fence we
stored for serialization still gets signaled when resubmitted (check the
first if-clause in the run_job() callbacks).
drm_sched_job_add_dependency() releases the dependent job as soon as the
fence signals, regardless of error, so any job submitted behind the
reset perfmon job is released and runs. The serialization window simply
closes when the job dies.
Best regards,
- Maíra
>
> Iago
>
> El dom, 31-05-2026 a las 17:18 -0300, Maíra Canal escribió:
>> A non-global perfmon is meant to count events generated by a specific
>> submission, but the scheduler can run jobs from different queues
>> concurrently on the same V3D core. Without explicit serialization, an
>> unrelated job running in parallel with a perfmon-carrying job
>> pollutes
>> the counters and generates unusable results.
>>
>> To address such issue, we must enforce cross-queue serialization when
>> we
>> detect a perfmon-carrying submission. It's possible to implement
>> serialization by enforcing two rules:
>>
>> 1. A job that carries a non-global perfmon must wait for every job
>> currently in-flight across all HW queues to finish.
>>
>> 2. While a perfmon-carrying job is still in-flight, all
>> subsequently
>> submitted jobs must wait for it.
>>
>> Note that serialization is not needed in the global perfmon case, as
>> the
>> global perfmon tracks activity from all jobs, so concurrency is
>> desirable.
>>
>> Therefore, check if serialization is needed during job submission and
>> if
>> so, attach fence dependences to enforce cross-queue serialization.
>>
>> Signed-off-by: Maíra Canal <mcanal@igalia.com>
>> ---
>> drivers/gpu/drm/v3d/v3d_drv.h | 35 ++++++++++++++++++----
>> drivers/gpu/drm/v3d/v3d_gem.c | 3 ++
>> drivers/gpu/drm/v3d/v3d_perfmon.c | 6 ++++
>> drivers/gpu/drm/v3d/v3d_submit.c | 63
>> +++++++++++++++++++++++++++++++++++++++
>> 4 files changed, 101 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
>> b/drivers/gpu/drm/v3d/v3d_drv.h
>> index 51486af68cf4..cdf4926d51f2 100644
>> --- a/drivers/gpu/drm/v3d/v3d_drv.h
>> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
>> @@ -74,11 +74,13 @@ struct v3d_queue_state {
>> spinlock_t queue_lock;
>> };
>>
>> -/* Performance monitor object. The perform lifetime is controlled by
>> userspace
>> - * using perfmon related ioctls. A perfmon can be attached to a
>> submit_cl
>> - * request, and when this is the case, HW perf counters will be
>> activated just
>> - * before the submit_cl is submitted to the GPU and disabled when
>> the job is
>> - * done. This way, only events related to a specific job will be
>> counted.
>> +/* Performance monitor object
>> + *
>> + * The performance monitor (perfmon) lifetime is controlled by
>> userspace using
>> + * perfmon related ioctls. A perfmon can be attached to a CL or CSD
>> submission
>> + * request, and when it is, HW performance counters will be
>> activated just
>> + * before the job is submitted to the GPU and disabled when the job
>> is done.
>> + * This way, only events related to a specific submission will be
>> counted.
>> */
>> struct v3d_perfmon {
>> /* Tracks the number of users of the perfmon, when this
>> counter reaches
>> @@ -167,13 +169,31 @@ struct v3d_dev {
>>
>> struct v3d_queue_state queue[V3D_MAX_QUEUES];
>>
>> - /* Tracks the performance monitor state. */
>> + /*
>> + * Tracks the performance monitor state and consistency.
>> + *
>> + * When a non-global perfmon is attached to a job, the
>> scheduler must
>> + * not run any other job on the HW concurrently (otherwise,
>> the
>> + * counters would be polluted by unrelated work).
>> + */
>> struct {
>> /* Protects @active. */
>> spinlock_t lock;
>>
>> /* Perfmon currently programmed in HW (or NULL if
>> none). */
>> struct v3d_perfmon *active;
>> +
>> + /* Finished fence of the most recently submitted job
>> that
>> + * opened a serialization window (i.e. a job with a
>> non-global
>> + * perfmon attached).
>> + */
>> + struct dma_fence *fence;
>> +
>> + /* Finished fence of the most recently submitted job
>> on each HW
>> + * queue. Used so that a new perfmon-carrying job
>> can depend on
>> + * every job currently in-flight across all queues.
>> + */
>> + struct dma_fence *last_hw_fence[V3D_MAX_QUEUES];
>> } perfmon_state;
>>
>> /* Protects bo_stats */
>> @@ -319,6 +339,9 @@ struct v3d_job {
>>
>> struct v3d_dev *v3d;
>>
>> + /* The queue that the job was submitted on. */
>> + enum v3d_queue queue;
>> +
>> /* This is the array of BOs that were looked up at the start
>> * of submission.
>> */
>> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
>> b/drivers/gpu/drm/v3d/v3d_gem.c
>> index 9487ab7acd03..0b415486f5b4 100644
>> --- a/drivers/gpu/drm/v3d/v3d_gem.c
>> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
>> @@ -363,8 +363,11 @@ v3d_gem_destroy(struct drm_device *dev)
>> for (q = 0; q < V3D_MAX_QUEUES; q++) {
>> WARN_ON(v3d->queue[q].active_job);
>> v3d_stats_put(v3d->queue[q].stats);
>> + dma_fence_put(v3d->perfmon_state.last_hw_fence[q]);
>> }
>>
>> + dma_fence_put(v3d->perfmon_state.fence);
>> +
>> drm_mm_takedown(&v3d->mm);
>>
>> dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d-
>>> pt,
>> diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c
>> b/drivers/gpu/drm/v3d/v3d_perfmon.c
>> index 3ad0f022753c..07dab7fb3060 100644
>> --- a/drivers/gpu/drm/v3d/v3d_perfmon.c
>> +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
>> @@ -275,6 +275,12 @@ void v3d_perfmon_start(struct v3d_dev *v3d,
>> struct v3d_perfmon *perfmon)
>> if (!perfmon || v3d->global_perfmon)
>> return;
>>
>> + /* Cross-queue serialization should have drained any
>> previous perfmon
>> + * job before this one runs.
>> + */
>> + if (WARN_ON_ONCE(v3d->perfmon_state.active))
>> + return;
>> +
>> if (!pm_runtime_get_if_active(v3d->drm.dev))
>> return;
>>
>> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
>> b/drivers/gpu/drm/v3d/v3d_submit.c
>> index 4c526aafc4e0..46fc88bacc95 100644
>> --- a/drivers/gpu/drm/v3d/v3d_submit.c
>> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
>> @@ -234,6 +234,7 @@ v3d_submit_add_job(struct v3d_submit *submit,
>> void **container, size_t size,
>> job->v3d = v3d;
>> job->free = free;
>> job->file_priv = v3d_priv;
>> + job->queue = queue;
>>
>> ret = drm_sched_job_init(&job->base, &v3d_priv-
>>> sched_entity[queue],
>> 1, v3d_priv, submit->file_priv-
>>> client_id);
>> @@ -293,6 +294,62 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit
>> *submit, u32 perfmon_id)
>> return 0;
>> }
>>
>> +/*
>> + * Prepare fences to enforce job serialization when a perfmon is
>> active. A job
>> + * that carries a non-global perfmon must wait for every job
>> currently in-flight
>> + * across all HW queues to finish, otherwise concurrent unrelated
>> work on the
>> + * same core would pollute the performance counters. Symmetrically,
>> while such a
>> + * job is still in-flight, all subsequently submitted jobs must wait
>> for it.
>> + *
>> + * We don't serialize the jobs when using a global perfmon as it's
>> expected to
>> + * track concurrent activity from all jobs.
>> + */
>> +static int
>> +v3d_serialize_for_perfmon(struct v3d_job *job)
>> +{
>> + struct v3d_dev *v3d = job->v3d;
>> + bool is_global_perfmon;
>> + int ret;
>> +
>> + lockdep_assert_held(&v3d->sched_lock);
>> +
>> + scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock)
>> + is_global_perfmon = !!v3d->global_perfmon;
>> +
>> + if (is_global_perfmon)
>> + goto publish;
>> +
>> + if (job->perfmon) {
>> + for (enum v3d_queue q = 0; q < V3D_MAX_QUEUES; q++)
>> {
>> + struct dma_fence *f = v3d-
>>> perfmon_state.last_hw_fence[q];
>> +
>> + if (!f || dma_fence_is_signaled(f))
>> + continue;
>> +
>> + ret = drm_sched_job_add_dependency(&job-
>>> base, dma_fence_get(f));
>> + if (ret)
>> + return ret;
>> + }
>> + } else if (v3d->perfmon_state.fence &&
>> + !dma_fence_is_signaled(v3d->perfmon_state.fence))
>> {
>> + ret = drm_sched_job_add_dependency(&job->base,
>> +
>> dma_fence_get(v3d->perfmon_state.fence));
>> + if (ret)
>> + return ret;
>> + }
>> +
>> +publish:
>> + dma_fence_put(v3d->perfmon_state.last_hw_fence[job->queue]);
>> + v3d->perfmon_state.last_hw_fence[job->queue] =
>> dma_fence_get(job->done_fence);
>> +
>> + if (job->perfmon && !is_global_perfmon) {
>> + dma_fence_put(v3d->perfmon_state.fence);
>> + v3d->perfmon_state.fence = dma_fence_get(job-
>>> done_fence);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static void
>> v3d_submit_attach_object_fences(struct v3d_submit *submit)
>> {
>> @@ -337,6 +394,12 @@ v3d_submit_jobs(struct v3d_submit *submit)
>> goto err;
>> }
>>
>> + for (int i = 0; i < submit->job_count; i++) {
>> + ret = v3d_serialize_for_perfmon(submit->jobs[i]);
>> + if (ret)
>> + goto err;
>> + }
>> +
>> for (int i = 0; i < submit->job_count; i++)
>> drm_sched_entity_push_job(&submit->jobs[i]->base);
>>
>>
>
next prev parent reply other threads:[~2026-06-02 10:49 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-31 20:18 [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation Maíra Canal
2026-05-31 20:18 ` [PATCH v2 1/4] drm/v3d: Fix global performance monitor reference counting Maíra Canal
2026-06-04 4:44 ` Claude review: " Claude Code Review Bot
2026-05-31 20:18 ` [PATCH v2 2/4] drm/v3d: Refactor perfmon locking Maíra Canal
2026-06-01 11:52 ` Iago Toral
2026-06-01 12:03 ` Maíra Canal
2026-06-04 4:44 ` Claude review: " Claude Code Review Bot
2026-05-31 20:18 ` [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached Maíra Canal
2026-06-02 7:35 ` Iago Toral
2026-06-02 10:49 ` Maíra Canal [this message]
2026-06-02 11:10 ` Iago Toral
2026-06-04 4:44 ` Claude review: " Claude Code Review Bot
2026-05-31 20:18 ` [PATCH v2 4/4] drm/v3d: Drop the queue argument from v3d_job_add_syncobjs() Maíra Canal
2026-06-04 4:44 ` Claude review: " Claude Code Review Bot
2026-06-04 4:44 ` Claude review: drm/v3d: Fix perfmon locking and cross-queue isolation Claude Code Review Bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=44cbf6ec-3134-44d5-b37c-0c99ac55f678@igalia.com \
--to=mcanal@igalia.com \
--cc=airlied@gmail.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=itoral@igalia.com \
--cc=kernel-dev@igalia.com \
--cc=mwen@igalia.com \
--cc=simona@ffwll.ch \
--cc=tvrtko.ursulin@igalia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox