Re: [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached

public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed

From: Iago Toral <itoral@igalia.com>
To: Maíra Canal <mcanal@igalia.com>, Melissa Wen <mwen@igalia.com>,
	Tvrtko Ursulin <tvrtko.ursulin@igalia.com>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>
Cc: kernel-dev@igalia.com, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached
Date: Tue, 02 Jun 2026 09:35:15 +0200	[thread overview]
Message-ID: <7a9d503de81b403585962f1c25dd0b506a6eb3f2.camel@igalia.com> (raw)
In-Reply-To: <20260531-v3d-perfmon-lifetime-v2-3-60ed4485a203@igalia.com>

Hi Maíra, this pactch looks good to me but I am wondering if we should
do anything to clean up the perfmon state when a gpu reset happens:
i.e. what would happen if a perfmon job resets? Would we continue to
schedule after-reset jobs to wait on the fence of the reset job?

Iago

El dom, 31-05-2026 a las 17:18 -0300, Maíra Canal escribió:
> A non-global perfmon is meant to count events generated by a specific
> submission, but the scheduler can run jobs from different queues
> concurrently on the same V3D core. Without explicit serialization, an
> unrelated job running in parallel with a perfmon-carrying job
> pollutes
> the counters and generates unusable results.
> 
> To address such issue, we must enforce cross-queue serialization when
> we
> detect a perfmon-carrying submission. It's possible to implement
> serialization by enforcing two rules:
> 
>   1. A job that carries a non-global perfmon must wait for every job
>     currently in-flight across all HW queues to finish.
> 
>   2. While a perfmon-carrying job is still in-flight, all
> subsequently
>     submitted jobs must wait for it.
> 
> Note that serialization is not needed in the global perfmon case, as
> the
> global perfmon tracks activity from all jobs, so concurrency is
> desirable.
> 
> Therefore, check if serialization is needed during job submission and
> if
> so, attach fence dependences to enforce cross-queue serialization.
> 
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>  drivers/gpu/drm/v3d/v3d_drv.h     | 35 ++++++++++++++++++----
>  drivers/gpu/drm/v3d/v3d_gem.c     |  3 ++
>  drivers/gpu/drm/v3d/v3d_perfmon.c |  6 ++++
>  drivers/gpu/drm/v3d/v3d_submit.c  | 63
> +++++++++++++++++++++++++++++++++++++++
>  4 files changed, 101 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h
> b/drivers/gpu/drm/v3d/v3d_drv.h
> index 51486af68cf4..cdf4926d51f2 100644
> --- a/drivers/gpu/drm/v3d/v3d_drv.h
> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
> @@ -74,11 +74,13 @@ struct v3d_queue_state {
>  	spinlock_t queue_lock;
>  };
>  
> -/* Performance monitor object. The perform lifetime is controlled by
> userspace
> - * using perfmon related ioctls. A perfmon can be attached to a
> submit_cl
> - * request, and when this is the case, HW perf counters will be
> activated just
> - * before the submit_cl is submitted to the GPU and disabled when
> the job is
> - * done. This way, only events related to a specific job will be
> counted.
> +/* Performance monitor object
> + *
> + * The performance monitor (perfmon) lifetime is controlled by
> userspace using
> + * perfmon related ioctls. A perfmon can be attached to a CL or CSD
> submission
> + * request, and when it is, HW performance counters will be
> activated just
> + * before the job is submitted to the GPU and disabled when the job
> is done.
> + * This way, only events related to a specific submission will be
> counted.
>   */
>  struct v3d_perfmon {
>  	/* Tracks the number of users of the perfmon, when this
> counter reaches
> @@ -167,13 +169,31 @@ struct v3d_dev {
>  
>  	struct v3d_queue_state queue[V3D_MAX_QUEUES];
>  
> -	/* Tracks the performance monitor state. */
> +	/*
> +	 * Tracks the performance monitor state and consistency.
> +	 *
> +	 * When a non-global perfmon is attached to a job, the
> scheduler must
> +	 * not run any other job on the HW concurrently (otherwise,
> the
> +	 * counters would be polluted by unrelated work).
> +	 */
>  	struct {
>  		/* Protects @active. */
>  		spinlock_t lock;
>  
>  		/* Perfmon currently programmed in HW (or NULL if
> none). */
>  		struct v3d_perfmon *active;
> +
> +		/* Finished fence of the most recently submitted job
> that
> +		 * opened a serialization window (i.e. a job with a
> non-global
> +		 * perfmon attached).
> +		 */
> +		struct dma_fence *fence;
> +
> +		/* Finished fence of the most recently submitted job
> on each HW
> +		 * queue. Used so that a new perfmon-carrying job
> can depend on
> +		 * every job currently in-flight across all queues.
> +		 */
> +		struct dma_fence *last_hw_fence[V3D_MAX_QUEUES];
>  	} perfmon_state;
>  
>  	/* Protects bo_stats */
> @@ -319,6 +339,9 @@ struct v3d_job {
>  
>  	struct v3d_dev *v3d;
>  
> +	/* The queue that the job was submitted on. */
> +	enum v3d_queue queue;
> +
>  	/* This is the array of BOs that were looked up at the start
>  	 * of submission.
>  	 */
> diff --git a/drivers/gpu/drm/v3d/v3d_gem.c
> b/drivers/gpu/drm/v3d/v3d_gem.c
> index 9487ab7acd03..0b415486f5b4 100644
> --- a/drivers/gpu/drm/v3d/v3d_gem.c
> +++ b/drivers/gpu/drm/v3d/v3d_gem.c
> @@ -363,8 +363,11 @@ v3d_gem_destroy(struct drm_device *dev)
>  	for (q = 0; q < V3D_MAX_QUEUES; q++) {
>  		WARN_ON(v3d->queue[q].active_job);
>  		v3d_stats_put(v3d->queue[q].stats);
> +		dma_fence_put(v3d->perfmon_state.last_hw_fence[q]);
>  	}
>  
> +	dma_fence_put(v3d->perfmon_state.fence);
> +
>  	drm_mm_takedown(&v3d->mm);
>  
>  	dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d-
> >pt,
> diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c
> b/drivers/gpu/drm/v3d/v3d_perfmon.c
> index 3ad0f022753c..07dab7fb3060 100644
> --- a/drivers/gpu/drm/v3d/v3d_perfmon.c
> +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
> @@ -275,6 +275,12 @@ void v3d_perfmon_start(struct v3d_dev *v3d,
> struct v3d_perfmon *perfmon)
>  	if (!perfmon || v3d->global_perfmon)
>  		return;
>  
> +	/* Cross-queue serialization should have drained any
> previous perfmon
> +	 * job before this one runs.
> +	 */
> +	if (WARN_ON_ONCE(v3d->perfmon_state.active))
> +		return;
> +
>  	if (!pm_runtime_get_if_active(v3d->drm.dev))
>  		return;
>  
> diff --git a/drivers/gpu/drm/v3d/v3d_submit.c
> b/drivers/gpu/drm/v3d/v3d_submit.c
> index 4c526aafc4e0..46fc88bacc95 100644
> --- a/drivers/gpu/drm/v3d/v3d_submit.c
> +++ b/drivers/gpu/drm/v3d/v3d_submit.c
> @@ -234,6 +234,7 @@ v3d_submit_add_job(struct v3d_submit *submit,
> void **container, size_t size,
>  	job->v3d = v3d;
>  	job->free = free;
>  	job->file_priv = v3d_priv;
> +	job->queue = queue;
>  
>  	ret = drm_sched_job_init(&job->base, &v3d_priv-
> >sched_entity[queue],
>  				 1, v3d_priv, submit->file_priv-
> >client_id);
> @@ -293,6 +294,62 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit
> *submit, u32 perfmon_id)
>  	return 0;
>  }
>  
> +/*
> + * Prepare fences to enforce job serialization when a perfmon is
> active. A job
> + * that carries a non-global perfmon must wait for every job
> currently in-flight
> + * across all HW queues to finish, otherwise concurrent unrelated
> work on the
> + * same core would pollute the performance counters. Symmetrically,
> while such a
> + * job is still in-flight, all subsequently submitted jobs must wait
> for it.
> + *
> + * We don't serialize the jobs when using a global perfmon as it's
> expected to
> + * track concurrent activity from all jobs.
> + */
> +static int
> +v3d_serialize_for_perfmon(struct v3d_job *job)
> +{
> +	struct v3d_dev *v3d = job->v3d;
> +	bool is_global_perfmon;
> +	int ret;
> +
> +	lockdep_assert_held(&v3d->sched_lock);
> +
> +	scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock)
> +		is_global_perfmon = !!v3d->global_perfmon;
> +
> +	if (is_global_perfmon)
> +		goto publish;
> +
> +	if (job->perfmon) {
> +		for (enum v3d_queue q = 0; q < V3D_MAX_QUEUES; q++)
> {
> +			struct dma_fence *f = v3d-
> >perfmon_state.last_hw_fence[q];
> +
> +			if (!f || dma_fence_is_signaled(f))
> +				continue;
> +
> +			ret = drm_sched_job_add_dependency(&job-
> >base, dma_fence_get(f));
> +			if (ret)
> +				return ret;
> +		}
> +	} else if (v3d->perfmon_state.fence &&
> +		   !dma_fence_is_signaled(v3d->perfmon_state.fence))
> {
> +		ret = drm_sched_job_add_dependency(&job->base,
> +						  
> dma_fence_get(v3d->perfmon_state.fence));
> +		if (ret)
> +			return ret;
> +	}
> +
> +publish:
> +	dma_fence_put(v3d->perfmon_state.last_hw_fence[job->queue]);
> +	v3d->perfmon_state.last_hw_fence[job->queue] =
> dma_fence_get(job->done_fence);
> +
> +	if (job->perfmon && !is_global_perfmon) {
> +		dma_fence_put(v3d->perfmon_state.fence);
> +		v3d->perfmon_state.fence = dma_fence_get(job-
> >done_fence);
> +	}
> +
> +	return 0;
> +}
> +
>  static void
>  v3d_submit_attach_object_fences(struct v3d_submit *submit)
>  {
> @@ -337,6 +394,12 @@ v3d_submit_jobs(struct v3d_submit *submit)
>  			goto err;
>  	}
>  
> +	for (int i = 0; i < submit->job_count; i++) {
> +		ret = v3d_serialize_for_perfmon(submit->jobs[i]);
> +		if (ret)
> +			goto err;
> +	}
> +
>  	for (int i = 0; i < submit->job_count; i++)
>  		drm_sched_entity_push_job(&submit->jobs[i]->base);
>  
>

next prev parent reply	other threads:[~2026-06-02  7:35 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-31 20:18 [PATCH v2 0/4] drm/v3d: Fix perfmon locking and cross-queue isolation Maíra Canal
2026-05-31 20:18 ` [PATCH v2 1/4] drm/v3d: Fix global performance monitor reference counting Maíra Canal
2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
2026-05-31 20:18 ` [PATCH v2 2/4] drm/v3d: Refactor perfmon locking Maíra Canal
2026-06-01 11:52   ` Iago Toral
2026-06-01 12:03     ` Maíra Canal
2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
2026-05-31 20:18 ` [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached Maíra Canal
2026-06-02  7:35   ` Iago Toral [this message]
2026-06-02 10:49     ` Maíra Canal
2026-06-02 11:10       ` Iago Toral
2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
2026-05-31 20:18 ` [PATCH v2 4/4] drm/v3d: Drop the queue argument from v3d_job_add_syncobjs() Maíra Canal
2026-06-04  4:44   ` Claude review: " Claude Code Review Bot
2026-06-04  4:44 ` Claude review: drm/v3d: Fix perfmon locking and cross-queue isolation Claude Code Review Bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7a9d503de81b403585962f1c25dd0b506a6eb3f2.camel@igalia.com \
    --to=itoral@igalia.com \
    --cc=airlied@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=kernel-dev@igalia.com \
    --cc=mcanal@igalia.com \
    --cc=mwen@igalia.com \
    --cc=simona@ffwll.ch \
    --cc=tvrtko.ursulin@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox