From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 08E84C5DF71 for ; Tue, 2 Jun 2026 07:35:32 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 552BC1138AE; Tue, 2 Jun 2026 07:35:31 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=igalia.com header.i=@igalia.com header.b="KjTZkAAF"; dkim-atps=neutral Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by gabe.freedesktop.org (Postfix) with ESMTPS id E35B21138AE for ; Tue, 2 Jun 2026 07:35:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=MIME-Version:Content-Transfer-Encoding:Content-Type:References: In-Reply-To:Date:Cc:To:From:Subject:Message-ID:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=BiUGaHbvV4QyxKg1mUkrI3tIJET9EfA97Sdt30Gxg0A=; b=KjTZkAAF0fZS7rtL8Ym68ND87X hTXcwPRJc2YllvS2IBw5leqnt+pndf9Po6VqH1RjccWMR827Pnw71yrif3l9AiWBH9hD6ZseLoW/t VO57RbJcsoVn3ya7iaQA1QWZuImI9jS2V6nOdun+l97eBWAHINuIK5/wpHJbxYXmTeMDw2JM0JmPF KOaJiLtMUGJmvj/aPXCWnebj0gNvwpZsfSWqDEskplqp/neK/Nk7aldaEezTQDL7jPhcsaMeNkIwk z6LdyrgdKKDO5bEo/+iQwxtMckDHl+0BrZ69gI+sARgYSLzmlwvZjhJ4HsT/wuGojSL9Z/RBodsDL 4j9oNAWA==; Received: from static-234-112-85-188.ipcom.comunitel.net ([188.85.112.234] helo=[192.168.0.17]) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_SECP256R1__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1wUJen-00BaAC-SQ; Tue, 02 Jun 2026 09:35:25 +0200 Message-ID: <7a9d503de81b403585962f1c25dd0b506a6eb3f2.camel@igalia.com> Subject: Re: [PATCH v2 3/4] drm/v3d: Serialize jobs across queues when a perfmon is attached From: Iago Toral To: =?ISO-8859-1?Q?Ma=EDra?= Canal , Melissa Wen , Tvrtko Ursulin , David Airlie , Simona Vetter Cc: kernel-dev@igalia.com, dri-devel@lists.freedesktop.org Date: Tue, 02 Jun 2026 09:35:15 +0200 In-Reply-To: <20260531-v3d-perfmon-lifetime-v2-3-60ed4485a203@igalia.com> References: <20260531-v3d-perfmon-lifetime-v2-0-60ed4485a203@igalia.com> <20260531-v3d-perfmon-lifetime-v2-3-60ed4485a203@igalia.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.52.3-0ubuntu1.1 MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Hi Ma=C3=ADra, this pactch looks good to me but I am wondering if we should do anything to clean up the perfmon state when a gpu reset happens: i.e. what would happen if a perfmon job resets? Would we continue to schedule after-reset jobs to wait on the fence of the reset job? Iago El dom, 31-05-2026 a las 17:18 -0300, Ma=C3=ADra Canal escribi=C3=B3: > A non-global perfmon is meant to count events generated by a specific > submission, but the scheduler can run jobs from different queues > concurrently on the same V3D core. Without explicit serialization, an > unrelated job running in parallel with a perfmon-carrying job > pollutes > the counters and generates unusable results. >=20 > To address such issue, we must enforce cross-queue serialization when > we > detect a perfmon-carrying submission. It's possible to implement > serialization by enforcing two rules: >=20 > =C2=A0 1. A job that carries a non-global perfmon must wait for every job > =C2=A0=C2=A0=C2=A0 currently in-flight across all HW queues to finish. >=20 > =C2=A0 2. While a perfmon-carrying job is still in-flight, all > subsequently > =C2=A0=C2=A0=C2=A0 submitted jobs must wait for it. >=20 > Note that serialization is not needed in the global perfmon case, as > the > global perfmon tracks activity from all jobs, so concurrency is > desirable. >=20 > Therefore, check if serialization is needed during job submission and > if > so, attach fence dependences to enforce cross-queue serialization. >=20 > Signed-off-by: Ma=C3=ADra Canal > --- > =C2=A0drivers/gpu/drm/v3d/v3d_drv.h=C2=A0=C2=A0=C2=A0=C2=A0 | 35 ++++++++= ++++++++++---- > =C2=A0drivers/gpu/drm/v3d/v3d_gem.c=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0 3 ++ > =C2=A0drivers/gpu/drm/v3d/v3d_perfmon.c |=C2=A0 6 ++++ > =C2=A0drivers/gpu/drm/v3d/v3d_submit.c=C2=A0 | 63 > +++++++++++++++++++++++++++++++++++++++ > =C2=A04 files changed, 101 insertions(+), 6 deletions(-) >=20 > diff --git a/drivers/gpu/drm/v3d/v3d_drv.h > b/drivers/gpu/drm/v3d/v3d_drv.h > index 51486af68cf4..cdf4926d51f2 100644 > --- a/drivers/gpu/drm/v3d/v3d_drv.h > +++ b/drivers/gpu/drm/v3d/v3d_drv.h > @@ -74,11 +74,13 @@ struct v3d_queue_state { > =C2=A0 spinlock_t queue_lock; > =C2=A0}; > =C2=A0 > -/* Performance monitor object. The perform lifetime is controlled by > userspace > - * using perfmon related ioctls. A perfmon can be attached to a > submit_cl > - * request, and when this is the case, HW perf counters will be > activated just > - * before the submit_cl is submitted to the GPU and disabled when > the job is > - * done. This way, only events related to a specific job will be > counted. > +/* Performance monitor object > + * > + * The performance monitor (perfmon) lifetime is controlled by > userspace using > + * perfmon related ioctls. A perfmon can be attached to a CL or CSD > submission > + * request, and when it is, HW performance counters will be > activated just > + * before the job is submitted to the GPU and disabled when the job > is done. > + * This way, only events related to a specific submission will be > counted. > =C2=A0 */ > =C2=A0struct v3d_perfmon { > =C2=A0 /* Tracks the number of users of the perfmon, when this > counter reaches > @@ -167,13 +169,31 @@ struct v3d_dev { > =C2=A0 > =C2=A0 struct v3d_queue_state queue[V3D_MAX_QUEUES]; > =C2=A0 > - /* Tracks the performance monitor state. */ > + /* > + * Tracks the performance monitor state and consistency. > + * > + * When a non-global perfmon is attached to a job, the > scheduler must > + * not run any other job on the HW concurrently (otherwise, > the > + * counters would be polluted by unrelated work). > + */ > =C2=A0 struct { > =C2=A0 /* Protects @active. */ > =C2=A0 spinlock_t lock; > =C2=A0 > =C2=A0 /* Perfmon currently programmed in HW (or NULL if > none). */ > =C2=A0 struct v3d_perfmon *active; > + > + /* Finished fence of the most recently submitted job > that > + * opened a serialization window (i.e. a job with a > non-global > + * perfmon attached). > + */ > + struct dma_fence *fence; > + > + /* Finished fence of the most recently submitted job > on each HW > + * queue. Used so that a new perfmon-carrying job > can depend on > + * every job currently in-flight across all queues. > + */ > + struct dma_fence *last_hw_fence[V3D_MAX_QUEUES]; > =C2=A0 } perfmon_state; > =C2=A0 > =C2=A0 /* Protects bo_stats */ > @@ -319,6 +339,9 @@ struct v3d_job { > =C2=A0 > =C2=A0 struct v3d_dev *v3d; > =C2=A0 > + /* The queue that the job was submitted on. */ > + enum v3d_queue queue; > + > =C2=A0 /* This is the array of BOs that were looked up at the start > =C2=A0 * of submission. > =C2=A0 */ > diff --git a/drivers/gpu/drm/v3d/v3d_gem.c > b/drivers/gpu/drm/v3d/v3d_gem.c > index 9487ab7acd03..0b415486f5b4 100644 > --- a/drivers/gpu/drm/v3d/v3d_gem.c > +++ b/drivers/gpu/drm/v3d/v3d_gem.c > @@ -363,8 +363,11 @@ v3d_gem_destroy(struct drm_device *dev) > =C2=A0 for (q =3D 0; q < V3D_MAX_QUEUES; q++) { > =C2=A0 WARN_ON(v3d->queue[q].active_job); > =C2=A0 v3d_stats_put(v3d->queue[q].stats); > + dma_fence_put(v3d->perfmon_state.last_hw_fence[q]); > =C2=A0 } > =C2=A0 > + dma_fence_put(v3d->perfmon_state.fence); > + > =C2=A0 drm_mm_takedown(&v3d->mm); > =C2=A0 > =C2=A0 dma_free_coherent(v3d->drm.dev, 4096 * 1024, (void *)v3d- > >pt, > diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c > b/drivers/gpu/drm/v3d/v3d_perfmon.c > index 3ad0f022753c..07dab7fb3060 100644 > --- a/drivers/gpu/drm/v3d/v3d_perfmon.c > +++ b/drivers/gpu/drm/v3d/v3d_perfmon.c > @@ -275,6 +275,12 @@ void v3d_perfmon_start(struct v3d_dev *v3d, > struct v3d_perfmon *perfmon) > =C2=A0 if (!perfmon || v3d->global_perfmon) > =C2=A0 return; > =C2=A0 > + /* Cross-queue serialization should have drained any > previous perfmon > + * job before this one runs. > + */ > + if (WARN_ON_ONCE(v3d->perfmon_state.active)) > + return; > + > =C2=A0 if (!pm_runtime_get_if_active(v3d->drm.dev)) > =C2=A0 return; > =C2=A0 > diff --git a/drivers/gpu/drm/v3d/v3d_submit.c > b/drivers/gpu/drm/v3d/v3d_submit.c > index 4c526aafc4e0..46fc88bacc95 100644 > --- a/drivers/gpu/drm/v3d/v3d_submit.c > +++ b/drivers/gpu/drm/v3d/v3d_submit.c > @@ -234,6 +234,7 @@ v3d_submit_add_job(struct v3d_submit *submit, > void **container, size_t size, > =C2=A0 job->v3d =3D v3d; > =C2=A0 job->free =3D free; > =C2=A0 job->file_priv =3D v3d_priv; > + job->queue =3D queue; > =C2=A0 > =C2=A0 ret =3D drm_sched_job_init(&job->base, &v3d_priv- > >sched_entity[queue], > =C2=A0 1, v3d_priv, submit->file_priv- > >client_id); > @@ -293,6 +294,62 @@ v3d_attach_perfmon_to_jobs(struct v3d_submit > *submit, u32 perfmon_id) > =C2=A0 return 0; > =C2=A0} > =C2=A0 > +/* > + * Prepare fences to enforce job serialization when a perfmon is > active. A job > + * that carries a non-global perfmon must wait for every job > currently in-flight > + * across all HW queues to finish, otherwise concurrent unrelated > work on the > + * same core would pollute the performance counters. Symmetrically, > while such a > + * job is still in-flight, all subsequently submitted jobs must wait > for it. > + * > + * We don't serialize the jobs when using a global perfmon as it's > expected to > + * track concurrent activity from all jobs. > + */ > +static int > +v3d_serialize_for_perfmon(struct v3d_job *job) > +{ > + struct v3d_dev *v3d =3D job->v3d; > + bool is_global_perfmon; > + int ret; > + > + lockdep_assert_held(&v3d->sched_lock); > + > + scoped_guard(spinlock_irqsave, &v3d->perfmon_state.lock) > + is_global_perfmon =3D !!v3d->global_perfmon; > + > + if (is_global_perfmon) > + goto publish; > + > + if (job->perfmon) { > + for (enum v3d_queue q =3D 0; q < V3D_MAX_QUEUES; q++) > { > + struct dma_fence *f =3D v3d- > >perfmon_state.last_hw_fence[q]; > + > + if (!f || dma_fence_is_signaled(f)) > + continue; > + > + ret =3D drm_sched_job_add_dependency(&job- > >base, dma_fence_get(f)); > + if (ret) > + return ret; > + } > + } else if (v3d->perfmon_state.fence && > + =C2=A0=C2=A0 !dma_fence_is_signaled(v3d->perfmon_state.fence)) > { > + ret =3D drm_sched_job_add_dependency(&job->base, > + =C2=A0=C2=A0 > dma_fence_get(v3d->perfmon_state.fence)); > + if (ret) > + return ret; > + } > + > +publish: > + dma_fence_put(v3d->perfmon_state.last_hw_fence[job->queue]); > + v3d->perfmon_state.last_hw_fence[job->queue] =3D > dma_fence_get(job->done_fence); > + > + if (job->perfmon && !is_global_perfmon) { > + dma_fence_put(v3d->perfmon_state.fence); > + v3d->perfmon_state.fence =3D dma_fence_get(job- > >done_fence); > + } > + > + return 0; > +} > + > =C2=A0static void > =C2=A0v3d_submit_attach_object_fences(struct v3d_submit *submit) > =C2=A0{ > @@ -337,6 +394,12 @@ v3d_submit_jobs(struct v3d_submit *submit) > =C2=A0 goto err; > =C2=A0 } > =C2=A0 > + for (int i =3D 0; i < submit->job_count; i++) { > + ret =3D v3d_serialize_for_perfmon(submit->jobs[i]); > + if (ret) > + goto err; > + } > + > =C2=A0 for (int i =3D 0; i < submit->job_count; i++) > =C2=A0 drm_sched_entity_push_job(&submit->jobs[i]->base); > =C2=A0 >=20