From: Boris Brezillon <boris.brezillon@collabora.com>
To: Liviu Dudau <liviu.dudau@arm.com>
Cc: Steven Price <steven.price@arm.com>,
Adrián Larumbe <adrian.larumbe@collabora.com>,
dri-devel@lists.freedesktop.org, kernel@collabora.com,
Nicolas Frattaroli <nicolas.frattaroli@collabora.com>,
Tvrtko Ursulin <tvrtko.ursulin@igalia.com>,
Philipp Stanner <phasta@kernel.org>,
Christian König <christian.koenig@amd.com>
Subject: Re: [PATCH] drm/panthor: Fix the "done_fence is initialized" detection logic
Date: Mon, 9 Mar 2026 16:32:37 +0100
Message-ID: <20260309163237.1941983b@fedora>
In-Reply-To: <aa7fHayRMdHn2Yxo@e142607>

On Mon, 9 Mar 2026 14:54:21 +0000
Liviu Dudau <liviu.dudau@arm.com> wrote:
> On Mon, Mar 09, 2026 at 02:15:49PM +0100, Boris Brezillon wrote:
> > On Mon, 9 Mar 2026 11:05:06 +0000
> > Liviu Dudau <liviu.dudau@arm.com> wrote:
> >
> > > > After commit 541c8f2468b9 ("dma-buf: detach fence ops on signal v3"),
> > > > dma_fence::ops == NULL can't be used to check if the fence is initialized
> > > > or not. We could turn this into an "is_signaled() || ops == NULL" test,
> > > > but that's fragile, since it's still subject to dma_fence internal
> > > > changes. So let's have the "is_initialized" state encoded directly in
> > > > the pointer through the lowest bit which is guaranteed to be unused
> > > > because of the dma_fence alignment constraint.
> > >
> > > I'm confused! There is only one place where we end up being interested if the
> > > fence has been initialized or not, and that is in job_release(). I don't
> > > see why checking for "ops != NULL" before calling dma_fence_put() should not
> > > be enough,
> >
> > Because after 541c8f2468b9 ("dma-buf: detach fence ops on signal v3"),
> > dma_fence->ops is set back to NULL at signal time[1].
>
> Yes, I gathered that. What I meant to say was that I don't understand why we need
> all this infrastructure just for one check. Meanwhile Christian pointed out that
> a simpler solution already exists.
>
> >
> > > or even better, why don't we call dma_fence_put() regardless,
> > > as the core code should take care of an uninitialized dma_fence AFAICT.
> >
> > When the job is created, we pre-allocate the done_fence, but we leave it
> > uninitialized until ::run_job() is called. If we call
> > dma_fence_release() (through dma_fence_put()) on a dma_fence that was
> > not dma_fence_init()-ialized, we have a NULL deref on the cb_list, and
> > probably other issues too.
>
> I don't see the benefit of not initializing the done_fence until we ::run_job()
> but I might have missed something obvious.

It has to do with the way we connect dma_fence::seqno to the CS_SYNC
object seqno. The submission process is a multi-step operation:

  for_each_job() // can fail
    1. allocate and initialize resources (including dma_fence and
       drm_sched_fence objects)
    2. gather deps

  for_each_job() // can't fail
    3. arm drm_sched fences
    4. queue jobs
    5. update syncobjs with the drm_sched fences

If anything fails before step 3, we roll back everything we've done. Now,
if we were initializing the job::dma_fence when we allocate it, we would
consume a seqno on the panthor_queue, and because the execute-job
sequence assumes the seqno increases monotonically (SYNC_ADD(+1)), we
can't leave holes behind, which would happen if we were initializing at
alloc time and something failed halfway through the submission process.
There are ways around it, like using SYNC_SET(seqno) instead of
SYNC_ADD(+1), but those changes are more invasive than delaying the
initialization of the ::done_fence object.
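
To make the seqno-hole argument concrete, here is a minimal userspace
sketch. It is not panthor code: every toy_* name is hypothetical, the
"queue" is just two counters, and SYNC_ADD(+1) is modelled as a plain
increment. It only shows why consuming a seqno at alloc time breaks once
a submission is rolled back, while consuming it at run_job() time does
not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real panthor/dma_fence structures. */
struct toy_queue {
	uint64_t next_seqno; /* last seqno handed out to a fence */
	uint64_t sync_value; /* models the sync object, bumped by +1 per job */
};

struct toy_fence {
	bool initialized;
	uint64_t seqno;
};

struct toy_job {
	struct toy_fence *done_fence;
};

/* Fallible phase: allocate the fence, optionally consuming a seqno now. */
static int toy_job_alloc(struct toy_queue *q, struct toy_job *job, bool eager)
{
	job->done_fence = calloc(1, sizeof(*job->done_fence));
	if (!job->done_fence)
		return -1;

	if (eager) {
		job->done_fence->seqno = ++q->next_seqno;
		job->done_fence->initialized = true;
	}
	return 0;
}

/* Infallible phase: late initialization, if it was not done at alloc. */
static void toy_job_run(struct toy_queue *q, struct toy_job *job)
{
	if (!job->done_fence->initialized) {
		job->done_fence->seqno = ++q->next_seqno;
		job->done_fence->initialized = true;
	}
}

/* Models SYNC_ADD(+1); returns true if the sync value matches the fence. */
static bool toy_job_execute(struct toy_queue *q, struct toy_job *job)
{
	q->sync_value += 1;
	return q->sync_value == job->done_fence->seqno;
}

/* Rollback or normal cleanup in this toy model: just free the fence. */
static void toy_job_release(struct toy_job *job)
{
	free(job->done_fence);
	job->done_fence = NULL;
}
```

With eager set, a job that is allocated and then rolled back leaves
next_seqno at 1, so the next job gets seqno 2 while the sync value only
reaches 1. With late initialization the rolled-back job never touches
next_seqno and the two stay in step.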
> If we want to keep that, maybe we
> should not be dropping the reference in job_release() but when we
> signal the fence.
If we assume that several paths call dma_fence_signal[_locked](), that'd
mean more code and more chances to forget the
dma_fence_put()+done_fence=NULL in case new paths are added. That, or we
need a panthor_job_signal_done_fence() wrapper.
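
For illustration only, such a wrapper could look roughly like the
userspace sketch below. The toy_* types and the live-fence counter are
hypothetical and merely stand in for dma_fence refcounting; the point is
that signal, put and pointer reset live in one helper so a new signalling
path cannot forget one of them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted fence, standing in for struct dma_fence. */
struct toy_fence {
	int refcount;
	bool signaled;
};

struct toy_job {
	struct toy_fence *done_fence;
};

/* Test aid: number of fences currently allocated. */
static int toy_live_fences;

static struct toy_fence *toy_fence_create(void)
{
	struct toy_fence *f = calloc(1, sizeof(*f));

	if (f) {
		f->refcount = 1;
		toy_live_fences++;
	}
	return f;
}

static void toy_fence_put(struct toy_fence *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0) {
		toy_live_fences--;
		free(f);
	}
}

/* Signal the done fence, drop the job's reference and clear the pointer. */
static void toy_job_signal_done_fence(struct toy_job *job)
{
	struct toy_fence *f = job->done_fence;

	if (!f)
		return; /* already signalled and released */

	f->signaled = true;
	job->done_fence = NULL;
	toy_fence_put(f);
}
```

Calling the helper twice is harmless in this sketch because the pointer
is cleared on the first call.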

> But that would leak the memory of the uninitialized done_fence.

Yes, the problem with uninitialized fences remains: as soon as we have
this two-step model where allocation and initialization are split, we
need to deal with both cases in the cleanup path.
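
As a rough userspace sketch of what "both cases in the cleanup path"
means when the initialized state is carried in the lowest pointer bit,
as the patch description proposes: the toy_* names and the flag macro
below are hypothetical, malloc alignment stands in for the dma_fence
alignment constraint, and the actual patch may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flag stored in bit 0 of the (aligned) fence pointer. */
#define TOY_FENCE_INITIALIZED ((uintptr_t)1)

struct toy_fence {
	int refcount;
};

struct toy_job {
	uintptr_t done_fence; /* fence pointer | TOY_FENCE_INITIALIZED */
};

static struct toy_fence *toy_job_fence(const struct toy_job *job)
{
	return (struct toy_fence *)(job->done_fence & ~TOY_FENCE_INITIALIZED);
}

static bool toy_job_fence_is_initialized(const struct toy_job *job)
{
	return (job->done_fence & TOY_FENCE_INITIALIZED) != 0;
}

/* Allocation only: the fence exists but must not be "put" yet. */
static int toy_job_alloc_fence(struct toy_job *job)
{
	struct toy_fence *f = calloc(1, sizeof(*f));

	if (!f)
		return -1;

	/* The allocator's alignment guarantees bit 0 is free for the flag. */
	assert(((uintptr_t)f & TOY_FENCE_INITIALIZED) == 0);
	job->done_fence = (uintptr_t)f;
	return 0;
}

/* Late initialization: take the first reference and set the flag. */
static void toy_job_init_fence(struct toy_job *job)
{
	toy_job_fence(job)->refcount = 1;
	job->done_fence |= TOY_FENCE_INITIALIZED;
}

/* Returns 1 if the refcounted put path was used, 0 for a plain free. */
static int toy_job_release(struct toy_job *job)
{
	struct toy_fence *f = toy_job_fence(job);
	int used_put = 0;

	if (!f)
		return 0;

	if (toy_job_fence_is_initialized(job)) {
		if (--f->refcount == 0)
			free(f);
		used_put = 1;
	} else {
		free(f); /* never initialized: no refcount to drop */
	}

	job->done_fence = 0;
	return used_put;
}
```

The release helper never looks inside the fence to decide which path to
take, so it does not depend on what the fence core does to its own
fields at signal time.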