Re: [PATCH] drm/panthor: Fix the "done_fence is initialized" detection logic

public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed

From: Boris Brezillon <boris.brezillon@collabora.com>
To: Liviu Dudau <liviu.dudau@arm.com>
Cc: Steven Price <steven.price@arm.com>,
	Adrián Larumbe <adrian.larumbe@collabora.com>,
	dri-devel@lists.freedesktop.org, kernel@collabora.com,
	Nicolas Frattaroli <nicolas.frattaroli@collabora.com>,
	Tvrtko Ursulin <tvrtko.ursulin@igalia.com>,
	Philipp Stanner <phasta@kernel.org>,
	Christian König <christian.koenig@amd.com>
Subject: Re: [PATCH] drm/panthor: Fix the "done_fence is initialized" detection logic
Date: Mon, 9 Mar 2026 16:32:37 +0100	[thread overview]
Message-ID: <20260309163237.1941983b@fedora> (raw)
In-Reply-To: <aa7fHayRMdHn2Yxo@e142607>

On Mon, 9 Mar 2026 14:54:21 +0000
Liviu Dudau <liviu.dudau@arm.com> wrote:

> On Mon, Mar 09, 2026 at 02:15:49PM +0100, Boris Brezillon wrote:
> > On Mon, 9 Mar 2026 11:05:06 +0000
> > Liviu Dudau <liviu.dudau@arm.com> wrote:
> >   
> > > > After commit 541c8f2468b9 ("dma-buf: detach fence ops on signal v3"),
> > > > dma_fence::ops == NULL can't be used to check if the fence is initialized
> > > > or not. We could turn this into an "is_signaled() || ops == NULL" test,
> > > > but that's fragile, since it's still subject to dma_fence internal
> > > > changes. So let's have the "is_initialized" state encoded directly in
> > > > the pointer through the lowest bit which is guaranteed to be unused
> > > > because of the dma_fence alignment constraint.    
> > > 
> > > I'm confused! There is only one place where we end up being interested if the
> > > fence has been initialized or not, and that is in job_release(). I don't
> > > see why checking for "ops != NULL" before calling dma_fence_put() should not
> > > be enough,  
> > 
> > Because after 541c8f2468b9 ("dma-buf: detach fence ops on signal v3"),
> > dma_fence->ops is set back to NULL at signal time[1].  
> 
> Yes, I gathered that. What I meant to say was that I don't understand why we need
> all this infrastructure just for one check. Meanwhile Christian pointed out that
> a simpler solution already exists.
> 
> >   
> > > or even better, why don't we call dma_fence_put() regardless,
> > > as the core code should take care of an uninitialized dma_fence AFAICT.  
> > 
> > When the job is created, we pre-allocate the done_fence, but we leave it
> > uninitialized until ::run_job() is called. If we call
> > dma_fence_release() (through dma_fence_put()) on a dma_fence that was
> > not dma_fence_init()-ialized, we have a NULL deref on the cb_list, and
> > probably other issues too.  
> 
> I don't see the benefit of not initializing the done_fence until we ::run_job()
> but I might have missed something obvious.

It has to do with the way we connect dma_fence::seqno to the CS_SYNC
object seqno. The submission process is a multi-step operation:

for_each_job() // can fail
	1. allocate and initialize resources (including dma_fence and
	   drm_sched_fence objects)
	2. gather deps

for_each_job() // can't fail
	3. arm drm_sched fences
	4. queue jobs
	5. update syncobjs with the drm_sched fences

If anything fails before step3, we rollback all we've done. Now, if we
were initializing the job::dma_fence when we allocate it, we would
consume a seqno on the panthor_queue, and because the execute-job
sequence assumes the seqno increases monotonically (SYNC_ADD(+1)), we
can't leave holes behind, which would happen if we were initializing at
alloc time and something fails half way through the submission process.
There are ways around it, like using SYNC_SET(seqno) instead of
SYNC_ADD(+1), but those changes are more invasive than delaying the
initialization of the ::done_fence object.

> If we want to keep that, maybe we
> should not be droping the reference in job_release() but when we
> signal the fence.

If we assume that several paths call dma_fence_signal[_locked](), that'd
mean more code and more chances to forget the
dma_fence_put()+done_fence=NULL in case new paths are added. That, or we
need a panthor_job_signal_done_fence() wrapper.

> But that would leak the memory of the uninitialized done_fence.

Yes, the problem with uninitialized fences remains: as soon as we have
this two-step model where allocation and initialization is split, we
need to deal with both cases in the cleanup path.

next prev parent reply	other threads:[~2026-03-09 15:32 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-09 10:30 [PATCH] drm/panthor: Fix the "done_fence is initialized" detection logic Boris Brezillon
2026-03-09 10:50 ` Christian König
2026-03-09 11:06   ` Boris Brezillon
2026-03-09 11:05 ` Liviu Dudau
2026-03-09 13:15   ` Boris Brezillon
2026-03-09 14:54     ` Liviu Dudau
2026-03-09 15:32       ` Boris Brezillon [this message]
2026-03-09 11:06 ` Nicolas Frattaroli
2026-03-10  2:25 ` Claude review: " Claude Code Review Bot
2026-03-10  2:25 ` Claude Code Review Bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260309163237.1941983b@fedora \
    --to=boris.brezillon@collabora.com \
    --cc=adrian.larumbe@collabora.com \
    --cc=christian.koenig@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=kernel@collabora.com \
    --cc=liviu.dudau@arm.com \
    --cc=nicolas.frattaroli@collabora.com \
    --cc=phasta@kernel.org \
    --cc=steven.price@arm.com \
    --cc=tvrtko.ursulin@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox