From: Mario Limonciello
To: Lizhi Hou, ogabbay@kernel.org, quic_jhugo@quicinc.com, dri-devel@lists.freedesktop.org, maciej.falkowski@linux.intel.com
Cc: linux-kernel@vger.kernel.org, max.zhen@amd.com, sonal.santan@amd.com
Subject: Re: [PATCH V2] accel/amdxdna: Fix runtime suspend deadlock when there is pending job
Date: Tue, 10 Mar 2026 13:33:46 -0500
Message-ID: <8822df6a-e14f-4079-8a54-1ee7c1f78632@kernel.org>
In-Reply-To: <20260310180058.336348-1-lizhi.hou@amd.com>

On 3/10/26 1:00 PM, Lizhi Hou wrote:
> The runtime suspend callback drains the running job workqueue before
> suspending the device. If a job is still executing and calls
> pm_runtime_resume_and_get(), it can deadlock with the runtime suspend
> path.
>
> Fix this by moving pm_runtime_resume_and_get() from the job execution
> routine to the job submission routine, ensuring the device is resumed
> before the job is queued and avoiding the deadlock during runtime
> suspend.
>
> Fixes: 063db451832b ("accel/amdxdna: Enhance runtime power management")
> Signed-off-by: Lizhi Hou

Reviewed-by: Mario Limonciello (AMD)

> ---
>  drivers/accel/amdxdna/aie2_ctx.c    | 14 ++------------
>  drivers/accel/amdxdna/amdxdna_ctx.c | 10 ++++++++++
>  2 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/accel/amdxdna/aie2_ctx.c b/drivers/accel/amdxdna/aie2_ctx.c
> index afee5e667f77..c0d348884f74 100644
> --- a/drivers/accel/amdxdna/aie2_ctx.c
> +++ b/drivers/accel/amdxdna/aie2_ctx.c
> @@ -165,7 +165,6 @@ aie2_sched_notify(struct amdxdna_sched_job *job)
>  
>  	trace_xdna_job(&job->base, job->hwctx->name, "signaled fence", job->seq);
>  
> -	amdxdna_pm_suspend_put(job->hwctx->client->xdna);
>  	job->hwctx->priv->completed++;
>  	dma_fence_signal(fence);
>  
> @@ -290,19 +289,11 @@ aie2_sched_job_run(struct drm_sched_job *sched_job)
>  	struct dma_fence *fence;
>  	int ret;
>  
> -	ret = amdxdna_pm_resume_get(hwctx->client->xdna);
> -	if (ret)
> +	if (!hwctx->priv->mbox_chann)
>  		return NULL;
>  
> -	if (!hwctx->priv->mbox_chann) {
> -		amdxdna_pm_suspend_put(hwctx->client->xdna);
> -		return NULL;
> -	}
> -
> -	if (!mmget_not_zero(job->mm)) {
> -		amdxdna_pm_suspend_put(hwctx->client->xdna);
> +	if (!mmget_not_zero(job->mm))
>  		return ERR_PTR(-ESRCH);
> -	}
>  
>  	kref_get(&job->refcnt);
>  	fence = dma_fence_get(job->fence);
> @@ -333,7 +324,6 @@ aie2_sched_job_run(struct drm_sched_job *sched_job)
>  
>  out:
>  	if (ret) {
> -		amdxdna_pm_suspend_put(hwctx->client->xdna);
>  		dma_fence_put(job->fence);
>  		aie2_job_put(job);
>  		mmput(job->mm);
> diff --git a/drivers/accel/amdxdna/amdxdna_ctx.c b/drivers/accel/amdxdna/amdxdna_ctx.c
> index 666dfd7b2a80..838430903a3e 100644
> --- a/drivers/accel/amdxdna/amdxdna_ctx.c
> +++ b/drivers/accel/amdxdna/amdxdna_ctx.c
> @@ -17,6 +17,7 @@
>  #include "amdxdna_ctx.h"
>  #include "amdxdna_gem.h"
>  #include "amdxdna_pci_drv.h"
> +#include "amdxdna_pm.h"
>  
>  #define MAX_HWCTX_ID 255
>  #define MAX_ARG_COUNT 4095
> @@ -445,6 +446,7 @@ amdxdna_arg_bos_lookup(struct amdxdna_client *client,
>  void amdxdna_sched_job_cleanup(struct amdxdna_sched_job *job)
>  {
>  	trace_amdxdna_debug_point(job->hwctx->name, job->seq, "job release");
> +	amdxdna_pm_suspend_put(job->hwctx->client->xdna);
>  	amdxdna_arg_bos_put(job);
>  	amdxdna_gem_put_obj(job->cmd_bo);
>  	dma_fence_put(job->fence);
> @@ -482,6 +484,12 @@ int amdxdna_cmd_submit(struct amdxdna_client *client,
>  		goto cmd_put;
>  	}
>  
> +	ret = amdxdna_pm_resume_get(xdna);
> +	if (ret) {
> +		XDNA_ERR(xdna, "Resume failed, ret %d", ret);
> +		goto put_bos;
> +	}
> +
>  	idx = srcu_read_lock(&client->hwctx_srcu);
>  	hwctx = xa_load(&client->hwctx_xa, hwctx_hdl);
>  	if (!hwctx) {
> @@ -522,6 +530,8 @@ int amdxdna_cmd_submit(struct amdxdna_client *client,
>  	dma_fence_put(job->fence);
>  unlock_srcu:
>  	srcu_read_unlock(&client->hwctx_srcu, idx);
> +	amdxdna_pm_suspend_put(xdna);
> +put_bos:
>  	amdxdna_arg_bos_put(job);
>  cmd_put:
>  	amdxdna_gem_put_obj(job->cmd_bo);
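The reference-ownership rule in this patch can be checked with a small userspace model. The submit path takes one runtime-PM reference before the job is queued. Job cleanup releases it on success, and the submit unwind releases it if a later step fails. The scheduler's run and notify callbacks never touch it. This is a sketch under stated assumptions, not driver code. pm_usage and every model_* function are hypothetical stand-ins for amdxdna_pm_resume_get(), amdxdna_pm_suspend_put(), amdxdna_sched_job_cleanup() and amdxdna_cmd_submit(). The errno values are chosen for illustration only.

```c
#include <assert.h>
#include <errno.h>

/*
 * Userspace model only. pm_usage stands in for the device's runtime-PM
 * usage count; every helper below is a hypothetical stand-in, not driver API.
 */
static int pm_usage;

/* Models amdxdna_pm_resume_get(): takes a reference only on success. */
static int model_resume_get(int resume_fails)
{
	if (resume_fails)
		return -EIO;	/* errno chosen for illustration */
	pm_usage++;
	return 0;
}

/* Models amdxdna_pm_suspend_put(): drops one reference. */
static void model_suspend_put(void)
{
	assert(pm_usage > 0);	/* an unbalanced put would underflow */
	pm_usage--;
}

struct model_job {
	int queued;	/* 1 while the job owns a PM reference */
};

/* Models amdxdna_sched_job_cleanup(): the job's reference is released here. */
static void model_job_cleanup(struct model_job *job)
{
	model_suspend_put();
	job->queued = 0;
}

/*
 * Models the unwind order of amdxdna_cmd_submit() after the patch:
 * resume before the SRCU section, skip the put if resume failed,
 * and drop the reference on any later failure. The modeled run
 * callback would make no PM calls at all.
 */
static int model_cmd_submit(struct model_job *job, int resume_fails,
			    int hwctx_missing)
{
	int ret;

	ret = model_resume_get(resume_fails);
	if (ret)
		goto put_bos;	/* no reference was taken */

	if (hwctx_missing) {
		ret = -EINVAL;	/* errno chosen for illustration */
		goto unlock_srcu;
	}

	job->queued = 1;	/* reference now travels with the queued job */
	return 0;

unlock_srcu:
	model_suspend_put();
put_bos:
	return ret;
}
```

In this model, a successful model_cmd_submit() leaves pm_usage at 1 until model_job_cleanup() drops it back to 0. A failed resume or a missing hardware context returns an error with pm_usage unchanged. The run callback no longer takes a reference, so a job being drained by the suspend path cannot block waiting for that same suspend to finish.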