From: "Ryosuke Yasuoka" <ryasuoka@redhat.com>
To: "Dmitry Osipenko" <dmitry.osipenko@collabora.com>,
"David Airlie" <airlied@redhat.com>,
"Gerd Hoffmann" <kraxel@redhat.com>,
"Gurchetan Singh" <gurchetansingh@chromium.org>,
"Chia-I Wu" <olvaffe@gmail.com>,
"Maarten Lankhorst" <maarten.lankhorst@linux.intel.com>,
"Maxime Ripard" <mripard@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"Simona Vetter" <simona@ffwll.ch>
Cc: <dri-devel@lists.freedesktop.org>,
<virtualization@lists.linux.dev>, <linux-kernel@vger.kernel.org>,
<syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224@syzkaller.appspotmail.com>,
<syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e@syzkaller.appspotmail.com>
Subject: Re: [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task
Date: Fri, 22 May 2026 08:51:28 +0000 [thread overview]
Message-ID: <18b1d7267ace8b76.329809b8dd1e13b3.4134c0f069791aa6@ryasuoka-thinkpadx1carbongen9.tokyo.csb> (raw)
In-Reply-To: <4bc3709f-21bc-49f5-8fcf-972dec1397f7@collabora.com>
Hi Dmitry,
Thank you for your review and the comment.
On 21/05/2026 22:01, Dmitry Osipenko wrote:
> 21.05.2026 05:19, Ryosuke Yasuoka пишет:
>> virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() use
>> wait_event() without any abort condition when waiting for virtqueue
>> space. If the host device stops processing commands, these waits block
>> indefinitely inside a drm_dev_enter/exit() critical section. Since
>> drm_dev_unplug(), which is called in device removal and system shutdown
>> call path, blocks on synchronize_srcu() until all critical sections
>> complete, device removal and system shutdown also hang.
>>
>> Add a vqs_released flag to virtio_gpu_device and include it in the
>> wait_event() condition. Set the flag and wake up both queues in a new
>> virtio_gpu_release_vqs() helper, called before drm_dev_unplug() in both
>> virtio_gpu_remove() and virtio_gpu_shutdown(). When the flag is set, the
>> wait returns immediately and the command is aborted, following the same
>> cleanup path as drm_dev_enter() failure.
>>
>> Reported-by: syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?id=d6dd6f86d3aaf7eebe7406e45c1c6e549453f224
>> Reported-by: syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?id=908bd910da5dd79b88de4cf7baf376cc873a922e
>> Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
>> ---
>> Changes in v2:
>> - Update the commit message.
>> - Replace wait_event_timeout() with wait_event() using a compound
>> condition that includes a new vqs_released flag.
>> - Add virtio_gpu_release_vqs() helper to set the flag and wake up
>> both queues, called before drm_dev_unplug() in remove and shutdown
>> paths.
>> - Remove the hardcoded 5-second timeout. Recovery is now driven by
>> the driver flag instead of an arbitrary timeout value.
>> ---
>> drivers/gpu/drm/virtio/virtgpu_drv.c | 15 +++++++++++++++
>> drivers/gpu/drm/virtio/virtgpu_drv.h | 1 +
>> drivers/gpu/drm/virtio/virtgpu_vq.c | 23 +++++++++++++++++++++--
>> 3 files changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
>> index a5ce96fb8a1d..e4fe5e0780f9 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_drv.c
>> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
>> @@ -119,10 +119,24 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
>> return ret;
>> }
>>
>> +/*
>> + * Release pending virtqueue waits so the drm_dev_enter/exit() critical
>> + * sections complete before drm_dev_unplug() blocks on synchronize_srcu().
>> + */
>> +static void virtio_gpu_release_vqs(struct drm_device *dev)
>> +{
>> + struct virtio_gpu_device *vgdev = dev->dev_private;
>> +
>> + vgdev->vqs_released = true;
>> + wake_up_all(&vgdev->ctrlq.ack_queue);
>> + wake_up_all(&vgdev->cursorq.ack_queue);
>> +}
>> +
>> static void virtio_gpu_remove(struct virtio_device *vdev)
>> {
>> struct drm_device *dev = vdev->priv;
>>
>> + virtio_gpu_release_vqs(dev);
>> drm_dev_unplug(dev);
>> drm_atomic_helper_shutdown(dev);
>> virtio_gpu_deinit(dev);
>> @@ -133,6 +147,7 @@ static void virtio_gpu_shutdown(struct virtio_device *vdev)
>> {
>> struct drm_device *dev = vdev->priv;
>>
>> + virtio_gpu_release_vqs(dev);
>> /* stop talking to the device */
>> drm_dev_unplug(dev);
>> }
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> index f17660a71a3e..0bd69a40857e 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
>> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> @@ -235,6 +235,7 @@ struct virtio_gpu_device {
>>
>> struct virtio_gpu_queue ctrlq;
>> struct virtio_gpu_queue cursorq;
>> + bool vqs_released;
>> struct kmem_cache *vbufs;
>>
>> atomic_t pending_commands;
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
>> index 67865810a2e7..8057a9b7356d 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
>> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
>> @@ -396,7 +396,19 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev,
>> if (vq->num_free < elemcnt) {
>> spin_unlock(&vgdev->ctrlq.qlock);
>> virtio_gpu_notify(vgdev);
>> - wait_event(vgdev->ctrlq.ack_queue, vq->num_free >= elemcnt);
>> + wait_event(vgdev->ctrlq.ack_queue,
>> + vq->num_free >= elemcnt || vgdev->vqs_released);
>> + /*
>> + * Set by virtio_gpu_release_vqs() to unblock
>> + * synchronize_srcu() wait in drm_dev_unplug().
>> + */
>> + if (vgdev->vqs_released) {
>> + if (fence && vbuf->objs)
>> + virtio_gpu_array_unlock_resv(vbuf->objs);
>> + free_vbuf(vgdev, vbuf);
>> + drm_dev_exit(idx);
>> + return -ENODEV;
>> + }
>> goto again;
>> }
>>
>> @@ -566,7 +578,14 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
>> ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
>> if (ret == -ENOSPC) {
>> spin_unlock(&vgdev->cursorq.qlock);
>> - wait_event(vgdev->cursorq.ack_queue, vq->num_free >= outcnt);
>> + wait_event(vgdev->cursorq.ack_queue,
>> + vq->num_free >= outcnt || vgdev->vqs_released);
>> + /* See comment in virtio_gpu_queue_ctrl_sgs(). */
>> + if (vgdev->vqs_released) {
>> + free_vbuf(vgdev, vbuf);
>> + drm_dev_exit(idx);
>> + return;
>> + }
>> spin_lock(&vgdev->cursorq.qlock);
>> goto retry;
>> } else {
>
> What about other wait_event in the driver? Why only these?
There are other wait_event() calls on vgdev->resp_wq in virtgpu_prime.c
and virtgpu_vram.c. These can also cause a stuck process when the host
device stops or does not respond and should be fixed. However, they are
not related to the issue being fixed here. I think they should be
addressed in a separate commit.
The wait_event(vgdev->resp_wq) calls wait for a host response, not for
virtqueue ring space. They are not inside a drm_dev_enter/exit()
critical section, so they don't block drm_dev_unplug() ->
synchronize_srcu() and are not part of the deadlock reported by syzbot.
However, if a process is stuck in wait_event(vgdev->resp_wq), there is
no way to recover other than host device recovery itself. We can still
remove the device and destroy the virtqueues while the process is stuck,
so the host response can never arrive. IIUC, the stuck thread holds a
drm_device reference, preventing vgdev from being freed; this is a
resource leak.
This can be fixed by adding a wake_up_all() and a vqs_released check
similar to what is done for ctrlq/cursorq. However, I think this should
be a separate commit since the root cause and problem are different.
Alternatively, the resp_wq wait_event() calls could be converted to
wait_event_interruptible() to fix the issue. But as you mentioned
earlier in the v1 comment, we need the wait_event_interruptible()
rework.
What do you think? Should I include the resp_wq fix as a separate commit
in this patch series, or leave it for the _interruptible() rework?
Best regards,
Ryosuke
next prev parent reply other threads:[~2026-05-22 8:51 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-21 2:19 [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task Ryosuke Yasuoka
2026-05-21 2:19 ` syzbot
2026-05-21 2:19 ` syzbot
2026-05-21 19:01 ` Dmitry Osipenko
2026-05-21 19:02 ` syzbot
2026-05-21 19:02 ` syzbot
2026-05-22 8:51 ` Ryosuke Yasuoka [this message]
2026-05-22 8:51 ` syzbot
2026-05-22 8:51 ` syzbot
2026-05-25 11:05 ` Claude review: " Claude Code Review Bot
2026-05-25 11:05 ` Claude Code Review Bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=18b1d7267ace8b76.329809b8dd1e13b3.4134c0f069791aa6@ryasuoka-thinkpadx1carbongen9.tokyo.csb \
--to=ryasuoka@redhat.com \
--cc=airlied@redhat.com \
--cc=dmitry.osipenko@collabora.com \
--cc=dri-devel@lists.freedesktop.org \
--cc=gurchetansingh@chromium.org \
--cc=kraxel@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maarten.lankhorst@linux.intel.com \
--cc=mripard@kernel.org \
--cc=olvaffe@gmail.com \
--cc=simona@ffwll.ch \
--cc=syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e@syzkaller.appspotmail.com \
--cc=syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224@syzkaller.appspotmail.com \
--cc=tzimmermann@suse.de \
--cc=virtualization@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox