public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task
@ 2026-05-21  2:19 Ryosuke Yasuoka
  2026-05-21  2:19 ` syzbot
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Ryosuke Yasuoka @ 2026-05-21  2:19 UTC (permalink / raw)
  To: David Airlie, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter
  Cc: dri-devel, virtualization, linux-kernel,
	syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224,
	syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e, Ryosuke Yasuoka

virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() use
wait_event() without any abort condition when waiting for virtqueue
space. If the host device stops processing commands, these waits block
indefinitely inside a drm_dev_enter/exit() critical section. Since
drm_dev_unplug(), which is called in device removal and system shutdown
call path, blocks on synchronize_srcu() until all critical sections
complete, device removal and system shutdown also hang.

Add a vqs_released flag to virtio_gpu_device and include it in the
wait_event() condition. Set the flag and wake up both queues in a new
virtio_gpu_release_vqs() helper, called before drm_dev_unplug() in both
virtio_gpu_remove() and virtio_gpu_shutdown(). When the flag is set, the
wait returns immediately and the command is aborted, following the same
cleanup path as drm_dev_enter() failure.

Reported-by: syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=d6dd6f86d3aaf7eebe7406e45c1c6e549453f224
Reported-by: syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?id=908bd910da5dd79b88de4cf7baf376cc873a922e
Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
---
Changes in v2:
- Update the commit message.
- Replace wait_event_timeout() with wait_event() using a compound
condition that includes a new vqs_released flag.
- Add virtio_gpu_release_vqs() helper to set the flag and wake up
both queues, called before drm_dev_unplug() in remove and shutdown
paths.
- Remove the hardcoded 5-second timeout. Recovery is now driven by
the driver flag instead of an arbitrary timeout value.
---
 drivers/gpu/drm/virtio/virtgpu_drv.c | 15 +++++++++++++++
 drivers/gpu/drm/virtio/virtgpu_drv.h |  1 +
 drivers/gpu/drm/virtio/virtgpu_vq.c  | 23 +++++++++++++++++++++--
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
index a5ce96fb8a1d..e4fe5e0780f9 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -119,10 +119,24 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
 	return ret;
 }
 
+/*
+ * Release pending virtqueue waits so the drm_dev_enter/exit() critical
+ * sections complete before drm_dev_unplug() blocks on synchronize_srcu().
+ */
+static void virtio_gpu_release_vqs(struct drm_device *dev)
+{
+	struct virtio_gpu_device *vgdev = dev->dev_private;
+
+	vgdev->vqs_released = true;
+	wake_up_all(&vgdev->ctrlq.ack_queue);
+	wake_up_all(&vgdev->cursorq.ack_queue);
+}
+
 static void virtio_gpu_remove(struct virtio_device *vdev)
 {
 	struct drm_device *dev = vdev->priv;
 
+	virtio_gpu_release_vqs(dev);
 	drm_dev_unplug(dev);
 	drm_atomic_helper_shutdown(dev);
 	virtio_gpu_deinit(dev);
@@ -133,6 +147,7 @@ static void virtio_gpu_shutdown(struct virtio_device *vdev)
 {
 	struct drm_device *dev = vdev->priv;
 
+	virtio_gpu_release_vqs(dev);
 	/* stop talking to the device */
 	drm_dev_unplug(dev);
 }
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index f17660a71a3e..0bd69a40857e 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -235,6 +235,7 @@ struct virtio_gpu_device {
 
 	struct virtio_gpu_queue ctrlq;
 	struct virtio_gpu_queue cursorq;
+	bool vqs_released;
 	struct kmem_cache *vbufs;
 
 	atomic_t pending_commands;
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 67865810a2e7..8057a9b7356d 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -396,7 +396,19 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev,
 	if (vq->num_free < elemcnt) {
 		spin_unlock(&vgdev->ctrlq.qlock);
 		virtio_gpu_notify(vgdev);
-		wait_event(vgdev->ctrlq.ack_queue, vq->num_free >= elemcnt);
+		wait_event(vgdev->ctrlq.ack_queue,
+			   vq->num_free >= elemcnt || vgdev->vqs_released);
+		/*
+		 * Set by virtio_gpu_release_vqs() to unblock
+		 * synchronize_srcu() wait in drm_dev_unplug().
+		 */
+		if (vgdev->vqs_released) {
+			if (fence && vbuf->objs)
+				virtio_gpu_array_unlock_resv(vbuf->objs);
+			free_vbuf(vgdev, vbuf);
+			drm_dev_exit(idx);
+			return -ENODEV;
+		}
 		goto again;
 	}
 
@@ -566,7 +578,14 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
 	ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
 	if (ret == -ENOSPC) {
 		spin_unlock(&vgdev->cursorq.qlock);
-		wait_event(vgdev->cursorq.ack_queue, vq->num_free >= outcnt);
+		wait_event(vgdev->cursorq.ack_queue,
+			   vq->num_free >= outcnt || vgdev->vqs_released);
+		/* See comment in virtio_gpu_queue_ctrl_sgs(). */
+		if (vgdev->vqs_released) {
+			free_vbuf(vgdev, vbuf);
+			drm_dev_exit(idx);
+			return;
+		}
 		spin_lock(&vgdev->cursorq.qlock);
 		goto retry;
 	} else {

---
base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
change-id: 20260518-virtio-gpu_wait_event-5aa060754f12

Best regards,
-- 
Ryosuke Yasuoka <ryasuoka@redhat.com>


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task
  2026-05-21  2:19 [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task Ryosuke Yasuoka
@ 2026-05-21  2:19 ` syzbot
  2026-05-21  2:19 ` syzbot
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: syzbot @ 2026-05-21  2:19 UTC (permalink / raw)
  To: ryasuoka
  Cc: airlied, dmitry.osipenko, dri-devel, gurchetansingh, kraxel,
	linux-kernel, maarten.lankhorst, mripard, olvaffe, ryasuoka,
	simona, tzimmermann, virtualization

> virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() use
> wait_event() without any abort condition when waiting for virtqueue
> space. If the host device stops processing commands, these waits block
> indefinitely inside a drm_dev_enter/exit() critical section. Since
> drm_dev_unplug(), which is called in device removal and system shutdown
> call path, blocks on synchronize_srcu() until all critical sections
> complete, device removal and system shutdown also hang.
>
> Add a vqs_released flag to virtio_gpu_device and include it in the
> wait_event() condition. Set the flag and wake up both queues in a new
> virtio_gpu_release_vqs() helper, called before drm_dev_unplug() in both
> virtio_gpu_remove() and virtio_gpu_shutdown(). When the flag is set, the
> wait returns immediately and the command is aborted, following the same
> cleanup path as drm_dev_enter() failure.
>
> Reported-by: syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?id=d6dd6f86d3aaf7eebe7406e45c1c6e549453f224
> Reported-by: syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?id=908bd910da5dd79b88de4cf7baf376cc873a922e
> Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
> ---
> Changes in v2:
> - Update the commit message.
> - Replace wait_event_timeout() with wait_event() using a compound
> condition that includes a new vqs_released flag.
> - Add virtio_gpu_release_vqs() helper to set the flag and wake up
> both queues, called before drm_dev_unplug() in remove and shutdown
> paths.
> - Remove the hardcoded 5-second timeout. Recovery is now driven by
> the driver flag instead of an arbitrary timeout value.
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.c | 15 +++++++++++++++
>  drivers/gpu/drm/virtio/virtgpu_drv.h |  1 +
>  drivers/gpu/drm/virtio/virtgpu_vq.c  | 23 +++++++++++++++++++++--
>  3 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
> index a5ce96fb8a1d..e4fe5e0780f9 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
> @@ -119,10 +119,24 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
>  	return ret;
>  }
>  
> +/*
> + * Release pending virtqueue waits so the drm_dev_enter/exit() critical
> + * sections complete before drm_dev_unplug() blocks on synchronize_srcu().
> + */
> +static void virtio_gpu_release_vqs(struct drm_device *dev)
> +{
> +	struct virtio_gpu_device *vgdev = dev->dev_private;
> +
> +	vgdev->vqs_released = true;
> +	wake_up_all(&vgdev->ctrlq.ack_queue);
> +	wake_up_all(&vgdev->cursorq.ack_queue);
> +}
> +
>  static void virtio_gpu_remove(struct virtio_device *vdev)
>  {
>  	struct drm_device *dev = vdev->priv;
>  
> +	virtio_gpu_release_vqs(dev);
>  	drm_dev_unplug(dev);
>  	drm_atomic_helper_shutdown(dev);
>  	virtio_gpu_deinit(dev);
> @@ -133,6 +147,7 @@ static void virtio_gpu_shutdown(struct virtio_device *vdev)
>  {
>  	struct drm_device *dev = vdev->priv;
>  
> +	virtio_gpu_release_vqs(dev);
>  	/* stop talking to the device */
>  	drm_dev_unplug(dev);
>  }
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index f17660a71a3e..0bd69a40857e 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -235,6 +235,7 @@ struct virtio_gpu_device {
>  
>  	struct virtio_gpu_queue ctrlq;
>  	struct virtio_gpu_queue cursorq;
> +	bool vqs_released;
>  	struct kmem_cache *vbufs;
>  
>  	atomic_t pending_commands;
> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
> index 67865810a2e7..8057a9b7356d 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
> @@ -396,7 +396,19 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev,
>  	if (vq->num_free < elemcnt) {
>  		spin_unlock(&vgdev->ctrlq.qlock);
>  		virtio_gpu_notify(vgdev);
> -		wait_event(vgdev->ctrlq.ack_queue, vq->num_free >= elemcnt);
> +		wait_event(vgdev->ctrlq.ack_queue,
> +			   vq->num_free >= elemcnt || vgdev->vqs_released);
> +		/*
> +		 * Set by virtio_gpu_release_vqs() to unblock
> +		 * synchronize_srcu() wait in drm_dev_unplug().
> +		 */
> +		if (vgdev->vqs_released) {
> +			if (fence && vbuf->objs)
> +				virtio_gpu_array_unlock_resv(vbuf->objs);
> +			free_vbuf(vgdev, vbuf);
> +			drm_dev_exit(idx);
> +			return -ENODEV;
> +		}
>  		goto again;
>  	}
>  
> @@ -566,7 +578,14 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
>  	ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
>  	if (ret == -ENOSPC) {
>  		spin_unlock(&vgdev->cursorq.qlock);
> -		wait_event(vgdev->cursorq.ack_queue, vq->num_free >= outcnt);
> +		wait_event(vgdev->cursorq.ack_queue,
> +			   vq->num_free >= outcnt || vgdev->vqs_released);
> +		/* See comment in virtio_gpu_queue_ctrl_sgs(). */
> +		if (vgdev->vqs_released) {
> +			free_vbuf(vgdev, vbuf);
> +			drm_dev_exit(idx);
> +			return;
> +		}
>  		spin_lock(&vgdev->cursorq.qlock);
>  		goto retry;
>  	} else {
>
> ---
> base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
> change-id: 20260518-virtio-gpu_wait_event-5aa060754f12
>
> Best regards,
> -- 
> Ryosuke Yasuoka <ryasuoka@redhat.com>
>

I see the command but can't find the corresponding bug.
The email is sent to  syzbot+HASH@syzkaller.appspotmail.com address
but the HASH does not correspond to any known bug.
Please double check the address.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task
  2026-05-21  2:19 [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task Ryosuke Yasuoka
  2026-05-21  2:19 ` syzbot
@ 2026-05-21  2:19 ` syzbot
  2026-05-21 19:01 ` Dmitry Osipenko
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: syzbot @ 2026-05-21  2:19 UTC (permalink / raw)
  To: ryasuoka
  Cc: airlied, dmitry.osipenko, dri-devel, gurchetansingh, kraxel,
	linux-kernel, maarten.lankhorst, mripard, olvaffe, ryasuoka,
	simona, tzimmermann, virtualization

> virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() use
> wait_event() without any abort condition when waiting for virtqueue
> space. If the host device stops processing commands, these waits block
> indefinitely inside a drm_dev_enter/exit() critical section. Since
> drm_dev_unplug(), which is called in device removal and system shutdown
> call path, blocks on synchronize_srcu() until all critical sections
> complete, device removal and system shutdown also hang.
>
> Add a vqs_released flag to virtio_gpu_device and include it in the
> wait_event() condition. Set the flag and wake up both queues in a new
> virtio_gpu_release_vqs() helper, called before drm_dev_unplug() in both
> virtio_gpu_remove() and virtio_gpu_shutdown(). When the flag is set, the
> wait returns immediately and the command is aborted, following the same
> cleanup path as drm_dev_enter() failure.
>
> Reported-by: syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?id=d6dd6f86d3aaf7eebe7406e45c1c6e549453f224
> Reported-by: syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?id=908bd910da5dd79b88de4cf7baf376cc873a922e
> Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
> ---
> Changes in v2:
> - Update the commit message.
> - Replace wait_event_timeout() with wait_event() using a compound
> condition that includes a new vqs_released flag.
> - Add virtio_gpu_release_vqs() helper to set the flag and wake up
> both queues, called before drm_dev_unplug() in remove and shutdown
> paths.
> - Remove the hardcoded 5-second timeout. Recovery is now driven by
> the driver flag instead of an arbitrary timeout value.
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.c | 15 +++++++++++++++
>  drivers/gpu/drm/virtio/virtgpu_drv.h |  1 +
>  drivers/gpu/drm/virtio/virtgpu_vq.c  | 23 +++++++++++++++++++++--
>  3 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
> index a5ce96fb8a1d..e4fe5e0780f9 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
> @@ -119,10 +119,24 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
>  	return ret;
>  }
>  
> +/*
> + * Release pending virtqueue waits so the drm_dev_enter/exit() critical
> + * sections complete before drm_dev_unplug() blocks on synchronize_srcu().
> + */
> +static void virtio_gpu_release_vqs(struct drm_device *dev)
> +{
> +	struct virtio_gpu_device *vgdev = dev->dev_private;
> +
> +	vgdev->vqs_released = true;
> +	wake_up_all(&vgdev->ctrlq.ack_queue);
> +	wake_up_all(&vgdev->cursorq.ack_queue);
> +}
> +
>  static void virtio_gpu_remove(struct virtio_device *vdev)
>  {
>  	struct drm_device *dev = vdev->priv;
>  
> +	virtio_gpu_release_vqs(dev);
>  	drm_dev_unplug(dev);
>  	drm_atomic_helper_shutdown(dev);
>  	virtio_gpu_deinit(dev);
> @@ -133,6 +147,7 @@ static void virtio_gpu_shutdown(struct virtio_device *vdev)
>  {
>  	struct drm_device *dev = vdev->priv;
>  
> +	virtio_gpu_release_vqs(dev);
>  	/* stop talking to the device */
>  	drm_dev_unplug(dev);
>  }
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index f17660a71a3e..0bd69a40857e 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -235,6 +235,7 @@ struct virtio_gpu_device {
>  
>  	struct virtio_gpu_queue ctrlq;
>  	struct virtio_gpu_queue cursorq;
> +	bool vqs_released;
>  	struct kmem_cache *vbufs;
>  
>  	atomic_t pending_commands;
> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
> index 67865810a2e7..8057a9b7356d 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
> @@ -396,7 +396,19 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev,
>  	if (vq->num_free < elemcnt) {
>  		spin_unlock(&vgdev->ctrlq.qlock);
>  		virtio_gpu_notify(vgdev);
> -		wait_event(vgdev->ctrlq.ack_queue, vq->num_free >= elemcnt);
> +		wait_event(vgdev->ctrlq.ack_queue,
> +			   vq->num_free >= elemcnt || vgdev->vqs_released);
> +		/*
> +		 * Set by virtio_gpu_release_vqs() to unblock
> +		 * synchronize_srcu() wait in drm_dev_unplug().
> +		 */
> +		if (vgdev->vqs_released) {
> +			if (fence && vbuf->objs)
> +				virtio_gpu_array_unlock_resv(vbuf->objs);
> +			free_vbuf(vgdev, vbuf);
> +			drm_dev_exit(idx);
> +			return -ENODEV;
> +		}
>  		goto again;
>  	}
>  
> @@ -566,7 +578,14 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
>  	ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
>  	if (ret == -ENOSPC) {
>  		spin_unlock(&vgdev->cursorq.qlock);
> -		wait_event(vgdev->cursorq.ack_queue, vq->num_free >= outcnt);
> +		wait_event(vgdev->cursorq.ack_queue,
> +			   vq->num_free >= outcnt || vgdev->vqs_released);
> +		/* See comment in virtio_gpu_queue_ctrl_sgs(). */
> +		if (vgdev->vqs_released) {
> +			free_vbuf(vgdev, vbuf);
> +			drm_dev_exit(idx);
> +			return;
> +		}
>  		spin_lock(&vgdev->cursorq.qlock);
>  		goto retry;
>  	} else {
>
> ---
> base-commit: 5200f5f493f79f14bbdc349e402a40dfb32f23c8
> change-id: 20260518-virtio-gpu_wait_event-5aa060754f12
>
> Best regards,
> -- 
> Ryosuke Yasuoka <ryasuoka@redhat.com>
>

I see the command but can't find the corresponding bug.
The email is sent to  syzbot+HASH@syzkaller.appspotmail.com address
but the HASH does not correspond to any known bug.
Please double check the address.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task
  2026-05-21  2:19 [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task Ryosuke Yasuoka
  2026-05-21  2:19 ` syzbot
  2026-05-21  2:19 ` syzbot
@ 2026-05-21 19:01 ` Dmitry Osipenko
  2026-05-21 19:02   ` syzbot
                     ` (2 more replies)
  2026-05-25 11:05 ` Claude review: " Claude Code Review Bot
  2026-05-25 11:05 ` Claude Code Review Bot
  4 siblings, 3 replies; 11+ messages in thread
From: Dmitry Osipenko @ 2026-05-21 19:01 UTC (permalink / raw)
  To: Ryosuke Yasuoka, David Airlie, Gerd Hoffmann, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter
  Cc: dri-devel, virtualization, linux-kernel,
	syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224,
	syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e

21.05.2026 05:19, Ryosuke Yasuoka пишет:
> virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() use
> wait_event() without any abort condition when waiting for virtqueue
> space. If the host device stops processing commands, these waits block
> indefinitely inside a drm_dev_enter/exit() critical section. Since
> drm_dev_unplug(), which is called in device removal and system shutdown
> call path, blocks on synchronize_srcu() until all critical sections
> complete, device removal and system shutdown also hang.
> 
> Add a vqs_released flag to virtio_gpu_device and include it in the
> wait_event() condition. Set the flag and wake up both queues in a new
> virtio_gpu_release_vqs() helper, called before drm_dev_unplug() in both
> virtio_gpu_remove() and virtio_gpu_shutdown(). When the flag is set, the
> wait returns immediately and the command is aborted, following the same
> cleanup path as drm_dev_enter() failure.
> 
> Reported-by: syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?id=d6dd6f86d3aaf7eebe7406e45c1c6e549453f224
> Reported-by: syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e@syzkaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?id=908bd910da5dd79b88de4cf7baf376cc873a922e
> Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
> ---
> Changes in v2:
> - Update the commit message.
> - Replace wait_event_timeout() with wait_event() using a compound
> condition that includes a new vqs_released flag.
> - Add virtio_gpu_release_vqs() helper to set the flag and wake up
> both queues, called before drm_dev_unplug() in remove and shutdown
> paths.
> - Remove the hardcoded 5-second timeout. Recovery is now driven by
> the driver flag instead of an arbitrary timeout value.
> ---
>  drivers/gpu/drm/virtio/virtgpu_drv.c | 15 +++++++++++++++
>  drivers/gpu/drm/virtio/virtgpu_drv.h |  1 +
>  drivers/gpu/drm/virtio/virtgpu_vq.c  | 23 +++++++++++++++++++++--
>  3 files changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
> index a5ce96fb8a1d..e4fe5e0780f9 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
> @@ -119,10 +119,24 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
>  	return ret;
>  }
>  
> +/*
> + * Release pending virtqueue waits so the drm_dev_enter/exit() critical
> + * sections complete before drm_dev_unplug() blocks on synchronize_srcu().
> + */
> +static void virtio_gpu_release_vqs(struct drm_device *dev)
> +{
> +	struct virtio_gpu_device *vgdev = dev->dev_private;
> +
> +	vgdev->vqs_released = true;
> +	wake_up_all(&vgdev->ctrlq.ack_queue);
> +	wake_up_all(&vgdev->cursorq.ack_queue);
> +}
> +
>  static void virtio_gpu_remove(struct virtio_device *vdev)
>  {
>  	struct drm_device *dev = vdev->priv;
>  
> +	virtio_gpu_release_vqs(dev);
>  	drm_dev_unplug(dev);
>  	drm_atomic_helper_shutdown(dev);
>  	virtio_gpu_deinit(dev);
> @@ -133,6 +147,7 @@ static void virtio_gpu_shutdown(struct virtio_device *vdev)
>  {
>  	struct drm_device *dev = vdev->priv;
>  
> +	virtio_gpu_release_vqs(dev);
>  	/* stop talking to the device */
>  	drm_dev_unplug(dev);
>  }
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index f17660a71a3e..0bd69a40857e 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -235,6 +235,7 @@ struct virtio_gpu_device {
>  
>  	struct virtio_gpu_queue ctrlq;
>  	struct virtio_gpu_queue cursorq;
> +	bool vqs_released;
>  	struct kmem_cache *vbufs;
>  
>  	atomic_t pending_commands;
> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
> index 67865810a2e7..8057a9b7356d 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
> @@ -396,7 +396,19 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev,
>  	if (vq->num_free < elemcnt) {
>  		spin_unlock(&vgdev->ctrlq.qlock);
>  		virtio_gpu_notify(vgdev);
> -		wait_event(vgdev->ctrlq.ack_queue, vq->num_free >= elemcnt);
> +		wait_event(vgdev->ctrlq.ack_queue,
> +			   vq->num_free >= elemcnt || vgdev->vqs_released);
> +		/*
> +		 * Set by virtio_gpu_release_vqs() to unblock
> +		 * synchronize_srcu() wait in drm_dev_unplug().
> +		 */
> +		if (vgdev->vqs_released) {
> +			if (fence && vbuf->objs)
> +				virtio_gpu_array_unlock_resv(vbuf->objs);
> +			free_vbuf(vgdev, vbuf);
> +			drm_dev_exit(idx);
> +			return -ENODEV;
> +		}
>  		goto again;
>  	}
>  
> @@ -566,7 +578,14 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
>  	ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
>  	if (ret == -ENOSPC) {
>  		spin_unlock(&vgdev->cursorq.qlock);
> -		wait_event(vgdev->cursorq.ack_queue, vq->num_free >= outcnt);
> +		wait_event(vgdev->cursorq.ack_queue,
> +			   vq->num_free >= outcnt || vgdev->vqs_released);
> +		/* See comment in virtio_gpu_queue_ctrl_sgs(). */
> +		if (vgdev->vqs_released) {
> +			free_vbuf(vgdev, vbuf);
> +			drm_dev_exit(idx);
> +			return;
> +		}
>  		spin_lock(&vgdev->cursorq.qlock);
>  		goto retry;
>  	} else {

What about other wait_event in the driver? Why only these?

-- 
Best regards,
Dmitry

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task
  2026-05-21 19:01 ` Dmitry Osipenko
  2026-05-21 19:02   ` syzbot
@ 2026-05-21 19:02   ` syzbot
  2026-05-22  8:51   ` Ryosuke Yasuoka
  2 siblings, 0 replies; 11+ messages in thread
From: syzbot @ 2026-05-21 19:02 UTC (permalink / raw)
  To: dmitry.osipenko
  Cc: airlied, dmitry.osipenko, dri-devel, gurchetansingh, kraxel,
	linux-kernel, maarten.lankhorst, mripard, olvaffe, ryasuoka,
	simona, tzimmermann, virtualization

> 21.05.2026 05:19, Ryosuke Yasuoka пишет:
>> virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() use
>> wait_event() without any abort condition when waiting for virtqueue
>> space. If the host device stops processing commands, these waits block
>> indefinitely inside a drm_dev_enter/exit() critical section. Since
>> drm_dev_unplug(), which is called in device removal and system shutdown
>> call path, blocks on synchronize_srcu() until all critical sections
>> complete, device removal and system shutdown also hang.
>> 
>> Add a vqs_released flag to virtio_gpu_device and include it in the
>> wait_event() condition. Set the flag and wake up both queues in a new
>> virtio_gpu_release_vqs() helper, called before drm_dev_unplug() in both
>> virtio_gpu_remove() and virtio_gpu_shutdown(). When the flag is set, the
>> wait returns immediately and the command is aborted, following the same
>> cleanup path as drm_dev_enter() failure.
>> 
>> Reported-by: syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?id=d6dd6f86d3aaf7eebe7406e45c1c6e549453f224
>> Reported-by: syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?id=908bd910da5dd79b88de4cf7baf376cc873a922e
>> Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
>> ---
>> Changes in v2:
>> - Update the commit message.
>> - Replace wait_event_timeout() with wait_event() using a compound
>> condition that includes a new vqs_released flag.
>> - Add virtio_gpu_release_vqs() helper to set the flag and wake up
>> both queues, called before drm_dev_unplug() in remove and shutdown
>> paths.
>> - Remove the hardcoded 5-second timeout. Recovery is now driven by
>> the driver flag instead of an arbitrary timeout value.
>> ---
>>  drivers/gpu/drm/virtio/virtgpu_drv.c | 15 +++++++++++++++
>>  drivers/gpu/drm/virtio/virtgpu_drv.h |  1 +
>>  drivers/gpu/drm/virtio/virtgpu_vq.c  | 23 +++++++++++++++++++++--
>>  3 files changed, 37 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
>> index a5ce96fb8a1d..e4fe5e0780f9 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_drv.c
>> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
>> @@ -119,10 +119,24 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
>>  	return ret;
>>  }
>>  
>> +/*
>> + * Release pending virtqueue waits so the drm_dev_enter/exit() critical
>> + * sections complete before drm_dev_unplug() blocks on synchronize_srcu().
>> + */
>> +static void virtio_gpu_release_vqs(struct drm_device *dev)
>> +{
>> +	struct virtio_gpu_device *vgdev = dev->dev_private;
>> +
>> +	vgdev->vqs_released = true;
>> +	wake_up_all(&vgdev->ctrlq.ack_queue);
>> +	wake_up_all(&vgdev->cursorq.ack_queue);
>> +}
>> +
>>  static void virtio_gpu_remove(struct virtio_device *vdev)
>>  {
>>  	struct drm_device *dev = vdev->priv;
>>  
>> +	virtio_gpu_release_vqs(dev);
>>  	drm_dev_unplug(dev);
>>  	drm_atomic_helper_shutdown(dev);
>>  	virtio_gpu_deinit(dev);
>> @@ -133,6 +147,7 @@ static void virtio_gpu_shutdown(struct virtio_device *vdev)
>>  {
>>  	struct drm_device *dev = vdev->priv;
>>  
>> +	virtio_gpu_release_vqs(dev);
>>  	/* stop talking to the device */
>>  	drm_dev_unplug(dev);
>>  }
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> index f17660a71a3e..0bd69a40857e 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
>> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> @@ -235,6 +235,7 @@ struct virtio_gpu_device {
>>  
>>  	struct virtio_gpu_queue ctrlq;
>>  	struct virtio_gpu_queue cursorq;
>> +	bool vqs_released;
>>  	struct kmem_cache *vbufs;
>>  
>>  	atomic_t pending_commands;
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
>> index 67865810a2e7..8057a9b7356d 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
>> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
>> @@ -396,7 +396,19 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev,
>>  	if (vq->num_free < elemcnt) {
>>  		spin_unlock(&vgdev->ctrlq.qlock);
>>  		virtio_gpu_notify(vgdev);
>> -		wait_event(vgdev->ctrlq.ack_queue, vq->num_free >= elemcnt);
>> +		wait_event(vgdev->ctrlq.ack_queue,
>> +			   vq->num_free >= elemcnt || vgdev->vqs_released);
>> +		/*
>> +		 * Set by virtio_gpu_release_vqs() to unblock
>> +		 * synchronize_srcu() wait in drm_dev_unplug().
>> +		 */
>> +		if (vgdev->vqs_released) {
>> +			if (fence && vbuf->objs)
>> +				virtio_gpu_array_unlock_resv(vbuf->objs);
>> +			free_vbuf(vgdev, vbuf);
>> +			drm_dev_exit(idx);
>> +			return -ENODEV;
>> +		}
>>  		goto again;
>>  	}
>>  
>> @@ -566,7 +578,14 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
>>  	ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
>>  	if (ret == -ENOSPC) {
>>  		spin_unlock(&vgdev->cursorq.qlock);
>> -		wait_event(vgdev->cursorq.ack_queue, vq->num_free >= outcnt);
>> +		wait_event(vgdev->cursorq.ack_queue,
>> +			   vq->num_free >= outcnt || vgdev->vqs_released);
>> +		/* See comment in virtio_gpu_queue_ctrl_sgs(). */
>> +		if (vgdev->vqs_released) {
>> +			free_vbuf(vgdev, vbuf);
>> +			drm_dev_exit(idx);
>> +			return;
>> +		}
>>  		spin_lock(&vgdev->cursorq.qlock);
>>  		goto retry;
>>  	} else {
>
> What about other wait_event in the driver? Why only these?
>
> -- 
> Best regards,
> Dmitry

I see the command but can't find the corresponding bug.
The email is sent to  syzbot+HASH@syzkaller.appspotmail.com address
but the HASH does not correspond to any known bug.
Please double check the address.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task
  2026-05-21 19:01 ` Dmitry Osipenko
@ 2026-05-21 19:02   ` syzbot
  2026-05-21 19:02   ` syzbot
  2026-05-22  8:51   ` Ryosuke Yasuoka
  2 siblings, 0 replies; 11+ messages in thread
From: syzbot @ 2026-05-21 19:02 UTC (permalink / raw)
  To: dmitry.osipenko
  Cc: airlied, dmitry.osipenko, dri-devel, gurchetansingh, kraxel,
	linux-kernel, maarten.lankhorst, mripard, olvaffe, ryasuoka,
	simona, tzimmermann, virtualization

> 21.05.2026 05:19, Ryosuke Yasuoka пишет:
>> virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() use
>> wait_event() without any abort condition when waiting for virtqueue
>> space. If the host device stops processing commands, these waits block
>> indefinitely inside a drm_dev_enter/exit() critical section. Since
>> drm_dev_unplug(), which is called in device removal and system shutdown
>> call path, blocks on synchronize_srcu() until all critical sections
>> complete, device removal and system shutdown also hang.
>> 
>> Add a vqs_released flag to virtio_gpu_device and include it in the
>> wait_event() condition. Set the flag and wake up both queues in a new
>> virtio_gpu_release_vqs() helper, called before drm_dev_unplug() in both
>> virtio_gpu_remove() and virtio_gpu_shutdown(). When the flag is set, the
>> wait returns immediately and the command is aborted, following the same
>> cleanup path as drm_dev_enter() failure.
>> 
>> Reported-by: syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?id=d6dd6f86d3aaf7eebe7406e45c1c6e549453f224
>> Reported-by: syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?id=908bd910da5dd79b88de4cf7baf376cc873a922e
>> Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
>> ---
>> Changes in v2:
>> - Update the commit message.
>> - Replace wait_event_timeout() with wait_event() using a compound
>> condition that includes a new vqs_released flag.
>> - Add virtio_gpu_release_vqs() helper to set the flag and wake up
>> both queues, called before drm_dev_unplug() in remove and shutdown
>> paths.
>> - Remove the hardcoded 5-second timeout. Recovery is now driven by
>> the driver flag instead of an arbitrary timeout value.
>> ---
>>  drivers/gpu/drm/virtio/virtgpu_drv.c | 15 +++++++++++++++
>>  drivers/gpu/drm/virtio/virtgpu_drv.h |  1 +
>>  drivers/gpu/drm/virtio/virtgpu_vq.c  | 23 +++++++++++++++++++++--
>>  3 files changed, 37 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
>> index a5ce96fb8a1d..e4fe5e0780f9 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_drv.c
>> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
>> @@ -119,10 +119,24 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
>>  	return ret;
>>  }
>>  
>> +/*
>> + * Release pending virtqueue waits so the drm_dev_enter/exit() critical
>> + * sections complete before drm_dev_unplug() blocks on synchronize_srcu().
>> + */
>> +static void virtio_gpu_release_vqs(struct drm_device *dev)
>> +{
>> +	struct virtio_gpu_device *vgdev = dev->dev_private;
>> +
>> +	vgdev->vqs_released = true;
>> +	wake_up_all(&vgdev->ctrlq.ack_queue);
>> +	wake_up_all(&vgdev->cursorq.ack_queue);
>> +}
>> +
>>  static void virtio_gpu_remove(struct virtio_device *vdev)
>>  {
>>  	struct drm_device *dev = vdev->priv;
>>  
>> +	virtio_gpu_release_vqs(dev);
>>  	drm_dev_unplug(dev);
>>  	drm_atomic_helper_shutdown(dev);
>>  	virtio_gpu_deinit(dev);
>> @@ -133,6 +147,7 @@ static void virtio_gpu_shutdown(struct virtio_device *vdev)
>>  {
>>  	struct drm_device *dev = vdev->priv;
>>  
>> +	virtio_gpu_release_vqs(dev);
>>  	/* stop talking to the device */
>>  	drm_dev_unplug(dev);
>>  }
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> index f17660a71a3e..0bd69a40857e 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
>> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> @@ -235,6 +235,7 @@ struct virtio_gpu_device {
>>  
>>  	struct virtio_gpu_queue ctrlq;
>>  	struct virtio_gpu_queue cursorq;
>> +	bool vqs_released;
>>  	struct kmem_cache *vbufs;
>>  
>>  	atomic_t pending_commands;
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
>> index 67865810a2e7..8057a9b7356d 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
>> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
>> @@ -396,7 +396,19 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev,
>>  	if (vq->num_free < elemcnt) {
>>  		spin_unlock(&vgdev->ctrlq.qlock);
>>  		virtio_gpu_notify(vgdev);
>> -		wait_event(vgdev->ctrlq.ack_queue, vq->num_free >= elemcnt);
>> +		wait_event(vgdev->ctrlq.ack_queue,
>> +			   vq->num_free >= elemcnt || vgdev->vqs_released);
>> +		/*
>> +		 * Set by virtio_gpu_release_vqs() to unblock
>> +		 * synchronize_srcu() wait in drm_dev_unplug().
>> +		 */
>> +		if (vgdev->vqs_released) {
>> +			if (fence && vbuf->objs)
>> +				virtio_gpu_array_unlock_resv(vbuf->objs);
>> +			free_vbuf(vgdev, vbuf);
>> +			drm_dev_exit(idx);
>> +			return -ENODEV;
>> +		}
>>  		goto again;
>>  	}
>>  
>> @@ -566,7 +578,14 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
>>  	ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
>>  	if (ret == -ENOSPC) {
>>  		spin_unlock(&vgdev->cursorq.qlock);
>> -		wait_event(vgdev->cursorq.ack_queue, vq->num_free >= outcnt);
>> +		wait_event(vgdev->cursorq.ack_queue,
>> +			   vq->num_free >= outcnt || vgdev->vqs_released);
>> +		/* See comment in virtio_gpu_queue_ctrl_sgs(). */
>> +		if (vgdev->vqs_released) {
>> +			free_vbuf(vgdev, vbuf);
>> +			drm_dev_exit(idx);
>> +			return;
>> +		}
>>  		spin_lock(&vgdev->cursorq.qlock);
>>  		goto retry;
>>  	} else {
>
> What about other wait_event in the driver? Why only these?
>
> -- 
> Best regards,
> Dmitry

I see the command but can't find the corresponding bug.
The email is sent to  syzbot+HASH@syzkaller.appspotmail.com address
but the HASH does not correspond to any known bug.
Please double check the address.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task
  2026-05-21 19:01 ` Dmitry Osipenko
  2026-05-21 19:02   ` syzbot
  2026-05-21 19:02   ` syzbot
@ 2026-05-22  8:51   ` Ryosuke Yasuoka
  2026-05-22  8:51     ` syzbot
  2026-05-22  8:51     ` syzbot
  2 siblings, 2 replies; 11+ messages in thread
From: Ryosuke Yasuoka @ 2026-05-22  8:51 UTC (permalink / raw)
  To: Dmitry Osipenko, David Airlie, Gerd Hoffmann, Gurchetan Singh,
	Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Simona Vetter
  Cc: dri-devel, virtualization, linux-kernel,
	syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224,
	syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e

Hi Dmitry,
Thank you for your review and the comment.

On 21/05/2026 22:01, Dmitry Osipenko wrote:
> 21.05.2026 05:19, Ryosuke Yasuoka пишет:
>> virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() use
>> wait_event() without any abort condition when waiting for virtqueue
>> space. If the host device stops processing commands, these waits block
>> indefinitely inside a drm_dev_enter/exit() critical section. Since
>> drm_dev_unplug(), which is called in device removal and system shutdown
>> call path, blocks on synchronize_srcu() until all critical sections
>> complete, device removal and system shutdown also hang.
>> 
>> Add a vqs_released flag to virtio_gpu_device and include it in the
>> wait_event() condition. Set the flag and wake up both queues in a new
>> virtio_gpu_release_vqs() helper, called before drm_dev_unplug() in both
>> virtio_gpu_remove() and virtio_gpu_shutdown(). When the flag is set, the
>> wait returns immediately and the command is aborted, following the same
>> cleanup path as drm_dev_enter() failure.
>> 
>> Reported-by: syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?id=d6dd6f86d3aaf7eebe7406e45c1c6e549453f224
>> Reported-by: syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e@syzkaller.appspotmail.com
>> Closes: https://syzkaller.appspot.com/bug?id=908bd910da5dd79b88de4cf7baf376cc873a922e
>> Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
>> ---
>> Changes in v2:
>> - Update the commit message.
>> - Replace wait_event_timeout() with wait_event() using a compound
>> condition that includes a new vqs_released flag.
>> - Add virtio_gpu_release_vqs() helper to set the flag and wake up
>> both queues, called before drm_dev_unplug() in remove and shutdown
>> paths.
>> - Remove the hardcoded 5-second timeout. Recovery is now driven by
>> the driver flag instead of an arbitrary timeout value.
>> ---
>>  drivers/gpu/drm/virtio/virtgpu_drv.c | 15 +++++++++++++++
>>  drivers/gpu/drm/virtio/virtgpu_drv.h |  1 +
>>  drivers/gpu/drm/virtio/virtgpu_vq.c  | 23 +++++++++++++++++++++--
>>  3 files changed, 37 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
>> index a5ce96fb8a1d..e4fe5e0780f9 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_drv.c
>> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
>> @@ -119,10 +119,24 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
>>  	return ret;
>>  }
>>  
>> +/*
>> + * Release pending virtqueue waits so the drm_dev_enter/exit() critical
>> + * sections complete before drm_dev_unplug() blocks on synchronize_srcu().
>> + */
>> +static void virtio_gpu_release_vqs(struct drm_device *dev)
>> +{
>> +	struct virtio_gpu_device *vgdev = dev->dev_private;
>> +
>> +	vgdev->vqs_released = true;
>> +	wake_up_all(&vgdev->ctrlq.ack_queue);
>> +	wake_up_all(&vgdev->cursorq.ack_queue);
>> +}
>> +
>>  static void virtio_gpu_remove(struct virtio_device *vdev)
>>  {
>>  	struct drm_device *dev = vdev->priv;
>>  
>> +	virtio_gpu_release_vqs(dev);
>>  	drm_dev_unplug(dev);
>>  	drm_atomic_helper_shutdown(dev);
>>  	virtio_gpu_deinit(dev);
>> @@ -133,6 +147,7 @@ static void virtio_gpu_shutdown(struct virtio_device *vdev)
>>  {
>>  	struct drm_device *dev = vdev->priv;
>>  
>> +	virtio_gpu_release_vqs(dev);
>>  	/* stop talking to the device */
>>  	drm_dev_unplug(dev);
>>  }
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> index f17660a71a3e..0bd69a40857e 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
>> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
>> @@ -235,6 +235,7 @@ struct virtio_gpu_device {
>>  
>>  	struct virtio_gpu_queue ctrlq;
>>  	struct virtio_gpu_queue cursorq;
>> +	bool vqs_released;
>>  	struct kmem_cache *vbufs;
>>  
>>  	atomic_t pending_commands;
>> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
>> index 67865810a2e7..8057a9b7356d 100644
>> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
>> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
>> @@ -396,7 +396,19 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev,
>>  	if (vq->num_free < elemcnt) {
>>  		spin_unlock(&vgdev->ctrlq.qlock);
>>  		virtio_gpu_notify(vgdev);
>> -		wait_event(vgdev->ctrlq.ack_queue, vq->num_free >= elemcnt);
>> +		wait_event(vgdev->ctrlq.ack_queue,
>> +			   vq->num_free >= elemcnt || vgdev->vqs_released);
>> +		/*
>> +		 * Set by virtio_gpu_release_vqs() to unblock
>> +		 * synchronize_srcu() wait in drm_dev_unplug().
>> +		 */
>> +		if (vgdev->vqs_released) {
>> +			if (fence && vbuf->objs)
>> +				virtio_gpu_array_unlock_resv(vbuf->objs);
>> +			free_vbuf(vgdev, vbuf);
>> +			drm_dev_exit(idx);
>> +			return -ENODEV;
>> +		}
>>  		goto again;
>>  	}
>>  
>> @@ -566,7 +578,14 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
>>  	ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
>>  	if (ret == -ENOSPC) {
>>  		spin_unlock(&vgdev->cursorq.qlock);
>> -		wait_event(vgdev->cursorq.ack_queue, vq->num_free >= outcnt);
>> +		wait_event(vgdev->cursorq.ack_queue,
>> +			   vq->num_free >= outcnt || vgdev->vqs_released);
>> +		/* See comment in virtio_gpu_queue_ctrl_sgs(). */
>> +		if (vgdev->vqs_released) {
>> +			free_vbuf(vgdev, vbuf);
>> +			drm_dev_exit(idx);
>> +			return;
>> +		}
>>  		spin_lock(&vgdev->cursorq.qlock);
>>  		goto retry;
>>  	} else {
> 
> What about other wait_event in the driver? Why only these?

There are other wait_event() calls on vgdev->resp_wq in virtgpu_prime.c
and virtgpu_vram.c. These can also cause a stuck process when the host
device stops or does not respond and should be fixed. However, they are
not related to the issue being fixed here. I think they should be
addressed in a separate commit.

The wait_event(vgdev->resp_wq) calls wait for a host response, not for
virtqueue ring space. They are not inside a drm_dev_enter/exit()
critical section, so they don't block drm_dev_unplug() ->
synchronize_srcu() and are not part of the deadlock reported by syzbot.

However, if a process is stuck in wait_event(vgdev->resp_wq), there is
no way to recover other than host device recovery itself. We can still
remove the device and destroy the virtqueues while the process is stuck,
so the host response can never arrive. IIUC, the stuck thread holds a
drm_device reference, preventing vgdev from being freed; this is a
resource leak.

This can be fixed by adding a wake_up_all() and a vqs_released check
similar to what is done for ctrlq/cursorq. However, I think this should
be a separate commit since the root cause and problem are different.

Alternatively, the resp_wq wait_event() calls could be converted to
wait_event_interruptible() to fix the issue. But as you mentioned
earlier in the v1 comment, we need the wait_event_interruptible()
rework.

What do you think? Should I include the resp_wq fix as a separate commit
in this patch series, or leave it for the _interruptible() rework?

Best regards,
Ryosuke



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task
  2026-05-22  8:51   ` Ryosuke Yasuoka
@ 2026-05-22  8:51     ` syzbot
  2026-05-22  8:51     ` syzbot
  1 sibling, 0 replies; 11+ messages in thread
From: syzbot @ 2026-05-22  8:51 UTC (permalink / raw)
  To: ryasuoka
  Cc: airlied, dmitry.osipenko, dri-devel, gurchetansingh, kraxel,
	linux-kernel, maarten.lankhorst, mripard, olvaffe, ryasuoka,
	simona, tzimmermann, virtualization

> Hi Dmitry,
> Thank you for your review and the comment.
>
> On 21/05/2026 22:01, Dmitry Osipenko wrote:
>> 21.05.2026 05:19, Ryosuke Yasuoka пишет:
>>> virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() use
>>> wait_event() without any abort condition when waiting for virtqueue
>>> space. If the host device stops processing commands, these waits block
>>> indefinitely inside a drm_dev_enter/exit() critical section. Since
>>> drm_dev_unplug(), which is called in device removal and system shutdown
>>> call path, blocks on synchronize_srcu() until all critical sections
>>> complete, device removal and system shutdown also hang.
>>> 
>>> Add a vqs_released flag to virtio_gpu_device and include it in the
>>> wait_event() condition. Set the flag and wake up both queues in a new
>>> virtio_gpu_release_vqs() helper, called before drm_dev_unplug() in both
>>> virtio_gpu_remove() and virtio_gpu_shutdown(). When the flag is set, the
>>> wait returns immediately and the command is aborted, following the same
>>> cleanup path as drm_dev_enter() failure.
>>> 
>>> Reported-by: syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224@syzkaller.appspotmail.com
>>> Closes: https://syzkaller.appspot.com/bug?id=d6dd6f86d3aaf7eebe7406e45c1c6e549453f224
>>> Reported-by: syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e@syzkaller.appspotmail.com
>>> Closes: https://syzkaller.appspot.com/bug?id=908bd910da5dd79b88de4cf7baf376cc873a922e
>>> Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
>>> ---
>>> Changes in v2:
>>> - Update the commit message.
>>> - Replace wait_event_timeout() with wait_event() using a compound
>>> condition that includes a new vqs_released flag.
>>> - Add virtio_gpu_release_vqs() helper to set the flag and wake up
>>> both queues, called before drm_dev_unplug() in remove and shutdown
>>> paths.
>>> - Remove the hardcoded 5-second timeout. Recovery is now driven by
>>> the driver flag instead of an arbitrary timeout value.
>>> ---
>>>  drivers/gpu/drm/virtio/virtgpu_drv.c | 15 +++++++++++++++
>>>  drivers/gpu/drm/virtio/virtgpu_drv.h |  1 +
>>>  drivers/gpu/drm/virtio/virtgpu_vq.c  | 23 +++++++++++++++++++++--
>>>  3 files changed, 37 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
>>> index a5ce96fb8a1d..e4fe5e0780f9 100644
>>> --- a/drivers/gpu/drm/virtio/virtgpu_drv.c
>>> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
>>> @@ -119,10 +119,24 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
>>>  	return ret;
>>>  }
>>>  
>>> +/*
>>> + * Release pending virtqueue waits so the drm_dev_enter/exit() critical
>>> + * sections complete before drm_dev_unplug() blocks on synchronize_srcu().
>>> + */
>>> +static void virtio_gpu_release_vqs(struct drm_device *dev)
>>> +{
>>> +	struct virtio_gpu_device *vgdev = dev->dev_private;
>>> +
>>> +	vgdev->vqs_released = true;
>>> +	wake_up_all(&vgdev->ctrlq.ack_queue);
>>> +	wake_up_all(&vgdev->cursorq.ack_queue);
>>> +}
>>> +
>>>  static void virtio_gpu_remove(struct virtio_device *vdev)
>>>  {
>>>  	struct drm_device *dev = vdev->priv;
>>>  
>>> +	virtio_gpu_release_vqs(dev);
>>>  	drm_dev_unplug(dev);
>>>  	drm_atomic_helper_shutdown(dev);
>>>  	virtio_gpu_deinit(dev);
>>> @@ -133,6 +147,7 @@ static void virtio_gpu_shutdown(struct virtio_device *vdev)
>>>  {
>>>  	struct drm_device *dev = vdev->priv;
>>>  
>>> +	virtio_gpu_release_vqs(dev);
>>>  	/* stop talking to the device */
>>>  	drm_dev_unplug(dev);
>>>  }
>>> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
>>> index f17660a71a3e..0bd69a40857e 100644
>>> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
>>> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
>>> @@ -235,6 +235,7 @@ struct virtio_gpu_device {
>>>  
>>>  	struct virtio_gpu_queue ctrlq;
>>>  	struct virtio_gpu_queue cursorq;
>>> +	bool vqs_released;
>>>  	struct kmem_cache *vbufs;
>>>  
>>>  	atomic_t pending_commands;
>>> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
>>> index 67865810a2e7..8057a9b7356d 100644
>>> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
>>> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
>>> @@ -396,7 +396,19 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev,
>>>  	if (vq->num_free < elemcnt) {
>>>  		spin_unlock(&vgdev->ctrlq.qlock);
>>>  		virtio_gpu_notify(vgdev);
>>> -		wait_event(vgdev->ctrlq.ack_queue, vq->num_free >= elemcnt);
>>> +		wait_event(vgdev->ctrlq.ack_queue,
>>> +			   vq->num_free >= elemcnt || vgdev->vqs_released);
>>> +		/*
>>> +		 * Set by virtio_gpu_release_vqs() to unblock
>>> +		 * synchronize_srcu() wait in drm_dev_unplug().
>>> +		 */
>>> +		if (vgdev->vqs_released) {
>>> +			if (fence && vbuf->objs)
>>> +				virtio_gpu_array_unlock_resv(vbuf->objs);
>>> +			free_vbuf(vgdev, vbuf);
>>> +			drm_dev_exit(idx);
>>> +			return -ENODEV;
>>> +		}
>>>  		goto again;
>>>  	}
>>>  
>>> @@ -566,7 +578,14 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
>>>  	ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
>>>  	if (ret == -ENOSPC) {
>>>  		spin_unlock(&vgdev->cursorq.qlock);
>>> -		wait_event(vgdev->cursorq.ack_queue, vq->num_free >= outcnt);
>>> +		wait_event(vgdev->cursorq.ack_queue,
>>> +			   vq->num_free >= outcnt || vgdev->vqs_released);
>>> +		/* See comment in virtio_gpu_queue_ctrl_sgs(). */
>>> +		if (vgdev->vqs_released) {
>>> +			free_vbuf(vgdev, vbuf);
>>> +			drm_dev_exit(idx);
>>> +			return;
>>> +		}
>>>  		spin_lock(&vgdev->cursorq.qlock);
>>>  		goto retry;
>>>  	} else {
>> 
>> What about other wait_event in the driver? Why only these?
>
> There are other wait_event() calls on vgdev->resp_wq in virtgpu_prime.c
> and virtgpu_vram.c. These can also cause a stuck process when the host
> device stops or does not respond and should be fixed. However, they are
> not related to the issue being fixed here. I think they should be
> addressed in a separate commit.
>
> The wait_event(vgdev->resp_wq) calls wait for a host response, not for
> virtqueue ring space. They are not inside a drm_dev_enter/exit()
> critical section, so they don't block drm_dev_unplug() ->
> synchronize_srcu() and are not part of the deadlock reported by syzbot.
>
> However, if a process is stuck in wait_event(vgdev->resp_wq), there is
> no way to recover other than host device recovery itself. We can still
> remove the device and destroy the virtqueues while the process is stuck,
> so the host response can never arrive. IIUC, the stuck thread holds a
> drm_device reference, preventing vgdev from being freed; this is a
> resource leak.
>
> This can be fixed by adding a wake_up_all() and a vqs_released check
> similar to what is done for ctrlq/cursorq. However, I think this should
> be a separate commit since the root cause and problem are different.
>
> Alternatively, the resp_wq wait_event() calls could be converted to
> wait_event_interruptible() to fix the issue. But as you mentioned
> earlier in the v1 comment, we need the wait_event_interruptible()
> rework.
>
> What do you think? Should I include the resp_wq fix as a separate commit
> in this patch series, or leave it for the _interruptible() rework?
>
> Best regards,
> Ryosuke
>
>

I see the command but can't find the corresponding bug.
The email is sent to  syzbot+HASH@syzkaller.appspotmail.com address
but the HASH does not correspond to any known bug.
Please double check the address.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task
  2026-05-22  8:51   ` Ryosuke Yasuoka
  2026-05-22  8:51     ` syzbot
@ 2026-05-22  8:51     ` syzbot
  1 sibling, 0 replies; 11+ messages in thread
From: syzbot @ 2026-05-22  8:51 UTC (permalink / raw)
  To: ryasuoka
  Cc: airlied, dmitry.osipenko, dri-devel, gurchetansingh, kraxel,
	linux-kernel, maarten.lankhorst, mripard, olvaffe, ryasuoka,
	simona, tzimmermann, virtualization

> Hi Dmitry,
> Thank you for your review and the comment.
>
> On 21/05/2026 22:01, Dmitry Osipenko wrote:
>> 21.05.2026 05:19, Ryosuke Yasuoka пишет:
>>> virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() use
>>> wait_event() without any abort condition when waiting for virtqueue
>>> space. If the host device stops processing commands, these waits block
>>> indefinitely inside a drm_dev_enter/exit() critical section. Since
>>> drm_dev_unplug(), which is called in device removal and system shutdown
>>> call path, blocks on synchronize_srcu() until all critical sections
>>> complete, device removal and system shutdown also hang.
>>> 
>>> Add a vqs_released flag to virtio_gpu_device and include it in the
>>> wait_event() condition. Set the flag and wake up both queues in a new
>>> virtio_gpu_release_vqs() helper, called before drm_dev_unplug() in both
>>> virtio_gpu_remove() and virtio_gpu_shutdown(). When the flag is set, the
>>> wait returns immediately and the command is aborted, following the same
>>> cleanup path as drm_dev_enter() failure.
>>> 
>>> Reported-by: syzbot+d6dd6f86d3aaf7eebe7406e45c1c6e549453f224@syzkaller.appspotmail.com
>>> Closes: https://syzkaller.appspot.com/bug?id=d6dd6f86d3aaf7eebe7406e45c1c6e549453f224
>>> Reported-by: syzbot+908bd910da5dd79b88de4cf7baf376cc873a922e@syzkaller.appspotmail.com
>>> Closes: https://syzkaller.appspot.com/bug?id=908bd910da5dd79b88de4cf7baf376cc873a922e
>>> Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
>>> ---
>>> Changes in v2:
>>> - Update the commit message.
>>> - Replace wait_event_timeout() with wait_event() using a compound
>>> condition that includes a new vqs_released flag.
>>> - Add virtio_gpu_release_vqs() helper to set the flag and wake up
>>> both queues, called before drm_dev_unplug() in remove and shutdown
>>> paths.
>>> - Remove the hardcoded 5-second timeout. Recovery is now driven by
>>> the driver flag instead of an arbitrary timeout value.
>>> ---
>>>  drivers/gpu/drm/virtio/virtgpu_drv.c | 15 +++++++++++++++
>>>  drivers/gpu/drm/virtio/virtgpu_drv.h |  1 +
>>>  drivers/gpu/drm/virtio/virtgpu_vq.c  | 23 +++++++++++++++++++++--
>>>  3 files changed, 37 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
>>> index a5ce96fb8a1d..e4fe5e0780f9 100644
>>> --- a/drivers/gpu/drm/virtio/virtgpu_drv.c
>>> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
>>> @@ -119,10 +119,24 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
>>>  	return ret;
>>>  }
>>>  
>>> +/*
>>> + * Release pending virtqueue waits so the drm_dev_enter/exit() critical
>>> + * sections complete before drm_dev_unplug() blocks on synchronize_srcu().
>>> + */
>>> +static void virtio_gpu_release_vqs(struct drm_device *dev)
>>> +{
>>> +	struct virtio_gpu_device *vgdev = dev->dev_private;
>>> +
>>> +	vgdev->vqs_released = true;
>>> +	wake_up_all(&vgdev->ctrlq.ack_queue);
>>> +	wake_up_all(&vgdev->cursorq.ack_queue);
>>> +}
>>> +
>>>  static void virtio_gpu_remove(struct virtio_device *vdev)
>>>  {
>>>  	struct drm_device *dev = vdev->priv;
>>>  
>>> +	virtio_gpu_release_vqs(dev);
>>>  	drm_dev_unplug(dev);
>>>  	drm_atomic_helper_shutdown(dev);
>>>  	virtio_gpu_deinit(dev);
>>> @@ -133,6 +147,7 @@ static void virtio_gpu_shutdown(struct virtio_device *vdev)
>>>  {
>>>  	struct drm_device *dev = vdev->priv;
>>>  
>>> +	virtio_gpu_release_vqs(dev);
>>>  	/* stop talking to the device */
>>>  	drm_dev_unplug(dev);
>>>  }
>>> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
>>> index f17660a71a3e..0bd69a40857e 100644
>>> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
>>> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
>>> @@ -235,6 +235,7 @@ struct virtio_gpu_device {
>>>  
>>>  	struct virtio_gpu_queue ctrlq;
>>>  	struct virtio_gpu_queue cursorq;
>>> +	bool vqs_released;
>>>  	struct kmem_cache *vbufs;
>>>  
>>>  	atomic_t pending_commands;
>>> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
>>> index 67865810a2e7..8057a9b7356d 100644
>>> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
>>> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
>>> @@ -396,7 +396,19 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev,
>>>  	if (vq->num_free < elemcnt) {
>>>  		spin_unlock(&vgdev->ctrlq.qlock);
>>>  		virtio_gpu_notify(vgdev);
>>> -		wait_event(vgdev->ctrlq.ack_queue, vq->num_free >= elemcnt);
>>> +		wait_event(vgdev->ctrlq.ack_queue,
>>> +			   vq->num_free >= elemcnt || vgdev->vqs_released);
>>> +		/*
>>> +		 * Set by virtio_gpu_release_vqs() to unblock
>>> +		 * synchronize_srcu() wait in drm_dev_unplug().
>>> +		 */
>>> +		if (vgdev->vqs_released) {
>>> +			if (fence && vbuf->objs)
>>> +				virtio_gpu_array_unlock_resv(vbuf->objs);
>>> +			free_vbuf(vgdev, vbuf);
>>> +			drm_dev_exit(idx);
>>> +			return -ENODEV;
>>> +		}
>>>  		goto again;
>>>  	}
>>>  
>>> @@ -566,7 +578,14 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
>>>  	ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
>>>  	if (ret == -ENOSPC) {
>>>  		spin_unlock(&vgdev->cursorq.qlock);
>>> -		wait_event(vgdev->cursorq.ack_queue, vq->num_free >= outcnt);
>>> +		wait_event(vgdev->cursorq.ack_queue,
>>> +			   vq->num_free >= outcnt || vgdev->vqs_released);
>>> +		/* See comment in virtio_gpu_queue_ctrl_sgs(). */
>>> +		if (vgdev->vqs_released) {
>>> +			free_vbuf(vgdev, vbuf);
>>> +			drm_dev_exit(idx);
>>> +			return;
>>> +		}
>>>  		spin_lock(&vgdev->cursorq.qlock);
>>>  		goto retry;
>>>  	} else {
>> 
>> What about other wait_event in the driver? Why only these?
>
> There are other wait_event() calls on vgdev->resp_wq in virtgpu_prime.c
> and virtgpu_vram.c. These can also cause a stuck process when the host
> device stops or does not respond and should be fixed. However, they are
> not related to the issue being fixed here. I think they should be
> addressed in a separate commit.
>
> The wait_event(vgdev->resp_wq) calls wait for a host response, not for
> virtqueue ring space. They are not inside a drm_dev_enter/exit()
> critical section, so they don't block drm_dev_unplug() ->
> synchronize_srcu() and are not part of the deadlock reported by syzbot.
>
> However, if a process is stuck in wait_event(vgdev->resp_wq), there is
> no way to recover other than host device recovery itself. We can still
> remove the device and destroy the virtqueues while the process is stuck,
> so the host response can never arrive. IIUC, the stuck thread holds a
> drm_device reference, preventing vgdev from being freed; this is a
> resource leak.
>
> This can be fixed by adding a wake_up_all() and a vqs_released check
> similar to what is done for ctrlq/cursorq. However, I think this should
> be a separate commit since the root cause and problem are different.
>
> Alternatively, the resp_wq wait_event() calls could be converted to
> wait_event_interruptible() to fix the issue. But as you mentioned
> earlier in the v1 comment, we need the wait_event_interruptible()
> rework.
>
> What do you think? Should I include the resp_wq fix as a separate commit
> in this patch series, or leave it for the _interruptible() rework?
>
> Best regards,
> Ryosuke
>
>

I see the command but can't find the corresponding bug.
The email is sent to  syzbot+HASH@syzkaller.appspotmail.com address
but the HASH does not correspond to any known bug.
Please double check the address.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Claude review: drm/virtio: abort virtqueue wait on device removal to avoid hung task
  2026-05-21  2:19 [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task Ryosuke Yasuoka
                   ` (2 preceding siblings ...)
  2026-05-21 19:01 ` Dmitry Osipenko
@ 2026-05-25 11:05 ` Claude Code Review Bot
  2026-05-25 11:05 ` Claude Code Review Bot
  4 siblings, 0 replies; 11+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 11:05 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: drm/virtio: abort virtqueue wait on device removal to avoid hung task
Author: Ryosuke Yasuoka <ryasuoka@redhat.com>
Patches: 9
Reviewed: 2026-05-25T21:05:48.593126

---

This is a single-patch fix for a real syzbot-reported hung task bug in the virtio-gpu driver. The core problem is well understood: `wait_event()` calls in `virtio_gpu_queue_ctrl_sgs()` and `virtio_gpu_queue_cursor()` block indefinitely inside `drm_dev_enter()`/`drm_dev_exit()` SRCU critical sections. Since `drm_dev_unplug()` calls `synchronize_srcu()` which waits for those critical sections to complete, device removal and shutdown deadlock.

The v2 approach — a driver-level `vqs_released` flag set before `drm_dev_unplug()` — is the right idea and is a clear improvement over v1's arbitrary timeout. The abort paths correctly mirror the existing `drm_dev_enter()` failure cleanup. However, I have a few concerns about atomicity/ordering, and a potential race window that should be addressed.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Claude review: drm/virtio: abort virtqueue wait on device removal to avoid hung task
  2026-05-21  2:19 [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task Ryosuke Yasuoka
                   ` (3 preceding siblings ...)
  2026-05-25 11:05 ` Claude review: " Claude Code Review Bot
@ 2026-05-25 11:05 ` Claude Code Review Bot
  4 siblings, 0 replies; 11+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 11:05 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Positive aspects:**
- The commit message clearly describes the problem and solution.
- The v2 changes (removing arbitrary timeout, using a flag-driven approach) are a good design evolution.
- The abort paths correctly mirror the `drm_dev_enter()` failure cleanup in both functions.
- Calling `virtio_gpu_release_vqs()` before `drm_dev_unplug()` in both `remove` and `shutdown` is correct for the ordering requirement.

**Issue 1 (medium): `vqs_released` should be atomic or have explicit barrier**

```c
+	bool vqs_released;
```

This is a plain `bool` written by one context and read by others without any locking. While `wait_event()` internally provides memory barriers for its condition check, the post-`wait_event` re-read of `vqs_released`:

```c
+		if (vgdev->vqs_released) {
```

is a plain load outside of any lock. On weakly-ordered architectures, the compiler or CPU could theoretically reorder or cache this. In practice, `wait_event()` likely provides sufficient ordering since it just evaluated the same variable, but the idiomatic kernel approach would be to either:
- Use `READ_ONCE()`/`WRITE_ONCE()` for the accesses, or
- Use an `atomic_t` flag (e.g., `atomic_read(&vgdev->vqs_released)`)

At minimum, the write side should use `WRITE_ONCE()`:
```c
WRITE_ONCE(vgdev->vqs_released, true);
```
and the reads should use `READ_ONCE()`.

**Issue 2 (low): Race window between `drm_dev_enter()` succeeding and reaching `wait_event()`**

Consider this sequence:
1. Thread A calls `drm_dev_enter()` — succeeds, enters SRCU critical section
2. Thread B calls `virtio_gpu_release_vqs()` — sets `vqs_released`, wakes queues
3. Thread B calls `drm_dev_unplug()` — starts `synchronize_srcu()`, waits for Thread A
4. Thread A reaches `wait_event()` — condition is already true due to `vqs_released`, returns immediately
5. Thread A takes the abort path, calls `drm_dev_exit()`, releases SRCU
6. Thread B's `synchronize_srcu()` completes

This actually works correctly. The `vqs_released` flag being set before `drm_dev_unplug()` means the `wait_event()` will see it regardless of timing.

**Issue 3 (nit): Redundant comment in cursor path**

```c
+		/* See comment in virtio_gpu_queue_ctrl_sgs(). */
```

This is fine for cross-referencing, but the comment it points to in `virtio_gpu_queue_ctrl_sgs()`:

```c
+		/*
+		 * Set by virtio_gpu_release_vqs() to unblock
+		 * synchronize_srcu() wait in drm_dev_unplug().
+		 */
```

is describing what sets the flag rather than what the code is doing. The code flow is self-explanatory — the comment could be shortened to just the "why": `/* Device being removed — abort command. */`

**Issue 4 (low): `virtio_gpu_release_vqs` is only used in one file but not static**

The function `virtio_gpu_release_vqs()` is defined in `virtgpu_drv.c` and only used in that file. It's already implicitly static by not being declared in the header, but marking it explicitly `static` would be more consistent:

```c
+static void virtio_gpu_release_vqs(struct drm_device *dev)
```

Looking at the diff again, it already doesn't have `static` but also isn't exported or declared in the header. Since it's only called from `virtio_gpu_remove()` and `virtio_gpu_shutdown()` in the same file, it should be `static`.

**Issue 5 (observation): Fence not signaled on abort**

When the abort path fires in `virtio_gpu_queue_ctrl_sgs()`, the fence passed as a parameter is never signaled with an error. This is consistent with the existing `drm_dev_enter()` failure path (which also doesn't signal the fence), so this is not a regression introduced by this patch. However, it's worth noting that this could leave waiters on the fence hanging — though during device removal this is arguably acceptable since the entire device is going away.

**Summary:** The patch is correct in its approach and addresses the real deadlock. The main actionable feedback is to use `WRITE_ONCE()`/`READ_ONCE()` for the `vqs_released` accesses to satisfy kernel memory model expectations, and to add `static` to `virtio_gpu_release_vqs()`.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-05-25 11:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-21  2:19 [PATCH v2] drm/virtio: abort virtqueue wait on device removal to avoid hung task Ryosuke Yasuoka
2026-05-21  2:19 ` syzbot
2026-05-21  2:19 ` syzbot
2026-05-21 19:01 ` Dmitry Osipenko
2026-05-21 19:02   ` syzbot
2026-05-21 19:02   ` syzbot
2026-05-22  8:51   ` Ryosuke Yasuoka
2026-05-22  8:51     ` syzbot
2026-05-22  8:51     ` syzbot
2026-05-25 11:05 ` Claude review: " Claude Code Review Bot
2026-05-25 11:05 ` Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox