* [PATCH v3] drm/virtio: abort virtqueue wait on device removal to avoid hung task
@ 2026-06-01 7:53 Ryosuke Yasuoka
2026-06-03 1:23 ` Dmitry Osipenko
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Ryosuke Yasuoka @ 2026-06-01 7:53 UTC (permalink / raw)
To: David Airlie, Gerd Hoffmann, Dmitry Osipenko, Gurchetan Singh,
Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Simona Vetter
Cc: dri-devel, virtualization, linux-kernel, Ryosuke Yasuoka
virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() use
wait_event() without any abort condition when waiting for virtqueue
space. If the host device stops processing commands, these waits block
indefinitely inside a drm_dev_enter/exit() critical section. Since
drm_dev_unplug(), which is called in device removal and system shutdown
call path, blocks on synchronize_srcu() until all critical sections
complete, device removal and system shutdown also hang.
Add a vqs_released flag to virtio_gpu_device and include it in the
wait_event() condition. Set the flag and wake up both queues in a new
virtio_gpu_release_vqs() helper, called before drm_dev_unplug() in both
virtio_gpu_remove() and virtio_gpu_shutdown(). When the flag is set, the
wait returns immediately and the command is aborted, following the same
cleanup path as drm_dev_enter() failure.
Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
---
Changes in v3:
- Remove Reported-by and Closes tag from commit msg because they are not
related to this fix.
- Link to v2: https://lore.kernel.org/r/20260521-virtio-gpu_wait_event-v2-1-5796b3a71d03@redhat.com
Changes in v2:
- Update the commit message.
- Replace wait_event_timeout() with wait_event() using a compound
condition that includes a new vqs_released flag.
- Add virtio_gpu_release_vqs() helper to set the flag and wake up
both queues, called before drm_dev_unplug() in remove and shutdown
paths.
- Remove the hardcoded 5-second timeout. Recovery is now driven by
the driver flag instead of an arbitrary timeout value.
---
drivers/gpu/drm/virtio/virtgpu_drv.c | 15 +++++++++++++++
drivers/gpu/drm/virtio/virtgpu_drv.h | 1 +
drivers/gpu/drm/virtio/virtgpu_vq.c | 23 +++++++++++++++++++++--
3 files changed, 37 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
index a5ce96fb8a1d..e4fe5e0780f9 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -119,10 +119,24 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
return ret;
}
+/*
+ * Release pending virtqueue waits so the drm_dev_enter/exit() critical
+ * sections complete before drm_dev_unplug() blocks on synchronize_srcu().
+ */
+static void virtio_gpu_release_vqs(struct drm_device *dev)
+{
+ struct virtio_gpu_device *vgdev = dev->dev_private;
+
+ vgdev->vqs_released = true;
+ wake_up_all(&vgdev->ctrlq.ack_queue);
+ wake_up_all(&vgdev->cursorq.ack_queue);
+}
+
static void virtio_gpu_remove(struct virtio_device *vdev)
{
struct drm_device *dev = vdev->priv;
+ virtio_gpu_release_vqs(dev);
drm_dev_unplug(dev);
drm_atomic_helper_shutdown(dev);
virtio_gpu_deinit(dev);
@@ -133,6 +147,7 @@ static void virtio_gpu_shutdown(struct virtio_device *vdev)
{
struct drm_device *dev = vdev->priv;
+ virtio_gpu_release_vqs(dev);
/* stop talking to the device */
drm_dev_unplug(dev);
}
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 2f3531950aa4..5f7bb6cc6ba7 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -235,6 +235,7 @@ struct virtio_gpu_device {
struct virtio_gpu_queue ctrlq;
struct virtio_gpu_queue cursorq;
+ bool vqs_released;
struct kmem_cache *vbufs;
atomic_t pending_commands;
diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
index 67865810a2e7..8057a9b7356d 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vq.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
@@ -396,7 +396,19 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev,
if (vq->num_free < elemcnt) {
spin_unlock(&vgdev->ctrlq.qlock);
virtio_gpu_notify(vgdev);
- wait_event(vgdev->ctrlq.ack_queue, vq->num_free >= elemcnt);
+ wait_event(vgdev->ctrlq.ack_queue,
+ vq->num_free >= elemcnt || vgdev->vqs_released);
+ /*
+ * Set by virtio_gpu_release_vqs() to unblock
+ * synchronize_srcu() wait in drm_dev_unplug().
+ */
+ if (vgdev->vqs_released) {
+ if (fence && vbuf->objs)
+ virtio_gpu_array_unlock_resv(vbuf->objs);
+ free_vbuf(vgdev, vbuf);
+ drm_dev_exit(idx);
+ return -ENODEV;
+ }
goto again;
}
@@ -566,7 +578,14 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
if (ret == -ENOSPC) {
spin_unlock(&vgdev->cursorq.qlock);
- wait_event(vgdev->cursorq.ack_queue, vq->num_free >= outcnt);
+ wait_event(vgdev->cursorq.ack_queue,
+ vq->num_free >= outcnt || vgdev->vqs_released);
+ /* See comment in virtio_gpu_queue_ctrl_sgs(). */
+ if (vgdev->vqs_released) {
+ free_vbuf(vgdev, vbuf);
+ drm_dev_exit(idx);
+ return;
+ }
spin_lock(&vgdev->cursorq.qlock);
goto retry;
} else {
---
base-commit: e43ffb69e0438cddd72aaa30898b4dc446f664f8
change-id: 20260518-virtio-gpu_wait_event-5aa060754f12
Best regards,
--
Ryosuke Yasuoka <ryasuoka@redhat.com>
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v3] drm/virtio: abort virtqueue wait on device removal to avoid hung task
2026-06-01 7:53 [PATCH v3] drm/virtio: abort virtqueue wait on device removal to avoid hung task Ryosuke Yasuoka
@ 2026-06-03 1:23 ` Dmitry Osipenko
2026-06-04 4:31 ` Claude review: " Claude Code Review Bot
2026-06-04 4:31 ` Claude Code Review Bot
2 siblings, 0 replies; 4+ messages in thread
From: Dmitry Osipenko @ 2026-06-03 1:23 UTC (permalink / raw)
To: Ryosuke Yasuoka, David Airlie, Gerd Hoffmann, Gurchetan Singh,
Chia-I Wu, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
Simona Vetter
Cc: dri-devel, virtualization, linux-kernel
On 6/1/26 10:53, Ryosuke Yasuoka wrote:
> virtio_gpu_queue_ctrl_sgs() and virtio_gpu_queue_cursor() use
> wait_event() without any abort condition when waiting for virtqueue
> space. If the host device stops processing commands, these waits block
> indefinitely inside a drm_dev_enter/exit() critical section. Since
> drm_dev_unplug(), which is called in device removal and system shutdown
> call path, blocks on synchronize_srcu() until all critical sections
> complete, device removal and system shutdown also hang.
>
> Add a vqs_released flag to virtio_gpu_device and include it in the
> wait_event() condition. Set the flag and wake up both queues in a new
> virtio_gpu_release_vqs() helper, called before drm_dev_unplug() in both
> virtio_gpu_remove() and virtio_gpu_shutdown(). When the flag is set, the
> wait returns immediately and the command is aborted, following the same
> cleanup path as drm_dev_enter() failure.
>
> Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com>
> ---
> Changes in v3:
> - Remove Reported-by and Closes tag from commit msg because they are not
> related to this fix.
> - Link to v2: https://lore.kernel.org/r/20260521-virtio-gpu_wait_event-v2-1-5796b3a71d03@redhat.com
>
> Changes in v2:
> - Update the commit message.
> - Replace wait_event_timeout() with wait_event() using a compound
> condition that includes a new vqs_released flag.
> - Add virtio_gpu_release_vqs() helper to set the flag and wake up
> both queues, called before drm_dev_unplug() in remove and shutdown
> paths.
> - Remove the hardcoded 5-second timeout. Recovery is now driven by
> the driver flag instead of an arbitrary timeout value.
> ---
> drivers/gpu/drm/virtio/virtgpu_drv.c | 15 +++++++++++++++
> drivers/gpu/drm/virtio/virtgpu_drv.h | 1 +
> drivers/gpu/drm/virtio/virtgpu_vq.c | 23 +++++++++++++++++++++--
> 3 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c
> index a5ce96fb8a1d..e4fe5e0780f9 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
> @@ -119,10 +119,24 @@ static int virtio_gpu_probe(struct virtio_device *vdev)
> return ret;
> }
>
> +/*
> + * Release pending virtqueue waits so the drm_dev_enter/exit() critical
> + * sections complete before drm_dev_unplug() blocks on synchronize_srcu().
> + */
> +static void virtio_gpu_release_vqs(struct drm_device *dev)
> +{
> + struct virtio_gpu_device *vgdev = dev->dev_private;
> +
> + vgdev->vqs_released = true;
> + wake_up_all(&vgdev->ctrlq.ack_queue);
> + wake_up_all(&vgdev->cursorq.ack_queue);
> +}
> +
> static void virtio_gpu_remove(struct virtio_device *vdev)
> {
> struct drm_device *dev = vdev->priv;
>
> + virtio_gpu_release_vqs(dev);
> drm_dev_unplug(dev);
> drm_atomic_helper_shutdown(dev);
> virtio_gpu_deinit(dev);
> @@ -133,6 +147,7 @@ static void virtio_gpu_shutdown(struct virtio_device *vdev)
> {
> struct drm_device *dev = vdev->priv;
>
> + virtio_gpu_release_vqs(dev);
> /* stop talking to the device */
> drm_dev_unplug(dev);
> }
> diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h b/drivers/gpu/drm/virtio/virtgpu_drv.h
> index 2f3531950aa4..5f7bb6cc6ba7 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_drv.h
> +++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
> @@ -235,6 +235,7 @@ struct virtio_gpu_device {
>
> struct virtio_gpu_queue ctrlq;
> struct virtio_gpu_queue cursorq;
> + bool vqs_released;
> struct kmem_cache *vbufs;
>
> atomic_t pending_commands;
> diff --git a/drivers/gpu/drm/virtio/virtgpu_vq.c b/drivers/gpu/drm/virtio/virtgpu_vq.c
> index 67865810a2e7..8057a9b7356d 100644
> --- a/drivers/gpu/drm/virtio/virtgpu_vq.c
> +++ b/drivers/gpu/drm/virtio/virtgpu_vq.c
> @@ -396,7 +396,19 @@ static int virtio_gpu_queue_ctrl_sgs(struct virtio_gpu_device *vgdev,
> if (vq->num_free < elemcnt) {
> spin_unlock(&vgdev->ctrlq.qlock);
> virtio_gpu_notify(vgdev);
> - wait_event(vgdev->ctrlq.ack_queue, vq->num_free >= elemcnt);
> + wait_event(vgdev->ctrlq.ack_queue,
> + vq->num_free >= elemcnt || vgdev->vqs_released);
> + /*
> + * Set by virtio_gpu_release_vqs() to unblock
> + * synchronize_srcu() wait in drm_dev_unplug().
> + */
> + if (vgdev->vqs_released) {
> + if (fence && vbuf->objs)
> + virtio_gpu_array_unlock_resv(vbuf->objs);
> + free_vbuf(vgdev, vbuf);
> + drm_dev_exit(idx);
> + return -ENODEV;
> + }
> goto again;
> }
>
> @@ -566,7 +578,14 @@ static void virtio_gpu_queue_cursor(struct virtio_gpu_device *vgdev,
> ret = virtqueue_add_sgs(vq, sgs, outcnt, 0, vbuf, GFP_ATOMIC);
> if (ret == -ENOSPC) {
> spin_unlock(&vgdev->cursorq.qlock);
> - wait_event(vgdev->cursorq.ack_queue, vq->num_free >= outcnt);
> + wait_event(vgdev->cursorq.ack_queue,
> + vq->num_free >= outcnt || vgdev->vqs_released);
> + /* See comment in virtio_gpu_queue_ctrl_sgs(). */
> + if (vgdev->vqs_released) {
> + free_vbuf(vgdev, vbuf);
> + drm_dev_exit(idx);
> + return;
> + }
> spin_lock(&vgdev->cursorq.qlock);
> goto retry;
> } else {
>
> ---
> base-commit: e43ffb69e0438cddd72aaa30898b4dc446f664f8
> change-id: 20260518-virtio-gpu_wait_event-5aa060754f12
>
> Best regards,
Applied to misc-next, thanks!
--
Best regards,
Dmitry
^ permalink raw reply [flat|nested] 4+ messages in thread
* Claude review: drm/virtio: abort virtqueue wait on device removal to avoid hung task
2026-06-01 7:53 [PATCH v3] drm/virtio: abort virtqueue wait on device removal to avoid hung task Ryosuke Yasuoka
2026-06-03 1:23 ` Dmitry Osipenko
@ 2026-06-04 4:31 ` Claude Code Review Bot
2026-06-04 4:31 ` Claude Code Review Bot
2 siblings, 0 replies; 4+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 4:31 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: drm/virtio: abort virtqueue wait on device removal to avoid hung task
Author: Ryosuke Yasuoka <ryasuoka@redhat.com>
Patches: 2
Reviewed: 2026-06-04T14:31:07.760561
---
This is a single-patch fix for a real hang scenario in the virtio-gpu driver. The problem is well-identified: `wait_event()` calls in `virtio_gpu_queue_ctrl_sgs()` and `virtio_gpu_queue_cursor()` can block indefinitely if the host stops processing commands, and since they're inside `drm_dev_enter()`/`drm_dev_exit()` critical sections, `drm_dev_unplug()` (which calls `synchronize_srcu()`) will also hang, blocking device removal and system shutdown.
The approach — adding a `vqs_released` flag checked in the wait condition, set before `drm_dev_unplug()` — is a reasonable and straightforward solution. The cleanup paths mirror the existing `drm_dev_enter()` failure paths, which is correct.
**Overall: The patch is a solid fix for a real problem. There are a few concerns worth raising, mostly around memory ordering and minor style, but nothing blocking.**
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 4+ messages in thread
* Claude review: drm/virtio: abort virtqueue wait on device removal to avoid hung task
2026-06-01 7:53 [PATCH v3] drm/virtio: abort virtqueue wait on device removal to avoid hung task Ryosuke Yasuoka
2026-06-03 1:23 ` Dmitry Osipenko
2026-06-04 4:31 ` Claude review: " Claude Code Review Bot
@ 2026-06-04 4:31 ` Claude Code Review Bot
2 siblings, 0 replies; 4+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 4:31 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Correctness — memory ordering of `vqs_released`**
The `vqs_released` flag is a plain `bool` written on one CPU and read on another (the waiting thread). While `wake_up_all()` does provide a memory barrier on the waking side, and `wait_event()` rechecks the condition after being woken, there's a subtlety: the `wait_event()` macro checks the condition *before* sleeping. If the flag is set just as another CPU is entering the wait path, the plain bool read could theoretically be stale without an explicit barrier or atomic.
In practice, this is likely fine on x86 (strong memory model) and the `wait_event` internal barriers paper over most cases, but for correctness on all architectures, consider using `READ_ONCE(vgdev->vqs_released)` in the wait conditions and `WRITE_ONCE(vgdev->vqs_released, true)` in `virtio_gpu_release_vqs()`. This is the standard kernel pattern for lockless flag communication:
```c
// In virtio_gpu_release_vqs():
WRITE_ONCE(vgdev->vqs_released, true);
// In wait conditions:
wait_event(vgdev->ctrlq.ack_queue,
vq->num_free >= elemcnt || READ_ONCE(vgdev->vqs_released));
```
This prevents compiler tearing/reordering and is the idiomatic way to do this in the kernel.
**Correctness — the ctrl queue abort path looks correct**
```c
if (vgdev->vqs_released) {
if (fence && vbuf->objs)
virtio_gpu_array_unlock_resv(vbuf->objs);
free_vbuf(vgdev, vbuf);
drm_dev_exit(idx);
return -ENODEV;
}
```
This matches the cleanup in the `drm_dev_enter()` failure path at lines 383-388 of the current tree. The `drm_dev_exit(idx)` call is correct since we entered the SRCU critical section earlier. Good.
**Correctness — the cursor queue abort path looks correct**
```c
if (vgdev->vqs_released) {
free_vbuf(vgdev, vbuf);
drm_dev_exit(idx);
return;
}
```
Mirrors the `drm_dev_enter()` failure path at lines 555-558. The cursor path doesn't have fence/objs handling, consistent with the existing code. Good.
**Design — ordering of `virtio_gpu_release_vqs()` vs `drm_dev_unplug()`**
Placing `virtio_gpu_release_vqs()` before `drm_dev_unplug()` is correct — this is the whole point. The flag unblocks waiters so their SRCU critical sections can complete, allowing `synchronize_srcu()` inside `drm_dev_unplug()` to proceed. The ordering is sound.
**Minor — race window between `vqs_released` check and `goto again`/`goto retry`**
In the ctrl queue path, after the `vqs_released` check:
```c
if (vgdev->vqs_released) {
...
return -ENODEV;
}
goto again;
```
If `vqs_released` is false when checked but becomes true before re-entering the `wait_event`, the next iteration will catch it. This is fine — the condition is rechecked in `wait_event`.
**Minor — `vqs_released` initialization**
The bool field is added to `struct virtio_gpu_device`. Since the struct is allocated with `kzalloc` (through `drm_dev_alloc`), it will be zero-initialized, so `vqs_released` starts as `false`. This is correct.
**Style — comment verbosity**
The comment block in `virtio_gpu_queue_ctrl_sgs()`:
```c
/*
* Set by virtio_gpu_release_vqs() to unblock
* synchronize_srcu() wait in drm_dev_unplug().
*/
```
This is reasonable for explaining the non-obvious interaction with `drm_dev_unplug()`. The shorter `/* See comment in virtio_gpu_queue_ctrl_sgs(). */` cross-reference in the cursor path is appropriate.
**Overall assessment: Good fix. The `READ_ONCE`/`WRITE_ONCE` suggestion is the main feedback worth acting on. The logic, cleanup paths, and ordering are all correct.**
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-06-04 4:31 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-01 7:53 [PATCH v3] drm/virtio: abort virtqueue wait on device removal to avoid hung task Ryosuke Yasuoka
2026-06-03 1:23 ` Dmitry Osipenko
2026-06-04 4:31 ` Claude review: " Claude Code Review Bot
2026-06-04 4:31 ` Claude Code Review Bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox