[PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API

public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed

* [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API
@ 2026-03-13  9:16 Onur Özkan
  2026-03-13  9:16 ` [PATCH v1 RESEND 1/4] drm/tyr: clear reset IRQ before soft reset Onur Özkan
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Onur Özkan @ 2026-03-13  9:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: dakr, aliceryhl, daniel.almeida, airlied, simona, dri-devel,
	rust-for-linux, Onur Özkan

This series adds GPU reset handling support for Tyr in a new module
drivers/gpu/drm/tyr/driver.rs which encapsulates the low-level reset
controller internals and exposes a ResetHandle API to the driver.

The reset module owns reset state, queueing and execution ordering
through OrderedQueue and handles duplicate/concurrent reset requests
with a pending flag.

Apart from the reset module, the first 3 patches:

- Fixes a potential reset-complete stale state bug by clearing completed
  state before doing soft reset.
- Adds Work::disable_sync() (wrapper of bindings::disable_work_sync).
- Adds OrderedQueue support.

Runtime tested on hardware by Deborah Brouwer (see [1]) and myself.

[1]: https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/63#note_3364131

Link: https://gitlab.freedesktop.org/panfrost/linux/-/issues/28
---

Onur Özkan (4):
  drm/tyr: clear reset IRQ before soft reset
  rust: add Work::disable_sync
  rust: add ordered workqueue wrapper
  drm/tyr: add GPU reset handling

 drivers/gpu/drm/tyr/driver.rs |  38 +++----
 drivers/gpu/drm/tyr/reset.rs  | 180 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/tyr/tyr.rs    |   1 +
 rust/helpers/workqueue.c      |   6 ++
 rust/kernel/workqueue.rs      |  62 ++++++++++++
 5 files changed, 260 insertions(+), 27 deletions(-)
 create mode 100644 drivers/gpu/drm/tyr/reset.rs


base-commit: 0ccc0dac94bf2f5c6eb3e9e7f1014cd9dddf009f
-- 
2.51.2


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v1 RESEND 1/4] drm/tyr: clear reset IRQ before soft reset
  2026-03-13  9:16 [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API Onur Özkan
@ 2026-03-13  9:16 ` Onur Özkan
  2026-03-13 21:15   ` Claude review: " Claude Code Review Bot
  2026-03-13  9:16 ` [PATCH v1 RESEND 2/4] rust: add Work::disable_sync Onur Özkan
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Onur Özkan @ 2026-03-13  9:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: dakr, aliceryhl, daniel.almeida, airlied, simona, dri-devel,
	rust-for-linux, Onur Özkan, Deborah Brouwer

Clear RESET_COMPLETED before writing GPU_CMD_SOFT_RESET.

This is also used in
drivers/gpu/drm/panfrost/panfrost_gpu.c::panfrost_gpu_soft_reset
and avoids seeing old reset-complete status from a previous reset.

Tested-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Signed-off-by: Onur Özkan <work@onurozkan.dev>
---
 drivers/gpu/drm/tyr/driver.rs | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
index 69eff2a9e116..f7951804e4e0 100644
--- a/drivers/gpu/drm/tyr/driver.rs
+++ b/drivers/gpu/drm/tyr/driver.rs
@@ -91,6 +91,8 @@ unsafe impl Send for TyrDrmDeviceData {}
 unsafe impl Sync for TyrDrmDeviceData {}
 
 fn issue_soft_reset(dev: &Device<Bound>, iomem: &Devres<IoMem>) -> Result {
+    // Clear any stale reset-complete IRQ state before issuing a new soft reset.
+    regs::GPU_IRQ_CLEAR.write(dev, iomem, regs::GPU_IRQ_RAWSTAT_RESET_COMPLETED)?;
     regs::GPU_CMD.write(dev, iomem, regs::GPU_CMD_SOFT_RESET)?;
 
     poll::read_poll_timeout(
-- 
2.51.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Claude review: drm/tyr: clear reset IRQ before soft reset
  2026-03-13  9:16 ` [PATCH v1 RESEND 1/4] drm/tyr: clear reset IRQ before soft reset Onur Özkan
@ 2026-03-13 21:15   ` Claude Code Review Bot
  0 siblings, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-03-13 21:15 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Verdict: Looks good.**

This is a straightforward and correct fix. Clearing `RESET_COMPLETED` from `GPU_IRQ_CLEAR` before issuing `GPU_CMD_SOFT_RESET` prevents the subsequent `poll::read_poll_timeout` from seeing stale status from a previous reset cycle.

```rust
+    // Clear any stale reset-complete IRQ state before issuing a new soft reset.
+    regs::GPU_IRQ_CLEAR.write(dev, iomem, regs::GPU_IRQ_RAWSTAT_RESET_COMPLETED)?;
     regs::GPU_CMD.write(dev, iomem, regs::GPU_CMD_SOFT_RESET)?;
```

This matches the pattern used by `panfrost_gpu_soft_reset()` as the commit message states. No issues.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v1 RESEND 2/4] rust: add Work::disable_sync
  2026-03-13  9:16 [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API Onur Özkan
  2026-03-13  9:16 ` [PATCH v1 RESEND 1/4] drm/tyr: clear reset IRQ before soft reset Onur Özkan
@ 2026-03-13  9:16 ` Onur Özkan
  2026-03-13 12:00   ` Alice Ryhl
  2026-03-13 21:15   ` Claude review: " Claude Code Review Bot
  2026-03-13  9:16 ` [PATCH v1 RESEND 3/4] rust: add ordered workqueue wrapper Onur Özkan
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 15+ messages in thread
From: Onur Özkan @ 2026-03-13  9:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: dakr, aliceryhl, daniel.almeida, airlied, simona, dri-devel,
	rust-for-linux, Onur Özkan, Deborah Brouwer

Add Work::disable_sync() as a safe wrapper for disable_work_sync().

Drivers can use this during teardown to stop new queueing and wait for
queued or running work to finish before dropping related resources.

Tested-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Signed-off-by: Onur Özkan <work@onurozkan.dev>
---
 rust/kernel/workqueue.rs | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/rust/kernel/workqueue.rs b/rust/kernel/workqueue.rs
index 706e833e9702..6acc7b5ba31c 100644
--- a/rust/kernel/workqueue.rs
+++ b/rust/kernel/workqueue.rs
@@ -530,6 +530,21 @@ pub unsafe fn raw_get(ptr: *const Self) -> *mut bindings::work_struct {
         // the compiler does not complain that the `work` field is unused.
         unsafe { Opaque::cast_into(core::ptr::addr_of!((*ptr).work)) }
     }
+
+    /// Disables this work item and waits for queued/running executions to finish.
+    ///
+    /// # Safety
+    ///
+    /// Must be called from a sleepable context if the work was last queued on a non-BH
+    /// workqueue.
+    #[inline]
+    pub unsafe fn disable_sync(&self) {
+        let ptr: *const Self = self;
+        // SAFETY: `self` points to a valid initialized work.
+        let raw_work = unsafe { Self::raw_get(ptr) };
+        // SAFETY: `raw_work` is a valid embedded `work_struct`.
+        unsafe { bindings::disable_work_sync(raw_work) };
+    }
 }
 
 /// Declares that a type contains a [`Work<T, ID>`].
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 RESEND 2/4] rust: add Work::disable_sync
  2026-03-13  9:16 ` [PATCH v1 RESEND 2/4] rust: add Work::disable_sync Onur Özkan
@ 2026-03-13 12:00   ` Alice Ryhl
  2026-03-13 21:15   ` Claude review: " Claude Code Review Bot
  1 sibling, 0 replies; 15+ messages in thread
From: Alice Ryhl @ 2026-03-13 12:00 UTC (permalink / raw)
  To: Onur Özkan
  Cc: linux-kernel, dakr, daniel.almeida, airlied, simona, dri-devel,
	rust-for-linux, Deborah Brouwer

On Fri, Mar 13, 2026 at 10:17 AM Onur Özkan <work@onurozkan.dev> wrote:
>
> Add Work::disable_sync() as a safe wrapper for disable_work_sync().
>
> Drivers can use this during teardown to stop new queueing and wait for
> queued or running work to finish before dropping related resources.
>
> Tested-by: Deborah Brouwer <deborah.brouwer@collabora.com>
> Signed-off-by: Onur Özkan <work@onurozkan.dev>
> ---
>  rust/kernel/workqueue.rs | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/rust/kernel/workqueue.rs b/rust/kernel/workqueue.rs
> index 706e833e9702..6acc7b5ba31c 100644
> --- a/rust/kernel/workqueue.rs
> +++ b/rust/kernel/workqueue.rs
> @@ -530,6 +530,21 @@ pub unsafe fn raw_get(ptr: *const Self) -> *mut bindings::work_struct {
>          // the compiler does not complain that the `work` field is unused.
>          unsafe { Opaque::cast_into(core::ptr::addr_of!((*ptr).work)) }
>      }
> +
> +    /// Disables this work item and waits for queued/running executions to finish.
> +    ///
> +    /// # Safety
> +    ///
> +    /// Must be called from a sleepable context if the work was last queued on a non-BH
> +    /// workqueue.

We generally do not make functions unsafe just because they might sleep.

Alice

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Claude review: rust: add Work::disable_sync
  2026-03-13  9:16 ` [PATCH v1 RESEND 2/4] rust: add Work::disable_sync Onur Özkan
  2026-03-13 12:00   ` Alice Ryhl
@ 2026-03-13 21:15   ` Claude Code Review Bot
  1 sibling, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-03-13 21:15 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Minor concern: should this really be `unsafe`?**

```rust
+    pub unsafe fn disable_sync(&self) {
+        let ptr: *const Self = self;
+        // SAFETY: `self` points to a valid initialized work.
+        let raw_work = unsafe { Self::raw_get(ptr) };
+        // SAFETY: `raw_work` is a valid embedded `work_struct`.
+        unsafe { bindings::disable_work_sync(raw_work) };
+    }
```

The safety doc says:

```rust
+    /// # Safety
+    ///
+    /// Must be called from a sleepable context if the work was last queued on a non-BH
+    /// workqueue.
```

This is reasonable — `disable_work_sync()` can sleep, so calling it from atomic context would be a bug. However, this pattern ("must be called from sleepable context") is very common in kernel APIs and many Rust wrappers handle it differently (e.g., via type-level guarantees or just making it safe with a documented precondition, since calling from atomic context is already UB-adjacent behavior the kernel catches with `might_sleep()` debugging). Consider whether making this `unsafe` is the right abstraction level — the Rust workqueue `Queue` type's methods like `enqueue` are safe despite having their own context requirements.

Also, the `# Safety` section is documenting a caller requirement but this is a method on `&self` — the naming convention in kernel Rust is typically to use `# Safety` for `unsafe` preconditions, which this does, but you might also want a `# Panics` or general doc section clarifying when it's valid to call. Minor style nit.

**Missing C helper**: The patch adds a Rust wrapper for `disable_work_sync` but doesn't add a corresponding C helper in `rust/helpers/workqueue.c`. Looking at the binding, `bindings::disable_work_sync` would need to resolve to the C function. Since `disable_work_sync()` in `include/linux/workqueue.h` is a regular (non-inline, non-macro) extern function, this should work via direct bindgen without a helper. This is fine.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v1 RESEND 3/4] rust: add ordered workqueue wrapper
  2026-03-13  9:16 [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API Onur Özkan
  2026-03-13  9:16 ` [PATCH v1 RESEND 1/4] drm/tyr: clear reset IRQ before soft reset Onur Özkan
  2026-03-13  9:16 ` [PATCH v1 RESEND 2/4] rust: add Work::disable_sync Onur Özkan
@ 2026-03-13  9:16 ` Onur Özkan
  2026-03-13 21:15   ` Claude review: " Claude Code Review Bot
  2026-03-13  9:16 ` [PATCH v1 RESEND 4/4] drm/tyr: add GPU reset handling Onur Özkan
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 15+ messages in thread
From: Onur Özkan @ 2026-03-13  9:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: dakr, aliceryhl, daniel.almeida, airlied, simona, dri-devel,
	rust-for-linux, Onur Özkan, Deborah Brouwer

Add an owned OrderedQueue wrapper for alloc_ordered_workqueue() and
destroy_workqueue().

This gives Rust drivers a simple way to create and own an ordered
workqueue with automatic cleanup in Drop.

Tested-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Signed-off-by: Onur Özkan <work@onurozkan.dev>
---
 rust/helpers/workqueue.c |  6 +++++
 rust/kernel/workqueue.rs | 47 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/rust/helpers/workqueue.c b/rust/helpers/workqueue.c
index ce1c3a5b2150..7cd3b000a5b6 100644
--- a/rust/helpers/workqueue.c
+++ b/rust/helpers/workqueue.c
@@ -14,3 +14,9 @@ __rust_helper void rust_helper_init_work_with_key(struct work_struct *work,
 	INIT_LIST_HEAD(&work->entry);
 	work->func = func;
 }
+
+__rust_helper struct workqueue_struct *
+rust_helper_alloc_ordered_workqueue(const char *name, unsigned int flags)
+{
+	return alloc_ordered_workqueue("%s", flags, name);
+}
diff --git a/rust/kernel/workqueue.rs b/rust/kernel/workqueue.rs
index 6acc7b5ba31c..d5aa61a5ef93 100644
--- a/rust/kernel/workqueue.rs
+++ b/rust/kernel/workqueue.rs
@@ -195,6 +195,7 @@
     types::Opaque,
 };
 use core::marker::PhantomData;
+use core::ptr::NonNull;
 
 /// Creates a [`Work`] initialiser with the given name and a newly-created lock class.
 #[macro_export]
@@ -346,6 +347,52 @@ pub fn try_spawn<T: 'static + Send + FnOnce()>(
     }
 }
 
+/// A kernel work queue that allocates and owns an ordered `workqueue_struct`.
+///
+/// Unlike [`Queue`], [`OrderedQueue`] takes ownership of the underlying C
+/// workqueue and automatically destroys it when dropped.
+pub struct OrderedQueue(NonNull<bindings::workqueue_struct>);
+
+// SAFETY: Workqueue objects are thread-safe to share and use concurrently.
+unsafe impl Send for OrderedQueue {}
+// SAFETY: Workqueue objects are thread-safe to share and use concurrently.
+unsafe impl Sync for OrderedQueue {}
+
+impl OrderedQueue {
+    /// Allocates an ordered workqueue.
+    ///
+    /// It is equivalent to C's `alloc_ordered_workqueue()`.
+    pub fn new(name: &'static CStr, flags: u32) -> Result<Self> {
+        // SAFETY: `name` is a `&'static CStr`, guaranteeing a valid, null-terminated C
+        // string pointer for the duration of this call.
+        let ptr = unsafe { bindings::alloc_ordered_workqueue(name.as_char_ptr(), flags) };
+        let ptr = NonNull::new(ptr).ok_or(ENOMEM)?;
+        Ok(Self(ptr))
+    }
+
+    /// Enqueues a work item.
+    ///
+    /// This may fail if the work item is already enqueued in a workqueue.
+    ///
+    /// The work item will be submitted using `WORK_CPU_UNBOUND`.
+    pub fn enqueue<W, const ID: u64>(&self, w: W) -> W::EnqueueOutput
+    where
+        W: RawWorkItem<ID> + Send + 'static,
+    {
+        // SAFETY: `self.0` is valid while `self` is alive.
+        unsafe { Queue::from_raw(self.0.as_ptr()) }.enqueue(w)
+    }
+}
+
+impl Drop for OrderedQueue {
+    fn drop(&mut self) {
+        // SAFETY:
+        // - Pointer comes from `alloc_ordered_workqueue()` and is owned by `self`.
+        // - `OrderedQueue` does not expose delayed scheduling API.
+        unsafe { bindings::destroy_workqueue(self.0.as_ptr()) };
+    }
+}
+
 /// A helper type used in [`try_spawn`].
 ///
 /// [`try_spawn`]: Queue::try_spawn
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Claude review: rust: add ordered workqueue wrapper
  2026-03-13  9:16 ` [PATCH v1 RESEND 3/4] rust: add ordered workqueue wrapper Onur Özkan
@ 2026-03-13 21:15   ` Claude Code Review Bot
  0 siblings, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-03-13 21:15 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Concerns about the helper and API design.**

The C helper:

```c
+rust_helper_alloc_ordered_workqueue(const char *name, unsigned int flags)
+{
+	return alloc_ordered_workqueue("%s", flags, name);
+}
```

This correctly avoids format string issues by using `"%s"` as the format. Good.

The `OrderedQueue` type:

```rust
+pub struct OrderedQueue(NonNull<bindings::workqueue_struct>);
```

1. **`enqueue` uses `Queue::from_raw`**: The `enqueue` method does:

```rust
+    pub fn enqueue<W, const ID: u64>(&self, w: W) -> W::EnqueueOutput
+    where
+        W: RawWorkItem<ID> + Send + 'static,
+    {
+        // SAFETY: `self.0` is valid while `self` is alive.
+        unsafe { Queue::from_raw(self.0.as_ptr()) }.enqueue(w)
+    }
```

This creates a temporary `Queue` reference from the raw pointer just to call `enqueue`. This works but means `OrderedQueue` doesn't expose other `Queue` methods (like `try_spawn`). This is fine for the current use case but you might consider providing an `as_queue(&self) -> &Queue` method or implementing `Deref<Target=Queue>` for more general use. That said, keeping the API minimal is also reasonable for now.

2. **Drop safety comment**: The drop impl says:

```rust
+        // - `OrderedQueue` does not expose delayed scheduling API.
```

This safety comment seems tangential — the key invariant for `destroy_workqueue` is that the workqueue was allocated by `alloc_ordered_workqueue` and is being destroyed exactly once, which is guaranteed by `NonNull` ownership semantics.

3. **No `flush_workqueue` method**: Before destruction, `destroy_workqueue` in C already drains pending work. This is fine, but callers using `disable_work_sync` (patch 2) before dropping individual work items should be aware of the ordering — which patch 4 handles correctly.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v1 RESEND 4/4] drm/tyr: add GPU reset handling
  2026-03-13  9:16 [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API Onur Özkan
                   ` (2 preceding siblings ...)
  2026-03-13  9:16 ` [PATCH v1 RESEND 3/4] rust: add ordered workqueue wrapper Onur Özkan
@ 2026-03-13  9:16 ` Onur Özkan
  2026-03-13 14:56   ` Daniel Almeida
  2026-03-13 21:15   ` Claude review: " Claude Code Review Bot
  2026-03-13  9:52 ` [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API Alice Ryhl
  2026-03-13 21:15 ` Claude review: " Claude Code Review Bot
  5 siblings, 2 replies; 15+ messages in thread
From: Onur Özkan @ 2026-03-13  9:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: dakr, aliceryhl, daniel.almeida, airlied, simona, dri-devel,
	rust-for-linux, Onur Özkan, Deborah Brouwer

Move Tyr reset logic into a new reset module and add async reset work.

This adds:
- ResetHandle with internal controller state
- a dedicated ordered reset workqueue
- a pending flag to avoid duplicate queued resets
- run_reset() as the shared synchronous reset helper

Probe now calls reset::run_reset() before normal init. Driver data now
keeps ResetHandle so reset work is drained before clocks and regulators
are dropped.

Tested-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Signed-off-by: Onur Özkan <work@onurozkan.dev>
---
 drivers/gpu/drm/tyr/driver.rs |  40 +++-----
 drivers/gpu/drm/tyr/reset.rs  | 180 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/tyr/tyr.rs    |   1 +
 3 files changed, 192 insertions(+), 29 deletions(-)
 create mode 100644 drivers/gpu/drm/tyr/reset.rs

diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
index f7951804e4e0..c80238a21ff2 100644
--- a/drivers/gpu/drm/tyr/driver.rs
+++ b/drivers/gpu/drm/tyr/driver.rs
@@ -6,11 +6,8 @@
         OptionalClk, //
     },
     device::{
-        Bound,
-        Core,
-        Device, //
+        Core, //
     },
-    devres::Devres,
     dma::{
         Device as DmaDevice,
         DmaMask, //
@@ -22,10 +19,7 @@
         Registered,
         UnregisteredDevice, //
     },
-    io::poll,
-    new_mutex,
-    of,
-    platform,
+    new_mutex, of, platform,
     prelude::*,
     regulator,
     regulator::Regulator,
@@ -35,17 +29,15 @@
         Arc,
         Mutex, //
     },
-    time, //
 };
 
 use crate::{
     file::TyrDrmFileData,
     fw::Firmware,
     gem::BoData,
-    gpu,
     gpu::GpuInfo,
     mmu::Mmu,
-    regs, //
+    reset, //
 };
 
 pub(crate) type IoMem = kernel::io::mem::IoMem<SZ_2M>;
@@ -62,6 +54,11 @@ pub(crate) struct TyrPlatformDriverData {
 
 #[pin_data]
 pub(crate) struct TyrDrmDeviceData {
+    // `ResetHandle::drop()` drains queued/running works and this must happen
+    // before clocks/regulators are dropped. So keep this field before them to
+    // ensure the correct drop order.
+    pub(crate) reset: reset::ResetHandle,
+
     pub(crate) pdev: ARef<platform::Device>,
 
     pub(crate) fw: Arc<Firmware>,
@@ -90,22 +87,6 @@ unsafe impl Send for TyrDrmDeviceData {}
 // SAFETY: This will be removed in a future patch.
 unsafe impl Sync for TyrDrmDeviceData {}
 
-fn issue_soft_reset(dev: &Device<Bound>, iomem: &Devres<IoMem>) -> Result {
-    // Clear any stale reset-complete IRQ state before issuing a new soft reset.
-    regs::GPU_IRQ_CLEAR.write(dev, iomem, regs::GPU_IRQ_RAWSTAT_RESET_COMPLETED)?;
-    regs::GPU_CMD.write(dev, iomem, regs::GPU_CMD_SOFT_RESET)?;
-
-    poll::read_poll_timeout(
-        || regs::GPU_IRQ_RAWSTAT.read(dev, iomem),
-        |status| *status & regs::GPU_IRQ_RAWSTAT_RESET_COMPLETED != 0,
-        time::Delta::from_millis(1),
-        time::Delta::from_millis(100),
-    )
-    .inspect_err(|_| dev_err!(dev, "GPU reset failed."))?;
-
-    Ok(())
-}
-
 kernel::of_device_table!(
     OF_TABLE,
     MODULE_OF_TABLE,
@@ -138,8 +119,7 @@ fn probe(
         let request = pdev.io_request_by_index(0).ok_or(ENODEV)?;
         let iomem = Arc::pin_init(request.iomap_sized::<SZ_2M>(), GFP_KERNEL)?;
 
-        issue_soft_reset(pdev.as_ref(), &iomem)?;
-        gpu::l2_power_on(pdev.as_ref(), &iomem)?;
+        reset::run_reset(pdev.as_ref(), &iomem)?;
 
         let gpu_info = GpuInfo::new(pdev.as_ref(), &iomem)?;
         gpu_info.log(pdev);
@@ -153,6 +133,7 @@ fn probe(
 
         let uninit_ddev = UnregisteredDevice::<TyrDrmDriver>::new(pdev.as_ref())?;
         let platform: ARef<platform::Device> = pdev.into();
+        let reset = reset::ResetHandle::new(platform.clone(), iomem.clone())?;
 
         let mmu = Mmu::new(pdev, iomem.as_arc_borrow(), &gpu_info)?;
 
@@ -178,6 +159,7 @@ fn probe(
                     _mali: mali_regulator,
                     _sram: sram_regulator,
                 }),
+                reset,
                 gpu_info,
         });
         let ddev = Registration::new_foreign_owned(uninit_ddev, pdev.as_ref(), data, 0)?;
diff --git a/drivers/gpu/drm/tyr/reset.rs b/drivers/gpu/drm/tyr/reset.rs
new file mode 100644
index 000000000000..29dfae98b0dd
--- /dev/null
+++ b/drivers/gpu/drm/tyr/reset.rs
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0 or MIT
+
+//! Provides asynchronous reset handling for the Tyr DRM driver via
+//! [`ResetHandle`], which runs reset work on a dedicated ordered
+//! workqueue and avoids duplicate pending resets.
+
+use kernel::{
+    device::{
+        Bound,
+        Device, //
+    },
+    devres::Devres,
+    io::poll,
+    platform,
+    prelude::*,
+    sync::{
+        aref::ARef,
+        atomic::{
+            Acquire,
+            Atomic,
+            Relaxed,
+            Release, //
+        },
+        Arc,
+    },
+    time,
+    workqueue::{
+        self,
+        Work, //
+    },
+};
+
+use crate::{
+    driver::IoMem,
+    gpu,
+    regs, //
+};
+
+/// Manages asynchronous GPU reset handling and ensures only a single reset
+/// work is pending at a time.
+#[pin_data]
+struct Controller {
+    /// Platform device reference needed for reset operations and logging.
+    pdev: ARef<platform::Device>,
+    /// Mapped register space needed for reset operations.
+    iomem: Arc<Devres<IoMem>>,
+    /// Atomic flag for controlling the scheduling pending state.
+    pending: Atomic<bool>,
+    /// Dedicated ordered workqueue for reset operations.
+    wq: workqueue::OrderedQueue,
+    /// Work item backing async reset processing.
+    #[pin]
+    work: Work<Controller>,
+}
+
+kernel::impl_has_work! {
+    impl HasWork<Controller> for Controller { self.work }
+}
+
+impl workqueue::WorkItem for Controller {
+    type Pointer = Arc<Self>;
+
+    fn run(this: Arc<Self>) {
+        this.reset_work();
+    }
+}
+
+impl Controller {
+    /// Creates a [`Controller`] instance.
+    fn new(pdev: ARef<platform::Device>, iomem: Arc<Devres<IoMem>>) -> Result<Arc<Self>> {
+        let wq = workqueue::OrderedQueue::new(c"tyr-reset-wq", 0)?;
+
+        Arc::pin_init(
+            try_pin_init!(Self {
+                pdev,
+                iomem,
+                pending: Atomic::new(false),
+                wq,
+                work <- kernel::new_work!("tyr::reset"),
+            }),
+            GFP_KERNEL,
+        )
+    }
+
+    /// Processes one scheduled reset request.
+    ///
+    /// Panthor reference:
+    /// - drivers/gpu/drm/panthor/panthor_device.c::panthor_device_reset_work()
+    fn reset_work(self: &Arc<Self>) {
+        dev_info!(self.pdev.as_ref(), "GPU reset work is started.\n");
+
+        // SAFETY: `Controller` is part of driver-private data and only exists
+        // while the platform device is bound.
+        let pdev = unsafe { self.pdev.as_ref().as_bound() };
+        if let Err(e) = run_reset(pdev, &self.iomem) {
+            dev_err!(self.pdev.as_ref(), "GPU reset failed: {:?}\n", e);
+        } else {
+            dev_info!(self.pdev.as_ref(), "GPU reset work is done.\n");
+        }
+
+        self.pending.store(false, Release);
+    }
+}
+
+/// Reset handle that shuts down pending work gracefully on drop.
+pub(crate) struct ResetHandle(Arc<Controller>);
+
+impl ResetHandle {
+    /// Creates a [`ResetHandle`] instance.
+    pub(crate) fn new(pdev: ARef<platform::Device>, iomem: Arc<Devres<IoMem>>) -> Result<Self> {
+        Ok(Self(Controller::new(pdev, iomem)?))
+    }
+
+    /// Schedules reset work.
+    #[expect(dead_code)]
+    pub(crate) fn schedule(&self) {
+        // TODO: Similar to `panthor_device_schedule_reset()` in Panthor, add a
+        // power management check once Tyr supports it.
+
+        // Keep only one reset request running or queued. If one is already pending,
+        // we ignore new schedule requests.
+        if self.0.pending.cmpxchg(false, true, Relaxed).is_ok()
+            && self.0.wq.enqueue(self.0.clone()).is_err()
+        {
+            self.0.pending.store(false, Release);
+        }
+    }
+
+    /// Returns true if a reset is queued or in progress.
+    ///
+    /// Note that the state can change immediately after the return.
+    #[inline]
+    #[expect(dead_code)]
+    pub(crate) fn is_pending(&self) -> bool {
+        self.0.pending.load(Acquire)
+    }
+}
+
+impl Drop for ResetHandle {
+    fn drop(&mut self) {
+        // Drain queued/running work and block future queueing attempts for this
+        // work item before clocks/regulators are torn down.
+        // SAFETY: drop executes in a sleepable context.
+        unsafe { self.0.work.disable_sync() };
+    }
+}
+
+/// Issues a soft reset command and waits for reset-complete IRQ status.
+fn issue_soft_reset(dev: &Device<Bound>, iomem: &Devres<IoMem>) -> Result {
+    // Clear any stale reset-complete IRQ state before issuing a new soft reset.
+    regs::GPU_IRQ_CLEAR.write(dev, iomem, regs::GPU_IRQ_RAWSTAT_RESET_COMPLETED)?;
+    regs::GPU_CMD.write(dev, iomem, regs::GPU_CMD_SOFT_RESET)?;
+
+    poll::read_poll_timeout(
+        || regs::GPU_IRQ_RAWSTAT.read(dev, iomem),
+        |status| *status & regs::GPU_IRQ_RAWSTAT_RESET_COMPLETED != 0,
+        time::Delta::from_millis(1),
+        time::Delta::from_millis(100),
+    )
+    .inspect_err(|_| dev_err!(dev, "GPU reset failed."))?;
+
+    Ok(())
+}
+
+/// Runs one synchronous GPU reset pass.
+///
+/// Its visibility is `pub(super)` only so the probe path can run an
+/// initial reset; it is not part of this module's public API.
+///
+/// On success, the GPU is left in a state suitable for reinitialization.
+///
+/// The reset sequence is as follows:
+/// 1. Trigger a GPU soft reset.
+/// 2. Wait for the reset-complete IRQ status.
+/// 3. Power L2 back on.
+pub(super) fn run_reset(dev: &Device<Bound>, iomem: &Devres<IoMem>) -> Result {
+    issue_soft_reset(dev, iomem)?;
+    gpu::l2_power_on(dev, iomem)?;
+    Ok(())
+}
diff --git a/drivers/gpu/drm/tyr/tyr.rs b/drivers/gpu/drm/tyr/tyr.rs
index 18b0668bb217..d0349bc49f27 100644
--- a/drivers/gpu/drm/tyr/tyr.rs
+++ b/drivers/gpu/drm/tyr/tyr.rs
@@ -14,6 +14,7 @@
 mod gpu;
 mod mmu;
 mod regs;
+mod reset;
 mod slot;
 mod vm;
 
-- 
2.51.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 RESEND 4/4] drm/tyr: add GPU reset handling
  2026-03-13  9:16 ` [PATCH v1 RESEND 4/4] drm/tyr: add GPU reset handling Onur Özkan
@ 2026-03-13 14:56   ` Daniel Almeida
  2026-03-13 21:15   ` Claude review: " Claude Code Review Bot
  1 sibling, 0 replies; 15+ messages in thread
From: Daniel Almeida @ 2026-03-13 14:56 UTC (permalink / raw)
  To: Onur Özkan
  Cc: linux-kernel, dakr, aliceryhl, airlied, simona, dri-devel,
	rust-for-linux, Deborah Brouwer



> On 13 Mar 2026, at 06:16, Onur Özkan <work@onurozkan.dev> wrote:
> 
> Move Tyr reset logic into a new reset module and add async reset work.
> 
> This adds:
> - ResetHandle with internal controller state
> - a dedicated ordered reset workqueue
> - a pending flag to avoid duplicate queued resets
> - run_reset() as the shared synchronous reset helper
> 
> Probe now calls reset::run_reset() before normal init. Driver data now
> keeps ResetHandle so reset work is drained before clocks and regulators
> are dropped.
> 
> Tested-by: Deborah Brouwer <deborah.brouwer@collabora.com>
> Signed-off-by: Onur Özkan <work@onurozkan.dev>
> ---
> drivers/gpu/drm/tyr/driver.rs |  40 +++-----
> drivers/gpu/drm/tyr/reset.rs  | 180 ++++++++++++++++++++++++++++++++++
> drivers/gpu/drm/tyr/tyr.rs    |   1 +
> 3 files changed, 192 insertions(+), 29 deletions(-)
> create mode 100644 drivers/gpu/drm/tyr/reset.rs
> 
> diff --git a/drivers/gpu/drm/tyr/driver.rs b/drivers/gpu/drm/tyr/driver.rs
> index f7951804e4e0..c80238a21ff2 100644
> --- a/drivers/gpu/drm/tyr/driver.rs
> +++ b/drivers/gpu/drm/tyr/driver.rs
> @@ -6,11 +6,8 @@
>         OptionalClk, //
>     },
>     device::{
> -        Bound,
> -        Core,
> -        Device, //
> +        Core, //
>     },
> -    devres::Devres,
>     dma::{
>         Device as DmaDevice,
>         DmaMask, //
> @@ -22,10 +19,7 @@
>         Registered,
>         UnregisteredDevice, //
>     },
> -    io::poll,
> -    new_mutex,
> -    of,
> -    platform,
> +    new_mutex, of, platform,
>     prelude::*,
>     regulator,
>     regulator::Regulator,
> @@ -35,17 +29,15 @@
>         Arc,
>         Mutex, //
>     },
> -    time, //
> };
> 
> use crate::{
>     file::TyrDrmFileData,
>     fw::Firmware,
>     gem::BoData,
> -    gpu,
>     gpu::GpuInfo,
>     mmu::Mmu,
> -    regs, //
> +    reset, //
> };
> 
> pub(crate) type IoMem = kernel::io::mem::IoMem<SZ_2M>;
> @@ -62,6 +54,11 @@ pub(crate) struct TyrPlatformDriverData {
> 
> #[pin_data]
> pub(crate) struct TyrDrmDeviceData {
> +    // `ResetHandle::drop()` drains queued/running works and this must happen
> +    // before clocks/regulators are dropped. So keep this field before them to
> +    // ensure the correct drop order.
> +    pub(crate) reset: reset::ResetHandle,
> +
>     pub(crate) pdev: ARef<platform::Device>,
> 
>     pub(crate) fw: Arc<Firmware>,
> @@ -90,22 +87,6 @@ unsafe impl Send for TyrDrmDeviceData {}
> // SAFETY: This will be removed in a future patch.
> unsafe impl Sync for TyrDrmDeviceData {}
> 
> -fn issue_soft_reset(dev: &Device<Bound>, iomem: &Devres<IoMem>) -> Result {
> -    // Clear any stale reset-complete IRQ state before issuing a new soft reset.
> -    regs::GPU_IRQ_CLEAR.write(dev, iomem, regs::GPU_IRQ_RAWSTAT_RESET_COMPLETED)?;
> -    regs::GPU_CMD.write(dev, iomem, regs::GPU_CMD_SOFT_RESET)?;
> -
> -    poll::read_poll_timeout(
> -        || regs::GPU_IRQ_RAWSTAT.read(dev, iomem),
> -        |status| *status & regs::GPU_IRQ_RAWSTAT_RESET_COMPLETED != 0,
> -        time::Delta::from_millis(1),
> -        time::Delta::from_millis(100),
> -    )
> -    .inspect_err(|_| dev_err!(dev, "GPU reset failed."))?;
> -
> -    Ok(())
> -}
> -
> kernel::of_device_table!(
>     OF_TABLE,
>     MODULE_OF_TABLE,
> @@ -138,8 +119,7 @@ fn probe(
>         let request = pdev.io_request_by_index(0).ok_or(ENODEV)?;
>         let iomem = Arc::pin_init(request.iomap_sized::<SZ_2M>(), GFP_KERNEL)?;
> 
> -        issue_soft_reset(pdev.as_ref(), &iomem)?;
> -        gpu::l2_power_on(pdev.as_ref(), &iomem)?;
> +        reset::run_reset(pdev.as_ref(), &iomem)?;
> 
>         let gpu_info = GpuInfo::new(pdev.as_ref(), &iomem)?;
>         gpu_info.log(pdev);
> @@ -153,6 +133,7 @@ fn probe(
> 
>         let uninit_ddev = UnregisteredDevice::<TyrDrmDriver>::new(pdev.as_ref())?;
>         let platform: ARef<platform::Device> = pdev.into();
> +        let reset = reset::ResetHandle::new(platform.clone(), iomem.clone())?;
> 
>         let mmu = Mmu::new(pdev, iomem.as_arc_borrow(), &gpu_info)?;
> 
> @@ -178,6 +159,7 @@ fn probe(
>                     _mali: mali_regulator,
>                     _sram: sram_regulator,
>                 }),
> +                reset,
>                 gpu_info,
>         });
>         let ddev = Registration::new_foreign_owned(uninit_ddev, pdev.as_ref(), data, 0)?;
> diff --git a/drivers/gpu/drm/tyr/reset.rs b/drivers/gpu/drm/tyr/reset.rs
> new file mode 100644
> index 000000000000..29dfae98b0dd
> --- /dev/null
> +++ b/drivers/gpu/drm/tyr/reset.rs
> @@ -0,0 +1,180 @@
> +// SPDX-License-Identifier: GPL-2.0 or MIT
> +
> +//! Provides asynchronous reset handling for the Tyr DRM driver via
> +//! [`ResetHandle`], which runs reset work on a dedicated ordered
> +//! workqueue and avoids duplicate pending resets.
> +
> +use kernel::{
> +    device::{
> +        Bound,
> +        Device, //
> +    },
> +    devres::Devres,
> +    io::poll,
> +    platform,
> +    prelude::*,
> +    sync::{
> +        aref::ARef,
> +        atomic::{
> +            Acquire,
> +            Atomic,
> +            Relaxed,
> +            Release, //
> +        },
> +        Arc,
> +    },
> +    time,
> +    workqueue::{
> +        self,
> +        Work, //
> +    },
> +};
> +
> +use crate::{
> +    driver::IoMem,
> +    gpu,
> +    regs, //
> +};
> +
> +/// Manages asynchronous GPU reset handling and ensures only a single reset
> +/// work is pending at a time.
> +#[pin_data]
> +struct Controller {
> +    /// Platform device reference needed for reset operations and logging.
> +    pdev: ARef<platform::Device>,
> +    /// Mapped register space needed for reset operations.
> +    iomem: Arc<Devres<IoMem>>,
> +    /// Atomic flag for controlling the scheduling pending state.
> +    pending: Atomic<bool>,
> +    /// Dedicated ordered workqueue for reset operations.
> +    wq: workqueue::OrderedQueue,
> +    /// Work item backing async reset processing.
> +    #[pin]
> +    work: Work<Controller>,
> +}
> +
> +kernel::impl_has_work! {
> +    impl HasWork<Controller> for Controller { self.work }
> +}
> +
> +impl workqueue::WorkItem for Controller {
> +    type Pointer = Arc<Self>;
> +
> +    fn run(this: Arc<Self>) {
> +        this.reset_work();
> +    }
> +}
> +
> +impl Controller {
> +    /// Creates a [`Controller`] instance.
> +    fn new(pdev: ARef<platform::Device>, iomem: Arc<Devres<IoMem>>) -> Result<Arc<Self>> {
> +        let wq = workqueue::OrderedQueue::new(c"tyr-reset-wq", 0)?;
> +
> +        Arc::pin_init(
> +            try_pin_init!(Self {
> +                pdev,
> +                iomem,
> +                pending: Atomic::new(false),
> +                wq,
> +                work <- kernel::new_work!("tyr::reset"),
> +            }),
> +            GFP_KERNEL,
> +        )
> +    }
> +
> +    /// Processes one scheduled reset request.
> +    ///
> +    /// Panthor reference:
> +    /// - drivers/gpu/drm/panthor/panthor_device.c::panthor_device_reset_work()
> +    fn reset_work(self: &Arc<Self>) {
> +        dev_info!(self.pdev.as_ref(), "GPU reset work is started.\n");
> +
> +        // SAFETY: `Controller` is part of driver-private data and only exists
> +        // while the platform device is bound.
> +        let pdev = unsafe { self.pdev.as_ref().as_bound() };
> +        if let Err(e) = run_reset(pdev, &self.iomem) {
> +            dev_err!(self.pdev.as_ref(), "GPU reset failed: {:?}\n", e);
> +        } else {
> +            dev_info!(self.pdev.as_ref(), "GPU reset work is done.\n");
> +        }

Can we have more descriptive strings here? A user cares little for
implementation details such as “reset work”, what they care about is
that the hardware is undergoing a reset.

> +
> +        self.pending.store(false, Release);
> +    }
> +}
> +
> +/// Reset handle that shuts down pending work gracefully on drop.
> +pub(crate) struct ResetHandle(Arc<Controller>);
> +

Why is this an Arc? There seems to be a single owner?

> +impl ResetHandle {
> +    /// Creates a [`ResetHandle`] instance.
> +    pub(crate) fn new(pdev: ARef<platform::Device>, iomem: Arc<Devres<IoMem>>) -> Result<Self> {
> +        Ok(Self(Controller::new(pdev, iomem)?))
> +    }
> +
> +    /// Schedules reset work.
> +    #[expect(dead_code)]
> +    pub(crate) fn schedule(&self) {
> +        // TODO: Similar to `panthor_device_schedule_reset()` in Panthor, add a
> +        // power management check once Tyr supports it.
> +
> +        // Keep only one reset request running or queued. If one is already pending,
> +        // we ignore new schedule requests.
> +        if self.0.pending.cmpxchg(false, true, Relaxed).is_ok()
> +            && self.0.wq.enqueue(self.0.clone()).is_err()
> +        {
> +            self.0.pending.store(false, Release);
> +        }
> +    }
> +
> +    /// Returns true if a reset is queued or in progress.
> +    ///
> +    /// Note that the state can change immediately after the return.
> +    #[inline]
> +    #[expect(dead_code)]
> +    pub(crate) fn is_pending(&self) -> bool {
> +        self.0.pending.load(Acquire)
> +    }

> +}
> +
> +impl Drop for ResetHandle {
> +    fn drop(&mut self) {
> +        // Drain queued/running work and block future queueing attempts for this
> +        // work item before clocks/regulators are torn down.
> +        // SAFETY: drop executes in a sleepable context.
> +        unsafe { self.0.work.disable_sync() };
> +    }
> +}
> +
> +/// Issues a soft reset command and waits for reset-complete IRQ status.
> +fn issue_soft_reset(dev: &Device<Bound>, iomem: &Devres<IoMem>) -> Result {
> +    // Clear any stale reset-complete IRQ state before issuing a new soft reset.
> +    regs::GPU_IRQ_CLEAR.write(dev, iomem, regs::GPU_IRQ_RAWSTAT_RESET_COMPLETED)?;
> +    regs::GPU_CMD.write(dev, iomem, regs::GPU_CMD_SOFT_RESET)?;
> +
> +    poll::read_poll_timeout(
> +        || regs::GPU_IRQ_RAWSTAT.read(dev, iomem),
> +        |status| *status & regs::GPU_IRQ_RAWSTAT_RESET_COMPLETED != 0,
> +        time::Delta::from_millis(1),
> +        time::Delta::from_millis(100),
> +    )
> +    .inspect_err(|_| dev_err!(dev, "GPU reset failed."))?;
> +
> +    Ok(())
> +}
> +
> +/// Runs one synchronous GPU reset pass.
> +///
> +/// Its visibility is `pub(super)` only so the probe path can run an
> +/// initial reset; it is not part of this module's public API.
> +///
> +/// On success, the GPU is left in a state suitable for reinitialization.
> +///
> +/// The reset sequence is as follows:
> +/// 1. Trigger a GPU soft reset.
> +/// 2. Wait for the reset-complete IRQ status.
> +/// 3. Power L2 back on.
> +pub(super) fn run_reset(dev: &Device<Bound>, iomem: &Devres<IoMem>) -> Result {
> +    issue_soft_reset(dev, iomem)?;
> +    gpu::l2_power_on(dev, iomem)?;
> +    Ok(())
> +}
> diff --git a/drivers/gpu/drm/tyr/tyr.rs b/drivers/gpu/drm/tyr/tyr.rs
> index 18b0668bb217..d0349bc49f27 100644
> --- a/drivers/gpu/drm/tyr/tyr.rs
> +++ b/drivers/gpu/drm/tyr/tyr.rs
> @@ -14,6 +14,7 @@
> mod gpu;
> mod mmu;
> mod regs;
> +mod reset;
> mod slot;
> mod vm;
> 
> -- 
> 2.51.2
> 



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Claude review: drm/tyr: add GPU reset handling
  2026-03-13  9:16 ` [PATCH v1 RESEND 4/4] drm/tyr: add GPU reset handling Onur Özkan
  2026-03-13 14:56   ` Daniel Almeida
@ 2026-03-13 21:15   ` Claude Code Review Bot
  1 sibling, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-03-13 21:15 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**This is the main patch and has several items worth discussing.**

#### 1. `as_bound()` safety in async work context

```rust
+    fn reset_work(self: &Arc<Self>) {
+        // ...
+        // SAFETY: `Controller` is part of driver-private data and only exists
+        // while the platform device is bound.
+        let pdev = unsafe { self.pdev.as_ref().as_bound() };
```

This safety argument is the weakest point in the series. The `Controller` holds an `ARef<platform::Device>` which is a refcounted reference — it keeps the device alive, but **being alive is not the same as being bound**. The device could be unbound (e.g., during module removal or device hot-unplug) while the workqueue work item is still executing. The `disable_sync()` in `ResetHandle::drop()` is meant to prevent this, but the safety argument in the comment doesn't articulate this clearly.

The safety justification should reference that `ResetHandle::drop()` calls `disable_sync()` which guarantees no work is running after that point, and `ResetHandle` is dropped before the device unbinds (because it's embedded in `TyrDrmDeviceData` which is driver-private data released during unbind). This chain of reasoning needs to be explicit in the safety comment.

#### 2. Memory ordering on `pending` flag

```rust
+        if self.0.pending.cmpxchg(false, true, Relaxed).is_ok()
+            && self.0.wq.enqueue(self.0.clone()).is_err()
+        {
+            self.0.pending.store(false, Release);
+        }
```

Using `Relaxed` on the cmpxchg is consistent with what Panthor does (`atomic_cmpxchg` in C defaults to relaxed on many architectures). However, there's a subtle issue: if `schedule()` is called from an IRQ handler or another CPU that has just observed a hardware fault, the `Relaxed` ordering means there's no guarantee that the work function will see the state that triggered the reset request. In practice, the workqueue infrastructure provides sufficient memory barriers (work item execution implies acquire semantics from the queue), so this is likely fine, but it's worth noting.

The `Release` store on the failure path is correct — it ensures that the "not pending" state is visible to other CPUs.

In `reset_work`:
```rust
+        self.pending.store(false, Release);
```

This `Release` at the end of `reset_work` pairs with the `Acquire` load in `is_pending()`:
```rust
+    pub(crate) fn is_pending(&self) -> bool {
+        self.0.pending.load(Acquire)
+    }
```

This pairing is correct — `Acquire` on load ensures that if `is_pending()` returns `false`, the caller sees all memory effects of the completed reset.

However, the `cmpxchg` in `schedule()` uses `Relaxed`, which means a successful cmpxchg (false→true) doesn't synchronize with the `Release` store at the end of the previous `reset_work()`. This means `schedule()` callers don't have a happens-before relationship with the completion of a prior reset. For the current code this is fine since `schedule()` doesn't read any shared state that depends on the reset having completed, but it's a footgun for future modifications. Consider using `Acquire` on the cmpxchg for safety.

#### 3. Drop ordering comment

```rust
+    // `ResetHandle::drop()` drains queued/running works and this must happen
+    // before clocks/regulators are dropped. So keep this field before them to
+    // ensure the correct drop order.
+    pub(crate) reset: reset::ResetHandle,
```

Relying on struct field declaration order for drop ordering is correct in Rust (fields are dropped in declaration order), and the comment documenting this is good practice. This is a well-known pattern.

#### 4. Formatting changes mixed in

The patch makes some unrelated formatting changes in `driver.rs`:

```rust
-    device::{
-        Bound,
-        Core,
-        Device, //
-    },
+    device::{
+        Core, //
+    },
```

and:

```rust
-    new_mutex,
-    of,
-    platform,
+    new_mutex, of, platform,
```

These are fine as cleanup of now-unused imports/reformatting, but the commit message doesn't mention them. They're a consequence of the refactoring so this is acceptable.

#### 5. `run_reset` visibility

```rust
+pub(super) fn run_reset(dev: &Device<Bound>, iomem: &Devres<IoMem>) -> Result {
```

The doc comment says:

```rust
+/// Its visibility is `pub(super)` only so the probe path can run an
+/// initial reset; it is not part of this module's public API.
```

This is `pub(super)` but could also be `pub(crate)` since `driver.rs` is a sibling module, not a parent. Actually, looking at the module structure (`tyr.rs` declares `mod driver` and `mod reset`), `pub(super)` makes `run_reset` visible to `tyr.rs` and its child modules including `driver.rs`. This is correct.

#### 6. Minor: `#[expect(dead_code)]`

```rust
+    #[expect(dead_code)]
+    pub(crate) fn schedule(&self) {
```

```rust
+    #[expect(dead_code)]
+    pub(crate) fn is_pending(&self) -> bool {
```

These are currently unused and marked with `#[expect(dead_code)]`. This is fine for an incremental series where callers will be added later, but in upstream submission you'd typically want the caller in the same series or the function added when it's needed. The `#[expect(dead_code)]` is better than `#[allow(dead_code)]` since it will warn when the attribute becomes unnecessary.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API
  2026-03-13  9:16 [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API Onur Özkan
                   ` (3 preceding siblings ...)
  2026-03-13  9:16 ` [PATCH v1 RESEND 4/4] drm/tyr: add GPU reset handling Onur Özkan
@ 2026-03-13  9:52 ` Alice Ryhl
  2026-03-13 11:12   ` Onur Özkan
  2026-03-13 21:15 ` Claude review: " Claude Code Review Bot
  5 siblings, 1 reply; 15+ messages in thread
From: Alice Ryhl @ 2026-03-13  9:52 UTC (permalink / raw)
  To: Onur Özkan
  Cc: linux-kernel, dakr, daniel.almeida, airlied, simona, dri-devel,
	rust-for-linux

On Fri, Mar 13, 2026 at 12:16:40PM +0300, Onur Özkan wrote:
> This series adds GPU reset handling support for Tyr in a new module
> drivers/gpu/drm/tyr/driver.rs which encapsulates the low-level reset
> controller internals and exposes a ResetHandle API to the driver.
> 
> The reset module owns reset state, queueing and execution ordering
> through OrderedQueue and handles duplicate/concurrent reset requests
> with a pending flag.
> 
> Apart from the reset module, the first 3 patches:
> 
> - Fixes a potential reset-complete stale state bug by clearing completed
>   state before doing soft reset.
> - Adds Work::disable_sync() (wrapper of bindings::disable_work_sync).
> - Adds OrderedQueue support.
> 
> Runtime tested on hardware by Deborah Brouwer (see [1]) and myself.
> 
> [1]: https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/63#note_3364131
> 
> Link: https://gitlab.freedesktop.org/panfrost/linux/-/issues/28
> ---
> 
> Onur Özkan (4):
>   drm/tyr: clear reset IRQ before soft reset
>   rust: add Work::disable_sync
>   rust: add ordered workqueue wrapper

I actually added ordered workqueue support here:
https://lore.kernel.org/all/20260312-create-workqueue-v4-0-ea39c351c38f@google.com/

Alice

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API
  2026-03-13  9:52 ` [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API Alice Ryhl
@ 2026-03-13 11:12   ` Onur Özkan
  2026-03-13 11:26     ` Alice Ryhl
  0 siblings, 1 reply; 15+ messages in thread
From: Onur Özkan @ 2026-03-13 11:12 UTC (permalink / raw)
  To: Alice Ryhl
  Cc: linux-kernel, dakr, daniel.almeida, airlied, simona, dri-devel,
	rust-for-linux

On Fri, 13 Mar 2026 09:52:16 +0000
Alice Ryhl <aliceryhl@google.com> wrote:

> On Fri, Mar 13, 2026 at 12:16:40PM +0300, Onur Özkan wrote:
> > This series adds GPU reset handling support for Tyr in a new module
> > drivers/gpu/drm/tyr/driver.rs which encapsulates the low-level reset
> > controller internals and exposes a ResetHandle API to the driver.
> > 
> > The reset module owns reset state, queueing and execution ordering
> > through OrderedQueue and handles duplicate/concurrent reset requests
> > with a pending flag.
> > 
> > Apart from the reset module, the first 3 patches:
> > 
> > - Fixes a potential reset-complete stale state bug by clearing
> > completed state before doing soft reset.
> > - Adds Work::disable_sync() (wrapper of
> > bindings::disable_work_sync).
> > - Adds OrderedQueue support.
> > 
> > Runtime tested on hardware by Deborah Brouwer (see [1]) and myself.
> > 
> > [1]:
> > https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/63#note_3364131
> > 
> > Link: https://gitlab.freedesktop.org/panfrost/linux/-/issues/28
> > ---
> > 
> > Onur Özkan (4):
> >   drm/tyr: clear reset IRQ before soft reset
> >   rust: add Work::disable_sync
> >   rust: add ordered workqueue wrapper
> 
> I actually added ordered workqueue support here:
> https://lore.kernel.org/all/20260312-create-workqueue-v4-0-ea39c351c38f@google.com/
> 
> Alice

That's cool. I guess this will wait until your patch lands unless we
want to combine them into a single series.

- Onur


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API
  2026-03-13 11:12   ` Onur Özkan
@ 2026-03-13 11:26     ` Alice Ryhl
  0 siblings, 0 replies; 15+ messages in thread
From: Alice Ryhl @ 2026-03-13 11:26 UTC (permalink / raw)
  To: Onur Özkan
  Cc: linux-kernel, dakr, daniel.almeida, airlied, simona, dri-devel,
	rust-for-linux

On Fri, Mar 13, 2026 at 12:12 PM Onur Özkan <work@onurozkan.dev> wrote:
>
> On Fri, 13 Mar 2026 09:52:16 +0000
> Alice Ryhl <aliceryhl@google.com> wrote:
>
> > On Fri, Mar 13, 2026 at 12:16:40PM +0300, Onur Özkan wrote:
> > > This series adds GPU reset handling support for Tyr in a new module
> > > drivers/gpu/drm/tyr/driver.rs which encapsulates the low-level reset
> > > controller internals and exposes a ResetHandle API to the driver.
> > >
> > > The reset module owns reset state, queueing and execution ordering
> > > through OrderedQueue and handles duplicate/concurrent reset requests
> > > with a pending flag.
> > >
> > > Apart from the reset module, the first 3 patches:
> > >
> > > - Fixes a potential reset-complete stale state bug by clearing
> > > completed state before doing soft reset.
> > > - Adds Work::disable_sync() (wrapper of
> > > bindings::disable_work_sync).
> > > - Adds OrderedQueue support.
> > >
> > > Runtime tested on hardware by Deborah Brouwer (see [1]) and myself.
> > >
> > > [1]:
> > > https://gitlab.freedesktop.org/panfrost/linux/-/merge_requests/63#note_3364131
> > >
> > > Link: https://gitlab.freedesktop.org/panfrost/linux/-/issues/28
> > > ---
> > >
> > > Onur Özkan (4):
> > >   drm/tyr: clear reset IRQ before soft reset
> > >   rust: add Work::disable_sync
> > >   rust: add ordered workqueue wrapper
> >
> > I actually added ordered workqueue support here:
> > https://lore.kernel.org/all/20260312-create-workqueue-v4-0-ea39c351c38f@google.com/
> >
> > Alice
>
> That's cool. I guess this will wait until your patch lands unless we
> want to combine them into a single series.

You can just say in your cover letter that your series depends on mine.

Alice

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Claude review: drm/tyr: implement GPU reset API
  2026-03-13  9:16 [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API Onur Özkan
                   ` (4 preceding siblings ...)
  2026-03-13  9:52 ` [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API Alice Ryhl
@ 2026-03-13 21:15 ` Claude Code Review Bot
  5 siblings, 0 replies; 15+ messages in thread
From: Claude Code Review Bot @ 2026-03-13 21:15 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: drm/tyr: implement GPU reset API
Author: =?UTF-8?q?Onur=20=C3=96zkan?= <work@onurozkan.dev>
Patches: 10
Reviewed: 2026-03-14T07:15:46.380916

---

This is a well-structured 4-patch series that adds GPU reset handling infrastructure to the Tyr DRM driver (a Rust-based GPU driver for Mali/Panthor-family hardware). The series follows a logical progression: fix a stale-state bug, add workqueue primitives to the Rust kernel abstractions, then build the reset module on top.

The code is generally clean and well-organized. The main concerns are around memory ordering semantics in the `schedule()` method, the safety argument for the `as_bound()` call in the async work path, and some API design questions for the new Rust workqueue abstractions. The `disable_work_sync` wrapper should also consider whether it truly needs to be `unsafe`.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2026-03-13 21:15 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-13  9:16 [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API Onur Özkan
2026-03-13  9:16 ` [PATCH v1 RESEND 1/4] drm/tyr: clear reset IRQ before soft reset Onur Özkan
2026-03-13 21:15   ` Claude review: " Claude Code Review Bot
2026-03-13  9:16 ` [PATCH v1 RESEND 2/4] rust: add Work::disable_sync Onur Özkan
2026-03-13 12:00   ` Alice Ryhl
2026-03-13 21:15   ` Claude review: " Claude Code Review Bot
2026-03-13  9:16 ` [PATCH v1 RESEND 3/4] rust: add ordered workqueue wrapper Onur Özkan
2026-03-13 21:15   ` Claude review: " Claude Code Review Bot
2026-03-13  9:16 ` [PATCH v1 RESEND 4/4] drm/tyr: add GPU reset handling Onur Özkan
2026-03-13 14:56   ` Daniel Almeida
2026-03-13 21:15   ` Claude review: " Claude Code Review Bot
2026-03-13  9:52 ` [PATCH v1 RESEND 0/4] drm/tyr: implement GPU reset API Alice Ryhl
2026-03-13 11:12   ` Onur Özkan
2026-03-13 11:26     ` Alice Ryhl
2026-03-13 21:15 ` Claude review: " Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox