public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
* Claude review: gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
  2026-04-22 13:40 ` [PATCH v3 6/6] gpu: nova-core: run Booter Unloader and FWSEC-SB " Alexandre Courbot
@ 2026-04-22 21:34   ` Claude Code Review Bot
  0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-04-22 21:34 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Status: Mostly good, a few observations**

This is the final functional patch that completes the unload sequence by running FWSEC-SB and Booter Unloader. The structure mirrors the boot path nicely (FWSEC-FRTS → Booter Loader at boot becomes FWSEC-SB → Booter Unloader at unbind).

The FWSEC-SB section:

```rust
let bios = Vbios::new(dev, bar)?;
let fwsec_sb = FwsecFirmware::new(dev, gsp_falcon, bar, &bios, FwsecCommand::Sb)?;

if chipset.needs_fwsec_bootloader() {
    let fwsec_sb_bl = FwsecFirmwareWithBl::new(fwsec_sb, dev, chipset)?;
    fwsec_sb_bl.run(dev, gsp_falcon, bar)?;
} else {
    fwsec_sb.run(dev, gsp_falcon, bar)?;
}
```

This closely follows the `run_fwsec_frts` pattern from the boot path, which is good.

The Booter Unloader section:

```rust
const MAILBOX_SENTINEL: u32 = 0xff;
let (mbox0, _) = sec2_falcon.boot(bar, Some(MAILBOX_SENTINEL), None)?;
if mbox0 != 0 {
    dev_err!(dev, "Booter Unloader returned error 0x{:x}\n", mbox0);
    return Err(EINVAL);
}
```

The sentinel value `0xff` is documented per the v3 changelog as a value written to mailbox0 before booting to confirm the unloader actually ran and overwrote it. The check `mbox0 != 0` verifies success.

**Observations:**

1. **Error handling in unload path**: The `unload()` method uses `?` to propagate errors from FWSEC-SB and Booter Unloader operations. Back in `gpu.rs`, `unload()` errors are caught by `warn_on_err!` and discarded with `let _ =`. This means that if FWSEC-SB fails after GSP shutdown has already succeeded, the function returns early without reaching `sysmem_flush.unregister()`. This seems intentional — if we can't reset the falcon state, it's unclear whether it's safe to proceed. But it does mean the sysmem flush page might remain registered in a partially-torn-down state. Worth considering whether `sysmem_flush.unregister()` should always run regardless.

2. **WPR2 check uses `hi_val() != 0`**: The `is_wpr2_set()` method checks whether the high address is non-zero. This matches how the boot path checks for WPR2 existence (`higher_bound() != 0` at line 62 of boot.rs). Consistent.

3. **Removing `#[expect(unused)]` and `#[expect(dead_code)]`**: The patch removes the `#[expect(unused)]` from `BooterKind::Unloader` and `#[expect(dead_code)]` from `FwsecCommand::Sb`, since both are now used. Clean.

4. **Re-reading VBIOS during unload**: `Vbios::new(dev, bar)?` is called during unload, which re-reads the VBIOS from the GPU ROM. The VBIOS was already read during boot (in `run_fwsec_frts`). This is a necessary re-read since the original VBIOS object wasn't stored, and it's the correct approach — caching it would add complexity for no meaningful performance benefit during a one-shot unbind operation.

Overall, this is a solid and well-organized series. The main architectural question is item 1 above — whether `sysmem_flush.unregister()` should be reached even if the FWSEC-SB or Booter Unloader steps fail during unload, since leaving it registered in a partially torn-down state could be problematic. The rest is clean and follows existing patterns correctly.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Claude review: gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
  2026-04-21  6:16 ` [PATCH v2 5/5] gpu: nova-core: run Booter Unloader and FWSEC-SB " Alexandre Courbot
@ 2026-04-22 22:52   ` Claude Code Review Bot
  0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-04-22 22:52 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

This completes the unload sequence by tearing down the WPR2 region.

1. **Removing dead_code/unused annotations** (`firmware/booter.rs`, `firmware/fwsec.rs`):
```rust
-    #[expect(unused)]
     Unloader,
```
```rust
-    #[expect(dead_code)]
     Sb,
```
These variants were pre-declared but unused until now. Clean removal.

2. **WPR2 check helper** (`regs.rs`):
```rust
+    pub(crate) fn is_wpr2_set(self) -> bool {
+        self.hi_val() != 0
+    }
```
Simple and correct — the existing comment on `higher_bound()` says "A value of zero means the WPR2 region is not set."

3. **Full unload sequence** (`gsp/boot.rs`):

The unload now does:
- Shut down GSP (from patch 4)
- Run FWSEC-SB to reset the GSP falcon to pre-libos state
- If WPR2 is still set, run Booter Unloader to remove it
- Verify WPR2 was cleared

The FWSEC-SB and Booter Unloader loading mirrors the boot-time pattern in `boot()` / `run_fwsec_frts()`, using the same `needs_fwsec_bootloader()` dispatch. This symmetry with the boot path is good.

```rust
+        let bios = Vbios::new(dev, bar)?;
+        let fwsec_sb = FwsecFirmware::new(dev, gsp_falcon, bar, &bios, FwsecCommand::Sb)?;
```

One thing to note: `Vbios::new()`, `FwsecFirmware::new()`, and `BooterFirmware::new()` all request firmware from the filesystem. If firmware loading fails during unbind (e.g., firmware files have been removed), the unload will fail with an error and the WPR2 region will remain set, preventing re-probe. The `?` propagation means these failures are reported via `warn_on_err!` in the caller. This is probably the best that can be done — you can't unload without the firmware — but it's a dependency worth documenting.

```rust
+            let _ = sec2_falcon.boot(bar, Some(0xff), Some(0xff))?;
```

The `let _ =` discards the boot return value (a mailbox read result). The `Some(0xff)` arguments for both mailbox values appear to be "don't care" sentinel values (matching the pattern used during boot in the sequencer). The `?` propagation means if SEC2 fails to boot the unloader, we bail out.

```rust
+            let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
+            if wpr2_hi.is_wpr2_set() {
+                dev_err!(dev, "WPR2 region still set after Booter Unloader ran\n");
+                return Err(EBUSY);
+            }
```

Good defensive check. If the booter unloader ran but didn't actually clear WPR2, returning `EBUSY` is appropriate.

4. **Updated unload signature** (`gpu.rs`):
```rust
+        let _ = kernel::warn_on_err!(self.gsp.unload(
+            dev,
+            bar,
+            self.spec.chipset,
+            &self.gsp_falcon,
+            &self.sec2_falcon,
+        ));
```

Passing chipset and sec2_falcon through is clean — they're needed for the FWSEC-SB bootloader dispatch and Booter Unloader respectively.

**Minor observation on error handling in unbind path:** Both the FWSEC-SB and Booter Unloader steps use `?` internally, which means any failure in the unload sequence will abort the remaining steps but still continue to `sysmem_flush.unregister()` thanks to the `let _ = warn_on_err!(...)` in `unbind()`. This is reasonable best-effort teardown. If FWSEC-SB fails, attempting the Booter Unloader would likely also fail, so the early return within `unload()` makes sense.

No blocking issues.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Claude review: gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
  2026-04-27  6:57 ` [PATCH v4 8/8] gpu: nova-core: run Booter Unloader and FWSEC-SB " Alexandre Courbot
@ 2026-04-28  5:01   ` Claude Code Review Bot
  0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-04-28  5:01 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

This is the culmination of the series. Key points:

**UnloadBundle trait:**
```rust
pub(super) trait UnloadBundle: Send {
    fn run(&self, dev: ..., bar: ..., gsp_falcon: ..., sec2_falcon: ...) -> Result;
}
```

The `Send` bound is important since the bundle is stored in the `Gsp` struct and may be accessed from different contexts.

**FwsecUnloadFirmware enum** — abstracts over with/without bootloader variants. Clean approach:
```rust
enum FwsecUnloadFirmware {
    WithoutBl(FwsecFirmware),
    WithBl(FwsecFirmwareWithBl),
}
```

**Sec2UnloadBundle::run() implementation:**
```rust
    fn run(&self, ...) -> Result<()> {
        self.fwsec_sb.run(dev, bar, gsp_falcon)?;
        let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
        if wpr2_hi.is_wpr2_set() {
            sec2_falcon.reset(bar)?;
            sec2_falcon.load(dev, bar, &self.booter_unloader)?;
            const MAILBOX_SENTINEL: u32 = 0xff;
            let (mbox0, _) = sec2_falcon.boot(bar, Some(MAILBOX_SENTINEL), Some(MAILBOX_SENTINEL))?;
            ...
        }
        Ok(())
    }
```

The sequence is: (1) run FWSEC-SB to reset GSP falcon state, (2) run Booter Unloader via SEC2 to remove WPR2. The sentinel value `0xff` in both mailboxes aligns with the v4 changelog noting alignment with OpenRM.

**Checking WPR2 removal after Booter Unloader:**
```rust
            let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
            if wpr2_hi.is_wpr2_set() {
                dev_err!(dev, "WPR2 region still set after Booter Unloader returned\n");
                return Err(EBUSY);
            }
```
Good defensive check.

**Non-fatal unload bundle preparation:**
```rust
        let unload_bundle = Sec2UnloadBundle::build(...)
            .inspect_err(|e| { dev_warn!(...); })
            .ok();
```
This converts errors to `None`, meaning the driver loads successfully even if unload firmware can't be prepared. The three `dev_warn!` messages clearly communicate the degraded state. This is a reasonable tradeoff — failing to load entirely because the unload firmware is unavailable would be too aggressive.

**Pin projection concern:** In `boot.rs`:
```rust
+        *self.as_mut().project().unload = unload_bundle;
```
This requires the `#[pin_data]` macro to generate a `project()` method and `unload` must not be a pinned field. Since `unload` is `Option<KBox<dyn hal::UnloadBundle>>` and is NOT marked `#[pin]` in the struct definition, this should work correctly — `project()` will give direct `&mut` access to non-pinned fields.

**`is_wpr2_set()` helper** added to `regs.rs`:
```rust
    pub(crate) fn is_wpr2_set(self) -> bool {
        self.hi_val() != 0
    }
```
This duplicates the logic already present in `higher_bound()` (which returns 0 when not set), but is more semantically clear. Fine addition.

**The `#[expect(unused)]` and `#[expect(dead_code)]` removals** from `BooterKind::Unloader` and `FwsecCommand::Sb` are correct — these variants are now used.

**One observation:** The unload path in `boot.rs` calls `shutdown_gsp()` then `unload_bundle.run()`. If `shutdown_gsp` succeeds but `unload_bundle.run()` fails (e.g., Booter Unloader errors), the GSP is stopped but WPR2 is not removed. The GPU would then need a PCI reset before rebinding. This seems unavoidable given the sequential nature of the unload, and the error is propagated to the caller.

No blocking issues. The implementation is solid and well-considered.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Claude review: gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
  2026-05-15  6:12 ` [PATCH v5 7/7] gpu: nova-core: run Booter Unloader and FWSEC-SB " Alexandre Courbot
@ 2026-05-15 23:54   ` Claude Code Review Bot
  0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-05-15 23:54 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

This is the culmination of the series. The design of pre-loading unload firmware at boot time is well-motivated (filesystem may not be available at unbind).

The `FwsecUnloadFirmware` enum cleanly handles the with/without-bootloader variants:

```rust
+enum FwsecUnloadFirmware {
+    WithoutBl(FwsecFirmware),
+    WithBl(FwsecFirmwareWithBl),
+}
```

The `Sec2UnloadBundle::build()` returns `KBox<dyn UnloadBundle>` -- the heap allocation is appropriate since the firmware blobs can be large. The coercion `b as KBox<dyn UnloadBundle>` is the standard pattern.

The unload sequence is correctly ordered:
1. FWSEC-SB (reset GSP falcon to pre-libos state)
2. Booter Unloader (remove WPR2 region)

The Booter Unloader writes a sentinel value `0xff` to both mailboxes and checks mbox0 for success:

```rust
+const MAILBOX_SENTINEL: u32 = 0xff;
+let (mbox0, _) =
+    sec2_falcon.boot(bar, Some(MAILBOX_SENTINEL), Some(MAILBOX_SENTINEL))?;
+if mbox0 != 0 {
+    dev_err!(dev, "Booter Unloader returned error 0x{:x}\n", mbox0);
+    return Err(EINVAL);
+}
```

Good: WPR2 removal is verified after Booter Unloader returns:

```rust
+let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
+if wpr2_hi.is_wpr2_set() {
+    dev_err!(dev, "WPR2 region still set after Booter Unloader returned\n");
+    return Err(EBUSY);
+}
```

The non-fatal handling of unload bundle preparation failure during boot is well-done:

```rust
+let unload_bundle =
+    Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
+        .inspect_err(|e| {
+            dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
+            ...
+        })
+        .ok();
```

Using `.ok()` to convert the error to `None` means boot succeeds even if unload prep fails, which is the right tradeoff -- better to have a working driver that needs a manual GPU reset on unbind than to fail probe entirely.

The `is_wpr2_set()` helper in regs.rs is a thin wrapper over `hi_val() != 0`, which is functionally equivalent to the existing `higher_bound() != 0` check used elsewhere. This is fine for readability.

One observation: the `unload` field is set via pin projection `*self.as_mut().project().unload = unload_bundle`. Since `unload` is not a `#[pin]` field, this should work correctly through the generated pin projection.

No blocking issues.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Claude review: gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
  2026-05-21 13:50 ` [PATCH v6 7/7] gpu: nova-core: run Booter Unloader and FWSEC-SB " Alexandre Courbot
@ 2026-05-25 10:06   ` Claude Code Review Bot
  0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 10:06 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

This is the largest and most important patch. Several points:

**`Cell<Option<gsp::UnloadBundle>>` in `NovaCore`**:
```rust
unload_bundle: Cell<Option<gsp::UnloadBundle>>,
```
As noted in the overall review, `Cell` is `!Sync`. The `Cell::take()` in `unbind()` is used because `unbind` only gets a shared reference (`Pin<&Self::Data>`). This is a pragmatic solution — the unload bundle is consumed exactly once. However, if two unbind calls could race (unlikely but architecturally possible), `Cell::take()` would silently lose the bundle on one path. A `// SAFETY:` or `// INVARIANT:` comment explaining why this is sound would strengthen the patch.

**`UnloadBundle` as opaque wrapper**:
```rust
pub(crate) struct UnloadBundle(KBox<dyn hal::UnloadBundle>);
```

This nicely hides the HAL-specific implementation behind a trait object. The `KBox` ensures heap allocation and dynamic dispatch.

**`Sec2UnloadBundle::build()` error handling**:
```rust
let unload_bundle = Sec2UnloadBundle::build(...)
    .inspect_err(|e| {
        dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
        ...
    })
    .ok();
```

Converting the error to `None` and continuing boot is the right call — failing to prepare unload firmware shouldn't prevent the driver from loading. The warnings are clear about the consequence.

**`Sec2UnloadBundle::run()` sequence**:
1. Run FWSEC-SB to reset GSP falcon
2. Check WPR2 region, run Booter Unloader if set
3. Verify WPR2 was removed

The Booter Unloader uses sentinel values `0xff` in both mailboxes (unlike Booter Loader which passes WPR metadata addresses). This aligns with OpenRM behavior per the cover letter.

**`unload()` error continuation**:
```rust
let mut res = Self::shutdown_gsp(...)
    .inspect_err(|e| dev_err!(dev, "GSP shutdown failed: {:?}\n", e));

if let Some(unload_bundle) = unload_bundle {
    res = res.and(
        unload_bundle.0.run(...)
            .inspect_err(|e| dev_err!(dev, "Unload bundle failed: {:?}\n", e)),
    );
}
```

Using `res.and()` means: if GSP shutdown failed, the unload bundle still runs, but the overall result preserves the first error. This is excellent teardown logic — continue cleanup despite partial failures, but don't mask errors.

**Missing `Reviewed-by` on this patch**: Unlike patches 1-6, patch 7 has no `Reviewed-by` tag. This is the most critical patch in the series, so it would benefit from additional review.

**`is_wpr2_set()` helper in `regs.rs`**:
```rust
pub(crate) fn is_wpr2_set(self) -> bool {
    self.hi_val() != 0
}
```
This duplicates the logic already in `higher_bound() != 0` checks elsewhere, but having a named predicate is clearer for the unload path.

**`FwsecUnloadFirmware` enum**: Clean abstraction over the with/without-bootloader variants. The `run()` method delegates correctly to the underlying firmware's `run()` method.

**One potential issue**: In `Sec2UnloadBundle::run()`, `self.fwsec_sb.run(dev, bar, gsp_falcon)` is called, but looking at the method signature:
```rust
fn run(&self, dev: ..., bar: &Bar0, gsp_falcon: &Falcon<GspEngine>) -> Result<()>
```
This passes `(dev, bar, gsp_falcon)`, and internally dispatches to `fw.run(dev, gsp_falcon, bar)` — the parameter reordering is intentional and correct, matching the underlying `FwsecFirmware::run(&self, dev, falcon, bar)` signature.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding
@ 2026-05-29  7:33 Alexandre Courbot
  2026-05-29  7:33 ` [PATCH v7 1/4] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-29  7:33 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

Currently the GSP is left running and the WPR2 memory region untouched
when the driver is unbound. This is obviously not ideal for at least two
reasons:

- Probing requires setting up the WPR2 region, which cannot be done if
  there is already one in place. Hence the current requirement to reset
  the GPU (using e.g. `echo 1 >/sys/bus/pci/devices/.../reset`) before
  the driver can be probed again after removal.
- The running GSP may still attempt to access shared memory regions
  which the kernel might recycle.

On top of that, there is a nasty bug in the Blackwell VBIOS that
sometimes borks the GPU upon PCI reset, requiring a reboot. So relying
on the PCI reset to unload/reload Nova is really not practical here.

This series does what is needed to leave the GPU in a clean state after
unbind, for all currently supported GPUs. Blackwell support is just a
placeholder and will be completed by the Blackwell boot support series.

This revision is based on `drm-rust-next`.

A branch with the series is available at [1].

[1] https://github.com/Gnurou/linux/tree/b4/nova-unload

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Changes in v7:
- Rebase on current drm-rust-next.
- Drop merged patches.
- Integrate Eliot's unload-on-drop improvement.
- Use `&Gsp` instead of `Pin<&mut Gsp>` in HAL.
- Add a new patch that runs the unload bundle if `Gsp::boot` fails.
- Link to v6: https://patch.msgid.link/20260521-nova-unload-v6-0-65f581c812c9@nvidia.com

Changes in v6:
- Inline TU102 local `run_booter` method in its unique call site.
- Rename unload bundle field to `unload_bundle`.
- Make Sec2UnloadBundle private.
- Continue GSP teardown upon partial failure.
- Store the unload bundle into `NovaCore`.
- Take the unload bundle by value to make it one-shot.
- Link to v5: https://patch.msgid.link/20260515-nova-unload-v5-0-c4d6250ad160@nvidia.com

Changes in v5:
- Rebase on top of the Device HRT series.
- Drop the now unneeded "gpu: nova-core: split BAR acquisition in unbind()".
- Link to v4: https://patch.msgid.link/20260427-nova-unload-v4-0-e145ccddae66@nvidia.com

Changes in v4:
- Remove `warn_on_err` macro as it isn't performing as expected and
  distracts from the goal of the series.
- Add John's patch from the Blackwell series refactoring the Booter
  Loader runner code.
- Add a GSP HAL and move the existing TU102/SEC2 boot sequence into it
  in preparation for the Hopper/Blackwell FSP boot path.
- Prepare the firmware required for unloading at probe time and save it
  into an unload bundle, as we cannot guarantee filesystem access at
  unload time.
- Constrain `UNLOADING_GUEST_DRIVER`'s visibility to the parent module.
- Also write the sentinel value `0xff` into `mbox1` when running Booter
  Unloader to align with OpenRM.
- Link to v3: https://patch.msgid.link/20260422-nova-unload-v3-0-1d2c81bd3ced@nvidia.com

Changes in v3:
- Disambiguate doccomment for `warn_on_err`.
- Test the correct bit instead of the whole register value to determine
  that the GSP has stopped.
- Use an enum instead of a boolean to encode the power level when
  shutting down the GSP.
- Add missing newline to `dev_err`.
- Add missing doccomments for new types.
- Use values from bindings instead of magic numbers.
- Remove the redundant `get_gsp_info` function.
- Better document Booter Unloader mailbox sentinel value, and check the
  value of mbox0 upon return.
- Link to v2: https://patch.msgid.link/20260421-nova-unload-v2-0-2fe54963af8b@nvidia.com

Changes in v2:
- Rebase on top of `master` and remove unneeded/obsolete preparatory patches.
- Tidy up the imports of commands from the `fw` module in the `gsp` module.
- Link to v1: https://patch.msgid.link/20251216-nova-unload-v1-0-6a5d823be19d@nvidia.com

---
Alexandre Courbot (4):
      gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
      gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
      gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
      gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails

 drivers/gpu/nova-core/firmware/booter.rs          |   1 -
 drivers/gpu/nova-core/firmware/fwsec.rs           |   1 -
 drivers/gpu/nova-core/gpu.rs                      |  34 ++-
 drivers/gpu/nova-core/gsp.rs                      |   4 +
 drivers/gpu/nova-core/gsp/boot.rs                 | 290 +++++++++---------
 drivers/gpu/nova-core/gsp/commands.rs             |  43 +++
 drivers/gpu/nova-core/gsp/fw.rs                   |   4 +
 drivers/gpu/nova-core/gsp/fw/commands.rs          |  45 +++
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs |  11 +
 drivers/gpu/nova-core/gsp/hal.rs                  |  94 ++++++
 drivers/gpu/nova-core/gsp/hal/gh100.rs            |  51 ++++
 drivers/gpu/nova-core/gsp/hal/tu102.rs            | 349 ++++++++++++++++++++++
 drivers/gpu/nova-core/regs.rs                     |   5 +
 13 files changed, 775 insertions(+), 157 deletions(-)
---
base-commit: 0e42ec83d46ab8877d38d37493328ed7d1a24de8
change-id: 20251216-nova-unload-4029b3b76950

Best regards,
--  
Alexandre Courbot <acourbot@nvidia.com>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v7 1/4] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
  2026-05-29  7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
@ 2026-05-29  7:33 ` Alexandre Courbot
  2026-06-04  6:57   ` Claude review: " Claude Code Review Bot
  2026-05-29  7:33 ` [PATCH v7 2/4] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-29  7:33 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

Booting the GSP is done differently depending on the architecture. Move
the parts that are chipset-specific under a HAL.

This does not change much at the moment, since the differences between
Turing and Ampere are rather benign, but will become critical to
properly support the FSP boot process used by Hopper and Blackwell.

The Hopper/Blackwell support is not merged yet, so their HAL is a stub
for now.

This patch is intended to be a mechanical code extraction with no
behavioral changes.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp.rs           |   1 +
 drivers/gpu/nova-core/gsp/boot.rs      | 166 +++-----------------------
 drivers/gpu/nova-core/gsp/hal.rs       |  74 ++++++++++++
 drivers/gpu/nova-core/gsp/hal/gh100.rs |  50 ++++++++
 drivers/gpu/nova-core/gsp/hal/tu102.rs | 206 +++++++++++++++++++++++++++++++++
 5 files changed, 344 insertions(+), 153 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index ba5b7f990031..38378f104068 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
 mod boot;
+mod hal;
 
 use kernel::{
     debugfs,
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 259f7c4d94f5..1bd9f21fc443 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -4,7 +4,6 @@
     device,
     dma::Coherent,
     io::poll::read_poll_timeout,
-    io::Io,
     pci,
     prelude::*,
     time::Delta, //
@@ -19,121 +18,17 @@
     },
     fb::FbLayout,
     firmware::{
-        booter::{
-            BooterFirmware,
-            BooterKind, //
-        },
-        fwsec::{
-            bootloader::FwsecFirmwareWithBl,
-            FwsecCommand,
-            FwsecFirmware, //
-        },
         gsp::GspFirmware,
         FIRMWARE_VERSION, //
     },
-    gpu::{
-        Architecture,
-        Chipset, //
-    },
+    gpu::Chipset,
     gsp::{
         commands,
-        sequencer::{
-            GspSequencer,
-            GspSequencerParams, //
-        },
         GspFwWprMeta, //
     },
-    regs,
-    vbios::Vbios,
 };
 
 impl super::Gsp {
-    /// Helper function to load and run the FWSEC-FRTS firmware and confirm that it has properly
-    /// created the WPR2 region.
-    fn run_fwsec_frts(
-        dev: &device::Device<device::Bound>,
-        chipset: Chipset,
-        falcon: &Falcon<Gsp>,
-        bar: &Bar0,
-        bios: &Vbios,
-        fb_layout: &FbLayout,
-    ) -> Result<()> {
-        // Check that the WPR2 region does not already exists - if it does, we cannot run
-        // FWSEC-FRTS until the GPU is reset.
-        if bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI).higher_bound() != 0 {
-            dev_err!(
-                dev,
-                "WPR2 region already exists - GPU needs to be reset to proceed\n"
-            );
-            return Err(EBUSY);
-        }
-
-        // FWSEC-FRTS will create the WPR2 region.
-        let fwsec_frts = FwsecFirmware::new(
-            dev,
-            falcon,
-            bar,
-            bios,
-            FwsecCommand::Frts {
-                frts_addr: fb_layout.frts.start,
-                frts_size: fb_layout.frts.len(),
-            },
-        )?;
-
-        if chipset.needs_fwsec_bootloader() {
-            let fwsec_frts_bl = FwsecFirmwareWithBl::new(fwsec_frts, dev, chipset)?;
-            // Load and run the bootloader, which will load FWSEC-FRTS and run it.
-            fwsec_frts_bl.run(dev, falcon, bar)?;
-        } else {
-            // Load and run FWSEC-FRTS directly.
-            fwsec_frts.run(dev, falcon, bar)?;
-        }
-
-        // SCRATCH_E contains the error code for FWSEC-FRTS.
-        let frts_status = bar
-            .read(regs::NV_PBUS_SW_SCRATCH_0E_FRTS_ERR)
-            .frts_err_code();
-        if frts_status != 0 {
-            dev_err!(
-                dev,
-                "FWSEC-FRTS returned with error code {:#x}\n",
-                frts_status
-            );
-
-            return Err(EIO);
-        }
-
-        // Check that the WPR2 region has been created as we requested.
-        let (wpr2_lo, wpr2_hi) = (
-            bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_LO).lower_bound(),
-            bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI).higher_bound(),
-        );
-
-        match (wpr2_lo, wpr2_hi) {
-            (_, 0) => {
-                dev_err!(dev, "WPR2 region not created after running FWSEC-FRTS\n");
-
-                Err(EIO)
-            }
-            (wpr2_lo, _) if wpr2_lo != fb_layout.frts.start => {
-                dev_err!(
-                    dev,
-                    "WPR2 region created at unexpected address {:#x}; expected {:#x}\n",
-                    wpr2_lo,
-                    fb_layout.frts.start,
-                );
-
-                Err(EIO)
-            }
-            (wpr2_lo, wpr2_hi) => {
-                dev_dbg!(dev, "WPR2: {:#x}-{:#x}\n", wpr2_lo, wpr2_hi);
-                dev_dbg!(dev, "GPU instance built\n");
-
-                Ok(())
-            }
-        }
-    }
-
     /// Attempt to boot the GSP.
     ///
     /// This is a GPU-dependent and complex procedure that involves loading firmware files from
@@ -149,17 +44,8 @@ pub(crate) fn boot(
         gsp_falcon: &Falcon<Gsp>,
         sec2_falcon: &Falcon<Sec2>,
     ) -> Result {
-        // The FSP boot process of Hopper+ is not supported for now.
-        if matches!(
-            chipset.arch(),
-            Architecture::Hopper | Architecture::BlackwellGB10x | Architecture::BlackwellGB20x
-        ) {
-            return Err(ENOTSUPP);
-        }
-
         let dev = pdev.as_ref();
-
-        let bios = Vbios::new(dev, bar)?;
+        let hal = super::hal::gsp_hal(chipset);
 
         let gsp_fw = KBox::pin_init(GspFirmware::new(dev, chipset, FIRMWARE_VERSION), GFP_KERNEL)?;
 
@@ -168,38 +54,21 @@ pub(crate) fn boot(
 
         let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
 
-        // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
-        if !fb_layout.frts.is_empty() {
-            Self::run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, &fb_layout)?;
-        }
-
-        gsp_falcon.reset(bar)?;
-        let libos_handle = self.libos.dma_handle();
-        let (mbox0, mbox1) = gsp_falcon.boot(
-            bar,
-            Some(libos_handle as u32),
-            Some((libos_handle >> 32) as u32),
-        )?;
-        dev_dbg!(pdev, "GSP MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
-
-        dev_dbg!(
-            pdev,
-            "Using SEC2 to load and run the booter_load firmware...\n"
-        );
-
-        BooterFirmware::new(
+        // Perform the chipset-specific boot sequence.
+        hal.boot(
+            &self,
             dev,
-            BooterKind::Loader,
-            chipset,
-            FIRMWARE_VERSION,
-            sec2_falcon,
             bar,
-        )?
-        .run(dev, bar, sec2_falcon, &wpr_meta)?;
+            chipset,
+            &fb_layout,
+            &wpr_meta,
+            gsp_falcon,
+            sec2_falcon,
+        )?;
 
         gsp_falcon.write_os_version(bar, gsp_fw.bootloader.app_version);
 
-        // Poll for RISC-V to become active before running sequencer
+        // Poll for RISC-V to become active before continuing.
         read_poll_timeout(
             || Ok(gsp_falcon.is_riscv_active(bar)),
             |val: &bool| *val,
@@ -214,16 +83,7 @@ pub(crate) fn boot(
         self.cmdq
             .send_command_no_wait(bar, commands::SetRegistry::new())?;
 
-        // Create and run the GSP sequencer.
-        let seq_params = GspSequencerParams {
-            bootloader_app_version: gsp_fw.bootloader.app_version,
-            libos_dma_handle: libos_handle,
-            gsp_falcon,
-            sec2_falcon,
-            dev,
-            bar,
-        };
-        GspSequencer::run(&self.cmdq, seq_params)?;
+        hal.post_boot(&self, dev, bar, &gsp_fw, gsp_falcon, sec2_falcon)?;
 
         // Wait until GSP is fully initialized.
         commands::wait_gsp_init_done(&self.cmdq)?;
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
new file mode 100644
index 000000000000..fb3edaeb3160
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+mod gh100;
+mod tu102;
+
+use kernel::prelude::*;
+
+use kernel::{
+    device,
+    dma::Coherent, //
+};
+
+use crate::{
+    driver::Bar0,
+    falcon::{
+        gsp::Gsp as GspEngine,
+        sec2::Sec2,
+        Falcon, //
+    },
+    fb::FbLayout,
+    firmware::gsp::GspFirmware,
+    gpu::{
+        Architecture,
+        Chipset, //
+    },
+    gsp::{
+        Gsp,
+        GspFwWprMeta, //
+    },
+};
+
+/// Trait implemented by GSP HALs.
+pub(super) trait GspHal: Send {
+    /// Performs the GSP boot process, loading and running the required firmwares as needed.
+    #[allow(clippy::too_many_arguments)]
+    fn boot(
+        &self,
+        gsp: &Gsp,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        chipset: Chipset,
+        fb_layout: &FbLayout,
+        wpr_meta: &Coherent<GspFwWprMeta>,
+        gsp_falcon: &Falcon<GspEngine>,
+        sec2_falcon: &Falcon<Sec2>,
+    ) -> Result;
+
+    /// Performs HAL-specific post-GSP boot tasks.
+    ///
+    /// This method is called by the GSP boot code after the GSP is confirmed to be running, and
+    /// after the initialization commands have been pushed onto its queue.
+    fn post_boot(
+        &self,
+        _gsp: &Gsp,
+        _dev: &device::Device<device::Bound>,
+        _bar: &Bar0,
+        _gsp_fw: &GspFirmware,
+        _gsp_falcon: &Falcon<GspEngine>,
+        _sec2_falcon: &Falcon<Sec2>,
+    ) -> Result {
+        Ok(())
+    }
+}
+
+/// Returns the GSP HAL to be used for `chipset`.
+pub(super) fn gsp_hal(chipset: Chipset) -> &'static dyn GspHal {
+    match chipset.arch() {
+        Architecture::Turing | Architecture::Ampere | Architecture::Ada => tu102::TU102_HAL,
+        Architecture::Hopper | Architecture::BlackwellGB10x | Architecture::BlackwellGB20x => {
+            gh100::GH100_HAL
+        }
+    }
+}
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
new file mode 100644
index 000000000000..3f3675f9c16a
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+use kernel::prelude::*;
+
+use kernel::{
+    device,
+    dma::Coherent, //
+};
+
+use crate::{
+    driver::Bar0,
+    falcon::{
+        gsp::Gsp as GspEngine,
+        sec2::Sec2,
+        Falcon, //
+    },
+    fb::FbLayout,
+    gpu::Chipset,
+    gsp::{
+        hal::GspHal,
+        Gsp,
+        GspFwWprMeta, //
+    },
+};
+
+struct Gh100;
+
+impl GspHal for Gh100 {
+    /// Boot GSP via FSP Chain of Trust (Hopper/Blackwell+ path).
+    ///
+    /// This path uses FSP to establish a chain of trust and boot GSP-FMC. FSP handles
+    /// the GSP boot internally - no manual GSP reset/boot is needed.
+    fn boot(
+        &self,
+        _gsp: &Gsp,
+        _dev: &device::Device<device::Bound>,
+        _bar: &Bar0,
+        _chipset: Chipset,
+        _fb_layout: &FbLayout,
+        _wpr_meta: &Coherent<GspFwWprMeta>,
+        _gsp_falcon: &Falcon<GspEngine>,
+        _sec2_falcon: &Falcon<Sec2>,
+    ) -> Result {
+        Err(ENOTSUPP)
+    }
+}
+
+const GH100: Gh100 = Gh100;
+pub(super) const GH100_HAL: &dyn GspHal = &GH100;
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
new file mode 100644
index 000000000000..a6f2b2e279e8
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+use kernel::prelude::*;
+
+use kernel::{
+    device,
+    dma::Coherent,
+    io::Io, //
+};
+
+use crate::{
+    driver::Bar0,
+    falcon::{
+        gsp::Gsp as GspEngine,
+        sec2::Sec2,
+        Falcon, //
+    },
+    fb::FbLayout,
+    firmware::{
+        booter::{
+            BooterFirmware,
+            BooterKind, //
+        },
+        fwsec::{
+            bootloader::FwsecFirmwareWithBl,
+            FwsecCommand,
+            FwsecFirmware, //
+        },
+        gsp::GspFirmware,
+        FIRMWARE_VERSION, //
+    },
+    gpu::Chipset,
+    gsp::{
+        hal::GspHal,
+        sequencer::{
+            GspSequencer,
+            GspSequencerParams, //
+        },
+        Gsp,
+        GspFwWprMeta, //
+    },
+    regs,
+    vbios::Vbios, //
+};
+
+/// Helper function to load and run the FWSEC-FRTS firmware and confirm that it has properly
+/// created the WPR2 region.
+fn run_fwsec_frts(
+    dev: &device::Device<device::Bound>,
+    chipset: Chipset,
+    falcon: &Falcon<GspEngine>,
+    bar: &Bar0,
+    bios: &Vbios,
+    fb_layout: &FbLayout,
+) -> Result<()> {
+    // Check that the WPR2 region does not already exist - if it does, we cannot run
+    // FWSEC-FRTS until the GPU is reset.
+    if bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI).higher_bound() != 0 {
+        dev_err!(
+            dev,
+            "WPR2 region already exists - GPU needs to be reset to proceed\n"
+        );
+        return Err(EBUSY);
+    }
+
+    // FWSEC-FRTS will create the WPR2 region.
+    let fwsec_frts = FwsecFirmware::new(
+        dev,
+        falcon,
+        bar,
+        bios,
+        FwsecCommand::Frts {
+            frts_addr: fb_layout.frts.start,
+            frts_size: fb_layout.frts.len(),
+        },
+    )?;
+
+    if chipset.needs_fwsec_bootloader() {
+        let fwsec_frts_bl = FwsecFirmwareWithBl::new(fwsec_frts, dev, chipset)?;
+        // Load and run the bootloader, which will load FWSEC-FRTS and run it.
+        fwsec_frts_bl.run(dev, falcon, bar)?;
+    } else {
+        // Load and run FWSEC-FRTS directly.
+        fwsec_frts.run(dev, falcon, bar)?;
+    }
+
+    // SCRATCH_E contains the error code for FWSEC-FRTS.
+    let frts_status = bar
+        .read(regs::NV_PBUS_SW_SCRATCH_0E_FRTS_ERR)
+        .frts_err_code();
+    if frts_status != 0 {
+        dev_err!(
+            dev,
+            "FWSEC-FRTS returned with error code {:#x}\n",
+            frts_status
+        );
+
+        return Err(EIO);
+    }
+
+    // Check that the WPR2 region has been created as we requested.
+    let (wpr2_lo, wpr2_hi) = (
+        bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_LO).lower_bound(),
+        bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI).higher_bound(),
+    );
+
+    match (wpr2_lo, wpr2_hi) {
+        (_, 0) => {
+            dev_err!(dev, "WPR2 region not created after running FWSEC-FRTS\n");
+
+            Err(EIO)
+        }
+        (wpr2_lo, _) if wpr2_lo != fb_layout.frts.start => {
+            dev_err!(
+                dev,
+                "WPR2 region created at unexpected address {:#x}; expected {:#x}\n",
+                wpr2_lo,
+                fb_layout.frts.start,
+            );
+
+            Err(EIO)
+        }
+        (wpr2_lo, wpr2_hi) => {
+            dev_dbg!(dev, "WPR2: {:#x}-{:#x}\n", wpr2_lo, wpr2_hi);
+            dev_dbg!(dev, "GPU instance built\n");
+
+            Ok(())
+        }
+    }
+}
+
+struct Tu102;
+
+impl GspHal for Tu102 {
+    fn boot(
+        &self,
+        gsp: &Gsp,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        chipset: Chipset,
+        fb_layout: &FbLayout,
+        wpr_meta: &Coherent<GspFwWprMeta>,
+        gsp_falcon: &Falcon<GspEngine>,
+        sec2_falcon: &Falcon<Sec2>,
+    ) -> Result {
+        let bios = Vbios::new(dev, bar)?;
+
+        // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
+        if !fb_layout.frts.is_empty() {
+            run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
+        }
+
+        gsp_falcon.reset(bar)?;
+        let libos_handle = gsp.libos.dma_handle();
+        let (mbox0, mbox1) = gsp_falcon.boot(
+            bar,
+            Some(libos_handle as u32),
+            Some((libos_handle >> 32) as u32),
+        )?;
+        dev_dbg!(dev, "GSP MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
+
+        dev_dbg!(
+            dev,
+            "Using SEC2 to load and run the booter_load firmware...\n"
+        );
+
+        BooterFirmware::new(
+            dev,
+            BooterKind::Loader,
+            chipset,
+            FIRMWARE_VERSION,
+            sec2_falcon,
+            bar,
+        )?
+        .run(dev, bar, sec2_falcon, wpr_meta)?;
+
+        Ok(())
+    }
+
+    fn post_boot(
+        &self,
+        gsp: &Gsp,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        gsp_fw: &GspFirmware,
+        gsp_falcon: &Falcon<GspEngine>,
+        sec2_falcon: &Falcon<Sec2>,
+    ) -> Result {
+        // Create and run the GSP sequencer.
+        let seq_params = GspSequencerParams {
+            bootloader_app_version: gsp_fw.bootloader.app_version,
+            libos_dma_handle: gsp.libos.dma_handle(),
+            gsp_falcon,
+            sec2_falcon,
+            dev,
+            bar,
+        };
+        GspSequencer::run(&gsp.cmdq, seq_params)?;
+
+        Ok(())
+    }
+}
+
+const TU102: Tu102 = Tu102;
+pub(super) const TU102_HAL: &dyn GspHal = &TU102;

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 2/4] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
  2026-05-29  7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
  2026-05-29  7:33 ` [PATCH v7 1/4] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
@ 2026-05-29  7:33 ` Alexandre Courbot
  2026-06-04  6:57   ` Claude review: " Claude Code Review Bot
  2026-05-29  7:33 ` [PATCH v7 3/4] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-29  7:33 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

Currently, the GSP is left running after the driver is unbound. This is
not great for several reasons, notably that it can still access shared
memory areas that the kernel will now reclaim (especially problematic on
setups without an IOMMU).

Fix this by sending the `UNLOADING_GUEST_DRIVER` GSP command when the
`Gpu` is dropped. This stops the GSP and lets us proceed with the rest
of the unbind sequence in a later patch.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Co-developed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs                      | 21 ++++++++++-
 drivers/gpu/nova-core/gsp/boot.rs                 | 45 +++++++++++++++++++++++
 drivers/gpu/nova-core/gsp/commands.rs             | 43 ++++++++++++++++++++++
 drivers/gpu/nova-core/gsp/fw.rs                   |  4 ++
 drivers/gpu/nova-core/gsp/fw/commands.rs          | 45 +++++++++++++++++++++++
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 11 ++++++
 6 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index cf134cab49cd..011d504830e4 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -243,8 +243,10 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
 }
 
 /// Structure holding the resources required to operate the GPU.
-#[pin_data]
+#[pin_data(PinnedDrop)]
 pub(crate) struct Gpu<'gpu> {
+    /// Device owning the GPU.
+    device: &'gpu device::Device<device::Bound>,
     spec: Spec,
     /// MMIO mapping of PCI BAR 0.
     bar: &'gpu Bar0,
@@ -266,6 +268,7 @@ pub(crate) fn new(
         bar: &'gpu Bar0,
     ) -> impl PinInit<Self, Error> + 'gpu {
         try_pin_init!(Self {
+            device: pdev.as_ref(),
             spec: Spec::new(pdev.as_ref(), bar).inspect(|spec| {
                 dev_info!(pdev,"NVIDIA ({})\n", spec);
             })?,
@@ -294,3 +297,19 @@ pub(crate) fn new(
         })
     }
 }
+
+#[pinned_drop]
+impl PinnedDrop for Gpu<'_> {
+    fn drop(self: Pin<&mut Self>) {
+        let this = self.project();
+        let device = *this.device;
+        let bar = *this.bar;
+
+        let _ = this
+            .gsp
+            .as_ref()
+            .get_ref()
+            .unload(device, bar, &*this.gsp_falcon)
+            .inspect_err(|e| dev_err!(device, "failed to unload GSP: {:?}\n", e));
+    }
+}
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 1bd9f21fc443..adc66809ce83 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -1,6 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 use kernel::{
+    bits,
     device,
     dma::Coherent,
     io::poll::read_poll_timeout,
@@ -23,6 +25,7 @@
     },
     gpu::Chipset,
     gsp::{
+        cmdq::Cmdq,
         commands,
         GspFwWprMeta, //
     },
@@ -97,4 +100,46 @@ pub(crate) fn boot(
 
         Ok(())
     }
+
+    /// Shut down the GSP and wait until it is offline.
+    fn shutdown_gsp(
+        cmdq: &Cmdq,
+        bar: &Bar0,
+        gsp_falcon: &Falcon<Gsp>,
+        mode: commands::PowerStateLevel,
+    ) -> Result<()> {
+        // Command to shut the GSP down.
+        cmdq.send_command(bar, commands::UnloadingGuestDriver::new(mode))?;
+
+        // Wait until GSP signals it is suspended.
+        const LIBOS_INTERRUPT_PROCESSOR_SUSPENDED: u32 = bits::bit_u32(31);
+        read_poll_timeout(
+            || Ok(gsp_falcon.read_mailbox0(bar)),
+            |&mb0| mb0 & LIBOS_INTERRUPT_PROCESSOR_SUSPENDED != 0,
+            Delta::from_millis(10),
+            Delta::from_secs(5),
+        )
+        .map(|_| ())
+    }
+
+    /// Attempts to unload the GSP firmware.
+    ///
+    /// This stops all activity on the GSP.
+    pub(crate) fn unload(
+        &self,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        gsp_falcon: &Falcon<Gsp>,
+    ) -> Result {
+        // Shut down the GSP.
+        Self::shutdown_gsp(
+            &self.cmdq,
+            bar,
+            gsp_falcon,
+            commands::PowerStateLevel::Level0,
+        )
+        .inspect_err(|e| dev_err!(dev, "Unload guest driver failed: {:?}\n", e))?;
+
+        Ok(())
+    }
 }
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index ac9cef312b10..3a365455d10c 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -232,3 +232,46 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
             .map_err(GpuNameError::InvalidUtf8)
     }
 }
+
+pub(crate) use fw::commands::PowerStateLevel;
+
+/// The `UnloadingGuestDriver` command, used to shut down the GSP.
+///
+/// Only used within the `gsp` module.
+pub(super) struct UnloadingGuestDriver {
+    level: PowerStateLevel,
+}
+
+impl UnloadingGuestDriver {
+    /// Creates a new `UnloadingGuestDriver` command for the given [`PowerStateLevel`].
+    pub(super) fn new(level: PowerStateLevel) -> Self {
+        Self { level }
+    }
+}
+
+impl CommandToGsp for UnloadingGuestDriver {
+    const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
+    type Command = fw::commands::UnloadingGuestDriver;
+    type Reply = UnloadingGuestDriverReply;
+    type InitError = Infallible;
+
+    fn init(&self) -> impl Init<Self::Command, Self::InitError> {
+        fw::commands::UnloadingGuestDriver::new(self.level)
+    }
+}
+
+/// The reply from the GSP to the [`UnloadingGuestDriver`] command.
+pub(super) struct UnloadingGuestDriverReply;
+
+impl MessageFromGsp for UnloadingGuestDriverReply {
+    const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
+    type InitError = Infallible;
+    type Message = ();
+
+    fn read(
+        _msg: &Self::Message,
+        _sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
+    ) -> Result<Self, Self::InitError> {
+        Ok(UnloadingGuestDriverReply)
+    }
+}
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 3245793bbe42..33c9f5860771 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -279,6 +279,7 @@ pub(crate) enum MsgFunction {
     Nop = bindings::NV_VGPU_MSG_FUNCTION_NOP,
     SetGuestSystemInfo = bindings::NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO,
     SetRegistry = bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
+    UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
 
     // Event codes
     GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
@@ -323,6 +324,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
                 Ok(MsgFunction::SetGuestSystemInfo)
             }
             bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY => Ok(MsgFunction::SetRegistry),
+            bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
+                Ok(MsgFunction::UnloadingGuestDriver)
+            }
 
             // Event codes
             bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index db46276430be..42985d446bae 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 use kernel::{
     device,
@@ -129,3 +130,47 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
 // SAFETY: This struct only contains integer types for which all bit patterns
 // are valid.
 unsafe impl FromBytes for GspStaticConfigInfo {}
+
+/// Power level requested to the [`UnloadingGuestDriver`] command.
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+#[repr(u32)]
+#[expect(unused)]
+pub(crate) enum PowerStateLevel {
+    /// Full unload.
+    Level0 = bindings::NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_0,
+    /// S3 (suspend to RAM).
+    Level3 = bindings::NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_3,
+    /// Hibernate (suspend to disk).
+    Level7 = bindings::NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_7,
+}
+
+impl PowerStateLevel {
+    /// Returns `true` if this state represents a power management transition, i.e. some GPU state
+    /// must survive it (as opposed to a full unload).
+    pub(crate) fn is_power_transition(self) -> bool {
+        self != PowerStateLevel::Level0
+    }
+}
+
+/// Payload of the `UnloadingGuestDriver` command and message.
+#[repr(transparent)]
+#[derive(Clone, Copy, Debug, Zeroable)]
+pub(crate) struct UnloadingGuestDriver(bindings::rpc_unloading_guest_driver_v1F_07);
+
+impl UnloadingGuestDriver {
+    pub(crate) fn new(level: PowerStateLevel) -> Self {
+        Self(bindings::rpc_unloading_guest_driver_v1F_07 {
+            bInPMTransition: u8::from(level.is_power_transition()),
+            bGc6Entering: 0,
+            newLevel: level as u32,
+            ..Zeroable::zeroed()
+        })
+    }
+}
+
+// SAFETY: Padding is explicit and will not contain uninitialized data.
+unsafe impl AsBytes for UnloadingGuestDriver {}
+
+// SAFETY: This struct only contains integer types for which all bit patterns
+// are valid.
+unsafe impl FromBytes for UnloadingGuestDriver {}
diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
index 334e8be5fde8..f82ed097b283 100644
--- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
+++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
@@ -30,6 +30,9 @@ fn fmt(&self, fmt: &mut ::core::fmt::Formatter<'_>) -> ::core::fmt::Result {
         fmt.write_str("__IncompleteArrayField")
     }
 }
+pub const NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_0: u32 = 0;
+pub const NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_3: u32 = 3;
+pub const NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_7: u32 = 7;
 pub const NV_VGPU_MSG_SIGNATURE_VALID: u32 = 1129337430;
 pub const GSP_FW_HEAP_PARAM_OS_SIZE_LIBOS2: u32 = 0;
 pub const GSP_FW_HEAP_PARAM_OS_SIZE_LIBOS3_BAREMETAL: u32 = 23068672;
@@ -880,6 +883,14 @@ fn default() -> Self {
     }
 }
 #[repr(C)]
+#[derive(Debug, Default, Copy, Clone, MaybeZeroable)]
+pub struct rpc_unloading_guest_driver_v1F_07 {
+    pub bInPMTransition: u8_,
+    pub bGc6Entering: u8_,
+    pub __bindgen_padding_0: [u8; 2usize],
+    pub newLevel: u32_,
+}
+#[repr(C)]
 #[derive(Debug, Default, MaybeZeroable)]
 pub struct rpc_run_cpu_sequencer_v17_00 {
     pub bufferSizeDWord: u32_,

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 3/4] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
  2026-05-29  7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
  2026-05-29  7:33 ` [PATCH v7 1/4] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
  2026-05-29  7:33 ` [PATCH v7 2/4] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
@ 2026-05-29  7:33 ` Alexandre Courbot
  2026-06-04  6:57   ` Claude review: " Claude Code Review Bot
  2026-05-29  7:33 ` [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails Alexandre Courbot
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-29  7:33 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

When probing the driver, the FWSEC-FRTS firmware creates a WPR2 secure
memory region to store the GSP firmware, and the Booter Loader loads and
starts that firmware into the GSP, making it run in RISC-V mode.

These operations need to be reverted upon unloading, particularly the
WPR2 secure region creation, as its presence prevents the driver from
subsequently probing.

Thus, prepare the Booter Unloader and FWSEC-SB firmware images when
booting the GSP, so they can be executed at unbind time to put the GPU
into a state where it can be probed again.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Co-developed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/firmware/booter.rs |   1 -
 drivers/gpu/nova-core/firmware/fwsec.rs  |   1 -
 drivers/gpu/nova-core/gpu.rs             |  15 +++-
 drivers/gpu/nova-core/gsp.rs             |   3 +
 drivers/gpu/nova-core/gsp/boot.rs        |  38 +++++++--
 drivers/gpu/nova-core/gsp/hal.rs         |  21 ++++-
 drivers/gpu/nova-core/gsp/hal/gh100.rs   |   2 +-
 drivers/gpu/nova-core/gsp/hal/tu102.rs   | 142 ++++++++++++++++++++++++++++++-
 drivers/gpu/nova-core/regs.rs            |   5 ++
 9 files changed, 209 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
index e45e5dc8d5d2..c5e17605e1a3 100644
--- a/drivers/gpu/nova-core/firmware/booter.rs
+++ b/drivers/gpu/nova-core/firmware/booter.rs
@@ -282,7 +282,6 @@ fn new_booter(data: &[u8]) -> Result<Self> {
 #[derive(Copy, Clone, Debug, PartialEq)]
 pub(crate) enum BooterKind {
     Loader,
-    #[expect(unused)]
     Unloader,
 }
 
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
index 8810cb49db67..4108f28cd338 100644
--- a/drivers/gpu/nova-core/firmware/fwsec.rs
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -144,7 +144,6 @@ pub(crate) enum FwsecCommand {
     /// image into it.
     Frts { frts_addr: u64, frts_size: u64 },
     /// Asks [`FwsecFirmware`] to load pre-OS apps on the PMU.
-    #[expect(dead_code)]
     Sb,
 }
 
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 011d504830e4..aed992488db3 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -18,7 +18,10 @@
         Falcon, //
     },
     fb::SysmemFlush,
-    gsp::Gsp,
+    gsp::{
+        self,
+        Gsp, //
+    },
     regs,
 };
 
@@ -260,6 +263,8 @@ pub(crate) struct Gpu<'gpu> {
     /// GSP runtime data. Temporarily an empty placeholder.
     #[pin]
     gsp: Gsp,
+    /// GSP unload firmware bundle, if any.
+    unload_bundle: Option<gsp::UnloadBundle>,
 }
 
 impl<'gpu> Gpu<'gpu> {
@@ -293,7 +298,10 @@ pub(crate) fn new(
 
             gsp <- Gsp::new(pdev),
 
-            _: { gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)? },
+            // This member must be initialized last, so the `UnloadBundle` can never be dropped from
+            // outside of the constructed `Gpu`, ensuring that the unload sequence is properly run
+            // in case of failure.
+            unload_bundle: gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)?,
         })
     }
 }
@@ -304,12 +312,13 @@ fn drop(self: Pin<&mut Self>) {
         let this = self.project();
         let device = *this.device;
         let bar = *this.bar;
+        let bundle = this.unload_bundle.take();
 
         let _ = this
             .gsp
             .as_ref()
             .get_ref()
-            .unload(device, bar, &*this.gsp_falcon)
+            .unload(device, bar, &*this.gsp_falcon, &*this.sec2_falcon, bundle)
             .inspect_err(|e| dev_err!(device, "failed to unload GSP: {:?}\n", e));
     }
 }
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 38378f104068..1885cfa5cb38 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -185,3 +185,6 @@ pub(crate) fn new(pdev: &pci::Device<device::Bound>) -> impl PinInit<Self, Error
         })
     }
 }
+
+/// Opaque bundle required to unload the GSP. Created by [`Gsp::boot`], consumed by [`Gsp::unload`].
+pub(crate) struct UnloadBundle(KBox<dyn hal::UnloadBundle>);
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index adc66809ce83..8d6fcc35b653 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -38,7 +38,8 @@ impl super::Gsp {
     /// user-space, patching them with signatures, and building firmware-specific intricate data
     /// structures that the GSP will use at runtime.
     ///
-    /// Upon return, the GSP is up and running, and its runtime object given as return value.
+    /// Upon return, the GSP is up and running, and its unload bundle (to be given as argument to
+    /// [`Self::unload`]) returned.
     pub(crate) fn boot(
         self: Pin<&mut Self>,
         pdev: &pci::Device<device::Bound>,
@@ -46,7 +47,7 @@ pub(crate) fn boot(
         chipset: Chipset,
         gsp_falcon: &Falcon<Gsp>,
         sec2_falcon: &Falcon<Sec2>,
-    ) -> Result {
+    ) -> Result<Option<super::UnloadBundle>> {
         let dev = pdev.as_ref();
         let hal = super::hal::gsp_hal(chipset);
 
@@ -57,8 +58,8 @@ pub(crate) fn boot(
 
         let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
 
-        // Perform the chipset-specific boot sequence.
-        hal.boot(
+        // Perform the chipset-specific boot sequence, and retrieve the unload bundle.
+        let unload_bundle = hal.boot(
             &self,
             dev,
             bar,
@@ -98,7 +99,7 @@ pub(crate) fn boot(
             Err(e) => dev_warn!(pdev, "GPU name unavailable: {:?}\n", e),
         }
 
-        Ok(())
+        Ok(unload_bundle)
     }
 
     /// Shut down the GSP and wait until it is offline.
@@ -130,16 +131,35 @@ pub(crate) fn unload(
         dev: &device::Device<device::Bound>,
         bar: &Bar0,
         gsp_falcon: &Falcon<Gsp>,
+        sec2_falcon: &Falcon<Sec2>,
+        unload_bundle: Option<super::UnloadBundle>,
     ) -> Result {
-        // Shut down the GSP.
-        Self::shutdown_gsp(
+        // Shut down the GSP. Keep going even in case of error.
+        let mut res = Self::shutdown_gsp(
             &self.cmdq,
             bar,
             gsp_falcon,
             commands::PowerStateLevel::Level0,
         )
-        .inspect_err(|e| dev_err!(dev, "Unload guest driver failed: {:?}\n", e))?;
+        .inspect_err(|e| dev_err!(dev, "GSP shutdown failed: {:?}\n", e));
 
-        Ok(())
+        // Run the unload bundle to reset the GSP so it can be booted again.
+        if let Some(unload_bundle) = unload_bundle {
+            res = res.and(
+                unload_bundle
+                    .0
+                    .run(dev, bar, gsp_falcon, sec2_falcon)
+                    .inspect_err(|e| dev_err!(dev, "Unload bundle failed: {:?}\n", e)),
+            );
+        } else {
+            dev_warn!(
+                dev,
+                "Unload bundle is missing, GSP won't be properly reset.\n"
+            );
+
+            res = Err(EAGAIN);
+        }
+
+        res.inspect(|()| dev_info!(dev, "GSP successfully unloaded\n"))
     }
 }
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
index fb3edaeb3160..501b852dcb29 100644
--- a/drivers/gpu/nova-core/gsp/hal.rs
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -30,9 +30,28 @@
     },
 };
 
+/// Trait for types containing the resources and code required to fully reset the GSP.
+///
+/// The GSP unload code might run in a situation where we cannot load firmware dynamically (e.g.
+/// because we are in shutdown and the file system is not accessible anymore). Thus, the firmware
+/// required for unloading is prepared at load time, and stored here until it needs to be run.
+pub(super) trait UnloadBundle: Send {
+    /// Performs the steps required to properly reset the GSP after it has been stopped.
+    fn run(
+        &self,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        gsp_falcon: &Falcon<GspEngine>,
+        sec2_falcon: &Falcon<Sec2>,
+    ) -> Result;
+}
+
 /// Trait implemented by GSP HALs.
 pub(super) trait GspHal: Send {
     /// Performs the GSP boot process, loading and running the required firmwares as needed.
+    ///
+    /// Upon success, returns the [`UnloadBundle`] to be run (if any) in order to properly reset the
+    /// GSP after it has been stopped.
     #[allow(clippy::too_many_arguments)]
     fn boot(
         &self,
@@ -44,7 +63,7 @@ fn boot(
         wpr_meta: &Coherent<GspFwWprMeta>,
         gsp_falcon: &Falcon<GspEngine>,
         sec2_falcon: &Falcon<Sec2>,
-    ) -> Result;
+    ) -> Result<Option<crate::gsp::UnloadBundle>>;
 
     /// Performs HAL-specific post-GSP boot tasks.
     ///
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 3f3675f9c16a..0a8b7f763883 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -41,7 +41,7 @@ fn boot(
         _wpr_meta: &Coherent<GspFwWprMeta>,
         _gsp_falcon: &Falcon<GspEngine>,
         _sec2_falcon: &Falcon<Sec2>,
-    ) -> Result {
+    ) -> Result<Option<crate::gsp::UnloadBundle>> {
         Err(ENOTSUPP)
     }
 }
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index a6f2b2e279e8..c4ab081f25c4 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -32,7 +32,10 @@
     },
     gpu::Chipset,
     gsp::{
-        hal::GspHal,
+        hal::{
+            GspHal,
+            UnloadBundle, //
+        },
         sequencer::{
             GspSequencer,
             GspSequencerParams, //
@@ -44,6 +47,124 @@
     vbios::Vbios, //
 };
 
+// A ready-to-run FWSEC unload firmware.
+//
+// Since there are two variants of the prepared firmware (with and without a bootloader), this type
+// abstracts the difference.
+enum FwsecUnloadFirmware {
+    WithoutBl(FwsecFirmware),
+    WithBl(FwsecFirmwareWithBl),
+}
+
+impl FwsecUnloadFirmware {
+    /// Loads the FWSEC SB firmware, as well as its bootloader if `chipset` requires it.
+    fn new(
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        chipset: Chipset,
+        bios: &Vbios,
+        gsp_falcon: &Falcon<GspEngine>,
+    ) -> Result<Self> {
+        let fwsec_sb = FwsecFirmware::new(dev, gsp_falcon, bar, bios, FwsecCommand::Sb)?;
+
+        Ok(if chipset.needs_fwsec_bootloader() {
+            Self::WithBl(FwsecFirmwareWithBl::new(fwsec_sb, dev, chipset)?)
+        } else {
+            Self::WithoutBl(fwsec_sb)
+        })
+    }
+
+    /// Runs the FWSEC SB firmware.
+    fn run(
+        &self,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        gsp_falcon: &Falcon<GspEngine>,
+    ) -> Result<()> {
+        match self {
+            Self::WithoutBl(fw) => fw.run(dev, gsp_falcon, bar),
+            Self::WithBl(fw) => fw.run(dev, gsp_falcon, bar),
+        }
+    }
+}
+
+// Contains the firmware required to fully reset GSP on chipsets where the GSP is started using
+// FWSEC/Booter.
+struct Sec2UnloadBundle {
+    fwsec_sb: FwsecUnloadFirmware,
+    booter_unloader: BooterFirmware,
+}
+
+impl Sec2UnloadBundle {
+    /// Load and prepare the resources required to properly reset the GSP after it has been stopped.
+    fn build(
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        chipset: Chipset,
+        bios: &Vbios,
+        gsp_falcon: &Falcon<GspEngine>,
+        sec2_falcon: &Falcon<Sec2>,
+    ) -> Result<KBox<dyn UnloadBundle>> {
+        KBox::new(
+            Self {
+                fwsec_sb: FwsecUnloadFirmware::new(dev, bar, chipset, bios, gsp_falcon)?,
+                booter_unloader: BooterFirmware::new(
+                    dev,
+                    BooterKind::Unloader,
+                    chipset,
+                    FIRMWARE_VERSION,
+                    sec2_falcon,
+                    bar,
+                )?,
+            },
+            GFP_KERNEL,
+        )
+        .map(|b| b as KBox<dyn UnloadBundle>)
+        .map_err(Into::into)
+    }
+}
+
+impl UnloadBundle for Sec2UnloadBundle {
+    fn run(
+        &self,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        gsp_falcon: &Falcon<GspEngine>,
+        sec2_falcon: &Falcon<Sec2>,
+    ) -> Result<()> {
+        // Run FWSEC-SB to reset the GSP falcon to its pre-libos state.
+        self.fwsec_sb.run(dev, bar, gsp_falcon)?;
+
+        // Remove WPR2 region if set.
+        let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
+        if wpr2_hi.is_wpr2_set() {
+            sec2_falcon.reset(bar)?;
+            sec2_falcon.load(dev, bar, &self.booter_unloader)?;
+
+            // Sentinel value to confirm that Booter Unloader has run.
+            const MAILBOX_SENTINEL: u32 = 0xff;
+            let (mbox0, _) =
+                sec2_falcon.boot(bar, Some(MAILBOX_SENTINEL), Some(MAILBOX_SENTINEL))?;
+            if mbox0 != 0 {
+                dev_err!(dev, "Booter Unloader returned error 0x{:x}\n", mbox0);
+                return Err(EINVAL);
+            }
+
+            // Confirm that the WPR2 region has been removed.
+            let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
+            if wpr2_hi.is_wpr2_set() {
+                dev_err!(
+                    dev,
+                    "WPR2 region still set after Booter Unloader returned\n"
+                );
+                return Err(EBUSY);
+            }
+        }
+
+        Ok(())
+    }
+}
+
 /// Helper function to load and run the FWSEC-FRTS firmware and confirm that it has properly
 /// created the WPR2 region.
 fn run_fwsec_frts(
@@ -143,9 +264,24 @@ fn boot(
         wpr_meta: &Coherent<GspFwWprMeta>,
         gsp_falcon: &Falcon<GspEngine>,
         sec2_falcon: &Falcon<Sec2>,
-    ) -> Result {
+    ) -> Result<Option<crate::gsp::UnloadBundle>> {
         let bios = Vbios::new(dev, bar)?;
 
+        // Try and prepare the unload bundle. If this fails, the GPU will need to be reset
+        // before the driver can be probed again.
+        let unload_bundle =
+            Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
+                .inspect_err(|e| {
+                    dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
+                    dev_warn!(dev, "The GSP won't be able to unload properly on unbind.\n");
+                    dev_warn!(
+                        dev,
+                        "The GPU will need to be reset before the driver can bind again.\n"
+                    );
+                })
+                .map(crate::gsp::UnloadBundle)
+                .ok();
+
         // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
         if !fb_layout.frts.is_empty() {
             run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
@@ -175,7 +311,7 @@ fn boot(
         )?
         .run(dev, bar, sec2_falcon, wpr_meta)?;
 
-        Ok(())
+        Ok(unload_bundle)
     }
 
     fn post_boot(
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 6faeed73901d..356fbf364ea5 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -175,6 +175,11 @@ impl NV_PFB_PRI_MMU_WPR2_ADDR_HI {
     pub(crate) fn higher_bound(self) -> u64 {
         u64::from(self.hi_val()) << 12
     }
+
+    /// Returns whether the WPR2 region is currently set.
+    pub(crate) fn is_wpr2_set(self) -> bool {
+        self.hi_val() != 0
+    }
 }
 
 // PGSP

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails
  2026-05-29  7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
                   ` (2 preceding siblings ...)
  2026-05-29  7:33 ` [PATCH v7 3/4] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
@ 2026-05-29  7:33 ` Alexandre Courbot
  2026-05-30  1:46   ` Eliot Courtney
  2026-06-04  6:57   ` Claude review: " Claude Code Review Bot
  2026-05-29 11:15 ` [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Danilo Krummrich
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-29  7:33 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

If `Gsp::boot` fails, the GSP can be left in a state where boot cannot
be attempted again unless it is reset first.

To avoid this, we want to run the unload bundle whenever `boot` fails to
try and clear the partially-initialized state.

Do this by wrapping the unload bundle into a drop guard up until `boot`
returns. After that, running the unload bundle becomes the
responsibility of the caller.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp/boot.rs      | 67 ++++++++++++++++++++++++++++++++--
 drivers/gpu/nova-core/gsp/hal.rs       | 19 +++++-----
 drivers/gpu/nova-core/gsp/hal/gh100.rs | 15 ++++----
 drivers/gpu/nova-core/gsp/hal/tu102.rs | 31 ++++++++++------
 4 files changed, 101 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 8d6fcc35b653..1f83f63ceeb0 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -8,7 +8,8 @@
     io::poll::read_poll_timeout,
     pci,
     prelude::*,
-    time::Delta, //
+    time::Delta,
+    types::ScopeGuard, //
 };
 
 use crate::{
@@ -31,6 +32,66 @@
     },
 };
 
+/// Arguments required to call [`Gsp::unload`](super::Gsp::unload).
+///
+/// Stored as their own type to avoid repeating a long and tedious list in [`BootUnloadGuard`].
+pub(super) struct BootUnloadArgs<'a> {
+    gsp: &'a super::Gsp,
+    dev: &'a device::Device<device::Bound>,
+    bar: &'a Bar0,
+    gsp_falcon: &'a Falcon<Gsp>,
+    sec2_falcon: &'a Falcon<Sec2>,
+    unload_bundle: Option<super::UnloadBundle>,
+}
+
+/// Guard that calls [`Gsp::unload`](super::Gsp::unload) with a
+/// [`UnloadBundle`](super::UnloadBundle) when dropped.
+///
+/// Used to ensure the `UnloadBundle` is run during failure paths.
+pub(super) struct BootUnloadGuard<'a> {
+    guard: ScopeGuard<BootUnloadArgs<'a>, fn(BootUnloadArgs<'a>)>,
+}
+
+impl<'a> BootUnloadGuard<'a> {
+    /// Wraps `unload_bundle` into a guard that executes it when dropped.
+    pub(super) fn new(
+        gsp: &'a super::Gsp,
+        dev: &'a device::Device<device::Bound>,
+        bar: &'a Bar0,
+        gsp_falcon: &'a Falcon<Gsp>,
+        sec2_falcon: &'a Falcon<Sec2>,
+        unload_bundle: Option<super::UnloadBundle>,
+    ) -> Self {
+        Self {
+            guard: ScopeGuard::new_with_data(
+                BootUnloadArgs {
+                    gsp,
+                    dev,
+                    bar,
+                    gsp_falcon,
+                    sec2_falcon,
+                    unload_bundle,
+                },
+                |args| {
+                    let _ = super::Gsp::unload(
+                        args.gsp,
+                        args.dev,
+                        args.bar,
+                        args.gsp_falcon,
+                        args.sec2_falcon,
+                        args.unload_bundle,
+                    );
+                },
+            ),
+        }
+    }
+
+    /// Disarms the guard and returns the [`UnloadBundle`](super::UnloadBundle) it contains.
+    pub(super) fn dismiss(self) -> Option<super::UnloadBundle> {
+        self.guard.dismiss().unload_bundle
+    }
+}
+
 impl super::Gsp {
     /// Attempt to boot the GSP.
     ///
@@ -59,7 +120,7 @@ pub(crate) fn boot(
         let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
 
         // Perform the chipset-specific boot sequence, and retrieve the unload bundle.
-        let unload_bundle = hal.boot(
+        let unload_guard = hal.boot(
             &self,
             dev,
             bar,
@@ -99,7 +160,7 @@ pub(crate) fn boot(
             Err(e) => dev_warn!(pdev, "GPU name unavailable: {:?}\n", e),
         }
 
-        Ok(unload_bundle)
+        Ok(unload_guard.dismiss())
     }
 
     /// Shut down the GSP and wait until it is offline.
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
index 501b852dcb29..88fc3e791114 100644
--- a/drivers/gpu/nova-core/gsp/hal.rs
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -25,6 +25,7 @@
         Chipset, //
     },
     gsp::{
+        boot::BootUnloadGuard,
         Gsp,
         GspFwWprMeta, //
     },
@@ -50,20 +51,20 @@ fn run(
 pub(super) trait GspHal: Send {
     /// Performs the GSP boot process, loading and running the required firmwares as needed.
     ///
-    /// Upon success, returns the [`UnloadBundle`] to be run (if any) in order to properly reset the
-    /// GSP after it has been stopped.
+    /// Upon success, returns a guard that runs the GSP unload sequence if GSP boot does not
+    /// complete.
     #[allow(clippy::too_many_arguments)]
-    fn boot(
+    fn boot<'a>(
         &self,
-        gsp: &Gsp,
-        dev: &device::Device<device::Bound>,
-        bar: &Bar0,
+        gsp: &'a Gsp,
+        dev: &'a device::Device<device::Bound>,
+        bar: &'a Bar0,
         chipset: Chipset,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-        gsp_falcon: &Falcon<GspEngine>,
-        sec2_falcon: &Falcon<Sec2>,
-    ) -> Result<Option<crate::gsp::UnloadBundle>>;
+        gsp_falcon: &'a Falcon<GspEngine>,
+        sec2_falcon: &'a Falcon<Sec2>,
+    ) -> Result<BootUnloadGuard<'a>>;
 
     /// Performs HAL-specific post-GSP boot tasks.
     ///
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 0a8b7f763883..9a4bb22578b3 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -18,6 +18,7 @@
     fb::FbLayout,
     gpu::Chipset,
     gsp::{
+        boot::BootUnloadGuard,
         hal::GspHal,
         Gsp,
         GspFwWprMeta, //
@@ -31,17 +32,17 @@ impl GspHal for Gh100 {
     ///
     /// This path uses FSP to establish a chain of trust and boot GSP-FMC. FSP handles
     /// the GSP boot internally - no manual GSP reset/boot is needed.
-    fn boot(
+    fn boot<'a>(
         &self,
-        _gsp: &Gsp,
-        _dev: &device::Device<device::Bound>,
-        _bar: &Bar0,
+        _gsp: &'a Gsp,
+        _dev: &'a device::Device<device::Bound>,
+        _bar: &'a Bar0,
         _chipset: Chipset,
         _fb_layout: &FbLayout,
         _wpr_meta: &Coherent<GspFwWprMeta>,
-        _gsp_falcon: &Falcon<GspEngine>,
-        _sec2_falcon: &Falcon<Sec2>,
-    ) -> Result<Option<crate::gsp::UnloadBundle>> {
+        _gsp_falcon: &'a Falcon<GspEngine>,
+        _sec2_falcon: &'a Falcon<Sec2>,
+    ) -> Result<BootUnloadGuard<'a>> {
         Err(ENOTSUPP)
     }
 }
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index c4ab081f25c4..6a27e7e90279 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -32,6 +32,7 @@
     },
     gpu::Chipset,
     gsp::{
+        boot::BootUnloadGuard,
         hal::{
             GspHal,
             UnloadBundle, //
@@ -254,21 +255,23 @@ fn run_fwsec_frts(
 struct Tu102;
 
 impl GspHal for Tu102 {
-    fn boot(
+    fn boot<'a>(
         &self,
-        gsp: &Gsp,
-        dev: &device::Device<device::Bound>,
-        bar: &Bar0,
+        gsp: &'a Gsp,
+        dev: &'a device::Device<device::Bound>,
+        bar: &'a Bar0,
         chipset: Chipset,
         fb_layout: &FbLayout,
         wpr_meta: &Coherent<GspFwWprMeta>,
-        gsp_falcon: &Falcon<GspEngine>,
-        sec2_falcon: &Falcon<Sec2>,
-    ) -> Result<Option<crate::gsp::UnloadBundle>> {
+        gsp_falcon: &'a Falcon<GspEngine>,
+        sec2_falcon: &'a Falcon<Sec2>,
+    ) -> Result<BootUnloadGuard<'a>> {
         let bios = Vbios::new(dev, bar)?;
 
-        // Try and prepare the unload bundle. If this fails, the GPU will need to be reset
-        // before the driver can be probed again.
+        // Try and prepare the unload bundle.
+        //
+        // If the unload bundle creation fails, the GPU will need to be reset before the driver can
+        // be probed again.
         let unload_bundle =
             Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
                 .inspect_err(|e| {
@@ -279,8 +282,12 @@ fn boot(
                         "The GPU will need to be reset before the driver can bind again.\n"
                     );
                 })
-                .map(crate::gsp::UnloadBundle)
-                .ok();
+                .ok()
+                .map(crate::gsp::UnloadBundle);
+
+        // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
+        let unload_guard =
+            BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, unload_bundle);
 
         // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
         if !fb_layout.frts.is_empty() {
@@ -311,7 +318,7 @@ fn boot(
         )?
         .run(dev, bar, sec2_falcon, wpr_meta)?;
 
-        Ok(unload_bundle)
+        Ok(unload_guard)
     }
 
     fn post_boot(

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding
  2026-05-29  7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
                   ` (3 preceding siblings ...)
  2026-05-29  7:33 ` [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails Alexandre Courbot
@ 2026-05-29 11:15 ` Danilo Krummrich
  2026-05-29 13:06   ` Alexandre Courbot
  2026-05-30  5:55 ` Alexandre Courbot
  2026-06-04  6:57 ` Claude review: " Claude Code Review Bot
  6 siblings, 1 reply; 19+ messages in thread
From: Danilo Krummrich @ 2026-05-29 11:15 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Alice Ryhl, David Airlie, Simona Vetter, John Hubbard,
	Alistair Popple, Timur Tabi, Eliot Courtney, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux

On Fri May 29, 2026 at 9:33 AM CEST, Alexandre Courbot wrote:
> Alexandre Courbot (4):
>       gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
>       gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
>       gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
>       gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails

Reviewed-by: Danilo Krummrich <dakr@kernel.org>

NIT: There's a few places with Result<()> instead of just Result that could be
cleaned up (including the moved code).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding
  2026-05-29 11:15 ` [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Danilo Krummrich
@ 2026-05-29 13:06   ` Alexandre Courbot
  0 siblings, 0 replies; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-29 13:06 UTC (permalink / raw)
  To: Danilo Krummrich
  Cc: Alice Ryhl, David Airlie, Simona Vetter, John Hubbard,
	Alistair Popple, Timur Tabi, Eliot Courtney, nova-gpu, dri-devel,
	linux-kernel, rust-for-linux

On Fri May 29, 2026 at 8:15 PM JST, Danilo Krummrich wrote:
> On Fri May 29, 2026 at 9:33 AM CEST, Alexandre Courbot wrote:
>> Alexandre Courbot (4):
>>       gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
>>       gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
>>       gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
>>       gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails
>
> Reviewed-by: Danilo Krummrich <dakr@kernel.org>
>
> NIT: There's a few places with Result<()> instead of just Result that could be
> cleaned up (including the moved code).

Oops, I don't know why I always write them this way. I've fixed locally
and will apply upon push if a v8 is not needed, thanks!

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails
  2026-05-29  7:33 ` [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails Alexandre Courbot
@ 2026-05-30  1:46   ` Eliot Courtney
  2026-06-04  6:57   ` Claude review: " Claude Code Review Bot
  1 sibling, 0 replies; 19+ messages in thread
From: Eliot Courtney @ 2026-05-30  1:46 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux, dri-devel

On Fri May 29, 2026 at 4:33 PM JST, Alexandre Courbot wrote:
> If `Gsp::boot` fails, the GSP can be left in a state where boot cannot
> be attempted again unless it is reset first.
>
> To avoid this, we want to run the unload bundle whenever `boot` fails to
> try and clear the partially-initialized state.
>
> Do this by wrapping the unload bundle into a drop guard up until `boot`
> returns. After that, running the unload bundle becomes the
> responsibility of the caller.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding
  2026-05-29  7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
                   ` (4 preceding siblings ...)
  2026-05-29 11:15 ` [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Danilo Krummrich
@ 2026-05-30  5:55 ` Alexandre Courbot
  2026-06-04  6:57 ` Claude review: " Claude Code Review Bot
  6 siblings, 0 replies; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-30  5:55 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

On Fri May 29, 2026 at 4:33 PM JST, Alexandre Courbot wrote:
>       gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
>       gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
>       gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
>       gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails

All pushed to drm-rust-next (after s/Result<()>/Result)! Thanks for the
reviews.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Claude review: gpu: nova-core: run unload sequence upon unbinding
  2026-05-29  7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
                   ` (5 preceding siblings ...)
  2026-05-30  5:55 ` Alexandre Courbot
@ 2026-06-04  6:57 ` Claude Code Review Bot
  6 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  6:57 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: gpu: nova-core: run unload sequence upon unbinding
Author: Alexandre Courbot <acourbot@nvidia.com>
Patches: 9
Reviewed: 2026-06-04T16:57:32.956770

---

This is a well-structured 4-patch series by Alexandre Courbot (NVIDIA) that adds proper GPU unload/teardown for the nova-core driver. Currently, when the driver unbinds, the GSP firmware is left running and the WPR2 memory region is not cleaned up, which means a GPU reset is required before the driver can probe again. This is particularly problematic on Blackwell where PCI reset can corrupt the GPU.

The series is logically decomposed:
1. Extract chipset-specific boot code into a HAL pattern
2. Send the `UNLOADING_GUEST_DRIVER` command to stop the GSP
3. Run Booter Unloader and FWSEC-SB to reset WPR2 region
4. Add a drop guard to run the unload bundle on boot failure

The code quality is high — good error handling, proper use of Rust's type system to enforce lifecycle constraints (one-shot unload bundle, drop guards), and the HAL abstraction prepares cleanly for Hopper/Blackwell FSP boot paths. The v7 changelog shows this series has received thorough review iterations.

**Verdict**: Looks good overall. A few observations below, mostly minor.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Claude review: gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
  2026-05-29  7:33 ` [PATCH v7 1/4] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
@ 2026-06-04  6:57   ` Claude Code Review Bot
  0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  6:57 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

This is a mechanical extraction of the boot code into a HAL trait pattern. The existing `run_fwsec_frts` and boot-related logic moves from `gsp/boot.rs` into `gsp/hal/tu102.rs`, with a trait `GspHal` defined in `gsp/hal.rs` and a stub `Gh100` HAL.

**Observations:**

1. The HAL dispatch is a static function returning `&'static dyn GspHal`:
```rust
pub(super) fn gsp_hal(chipset: Chipset) -> &'static dyn GspHal {
    match chipset.arch() {
        Architecture::Turing | Architecture::Ampere | Architecture::Ada => tu102::TU102_HAL,
        Architecture::Hopper | Architecture::BlackwellGB10x | Architecture::BlackwellGB20x => {
            gh100::GH100_HAL
        }
    }
}
```
This is clean — the match is exhaustive, the static dispatch avoids allocations, and the `dyn GspHal` provides the needed polymorphism. The `Send` bound on the trait is appropriate since the HAL instances are `&'static`.

2. The `post_boot` trait method has a default no-op implementation, which makes sense since GH100 doesn't need sequencer logic. Good trait design.

3. The `boot` method in `tu102.rs` takes `gsp: &Gsp` (versus `self: Pin<&mut Self>` in the original). This works because the HAL implementation accesses `gsp.libos` and `gsp.cmdq` through shared references, and those fields provide interior mutability where needed.

4. The `#[allow(clippy::too_many_arguments)]` on the trait method is pragmatic — the parameters are all genuinely needed. No concern here.

5. Minor: in `gsp/boot.rs` the call passes `&self` to `hal.boot()`:
```rust
hal.boot(
    &self,
    dev,
    ...
```
This is fine because `self` here is `Pin<&mut Gsp>` and `&self` reborrrows as `&Gsp`, which is what the HAL expects.

**No issues.** This is a clean mechanical refactoring.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Claude review: gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
  2026-05-29  7:33 ` [PATCH v7 2/4] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
@ 2026-06-04  6:57   ` Claude Code Review Bot
  0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  6:57 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

This patch makes `Gpu` implement `PinnedDrop` to send the `UNLOADING_GUEST_DRIVER` RPC to the GSP when the driver unbinds.

**Observations:**

1. The `Gpu` struct gains a `device` field to retain the device reference for the drop path:
```rust
#[pin_data(PinnedDrop)]
pub(crate) struct Gpu<'gpu> {
    device: &'gpu device::Device<device::Bound>,
    ...
```
This is necessary since `PinnedDrop::drop` doesn't receive the device as a parameter. Note the upstream base already has `bar: &'gpu Bar0` so this is consistent with the existing pattern.

2. The `PinnedDrop` implementation:
```rust
fn drop(self: Pin<&mut Self>) {
    let this = self.project();
    let device = *this.device;
    let bar = *this.bar;

    let _ = this
        .gsp
        .as_ref()
        .get_ref()
        .unload(device, bar, &*this.gsp_falcon)
        .inspect_err(|e| dev_err!(device, "failed to unload GSP: {:?}\n", e));
}
```
The `let _ =` properly discards the error after logging it — correct for a drop path where there's nothing more to do. The `.as_ref().get_ref()` chain on the pin projection is the standard way to get `&Gsp` from a pinned field.

3. The `shutdown_gsp` method correctly polls mailbox0 for the `LIBOS_INTERRUPT_PROCESSOR_SUSPENDED` bit:
```rust
const LIBOS_INTERRUPT_PROCESSOR_SUSPENDED: u32 = bits::bit_u32(31);
read_poll_timeout(
    || Ok(gsp_falcon.read_mailbox0(bar)),
    |&mb0| mb0 & LIBOS_INTERRUPT_PROCESSOR_SUSPENDED != 0,
    Delta::from_millis(10),
    Delta::from_secs(5),
)
```
5-second timeout with 10ms polling interval is reasonable for firmware shutdown.

4. The `PowerStateLevel` enum uses `#[expect(unused)]` which correctly anticipates that `Level3` and `Level7` are not used yet but will be needed for suspend/hibernate support. Only `Level0` (full unload) is used currently.

5. The `UnloadingGuestDriver` command implementation follows the existing pattern for `SetRegistry` and `SetGuestSystemInfo` commands — consistent and correct.

6. The `SAFETY` comments on `AsBytes` and `FromBytes` impls for `UnloadingGuestDriver` look correct — the struct is `#[repr(transparent)]` wrapping a C struct with explicit padding (`__bindgen_padding_0: [u8; 2usize]`), so no uninitialized padding.

7. `is_power_transition` method:
```rust
pub(crate) fn is_power_transition(self) -> bool {
    self != PowerStateLevel::Level0
}
```
This is clean and semantically clear.

**No issues.** Well-implemented.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Claude review: gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
  2026-05-29  7:33 ` [PATCH v7 3/4] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
@ 2026-06-04  6:57   ` Claude Code Review Bot
  0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  6:57 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

This is the most substantial patch. It prepares the Booter Unloader and FWSEC-SB firmware at probe time (since filesystem may not be available at unbind time) and runs them during teardown to clear the WPR2 region.

**Observations:**

1. The `FwsecUnloadFirmware` enum handling the with/without bootloader variants is clean:
```rust
enum FwsecUnloadFirmware {
    WithoutBl(FwsecFirmware),
    WithBl(FwsecFirmwareWithBl),
}
```
This avoids conditional logic at run time and makes both paths type-safe.

2. The `Sec2UnloadBundle` correctly pre-loads firmware at boot:
```rust
fn build(...) -> Result<KBox<dyn UnloadBundle>> {
    KBox::new(
        Self {
            fwsec_sb: FwsecUnloadFirmware::new(dev, bar, chipset, bios, gsp_falcon)?,
            booter_unloader: BooterFirmware::new(dev, BooterKind::Unloader, ...)?
        },
        GFP_KERNEL,
    )
    .map(|b| b as KBox<dyn UnloadBundle>)
    .map_err(Into::into)
}
```
Using `KBox<dyn UnloadBundle>` for the trait object is correct for kernel heap allocation.

3. The unload bundle run sequence is well-ordered:
```rust
fn run(&self, dev, bar, gsp_falcon, sec2_falcon) -> Result<()> {
    // 1. Run FWSEC-SB to reset GSP falcon
    self.fwsec_sb.run(dev, bar, gsp_falcon)?;
    // 2. Remove WPR2 via Booter Unloader
    if wpr2_hi.is_wpr2_set() { ... }
}
```
FWSEC-SB first (resets falcon to pre-libos state), then Booter Unloader (removes WPR2). This matches OpenRM's sequence.

4. The mailbox sentinel pattern in the Booter Unloader execution:
```rust
const MAILBOX_SENTINEL: u32 = 0xff;
let (mbox0, _) = sec2_falcon.boot(bar, Some(MAILBOX_SENTINEL), Some(MAILBOX_SENTINEL))?;
if mbox0 != 0 {
    dev_err!(dev, "Booter Unloader returned error 0x{:x}\n", mbox0);
    return Err(EINVAL);
}
```
Writing `0xff` to both mailboxes and then checking mbox0 == 0 for success matches the pattern described in prior versions' changelogs as aligning with OpenRM behavior. Good.

5. **Important design decision**: The unload bundle preparation failure is non-fatal during boot:
```rust
let unload_bundle =
    Sec2UnloadBundle::build(...)
        .inspect_err(|e| {
            dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
            ...
        })
        .ok();
```
Converting the error to `None` via `.ok()` and continuing boot is the right tradeoff — failing to prepare unload firmware shouldn't prevent the GPU from being used, just means manual reset will be needed.

6. The `UnloadBundle` wrapper type in `gsp.rs`:
```rust
pub(crate) struct UnloadBundle(KBox<dyn hal::UnloadBundle>);
```
This provides an opaque boundary between the HAL-internal trait and the public API. Clean encapsulation.

7. In `gpu.rs`, the unload path continues even if GSP shutdown fails:
```rust
let mut res = Self::shutdown_gsp(...)
    .inspect_err(|e| dev_err!(dev, "GSP shutdown failed: {:?}\n", e));

if let Some(unload_bundle) = unload_bundle {
    res = res.and(
        unload_bundle.0.run(...)
            .inspect_err(|e| dev_err!(dev, "Unload bundle failed: {:?}\n", e)),
    );
}
```
Using `res.and()` means if shutdown fails, the first error is preserved, but we still attempt the unload bundle. This is correct — even if the GSP didn't respond to the shutdown command, the booter unloader / FWSEC-SB may still succeed at resetting hardware state.

8. The `is_wpr2_set` helper is a trivial but useful abstraction:
```rust
pub(crate) fn is_wpr2_set(self) -> bool {
    self.hi_val() != 0
}
```
This replaces `higher_bound() != 0` with a semantically clearer method. Note: `higher_bound()` shifts by 12, so checking `hi_val() != 0` is equivalent to checking `higher_bound() != 0` and avoids the unnecessary shift. Slightly more efficient, though it doesn't matter in practice.

9. Minor: `unload` now warns if no unload bundle is available and returns `Err(EAGAIN)`. `EAGAIN` seems slightly unusual here — the operation won't succeed on retry without external intervention (GPU reset). However, the cover letter explains this situation occurs only when bundle preparation fails, and the user was already warned at boot time. The error code won't propagate to userspace in the normal flow (it's in `PinnedDrop`), so this is mostly for internal tracking. Acceptable.

**No issues.** This is solid work.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Claude review: gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails
  2026-05-29  7:33 ` [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails Alexandre Courbot
  2026-05-30  1:46   ` Eliot Courtney
@ 2026-06-04  6:57   ` Claude Code Review Bot
  1 sibling, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-06-04  6:57 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

This patch adds a `BootUnloadGuard` — a `ScopeGuard` that automatically runs the unload bundle if boot fails partway through.

**Observations:**

1. The `BootUnloadGuard` wraps `ScopeGuard<BootUnloadArgs, fn(BootUnloadArgs)>`:
```rust
pub(super) struct BootUnloadGuard<'a> {
    guard: ScopeGuard<BootUnloadArgs<'a>, fn(BootUnloadArgs<'a>)>,
}
```
Using a concrete function pointer `fn(BootUnloadArgs)` instead of a closure avoids lifetime complications with closures capturing references. Clean approach.

2. The guard's callback calls `Gsp::unload`:
```rust
|args| {
    let _ = super::Gsp::unload(
        args.gsp,
        args.dev,
        args.bar,
        args.gsp_falcon,
        args.sec2_falcon,
        args.unload_bundle,
    );
},
```
The `let _ =` discards the result in the drop path — correct, since there's nothing to do with errors during cleanup of a failed boot.

3. The `dismiss()` method returns the unload bundle for the caller to take ownership:
```rust
pub(super) fn dismiss(self) -> Option<super::UnloadBundle> {
    self.guard.dismiss().unload_bundle
}
```
This is the standard `ScopeGuard` pattern — on the success path, dismiss the guard and hand the resource to the normal owner.

4. The lifetime threading `'a` through the HAL `boot` signature is well done:
```rust
fn boot<'a>(
    &self,
    gsp: &'a Gsp,
    dev: &'a device::Device<device::Bound>,
    bar: &'a Bar0,
    ...
    gsp_falcon: &'a Falcon<GspEngine>,
    sec2_falcon: &'a Falcon<Sec2>,
) -> Result<BootUnloadGuard<'a>>;
```
This ensures the guard borrows from the same lifetime as the falcons and device, which is correct — the guard must not outlive the hardware resources it references.

5. **One concern**: The `BootUnloadGuard`'s drop callback calls `Gsp::unload`, which includes `shutdown_gsp` (sends `UNLOADING_GUEST_DRIVER` command). If boot fails early enough that the GSP isn't actually running yet (e.g., before `gsp_falcon.boot()`), the `shutdown_gsp` command will fail (5-second timeout). The unload bundle's `run()` would then also execute. This is probably fine — `shutdown_gsp` failure is logged and continued past, and the unload bundle itself should work regardless of GSP state. But it does mean a boot failure will incur an extra 5-second timeout from the futile shutdown attempt. Not a functional issue, but worth noting for debugging (the error logs will appear).

6. In `tu102.rs`, the guard construction placement is correct — it's created after the unload bundle is built but before FWSEC-FRTS:
```rust
let unload_bundle = Sec2UnloadBundle::build(...).ok().map(crate::gsp::UnloadBundle);
let unload_guard = BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, unload_bundle);

// FWSEC-FRTS, falcon reset, booter load all happen after this
// Any `?` will trigger the guard
```
This means if FWSEC-FRTS, falcon boot, or booter load fails, the guard fires and cleans up. If unload bundle prep itself failed (returned `None`), the guard fires but `unload` logs a warning and returns `EAGAIN` — the hardware still won't be cleaned up, but that was already the case.

7. The `gh100.rs` stub correctly propagates the lifetime change and still returns `Err(ENOTSUPP)` — no guard needed since boot never succeeds.

**No issues.** The drop guard pattern is well-implemented and addresses a real concern about partial boot failures.

---

**Summary**: This is a clean, well-reviewed v7 series that addresses a real operational problem (inability to rebind without GPU reset). The HAL abstraction is forward-looking without being over-engineered, the error handling is thorough (continue-on-error in teardown paths, pre-load firmware at boot time), and the drop guard in patch 4 closes the gap for partial boot failures. All four patches are ready to merge.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2026-06-04  6:57 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-29  7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
2026-05-29  7:33 ` [PATCH v7 1/4] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
2026-06-04  6:57   ` Claude review: " Claude Code Review Bot
2026-05-29  7:33 ` [PATCH v7 2/4] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
2026-06-04  6:57   ` Claude review: " Claude Code Review Bot
2026-05-29  7:33 ` [PATCH v7 3/4] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
2026-06-04  6:57   ` Claude review: " Claude Code Review Bot
2026-05-29  7:33 ` [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails Alexandre Courbot
2026-05-30  1:46   ` Eliot Courtney
2026-06-04  6:57   ` Claude review: " Claude Code Review Bot
2026-05-29 11:15 ` [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Danilo Krummrich
2026-05-29 13:06   ` Alexandre Courbot
2026-05-30  5:55 ` Alexandre Courbot
2026-06-04  6:57 ` Claude review: " Claude Code Review Bot
  -- strict thread matches above, loose matches on Subject: below --
2026-05-21 13:50 [PATCH v6 0/7] " Alexandre Courbot
2026-05-21 13:50 ` [PATCH v6 7/7] gpu: nova-core: run Booter Unloader and FWSEC-SB " Alexandre Courbot
2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
2026-05-15  6:12 [PATCH v5 0/7] gpu: nova-core: run unload sequence " Alexandre Courbot
2026-05-15  6:12 ` [PATCH v5 7/7] gpu: nova-core: run Booter Unloader and FWSEC-SB " Alexandre Courbot
2026-05-15 23:54   ` Claude review: " Claude Code Review Bot
2026-04-27  6:56 [PATCH v4 0/8] gpu: nova-core: run unload sequence " Alexandre Courbot
2026-04-27  6:57 ` [PATCH v4 8/8] gpu: nova-core: run Booter Unloader and FWSEC-SB " Alexandre Courbot
2026-04-28  5:01   ` Claude review: " Claude Code Review Bot
2026-04-22 13:40 [PATCH v3 0/6] gpu: nova-core: run unload sequence " Alexandre Courbot
2026-04-22 13:40 ` [PATCH v3 6/6] gpu: nova-core: run Booter Unloader and FWSEC-SB " Alexandre Courbot
2026-04-22 21:34   ` Claude review: " Claude Code Review Bot
2026-04-21  6:16 [PATCH v2 0/5] gpu: nova-core: run unload sequence " Alexandre Courbot
2026-04-21  6:16 ` [PATCH v2 5/5] gpu: nova-core: run Booter Unloader and FWSEC-SB " Alexandre Courbot
2026-04-22 22:52   ` Claude review: " Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox