* Claude review: gpu: nova-core: run unload sequence upon unbinding
2026-04-22 13:40 [PATCH v3 0/6] " Alexandre Courbot
@ 2026-04-22 21:34 ` Claude Code Review Bot
0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-04-22 21:34 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: gpu: nova-core: run unload sequence upon unbinding
Author: Alexandre Courbot <acourbot@nvidia.com>
Patches: 7
Reviewed: 2026-04-23T07:34:42.172558
---
This is a well-structured 6-patch series that implements the GSP firmware unload sequence for the nova-core GPU driver. The series addresses a real and important problem: without proper unload, the WPR2 region persists across driver unbind/rebind cycles, requiring a PCI reset (which itself can brick Blackwell GPUs due to a VBIOS bug). The patches are logically ordered — utility macro first, cleanup/refactoring in the middle, then the actual unload functionality. The code closely mirrors the boot path in reverse, which is the correct approach.
The series is in good shape overall. The concerns below are mostly minor.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 19+ messages in thread
* Claude review: gpu: nova-core: run unload sequence upon unbinding
2026-04-21 6:16 [PATCH v2 0/5] " Alexandre Courbot
@ 2026-04-22 22:52 ` Claude Code Review Bot
0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-04-22 22:52 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: gpu: nova-core: run unload sequence upon unbinding
Author: Alexandre Courbot <acourbot@nvidia.com>
Patches: 14
Reviewed: 2026-04-23T08:52:54.507537
---
This is a well-structured 5-patch series that implements proper GPU teardown upon driver unbind for the nova-core driver. The motivation is clear and important: without this, the GSP remains running after unbind (potentially accessing reclaimed memory), and the WPR2 region prevents re-probing without a full PCI reset — which itself can brick Blackwell GPUs due to a VBIOS bug.
The series is logically ordered: utility macro (1), refactor unbind to use it (2), namespace cleanup (3), GSP shutdown command (4), and full unload sequence (5). Each patch is reviewable in isolation and incrementally buildable. The code follows existing patterns in the codebase well (e.g., FWSEC/booter loading mirrors `boot()`, the command/reply pattern matches existing GSP commands).
Overall this looks good for merging. I have minor observations but no blocking issues.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 19+ messages in thread
* Claude review: gpu: nova-core: run unload sequence upon unbinding
2026-04-27 6:56 [PATCH v4 0/8] " Alexandre Courbot
@ 2026-04-28 5:01 ` Claude Code Review Bot
0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-04-28 5:01 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: gpu: nova-core: run unload sequence upon unbinding
Author: Alexandre Courbot <acourbot@nvidia.com>
Patches: 10
Reviewed: 2026-04-28T15:01:46.459272
---
This is a well-structured 8-patch series that implements GSP unload for nova-core, allowing the driver to properly clean up the GPU when unbinding rather than leaving WPR2 in place and requiring a PCI reset before rebinding. The series is logically decomposed: patches 1-3 are preparatory cleanup, patch 4 adds the GSP shutdown command, patch 5 refactors booter code, patches 6-7 introduce a HAL for chipset-specific boot/unload paths, and patch 8 wires up the full unload sequence (Booter Unloader + FWSEC-SB).
The approach of preparing the unload firmware bundle at probe time (since the filesystem may not be available at unbind) is sound and well-motivated. The HAL introduction is a reasonable forward-looking design for Hopper/Blackwell support.
A few concerns worth raising:
1. **Patch 6 reorders `SetSystemInfo`/`SetRegistry` commands to after RISC-V becomes active** — this is a behavioral change that deserves explicit justification beyond "shuffle code," since previously these were enqueued before the GSP falcon boot and now they're sent after.
2. **Patch 8 makes unload bundle preparation non-fatal** — if it fails, the driver still loads but won't be able to cleanly unload. This is pragmatic but means the driver can silently enter a degraded state.
3. **Pin projection in patch 8** — `self.as_mut().project().unload` assumes `#[pin_data]` generates a `project()` method. This should work with kernel pin-init infrastructure, but is worth verifying.
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 19+ messages in thread
* Claude review: gpu: nova-core: run unload sequence upon unbinding
2026-05-15 6:12 [PATCH v5 0/7] " Alexandre Courbot
@ 2026-05-15 23:54 ` Claude Code Review Bot
0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-05-15 23:54 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: gpu: nova-core: run unload sequence upon unbinding
Author: Alexandre Courbot <acourbot@nvidia.com>
Patches: 8
Reviewed: 2026-05-16T09:54:07.385671
---
This is a well-structured 7-patch series from Alexandre Courbot (NVIDIA) implementing GSP firmware unloading for the nova-core driver, enabling clean GPU teardown on driver unbind. The series is at v5, rebased on the Device HRT series, and follows a logical progression: cleanup (patches 1-2), GSP shutdown command (patch 3), SEC2 refactoring (patch 4), boot code reorganization (patches 5-6 for HAL extraction), and the actual unload sequence (patch 7).
The overall design is sound: firmware required for unloading is pre-loaded at boot time (since the filesystem may not be accessible at unbind), stored in a trait-object `UnloadBundle`, and executed during the unbind path. The HAL abstraction cleanly separates the Turing/Ampere/Ada SEC2 boot path from the Hopper/Blackwell FSP path (currently a stub).
The series has appropriate reviews on the cleanup patches (1-4) from Eliot Courtney and Gary Guo, but patches 5-7 lack Reviewed-by tags. Patches 5 and 6 are described as purely mechanical but are large enough to warrant review attention.
I have a few observations, mostly minor, but one item on patch 5's reordering deserves attention.
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 19+ messages in thread
* Claude review: gpu: nova-core: run unload sequence upon unbinding
2026-05-21 13:50 [PATCH v6 0/7] " Alexandre Courbot
@ 2026-05-25 10:06 ` Claude Code Review Bot
0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 10:06 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: gpu: nova-core: run unload sequence upon unbinding
Author: Alexandre Courbot <acourbot@nvidia.com>
Patches: 9
Reviewed: 2026-05-25T20:06:41.048243
---
This is a well-structured 7-patch series that implements proper GSP firmware unloading for the nova-core driver. The series progresses logically: first cleanup patches (1-2), then adding the GSP shutdown command (3), refactoring boot code for reuse (4-5), introducing a HAL abstraction for chipset-specific boot paths (6), and finally the full unload sequence (7).
The architecture is sound: firmware needed for unloading is pre-loaded at boot time (since the filesystem may not be available at unbind), stored as an opaque "unload bundle," and consumed at unbind. The HAL trait cleanly separates Turing/Ampere/Ada (SEC2-based) from Hopper/Blackwell (FSP-based, stubbed out). Error handling during unload continues through partial failures rather than short-circuiting, which is the right approach for teardown paths.
**Key concern**: Patch 7 introduces `Cell<Option<gsp::UnloadBundle>>` in `NovaCore`. `Cell` is `!Sync`, which means `NovaCore` becomes `!Sync`. This is safe for a pinned driver data structure only if the framework doesn't require `Sync` for `pci::Driver::Data`. The `unbind()` callback receives `Pin<&Self::Data>` (shared ref), so the `Cell::take()` is the mechanism to extract the value — but `Cell` on a shared reference is only sound in single-threaded contexts. If the kernel guarantees that `probe` and `unbind` are never called concurrently on the same device, this is fine, but it deserves a safety comment.
**No blocking issues found.** The series is clean, well-documented, and follows good kernel driver patterns. A few minor observations follow per-patch.
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding
@ 2026-05-29 7:33 Alexandre Courbot
2026-05-29 7:33 ` [PATCH v7 1/4] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
` (6 more replies)
0 siblings, 7 replies; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-29 7:33 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
nova-gpu, dri-devel, linux-kernel, rust-for-linux,
Alexandre Courbot
Currently the GSP is left running and the WPR2 memory region untouched
when the driver is unbound. This is obviously not ideal for at least two
reasons:
- Probing requires setting up the WPR2 region, which cannot be done if
there is already one in place. Hence the current requirement to reset
the GPU (using e.g. `echo 1 >/sys/bus/pci/devices/.../reset`) before
the driver can be probed again after removal.
- The running GSP may still attempt to access shared memory regions
which the kernel might recycle.
On top of that, there is a nasty bug in the Blackwell VBIOS that
sometimes borks the GPU upon PCI reset, requiring a reboot. So relying
on the PCI reset to unload/reload Nova is really not practical here.
This series does what is needed to leave the GPU in a clean state after
unbind, for all currently supported GPUs. Blackwell support is just a
placeholder and will be completed by the Blackwell boot support series.
This revision is based on `drm-rust-next`.
A branch with the series is available at [1].
[1] https://github.com/Gnurou/linux/tree/b4/nova-unload
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Changes in v7:
- Rebase on current drm-rust-next.
- Drop merged patches.
- Integrate Eliot's unload-on-drop improvement.
- Use `&Gsp` instead of `Pin<&mut Gsp>` in HAL.
- Add a new patch that runs the unload bundle if `Gsp::boot` fails.
- Link to v6: https://patch.msgid.link/20260521-nova-unload-v6-0-65f581c812c9@nvidia.com
Changes in v6:
- Inline TU102 local `run_booter` method in its unique call site.
- Rename unload bundle field to `unload_bundle`.
- Make Sec2UnloadBundle private.
- Continue GSP teardown upon partial failure.
- Store the unload bundle into `NovaCore`.
- Take the unload bundle by value to make it one-shot.
- Link to v5: https://patch.msgid.link/20260515-nova-unload-v5-0-c4d6250ad160@nvidia.com
Changes in v5:
- Rebase on top of the Device HRT series.
- Drop the now unneeded "gpu: nova-core: split BAR acquisition in unbind()".
- Link to v4: https://patch.msgid.link/20260427-nova-unload-v4-0-e145ccddae66@nvidia.com
Changes in v4:
- Remove `warn_on_err` macro as it isn't performing as expected and
distracts from the goal of the series.
- Add John's patch from the Blackwell series refactoring the Booter
Loader runner code.
- Add a GSP HAL and move the existing TU102/SEC2 boot sequence into it
in preparation for the Hopper/Blackwell FSP boot path.
- Prepare the firmware required for unloading at probe time and save it
into an unload bundle, as we cannot guarantee filesystem access at
unload time.
- Constrain `UNLOADING_GUEST_DRIVER`'s visibility to the parent module.
- Also write the sentinel value `0xff` into `mbox1` when running Booter
Unloader to align with OpenRM.
- Link to v3: https://patch.msgid.link/20260422-nova-unload-v3-0-1d2c81bd3ced@nvidia.com
Changes in v3:
- Disambiguate doccomment for `warn_on_err`.
- Test the correct bit instead of the whole register value to determine
that the GSP has stopped.
- Use an enum instead of a boolean to encode the power level when
shutting down the GSP.
- Add missing newline to `dev_err`.
- Add missing doccomments for new types.
- Use values from bindings instead of magic numbers.
- Remove the redundant `get_gsp_info` function.
- Better document Booter Unloader mailbox sentinel value, and check the
value of mbox0 upon return.
- Link to v2: https://patch.msgid.link/20260421-nova-unload-v2-0-2fe54963af8b@nvidia.com
Changes in v2:
- Rebase on top of `master` and remove unneeded/obsolete preparatory patches.
- Tidy up the imports of commands from the `fw` module in the `gsp` module.
- Link to v1: https://patch.msgid.link/20251216-nova-unload-v1-0-6a5d823be19d@nvidia.com
---
Alexandre Courbot (4):
gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails
drivers/gpu/nova-core/firmware/booter.rs | 1 -
drivers/gpu/nova-core/firmware/fwsec.rs | 1 -
drivers/gpu/nova-core/gpu.rs | 34 ++-
drivers/gpu/nova-core/gsp.rs | 4 +
drivers/gpu/nova-core/gsp/boot.rs | 290 +++++++++---------
drivers/gpu/nova-core/gsp/commands.rs | 43 +++
drivers/gpu/nova-core/gsp/fw.rs | 4 +
drivers/gpu/nova-core/gsp/fw/commands.rs | 45 +++
drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 11 +
drivers/gpu/nova-core/gsp/hal.rs | 94 ++++++
drivers/gpu/nova-core/gsp/hal/gh100.rs | 51 ++++
drivers/gpu/nova-core/gsp/hal/tu102.rs | 349 ++++++++++++++++++++++
drivers/gpu/nova-core/regs.rs | 5 +
13 files changed, 775 insertions(+), 157 deletions(-)
---
base-commit: 0e42ec83d46ab8877d38d37493328ed7d1a24de8
change-id: 20251216-nova-unload-4029b3b76950
Best regards,
--
Alexandre Courbot <acourbot@nvidia.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v7 1/4] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
2026-05-29 7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
@ 2026-05-29 7:33 ` Alexandre Courbot
2026-06-04 6:57 ` Claude review: " Claude Code Review Bot
2026-05-29 7:33 ` [PATCH v7 2/4] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
` (5 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-29 7:33 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
nova-gpu, dri-devel, linux-kernel, rust-for-linux,
Alexandre Courbot
Booting the GSP is done differently depending on the architecture. Move
the parts that are chipset-specific under a HAL.
This does not change much at the moment, since the differences between
Turing and Ampere are rather benign, but will become critical to
properly support the FSP boot process used by Hopper and Blackwell.
The Hopper/Blackwell support is not merged yet, so their HAL is a stub
for now.
This patch is intended to be a mechanical code extraction with no
behavioral changes.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/gsp.rs | 1 +
drivers/gpu/nova-core/gsp/boot.rs | 166 +++-----------------------
drivers/gpu/nova-core/gsp/hal.rs | 74 ++++++++++++
drivers/gpu/nova-core/gsp/hal/gh100.rs | 50 ++++++++
drivers/gpu/nova-core/gsp/hal/tu102.rs | 206 +++++++++++++++++++++++++++++++++
5 files changed, 344 insertions(+), 153 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index ba5b7f990031..38378f104068 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
mod boot;
+mod hal;
use kernel::{
debugfs,
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 259f7c4d94f5..1bd9f21fc443 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -4,7 +4,6 @@
device,
dma::Coherent,
io::poll::read_poll_timeout,
- io::Io,
pci,
prelude::*,
time::Delta, //
@@ -19,121 +18,17 @@
},
fb::FbLayout,
firmware::{
- booter::{
- BooterFirmware,
- BooterKind, //
- },
- fwsec::{
- bootloader::FwsecFirmwareWithBl,
- FwsecCommand,
- FwsecFirmware, //
- },
gsp::GspFirmware,
FIRMWARE_VERSION, //
},
- gpu::{
- Architecture,
- Chipset, //
- },
+ gpu::Chipset,
gsp::{
commands,
- sequencer::{
- GspSequencer,
- GspSequencerParams, //
- },
GspFwWprMeta, //
},
- regs,
- vbios::Vbios,
};
impl super::Gsp {
- /// Helper function to load and run the FWSEC-FRTS firmware and confirm that it has properly
- /// created the WPR2 region.
- fn run_fwsec_frts(
- dev: &device::Device<device::Bound>,
- chipset: Chipset,
- falcon: &Falcon<Gsp>,
- bar: &Bar0,
- bios: &Vbios,
- fb_layout: &FbLayout,
- ) -> Result<()> {
- // Check that the WPR2 region does not already exists - if it does, we cannot run
- // FWSEC-FRTS until the GPU is reset.
- if bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI).higher_bound() != 0 {
- dev_err!(
- dev,
- "WPR2 region already exists - GPU needs to be reset to proceed\n"
- );
- return Err(EBUSY);
- }
-
- // FWSEC-FRTS will create the WPR2 region.
- let fwsec_frts = FwsecFirmware::new(
- dev,
- falcon,
- bar,
- bios,
- FwsecCommand::Frts {
- frts_addr: fb_layout.frts.start,
- frts_size: fb_layout.frts.len(),
- },
- )?;
-
- if chipset.needs_fwsec_bootloader() {
- let fwsec_frts_bl = FwsecFirmwareWithBl::new(fwsec_frts, dev, chipset)?;
- // Load and run the bootloader, which will load FWSEC-FRTS and run it.
- fwsec_frts_bl.run(dev, falcon, bar)?;
- } else {
- // Load and run FWSEC-FRTS directly.
- fwsec_frts.run(dev, falcon, bar)?;
- }
-
- // SCRATCH_E contains the error code for FWSEC-FRTS.
- let frts_status = bar
- .read(regs::NV_PBUS_SW_SCRATCH_0E_FRTS_ERR)
- .frts_err_code();
- if frts_status != 0 {
- dev_err!(
- dev,
- "FWSEC-FRTS returned with error code {:#x}\n",
- frts_status
- );
-
- return Err(EIO);
- }
-
- // Check that the WPR2 region has been created as we requested.
- let (wpr2_lo, wpr2_hi) = (
- bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_LO).lower_bound(),
- bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI).higher_bound(),
- );
-
- match (wpr2_lo, wpr2_hi) {
- (_, 0) => {
- dev_err!(dev, "WPR2 region not created after running FWSEC-FRTS\n");
-
- Err(EIO)
- }
- (wpr2_lo, _) if wpr2_lo != fb_layout.frts.start => {
- dev_err!(
- dev,
- "WPR2 region created at unexpected address {:#x}; expected {:#x}\n",
- wpr2_lo,
- fb_layout.frts.start,
- );
-
- Err(EIO)
- }
- (wpr2_lo, wpr2_hi) => {
- dev_dbg!(dev, "WPR2: {:#x}-{:#x}\n", wpr2_lo, wpr2_hi);
- dev_dbg!(dev, "GPU instance built\n");
-
- Ok(())
- }
- }
- }
-
/// Attempt to boot the GSP.
///
/// This is a GPU-dependent and complex procedure that involves loading firmware files from
@@ -149,17 +44,8 @@ pub(crate) fn boot(
gsp_falcon: &Falcon<Gsp>,
sec2_falcon: &Falcon<Sec2>,
) -> Result {
- // The FSP boot process of Hopper+ is not supported for now.
- if matches!(
- chipset.arch(),
- Architecture::Hopper | Architecture::BlackwellGB10x | Architecture::BlackwellGB20x
- ) {
- return Err(ENOTSUPP);
- }
-
let dev = pdev.as_ref();
-
- let bios = Vbios::new(dev, bar)?;
+ let hal = super::hal::gsp_hal(chipset);
let gsp_fw = KBox::pin_init(GspFirmware::new(dev, chipset, FIRMWARE_VERSION), GFP_KERNEL)?;
@@ -168,38 +54,21 @@ pub(crate) fn boot(
let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
- // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
- if !fb_layout.frts.is_empty() {
- Self::run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, &fb_layout)?;
- }
-
- gsp_falcon.reset(bar)?;
- let libos_handle = self.libos.dma_handle();
- let (mbox0, mbox1) = gsp_falcon.boot(
- bar,
- Some(libos_handle as u32),
- Some((libos_handle >> 32) as u32),
- )?;
- dev_dbg!(pdev, "GSP MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
-
- dev_dbg!(
- pdev,
- "Using SEC2 to load and run the booter_load firmware...\n"
- );
-
- BooterFirmware::new(
+ // Perform the chipset-specific boot sequence.
+ hal.boot(
+ &self,
dev,
- BooterKind::Loader,
- chipset,
- FIRMWARE_VERSION,
- sec2_falcon,
bar,
- )?
- .run(dev, bar, sec2_falcon, &wpr_meta)?;
+ chipset,
+ &fb_layout,
+ &wpr_meta,
+ gsp_falcon,
+ sec2_falcon,
+ )?;
gsp_falcon.write_os_version(bar, gsp_fw.bootloader.app_version);
- // Poll for RISC-V to become active before running sequencer
+ // Poll for RISC-V to become active before continuing.
read_poll_timeout(
|| Ok(gsp_falcon.is_riscv_active(bar)),
|val: &bool| *val,
@@ -214,16 +83,7 @@ pub(crate) fn boot(
self.cmdq
.send_command_no_wait(bar, commands::SetRegistry::new())?;
- // Create and run the GSP sequencer.
- let seq_params = GspSequencerParams {
- bootloader_app_version: gsp_fw.bootloader.app_version,
- libos_dma_handle: libos_handle,
- gsp_falcon,
- sec2_falcon,
- dev,
- bar,
- };
- GspSequencer::run(&self.cmdq, seq_params)?;
+ hal.post_boot(&self, dev, bar, &gsp_fw, gsp_falcon, sec2_falcon)?;
// Wait until GSP is fully initialized.
commands::wait_gsp_init_done(&self.cmdq)?;
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
new file mode 100644
index 000000000000..fb3edaeb3160
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+mod gh100;
+mod tu102;
+
+use kernel::prelude::*;
+
+use kernel::{
+ device,
+ dma::Coherent, //
+};
+
+use crate::{
+ driver::Bar0,
+ falcon::{
+ gsp::Gsp as GspEngine,
+ sec2::Sec2,
+ Falcon, //
+ },
+ fb::FbLayout,
+ firmware::gsp::GspFirmware,
+ gpu::{
+ Architecture,
+ Chipset, //
+ },
+ gsp::{
+ Gsp,
+ GspFwWprMeta, //
+ },
+};
+
+/// Trait implemented by GSP HALs.
+pub(super) trait GspHal: Send {
+ /// Performs the GSP boot process, loading and running the required firmwares as needed.
+ #[allow(clippy::too_many_arguments)]
+ fn boot(
+ &self,
+ gsp: &Gsp,
+ dev: &device::Device<device::Bound>,
+ bar: &Bar0,
+ chipset: Chipset,
+ fb_layout: &FbLayout,
+ wpr_meta: &Coherent<GspFwWprMeta>,
+ gsp_falcon: &Falcon<GspEngine>,
+ sec2_falcon: &Falcon<Sec2>,
+ ) -> Result;
+
+ /// Performs HAL-specific post-GSP boot tasks.
+ ///
+ /// This method is called by the GSP boot code after the GSP is confirmed to be running, and
+ /// after the initialization commands have been pushed onto its queue.
+ fn post_boot(
+ &self,
+ _gsp: &Gsp,
+ _dev: &device::Device<device::Bound>,
+ _bar: &Bar0,
+ _gsp_fw: &GspFirmware,
+ _gsp_falcon: &Falcon<GspEngine>,
+ _sec2_falcon: &Falcon<Sec2>,
+ ) -> Result {
+ Ok(())
+ }
+}
+
+/// Returns the GSP HAL to be used for `chipset`.
+pub(super) fn gsp_hal(chipset: Chipset) -> &'static dyn GspHal {
+ match chipset.arch() {
+ Architecture::Turing | Architecture::Ampere | Architecture::Ada => tu102::TU102_HAL,
+ Architecture::Hopper | Architecture::BlackwellGB10x | Architecture::BlackwellGB20x => {
+ gh100::GH100_HAL
+ }
+ }
+}
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
new file mode 100644
index 000000000000..3f3675f9c16a
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+use kernel::prelude::*;
+
+use kernel::{
+ device,
+ dma::Coherent, //
+};
+
+use crate::{
+ driver::Bar0,
+ falcon::{
+ gsp::Gsp as GspEngine,
+ sec2::Sec2,
+ Falcon, //
+ },
+ fb::FbLayout,
+ gpu::Chipset,
+ gsp::{
+ hal::GspHal,
+ Gsp,
+ GspFwWprMeta, //
+ },
+};
+
+struct Gh100;
+
+impl GspHal for Gh100 {
+ /// Boot GSP via FSP Chain of Trust (Hopper/Blackwell+ path).
+ ///
+ /// This path uses FSP to establish a chain of trust and boot GSP-FMC. FSP handles
+ /// the GSP boot internally - no manual GSP reset/boot is needed.
+ fn boot(
+ &self,
+ _gsp: &Gsp,
+ _dev: &device::Device<device::Bound>,
+ _bar: &Bar0,
+ _chipset: Chipset,
+ _fb_layout: &FbLayout,
+ _wpr_meta: &Coherent<GspFwWprMeta>,
+ _gsp_falcon: &Falcon<GspEngine>,
+ _sec2_falcon: &Falcon<Sec2>,
+ ) -> Result {
+ Err(ENOTSUPP)
+ }
+}
+
+const GH100: Gh100 = Gh100;
+pub(super) const GH100_HAL: &dyn GspHal = &GH100;
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
new file mode 100644
index 000000000000..a6f2b2e279e8
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+use kernel::prelude::*;
+
+use kernel::{
+ device,
+ dma::Coherent,
+ io::Io, //
+};
+
+use crate::{
+ driver::Bar0,
+ falcon::{
+ gsp::Gsp as GspEngine,
+ sec2::Sec2,
+ Falcon, //
+ },
+ fb::FbLayout,
+ firmware::{
+ booter::{
+ BooterFirmware,
+ BooterKind, //
+ },
+ fwsec::{
+ bootloader::FwsecFirmwareWithBl,
+ FwsecCommand,
+ FwsecFirmware, //
+ },
+ gsp::GspFirmware,
+ FIRMWARE_VERSION, //
+ },
+ gpu::Chipset,
+ gsp::{
+ hal::GspHal,
+ sequencer::{
+ GspSequencer,
+ GspSequencerParams, //
+ },
+ Gsp,
+ GspFwWprMeta, //
+ },
+ regs,
+ vbios::Vbios, //
+};
+
+/// Helper function to load and run the FWSEC-FRTS firmware and confirm that it has properly
+/// created the WPR2 region.
+fn run_fwsec_frts(
+ dev: &device::Device<device::Bound>,
+ chipset: Chipset,
+ falcon: &Falcon<GspEngine>,
+ bar: &Bar0,
+ bios: &Vbios,
+ fb_layout: &FbLayout,
+) -> Result<()> {
+ // Check that the WPR2 region does not already exist - if it does, we cannot run
+ // FWSEC-FRTS until the GPU is reset.
+ if bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI).higher_bound() != 0 {
+ dev_err!(
+ dev,
+ "WPR2 region already exists - GPU needs to be reset to proceed\n"
+ );
+ return Err(EBUSY);
+ }
+
+ // FWSEC-FRTS will create the WPR2 region.
+ let fwsec_frts = FwsecFirmware::new(
+ dev,
+ falcon,
+ bar,
+ bios,
+ FwsecCommand::Frts {
+ frts_addr: fb_layout.frts.start,
+ frts_size: fb_layout.frts.len(),
+ },
+ )?;
+
+ if chipset.needs_fwsec_bootloader() {
+ let fwsec_frts_bl = FwsecFirmwareWithBl::new(fwsec_frts, dev, chipset)?;
+ // Load and run the bootloader, which will load FWSEC-FRTS and run it.
+ fwsec_frts_bl.run(dev, falcon, bar)?;
+ } else {
+ // Load and run FWSEC-FRTS directly.
+ fwsec_frts.run(dev, falcon, bar)?;
+ }
+
+ // SCRATCH_E contains the error code for FWSEC-FRTS.
+ let frts_status = bar
+ .read(regs::NV_PBUS_SW_SCRATCH_0E_FRTS_ERR)
+ .frts_err_code();
+ if frts_status != 0 {
+ dev_err!(
+ dev,
+ "FWSEC-FRTS returned with error code {:#x}\n",
+ frts_status
+ );
+
+ return Err(EIO);
+ }
+
+ // Check that the WPR2 region has been created as we requested.
+ let (wpr2_lo, wpr2_hi) = (
+ bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_LO).lower_bound(),
+ bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI).higher_bound(),
+ );
+
+ match (wpr2_lo, wpr2_hi) {
+ (_, 0) => {
+ dev_err!(dev, "WPR2 region not created after running FWSEC-FRTS\n");
+
+ Err(EIO)
+ }
+ (wpr2_lo, _) if wpr2_lo != fb_layout.frts.start => {
+ dev_err!(
+ dev,
+ "WPR2 region created at unexpected address {:#x}; expected {:#x}\n",
+ wpr2_lo,
+ fb_layout.frts.start,
+ );
+
+ Err(EIO)
+ }
+ (wpr2_lo, wpr2_hi) => {
+ dev_dbg!(dev, "WPR2: {:#x}-{:#x}\n", wpr2_lo, wpr2_hi);
+ dev_dbg!(dev, "GPU instance built\n");
+
+ Ok(())
+ }
+ }
+}
+
+struct Tu102;
+
+impl GspHal for Tu102 {
+ fn boot(
+ &self,
+ gsp: &Gsp,
+ dev: &device::Device<device::Bound>,
+ bar: &Bar0,
+ chipset: Chipset,
+ fb_layout: &FbLayout,
+ wpr_meta: &Coherent<GspFwWprMeta>,
+ gsp_falcon: &Falcon<GspEngine>,
+ sec2_falcon: &Falcon<Sec2>,
+ ) -> Result {
+ let bios = Vbios::new(dev, bar)?;
+
+ // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
+ if !fb_layout.frts.is_empty() {
+ run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
+ }
+
+ gsp_falcon.reset(bar)?;
+ let libos_handle = gsp.libos.dma_handle();
+ let (mbox0, mbox1) = gsp_falcon.boot(
+ bar,
+ Some(libos_handle as u32),
+ Some((libos_handle >> 32) as u32),
+ )?;
+ dev_dbg!(dev, "GSP MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
+
+ dev_dbg!(
+ dev,
+ "Using SEC2 to load and run the booter_load firmware...\n"
+ );
+
+ BooterFirmware::new(
+ dev,
+ BooterKind::Loader,
+ chipset,
+ FIRMWARE_VERSION,
+ sec2_falcon,
+ bar,
+ )?
+ .run(dev, bar, sec2_falcon, wpr_meta)?;
+
+ Ok(())
+ }
+
+ fn post_boot(
+ &self,
+ gsp: &Gsp,
+ dev: &device::Device<device::Bound>,
+ bar: &Bar0,
+ gsp_fw: &GspFirmware,
+ gsp_falcon: &Falcon<GspEngine>,
+ sec2_falcon: &Falcon<Sec2>,
+ ) -> Result {
+ // Create and run the GSP sequencer.
+ let seq_params = GspSequencerParams {
+ bootloader_app_version: gsp_fw.bootloader.app_version,
+ libos_dma_handle: gsp.libos.dma_handle(),
+ gsp_falcon,
+ sec2_falcon,
+ dev,
+ bar,
+ };
+ GspSequencer::run(&gsp.cmdq, seq_params)?;
+
+ Ok(())
+ }
+}
+
+const TU102: Tu102 = Tu102;
+pub(super) const TU102_HAL: &dyn GspHal = &TU102;
--
2.54.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v7 2/4] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
2026-05-29 7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
2026-05-29 7:33 ` [PATCH v7 1/4] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
@ 2026-05-29 7:33 ` Alexandre Courbot
2026-06-04 6:57 ` Claude review: " Claude Code Review Bot
2026-05-29 7:33 ` [PATCH v7 3/4] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
` (4 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-29 7:33 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
nova-gpu, dri-devel, linux-kernel, rust-for-linux,
Alexandre Courbot
Currently, the GSP is left running after the driver is unbound. This is
not great for several reasons, notably that it can still access shared
memory areas that the kernel will now reclaim (especially problematic on
setups without an IOMMU).
Fix this by sending the `UNLOADING_GUEST_DRIVER` GSP command when the
`Gpu` is dropped. This stops the GSP and lets us proceed with the rest
of the unbind sequence in a later patch.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Co-developed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/gpu.rs | 21 ++++++++++-
drivers/gpu/nova-core/gsp/boot.rs | 45 +++++++++++++++++++++++
drivers/gpu/nova-core/gsp/commands.rs | 43 ++++++++++++++++++++++
drivers/gpu/nova-core/gsp/fw.rs | 4 ++
drivers/gpu/nova-core/gsp/fw/commands.rs | 45 +++++++++++++++++++++++
drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 11 ++++++
6 files changed, 168 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index cf134cab49cd..011d504830e4 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -243,8 +243,10 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
}
/// Structure holding the resources required to operate the GPU.
-#[pin_data]
+#[pin_data(PinnedDrop)]
pub(crate) struct Gpu<'gpu> {
+ /// Device owning the GPU.
+ device: &'gpu device::Device<device::Bound>,
spec: Spec,
/// MMIO mapping of PCI BAR 0.
bar: &'gpu Bar0,
@@ -266,6 +268,7 @@ pub(crate) fn new(
bar: &'gpu Bar0,
) -> impl PinInit<Self, Error> + 'gpu {
try_pin_init!(Self {
+ device: pdev.as_ref(),
spec: Spec::new(pdev.as_ref(), bar).inspect(|spec| {
dev_info!(pdev,"NVIDIA ({})\n", spec);
})?,
@@ -294,3 +297,19 @@ pub(crate) fn new(
})
}
}
+
+#[pinned_drop]
+impl PinnedDrop for Gpu<'_> {
+ fn drop(self: Pin<&mut Self>) {
+ let this = self.project();
+ let device = *this.device;
+ let bar = *this.bar;
+
+ let _ = this
+ .gsp
+ .as_ref()
+ .get_ref()
+ .unload(device, bar, &*this.gsp_falcon)
+ .inspect_err(|e| dev_err!(device, "failed to unload GSP: {:?}\n", e));
+ }
+}
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 1bd9f21fc443..adc66809ce83 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -1,6 +1,8 @@
// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
use kernel::{
+ bits,
device,
dma::Coherent,
io::poll::read_poll_timeout,
@@ -23,6 +25,7 @@
},
gpu::Chipset,
gsp::{
+ cmdq::Cmdq,
commands,
GspFwWprMeta, //
},
@@ -97,4 +100,46 @@ pub(crate) fn boot(
Ok(())
}
+
+ /// Shut down the GSP and wait until it is offline.
+ fn shutdown_gsp(
+ cmdq: &Cmdq,
+ bar: &Bar0,
+ gsp_falcon: &Falcon<Gsp>,
+ mode: commands::PowerStateLevel,
+ ) -> Result<()> {
+ // Command to shut the GSP down.
+ cmdq.send_command(bar, commands::UnloadingGuestDriver::new(mode))?;
+
+ // Wait until GSP signals it is suspended.
+ const LIBOS_INTERRUPT_PROCESSOR_SUSPENDED: u32 = bits::bit_u32(31);
+ read_poll_timeout(
+ || Ok(gsp_falcon.read_mailbox0(bar)),
+ |&mb0| mb0 & LIBOS_INTERRUPT_PROCESSOR_SUSPENDED != 0,
+ Delta::from_millis(10),
+ Delta::from_secs(5),
+ )
+ .map(|_| ())
+ }
+
+ /// Attempts to unload the GSP firmware.
+ ///
+ /// This stops all activity on the GSP.
+ pub(crate) fn unload(
+ &self,
+ dev: &device::Device<device::Bound>,
+ bar: &Bar0,
+ gsp_falcon: &Falcon<Gsp>,
+ ) -> Result {
+ // Shut down the GSP.
+ Self::shutdown_gsp(
+ &self.cmdq,
+ bar,
+ gsp_falcon,
+ commands::PowerStateLevel::Level0,
+ )
+ .inspect_err(|e| dev_err!(dev, "Unload guest driver failed: {:?}\n", e))?;
+
+ Ok(())
+ }
}
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index ac9cef312b10..3a365455d10c 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -232,3 +232,46 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
.map_err(GpuNameError::InvalidUtf8)
}
}
+
+pub(crate) use fw::commands::PowerStateLevel;
+
+/// The `UnloadingGuestDriver` command, used to shut down the GSP.
+///
+/// Only used within the `gsp` module.
+pub(super) struct UnloadingGuestDriver {
+ level: PowerStateLevel,
+}
+
+impl UnloadingGuestDriver {
+ /// Creates a new `UnloadingGuestDriver` command for the given [`PowerStateLevel`].
+ pub(super) fn new(level: PowerStateLevel) -> Self {
+ Self { level }
+ }
+}
+
+impl CommandToGsp for UnloadingGuestDriver {
+ const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
+ type Command = fw::commands::UnloadingGuestDriver;
+ type Reply = UnloadingGuestDriverReply;
+ type InitError = Infallible;
+
+ fn init(&self) -> impl Init<Self::Command, Self::InitError> {
+ fw::commands::UnloadingGuestDriver::new(self.level)
+ }
+}
+
+/// The reply from the GSP to the [`UnloadingGuestDriver`] command.
+pub(super) struct UnloadingGuestDriverReply;
+
+impl MessageFromGsp for UnloadingGuestDriverReply {
+ const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
+ type InitError = Infallible;
+ type Message = ();
+
+ fn read(
+ _msg: &Self::Message,
+ _sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
+ ) -> Result<Self, Self::InitError> {
+ Ok(UnloadingGuestDriverReply)
+ }
+}
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 3245793bbe42..33c9f5860771 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -279,6 +279,7 @@ pub(crate) enum MsgFunction {
Nop = bindings::NV_VGPU_MSG_FUNCTION_NOP,
SetGuestSystemInfo = bindings::NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO,
SetRegistry = bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
+ UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
// Event codes
GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
@@ -323,6 +324,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
Ok(MsgFunction::SetGuestSystemInfo)
}
bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY => Ok(MsgFunction::SetRegistry),
+ bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
+ Ok(MsgFunction::UnloadingGuestDriver)
+ }
// Event codes
bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index db46276430be..42985d446bae 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
use kernel::{
device,
@@ -129,3 +130,47 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
// SAFETY: This struct only contains integer types for which all bit patterns
// are valid.
unsafe impl FromBytes for GspStaticConfigInfo {}
+
+/// Power level requested to the [`UnloadingGuestDriver`] command.
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+#[repr(u32)]
+#[expect(unused)]
+pub(crate) enum PowerStateLevel {
+ /// Full unload.
+ Level0 = bindings::NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_0,
+ /// S3 (suspend to RAM).
+ Level3 = bindings::NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_3,
+ /// Hibernate (suspend to disk).
+ Level7 = bindings::NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_7,
+}
+
+impl PowerStateLevel {
+ /// Returns `true` if this state represents a power management transition, i.e. some GPU state
+ /// must survive it (as opposed to a full unload).
+ pub(crate) fn is_power_transition(self) -> bool {
+ self != PowerStateLevel::Level0
+ }
+}
+
+/// Payload of the `UnloadingGuestDriver` command and message.
+#[repr(transparent)]
+#[derive(Clone, Copy, Debug, Zeroable)]
+pub(crate) struct UnloadingGuestDriver(bindings::rpc_unloading_guest_driver_v1F_07);
+
+impl UnloadingGuestDriver {
+ pub(crate) fn new(level: PowerStateLevel) -> Self {
+ Self(bindings::rpc_unloading_guest_driver_v1F_07 {
+ bInPMTransition: u8::from(level.is_power_transition()),
+ bGc6Entering: 0,
+ newLevel: level as u32,
+ ..Zeroable::zeroed()
+ })
+ }
+}
+
+// SAFETY: Padding is explicit and will not contain uninitialized data.
+unsafe impl AsBytes for UnloadingGuestDriver {}
+
+// SAFETY: This struct only contains integer types for which all bit patterns
+// are valid.
+unsafe impl FromBytes for UnloadingGuestDriver {}
diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
index 334e8be5fde8..f82ed097b283 100644
--- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
+++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
@@ -30,6 +30,9 @@ fn fmt(&self, fmt: &mut ::core::fmt::Formatter<'_>) -> ::core::fmt::Result {
fmt.write_str("__IncompleteArrayField")
}
}
+pub const NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_0: u32 = 0;
+pub const NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_3: u32 = 3;
+pub const NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_7: u32 = 7;
pub const NV_VGPU_MSG_SIGNATURE_VALID: u32 = 1129337430;
pub const GSP_FW_HEAP_PARAM_OS_SIZE_LIBOS2: u32 = 0;
pub const GSP_FW_HEAP_PARAM_OS_SIZE_LIBOS3_BAREMETAL: u32 = 23068672;
@@ -880,6 +883,14 @@ fn default() -> Self {
}
}
#[repr(C)]
+#[derive(Debug, Default, Copy, Clone, MaybeZeroable)]
+pub struct rpc_unloading_guest_driver_v1F_07 {
+ pub bInPMTransition: u8_,
+ pub bGc6Entering: u8_,
+ pub __bindgen_padding_0: [u8; 2usize],
+ pub newLevel: u32_,
+}
+#[repr(C)]
#[derive(Debug, Default, MaybeZeroable)]
pub struct rpc_run_cpu_sequencer_v17_00 {
pub bufferSizeDWord: u32_,
--
2.54.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v7 3/4] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
2026-05-29 7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
2026-05-29 7:33 ` [PATCH v7 1/4] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
2026-05-29 7:33 ` [PATCH v7 2/4] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
@ 2026-05-29 7:33 ` Alexandre Courbot
2026-06-04 6:57 ` Claude review: " Claude Code Review Bot
2026-05-29 7:33 ` [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails Alexandre Courbot
` (3 subsequent siblings)
6 siblings, 1 reply; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-29 7:33 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
nova-gpu, dri-devel, linux-kernel, rust-for-linux,
Alexandre Courbot
When probing the driver, the FWSEC-FRTS firmware creates a WPR2 secure
memory region to store the GSP firmware, and the Booter Loader loads and
starts that firmware into the GSP, making it run in RISC-V mode.
These operations need to be reverted upon unloading, particularly the
WPR2 secure region creation, as its presence prevents the driver from
subsequently probing.
Thus, prepare the Booter Unloader and FWSEC-SB firmware images when
booting the GSP, so they can be executed at unbind time to put the GPU
into a state where it can be probed again.
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Co-developed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/firmware/booter.rs | 1 -
drivers/gpu/nova-core/firmware/fwsec.rs | 1 -
drivers/gpu/nova-core/gpu.rs | 15 +++-
drivers/gpu/nova-core/gsp.rs | 3 +
drivers/gpu/nova-core/gsp/boot.rs | 38 +++++++--
drivers/gpu/nova-core/gsp/hal.rs | 21 ++++-
drivers/gpu/nova-core/gsp/hal/gh100.rs | 2 +-
drivers/gpu/nova-core/gsp/hal/tu102.rs | 142 ++++++++++++++++++++++++++++++-
drivers/gpu/nova-core/regs.rs | 5 ++
9 files changed, 209 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
index e45e5dc8d5d2..c5e17605e1a3 100644
--- a/drivers/gpu/nova-core/firmware/booter.rs
+++ b/drivers/gpu/nova-core/firmware/booter.rs
@@ -282,7 +282,6 @@ fn new_booter(data: &[u8]) -> Result<Self> {
#[derive(Copy, Clone, Debug, PartialEq)]
pub(crate) enum BooterKind {
Loader,
- #[expect(unused)]
Unloader,
}
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
index 8810cb49db67..4108f28cd338 100644
--- a/drivers/gpu/nova-core/firmware/fwsec.rs
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -144,7 +144,6 @@ pub(crate) enum FwsecCommand {
/// image into it.
Frts { frts_addr: u64, frts_size: u64 },
/// Asks [`FwsecFirmware`] to load pre-OS apps on the PMU.
- #[expect(dead_code)]
Sb,
}
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 011d504830e4..aed992488db3 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -18,7 +18,10 @@
Falcon, //
},
fb::SysmemFlush,
- gsp::Gsp,
+ gsp::{
+ self,
+ Gsp, //
+ },
regs,
};
@@ -260,6 +263,8 @@ pub(crate) struct Gpu<'gpu> {
/// GSP runtime data. Temporarily an empty placeholder.
#[pin]
gsp: Gsp,
+ /// GSP unload firmware bundle, if any.
+ unload_bundle: Option<gsp::UnloadBundle>,
}
impl<'gpu> Gpu<'gpu> {
@@ -293,7 +298,10 @@ pub(crate) fn new(
gsp <- Gsp::new(pdev),
- _: { gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)? },
+ // This member must be initialized last, so the `UnloadBundle` can never be dropped from
+ // outside of the constructed `Gpu`, ensuring that the unload sequence is properly run
+ // in case of failure.
+ unload_bundle: gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)?,
})
}
}
@@ -304,12 +312,13 @@ fn drop(self: Pin<&mut Self>) {
let this = self.project();
let device = *this.device;
let bar = *this.bar;
+ let bundle = this.unload_bundle.take();
let _ = this
.gsp
.as_ref()
.get_ref()
- .unload(device, bar, &*this.gsp_falcon)
+ .unload(device, bar, &*this.gsp_falcon, &*this.sec2_falcon, bundle)
.inspect_err(|e| dev_err!(device, "failed to unload GSP: {:?}\n", e));
}
}
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 38378f104068..1885cfa5cb38 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -185,3 +185,6 @@ pub(crate) fn new(pdev: &pci::Device<device::Bound>) -> impl PinInit<Self, Error
})
}
}
+
+/// Opaque bundle required to unload the GSP. Created by [`Gsp::boot`], consumed by [`Gsp::unload`].
+pub(crate) struct UnloadBundle(KBox<dyn hal::UnloadBundle>);
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index adc66809ce83..8d6fcc35b653 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -38,7 +38,8 @@ impl super::Gsp {
/// user-space, patching them with signatures, and building firmware-specific intricate data
/// structures that the GSP will use at runtime.
///
- /// Upon return, the GSP is up and running, and its runtime object given as return value.
+ /// Upon return, the GSP is up and running, and its unload bundle (to be given as argument to
+ /// [`Self::unload`]) returned.
pub(crate) fn boot(
self: Pin<&mut Self>,
pdev: &pci::Device<device::Bound>,
@@ -46,7 +47,7 @@ pub(crate) fn boot(
chipset: Chipset,
gsp_falcon: &Falcon<Gsp>,
sec2_falcon: &Falcon<Sec2>,
- ) -> Result {
+ ) -> Result<Option<super::UnloadBundle>> {
let dev = pdev.as_ref();
let hal = super::hal::gsp_hal(chipset);
@@ -57,8 +58,8 @@ pub(crate) fn boot(
let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
- // Perform the chipset-specific boot sequence.
- hal.boot(
+ // Perform the chipset-specific boot sequence, and retrieve the unload bundle.
+ let unload_bundle = hal.boot(
&self,
dev,
bar,
@@ -98,7 +99,7 @@ pub(crate) fn boot(
Err(e) => dev_warn!(pdev, "GPU name unavailable: {:?}\n", e),
}
- Ok(())
+ Ok(unload_bundle)
}
/// Shut down the GSP and wait until it is offline.
@@ -130,16 +131,35 @@ pub(crate) fn unload(
dev: &device::Device<device::Bound>,
bar: &Bar0,
gsp_falcon: &Falcon<Gsp>,
+ sec2_falcon: &Falcon<Sec2>,
+ unload_bundle: Option<super::UnloadBundle>,
) -> Result {
- // Shut down the GSP.
- Self::shutdown_gsp(
+ // Shut down the GSP. Keep going even in case of error.
+ let mut res = Self::shutdown_gsp(
&self.cmdq,
bar,
gsp_falcon,
commands::PowerStateLevel::Level0,
)
- .inspect_err(|e| dev_err!(dev, "Unload guest driver failed: {:?}\n", e))?;
+ .inspect_err(|e| dev_err!(dev, "GSP shutdown failed: {:?}\n", e));
- Ok(())
+ // Run the unload bundle to reset the GSP so it can be booted again.
+ if let Some(unload_bundle) = unload_bundle {
+ res = res.and(
+ unload_bundle
+ .0
+ .run(dev, bar, gsp_falcon, sec2_falcon)
+ .inspect_err(|e| dev_err!(dev, "Unload bundle failed: {:?}\n", e)),
+ );
+ } else {
+ dev_warn!(
+ dev,
+ "Unload bundle is missing, GSP won't be properly reset.\n"
+ );
+
+ res = Err(EAGAIN);
+ }
+
+ res.inspect(|()| dev_info!(dev, "GSP successfully unloaded\n"))
}
}
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
index fb3edaeb3160..501b852dcb29 100644
--- a/drivers/gpu/nova-core/gsp/hal.rs
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -30,9 +30,28 @@
},
};
+/// Trait for types containing the resources and code required to fully reset the GSP.
+///
+/// The GSP unload code might run in a situation where we cannot load firmware dynamically (e.g.
+/// because we are in shutdown and the file system is not accessible anymore). Thus, the firmware
+/// required for unloading is prepared at load time, and stored here until it needs to be run.
+pub(super) trait UnloadBundle: Send {
+ /// Performs the steps required to properly reset the GSP after it has been stopped.
+ fn run(
+ &self,
+ dev: &device::Device<device::Bound>,
+ bar: &Bar0,
+ gsp_falcon: &Falcon<GspEngine>,
+ sec2_falcon: &Falcon<Sec2>,
+ ) -> Result;
+}
+
/// Trait implemented by GSP HALs.
pub(super) trait GspHal: Send {
/// Performs the GSP boot process, loading and running the required firmwares as needed.
+ ///
+ /// Upon success, returns the [`UnloadBundle`] to be run (if any) in order to properly reset the
+ /// GSP after it has been stopped.
#[allow(clippy::too_many_arguments)]
fn boot(
&self,
@@ -44,7 +63,7 @@ fn boot(
wpr_meta: &Coherent<GspFwWprMeta>,
gsp_falcon: &Falcon<GspEngine>,
sec2_falcon: &Falcon<Sec2>,
- ) -> Result;
+ ) -> Result<Option<crate::gsp::UnloadBundle>>;
/// Performs HAL-specific post-GSP boot tasks.
///
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 3f3675f9c16a..0a8b7f763883 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -41,7 +41,7 @@ fn boot(
_wpr_meta: &Coherent<GspFwWprMeta>,
_gsp_falcon: &Falcon<GspEngine>,
_sec2_falcon: &Falcon<Sec2>,
- ) -> Result {
+ ) -> Result<Option<crate::gsp::UnloadBundle>> {
Err(ENOTSUPP)
}
}
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index a6f2b2e279e8..c4ab081f25c4 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -32,7 +32,10 @@
},
gpu::Chipset,
gsp::{
- hal::GspHal,
+ hal::{
+ GspHal,
+ UnloadBundle, //
+ },
sequencer::{
GspSequencer,
GspSequencerParams, //
@@ -44,6 +47,124 @@
vbios::Vbios, //
};
+// A ready-to-run FWSEC unload firmware.
+//
+// Since there are two variants of the prepared firmware (with and without a bootloader), this type
+// abstracts the difference.
+enum FwsecUnloadFirmware {
+ WithoutBl(FwsecFirmware),
+ WithBl(FwsecFirmwareWithBl),
+}
+
+impl FwsecUnloadFirmware {
+ /// Loads the FWSEC SB firmware, as well as its bootloader if `chipset` requires it.
+ fn new(
+ dev: &device::Device<device::Bound>,
+ bar: &Bar0,
+ chipset: Chipset,
+ bios: &Vbios,
+ gsp_falcon: &Falcon<GspEngine>,
+ ) -> Result<Self> {
+ let fwsec_sb = FwsecFirmware::new(dev, gsp_falcon, bar, bios, FwsecCommand::Sb)?;
+
+ Ok(if chipset.needs_fwsec_bootloader() {
+ Self::WithBl(FwsecFirmwareWithBl::new(fwsec_sb, dev, chipset)?)
+ } else {
+ Self::WithoutBl(fwsec_sb)
+ })
+ }
+
+ /// Runs the FWSEC SB firmware.
+ fn run(
+ &self,
+ dev: &device::Device<device::Bound>,
+ bar: &Bar0,
+ gsp_falcon: &Falcon<GspEngine>,
+ ) -> Result<()> {
+ match self {
+ Self::WithoutBl(fw) => fw.run(dev, gsp_falcon, bar),
+ Self::WithBl(fw) => fw.run(dev, gsp_falcon, bar),
+ }
+ }
+}
+
+// Contains the firmware required to fully reset GSP on chipsets where the GSP is started using
+// FWSEC/Booter.
+struct Sec2UnloadBundle {
+ fwsec_sb: FwsecUnloadFirmware,
+ booter_unloader: BooterFirmware,
+}
+
+impl Sec2UnloadBundle {
+ /// Load and prepare the resources required to properly reset the GSP after it has been stopped.
+ fn build(
+ dev: &device::Device<device::Bound>,
+ bar: &Bar0,
+ chipset: Chipset,
+ bios: &Vbios,
+ gsp_falcon: &Falcon<GspEngine>,
+ sec2_falcon: &Falcon<Sec2>,
+ ) -> Result<KBox<dyn UnloadBundle>> {
+ KBox::new(
+ Self {
+ fwsec_sb: FwsecUnloadFirmware::new(dev, bar, chipset, bios, gsp_falcon)?,
+ booter_unloader: BooterFirmware::new(
+ dev,
+ BooterKind::Unloader,
+ chipset,
+ FIRMWARE_VERSION,
+ sec2_falcon,
+ bar,
+ )?,
+ },
+ GFP_KERNEL,
+ )
+ .map(|b| b as KBox<dyn UnloadBundle>)
+ .map_err(Into::into)
+ }
+}
+
+impl UnloadBundle for Sec2UnloadBundle {
+ fn run(
+ &self,
+ dev: &device::Device<device::Bound>,
+ bar: &Bar0,
+ gsp_falcon: &Falcon<GspEngine>,
+ sec2_falcon: &Falcon<Sec2>,
+ ) -> Result<()> {
+ // Run FWSEC-SB to reset the GSP falcon to its pre-libos state.
+ self.fwsec_sb.run(dev, bar, gsp_falcon)?;
+
+ // Remove WPR2 region if set.
+ let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
+ if wpr2_hi.is_wpr2_set() {
+ sec2_falcon.reset(bar)?;
+ sec2_falcon.load(dev, bar, &self.booter_unloader)?;
+
+ // Sentinel value to confirm that Booter Unloader has run.
+ const MAILBOX_SENTINEL: u32 = 0xff;
+ let (mbox0, _) =
+ sec2_falcon.boot(bar, Some(MAILBOX_SENTINEL), Some(MAILBOX_SENTINEL))?;
+ if mbox0 != 0 {
+ dev_err!(dev, "Booter Unloader returned error 0x{:x}\n", mbox0);
+ return Err(EINVAL);
+ }
+
+ // Confirm that the WPR2 region has been removed.
+ let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
+ if wpr2_hi.is_wpr2_set() {
+ dev_err!(
+ dev,
+ "WPR2 region still set after Booter Unloader returned\n"
+ );
+ return Err(EBUSY);
+ }
+ }
+
+ Ok(())
+ }
+}
+
/// Helper function to load and run the FWSEC-FRTS firmware and confirm that it has properly
/// created the WPR2 region.
fn run_fwsec_frts(
@@ -143,9 +264,24 @@ fn boot(
wpr_meta: &Coherent<GspFwWprMeta>,
gsp_falcon: &Falcon<GspEngine>,
sec2_falcon: &Falcon<Sec2>,
- ) -> Result {
+ ) -> Result<Option<crate::gsp::UnloadBundle>> {
let bios = Vbios::new(dev, bar)?;
+ // Try and prepare the unload bundle. If this fails, the GPU will need to be reset
+ // before the driver can be probed again.
+ let unload_bundle =
+ Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
+ .inspect_err(|e| {
+ dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
+ dev_warn!(dev, "The GSP won't be able to unload properly on unbind.\n");
+ dev_warn!(
+ dev,
+ "The GPU will need to be reset before the driver can bind again.\n"
+ );
+ })
+ .map(crate::gsp::UnloadBundle)
+ .ok();
+
// FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
if !fb_layout.frts.is_empty() {
run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
@@ -175,7 +311,7 @@ fn boot(
)?
.run(dev, bar, sec2_falcon, wpr_meta)?;
- Ok(())
+ Ok(unload_bundle)
}
fn post_boot(
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 6faeed73901d..356fbf364ea5 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -175,6 +175,11 @@ impl NV_PFB_PRI_MMU_WPR2_ADDR_HI {
pub(crate) fn higher_bound(self) -> u64 {
u64::from(self.hi_val()) << 12
}
+
+ /// Returns whether the WPR2 region is currently set.
+ pub(crate) fn is_wpr2_set(self) -> bool {
+ self.hi_val() != 0
+ }
}
// PGSP
--
2.54.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails
2026-05-29 7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
` (2 preceding siblings ...)
2026-05-29 7:33 ` [PATCH v7 3/4] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
@ 2026-05-29 7:33 ` Alexandre Courbot
2026-05-30 1:46 ` Eliot Courtney
2026-06-04 6:57 ` Claude review: " Claude Code Review Bot
2026-05-29 11:15 ` [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Danilo Krummrich
` (2 subsequent siblings)
6 siblings, 2 replies; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-29 7:33 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
nova-gpu, dri-devel, linux-kernel, rust-for-linux,
Alexandre Courbot
If `Gsp::boot` fails, the GSP can be left in a state where boot cannot
be attempted again unless it is reset first.
To avoid this, we want to run the unload bundle whenever `boot` fails to
try and clear the partially-initialized state.
Do this by wrapping the unload bundle into a drop guard up until `boot`
returns. After that, running the unload bundle becomes the
responsibility of the caller.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
drivers/gpu/nova-core/gsp/boot.rs | 67 ++++++++++++++++++++++++++++++++--
drivers/gpu/nova-core/gsp/hal.rs | 19 +++++-----
drivers/gpu/nova-core/gsp/hal/gh100.rs | 15 ++++----
drivers/gpu/nova-core/gsp/hal/tu102.rs | 31 ++++++++++------
4 files changed, 101 insertions(+), 31 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 8d6fcc35b653..1f83f63ceeb0 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -8,7 +8,8 @@
io::poll::read_poll_timeout,
pci,
prelude::*,
- time::Delta, //
+ time::Delta,
+ types::ScopeGuard, //
};
use crate::{
@@ -31,6 +32,66 @@
},
};
+/// Arguments required to call [`Gsp::unload`](super::Gsp::unload).
+///
+/// Stored as their own type to avoid repeating a long and tedious list in [`BootUnloadGuard`].
+pub(super) struct BootUnloadArgs<'a> {
+ gsp: &'a super::Gsp,
+ dev: &'a device::Device<device::Bound>,
+ bar: &'a Bar0,
+ gsp_falcon: &'a Falcon<Gsp>,
+ sec2_falcon: &'a Falcon<Sec2>,
+ unload_bundle: Option<super::UnloadBundle>,
+}
+
+/// Guard that calls [`Gsp::unload`](super::Gsp::unload) with a
+/// [`UnloadBundle`](super::UnloadBundle) when dropped.
+///
+/// Used to ensure the `UnloadBundle` is run during failure paths.
+pub(super) struct BootUnloadGuard<'a> {
+ guard: ScopeGuard<BootUnloadArgs<'a>, fn(BootUnloadArgs<'a>)>,
+}
+
+impl<'a> BootUnloadGuard<'a> {
+ /// Wraps `unload_bundle` into a guard that executes it when dropped.
+ pub(super) fn new(
+ gsp: &'a super::Gsp,
+ dev: &'a device::Device<device::Bound>,
+ bar: &'a Bar0,
+ gsp_falcon: &'a Falcon<Gsp>,
+ sec2_falcon: &'a Falcon<Sec2>,
+ unload_bundle: Option<super::UnloadBundle>,
+ ) -> Self {
+ Self {
+ guard: ScopeGuard::new_with_data(
+ BootUnloadArgs {
+ gsp,
+ dev,
+ bar,
+ gsp_falcon,
+ sec2_falcon,
+ unload_bundle,
+ },
+ |args| {
+ let _ = super::Gsp::unload(
+ args.gsp,
+ args.dev,
+ args.bar,
+ args.gsp_falcon,
+ args.sec2_falcon,
+ args.unload_bundle,
+ );
+ },
+ ),
+ }
+ }
+
+ /// Disarms the guard and returns the [`UnloadBundle`](super::UnloadBundle) it contains.
+ pub(super) fn dismiss(self) -> Option<super::UnloadBundle> {
+ self.guard.dismiss().unload_bundle
+ }
+}
+
impl super::Gsp {
/// Attempt to boot the GSP.
///
@@ -59,7 +120,7 @@ pub(crate) fn boot(
let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
// Perform the chipset-specific boot sequence, and retrieve the unload bundle.
- let unload_bundle = hal.boot(
+ let unload_guard = hal.boot(
&self,
dev,
bar,
@@ -99,7 +160,7 @@ pub(crate) fn boot(
Err(e) => dev_warn!(pdev, "GPU name unavailable: {:?}\n", e),
}
- Ok(unload_bundle)
+ Ok(unload_guard.dismiss())
}
/// Shut down the GSP and wait until it is offline.
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
index 501b852dcb29..88fc3e791114 100644
--- a/drivers/gpu/nova-core/gsp/hal.rs
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -25,6 +25,7 @@
Chipset, //
},
gsp::{
+ boot::BootUnloadGuard,
Gsp,
GspFwWprMeta, //
},
@@ -50,20 +51,20 @@ fn run(
pub(super) trait GspHal: Send {
/// Performs the GSP boot process, loading and running the required firmwares as needed.
///
- /// Upon success, returns the [`UnloadBundle`] to be run (if any) in order to properly reset the
- /// GSP after it has been stopped.
+ /// Upon success, returns a guard that runs the GSP unload sequence if GSP boot does not
+ /// complete.
#[allow(clippy::too_many_arguments)]
- fn boot(
+ fn boot<'a>(
&self,
- gsp: &Gsp,
- dev: &device::Device<device::Bound>,
- bar: &Bar0,
+ gsp: &'a Gsp,
+ dev: &'a device::Device<device::Bound>,
+ bar: &'a Bar0,
chipset: Chipset,
fb_layout: &FbLayout,
wpr_meta: &Coherent<GspFwWprMeta>,
- gsp_falcon: &Falcon<GspEngine>,
- sec2_falcon: &Falcon<Sec2>,
- ) -> Result<Option<crate::gsp::UnloadBundle>>;
+ gsp_falcon: &'a Falcon<GspEngine>,
+ sec2_falcon: &'a Falcon<Sec2>,
+ ) -> Result<BootUnloadGuard<'a>>;
/// Performs HAL-specific post-GSP boot tasks.
///
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 0a8b7f763883..9a4bb22578b3 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -18,6 +18,7 @@
fb::FbLayout,
gpu::Chipset,
gsp::{
+ boot::BootUnloadGuard,
hal::GspHal,
Gsp,
GspFwWprMeta, //
@@ -31,17 +32,17 @@ impl GspHal for Gh100 {
///
/// This path uses FSP to establish a chain of trust and boot GSP-FMC. FSP handles
/// the GSP boot internally - no manual GSP reset/boot is needed.
- fn boot(
+ fn boot<'a>(
&self,
- _gsp: &Gsp,
- _dev: &device::Device<device::Bound>,
- _bar: &Bar0,
+ _gsp: &'a Gsp,
+ _dev: &'a device::Device<device::Bound>,
+ _bar: &'a Bar0,
_chipset: Chipset,
_fb_layout: &FbLayout,
_wpr_meta: &Coherent<GspFwWprMeta>,
- _gsp_falcon: &Falcon<GspEngine>,
- _sec2_falcon: &Falcon<Sec2>,
- ) -> Result<Option<crate::gsp::UnloadBundle>> {
+ _gsp_falcon: &'a Falcon<GspEngine>,
+ _sec2_falcon: &'a Falcon<Sec2>,
+ ) -> Result<BootUnloadGuard<'a>> {
Err(ENOTSUPP)
}
}
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index c4ab081f25c4..6a27e7e90279 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -32,6 +32,7 @@
},
gpu::Chipset,
gsp::{
+ boot::BootUnloadGuard,
hal::{
GspHal,
UnloadBundle, //
@@ -254,21 +255,23 @@ fn run_fwsec_frts(
struct Tu102;
impl GspHal for Tu102 {
- fn boot(
+ fn boot<'a>(
&self,
- gsp: &Gsp,
- dev: &device::Device<device::Bound>,
- bar: &Bar0,
+ gsp: &'a Gsp,
+ dev: &'a device::Device<device::Bound>,
+ bar: &'a Bar0,
chipset: Chipset,
fb_layout: &FbLayout,
wpr_meta: &Coherent<GspFwWprMeta>,
- gsp_falcon: &Falcon<GspEngine>,
- sec2_falcon: &Falcon<Sec2>,
- ) -> Result<Option<crate::gsp::UnloadBundle>> {
+ gsp_falcon: &'a Falcon<GspEngine>,
+ sec2_falcon: &'a Falcon<Sec2>,
+ ) -> Result<BootUnloadGuard<'a>> {
let bios = Vbios::new(dev, bar)?;
- // Try and prepare the unload bundle. If this fails, the GPU will need to be reset
- // before the driver can be probed again.
+ // Try and prepare the unload bundle.
+ //
+ // If the unload bundle creation fails, the GPU will need to be reset before the driver can
+ // be probed again.
let unload_bundle =
Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
.inspect_err(|e| {
@@ -279,8 +282,12 @@ fn boot(
"The GPU will need to be reset before the driver can bind again.\n"
);
})
- .map(crate::gsp::UnloadBundle)
- .ok();
+ .ok()
+ .map(crate::gsp::UnloadBundle);
+
+ // Wrap the unload bundle into a drop guard so it is automatically run upon failure.
+ let unload_guard =
+ BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, unload_bundle);
// FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
if !fb_layout.frts.is_empty() {
@@ -311,7 +318,7 @@ fn boot(
)?
.run(dev, bar, sec2_falcon, wpr_meta)?;
- Ok(unload_bundle)
+ Ok(unload_guard)
}
fn post_boot(
--
2.54.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding
2026-05-29 7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
` (3 preceding siblings ...)
2026-05-29 7:33 ` [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails Alexandre Courbot
@ 2026-05-29 11:15 ` Danilo Krummrich
2026-05-29 13:06 ` Alexandre Courbot
2026-05-30 5:55 ` Alexandre Courbot
2026-06-04 6:57 ` Claude review: " Claude Code Review Bot
6 siblings, 1 reply; 19+ messages in thread
From: Danilo Krummrich @ 2026-05-29 11:15 UTC (permalink / raw)
To: Alexandre Courbot
Cc: Alice Ryhl, David Airlie, Simona Vetter, John Hubbard,
Alistair Popple, Timur Tabi, Eliot Courtney, nova-gpu, dri-devel,
linux-kernel, rust-for-linux
On Fri May 29, 2026 at 9:33 AM CEST, Alexandre Courbot wrote:
> Alexandre Courbot (4):
> gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
> gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
> gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
> gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
NIT: There's a few places with Result<()> instead of just Result that could be
cleaned up (including the moved code).
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding
2026-05-29 11:15 ` [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Danilo Krummrich
@ 2026-05-29 13:06 ` Alexandre Courbot
0 siblings, 0 replies; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-29 13:06 UTC (permalink / raw)
To: Danilo Krummrich
Cc: Alice Ryhl, David Airlie, Simona Vetter, John Hubbard,
Alistair Popple, Timur Tabi, Eliot Courtney, nova-gpu, dri-devel,
linux-kernel, rust-for-linux
On Fri May 29, 2026 at 8:15 PM JST, Danilo Krummrich wrote:
> On Fri May 29, 2026 at 9:33 AM CEST, Alexandre Courbot wrote:
>> Alexandre Courbot (4):
>> gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
>> gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
>> gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
>> gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails
>
> Reviewed-by: Danilo Krummrich <dakr@kernel.org>
>
> NIT: There's a few places with Result<()> instead of just Result that could be
> cleaned up (including the moved code).
Oops, I don't know why I always write them this way. I've fixed locally
and will apply upon push if a v8 is not needed, thanks!
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails
2026-05-29 7:33 ` [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails Alexandre Courbot
@ 2026-05-30 1:46 ` Eliot Courtney
2026-06-04 6:57 ` Claude review: " Claude Code Review Bot
1 sibling, 0 replies; 19+ messages in thread
From: Eliot Courtney @ 2026-05-30 1:46 UTC (permalink / raw)
To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
Simona Vetter
Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
nova-gpu, dri-devel, linux-kernel, rust-for-linux, dri-devel
On Fri May 29, 2026 at 4:33 PM JST, Alexandre Courbot wrote:
> If `Gsp::boot` fails, the GSP can be left in a state where boot cannot
> be attempted again unless it is reset first.
>
> To avoid this, we want to run the unload bundle whenever `boot` fails to
> try and clear the partially-initialized state.
>
> Do this by wrapping the unload bundle into a drop guard up until `boot`
> returns. After that, running the unload bundle becomes the
> responsibility of the caller.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding
2026-05-29 7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
` (4 preceding siblings ...)
2026-05-29 11:15 ` [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Danilo Krummrich
@ 2026-05-30 5:55 ` Alexandre Courbot
2026-06-04 6:57 ` Claude review: " Claude Code Review Bot
6 siblings, 0 replies; 19+ messages in thread
From: Alexandre Courbot @ 2026-05-30 5:55 UTC (permalink / raw)
To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
nova-gpu, dri-devel, linux-kernel, rust-for-linux,
Alexandre Courbot
On Fri May 29, 2026 at 4:33 PM JST, Alexandre Courbot wrote:
> gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
> gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
> gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
> gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails
All pushed to drm-rust-next (after s/Result<()>/Result)! Thanks for the
reviews.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Claude review: gpu: nova-core: run unload sequence upon unbinding
2026-05-29 7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
` (5 preceding siblings ...)
2026-05-30 5:55 ` Alexandre Courbot
@ 2026-06-04 6:57 ` Claude Code Review Bot
6 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 6:57 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: gpu: nova-core: run unload sequence upon unbinding
Author: Alexandre Courbot <acourbot@nvidia.com>
Patches: 9
Reviewed: 2026-06-04T16:57:32.956770
---
This is a well-structured 4-patch series by Alexandre Courbot (NVIDIA) that adds proper GPU unload/teardown for the nova-core driver. Currently, when the driver unbinds, the GSP firmware is left running and the WPR2 memory region is not cleaned up, which means a GPU reset is required before the driver can probe again. This is particularly problematic on Blackwell where PCI reset can corrupt the GPU.
The series is logically decomposed:
1. Extract chipset-specific boot code into a HAL pattern
2. Send the `UNLOADING_GUEST_DRIVER` command to stop the GSP
3. Run Booter Unloader and FWSEC-SB to reset WPR2 region
4. Add a drop guard to run the unload bundle on boot failure
The code quality is high — good error handling, proper use of Rust's type system to enforce lifecycle constraints (one-shot unload bundle, drop guards), and the HAL abstraction prepares cleanly for Hopper/Blackwell FSP boot paths. The v7 changelog shows this series has received thorough review iterations.
**Verdict**: Looks good overall. A few observations below, mostly minor.
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 19+ messages in thread
* Claude review: gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
2026-05-29 7:33 ` [PATCH v7 1/4] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
@ 2026-06-04 6:57 ` Claude Code Review Bot
0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 6:57 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
This is a mechanical extraction of the boot code into a HAL trait pattern. The existing `run_fwsec_frts` and boot-related logic moves from `gsp/boot.rs` into `gsp/hal/tu102.rs`, with a trait `GspHal` defined in `gsp/hal.rs` and a stub `Gh100` HAL.
**Observations:**
1. The HAL dispatch is a static function returning `&'static dyn GspHal`:
```rust
pub(super) fn gsp_hal(chipset: Chipset) -> &'static dyn GspHal {
match chipset.arch() {
Architecture::Turing | Architecture::Ampere | Architecture::Ada => tu102::TU102_HAL,
Architecture::Hopper | Architecture::BlackwellGB10x | Architecture::BlackwellGB20x => {
gh100::GH100_HAL
}
}
}
```
This is clean — the match is exhaustive, the static dispatch avoids allocations, and the `dyn GspHal` provides the needed polymorphism. The `Send` bound on the trait is appropriate since the HAL instances are `&'static`.
2. The `post_boot` trait method has a default no-op implementation, which makes sense since GH100 doesn't need sequencer logic. Good trait design.
3. The `boot` method in `tu102.rs` takes `gsp: &Gsp` (versus `self: Pin<&mut Self>` in the original). This works because the HAL implementation accesses `gsp.libos` and `gsp.cmdq` through shared references, and those fields provide interior mutability where needed.
4. The `#[allow(clippy::too_many_arguments)]` on the trait method is pragmatic — the parameters are all genuinely needed. No concern here.
5. Minor: in `gsp/boot.rs` the call passes `&self` to `hal.boot()`:
```rust
hal.boot(
&self,
dev,
...
```
This is fine because `self` here is `Pin<&mut Gsp>` and `&self` reborrrows as `&Gsp`, which is what the HAL expects.
**No issues.** This is a clean mechanical refactoring.
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 19+ messages in thread
* Claude review: gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
2026-05-29 7:33 ` [PATCH v7 2/4] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
@ 2026-06-04 6:57 ` Claude Code Review Bot
0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 6:57 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
This patch makes `Gpu` implement `PinnedDrop` to send the `UNLOADING_GUEST_DRIVER` RPC to the GSP when the driver unbinds.
**Observations:**
1. The `Gpu` struct gains a `device` field to retain the device reference for the drop path:
```rust
#[pin_data(PinnedDrop)]
pub(crate) struct Gpu<'gpu> {
device: &'gpu device::Device<device::Bound>,
...
```
This is necessary since `PinnedDrop::drop` doesn't receive the device as a parameter. Note the upstream base already has `bar: &'gpu Bar0` so this is consistent with the existing pattern.
2. The `PinnedDrop` implementation:
```rust
fn drop(self: Pin<&mut Self>) {
let this = self.project();
let device = *this.device;
let bar = *this.bar;
let _ = this
.gsp
.as_ref()
.get_ref()
.unload(device, bar, &*this.gsp_falcon)
.inspect_err(|e| dev_err!(device, "failed to unload GSP: {:?}\n", e));
}
```
The `let _ =` properly discards the error after logging it — correct for a drop path where there's nothing more to do. The `.as_ref().get_ref()` chain on the pin projection is the standard way to get `&Gsp` from a pinned field.
3. The `shutdown_gsp` method correctly polls mailbox0 for the `LIBOS_INTERRUPT_PROCESSOR_SUSPENDED` bit:
```rust
const LIBOS_INTERRUPT_PROCESSOR_SUSPENDED: u32 = bits::bit_u32(31);
read_poll_timeout(
|| Ok(gsp_falcon.read_mailbox0(bar)),
|&mb0| mb0 & LIBOS_INTERRUPT_PROCESSOR_SUSPENDED != 0,
Delta::from_millis(10),
Delta::from_secs(5),
)
```
5-second timeout with 10ms polling interval is reasonable for firmware shutdown.
4. The `PowerStateLevel` enum uses `#[expect(unused)]` which correctly anticipates that `Level3` and `Level7` are not used yet but will be needed for suspend/hibernate support. Only `Level0` (full unload) is used currently.
5. The `UnloadingGuestDriver` command implementation follows the existing pattern for `SetRegistry` and `SetGuestSystemInfo` commands — consistent and correct.
6. The `SAFETY` comments on `AsBytes` and `FromBytes` impls for `UnloadingGuestDriver` look correct — the struct is `#[repr(transparent)]` wrapping a C struct with explicit padding (`__bindgen_padding_0: [u8; 2usize]`), so no uninitialized padding.
7. `is_power_transition` method:
```rust
pub(crate) fn is_power_transition(self) -> bool {
self != PowerStateLevel::Level0
}
```
This is clean and semantically clear.
**No issues.** Well-implemented.
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 19+ messages in thread
* Claude review: gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
2026-05-29 7:33 ` [PATCH v7 3/4] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
@ 2026-06-04 6:57 ` Claude Code Review Bot
0 siblings, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 6:57 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
This is the most substantial patch. It prepares the Booter Unloader and FWSEC-SB firmware at probe time (since filesystem may not be available at unbind time) and runs them during teardown to clear the WPR2 region.
**Observations:**
1. The `FwsecUnloadFirmware` enum handling the with/without bootloader variants is clean:
```rust
enum FwsecUnloadFirmware {
WithoutBl(FwsecFirmware),
WithBl(FwsecFirmwareWithBl),
}
```
This avoids conditional logic at run time and makes both paths type-safe.
2. The `Sec2UnloadBundle` correctly pre-loads firmware at boot:
```rust
fn build(...) -> Result<KBox<dyn UnloadBundle>> {
KBox::new(
Self {
fwsec_sb: FwsecUnloadFirmware::new(dev, bar, chipset, bios, gsp_falcon)?,
booter_unloader: BooterFirmware::new(dev, BooterKind::Unloader, ...)?
},
GFP_KERNEL,
)
.map(|b| b as KBox<dyn UnloadBundle>)
.map_err(Into::into)
}
```
Using `KBox<dyn UnloadBundle>` for the trait object is correct for kernel heap allocation.
3. The unload bundle run sequence is well-ordered:
```rust
fn run(&self, dev, bar, gsp_falcon, sec2_falcon) -> Result<()> {
// 1. Run FWSEC-SB to reset GSP falcon
self.fwsec_sb.run(dev, bar, gsp_falcon)?;
// 2. Remove WPR2 via Booter Unloader
if wpr2_hi.is_wpr2_set() { ... }
}
```
FWSEC-SB first (resets falcon to pre-libos state), then Booter Unloader (removes WPR2). This matches OpenRM's sequence.
4. The mailbox sentinel pattern in the Booter Unloader execution:
```rust
const MAILBOX_SENTINEL: u32 = 0xff;
let (mbox0, _) = sec2_falcon.boot(bar, Some(MAILBOX_SENTINEL), Some(MAILBOX_SENTINEL))?;
if mbox0 != 0 {
dev_err!(dev, "Booter Unloader returned error 0x{:x}\n", mbox0);
return Err(EINVAL);
}
```
Writing `0xff` to both mailboxes and then checking mbox0 == 0 for success matches the pattern described in prior versions' changelogs as aligning with OpenRM behavior. Good.
5. **Important design decision**: The unload bundle preparation failure is non-fatal during boot:
```rust
let unload_bundle =
Sec2UnloadBundle::build(...)
.inspect_err(|e| {
dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
...
})
.ok();
```
Converting the error to `None` via `.ok()` and continuing boot is the right tradeoff — failing to prepare unload firmware shouldn't prevent the GPU from being used, just means manual reset will be needed.
6. The `UnloadBundle` wrapper type in `gsp.rs`:
```rust
pub(crate) struct UnloadBundle(KBox<dyn hal::UnloadBundle>);
```
This provides an opaque boundary between the HAL-internal trait and the public API. Clean encapsulation.
7. In `gpu.rs`, the unload path continues even if GSP shutdown fails:
```rust
let mut res = Self::shutdown_gsp(...)
.inspect_err(|e| dev_err!(dev, "GSP shutdown failed: {:?}\n", e));
if let Some(unload_bundle) = unload_bundle {
res = res.and(
unload_bundle.0.run(...)
.inspect_err(|e| dev_err!(dev, "Unload bundle failed: {:?}\n", e)),
);
}
```
Using `res.and()` means if shutdown fails, the first error is preserved, but we still attempt the unload bundle. This is correct — even if the GSP didn't respond to the shutdown command, the booter unloader / FWSEC-SB may still succeed at resetting hardware state.
8. The `is_wpr2_set` helper is a trivial but useful abstraction:
```rust
pub(crate) fn is_wpr2_set(self) -> bool {
self.hi_val() != 0
}
```
This replaces `higher_bound() != 0` with a semantically clearer method. Note: `higher_bound()` shifts by 12, so checking `hi_val() != 0` is equivalent to checking `higher_bound() != 0` and avoids the unnecessary shift. Slightly more efficient, though it doesn't matter in practice.
9. Minor: `unload` now warns if no unload bundle is available and returns `Err(EAGAIN)`. `EAGAIN` seems slightly unusual here — the operation won't succeed on retry without external intervention (GPU reset). However, the cover letter explains this situation occurs only when bundle preparation fails, and the user was already warned at boot time. The error code won't propagate to userspace in the normal flow (it's in `PinnedDrop`), so this is mostly for internal tracking. Acceptable.
**No issues.** This is solid work.
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 19+ messages in thread
* Claude review: gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails
2026-05-29 7:33 ` [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails Alexandre Courbot
2026-05-30 1:46 ` Eliot Courtney
@ 2026-06-04 6:57 ` Claude Code Review Bot
1 sibling, 0 replies; 19+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 6:57 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
This patch adds a `BootUnloadGuard` — a `ScopeGuard` that automatically runs the unload bundle if boot fails partway through.
**Observations:**
1. The `BootUnloadGuard` wraps `ScopeGuard<BootUnloadArgs, fn(BootUnloadArgs)>`:
```rust
pub(super) struct BootUnloadGuard<'a> {
guard: ScopeGuard<BootUnloadArgs<'a>, fn(BootUnloadArgs<'a>)>,
}
```
Using a concrete function pointer `fn(BootUnloadArgs)` instead of a closure avoids lifetime complications with closures capturing references. Clean approach.
2. The guard's callback calls `Gsp::unload`:
```rust
|args| {
let _ = super::Gsp::unload(
args.gsp,
args.dev,
args.bar,
args.gsp_falcon,
args.sec2_falcon,
args.unload_bundle,
);
},
```
The `let _ =` discards the result in the drop path — correct, since there's nothing to do with errors during cleanup of a failed boot.
3. The `dismiss()` method returns the unload bundle for the caller to take ownership:
```rust
pub(super) fn dismiss(self) -> Option<super::UnloadBundle> {
self.guard.dismiss().unload_bundle
}
```
This is the standard `ScopeGuard` pattern — on the success path, dismiss the guard and hand the resource to the normal owner.
4. The lifetime threading `'a` through the HAL `boot` signature is well done:
```rust
fn boot<'a>(
&self,
gsp: &'a Gsp,
dev: &'a device::Device<device::Bound>,
bar: &'a Bar0,
...
gsp_falcon: &'a Falcon<GspEngine>,
sec2_falcon: &'a Falcon<Sec2>,
) -> Result<BootUnloadGuard<'a>>;
```
This ensures the guard borrows from the same lifetime as the falcons and device, which is correct — the guard must not outlive the hardware resources it references.
5. **One concern**: The `BootUnloadGuard`'s drop callback calls `Gsp::unload`, which includes `shutdown_gsp` (sends `UNLOADING_GUEST_DRIVER` command). If boot fails early enough that the GSP isn't actually running yet (e.g., before `gsp_falcon.boot()`), the `shutdown_gsp` command will fail (5-second timeout). The unload bundle's `run()` would then also execute. This is probably fine — `shutdown_gsp` failure is logged and continued past, and the unload bundle itself should work regardless of GSP state. But it does mean a boot failure will incur an extra 5-second timeout from the futile shutdown attempt. Not a functional issue, but worth noting for debugging (the error logs will appear).
6. In `tu102.rs`, the guard construction placement is correct — it's created after the unload bundle is built but before FWSEC-FRTS:
```rust
let unload_bundle = Sec2UnloadBundle::build(...).ok().map(crate::gsp::UnloadBundle);
let unload_guard = BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_falcon, unload_bundle);
// FWSEC-FRTS, falcon reset, booter load all happen after this
// Any `?` will trigger the guard
```
This means if FWSEC-FRTS, falcon boot, or booter load fails, the guard fires and cleans up. If unload bundle prep itself failed (returned `None`), the guard fires but `unload` logs a warning and returns `EAGAIN` — the hardware still won't be cleaned up, but that was already the case.
7. The `gh100.rs` stub correctly propagates the lifetime change and still returns `Err(ENOTSUPP)` — no guard needed since boot never succeeds.
**No issues.** The drop guard pattern is well-implemented and addresses a real concern about partial boot failures.
---
**Summary**: This is a clean, well-reviewed v7 series that addresses a real operational problem (inability to rebind without GPU reset). The HAL abstraction is forward-looking without being over-engineered, the error handling is thorough (continue-on-error in teardown paths, pre-load firmware at boot time), and the drop guard in patch 4 closes the gap for partial boot failures. All four patches are ready to merge.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2026-06-04 6:57 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-29 7:33 [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
2026-05-29 7:33 ` [PATCH v7 1/4] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
2026-06-04 6:57 ` Claude review: " Claude Code Review Bot
2026-05-29 7:33 ` [PATCH v7 2/4] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
2026-06-04 6:57 ` Claude review: " Claude Code Review Bot
2026-05-29 7:33 ` [PATCH v7 3/4] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
2026-06-04 6:57 ` Claude review: " Claude Code Review Bot
2026-05-29 7:33 ` [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails Alexandre Courbot
2026-05-30 1:46 ` Eliot Courtney
2026-06-04 6:57 ` Claude review: " Claude Code Review Bot
2026-05-29 11:15 ` [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Danilo Krummrich
2026-05-29 13:06 ` Alexandre Courbot
2026-05-30 5:55 ` Alexandre Courbot
2026-06-04 6:57 ` Claude review: " Claude Code Review Bot
-- strict thread matches above, loose matches on Subject: below --
2026-05-21 13:50 [PATCH v6 0/7] " Alexandre Courbot
2026-05-25 10:06 ` Claude review: " Claude Code Review Bot
2026-05-15 6:12 [PATCH v5 0/7] " Alexandre Courbot
2026-05-15 23:54 ` Claude review: " Claude Code Review Bot
2026-04-27 6:56 [PATCH v4 0/8] " Alexandre Courbot
2026-04-28 5:01 ` Claude review: " Claude Code Review Bot
2026-04-22 13:40 [PATCH v3 0/6] " Alexandre Courbot
2026-04-22 21:34 ` Claude review: " Claude Code Review Bot
2026-04-21 6:16 [PATCH v2 0/5] " Alexandre Courbot
2026-04-22 22:52 ` Claude review: " Claude Code Review Bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox