public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH v6 0/7] gpu: nova-core: run unload sequence upon unbinding
@ 2026-05-21 13:50 Alexandre Courbot
  2026-05-21 13:50 ` [PATCH v6 1/7] gpu: nova-core: remove unneeded get_gsp_info proxy function Alexandre Courbot
                   ` (7 more replies)
  0 siblings, 8 replies; 17+ messages in thread
From: Alexandre Courbot @ 2026-05-21 13:50 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot, Gary Guo

Currently the GSP is left running and the WPR2 memory region untouched
when the driver is unbound. This is obviously not ideal for at least two
reasons:

- Probing requires setting up the WPR2 region, which cannot be done if
  there is already one in place. Hence the current requirement to reset
  the GPU (using e.g. `echo 1 >/sys/bus/pci/devices/.../reset`) before
  the driver can be probed again after removal.
- The running GSP may still attempt to access shared memory regions
  which the kernel might recycle.

On top of that, there is a nasty bug in the Blackwell VBIOS that
sometimes borks the GPU upon PCI reset, requiring a reboot. So relying
on the PCI reset to unload/reload Nova is really not practical here.

This series does what is needed to leave the GPU in a clean state after
unbind, for all currently supported GPUs. Blackwell support is basic and
will be added alongside the Blackwell series if this can be merged
first.

This revision rebases on top of the Device HRT series [1] and moves the
unload bundle from `Gsp` into `NovaCore`. This makes GSP unload
initiation only possible at the driver module level, which is the only
place that can consume the unload bundle.

A branch with the series and its required dependencies is available at
[2].

[1] https://lore.kernel.org/20260506215113.851360-1-dakr@kernel.org
[2] https://github.com/Gnurou/linux/tree/b4/nova-unload

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Changes in v6:
- Inline TU102 local `run_booter` method in its unique call site.
- Rename unload bundle field to `unload_bundle`.
- Make Sec2UnloadBundle private.
- Continue GSP teardown upon partial failure.
- Store the unload bundle into `NovaCore`.
- Take the unload bundle by value to make it one-shot.
- Link to v5: https://patch.msgid.link/20260515-nova-unload-v5-0-c4d6250ad160@nvidia.com

Changes in v5:
- Rebase on top of the Device HRT series.
- Drop the now unneeded "gpu: nova-core: split BAR acquisition in unbind()".
- Link to v4: https://patch.msgid.link/20260427-nova-unload-v4-0-e145ccddae66@nvidia.com

Changes in v4:
- Remove `warn_on_err` macro as it isn't performing as expected and
  distracts from the goal of the series.
- Add John's patch from the Blackwell series refactoring the Booter
  Loader runner code.
- Add a GSP HAL and move the existing TU102/SEC2 boot sequence into it
  in preparation for the Hopper/Blackwell FSP boot path.
- Prepare the firmware required for unloading at probe time and save it
  into an unload bundle, as we cannot guarantee filesystem access at
  unload time.
- Constrain `UNLOADING_GUEST_DRIVER`'s visibility to the parent module.
- Also write the sentinel value `0xff` into `mbox1` when running Booter
  Unloader to align with OpenRM.
- Link to v3: https://patch.msgid.link/20260422-nova-unload-v3-0-1d2c81bd3ced@nvidia.com

Changes in v3:
- Disambiguate doccomment for `warn_on_err`.
- Test the correct bit instead of the whole register value to determine
  that the GSP has stopped.
- Use an enum instead of a boolean to encode the power level when
  shutting down the GSP.
- Add missing newline to `dev_err`.
- Add missing doccomments for new types.
- Use values from bindings instead of magic numbers.
- Remove the redundant `get_gsp_info` function.
- Better document Booter Unloader mailbox sentinel value, and check the
  value of mbox0 upon return.
- Link to v2: https://patch.msgid.link/20260421-nova-unload-v2-0-2fe54963af8b@nvidia.com

Changes in v2:
- Rebase on top of `master` and remove unneeded/obsolete preparatory patches.
- Tidy up the imports of commands from the `fw` module in the `gsp` module.
- Link to v1: https://patch.msgid.link/20251216-nova-unload-v1-0-6a5d823be19d@nvidia.com

---
Alexandre Courbot (6):
      gpu: nova-core: remove unneeded get_gsp_info proxy function
      gpu: nova-core: do not import firmware commands into GSP command module
      gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
      gpu: nova-core: gsp: shuffle boot code a bit to keep chipset-specific parts close
      gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
      gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding

John Hubbard (1):
      gpu: nova-core: refactor SEC2 booter loading into BooterFirmware::run()

 drivers/gpu/nova-core/driver.rs                   |  23 +-
 drivers/gpu/nova-core/firmware/booter.rs          |  32 +-
 drivers/gpu/nova-core/firmware/fwsec.rs           |   1 -
 drivers/gpu/nova-core/gpu.rs                      |  38 ++-
 drivers/gpu/nova-core/gsp.rs                      |   4 +
 drivers/gpu/nova-core/gsp/boot.rs                 | 262 ++++++-----------
 drivers/gpu/nova-core/gsp/commands.rs             |  72 +++--
 drivers/gpu/nova-core/gsp/fw.rs                   |   4 +
 drivers/gpu/nova-core/gsp/fw/commands.rs          |  45 +++
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs |  11 +
 drivers/gpu/nova-core/gsp/hal.rs                  |  93 ++++++
 drivers/gpu/nova-core/gsp/hal/gh100.rs            |  53 ++++
 drivers/gpu/nova-core/gsp/hal/tu102.rs            | 341 ++++++++++++++++++++++
 drivers/gpu/nova-core/regs.rs                     |   5 +
 14 files changed, 783 insertions(+), 201 deletions(-)
---
base-commit: 293c8393b49c9fc017168ddb46aa2012d508c921
change-id: 20251216-nova-unload-4029b3b76950

Best regards,
--  
Alexandre Courbot <acourbot@nvidia.com>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v6 1/7] gpu: nova-core: remove unneeded get_gsp_info proxy function
  2026-05-21 13:50 [PATCH v6 0/7] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
@ 2026-05-21 13:50 ` Alexandre Courbot
  2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
  2026-05-21 13:50 ` [PATCH v6 2/7] gpu: nova-core: do not import firmware commands into GSP command module Alexandre Courbot
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Alexandre Courbot @ 2026-05-21 13:50 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot, Gary Guo

This function was useful before the generic command-queue send methods
got merged, but it is just boilerplate now. Replace it with the correct
sequence to queue the `GetGspStaticInfo` command directly.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp/boot.rs     | 2 +-
 drivers/gpu/nova-core/gsp/commands.rs | 8 +-------
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index df105ef4b371..e838d61bef50 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -243,7 +243,7 @@ pub(crate) fn boot(
         commands::wait_gsp_init_done(&self.cmdq)?;
 
         // Obtain and display basic GPU information.
-        let info = commands::get_gsp_info(&self.cmdq, bar)?;
+        let info = self.cmdq.send_command(bar, commands::GetGspStaticInfo)?;
         match info.gpu_name() {
             Ok(name) => dev_info!(pdev, "GPU name: {}\n", name),
             Err(e) => dev_warn!(pdev, "GPU name unavailable: {:?}\n", e),
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index 0da5b92f4b27..74a8a79bd2d6 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -18,7 +18,6 @@
 };
 
 use crate::{
-    driver::Bar0,
     gsp::{
         cmdq::{
             Cmdq,
@@ -176,7 +175,7 @@ pub(crate) fn wait_gsp_init_done(cmdq: &Cmdq) -> Result {
 }
 
 /// The `GetGspStaticInfo` command.
-struct GetGspStaticInfo;
+pub(crate) struct GetGspStaticInfo;
 
 impl CommandToGsp for GetGspStaticInfo {
     const FUNCTION: MsgFunction = MsgFunction::GetGspStaticInfo;
@@ -232,8 +231,3 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
             .map_err(GpuNameError::InvalidUtf8)
     }
 }
-
-/// Send the [`GetGspInfo`] command and awaits for its reply.
-pub(crate) fn get_gsp_info(cmdq: &Cmdq, bar: &Bar0) -> Result<GetGspStaticInfoReply> {
-    cmdq.send_command(bar, GetGspStaticInfo)
-}

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 2/7] gpu: nova-core: do not import firmware commands into GSP command module
  2026-05-21 13:50 [PATCH v6 0/7] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
  2026-05-21 13:50 ` [PATCH v6 1/7] gpu: nova-core: remove unneeded get_gsp_info proxy function Alexandre Courbot
@ 2026-05-21 13:50 ` Alexandre Courbot
  2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
  2026-05-21 13:50 ` [PATCH v6 3/7] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Alexandre Courbot @ 2026-05-21 13:50 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

Importing all the firmware commands like we did is a bit confusing, as
the layer of a command type (fw or GSP) cannot be inferred from looking
at its name alone. Furthermore it makes it impossible to create commands
that have the same name as their firmware command.

Thus, stop importing all commands and refer to them from the `fw` module
instead.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp/commands.rs | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index 74a8a79bd2d6..cc8df448ff39 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 use core::{
     array,
@@ -26,7 +27,7 @@
             NoReply, //
         },
         fw::{
-            commands::*,
+            self,
             MsgFunction, //
         },
     },
@@ -47,12 +48,12 @@ pub(crate) fn new(pdev: &'bound pci::Device<device::Bound>) -> Self {
 
 impl<'bound> CommandToGsp for SetSystemInfo<'bound> {
     const FUNCTION: MsgFunction = MsgFunction::GspSetSystemInfo;
-    type Command = GspSetSystemInfo;
+    type Command = fw::commands::GspSetSystemInfo;
     type Reply = NoReply;
     type InitError = Error;
 
     fn init(&self) -> impl Init<Self::Command, Self::InitError> {
-        GspSetSystemInfo::init(self.pdev)
+        Self::Command::init(self.pdev)
     }
 }
 
@@ -99,12 +100,12 @@ pub(crate) fn new() -> Self {
 
 impl CommandToGsp for SetRegistry {
     const FUNCTION: MsgFunction = MsgFunction::SetRegistry;
-    type Command = PackedRegistryTable;
+    type Command = fw::commands::PackedRegistryTable;
     type Reply = NoReply;
     type InitError = Infallible;
 
     fn init(&self) -> impl Init<Self::Command, Self::InitError> {
-        PackedRegistryTable::init(Self::NUM_ENTRIES as u32, self.variable_payload_len() as u32)
+        Self::Command::init(Self::NUM_ENTRIES as u32, self.variable_payload_len() as u32)
     }
 
     fn variable_payload_len(&self) -> usize {
@@ -112,22 +113,22 @@ fn variable_payload_len(&self) -> usize {
         for i in 0..Self::NUM_ENTRIES {
             key_size += self.entries[i].key.len() + 1; // +1 for NULL terminator
         }
-        Self::NUM_ENTRIES * size_of::<PackedRegistryEntry>() + key_size
+        Self::NUM_ENTRIES * size_of::<fw::commands::PackedRegistryEntry>() + key_size
     }
 
     fn init_variable_payload(
         &self,
         dst: &mut SBufferIter<core::array::IntoIter<&mut [u8], 2>>,
     ) -> Result {
-        let string_data_start_offset =
-            size_of::<PackedRegistryTable>() + Self::NUM_ENTRIES * size_of::<PackedRegistryEntry>();
+        let string_data_start_offset = size_of::<Self::Command>()
+            + Self::NUM_ENTRIES * size_of::<fw::commands::PackedRegistryEntry>();
 
         // Array for string data.
         let mut string_data = KVec::new();
 
         for entry in self.entries.iter().take(Self::NUM_ENTRIES) {
             dst.write_all(
-                PackedRegistryEntry::new(
+                fw::commands::PackedRegistryEntry::new(
                     (string_data_start_offset + string_data.len()) as u32,
                     entry.value,
                 )
@@ -179,12 +180,12 @@ pub(crate) fn wait_gsp_init_done(cmdq: &Cmdq) -> Result {
 
 impl CommandToGsp for GetGspStaticInfo {
     const FUNCTION: MsgFunction = MsgFunction::GetGspStaticInfo;
-    type Command = GspStaticConfigInfo;
+    type Command = fw::commands::GspStaticConfigInfo;
     type Reply = GetGspStaticInfoReply;
     type InitError = Infallible;
 
     fn init(&self) -> impl Init<Self::Command, Self::InitError> {
-        GspStaticConfigInfo::init_zeroed()
+        Self::Command::init_zeroed()
     }
 }
 
@@ -195,7 +196,7 @@ pub(crate) struct GetGspStaticInfoReply {
 
 impl MessageFromGsp for GetGspStaticInfoReply {
     const FUNCTION: MsgFunction = MsgFunction::GetGspStaticInfo;
-    type Message = GspStaticConfigInfo;
+    type Message = fw::commands::GspStaticConfigInfo;
     type InitError = Infallible;
 
     fn read(

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 3/7] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
  2026-05-21 13:50 [PATCH v6 0/7] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
  2026-05-21 13:50 ` [PATCH v6 1/7] gpu: nova-core: remove unneeded get_gsp_info proxy function Alexandre Courbot
  2026-05-21 13:50 ` [PATCH v6 2/7] gpu: nova-core: do not import firmware commands into GSP command module Alexandre Courbot
@ 2026-05-21 13:50 ` Alexandre Courbot
  2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
  2026-05-21 13:50 ` [PATCH v6 4/7] gpu: nova-core: refactor SEC2 booter loading into BooterFirmware::run() Alexandre Courbot
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Alexandre Courbot @ 2026-05-21 13:50 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

Currently, the GSP is left running after the driver is unbound. This is
not great for several reasons, notably that it can still access shared
memory areas that the kernel will now reclaim (especially problematic on
setups without an IOMMU).

Fix this by sending the `UNLOADING_GUEST_DRIVER` GSP command when
unbinding. This stops the GSP and lets us proceed with the rest of the
unbind sequence in a later patch.

We make use of the `pci::Driver::unbind()` hook as we need access to the
device in order to properly unbind.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs                   |  7 ++++
 drivers/gpu/nova-core/gpu.rs                      |  7 ++++
 drivers/gpu/nova-core/gsp/boot.rs                 | 45 +++++++++++++++++++++++
 drivers/gpu/nova-core/gsp/commands.rs             | 43 ++++++++++++++++++++++
 drivers/gpu/nova-core/gsp/fw.rs                   |  4 ++
 drivers/gpu/nova-core/gsp/fw/commands.rs          | 45 +++++++++++++++++++++++
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs | 11 ++++++
 7 files changed, 162 insertions(+)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 6571a8c59d09..ced1d38d206a 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -114,4 +114,11 @@ fn probe<'bound>(
             }))
         })
     }
+
+    fn unbind<'bound>(
+        dev: &'bound pci::Device<kernel::device::Core>,
+        this: Pin<&'bound Self::Data<'bound>>,
+    ) {
+        this.gpu.unbind(dev)
+    }
 }
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 4c329ab40955..75fe1bdb80fe 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -293,4 +293,11 @@ pub(crate) fn new(
             _: { gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)? },
         })
     }
+
+    pub(crate) fn unbind(&self, pdev: &'bound pci::Device<device::Core>) {
+        let _ = self
+            .gsp
+            .unload(pdev.as_ref(), self.bar, &self.gsp_falcon)
+            .inspect_err(|e| dev_err!(pdev, "failed to unload GSP: {:?}\n", e));
+    }
 }
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index e838d61bef50..c123080148f1 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -1,6 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 use kernel::{
+    bits,
     device,
     dma::Coherent,
     io::poll::read_poll_timeout,
@@ -36,6 +38,7 @@
         Chipset, //
     },
     gsp::{
+        cmdq::Cmdq,
         commands,
         sequencer::{
             GspSequencer,
@@ -251,4 +254,46 @@ pub(crate) fn boot(
 
         Ok(())
     }
+
+    /// Shut down the GSP and wait until it is offline.
+    fn shutdown_gsp(
+        cmdq: &Cmdq,
+        bar: &Bar0,
+        gsp_falcon: &Falcon<Gsp>,
+        mode: commands::PowerStateLevel,
+    ) -> Result<()> {
+        // Command to shut the GSP down.
+        cmdq.send_command(bar, commands::UnloadingGuestDriver::new(mode))?;
+
+        // Wait until GSP signals it is suspended.
+        const LIBOS_INTERRUPT_PROCESSOR_SUSPENDED: u32 = bits::bit_u32(31);
+        read_poll_timeout(
+            || Ok(gsp_falcon.read_mailbox0(bar)),
+            |&mb0| mb0 & LIBOS_INTERRUPT_PROCESSOR_SUSPENDED != 0,
+            Delta::from_millis(10),
+            Delta::from_secs(5),
+        )
+        .map(|_| ())
+    }
+
+    /// Attempts to unload the GSP firmware.
+    ///
+    /// This stops all activity on the GSP.
+    pub(crate) fn unload(
+        &self,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        gsp_falcon: &Falcon<Gsp>,
+    ) -> Result {
+        // Shut down the GSP.
+        Self::shutdown_gsp(
+            &self.cmdq,
+            bar,
+            gsp_falcon,
+            commands::PowerStateLevel::Level0,
+        )
+        .inspect_err(|e| dev_err!(dev, "Unload guest driver failed: {:?}\n", e))?;
+
+        Ok(())
+    }
 }
diff --git a/drivers/gpu/nova-core/gsp/commands.rs b/drivers/gpu/nova-core/gsp/commands.rs
index cc8df448ff39..1ab1f972249f 100644
--- a/drivers/gpu/nova-core/gsp/commands.rs
+++ b/drivers/gpu/nova-core/gsp/commands.rs
@@ -232,3 +232,46 @@ pub(crate) fn gpu_name(&self) -> core::result::Result<&str, GpuNameError> {
             .map_err(GpuNameError::InvalidUtf8)
     }
 }
+
+pub(crate) use fw::commands::PowerStateLevel;
+
+/// The `UnloadingGuestDriver` command, used to shut down the GSP.
+///
+/// Only used within the `gsp` module.
+pub(super) struct UnloadingGuestDriver {
+    level: PowerStateLevel,
+}
+
+impl UnloadingGuestDriver {
+    /// Creates a new `UnloadingGuestDriver` command for the given [`PowerStateLevel`].
+    pub(super) fn new(level: PowerStateLevel) -> Self {
+        Self { level }
+    }
+}
+
+impl CommandToGsp for UnloadingGuestDriver {
+    const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
+    type Command = fw::commands::UnloadingGuestDriver;
+    type Reply = UnloadingGuestDriverReply;
+    type InitError = Infallible;
+
+    fn init(&self) -> impl Init<Self::Command, Self::InitError> {
+        fw::commands::UnloadingGuestDriver::new(self.level)
+    }
+}
+
+/// The reply from the GSP to the [`UnloadingGuestDriver`] command.
+pub(super) struct UnloadingGuestDriverReply;
+
+impl MessageFromGsp for UnloadingGuestDriverReply {
+    const FUNCTION: MsgFunction = MsgFunction::UnloadingGuestDriver;
+    type InitError = Infallible;
+    type Message = ();
+
+    fn read(
+        _msg: &Self::Message,
+        _sbuffer: &mut SBufferIter<array::IntoIter<&[u8], 2>>,
+    ) -> Result<Self, Self::InitError> {
+        Ok(UnloadingGuestDriverReply)
+    }
+}
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 3245793bbe42..33c9f5860771 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -279,6 +279,7 @@ pub(crate) enum MsgFunction {
     Nop = bindings::NV_VGPU_MSG_FUNCTION_NOP,
     SetGuestSystemInfo = bindings::NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO,
     SetRegistry = bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
+    UnloadingGuestDriver = bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER,
 
     // Event codes
     GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
@@ -323,6 +324,9 @@ fn try_from(value: u32) -> Result<MsgFunction> {
                 Ok(MsgFunction::SetGuestSystemInfo)
             }
             bindings::NV_VGPU_MSG_FUNCTION_SET_REGISTRY => Ok(MsgFunction::SetRegistry),
+            bindings::NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER => {
+                Ok(MsgFunction::UnloadingGuestDriver)
+            }
 
             // Event codes
             bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE => Ok(MsgFunction::GspInitDone),
diff --git a/drivers/gpu/nova-core/gsp/fw/commands.rs b/drivers/gpu/nova-core/gsp/fw/commands.rs
index d3ef7ecdd73e..00e54361839a 100644
--- a/drivers/gpu/nova-core/gsp/fw/commands.rs
+++ b/drivers/gpu/nova-core/gsp/fw/commands.rs
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 use kernel::{
     device,
@@ -131,3 +132,47 @@ unsafe impl AsBytes for GspStaticConfigInfo {}
 // SAFETY: This struct only contains integer types for which all bit patterns
 // are valid.
 unsafe impl FromBytes for GspStaticConfigInfo {}
+
+/// Power level requested to the [`UnloadingGuestDriver`] command.
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
+#[repr(u32)]
+#[expect(unused)]
+pub(crate) enum PowerStateLevel {
+    /// Full unload.
+    Level0 = bindings::NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_0,
+    /// S3 (suspend to RAM).
+    Level3 = bindings::NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_3,
+    /// Hibernate (suspend to disk).
+    Level7 = bindings::NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_7,
+}
+
+impl PowerStateLevel {
+    /// Returns `true` if this state represents a power management transition, i.e. some GPU state
+    /// must survive it (as opposed to a full unload).
+    pub(crate) fn is_power_transition(self) -> bool {
+        self != PowerStateLevel::Level0
+    }
+}
+
+/// Payload of the `UnloadingGuestDriver` command and message.
+#[repr(transparent)]
+#[derive(Clone, Copy, Debug, Zeroable)]
+pub(crate) struct UnloadingGuestDriver(bindings::rpc_unloading_guest_driver_v1F_07);
+
+impl UnloadingGuestDriver {
+    pub(crate) fn new(level: PowerStateLevel) -> Self {
+        Self(bindings::rpc_unloading_guest_driver_v1F_07 {
+            bInPMTransition: u8::from(level.is_power_transition()),
+            bGc6Entering: 0,
+            newLevel: level as u32,
+            ..Zeroable::zeroed()
+        })
+    }
+}
+
+// SAFETY: Padding is explicit and will not contain uninitialized data.
+unsafe impl AsBytes for UnloadingGuestDriver {}
+
+// SAFETY: This struct only contains integer types for which all bit patterns
+// are valid.
+unsafe impl FromBytes for UnloadingGuestDriver {}
diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
index 334e8be5fde8..f82ed097b283 100644
--- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
+++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
@@ -30,6 +30,9 @@ fn fmt(&self, fmt: &mut ::core::fmt::Formatter<'_>) -> ::core::fmt::Result {
         fmt.write_str("__IncompleteArrayField")
     }
 }
+pub const NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_0: u32 = 0;
+pub const NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_3: u32 = 3;
+pub const NV2080_CTRL_GPU_SET_POWER_STATE_GPU_LEVEL_7: u32 = 7;
 pub const NV_VGPU_MSG_SIGNATURE_VALID: u32 = 1129337430;
 pub const GSP_FW_HEAP_PARAM_OS_SIZE_LIBOS2: u32 = 0;
 pub const GSP_FW_HEAP_PARAM_OS_SIZE_LIBOS3_BAREMETAL: u32 = 23068672;
@@ -880,6 +883,14 @@ fn default() -> Self {
     }
 }
 #[repr(C)]
+#[derive(Debug, Default, Copy, Clone, MaybeZeroable)]
+pub struct rpc_unloading_guest_driver_v1F_07 {
+    pub bInPMTransition: u8_,
+    pub bGc6Entering: u8_,
+    pub __bindgen_padding_0: [u8; 2usize],
+    pub newLevel: u32_,
+}
+#[repr(C)]
 #[derive(Debug, Default, MaybeZeroable)]
 pub struct rpc_run_cpu_sequencer_v17_00 {
     pub bufferSizeDWord: u32_,

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 4/7] gpu: nova-core: refactor SEC2 booter loading into BooterFirmware::run()
  2026-05-21 13:50 [PATCH v6 0/7] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
                   ` (2 preceding siblings ...)
  2026-05-21 13:50 ` [PATCH v6 3/7] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
@ 2026-05-21 13:50 ` Alexandre Courbot
  2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
  2026-05-21 13:50 ` [PATCH v6 5/7] gpu: nova-core: gsp: shuffle boot code a bit to keep chipset-specific parts close Alexandre Courbot
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Alexandre Courbot @ 2026-05-21 13:50 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

From: John Hubbard <jhubbard@nvidia.com>

Move the SEC2 reset/load/boot sequence into a BooterFirmware::run()
method This is mostly refactoring, with no significant behavior change,
done in preparation for adding an alternative FSP boot path.

Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Co-developed-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/firmware/booter.rs | 31 +++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/gsp/boot.rs        | 30 ++++++++----------------------
 2 files changed, 39 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
index de2a4536b532..e45e5dc8d5d2 100644
--- a/drivers/gpu/nova-core/firmware/booter.rs
+++ b/drivers/gpu/nova-core/firmware/booter.rs
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 //! Support for loading and patching the `Booter` firmware. `Booter` is a Heavy Secured firmware
 //! running on [`Sec2`], that is used on Turing/Ampere to load the GSP firmware into the GSP falcon
@@ -8,6 +9,7 @@
 
 use kernel::{
     device,
+    dma::Coherent,
     prelude::*,
     transmute::FromBytes, //
 };
@@ -396,6 +398,35 @@ pub(crate) fn new(
             ucode: ucode_signed,
         })
     }
+
+    /// Load and run the booter firmware on SEC2.
+    ///
+    /// Resets SEC2, loads this firmware image, then boots with the WPR metadata
+    /// address passed via the SEC2 mailboxes.
+    pub(crate) fn run<T>(
+        &self,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        sec2_falcon: &Falcon<Sec2>,
+        wpr_meta: &Coherent<T>,
+    ) -> Result {
+        sec2_falcon.reset(bar)?;
+        sec2_falcon.load(dev, bar, self)?;
+        let wpr_handle = wpr_meta.dma_handle();
+        let (mbox0, mbox1) = sec2_falcon.boot(
+            bar,
+            Some(wpr_handle as u32),
+            Some((wpr_handle >> 32) as u32),
+        )?;
+        dev_dbg!(dev, "SEC2 MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
+
+        if mbox0 != 0 {
+            dev_err!(dev, "Booter-load failed with error {:#x}\n", mbox0);
+            return Err(ENODEV);
+        }
+
+        Ok(())
+    }
 }
 
 impl FalconDmaLoadable for BooterFirmware {
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index c123080148f1..4f654e10259a 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -174,15 +174,6 @@ pub(crate) fn boot(
             Self::run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, &fb_layout)?;
         }
 
-        let booter_loader = BooterFirmware::new(
-            dev,
-            BooterKind::Loader,
-            chipset,
-            FIRMWARE_VERSION,
-            sec2_falcon,
-            bar,
-        )?;
-
         let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
 
         self.cmdq
@@ -204,20 +195,15 @@ pub(crate) fn boot(
             "Using SEC2 to load and run the booter_load firmware...\n"
         );
 
-        sec2_falcon.reset(bar)?;
-        sec2_falcon.load(dev, bar, &booter_loader)?;
-        let wpr_handle = wpr_meta.dma_handle();
-        let (mbox0, mbox1) = sec2_falcon.boot(
+        BooterFirmware::new(
+            dev,
+            BooterKind::Loader,
+            chipset,
+            FIRMWARE_VERSION,
+            sec2_falcon,
             bar,
-            Some(wpr_handle as u32),
-            Some((wpr_handle >> 32) as u32),
-        )?;
-        dev_dbg!(pdev, "SEC2 MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
-
-        if mbox0 != 0 {
-            dev_err!(pdev, "Booter-load failed with error {:#x}\n", mbox0);
-            return Err(ENODEV);
-        }
+        )?
+        .run(dev, bar, sec2_falcon, &wpr_meta)?;
 
         gsp_falcon.write_os_version(bar, gsp_fw.bootloader.app_version);
 

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 5/7] gpu: nova-core: gsp: shuffle boot code a bit to keep chipset-specific parts close
  2026-05-21 13:50 [PATCH v6 0/7] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
                   ` (3 preceding siblings ...)
  2026-05-21 13:50 ` [PATCH v6 4/7] gpu: nova-core: refactor SEC2 booter loading into BooterFirmware::run() Alexandre Courbot
@ 2026-05-21 13:50 ` Alexandre Courbot
  2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
  2026-05-21 13:50 ` [PATCH v6 6/7] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 17+ messages in thread
From: Alexandre Courbot @ 2026-05-21 13:50 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

Some parts of the GSP boot process are chip-specific actions, whereas
others (like sending the initial post-boot messages) deal directly with
the working GSP.

Reorganize the boot code a bit so the chipset-specific parts are clumped
together, which will make their extraction into a HAL easier.

This has no effect on the GSP boot process.

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp/boot.rs | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 4f654e10259a..11b1cd2db8cb 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -169,18 +169,13 @@ pub(crate) fn boot(
         let fb_layout = FbLayout::new(chipset, bar, &gsp_fw)?;
         dev_dbg!(dev, "{:#x?}\n", fb_layout);
 
+        let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
+
         // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
         if !fb_layout.frts.is_empty() {
             Self::run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, &fb_layout)?;
         }
 
-        let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
-
-        self.cmdq
-            .send_command_no_wait(bar, commands::SetSystemInfo::new(pdev))?;
-        self.cmdq
-            .send_command_no_wait(bar, commands::SetRegistry::new())?;
-
         gsp_falcon.reset(bar)?;
         let libos_handle = self.libos.dma_handle();
         let (mbox0, mbox1) = gsp_falcon.boot(
@@ -217,6 +212,11 @@ pub(crate) fn boot(
 
         dev_dbg!(pdev, "RISC-V active? {}\n", gsp_falcon.is_riscv_active(bar),);
 
+        self.cmdq
+            .send_command_no_wait(bar, commands::SetSystemInfo::new(pdev))?;
+        self.cmdq
+            .send_command_no_wait(bar, commands::SetRegistry::new())?;
+
         // Create and run the GSP sequencer.
         let seq_params = GspSequencerParams {
             bootloader_app_version: gsp_fw.bootloader.app_version,

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 6/7] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
  2026-05-21 13:50 [PATCH v6 0/7] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
                   ` (4 preceding siblings ...)
  2026-05-21 13:50 ` [PATCH v6 5/7] gpu: nova-core: gsp: shuffle boot code a bit to keep chipset-specific parts close Alexandre Courbot
@ 2026-05-21 13:50 ` Alexandre Courbot
  2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
  2026-05-21 13:50 ` [PATCH v6 7/7] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
  2026-05-25 10:06 ` Claude review: gpu: nova-core: run unload sequence " Claude Code Review Bot
  7 siblings, 1 reply; 17+ messages in thread
From: Alexandre Courbot @ 2026-05-21 13:50 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

Booting the GSP is done differently depending on the architecture. Move
the parts that are chipset-specific under a HAL.

This does not change much at the moment, since the differences between
Turing and Ampere are rather benign, but will become critical to
properly support the FSP boot process used by Hopper and Blackwell.

The Hopper/Blackwell support is not merged yet, so their HAL is a stub
for now.

This patch is intended to be a mechanical code extraction with no
behavioral changes.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>
---
 drivers/gpu/nova-core/gsp.rs           |   1 +
 drivers/gpu/nova-core/gsp/boot.rs      | 168 +++------------------------
 drivers/gpu/nova-core/gsp/hal.rs       |  74 ++++++++++++
 drivers/gpu/nova-core/gsp/hal/gh100.rs |  50 ++++++++
 drivers/gpu/nova-core/gsp/hal/tu102.rs | 206 +++++++++++++++++++++++++++++++++
 5 files changed, 345 insertions(+), 154 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index ba5b7f990031..38378f104068 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
 mod boot;
+mod hal;
 
 use kernel::{
     debugfs,
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 11b1cd2db8cb..447c9b083039 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -6,7 +6,6 @@
     device,
     dma::Coherent,
     io::poll::read_poll_timeout,
-    io::Io,
     pci,
     prelude::*,
     time::Delta, //
@@ -21,122 +20,18 @@
     },
     fb::FbLayout,
     firmware::{
-        booter::{
-            BooterFirmware,
-            BooterKind, //
-        },
-        fwsec::{
-            bootloader::FwsecFirmwareWithBl,
-            FwsecCommand,
-            FwsecFirmware, //
-        },
         gsp::GspFirmware,
         FIRMWARE_VERSION, //
     },
-    gpu::{
-        Architecture,
-        Chipset, //
-    },
+    gpu::Chipset,
     gsp::{
         cmdq::Cmdq,
         commands,
-        sequencer::{
-            GspSequencer,
-            GspSequencerParams, //
-        },
         GspFwWprMeta, //
     },
-    regs,
-    vbios::Vbios,
 };
 
 impl super::Gsp {
-    /// Helper function to load and run the FWSEC-FRTS firmware and confirm that it has properly
-    /// created the WPR2 region.
-    fn run_fwsec_frts(
-        dev: &device::Device<device::Bound>,
-        chipset: Chipset,
-        falcon: &Falcon<Gsp>,
-        bar: &Bar0,
-        bios: &Vbios,
-        fb_layout: &FbLayout,
-    ) -> Result<()> {
-        // Check that the WPR2 region does not already exists - if it does, we cannot run
-        // FWSEC-FRTS until the GPU is reset.
-        if bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI).higher_bound() != 0 {
-            dev_err!(
-                dev,
-                "WPR2 region already exists - GPU needs to be reset to proceed\n"
-            );
-            return Err(EBUSY);
-        }
-
-        // FWSEC-FRTS will create the WPR2 region.
-        let fwsec_frts = FwsecFirmware::new(
-            dev,
-            falcon,
-            bar,
-            bios,
-            FwsecCommand::Frts {
-                frts_addr: fb_layout.frts.start,
-                frts_size: fb_layout.frts.len(),
-            },
-        )?;
-
-        if chipset.needs_fwsec_bootloader() {
-            let fwsec_frts_bl = FwsecFirmwareWithBl::new(fwsec_frts, dev, chipset)?;
-            // Load and run the bootloader, which will load FWSEC-FRTS and run it.
-            fwsec_frts_bl.run(dev, falcon, bar)?;
-        } else {
-            // Load and run FWSEC-FRTS directly.
-            fwsec_frts.run(dev, falcon, bar)?;
-        }
-
-        // SCRATCH_E contains the error code for FWSEC-FRTS.
-        let frts_status = bar
-            .read(regs::NV_PBUS_SW_SCRATCH_0E_FRTS_ERR)
-            .frts_err_code();
-        if frts_status != 0 {
-            dev_err!(
-                dev,
-                "FWSEC-FRTS returned with error code {:#x}\n",
-                frts_status
-            );
-
-            return Err(EIO);
-        }
-
-        // Check that the WPR2 region has been created as we requested.
-        let (wpr2_lo, wpr2_hi) = (
-            bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_LO).lower_bound(),
-            bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI).higher_bound(),
-        );
-
-        match (wpr2_lo, wpr2_hi) {
-            (_, 0) => {
-                dev_err!(dev, "WPR2 region not created after running FWSEC-FRTS\n");
-
-                Err(EIO)
-            }
-            (wpr2_lo, _) if wpr2_lo != fb_layout.frts.start => {
-                dev_err!(
-                    dev,
-                    "WPR2 region created at unexpected address {:#x}; expected {:#x}\n",
-                    wpr2_lo,
-                    fb_layout.frts.start,
-                );
-
-                Err(EIO)
-            }
-            (wpr2_lo, wpr2_hi) => {
-                dev_dbg!(dev, "WPR2: {:#x}-{:#x}\n", wpr2_lo, wpr2_hi);
-                dev_dbg!(dev, "GPU instance built\n");
-
-                Ok(())
-            }
-        }
-    }
-
     /// Attempt to boot the GSP.
     ///
     /// This is a GPU-dependent and complex procedure that involves loading firmware files from
@@ -145,24 +40,15 @@ fn run_fwsec_frts(
     ///
     /// Upon return, the GSP is up and running, and its runtime object given as return value.
     pub(crate) fn boot(
-        self: Pin<&mut Self>,
+        mut self: Pin<&mut Self>,
         pdev: &pci::Device<device::Bound>,
         bar: &Bar0,
         chipset: Chipset,
         gsp_falcon: &Falcon<Gsp>,
         sec2_falcon: &Falcon<Sec2>,
     ) -> Result {
-        // The FSP boot process of Hopper+ is not supported for now.
-        if matches!(
-            chipset.arch(),
-            Architecture::Hopper | Architecture::BlackwellGB10x | Architecture::BlackwellGB20x
-        ) {
-            return Err(ENOTSUPP);
-        }
-
         let dev = pdev.as_ref();
-
-        let bios = Vbios::new(dev, bar)?;
+        let hal = super::hal::gsp_hal(chipset);
 
         let gsp_fw = KBox::pin_init(GspFirmware::new(dev, chipset, FIRMWARE_VERSION), GFP_KERNEL)?;
 
@@ -171,38 +57,21 @@ pub(crate) fn boot(
 
         let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
 
-        // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
-        if !fb_layout.frts.is_empty() {
-            Self::run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, &fb_layout)?;
-        }
-
-        gsp_falcon.reset(bar)?;
-        let libos_handle = self.libos.dma_handle();
-        let (mbox0, mbox1) = gsp_falcon.boot(
-            bar,
-            Some(libos_handle as u32),
-            Some((libos_handle >> 32) as u32),
-        )?;
-        dev_dbg!(pdev, "GSP MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
-
-        dev_dbg!(
-            pdev,
-            "Using SEC2 to load and run the booter_load firmware...\n"
-        );
-
-        BooterFirmware::new(
+        // Perform the chipset-specific boot sequence.
+        hal.boot(
+            self.as_mut(),
             dev,
-            BooterKind::Loader,
-            chipset,
-            FIRMWARE_VERSION,
-            sec2_falcon,
             bar,
-        )?
-        .run(dev, bar, sec2_falcon, &wpr_meta)?;
+            chipset,
+            &fb_layout,
+            &wpr_meta,
+            gsp_falcon,
+            sec2_falcon,
+        )?;
 
         gsp_falcon.write_os_version(bar, gsp_fw.bootloader.app_version);
 
-        // Poll for RISC-V to become active before running sequencer
+        // Poll for RISC-V to become active before continuing.
         read_poll_timeout(
             || Ok(gsp_falcon.is_riscv_active(bar)),
             |val: &bool| *val,
@@ -217,16 +86,7 @@ pub(crate) fn boot(
         self.cmdq
             .send_command_no_wait(bar, commands::SetRegistry::new())?;
 
-        // Create and run the GSP sequencer.
-        let seq_params = GspSequencerParams {
-            bootloader_app_version: gsp_fw.bootloader.app_version,
-            libos_dma_handle: libos_handle,
-            gsp_falcon,
-            sec2_falcon,
-            dev: pdev.as_ref().into(),
-            bar,
-        };
-        GspSequencer::run(&self.cmdq, seq_params)?;
+        hal.post_boot(self.as_mut(), dev, bar, &gsp_fw, gsp_falcon, sec2_falcon)?;
 
         // Wait until GSP is fully initialized.
         commands::wait_gsp_init_done(&self.cmdq)?;
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
new file mode 100644
index 000000000000..ae16fd6ba7fb
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+mod gh100;
+mod tu102;
+
+use kernel::prelude::*;
+
+use kernel::{
+    device,
+    dma::Coherent, //
+};
+
+use crate::{
+    driver::Bar0,
+    falcon::{
+        gsp::Gsp as GspEngine,
+        sec2::Sec2,
+        Falcon, //
+    },
+    fb::FbLayout,
+    firmware::gsp::GspFirmware,
+    gpu::{
+        Architecture,
+        Chipset, //
+    },
+    gsp::{
+        Gsp,
+        GspFwWprMeta, //
+    },
+};
+
+/// Trait implemented by GSP HALs.
+pub(super) trait GspHal: Send {
+    /// Performs the GSP boot process, loading and running the required firmwares as needed.
+    #[allow(clippy::too_many_arguments)]
+    fn boot(
+        &self,
+        gsp: Pin<&mut Gsp>,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        chipset: Chipset,
+        fb_layout: &FbLayout,
+        wpr_meta: &Coherent<GspFwWprMeta>,
+        gsp_falcon: &Falcon<GspEngine>,
+        sec2_falcon: &Falcon<Sec2>,
+    ) -> Result;
+
+    /// Performs HAL-specific post-GSP boot tasks.
+    ///
+    /// This method is called by the GSP boot code after the GSP is confirmed to be running, and
+    /// after the initialization commands have been pushed onto its queue.
+    fn post_boot(
+        &self,
+        _gsp: Pin<&mut Gsp>,
+        _dev: &device::Device<device::Bound>,
+        _bar: &Bar0,
+        _gsp_fw: &GspFirmware,
+        _gsp_falcon: &Falcon<GspEngine>,
+        _sec2_falcon: &Falcon<Sec2>,
+    ) -> Result {
+        Ok(())
+    }
+}
+
+/// Returns the GSP HAL to be used for `chipset`.
+pub(super) fn gsp_hal(chipset: Chipset) -> &'static dyn GspHal {
+    match chipset.arch() {
+        Architecture::Turing | Architecture::Ampere | Architecture::Ada => tu102::TU102_HAL,
+        Architecture::Hopper | Architecture::BlackwellGB10x | Architecture::BlackwellGB20x => {
+            gh100::GH100_HAL
+        }
+    }
+}
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
new file mode 100644
index 000000000000..187fb7dbe40a
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+use kernel::prelude::*;
+
+use kernel::{
+    device,
+    dma::Coherent, //
+};
+
+use crate::{
+    driver::Bar0,
+    falcon::{
+        gsp::Gsp as GspEngine,
+        sec2::Sec2,
+        Falcon, //
+    },
+    fb::FbLayout,
+    gpu::Chipset,
+    gsp::{
+        hal::GspHal,
+        Gsp,
+        GspFwWprMeta, //
+    },
+};
+
+struct Gh100;
+
+impl GspHal for Gh100 {
+    /// Boot GSP via FSP Chain of Trust (Hopper/Blackwell+ path).
+    ///
+    /// This path uses FSP to establish a chain of trust and boot GSP-FMC. FSP handles
+    /// the GSP boot internally - no manual GSP reset/boot is needed.
+    fn boot(
+        &self,
+        _gsp: Pin<&mut Gsp>,
+        _dev: &device::Device<device::Bound>,
+        _bar: &Bar0,
+        _chipset: Chipset,
+        _fb_layout: &FbLayout,
+        _wpr_meta: &Coherent<GspFwWprMeta>,
+        _gsp_falcon: &Falcon<GspEngine>,
+        _sec2_falcon: &Falcon<Sec2>,
+    ) -> Result {
+        Err(ENOTSUPP)
+    }
+}
+
+const GH100: Gh100 = Gh100;
+pub(super) const GH100_HAL: &dyn GspHal = &GH100;
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
new file mode 100644
index 000000000000..b7a88f3ecea9
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-2.0
+// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+
+use kernel::prelude::*;
+
+use kernel::{
+    device,
+    dma::Coherent,
+    io::Io, //
+};
+
+use crate::{
+    driver::Bar0,
+    falcon::{
+        gsp::Gsp as GspEngine,
+        sec2::Sec2,
+        Falcon, //
+    },
+    fb::FbLayout,
+    firmware::{
+        booter::{
+            BooterFirmware,
+            BooterKind, //
+        },
+        fwsec::{
+            bootloader::FwsecFirmwareWithBl,
+            FwsecCommand,
+            FwsecFirmware, //
+        },
+        gsp::GspFirmware,
+        FIRMWARE_VERSION, //
+    },
+    gpu::Chipset,
+    gsp::{
+        hal::GspHal,
+        sequencer::{
+            GspSequencer,
+            GspSequencerParams, //
+        },
+        Gsp,
+        GspFwWprMeta, //
+    },
+    regs,
+    vbios::Vbios, //
+};
+
+/// Helper function to load and run the FWSEC-FRTS firmware and confirm that it has properly
+/// created the WPR2 region.
+fn run_fwsec_frts(
+    dev: &device::Device<device::Bound>,
+    chipset: Chipset,
+    falcon: &Falcon<GspEngine>,
+    bar: &Bar0,
+    bios: &Vbios,
+    fb_layout: &FbLayout,
+) -> Result<()> {
+    // Check that the WPR2 region does not already exist - if it does, we cannot run
+    // FWSEC-FRTS until the GPU is reset.
+    if bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI).higher_bound() != 0 {
+        dev_err!(
+            dev,
+            "WPR2 region already exists - GPU needs to be reset to proceed\n"
+        );
+        return Err(EBUSY);
+    }
+
+    // FWSEC-FRTS will create the WPR2 region.
+    let fwsec_frts = FwsecFirmware::new(
+        dev,
+        falcon,
+        bar,
+        bios,
+        FwsecCommand::Frts {
+            frts_addr: fb_layout.frts.start,
+            frts_size: fb_layout.frts.len(),
+        },
+    )?;
+
+    if chipset.needs_fwsec_bootloader() {
+        let fwsec_frts_bl = FwsecFirmwareWithBl::new(fwsec_frts, dev, chipset)?;
+        // Load and run the bootloader, which will load FWSEC-FRTS and run it.
+        fwsec_frts_bl.run(dev, falcon, bar)?;
+    } else {
+        // Load and run FWSEC-FRTS directly.
+        fwsec_frts.run(dev, falcon, bar)?;
+    }
+
+    // SCRATCH_E contains the error code for FWSEC-FRTS.
+    let frts_status = bar
+        .read(regs::NV_PBUS_SW_SCRATCH_0E_FRTS_ERR)
+        .frts_err_code();
+    if frts_status != 0 {
+        dev_err!(
+            dev,
+            "FWSEC-FRTS returned with error code {:#x}\n",
+            frts_status
+        );
+
+        return Err(EIO);
+    }
+
+    // Check that the WPR2 region has been created as we requested.
+    let (wpr2_lo, wpr2_hi) = (
+        bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_LO).lower_bound(),
+        bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI).higher_bound(),
+    );
+
+    match (wpr2_lo, wpr2_hi) {
+        (_, 0) => {
+            dev_err!(dev, "WPR2 region not created after running FWSEC-FRTS\n");
+
+            Err(EIO)
+        }
+        (wpr2_lo, _) if wpr2_lo != fb_layout.frts.start => {
+            dev_err!(
+                dev,
+                "WPR2 region created at unexpected address {:#x}; expected {:#x}\n",
+                wpr2_lo,
+                fb_layout.frts.start,
+            );
+
+            Err(EIO)
+        }
+        (wpr2_lo, wpr2_hi) => {
+            dev_dbg!(dev, "WPR2: {:#x}-{:#x}\n", wpr2_lo, wpr2_hi);
+            dev_dbg!(dev, "GPU instance built\n");
+
+            Ok(())
+        }
+    }
+}
+
+struct Tu102;
+
+impl GspHal for Tu102 {
+    fn boot(
+        &self,
+        gsp: Pin<&mut Gsp>,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        chipset: Chipset,
+        fb_layout: &FbLayout,
+        wpr_meta: &Coherent<GspFwWprMeta>,
+        gsp_falcon: &Falcon<GspEngine>,
+        sec2_falcon: &Falcon<Sec2>,
+    ) -> Result {
+        let bios = Vbios::new(dev, bar)?;
+
+        // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
+        if !fb_layout.frts.is_empty() {
+            run_fwsec_frts(dev, chipset, gsp_falcon, bar, &bios, fb_layout)?;
+        }
+
+        gsp_falcon.reset(bar)?;
+        let libos_handle = gsp.libos.dma_handle();
+        let (mbox0, mbox1) = gsp_falcon.boot(
+            bar,
+            Some(libos_handle as u32),
+            Some((libos_handle >> 32) as u32),
+        )?;
+        dev_dbg!(dev, "GSP MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
+
+        dev_dbg!(
+            dev,
+            "Using SEC2 to load and run the booter_load firmware...\n"
+        );
+
+        BooterFirmware::new(
+            dev,
+            BooterKind::Loader,
+            chipset,
+            FIRMWARE_VERSION,
+            sec2_falcon,
+            bar,
+        )?
+        .run(dev, bar, sec2_falcon, wpr_meta)?;
+
+        Ok(())
+    }
+
+    fn post_boot(
+        &self,
+        gsp: Pin<&mut Gsp>,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        gsp_fw: &GspFirmware,
+        gsp_falcon: &Falcon<GspEngine>,
+        sec2_falcon: &Falcon<Sec2>,
+    ) -> Result {
+        // Create and run the GSP sequencer.
+        let seq_params = GspSequencerParams {
+            bootloader_app_version: gsp_fw.bootloader.app_version,
+            libos_dma_handle: gsp.libos.dma_handle(),
+            gsp_falcon,
+            sec2_falcon,
+            dev: dev.into(),
+            bar,
+        };
+        GspSequencer::run(&gsp.cmdq, seq_params)?;
+
+        Ok(())
+    }
+}
+
+const TU102: Tu102 = Tu102;
+pub(super) const TU102_HAL: &dyn GspHal = &TU102;

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v6 7/7] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
  2026-05-21 13:50 [PATCH v6 0/7] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
                   ` (5 preceding siblings ...)
  2026-05-21 13:50 ` [PATCH v6 6/7] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
@ 2026-05-21 13:50 ` Alexandre Courbot
  2026-05-22  6:59   ` Eliot Courtney
  2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
  2026-05-25 10:06 ` Claude review: gpu: nova-core: run unload sequence " Claude Code Review Bot
  7 siblings, 2 replies; 17+ messages in thread
From: Alexandre Courbot @ 2026-05-21 13:50 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux,
	Alexandre Courbot

When probing the driver, the FWSEC-FRTS firmware creates a WPR2 secure
memory region to store the GSP firmware, and the Booter Loader loads and
starts that firmware into the GSP, making it run in RISC-V mode.

These operations need to be reverted upon unloading, particularly the
WPR2 secure region creation, as its presence prevents the driver from
subsequently probing.

Thus, prepare the Booter Unloader and FWSEC-SB firmwares when booting
the GSP, so they can be executed at unbind time to put the GPU into a
state where it can be probed again.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs          |  18 +++-
 drivers/gpu/nova-core/firmware/booter.rs |   1 -
 drivers/gpu/nova-core/firmware/fwsec.rs  |   1 -
 drivers/gpu/nova-core/gpu.rs             |  35 ++++++--
 drivers/gpu/nova-core/gsp.rs             |   3 +
 drivers/gpu/nova-core/gsp/boot.rs        |  35 ++++++--
 drivers/gpu/nova-core/gsp/hal.rs         |  21 ++++-
 drivers/gpu/nova-core/gsp/hal/gh100.rs   |   7 +-
 drivers/gpu/nova-core/gsp/hal/tu102.rs   | 141 ++++++++++++++++++++++++++++++-
 drivers/gpu/nova-core/regs.rs            |   5 ++
 10 files changed, 242 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index ced1d38d206a..20d38a64dcc7 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
+use core::cell::Cell;
+
 use kernel::{
     auxiliary,
     device::Core,
@@ -21,7 +23,10 @@
     types::ForLt,
 };
 
-use crate::gpu::Gpu;
+use crate::{
+    gpu::Gpu,
+    gsp, //
+};
 
 /// Counter for generating unique auxiliary device IDs.
 static AUXILIARY_ID_COUNTER: Atomic<u32> = Atomic::new(0);
@@ -33,6 +38,10 @@ pub(crate) struct NovaCore<'bound> {
     bar: pci::Bar<'bound, BAR0_SIZE>,
     #[allow(clippy::type_complexity)]
     _reg: Devres<auxiliary::Registration<ForLt!(())>>,
+    /// GSP unload bundle, if any.
+    ///
+    /// Stored into a `Cell` so it can be [taken](Cell::take) without a mutable reference.
+    unload_bundle: Cell<Option<gsp::UnloadBundle>>,
 }
 
 const BAR0_SIZE: usize = SZ_16M;
@@ -94,6 +103,8 @@ fn probe<'bound>(
             // other threads of execution.
             unsafe { pdev.dma_set_mask_and_coherent(DmaMask::new::<GPU_DMA_BITS>())? };
 
+            let mut unload_bundle = None;
+
             Ok(try_pin_init!(NovaCore {
                 bar: pdev.iomap_region_sized::<BAR0_SIZE>(0, c"nova-core/bar0")?,
                 // TODO: Use `&bar` self-referential pin-init syntax once available.
@@ -101,7 +112,7 @@ fn probe<'bound>(
                 // SAFETY: `bar` is initialized before this expression is evaluated
                 // (`try_pin_init!()` initializes fields in declaration order), lives at a pinned
                 // stable address, and is dropped after `gpu` (struct field drop order).
-                gpu <- Gpu::new(pdev, unsafe { &*core::ptr::from_ref(bar) }),
+                gpu <- Gpu::new(pdev, unsafe { &*core::ptr::from_ref(bar) }, &mut unload_bundle),
                 _reg: auxiliary::Registration::new(
                     pdev.as_ref(),
                     c"nova-drm",
@@ -111,6 +122,7 @@ fn probe<'bound>(
                     crate::MODULE_NAME,
                     (),
                 )?,
+                unload_bundle: Cell::new(unload_bundle),
             }))
         })
     }
@@ -119,6 +131,6 @@ fn unbind<'bound>(
         dev: &'bound pci::Device<kernel::device::Core>,
         this: Pin<&'bound Self::Data<'bound>>,
     ) {
-        this.gpu.unbind(dev)
+        this.gpu.unbind(dev, this.unload_bundle.take())
     }
 }
diff --git a/drivers/gpu/nova-core/firmware/booter.rs b/drivers/gpu/nova-core/firmware/booter.rs
index e45e5dc8d5d2..c5e17605e1a3 100644
--- a/drivers/gpu/nova-core/firmware/booter.rs
+++ b/drivers/gpu/nova-core/firmware/booter.rs
@@ -282,7 +282,6 @@ fn new_booter(data: &[u8]) -> Result<Self> {
 #[derive(Copy, Clone, Debug, PartialEq)]
 pub(crate) enum BooterKind {
     Loader,
-    #[expect(unused)]
     Unloader,
 }
 
diff --git a/drivers/gpu/nova-core/firmware/fwsec.rs b/drivers/gpu/nova-core/firmware/fwsec.rs
index 8810cb49db67..4108f28cd338 100644
--- a/drivers/gpu/nova-core/firmware/fwsec.rs
+++ b/drivers/gpu/nova-core/firmware/fwsec.rs
@@ -144,7 +144,6 @@ pub(crate) enum FwsecCommand {
     /// image into it.
     Frts { frts_addr: u64, frts_size: u64 },
     /// Asks [`FwsecFirmware`] to load pre-OS apps on the PMU.
-    #[expect(dead_code)]
     Sb,
 }
 
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 75fe1bdb80fe..5af04901b512 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -18,7 +18,10 @@
         Falcon, //
     },
     fb::SysmemFlush,
-    gsp::Gsp,
+    gsp::{
+        self,
+        Gsp, //
+    },
     regs,
 };
 
@@ -261,10 +264,20 @@ pub(crate) struct Gpu<'bound> {
 }
 
 impl<'bound> Gpu<'bound> {
-    pub(crate) fn new(
+    /// Create a new [`Gpu`] instance.
+    ///
+    /// `pdev` is the PCI device for the GPU, `bar` is its `Bar0` mapping.
+    ///
+    /// `unload_bundle` is an output parameter, where the [GSP unload bundle](gsp::UnloadBundle) is
+    /// to be written. The driver layer will pass the written value back to [`Gpu::unbind`].
+    pub(crate) fn new<'init>(
         pdev: &'bound pci::Device<device::Bound>,
         bar: &'bound Bar0,
-    ) -> impl PinInit<Self, Error> + 'bound {
+        unload_bundle: &'init mut Option<gsp::UnloadBundle>,
+    ) -> impl PinInit<Self, Error> + 'init
+    where
+        'bound: 'init,
+    {
         try_pin_init!(Self {
             spec: Spec::new(pdev.as_ref(), bar).inspect(|spec| {
                 dev_info!(pdev,"NVIDIA ({})\n", spec);
@@ -290,14 +303,24 @@ pub(crate) fn new(
 
             gsp <- Gsp::new(pdev),
 
-            _: { gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)? },
+            _: { *unload_bundle = gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)? },
         })
     }
 
-    pub(crate) fn unbind(&self, pdev: &'bound pci::Device<device::Core>) {
+    pub(crate) fn unbind(
+        &self,
+        pdev: &'bound pci::Device<device::Core>,
+        unload_bundle: Option<gsp::UnloadBundle>,
+    ) {
         let _ = self
             .gsp
-            .unload(pdev.as_ref(), self.bar, &self.gsp_falcon)
+            .unload(
+                pdev.as_ref(),
+                self.bar,
+                &self.gsp_falcon,
+                &self.sec2_falcon,
+                unload_bundle,
+            )
             .inspect_err(|e| dev_err!(pdev, "failed to unload GSP: {:?}\n", e));
     }
 }
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 38378f104068..1885cfa5cb38 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -185,3 +185,6 @@ pub(crate) fn new(pdev: &pci::Device<device::Bound>) -> impl PinInit<Self, Error
         })
     }
 }
+
+/// Opaque bundle required to unload the GSP. Created by [`Gsp::boot`], consumed by [`Gsp::unload`].
+pub(crate) struct UnloadBundle(KBox<dyn hal::UnloadBundle>);
diff --git a/drivers/gpu/nova-core/gsp/boot.rs b/drivers/gpu/nova-core/gsp/boot.rs
index 447c9b083039..2968178d0c6d 100644
--- a/drivers/gpu/nova-core/gsp/boot.rs
+++ b/drivers/gpu/nova-core/gsp/boot.rs
@@ -46,7 +46,7 @@ pub(crate) fn boot(
         chipset: Chipset,
         gsp_falcon: &Falcon<Gsp>,
         sec2_falcon: &Falcon<Sec2>,
-    ) -> Result {
+    ) -> Result<Option<super::UnloadBundle>> {
         let dev = pdev.as_ref();
         let hal = super::hal::gsp_hal(chipset);
 
@@ -57,8 +57,8 @@ pub(crate) fn boot(
 
         let wpr_meta = Coherent::init(dev, GFP_KERNEL, GspFwWprMeta::new(&gsp_fw, &fb_layout))?;
 
-        // Perform the chipset-specific boot sequence.
-        hal.boot(
+        // Perform the chipset-specific boot sequence, and retrieve the unload bundle.
+        let unload_bundle = hal.boot(
             self.as_mut(),
             dev,
             bar,
@@ -98,7 +98,7 @@ pub(crate) fn boot(
             Err(e) => dev_warn!(pdev, "GPU name unavailable: {:?}\n", e),
         }
 
-        Ok(())
+        Ok(unload_bundle.map(super::UnloadBundle))
     }
 
     /// Shut down the GSP and wait until it is offline.
@@ -130,16 +130,35 @@ pub(crate) fn unload(
         dev: &device::Device<device::Bound>,
         bar: &Bar0,
         gsp_falcon: &Falcon<Gsp>,
+        sec2_falcon: &Falcon<Sec2>,
+        unload_bundle: Option<super::UnloadBundle>,
     ) -> Result {
-        // Shut down the GSP.
-        Self::shutdown_gsp(
+        // Shut down the GSP. Keep going even in case of error.
+        let mut res = Self::shutdown_gsp(
             &self.cmdq,
             bar,
             gsp_falcon,
             commands::PowerStateLevel::Level0,
         )
-        .inspect_err(|e| dev_err!(dev, "Unload guest driver failed: {:?}\n", e))?;
+        .inspect_err(|e| dev_err!(dev, "GSP shutdown failed: {:?}\n", e));
 
-        Ok(())
+        // Run the unload bundle to reset the GSP so it can be booted again.
+        if let Some(unload_bundle) = unload_bundle {
+            res = res.and(
+                unload_bundle
+                    .0
+                    .run(dev, bar, gsp_falcon, sec2_falcon)
+                    .inspect_err(|e| dev_err!(dev, "Unload bundle failed: {:?}\n", e)),
+            );
+        } else {
+            dev_warn!(
+                dev,
+                "Unload bundle is missing, GSP won't be properly reset.\n"
+            );
+
+            res = Err(EAGAIN);
+        }
+
+        res.inspect(|()| dev_info!(dev, "GSP successfully unloaded\n"))
     }
 }
diff --git a/drivers/gpu/nova-core/gsp/hal.rs b/drivers/gpu/nova-core/gsp/hal.rs
index ae16fd6ba7fb..fe591c124a94 100644
--- a/drivers/gpu/nova-core/gsp/hal.rs
+++ b/drivers/gpu/nova-core/gsp/hal.rs
@@ -30,9 +30,28 @@
     },
 };
 
+/// Trait for types containing the resources and code required to fully reset the GSP.
+///
+/// The GSP unload code might run in a situation where we cannot load firmware dynamically (e.g.
+/// because we are in shutdown and the file system is not accessible anymore). Thus, the firmware
+/// required for unloading is prepared at load time, and stored here until it needs to be run.
+pub(super) trait UnloadBundle: Send {
+    /// Performs the steps required to properly reset the GSP after it has been stopped.
+    fn run(
+        &self,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        gsp_falcon: &Falcon<GspEngine>,
+        sec2_falcon: &Falcon<Sec2>,
+    ) -> Result;
+}
+
 /// Trait implemented by GSP HALs.
 pub(super) trait GspHal: Send {
     /// Performs the GSP boot process, loading and running the required firmwares as needed.
+    ///
+    /// Upon success, returns the [`UnloadBundle`] to be run (if any) in order to properly reset the
+    /// GSP after it has been stopped.
     #[allow(clippy::too_many_arguments)]
     fn boot(
         &self,
@@ -44,7 +63,7 @@ fn boot(
         wpr_meta: &Coherent<GspFwWprMeta>,
         gsp_falcon: &Falcon<GspEngine>,
         sec2_falcon: &Falcon<Sec2>,
-    ) -> Result;
+    ) -> Result<Option<KBox<dyn UnloadBundle>>>;
 
     /// Performs HAL-specific post-GSP boot tasks.
     ///
diff --git a/drivers/gpu/nova-core/gsp/hal/gh100.rs b/drivers/gpu/nova-core/gsp/hal/gh100.rs
index 187fb7dbe40a..46ffc51dc385 100644
--- a/drivers/gpu/nova-core/gsp/hal/gh100.rs
+++ b/drivers/gpu/nova-core/gsp/hal/gh100.rs
@@ -18,7 +18,10 @@
     fb::FbLayout,
     gpu::Chipset,
     gsp::{
-        hal::GspHal,
+        hal::{
+            GspHal,
+            UnloadBundle, //
+        },
         Gsp,
         GspFwWprMeta, //
     },
@@ -41,7 +44,7 @@ fn boot(
         _wpr_meta: &Coherent<GspFwWprMeta>,
         _gsp_falcon: &Falcon<GspEngine>,
         _sec2_falcon: &Falcon<Sec2>,
-    ) -> Result {
+    ) -> Result<Option<KBox<dyn UnloadBundle>>> {
         Err(ENOTSUPP)
     }
 }
diff --git a/drivers/gpu/nova-core/gsp/hal/tu102.rs b/drivers/gpu/nova-core/gsp/hal/tu102.rs
index b7a88f3ecea9..fe6fcb84b03d 100644
--- a/drivers/gpu/nova-core/gsp/hal/tu102.rs
+++ b/drivers/gpu/nova-core/gsp/hal/tu102.rs
@@ -32,7 +32,10 @@
     },
     gpu::Chipset,
     gsp::{
-        hal::GspHal,
+        hal::{
+            GspHal,
+            UnloadBundle, //
+        },
         sequencer::{
             GspSequencer,
             GspSequencerParams, //
@@ -44,6 +47,124 @@
     vbios::Vbios, //
 };
 
+// A ready-to-run FWSEC unload firmware.
+//
+// Since there are two variants of the prepared firmware (with and without a bootloader), this type
+// abstracts the difference.
+enum FwsecUnloadFirmware {
+    WithoutBl(FwsecFirmware),
+    WithBl(FwsecFirmwareWithBl),
+}
+
+impl FwsecUnloadFirmware {
+    /// Loads the FWSEC SB firmware, as well as its bootloader if `chipset` requires it.
+    fn new(
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        chipset: Chipset,
+        bios: &Vbios,
+        gsp_falcon: &Falcon<GspEngine>,
+    ) -> Result<Self> {
+        let fwsec_sb = FwsecFirmware::new(dev, gsp_falcon, bar, bios, FwsecCommand::Sb)?;
+
+        Ok(if chipset.needs_fwsec_bootloader() {
+            Self::WithBl(FwsecFirmwareWithBl::new(fwsec_sb, dev, chipset)?)
+        } else {
+            Self::WithoutBl(fwsec_sb)
+        })
+    }
+
+    /// Runs the FWSEC SB firmware.
+    fn run(
+        &self,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        gsp_falcon: &Falcon<GspEngine>,
+    ) -> Result<()> {
+        match self {
+            Self::WithoutBl(fw) => fw.run(dev, gsp_falcon, bar),
+            Self::WithBl(fw) => fw.run(dev, gsp_falcon, bar),
+        }
+    }
+}
+
+// Contains the firmware required to fully reset GSP on chipsets where the GSP is started using
+// FWSEC/Booter.
+struct Sec2UnloadBundle {
+    fwsec_sb: FwsecUnloadFirmware,
+    booter_unloader: BooterFirmware,
+}
+
+impl Sec2UnloadBundle {
+    /// Load and prepare the resources required to properly reset the GSP after it has been stopped.
+    fn build(
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        chipset: Chipset,
+        bios: &Vbios,
+        gsp_falcon: &Falcon<GspEngine>,
+        sec2_falcon: &Falcon<Sec2>,
+    ) -> Result<KBox<dyn UnloadBundle>> {
+        KBox::new(
+            Self {
+                fwsec_sb: FwsecUnloadFirmware::new(dev, bar, chipset, bios, gsp_falcon)?,
+                booter_unloader: BooterFirmware::new(
+                    dev,
+                    BooterKind::Unloader,
+                    chipset,
+                    FIRMWARE_VERSION,
+                    sec2_falcon,
+                    bar,
+                )?,
+            },
+            GFP_KERNEL,
+        )
+        .map(|b| b as KBox<dyn UnloadBundle>)
+        .map_err(Into::into)
+    }
+}
+
+impl UnloadBundle for Sec2UnloadBundle {
+    fn run(
+        &self,
+        dev: &device::Device<device::Bound>,
+        bar: &Bar0,
+        gsp_falcon: &Falcon<GspEngine>,
+        sec2_falcon: &Falcon<Sec2>,
+    ) -> Result<()> {
+        // Run FWSEC-SB to reset the GSP falcon to its pre-libos state.
+        self.fwsec_sb.run(dev, bar, gsp_falcon)?;
+
+        // Remove WPR2 region if set.
+        let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
+        if wpr2_hi.is_wpr2_set() {
+            sec2_falcon.reset(bar)?;
+            sec2_falcon.load(dev, bar, &self.booter_unloader)?;
+
+            // Sentinel value to confirm that Booter Unloader has run.
+            const MAILBOX_SENTINEL: u32 = 0xff;
+            let (mbox0, _) =
+                sec2_falcon.boot(bar, Some(MAILBOX_SENTINEL), Some(MAILBOX_SENTINEL))?;
+            if mbox0 != 0 {
+                dev_err!(dev, "Booter Unloader returned error 0x{:x}\n", mbox0);
+                return Err(EINVAL);
+            }
+
+            // Confirm that the WPR2 region has been removed.
+            let wpr2_hi = bar.read(regs::NV_PFB_PRI_MMU_WPR2_ADDR_HI);
+            if wpr2_hi.is_wpr2_set() {
+                dev_err!(
+                    dev,
+                    "WPR2 region still set after Booter Unloader returned\n"
+                );
+                return Err(EBUSY);
+            }
+        }
+
+        Ok(())
+    }
+}
+
 /// Helper function to load and run the FWSEC-FRTS firmware and confirm that it has properly
 /// created the WPR2 region.
 fn run_fwsec_frts(
@@ -143,7 +264,7 @@ fn boot(
         wpr_meta: &Coherent<GspFwWprMeta>,
         gsp_falcon: &Falcon<GspEngine>,
         sec2_falcon: &Falcon<Sec2>,
-    ) -> Result {
+    ) -> Result<Option<KBox<dyn UnloadBundle>>> {
         let bios = Vbios::new(dev, bar)?;
 
         // FWSEC-FRTS is not executed on chips where the FRTS region size is 0 (e.g. GA100).
@@ -175,7 +296,21 @@ fn boot(
         )?
         .run(dev, bar, sec2_falcon, wpr_meta)?;
 
-        Ok(())
+        // Last, try and prepare the unload bundle. If this fails, the GPU will need to be reset
+        // before the driver can be probed again.
+        let unload_bundle =
+            Sec2UnloadBundle::build(dev, bar, chipset, &bios, gsp_falcon, sec2_falcon)
+                .inspect_err(|e| {
+                    dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
+                    dev_warn!(dev, "The GSP won't be able to unload properly on unbind.\n");
+                    dev_warn!(
+                        dev,
+                        "The GPU will need to be reset before the driver can bind again.\n"
+                    );
+                })
+                .ok();
+
+        Ok(unload_bundle)
     }
 
     fn post_boot(
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 6faeed73901d..356fbf364ea5 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -175,6 +175,11 @@ impl NV_PFB_PRI_MMU_WPR2_ADDR_HI {
     pub(crate) fn higher_bound(self) -> u64 {
         u64::from(self.hi_val()) << 12
     }
+
+    /// Returns whether the WPR2 region is currently set.
+    pub(crate) fn is_wpr2_set(self) -> bool {
+        self.hi_val() != 0
+    }
 }
 
 // PGSP

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v6 7/7] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
  2026-05-21 13:50 ` [PATCH v6 7/7] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
@ 2026-05-22  6:59   ` Eliot Courtney
  2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
  1 sibling, 0 replies; 17+ messages in thread
From: Eliot Courtney @ 2026-05-22  6:59 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter
  Cc: John Hubbard, Alistair Popple, Timur Tabi, Eliot Courtney,
	nova-gpu, dri-devel, linux-kernel, rust-for-linux, dri-devel

On Thu May 21, 2026 at 10:50 PM JST, Alexandre Courbot wrote:
> When probing the driver, the FWSEC-FRTS firmware creates a WPR2 secure
> memory region to store the GSP firmware, and the Booter Loader loads and
> starts that firmware into the GSP, making it run in RISC-V mode.
>
> These operations need to be reverted upon unloading, particularly the
> WPR2 secure region creation, as its presence prevents the driver from
> subsequently probing.
>
> Thus, prepare the Booter Unloader and FWSEC-SB firmwares when booting
> the GSP, so they can be executed at unbind time to put the GPU into a
> state where it can be probed again.
>
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---

After seeing the bundle moved outwards, I realised that it has the same
issue that SysmemFlush does, i.e. if probe fails it does not reset the
GSP. A lot of the time during development I will break things badly
enough that probe fails, so it would be nice if this is supported. OTOH,
this gets the probe suceed and unload case working which is important
and this is a definite improvement, so for this version and the previous
version of the patch:

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>

I also had a brief go at making this work on Drop, here is the diff on
top of this series. I can send this as a follow up if you would like
after cleaning it up, or lmk wdyt:

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 20d38a64dcc7..6571a8c59d09 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -1,7 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0
 
-use core::cell::Cell;
-
 use kernel::{
     auxiliary,
     device::Core,
@@ -23,10 +21,7 @@
     types::ForLt,
 };
 
-use crate::{
-    gpu::Gpu,
-    gsp, //
-};
+use crate::gpu::Gpu;
 
 /// Counter for generating unique auxiliary device IDs.
 static AUXILIARY_ID_COUNTER: Atomic<u32> = Atomic::new(0);
@@ -38,10 +33,6 @@ pub(crate) struct NovaCore<'bound> {
     bar: pci::Bar<'bound, BAR0_SIZE>,
     #[allow(clippy::type_complexity)]
     _reg: Devres<auxiliary::Registration<ForLt!(())>>,
-    /// GSP unload bundle, if any.
-    ///
-    /// Stored into a `Cell` so it can be [taken](Cell::take) without a mutable reference.
-    unload_bundle: Cell<Option<gsp::UnloadBundle>>,
 }
 
 const BAR0_SIZE: usize = SZ_16M;
@@ -103,8 +94,6 @@ fn probe<'bound>(
             // other threads of execution.
             unsafe { pdev.dma_set_mask_and_coherent(DmaMask::new::<GPU_DMA_BITS>())? };
 
-            let mut unload_bundle = None;
-
             Ok(try_pin_init!(NovaCore {
                 bar: pdev.iomap_region_sized::<BAR0_SIZE>(0, c"nova-core/bar0")?,
                 // TODO: Use `&bar` self-referential pin-init syntax once available.
@@ -112,7 +101,7 @@ fn probe<'bound>(
                 // SAFETY: `bar` is initialized before this expression is evaluated
                 // (`try_pin_init!()` initializes fields in declaration order), lives at a pinned
                 // stable address, and is dropped after `gpu` (struct field drop order).
-                gpu <- Gpu::new(pdev, unsafe { &*core::ptr::from_ref(bar) }, &mut unload_bundle),
+                gpu <- Gpu::new(pdev, unsafe { &*core::ptr::from_ref(bar) }),
                 _reg: auxiliary::Registration::new(
                     pdev.as_ref(),
                     c"nova-drm",
@@ -122,15 +111,7 @@ fn probe<'bound>(
                     crate::MODULE_NAME,
                     (),
                 )?,
-                unload_bundle: Cell::new(unload_bundle),
             }))
         })
     }
-
-    fn unbind<'bound>(
-        dev: &'bound pci::Device<kernel::device::Core>,
-        this: Pin<&'bound Self::Data<'bound>>,
-    ) {
-        this.gpu.unbind(dev, this.unload_bundle.take())
-    }
 }
diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 5af04901b512..605eaba5d90a 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -246,8 +246,10 @@ fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
 }
 
 /// Structure holding the resources required to operate the GPU.
-#[pin_data]
+#[pin_data(PinnedDrop)]
 pub(crate) struct Gpu<'bound> {
+    /// Device owning the GPU.
+    device: &'bound device::Device<device::Bound>,
     spec: Spec,
     /// MMIO mapping of PCI BAR 0.
     bar: &'bound Bar0,
@@ -261,24 +263,23 @@ pub(crate) struct Gpu<'bound> {
     /// GSP runtime data. Temporarily an empty placeholder.
     #[pin]
     gsp: Gsp,
+    /// GSP unload firmware bundle, if any.
+    unload_bundle: Option<gsp::UnloadBundle>,
 }
 
 impl<'bound> Gpu<'bound> {
     /// Create a new [`Gpu`] instance.
     ///
     /// `pdev` is the PCI device for the GPU, `bar` is its `Bar0` mapping.
-    ///
-    /// `unload_bundle` is an output parameter, where the [GSP unload bundle](gsp::UnloadBundle) is
-    /// to be written. The driver layer will pass the written value back to [`Gpu::unbind`].
     pub(crate) fn new<'init>(
         pdev: &'bound pci::Device<device::Bound>,
         bar: &'bound Bar0,
-        unload_bundle: &'init mut Option<gsp::UnloadBundle>,
     ) -> impl PinInit<Self, Error> + 'init
     where
         'bound: 'init,
     {
         try_pin_init!(Self {
+            device: pdev.as_ref(),
             spec: Spec::new(pdev.as_ref(), bar).inspect(|spec| {
                 dev_info!(pdev,"NVIDIA ({})\n", spec);
             })?,
@@ -303,24 +304,24 @@ pub(crate) fn new<'init>(
 
             gsp <- Gsp::new(pdev),
 
-            _: { *unload_bundle = gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)? },
+            unload_bundle: gsp.boot(pdev, bar, spec.chipset, gsp_falcon, sec2_falcon)?,
         })
     }
+}
 
-    pub(crate) fn unbind(
-        &self,
-        pdev: &'bound pci::Device<device::Core>,
-        unload_bundle: Option<gsp::UnloadBundle>,
-    ) {
-        let _ = self
+#[pinned_drop]
+impl PinnedDrop for Gpu<'_> {
+    fn drop(self: Pin<&mut Self>) {
+        let this = self.project();
+        let device = *this.device;
+        let bar = *this.bar;
+        let bundle = this.unload_bundle.take();
+
+        let _ = this
             .gsp
-            .unload(
-                pdev.as_ref(),
-                self.bar,
-                &self.gsp_falcon,
-                &self.sec2_falcon,
-                unload_bundle,
-            )
-            .inspect_err(|e| dev_err!(pdev, "failed to unload GSP: {:?}\n", e));
+            .as_ref()
+            .get_ref()
+            .unload(device, bar, &*this.gsp_falcon, &*this.sec2_falcon, bundle)
+            .inspect_err(|e| dev_err!(device, "failed to unload GSP: {:?}\n", e));
     }
 }


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Claude review: gpu: nova-core: run unload sequence upon unbinding
  2026-05-21 13:50 [PATCH v6 0/7] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
                   ` (6 preceding siblings ...)
  2026-05-21 13:50 ` [PATCH v6 7/7] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
@ 2026-05-25 10:06 ` Claude Code Review Bot
  7 siblings, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 10:06 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: gpu: nova-core: run unload sequence upon unbinding
Author: Alexandre Courbot <acourbot@nvidia.com>
Patches: 9
Reviewed: 2026-05-25T20:06:41.048243

---

This is a well-structured 7-patch series that implements proper GSP firmware unloading for the nova-core driver. The series progresses logically: first cleanup patches (1-2), then adding the GSP shutdown command (3), refactoring boot code for reuse (4-5), introducing a HAL abstraction for chipset-specific boot paths (6), and finally the full unload sequence (7).

The architecture is sound: firmware needed for unloading is pre-loaded at boot time (since the filesystem may not be available at unbind), stored as an opaque "unload bundle," and consumed at unbind. The HAL trait cleanly separates Turing/Ampere/Ada (SEC2-based) from Hopper/Blackwell (FSP-based, stubbed out). Error handling during unload continues through partial failures rather than short-circuiting, which is the right approach for teardown paths.

**Key concern**: Patch 7 introduces `Cell<Option<gsp::UnloadBundle>>` in `NovaCore`. `Cell` is `!Sync`, which means `NovaCore` becomes `!Sync`. This is safe for a pinned driver data structure only if the framework doesn't require `Sync` for `pci::Driver::Data`. The `unbind()` callback receives `Pin<&Self::Data>` (shared ref), so the `Cell::take()` is the mechanism to extract the value — but `Cell` on a shared reference is only sound in single-threaded contexts. If the kernel guarantees that `probe` and `unbind` are never called concurrently on the same device, this is fine, but it deserves a safety comment.

**No blocking issues found.** The series is clean, well-documented, and follows good kernel driver patterns. A few minor observations follow per-patch.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Claude review: gpu: nova-core: remove unneeded get_gsp_info proxy function
  2026-05-21 13:50 ` [PATCH v6 1/7] gpu: nova-core: remove unneeded get_gsp_info proxy function Alexandre Courbot
@ 2026-05-25 10:06   ` Claude Code Review Bot
  0 siblings, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 10:06 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

Clean removal of a trivial wrapper function. Making `GetGspStaticInfo` `pub(crate)` is the right visibility for direct use from `boot.rs`. The call site change from `commands::get_gsp_info(&self.cmdq, bar)` to `self.cmdq.send_command(bar, commands::GetGspStaticInfo)` is a direct equivalent.

No issues.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Claude review: gpu: nova-core: do not import firmware commands into GSP command module
  2026-05-21 13:50 ` [PATCH v6 2/7] gpu: nova-core: do not import firmware commands into GSP command module Alexandre Courbot
@ 2026-05-25 10:06   ` Claude Code Review Bot
  0 siblings, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 10:06 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

Good namespace hygiene. Replacing `use fw::commands::*` with `use fw::self` and then qualifying all types as `fw::commands::GspSetSystemInfo` etc. makes the code clearer about which layer each type belongs to.

The use of `Self::Command` in `init()` methods (e.g., `Self::Command::init(self.pdev)` instead of `GspSetSystemInfo::init(self.pdev)`) is a nice pattern that ties the init call to the associated type.

No issues.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Claude review: gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
  2026-05-21 13:50 ` [PATCH v6 3/7] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
@ 2026-05-25 10:06   ` Claude Code Review Bot
  0 siblings, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 10:06 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

This is the core functional patch. Several observations:

**`shutdown_gsp` implementation**: The polling approach with `read_poll_timeout` for `LIBOS_INTERRUPT_PROCESSOR_SUSPENDED` (bit 31 of mailbox0) with a 5-second timeout and 10ms poll interval is reasonable.

**`PowerStateLevel::is_power_transition`**: Clean design — only `Level0` is a full unload, others preserve state:
```rust
pub(crate) fn is_power_transition(self) -> bool {
    self != PowerStateLevel::Level0
}
```

**`UnloadingGuestDriverReply`**: The reply type uses `type Message = ()`, meaning the GSP sends no payload in its reply. This is consistent with OpenRM's behavior for this RPC.

**`unbind` in `driver.rs`**: The `unbind` callback receives `dev: &'bound pci::Device<kernel::device::Core>` — note it's `device::Core`, not `device::Bound`. This is intentional since at unbind time the device is being unbound. The `gpu.rs` `unbind` method correctly calls `pdev.as_ref()` to get a `device::Device<device::Bound>` for the `unload` call — wait, actually `pci::Device<Core>` would yield `device::Device<Core>` via `as_ref()`. Let me check:

```rust
pub(crate) fn unbind(&self, pdev: &'bound pci::Device<device::Core>) {
    let _ = self
        .gsp
        .unload(pdev.as_ref(), self.bar, &self.gsp_falcon)
```

And `unload` takes `dev: &device::Device<device::Bound>`. This would be a type mismatch if `pdev.as_ref()` returns `&device::Device<Core>` rather than `&device::Device<Bound>`. However, the Device HRT framework may provide the right conversions here. This could be a compile-time issue but since the author presumably tested this, the types likely work out.

**Minor**: The `#[expect(unused)]` on the `PowerStateLevel` enum is good practice since `Level3` and `Level7` aren't used yet.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Claude review: gpu: nova-core: refactor SEC2 booter loading into BooterFirmware::run()
  2026-05-21 13:50 ` [PATCH v6 4/7] gpu: nova-core: refactor SEC2 booter loading into BooterFirmware::run() Alexandre Courbot
@ 2026-05-25 10:06   ` Claude Code Review Bot
  0 siblings, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 10:06 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

Good refactoring. The `run()` method encapsulates the SEC2 reset → load → boot → check-mbox0 sequence that will be needed by both Booter Loader and Booter Unloader.

One observation on the `run()` method:

```rust
pub(crate) fn run<T>(
    &self,
    dev: &device::Device<device::Bound>,
    bar: &Bar0,
    sec2_falcon: &Falcon<Sec2>,
    wpr_meta: &Coherent<T>,
) -> Result {
```

The generic `<T>` for `wpr_meta` is used to get the DMA handle. This works for the Booter Loader case. However, patch 7 later uses the Booter Unloader with a different boot sequence (passing sentinel values instead of WPR metadata). Patch 7 doesn't use `BooterFirmware::run()` for the unloader — it does the SEC2 load/boot manually with sentinel mailbox values. So `run()` is only used for the loader path, which is fine, but the generic `<T>` is somewhat unnecessary since it's always called with `Coherent<GspFwWprMeta>`. Not a real issue though.

**Subtle change**: The log message device changed from `pdev` (PCI device) to `dev` (generic device):
```rust
-        dev_dbg!(pdev, "SEC2 MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
+        dev_dbg!(dev, "SEC2 MBOX0: {:#x}, MBOX1: {:#x}\n", mbox0, mbox1);
```
This is fine — both reference the same underlying device.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Claude review: gpu: nova-core: gsp: shuffle boot code a bit to keep chipset-specific parts close
  2026-05-21 13:50 ` [PATCH v6 5/7] gpu: nova-core: gsp: shuffle boot code a bit to keep chipset-specific parts close Alexandre Courbot
@ 2026-05-25 10:06   ` Claude Code Review Bot
  0 siblings, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 10:06 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

This moves `SetSystemInfo` and `SetRegistry` commands from before the GSP falcon boot to after RISC-V is active. The commit message says "no effect on the GSP boot process" which is correct: these are queued commands that the GSP processes after it's up.

Also moves `wpr_meta` allocation before the FWSEC-FRTS check, which makes the chipset-specific block (FWSEC-FRTS → GSP boot → Booter Loader) contiguous. This is a clean preparation for patch 6's HAL extraction.

No issues.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Claude review: gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
  2026-05-21 13:50 ` [PATCH v6 6/7] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
@ 2026-05-25 10:06   ` Claude Code Review Bot
  0 siblings, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 10:06 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

The HAL design is clean:

```rust
pub(super) trait GspHal: Send {
    fn boot(&self, ...) -> Result;
    fn post_boot(&self, ...) -> Result { Ok(()) }  // default impl
}
```

The `post_boot` default implementation returning `Ok(())` is good — Hopper/Blackwell won't need the GSP sequencer.

**`gsp_hal()` dispatch**:
```rust
pub(super) fn gsp_hal(chipset: Chipset) -> &'static dyn GspHal {
    match chipset.arch() {
        Architecture::Turing | Architecture::Ampere | Architecture::Ada => tu102::TU102_HAL,
        Architecture::Hopper | Architecture::BlackwellGB10x | Architecture::BlackwellGB20x => {
            gh100::GH100_HAL
        }
    }
}
```

Using `&'static dyn GspHal` with const instances is a nice pattern that avoids runtime allocation while still being extensible.

**GH100 stub**: Returns `Err(ENOTSUPP)` which maintains the previous behavior for unsupported architectures. Good.

**`boot()` signature**: `#[allow(clippy::too_many_arguments)]` — 8 arguments is a lot, but given the hardware initialization context, all parameters are genuinely needed. Could be packed into a struct but that would be churn with no real benefit.

**Pin<&mut Gsp> change**: `self: Pin<&mut Self>` → `mut self: Pin<&mut Self>` — needed because `self.as_mut()` is called to reborrow the pinned reference. Correct.

No issues.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Claude review: gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
  2026-05-21 13:50 ` [PATCH v6 7/7] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
  2026-05-22  6:59   ` Eliot Courtney
@ 2026-05-25 10:06   ` Claude Code Review Bot
  1 sibling, 0 replies; 17+ messages in thread
From: Claude Code Review Bot @ 2026-05-25 10:06 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

This is the largest and most important patch. Several points:

**`Cell<Option<gsp::UnloadBundle>>` in `NovaCore`**:
```rust
unload_bundle: Cell<Option<gsp::UnloadBundle>>,
```
As noted in the overall review, `Cell` is `!Sync`. The `Cell::take()` in `unbind()` is used because `unbind` only gets a shared reference (`Pin<&Self::Data>`). This is a pragmatic solution — the unload bundle is consumed exactly once. However, if two unbind calls could race (unlikely but architecturally possible), `Cell::take()` would silently lose the bundle on one path. A `// SAFETY:` or `// INVARIANT:` comment explaining why this is sound would strengthen the patch.

**`UnloadBundle` as opaque wrapper**:
```rust
pub(crate) struct UnloadBundle(KBox<dyn hal::UnloadBundle>);
```

This nicely hides the HAL-specific implementation behind a trait object. The `KBox` ensures heap allocation and dynamic dispatch.

**`Sec2UnloadBundle::build()` error handling**:
```rust
let unload_bundle = Sec2UnloadBundle::build(...)
    .inspect_err(|e| {
        dev_warn!(dev, "Failed to prepare unload firmware: {:?}\n", e);
        ...
    })
    .ok();
```

Converting the error to `None` and continuing boot is the right call — failing to prepare unload firmware shouldn't prevent the driver from loading. The warnings are clear about the consequence.

**`Sec2UnloadBundle::run()` sequence**:
1. Run FWSEC-SB to reset GSP falcon
2. Check WPR2 region, run Booter Unloader if set
3. Verify WPR2 was removed

The Booter Unloader uses sentinel values `0xff` in both mailboxes (unlike Booter Loader which passes WPR metadata addresses). This aligns with OpenRM behavior per the cover letter.

**`unload()` error continuation**:
```rust
let mut res = Self::shutdown_gsp(...)
    .inspect_err(|e| dev_err!(dev, "GSP shutdown failed: {:?}\n", e));

if let Some(unload_bundle) = unload_bundle {
    res = res.and(
        unload_bundle.0.run(...)
            .inspect_err(|e| dev_err!(dev, "Unload bundle failed: {:?}\n", e)),
    );
}
```

Using `res.and()` means: if GSP shutdown failed, the unload bundle still runs, but the overall result preserves the first error. This is excellent teardown logic — continue cleanup despite partial failures, but don't mask errors.

**Missing `Reviewed-by` on this patch**: Unlike patches 1-6, patch 7 has no `Reviewed-by` tag. This is the most critical patch in the series, so it would benefit from additional review.

**`is_wpr2_set()` helper in `regs.rs`**:
```rust
pub(crate) fn is_wpr2_set(self) -> bool {
    self.hi_val() != 0
}
```
This duplicates the logic already in `higher_bound() != 0` checks elsewhere, but having a named predicate is clearer for the unload path.

**`FwsecUnloadFirmware` enum**: Clean abstraction over the with/without-bootloader variants. The `run()` method delegates correctly to the underlying firmware's `run()` method.

**One potential issue**: In `Sec2UnloadBundle::run()`, `self.fwsec_sb.run(dev, bar, gsp_falcon)` is called, but looking at the method signature:
```rust
fn run(&self, dev: ..., bar: &Bar0, gsp_falcon: &Falcon<GspEngine>) -> Result<()>
```
This passes `(dev, bar, gsp_falcon)`, and internally dispatches to `fw.run(dev, gsp_falcon, bar)` — the parameter reordering is intentional and correct, matching the underlying `FwsecFirmware::run(&self, dev, falcon, bar)` signature.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2026-05-25 10:06 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-21 13:50 [PATCH v6 0/7] gpu: nova-core: run unload sequence upon unbinding Alexandre Courbot
2026-05-21 13:50 ` [PATCH v6 1/7] gpu: nova-core: remove unneeded get_gsp_info proxy function Alexandre Courbot
2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
2026-05-21 13:50 ` [PATCH v6 2/7] gpu: nova-core: do not import firmware commands into GSP command module Alexandre Courbot
2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
2026-05-21 13:50 ` [PATCH v6 3/7] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
2026-05-21 13:50 ` [PATCH v6 4/7] gpu: nova-core: refactor SEC2 booter loading into BooterFirmware::run() Alexandre Courbot
2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
2026-05-21 13:50 ` [PATCH v6 5/7] gpu: nova-core: gsp: shuffle boot code a bit to keep chipset-specific parts close Alexandre Courbot
2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
2026-05-21 13:50 ` [PATCH v6 6/7] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
2026-05-21 13:50 ` [PATCH v6 7/7] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
2026-05-22  6:59   ` Eliot Courtney
2026-05-25 10:06   ` Claude review: " Claude Code Review Bot
2026-05-25 10:06 ` Claude review: gpu: nova-core: run unload sequence " Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox