public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
From: Alexandre Courbot <acourbot@nvidia.com>
To: Danilo Krummrich <dakr@kernel.org>,
	Alice Ryhl <aliceryhl@google.com>,
	David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>
Cc: John Hubbard <jhubbard@nvidia.com>,
	Alistair Popple <apopple@nvidia.com>,
	Timur Tabi <ttabi@nvidia.com>,
	Eliot Courtney <ecourtney@nvidia.com>,
	nova-gpu@lists.linux.dev, dri-devel@lists.freedesktop.org,
	linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org,
	Alexandre Courbot <acourbot@nvidia.com>
Subject: [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding
Date: Fri, 29 May 2026 16:33:40 +0900	[thread overview]
Message-ID: <20260529-nova-unload-v7-0-678f39209e00@nvidia.com> (raw)

Currently the GSP is left running and the WPR2 memory region untouched
when the driver is unbound. This is obviously not ideal for at least two
reasons:

- Probing requires setting up the WPR2 region, which cannot be done if
  there is already one in place. Hence the current requirement to reset
  the GPU (using e.g. `echo 1 >/sys/bus/pci/devices/.../reset`) before
  the driver can be probed again after removal.
- The running GSP may still attempt to access shared memory regions
  which the kernel might recycle.

On top of that, there is a nasty bug in the Blackwell VBIOS that
sometimes borks the GPU upon PCI reset, requiring a reboot. So relying
on the PCI reset to unload/reload Nova is really not practical here.

This series does what is needed to leave the GPU in a clean state after
unbind, for all currently supported GPUs. Blackwell support is just a
placeholder and will be completed by the Blackwell boot support series.

This revision is based on `drm-rust-next`.

A branch with the series is available at [1].

[1] https://github.com/Gnurou/linux/tree/b4/nova-unload

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Changes in v7:
- Rebase on current drm-rust-next.
- Drop merged patches.
- Integrate Eliot's unload-on-drop improvement.
- Use `&Gsp` instead of `Pin<&mut Gsp>` in HAL.
- Add a new patch that runs the unload bundle if `Gsp::boot` fails.
- Link to v6: https://patch.msgid.link/20260521-nova-unload-v6-0-65f581c812c9@nvidia.com

Changes in v6:
- Inline TU102 local `run_booter` method in its unique call site.
- Rename unload bundle field to `unload_bundle`.
- Make Sec2UnloadBundle private.
- Continue GSP teardown upon partial failure.
- Store the unload bundle into `NovaCore`.
- Take the unload bundle by value to make it one-shot.
- Link to v5: https://patch.msgid.link/20260515-nova-unload-v5-0-c4d6250ad160@nvidia.com

Changes in v5:
- Rebase on top of the Device HRT series.
- Drop the now unneeded "gpu: nova-core: split BAR acquisition in unbind()".
- Link to v4: https://patch.msgid.link/20260427-nova-unload-v4-0-e145ccddae66@nvidia.com

Changes in v4:
- Remove `warn_on_err` macro as it isn't performing as expected and
  distracts from the goal of the series.
- Add John's patch from the Blackwell series refactoring the Booter
  Loader runner code.
- Add a GSP HAL and move the existing TU102/SEC2 boot sequence into it
  in preparation for the Hopper/Blackwell FSP boot path.
- Prepare the firmware required for unloading at probe time and save it
  into an unload bundle, as we cannot guarantee filesystem access at
  unload time.
- Constrain `UNLOADING_GUEST_DRIVER`'s visibility to the parent module.
- Also write the sentinel value `0xff` into `mbox1` when running Booter
  Unloader to align with OpenRM.
- Link to v3: https://patch.msgid.link/20260422-nova-unload-v3-0-1d2c81bd3ced@nvidia.com

Changes in v3:
- Disambiguate doccomment for `warn_on_err`.
- Test the correct bit instead of the whole register value to determine
  that the GSP has stopped.
- Use an enum instead of a boolean to encode the power level when
  shutting down the GSP.
- Add missing newline to `dev_err`.
- Add missing doccomments for new types.
- Use values from bindings instead of magic numbers.
- Remove the redundant `get_gsp_info` function.
- Better document Booter Unloader mailbox sentinel value, and check the
  value of mbox0 upon return.
- Link to v2: https://patch.msgid.link/20260421-nova-unload-v2-0-2fe54963af8b@nvidia.com

Changes in v2:
- Rebase on top of `master` and remove unneeded/obsolete preparatory patches.
- Tidy up the imports of commands from the `fw` module in the `gsp` module.
- Link to v1: https://patch.msgid.link/20251216-nova-unload-v1-0-6a5d823be19d@nvidia.com

---
Alexandre Courbot (4):
      gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL
      gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading
      gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding
      gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails

 drivers/gpu/nova-core/firmware/booter.rs          |   1 -
 drivers/gpu/nova-core/firmware/fwsec.rs           |   1 -
 drivers/gpu/nova-core/gpu.rs                      |  34 ++-
 drivers/gpu/nova-core/gsp.rs                      |   4 +
 drivers/gpu/nova-core/gsp/boot.rs                 | 290 +++++++++---------
 drivers/gpu/nova-core/gsp/commands.rs             |  43 +++
 drivers/gpu/nova-core/gsp/fw.rs                   |   4 +
 drivers/gpu/nova-core/gsp/fw/commands.rs          |  45 +++
 drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs |  11 +
 drivers/gpu/nova-core/gsp/hal.rs                  |  94 ++++++
 drivers/gpu/nova-core/gsp/hal/gh100.rs            |  51 ++++
 drivers/gpu/nova-core/gsp/hal/tu102.rs            | 349 ++++++++++++++++++++++
 drivers/gpu/nova-core/regs.rs                     |   5 +
 13 files changed, 775 insertions(+), 157 deletions(-)
---
base-commit: 0e42ec83d46ab8877d38d37493328ed7d1a24de8
change-id: 20251216-nova-unload-4029b3b76950

Best regards,
--  
Alexandre Courbot <acourbot@nvidia.com>


             reply	other threads:[~2026-05-29  7:34 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-29  7:33 Alexandre Courbot [this message]
2026-05-29  7:33 ` [PATCH v7 1/4] gpu: nova-core: gsp: move chipset-specific parts of the boot process into a HAL Alexandre Courbot
2026-06-04  6:57   ` Claude review: " Claude Code Review Bot
2026-05-29  7:33 ` [PATCH v7 2/4] gpu: nova-core: send UNLOADING_GUEST_DRIVER GSP command upon unloading Alexandre Courbot
2026-06-04  6:57   ` Claude review: " Claude Code Review Bot
2026-05-29  7:33 ` [PATCH v7 3/4] gpu: nova-core: run Booter Unloader and FWSEC-SB upon unbinding Alexandre Courbot
2026-06-04  6:57   ` Claude review: " Claude Code Review Bot
2026-05-29  7:33 ` [PATCH v7 4/4] gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails Alexandre Courbot
2026-05-30  1:46   ` Eliot Courtney
2026-06-04  6:57   ` Claude review: " Claude Code Review Bot
2026-05-29 11:15 ` [PATCH v7 0/4] gpu: nova-core: run unload sequence upon unbinding Danilo Krummrich
2026-05-29 13:06   ` Alexandre Courbot
2026-05-30  5:55 ` Alexandre Courbot
2026-06-04  6:57 ` Claude review: " Claude Code Review Bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260529-nova-unload-v7-0-678f39209e00@nvidia.com \
    --to=acourbot@nvidia.com \
    --cc=airlied@gmail.com \
    --cc=aliceryhl@google.com \
    --cc=apopple@nvidia.com \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=ecourtney@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nova-gpu@lists.linux.dev \
    --cc=rust-for-linux@vger.kernel.org \
    --cc=simona@ffwll.ch \
    --cc=ttabi@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox