From mboxrd@z Thu Jan 1 00:00:00 1970 From: Claude Code Review Bot To: dri-devel-reviews@example.com Subject: Claude review: gpu: nova-core: gsp: run the unload bundle if Gsp::boot() fails Date: Thu, 04 Jun 2026 16:57:33 +1000 Message-ID: In-Reply-To: <20260529-nova-unload-v7-4-678f39209e00@nvidia.com> References: <20260529-nova-unload-v7-0-678f39209e00@nvidia.com> <20260529-nova-unload-v7-4-678f39209e00@nvidia.com> X-Mailer: Claude Code Patch Reviewer Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Patch Review This patch adds a `BootUnloadGuard` =E2=80=94 a `ScopeGuard` that automatic= ally runs the unload bundle if boot fails partway through. **Observations:** 1. The `BootUnloadGuard` wraps `ScopeGuard`: ```rust pub(super) struct BootUnloadGuard<'a> { guard: ScopeGuard, fn(BootUnloadArgs<'a>)>, } ``` Using a concrete function pointer `fn(BootUnloadArgs)` instead of a closure= avoids lifetime complications with closures capturing references. Clean ap= proach. 2. The guard's callback calls `Gsp::unload`: ```rust |args| { let _ =3D super::Gsp::unload( args.gsp, args.dev, args.bar, args.gsp_falcon, args.sec2_falcon, args.unload_bundle, ); }, ``` The `let _ =3D` discards the result in the drop path =E2=80=94 correct, sin= ce there's nothing to do with errors during cleanup of a failed boot. 3. The `dismiss()` method returns the unload bundle for the caller to take = ownership: ```rust pub(super) fn dismiss(self) -> Option { self.guard.dismiss().unload_bundle } ``` This is the standard `ScopeGuard` pattern =E2=80=94 on the success path, di= smiss the guard and hand the resource to the normal owner. 4. The lifetime threading `'a` through the HAL `boot` signature is well don= e: ```rust fn boot<'a>( &self, gsp: &'a Gsp, dev: &'a device::Device, bar: &'a Bar0, ... gsp_falcon: &'a Falcon, sec2_falcon: &'a Falcon, ) -> Result>; ``` This ensures the guard borrows from the same lifetime as the falcons and de= vice, which is correct =E2=80=94 the guard must not outlive the hardware re= sources it references. 5. **One concern**: The `BootUnloadGuard`'s drop callback calls `Gsp::unloa= d`, which includes `shutdown_gsp` (sends `UNLOADING_GUEST_DRIVER` command).= If boot fails early enough that the GSP isn't actually running yet (e.g., = before `gsp_falcon.boot()`), the `shutdown_gsp` command will fail (5-second= timeout). The unload bundle's `run()` would then also execute. This is pro= bably fine =E2=80=94 `shutdown_gsp` failure is logged and continued past, a= nd the unload bundle itself should work regardless of GSP state. But it doe= s mean a boot failure will incur an extra 5-second timeout from the futile = shutdown attempt. Not a functional issue, but worth noting for debugging (t= he error logs will appear). 6. In `tu102.rs`, the guard construction placement is correct =E2=80=94 it'= s created after the unload bundle is built but before FWSEC-FRTS: ```rust let unload_bundle =3D Sec2UnloadBundle::build(...).ok().map(crate::gsp::Unl= oadBundle); let unload_guard =3D BootUnloadGuard::new(gsp, dev, bar, gsp_falcon, sec2_f= alcon, unload_bundle); // FWSEC-FRTS, falcon reset, booter load all happen after this // Any `?` will trigger the guard ``` This means if FWSEC-FRTS, falcon boot, or booter load fails, the guard fire= s and cleans up. If unload bundle prep itself failed (returned `None`), the= guard fires but `unload` logs a warning and returns `EAGAIN` =E2=80=94 the= hardware still won't be cleaned up, but that was already the case. 7. The `gh100.rs` stub correctly propagates the lifetime change and still r= eturns `Err(ENOTSUPP)` =E2=80=94 no guard needed since boot never succeeds. **No issues.** The drop guard pattern is well-implemented and addresses a r= eal concern about partial boot failures. --- **Summary**: This is a clean, well-reviewed v7 series that addresses a real= operational problem (inability to rebind without GPU reset). The HAL abstr= action is forward-looking without being over-engineered, the error handling= is thorough (continue-on-error in teardown paths, pre-load firmware at boo= t time), and the drop guard in patch 4 closes the gap for partial boot fail= ures. All four patches are ready to merge. --- Generated by Claude Code Patch Reviewer