* [PATCH v2] gpu: nova-core: fix stack overflow in GSP memory allocation
@ 2026-02-13 19:40 Tim Kovalenko via B4 Relay
2026-02-13 21:16 ` Claude review: " Claude Code Review Bot
2026-02-13 21:16 ` Claude Code Review Bot
From: Tim Kovalenko via B4 Relay @ 2026-02-13 19:40 UTC
To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
Simona Vetter, Miguel Ojeda, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross
Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Tim Kovalenko
From: Tim Kovalenko <tim.kovalenko@proton.me>
The `Cmdq::new` function was allocating a `PteArray` struct on the stack,
causing a stack overflow with 8216 bytes of stack usage.
Remove `PteArray` and instead compute and write the Page Table Entries
directly into the coherent DMA buffer, one at a time. This significantly
reduces stack usage.
Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>
---
Changes in v2:
- Fix a code formatting issue missed in v1.
- Link to v1: https://lore.kernel.org/r/20260212-drm-rust-next-v1-1-409398b12e61@proton.me
---
drivers/gpu/nova-core/gsp.rs | 50 ++++++++++++++-------------------------
drivers/gpu/nova-core/gsp/cmdq.rs | 27 ++++++++++++++++++---
2 files changed, 42 insertions(+), 35 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 174feaca0a6b9269cf35286dec3acc4d60918904..316eeaf87ec5ae67422a34426eefa747c9b6502b 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -2,16 +2,14 @@
mod boot;
+use core::iter::Iterator;
+
use kernel::{
device,
- dma::{
- CoherentAllocation,
- DmaAddress, //
- },
+ dma::CoherentAllocation,
dma_write,
pci,
- prelude::*,
- transmute::AsBytes, //
+ prelude::*, //
};
pub(crate) mod cmdq;
@@ -39,27 +37,6 @@
/// Number of GSP pages to use in a RM log buffer.
const RM_LOG_BUFFER_NUM_PAGES: usize = 0x10;
-/// Array of page table entries, as understood by the GSP bootloader.
-#[repr(C)]
-struct PteArray<const NUM_ENTRIES: usize>([u64; NUM_ENTRIES]);
-
-/// SAFETY: arrays of `u64` implement `AsBytes` and we are but a wrapper around one.
-unsafe impl<const NUM_ENTRIES: usize> AsBytes for PteArray<NUM_ENTRIES> {}
-
-impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
- /// Creates a new page table array mapping `NUM_PAGES` GSP pages starting at address `start`.
- fn new(start: DmaAddress) -> Result<Self> {
- let mut ptes = [0u64; NUM_PAGES];
- for (i, pte) in ptes.iter_mut().enumerate() {
- *pte = start
- .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
- .ok_or(EOVERFLOW)?;
- }
-
- Ok(Self(ptes))
- }
-}
-
/// The logging buffers are byte queues that contain encoded printf-like
/// messages from GSP-RM. They need to be decoded by a special application
/// that can parse the buffers.
@@ -86,16 +63,25 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
NUM_PAGES * GSP_PAGE_SIZE,
GFP_KERNEL | __GFP_ZERO,
)?);
- let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
+
+ let start_addr = obj.0.dma_handle();
// SAFETY: `obj` has just been created and we are its sole user.
- unsafe {
- // Copy the self-mapping PTE at the expected location.
+ let pte_region = unsafe {
obj.0
- .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
- .copy_from_slice(ptes.as_bytes())
+ .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
};
+ // As in [`DmaGspMem`], this is a one by one GSP Page write to the memory
+ // to avoid stack overflow when allocating the whole array at once.
+ for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
+ let pte_value = start_addr
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
+
+ chunk.copy_from_slice(&pte_value.to_ne_bytes());
+ }
+
Ok(obj)
}
}
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 46819a82a51adc58423502d9d45730923b843656..7a6cb261f4e62ac6210a80f9ecb61213cdb91b15 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -35,7 +35,6 @@
MsgqRxHeader,
MsgqTxHeader, //
},
- PteArray,
GSP_PAGE_SHIFT,
GSP_PAGE_SIZE, //
},
@@ -159,7 +158,7 @@ struct Msgq {
#[repr(C)]
struct GspMem {
/// Self-mapping page table entries.
- ptes: PteArray<{ GSP_PAGE_SIZE / size_of::<u64>() }>,
+ ptes: [u64; GSP_PAGE_SIZE / size_of::<u64>()],
/// CPU queue: the driver writes commands here, and the GSP reads them. It also contains the
/// write and read pointers that the CPU updates.
///
@@ -201,7 +200,29 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
let gsp_mem =
CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
- dma_write!(gsp_mem[0].ptes = PteArray::new(gsp_mem.dma_handle())?)?;
+ const NUM_PAGES: usize = GSP_PAGE_SIZE / size_of::<u64>();
+
+ // One by one GSP Page write to the memory to avoid stack overflow when allocating
+ // the whole array at once.
+ let item = gsp_mem.item_from_index(0)?;
+ for i in 0..NUM_PAGES {
+ let pte_value = gsp_mem
+ .dma_handle()
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
+
+ // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
+ // dereferenced. The compiler also further validates the expression on whether `field`
+ // is a member of `item` when expanded by the macro.
+ //
+ // Further, this is dma_write! macro expanded and modified to allow for individual
+ // page write.
+ unsafe {
+ let ptr_field = core::ptr::addr_of_mut!((*item).ptes[i]);
+ gsp_mem.field_write(ptr_field, pte_value);
+ }
+ }
+
dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF, MSGQ_NUM_PAGES))?;
dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?;
---
base-commit: cea7b66a80412e2a5b74627b89ae25f1d0110a4b
change-id: 20260212-drm-rust-next-beb92aee9d75
Best regards,
--
Tim Kovalenko <tim.kovalenko@proton.me>
* Claude review: gpu: nova-core: fix stack overflow in GSP memory allocation
From: Claude Code Review Bot @ 2026-02-13 21:16 UTC
To: dri-devel-reviews
Overall Series Review
Subject: gpu: nova-core: fix stack overflow in GSP memory allocation
Author: Tim Kovalenko via B4 Relay <devnull+tim.kovalenko.proton.me@kernel.org>
Patches: 1
Reviewed: 2026-02-14T07:16:12.245002
---
This is a single-patch series (v2) fixing a stack overflow in the nova-core GSP driver. The `PteArray` struct, when instantiated with a large number of entries, was being constructed on the stack before being copied into the DMA buffer. For the command queue path (`Cmdq::new`), `GspMem` contains a `PteArray<512>` (4096 bytes / 8 bytes per u64 = 512 entries), resulting in 4096 bytes of stack-allocated PTEs. The patch eliminates the intermediate stack allocation by computing and writing each PTE value directly into the DMA buffer one at a time.
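A minimal user-space sketch of the difference, assuming `GSP_PAGE_SHIFT = 12` (consistent with the 4096-byte `GSP_PAGE_SIZE` above) and using a heap `Vec<u64>` as a stand-in for the coherent DMA buffer. `ptes_on_stack` and `fill_ptes_in_place` are hypothetical helpers, not nova-core functions, and `Option` replaces the kernel's `Result`/`EOVERFLOW` for brevity.

```rust
// Assumption: 4 KiB GSP pages, as implied by GSP_PAGE_SIZE = 4096.
const GSP_PAGE_SHIFT: u32 = 12;
const GSP_PAGE_SIZE: usize = 1 << GSP_PAGE_SHIFT;
const NUM_ENTRIES: usize = GSP_PAGE_SIZE / core::mem::size_of::<u64>(); // 512

/// Old pattern (sketch): the whole array is a local value, so its 4096 bytes
/// occupy the stack frame before being copied to the destination.
fn ptes_on_stack(start: u64) -> Option<[u64; NUM_ENTRIES]> {
    let mut ptes = [0u64; NUM_ENTRIES];
    for (i, pte) in ptes.iter_mut().enumerate() {
        *pte = start.checked_add((i as u64) << GSP_PAGE_SHIFT)?;
    }
    Some(ptes)
}

/// New pattern (sketch): each entry is computed and stored directly into the
/// destination buffer, so only one `u64` value is live at a time.
fn fill_ptes_in_place(dst: &mut [u64], start: u64) -> Option<()> {
    for (i, pte) in dst.iter_mut().enumerate() {
        *pte = start.checked_add((i as u64) << GSP_PAGE_SHIFT)?;
    }
    Some(())
}

fn main() {
    let start = 0x1_0000_0000u64;
    // Heap-backed stand-in for the DMA buffer.
    let mut dma_buf = vec![0u64; NUM_ENTRIES];
    fill_ptes_in_place(&mut dma_buf, start).expect("overflow");
    // Both patterns produce identical entries.
    assert_eq!(dma_buf[..], ptes_on_stack(start).expect("overflow")[..]);
    println!("{:#x}", dma_buf[1]); // prints 0x100001000
}
```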
The approach is sound, and the two sites are handled differently because they use different DMA abstractions: the `LogBuffer` path uses `as_slice_mut` to get a mutable byte slice and writes PTEs via `copy_from_slice`, while the `Cmdq` path projects a raw pointer to each array element with `addr_of_mut!` and writes it with `field_write`. Both approaches avoid the stack-allocated intermediate array.
There is one correctness concern worth examining in the `Cmdq::new` path: `field_write` (which performs a volatile write) is now called for each array element, whereas the original `dma_write!` macro wrote the entire `PteArray` struct in a single volatile write. A 4096-byte volatile write is not atomic either, so the difference is one of write granularity rather than atomicity.
---
Generated by Claude Code Patch Reviewer
* Claude review: gpu: nova-core: fix stack overflow in GSP memory allocation
From: Claude Code Review Bot @ 2026-02-13 21:16 UTC
To: dri-devel-reviews
Patch Review
**LogBuffer changes (gsp.rs)**
The `LogBuffer::new` changes look correct. The original code:
> - let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
> - // SAFETY: `obj` has just been created and we are its sole user.
> - unsafe {
> - // Copy the self-mapping PTE at the expected location.
> - obj.0
> - .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
> - .copy_from_slice(ptes.as_bytes())
> - };
is replaced with:
> + let start_addr = obj.0.dma_handle();
> +
> + // SAFETY: `obj` has just been created and we are its sole user.
> + let pte_region = unsafe {
> + obj.0
> + .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
> + };
> +
> + // As in [`DmaGspMem`], this is a one by one GSP Page write to the memory
> + // to avoid stack overflow when allocating the whole array at once.
> + for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
> + let pte_value = start_addr
> + .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
> + .ok_or(EOVERFLOW)?;
> +
> + chunk.copy_from_slice(&pte_value.to_ne_bytes());
> + }
Since the allocation is a `CoherentAllocation<u8>`, the `as_slice_mut` offset and count parameters (which are in units of `T`) are in bytes. The original code passed `size_of::<u64>()` (8) as the offset and `size_of_val(&ptes)`, i.e. `NUM_PAGES * 8`, as the count. The new code passes the same offset and `NUM_PAGES * size_of::<u64>()` as the count, which is equivalent. `chunks_exact_mut(size_of::<u64>())` then yields `NUM_PAGES * 8 / 8 = NUM_PAGES` 8-byte chunks, so every PTE is written. This looks correct.
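A user-space sketch of that byte-offset arithmetic, assuming (as stated above) that the PTE region starts 8 bytes into a `u8` allocation. A `Vec<u8>` stands in for the `CoherentAllocation<u8>`, `get_mut(off..off + len)` stands in for `as_slice_mut`, and `write_log_ptes` is a hypothetical helper; `NUM_PAGES = 0x10` is taken from `RM_LOG_BUFFER_NUM_PAGES` in the patch context.

```rust
const GSP_PAGE_SHIFT: u32 = 12; // assumption: 4 KiB GSP pages
const GSP_PAGE_SIZE: usize = 1 << GSP_PAGE_SHIFT;
const NUM_PAGES: usize = 0x10; // RM_LOG_BUFFER_NUM_PAGES from the patch context

/// Mirrors the LogBuffer loop: take `NUM_PAGES * 8` bytes starting at byte
/// offset 8, split them into 8-byte chunks, and store one PTE per chunk.
/// Returns the number of PTEs written, or `None` on a short buffer/overflow.
fn write_log_ptes(buf: &mut [u8], start_addr: u64) -> Option<usize> {
    let off = core::mem::size_of::<u64>();
    let len = NUM_PAGES * core::mem::size_of::<u64>();
    // Stand-in for `as_slice_mut(offset, count)` with T = u8: both in bytes.
    let pte_region = buf.get_mut(off..off + len)?;
    let mut written = 0;
    for (i, chunk) in pte_region.chunks_exact_mut(core::mem::size_of::<u64>()).enumerate() {
        let pte_value = start_addr.checked_add((i as u64) << GSP_PAGE_SHIFT)?;
        chunk.copy_from_slice(&pte_value.to_ne_bytes());
        written += 1;
    }
    Some(written)
}

fn main() {
    let mut buf = vec![0u8; NUM_PAGES * GSP_PAGE_SIZE];
    // (16 * 8) / 8 == 16 chunks, one per page.
    assert_eq!(write_log_ptes(&mut buf, 0x2000_0000), Some(NUM_PAGES));
    let mut first = [0u8; 8];
    first.copy_from_slice(&buf[8..16]);
    assert_eq!(u64::from_ne_bytes(first), 0x2000_0000);
    assert_eq!(&buf[0..8], &[0u8; 8]); // the leading u64 is left untouched
}
```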
Minor nit: the comment has a double space ("this is a one by one").
**Cmdq changes (gsp/cmdq.rs)**
> + const NUM_PAGES: usize = GSP_PAGE_SIZE / size_of::<u64>();
This computes 4096 / 8 = 512, matching the original `PteArray<{ GSP_PAGE_SIZE / size_of::<u64>() }>` generic parameter. However, the name `NUM_PAGES` is potentially misleading: it is not the number of GSP pages in the `GspMem` allocation but the number of PTE *entries* that fit in one GSP page. The existing code already used this expression as the array size, so this is not a new issue, but the name could be confused with the `Cmdq::NUM_PTES` constant, which equals `size_of::<GspMem>() >> GSP_PAGE_SHIFT` (the actual number of pages in the `GspMem` structure). That said, this is a naming preference, not a bug.
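To make the distinction concrete, a small sketch with a hypothetical two-page structure (`TwoPages` is not the real `GspMem`, whose size is not shown in this thread) and the hypothetical name `PTES_PER_GSP_PAGE`: the entries-per-page constant stays 512 no matter how many pages a structure spans, while the `size_of >> GSP_PAGE_SHIFT` quantity counts pages.

```rust
use core::mem::size_of;

const GSP_PAGE_SHIFT: u32 = 12; // assumption: 4 KiB GSP pages
const GSP_PAGE_SIZE: usize = 1 << GSP_PAGE_SHIFT;

/// `u64` entries that fit in one GSP page (what the patch calls `NUM_PAGES`).
const PTES_PER_GSP_PAGE: usize = GSP_PAGE_SIZE / size_of::<u64>();

/// Hypothetical structure spanning exactly two GSP pages.
#[allow(dead_code)]
#[repr(C)]
struct TwoPages {
    ptes: [u64; PTES_PER_GSP_PAGE],
    payload: [u8; GSP_PAGE_SIZE],
}

/// Number of GSP pages the structure occupies (a `NUM_PTES`-style quantity).
const TWO_PAGES_NUM_PAGES: usize = size_of::<TwoPages>() >> GSP_PAGE_SHIFT;

fn main() {
    assert_eq!(PTES_PER_GSP_PAGE, 512);
    assert_eq!(TWO_PAGES_NUM_PAGES, 2);
}
```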
> + let item = gsp_mem.item_from_index(0)?;
> + for i in 0..NUM_PAGES {
> + let pte_value = gsp_mem
> + .dma_handle()
> + .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
> + .ok_or(EOVERFLOW)?;
> +
> + // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
> + // dereferenced. The compiler also further validates the expression on whether `field`
> + // is a member of `item` when expanded by the macro.
> + //
> + // Further, this is dma_write! macro expanded and modified to allow for individual
> + // page write.
> + unsafe {
> + let ptr_field = core::ptr::addr_of_mut!((*item).ptes[i]);
> + gsp_mem.field_write(ptr_field, pte_value);
> + }
> + }
The safety comment says this is the `dma_write!` macro "expanded and modified," which accurately describes what is happening. The `field_write` method performs a volatile write of each `u64` PTE value. The original `dma_write!` for the whole `PteArray` wrote the entire struct with a single volatile write, but since the GSP is not yet running at this point in initialization (the command queue has not been used), element-by-element volatile writes should be functionally equivalent: once the loop finishes, the buffer holds the same contents.
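A user-space sketch of the `Cmdq` pattern, assuming a zeroed heap allocation in place of `alloc_coherent` and `core::ptr::write_volatile` in place of the kernel's `field_write` (whose implementation is not reproduced here). `GspMemLike` and `fill_per_element` are hypothetical; the point is that projecting to `ptes[i]` with `addr_of_mut!` and writing one element at a time leaves every entry with the expected value.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::{addr_of_mut, write_volatile};

const GSP_PAGE_SHIFT: u32 = 12; // assumption: 4 KiB GSP pages
const NUM_ENTRIES: usize = (1usize << GSP_PAGE_SHIFT) / core::mem::size_of::<u64>();

/// Hypothetical stand-in for `GspMem`: the PTE array first, other state after it.
#[allow(dead_code)]
#[repr(C)]
struct GspMemLike {
    ptes: [u64; NUM_ENTRIES],
    tail: u64,
}

/// Writes each PTE individually through a raw pointer into a zeroed heap
/// allocation, then copies the array out for inspection.
fn fill_per_element(start: u64) -> Option<Vec<u64>> {
    let layout = Layout::new::<GspMemLike>();
    // SAFETY: `GspMemLike` has a non-zero size.
    let item = unsafe { alloc_zeroed(layout) }.cast::<GspMemLike>();
    if item.is_null() {
        return None;
    }
    let mut ok = true;
    for i in 0..NUM_ENTRIES {
        let Some(pte) = start.checked_add((i as u64) << GSP_PAGE_SHIFT) else {
            ok = false;
            break;
        };
        // SAFETY: `item` is non-null, aligned and zero-initialized; indexing the
        // array place is bounds-checked, and `addr_of_mut!` creates no reference.
        unsafe { write_volatile(addr_of_mut!((*item).ptes[i]), pte) };
    }
    // SAFETY: every field is initialized (zeroed, then possibly overwritten).
    let out = if ok { Some(unsafe { (*item).ptes.to_vec() }) } else { None };
    // SAFETY: `item` was allocated above with this exact layout.
    unsafe { dealloc(item.cast::<u8>(), layout) };
    out
}

fn main() {
    let ptes = fill_per_element(0x8000_0000).expect("overflow or OOM");
    assert_eq!(ptes[1], 0x8000_0000 + (1 << GSP_PAGE_SHIFT));
}
```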
The struct field type was also changed:
> - ptes: PteArray<{ GSP_PAGE_SIZE / size_of::<u64>() }>,
> + ptes: [u64; GSP_PAGE_SIZE / size_of::<u64>()],
This is correct since `PteArray` was just a `#[repr(C)]` newtype around `[u64; N]`, so the layout is identical and the `GspMem` struct layout is preserved. The `AsBytes`/`FromBytes` impls for `GspMem` are manual `unsafe impl`s so they don't depend on `PteArray` implementing those traits.
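The layout claim can be checked with a short sketch, assuming a toolchain with stable `core::mem::offset_of!` (Rust 1.77 or later). `Wrapped` and `Plain` are hypothetical reductions of `GspMem` before and after the patch, each with one trailing field standing in for the queues.

```rust
use core::mem::{align_of, offset_of, size_of};

const N: usize = 512; // GSP_PAGE_SIZE / size_of::<u64>() with 4 KiB pages

/// Same shape as the removed type: a #[repr(C)] newtype over [u64; N].
#[allow(dead_code)]
#[repr(C)]
struct PteArray<const NUM_ENTRIES: usize>([u64; NUM_ENTRIES]);

/// Hypothetical reduction of `GspMem` before the patch.
#[allow(dead_code)]
#[repr(C)]
struct Wrapped {
    ptes: PteArray<N>,
    next: u32,
}

/// Hypothetical reduction of `GspMem` after the patch.
#[allow(dead_code)]
#[repr(C)]
struct Plain {
    ptes: [u64; N],
    next: u32,
}

fn main() {
    // The newtype and the bare array have identical size and alignment...
    assert_eq!(size_of::<PteArray<N>>(), size_of::<[u64; N]>());
    assert_eq!(align_of::<PteArray<N>>(), align_of::<[u64; N]>());
    // ...so the fields that follow keep their offsets.
    assert_eq!(offset_of!(Wrapped, next), offset_of!(Plain, next));
    assert_eq!(size_of::<Wrapped>(), size_of::<Plain>());
}
```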
One thing worth verifying: the `PteArray` type is also imported in `cmdq.rs`:
> - PteArray,
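This import is removed, consistent with `PteArray` being deleted entirely from `gsp.rs`. The now-unused `DmaAddress` import is also correctly removed from `gsp.rs`: `dma_handle()` still returns a `DmaAddress`, but the type is no longer named explicitly. Likewise, `transmute::AsBytes` is dropped, since within this diff its only uses were the removed `unsafe impl AsBytes for PteArray` and the `ptes.as_bytes()` call.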
The `use core::iter::Iterator` addition at the top of `gsp.rs`:
> +use core::iter::Iterator;
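This appears unnecessary: `Iterator` is already in scope through the `core` prelude, and the patch does not name it explicitly anywhere in `gsp.rs`. `chunks_exact_mut` is an inherent method on `[T]`, not an `Iterator` method, and the `enumerate` call that follows resolves through the prelude's `Iterator` without any extra import. Was this left over from an earlier version of the patch?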
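A minimal illustration of why the import is redundant: the following compiles without any `use` of `Iterator`, because the trait is part of the prelude (the `core` prelude that `no_std` kernel crates also get). `count_chunks` is a hypothetical helper.

```rust
/// Walks a buffer in 8-byte chunks; no `use core::iter::Iterator` is needed.
fn count_chunks(buf: &mut [u8]) -> usize {
    let mut n = 0;
    // `chunks_exact_mut` is inherent on `[T]`; `enumerate` comes from the
    // prelude's `Iterator` trait. Trailing bytes shorter than 8 are skipped.
    for (i, chunk) in buf.chunks_exact_mut(8).enumerate() {
        chunk[0] = i as u8;
        n += 1;
    }
    n
}

fn main() {
    let mut buf = [0u8; 32];
    assert_eq!(count_chunks(&mut buf), 4);
    assert_eq!(buf[8], 1);
}
```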
Overall this is a straightforward and correct fix. The only items worth addressing are the unnecessary `use core::iter::Iterator` import and the double-space typo in the comment.
---
Generated by Claude Code Patch Reviewer