* [PATCH v2] gpu: nova-core: fix stack overflow in GSP memory allocation
@ 2026-02-13 19:40 Tim Kovalenko via B4 Relay
2026-02-13 21:16 ` Claude review: " Claude Code Review Bot
2026-02-13 21:16 ` Claude Code Review Bot
0 siblings, 2 replies; 6+ messages in thread
From: Tim Kovalenko via B4 Relay @ 2026-02-13 19:40 UTC (permalink / raw)
To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
Simona Vetter, Miguel Ojeda, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross
Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Tim Kovalenko
From: Tim Kovalenko <tim.kovalenko@proton.me>
The `Cmdq::new` function was allocating a `PteArray` struct on the
stack, resulting in a stack frame of 8216 bytes and causing a stack
overflow.
Remove the `PteArray` and instead calculate and write the Page Table
Entries directly into the coherent DMA buffer one by one, eliminating
the large on-stack array.
Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>
---
Changes in v2:
- Missed a code formatting issue.
- Link to v1: https://lore.kernel.org/r/20260212-drm-rust-next-v1-1-409398b12e61@proton.me
---
drivers/gpu/nova-core/gsp.rs | 50 ++++++++++++++-------------------------
drivers/gpu/nova-core/gsp/cmdq.rs | 27 ++++++++++++++++++---
2 files changed, 42 insertions(+), 35 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 174feaca0a6b9269cf35286dec3acc4d60918904..316eeaf87ec5ae67422a34426eefa747c9b6502b 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -2,16 +2,14 @@
mod boot;
+use core::iter::Iterator;
+
use kernel::{
device,
- dma::{
- CoherentAllocation,
- DmaAddress, //
- },
+ dma::CoherentAllocation,
dma_write,
pci,
- prelude::*,
- transmute::AsBytes, //
+ prelude::*, //
};
pub(crate) mod cmdq;
@@ -39,27 +37,6 @@
/// Number of GSP pages to use in a RM log buffer.
const RM_LOG_BUFFER_NUM_PAGES: usize = 0x10;
-/// Array of page table entries, as understood by the GSP bootloader.
-#[repr(C)]
-struct PteArray<const NUM_ENTRIES: usize>([u64; NUM_ENTRIES]);
-
-/// SAFETY: arrays of `u64` implement `AsBytes` and we are but a wrapper around one.
-unsafe impl<const NUM_ENTRIES: usize> AsBytes for PteArray<NUM_ENTRIES> {}
-
-impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
- /// Creates a new page table array mapping `NUM_PAGES` GSP pages starting at address `start`.
- fn new(start: DmaAddress) -> Result<Self> {
- let mut ptes = [0u64; NUM_PAGES];
- for (i, pte) in ptes.iter_mut().enumerate() {
- *pte = start
- .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
- .ok_or(EOVERFLOW)?;
- }
-
- Ok(Self(ptes))
- }
-}
-
/// The logging buffers are byte queues that contain encoded printf-like
/// messages from GSP-RM. They need to be decoded by a special application
/// that can parse the buffers.
@@ -86,16 +63,25 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
NUM_PAGES * GSP_PAGE_SIZE,
GFP_KERNEL | __GFP_ZERO,
)?);
- let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
+
+ let start_addr = obj.0.dma_handle();
// SAFETY: `obj` has just been created and we are its sole user.
- unsafe {
- // Copy the self-mapping PTE at the expected location.
+ let pte_region = unsafe {
obj.0
- .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
- .copy_from_slice(ptes.as_bytes())
+ .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
};
+ // As in [`DmaGspMem`], this is a one by one GSP Page write to the memory
+ // to avoid stack overflow when allocating the whole array at once.
+ for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
+ let pte_value = start_addr
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
+
+ chunk.copy_from_slice(&pte_value.to_ne_bytes());
+ }
+
Ok(obj)
}
}
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 46819a82a51adc58423502d9d45730923b843656..7a6cb261f4e62ac6210a80f9ecb61213cdb91b15 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -35,7 +35,6 @@
MsgqRxHeader,
MsgqTxHeader, //
},
- PteArray,
GSP_PAGE_SHIFT,
GSP_PAGE_SIZE, //
},
@@ -159,7 +158,7 @@ struct Msgq {
#[repr(C)]
struct GspMem {
/// Self-mapping page table entries.
- ptes: PteArray<{ GSP_PAGE_SIZE / size_of::<u64>() }>,
+ ptes: [u64; GSP_PAGE_SIZE / size_of::<u64>()],
/// CPU queue: the driver writes commands here, and the GSP reads them. It also contains the
/// write and read pointers that the CPU updates.
///
@@ -201,7 +200,29 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
let gsp_mem =
CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
- dma_write!(gsp_mem[0].ptes = PteArray::new(gsp_mem.dma_handle())?)?;
+ const NUM_PAGES: usize = GSP_PAGE_SIZE / size_of::<u64>();
+
+ // One by one GSP Page write to the memory to avoid stack overflow when allocating
+ // the whole array at once.
+ let item = gsp_mem.item_from_index(0)?;
+ for i in 0..NUM_PAGES {
+ let pte_value = gsp_mem
+ .dma_handle()
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
+
+ // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
+ // dereferenced. The compiler also further validates the expression on whether `field`
+ // is a member of `item` when expanded by the macro.
+ //
+ // Further, this is dma_write! macro expanded and modified to allow for individual
+ // page write.
+ unsafe {
+ let ptr_field = core::ptr::addr_of_mut!((*item).ptes[i]);
+ gsp_mem.field_write(ptr_field, pte_value);
+ }
+ }
+
dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF, MSGQ_NUM_PAGES))?;
dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?;
---
base-commit: cea7b66a80412e2a5b74627b89ae25f1d0110a4b
change-id: 20260212-drm-rust-next-beb92aee9d75
Best regards,
--
Tim Kovalenko <tim.kovalenko@proton.me>
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Claude review: gpu: nova-core: fix stack overflow in GSP memory allocation
2026-02-13 19:40 [PATCH v2] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
@ 2026-02-13 21:16 ` Claude Code Review Bot
2026-02-13 21:16 ` Claude Code Review Bot
1 sibling, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-02-13 21:16 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: gpu: nova-core: fix stack overflow in GSP memory allocation
Author: Tim Kovalenko via B4 Relay <devnull+tim.kovalenko.proton.me@kernel.org>
Patches: 1
Reviewed: 2026-02-14T07:16:12.245002
---
This is a single-patch series (v2) fixing a stack overflow in the nova-core GSP driver. The `PteArray` struct, when instantiated with a large number of entries, was being constructed on the stack before being copied into the DMA buffer. For the command queue path (`Cmdq::new`), `GspMem` contains a `PteArray<512>` (4096 bytes / 8 bytes per u64 = 512 entries), resulting in 4096 bytes of stack-allocated PTEs. The patch eliminates the intermediate stack allocation by computing and writing each PTE value directly into the DMA buffer one at a time.
The approach is sound and the two sites are handled differently due to their different DMA abstractions: the `LogBuffer` path uses `as_slice_mut` to get a mutable byte slice and writes PTEs via `copy_from_slice`, while the `Cmdq` path uses raw pointer arithmetic with `addr_of_mut!` and `field_write` to write individual array elements. Both approaches avoid the stack-allocated intermediate array.
There is one correctness concern worth examining in the `Cmdq::new` path related to using `field_write` (which performs a volatile write) for individual array elements versus the original `dma_write!` macro approach that wrote the entire `PteArray` struct atomically.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 6+ messages in thread
* Claude review: gpu: nova-core: fix stack overflow in GSP memory allocation
2026-02-13 19:40 [PATCH v2] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
2026-02-13 21:16 ` Claude review: " Claude Code Review Bot
@ 2026-02-13 21:16 ` Claude Code Review Bot
1 sibling, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-02-13 21:16 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**LogBuffer changes (gsp.rs)**
The `LogBuffer::new` changes look correct. The original code:
> - let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
> - // SAFETY: `obj` has just been created and we are its sole user.
> - unsafe {
> - // Copy the self-mapping PTE at the expected location.
> - obj.0
> - .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
> - .copy_from_slice(ptes.as_bytes())
> - };
is replaced with:
> + let start_addr = obj.0.dma_handle();
> +
> + // SAFETY: `obj` has just been created and we are its sole user.
> + let pte_region = unsafe {
> + obj.0
> + .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
> + };
> +
> + // As in [`DmaGspMem`], this is a one by one GSP Page write to the memory
> + // to avoid stack overflow when allocating the whole array at once.
> + for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
> + let pte_value = start_addr
> + .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
> + .ok_or(EOVERFLOW)?;
> +
> + chunk.copy_from_slice(&pte_value.to_ne_bytes());
> + }
Since the `CoherentAllocation` is `CoherentAllocation<u8>`, the `as_slice_mut` offset and count parameters are in bytes. The original code passed `size_of::<u64>()` (8) as the offset and `size_of_val(&ptes)` as the count — which was `NUM_PAGES * 8`. The new code passes `size_of::<u64>()` (8) as offset and `NUM_PAGES * size_of::<u64>()` as count, which is equivalent. The `chunks_exact_mut(size_of::<u64>())` then iterates over 8-byte chunks, and `NUM_PAGES * 8 / 8 = NUM_PAGES` chunks, so all PTEs are written. This looks correct.
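The chunk arithmetic above can be exercised in isolation. The following is a standalone userspace sketch (plain `std` Rust, not the kernel crate; `write_ptes` and the string errors are illustrative stand-ins for the driver's types and `EOVERFLOW`) of computing each 64-bit PTE and copying it into an 8-byte chunk of a byte region, so no `[u64; N]` array is ever built on the stack:

```rust
use std::mem::size_of;

const GSP_PAGE_SHIFT: u32 = 12; // 4 KiB GSP pages, as in the driver

/// Write one PTE per 8-byte chunk of `pte_region`, for a mapping that
/// starts at `start_addr`. Mirrors the `chunks_exact_mut` loop from
/// the patch, with the same `checked_add` overflow guard.
fn write_ptes(pte_region: &mut [u8], start_addr: u64) -> Result<(), &'static str> {
    for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
        let pte_value = start_addr
            .checked_add((i as u64) << GSP_PAGE_SHIFT)
            .ok_or("EOVERFLOW")?;
        // Each entry is serialized in native byte order, as
        // `to_ne_bytes()` does in the patch.
        chunk.copy_from_slice(&pte_value.to_ne_bytes());
    }
    Ok(())
}
```

With a 16-byte region and `start_addr = 0x1000`, the two chunks receive `0x1000` and `0x2000`, i.e. `base + i * GSP_PAGE_SIZE`.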
Minor nit: the comment has a double space ("this is a  one by one").
**Cmdq changes (gsp/cmdq.rs)**
> + const NUM_PAGES: usize = GSP_PAGE_SIZE / size_of::<u64>();
This computes 4096 / 8 = 512, matching the original `PteArray<{ GSP_PAGE_SIZE / size_of::<u64>() }>` generic parameter. However, this `NUM_PAGES` name is potentially misleading — this isn't the number of GSP pages in the `GspMem` allocation, it's the number of PTE *entries* that fit in one GSP page. The existing code used this same expression as the array size, so this isn't a new issue, but the name could cause confusion with the `Cmdq::NUM_PTES` constant which equals `size_of::<GspMem>() >> GSP_PAGE_SHIFT` (the actual number of pages in the GspMem structure). That said, this is a naming preference not a bug.
> + let item = gsp_mem.item_from_index(0)?;
> + for i in 0..NUM_PAGES {
> + let pte_value = gsp_mem
> + .dma_handle()
> + .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
> + .ok_or(EOVERFLOW)?;
> +
> + // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
> + // dereferenced. The compiler also further validates the expression on whether `field`
> + // is a member of `item` when expanded by the macro.
> + //
> + // Further, this is dma_write! macro expanded and modified to allow for individual
> + // page write.
> + unsafe {
> + let ptr_field = core::ptr::addr_of_mut!((*item).ptes[i]);
> + gsp_mem.field_write(ptr_field, pte_value);
> + }
> + }
The safety comment says this is the `dma_write!` macro "expanded and modified," which is an accurate description of what's happening. The `field_write` method performs a volatile write of each `u64` PTE value. The original `dma_write!` macro for the whole `PteArray` would have written the entire struct via a single volatile write — but since the GSP isn't running at this point during initialization (the command queue hasn't been used yet), element-by-element volatile writes should be functionally equivalent.
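The element-by-element volatile pattern can be sketched outside the kernel as well. This is a minimal userspace analog (the function name is hypothetical, and the kernel's `field_write` additionally ties each store to the `CoherentAllocation` and its device): each store goes through `ptr::write_volatile`, so the compiler cannot elide or merge the per-entry writes:

```rust
use std::ptr;

/// Volatile, per-element PTE writes into an already-mapped region,
/// analogous to one `field_write` call per entry instead of a single
/// write of the whole array.
fn write_ptes_volatile(
    dst: &mut [u64],
    start_addr: u64,
    page_shift: u32,
) -> Result<(), &'static str> {
    for i in 0..dst.len() {
        let pte = start_addr
            .checked_add((i as u64) << page_shift)
            .ok_or("EOVERFLOW")?;
        // SAFETY: `i < dst.len()`, so the pointer stays within `dst`.
        unsafe { ptr::write_volatile(dst.as_mut_ptr().add(i), pte) };
    }
    Ok(())
}
```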
The struct field type was also changed:
> - ptes: PteArray<{ GSP_PAGE_SIZE / size_of::<u64>() }>,
> + ptes: [u64; GSP_PAGE_SIZE / size_of::<u64>()],
This is correct since `PteArray` was just a `#[repr(C)]` newtype around `[u64; N]`, so the layout is identical and the `GspMem` struct layout is preserved. The `AsBytes`/`FromBytes` impls for `GspMem` are manual `unsafe impl`s so they don't depend on `PteArray` implementing those traits.
One thing worth verifying: the `PteArray` type is also imported in `cmdq.rs`:
> - PteArray,
This import is removed, consistent with `PteArray` being deleted entirely from `gsp.rs`. The unused `DmaAddress` import is also correctly removed from `gsp.rs` since `dma_handle()` returns a `DmaAddress` but it's now used via method call rather than being named explicitly.
The `use core::iter::Iterator` addition at the top of `gsp.rs`:
> +use core::iter::Iterator;
This appears unnecessary — `Iterator` is in the prelude and the patch doesn't use it explicitly as a trait bound anywhere in `gsp.rs`. The `chunks_exact_mut` method is on `[T]`, not on `Iterator`. Was this left over from an earlier version of the patch?
Overall this is a straightforward and correct fix. The only items worth addressing are the unnecessary `use core::iter::Iterator` import and the double-space typo in the comment.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v4 0/4] Fixes the stack overflow
@ 2026-03-09 16:34 Tim Kovalenko via B4 Relay
2026-03-09 16:34 ` [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
0 siblings, 1 reply; 6+ messages in thread
From: Tim Kovalenko via B4 Relay @ 2026-03-09 16:34 UTC (permalink / raw)
To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
Simona Vetter, Miguel Ojeda, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, Boqun Feng,
Nathan Chancellor, Nicolas Schier, Abdiel Janulgue,
Daniel Almeida, Robin Murphy, Boqun Feng
Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, linux-kbuild,
driver-core, Tim Kovalenko
---
Changes in v4:
- Rebase on top of projection changes
- Use index projection when writing PTEs
- Keep PteArray, as discussed in V2, and move entry calculation there
- Avoid stack allocation by writing PTEs directly to DMA memory
- Link to v3: https://lore.kernel.org/r/20260217-drm-rust-next-v3-1-9e7e95c597dc@proton.me
Changes in v3:
- Addressed the comments and re-instated the PteArray type.
- PteArray now uses `init` instead of `new` where it writes to `self`
page by page.
- PteArray just needs a pte pointer obtained from the `gsp_mem.as_slice_mut`.
I hope I understood everything in the V2 email chain and implemented it correctly :)
- Link to v2: https://lore.kernel.org/r/20260213-drm-rust-next-v2-1-aa094f78721a@proton.me
Changes in v2:
- Missed a code formatting issue.
- Link to v1: https://lore.kernel.org/r/20260212-drm-rust-next-v1-1-409398b12e61@proton.me
---
Gary Guo (3):
rust: ptr: add `KnownSize` trait to support DST size info extraction
rust: ptr: add projection infrastructure
rust: dma: use pointer projection infra for `dma_{read,write}` macro
Tim Kovalenko (1):
gpu: nova-core: fix stack overflow in GSP memory allocation
drivers/gpu/nova-core/gsp.rs | 48 ++++---
drivers/gpu/nova-core/gsp/boot.rs | 2 +-
drivers/gpu/nova-core/gsp/cmdq.rs | 23 ++-
rust/kernel/dma.rs | 114 +++++++--------
rust/kernel/lib.rs | 4 +
rust/kernel/ptr.rs | 30 +++-
rust/kernel/ptr/projection.rs | 294 ++++++++++++++++++++++++++++++++++++++
samples/rust/rust_dma.rs | 30 ++--
scripts/Makefile.build | 4 +-
9 files changed, 443 insertions(+), 106 deletions(-)
---
base-commit: dd8a93dafe6ef50b49d2a7b44862264d74a7aafa
change-id: 20260212-drm-rust-next-beb92aee9d75
Best regards,
--
Tim Kovalenko <tim.kovalenko@proton.me>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation
2026-03-09 16:34 [PATCH v4 0/4] Fixes the stack overflow Tim Kovalenko via B4 Relay
@ 2026-03-09 16:34 ` Tim Kovalenko via B4 Relay
2026-03-10 2:10 ` Claude review: " Claude Code Review Bot
0 siblings, 1 reply; 6+ messages in thread
From: Tim Kovalenko via B4 Relay @ 2026-03-09 16:34 UTC (permalink / raw)
To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
Simona Vetter, Miguel Ojeda, Gary Guo, Björn Roy Baron,
Benno Lossin, Andreas Hindborg, Trevor Gross, Boqun Feng,
Nathan Chancellor, Nicolas Schier, Abdiel Janulgue,
Daniel Almeida, Robin Murphy, Boqun Feng
Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, linux-kbuild,
driver-core, Tim Kovalenko
From: Tim Kovalenko <tim.kovalenko@proton.me>
The `Cmdq::new` function was allocating a `PteArray` struct on the
stack, resulting in a stack frame of 8216 bytes and causing a stack
overflow.
Modify `PteArray` to calculate and write the Page Table Entries
directly into the coherent DMA buffer one by one, eliminating the
large on-stack array.
Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>
---
drivers/gpu/nova-core/gsp.rs | 34 +++++++++++++++++++---------------
drivers/gpu/nova-core/gsp/cmdq.rs | 15 ++++++++++++++-
2 files changed, 33 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 25cd48514c777cb405a2af0acf57196b2e2e7837..20170e483e04c476efce8997b3916b0ad829ed38 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -47,16 +47,11 @@
unsafe impl<const NUM_ENTRIES: usize> AsBytes for PteArray<NUM_ENTRIES> {}
impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
- /// Creates a new page table array mapping `NUM_PAGES` GSP pages starting at address `start`.
- fn new(start: DmaAddress) -> Result<Self> {
- let mut ptes = [0u64; NUM_PAGES];
- for (i, pte) in ptes.iter_mut().enumerate() {
- *pte = start
- .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
- .ok_or(EOVERFLOW)?;
- }
-
- Ok(Self(ptes))
+ /// Returns the page table entry for `index`, for a mapping starting at `start` DmaAddress.
+ fn entry(start: DmaAddress, index: usize) -> Result<u64> {
+ start
+ .checked_add(num::usize_as_u64(index) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)
}
}
@@ -86,16 +81,25 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
NUM_PAGES * GSP_PAGE_SIZE,
GFP_KERNEL | __GFP_ZERO,
)?);
- let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
+
+ let start_addr = obj.0.dma_handle();
// SAFETY: `obj` has just been created and we are its sole user.
- unsafe {
- // Copy the self-mapping PTE at the expected location.
+ let pte_region = unsafe {
obj.0
- .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
- .copy_from_slice(ptes.as_bytes())
+ .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
};
+ // This is a one by one GSP Page write to the memory
+ // to avoid stack overflow when allocating the whole array at once.
+ for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
+ let pte_value = start_addr
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
+
+ chunk.copy_from_slice(&pte_value.to_ne_bytes());
+ }
+
Ok(obj)
}
}
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 0056bfbf0a44cfbc5a0ca08d069f881b877e1edc..c8327d3098f73f9b880eee99038ad10a16e1e32d 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -202,7 +202,20 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
let gsp_mem =
CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
- dma_write!(gsp_mem, [0]?.ptes, PteArray::new(gsp_mem.dma_handle())?);
+
+ const NUM_PTES: usize = GSP_PAGE_SIZE / size_of::<u64>();
+
+ let start = gsp_mem.dma_handle();
+ // One by one GSP Page write to the memory to avoid stack overflow when allocating
+ // the whole array at once.
+ for i in 0..NUM_PTES {
+ dma_write!(
+ gsp_mem,
+ [0]?.ptes.0[i],
+ PteArray::<NUM_PTES>::entry(start, i)?
+ );
+ }
+
dma_write!(
gsp_mem,
[0]?.cpuq.tx,
--
2.53.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* Claude review: gpu: nova-core: fix stack overflow in GSP memory allocation
2026-03-09 16:34 ` [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
@ 2026-03-10 2:10 ` Claude Code Review Bot
0 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-03-10 2:10 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
This is the actual bug fix. The `PteArray<NUM_PAGES>` was `[u64; NUM_PAGES]` on the stack, which for typical GSP page sizes resulted in 8216+ bytes of stack usage, causing overflow.
**cmdq.rs changes** — Uses the projection macro to write PTEs one-by-one:
```rust
for i in 0..NUM_PTES {
dma_write!(
gsp_mem,
[0]?.ptes.0[i],
PteArray::<NUM_PTES>::entry(start, i)?
);
}
```
The use of infallible `[i]` (not `[i]?`) for the inner array index is interesting. This relies on the compiler (LLVM) proving that `i < NUM_PTES` from the `0..NUM_PTES` loop bound, so `build_error!()` is eliminated as dead code. If the optimizer cannot prove this, compilation will fail, which acts as a safety net. This is acceptable but somewhat fragile — if future changes complicate the loop, it could become a compile error. Using `[i]?` would be safer and essentially free at runtime.
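The trade-off reads the same way in plain Rust: an explicit fallible index adds a bounds-check branch that the optimizer normally removes when the loop bound matches the length. A hypothetical sketch (`fill_checked` and the string error are not from the patch):

```rust
/// Fill a fixed-size PTE array through a fallible index. The
/// `get_mut(i)` bounds check is explicit, but with `i` ranging over
/// `0..ptes.len()` it is trivially provable and typically optimized
/// away entirely.
fn fill_checked(ptes: &mut [u64; 8], base: u64) -> Result<(), &'static str> {
    for i in 0..ptes.len() {
        *ptes.get_mut(i).ok_or("EINVAL")? = base + (i as u64) * 4096;
    }
    Ok(())
}
```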
**gsp.rs LogBuffer::new changes** — Does NOT use the projection macro or `PteArray::entry()`:
```rust
let pte_region = unsafe {
obj.0
.as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
};
for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
let pte_value = start_addr
.checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
.ok_or(EOVERFLOW)?;
chunk.copy_from_slice(&pte_value.to_ne_bytes());
}
```
**Issue: Code duplication** — The PTE value calculation here (`start_addr.checked_add(...)`) duplicates the logic of the newly added `PteArray::entry()` method. The `entry()` method was added in the same patch but is only called from `cmdq.rs`. The `gsp.rs` path should use it too:
```rust
let pte_value = PteArray::<NUM_PAGES>::entry(start_addr, i)?;
```
This would reduce duplication and keep the PTE calculation logic in one place.
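A self-contained version of that helper illustrates the point (userspace sketch; `pte_entry` stands in for the associated function and a string error for `EOVERFLOW`): both call sites would then reduce to one call per index.

```rust
const GSP_PAGE_SHIFT: u32 = 12; // 4 KiB GSP pages, as in the driver

/// Page table entry for `index`, for a mapping starting at `start`:
/// `start + index * GSP_PAGE_SIZE`, with overflow reported as an error.
fn pte_entry(start: u64, index: usize) -> Result<u64, &'static str> {
    start
        .checked_add((index as u64) << GSP_PAGE_SHIFT)
        .ok_or("EOVERFLOW")
}
```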
**Issue: Inconsistent approaches** — The two call sites use fundamentally different strategies: `cmdq.rs` uses `dma_write!` with projection, while `gsp.rs` uses raw `as_slice_mut` + `chunks_exact_mut` + manual byte copying. This is understandable since `LogBuffer` wraps `CoherentAllocation<u8>` (byte-level access) while `DmaGspMem` wraps `CoherentAllocation<GspMem>` (typed access), but a brief comment in `gsp.rs` explaining why the projection approach isn't used here would help future readers.
**Minor style nit**: The comment at line 295 has a double space:
```rust
// This is a one by one GSP Page write to the memory
```
Overall the fix is correct and achieves the goal of eliminating the stack overflow. The inconsistency between the two call sites and the unused `PteArray::entry()` in `gsp.rs` are the main items worth addressing before merge.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH] gpu: nova-core: fix stack overflow in GSP memory allocation
@ 2026-02-13 3:49 Tim Kovalenko via B4 Relay
2026-02-13 8:06 ` Claude review: " Claude Code Review Bot
2026-02-13 8:06 ` Claude Code Review Bot
0 siblings, 2 replies; 6+ messages in thread
From: Tim Kovalenko via B4 Relay @ 2026-02-13 3:49 UTC (permalink / raw)
To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
Simona Vetter, Miguel Ojeda, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross
Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Tim Kovalenko
From: Tim Kovalenko <tim.kovalenko@proton.me>
The `Cmdq::new` function was allocating a `PteArray` struct on the
stack, resulting in a stack frame of 8216 bytes and causing a stack
overflow.
Remove the `PteArray` and instead calculate and write the Page Table
Entries directly into the coherent DMA buffer one by one, eliminating
the large on-stack array.
Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>
---
drivers/gpu/nova-core/gsp.rs | 50 ++++++++++++++-------------------------
drivers/gpu/nova-core/gsp/cmdq.rs | 27 ++++++++++++++++++---
2 files changed, 42 insertions(+), 35 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 174feaca0a6b9269cf35286dec3acc4d60918904..316eeaf87ec5ae67422a34426eefa747c9b6502b 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -2,16 +2,14 @@
mod boot;
+use core::iter::Iterator;
+
use kernel::{
device,
- dma::{
- CoherentAllocation,
- DmaAddress, //
- },
+ dma::CoherentAllocation,
dma_write,
pci,
- prelude::*,
- transmute::AsBytes, //
+ prelude::*, //
};
pub(crate) mod cmdq;
@@ -39,27 +37,6 @@
/// Number of GSP pages to use in a RM log buffer.
const RM_LOG_BUFFER_NUM_PAGES: usize = 0x10;
-/// Array of page table entries, as understood by the GSP bootloader.
-#[repr(C)]
-struct PteArray<const NUM_ENTRIES: usize>([u64; NUM_ENTRIES]);
-
-/// SAFETY: arrays of `u64` implement `AsBytes` and we are but a wrapper around one.
-unsafe impl<const NUM_ENTRIES: usize> AsBytes for PteArray<NUM_ENTRIES> {}
-
-impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
- /// Creates a new page table array mapping `NUM_PAGES` GSP pages starting at address `start`.
- fn new(start: DmaAddress) -> Result<Self> {
- let mut ptes = [0u64; NUM_PAGES];
- for (i, pte) in ptes.iter_mut().enumerate() {
- *pte = start
- .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
- .ok_or(EOVERFLOW)?;
- }
-
- Ok(Self(ptes))
- }
-}
-
/// The logging buffers are byte queues that contain encoded printf-like
/// messages from GSP-RM. They need to be decoded by a special application
/// that can parse the buffers.
@@ -86,16 +63,25 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
NUM_PAGES * GSP_PAGE_SIZE,
GFP_KERNEL | __GFP_ZERO,
)?);
- let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
+
+ let start_addr = obj.0.dma_handle();
// SAFETY: `obj` has just been created and we are its sole user.
- unsafe {
- // Copy the self-mapping PTE at the expected location.
+ let pte_region = unsafe {
obj.0
- .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
- .copy_from_slice(ptes.as_bytes())
+ .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
};
+ // As in [`DmaGspMem`], this is a one by one GSP Page write to the memory
+ // to avoid stack overflow when allocating the whole array at once.
+ for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
+ let pte_value = start_addr
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
+
+ chunk.copy_from_slice(&pte_value.to_ne_bytes());
+ }
+
Ok(obj)
}
}
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 46819a82a51adc58423502d9d45730923b843656..13a82d505c123e733850a00f627ddfe0c218940c 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -35,7 +35,6 @@
MsgqRxHeader,
MsgqTxHeader, //
},
- PteArray,
GSP_PAGE_SHIFT,
GSP_PAGE_SIZE, //
},
@@ -159,7 +158,7 @@ struct Msgq {
#[repr(C)]
struct GspMem {
/// Self-mapping page table entries.
- ptes: PteArray<{ GSP_PAGE_SIZE / size_of::<u64>() }>,
+ ptes: [u64; GSP_PAGE_SIZE / size_of::<u64>()],
/// CPU queue: the driver writes commands here, and the GSP reads them. It also contains the
/// write and read pointers that the CPU updates.
///
@@ -201,7 +200,29 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
let gsp_mem =
CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
- dma_write!(gsp_mem[0].ptes = PteArray::new(gsp_mem.dma_handle())?)?;
+ const NUM_PAGES: usize = GSP_PAGE_SIZE / size_of::<u64>();
+
+ // One by one GSP Page write to the memory to avoid stack overflow when allocating
+ // the whole array at once.
+ let item = gsp_mem.item_from_index(0)?;
+ for i in 0..NUM_PAGES {
+ let pte_value = gsp_mem
+ .dma_handle()
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
+
+ // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
+ // dereferenced. The compiler also further validates the expression on whether `field`
+ // is a member of `item` when expanded by the macro.
+ //
+ // Further, this is dma_write! macro expanded and modified to allow for individual
+ // page write.
+ unsafe {
+ let ptr_field = core::ptr::addr_of_mut!((*item).ptes[i]);
+ gsp_mem.field_write(ptr_field, pte_value);
+ }
+ }
+
dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF, MSGQ_NUM_PAGES))?;
dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?;
---
base-commit: cea7b66a80412e2a5b74627b89ae25f1d0110a4b
change-id: 20260212-drm-rust-next-beb92aee9d75
Best regards,
--
Tim Kovalenko <tim.kovalenko@proton.me>
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Claude review: gpu: nova-core: fix stack overflow in GSP memory allocation
2026-02-13 3:49 [PATCH] " Tim Kovalenko via B4 Relay
@ 2026-02-13 8:06 ` Claude Code Review Bot
2026-02-13 8:06 ` Claude Code Review Bot
1 sibling, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-02-13 8:06 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: gpu: nova-core: fix stack overflow in GSP memory allocation
Author: Tim Kovalenko via B4 Relay <devnull+tim.kovalenko.proton.me@kernel.org>
Patches: 2
Reviewed: 2026-02-13T18:06:46.808370
---
This is a single-patch fix for a stack overflow in the nova-core GPU driver's GSP memory allocation path. The `PteArray` struct, which allocated a large `[u64; N]` array on the stack (up to 8216 bytes for the command queue case with 512 entries), is removed and replaced with direct element-by-element writes into the coherent DMA buffer.
The approach is straightforward and addresses the problem at both call sites: `LogBuffer::new` in `gsp.rs` and `DmaGspMem::new` in `gsp/cmdq.rs`. The two sites use different techniques — `gsp.rs` obtains a mutable byte slice and writes via `copy_from_slice`, while `cmdq.rs` uses raw pointer arithmetic with `field_write` — which makes the code less consistent than it could be, though each approach is valid in its context.
There are a few issues: the patch has a `rustfmt` failure as flagged by the kernel test robot (indentation on the method chain in `cmdq.rs`), an unused import added to `gsp.rs`, and a misleading constant name. None are correctness bugs, but the rustfmt issue will block merging.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 6+ messages in thread
* Claude review: gpu: nova-core: fix stack overflow in GSP memory allocation
2026-02-13 3:49 [PATCH] " Tim Kovalenko via B4 Relay
2026-02-13 8:06 ` Claude review: " Claude Code Review Bot
@ 2026-02-13 8:06 ` Claude Code Review Bot
1 sibling, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-02-13 8:06 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**rustfmt failure**: The kernel test robot already reported this. The indentation of the `.dma_handle()` chain in `cmdq.rs` does not match `rustfmt` expectations:
> + for i in 0..NUM_PAGES {
> + let pte_value = gsp_mem
> + .dma_handle()
> + .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
> + .ok_or(EOVERFLOW)?;
The `.dma_handle()` has 20 spaces of indentation (double-indented) whereas `rustfmt` wants 16 (single continuation indent). This needs to be fixed before the patch can be merged.
**Unused import**: The patch adds `use core::iter::Iterator` to `gsp.rs`, but nothing in the file uses `Iterator` as a named type. The `Iterator` trait is already in scope via the prelude. Was this perhaps left over from an earlier iteration of the patch?
> +use core::iter::Iterator;
**Misleading constant name in cmdq.rs**: The locally-defined `NUM_PAGES` doesn't represent a number of pages — it's the number of PTE entries that fit in one GSP page:
> + const NUM_PAGES: usize = GSP_PAGE_SIZE / size_of::<u64>();
This evaluates to 512 (4096 / 8), which is the number of u64 entries, not pages. The name was inherited from the old `PteArray` generic parameter, which also misnamed it. Something like `NUM_PTES` or `PTE_COUNT` would be clearer and would match the existing `Cmdq::NUM_PTES` constant defined elsewhere in the same file. Not a correctness issue, but worth cleaning up since the code is being rewritten anyway.
**Comment typo in gsp.rs**: There's a double space in the comment:
> + // As in [`DmaGspMem`], this is a one by one GSP Page write to the memory
Minor nit: the doubled space in "a  one" should be a single space ("a one").
**Inconsistent approaches between the two call sites**: In `gsp.rs`, the PTE writes are done through a byte slice obtained from `as_slice_mut`, writing `to_ne_bytes()`. In `cmdq.rs`, the writes go through raw pointer arithmetic and `field_write`. Both work, but the `gsp.rs` approach is simpler and doesn't require unsafe code beyond the initial `as_slice_mut` call. Is there a reason the `cmdq.rs` site couldn't use the same pattern? The `LogBuffer` allocates `CoherentAllocation<u8>` so it naturally uses byte-level access, while `GspMem` is a typed allocation — but even so, the `field_write` approach seems more complex than necessary for writing individual array elements.
**Correctness**: Setting aside the style concerns, the logic in both call sites correctly reproduces the behavior of the removed `PteArray::new` — computing `base_dma_addr + i * GSP_PAGE_SIZE` for each PTE entry and writing it to the appropriate location in the DMA buffer. The overflow checking via `checked_add` is preserved. The error propagation with `?` on `ok_or(EOVERFLOW)` inside the loop correctly returns early if any PTE address overflows, matching the old behavior.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-03-10 2:10 UTC | newest]
Thread overview: 6+ messages
2026-02-13 19:40 [PATCH v2] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
2026-02-13 21:16 ` Claude review: " Claude Code Review Bot
2026-02-13 21:16 ` Claude Code Review Bot
-- strict thread matches above, loose matches on Subject: below --
2026-03-09 16:34 [PATCH v4 0/4] Fixes the stack overflow Tim Kovalenko via B4 Relay
2026-03-09 16:34 ` [PATCH v4 4/4] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
2026-03-10 2:10 ` Claude review: " Claude Code Review Bot
2026-02-13 3:49 [PATCH] " Tim Kovalenko via B4 Relay
2026-02-13 8:06 ` Claude review: " Claude Code Review Bot
2026-02-13 8:06 ` Claude Code Review Bot