* [PATCH] gpu: nova-core: fix stack overflow in GSP memory allocation
@ 2026-02-13 3:49 Tim Kovalenko via B4 Relay
2026-02-13 6:56 ` kernel test robot
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Tim Kovalenko via B4 Relay @ 2026-02-13 3:49 UTC (permalink / raw)
To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
Simona Vetter, Miguel Ojeda, Boqun Feng, Gary Guo,
Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross
Cc: nouveau, dri-devel, linux-kernel, rust-for-linux, Tim Kovalenko
From: Tim Kovalenko <tim.kovalenko@proton.me>
The `Cmdq::new` function was allocating a `PteArray` struct on the stack
and was causing a stack overflow with 8216 bytes.
Remove the `PteArray` and instead calculate and write the Page Table
Entries directly into the coherent DMA buffer one-by-one. This reduces
the stack usage quite a lot.
Signed-off-by: Tim Kovalenko <tim.kovalenko@proton.me>
---
drivers/gpu/nova-core/gsp.rs | 50 ++++++++++++++-------------------------
drivers/gpu/nova-core/gsp/cmdq.rs | 27 ++++++++++++++++++---
2 files changed, 42 insertions(+), 35 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 174feaca0a6b9269cf35286dec3acc4d60918904..316eeaf87ec5ae67422a34426eefa747c9b6502b 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -2,16 +2,14 @@
mod boot;
+use core::iter::Iterator;
+
use kernel::{
device,
- dma::{
- CoherentAllocation,
- DmaAddress, //
- },
+ dma::CoherentAllocation,
dma_write,
pci,
- prelude::*,
- transmute::AsBytes, //
+ prelude::*, //
};
pub(crate) mod cmdq;
@@ -39,27 +37,6 @@
/// Number of GSP pages to use in a RM log buffer.
const RM_LOG_BUFFER_NUM_PAGES: usize = 0x10;
-/// Array of page table entries, as understood by the GSP bootloader.
-#[repr(C)]
-struct PteArray<const NUM_ENTRIES: usize>([u64; NUM_ENTRIES]);
-
-/// SAFETY: arrays of `u64` implement `AsBytes` and we are but a wrapper around one.
-unsafe impl<const NUM_ENTRIES: usize> AsBytes for PteArray<NUM_ENTRIES> {}
-
-impl<const NUM_PAGES: usize> PteArray<NUM_PAGES> {
- /// Creates a new page table array mapping `NUM_PAGES` GSP pages starting at address `start`.
- fn new(start: DmaAddress) -> Result<Self> {
- let mut ptes = [0u64; NUM_PAGES];
- for (i, pte) in ptes.iter_mut().enumerate() {
- *pte = start
- .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
- .ok_or(EOVERFLOW)?;
- }
-
- Ok(Self(ptes))
- }
-}
-
/// The logging buffers are byte queues that contain encoded printf-like
/// messages from GSP-RM. They need to be decoded by a special application
/// that can parse the buffers.
@@ -86,16 +63,25 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
NUM_PAGES * GSP_PAGE_SIZE,
GFP_KERNEL | __GFP_ZERO,
)?);
- let ptes = PteArray::<NUM_PAGES>::new(obj.0.dma_handle())?;
+
+ let start_addr = obj.0.dma_handle();
// SAFETY: `obj` has just been created and we are its sole user.
- unsafe {
- // Copy the self-mapping PTE at the expected location.
+ let pte_region = unsafe {
obj.0
- .as_slice_mut(size_of::<u64>(), size_of_val(&ptes))?
- .copy_from_slice(ptes.as_bytes())
+ .as_slice_mut(size_of::<u64>(), NUM_PAGES * size_of::<u64>())?
};
+ // As in [`DmaGspMem`], this is a one by one GSP Page write to the memory
+ // to avoid stack overflow when allocating the whole array at once.
+ for (i, chunk) in pte_region.chunks_exact_mut(size_of::<u64>()).enumerate() {
+ let pte_value = start_addr
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
+
+ chunk.copy_from_slice(&pte_value.to_ne_bytes());
+ }
+
Ok(obj)
}
}
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 46819a82a51adc58423502d9d45730923b843656..13a82d505c123e733850a00f627ddfe0c218940c 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -35,7 +35,6 @@
MsgqRxHeader,
MsgqTxHeader, //
},
- PteArray,
GSP_PAGE_SHIFT,
GSP_PAGE_SIZE, //
},
@@ -159,7 +158,7 @@ struct Msgq {
#[repr(C)]
struct GspMem {
/// Self-mapping page table entries.
- ptes: PteArray<{ GSP_PAGE_SIZE / size_of::<u64>() }>,
+ ptes: [u64; GSP_PAGE_SIZE / size_of::<u64>()],
/// CPU queue: the driver writes commands here, and the GSP reads them. It also contains the
/// write and read pointers that the CPU updates.
///
@@ -201,7 +200,29 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
let gsp_mem =
CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
- dma_write!(gsp_mem[0].ptes = PteArray::new(gsp_mem.dma_handle())?)?;
+ const NUM_PAGES: usize = GSP_PAGE_SIZE / size_of::<u64>();
+
+ // One by one GSP Page write to the memory to avoid stack overflow when allocating
+ // the whole array at once.
+ let item = gsp_mem.item_from_index(0)?;
+ for i in 0..NUM_PAGES {
+ let pte_value = gsp_mem
+ .dma_handle()
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
+
+ // SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
+ // dereferenced. The compiler also further validates the expression on whether `field`
+ // is a member of `item` when expanded by the macro.
+ //
+ // Further, this is dma_write! macro expanded and modified to allow for individual
+ // page write.
+ unsafe {
+ let ptr_field = core::ptr::addr_of_mut!((*item).ptes[i]);
+ gsp_mem.field_write(ptr_field, pte_value);
+ }
+ }
+
dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF, MSGQ_NUM_PAGES))?;
dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?;
---
base-commit: cea7b66a80412e2a5b74627b89ae25f1d0110a4b
change-id: 20260212-drm-rust-next-beb92aee9d75
Best regards,
--
Tim Kovalenko <tim.kovalenko@proton.me>
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] gpu: nova-core: fix stack overflow in GSP memory allocation
2026-02-13 3:49 [PATCH] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
@ 2026-02-13 6:56 ` kernel test robot
2026-02-13 8:06 ` Claude review: " Claude Code Review Bot
2026-02-13 8:06 ` Claude Code Review Bot
2 siblings, 0 replies; 4+ messages in thread
From: kernel test robot @ 2026-02-13 6:56 UTC (permalink / raw)
To: Tim Kovalenko via B4 Relay, Alexandre Courbot, Danilo Krummrich,
Alice Ryhl, David Airlie, Simona Vetter, Miguel Ojeda, Boqun Feng,
Gary Guo, Björn Roy Baron, Benno Lossin, Andreas Hindborg,
Trevor Gross
Cc: oe-kbuild-all, nouveau, dri-devel, linux-kernel, rust-for-linux,
Tim Kovalenko
Hi Tim,
kernel test robot noticed the following build errors:
[auto build test ERROR on cea7b66a80412e2a5b74627b89ae25f1d0110a4b]
url: https://github.com/intel-lab-lkp/linux/commits/Tim-Kovalenko-via-B4-Relay/gpu-nova-core-fix-stack-overflow-in-GSP-memory-allocation/20260213-115022
base: cea7b66a80412e2a5b74627b89ae25f1d0110a4b
patch link: https://lore.kernel.org/r/20260212-drm-rust-next-v1-1-409398b12e61%40proton.me
patch subject: [PATCH] gpu: nova-core: fix stack overflow in GSP memory allocation
config: x86_64-rhel-9.4-rust (https://download.01.org/0day-ci/archive/20260213/202602130722.99jbGBM1-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260213/202602130722.99jbGBM1-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202602130722.99jbGBM1-lkp@intel.com/
All errors (new ones prefixed by >>):
PATH=/opt/cross/clang-20/bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
INFO PATH=/opt/cross/rustc-1.88.0-bindgen-0.72.1/cargo/bin:/opt/cross/clang-20/bin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
/usr/bin/timeout -k 100 12h /usr/bin/make KCFLAGS=\ -fno-crash-diagnostics\ -Wno-error=return-type\ -Wreturn-type\ -funsigned-char\ -Wundef\ -falign-functions=64 W=1 --keep-going LLVM=1 -j32 -C source O=/kbuild/obj/consumer/x86_64-rhel-9.4-rust ARCH=x86_64 SHELL=/bin/bash rustfmtcheck
make: Entering directory '/kbuild/src/consumer'
make[1]: Entering directory '/kbuild/obj/consumer/x86_64-rhel-9.4-rust'
>> Diff in drivers/gpu/nova-core/gsp/cmdq.rs:207:
let item = gsp_mem.item_from_index(0)?;
for i in 0..NUM_PAGES {
let pte_value = gsp_mem
- .dma_handle()
- .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
- .ok_or(EOVERFLOW)?;
+ .dma_handle()
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
// SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
// dereferenced. The compiler also further validates the expression on whether `field`
>> Diff in drivers/gpu/nova-core/gsp/cmdq.rs:207:
let item = gsp_mem.item_from_index(0)?;
for i in 0..NUM_PAGES {
let pte_value = gsp_mem
- .dma_handle()
- .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
- .ok_or(EOVERFLOW)?;
+ .dma_handle()
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
// SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
// dereferenced. The compiler also further validates the expression on whether `field`
>> Diff in drivers/gpu/nova-core/gsp/cmdq.rs:207:
let item = gsp_mem.item_from_index(0)?;
for i in 0..NUM_PAGES {
let pte_value = gsp_mem
- .dma_handle()
- .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
- .ok_or(EOVERFLOW)?;
+ .dma_handle()
+ .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
+ .ok_or(EOVERFLOW)?;
// SAFETY: `item_from_index` ensures that `item` is always a valid pointer and can be
// dereferenced. The compiler also further validates the expression on whether `field`
make[2]: *** [Makefile:1871: rustfmt] Error 123
make[1]: Leaving directory '/kbuild/obj/consumer/x86_64-rhel-9.4-rust'
make[2]: Target 'rustfmtcheck' not remade because of errors.
make[1]: *** [Makefile:248: __sub-make] Error 2
make[1]: Target 'rustfmtcheck' not remade because of errors.
make: *** [Makefile:248: __sub-make] Error 2
make: Target 'rustfmtcheck' not remade because of errors.
make: Leaving directory '/kbuild/src/consumer'
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 4+ messages in thread
* Claude review: gpu: nova-core: fix stack overflow in GSP memory allocation
2026-02-13 3:49 [PATCH] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
2026-02-13 6:56 ` kernel test robot
@ 2026-02-13 8:06 ` Claude Code Review Bot
2026-02-13 8:06 ` Claude Code Review Bot
2 siblings, 0 replies; 4+ messages in thread
From: Claude Code Review Bot @ 2026-02-13 8:06 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: gpu: nova-core: fix stack overflow in GSP memory allocation
Author: Tim Kovalenko via B4 Relay <devnull+tim.kovalenko.proton.me@kernel.org>
Patches: 2
Reviewed: 2026-02-13T18:06:46.808370
---
This is a single-patch fix for a stack overflow in the nova-core GPU driver's GSP memory allocation path. The `PteArray` struct, which allocated a large `[u64; N]` array on the stack (up to 8216 bytes for the command queue case with 512 entries), is removed and replaced with direct element-by-element writes into the coherent DMA buffer.
The approach is straightforward and addresses the problem at both call sites: `LogBuffer::new` in `gsp.rs` and `DmaGspMem::new` in `gsp/cmdq.rs`. The two sites use different techniques — `gsp.rs` obtains a mutable byte slice and writes via `copy_from_slice`, while `cmdq.rs` uses raw pointer arithmetic with `field_write` — which makes the code less consistent than it could be, though each approach is valid in its context.
There are a few issues: the patch has a `rustfmt` failure as flagged by the kernel test robot (indentation on the method chain in `cmdq.rs`), an unused import added to `gsp.rs`, and a misleading constant name. None are correctness bugs, but the rustfmt issue will block merging.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 4+ messages in thread
* Claude review: gpu: nova-core: fix stack overflow in GSP memory allocation
2026-02-13 3:49 [PATCH] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
2026-02-13 6:56 ` kernel test robot
2026-02-13 8:06 ` Claude review: " Claude Code Review Bot
@ 2026-02-13 8:06 ` Claude Code Review Bot
2 siblings, 0 replies; 4+ messages in thread
From: Claude Code Review Bot @ 2026-02-13 8:06 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**rustfmt failure**: The kernel test robot already reported this. The indentation of the `.dma_handle()` chain in `cmdq.rs` does not match `rustfmt` expectations:
> + for i in 0..NUM_PAGES {
> + let pte_value = gsp_mem
> + .dma_handle()
> + .checked_add(num::usize_as_u64(i) << GSP_PAGE_SHIFT)
> + .ok_or(EOVERFLOW)?;
The `.dma_handle()` has 20 spaces of indentation (double-indented) whereas `rustfmt` wants 16 (single continuation indent). This needs to be fixed before the patch can be merged.
**Unused import**: The patch adds `use core::iter::Iterator` to `gsp.rs`, but nothing in the file uses `Iterator` as a named type. The `Iterator` trait is already in scope via the prelude. Was this perhaps left over from an earlier iteration of the patch?
> +use core::iter::Iterator;
**Misleading constant name in cmdq.rs**: The locally-defined `NUM_PAGES` doesn't represent a number of pages — it's the number of PTE entries that fit in one GSP page:
> + const NUM_PAGES: usize = GSP_PAGE_SIZE / size_of::<u64>();
This evaluates to 512 (4096 / 8), which is the number of u64 entries, not pages. The name was inherited from the old `PteArray` generic parameter, which also misnamed it. Something like `NUM_PTES` or `PTE_COUNT` would be clearer and would match the existing `Cmdq::NUM_PTES` constant defined elsewhere in the same file. Not a correctness issue, but worth cleaning up since the code is being rewritten anyway.
**Comment typo in gsp.rs**: There's a double space in the comment:
> + // As in [`DmaGspMem`], this is a one by one GSP Page write to the memory
Minor nit — "a one" should be "a one".
**Inconsistent approaches between the two call sites**: In `gsp.rs`, the PTE writes are done through a byte slice obtained from `as_slice_mut`, writing `to_ne_bytes()`. In `cmdq.rs`, the writes go through raw pointer arithmetic and `field_write`. Both work, but the `gsp.rs` approach is simpler and doesn't require unsafe code beyond the initial `as_slice_mut` call. Is there a reason the `cmdq.rs` site couldn't use the same pattern? The `LogBuffer` allocates `CoherentAllocation<u8>` so it naturally uses byte-level access, while `GspMem` is a typed allocation — but even so, the `field_write` approach seems more complex than necessary for writing individual array elements.
**Correctness**: Setting aside the style concerns, the logic in both call sites correctly reproduces the behavior of the removed `PteArray::new` — computing `base_dma_addr + i * GSP_PAGE_SIZE` for each PTE entry and writing it to the appropriate location in the DMA buffer. The overflow checking via `checked_add` is preserved. The error propagation with `?` on `ok_or(EOVERFLOW)` inside the loop correctly returns early if any PTE address overflows, matching the old behavior.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-02-13 8:06 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-13 3:49 [PATCH] gpu: nova-core: fix stack overflow in GSP memory allocation Tim Kovalenko via B4 Relay
2026-02-13 6:56 ` kernel test robot
2026-02-13 8:06 ` Claude review: " Claude Code Review Bot
2026-02-13 8:06 ` Claude Code Review Bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox