public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code
@ 2026-03-19  5:36 Alexandre Courbot
  2026-03-19  6:41 ` Eliot Courtney
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Alexandre Courbot @ 2026-03-19  5:36 UTC (permalink / raw)
  To: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Alistair Popple
  Cc: John Hubbard, Joel Fernandes, Timur Tabi, Zhi Wang,
	Eliot Courtney, rust-for-linux, dri-devel, linux-kernel,
	Alexandre Courbot

`driver_read_area` and `driver_write_area` are internal methods that
return slices containing the area of the command queue buffer that the
driver has exclusive read of write access, respectively.

While their returned value is correct and safe to use, internally they
temporarily create a reference to the whole command-buffer slice,
including GSP-owned regions. These regions can change without notice,
and thus creating a slice to them is undefined behavior.

Fix this by replacing the slice logic with pointer arithmetic and
creating slices to valid regions only. It relies on unsafe code, but
should be mostly replaced by `IoView` and `IoSlice` once they land.

Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Link: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp/cmdq.rs | 135 ++++++++++++++++++++++++++++----------
 1 file changed, 100 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index d36a62ba1c60..4200e7986774 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -251,38 +251,77 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
     /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
     /// that case the second slice will have a non-zero length.
     fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
-        let tx = self.cpu_write_ptr() as usize;
-        let rx = self.gsp_read_ptr() as usize;
+        let tx = num::u32_as_usize(self.cpu_write_ptr());
+        let rx = num::u32_as_usize(self.gsp_read_ptr());
+        // Number of pages between `tx` and the end of the command queue.
+        // PANIC: Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
+        let after_tx_len = num::u32_as_usize(MSGQ_NUM_PAGES) - tx;
 
+        // Pointer to the start of the CPU message queue.
+        //
         // SAFETY:
-        // - The `CoherentAllocation` contains exactly one object.
-        // - We will only access the driver-owned part of the shared memory.
-        // - Per the safety statement of the function, no concurrent access will be performed.
-        let gsp_mem = &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0];
-        // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
-        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
+        // - `self.0` contains exactly one element.
+        // - `cpuq.msgq.data[0]` is within the bounds of that element.
+        let data = unsafe { &raw mut (*self.0.start_ptr_mut()).cpuq.msgq.data[0] };
 
-        // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
-        // belongs to the driver for writing.
+        // Safety/Panic comments to be referenced by the code below.
+        //
+        // SAFETY[1]:
+        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
+        // - The area starting at `tx` and ending at `rx - 2` modulo `MSGQ_NUM_PAGES`,
+        //   inclusive, belongs to the driver for writing and is not accessed concurrently by
+        //   the GSP.
+        // - `tx + after_tx_len` == `MSGQ_NUM_PAGES`.
+        //
+        // PANIC[1]:
+        // - Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
+        // - Per the invariant of `gsp_read_ptr`, `rx < MSGQ_NUM_PAGES`.
 
         if rx == 0 {
-            // Since `rx` is zero, leave an empty slot at end of the buffer.
-            let last = after_tx.len() - 1;
-            (&mut after_tx[..last], &mut [])
+            (
+                // SAFETY: See SAFETY[1].
+                unsafe {
+                    core::slice::from_raw_parts_mut(
+                        data.add(tx),
+                        // Since `rx` is zero, leave an empty slot at end of the buffer.
+                        // PANIC: See PANIC[1].
+                        after_tx_len - 1,
+                    )
+                },
+                &mut [],
+            )
         } else if rx <= tx {
             // The area is discontiguous and we leave an empty slot before `rx`.
-            // PANIC:
-            // - The index `rx - 1` is non-negative because `rx != 0` in this branch.
-            // - The index does not exceed `before_tx.len()` (which equals `tx`) because
-            //   `rx <= tx` in this branch.
-            (after_tx, &mut before_tx[..(rx - 1)])
+            (
+                // SAFETY: See SAFETY[1].
+                unsafe { core::slice::from_raw_parts_mut(data.add(tx), after_tx_len) },
+                // SAFETY: See SAFETY[1].
+                unsafe {
+                    core::slice::from_raw_parts_mut(
+                        data,
+                        // Leave one empty slot before `rx`.
+                        // PANIC:
+                        // - See PANIC[1].
+                        // - `rx - 1` is non-negative because `rx != 0` in this branch.
+                        rx - 1,
+                    )
+                },
+            )
         } else {
             // The area is contiguous and we leave an empty slot before `rx`.
-            // PANIC:
-            // - The index `rx - tx - 1` is non-negative because `rx > tx` in this branch.
-            // - The index does not exceed `after_tx.len()` (which is `MSGQ_NUM_PAGES - tx`)
-            //   because `rx < MSGQ_NUM_PAGES` by the `gsp_read_ptr` invariant.
-            (&mut after_tx[..(rx - tx - 1)], &mut [])
+            (
+                // SAFETY: See SAFETY[1].
+                unsafe {
+                    core::slice::from_raw_parts_mut(
+                        data.add(tx),
+                        // PANIC:
+                        // - See PANIC[1].
+                        // - `rx - tx - 1` is non-negative because `rx > tx` in this branch.
+                        rx - tx - 1,
+                    )
+                },
+                &mut [],
+            )
         }
     }
 
@@ -308,24 +347,50 @@ fn driver_write_area_size(&self) -> usize {
         let tx = self.gsp_write_ptr() as usize;
         let rx = self.cpu_read_ptr() as usize;
 
+        // Pointer to the start of the GSP message queue.
+        //
         // SAFETY:
-        // - The `CoherentAllocation` contains exactly one object.
-        // - We will only access the driver-owned part of the shared memory.
-        // - Per the safety statement of the function, no concurrent access will be performed.
-        let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
-        let data = &gsp_mem.gspq.msgq.data;
+        // - `self.0` contains exactly one element.
+        // - `gspq.msgq.data[0]` is within the bounds of that element.
+        let data = unsafe { &raw const (*self.0.start_ptr()).gspq.msgq.data[0] };
+
+        // Safety/Panic comments to be referenced by the code below.
+        //
+        // SAFETY[1]:
+        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
+        // - The area starting at `rx` and ending at `tx - 1` modulo `MSGQ_NUM_PAGES`,
+        //   inclusive, belongs to the driver for reading and is not accessed concurrently by
+        //   the GSP.
+        //
+        // PANIC[1]:
+        // - Per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`.
+        // - Per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`.
 
-        // The area starting at `rx` and ending at `tx - 1` modulo MSGQ_NUM_PAGES, inclusive,
-        // belongs to the driver for reading.
-        // PANIC:
-        // - per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`
-        // - per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`
         if rx <= tx {
             // The area is contiguous.
-            (&data[rx..tx], &[])
+            (
+                // SAFETY: See SAFETY[1].
+                // PANIC:
+                // - See PANIC[1].
+                // - Per the branch test, `rx <= tx`.
+                unsafe { core::slice::from_raw_parts(data.add(rx), tx - rx) },
+                &[],
+            )
         } else {
             // The area is discontiguous.
-            (&data[rx..], &data[..tx])
+            (
+                // SAFETY: See SAFETY[1].
+                // PANIC: See PANIC[1].
+                unsafe {
+                    core::slice::from_raw_parts(
+                        data.add(rx),
+                        num::u32_as_usize(MSGQ_NUM_PAGES) - rx,
+                    )
+                },
+                // SAFETY: See SAFETY[1].
+                // PANIC: See PANIC[1].
+                unsafe { core::slice::from_raw_parts(data, tx) },
+            )
         }
     }
 

---
base-commit: a19457958c3018783881c4416f272cd594f13049
change-id: 20260319-cmdq-ub-fix-d57b09a745b9

Best regards,
-- 
Alexandre Courbot <acourbot@nvidia.com>


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code
  2026-03-19  5:36 [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code Alexandre Courbot
@ 2026-03-19  6:41 ` Eliot Courtney
  2026-03-19  8:24   ` Alexandre Courbot
  2026-03-20 12:54 ` Danilo Krummrich
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Eliot Courtney @ 2026-03-19  6:41 UTC (permalink / raw)
  To: Alexandre Courbot, Danilo Krummrich, Alice Ryhl, David Airlie,
	Simona Vetter, Alistair Popple
  Cc: John Hubbard, Joel Fernandes, Timur Tabi, Zhi Wang,
	Eliot Courtney, rust-for-linux, dri-devel, linux-kernel

On Thu Mar 19, 2026 at 2:36 PM JST, Alexandre Courbot wrote:
> `driver_read_area` and `driver_write_area` are internal methods that
> return slices containing the area of the command queue buffer that the
> driver has exclusive read of write access, respectively.
>
> While their returned value is correct and safe to use, internally they
> temporarily create a reference to the whole command-buffer slice,
> including GSP-owned regions. These regions can change without notice,
> and thus creating a slice to them is undefined behavior.
>
> Fix this by replacing the slice logic with pointer arithmetic and
> creating slices to valid regions only. It relies on unsafe code, but
> should be mostly replaced by `IoView` and `IoSlice` once they land.
>
> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
> Suggested-by: Danilo Krummrich <dakr@kernel.org>
> Link: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp/cmdq.rs | 135 ++++++++++++++++++++++++++++----------
>  1 file changed, 100 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index d36a62ba1c60..4200e7986774 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -251,38 +251,77 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>      /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
>      /// that case the second slice will have a non-zero length.
>      fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
> -        let tx = self.cpu_write_ptr() as usize;
> -        let rx = self.gsp_read_ptr() as usize;
> +        let tx = num::u32_as_usize(self.cpu_write_ptr());
> +        let rx = num::u32_as_usize(self.gsp_read_ptr());
> +        // Number of pages between `tx` and the end of the command queue.
> +        // PANIC: Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
> +        let after_tx_len = num::u32_as_usize(MSGQ_NUM_PAGES) - tx;
>  
> +        // Pointer to the start of the CPU message queue.
> +        //
>          // SAFETY:
> -        // - The `CoherentAllocation` contains exactly one object.
> -        // - We will only access the driver-owned part of the shared memory.
> -        // - Per the safety statement of the function, no concurrent access will be performed.
> -        let gsp_mem = &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0];
> -        // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
> -        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
> +        // - `self.0` contains exactly one element.
> +        // - `cpuq.msgq.data[0]` is within the bounds of that element.
> +        let data = unsafe { &raw mut (*self.0.start_ptr_mut()).cpuq.msgq.data[0] };
>  
> -        // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
> -        // belongs to the driver for writing.
> +        // Safety/Panic comments to be referenced by the code below.
> +        //
> +        // SAFETY[1]:
> +        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
> +        // - The area starting at `tx` and ending at `rx - 2` modulo `MSGQ_NUM_PAGES`,
> +        //   inclusive, belongs to the driver for writing and is not accessed concurrently by
> +        //   the GSP.
> +        // - `tx + after_tx_len` == `MSGQ_NUM_PAGES`.
> +        //
> +        // PANIC[1]:
> +        // - Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
> +        // - Per the invariant of `gsp_read_ptr`, `rx < MSGQ_NUM_PAGES`.
>  
>          if rx == 0 {
> -            // Since `rx` is zero, leave an empty slot at end of the buffer.
> -            let last = after_tx.len() - 1;
> -            (&mut after_tx[..last], &mut [])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data.add(tx),
> +                        // Since `rx` is zero, leave an empty slot at end of the buffer.
> +                        // PANIC: See PANIC[1].
> +                        after_tx_len - 1,
> +                    )
> +                },
> +                &mut [],
> +            )
>          } else if rx <= tx {
>              // The area is discontiguous and we leave an empty slot before `rx`.
> -            // PANIC:
> -            // - The index `rx - 1` is non-negative because `rx != 0` in this branch.
> -            // - The index does not exceed `before_tx.len()` (which equals `tx`) because
> -            //   `rx <= tx` in this branch.
> -            (after_tx, &mut before_tx[..(rx - 1)])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe { core::slice::from_raw_parts_mut(data.add(tx), after_tx_len) },
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data,
> +                        // Leave one empty slot before `rx`.
> +                        // PANIC:
> +                        // - See PANIC[1].
> +                        // - `rx - 1` is non-negative because `rx != 0` in this branch.
> +                        rx - 1,
> +                    )
> +                },
> +            )
>          } else {
>              // The area is contiguous and we leave an empty slot before `rx`.
> -            // PANIC:
> -            // - The index `rx - tx - 1` is non-negative because `rx > tx` in this branch.
> -            // - The index does not exceed `after_tx.len()` (which is `MSGQ_NUM_PAGES - tx`)
> -            //   because `rx < MSGQ_NUM_PAGES` by the `gsp_read_ptr` invariant.
> -            (&mut after_tx[..(rx - tx - 1)], &mut [])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data.add(tx),
> +                        // PANIC:
> +                        // - See PANIC[1].
> +                        // - `rx - tx - 1` is non-negative because `rx > tx` in this branch.
> +                        rx - tx - 1,
> +                    )
> +                },
> +                &mut [],
> +            )
>          }
>      }
>  
> @@ -308,24 +347,50 @@ fn driver_write_area_size(&self) -> usize {
>          let tx = self.gsp_write_ptr() as usize;
>          let rx = self.cpu_read_ptr() as usize;

Should we use u32_as_usize here too?

Reviewed-by: Eliot Courtney <ecourtney@nvidia.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code
  2026-03-19  6:41 ` Eliot Courtney
@ 2026-03-19  8:24   ` Alexandre Courbot
  0 siblings, 0 replies; 6+ messages in thread
From: Alexandre Courbot @ 2026-03-19  8:24 UTC (permalink / raw)
  To: Eliot Courtney
  Cc: Danilo Krummrich, Alice Ryhl, David Airlie, Simona Vetter,
	Alistair Popple, John Hubbard, Joel Fernandes, Timur Tabi,
	Zhi Wang, rust-for-linux, dri-devel, linux-kernel

On Thu Mar 19, 2026 at 3:41 PM JST, Eliot Courtney wrote:
> On Thu Mar 19, 2026 at 2:36 PM JST, Alexandre Courbot wrote:
>> `driver_read_area` and `driver_write_area` are internal methods that
>> return slices containing the area of the command queue buffer that the
>> driver has exclusive read of write access, respectively.
>>
>> While their returned value is correct and safe to use, internally they
>> temporarily create a reference to the whole command-buffer slice,
>> including GSP-owned regions. These regions can change without notice,
>> and thus creating a slice to them is undefined behavior.
>>
>> Fix this by replacing the slice logic with pointer arithmetic and
>> creating slices to valid regions only. It relies on unsafe code, but
>> should be mostly replaced by `IoView` and `IoSlice` once they land.
>>
>> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
>> Suggested-by: Danilo Krummrich <dakr@kernel.org>
>> Link: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  drivers/gpu/nova-core/gsp/cmdq.rs | 135 ++++++++++++++++++++++++++++----------
>>  1 file changed, 100 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>> index d36a62ba1c60..4200e7986774 100644
>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>> @@ -251,38 +251,77 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>>      /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
>>      /// that case the second slice will have a non-zero length.
>>      fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
>> -        let tx = self.cpu_write_ptr() as usize;
>> -        let rx = self.gsp_read_ptr() as usize;
>> +        let tx = num::u32_as_usize(self.cpu_write_ptr());
>> +        let rx = num::u32_as_usize(self.gsp_read_ptr());
>> +        // Number of pages between `tx` and the end of the command queue.
>> +        // PANIC: Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
>> +        let after_tx_len = num::u32_as_usize(MSGQ_NUM_PAGES) - tx;
>>  
>> +        // Pointer to the start of the CPU message queue.
>> +        //
>>          // SAFETY:
>> -        // - The `CoherentAllocation` contains exactly one object.
>> -        // - We will only access the driver-owned part of the shared memory.
>> -        // - Per the safety statement of the function, no concurrent access will be performed.
>> -        let gsp_mem = &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0];
>> -        // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
>> -        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
>> +        // - `self.0` contains exactly one element.
>> +        // - `cpuq.msgq.data[0]` is within the bounds of that element.
>> +        let data = unsafe { &raw mut (*self.0.start_ptr_mut()).cpuq.msgq.data[0] };
>>  
>> -        // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
>> -        // belongs to the driver for writing.
>> +        // Safety/Panic comments to be referenced by the code below.
>> +        //
>> +        // SAFETY[1]:
>> +        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
>> +        // - The area starting at `tx` and ending at `rx - 2` modulo `MSGQ_NUM_PAGES`,
>> +        //   inclusive, belongs to the driver for writing and is not accessed concurrently by
>> +        //   the GSP.
>> +        // - `tx + after_tx_len` == `MSGQ_NUM_PAGES`.
>> +        //
>> +        // PANIC[1]:
>> +        // - Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
>> +        // - Per the invariant of `gsp_read_ptr`, `rx < MSGQ_NUM_PAGES`.
>>  
>>          if rx == 0 {
>> -            // Since `rx` is zero, leave an empty slot at end of the buffer.
>> -            let last = after_tx.len() - 1;
>> -            (&mut after_tx[..last], &mut [])
>> +            (
>> +                // SAFETY: See SAFETY[1].
>> +                unsafe {
>> +                    core::slice::from_raw_parts_mut(
>> +                        data.add(tx),
>> +                        // Since `rx` is zero, leave an empty slot at end of the buffer.
>> +                        // PANIC: See PANIC[1].
>> +                        after_tx_len - 1,
>> +                    )
>> +                },
>> +                &mut [],
>> +            )
>>          } else if rx <= tx {
>>              // The area is discontiguous and we leave an empty slot before `rx`.
>> -            // PANIC:
>> -            // - The index `rx - 1` is non-negative because `rx != 0` in this branch.
>> -            // - The index does not exceed `before_tx.len()` (which equals `tx`) because
>> -            //   `rx <= tx` in this branch.
>> -            (after_tx, &mut before_tx[..(rx - 1)])
>> +            (
>> +                // SAFETY: See SAFETY[1].
>> +                unsafe { core::slice::from_raw_parts_mut(data.add(tx), after_tx_len) },
>> +                // SAFETY: See SAFETY[1].
>> +                unsafe {
>> +                    core::slice::from_raw_parts_mut(
>> +                        data,
>> +                        // Leave one empty slot before `rx`.
>> +                        // PANIC:
>> +                        // - See PANIC[1].
>> +                        // - `rx - 1` is non-negative because `rx != 0` in this branch.
>> +                        rx - 1,
>> +                    )
>> +                },
>> +            )
>>          } else {
>>              // The area is contiguous and we leave an empty slot before `rx`.
>> -            // PANIC:
>> -            // - The index `rx - tx - 1` is non-negative because `rx > tx` in this branch.
>> -            // - The index does not exceed `after_tx.len()` (which is `MSGQ_NUM_PAGES - tx`)
>> -            //   because `rx < MSGQ_NUM_PAGES` by the `gsp_read_ptr` invariant.
>> -            (&mut after_tx[..(rx - tx - 1)], &mut [])
>> +            (
>> +                // SAFETY: See SAFETY[1].
>> +                unsafe {
>> +                    core::slice::from_raw_parts_mut(
>> +                        data.add(tx),
>> +                        // PANIC:
>> +                        // - See PANIC[1].
>> +                        // - `rx - tx - 1` is non-negative because `rx > tx` in this branch.
>> +                        rx - tx - 1,
>> +                    )
>> +                },
>> +                &mut [],
>> +            )
>>          }
>>      }
>>  
>> @@ -308,24 +347,50 @@ fn driver_write_area_size(&self) -> usize {
>>          let tx = self.gsp_write_ptr() as usize;
>>          let rx = self.cpu_read_ptr() as usize;
>
> Should we use u32_as_usize here too?

Absolutely - I forgot to update that part (although arguably that could
be a separate patch).

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code
  2026-03-19  5:36 [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code Alexandre Courbot
  2026-03-19  6:41 ` Eliot Courtney
@ 2026-03-20 12:54 ` Danilo Krummrich
  2026-03-21 18:48 ` Claude review: " Claude Code Review Bot
  2026-03-21 18:48 ` Claude Code Review Bot
  3 siblings, 0 replies; 6+ messages in thread
From: Danilo Krummrich @ 2026-03-20 12:54 UTC (permalink / raw)
  To: Alexandre Courbot
  Cc: Alice Ryhl, David Airlie, Simona Vetter, Alistair Popple,
	John Hubbard, Joel Fernandes, Timur Tabi, Zhi Wang,
	Eliot Courtney, rust-for-linux, dri-devel, linux-kernel

On Thu Mar 19, 2026 at 6:36 AM CET, Alexandre Courbot wrote:
> `driver_read_area` and `driver_write_area` are internal methods that
> return slices containing the area of the command queue buffer that the
> driver has exclusive read of write access, respectively.
>
> While their returned value is correct and safe to use, internally they
> temporarily create a reference to the whole command-buffer slice,
> including GSP-owned regions. These regions can change without notice,
> and thus creating a slice to them is undefined behavior.
>
> Fix this by replacing the slice logic with pointer arithmetic and
> creating slices to valid regions only. It relies on unsafe code, but
> should be mostly replaced by `IoView` and `IoSlice` once they land.
>
> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
> Suggested-by: Danilo Krummrich <dakr@kernel.org>

Should be Reported-by:.

> Link: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/

Should be Closes:.

> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/gsp/cmdq.rs | 135 ++++++++++++++++++++++++++++----------
>  1 file changed, 100 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index d36a62ba1c60..4200e7986774 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -251,38 +251,77 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>      /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
>      /// that case the second slice will have a non-zero length.
>      fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
> -        let tx = self.cpu_write_ptr() as usize;
> -        let rx = self.gsp_read_ptr() as usize;
> +        let tx = num::u32_as_usize(self.cpu_write_ptr());
> +        let rx = num::u32_as_usize(self.gsp_read_ptr());
> +        // Number of pages between `tx` and the end of the command queue.
> +        // PANIC: Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
> +        let after_tx_len = num::u32_as_usize(MSGQ_NUM_PAGES) - tx;
>  
> +        // Pointer to the start of the CPU message queue.
> +        //
>          // SAFETY:
> -        // - The `CoherentAllocation` contains exactly one object.
> -        // - We will only access the driver-owned part of the shared memory.
> -        // - Per the safety statement of the function, no concurrent access will be performed.
> -        let gsp_mem = &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0];
> -        // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
> -        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
> +        // - `self.0` contains exactly one element.
> +        // - `cpuq.msgq.data[0]` is within the bounds of that element.
> +        let data = unsafe { &raw mut (*self.0.start_ptr_mut()).cpuq.msgq.data[0] };
>  
> -        // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
> -        // belongs to the driver for writing.
> +        // Safety/Panic comments to be referenced by the code below.
> +        //
> +        // SAFETY[1]:
> +        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
> +        // - The area starting at `tx` and ending at `rx - 2` modulo `MSGQ_NUM_PAGES`,
> +        //   inclusive, belongs to the driver for writing and is not accessed concurrently by
> +        //   the GSP.
> +        // - `tx + after_tx_len` == `MSGQ_NUM_PAGES`.
> +        //
> +        // PANIC[1]:
> +        // - Per the invariant of `cpu_write_ptr`, `tx < MSGQ_NUM_PAGES`.
> +        // - Per the invariant of `gsp_read_ptr`, `rx < MSGQ_NUM_PAGES`.

I didn't do the math, but can't we just calculate the offset values in the below
if-else-if-else blocks and call from_raw_parts_mut() once with the safety
comment above? I think that'd be much cleaner. Similar for
driver_write_area_size().

>          if rx == 0 {
> -            // Since `rx` is zero, leave an empty slot at end of the buffer.
> -            let last = after_tx.len() - 1;
> -            (&mut after_tx[..last], &mut [])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data.add(tx),
> +                        // Since `rx` is zero, leave an empty slot at end of the buffer.
> +                        // PANIC: See PANIC[1].
> +                        after_tx_len - 1,
> +                    )
> +                },
> +                &mut [],
> +            )
>          } else if rx <= tx {
>              // The area is discontiguous and we leave an empty slot before `rx`.
> -            // PANIC:
> -            // - The index `rx - 1` is non-negative because `rx != 0` in this branch.
> -            // - The index does not exceed `before_tx.len()` (which equals `tx`) because
> -            //   `rx <= tx` in this branch.
> -            (after_tx, &mut before_tx[..(rx - 1)])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe { core::slice::from_raw_parts_mut(data.add(tx), after_tx_len) },
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data,
> +                        // Leave one empty slot before `rx`.
> +                        // PANIC:
> +                        // - See PANIC[1].
> +                        // - `rx - 1` is non-negative because `rx != 0` in this branch.
> +                        rx - 1,
> +                    )
> +                },
> +            )
>          } else {
>              // The area is contiguous and we leave an empty slot before `rx`.
> -            // PANIC:
> -            // - The index `rx - tx - 1` is non-negative because `rx > tx` in this branch.
> -            // - The index does not exceed `after_tx.len()` (which is `MSGQ_NUM_PAGES - tx`)
> -            //   because `rx < MSGQ_NUM_PAGES` by the `gsp_read_ptr` invariant.
> -            (&mut after_tx[..(rx - tx - 1)], &mut [])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                unsafe {
> +                    core::slice::from_raw_parts_mut(
> +                        data.add(tx),
> +                        // PANIC:
> +                        // - See PANIC[1].
> +                        // - `rx - tx - 1` is non-negative because `rx > tx` in this branch.
> +                        rx - tx - 1,
> +                    )
> +                },
> +                &mut [],
> +            )
>          }
>      }
>  
> @@ -308,24 +347,50 @@ fn driver_write_area_size(&self) -> usize {
>          let tx = self.gsp_write_ptr() as usize;
>          let rx = self.cpu_read_ptr() as usize;
>  
> +        // Pointer to the start of the GSP message queue.
> +        //
>          // SAFETY:
> -        // - The `CoherentAllocation` contains exactly one object.
> -        // - We will only access the driver-owned part of the shared memory.
> -        // - Per the safety statement of the function, no concurrent access will be performed.
> -        let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
> -        let data = &gsp_mem.gspq.msgq.data;
> +        // - `self.0` contains exactly one element.
> +        // - `gspq.msgq.data[0]` is within the bounds of that element.
> +        let data = unsafe { &raw const (*self.0.start_ptr()).gspq.msgq.data[0] };
> +
> +        // Safety/Panic comments to be referenced by the code below.
> +        //
> +        // SAFETY[1]:
> +        // - `data` points to an array of `MSGQ_NUM_PAGES` elements.
> +        // - The area starting at `rx` and ending at `tx - 1` modulo `MSGQ_NUM_PAGES`,
> +        //   inclusive, belongs to the driver for reading and is not accessed concurrently by
> +        //   the GSP.
> +        //
> +        // PANIC[1]:
> +        // - Per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`.
> +        // - Per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`.
>  
> -        // The area starting at `rx` and ending at `tx - 1` modulo MSGQ_NUM_PAGES, inclusive,
> -        // belongs to the driver for reading.
> -        // PANIC:
> -        // - per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`
> -        // - per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`
>          if rx <= tx {
>              // The area is contiguous.
> -            (&data[rx..tx], &[])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                // PANIC:
> +                // - See PANIC[1].
> +                // - Per the branch test, `rx <= tx`.
> +                unsafe { core::slice::from_raw_parts(data.add(rx), tx - rx) },
> +                &[],
> +            )
>          } else {
>              // The area is discontiguous.
> -            (&data[rx..], &data[..tx])
> +            (
> +                // SAFETY: See SAFETY[1].
> +                // PANIC: See PANIC[1].
> +                unsafe {
> +                    core::slice::from_raw_parts(
> +                        data.add(rx),
> +                        num::u32_as_usize(MSGQ_NUM_PAGES) - rx,
> +                    )
> +                },
> +                // SAFETY: See SAFETY[1].
> +                // PANIC: See PANIC[1].
> +                unsafe { core::slice::from_raw_parts(data, tx) },
> +            )
>          }
>      }
>  
>
> ---
> base-commit: a19457958c3018783881c4416f272cd594f13049
> change-id: 20260319-cmdq-ub-fix-d57b09a745b9
>
> Best regards,
> -- 
> Alexandre Courbot <acourbot@nvidia.com>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Claude review: gpu: nova-core: gsp: fix undefined behavior in command queue code
  2026-03-19  5:36 [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code Alexandre Courbot
  2026-03-19  6:41 ` Eliot Courtney
  2026-03-20 12:54 ` Danilo Krummrich
@ 2026-03-21 18:48 ` Claude Code Review Bot
  2026-03-21 18:48 ` Claude Code Review Bot
  3 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-03-21 18:48 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: gpu: nova-core: gsp: fix undefined behavior in command queue code
Author: Alexandre Courbot <acourbot@nvidia.com>
Patches: 4
Reviewed: 2026-03-22T04:48:51.690847

---

This is a single-patch fix for undefined behavior in the nova-core GSP command queue code. The problem is real and well-identified: the old code creates Rust references (`&` / `&mut`) to the entire `GspMem` shared memory structure (via `as_slice`/`as_slice_mut`), which includes regions concurrently owned by the GSP. Under Rust's aliasing model, creating a reference to memory that may be concurrently modified — even if you never actually access those parts — is undefined behavior.

The fix correctly replaces this with raw pointer arithmetic (`&raw mut` / `&raw const` + `start_ptr_mut()`/`start_ptr()`), only materializing safe slices over the driver-owned regions. The approach is sound and the safety reasoning is well-documented.

**Verdict: The patch is correct and should be accepted.** A few minor observations below.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Claude review: gpu: nova-core: gsp: fix undefined behavior in command queue code
  2026-03-19  5:36 [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code Alexandre Courbot
                   ` (2 preceding siblings ...)
  2026-03-21 18:48 ` Claude review: " Claude Code Review Bot
@ 2026-03-21 18:48 ` Claude Code Review Bot
  3 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-03-21 18:48 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Correctness of the UB fix:**

The core issue is on lines like the old:
```rust
let gsp_mem = &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0];
```
This creates `&mut GspMem`, a mutable reference to the *entire* shared memory structure. Even though the code only subsequently accesses the driver-owned region via `split_at_mut`, the reference itself covers GSP-owned memory that may be concurrently modified, violating Rust's aliasing rules.

The fix correctly uses:
```rust
let data = unsafe { &raw mut (*self.0.start_ptr_mut()).cpuq.msgq.data[0] };
```
This obtains a raw pointer via place projection (`&raw mut` on a field path through a raw pointer dereference), which does *not* create an intermediate reference to `GspMem`. Slices are then constructed only over driver-owned regions via `core::slice::from_raw_parts_mut`.

**Ring buffer logic verification:**

All three branches of `driver_write_area` are correct:

- `rx == 0`: Returns `[tx, MSGQ_NUM_PAGES - 1)` — correctly leaves one empty sentinel slot at the end. `after_tx_len - 1` cannot underflow because `tx < MSGQ_NUM_PAGES` guarantees `after_tx_len >= 1`.

- `rx <= tx` (rx ≠ 0): Returns `[tx, MSGQ_NUM_PAGES)` and `[0, rx-1)` — correctly leaves one slot before `rx`. `rx - 1` is safe because `rx != 0`.

- `rx > tx`: Returns `[tx, rx-1)` — correctly leaves one slot before `rx`. `rx - tx - 1` is non-negative because `rx > tx`.

The `driver_read_area` branches are also correct:

- `rx <= tx`: Returns `[rx, tx)` — contiguous read area. When `rx == tx` this produces a zero-length slice, correctly indicating an empty queue.

- `rx > tx`: Returns `[rx, MSGQ_NUM_PAGES)` and `[0, tx)` — discontiguous read area wrapping around.

**Minor observations (non-blocking):**

1. The `driver_read_area` code still uses `as usize` casts on line 277-278 of the new code:
   ```rust
   let tx = self.gsp_write_ptr() as usize;
   let rx = self.cpu_read_ptr() as usize;
   ```
   while `driver_write_area` was updated to use `num::u32_as_usize()`. This inconsistency is cosmetic but could be cleaned up for consistency.

2. The SAFETY comment style using numbered references (`SAFETY[1]`, `PANIC[1]`) is a nice pattern for avoiding repetition. The comments are thorough and accurate.

3. The commit message has a minor typo: "read of write access" should be "read **or** write access."

**No functional issues found.** The patch correctly eliminates the UB while preserving the exact same externally-observable behavior.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-03-21 18:48 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-19  5:36 [PATCH] gpu: nova-core: gsp: fix undefined behavior in command queue code Alexandre Courbot
2026-03-19  6:41 ` Eliot Courtney
2026-03-19  8:24   ` Alexandre Courbot
2026-03-20 12:54 ` Danilo Krummrich
2026-03-21 18:48 ` Claude review: " Claude Code Review Bot
2026-03-21 18:48 ` Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox