public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH 0/2] drm/xe/pagefault: Add SRCID to pagefault reporting
@ 2026-05-26 21:44 Jonathan Cavitt
  2026-05-26 21:44 ` [PATCH 1/2] drm/xe/pagefault: Add SRCID to pagefault struct Jonathan Cavitt
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jonathan Cavitt @ 2026-05-26 21:44 UTC (permalink / raw)
  To: dri-devel
  Cc: saurabhg.gupta, alex.zuo, jonathan.cavitt, mripard, airlied,
	simona, linux-kernel, intel-xe, Rodrigo.vivi, matthew.brost,
	maarten.lankhorst, thomas.hellstrom, tzimmermann

Add SRCID to the xe_pagefault struct, which reports the ID of the
faulting hardware unit.  This will be passed on to the
xe_vm_get_property_ioctl for reading per-vm faults and will assist in
diagnosing pagefaults.

Jonathan Cavitt (2):
  drm/xe/pagefault: Add SRCID to pagefault struct
  drm/xe/vm: Add srcid to xe_vm_get_property_ioctl fault report

 drivers/gpu/drm/xe/xe_guc_pagefault.c   |  1 +
 drivers/gpu/drm/xe/xe_pagefault.c       |  6 ++++--
 drivers/gpu/drm/xe/xe_pagefault_types.h |  4 +++-
 drivers/gpu/drm/xe/xe_vm.c              | 11 +++++++++--
 drivers/gpu/drm/xe/xe_vm_types.h        |  2 ++
 include/uapi/drm/xe_drm.h               |  4 ++--
 6 files changed, 21 insertions(+), 7 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] drm/xe/pagefault: Add SRCID to pagefault struct
  2026-05-26 21:44 [PATCH 0/2] drm/xe/pagefault: Add SRCID to pagefault reporting Jonathan Cavitt
@ 2026-05-26 21:44 ` Jonathan Cavitt
  2026-05-27  4:00   ` Claude review: " Claude Code Review Bot
  2026-05-26 21:44 ` [PATCH 2/2] drm/xe/vm: Add srcid to xe_vm_get_property_ioctl fault report Jonathan Cavitt
  2026-05-27  4:00 ` Claude review: drm/xe/pagefault: Add SRCID to pagefault reporting Claude Code Review Bot
  2 siblings, 1 reply; 6+ messages in thread
From: Jonathan Cavitt @ 2026-05-26 21:44 UTC (permalink / raw)
  To: dri-devel
  Cc: saurabhg.gupta, alex.zuo, jonathan.cavitt, mripard, airlied,
	simona, linux-kernel, intel-xe, Rodrigo.vivi, matthew.brost,
	maarten.lankhorst, thomas.hellstrom, tzimmermann

Add SRCID information to pagefault struct for the purpose of reporting
the hardware unit that resulted in the pagefault.

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
---
 drivers/gpu/drm/xe/xe_guc_pagefault.c   | 1 +
 drivers/gpu/drm/xe/xe_pagefault.c       | 6 ++++--
 drivers/gpu/drm/xe/xe_pagefault_types.h | 4 +++-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_guc_pagefault.c b/drivers/gpu/drm/xe/xe_guc_pagefault.c
index 607e32392f46..17cb7f359261 100644
--- a/drivers/gpu/drm/xe/xe_guc_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_guc_pagefault.c
@@ -91,6 +91,7 @@ int xe_guc_pagefault_handler(struct xe_guc *guc, u32 *msg, u32 len)
 				   FIELD_GET(PFD_FAULT_TYPE, msg[2]));
 	pf.consumer.engine_class = FIELD_GET(PFD_ENG_CLASS, msg[0]);
 	pf.consumer.engine_instance = FIELD_GET(PFD_ENG_INSTANCE, msg[0]);
+	pf.consumer.srcid = FIELD_GET(PFD_SRC_ID, msg[0]);
 
 	pf.producer.private = guc;
 	pf.producer.ops = &guc_pagefault_ops;
diff --git a/drivers/gpu/drm/xe/xe_pagefault.c b/drivers/gpu/drm/xe/xe_pagefault.c
index dd3c068e1a39..42b682ba5f9b 100644
--- a/drivers/gpu/drm/xe/xe_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_pagefault.c
@@ -248,7 +248,8 @@ static void xe_pagefault_print(struct xe_pagefault *pf)
 		   "\tAccessType: %lu\n"
 		   "\tFaultLevel: %lu\n"
 		   "\tEngineClass: %d %s\n"
-		   "\tEngineInstance: %d\n",
+		   "\tEngineInstance: %d\n"
+		   "\tSRCID: 0x%02x\n",
 		   pf->consumer.asid,
 		   upper_32_bits(pf->consumer.page_addr),
 		   lower_32_bits(pf->consumer.page_addr),
@@ -260,7 +261,8 @@ static void xe_pagefault_print(struct xe_pagefault *pf)
 			     pf->consumer.fault_type_level),
 		   pf->consumer.engine_class,
 		   xe_hw_engine_class_to_str(pf->consumer.engine_class),
-		   pf->consumer.engine_instance);
+		   pf->consumer.engine_instance,
+		   pf->consumer.srcid);
 }
 
 static void xe_pagefault_save_to_vm(struct xe_device *xe, struct xe_pagefault *pf)
diff --git a/drivers/gpu/drm/xe/xe_pagefault_types.h b/drivers/gpu/drm/xe/xe_pagefault_types.h
index c4ee625b93dd..60cc269aeea3 100644
--- a/drivers/gpu/drm/xe/xe_pagefault_types.h
+++ b/drivers/gpu/drm/xe/xe_pagefault_types.h
@@ -86,8 +86,10 @@ struct xe_pagefault {
 		u8 engine_class;
 		/** @consumer.engine_instance: engine instance */
 		u8 engine_instance;
+		/** @consumer.srcid: ID of hardware unit producing fault */
+		u8 srcid;
 		/** @consumer.reserved: reserved bits for future expansion */
-		u64 reserved;
+		u8 reserved[7];
 	} consumer;
 	/**
 	 * @producer: State for the producer (i.e., HW/FW interface). Populated
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] drm/xe/vm: Add srcid to xe_vm_get_property_ioctl fault report
  2026-05-26 21:44 [PATCH 0/2] drm/xe/pagefault: Add SRCID to pagefault reporting Jonathan Cavitt
  2026-05-26 21:44 ` [PATCH 1/2] drm/xe/pagefault: Add SRCID to pagefault struct Jonathan Cavitt
@ 2026-05-26 21:44 ` Jonathan Cavitt
  2026-05-27  4:00   ` Claude review: " Claude Code Review Bot
  2026-05-27  4:00 ` Claude review: drm/xe/pagefault: Add SRCID to pagefault reporting Claude Code Review Bot
  2 siblings, 1 reply; 6+ messages in thread
From: Jonathan Cavitt @ 2026-05-26 21:44 UTC (permalink / raw)
  To: dri-devel
  Cc: saurabhg.gupta, alex.zuo, jonathan.cavitt, mripard, airlied,
	simona, linux-kernel, intel-xe, Rodrigo.vivi, matthew.brost,
	maarten.lankhorst, thomas.hellstrom, tzimmermann

Add the SRCID of the faulting hardware unit to the return of the
xe_vm_get_property_ioctl fault report.

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/gpu/drm/xe/xe_vm.c       | 11 +++++++++--
 drivers/gpu/drm/xe/xe_vm_types.h |  2 ++
 include/uapi/drm/xe_drm.h        |  4 ++--
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 080c2fff0e95..18d5cdafd76b 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -636,6 +636,7 @@ void xe_vm_add_fault_entry_pf(struct xe_vm *vm, struct xe_pagefault *pf)
 				  pf->consumer.fault_type_level);
 	e->fault_level = FIELD_GET(XE_PAGEFAULT_LEVEL_MASK,
 				   pf->consumer.fault_type_level);
+	e->srcid = pf->consumer.srcid;
 
 	list_add_tail(&e->list, &vm->faults.list);
 	vm->faults.len++;
@@ -4108,6 +4109,11 @@ static u8 xe_to_user_fault_level(u8 fault_level)
 	return fault_level;
 }
 
+static u8 xe_to_user_srcid(u8 srcid)
+{
+	return srcid;
+}
+
 static int fill_faults(struct xe_vm *vm,
 		       struct drm_xe_vm_get_property *args)
 {
@@ -4135,6 +4141,8 @@ static int fill_faults(struct xe_vm *vm,
 		fault_entry.fault_type = xe_to_user_fault_type(entry->fault_type);
 		fault_entry.fault_level = xe_to_user_fault_level(entry->fault_level);
 
+		fault_entry.srcid = xe_to_user_srcid(entry->srcid);
+
 		memcpy(&fault_list[i], &fault_entry, entry_size);
 
 		i++;
@@ -4190,8 +4198,7 @@ int xe_vm_get_property_ioctl(struct drm_device *drm, void *data,
 	int ret = 0;
 
 	if (XE_IOCTL_DBG(xe, (args->reserved[0] || args->reserved[1] ||
-			      args->reserved[2] || args->extensions ||
-			      args->pad)))
+			      args->reserved[2] || args->extensions)))
 		return -EINVAL;
 
 	vm = xe_vm_lookup(xef, args->vm_id);
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 635ed29b9a69..e8aea75341cc 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -196,6 +196,7 @@ struct xe_device;
  * @access_type: type of address access that resulted in fault
  * @fault_type: type of fault reported
  * @fault_level: fault level of the fault
+ * @srcid: ID of the faulting hardware unit
  */
 struct xe_vm_fault_entry {
 	struct list_head list;
@@ -204,6 +205,7 @@ struct xe_vm_fault_entry {
 	u8 access_type;
 	u8 fault_type;
 	u8 fault_level;
+	u8 srcid;
 };
 
 struct xe_vm {
diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h
index 48e9f1fdb78d..2b45b691a032 100644
--- a/include/uapi/drm/xe_drm.h
+++ b/include/uapi/drm/xe_drm.h
@@ -1294,8 +1294,8 @@ struct xe_vm_fault {
 #define FAULT_LEVEL_PML4	3
 #define FAULT_LEVEL_PML5	4
 	__u8 fault_level;
-	/** @pad: MBZ */
-	__u8 pad;
+	/** @srcid: ID of the faulting hardware unit */
+	__u8 srcid;
 	/** @reserved: MBZ */
 	__u64 reserved[4];
 };
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Claude review: drm/xe/pagefault: Add SRCID to pagefault reporting
  2026-05-26 21:44 [PATCH 0/2] drm/xe/pagefault: Add SRCID to pagefault reporting Jonathan Cavitt
  2026-05-26 21:44 ` [PATCH 1/2] drm/xe/pagefault: Add SRCID to pagefault struct Jonathan Cavitt
  2026-05-26 21:44 ` [PATCH 2/2] drm/xe/vm: Add srcid to xe_vm_get_property_ioctl fault report Jonathan Cavitt
@ 2026-05-27  4:00 ` Claude Code Review Bot
  2 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-05-27  4:00 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: drm/xe/pagefault: Add SRCID to pagefault reporting
Author: Jonathan Cavitt <jonathan.cavitt@intel.com>
Patches: 3
Reviewed: 2026-05-27T14:00:17.515622

---

This is a small 2-patch series adding the SRCID (source ID of the faulting hardware unit) to the xe pagefault reporting path. The data flows from the GuC pagefault handler through the internal pagefault struct, gets saved into per-VM fault entries, and is exposed to userspace via the `xe_vm_get_property_ioctl`.

The series is mostly clean, but **Patch 2 has a bug**: it accidentally removes the MBZ validation of the `pad` field in `drm_xe_vm_get_property`, which is completely unrelated to the srcid feature. Additionally, the UAPI for `srcid` lacks the `#define` documentation that the sibling fields have, making it harder for userspace to interpret the value.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Claude review: drm/xe/pagefault: Add SRCID to pagefault struct
  2026-05-26 21:44 ` [PATCH 1/2] drm/xe/pagefault: Add SRCID to pagefault struct Jonathan Cavitt
@ 2026-05-27  4:00   ` Claude Code Review Bot
  0 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-05-27  4:00 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Verdict: Looks good.**

The `PFD_SRC_ID` field is `GENMASK(10, 3)` — an 8-bit value that fits cleanly in the new `u8 srcid` field.

The `reserved` field change from `u64 reserved` to `u8 reserved[7]` correctly maintains the 8-byte footprint at the same offset (16), keeping the struct at its documented 64-byte size:

```c
/** @consumer.srcid: ID of hardware unit producing fault */
u8 srcid;
/** @consumer.reserved: reserved bits for future expansion */
u8 reserved[7];
```

The print format addition is correct — `0x%02x` for a `u8` value, and the argument ordering matches the format string.

No issues with this patch.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Claude review: drm/xe/vm: Add srcid to xe_vm_get_property_ioctl fault report
  2026-05-26 21:44 ` [PATCH 2/2] drm/xe/vm: Add srcid to xe_vm_get_property_ioctl fault report Jonathan Cavitt
@ 2026-05-27  4:00   ` Claude Code Review Bot
  0 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-05-27  4:00 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Issue 1 (Bug): `pad` validation removed from `drm_xe_vm_get_property`**

The patch removes the `args->pad` check from the ioctl input validation:

```c
// Before:
if (XE_IOCTL_DBG(xe, (args->reserved[0] || args->reserved[1] ||
                      args->reserved[2] || args->extensions ||
                      args->pad)))

// After:
if (XE_IOCTL_DBG(xe, (args->reserved[0] || args->reserved[1] ||
                      args->reserved[2] || args->extensions)))
```

The `pad` field in `struct drm_xe_vm_get_property` (`xe_drm.h:1344`) still exists and is still documented as MBZ. This removal is unrelated to the srcid feature — the srcid change is in `struct xe_vm_fault`, a completely different struct. Dropping this validation means userspace can now pass garbage in the `pad` field without error, which prevents using that field for future extensions (since old kernels would silently accept non-zero values). This looks like an accidental change and should be reverted.

**Issue 2 (Minor): `xe_to_user_srcid` is a pure identity function**

```c
static u8 xe_to_user_srcid(u8 srcid)
{
	return srcid;
}
```

This follows the pattern of `xe_to_user_fault_type` and `xe_to_user_fault_level` (which are also identity functions), so it's consistent. However, the comment at `xe_vm.c:4091` explains these exist to map "from current bspec specification to user spec abstraction." If there's no planned translation for srcid, this is unnecessary indirection. This is a style nit — acceptable if the convention is to have one per field for future-proofing.

**Issue 3 (UAPI documentation gap): No `#define` constants for srcid values**

The other fields in `struct xe_vm_fault` all have `#define` constants documenting their valid values:

```c
#define FAULT_ACCESS_TYPE_READ    0
#define FAULT_ACCESS_TYPE_WRITE   1
...
#define FAULT_LEVEL_PTE           0
#define FAULT_LEVEL_PDE           1
...
```

The new `srcid` field has no such documentation. Userspace receives a raw hardware register value with no guidance on interpretation. At minimum, the UAPI header should note that these are hardware-defined source IDs, or provide defines for known values. Without this, the field is opaque to userspace consumers.

**Correctness items that are fine:**
- `xe_vm_fault_entry` gains `srcid` at the right place and it's populated in `xe_vm_add_fault_entry_pf`.
- The UAPI struct `xe_vm_fault` repurposes `pad` → `srcid` — since this is an output-only struct (kernel fills it, copies to user), old userspace just sees a previously-zero byte now carrying data, which is safe. New userspace on old kernels sees `srcid=0`, which is harmless.
- The `fill_faults` function correctly converts and copies the field.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-05-27  4:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-26 21:44 [PATCH 0/2] drm/xe/pagefault: Add SRCID to pagefault reporting Jonathan Cavitt
2026-05-26 21:44 ` [PATCH 1/2] drm/xe/pagefault: Add SRCID to pagefault struct Jonathan Cavitt
2026-05-27  4:00   ` Claude review: " Claude Code Review Bot
2026-05-26 21:44 ` [PATCH 2/2] drm/xe/vm: Add srcid to xe_vm_get_property_ioctl fault report Jonathan Cavitt
2026-05-27  4:00   ` Claude review: " Claude Code Review Bot
2026-05-27  4:00 ` Claude review: drm/xe/pagefault: Add SRCID to pagefault reporting Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox