* [PATCH v1 1/3] drm/doc: Update documentation for 'none' recovery method
2026-03-03 11:05 [PATCH v1 0/3] Update 'none' recovery method for DRM wedged event Raag Jadav
@ 2026-03-03 11:05 ` Raag Jadav
2026-03-03 15:32 ` Rodrigo Vivi
2026-03-03 21:31 ` Claude review: " Claude Code Review Bot
2026-03-03 11:05 ` [PATCH v1 2/3] drm: Update log " Raag Jadav
` (2 subsequent siblings)
3 siblings, 2 replies; 11+ messages in thread
From: Raag Jadav @ 2026-03-03 11:05 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: airlied, simona, mripard, matthew.brost, rodrigo.vivi,
riana.tauro, christian.koenig, andrealmeid, Raag Jadav
Expand 'none' recovery method for wedged event to include debug cases
where driver wants to hint "no recovery" without resetting the device
from driver context.
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
---
Documentation/gpu/drm-uapi.rst | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index d98428a592f1..af20136108b7 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -439,13 +439,11 @@ following expectations.
=============== ========================================
The only exception to this is ``WEDGED=none``, which signifies that the device
-was temporarily 'wedged' at some point but was recovered from driver context
-using device specific methods like reset. No explicit recovery is expected from
-the consumer in this case, but it can still take additional steps like gathering
-telemetry information (devcoredump, syslog). This is useful because the first
-hang is usually the most critical one which can result in consequential hangs or
-complete wedging.
-
+was temporarily 'wedged' at some point but no explicit recovery is expected
+from the consumer in this case. The consumer can still try to gather telemetry
+information (devcoredump, syslog) for debug purpose in order to root cause the
+hang. This is useful because the first hang is usually the most critical one
+which can result in consequential hangs or complete wedging.
Vendor Specific Recovery
------------------------
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH v1 1/3] drm/doc: Update documentation for 'none' recovery method
2026-03-03 11:05 ` [PATCH v1 1/3] drm/doc: Update documentation for 'none' recovery method Raag Jadav
@ 2026-03-03 15:32 ` Rodrigo Vivi
2026-03-03 21:31 ` Claude review: " Claude Code Review Bot
1 sibling, 0 replies; 11+ messages in thread
From: Rodrigo Vivi @ 2026-03-03 15:32 UTC (permalink / raw)
To: Raag Jadav
Cc: intel-xe, dri-devel, airlied, simona, mripard, matthew.brost,
riana.tauro, christian.koenig, andrealmeid
On Tue, Mar 03, 2026 at 04:35:17PM +0530, Raag Jadav wrote:
> Expand 'none' recovery method for wedged event to include debug cases
> where driver wants to hint "no recovery" without resetting the device
> from driver context.
>
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
> ---
> Documentation/gpu/drm-uapi.rst | 12 +++++-------
> 1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
> index d98428a592f1..af20136108b7 100644
> --- a/Documentation/gpu/drm-uapi.rst
> +++ b/Documentation/gpu/drm-uapi.rst
> @@ -439,13 +439,11 @@ following expectations.
> =============== ========================================
>
> The only exception to this is ``WEDGED=none``, which signifies that the device
I don't believe it is an 'exception'. The recovery method is none. It means 'do nothing
to recover'.
> -was temporarily 'wedged' at some point but was recovered from driver context
> -using device specific methods like reset. No explicit recovery is expected from
> -the consumer in this case, but it can still take additional steps like gathering
> -telemetry information (devcoredump, syslog). This is useful because the first
> -hang is usually the most critical one which can result in consequential hangs or
> -complete wedging.
> -
> +was temporarily 'wedged' at some point but no explicit recovery is expected
temporarily or definitive it doesn't matter. none still means 'do nothing to recover'.
> +from the consumer in this case. The consumer can still try to gather telemetry
> +information (devcoredump, syslog) for debug purpose in order to root cause the
> +hang. This is useful because the first hang is usually the most critical one
> +which can result in consequential hangs or complete wedging.
What about replacing this entire sentence to something like:
WEDGED=none indicates that no recovery action is expected from userspace—regardless of whether the driver already handled the condition—and consumers may optionally collect telemetry (e.g., devcoredump, syslog) for debugging.
(AI assisted text)
>
> Vendor Specific Recovery
> ------------------------
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Claude review: drm/doc: Update documentation for 'none' recovery method
2026-03-03 11:05 ` [PATCH v1 1/3] drm/doc: Update documentation for 'none' recovery method Raag Jadav
2026-03-03 15:32 ` Rodrigo Vivi
@ 2026-03-03 21:31 ` Claude Code Review Bot
1 sibling, 0 replies; 11+ messages in thread
From: Claude Code Review Bot @ 2026-03-03 21:31 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
The documentation change removes the key phrase "was recovered from driver context using device specific methods like reset" and replaces it with a more vague description. However, the new text still says:
```
The only exception to this is ``WEDGED=none``, which signifies that the device
was temporarily 'wedged' at some point but no explicit recovery is expected
```
**Issue: "temporarily" is now inaccurate for the new use case.** In `XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET`, the device is *permanently* wedged - IOCTLs are blocked and `xe->needs_flr_on_fini` is set. The device stays wedged until a rebind. Saying it was "temporarily" wedged is misleading for this scenario.
**Suggestion:** The documentation should explicitly describe both scenarios for `WEDGED=none`: (1) driver already recovered the device via reset, and (2) device is wedged for debug purposes with no recovery intended. This would make the UAPI contract clear rather than muddying the existing semantics.
Minor: "for debug purpose" should be "for debug purposes" (plural).
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v1 2/3] drm: Update log for 'none' recovery method
2026-03-03 11:05 [PATCH v1 0/3] Update 'none' recovery method for DRM wedged event Raag Jadav
2026-03-03 11:05 ` [PATCH v1 1/3] drm/doc: Update documentation for 'none' recovery method Raag Jadav
@ 2026-03-03 11:05 ` Raag Jadav
2026-03-03 15:33 ` Rodrigo Vivi
2026-03-03 21:31 ` Claude review: " Claude Code Review Bot
2026-03-03 11:05 ` [PATCH v1 3/3] drm/xe: Send 'none' recovery method for XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET Raag Jadav
2026-03-03 21:31 ` Claude review: Update 'none' recovery method for DRM wedged event Claude Code Review Bot
3 siblings, 2 replies; 11+ messages in thread
From: Raag Jadav @ 2026-03-03 11:05 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: airlied, simona, mripard, matthew.brost, rodrigo.vivi,
riana.tauro, christian.koenig, andrealmeid, Raag Jadav
Update log for 'none' recovery method for wedged event where driver wants
to hint "no recovery" without resetting the device from driver context.
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
---
drivers/gpu/drm/drm_drv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 2915118436ce..72e7e09225c7 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -586,7 +586,7 @@ int drm_dev_wedged_event(struct drm_device *dev, unsigned long method,
snprintf(event_string, sizeof(event_string), "%s", "WEDGED=unknown");
drm_info(dev, "device wedged, %s\n", method == DRM_WEDGE_RECOVERY_NONE ?
- "but recovered through reset" : "needs recovery");
+ "but no recovery needed" : "needs recovery");
if (info && (info->comm[0] != '\0') && (info->pid >= 0)) {
snprintf(pid_string, sizeof(pid_string), "PID=%u", info->pid);
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH v1 2/3] drm: Update log for 'none' recovery method
2026-03-03 11:05 ` [PATCH v1 2/3] drm: Update log " Raag Jadav
@ 2026-03-03 15:33 ` Rodrigo Vivi
2026-03-03 21:31 ` Claude review: " Claude Code Review Bot
1 sibling, 0 replies; 11+ messages in thread
From: Rodrigo Vivi @ 2026-03-03 15:33 UTC (permalink / raw)
To: Raag Jadav
Cc: intel-xe, dri-devel, airlied, simona, mripard, matthew.brost,
riana.tauro, christian.koenig, andrealmeid
On Tue, Mar 03, 2026 at 04:35:18PM +0530, Raag Jadav wrote:
> Update log for 'none' recovery method for wedged event where driver wants
> to hint "no recovery" without resetting the device from driver context.
>
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
> drivers/gpu/drm/drm_drv.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 2915118436ce..72e7e09225c7 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -586,7 +586,7 @@ int drm_dev_wedged_event(struct drm_device *dev, unsigned long method,
> snprintf(event_string, sizeof(event_string), "%s", "WEDGED=unknown");
>
> drm_info(dev, "device wedged, %s\n", method == DRM_WEDGE_RECOVERY_NONE ?
> - "but recovered through reset" : "needs recovery");
> + "but no recovery needed" : "needs recovery");
>
> if (info && (info->comm[0] != '\0') && (info->pid >= 0)) {
> snprintf(pid_string, sizeof(pid_string), "PID=%u", info->pid);
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 11+ messages in thread* Claude review: drm: Update log for 'none' recovery method
2026-03-03 11:05 ` [PATCH v1 2/3] drm: Update log " Raag Jadav
2026-03-03 15:33 ` Rodrigo Vivi
@ 2026-03-03 21:31 ` Claude Code Review Bot
1 sibling, 0 replies; 11+ messages in thread
From: Claude Code Review Bot @ 2026-03-03 21:31 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
The change:
```c
- "but recovered through reset" : "needs recovery");
+ "but no recovery needed" : "needs recovery");
```
**Concern: loss of diagnostic information.** The old message "but recovered through reset" was informative - it told you the device had already been reset. The new message "but no recovery needed" is vague and covers two very different situations: (a) the device was reset and is healthy, and (b) the device is permanently wedged for debugging. A kernel log reader seeing "device wedged, but no recovery needed" when the device is actually permanently non-functional would find this confusing.
**Suggestion:** Consider making the log more descriptive for the debug case. For instance, in the xe driver, the log at `xe_device.c:1312-1316` already says "CRITICAL: Xe has declared device ... as wedged. IOCTLs and executions are blocked." so there is some context, but the DRM core log is still misleading. At minimum, consider something like "no recovery expected" rather than "no recovery needed" to avoid implying the device is fine.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v1 3/3] drm/xe: Send 'none' recovery method for XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET
2026-03-03 11:05 [PATCH v1 0/3] Update 'none' recovery method for DRM wedged event Raag Jadav
2026-03-03 11:05 ` [PATCH v1 1/3] drm/doc: Update documentation for 'none' recovery method Raag Jadav
2026-03-03 11:05 ` [PATCH v1 2/3] drm: Update log " Raag Jadav
@ 2026-03-03 11:05 ` Raag Jadav
2026-03-03 15:34 ` Rodrigo Vivi
2026-03-03 21:31 ` Claude review: " Claude Code Review Bot
2026-03-03 21:31 ` Claude review: Update 'none' recovery method for DRM wedged event Claude Code Review Bot
3 siblings, 2 replies; 11+ messages in thread
From: Raag Jadav @ 2026-03-03 11:05 UTC (permalink / raw)
To: intel-xe, dri-devel
Cc: airlied, simona, mripard, matthew.brost, rodrigo.vivi,
riana.tauro, christian.koenig, andrealmeid, Raag Jadav
XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET is intended for debugging hangs,
so wedge the device with 'none' recovery method and have it available
to the user for debugging.
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
---
drivers/gpu/drm/xe/xe_device.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index c137dc0a1840..eeb626737760 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -1329,8 +1329,15 @@ void xe_device_declare_wedged(struct xe_device *xe)
xe_gt_declare_wedged(gt);
if (xe_device_wedged(xe)) {
+ /*
+ * XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET is intended for debugging
+ * hangs, so wedge the device with 'none' recovery method and have
+ * it available to the user for debugging.
+ */
+ if (xe->wedged.mode == XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET)
+ xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_NONE);
/* If no wedge recovery method is set, use default */
- if (!xe->wedged.method)
+ else if (!xe->wedged.method)
xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_REBIND |
DRM_WEDGE_RECOVERY_BUS_RESET);
--
2.43.0
^ permalink raw reply related [flat|nested] 11+ messages in thread* Re: [PATCH v1 3/3] drm/xe: Send 'none' recovery method for XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET
2026-03-03 11:05 ` [PATCH v1 3/3] drm/xe: Send 'none' recovery method for XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET Raag Jadav
@ 2026-03-03 15:34 ` Rodrigo Vivi
2026-03-03 21:31 ` Claude review: " Claude Code Review Bot
1 sibling, 0 replies; 11+ messages in thread
From: Rodrigo Vivi @ 2026-03-03 15:34 UTC (permalink / raw)
To: Raag Jadav
Cc: intel-xe, dri-devel, airlied, simona, mripard, matthew.brost,
riana.tauro, christian.koenig, andrealmeid
On Tue, Mar 03, 2026 at 04:35:19PM +0530, Raag Jadav wrote:
> XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET is intended for debugging hangs,
> so wedge the device with 'none' recovery method and have it available
> to the user for debugging.
>
> Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
> ---
> drivers/gpu/drm/xe/xe_device.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index c137dc0a1840..eeb626737760 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -1329,8 +1329,15 @@ void xe_device_declare_wedged(struct xe_device *xe)
> xe_gt_declare_wedged(gt);
>
> if (xe_device_wedged(xe)) {
> + /*
> + * XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET is intended for debugging
> + * hangs, so wedge the device with 'none' recovery method and have
> + * it available to the user for debugging.
> + */
> + if (xe->wedged.mode == XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET)
> + xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_NONE);
> /* If no wedge recovery method is set, use default */
> - if (!xe->wedged.method)
> + else if (!xe->wedged.method)
> xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_REBIND |
> DRM_WEDGE_RECOVERY_BUS_RESET);
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 11+ messages in thread* Claude review: drm/xe: Send 'none' recovery method for XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET
2026-03-03 11:05 ` [PATCH v1 3/3] drm/xe: Send 'none' recovery method for XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET Raag Jadav
2026-03-03 15:34 ` Rodrigo Vivi
@ 2026-03-03 21:31 ` Claude Code Review Bot
1 sibling, 0 replies; 11+ messages in thread
From: Claude Code Review Bot @ 2026-03-03 21:31 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
```c
+ if (xe->wedged.mode == XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET)
+ xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_NONE);
/* If no wedge recovery method is set, use default */
- if (!xe->wedged.method)
+ else if (!xe->wedged.method)
xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_REBIND |
DRM_WEDGE_RECOVERY_BUS_RESET);
```
**Issue 1: Unconditional override of pre-set method.** The new code unconditionally sets `DRM_WEDGE_RECOVERY_NONE` when the mode is `XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET`, regardless of whether a caller had already set `xe->wedged.method` via `xe_device_set_wedged_method()`. The old `if (!xe->wedged.method)` pattern respected pre-set methods. If `NONE` should always be used in this mode, this is fine but should be documented in the comment. If a caller could have legitimate reasons to set a different method even in NO_RESET mode, this would silently override it.
**Issue 2: Semantic mismatch with `WEDGED=none`.** As discussed in the overall review, `WEDGED=none` was designed for "device recovered, no action needed." Using it for "device permanently wedged, don't bother recovering" is a different semantic. In this mode, the device *does* need recovery (rebind) to become functional again - it's just that the intent is to leave it broken for debugging. A userspace daemon listening for wedged events and seeing `WEDGED=none` would reasonably conclude the device is fine, when it's actually non-functional.
**Suggestion:** Consider whether this debug mode should simply not send a uevent at all (since it's a debug-only configuration), or use a different signaling mechanism. Alternatively, if the uevent is needed for telemetry collection triggers, the documentation needs to be much clearer about the dual meaning.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 11+ messages in thread
* Claude review: Update 'none' recovery method for DRM wedged event
2026-03-03 11:05 [PATCH v1 0/3] Update 'none' recovery method for DRM wedged event Raag Jadav
` (2 preceding siblings ...)
2026-03-03 11:05 ` [PATCH v1 3/3] drm/xe: Send 'none' recovery method for XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET Raag Jadav
@ 2026-03-03 21:31 ` Claude Code Review Bot
3 siblings, 0 replies; 11+ messages in thread
From: Claude Code Review Bot @ 2026-03-03 21:31 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: Update 'none' recovery method for DRM wedged event
Author: Raag Jadav <raag.jadav@intel.com>
Patches: 7
Reviewed: 2026-03-04T07:31:27.271209
---
This 3-patch series broadens the meaning of `WEDGED=none` to cover a new use case: the Xe driver's `XE_WEDGED_MODE_UPON_ANY_HANG_NO_RESET` debug mode, where the device is intentionally left wedged (no GT reset) so developers can inspect the hang state.
The core concern is a **UAPI semantic change**. The current documentation and log message for `WEDGED=none` imply the device was already recovered (via reset) and is functional again. This series repurposes it to also mean "device is permanently wedged, don't try to recover it" for the debug case. These are fundamentally different device states from userspace's perspective: in one case the device works; in the other, all IOCTLs are blocked and only a rebind can restore it. Any existing userspace consumer processing `WEDGED=none` as "device is healthy, carry on" would be broken by the new usage.
The series would benefit from clearer documentation that distinguishes these two scenarios, or alternatively, considering whether a different mechanism (perhaps `WEDGED=vendor-specific` or a new method) would better express the "permanently wedged for debug" semantic without changing the existing contract.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 11+ messages in thread