public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
From: Claude Code Review Bot <claude-review@example.com>
To: dri-devel-reviews@example.com
Subject: Claude review: Introduce Xe Uncorrectable Error Handling
Date: Sun, 12 Apr 2026 14:32:33 +1000	[thread overview]
Message-ID: <review-patch1-20260406142325.157035-7-mallesh.koujalagi@intel.com> (raw)
In-Reply-To: <20260406142325.157035-7-mallesh.koujalagi@intel.com>

Patch Review

*Marked "DO NOT REVIEW. COMPILATION ONLY" — reviewed only for issues that affect patches 2-4.*

**xe_pci_error.c — `xe_pci_error_slot_reset()`:**

```c
pdev->driver->remove(pdev);
devres_release_group(&pdev->dev, xe->devres_group_id);

if (!pdev->driver->probe(pdev, ent))
    return PCI_ERS_RESULT_RECOVERED;
```

Calling `pdev->driver->remove()` followed by `pdev->driver->probe()` directly from a PCI error handler callback is dangerous. After `remove()` runs, the `xe` pointer obtained earlier via `pdev_to_xe_device()` may be freed, making the `devres_release_group()` call use-after-free on `xe->devres_group_id`. The devres group release should happen before or as part of the remove, not after.

**xe_pci_error.c — `xe_pci_error_handling()`:**

```c
/* Wedge the device to prevent userspace access but don't send the event yet */
atomic_set(&xe->wedged.flag, 1);
```

This directly sets the wedged flag bypassing `xe_device_declare_wedged()`, which means no GT wedging, no PM runtime get, no uevent. The comment says "don't send the event yet" but there's no clear place where the event is eventually sent for this path (io_frozen case). If the slot_reset then removes and reprobes, the wedge state is lost with the old device.

**xe_pci_error.c — `extern` declaration:**

```c
extern const struct pci_error_handlers xe_pci_error_handlers;
```

This `extern` is in `xe_pci.c` instead of a header. It should be declared in a proper header (e.g., `xe_pci_error.h` or similar) that both files include, to get compiler type checking.

**xe_pci_error.c — `ras_recovery_action_to_pci_result[]`:**

```c
static const pci_ers_result_t ras_recovery_action_to_pci_result[] = {
	[XE_RAS_RECOVERY_ACTION_RECOVERED] = PCI_ERS_RESULT_RECOVERED,
	[XE_RAS_RECOVERY_ACTION_RESET] = PCI_ERS_RESULT_NEED_RESET,
	[XE_RAS_RECOVERY_ACTION_DISCONNECT] = PCI_ERS_RESULT_DISCONNECT,
};
```

No bounds check on the index used to access this array. If `xe_ras_process_errors()` ever returns `XE_RAS_RECOVERY_ACTION_MAX` or an out-of-range value, this is an out-of-bounds read.

**xe_ras.c — `severity_to_str()` / `comp_to_str()`:**

```c
static inline const char *severity_to_str(struct xe_device *xe, u32 severity)
{
	xe_assert(xe, severity < XE_RAS_SEVERITY_MAX);

	return severity < XE_RAS_SEVERITY_MAX ? xe_ras_severities[severity] : "Unknown";
}
```

The `xe_assert()` will crash in debug builds for out-of-range values, making the subsequent bounds-check fallback to `"Unknown"` unreachable. Either trust the assert (remove the ternary) or handle gracefully (remove the assert). Having both is inconsistent.

**xe_ras.c — `handle_soc_internal_errors()` type punning:**

```c
struct xe_ras_csc_error *csc_error =
    (struct xe_ras_csc_error *)error_info->additional_details;
```

and:

```c
struct xe_ras_ieh_error *ieh_error =
    (struct xe_ras_ieh_error *)error_info->additional_details;
```

Both CSC and IEH errors cast from `additional_details[15]` (a `u32[]`) to different struct types. If both `source.csc` and `source.soc` are set simultaneously, both branches execute and interpret the same `additional_details` buffer as two different struct types. The code should use `else if` or clarify that these are mutually exclusive.

**xe_device.h — `xe_device_clear_in_recovery()`:**

```c
static inline void xe_device_clear_in_recovery(struct xe_device *xe)
{
	 atomic_set(&xe->in_recovery, 0);
}
```

Minor: extra leading space before `atomic_set`.

**xe_ras.c — `xe_ras_process_errors()` loop:**

```c
for (int i = 0; i < response.num_errors &&  i < XE_RAS_NUM_ERROR_ARR; i++) {
```

Minor: double space before `i < XE_RAS_NUM_ERROR_ARR`.

**xe_survivability_mode.c change:**

```c
-	if (check_boot_failure(xe)) {
+	/* Survivability info is not required if enabled via configfs */
+	if (!xe_configfs_get_survivability_mode(pdev)) {
```

This changes the condition for creating survivability sysfs info from "boot failure detected" to "not enabled via configfs". This inverts the logic and broadens the scope — previously the info group was only created on boot failure, now it's created whenever configfs doesn't have survivability mode. This seems like a behavioral change that deserves its own commit and proper explanation.

---
Generated by Claude Code Patch Reviewer

  reply	other threads:[~2026-04-12  4:32 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-06 14:23 [PATCH v3 0/4] Introduce cold reset recovery method Mallesh Koujalagi
2026-04-06 14:23 ` [PATCH v3 1/4] Introduce Xe Uncorrectable Error Handling Mallesh Koujalagi
2026-04-12  4:32   ` Claude Code Review Bot [this message]
2026-04-06 14:23 ` [PATCH v3 2/4] drm: Add DRM_WEDGE_RECOVERY_COLD_RESET recovery method Mallesh Koujalagi
2026-04-08  7:46   ` Raag Jadav
2026-04-12  4:32   ` Claude review: " Claude Code Review Bot
2026-04-06 14:23 ` [PATCH v3 3/4] drm/doc: Document " Mallesh Koujalagi
2026-04-08  8:01   ` Raag Jadav
2026-04-12  4:32   ` Claude review: " Claude Code Review Bot
2026-04-06 14:23 ` [PATCH v3 4/4] drm/xe: Handle PUNIT errors by requesting cold-reset recovery Mallesh Koujalagi
2026-04-08  8:09   ` Raag Jadav
2026-04-12  4:32   ` Claude review: " Claude Code Review Bot
2026-04-12  4:32 ` Claude review: Introduce cold reset recovery method Claude Code Review Bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=review-patch1-20260406142325.157035-7-mallesh.koujalagi@intel.com \
    --to=claude-review@example.com \
    --cc=dri-devel-reviews@example.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox