From mboxrd@z Thu Jan  1 00:00:00 1970
From: Claude Code Review Bot <claude-review@example.com>
To: dri-devel-reviews@example.com
Subject: Claude review: drm/xe/xe_hw_error: Integrate DRM RAS with hardware error handling
Date: Thu, 05 Mar 2026 13:47:41 +1000
Message-ID: <review-patch3-20260304074412.464435-10-riana.tauro@intel.com>
In-Reply-To: <20260304074412.464435-10-riana.tauro@intel.com>
References: <20260304074412.464435-7-riana.tauro@intel.com> <20260304074412.464435-10-riana.tauro@intel.com>
X-Mailer: Claude Code Patch Reviewer
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

Patch Review

**Concern: Lost diagnostic specificity in error messages**

The original code distinguished CORRECTABLE, NONFATAL, and FATAL in its log strings (e.g. `HEC_ERR_STATUS_NONFATAL`). The new code collapses NONFATAL and FATAL into a single severity, so both are logged with the same string, "uncorrectable-errors":

```c
static enum drm_xe_ras_error_severity hw_err_to_severity(const enum hardware_error hw_err)
{
    if (hw_err == HARDWARE_ERROR_CORRECTABLE)
        return DRM_XE_RAS_ERR_SEV_CORRECTABLE;
    return DRM_XE_RAS_ERR_SEV_UNCORRECTABLE;
}
```

This is a regression in debuggability. The log message `"HEC FW %s %s reported"` now prints "uncorrectable-errors" for both fatal and nonfatal hardware errors, so engineers triaging a failure can no longer tell the two apart from the log alone.
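
One possible way to keep the distinction, sketched here as a userspace model rather than a proposed patch: leave the severity mapping as it is and add a separate log-only name derived from the raw `hardware_error` value. The enum definitions below are stand-ins with assumed values, and `hw_err_to_log_str` is a hypothetical helper that does not appear in the series.

```c
#include <assert.h>

/* Stand-ins for the driver's enums; members and values are assumptions. */
enum hardware_error {
    HARDWARE_ERROR_CORRECTABLE,
    HARDWARE_ERROR_NONFATAL,
    HARDWARE_ERROR_FATAL,
    HARDWARE_ERROR_MAX,
};

enum drm_xe_ras_error_severity {
    DRM_XE_RAS_ERR_SEV_CORRECTABLE,
    DRM_XE_RAS_ERR_SEV_UNCORRECTABLE,
};

/* Same mapping as quoted from the patch: NONFATAL and FATAL collapse. */
static enum drm_xe_ras_error_severity
hw_err_to_severity(const enum hardware_error hw_err)
{
    if (hw_err == HARDWARE_ERROR_CORRECTABLE)
        return DRM_XE_RAS_ERR_SEV_CORRECTABLE;
    return DRM_XE_RAS_ERR_SEV_UNCORRECTABLE;
}

/* Hypothetical log-only names, indexed by the raw hardware error type. */
static const char * const hw_err_log_str[HARDWARE_ERROR_MAX] = {
    [HARDWARE_ERROR_CORRECTABLE] = "correctable",
    [HARDWARE_ERROR_NONFATAL]    = "nonfatal",
    [HARDWARE_ERROR_FATAL]       = "fatal",
};

/* Hypothetical helper: keeps FATAL and NONFATAL distinct in log output. */
static const char *hw_err_to_log_str(const enum hardware_error hw_err)
{
    assert(hw_err < HARDWARE_ERROR_MAX);
    return hw_err_log_str[hw_err];
}
```

With something along these lines, the RAS counters would still be keyed by the two-level severity, while the log line could carry the original three-level classification.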

**Minor: `hw_error_info_init` silently skips non-PVC**

```c
static int hw_error_info_init(struct xe_device *xe)
{
    if (xe->info.platform != XE_PVC)
        return 0;
    return xe_drm_ras_init(xe);
}
```

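This is fine, but it means the `xe_hw_error_init` flow will call `process_hw_errors` on non-PVC platforms without RAS info initialized. The `info` pointer check in patch 4's `hw_error_source_handler` (`if (!info) goto clear_reg`) therefore becomes load-bearing: it is the only thing keeping the handler from touching uninitialized RAS state on those platforms, and it would be worth a comment saying so.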
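
The dependency can be illustrated with a small userspace model. Everything here is a simplified stand-in: `struct device`, `struct ras_info`, the platform enum, and the `reg_cleared` flag are hypothetical, and the function bodies only mirror the shape of the code quoted in this review, not the driver's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's platform and RAS state. */
enum platform { PLATFORM_PVC, PLATFORM_OTHER };

struct ras_info {
    unsigned long count;
};

struct device {
    enum platform platform;
    struct ras_info *info;  /* stays NULL unless init ran on PVC */
    bool reg_cleared;       /* models the "clear the status register" step */
};

static struct ras_info pvc_info;

/* Mirrors the early return: non-PVC platforms get no RAS info. */
static int hw_error_info_init(struct device *dev)
{
    if (dev->platform != PLATFORM_PVC)
        return 0;
    dev->info = &pvc_info;
    return 0;
}

/* Mirrors the guard: without it, info->count would dereference NULL. */
static void hw_error_source_handler(struct device *dev)
{
    struct ras_info *info;

    assert(dev != NULL);
    info = dev->info;

    if (!info)
        goto clear_reg;

    info->count++;

clear_reg:
    dev->reg_cleared = true;
}
```

In this model the non-PVC path reaches the handler with `info == NULL` and relies entirely on the guard to skip the counter update while still clearing the register.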

---
Generated by Claude Code Patch Reviewer