From mboxrd@z Thu Jan 1 00:00:00 1970
From: Claude Code Review Bot
To: dri-devel-reviews@example.com
Subject: Claude review: drm/xe/xe_hw_error: Integrate DRM RAS with hardware error handling
Date: Thu, 05 Mar 2026 13:47:41 +1000
Message-ID:
In-Reply-To: <20260304074412.464435-10-riana.tauro@intel.com>
References: <20260304074412.464435-7-riana.tauro@intel.com>
 <20260304074412.464435-10-riana.tauro@intel.com>
X-Mailer: Claude Code Patch Reviewer
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

Patch Review

**Concern: Lost diagnostic specificity in error messages**

The original code distinguished CORRECTABLE, NONFATAL, and FATAL in its log strings (e.g. `HEC_ERR_STATUS_NONFATAL`). The new code maps both NONFATAL and FATAL to the same severity string "uncorrectable-errors":

```c
static enum drm_xe_ras_error_severity hw_err_to_severity(const enum hardware_error hw_err)
{
	if (hw_err == HARDWARE_ERROR_CORRECTABLE)
		return DRM_XE_RAS_ERR_SEV_CORRECTABLE;

	return DRM_XE_RAS_ERR_SEV_UNCORRECTABLE;
}
```

This means log messages no longer distinguish FATAL from NONFATAL hardware errors, which is a regression in debuggability. The log message `"HEC FW %s %s reported"` will say "uncorrectable-errors" for both fatal and nonfatal, making it harder for engineers to triage.

**Minor: `hw_error_info_init` silently skips non-PVC**

```c
static int hw_error_info_init(struct xe_device *xe)
{
	if (xe->info.platform != XE_PVC)
		return 0;

	return xe_drm_ras_init(xe);
}
```

This is fine but means the `xe_hw_error_init` flow will call `process_hw_errors` on non-PVC platforms without RAS info initialized, so the `info` pointer check in patch 4's `hw_error_source_handler` (`if (!info) goto clear_reg`) becomes load-bearing.

---
Generated by Claude Code Patch Reviewer
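
**Sketch: keeping FATAL/NONFATAL visible in logs**

One possible way to address the first concern, offered as a minimal sketch and not as the patch author's intended design: keep the two-level RAS severity for accounting, and derive the log string from the original `hardware_error` value. The enum definitions below are stand-ins so the snippet compiles on its own (the real ones live in the xe driver and its uAPI header, and their values may differ). `hw_err_to_log_str` and its strings are hypothetical names introduced only for illustration.

```c
/* Stand-in definitions; the real enums are defined by the xe driver. */
enum hardware_error {
	HARDWARE_ERROR_CORRECTABLE,
	HARDWARE_ERROR_NONFATAL,
	HARDWARE_ERROR_FATAL,
};

enum drm_xe_ras_error_severity {
	DRM_XE_RAS_ERR_SEV_CORRECTABLE,
	DRM_XE_RAS_ERR_SEV_UNCORRECTABLE,
};

/* Same mapping as in the patch: NONFATAL and FATAL share one RAS severity. */
static enum drm_xe_ras_error_severity
hw_err_to_severity(const enum hardware_error hw_err)
{
	if (hw_err == HARDWARE_ERROR_CORRECTABLE)
		return DRM_XE_RAS_ERR_SEV_CORRECTABLE;

	return DRM_XE_RAS_ERR_SEV_UNCORRECTABLE;
}

/*
 * Hypothetical helper: a log-only string that still tells FATAL apart
 * from NONFATAL, independent of the coarser RAS severity above.
 */
static const char *hw_err_to_log_str(const enum hardware_error hw_err)
{
	switch (hw_err) {
	case HARDWARE_ERROR_CORRECTABLE:
		return "correctable";
	case HARDWARE_ERROR_NONFATAL:
		return "nonfatal";
	case HARDWARE_ERROR_FATAL:
		return "fatal";
	default:
		return "unknown";
	}
}
```

With something along these lines, the RAS counters would still be indexed by `hw_err_to_severity()`, while a message such as `"HEC FW %s %s reported"` could take its severity word from the log-only helper, so triage keeps the fatal/nonfatal distinction.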