From mboxrd@z Thu Jan 1 00:00:00 1970
From: Claude Code Review Bot
To: dri-devel-reviews@example.com
Subject: Claude review: drm/xe/xe_hw_error: Add support for PVC SoC errors
Date: Tue, 03 Mar 2026 14:32:51 +1000
Message-ID:
In-Reply-To: <20260228080858.3063532-12-riana.tauro@intel.com>
References: <20260228080858.3063532-7-riana.tauro@intel.com> <20260228080858.3063532-12-riana.tauro@intel.com>
X-Mailer: Claude Code Patch Reviewer
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

Patch Review

**Fragile "Undefined" string comparison:**

```c
+	if (strcmp(name, "Undefined")) {
+		...
+		atomic_inc(&info[index].counter);
+	}
```

Using `strcmp()` against a magic string to skip entries is error-prone. If "Undefined" is mistyped in one of the tables, that entry is no longer skipped and errors are silently miscounted. Use `NULL` entries and a `NULL` check instead, consistent with how other kernel error tables work.

**Correctable SoC errors silently cleared without logging or counting:**

```c
+	if (hw_err == HARDWARE_ERROR_CORRECTABLE) {
+		xe_mmio_write32(mmio, SOC_GLOBAL_ERR_STAT_REG(master, hw_err), REG_GENMASK(31, 0));
+		xe_mmio_write32(mmio, SOC_LOCAL_ERR_STAT_REG(master, hw_err), REG_GENMASK(31, 0));
+		xe_mmio_write32(mmio, SOC_GLOBAL_ERR_STAT_REG(slave, hw_err), REG_GENMASK(31, 0));
+		xe_mmio_write32(mmio, SOC_LOCAL_ERR_STAT_REG(slave, hw_err), REG_GENMASK(31, 0));
+		goto unmask_gsysevtctl;
+	}
```

The registers are cleared by writing all-ones without first being read to determine which errors occurred. No counter is incremented and no message is logged, so the correctable SoC error counter stays at zero permanently, which defeats the purpose of exposing it via RAS. At minimum, the registers should be read and the set bits counted before clearing.

**Magic unmask value:**

```c
+		xe_mmio_write32(mmio, SOC_GSYSEVTCTL_REG(master, slave, i),
+				(HARDWARE_ERROR_MAX << 1) + 1);
```

`(HARDWARE_ERROR_MAX << 1) + 1` = `(3 << 1) + 1` = `7` = `0b111`.
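To make the arithmetic concrete, here is a small userspace sketch comparing the patch's expression with a mask built from one enable bit per severity. It assumes the ordering CORRECTABLE = 0, NONFATAL = 1, FATAL = 2 (so `HARDWARE_ERROR_MAX` is 3, as above) and assumes that bit N of the register enables severity N. `BIT_U32`, `SOC_GSYSEVTCTL_UNMASK_ALL`, `patch_unmask_expr()` and `all_severity_mask()` are hypothetical names for illustration, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT() (hypothetical name). */
#define BIT_U32(n) ((uint32_t)1u << (n))

/* Assumed severity ordering; HARDWARE_ERROR_MAX == 3 as in the review. */
enum hardware_error {
	HARDWARE_ERROR_CORRECTABLE = 0,
	HARDWARE_ERROR_NONFATAL = 1,
	HARDWARE_ERROR_FATAL = 2,
	HARDWARE_ERROR_MAX,
};

/*
 * Hypothetical named constant: bit N enables error severity N
 * (assumed register field layout), so all three severities = 0b111.
 */
#define SOC_GSYSEVTCTL_UNMASK_ALL                 \
	(BIT_U32(HARDWARE_ERROR_CORRECTABLE) |    \
	 BIT_U32(HARDWARE_ERROR_NONFATAL) |       \
	 BIT_U32(HARDWARE_ERROR_FATAL))

/* The expression used in the patch, parameterized on the enum size. */
static uint32_t patch_unmask_expr(uint32_t max)
{
	return (max << 1) + 1;
}

/* One enable bit per severity: the low 'max' bits set. */
static uint32_t all_severity_mask(uint32_t max)
{
	return max >= 32 ? UINT32_MAX : BIT_U32(max) - 1u;
}

/* Compile-time check that both forms give 0b111 for the current enum. */
static_assert(((HARDWARE_ERROR_MAX << 1) + 1) == 7, "(3 << 1) + 1 == 7");
static_assert(SOC_GSYSEVTCTL_UNMASK_ALL == 0x7, "three low bits set");
```

The two forms agree only because `HARDWARE_ERROR_MAX` happens to be 3. With a fourth severity, the patch's expression would give `(4 << 1) + 1` = `9` (`0b1001`) rather than `0b1111`. For the current enum, both forms produce `0b111`.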
This enables all three error types (correctable, nonfatal, fatal). It should be a named constant, with a comment explaining the register field layout.

**`SOC_IEH0_LOCAL_ERR_STATUS` and `SOC_IEH1_LOCAL_ERR_STATUS` are both `REG_BIT(0)`:**

```c
+#define SOC_IEH0_LOCAL_ERR_STATUS		REG_BIT(0)
+#define SOC_IEH1_LOCAL_ERR_STATUS		REG_BIT(0)
```

Two different names for the same bit value are confusing. If they refer to the same bit in different registers, a single define with a clear name (or a comment explaining the distinction) would be better.

**Potential over-counting from global and local register processing:**

In `soc_slave_ieh_handler()` and the master handler, both the local and the global error register bits are iterated, and each set bit leads to a `log_soc_error()` call, which increments the counter. If a single error event sets bits in both registers, the counter is incremented more than once for that event. Please clarify whether this is intended.

**`master` and `slave` are variables but only ever hold constants:**

```c
+	u32 master, slave, regbit;
+	...
+	master = SOC_PVC_MASTER_BASE;
+	slave = SOC_PVC_SLAVE_BASE;
```

These are compile-time constants. Consider using the defines directly or declaring the variables `const`.

---
Generated by Claude Code Patch Reviewer