From: Claude Code Review Bot <claude-review@example.com>
To: dri-devel-reviews@example.com
Subject: Claude review: drm/xe/xe_hw_error: Add support for PVC SoC errors
Date: Tue, 24 Feb 2026 10:45:42 +1000 [thread overview]
Message-ID: <review-patch5-20260223060541.526397-12-riana.tauro@intel.com> (raw)
In-Reply-To: <20260223060541.526397-12-riana.tauro@intel.com>
Patch Review
**Correctable SoC errors not counted:**
> + if (hw_err == HARDWARE_ERROR_CORRECTABLE) {
> + xe_mmio_write32(mmio, SOC_GLOBAL_ERR_STAT_REG(master, hw_err), REG_GENMASK(31, 0));
> + xe_mmio_write32(mmio, SOC_LOCAL_ERR_STAT_REG(master, hw_err), REG_GENMASK(31, 0));
> + xe_mmio_write32(mmio, SOC_GLOBAL_ERR_STAT_REG(slave, hw_err), REG_GENMASK(31, 0));
> + xe_mmio_write32(mmio, SOC_LOCAL_ERR_STAT_REG(slave, hw_err), REG_GENMASK(31, 0));
> + goto unmask_gsysevtctl;
> + }
For correctable SoC errors, the status registers are cleared (written with all-ones) without reading them or incrementing any counter. This means the `soc-internal` correctable error counter will always remain 0, regardless of how many correctable SoC errors occur. Is this intentional? It seems like a gap -- for GT errors in `gt_hw_error_handler`, correctable errors DO get counted, but SoC correctable errors are silently discarded.
**Magic value for GSYSEVTCTL unmask:**
> +unmask_gsysevtctl:
> + for (i = 0; i < XE_SOC_NUM_IEH; i++)
> + xe_mmio_write32(mmio, SOC_GSYSEVTCTL_REG(master, slave, i),
> + (HARDWARE_ERROR_MAX << 1) + 1);
`(HARDWARE_ERROR_MAX << 1) + 1` evaluates to `(3 << 1) + 1 = 7 = 0b111`. This presumably enables all three error severity levels. The expression relies on `HARDWARE_ERROR_MAX` being 3 and on the register bit layout matching. A named constant or comment explaining the bit layout would reduce the chance of breakage if `HARDWARE_ERROR_MAX` is ever changed.
**Missing nonfatal local error handling in slave IEH:**
> + if (hw_err == HARDWARE_ERROR_FATAL) {
> + for_each_set_bit(regbit, &slave_local_errstat, XE_RAS_REG_SIZE)
> + log_soc_error(tile, pvc_slave_local_fatal_err_reg, severity,
> + regbit, error_id);
> + }
`soc_slave_ieh_handler` is called for both NONFATAL and FATAL hw_err (correctable was filtered out earlier). For NONFATAL, the slave local error register is read and cleared, but its bits are not processed or logged. There's no `pvc_slave_local_nonfatal_err_reg` array. If the hardware can report nonfatal slave local errors, they would be silently dropped without counting or logging.
**`strcmp` for filtering "Undefined" errors:**
> + name = reg_info[err_bit];
> +
> + if (strcmp(name, "Undefined")) {
Using `strcmp` against a string literal to filter out placeholder entries works but is fragile. A more robust approach would be to use NULL for undefined entries and check `if (name)`. The current approach also means every bit that IS set in the register and mapped to "Undefined" triggers a `strcmp` call, which is minor overhead but unnecessary.
---
Generated by Claude Code Patch Reviewer
next prev parent reply other threads:[~2026-02-24 0:45 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-23 6:05 [PATCH v8 0/5] Introduce DRM_RAS using generic netlink for RAS Riana Tauro
2026-02-23 6:05 ` [PATCH v8 1/5] drm/ras: Introduce the DRM RAS infrastructure over generic netlink Riana Tauro
2026-02-24 0:45 ` Claude review: " Claude Code Review Bot
2026-02-23 6:05 ` [PATCH v8 2/5] drm/xe/xe_drm_ras: Add support for XE DRM RAS Riana Tauro
2026-02-24 0:45 ` Claude review: " Claude Code Review Bot
2026-02-23 6:05 ` [PATCH v8 3/5] drm/xe/xe_hw_error: Integrate DRM RAS with hardware error handling Riana Tauro
2026-02-24 0:45 ` Claude review: " Claude Code Review Bot
2026-02-23 6:05 ` [PATCH v8 4/5] drm/xe/xe_hw_error: Add support for Core-Compute errors Riana Tauro
2026-02-24 0:45 ` Claude review: " Claude Code Review Bot
2026-02-23 6:05 ` [PATCH v8 5/5] drm/xe/xe_hw_error: Add support for PVC SoC errors Riana Tauro
2026-02-24 0:45 ` Claude Code Review Bot [this message]
2026-02-24 0:45 ` Claude review: Introduce DRM_RAS using generic netlink for RAS Claude Code Review Bot
-- strict thread matches above, loose matches on Subject: below --
2026-02-28 8:08 [PATCH v9 0/5] " Riana Tauro
2026-02-28 8:08 ` [PATCH v9 5/5] drm/xe/xe_hw_error: Add support for PVC SoC errors Riana Tauro
2026-03-03 4:32 ` Claude review: " Claude Code Review Bot
2026-03-04 7:44 [PATCH v10 0/5] Introduce DRM_RAS using generic netlink for RAS Riana Tauro
2026-03-04 7:44 ` [PATCH v10 5/5] drm/xe/xe_hw_error: Add support for PVC SoC errors Riana Tauro
2026-03-05 3:47 ` Claude review: " Claude Code Review Bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=review-patch5-20260223060541.526397-12-riana.tauro@intel.com \
--to=claude-review@example.com \
--cc=dri-devel-reviews@example.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox