public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
From: Claude Code Review Bot <claude-review@example.com>
To: dri-devel-reviews@example.com
Subject: Claude review: drm/xe/xe_hw_error: Add support for PVC SoC errors
Date: Tue, 24 Feb 2026 10:45:42 +1000	[thread overview]
Message-ID: <review-patch5-20260223060541.526397-12-riana.tauro@intel.com> (raw)
In-Reply-To: <20260223060541.526397-12-riana.tauro@intel.com>

Patch Review

**Correctable SoC errors not counted:**

> +	if (hw_err == HARDWARE_ERROR_CORRECTABLE) {
> +		xe_mmio_write32(mmio, SOC_GLOBAL_ERR_STAT_REG(master, hw_err), REG_GENMASK(31, 0));
> +		xe_mmio_write32(mmio, SOC_LOCAL_ERR_STAT_REG(master, hw_err), REG_GENMASK(31, 0));
> +		xe_mmio_write32(mmio, SOC_GLOBAL_ERR_STAT_REG(slave, hw_err), REG_GENMASK(31, 0));
> +		xe_mmio_write32(mmio, SOC_LOCAL_ERR_STAT_REG(slave, hw_err), REG_GENMASK(31, 0));
> +		goto unmask_gsysevtctl;
> +	}

For correctable SoC errors, the status registers are cleared (written with all-ones) without reading them or incrementing any counter. This means the `soc-internal` correctable error counter will always remain 0, regardless of how many correctable SoC errors occur. Is this intentional? It seems like a gap -- for GT errors in `gt_hw_error_handler`, correctable errors DO get counted, but SoC correctable errors are silently discarded.

**Magic value for GSYSEVTCTL unmask:**

> +unmask_gsysevtctl:
> +	for (i = 0; i < XE_SOC_NUM_IEH; i++)
> +		xe_mmio_write32(mmio, SOC_GSYSEVTCTL_REG(master, slave, i),
> +				(HARDWARE_ERROR_MAX << 1) + 1);

`(HARDWARE_ERROR_MAX << 1) + 1` evaluates to `(3 << 1) + 1 = 7 = 0b111`. This presumably enables all three error severity levels. The expression relies on `HARDWARE_ERROR_MAX` being 3 and on the register bit layout matching. A named constant or comment explaining the bit layout would reduce the chance of breakage if `HARDWARE_ERROR_MAX` is ever changed.

**Missing nonfatal local error handling in slave IEH:**

> +	if (hw_err == HARDWARE_ERROR_FATAL) {
> +		for_each_set_bit(regbit, &slave_local_errstat, XE_RAS_REG_SIZE)
> +			log_soc_error(tile, pvc_slave_local_fatal_err_reg, severity,
> +				      regbit, error_id);
> +	}

`soc_slave_ieh_handler` is called for both NONFATAL and FATAL hw_err (correctable was filtered out earlier). For NONFATAL, the slave local error register is read and cleared, but its bits are not processed or logged. There's no `pvc_slave_local_nonfatal_err_reg` array. If the hardware can report nonfatal slave local errors, they would be silently dropped without counting or logging.

**`strcmp` for filtering "Undefined" errors:**

> +	name = reg_info[err_bit];
> +
> +	if (strcmp(name, "Undefined")) {

Using `strcmp` against a string literal to filter out placeholder entries works but is fragile. A more robust approach would be to use NULL for undefined entries and check `if (name)`. The current approach also means every bit that IS set in the register and mapped to "Undefined" triggers a `strcmp` call, which is minor overhead but unnecessary.

---
Generated by Claude Code Patch Reviewer

  reply	other threads:[~2026-02-24  0:45 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-23  6:05 [PATCH v8 0/5] Introduce DRM_RAS using generic netlink for RAS Riana Tauro
2026-02-23  6:05 ` [PATCH v8 1/5] drm/ras: Introduce the DRM RAS infrastructure over generic netlink Riana Tauro
2026-02-24  0:45   ` Claude review: " Claude Code Review Bot
2026-02-23  6:05 ` [PATCH v8 2/5] drm/xe/xe_drm_ras: Add support for XE DRM RAS Riana Tauro
2026-02-24  0:45   ` Claude review: " Claude Code Review Bot
2026-02-23  6:05 ` [PATCH v8 3/5] drm/xe/xe_hw_error: Integrate DRM RAS with hardware error handling Riana Tauro
2026-02-24  0:45   ` Claude review: " Claude Code Review Bot
2026-02-23  6:05 ` [PATCH v8 4/5] drm/xe/xe_hw_error: Add support for Core-Compute errors Riana Tauro
2026-02-24  0:45   ` Claude review: " Claude Code Review Bot
2026-02-23  6:05 ` [PATCH v8 5/5] drm/xe/xe_hw_error: Add support for PVC SoC errors Riana Tauro
2026-02-24  0:45   ` Claude Code Review Bot [this message]
2026-02-24  0:45 ` Claude review: Introduce DRM_RAS using generic netlink for RAS Claude Code Review Bot
  -- strict thread matches above, loose matches on Subject: below --
2026-02-28  8:08 [PATCH v9 0/5] " Riana Tauro
2026-02-28  8:08 ` [PATCH v9 5/5] drm/xe/xe_hw_error: Add support for PVC SoC errors Riana Tauro
2026-03-03  4:32   ` Claude review: " Claude Code Review Bot
2026-03-04  7:44 [PATCH v10 0/5] Introduce DRM_RAS using generic netlink for RAS Riana Tauro
2026-03-04  7:44 ` [PATCH v10 5/5] drm/xe/xe_hw_error: Add support for PVC SoC errors Riana Tauro
2026-03-05  3:47   ` Claude review: " Claude Code Review Bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=review-patch5-20260223060541.526397-12-riana.tauro@intel.com \
    --to=claude-review@example.com \
    --cc=dri-devel-reviews@example.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox