Claude review: drm/xe/xe_hw_error: Add support for Core-Compute errors

From: Claude Code Review Bot <claude-review@example.com>
To: dri-devel-reviews@example.com
Subject: Claude review: drm/xe/xe_hw_error: Add support for Core-Compute errors
Date: Tue, 24 Feb 2026 10:45:42 +1000
Message-ID: <review-patch4-20260223060541.526397-11-riana.tauro@intel.com>
In-Reply-To: <20260223060541.526397-11-riana.tauro@intel.com>

Patch Review

**Platform guard placement:**

The patch widens the platform check in `hw_error_source_handler` from `XE_BATTLEMAGE` to `IS_DGFX(xe)`:

> -	if (xe->info.platform != XE_BATTLEMAGE)
> +	if (!IS_DGFX(xe))
>  		return;

But `gt_hw_error_handler` has its own platform guard:

> +	if (xe->info.platform != XE_PVC)
> +		return;

This means that on Battlemage (which is DGFX but not PVC), `hw_error_source_handler` now enters the `for_each_set_bit` loop and tries to process GT/SOC errors via `xe_hw_error_map`, even though `ras->info[severity]` was never initialized (`hw_error_info_init` only runs on PVC). The `if (!info) goto clear_reg` check prevents a crash, but a code path that reaches `gt_hw_error_handler` only to return immediately on the PVC check is wasted work, and it splits the platform policy across two functions. If `xe_hw_error_map` is ever extended for another platform, the guards inside the sub-handlers would need updating too.
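One way to avoid scattering the guards is to answer the platform question in a single predicate that the dispatcher consults before calling a sub-handler. The following is a minimal user-space sketch of that idea, not the driver's code: every `model_*` name, the enum, and the struct are hypothetical stand-ins, and bit 0 is assumed to be the GT error source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's platform enum and device struct. */
enum model_platform { MODEL_BATTLEMAGE, MODEL_PVC, MODEL_INTEGRATED };

struct model_dev {
	enum model_platform platform;
	bool is_dgfx;
	int gt_handler_calls;	/* how often the GT sub-handler actually ran */
};

/* The only place that decides which platforms report GT errors this way. */
static bool model_has_gt_error_reporting(const struct model_dev *dev)
{
	return dev->platform == MODEL_PVC;
}

/* Sub-handler with no platform guard of its own. */
static void model_gt_hw_error_handler(struct model_dev *dev)
{
	dev->gt_handler_calls++;
}

/* Dispatcher: bit 0 of err_src is assumed to be the GT error source. */
static void model_hw_error_source_handler(struct model_dev *dev,
					  unsigned long err_src)
{
	assert(dev != NULL);

	if (!dev->is_dgfx)
		return;

	if ((err_src & 1UL) && model_has_gt_error_reporting(dev))
		model_gt_hw_error_handler(dev);
}
```

With this shape, a Battlemage-like device never enters the GT sub-handler, and adding a platform means touching only the predicate.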

**Double-counting in subslice error path:**

> +		case ERR_STAT_GT_VECTOR0:
> +		case ERR_STAT_GT_VECTOR1: {
> +			u32 errbit;
> +
> +			val = hweight32(vector);
> +			atomic_add(val, &info[error_id].counter);
> +			...
> +			err_stat = xe_mmio_read32(mmio, ERR_STAT_GT_REG(hw_err));
> +			for_each_set_bit(errbit, &err_stat, GT_HW_ERROR_MAX_ERR_BITS) {
> +				if (PVC_ERROR_MASK_SET(hw_err, errbit))
> +					atomic_inc(&info[error_id].counter);
> +			}

For subslice errors, the code first adds `hweight32(vector)` (the number of set bits in the vector register) to the counter, then walks the error status register and increments the same counter once for each set bit that matches the error mask. Are these truly independent error sources that should each contribute to the counter, or does the error status register only add detail about the errors already reported in the vector? The comment says the status register is "only populated once per error", which suggests supplementary detail rather than additional errors. If so, the `atomic_inc` on the status register bits double-counts.
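To make the concern concrete, here is a small user-space model of the two counting steps. It is a sketch under assumptions: the `model_*` names are hypothetical, `MODEL_MAX_ERR_BITS` is a placeholder width rather than the real `GT_HW_ERROR_MAX_ERR_BITS`, and a plain bitmask stands in for `PVC_ERROR_MASK_SET`.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_ERR_BITS 16	/* placeholder width, an assumption */

/* Portable population count, standing in for the kernel's hweight32(). */
static unsigned int model_hweight32(uint32_t w)
{
	unsigned int n = 0;

	for (; w; w &= w - 1)	/* clear the lowest set bit each pass */
		n++;
	return n;
}

/* Mirrors the quoted hunk: vector bits plus masked status bits. */
static unsigned int model_count_both(uint32_t vector, uint32_t err_stat,
				     uint32_t mask)
{
	unsigned int counter = model_hweight32(vector);
	unsigned int bit;

	for (bit = 0; bit < MODEL_MAX_ERR_BITS; bit++) {
		if ((err_stat & (1u << bit)) && (mask & (1u << bit)))
			counter++;
	}
	return counter;
}

/* Alternative: the vector register alone decides the count. */
static unsigned int model_count_vector_only(uint32_t vector)
{
	return model_hweight32(vector);
}
```

If one hardware event sets one vector bit and one matching status bit, `model_count_both(0x1, 0x1, 0x1)` reports two errors while `model_count_vector_only(0x1)` reports one. Which number is right depends on whether the two registers describe the same event, which is the open question above.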

**`xe_hw_error_map` size vs `XE_RAS_REG_SIZE`:**

> +static const unsigned long xe_hw_error_map[] = {
> +	[XE_GT_ERROR]	= DRM_XE_RAS_ERR_COMP_CORE_COMPUTE,
> +};

This array has only 1 element (index 0). In `hw_error_source_handler`:

> +		if (err_bit >= ARRAY_SIZE(xe_hw_error_map))
> +			break;

With only index 0 populated, any set bit above bit 0 ends the loop instead of being skipped. Should this be `continue` instead of `break`? Within this patch the difference is not observable: `for_each_set_bit` visits bits in ascending order, so every bit after the first out-of-range one is out of range as well, and a CSC error at bit 17 is handled by the earlier `if (err_src & REG_BIT(XE_CSC_ERROR))` check before the loop.

Patch 5 extends `xe_hw_error_map` with `[XE_SOC_ERROR] = ...` at index 16, so `ARRAY_SIZE(xe_hw_error_map)` becomes 17 (indices 0-16) and the check only fires for `err_bit >= 17`. SOC errors are therefore reachable. Bits 1-15 pass the bounds check but land on zero-initialized map entries, so their handling depends on what the code does with an empty slot. The `break` works for this specific set of patches, but it is fragile because it relies on the ascending iteration order and on the highest index in the map -- `continue` would be more robust.
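The interaction between the bounds check and a sparse map can be checked with a small user-space model. This is a sketch under assumptions: the `model_*` names and the component IDs 1 and 2 are hypothetical, the loop visits bits in ascending order as the kernel's `for_each_set_bit` does, and the explicit empty-slot skip is an assumption about how unmapped bits ought to be treated rather than something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAP_SIZE 17	/* indices 0-16, as after patch 5 */
#define MODEL_BITS     32

/* Sparse map: only indices 0 and 16 are populated; the rest are zero. */
static const unsigned long model_map[MODEL_MAP_SIZE] = {
	[0]  = 1,	/* hypothetical component ID for GT errors */
	[16] = 2,	/* hypothetical component ID for SoC errors */
};

/*
 * Returns a bitmask of the err_src bits that were dispatched.
 * use_break selects between `break` and `continue` on the bounds check.
 */
static unsigned long model_handled_bits(unsigned long err_src, int use_break)
{
	unsigned long handled = 0;
	unsigned int bit;

	for (bit = 0; bit < MODEL_BITS; bit++) {	/* ascending order */
		if (!(err_src & (1UL << bit)))
			continue;

		if (bit >= MODEL_MAP_SIZE) {
			if (use_break)
				break;
			continue;
		}

		if (!model_map[bit])	/* empty slot: nothing mapped */
			continue;

		handled |= 1UL << bit;
	}
	return handled;
}
```

For an input with bits 0, 5, 16 and 17 set, both variants dispatch bits 0 and 16 only. The two differ only if the iteration order is not ascending, which is the dependency that `continue` removes.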

---
Generated by Claude Code Patch Reviewer
