From mboxrd@z Thu Jan 1 00:00:00 1970 From: Claude Code Review Bot To: dri-devel-reviews@example.com Subject: Claude review: drm/xe: Handle PUNIT errors by requesting cold-reset recovery Date: Sun, 12 Apr 2026 14:32:34 +1000 Message-ID: In-Reply-To: <20260406142325.157035-10-mallesh.koujalagi@intel.com> References: <20260406142325.157035-6-mallesh.koujalagi@intel.com> <20260406142325.157035-10-mallesh.koujalagi@intel.com> X-Mailer: Claude Code Patch Reviewer Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Patch Review **`xe_punit_error_handler()` double-wedging concern:** ```c void xe_punit_error_handler(struct xe_device *xe) { xe_err(xe, "Recovery: Device cold reset required\n"); xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_COLD_RESET); xe_device_declare_wedged(xe); } ``` This is called from `handle_soc_internal_errors()`, which is called from `x= e_ras_process_errors()`, which is called from `xe_pci_error_mmio_enabled()`= . But looking at `xe_pci_error_detected()` for the `pci_channel_io_frozen` = case, `xe_pci_error_handling()` already sets `atomic_set(&xe->wedged.flag, = 1)` and calls `xe_gt_declare_wedged()`. So by the time `xe_punit_error_hand= ler()` runs and calls `xe_device_declare_wedged()`, the device is already p= artially wedged. `xe_device_declare_wedged()` uses `atomic_xchg(&xe->wedged= .flag, 1)` which will see it's already set and skip the PM runtime get and = error message, but it will still call `xe_gt_declare_wedged()` again for al= l GTs (double wedge). The uevent will still fire (good), but the flow is fr= agile and depends on ordering assumptions. For the `pci_channel_io_normal` path (which returns `CAN_RECOVER` and leads= to `mmio_enabled`), the wedged flag is NOT pre-set, so `xe_punit_error_han= dler()` works correctly. But this means the behavior differs depending on w= hich PCI channel state triggered the error =E2=80=94 this should be documen= ted. **Exported function placement:** `xe_punit_error_handler()` is declared in `xe_ras.h` but it's a fairly high= -level operation (wedge + uevent). It might make more sense in `xe_device.c= ` or at least the name should reflect that it's specific to the RAS/PCI err= or recovery path, not a general-purpose API. **Header include ordering in xe_ras.c:** ```c +#include "xe_device.h" ``` Added between `xe_device_types.h` and `xe_printk.h` =E2=80=94 the includes = should be in alphabetical order. `xe_device.h` should come before `xe_devic= e_types.h`. --- Generated by Claude Code Patch Reviewer