From mboxrd@z Thu Jan 1 00:00:00 1970
From: Claude Code Review Bot
To: dri-devel-reviews@example.com
Subject: Claude review: Introduce cold reset recovery method
Date: Mon, 25 May 2026 21:57:08 +1000
Message-ID:
In-Reply-To: <20260520113351.171119-7-mallesh.koujalagi@intel.com>
References: <20260520113351.171119-7-mallesh.koujalagi@intel.com>
X-Mailer: Claude Code Patch Reviewer
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

Overall Series Review

Subject: Introduce cold reset recovery method
Author: Mallesh Koujalagi
Patches: 6
Reviewed: 2026-05-25T21:57:08.799531

---

This v6 series adds a `DRM_WEDGE_RECOVERY_COLD_RESET` recovery method for Intel Xe GPUs, targeting PUNIT errors that persist across warm resets and require a full slot power cycle. The design is sound: detect the error via RAS infrastructure, wedge the device, emit a uevent, and let userspace perform the power cycle.

The DRM core change (patch 2) is clean and minimal. The documentation (patch 3) is well-written and includes a practical example script. The PUNIT handler (patch 4) correctly uses the wedge API. The AER SLD suppression (patch 5) is architecturally sensible, but the SLD mask is never restored if the cold reset fails or is never performed.

Patch 1 is marked "DO NOT REVIEW - COMPILATION ONLY", but it has a clear uninitialized variable bug and a wedge API bypass that could trigger an assertion. I flag both below for completeness, since the patch ships as part of this build.

**Key concerns:**
- Uninitialized variable in `handle_page_offline()` (patch 1)
- `xe_pci_error_handling()` bypasses `xe_device_declare_wedged()` and could hit the `xe_gt_assert` if `wedged.mode == XE_WEDGED_MODE_NEVER` (patch 1)
- Duplicate `XE_SYSCTRL_FLOOD` macro definition across two headers (patch 1)
- SLD mask in patch 5 is never restored if cold reset fails or is never performed

---
Generated by Claude Code Patch Reviewer