From mboxrd@z Thu Jan 1 00:00:00 1970 From: Claude Code Review Bot To: dri-devel-reviews@example.com Subject: Claude review: drm/nouveau/fifo: add recovery path for Tesla cache_error/dma_pusher Date: Sat, 16 May 2026 11:34:50 +1000 Message-ID: In-Reply-To: <20260513175014.96599-3-marek@czernohous.de> References: <20260513175014.96599-1-marek@czernohous.de> <20260513175014.96599-3-marek@czernohous.de> X-Mailer: Claude Code Patch Reviewer Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Patch Review **Assessment: Functionally correct, a few items to address.** **Issue 1 (should fix) =E2=80=94 German comments in public header.** `inclu= de/nvkm/engine/fifo.h` is a header shared across nouveau. The struct field = comments are in German: ```c u32 count; /* aktuelle Fenste= r-Tiefe */ ktime_t ts[NVKM_FIFO_WEDGE_RING_MAX]; /* Ring von Timesta= mps */ u32 head; /* Ring-Head */ ``` These should be in English ("current window depth", "timestamp ring", "ring= head") or removed entirely, since the field names are already reasonably d= escriptive. **Issue 2 (design question) =E2=80=94 permanent wedge with no reset path.**= Once Tier-2 fires: ```c if (atomic_xchg(&w->wedged, 1) !=3D 0) return; /* already wedged this cycle */ ``` The `wedged` atomic is never reset to 0. If userspace performs the `DRM_WED= GE_RECOVERY_REBIND` and the driver is re-bound, the `nvkm_fifo` structure i= s re-allocated so this is implicitly reset. But if the uevent is *not* acte= d on (no listener, or userspace ignores it), the wedge event fires exactly = once and then Tier-2 is silently dead while Tier-1 keeps running. This is p= robably intentional, but worth documenting in a comment or the commit messa= ge. **Issue 3 (minor) =E2=80=94 `w->count` read without lock in worker.** ```c static void nv04_fifo_wedge_work(struct work_struct *work) { ... fault_count =3D w->count; ``` `w->count` is updated under `fifo->wedge.lock` in the IRQ path but read her= e without the lock. This is a benign race since the value is only used for = the log message and tracepoint (not control flow), but KCSAN may flag it. A= `READ_ONCE()` would silence any tooling complaint and make the intent expl= icit. **Issue 4 (minor) =E2=80=94 `nv04_fifo_wedge_init`/`fini` called for all ch= ipsets.** These are called from `nvkm_fifo_new_` and `nvkm_fifo_dtor` which= run for every GPU generation, not just Tesla. The overhead is trivial (spi= nlock init + work init), but naming them `nv04_*` and initializing them glo= bally is a bit misleading. Consider either: (a) renaming to something more = generic, or (b) conditioning on chipset. Very minor. **Issue 5 (cosmetic) =E2=80=94 dev_info prefix redundancy.** ```c dev_info(drm_dev->dev, "nouveau: fifo wedged after %u faults in %u ms\n", fault_count, nouveau_fifo_wedge_window_ms); ``` `dev_info` already prefixes the device name. The manual `"nouveau: "` prefi= x is redundant and will produce output like `pci 0000:02:00.0: nouveau: fif= o wedged ...`. Drop the `"nouveau: "` prefix. **Correctness notes (all fine):** - `schedule_work` from IRQ context: safe by design. - `nvkm_chan_error` from IRQ context: safe =E2=80=94 other interrupt handle= rs (gf100, ga100, gv100) already call it this way. - Reuse of `flags` variable in `nv04_fifo_recover` for both the chan get/pu= t and the spinlock: correct, the chan is released before the spin_lock. - `ktime_t` zero as sentinel: safe since `ktime_get()` will never return 0 = in practice. - Module params declared with perm `0400` (read-only): no runtime write rac= e possible. - `CREATE_TRACE_POINTS` placement in `nouveau_drm.c`: correct, and this is = the first trace event header for nouveau. - Tracepoint `TRACE_INCLUDE_PATH` uses the `../../` relative path pattern: = this is the standard kernel convention for out-of-tree-style trace headers. --- Generated by Claude Code Patch Reviewer