From mboxrd@z Thu Jan 1 00:00:00 1970
From: Claude Code Review Bot
To: dri-devel-reviews@example.com
Subject: Claude review: drm/xe/userptr: Defer Waiting for TLB invalidation to the second pass if possible
Date: Wed, 04 Mar 2026 07:09:17 +1000
Message-ID: 
In-Reply-To: <20260303133409.11609-5-thomas.hellstrom@linux.intel.com>
References: <20260303133409.11609-1-thomas.hellstrom@linux.intel.com> <20260303133409.11609-5-thomas.hellstrom@linux.intel.com>
X-Mailer: Claude Code Patch Reviewer
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Patch Review

This is the most complex patch, adding a *third* effective pass: pass1 defers the fence wait, the finish pass does the fence wait and submits the TLB invalidation, then another finish pass waits for the TLB invalidation. The cover letter documents three flows:

```
pass1 (fences pending)  -> invalidate_finish -> do_inval (sync TLB)
pass1 (fences done)     -> do_inval -> invalidate_finish -> complete_tlb_inval (deferred TLB)
pass1 (finish occupied) -> do_inval (sync TLB, inline)
```

**Concern — three-pass through a two-pass API:**

The mmu_notifier infrastructure in patch 1 is designed for two passes. This patch achieves a third pass by having `xe_vma_userptr_do_inval` (called from `invalidate_finish`) return *another* `mmu_interval_notifier_finish*`, effectively re-queuing the notifier for another finish pass. However, looking at the mm core code in `mn_itree_finish_pass`:

```c
static void mn_itree_finish_pass(struct llist_head *finish_passes)
{
	struct llist_node *first = llist_reverse_order(__llist_del_all(finish_passes));
	struct mmu_interval_notifier_finish *f, *next;

	llist_for_each_entry_safe(f, next, first, link)
		f->notifier->ops->invalidate_finish(f);
}
```

This drains the list and calls `invalidate_finish` for each entry. There's **no mechanism** for `invalidate_finish` to re-queue itself for yet another pass — the list is already drained when finish runs.
Looking more carefully at `xe_vma_userptr_do_inval` in this patch:

```c
	if (!userptr->finish_inuse) {
		userptr->finish_inuse = true;
		userptr->tlb_inval_submitted = true;
		err = xe_vm_invalidate_vma_submit(vma, &userptr->inval_batch);
		XE_WARN_ON(err);
		return &userptr->finish;
	}
```

This returns `&userptr->finish` from within `xe_vma_userptr_do_inval`, which was called from `xe_vma_userptr_invalidate_finish`. But `invalidate_finish` in the mm core doesn't look at any return value — it's `void`:

```c
void (*invalidate_finish)(struct mmu_interval_notifier_finish *finish);
```

So the return value from `xe_vma_userptr_do_inval` in the `invalidate_finish` path goes... where? Let's trace the call:

```c
static void xe_vma_userptr_invalidate_finish(...)
{
	down_write(&vm->svm.gpusvm.notifier_lock);
	if (uvma->userptr.tlb_inval_submitted)
		xe_vma_userptr_complete_tlb_inval(vm, uvma);
	else
		xe_vma_userptr_do_inval(vm, uvma, true);	// return value IGNORED
	up_write(&vm->svm.gpusvm.notifier_lock);
}
```

The return value of `xe_vma_userptr_do_inval` is **ignored** in `invalidate_finish`. So when `xe_vma_userptr_do_inval` is called with `is_deferred=true` and `!finish_inuse`, it sets `finish_inuse=true`, `tlb_inval_submitted=true`, submits the TLB invalidation, and returns `&userptr->finish`. But nobody picks up this return value to schedule another finish pass.

**This means the TLB invalidation batch submitted in this path is never waited on.** The `xe_vma_userptr_complete_tlb_inval` function would need to be called, but it's only called when `tlb_inval_submitted` is true at the *entry* of `xe_vma_userptr_invalidate_finish`. Since the finish callback is only invoked once per `__llist_add`, this deferred TLB path appears to leak — `finish_inuse` remains true, `tlb_inval_submitted` remains true, and the TLB batch is never waited on.
The next notifier call would see `finish_inuse=true` and fall back to the synchronous path, which works, but the stale `tlb_inval_submitted` would cause `xe_vma_userptr_invalidate_finish` to call `xe_vma_userptr_complete_tlb_inval` on a stale/reused batch.

Wait — let me re-read more carefully. The `is_deferred=true` path enters with `finish_inuse` already set (the assert checks it):

```c
	if (is_deferred)
		xe_assert(vm->xe, userptr->finish_inuse && !userptr->tlb_inval_submitted);
```

So when `xe_vma_userptr_do_inval` is called with `is_deferred=true`, `finish_inuse` is already true. Then the `if (!userptr->finish_inuse)` check is false, so it falls through to the synchronous `xe_vm_invalidate_vma()` path. So actually the deferred TLB submit via `do_inval` **only happens in the non-deferred `is_deferred=false` path**, where `finish_inuse` is initially false. Let me re-trace:

In `xe_vma_userptr_invalidate_pass1`, when `signaled || finish_inuse`:

```c
	if (signaled || userptr->finish_inuse)
		return xe_vma_userptr_do_inval(vm, uvma, false);	// is_deferred=false
```

Here `is_deferred=false`. If `!finish_inuse` (i.e., we entered because `signaled` was true), then `xe_vma_userptr_do_inval` may return `&userptr->finish` after setting `finish_inuse=true` and `tlb_inval_submitted=true`. This return value goes back to `xe_vma_userptr_invalidate_pass1`, then to `xe_vma_userptr_invalidate_start`, which sets `*p_finish` to it. This **does** schedule the finish pass in the mm core via `__llist_add`. Then `xe_vma_userptr_invalidate_finish` is called, sees `tlb_inval_submitted=true`, and calls `xe_vma_userptr_complete_tlb_inval`. This path is **correct**.

The other case: when `finish_inuse` is already true (concurrent invalidation), `xe_vma_userptr_do_inval(vm, uvma, false)` with `finish_inuse=true` falls through to the synchronous `xe_vm_invalidate_vma` and returns NULL. Correct.
And the deferred path from pass1: when `!signaled && !finish_inuse`, pass1 returns `&userptr->finish` directly, finish is called, and `do_inval` with `is_deferred=true` takes the sync path (since `finish_inuse` is already true). So the deferred TLB only happens in the "fences already signaled but TLB flush needed" case. That makes sense — if the fences are done, you can submit the TLB flush immediately and defer only the wait.

After re-analysis: **The logic is correct but extremely convoluted.** The three interleaved flows with the `finish_inuse` and `tlb_inval_submitted` booleans form a state machine that is hard to reason about. The assertions help, but this would benefit from a state diagram in a comment.

**Per-userptr struct bloat:**

Patch 4 embeds `struct xe_tlb_inval_batch` (~450 bytes) into every `struct xe_userptr`, on top of the already-present `struct mmu_interval_notifier_finish` (from patch 2). For systems with many userptr VMAs, this is significant memory overhead for a feature that only activates in fault mode with multi-GPU. Consider whether the batch could be dynamically allocated (with a fallback to sync) instead of embedded.

**`xe_vma_userptr_force_invalidate` in patch 4:**

```c
	finish = xe_vma_userptr_invalidate_pass1(vm, uvma);
	if (finish)
		finish = xe_vma_userptr_do_inval(vm, uvma, true);
	if (finish)
		xe_vma_userptr_complete_tlb_inval(vm, uvma);
```

This still uses the `static` variable from patch 2 (a bug, as noted above). Also, the second `if (finish)` can never be true, because `do_inval` with `is_deferred=true` always falls through to the sync path (`finish_inuse` is already true when `is_deferred=true`). The third line is dead code in the current implementation. The `static` bug and this dead code suggest the force_invalidate path hasn't been fully thought through.

Overall: **Logic is correct after careful analysis, but the state machine is very complex.
Bug inherited from patch 2 (static variable). Per-userptr memory overhead is significant. Dead code in force_invalidate.**

---

## Summary of Issues

| Severity | Patch | Issue |
|----------|-------|-------|
| Bug | 2/4 | `static struct mmu_interval_notifier_finish *finish` in `xe_vma_userptr_force_invalidate` — should be a local variable |
| Medium | 4/4 | ~450 bytes of `xe_tlb_inval_batch` embedded per-userptr for a feature only used in fault mode with multi-GPU |
| Medium | 4/4 | Three-pass state machine through a two-pass API is hard to follow; would benefit from a state diagram comment |
| Low | 3/4 | `tile_invalidated` is now set before the TLB flush completes — semantics should be documented |
| Low | 3/4 | `xe_tlb_inval_types.h` pulls in heavyweight `xe_device_types.h` for a constant |
| Nit | 1/4 | Duplicated dispatch logic between `mn_itree_release` and `mn_itree_invalidate` could use a helper |
| Nit | 4/4 | Third `if (finish)` in `xe_vma_userptr_force_invalidate` is dead code |

---

Generated by Claude Code Patch Reviewer