From mboxrd@z Thu Jan 1 00:00:00 1970
From: Claude Code Review Bot
To: dri-devel-reviews@example.com
Subject: Claude review: drm/xe: Split TLB invalidation into submit and wait steps
Date: Tue, 03 Mar 2026 13:05:44 +1000
Message-ID:
In-Reply-To: <20260302163248.105454-4-thomas.hellstrom@linux.intel.com>
References: <20260302163248.105454-1-thomas.hellstrom@linux.intel.com>
 <20260302163248.105454-4-thomas.hellstrom@linux.intel.com>
X-Mailer: Claude Code Patch Reviewer
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

Patch Review

This is a clean refactor that extracts the submit-and-wait logic from
`xe_vm_range_tilemask_tlb_inval` into two separate functions in
`xe_tlb_inval.c`.

**`xe_tlb_inval_batch` struct size concern:**

```c
struct xe_tlb_inval_batch {
	struct xe_tlb_inval_fence fence[XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_TILE];
	unsigned int num_fences;
};
```

With `XE_MAX_TILES_PER_DEVICE=2` and `XE_MAX_GT_PER_TILE=2`, this is 4
`xe_tlb_inval_fence` structs. Each contains a `struct dma_fence` (~80 bytes)
plus overhead, so the batch is roughly 400-500 bytes. This was already the
case in the original `xe_vm_range_tilemask_tlb_inval` (which had
`fence[XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_TILE]` on the stack), so it's
not a regression. However, in patch 4 this struct gets embedded in
`xe_userptr`, which means every userptr VMA now carries this overhead
permanently. Worth noting, but probably acceptable.

**Header include in `xe_tlb_inval_types.h`:**

```c
+#include "xe_device_types.h"
```

This pulls in the full device-types header just for
`XE_MAX_TILES_PER_DEVICE` and `XE_MAX_GT_PER_TILE`. Consider whether these
constants could be defined in a smaller header, or whether `xe_device.h`
(which has the `#define XE_MAX_GT_PER_TILE 2`) would be lighter-weight.
Types headers generally try to minimize includes to avoid circular
dependency issues.
**Error handling in `xe_svm_invalidate` caller:**

```c
err = xe_tlb_inval_range_tilemask_submit(xe, vm->usm.asid, adj_start,
					 adj_end, tile_mask, &_batch);
xe_tlb_inval_batch_wait(&_batch);
WARN_ON_ONCE(err);
```

On error, `xe_tlb_inval_range_tilemask_submit` already calls
`xe_tlb_inval_batch_wait` internally (the `goto wait` path sets `num_fences`
and waits). The caller then calls `xe_tlb_inval_batch_wait` again, but
`num_fences` is 0 after the internal wait, so it's a no-op. This is harmless
but slightly confusing; consider documenting that on error the batch has
already been waited on and cleaned up.

**`xe_vm_invalidate_vma` behavior change:**

The original code called `xe_vm_range_tilemask_tlb_inval`, which waited even
on error. The new code:

```c
ret = xe_tlb_inval_range_tilemask_submit(xe, ..., &_batch);
WRITE_ONCE(vma->tile_invalidated, vma->tile_mask);
if (!ret)
	xe_tlb_inval_batch_wait(&_batch);
```

The `WRITE_ONCE(vma->tile_invalidated, vma->tile_mask)` now happens before
the wait completes. In the original code it happened after the wait (since
`xe_vm_range_tilemask_tlb_inval` blocked). This reordering seems intentional
and is maintained in patch 4, but it means `tile_invalidated` is set before
the TLB flush completes. Is this semantically correct? The pairing with
`READ_ONCE` in `xe_vm_has_valid_gpu_mapping()` suggests this flag is read to
check whether invalidation was *initiated*, not *completed*, so it's likely
fine.

**Naming: `_batch` with underscore prefix:**

The local variables in `xe_svm_invalidate` and `xe_vm_invalidate_vma` use
`_batch` with a leading underscore. In kernel style, leading underscores on
local variables are unusual and typically reserved for function/macro names.
Consider just `batch` (as used in `xe_vm_invalidate_madvise_range`).

---

Generated by Claude Code Patch Reviewer