From mboxrd@z Thu Jan 1 00:00:00 1970
From: Claude Code Review Bot
To: dri-devel-reviews@example.com
Subject: Claude review: drm/xe: Split TLB invalidation into submit and wait steps
Date: Tue, 03 Mar 2026 13:05:44 +1000
Message-ID:
In-Reply-To: <20260302163248.105454-4-thomas.hellstrom@linux.intel.com>
References: <20260302163248.105454-1-thomas.hellstrom@linux.intel.com>
 <20260302163248.105454-4-thomas.hellstrom@linux.intel.com>
X-Mailer: Claude Code Patch Reviewer
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

Patch Review

This is a clean refactor that extracts the submit-and-wait logic from
`xe_vm_range_tilemask_tlb_inval` into two separate functions in
`xe_tlb_inval.c`.

**`xe_tlb_inval_batch` struct size concern:**

```c
struct xe_tlb_inval_batch {
	struct xe_tlb_inval_fence fence[XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_TILE];
	unsigned int num_fences;
};
```

With `XE_MAX_TILES_PER_DEVICE=2` and `XE_MAX_GT_PER_TILE=2`, this is 4
`xe_tlb_inval_fence` structs. Each contains a `struct dma_fence` (~80 bytes)
plus overhead, so the batch is roughly 400-500 bytes. This was already the
case in the original `xe_vm_range_tilemask_tlb_inval` (which had
`fence[XE_MAX_TILES_PER_DEVICE * XE_MAX_GT_PER_TILE]` on the stack), so it's
not a regression. However, in patch 4 this struct gets embedded in
`xe_userptr`, which means every userptr VMA now carries this overhead
permanently. Worth noting, but probably acceptable.

**Header include in `xe_tlb_inval_types.h`:**

```c
+#include "xe_device_types.h"
```

This pulls in the full device-types header just for
`XE_MAX_TILES_PER_DEVICE` and `XE_MAX_GT_PER_TILE`. Consider whether these
constants could be defined in a smaller header, or whether `xe_device.h`
(which has the `#define XE_MAX_GT_PER_TILE 2`) would be lighter-weight.
Types headers generally try to minimize includes to avoid circular
dependency issues.
**Error handling in `xe_svm_invalidate` caller:**

```c
err = xe_tlb_inval_range_tilemask_submit(xe, vm->usm.asid, adj_start,
					 adj_end, tile_mask, &_batch);
xe_tlb_inval_batch_wait(&_batch);
WARN_ON_ONCE(err);
```

On error, `xe_tlb_inval_range_tilemask_submit` already calls
`xe_tlb_inval_batch_wait` internally (the `goto wait` path sets `num_fences`
and waits). The caller then calls `xe_tlb_inval_batch_wait` again, but
`num_fences` is 0 after the internal wait, so it's a no-op. This is harmless
but slightly confusing; consider documenting that on error the batch has
already been waited on and cleaned up.

**`xe_vm_invalidate_vma` behavior change:**

The original code called `xe_vm_range_tilemask_tlb_inval`, which waited even
on error. The new code:

```c
ret = xe_tlb_inval_range_tilemask_submit(xe, ..., &_batch);
WRITE_ONCE(vma->tile_invalidated, vma->tile_mask);
if (!ret)
	xe_tlb_inval_batch_wait(&_batch);
```

The `WRITE_ONCE(vma->tile_invalidated, vma->tile_mask)` now happens before
the wait completes. In the original code it happened after the wait (since
`xe_vm_range_tilemask_tlb_inval` blocked). This reordering seems intentional
and is maintained in patch 4, but it means `tile_invalidated` is set before
the TLB flush completes. Is this semantically correct? The pairing with
`READ_ONCE` in `xe_vm_has_valid_gpu_mapping()` suggests this flag is read to
check whether invalidation was *initiated*, not *completed*, so it's likely
fine.

**Naming: `_batch` with underscore prefix:**

The local variables in `xe_svm_invalidate` and `xe_vm_invalidate_vma` use
`_batch` with a leading underscore. In kernel style, leading underscores on
local variables are unusual and typically reserved for function/macro names.
Consider just `batch` (as used in `xe_vm_invalidate_madvise_range`).

---

Generated by Claude Code Patch Reviewer