From mboxrd@z Thu Jan 1 00:00:00 1970
From: Claude Code Review Bot
To: dri-devel-reviews@example.com
Subject: Claude review: drm/ttm: Hook up a cgroup-aware reclaim callback for the dmem controller
Date: Sat, 16 May 2026 14:48:43 +1000
Message-ID:
In-Reply-To: <20260511173008.36526-4-thomas.hellstrom@linux.intel.com>
References: <20260511173008.36526-1-thomas.hellstrom@linux.intel.com> <20260511173008.36526-4-thomas.hellstrom@linux.intel.com>
X-Mailer: Claude Code Patch Reviewer
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0

Patch Review

**Verdict: Good with one concern worth discussing.**

The `ttm_bo_evict_cgroup()` function mirrors the multi-pass strategy of `ttm_bo_evict_alloc()`: trylock pass, low-watermark pass, then sleeping-lock passes. The structure is clean.

The new `sleeping_lock` field in `ttm_lru_walk_arg`:

```c
	bool sleeping_lock;
```

The change in `__ttm_bo_lru_cursor_next()`:

```c
-	else if (!arg->ticket || arg->ctx->no_wait_gpu || arg->trylock_only)
+	else if ((!arg->ticket && !arg->sleeping_lock) || arg->ctx->no_wait_gpu ||
+		 arg->trylock_only)
```

When `sleeping_lock = true` and `ticket = NULL`, this falls through to `ttm_lru_walk_ticketlock()`, which calls `dma_resv_lock(bo->base.resv, NULL)`. Without a WW ticket, there's no deadlock detection. As the commit message notes, this relies on the caller only holding one lock at a time. This is true here since the cgroup reclaim path has no pre-existing `dma_resv` locks. Using `interruptible = true` mitigates hard deadlocks since the process can be signalled, but a theoretical lock-order inversion could still cause a soft deadlock. Acceptable for this use case (admin writing to cgroup files), but should be flagged as a limitation.

The modification to `ttm_bo_evict_cb()` for cgroup drain mode:

```c
+	s64 bo_size = bo->base.size;
...
+	} else {
+		/* Cgroup drain: return bytes freed for byte-denominated progress. */
+		return bo_size;
+	}
```

This captures `bo->base.size` before eviction. `bo->base.size` is the GEM object size, which doesn't change during eviction, so the early capture is defensive but harmless. Returning `bo_size` instead of `1` is correct since `ttm_lru_walk_for_evict()` accumulates these values and the caller (`ttm_bo_evict_cgroup`) compares against `target_bytes`.

The comment about `evict_walk->place` being NULL in cgroup drain mode is important:

```c
+	/*
+	 * evict_walk->place is NULL in cgroup drain mode. Drivers'
+	 * eviction_valuable() callbacks must handle a NULL place, treating it
+	 * as "any placement": the TTM base implementation already does so via
+	 * ttm_resource_intersects().
+	 */
```

I verified that `ttm_resource_intersects()` returns `true` when `place == NULL` (at `ttm_resource.c:450`), so `ttm_bo_eviction_valuable()` correctly handles this. The xe driver's `xe_bo_eviction_valuable()` delegates to `ttm_bo_eviction_valuable()` as its first check and doesn't use `place` directly, so it's also safe without modification. Only amdgpu needs the explicit NULL guard (patch 5).

In `ttm_resource_manager_set_dmem_region()`:

```c
if (!IS_ERR_OR_NULL(region)) {
	man->cg = region;
	dmem_cgroup_region_set_reclaim(region, ...);
}
```

This correctly handles the error/NULL case (v3 fix from Sashiko-bot), leaving `man->cg` at its default (NULL/zero) if registration failed.

---
Generated by Claude Code Patch Reviewer