From mboxrd@z Thu Jan 1 00:00:00 1970 From: Claude Code Review Bot To: dri-devel-reviews@example.com Subject: Claude review: drm/panthor: Add explicit memory reclaim sysfs knob Date: Thu, 07 May 2026 13:42:51 +1000 Message-ID: In-Reply-To: <20260506-panthor-explicit-reclaim-v1-3-44f82ac147ce@collabora.com> References: <20260506-panthor-explicit-reclaim-v1-0-44f82ac147ce@collabora.com> <20260506-panthor-explicit-reclaim-v1-3-44f82ac147ce@collabora.com> X-Mailer: Claude Code Patch Reviewer Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Patch Review **Verdict: Needs rework for locking issues.** **Issue 1 =E2=80=94 `filelist_mutex` held during reclaim (serious):** `panthor_run_on_pfiles_of_tgid` holds `ptdev->base.filelist_mutex` for the = entire duration of the callback: ```c scoped_cond_guard(mutex_intr, return -EINTR, &ptdev->base.filelist_mutex) { list_for_each_entry(file, &ptdev->base.filelist, lhead) { ... func(file->driver_priv); /* called with filelist_mutex held */ } } ``` `panthor_mmu_force_reclaim` calls `panthor_mmu_reclaim_priv_bos` which call= s `drm_gem_lru_scan` =E2=86=92 `panthor_gem_try_evict`, which acquires BO r= eservation locks and does actual page eviction (writeback to swap). Holding= `filelist_mutex` throughout blocks all `drm_open`/`drm_close` on the devic= e. On an embedded system with multiple GPU clients, this could cause signif= icant stalls. The fix would be to collect references to the relevant `panthor_file` objec= ts under the mutex, then release the mutex before invoking the callbacks. **Issue 2 =E2=80=94 Missing xarray locking (bug):** ```c xa_for_each(&pfile->vms->xa, i, vm) { if (!vm->reclaim.lru.count) continue; nr_to_scan +=3D vm->reclaim.lru.count; list_move(&vm->reclaim.lru_node, &ptdev->reclaim.vms); } ``` Existing code in panthor (e.g., the fdinfo stats path at `panthor_mmu.c:208= 1`) takes `xa_lock(&pfile->vms->xa)` before iterating. Here the iteration r= uns without `xa_lock`, racing with concurrent `xa_store`/`xa_erase` from VM= create/destroy ioctls. A concurrent `panthor_vm_pool_destroy_vm` could era= se and free a VM while this loop is accessing it. **Issue 3 =E2=80=94 No VM reference on the iterated VMs:** Compare with `panthor_vm_pool_get_vm` which does `panthor_vm_get(xa_load(..= .))` under `xa_lock`. The force_reclaim code accesses `vm->reclaim.lru.coun= t` and `vm->reclaim.lru_node` without a reference. If the VM is concurrentl= y destroyed, these accesses are use-after-free. **Issue 4 =E2=80=94 TGID lookup could be simplified:** ```c file_pid =3D rcu_dereference(file->pid); task =3D pid_task(file_pid, PIDTYPE_TGID); if (!task || task->tgid !=3D tgid) { ``` Since `file->pid` is set to `task_tgid(current)` in `drm_file_alloc`, `pid_= nr(file_pid)` already gives the TGID number. The task lookup is unnecessary= =E2=80=94 `pid_nr(file_pid) !=3D tgid` would suffice and avoids touching t= he task struct. **Issue 5 =E2=80=94 PID namespace correctness:** The user writes a TGID via sysfs, and the code compares it against `task->t= gid` (which is the global namespace pid). If the writer is in a PID namespa= ce, their view of the TGID won't match the global value. Using `pid_vnr()` = or documenting this as init-namespace-only would be more correct. **Issue 6 =E2=80=94 Negative TGID accepted:** `kstrtoint` allows negative values, and the code doesn't reject `tgid <=3D = 0`. While the lookup would simply fail with `-ESRCH`, explicitly rejecting = invalid TGIDs would be cleaner. --- Generated by Claude Code Patch Reviewer