public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
From: Steven Price <steven.price@arm.com>
To: Boris Brezillon <boris.brezillon@collabora.com>,
	Liviu Dudau <liviu.dudau@arm.com>,
	Adrián Larumbe <adrian.larumbe@collabora.com>
Cc: dri-devel@lists.freedesktop.org, David Airlie <airlied@gmail.com>,
	Simona Vetter <simona@ffwll.ch>, Akash Goel <akash.goel@arm.com>,
	Rob Clark <robin.clark@oss.qualcomm.com>,
	Sean Paul <sean@poorly.run>,
	Konrad Dybcio <konradybcio@kernel.org>,
	Akhil P Oommen <akhilpo@oss.qualcomm.com>,
	Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
	Maxime Ripard <mripard@kernel.org>,
	Thomas Zimmermann <tzimmermann@suse.de>,
	Dmitry Osipenko <dmitry.osipenko@collabora.com>,
	Chris Diamand <chris.diamand@arm.com>,
	Danilo Krummrich <dakr@kernel.org>,
	Matthew Brost <matthew.brost@intel.com>,
	Thomas Hellström <thomas.hellstrom@linux.intel.com>,
	Alice Ryhl <aliceryhl@google.com>, Chia-I Wu <olvaffe@gmail.com>,
	kernel@collabora.com
Subject: Re: [PATCH v6 0/9] drm/panthor: Add a GEM shrinker
Date: Mon, 30 Mar 2026 11:39:00 +0100	[thread overview]
Message-ID: <8b2b65a3-3db8-469f-90ae-6abccdb3c71a@arm.com> (raw)
In-Reply-To: <20260330094848.2169422-1-boris.brezillon@collabora.com>

Hi Boris,

On 30/03/2026 10:48, Boris Brezillon wrote:
> Hello,
> 
> This is an attempt at adding a GEM shrinker to panthor so the system
> can finally reclaim GPU memory.
> 
> This implementation is losely based on the MSM shrinker (which is why
> I added the MSM maintainers in Cc), and it's relying on the drm_gpuvm
> eviction/validation infrastructure.
> 
> I've only done very basic IGT-based [1] and chromium-based (opening
> a lot of tabs on Aquarium until the system starts reclaiming+swapping
> out GPU buffers) testing, but I'm posting this early so I can get
> preliminary feedback on the implementation. If someone knows about
> better tools/ways to test the shrinker, please let me know.

I did my own pretty basic testing (glmark with memhog) and managed to hit this:

[  265.053172] ============================================
[  265.053667] WARNING: possible recursive locking detected
[  265.054159] 7.0.0-rc3-00694-gadfa5ca08767 #1 Not tainted
[  265.054655] --------------------------------------------
[  265.055143] glmark2-es2-drm/443 is trying to acquire lock:
[  265.055651] ffff0001194011a8 (reservation_ww_class_mutex){+.+.}-{4:4}, at: drm_gpuvm_bo_deferred_cleanup+0x100/0x2c0 [drm_gpuvm]
[  265.056738] 
[  265.056738] but task is already holding lock:
[  265.057278] ffff80008b7c79e8 (reservation_ww_class_mutex){+.+.}-{4:4}, at: panthor_ioctl_group_submit+0x424/0x560 [panthor]
[  265.058324] 
[  265.058324] other info that might help us debug this:
[  265.058927]  Possible unsafe locking scenario:
[  265.058927] 
[  265.059475]        CPU0
[  265.059706]        ----
[  265.059939]   lock(reservation_ww_class_mutex);
[  265.060365]   lock(reservation_ww_class_mutex);
[  265.060788] 
[  265.060788]  *** DEADLOCK ***
[  265.060788] 
[  265.061338]  May be due to missing lock nesting notation
[  265.061338] 
[  265.061964] 3 locks held by glmark2-es2-drm/443:
[  265.062395]  #0: ffff80008493c458 (drm_unplug_srcu){.+.+}-{0:0}, at: drm_dev_enter+0x0/0x140
[  265.063188]  #1: ffff80008b7c79c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: panthor_ioctl_group_submit+0x424/0x560 [panthor]
[  265.064288]  #2: ffff80008b7c79e8 (reservation_ww_class_mutex){+.+.}-{4:4}, at: panthor_ioctl_group_submit+0x424/0x560 [panthor]
[  265.065370] 
[  265.065370] stack backtrace:
[  265.065780] CPU: 4 UID: 1000 PID: 443 Comm: glmark2-es2-drm Not tainted 7.0.0-rc3-00694-gadfa5ca08767 #1 PREEMPT 
[  265.065787] Hardware name: Radxa ROCK 5B (DT)
[  265.065791] Call trace:
[  265.065793]  show_stack+0x18/0x24 (C)
[  265.065802]  dump_stack_lvl+0x6c/0x94
[  265.065810]  dump_stack+0x1c/0x28
[  265.065815]  print_deadlock_bug+0x224/0x238
[  265.065822]  __lock_acquire+0xe54/0x1600
[  265.065829]  lock_acquire+0x3cc/0x420
[  265.065834]  __ww_mutex_lock.constprop.0+0x1fc/0x2c40
[  265.065844]  ww_mutex_lock+0x50/0x168
[  265.065850]  drm_gpuvm_bo_deferred_cleanup+0x100/0x2c0 [drm_gpuvm]
[  265.065862]  panthor_vm_cleanup_op_ctx+0x188/0x270 [panthor]
[  265.065881]  panthor_vm_bo_validate+0x404/0x758 [panthor]
[  265.065898]  drm_gpuvm_validate+0x28c/0xf50 [drm_gpuvm]
[  265.065907]  panthor_vm_prepare_mapped_bos_resvs+0x64/0x80 [panthor]
[  265.065925]  panthor_ioctl_group_submit+0x418/0x560 [panthor]
[  265.065942]  drm_ioctl_kernel+0x15c/0x2c0
[  265.065947]  drm_ioctl+0x56c/0xb1c
[  265.065952]  __arm64_sys_ioctl+0x124/0x1a4
[  265.065961]  invoke_syscall+0x70/0x260
[  265.065967]  el0_svc_common.constprop.0+0xac/0x230
[  265.065972]  do_el0_svc+0x40/0x58
[  265.065976]  el0_svc+0x4c/0x210
[  265.065981]  el0t_64_sync_handler+0xa0/0xe4
[  265.065986]  el0t_64_sync+0x198/0x19c

Thanks,
Steve

> A few words about some design/implementation choices:
> - No MADVISE support because I want to see if we can live with just
>   transparent reclaim
> - We considered basing this implementation on the generic shrinker work
>   started by Dmitry [2], but
>   1. with the activeness/idleness tracking happening at the VM
>      granularity, having per-BO LRUs would caused a lot of
>      list_move()s that are not really needed (the VM as a whole
>      become active/idle, we can track individual BOs)
>   2. Thomas Zimmermann recently suggested that we should have our
>      own GEM implementation instead of trying to add this extra reclaim
>      complexity to gem-shmem. There are some plans to create a
>      gem-uma (Unified Memory Architecture) lib that would do more
>      than gem-shmem but in a way that doesn't force all its users
>      to pay the overhead (size overhead of the gem object, mostly)
>      for features they don't use. Patch "Part ways with
>      drm_gem_shmem_object" is showing what this component-based lib
>      API could look like if it were to be extracted
> - At the moment we only support swapout, but we could add an
>   extra flag to specify when buffer content doesn't need to be
>   preserved to avoid the swapout/swapin dance. First candidate for
>   this DISCARD_ON_RECLAIM flag would probably be the tiler heap chunks.
> - Reclaim uses _try_lock() all the way because of the various lock order
>   inversions between the reclaim path and submission paths. That means
>   we don't try very hard to reclaim hot GPU buffers, but the locking is
>   such a mess that I don't really see a better option to be honest.
> 
> 
> Changes in v2:
> - No fundamental changes in this v2, since the feedback I got were more
>   focused on bugs than the overall approach. Check the changelog in each
>   patch for more details.
> 
> Changes in v3:
> - Mostly fixes (see the changelog in each patch)
> 
> Changes in v4:
> - Only fixes (see the changelog in each patch)
> 
> Changes in v5:
> - Only fixes (see the changelog in each patch)
> 
> Changes in v6:
> - Fix huge fault handling and various other things reported by Adrian
>   and the review bot (see the changelog in each patch for more)
> 
> Regards,
> 
> Boris
> 
> [1]https://gitlab.freedesktop.org/bbrezillon/igt-gpu-tools/-/commit/fc76934a5579767d2aabe787d40e38a17c3f4ea4
> [2]https://lkml.org/lkml/2024/1/5/665
> 
> Akash Goel (1):
>   drm/panthor: Add a GEM shrinker
> 
> Boris Brezillon (8):
>   drm/gem: Consider GEM object reclaimable if shrinking fails
>   drm/panthor: Move panthor_gems_debugfs_init() to panthor_gem.c
>   drm/panthor: Group panthor_kernel_bo_xxx() helpers
>   drm/panthor: Don't call drm_gpuvm_bo_extobj_add() if the object is
>     private
>   drm/panthor: Part ways with drm_gem_shmem_object
>   drm/panthor: Lazily allocate pages on mmap()
>   drm/panthor: Split panthor_vm_prepare_map_op_ctx() to prepare for
>     reclaim
>   drm/panthor: Track the number of mmap on a BO
> 
>  drivers/gpu/drm/drm_gem.c                |   10 +
>  drivers/gpu/drm/panthor/Kconfig          |    1 -
>  drivers/gpu/drm/panthor/panthor_device.c |   11 +-
>  drivers/gpu/drm/panthor/panthor_device.h |   73 ++
>  drivers/gpu/drm/panthor/panthor_drv.c    |   33 +-
>  drivers/gpu/drm/panthor/panthor_fw.c     |   16 +-
>  drivers/gpu/drm/panthor/panthor_gem.c    | 1453 ++++++++++++++++++----
>  drivers/gpu/drm/panthor/panthor_gem.h    |  136 +-
>  drivers/gpu/drm/panthor/panthor_mmu.c    |  502 ++++++--
>  drivers/gpu/drm/panthor/panthor_mmu.h    |    8 +
>  drivers/gpu/drm/panthor/panthor_sched.c  |    9 +-
>  11 files changed, 1911 insertions(+), 341 deletions(-)
> 


  parent reply	other threads:[~2026-03-30 10:39 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-30  9:48 [PATCH v6 0/9] drm/panthor: Add a GEM shrinker Boris Brezillon
2026-03-30  9:48 ` [PATCH v6 1/9] drm/gem: Consider GEM object reclaimable if shrinking fails Boris Brezillon
2026-03-31  7:24   ` Claude review: " Claude Code Review Bot
2026-03-30  9:48 ` [PATCH v6 2/9] drm/panthor: Move panthor_gems_debugfs_init() to panthor_gem.c Boris Brezillon
2026-03-31  7:24   ` Claude review: " Claude Code Review Bot
2026-03-30  9:48 ` [PATCH v6 3/9] drm/panthor: Group panthor_kernel_bo_xxx() helpers Boris Brezillon
2026-03-31  7:24   ` Claude review: " Claude Code Review Bot
2026-03-30  9:48 ` [PATCH v6 4/9] drm/panthor: Don't call drm_gpuvm_bo_extobj_add() if the object is private Boris Brezillon
2026-03-31  7:24   ` Claude review: " Claude Code Review Bot
2026-03-30  9:48 ` [PATCH v6 5/9] drm/panthor: Part ways with drm_gem_shmem_object Boris Brezillon
2026-03-31  7:24   ` Claude review: " Claude Code Review Bot
2026-03-30  9:48 ` [PATCH v6 6/9] drm/panthor: Lazily allocate pages on mmap() Boris Brezillon
2026-03-31  7:24   ` Claude review: " Claude Code Review Bot
2026-03-30  9:48 ` [PATCH v6 7/9] drm/panthor: Split panthor_vm_prepare_map_op_ctx() to prepare for reclaim Boris Brezillon
2026-03-31  7:24   ` Claude review: " Claude Code Review Bot
2026-03-30  9:48 ` [PATCH v6 8/9] drm/panthor: Track the number of mmap on a BO Boris Brezillon
2026-03-31  7:24   ` Claude review: " Claude Code Review Bot
2026-03-30  9:48 ` [PATCH v6 9/9] drm/panthor: Add a GEM shrinker Boris Brezillon
2026-03-31  7:24   ` Claude review: " Claude Code Review Bot
2026-03-30 10:39 ` Steven Price [this message]
2026-03-31  7:24 ` Claude Code Review Bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8b2b65a3-3db8-469f-90ae-6abccdb3c71a@arm.com \
    --to=steven.price@arm.com \
    --cc=adrian.larumbe@collabora.com \
    --cc=airlied@gmail.com \
    --cc=akash.goel@arm.com \
    --cc=akhilpo@oss.qualcomm.com \
    --cc=aliceryhl@google.com \
    --cc=boris.brezillon@collabora.com \
    --cc=chris.diamand@arm.com \
    --cc=dakr@kernel.org \
    --cc=dmitry.osipenko@collabora.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=kernel@collabora.com \
    --cc=konradybcio@kernel.org \
    --cc=liviu.dudau@arm.com \
    --cc=maarten.lankhorst@linux.intel.com \
    --cc=matthew.brost@intel.com \
    --cc=mripard@kernel.org \
    --cc=olvaffe@gmail.com \
    --cc=robin.clark@oss.qualcomm.com \
    --cc=sean@poorly.run \
    --cc=simona@ffwll.ch \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox