From: Honglei Huang <honglei1.huang@amd.com>
To: <Felix.Kuehling@amd.com>, <alexander.deucher@amd.com>,
<christian.koenig@amd.com>, <Philip.Yang@amd.com>,
<Ray.Huang@amd.com>
Cc: <dmitry.osipenko@collabora.com>, <airlied@gmail.com>,
<daniel@ffwll.ch>, <amd-gfx@lists.freedesktop.org>,
<dri-devel@lists.freedesktop.org>, <linux-kernel@vger.kernel.org>,
<linux-mm@kvack.org>, <akpm@linux-foundation.org>,
<honghuan@amd.com>
Subject: [PATCH v4 0/8] drm/amdkfd: Add batch userptr allocation support
Date: Mon, 9 Feb 2026 14:10:39 +0800
Message-ID: <20260209061047.3881808-1-honglei1.huang@amd.com>
From: Honglei Huang <honghuan@amd.com>
Hi all,
This is v4 of the patch series to support allocating multiple non-contiguous
CPU virtual address ranges that map to a single contiguous GPU virtual address
range.
v4:
  1. Fixed hmm_range.notifier_seq limitation: add per-range 'valid' flag
     - Use explicit 'valid' flag instead of notifier seq in user_range_info
       to track whether each range's pages are valid or have been invalidated
  2. Fixed mmap_read_lock usage: hold mmap_read_lock across entire batch
     page fault operation instead of per-range locking
     - Renamed get_user_pages_batch() to get_user_pages_batch_locked()
     - Caller now holds mmap_read_lock for entire batch operation
  3. Improved error handling: Better cleanup paths and error messages
v3:
  1. No new ioctl: Reuses existing AMDKFD_IOC_ALLOC_MEMORY_OF_GPU
     - Adds only one flag: KFD_IOC_ALLOC_MEM_FLAGS_USERPTR_BATCH
     - When flag is set, mmap_offset field points to range array
     - Minimal API surface change
  2. Improved MMU notifier handling:
     - Single mmu_interval_notifier covering the VA span [va_min, va_max]
     - Interval tree for efficient lookup of affected ranges during invalidation
     - Avoids per-range notifier overhead mentioned in v2 review
  3. Better code organization: Split into 8 focused patches for easier review
v2:
  - Each CPU VA range gets its own mmu_interval_notifier for invalidation
  - All ranges validated together and mapped to contiguous GPU VA
  - Single kgd_mem object with array of user_range_info structures
  - Unified eviction/restore path for all ranges in a batch
Current Implementation Approach
===============================
This series implements a practical solution within existing kernel constraints:
1. Single MMU notifier for VA span: Register one notifier covering the
   entire range from lowest to highest address in the batch
2. Interval tree filtering: Use interval tree to efficiently identify
   which specific ranges are affected during invalidation callbacks,
   avoiding unnecessary processing for unrelated address changes
3. Unified eviction/restore: All ranges in a batch share eviction and
   restore paths, maintaining consistency with existing userptr handling
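
To make points 1 and 2 concrete, here is a minimal userspace sketch, not code
from the series. It computes the VA span that a single notifier would cover,
then marks only the ranges that overlap an invalidated interval. The names
range_sketch, span_of() and mark_invalid_sketch() are hypothetical, the
user_range_info layout is assumed rather than taken from the patches, and a
plain linear scan stands in for the interval tree lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-range record; the real user_range_info layout may differ. */
struct range_sketch {
	uint64_t start;	/* CPU VA of the first byte */
	uint64_t size;	/* length in bytes, non-zero */
	bool valid;	/* cleared when the range's pages are invalidated */
};

/* Compute the half-open span [*va_min, *va_max) covering all ranges. */
static void span_of(const struct range_sketch *r, size_t n,
		    uint64_t *va_min, uint64_t *va_max)
{
	assert(n > 0);
	*va_min = r[0].start;
	*va_max = r[0].start + r[0].size;
	for (size_t i = 1; i < n; i++) {
		uint64_t end = r[i].start + r[i].size;

		if (r[i].start < *va_min)
			*va_min = r[i].start;
		if (end > *va_max)
			*va_max = end;
	}
}

/*
 * Clear 'valid' on every range overlapping [inv_start, inv_end) and return
 * how many ranges changed from valid to invalid. Ranges that sit in the gaps
 * of the span are left untouched. This is a linear scan, not an interval tree.
 */
static size_t mark_invalid_sketch(struct range_sketch *r, size_t n,
				  uint64_t inv_start, uint64_t inv_end)
{
	size_t hit = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t end = r[i].start + r[i].size;

		if (r[i].start < inv_end && inv_start < end) {
			if (r[i].valid)
				hit++;
			r[i].valid = false;
		}
	}
	return hit;
}
```

An invalidation that falls entirely in a hole between two ranges still lands
inside the notifier's span, but the overlap test rejects it, so no range is
marked and nothing needs to be re-faulted.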
Patch Series Overview
=====================
Patch 1/8: Add userptr batch allocation UAPI structures
  - KFD_IOC_ALLOC_MEM_FLAGS_USERPTR_BATCH flag
  - kfd_ioctl_userptr_range and kfd_ioctl_userptr_ranges_data structures

Patch 2/8: Add user_range_info infrastructure to kgd_mem
  - user_range_info structure for per-range tracking
  - Fields for batch allocation in kgd_mem

Patch 3/8: Implement interval tree for userptr ranges
  - Interval tree for efficient range lookup during invalidation
  - mark_invalid_ranges() function

Patch 4/8: Add batch MMU notifier support
  - Single notifier for entire VA span
  - Invalidation callback using interval tree filtering

Patch 5/8: Implement batch userptr page management
  - get_user_pages_batch_locked() and set_user_pages_batch()
  - Per-range page array management

Patch 6/8: Add batch allocation function and export API
  - init_user_pages_batch() main initialization
  - amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu_batch() entry point

Patch 7/8: Unify userptr cleanup and update paths
  - Shared eviction/restore handling for batch allocations
  - Integration with existing userptr validation flows

Patch 8/8: Wire up batch allocation in ioctl handler
  - Input validation and range array parsing
  - Integration with existing alloc_memory_of_gpu path
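
As an illustration of the kind of input validation patch 8/8 refers to, the
userspace sketch below checks a parsed range array before it is used. It is
not taken from the series. The struct range_entry_sketch layout, the 4 KiB
page size, and the specific rules (page alignment, non-zero size, no address
wraparound, sizes summing to the GPU VA allocation size) are all assumptions
made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size */

/* Hypothetical mirror of one range-array entry; field names are assumptions. */
struct range_entry_sketch {
	uint64_t start;	/* CPU VA of the range */
	uint64_t size;	/* length in bytes */
};

/*
 * Return true only when every range is non-empty, page-aligned and free of
 * address wraparound, and the range sizes add up exactly to total_size, the
 * size of the contiguous GPU VA allocation.
 */
static bool ranges_ok(const struct range_entry_sketch *r, size_t n,
		      uint64_t total_size)
{
	uint64_t sum = 0;

	/* The alignment mask below relies on a power-of-two page size. */
	assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);

	if (r == NULL || n == 0)
		return false;

	for (size_t i = 0; i < n; i++) {
		if (r[i].size == 0)
			return false;
		if ((r[i].start | r[i].size) & (SKETCH_PAGE_SIZE - 1))
			return false;	/* start or size not page-aligned */
		if (r[i].start + r[i].size < r[i].start)
			return false;	/* end address wraps around */
		if (sum + r[i].size < sum)
			return false;	/* running total overflows */
		sum += r[i].size;
	}
	return sum == total_size;
}
```

Rejecting malformed arrays up front keeps the later page-fault and mapping
steps free of per-range sanity checks.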
Testing
=======
- Multiple scattered malloc() allocations (2-4000+ ranges)
- Various allocation sizes (4 KB to 1 GB+ per range)
- Memory pressure scenarios and eviction/restore cycles
- OpenCL CTS and HIP catch tests in KVM guest environment
- AI workloads: Stable Diffusion, ComfyUI in virtualized environments
- Small LLM inference (3B-7B models)
- Benchmark score: 160,000 - 190,000 (80%-95% of bare metal)
- Performance improvement: 2x-2.4x faster than userspace approach
Thank you for your review and feedback.
Best regards,
Honglei Huang
Honglei Huang (8):
  drm/amdkfd: Add userptr batch allocation UAPI structures
  drm/amdkfd: Add user_range_info infrastructure to kgd_mem
  drm/amdkfd: Implement interval tree for userptr ranges
  drm/amdkfd: Add batch MMU notifier support
  drm/amdkfd: Implement batch userptr page management
  drm/amdkfd: Add batch allocation function and export API
  drm/amdkfd: Unify userptr cleanup and update paths
  drm/amdkfd: Wire up batch allocation in ioctl handler

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  23 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 559 +++++++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 128 +++-
 include/uapi/linux/kfd_ioctl.h                |  31 +-
 4 files changed, 717 insertions(+), 24 deletions(-)
--
2.34.1