* [PATCH v4 1/8] drm/amdkfd: Add userptr batch allocation UAPI structures
2026-02-09 6:10 [PATCH v4 0/8] drm/amdkfd: Add batch userptr allocation support Honglei Huang
@ 2026-02-09 6:10 ` Honglei Huang
2026-02-11 7:15 ` Claude review: " Claude Code Review Bot
2026-02-09 6:10 ` [PATCH v4 2/8] drm/amdkfd: Add user_range_info infrastructure to kgd_mem Honglei Huang
` (7 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: Honglei Huang @ 2026-02-09 6:10 UTC (permalink / raw)
To: Felix.Kuehling, alexander.deucher, christian.koenig, Philip.Yang,
Ray.Huang
Cc: dmitry.osipenko, airlied, daniel, amd-gfx, dri-devel,
linux-kernel, linux-mm, akpm, honghuan
From: Honglei Huang <honghuan@amd.com>
Introduce new UAPI structures to support batch allocation of
non-contiguous userptr ranges in a single ioctl call.
This adds:
- KFD_IOC_ALLOC_MEM_FLAGS_USERPTR_BATCH flag
- struct kfd_ioctl_userptr_range for individual ranges
- struct kfd_ioctl_userptr_ranges_data for batch data
Signed-off-by: Honglei Huang <honghuan@amd.com>
---
include/uapi/linux/kfd_ioctl.h | 31 ++++++++++++++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 84aa24c02..579850e70 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -420,16 +420,45 @@ struct kfd_ioctl_acquire_vm_args {
#define KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED (1 << 25)
#define KFD_IOC_ALLOC_MEM_FLAGS_EXT_COHERENT (1 << 24)
#define KFD_IOC_ALLOC_MEM_FLAGS_CONTIGUOUS (1 << 23)
+#define KFD_IOC_ALLOC_MEM_FLAGS_USERPTR_BATCH (1 << 22)
+
+/* Userptr range for batch allocation
+ *
+ * @start: start address of user virtual memory range
+ * @size: size of this user virtual memory range in bytes
+ */
+struct kfd_ioctl_userptr_range {
+ __u64 start; /* to KFD */
+ __u64 size; /* to KFD */
+};
+
+/* Complete userptr batch allocation data structure
+ *
+ * This structure combines the header and ranges array for convenience.
+ * User space can allocate memory for this structure with the desired
+ * number of ranges and pass a pointer to it via mmap_offset field.
+ *
+ * @num_ranges: number of ranges in the ranges array
+ * @reserved: reserved for future use, must be 0
+ * @ranges: flexible array of userptr ranges
+ */
+struct kfd_ioctl_userptr_ranges_data {
+ __u32 num_ranges; /* to KFD */
+ __u32 reserved; /* to KFD, must be 0 */
+ struct kfd_ioctl_userptr_range ranges[]; /* to KFD */
+};
/* Allocate memory for later SVM (shared virtual memory) mapping.
*
* @va_addr: virtual address of the memory to be allocated
* all later mappings on all GPUs will use this address
- * @size: size in bytes
+ * @size: size in bytes (total size for batch allocation)
* @handle: buffer handle returned to user mode, used to refer to
* this allocation for mapping, unmapping and freeing
* @mmap_offset: for CPU-mapping the allocation by mmapping a render node
* for userptrs this is overloaded to specify the CPU address
+ * for batch userptr (KFD_IOC_ALLOC_MEM_FLAGS_USERPTR_BATCH),
+ * this should point to a kfd_ioctl_userptr_ranges_data structure
* @gpu_id: device identifier
* @flags: memory type and attributes. See KFD_IOC_ALLOC_MEM_FLAGS above
*/
--
2.34.1
^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Claude review: drm/amdkfd: Add userptr batch allocation UAPI structures
2026-02-09 6:10 ` [PATCH v4 1/8] drm/amdkfd: Add userptr batch allocation UAPI structures Honglei Huang
@ 2026-02-11 7:15 ` Claude Code Review Bot
0 siblings, 0 replies; 18+ messages in thread
From: Claude Code Review Bot @ 2026-02-11 7:15 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Overview:** Adds UAPI structures for batch allocation.
**Issues:**
1. **ABI Compatibility Concern - Critical:**
```c
 * @mmap_offset: for CPU-mapping the allocation by mmapping a render node
 * for userptrs this is overloaded to specify the CPU address
+ * for batch userptr (KFD_IOC_ALLOC_MEM_FLAGS_USERPTR_BATCH),
+ * this should point to a kfd_ioctl_userptr_ranges_data structure
```
Overloading `mmap_offset` to be a pointer in batch mode is problematic:
- Creates type confusion (u64 vs pointer)
- Different semantics on 32-bit vs 64-bit systems
- Makes it impossible to detect misuse at compile time
**Recommendation:** Consider adding a new field or using a union to make this type-safe.
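A minimal sketch of the union approach (illustrative only; the `ranges_ptr` name is hypothetical and not part of this series):
```c
/* Hypothetical sketch: keep the legacy meaning of mmap_offset while
 * giving the batch pointer its own, clearly named identity. The union
 * keeps the ioctl struct size and layout unchanged.
 */
struct kfd_ioctl_alloc_memory_of_gpu_args {
	__u64 va_addr;		/* to KFD */
	__u64 size;		/* to KFD */
	__u64 handle;		/* from KFD */
	union {
		__u64 mmap_offset;	/* from KFD; CPU address for
					 * plain userptrs */
		__u64 ranges_ptr;	/* to KFD: user pointer to struct
					 * kfd_ioctl_userptr_ranges_data,
					 * valid only with the BATCH flag */
	};
	__u32 gpu_id;		/* to KFD */
	__u32 flags;		/* to KFD */
};
```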
2. **Potential Denial of Service:**
```c
+struct kfd_ioctl_userptr_ranges_data {
+ __u32 num_ranges; /* to KFD */
```
No documented limit on `num_ranges` could allow userspace to allocate arbitrary amounts of kernel memory. Should define `KFD_MAX_USERPTR_RANGES` (e.g., 4096) to prevent abuse.
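For instance (limit value illustrative, following the reviewer's suggestion):
```c
/* Sketch: a UAPI-visible cap plus an early ioctl-side check */
#define KFD_MAX_USERPTR_RANGES	4096

	if (!num_ranges || num_ranges > KFD_MAX_USERPTR_RANGES)
		return -EINVAL;
```
With 16-byte range descriptors this bounds the later `kvmalloc_array()` in the ioctl handler to 64 KiB.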
3. **Missing Documentation:**
- No maximum limit documented for `num_ranges`
- No explanation of what happens if ranges overlap or have gaps
- No specification of required ordering (ascending addresses?)
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 2/8] drm/amdkfd: Add user_range_info infrastructure to kgd_mem
2026-02-09 6:10 [PATCH v4 0/8] drm/amdkfd: Add batch userptr allocation support Honglei Huang
2026-02-09 6:10 ` [PATCH v4 1/8] drm/amdkfd: Add userptr batch allocation UAPI structures Honglei Huang
@ 2026-02-09 6:10 ` Honglei Huang
2026-02-11 7:15 ` Claude review: " Claude Code Review Bot
2026-02-09 6:10 ` [PATCH v4 3/8] drm/amdkfd: Implement interval tree for userptr ranges Honglei Huang
` (6 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: Honglei Huang @ 2026-02-09 6:10 UTC (permalink / raw)
To: Felix.Kuehling, alexander.deucher, christian.koenig, Philip.Yang,
Ray.Huang
Cc: dmitry.osipenko, airlied, daniel, amd-gfx, dri-devel,
linux-kernel, linux-mm, akpm, honghuan
From: Honglei Huang <honghuan@amd.com>
Add data structures to support batch userptr allocations with
multiple non-contiguous CPU virtual address ranges.
This adds:
- struct user_range_info: per-range metadata including HMM range,
invalidation counter, and interval tree node
- Fields to kgd_mem: num_user_ranges, user_ranges array,
batch_va_min/max, batch_notifier, and user_ranges_itree
Signed-off-by: Honglei Huang <honghuan@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 23 ++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 321cbf9a1..5b6d399f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -48,6 +48,7 @@ enum TLB_FLUSH_TYPE {
struct amdgpu_device;
struct kfd_process_device;
+struct kfd_ioctl_userptr_range;
struct amdgpu_reset_context;
enum kfd_mem_attachment_type {
@@ -67,6 +68,15 @@ struct kfd_mem_attachment {
uint64_t pte_flags;
};
+/* Individual range info for batch userptr allocations */
+struct user_range_info {
+ uint64_t start; /* CPU virtual address start */
+ uint64_t size; /* Size in bytes */
+ struct hmm_range *range; /* HMM range for this userptr */
+ bool valid; /* Flag: true if pages are valid, false if invalidated */
+ struct interval_tree_node it_node; /* Interval tree node for fast overlap lookup */
+};
+
struct kgd_mem {
struct mutex lock;
struct amdgpu_bo *bo;
@@ -89,6 +99,14 @@ struct kgd_mem {
uint32_t gem_handle;
bool aql_queue;
bool is_imported;
+
+ /* For batch userptr allocation: multiple non-contiguous CPU VA ranges */
+ uint32_t num_user_ranges;
+ struct user_range_info *user_ranges;
+ uint64_t batch_va_min;
+ uint64_t batch_va_max;
+ struct mmu_interval_notifier batch_notifier;
+ struct rb_root_cached user_ranges_itree;
};
/* KFD Memory Eviction */
@@ -313,6 +331,11 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
struct amdgpu_device *adev, uint64_t va, uint64_t size,
void *drm_priv, struct kgd_mem **mem,
uint64_t *offset, uint32_t flags, bool criu_resume);
+int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu_batch(
+ struct amdgpu_device *adev, uint64_t va, uint64_t size,
+ void *drm_priv, struct kgd_mem **mem,
+ uint64_t *offset, struct kfd_ioctl_userptr_range *ranges,
+ uint32_t num_ranges, uint32_t flags, bool criu_resume);
int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv,
uint64_t *size);
--
2.34.1
^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Claude review: drm/amdkfd: Add user_range_info infrastructure to kgd_mem
2026-02-09 6:10 ` [PATCH v4 2/8] drm/amdkfd: Add user_range_info infrastructure to kgd_mem Honglei Huang
@ 2026-02-11 7:15 ` Claude Code Review Bot
0 siblings, 0 replies; 18+ messages in thread
From: Claude Code Review Bot @ 2026-02-11 7:15 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Overview:** Adds per-range metadata structures.
**Issues:**
1. **Race Condition - Critical:**
```c
+struct user_range_info {
+ uint64_t start;
+ uint64_t size;
+ struct hmm_range *range;
+ bool valid; /* Flag: true if pages are valid, false if invalidated */
+ struct interval_tree_node it_node;
+};
```
The locking rules for the `valid` flag are never documented. Across the series it is:
- Set in `init_user_pages_batch()` under `process_info->lock`
- Cleared in `discard_invalid_ranges()` under `process_info->notifier_lock`
- Read in validation paths potentially without locks
This creates a race window where `valid` could be stale.
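One way to make the rule explicit and let lockdep enforce it (sketch, assuming `notifier_lock` is chosen as the designated lock):
```c
/* Sketch: funnel every read through a helper that asserts the lock */
static bool user_range_is_valid(struct amdkfd_process_info *process_info,
				struct user_range_info *range_info)
{
	lockdep_assert_held(&process_info->notifier_lock);
	return range_info->valid;
}
```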
2. **Type Inconsistency:**
The new struct uses `uint64_t` in kernel-internal code. Kernel style prefers `u64` there, with `__u64` reserved for UAPI headers.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 3/8] drm/amdkfd: Implement interval tree for userptr ranges
2026-02-09 6:10 [PATCH v4 0/8] drm/amdkfd: Add batch userptr allocation support Honglei Huang
2026-02-09 6:10 ` [PATCH v4 1/8] drm/amdkfd: Add userptr batch allocation UAPI structures Honglei Huang
2026-02-09 6:10 ` [PATCH v4 2/8] drm/amdkfd: Add user_range_info infrastructure to kgd_mem Honglei Huang
@ 2026-02-09 6:10 ` Honglei Huang
2026-02-11 7:15 ` Claude review: " Claude Code Review Bot
2026-02-09 6:10 ` [PATCH v4 4/8] drm/amdkfd: Add batch MMU notifier support Honglei Huang
` (5 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: Honglei Huang @ 2026-02-09 6:10 UTC (permalink / raw)
To: Felix.Kuehling, alexander.deucher, christian.koenig, Philip.Yang,
Ray.Huang
Cc: dmitry.osipenko, airlied, daniel, amd-gfx, dri-devel,
linux-kernel, linux-mm, akpm, honghuan
From: Honglei Huang <honghuan@amd.com>
Add interval tree support for efficient lookup of affected userptr
ranges during MMU notifier callbacks.
This adds:
- Include linux/interval_tree.h
- discard_invalid_ranges() function that uses interval tree to
identify and mark ranges affected by a given invalidation event
Signed-off-by: Honglei Huang <honghuan@amd.com>
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 21 +++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a32b46355..6fe4ed22c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -25,6 +25,7 @@
#include <linux/pagemap.h>
#include <linux/sched/mm.h>
#include <linux/sched/task.h>
+#include <linux/interval_tree.h>
#include <drm/ttm/ttm_tt.h>
#include <drm/drm_exec.h>
@@ -1122,6 +1123,26 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,
return ret;
}
+static bool discard_invalid_ranges(struct kgd_mem *mem,
+ unsigned long inv_start, unsigned long inv_end)
+{
+ struct interval_tree_node *node;
+ struct user_range_info *range_info;
+ bool any_invalid = false;
+
+ for (node = interval_tree_iter_first(&mem->user_ranges_itree, inv_start, inv_end - 1);
+ node;
+ node = interval_tree_iter_next(node, inv_start, inv_end - 1)) {
+ range_info = container_of(node, struct user_range_info, it_node);
+
+ range_info->valid = false;
+ any_invalid = true;
+ pr_debug("Range [0x%llx-0x%llx) marked invalid\n",
+ range_info->start, range_info->start + range_info->size);
+ }
+ return any_invalid;
+}
+
/* Reserving a BO and its page table BOs must happen atomically to
* avoid deadlocks. Some operations update multiple VMs at once. Track
* all the reservation info in a context structure. Optionally a sync
--
2.34.1
^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Claude review: drm/amdkfd: Implement interval tree for userptr ranges
2026-02-09 6:10 ` [PATCH v4 3/8] drm/amdkfd: Implement interval tree for userptr ranges Honglei Huang
@ 2026-02-11 7:15 ` Claude Code Review Bot
0 siblings, 0 replies; 18+ messages in thread
From: Claude Code Review Bot @ 2026-02-11 7:15 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Overview:** Adds interval tree for efficient range lookup.
**Issues:**
1. **Locking Missing - Critical:**
```c
+static bool discard_invalid_ranges(struct kgd_mem *mem,
+ unsigned long inv_start, unsigned long inv_end)
+{
+ for (node = interval_tree_iter_first(&mem->user_ranges_itree, inv_start, inv_end - 1);
```
No lock assertion or documentation. The interval tree is being iterated without documented protection. If ranges can be added/removed concurrently, this could crash.
2. **Integer Underflow:**
```c
+ for (node = interval_tree_iter_first(&mem->user_ranges_itree, inv_start, inv_end - 1);
```
If `inv_end == 0`, this underflows to `ULONG_MAX`. Should validate `inv_end > inv_start`.
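A one-line guard closes this (sketch):
```c
	/* Sketch: a degenerate invalidation window is a caller bug */
	if (WARN_ON_ONCE(inv_end <= inv_start))
		return false;
```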
3. **Debug Logging in Hot Path:**
```c
+ pr_debug("Range [0x%llx-0x%llx) marked invalid\n",
```
Called from MMU notifier callback which is performance-critical. Debug logging here could impact performance.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 4/8] drm/amdkfd: Add batch MMU notifier support
2026-02-09 6:10 [PATCH v4 0/8] drm/amdkfd: Add batch userptr allocation support Honglei Huang
` (2 preceding siblings ...)
2026-02-09 6:10 ` [PATCH v4 3/8] drm/amdkfd: Implement interval tree for userptr ranges Honglei Huang
@ 2026-02-09 6:10 ` Honglei Huang
2026-02-11 7:15 ` Claude review: " Claude Code Review Bot
2026-02-09 6:10 ` [PATCH v4 5/8] drm/amdkfd: Implement batch userptr page management Honglei Huang
` (4 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: Honglei Huang @ 2026-02-09 6:10 UTC (permalink / raw)
To: Felix.Kuehling, alexander.deucher, christian.koenig, Philip.Yang,
Ray.Huang
Cc: dmitry.osipenko, airlied, daniel, amd-gfx, dri-devel,
linux-kernel, linux-mm, akpm, honghuan
From: Honglei Huang <honghuan@amd.com>
Implement MMU notifier callbacks for batch userptr allocations.
This adds:
- amdgpu_amdkfd_evict_userptr_batch(): handles MMU invalidation
events for batch allocations, using interval tree to identify
affected ranges
Signed-off-by: Honglei Huang <honghuan@amd.com>
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 59 +++++++++++++++++++
1 file changed, 59 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 6fe4ed22c..a22a99b8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1143,6 +1143,65 @@ static bool discard_invalid_ranges(struct kgd_mem *mem,
return any_invalid;
}
+static int amdgpu_amdkfd_evict_userptr_batch(struct mmu_interval_notifier *mni,
+ const struct mmu_notifier_range *range,
+ unsigned long cur_seq)
+{
+ struct kgd_mem *mem;
+ struct amdkfd_process_info *process_info;
+ int r = 0;
+
+ mem = container_of(mni, struct kgd_mem, batch_notifier);
+ process_info = mem->process_info;
+
+ if (READ_ONCE(process_info->block_mmu_notifications))
+ return 0;
+
+ mutex_lock(&process_info->notifier_lock);
+ mmu_interval_set_seq(mni, cur_seq);
+
+ /* Discard affected ranges under notifier_lock for atomicity */
+ if (!discard_invalid_ranges(mem, range->start, range->end)) {
+ mutex_unlock(&process_info->notifier_lock);
+ pr_debug("Batch userptr: invalidation [0x%lx-0x%lx) does not affect any range\n",
+ range->start, range->end);
+ return 0;
+ }
+
+ mem->invalid++;
+
+ if (++process_info->evicted_bos == 1) {
+ r = kgd2kfd_quiesce_mm(mni->mm,
+ KFD_QUEUE_EVICTION_TRIGGER_USERPTR);
+
+ if (r && r != -ESRCH)
+ pr_err("Failed to quiesce KFD\n");
+
+ if (r != -ESRCH)
+ queue_delayed_work(system_freezable_wq,
+ &process_info->restore_userptr_work,
+ msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
+ }
+ mutex_unlock(&process_info->notifier_lock);
+
+ pr_debug("Batch userptr evicted: va_min=0x%llx va_max=0x%llx, inv_range=[0x%lx-0x%lx)\n",
+ mem->batch_va_min, mem->batch_va_max, range->start, range->end);
+
+ return r;
+}
+
+static bool amdgpu_amdkfd_invalidate_userptr_batch(struct mmu_interval_notifier *mni,
+ const struct mmu_notifier_range *range,
+ unsigned long cur_seq)
+{
+ amdgpu_amdkfd_evict_userptr_batch(mni, range, cur_seq);
+ return true;
+}
+
+static const struct mmu_interval_notifier_ops amdgpu_amdkfd_hsa_batch_ops = {
+ .invalidate = amdgpu_amdkfd_invalidate_userptr_batch,
+};
+
/* Reserving a BO and its page table BOs must happen atomically to
* avoid deadlocks. Some operations update multiple VMs at once. Track
* all the reservation info in a context structure. Optionally a sync
--
2.34.1
^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Claude review: drm/amdkfd: Add batch MMU notifier support
2026-02-09 6:10 ` [PATCH v4 4/8] drm/amdkfd: Add batch MMU notifier support Honglei Huang
@ 2026-02-11 7:15 ` Claude Code Review Bot
0 siblings, 0 replies; 18+ messages in thread
From: Claude Code Review Bot @ 2026-02-11 7:15 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Overview:** Implements MMU notifier for batch allocations.
**Issues:**
1. **Race Condition - Critical:**
After dropping `notifier_lock`, the code still dereferences `mem`:
```c
+	mutex_unlock(&process_info->notifier_lock);
+
+	pr_debug("Batch userptr evicted: va_min=0x%llx va_max=0x%llx, inv_range=[0x%lx-0x%lx)\n",
+		 mem->batch_va_min, mem->batch_va_max, range->start, range->end);
```
If the kgd_mem can be torn down by a racing free path once the lock is released, this unlocked read is a use-after-free window.
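A sketch of one fix, snapshotting the fields while the lock is still held:
```c
	/* Sketch: copy what the log message needs before unlocking */
	u64 va_min = mem->batch_va_min;
	u64 va_max = mem->batch_va_max;

	mutex_unlock(&process_info->notifier_lock);

	pr_debug("Batch userptr evicted: va_min=0x%llx va_max=0x%llx\n",
		 va_min, va_max);
```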
2. **Return Value Ignored:**
```c
+static bool amdgpu_amdkfd_invalidate_userptr_batch(...)
+{
+ amdgpu_amdkfd_evict_userptr_batch(mni, range, cur_seq);
+ return true;
+}
```
Always returns true even if eviction failed. The caller might need to know about failures.
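Since the eviction path sleeps on mutexes, one conventional shape (sketch, mirroring what the single-range amdgpu userptr notifier does) is:
```c
static bool amdgpu_amdkfd_invalidate_userptr_batch(
		struct mmu_interval_notifier *mni,
		const struct mmu_notifier_range *range,
		unsigned long cur_seq)
{
	/* Sketch: refuse non-blockable invalidations instead of always
	 * succeeding; the eviction path takes mutexes and may sleep,
	 * and the core will retry in blockable context.
	 */
	if (!mmu_notifier_range_blockable(range))
		return false;

	amdgpu_amdkfd_evict_userptr_batch(mni, range, cur_seq);
	return true;
}
```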
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 5/8] drm/amdkfd: Implement batch userptr page management
2026-02-09 6:10 [PATCH v4 0/8] drm/amdkfd: Add batch userptr allocation support Honglei Huang
` (3 preceding siblings ...)
2026-02-09 6:10 ` [PATCH v4 4/8] drm/amdkfd: Add batch MMU notifier support Honglei Huang
@ 2026-02-09 6:10 ` Honglei Huang
2026-02-11 7:15 ` Claude review: " Claude Code Review Bot
2026-02-09 6:10 ` [PATCH v4 6/8] drm/amdkfd: Add batch allocation function and export API Honglei Huang
` (3 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: Honglei Huang @ 2026-02-09 6:10 UTC (permalink / raw)
To: Felix.Kuehling, alexander.deucher, christian.koenig, Philip.Yang,
Ray.Huang
Cc: dmitry.osipenko, airlied, daniel, amd-gfx, dri-devel,
linux-kernel, linux-mm, akpm, honghuan
From: Honglei Huang <honghuan@amd.com>
Add core page management functions for batch userptr allocations.
This adds:
- get_user_pages_batch_locked(): gets user pages for batch
- set_user_pages_batch(): populates TTM page array from multiple
HMM ranges
Signed-off-by: Honglei Huang <honghuan@amd.com>
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 45 +++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index a22a99b8d..5f10a4514 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1202,6 +1202,51 @@ static const struct mmu_interval_notifier_ops amdgpu_amdkfd_hsa_batch_ops = {
.invalidate = amdgpu_amdkfd_invalidate_userptr_batch,
};
+static int get_user_pages_batch_locked(struct mm_struct *mm,
+ struct kgd_mem *mem,
+ struct user_range_info *range,
+ struct hmm_range **range_hmm,
+ bool readonly)
+{
+ struct vm_area_struct *vma;
+ int r;
+
+ *range_hmm = NULL;
+
+ vma = vma_lookup(mm, range->start);
+ if (unlikely(!vma))
+ return -EFAULT;
+
+ r = amdgpu_hmm_range_get_pages(&mem->batch_notifier, range->start,
+ range->size >> PAGE_SHIFT, readonly,
+ NULL, range_hmm);
+ return r;
+}
+
+static int set_user_pages_batch(struct ttm_tt *ttm,
+ struct user_range_info *ranges,
+ uint32_t nranges)
+{
+ uint32_t i, j, k = 0, range_npfns;
+
+ for (i = 0; i < nranges; ++i) {
+ if (!ranges[i].range || !ranges[i].range->hmm_pfns)
+ return -EINVAL;
+
+ range_npfns = (ranges[i].range->end - ranges[i].range->start) >>
+ PAGE_SHIFT;
+
+ if (k + range_npfns > ttm->num_pages)
+ return -EOVERFLOW;
+
+ for (j = 0; j < range_npfns; ++j)
+ ttm->pages[k++] =
+ hmm_pfn_to_page(ranges[i].range->hmm_pfns[j]);
+ }
+
+ return 0;
+}
+
/* Reserving a BO and its page table BOs must happen atomically to
* avoid deadlocks. Some operations update multiple VMs at once. Track
* all the reservation info in a context structure. Optionally a sync
--
2.34.1
^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Claude review: drm/amdkfd: Implement batch userptr page management
2026-02-09 6:10 ` [PATCH v4 5/8] drm/amdkfd: Implement batch userptr page management Honglei Huang
@ 2026-02-11 7:15 ` Claude Code Review Bot
0 siblings, 0 replies; 18+ messages in thread
From: Claude Code Review Bot @ 2026-02-11 7:15 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Overview:** Adds page management for batch allocations.
**Issues:**
1. **Integer Overflow - Critical:**
```c
+ if (k + range_npfns > ttm->num_pages)
+ return -EOVERFLOW;
```
`k + range_npfns` could overflow before the comparison. Should use `check_add_overflow()`.
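A sketch using the kernel's overflow helpers:
```c
	/* Sketch: fail cleanly if the running page index would wrap */
	uint32_t next_k;

	if (check_add_overflow(k, range_npfns, &next_k) ||
	    next_k > ttm->num_pages)
		return -EOVERFLOW;
```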
2. **Incomplete Validation:**
```c
+ for (i = 0; i < nranges; ++i) {
+ if (!ranges[i].range || !ranges[i].range->hmm_pfns)
+ return -EINVAL;
```
No validation that `ttm`, `ttm->pages`, or `ranges` are non-NULL.
3. **Missing PFN Validity Check:**
```c
+		for (j = 0; j < range_npfns; ++j)
+			ttm->pages[k++] =
+				hmm_pfn_to_page(ranges[i].range->hmm_pfns[j]);
```
Each entry is converted without checking `HMM_PFN_VALID`; for a PFN entry that HMM did not populate (an error or device-private entry), the resulting page pointer is garbage rather than a detectable NULL, and nothing is checked before the assignment.
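A defensive version of the loop might look like this (sketch; the error code is a placeholder):
```c
	/* Sketch: only convert entries HMM actually populated */
	for (j = 0; j < range_npfns; ++j) {
		unsigned long pfn = ranges[i].range->hmm_pfns[j];

		if (!(pfn & HMM_PFN_VALID))
			return -EFAULT;
		ttm->pages[k++] = hmm_pfn_to_page(pfn);
	}
```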
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 6/8] drm/amdkfd: Add batch allocation function and export API
2026-02-09 6:10 [PATCH v4 0/8] drm/amdkfd: Add batch userptr allocation support Honglei Huang
` (4 preceding siblings ...)
2026-02-09 6:10 ` [PATCH v4 5/8] drm/amdkfd: Implement batch userptr page management Honglei Huang
@ 2026-02-09 6:10 ` Honglei Huang
2026-02-11 7:15 ` Claude review: " Claude Code Review Bot
2026-02-09 6:10 ` [PATCH v4 7/8] drm/amdkfd: Unify userptr cleanup and update paths Honglei Huang
` (2 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: Honglei Huang @ 2026-02-09 6:10 UTC (permalink / raw)
To: Felix.Kuehling, alexander.deucher, christian.koenig, Philip.Yang,
Ray.Huang
Cc: dmitry.osipenko, airlied, daniel, amd-gfx, dri-devel,
linux-kernel, linux-mm, akpm, honghuan
From: Honglei Huang <honghuan@amd.com>
Implement the main batch userptr allocation function and export it
through the AMDKFD API.
This adds:
- init_user_pages_batch(): initializes batch allocation by setting
up interval tree, registering single MMU notifier, and getting
pages for all ranges
- amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu_batch(): main entry point
for batch userptr allocation
- Function export in amdgpu_amdkfd.h
Signed-off-by: Honglei Huang <honghuan@amd.com>
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 276 ++++++++++++++++++
1 file changed, 276 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 5f10a4514..c2fc31964 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1247,6 +1247,163 @@ static int set_user_pages_batch(struct ttm_tt *ttm,
return 0;
}
+static int init_user_pages_batch(struct kgd_mem *mem,
+ struct kfd_ioctl_userptr_range *ranges,
+ uint32_t num_ranges, bool criu_resume,
+ uint64_t user_addr, uint32_t size)
+{
+ struct amdkfd_process_info *process_info = mem->process_info;
+ struct amdgpu_bo *bo = mem->bo;
+ struct ttm_operation_ctx ctx = { true, false };
+ struct hmm_range *range;
+ uint64_t va_min = ULLONG_MAX, va_max = 0;
+ int ret = 0;
+ uint32_t i;
+
+ if (!num_ranges || !ranges)
+ return -EINVAL;
+
+ mutex_lock(&process_info->lock);
+
+ mem->user_ranges = kvcalloc(num_ranges, sizeof(struct user_range_info),
+ GFP_KERNEL);
+
+ if (!mem->user_ranges) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ mem->num_user_ranges = num_ranges;
+
+ mem->user_ranges_itree = RB_ROOT_CACHED;
+
+ ret = amdgpu_ttm_tt_set_userptr(&bo->tbo, user_addr, 0);
+ if (ret) {
+ pr_err("%s: Failed to set userptr: %d\n", __func__, ret);
+ goto out;
+ }
+
+ for (i = 0; i < num_ranges; i++) {
+ uint64_t range_end;
+
+ mem->user_ranges[i].start = ranges[i].start;
+ mem->user_ranges[i].size = ranges[i].size;
+ mem->user_ranges[i].range = NULL;
+ mem->user_ranges[i].valid = false;
+
+ range_end = ranges[i].start + ranges[i].size;
+
+ mem->user_ranges[i].it_node.start = ranges[i].start;
+ mem->user_ranges[i].it_node.last = range_end - 1;
+ interval_tree_insert(&mem->user_ranges[i].it_node, &mem->user_ranges_itree);
+
+ if (ranges[i].start < va_min)
+ va_min = ranges[i].start;
+ if (range_end > va_max)
+ va_max = range_end;
+
+ pr_debug("Initializing userptr range %u: addr=0x%llx size=0x%llx\n",
+ i, mem->user_ranges[i].start, mem->user_ranges[i].size);
+ }
+
+ mem->batch_va_min = va_min;
+ mem->batch_va_max = va_max;
+
+ pr_debug("Batch userptr: registering single notifier for span [0x%llx - 0x%llx)\n",
+ va_min, va_max);
+
+ ret = mmu_interval_notifier_insert(&mem->batch_notifier,
+ current->mm, va_min, va_max - va_min,
+ &amdgpu_amdkfd_hsa_batch_ops);
+ if (ret) {
+ pr_err("%s: Failed to register batch MMU notifier: %d\n",
+ __func__, ret);
+ goto err_cleanup_ranges;
+ }
+
+ if (criu_resume) {
+ mutex_lock(&process_info->notifier_lock);
+ mem->invalid++;
+ mutex_unlock(&process_info->notifier_lock);
+ mutex_unlock(&process_info->lock);
+ return 0;
+ }
+
+ if (!mmget_not_zero(current->mm)) {
+ ret = -ESRCH;
+ goto err_unregister;
+ }
+
+ mmap_read_lock(current->mm);
+ for (i = 0; i < num_ranges; i++) {
+ ret = get_user_pages_batch_locked(
+ current->mm, mem, &mem->user_ranges[i], &range,
+ amdgpu_ttm_tt_is_readonly(bo->tbo.ttm));
+ if (ret) {
+ if (ret == -EAGAIN)
+ pr_debug("Failed to get user pages for range %u, try again\n", i);
+ else
+ pr_err("%s: Failed to get user pages for range %u: %d\n",
+ __func__, i, ret);
+ mmap_read_unlock(current->mm);
+ mmput(current->mm);
+ goto err_unregister;
+ }
+
+ mem->user_ranges[i].range = range;
+ mem->user_ranges[i].valid = true;
+ }
+ mmap_read_unlock(current->mm);
+ mmput(current->mm);
+
+ ret = amdgpu_bo_reserve(bo, true);
+ if (ret) {
+ pr_err("%s: Failed to reserve BO\n", __func__);
+ goto release_pages;
+ }
+
+ if (bo->tbo.ttm->pages) {
+ set_user_pages_batch(bo->tbo.ttm,
+ mem->user_ranges,
+ num_ranges);
+ } else {
+ pr_err("%s: TTM pages array is NULL\n", __func__);
+ ret = -EINVAL;
+ amdgpu_bo_unreserve(bo);
+ goto release_pages;
+ }
+
+ amdgpu_bo_placement_from_domain(bo, mem->domain);
+ ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+ if (ret)
+ pr_err("%s: failed to validate BO\n", __func__);
+
+ amdgpu_bo_unreserve(bo);
+
+release_pages:
+ for (i = 0; i < num_ranges; i++) {
+ if (mem->user_ranges[i].range) {
+ amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm,
+ mem->user_ranges[i].range);
+ }
+ }
+
+err_unregister:
+ if (ret && mem->batch_notifier.mm) {
+ mmu_interval_notifier_remove(&mem->batch_notifier);
+ mem->batch_notifier.mm = NULL;
+ }
+err_cleanup_ranges:
+ if (ret) {
+ for (i = 0; i < num_ranges; i++) {
+ mem->user_ranges[i].range = NULL;
+ }
+ }
+
+out:
+ mutex_unlock(&process_info->lock);
+ return ret;
+}
+
/* Reserving a BO and its page table BOs must happen atomically to
* avoid deadlocks. Some operations update multiple VMs at once. Track
* all the reservation info in a context structure. Optionally a sync
@@ -2005,6 +2162,125 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
return ret;
}
+int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu_batch(
+ struct amdgpu_device *adev, uint64_t va, uint64_t size, void *drm_priv,
+ struct kgd_mem **mem, uint64_t *offset,
+ struct kfd_ioctl_userptr_range *ranges, uint32_t num_ranges,
+ uint32_t flags, bool criu_resume)
+{
+ struct amdgpu_vm *avm = drm_priv_to_vm(drm_priv);
+ struct amdgpu_bo *bo;
+ struct drm_gem_object *gobj = NULL;
+ u32 domain, alloc_domain;
+ uint64_t aligned_size;
+ int8_t xcp_id = -1;
+ u64 alloc_flags;
+ int ret;
+
+ if (!(flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR)) {
+ pr_err("Batch allocation requires USERPTR flag\n");
+ return -EINVAL;
+ }
+
+ if (flags & KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM) {
+ pr_err("Batch userptr does not support AQL queue\n");
+ return -EINVAL;
+ }
+
+ domain = AMDGPU_GEM_DOMAIN_GTT;
+ alloc_domain = AMDGPU_GEM_DOMAIN_CPU;
+ alloc_flags = AMDGPU_GEM_CREATE_PREEMPTIBLE;
+
+ if (flags & KFD_IOC_ALLOC_MEM_FLAGS_COHERENT)
+ alloc_flags |= AMDGPU_GEM_CREATE_COHERENT;
+ if (flags & KFD_IOC_ALLOC_MEM_FLAGS_EXT_COHERENT)
+ alloc_flags |= AMDGPU_GEM_CREATE_EXT_COHERENT;
+ if (flags & KFD_IOC_ALLOC_MEM_FLAGS_UNCACHED)
+ alloc_flags |= AMDGPU_GEM_CREATE_UNCACHED;
+
+ *mem = kzalloc(sizeof(struct kgd_mem), GFP_KERNEL);
+ if (!*mem) {
+ ret = -ENOMEM;
+ goto err;
+ }
+ INIT_LIST_HEAD(&(*mem)->attachments);
+ mutex_init(&(*mem)->lock);
+ (*mem)->aql_queue = false;
+
+ aligned_size = PAGE_ALIGN(size);
+
+ (*mem)->alloc_flags = flags;
+
+ amdgpu_sync_create(&(*mem)->sync);
+
+ ret = amdgpu_amdkfd_reserve_mem_limit(adev, aligned_size, flags,
+ xcp_id);
+ if (ret) {
+ pr_debug("Insufficient memory\n");
+ goto err_reserve_limit;
+ }
+
+ pr_debug("\tcreate BO VA 0x%llx size 0x%llx for batch userptr (ranges=%u)\n",
+ va, size, num_ranges);
+
+ ret = amdgpu_gem_object_create(adev, aligned_size, 1, alloc_domain, alloc_flags,
+ ttm_bo_type_device, NULL, &gobj, xcp_id + 1);
+ if (ret) {
+ pr_debug("Failed to create BO on domain %s. ret %d\n",
+ domain_string(alloc_domain), ret);
+ goto err_bo_create;
+ }
+
+ ret = drm_vma_node_allow(&gobj->vma_node, drm_priv);
+ if (ret) {
+ pr_debug("Failed to allow vma node access. ret %d\n", ret);
+ goto err_node_allow;
+ }
+
+ ret = drm_gem_handle_create(adev->kfd.client.file, gobj, &(*mem)->gem_handle);
+ if (ret)
+ goto err_gem_handle_create;
+
+ bo = gem_to_amdgpu_bo(gobj);
+ bo->kfd_bo = *mem;
+ bo->flags |= AMDGPU_AMDKFD_CREATE_USERPTR_BO;
+
+ (*mem)->bo = bo;
+ (*mem)->va = va;
+ (*mem)->domain = domain;
+ (*mem)->mapped_to_gpu_memory = 0;
+ (*mem)->process_info = avm->process_info;
+
+ add_kgd_mem_to_kfd_bo_list(*mem, avm->process_info, ranges[0].start);
+
+ ret = init_user_pages_batch(*mem, ranges, num_ranges, criu_resume, va, aligned_size);
+ if (ret) {
+ pr_err("Failed to initialize batch user pages: %d\n", ret);
+ goto allocate_init_user_pages_failed;
+ }
+
+ return 0;
+
+allocate_init_user_pages_failed:
+ remove_kgd_mem_from_kfd_bo_list(*mem, avm->process_info);
+ drm_gem_handle_delete(adev->kfd.client.file, (*mem)->gem_handle);
+err_gem_handle_create:
+ drm_vma_node_revoke(&gobj->vma_node, drm_priv);
+err_node_allow:
+ goto err_reserve_limit;
+err_bo_create:
+ amdgpu_amdkfd_unreserve_mem_limit(adev, aligned_size, flags, xcp_id);
+err_reserve_limit:
+ amdgpu_sync_free(&(*mem)->sync);
+ mutex_destroy(&(*mem)->lock);
+ if (gobj)
+ drm_gem_object_put(gobj);
+ else
+ kfree(*mem);
+err:
+ return ret;
+}
+
int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv,
uint64_t *size)
--
2.34.1
^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Claude review: drm/amdkfd: Add batch allocation function and export API
2026-02-09 6:10 ` [PATCH v4 6/8] drm/amdkfd: Add batch allocation function and export API Honglei Huang
@ 2026-02-11 7:15 ` Claude Code Review Bot
0 siblings, 0 replies; 18+ messages in thread
From: Claude Code Review Bot @ 2026-02-11 7:15 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Overview:** Main initialization function for batch allocations.
**Issues:**
1. **Memory Leak on Error - Critical:**
```c
+err_cleanup_ranges:
+	if (ret) {
+		for (i = 0; i < num_ranges; i++) {
+			mem->user_ranges[i].range = NULL;
+		}
+	}
```
When a later step fails (`amdgpu_ttm_tt_set_userptr()`, the notifier insert, or getting pages), this error path only NULLs the per-range `range` pointers: `mem->user_ranges` is never `kvfree()`d and the interval tree nodes are never removed.
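A fuller unwind might look like this (sketch):
```c
err_cleanup_ranges:
	if (ret) {
		/* Sketch: undo everything built up before the failure */
		for (i = 0; i < num_ranges; i++)
			interval_tree_remove(&mem->user_ranges[i].it_node,
					     &mem->user_ranges_itree);
		kvfree(mem->user_ranges);
		mem->user_ranges = NULL;
		mem->num_user_ranges = 0;
	}
```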
2. **Integer Overflow - Critical:**
```c
+ range_end = ranges[i].start + ranges[i].size;
```
No overflow check. Should use `check_add_overflow()`.
3. **Missing Validation:**
Should validate:
- `num_ranges` has a maximum (prevent DoS)
- Ranges don't overlap (could cause double-mapping issues)
- Ranges are in ascending order (simplifies interval tree)
4. **Wrong User Address:**
```c
+ ret = amdgpu_ttm_tt_set_userptr(&bo->tbo, user_addr, 0);
```
`user_addr` is the GPU VA, not a CPU VA from any range. This seems incorrect.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 7/8] drm/amdkfd: Unify userptr cleanup and update paths
2026-02-09 6:10 [PATCH v4 0/8] drm/amdkfd: Add batch userptr allocation support Honglei Huang
` (5 preceding siblings ...)
2026-02-09 6:10 ` [PATCH v4 6/8] drm/amdkfd: Add batch allocation function and export API Honglei Huang
@ 2026-02-09 6:10 ` Honglei Huang
2026-02-11 7:15 ` Claude review: " Claude Code Review Bot
2026-02-09 6:10 ` [PATCH v4 8/8] drm/amdkfd: Wire up batch allocation in ioctl handler Honglei Huang
2026-02-11 7:15 ` Claude review: drm/amdkfd: Add batch userptr allocation support Claude Code Review Bot
8 siblings, 1 reply; 18+ messages in thread
From: Honglei Huang @ 2026-02-09 6:10 UTC (permalink / raw)
To: Felix.Kuehling, alexander.deucher, christian.koenig, Philip.Yang,
Ray.Huang
Cc: dmitry.osipenko, airlied, daniel, amd-gfx, dri-devel,
linux-kernel, linux-mm, akpm, honghuan
From: Honglei Huang <honghuan@amd.com>
Refactor userptr management code to handle both single and batch
allocations uniformly.
This adds:
- cleanup_userptr_resources(): unified cleanup for single/batch
- discard_user_pages_batch(): discard pages for batch ranges
- amdgpu_amdkfd_update_user_pages_batch(): update pages for batch
- valid_user_pages_batch(): validate batch pages
Modified functions to support batch mode:
- update_invalid_user_pages(): uses batch update when applicable
- confirm_valid_user_pages_locked(): checks batch validity
- amdgpu_amdkfd_gpuvm_free_memory_of_gpu(): uses unified cleanup
Signed-off-by: Honglei Huang <honghuan@amd.com>
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 158 ++++++++++++++++--
1 file changed, 141 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index c2fc31964..7233b127b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2281,6 +2281,35 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu_batch(
return ret;
}
+static void cleanup_userptr_resources(struct kgd_mem *mem,
+ struct amdkfd_process_info *process_info)
+{
+ uint32_t i;
+
+ if (!amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm))
+ return;
+
+ if (mem->num_user_ranges > 0 && mem->user_ranges) {
+ for (i = 0; i < mem->num_user_ranges; i++)
+ interval_tree_remove(&mem->user_ranges[i].it_node,
+ &mem->user_ranges_itree);
+
+ if (mem->batch_notifier.mm) {
+ mmu_interval_notifier_remove(&mem->batch_notifier);
+ mem->batch_notifier.mm = NULL;
+ }
+
+ kvfree(mem->user_ranges);
+ mem->user_ranges = NULL;
+ mem->num_user_ranges = 0;
+ } else {
+ amdgpu_hmm_unregister(mem->bo);
+ mutex_lock(&process_info->notifier_lock);
+ amdgpu_ttm_tt_discard_user_pages(mem->bo->tbo.ttm, mem->range);
+ mutex_unlock(&process_info->notifier_lock);
+ }
+}
+
int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
struct amdgpu_device *adev, struct kgd_mem *mem, void *drm_priv,
uint64_t *size)
@@ -2322,12 +2351,7 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
mutex_unlock(&process_info->lock);
/* Cleanup user pages and MMU notifiers */
- if (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm)) {
- amdgpu_hmm_unregister(mem->bo);
- mutex_lock(&process_info->notifier_lock);
- amdgpu_ttm_tt_discard_user_pages(mem->bo->tbo.ttm, mem->range);
- mutex_unlock(&process_info->notifier_lock);
- }
+ cleanup_userptr_resources(mem, process_info);
ret = reserve_bo_and_cond_vms(mem, NULL, BO_VM_ALL, &ctx);
if (unlikely(ret))
@@ -2914,6 +2938,51 @@ int amdgpu_amdkfd_evict_userptr(struct mmu_interval_notifier *mni,
return r;
}
+static void discard_user_pages_batch(struct amdgpu_bo *bo, struct kgd_mem *mem)
+{
+ uint32_t i;
+
+ for (i = 0; i < mem->num_user_ranges; i++) {
+ if (!mem->user_ranges[i].valid && mem->user_ranges[i].range) {
+ amdgpu_ttm_tt_discard_user_pages(bo->tbo.ttm,
+ mem->user_ranges[i].range);
+ mem->user_ranges[i].range = NULL;
+ }
+ }
+}
+
+static int amdgpu_amdkfd_update_user_pages_batch(struct mm_struct *mm,
+ struct amdgpu_bo *bo,
+ struct kgd_mem *mem)
+{
+ uint32_t i;
+ int ret = 0;
+
+ if (!mmget_not_zero(mm))
+ return -ESRCH;
+
+ mmap_read_lock(mm);
+ for (i = 0; i < mem->num_user_ranges; i++) {
+ if (mem->user_ranges[i].valid)
+ continue;
+
+ ret = get_user_pages_batch_locked(
+ mm, mem, &mem->user_ranges[i],
+ &mem->user_ranges[i].range, amdgpu_ttm_tt_is_readonly(bo->tbo.ttm));
+ if (ret) {
+ pr_debug("Failed %d to get user pages for range %u\n",
+ ret, i);
+ break;
+ }
+
+ mem->user_ranges[i].valid = true;
+ }
+ mmap_read_unlock(mm);
+ mmput(mm);
+
+ return ret;
+}
+
/* Update invalid userptr BOs
*
* Moves invalidated (evicted) userptr BOs from userptr_valid_list to
@@ -2928,6 +2997,7 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
struct ttm_operation_ctx ctx = { false, false };
uint32_t invalid;
int ret = 0;
+ uint32_t i;
mutex_lock(&process_info->notifier_lock);
@@ -2951,8 +3021,12 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
bo = mem->bo;
- amdgpu_ttm_tt_discard_user_pages(bo->tbo.ttm, mem->range);
- mem->range = NULL;
+ if (mem->num_user_ranges > 0 && mem->user_ranges)
+ discard_user_pages_batch(bo, mem);
+ else {
+ amdgpu_ttm_tt_discard_user_pages(bo->tbo.ttm, mem->range);
+ mem->range = NULL;
+ }
/* BO reservations and getting user pages (hmm_range_fault)
* must happen outside the notifier lock
@@ -2976,7 +3050,11 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
}
/* Get updated user pages */
- ret = amdgpu_ttm_tt_get_user_pages(bo, &mem->range);
+ if (mem->num_user_ranges > 0 && mem->user_ranges)
+ ret = amdgpu_amdkfd_update_user_pages_batch(mm, bo, mem);
+ else
+ ret = amdgpu_ttm_tt_get_user_pages(bo, &mem->range);
+
if (ret) {
pr_debug("Failed %d to get user pages\n", ret);
@@ -3010,7 +3088,10 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
ret = 0;
}
- amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm, mem->range);
+ if (mem->num_user_ranges == 0)
+ amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm, mem->range);
+ else
+ set_user_pages_batch(bo->tbo.ttm, mem->user_ranges, mem->num_user_ranges);
mutex_lock(&process_info->notifier_lock);
@@ -3024,6 +3105,17 @@ static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
/* set mem valid if mem has hmm range associated */
if (mem->range)
mem->invalid = 0;
+
+ /* For batch mode, clear global invalid counter and mark ranges as valid.
+ * Individual range validity is tracked by valid flag.
+ */
+ if (mem->num_user_ranges > 0 && mem->user_ranges) {
+ mem->invalid = 0;
+ for (i = 0; i < mem->num_user_ranges; i++) {
+ if (mem->user_ranges[i].range)
+ mem->user_ranges[i].valid = true;
+ }
+ }
}
unlock_out:
@@ -3131,6 +3223,33 @@ static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
return ret;
}
+static bool valid_user_pages_batch(struct kgd_mem *mem)
+{
+ uint32_t i;
+ bool all_valid = true;
+
+ if (!mem->user_ranges || mem->num_user_ranges == 0)
+ return true;
+
+ for (i = 0; i < mem->num_user_ranges; i++) {
+ if (!mem->user_ranges[i].valid) {
+ all_valid = false;
+ continue;
+ }
+
+ if (!mem->user_ranges[i].range)
+ continue;
+
+ if (!amdgpu_ttm_tt_get_user_pages_done(
+ mem->bo->tbo.ttm, mem->user_ranges[i].range))
+ all_valid = false;
+
+ mem->user_ranges[i].range = NULL;
+ }
+
+ return all_valid;
+}
+
/* Confirm that all user pages are valid while holding the notifier lock
*
* Moves valid BOs from the userptr_inval_list back to userptr_val_list.
@@ -3145,15 +3264,20 @@ static int confirm_valid_user_pages_locked(struct amdkfd_process_info *process_i
validate_list) {
bool valid;
- /* keep mem without hmm range at userptr_inval_list */
- if (!mem->range)
- continue;
+ if (mem->num_user_ranges > 0 && mem->user_ranges)
+ valid = valid_user_pages_batch(mem);
+ else {
+ /* keep mem without hmm range at userptr_inval_list */
+ if (!mem->range)
+ continue;
- /* Only check mem with hmm range associated */
- valid = amdgpu_ttm_tt_get_user_pages_done(
- mem->bo->tbo.ttm, mem->range);
+ /* Only check mem with hmm range associated */
+ valid = amdgpu_ttm_tt_get_user_pages_done(
+ mem->bo->tbo.ttm, mem->range);
+
+ mem->range = NULL;
+ }
- mem->range = NULL;
if (!valid) {
WARN(!mem->invalid, "Invalid BO not marked invalid");
ret = -EAGAIN;
--
2.34.1
^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Claude review: drm/amdkfd: Unify userptr cleanup and update paths
2026-02-09 6:10 ` [PATCH v4 7/8] drm/amdkfd: Unify userptr cleanup and update paths Honglei Huang
@ 2026-02-11 7:15 ` Claude Code Review Bot
0 siblings, 0 replies; 18+ messages in thread
From: Claude Code Review Bot @ 2026-02-11 7:15 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Overview:** Refactors cleanup to handle both single and batch allocations.
**Issues:**
1. **Missing Lock - Critical:**
```c
+ if (mem->num_user_ranges > 0 && mem->user_ranges) {
+ for (i = 0; i < mem->num_user_ranges; i++)
+ interval_tree_remove(&mem->user_ranges[i].it_node,
+ &mem->user_ranges_itree);
```
Modifying the interval tree without holding any lock. This can race with concurrent MMU notifier callbacks.
2. **Teardown Ordering - Critical:**
```c
+		if (mem->batch_notifier.mm) {
+			mmu_interval_notifier_remove(&mem->batch_notifier);
+			mem->batch_notifier.mm = NULL;
+		}
+		kvfree(mem->user_ranges);
```
`mmu_interval_notifier_remove()` does wait for in-flight invalidate callbacks, but it runs *after* the interval tree entries were already removed in the loop above, so a callback racing with that loop can iterate a tree that is being modified. The notifier should be removed first; only then is it safe to tear down the tree and free `mem->user_ranges`.
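A sketch of the safe ordering:
```c
	/* Sketch: quiesce the notifier first; remove() waits for any
	 * in-flight invalidate callbacks, so the tree and array can be
	 * torn down afterwards without racing them.
	 */
	if (mem->batch_notifier.mm) {
		mmu_interval_notifier_remove(&mem->batch_notifier);
		mem->batch_notifier.mm = NULL;
	}

	for (i = 0; i < mem->num_user_ranges; i++)
		interval_tree_remove(&mem->user_ranges[i].it_node,
				     &mem->user_ranges_itree);

	kvfree(mem->user_ranges);
	mem->user_ranges = NULL;
	mem->num_user_ranges = 0;
```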
3. **Race Condition:**
```c
+ mem->user_ranges[i].valid = true;
```
Setting `valid` flag outside of `notifier_lock` can race with invalidation callbacks that clear it.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v4 8/8] drm/amdkfd: Wire up batch allocation in ioctl handler
2026-02-09 6:10 [PATCH v4 0/8] drm/amdkfd: Add batch userptr allocation support Honglei Huang
` (6 preceding siblings ...)
2026-02-09 6:10 ` [PATCH v4 7/8] drm/amdkfd: Unify userptr cleanup and update paths Honglei Huang
@ 2026-02-09 6:10 ` Honglei Huang
2026-02-11 7:15 ` Claude review: " Claude Code Review Bot
2026-02-11 7:15 ` Claude review: drm/amdkfd: Add batch userptr allocation support Claude Code Review Bot
8 siblings, 1 reply; 18+ messages in thread
From: Honglei Huang @ 2026-02-09 6:10 UTC (permalink / raw)
To: Felix.Kuehling, alexander.deucher, christian.koenig, Philip.Yang,
Ray.Huang
Cc: dmitry.osipenko, airlied, daniel, amd-gfx, dri-devel,
linux-kernel, linux-mm, akpm, honghuan
From: Honglei Huang <honghuan@amd.com>
Integrate batch userptr allocation into the KFD ioctl interface.
This adds:
- kfd_copy_userptr_ranges(): validates and copies batch range data
from userspace, checking alignment, sizes, and total size match
- Modifications to kfd_ioctl_alloc_memory_of_gpu() to detect batch
mode and route to appropriate allocation function
- SVM conflict checking extended for batch ranges
Signed-off-by: Honglei Huang <honghuan@amd.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 128 +++++++++++++++++++++--
1 file changed, 122 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index a72cc980a..d0b56d5cc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1047,10 +1047,79 @@ static int kfd_ioctl_get_available_memory(struct file *filep,
return 0;
}
+static int kfd_copy_userptr_ranges(void __user *user_data, uint64_t expected_size,
+ struct kfd_ioctl_userptr_range **ranges_out,
+ uint32_t *num_ranges_out)
+{
+ struct kfd_ioctl_userptr_ranges_data ranges_header;
+ struct kfd_ioctl_userptr_range *ranges;
+ uint64_t total_size = 0;
+ uint32_t num_ranges;
+ size_t header_size;
+ uint32_t i;
+
+ if (!user_data) {
+ pr_err("Batch allocation: ranges pointer is NULL\n");
+ return -EINVAL;
+ }
+
+ header_size = offsetof(struct kfd_ioctl_userptr_ranges_data, ranges);
+ if (copy_from_user(&ranges_header, user_data, header_size)) {
+ pr_err("Failed to copy ranges data header from user space\n");
+ return -EFAULT;
+ }
+
+ num_ranges = ranges_header.num_ranges;
+ if (num_ranges == 0) {
+ pr_err("Batch allocation: invalid number of ranges %u\n", num_ranges);
+ return -EINVAL;
+ }
+
+ if (ranges_header.reserved != 0) {
+ pr_err("Batch allocation: reserved field must be 0\n");
+ return -EINVAL;
+ }
+
+ ranges = kvmalloc_array(num_ranges, sizeof(*ranges), GFP_KERNEL);
+ if (!ranges)
+ return -ENOMEM;
+
+ if (copy_from_user(ranges, user_data + header_size,
+ num_ranges * sizeof(*ranges))) {
+ pr_err("Failed to copy ranges from user space\n");
+ kvfree(ranges);
+ return -EFAULT;
+ }
+
+ for (i = 0; i < num_ranges; i++) {
+ if (!ranges[i].start || !ranges[i].size ||
+ (ranges[i].start & ~PAGE_MASK) ||
+ (ranges[i].size & ~PAGE_MASK)) {
+ pr_err("Invalid range %u: start=0x%llx size=0x%llx\n",
+ i, ranges[i].start, ranges[i].size);
+ kvfree(ranges);
+ return -EINVAL;
+ }
+ total_size += ranges[i].size;
+ }
+
+ if (total_size != expected_size) {
+ pr_err("Size mismatch: provided %llu != calculated %llu\n",
+ expected_size, total_size);
+ kvfree(ranges);
+ return -EINVAL;
+ }
+
+ *ranges_out = ranges;
+ *num_ranges_out = num_ranges;
+ return 0;
+}
+
static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
struct kfd_process *p, void *data)
{
struct kfd_ioctl_alloc_memory_of_gpu_args *args = data;
+ struct kfd_ioctl_userptr_range *ranges = NULL;
struct kfd_process_device *pdd;
void *mem;
struct kfd_node *dev;
@@ -1058,16 +1127,32 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
long err;
uint64_t offset = args->mmap_offset;
uint32_t flags = args->flags;
+ uint32_t num_ranges = 0;
+ bool is_batch = false;
if (args->size == 0)
return -EINVAL;
+ if ((flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) &&
+ (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR_BATCH)) {
+ is_batch = true;
+ }
+
if (p->context_id != KFD_CONTEXT_ID_PRIMARY && (flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR)) {
pr_debug("USERPTR is not supported on non-primary kfd_process\n");
return -EOPNOTSUPP;
}
+ if (is_batch) {
+ err = kfd_copy_userptr_ranges((void __user *)args->mmap_offset,
+ args->size, &ranges, &num_ranges);
+ if (err)
+ return err;
+
+ offset = 0;
+ }
+
#if IS_ENABLED(CONFIG_HSA_AMD_SVM)
/* Flush pending deferred work to avoid racing with deferred actions
* from previous memory map changes (e.g. munmap).
@@ -1086,13 +1171,15 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
pr_err("Address: 0x%llx already allocated by SVM\n",
args->va_addr);
mutex_unlock(&p->svms.lock);
- return -EADDRINUSE;
+ err = -EADDRINUSE;
+ goto err_free_ranges;
}
/* When register user buffer check if it has been registered by svm by
* buffer cpu virtual address.
+ * For batch mode, check each range individually below.
*/
- if ((flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) &&
+ if ((flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) && !is_batch &&
interval_tree_iter_first(&p->svms.objects,
args->mmap_offset >> PAGE_SHIFT,
(args->mmap_offset + args->size - 1) >> PAGE_SHIFT)) {
@@ -1102,6 +1189,22 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
return -EADDRINUSE;
}
+ /* Check each userptr range for SVM conflicts in batch mode */
+ if (is_batch) {
+ uint32_t i;
+ for (i = 0; i < num_ranges; i++) {
+ if (interval_tree_iter_first(&p->svms.objects,
+ ranges[i].start >> PAGE_SHIFT,
+ (ranges[i].start + ranges[i].size - 1) >> PAGE_SHIFT)) {
+ pr_err("Userptr range %u (0x%llx) already allocated by SVM\n",
+ i, ranges[i].start);
+ mutex_unlock(&p->svms.lock);
+ err = -EADDRINUSE;
+ goto err_free_ranges;
+ }
+ }
+ }
+
mutex_unlock(&p->svms.lock);
#endif
mutex_lock(&p->mutex);
@@ -1149,10 +1252,17 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
}
}
- err = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
- dev->adev, args->va_addr, args->size,
- pdd->drm_priv, (struct kgd_mem **) &mem, &offset,
- flags, false);
+ if (is_batch) {
+ err = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu_batch(
+ dev->adev, args->va_addr, args->size, pdd->drm_priv,
+ (struct kgd_mem **)&mem, &offset, ranges, num_ranges,
+ flags, false);
+ } else {
+ err = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
+ dev->adev, args->va_addr, args->size,
+ pdd->drm_priv, (struct kgd_mem **) &mem, &offset,
+ flags, false);
+ }
if (err)
goto err_unlock;
@@ -1184,6 +1294,9 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
args->mmap_offset = KFD_MMAP_TYPE_MMIO
| KFD_MMAP_GPU_ID(args->gpu_id);
+ if (is_batch)
+ kvfree(ranges);
+
return 0;
err_free:
@@ -1193,6 +1306,9 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
err_pdd:
err_large_bar:
mutex_unlock(&p->mutex);
+err_free_ranges:
+ if (ranges)
+ kvfree(ranges);
return err;
}
--
2.34.1
^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Claude review: drm/amdkfd: Wire up batch allocation in ioctl handler
2026-02-09 6:10 ` [PATCH v4 8/8] drm/amdkfd: Wire up batch allocation in ioctl handler Honglei Huang
@ 2026-02-11 7:15 ` Claude Code Review Bot
0 siblings, 0 replies; 18+ messages in thread
From: Claude Code Review Bot @ 2026-02-11 7:15 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Overview:** Integrates batch allocation into ioctl handler.
**Issues:**
1. **Integer Overflow - Critical:**
```c
+ for (i = 0; i < num_ranges; i++) {
+ total_size += ranges[i].size;
+ }
```
`total_size` can overflow. Should use `check_add_overflow()`.
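For example (sketch):
```c
	/* Sketch: accumulate with explicit wrap detection */
	for (i = 0; i < num_ranges; i++) {
		if (check_add_overflow(total_size, ranges[i].size,
				       &total_size)) {
			kvfree(ranges);
			return -EINVAL;
		}
	}
```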
2. **Missing Range Overlap Check - Critical:**
No check for overlapping ranges. Overlapping ranges could cause:
- Double-mapping to same GPU address
- Corruption during page operations
- Confusion in interval tree
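If the UAPI required ascending, non-overlapping ranges (or if the handler sorted them first), the check would be a single pass (sketch, assuming per-range start+size overflow has already been rejected):
```c
	/* Sketch: assumes ranges[] is sorted by start address */
	for (i = 1; i < num_ranges; i++) {
		if (ranges[i].start <
		    ranges[i - 1].start + ranges[i - 1].size)
			return -EINVAL;	/* overlap */
	}
```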
3. **Memory Allocation Without Limit:**
```c
+ ranges = kvmalloc_array(num_ranges, sizeof(*ranges), GFP_KERNEL);
```
No limit on `num_ranges`. A malicious user could cause kernel memory exhaustion (DoS).
4. **SVM Check Overflow:**
```c
+ if (interval_tree_iter_first(&p->svms.objects,
+ ranges[i].start >> PAGE_SHIFT,
+ (ranges[i].start + ranges[i].size - 1) >> PAGE_SHIFT)) {
```
The `ranges[i].start + ranges[i].size - 1` can overflow.
---
## SUMMARY OF CRITICAL ISSUES
**Must Fix Before Merge:**
1. Race conditions around `valid` flag and interval tree access (need proper locking)
2. Use-after-free in cleanup paths when MMU notifier callbacks race with resource cleanup
3. Integer overflows in size calculations throughout the series
4. Memory leaks in error paths (especially patch 6)
5. Missing range overlap validation (patch 8)
6. Missing maximum limit on num_ranges (DoS vulnerability)
**Should Fix:**
7. UAPI design (mmap_offset overloading)
8. Missing error handling for partial failures
9. Incomplete validation of HMM page returns
10. Debug logging in hot paths
11. Type inconsistencies (uint64_t vs u64)
The core approach is sound, but the implementation needs careful review of locking protocols, error handling, and input validation before this can be safely merged.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 18+ messages in thread
* Claude review: drm/amdkfd: Add batch userptr allocation support
2026-02-09 6:10 [PATCH v4 0/8] drm/amdkfd: Add batch userptr allocation support Honglei Huang
` (7 preceding siblings ...)
2026-02-09 6:10 ` [PATCH v4 8/8] drm/amdkfd: Wire up batch allocation in ioctl handler Honglei Huang
@ 2026-02-11 7:15 ` Claude Code Review Bot
8 siblings, 0 replies; 18+ messages in thread
From: Claude Code Review Bot @ 2026-02-11 7:15 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: drm/amdkfd: Add batch userptr allocation support
Author: Honglei Huang <honglei1.huang@amd.com>
Patches: 9
Reviewed: 2026-02-11T17:15:48.245798
---
This is a v4 patch series adding batch userptr allocation support to the AMDKFD driver. The series allows allocating multiple non-contiguous CPU virtual address ranges that map to a single contiguous GPU virtual address, which is valuable for virtualized GPU workloads.
**High-Level Assessment:**
**Strengths:**
- Well-structured patch series with logical separation of concerns
- Clear progression: UAPI → data structures → interval tree → MMU notifier → page management → API → cleanup → ioctl integration
- Addresses previous review feedback (v4 improvements noted in cover letter)
- Good use of interval tree for efficient range invalidation
**Critical Issues Requiring Attention:**
1. **Race Conditions & Locking**: Several critical race conditions exist around the `valid` flag and MMU notifier interactions
2. **Memory Safety**: Potential use-after-free and double-free issues in cleanup paths
3. **Error Handling**: Incomplete error handling with memory leaks in failure paths
4. **Integer Overflow**: Missing overflow checks in size calculations
5. **UAPI Stability**: Overloading `mmap_offset` field creates ABI compatibility concerns
6. **Missing Validation**: Insufficient input validation for ranges (overlap, ordering, gaps)
**Recommendation:** This series requires significant work before merging. The core design is sound, but implementation details need careful attention to locking, memory management, and error handling.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 18+ messages in thread