From mboxrd@z Thu Jan  1 00:00:00 1970
From: Claude Code Review Bot <claude-review@example.com>
To: dri-devel-reviews@example.com
Subject: Claude review: accel/amdxdna: Add expandable device heap support
Date: Sat, 16 May 2026 08:50:25 +1000
Message-ID: <review-patch1-20260515161922.744647-1-lizhi.hou@amd.com>
In-Reply-To: <20260515161922.744647-1-lizhi.hou@amd.com>
References: <20260515161922.744647-1-lizhi.hou@amd.com> <20260515161922.744647-1-lizhi.hou@amd.com>
X-Mailer: Claude Code Patch Reviewer
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Patch Review

**Locking: `amdxdna_hwctx_expand_heap` drops and re-takes `mm_lock` mid-ite=
ration**

In `amdxdna_ctx.c`, `amdxdna_hwctx_expand_heap()` iterates the `dev_heap_xa=
` under `mm_lock`, but drops the lock to call `hwctx_heap_expand`:

```c
xa_for_each_range(&client->dev_heap_xa, heap_id, heap,
                  nid, client->dev_heap_nid) {
    drm_gem_object_get(to_gobj(heap));
    ret =3D amdxdna_gem_pin(heap);
    ...
    mutex_unlock(&client->mm_lock);
    ret =3D xdna->dev_info->ops->hwctx_heap_expand(hwctx, heap);
    mutex_lock(&client->mm_lock);
    ...
```

Dropping `mm_lock` while iterating the xarray is dangerous. `client->dev_he=
ap_nid` could change while the lock is dropped (another thread could add a =
new heap chunk). The `xa_for_each_range` macro caches its iteration state, =
but the upper bound `client->dev_heap_nid` is evaluated once. More importan=
tly, if another concurrent call to `amdxdna_update_heap(client, NULL)` runs=
, you could get concurrent iteration + expansion on the same hwctx. The `de=
v_lock` is held in the `amdxdna_drm_create_bo_ioctl` path but not in the `a=
ie2_hwctx_init` path, so this is a real concern.

**Off-by-one in `xa_for_each_range` upper bound**

In `aie2_hwctx_restart`:
```c
xa_for_each_range(&hwctx->client->dev_heap_xa, heap_id, heap, 1,
                  hwctx->last_attached_heap) {
```

And in `amdxdna_hwctx_release_expanded_heap`:
```c
xa_for_each_range(&client->dev_heap_xa, heap_id, heap, 1,
                  hwctx->last_attached_heap) {
```

These iterate from index 1 to `last_attached_heap` inclusive. But in `amdxd=
na_hwctx_expand_heap`, `last_attached_heap` is set to `heap_id` on success:
```c
hwctx->last_attached_heap =3D heap_id;
```

This looks correct for the restart/release paths =E2=80=94 they replay from=
 1 (skipping the initial heap at 0) to the last expanded heap inclusive. Ho=
wever, `last_attached_heap` is initialized to 0 (by kzalloc), and the condi=
tional in `amdxdna_hwctx_release_expanded_heap` checks `if (hwctx->last_att=
ached_heap)` =E2=80=94 this works but is fragile. If `last_attached_heap` w=
ere ever set to 0 for a different reason, the release path would skip clean=
up.

**`amdxdna_update_heap` called from `amdxdna_drm_create_bo_ioctl` with wron=
g locking**

```c
case AMDXDNA_BO_DEV:
    abo =3D amdxdna_drm_create_dev_bo(dev, args, filp);
    if (!IS_ERR(abo)) {
        mutex_lock(&xdna->dev_lock);
        ret =3D amdxdna_update_heap(client, NULL);
        mutex_unlock(&xdna->dev_lock);
        if (ret)
            goto put_obj;
    }
```

`amdxdna_update_heap(client, NULL)` iterates all hwctxs with `amdxdna_for_e=
ach_hwctx`, which is `xa_for_each(&client->hwctx_xa, ...)`. In other callsi=
tes this xarray is iterated under SRCU read lock (`client->hwctx_srcu`), bu=
t here it's only under `dev_lock`. While `dev_lock` may prevent concurrent =
hwctx creation/destruction, the existing code consistently uses SRCU for hw=
ctx xarray iteration. This is at minimum inconsistent and may be incorrect =
if other paths remove hwctxs without holding `dev_lock`.

**Missing pin in sync BO path for dev BOs**

The original `amdxdna_drm_sync_bo_ioctl` pinned the BO before flushing. In =
the new code, the AMDXDNA_BO_DEV path skips pinning entirely:

```c
if (abo->type =3D=3D AMDXDNA_BO_DEV) {
    ...
    xa_for_each_range(&client->dev_heap_xa, heap_id, heap, ...) {
        ...
        ret =3D amdxdna_flush_bo(heap, start - heap_start, end - start);
```

The heap chunks are flushed without pinning them first. The original code p=
inned the dev BO (which redirected to pinning the heap). Without pinning, `=
amdxdna_gem_vmap()` or `abo->base.pages` may not be valid. The flush helper=
 `amdxdna_flush_bo` tries `amdxdna_gem_vmap()` and `abo->base.pages`, but t=
hese may be NULL if the heap isn't pinned.

**`amdxdna_flush_bo` silently clamps size but logs no warning**

```c
static int amdxdna_flush_bo(struct amdxdna_gem_obj *abo, u64 offset, u64 si=
ze)
{
    ...
    size =3D min(abo->mem.size, end) - offset;
    if (is_import_bo(abo))
        drm_clflush_sg(abo->base.sgt);
    else if (amdxdna_gem_vmap(abo))
        drm_clflush_virt_range(amdxdna_gem_vmap(abo) + offset, size);
    else if (abo->base.pages)
        drm_clflush_pages(abo->base.pages, abo->mem.size >> PAGE_SHIFT);
```

When the `pages` path is taken, it flushes `abo->mem.size >> PAGE_SHIFT` pa=
ges =E2=80=94 the entire BO, ignoring `offset` and `size`. This was the sam=
e in the original code and is technically a pre-existing issue, but it's no=
w been refactored into a helper that receives offset/size parameters sugges=
ting it should respect them.

Also, the original code had `drm_WARN` for the "can not flush" case. The ne=
w `amdxdna_flush_bo` returns `-EINVAL` for that case, which changes the beh=
avior from a warning to an ioctl error. The caller in the non-dev-BO path t=
hen issues `drm_WARN` on the error, which is fine. But the dev-BO path does=
n't issue a `drm_WARN`, just logs an error =E2=80=94 inconsistent.

**`aie2_send_host_buf_msgs` reuses `DECLARE_AIE_MSG` stack variables across=
 loop iterations**

```c
static int aie2_send_host_buf_msgs(...)
{
    DECLARE_AIE_MSG(map_host_buffer, MSG_OP_MAP_HOST_BUFFER);
    ...
    msg.opcode =3D initial_opcode;
    do {
        req.context_id =3D context_id;
        req.buf_addr =3D addr;
        req.buf_size =3D chunk_size;
        ret =3D aie_send_mgmt_msg_wait(&ndev->aie, &msg);
        ...
        msg.opcode =3D MSG_OP_ADD_HOST_BUFFER;
    } while (size);
```

The `DECLARE_AIE_MSG` macro declares `req` and `resp` as stack-initialized =
structures. This should be fine since `req` fields are fully re-assigned ea=
ch iteration. However, the `msg` struct's `opcode` is overridden to `initia=
l_opcode` before the loop, and set to `MSG_OP_ADD_HOST_BUFFER` after the fi=
rst iteration. The `resp.status` is initialized to -1 by the macro, but aft=
er the first `aie_send_mgmt_msg_wait`, it may contain a different value =E2=
=80=94 this should be re-initialized for subsequent iterations if the messa=
ge handling checks it. Worth verifying.

**Error log removed from `amdxdna_drm_create_dev_bo`**

```c
ret =3D amdxdna_gem_heap_alloc(abo);
if (ret) {
-    XDNA_ERR(xdna, "Failed to alloc dev bo memory, ret %d", ret);
     amdxdna_gem_destroy_obj(abo);
     return ERR_PTR(ret);
}
```

The error log was removed. Since `-EAGAIN` is now an expected return (signa=
ling the need for heap expansion), this makes sense =E2=80=94 but it means =
real allocation failures are also silent. Consider logging at `XDNA_DBG` le=
vel or only suppressing for `-EAGAIN`.

**`amdxdna_gem_uva` changed from function to inline but semantics changed**

The old `amdxdna_gem_uva` computed the UVA for `AMDXDNA_BO_DEV` objects dyn=
amically from the heap's UVA + offset. The new version is a simple inline r=
eturning `abo->mem.uva`:

```c
static inline u64 amdxdna_gem_uva(struct amdxdna_gem_obj *abo)
{
    return abo->mem.uva;
}
```

This means `mem.uva` must be correctly set for DEV BOs during `amdxdna_init=
_dev_bo`. Looking at that function:
```c
dev_bo->mem.uva =3D dev_bo->mm_node.start - heap_addr + exp_heap_uva;
```

This is set once at allocation time. If the heap's UVA changes after the de=
v BO is created (e.g., remap), the dev BO's UVA would be stale. In the old =
code, it was always computed live. This may be acceptable if heap UVA is im=
mutable after mmap, but it's a subtle semantic change worth noting.

**`amdxdna_init_dev_bo` has a subtle underflow risk**

```c
for (heap_id =3D 0; heap_id < client->dev_heap_nid; heap_id++) {
    heap =3D xa_load(&client->dev_heap_xa, heap_id);
    ...
    heap_addr =3D amdxdna_gem_dev_addr(heap);
    if (heap_addr > dev_bo->mm_node.start)
        break;
}

heap_id--;
heap =3D xa_load(&client->dev_heap_xa, heap_id);
```

If `heap_id` is 0 and the first heap's `dev_addr` is already greater than `=
mm_node.start`, `heap_id` will be 0 when it's decremented, wrapping to `UIN=
T_MAX` (it's `u32`). This would be a bug. This scenario shouldn't happen in=
 practice since the first heap starts at `dev_mem_base` and the mm allocato=
r also starts at `dev_mem_base`, but it's not defended against.

**`amdxdna_gem_pin_nolock` for DEV BOs doesn't hold `abo->lock`**

The new `amdxdna_gem_pin_nolock` for `AMDXDNA_BO_DEV` type iterates heap ch=
unks and calls `amdxdna_bo_pin` on each. But `amdxdna_gem_pin` (the caller)=
 takes `abo->lock` =E2=80=94 the lock of the dev BO, not the heap BO. The u=
nderlying `drm_gem_shmem_pin` on the heap BO is called without the heap BO'=
s lock. The old code redirected to `client->dev_heap` which was always the =
same object, and `amdxdna_gem_pin` would take its lock. Now the lock discip=
line has changed =E2=80=94 the dev BO's lock is held but the individual hea=
p BOs' locks are not.

**npu1 gets `dev_heap_max_size =3D AIE2_DEVM_SIZE` (64M) while others get `=
AIE2_DEVM_MAX_SIZE` (512M)**

This is intentional =E2=80=94 npu1 is an older device that doesn't support =
heap expansion. But the code path that creates additional heap BOs doesn't =
check whether the firmware supports `MSG_OP_ADD_HOST_BUFFER` until `aie2_ad=
d_host_buf` is called. On npu1, `dev_heap_max_size` is 64M, same as `dev_me=
m_size`, so only one 64M chunk fits. This means expansion is effectively bl=
ocked by size, not by feature check. This works but is fragile =E2=80=94 if=
 someone changed the sizes without understanding this coupling, npu1 could =
try unsupported firmware commands.

**Summary**: The overall design is sound, but the locking around `amdxdna_h=
wctx_expand_heap` dropping `mm_lock` mid-xarray iteration needs careful scr=
utiny. The missing pin in the sync-BO path for dev BOs is likely a bug. The=
 `heap_id--` underflow in `amdxdna_init_dev_bo` should be defended. The inc=
onsistent lock discipline between `amdxdna_gem_pin_nolock` (holds dev BO lo=
ck but not heap BO locks) and the old code (which held the heap's lock) is =
a correctness concern.

---
Generated by Claude Code Patch Reviewer