From mboxrd@z Thu Jan 1 00:00:00 1970 From: Claude Code Review Bot To: dri-devel-reviews@example.com Subject: Claude review: mm: setup device page migration in HMM pagewalk Date: Tue, 26 May 2026 07:29:20 +1000 Message-ID: In-Reply-To: <20260525084524.139868-5-mpenttil@redhat.com> References: <20260525084524.139868-1-mpenttil@redhat.com> <20260525084524.139868-5-mpenttil@redhat.com> X-Mailer: Claude Code Patch Reviewer Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Patch Review The core implementation =E2=80=94 fills in the stub functions from patch 3 = with real migration entry installation, implements the split callback, and = rewires `migrate_vma_setup()` to use `hmm_range_fault()` internally. **Issue 1 =E2=80=94 `migrate_vma_setup()` return value semantics changed:** Old behavior: ```c - migrate_vma_collect(args); - if (args->cpages) - migrate_vma_unmap(args); - return 0; ``` New behavior: ```c + ret =3D hmm_range_fault(&range); + migrate_hmm_range_setup(&range); + if (ret) { + migrate_vma_pages(args); + migrate_vma_finalize(args); + } + return ret; ``` The old code always returned 0 after the collect/unmap phase. The new code = can return errors from `hmm_range_fault()`. This changes the API contract f= or callers (test_hmm, nouveau, amdgpu, xe). While callers should already ha= ndle errors from `migrate_vma_setup()` (due to earlier validation failures)= , any caller that assumed the walk phase never fails will now see new error= paths exercised. The cleanup path (calling `migrate_vma_pages` + `migrate_= vma_finalize` to undo migration entries) seems correct but is a new error-h= andling pattern. **Issue 2 =E2=80=94 `migrate_hmm_range_setup` called unconditionally on err= or:** ```c + ret =3D hmm_range_fault(&range); + migrate_hmm_range_setup(&range); ``` If `hmm_range_fault` fails partway, some `hmm_pfns` entries may be in an in= consistent state. `migrate_hmm_range_setup` iterates all entries and only p= rocesses those with `HMM_PFN_MIGRATE` set, so unprocessed entries (value 0)= are safely skipped. This should be correct. **Issue 3 =E2=80=94 `flush_cache_page` called for device private pages:** ```c + flush_cache_page(walk->vma, addr, pfn); ``` This is called in the common path after both the device-private and present= -pte branches converge. For device private pages, the page isn't in CPU cac= he, making this unnecessary. The old code only called `flush_cache_page` fo= r present PTEs. Not a correctness bug, but unnecessary work for device-priv= ate-to-device-private migrations. **Issue 4 =E2=80=94 `hmm_vma_handle_pmd` called with `start` instead of `ad= dr`:** ```c - r =3D hmm_vma_handle_pmd(walk, addr, end, hmm_pfns, pmd); + r =3D hmm_vma_handle_pmd(walk, start, end, hmm_pfns, pmd); ``` Using `start` is correct for the PMD-level handler since the PMD covers `[s= tart, end)`. But note that `hmm_pfns` is recalculated at `again:` based on = `addr`, not `start`. If an `-EAGAIN` retry causes `addr !=3D start` AND the= PMD happens to be (re-)promoted to a huge page in between, `hmm_pfns` woul= d point to the wrong offset. This is a pre-existing race condition (extreme= ly unlikely in practice) that isn't introduced by this patch, but the `star= t` change makes the mismatch slightly more visible. **Issue 5 =E2=80=94 Style: `} else` without braces:** ```c + } else + folio_put(folio); ``` Same coding style issue as patch 3. **Issue 6 =E2=80=94 `hmm_vma_walk_split` folio handling for fault_folio:** ```c + if (folio !=3D fault_folio) { + if (unlikely(!folio_trylock(folio))) { + folio_put(folio); + ret =3D -EBUSY; + goto out; + } + } else + folio_put(folio); ``` When `folio =3D=3D fault_folio`: `folio_get` is followed by `folio_put` (ne= t zero), then `split_folio(folio)` is called. This relies on `fault_folio` = already being locked by the caller with sufficient refcount. Correct but no= n-obvious =E2=80=94 a comment would help here. --- Generated by Claude Code Patch Reviewer