From: Junhua Shen <Junhua.Shen@amd.com>
To: <Alexander.Deucher@amd.com>, <Felix.Kuehling@amd.com>,
<Christian.Koenig@amd.com>, <Oak.Zeng@amd.com>,
<Jenny-Jing.Liu@amd.com>, <Philip.Yang@amd.com>,
<Xiaogang.Chen@amd.com>, <Ray.Huang@amd.com>,
<honglei1.huang@amd.com>, <Lingshan.Zhu@amd.com>,
<simona@ffwll.ch>
Cc: <amd-gfx@lists.freedesktop.org>,
<dri-devel@lists.freedesktop.org>, <Junhua.Shen@amd.com>
Subject: [PATCH v4 0/6] drm/amdgpu: SVM VRAM migration via drm_pagemap (XNACK-on)
Date: Wed, 13 May 2026 17:57:28 +0800
Message-ID: <20260513095734.69598-1-Junhua.Shen@amd.com>

This series adds VRAM migration support to amdgpu's SVM (Shared Virtual
Memory) implementation, using the drm_pagemap framework for ZONE_DEVICE
page management and SDMA for data migration.

This is the XNACK-on (GPU fault-driven) version of the migration
series, built on top of the drm_gpusvm-based amdgpu SVM core [1].
The previous v1/v2/v3 postings were XNACK-off (ioctl-driven) and based
on an earlier SVM core; v4 is a rewrite targeting the XNACK-on path.

The implementation follows the Xe driver's approach for TTM eviction,
using a synchronous bo_move to migrate device-private pages back to
system RAM when TTM needs to evict SVM BOs.

Key design points:

- GPU VRAM registered as ZONE_DEVICE via devm_memremap_pages(),
  wrapped in struct amdgpu_pagemap with drm_pagemap state
- SDMA-based data transfer through GART aperture window for both
  copy_to_devmem and copy_to_ram callbacks
- amdgpu_bo_svm: lightweight BO subtype with drm_pagemap_devmem for
  ZONE_DEVICE page ownership tracking
- Synchronous TTM eviction via drm_pagemap_evict_to_ram() in bo_move,
  following the Xe pattern (no eviction fences needed)
- Migration policy driven by SVM range attributes (preferred location,
  prefetch hints) and GPU fault path
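
As a rough illustration of the "GART aperture window" point above, the sketch below models copying scattered pages through a bounded window in batches. It is a userspace toy, not driver code: the window size, the page size, and every mock_* name are assumptions made for the example, and memcpy() stands in for an SDMA copy job.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Both sizes are assumptions chosen to keep the example small. */
#define MOCK_PAGE_SIZE    64u
#define MOCK_WINDOW_PAGES 4u

/*
 * Copy npages scattered source pages into a contiguous destination,
 * staging at most MOCK_WINDOW_PAGES pages per batch. Returns the number
 * of batches issued. Hypothetical helper, not an amdgpu function.
 */
static size_t mock_copy_via_window(unsigned char *dst,
				   unsigned char *const *src_pages,
				   size_t npages)
{
	size_t batches = 0;
	size_t done = 0;

	while (done < npages) {
		const unsigned char *window[MOCK_WINDOW_PAGES];
		size_t n = npages - done;
		size_t i;

		if (n > MOCK_WINDOW_PAGES)
			n = MOCK_WINDOW_PAGES;

		/* "Map" the next n pages into the window slots. */
		for (i = 0; i < n; i++)
			window[i] = src_pages[done + i];

		/* One batch: memcpy() stands in for the SDMA copy. */
		for (i = 0; i < n; i++)
			memcpy(dst + (done + i) * MOCK_PAGE_SIZE,
			       window[i], MOCK_PAGE_SIZE);

		done += n;
		batches++;
	}

	return batches;
}
```

With a window of 4 pages, a 10-page range takes three batches (4 + 4 + 2). The window size used by the actual patches is not stated in this cover letter.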

Limitations:

- Single GPU only; multi-GPU migration is not addressed
- No VRAM-to-VRAM (peer GPU) migration

Open issue:

- Unnecessary TTM system memory allocation during eviction: when TTM
  evicts an SVM BO, it allocates a destination system memory resource
  (TTM_PL_SYSTEM) before calling bo_move, then frees it afterwards.
  This allocation is unnecessary because the actual data migration is
  done via drm_pagemap_evict_to_ram() → migrate_device_* which
  migrates device-private pages directly to regular system pages,
  bypassing the TTM-allocated resource entirely. The current TTM
  framework does not support num_placement=0 to skip this redundant
  allocation; this needs further discussion.
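
To make the open issue above concrete, here is a small userspace model of the sequence it describes: the evicting side allocates a destination resource, calls the move callback, then frees the resource, while the callback migrates pages without touching it. All mock_* types and functions are invented for this sketch and do not correspond to TTM or amdgpu symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-in for a destination system-memory resource. */
struct mock_resource {
	size_t size;
	bool touched;	/* set only if the move callback uses it */
};

struct mock_stats {
	unsigned int allocs;
	unsigned int unused_allocs;
	size_t pages_migrated;
};

struct mock_svm_bo {
	size_t npages;
	bool in_vram;
};

/*
 * Stand-in for bo_move on an SVM BO: pages are migrated directly to
 * system pages, so the destination resource is deliberately ignored.
 */
static int mock_svm_bo_move(struct mock_svm_bo *bo,
			    struct mock_resource *dst,
			    struct mock_stats *st)
{
	(void)dst;
	st->pages_migrated += bo->npages;
	bo->in_vram = false;
	return 0;
}

/* Stand-in for the eviction path: allocate, move, free. */
static int mock_evict(struct mock_svm_bo *bo, struct mock_stats *st)
{
	struct mock_resource *dst = calloc(1, sizeof(*dst));
	int ret;

	if (!dst)
		return -1;
	dst->size = bo->npages;
	st->allocs++;

	ret = mock_svm_bo_move(bo, dst, st);

	/* The resource was allocated but never used by the move. */
	if (!dst->touched)
		st->unused_allocs++;

	free(dst);
	return ret;
}
```

In this model every eviction of an SVM BO counts one allocation and one unused allocation, which is the redundancy the open issue asks to discuss.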

Dependencies:

This series applies on top of the amdgpu drm_gpusvm SVM core [1].

[1] https://lore.kernel.org/amd-gfx/20260508075129.1161157-1-honglei1.huang@amd.com/

Changes since v3:

- Rebased on drm_gpusvm-based amdgpu SVM core [1], switching from
  XNACK-off ioctl-driven to XNACK-on GPU fault-driven migration
- Introduced amdgpu_bo_svm subtype with drm_pagemap_devmem embedding
  and two-layer reference counting (GEM refcount + TTM kref)
- Added synchronous TTM eviction via drm_pagemap_evict_to_ram() in
  amdgpu_bo_move(), following the Xe driver pattern
- Added amdgpu_bo_is_amdgpu_bo() check for SVM BOs in TTM path
- Cleaned up container_of macros to follow amdgpu conventions
  (to_amdgpu_bo_svm as #define, devmem_to_amdgpu_bo_svm as inline)
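
For readers unfamiliar with the container_of pattern mentioned in the last item, the following userspace sketch shows the two forms side by side: a #define for the base-object cast and a static inline for the embedded-member cast. The mock_* structs are placeholders and say nothing about the real layout of amdgpu_bo_svm; the container_of() here is a simplified stand-in for the kernel macro, which adds type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of(); assumption, not the kernel one. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Placeholder types: only the embedding relationship matters. */
struct mock_bo {
	int id;
};

struct mock_devmem {
	int npages;
};

struct mock_bo_svm {
	struct mock_bo bo;		/* base object */
	struct mock_devmem devmem;	/* embedded devmem state */
};

/* Macro form, in the style described for to_amdgpu_bo_svm. */
#define to_mock_bo_svm(b) container_of((b), struct mock_bo_svm, bo)

/* Inline form, in the style described for devmem_to_amdgpu_bo_svm. */
static inline struct mock_bo_svm *
devmem_to_mock_bo_svm(struct mock_devmem *d)
{
	return container_of(d, struct mock_bo_svm, devmem);
}
```

Both helpers recover the same outer object, one from the base member and one from the embedded member; the inline variant additionally gets argument type checking from the compiler.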

Changes since v2:

- Moved amdgpu_pagemap entirely to amdgpu side, eliminating all KFD
  modifications
- Split commits for better reviewability: separated infrastructure
  from SDMA callbacks, decision layer from integration
- Merged ZONE_DEVICE registration hook into the integration patch

Changes since v1:

- Dropped the eviction fence patch (was 4/6) after Christian König
  pointed out it violates the dma_fence contract
- Refactored migration integration: extracted migration logic into
  new files amdgpu_svm_range_migrate.{c,h}
- Introduced enum amdgpu_svm_migrate_mode (PREFERRED, TO_VRAM,
  TO_SYSMEM, NONE) to make migration intent explicit, replacing
  the _ex functions used in v1
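
The idea of an explicit migration-mode enum can be sketched as below. Only the four mode names (PREFERRED, TO_VRAM, TO_SYSMEM, NONE) come from this cover letter; the mock_* identifiers, the location enum, and the resolution logic are assumptions made for illustration and may differ from what the patches do.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the four modes named above; identifiers are hypothetical. */
enum mock_migrate_mode {
	MOCK_MIGRATE_PREFERRED,	/* follow the range's preferred location */
	MOCK_MIGRATE_TO_VRAM,
	MOCK_MIGRATE_TO_SYSMEM,
	MOCK_MIGRATE_NONE,
};

enum mock_location {
	MOCK_LOC_SYSMEM,
	MOCK_LOC_VRAM,
};

/*
 * Resolve a mode into a concrete target. Returns true and sets *target
 * when a migration is needed, false when the range should stay put.
 */
static bool mock_resolve_target(enum mock_migrate_mode mode,
				enum mock_location preferred,
				enum mock_location current,
				enum mock_location *target)
{
	enum mock_location want;

	switch (mode) {
	case MOCK_MIGRATE_TO_VRAM:
		want = MOCK_LOC_VRAM;
		break;
	case MOCK_MIGRATE_TO_SYSMEM:
		want = MOCK_LOC_SYSMEM;
		break;
	case MOCK_MIGRATE_PREFERRED:
		want = preferred;
		break;
	case MOCK_MIGRATE_NONE:
	default:
		return false;
	}

	if (want == current)
		return false;	/* already where it should be */

	*target = want;
	return true;
}
```

Compared with paired foo()/foo_ex() entry points, a single enum parameter keeps the caller's intent visible at the call site.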

Previous versions:

v1 (XNACK-off): https://lore.kernel.org/amd-gfx/20260410113146.146212-1-Junhua.Shen@amd.com/
v2 (XNACK-off): https://lore.kernel.org/amd-gfx/20260413103031.181953-1-Junhua.Shen@amd.com/
v3 (XNACK-off): https://lore.kernel.org/amd-gfx/20260427100522.7014-1-Junhua.Shen@amd.com/

Test results:

Tested on gfx943 (MI300X) and gfx906 (MI60) with XNACK on:
- KFD test: 95%+ passed.
- ROCR test: all passed.

Patch overview:

  1/6  Core VRAM migration infrastructure (ZONE_DEVICE registration,
       amdgpu_pagemap, amdgpu_bo_svm subtype, drm_pagemap_ops)
  2/6  SDMA migration callbacks (copy_to_devmem, copy_to_ram,
       populate_devmem_pfn via GART aperture window)
  3/6  Synchronous TTM eviction for SVM BOs (amdgpu_svm_bo_evict
       in bo_move path, amdgpu_bo_is_amdgpu_bo check)
  4/6  SVM range migration helpers (range-level migrate_to_vram /
       migrate_to_sysmem decision layer)
  5/6  Hook up ZONE_DEVICE registration in device init and GPU reset
  6/6  Wire up VRAM migration into SVM range map and GPU fault paths

Junhua Shen (6):
  drm/amdgpu: add VRAM migration infrastructure for drm_pagemap
  drm/amdgpu: implement drm_pagemap SDMA migration callbacks
  drm/amdgpu: implement synchronous TTM eviction for SVM BOs
  drm/amdgpu: add SVM range migration helpers for drm_pagemap
  drm/amdgpu: hook up ZONE_DEVICE registration in device init and reset
  drm/amdgpu: integrate VRAM migration into SVM range map and fault
    paths

drivers/gpu/drm/amd/amdgpu/Makefile | 6 +-
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 8 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +
drivers/gpu/drm/amd/amdgpu/amdgpu_migrate.c | 831 ++++++++++++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_migrate.h | 110 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 4 +
drivers/gpu/drm/amd/amdgpu/amdgpu_svm_attr.c | 4 +
drivers/gpu/drm/amd/amdgpu/amdgpu_svm_fault.c | 9 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range.c | 21 +-
.../drm/amd/amdgpu/amdgpu_svm_range_migrate.c | 122 +++
.../drm/amd/amdgpu/amdgpu_svm_range_migrate.h | 47 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 +
13 files changed, 1181 insertions(+), 9 deletions(-)
create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_migrate.c
create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_migrate.h
create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range_migrate.c
create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_svm_range_migrate.h
--
2.34.1