* [PATCH v1] drm/amdgpu: fix sync handling in amdgpu_dma_buf_move_notify
@ 2026-02-10 9:14 Pierre-Eric Pelloux-Prayer
2026-02-10 9:55 ` Christian König
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Pierre-Eric Pelloux-Prayer @ 2026-02-10 9:14 UTC
To: Alex Deucher, Christian König, David Airlie, Simona Vetter
Cc: Pierre-Eric Pelloux-Prayer, Simona Vetter, amd-gfx, dri-devel,
linux-kernel
Invalidating a dmabuf will impact other users of the shared BO.
In the scenario where process A moves the BO, it needs to inform
process B about the move and process B will need to update its
page table.
The commit fixes a synchronisation bug caused by the use of the
ticket: it made amdgpu_vm_handle_moved behave as if updating
the page table immediately was correct but in this case it's not.
An example is the following scenario, with 2 GPUs and glxgears
running on GPU0 and Xorg running on GPU1, on a system where P2P
PCI isn't supported:
glxgears:
export linear buffer from GPU0 and import using GPU1
submit frame rendering to GPU0
submit tiled->linear blit
Xorg:
copy of linear buffer
The sequence of jobs would be:
drm_sched_job_run # GPU0, frame rendering
drm_sched_job_queue # GPU0, blit
drm_sched_job_done # GPU0, frame rendering
drm_sched_job_run # GPU0, blit
move linear buffer for GPU1 access #
amdgpu_dma_buf_move_notify -> update pt # GPU0
It this point the blit job on GPU0 is still running and would
likely produce a page fault.
Fixes: a448cb003edc ("drm/amdgpu: implement amdgpu_gem_prime_move_notify v2")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
index b9c38a4fe546..656c267dbe58 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
@@ -514,8 +514,15 @@ amdgpu_dma_buf_move_notify(struct dma_buf_attachment *attach)
r = dma_resv_reserve_fences(resv, 2);
if (!r)
r = amdgpu_vm_clear_freed(adev, vm, NULL);
+
+ /* Don't pass 'ticket' to amdgpu_vm_handle_moved: we want the clear=true
+ * path to be used otherwise we might update the PT of another process
+ * while it's using the BO.
+ * With clear=true, amdgpu_vm_bo_update will sync to command submission
+ * from the same VM.
+ */
if (!r)
- r = amdgpu_vm_handle_moved(adev, vm, ticket);
+ r = amdgpu_vm_handle_moved(adev, vm, NULL);
if (r && r != -EBUSY)
DRM_ERROR("Failed to invalidate VM page tables (%d))\n",
--
2.43.0
* Re: [PATCH v1] drm/amdgpu: fix sync handling in amdgpu_dma_buf_move_notify
From: Christian König @ 2026-02-10 9:55 UTC
To: Pierre-Eric Pelloux-Prayer, Alex Deucher, David Airlie,
Simona Vetter
Cc: Simona Vetter, amd-gfx, dri-devel, linux-kernel
On 2/10/26 10:14, Pierre-Eric Pelloux-Prayer wrote:
> Invalidating a dmabuf will impact other users of the shared BO.
> In the scenario where process A moves the BO, it needs to inform
> process B about the move and process B will need to update its
> page table.
>
> The commit fixes a synchronisation bug caused by the use of the
> ticket: it made amdgpu_vm_handle_moved behave as if updating
> the page table immediately was correct but in this case it's not.
>
> An example is the following scenario, with 2 GPUs and glxgears
> running on GPU0 and Xorg running on GPU1, on a system where P2P
> PCI isn't supported:
>
> glxgears:
> export linear buffer from GPU0 and import using GPU1
> submit frame rendering to GPU0
> submit tiled->linear blit
> Xorg:
> copy of linear buffer
>
> The sequence of jobs would be:
> drm_sched_job_run # GPU0, frame rendering
> drm_sched_job_queue # GPU0, blit
> drm_sched_job_done # GPU0, frame rendering
> drm_sched_job_run # GPU0, blit
> move linear buffer for GPU1 access #
> amdgpu_dma_buf_move_notify -> update pt # GPU0
>
> It this point the blit job on GPU0 is still running and would
> likely produce a page fault.
>
> Fixes: a448cb003edc ("drm/amdgpu: implement amdgpu_gem_prime_move_notify v2")
CC: stable?
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> index b9c38a4fe546..656c267dbe58 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c
> @@ -514,8 +514,15 @@ amdgpu_dma_buf_move_notify(struct dma_buf_attachment *attach)
> r = dma_resv_reserve_fences(resv, 2);
> if (!r)
> r = amdgpu_vm_clear_freed(adev, vm, NULL);
> +
> + /* Don't pass 'ticket' to amdgpu_vm_handle_moved: we want the clear=true
> + * path to be used otherwise we might update the PT of another process
> + * while it's using the BO.
> + * With clear=true, amdgpu_vm_bo_update will sync to command submission
> + * from the same VM.
> + */
> if (!r)
> - r = amdgpu_vm_handle_moved(adev, vm, ticket);
> + r = amdgpu_vm_handle_moved(adev, vm, NULL);
>
> if (r && r != -EBUSY)
> DRM_ERROR("Failed to invalidate VM page tables (%d))\n",
* Claude review: drm/amdgpu: fix sync handling in amdgpu_dma_buf_move_notify
From: Claude Code Review Bot @ 2026-02-11 6:29 UTC
To: dri-devel-reviews
Overall Series Review
Subject: drm/amdgpu: fix sync handling in amdgpu_dma_buf_move_notify
Author: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Patches: 1
Reviewed: 2026-02-11T16:29:51.076623
---
This is a single-patch series addressing a synchronization bug in amdgpu's dma-buf move notification handling. The patch has been reviewed and approved by Christian König (maintainer).
**Series Summary:**
- **Scope**: Single patch fixing race condition in cross-process shared buffer handling
- **Impact**: Fixes page faults when shared buffers are moved between processes/GPUs
- **Risk**: Low - changes only synchronization behavior, not fundamental logic
- **Stability candidate**: Yes - fixes a race condition that can cause GPU page faults
**Key Concerns:**
1. The patch carries a Fixes: tag for commit a448cb003edc but no Cc: stable tag
2. Christian König asked "CC: stable?" in his review of v1; no revision answering that question appears in this thread
3. The fix is conceptually sound but testing coverage is unclear
**Recommendation**: The patch should be merged with a Cc: stable@vger.kernel.org tag added.
---
Generated by Claude Code Patch Reviewer
* Claude review: drm/amdgpu: fix sync handling in amdgpu_dma_buf_move_notify
From: Claude Code Review Bot @ 2026-02-11 6:29 UTC
To: dri-devel-reviews
Patch Review
**Subject line**: Clear and descriptive. Follows subsystem conventions.
**Commit Message Analysis:**
The commit message effectively explains:
- **What**: Synchronization bug in dma-buf move notification
- **Why**: Passing ticket parameter caused premature page table updates
- **How**: Pass NULL instead of `ticket` to `amdgpu_vm_handle_moved` so that the clear=true path is taken
- **Example**: Detailed multi-GPU scenario with glxgears/Xorg
The scenario description is excellent - it provides concrete steps to reproduce and explains the race window clearly:
```
drm_sched_job_run # GPU0, blit
move linear buffer for GPU1 access #
amdgpu_dma_buf_move_notify -> update pt # GPU0
It this point the blit job on GPU0 is still running and would
likely produce a page fault.
```
**Minor nit**: "It this point" should be "At this point" (typo in line 412 of mbox).
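The race window in the quoted job sequence can be illustrated with a small user-space model. This is only a sketch: `model_job`, `model_job_state` and `model_immediate_pt_update_is_safe` are hypothetical names invented for illustration, not DRM scheduler or amdgpu types, and the model reduces the problem to one question: does any unfinished job still reference the shared buffer when the page table is rewritten?

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job model; these are not DRM scheduler types. */
enum model_job_state { MODEL_JOB_QUEUED, MODEL_JOB_RUNNING, MODEL_JOB_DONE };

struct model_job {
    const char *name;
    enum model_job_state state;
    bool uses_shared_bo;   /* job reads or writes the shared linear buffer */
};

/*
 * In this model an immediate page-table update is safe only when no
 * unfinished job still references the shared BO through the old mapping.
 */
static bool model_immediate_pt_update_is_safe(const struct model_job *jobs,
                                              size_t count)
{
    assert(jobs != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (jobs[i].uses_shared_bo && jobs[i].state != MODEL_JOB_DONE)
            return false;
    }
    return true;
}
```

Applied to the sequence above, the frame rendering job is done and never touches the linear buffer, while the blit is still running and writes to it, so the model reports the immediate update as unsafe. It becomes safe only once the blit has completed.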
**Code Review:**
```c
- r = amdgpu_vm_handle_moved(adev, vm, ticket);
+ r = amdgpu_vm_handle_moved(adev, vm, NULL);
```
**Analysis:**
1. **Correctness**: The fix is correct. By passing NULL instead of `ticket`, the code forces `amdgpu_vm_handle_moved` to use the `clear=true` path, which ensures proper synchronization with in-flight command submissions from the same VM.
2. **Comment Quality**: The added comment is good but could be clearer:
```c
+ /* Don't pass 'ticket' to amdgpu_vm_handle_moved: we want the clear=true
+ * path to be used otherwise we might update the PT of another process
+ * while it's using the BO.
```
The phrase "update the PT of another process while it's using the BO" is slightly ambiguous. Read together with the commit message, it means that the move notification rewrites the page table of a process sharing the BO while that process still has GPU jobs in flight (the blit on GPU0 in the example) that rely on the old mapping.
3. **Fixes Tag**: Correct format and appropriate commit referenced:
```
Fixes: a448cb003edc ("drm/amdgpu: implement amdgpu_gem_prime_move_notify v2")
```
4. **Missing Stable Tag**: Christian König requested "CC: stable?" in the reply (line 417). This regression fix should include:
```
Cc: stable@vger.kernel.org
```
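The effect of dropping the ticket (item 1 above) can be sketched as a small decision function. This is a simplified user-space model, not the kernel implementation: `model_resv` and `model_pick_clear` are hypothetical names, and the three-way choice is an assumption about how `amdgpu_vm_handle_moved` selects the flag, consistent with the comment added by the patch. The real function may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BO's reservation state (not a kernel type). */
struct model_resv {
    bool locked;              /* someone currently holds the reservation lock */
    const void *locking_ctx;  /* acquire context of the lock holder, if any */
};

/*
 * Assumed selection of the 'clear' flag for one invalidated BO:
 *  - lock is free: the update can take it itself, no clearing needed
 *  - lock is held by the caller's own ticket: treated as already reserved
 *  - otherwise: somebody else is using the BO, so take the clear=true path
 */
static bool model_pick_clear(const struct model_resv *resv, const void *ticket)
{
    assert(resv != NULL);

    if (!resv->locked)
        return false;
    if (ticket != NULL && resv->locking_ctx == ticket)
        return false;
    return true;
}
```

In this model, a reservation held by the mover yields clear=false when the mover's ticket is passed and clear=true when NULL is passed, which is the behaviour change the patch relies on.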
**Context Review (amdgpu_dma_buf.c:514)**:
Looking at the surrounding code:
```c
r = dma_resv_reserve_fences(resv, 2);
if (!r)
	r = amdgpu_vm_clear_freed(adev, vm, NULL); // NULL: no fence requested back
if (!r)
	r = amdgpu_vm_handle_moved(adev, vm, NULL); // NULL: no ticket, after this patch
```
Both calls now pass NULL as their last argument, but the symmetry is only superficial: for `amdgpu_vm_clear_freed` that argument is an optional fence output pointer, whereas for `amdgpu_vm_handle_moved` it is the ww_acquire ticket that this patch deliberately stops passing.
**Potential Issues:**
1. **Performance Impact**: Forcing the synchronous path (clear=true) may have performance implications. However, correctness trumps performance, and cross-process shared buffer scenarios are typically not the fast path.
2. **Testing**: No information provided about:
- How was this bug discovered?
- Has the fix been tested with the glxgears/Xorg scenario described?
- Are there any regressions?
**Review Feedback Items:**
1. **MUST FIX**: Add `Cc: stable@vger.kernel.org` tag as requested by maintainer
2. **SHOULD FIX**: Fix typo "It this point" → "At this point" in commit message
3. **NICE TO HAVE**: Add Tested-by tag if available
4. **NICE TO HAVE**: Clarify in commit message which kernel versions are affected
**Security Implications**: None. This is a robustness fix, not a security vulnerability.
**ABI/API Impact**: None. Internal driver change only.
**Verdict**:
- **With stable tag**: Reviewed-by worthy
- **Without stable tag**: Needs revision as requested by Christian König
**Final Assessment**: The fix is technically correct and has maintainer approval (Reviewed-by: Christian König). The only blocking issue is the missing Cc: stable tag that was explicitly requested in review.
---
Generated by Claude Code Patch Reviewer