* Claude review: RDMA/mlx5: get tph for p2p access when registering dma-buf mr
2026-04-20 18:39 ` [PATCH v1 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr Zhiping Zhang
@ 2026-04-22 23:26 ` Claude Code Review Bot
0 siblings, 0 replies; 8+ messages in thread
From: Claude Code Review Bot @ 2026-04-22 23:26 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Critical: Steering tag index resource leak**
```c
+ ret = mlx5_st_alloc_index_by_tag(dev->mdev, steering_tag, st_index);
+ if (ret) {
+ *ph = MLX5_IB_NO_PH;
+ mlx5_ib_dbg(dev, "st_alloc_index_by_tag failed (%d)\n", ret);
+ }
```
`mlx5_st_alloc_index_by_tag()` allocates a refcounted entry in `st->idx_xa` (in non-direct mode). The resulting `st_index` is passed into `alloc_cacheable_mr()` and written into the hardware MKC, but it is **never stored** in a way that allows `mlx5_st_dealloc_index()` to be called when the MR is destroyed. Looking at `__mlx5_ib_dereg_mr()`, there is no call to `mlx5_st_dealloc_index()` — that only happens through the DMAH path (`mlx5_ib_dealloc_dmah()`).
This means every MR registered via the new dma-buf TPH fallback path leaks an xarray entry. Over time, the steering tag table fills up and `xa_alloc()` will eventually fail.
The MR needs to track whether it owns an st_index allocation and call `mlx5_st_dealloc_index()` during `__mlx5_ib_dereg_mr()`.
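The required pairing can be sketched with a toy model — note the field name `st_index_owned` is hypothetical, not from the patch, and the table is a plain counter standing in for the refcounted `st->idx_xa`:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the steering-tag table: every alloc must be paired with
 * a dealloc, or entries accumulate (the leak described above). */
static int st_entries_in_use;

static int st_alloc_index(int *st_index)
{
	*st_index = st_entries_in_use++;	/* hand out an entry */
	return 0;
}

static void st_dealloc_index(int st_index)
{
	(void)st_index;
	st_entries_in_use--;
}

/* Sketch of the fix: the MR records whether it owns an st_index
 * allocation so dereg can release it. */
struct toy_mr {
	int st_index;
	bool st_index_owned;
};

static int toy_reg_mr(struct toy_mr *mr)
{
	if (st_alloc_index(&mr->st_index))
		return -1;
	mr->st_index_owned = true;	/* remember to release on dereg */
	return 0;
}

static void toy_dereg_mr(struct toy_mr *mr)
{
	if (mr->st_index_owned)
		st_dealloc_index(mr->st_index);	/* the call missing from __mlx5_ib_dereg_mr() */
	mr->st_index_owned = false;
}
```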
**Redundant `dma_buf_get()`/`dma_buf_put()`**
```c
+static void get_tph_mr_dmabuf(struct mlx5_ib_dev *dev, int fd, u16 *st_index,
+ u8 *ph)
+{
+ ...
+ dmabuf = dma_buf_get(fd);
+ if (IS_ERR(dmabuf))
+ return;
+ ...
+ dma_buf_put(dmabuf);
+}
```
This function does `dma_buf_get(fd)` / `dma_buf_put(dmabuf)` solely to query TPH. The caller `reg_user_mr_dmabuf()` will immediately afterward call `ib_umem_dmabuf_get()`, which does another `dma_buf_get()` on the same fd. This is a wasted get/put cycle. Consider restructuring so the TPH query happens after the dmabuf is already acquired (e.g., query from `umem_dmabuf->dmabuf`), avoiding the redundant fd lookup.
**Direct ops call bypasses dma-buf framework**
```c
+ if (!dmabuf->ops->get_tph)
+ goto end_dbuf_put;
+
+ ret = dmabuf->ops->get_tph(dmabuf, &steering_tag, ph, st_width);
```
This calls through the ops table directly rather than through a framework wrapper function. Every other dma-buf operation (map, unmap, pin, unpin, attach, etc.) goes through a wrapper in `drivers/dma-buf/dma-buf.c`. The NULL check should be inside the wrapper, not at the call site. This also means the call bypasses any future framework-level tracing, locking, or validation.
**`ret = 0` initialization in `mlx5_st_create()`**
```c
- int ret;
+ int ret = 0;
```
This fix addresses a real compiler warning, but it is an independent bugfix unrelated to the TPH-from-dma-buf feature and would be better split into a standalone patch, which could then be backported independently.
**`MODULE_IMPORT_NS("DMA_BUF")`**
```c
+MODULE_IMPORT_NS("DMA_BUF");
```
This is needed because `dma_buf_get()` and `dma_buf_put()` are exported in the `DMA_BUF` namespace. Correct.
**`st_width` derivation**
```c
+ st_width = (pdev->tph_req_type == PCI_TPH_REQ_EXT_TPH) ? 16 : 8;
```
This checks `tph_req_type` but doesn't first verify that TPH is actually enabled/supported on this device. If `pdev->tph_req_type` is `PCI_TPH_REQ_DISABLE` or 0 (no TPH), the function will derive `st_width = 8` and proceed to allocate a steering tag that the device can't use. The function should early-return if the device doesn't support TPH (or `dev->st` is NULL), before even querying the dma-buf.
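The suggested ordering reduces to a small helper — check enablement first, then pick the width. This is a hedged sketch with local stand-in constants (values illustrative, not the real `PCI_TPH_REQ_*` values):

```c
#include <assert.h>

/* Local stand-ins for the PCI TPH request types. */
enum { TPH_REQ_DISABLE = 0, TPH_REQ_TPH = 1, TPH_REQ_EXT_TPH = 2 };

/* Derive the supported steering-tag width, or 0 when TPH is unusable
 * (disabled, or no steering-tag table), in which case the caller
 * should early-return before touching the dma-buf at all. */
static int derive_st_width(int tph_req_type, int have_st_table)
{
	if (!have_st_table || tph_req_type == TPH_REQ_DISABLE)
		return 0;
	return tph_req_type == TPH_REQ_EXT_TPH ? 16 : 8;
}
```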
**Error handling: `ph` set to `MLX5_IB_NO_PH` on alloc failure but steering_tag left as-is**
```c
+ ret = mlx5_st_alloc_index_by_tag(dev->mdev, steering_tag, st_index);
+ if (ret) {
+ *ph = MLX5_IB_NO_PH;
```
When `mlx5_st_alloc_index_by_tag()` fails, `*ph` is reset to `MLX5_IB_NO_PH`, but `*st_index` is left in an indeterminate state: it was passed as a pointer, and `mlx5_st_alloc_index_by_tag()` may or may not have written to it. The caller initialized `st_index = MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX`, which is fine only if the callee never modifies `*st_index` on error — but that isn't guaranteed, since the xarray path could partially succeed. It is safer to explicitly reset `*st_index = MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX` in the error path.
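The safer error path looks like this — the toy allocator deliberately scribbles on `*st_index` before failing, modelling the partial-success concern (constants are local stand-ins for the mlx5 ones):

```c
#include <assert.h>

#define NO_PH 0xff		/* stand-in for MLX5_IB_NO_PH */
#define NO_ST_INDEX 0xffff	/* stand-in for MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX */

/* Toy allocator that partially writes *st_index before failing,
 * modelling the "xarray path could partially succeed" case. */
static int failing_alloc(unsigned short *st_index)
{
	*st_index = 7;		/* partial write... */
	return -1;		/* ...then failure */
}

static void query_tph(unsigned short *st_index, unsigned char *ph)
{
	*ph = 1;		/* hint obtained from the exporter */
	if (failing_alloc(st_index)) {
		/* Reset BOTH outputs so the caller sees a coherent "no TPH" state. */
		*ph = NO_PH;
		*st_index = NO_ST_INDEX;
	}
}
```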
**Refactoring of `mlx5_st_alloc_index` is clean**
The split into `mlx5_st_alloc_index_by_tag()` (takes raw tag) and `mlx5_st_alloc_index()` (resolves CPU → tag, then calls `_by_tag`) is a clean refactor. The `#ifndef CONFIG_PCIE_TPH` static inline stub returning `-EOPNOTSUPP` is correct.
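The shape of the refactor, reduced to its essentials — the CPU-to-tag lookup here is a stub standing in for `pcie_tph_get_cpu_st()`, with an illustrative mapping:

```c
#include <assert.h>

/* Inner helper: caller already holds a raw steering tag. */
static int alloc_index_by_tag(unsigned short tag, unsigned short *st_index)
{
	*st_index = tag;	/* direct mode: the index is the tag itself */
	return 0;
}

/* Stub for the CPU -> steering-tag lookup done by pcie_tph_get_cpu_st(). */
static int cpu_to_tag(unsigned int cpu_uid, unsigned short *tag)
{
	*tag = (unsigned short)(cpu_uid + 100);	/* illustrative mapping */
	return 0;
}

/* Outer wrapper: resolve CPU to tag, then delegate to the _by_tag core. */
static int alloc_index(unsigned int cpu_uid, unsigned short *st_index)
{
	unsigned short tag;
	int ret = cpu_to_tag(cpu_uid, &tag);

	if (ret)
		return ret;
	return alloc_index_by_tag(tag, st_index);
}
```

Callers with a raw tag (the new dma-buf path) use the inner helper directly; the existing DMAH path keeps the outer wrapper.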
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v2 0/2] vfio/dma-buf: add TPH support for peer-to-peer access
@ 2026-04-30 20:06 Zhiping Zhang
2026-04-30 20:06 ` [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature Zhiping Zhang
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Zhiping Zhang @ 2026-04-30 20:06 UTC (permalink / raw)
To: Alex Williamson, Jason Gunthorpe, Leon Romanovsky
Cc: Bjorn Helgaas, linux-rdma, linux-pci, netdev, dri-devel,
Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang
This series adds TLP Processing Hints (TPH) support to the VFIO dma-buf
export path, allowing importing drivers (e.g. mlx5) to use the exporter's
steering tag when performing peer-to-peer DMA into a VFIO-owned device.
Changes since v1:
- VFIO_DEVICE_FEATURE_DMA_BUF is now unchanged — dma_ranges[],
__counted_by(nr_ranges), and flags==0 are all preserved
- Added a new VFIO_DEVICE_FEATURE_DMA_BUF_TPH (feature 13) as a separate
SET ioctl that takes a dmabuf fd, validates it belongs to this vfio
device, and stores the steering tag + processing hint under memory_lock
- Kept the dma_buf_ops.get_tph callback as the general exporter-side
interface for importing drivers
Patch 1 adds the dma-buf get_tph callback and the new vfio uAPI.
Patch 2 wires up the mlx5 RDMA driver as a consumer.
Previous links:
https://lore.kernel.org/linux-pci/20260324234615.3731237-1-zhipingz@meta.com/
https://lore.kernel.org/dri-devel/20260420183920.3626389-1-zhipingz@meta.com/
Zhiping Zhang (2):
vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature
RDMA/mlx5: get tph for p2p access when registering dma-buf mr
drivers/infiniband/hw/mlx5/mr.c | 38 +++++++
drivers/net/ethernet/mellanox/mlx5/core/lib/st.c | 25 +++--
drivers/vfio/pci/vfio_pci_core.c | 3 +
drivers/vfio/pci/vfio_pci_dmabuf.c | 65 ++++++++++++
drivers/vfio/pci/vfio_pci_priv.h | 11 ++
include/linux/dma-buf.h | 17 +++
include/linux/mlx5/driver.h | 7 ++
include/uapi/linux/vfio.h | 22 ++++
8 files changed, 180 insertions(+), 8 deletions(-)
--
2.47.1
* [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature
2026-04-30 20:06 [PATCH v2 0/2] vfio/dma-buf: add TPH support for peer-to-peer access Zhiping Zhang
@ 2026-04-30 20:06 ` Zhiping Zhang
2026-05-04 21:44 ` Alex Williamson
2026-05-04 23:54 ` Claude review: " Claude Code Review Bot
2026-04-30 20:06 ` [PATCH v2 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr Zhiping Zhang
2026-05-04 23:54 ` Claude review: vfio/dma-buf: add TPH support for peer-to-peer access Claude Code Review Bot
2 siblings, 2 replies; 8+ messages in thread
From: Zhiping Zhang @ 2026-04-30 20:06 UTC (permalink / raw)
To: Alex Williamson, Jason Gunthorpe, Leon Romanovsky
Cc: Bjorn Helgaas, linux-rdma, linux-pci, netdev, dri-devel,
Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang
Add a dma-buf callback that returns raw TPH metadata from the exporter
so peer devices can reuse the steering tag and processing hint
associated with a VFIO-exported buffer.
Add a new VFIO_DEVICE_FEATURE_DMA_BUF_TPH ioctl that takes the fd from
VFIO_DEVICE_FEATURE_DMA_BUF along with a steering tag and processing
hint, validates the fd is a vfio-exported dma-buf belonging to this
device, and stores the TPH values under memory_lock. This keeps the
existing VFIO_DEVICE_FEATURE_DMA_BUF uAPI completely unchanged.
The user sequences setting TPH on the dma-buf before the importer
consumes it.
Add an st_width parameter to get_tph() so the exporter can reject
steering tags that exceed the consumer's supported width (8 vs 16 bit).
When no TPH metadata was supplied, get_tph() returns -EOPNOTSUPP.
Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1534,6 +1534,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
case VFIO_DEVICE_FEATURE_DMA_BUF:
return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
+ case VFIO_DEVICE_FEATURE_DMA_BUF_TPH:
+ return vfio_pci_core_feature_dma_buf_tph(vdev, flags, arg,
+ argsz);
default:
return -ENOTTY;
}
diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
--- a/drivers/vfio/pci/vfio_pci_dmabuf.c
+++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
@@ -19,6 +19,9 @@ struct vfio_pci_dma_buf {
u32 nr_ranges;
struct kref kref;
struct completion comp;
+ u16 steering_tag;
+ u8 ph;
+ u8 tph_present : 1;
u8 revoked : 1;
};
@@ -69,6 +72,22 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
return ret;
}
+static int vfio_pci_dma_buf_get_tph(struct dma_buf *dmabuf, u16 *steering_tag,
+ u8 *ph, u8 st_width)
+{
+ struct vfio_pci_dma_buf *priv = dmabuf->priv;
+
+ if (!priv->tph_present)
+ return -EOPNOTSUPP;
+
+ if (st_width < 16 && priv->steering_tag > ((1U << st_width) - 1))
+ return -EINVAL;
+
+ *steering_tag = priv->steering_tag;
+ *ph = priv->ph;
+ return 0;
+}
+
static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
struct sg_table *sgt,
enum dma_data_direction dir)
@@ -101,6 +120,7 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
.attach = vfio_pci_dma_buf_attach,
+ .get_tph = vfio_pci_dma_buf_get_tph,
.map_dma_buf = vfio_pci_dma_buf_map,
.unmap_dma_buf = vfio_pci_dma_buf_unmap,
.release = vfio_pci_dma_buf_release,
@@ -331,6 +351,55 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
return ret;
}
+int vfio_pci_core_feature_dma_buf_tph(struct vfio_pci_core_device *vdev,
+ u32 flags,
+ struct vfio_device_feature_dma_buf_tph __user *arg,
+ size_t argsz)
+{
+ struct vfio_device_feature_dma_buf_tph set_tph;
+ struct vfio_pci_dma_buf *priv;
+ struct dma_buf *dmabuf;
+ int ret;
+
+ ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
+ sizeof(set_tph));
+ if (ret != 1)
+ return ret;
+
+ if (copy_from_user(&set_tph, arg, sizeof(set_tph)))
+ return -EFAULT;
+
+ if (set_tph.reserved)
+ return -EINVAL;
+
+ dmabuf = dma_buf_get(set_tph.dmabuf_fd);
+ if (IS_ERR(dmabuf))
+ return PTR_ERR(dmabuf);
+
+ if (dmabuf->ops != &vfio_pci_dmabuf_ops) {
+ ret = -EINVAL;
+ goto out_put;
+ }
+
+ priv = dmabuf->priv;
+ down_write(&vdev->memory_lock);
+ if (priv->vdev != vdev) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ priv->steering_tag = set_tph.steering_tag;
+ priv->ph = set_tph.ph;
+ priv->tph_present = 1;
+ ret = 0;
+
+out_unlock:
+ up_write(&vdev->memory_lock);
+out_put:
+ dma_buf_put(dmabuf);
+ return ret;
+}
+
void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
{
struct vfio_pci_dma_buf *priv;
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -118,6 +118,10 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
struct vfio_device_feature_dma_buf __user *arg,
size_t argsz);
+int vfio_pci_core_feature_dma_buf_tph(struct vfio_pci_core_device *vdev,
+ u32 flags,
+ struct vfio_device_feature_dma_buf_tph __user *arg,
+ size_t argsz);
void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
#else
@@ -128,6 +132,13 @@ vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
{
return -ENOTTY;
}
+static inline int
+vfio_pci_core_feature_dma_buf_tph(struct vfio_pci_core_device *vdev, u32 flags,
+ struct vfio_device_feature_dma_buf_tph __user *arg,
+ size_t argsz)
+{
+ return -ENOTTY;
+}
static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
{
}
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -113,6 +113,23 @@ struct dma_buf_ops {
*/
void (*unpin)(struct dma_buf_attachment *attach);
+ /**
+ * @get_tph:
+ * @dmabuf: DMA buffer for which to retrieve TPH metadata
+ * @steering_tag: Returns the raw TPH steering tag
+ * @ph: Returns the TPH processing hint
+ * @st_width: Consumer's supported steering tag width in bits (8 or 16)
+ *
+ * Return the TPH (TLP Processing Hints) metadata associated with this
+ * DMA buffer. Exporters that do not provide TPH metadata should return
+ * -EOPNOTSUPP. If the steering tag exceeds @st_width bits, return
+ * -EINVAL.
+ *
+ * This callback is optional.
+ */
+ int (*get_tph)(struct dma_buf *dmabuf, u16 *steering_tag, u8 *ph,
+ u8 st_width);
+
/**
* @map_dma_buf:
*
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1534,6 +1534,28 @@ struct vfio_device_feature_dma_buf {
*/
#define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
+/**
+ * Upon VFIO_DEVICE_FEATURE_SET associate TPH (TLP Processing Hints) metadata
+ * with a vfio-exported dma-buf. The dma-buf must have been created by
+ * VFIO_DEVICE_FEATURE_DMA_BUF on this device.
+ *
+ * dmabuf_fd is the file descriptor returned by VFIO_DEVICE_FEATURE_DMA_BUF.
+ * steering_tag and ph are the raw TPH values that importing drivers should use
+ * when accessing the buffer.
+ *
+ * The user must set TPH on the dma-buf before the importer consumes it.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+#define VFIO_DEVICE_FEATURE_DMA_BUF_TPH 13
+
+struct vfio_device_feature_dma_buf_tph {
+ __s32 dmabuf_fd;
+ __u16 steering_tag;
+ __u8 ph;
+ __u8 reserved;
+};
+
/* -------- API for Type1 VFIO IOMMU -------- */
/**
* [PATCH v2 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr
2026-04-30 20:06 [PATCH v2 0/2] vfio/dma-buf: add TPH support for peer-to-peer access Zhiping Zhang
2026-04-30 20:06 ` [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature Zhiping Zhang
@ 2026-04-30 20:06 ` Zhiping Zhang
2026-05-04 23:54 ` Claude review: " Claude Code Review Bot
2026-05-04 23:54 ` Claude review: vfio/dma-buf: add TPH support for peer-to-peer access Claude Code Review Bot
2 siblings, 1 reply; 8+ messages in thread
From: Zhiping Zhang @ 2026-04-30 20:06 UTC (permalink / raw)
To: Alex Williamson, Jason Gunthorpe, Leon Romanovsky
Cc: Bjorn Helgaas, linux-rdma, linux-pci, netdev, dri-devel,
Keith Busch, Yochai Cohen, Yishai Hadas, Zhiping Zhang
Query dma-buf TPH metadata when registering a dma-buf MR for peer to
peer access and translate the raw steering tag into an mlx5 steering tag
index. Factor mlx5_st_alloc_index() so callers that already have a raw
steering tag can allocate the corresponding mlx5 index directly. Keep the
DMAH path as the first priority and only fall back to dma-buf metadata when
no DMAH is supplied.
Pass the device's supported ST width (8 or 16 bit, derived from
pdev->tph_req_type) to get_tph() so the exporter can reject tags that
exceed the consumer's capability. Initialize ret in mlx5_st_create() so the
cached steering-tag path returns success cleanly under clang builds.
Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -46,6 +46,8 @@
#include "data_direct.h"
#include "dmah.h"
+MODULE_IMPORT_NS("DMA_BUF");
+
static int mkey_max_umr_order(struct mlx5_ib_dev *dev)
{
if (MLX5_CAP_GEN(dev->mdev, umr_extended_translation_offset))
@@ -899,6 +901,40 @@ static struct dma_buf_attach_ops mlx5_ib_dmabuf_attach_ops = {
.invalidate_mappings = mlx5_ib_dmabuf_invalidate_cb,
};
+static void get_tph_mr_dmabuf(struct mlx5_ib_dev *dev, int fd, u16 *st_index,
+ u8 *ph)
+{
+ struct pci_dev *pdev = dev->mdev->pdev;
+ struct dma_buf *dmabuf;
+ u16 steering_tag;
+ u8 st_width;
+ int ret;
+
+ st_width = (pdev->tph_req_type == PCI_TPH_REQ_EXT_TPH) ? 16 : 8;
+
+ dmabuf = dma_buf_get(fd);
+ if (IS_ERR(dmabuf))
+ return;
+
+ if (!dmabuf->ops->get_tph)
+ goto end_dbuf_put;
+
+ ret = dmabuf->ops->get_tph(dmabuf, &steering_tag, ph, st_width);
+ if (ret) {
+ mlx5_ib_dbg(dev, "get_tph failed (%d)\n", ret);
+ goto end_dbuf_put;
+ }
+
+ ret = mlx5_st_alloc_index_by_tag(dev->mdev, steering_tag, st_index);
+ if (ret) {
+ *ph = MLX5_IB_NO_PH;
+ mlx5_ib_dbg(dev, "st_alloc_index_by_tag failed (%d)\n", ret);
+ }
+
+end_dbuf_put:
+ dma_buf_put(dmabuf);
+}
+
static struct ib_mr *
reg_user_mr_dmabuf(struct ib_pd *pd, struct device *dma_device,
u64 offset, u64 length, u64 virt_addr,
@@ -941,6 +977,8 @@ reg_user_mr_dmabuf(struct ib_pd *pd, struct device *dma_device,
ph = dmah->ph;
if (dmah->valid_fields & BIT(IB_DMAH_CPU_ID_EXISTS))
st_index = mdmah->st_index;
+ } else {
+ get_tph_mr_dmabuf(dev, fd, &st_index, &ph);
}
mr = alloc_cacheable_mr(pd, &umem_dmabuf->umem, virt_addr,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/st.c
@@ -29,7 +29,7 @@ struct mlx5_st *mlx5_st_create(struct mlx5_core_dev *dev)
u8 direct_mode = 0;
u16 num_entries;
u32 tbl_loc;
- int ret;
+ int ret = 0;
if (!MLX5_CAP_GEN(dev, mkey_pcie_tph))
return NULL;
@@ -92,23 +92,18 @@ void mlx5_st_destroy(struct mlx5_core_dev *dev)
kfree(st);
}
-int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
- unsigned int cpu_uid, u16 *st_index)
+int mlx5_st_alloc_index_by_tag(struct mlx5_core_dev *dev, u16 tag,
+ u16 *st_index)
{
struct mlx5_st_idx_data *idx_data;
struct mlx5_st *st = dev->st;
unsigned long index;
u32 xa_id;
- u16 tag;
- int ret;
+ int ret = 0;
if (!st)
return -EOPNOTSUPP;
- ret = pcie_tph_get_cpu_st(dev->pdev, mem_type, cpu_uid, &tag);
- if (ret)
- return ret;
-
if (st->direct_mode) {
*st_index = tag;
return 0;
@@ -152,6 +147,20 @@ int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
mutex_unlock(&st->lock);
return ret;
}
+EXPORT_SYMBOL_GPL(mlx5_st_alloc_index_by_tag);
+
+int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
+ unsigned int cpu_uid, u16 *st_index)
+{
+ u16 tag;
+ int ret;
+
+ ret = pcie_tph_get_cpu_st(dev->pdev, mem_type, cpu_uid, &tag);
+ if (ret)
+ return ret;
+
+ return mlx5_st_alloc_index_by_tag(dev, tag, st_index);
+}
EXPORT_SYMBOL_GPL(mlx5_st_alloc_index);
int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index)
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -1166,10 +1166,17 @@ int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type
u64 length, u16 uid, phys_addr_t addr, u32 obj_id);
#ifdef CONFIG_PCIE_TPH
+int mlx5_st_alloc_index_by_tag(struct mlx5_core_dev *dev, u16 tag,
+ u16 *st_index);
int mlx5_st_alloc_index(struct mlx5_core_dev *dev, enum tph_mem_type mem_type,
unsigned int cpu_uid, u16 *st_index);
int mlx5_st_dealloc_index(struct mlx5_core_dev *dev, u16 st_index);
#else
+static inline int mlx5_st_alloc_index_by_tag(struct mlx5_core_dev *dev,
+ u16 tag, u16 *st_index)
+{
+ return -EOPNOTSUPP;
+}
static inline int mlx5_st_alloc_index(struct mlx5_core_dev *dev,
enum tph_mem_type mem_type,
unsigned int cpu_uid, u16 *st_index)
* Re: [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature
2026-04-30 20:06 ` [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature Zhiping Zhang
@ 2026-05-04 21:44 ` Alex Williamson
2026-05-04 23:54 ` Claude review: " Claude Code Review Bot
1 sibling, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2026-05-04 21:44 UTC (permalink / raw)
To: Zhiping Zhang
Cc: Jason Gunthorpe, Leon Romanovsky, Bjorn Helgaas, linux-rdma,
linux-pci, netdev, dri-devel, Keith Busch, Yochai Cohen,
Yishai Hadas, alex
On Thu, 30 Apr 2026 13:06:56 -0700
Zhiping Zhang <zhipingz@meta.com> wrote:
> Add a dma-buf callback that returns raw TPH metadata from the exporter
> so peer devices can reuse the steering tag and processing hint
> associated with a VFIO-exported buffer.
>
> Add a new VFIO_DEVICE_FEATURE_DMA_BUF_TPH ioctl that takes the fd from
> VFIO_DEVICE_FEATURE_DMA_BUF along with a steering tag and processing
> hint, validates the fd is a vfio-exported dma-buf belonging to this
> device, and stores the TPH values under memory_lock. This keeps the
> existing VFIO_DEVICE_FEATURE_DMA_BUF uAPI completely unchanged.
>
> The user sequences setting TPH on the dma-buf before the importer
> consumes it.
>
> Add an st_width parameter to get_tph() so the exporter can reject
> steering tags that exceed the consumer's supported width (8 vs 16 bit).
> When no TPH metadata was supplied, get_tph() returns -EOPNOTSUPP.
>
> Signed-off-by: Zhiping Zhang <zhipingz@meta.com>
The uAPI is better, but sashiko has some review comments[1] for you.
Please also copy the kvm list for vfio related development. Thanks,
Alex
[1]https://sashiko.dev/#/patchset/20260430200704.352228-1-zhipingz@meta.com
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1534,6 +1534,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
> return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
> case VFIO_DEVICE_FEATURE_DMA_BUF:
> return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
> + case VFIO_DEVICE_FEATURE_DMA_BUF_TPH:
> + return vfio_pci_core_feature_dma_buf_tph(vdev, flags, arg,
> + argsz);
> default:
> return -ENOTTY;
> }
> diff --git a/drivers/vfio/pci/vfio_pci_dmabuf.c b/drivers/vfio/pci/vfio_pci_dmabuf.c
> --- a/drivers/vfio/pci/vfio_pci_dmabuf.c
> +++ b/drivers/vfio/pci/vfio_pci_dmabuf.c
> @@ -19,6 +19,9 @@ struct vfio_pci_dma_buf {
> u32 nr_ranges;
> struct kref kref;
> struct completion comp;
> + u16 steering_tag;
> + u8 ph;
> + u8 tph_present : 1;
> u8 revoked : 1;
> };
>
> @@ -69,6 +72,22 @@ vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
> return ret;
> }
>
> +static int vfio_pci_dma_buf_get_tph(struct dma_buf *dmabuf, u16 *steering_tag,
> + u8 *ph, u8 st_width)
> +{
> + struct vfio_pci_dma_buf *priv = dmabuf->priv;
> +
> + if (!priv->tph_present)
> + return -EOPNOTSUPP;
> +
> + if (st_width < 16 && priv->steering_tag > ((1U << st_width) - 1))
> + return -EINVAL;
> +
> + *steering_tag = priv->steering_tag;
> + *ph = priv->ph;
> + return 0;
> +}
> +
> static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
> struct sg_table *sgt,
> enum dma_data_direction dir)
> @@ -101,6 +120,7 @@ static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
>
> static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
> .attach = vfio_pci_dma_buf_attach,
> + .get_tph = vfio_pci_dma_buf_get_tph,
> .map_dma_buf = vfio_pci_dma_buf_map,
> .unmap_dma_buf = vfio_pci_dma_buf_unmap,
> .release = vfio_pci_dma_buf_release,
> @@ -331,6 +351,55 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> return ret;
> }
>
> +int vfio_pci_core_feature_dma_buf_tph(struct vfio_pci_core_device *vdev,
> + u32 flags,
> + struct vfio_device_feature_dma_buf_tph __user *arg,
> + size_t argsz)
> +{
> + struct vfio_device_feature_dma_buf_tph set_tph;
> + struct vfio_pci_dma_buf *priv;
> + struct dma_buf *dmabuf;
> + int ret;
> +
> + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET,
> + sizeof(set_tph));
> + if (ret != 1)
> + return ret;
> +
> + if (copy_from_user(&set_tph, arg, sizeof(set_tph)))
> + return -EFAULT;
> +
> + if (set_tph.reserved)
> + return -EINVAL;
> +
> + dmabuf = dma_buf_get(set_tph.dmabuf_fd);
> + if (IS_ERR(dmabuf))
> + return PTR_ERR(dmabuf);
> +
> + if (dmabuf->ops != &vfio_pci_dmabuf_ops) {
> + ret = -EINVAL;
> + goto out_put;
> + }
> +
> + priv = dmabuf->priv;
> + down_write(&vdev->memory_lock);
> + if (priv->vdev != vdev) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> +
> + priv->steering_tag = set_tph.steering_tag;
> + priv->ph = set_tph.ph;
> + priv->tph_present = 1;
> + ret = 0;
> +
> +out_unlock:
> + up_write(&vdev->memory_lock);
> +out_put:
> + dma_buf_put(dmabuf);
> + return ret;
> +}
> +
> void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> {
> struct vfio_pci_dma_buf *priv;
> diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
> --- a/drivers/vfio/pci/vfio_pci_priv.h
> +++ b/drivers/vfio/pci/vfio_pci_priv.h
> @@ -118,6 +118,10 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
> int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> struct vfio_device_feature_dma_buf __user *arg,
> size_t argsz);
> +int vfio_pci_core_feature_dma_buf_tph(struct vfio_pci_core_device *vdev,
> + u32 flags,
> + struct vfio_device_feature_dma_buf_tph __user *arg,
> + size_t argsz);
> void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
> void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
> #else
> @@ -128,6 +132,13 @@ vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> {
> return -ENOTTY;
> }
> +static inline int
> +vfio_pci_core_feature_dma_buf_tph(struct vfio_pci_core_device *vdev, u32 flags,
> + struct vfio_device_feature_dma_buf_tph __user *arg,
> + size_t argsz)
> +{
> + return -ENOTTY;
> +}
> static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
> {
> }
> diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
> --- a/include/linux/dma-buf.h
> +++ b/include/linux/dma-buf.h
> @@ -113,6 +113,23 @@ struct dma_buf_ops {
> */
> void (*unpin)(struct dma_buf_attachment *attach);
>
> + /**
> + * @get_tph:
> + * @dmabuf: DMA buffer for which to retrieve TPH metadata
> + * @steering_tag: Returns the raw TPH steering tag
> + * @ph: Returns the TPH processing hint
> + * @st_width: Consumer's supported steering tag width in bits (8 or 16)
> + *
> + * Return the TPH (TLP Processing Hints) metadata associated with this
> + * DMA buffer. Exporters that do not provide TPH metadata should return
> + * -EOPNOTSUPP. If the steering tag exceeds @st_width bits, return
> + * -EINVAL.
> + *
> + * This callback is optional.
> + */
> + int (*get_tph)(struct dma_buf *dmabuf, u16 *steering_tag, u8 *ph,
> + u8 st_width);
> +
> /**
> * @map_dma_buf:
> *
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -1534,6 +1534,28 @@ struct vfio_device_feature_dma_buf {
> */
> #define VFIO_DEVICE_FEATURE_MIG_PRECOPY_INFOv2 12
>
> +/**
> + * Upon VFIO_DEVICE_FEATURE_SET associate TPH (TLP Processing Hints) metadata
> + * with a vfio-exported dma-buf. The dma-buf must have been created by
> + * VFIO_DEVICE_FEATURE_DMA_BUF on this device.
> + *
> + * dmabuf_fd is the file descriptor returned by VFIO_DEVICE_FEATURE_DMA_BUF.
> + * steering_tag and ph are the raw TPH values that importing drivers should use
> + * when accessing the buffer.
> + *
> + * The user must set TPH on the dma-buf before the importer consumes it.
> + *
> + * Return: 0 on success, -errno on failure.
> + */
> +#define VFIO_DEVICE_FEATURE_DMA_BUF_TPH 13
> +
> +struct vfio_device_feature_dma_buf_tph {
> + __s32 dmabuf_fd;
> + __u16 steering_tag;
> + __u8 ph;
> + __u8 reserved;
> +};
> +
> /* -------- API for Type1 VFIO IOMMU -------- */
>
> /**
* Claude review: vfio/dma-buf: add TPH support for peer-to-peer access
2026-04-30 20:06 [PATCH v2 0/2] vfio/dma-buf: add TPH support for peer-to-peer access Zhiping Zhang
2026-04-30 20:06 ` [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature Zhiping Zhang
2026-04-30 20:06 ` [PATCH v2 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr Zhiping Zhang
@ 2026-05-04 23:54 ` Claude Code Review Bot
2 siblings, 0 replies; 8+ messages in thread
From: Claude Code Review Bot @ 2026-05-04 23:54 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: vfio/dma-buf: add TPH support for peer-to-peer access
Author: Zhiping Zhang <zhipingz@meta.com>
Patches: 2
Reviewed: 2026-05-05T09:54:40.381245
---
This 2-patch series adds TLP Processing Hints (TPH) support to the VFIO dma-buf export path, allowing importing drivers (mlx5 RDMA) to use the exporter's steering tag for peer-to-peer DMA. The design is clean: a separate `VFIO_DEVICE_FEATURE_DMA_BUF_TPH` ioctl keeps the existing `DMA_BUF` uAPI unchanged, and a new `dma_buf_ops.get_tph` callback provides a generic exporter-side interface.
The approach of having userspace set TPH metadata on the dma-buf before the importer consumes it is reasonable for VFIO (device is userspace-owned), but there are a few validation gaps and a locking subtlety around the `get_tph` read path.
Overall the series is a sensible addition. The issues below are mostly about hardening validation and documenting ordering requirements.
---
Generated by Claude Code Patch Reviewer
* Claude review: vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature
2026-04-30 20:06 ` [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature Zhiping Zhang
2026-05-04 21:44 ` Alex Williamson
@ 2026-05-04 23:54 ` Claude Code Review Bot
1 sibling, 0 replies; 8+ messages in thread
From: Claude Code Review Bot @ 2026-05-04 23:54 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Missing `ph` validation in the SET ioctl**
The `ph` (processing hint) field is accepted as a raw `__u8` with no validation. PCIe TLP Processing Hints are a 2-bit field (values 0–3). An out-of-range value would be silently stored and returned to importers:
```c
priv->steering_tag = set_tph.steering_tag;
priv->ph = set_tph.ph;
priv->tph_present = 1;
```
Consider adding a check like `if (set_tph.ph > 3) return -EINVAL;` before storing. Otherwise a malicious/buggy userspace can inject arbitrary PH values that importers would trust.
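A minimal version of the suggested check — the 2-bit limit comes from the PCIe TPH PH field; `PH_MAX` is a local name, not a kernel constant:

```c
#include <assert.h>
#include <errno.h>

#define PH_MAX 3	/* PCIe PH is a 2-bit field: values 0..3 */

/* Validate the user-supplied processing hint before storing it. */
static int validate_ph(unsigned char ph)
{
	if (ph > PH_MAX)
		return -EINVAL;	/* reject before writing priv->ph */
	return 0;
}
```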
**get_tph callback reads fields without locking or barriers**
`vfio_pci_dma_buf_get_tph()` reads `tph_present`, `steering_tag`, and `ph` without holding any lock:
```c
if (!priv->tph_present)
return -EOPNOTSUPP;
...
*steering_tag = priv->steering_tag;
*ph = priv->ph;
```
While the SET path holds `memory_lock`, the read path does not. The commit message states "The user sequences setting TPH on the dma-buf before the importer consumes it," so in the expected usage there's no race. However, there's no compiler or CPU barrier ensuring the importer sees the stores in order. Consider using `smp_load_acquire`/`smp_store_release` on `tph_present`, or at minimum adding a comment documenting the caller-enforced ordering requirement. The dma-buf core doc for the callback should probably mention this too.
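The suggested publish/consume pattern, sketched with C11 atomics in place of the kernel's `smp_store_release()`/`smp_load_acquire()` (the kernel version would use those helpers on `tph_present` directly):

```c
#include <assert.h>
#include <stdatomic.h>

static unsigned short steering_tag;
static unsigned char ph;
static atomic_int tph_present;

/* SET path: plain stores to the payload, then a release store on the
 * flag, so a reader that observes tph_present == 1 also sees the payload. */
static void set_tph(unsigned short tag, unsigned char hint)
{
	steering_tag = tag;
	ph = hint;
	atomic_store_explicit(&tph_present, 1, memory_order_release);
}

/* get_tph path: the acquire load on the flag pairs with the release store. */
static int get_tph(unsigned short *tag, unsigned char *hint)
{
	if (!atomic_load_explicit(&tph_present, memory_order_acquire))
		return -1;	/* -EOPNOTSUPP in the real callback */
	*tag = steering_tag;
	*hint = ph;
	return 0;
}
```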
**Bitfield adjacency is safe under memory_lock**
`tph_present` and `revoked` are adjacent bitfields sharing a byte. Both the SET TPH ioctl and `vfio_pci_dma_buf_move` hold `memory_lock` when writing their respective bits, so there's no bitfield tearing risk between those paths. This is correct.
**st_width boundary: what values are valid?**
The `get_tph` callback accepts arbitrary `st_width`:
```c
if (st_width < 16 && priv->steering_tag > ((1U << st_width) - 1))
return -EINVAL;
```
The documentation says "8 or 16," but nothing in the callback or its callers enforces that. If `st_width == 0` is passed, this evaluates to `(1U << 0) - 1 = 0`, rejecting any nonzero tag. This won't cause a crash but may be surprising. Consider a brief comment or a check that `st_width` is one of the expected values.
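The corner cases are easier to see with the check factored out (`tag_fits()` is a hypothetical name; the arithmetic is exactly the quoted expression):

```c
#include <stdint.h>

/* Returns nonzero when steering_tag is representable in st_width bits,
 * mirroring the quoted check from the callback. */
static int tag_fits(uint16_t steering_tag, uint8_t st_width)
{
	/* st_width >= 16: any u16 tag fits, so the callback skips the
	 * comparison entirely. */
	if (st_width >= 16)
		return 1;
	/* st_width == 0 makes the mask 0, so every nonzero tag is
	 * rejected -- surprising, but not a crash. */
	return steering_tag <= ((1U << st_width) - 1);
}
```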
**uAPI struct looks good**
```c
struct vfio_device_feature_dma_buf_tph {
__s32 dmabuf_fd;
__u16 steering_tag;
__u8 ph;
__u8 reserved;
};
```
8 bytes, naturally aligned, reserved field validated to zero, SET-only. Feature number 13 follows sequentially after 12 (`MIG_PRECOPY_INFOv2`). Clean.
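The layout claims can be pinned down at compile time; a self-contained sketch with the struct mirrored in userspace types:

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the quoted uAPI struct, re-declared so the "8 bytes,
 * naturally aligned, no padding" claims are machine-checked. */
struct vfio_device_feature_dma_buf_tph {
	int32_t  dmabuf_fd;
	uint16_t steering_tag;
	uint8_t  ph;
	uint8_t  reserved;
};

_Static_assert(sizeof(struct vfio_device_feature_dma_buf_tph) == 8,
	       "no padding: 4 + 2 + 1 + 1");
_Static_assert(offsetof(struct vfio_device_feature_dma_buf_tph, steering_tag) == 4,
	       "u16 naturally aligned");
_Static_assert(offsetof(struct vfio_device_feature_dma_buf_tph, ph) == 6,
	       "ph directly follows steering_tag");
_Static_assert(offsetof(struct vfio_device_feature_dma_buf_tph, reserved) == 7,
	       "reserved pads out the trailing byte");
```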
**Stub for `!CONFIG_VFIO_PCI_DMABUF` looks correct**
The `#else` stub returning `-ENOTTY` mirrors the existing `vfio_pci_core_feature_dma_buf` stub.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 8+ messages in thread
* Claude review: RDMA/mlx5: get tph for p2p access when registering dma-buf mr
2026-04-30 20:06 ` [PATCH v2 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr Zhiping Zhang
@ 2026-05-04 23:54 ` Claude Code Review Bot
0 siblings, 0 replies; 8+ messages in thread
From: Claude Code Review Bot @ 2026-05-04 23:54 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Double `dma_buf_get` on the same fd**
`get_tph_mr_dmabuf()` calls `dma_buf_get(fd)` to temporarily resolve the fd just to query TPH metadata:
```c
dmabuf = dma_buf_get(fd);
if (IS_ERR(dmabuf))
return;
...
dma_buf_put(dmabuf);
```
The same fd is resolved again later by the main MR registration path (`ib_umem_dmabuf_get_pinned` or similar). This is functionally correct but does an unnecessary extra get/put cycle. Consider whether the dma_buf pointer could be passed from the caller instead, though I understand it may not be available at this point in the flow.
**Direct callback invocation instead of helper**
The code calls `dmabuf->ops->get_tph()` directly:
```c
if (!dmabuf->ops->get_tph)
goto end_dbuf_put;
ret = dmabuf->ops->get_tph(dmabuf, &steering_tag, ph, st_width);
```
This is the standard pattern for optional dma-buf callbacks (same as how `pin`/`unpin` are called). It would be worth considering whether a `dma_buf_get_tph()` inline helper in `<linux/dma-buf.h>` would be better for encapsulation, especially if other importers will use this callback. But for an initial implementation with a single consumer, this is fine.
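A sketch of what such a helper could look like, with minimal userspace stand-ins for the kernel types (only `get_tph` comes from the patch; everything else here is illustrative). Taking a `struct dma_buf *` rather than an fd would also address the double-`dma_buf_get` point above, since the caller could pass an already-acquired pointer:

```c
#include <errno.h>
#include <stdint.h>

/* Stand-ins for the kernel types, just enough for the sketch. */
struct dma_buf;
struct dma_buf_ops {
	int (*get_tph)(struct dma_buf *dmabuf, uint16_t *steering_tag,
		       uint8_t *ph, uint8_t st_width);
};
struct dma_buf {
	const struct dma_buf_ops *ops;
};

/* Hypothetical wrapper: hides the optional-callback check from every
 * importer, the way the real <linux/dma-buf.h> helpers do for pin/unpin. */
static inline int dma_buf_get_tph(struct dma_buf *dmabuf,
				  uint16_t *steering_tag, uint8_t *ph,
				  uint8_t st_width)
{
	if (!dmabuf->ops->get_tph)
		return -EOPNOTSUPP;
	return dmabuf->ops->get_tph(dmabuf, steering_tag, ph, st_width);
}

/* Toy exporter used only to exercise the helper. */
static int demo_get_tph(struct dma_buf *dmabuf, uint16_t *steering_tag,
			uint8_t *ph, uint8_t st_width)
{
	(void)dmabuf; (void)st_width;
	*steering_tag = 0x42;
	*ph = 1;
	return 0;
}
static const struct dma_buf_ops demo_ops = { .get_tph = demo_get_tph };
static const struct dma_buf_ops no_tph_ops = { 0 };
```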
**Error handling in `get_tph_mr_dmabuf` is correct**
If `mlx5_st_alloc_index_by_tag` fails, `*ph` is reset to `MLX5_IB_NO_PH`:
```c
ret = mlx5_st_alloc_index_by_tag(dev->mdev, steering_tag, st_index);
if (ret) {
*ph = MLX5_IB_NO_PH;
...
}
```
`*st_index` was never modified by the failed call, so it retains the caller's default (`MLX5_MKC_PCIE_TPH_NO_STEERING_TAG_INDEX`). This is correct — a failed TPH query silently falls back to no-TPH behavior.
**`ret = 0` initialization in `mlx5_st_create` is a real fix**
Looking at the existing code, when the `xa_for_each` loop finds a cached tag match:
```c
xa_for_each(&st->idx_xa, index, idx_data) {
if (tag == idx_data->tag) {
refcount_inc(&idx_data->usecount);
*st_index = index;
goto end;
}
}
...
end:
mutex_unlock(&st->lock);
return ret; // <-- was uninitialized on cache hit path
```
`ret` is genuinely uninitialized on the cache-hit `goto end` path. The `ret = 0` init in the refactored `mlx5_st_alloc_index_by_tag` fixes this bug. This is correct and should be called out more prominently — it's a real bug fix, not just a clang warning suppression.
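A toy analog of the bug (not the mlx5 code — a four-slot array stands in for the xarray) showing why the `ret = 0` initialization matters on the cache-hit path:

```c
#include <errno.h>
#include <stddef.h>

static int cache[4] = { -1, -1, -1, -1 };

static int alloc_index_by_tag(int tag, size_t *index)
{
	int ret = 0;	/* the fix: cache-hit goto path returns this */
	size_t i;

	/* Cache hit: reuse the existing slot. Without the ret = 0
	 * initialization above, this path returns stack garbage. */
	for (i = 0; i < 4; i++) {
		if (cache[i] == tag) {
			*index = i;
			goto end;
		}
	}
	/* Miss: claim the first free slot. */
	for (i = 0; i < 4; i++) {
		if (cache[i] < 0) {
			cache[i] = tag;
			*index = i;
			goto end;
		}
	}
	ret = -ENOSPC;
end:
	return ret;
}
```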
**Refactoring of `mlx5_st_alloc_index` is clean**
The split into `mlx5_st_alloc_index_by_tag` (takes raw tag) + `mlx5_st_alloc_index` (wraps with `pcie_tph_get_cpu_st`) is a straightforward factoring. The original function body moves to `_by_tag`, and the original becomes a thin wrapper. The `#else` stub for `!CONFIG_PCIE_TPH` is correctly added.
**`MODULE_IMPORT_NS("DMA_BUF")` is needed**
`dma_buf_get` is exported under the `DMA_BUF` namespace, so this is required.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads: [~2026-05-04 23:54 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-04-30 20:06 [PATCH v2 0/2] vfio/dma-buf: add TPH support for peer-to-peer access Zhiping Zhang
2026-04-30 20:06 ` [PATCH v2 1/2] vfio: add dma-buf get_tph callback and DMA_BUF_TPH feature Zhiping Zhang
2026-05-04 21:44 ` Alex Williamson
2026-05-04 23:54 ` Claude review: " Claude Code Review Bot
2026-04-30 20:06 ` [PATCH v2 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr Zhiping Zhang
2026-05-04 23:54 ` Claude review: " Claude Code Review Bot
2026-05-04 23:54 ` Claude review: vfio/dma-buf: add TPH support for peer-to-peer access Claude Code Review Bot
-- strict thread matches above, loose matches on Subject: below --
2026-04-20 18:39 [PATCH v1 0/2] Retrieve TPH from dma-buf for PCIe P2P memory access Zhiping Zhang
2026-04-20 18:39 ` [PATCH v1 2/2] RDMA/mlx5: get tph for p2p access when registering dma-buf mr Zhiping Zhang
2026-04-22 23:26 ` Claude review: " Claude Code Review Bot