From: Bobby Eshleman <bobbyeshleman@gmail.com>
To: Donald Hunter <donald.hunter@gmail.com>,
Jakub Kicinski <kuba@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Gerd Hoffmann <kraxel@redhat.com>,
Vivek Kasireddy <vivek.kasireddy@intel.com>,
Sumit Semwal <sumit.semwal@linaro.org>,
Christian König <christian.koenig@amd.com>,
Shuah Khan <shuah@kernel.org>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
dri-devel@lists.freedesktop.org, linux-media@vger.kernel.org,
linaro-mm-sig@lists.linaro.org, linux-kselftest@vger.kernel.org,
sdf@fomichev.me, razor@blackwall.org, daniel@iogearbox.net,
almasrymina@google.com, matttbe@kernel.org, skhawaja@google.com,
dw@davidwei.uk, Bobby Eshleman <bobbyeshleman@meta.com>
Subject: [PATCH net-next 0/4] net: devmem: allow rx-buf-size > PAGE_SIZE per binding
Date: Wed, 03 Jun 2026 17:42:57 -0700
Message-ID: <20260603-tcpdm-large-niovs-v1-0-f37a4ac6726c@meta.com>
Every devmem dmabuf binding hands the page_pool PAGE_SIZE niovs today.
On NICs that consume one descriptor per netmem, this caps a single RX
descriptor at PAGE_SIZE and burns CPU on buffer churn.

In this series, we add a bind-time netlink attribute,
NETDEV_A_DMABUF_RX_BUF_SIZE, that lets userspace request a larger niov size
(power of two >= PAGE_SIZE). Drivers must opt in via
queue_mgmt_ops.QCFG_RX_PAGE_SIZE.
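
As a rough illustration of the size constraint (not the kernel check
itself), the "power of two >= PAGE_SIZE" rule can be modeled in userspace
Python. The helper name is_valid_rx_buf_size and the 4K default page size
are assumptions made for this sketch; the real validation happens in the
bind path and may impose further limits.

```python
def is_valid_rx_buf_size(size: int, page_size: int = 4096) -> bool:
    """Return True if size is a power of two and at least page_size.

    Hypothetical helper for illustration only; page_size defaults to 4K,
    which is an assumption about the running system.
    """
    if size < page_size:
        return False
    # A power of two has exactly one bit set, so size & (size - 1) == 0.
    return size & (size - 1) == 0


if __name__ == "__main__":
    for candidate in (4096, 16384, 65536, 2048, 12288):
        print(candidate, is_valid_rx_buf_size(candidate))
```

Under these assumptions 4096, 16384 and 65536 are accepted, while 2048
(below PAGE_SIZE) and 12288 (not a power of two) are rejected.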
Selftests use udmabuf, but udmabuf sgtables were previously hardcoded to
PAGE_SIZE entries. This series modifies udmabuf to respect folio sizes in its
exported sgtable. As a result, when udmabuf is backed by MFD_HUGETLB 2MB pages,
the sgtable is populated with 2MB entries, allowing devmem's gen_pool to carve
out large (e.g. 64K) niovs.
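
To see why the sgtable entry size matters, here is a small Python model of
the carving arithmetic. It assumes that a niov must lie entirely within one
sg entry and that each entry is carved independently; the function name
count_niovs is hypothetical, and this is not a description of how gen_pool
is implemented internally.

```python
def count_niovs(sg_entry_lens, niov_size):
    """Count niovs of niov_size bytes that fit in the given sg entries.

    Modeling assumption: a niov may not span two sg entries, so each
    entry contributes floor(entry_len / niov_size) niovs.
    """
    return sum(length // niov_size for length in sg_entry_lens)


PAGE_SIZE = 4096              # assumed 4K base pages
HUGE_2M = 2 * 1024 * 1024     # one MFD_HUGETLB 2MB folio
NIOV_64K = 64 * 1024

# One 2MB folio exported as 512 PAGE_SIZE entries: no 64K niov fits.
per_page_entries = [PAGE_SIZE] * (HUGE_2M // PAGE_SIZE)
# The same folio exported as a single 2MB entry.
per_folio_entries = [HUGE_2M]

print(count_niovs(per_page_entries, NIOV_64K))   # 0
print(count_niovs(per_folio_entries, NIOV_64K))  # 32
```

In this model, the per-folio layout yields 32 niovs of 64K from one 2MB
folio, while the per-page layout yields none.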
Measurements
------------
Setup: kperf devmem RX/TX cuda, 4 flows, 64 MB messages, 60s, dctcp,
num-rx-queues=4, dmabuf-rx/tx-size-mb=2048, 10 runs per niov size,
mlx5.
niov   RX dev Gbps       RX flow avg Gbps   app sys %
-----  ----------------  -----------------  ----------------
4K     300.63 +/- 53.21  75.16 +/- 13.30    54.15 +/- 10.23
16K    321.35 +/- 28.20  80.34 +/-  7.05    41.05 +/-  8.87
32K    347.63 +/-  2.20  86.91 +/-  0.55    44.54 +/-  3.51
64K    332.11 +/- 14.26  83.03 +/-  3.56    35.47 +/-  3.11

RX app sys % drops by ~19 percentage points from 4K to 64K (54.15% to
35.47%, roughly a 34% relative reduction).
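
The drop in app sys % and the per-flow averages can be cross-checked from
the table with a few lines of Python. The mean values are copied from the
table; the only assumption is that the per-flow average is the device rate
divided evenly across the 4 flows.

```python
# Mean values copied from the measurements table:
# niov -> (RX dev Gbps, RX flow avg Gbps, app sys %)
rows = {
    "4K": (300.63, 75.16, 54.15),
    "16K": (321.35, 80.34, 41.05),
    "32K": (347.63, 86.91, 44.54),
    "64K": (332.11, 83.03, 35.47),
}
FLOWS = 4

# Per-flow average should match device throughput / number of flows
# to within the rounding of the two-decimal table values.
for niov, (dev_gbps, flow_gbps, _sys_pct) in rows.items():
    assert abs(dev_gbps / FLOWS - flow_gbps) < 0.01, niov

sys_4k = rows["4K"][2]
sys_64k = rows["64K"][2]
abs_drop = sys_4k - sys_64k      # in percentage points
rel_drop = abs_drop / sys_4k     # as a fraction of the 4K value

print(f"absolute drop: {abs_drop:.2f} points")  # absolute drop: 18.68 points
print(f"relative drop: {rel_drop * 100:.1f}%")  # relative drop: 34.5%
```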
kperf support (not yet merged):
https://github.com/facebookexperimental/kperf/commit/8837577f920876bce6986ec18869ac04439ebcd2
Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Bobby Eshleman (4):
      net: devmem: allow rx-buf-size > PAGE_SIZE per dmabuf binding
      udmabuf: emit one sg entry per pinned folio
      selftests/net: ncdevmem: add -b option to set rx-buf-size on bind
      selftests/net: devmem.py: add check_rx_large_niov

 Documentation/netlink/specs/netdev.yaml            |  8 ++++
 drivers/dma-buf/udmabuf.c                          | 47 ++++++++++++++++---
 include/uapi/linux/netdev.h                        |  1 +
 net/core/devmem.c                                  | 52 +++++++++++++---------
 net/core/devmem.h                                  | 13 ++++--
 net/core/netdev-genl-gen.c                         |  5 ++-
 net/core/netdev-genl.c                             | 18 +++++++-
 tools/include/uapi/linux/netdev.h                  |  1 +
 tools/testing/selftests/drivers/net/hw/config      |  1 +
 tools/testing/selftests/drivers/net/hw/devmem.py   | 12 ++++-
 .../testing/selftests/drivers/net/hw/devmem_lib.py | 46 ++++++++++++++++++-
 tools/testing/selftests/drivers/net/hw/ncdevmem.c  | 49 ++++++++++++++++++--
 .../testing/selftests/drivers/net/hw/nk_devmem.py  | 11 ++++-
 13 files changed, 220 insertions(+), 44 deletions(-)
---
base-commit: dfcc2ff12925d99e858eaf539eaa4aaaf81fe2a6
change-id: 20260602-tcpdm-large-niovs-56523a3a1077
Best regards,
--
Bobby Eshleman <bobbyeshleman@meta.com>