public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded call sites
@ 2026-03-23 16:00 Vineeth Pillai (Google)
  2026-03-23 16:00 ` [PATCH v2 01/19] tracepoint: Add trace_call__##name() API Vineeth Pillai (Google)
                   ` (5 more replies)
  0 siblings, 6 replies; 11+ messages in thread
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
  To: Steven Rostedt, Peter Zijlstra, Dmitry Ilvokhin
  Cc: Vineeth Pillai (Google), Masami Hiramatsu, Mathieu Desnoyers,
	Ingo Molnar, Jens Axboe, io-uring, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Marcelo Ricardo Leitner, Xin Long, Jon Maloy, Aaron Conole,
	Eelco Chaudron, Ilya Maximets, netdev, bpf, linux-sctp,
	tipc-discussion, dev, Jiri Pirko, Oded Gabbay, Koby Elbaz,
	dri-devel, Rafael J. Wysocki, Viresh Kumar, Gautham R. Shenoy,
	Huang Rui, Mario Limonciello, Len Brown, Srinivas Pandruvada,
	linux-pm, MyungJoo Ham, Kyungmin Park, Chanwoo Choi,
	Christian König, Sumit Semwal, linaro-mm-sig, Eddie James,
	Andrew Jeffery, Joel Stanley, linux-fsi, David Airlie,
	Simona Vetter, Alex Deucher, Danilo Krummrich, Matthew Brost,
	Philipp Stanner, Harry Wentland, Leo Li, amd-gfx, Jiri Kosina,
	Benjamin Tissoires, linux-input, Wolfram Sang, linux-i2c,
	Mark Brown, Michael Hennerich, Nuno Sá, linux-spi,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi, Chris Mason,
	David Sterba, linux-btrfs, Thomas Gleixner, Andrew Morton,
	SeongJae Park, linux-mm, Borislav Petkov, Dave Hansen, x86,
	linux-trace-kernel, linux-kernel

When a caller already guards a tracepoint with an explicit enabled check:

  if (trace_foo_enabled() && cond)
      trace_foo(args);

trace_foo() internally re-evaluates the static_branch_unlikely() key.
Since static branches are patched binary instructions the compiler cannot
fold the two evaluations, so every such site pays the cost twice.

This series introduces trace_call__##name() as a companion to
trace_##name().  It calls __do_trace_##name() directly, bypassing the
redundant static-branch re-check, while preserving all other correctness
properties of the normal path (RCU-watching assertion, might_fault() for
syscall tracepoints).  The internal __do_trace_##name() symbol is not
leaked to call sites; trace_call__##name() is the only new public API.

  if (trace_foo_enabled() && cond)
      trace_call__foo(args);   /* calls __do_trace_foo() directly */

The first patch adds the three-location change to
include/linux/tracepoint.h (__DECLARE_TRACE, __DECLARE_TRACE_SYSCALL,
and the !TRACEPOINTS_ENABLED stub).  The remaining 18 patches
mechanically convert all guarded call sites found in the tree:
kernel/, io_uring/, net/, accel/habanalabs, cpufreq/, devfreq/,
dma-buf/, fsi/, drm/, HID, i2c/, spi/, scsi/ufs/, btrfs/,
net/devlink/, kernel/time/, kernel/trace/, mm/damon/, and arch/x86/.

This series is motivated by Peter Zijlstra's observation in the discussion
around Dmitry Ilvokhin's locking tracepoint instrumentation series, where
he noted that compilers cannot optimize static branches and that guarded
call sites end up evaluating the static branch twice for no reason, and
by Steven Rostedt's suggestion to add a proper API instead of exposing
internal implementation details like __do_trace_##name() directly to
call sites:

  https://lore.kernel.org/linux-trace-kernel/8298e098d3418cb446ef396f119edac58a3414e9.1772642407.git.d@ilvokhin.com

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>

Changes in v2:
- Renamed trace_invoke_##name() to trace_call__##name() (double
  underscore) per review comments.
- Added 4 new patches covering sites missed in v1, found using
  coccinelle to scan the tree (Keith Busch):
    * net/devlink: guarded tracepoint_enabled() block in trap.c
    * kernel/time: early-return guard in tick-sched.c (tick_stop)
    * kernel/trace: early-return guard in trace_benchmark.c
    * mm/damon: early-return guard in core.c
    * arch/x86: do_trace_*() wrapper functions in lib/msr.c, which
      are called exclusively from tracepoint_enabled()-guarded sites
      in asm/msr.h

v1: https://lore.kernel.org/linux-trace-kernel/abSqrJ1J59RQC47U@kbusch-mbp/

Vineeth Pillai (Google) (19):
  tracepoint: Add trace_call__##name() API
  kernel: Use trace_call__##name() at guarded tracepoint call sites
  io_uring: Use trace_call__##name() at guarded tracepoint call sites
  net: Use trace_call__##name() at guarded tracepoint call sites
  accel/habanalabs: Use trace_call__##name() at guarded tracepoint call
    sites
  cpufreq: Use trace_call__##name() at guarded tracepoint call sites
  devfreq: Use trace_call__##name() at guarded tracepoint call sites
  dma-buf: Use trace_call__##name() at guarded tracepoint call sites
  fsi: Use trace_call__##name() at guarded tracepoint call sites
  drm: Use trace_call__##name() at guarded tracepoint call sites
  HID: Use trace_call__##name() at guarded tracepoint call sites
  i2c: Use trace_call__##name() at guarded tracepoint call sites
  spi: Use trace_call__##name() at guarded tracepoint call sites
  scsi: ufs: Use trace_call__##name() at guarded tracepoint call sites
  btrfs: Use trace_call__##name() at guarded tracepoint call sites
  net: devlink: Use trace_call__##name() at guarded tracepoint call
    sites
  kernel: time, trace: Use trace_call__##name() at guarded tracepoint
    call sites
  mm: damon: Use trace_call__##name() at guarded tracepoint call sites
  x86: msr: Use trace_call__##name() at guarded tracepoint call sites

 arch/x86/lib/msr.c                                |  6 +++---
 drivers/accel/habanalabs/common/device.c          | 12 ++++++------
 drivers/accel/habanalabs/common/mmu/mmu.c         |  3 ++-
 drivers/accel/habanalabs/common/pci/pci.c         |  4 ++--
 drivers/cpufreq/amd-pstate.c                      | 10 +++++-----
 drivers/cpufreq/cpufreq.c                         |  2 +-
 drivers/cpufreq/intel_pstate.c                    |  2 +-
 drivers/devfreq/devfreq.c                         |  2 +-
 drivers/dma-buf/dma-fence.c                       |  4 ++--
 drivers/fsi/fsi-master-aspeed.c                   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c            |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c            |  4 ++--
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  2 +-
 drivers/gpu/drm/scheduler/sched_entity.c          |  4 ++--
 drivers/hid/intel-ish-hid/ipc/pci-ish.c           |  2 +-
 drivers/i2c/i2c-core-slave.c                      |  2 +-
 drivers/spi/spi-axi-spi-engine.c                  |  4 ++--
 drivers/ufs/core/ufshcd.c                         | 12 ++++++------
 fs/btrfs/extent_map.c                             |  4 ++--
 fs/btrfs/raid56.c                                 |  4 ++--
 include/linux/tracepoint.h                        | 11 +++++++++++
 io_uring/io_uring.h                               |  2 +-
 kernel/irq_work.c                                 |  2 +-
 kernel/sched/ext.c                                |  2 +-
 kernel/smp.c                                      |  2 +-
 kernel/time/tick-sched.c                          | 12 ++++++------
 kernel/trace/trace_benchmark.c                    |  2 +-
 mm/damon/core.c                                   |  2 +-
 net/core/dev.c                                    |  2 +-
 net/core/xdp.c                                    |  2 +-
 net/devlink/trap.c                                |  2 +-
 net/openvswitch/actions.c                         |  2 +-
 net/openvswitch/datapath.c                        |  2 +-
 net/sctp/outqueue.c                               |  2 +-
 net/tipc/node.c                                   |  2 +-
 35 files changed, 74 insertions(+), 62 deletions(-)

-- 
2.53.0


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 01/19] tracepoint: Add trace_call__##name() API
  2026-03-23 16:00 [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded call sites Vineeth Pillai (Google)
@ 2026-03-23 16:00 ` Vineeth Pillai (Google)
  2026-03-24 21:48   ` Claude review: " Claude Code Review Bot
  2026-03-23 16:00 ` [PATCH v2 05/19] accel/habanalabs: Use trace_call__##name() at guarded tracepoint call sites Vineeth Pillai (Google)
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
  To: Steven Rostedt, Peter Zijlstra, Dmitry Ilvokhin
  Cc: Vineeth Pillai (Google), Masami Hiramatsu, Mathieu Desnoyers,
	Ingo Molnar, Jens Axboe, io-uring, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Marcelo Ricardo Leitner, Xin Long, Jon Maloy, Aaron Conole,
	Eelco Chaudron, Ilya Maximets, netdev, bpf, linux-sctp,
	tipc-discussion, dev, Jiri Pirko, Oded Gabbay, Koby Elbaz,
	dri-devel, Rafael J. Wysocki, Viresh Kumar, Gautham R. Shenoy,
	Huang Rui, Mario Limonciello, Len Brown, Srinivas Pandruvada,
	linux-pm, MyungJoo Ham, Kyungmin Park, Chanwoo Choi,
	Christian König, Sumit Semwal, linaro-mm-sig, Eddie James,
	Andrew Jeffery, Joel Stanley, linux-fsi, David Airlie,
	Simona Vetter, Alex Deucher, Danilo Krummrich, Matthew Brost,
	Philipp Stanner, Harry Wentland, Leo Li, amd-gfx, Jiri Kosina,
	Benjamin Tissoires, linux-input, Wolfram Sang, linux-i2c,
	Mark Brown, Michael Hennerich, Nuno Sá, linux-spi,
	James E.J. Bottomley, Martin K. Petersen, linux-scsi, Chris Mason,
	David Sterba, linux-btrfs, Thomas Gleixner, Andrew Morton,
	SeongJae Park, linux-mm, Borislav Petkov, Dave Hansen, x86,
	linux-trace-kernel, linux-kernel

Add trace_call__##name() as a companion to trace_##name().  When a
caller already guards a tracepoint with an explicit enabled check:

  if (trace_foo_enabled() && cond)
      trace_foo(args);

trace_foo() internally repeats the static_branch_unlikely() test, which
the compiler cannot fold since static branches are patched binary
instructions.  This results in two static-branch evaluations for every
guarded call site.

trace_call__##name() calls __do_trace_##name() directly, skipping the
redundant static-branch re-check.  This avoids leaking the internal
__do_trace_##name() symbol into call sites while still eliminating the
double evaluation:

  if (trace_foo_enabled() && cond)
      trace_invoke_foo(args);   /* calls __do_trace_foo() directly */

Three locations are updated:
- __DECLARE_TRACE: invoke form omits static_branch_unlikely, retains
  the LOCKDEP RCU-watching assertion.
- __DECLARE_TRACE_SYSCALL: same, plus retains might_fault().
- !TRACEPOINTS_ENABLED stub: empty no-op so callers compile cleanly
  when tracepoints are compiled out.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
 include/linux/tracepoint.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 22ca1c8b54f32..ed969705341f1 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -294,6 +294,10 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 			WARN_ONCE(!rcu_is_watching(),			\
 				  "RCU not watching for tracepoint");	\
 		}							\
+	}								\
+	static inline void trace_call__##name(proto)			\
+	{								\
+		__do_trace_##name(args);				\
 	}
 
 #define __DECLARE_TRACE_SYSCALL(name, proto, args, data_proto)		\
@@ -313,6 +317,11 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 			WARN_ONCE(!rcu_is_watching(),			\
 				  "RCU not watching for tracepoint");	\
 		}							\
+	}								\
+	static inline void trace_call__##name(proto)			\
+	{								\
+		might_fault();						\
+		__do_trace_##name(args);				\
 	}
 
 /*
@@ -398,6 +407,8 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 #define __DECLARE_TRACE_COMMON(name, proto, args, data_proto)		\
 	static inline void trace_##name(proto)				\
 	{ }								\
+	static inline void trace_call__##name(proto)			\
+	{ }								\
 	static inline int						\
 	register_trace_##name(void (*probe)(data_proto),		\
 			      void *data)				\
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 05/19] accel/habanalabs: Use trace_call__##name() at guarded tracepoint call sites
  2026-03-23 16:00 [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded call sites Vineeth Pillai (Google)
  2026-03-23 16:00 ` [PATCH v2 01/19] tracepoint: Add trace_call__##name() API Vineeth Pillai (Google)
@ 2026-03-23 16:00 ` Vineeth Pillai (Google)
  2026-03-24 21:48   ` Claude review: " Claude Code Review Bot
  2026-03-23 16:00 ` [PATCH v2 08/19] dma-buf: " Vineeth Pillai (Google)
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
  Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
	Koby Elbaz, Konstantin Sinyuk, Oded Gabbay, Kees Cook,
	Andy Shevchenko, dri-devel, linux-kernel, linux-trace-kernel

Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
 drivers/accel/habanalabs/common/device.c  | 12 ++++++------
 drivers/accel/habanalabs/common/mmu/mmu.c |  3 ++-
 drivers/accel/habanalabs/common/pci/pci.c |  4 ++--
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/accel/habanalabs/common/device.c b/drivers/accel/habanalabs/common/device.c
index 09b27bac3a31d..a9ad1edd0bf41 100644
--- a/drivers/accel/habanalabs/common/device.c
+++ b/drivers/accel/habanalabs/common/device.c
@@ -132,8 +132,8 @@ static void *hl_dma_alloc_common(struct hl_device *hdev, size_t size, dma_addr_t
 	}
 
 	if (trace_habanalabs_dma_alloc_enabled() && !ZERO_OR_NULL_PTR(ptr))
-		trace_habanalabs_dma_alloc(&(hdev)->pdev->dev, (u64) (uintptr_t) ptr, *dma_handle,
-						size, caller);
+		trace_call__habanalabs_dma_alloc(&(hdev)->pdev->dev, (u64) (uintptr_t) ptr,
+						  *dma_handle, size, caller);
 
 	return ptr;
 }
@@ -206,7 +206,7 @@ int hl_dma_map_sgtable_caller(struct hl_device *hdev, struct sg_table *sgt,
 		return 0;
 
 	for_each_sgtable_dma_sg(sgt, sg, i)
-		trace_habanalabs_dma_map_page(&(hdev)->pdev->dev,
+		trace_call__habanalabs_dma_map_page(&(hdev)->pdev->dev,
 					page_to_phys(sg_page(sg)),
 					sg->dma_address - prop->device_dma_offset_for_host_access,
 #ifdef CONFIG_NEED_SG_DMA_LENGTH
@@ -249,7 +249,7 @@ void hl_dma_unmap_sgtable_caller(struct hl_device *hdev, struct sg_table *sgt,
 
 	if (trace_habanalabs_dma_unmap_page_enabled()) {
 		for_each_sgtable_dma_sg(sgt, sg, i)
-			trace_habanalabs_dma_unmap_page(&(hdev)->pdev->dev,
+			trace_call__habanalabs_dma_unmap_page(&(hdev)->pdev->dev,
 					page_to_phys(sg_page(sg)),
 					sg->dma_address - prop->device_dma_offset_for_host_access,
 #ifdef CONFIG_NEED_SG_DMA_LENGTH
@@ -2656,7 +2656,7 @@ inline u32 hl_rreg(struct hl_device *hdev, u32 reg)
 	u32 val = readl(hdev->rmmio + reg);
 
 	if (unlikely(trace_habanalabs_rreg32_enabled()))
-		trace_habanalabs_rreg32(&(hdev)->pdev->dev, reg, val);
+		trace_call__habanalabs_rreg32(&(hdev)->pdev->dev, reg, val);
 
 	return val;
 }
@@ -2674,7 +2674,7 @@ inline u32 hl_rreg(struct hl_device *hdev, u32 reg)
 inline void hl_wreg(struct hl_device *hdev, u32 reg, u32 val)
 {
 	if (unlikely(trace_habanalabs_wreg32_enabled()))
-		trace_habanalabs_wreg32(&(hdev)->pdev->dev, reg, val);
+		trace_call__habanalabs_wreg32(&(hdev)->pdev->dev, reg, val);
 
 	writel(val, hdev->rmmio + reg);
 }
diff --git a/drivers/accel/habanalabs/common/mmu/mmu.c b/drivers/accel/habanalabs/common/mmu/mmu.c
index 6c7c4ff8a8a95..fdeca1fe5fc78 100644
--- a/drivers/accel/habanalabs/common/mmu/mmu.c
+++ b/drivers/accel/habanalabs/common/mmu/mmu.c
@@ -263,7 +263,8 @@ int hl_mmu_unmap_page(struct hl_ctx *ctx, u64 virt_addr, u32 page_size, bool flu
 		mmu_funcs->flush(ctx);
 
 	if (trace_habanalabs_mmu_unmap_enabled() && !rc)
-		trace_habanalabs_mmu_unmap(&hdev->pdev->dev, virt_addr, 0, page_size, flush_pte);
+		trace_call__habanalabs_mmu_unmap(&hdev->pdev->dev, virt_addr,
+						  0, page_size, flush_pte);
 
 	return rc;
 }
diff --git a/drivers/accel/habanalabs/common/pci/pci.c b/drivers/accel/habanalabs/common/pci/pci.c
index 81cbd8697d4cd..bffbde3c399be 100644
--- a/drivers/accel/habanalabs/common/pci/pci.c
+++ b/drivers/accel/habanalabs/common/pci/pci.c
@@ -123,7 +123,7 @@ int hl_pci_elbi_read(struct hl_device *hdev, u64 addr, u32 *data)
 		pci_read_config_dword(pdev, mmPCI_CONFIG_ELBI_DATA, data);
 
 		if (unlikely(trace_habanalabs_elbi_read_enabled()))
-			trace_habanalabs_elbi_read(&hdev->pdev->dev, (u32) addr, val);
+			trace_call__habanalabs_elbi_read(&hdev->pdev->dev, (u32) addr, val);
 
 		return 0;
 	}
@@ -186,7 +186,7 @@ static int hl_pci_elbi_write(struct hl_device *hdev, u64 addr, u32 data)
 
 	if ((val & PCI_CONFIG_ELBI_STS_MASK) == PCI_CONFIG_ELBI_STS_DONE) {
 		if (unlikely(trace_habanalabs_elbi_write_enabled()))
-			trace_habanalabs_elbi_write(&hdev->pdev->dev, (u32) addr, val);
+			trace_call__habanalabs_elbi_write(&hdev->pdev->dev, (u32) addr, val);
 		return 0;
 	}
 
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 08/19] dma-buf: Use trace_call__##name() at guarded tracepoint call sites
  2026-03-23 16:00 [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded call sites Vineeth Pillai (Google)
  2026-03-23 16:00 ` [PATCH v2 01/19] tracepoint: Add trace_call__##name() API Vineeth Pillai (Google)
  2026-03-23 16:00 ` [PATCH v2 05/19] accel/habanalabs: Use trace_call__##name() at guarded tracepoint call sites Vineeth Pillai (Google)
@ 2026-03-23 16:00 ` Vineeth Pillai (Google)
  2026-03-24 21:48   ` Claude review: " Claude Code Review Bot
  2026-03-23 16:00 ` [PATCH v2 10/19] drm: " Vineeth Pillai (Google)
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 11+ messages in thread
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
  Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
	Sumit Semwal, Christian König, linux-media, dri-devel,
	linaro-mm-sig, linux-kernel, linux-trace-kernel

Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
 drivers/dma-buf/dma-fence.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 35afcfcac5910..232e92196da43 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -535,7 +535,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
 
 	if (trace_dma_fence_wait_start_enabled()) {
 		rcu_read_lock();
-		trace_dma_fence_wait_start(fence);
+		trace_call__dma_fence_wait_start(fence);
 		rcu_read_unlock();
 	}
 	if (fence->ops->wait)
@@ -544,7 +544,7 @@ dma_fence_wait_timeout(struct dma_fence *fence, bool intr, signed long timeout)
 		ret = dma_fence_default_wait(fence, intr, timeout);
 	if (trace_dma_fence_wait_end_enabled()) {
 		rcu_read_lock();
-		trace_dma_fence_wait_end(fence);
+		trace_call__dma_fence_wait_end(fence);
 		rcu_read_unlock();
 	}
 	return ret;
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 10/19] drm: Use trace_call__##name() at guarded tracepoint call sites
  2026-03-23 16:00 [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded call sites Vineeth Pillai (Google)
                   ` (2 preceding siblings ...)
  2026-03-23 16:00 ` [PATCH v2 08/19] dma-buf: " Vineeth Pillai (Google)
@ 2026-03-23 16:00 ` Vineeth Pillai (Google)
  2026-03-24 21:48   ` Claude review: " Claude Code Review Bot
  2026-03-24 14:28 ` [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded " Steven Rostedt
  2026-03-24 21:48 ` Claude review: " Claude Code Review Bot
  5 siblings, 1 reply; 11+ messages in thread
From: Vineeth Pillai (Google) @ 2026-03-23 16:00 UTC (permalink / raw)
  Cc: Vineeth Pillai (Google), Steven Rostedt, Peter Zijlstra,
	Alex Deucher, Christian König, David Airlie, Simona Vetter,
	Harry Wentland, Leo Li, Rodrigo Siqueira, Matthew Brost,
	Danilo Krummrich, Philipp Stanner, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Sunil Khatri, Liu01 Tong,
	Tvrtko Ursulin, Mario Limonciello, Kees Cook, Prike Liang,
	Felix Kuehling, André Almeida, Jesse.Zhang, Philip Yang,
	Alex Hung, Aurabindo Pillai, Ray Wu, Wayne Lin,
	Mario Limonciello (AMD), Timur Kristóf, Ivan Lipski,
	Dominik Kaszewski, amd-gfx, dri-devel, linux-kernel,
	linux-trace-kernel

Replace trace_foo() with the new trace_call__foo() at sites already
guarded by trace_foo_enabled(), avoiding a redundant
static_branch_unlikely() re-evaluation inside the tracepoint.
trace_call__foo() calls the tracepoint callbacks directly without
utilizing the static branch again.

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineeth Pillai (Google) <vineeth@bitbyteword.org>
Assisted-by: Claude:claude-sonnet-4-6
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c            | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c            | 4 ++--
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
 drivers/gpu/drm/scheduler/sched_entity.c          | 4 ++--
 4 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 24e4b4fc91564..99f0e3c9bddcc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1012,7 +1012,7 @@ static void trace_amdgpu_cs_ibs(struct amdgpu_cs_parser *p)
 		struct amdgpu_job *job = p->jobs[i];
 
 		for (j = 0; j < job->num_ibs; ++j)
-			trace_amdgpu_cs(p, job, &job->ibs[j]);
+			trace_call__amdgpu_cs(p, job, &job->ibs[j]);
 	}
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index f2beb980e3c3a..69d5723b98580 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1394,7 +1394,7 @@ int amdgpu_vm_bo_update(struct amdgpu_device *adev, struct amdgpu_bo_va *bo_va,
 
 	if (trace_amdgpu_vm_bo_mapping_enabled()) {
 		list_for_each_entry(mapping, &bo_va->valids, list)
-			trace_amdgpu_vm_bo_mapping(mapping);
+			trace_call__amdgpu_vm_bo_mapping(mapping);
 	}
 
 error_free:
@@ -2167,7 +2167,7 @@ void amdgpu_vm_bo_trace_cs(struct amdgpu_vm *vm, struct ww_acquire_ctx *ticket)
 				continue;
 		}
 
-		trace_amdgpu_vm_bo_cs(mapping);
+		trace_call__amdgpu_vm_bo_cs(mapping);
 	}
 }
 
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index b3d6f2cd8ab6f..2c6c8050af269 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -5190,7 +5190,7 @@ static void amdgpu_dm_backlight_set_level(struct amdgpu_display_manager *dm,
 	}
 
 	if (trace_amdgpu_dm_brightness_enabled()) {
-		trace_amdgpu_dm_brightness(__builtin_return_address(0),
+		trace_call__amdgpu_dm_brightness(__builtin_return_address(0),
 					   user_brightness,
 					   brightness,
 					   caps->aux_support,
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index fe174a4857be7..93d764d229ca9 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -429,7 +429,7 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity,
 
 	if (trace_drm_sched_job_unschedulable_enabled() &&
 	    !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &entity->dependency->flags))
-		trace_drm_sched_job_unschedulable(sched_job, entity->dependency);
+		trace_call__drm_sched_job_unschedulable(sched_job, entity->dependency);
 
 	if (!dma_fence_add_callback(entity->dependency, &entity->cb,
 				    drm_sched_entity_wakeup))
@@ -586,7 +586,7 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
 		unsigned long index;
 
 		xa_for_each(&sched_job->dependencies, index, entry)
-			trace_drm_sched_job_add_dep(sched_job, entry);
+			trace_call__drm_sched_job_add_dep(sched_job, entry);
 	}
 	atomic_inc(entity->rq->sched->score);
 	WRITE_ONCE(entity->last_user, current->group_leader);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded call sites
  2026-03-23 16:00 [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded call sites Vineeth Pillai (Google)
                   ` (3 preceding siblings ...)
  2026-03-23 16:00 ` [PATCH v2 10/19] drm: " Vineeth Pillai (Google)
@ 2026-03-24 14:28 ` Steven Rostedt
  2026-03-24 21:48 ` Claude review: " Claude Code Review Bot
  5 siblings, 0 replies; 11+ messages in thread
From: Steven Rostedt @ 2026-03-24 14:28 UTC (permalink / raw)
  To: Vineeth Pillai (Google)
  Cc: Peter Zijlstra, Dmitry Ilvokhin, Masami Hiramatsu,
	Mathieu Desnoyers, Ingo Molnar, Jens Axboe, io-uring,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Marcelo Ricardo Leitner,
	Xin Long, Jon Maloy, Aaron Conole, Eelco Chaudron, Ilya Maximets,
	netdev, bpf, linux-sctp, tipc-discussion, dev, Jiri Pirko,
	Oded Gabbay, Koby Elbaz, dri-devel, Rafael J. Wysocki,
	Viresh Kumar, Gautham R. Shenoy, Huang Rui, Mario Limonciello,
	Len Brown, Srinivas Pandruvada, linux-pm, MyungJoo Ham,
	Kyungmin Park, Chanwoo Choi, Christian König, Sumit Semwal,
	linaro-mm-sig, Eddie James, Andrew Jeffery, Joel Stanley,
	linux-fsi, David Airlie, Simona Vetter, Alex Deucher,
	Danilo Krummrich, Matthew Brost, Philipp Stanner, Harry Wentland,
	Leo Li, amd-gfx, Jiri Kosina, Benjamin Tissoires, linux-input,
	Wolfram Sang, linux-i2c, Mark Brown, Michael Hennerich,
	Nuno Sá, linux-spi, James E.J. Bottomley, Martin K. Petersen,
	linux-scsi, Chris Mason, David Sterba, linux-btrfs,
	Thomas Gleixner, Andrew Morton, SeongJae Park, linux-mm,
	Borislav Petkov, Dave Hansen, x86, linux-trace-kernel,
	linux-kernel

On Mon, 23 Mar 2026 12:00:19 -0400
"Vineeth Pillai (Google)" <vineeth@bitbyteword.org> wrote:

> When a caller already guards a tracepoint with an explicit enabled check:
> 
>   if (trace_foo_enabled() && cond)
>       trace_foo(args);

Thanks Vineeth!

I'm going to start pulling in this series. I'll take the first patch, and
then any patch that has an Acked-by or Reviewed-by from the maintainer.

For patches without acks, I'll leave alone and then after the first patch
gets merged into mainline, the maintainers could pull in their own patches
at their own convenience. Unless of course they speak up now if they want
me to take them ;-)

-- Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Claude review: tracepoint: Avoid double static_branch evaluation at guarded call sites
  2026-03-23 16:00 [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded call sites Vineeth Pillai (Google)
                   ` (4 preceding siblings ...)
  2026-03-24 14:28 ` [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded " Steven Rostedt
@ 2026-03-24 21:48 ` Claude Code Review Bot
  5 siblings, 0 replies; 11+ messages in thread
From: Claude Code Review Bot @ 2026-03-24 21:48 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: tracepoint: Avoid double static_branch evaluation at guarded call sites
Author: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org>
Patches: 6
Reviewed: 2026-03-25T07:48:30.130602

---

This is a well-motivated, straightforward optimization series. The core idea — avoiding a redundant `static_branch_unlikely()` evaluation at guarded tracepoint call sites — is sound and was explicitly requested by Peter Zijlstra and Steven Rostedt. The implementation is clean: patch 1 adds the new API in three locations within `tracepoint.h`, and the remaining patches are mechanical conversions.

The mbox filtered to dri-devel contains the cover letter (0/19) plus patches 1/19, 5/19, 8/19, and 10/19. The conversion patches are straightforward and correct. I have one concern about the API patch regarding the LOCKDEP assertion, and one minor observation about the commit message.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Claude review: tracepoint: Add trace_call__##name() API
  2026-03-23 16:00 ` [PATCH v2 01/19] tracepoint: Add trace_call__##name() API Vineeth Pillai (Google)
@ 2026-03-24 21:48   ` Claude Code Review Bot
  0 siblings, 0 replies; 11+ messages in thread
From: Claude Code Review Bot @ 2026-03-24 21:48 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**LOCKDEP assertion missing from `trace_call__##name()`**

In `__DECLARE_TRACE`, the existing `trace_##name()` includes a LOCKDEP RCU-watching assertion:

```c
if (IS_ENABLED(CONFIG_LOCKDEP) && (cond)) {
    WARN_ONCE(!rcu_is_watching(),
              "RCU not watching for tracepoint");
}
```

The new `trace_call__##name()` in this patch omits it entirely:

```c
static inline void trace_call__##name(proto)
{
    __do_trace_##name(args);
}
```

The cover letter claims "preserving all other correctness properties of the normal path (RCU-watching assertion...)" but the implementation does not actually do this. While one could argue the assertion already fired from the preceding `trace_foo_enabled()` call (which does *not* check RCU watching — it only checks the static branch key), the LOCKDEP check in `trace_##name()` runs unconditionally regardless of the static branch, specifically to catch RCU misuse even in testing where the tracepoint may not be enabled. Dropping it from `trace_call__##name()` changes debug-build behavior.

The same concern applies to `__DECLARE_TRACE_SYSCALL`, though there `might_fault()` *is* preserved, which is good.

Consider adding the LOCKDEP check to both `trace_call__` variants, or explicitly documenting in the commit message that it is intentionally omitted and why.

**Commit message inconsistency**: The commit message says:
```
trace_invoke_foo(args);   /* calls __do_trace_foo() directly */
```
This still uses the old `trace_invoke_` naming from v1, but the actual API is `trace_call__`. This should be updated to match.

**Code looks correct otherwise**: The three insertion points (regular, syscall, stub) are the right places, and the stub correctly compiles to an empty function.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Claude review: accel/habanalabs: Use trace_call__##name() at guarded tracepoint call sites
  2026-03-23 16:00 ` [PATCH v2 05/19] accel/habanalabs: Use trace_call__##name() at guarded tracepoint call sites Vineeth Pillai (Google)
@ 2026-03-24 21:48   ` Claude Code Review Bot
  0 siblings, 0 replies; 11+ messages in thread
From: Claude Code Review Bot @ 2026-03-24 21:48 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

All conversions are correct. Each `trace_call__` replacement is properly guarded by the corresponding `trace_*_enabled()` check. Examples:

```c
if (trace_habanalabs_dma_alloc_enabled() && !ZERO_OR_NULL_PTR(ptr))
    trace_call__habanalabs_dma_alloc(...)
```

```c
if (unlikely(trace_habanalabs_rreg32_enabled()))
    trace_call__habanalabs_rreg32(...)
```

The alignment fixups in `device.c` (e.g. adjusting continuation-line indentation for the longer function name) look fine.

No issues.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Claude review: dma-buf: Use trace_call__##name() at guarded tracepoint call sites
  2026-03-23 16:00 ` [PATCH v2 08/19] dma-buf: " Vineeth Pillai (Google)
@ 2026-03-24 21:48   ` Claude Code Review Bot
  0 siblings, 0 replies; 11+ messages in thread
From: Claude Code Review Bot @ 2026-03-24 21:48 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

Both conversions are correct and trivial:

```c
if (trace_dma_fence_wait_start_enabled()) {
    rcu_read_lock();
    trace_call__dma_fence_wait_start(fence);
    rcu_read_unlock();
}
```

The RCU read lock bracketing around the tracepoint is unrelated to the tracepoint's own internal RCU handling and remains correct.

No issues.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Claude review: drm: Use trace_call__##name() at guarded tracepoint call sites
  2026-03-23 16:00 ` [PATCH v2 10/19] drm: " Vineeth Pillai (Google)
@ 2026-03-24 21:48   ` Claude Code Review Bot
  0 siblings, 0 replies; 11+ messages in thread
From: Claude Code Review Bot @ 2026-03-24 21:48 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

All four conversions are correct:

- `amdgpu_cs.c`: `trace_amdgpu_cs_ibs()` is a helper that is only called from within a `trace_amdgpu_cs_enabled()` guard (verified in the existing tree at `amdgpu_cs.c:1021`), so using `trace_call__amdgpu_cs()` inside the loop is correct.

- `amdgpu_vm.c:1394`: Guarded by `trace_amdgpu_vm_bo_mapping_enabled()` — correct.

- `amdgpu_vm.c:2177`: The function `amdgpu_vm_bo_trace_cs()` has an early return `if (!trace_amdgpu_vm_bo_cs_enabled()) return;` — correct.

- `amdgpu_dm.c`: Guarded by `trace_amdgpu_dm_brightness_enabled()` — correct.

- `sched_entity.c:429`: Guarded by `trace_drm_sched_job_unschedulable_enabled()` — correct.

- `sched_entity.c:586`: The `trace_drm_sched_job_add_dep_enabled()` guard is implicit via the xa_for_each being inside the enabled check — verified correct from the patch context.

No issues with any of the conversions.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-03-24 21:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-23 16:00 [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded call sites Vineeth Pillai (Google)
2026-03-23 16:00 ` [PATCH v2 01/19] tracepoint: Add trace_call__##name() API Vineeth Pillai (Google)
2026-03-24 21:48   ` Claude review: " Claude Code Review Bot
2026-03-23 16:00 ` [PATCH v2 05/19] accel/habanalabs: Use trace_call__##name() at guarded tracepoint call sites Vineeth Pillai (Google)
2026-03-24 21:48   ` Claude review: " Claude Code Review Bot
2026-03-23 16:00 ` [PATCH v2 08/19] dma-buf: " Vineeth Pillai (Google)
2026-03-24 21:48   ` Claude review: " Claude Code Review Bot
2026-03-23 16:00 ` [PATCH v2 10/19] drm: " Vineeth Pillai (Google)
2026-03-24 21:48   ` Claude review: " Claude Code Review Bot
2026-03-24 14:28 ` [PATCH v2 00/19] tracepoint: Avoid double static_branch evaluation at guarded " Steven Rostedt
2026-03-24 21:48 ` Claude review: " Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox