public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed
* [PATCH v1] misc: fastrpc: Add cache maintenance for non-coherent platforms
@ 2026-06-04 19:48 Abhinav Parihar
  2026-06-04 19:54 ` Claude review: " Claude Code Review Bot
  2026-06-04 19:54 ` Claude Code Review Bot
  0 siblings, 2 replies; 3+ messages in thread
From: Abhinav Parihar @ 2026-06-04 19:48 UTC (permalink / raw)
  To: srinivas.kandagatla, linux-arm-msm
  Cc: Abhinav Parihar, gregkh, quic_bkumar, ekansh.gupta, linux-kernel,
	quic_chennak, dri-devel, arnd

Some platforms using fastrpc do not support DMA coherency on
HLOS. On such systems, explicit cache maintenance is
required to ensure data consistency for RPC argument buffers.

Add cache maintenance for argument buffers when operating on
non-coherent platforms:
- Flush input buffers before invoking RPC to ensure CPU writes are
  visible to the DSP
- Invalidate output buffers after RPC completion to ensure DSP
  writes are visible to the CPU

Introduce helper functions fastrpc_flush_args() and
fastrpc_inv_args() to perform the required dma-buf cache
operations. These are invoked only when the device is not marked
as DMA coherent.

The coherency capability is determined using the "dma-coherent"
device tree property and stored per session context.

This ensures correct data synchronization on platforms lacking
DMA coherency, while avoiding unnecessary overhead on coherent
systems.

Signed-off-by: Abhinav Parihar <abhinav.parihar@oss.qualcomm.com>
---
 drivers/misc/fastrpc.c | 66 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 66 insertions(+)

diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 1080f9acf70a..043b6a5548fb 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -255,6 +255,7 @@ struct fastrpc_session_ctx {
 	int sid;
 	bool used;
 	bool valid;
+	bool coherent;
 };
 
 struct fastrpc_soc_data {
@@ -973,6 +974,64 @@ static int fastrpc_create_maps(struct fastrpc_invoke_ctx *ctx)
 	return 0;
 }
 
+static void fastrpc_flush_args(struct fastrpc_invoke_ctx *ctx)
+{
+	union fastrpc_remote_arg *rpra = ctx->rpra;
+	int i, inbufs, outbufs;
+
+	inbufs = REMOTE_SCALARS_INBUFS(ctx->sc);
+	outbufs = REMOTE_SCALARS_OUTBUFS(ctx->sc);
+
+	for (i = 0; i < inbufs + outbufs; ++i) {
+		int raix = ctx->olaps[i].raix;
+		struct fastrpc_map *map = ctx->maps[raix];
+
+		if (raix + 1 > inbufs)
+			continue;
+		if (!map || !map->buf)
+			continue;
+
+		if (rpra[raix].buf.len && ctx->olaps[i].mstart) {
+			dma_buf_begin_cpu_access(map->buf, DMA_TO_DEVICE);
+			dma_buf_end_cpu_access(map->buf, DMA_TO_DEVICE);
+		}
+	}
+}
+
+static void fastrpc_inv_args(struct fastrpc_invoke_ctx *ctx)
+{
+	union fastrpc_remote_arg *rpra = ctx->rpra;
+	int i, inbufs, outbufs;
+
+	inbufs = REMOTE_SCALARS_INBUFS(ctx->sc);
+	outbufs = REMOTE_SCALARS_OUTBUFS(ctx->sc);
+
+	for (i = 0; i < inbufs + outbufs; ++i) {
+		int raix = ctx->olaps[i].raix;
+		struct fastrpc_map *map = ctx->maps[raix];
+
+		if (raix + 1 <= inbufs)
+			continue;
+		if (!rpra[raix].buf.len)
+			continue;
+		if (!map || !map->buf)
+			continue;
+
+		/*
+		 * Skip invalidation if the argument overlaps with the
+		 * RPC control header page.
+		 */
+		if (((uintptr_t)rpra & PAGE_MASK) ==
+			((uintptr_t)rpra[raix].buf.pv & PAGE_MASK))
+			continue;
+
+		if (ctx->olaps[i].mstart) {
+			dma_buf_begin_cpu_access(map->buf, DMA_FROM_DEVICE);
+			dma_buf_end_cpu_access(map->buf, DMA_TO_DEVICE);
+		}
+	}
+}
+
 static struct fastrpc_invoke_buf *fastrpc_invoke_buf_start(union fastrpc_remote_arg *pra, int len)
 {
 	return (struct fastrpc_invoke_buf *)(&pra[len]);
@@ -1093,6 +1152,9 @@ static int fastrpc_get_args(u32 kernel, struct fastrpc_invoke_ctx *ctx)
 		}
 	}
 
+	if (!ctx->fl->sctx->coherent)
+		fastrpc_flush_args(ctx);
+
 	for (i = ctx->nbufs; i < ctx->nscalars; ++i) {
 		list[i].num = ctx->args[i].length ? 1 : 0;
 		list[i].pgidx = i;
@@ -1239,6 +1301,9 @@ static int fastrpc_internal_invoke(struct fastrpc_user *fl,  u32 kernel,
 
 	/* make sure that all memory writes by DSP are seen by CPU */
 	dma_rmb();
+	if (!fl->sctx->coherent)
+		fastrpc_inv_args(ctx);
+
 	/* populate all the output buffers with results */
 	err = fastrpc_put_args(ctx, kernel);
 	if (err)
@@ -2217,6 +2282,7 @@ static int fastrpc_cb_probe(struct platform_device *pdev)
 	sess->used = false;
 	sess->valid = true;
 	sess->dev = dev;
+	sess->coherent = of_property_read_bool(dev->of_node, "dma-coherent");
 	dev_set_drvdata(dev, sess);
 
 	if (cctx->domain_id == CDSP_DOMAIN_ID)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Claude review: misc: fastrpc: Add cache maintenance for non-coherent platforms
  2026-06-04 19:48 [PATCH v1] misc: fastrpc: Add cache maintenance for non-coherent platforms Abhinav Parihar
  2026-06-04 19:54 ` Claude review: " Claude Code Review Bot
@ 2026-06-04 19:54 ` Claude Code Review Bot
  1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 19:54 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: misc: fastrpc: Add cache maintenance for non-coherent platforms
Author: Abhinav Parihar <abhinav.parihar@oss.qualcomm.com>
Patches: 1
Reviewed: 2026-06-05T05:54:35.360233

---

This single patch adds explicit cache maintenance (flush/invalidate) for fastrpc argument buffers on platforms that lack DMA coherency between the CPU and DSP. The intent is sound — non-coherent platforms need software cache management — but the implementation has a **critical correctness bug** in the invalidation path (mismatched DMA direction in `dma_buf_end_cpu_access`), ignores error returns from the dma-buf API, and uses a somewhat unconventional cache-maintenance pattern that deserves closer scrutiny.

**Verdict: Needs revision before merge.**

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Claude review: misc: fastrpc: Add cache maintenance for non-coherent platforms
  2026-06-04 19:48 [PATCH v1] misc: fastrpc: Add cache maintenance for non-coherent platforms Abhinav Parihar
@ 2026-06-04 19:54 ` Claude Code Review Bot
  2026-06-04 19:54 ` Claude Code Review Bot
  1 sibling, 0 replies; 3+ messages in thread
From: Claude Code Review Bot @ 2026-06-04 19:54 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**BUG (Critical): Direction mismatch in `fastrpc_inv_args()`**

```c
if (ctx->olaps[i].mstart) {
    dma_buf_begin_cpu_access(map->buf, DMA_FROM_DEVICE);
    dma_buf_end_cpu_access(map->buf, DMA_TO_DEVICE);
}
```

The `begin_cpu_access` uses `DMA_FROM_DEVICE` (invalidate — correct for output buffers being read by CPU), but `end_cpu_access` uses `DMA_TO_DEVICE` instead of `DMA_FROM_DEVICE`. The direction passed to `end_cpu_access` must match the direction used in `begin_cpu_access` — these are paired calls bracketing a CPU access with a specific direction. Using `DMA_TO_DEVICE` in the end call means the exporter receives a mismatched direction, which can result in the wrong cache operation (a clean instead of an invalidate) or no operation at all, depending on the exporter implementation. This should be:

```c
dma_buf_begin_cpu_access(map->buf, DMA_FROM_DEVICE);
dma_buf_end_cpu_access(map->buf, DMA_FROM_DEVICE);
```

**Issue (Medium): Questionable cache maintenance pattern**

Both `fastrpc_flush_args()` and `fastrpc_inv_args()` immediately pair `begin_cpu_access` + `end_cpu_access` with no actual CPU access in between:

```c
dma_buf_begin_cpu_access(map->buf, DMA_TO_DEVICE);
dma_buf_end_cpu_access(map->buf, DMA_TO_DEVICE);
```

The dma-buf API documentation states: *"Only when cpu access is bracketed by both calls is it guaranteed to be coherent with other DMA access."* The intended usage is: `begin` → perform CPU reads/writes → `end`. Here the CPU access (the copy_from_user into the buffer) has *already happened* before `fastrpc_flush_args()` is called, so the bracket is empty. While this may work on some exporters as a side-effect (the `end_cpu_access` callback might do a cache clean), it's relying on implementation details rather than the documented contract. The correct approach would be to call `dma_buf_begin_cpu_access` *before* the CPU writes to the buffer, and `dma_buf_end_cpu_access` after. Similarly for the invalidation path — `begin_cpu_access(DMA_FROM_DEVICE)` should be called before `fastrpc_put_args` reads the output data, and `end_cpu_access` after.

**Issue (Medium): Return values of dma_buf_{begin,end}_cpu_access are ignored**

Both `dma_buf_begin_cpu_access()` and `dma_buf_end_cpu_access()` return `int` error codes. Failures are silently ignored:

```c
dma_buf_begin_cpu_access(map->buf, DMA_TO_DEVICE);
dma_buf_end_cpu_access(map->buf, DMA_TO_DEVICE);
```

At minimum, `begin_cpu_access` failures should be checked and, if failing, the `end_cpu_access` call skipped (and preferably the error propagated upward, since `fastrpc_get_args` already returns int).

**Issue (Minor): `uintptr_t` cast on kernel virtual address in `fastrpc_inv_args`**

```c
if (((uintptr_t)rpra & PAGE_MASK) ==
    ((uintptr_t)rpra[raix].buf.pv & PAGE_MASK))
```

`rpra` is a kernel pointer, `rpra[raix].buf.pv` is a `u64` from the remote arg union. Casting a kernel pointer to `uintptr_t` and masking it works but is a bit fragile — would be clearer to compare against the buffer's known DMA address or page frame, and it's worth a comment explaining that `buf.pv` here holds a kernel virtual address within the same allocation as `rpra`.

**Observation: `coherent` field not propagated to duplicate sessions**

The `coherent` field is set per session in `fastrpc_cb_probe()`:

```c
sess->coherent = of_property_read_bool(dev->of_node, "dma-coherent");
```

The duplicate sessions created in the loop below are copied via `memcpy(dup_sess, sess, sizeof(*dup_sess))`, so they *do* inherit the `coherent` flag. This is correct.

**Nit: Filtering logic in `fastrpc_flush_args` could be simplified**

```c
if (raix + 1 > inbufs)
    continue;
```

This is equivalent to `raix >= inbufs` which would be more idiomatic. Similarly in `fastrpc_inv_args`: `raix + 1 <= inbufs` is `raix < inbufs`. While functionally correct, the `+1` pattern is slightly confusing.

**Summary of required changes:**
1. Fix `dma_buf_end_cpu_access` direction to `DMA_FROM_DEVICE` in `fastrpc_inv_args()` — this is a correctness bug.
2. Check return values of `dma_buf_begin_cpu_access()`.
3. Reconsider the placement of begin/end brackets to properly surround the actual CPU access rather than being empty pairs.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-06-04 19:54 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-04 19:48 [PATCH v1] misc: fastrpc: Add cache maintenance for non-coherent platforms Abhinav Parihar
2026-06-04 19:54 ` Claude review: " Claude Code Review Bot
2026-06-04 19:54 ` Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox