* [PATCH 0/2] Add some sensor data to amdxdna
@ 2026-02-28 6:10 Mario Limonciello (AMD)
2026-02-28 6:10 ` [PATCH 1/2] accel/amdxdna: Add IOCTL to retrieve realtime NPU power estimate Mario Limonciello (AMD)
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Mario Limonciello (AMD) @ 2026-02-28 6:10 UTC (permalink / raw)
To: mario.limonciello, airlied, lizhi.hou, maarten.lankhorst,
mamin506, mripard, ogabbay, simona, tzimmermann
Cc: Mario Limonciello (AMD), dri-devel
Previously a series [1] was sent to add power sensor data for amdxdna
through the PMF mailbox. The PMF patch was merged but the XDNA
patches were planned for later.
Re-send the XDNA patch, and then add in another patch for getting
column utilization.
Link: https://lore.kernel.org/platform-driver-x86/20260105172956.3732123-3-lizhi.hou@amd.com/ [1]
Lizhi Hou (1):
accel/amdxdna: Add IOCTL to retrieve realtime NPU power estimate
Mario Limonciello (AMD) (1):
accel/amdxdna: Support sensors for column utilization
drivers/accel/amdxdna/aie2_pci.c | 50 +++++++++++++++++++++++++
drivers/accel/amdxdna/aie2_pci.h | 22 +++++++++++
drivers/accel/amdxdna/amdxdna_pci_drv.c | 3 +-
include/uapi/drm/amdxdna_accel.h | 3 +-
4 files changed, 76 insertions(+), 2 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread* [PATCH 1/2] accel/amdxdna: Add IOCTL to retrieve realtime NPU power estimate
2026-02-28 6:10 [PATCH 0/2] Add some sensor data to amdxdna Mario Limonciello (AMD)
@ 2026-02-28 6:10 ` Mario Limonciello (AMD)
2026-03-03 4:37 ` Claude review: " Claude Code Review Bot
2026-02-28 6:10 ` [PATCH 2/2] accel/amdxdna: Support sensors for column utilization Mario Limonciello (AMD)
2026-03-03 4:37 ` Claude review: Add some sensor data to amdxdna Claude Code Review Bot
2 siblings, 1 reply; 9+ messages in thread
From: Mario Limonciello (AMD) @ 2026-02-28 6:10 UTC (permalink / raw)
To: mario.limonciello, airlied, lizhi.hou, maarten.lankhorst,
mamin506, mripard, ogabbay, simona, tzimmermann
Cc: Mario Limonciello, Shyam Sundar S K, dri-devel
From: Lizhi Hou <lizhi.hou@amd.com>
The AMD PMF driver provides an interface to obtain realtime power
estimates for the NPU. Expose this information to userspace through a
new DRM_IOCTL_AMDXDNA_GET_INFO parameter, allowing applications to query
the current NPU power level.
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
(Update comment to indicate power and utilization)
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---
drivers/accel/amdxdna/aie2_pci.c | 29 +++++++++++++++++++++++++
drivers/accel/amdxdna/aie2_pci.h | 20 +++++++++++++++++
drivers/accel/amdxdna/amdxdna_pci_drv.c | 3 ++-
3 files changed, 51 insertions(+), 1 deletion(-)
diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
index 85079b6fc5d90..a2e586512e26c 100644
--- a/drivers/accel/amdxdna/aie2_pci.c
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -10,6 +10,7 @@
#include <drm/drm_managed.h>
#include <drm/drm_print.h>
#include <drm/gpu_scheduler.h>
+#include <linux/amd-pmf-io.h>
#include <linux/cleanup.h>
#include <linux/errno.h>
#include <linux/firmware.h>
@@ -783,6 +784,31 @@ static int aie2_get_clock_metadata(struct amdxdna_client *client,
return ret;
}
+static int aie2_get_sensors(struct amdxdna_client *client,
+ struct amdxdna_drm_get_info *args)
+{
+ struct amdxdna_drm_query_sensor sensor = {};
+ int ret;
+
+ if (args->buffer_size < sizeof(sensor))
+ return -EINVAL;
+
+ ret = AIE2_GET_PMF_NPU_DATA(npu_power, sensor.input);
+ if (ret)
+ return ret;
+ sensor.type = AMDXDNA_SENSOR_TYPE_POWER;
+ sensor.unitm = -3;
+ scnprintf(sensor.label, sizeof(sensor.label), "Total Power");
+ scnprintf(sensor.units, sizeof(sensor.units), "mW");
+
+ if (copy_to_user(u64_to_user_ptr(args->buffer), &sensor, sizeof(sensor)))
+ return -EFAULT;
+
+ args->buffer_size = sizeof(sensor);
+
+ return 0;
+}
+
static int aie2_hwctx_status_cb(struct amdxdna_hwctx *hwctx, void *arg)
{
struct amdxdna_drm_hwctx_entry *tmp __free(kfree) = NULL;
@@ -986,6 +1012,9 @@ static int aie2_get_info(struct amdxdna_client *client, struct amdxdna_drm_get_i
case DRM_AMDXDNA_QUERY_CLOCK_METADATA:
ret = aie2_get_clock_metadata(client, args);
break;
+ case DRM_AMDXDNA_QUERY_SENSORS:
+ ret = aie2_get_sensors(client, args);
+ break;
case DRM_AMDXDNA_QUERY_HW_CONTEXTS:
ret = aie2_get_hwctx_status(client, args);
break;
diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_pci.h
index b20a3661078cc..1bb88711bedb3 100644
--- a/drivers/accel/amdxdna/aie2_pci.h
+++ b/drivers/accel/amdxdna/aie2_pci.h
@@ -7,6 +7,7 @@
#define _AIE2_PCI_H_
#include <drm/amdxdna_accel.h>
+#include <linux/limits.h>
#include <linux/semaphore.h>
#include "amdxdna_mailbox.h"
@@ -46,6 +47,25 @@
pci_resource_len(NDEV2PDEV(_ndev), (_ndev)->xdna->dev_info->mbox_bar); \
})
+#if IS_ENABLED(CONFIG_AMD_PMF)
+#define AIE2_GET_PMF_NPU_DATA(field, val) \
+({ \
+ struct amd_pmf_npu_metrics _npu_metrics; \
+ int _ret; \
+ \
+ _ret = amd_pmf_get_npu_data(&_npu_metrics); \
+ val = _ret ? U32_MAX : _npu_metrics.field; \
+ (_ret); \
+})
+#else
+#define SENSOR_DEFAULT_npu_power U32_MAX
+#define AIE2_GET_PMF_NPU_DATA(field, val) \
+({ \
+ val = SENSOR_DEFAULT_##field; \
+ (-EOPNOTSUPP); \
+})
+#endif
+
enum aie2_smu_reg_idx {
SMU_CMD_REG = 0,
SMU_ARG_REG,
diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c b/drivers/accel/amdxdna/amdxdna_pci_drv.c
index a4384593bdccb..98720ddc9818e 100644
--- a/drivers/accel/amdxdna/amdxdna_pci_drv.c
+++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c
@@ -35,9 +35,10 @@ MODULE_FIRMWARE("amdnpu/17f0_11/npu_7.sbin");
* 0.4: Support getting resource information
* 0.5: Support getting telemetry data
* 0.6: Support preemption
+ * 0.7: Support getting power and utilization data
*/
#define AMDXDNA_DRIVER_MAJOR 0
-#define AMDXDNA_DRIVER_MINOR 6
+#define AMDXDNA_DRIVER_MINOR 7
/*
* Bind the driver base on (vendor_id, device_id) pair and later use the
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Claude review: accel/amdxdna: Add IOCTL to retrieve realtime NPU power estimate
2026-02-28 6:10 ` [PATCH 1/2] accel/amdxdna: Add IOCTL to retrieve realtime NPU power estimate Mario Limonciello (AMD)
@ 2026-03-03 4:37 ` Claude Code Review Bot
0 siblings, 0 replies; 9+ messages in thread
From: Claude Code Review Bot @ 2026-03-03 4:37 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Commit message nit**: The line `(Update comment to indicate power and utilization)` between the Reviewed-by tags and the Signed-off-by is unusual. This annotation should go below the `---` separator or in the cover letter, not in the permanent commit message area.
**`AIE2_GET_PMF_NPU_DATA` macro — dead assignment on error** (`aie2_pci.h`):
```c
#define AIE2_GET_PMF_NPU_DATA(field, val) \
({ \
struct amd_pmf_npu_metrics _npu_metrics; \
int _ret; \
\
_ret = amd_pmf_get_npu_data(&_npu_metrics); \
val = _ret ? U32_MAX : _npu_metrics.field; \
(_ret); \
})
```
When `_ret` is non-zero, `val` is set to `U32_MAX`, but the caller immediately returns:
```c
ret = AIE2_GET_PMF_NPU_DATA(npu_power, sensor.input);
if (ret)
return ret;
```
So the `U32_MAX` assignment is dead code. This is harmless but the macro design is misleading — it suggests the caller might continue with a fallback value.
**`#else` stub — fragile naming convention** (`aie2_pci.h`):
```c
#define SENSOR_DEFAULT_npu_power U32_MAX
#define AIE2_GET_PMF_NPU_DATA(field, val) \
({ \
val = SENSOR_DEFAULT_##field; \
(-EOPNOTSUPP); \
})
```
The token-pasting `SENSOR_DEFAULT_##field` requires a matching `#define` for every field name used. Currently only `npu_power` has one. This is brittle and the whole construct is unnecessary since the macro always returns `-EOPNOTSUPP` and the caller returns immediately on error — the value assigned to `val` is never used. This could be simplified to just:
```c
#define AIE2_GET_PMF_NPU_DATA(field, val) (-EOPNOTSUPP)
```
**`unitm` / `units` inconsistency** (`aie2_pci.c`):
```c
sensor.unitm = -3;
scnprintf(sensor.units, sizeof(sensor.units), "mW");
```
Per the UAPI documentation: `unitm` translates values via `pow(10, unitm) * value`. The PMF struct has `u16 npu_power`. If the raw value is already in milliwatts, then `unitm` should be `0` and `units = "mW"`, or `unitm = -3` and `units = "W"`. Having `unitm = -3` with `units = "mW"` means userspace would compute `pow(10, -3) * value` in milliwatts, yielding microwatts — likely not the intent. Please clarify which is correct.
**Missing `buffer_size` check parity with other handlers**: `aie2_get_clock_metadata()` doesn't validate `buffer_size` before `copy_to_user` — it just copies. Here `aie2_get_sensors()` does validate. Inconsistent, but the validation here is the right thing to do; just noting the discrepancy.
**No `#include <linux/limits.h>` justification**: The patch adds `#include <linux/limits.h>` to `aie2_pci.h`, presumably for `U32_MAX`, but `U32_MAX` is defined in `<linux/limits.h>` only indirectly. It's actually in `<linux/types.h>` or `<linux/kernel.h>` which are likely already pulled in. Minor, but worth verifying the include is actually needed.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/2] accel/amdxdna: Support sensors for column utilization
2026-02-28 6:10 [PATCH 0/2] Add some sensor data to amdxdna Mario Limonciello (AMD)
2026-02-28 6:10 ` [PATCH 1/2] accel/amdxdna: Add IOCTL to retrieve realtime NPU power estimate Mario Limonciello (AMD)
@ 2026-02-28 6:10 ` Mario Limonciello (AMD)
2026-02-28 19:56 ` Lizhi Hou
2026-03-03 4:37 ` Claude review: " Claude Code Review Bot
2026-03-03 4:37 ` Claude review: Add some sensor data to amdxdna Claude Code Review Bot
2 siblings, 2 replies; 9+ messages in thread
From: Mario Limonciello (AMD) @ 2026-02-28 6:10 UTC (permalink / raw)
To: mario.limonciello, airlied, lizhi.hou, maarten.lankhorst,
mamin506, mripard, ogabbay, simona, tzimmermann
Cc: Mario Limonciello (AMD), dri-devel
The AMD PMF driver provides realtime column utilization (npu_busy)
metrics for the NPU. Extend the DRM_IOCTL_AMDXDNA_GET_INFO sensor
query to expose these metrics to userspace.
Add AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION to the sensor type enum
and update aie2_get_sensors() to return both the total power and up
to 8 column utilization sensors if the user buffer permits.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---
drivers/accel/amdxdna/aie2_pci.c | 27 ++++++++++++++++++++++++---
drivers/accel/amdxdna/aie2_pci.h | 2 ++
include/uapi/drm/amdxdna_accel.h | 3 ++-
3 files changed, 28 insertions(+), 4 deletions(-)
diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
index a2e586512e26c..1f11822f0bfda 100644
--- a/drivers/accel/amdxdna/aie2_pci.c
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -787,16 +787,24 @@ static int aie2_get_clock_metadata(struct amdxdna_client *client,
static int aie2_get_sensors(struct amdxdna_client *client,
struct amdxdna_drm_get_info *args)
{
+ struct amdxdna_dev_hdl *ndev = client->xdna->dev_handle;
struct amdxdna_drm_query_sensor sensor = {};
+ struct amd_pmf_npu_metrics npu_metrics;
+ u32 sensors_count, util_count, i;
int ret;
- if (args->buffer_size < sizeof(sensor))
+ util_count = min_t(u32, ndev->total_col, 8);
+ sensors_count = 1 + util_count;
+
+ if (args->buffer_size < sensors_count * sizeof(sensor))
return -EINVAL;
- ret = AIE2_GET_PMF_NPU_DATA(npu_power, sensor.input);
+ ret = AIE2_GET_PMF_NPU_METRICS(&npu_metrics);
if (ret)
return ret;
+
sensor.type = AMDXDNA_SENSOR_TYPE_POWER;
+ sensor.input = npu_metrics.npu_power;
sensor.unitm = -3;
scnprintf(sensor.label, sizeof(sensor.label), "Total Power");
scnprintf(sensor.units, sizeof(sensor.units), "mW");
@@ -804,7 +812,20 @@ static int aie2_get_sensors(struct amdxdna_client *client,
if (copy_to_user(u64_to_user_ptr(args->buffer), &sensor, sizeof(sensor)))
return -EFAULT;
- args->buffer_size = sizeof(sensor);
+ for (i = 0; i < util_count; i++) {
+ memset(&sensor, 0, sizeof(sensor));
+ sensor.input = npu_metrics.npu_busy[i];
+ sensor.type = AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION;
+ sensor.unitm = 0;
+ scnprintf(sensor.label, sizeof(sensor.label), "Column %d Utilization", i);
+ scnprintf(sensor.units, sizeof(sensor.units), "%%");
+
+ if (copy_to_user(u64_to_user_ptr(args->buffer) + (i + 1) * sizeof(sensor),
+ &sensor, sizeof(sensor)))
+ return -EFAULT;
+ }
+
+ args->buffer_size = sensors_count * sizeof(sensor);
return 0;
}
diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_pci.h
index 1bb88711bedb3..424fd6d852191 100644
--- a/drivers/accel/amdxdna/aie2_pci.h
+++ b/drivers/accel/amdxdna/aie2_pci.h
@@ -48,6 +48,7 @@
})
#if IS_ENABLED(CONFIG_AMD_PMF)
+#define AIE2_GET_PMF_NPU_METRICS(metrics) amd_pmf_get_npu_data(metrics)
#define AIE2_GET_PMF_NPU_DATA(field, val) \
({ \
struct amd_pmf_npu_metrics _npu_metrics; \
@@ -58,6 +59,7 @@
(_ret); \
})
#else
+#define AIE2_GET_PMF_NPU_METRICS(metrics) (-EOPNOTSUPP)
#define SENSOR_DEFAULT_npu_power U32_MAX
#define AIE2_GET_PMF_NPU_DATA(field, val) \
({ \
diff --git a/include/uapi/drm/amdxdna_accel.h b/include/uapi/drm/amdxdna_accel.h
index 9c44db2b3dcd4..5bd13f4435f55 100644
--- a/include/uapi/drm/amdxdna_accel.h
+++ b/include/uapi/drm/amdxdna_accel.h
@@ -353,7 +353,8 @@ struct amdxdna_drm_query_clock_metadata {
};
enum amdxdna_sensor_type {
- AMDXDNA_SENSOR_TYPE_POWER
+ AMDXDNA_SENSOR_TYPE_POWER,
+ AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION
};
/**
--
2.43.0
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH 2/2] accel/amdxdna: Support sensors for column utilization
2026-02-28 6:10 ` [PATCH 2/2] accel/amdxdna: Support sensors for column utilization Mario Limonciello (AMD)
@ 2026-02-28 19:56 ` Lizhi Hou
2026-03-03 4:37 ` Claude review: " Claude Code Review Bot
1 sibling, 0 replies; 9+ messages in thread
From: Lizhi Hou @ 2026-02-28 19:56 UTC (permalink / raw)
To: Mario Limonciello (AMD), mario.limonciello, airlied,
maarten.lankhorst, mamin506, mripard, ogabbay, simona,
tzimmermann
Cc: dri-devel
On 2/27/26 22:10, Mario Limonciello (AMD) wrote:
> The AMD PMF driver provides realtime column utilization (npu_busy)
> metrics for the NPU. Extend the DRM_IOCTL_AMDXDNA_GET_INFO sensor
> query to expose these metrics to userspace.
>
> Add AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION to the sensor type enum
> and update aie2_get_sensors() to return both the total power and up
> to 8 column utilization sensors if the user buffer permits.
>
> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
> ---
> drivers/accel/amdxdna/aie2_pci.c | 27 ++++++++++++++++++++++++---
> drivers/accel/amdxdna/aie2_pci.h | 2 ++
> include/uapi/drm/amdxdna_accel.h | 3 ++-
> 3 files changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
> index a2e586512e26c..1f11822f0bfda 100644
> --- a/drivers/accel/amdxdna/aie2_pci.c
> +++ b/drivers/accel/amdxdna/aie2_pci.c
> @@ -787,16 +787,24 @@ static int aie2_get_clock_metadata(struct amdxdna_client *client,
> static int aie2_get_sensors(struct amdxdna_client *client,
> struct amdxdna_drm_get_info *args)
> {
> + struct amdxdna_dev_hdl *ndev = client->xdna->dev_handle;
> struct amdxdna_drm_query_sensor sensor = {};
> + struct amd_pmf_npu_metrics npu_metrics;
> + u32 sensors_count, util_count, i;
> int ret;
>
> - if (args->buffer_size < sizeof(sensor))
> + util_count = min_t(u32, ndev->total_col, 8);
> + sensors_count = 1 + util_count;
> +
> + if (args->buffer_size < sensors_count * sizeof(sensor))
> return -EINVAL;
>
> - ret = AIE2_GET_PMF_NPU_DATA(npu_power, sensor.input);
> + ret = AIE2_GET_PMF_NPU_METRICS(&npu_metrics);
> if (ret)
> return ret;
> +
> sensor.type = AMDXDNA_SENSOR_TYPE_POWER;
> + sensor.input = npu_metrics.npu_power;
> sensor.unitm = -3;
> scnprintf(sensor.label, sizeof(sensor.label), "Total Power");
> scnprintf(sensor.units, sizeof(sensor.units), "mW");
> @@ -804,7 +812,20 @@ static int aie2_get_sensors(struct amdxdna_client *client,
> if (copy_to_user(u64_to_user_ptr(args->buffer), &sensor, sizeof(sensor)))
> return -EFAULT;
>
> - args->buffer_size = sizeof(sensor);
> + for (i = 0; i < util_count; i++) {
> + memset(&sensor, 0, sizeof(sensor));
> + sensor.input = npu_metrics.npu_busy[i];
> + sensor.type = AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION;
> + sensor.unitm = 0;
> + scnprintf(sensor.label, sizeof(sensor.label), "Column %d Utilization", i);
> + scnprintf(sensor.units, sizeof(sensor.units), "%%");
> +
> + if (copy_to_user(u64_to_user_ptr(args->buffer) + (i + 1) * sizeof(sensor),
> + &sensor, sizeof(sensor)))
> + return -EFAULT;
> + }
> +
> + args->buffer_size = sensors_count * sizeof(sensor);
>
> return 0;
> }
> diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_pci.h
> index 1bb88711bedb3..424fd6d852191 100644
> --- a/drivers/accel/amdxdna/aie2_pci.h
> +++ b/drivers/accel/amdxdna/aie2_pci.h
> @@ -48,6 +48,7 @@
> })
>
> #if IS_ENABLED(CONFIG_AMD_PMF)
> +#define AIE2_GET_PMF_NPU_METRICS(metrics) amd_pmf_get_npu_data(metrics)
> #define AIE2_GET_PMF_NPU_DATA(field, val) \
> ({ \
> struct amd_pmf_npu_metrics _npu_metrics; \
> @@ -58,6 +59,7 @@
> (_ret); \
> })
> #else
> +#define AIE2_GET_PMF_NPU_METRICS(metrics) (-EOPNOTSUPP)
> #define SENSOR_DEFAULT_npu_power U32_MAX
> #define AIE2_GET_PMF_NPU_DATA(field, val) \
> ({ \
> diff --git a/include/uapi/drm/amdxdna_accel.h b/include/uapi/drm/amdxdna_accel.h
> index 9c44db2b3dcd4..5bd13f4435f55 100644
> --- a/include/uapi/drm/amdxdna_accel.h
> +++ b/include/uapi/drm/amdxdna_accel.h
> @@ -353,7 +353,8 @@ struct amdxdna_drm_query_clock_metadata {
> };
>
> enum amdxdna_sensor_type {
> - AMDXDNA_SENSOR_TYPE_POWER
> + AMDXDNA_SENSOR_TYPE_POWER,
> + AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
> };
>
> /**
^ permalink raw reply [flat|nested] 9+ messages in thread* Claude review: accel/amdxdna: Support sensors for column utilization
2026-02-28 6:10 ` [PATCH 2/2] accel/amdxdna: Support sensors for column utilization Mario Limonciello (AMD)
2026-02-28 19:56 ` Lizhi Hou
@ 2026-03-03 4:37 ` Claude Code Review Bot
1 sibling, 0 replies; 9+ messages in thread
From: Claude Code Review Bot @ 2026-03-03 4:37 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Rigid buffer size requirement breaks forward/backward compatibility** (`aie2_pci.c`):
```c
util_count = min_t(u32, ndev->total_col, 8);
sensors_count = 1 + util_count;
if (args->buffer_size < sensors_count * sizeof(sensor))
return -EINVAL;
```
This is the most significant issue in the series. Patch 1 requires space for 1 sensor. Patch 2 changes this to require space for `1 + min(total_col, 8)` sensors. Problems:
1. **No discovery mechanism**: Userspace has no way to query how many sensors exist before allocating the buffer. It must know `total_col` a priori, which is a driver implementation detail.
2. **All-or-nothing**: If userspace only wants the power sensor, it cannot query just that one — it must allocate for all sensors or get `-EINVAL`.
3. **Hardware-dependent size**: Different hardware with different `total_col` values require different buffer sizes for the same IOCTL parameter.
A better design would be: fill as many sensors as fit in the provided buffer and set `args->buffer_size` to the number of bytes actually written. Userspace can then allocate a larger buffer and retry if it wants more. This is a common pattern in kernel ioctls.
**`AIE2_GET_PMF_NPU_DATA` becomes dead code** (`aie2_pci.h`):
Patch 2 adds:
```c
#define AIE2_GET_PMF_NPU_METRICS(metrics) amd_pmf_get_npu_data(metrics)
```
and changes `aie2_get_sensors()` to use `AIE2_GET_PMF_NPU_METRICS()` instead of `AIE2_GET_PMF_NPU_DATA()`. But the old macro (and its `SENSOR_DEFAULT_npu_power` helper) are left in place with no remaining callers. These should be removed.
Also, `AIE2_GET_PMF_NPU_METRICS` is a trivial one-line wrapper that adds no value — it just calls `amd_pmf_get_npu_data()`. Consider calling the function directly (guarded by `#if IS_ENABLED(CONFIG_AMD_PMF)`) instead of adding a macro that just aliases the function.
**Partial copy_to_user on failure** (`aie2_pci.c`):
```c
if (copy_to_user(u64_to_user_ptr(args->buffer), &sensor, sizeof(sensor)))
return -EFAULT;
for (i = 0; i < util_count; i++) {
...
if (copy_to_user(u64_to_user_ptr(args->buffer) + (i + 1) * sizeof(sensor),
&sensor, sizeof(sensor)))
return -EFAULT;
}
args->buffer_size = sensors_count * sizeof(sensor);
```
If the power sensor is successfully copied but a subsequent column utilization copy fails, the function returns `-EFAULT` without setting `args->buffer_size`. Userspace has received partial data in its buffer but has no indication of how much was written. This is a minor concern in practice (a partial `copy_to_user` failure typically indicates a corrupted user pointer), but building into a local buffer and doing a single `copy_to_user` would be cleaner and avoid partial writes entirely.
**`npu_busy` is `u16` but `sensor.input` is `__u32`** — this is fine, the implicit widening is correct. Just noting for completeness that the PMF metrics use `u16` values while the UAPI struct uses `__u32`.
**Missing sensor index/identifier**: The column utilization sensors are only distinguished by their `label` string ("Column 0 Utilization", etc.). There is no numeric index field in the sensor struct. Userspace must parse the label string to determine which column a sensor corresponds to. Adding a field or using a sub-type would be more robust, though this would require a UAPI change.
**Version bump scope**: The version bump to 0.7 is in patch 1 with the comment `"Support getting power and utilization data"`, but utilization support is only added in patch 2. The version bump should either be in patch 2 (when the full feature is present) or the comment in patch 1 should only mention power data.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 9+ messages in thread
* Claude review: Add some sensor data to amdxdna
2026-02-28 6:10 [PATCH 0/2] Add some sensor data to amdxdna Mario Limonciello (AMD)
2026-02-28 6:10 ` [PATCH 1/2] accel/amdxdna: Add IOCTL to retrieve realtime NPU power estimate Mario Limonciello (AMD)
2026-02-28 6:10 ` [PATCH 2/2] accel/amdxdna: Support sensors for column utilization Mario Limonciello (AMD)
@ 2026-03-03 4:37 ` Claude Code Review Bot
2 siblings, 0 replies; 9+ messages in thread
From: Claude Code Review Bot @ 2026-03-03 4:37 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: Add some sensor data to amdxdna
Author: "Mario Limonciello (AMD)" <superm1@kernel.org>
Patches: 4
Reviewed: 2026-03-03T14:37:39.281145
---
This 2-patch series adds sensor data reporting to the amdxdna (NPU accelerator) driver by hooking into the AMD PMF (Platform Management Framework) driver to obtain realtime NPU power and column utilization metrics. The UAPI enum `DRM_AMDXDNA_QUERY_SENSORS` already exists but had no handler — this series provides the implementation.
The overall approach is reasonable, but there are several design issues:
1. **Patch 2 breaks the API contract established by patch 1** — patch 1 requires a buffer for 1 sensor, patch 2 retroactively requires space for 1+N sensors or returns `-EINVAL`. Userspace cannot discover the required buffer size without trial-and-error.
2. **The macro abstraction is messy** — patch 1 introduces `AIE2_GET_PMF_NPU_DATA` then patch 2 immediately adds a parallel `AIE2_GET_PMF_NPU_METRICS`, leaving the former as dead code.
3. **The `unitm` / `units` semantics are inconsistent** for the power sensor.
---
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH V2] accel/amdxdna: Support sensors for column utilization
@ 2026-03-11 17:18 Lizhi Hou
2026-03-11 20:48 ` Claude review: " Claude Code Review Bot
2026-03-11 20:48 ` Claude Code Review Bot
0 siblings, 2 replies; 9+ messages in thread
From: Lizhi Hou @ 2026-03-11 17:18 UTC (permalink / raw)
To: ogabbay, quic_jhugo, dri-devel, mario.limonciello,
maciej.falkowski
Cc: Mario Limonciello (AMD), linux-kernel, max.zhen, sonal.santan,
Lizhi Hou
From: "Mario Limonciello (AMD)" <superm1@kernel.org>
The AMD PMF driver provides realtime column utilization (npu_busy)
metrics for the NPU. Extend the DRM_IOCTL_AMDXDNA_GET_INFO sensor
query to expose these metrics to userspace.
Add AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION to the sensor type enum
and update aie2_get_sensors() to return both the total power and up
to 8 column utilization sensors if the user buffer permits.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
[lizhi: support legacy tool which uses small buffer. checkpatch cleanup]
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
---
drivers/accel/amdxdna/aie2_pci.c | 34 +++++++++++++++++++++++++++-----
drivers/accel/amdxdna/aie2_pci.h | 8 ++++++++
include/uapi/drm/amdxdna_accel.h | 3 ++-
3 files changed, 39 insertions(+), 6 deletions(-)
diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
index a2e586512e26..c57c785a2d15 100644
--- a/drivers/accel/amdxdna/aie2_pci.c
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -787,16 +787,18 @@ static int aie2_get_clock_metadata(struct amdxdna_client *client,
static int aie2_get_sensors(struct amdxdna_client *client,
struct amdxdna_drm_get_info *args)
{
+ struct amdxdna_dev_hdl *ndev = client->xdna->dev_handle;
struct amdxdna_drm_query_sensor sensor = {};
+ struct amd_pmf_npu_metrics npu_metrics;
+ u32 sensors_count = 0, i;
int ret;
- if (args->buffer_size < sizeof(sensor))
- return -EINVAL;
-
- ret = AIE2_GET_PMF_NPU_DATA(npu_power, sensor.input);
+ ret = AIE2_GET_PMF_NPU_METRICS(&npu_metrics);
if (ret)
return ret;
+
sensor.type = AMDXDNA_SENSOR_TYPE_POWER;
+ sensor.input = npu_metrics.npu_power;
sensor.unitm = -3;
scnprintf(sensor.label, sizeof(sensor.label), "Total Power");
scnprintf(sensor.units, sizeof(sensor.units), "mW");
@@ -804,7 +806,29 @@ static int aie2_get_sensors(struct amdxdna_client *client,
if (copy_to_user(u64_to_user_ptr(args->buffer), &sensor, sizeof(sensor)))
return -EFAULT;
- args->buffer_size = sizeof(sensor);
+ sensors_count++;
+ if (args->buffer_size <= sensors_count * sizeof(sensor))
+ goto out;
+
+ for (i = 0; i < min_t(u32, ndev->total_col, 8); i++) {
+ memset(&sensor, 0, sizeof(sensor));
+ sensor.input = npu_metrics.npu_busy[i];
+ sensor.type = AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION;
+ sensor.unitm = 0;
+ scnprintf(sensor.label, sizeof(sensor.label), "Column %d Utilization", i);
+ scnprintf(sensor.units, sizeof(sensor.units), "%%");
+
+ if (copy_to_user(u64_to_user_ptr(args->buffer) + sensors_count * sizeof(sensor),
+ &sensor, sizeof(sensor)))
+ return -EFAULT;
+
+ sensors_count++;
+ if (args->buffer_size <= sensors_count * sizeof(sensor))
+ goto out;
+ }
+
+out:
+ args->buffer_size = sensors_count * sizeof(sensor);
return 0;
}
diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_pci.h
index 1bb88711bedb..0ae174862592 100644
--- a/drivers/accel/amdxdna/aie2_pci.h
+++ b/drivers/accel/amdxdna/aie2_pci.h
@@ -48,6 +48,7 @@
})
#if IS_ENABLED(CONFIG_AMD_PMF)
+#define AIE2_GET_PMF_NPU_METRICS(metrics) amd_pmf_get_npu_data(metrics)
#define AIE2_GET_PMF_NPU_DATA(field, val) \
({ \
struct amd_pmf_npu_metrics _npu_metrics; \
@@ -58,6 +59,13 @@
(_ret); \
})
#else
+#define AIE2_GET_PMF_NPU_METRICS(metrics) \
+({ \
+ typeof(metrics) _m = metrics; \
+ memset(_m, 0xff, sizeof(*_m)); \
+ (-EOPNOTSUPP); \
+})
+
#define SENSOR_DEFAULT_npu_power U32_MAX
#define AIE2_GET_PMF_NPU_DATA(field, val) \
({ \
diff --git a/include/uapi/drm/amdxdna_accel.h b/include/uapi/drm/amdxdna_accel.h
index 9c44db2b3dcd..5bd13f4435f5 100644
--- a/include/uapi/drm/amdxdna_accel.h
+++ b/include/uapi/drm/amdxdna_accel.h
@@ -353,7 +353,8 @@ struct amdxdna_drm_query_clock_metadata {
};
enum amdxdna_sensor_type {
- AMDXDNA_SENSOR_TYPE_POWER
+ AMDXDNA_SENSOR_TYPE_POWER,
+ AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION
};
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread* Claude review: accel/amdxdna: Support sensors for column utilization
2026-03-11 17:18 [PATCH V2] accel/amdxdna: Support sensors for column utilization Lizhi Hou
@ 2026-03-11 20:48 ` Claude Code Review Bot
2026-03-11 20:48 ` Claude Code Review Bot
1 sibling, 0 replies; 9+ messages in thread
From: Claude Code Review Bot @ 2026-03-11 20:48 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**1. Removed buffer size validation (medium severity)**
The original code had:
```c
if (args->buffer_size < sizeof(sensor))
return -EINVAL;
```
This is removed by the patch. The new code proceeds to `copy_to_user()` of the first (power) sensor without checking whether the user buffer is large enough for even one sensor. If userspace passes `buffer_size = 0`, the copy proceeds unconditionally and the `<=` check at line 170 (`args->buffer_size <= sensors_count * sizeof(sensor)`) only fires *after* the copy. The minimum buffer size check should be retained before the first `copy_to_user`.
**2. Off-by-one / boundary check logic**
```c
sensors_count++;
if (args->buffer_size <= sensors_count * sizeof(sensor))
goto out;
```
This check is after copying, so it guards the *next* sensor, not the current one. For the first sensor (power), there's no pre-check at all (see point 1). The logic works for subsequent column sensors since the check prevents writing beyond the buffer, but it's fragile and inconsistent — the first sensor has no guard while subsequent ones do.
**3. Hard-coded `8` instead of using `ARRAY_SIZE`**
```c
for (i = 0; i < min_t(u32, ndev->total_col, 8); i++) {
```
The `8` corresponds to the size of `npu_metrics.npu_busy[8]` in `struct amd_pmf_npu_metrics`. This should use `ARRAY_SIZE(npu_metrics.npu_busy)` to avoid a silent bug if the struct is ever expanded.
**4. `!CONFIG_AMD_PMF` stub uses wrong error return**
```c
#define AIE2_GET_PMF_NPU_METRICS(metrics) \
({ \
typeof(metrics) _m = metrics; \
memset(_m, 0xff, sizeof(*_m)); \
(-EOPNOTSUPP); \
})
```
The `memset` to `0xff` is presumably for debug purposes, but since the caller checks `if (ret) return ret;` this data is never used. More importantly, the existing `AIE2_GET_PMF_NPU_DATA` stub uses `-EOPNOTSUPP` too but also fills in a `U32_MAX` default value and returns `0` (success) — the two stubs have inconsistent error semantics. The new `AIE2_GET_PMF_NPU_METRICS` returns an error, meaning sensors will be completely unavailable when `CONFIG_AMD_PMF` is disabled, whereas the old `AIE2_GET_PMF_NPU_DATA` returned success with a sentinel value. If the original behavior was to still report sensors (with a sentinel) when PMF is unavailable, this changes that contract.
**5. `sensor.input` type mismatch**
Looking at the structs:
- `struct amd_pmf_npu_metrics` has `u16 npu_busy[8]`
- `struct amdxdna_drm_query_sensor` has `__u32 input`
```c
sensor.input = npu_metrics.npu_busy[i];
```
This is fine (implicit widening from `u16` to `u32`), but worth noting that the `npu_power` field is also `u16` in the metrics struct. Both assignments are safe.
**6. `ndev->total_col` could exceed 8**
```c
for (i = 0; i < min_t(u32, ndev->total_col, 8); i++) {
```
The `min_t` clamp is correct since `npu_busy` has only 8 entries. But if hardware with >8 columns exists, userspace will silently get incomplete data with no indication. A comment would be helpful here.
**7. The old `AIE2_GET_PMF_NPU_DATA` macro is now dead code**
The patch adds `AIE2_GET_PMF_NPU_METRICS` and converts the sole caller, but doesn't remove the now-unused `AIE2_GET_PMF_NPU_DATA` macro (both the `CONFIG_AMD_PMF` and `!CONFIG_AMD_PMF` variants) or the related `SENSOR_DEFAULT_npu_power` define. These should be cleaned up.
**8. Minor: goto label placement**
```c
out:
args->buffer_size = sensors_count * sizeof(sensor);
return 0;
```
The `out:` label with its blank line above the assignment is fine stylistically, but placing it directly above the assignment makes the flow clearer that `buffer_size` is always updated regardless of how many sensors were written.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 9+ messages in thread* Claude review: accel/amdxdna: Support sensors for column utilization
2026-03-11 17:18 [PATCH V2] accel/amdxdna: Support sensors for column utilization Lizhi Hou
2026-03-11 20:48 ` Claude review: " Claude Code Review Bot
@ 2026-03-11 20:48 ` Claude Code Review Bot
1 sibling, 0 replies; 9+ messages in thread
From: Claude Code Review Bot @ 2026-03-11 20:48 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: accel/amdxdna: Support sensors for column utilization
Author: Lizhi Hou <lizhi.hou@amd.com>
Patches: 3
Reviewed: 2026-03-12T06:48:17.277058
---
This is a single-patch series (V2) adding column utilization sensor support to the `accel/amdxdna` driver. It extends the existing `aie2_get_sensors()` function to return up to 8 column utilization sensors alongside the existing power sensor, using NPU busy metrics from the AMD PMF driver. The approach is reasonable and the patch is relatively small, but there are a few issues worth addressing around buffer validation, the `!CONFIG_AMD_PMF` stub, and the hard-coded array size.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-03-11 20:48 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-28 6:10 [PATCH 0/2] Add some sensor data to amdxdna Mario Limonciello (AMD)
2026-02-28 6:10 ` [PATCH 1/2] accel/amdxdna: Add IOCTL to retrieve realtime NPU power estimate Mario Limonciello (AMD)
2026-03-03 4:37 ` Claude review: " Claude Code Review Bot
2026-02-28 6:10 ` [PATCH 2/2] accel/amdxdna: Support sensors for column utilization Mario Limonciello (AMD)
2026-02-28 19:56 ` Lizhi Hou
2026-03-03 4:37 ` Claude review: " Claude Code Review Bot
2026-03-03 4:37 ` Claude review: Add some sensor data to amdxdna Claude Code Review Bot
-- strict thread matches above, loose matches on Subject: below --
2026-03-11 17:18 [PATCH V2] accel/amdxdna: Support sensors for column utilization Lizhi Hou
2026-03-11 20:48 ` Claude review: " Claude Code Review Bot
2026-03-11 20:48 ` Claude Code Review Bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox