* [PATCH V2] accel/amdxdna: Support sensors for column utilization
@ 2026-03-11 17:18 Lizhi Hou
2026-03-11 17:21 ` Mario Limonciello
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Lizhi Hou @ 2026-03-11 17:18 UTC (permalink / raw)
To: ogabbay, quic_jhugo, dri-devel, mario.limonciello,
maciej.falkowski
Cc: Mario Limonciello (AMD), linux-kernel, max.zhen, sonal.santan,
Lizhi Hou
From: "Mario Limonciello (AMD)" <superm1@kernel.org>
The AMD PMF driver provides realtime column utilization (npu_busy)
metrics for the NPU. Extend the DRM_IOCTL_AMDXDNA_GET_INFO sensor
query to expose these metrics to userspace.
Add AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION to the sensor type enum
and update aie2_get_sensors() to return both the total power and up
to 8 column utilization sensors if the user buffer permits.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
[lizhi: support legacy tool which uses small buffer. checkpatch cleanup]
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
---
drivers/accel/amdxdna/aie2_pci.c | 34 +++++++++++++++++++++++++++-----
drivers/accel/amdxdna/aie2_pci.h | 8 ++++++++
include/uapi/drm/amdxdna_accel.h | 3 ++-
3 files changed, 39 insertions(+), 6 deletions(-)
diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
index a2e586512e26..c57c785a2d15 100644
--- a/drivers/accel/amdxdna/aie2_pci.c
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -787,16 +787,18 @@ static int aie2_get_clock_metadata(struct amdxdna_client *client,
static int aie2_get_sensors(struct amdxdna_client *client,
struct amdxdna_drm_get_info *args)
{
+ struct amdxdna_dev_hdl *ndev = client->xdna->dev_handle;
struct amdxdna_drm_query_sensor sensor = {};
+ struct amd_pmf_npu_metrics npu_metrics;
+ u32 sensors_count = 0, i;
int ret;
- if (args->buffer_size < sizeof(sensor))
- return -EINVAL;
-
- ret = AIE2_GET_PMF_NPU_DATA(npu_power, sensor.input);
+ ret = AIE2_GET_PMF_NPU_METRICS(&npu_metrics);
if (ret)
return ret;
+
sensor.type = AMDXDNA_SENSOR_TYPE_POWER;
+ sensor.input = npu_metrics.npu_power;
sensor.unitm = -3;
scnprintf(sensor.label, sizeof(sensor.label), "Total Power");
scnprintf(sensor.units, sizeof(sensor.units), "mW");
@@ -804,7 +806,29 @@ static int aie2_get_sensors(struct amdxdna_client *client,
if (copy_to_user(u64_to_user_ptr(args->buffer), &sensor, sizeof(sensor)))
return -EFAULT;
- args->buffer_size = sizeof(sensor);
+ sensors_count++;
+ if (args->buffer_size <= sensors_count * sizeof(sensor))
+ goto out;
+
+ for (i = 0; i < min_t(u32, ndev->total_col, 8); i++) {
+ memset(&sensor, 0, sizeof(sensor));
+ sensor.input = npu_metrics.npu_busy[i];
+ sensor.type = AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION;
+ sensor.unitm = 0;
+ scnprintf(sensor.label, sizeof(sensor.label), "Column %d Utilization", i);
+ scnprintf(sensor.units, sizeof(sensor.units), "%%");
+
+ if (copy_to_user(u64_to_user_ptr(args->buffer) + sensors_count * sizeof(sensor),
+ &sensor, sizeof(sensor)))
+ return -EFAULT;
+
+ sensors_count++;
+ if (args->buffer_size <= sensors_count * sizeof(sensor))
+ goto out;
+ }
+
+out:
+ args->buffer_size = sensors_count * sizeof(sensor);
return 0;
}
diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_pci.h
index 1bb88711bedb..0ae174862592 100644
--- a/drivers/accel/amdxdna/aie2_pci.h
+++ b/drivers/accel/amdxdna/aie2_pci.h
@@ -48,6 +48,7 @@
})
#if IS_ENABLED(CONFIG_AMD_PMF)
+#define AIE2_GET_PMF_NPU_METRICS(metrics) amd_pmf_get_npu_data(metrics)
#define AIE2_GET_PMF_NPU_DATA(field, val) \
({ \
struct amd_pmf_npu_metrics _npu_metrics; \
@@ -58,6 +59,13 @@
(_ret); \
})
#else
+#define AIE2_GET_PMF_NPU_METRICS(metrics) \
+({ \
+ typeof(metrics) _m = metrics; \
+ memset(_m, 0xff, sizeof(*_m)); \
+ (-EOPNOTSUPP); \
+})
+
#define SENSOR_DEFAULT_npu_power U32_MAX
#define AIE2_GET_PMF_NPU_DATA(field, val) \
({ \
diff --git a/include/uapi/drm/amdxdna_accel.h b/include/uapi/drm/amdxdna_accel.h
index 9c44db2b3dcd..5bd13f4435f5 100644
--- a/include/uapi/drm/amdxdna_accel.h
+++ b/include/uapi/drm/amdxdna_accel.h
@@ -353,7 +353,8 @@ struct amdxdna_drm_query_clock_metadata {
};
enum amdxdna_sensor_type {
- AMDXDNA_SENSOR_TYPE_POWER
+ AMDXDNA_SENSOR_TYPE_POWER,
+ AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION
};
/**
--
2.34.1
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH V2] accel/amdxdna: Support sensors for column utilization
2026-03-11 17:18 [PATCH V2] accel/amdxdna: Support sensors for column utilization Lizhi Hou
@ 2026-03-11 17:21 ` Mario Limonciello
2026-03-11 18:49 ` Lizhi Hou
2026-03-11 20:48 ` Claude review: " Claude Code Review Bot
2026-03-11 20:48 ` Claude Code Review Bot
2 siblings, 1 reply; 6+ messages in thread
From: Mario Limonciello @ 2026-03-11 17:21 UTC (permalink / raw)
To: Lizhi Hou, ogabbay, quic_jhugo, dri-devel, maciej.falkowski
Cc: Mario Limonciello (AMD), linux-kernel, max.zhen, sonal.santan
On 3/11/26 12:18, Lizhi Hou wrote:
> From: "Mario Limonciello (AMD)" <superm1@kernel.org>
>
> The AMD PMF driver provides realtime column utilization (npu_busy)
> metrics for the NPU. Extend the DRM_IOCTL_AMDXDNA_GET_INFO sensor
> query to expose these metrics to userspace.
>
> Add AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION to the sensor type enum
> and update aie2_get_sensors() to return both the total power and up
> to 8 column utilization sensors if the user buffer permits.
>
> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
> [lizhi: support legacy tool which uses small buffer. checkpatch cleanup]
> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Thanks, no concerns on the cleanups.
> ---
> drivers/accel/amdxdna/aie2_pci.c | 34 +++++++++++++++++++++++++++-----
> drivers/accel/amdxdna/aie2_pci.h | 8 ++++++++
> include/uapi/drm/amdxdna_accel.h | 3 ++-
> 3 files changed, 39 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
> index a2e586512e26..c57c785a2d15 100644
> --- a/drivers/accel/amdxdna/aie2_pci.c
> +++ b/drivers/accel/amdxdna/aie2_pci.c
> @@ -787,16 +787,18 @@ static int aie2_get_clock_metadata(struct amdxdna_client *client,
> static int aie2_get_sensors(struct amdxdna_client *client,
> struct amdxdna_drm_get_info *args)
> {
> + struct amdxdna_dev_hdl *ndev = client->xdna->dev_handle;
> struct amdxdna_drm_query_sensor sensor = {};
> + struct amd_pmf_npu_metrics npu_metrics;
> + u32 sensors_count = 0, i;
> int ret;
>
> - if (args->buffer_size < sizeof(sensor))
> - return -EINVAL;
> -
> - ret = AIE2_GET_PMF_NPU_DATA(npu_power, sensor.input);
> + ret = AIE2_GET_PMF_NPU_METRICS(&npu_metrics);
> if (ret)
> return ret;
> +
> sensor.type = AMDXDNA_SENSOR_TYPE_POWER;
> + sensor.input = npu_metrics.npu_power;
> sensor.unitm = -3;
> scnprintf(sensor.label, sizeof(sensor.label), "Total Power");
> scnprintf(sensor.units, sizeof(sensor.units), "mW");
> @@ -804,7 +806,29 @@ static int aie2_get_sensors(struct amdxdna_client *client,
> if (copy_to_user(u64_to_user_ptr(args->buffer), &sensor, sizeof(sensor)))
> return -EFAULT;
>
> - args->buffer_size = sizeof(sensor);
> + sensors_count++;
> + if (args->buffer_size <= sensors_count * sizeof(sensor))
> + goto out;
> +
> + for (i = 0; i < min_t(u32, ndev->total_col, 8); i++) {
> + memset(&sensor, 0, sizeof(sensor));
> + sensor.input = npu_metrics.npu_busy[i];
> + sensor.type = AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION;
> + sensor.unitm = 0;
> + scnprintf(sensor.label, sizeof(sensor.label), "Column %d Utilization", i);
> + scnprintf(sensor.units, sizeof(sensor.units), "%%");
> +
> + if (copy_to_user(u64_to_user_ptr(args->buffer) + sensors_count * sizeof(sensor),
> + &sensor, sizeof(sensor)))
> + return -EFAULT;
> +
> + sensors_count++;
> + if (args->buffer_size <= sensors_count * sizeof(sensor))
> + goto out;
> + }
> +
> +out:
> + args->buffer_size = sensors_count * sizeof(sensor);
>
> return 0;
> }
> diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_pci.h
> index 1bb88711bedb..0ae174862592 100644
> --- a/drivers/accel/amdxdna/aie2_pci.h
> +++ b/drivers/accel/amdxdna/aie2_pci.h
> @@ -48,6 +48,7 @@
> })
>
> #if IS_ENABLED(CONFIG_AMD_PMF)
> +#define AIE2_GET_PMF_NPU_METRICS(metrics) amd_pmf_get_npu_data(metrics)
> #define AIE2_GET_PMF_NPU_DATA(field, val) \
> ({ \
> struct amd_pmf_npu_metrics _npu_metrics; \
> @@ -58,6 +59,13 @@
> (_ret); \
> })
> #else
> +#define AIE2_GET_PMF_NPU_METRICS(metrics) \
> +({ \
> + typeof(metrics) _m = metrics; \
> + memset(_m, 0xff, sizeof(*_m)); \
> + (-EOPNOTSUPP); \
> +})
> +
> #define SENSOR_DEFAULT_npu_power U32_MAX
> #define AIE2_GET_PMF_NPU_DATA(field, val) \
> ({ \
> diff --git a/include/uapi/drm/amdxdna_accel.h b/include/uapi/drm/amdxdna_accel.h
> index 9c44db2b3dcd..5bd13f4435f5 100644
> --- a/include/uapi/drm/amdxdna_accel.h
> +++ b/include/uapi/drm/amdxdna_accel.h
> @@ -353,7 +353,8 @@ struct amdxdna_drm_query_clock_metadata {
> };
>
> enum amdxdna_sensor_type {
> - AMDXDNA_SENSOR_TYPE_POWER
> + AMDXDNA_SENSOR_TYPE_POWER,
> + AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION
> };
>
> /**
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH V2] accel/amdxdna: Support sensors for column utilization
2026-03-11 17:21 ` Mario Limonciello
@ 2026-03-11 18:49 ` Lizhi Hou
0 siblings, 0 replies; 6+ messages in thread
From: Lizhi Hou @ 2026-03-11 18:49 UTC (permalink / raw)
To: Mario Limonciello, ogabbay, quic_jhugo, dri-devel,
maciej.falkowski
Cc: Mario Limonciello (AMD), linux-kernel, max.zhen, sonal.santan
Applied to drm-misc-next
On 3/11/26 10:21, Mario Limonciello wrote:
>
>
> On 3/11/26 12:18, Lizhi Hou wrote:
>> From: "Mario Limonciello (AMD)" <superm1@kernel.org>
>>
>> The AMD PMF driver provides realtime column utilization (npu_busy)
>> metrics for the NPU. Extend the DRM_IOCTL_AMDXDNA_GET_INFO sensor
>> query to expose these metrics to userspace.
>>
>> Add AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION to the sensor type enum
>> and update aie2_get_sensors() to return both the total power and up
>> to 8 column utilization sensors if the user buffer permits.
>>
>> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
>> Reviewed-by: Lizhi Hou <lizhi.hou@amd.com>
>> [lizhi: support legacy tool which uses small buffer. checkpatch cleanup]
>> Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
>
> Thanks, no concerns on the cleanups.
>
>> ---
>> drivers/accel/amdxdna/aie2_pci.c | 34 +++++++++++++++++++++++++++-----
>> drivers/accel/amdxdna/aie2_pci.h | 8 ++++++++
>> include/uapi/drm/amdxdna_accel.h | 3 ++-
>> 3 files changed, 39 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/accel/amdxdna/aie2_pci.c
>> b/drivers/accel/amdxdna/aie2_pci.c
>> index a2e586512e26..c57c785a2d15 100644
>> --- a/drivers/accel/amdxdna/aie2_pci.c
>> +++ b/drivers/accel/amdxdna/aie2_pci.c
>> @@ -787,16 +787,18 @@ static int aie2_get_clock_metadata(struct
>> amdxdna_client *client,
>> static int aie2_get_sensors(struct amdxdna_client *client,
>> struct amdxdna_drm_get_info *args)
>> {
>> + struct amdxdna_dev_hdl *ndev = client->xdna->dev_handle;
>> struct amdxdna_drm_query_sensor sensor = {};
>> + struct amd_pmf_npu_metrics npu_metrics;
>> + u32 sensors_count = 0, i;
>> int ret;
>> - if (args->buffer_size < sizeof(sensor))
>> - return -EINVAL;
>> -
>> - ret = AIE2_GET_PMF_NPU_DATA(npu_power, sensor.input);
>> + ret = AIE2_GET_PMF_NPU_METRICS(&npu_metrics);
>> if (ret)
>> return ret;
>> +
>> sensor.type = AMDXDNA_SENSOR_TYPE_POWER;
>> + sensor.input = npu_metrics.npu_power;
>> sensor.unitm = -3;
>> scnprintf(sensor.label, sizeof(sensor.label), "Total Power");
>> scnprintf(sensor.units, sizeof(sensor.units), "mW");
>> @@ -804,7 +806,29 @@ static int aie2_get_sensors(struct
>> amdxdna_client *client,
>> if (copy_to_user(u64_to_user_ptr(args->buffer), &sensor,
>> sizeof(sensor)))
>> return -EFAULT;
>> - args->buffer_size = sizeof(sensor);
>> + sensors_count++;
>> + if (args->buffer_size <= sensors_count * sizeof(sensor))
>> + goto out;
>> +
>> + for (i = 0; i < min_t(u32, ndev->total_col, 8); i++) {
>> + memset(&sensor, 0, sizeof(sensor));
>> + sensor.input = npu_metrics.npu_busy[i];
>> + sensor.type = AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION;
>> + sensor.unitm = 0;
>> + scnprintf(sensor.label, sizeof(sensor.label), "Column %d
>> Utilization", i);
>> + scnprintf(sensor.units, sizeof(sensor.units), "%%");
>> +
>> + if (copy_to_user(u64_to_user_ptr(args->buffer) +
>> sensors_count * sizeof(sensor),
>> + &sensor, sizeof(sensor)))
>> + return -EFAULT;
>> +
>> + sensors_count++;
>> + if (args->buffer_size <= sensors_count * sizeof(sensor))
>> + goto out;
>> + }
>> +
>> +out:
>> + args->buffer_size = sensors_count * sizeof(sensor);
>> return 0;
>> }
>> diff --git a/drivers/accel/amdxdna/aie2_pci.h
>> b/drivers/accel/amdxdna/aie2_pci.h
>> index 1bb88711bedb..0ae174862592 100644
>> --- a/drivers/accel/amdxdna/aie2_pci.h
>> +++ b/drivers/accel/amdxdna/aie2_pci.h
>> @@ -48,6 +48,7 @@
>> })
>> #if IS_ENABLED(CONFIG_AMD_PMF)
>> +#define AIE2_GET_PMF_NPU_METRICS(metrics) amd_pmf_get_npu_data(metrics)
>> #define AIE2_GET_PMF_NPU_DATA(field, val) \
>> ({ \
>> struct amd_pmf_npu_metrics _npu_metrics; \
>> @@ -58,6 +59,13 @@
>> (_ret); \
>> })
>> #else
>> +#define AIE2_GET_PMF_NPU_METRICS(metrics) \
>> +({ \
>> + typeof(metrics) _m = metrics; \
>> + memset(_m, 0xff, sizeof(*_m)); \
>> + (-EOPNOTSUPP); \
>> +})
>> +
>> #define SENSOR_DEFAULT_npu_power U32_MAX
>> #define AIE2_GET_PMF_NPU_DATA(field, val) \
>> ({ \
>> diff --git a/include/uapi/drm/amdxdna_accel.h
>> b/include/uapi/drm/amdxdna_accel.h
>> index 9c44db2b3dcd..5bd13f4435f5 100644
>> --- a/include/uapi/drm/amdxdna_accel.h
>> +++ b/include/uapi/drm/amdxdna_accel.h
>> @@ -353,7 +353,8 @@ struct amdxdna_drm_query_clock_metadata {
>> };
>> enum amdxdna_sensor_type {
>> - AMDXDNA_SENSOR_TYPE_POWER
>> + AMDXDNA_SENSOR_TYPE_POWER,
>> + AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION
>> };
>> /**
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Claude review: accel/amdxdna: Support sensors for column utilization
2026-03-11 17:18 [PATCH V2] accel/amdxdna: Support sensors for column utilization Lizhi Hou
2026-03-11 17:21 ` Mario Limonciello
@ 2026-03-11 20:48 ` Claude Code Review Bot
2026-03-11 20:48 ` Claude Code Review Bot
2 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-03-11 20:48 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**1. Removed buffer size validation (medium severity)**
The original code had:
```c
if (args->buffer_size < sizeof(sensor))
return -EINVAL;
```
This is removed by the patch. The new code proceeds to `copy_to_user()` of the first (power) sensor without checking whether the user buffer is large enough for even one sensor. If userspace passes `buffer_size = 0`, the copy proceeds unconditionally and the `<=` check at line 170 (`args->buffer_size <= sensors_count * sizeof(sensor)`) only fires *after* the copy. The minimum buffer size check should be retained before the first `copy_to_user`.
**2. Off-by-one / boundary check logic**
```c
sensors_count++;
if (args->buffer_size <= sensors_count * sizeof(sensor))
goto out;
```
This check is after copying, so it guards the *next* sensor, not the current one. For the first sensor (power), there's no pre-check at all (see point 1). The logic works for subsequent column sensors since the check prevents writing beyond the buffer, but it's fragile and inconsistent — the first sensor has no guard while subsequent ones do.
**3. Hard-coded `8` instead of using `ARRAY_SIZE`**
```c
for (i = 0; i < min_t(u32, ndev->total_col, 8); i++) {
```
The `8` corresponds to the size of `npu_metrics.npu_busy[8]` in `struct amd_pmf_npu_metrics`. This should use `ARRAY_SIZE(npu_metrics.npu_busy)` to avoid a silent bug if the struct is ever expanded.
**4. `!CONFIG_AMD_PMF` stub uses wrong error return**
```c
#define AIE2_GET_PMF_NPU_METRICS(metrics) \
({ \
typeof(metrics) _m = metrics; \
memset(_m, 0xff, sizeof(*_m)); \
(-EOPNOTSUPP); \
})
```
The `memset` to `0xff` is presumably for debug purposes, but since the caller checks `if (ret) return ret;` this data is never used. More importantly, the existing `AIE2_GET_PMF_NPU_DATA` stub uses `-EOPNOTSUPP` too but also fills in a `U32_MAX` default value and returns `0` (success) — the two stubs have inconsistent error semantics. The new `AIE2_GET_PMF_NPU_METRICS` returns an error, meaning sensors will be completely unavailable when `CONFIG_AMD_PMF` is disabled, whereas the old `AIE2_GET_PMF_NPU_DATA` returned success with a sentinel value. If the original behavior was to still report sensors (with a sentinel) when PMF is unavailable, this changes that contract.
**5. `sensor.input` type mismatch**
Looking at the structs:
- `struct amd_pmf_npu_metrics` has `u16 npu_busy[8]`
- `struct amdxdna_drm_query_sensor` has `__u32 input`
```c
sensor.input = npu_metrics.npu_busy[i];
```
This is fine (implicit widening from `u16` to `u32`), but worth noting that the `npu_power` field is also `u16` in the metrics struct. Both assignments are safe.
**6. `ndev->total_col` could exceed 8**
```c
for (i = 0; i < min_t(u32, ndev->total_col, 8); i++) {
```
The `min_t` clamp is correct since `npu_busy` has only 8 entries. But if hardware with >8 columns exists, userspace will silently get incomplete data with no indication. A comment would be helpful here.
**7. The old `AIE2_GET_PMF_NPU_DATA` macro is now dead code**
The patch adds `AIE2_GET_PMF_NPU_METRICS` and converts the sole caller, but doesn't remove the now-unused `AIE2_GET_PMF_NPU_DATA` macro (both the `CONFIG_AMD_PMF` and `!CONFIG_AMD_PMF` variants) or the related `SENSOR_DEFAULT_npu_power` define. These should be cleaned up.
**8. Minor: goto label placement**
```c
out:
args->buffer_size = sensors_count * sizeof(sensor);
return 0;
```
The `out:` label with its blank line above the assignment is fine stylistically, but placing it directly above the assignment makes the flow clearer that `buffer_size` is always updated regardless of how many sensors were written.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 6+ messages in thread* Claude review: accel/amdxdna: Support sensors for column utilization
2026-03-11 17:18 [PATCH V2] accel/amdxdna: Support sensors for column utilization Lizhi Hou
2026-03-11 17:21 ` Mario Limonciello
2026-03-11 20:48 ` Claude review: " Claude Code Review Bot
@ 2026-03-11 20:48 ` Claude Code Review Bot
2 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-03-11 20:48 UTC (permalink / raw)
To: dri-devel-reviews
Overall Series Review
Subject: accel/amdxdna: Support sensors for column utilization
Author: Lizhi Hou <lizhi.hou@amd.com>
Patches: 3
Reviewed: 2026-03-12T06:48:17.277058
---
This is a single-patch series (V2) adding column utilization sensor support to the `accel/amdxdna` driver. It extends the existing `aie2_get_sensors()` function to return up to 8 column utilization sensors alongside the existing power sensor, using NPU busy metrics from the AMD PMF driver. The approach is reasonable and the patch is relatively small, but there are a few issues worth addressing around buffer validation, the `!CONFIG_AMD_PMF` stub, and the hard-coded array size.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 0/2] Add some sensor data to amdxdna
@ 2026-02-28 6:10 Mario Limonciello (AMD)
2026-02-28 6:10 ` [PATCH 2/2] accel/amdxdna: Support sensors for column utilization Mario Limonciello (AMD)
0 siblings, 1 reply; 6+ messages in thread
From: Mario Limonciello (AMD) @ 2026-02-28 6:10 UTC (permalink / raw)
To: mario.limonciello, airlied, lizhi.hou, maarten.lankhorst,
mamin506, mripard, ogabbay, simona, tzimmermann
Cc: Mario Limonciello (AMD), dri-devel
Previously a series [1] was sent to add power sensor data for amdxdna
through the PMF mailbox. The PMF patch was merged but the XDNA
patches were planned for later.
Re-send the XDNA patch, and then add in another patch for getting
column utilization.
Link: https://lore.kernel.org/platform-driver-x86/20260105172956.3732123-3-lizhi.hou@amd.com/ [1]
Lizhi Hou (1):
accel/amdxdna: Add IOCTL to retrieve realtime NPU power estimate
Mario Limonciello (AMD) (1):
accel/amdxdna: Support sensors for column utilization
drivers/accel/amdxdna/aie2_pci.c | 50 +++++++++++++++++++++++++
drivers/accel/amdxdna/aie2_pci.h | 22 +++++++++++
drivers/accel/amdxdna/amdxdna_pci_drv.c | 3 +-
include/uapi/drm/amdxdna_accel.h | 3 +-
4 files changed, 76 insertions(+), 2 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/2] accel/amdxdna: Support sensors for column utilization
2026-02-28 6:10 [PATCH 0/2] Add some sensor data to amdxdna Mario Limonciello (AMD)
@ 2026-02-28 6:10 ` Mario Limonciello (AMD)
2026-03-03 4:37 ` Claude review: " Claude Code Review Bot
0 siblings, 1 reply; 6+ messages in thread
From: Mario Limonciello (AMD) @ 2026-02-28 6:10 UTC (permalink / raw)
To: mario.limonciello, airlied, lizhi.hou, maarten.lankhorst,
mamin506, mripard, ogabbay, simona, tzimmermann
Cc: Mario Limonciello (AMD), dri-devel
The AMD PMF driver provides realtime column utilization (npu_busy)
metrics for the NPU. Extend the DRM_IOCTL_AMDXDNA_GET_INFO sensor
query to expose these metrics to userspace.
Add AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION to the sensor type enum
and update aie2_get_sensors() to return both the total power and up
to 8 column utilization sensors if the user buffer permits.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---
drivers/accel/amdxdna/aie2_pci.c | 27 ++++++++++++++++++++++++---
drivers/accel/amdxdna/aie2_pci.h | 2 ++
include/uapi/drm/amdxdna_accel.h | 3 ++-
3 files changed, 28 insertions(+), 4 deletions(-)
diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
index a2e586512e26c..1f11822f0bfda 100644
--- a/drivers/accel/amdxdna/aie2_pci.c
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -787,16 +787,24 @@ static int aie2_get_clock_metadata(struct amdxdna_client *client,
static int aie2_get_sensors(struct amdxdna_client *client,
struct amdxdna_drm_get_info *args)
{
+ struct amdxdna_dev_hdl *ndev = client->xdna->dev_handle;
struct amdxdna_drm_query_sensor sensor = {};
+ struct amd_pmf_npu_metrics npu_metrics;
+ u32 sensors_count, util_count, i;
int ret;
- if (args->buffer_size < sizeof(sensor))
+ util_count = min_t(u32, ndev->total_col, 8);
+ sensors_count = 1 + util_count;
+
+ if (args->buffer_size < sensors_count * sizeof(sensor))
return -EINVAL;
- ret = AIE2_GET_PMF_NPU_DATA(npu_power, sensor.input);
+ ret = AIE2_GET_PMF_NPU_METRICS(&npu_metrics);
if (ret)
return ret;
+
sensor.type = AMDXDNA_SENSOR_TYPE_POWER;
+ sensor.input = npu_metrics.npu_power;
sensor.unitm = -3;
scnprintf(sensor.label, sizeof(sensor.label), "Total Power");
scnprintf(sensor.units, sizeof(sensor.units), "mW");
@@ -804,7 +812,20 @@ static int aie2_get_sensors(struct amdxdna_client *client,
if (copy_to_user(u64_to_user_ptr(args->buffer), &sensor, sizeof(sensor)))
return -EFAULT;
- args->buffer_size = sizeof(sensor);
+ for (i = 0; i < util_count; i++) {
+ memset(&sensor, 0, sizeof(sensor));
+ sensor.input = npu_metrics.npu_busy[i];
+ sensor.type = AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION;
+ sensor.unitm = 0;
+ scnprintf(sensor.label, sizeof(sensor.label), "Column %d Utilization", i);
+ scnprintf(sensor.units, sizeof(sensor.units), "%%");
+
+ if (copy_to_user(u64_to_user_ptr(args->buffer) + (i + 1) * sizeof(sensor),
+ &sensor, sizeof(sensor)))
+ return -EFAULT;
+ }
+
+ args->buffer_size = sensors_count * sizeof(sensor);
return 0;
}
diff --git a/drivers/accel/amdxdna/aie2_pci.h b/drivers/accel/amdxdna/aie2_pci.h
index 1bb88711bedb3..424fd6d852191 100644
--- a/drivers/accel/amdxdna/aie2_pci.h
+++ b/drivers/accel/amdxdna/aie2_pci.h
@@ -48,6 +48,7 @@
})
#if IS_ENABLED(CONFIG_AMD_PMF)
+#define AIE2_GET_PMF_NPU_METRICS(metrics) amd_pmf_get_npu_data(metrics)
#define AIE2_GET_PMF_NPU_DATA(field, val) \
({ \
struct amd_pmf_npu_metrics _npu_metrics; \
@@ -58,6 +59,7 @@
(_ret); \
})
#else
+#define AIE2_GET_PMF_NPU_METRICS(metrics) (-EOPNOTSUPP)
#define SENSOR_DEFAULT_npu_power U32_MAX
#define AIE2_GET_PMF_NPU_DATA(field, val) \
({ \
diff --git a/include/uapi/drm/amdxdna_accel.h b/include/uapi/drm/amdxdna_accel.h
index 9c44db2b3dcd4..5bd13f4435f55 100644
--- a/include/uapi/drm/amdxdna_accel.h
+++ b/include/uapi/drm/amdxdna_accel.h
@@ -353,7 +353,8 @@ struct amdxdna_drm_query_clock_metadata {
};
enum amdxdna_sensor_type {
- AMDXDNA_SENSOR_TYPE_POWER
+ AMDXDNA_SENSOR_TYPE_POWER,
+ AMDXDNA_SENSOR_TYPE_COLUMN_UTILIZATION
};
/**
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* Claude review: accel/amdxdna: Support sensors for column utilization
2026-02-28 6:10 ` [PATCH 2/2] accel/amdxdna: Support sensors for column utilization Mario Limonciello (AMD)
@ 2026-03-03 4:37 ` Claude Code Review Bot
0 siblings, 0 replies; 6+ messages in thread
From: Claude Code Review Bot @ 2026-03-03 4:37 UTC (permalink / raw)
To: dri-devel-reviews
Patch Review
**Rigid buffer size requirement breaks forward/backward compatibility** (`aie2_pci.c`):
```c
util_count = min_t(u32, ndev->total_col, 8);
sensors_count = 1 + util_count;
if (args->buffer_size < sensors_count * sizeof(sensor))
return -EINVAL;
```
This is the most significant issue in the series. Patch 1 requires space for 1 sensor. Patch 2 changes this to require space for `1 + min(total_col, 8)` sensors. Problems:
1. **No discovery mechanism**: Userspace has no way to query how many sensors exist before allocating the buffer. It must know `total_col` a priori, which is a driver implementation detail.
2. **All-or-nothing**: If userspace only wants the power sensor, it cannot query just that one — it must allocate for all sensors or get `-EINVAL`.
3. **Hardware-dependent size**: Different hardware with different `total_col` values require different buffer sizes for the same IOCTL parameter.
A better design would be: fill as many sensors as fit in the provided buffer and set `args->buffer_size` to the number of bytes actually written. Userspace can then allocate a larger buffer and retry if it wants more. This is a common pattern in kernel ioctls.
**`AIE2_GET_PMF_NPU_DATA` becomes dead code** (`aie2_pci.h`):
Patch 2 adds:
```c
#define AIE2_GET_PMF_NPU_METRICS(metrics) amd_pmf_get_npu_data(metrics)
```
and changes `aie2_get_sensors()` to use `AIE2_GET_PMF_NPU_METRICS()` instead of `AIE2_GET_PMF_NPU_DATA()`. But the old macro (and its `SENSOR_DEFAULT_npu_power` helper) are left in place with no remaining callers. These should be removed.
Also, `AIE2_GET_PMF_NPU_METRICS` is a trivial one-line wrapper that adds no value — it just calls `amd_pmf_get_npu_data()`. Consider calling the function directly (guarded by `#if IS_ENABLED(CONFIG_AMD_PMF)`) instead of adding a macro that just aliases the function.
**Partial copy_to_user on failure** (`aie2_pci.c`):
```c
if (copy_to_user(u64_to_user_ptr(args->buffer), &sensor, sizeof(sensor)))
return -EFAULT;
for (i = 0; i < util_count; i++) {
...
if (copy_to_user(u64_to_user_ptr(args->buffer) + (i + 1) * sizeof(sensor),
&sensor, sizeof(sensor)))
return -EFAULT;
}
args->buffer_size = sensors_count * sizeof(sensor);
```
If the power sensor is successfully copied but a subsequent column utilization copy fails, the function returns `-EFAULT` without setting `args->buffer_size`. Userspace has received partial data in its buffer but has no indication of how much was written. This is a minor concern in practice (a partial `copy_to_user` failure typically indicates a corrupted user pointer), but building into a local buffer and doing a single `copy_to_user` would be cleaner and avoid partial writes entirely.
**`npu_busy` is `u16` but `sensor.input` is `__u32`** — this is fine, the implicit widening is correct. Just noting for completeness that the PMF metrics use `u16` values while the UAPI struct uses `__u32`.
**Missing sensor index/identifier**: The column utilization sensors are only distinguished by their `label` string ("Column 0 Utilization", etc.). There is no numeric index field in the sensor struct. Userspace must parse the label string to determine which column a sensor corresponds to. Adding a field or using a sub-type would be more robust, though this would require a UAPI change.
**Version bump scope**: The version bump to 0.7 is in patch 1 with the comment `"Support getting power and utilization data"`, but utilization support is only added in patch 2. The version bump should either be in patch 2 (when the full feature is present) or the comment in patch 1 should only mention power data.
---
Generated by Claude Code Patch Reviewer
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-03-11 20:48 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-11 17:18 [PATCH V2] accel/amdxdna: Support sensors for column utilization Lizhi Hou
2026-03-11 17:21 ` Mario Limonciello
2026-03-11 18:49 ` Lizhi Hou
2026-03-11 20:48 ` Claude review: " Claude Code Review Bot
2026-03-11 20:48 ` Claude Code Review Bot
-- strict thread matches above, loose matches on Subject: below --
2026-02-28 6:10 [PATCH 0/2] Add some sensor data to amdxdna Mario Limonciello (AMD)
2026-02-28 6:10 ` [PATCH 2/2] accel/amdxdna: Support sensors for column utilization Mario Limonciello (AMD)
2026-03-03 4:37 ` Claude review: " Claude Code Review Bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox