[PATCH v2 0/5] better LUT segmentation for EOTFs

public inbox for drm-ai-reviews@public-inbox.freedesktop.org
 help / color / mirror / Atom feed

* [PATCH v2 0/5] better LUT segmentation for EOTFs
@ 2026-05-06 19:11 Melissa Wen
  2026-05-06 19:11 ` [PATCH v2 1/5] drm/amd/display: use GAMCOR for degamma private props in subsampled format Melissa Wen
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Melissa Wen @ 2026-05-06 19:11 UTC (permalink / raw)
  To: airlied, alexander.deucher, christian.koenig, harry.wentland,
	simona, siqueira, sunpeng.li
  Cc: Krunoslav Kovac, Dr . David Alan Gilbert, Bhawanpreet Lakha,
	Alex Hung, Aurabindo Pillai, Matthew Schwartz, pekka.paalanen,
	amd-gfx, dri-devel, kernel-dev

Hi,

With an external HDR monitor, we can see gradient banding around the sun
in the intro of Ori and the Will of the Wisps game on steamOS/Gamescope.
Gamescope uses AMD predefined transfer functions for degamma,
shaper/pre-3D-LUT and blend/post-3D-LUT plus CRTC regamma, however, only
degamma block has hardware curves. Shaper, blend, regamma predefined TFs
are software-computed by AMD color module into PWL LUTs. In addition, we
cannot use hardware curves on PRE_DEGAM with subsampled format, so that,
predefined TFs are also translated to LUTs in this situation, using
GAMCOR block instead. For this translation, the driver originally used
the same helper for EOTFs and inverse EOTFs, even though they differ in
input domain, number of regions and number of TF points per region.

Baring this in mind, patch 1 maps degamma predefined curves as LUT using
GAMCOR block for AMD driver-specific property that are still in use by
current gamescope. This was inspired by a similar patch from Harry for
colorop [1]. Patch 2 reverts commit 8b89acc0b2ba ("drm/amd/display:
Remove unused cm3_helper_translate_curve_to_degamma_hw_format") to
reintroduce cm3_helper_translate_curve_to_degamma_hw_format() and patch
3 wire it up for encoded -> linear-light LUTs (degamma/blend). With 16
samples per region across 12 regions for blend LUT (where hardware
fixed-function curves are not available and predefined TFs are
software-computed into LUTs), banding becomes almost imperceptible.

Patch 4 and 5 increase precision in the brightest half, where PQ/SRGB
EOTFs are steeper, by enabling up to 256 samples per region and halving
the per-region point count across 9 regions (128 in [0.5, 1], 64 in
[0.25, 0.5], …). This better matches the shape of PQ/SRGB EOTFs.
Although patches 4 and 5 seem conceptually correct to me, I couldn't see
clear improvement in the bright end with or without them.

This series targets DCN3+ hw families. With this series:
- degamma and blend LUTs use
  cm3_helper_translate_curve_to_degamma_hw_format(): encoded input,
  non-zero end slope, up to 256 points linearly interpolated between
  adjacent TF pts, fitting [0,1] encoded input range.
- shaper and regamma LUTs continue using
  cm3_helper_translate_curve_to_hw_format(): linear-light input, zero
  end slope, 16 points per region across 32 regions.

[1] https://lore.kernel.org/dri-devel/20260330153451.99472-8-harry.wentland@amd.com/

[v1] https://lore.kernel.org/dri-devel/20260414220237.184289-1-mwen@igalia.com/
Changes:
- new patch for GAMCOR usage in case of degamma predefined TF with subsampled formats
- fix misleading information regarding degamma hw curves (Kruno)
- clarify LUT segmentation choice using 8-bit sRGB as a reference (Kruno)

Best Regards,

Melissa

Melissa Wen (5):
  drm/amd/display: use GAMCOR for degamma private props in subsampled
    format
  Revert "drm/amd/display: Remove unused
    cm3_helper_translate_curve_to_degamma_hw_format"
  drm/amd/display: use a separate helper to translate degamma curves
  drm/amd/display: support up to 256 samples per region in degamma/blend
    LUT
  drm/amd/display: use halving distribution for PQ/sRGB linearizing LUT

 .../amd/display/amdgpu_dm/amdgpu_dm_color.c   |  16 +-
 .../amd/display/dc/dcn30/dcn30_cm_common.c    | 184 ++++++++++++++++++
 .../display/dc/dwb/dcn30/dcn30_cm_common.h    |   4 +
 .../amd/display/dc/hwss/dcn32/dcn32_hwseq.c   |  10 +-
 4 files changed, 204 insertions(+), 10 deletions(-)

-- 
2.53.0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/5] drm/amd/display: use GAMCOR for degamma private props in subsampled format
  2026-05-06 19:11 [PATCH v2 0/5] better LUT segmentation for EOTFs Melissa Wen
@ 2026-05-06 19:11 ` Melissa Wen
  2026-05-07  3:11   ` Claude review: " Claude Code Review Bot
  2026-05-06 19:11 ` [PATCH v2 2/5] Revert "drm/amd/display: Remove unused cm3_helper_translate_curve_to_degamma_hw_format" Melissa Wen
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Melissa Wen @ 2026-05-06 19:11 UTC (permalink / raw)
  To: airlied, alexander.deucher, christian.koenig, harry.wentland,
	simona, siqueira, sunpeng.li
  Cc: Krunoslav Kovac, Dr . David Alan Gilbert, Bhawanpreet Lakha,
	Alex Hung, Aurabindo Pillai, Matthew Schwartz, pekka.paalanen,
	amd-gfx, dri-devel, kernel-dev

When setting plane degamma TF via AMD driver-specific color properties,
the driver uses PRE_DEGAM color block (ROM). However, this block cannot
be used with subsampled formats as it affects the linearity of color
space in which HW scaler operates. For subsampled format, use the AMD
color module to map plane degamma predefined curve to LUT and use GAMCOR
block instead (RAM).

This is based on Harry's implementation for Fixed Matrix Colorop.

Link: https://lore.kernel.org/dri-devel/20260330153451.99472-1-harry.wentland@amd.com/
Co-developed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
---
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_color.c  | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c
index 20a76d81d532..4e5b664bbec0 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c
@@ -1424,7 +1424,7 @@ __set_dm_plane_degamma(struct drm_plane_state *plane_state,
 	const struct drm_color_lut *degamma_lut;
 	enum amdgpu_transfer_function tf = AMDGPU_TRANSFER_FUNCTION_DEFAULT;
 	uint32_t degamma_size;
-	bool has_degamma_lut;
+	bool has_degamma_lut, is_subsampled_format;
 	int ret;
 
 	degamma_lut = __extract_blob_lut(dm_plane_state->degamma_lut,
@@ -1454,12 +1454,20 @@ __set_dm_plane_degamma(struct drm_plane_state *plane_state,
 		if (ret)
 			return ret;
        } else {
-		dc_plane_state->in_transfer_func.type =
-			TF_TYPE_PREDEFINED;
+	       /* Check if format requires post-scale color processing (subsampled formats) */
+		is_subsampled_format = (dc_plane_state->format >= SURFACE_PIXEL_FORMAT_VIDEO_BEGIN &&
+					dc_plane_state->format < SURFACE_PIXEL_FORMAT_SUBSAMPLE_END);
+
+		dc_plane_state->in_transfer_func.type = TF_TYPE_PREDEFINED;
 
 		if (!mod_color_calculate_degamma_params(color_caps,
-		    &dc_plane_state->in_transfer_func, NULL, false))
+							&dc_plane_state->in_transfer_func,
+							NULL,
+							is_subsampled_format)) {
+			drm_err(plane_state->state->dev,
+				"Failed to calculate degamma params.\n");
 			return -ENOMEM;
+		}
 	}
 	return 0;
 }
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/5] Revert "drm/amd/display: Remove unused cm3_helper_translate_curve_to_degamma_hw_format"
  2026-05-06 19:11 [PATCH v2 0/5] better LUT segmentation for EOTFs Melissa Wen
  2026-05-06 19:11 ` [PATCH v2 1/5] drm/amd/display: use GAMCOR for degamma private props in subsampled format Melissa Wen
@ 2026-05-06 19:11 ` Melissa Wen
  2026-05-07  3:11   ` Claude review: " Claude Code Review Bot
  2026-05-06 19:11 ` [PATCH v2 3/5] drm/amd/display: use a separate helper to translate degamma curves Melissa Wen
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Melissa Wen @ 2026-05-06 19:11 UTC (permalink / raw)
  To: airlied, alexander.deucher, christian.koenig, harry.wentland,
	simona, siqueira, sunpeng.li
  Cc: Krunoslav Kovac, Dr . David Alan Gilbert, Bhawanpreet Lakha,
	Alex Hung, Aurabindo Pillai, Matthew Schwartz, pekka.paalanen,
	amd-gfx, dri-devel, kernel-dev

This reverts commit 8b89acc0b2baecfe331f5336e7ff1fcc5a44b062.

So that we can detach NL->L LUT programming from L->NL one, i.e., we can
use cm3_helper_translate_curve_to_degamma_hw_format for plane degamma and
blend (post-3DLUT curve) and cm3_helper_translate_curve_to_hw_format for
plane shaper (pre-3DLUT curve) and stream regamma.

Signed-off-by: Melissa Wen <mwen@igalia.com>
---
 .../amd/display/dc/dcn30/dcn30_cm_common.c    | 151 ++++++++++++++++++
 .../display/dc/dwb/dcn30/dcn30_cm_common.h    |   4 +
 2 files changed, 155 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
index bfd5515c2f4f..0949b1dffc63 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
@@ -303,6 +303,157 @@ bool cm3_helper_translate_curve_to_hw_format(struct dc_context *ctx,
 	return true;
 }
 
+#define NUM_DEGAMMA_REGIONS    12
+
+
+bool cm3_helper_translate_curve_to_degamma_hw_format(
+				const struct dc_transfer_func *output_tf,
+				struct pwl_params *lut_params)
+{
+	struct curve_points3 *corner_points;
+	struct pwl_result_data *rgb_resulted;
+	struct pwl_result_data *rgb;
+	struct pwl_result_data *rgb_plus_1;
+
+	int32_t region_start, region_end;
+	int32_t i;
+	uint32_t j, k, seg_distr[MAX_REGIONS_NUMBER], increment, start_index, hw_points;
+
+	if (output_tf == NULL || lut_params == NULL || output_tf->type == TF_TYPE_BYPASS)
+		return false;
+
+	corner_points = lut_params->corner_points;
+	rgb_resulted = lut_params->rgb_resulted;
+	hw_points = 0;
+
+	memset(lut_params, 0, sizeof(struct pwl_params));
+	memset(seg_distr, 0, sizeof(seg_distr));
+
+	region_start = -NUM_DEGAMMA_REGIONS;
+	region_end   = 0;
+
+
+	for (i = region_end - region_start; i < MAX_REGIONS_NUMBER ; i++)
+		seg_distr[i] = -1;
+	/* 12 segments
+	 * segments are from 2^-12 to 0
+	 */
+	for (i = 0; i < NUM_DEGAMMA_REGIONS ; i++)
+		seg_distr[i] = 4;
+
+	for (k = 0; k < MAX_REGIONS_NUMBER; k++) {
+		if (seg_distr[k] != -1)
+			hw_points += (1 << seg_distr[k]);
+	}
+
+	j = 0;
+	for (k = 0; k < (region_end - region_start); k++) {
+		increment = NUMBER_SW_SEGMENTS / (1 << seg_distr[k]);
+		start_index = (region_start + k + MAX_LOW_POINT) *
+				NUMBER_SW_SEGMENTS;
+		for (i = start_index; i < start_index + NUMBER_SW_SEGMENTS;
+				i += increment) {
+			if (j == hw_points - 1)
+				break;
+			if (i >= TRANSFER_FUNC_POINTS)
+				return false;
+			rgb_resulted[j].red = output_tf->tf_pts.red[i];
+			rgb_resulted[j].green = output_tf->tf_pts.green[i];
+			rgb_resulted[j].blue = output_tf->tf_pts.blue[i];
+			j++;
+		}
+	}
+
+	/* last point */
+	start_index = (region_end + MAX_LOW_POINT) * NUMBER_SW_SEGMENTS;
+	rgb_resulted[hw_points - 1].red = output_tf->tf_pts.red[start_index];
+	rgb_resulted[hw_points - 1].green = output_tf->tf_pts.green[start_index];
+	rgb_resulted[hw_points - 1].blue = output_tf->tf_pts.blue[start_index];
+
+	corner_points[0].red.x = dc_fixpt_pow(dc_fixpt_from_int(2),
+					     dc_fixpt_from_int(region_start));
+	corner_points[0].green.x = corner_points[0].red.x;
+	corner_points[0].blue.x = corner_points[0].red.x;
+	corner_points[1].red.x = dc_fixpt_pow(dc_fixpt_from_int(2),
+					     dc_fixpt_from_int(region_end));
+	corner_points[1].green.x = corner_points[1].red.x;
+	corner_points[1].blue.x = corner_points[1].red.x;
+
+	corner_points[0].red.y = rgb_resulted[0].red;
+	corner_points[0].green.y = rgb_resulted[0].green;
+	corner_points[0].blue.y = rgb_resulted[0].blue;
+
+	/* see comment above, m_arrPoints[1].y should be the Y value for the
+	 * region end (m_numOfHwPoints), not last HW point(m_numOfHwPoints - 1)
+	 */
+	corner_points[1].red.y = rgb_resulted[hw_points - 1].red;
+	corner_points[1].green.y = rgb_resulted[hw_points - 1].green;
+	corner_points[1].blue.y = rgb_resulted[hw_points - 1].blue;
+	corner_points[1].red.slope = dc_fixpt_zero;
+	corner_points[1].green.slope = dc_fixpt_zero;
+	corner_points[1].blue.slope = dc_fixpt_zero;
+
+	if (output_tf->tf == TRANSFER_FUNCTION_PQ) {
+		/* for PQ, we want to have a straight line from last HW X point,
+		 * and the slope to be such that we hit 1.0 at 10000 nits.
+		 */
+		const struct fixed31_32 end_value =
+				dc_fixpt_from_int(125);
+
+		corner_points[1].red.slope = dc_fixpt_div(
+			dc_fixpt_sub(dc_fixpt_one, corner_points[1].red.y),
+			dc_fixpt_sub(end_value, corner_points[1].red.x));
+		corner_points[1].green.slope = dc_fixpt_div(
+			dc_fixpt_sub(dc_fixpt_one, corner_points[1].green.y),
+			dc_fixpt_sub(end_value, corner_points[1].green.x));
+		corner_points[1].blue.slope = dc_fixpt_div(
+			dc_fixpt_sub(dc_fixpt_one, corner_points[1].blue.y),
+			dc_fixpt_sub(end_value, corner_points[1].blue.x));
+	}
+
+	lut_params->hw_points_num = hw_points;
+
+	k = 0;
+	for (i = 1; i < MAX_REGIONS_NUMBER; i++) {
+		if (seg_distr[k] != -1) {
+			lut_params->arr_curve_points[k].segments_num =
+					seg_distr[k];
+			lut_params->arr_curve_points[i].offset =
+					lut_params->arr_curve_points[k].offset + (1 << seg_distr[k]);
+		}
+		k++;
+	}
+
+	if (seg_distr[k] != -1)
+		lut_params->arr_curve_points[k].segments_num = seg_distr[k];
+
+	rgb = rgb_resulted;
+	rgb_plus_1 = rgb_resulted + 1;
+
+	i = 1;
+	while (i != hw_points + 1) {
+		if (dc_fixpt_lt(rgb_plus_1->red, rgb->red))
+			rgb_plus_1->red = rgb->red;
+		if (dc_fixpt_lt(rgb_plus_1->green, rgb->green))
+			rgb_plus_1->green = rgb->green;
+		if (dc_fixpt_lt(rgb_plus_1->blue, rgb->blue))
+			rgb_plus_1->blue = rgb->blue;
+
+		rgb->delta_red   = dc_fixpt_sub(rgb_plus_1->red,   rgb->red);
+		rgb->delta_green = dc_fixpt_sub(rgb_plus_1->green, rgb->green);
+		rgb->delta_blue  = dc_fixpt_sub(rgb_plus_1->blue,  rgb->blue);
+
+		++rgb_plus_1;
+		++rgb;
+		++i;
+	}
+	cm3_helper_convert_to_custom_float(rgb_resulted,
+						lut_params->corner_points,
+						hw_points, false);
+
+	return true;
+}
+
 bool cm3_helper_convert_to_custom_float(
 		struct pwl_result_data *rgb_resulted,
 		struct curve_points3 *corner_points,
diff --git a/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_cm_common.h b/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_cm_common.h
index 95f9318a54ef..c23dc1bb29bf 100644
--- a/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_cm_common.h
+++ b/drivers/gpu/drm/amd/display/dc/dwb/dcn30/dcn30_cm_common.h
@@ -63,6 +63,10 @@ bool cm3_helper_translate_curve_to_hw_format(struct dc_context *ctx,
 	const struct dc_transfer_func *output_tf,
 	struct pwl_params *lut_params, bool fixpoint);
 
+bool cm3_helper_translate_curve_to_degamma_hw_format(
+				const struct dc_transfer_func *output_tf,
+				struct pwl_params *lut_params);
+
 bool cm3_helper_convert_to_custom_float(
 		struct pwl_result_data *rgb_resulted,
 		struct curve_points3 *corner_points,
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/5] drm/amd/display: use a separate helper to translate degamma curves
  2026-05-06 19:11 [PATCH v2 0/5] better LUT segmentation for EOTFs Melissa Wen
  2026-05-06 19:11 ` [PATCH v2 1/5] drm/amd/display: use GAMCOR for degamma private props in subsampled format Melissa Wen
  2026-05-06 19:11 ` [PATCH v2 2/5] Revert "drm/amd/display: Remove unused cm3_helper_translate_curve_to_degamma_hw_format" Melissa Wen
@ 2026-05-06 19:11 ` Melissa Wen
  2026-05-07  3:11   ` Claude review: " Claude Code Review Bot
  2026-05-06 19:11 ` [PATCH v2 4/5] drm/amd/display: support up to 256 samples per region in degamma/blend LUT Melissa Wen
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Melissa Wen @ 2026-05-06 19:11 UTC (permalink / raw)
  To: airlied, alexander.deucher, christian.koenig, harry.wentland,
	simona, siqueira, sunpeng.li
  Cc: Krunoslav Kovac, Dr . David Alan Gilbert, Bhawanpreet Lakha,
	Alex Hung, Aurabindo Pillai, Matthew Schwartz, pekka.paalanen,
	amd-gfx, dri-devel, kernel-dev

In newer DCN families, there is no hw predefined curves for shaper,
blend and regamma. When userspace sets pre-defined curves for these
blocks, the driver uses AMD color module to program predefined curve as
LUT. However, it was using the same LUT segmentation for EOTF and
inverse EOTF by using the same color management helper. This is causing
banding on blend when PQ predefined curve is set. Besides that, degamma
predefined HW curve cannot be used with subsampled 4:2:0/4:2:2 formats
as it affects the linearity of color space in which HW scaler operates.

To mitigate banding when using the blend block and better support
subsampled format on degamma, use different translation helpers when
linearizing and delinearizing.

Signed-off-by: Melissa Wen <mwen@igalia.com>
---
 .../gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c    | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
index fd42f0afc3a9..9dfbb68d503b 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
@@ -493,9 +493,8 @@ bool dcn32_set_mcm_luts(
 	if (plane_state->blend_tf.type == TF_TYPE_HWPWL)
 		lut_params = &plane_state->blend_tf.pwl;
 	else if (plane_state->blend_tf.type == TF_TYPE_DISTRIBUTED_POINTS) {
-		result = cm3_helper_translate_curve_to_hw_format(plane_state->ctx,
-								 &plane_state->blend_tf,
-								 &dpp_base->regamma_params, false);
+		result = cm3_helper_translate_curve_to_degamma_hw_format(&plane_state->blend_tf,
+									 &dpp_base->regamma_params);
 		if (!result)
 			return result;
 
@@ -551,9 +550,8 @@ bool dcn32_set_input_transfer_func(struct dc *dc,
 	if (plane_state->in_transfer_func.type == TF_TYPE_HWPWL)
 		params = &plane_state->in_transfer_func.pwl;
 	else if (plane_state->in_transfer_func.type == TF_TYPE_DISTRIBUTED_POINTS &&
-		cm3_helper_translate_curve_to_hw_format(plane_state->ctx,
-							&plane_state->in_transfer_func,
-							&dpp_base->degamma_params, false))
+		cm3_helper_translate_curve_to_degamma_hw_format(&plane_state->in_transfer_func,
+								&dpp_base->degamma_params))
 		params = &dpp_base->degamma_params;
 
 	dpp_base->funcs->dpp_program_gamcor_lut(dpp_base, params);
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 4/5] drm/amd/display: support up to 256 samples per region in degamma/blend LUT
  2026-05-06 19:11 [PATCH v2 0/5] better LUT segmentation for EOTFs Melissa Wen
                   ` (2 preceding siblings ...)
  2026-05-06 19:11 ` [PATCH v2 3/5] drm/amd/display: use a separate helper to translate degamma curves Melissa Wen
@ 2026-05-06 19:11 ` Melissa Wen
  2026-05-07  3:11   ` Claude review: " Claude Code Review Bot
  2026-05-06 19:11 ` [PATCH v2 5/5] drm/amd/display: use halving distribution for PQ/sRGB linearizing LUT Melissa Wen
  2026-05-07  3:11 ` Claude review: better LUT segmentation for EOTFs Claude Code Review Bot
  5 siblings, 1 reply; 12+ messages in thread
From: Melissa Wen @ 2026-05-06 19:11 UTC (permalink / raw)
  To: airlied, alexander.deucher, christian.koenig, harry.wentland,
	simona, siqueira, sunpeng.li
  Cc: Krunoslav Kovac, Dr . David Alan Gilbert, Bhawanpreet Lakha,
	Alex Hung, Aurabindo Pillai, Matthew Schwartz, pekka.paalanen,
	amd-gfx, dri-devel, kernel-dev

cm3_helper_translate_curve_to_degamma_hw_format() reads one tf_pts entry
per HW LUT point, limiting the number of samples per region to
NUMBER_SW_SEGMENTS (16, at seg_distr[k] = 4) - higher seg_distr[k]
underflows the increment to 0. But the next patch introduces a halving
distribution for PQ/sRGB EOTFs that requires up to 128 samples in its
upper region (seg_distr[k] = 7).

As preparation, extend the loop index by 4 bits and linearly interpolate
adjacent tf_pts entries with the new interp_tf_pts() helper, where the 4
least significant bits are weight in 1/16 increments. This raises the
cap to 256 samples per region (seg_distr[k] = 8). seg_distr[k] <= 4
paths remain unchanged: the 4 least significant bits remain zero and
interp_tf_pts() reduces to a direct lookup.

Co-developed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
---
 .../amd/display/dc/dcn30/dcn30_cm_common.c    | 32 +++++++++++++++----
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
index 0949b1dffc63..70b7bc3494a2 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
@@ -305,6 +305,22 @@ bool cm3_helper_translate_curve_to_hw_format(struct dc_context *ctx,
 
 #define NUM_DEGAMMA_REGIONS    12
 
+/* Linear interpolation of tf_pts entries, where (i >> 4) is the integer tf_pts
+ * index, (i & 0xf) is the 1/16 sub-position.
+ */
+static struct fixed31_32 interp_tf_pts(const struct fixed31_32 *output_tf_channel, int i)
+{
+	struct fixed31_32 in_plus_one, in, value;
+	uint32_t t = i & 0xf;
+
+	in_plus_one = output_tf_channel[(i >> 4) + 1];
+	in = output_tf_channel[i >> 4];
+	value = dc_fixpt_sub(in_plus_one, in);
+	value = dc_fixpt_shr(dc_fixpt_mul_int(value, t), 4);
+	value = dc_fixpt_add(in, value);
+
+	return value;
+}
 
 bool cm3_helper_translate_curve_to_degamma_hw_format(
 				const struct dc_transfer_func *output_tf,
@@ -348,18 +364,20 @@ bool cm3_helper_translate_curve_to_degamma_hw_format(
 
 	j = 0;
 	for (k = 0; k < (region_end - region_start); k++) {
-		increment = NUMBER_SW_SEGMENTS / (1 << seg_distr[k]);
+		increment = (NUMBER_SW_SEGMENTS << 4) / (1 << seg_distr[k]);
 		start_index = (region_start + k + MAX_LOW_POINT) *
 				NUMBER_SW_SEGMENTS;
-		for (i = start_index; i < start_index + NUMBER_SW_SEGMENTS;
-				i += increment) {
+		for (i = (start_index << 4);
+		     i < (start_index << 4) + (NUMBER_SW_SEGMENTS << 4);
+		     i += increment) {
 			if (j == hw_points - 1)
 				break;
-			if (i >= TRANSFER_FUNC_POINTS)
+			if ((i >> 4) + 1 >= TRANSFER_FUNC_POINTS)
 				return false;
-			rgb_resulted[j].red = output_tf->tf_pts.red[i];
-			rgb_resulted[j].green = output_tf->tf_pts.green[i];
-			rgb_resulted[j].blue = output_tf->tf_pts.blue[i];
+
+			rgb_resulted[j].red = interp_tf_pts(output_tf->tf_pts.red, i);
+			rgb_resulted[j].green = interp_tf_pts(output_tf->tf_pts.green, i);
+			rgb_resulted[j].blue = interp_tf_pts(output_tf->tf_pts.blue, i);
 			j++;
 		}
 	}
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 5/5] drm/amd/display: use halving distribution for PQ/sRGB linearizing LUT
  2026-05-06 19:11 [PATCH v2 0/5] better LUT segmentation for EOTFs Melissa Wen
                   ` (3 preceding siblings ...)
  2026-05-06 19:11 ` [PATCH v2 4/5] drm/amd/display: support up to 256 samples per region in degamma/blend LUT Melissa Wen
@ 2026-05-06 19:11 ` Melissa Wen
  2026-05-07  3:11   ` Claude review: " Claude Code Review Bot
  2026-05-07  3:11 ` Claude review: better LUT segmentation for EOTFs Claude Code Review Bot
  5 siblings, 1 reply; 12+ messages in thread
From: Melissa Wen @ 2026-05-06 19:11 UTC (permalink / raw)
  To: airlied, alexander.deucher, christian.koenig, harry.wentland,
	simona, siqueira, sunpeng.li
  Cc: Krunoslav Kovac, Dr . David Alan Gilbert, Bhawanpreet Lakha,
	Alex Hung, Aurabindo Pillai, Matthew Schwartz, pekka.paalanen,
	amd-gfx, dri-devel, kernel-dev

When linearizing, the input is an encoded signal bounded to [0,1] and
PQ/sRGB EOTFs are steepest near 1, requiring more precision near the
bright end.

Take the 8-bit sRGB case as a reference: 256 possible inputs and 256 HW
LUT points line up, so the LUT acts as plain indexing. Float
representations don't land perfectly, but LERP-ing between two HW
entries, when input is within a small epsilon of one of them, doesn't
materially change the result.

Replace the uniform 12-region distribution (16 points each,
192 total, range [2^-12, 1]) with a 9-region halving distribution for
the PQ/sRGB pre-defined EOTF: 128 points in the top region [0.5, 1], 64
in the next, 32 in the next, and so on, down to 1 point in each of the
two darkest regions. Total samples grow from 192 to 256, with uniform
1/256 spacing across [0, 1]. The dark tail below 2^-9 is no longer
sampled separately, which is acceptable for PQ/sRGB.

Suggested-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
---
 .../amd/display/dc/dcn30/dcn30_cm_common.c    | 33 ++++++++++++++-----
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
index 70b7bc3494a2..66fe7f313ea3 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_cm_common.c
@@ -303,8 +303,6 @@ bool cm3_helper_translate_curve_to_hw_format(struct dc_context *ctx,
 	return true;
 }
 
-#define NUM_DEGAMMA_REGIONS    12
-
 /* Linear interpolation of tf_pts entries, where (i >> 4) is the integer tf_pts
  * index, (i & 0xf) is the 1/16 sub-position.
  */
@@ -345,17 +343,34 @@ bool cm3_helper_translate_curve_to_degamma_hw_format(
 	memset(lut_params, 0, sizeof(struct pwl_params));
 	memset(seg_distr, 0, sizeof(seg_distr));
 
-	region_start = -NUM_DEGAMMA_REGIONS;
-	region_end   = 0;
+	if (output_tf->tf == TRANSFER_FUNCTION_PQ ||
+	    output_tf->tf == TRANSFER_FUNCTION_SRGB) {
+		/* 9 segments
+		 * segments are from 2^-9 to 0
+		 */
+		const uint8_t SEG_COUNT = 9;
+		seg_distr[0] = 0; // Since we only have one point in darkest region
+		for (k = 1; k < SEG_COUNT; k++)
+			seg_distr[k] = k - 1; // 2^(k-1) points per region; halves as k decreases
 
+		region_start = -SEG_COUNT;
+		region_end = 0;
+	} else {
+		/* 12 segments
+		 * segments are from 2^-12 to 2^0
+		 * There are less than 256 points, for optimization
+		 */
+		const uint8_t SEG_COUNT = 12;
+
+		for (i = 0; i < SEG_COUNT; i++)
+			seg_distr[i] = 4;
+
+		region_start = -SEG_COUNT;
+		region_end = 0;
+	}
 
 	for (i = region_end - region_start; i < MAX_REGIONS_NUMBER ; i++)
 		seg_distr[i] = -1;
-	/* 12 segments
-	 * segments are from 2^-12 to 0
-	 */
-	for (i = 0; i < NUM_DEGAMMA_REGIONS ; i++)
-		seg_distr[i] = 4;
 
 	for (k = 0; k < MAX_REGIONS_NUMBER; k++) {
 		if (seg_distr[k] != -1)
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Claude review: better LUT segmentation for EOTFs
  2026-05-06 19:11 [PATCH v2 0/5] better LUT segmentation for EOTFs Melissa Wen
                   ` (4 preceding siblings ...)
  2026-05-06 19:11 ` [PATCH v2 5/5] drm/amd/display: use halving distribution for PQ/sRGB linearizing LUT Melissa Wen
@ 2026-05-07  3:11 ` Claude Code Review Bot
  5 siblings, 0 replies; 12+ messages in thread
From: Claude Code Review Bot @ 2026-05-07  3:11 UTC (permalink / raw)
  To: dri-devel-reviews

Overall Series Review

Subject: better LUT segmentation for EOTFs
Author: Melissa Wen <mwen@igalia.com>
Patches: 6
Reviewed: 2026-05-07T13:11:56.356959

---

This is a well-motivated 5-patch series addressing gradient banding in HDR content (specifically PQ EOTFs) when AMD display hardware lacks fixed-function curves for certain color blocks (shaper, blend, regamma). The core insight is sound: the existing code used the same LUT segmentation helper (`cm3_helper_translate_curve_to_hw_format`) for both EOTFs (encoded→linear, degamma-like) and inverse EOTFs (linear→encoded, regamma-like), despite these having fundamentally different input domains and curvature distributions.

**Structure**: Patches 1-3 form the main fix — enabling GAMCOR for subsampled formats, reintroducing the degamma-specific translation helper, and wiring it up. Patches 4-5 are a precision refinement — adding interpolation and a halving segment distribution for PQ/sRGB. The author honestly notes patches 4-5 didn't produce visible improvement, which is refreshing transparency.

**Correctness**: The math checks out. Array bounds are respected (`rgb_resulted` has 259 entries, max `hw_points` is 256; `TRANSFER_FUNC_POINTS` is 1025, max tf_pts index accessed is 401). The `interp_tf_pts` interpolation is clean and correctly degenerates to a direct lookup when `seg_distr[k] <= 4`. The NULL-ramp path through `mod_color_calculate_degamma_params` is safe — all ramp dereferences are guarded by `ramp &&` checks.

**Scope**: Changes are tightly scoped to DCN3+ hardware (dcn32_hwseq.c), touching only 4 files. The series is reviewable and bisectable.

**Minor concerns**: One whitespace issue in patch 1, and the revert in patch 2 could benefit from a brief note about what was changed from the original removed code (if anything). Overall the series looks ready with minor fixes.

---

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Claude review: drm/amd/display: use GAMCOR for degamma private props in subsampled format
  2026-05-06 19:11 ` [PATCH v2 1/5] drm/amd/display: use GAMCOR for degamma private props in subsampled format Melissa Wen
@ 2026-05-07  3:11   ` Claude Code Review Bot
  0 siblings, 0 replies; 12+ messages in thread
From: Claude Code Review Bot @ 2026-05-07  3:11 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Purpose**: When using subsampled video formats (4:2:0/4:2:2), the PRE_DEGAM hardware block (ROM curves) cannot be used because it would affect scaler linearity. This patch detects subsampled formats and passes that flag to `mod_color_calculate_degamma_params` so it computes distributed points for GAMCOR (RAM) instead.

**Correctness of the `map_user_ramp` repurposing**: Passing `is_subsampled_format` as the `map_user_ramp` parameter is semantically imprecise but functionally correct. When `true`, it prevents early returns at `color_gamma.c:1693-1696` that would keep `TF_TYPE_PREDEFINED`, forcing the function to fall through to `TF_TYPE_DISTRIBUTED_POINTS` computation. The `ramp=NULL` path is safe — all ramp accesses are guarded:

```c
// color_gamma.c:1718 — guards ramp dereference
if (map_user_ramp && ramp && ramp->type == GAMMA_RGB_256) {
```

```c
// color_gamma.c:1803 — same pattern
map_user_ramp && ramp && ramp->type == GAMMA_RGB_256,
```

**Whitespace issue**: Line 1457 of `amdgpu_dm_color.c` uses a tab + 7 spaces for the comment indentation, while surrounding code uses double-tabs. This is visible in the applied source:

```c
       } else {
	       /* Check if format requires ... */  ← tab + 7 spaces
		is_subsampled_format = ...             ← two tabs
```

The `} else {` at line 1456 also has space-based indentation, but that's a pre-existing issue.

**Error logging**: Adding `drm_err()` before returning `-ENOMEM` is a good improvement over the silent return in the original code.

**Nit**: The `is_subsampled_format` variable is declared at function scope but only used in the `else` branch. Declaring it inside the `else` block would be cleaner and make the scope explicit.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Claude review: Revert "drm/amd/display: Remove unused cm3_helper_translate_curve_to_degamma_hw_format"
  2026-05-06 19:11 ` [PATCH v2 2/5] Revert "drm/amd/display: Remove unused cm3_helper_translate_curve_to_degamma_hw_format" Melissa Wen
@ 2026-05-07  3:11   ` Claude Code Review Bot
  0 siblings, 0 replies; 12+ messages in thread
From: Claude Code Review Bot @ 2026-05-07  3:11 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Purpose**: Re-introduces `cm3_helper_translate_curve_to_degamma_hw_format()` which was previously removed as unused. It will be wired up in patch 3.

**Code review of the reverted function**: This is a straightforward revert of removal. The function uses 12 regions from 2^-12 to 2^0, with 16 samples per region (seg_distr[k]=4), totaling 192 hw_points. Key differences from the regamma helper:

- **Corner point slopes**: Non-zero for PQ (extrapolates to 1.0 at 10000 nits / 125 normalized), zero for everything else. The regamma helper uses zero slopes unconditionally.
- **Region range**: [2^-12, 1] for encoded input domain vs the regamma helper's much wider range for linear-light input.

**Bounds check**: `rgb_resulted` has 259 entries (`256+3`), 192 hw_points fits. The `arr_curve_points` loop at lines 449-458 iterates `k` from 0 to `MAX_REGIONS_NUMBER-1` (33), which fits the 34-entry array. The `(i >> 4) + 1 < TRANSFER_FUNC_POINTS` check at the original seg_distr=4: max index = start_index + 16 = at most 400+16 = 416 < 1025.

**Stale pointer after memset**: Lines 339-340 save `corner_points` and `rgb_resulted` as pointers into `lut_params`, then line 343 zeroes the struct. The pointers remain valid (they address fields within the struct, not separate allocations), so this is correct.

The function declaration in `dcn30_cm_common.h` is properly added. No issues.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Claude review: drm/amd/display: use a separate helper to translate degamma curves
  2026-05-06 19:11 ` [PATCH v2 3/5] drm/amd/display: use a separate helper to translate degamma curves Melissa Wen
@ 2026-05-07  3:11   ` Claude Code Review Bot
  0 siblings, 0 replies; 12+ messages in thread
From: Claude Code Review Bot @ 2026-05-07  3:11 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Purpose**: Wires up the degamma-specific helper from patch 2 for the blend TF and input (degamma) TF paths in `dcn32_hwseq.c`.

The change is minimal and correct. Two call sites are updated:

1. **Blend TF** (line 493→): `cm3_helper_translate_curve_to_hw_format` → `cm3_helper_translate_curve_to_degamma_hw_format` for `TF_TYPE_DISTRIBUTED_POINTS` blend curves. The degamma helper doesn't take a `dc_context` or `fixpoint` parameter, which is fine — it doesn't need them.

2. **Input TF / degamma** (line 553→): Same substitution for `in_transfer_func`.

**Observation**: The shaper path (around line 516 in the unmodified code) is intentionally left using `cm3_helper_translate_curve_to_hw_format`, since shapers are pre-3DLUT (linear→encoded) and need the regamma-style segmentation. This matches the cover letter's design.

**No issues found.**

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Claude review: drm/amd/display: support up to 256 samples per region in degamma/blend LUT
  2026-05-06 19:11 ` [PATCH v2 4/5] drm/amd/display: support up to 256 samples per region in degamma/blend LUT Melissa Wen
@ 2026-05-07  3:11   ` Claude Code Review Bot
  0 siblings, 0 replies; 12+ messages in thread
From: Claude Code Review Bot @ 2026-05-07  3:11 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Purpose**: Extends the degamma helper to support more than 16 samples per region by introducing 4-bit sub-sample interpolation between adjacent `tf_pts` entries.

**The `interp_tf_pts` helper** (lines 309-321): Clean linear interpolation. When `t = i & 0xf` is 0 (which is always the case for `seg_distr[k] <= 4`), `value = in + 0 = in`, reducing to a direct lookup. The formula `in + (in_plus_one - in) * t / 16` is mathematically correct.

**Bounds safety for higher seg_distr values**: For `seg_distr[k] = 8` (max supported):
- `increment = (16 << 4) / (1 << 8) = 256 / 256 = 1`
- Per-region: 256 iterations, accessing tf_pts indices from `start_index` to `start_index + 16`, with fractional sub-positions 0/16 through 15/16.
- The `(i >> 4) + 1 >= TRANSFER_FUNC_POINTS` guard correctly checks the +1 for interpolation access.

**Potential concern with `i` type**: `i` is declared as `int32_t`. After the shift, `start_index << 4` has a maximum value of `400 * 16 = 6400`, well within `int32_t` range. No overflow risk.

**Style nit**: The blank line between the `#define NUM_DEGAMMA_REGIONS` removal and the `interp_tf_pts` comment creates slightly inconsistent spacing, but this is trivial.

**No functional issues found.**

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Claude review: drm/amd/display: use halving distribution for PQ/sRGB linearizing LUT
  2026-05-06 19:11 ` [PATCH v2 5/5] drm/amd/display: use halving distribution for PQ/sRGB linearizing LUT Melissa Wen
@ 2026-05-07  3:11   ` Claude Code Review Bot
  0 siblings, 0 replies; 12+ messages in thread
From: Claude Code Review Bot @ 2026-05-07  3:11 UTC (permalink / raw)
  To: dri-devel-reviews

Patch Review

**Purpose**: Replaces the uniform 12-region/16-points-per-region distribution with a 9-region halving distribution for PQ and sRGB EOTFs, concentrating samples in the bright end where these curves are steepest.

**Segment distribution verification**:
```
seg_distr[0] = 0  →  2^0 = 1 point    (region [2^-9, 2^-8])
seg_distr[1] = 0  →  2^0 = 1 point    (region [2^-8, 2^-7])
seg_distr[2] = 1  →  2^1 = 2 points   (region [2^-7, 2^-6])
seg_distr[3] = 2  →  2^2 = 4 points   (region [2^-6, 2^-5])
seg_distr[4] = 3  →  2^3 = 8 points   (region [2^-5, 2^-4])
seg_distr[5] = 4  →  2^4 = 16 points  (region [2^-4, 2^-3])
seg_distr[6] = 5  →  2^5 = 32 points  (region [2^-3, 2^-2])
seg_distr[7] = 6  →  2^6 = 64 points  (region [2^-2, 2^-1])
seg_distr[8] = 7  →  2^7 = 128 points (region [2^-1, 2^0])
Total: 1+1+2+4+8+16+32+64+128 = 256 points ✓
```

This fits within `rgb_resulted[256+3]`. The maximum `seg_distr` value is 7, requiring `increment = 256/128 = 2`, which is within patch 4's capability (up to seg_distr=8).

**Bounds on tf_pts access**: For k=8 (brightest region), `start_index = (-9+8+25)*16 = 384`. Max `i` = `(384<<4) + 256 = 6400`. Max `i >> 4 = 400`, `(i>>4)+1 = 401 < 1025`. Safe.

**Corner point `end_value = 125` for PQ**: With `region_end = 0`, corner_points[1].x = 2^0 = 1.0. The PQ slope computation `(1.0 - Y(1.0)) / (125 - 1.0)` extrapolates from the last sample to the 10000-nit point. This is unchanged from the original function and remains correct.

**Non-PQ/non-sRGB fallback**: Transfer functions other than PQ and sRGB (e.g., GAMMA22, HLG) still use the 12-region uniform distribution. This is a reasonable choice — the halving distribution is specifically tuned for PQ/sRGB curvature.

**Comment style nit**: Lines 352-354 use `//` comments while surrounding comments use `/* */`. This is a minor inconsistency; the kernel accepts both styles but `/* */` is more traditional for multi-word comments in display code.

**Observation on the cover letter's candor**: The author notes no clear visual improvement from patches 4-5. While the mathematical reasoning is sound (more precision where curvature is steepest), the lack of visible improvement suggests the banding was already adequately addressed by patches 1-3's switch to the degamma helper with 192 points. These patches are still worthwhile as future-proofing for higher-precision displays.

---

**Summary**: Patches 1-3 are the substantive fix for gradient banding and look correct. Patches 4-5 are a clean precision enhancement. The only actionable item is the whitespace fix in patch 1 (tab+spaces vs double-tab for comment indentation). The series is in good shape overall.

---
Generated by Claude Code Patch Reviewer

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-05-07  3:11 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-06 19:11 [PATCH v2 0/5] better LUT segmentation for EOTFs Melissa Wen
2026-05-06 19:11 ` [PATCH v2 1/5] drm/amd/display: use GAMCOR for degamma private props in subsampled format Melissa Wen
2026-05-07  3:11   ` Claude review: " Claude Code Review Bot
2026-05-06 19:11 ` [PATCH v2 2/5] Revert "drm/amd/display: Remove unused cm3_helper_translate_curve_to_degamma_hw_format" Melissa Wen
2026-05-07  3:11   ` Claude review: " Claude Code Review Bot
2026-05-06 19:11 ` [PATCH v2 3/5] drm/amd/display: use a separate helper to translate degamma curves Melissa Wen
2026-05-07  3:11   ` Claude review: " Claude Code Review Bot
2026-05-06 19:11 ` [PATCH v2 4/5] drm/amd/display: support up to 256 samples per region in degamma/blend LUT Melissa Wen
2026-05-07  3:11   ` Claude review: " Claude Code Review Bot
2026-05-06 19:11 ` [PATCH v2 5/5] drm/amd/display: use halving distribution for PQ/sRGB linearizing LUT Melissa Wen
2026-05-07  3:11   ` Claude review: " Claude Code Review Bot
2026-05-07  3:11 ` Claude review: better LUT segmentation for EOTFs Claude Code Review Bot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox