From mboxrd@z Thu Jan  1 00:00:00 1970
From: Claude Code Review Bot <claude-review@example.com>
To: dri-devel-reviews@example.com
Subject: Claude review: drm/sched: Add test suite for concurrent job submissions
Date: Mon, 25 May 2026 22:03:53 +1000
Message-ID: <review-patch1-20260520093350.1036001-1-marco.pagani@linux.dev>
In-Reply-To: <20260520093350.1036001-1-marco.pagani@linux.dev>
References: <20260520093350.1036001-1-marco.pagani@linux.dev> <20260520093350.1036001-1-marco.pagani@linux.dev>
X-Mailer: Claude Code Patch Reviewer
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

Patch Review

**Teardown ordering (the v4 fix)**

The cleanup action registration order in `drm_sched_concurrent_init` is:
1. `drm_mock_sched_fini_wrap` (registered first =E2=86=92 cleaned up last)
2. `complete_destroy_workqueue` (registered second =E2=86=92 cleaned up fir=
st)

Entity free actions are registered per-worker inside the test body, so they=
 run before both of the above (LIFO). The `complete_destroy_workqueue` func=
tion:

```c
static void complete_destroy_workqueue(void *context)
{
	struct sched_concurrent_context *ctx =3D context;

	complete_all(&ctx->wait_go);

	destroy_workqueue(ctx->sub_wq);
}
```

This correctly handles the early-failure case: if a test aborts before call=
ing `complete_all` in the test body, stuck workers are unblocked here befor=
e the workqueue is destroyed, and the scheduler is finalized last. This ord=
ering looks correct.

However, there's a subtle concern: entity free actions run *before* `comple=
te_destroy_workqueue`. If a test aborts after workers are queued to the wor=
kqueue but before `complete_all` is called, the cleanup would free entities=
 first, then `complete_all` would unblock workers who try to use those free=
d entities. Looking at the test code, this scenario is avoided because ther=
e are no assertions between the `queue_work` loop and `complete_all`:

```c
for (i =3D 0; i < params->num_workers; i++) {
	worker =3D &workers[i];
	INIT_WORK(&worker->work, drm_sched_parallel_worker);
	queue_work(ctx->sub_wq, &worker->work);
}

complete_all(&ctx->wait_go);
flush_workqueue(ctx->sub_wq);
```

No assertion can fire between queuing and completing, so this is safe in pr=
actice. Still, this is a fragile invariant =E2=80=94 a future maintainer ad=
ding a `KUNIT_ASSERT` between these two blocks could reintroduce the use-af=
ter-free. A brief comment here noting this constraint might be worthwhile.

**Double `complete_all` is safe**

In the normal (non-error) flow, `complete_all(&ctx->wait_go)` is called bot=
h in the test body (line 260/429) and again in `complete_destroy_workqueue`=
 during cleanup. Since `complete_all` sets `done =3D UINT_MAX` and is idemp=
otent, the second call is a no-op. This is correct.

**Sliding window in interleaved worker**

```c
max_in_flight_job =3D min(worker->job_count, params->num_in_flight_jobs);
for (i =3D 0; i < max_in_flight_job; i++)
	drm_mock_sched_job_submit(worker->jobs[i]);

for (i =3D 0; i < worker->job_count; i++) {
	done =3D drm_mock_sched_job_wait_finished(worker->jobs[i], timeout);
	...
	j =3D i + max_in_flight_job;
	if (j < worker->job_count)
		drm_mock_sched_job_submit(worker->jobs[j]);
}
```

The sliding window logic is correct. It pre-fills `max_in_flight_job` jobs,=
 then as each job completes, submits the next one, maintaining the window s=
ize until jobs are exhausted.

**Time budget calculation**

```c
worker_share_us =3D (params->test_duration_ms * USEC_PER_MSEC) /
		  params->num_workers;
...
worker->job_duration_us =3D params->job_base_duration_us * (i + 1);
worker->job_count =3D worker_share_us / worker->job_duration_us;
worker->job_count =3D max(1U, worker->job_count);
```

With the given parameters (1000ms, 100us base, 16 workers): each worker get=
s 62,500us. Worker 0 gets 625 jobs of 100us, worker 15 gets 39 jobs of 1600=
us. Total serialized time sums to =E2=89=881 second. The per-job timeout of=
 `test_duration_ms * 2` (2 seconds) is sufficient.

The `max(1U, worker->job_count)` ensures at least one job per worker even i=
f the duration exceeds the share. With the current parameters this never tr=
iggers, but it's a reasonable safety measure for future parameter sets.

**Minor observations:**

1. The `kunit_info` calls in workers include a trailing `\n`, e.g.:
```c
kunit_info(test_ctx->test, "Parallel worker %u submitting %u jobs started\n=
",
	   worker->id, params->num_jobs);
```
`kunit_info` (like `pr_info`) appends a newline automatically. The explicit=
 `\n` results in a double newline in output. Not a bug, but slightly incons=
istent with typical kernel logging style. Same applies to the "Worker %u su=
bmitting" and "Job %u of worker %u timed out" messages.

2. The test suite currently has a single parameter set per test case. While=
 sufficient for initial coverage, it would be straightforward to add more p=
arameter combinations (e.g., more workers than jobs, single worker with man=
y jobs) in the future.

3. The `KUNIT_DEFINE_ACTION_WRAPPER` macros for `drm_mock_sched_fini_wrap` =
and `drm_mock_sched_entity_free_wrap` are defined at file scope within the =
concurrent test section. If other test suites in this file later need the s=
ame wrappers, they're available, but the existing `drm_sched_basic_exit` st=
ill calls `drm_mock_sched_fini` directly rather than using the wrapper. Thi=
s is fine =E2=80=94 the wrapper is only needed for the `kunit_add_action_or=
_reset` pattern.

**Summary:** Clean, well-designed concurrency test. The v4 teardown fix is =
correct. The only actionable nit is the extra `\n` in `kunit_info` format s=
trings. Otherwise, this looks ready.

---
Generated by Claude Code Patch Reviewer